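Another recent project is refactoring @晨旭's music station. One of the tasks is working out how to produce an audio visualization like the one at hocassian.cn/SC without resorting to screen recording: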

Audio visualization (music visualization) with ffmpeg

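The first tool to consider is of course ffmpeg, the industrial-grade encoding workhorse. But after searching the Chinese-language web I couldn't find what I wanted; everything followed this pattern: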

https://hhsprings.bitbucket.io/docs/programming/examples/ffmpeg/audio_visualization/index.html

Just as I was about to give up, I stumbled across this on GitHub:

https://github.com/mfcc64/showcqt-bash

It is just a shell script with only 2 stars, but the actual result is really good:

#!/bin/bash

# Usage:
# showcqtlow-encode background.png input-audio.mp3 [output-video.mp4]

background="$1"
input="$2"

# Fall back to "<input file name>.showcqtbar.mp4" when no output path is given.
if [ -z "$3" ]; then
    output="$(basename "$input").showcqtbar.mp4"
else
    output="$3"
fi

firequalizer="
    firequalizer    =
        gain        = '1.4884e8 * f*f*f / (f*f + 424.36) / (f*f + 1.4884e8) / sqrt(f*f + 25122.25)':
        scale       = linlin:
        wfunc       = hamming:
        zero_phase  = on:
        fft2        = on
"

background_in="scale=1920:1080, format=gbrp"

audio_in="
    asettb              =1/sr,
    asetpts             = N,
    aformat             =
        channel_layouts = stereo,
    asplit [ao],
    atrim               =
        start_pts       = 0,
    afifo
"

showcqt="
    showcqt         =
        fps         = 60:
        size        = 1920x564:
        count       = 1:
        csp         = bt709:
        bar_g       = 2:
        sono_g      = 4:
        bar_v       = 3:
        sono_v      = 19:
        sono_h      = 0:
        axis_h      = 36:
        bar_t       = 0.5:
        axis        = 0:
        tc          = 0.33:
        attack      = 0.033:
        tlength     = 'st(0,0.17); 384*tc / (384 / ld(0) + tc*f /(1-ld(0))) + 384*tc / (tc*f / ld(0) + 384 /(1-ld(0)))',
    format  = rgb24,
    format  = gbrp
"

stack="
    split [vstack],
    crop    =
        w   = iw:
        h   = ih - 48:
        x   = 0:
        y   = 0,
    vflip [vstack2];

    [vstack][vstack2]
    vstack
"

merge_alpha="
    mergeplanes =
        format  = gbrap:
        mapping = 0x00010200
"

filter_complex="
    $background_in [overlay_image];
    $audio_in,
    $firequalizer,
    $showcqt,
    $stack,
    split [overlay_base],
    $merge_alpha [overlay_top];

    [overlay_base][overlay_image]
    overlay                 =
        format              = gbrp [overlay_middle];

    [overlay_middle][overlay_top]
    overlay                 =
        format              = gbrp:
        alpha               = premultiplied,

    scale                   =
        out_color_matrix    = bt709,
    format                  = yuv420p [vo]
"

ffmpeg -i "$background" -i "$input" -ss 00:00:15 -to 00:00:25 -filter_complex "$filter_complex" -codec:a aac -b:a 384k \
    -codec:v libx264 -crf 22 -qcomp 0.7 -preset fast -movflags faststart \
    -map '[vo]' -map '[ao]' -colorspace bt709 -color_range tv \
    -color_primaries bt709 -color_trc bt709 "$output"

Chinese translation of the individual parameters: https://www.jianshu.com/p/0ad6e9526487
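The ffmpeg call in the script hard-codes `-ss 00:00:15 -to 00:00:25`, so it only ever renders a 10-second clip. A minimal sketch of making that range optional follows; the helper `clip_args` and the variables `CLIP_START` and `CLIP_END` are hypothetical names introduced here, not part of the original script.

```shell
#!/bin/bash

# Print ffmpeg trim options for an optional start/end timestamp.
# Prints nothing when neither is given, so the whole input is encoded.
# (clip_args is a hypothetical helper, not part of showcqt-bash.)
clip_args() {
    local start="$1" end="$2" args=""
    [ -n "$start" ] && args="-ss $start"
    [ -n "$end" ] && args="${args:+$args }-to $end"
    printf '%s' "$args"
}

# Usage inside the script above (left unquoted on purpose so the
# options split into separate words; timestamps contain no spaces):
#
#   ffmpeg -i "$background" -i "$input" \
#       $(clip_args "$CLIP_START" "$CLIP_END") \
#       -filter_complex "$filter_complex" ... "$output"
```

With `CLIP_START=00:00:15 CLIP_END=00:00:25` this reproduces the original command line; with both unset, the full track is rendered.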

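After going through what various experts have worked out, this seems to be the best result available... so I'm considering whether to put it into production, haha (in short, ride the donkey while looking for a horse: use it for now and keep an eye out for something better). My guess is that the ffmpeg developers deliberately held back on fancier options for fear of people abusing them...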
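While looking for alternatives, it can help to have a stripped-down baseline to compare against: the bare `showcqt` filter with no background image, equalizer, or mirrored stack. The sketch below is only that, under assumptions; the function names, the default size and frame rate, and the file names are illustrative and do not come from the original script.

```shell
#!/bin/bash

# Build a bare showcqt filter string.
# The 1280x720 @ 30 fps defaults are illustrative, not taken from the script.
showcqt_filter() {
    local width="${1:-1280}" height="${2:-720}" fps="${3:-30}"
    printf 'showcqt=size=%sx%s:fps=%s' "$width" "$height" "$fps"
}

# Render the audio of INPUT as a plain showcqt video in OUTPUT.
# (render_plain_showcqt is a hypothetical helper name.)
render_plain_showcqt() {
    local input="$1" output="$2"
    ffmpeg -i "$input" \
        -filter_complex "[0:a]$(showcqt_filter)[v]" \
        -map '[v]' -map 0:a \
        -codec:v libx264 -pix_fmt yuv420p -codec:a aac "$output"
}

# Example (hypothetical file names):
# render_plain_showcqt input-audio.mp3 plain-showcqt.mp4
```

Comparing this output with the full script's shows how much of the final look comes from the extra stages (firequalizer, the flipped `vstack`, and the alpha overlay on the background) rather than from `showcqt` alone.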

