Hardware Encoding with Apple silicon using videotoolbox (H.265)

Background info

I have videos from a few years ago where I used a video editing software to cut and export videos I recorded with a GoPro. There were limited export options available and I wasn’t very familiar with video compression, so I chose the highest setting to not make the quality worse.

This produced large H.264 video files, larger than the source video sections combined.

Storing data consumes power and costs money. I don’t like to store this data unnecessarily, so I wanted to fix what I did back then and also have a guideline for future videos.

Goal

The goal was to convert H.264 to H.265, find a decent quality setting that doesn’t waste space, but has a result where I don’t notice any difference to the original.

I’m no video codec expert, but since I found almost no information online (for Apple videotoolbox) I wanted to share what my process was to determine a decent quality setting.

tl;dr spoiler

hevc_videotoolbox quality 45-55 seems to be the sweet spot. Default quality is bad.

Output files will be here for some time

ffmpeg: Hardware vs. software encoding

Most of the time, hardware encoding produces larger output files than a software encoder, but it is significantly faster. With Apple silicon you can use VideoToolbox, which ffmpeg supports.

Using a software encoder

When you software-encode H.265 you usually choose a “constant rate factor” (CRF). This way your main goal is to keep a specific quality regardless of the final file size/bitrate.

For other purposes, like streaming over a network bottleneck, “constant bit rate” (CBR) or the “video buffering verifier” (VBV) can be used. But I am only interested in maintaining a decent quality here.

If I wanted to do this in software I’d use libx265 with a CRF. The libx265 default is 28. Lower values mean better quality.
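As a rough sketch of that software path, the snippet below runs libx265 with an explicit CRF. The file names and the `x265_args` array are assumptions for illustration; the `-crf` option is the same one used in the timing comparison at the end of this post.

```shell
#!/bin/bash
# CRF 28 is the libx265 default; lower values give better quality and larger files.
crf=28

# Encoder arguments kept in an array so the CRF can be changed in one place.
x265_args=(-c:v libx265 -crf "$crf" -tag:v hvc1 -c:a copy)

# Only encode when the (hypothetical) input file is present.
if [ -f input.mov ]; then
  ffmpeg -i input.mov "${x265_args[@]}" "output-crf${crf}.mp4"
fi
```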

How ffmpeg does it with Apple’s `hevc_videotoolbox`

With ffmpeg you can list the available encoders with ffmpeg -encoders. With the ffmpeg version built by Homebrew, you’d see something like this when limiting the output to entries containing toolbox:

$ ffmpeg -encoders | grep toolbox 
ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
  built with Apple clang version 14.0.3 (clang-1403.0.22.14.1)
  configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/6.0_1 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-audiotoolbox --enable-neon
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
 V....D h264_videotoolbox    VideoToolbox H.264 Encoder (codec h264)
 V....D hevc_videotoolbox    VideoToolbox H.265 Encoder (codec hevc)
 V....D prores_videotoolbox  VideoToolbox ProRes Encoder (codec prores)

I’m interested in encoding with H.265, so I’m using hevc_videotoolbox. For comparison: on an NVIDIA GPU you can use hevc_nvenc instead.
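If you only want to know whether your ffmpeg build includes the encoder, without the long banner, something like the following should work. The `has_encoder` helper is my own hypothetical name; it assumes the encoder name is the second column of the list, as in the output above.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the encoder list on stdin contains
# an encoder whose name (second column) equals $1.
has_encoder() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# -hide_banner suppresses the version/configuration block.
# To list the options of one encoder:
#   ffmpeg -hide_banner -h encoder=hevc_videotoolbox
if command -v ffmpeg >/dev/null 2>&1; then
  if ffmpeg -hide_banner -encoders 2>/dev/null | has_encoder hevc_videotoolbox; then
    echo "hevc_videotoolbox is available"
  else
    echo "hevc_videotoolbox is not in this ffmpeg build"
  fi
fi
```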

Encoding test videos

I mainly used my own videos for the following process, but to show what’s happening I also did it with “Tears of Steel”, a Creative Commons Attribution 3.0 licensed short movie by the Blender Foundation (mango.blender.org). This way I can use screenshots and videos for this post.

The original 4K video is 12 minutes and 14 seconds long. I extracted 30 seconds of it like this:

ffmpeg -i input.mov -ss 00:06:00 -t 30 -c:a copy -c:v copy 30seconds.mov

Then I started the encoding with different quality settings. Doing this manually would look like this:

ffmpeg -i 30seconds.mov -c:v hevc_videotoolbox -q:v 1 -c:a copy -tag:v hvc1 output-q1.mp4
ffmpeg -i 30seconds.mov -c:v hevc_videotoolbox -q:v 5 -c:a copy -tag:v hvc1 output-q5.mp4
ffmpeg -i 30seconds.mov -c:v hevc_videotoolbox -q:v 10 -c:a copy -tag:v hvc1 output-q10.mp4
ffmpeg -i 30seconds.mov -c:v hevc_videotoolbox -q:v 20 -c:a copy -tag:v hvc1 output-q20.mp4
ffmpeg -i 30seconds.mov -c:v hevc_videotoolbox -q:v 30 -c:a copy -tag:v hvc1 output-q30.mp4
[...]
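Instead of typing each command, the same thing can be sketched as a loop. The quality list is the one from the frame-extraction script further down; `output_name` and `encode_all` are hypothetical helper names, not something ffmpeg provides.

```shell
#!/bin/bash
# Quality values to test (same list as in the frame-extraction script).
qualities=(1 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 100)

# Build the output file name used in this post: output-q<N>.mp4
output_name() {
  printf 'output-q%s.mp4' "$1"
}

# Encode the given input once per quality value.
encode_all() {
  local input="$1" q
  for q in "${qualities[@]}"; do
    ffmpeg -i "$input" -c:v hevc_videotoolbox -q:v "$q" -c:a copy -tag:v hvc1 "$(output_name "$q")"
  done
}

# Only start when the 30 second clip exists.
if [ -f 30seconds.mov ]; then
  encode_all 30seconds.mov
fi
```

Keeping the file name in one function guarantees that the value passed to -q:v and the number in the file name always match.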

So now I have a 30 second clip for each quality setting

Notice that above 80% quality the resulting video is larger than the source. At 100% quality it went from 317 MB to 1.73 GB 😳. Obviously the quality can’t improve beyond the source video.
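To get such a size comparison without clicking through Finder, a small sketch like this prints each output size relative to the source. `size_bytes` and `percent_of` are hypothetical helper names; I use `wc -c` here on the assumption that it behaves the same on macOS and Linux, whereas the `stat` flags differ between the two.

```shell
#!/bin/bash
# File size in bytes (tr removes the padding that macOS wc adds).
size_bytes() {
  wc -c < "$1" | tr -d ' '
}

# Integer percentage of $1 relative to $2.
percent_of() {
  echo $(( $1 * 100 / $2 ))
}

if [ -f 30seconds.mov ]; then
  src=$(size_bytes 30seconds.mov)
  for f in output-q*.mp4; do
    [ -f "$f" ] || continue   # the glob matched nothing
    out=$(size_bytes "$f")
    printf '%s\t%s bytes\t%s%% of source\n' "$f" "$out" "$(percent_of "$out" "$src")"
  done
fi
```

With the numbers from above, `percent_of 1730 317` gives 545, so the 100% quality file is roughly five and a half times the size of the source.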

Then I wrote a quick and dirty bash script:

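#!/bin/bash
# Extract one still frame per timestamp from every encoded clip.

# Timestamps (seconds into the clip) to grab a frame from
x_values=("03" "08" "19")
# Quality values; each one must have a matching output-q<N>.mp4
y_values=("1" "5" "10" "15" "20" "25" "30" "35" "40" "45" "50" "55" "60" "65" "70" "75" "80" "85" "100")

for x in "${x_values[@]}"; do
  for y in "${y_values[@]}"; do
    input_video="output-q${y}.mp4"
    output_image="sec${x}-q${y}-output.jpg"

    # -vframes 1 writes a single frame; -qscale:v 2 is a very high JPEG quality
    ffmpeg -i "$input_video" -ss "00:00:${x}" -vframes 1 -qscale:v 2 "$output_image"
  done
done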

For my own test I used uncompressed bitmaps, but those are 20 MB each. Since I want to share the results here, I used -qscale:v 2, a very high quality setting for JPEGs that is sufficient for checking the results.

So this script extracts a frame at seconds 3, 8 and 19 from each of the videos with different quality settings.

[…]

With the preview in Finder (key: space) you can quickly switch between the pictures with arrow up / arrow down. This way I got a really good impression of the point where I don’t notice a difference anymore.

For my own videos that was 45% most of the time. The changes between 35% and 45% were already minor. In some frames I could see a difference from 45 to 50, and if I really paid attention and looked real close I saw minor changes between 50 and 55. But nothing I’d notice from a normal distance.

As you can see here I’m zoomed in, and there is a big difference between quality 5 and 50.

(Same image as in the beginning)

Conclusion

Avoid the default quality (meaning: always set -q:v yourself). I don’t know what it is exactly, but judging from the file size the default quality must be between 15% and 20%.

From my private videos I never had a case where I noticed a change after 50%. This is subjective of course.

If you check the quality after encoding: Search for parts with a lot of motion. This is where I first saw differences in lower quality settings.

So I checked the file differences. For the whole short film it looks like this:

Although I didn’t notice a difference beyond 50%, I have decided to use 55% just to calm the nerves if I one day decide to delete the old H.264 footage.

I’ll leave my test files + output online for some time, if you want to compare for yourself

This is the link to the archive with the extracted frames, so you can quickly switch between frames with different qualities locally:

https://f.27p.de/b/00016-ffmpeg/frames.zip

The ffmpeg command

This is what I’ll use to convert my videos:

ffmpeg -i input.mov -c:v hevc_videotoolbox -q:v 55 -tag:v hvc1 -c:a aac -b:a 128k output.mp4

-i input.mov
- input file
-c:v hevc_videotoolbox
- hardware encoding
-q:v 55
- Quality 55
-tag:v hvc1
- Apple being Apple, they need a special tag to open it in QuickTime etc.
-c:a aac
- optional: use AAC audio
-b:a 128k
- optional: specify the audio bitrate (this is just because I want to have the same bitrate for every video)
output.mp4
- output file
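To run this over a whole folder of old exports, a loop along these lines could be used. This is only a sketch: `target_name` and `convert_all` are hypothetical helpers, and I assume the sources are .mov files in the current directory and that existing results should never be overwritten.

```shell
#!/bin/bash
# Derive the output name by swapping the extension for .mp4.
target_name() {
  printf '%s.mp4' "${1%.*}"
}

# Convert every file given as argument with the settings from above.
convert_all() {
  local src dst
  for src in "$@"; do
    dst=$(target_name "$src")
    # Skip files that already have a converted counterpart.
    if [ -e "$dst" ]; then
      echo "skipping $src: $dst already exists" >&2
      continue
    fi
    ffmpeg -i "$src" -c:v hevc_videotoolbox -q:v 55 -tag:v hvc1 -c:a aac -b:a 128k "$dst"
  done
}

# nullglob: an unmatched *.mov expands to nothing instead of the literal pattern.
shopt -s nullglob
files=(*.mov)
if [ "${#files[@]}" -gt 0 ]; then
  convert_all "${files[@]}"
fi
```

I would keep the original files until the converted ones have been checked, since the loop itself does not verify the result.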

Time

The reason why I’m using a hardware encoder is that it’s much faster. Here is a comparison for encoding the 30 second source video:

CPU

ffmpeg -i 30seconds.mov -c:v libx265 -crf 22 -tag:v hvc1 -c:a aac -b:a 128k time-output-crf22-software.mp4

[...]

encoded 723 frames in 286.54s (2.52 fps), 7813.58 kb/s, Avg QP:25.60

videotoolbox

ffmpeg -i 30seconds.mov -c:v hevc_videotoolbox -q:v 55 -tag:v hvc1 -c:a aac -b:a 128k time-output-q55.mp4

[...]

frame=  723 fps= 64 q=-0.0 Lsize=   31756kB time=00:00:30.08 bitrate=8647.5kbits/s dup=3 drop=0 speed=2.66x

(hevc_videotoolbox doesn't show the elapsed time; it took 11-12 seconds.)

30 second footage: 287 s vs 12 s on an M2 Pro (10-core CPU/16-core GPU) 🧐
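Since ffmpeg doesn't print the elapsed time for hevc_videotoolbox, the run can be timed from the shell instead. The sketch below uses bash's built-in SECONDS counter; `timed` and `speedup` are hypothetical helper names, and the 12 seconds are my rounded measurement from above.

```shell
#!/bin/bash
# Ratio of two wall-clock times, rounded to one decimal place.
speedup() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

# Run any command and report its wall-clock time in whole seconds on stderr.
timed() {
  local start=$SECONDS rc
  "$@"
  rc=$?
  echo "elapsed: $(( SECONDS - start )) s" >&2
  return "$rc"
}

speedup 286.54 12   # prints 23.9

# Time the hardware encode when the test clip is present.
if [ -f 30seconds.mov ]; then
  timed ffmpeg -i 30seconds.mov -c:v hevc_videotoolbox -q:v 55 -tag:v hvc1 -c:a aac -b:a 128k time-output-q55.mp4
fi
```

So for this clip the hardware encoder was roughly 24 times faster than libx265 at CRF 22.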

I hope this information will assist someone in deciding which quality setting is the most suitable. If you have some more insights, feel free to contact me via one of the socials below, so I can update this post.
