We’re sharing how we’re enabling production and delivery of AV1 for Facebook Reels and Instagram Reels.
We believe AV1 is the most viable codec for Meta for the coming years. It offers higher quality at a much lower bit rate compared with previous generations of video codecs.
Meta has worked closely with the open source community to optimize AV1 software encoder and decoder implementations for real-world, global-scale deployment.
As people create, share, and consume an ever-increasing volume of online videos, Meta is working to develop the most bandwidth-efficient ways to transcode content while maintaining reasonable compute and power consumption levels. Choosing the most appropriate video coding formats — the algorithms for compressing and decompressing the file — is crucial. Over the past two decades, researchers have developed video coding standards with ever-higher compression efficiency, including AVC, HEVC, and VVC, developed by MPEG/JVET, and VP9 and AV1, developed by Google and the Alliance for Open Media (AOM). Each newer-generation standard can typically reduce bandwidth by about 30 percent to 50 percent compared with its predecessor while maintaining similar visual quality. At the same time, however, each new standard has consumed substantially more energy and compute than the last, requiring encoders many times more complex.
We believe AV1 will be the most viable codec for Meta over the next several years. AV1 is the first-generation royalty-free video coding standard developed by AOM, of which Meta is a founding member. It delivers about 30 percent better coding efficiency than VP9 and HEVC — allowing people who use our apps to enjoy high-quality video at much lower bandwidth, and enabling us to maximize storage efficiency and reduce egress traffic, CDN prefetching/caching, and network congestion. AV1 also has a much richer feature set than other video coding standards and can support most of Meta’s typical production use cases. Both the encoder and decoder implementations are open source, with very active development and good support.
Over the past few years, Meta has worked closely with the open source community to optimize AV1 software encoder and decoder implementations for real-world, global-scale deployment. Our goal is to improve playback from what we currently offer with AVC and VP9. We want to ensure that as we roll out AV1, it delivers real value to the people who use our apps.
Finding the right AV1 encoders and decoders
Several open source and closed-source encoder implementations are ready for production, all almost as efficient as the AV1 reference encoder. In a paper, “Towards much better SVT-AV1 quality-cycles tradeoffs for VOD applications,” jointly published with Intel at last year’s SPIE conference, we benchmarked multiple open source encoders — including x264, x265, libvp9, libaom, SVT-AV1, and the VVC reference encoder (vvenc) — for a video on demand (VOD) use case. The graph below illustrates the trade-off between encoder quality (vertical axis) and complexity (horizontal axis). Every point on the graph corresponds to an encoder preset. The y-axis represents the average BD-rate relative to libaom cpu-used=0; lower values indicate better coding efficiency. The x-axis represents the encoding time in seconds on a logarithmic scale.
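The BD-rate metric used on the y-axis above can be sketched in a few lines. This is a simplified version of the standard Bjøntegaard calculation (a cubic fit of quality against log bitrate, averaged over the overlapping quality range), not the exact script used for the paper:

```python
import numpy as np

def bd_rate(ref_rates, ref_quals, test_rates, test_quals):
    """Bjontegaard delta-rate: average bitrate change (as a fraction) of
    the test encoder vs. the reference at equal quality. Negative means
    the test encoder needs fewer bits for the same quality."""
    # cubic fit of quality -> log10(bitrate) for each RD curve
    p_ref = np.polyfit(ref_quals, np.log10(ref_rates), 3)
    p_test = np.polyfit(test_quals, np.log10(test_rates), 3)
    # average each fit over the quality interval covered by both curves
    lo = max(min(ref_quals), min(test_quals))
    hi = min(max(ref_quals), max(test_quals))
    def avg(p):
        ip = np.polyint(p)
        return (np.polyval(ip, hi) - np.polyval(ip, lo)) / (hi - lo)
    return 10 ** (avg(p_test) - avg(p_ref)) - 1

# A test curve that spends 30 percent fewer bits at every quality point
ref = ([1000, 2000, 4000, 8000], [30, 34, 38, 42])
test = ([700, 1400, 2800, 5600], [30, 34, 38, 42])
print(bd_rate(*ref, *test))  # about -0.3, i.e., 30 percent bitrate savings
```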
A few highlights from this graph:
SVT-AV1, the productization encoder for the AV1 coding standard, maintains consistent performance across a wide range of complexity levels. With a total of 13 presets, SVT-AV1 covers a complexity range extending from the highest-quality AV1 presets to the higher-speed AVC presets, corresponding to a more than 1,000x change in complexity. This complexity range covers all open source software encoders used in production systems.
At any given point on the x-axis, SVT-AV1 can maximize coding efficiency compared with any other production encoder. For example, the M12 preset has similar complexity performance to the x264 veryfast preset, but M12 is about 30 percent more efficient.
At any given point on the y-axis, SVT-AV1 can maximize encoding speed compared with any other production encoder. For example, the M8 preset is about as efficient as libvp9 preset 0, but M8 is almost 10 times faster.
SVT-AV1 offers 13 presets, allowing a fine-grained trade-off between quality and speed. More importantly, SVT-AV1 now includes a “-fast-decode” option, which accelerates software decoding — with only a slight drop in efficiency — by automatically limiting or disabling the use of AV1 coding tools that are not software-decoder friendly. SVT-AV1 also provides thread management parameters to balance density and speed — critical for large-scale production — potentially enabling a one- or two-second delay for live video streaming. Many parameters can be adjusted to improve coding efficiency or to support certain production scenarios. Some AV1 coding tools aimed at deployment use cases, such as reference frame scaling, super resolution, film grain synthesis, and switch frames, are also supported in SVT-AV1.
Our biggest challenge will be client-side decoding of AV1. Many hardware vendors, including Intel and NVIDIA, have begun to support AV1 hardware decoding on PC. However, we are serving video primarily to mobile phones, most of which don’t include AV1 hardware decoders. For now, we must rely primarily on software decoders. Two major open source software decoders are compatible with multiple platforms: dav1d was developed by VideoLAN and the open source community and can serve as an app-level decoder, while Google’s libgav1 is integrated into the Android SDK.
After extensively benchmarking the decoders’ performance, focusing on facets such as resource requirements, crashes and responsiveness, and frame drops, we decided to integrate dav1d into the player for both iOS and Android platforms. We have been working closely with the open source community to optimize dav1d’s performance. In the last year, we also worked with Ittiam to conduct a benchmark test on Android phones. dav1d can support 720p30 real-time playback on most of the devices in our sample, achieving 1080p30 on certain mid-range and high-end models.
Some Android phones, such as the Google Pixel 6 Pro and Samsung Galaxy S21, already support hardware AV1 decoding. In the near future, we expect that a growing number of high-end Android models will support AV1 hardware decoding, with mid-tier devices following eventually.
Deploying AV1 encoding on Facebook Reels and Instagram Reels
Early in 2022, we deployed AV1 encoding for Facebook and Instagram Reels. When someone uploads a video, the platform generates multiple bit-rate encodings tailored to the video’s projected watch time. To prevent stalling caused by changes in bandwidth, clients can select the version that best fits their connection speed — a technique called adaptive bit rate (ABR) streaming. For videos with high projected watch time, we use advanced ABR encoding based on the convex hull dynamic optimizer algorithm. For each uploaded video, we produce multiple downscaled versions and encode each with multiple quantization parameters (QPs) and constant rate factors (CRFs). For example, for a 1080p video, we might create seven resolutions and five CRFs, for a total of 35 encodings. After encoding, the system upscales decoded videos to the original resolution and calculates the quality score.
In the graph of rate distortion (RD) curves below, the x-axis represents the encoding bit rate and the y-axis the quality score, expressed in FB-MOS units on a scale of 0 to 100.
From these 35 RD points, we calculate the convex hull, a curve that connects the RD points on the upper left boundary. (Theoretically, if we could use all possible encoding resolutions and CRFs to produce a much denser plot, any point on the convex hull would be the optimal encoding option for this video in terms of resolution and CRF value.) As illustrated above, we can then select the best encoding for delivery based on the target quality or bit rate.
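The hull itself is a standard upper convex hull over the (bitrate, quality) points. A minimal sketch of the computation (not the production dynamic optimizer, which also carries resolution and CRF metadata per point):

```python
def rd_convex_hull(points):
    """points: (bitrate, quality) pairs from the trial encodings.
    Returns the subset on the upper-left boundary: for any bitrate
    budget, the best achievable quality, with diminishing returns."""
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    hull = []
    for p in sorted(set(points)):  # sweep left to right by bitrate
        # a non-right turn means the middle point lies under the hull
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    # drop the tail where extra bits no longer buy quality
    while len(hull) >= 2 and hull[-1][1] <= hull[-2][1]:
        hull.pop()
    return hull

points = [(100, 50), (150, 55), (200, 70), (250, 60), (300, 75), (400, 40)]
print(rd_convex_hull(points))  # [(100, 50), (200, 70), (300, 75)]
```

Dominated points such as (250, 60), which cost more bits than (200, 70) for lower quality, fall off the hull automatically.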
We have simplified this complicated process. In previous studies, we found that we could use a high-speed preset for the first-pass encoding that produces the convex hull, and then take a second pass to encode the selected (resolution, CRF) points with a high-quality preset. Even though this approach requires additional encoding, it’s faster because the first pass can be done much more quickly. (Coding efficiency drops only slightly.) This approach works even if the first and second passes use different encoders. For example, we can use AVC or VP9 in the first pass and AV1 in the second. We can also leverage the hardware encoder in our internally designed ASICs to accelerate this process.
In the end, we decided on a two-stage hybrid hardware/software ABR encoding approach. Hardware AVC encoding is triggered at video upload time; for this stage, we store only the quality and bit rate information, not the encoded bitstreams. When the video’s projected watch time exceeds a threshold, second-stage encoding is triggered with a software AVC, VP9, or AV1 encoder, based on the (resolution, CRF) points selected on the convex hull.
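The second-stage trigger amounts to a small planning step. In this sketch, the watch-time threshold and job tuples are illustrative, not the production values:

```python
def plan_encodes(projected_watch_hours, hull_points,
                 threshold_hours=100.0,        # hypothetical trigger value
                 codecs=("avc", "vp9", "av1")):
    """Given the (resolution, CRF) points selected on the first-stage
    convex hull, return the second-stage software-encode jobs to run,
    or [] if the video has not yet earned a second stage."""
    if projected_watch_hours < threshold_hours:
        return []  # stay on the cheap first-stage hardware AVC encodes
    return [(codec, res, crf) for codec in codecs for res, crf in hull_points]

# Low projected watch time: no second stage yet
print(plan_encodes(5.0, [(1080, 30)]))  # []
# Popular video: encode the hull points with the chosen codec family
print(plan_encodes(500.0, [(1080, 30), (720, 35)], codecs=("av1",)))
# [('av1', 1080, 30), ('av1', 720, 35)]
```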
We can easily add AV1 as one of the second-stage encoders; it is already deployed for Facebook Reels. We have implemented a similar heuristic-based approach for Instagram Reels. For one example video shown in the graph above, three encoding families with AVC, VP9, and AV1 have been produced. Their RD curves closely follow the convex hull from the first-stage encoding. For this particular video, the best-quality AV1 encoding rivals the best encodings of the other two standards, but at a bit rate 65 percent lower than AVC’s and 48 percent lower than VP9’s. In addition, AV1 achieves the desired quality within a very narrow bit rate range, so we can further reduce compute and storage costs by producing fewer encodings during the second stage. As a result, people who use our products can enjoy high-quality video at much lower bandwidth.
AV1 decoder integration and testing
It was relatively easy to enable AV1 decoding and playback on iOS devices. After just a few rounds of tests, we started delivery. In integrating the dav1d decoder on iOS, we found that two to four threads would meet most of our production needs; any additional threads would waste memory and power without boosting performance.
dav1d has two modes: synchronous and asynchronous. In synchronous mode, dav1d decodes one frame at a time but enables low-latency decoding for each frame. In asynchronous mode, dav1d decodes multiple compressed frames in parallel, postponing rendering until all frames are decoded. In theory, asynchronous mode provides higher throughput and faster decoding. For now, we adopt synchronous mode on iOS since it fits the existing player stack, but we are looking into migrating to asynchronous mode in the future.
To support the decoding of 10-bit AV1-encoded HDR video, we built a single dav1d binary that supports both 8- and 10-bit decoding and ensures that color information is preserved in the transcoding process.
The Android platform presented bigger challenges. First, because people engage with our apps on a vast number of Android models, we had to run local and large-scale A/B tests on various devices to find the optimal decoder configurations. To help debug and triage problems from the AV1 decoder library, we added extensive logging that propagated back error messages from throughout the player stack. This critical step helped us quickly identify and resolve issues in the integration process.
Second, because we use app-level software decoders, the hardware VP9 decoder and the software AV1 decoder must work together when playing the same video stream, to correctly support mixed codec manifests and in-stream ABR lane switching. We needed to make sure both interacted correctly with the render engine.
We also needed to support devices with low performance and display resolution. (This was not a problem with iPhones.) Although AV1 can encode high-resolution videos at a much lower bit rate than VP9, bit rate reduction is smaller for low-resolution videos. That makes it difficult to show improvement in top-line delivery metrics for low-performance Android phones. We responded by using higher-quality encoding presets to boost coding efficiency in low-resolution ABR lanes.
Another challenge was that memory allocation and thread creation increased the decoding latency of the first few video frames, prolonging the software decoder start time, delaying player startup, and causing in-play stalls. This was most challenging with Reels, because people typically scroll across multiple Reels videos in quick succession. To improve scrolling performance, we prefetched multiple Reels videos earlier, before they were played.
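The scrolling fix amounts to keeping a small warm window of upcoming videos. A simplified sketch of the idea (the real player warms decoder instances and buffers, not just an id-to-resource map, and the window size here is illustrative):

```python
class PrefetchWindow:
    """Keep the current Reel plus the next `ahead` Reels prefetched so
    the software decoder never starts cold when the user scrolls."""

    def __init__(self, fetch, ahead=2):
        self.fetch = fetch  # callable that downloads/initializes one video
        self.ahead = ahead
        self.warm = {}      # video id -> prefetched resource

    def on_scroll(self, feed, position):
        wanted = feed[position:position + self.ahead + 1]
        for vid in wanted:
            if vid not in self.warm:
                self.warm[vid] = self.fetch(vid)
        # release anything that scrolled out of the window
        for vid in [v for v in self.warm if v not in wanted]:
            del self.warm[vid]
        return sorted(self.warm)

feed = ["a", "b", "c", "d", "e"]
w = PrefetchWindow(fetch=lambda vid: f"bytes:{vid}")
print(w.on_scroll(feed, 0))  # ['a', 'b', 'c']
print(w.on_scroll(feed, 2))  # ['c', 'd', 'e']
```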
Before we conduct a large-scale A/B delivery test, we have to check whether the end device is powerful enough for real-time decoding and playback of AV1 bitstreams. However, there is no easy way to classify Android phone performance. We cannot test every model that exists, as there are thousands of them. And characteristics such as core counts, chipset vendors, RAM size, and year and model are not sufficient indicators of capability. We eventually decided to run a small benchmarking test to measure performance and give each phone a performance score. This benchmarking test consisted of basic compute operations, including Gaussian blur, memory allocation, memory copy, and 3D rendering. With this approach, we could assign scores to any existing or upcoming mobile phones and group them based on those numbers. Our A/B tests then identified the models that could support 720p, 1080p, and 10-bit HDR playback.
After the initial Android rollout, we started to enable AV1 hardware decoding for the few Android phones that support it. We expect hardware decoding to improve AV1 performance, and we plan to perform large-scale tests when a larger number of capable phones become available.
Latest delivery status
We started AV1 delivery for Facebook Reels on iPhone in early 2022 and observed the benefits within the first week of the rollout.
The following graph shows the week-over-week average playback FB-MOS for all Facebook Reels videos played on iPhones. Playback FB-MOS improved by about 0.6 points after we deployed AV1.
This second graph shows the average bit rate for all Facebook Reels videos played on iPhones. AV1 reduced the average bit rate by 12 percent.
This last graph shows the watch time of different codecs for Facebook Reels on iPhone. AV1 watch time rose to about 70 percent during the first week of rollout.
We have continued to enable new features for iPhone, including 1080p30 8-bit AV1 delivery for iPhone 8 and beyond, 10-bit HDR delivery up to 1080p30 for models of iPhone X and beyond that support HDR display, and 1080p60 8-bit AV1 delivery for iPhone 11 and beyond. A high percentage of the Facebook Reels and Instagram Reels videos watched on iPhones are now encoded with AV1. We have also enabled 8-bit AV1 delivery to select midrange to high-end Android phones. The watch time percentage on Android for AV1 is relatively small but growing.
What’s next for AV1 at Meta?
AV1 delivers real value to the people who use our products. It offers higher quality at a much lower bit rate compared with previous generations of video codecs. For example, in the video below, there is an obvious difference in quality between AVC, VP9, and AV1 at roughly the same bit rate.
Going forward, we will continue to expand AV1 delivery for Android phones and enable hardware decoding in new devices that support it.
For low-end Android phones, it remains challenging to play back high-resolution AV1 bitstreams. To address this, we are currently experimenting with mixed codec manifest support. On the server side, the ABR delivery algorithm generates a mixed codec manifest that contains multiple video adaptation sets with bitstreams encoded using different codecs, such as VP9 and AV1. It also specifies which AV1 and VP9 lanes the device should choose from based on its performance score. For example, a low-end phone can play AV1 up to 540p and switch to VP9 for higher resolution lanes.
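The client-side lane filter implied by that example can be sketched as follows; the codec names, lane tuples, and height cap are illustrative, not the production manifest format:

```python
def select_lanes(lanes, av1_max_height):
    """lanes: (codec, height, kbps) adaptation-set entries from a
    mixed codec manifest. Keep the AV1 lanes the device can decode in
    real time and fall back to VP9 lanes above that height."""
    kept = []
    for codec, height, kbps in lanes:
        if codec == "av1" and height <= av1_max_height:
            kept.append((codec, height, kbps))
        elif codec == "vp9" and height > av1_max_height:
            kept.append((codec, height, kbps))
    return kept

manifest = [("av1", 360, 300), ("av1", 540, 600), ("av1", 720, 900),
            ("vp9", 540, 800), ("vp9", 720, 1200), ("vp9", 1080, 2400)]
# A low-end phone: AV1 up to 540p, VP9 for the higher-resolution lanes
print(select_lanes(manifest, av1_max_height=540))
# [('av1', 360, 300), ('av1', 540, 600), ('vp9', 720, 1200), ('vp9', 1080, 2400)]
```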
With more and more hardware vendors implementing AV1 decoders in mobile SoCs, we expect the number of AV1-capable devices to continue to grow in the next few years, allowing more end users to enjoy the benefits of AV1.
This work is a collective effort by the Video Infra team and Instagram team at Meta, along with external partners, including the Intel SVT team, VideoLAN, Ittiam, Two Orioles, and the open source community. The authors would like to thank Jamie Chen, Syed Emran, Xinyu Jin, Ioannis Katsavounidis, Denise Noyes, Mohanish Penta, Nam Pham, Srinath Reddy, Shankar Regunathan, David Ronca, Zafar Shahid, Nidhi Singh, Yassir Solomah, Cosmin Stejerean, Wai Lun Tam, Hassene Tmar, and Haixiong Wang for their contributions and support.