{"id":880,"date":"2024-06-13T14:45:58","date_gmt":"2024-06-13T14:45:58","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2024\/06\/13\/mlow-metas-low-bitrate-audio-codec\/"},"modified":"2024-06-13T14:45:58","modified_gmt":"2024-06-13T14:45:58","slug":"mlow-metas-low-bitrate-audio-codec","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2024\/06\/13\/mlow-metas-low-bitrate-audio-codec\/","title":{"rendered":"MLow: Meta\u2019s low bitrate audio codec"},"content":{"rendered":"<p><span>At Meta, we support real-time communication (RTC) for billions of people through our apps, including WhatsApp, Instagram, and Messenger.\u00a0<\/span><br \/>\n<span>We are working to make RTC accessible by providing a high-quality experience for everyone \u2013 even those who might not have the fastest connections or the latest phones.<\/span><br \/>\n<span>As more and more people have relied on our products to make calls over the years, we\u2019ve been working on new ways to ensure all calls have solid audio quality.<\/span><br \/>\n<span>We\u2019ve built the Meta Low Bitrate (MLow) codec: a new tool that improves audio quality, especially for those on slow connections.<\/span><\/p>\n<p>Figure 1: Increasing complexity or bitrate usually improves quality, but good codecs achieve higher quality while balancing the other two.<\/p>\n<p><span>RTC products use many building blocks to deliver the full experience, and audio\/video codecs are among the most critical components. These codecs compress the captured audio\/video data so it can be sent across the internet efficiently to the recipient, keeping the experience real time. For example, the raw audio captured for a typical call has a bitrate of 768 kbps (mono, sampled at 48 kHz with a bit depth of 16: 48,000 \u00d7 16 = 768,000 bits per second), which modern codecs are able to compress down to 25-30 kbps. 
Often this compression comes at the cost of some quality (loss of information), but good codecs can strike a balance among the trio of quality, bitrate, and complexity by exploiting deep knowledge about the nature of the audio signal as well as by using psychoacoustics.\u00a0<\/span><\/p>\n<p><span>Building a good codec is quite challenging, and that is why we don\u2019t see new codecs emerging very often. The last widely known, good open-source codec was Opus, released in 2012, which has become the codec of choice for a wide variety of applications on the internet. Meta has used Opus for all its RTC needs, and so far it has served us well \u2013 helping to deliver quality calls to billions of users across the globe.\u00a0<\/span><\/p>\n<h2><span>Our motivation for building a new codec<\/span><\/h2>\n<p><span>Given the massive scale of RTC usage in Meta products, we get to see how a codec performs in a range of network scenarios and how it impacts the end user\u2019s experience. In particular, we\u2019ve observed that a significant chunk of calls suffer from poor network connections for all or part of the call. Typically a bandwidth estimation module (BWE) detects the quality of the network, and as the network quality degrades, we need to lower the codec operating bitrate to avoid congesting the network and keep the audio flowing, which shifts the balance among quality, bitrate, and complexity described above. Complicating matters, conducting a video call despite poor network quality leaves little room for audio and pushes the audio bitrate further down. The lowest operating point for Opus is 6 kbps, at which it runs in NarrowBand mode (0\u20134 kHz) and does not adequately capture all the sound frequencies produced by human voices, so the result doesn\u2019t sound as clear or natural. 
Here is an example of how Opus sounds at 6 kbps and the corresponding reference file for comparison.<\/span><\/p>\n<p><span>Raw reference signal:\u00a0<\/span><\/p>\n\n<p><span>Opus @ 6 kbps NarrowBand (NB):\u00a0<\/span><\/p>\n\n<p><span>Over the last two years, we have seen the development of several new machine learning (ML)-based audio codecs that provide good quality audio at very low bitrates. In October of 2022, Meta released <\/span><a href=\"https:\/\/ai.meta.com\/blog\/ai-powered-audio-compression-technique\/\" target=\"_blank\" rel=\"noopener\"><span>EnCodec<\/span><\/a><span>, which achieves amazingly crisp audio quality at very low bitrates. While these AI\/ML-based codecs are able to achieve great quality at low bitrates, it often comes at the expense of heavy computational cost. Consequently, only the very high-end (expensive) mobile handsets are able to run these codecs reliably, while users running on lower-end devices continue to experience audio quality issues in low-bitrate conditions. So the net impact of these newer computationally expensive codecs is actually limited to a small portion of users.<\/span><\/p>\n<p><span>A significant number of our users still use low-end devices. For example, more than 20 percent of our calls are made on ARMv7 devices, and tens of millions of daily calls on WhatsApp are made on devices that are more than 10 years old. Given the readily available codec choices and our commitment to ensure that all users \u2013 regardless of what device they\u2019re on \u2013 have a quality calling experience, we clearly need a codec with very low compute requirements that still delivers high-quality audio at these lowest bitrates.<\/span><\/p>\n<h2><span>The MLow codec<\/span><\/h2>\n<p><span>We broke ground with our development of a new codec in late 2021. 
After nearly two years of active development and testing, we are proud to announce the Meta Low Bitrate audio codec, aka MLow, which achieves roughly twice the quality of Opus at 6 kbps WB (POLQA MOS of 3.9 for MLow vs. 1.89 for Opus). Even more importantly, we achieve this quality while keeping MLow\u2019s computational complexity 10 percent lower than that of Opus.\u00a0<\/span><\/p>\n<p><span>Figure 2 below shows a MOS (Mean Opinion Score) plot on a 1-5 scale and compares the POLQA scores of Opus and MLow at various bitrates. As the chart makes evident, MLow has a huge advantage over Opus at the lowest bitrates, where its quality saturates faster than that of Opus.<\/span><\/p>\n<p>Figure 2: POLQA score comparing Opus (WB) versus MLow at various bitrates across a large dataset of files.<\/p>\n<p><span>We have already fully launched MLow to all Instagram and Messenger calls and are actively rolling it out on WhatsApp, and we\u2019ve already seen incredible improvement in user engagement driven by better audio quality.<\/span><\/p>\n<p><span>Here are some audio samples for you to listen to. We suggest that you use your favorite pair of headphones to appreciate the striking audio-quality differences.<\/span><\/p>\n<p><span>Opus 6 kbps NB<\/span><br \/>\n<span>MLow 6 kbps WB<\/span><br \/>\n<span>Reference<\/span><\/p>\n<p><span>Being able to encode high-quality audio at lower bitrates also unlocks more effective Forward Error Correction (FEC) strategies. 
Compared with Opus, with MLow we can afford to pack FEC at much lower bitrates, which significantly helps to improve the audio quality in packet loss scenarios.\u00a0<\/span><\/p>\n<p><span>Here are two audio samples at 14 kbps with heavy 30 percent receiver-side packet loss.<\/span><\/p>\n<p>Opus:<\/p>\n<p>MLow:<\/p>\n<p><span>Note that at these bitrates, Opus is not able to encode any inband FEC. It needs a minimum of 19 kbps to encode any inband FEC at 10 percent packet loss, so at 14 kbps it carries no redundancy, which hurts the audio recovery.<\/span><\/p>\n<h2><span>MLow internals<\/span><\/h2>\n<p><span>MLow builds on the concepts of a classic CELP (Code Excited Linear Prediction) codec with advancements around excitation generation, parameter quantization, and coding schemes. Figure 3 is a high-level visual of how the codec works internally. On the left we have an input signal (raw PCM audio) feeding into the encoder, which then splits the signal into two bands, one low-frequency and one high-frequency. Then, each band is encoded separately while making use of shared information to achieve better compression. All the output is passed through a range encoder to further compress it and generate an encoded payload. The decoder reverses these steps to reconstruct the output audio signal from the payload.<\/span><\/p>\n<p>Figure 3: High level MLow encoder and decoder architecture.<\/p>\n<p><span>With these split-band optimizations, we are able to encode the high band using very few bits, which lets MLow deliver SuperWideBand (32 kHz sampling) using a much lower bitrate.<\/span><\/p>\n<h2><span>What\u2019s next?<\/span><\/h2>\n<p><span>MLow has greatly enhanced audio quality on low-end devices while still ensuring calls are end-to-end encrypted. We are really excited about what we have accomplished in just the last two years\u2014from developing a new codec to successfully shipping it to billions of users around the globe. 
We\u2019re continuing to work on improving audio recovery on networks with heavy packet loss by sending more redundant audio, which MLow allows us to do efficiently. We\u2019re excited to share more as we continue working to make it easier for all our users to make quality audio calls.\u00a0<\/span><\/p>\n<p>The post <a href=\"https:\/\/engineering.fb.com\/2024\/06\/13\/web\/mlow-metas-low-bitrate-audio-codec\/\">MLow: Meta\u2019s low bitrate audio codec<\/a> appeared first on <a href=\"https:\/\/engineering.fb.com\/\">Engineering at Meta<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>At Meta, we support real-time communication (RTC) for billions of people through our apps, including WhatsApp, Instagram, and Messenger.\u00a0 We are working to make RTC accessible by providing a high-quality experience for everyone \u2013 even those who might not have the fastest connections or the latest phones. As more and more people have relied on&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2024\/06\/13\/mlow-metas-low-bitrate-audio-codec\/\">Continue reading <span class=\"screen-reader-text\">MLow: Meta\u2019s low bitrate audio codec<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-880","post","type-post","status-publish","format-standard","hentry","category-technology","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":841,"url":"https:\/\/fde.cat\/index.php\/2024\/03\/20\/better-video-for-mobile-rtc-with-av1-and-hd\/","url_meta":{"origin":880,"position":0},"title":"Better video for mobile RTC with AV1 and HD","date":"March 20, 2024","format":false,"excerpt":"At Meta, we support real-time communication (RTC) for billions of people through our apps, including Messenger, Instagram, and WhatsApp. 
We\u2019ve seen significant benefits by adopting the AV1 codec for RTC. Here\u2019s how we are improving the RTC video quality for our apps with tools like the AV1 codec, the challenges\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":842,"url":"https:\/\/fde.cat\/index.php\/2024\/03\/20\/optimizing-rtc-bandwidth-estimation-with-machine-learning\/","url_meta":{"origin":880,"position":1},"title":"Optimizing RTC bandwidth estimation with machine learning","date":"March 20, 2024","format":false,"excerpt":"Bandwidth estimation (BWE) and congestion control play an important role in delivering high-quality real-time communication (RTC) across Meta\u2019s family of apps. We\u2019ve adopted a machine learning (ML)-based approach that allows us to solve networking problems holistically across cross-layers such as BWE, network resiliency, and transport. We\u2019re sharing our experiment results\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":699,"url":"https:\/\/fde.cat\/index.php\/2023\/04\/11\/why-xhe-aac-is-being-embraced-at-meta\/","url_meta":{"origin":880,"position":2},"title":"Why xHE-AAC is being embraced at Meta","date":"April 11, 2023","format":false,"excerpt":"We\u2019re sharing how Meta delivers high-quality audio at scale with the xHE-AAC audio codec. 
xHE-AAC has already been deployed on Facebook and Instagram to provide enhanced audio for features like Reels and Stories.\u00a0 At Meta, we serve every media use case imaginable for billions of people across the world \u2014\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":597,"url":"https:\/\/fde.cat\/index.php\/2022\/06\/09\/under-the-hood-metas-cloud-gaming-infrastructure\/","url_meta":{"origin":880,"position":3},"title":"Under the hood: Meta\u2019s cloud gaming infrastructure","date":"June 9, 2022","format":false,"excerpt":"The promise of cloud gaming is a promise to democratize gaming. Anyone who loves games should be able to enjoy them and share the experience with their friends, no matter where they\u2019re located, and even if they don\u2019t have the latest, most expensive gaming hardware. Facebook launched its cloud gaming\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":682,"url":"https:\/\/fde.cat\/index.php\/2023\/02\/21\/how-meta-brought-av1-to-reels\/","url_meta":{"origin":880,"position":4},"title":"How Meta brought AV1 to Reels","date":"February 21, 2023","format":false,"excerpt":"We\u2019re sharing how we\u2019re enabling production and delivery of AV1 for Facebook Reels and Instagram Reels. We believe AV1 is the most viable codec for Meta for the coming years. It offers higher quality at a much lower bit rate compared with previous generations of video codecs. Meta has worked\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":295,"url":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/how-facebook-encodes-your-videos\/","url_meta":{"origin":880,"position":5},"title":"How Facebook encodes your videos","date":"August 31, 2021","format":false,"excerpt":"People upload hundreds of millions of videos to Facebook every day. 
Making sure every video is delivered at the best quality \u2014 with the highest resolution and as little buffering as possible \u2014 means optimizing not only when and how our video codecs compress and decompress videos for viewing, but\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/880","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=880"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/880\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=880"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=880"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=880"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}