{"id":496,"date":"2021-10-26T19:00:42","date_gmt":"2021-10-26T19:00:42","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2021\/10\/26\/kangaroo-a-new-flash-cache-optimized-for-tiny-objects\/"},"modified":"2021-10-26T19:00:42","modified_gmt":"2021-10-26T19:00:42","slug":"kangaroo-a-new-flash-cache-optimized-for-tiny-objects","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2021\/10\/26\/kangaroo-a-new-flash-cache-optimized-for-tiny-objects\/","title":{"rendered":"Kangaroo: A new flash cache optimized for tiny objects"},"content":{"rendered":"<h2><span>What the research is:\u00a0<\/span><\/h2>\n<div class=\"jn8vp64t l9j0dhe7 hpfvmrgz\">\n<div class=\"lzcic4wl\">\n<div class=\"ni8dbmo4 stjgntxs g5ia77u1 ii04i59q j83agx80 cbu4d94t ll8tlv6m\">\n<div class=\"l60d2q6s d1544ag0 sj5x9vvc tw6a2znq l9j0dhe7 ni8dbmo4 stjgntxs qlfml3jp inkptoze e72ty7fz qmr60zad jm1wdb64 qv66sw1b ljqsnud1 g6srhlxm odn2s2vf\">\n<div class=\"oo9gr5id\">\n<p><span>Kangaroo is a new flash cache that enables more efficient caching of tiny objects (objects that are ~100 bytes or less) and overcomes the challenges presented by existing flash cache designs. 
Since Kangaroo is implemented within <a href=\"https:\/\/engineering.fb.com\/2021\/09\/02\/open-source\/cachelib\/\">CacheLib<\/a>, Facebook\u2019s open source caching engine, developers can use Kangaroo through CacheLib\u2019s API to build their own customized cache services.<\/span><\/p>\n<p>Our work on Kangaroo was honored with a Best Paper award at the 2021 <span><a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3477132.3483568\">Symposium on Operating Systems Principles<\/a><\/span> (SOSP).<\/p>\n<p><span>CacheLib\u2019s API allows developers to build and customize concurrent caches. Optimized for tiny objects, Kangaroo minimizes dynamic random-access memory (DRAM) usage and the number of flash writes \u2014 and introduces a new cache eviction policy that reduces cache misses with minimal DRAM overhead, further reducing load on back-end storage systems.<\/span><\/p>\n<p><span>Kangaroo\u2019s CacheLib implementation also allowed our researchers to quickly evaluate the impact of Kangaroo\u2019s design on real-world production workloads. We evaluated Kangaroo using traces from production social graph caching workloads at Facebook and Twitter. Our experiments show that Kangaroo reduces misses by 29 percent under realistic system constraints (DRAM and write-rate constraints) compared with prior, state-of-the-art flash cache designs. These results were corroborated by a test deployment of Kangaroo in a shadow production setup.<\/span><\/p>\n<p><span>Figure: Miss ratio for all three systems over a seven-day Facebook trace. Kangaroo reduces cache misses by 29 percent versus set-associative (SA) caches and by 56 percent versus log-structured (LS) caches.<\/span>\n<\/p><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2><span>How it works:<\/span><\/h2>\n<p><span>Current flash caches fall into two main categories: set-associative caches and log-structured caches. 
Set-associative caches write many more bytes than necessary because they have to write a 4 KB flash page \u2014 the minimum write granularity \u2014 for every object. With tiny objects that are roughly 100 bytes or less in size, this leads to significant write amplification (i.e., set-associative caches write ~40x more bytes than strictly necessary).<\/span><\/p>\n<p><span>Log-structured caches have a large DRAM overhead because they have to keep an index entry for every on-flash object. Thus, neither set-associative nor log-structured flash caches are well suited to caching tiny objects on flash.<\/span><\/p>\n<p><span>Kangaroo combines log-structured and set-associative caches to reduce both DRAM and flash-write overheads. Kangaroo has two main parts: KLog, a small log-structured flash cache, and KSet, a large set-associative flash cache.<\/span><\/p>\n<p><span>Lookups first check the DRAM cache (1). If the object is not in the DRAM cache, requests check KLog\u2019s index (2a) and then read the object from flash if the object was found in the index (2b). Requests finally check the per-set Bloom filter in KSet (3a) and read the set if the object could be in KSet (3b).<\/span><br \/>\n<span>For inserts, Kangaroo first inserts objects into the DRAM cache (1). Objects evicted from the DRAM cache are either dropped by a pre-flash admission policy (2a) or added to KLog\u2019s DRAM index (2b) and appended to KLog\u2019s flash log (2c). Objects evicted from KLog are either dropped by another admission policy (3a) or inserted into KSet (3b).<\/span><\/p>\n<p><span>One of Kangaroo\u2019s innovations is in how it moves objects from KLog to KSet. Kangaroo uses KLog to find multiple objects mapping to the same set in KSet. 
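<\/span><\/p>
<p><span>The lookup flow above, and the ~40x write-amplification figure, can be made concrete with a short sketch in plain Python. This is a minimal illustration only: every name in it is invented, a Python set stands in for each per-set Bloom filter, in-memory dictionaries stand in for flash, and none of this is CacheLib\u2019s actual API.<\/span><\/p>

```python
# Hypothetical sketch of Kangaroo's lookup path and write-amplification
# arithmetic. All names are invented for illustration; this is not
# CacheLib's real API, and a Python set stands in for each per-set Bloom
# filter (a real Bloom filter is probabilistic and far more compact).

NUM_SETS = 8                                    # tiny toy KSet

dram_cache = {}                                 # (1) DRAM cache
klog_index = {}                                 # (2a) DRAM index over KLog (~5 percent of flash)
kset_blooms = [set() for _ in range(NUM_SETS)]  # (3a) per-set membership filters
kset_sets = [dict() for _ in range(NUM_SETS)]   # (3b) 4 KB sets on flash

def set_of(key):
    # map a key to its set in KSet
    return hash(key) % NUM_SETS

def lookup(key):
    if key in dram_cache:                       # (1) DRAM hit
        return dram_cache[key]
    if key in klog_index:                       # (2a) KLog index hit ...
        return klog_index[key]                  # (2b) ... costs one flash read of the log
    s = set_of(key)
    if key in kset_blooms[s]:                   # (3a) filter says object may be in KSet
        return kset_sets[s].get(key)            # (3b) one 4 KB set read from flash
    return None                                 # miss, with no flash read needed

def write_amp_per_object(n, obj_bytes=100, page_bytes=4096):
    # Admitting n objects to one set still writes a whole 4 KB page,
    # so the per-object write amplification is page_bytes / (n * obj_bytes).
    return page_bytes / (n * obj_bytes)
```

<p><span>With roughly 100-byte objects, write_amp_per_object(1) is about 41, the roughly 40x overhead of a plain set-associative cache, while admitting two same-set objects with one set write halves it. Amortizing each set write over several objects found in KLog is how Kangaroo cuts write amplification.<\/span><\/p>
<p><span>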
Whenever an object is evicted from KLog, Kangaroo proactively moves the other objects in KLog that map to the same set, so that a single 4 KB set write admits several objects at once and write amplification in KSet is minimized.<\/span><\/p>\n<p><span>Since writing a set always requires writing 4 KB, regardless of the number of objects inserted, writing two objects instead of just one cuts the write overhead per object in half. Thus, Kangaroo can amortize writes to KSet over multiple objects, decreasing the overall number of bytes written to flash. KLog accomplishes this goal with a small capacity (~5 percent of flash), so Kangaroo needs only a small amount of DRAM to index KLog\u2019s entire capacity. In this way, Kangaroo addresses both the DRAM and flash-write overheads of caching tiny objects on flash.<\/span><\/p>\n<p><span>On top of this basic design, Kangaroo introduces additional techniques to realize its hierarchical design and increase its effectiveness. In particular, since Kangaroo is a cache and not a key-value store, it can evict objects to minimize writes. Kangaroo exploits this opportunity by adding a threshold admission policy that evicts objects instead of admitting them to KSet if there are fewer than n objects to insert from KLog. This admission policy allows Kangaroo to guarantee that the write overhead for moving objects to KSet will be much lower than that of a set-associative cache. Kangaroo provides further optimizations to reduce DRAM overhead and reduce misses, as explained in our <a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3477132.3483568\">SOSP 2021 paper<\/a>.<\/span><\/p>\n<p><span>Together, these optimizations allow Kangaroo to overcome the limitations of log-structured caches and set-associative caches, creating a flash cache that delivers on the goal of efficient caching for tiny objects.<\/span><\/p>\n<h2><span>Why it matters:<\/span><\/h2>\n<p><span>Caching plays an important role in large-scale systems by allowing for quicker access to data and lowering the load on databases and other back-end systems. 
For large caches, it is far more efficient to use flash than to use DRAM. However, flash caches fall short when it comes to caching tiny objects. Flash caches require either too much DRAM, which loses the efficiency benefits of flash, or too many writes, which wear out the flash device too quickly. In either case, flash caching fails to live up to its potential as an efficient, large cache for tiny objects, ultimately resulting in smaller caches and more load on back-end systems.<\/span><\/p>\n<p><span>Many social media and IoT services have large numbers of tiny objects, each a few hundred bytes or less. For example, edges in Facebook\u2019s social graph, which are needed to connect friends, posts, and images, among other content, <\/span><a href=\"https:\/\/www.usenix.org\/conference\/osdi20\/presentation\/berg\"><span>average under 100 bytes<\/span><\/a><span>. Caching these objects efficiently on flash allows them to be quickly retrieved at scale and minimizes the load on back-end storage services. 
Kangaroo enables large-scale flash caching of tiny objects by having low DRAM and write overheads, leading to a lower miss ratio in large flash caches.<\/span><\/p>\n<h2><span>Read the full paper:<\/span><\/h2>\n<p><a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3477132.3483568\"><span>Kangaroo: Caching billions of tiny objects on flash<\/span><\/a><\/p>\n<h2><span>Get it on GitHub:<\/span><\/h2>\n<p><a href=\"https:\/\/github.com\/saramcallister\/Kangaroo\"><span>Kangaroo<\/span><\/a><span> and <\/span><a href=\"https:\/\/cachelib.org\/\"><span>CacheLib<\/span><\/a><\/p>\n<h2>Acknowledgements:<\/h2>\n<p><em>The authors would like to thank Sara McAllister for leading the project, and the members of the Parallel Data Lab at Carnegie Mellon University for their input in guiding the research and for their collaboration with Facebook.<\/em><\/p>\n<p>The post <a href=\"https:\/\/engineering.fb.com\/2021\/10\/26\/core-data\/kangaroo\/\">Kangaroo: A new flash cache optimized for tiny objects<\/a> appeared first on <a href=\"https:\/\/engineering.fb.com\/\">Facebook Engineering<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>What the research is:\u00a0 Kangaroo is a new flash cache that enables more efficient caching of tiny objects (objects that are ~100 bytes or less) and overcomes the challenges presented by existing flash cache designs. 
Since Kangaroo is implemented within CacheLib, Facebook\u2019s open source caching engine, developers can use Kangaroo through CacheLib\u2019s API to build&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2021\/10\/26\/kangaroo-a-new-flash-cache-optimized-for-tiny-objects\/\">Continue reading <span class=\"screen-reader-text\">Kangaroo: A new flash cache optimized for tiny objects<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-496","post","type-post","status-publish","format-standard","hentry","category-technology","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":458,"url":"https:\/\/fde.cat\/index.php\/2021\/09\/20\/cachelib-facebooks-open-source-caching-engine-for-web-scale-services\/","url_meta":{"origin":496,"position":0},"title":"CacheLib, Facebook\u2019s open source caching engine for web-scale services","date":"September 20, 2021","format":false,"excerpt":"Caching plays an important role in helping people access their information efficiently. For example, when an email app loads, it temporarily caches some messages, so the user can refresh the page without the app retrieving the same messages. However, large-scale caching has long been a complex engineering challenge. Companies must\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":299,"url":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/caching-with-the-salesforce-commerce-sdk\/","url_meta":{"origin":496,"position":1},"title":"Caching with the Salesforce Commerce SDK","date":"August 31, 2021","format":false,"excerpt":"Co-written by Brian Redmond. Every e-commerce application is going to need caching. 
For some of our customers, millions of shoppers may look at the same product information and, if you have to request that information from the service every time, your application will not scale. This is why we built the Commerce\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":606,"url":"https:\/\/fde.cat\/index.php\/2022\/07\/14\/owl-distributing-content-at-meta-scale\/","url_meta":{"origin":496,"position":2},"title":"Owl: Distributing content at Meta scale","date":"July 14, 2022","format":false,"excerpt":"Being able to distribute large, widely consumed objects (so-called hot content) efficiently to hosts is becoming increasingly important within Meta\u2019s private cloud. These are commonly distributed content types such as executables, code artifacts, AI models, and search indexes that help enable our software systems. Owl is a new system for\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":596,"url":"https:\/\/fde.cat\/index.php\/2022\/06\/08\/cache-made-consistent-metas-cache-invalidation-solution\/","url_meta":{"origin":496,"position":3},"title":"Cache made consistent: Meta\u2019s cache invalidation solution","date":"June 8, 2022","format":false,"excerpt":"Caches help reduce latency, scale read-heavy workloads, and save cost. They are literally everywhere. Caches run on your phone and in your browser. For example, CDNs and DNS are essentially geo-replicated caches. 
It\u2019s thanks to many caches working behind the scenes that you can read this blog post right now.\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":337,"url":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/coordinated-rate-limiting-in-microservices\/","url_meta":{"origin":496,"position":4},"title":"Coordinated Rate Limiting in Microservices","date":"August 31, 2021","format":false,"excerpt":"The Problem: Any multitenant service with public REST APIs needs to be able to protect itself from excessive usage by one or more tenants. Additionally, as the number of instances that support these services is dynamic and varies based on load, the need arises to perform coordinated rate limiting on a\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":479,"url":"https:\/\/fde.cat\/index.php\/2021\/09\/29\/switch-it-up\/","url_meta":{"origin":496,"position":5},"title":"Switch It Up!","date":"September 29, 2021","format":false,"excerpt":"It is vital that a microservice striving for high availability have at its disposal several choices on how to roll back changes when unforeseen production issues occur. 
A very interesting article comes to mind from O\u2019Reilly, Generic Mitigations, where the theme is to restore service functionality FIRST and FAST\u200a\u2014\u200aand THEN root\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/496","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=496"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/496\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=496"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=496"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=496"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}