CarbonJ: A high performance, high-scale, drop-in replacement for carbon-cache and carbon-relay

The Problem: In 2015, Salesforce Commerce Cloud (which was then called Demandware) was running a typical open source Grafana/Graphite/Carbon stack to store and visualize time series metrics of the Java application clusters powering our e-commerce business. Our JVM clusters at the time produced around 500k time series metrics per minute. Even though our organization needed us…

Categorized as Technology

Kangaroo: A new flash cache optimized for tiny objects

What the research is: Kangaroo is a new flash cache that enables more efficient caching of tiny objects (objects that are ~100 bytes or less) and overcomes the challenges presented by existing flash cache designs. Since Kangaroo is implemented within CacheLib, Facebook’s open source caching engine, developers can use Kangaroo through CacheLib’s API to build…

Categorized as Technology

SRE Weekly Issue #293

View on sreweekly.com. A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo: https://rootly.io/?utm_source=sreweekly

Articles: The Downside of Hospitals Becoming “Highly Reliable”. It’s one thing to…

Categorized as SRE

Connector framework: A generic approach to crawl activities in real-time

Authors: Jayanth Parayil Kumarji, Evan Jiang, Kevin Terusaki, Zhidong Ke, Heng Zhang, Jeff Lowe, Yifeng Liu, Priyadarshini Mitra

Introduction: Sales Cloud empowers our customers to make quick and well-informed decisions, with all of the tools they need to manage their selling process. The features we work on in the Activity Platform team, spanning Sales Cloud…

Categorized as Technology

Autonomous testing of services at scale

Enabling developers to prototype, test, and iterate on new features quickly is important to Facebook’s success. To do this effectively, it’s key to have a stable infrastructure that doesn’t introduce unnecessary friction. This gets significantly more challenging when the infrastructure in question must also scale to support more than 3 billion people around the world,…

Categorized as Technology

Facebook engineers receive 2021 IEEE Computer Society Cybersecurity Award for static analysis tools

Until recently, static analysis tools weren’t seen by our industry as a reliable element of securing code at scale. After nearly a decade of investing in refining these systems, I’m so proud to celebrate our engineering teams today for being awarded the IEEE Computer Society’s Cybersecurity Award for Practice for development and deployment of static…

Categorized as Technology

RTMP Go Away: Lossless reconnections for live streaming

What it is: Real Time Messaging Protocol (RTMP) is a popular media streaming protocol that uses Transmission Control Protocol (TCP) persistent connections. When a connection between a live-streaming client and the platform is interrupted, data from the live event is lost until the client can reconnect to a new server. RTMP Go Away is a…

Categorized as Technology

Github Actions Security Best Practices

Introduction: In the world of Continuous Integration and Continuous Deployment, Github Actions provide a nifty edge to quickly build end-to-end automation right into the repository. This makes integration of Actions into an organization’s Github repositories pretty straightforward and convenient. Github Actions bring velocity to the Software Development Lifecycle. However, if it is swiftly adopted without…

Categorized as Technology

SRE Weekly Issue #292

View on sreweekly.com. A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo: https://rootly.io/?utm_source=sreweekly

Articles: Four lessons every company should learn from the back-to-back Facebook outages…

Categorized as SRE

How to ETL at Petabyte-Scale with Trino

Trino (formerly known as PrestoSQL) is widely appreciated as a fast distributed SQL query engine, but there is precious little information online about using it for batch extract, transform, and load (ETL) ingestion (outside of the original Facebook paper), particularly at petabyte+ scale. After deciding to use Trino as a key piece of Salesforce’s Big…

Categorized as Technology