What it is: Real Time Messaging Protocol (RTMP) is a popular media streaming protocol that uses Transmission Control Protocol (TCP) persistent connections. When a connection between a live-streaming client and the platform is interrupted, data from the live event is lost until the client can reconnect to a new server. RTMP Go Away is a… Continue reading RTMP Go Away: Lossless reconnections for live streaming
Blog
Github Actions Security Best Practices
Introduction In the world of Continuous Integration and Continuous Deployment, Github Actions provide a nifty edge to quickly build end-to-end automation right into the repository. This makes integration of Actions into an organization’s Github repositories pretty straightforward and convenient. Github Actions bring velocity to the Software Development Lifecycle. However, if it is swiftly adopted without… Continue reading Github Actions Security Best Practices
SRE Weekly Issue #292
View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo: https://rootly.io/?utm_source=sreweekly Articles Four lessons every company should learn from the back-to-back Facebook outages… Continue reading SRE Weekly Issue #292
How to ETL at Petabyte-Scale with Trino
Trino (formerly known as PrestoSQL) is widely appreciated as a fast distributed SQL query engine, but there is precious little information online about using it for batch extract, transform, and load (ETL) ingestion (outside of the original Facebook paper), particularly at petabyte+ scale. After deciding to use Trino as a key piece of Salesforce’s Big… Continue reading How to ETL at Petabyte-Scale with Trino
SRE Weekly Issue #291
View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo: https://rootly.io/?utm_source=sreweekly Articles Understanding How Facebook Disappeared from the Internet Facebook’s outage caused significantly… Continue reading SRE Weekly Issue #291
Inside the Lab: Expanding connectivity by sea, land, and air
The post Inside the Lab: Expanding connectivity by sea, land, and air appeared first on Facebook Engineering.
Evolution of Region Assignment in the Apache HBase Architecture — Part 3
Authors: Viraj Jasani, Andrew Purtell, 张铎 (Duo Zhang) In the second part of this blog post series, we provided an overview of how the redesigned AssignmentManager in HBase 2 efficiently and reliably manages the process of region assignment. In this third entry in this blog post series, we… Continue reading Evolution of Region Assignment in the Apache HBase Architecture — Part 3
More details about the October 4 outage
Now that our platforms are up and running as usual after yesterday’s outage, I thought it would be worth sharing a little more detail on what happened and why — and most importantly, how we’re learning from it. This outage was triggered by the system that manages our global backbone network capacity. The backbone is… Continue reading More details about the October 4 outage
Lessons Learned using Spring Data Redis
Context Our Commerce Cloud team that is in charge of the Omnichannel Inventory service uses Redis as a remote cache to store data that lends itself for caching. The remote cache allows our multiple processes to get a synchronized and single view of the cached data. (See our previous blog post, Coordinated Rate Limiting in… Continue reading Lessons Learned using Spring Data Redis
Update about the October 4th outage
To all the people and businesses around the world who depend on us, we are sorry for the inconvenience caused by today’s outage across our platforms. We’ve been working as hard as we can to restore access, and our systems are now back up and running. The underlying cause of this outage also impacted many… Continue reading Update about the October 4th outage