SRE Netflix at SRECon

190 Countries and 5 CORE SREs, by Jonah Horowitz. How does Netflix scale SRE? How do we manage over 70 million customers around the world without a 24/7 operations center? With tens of thousands of Linux instances in a distributed system architecture and thousands of production changes every day, it’s an environment that’s both challenging and…

SRE Weekly Issue #302

Happy holidays, for those who celebrate! I put this issue together in advance, so there is no Outages section this week. A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating an incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up…

Categorized as SRE

Best Time to Send Emails

In email marketing today, the time at which an email is sent has a strong impact on user engagement. Sending at the optimal time can help drive more successful and effective campaigns. To send at the best time, you need a good understanding of your users’ email engagement patterns. Sending right before…
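The excerpt stops before describing a method, so the following is only a minimal sketch of one common approach, not the post's algorithm: count a user's past email opens by hour of day and pick the busiest hour. It assumes the timestamps are already in the user's local time zone; the function name `best_send_hour` and the 9:00 fallback are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable

# Hypothetical fallback for users with no open history: 9:00 local time.
DEFAULT_SEND_HOUR = 9


def best_send_hour(open_times: Iterable[datetime], default: int = DEFAULT_SEND_HOUR) -> int:
    """Return the hour of day (0-23) in which a user most often opened email.

    Assumes the timestamps are already in the user's local time zone.
    Ties go to the earliest hour so the result is deterministic.
    """
    counts = Counter(t.hour for t in open_times)
    if not counts:
        return default
    # Sort key: highest open count first, then the earlier hour on ties.
    return min(counts, key=lambda hour: (-counts[hour], hour))
```

For example, a user who opened mail at 8:15, 8:40 and 19:05 on different days gets hour 8, while a user with no history gets the fallback.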

Categorized as Technology

SRE Weekly Issue #301

A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating an incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo: https://rootly.com/demo/?utm_source=sreweekly Articles: BadgerDAO Exploit Technical Post Mortem. This one perhaps belongs in a…

Categorized as SRE

Power Loss Siren: Making Meta resilient to power loss events

There are thousands of distributed services running on millions of servers in Meta’s data centers. Part of ensuring the reliability of those services is making them resilient to power loss events as our data center fleet grows. To help increase resiliency, we built the Power Loss Siren (PLS), a rack-level, low-latency, distributed…

Categorized as Technology

Event Sourcing for an Inventory Availability Solution

Co-author: Balachandar Mariappan

An Introduction to Terminology

- ATF: Available to Fulfill inventory
- On-Hand: Physical amount of inventory available
- SKU: Stock Keeping Unit, which represents a distinct type of item for sale
- Location: Representation of a physical location, such as a store or warehouse, where SKUs are present
- Location Group: A logical aggregation of typically one or more Locations
- Reservation or Inventory Reservation: Reserving a quantity of a SKU. For example:…
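The excerpt defines the terms but is cut off before the design, so here is only a minimal event-sourcing sketch under stated assumptions: every inventory change is recorded as an immutable event, and ATF is derived by replaying the log. The event classes, the formula ATF = on-hand minus reserved (floored at zero), and treating a Location Group's ATF as the sum over its Locations are all hypothetical, not the post's schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple, Union


# Hypothetical event types; the post's real event schema is not in the excerpt.
@dataclass(frozen=True)
class OnHandAdjusted:
    sku: str
    location: str
    delta: int  # positive for receipts, negative for shrink or damage


@dataclass(frozen=True)
class Reserved:
    sku: str
    location: str
    quantity: int


@dataclass(frozen=True)
class ReservationReleased:
    sku: str
    location: str
    quantity: int


Event = Union[OnHandAdjusted, Reserved, ReservationReleased]
Key = Tuple[str, str]  # (sku, location)


def project_atf(events: Iterable[Event]) -> Dict[Key, int]:
    """Replay events in order and derive ATF per (SKU, Location).

    Assumes ATF = on-hand - reserved, floored at zero.
    """
    on_hand: Dict[Key, int] = defaultdict(int)
    reserved: Dict[Key, int] = defaultdict(int)
    for event in events:
        if not isinstance(event, (OnHandAdjusted, Reserved, ReservationReleased)):
            raise TypeError(f"unknown event: {event!r}")
        key = (event.sku, event.location)
        if isinstance(event, OnHandAdjusted):
            on_hand[key] += event.delta
        elif isinstance(event, Reserved):
            reserved[key] += event.quantity
        else:  # ReservationReleased
            reserved[key] -= event.quantity
    keys = set(on_hand) | set(reserved)
    return {k: max(on_hand[k] - reserved[k], 0) for k in keys}


def group_atf(atf: Dict[Key, int], sku: str, group: Iterable[str]) -> int:
    """ATF for a SKU across a Location Group, assumed to be the sum over its Locations."""
    return sum(atf.get((sku, loc), 0) for loc in group)
```

Because current availability is only ever a projection of the log, it can be rebuilt or audited by replaying the events; a production system would typically persist snapshots rather than replay the full history on every read.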

Categorized as Technology

Charting the future of our bug bounty program

We’re tackling the industry-wide issue of scraping by expanding our bug bounty program to reward valid reports of scraping bugs and unprotected data sets. To the best of our knowledge, this is an industry first. Looking toward the future, we’re also launching new educational opportunities for researchers and hosting our first BountyConEDU, a three-day…

Categorized as Technology

SRE Weekly Issue #300

300 issues. 6 years. Wow! I couldn’t have done it without all of you wonderful people writing articles and reading issues. Thanks, you make curating this newsletter fun! A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating an incident channel, Jira and…

Categorized as SRE

Using Redis HASH instead of SET to reduce cache size and operating costs

What if we told you there was a way to dramatically reduce the cost of operating on cloud providers? That’s what we found when we dug into the different data structures offered in Redis. Before we committed to one, we researched the difference in memory usage between using HASH and using…
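The excerpt is cut off before the results, so this is only a sketch of the widely used key-bucketing pattern, assuming "SET" here means storing each value under its own top-level key with the `SET` command. Consecutive integer IDs are grouped into one hash per bucket, which Redis can keep in its compact small-hash encoding as long as each hash stays under the configured `hash-max-ziplist-entries` (or, in newer versions, `hash-max-listpack-entries`) limit. The helper names and `BUCKET_SIZE = 100` are hypothetical.

```python
from typing import Any, Tuple

# Hypothetical bucket size; keep it below the server's compact-encoding limit
# (hash-max-ziplist-entries or hash-max-listpack-entries, depending on version).
BUCKET_SIZE = 100


def bucket_for(object_id: int, prefix: str = "obj") -> Tuple[str, str]:
    """Map one logical key onto a (hash name, field name) pair.

    Instead of one top-level key per object ("obj:123456"), consecutive IDs
    share a hash: 123456 -> ("obj:1234", "56") when BUCKET_SIZE is 100.
    """
    bucket, field = divmod(object_id, BUCKET_SIZE)
    return f"{prefix}:{bucket}", str(field)


def put(client: Any, object_id: int, value: str, prefix: str = "obj") -> None:
    """Store a value as one field of a shared hash.

    This replaces client.set(f"{prefix}:{object_id}", value).
    """
    name, field = bucket_for(object_id, prefix)
    client.hset(name, field, value)


def get(client: Any, object_id: int, prefix: str = "obj") -> Any:
    """Read the value back from its bucket hash (None if absent)."""
    name, field = bucket_for(object_id, prefix)
    return client.hget(name, field)
```

Any client exposing `hset(name, key, value)` and `hget(name, key)` works here, including a redis-py `redis.Redis()` instance. Note that redis-py returns `bytes` from `hget` unless the client was created with `decode_responses=True`.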

Categorized as Technology