5 Design Patterns for Building Observable Services

How can you make your services observable and embrace service ownership? This article presents a variety of universally applicable design patterns for the developer to consider. Design patterns in software development are repeatable solutions and best practices for solving commonly occurring problems. Even in the case of service monitoring, design patterns, when used appropriately, can…

Categorized as Technology

SRE Weekly Issue #304

View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly shirt): https://rootly.com/demo/?utm_source=sreweekly Articles Channel global decoupling for region…

Categorized as SRE

Managing Availability in Service Based Deployments with Continuous Testing

The Problem At Salesforce, trust is our number one value. What this equates to is that our customers need to trust us; trust us to safeguard their data, trust that we will keep our services up and running, and trust that we will be there for them when they need us. In the world of Software…

Categorized as Technology

SRE Weekly Issue #303

View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo: https://rootly.com/demo/?utm_source=sreweekly Articles Hot Takes on Code Freezes There are way too many gorgeous, mind-blowing…

Categorized as SRE

SRE Netflix at SRECon

190 Countries and 5 CORE SREs by Jonah Horowitz How does Netflix scale SRE? How do we manage over 70 million customers around the world without a 24/7 operations center? With tens of thousands of Linux instances in a distributed system architecture, and thousands of daily production changes, it’s an environment that’s both challenging and…

SRE Weekly Issue #302

View on sreweekly.com Happy holidays, for those that celebrate! I put this issue together in advance, so no Outages section this week. A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up…

Categorized as SRE

Best Time to Send Emails

Today in email marketing, the time that an email is sent has a high impact on user engagement. Sending at an optimal time can help drive more successful and effective campaigns. In order to send at the best time, you need to have a good understanding of your users’ email engagement patterns. Sending right before…

Categorized as Technology

SRE Weekly Issue #301

View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo: https://rootly.com/demo/?utm_source=sreweekly Articles BadgerDAO Exploit Technical Post Mortem This one perhaps belongs in a…

Categorized as SRE

Power Loss Siren: Making Meta resilient to power loss events

There are thousands of distributed services running on millions of servers in Meta’s data centers. Part of ensuring the reliability of those services means making them resilient to power loss events as our data center fleet grows. To help increase resiliency, we built the Power Loss Siren (PLS) — a rack level, low latency, distributed…

Categorized as Technology

Event Sourcing for an Inventory Availability Solution

Co-author: Balachandar Mariappan
An Introduction to Terminology
ATF: Available to Fulfill inventory
On-Hand: Physical amount of inventory available
SKU: Stock Keeping Unit, which represents a distinct type of item for sale
Location: Representation of a physical location, like a store or warehouse, where SKUs are present
Location Group: A logical aggregation of typically one or more Locations
Reservation or Inventory Reservation: Reserving a quantity of a SKU. For example:…

Categorized as Technology