View on sreweekly.com A message from our sponsor, StackHawk: How do you know if your GraphQL API is secure? Watch StackHawk CSO Scott Gerlach walk through how to run application security tests for GraphQL-backed apps. http://sthwk.com/graphql-webinar Articles May 30 SSL incident TLS can be such a headache. This was an interesting situation. There was a… Continue reading SRE Weekly Issue #253
Author: fde
SRE Weekly Issue #252
View on sreweekly.com A message from our sponsor, StackHawk: Interested in how you can automate application security testing with GitHub Actions? Check out this on demand webinar from StackHawk and Snyk and see how simple it is to get started. https://sthwk.com/stackhawk-snyk Articles Building On-Call Culture at GitHub Their on-call started out as four 24 hour… Continue reading SRE Weekly Issue #252
Analytical Model for Capacity and Degradation in Distributed Systems
Activity Platform (AP) is part of our Einstein Activity Capture (EAC) and Engagement Activity Platform (EAP) ecosystem. AP captures, on behalf of users, the activity and engagement data generated by interactions between users and their leads and contacts. Once captured, those records are scoped, augmented with artificial intelligence models, stored, indexed, used to… Continue reading Analytical Model for Capacity and Degradation in Distributed Systems
SRE Weekly Issue #256
View on sreweekly.com A message from our sponsor, StackHawk: Register now for the first-ever ZAPCon taking place March 9th. The free event will focus on OWASP ZAP and application security best practices. You won’t want to miss it! http://sthwk.com/zapcon-sre-weekly Articles Slack’s Outage on January 4th 2021 Here’s a blog post from Slack giving even more… Continue reading SRE Weekly Issue #256
The Origin of MLMon
by Means of Natural Selection, or the Preservation of Favoured Microservices in the Struggle for Life So this is going to be an experimental format for a blog post. I’m going to describe a problem and solution, then the problems that came up after, then solutions to them, and new problems, etc. I am not claiming… Continue reading The Origin of MLMon
Minesweeper automates root cause analysis as a first-line defense against bugs
Root cause analysis (RCA) is an important part of fixing any bug. After all, you can’t solve a problem without getting to the heart of it. But RCA isn’t always simple, especially at a scale like Facebook’s. When billions of people are using an app on a variety of platforms and devices, a single bug… Continue reading Minesweeper automates root cause analysis as a first-line defense against bugs
Zero Downtime Node Patching in a Kubernetes Cluster
Authors: Vaishnavi Galgali, Arpeet Kale, Robert Xue. Introduction: The Salesforce Einstein Vision and Language services are deployed in an AWS Elastic Kubernetes Service (EKS) cluster. One of the primary security and compliance requirements is operating system patching. The cluster nodes that the services are deployed on need to have regular operating system updates. Operating system patching… Continue reading Zero Downtime Node Patching in a Kubernetes Cluster
SRE Weekly Issue #257
View on sreweekly.com A message from our sponsor, StackHawk: Keeping your APIs secure requires thoughtful design and testing. Learn how to protect your REST, SOAP and GraphQL APIs from security vulnerabilities with StackHawk http://sthwk.com/api-protection Articles Sometimes alerts have inobvious reasons for existing This one really got me thinking. Make sure you document why an alert… Continue reading SRE Weekly Issue #257
Native Scrolling in Salesforce mobile app
The Salesforce mobile app is a native app with hybrid functionality available for both iOS and Android platforms. A hybrid app combines the best of both worlds, leveraging native experiences with rich web customizations provided by the Salesforce platform via Flexipages, Lightning Web Components, Aura, and VisualForce. UI Scroller was… Continue reading Native Scrolling in Salesforce mobile app
Faster, more efficient systems for finding and fixing regressions
Every workday, Facebook engineers commit thousands of diffs (each a change to one or more files) into production. This code velocity allows us to rapidly ship new features, deliver bug fixes and optimizations, and run experiments. However, a natural downside to moving quickly in any industry is the risk of inadvertently causing regressions… Continue reading Faster, more efficient systems for finding and fixing regressions