{"id":291,"date":"2021-08-31T14:40:23","date_gmt":"2021-08-31T14:40:23","guid":{"rendered":"https:\/\/fde.cat\/?p=291"},"modified":"2021-08-31T14:40:23","modified_gmt":"2021-08-31T14:40:23","slug":"sre-weekly-issue-263","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/sre-weekly-issue-263\/","title":{"rendered":"SRE Weekly Issue #263"},"content":{"rendered":"<p><a href=\"http:\/\/sreweekly.com\/sre-weekly-issue-263\/\" title=\"Permalink to SRE Weekly Issue #263\" class=\"email_only\">View on sreweekly.com<\/a><\/p>\n<div class=\"sreweekly-sponsor-message\">\n<h2>A message from our sponsor, StackHawk:<\/h2>\n<p>You can utilize Swagger Docs in security testing to drive more thorough and accurate vulnerability scans of your APIs. Learn how:<br \/>\n<a href=\"http:\/\/sthwk.com\/swagger-api-testing\">http:\/\/sthwk.com\/swagger-api-testing<\/a><\/p>\n<\/div>\n<h2>Articles<\/h2>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/increment.com\/reliability\/observability-distributed-tracing\/\" target=\"_blank\" rel=\"noopener\">[Increment: Reliability] Tracing a path to observability<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>They make a really clear case for why traditional metrics and monitoring couldn\u2019t help them solve their problems.<\/p>\n<p><small>Mads Hartmann<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/flyingbarron.medium.com\/glynn-lunney-sre-leadership-9b34ed34eee8\" target=\"_blank\" rel=\"noopener\">Glynn Lunney\u200a\u2014\u200aSRE Leadership<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This article commemorates the death of NASA flight director Glynn Lunney by showing the SRE lessons we can learn from him.<\/p>\n<p><small>Robert Barron<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a 
href=\"https:\/\/enterprisersproject.com\/article\/2021\/3\/7-top-site-reliability-engineer-sre-job-interview-questions\" target=\"_blank\" rel=\"noopener\">7 top Site Reliability Engineer (SRE) job interview questions<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>I like that this focuses on human factors.<\/p>\n<p><small>Kevin Casey<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.blameless.com\/blog\/how-to-scale-for-reliability-and-trust\" target=\"_blank\" rel=\"noopener\">How to Scale for Reliability and Trust<\/a><\/div>\n<div class=\"sreweekly-description\">\n<blockquote>\n<p>Dealing with both the increased expectations and challenges of reliability as you scale is difficult. You\u2019ll need to maintain your development velocity and build customer trust through transparency.<\/p>\n<\/blockquote>\n<p><small>Blameless<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/eng.uber.com\/eng-failover-handling\/\" target=\"_blank\" rel=\"noopener\">Engineering Failover Handling in Uber\u2019s Mobile Networking Infrastructure<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Uber\u2019s customers are especially likely to be moving around and going in and out of tunnels, losing connectivity along the way. That means it\u2019s difficult to tell when the client should fail over to a different server.<\/p>\n<p><small>Sivabalan Narayanan, Rajesh Mahindra, and Christopher Francis \u2014 Uber<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/gocardless.com\/blog\/incident-review-service-outage-on-25-october-2020\/\" target=\"_blank\" rel=\"noopener\">Incident review: Service outage on 25 October 2020<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Here\u2019s one I missed from last November. 
Some good stuff to learn from, especially if you run Vault on Kubernetes.<\/p>\n<blockquote>\n<p>This outage was caused by a cascading failure stemming from our secrets management engine, which is a dependency of almost all of the production GoCardless services.<\/p>\n<\/blockquote>\n<p><small>Ben Wheatley \u2014 GoCardless<\/small><\/p>\n<\/div>\n<\/div>\n<h2>Outages<\/h2>\n<ul class=\"sreweekly-outages\">\n<li><a href=\"https:\/\/www.google.com\/appsstatus#hl=en&amp;v=issue&amp;sid=1&amp;iid=aa75515d184a2423be444d676b7ebf45\">Gmail and a ton of other Android apps<\/a>\n<ul class=\"sreweekly-outage\">\n<li class=\"sreweekly-outage\">This one\u2019s kind of weird. Google presented it as a Gmail outage, but it\u2019s actually a problem with the Android System WebView component. Tons of apps were crashing.<\/li>\n<\/ul>\n<\/li>\n<li><a href=\"https:\/\/piunikaweb.com\/2021\/03\/21\/mangadex-announces-temporary-shut-down\/\">MangaDex<\/a><\/li>\n<li><a href=\"https:\/\/www.newsweek.com\/canvas-down-not-working-learning-platform-school-college-1578719\">Canvas<\/a><\/li>\n<li><a href=\"https:\/\/www.dailymail.co.uk\/sciencetech\/article-9407921\/Instagram-Worldwide-outage-hits-app-just-one-week-platform-crashed.html\">Instagram<\/a><\/li>\n<\/ul>\n<p>SRE WEEKLY<\/p>\n","protected":false},"excerpt":{"rendered":"<p>View on sreweekly.com A message from our sponsor, StackHawk: You can utilize Swagger Docs in security testing to drive more thorough and accurate vulnerability scans of your APIs. 
Learn how: http:\/\/sthwk.com\/swagger-api-testing Articles [Increment: Reliability] Tracing a path to observability They make a really clear case for why traditional metrics and monitoring couldn\u2019t help them solve&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2021\/08\/31\/sre-weekly-issue-263\/\">Continue reading <span class=\"screen-reader-text\">SRE Weekly Issue #263<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[8],"tags":[],"class_list":["post-291","post","type-post","status-publish","format-standard","hentry","category-sre","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":746,"url":"https:\/\/fde.cat\/index.php\/2023\/08\/14\/sre-weekly-issue-385\/","url_meta":{"origin":291,"position":0},"title":"SRE Weekly Issue #385","date":"August 14, 2023","format":false,"excerpt":"View on sreweekly.com Many apologies to Matt Cooper at GitHub, who is the actual author of the article Scaling Merge-ort Across GitHub from last week. Sorry for the mis-credit, Matt! 
A message from our sponsor, Rootly: When incidents impact your customers, failing to communicate with them effectively can erode trust\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":329,"url":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/sre-weekly-issue-278\/","url_meta":{"origin":291,"position":1},"title":"SRE Weekly Issue #278","date":"August 31, 2021","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, StackHawk: Learn how our team at StackHawk tests external cookie authentication using Ktor, and check out some of the helper functions we wrote to make the tests easy to write, read, and maintain https:\/\/sthwk.com\/ktor Articles That Sinking Feeling (The #HugOps Song) Whoa.\u00a0\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":318,"url":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/sre-weekly-issue-274\/","url_meta":{"origin":291,"position":2},"title":"SRE Weekly Issue #274","date":"August 31, 2021","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, StackHawk: Join the GraphQL Security Testing Learning Lab on June 29 at 9 AM PT. Learn how to run automated security testing against your GraphQL APIs so you can find and fix vulnerabilities fast. http:\/\/sthwk.com\/graphql-learning-lab Articles Chicken Soup for the SLO The\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":703,"url":"https:\/\/fde.cat\/index.php\/2023\/04\/17\/sre-weekly-issue-368\/","url_meta":{"origin":291,"position":3},"title":"SRE Weekly Issue #368","date":"April 17, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly\u00a0\ud83d\ude92. 
Rootly automates manual tasks like creating an incident channel, Jira ticket and Zoom rooms, inviting responders, creating statuspage updates, postmortem timelines and more. Want to see why companies like Canva and Grammarly love us?:\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":252,"url":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/sre-weekly-issue-255\/","url_meta":{"origin":291,"position":4},"title":"SRE Weekly Issue #255","date":"August 31, 2021","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, StackHawk: With StackHawk\u2019s new GitHub Action, you can integrate AppSec testing directly into your GitHub CI\/CD pipeline. See how: http:\/\/sthwk.com\/appsec-github-action Articles Why It Should Be Service, Not Site Reliability It really should! Even Google is much more accurately described as a \u201cservice\u201d\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":463,"url":"https:\/\/fde.cat\/index.php\/2021\/09\/20\/sre-weekly-issue-287\/","url_meta":{"origin":291,"position":5},"title":"SRE Weekly Issue #287","date":"September 20, 2021","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, StackHawk: Trying to figure out how to keep your APIs secure? You\u2019re not the only one. See how DataRobot is automating API security testing with StackHawk. 
https:\/\/sthwk.com\/DataRobot Articles Industry Interviews: Colm Doyle, Incident Commander at Slack Lots of details about how Slack\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/291","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=291"}],"version-history":[{"count":1,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/291\/revisions"}],"predecessor-version":[{"id":419,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/291\/revisions\/419"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=291"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=291"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=291"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}