{"id":314,"date":"2021-08-31T14:39:51","date_gmt":"2021-08-31T14:39:51","guid":{"rendered":"https:\/\/fde.cat\/?p=314"},"modified":"2021-08-31T14:39:51","modified_gmt":"2021-08-31T14:39:51","slug":"sre-weekly-issue-273","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/sre-weekly-issue-273\/","title":{"rendered":"SRE Weekly Issue #273"},"content":{"rendered":"<p><a href=\"https:\/\/sreweekly.com\/sre-weekly-issue-273\/\" title=\"Permalink to SRE Weekly Issue #273\" class=\"email_only\">View on sreweekly.com<\/a><\/p>\n<div class=\"sreweekly-sponsor-message\">\n<h2>A message from our sponsor, StackHawk:<\/h2>\n<p>StackHawk is helping One Medical equip developers with automated security testing and self-service remediations. See how:<br \/>\n<a href=\"http:\/\/sthwk.com\/onemedical\">http:\/\/sthwk.com\/onemedical<\/a><\/p>\n<\/div>\n<h2>Articles<\/h2>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/rootly.io\/blog\/incident-management-vs-incident-response-what-s-the-difference\">Incident Management vs. Incident Response<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>What indeed? 
It depends on who you ask.<\/p>\n<p>Quentin Rousseau \u2014 Rootly<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/sigops.org\/s\/conferences\/hotos\/2021\/papers\/hotos21-s01-hochschild.pdf\">Cores that don\u2019t count<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This academic paper explains Google\u2019s efforts toward identifying \u201cmercurial\u201d CPU cores \u2014 cores that make erroneous computations.<\/p>\n<p>[\u2026] we observe on the order of a few mercurial cores per several thousand machines [\u2026]<\/p>\n<p>This one blew my mind:<\/p>\n<p>A deterministic AES mis-computation, which was \u201cself-inverting\u201d: encrypting and decrypting on the same core yielded the identity function, but decryption elsewhere yielded gibberish.<\/p>\n<p>Peter H. Hochschild, Paul Turner, Jeffrey C. Mogul, Rama Govindaraju, Parthasarathy Ranganathan, David E. Culler, and Amin Vahdat \u2014 Google<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.fastly.com\/blog\/minimizing-ossification-risk-is-everyones-responsibility\">Minimizing ossification risk is everyone\u2019s responsibility <\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>The decisions, non-decisions, and workarounds that we implement now can have lasting effects on the Internet as a whole.<\/p>\n<p>Mark Nottingham \u2014 Fastly<\/p>\n<p><em>Full disclosure: Fastly is my employer.<\/em><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.luminis.eu\/blog\/resilience-en\/what-is-resilience-engineering-a-lightning-talk-with-background-information\/\">What is resilience engineering? A lightning talk with background information<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>A great intro to the topic of resilience engineering. 
Hint: resilience != high availability.<\/p>\n<p>Piet van Dongen \u2014 Luminis Arnhem<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/surfingcomplexity.blog\/2021\/05\/30\/dealing-with-new-kinds-of-trouble\/\">Dealing with new kinds of trouble<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>When you include people in your definition of \u201cthe system\u201d, something that looked like a system failure where humans had to \u201cstep in\u201d is actually a success in which the system adapted.<\/p>\n<p>Lorin Hochstein<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/rachelbythebay.com\/w\/2021\/06\/01\/count\/\">Please don\u2019t count outages (or SEVs, or whatever)<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>I find the way this author presented this argument especially convincing. My favorite part is the real-world story toward the end.<\/p>\n<p>Rachel by the Bay<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/engineering.fb.com\/2021\/06\/02\/data-center-engineering\/how-facebook-deals-with-pcie-faults-to-keep-our-data-centers-running-reliably\/\">How Facebook deals with PCIe faults to keep our data centers running reliably<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Facebook presents their method for finding and dealing with PCIe errors in their infrastructure.<\/p>\n<p>Ashwin Poojary, Bill Holland, Makan Diarra, and Ray Park \u2014 Facebook<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/github.blog\/2021-06-02-github-availability-report-may-2021\/\">GitHub Availability Report: May 2021<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Overflow of\u00a0a 32-bit integer primary key caused a security issue.<\/p>\n<p>Scott Sanders \u2014 GitHub<\/p>\n<\/div>\n<\/div>\n<div 
class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/developers.soundcloud.com\/blog\/building-a-healthy-on-call-culture\">Building a Healthy On-Call Culture<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This caught my eye. I\u2019ve seldom been in an on-call rotation with shifts that were not a week or two at a time.<\/p>\n<p>The optimal frequency for being on call is about three days a month.<\/p>\n<p>There\u2019s also a good discussion of paying for on-call shifts, which, in my experience, goes a long way toward making on-call more palatable.<\/p>\n<p>Christine Patton \u2014 SoundCloud<\/p>\n<\/div>\n<\/div>\n<h2>Outages<\/h2>\n<p><a href=\"https:\/\/www.dailymail.co.uk\/tvshowbiz\/article-9635911\/HBO-Max-outage-draws-ire-Twitter-users-eager-season-finale-Kate-Winslets-Mare-Easttown.html\">HBO Max<\/a><br \/>\n<a href=\"https:\/\/www.pymnts.com\/news\/payment-methods\/2021\/apple-card-outage-disrupting-purchases\/\">Apple Card<\/a><br \/>\n<a href=\"https:\/\/piunikaweb.com\/2021\/06\/03\/sling-tv-down-error-code-12-47-issue-gets-officially-acknowledged\/\">Sling TV<\/a><br \/>\n<a href=\"https:\/\/www.thehindubusinessline.com\/info-tech\/google-meet-services-hit-by-temporary-outage-in-india-issue-resolved-later\/article34739597.ece\">Google Meet<\/a><br \/>\n<a href=\"https:\/\/www.githubstatus.com\/incidents\/76nv9h8pmkv4\">GitHub<\/a><br \/>\n<a href=\"https:\/\/discord.statuspage.io\/incidents\/sdqlyt2zcwdt\">Discord<\/a><\/p>\n<p>Discord had several outages this week.<\/p>\n<p>SRE WEEKLY<\/p>\n","protected":false},"excerpt":{"rendered":"<p>View on sreweekly.com A message from our sponsor, StackHawk: StackHawk is helping One Medical equip developers with automated security testing and self-service remediations. See how: http:\/\/sthwk.com\/onemedical Articles Incident Management vs. Incident Response What indeed? It depends on who you ask. 
Quentin Rousseau \u2014 Rootly Cores that don\u2019t count This academic paper explains Google\u2019s efforts toward&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2021\/08\/31\/sre-weekly-issue-273\/\">Continue reading <span class=\"screen-reader-text\">SRE Weekly Issue #273<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[8],"tags":[],"class_list":["post-314","post","type-post","status-publish","format-standard","hentry","category-sre","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":255,"url":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/sre-weekly-issue-252\/","url_meta":{"origin":314,"position":0},"title":"SRE Weekly Issue #252","date":"August 31, 2021","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, StackHawk: Interested in how you can automate application security testing with GitHub Actions? Check out this on demand webinar from StackHawk and Snyk and see how simple it is to get started. https:\/\/sthwk.com\/stackhawk-snyk Articles Building On-Call Culture at GitHub Their on-call started\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":269,"url":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/sre-weekly-issue-257\/","url_meta":{"origin":314,"position":1},"title":"SRE Weekly Issue #257","date":"August 31, 2021","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, StackHawk: Keeping your APIs secure requires thoughtful design and testing. Learn how to protect your REST, SOAP and GraphQL APIs from security vulnerabilities with StackHawk http:\/\/sthwk.com\/api-protection Articles Sometimes alerts have inobvious reasons for existing This one really got me thinking. 
Make sure\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":463,"url":"https:\/\/fde.cat\/index.php\/2021\/09\/20\/sre-weekly-issue-287\/","url_meta":{"origin":314,"position":2},"title":"SRE Weekly Issue #287","date":"September 20, 2021","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, StackHawk: Trying to figure out how to keep your APIs secure? You\u2019re not the only one. See how DataRobot is automating API security testing with StackHawk. https:\/\/sthwk.com\/DataRobot Articles Industry Interviews: Colm Doyle, Incident Commander at Slack Lots of details about how Slack\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":282,"url":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/sre-weekly-issue-261\/","url_meta":{"origin":314,"position":3},"title":"SRE Weekly Issue #261","date":"August 31, 2021","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, StackHawk: Join Snyk and StackHawk on March 18 as they walk through how to use Software Composition Analysis (SCA) and Dynamic Application Security Testing (DAST) in CI\/CD to ship more secure applications. http:\/\/sthwk.com\/snyk-stackhawk-webinar Articles What Do Fighter Pilots and Incident Management Have\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":327,"url":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/sre-weekly-issue-277\/","url_meta":{"origin":314,"position":4},"title":"SRE Weekly Issue #277","date":"August 31, 2021","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, StackHawk: Planetly saved weeks of work by implementing StackHawk instead of building an internal ZAP service. 
See how: https:\/\/sthwk.com\/planetly-stackhawk Articles FINRA Orders Record Financial Penalties Against Robinhood Financial LLC Remember all those Robinhood outages? The US financial regulatory agency is making Robinhood\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":293,"url":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/sre-weekly-issue-264\/","url_meta":{"origin":314,"position":5},"title":"SRE Weekly Issue #264","date":"August 31, 2021","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, StackHawk: StackHawk and FOSSA are getting together Thursday, April 8, to show you how to automate AppSec testing with GitHub actions. Register to learn how to test your open source and proprietary code for vulns in CI\/CD. https:\/\/hubs.ly\/H0Ks1dy0 Articles Balancing act: the\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/314","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=314"}],"version-history":[{"count":1,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/314\/revisions"}],"predecessor-version":[{"id":396,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/314\/revisions\/396"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=314"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=314"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-
json\/wp\/v2\/tags?post=314"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}