{"id":612,"date":"2022-07-25T16:00:07","date_gmt":"2022-07-25T16:00:07","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2022\/07\/25\/its-time-to-leave-the-leap-second-in-the-past\/"},"modified":"2022-07-25T16:00:07","modified_gmt":"2022-07-25T16:00:07","slug":"its-time-to-leave-the-leap-second-in-the-past","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2022\/07\/25\/its-time-to-leave-the-leap-second-in-the-past\/","title":{"rendered":"It\u2019s time to leave the leap second in the past"},"content":{"rendered":"<p><span>The leap second concept was first introduced in 1972 by the <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/International_Earth_Rotation_and_Reference_Systems_Service\" target=\"_blank\" rel=\"noopener\"><span>International Earth Rotation and Reference Systems Service<\/span><\/a><span> (IERS) in an attempt to periodically update <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Coordinated_Universal_Time\" target=\"_blank\" rel=\"noopener\"><span>Coordinated Universal Time<\/span><\/a><span> (UTC) due to imprecise <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Solar_time#Mean_solar_time\" target=\"_blank\" rel=\"noopener\"><span>observed solar time<\/span><\/a><span> (<\/span><span>UT1<\/span><span>) and the long-term <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/%CE%94T_(timekeeping)\" target=\"_blank\" rel=\"noopener\"><span>slowdown in the Earth\u2019s rotation<\/span><\/a><span>. This periodic adjustment mainly benefits scientists and astronomers as it allows them to observe celestial bodies using UTC for most purposes. 
Without the UTC correction, the legacy equipment and software that synchronize to UTC for astronomical observations would have to be adjusted.<\/span><\/p>\n<p><span>Since the leap second was introduced, UTC has been adjusted 27 times.<\/span><\/p>\n<p><span>While the leap second might have been an acceptable solution in 1972, when it made both the scientific community and the telecom industry happy, these days UTC is equally bad for both digital applications and scientists, who often choose TAI or UT1 instead.<\/span><\/p>\n<p><span>At Meta, we\u2019re supporting an industry effort to stop future introductions of leap seconds and stay at the current level of 27. Introducing new leap seconds is a risky practice that does more harm than good, and we believe it is time to introduce new technologies to replace it.<\/span><\/p>\n\n<h2><span>Leap of faith<\/span><\/h2>\n<p><span>One of many contributing factors to irregularities in the Earth\u2019s rotation is the constant melting and refreezing of ice caps on the world\u2019s tallest mountains. This phenomenon can be visualized by thinking about a spinning figure skater, who manages angular velocity by controlling their arms and hands. As they spread their arms, the angular velocity decreases, conserving the skater\u2019s angular momentum. As soon as the skater tucks their arms back in, the angular velocity increases.<\/span><\/p>\n<p>To visualize angular velocity change, think of a spinning figure skater.<\/p>\n<p><span>So far, only positive leap seconds have been added. 
In the early days, this was done by simply adding an extra second, resulting in an unusual timestamp:<\/span><\/p>\n<p><span>23:59:59 -&gt; 23:59:60 -&gt; 00:00:00<\/span><\/p>\n<p><span>Such a time jump crashed programs and, in some cases, corrupted data because of unexpected timestamps in data storage.<\/span><\/p>\n<p><span>With the Earth\u2019s rotation pattern changing, it\u2019s very likely that we will get a negative leap second at some point in the future. The timestamp will then look like this:<\/span><\/p>\n<p><span>23:59:58 -&gt; 00:00:00<\/span><\/p>\n<p><span>The impact of a negative leap second has never been tested on a large scale; it could have a devastating effect on software that relies on timers or schedulers.<\/span><\/p>\n<p><span>In any case, every leap second is a major source of pain for people who manage hardware infrastructure.<\/span><\/p>\n<h2><span>Smearing<\/span><\/h2>\n<p><span>More recently, it has become a common practice to \u201csmear\u201d a leap second by simply slowing down or speeding up the clock. There is no universal way to do this, but at Meta we smear the leap second over 17 hours, starting at 00:00:00 UTC <\/span><a href=\"https:\/\/engineering.fb.com\/2020\/03\/18\/production-engineering\/ntp-service\/\" target=\"_blank\" rel=\"noopener\"><span>based on the time zone data (tzdata) package content<\/span><\/a><span>.<\/span><\/p>\n<p>Leap second smearing at Meta.<\/p>\n<p><span>Let\u2019s break this down a bit.<\/span><\/p>\n<p><span>We chose a 17-hour duration primarily because smearing happens on Stratum 2, where hundreds of <\/span><a href=\"https:\/\/engineering.fb.com\/2020\/03\/18\/production-engineering\/ntp-service\/\"><span>NTP<\/span><\/a><span> servers perform smearing at the same time. To ensure that the difference between them is tolerable, the steps must be minimal. 
If the smearing steps are too big, NTP clients may consider some devices faulty and exclude them from quorum, which may lead to an outage.<\/span><\/p>\n<p><span>The starting point at 00:00:00 UTC is also not standardized, and there are many possible options. For example, some companies begin smearing at 12:00:00 UTC the day before and spread it over 24 hours; some do so two hours before the event, and others right at the edge.<\/span><\/p>\n<p><span>There are also different algorithms for the smearing itself. The options include kernel leap second correction, linear smearing (where equal steps are applied), cosine smearing, and quadratic smearing (which Meta uses). The algorithms are based on different mathematical models and produce different offset graphs:<\/span><\/p>\n<p>Kernel leap second smearing with NTPD<\/p>\n<p><span>The source of the leap indicator differs between GNSS constellations (e.g., GPS, GLONASS, Galileo, and BeiDou). In some cases, it is broadcast by satellites several hours in advance. In other cases, time is propagated in UTC with the leap already applied. The leap second value also differs between constellations, depending on when each was launched.<\/span><\/p>\n<p>Difference in leap second values between GNSS constellations.<\/p>\n<p><span>All of this requires nontrivial conversion logic inside the time sources, including our very own <\/span><a href=\"https:\/\/l.workplace.com\/l.php?u=https%3A%2F%2Fengineering.fb.com%2F2021%2F08%2F11%2Fopen-source%2Ftime-appliance%2F&amp;h=AT3EIwsl5L0Eu7aWr8Cq9XbPugyiGHFO2bQnGFfL6QU9Z4euNxRvjNk4kyLkt2fGrqAaTS-6xkjt4NOFqQSzmmwwXfxrgCAXX0EPLjJTF00EOiUvisve9iLO8WIRBxMeth24D_dJmqXBZoRl\"><span>Time Appliance<\/span><\/a><span>. 
Loss of a GNSS signal during such a sensitive time may lead to the loss of the leap indicator and a split-brain situation, which could lead to an outage.<\/span><\/p>\n<p><span>The leap event is also propagated via the tzdata package months in advance, and for ntpd fans, via a <\/span><a href=\"https:\/\/l.workplace.com\/l.php?u=https%3A%2F%2Fwww.ietf.org%2Ftimezones%2Fdata%2Fleap-seconds.list&amp;h=AT3EIwsl5L0Eu7aWr8Cq9XbPugyiGHFO2bQnGFfL6QU9Z4euNxRvjNk4kyLkt2fGrqAaTS-6xkjt4NOFqQSzmmwwXfxrgCAXX0EPLjJTF00EOiUvisve9iLO8WIRBxMeth24D_dJmqXBZoRl\"><span>leap second file<\/span><\/a><span> distributed through the Internet Engineering Task Force (IETF) website. A stale copy of the file may cause a server to miss a leap second, leading to an outage.<\/span><\/p>\n<p><span>As already mentioned, smearing is a very sensitive period. If the NTP server is restarted during this period, we will likely end up with either \u201cold\u201d or \u201cnew\u201d time, which may propagate to the clients and lead to an outage.<\/span><\/p>\n<p><span>Because of such ambiguities, public NTP pools don\u2019t do smearing and instead sometimes pass a leap indicator to the clients, leaving them to handle it. SNTP clients usually end up stepping the clock and dealing with the consequences described earlier. Smarter clients may choose a default strategy to smear the leap locally. All in all, this means big players like Meta, who perform smearing on public services, can\u2019t join the public pools.<\/span><\/p>\n<p><span>And even after the leap event, things are still at risk. NTP software needs to constantly apply an offset relative to the time source it\u2019s using (GNSS, TAI, or an atomic clock), and PTP software needs to propagate a so-called UTC offset flag in the announce messages.<\/span><\/p>\n<h2><span>The negative impact of leap seconds<\/span><\/h2>\n<p><span>The leap second and the offset it creates cause issues all over the industry. 
One of the simplest ways to cause an outage is to bake in the assumption that time always moves forward. Say we have code like this:<\/span><\/p>\n<p><span>start := time.Now()<\/span><\/p>\n<p><span>\/\/ do something<\/span><\/p>\n<p><span>spent := time.Now().Sub(start)<\/span><\/p>\n<p><span>Depending on how <\/span><span>spent<\/span><span> is used, we may end up relying on a negative value during a leap second event. Such assumptions have caused numerous outages, and there are plenty of articles that describe these cases.<\/span><\/p>\n<p><span>Back in 2012, Reddit <\/span><a href=\"https:\/\/www.wired.com\/2012\/07\/leap-second-glitch-explained\/\"><span>experienced<\/span><\/a><span> a massive outage because of a leap second; the site was inaccessible for 30 to 40 minutes. This happened when the time change confused the high-resolution timer (hrtimer), sparking hyperactivity on the servers, which locked up the machines\u2019 CPUs.<\/span><\/p>\n<p><span>In 2017, Cloudflare posted a very <\/span><a href=\"https:\/\/blog.cloudflare.com\/how-and-why-the-leap-second-affected-cloudflare-dns\/\"><span>detailed article<\/span><\/a><span> about the impact of a leap second on the company\u2019s public DNS. The root cause of the bug that affected their DNS service was the belief that time cannot go backward. The code took the upstream time values and fed them to Go\u2019s rand.Int63n() function. The rand.Int63n() function promptly panicked because the argument was negative, which caused the DNS server to fail.<\/span><\/p>\n<h2><span>Moving beyond the leap second<\/span><\/h2>\n<p><span>Leap second events have caused issues across the industry and continue to present many risks. As an industry, we bump into problems whenever a leap second is introduced. And because it\u2019s such a rare event, it devastates the community every time it happens. 
With a growing demand for clock precision across all industries, the leap second is now doing more harm than good, resulting in disruptions and outages.<\/span><\/p>\n<p><span>As engineers at Meta, we are supporting a larger community push to stop the future introduction of leap seconds and remain at the current level of 27, which we believe will be enough for the next millennium.<\/span><\/p>\n<p>The post <a href=\"https:\/\/engineering.fb.com\/2022\/07\/25\/production-engineering\/its-time-to-leave-the-leap-second-in-the-past\/\">It\u2019s time to leave the leap second in the past<\/a> appeared first on <a href=\"https:\/\/engineering.fb.com\/\">Engineering at Meta<\/a>.<\/p>\n<p>Engineering at Meta<\/p>","protected":false},"excerpt":{"rendered":"<p>The leap second concept was first introduced in 1972 by the International Earth Rotation and Reference Systems Service (IERS) in an attempt to periodically update Coordinated Universal Time (UTC) due to imprecise observed solar time (UT1) and the long-term slowdown in the Earth\u2019s rotation. 
This periodic adjustment mainly benefits scientists and astronomers as it allows&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2022\/07\/25\/its-time-to-leave-the-leap-second-in-the-past\/\">Continue reading <span class=\"screen-reader-text\">It\u2019s time to leave the leap second in the past<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-612","post","type-post","status-publish","format-standard","hentry","category-technology","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":466,"url":"https:\/\/fde.cat\/index.php\/2021\/09\/16\/autonomous-monitoring-and-healing-networks\/","url_meta":{"origin":612,"position":0},"title":"Autonomous Monitoring and Healing Networks","date":"September 16, 2021","format":false,"excerpt":"Autonomous Monitoring and Self-Healing Networks Occasional failure is inevitable in any network system. The need of the hour is a robust, self-reliant automated monitoring tool that provides great insight and a lesser degree of manual intervention. We need autonomous interventions that save us time and enhance system availability. What Salesforce\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":655,"url":"https:\/\/fde.cat\/index.php\/2022\/11\/21\/how-precision-time-protocol-is-being-deployed-at-meta\/","url_meta":{"origin":612,"position":1},"title":"How Precision Time Protocol is being deployed at Meta","date":"November 21, 2022","format":false,"excerpt":"Implementing Precision Time Protocol (PTP) at Meta allows us to synchronize the systems that drive our products and services down to nanosecond precision. 
PTP\u2019s predecessor, Network Time Protocol (NTP), provided us with millisecond precision, but as we scale to more advanced systems on our way to building the next computing\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":230,"url":"https:\/\/fde.cat\/index.php\/2021\/02\/02\/simplify-testing-with-the-singleton-pattern\/","url_meta":{"origin":612,"position":2},"title":"Simplify Testing With the Singleton Pattern","date":"February 2, 2021","format":false,"excerpt":"While you may be familiar with the technical offerings of Salesforce.com, you might not know much about Salesforce.org, the social impact center at Salesforce. We build solutions that utilize the Salesforce Platform, working to empower organizations in the non-profit and education sectors to achieve their missions by unlocking the power\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":616,"url":"https:\/\/fde.cat\/index.php\/2022\/08\/01\/sre-weekly-issue-332\/","url_meta":{"origin":612,"position":3},"title":"SRE Weekly Issue #332","date":"August 1, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and adding responders, postmortem timeline, setting up reminders, and more. 
Book a demo (+ get a snazzy Rootly lego set): https:\/\/rootly.com\/demo\/\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":741,"url":"https:\/\/fde.cat\/index.php\/2023\/08\/07\/using-short-lived-certificates-to-protect-tls-secrets\/","url_meta":{"origin":612,"position":4},"title":"Using short-lived certificates to protect TLS secrets","date":"August 7, 2023","format":false,"excerpt":"Short-lived certificates (SLCs) are part of our latest efforts to further secure our Transport Layer Security (TLS) private keys on our edge networks. SLCs have a very short exposure compared to traditional certificates and lower the chances of a compromised private key being abused. Implementing SLCs has required us to\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":812,"url":"https:\/\/fde.cat\/index.php\/2024\/01\/15\/sre-weekly-issue-407\/","url_meta":{"origin":612,"position":5},"title":"SRE Weekly Issue #407","date":"January 15, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: Signals is now available in beta. Sign up to experience alerting for modern DevOps teams: Page teams, not services. Ingest inputs from any source. Bucket pricing based on usage. And one platform \u2014 ring to retro \u2014 finally. 
https:\/\/firehydrant.com\/blog\/signals-beta-live\/ On chains\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/612","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=612"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/612\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=612"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=612"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=612"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}