Logarithm: A logging engine for AI training workflows and services
Systems and application logs play a key role in operations, observability, and debugging workflows at Meta. Logarithm is a hosted, serverless, multitenant service, used only internally at Meta, that consumes and indexes these logs and provides an interactive query interface to retrieve and view logs. In this post, we present the design behind Logarithm, and…
Category: Technology
Encompasses all posts related to the Technology topic on this site.
SRE Weekly Issue #416
View on sreweekly.com A message from our sponsor, FireHydrant: We need tools that help us show our value, enhance understanding of our systems, and free time for us to expand our skills. In this article, FireHydrant lays out three questions to ask vendors as you evaluate DevOps tools. https://firehydrant.com/blog/3-questions-to-ask-of-any-devops-tool-in-2024/ 4 Instructive Postmortems on Data Downtime…
From Concept to Reality: Developing MuleSoft’s New Flex Gateway API Management Solution
In our “Engineering Energizers” Q&A series, we explore the remarkable journeys of engineering leaders who have made significant contributions in their respective fields. Today, we dive into the technical journey of Evangelina Martinez Ruiz Moreno, a Senior Director at Salesforce, who spearheaded the development of MuleSoft’s new Anypoint Flex Gateway. Read on to explore how…
Building Meta’s GenAI Infrastructure
Marking a major investment in Meta’s AI future, we are announcing two 24k GPU clusters. We are sharing details on the hardware, network, storage, design, performance, and software that help us extract high throughput and reliability for various AI workloads. We use this cluster design for Llama 3 training. We are strongly committed to open…
SRE Weekly Issue #415
View on sreweekly.com A message from our sponsor, FireHydrant: Join FireHydrant and talk shop with your DevOps peers on March 28! You’ll gain a better understanding of what makes a fatigue-free on-call culture and how to implement practices to improve yours at this free, virtual roundtable. https://app.livestorm.co/firehydrant/better-incidents-spring-bonfire-secrets-to-fatigue-free-on-call-in-2024 The Wrong Way to Use DORA Metrics…
How the New Einstein 1 Platform Manages Massive Data and AI Workloads at Scale
In our “Engineering Energizers” Q&A series, we feature Leo Tran, Chief Architect of Platform Engineering at Salesforce. With over 15 years of engineering leadership experience, Leo is instrumental in developing the Einstein 1 Platform. This platform integrates generative AI, data management, CRM capabilities, and trusted systems to provide businesses with personalized customer experiences and AI-driven…
Making messaging interoperability with third parties safe for users in Europe
To comply with a new EU law, the Digital Markets Act (DMA), which comes into force on March 7th, we’ve made major changes to WhatsApp and Messenger to enable interoperability with third-party messaging services. We’re sharing how we enabled third-party interoperability (interop) while maintaining end-to-end encryption (E2EE) and other privacy guarantees in our services as…
SRE Weekly Issue #414
View on sreweekly.com A message from our sponsor, FireHydrant: 91% of engineering leaders say they want a better alerting tool. The other 9% couldn’t take the survey on their Blackberry. Meet Signals: a new standard in alerting and on call, now available. https://firehydrant.com/blog/alerting-and-on-call-scheduling-for-how-you-actually-work/ 2024 VOID Report This year’s VOID Report is out, and it’s well…
Tackling Scaling Challenges Head-On: Industry Cloud’s New Engineering Team Drives Fundraising 2.0 App Development
In our “Engineering Energizers” Q&A series, we delve into the experiences that have shaped Salesforce Engineering leaders. Meet Jevarlo Boykins, a Lead Member of the Technical Staff for Salesforce Engineering. Jevarlo supports the new Salesforce for Nonprofits Nonprofit Cloud (NPC) team under Salesforce Industry Cloud — empowering fundraisers in the nonprofit sector with innovative products…
How DotSlash makes executable deployment simpler
Andres Suarez and Michael Bolin, two software engineers at Meta, join Pascal Hartig (@passy) on the Meta Tech Podcast to discuss the ins and outs of DotSlash, a new open source tool from Meta. DotSlash takes the pain out of distributing binaries and toolchains to developers. Instead of committing large, platform-specific executables to a repository,…