Marking a major investment in Meta’s AI future, we are announcing two 24k GPU clusters. We are sharing details on the hardware, network, storage, design, performance, and software that help us extract high throughput and reliability for various AI workloads. We use this cluster design for Llama 3 training. We are strongly committed to open… Continue reading Building Meta’s GenAI Infrastructure
Month: March 2024
SRE Weekly Issue #415
A message from our sponsor, FireHydrant: Join FireHydrant and talk shop with your DevOps peers on March 28! You’ll gain a better understanding of what makes a fatigue-free on-call culture and how to implement practices to improve yours at this free, virtual roundtable. https://app.livestorm.co/firehydrant/better-incidents-spring-bonfire-secrets-to-fatigue-free-on-call-in-2024 The Wrong Way to Use DORA Metrics […]…
How the New Einstein 1 Platform Manages Massive Data and AI Workloads at Scale
In our “Engineering Energizers” Q&A series, we feature Leo Tran, Chief Architect of Platform Engineering at Salesforce. With over 15 years of engineering leadership experience, Leo is instrumental in developing the Einstein 1 Platform. This platform integrates generative AI, data management, CRM capabilities, and trusted systems to provide businesses with personalized customer experiences and AI-driven…
Making messaging interoperability with third parties safe for users in Europe
To comply with a new EU law, the Digital Markets Act (DMA), which comes into force on March 7th, we’ve made major changes to WhatsApp and Messenger to enable interoperability with third-party messaging services. We’re sharing how we enabled third-party interoperability (interop) while maintaining end-to-end encryption (E2EE) and other privacy guarantees in our services as…
SRE Weekly Issue #414
A message from our sponsor, FireHydrant: 91% of engineering leaders say they want a better alerting tool. The other 9% couldn’t take the survey on their Blackberry. Meet Signals: a new standard in alerting and on call, now available. https://firehydrant.com/blog/alerting-and-on-call-scheduling-for-how-you-actually-work/ 2024 VOID Report This year’s VOID Report is out, and it’s well…