Under the hood: Meta’s cloud gaming infrastructure

The promise of cloud gaming is a promise to democratize gaming. Anyone who loves games should be able to enjoy them and share the experience with their friends, no matter where they’re located, and even if they don’t have the latest, most expensive gaming hardware. Facebook launched its cloud gaming platform in 2020 to give anyone on Facebook instant access to native Android and Windows games across every screen and web browser.

Creating the unprecedented access offered by cloud gaming required engineers at Meta to rise to new challenges and develop a growing hardware infrastructure capable of delivering quality game experiences to people all over the world.

But gaming itself is also evolving. From new 3D experiences like AR and VR to what will eventually become the metaverse, people all over the world want to play increasingly immersive games as seamlessly and easily as possible. While it will take a massive effort across the industry to bring the metaverse to fruition, we believe creating the infrastructure and solving the challenges of cloud gaming are pointing us toward solutions for the metaverse as well.

Why cloud gaming?

Cloud gaming is about accessibility — bringing gaming to people regardless of the device they’re using or where they’re located in the world. Placing game apps on the cloud frees people up from having to purchase new hardware and also removes the need for large downloads or waiting for updates. It enables seamless cross-play across desktop and mobile devices as well as flexible app distribution.

Cloud gaming also promises an easier developer experience since developers shouldn’t need to concern themselves as much with optimizing games for multiple hardware platforms (e.g., desktop and mobile). In an ideal world, developers will only need to build an app once and it can be distributed to multiple devices without the need for multiple binaries. And for players and developers concerned with security and integrity, cloud gaming means fewer concerns about cheating and piracy.

Inside Meta’s cloud gaming infrastructure

Enabling cloud gaming at Meta meant developing new hardware and software infrastructure to address the challenges inherent in it. Cloud-based games need low end-to-end latency to feel fast and smooth, and quality video and audio have to be streamed with as little jitter as possible. To be economically efficient, the infrastructure also needs to be capable of running multiple games on a single cloud gaming server. And all of this has to be secured against cyberattacks while remaining robust and efficient.

Edge computing, GPUs, and virtualization

The best way for our cloud gaming infrastructure to deliver low latency is to bring it as close to players as possible in terms of network distance. Meta’s data centers alone cannot provide the ultra-low latency we require for cloud gaming, so we rely on edge computing: we deploy cloud gaming infrastructure at edge sites in metropolitan areas close to large populations.

As we increase the number of edge computing sites, we can also improve latency for players.
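As a rough illustration of why more edge sites help, the sketch below picks the site with the lowest median round-trip time (RTT) for a player. It is not a description of Meta’s actual routing logic: the function name, the site names, and the RTT samples are all hypothetical, and a real system would likely also weigh capacity and game availability.

```python
from statistics import median
from typing import Dict, List


def pick_edge_site(rtt_samples_ms: Dict[str, List[float]]) -> str:
    """Return the site whose median measured RTT is lowest.

    Sites without any samples are ignored.
    """
    candidates = {
        site: median(samples)
        for site, samples in rtt_samples_ms.items()
        if samples
    }
    if not candidates:
        raise ValueError("no edge site has RTT samples")
    return min(candidates, key=candidates.get)


# Hypothetical probe results in milliseconds (illustrative values only).
samples = {
    "metro-a": [12.0, 14.5, 11.8],
    "metro-b": [35.2, 33.9, 36.0],
    "regional-dc": [68.0, 71.3, 66.4],
}
print(pick_edge_site(samples))
```

Under this model, adding a site can only keep the best available RTT the same or lower it, which is the intuition behind deploying in more metropolitan areas.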

Our goal within each edge computing site is to have a unified hosting environment to make sure we can run as many games as possible as smoothly as possible. Today’s games are designed for GPUs, so we partnered with NVIDIA to build a hosting environment on top of NVIDIA Ampere architecture-based GPUs. As games continue to become more graphically intensive and complex, GPUs will provide us with the high fidelity and low latency we need for loading, running, and streaming games.

To run the games themselves, we use Twine, our cluster management system, on top of our edge computing operating system. We build orchestration services to manage the streaming signals and use Twine to coordinate the game servers at the edge.

We built and use container technologies for both Windows and Android games. The two platforms have different hosting solutions, and the Windows hosting solution is integrated with PlayGiga. We’ve built a consolidated orchestration system to manage and run the games for both operating systems, which means we can deliver games and manage capacity with more flexibility across platforms.

Video and audio streaming

Ultimately, delivering smooth video and audio is one of the most important parts of the cloud gaming experience. Anyone who has played an online game is familiar with the frustrations that latency can cause.

After considering the maturity and compatibility of the technology, we landed on WebRTC with Secure Real-Time Transport Protocol (SRTP) technology as our solution to streaming user inputs and video/audio frames for games. In doing so, we’ve also been able to significantly improve our video and audio streaming performance over time.

Let’s start with the basic streaming flow we used for cloud gaming at the beginning.

Whenever a player clicked to perform an action in a game (e.g., to make their character jump), we captured the click event and sent it to the server, where the game received the event. The game then rendered a frame that contained the result of that action (i.e., the character jumping). We captured that rendered frame, copied it out, and encoded it using a video encoder. The frame was then packetized so it could fit into a User Datagram Protocol (UDP) packet and sent through the network to the player. Finally, on the player’s side, there was a jitter buffer to help smooth play as the packets came in. We decoded the packets into frames, then rendered the frame for the player.
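The packetization step can be sketched as follows. This is a minimal illustration, not the RTP/SRTP packet format that WebRTC actually uses: the 8-byte header layout, the 1,200-byte payload budget, and the function names are assumptions made for the example.

```python
import struct
from typing import Dict, List, Optional

# Hypothetical header: frame id (uint32), packet index (uint16), packet count (uint16).
HEADER = struct.Struct("!IHH")
# Assumed payload budget per datagram, chosen to stay under a typical 1500-byte MTU.
MAX_PAYLOAD = 1200


def packetize(frame_id: int, encoded: bytes, max_payload: int = MAX_PAYLOAD) -> List[bytes]:
    """Split one encoded frame into datagram-sized packets with a small header."""
    chunks = [encoded[i:i + max_payload] for i in range(0, len(encoded), max_payload)] or [b""]
    if len(chunks) > 0xFFFF:
        raise ValueError("frame too large for a 16-bit packet count")
    return [HEADER.pack(frame_id, idx, len(chunks)) + chunk for idx, chunk in enumerate(chunks)]


def reassemble(packets: List[bytes]) -> Optional[bytes]:
    """Rebuild one frame from its packets in any order; return None if any are missing."""
    parts: Dict[int, bytes] = {}
    expected: Optional[int] = None
    for pkt in packets:
        _frame_id, idx, count = HEADER.unpack_from(pkt)
        expected = count
        parts[idx] = pkt[HEADER.size:]
    if expected is None or len(parts) != expected:
        return None
    return b"".join(parts[i] for i in range(expected))


frame = bytes(range(256)) * 20           # stand-in for a 5,120-byte encoded frame
packets = packetize(frame_id=1, encoded=frame)
restored = reassemble(list(reversed(packets)))  # UDP may deliver out of order
```

Because UDP does not guarantee ordering or delivery, the receiver needs the index and count to put a frame back together and to detect when a packet never arrived.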

All this happened so fast that the player didn’t perceive any time between the click and the action. However, each of these steps takes a tiny bit of time. And when these bits of time add up, it can lead to higher latency and lag for the player.
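To make the accumulation concrete, here is a small latency budget for the flow described above. Every number is a made-up placeholder chosen for illustration, not a measurement from our system, and the stage names are simply labels for the steps in the previous paragraph.

```python
from typing import Dict

# Hypothetical per-stage latencies in milliseconds (illustrative values only).
STAGES_MS: Dict[str, float] = {
    "input capture and uplink": 10.0,
    "game render": 8.0,
    "frame capture and copy": 2.0,
    "encode": 5.0,
    "packetize and downlink": 11.0,
    "jitter buffer": 12.0,
    "decode": 4.0,
    "display": 8.0,
}


def end_to_end_ms(stages: Dict[str, float]) -> float:
    """Total click-to-photon latency is the sum of every stage on the path."""
    return sum(stages.values())


def largest_stage(stages: Dict[str, float]) -> str:
    """Name of the stage that contributes the most latency."""
    return max(stages, key=stages.get)


print(f"total: {end_to_end_ms(STAGES_MS):.1f} ms")
print(f"largest contributor: {largest_stage(STAGES_MS)}")
```

No single stage in a budget like this looks expensive on its own, which is why improvements have to be pursued across the whole pipeline.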

Our current model improves on this flow and reduces latency by using GPU encoding. When a game renders a frame, the frame is rendered on the GPU and never leaves the GPU’s memory until it’s encoded. This process is much more efficient and uses far less of the PCI bus bandwidth between the GPU and the rest of the server, since only the encoded frame, which is smaller than the raw frame, has to cross the bus.
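A back-of-the-envelope calculation shows how large the gap between raw and encoded frames can be. The resolution, pixel format, frame rate, and bitrate below are assumptions picked for the example, not figures from our deployment.

```python
def raw_frame_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Size of one uncompressed frame (4 bytes per pixel assumes an RGBA-style format)."""
    return width * height * bytes_per_pixel


def raw_stream_bytes_per_s(width: int, height: int, fps: int, bytes_per_pixel: int = 4) -> int:
    """Bus traffic if every raw frame had to be copied out of GPU memory."""
    return raw_frame_bytes(width, height, bytes_per_pixel) * fps


def encoded_stream_bytes_per_s(bitrate_mbps: float) -> float:
    """Bus traffic if only the encoded bitstream leaves the GPU."""
    return bitrate_mbps * 1_000_000 / 8


# Assumed stream: 1920x1080 at 60 fps, encoded at a hypothetical 10 Mbps.
raw = raw_stream_bytes_per_s(1920, 1080, 60)
encoded = encoded_stream_bytes_per_s(10)
print(f"raw:     {raw / 1e6:.1f} MB/s")
print(f"encoded: {encoded / 1e6:.2f} MB/s")
print(f"ratio:   {raw / encoded:.0f}x")
```

With several games sharing one server, that difference is multiplied by the number of concurrent streams.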


On the networking side, having edge computing sites close to the player also helps reduce video and audio latency — even more significantly than the savings from moving the entire streaming pipeline to GPUs.

Over on the player side, we also now use hardware decoding to reduce the decode time. Video and audio are typically synced together, but we can also send video a bit ahead of audio to improve latency. We can also take advantage of the inherent latency of the player’s computer monitor or phone screen. The screen displays frames one by one at a certain rate (e.g., 30fps or 60fps). We can use those imperceptible intervals between frames to help absorb some of the jitter and smooth out video. For devices with support for higher FPS, the latency can go down further.
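The frame-interval idea can be sketched with a simplified display model. It assumes the screen refreshes at perfectly regular ticks starting at time zero, which real displays only approximate, and the function names are hypothetical.

```python
import math


def frame_interval_ms(fps: float) -> float:
    """Time between two consecutive screen refreshes."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def display_tick(ready_ms: float, fps: float) -> int:
    """Index of the first refresh tick at or after the moment a decoded frame is ready.

    Ticks are assumed to fall at k * frame_interval_ms(fps) for k = 0, 1, 2, ...
    """
    if ready_ms < 0:
        raise ValueError("ready_ms must be non-negative")
    return math.ceil(ready_ms * fps / 1000.0)


for fps in (30, 60, 120):
    print(f"{fps} fps: {frame_interval_ms(fps):.2f} ms between frames")

# Two arrival times 10 ms apart land on the same refresh tick at 60 fps,
# so the player sees no difference between them.
print(display_tick(20.0, 60), display_tick(30.0, 60))
```

In this model, jitter that stays within one refresh interval is invisible to the player, and a higher refresh rate shortens the wait before a ready frame is shown.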

Keeping cloud gaming secure

Players and developers need to be assured that any cloud gaming experience is secure and safe. Players need to know their data is safe and that games won’t be dominated by cheaters. And developers need to be assured their product will be safe from piracy and other security vulnerabilities.

Incorporating edge computing, GPU virtualization, and video/audio streaming makes the cloud gaming infrastructure very complex, and with this complexity come unique security challenges. Because it is built on Windows and Android, the system inherently takes on the security challenges of those environments. It also needs to be protected against threats like DDoS attacks.

To identify and address security issues, we assess security at every stage of development, from design and implementation to testing. This includes threat modeling, security code reviews, fuzz testing, and security testing. We don’t want cloud gaming to become an entry point for attacking other Meta systems, so the cloud gaming infrastructure is completely isolated from Meta’s core data infrastructure. We also have an internal security team working alongside external companies to do regular security assessments on the system.

Cloud gaming and the metaverse

The metaverse holds major implications for the future of gaming — not only in the types of games people will play but also in how those games will be delivered to them. The metaverse will push network connectivity requirements further than ever. The groundwork we’re laying today is going to play an important role in helping the larger industry create the cloud infrastructure that will be needed to handle the complex computing required to create metaverse experiences.

Just like AAA games, future metaverse experiences will require the highest fidelity and lowest latency possible. If the experience isn’t frictionless, it won’t work for anyone, which means a metaverse network must have ultra-low latency, high scalability, high throughput, and federated data storage.

Whatever this new network architecture looks like, it will require a step-change enhancement to today’s overall network architecture. It will need a reliable, quality of service (QoS)-aware, peer-to-peer communication link and protocol to exchange information among people in the same proximity (e.g., under the same Wi-Fi AP coverage). It will also need a unified and ubiquitous network topology to bound latency, jitter, and packet loss across the globe. Challenges around improving latency, reliability, and throughput and developing federated databases may force engineers to redesign and redistribute compute and routing resources in end-to-end communication networks. Finally, the metaverse will need a ubiquitous end-to-end QoS management strategy at every segment of its networks, as well as at every application/network protocol layer vertically (i.e., from the application layer at the top of the stack to the OS layer at the bottom of the stack).

We’re working on solutions for all this right now. With our cloud gaming infrastructure providing some insights, we’re working with mobile network operators and carriers, hardware chipset makers, and other partners to create solutions that will address the needs of the metaverse.

What’s the immediate future of cloud gaming?

A lot of people are going to step into the metaverse for the first time through gaming. As we work towards our long-term vision of the metaverse, we’ll continue to build new 2D experiences to help bridge the gap to the metaverse for people across Meta’s family of apps. The recent launch of Crayta on Facebook Gaming as a cloud-streamed creation platform is a great example of this work. 

As we scale our cloud gaming platform, we’re continuously working to upgrade and improve our cloud gaming infrastructure. Over the next two to three years, we plan to expand internationally to bring games to more people all over the world. We’re also working with mobile network operators and carriers to significantly improve the latency in their access networks. On the hardware end, we’re working with chipset makers to improve latency in user devices. We’re also working on new container technologies to provide better streaming efficiency. And, of course, there will always be a continuous push on security as the system grows and improves.

Developers can expect significant improvements, including improving the compatibility of the system to reduce developer overhead and providing them with better tools for development, testing, debugging, experimentation, and analytics. And players, the most important part of our cloud gaming efforts, can expect new and more immersive gaming experiences coming soon. 

Our goal to let people play great games together — wherever and whenever they want — won’t change, but our ongoing efforts with cloud gaming will make sure those experiences only get better.

The post Under the hood: Meta’s cloud gaming infrastructure appeared first on Engineering at Meta.
