{"id":332,"date":"2021-08-31T14:39:51","date_gmt":"2021-08-31T14:39:51","guid":{"rendered":"https:\/\/fde.cat\/?p=332"},"modified":"2021-08-31T14:39:51","modified_gmt":"2021-08-31T14:39:51","slug":"fully-sharded-data-parallel-faster-ai-training-with-fewer-gpus","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/fully-sharded-data-parallel-faster-ai-training-with-fewer-gpus\/","title":{"rendered":"Fully Sharded Data Parallel: faster AI training with fewer GPUs"},"content":{"rendered":"<p><span>Training AI models at a large scale isn\u2019t easy. Aside from the need for large amounts of computing power and resources, there is also considerable engineering complexity behind training very large models. At Facebook AI Research (FAIR) Engineering, we have been working on building tools and infrastructure to make training large AI models easier. Our recent work in areas such as <\/span><a href=\"https:\/\/github.com\/pytorch\/fairseq\/blob\/master\/examples\/megatron_11b\/README.md\"><span>intra-layer model parallelism<\/span><\/a><span>, <\/span><a href=\"https:\/\/fairscale.readthedocs.io\/en\/latest\/deep_dive\/pipeline_parallelism.html\"><span>pipeline model parallelism<\/span><\/a><span>, <\/span><a href=\"https:\/\/github.com\/facebookresearch\/fairscale#optimizer-state-sharding-zero\"><span>optimizer state+gradient sharding<\/span><\/a><span>, and <\/span><a href=\"https:\/\/github.com\/facebookresearch\/fairscale\/blob\/master\/fairscale\/nn\/moe\/moe_layer.py\"><span>mixture of experts<\/span><\/a><span> is part of our broader effort to make training advanced AI models more efficient for a wide range of tasks.<\/span><\/p>\n<p><span>Fully Sharded Data Parallel (FSDP) is the newest tool we\u2019re introducing. 
It <a href=\"https:\/\/engineering.fb.com\/2020\/08\/24\/production-engineering\/scaling-services-with-shard-manager\/\">shards<\/a> an AI model\u2019s parameters across data parallel workers and can optionally offload part of the training computation to the CPUs. As its name suggests, FSDP is a type of data-parallel training algorithm. Although the parameters are sharded to different <a href=\"https:\/\/engineering.fb.com\/2018\/03\/20\/ml-applications\/the-next-step-in-facebook-s-ai-hardware-infrastructure\/\">GPUs<\/a>, the computation for each microbatch of data is still local to each GPU worker. This conceptual simplicity makes FSDP easier to understand and applicable to a wider range of usage scenarios than intra-layer parallelism and pipeline parallelism. Compared with optimizer state+gradient sharding data parallel methods, FSDP shards parameters more uniformly and can achieve better performance by overlapping communication with computation during training.<\/span><\/p>\n<p><span>With FSDP, it is now possible to train models that are orders of magnitude larger more efficiently and with fewer GPUs. FSDP has been implemented in the <\/span><a href=\"https:\/\/github.com\/facebookresearch\/fairscale\"><span>FairScale library<\/span><\/a><span> and allows engineers and developers to scale and optimize the training of their models with simple APIs. At Facebook, FSDP has already been integrated and tested for training some of our <\/span><a href=\"https:\/\/github.com\/pytorch\/fairseq\"><span>NLP<\/span><\/a><span> and <\/span><a href=\"https:\/\/github.com\/facebookresearch\/vissl\"><span>vision<\/span><\/a><span> models.<\/span><\/p>\n<h2><span>The high computational cost of large-scale training<\/span><\/h2>\n<p><a href=\"https:\/\/arxiv.org\/pdf\/2001.08361.pdf\"><span>NLP research<\/span><\/a><span> is one particular area where we can see the importance of efficiently leveraging compute for training AI. 
Last year, OpenAI announced that they had trained <\/span><a href=\"https:\/\/neurips.cc\/virtual\/2020\/public\/poster_1457c0d6bfcb4967418bfb8ac142f64a.html\"><span>GPT-3<\/span><\/a><span>, then the largest-ever neural language model, with 175 billion parameters. It is <\/span><a href=\"https:\/\/lambdalabs.com\/blog\/demystifying-gpt-3\/\"><span>estimated<\/span><\/a><span> to have taken roughly 355 GPU-years to train GPT-3, or the equivalent of 1,000 GPUs working continuously for more than four months.<\/span><\/p>\n<p><span>Besides requiring a lot of compute and engineering resources, most approaches to scaling like this introduce additional communication costs and require engineers to carefully evaluate trade-offs between memory use and computational efficiency. For example, typical data parallel training requires maintaining redundant copies of the model on each GPU, and model parallel training introduces additional communication costs to move activations between workers (GPUs).<\/span><\/p>\n<p><span>FSDP is relatively free of trade-offs in comparison. It improves memory efficiency by sharding model parameters, gradients, and optimizer states across GPUs, and improves computational efficiency by decomposing the communication and overlapping it with both the forward and backward passes. FSDP produces results identical to those of standard distributed data parallel (DDP) training and is available in an easy-to-use interface that\u2019s a drop-in replacement for PyTorch\u2019s DistributedDataParallel module. Our early testing has shown that FSDP can enable scaling to trillions of parameters.<\/span><\/p>\n<h2><span>How FSDP works<\/span><\/h2>\n<p><span>In standard DDP training, every worker processes a separate batch and the gradients are summed across workers using an <\/span><a href=\"https:\/\/docs.nvidia.com\/deeplearning\/nccl\/user-guide\/docs\/usage\/collectives.html#allreduce\"><span>all-reduce operation<\/span><\/a><span>. 
While DDP has become very popular, it uses more GPU memory than necessary because the model weights and optimizer states are replicated across all DDP workers.<\/span><\/p>\n<p><span>One way to reduce this replication is full parameter sharding, where only the subset of model parameters, gradients, and optimizer states needed for a local computation is made available at any given time. An implementation of this method, ZeRO-3, has already been popularized by Microsoft.<\/span><\/p>\n<p><span>The key insight to unlock full parameter sharding is that we can decompose the <\/span><a href=\"https:\/\/docs.nvidia.com\/deeplearning\/nccl\/user-guide\/docs\/usage\/collectives.html#allreduce\"><span>all-reduce<\/span><\/a><span> operations in DDP into separate <\/span><a href=\"https:\/\/docs.nvidia.com\/deeplearning\/nccl\/user-guide\/docs\/usage\/collectives.html#reducescatter\"><span>reduce-scatter<\/span><\/a><span> and <a href=\"https:\/\/docs.nvidia.com\/deeplearning\/nccl\/user-guide\/docs\/usage\/collectives.html#allgather\">all-gather<\/a> operations:<\/span><\/p>\n<p>All-reduce as a combination of reduce-scatter and all-gather. The standard all-reduce operation to aggregate gradients can be decomposed into two separate phases: reduce-scatter and all-gather. During the reduce-scatter phase, the gradients are summed across ranks in equal-sized blocks, and each GPU receives the summed block that corresponds to its rank index. During the all-gather phase, the shard of aggregated gradients held by each GPU is made available to all GPUs (see the NCCL collectives documentation linked above for details on these operators).<\/p>\n<p><span>We can then rearrange the reduce-scatter and all-gather so that each DDP worker needs to store only a single shard of parameters and optimizer states. The figure below illustrates standard DDP training (top) and FSDP training (bottom):<\/span><\/p>\n<p>A comparison of standard data parallel training and fully sharded data parallel training. 
In standard data parallel training methods, a copy of the model is present on each GPU, and a sequence of forward and backward passes is evaluated on only a shard of the data. After these local computations, the gradients from each local process are all-reduced across GPUs so that every GPU computes the same global weight update. In FSDP, only a shard of the model is present on each GPU. Before the forward pass, the full weights are gathered locally from the other GPUs by means of an all-gather step. This gathering of weights is performed again before the backward pass. After the backward pass, the local gradients are averaged and sharded across the GPUs by means of a reduce-scatter step, which allows each GPU to update its local weight shard.<\/p>\n<p><span>To maximize memory efficiency, we can discard the full weights after each layer\u2019s forward pass, saving memory for subsequent layers. This can be implemented by applying the FSDP wrapper to every layer in the network (with <\/span><span>reshard_after_forward=True<\/span><span>). 
<\/span><\/p>\n<p><span>In pseudo-code:<\/span><\/p>\n<p><span>FSDP forward pass:<\/span><br \/>\n<span>\u00a0\u00a0\u00a0\u00a0for layer_i in layers:<\/span><br \/>\n<span>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0all-gather full weights for layer_i<\/span><br \/>\n<span>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0forward pass for layer_i<\/span><br \/>\n<span>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0discard full weights for layer_i<\/span><\/p>\n<p><span>FSDP backward pass:<\/span><br \/>\n<span>\u00a0\u00a0\u00a0\u00a0for layer_i in reversed(layers):<\/span><br \/>\n<span>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0all-gather full weights for layer_i<\/span><br \/>\n<span>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0backward pass for layer_i<\/span><br \/>\n<span>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0discard full weights for layer_i<\/span><br \/>\n<span>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0reduce-scatter gradients for layer_i<\/span><\/p>\n<h2 class=\"line-numbers\"><span>How to use FSDP<\/span><\/h2>\n<p><span>There are several ways to use FSDP in large-scale AI research.<\/span><span> At this time, we offer four solutions to suit different needs.<\/span><\/p>\n<h3><span>1. 
Using FSDP in language models<\/span><\/h3>\n<p class=\"line-numbers\"><span>For language models, FSDP is supported in the <\/span><a href=\"https:\/\/github.com\/pytorch\/fairseq\"><span>fairseq<\/span><span> framework<\/span><\/a><span> via the following new arguments:<\/span><\/p>\n<p><span>--ddp-backend=fully_sharded<\/span><span>: enables full sharding via FSDP<\/span><br \/>\n<span>--cpu-offload<\/span><span>: offloads the optimizer state and FP32 model copy to CPU (combine with <\/span><span>--optimizer=cpu_adam<\/span><span>)<\/span><br \/>\n<span>--no-reshard-after-forward<\/span><span>: increases training speed for large models (1B+ params) and is similar to ZeRO stage 2<\/span><br \/>\nOther popular options (<span>--fp16<\/span><span>, <\/span><span>--update-freq<\/span><span>, <\/span><span>--checkpoint-activations<\/span><span>, <\/span><span>--offload-activations<\/span><span>, etc.) continue to work as usual<\/span><\/p>\n<p><span>See the <\/span><a href=\"https:\/\/github.com\/pytorch\/fairseq\/tree\/master\/examples\/fully_sharded_data_parallel\"><span>fairseq tutorial<\/span><\/a><span> for instructions on using FSDP to train a 13B-parameter model on eight GPUs, or on a single GPU with FSDP + CPU offloading.<\/span><\/p>\n<h3><span>2. Using FSDP in computer vision models<\/span><\/h3>\n<p><span>For computer vision models, FSDP is supported in <\/span><a href=\"https:\/\/github.com\/facebookresearch\/vissl\"><span>VISSL<\/span><\/a><span> and tested on RegNet architectures. 
Layers like BatchNorm and ReLU are handled seamlessly and have been tested for convergence.<\/span><\/p>\n<p><span>Use the following options to enable FSDP:<\/span><\/p>\n<p><span>config.MODEL.FSDP_CONFIG.AUTO_SETUP_FSDP=True<\/span><br \/>\n<span>config.MODEL.SYNC_BN_CONFIG.SYNC_BN_TYPE=pytorch<\/span><br \/>\n<span>config.MODEL.AMP_PARAMS.AMP_TYPE=pytorch<\/span><\/p>\n<p><span>See <\/span><a href=\"https:\/\/github.com\/facebookresearch\/vissl\/blob\/40441123a6f7098500676ca8800025c1f02e28b3\/vissl\/config\/defaults.yaml#L498-L513\"><span>this section<\/span><\/a><span> of the YAML config for additional options to configure FSDP within VISSL.<\/span><\/p>\n<h3><span>3. Using FSDP from PyTorch Lightning<\/span><\/h3>\n<p><span>For easier integration with more general use cases, FSDP is supported as a beta feature by PyTorch Lightning. <\/span><a href=\"https:\/\/pytorch-lightning.readthedocs.io\/en\/latest\/advanced\/advanced_gpu.html#fully-sharded-training\"><span>This tutorial<\/span><\/a><span> contains a detailed example of how to use the FSDP plugin with PyTorch Lightning. At a high level, passing <\/span><span>plugins='fsdp'<\/span><span> to the Trainer, as shown below, activates it.<\/span><\/p>\n<p><span>from pytorch_lightning import Trainer<\/span><br \/>\n<span>model = MyModel()<\/span><br \/>\n<span>trainer = Trainer(gpus=4, plugins='fsdp', precision=16)<\/span><br \/>\n<span>trainer.fit(model)<\/span><br \/>\n<br \/>\n<span>trainer.test()<\/span><br \/>\n<span>trainer.predict()<\/span><\/p>\n<h3><span>4. Using the FSDP library directly from FairScale<\/span><\/h3>\n<p class=\"line-numbers\"><span>The main library where FSDP has been developed, and where you can find the latest updates, is <\/span><a href=\"https:\/\/fairscale.readthedocs.io\/en\/latest\/deep_dive\/oss_sdp_fsdp.html\"><span>FairScale<\/span><\/a><span>. 
You can use FSDP directly from FairScale by replacing <\/span><span>DDP(my_module)<\/span><span> with <\/span><span>FSDP(my_module)<\/span><span>, as in the example below:<\/span><\/p>\n<p><span>import torch<\/span><br \/>\n<span>from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP<\/span><br \/>\n<span>&#8230;<\/span><br \/>\n<span>sharded_module = FSDP(my_module)<\/span><br \/>\n<span>optim = torch.optim.Adam(sharded_module.parameters(), lr=0.0001)<\/span><br \/>\n<span>for sample, label in dataload.next_batch:<\/span><br \/>\n<span>\u00a0\u00a0optim.zero_grad()<\/span><br \/>\n<span>\u00a0\u00a0out = sharded_module(x=sample, y=3, z=torch.Tensor([1]))<\/span><br \/>\n<span>\u00a0\u00a0loss = criterion(out, label)<\/span><br \/>\n<span>\u00a0\u00a0loss.backward()<\/span><br \/>\n<span>\u00a0\u00a0optim.step()<\/span><\/p>\n<p><span>The FSDP library in FairScale exposes low-level options for many important aspects of large-scale training. Here are a few important areas to consider when using FSDP to its full potential.<\/span><\/p>\n<p>Model wrapping: <span>To minimize transient GPU memory usage, users need to wrap a model in a nested fashion, which introduces additional complexity. The <\/span><a href=\"https:\/\/github.com\/facebookresearch\/fairscale\/blob\/master\/fairscale\/nn\/wrap\/auto_wrap.py\"><span>auto_wrap<\/span><\/a><span> utility is useful for annotating existing PyTorch model code for nested wrapping purposes.<\/span><br \/>\nModel initialization:<span> Unlike DDP, FSDP does <\/span>not<span> automatically synchronize model weights between GPU workers. This means model initialization must be done carefully so that all GPU workers have identical initial weights.<\/span><br \/>\nOptimizer settings:<span> Due to sharding and wrapping, only certain types of optimizers and optimizer settings are supported by FSDP. 
In particular, if a module is wrapped by FSDP and its parameters are flattened into a single tensor, users cannot use different hyperparameters for different parameter groups in such a module.<\/span><br \/>\nMixed precision:<span> FSDP supports advanced mixed precision training with FP16 master weights, as well as FP16 reduce-scatter on the gradients. Certain parts of a model may converge only if full precision is used. In those cases, additional wrapping is needed to selectively run parts of a model in full precision.<\/span><br \/>\nState checkpointing and inference:<span> When the model scale is large, saving and loading the model state can become challenging. FSDP supports several ways to make that task possible, but it is by no means trivial.<\/span><br \/>\n<span>Finally, FSDP is often used together with <\/span>activation checkpointing<span> functions like <\/span><a href=\"https:\/\/github.com\/facebookresearch\/fairscale\/blob\/master\/fairscale\/nn\/checkpoint\/checkpoint_activations.py\"><span>checkpoint_wrapper<\/span><\/a><span> from FairScale. Users may need to carefully tune the activation checkpointing strategy to fit a large model within limited GPU memory.<\/span><\/p>\n<h2><span>Next steps<\/span><\/h2>\n<p><span>FSDP is open source, and early users have tried it and contributed to it. We think it can benefit the entire research community, and we look forward to working with everyone to make it better. In particular, we see the following as important areas:<\/span><\/p>\n<p>Making FSDP more general.<span> So far, FSDP has been used on both NLP and vision models with SGD and Adam optimizers. As newer models and optimizers emerge, FSDP needs to continue supporting them. Being a purely data-parallel training scheme, FSDP has the greatest potential to be general in supporting a wide range of AI algorithms.<\/span><br \/>\nMaking FSDP auto-tune. <span>There are many knobs that users can tune today with FSDP for both scaling and performance. 
We look forward to developing algorithms for auto-tuning both GPU memory usage and training performance.<\/span><br \/>\n<span>In addition to training, more <\/span>scalable inference<span> and model serving are important use cases that FSDP might need to support.<\/span><br \/>\n<span>Last but not least, refactoring and continuing to <\/span>modularize FSDP<span> and its core components is as important as adding newer and better features.<\/span><\/p>\n<h2><span>Try it out and contribute!<\/span><\/h2>\n<p><span>FSDP is currently available directly from the <\/span><a href=\"https:\/\/github.com\/facebookresearch\/fairscale\"><span>FairScale library<\/span><\/a><span>.<\/span><\/p>\n<p><span>Thanks for sticking with us thus far. Please try FSDP in your research or production work. We would love to hear your feedback, and, as always, pull requests are welcome!<\/span><\/p>\n<p>The post <a href=\"https:\/\/engineering.fb.com\/2021\/07\/15\/open-source\/fsdp\/\">Fully Sharded Data Parallel: faster AI training with fewer GPUs<\/a> appeared first on <a href=\"https:\/\/engineering.fb.com\/\">Facebook Engineering<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Training AI models at a large scale isn\u2019t easy. Aside from the need for large amounts of computing power and resources, there is also considerable engineering complexity behind training very large models. At Facebook AI Research (FAIR) Engineering, we have been working on building tools and infrastructure to make training large AI models easier. 
Our&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2021\/08\/31\/fully-sharded-data-parallel-faster-ai-training-with-fewer-gpus\/\">Continue reading <span class=\"screen-reader-text\">Fully Sharded Data Parallel: faster AI training with fewer GPUs<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-332","post","type-post","status-publish","format-standard","hentry","category-technology","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":802,"url":"https:\/\/fde.cat\/index.php\/2023\/12\/12\/developing-the-new-xgen-salesforces-foundational-large-language-models\/","url_meta":{"origin":332,"position":0},"title":"Developing the New XGen: Salesforce\u2019s Foundational Large Language Models","date":"December 12, 2023","format":false,"excerpt":"By Shafiq Rayhan Joty and Scott Nyberg In our \u201cEngineering Energizers\u201d Q&A series, we examine the professional journeys that have shaped Salesforce Engineering leaders. Meet Shafiq Rayhan Joty, a Director at Salesforce AI Research. Shafiq co-leads the development of XGen, a series of groundbreaking large language models (LLMs) of different\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":879,"url":"https:\/\/fde.cat\/index.php\/2024\/06\/12\/how-meta-trains-large-language-models-at-scale\/","url_meta":{"origin":332,"position":1},"title":"How Meta trains large language models at scale","date":"June 12, 2024","format":false,"excerpt":"As we continue to focus our AI research and development on solving increasingly complex problems, one of the most significant and challenging shifts we\u2019ve experienced is the sheer scale of computation required to train large language models (LLMs). 
Traditionally, our AI model training has involved a training massive number of\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":787,"url":"https:\/\/fde.cat\/index.php\/2023\/11\/15\/watch-metas-engineers-on-building-network-infrastructure-for-ai\/","url_meta":{"origin":332,"position":2},"title":"Watch: Meta\u2019s engineers on building network infrastructure for AI","date":"November 15, 2023","format":false,"excerpt":"Meta is building for the future of AI at every level \u2013 from hardware like MTIA v1, Meta\u2019s first-generation AI inference accelerator to publicly released models like Llama 2, Meta\u2019s next-generation large language model, as well as new generative AI (GenAI) tools like Code Llama. Delivering next-generation AI products and\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":618,"url":"https:\/\/fde.cat\/index.php\/2022\/08\/10\/scaling-data-ingestion-for-machine-learning-training-at-meta\/","url_meta":{"origin":332,"position":3},"title":"Scaling data ingestion for machine learning training at Meta","date":"August 10, 2022","format":false,"excerpt":"Many of Meta\u2019s products, such as search and language translations, utilize AI models to continuously improve user experiences. As the performance of hardware we use to support training infrastructure increases, we need to scale our data ingestion infrastructure accordingly to handle workloads more efficiently. 
GPUs, which are used for training\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":806,"url":"https:\/\/fde.cat\/index.php\/2023\/12\/19\/ai-debugging-at-meta-with-hawkeye\/","url_meta":{"origin":332,"position":4},"title":"AI debugging at Meta with HawkEye","date":"December 19, 2023","format":false,"excerpt":"HawkEye is the powerful toolkit used internally at Meta for monitoring, observability, and debuggability of the end-to-end machine learning (ML) workflow that powers ML-based products. HawkEye supports recommendation and ranking models across several products at Meta. Over the past two years, it has facilitated order of magnitude improvements in the\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":599,"url":"https:\/\/fde.cat\/index.php\/2022\/06\/14\/applying-federated-learning-to-protect-data-on-mobile-devices\/","url_meta":{"origin":332,"position":5},"title":"Applying federated learning to protect data on mobile devices","date":"June 14, 2022","format":false,"excerpt":"What the research is: Federated learning with differential privacy (FL-DP) is one of the latest privacy-enhancing technologies being evaluated at Meta as we constantly work to enhance user privacy and further safeguard users\u2019 data in the products we design, build, and maintain. 
FL-DP enhances privacy in two important ways: It\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/332","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=332"}],"version-history":[{"count":1,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/332\/revisions"}],"predecessor-version":[{"id":378,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/332\/revisions\/378"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=332"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=332"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=332"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}