On March 12, the 800G Pluggable MSA group released its first 800G MSA white paper. The group was formed on September 5, 2019, to define interface specifications for 800G pluggable optical modules for data center applications. Its scope covers two lane configurations, 8x100G and 4x200G, and transmission distances of 100m, 500m, and 2km, as shown in Figure 1 below.

Figure 1

It is currently believed that demand for Ethernet transmission at 800 Gbps will first appear around 2021, but that the market will not begin to mature until 2023. In parallel, the QSFP-DD800 MSA group, led by Broadcom and Cisco, has also put considerable effort into standardizing 800G signaling.

The white paper, "ENABLING THE NEXT GENERATION OF CLOUD & AI USING 800GB/S OPTICAL MODULES", explains the evolution path of data center architectures and the optical interconnect requirements driven by cloud expansion. It discusses different 800G application scenarios from a technical point of view.

Figure 2

The white paper is available for download from the official MSA website: https://www.800gmsa.com/documents

Background – 800G is coming

The white paper points out that, according to related research, cloud applications, AR/VR, AI, and 5G applications will generate more and more traffic. This explosive growth in traffic will drive demand for higher bandwidth. As shown in Figure 3, global interconnect bandwidth capacity is expected to keep growing rapidly over the four-year period shown, at a compound annual growth rate of up to 48%.

Figure 3
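To give a feel for what a 48% compound annual growth rate implies, the short sketch below compounds a normalized baseline over four years. The baseline of 1.0 and the function name are illustrative assumptions; only the 48% rate and the four-year span come from the text above.

```python
def project_capacity(base: float, cagr: float, years: int) -> list:
    """Return the capacity at year 0, 1, ..., years under compound annual growth."""
    return [base * (1.0 + cagr) ** n for n in range(years + 1)]


# Assumption: capacity is normalized so that the starting year equals 1.0.
growth = project_capacity(base=1.0, cagr=0.48, years=4)

for year, value in enumerate(growth):
    print(f"year {year}: {value:.2f}x baseline")
```

At this rate, capacity roughly doubles every two years and ends the four-year span at a little under five times the starting value.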

The market reflects the same trend. As shown in Figure 4, LightCounting's forecast model indicates that 400G modules will grow rapidly over the next five years and that 2x400G/800G modules will appear around 2022. According to Dr. Vladimir Kozlov, CEO of LightCounting Market Research, cloud data center operators will deploy 800G modules by 2023 to 2024 to keep up with the growth of data traffic. Most of these modules will still be pluggable, although some implementations of co-packaged optics may also appear.

Figure 4

The architecture of cloud data centers is being challenged by the capacity scaling of switching ASICs, which is doubling approximately every two years, unfazed by talk about the end of Moore’s Law. Ethernet switches in commercial deployment today have a capacity of 12.8Tb/s, but they will be replaced by 25.6Tb/s switches within a year. The capacity evolution path of the switch is shown in Figure 5. This will put further pressure on the densification of optical interconnects, which do not scale at the speed of CMOS because they lack a common design methodology across the various components and a common large-scale process.

Figure 5
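The pressure on optical density can be illustrated with simple arithmetic: each time switch capacity doubles, either the number of optical ports or the rate per port has to double. The sketch below is only an illustration; the function names are hypothetical, and the third generation (51.2Tb/s) is an extrapolation of the doubling trend rather than a figure from the white paper.

```python
def projected_capacity_gbps(base_gbps: int, generations: int) -> int:
    """Switch capacity after a number of doublings (one doubling per generation)."""
    return base_gbps * 2 ** generations


def port_count(capacity_gbps: int, port_rate_gbps: int) -> int:
    """Number of front-panel ports needed to expose the full switch capacity."""
    return capacity_gbps // port_rate_gbps


BASE_GBPS = 12_800  # 12.8Tb/s, the currently deployed generation named in the text

for generation in range(3):
    capacity = projected_capacity_gbps(BASE_GBPS, generation)
    print(
        f"{capacity / 1000:.1f} Tb/s: "
        f"{port_count(capacity, 400)} x 400G ports or "
        f"{port_count(capacity, 800)} x 800G ports"
    )
```

Under these assumptions, a 25.6Tb/s switch needs 64 ports at 400G but only 32 ports at 800G, which is the same port count a 12.8Tb/s switch needs at 400G.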

In the past few years, the rapid adoption and price erosion of 100G short-reach optical modules based on direct detection and non-return-to-zero (NRZ) modulation have driven the rapid expansion of cloud services. The IEEE began work on 400GE-related standards in March 2011, yet large-scale deployment of 400G optical modules is only beginning in 2020, with stronger growth expected in 2021, as shown in Figure 4. In fact, 400G modules will initially be used mainly to carry 4x100G over 500m in DR4 applications and 2x200G over 2km with FR4 optics, without making use of the 400GbE MAC. At the same time, it seems unlikely that the IEEE will standardize 800GE optics in the short term. For at least the next two years, it will not be able to complete standardization of higher-density optics for transporting 8x100GbE or 2x400GbE, yet by then real demand for 800G will already have appeared. The industry therefore needs to formulate specifications that ensure interconnection and interoperability among 800G products from different manufacturers.

Data Center Architectures

Generally speaking, the structure and traffic characteristics of a data center differ from one application to another. For example, in a data center that offers XaaS to external customers, the main traffic is more likely to flow north-south, from client to server. In this case, the data center can be built as smaller, geographically distributed clusters. In a cloud computing or storage data center serving internal needs, however, traffic tends to flow east-west between servers, which generally requires very large clusters of data center resources with a higher radix. Even when the applications are similar, operators may still deploy different interconnect solutions, such as PSM4 or CWDM4, according to their own preferences. This has led to a diversity of data center architectures and technologies.

There are at least two main types of data center architecture. Figure 6 shows a typical three-layer data center architecture and its optical interconnect roadmap. A real data center will have more equipment than the figure shows, and its architecture will be larger and more complex. A convergence ratio of around 3:1 is assumed between adjacent layers; for example, a Spine switch may be connected to three Leaf switches, and so on. Coherent ZR interconnects are required to connect to other data centers (DCI scenarios). A hallmark of the 800G interface rate is that once the rate between the server and the TOR switch reaches 200G, the TOR-to-Leaf and Leaf-to-Spine layers would rely on PSM4 4x200G in a fan-out configuration.

Figure 6
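The convergence ratio of around 3:1 mentioned above can be turned into a quick sizing estimate for the uplinks of a single switch. The sketch below is a rough illustration, not a method from the white paper: the function name is hypothetical, and the figure of 48 servers per TOR switch is an assumption chosen only to make the arithmetic concrete.

```python
import math


def uplinks_needed(
    downlink_count: int,
    downlink_rate_gbps: int,
    uplink_rate_gbps: int,
    convergence_ratio: float,
) -> int:
    """Uplinks required so that downlink:uplink bandwidth does not exceed the ratio."""
    total_downlink_gbps = downlink_count * downlink_rate_gbps
    return math.ceil(total_downlink_gbps / (convergence_ratio * uplink_rate_gbps))


# Assumption: 48 servers per TOR switch (not stated in the source).
tor_uplinks = uplinks_needed(
    downlink_count=48,
    downlink_rate_gbps=200,   # server-to-TOR rate from the text
    uplink_rate_gbps=800,     # TOR-to-Leaf rate from the text
    convergence_ratio=3.0,    # approximately 3:1 between layers
)
print(f"800G uplinks per TOR switch: {tor_uplinks}")
```

Setting `convergence_ratio=1.0` models a fabric with no convergence between layers, in which uplink bandwidth must match downlink bandwidth.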

The TOR, Leaf, and Spine switches here correspond to what are commonly called the access layer, aggregation layer, and core layer. In a typical data center network (DCN), deploying servers with 200G of bandwidth calls for an 800G fabric, although some compromises (on bandwidth, transmission distance, and other resources) can be made to fit the construction budget of the data center. Table 1 shows the detailed reach requirements for each DCN layer.

Scenario | Server to TOR | TOR to Leaf | Leaf to Spine | DCI
Bandwidth | 200G | 800G | 800G | 800G
Distance | 4m within rack; 20m cross-rack | ≥70m (100m preferred) | 500m/2km | 80km-120km

Table 1 – Detailed requirements of the typical hyperscale DCN
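As a rough illustration of how the distances in Table 1 line up with the three reach classes named at the start of this article (100m, 500m, and 2km), the sketch below picks the shortest class that covers a given link length. The function and constant names are hypothetical, and the mapping is a simplification that ignores loss budgets and fiber type.

```python
from typing import Optional

# Reach classes in meters, taken from the 100m / 500m / 2km scope described in the article.
REACH_CLASSES_M = (100, 500, 2000)


def select_reach_class(link_length_m: int) -> Optional[int]:
    """Return the shortest reach class covering the link, or None if no class is long enough."""
    for reach in REACH_CLASSES_M:
        if link_length_m <= reach:
            return reach
    return None


# Link lengths per DCN layer, using the upper bounds from Table 1.
links = {
    "TOR to Leaf": 100,
    "Leaf to Spine": 2000,
    "DCI": 120_000,
}

for layer, length_m in links.items():
    reach = select_reach_class(length_m)
    label = f"{reach}m class" if reach is not None else "outside these reach classes"
    print(f"{layer} ({length_m}m): {label}")
```

The DCI case falls outside all three classes, which matches the earlier point that coherent ZR interconnects are needed between data centers.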

Given the massive computing needs of recently emerging AI applications, supercomputer clusters and AI-oriented data centers usually deploy a two-layer switching architecture, as shown in Figure 7. Because of the characteristics of AI computing, there is no convergence between the layers. The traffic of each server is already very large, so each server maps directly to a switch interface and uses dedicated bandwidth. The traffic characteristics of this type of AI or supercomputing data center network therefore differ from those of typical data centers: data flows are much larger and switching is less frequent.

Figure 7

Compared with the traditional three-layer data center network, this two-layer architecture is easier and faster to deploy and has lower latency, which makes it well suited to future AI or supercomputing DCNs. Table 2 shows the detailed requirements.

Scenario | Server to Leaf | Leaf to Spine
Bandwidth | 400G | 800G
Distance | 4m within rack; 20m cross-rack | 500m
Latency | 92ns (IEEE PMA layer) | 92ns (IEEE PMA layer)

Table 2 – Detailed requirements of the AI/HPC cluster DCN

However, for some small companies or small cloud data centers, the link between the Leaf switch and the server may not need as much bandwidth as 400G. The specific design therefore has to weigh the actual application scenario against cost.

Because data centers can end up over-built or under-built, rapid expansion, ease of operation, and cost are the issues many companies consider first. Enterprises want to deploy the most flexible solution and therefore often choose the data center hosting model. Hosting operators let users “pay on demand and expand gradually”: users can grow or shrink the rented space as needed and pay only the related usage fees. As a result, users have neither idle nor insufficient capacity, do not have to deal with facility-related problems, and can maximize the value of their IT investment.

Finally, Figure 8 shows Cisco’s latest VNI traffic forecast, which indicates that video traffic has accounted for a growing share of network traffic in recent years. By 2022, video-related traffic is expected to account for more than 80% of all Internet traffic. The continued rise of video services is accompanied by changes in the bearer network architecture and in traffic distribution. This is why data center networks have attracted so much attention in recent years.

With the build-out of content delivery networks (CDNs) and the move of data center networks (DCNs) closer to the network edge, content such as videos and files is cached closer to users, which provides lower latency and faster buffering. As a result, most traffic no longer needs to cross the long-distance backbone network and instead terminates within the short- and medium-distance metropolitan area network or the data center network. As early as 2017, a report pointed out that traffic on short- and medium-distance metro networks had exceeded traffic on the long-distance backbone. The data center, and especially the data center interconnect (DCI), is the most typical application of the metropolitan area network, so it is not surprising that it has become a hot topic in recent years.

Figure 8