A piece of news about support for a domestic chip data format has unexpectedly become a focal point of recent discussion in AI circles. Behind it lies the long-standing adaptation problem between domestic large models and the domestic computing infrastructure.
Adaptation dilemma and performance loss

For a long time, large-model development relied on mainstream foreign chips, and algorithms and tool chains were designed around those chips. When domestic vendors try to port models to domestic GPUs, they often run into compatibility issues that sharply reduce computing efficiency.
This performance loss stems less from insufficient raw compute on the chip than from an immature software stack and missing low-level optimization. Model vendors have to invest heavily in secondary adaptation, a process that is both time-consuming and labor-intensive, and that still rarely reaches the performance of native support, creating a development bottleneck.
The cost reduction significance of FP8 format
Both the training and the inference of large models involve processing massive amounts of data. Traditionally this is done in high-precision formats such as FP16 and FP32, which place extremely high demands on storage and compute. FP8 is a low-precision format that significantly reduces both storage footprint and transmission bandwidth requirements.

In a domestic environment where computing resources are relatively tight, adopting a low-precision format such as FP8 is a pragmatic choice: it sharply reduces computing cost while preserving model quality as much as possible, allowing limited resources to support larger models or more frequent inference tasks.
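The storage saving described above can be illustrated with a minimal sketch. Real FP8 support (e.g. the E4M3/E5M2 formats) is hardware- and library-specific; here, as a stand-in assumption, int8 is used purely as an 8-bit container with a per-tensor scale, which is enough to show the 4x reduction versus FP32 (2x versus FP16):

```python
import numpy as np

def quantize_8bit(x: np.ndarray):
    """Symmetric per-tensor quantization into an 8-bit container.

    Note: int8 is a stand-in here; native FP8 keeps a floating-point
    bit layout (sign/exponent/mantissa) rather than fixed-point codes.
    """
    scale = np.abs(x).max() / 127.0  # map largest magnitude to 127
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from 8-bit codes."""
    return q.astype(np.float32) * scale

# A hypothetical weight matrix in FP32.
weights = np.random.randn(1024, 1024).astype(np.float32)
q, scale = quantize_8bit(weights)

print(f"fp32 storage: {weights.nbytes} bytes")
print(f"8-bit storage: {q.nbytes} bytes")
print(f"reduction vs fp32: {weights.nbytes // q.nbytes}x")  # 4x
```

The same factor applies to bandwidth: moving 8-bit tensors between memory and compute units transfers a quarter of the bytes of FP32, which is exactly why tight-compute environments benefit.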
From format support to collaborative design
Recent industry moves show that the focus has shifted: rather than simply asking chips to support a given format, the industry is moving toward deeper collaborative optimization. Domestic chip makers have announced products that natively support FP8 and have reached mass production, laying a hardware foundation for ecosystem building.

Hardware support for a format alone, however, is not enough. The real breakthrough is that model makers and chip makers are beginning to cooperate from the hardware design stage and to optimize the tool chain together. This lets algorithm design move closer to the chip's hardware characteristics, unlocking greater performance potential.
The core value of tool chain optimization
The key value of such a technology, then, may lie not in its use of a cutting-edge data format but in the customized tool-chain design philosophy for domestic chips that it embodies. Such a design enables algorithms to more effectively "understand" and exploit the unique architecture of domestic chips.
This kind of collaboration changes the old adaptation model. Chip makers no longer have to passively adapt to rapidly iterating models after the hardware is finalized, and model makers can get involved early to keep their technical roadmap aligned with the direction of hardware development, improving overall R&D efficiency.
An ecological attempt of industry co-construction
Nor is this kind of collaboration an isolated case. Other domestic technology companies have recently launched new AI computing platforms focused on software-hardware co-optimization to make the most of computing resources. The goal is the same: higher computing efficiency under current hardware conditions.
More and more startups are investing directly in low-level optimization, building joint laboratories with chip makers or forming strategic partnerships. They have realized that, with chip development cycles spanning several years, getting involved in adaptation early is an effective way to cope with rapid model iteration.

Breaking through the limitations of single-point innovation
In practice, progress in any single link is unlikely to form a decisive advantage on its own. Whether it is the chip's manufacturing process or algorithmic innovation in the model, everything must be integrated at the system level. The so-called "hidden" skills of software optimization, compiler technology, and memory scheduling have become just as important.
The recent adjustments to Huawei's Pangu large model are also a reminder that, in global AI competition, Chinese companies can hardly win by going it alone. Future competitiveness will come from close collaboration and co-evolution across the complete ecosystem chain, from chips through frameworks, models, and applications.
If you were building a competitive domestic AI ecosystem, which key link would you tackle first? Share your views in the comments, and please like and share this article with friends who care about this topic.

