
How to Choose an AI Server Power Supply Unit (PSU)? AI Server Power Supply Solutions

2023-10-04


Overview

 

With the rapid development and widespread adoption of AI technology, the server market has undergone significant changes in recent years. The launch of OpenAI's ChatGPT has sparked a wave of interest in large language models and intelligent chatbots. The computation behind ChatGPT relies on powerful "AI servers," drawing new attention to the AI server market.

 

What is an AI server?

An AI server is a server specially designed and optimized for AI workloads. It typically contains one or more high-performance GPUs (Graphics Processing Units) or dedicated AI accelerators, such as Google's Tensor Processing Units (TPUs) or NVIDIA's AI accelerator cards, which provide a large amount of parallel processing power for AI applications. Software is an equally important component of an AI server: this may include operating systems optimized for AI and machine learning workloads, as well as the libraries and tools that support AI frameworks such as TensorFlow and PyTorch.
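As a small illustration of the software side, a framework such as PyTorch can enumerate the accelerators an AI server exposes before work is scheduled onto them. This minimal sketch assumes a CUDA-enabled PyTorch installation; on a machine without GPUs it simply reports none.

    import torch

    # List the accelerators visible to the framework (requires a CUDA-enabled
    # PyTorch build; on a CPU-only machine the count is simply zero).
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
    else:
        print("No CUDA-capable GPU visible to PyTorch")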

 

Why do we need AI servers? Applications of AI servers

We need AI servers because the computational demands of AI are extremely high. AI servers offer specially optimized hardware and software for storing and processing vast amounts of data, thus supporting the training and execution of AI models. The applications of AI servers are very diverse, including image and voice recognition, natural language processing, predictive analytics, personalized recommendation systems, autonomous driving (image recognition), and the medical field (intelligent diagnosis), among others.

 

Differences between AI servers and general servers

The latest-generation AI servers can consume up to 6,000 watts per unit, and data centers already account for about 2% of global energy use. Even so, AI servers complete workloads such as model training, AI inference, and generative AI (GAI) far faster than general servers, which makes them more energy-efficient and environmentally friendly for the same amount of work. The large language models behind AI training contain billions to hundreds of billions of parameters, a figure expected to exceed tens of trillions in 2024, making AI servers a key driver of technological advancement.

 

The main difference between AI servers and general servers lies in their design and purpose. General servers are mainly used for storing data, running programs, and providing network services, while AI servers are specifically designed for AI training and applications. Compared to general servers, AI servers are equipped with more powerful CPUs, GPUs, or other customized accelerators, giving them greater computing power, larger memory capacity, higher network bandwidth, and lower latency. They also feature advanced thermal management and highly efficient power modules. How, then, do AI servers differ from the general servers traditionally used? The table below compares them in more detail.

 

The differences between AI servers and general servers can be summarized as follows

 

                 | General Server               | Entry-level Accelerated Server         | High-end Accelerated Server
Workload         | Traditional Machine Learning | Inference, Generative AI               | Inference, Training
CPU              | 1 or 2 CPUs                  | 1 CPU                                  | 2 or more CPUs
Accelerator      | CPU built-in                 | 1-4 GPUs or other custom accelerators  | 4-10 GPUs or other custom accelerators
Memory           | Registered DDR memory        | Registered DDR memory + GDDR VRAM      | Registered DDR memory + HBM
Network Transfer | 10 or 25 Gbps Ethernet       | 100+ Gbps Ethernet                     | 400+ Gbps Ethernet NIC, InfiniBand
Power Module     | 1300 W~2000 W x2             | 2000 W x3 or 3000 W x4                 | 3000 W x6

 

 

The current mainstream AI accelerator is NVIDIA's H100, which adopts the Hopper architecture and is NVIDIA's ninth-generation data center GPU, delivering up to 30 times the performance of the previous-generation A100; it is particularly suitable for training large language models. However, its power consumption is also astonishing. According to the International Energy Agency (IEA), training a single AI model can consume more electricity than 100 households use in a year, indicating that data centers will become major electricity consumers in the future. AI servers prioritize system availability, because a power interruption during training can wipe out results. Therefore, AI servers require multiple high-power PSU modules operating in parallel to ensure uninterrupted operation (see the sketch below).
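To see why several modules are needed, the sketch below estimates how many PSUs of a given rating cover a server's power budget with one redundant (N+1) module. The wattage figures are illustrative assumptions, not FSP specifications.

    import math

    def psu_count_n_plus_1(system_power_w, psu_rating_w, derating=0.9):
        """Estimate PSU modules for an N+1 redundant configuration.
        derating: fraction of the rating each module is allowed to carry."""
        usable_per_psu = psu_rating_w * derating
        n = math.ceil(system_power_w / usable_per_psu)  # modules needed to carry the load
        return n + 1                                    # plus one redundant module

    # Hypothetical high-end AI server: 8 GPUs at about 700 W each plus roughly
    # 1.4 kW for CPUs, memory, NICs, and fans -- about 7 kW in total.
    print(psu_count_n_plus_1(7000, 3000))  # -> 4 modules (3 for the load + 1 spare)

Under these assumptions, a 7 kW server needs three 3 kW modules to carry the load and a fourth as the redundant spare.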

 

FSP has many years of experience in developing high-power PSU modules, and its product line supports both traditional general servers and the latest AI acceleration servers; several server brands have adopted FSP products. FSP PSU modules use fully digital control, and their efficiency meets the 80 PLUS Titanium standard. Operated in parallel, they can support high-end AI servers with 4 to 10 GPUs running simultaneously, meeting AI computing requirements while achieving significant energy savings.

 

In the future, we will continue to see more innovative products and new architectures in the field of AI acceleration computing to support the continuous development of artificial intelligence. These new technologies will make computing more efficient, but GPU power consumption will inevitably continue to increase, requiring more from PSU modules. FSP continues to monitor industry trends and launch corresponding products. If you would like to learn more, please visit https://www.fsp-group.com/en/product/IPCPSU.html.

 

Advantages and prospects of AI servers

With the demand for emerging technologies such as VR/AR, ultra-high resolution, and autonomous driving, the world has entered an era of explosive data growth. According to IDC statistics, global cloud data has grown from 4.4 ZB in 2013 to over 50 ZB by 2023, more than a tenfold increase. As global cloud data grows rapidly, AI servers, which excel at processing large amounts of data in parallel and serve as the fundamental equipment for storing vast amounts of data, are destined to become a battleground for various industries. The PSU inside an AI server is one of the crucial components that determine its performance, because the hardware inside an AI server has significantly higher power demands than that of a regular server. The PSU must therefore supply more power than a regular server's to drive these high-performance components, while also providing sufficient redundant capacity to handle load variations and avoid becoming a bottleneck.

 

The importance of AI server power supply (wattage, stability, reliability)

To understand how to select a suitable AI server power supply, one must first grasp its fundamentals. For dependable operation, AI servers rely on robust and stable PSUs. The power supply is the component responsible for converting alternating current (AC) from the electrical grid into the direct current (DC) needed by the server's electronic components. In a high-performance AI server, the PSU must deliver ample, stable power to drive the CPUs, GPUs, or AI accelerators. Power supply efficiency is equally important, as it directly influences overall energy consumption and heat dissipation requirements: a highly efficient power supply minimizes energy waste and reduces the cooling load, helping the server stay stable under high-load operation. An efficient power supply also tends to last longer and reduces server downtime caused by power-related issues.
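To make the efficiency point concrete, the following sketch converts an assumed DC load and efficiency into the AC input power and the waste heat the cooling system must remove; the numbers are illustrative, not measurements.

    def conversion_loss(dc_load_w, efficiency):
        """Return (AC input power, waste heat) for a PSU at a given efficiency."""
        ac_input_w = dc_load_w / efficiency   # power drawn from the grid
        heat_w = ac_input_w - dc_load_w       # the difference is dissipated as heat
        return ac_input_w, heat_w

    # Assumed example: a 2,400 W DC load served at 96% versus 90% efficiency.
    for eff in (0.96, 0.90):
        ac, heat = conversion_loss(2400, eff)
        print(f"{eff:.0%}: input {ac:.0f} W, waste heat {heat:.0f} W")
    # 96%: input 2500 W, waste heat 100 W
    # 90%: input 2667 W, waste heat 267 W

At 90% efficiency the same load produces more than two and a half times the waste heat of a 96% unit, which translates directly into extra cooling demand.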

 

How to choose an AI server PSU?

After understanding the importance of AI server power supply units (PSUs), let's now look at how to choose a good PSU. We can consider factors such as power requirements, efficiency rating, stability and reliability, protection mechanisms, connectors and dimensions, heat dissipation, and noise.

Power Requirements

First, ensure that the PSU provides enough power to meet the needs of all hardware and to prevent power fluctuations or interruptions that could lead to system failure. It is generally better to choose a wattage slightly above the calculated requirement rather than one that only just meets it (see the sizing sketch below). The more complex the computing system, the higher the required wattage; the total PSU wattage of an AI server can reach 18 kW.
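A minimal sizing sketch is shown below; it sums made-up worst-case component figures and adds a safety margin before choosing a total PSU wattage. None of the numbers come from a datasheet.

    # Hypothetical component power budget (illustrative figures only).
    components_w = {
        "CPUs (x2)": 700,
        "GPUs (x8)": 5600,
        "Memory + storage": 600,
        "NICs + fans + misc": 500,
    }
    peak_load_w = sum(components_w.values())
    headroom = 1.2  # roughly 20% margin for transients and ageing
    required_w = peak_load_w * headroom
    print(f"Peak load: {peak_load_w} W")                          # 7400 W
    print(f"Recommended total PSU wattage: {required_w:.0f} W")   # 8880 W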

Efficiency Rating

Efficiency rating is also an important consideration. Low-efficiency products waste electricity, generate more heat, and may shorten the lifespan of the PSU. The most widely used rating system is the "80 PLUS" certification, which guarantees that the power supply delivers at least 80% efficiency at typical loads, with higher tiers certifying progressively higher efficiency.

 

80 PLUS Rating

Rating           | Typical Load Efficiency
80 PLUS Bronze   | 85%
80 PLUS Silver   | 89%
80 PLUS Gold     | 92%
80 PLUS Platinum | 94%
80 PLUS Titanium | 96%

Source: Intel, compiled by the author
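To see what these percentages mean in practice, the sketch below converts the typical efficiencies above into annual input energy for an assumed constant load; the load figure is an assumption chosen only for comparison.

    HOURS_PER_YEAR = 8760
    LOAD_W = 2000  # assumed constant DC load used purely for comparison
    RATINGS = {"Bronze": 0.85, "Silver": 0.89, "Gold": 0.92,
               "Platinum": 0.94, "Titanium": 0.96}
    for name, eff in RATINGS.items():
        input_kwh = LOAD_W / eff * HOURS_PER_YEAR / 1000        # drawn from the grid
        waste_kwh = input_kwh - LOAD_W * HOURS_PER_YEAR / 1000  # lost as heat
        print(f"80 PLUS {name:8s}: {input_kwh:6.0f} kWh/year in, {waste_kwh:5.0f} kWh wasted")

Under these assumptions, moving from Bronze to Titanium saves roughly 2,300 kWh per PSU per year, a difference that multiplies across the several modules inside one AI server.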

 

 

Stability and Reliability

A good power supply unit (PSU) must offer excellent stability and reliability to fulfill its protective role. Reputable PSU manufacturers validate their products with a range of tests, including output voltage adjustment, line regulation, and load regulation.
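As an illustration of one such test, the sketch below computes load regulation, the change in output voltage between light and full load expressed as a percentage of the nominal output; the measurement values are assumed.

    def load_regulation_pct(v_light_load, v_full_load, v_nominal):
        """Load regulation: voltage change from light to full load, as % of nominal."""
        return (v_light_load - v_full_load) / v_nominal * 100

    # Assumed measurements on a 12 V rail (illustrative only).
    print(f"{load_regulation_pct(12.10, 11.95, 12.0):.2f}%")  # -> 1.25%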

Protection Mechanisms

Make sure the PSU provides protection functions such as Over Current Protection (OCP), Over Temperature Protection (OTP), and Over Voltage Protection (OVP). These three functions are built-in safety features that prevent hardware damage and keep the system running stably. OCP shuts down or limits the PSU when the output current exceeds its limit, preventing hardware damage; OTP shuts the PSU down automatically when its internal temperature is too high, preventing overheating; and OVP shuts down or limits the PSU when the output voltage exceeds its limit, protecting the hardware from over-voltage damage.
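The behavior of these protections can be pictured as a simple supervision loop. The sketch below is purely conceptual, with made-up thresholds; real PSUs implement these checks in analog circuitry or dedicated firmware rather than host-side code.

    # Conceptual OCP/OTP/OVP check with illustrative thresholds for a 12 V output.
    OCP_LIMIT_A = 250.0
    OTP_LIMIT_C = 105.0
    OVP_LIMIT_V = 13.2

    def check_protections(current_a, temperature_c, voltage_v):
        if current_a > OCP_LIMIT_A:
            return "OCP: latch off the output to prevent over-current damage"
        if temperature_c > OTP_LIMIT_C:
            return "OTP: shut down until the PSU has cooled"
        if voltage_v > OVP_LIMIT_V:
            return "OVP: latch off the output to protect downstream hardware"
        return None

    fault = check_protections(current_a=180.0, temperature_c=72.0, voltage_v=12.1)
    print(fault or "All outputs within limits")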

Connectors and Dimensions

PSU cable designs fall into three main types: fully modular, semi-modular, and non-modular. The difference lies in whether the cables can be detached, which affects how flexibly the cabling can be customized. Choosing the right connectors is also important for correctly connecting the motherboard and other hardware components. In addition, PSUs come in many different sizes; choosing the right one ensures there is enough space inside the chassis to accommodate it.

Heat Dissipation and Noise

A PSU relies on a cooling fan for heat dissipation, so the fan's cooling performance and the noise it generates are both considerations when choosing a PSU. A faster fan usually cools better but is also louder, so buyers must balance the two, or choose a PSU from a professional manufacturer that has already tuned this trade-off well.
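One common way designers balance cooling against noise is a fan curve that raises fan speed with temperature. The sketch below is a toy illustration with assumed breakpoints, not the curve of any actual FSP product.

    def fan_duty_pct(temp_c):
        """Toy fan curve: quiet at low temperature, ramping linearly to full speed."""
        if temp_c <= 40:
            return 30.0                              # minimum duty for low noise
        if temp_c >= 70:
            return 100.0                             # full speed under heavy thermal load
        return 30.0 + (temp_c - 40) / 30 * 70.0      # linear ramp between 40°C and 70°C

    for t in (35, 50, 65, 75):
        print(f"{t}°C -> {fan_duty_pct(t):.0f}% fan duty")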

 

FSP AI server power supply solutions

Considering these requirements, FSP has introduced a series of AI server power supplies, with the most representative being the FSP3000-20FE. This product features extremely low total harmonic distortion (iTHD), an operating temperature range of 0 to 55°C, and is designed to operate at altitudes of up to 5,000 meters, providing a total power capacity of up to 3,000 watts. In other words, even in harsh and extreme environments, this product can continue to perform, showcasing its strong adaptability. Moreover, this product incorporates circuit protection designs, including OCP, OTP, and OVP, along with output short-circuit protection and a resettable power shut-off feature that allows it to communicate with the motherboard, ensuring reliable safety. With these protections in place, concerns about overheating or system failures due to excessive loads are a thing of the past. For AI servers, FSP has also introduced several excellent products, such as the YSEC1600AM-2A00P10 and YSEC2000AM-2A00P10. These are specialized PSUs for AI servers, boasting 80 PLUS® Platinum certification with an efficiency rating of up to 94% and built-in PMBus 1.2 technology. Their advantage lies in their compact size, making them suitable for installation in edge computing devices. Furthermore, due to their high conversion efficiency and excellent heat dissipation performance, they ensure stable operation of both the power supply and edge computing devices during long-term usage.
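Because these PSUs expose telemetry over PMBus, host software can poll them for power, current, and temperature readings. The sketch below shows the general idea using standard PMBus command codes and LINEAR11 decoding; the bus number and device address are placeholders, and the use of the smbus2 library is an assumption about the host setup, not FSP-provided tooling.

    from smbus2 import SMBus

    # Standard PMBus command codes (defined by the PMBus specification).
    READ_TEMPERATURE_1 = 0x8D
    READ_POUT = 0x96
    READ_PIN = 0x97

    def linear11(raw):
        """Decode a PMBus LINEAR11 word: 5-bit exponent, 11-bit mantissa (two's complement)."""
        exponent = (raw >> 11) & 0x1F
        mantissa = raw & 0x7FF
        if exponent > 0x0F:
            exponent -= 0x20
        if mantissa > 0x3FF:
            mantissa -= 0x800
        return mantissa * (2 ** exponent)

    # Placeholder bus number and address; both depend on the server's I2C topology.
    with SMBus(1) as bus:
        addr = 0x58
        pout = linear11(bus.read_word_data(addr, READ_POUT))
        pin = linear11(bus.read_word_data(addr, READ_PIN))
        temp = linear11(bus.read_word_data(addr, READ_TEMPERATURE_1))
        print(f"Output {pout:.0f} W, input {pin:.0f} W, temperature {temp:.1f} °C")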

 

In the future, with the further development of edge computing and AI technology, the demand for these hardware components will continue to rise. Among them, AI servers and their PSUs will play increasingly important roles in the future. Choosing the right power supply ensures not only the smooth operation of AI servers but also provides assurance for the ongoing development and innovation of AI applications.

 

IPC PSU: FSP3000-20FE
  • Low iTHD
  • Working temperature: 0 to 55°C
  • Designed for operation at up to 5,000 meters above sea level
  • Supports OCP, OTP, and OVP circuit protection
  • Short-circuit protection on all outputs
  • Resettable power shut down
  • MTBF: 250K hours of continuous operation at 40°C, 75% output load
YSEC1600AM-2A00P10
  • N+1 redundancy
  • Supports PMBus 1.2
  • High power density: 39.5 W/in³
  • Applications: IPC / Storage / Embedded server / Networking
YSEC2000AM-2A00P10
  • N+1 redundancy
  • Supports PMBus 1.2
  • High power density: 48.3 W/in³
  • Applications: IPC / Storage / Embedded server / Networking

 

Learn more about edge computing applications

 

About FSP

FSP Group is one of the world's leading power supply manufacturers. Since 1993, FSP Group has followed the management philosophy of "service, profession, and innovation" to fulfill its responsibilities as a green energy solution provider.
