Intel’s Gaudi 3 Goes After Nvidia

Although the race to power the massive ambitions of AI companies might seem like it’s all about Nvidia, there is real competition going on in AI accelerator chips. The latest example: At Intel’s Vision 2024 event this week in Phoenix, Ariz., the company gave the first architectural details of its third-generation AI accelerator, Gaudi 3.

With the predecessor chip, the company touted how close its performance came to Nvidia’s top chip of the time, the H100, and claimed a superior price-to-performance ratio. With Gaudi 3, it’s pointing to large-language-model (LLM) performance where it can claim outright superiority. But looming in the background is Nvidia’s next GPU, the Blackwell B200, expected to arrive later this year.

Gaudi Architecture Evolution

Gaudi 3 doubles down on its predecessor Gaudi 2’s architecture, literally in some cases. Instead of Gaudi 2’s single chip, Gaudi 3 is made up of two identical silicon dies joined by a high-bandwidth connection. Each has a central region of 48 megabytes of cache memory. Surrounding that are the chip’s AI workforce—four engines for matrix multiplication and 32 programmable units called tensor processor cores. All that is surrounded by connections to memory and capped with media processing and network infrastructure at one end.

Intel says that all that combines to produce double the AI compute of Gaudi 2 using the 8-bit floating-point format that has emerged as key to training transformer models. It also provides a fourfold boost for computations using the BFloat16 number format.
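To make the lower-precision arithmetic concrete, here is a minimal, framework-level sketch, assuming PyTorch and plain BFloat16 autocasting. It is illustrative only: Gaudi-specific FP8 training goes through Intel’s own software stack, which the article does not describe, not through the generic code shown here.

```python
# Minimal sketch, assuming PyTorch (not Gaudi-specific): run the matmul-heavy
# parts of a small transformer layer in bfloat16 while numerically sensitive
# ops stay in FP32. FP8 training typically requires vendor libraries
# (e.g., Intel's Gaudi software stack) and is not shown here.
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
optimizer = torch.optim.AdamW(layer.parameters(), lr=1e-4)

x = torch.randn(32, 16, 256)        # (batch, sequence length, model width)
target = torch.randn(32, 16, 256)

for step in range(10):
    optimizer.zero_grad()
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        out = layer(x)
        loss = nn.functional.mse_loss(out, target)
    loss.backward()                 # gradients computed outside the autocast block
    optimizer.step()
```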

Gaudi 3 LLM Performance

Intel projects a 40 percent faster training time for the GPT-3 175B large language model versus the H100, and even better results for the 7-billion- and 8-billion-parameter versions of Llama 2.

For inference, the contest was much closer, according to Intel: the new chip delivered 95 to 170 percent of the H100’s performance for two versions of Llama. For the Falcon 180B model, though, Gaudi 3 achieved as much as a fourfold advantage. Unsurprisingly, the advantage was smaller against the Nvidia H200—80 to 110 percent for Llama and 3.8x for Falcon.

Intel claims more dramatic results when measuring power efficiency, where it projects as much as 220 percent of the H100’s value on Llama and 230 percent on Falcon.

“Our customers are telling us that what they find limiting is getting enough power to the data center,” says Intel’s Habana Labs chief operating officer Eitan Medina.

The energy-efficiency results were best when the LLMs were tasked with delivering a longer output. Medina puts that advantage down to the Gaudi architecture’s large-matrix math engines. These are 512 bits across. Other architectures use many smaller engines to perform the same calculation, but Gaudi’s supersize version “needs almost an order of magnitude less memory bandwidth to feed it,” he says.
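A rough, back-of-envelope sketch (the numbers are illustrative, not Intel’s) shows why a larger matrix engine eases the bandwidth requirement: for a square tile of a matrix multiplication, compute grows with the cube of the tile dimension while the data that must be moved grows only with the square, so bigger tiles do more math per byte fetched.

```python
# Back-of-envelope sketch with made-up tile sizes (not Intel's figures):
# arithmetic intensity of an N x N x N matrix-multiply tile. Compute grows
# as N^3 while data traffic grows as N^2, so a bigger engine performs more
# math per byte fetched and needs less memory bandwidth to stay busy.
def tile_arithmetic_intensity(n: int, bytes_per_element: int = 2) -> float:
    flops = 2 * n ** 3                        # one multiply and one add per MAC
    traffic = 3 * n * n * bytes_per_element   # read A and B, write C (ideal reuse)
    return flops / traffic

for n in (32, 64, 256):
    print(f"{n}x{n} tile: ~{tile_arithmetic_intensity(n):.1f} FLOPs per byte")
# 32x32 tile: ~10.7, 64x64 tile: ~21.3, 256x256 tile: ~85.3 FLOPs per byte
```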

Gaudi 3 Versus Blackwell

It’s speculation to compare accelerators before they’re in hand, but there are a couple of data points to compare, particularly memory and memory bandwidth. Memory has always been important in AI, and as generative AI has taken hold and popular models reach tens of billions of parameters in size, it has become even more critical.
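As a hypothetical illustration of why capacity matters (the figures below are generic arithmetic, not drawn from Intel’s or Nvidia’s disclosures), the weights alone of a model in the tens of billions of parameters quickly approach the capacity of a single accelerator’s on-package memory:

```python
# Illustrative arithmetic only (not from the article): memory footprint of a
# model's weights alone, ignoring activations, KV cache, and optimizer state,
# which add substantially more during training.
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    return params_billions * bytes_per_param   # 1e9 params x bytes / 1e9 bytes per GB

for params in (7, 70, 175):
    print(f"{params:>3}B parameters: ~{weight_memory_gb(params, 2):.0f} GB at 16 bits, "
          f"~{weight_memory_gb(params, 1):.0f} GB at 8 bits")
```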

Both chips make use of high-bandwidth memory (HBM), which is a stack of DRAM dies atop a control chip. In high-end accelerators, it sits inside the same package as the logic silicon, surrounding it on at least two sides. Chipmakers use advanced packaging, such as Intel’s EMIB silicon bridges or TSMC’s chip-on-wafer-on-substrate (CoWoS), to provide a high-bandwidth path between the logic and memory.

As the chart shows, Gaudi 3 has more HBM than the H100, but less than the H200, B200, or AMD’s MI300. Its memory bandwidth is also superior to the H100’s. Possibly important to Gaudi’s price competitiveness, it uses the less expensive HBM2e while the others use HBM3 or HBM3e; the memory is thought to account for a significant fraction of the tens of thousands of dollars the accelerators reportedly sell for.

One more point of comparison is that Gaudi 3 is made using TSMC’s N5 (sometimes called 5-nanometer) process technology. Intel has basically been a process node behind Nvidia for generations of Gaudi, so it’s been stuck comparing its latest chip to one that was at least one rung higher on the Moore’s Law ladder. With Gaudi 3, that gap is narrowing slightly. The new chip uses the same process as the H100 and H200. What’s more, instead of moving to 3-nanometer technology, the coming competitor Blackwell is built on a process called N4P. TSMC describes N4P as being in the same 5-nm family as N5 but delivering an 11 percent performance boost, 22 percent better efficiency, and 6 percent higher density.

In terms of Moore’s Law, the big question is what technology the next generation of Gaudi, currently code-named Falcon Shores, will use. So far the product has relied on TSMC technology while Intel gets its foundry business up and running. But next year Intel will begin offering its
18A technology to foundry customers and will already be using 20A internally. These two nodes bring the next generation of transistor technology, nanosheets, with backside power delivery, a combination TSMC is not planning until 2026.
