The Chinchilla compute-optimal point for an 8B (8 billion parameter) model would be to train it for ~200B (billion) tokens (if you were only interested in getting the most “bang-for-the-buck” w.r.t. model performance at that size). So this is training ~75X beyond that point, which is unusual but, personally, [Karpathy] thinks this is extremely welcome, because we all get a very capable model that is very small and easy to work with and run inference on. Meta mentions that even at this point, the model doesn’t seem to be “converging” in a standard sense. In other words, the LLMs we work with all the time are significantly undertrained, by a factor of maybe 100-1000X or more, nowhere near their point of convergence. [Karpathy] really hopes people carry this trend forward and start training and releasing even longer-trained, even smaller models.
Karpathy seems to be saying that with more compute, we can train models much closer to their full potential, yielding better AI performance even at small model sizes.
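For a rough sense of the numbers, here is a back-of-the-envelope sketch in Python. It assumes the commonly cited Chinchilla heuristic of roughly 20 training tokens per parameter and Meta's reported ~15 trillion training tokens for Llama 3; the exact constants vary by paper and setup, so treat this as illustrative arithmetic only.

```python
# Back-of-the-envelope check of the "~75X beyond Chinchilla-optimal" claim.
# Assumes the rough Chinchilla heuristic of ~20 training tokens per parameter
# and Meta's reported ~15 trillion training tokens for Llama 3.

params = 8e9                     # 8B parameters
tokens_per_param = 20            # approximate Chinchilla-style heuristic
chinchilla_optimal = params * tokens_per_param   # ~1.6e11 tokens (~160-200B)

llama3_tokens = 15e12            # ~15T training tokens reported by Meta

overshoot = llama3_tokens / chinchilla_optimal
print(f"Chinchilla-optimal tokens: ~{chinchilla_optimal / 1e9:.0f}B")
print(f"Llama 3 8B trained roughly {overshoot:.0f}x beyond that point")
# With ~160B optimal tokens this gives ~94x; using the ~200B figure above gives ~75x.
```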
Congrats to @AIatMeta on Llama 3 release!! 🎉 https://t.co/fSw615zE8S

Notes: Releasing 8B and 70B (both base and finetuned) models, strong-performing in their model class (but we’ll see when the rankings come in @ @lmsysorg :))

400B is still training, but already encroaching…

— Andrej Karpathy (@karpathy) April 18, 2024
If a large language model is undertrained by 1000 times, it means that the model has not been trained on a sufficient amount of data or for a sufficient number of iterations to reach its full potential. In other words, the model has not learned enough from the data to perform well on the tasks it was designed for.
To illustrate this, let’s use an analogy. Imagine you’re trying to learn a new language. If you only study for 10 minutes a day, it will take you much longer to become fluent than if you studied for 10 hours a day. Similarly, if a large language model is trained on a small dataset or for a short period of time, it will not be able to learn as much as it could if it were trained on a larger dataset or for a longer period of time.
The performance of a large language model is often measured in terms of its perplexity, which is a measure of how well the model predicts the next word in a sequence. A lower perplexity score indicates better performance. If a model is undertrained, its perplexity score will be higher than it could be if it were trained properly.
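To make the perplexity link concrete, here is a minimal sketch with made-up probabilities (not measurements from any real model). It shows that perplexity is just the exponential of the average per-token negative log-probability, so a model that assigns low probability to the correct next tokens ends up with a much higher perplexity.

```python
import math

# Minimal sketch: perplexity from per-token probabilities of the correct next word.
# The probabilities below are invented purely for illustration.

def perplexity(token_probs):
    """Perplexity = exp(average negative log-probability per token)."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A better-trained model assigns higher probability to each correct next token.
well_trained = [0.60, 0.45, 0.70, 0.55]
undertrained = [0.10, 0.05, 0.20, 0.08]

print(f"well-trained model perplexity:  {perplexity(well_trained):.1f}")  # ~1.8
print(f"undertrained model perplexity: {perplexity(undertrained):.1f}")   # ~10.6
```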
The amount of improvement that can be achieved by training a model properly depends on a variety of factors, including the size of the model, the quality of the data, and the specific task the model is being trained for. However, in general, it is possible for a model to achieve a significant improvement in performance if it is trained properly.
For example, a recent study found that increasing the size of a large language model from 1.5 billion parameters to 175 billion parameters can lead to a 10-fold improvement in performance on some tasks. This suggests that larger models can be more powerful than smaller ones, but only if they are trained properly.
In summary, if a large language model is undertrained by 1000 times, it means that the model has not been trained on a sufficient amount of data or for a sufficient number of iterations to reach its full potential. If the model were trained properly, it could potentially achieve a significant improvement in performance.
Together AI’s RedPajama dataset from October 2023 continues to hold the crown with 30 trillion tokens in roughly 125 terabytes. Notably, all major AI labs have now expanded beyond text into multimodal datasets, especially audio and video, for training frontier multimodal models like Gemini, Claude 3 Opus, GPT-4o, and beyond.
What is in one of the major 5 trillion token (20-30 Terabyte) text AI training datasets?
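The token counts and on-disk sizes quoted above line up with a rough rule of thumb of a few bytes of raw English text per token. A quick sanity check under that assumption (the exact ratio depends on the tokenizer and the data mix):

```python
# Rough sanity check relating raw text size on disk to token counts.
# Assumes ~4-5 bytes of English text per token, which is only a ballpark
# figure; the real ratio depends on the tokenizer and the data mix.

def approx_tokens(size_terabytes, bytes_per_token=4.5):
    return size_terabytes * 1e12 / bytes_per_token

for tb in (25, 125):
    print(f"{tb:>4} TB of text ≈ {approx_tokens(tb) / 1e12:.1f} trillion tokens")
# -> ~25 TB is on the order of 5T tokens; ~125 TB lands near 30T tokens.
```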