AI Models Are Undertrained by 100-1000 Times – AI Will Be Better With More Training Resources

The Chinchilla compute-optimal point for an 8B (8 billion parameter) model would be to train it for ~200B (billion) tokens (if you were only interested in getting the most “bang-for-the-buck” with respect to model performance at that size). So this is training ~75X beyond that point, which is unusual but, personally, [Karpathy] thinks this is extremely welcome, because we all get a very capable model that is very small and easy to work with and run inference on. Meta mentions that even at this point, the model doesn’t seem to be “converging” in the standard sense. In other words, the LLMs we work with all the time are significantly undertrained, by a factor of maybe 100-1000X or more, nowhere near their point of convergence. Actually, [Karpathy] really hopes people carry forward the trend and start training and releasing even more long-trained, even smaller models.
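As a rough sanity check on those figures, here is a back-of-the-envelope sketch. It assumes the common ~20 tokens-per-parameter reading of the Chinchilla result and Meta's reported ~15 trillion training tokens for Llama 3; both are approximations, not exact values from this article.

```python
# Back-of-the-envelope check of the numbers above.
# Assumptions: ~20 tokens per parameter as the Chinchilla-optimal rule of thumb,
# and ~15 trillion training tokens reportedly used for Llama 3.
params = 8e9                      # 8B-parameter model
chinchilla_tokens = 20 * params   # ~1.6e11, i.e. on the order of ~200B tokens
llama3_tokens = 15e12             # ~15T tokens

print(f"compute-optimal tokens: ~{chinchilla_tokens:.1e}")
print(f"overtraining factor:    ~{llama3_tokens / 200e9:.0f}x")  # ~75x, matching the figure cited
```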

Karpathy seems to be saying that with more training compute, we could train models much closer to their full potential, yielding better AI and better AI performance.

Congrats to @AIatMeta on Llama 3 release!! 🎉 https://t.co/fSw615zE8S
Notes:

Releasing 8B and 70B (both base and finetuned) models, strong-performing in their model class (but we’ll see when the rankings come in @ @lmsysorg :))
400B is still training, but already encroaching…

— Andrej Karpathy (@karpathy) April 18, 2024

If a large language model is undertrained by 1000 times, it means that the model has not been trained on a sufficient amount of data or for a sufficient number of iterations to reach its full potential. In other words, the model has not learned enough from the data to perform well on the tasks it was designed for.

To illustrate this, let’s use an analogy. Imagine you’re trying to learn a new language. If you only study for 10 minutes a day, it will take you much longer to become fluent than if you studied for 10 hours a day. Similarly, if a large language model is trained on a small dataset or for a short period of time, it will not be able to learn as much as it could if it were trained on a larger dataset or for a longer period of time.

The performance of a large language model is often measured in terms of its perplexity, which is a measure of how well the model predicts the next word in a sequence. A lower perplexity score indicates better performance. If a model is undertrained, its perplexity score will be higher than it could be if it were trained properly.
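As a minimal illustration of that relationship, the sketch below computes perplexity from next-token probabilities. The probabilities are made up; they simply stand in for whatever a model assigns to the true next token at each step.

```python
import math

# Minimal sketch of how perplexity relates to next-token prediction.
# p_next holds hypothetical probabilities a model assigned to each actual
# next token in a held-out sequence; a better-trained model assigns higher ones.
p_next = [0.40, 0.10, 0.25, 0.05, 0.30]

avg_nll = -sum(math.log(p) for p in p_next) / len(p_next)  # average negative log-likelihood
perplexity = math.exp(avg_nll)                             # lower is better

print(f"average NLL: {avg_nll:.3f}")
print(f"perplexity:  {perplexity:.2f}")
```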

The amount of improvement that can be achieved by training a model properly depends on a variety of factors, including the size of the model, the quality of the data, and the specific task the model is being trained for. However, in general, it is possible for a model to achieve a significant improvement in performance if it is trained properly.

For example, scaling a large language model from 1.5 billion parameters (roughly GPT-2 scale) to 175 billion parameters (GPT-3 scale) has been reported to yield on the order of a 10-fold improvement on some tasks. This suggests that larger models can be more powerful than smaller ones, but only if they are trained properly.

In summary, if a large language model is undertrained by 1000 times, it means that the model has not been trained on a sufficient amount of data or for a sufficient number of iterations to reach its full potential. If the model were trained properly, it could potentially achieve a significant improvement in performance.

Together AI’s RedPajama-Data-v2 dataset from Oct/2023 continues to hold the crown with 30 trillion tokens in 125 terabytes. Notably, all major AI labs have now expanded beyond text into multimodal datasets, especially audio and video, for training frontier multimodal models like Gemini, Claude 3 Opus, GPT-4o, and beyond.
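A quick consistency check of those headline numbers (purely arithmetic on the figures quoted above):

```python
# Rough check of the RedPajama-v2 headline figures quoted above.
tokens = 30e12        # ~30 trillion tokens
size_bytes = 125e12   # ~125 TB of raw text
print(f"~{size_bytes / tokens:.1f} bytes per token")  # ~4.2, plausible for tokenized web text
```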

What is in one of the major 5 trillion token (20-30 Terabyte) text AI training datasets?

