AI Models Are Undertrained by 100-1000 Times – AI Will Be Better With More Training Resources
The Chinchilla compute-optimal point for an 8B (8 billion parameter) model would be to train it for ~200B (billion) tokens (if you were only interested in getting the most "bang for the buck" w.r.t. model performance at that size). So this is training ~75X beyond that point, which is unusual, but personally, [Karpathy] thinks this is extremely welcome. (A quick sanity check of this arithmetic follows below.)
June 21, 2024
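As a back-of-envelope check of the quoted numbers, here is a minimal sketch in Python. Note the assumptions: the ~200B-token budget is taken from the quote, while the ~15T actual-token figure is inferred from the quoted ~75X multiplier (200B × 75) rather than stated in the excerpt itself.

```python
# Sanity check of the quoted Chinchilla arithmetic.
# Assumption: actual_tokens (~15T) is inferred from the ~75X multiplier,
# consistent with 200B * 75; it is not stated in the excerpt.

params = 8e9             # 8B-parameter model
optimal_tokens = 200e9   # ~200B tokens, the quoted compute-optimal budget
actual_tokens = 15e12    # ~15T tokens, inferred from the ~75X multiplier

tokens_per_param = optimal_tokens / params   # implied optimal ratio
multiplier = actual_tokens / optimal_tokens  # how far past the optimal point

print(f"Chinchilla-optimal ratio: ~{tokens_per_param:.0f} tokens/parameter")
print(f"Trained ~{multiplier:.0f}X beyond the compute-optimal point")
# -> Chinchilla-optimal ratio: ~25 tokens/parameter
# -> Trained ~75X beyond the compute-optimal point
```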