Published On Sep 13, 2024
Have we discovered an ideal gas law for AI? Head to https://brilliant.org/WelchLabs/ to try Brilliant for free for 30 days and get 20% off an annual premium subscription.
Welch Labs Book Ships December 2024: https://www.welchlabs.com/resources/i...
Welch Labs Posters: https://www.welchlabs.com/resources
Support Welch Labs on Patreon! / welchlabs
Special thanks to Patrons: Juan Benet, Ross Hanson, Yan Babitski, AJ Englehardt, Alvin Khaled, Eduardo Barraza, Hitoshi Yamauchi, Jaewon Jung, Mrgoodlight, Shinichi Hayashi, Sid Sarasvati, Dominic Beaumont, Shannon Prater, Ubiquity Ventures, Matias Forti, Brian Henry, Tim Palade, Petar Vecutin
Learn more about Welch Labs! https://www.welchlabs.com
TikTok: / welchlabs
Instagram: / welchlabs
REFERENCES
First 2020 OpenAI Scaling Paper: https://arxiv.org/pdf/2001.08361
GPT-3 Paper: https://arxiv.org/pdf/2005.14165
Second 2020 OpenAI Scaling Paper: https://arxiv.org/pdf/2010.14701
Google DeepMind “Chinchilla Scaling” Paper: https://arxiv.org/abs/2203.15556
Nice summary of Chinchilla Scaling: https://www.lesswrong.com/posts/6Fpvc...
GPT-4 Technical Report: https://arxiv.org/pdf/2303.08774
Nice Neural Scaling Laws Summary: https://www.lesswrong.com/posts/Yt5wA...
Explaining Neural Scaling Laws: https://arxiv.org/pdf/2102.06701
A Neural Scaling Law from the Dimension of the Data Manifold: https://arxiv.org/pdf/2004.10802
High Cost of Training GPT-4: https://www.wired.com/story/openai-ce...
Nvidia V100 FLOPs: https://lambdalabs.com/blog/demystify...
Nvidia V100 Original Price: https://www.microway.com/hpc-tech-tip...
Great paper on scaling up training infrastructure: https://arxiv.org/pdf/2104.04473
Eight Things to Know about LLMs: https://arxiv.org/abs/2304.00612
Emergent Properties of LLMs: https://arxiv.org/abs/2206.07682
Theoretical Motivation for Cross Entropy (Section 6.2): https://www.deeplearningbook.org/
Some papers that appear to pass the compute-efficient frontier
https://arxiv.org/pdf/2206.14486
https://arxiv.org/abs/2210.11399
Leaked GPT-4 training info
https://patmcguinness.substack.com/p/...
https://www.semianalysis.com/p/gpt-4-...
https://epochai.org/blog/tracking-lar...