Over on Developmental Stages of GPTs, orthonormal mentions "it at least
reduces the chance of a hardware overhang."

An overhang is when you have had the ability to build transformative AI for
quite some time, but you haven't because no-one's realised it's possible. Then
someone does and surprise! It's a lot more capable than everyone expected.
I am worried we're in an overhang right now. I think we have the ability,
right now, to build an orders-of-magnitude more powerful system than we
already have, and I think GPT-3 is the trigger for 100x-larger projects at
Google and Facebook and the like, with timelines measured in months.
GPT-3 is the first AI system that has obvious, immediate, transformative
economic value. While much hay has been made about how much more expensive it is
than a typical AI research project, in the wider context of megacorp investment
it is insignificant.
GPT-3 has been estimated to cost $5m in compute to train, and - looking at the
author list and OpenAI's overall size - maybe another $10m in labour.

Google, Amazon and Microsoft each spend ~$20bn/year on R&D and another
~$20bn/year on capital expenditure; across the three of them, that's very
roughly ~$100bn/year. So dropping $1bn or more on scaling GPT up by another
factor of 100x is entirely plausible right now. All that's necessary is that
tech executives stop thinking of NLP as cutesy blue-sky research and start
thinking in terms of quarters-till-profitability.
A concrete example is Waymo, which is raising $2bn investment rounds - and
that's for a technology with a much longer road to market.
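
To make those magnitudes concrete, here is the back-of-the-envelope arithmetic
as a minimal sketch; every dollar figure is one of the rough estimates quoted
above, not a reported financial.

```python
# Back-of-the-envelope: GPT-3's cost against big-tech budgets.
# All figures are the rough estimates quoted above, in USD.

GPT3_COMPUTE = 5e6          # ~$5m estimated training compute
GPT3_LABOUR = 10e6          # ~$10m estimated labour
GPT3_TOTAL = GPT3_COMPUTE + GPT3_LABOUR

RND_EACH = 20e9             # ~$20bn/year R&D per company
CAPEX_EACH = 20e9           # ~$20bn/year capital expenditure per company
BIG_THREE_ANNUAL = 3 * (RND_EACH + CAPEX_EACH)  # ~$120bn/year, "very roughly ~$100bn"

# Naive linear scaling of the compute bill to a 100x-larger run.
scaled_100x = 100 * GPT3_COMPUTE                # ~$0.5bn

print(f"GPT-3, all-in:      ${GPT3_TOTAL / 1e6:.0f}m")
print(f"Big-three spend:    ${BIG_THREE_ANNUAL / 1e9:.0f}bn/year")
print(f"GPT-3 as a share:   {GPT3_TOTAL / BIG_THREE_ANNUAL:.3%}")  # ~0.01%
print(f"100x compute bill:  ${scaled_100x / 1e9:.1f}bn")           # under 1% of a year's spend
```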
The other side of things is compute cost. The $5m GPT-3 training cost estimate
comes from using V100s at $10k/unit and 30 TFLOPS, which is the performance
without tensor cores being considered. Amortized over a year, this gives you
roughly $1,000/PFLOPS-day. But there, the price is driven up an order of
magnitude by NVIDIA's monopolistic cloud contracts, and the performance is
understated several-fold by the tensor cores being left out.
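
As a sanity check on that estimate, here is the amortization arithmetic as a
minimal sketch. The unit price, throughput and one-year amortization are the
figures above; the ~3.14e23 FLOP (~3,600 PFLOPS-days) of training compute is
the number reported in the GPT-3 paper, pulled in here as an assumption.

```python
# Amortized V100 cost per PFLOPS-day, from the figures quoted above.

UNIT_PRICE = 10_000        # $ per V100
TFLOPS = 30                # sustained throughput, tensor cores not considered
DAYS = 365                 # amortized over a year

cost_per_card_day = UNIT_PRICE / DAYS             # ~$27/day
pflops = TFLOPS / 1_000                           # 0.03 PFLOPS
cost_per_pflops_day = cost_per_card_day / pflops  # ~$913, i.e. roughly $1,000

# Assumption: ~3.14e23 FLOP of training compute, as reported in the
# GPT-3 paper; that works out to about 3,600 PFLOPS-days.
gpt3_pflops_days = 3.14e23 / (1e15 * 86_400)
gpt3_compute_cost = gpt3_pflops_days * cost_per_pflops_day

print(f"cost per PFLOPS-day: ${cost_per_pflops_day:,.0f}")      # ~$913
print(f"GPT-3 at that rate:  ${gpt3_compute_cost / 1e6:.1f}m")  # ~$3.3m, same ballpark as $5m
```

And both corrections named above - hardware cheaper than NVIDIA's list price,
and more throughput once the tensor cores are counted - push that figure down
further still.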