Scaling legal guidelines as mentioned above can solely be helpful in predicting the “pretraining check loss”, however not the precise duties or expertise that the mannequin will likely be good at. Builders can get assured that the mannequin will likely be higher, however solely god is aware of what truly.
This implies, scaling law-style predictions should not unreliable on the subject of predicting the talents the skilled mannequin will possess.
So,
When a lab invests in creating a brand new LLM, they’re primarily shopping for a “thriller field”
As soon as once more taking GPT for example. GPT-3 is the primary trendy LLM to indicate
- few-shot studying and
- chain-of-thought reasoning capabilities
Enjoyable reality — its few shot capabilities weren’t identified till it was skilled and its capability for chain-of-thought reasoning was found solely a number of months later as soon as it was accessible to the general public.