We show that a GPT-3 model can learn to express uncertainty about its own answers in natural language, without use of model logits. When given a question, the model generates both an answer and a level of confidence (e.g. "90% confidence" or "high confidence"). These levels map to probabilities that are well calibrated. The model also remains moderately calibrated under distribution shift, and is sensitive to uncertainty in its own answers, rather than imitating human examples. To our knowledge, this is the first time a model has been shown to express calibrated uncertainty about its own answers in natural language. For testing calibration, we introduce the CalibratedMath suite of tasks. We compare the calibration of uncertainty expressed in words ("verbalized probability") to uncertainty extracted from model logits. Both kinds of uncertainty are capable of generalizing calibration under distribution shift. We also provide evidence that GPT-3's ability to generalize calibration depends on pre-trained latent representations that correlate with epistemic uncertainty over its answers.