In Depth

GPT-1 (2018) demonstrated that unsupervised pre-training followed by task-specific fine-tuning could outperform training task-specific models from scratch. GPT-3 (2020) showed that scaling model size, data, and compute unlocked few-shot, in-context learning without any fine-tuning. GPT-4 (2023) added image input alongside text and further improved reasoning and instruction-following.