In Depth

In-context learning (ICL) is the remarkable ability of large language models to adapt their behavior based on examples or instructions provided in the input prompt. Shown a few examples of input-output pairs, the model can infer the underlying pattern and apply it to new inputs, effectively "learning" a new task at inference time without any change to its underlying parameters.
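To make the mechanics concrete, here is a minimal sketch of few-shot prompt construction. The helper name `build_few_shot_prompt` and the antonym task are illustrative choices, not from the original text; the point is only that the examples are serialized into the prompt itself, with no parameter update anywhere.

```python
def build_few_shot_prompt(examples, query):
    """Format input-output pairs so a model can infer the pattern in-context."""
    # Each demonstration becomes a block the model sees before the query.
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    # The query is left open-ended; the model completes the final "Output:".
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)

# Hypothetical antonym task: three demonstrations, one new input.
examples = [("hot", "cold"), ("tall", "short"), ("fast", "slow")]
prompt = build_few_shot_prompt(examples, "bright")
print(prompt)
```

Fed to a sufficiently large model, a prompt like this typically elicits the antonym ("dim" or "dark") purely from the pattern in the demonstrations.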

This capability emerged as a surprising property of large-scale language models and was first prominently demonstrated in the GPT-3 paper (Brown et al., 2020). The model does not update its weights; instead, it uses its pre-trained knowledge to recognize the pattern in the demonstrations and apply similar reasoning to the new input. The effectiveness of in-context learning scales with model size, and smaller models often exhibit little or none of it.

In-context learning has profound practical implications: users can customize model behavior without the cost and complexity of fine-tuning. Prompt engineering, few-shot prompting, and chain-of-thought prompting are all techniques that leverage in-context learning. However, it has limitations: it is bounded by the context window size, performance can be sensitive to example selection and ordering, and it is less reliable than fine-tuning for complex or specialized tasks.