In Depth

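U-Net was introduced in 2015 for biomedical image segmentation, and its name comes from the U-shaped diagram of its architecture. It consists of a contracting path (encoder) and an expansive path (decoder). The encoder captures context by repeatedly applying convolutions and downsampling the feature maps. The decoder enables precise localization by upsampling the feature maps back toward the input resolution. In the original design, which uses unpadded convolutions, the output mask is somewhat smaller than the input tile; padded variants return exactly to the input resolution. Skip connections copy the feature maps from each encoder stage to the decoder stage at the same resolution, where they are concatenated with the upsampled features to preserve fine-grained spatial detail.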
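To make the contracting and expansive paths concrete, the sketch below traces only the spatial side length of a feature map through a U-Net. It assumes the layout of the 2015 design: four downsampling stages, two 3x3 convolutions per stage, 2x2 max pooling, and 2x2 up-convolutions. The function name `unet_shapes` and its `padded` flag are hypothetical and exist only for illustration. With unpadded ("valid") convolutions, each conv block loses a border, so encoder features must be center-cropped before they are concatenated in the decoder.

```python
def unet_shapes(size, depth=4, convs_per_block=2, kernel=3, padded=False):
    """Trace the spatial side length through a U-Net (hypothetical helper).

    Returns (output_size, crops), where crops[i] is the border removed from
    each side of the i-th skip connection (deepest first) before concatenation.
    """
    # Each unpadded 3x3 conv trims 1 pixel per side; padded convs trim nothing.
    shrink = 0 if padded else convs_per_block * (kernel - 1)
    skips = []

    # Contracting path: conv block, remember the map for the skip, then 2x2 pool.
    for _ in range(depth):
        size -= shrink
        if size <= 0 or size % 2:
            raise ValueError("input size is incompatible with this depth")
        skips.append(size)
        size //= 2

    # Bottleneck conv block at the lowest resolution.
    size -= shrink
    if size <= 0:
        raise ValueError("input size is too small for this depth")

    # Expansive path: 2x up-convolution, crop the matching skip, conv block.
    crops = []
    for skip in reversed(skips):
        size *= 2
        crops.append((skip - size) // 2)  # border cropped from each side
        size -= shrink
    return size, crops
```

For a 572x572 input tile, `unet_shapes(572)` gives a 388x388 output with crops of 4, 16, 40, and 88 pixels per side, which matches the tile sizes reported for the original network. `unet_shapes(256, padded=True)` returns to 256 with no cropping, which is the behavior of the padded variants.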

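The architecture had a major impact on medical imaging because it could be trained effectively on very small annotated datasets, a common constraint in medical applications. The original paper relied on heavy data augmentation, including elastic deformations, to get the most out of only a few dozen training images. The skip connections let the decoder combine high-level semantic information from deep layers (what is in the image) with precise spatial information from shallow layers (where it is). This combination produces accurate pixel-level segmentation masks.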
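The merge that a skip connection performs can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions. Feature maps are `(channels, height, width)` arrays. Nearest-neighbour upsampling stands in for the learned 2x2 up-convolution, which in the real network also halves the channel count. The helper names are hypothetical. The coarse decoder features carry semantic information, and the cropped encoder features restore spatial detail. Both are stacked along the channel axis so that the following convolutions can use them together.

```python
import numpy as np

def center_crop(x, h, w):
    """Crop a (C, H, W) array to (C, h, w) around its center."""
    _, H, W = x.shape
    top, left = (H - h) // 2, (W - w) // 2
    return x[:, top:top + h, left:left + w]

def upsample2x(x):
    """Nearest-neighbour 2x upsampling (stand-in for a learned up-convolution)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_merge(decoder_feat, encoder_feat):
    """Upsample coarse decoder features, crop the encoder features to the
    same spatial size, and concatenate both along the channel axis."""
    up = upsample2x(decoder_feat)
    _, h, w = up.shape
    return np.concatenate([center_crop(encoder_feat, h, w), up], axis=0)

# Deepest skip of a 572x572 tile: 64x64 encoder map, 28x28 bottleneck output.
encoder = np.zeros((4, 64, 64))
decoder = np.ones((8, 28, 28))
merged = skip_merge(decoder, encoder)
print(merged.shape)  # (12, 56, 56)
```

The channel counts here are tiny for readability. Only the spatial sizes (64, 28, and 56) follow the original network.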

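Beyond medical imaging, U-Net has become a key component of modern generative AI. Many diffusion models, including Stable Diffusion 1.x, 2.x, and SDXL, use a U-Net as their core denoising network. At each step, the network predicts the noise present in a noisy input (for Stable Diffusion, a noisy latent representation rather than raw pixels). The sampler then uses that prediction to remove noise step by step until an image emerges. This application, far from the architecture's original purpose, has made U-Net central to many image generation, inpainting, and super-resolution systems. Some newer diffusion models, however, replace it with transformer-based denoisers.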
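To show what "predicting the noise" means, here is a minimal sketch of the DDPM-style formulation that noise-predicting diffusion models build on. It makes several assumptions. The noise schedule is the linear schedule from the original DDPM paper, not necessarily the one any particular Stable Diffusion release uses. All function names are hypothetical. An "oracle" that returns the true noise stands in for the trained U-Net. A noisy sample is a weighted mix of the clean image and Gaussian noise. If the denoiser predicts that noise correctly, the clean image can be recovered algebraically. In training, the U-Net is fit to minimize the squared error between its prediction and the true noise.

```python
import numpy as np

def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of (1 - beta_t) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def add_noise(x0, eps, alpha_bar_t):
    """Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def estimate_x0(x_t, eps_hat, alpha_bar_t):
    """Invert the forward process using a predicted noise eps_hat."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 8, 8))    # a toy "image"
eps = rng.standard_normal(x0.shape)    # the noise the denoiser must predict
alpha_bars = linear_alpha_bars()
t = 500
x_t = add_noise(x0, eps, alpha_bars[t])

def oracle_denoiser(x_t, t):
    """Stand-in for a trained U-Net: returns the true noise exactly."""
    return eps

eps_hat = oracle_denoiser(x_t, t)
x0_hat = estimate_x0(x_t, eps_hat, alpha_bars[t])
loss = np.mean((eps_hat - eps) ** 2)   # the training objective, zero for the oracle
print(np.allclose(x0_hat, x0))         # True
```

A real sampler does not jump straight to `x0_hat`. It takes many smaller steps from `t` down to 0, calling the U-Net at each step, but every step relies on the same noise prediction.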