In Depth

Semantic segmentation assigns a class label to every pixel in an image, producing a dense prediction map. Unlike object detection (which draws bounding boxes) or classification (which labels the whole image), segmentation provides pixel-precise understanding of scene composition. Every pixel is labeled as road, sidewalk, person, car, sky, building, or whatever categories the model is trained on.
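The idea of a dense prediction map can be sketched in a few lines: a segmentation model ends with per-pixel class scores, and taking the argmax at each pixel yields the label map. The class names and the tiny score grid below are illustrative placeholders, not from any real model or dataset.

```python
# Minimal sketch: turning per-pixel class scores into a dense label map.
# CLASSES and the 2x3 "image" below are made up for illustration.
CLASSES = ["road", "person", "sky"]

def dense_labels(scores):
    """scores[y][x] is a list of per-class scores for that pixel;
    the label map assigns each pixel the index of its highest score."""
    return [[max(range(len(px)), key=px.__getitem__) for px in row]
            for row in scores]

scores = [
    [[0.9, 0.05, 0.05], [0.1, 0.2, 0.7], [0.2, 0.1, 0.7]],
    [[0.8, 0.1, 0.1],   [0.3, 0.6, 0.1], [0.5, 0.3, 0.2]],
]
label_map = dense_labels(scores)
# label_map has the same height and width as the input: one class index per pixel,
# e.g. label_map[0][0] == 0 ("road") and label_map[0][1] == 2 ("sky").
```

A real network produces these scores with a convolutional decoder at full image resolution; the per-pixel argmax step at the end is the same.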

Key architectures include U-Net (especially in medical imaging), DeepLab (using dilated convolutions, with conditional random fields for refinement in its early versions), and Segment Anything Model (SAM) from Meta, which produces promptable, class-agnostic masks for arbitrary objects without task-specific training. The task comes in several variants: semantic segmentation (classify every pixel by category), instance segmentation (distinguish individual objects within the same category), and panoptic segmentation (combining both).
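The semantic/instance distinction can be made concrete with a toy example. A semantic mask marks every pixel of a category identically; splitting it into separately numbered connected blobs mimics what instance output looks like. Note this is only an illustration of the output formats, not how instance segmentation models actually work (they predict instances directly); the flood-fill labeling below is a standalone sketch.

```python
# Toy illustration of semantic vs. instance output.
# A binary semantic mask marks all pixels of one category (say "car") alike;
# connected-component labeling splits it into separately numbered instances.
def instances(mask):
    """mask is a 2D list of 0/1; returns a map where each 4-connected
    blob of 1s gets its own instance id (1, 2, ...)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    next_id = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not out[y][x]:
                next_id += 1                    # start a new instance
                stack = [(y, x)]
                while stack:                    # flood fill the blob
                    cy, cx = stack.pop()
                    if 0 <= cy < h and 0 <= cx < w and mask[cy][cx] and not out[cy][cx]:
                        out[cy][cx] = next_id
                        stack += [(cy + 1, cx), (cy - 1, cx),
                                  (cy, cx + 1), (cy, cx - 1)]
    return out

semantic = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
# Two disconnected "car" regions become instances 1 and 2:
# instances(semantic) -> [[1, 1, 0, 0], [0, 0, 0, 2], [0, 0, 2, 2]]
```

Panoptic segmentation combines the two views: every pixel carries both a category label and, for countable categories, an instance id.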

Semantic segmentation is essential for applications requiring precise spatial understanding. In autonomous driving, it helps vehicles understand road layout and distinguish drivable areas from obstacles. In medical imaging, it delineates tumor boundaries or organ structures. In satellite imagery, it maps land use, vegetation, and urban development. In augmented reality, it enables realistic occlusion and placement of virtual objects in real scenes.