Computer vision is the field of AI that enables machines to interpret and understand visual information from the world — images, videos, and real-time camera feeds. It gives computers the ability to "see" and make decisions based on what they observe.

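Computer vision has matured rapidly. On the ImageNet image classification benchmark, the best systems in 2011 had a top-5 error rate of about 26%. AlexNet cut that to roughly 16% in 2012, and by 2015 deep networks had dropped below the roughly 5% error estimated for a human annotator on the same benchmark. By 2020, the best reported top-5 error rates were below 2%.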

Key applications of computer vision include:

Object detection and recognition identifies what's in an image and where. Self-driving cars use this to detect pedestrians, vehicles, traffic signs, and lane markings in real time, processing 20-30 frames per second, which leaves roughly 33-50 ms to analyze each frame. Retail stores use it for automated checkout: Amazon's Just Walk Out technology tracks the items shoppers pick up using ceiling-mounted cameras.
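
To make "what's in an image and where" concrete, here is a minimal sketch of how a detector's output is often represented and cleaned up. The `Detection` class, the example boxes, and the 0.5 overlap threshold are illustrative assumptions, not the output format of any particular product. Detectors commonly emit several overlapping boxes for one object, and a standard post-processing step called non-maximum suppression keeps only the highest-scoring box in each overlapping group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str    # what the object is, e.g. "pedestrian"
    score: float  # model confidence in [0, 1]
    box: tuple    # where it is: (x_min, y_min, x_max, y_max) in pixels


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the highest-scoring box among same-label boxes that overlap heavily."""
    kept = []
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        # Drop det only if an already-kept box has the same label and high overlap.
        if all(det.label != k.label or iou(det.box, k.box) < iou_threshold for k in kept):
            kept.append(det)
    return kept


# Hypothetical raw detector output for one frame.
raw = [
    Detection("pedestrian", 0.92, (100, 50, 160, 200)),
    Detection("pedestrian", 0.75, (105, 55, 165, 205)),  # near-duplicate of the first
    Detection("traffic sign", 0.88, (300, 20, 340, 60)),
]
final = non_max_suppression(raw)
for det in final:
    print(det.label, det.score, det.box)
```

In this example the second pedestrian box overlaps the first with an IoU of about 0.80, so it is dropped and two detections remain.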

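Medical image analysis applies computer vision to X-rays, MRIs, CT scans, and pathology slides. In published studies, AI systems have detected diabetic retinopathy, certain cancers, and fractures with accuracy comparable to, and in some cases better than, that of specialist clinicians. Google's DeepMind, working with Moorfields Eye Hospital, developed a system that recommends referrals for more than 50 eye conditions from optical coherence tomography (OCT) retinal scans.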
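
Comparisons between AI systems and specialists are usually reported as sensitivity and specificity rather than as a single accuracy figure. The sketch below shows how those numbers are derived from screening results. The function name and the counts are made up for illustration and do not come from any study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute standard screening metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of diseased cases that were flagged
        "specificity": tn / (tn + fp),  # share of healthy cases that were cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical screening run: 1,000 scans, 100 of which show disease.
metrics = screening_metrics(tp=90, fp=45, tn=855, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, sensitivity is 0.90, specificity is 0.95, and accuracy is 0.945. Accuracy alone can mislead: a system that labeled every one of these scans healthy would still score 90% accuracy while catching no disease at all.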

Quality inspection in manufacturing catches defects that human inspectors miss. Factories using computer vision report defect detection rates above 99%, compared with 80-90% for manual inspection, although results vary with the product and the type of defect. An automated inspection system can also run 24/7 without fatigue.
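
A short worked calculation shows what the gap between those detection rates means on the factory floor. The production volume and the 1% defect rate are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def escaped_defects(units, defect_rate, detection_rate):
    """Expected number of defective units that pass inspection unnoticed."""
    return units * defect_rate * (1.0 - detection_rate)


UNITS = 1_000_000   # hypothetical production volume
DEFECT_RATE = 0.01  # hypothetical: 1% of units are defective

manual_low = escaped_defects(UNITS, DEFECT_RATE, 0.80)   # about 2,000 missed
manual_high = escaped_defects(UNITS, DEFECT_RATE, 0.90)  # about 1,000 missed
vision = escaped_defects(UNITS, DEFECT_RATE, 0.99)       # about 100 missed

print(f"manual inspection: {round(manual_high)} to {round(manual_low)} escaped defects")
print(f"computer vision:   {round(vision)} escaped defects")
```

Under these assumptions, moving from 90% to 99% detection cuts the number of defective units reaching customers by a factor of ten.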

Facial recognition powers phone unlocking, identity verification for banking, and airport security. Apple's Face ID projects more than 30,000 invisible infrared dots onto your face to build a depth map, which is why it works even in the dark.

Video analytics monitors security footage, tracks inventory, analyzes sports performance, and counts foot traffic. A single AI system can monitor hundreds of camera feeds simultaneously, a workload that would otherwise require a large team of human operators.

Document processing extracts information from forms, receipts, invoices, and handwritten notes. The text-reading step is called Optical Character Recognition (OCR). Modern OCR engines use neural networks and can exceed 99% character accuracy on clean printed text, while handwriting and low-quality scans remain harder.
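
Accuracy figures for OCR are typically measured per character by comparing the engine's output with a hand-checked reference. The sketch below uses edit distance (the number of inserted, deleted, or substituted characters) for that comparison. The sample strings are invented, and the function names are illustrative rather than part of any OCR library.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum single-character edits turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a character from ref
                            curr[j - 1] + 1,      # insert a character into ref
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_accuracy(ref, hyp):
    """1 minus the character error rate, floored at 0."""
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


reference = "Invoice total: 1,250.00 USD"
ocr_output = "Invoice tota1: 1,250.00 USD"  # the letter 'l' misread as the digit '1'

print(edit_distance(reference, ocr_output))
print(f"{character_accuracy(reference, ocr_output):.3f}")
```

A single misread character in this 27-character line already drops accuracy to about 96%, which shows how demanding a 99% target is: it allows at most one error per 100 characters.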

The technology stack typically involves convolutional neural networks (CNNs) or vision transformers (ViTs) trained on millions of labeled images. Transfer learning allows companies to adapt pre-trained models to their specific use case with much less data than training from scratch.
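
The "convolutional" in CNN refers to sliding a small grid of weights (a kernel) across the image and computing a weighted sum at each position. The following pure-Python sketch shows that single operation on a tiny made-up image. The kernel here is hand-written to respond to vertical edges; in a real CNN the kernel values are learned during training, and production code would use an optimized library rather than nested lists.

```python
def conv2d(image, kernel):
    """Stride-1 sliding-window weighted sum with no padding, as in a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Tiny grayscale image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hand-written kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d(image, kernel)
for row in feature_map:
    print(row)
```

The output is large where the window straddles the dark-to-bright boundary and zero where the window covers a uniform region. Stacking many learned kernels like this one is what lets a CNN build up from edges to textures to whole objects.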

For businesses considering computer vision, cloud APIs from Google, AWS, and Azure offer pay-per-use access without building models from scratch. Custom solutions typically cost $50,000-$500,000 depending on complexity.
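
One way to frame the choice is a break-even estimate: how many images would have to be processed before pay-per-use fees add up to the cost of a custom build. The sketch below uses the custom cost range given above. The per-image API price is a hypothetical placeholder, not a quote from any provider, and the calculation ignores the hosting and maintenance costs of a custom solution.

```python
def break_even_images(custom_cost_usd, api_price_per_1000_usd):
    """Image count at which cumulative API fees equal the custom build cost."""
    return custom_cost_usd / api_price_per_1000_usd * 1000


API_PRICE_PER_1000 = 1.50  # hypothetical price in USD per 1,000 images

low = break_even_images(50_000, API_PRICE_PER_1000)
high = break_even_images(500_000, API_PRICE_PER_1000)

print(f"break-even for a $50,000 build:  {low:,.0f} images")
print(f"break-even for a $500,000 build: {high:,.0f} images")
```

At the assumed price, the break-even point falls between roughly 33 million and 333 million images, which suggests that pay-per-use access is the cheaper starting point for low or uncertain volumes.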