In Depth

Falcon is a family of large language models created by the Technology Innovation Institute (TII) in Abu Dhabi, UAE. The initial Falcon 40B model topped the Hugging Face Open LLM Leaderboard upon release in 2023, demonstrating that high-quality open models could come from outside the traditional Silicon Valley AI labs.

Falcon models were trained primarily on RefinedWeb, a web dataset built by heavily filtering and deduplicating Common Crawl data. TII's research indicated that web data alone, when filtered and deduplicated rigorously enough, could produce models that match or outperform those trained on hand-curated mixtures of sources such as books, papers, and code. This finding influenced how the community approaches training data curation.
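To make the idea of filtering web data concrete, here is a minimal sketch of a quality filter followed by exact deduplication. It is an illustration only and not the RefinedWeb pipeline, which is far more elaborate. The function names, the thresholds (`min_words`, `min_alpha_ratio`), and the sample documents are all assumptions made up for this example.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def passes_quality_filter(text: str, min_words: int = 5,
                          min_alpha_ratio: float = 0.6) -> bool:
    """Toy heuristic (assumed thresholds): enough words, mostly letters."""
    words = text.split()
    if len(words) < min_words:
        return False
    non_space = [c for c in text if not c.isspace()]
    alpha = sum(c.isalpha() for c in non_space)
    return alpha / len(non_space) >= min_alpha_ratio


def filter_and_dedup(docs: list[str]) -> list[str]:
    """Drop low-quality documents, then drop exact duplicates after normalization."""
    seen = set()
    kept = []
    for doc in docs:
        if not passes_quality_filter(doc):
            continue
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # an equivalent document was already kept
        seen.add(digest)
        kept.append(doc)
    return kept


# Hypothetical sample: one good document, a near-identical copy, and two junk lines.
sample = [
    "Falcon models are trained on filtered web text.",
    "falcon  models are trained on   filtered web text.",
    "Click here!!! >>> $$$ 1234 5678",
    "Too short.",
]
cleaned = filter_and_dedup(sample)
print(cleaned)
```

Hashing the normalized text catches only exact copies. Production pipelines typically add fuzzy deduplication to catch near-duplicates as well, which this sketch leaves out.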

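The original Falcon family includes models at several sizes (7B, 40B, 180B). Falcon 7B and 40B are released under the Apache 2.0 license, which permits commercial use. Falcon 180B, one of the largest openly available models at its release, ships under TII's own Falcon 180B license, which allows commercial use but adds restrictions, notably on offering the model as a hosted service. For businesses, Falcon provides a capable, commercially licensable foundation for building custom AI applications on their own infrastructure, without depending on a third-party API.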