About this model
GPT-4o ("o" for "omni") is OpenAI's flagship from the GPT-4 generation, accepting text and image inputs and producing text outputs including structured outputs. Architecturally it marked a shift: OpenAI trained a single new model end-to-end across text, vision, and audio, rather than feeding a separate vision encoder into a language model. This catalog entry refers to the 2024-11-20 revision.
Against its own predecessors, OpenAI reports that GPT-4o matches GPT-4 Turbo-level performance on text, reasoning, and coding while setting new high watermarks on multilingual, audio, and vision capabilities—running faster and at lower API cost than GPT-4 Turbo. The 2024-11-20 update specifically made the model, per OpenAI's release notes, "smarter across the board," with more up-to-date knowledge (training cutoff moved from November 2023 to June 2024), deeper image analysis, and gains on academic evaluations like GPQA, MATH, and MMLU.
For context on later siblings, OpenAI reports that on its SWE-bench Verified, GPT-4.1 completed 54.6% of tasks versus 33.2% for GPT-4o (2024-11-20). Within this family, GPT-4o has since been succeeded by reasoning-focused flagships such as GPT-5.2, GPT-5.4, and GPT-5.5, alongside smaller options like GPT-4o Mini.
GPT-4o is a multimodal model accepting text and image inputs, used for multimodal tasks, function-calling workflows, and web-search-augmented applications.
This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.
Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 1d ago