About this model
Hermes 3 Llama 3.1 405B is a frontier-scale, full-parameter finetune of Meta's Llama 3.1 405B foundation model, developed by Nous Research rather than Meta itself. Nous describes it as the first full-parameter fine-tune of the 405B base, with a design philosophy centered on aligning the model to the individual user and granting powerful steering capabilities and control to the end user. It is a generalist model offering a 128K-token context window, reliable function calling, and structured JSON output suitable for software integration.
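Function calling and structured JSON output matter most when wiring the model into software. The following is a minimal client-side sketch of extracting tool calls from a raw completion. It assumes the Hermes prompt convention of wrapping each call in `<tool_call>` tags around a JSON object with `name` and `arguments` keys. `parse_tool_calls` and the `get_weather` tool are hypothetical names for illustration, and a hosted API may already return parsed tool calls.

```python
import json
import re

# Assumption: the model emits each tool call as a JSON object wrapped in
# <tool_call>...</tool_call> tags (Hermes-style function-calling format).
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(completion: str) -> list[dict]:
    """Return every well-formed tool call found in a model completion."""
    calls = []
    for payload in TOOL_CALL_RE.findall(completion):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            # Skip malformed JSON instead of failing the whole response.
            continue
        if isinstance(call, dict) and "name" in call:
            calls.append({"name": call["name"], "arguments": call.get("arguments", {})})
    return calls


# Hypothetical completion text, for illustration only.
sample = (
    "Let me check that.\n"
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
    "</tool_call>"
)
print(parse_tool_calls(sample))
```

Skipping malformed payloads, rather than raising, lets a caller fall back to treating the reply as plain text when the model does not produce valid JSON.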
Relative to its same-family predecessor, Hermes 2, Nous reports that Hermes 3 adds advanced agentic capabilities, improved roleplaying, stronger reasoning, better multi-turn conversation, and improved long-context coherence across the board. These are vendor-stated generational improvements rather than independently verified benchmark figures.
Because the full 405B model requires over 800GB of VRAM in FP16, this build uses NeuralMagic's FP8 quantization to reduce the footprint to roughly 430GB while remaining compatible with the vLLM inference engine. The model is released under Meta's Llama 3 community license, reflecting its Llama base.
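The memory figures follow from simple arithmetic: weight size is roughly parameter count times bytes per parameter. FP16 uses 2 bytes per parameter and FP8 uses 1. The following sketch covers raw weights only. The gap between its FP8 estimate and the quoted ~430GB is not explained in the sources. Layers left unquantized and runtime buffers are plausible contributors, but that is an assumption.

```python
# Rough weight-memory estimate: parameter count x bytes per parameter.
# Covers raw weights only; KV cache, activations, and runtime overhead
# add to the real total.
PARAMS = 405e9

BYTES_PER_PARAM = {"FP16": 2, "FP8": 1}


def weight_footprint_gb(params: float, dtype: str) -> float:
    """Return the raw weight size in gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("FP16", "FP8"):
    print(f"{dtype}: ~{weight_footprint_gb(PARAMS, dtype):.0f} GB")
```

The FP16 result (about 810 GB) matches the "over 800GB" figure. The FP8 result (about 405 GB) shows why halving the bytes per parameter roughly halves the footprint.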
Within this catalog, Hermes 3 is the largest text model in the Llama family, sitting alongside Meta's own smaller releases such as Llama 3.3 70B and Llama 3.2 3B. Here it is additionally equipped with web-search capability for retrieval-augmented responses.
This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies, so verify critical details against the data sources listed below.
Research & Papers
Primary reference paper for this model family, sourced from the HuggingFace model card.
Data sources: Venice API · HuggingFace · Wikipedia · arXiv