Meta·💬 Text Generation

Hermes 3 Llama 3.1 405b

Web Searchfp8private

🧠 Try in Intelligence →

Try on Venice.ai ↗

Quick reference

Hermes 3 Llama 3.1 405B — TLDR

🆕 Full-parameter finetune of Meta's Llama 3.1 405B foundation model.
🏢 Built by Nous Research, hosted here with web search.
📏 128K-token context window for long, coherent conversations.
🔧 Reliable function calling and structured JSON output support.
🎯 User-steerable alignment, giving control to the end user.
⚡ Served in FP8 quantization, fitting roughly 430GB VRAM.
💬 Strong roleplaying, multi-turn dialogue, and agentic behavior.
📚 Licensed under Meta's Llama 3 community license.

💰 Pricing

$1.10 / $3.00

per 1M · input / output

📏 Context

128K tokens

📅 On Venice since

Sep 25, 2025

251 days ago

Provider

About this model

Hermes 3 Llama 3.1 405B is a frontier-scale, full-parameter finetune of Meta's Llama 3.1 405B foundation model, developed by Nous Research rather than Meta itself. Nous describes it as the first full-parameter fine-tune of the 405B base, with a design philosophy centered on aligning the model to the individual user and granting powerful steering capabilities and control to the end user. It is a generalist model offering a 128K-token context window, reliable function calling, and structured JSON output suitable for software integration.

Relative to its same-family predecessor, Hermes 2, Nous reports that Hermes 3 adds advanced agentic capabilities, improved roleplaying, stronger reasoning, better multi-turn conversation, and improved long-context coherence across the board. These are vendor-stated generational improvements rather than independently verified benchmark figures.

Because the full 405B model requires over 800GB of VRAM in FP16, this build uses NeuralMagic's FP8 quantization to reduce the footprint to roughly 430GB while remaining compatible with the VLLM inference engine. The model is released under Meta's Llama 3 community license, reflecting its Llama base.

Within this catalog, Hermes 3 is the largest and newest text model in its family, sitting alongside Meta's own smaller releases such as Llama 3.3 70B and Llama 3.2 3B. Here it is additionally equipped with web-search capability for retrieval-augmented responses.

🤗View model card on HuggingFace ↗View source on GitHub ↗

Sources

NousResearch/Hermes-3-Llama-3.1-405B · Hugging Facehuggingface.co ↗

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

arXiv2408.11857Aug 2024

Hermes 3 Technical Report(2024)

Ryan Teknium, Jeffrey Quesnelle, Chen Guang

Instruct (or "chat") tuned models have become the primary way in which most people interact with large language models. As opposed to "base" or "foundation" models, instruct-tuned models are optimized to respond to imperative statements. We present Hermes 3, a neutrally-aligned…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 1d ago