MetaMeta·💬 Text Generation

Hermes 3 Llama 3.1 405b

Web Searchfp8private
🧠 Try in Intelligence →Try on Venice.ai ↗
Quick reference
Hermes 3 Llama 3.1 405B — TLDR
  • 🆕 Full-parameter finetune of Meta's Llama 3.1 405B foundation model.
  • 🏢 Built by Nous Research, hosted here with web search.
  • 📏 128K-token context window for long, coherent conversations.
  • 🔧 Reliable function calling and structured JSON output support.
  • 🎯 User-steerable alignment, giving control to the end user.
  • ⚡ Served in FP8 quantization, fitting roughly 430GB VRAM.
  • 💬 Strong roleplaying, multi-turn dialogue, and agentic behavior.
  • 📚 Licensed under Meta's Llama 3 community license.
💰 Pricing
$1.10 / $3.00
per 1M · input / output
📏 Context
128K tokens
📅 On Venice since
Sep 25, 2025
251 days ago
Provider

Meta Platforms is an American multinational technology company headquartered in Menlo Park, California. Best known for operating Facebook, Instagram, WhatsApp, and Threads, Meta has become one of the most influential players in open-weight AI research through…

Read full profile →
3 models on Venice
3 text
Since Oct 3, 2024

About this model

Hermes 3 Llama 3.1 405B is a frontier-scale, full-parameter finetune of Meta's Llama 3.1 405B foundation model, developed by Nous Research rather than Meta itself. Nous describes it as the first full-parameter fine-tune of the 405B base, with a design philosophy centered on aligning the model to the individual user and granting powerful steering capabilities and control to the end user. It is a generalist model offering a 128K-token context window, reliable function calling, and structured JSON output suitable for software integration.

Relative to its same-family predecessor, Hermes 2, Nous reports that Hermes 3 adds advanced agentic capabilities, improved roleplaying, stronger reasoning, better multi-turn conversation, and improved long-context coherence across the board. These are vendor-stated generational improvements rather than independently verified benchmark figures.

Because the full 405B model requires over 800GB of VRAM in FP16, this build uses NeuralMagic's FP8 quantization to reduce the footprint to roughly 430GB while remaining compatible with the VLLM inference engine. The model is released under Meta's Llama 3 community license, reflecting its Llama base.

Within this catalog, Hermes 3 is the largest and newest text model in its family, sitting alongside Meta's own smaller releases such as Llama 3.3 70B and Llama 3.2 3B. Here it is additionally equipped with web-search capability for retrieval-augmented responses.

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 1d ago