Arcee AI · 💬 Text Generation

Trinity Large Thinking

Reasoning · Code · Function Calling · Web Search · fp8 · private
Quick reference
Trinity Large Thinking — TLDR
  • 🧠 Reasoning-optimized variant of Arcee AI's Trinity-Large 398B sparse MoE family.
  • ⚡ Roughly 13B active parameters per token for efficient inference.
  • 📏 256K context window for long, multi-step agentic chains.
  • 💬 Emits extended chain-of-thought inside reasoning-trace blocks.
  • 🔧 Tool calling and agentic RL post-training for long-horizon tasks.
  • 🌐 Multilingual training across 14 non-English languages.
  • 🔒 Released under Apache 2.0 in 2026.
  • 🏢 Built on Trinity-Large-Base; trained with Muon optimizer and SMEBU.
💰 Pricing
$0.313 / $1.13
per 1M · input / output
📏 Context
256K tokens
📅 On Venice since
Apr 2, 2026
62 days ago
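
The pricing and context figures above can be turned into a rough per-request cost estimate. The sketch below is only illustrative: it assumes "256K" means 256,000 tokens, that the window covers input plus output, and that reasoning-trace tokens are billed at the output rate. None of these is confirmed on this page, and the function name is hypothetical.

```python
# Prices as listed on this page, in USD per 1M tokens.
INPUT_USD_PER_M = 0.313
OUTPUT_USD_PER_M = 1.13
# Assumption: "256K" is read as 256,000 tokens; the exact limit may differ.
CONTEXT_TOKENS = 256_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Rough cost of one request.

    Assumes reasoning tokens count as output tokens and that input plus
    output must fit inside the context window.
    """
    if input_tokens + output_tokens > CONTEXT_TOKENS:
        raise ValueError("request would exceed the assumed context window")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent step: 20,000 prompt tokens, 4,000 generated tokens.
    print(f"${estimate_cost_usd(20_000, 4_000):.5f}")
```

Because output tokens cost about 3.6 times as much as input tokens at these rates, long reasoning traces are likely to dominate the bill for short prompts.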
Provider

Arcee AI is an artificial intelligence company focused on developing advanced language models. The organization has built a reputation in the open-source AI community for its work on model optimization and specialized text generation architectures.

1 model on Venice
1 text
Added Apr 2, 2026

About this model

Trinity Large Thinking is the reasoning-oriented member of Arcee AI's Trinity-Large series of sparse Mixture-of-Experts models, with roughly 398–400B total parameters and about 13B activated per token. It shares its MoE architecture with the chat-focused Trinity-Large-Preview but is post-trained with agentic reinforcement learning for extended chain-of-thought reasoning, which suits it to long-horizon agents, multi-turn tool calling, and audit-friendly stepwise output.
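
To make the sparse-activation figures concrete, the arithmetic below works out what share of the parameters is used per token and how much storage the weights alone would need. It is a back-of-the-envelope sketch: the parameter counts are the rounded figures quoted above, FP8 is taken as one byte per weight, and the estimate ignores KV cache, activations, and any quantization scale tensors.

```python
TOTAL_PARAMS_RANGE = (398e9, 400e9)  # "roughly 398-400B" as quoted above
ACTIVE_PARAMS = 13e9                 # "about 13B activated per token"
BYTES_PER_PARAM_FP8 = 1              # FP8 stores each weight in one byte


def active_fraction(total_params: float, active_params: float = ACTIVE_PARAMS) -> float:
    """Share of all parameters that participate in one token's forward pass."""
    return active_params / total_params


def fp8_weight_gb(total_params: float) -> float:
    """Weight storage only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * BYTES_PER_PARAM_FP8 / 1e9


if __name__ == "__main__":
    for total in TOTAL_PARAMS_RANGE:
        print(
            f"{total / 1e9:.0f}B total: "
            f"{active_fraction(total):.2%} active per token, "
            f"about {fp8_weight_gb(total):.0f} GB of FP8 weights"
        )
```

Either way, only about 3.3% of the parameters are active for any given token, while the full set of weights still has to be resident in memory for self-hosting.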

The chief distinction from its same-family predecessors is reasoning behavior. Where Trinity-Large-Preview is lightly post-trained and chat-ready without trace output, Thinking emits intermediate reasoning inside dedicated reasoning-trace blocks before its final answer, and it is built on the Trinity-Large-Base foundation rather than being a fresh pretraining run.
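
Client code usually needs to separate the reasoning trace from the final answer. The sketch below shows one way to do that, assuming the trace is wrapped in `<think>...</think>` tags. That delimiter is an assumption made for illustration, since this page only says "reasoning-trace blocks". Check the model card or your API response format before relying on it. `split_reasoning` is a hypothetical helper name.

```python
import re

# Assumed delimiter; the page does not name the actual tag.
TRACE_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> tuple[list[str], str]:
    """Return (reasoning traces, final answer) from a raw completion string."""
    # Non-greedy match so several trace blocks are captured separately.
    traces = [trace.strip() for trace in TRACE_PATTERN.findall(raw)]
    # Whatever remains after removing the trace blocks is treated as the answer.
    answer = TRACE_PATTERN.sub("", raw).strip()
    return traces, answer


if __name__ == "__main__":
    sample = "<think>2 + 2 is 4, then double it.</think>The result is 8."
    traces, answer = split_reasoning(sample)
    print(traces)
    print(answer)
```

Keeping the trace separate lets an application log it for auditing while showing only the final answer to end users.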

Architecturally, the Trinity-Large family uses 256 experts with 4 active per token, interleaved local and global attention, gated attention, and sigmoid routing, according to Arcee's technical report. Training used the Muon optimizer plus a load-balancing technique called Soft-clamped Momentum Expert Bias Updates (SMEBU) across a 17-trillion-token pretraining recipe, completing with zero loss spikes.
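
The routing step described above can be illustrated with a small, generic sketch of sigmoid top-k routing: each expert gets an independent sigmoid score (rather than a softmax over all experts), and the 4 highest-scoring of 256 experts are selected. This is not Arcee's implementation. Renormalizing the gate weights over the selected experts is an assumption of the sketch, and the SMEBU bias updates and the attention layers are omitted entirely.

```python
import math
import random

NUM_EXPERTS = 256  # from the text above
TOP_K = 4          # experts active per token, from the text above


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def route_token(router_logits: list[float], top_k: int = TOP_K) -> list[tuple[int, float]]:
    """Pick the top_k experts by sigmoid score; return (expert_index, weight) pairs.

    Assumption: weights are renormalized to sum to 1 over the chosen experts.
    """
    scores = [sigmoid(z) for z in router_logits]
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    return [(i, scores[i] / total) for i in chosen]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router output
    for expert, weight in route_token(logits):
        print(expert, round(weight, 3))
    print(f"experts active per token: {TOP_K / NUM_EXPERTS:.2%}")
```

With 4 of 256 experts selected, under 2% of the experts run for any one token, which is how a model of this total size keeps its per-token compute close to that of a much smaller dense model.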

The model supports tool calling, multilingual input, and a 256K-token context window for sustained agentic workflows. It is distributed under Apache 2.0, with FP8 weights and quantized GGUF builds available for self-hosting.
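
On the client side, tool calling generally means executing the function the model asks for and returning the result as a message. The sketch below shows that dispatch step only. It assumes an OpenAI-style tool-call shape (`id`, `function.name`, and `function.arguments` as a JSON string), which this page does not confirm for this model. `get_weather`, `TOOLS`, and `dispatch_tool_call` are hypothetical names for illustration, and no network request is made.

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> dict[str, Any]:
    """Hypothetical local tool; returns canned data instead of calling a service."""
    return {"city": city, "forecast": "sunny"}


# Registry mapping tool names the model may request to local Python callables.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch_tool_call(tool_call: dict[str, Any]) -> dict[str, str]:
    """Run one tool call and wrap the result as a tool-role message.

    Assumed input shape: {"id": ..., "function": {"name": ..., "arguments": "<JSON>"}}.
    """
    name = tool_call["function"]["name"]
    if name not in TOOLS:
        # Report unknown tools back to the model instead of raising.
        content = json.dumps({"error": f"unknown tool: {name}"})
    else:
        kwargs = json.loads(tool_call["function"]["arguments"])
        content = json.dumps(TOOLS[name](**kwargs))
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": content}


if __name__ == "__main__":
    call = {
        "id": "call_1",
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
    }
    print(dispatch_tool_call(call))
```

In a multi-turn agent loop, the returned message would be appended to the conversation and sent back so the model can continue reasoning with the tool result.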

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 1d ago