Not currently listed in Venice's public API catalog — last listed Jun 30, 2026. Delisted models may still respond to direct API calls.

Arcee AI·💬 Text Generation

Trinity Large Thinking

🧠 Try in Intelligence →

Try on Venice.ai ↗

Quick reference

Trinity Large Thinking — TLDR

🧠 Reasoning-optimized variant of Arcee AI's Trinity-Large 398B sparse MoE family.
⚡ Roughly 13B active parameters per token for efficient inference.
📏 256K context window for long, multi-step agentic chains.
💬 Emits extended chain-of-thought inside reasoning-trace blocks.
🔧 Tool calling and agentic RL post-training for long-horizon tasks.
🌐 Multilingual training across 14 non-English languages.
🔒 Released under Apache 2.0 in 2026.
🏢 Built on Trinity-Large-Base; trained with Muon optimizer and SMEBU.

💰 Pricing

—

📅 On Venice since

Apr 13, 2026

96 days ago

Provider

Arcee AI

Arcee AI is an artificial intelligence company focused on developing advanced language models. The organization has built a reputation in the open-source AI community for its work on model optimization and specialized text generation architectures.

Read full profile →

About this model

Trinity Large Thinking is the reasoning-oriented member of Arcee AI's Trinity-Large series, a sparse Mixture-of-Experts model with roughly 398–400B total parameters and about 13B activated per token. It shares the same MoE architecture as the chat-focused Trinity-Large-Preview but is post-trained for extended chain-of-thought reasoning and agentic reinforcement learning, making it suited to long-horizon agents, multi-turn tool calling, and audit-friendly stepwise output.

The chief distinction from its same-family predecessors is reasoning behavior. Where Trinity-Large-Preview is lightly post-trained and chat-ready without trace output, Thinking emits intermediate reasoning inside dedicated reasoning-trace blocks before its final answer, and it is built on the Trinity-Large-Base foundation rather than being a fresh pretraining run.

Architecturally, the Trinity-Large family uses 256 experts with 4 active per token, interleaved local and global attention, gated attention, and sigmoid routing, according to Arcee's technical report. Training used the Muon optimizer plus a load-balancing technique called Soft-clamped Momentum Expert Bias Updates (SMEBU) across a 17-trillion-token pretraining recipe, completing with zero loss spikes.

The model supports tool calling, multilingual input, and a large context window for sustained agentic workflows. It is distributed under Apache 2.0, with FP8 weights and quantized GGUF builds available for self-hosting.

🤗View model card on HuggingFace ↗View source on GitHub ↗

Sources

Trinity-Large-Thinking | Arcee AI Documentationdocs.arcee.ai ↗

Arcee AI | Trinityarcee.ai ↗

Arcee Trinity Large Technical Reportarxiv.org ↗

arcee-ai/Trinity-Large-Thinking · Hugging Facehuggingface.co ↗

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

arXiv2602.17004Feb 2026

Arcee Trinity Large Technical Report(2026)

Varun Singh, Lucas Krauss, Sami Jaghouar et al.

We present the technical report for Arcee Trinity Large, a sparse Mixture-of-Experts model with 400B total parameters and 13B activated per token. Additionally, we report on Trinity Nano and Trinity Mini, with Trinity Nano having 6B total parameters with 1B activated per token,…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 18d ago