MoEs, A/B Testing and Reinforcement Learning
Last updated
Forget about pre-training and fine-tuning large language models (LLMs) from scratch. We’re not here to reinvent the wheel — we’re here to make it spin faster and smarter. Enter DAMN, the Crowdsourced AI powered by Mixture-of-Experts (MoE) architectures. Guided by a gating model, as outlined in the original MoE paper, DAMN dynamically selects the right model for the right task, making it lean, agile, and incredibly efficient.
At the heart of DAMN is our LLM Router, the DAMN Controller — a decision algorithm designed to seamlessly manage and route conversations to the most appropriate model. Given a set of conversations C, the controller tokenizes them, maps them into a latent "feature space," and determines the best "model space" to activate. In simple terms, it matches the problem to the perfect problem-solver — no wasted computation, no unnecessary complexity.
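To make the routing idea concrete, here is a minimal, self-contained sketch of a gating router. Everything in it is illustrative: the expert names, the keyword-count "features," and the `route` function are toy stand-ins, not DAMN's actual controller, which would use learned embeddings rather than hand-crafted features.

```python
import math

# Hypothetical expert pool; these names are illustrative only.
EXPERTS = ["code-expert", "math-expert", "chat-expert"]

# Toy keyword lists standing in for a learned feature extractor.
KEYWORDS = [
    ["def ", "import", "traceback"],      # code-expert signals
    ["solve", "equation", "integral"],    # math-expert signals
    ["hello", "thanks", "recommend"],     # chat-expert signals
]

def featurize(conversation: str) -> list[float]:
    # Map a conversation into a crude "feature space": one score per expert.
    text = conversation.lower()
    return [float(sum(text.count(k) for k in ks)) for ks in KEYWORDS]

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(conversation: str) -> str:
    # Gate: turn feature scores into a distribution over experts,
    # then activate the most probable one.
    probs = softmax(featurize(conversation))
    return EXPERTS[max(range(len(EXPERTS)), key=lambda i: probs[i])]

print(route("Can you solve this integral equation?"))
```

A production gate would be a trained network scoring conversation embeddings, but the shape of the computation — featurize, score, select — is the same.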
Instead of relying on one massive, monolithic model, DAMN taps into a network of specialized experts. This results in faster response times, lower costs, and models that are better suited to specific tasks. It's like having a whole team of AI experts on standby, each ready to jump in when their specialty is needed.
We also leverage Reinforcement Learning from Human Feedback (RLHF), as outlined in OpenAI’s landmark paper. While OpenAI applies this process to training a single massive model, we use it to optimize model selection within our MoE framework. Our models improve continuously through a three-stage process:
Supervised Fine-Tuning (SFT) — Trains a model on labeled examples to establish a strong starting point.
Reward Model (RM) Training — Learns to score outputs from human preference data, creating the feedback signal.
Reinforcement Learning (PPO) — Optimizes models using feedback from the RM, ensuring every update makes the system sharper and more efficient.
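The feedback loop in stages two and three can be sketched in miniature. This toy example learns a preference between two candidate models from scalar rewards using a REINFORCE-style policy-gradient update; the hard-coded `reward_model` stands in for a trained RM, and real PPO adds a clipped surrogate objective and a KL penalty that are omitted here for brevity.

```python
import math
import random

random.seed(0)

# Preference scores (logits) over two candidate models, A and B.
logits = [0.0, 0.0]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def reward_model(choice: int) -> float:
    # Stand-in reward model: pretend human feedback prefers model B (index 1).
    return 1.0 if choice == 1 else 0.0

def train(steps: int = 500, lr: float = 0.1, baseline: float = 0.5) -> None:
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a model according to the current policy.
        choice = 0 if random.random() < probs[0] else 1
        advantage = reward_model(choice) - baseline
        # Policy-gradient step: raise the log-probability of choices
        # with above-baseline reward, lower it otherwise.
        for i in range(len(logits)):
            indicator = 1.0 if i == choice else 0.0
            logits[i] += lr * advantage * (indicator - probs[i])

train()
print(softmax(logits))  # probability mass shifts toward model B
```

After training, the selection policy concentrates on the model the reward signal favors — the same loop, at scale and with PPO's stability machinery, is what keeps the routing sharp as feedback accumulates.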
Model Ensembling & Blending
Why settle for one perspective when you can harness the power of many? Our approach to Model Ensembling & Blending brings together the strengths of multiple smaller base models, combining their unique insights into a single, unified powerhouse. This synergy not only enhances predictive accuracy but also rivals — and often surpasses — the performance of much larger, resource-intensive models. By leveraging this technique, we achieve smarter, faster, and more cost-efficient AI solutions without compromising on quality or scale.
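A minimal sketch of the blending step, assuming each base model emits a probability distribution over the same label set. The three hard-coded distributions are made-up placeholders for real model outputs; the averaging rule is the standard soft-voting approach, not necessarily the exact scheme we use.

```python
def average_ensemble(distributions: list[dict[str, float]]) -> dict[str, float]:
    # Blend: average each class probability across the base models.
    # Since each input sums to 1, the average does too.
    n = len(distributions)
    classes = distributions[0].keys()
    return {c: sum(d[c] for d in distributions) / n for c in classes}

# Three small "models" disagree individually...
predictions = [
    {"positive": 0.7, "negative": 0.3},
    {"positive": 0.4, "negative": 0.6},
    {"positive": 0.8, "negative": 0.2},
]

# ...but their blended view is a confident, calibrated consensus.
blended = average_ensemble(predictions)
print(blended)
```

Averaging distributions (soft voting) tends to beat picking a single model's answer because uncorrelated errors cancel, which is how an ensemble of small models can rival a much larger one.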
We will reveal more details regarding our technology over time.