Miles RL Training

Skill Active

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.

Purpose

To guide users in performing enterprise-grade Reinforcement Learning training for large-scale MoE models, leveraging advanced techniques like FP8/INT4 quantization and speculative RL for maximum efficiency and alignment.

Features

  • Low-precision training (FP8, INT4)
  • MoE model training and alignment (R3)
  • Speculative RL for throughput optimization
  • Train-inference alignment
  • Production-ready framework guidance

Use cases

  • Training large MoE models (1TB+)
  • Enabling FP8 or INT4 quantization-aware training
  • Achieving bit-wise identical train-inference alignment
  • Maximizing rollout throughput with speculative RL

Non-goals

  • Serving as the research-grade original slime framework
  • Providing flexible backend swapping (use verl)
  • Offering PyTorch-native abstractions (use torchforge)

Trust

  • Warning: Issues need attention (open=17, closed=4). The ratio of closed to open issues in the last 90 days is low, suggesting maintainers may be slow to respond to or resolve issues.

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.
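
Because the install command depends on Node.js being available locally, a small preflight check can catch a missing prerequisite before npx is run. The following is a minimal sketch, not part of the skills CLI: the function name `check_tools` is hypothetical, and it only verifies that the named executables are on the PATH.

```shell
# Hypothetical helper (not part of the skills CLI): verify that each
# named executable can be found on PATH before running the installer.
check_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      return 1
    fi
  done
  echo "ok"
}
```

One way to use it would be `check_tools node npx && npx skills add davila7/claude-code-templates`, so that the installer runs only when both executables are found.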

Quality score

92/100
Analyzed about 22 hours ago

Trust signals

Last commit: 1 day ago
Stars: 27.2k
License: MIT

Similar extensions

Miles RL Training

97

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.

Skill
Orchestra-Research

Slime RL Training

98

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.

Skill
Orchestra-Research

Slime RL Training

78

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.

Skill
davila7

TensorRT-LLM

98

Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.

Skill
Orchestra-Research

AgentDB Learning

99

Create and train AI learning plugins with AgentDB's 9 reinforcement learning algorithms. Includes Decision Transformer, Q-Learning, SARSA, Actor-Critic, and more. Use when building self-learning agents, implementing RL, or optimizing agent behavior through experience.

Skill
ruvnet

Verl RL Training

99

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Skill
Orchestra-Research