
Miles RL Training

Skill (active)

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, when you need train-inference alignment, or when you want speculative RL for maximum throughput.

Purpose

To guide users in performing enterprise-grade reinforcement learning (RL) training for large-scale MoE models, using techniques such as FP8/INT4 quantization and speculative RL to maximize efficiency and train-inference alignment.

Features

Low-precision training (FP8, INT4)
MoE model training and alignment (R3)
Speculative RL for throughput optimization
Train-inference alignment
Production-ready framework guidance

Use cases

Training large MoE models (1TB+)
Enabling FP8 or INT4 quantization-aware training
Achieving bit-wise identical train-inference alignment
Maximizing rollout throughput with speculative RL

Non-goals

Serving as the research-grade original slime framework
Providing flexible backend swapping (use verl)
Offering PyTorch-native abstractions (use torchforge)

Trust

Warning: issues need attention (open: 17, closed: 4). The ratio of closed to open issues in the last 90 days is low, suggesting maintainers may be slow to respond to or resolve issues.

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js installed locally and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.). Assumes the repository follows the agentskills.io format.

Quality score

92/100

Analyzed about 22 hours ago

Trust signals

Last commit: 1 day ago

GitHub owner: davila7

Stars: 27.2k

Downloads: 23k

License: MIT

Website: aitmpl.com

Status


Similar extensions

Miles RL Training

Skill

Orchestra-Research

Slime RL Training

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.

Skill

Orchestra-Research

Slime RL Training

Skill

davila7

TensorRT-LLM

Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.

Skill

Orchestra-Research

AgentDB Learning

Create and train AI learning plugins with AgentDB's 9 reinforcement learning algorithms. Includes Decision Transformer, Q-Learning, SARSA, Actor-Critic, and more. Use when building self-learning agents, implementing RL, or optimizing agent behavior through experience.

Skill

ruvnet

Verl RL Training

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Skill

Orchestra-Research