
RWKV Architecture

Skill, verified, active

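RNN+Transformer hybrid with O(n) inference: time grows linearly with sequence length, there is no KV cache, and the fixed-size recurrent state imposes no hard context limit. Trains in parallel like GPT and runs sequentially like an RNN at inference. Linux Foundation AI project. Used in production in Windows, Office, and NeMo. Latest version: RWKV-7 (March 2025). Models up to 14B parameters.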

Purpose

Gives developers a working understanding of the RWKV model architecture and practical guidance for using it, so they can apply its linear-complexity inference to long-context AI applications.

Features

Hybrid RNN+Transformer architecture
O(n) inference: time linear in sequence length
No fixed context window; inference memory stays constant as the sequence grows
Parallelizable training like GPT, sequential inference like RNN
Detailed installation, usage, and workflow examples
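
The "train like GPT, infer like RNN" property in the list above can be illustrated with a toy example. The sketch below is a deliberately simplified decayed weighted average over one scalar channel, not the actual RWKV-7 kernel; the function names, the single `decay` parameter, and the scalar keys and values are all assumptions made for illustration. It shows that a formula written over all earlier tokens (the parallel view) and a step function carrying a fixed-size state (the recurrent view) produce the same outputs.

```python
import math

def parallel_mode(keys, values, decay):
    """Parallel view: each output is written directly in terms of all
    earlier tokens, so every position can be computed independently."""
    outputs = []
    for t in range(len(keys)):
        num = sum(decay ** (t - i) * math.exp(keys[i]) * values[i]
                  for i in range(t + 1))
        den = sum(decay ** (t - i) * math.exp(keys[i])
                  for i in range(t + 1))
        outputs.append(num / den)
    return outputs

def recurrent_step(state, key, value, decay):
    """Recurrent view: update a fixed-size state with one new token.
    The state is two numbers no matter how many tokens came before."""
    num, den = state
    weight = math.exp(key)
    num = decay * num + weight * value
    den = decay * den + weight
    return (num, den), num / den

def recurrent_mode(keys, values, decay):
    """Run the step function over the sequence, one token at a time."""
    state = (0.0, 0.0)  # hypothetical initial state for this toy model
    outputs = []
    for k, v in zip(keys, values):
        state, out = recurrent_step(state, k, v, decay)
        outputs.append(out)
    return outputs

if __name__ == "__main__":
    keys = [0.1, -0.3, 0.7, 0.2]
    values = [1.0, 2.0, -1.0, 0.5]
    print(parallel_mode(keys, values, decay=0.9))
    print(recurrent_mode(keys, values, decay=0.9))
```

Both calls print the same sequence up to floating-point rounding. In the recurrent form each new token costs a constant amount of work and memory, which is where the O(n) total inference cost comes from.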

Use cases

Building AI applications requiring long-context processing
Deploying models in memory-constrained environments
Developing streaming AI services
Fine-tuning RWKV models for specific tasks
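
For the memory-constrained and streaming cases above, a rough back-of-the-envelope comparison may help. The sketch below contrasts how a standard multi-head-attention KV cache grows with token count against a fixed-size recurrent state. All model dimensions are hypothetical placeholders, not the sizes of any real RWKV or Transformer checkpoint, and `state_values_per_layer` is an assumed figure chosen only for illustration.

```python
def kv_cache_bytes(n_tokens, n_layers, d_model, bytes_per_value=2):
    """KV cache for plain multi-head attention: one key vector and one
    value vector of width d_model per token per layer."""
    return 2 * n_layers * n_tokens * d_model * bytes_per_value

def recurrent_state_bytes(n_layers, state_values_per_layer, bytes_per_value=2):
    """Fixed-size recurrent state: independent of how many tokens were seen."""
    return n_layers * state_values_per_layer * bytes_per_value

if __name__ == "__main__":
    # Hypothetical dimensions, for illustration only.
    n_layers, d_model = 32, 4096
    state_values_per_layer = 64 * d_model  # assumed state size per layer

    for n_tokens in (1_000, 10_000, 100_000):
        kv = kv_cache_bytes(n_tokens, n_layers, d_model)
        st = recurrent_state_bytes(n_layers, state_values_per_layer)
        print(f"{n_tokens:>7} tokens: KV cache {kv / 2**20:9.1f} MiB, "
              f"recurrent state {st / 2**20:6.1f} MiB")
```

Under these assumptions the KV cache grows in proportion to the number of tokens while the recurrent state stays the same size, which is the property that makes long-running streaming sessions practical on limited memory.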

Non-goals

Replacing Transformers for absolute best performance in compute-rich environments
Focusing on state-space models (Mamba) or other specific architectures (RetNet, Hyena)

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) through npx. Requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.). Assumes the repository follows the agentskills.io format.

Quality score

Verified

96/100

Analyzed about 2 months ago

Trust signals

Last commit: about 2 months ago

GitHub owner: davila7

Stars: 27.2k

Downloads: 23k

License: MIT

Website: aitmpl.com


Similar extensions

RWKV Architecture

Skill

Orchestra-Research

Mamba Architecture

State-space model with O(n) complexity vs Transformers' O(n²). 5× faster inference, million-token sequences, no KV cache. Selective SSM with hardware-aware design. Mamba-1 (d_state=16) and Mamba-2 (d_state=128, multi-head). Models 130M-2.8B on HuggingFace.

Skill

Orchestra-Research

Mamba Architecture

Skill

davila7

TorchTitan Distributed LLM Pretraining

Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.

Skill

Orchestra-Research

Model Pruning

Reduce LLM size and accelerate inference using pruning techniques like Wanda and SparseGPT. Use when compressing models without retraining, achieving 50% sparsity with minimal accuracy loss, or enabling faster inference on hardware accelerators. Covers unstructured pruning, structured pruning, N:M sparsity, magnitude pruning, and one-shot methods.

Skill

Orchestra-Research

Model Merging

Merge multiple fine-tuned models using mergekit to combine capabilities without retraining. Use when creating specialized models by blending domain-specific expertise (math + coding + chat), improving performance beyond single models, or experimenting rapidly with model variants. Covers SLERP, TIES-Merging, DARE, Task Arithmetic, linear merging, and production deployment strategies.

Skill

Orchestra-Research