
RWKV Architecture

Skill · Verified · Active

RNN+Transformer hybrid with O(n) inference. Linear time, infinite context, no KV cache. Train like GPT (parallel), infer like RNN (sequential). Linux Foundation AI project. In production in Windows, Office, and NeMo. RWKV-7 (March 2025). Models up to 14B parameters.

Purpose

To provide developers with a deep understanding and practical guidance on using the RWKV model architecture, enabling them to leverage its efficient inference and linear complexity for long-context AI applications.

Features

  • Hybrid RNN+Transformer architecture
  • O(n) inference (linear time complexity)
  • Infinite context window with constant memory usage
  • Parallelizable training like GPT, sequential inference like RNN (see the sketch after this list)
  • Detailed installation, usage, and workflow examples
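
A minimal sketch of the constant-memory idea (toy code with an assumed, simplified state update, not RWKV-7's actual time-mixing kernel): each token updates a fixed-size recurrent state, so inference memory stays flat no matter how long the context grows.

import torch

d = 64                                          # hypothetical channel width
Wk, Wv, Wr = [torch.randn(d, d) * 0.02 for _ in range(3)]
decay = torch.sigmoid(torch.randn(d, 1))        # per-channel decay in (0, 1)

def rnn_step(x_t, state):
    """Consume one token embedding x_t (shape [d]) and update the state."""
    k = Wk @ x_t
    v = Wv @ x_t
    r = torch.sigmoid(Wr @ x_t)                 # "receptance" gate
    state = decay * state + torch.outer(k, v)   # fixed O(d*d) state, never grows
    y = r * (state.T @ k)                       # read out through the gate
    return y, state

state = torch.zeros(d, d)                       # same memory for 10 or 10M tokens
for x_t in torch.randn(1000, d):                # stream tokens one at a time
    y, state = rnn_step(x_t, state)

During training the same computation can be unrolled over the whole sequence in parallel (the "train like GPT" half of the claim); the sequential form above is what makes per-step inference memory constant.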

Use Cases

  • Building AI applications requiring long-context processing
  • Deploying models in memory-constrained environments
  • Developing streaming AI services (see the example after this list)
  • Fine-tuning RWKV models for specific tasks
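
A hedged usage example for the long-context and streaming cases, assuming the Hugging Face transformers RWKV integration and the RWKV/rwkv-4-169m-pile checkpoint (swap in whichever RWKV release you actually deploy):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RWKV/rwkv-4-169m-pile"              # assumed small demo checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "RWKV processes long documents with"
inputs = tokenizer(prompt, return_tensors="pt")

# No KV cache is materialized: the recurrent state has a fixed size,
# so long prompts do not inflate inference memory.
out = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))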

Non-Goals

  • Replacing Transformers for absolute best performance in compute-rich environments
  • Focusing on state-space models (Mamba) or other specific architectures (RetNet, Hyena)

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx — needs Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.

Quality Score

Verified
96/100
Analyzed 1 day ago

Trust Signals

Last commit: 1 day ago
Stars: 27.2k
License: MIT

Similar Extensions

RWKV Architecture

99

RNN+Transformer hybrid with O(n) inference. Linear time, infinite context, no KV cache. Train like GPT (parallel), infer like RNN (sequential). Linux Foundation AI project. Production at Windows, Office, NeMo. RWKV-7 (March 2025). Models up to 14B parameters.

Skill
Orchestra-Research

Mamba Architecture

99

State-space model with O(n) complexity vs Transformers' O(n²). 5× faster inference, million-token sequences, no KV cache. Selective SSM with hardware-aware design. Mamba-1 (d_state=16) and Mamba-2 (d_state=128, multi-head). Models 130M-2.8B on HuggingFace.

Skill
Orchestra-Research

Mamba Architecture

95

State-space model with O(n) complexity vs Transformers' O(n²). 5× faster inference, million-token sequences, no KV cache. Selective SSM with hardware-aware design. Mamba-1 (d_state=16) and Mamba-2 (d_state=128, multi-head). Models 130M-2.8B on HuggingFace.

Skill
davila7

TorchTitan Distributed LLM Pretraining

99

Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.

Skill
Orchestra-Research

Model Pruning

98

Reduce LLM size and accelerate inference using pruning techniques like Wanda and SparseGPT. Use when compressing models without retraining, achieving 50% sparsity with minimal accuracy loss, or enabling faster inference on hardware accelerators. Covers unstructured pruning, structured pruning, N:M sparsity, magnitude pruning, and one-shot methods.

Skill
Orchestra-Research

Model Merging

98

Merge multiple fine-tuned models using mergekit to combine capabilities without retraining. Use when creating specialized models by blending domain-specific expertise (math + coding + chat), improving performance beyond single models, or experimenting rapidly with model variants. Covers SLERP, TIES-Merging, DARE, Task Arithmetic, linear merging, and production deployment strategies.

Skill
Orchestra-Research
