
Mamba Architecture

Skill · Verified · Active

State-space model with O(n) complexity vs Transformers' O(n²). 5× faster inference, million-token sequences, no KV cache. Selective SSM with hardware-aware design. Mamba-1 (d_state=16) and Mamba-2 (d_state=128, multi-head). Models 130M-2.8B on HuggingFace.
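
The d_state figures above map onto the two block classes in the reference implementation. A minimal sketch, assuming the mamba_ssm package from the state-spaces/mamba repository (pip install causal-conv1d mamba-ssm) and a CUDA device; the batch, sequence-length, and width values are illustrative only:

    import torch
    from mamba_ssm import Mamba, Mamba2

    batch, seqlen, dim = 2, 64, 256
    x = torch.randn(batch, seqlen, dim, device="cuda")

    # Mamba-1 block: selective SSM with a small per-channel state (d_state=16).
    m1 = Mamba(
        d_model=dim,  # model width
        d_state=16,   # SSM state dimension
        d_conv=4,     # local convolution width
        expand=2,     # block expansion factor
    ).to("cuda")

    # Mamba-2 block: larger state (d_state=128) split across heads.
    m2 = Mamba2(
        d_model=dim,
        d_state=128,
        d_conv=4,
        expand=2,
        headdim=64,   # d_model * expand must be divisible by headdim
    ).to("cuda")

    y1, y2 = m1(x), m2(x)
    assert y1.shape == y2.shape == x.shape  # blocks are drop-in, shape-preserving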

Purpose

To provide users with a comprehensive understanding of the Mamba state-space model architecture, its advantages over Transformers, and practical guidance on its implementation and usage.

Features

  • Detailed explanation of Mamba's Selective SSM
  • Comparison of Mamba-1 vs Mamba-2
  • Code examples for basic usage and language modeling (see the sketch after this list)
  • Performance benchmarks (speed, memory, perplexity)
  • Installation instructions and common issue resolution
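
A hedged sketch of the language-modeling path, assuming the Hugging Face transformers integration (MambaForCausalLM, available in recent transformers releases) and the state-spaces/mamba-130m-hf checkpoint, the smallest of the 130M-2.8B models mentioned above:

    from transformers import AutoTokenizer, MambaForCausalLM

    # Small pretrained checkpoint from the state-spaces collection on HuggingFace.
    tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-130m-hf")
    model = MambaForCausalLM.from_pretrained("state-spaces/mamba-130m-hf")

    inputs = tokenizer("Structured state-space models are", return_tensors="pt")
    # No KV cache is built; generation carries a fixed-size recurrent state instead.
    outputs = model.generate(**inputs, max_new_tokens=30)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))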

Use Cases

  • Understanding alternative sequence modeling architectures to Transformers
  • Implementing or fine-tuning Mamba models for long-context tasks
  • Evaluating Mamba's performance benefits for inference and training (see the timing sketch after this list)
  • Debugging common issues encountered when working with Mamba
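
For the performance-evaluation use case, a rough timing sketch rather than a rigorous benchmark; the checkpoint name and prompt lengths are assumptions. The point is only that prefill cost grows linearly with prompt length and per-token decode cost stays flat, since Mamba keeps a fixed-size recurrent state instead of a KV cache:

    import time
    import torch
    from transformers import MambaForCausalLM

    model = MambaForCausalLM.from_pretrained("state-spaces/mamba-130m-hf").eval()

    for prompt_len in (128, 1024, 4096):
        # Random token ids stand in for a real prompt; we only measure time,
        # timing prefill plus 32 decoded tokens at each prompt length.
        ids = torch.randint(0, model.config.vocab_size, (1, prompt_len))
        start = time.perf_counter()
        with torch.no_grad():
            model.generate(ids, max_new_tokens=32, do_sample=False)
        print(f"prompt={prompt_len:5d}  elapsed={time.perf_counter() - start:.2f}s")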

Non-Goals

  • Providing a pre-trained Mamba model for direct use
  • Implementing advanced Mamba features beyond basic configuration and usage
  • Covering non-Python implementations of Mamba

Trust

  • Issues attention: the repository has 17 open issues and 4 closed issues in the last 90 days, suggesting moderate maintainer engagement but a potentially slow closure rate.

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repository follows the agentskills.io format.

Quality Score

Verified
95/100
Analyzed 1 day ago

Trust Signals

Last commit: 1 day ago
Stars: 27.2k
License: MIT
Status

Similar Extensions

  • Mamba Architecture (score 99) · Skill · Orchestra-Research
    State-space model with O(n) complexity vs Transformers' O(n²). 5× faster inference, million-token sequences, no KV cache. Selective SSM with hardware-aware design. Mamba-1 (d_state=16) and Mamba-2 (d_state=128, multi-head). Models 130M-2.8B on HuggingFace.

  • Rwkv Architecture (score 99) · Skill · Orchestra-Research
    RNN+Transformer hybrid with O(n) inference. Linear time, infinite context, no KV cache. Train like GPT (parallel), infer like RNN (sequential). Linux Foundation AI project. Production at Windows, Office, NeMo. RWKV-7 (March 2025). Models up to 14B parameters.

  • Rwkv Architecture (score 96) · Skill · davila7
    RNN+Transformer hybrid with O(n) inference. Linear time, infinite context, no KV cache. Train like GPT (parallel), infer like RNN (sequential). Linux Foundation AI project. Production at Windows, Office, NeMo. RWKV-7 (March 2025). Models up to 14B parameters.

  • TorchTitan Distributed LLM Pretraining (score 99) · Skill · Orchestra-Research
    Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.

  • Implementing Llms Litgpt (score 98) · Skill · Orchestra-Research
    Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when need clean model implementations, educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.

  • Distributed Llm Pretraining Torchtitan (score 98) · Skill · davila7
    Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.
