
Mamba Architecture

Skill · Verified · Active

State-space model with O(n) complexity vs Transformers' O(n²). 5× faster inference, million-token sequences, no KV cache. Selective SSM with hardware-aware design. Mamba-1 (d_state=16) and Mamba-2 (d_state=128, multi-head). Models 130M-2.8B on HuggingFace.
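
The d_state figures above map onto the two block classes in the reference implementation. A minimal sketch, assuming the mamba_ssm package from the state-spaces/mamba repository (pip install causal-conv1d mamba-ssm) and a CUDA device; the batch, sequence-length, and width values are illustrative only:

    import torch
    from mamba_ssm import Mamba, Mamba2

    batch, seqlen, dim = 2, 64, 256
    x = torch.randn(batch, seqlen, dim, device="cuda")

    # Mamba-1 block: selective SSM with a small per-channel state (d_state=16).
    m1 = Mamba(
        d_model=dim,  # model width
        d_state=16,   # SSM state dimension
        d_conv=4,     # local convolution width
        expand=2,     # block expansion factor
    ).to("cuda")

    # Mamba-2 block: larger state (d_state=128) split across heads.
    m2 = Mamba2(
        d_model=dim,
        d_state=128,
        d_conv=4,
        expand=2,
        headdim=64,   # d_model * expand must be divisible by headdim
    ).to("cuda")

    y1, y2 = m1(x), m2(x)
    assert y1.shape == y2.shape == x.shape  # blocks are drop-in, shape-preserving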

Purpose

To provide users with a comprehensive understanding of the Mamba state-space model architecture, its advantages over Transformers, and practical guidance on its implementation and usage.

Features

  • Detailed explanation of Mamba's Selective SSM
  • Comparison of Mamba-1 vs Mamba-2
  • Code examples for basic usage and language modeling (see the sketch after this list)
  • Performance benchmarks (speed, memory, perplexity)
  • Installation instructions and common issue resolution
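
A hedged sketch of the language-modeling path, assuming the Hugging Face transformers integration (MambaForCausalLM, available in recent transformers releases) and the state-spaces/mamba-130m-hf checkpoint, the smallest of the 130M-2.8B models mentioned above:

    from transformers import AutoTokenizer, MambaForCausalLM

    # Small pretrained checkpoint from the state-spaces collection on HuggingFace.
    tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-130m-hf")
    model = MambaForCausalLM.from_pretrained("state-spaces/mamba-130m-hf")

    inputs = tokenizer("Structured state-space models are", return_tensors="pt")
    # No KV cache is built; generation carries a fixed-size recurrent state instead.
    outputs = model.generate(**inputs, max_new_tokens=30)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))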

Use Cases

  • Understanding alternative sequence modeling architectures to Transformers
  • Implementing or fine-tuning Mamba models for long-context tasks
  • Evaluating Mamba's performance benefits for inference and training (see the timing sketch after this list)
  • Debugging common issues encountered when working with Mamba
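
For the performance-evaluation use case, a rough timing sketch rather than a rigorous benchmark; the checkpoint name and prompt lengths are assumptions. The point is only that prefill cost grows linearly with prompt length and per-token decode cost stays flat, since Mamba keeps a fixed-size recurrent state instead of a KV cache:

    import time
    import torch
    from transformers import MambaForCausalLM

    model = MambaForCausalLM.from_pretrained("state-spaces/mamba-130m-hf").eval()

    for prompt_len in (128, 1024, 4096):
        # Random token ids stand in for a real prompt; we only measure time,
        # timing prefill plus 32 decoded tokens at each prompt length.
        ids = torch.randint(0, model.config.vocab_size, (1, prompt_len))
        start = time.perf_counter()
        with torch.no_grad():
            model.generate(ids, max_new_tokens=32, do_sample=False)
        print(f"prompt={prompt_len:5d}  elapsed={time.perf_counter() - start:.2f}s")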

Non-Goals

  • Providing a pre-trained Mamba model for direct use
  • Implementing advanced Mamba features beyond basic configuration and usage
  • Covering non-Python implementations of Mamba

Trust

  • Issues attention: the repository has 17 open issues and 4 closed issues in the last 90 days, suggesting moderate maintainer engagement but a potentially slow closure rate.

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repository follows the agentskills.io format.

Quality Score

Verified
95/100
Analyzed 1 day ago

Trust Signals

Last commit: 1 day ago
Stars: 27.2k
License: MIT
Status

Similar Extensions

  • Mamba Architecture (score 99) · Skill · Orchestra-Research
    State-space model with O(n) complexity vs Transformers' O(n²). 5× faster inference, million-token sequences, no KV cache. Selective SSM with hardware-aware design. Mamba-1 (d_state=16) and Mamba-2 (d_state=128, multi-head). Models 130M-2.8B on HuggingFace.

  • Rwkv Architecture (score 99) · Skill · Orchestra-Research
    RNN+Transformer hybrid with O(n) inference. Linear time, infinite context, no KV cache. Train like GPT (parallel), infer like RNN (sequential). Linux Foundation AI project. Production at Windows, Office, NeMo. RWKV-7 (March 2025). Models up to 14B parameters.

  • Rwkv Architecture (score 96) · Skill · davila7
    RNN+Transformer hybrid with O(n) inference. Linear time, infinite context, no KV cache. Train like GPT (parallel), infer like RNN (sequential). Linux Foundation AI project. Production at Windows, Office, NeMo. RWKV-7 (March 2025). Models up to 14B parameters.

  • TorchTitan Distributed LLM Pretraining (score 99) · Skill · Orchestra-Research
    Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.

  • Implementing Llms Litgpt (score 98) · Skill · Orchestra-Research
    Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when need clean model implementations, educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.

  • Distributed Llm Pretraining Torchtitan (score 98) · Skill · davila7
    Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.
