Transformer Lens Interpretability

Part of: Agent Native Research Artifact (ARA) Tooling

Provides guidance for mechanistic interpretability research using TransformerLens to inspect and manipulate transformer internals via HookPoints and activation caching. Use when reverse-engineering model algorithms, studying attention patterns, or performing activation patching experiments.

Purpose

Help researchers and practitioners inspect and manipulate transformer model internals with the TransformerLens library for mechanistic interpretability studies.

Features

Inspect and manipulate transformer internals via HookPoints
Perform activation caching and patching experiments
Analyze attention patterns and information flow
Reverse-engineer learned model algorithms
Work with 50+ supported transformer models, including LLaMA and Mistral

Use Cases

Reverse-engineering algorithms learned by transformer models
Performing activation patching and causal tracing experiments
Studying attention patterns and information flow within models
Analyzing specific circuits like induction heads or IOI circuits

Non-Goals

Working with non-transformer architectures
Training or analyzing Sparse Autoencoders
Remote execution on massive models requiring specialized infrastructure
Higher-level causal intervention abstractions better suited to other libraries

Installation

First, add the marketplace:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs

Then install the plugin:

/plugin install AI-Research-SKILLs@ai-research-skills

Quality Score

Verified

99/100

Analyzed about 20 hours ago

Trust Signals

Last commit: 16 days ago
GitHub owner: Orchestra-Research
Stars: 8.3k
Downloads: 0
License: MIT
Website: orchestra-research.com
