
Sparse Autoencoder Training & Analysis

Skill · Verified · Active

Provides guidance for training and analyzing Sparse Autoencoders (SAEs) using SAELens to decompose neural network activations into interpretable features. Use when discovering interpretable features, analyzing superposition, or studying monosemantic representations in language models.

Purpose

To enable researchers and practitioners to discover interpretable features within neural networks by training and analyzing Sparse Autoencoders.

Capabilities

  • Train custom Sparse Autoencoders
  • Load and analyze pre-trained SAEs
  • Decompose neural network activations into sparse features
  • Perform feature attribution and steering
  • Analyze superposition and monosemanticity
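The decomposition of an activation into sparse features can be illustrated without any libraries. The sketch below is a toy ReLU sparse autoencoder in pure Python. It assumes the common formulation f = ReLU(W_enc (x - b_dec) + b_enc) and x_hat = W_dec f + b_dec. The weights, sizes, and function names (`encode`, `decode`, `top_features`) are hypothetical and hand-picked for illustration, and they are not the SAELens API. With SAELens you would instead load a trained SAE and call its encode and decode methods on real model activations.

```python
# Toy ReLU sparse autoencoder in pure Python (illustrative, not the SAELens API).
# d_model = 3 activation dimensions, d_sae = 4 dictionary features.


def matvec(matrix, vec):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def encode(x, W_enc, b_enc, b_dec):
    """f = ReLU(W_enc @ (x - b_dec) + b_enc): one non-negative value per feature."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = matvec(W_enc, centered)
    return [max(0.0, p + b) for p, b in zip(pre, b_enc)]


def decode(f, W_dec, b_dec):
    """x_hat = sum_i f[i] * W_dec[i] + b_dec, where W_dec[i] is feature i's direction."""
    x_hat = list(b_dec)
    for fi, direction in zip(f, W_dec):
        for j, dj in enumerate(direction):
            x_hat[j] += fi * dj
    return x_hat


def top_features(f, k=2):
    """Indices and values of the k most active (non-zero) features."""
    active = [(i, a) for i, a in enumerate(f) if a > 0]
    return sorted(active, key=lambda t: t[1], reverse=True)[:k]


# Hand-picked weights: three axis-aligned features plus one diagonal feature.
W_dec = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]
W_enc = [row[:] for row in W_dec]  # tied encoder, only to keep the toy small
b_enc = [-0.9, -0.9, -0.9, 0.0]    # negative biases suppress weak matches
b_dec = [0.0, 0.0, 0.0]

x = [0.6, 0.8, 0.0]  # an "activation" lying along the diagonal feature
f = encode(x, W_enc, b_enc, b_dec)
x_hat = decode(f, W_dec, b_dec)
print("feature activations:", f)
print("L0 (number of active features):", sum(a > 0 for a in f))
print("top features:", top_features(f))
print("reconstruction:", x_hat)
```

Although the activation has non-zero components on two axes, a single dictionary feature explains it, which is the kind of sparse, more interpretable description an SAE is trained to find.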

Use Cases

  • Discovering interpretable concepts learned by neural networks
  • Analyzing feature interactions and superposition effects
  • Studying safety-relevant features like bias or deception
  • Performing feature-based model steering or ablation experiments
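Steering and ablation both reduce to vector arithmetic with a feature's decoder direction. The following pure-Python sketch assumes that each decoder row is a feature direction and that the feature activations `f` came from encoding `x` with a trained SAE. The helper names `steer` and `ablate` are hypothetical. Steering adds a scaled direction to the activation. Ablation zeroes one feature while keeping the SAE's reconstruction error term, so that only that feature's contribution is removed. In a real experiment this arithmetic would run inside a forward hook on the layer the SAE was trained on (for example, through TransformerLens hooks); that wiring is not shown.

```python
# Feature steering and ablation as vector arithmetic (pure-Python toy).
# Assumes W_dec[i] is feature i's decoder direction and that f would come
# from encoding x with a trained SAE; here f is simply given.


def decode(f, W_dec, b_dec):
    """x_hat = sum_i f[i] * W_dec[i] + b_dec."""
    x_hat = list(b_dec)
    for fi, direction in zip(f, W_dec):
        for j, dj in enumerate(direction):
            x_hat[j] += fi * dj
    return x_hat


def steer(x, W_dec, feature, alpha):
    """Push the activation along one feature's direction by alpha."""
    return [xi + alpha * d for xi, d in zip(x, W_dec[feature])]


def ablate(x, f, W_dec, b_dec, feature):
    """Zero one feature, re-decode, and add back the SAE error (x - x_hat),
    so that only this feature's contribution is removed from x."""
    x_hat = decode(f, W_dec, b_dec)
    error = [xi - hi for xi, hi in zip(x, x_hat)]
    f_ablated = list(f)
    f_ablated[feature] = 0.0
    x_hat_ablated = decode(f_ablated, W_dec, b_dec)
    return [h + e for h, e in zip(x_hat_ablated, error)]


W_dec = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two hypothetical feature directions
b_dec = [0.0, 0.0, 0.0]
x = [2.1, 0.4, 0.05]  # activation; the SAE does not reconstruct it exactly
f = [2.0, 0.5]        # its (given) feature activations

print("steered:", steer(x, W_dec, feature=1, alpha=3.0))
print("ablated:", ablate(x, f, W_dec, b_dec, feature=0))
```

Keeping the error term is a deliberate choice. Without it, ablating any feature would also discard whatever the SAE fails to reconstruct, which conflates the feature's effect with reconstruction error.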

Non-goals

  • Directly modifying neural network architectures beyond SAE integration
  • Performing causal intervention experiments without SAE features
  • Production deployment of steering mechanisms (focus is on analysis)

Workflow

  1. Load model and pre-trained SAE
  2. Get model activations
  3. Encode activations into SAE features
  4. Analyze the features and the reconstruction quality
  5. Optionally, train a custom SAE
  6. Run feature attribution and steering experiments
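Training an SAE trades reconstruction accuracy against sparsity. The following is a minimal sketch of the standard objective. It assumes a squared L2 reconstruction error plus an L1 penalty on feature activations, weighted by a hypothetical `l1_coefficient`. It also computes two common diagnostics: mean L0 and the fraction of variance explained. A training loop would compute this loss on each batch of model activations and take a gradient step on the SAE weights; that loop is not shown. SAELens's own trainer may normalize or weight these terms differently, so treat this as an illustration of the quantities involved rather than its exact loss.

```python
# Standard ReLU-SAE training objective and two common diagnostics (pure Python).
# Assumption: loss = mean_batch(||x - x_hat||^2 + l1_coefficient * sum_i |f_i|).


def squared_error(x, x_hat):
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def sae_loss(xs, x_hats, fs, l1_coefficient):
    """Return (total, reconstruction term, sparsity term), each averaged over the batch."""
    n = len(xs)
    recon = sum(squared_error(x, h) for x, h in zip(xs, x_hats)) / n
    sparsity = sum(sum(abs(a) for a in f) for f in fs) / n
    return recon + l1_coefficient * sparsity, recon, sparsity


def mean_l0(fs):
    """Average number of non-zero features per example."""
    return sum(sum(1 for a in f if a != 0) for f in fs) / len(fs)


def fraction_variance_explained(xs, x_hats):
    """1 - residual sum of squares / total sum of squares around the batch mean."""
    d = len(xs[0])
    mean = [sum(x[j] for x in xs) / len(xs) for j in range(d)]
    residual = sum(squared_error(x, h) for x, h in zip(xs, x_hats))
    total = sum(squared_error(x, mean) for x in xs)
    return 1.0 - residual / total


# A tiny hypothetical batch: inputs, SAE reconstructions, and feature activations.
xs = [[1.0, 0.0], [0.0, 2.0]]
x_hats = [[0.9, 0.0], [0.0, 2.0]]
fs = [[0.9, 0.0, 0.0], [0.0, 2.0, 0.0]]

total, recon, sparsity = sae_loss(xs, x_hats, fs, l1_coefficient=0.1)
print(f"loss={total:.4f} recon={recon:.4f} l1={sparsity:.4f}")
print("mean L0:", mean_l0(fs))
print("fraction of variance explained:", fraction_variance_explained(xs, x_hats))
```

Raising `l1_coefficient` typically lowers L0 at the cost of a higher reconstruction error. Tracking both diagnostics shows where on that tradeoff a given SAE sits.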

Practices

  • Mechanistic Interpretability
  • Feature Engineering
  • Model Analysis

Prerequisites

  • Python 3.10+
  • transformer-lens>=2.0.0
  • torch>=2.0.0
  • sae-lens>=6.0.0
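The prerequisites can be checked before running anything heavy. This sketch uses only the standard library (`importlib.metadata`). It assumes the distribution names shown in the list above, and the helper names are hypothetical. It compares only the numeric release segment of each version, so pre-release and local version tags are ignored. For full PEP 440 comparisons, the `packaging` library's `Version` class is the usual choice.

```python
import re
import sys
from importlib import metadata

# Minimum versions taken from the prerequisites list.
MINIMUMS = {"transformer-lens": "2.0.0", "torch": "2.0.0", "sae-lens": "6.0.0"}


def version_tuple(version):
    """Leading numeric release segment, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare release segments, padding the shorter one with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_prerequisites():
    """Map each requirement to True if it is installed and new enough."""
    report = {"python": sys.version_info >= (3, 10)}
    for dist, minimum in MINIMUMS.items():
        try:
            report[dist] = meets_minimum(metadata.version(dist), minimum)
        except metadata.PackageNotFoundError:
            report[dist] = False
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```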

Installation

Add the marketplace first:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality Score

Verified
98/100
Analyzed 1 day ago

Trust Signals

Last commit: 17 days ago
Stars: 8.3k
License: MIT

Similar Extensions

Sparse Autoencoder Training

98

Provides guidance for training and analyzing Sparse Autoencoders (SAEs) using SAELens to decompose neural network activations into interpretable features. Use when discovering interpretable features, analyzing superposition, or studying monosemantic representations in language models.

Skill
davila7

Embedding Strategies

100

Select and optimize embedding models for semantic search and RAG applications. Use when choosing embedding models, implementing chunking strategies, or optimizing embedding quality for specific domains.

Skill
wshobson

Aws Cdk Development

100

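AWS Cloud Development Kit (CDK) expert for building cloud infrastructure with TypeScript/Python. Use when creating CDK stacks, defining CDK constructs, or implementing infrastructure as code, or when the user mentions CDK, CloudFormation, IaC, cdk synth, or cdk deploy, or wants to define AWS infrastructure programmatically. Covers CDK app structure, construct patterns, stack composition, and deployment workflows.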

Skill
zxkane

Fit Drift Diffusion Model

100

Fit cognitive drift-diffusion models (Ratcliff DDM) to reaction time and accuracy data with parameter estimation (drift rate, boundary separation, non-decision time), model comparison, and parameter recovery validation. Use when modeling binary decision-making with reaction time data, estimating cognitive parameters from experimental data, comparing sequential sampling model variants, or decomposing speed-accuracy tradeoff effects into latent cognitive components.

Skill
pjt222

Ui Ux Pro Max

100

UI/UX design intelligence with searchable style, palette, typography, and chart databases. Use when designing UI components, choosing colors/fonts, reviewing code for UX issues, building landing pages, or implementing responsive layouts.

Skill
spartan-stratos

Google Tts

100

Convert documents and text to audio using Google Cloud Text-to-Speech. Use this skill when the user wants to: narrate a document, read aloud text, generate audio from a file, convert text to speech, create a recording of documentation or analysis, create a podcast from a document, or use Google TTS/text-to-speech. Trigger phrases: "read this aloud", "narrate this", "create a recording", "text to speech", "TTS", "convert to audio", "audio from document", "listen to this", "generate audio", "google tts", "create a podcast".

Skill
sanjay3290