Circuit-Tracer: A New Library for Finding Feature Circuits

Open Access
Authors
Publication date 2025
Host editors
  • Yonatan Belinkov
  • Aaron Mueller
  • Najoung Kim
  • Hosein Mohebbi
  • Hanjie Chen
  • Dana Arad
  • Gabriele Sarti
Book title The 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Book subtitle BlackboxNLP 2025 : proceedings of the workshop: November 9, 2025
ISBN
  • 9798891763463
Event 8th BlackboxNLP Workshop
Pages (from-to) 239-249
Number of pages 11
Publisher Kerrville, TX: Association for Computational Linguistics
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
Feature circuits aim to shed light on LLM behavior by identifying the features that are causally responsible for a given LLM output, and connecting them into a directed graph, or *circuit*, that explains how both each feature and each output arose. However, performing circuit analysis is challenging: the tools for finding, visualizing, and verifying feature circuits are complex and spread across multiple libraries. To facilitate feature-circuit finding, we introduce `circuit-tracer`, an open-source library for efficient identification of feature circuits. `circuit-tracer` provides an integrated pipeline for finding, visualizing, annotating, and performing interventions on such feature circuits, tested with various model sizes, up to 14B parameters. We make `circuit-tracer` available to both developers and end users, via integration with tools such as Neuronpedia, which provides a user-friendly interface.
Document type Conference contribution
Language English
Published at https://doi.org/10.18653/v1/2025.blackboxnlp-1.14
Downloads
2025.blackboxnlp-1.14 (Final published version)