A bilingual field guide to Intellitons

Welcome to the Intellitons Blog. This GitHub Pages site turns the project’s research outputs into readable popular-science essays about a bold claim: large language models may contain stable, recurrent collective modes that can be tracked across layers, prompts, and generation steps.

The articles are based on the code and results in the Intelliton repository. They are written for readers who want a rigorous but accessible explanation of what the project is measuring, why the physics-flavored coordinate system is useful, and how to read the results without over-literalizing the particle metaphor.

  • Start with the first post if you want the core idea in plain language: residual streams are not being treated as literal matter, but as signals rewritten in a familiar lattice-field coordinate system.
  • The next post explains how to read an Intelliton report in human terms, including what I_0 to I_4 are doing and what momentum, spin-like complexity, mass, and helicity actually mean.
  • The newest explainer focuses on tasks: why pronoun tracking, arithmetic, factual recall, logical reasoning, and syntactic agreement light up different internal modes even though they all look like simple next-token prediction from the outside.
  • The three research-direction posts connect the Intelliton framework to the Abliteration / ARA jailbreaking technique demonstrated on Gemma 4 in April 2026, and propose how the Intelliton species catalogue can serve as an alignment audit and steering instrument.

How to use the site

  • Use the language toggle in the top-right corner to switch every article between English and Chinese.
  • Your preferred language is saved locally in the browser and reused on later visits.
  • The first five posts form a short introductory series; the three posts from 6 April onward form a research-directions series on alignment and representation engineering.

一份面向双语读者的 Intelliton 导读

欢迎来到 Intellitons 博客。这个 GitHub Pages 站点把项目中的研究结果整理成更易读的科普文章,围绕一个大胆的问题展开:大语言模型内部是否存在能够跨层、跨提示词、跨生成步骤稳定追踪的集体模式。

这些文章基于 Intelliton 仓库中的代码与实验结果撰写,面向希望理解 AI 内部机制、但不必预先掌握高深物理或数学背景的读者。重点不是把模型说成真的有粒子,而是解释项目究竟测量了什么、为什么这套“物理坐标系”有帮助,以及应该怎样读图、读表、读模式。

推荐阅读顺序

  • 第一篇先把核心想法讲清楚:这里不是说模型里有真实粒子,而是把残差流换到一套更熟悉的晶格场论坐标系里重新描述。
  • 第二篇专门教你“怎么看谱表”,把 I_0 到 I_4、动量、类自旋、质量、螺旋度这些词都翻译成人话。
  • 最新的一篇任务导读会解释:为什么代词跟踪、算术、事实回忆、逻辑推理和句法一致性,会点亮不同的内部模式。
  • 之后的三篇研究方向文章,把 Intelliton 框架与 2026 年 4 月 Gemma 4 越狱事件中展示的 Abliteration/ARA 技术联系起来,并提出如何把 Intelliton 物种目录用作对齐审计和引导工具。

使用方式

  • 右上角的语言切换按钮可以在整站范围内切换英文与中文。
  • 浏览器会在本地保存你的语言偏好,下次访问时自动沿用。
  • 前五篇构成入门系列,4 月 6 日起的三篇构成关于对齐与表征工程的研究方向系列。

Latest Articles

最新文章

Each article includes complete English and Chinese versions, and your preferred language is remembered locally.

每篇文章都提供完整的中英文版本,页面会在本地记住你的语言偏好。

2026-04-08

Safety Alignment Through the Intelliton Lens: Toward Structural Guarantees 用 Intelliton 视角看安全对齐:迈向结构性保证

RLHF-based alignment has been shown to be a thin spectral overlay that can be removed in minutes on any open-source model. This article argues that the Intelliton framework offers a route toward something more robust: structural alignment — where safety-relevant modes are architecturally entangled with capability modes, making removal costly rather than free.

基于 RLHF 的对齐已经被证明只是一层薄薄的谱覆盖,可以在几分钟内从任何开源模型上被移除。本文认为,Intelliton 框架提供了一条通向更稳健方向的道路:结构性对齐——在这种对齐中,安全相关模式在架构层面与能力模式相互纠缠,从而使移除变得代价高昂而非轻而易举。

Read article 阅读文章

2026-04-07

Representation Engineering and Intelliton Steering: A Research Proposal 表征工程与 Intelliton 引导:一份研究提案

Representation engineering intervenes directly on a model's internal activations to steer its behaviour — without fine-tuning. The Intelliton framework provides a natural language for describing those interventions: they are changes to specific Intelliton species. This article proposes a research direction that turns the Intelliton species catalogue into a steering map.

表征工程在不做微调的前提下,直接对模型的内部激活进行干预,从而引导模型行为。Intelliton 框架为描述这些干预提供了一套自然语言:它们就是对特定 Intelliton 物种的改变。本文提出一个研究方向,将 Intelliton 物种目录变成一张可操作的引导地图。
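The intervention described in this teaser can be sketched in a few lines. This is a minimal illustration of linear activation steering in general, not code from the Intelliton repository: a hidden-state vector is shifted along a chosen direction with some strength, and the names `steer`, `hidden`, `steer_vec`, and `alpha` are all illustrative.

```python
import numpy as np

def steer(hidden: np.ndarray, steer_vec: np.ndarray, alpha: float) -> np.ndarray:
    """Shift an activation vector along a unit steering direction.

    hidden:    a residual-stream activation at one layer/position
    steer_vec: the direction associated with the behaviour to amplify
    alpha:     intervention strength (negative values suppress)
    """
    unit = steer_vec / np.linalg.norm(steer_vec)
    return hidden + alpha * unit

# Toy usage: nudge a 4-dimensional activation along the second axis.
h = np.array([1.0, 0.0, 2.0, -1.0])
v = np.array([0.0, 1.0, 0.0, 0.0])
h_steered = steer(h, v, alpha=3.0)
```

In a real model the same addition would be applied inside a forward hook at a chosen layer; the point of the sketch is only that the intervention is a single vector addition, which is why a catalogue of directions can double as a steering map.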

Read article 阅读文章

2026-04-06

Refusal as an Intelliton: What Abliteration Reveals About Alignment Modes 拒绝即 Intelliton:Abliteration 揭示的对齐模式

The Abliteration jailbreak works by locating and erasing a "refusal direction" in the residual stream. That direction is, by the Intelliton framework's own definition, a linear mode of the residual stream — an Intelliton. This article proposes a research direction: use the Intelliton toolkit to characterise refusal as a species, and ask whether alignment modes are measurably distinct from task modes.

Abliteration 越狱的原理是定位并抹去残差流中的“拒绝方向”。按照 Intelliton 框架的定义,这个方向本身就是残差流的一个线性模式——也就是一个 Intelliton。本文提出一个研究方向:用 Intelliton 工具套件把拒绝刻画为一个物种,并进一步追问,对齐模式能否被测量为在统计上与任务模式明显不同的 Intelliton 类群。
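The erasure step this teaser refers to has a simple geometric core. As a hedged sketch (not the actual Abliteration implementation, and with the names `ablate_direction`, `hidden`, and `direction` chosen for illustration), removing a refusal direction amounts to projecting each activation onto the subspace orthogonal to that direction:

```python
import numpy as np

def ablate_direction(hidden: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of `hidden` along `direction`.

    After this projection the activation carries no signal along the
    (unit-normalised) direction, which is the geometric heart of the
    "erase a refusal direction" operation.
    """
    unit = direction / np.linalg.norm(direction)
    return hidden - np.dot(hidden, unit) * unit

# Toy usage: erase the first-axis component of a 3-dimensional activation.
h = np.array([2.0, 1.0, -1.0])
r = np.array([1.0, 0.0, 0.0])
h_clean = ablate_direction(h, r)  # now orthogonal to r
```

That a single rank-one projection suffices is exactly why the article treats refusal as a linear mode: if an alignment behaviour lives in one direction, it can be characterised, and removed, with the same linear tools used for any other Intelliton.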

Read article 阅读文章

2026-04-05

Why Different Prompts Light Up Different Intellitons 为什么不同提示词会点亮不同 Intelliton 模式

All prompt categories look like next-token prediction from the outside, but inside the model they ask for different kinds of work. This article uses the project's five prompt families to explain why different Intelliton modes become active.

从外面看,五类提示词都像“预测下一个 token”;但从模型内部看,它们要求的工作完全不同。本文结合项目里的五类提示词,解释为什么不同 Intelliton 模式会被点亮。

Read article 阅读文章

2026-04-04

Hallucination as Internal Instability: An Intelliton Perspective 把幻觉理解为内部不稳定性:一种 Intelliton 视角

Hallucination — when a language model confidently produces false or unsupported information — is one of the most pressing practical problems in LLM research. This article explores what the Intelliton framework reveals about hallucination: not as an output-level mistake, but as an instability of internal collective modes during generation.

幻觉,也就是语言模型自信地生成错误或缺乏依据的信息,是 LLM 研究中最紧迫的实际问题之一。本文讨论 Intelliton 框架对幻觉的启示:它不只是输出层面的失误,更可能是生成过程中内部集体模式的不稳定。

Read article 阅读文章

2026-04-03

Scaling and Alignment Through the Intelliton Lens 用 Intelliton 视角看规模扩展与对齐

What happens to a model's internal quasi-particle spectrum when you double the parameter count? What does instruction tuning do to the excitation landscape? This article compares four Qwen3 models — 4B vs 8B, Base vs Instruct — and adds Mistral-7B-v0.3 for a cross-family perspective.

当参数规模翻倍时,模型内部的准粒子谱会发生什么变化?指令微调又会怎样改写激发景观?本文比较四个 Qwen3 模型:4B 对 8B、Base 对 Instruct,并加入 Mistral-7B-v0.3 做跨家族对照。

Read article 阅读文章

2026-04-02

How to Read `I_0` to `I_4`: A Human Guide to an Intelliton Spectrum 怎么看 `I_0` 到 `I_4`:一份 Intelliton 谱表的人话读法

Spectrum tables can look intimidating. This article translates a representative Intelliton report into ordinary language and explains what `I_0` to `I_4` are doing without over-reading the physics metaphor.

Intelliton 谱表很容易把人吓住。本文把一份代表性的分析报告翻译成人话,解释 `I_0` 到 `I_4` 分别像什么,以及动量、类自旋、质量、螺旋度这些词到底该怎么读。

Read article 阅读文章

2026-04-01

What Are Intellitons? A Friendly Guide to the Lattice-Field View 什么是 Intelliton?一篇看懂晶格场论视角的入门文

Intellitons are not claims that language models literally contain particles. They are a practical way to redescribe the residual stream using a lattice-field coordinate system that makes recurring modes easier to see, compare, and talk about.

Intelliton 不是说语言模型里真的有粒子,而是把残差流换到一套晶格场论式的坐标系里重新描述,好让那些反复出现、能跨层传播的模式更容易被看见、比较和解释。

Read article 阅读文章