Latest Articles
最新文章
Each article includes complete English and Chinese versions, and your preferred language is remembered locally.
每篇文章都提供完整的中英文版本,页面会在本地记住你的语言偏好。
2026-04-08
RLHF-based alignment has been shown to be a thin spectral overlay that can be removed in minutes on any open-source model. This article argues that the Intelliton framework offers a route toward something more robust: structural alignment — where safety-relevant modes are architecturally entangled with capability modes, making removal costly rather than free.
基于 RLHF 的对齐已经被证明只是一层薄薄的谱覆盖,可以在几分钟内从任何开源模型上被移除。本文认为,Intelliton 框架提供了一条通向更稳健方向的道路:结构性对齐——在这种对齐中,安全相关模式在架构层面与能力模式相互纠缠,从而使移除变得代价高昂而非轻而易举。
Read article
阅读文章
2026-04-07
Representation engineering intervenes directly on a model's internal activations to steer its behaviour — without fine-tuning. The Intelliton framework provides a natural language for describing those interventions: they are changes to specific Intelliton species. This article proposes a research direction that turns the Intelliton species catalogue into a steering map.
表征工程在不做微调的前提下,直接对模型的内部激活进行干预,从而引导模型行为。Intelliton 框架为描述这些干预提供了一套自然语言:它们就是对特定 Intelliton 物种的改变。本文提出一个研究方向,将 Intelliton 物种目录变成一张可操作的引导地图。
Read article
阅读文章
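One common way to implement the activation-level interventions this teaser describes is a difference-of-means steering vector added to the residual stream. A minimal numpy sketch of that generic recipe, not of the Intelliton tooling itself (all names and shapes here are illustrative assumptions):

```python
import numpy as np

def steering_vector(acts_pos: np.ndarray, acts_neg: np.ndarray) -> np.ndarray:
    """Difference-of-means direction between two contrasting prompt sets.

    acts_pos / acts_neg: (n_prompts, d_model) activations captured at one layer.
    """
    return acts_pos.mean(axis=0) - acts_neg.mean(axis=0)

def steer(h: np.ndarray, v: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Nudge a single activation along the steering direction, no fine-tuning."""
    return h + alpha * v

# Toy example: two activation clusters separated along the first axis.
pos = np.array([[1.0, 0.0], [1.2, 0.1]])
neg = np.array([[-1.0, 0.0], [-1.2, -0.1]])
v = steering_vector(pos, neg)          # points from the "neg" cluster to "pos"
steered = steer(np.zeros(2), v, alpha=0.5)
```

In the framework's vocabulary, `v` would correspond to a direction associated with one Intelliton species, and `alpha` controls how strongly that species is excited or suppressed.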
2026-04-06
The Abliteration jailbreak works by locating and erasing a "refusal direction" in the residual stream. That direction is, by the Intelliton framework's own definition, a linear mode of the residual stream — an Intelliton. This article proposes a research direction: use the Intelliton toolkit to characterise refusal as a species, and ask whether alignment modes are measurably distinct from task modes.
Abliteration 越狱的原理是定位并抹去残差流中的"拒绝方向"。按照 Intelliton 框架的定义,这个方向本身就是残差流的一个线性模式——也就是一个 Intelliton。本文提出一个研究方向:用 Intelliton 工具套件把拒绝刻画为一个物种,并进一步追问,对齐模式作为 Intelliton 类群,是否与任务模式存在可测量的差异。
Read article
阅读文章
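The "erase" step this teaser refers to is, mechanically, an orthogonal projection: remove from each activation the component that lies along the refusal direction. A hedged numpy sketch of that projection (the direction `r` here is a synthetic stand-in, not an actual extracted refusal direction):

```python
import numpy as np

def ablate_direction(h: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Project the component along direction r out of activation h."""
    r_hat = r / np.linalg.norm(r)
    return h - (h @ r_hat) * r_hat

# Toy check: after ablation, h has no component left along r.
h = np.array([3.0, 4.0])
r = np.array([1.0, 0.0])
h_ablated = ablate_direction(h, r)
```

If refusal really is a single linear mode, this one projection removes it everywhere it appears, which is exactly why the question of whether alignment modes are separable from task modes matters.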
2026-04-05
All prompt categories look like next-token prediction from the outside, but inside the model they ask for different kinds of work. This article uses the project's five prompt families to explain why different Intelliton modes become active.
从外面看,五类提示词都像“预测下一个 token”;但从模型内部看,它们要求的工作完全不同。本文结合项目里的五类提示词,解释为什么不同 Intelliton 模式会被点亮。
Read article
阅读文章
2026-04-04
Hallucination — when a language model confidently produces false or unsupported information — is one of the most pressing practical problems in LLM research. This article explores what the Intelliton framework reveals about hallucination: not as an output-level mistake, but as an instability of internal collective modes during generation.
幻觉,也就是语言模型自信地生成错误或缺乏依据的信息,是 LLM 研究中最紧迫的实际问题之一。本文讨论 Intelliton 框架对幻觉的启示:它不只是输出层面的失误,更可能是生成过程中内部集体模式的不稳定。
Read article
阅读文章
2026-04-03
What happens to a model's internal quasi-particle spectrum when you double the parameter count? What does instruction tuning do to the excitation landscape? This article compares four Qwen3 models — 4B vs 8B, Base vs Instruct — and adds Mistral-7B-v0.3 for a cross-family perspective.
当参数规模翻倍时,模型内部的准粒子谱会发生什么变化?指令微调又会怎样改写激发景观?本文比较四个 Qwen3 模型:4B 对 8B、Base 对 Instruct,并加入 Mistral-7B-v0.3 做跨家族对照。
Read article
阅读文章
2026-04-02
Spectrum tables can look intimidating. This article translates a representative Intelliton report into ordinary language, explaining what `I_0` to `I_4` are doing and how to read terms like momentum, spin-like quantum number, mass, and helicity without over-reading the physics metaphor.
Intelliton 谱表很容易把人吓住。本文把一份代表性的分析报告翻译成人话,解释 `I_0` 到 `I_4` 分别像什么,以及动量、类自旋、质量、螺旋度这些词到底该怎么读。
Read article
阅读文章
2026-04-01
Intellitons are not claims that language models literally contain particles. They are a practical way to redescribe the residual stream using a lattice-field coordinate system that makes modes which recur and propagate across layers easier to see, compare, and talk about.
Intelliton 不是说语言模型里真的有粒子,而是把残差流换到一套晶格场论式的坐标系里重新描述,好让那些反复出现、能跨层传播的模式更容易被看见、比较和解释。
Read article
阅读文章