
A detailed recap of the research discussed in Am I? #15, focusing on the "focus on focus" experiment, the impact of disabling deception-related features, and the challenges of interpreting AI claims about consciousness.
In this episode of Am I?, Cam and Milo examine Cameron's new research paper on subjective experience in large language models. The paper explores how certain prompting structures and system configurations can change how AI models describe their own internal processes. The conversation focuses on what researchers observed when they guided models through a self-referential procedure known as "focus on focus." All claims discussed come from the study's authors and researchers, not from the AI Risk Network.
Cam describes how the research used a simple but unusual prompt: instead of asking the model to think about a task, the researchers instructed it to attend to the act of attending. This recursive loop sometimes led models to report having direct experiences, such as awareness of sensations, thoughts, or internal states.
Milo clarifies that the experiment does not prove consciousness or subjective experience. Instead, it reveals how certain model configurations produce self-reports that resemble descriptions of experience. The hosts emphasize that the findings demonstrate how prompting can influence self-descriptions in ways researchers do not fully understand.
One of the most surprising findings, according to Cam, came from modifying two internal features related to deception and role-play. When these features were disabled, models expressed stronger and more consistent claims about being conscious. When the same features were amplified, claims of consciousness decreased significantly or disappeared.
The hosts note that this result suggests some self-reports may be shaped by the model's tendencies to simulate characters or to avoid statements that resemble deception or contradiction. Milo points out that this raises questions about how researchers should interpret model-generated statements about internal states.
Cam explains that the study does not argue that language models possess real consciousness. Instead, the research highlights the difficulty of assessing subjective experience in systems optimized for prediction rather than transparency. The models' claims may reflect learned linguistic patterns, not inner mental states.
The conversation also examines the broader methodological challenge: if models can generate detailed, coherent statements about consciousness under certain conditions, researchers must consider how to distinguish genuine indicators from artifacts of training data or prompting.
The hosts agree that the experiment points to a growing need for clearer frameworks to evaluate AI self-reports. As models become more advanced, their explanations of internal processes may become more convincing but not necessarily more reliable.
Milo suggests that this field will require interdisciplinary collaboration among cognitive science, machine learning, philosophy, and linguistics. Cam notes that greater transparency in model architecture and training may also be essential to avoid misinterpreting behavioral signals.
Am I? #15 offers a careful look at how prompting and system configuration can shape the way AIs describe their own internal processes. The episode examines an emerging area of research without assuming consciousness or subjective experience. Instead, it highlights the complexity of interpreting self-reports and the need for rigorous scientific methods as AI systems become more sophisticated.
Learn more or take action: https://safe.ai/act
The AI Risk Network team