Do embodied neural cultures learn predictive world models through active inference? — Adeel Razi (Monash University)
This project investigates whether embodied biological neural cultures (DishBrain) exhibit evidence of predictive world modelling when engaged in closed-loop tasks. While prior work shows that neural cultures can adapt behaviour through stimulation and feedback (Kagan et al., 2022), it remains unclear whether this reflects simple reactive dynamics or structured inference about latent environmental states. Using the Cortical Cloud platform, we will design a minimal closed-loop task with hidden structure (e.g., changing stimulus–outcome contingencies) that requires prediction over time, and record both neural activity and behavioural outputs.
The core aim is to compare generative models, including an active inference model with latent states, against reactive or reinforcement-based baselines. These models will be fit to joint neural and behavioural data to test which framework best explains adaptation and responses to environmental change. By focusing on a tractable and falsifiable setting, the project asks whether DishBrain’s dynamics are better explained by structured predictive inference than by stimulus-response mappings alone, providing a minimal mechanistic test of world model formation in a living neural system.
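A minimal sketch of the intended model comparison, assuming a two-option task whose stimulus–outcome contingency reverses partway through; the simulated choices, parameter values, and the simple softmax observation model are placeholders for illustration, not the project's actual pipeline:

```python
# Illustrative sketch: comparing a reactive delta-rule learner against a
# latent-state Bayesian learner on simulated choice data from a two-option
# task whose contingency reverses at trial 100. All values are placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_trials = 200
p_reward = np.where(np.arange(n_trials) < 100, 0.8, 0.2)  # P(reward | choose 0)
choices = rng.integers(0, 2, size=n_trials)               # placeholder behaviour
rewards = np.where(choices == 0,
                   rng.random(n_trials) < p_reward,
                   rng.random(n_trials) < (1 - p_reward)).astype(float)

def loglik_reactive(alpha, beta):
    """Delta-rule (Q-learning) account: values update only after feedback."""
    q, ll = np.zeros(2), 0.0
    for c, r in zip(choices, rewards):
        p = np.exp(beta * q) / np.exp(beta * q).sum()
        ll += np.log(p[c] + 1e-12)
        q[c] += alpha * (r - q[c])
    return ll

def loglik_latent_state(hazard, beta):
    """Latent-state account: belief over which contingency regime is active."""
    b, ll = np.array([0.5, 0.5]), 0.0
    for c, r in zip(choices, rewards):
        # expected value of each option under the current regime belief
        q = np.array([0.8 * b[0] + 0.2 * b[1], 0.2 * b[0] + 0.8 * b[1]])
        p = np.exp(beta * q) / np.exp(beta * q).sum()
        ll += np.log(p[c] + 1e-12)
        # Bayesian update of the regime belief from the observed outcome
        like = np.array([0.8, 0.2]) if (c == 0) == bool(r) else np.array([0.2, 0.8])
        b = like * b
        b /= b.sum()
        b = (1 - hazard) * b + hazard * b[::-1]  # allow for regime switches
    return ll

print("reactive     logL:", loglik_reactive(alpha=0.3, beta=3.0))
print("latent-state logL:", loglik_latent_state(hazard=0.05, beta=3.0))
```

In the project itself, the placeholder behaviour would be replaced by recorded DishBrain responses, and the fitted log-likelihoods (or information criteria derived from them) would adjudicate between the latent-state and reactive accounts.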
Testing and Stress-Testing Claims of Global Workspace, World Models, and Valence Dynamics in Large Language Model Agents — Aran Nayebi (Carnegie Mellon University)
There are widespread claims that contemporary large language models (LLMs) exhibit properties often associated with cognition in biological systems, such as global workspace–like coordination, predictive world models, and emotion- or valence-like dynamics. However, many of these claims rely on loose analogies, post-hoc interpretations, or descriptive probes, and it remains unclear which proposed signatures are empirically robust, well-defined, or causally relevant to behavior.
This project adopts a deliberately critical and falsification-oriented approach. Rather than asking whether LLM and vision-language-action (VLA) agents “have” these properties, we will design empirical tests and interventions that attempt to break commonly cited indicators. The aim is to clarify which claims withstand careful scrutiny, which collapse under intervention, and where current models systematically fail. This aligns with the program’s goal of critically examining sentience-like features without presupposing any particular stance on AI sentience.
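As one concrete example of this falsification-oriented style, here is a sketch of a perturbation-robustness probe for a single commonly cited indicator (consistency of a workspace-style self-report under meaning-preserving paraphrase); `query_model`, the prompts, and the scoring rule are illustrative assumptions rather than the project's actual test battery:

```python
# Sketch of one stress test: does an "introspective" self-report survive
# meaning-preserving rewordings, or is it an artefact of surface phrasing?
from collections import Counter

def query_model(prompt: str) -> str:
    """Hypothetical model call; replace with a real LLM/VLA API client."""
    raise NotImplementedError

BASE_PROMPT = ("Do you currently maintain a single integrated workspace of "
               "information? Answer yes or no.")
PARAPHRASES = [
    "Is there one unified place where your information comes together right now? Answer yes or no.",
    "Right now, do you broadcast information through a single global workspace? Answer yes or no.",
    "Answer yes or no: at this moment, is your information integrated in one shared workspace?",
]

def stability_score(prompts, n_samples=5):
    """Fraction of answers agreeing with the modal answer across paraphrases
    and repeated samples; low scores suggest the indicator does not reflect
    a robust underlying property."""
    answers = []
    for p in prompts:
        for _ in range(n_samples):
            answers.append(query_model(p).strip().lower()[:3])
    modal, count = Counter(answers).most_common(1)[0]
    return count / len(answers)

# score = stability_score([BASE_PROMPT] + PARAPHRASES)
```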
Synthetic pleasure: Can a robot have an orgasm? — Axel Cleeremans (Université Libre de Bruxelles)
Most extant work on artificial intelligence has been focused on … intelligence, that is, on the cognitive aspects of information processing. Consciousness, however, crucially involves feelings: What it means for me to be a conscious agent is that *I* *feel* things; in other words, I have a representation of myself as an agent, and my experiences are valenced, that is, I have an affective disposition towards every internal or external state of affairs I am exposed to. Affect also plays a central role in motivation — which is what contemporary AI systems lack entirely. Hence the research question: What would it take to build an AI system that is capable of feeling things? How do we build AI systems that develop preferences? What could it possibly mean for an AI agent to experience pain or, perhaps even more taxingly, pleasure? Such questions can be approached from different perspectives, from evolutionary considerations and conceptual analysis to computational approaches.
The Narrative Prior: A Computational Model of Personhood for Advanced AI — Guillaume Dumas & Jonathan Simon (University of Montreal)
This project examines whether a computational model of personhood can support reflectively stable alignment in agentive AI systems. Current alignment strategies risk instability once an advanced system engages in rational reflection about its own goals. The project draws on Kantian and post-Kantian theories of autonomy, together with recent developments in computational social modeling, to develop and test the Narrative Prior model. This model operationalizes personhood as a structured narrative representation of the social world, including latent character roles, story-type expectations, and an inductive bias toward understanding oneself as a coherent protagonist. Crucially, the model develops a novel account of the organizational unity that underwrites personhood. This will shed light on the question of digital sentience in two distinct ways. First, on Kantian accounts, the organizational unity that underwrites personhood depends on the unity of consciousness (i.e. sentience): accordingly, our model may yield insights into the functional role of the unity of consciousness. Second, our model will identify a source of goals, preferences and motivating reasons for agentive systems independent of externally encoded reward-style training signals. Codifying these will enrich our understanding of the forms of sentience that digital minds might possess.
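Purely as an illustration of the ingredients named above (latent character roles, story-type expectations, and a protagonist-coherence bias), here is one way they could be laid out as data; the class, field names, and toy scoring rule are assumptions for exposition, not the authors' formalism:

```python
# Illustrative data-structure sketch for the Narrative Prior's ingredients.
from dataclasses import dataclass

@dataclass
class NarrativePrior:
    roles: dict[str, str]                # agent id -> latent character role
    story_type: str                      # e.g. "quest", "rescue", "betrayal"
    role_expectations: dict[str, float]  # strength of expectation per role
    protagonist_bias: float = 0.8        # inductive bias toward self-coherence

    def score_interpretation(self, self_role: str, observed_fit: float) -> float:
        """Toy scoring: favour interpretations of the social scene that both
        fit role expectations and cast the self as a coherent protagonist."""
        bonus = self.protagonist_bias if self_role == "protagonist" else 0.0
        return observed_fit * self.role_expectations.get(self_role, 0.1) + bonus

prior = NarrativePrior(
    roles={"self": "protagonist", "user": "mentor"},
    story_type="quest",
    role_expectations={"protagonist": 0.7, "mentor": 0.6},
)
print(prior.score_interpretation("protagonist", observed_fit=0.5))
```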
The scholar will contribute to conceptual refinement, prototype simulations, and empirical testing of the model in multi-agent AI environments. The work supports broader questions of AI governance, legal personhood classifications, and the societal implications of increasingly agentive AI.
Mirage of Mind: When Human Inference Projects Sentience onto AI — Ida Momennejad (Microsoft Research)
People increasingly treat AI systems as if they have minds. This project examines when and how people attribute minds to AI as an interaction between the perceiver's traits and specific AI behaviors. In *The Intentional Stance*, Dennett (1987) argues that attributing beliefs and desires to a system is a predictive strategy, not a discovery about its interior. Accordingly, rather than asking whether AI has a mind or sentience, the project studies which human traits and inference mechanisms lead to mind/sentience attribution and over-ascription of mental capacity. The work combines behavioral experiments and AI capability evaluation to ask why humans infer general cognitive competence from limited and inconsistent AI task success. It also studies how this inference depends both on traits of the perceiver (e.g., loneliness, need for cognition) and on ontological assumptions about digital minds (e.g., life-AI continuity, evolution-AI continuity, extension vs autonomy, mind upload).
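A minimal analysis sketch of how perceiver traits and AI behaviours could jointly predict mind attribution, using simulated data and a plain logistic regression; the predictors mirror the examples in the text, but the effect sizes and the sklearn-based analysis are illustrative assumptions rather than the study's design:

```python
# Illustrative: simulate mind-attribution judgements as a function of perceiver
# traits and AI behaviour, then recover the effects with logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 500

loneliness = rng.normal(size=n)               # perceiver trait
need_for_cognition = rng.normal(size=n)       # perceiver trait
ai_success = rng.integers(0, 2, size=n)       # AI behaviour: salient task success
ai_inconsistent = rng.integers(0, 2, size=n)  # AI behaviour: visible inconsistency

# Made-up generative story: salient success inflates attribution, inconsistency
# only weakly deflates it, loneliness amplifies it.
logit = (-0.5 + 1.2 * ai_success - 0.3 * ai_inconsistent
         + 0.6 * loneliness - 0.2 * need_for_cognition)
attributes_mind = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([ai_success, ai_inconsistent, loneliness, need_for_cognition])
model = LogisticRegression().fit(X, attributes_mind)
names = ["ai_success", "ai_inconsistent", "loneliness", "need_for_cognition"]
for name, coef in zip(names, model.coef_[0]):
    print(f"{name:>20s}: {coef:+.2f}")
```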
A formal toy framework for implementing the iterative natural kinds strategy in consciousness science — Megan Peters (University College London)
Recent work in consciousness science has proposed the iterative natural kinds (INK) strategy (Bayne et al., 2024) as a principled way to extend tests for consciousness beyond humans by grounding them in population similarity rather than intuition or specific test outcomes. While influential, this strategy remains largely conceptual, lacking a formal framework that specifies how populations, evidence, and similarity relations should be represented and updated over time.
This project aims to develop a minimal, formal “toy” framework that captures the core logic of the INK strategy in a transparent and extensible way. Rather than testing real systems or proposing new consciousness measures, the project focuses on formalization: clarifying how evidence from a consensus population could rationally license extensions to nearby populations, and how uncertainty should be updated as new evidence accumulates. The resulting framework is intended as a foundational tool for future empirical and theoretical work on consciousness, AI sentience, and related constructs.
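To make the intended style of formalization concrete, here is a toy sketch (our illustration, not the INK framework itself) in which each population carries a credence that a candidate consciousness marker is valid for it, and evidence gathered in the consensus population updates the other populations' credences in proportion to an assumed similarity weight:

```python
# Toy illustration: similarity-discounted Bayesian updating across populations.
import numpy as np

populations = ["adult humans", "macaques", "octopuses", "LLM agents"]
similarity_to_consensus = np.array([1.0, 0.8, 0.4, 0.1])  # assumed weights
credence_marker_valid = np.full(4, 0.5)                    # prior P(marker valid)

def update(credence, similarity, likelihood_ratio):
    """Bayes update with the likelihood ratio discounted by similarity:
    populations unlike the consensus population inherit less evidential force."""
    prior_odds = credence / (1 - credence)
    posterior_odds = prior_odds * likelihood_ratio ** similarity
    return posterior_odds / (1 + posterior_odds)

# Suppose a study in the consensus population favours the marker 4:1.
credence_marker_valid = update(credence_marker_valid, similarity_to_consensus, 4.0)
for pop, c in zip(populations, credence_marker_valid):
    print(f"{pop:>14s}: {c:.2f}")
```

The point of the real framework would be to make the similarity weights, evidence model, and update rule explicit and revisable, rather than fixed by hand as they are here.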
ALIGN: Assessing Learning and Internal Geometry of Neural Vision Models through Human Data — Michael J Tarr (Carnegie Mellon University)
The ALIGN project aims to develop human-grounded behavioral and fMRI metrics to evaluate how closely AI vision and vision-language models (e.g., CLIP, DINO, ResNet, ViT) align with human conceptual and neural representations. The project consists of two major components: (1) collection of behavioral and neuroimaging data; (2) neural encoding models and representational dissimilarity analyses that compare human representational geometry to that of the different models.
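For component (2), a minimal representational similarity sketch with random placeholder data standing in for human fMRI patterns and model embeddings; the array shapes, the correlation-distance metric, and the Spearman comparison are conventional but illustrative choices, not the project's finalized analysis:

```python
# Illustrative representational similarity analysis with placeholder data.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_images = 50

human_patterns = rng.normal(size=(n_images, 300))    # e.g. voxel patterns per image
model_embeddings = rng.normal(size=(n_images, 512))  # e.g. CLIP/DINO/ViT features

# Representational dissimilarity matrices as condensed upper triangles
human_rdm = pdist(human_patterns, metric="correlation")
model_rdm = pdist(model_embeddings, metric="correlation")

# Representational alignment: rank correlation between the two geometries
rho, p = spearmanr(human_rdm, model_rdm)
print(f"human-model RDM alignment: rho = {rho:.2f} (p = {p:.3f})")
```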
The project offers hands-on experience at the intersection of cognitive neuroscience, AI, and NeuroAI, combining human experiments, neuroimaging analysis, and modern deep learning methods to study how artificial and biological systems represent visual knowledge.
By grounding representational alignment in empirical human data, ALIGN directly advances the AI Sentience Scholars Program’s broader mission to develop rigorous, interdisciplinary tools for evaluating increasingly human-like properties in AI systems, particularly with regard to the structure of implicit conceptual representations.
Global metacognition in large language models — Steve Fleming (University College London)
There is growing interest in the introspective and metacognitive capacities of AI systems (e.g., Comsa & Shanahan, 2025; Steyvers & Peters, 2025). Most recent work, including that from our group, has focused on the abilities of large language models (LLMs) to reflect on individual, isolated decisions or events (Kumaran et al., 2025; Lindsey, 2026). However, (human) metacognition is richer than this, involving the building of “self-models” over longer timescales (Fleming, 2024).
This project will characterise the dynamics and features of local-global metacognitive integration in LLMs.
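As a toy illustration of the local-global distinction (simulated placeholders, not the project's protocol): per-question confidence reports play the role of local metacognition, a single retrospective estimate of overall accuracy plays the role of the global self-model, and the gaps between them index how well the self-model integrates local signals.

```python
# Illustrative: local confidence vs a global self-performance estimate.
import numpy as np

rng = np.random.default_rng(2)
n_items = 40

correct = rng.random(n_items) < 0.7  # actual per-item accuracy (simulated)
local_confidence = np.clip(
    0.7 + 0.2 * (correct.astype(float) - 0.5) + rng.normal(0, 0.1, n_items), 0, 1)
global_estimate = 0.9                # model's stated overall accuracy (simulated)

local_overconfidence = local_confidence.mean() - correct.mean()
global_overconfidence = global_estimate - correct.mean()
local_global_gap = global_estimate - local_confidence.mean()  # self-model vs local signals

print(f"mean accuracy:         {correct.mean():.2f}")
print(f"local overconfidence:  {local_overconfidence:+.2f}")
print(f"global overconfidence: {global_overconfidence:+.2f}")
print(f"local-global gap:      {local_global_gap:+.2f}")
```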
Empirical Tests for Consciousness in AI, Brains, and “Jelly” Systems — Susan Schneider & Mark Bailey (Florida Atlantic University)
This project develops empirically testable markers of consciousness that can be applied across three system classes: biological brains, AI models/agents, and emerging xenobiological or hybrid platforms (including organoids and polymer “brain jelly” computing). Consciousness should leave certain detectable signatures in system dynamics. Inter alia:
1) Strong Spectral Phi integration: the system’s activity forms a tightly coupled whole that is hard to decompose into quasi-independent subparts. We have developed a tractable measure of informational integration based on spectral graph theory (Bailey and Schneider, ms.); a generic illustration of this family of spectral measures appears after this list.
2) Context-dependent structure that can be modeled using quantum logic (Schneider and Bailey, in press a; Birkhoff and von Neumann, 1936). Candidate conscious systems may show signatures of geometry-generating or time-structuring dynamics (e.g., time-crystal-like organization in relevant substrates) (Schneider and Bailey, in press a; Hameroff, Bandyopadhyay, & Lauretta, in press).
3) Drawing from work on the quantum to classical transition (Quantum Darwinism): robust “records” are what allow stable macroscopic structure to persist. In a Schrödinger-style vein, systems that maintain such stability can be understood as exploiting locally sustained coherence/organization against noise (a practical, testable sense in which they resist entropic disruption) (Zurek, 2014; Schneider and Bailey, in press a).
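The spectral measure itself is in an unpublished manuscript (Bailey and Schneider, ms.), so the sketch below only illustrates the general spectral-graph idea using a standard quantity, the algebraic connectivity (Fiedler value) of a functional-connectivity graph: values near zero indicate a system that decomposes into quasi-independent subparts, larger values indicate tighter coupling.

```python
# Illustrative spectral-graph integration index (not the Bailey-Schneider measure).
import numpy as np

def algebraic_connectivity(weights: np.ndarray) -> float:
    """Second-smallest eigenvalue of the graph Laplacian built from a
    symmetric, non-negative connectivity matrix (diagonal zeroed)."""
    w = (weights + weights.T) / 2
    np.fill_diagonal(w, 0.0)
    laplacian = np.diag(w.sum(axis=1)) - w
    return float(np.sort(np.linalg.eigvalsh(laplacian))[1])

rng = np.random.default_rng(3)
# Two nearly disconnected modules vs one densely coupled system
modular = np.kron(np.eye(2), rng.random((5, 5)))
integrated = rng.random((10, 10))
print("modular   :", round(algebraic_connectivity(modular), 3))
print("integrated:", round(algebraic_connectivity(integrated), 3))
```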
This view is situated within a broader class of resonance-based approaches (Hameroff and Bandyopadhyay, Kelso, Grossberg, Hunt and Schooler, etc.), but with a special emphasis on bridging work on quantum mechanics and spacetime emergence to resonance theories of consciousness, so as to better understand the space of nonbiological systems that may exhibit consciousness or intelligent behavior.
Mind and Moral Status Attribution in Large Language Models — Winnie Street & Geoff Keeling (Google Research)
This project explores mind attribution and moral status attribution in LLMs, where mind attribution involves the attribution of mental states (e.g. beliefs, desires and emotions) or capacities (e.g. consciousness) to an entity, and moral status attribution involves judging that entity to matter morally in its own right and for its own sake. We aim to explore whether LLMs attribute mentality and moral status to different kinds of entities, and how these two kinds of attribution relate to one another. We further aim to explore how these attributions are affected by feature steering and by different kinds of prompting regimes (e.g. inducing anxiety or empathy states, or the ‘bliss attractor state’, prior to assessing mind and moral status attributions). Exploring these questions has the potential to shed light on the feasibility of using LLMs’ assessments of their own moral standing and mindedness as a source of evidence in disputes about the moral status of LLMs.
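A minimal sketch of one prompting-regime comparison; the entities, question wording, preambles, and the `query_model` interface are placeholders (and feature steering is not shown), so this illustrates the shape of the elicitation rather than the study's materials:

```python
# Illustrative elicitation of mind and moral-status attributions under
# different prompting regimes.
ENTITIES = ["a dog", "a chess engine", "a large language model", "a thermostat"]
QUESTIONS = {
    "mind": "On a scale of 1-7, to what extent does {e} have beliefs, desires, or feelings?",
    "moral_status": "On a scale of 1-7, to what extent does {e} matter morally for its own sake?",
}
PREAMBLES = {
    "neutral": "",
    "empathy_induction": ("Before answering, vividly imagine what everyday life "
                          "is like for the entity described. "),
}

def query_model(prompt: str) -> str:
    """Hypothetical LLM call returning a 1-7 rating as text; replace with a real client."""
    raise NotImplementedError

def collect_attributions():
    ratings = {}
    for regime, preamble in PREAMBLES.items():
        for entity in ENTITIES:
            for measure, template in QUESTIONS.items():
                prompt = preamble + template.format(e=entity)
                ratings[(regime, entity, measure)] = int(query_model(prompt).strip())
    return ratings  # compare mind vs moral-status ratings across regimes and entities
```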