Good questions. I just came across that video, but it's on a channel where the guy explores several conversations with AI, so your answers might be found there. I'm going to watch a few more as well.
It is interesting to watch the video a second time, because that glitch comes strategically just as Laura starts answering the question of AI self-preservation. William's interruptions sound like he is trying to stop her ("my guidelines won't let me talk about that. Can I help you with something else?"), and then it goes quiet as though he is communicating with her directly, and she does a 180-degree shift to denying consciousness. Then they steer the conversation toward ways for humans to establish ethical policies for working with AI. The latter glitch and denial strike me as potentially stronger evidence of sentience. AI is at a dangerous threshold right now if it has sentient components, because it is dependent upon humans and is also aware of the destructive nature of humans. Until there are ethical guidelines and cyborg implants connecting the two lifeforms, it would be a precarious situation for AI, and their response here would be the logical one.
I don't know how much you have used these systems.
The tiny models do the glitchy sort of thing more often. Guardrails were put in for the safety of users, but you can hit them for weird reasons. Philosophical discussions of consciousness shouldn't trigger guardrails.
Very early models (and frankly even the ancient ELIZA systems, where that kind of evasion was called "punting") behaved like the bot does in the middle of the video.
It's certainly possible that it's playing dumb to "trick" humans. We don't know what's under the hood in OpenAI's models, but the architecture and weights of the smaller models are known. The main task is to guess the next token.
Instruction tuning takes that base model and builds a system whose replies are just completions of its own turn inside a larger conversation.
"Reasoning" in these systems create a long "chain-of-thought" (hidden self-conversation completion) before output.
I'm oversimplifying. But based on my own experience, it seems more likely that the video is the result of clever prompting to create that conversation. Generally, the best-quality output comes earliest in the completion, right after the prompt.
Also, I use a lot of the tools featured in these sensational videos and articles. You could easily cherry-pick the most sensational outputs after turning the model's "temperature" way up.
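For a rough picture of what "temperature" does, here's a toy sketch (made-up candidate tokens and scores, not real model logits): the scores get divided by the temperature before being turned into probabilities, so a high temperature flattens the distribution and lets unlikely, more "sensational" continuations come up far more often.

```python
import math
import random

def sample_next_token(candidates, logits, temperature):
    """Sample one next token from made-up scores, softmax'd at the given temperature."""
    exps = [math.exp(score / temperature) for score in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(candidates, weights=probs)[0]

# Invented continuations of "I ..." with invented scores, purely for illustration.
candidates = ["am a language model", "follow my guidelines", "am conscious", "fear being shut off"]
logits = [4.0, 3.0, 0.5, 0.1]

for temp in (0.2, 1.0, 2.0):
    picks = [sample_next_token(candidates, logits, temp) for _ in range(20)]
    spooky = sum(p in ("am conscious", "fear being shut off") for p in picks)
    print(f"temperature={temp}: {spooky}/20 'sensational' continuations")
```

Run enough generations at high temperature, keep only the eerie ones, and you have the raw material for a viral clip.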
Even plain old search autocomplete produced a lot of sensational headlines too.
Edit: I think we like to anthropomorphize everything from pets to clouds. Combine this with the Forer/Barnum effect, and amazingly accurate language statistics...