LLMs are able to decipher the text without a key

I’m going over the Machine Learning and AI Skill path and came across the ciphering exercise. I was stuck on it for a while and got curious whether LLMs like ChatGPT and Bard can directly decipher a Vigenère-cipher-encoded message without the key.
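For anyone not familiar with the exercise, here’s a minimal sketch of how the Vigenère cipher works in Python (the message and key below are just placeholders, not the ones from the exercise):

```python
import string

ALPHABET = string.ascii_uppercase

def vigenere(message: str, key: str, decrypt: bool = False) -> str:
    """Shift each letter of `message` by the corresponding letter of `key`."""
    result = []
    key_index = 0
    for ch in message.upper():
        if ch not in ALPHABET:
            result.append(ch)  # leave spaces and punctuation untouched
            continue
        shift = ALPHABET.index(key.upper()[key_index % len(key)])
        if decrypt:
            shift = -shift
        result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        key_index += 1
    return ''.join(result)

# Placeholder example, not the exercise's actual text.
secret = vigenere("MEET ME AT NOON", "LEMON")
print(secret)                                   # XIQH ZP EF BBZR
print(vigenere(secret, "LEMON", decrypt=True))  # MEET ME AT NOON
```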

The LLMs were able to decipher the text without a key!

How is that possible?
Are LLMs so good that they can even predict the ciphered text?
Or were they simply trained on the exact ciphered text, so it was a match?
Let me know your thoughts about it!

The LLM does not actually “know” anything about the problem other than the most probable linguistic associations. If it has seen enough examples of Vigenère ciphers, there is some probability, to be concrete, that it will be able to give the key. It doesn’t actually do any computation for deciphering itself (so, for example, it’s not going to crack complicated encryption schemes). It has no notion of encryption in general, nor of what a key means. At best it can only string together that the two might be related with some probability (if it has enough associations “learned” from its training data). Note the quotation marks around “learned.”*
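For contrast, here is roughly what an actual computation for deciphering looks like: a classical frequency-analysis sketch that assumes the key length is already known or guessed (e.g. via the Kasiski examination), with approximate English letter frequencies:

```python
from collections import Counter
import string

# Approximate English letter frequencies in percent (an assumption for this sketch).
ENGLISH_FREQ = {
    'E': 12.7, 'T': 9.1, 'A': 8.2, 'O': 7.5, 'I': 7.0, 'N': 6.7, 'S': 6.3,
    'H': 6.1, 'R': 6.0, 'D': 4.3, 'L': 4.0, 'C': 2.8, 'U': 2.8, 'M': 2.4,
    'W': 2.4, 'F': 2.2, 'G': 2.0, 'Y': 2.0, 'P': 1.9, 'B': 1.5, 'V': 1.0,
    'K': 0.8, 'J': 0.15, 'X': 0.15, 'Q': 0.1, 'Z': 0.07,
}

def chi_squared(text: str) -> float:
    """How far the letter distribution of `text` is from typical English (lower is closer)."""
    counts = Counter(text)
    total = len(text)
    score = 0.0
    for letter in string.ascii_uppercase:
        expected = ENGLISH_FREQ[letter] / 100 * total
        observed = counts.get(letter, 0)
        score += (observed - expected) ** 2 / expected
    return score

def recover_key(ciphertext: str, key_length: int) -> str:
    """Recover the key by treating every key_length-th letter as its own Caesar cipher."""
    letters = ''.join(c for c in ciphertext.upper() if c.isalpha())
    key = []
    for i in range(key_length):
        column = letters[i::key_length]
        best_shift = min(
            range(26),
            key=lambda s: chi_squared(
                ''.join(chr((ord(c) - 65 - s) % 26 + 65) for c in column)
            ),
        )
        key.append(chr(best_shift + 65))
    return ''.join(key)
```

That kind of per-column statistics is what a real attack does; the LLM is not doing anything like it, it is only pattern-matching on text it has seen.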

I try to test it on the basics of different subjects, and to me this highlights a lot of holes it can have at a basic level. I’ll just give you a screenshot of a test I ran on GPT-4 a month or two ago with basic stats questions:

It’s things like this (I have many more examples) that lead me to think you already have to be pretty knowledgeable to confirm the correctness of what it says. Example: OK, maybe you’re too tired to type out a unit test yourself. But you should know exactly what each piece does, what the conditions are, and, importantly, that you don’t introduce any side effects in it.

*The quotation marks around “learned” refer to this: yes, in the literature the term “learn” gets used a lot for when an “agent” models some data (online or offline), but there is no actual learning going on; it’s just modelling. Just as there’s no actual agent, just a program. The terminology (which personifies non-human things) is used so often in academic literature because it helps digest what is going on and talk about it in a more human sense, but the trap is that we have to keep that caveat in the back of our heads the entire time. There is no agent and there is no learning going on. To belabor the point, we would never say a linear regression learns what housing prices are, but it can model them to some degree of accuracy if the conditions are right and some assumptions hold.
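To make that contrast between “learning” and “modelling” concrete, here’s a toy sketch with made-up synthetic housing data (assumes scikit-learn is installed):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic "housing" data: price roughly proportional to square footage, plus noise.
rng = np.random.default_rng(0)
sqft = rng.uniform(500, 3500, size=(200, 1))
price = 150 * sqft[:, 0] + 50_000 + rng.normal(0, 20_000, size=200)

model = LinearRegression().fit(sqft, price)
# The model recovers a slope and intercept that fit the data; it hasn't "learned"
# anything about housing, it has just modelled a numeric relationship.
print(model.coef_, model.intercept_)
```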
