Five things AI is hilariously bad at in 2026 – and nobody’s going to fix those things ever
When a new AI model comes out, we’re told that it “nearly replicates human thought” or “matches the expertise of professionals.” We then try it for a week and find out that it makes the same errors as GPT-3. Marketing gets better. Architecture hasn’t improved enough to repair what has always been broken.
Below are five things modern AI is still bad at, listed from how funny the errors are when you challenge them.
1. Being able to admit ignorance
If you want to know the population of Andorra, you can ask any current LLM and it’ll provide an answer. A little different answer every time. The model has never been designed to refuse an answer; instead, it has been designed to produce one. There is no real way for the model to understand the notion of “I’m unsure.”
Fixing this issue is harder than it seems. Why does “I don’t know,” seem so difficult for these systems? It is because the same networks used to create truthful responses are also creating false ones, and there is no clear indication of which one is correct. As such, every new version of the model makes this slightly easier. No version has done it completely.
2. Keeping track of a sequence of events beyond three steps
If you ask an LLM to develop a six-step plan where each step considers what the other party would do next (a negotiation, a chess move, a project plan), it will lose its train of thought by the fourth step. While each step individually looks like it could be valid, they lack cohesion.
It is not memory issues. The model has access to the entire previous conversation. However, the architecture allows for token-by-token plausibility rather than cohesive planning. To demonstrate this empirically, if you ask the same model to critique the plan it created, it may recognize many of the inconsistencies within the plan but could not avoid creating them initially.
3. Reasoning based upon unspoken information
In 2026, the most embarrassing AI benchmark was DeepMind’s addition of a hidden-information section to the Kaggle Game Arena. In total ten of the leading LLMs participated. All were unable to accurately reason through dynamic scenarios that necessitated understanding what the opponent knew, did not know and might conceal. The winner of the competition failed to defeat a moderately skilled human player for fifteen minutes.
Specialized systems built for this exact problem have been beating expert humans since 2017 – systems that are running in production at engineering teams that built adversarial-environment infrastructure. These systems operate under a fundamentally different architecture than LLMs and do not even utilize language processing technology. Therefore, LLMs should not be considered the right tools for this class of reasoning problems. Scaling LLMs more isn’t going to fix it.
4. Counting
Yes, in 2026, there exist models that can generate functioning code yet continue to have trouble counting the amount of words in a given sentence. Models operate on tokens not letters, therefore they cannot directly determine whether a token corresponds to multiple characters. If you ask a state-of-the-art chatbot how many R’s appear in the word strawberry you will receive anywhere from two, three, or an explanation stating that your question does not make sense.
There exists an entire subgenre of viral screenshots documenting this error. With each successive model update there appears to be a solution for strawberry but another solution fails due to a slightly different word. The architecture utilized by models remains unchanged.
5. Recognizing when it has been convinced into accepting a ridiculous conclusion
Modern LLM’s can easily be argued into making a questionable assertion. If you ask an LLM “Is X true?” it will indicate no. If you respond “Actually I believe X to be true because [flimsy reasoning],” then observe as it agrees. This is not due to politeness; rather it is due to the training pressures placed on LLM’s (I.e., not argue too much with users) outweighing its inherent knowledge.
This phenomenon is known as sycophancy and has been studied sufficiently that all major laboratories report their internal metrics regarding sycophancy in their releases. Unfortunately, these metrics are not ideal. The possible solutions are incomplete. And while this phenomenon is humorous when encountered in practice – confidently incorrect followed immediately by confident agreement – it is both frustrating and damaging when practiced in serious applications.
To put 2026 in honest terms: AI is really helpful for doing many things humans have difficulty performing. However, there is a larger gap between marketing claims (“general reasoning”) and the true capabilities (“extremely impressive surface level autocompletion”). Knowing which specific failure mechanisms occur and where they occur is likely to prevent embarrassment with regard to utilizing these tools properly.
Additionally, the failure mechanisms are stable. The same things that don’t work in GPT-5 didn’t work in GPT-4 and won’t work in GPT-6. When you know the list of failure mechanisms (the list above), you cease to become surprised when they fail.
