AI Shortcuts

We Put ChatGPT, Gemini, and Claude Through 5 Tough Image Tests — Here’s What Happened

Ardit Sulce
Jul 14, 2025

Today we’re going to put three major AI chatbots to an image recognition test:

  • ChatGPT (GPT-4o model)

  • Gemini (2.5 Flash model)

  • Claude (Claude Sonnet 4 model)

We’ll run five challenging image recognition tests, score each AI on every test, and then calculate their final scores at the end.

If you’d like to try the same tests, I’ll include the images and prompts I used so you can put your favorite chatbot to the test as well.

Here are the tests we’ll perform:

Test 1: Diagnosing a condition from medical images
We’ll provide the three chatbots with two images, each showing a common throat condition, and see how accurately they identify them.

Test 2: Soccer team identification
We’ll give each chatbot an image of a soccer team. Two of them won’t be able to identify it, while one gets close enough to earn some points.

Test 4: Estimating people’s ages
We’ll show four images of people whose ages are known, then compare how close each AI comes to estimating their ages and score them accordingly.

Test 5: Counting objects in an image
We’ll provide an image containing people, cows, and other objects, and ask each chatbot to count them. We’ll then score the accuracy of their responses.

At the end, we’ll total up the scores from all the tests and rank the three AIs based on their overall performance.
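The tallying step is simple enough to sketch in code. This is a minimal illustration of how per-test scores could be summed and ranked; the point values below are placeholders, not the article's actual results.

```python
# Hypothetical score tally — the per-test points are placeholder values,
# not the real outcomes of the five tests.
scores = {
    "ChatGPT": [2, 1, 3, 2],   # one entry per test (placeholder points)
    "Gemini":  [1, 0, 2, 3],
    "Claude":  [2, 2, 1, 1],
}

# Sum each chatbot's points across all tests.
totals = {bot: sum(points) for bot, points in scores.items()}

# Rank the chatbots from highest total to lowest.
ranking = sorted(totals, key=totals.get, reverse=True)

for place, bot in enumerate(ranking, start=1):
    print(f"{place}. {bot}: {totals[bot]} points")
```

Running this prints a simple leaderboard, which is essentially what the final scoring section of the article produces.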

Let’s begin with the first test and its results.

Test 1: Medical diagnosis from images

For the first test, we’ll upload two images of throat inflammation, also known as pharyngitis.

The first image shows strep throat, medically referred to as streptococcal pharyngitis:
