Can generative AI detect pain in calves?

New technology may assist in pain assessment.

A dairy calf after disbudding. Photo credit: Pedro Trindade, MSU.

What if your smartphone could help you tell if a calf is in pain just by taking a photo? With AI tools like Claude, ChatGPT or Gemini, that future is closer than you might expect.

Dairy calves express pain through subtle changes in facial expression. Researchers have systematically described specific muscle movements and tension around the eyes, ears, nostrils, mouth and muzzle as Facial Action Units (FAUs) and have associated them with acute pain. Previous studies have shown that a trained human can score pain-related FAUs from a single photograph to identify pain and estimate its intensity in beef and dairy cattle. Our team is developing a grimace pain scale specifically for dairy calves. While this is an important advance, manual annotation is time-consuming and requires specialized training.

With the goal of providing a more accessible approach for real-time use on dairy farms, we asked a practical question: can available generative AI tools, given only minimal training, perform pain assessments comparable to those of trained humans?

To address this question, six female calves were recorded with high-resolution video cameras at eye level in 30-minute sessions, once before and five times after hot-iron disbudding. From this video footage, 250 still images were extracted. To prevent bias, image order was randomized, ear-tag IDs were masked, and the horn area was blurred so the evaluator could not tell whether a calf had been disbudded.
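The randomization step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the team used: the function name `blind_and_shuffle`, the file names and the fixed seed are all hypothetical, and masking ear tags or blurring the horn area would be separate image-editing steps not shown here.

```python
import random


def blind_and_shuffle(image_names, seed=0):
    """Shuffle images and give each a neutral code.

    Returns (codes, key): the codes in presentation order, and a
    dictionary mapping each code back to the original file name.
    The key stays with someone other than the evaluator.
    """
    rng = random.Random(seed)  # fixed seed (assumed) so the order is reproducible
    shuffled = list(image_names)
    rng.shuffle(shuffled)
    key = {f"img_{i:03d}": name for i, name in enumerate(shuffled, start=1)}
    return list(key), key


# Hypothetical file names: 3 calves x 2 time points
images = [f"calf{c}_t{t}.jpg" for c in range(1, 4) for t in range(2)]
codes, key = blind_and_shuffle(images)
```

The evaluator sees only the neutral codes, so neither the file name nor the viewing order hints at which calf or time point an image came from.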

Members of the Trindade lab stand near a video camera capturing images of calves. Photo credit: Trina VanAtta, MSU.

Our team developed a three-page reference guide containing text definitions and example images of 16 pain-related FAUs. This material served as guidance for a trained human evaluator who scored each image twice within a two-week interval. The same guide was also used as the minimal training prompt provided to Claude (Sonnet 4.5, Max plan, Anthropic) to score the 250 calf images.

Cohen’s kappa coefficient (κ) was used to measure reliability, with values interpreted as follows:

  • very good (0.81–1.00)
  • good (0.61–0.80)
  • moderate (0.41–0.60)
  • reasonable (0.21–0.40)
  • poor (< 0.20)
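For readers who want to see how κ is obtained, here is a minimal Python sketch. It uses the standard definition κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance. The function names and the example scores are made up for illustration and are not data from this study. Assigning values that fall between the printed bands (for example 0.805) to the lower band is also an assumption.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same score by both raters
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: based on how often each rater used each score
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:  # both raters gave one identical score to every item
        return 1.0
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Map kappa to the bands listed above (boundary handling is an assumption)."""
    if kappa > 0.80:
        return "very good"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "reasonable"
    return "poor"


# Made-up scores for one FAU on 10 images (1 = present, 0 = absent)
human = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
claude = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
kappa = cohens_kappa(human, claude)
label = interpret_kappa(kappa)
```

In this made-up example the two raters agree on 8 of 10 images, but because about half of that agreement would be expected by chance, κ comes out near 0.58, which falls in the moderate band.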

The human evaluator was consistent between the two rating periods, with very good reliability for sleeping, tongue cleaning nose, tongue outside mouth, ears backwards and ears in different directions. However, reliability was only moderate for tension above eyes and reasonable for totally closed eyes.

Next, we compared FAU detection by the human evaluator and by Claude to determine for which FAUs Claude's scores agreed with the human's. Reliability was good for sleeping and moderate for drooping ear, tension above eyes and tongue cleaning nose. However, reliability was only reasonable for partially open eyes, ears in different directions and lip tension, and poor for dilated nostril and ears backwards.

Now, we need your help! To build a larger database of calf images across breeds, ages and photo conditions, we need clear images of a calf’s head taken before or after disbudding. We plan to publish and share the reference guide used to train human and AI evaluators. Contact Dr. Trindade, DVM, at trindad4@msu.edu if you would like to contribute.
