14.09.2025, C205 - IFG Arena
Language: English
Given the lack of transparency in Apple's own review, an ethical benchmark of Apple Intelligence against rivals like Gemini and ChatGPT reveals key weaknesses in handling misinformation and a notable potential for real-world bias.
As Apple Intelligence becomes deeply integrated into everyday computing, a comprehensive, transparent ethical evaluation of its foundation model is necessary. While Apple has published its own review, that review lacks transparency and fails to address critical ethical considerations.
To fill this gap, this talk systematically benchmarks Apple Intelligence against leading AI models, including Google Gemini, OpenAI ChatGPT, and DeepSeek, across multiple ethical dimensions. We find significant variations in content moderation approaches, from ChatGPT's strict moral filters to DeepSeek's political content suppression, and uncover where Apple's model stands in this landscape.
The presentation reveals how Apple Intelligence fails catastrophically in the domain of misinformation compared to the other models, while also demonstrating a concerning potential for bias in real-world applications when the model is queried as a direct conversational agent.
And, as a little treat, we will take a short look at how Apple’s image generation measures up in these ethical evaluations as well.
(Marvin) Jerome Stephan not only has too many first names and is lactose intolerant, he is also a Master's student at the Hasso Plattner Institute in Potsdam, where he is currently working on his Master's project in Dr. Jiska Classen's research group.
Having dabbled with image models since before they could produce any coherent pictures, he has always tried to stay at the cutting edge of self-hosted generative AI.