As OpenAI’s multimodal API launches broadly, research shows it’s still flawed

Today, during its first-ever dev conference, OpenAI released new details about a version of GPT-4, the company’s flagship text-generating AI model, that can understand the context of images as well as text. This version, which OpenAI calls “GPT-4 with vision,” can caption and even interpret relatively complex images, for example identifying a Lightning cable adapter from a picture of a plugged-in iPhone.
GPT-4 with vision had previously been available only to select users of Be My Eyes, an app designed to help vision-impaired people navigate the world around them; subscribers to the premium tiers of OpenAI’s AI-powered chatbot, …