OpenAI wants to work with organizations to build new AI training data sets

It’s an open secret that the data sets used to train AI models are deeply flawed.
Image corpora tend to be U.S.- and Western-centric, partly because Western images dominated the internet when the data sets were compiled. And as most recently highlighted by a study out of the Allen Institute for AI, the data used to train large language models like Meta’s Llama 2 contains toxic language and biases.
Models amplify these flaws in harmful ways. Now, OpenAI says that it wants to combat them by partnering with outside institutions to create new, hopefully improved data sets.