OpenAI’s latest blunder shows the challenges facing Chinese-language AI models
Last week’s release of GPT-4o, a new AI “omnimodel,” was supposed to be a big moment for OpenAI. But just days later, the company appears to be in serious trouble. From the resignation of most of its safety team to Scarlett Johansson’s accusation that it replicated her voice for the model without her consent, it’s now in damage-control mode.
On top of that, the data OpenAI used to train GPT-4o’s tokenizer—a tool that helps the model parse and process text more efficiently—is polluted by content from Chinese spam websites. As a result, the model’s Chinese token library is full of phrases related to pornography and gambling. This could worsen some problems that are common with AI models: hallucinations, poor performance, and misuse. But OpenAI is not the only company struggling with this problem: there are steep challenges associated with training large language models to speak Chinese. Read our story to learn more.
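To see why spam in training data matters, it helps to know that a tokenizer’s vocabulary is learned from that data: phrases that appear very frequently, including spam slogans, can be merged into single long tokens. Here is a minimal sketch of the idea using a toy vocabulary and greedy longest-match lookup (this is a simplified illustration, not OpenAI’s actual BPE tokenizer; the vocabulary entries and spam phrase are hypothetical):

```python
# Toy greedy longest-match tokenizer. Real tokenizers (byte-pair encoding, etc.)
# are more sophisticated, but share the key property illustrated here: the
# vocabulary is learned from training data, so a phrase that occurs often in
# scraped spam pages can end up as a single token.
TOY_VOCAB = {
    "hello": 0,
    "world": 1,
    " ": 2,
    # Hypothetical spam slogan merged into one long token during training:
    "FREE-CASINO-BONUS": 3,
}

def tokenize(text: str) -> list[int]:
    """Map text to token IDs by repeatedly taking the longest matching piece."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in TOY_VOCAB:
                tokens.append(TOY_VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"no token matches at position {i}")
    return tokens

print(tokenize("hello world"))        # → [0, 2, 1]
print(tokenize("FREE-CASINO-BONUS"))  # → [3]
```

The second call shows the problem in miniature: an entire spam phrase collapses to one token, taking up vocabulary space that could have gone to ordinary language.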