Anthropic’s latest tactic to stop racist AI: Asking it ‘really really really really’ nicely

The problem of alignment is an important one when you’re setting AI models up to make decisions in matters of finance and health. But how can you reduce biases if they’re baked into a model from biases in its training data? Anthropic suggests asking it nicely to please, please not discriminate or someone will sue us. Yes, really.
In a self-published paper, Anthropic researchers led by Alex Tamkin looked into how a language model (in this case, the company’s own Claude 2.0) could be prevented from discriminating against protected categories like race and gender in situations like job …
Read more…….