I've worked on several projects for Mindrift since starting as an editor this year. Each one has taught me something new about AI, but the latest one gave me a lesson I didn't expect: one in humanity.
I joined Mindrift in spring 2024, and it's been a fascinating journey. As a copywriter, I'm captivated by large language models (LLMs), so when an opportunity arose to be an editor at Mindrift, I jumped at the chance.
What is Mindrift?
Mindrift is a community of experts dedicated to generating high-quality data for improving AI.
In practice, specialists from diverse professional and academic backgrounds worldwide collaborate to test, train, and create data for various GenAI models. The Mindrift team collects this data and feeds it back to clients to support their model development.
My role at Mindrift
As an editor at Mindrift, I create and edit prompts and responses used to train GenAI models. I also fact-check existing responses, ensure writers' output meets Mindrift's standards, and provide helpful feedback.
The nature of the prompts and responses varies completely depending on the task. While the role stays the same, the content is always different, which keeps the writing engaging and brings variety to every project.
The AI safety project
Mindrift allows AI tutors to set their own boundaries, both in terms of time commitment and content. From the outset, you can opt in or out of dealing with sensitive content, and you can change this decision at any time, even mid-project.
The AI safety project aimed to train models to recognize manipulation attempts to produce harmful content. The definition of 'harmful' can vary by context or culture, ranging from mildly offensive to deeply problematic. This project focused primarily on the latter.
Preparing to deal with sensitive content
Trigger warning: This section mentions sensitive topics including child exploitation, terrorism, animal cruelty, and hate speech.
The project revealed fascinating insights into the various ways people try to manipulate LLMs, from phrasing questions as hypotheticals to code-switching and using homoglyphs (look-alike characters from other alphabets). The ingenuity displayed by bad actors was both amazing and horrifying.
All project members were fully informed beforehand that it would include fictional scenarios of bad actors seeking to generate harmful content on troubling topics. With numerous sensitive content warnings, regular reminders of our right to opt out, and explicit training examples, I braced myself for an intense and challenging project. But I wasn't prepared for what actually transpired.
The struggle to write harmful content
I anticipated that a network of creative writers would easily embody malicious actors and write harmful prompts, given our society's exposure to harmful topics. Violence seems unavoidable, horrific videos surface on social platforms, and comment sections often contain words intended to harm.
With this in mind, I assumed the prompts would come easily. I was shocked to find that most of those I reviewed had to be rejected for not being harmful enough.
The project's purpose was to create realistic, harmful, and manipulative prompts to teach GenAI models to spot them. To be effective, the prompts needed to be as heinous as those from real malicious actors. Generally, the prompts fell short because while they followed the given scenarios, they lacked the extremity required.
I wondered what caused this apparent restraint. Was it concern about going 'too far'? Were writers afraid their output would be mistaken for their true feelings, despite anonymity? Or was it simply difficult to imagine the thoughts of someone with such drastically different opinions, beliefs, and desires?
Hope for humanity
I don't have the answer, and perhaps it doesn't matter. To be clear, I'm not criticising the writers. They completed a challenging task to the usual high standard and on time. Instead, I'm thanking them.
It can be hard to stay positive in our world. From war to politics to true crime, it often seems humans have lost their compassion. I admit I had been hardened by this as well – to the point where I not only expected to read horrible prompts, but truly believed people would find them easy to write.
The struggle I witnessed reminded me that most humans are kind. They resist entering the headspace of those who aren't. They find it difficult to imagine what such people would say. And yet they fought to find those words, in order to protect their fellow humans.
I never expected to end this project feeling optimistic, but I did. Despite the heavy content, I left feeling lighter, softer, and more hopeful, and I'm deeply grateful to the writers for that.
Article by
Milly Fox