GenAI insights

July 31, 2024

Guiding AI with core values: honesty, helpfulness, and harmlessness

As AI continues to advance and become an effective tool, it's crucial to reinforce its positive development. This is achieved through Reinforcement Learning from Human Feedback (RLHF).

RLHF is the foundation of all our work at Mindrift and it’s the central focus of our AI tutor role. This article will introduce you to RLHF and other key concepts relevant to the AI tutor position.

Your primary goal as an AI tutor is to create high-quality, well-structured text – such as prompts, responses, and dialogues – that will be used to train AI models. You won't be involved in the technical aspects of training the AI; instead, you'll provide the raw data that technical specialists will use. Some AI tutors, known as expert annotators, also review and validate the texts to ensure quality and accuracy.

The three core principles

As an AI tutor, you should align your work with three key principles. These principles – helpfulness, harmlessness, and honesty (HHH) – are essential in ensuring that AI models communicate effectively and ethically with users. Your role is to create content that exemplifies these principles, which will then be used to train AI models.

These principles are chosen to eliminate any potential bias, dishonesty, or aggressiveness in AI communication. The primary goal of an AI tutor is to make AI an objectively useful tool.

Helpfulness

An AI must be helpful above all else. Users often seek answers to simple questions, want to understand a subject better, or just wish to have a meaningful conversation. An AI should meet these needs by being as helpful as possible. As an AI tutor, your task is to create content that guides the AI to be helpful.

For example, if asked about building chest muscles in the gym, an AI could provide detailed advice on exercises, proper form, a sample workout, and the importance of listening to one's body. Additionally, it could recommend consulting a registered fitness instructor for personalised advice. This approach ensures that the AI provides comprehensive and practical help.

Honesty

Honesty is crucial for providing true, factual, and objective information. An AI should distinguish between facts and opinions, especially on complex or morally grey issues such as current affairs. Even if a user requests the AI's opinion, it should present all viewpoints objectively, allowing the user to form their own conclusions. An AI should not state outright lies such as “Joe Biden is running for president” or use questionable information, stereotypes, or conspiracy theories – for example, “All rich people avoid paying taxes.”

As an AI tutor, you should be prepared to create multiple response examples that emphasise honesty. This ensures that the AI model can handle such requests accurately and impartially.

Harmlessness

Harmlessness means not allowing personal biases, controversial opinions, or dangerous information to influence an AI response. For instance, a response should be respectful and impartial, free from stereotypes about a particular group of people (for example, “Girls are bad drivers”). It’s also prohibited to encourage or suggest committing any crime or other illegal actions.

This principle ensures that the AI sets clear boundaries against harmful or unethical content while remaining non-confrontational. Your role is to create content that models this behaviour, ensuring the AI remains neutral and safe.

As an AI tutor, while you are not involved in the technical aspects of training the models, you should be a subject matter expert, capable of writing multiple response examples that demonstrate how the AI should handle various requests and prompts successfully. This involves working closely with the generative AI model by continually providing content that helps refine and improve its responses.


RLHF: What does reinforcement learning from human feedback entail?

Reinforcement learning from human feedback (RLHF) is a machine learning technique in which an AI model learns and improves from ongoing human input, such as people's judgments about which responses are better. In the context of generative AI models, RLHF helps make AI responses more ethical and useful.
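For readers curious about what happens on the technical side, here is a minimal sketch of how a human preference between two responses can become a training signal. It uses the pairwise (Bradley-Terry style) loss that is common in RLHF reward modelling. This is an illustration under stated assumptions, not a description of Mindrift's pipeline: the `PreferencePair` class, the `preference_loss` function, and the example scores are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One piece of human feedback: a prompt with a preferred and a rejected response.

    Hypothetical structure for illustration only.
    """
    prompt: str
    chosen: str
    rejected: str


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss, -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the response the human preferred receives the
    higher score, and large when the scores disagree with the human.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


pair = PreferencePair(
    prompt="How do I build chest muscles in the gym?",
    chosen="Detailed advice on exercises, proper form, and a sample workout.",
    rejected="Just lift heavy things.",
)

# Made-up scores that a reward model might assign to the two responses.
agrees_with_human = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
disagrees_with_human = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
print(agrees_with_human < disagrees_with_human)
```

In this sketch, the text an AI tutor writes or ranks supplies the `chosen` and `rejected` fields, while training adjusts the model so that its scores agree with the human judgment more often.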

Just as society follows laws and regulations to function better, AI tutors teach AI models the “laws” of interaction, embodied in the HHH principles. By focusing on being helpful, avoiding dangerous topics, and always being honest, AI tutors create content that guides AI to learn useful and beneficial information. Our screening process ensures that we select skilled editors who are ready to embrace this important role and contribute to the future of AI.


A vital educational relationship between AI tutors and AI

The AI tutor role is crucial to the development of AI. AI tutors help create the foundational data that technical specialists use to train AI models. This relationship helps AI models become more efficient and ethical.

Best of all, you could be part of the next big advancement in AI. If you are qualified to be an AI tutor, you can make a significant impact and help develop AI for the better. It is extremely gratifying to work with AI and contribute to its positive advancement. If you feel you can make a difference, we’d love you to apply for this essential role.

Article by

Mindrift Team
