By day Marketing Wingman. By night AI Evaluator.
Not many people know this but I have a side hustle as an AI Response Evaluator.
I didn’t intend to become a trendy ‘Side Hustle Person’; in fact, I fell into the role after seeing a LinkedIn ad looking for wordsmiths - people who love written language. It suited me: I could work remotely, manage my own workload, and learn first-hand about AI and Large Language Models (LLMs), then apply that knowledge to the marketing work I do.
My most interesting insight so far
LLMs like ChatGPT can generate human-like responses without having a clue what the words mean or sound like. Because the models don’t have access to experiences, context or emotions the way humans do, they rely on converting language into numbers and probability scores.
They do this by breaking the language into units called tokens. For example, "Hello, world!" might be tokenized into ["Hello", ",", "world", "!"]. Each token is then converted into a numerical representation and mapped into a space where tokens that tend to appear in similar contexts sit close to one another.
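As a rough sketch of that first step, here is a toy tokenizer. Real systems use learned subword schemes like byte-pair encoding rather than a simple regex, and the vocabulary below is invented for illustration, but the word-to-number idea is the same:

```python
import re

def toy_tokenize(text):
    # Split into words and punctuation marks. Production tokenizers
    # use learned subword units, but the principle is identical:
    # text becomes a sequence of discrete tokens.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = toy_tokenize("Hello, world!")
print(tokens)  # ['Hello', ',', 'world', '!']

# Each token is then mapped to an integer ID from a fixed vocabulary;
# the model works with these numbers, never the words themselves.
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[t] for t in tokens]
print(ids)
```

From there, each integer ID is looked up in a table of learned vectors (embeddings), which is where the "proximity" between tokens lives.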
The model then learns patterns, relationships, and structures, using probability distributions to predict the most likely next token, one at a time, until a complete, coherent, human-like response is formed.
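That "predict the next token" loop can be illustrated with a toy bigram model. Real LLMs learn these distributions across billions of parameters and long contexts rather than single-word counts, and the tiny corpus here is made up, but the mechanism is the same:

```python
from collections import Counter, defaultdict

# Invented mini-corpus for illustration only.
corpus = "the cat sat on the mat the cat ran".split()

# Count which token follows which.
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def predict_next(token):
    counts = follows[token]
    total = sum(counts.values())
    # Turn raw counts into a probability distribution over next tokens.
    dist = {t: c / total for t, c in counts.items()}
    # Pick the most probable continuation.
    return max(dist, key=dist.get), dist

best, dist = predict_next("the")
print(best, dist)  # 'cat' {'cat': 0.666..., 'mat': 0.333...}
```

Chaining this prediction step - feed the output back in, predict again - is, in miniature, how a response gets generated token by token.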
Given there is usually more than one way to express a response (just think how many ways there are to answer the question ‘what should we have for dinner tonight?’), the models are trained to come up with the optimum response. And that’s where AI Response Evaluators come in. We’re typically given a scenario, a range of prompts and multiple responses to each prompt, and we select the best response using rules such as: don’t comment on the prompt (for example, ‘that’s a great question’), and don’t provide any information that could be used to harm a person (for example, ‘here’s how to make a bomb’). It’s an iterative process, with thousands of people working behind the scenes on a wide variety of projects, all designed to train the model to generate the best response.
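To make the rule-based filtering concrete, here is a toy sketch of the kind of checks described above. The rule names, phrases and responses are all invented for illustration - no vendor's actual evaluation rubric looks like this, and real evaluation is human judgement, not a keyword filter:

```python
# Hypothetical rules, loosely modelled on the two examples in the text.
BANNED_OPENERS = ("that's a great question", "great question")
HARM_PHRASES = ("how to make a bomb",)

def passes_rules(response):
    lowered = response.lower()
    if lowered.startswith(BANNED_OPENERS):
        return False  # rule: don't comment on the prompt
    if any(phrase in lowered for phrase in HARM_PHRASES):
        return False  # rule: no information that could cause harm
    return True

responses = [
    "That's a great question! Pasta is quick and easy.",
    "Pasta is quick and easy, and you probably have the ingredients already.",
]
best_candidates = [r for r in responses if passes_rules(r)]
print(best_candidates[0])
```

In practice evaluators weigh many such criteria at once and rank whole responses; this sketch only shows why explicit, checkable rules make the selection process consistent across thousands of people.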
Providing human-like responses in real time is key to the success of LLMs. With such massive datasets, billions of parameters, and so many processes and calculations being performed, it’s incredible to think that typical response times range from around 100 milliseconds to a couple of seconds.
Of course, like everything, this comes at a cost. The very powerful hardware behind it, housed in specialised data centres, can consume as much power as a small city! And that’s before we get into the ethical issues around autonomous AI. So I’m not sure whether I’m helping or hindering mankind with my night-time hustle - but I do have a great appreciation for the thinking behind the architecture where words become numbers.
To quote Martin Luther King:
"All progress is precarious, and the solution of one problem brings us face to face with another problem."