In my last blog post, I talked about how training generative AI models to talk about images gave us a fresh perspective on human intuition and intelligence. There, I argued that the key parallel between large language models (LLMs) and human cognition is the capacity to recognize relationships and patterns in data. Here, I want to talk about how this idea also broadens our horizons when it comes to a different kind of intelligence – quantitative reasoning. By ‘quantitative reasoning’ I mean our ability to understand and work with numbers. This can mean calculating stuff (like the total price of a grocery list), or processing information from a graph or table (such as finding the most expensive product in a list of products).
An AI model capable of quantitative reasoning needs to be able to process various types of data (tables provided as text files, images of infographics, prose, photographs of diagrams, and so on), find the numerical information in that data, and then manipulate and present it as instructed. To do this, the AI needs to be trained on a large dataset of sample interactions between users and the model, in which numerical data is extracted from various kinds of input (such as tables) and discussed in conversational language.
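To make this concrete, here is a minimal sketch of what one such sample interaction might look like, written as a Python dictionary. The field names and the format are illustrative assumptions on my part, not the schema of any particular training pipeline.

```python
# One illustrative training sample: a table given as plain text, a user's
# question about it, and the answer we want the model to learn to produce.
# The field names are made up for this sketch; real training formats vary.
sample_interaction = {
    "input_table": (
        "product,price_usd\n"
        "grapes,3.00\n"
        "bananas,5.00\n"
    ),
    "user_message": "How much did I spend in total?",
    "assistant_message": "You spent $8.00 in total: $3.00 on grapes and $5.00 on bananas.",
}
```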
We use quantitative reasoning in our daily lives in so many different contexts (reading street numbers, measuring ingredients while cooking, choosing a new laptop) that the prospect of teaching an AI model to work with numbers in all these different circumstances, even by example, appears quite daunting.
Numbers in context
Of course, there’s a clear objection to this line of thinking – computers are amazing at working with numbers, right? How can it be that teaching AIs running on computers to work with numbers is a complex problem?
Here’s why: Quantitative reasoning – as we humans interpret the idea – is not just about analyzing numbers in isolation, but also about extracting numerical information from other kinds of data, like text or visuals. Telling an AI to compute 3+5 is easy. Getting an AI to understand and answer the question ‘I spent three dollars on grapes and five dollars on bananas today, how much total did I spend at the grocery store?’ is surprisingly hard. The numbers we see and use in everyday life are embedded in human culture and language in ways that an LLM (the core technology underlying generative AI models) does not innately understand. This is why quantitative reasoning is much more than humble number crunching: you need to consider context in order to find the right calculations to perform (the calculation itself is easy for the AI after that). We humans rely on intuition and years of accumulated knowledge about the world around us to do this; LLMs cannot.
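To see why context is the hard part, here is a deliberately naive sketch that spells out the two steps: first pull the quantities out of the sentence, then do the trivial arithmetic. The hand-written word-to-number table is an illustrative assumption; an LLM learns this kind of mapping implicitly from examples rather than from rules like these.

```python
# A deliberately naive illustration: the hard step is turning language into
# numbers; the arithmetic afterwards is trivial. (An LLM learns this mapping
# implicitly from examples, not from hand-written rules like these.)
WORD_TO_NUMBER = {"three": 3, "five": 5}  # illustrative, far from exhaustive

def total_spent(question: str) -> int:
    """Pull out dollar amounts written as words and add them up."""
    words = question.lower().replace(",", "").replace("?", "").split()
    return sum(WORD_TO_NUMBER[w] for w in words if w in WORD_TO_NUMBER)

question = ("I spent three dollars on grapes and five dollars on bananas today, "
            "how much total did I spend at the grocery store?")
print(total_spent(question))  # 8
```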
The hidden intricacies of shopping lists
Still, as long as we keep things abstract, it is hard to imagine what hurdles we might run into when teaching LLMs quantitative reasoning. So let’s look at a concrete example – a small shopping list laid out as a table, with one column of products and one column of prices – where ‘quantitative reasoning’ might not be as simple as we initially expected.
When you and I look at this table, we can immediately find information and make inferences about how these prices compare to each other. For example, ‘What’s the most expensive product here?’ is a natural question to ask. To our human brains, this is a simple question to answer: find the price column, find the biggest number in this column, name the product next to it. The solution – ‘Jar of jam!’ – comes so easily to us that unless the questions get much more complex, we might not even think of these steps consciously as we carry them out.
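Spelled out as code, those implicit steps might look something like the sketch below. The table contents are an illustrative stand-in; only the fact that the jar of jam is the priciest item is taken from the example above.

```python
# The steps we carry out without thinking, made explicit.
# The rows below are an illustrative stand-in for the shopping list above.
price_table = [
    {"product": "grapes", "price_usd": 3.00},
    {"product": "bananas", "price_usd": 5.00},
    {"product": "jar of jam", "price_usd": 6.50},
]

# Step 1: look at the price column.
# Step 2: find the biggest value in that column.
# Step 3: name the product on the same row.
most_expensive = max(price_table, key=lambda row: row["price_usd"])
print(most_expensive["product"])  # jar of jam
```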
However, when you think of getting an LLM to answer the same question, a lot of hurdles pop up that our intuition quietly handled for us. How do you get an LLM to parse the table correctly, when we just read off the values with our eyes? How do you get an LLM to understand that ‘most expensive’ means ‘the highest value in the price column’? How do you automate the process of making this connection, looking up values in the correct column, and associating the result with the correct product (the one on the same row as the highest price)?
The answer lies in studying relationships between conversations. It is impossible to explicitly anticipate every table that might be brought to the AI agent, or every question that might be asked about it; no single table or discussion is enough by itself. However, LLMs are capable of comparison and inference, which means that if we train the LLM on enough samples where we carry out processes like the one above, it will gradually learn how to carry out similar processes by itself. If you supply (say) a hundred or a thousand conversations about price tables to the agent (alongside the tables themselves), it will ultimately learn that there is a connection between the notions of ‘expensive’ and ‘price’, and that there is a connection between every product and the price value next to it. With a sufficiently large and robust dataset, an LLM can indeed understand the meaning of ‘Find the most expensive product in this table.’
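One way to picture that scale of examples: conversations of this kind can be produced by varying the products, prices, and phrasing while keeping the underlying connection between ‘expensive’ and ‘price’ the same. The sketch below is an assumption about how such training conversations could be generated programmatically, not a description of any particular pipeline.

```python
import random

# An illustrative generator for many price-table conversations that all
# exercise the same 'expensive' <-> 'price' relationship. The product names,
# price range, and phrasing are made up for this sketch.
PRODUCTS = ["grapes", "bananas", "jar of jam", "bread", "milk", "coffee"]

def make_conversation(rng: random.Random) -> dict:
    products = rng.sample(PRODUCTS, k=4)
    prices = [round(rng.uniform(1.0, 10.0), 2) for _ in products]
    table = "product,price_usd\n" + "\n".join(
        f"{name},{price:.2f}" for name, price in zip(products, prices)
    )
    priciest = products[prices.index(max(prices))]
    return {
        "input_table": table,
        "user_message": "What's the most expensive product here?",
        "assistant_message": f"The most expensive product is the {priciest}.",
    }

rng = random.Random(0)
training_conversations = [make_conversation(rng) for _ in range(1000)]
```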
Generalization
This is already a breakthrough, but progress does not stop there. See, in order to ‘find the most expensive product in a table’, the LLM also needs some understanding of the broader idea of ‘find (something) in a table’. So, if you were to then supply this LLM with different kinds of tables (and conversations about them), it would ultimately understand the instruction ‘find something in this table’ as easily, and as generally, as we do. Going even further, you can then teach the AI to parse other ways of presenting data (bar graphs, for example) and respond to an even wider range of queries.
In other words, with training, LLMs can gradually generalize what they know. Comparison and inference – finding what changes and what remains constant between different interactions – are the skills that give LLMs this impressive capability.
An AI for the future
The capacity to generalize, to reason from what one already knows to derive further information (especially quantitatively), is a core facet of human intelligence too. At Mindrift, we are pursuing new avenues to cultivate this ability in the generative AI of today. I wonder how accurately this process may represent our own capacity for learning and generalizing.
To summarize, the ways in which we train LLMs to emulate human cognition may have many parallels in our own intelligence and consciousness. In teaching AI to read tables or derive numerical information from visual or verbal input, we cast age-old mysteries about our own intellect and intuition in a new light.
Article by
Yigit Ozcelik