Journey into the mind of an AI agent
- Ashish Arora
- Dec 24, 2024
- 8 min read
Updated: Dec 26, 2024

Let me start with the famous quote by Herbert Simon, from 1957 – almost 70 years ago:
"It is not my aim to surprise or shock you—but the simplest way I can summarize is to say that there are now in the world machines that think, that learn and that create. Moreover, their ability to do these things is going to increase rapidly until—in a visible future—the range of problems they can handle will be coextensive with the range to which the human mind has been applied."
If Herbert Simon were alive today, he might envision AI-driven spaceships exploring other galaxies, learning about alien civilizations, and perhaps even acting as humanity's emissaries in a cosmos-spanning dialogue. Or perhaps he'd envision AI analyzing not just bones, but also ancient tools, cave paintings, and other artifacts, revealing hidden chapters in human history...
Turning from speculation to the present, imagine a world where your thermostat anticipates your needs, your car drives you safely through rush hour, and a computer program beats the world's best Go player. This isn't science fiction - it's the reality of today's artificial intelligence agents, rapidly evolving and shaping our world in profound ways.
But what exactly are these AI agents, and how do they work? In this post, we'll peek under the hood of these AI agents, exploring how they perceive, analyze, and make decisions. We'll also uncover some surprising capabilities that even surpass human expertise in certain areas, and consider what this means for the future of human-AI interaction. Get ready to explore the fascinating inner workings of the machines that are changing our world!
The Anatomy of an AI Agent: A Functional Perspective
Stuart Russell and Peter Norvig, in their seminal book Artificial Intelligence: A Modern Approach, define an agent as anything that perceives its environment through sensors and acts upon it using actuators. Let’s break this down into digestible parts, using real-world examples to illustrate the concept.

Perceiving the Environment
An agent gathers information from its surroundings through sensors. For a human agent, this could be eyes (visual input), ears (auditory input), or even skin (tactile feedback).
Robotic Agents: Robots, like Boston Dynamics' Spot, use cameras, LiDAR sensors, and microphones to "see," "hear," and map their environment. For example, Spot can navigate uneven terrain, avoiding obstacles using real-time visual data.
Software Agents: Virtual assistants like Siri or Google Assistant rely on natural language processing (NLP) as their sensory input. They "hear" your voice through a microphone and convert it into text to process the query.
Analyzing and Computing
Once the sensory data is gathered, an agent needs to process it to make sense of the environment. This step involves:
Extracting relevant features (like detecting faces in an image).
Understanding context (like distinguishing between "book a table" for dining versus a reading material query).
This is where AI algorithms truly shine. Consider Tesla's Autopilot: it doesn’t just see the road; it recognizes vehicles, pedestrians, and road signs, determining safe routes and actions in real time. Similarly, chatbots use machine learning models to analyze your input and provide contextually appropriate responses.
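The perceive-analyze-act pipeline described above can be sketched in a few lines of code. This is a deliberately minimal illustration, not a real robotics or assistant API: the environment dictionary, the 50 cm threshold, and the function names are all invented for the example.

```python
# A minimal sketch of the perceive -> analyze -> act cycle.
# All names and values here are illustrative, not a real API.

def perceive(environment):
    """Sensors: read a raw observation from the environment."""
    return {"obstacle_ahead": environment["distance_cm"] < 50}

def analyze(percept):
    """Computation: extract the feature that matters for the decision."""
    return "blocked" if percept["obstacle_ahead"] else "clear"

def act(state):
    """Actuators: map the analyzed state to an action."""
    return "turn_left" if state == "blocked" else "move_forward"

environment = {"distance_cm": 30}
action = act(analyze(perceive(environment)))
print(action)  # turn_left, since an obstacle is within 50 cm
```

Real agents like Spot or Autopilot replace each of these toy functions with sensor fusion, learned models, and control systems, but the overall loop is the same.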
Focusing on Decision-Making: The Core of an Agent
The decision-making process is what makes an agent truly powerful. The better its reasoning abilities and logical correctness, the more accurately it can perform tasks. But how exactly does an AI agent make decisions? To understand this, let's first consider how humans make decisions.
Human Decision-Making: Folk Psychology
Many philosophers and cognitive scientists believe that humans rely on an innate understanding of mental states—widely referred to as folk psychology. This theory explains human behavior using concepts like beliefs and desires, along with felt states such as hunger and pain. For example, if you're hungry (a desire) and believe there's food in the fridge, you'll likely head to the kitchen to eat. This intuitive ability to explain and predict actions forms the foundation of human reasoning.

Rationality in Agents: A Different Approach
AI agents don’t possess folk psychology or common sense (yet), but they rely heavily on the concept of rationality to make decisions. A rational agent, as described in the paper Towards a Logic of Rational Agency, consistently strives to do the right thing – that is, to take the action most likely to achieve its goals given its current knowledge.
The information gathered and processed in the previous stages (Perceiving and Analyzing) now forms the basis of the agent's beliefs about the world. The accuracy and sophistication of the 'Analyzing and Computing' stage directly impact the quality of the agent's beliefs, and thus the effectiveness of its decision-making.
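One common way to make "taking the action most likely to achieve its goals" concrete is expected utility: the agent's beliefs supply probabilities over outcomes, and it picks the action with the highest expected payoff. A toy sketch, with entirely made-up probabilities and utilities:

```python
# Rational choice as expected-utility maximization (illustrative numbers).
# beliefs: the agent's probability estimates for each outcome of an action;
# utility: how much the agent values each outcome.
actions = {
    "take_highway":   {"on_time": 0.9, "late": 0.1},
    "take_backroads": {"on_time": 0.6, "late": 0.4},
}
utility = {"on_time": 10, "late": -5}

def expected_utility(outcome_probs):
    return sum(p * utility[outcome] for outcome, p in outcome_probs.items())

best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best)  # take_highway: 0.9*10 + 0.1*(-5) = 8.5 vs 0.6*10 + 0.4*(-5) = 4.0
```

Note how the quality of the decision depends entirely on the quality of those probability estimates, which is exactly why the earlier perceiving and analyzing stages matter so much.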
But how do we translate this abstract concept of rationality into a concrete framework that AI agents can use? This is where the belief-desire-intention (BDI) model comes in.
The Belief-Desire-Intention (BDI) Model: A Framework for Rational Action
The BDI model provides a structure for agents to reason and make decisions in a way that mirrors, to some extent, human practical reasoning. Let's break down each component:
Beliefs: What the agent knows or assumes about the world. These represent the agent's current understanding of its environment. AI agents are often trained on vast datasets, allowing them to learn patterns, relationships, and probabilities. This training data forms the basis of their initial beliefs or "knowledge" about the world, which are then refined by real-time data from sensors and user inputs. Beliefs are dynamic: they change as the agent gathers new information. They can also be incomplete (limited data) or incorrect (noisy or faulty inputs). For instance, a weather forecasting agent might believe it will rain tomorrow based on its current dataset, even though additional information could later contradict this prediction.
Desires: What the agent wants to achieve. These represent the objectives the agent strives for, often defined by its purpose or programming. Desires can sometimes conflict with one another, requiring prioritization. For example, a navigation app might desire to find the fastest route to a destination while also minimizing tolls and avoiding traffic.
Intentions: A subset of desires that the agent has committed to pursuing. Intentions represent a deliberate choice to act in order to achieve a specific desire. For instance, if our navigation app has the desire to get you to your destination quickly, its intention might be to direct you onto a specific highway, even if it means paying a toll.
The Human Touch: Where AI Still Falls Short
While AI can excel at logical reasoning and data processing, it still struggles with areas that come naturally to humans. Our 'folk psychology' – that innate understanding of beliefs, desires, and emotions – allows us to navigate complex social situations, empathize with others, and make decisions based on nuanced contextual cues that AI currently cannot grasp. This includes areas such as creativity and innovation, common sense reasoning, and ethical decision-making. These are active areas of research, but current AI models still fall short. And these gaps matter: a 2021 study by TalentSmartEQ found that emotional intelligence accounts for 58% of job performance.
The Road Ahead: From Specialized AI to General Intelligence?
Currently, we are in the era of Artificial Narrow Intelligence (ANI), where AI excels at specific tasks but lacks general intelligence. Artificial General Intelligence (AGI) is a hypothetical type of AI that possesses the ability to understand, learn, and apply knowledge across a wide range of domains, similar to a human.
Achieving AGI faces significant challenges, such as replicating consciousness, common sense reasoning, and genuine understanding. However, the potential impact of AGI is vast, from scientific breakthroughs to solving global challenges. The increasing investment in AGI research by tech companies and research institutions indicates the growing interest and belief in its potential. According to a report by MarketsandMarkets, the global AI market size is projected to reach $309.6 billion by 2026, growing at a CAGR of 42.2% from 2021 to 2026.
AI and Humans: A Powerful Partnership
The future of AI lies in enhancing human capabilities, creating powerful synergies across industries. Here’s how AI Agents are transforming our world already:
Healthcare: AI tools analyze medical images with precision, speeding up diagnoses and supporting timely treatments.
Scientific Research: AI accelerates drug discovery by analyzing molecular datasets and predicting potential breakthroughs.
Education: AI-powered tutors adapt lessons to individual learners, offering personalized and effective education.
Business Operations: From payroll to inventory, AI automates back-office tasks, boosting efficiency and reducing errors.
Marketing: AI analyzes consumer behavior, crafts personalized campaigns, and enhances creative strategies.
Support Functions: In HR, IT, and finance, AI handles routine queries and tasks, improving speed and productivity.
Customer Service: Chatbots and AI-driven assistants manage customer interactions, resolving issues efficiently.
Content and Media Creation: AI generates professional-grade content (images, videos, audio, and written text) at scale.
By automating repetitive tasks and enhancing decision-making, AI empowers humans to focus on creativity, strategy, and innovation. According to a report by McKinsey, AI has the potential to create $13 trillion in economic value annually by 2030. This value will be driven by AI's ability to automate tasks, improve decision-making, and personalize experiences.
Superpowers of AI: Beyond Human Expertise
But what if AI could go beyond simply augmenting human abilities? What if it could surpass them in certain domains? AlphaGo, developed by DeepMind, made headlines when it defeated Lee Sedol, one of the world's best Go players, in 2016. Imagine an AI that not only predicts the best next move in a game but also evaluates the overall strength of its position on the board. That's essentially what AlphaGo did using two powerful neural networks: one to suggest likely moves (the policy network) and another to judge the strength of a position (the value network).
AlphaGo didn't just react to the current move; it simulated thousands of possible future scenarios using a technique called Monte Carlo Tree Search (MCTS). It's like a chess player mentally playing out multiple variations of the game before making a move, but on a much larger scale.
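The core rollout idea can be shown on a much smaller game than Go. Below is a stripped-down sketch on a toy "race to 10" game (players alternately add 1, 2, or 3 to a running total; whoever reaches 10 first wins). To be clear, this is flat Monte Carlo evaluation, not AlphaGo's full MCTS: the real algorithm grows a search tree and biases rollouts with its policy and value networks, while this sketch keeps only the "score each move by random playouts" idea.

```python
import random

# Flat Monte Carlo move evaluation on a toy "race to 10" game:
# players alternately add 1, 2, or 3; whoever reaches 10 first wins.
# (A simplified stand-in for MCTS: no tree, no learned networks.)

TARGET = 10
MOVES = (1, 2, 3)

def playout(total, mover_is_me):
    """Play random moves to the end; return True if 'I' made the winning move."""
    while True:
        total += random.choice(MOVES)
        if total >= TARGET:
            return mover_is_me
        mover_is_me = not mover_is_me

def best_move(total, n_rollouts=300):
    """Score each legal move by its win rate over random playouts."""
    scores = {}
    for move in MOVES:
        new_total = total + move
        if new_total >= TARGET:
            scores[move] = 1.0  # this move wins immediately
            continue
        wins = sum(playout(new_total, mover_is_me=False)
                   for _ in range(n_rollouts))
        scores[move] = wins / n_rollouts
    return max(scores, key=scores.get)

print(best_move(7))  # 3: adding 3 reaches 10 and wins immediately
```

Even this crude version captures the key insight: instead of reasoning about positions directly, the agent estimates their value by simulating many possible futures.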
From Supervised Learning to Reinforcement Learning: A Paradigm Shift
You might think that if AI learns from the best human players, it can only be as good as them, right? That's true to some extent for systems that rely solely on supervised learning. But what if an AI could learn from its own experiences, getting better with each game it plays, even inventing new strategies that no human has ever thought of? That's the power of reinforcement learning.
AlphaGo Zero, a later version, started as a blank slate, knowing only the rules of Go. By playing against itself millions of times, it not only surpassed all previous versions of AlphaGo but also developed entirely new strategies. As the paper Mastering the game of Go without human knowledge states, this demonstrates "that an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules, is capable of outperforming the strongest human players in the most challenging of domains."
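The same principle can be demonstrated on a toy scale with tabular Q-learning via self-play on a "race to 10" game (players alternately add 1, 2, or 3; whoever reaches 10 first wins). This is a far cry from AlphaGo Zero's deep networks and tree search, but the spirit is the same: the agent starts knowing only the rules, plays against itself, and learns from wins and losses alone. The hyperparameters below are arbitrary choices for the example.

```python
import random
from collections import defaultdict

# Tabular Q-learning via self-play on the toy "race to 10" game:
# players alternately add 1, 2, or 3; whoever reaches 10 first wins.
# Starting from zero knowledge beyond the rules, the agent improves
# purely by playing against itself.

TARGET, MOVES = 10, (1, 2, 3)
Q = defaultdict(lambda: {m: 0.0 for m in MOVES})  # Q[total][move]
ALPHA, EPSILON = 0.5, 0.3

def choose(total):
    if random.random() < EPSILON:           # explore a random move
        return random.choice(MOVES)
    return max(Q[total], key=Q[total].get)  # exploit the best-known move

random.seed(0)
for _ in range(5000):                       # self-play episodes
    total = 0
    while total < TARGET:
        move = choose(total)
        nxt = total + move
        if nxt >= TARGET:
            target_value = 1.0              # this move wins the game
        else:
            # The opponent moves next from `nxt` using the same Q-table,
            # so its best value there is our loss (zero-sum game).
            target_value = -max(Q[nxt].values())
        Q[total][move] += ALPHA * (target_value - Q[total][move])
        total = nxt

# After training, the learned policy discovers the winning opening move.
print(max(Q[0], key=Q[0].get))  # 2: leaving the total at 2 is a forced win
```

Nobody told the agent that leaving the opponent on a "losing" total is the key strategy; it emerged from self-play alone, which is the essence of what AlphaGo Zero did at vastly greater scale.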
This idea of an AI learning and improving through experience is also central to a long-standing research project called SOAR. Think of SOAR as a blueprint for building intelligent agents that can learn from their actions, plan ahead, and adapt to new situations, much like AlphaGo. SOAR agents, like AlphaGo, can search through possibilities, create plans to achieve their goals, and even build mental models of their environment to predict the consequences of their actions.
Conclusion
We've journeyed into the mind of the AI agent, and what we've found is both exciting and a little unsettling. These aren't just lines of code; they are systems capable of learning, adapting, and even outsmarting us in specific domains. As AI agents become more sophisticated, they will inevitably change the way we work, live, and interact with the world.
The question is no longer if this technology will transform our lives, but how. And that 'how' is up to us.
What kind of future do you envision with AI Agents?

