AI vs. human brain: 14 commonalities and 21 differences
The history of AI has been fundamentally shaped by the idea that artificial neural networks are, or should be, like the human brain. This history is covered in the separate article “The Neural Metaphor: from artificial neuron to superintelligence”. Below is a list of where AI and the brain overlap and where they differ.
Key structural commonalities
Let’s begin with AI techniques that have been inspired (or validated) by the human brain:
1) Basic neuron logic
The McCulloch-Pitts model of the artificial neuron, introduced in 1943, marked a foundational moment in the development of artificial intelligence. The basic idea was to represent the function of a biological neuron as a simple logical operation. This laid the groundwork for the concept of artificial neural networks.
2) Inhibition and excitation
Depending on the neurotransmitters at the synapse between two biological neurons, an activated neuron can trigger or suppress the activation of the second neuron. The main excitatory neurotransmitter in the brain is glutamate; the main inhibitory neurotransmitter is GABA. Similarly, a connection between two artificial neurons can have a positive or a negative weight.
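The two ideas above can be combined in a minimal sketch of a McCulloch-Pitts-style neuron (simplified, not the full 1943 formalism): inputs are weighted, summed, and compared to a threshold. A positive weight plays the role of an excitatory (glutamate-like) synapse, a negative weight an inhibitory (GABA-like) one.

```python
def mcculloch_pitts_neuron(inputs, weights, threshold):
    """Fire (return 1) if the weighted input sum reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Logical AND: both excitatory inputs must be active to fire.
assert mcculloch_pitts_neuron([1, 1], [1, 1], threshold=2) == 1
assert mcculloch_pitts_neuron([1, 0], [1, 1], threshold=2) == 0

# An inhibitory input (negative weight) can veto activation.
assert mcculloch_pitts_neuron([1, 1], [2, -2], threshold=1) == 0
```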
3) Normalization
The response of many biological neurons to stimuli is normalized: it is divided by the sum of the responses of neighboring neurons plus a constant. Neurons can encode the strength of a signal through their firing frequency, but this can only vary from 0 to about 250 spikes per second. To deal with the much larger range of natural stimulus intensities, your light sensitivity adapts based on the overall light intensity to always produce an image with a useful level of contrast. In AI, similar mechanisms have been employed to stabilize activations and keep them within a controlled range.
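A sketch of divisive normalization as described above: each neuron’s raw response is divided by the summed responses of its neighbors plus a constant. The constant sigma is an illustrative free parameter, not a measured value.

```python
def divisive_normalization(responses, sigma=1.0):
    """Divide each response by the pooled responses plus a constant."""
    pool = sum(responses) + sigma
    return [r / pool for r in responses]

dim_scene    = divisive_normalization([1, 2, 4])        # low overall intensity
bright_scene = divisive_normalization([100, 200, 400])  # 100x brighter

# The relative contrast between neurons is preserved even though the raw
# intensities differ by two orders of magnitude.
print(dim_scene)
print(bright_scene)
```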
4) Pooling
In the brain, there are neurons that are specialized to respond to specific types of visual stimuli, such as edges and angles without being sensitive to their exact location. This has inspired the development of AI techniques like “max pooling” in which only the maximum value within a certain range of input values is highlighted. Such selective attention to salient features can be an efficient way of processing visual inputs.
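A minimal sketch of 2x2 max pooling on a toy 4x4 “image”: within each window only the strongest activation survives, which gives some invariance to the exact location of a feature.

```python
import numpy as np

def max_pool_2x2(image):
    """Keep only the maximum value in each non-overlapping 2x2 window."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.array([
    [0, 1, 0, 0],
    [3, 0, 0, 2],
    [0, 0, 5, 0],
    [0, 4, 0, 0],
])
print(max_pool_2x2(image))
# [[3 2]
#  [4 5]]
```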
5) Attention
Inspired by the human visual system, attention mechanisms in neural networks have a long development history. Today, large models integrate self-attention mechanisms that focus computational resources on relevant information, somewhat similar to how the human brain selectively concentrates on aspects of its sensory input.
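A sketch of scaled dot-product self-attention of the kind used in large models: each token’s query is compared against all keys, the scores are softmax-normalized into weights, and those weights decide how much each value contributes to the output, a soft “focus” over the input. The dimensions and random projections here are illustrative.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over token matrix X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # query-key similarity
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: rows sum to 1
    return weights @ V                              # weighted mix of values

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                # 5 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8): one attended vector per token
```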
6) Multimodal neurons
In 2005, neuroscientists showed that some human neurons respond to specific people, such as Jennifer Aniston or Halle Berry. They did so regardless of whether the subjects were shown photographs, drawings, or even images of the person’s name. In other words, the neurons responded to a multimodal concept. In 2021, OpenAI first reported multimodal artificial neurons that likewise respond to the same subject across photographs, drawings, and images of their name.
7) Reinforcement learning
In reinforcement learning, an agent learns from the consequences of its actions through rewards or punishments. Classic examples from psychology that show reinforcement learning in biological neural networks include the works of B.F. Skinner and Ivan Pavlov. Dopamine pathways are thought to play a crucial role in the reward-based learning process in the human brain. The first to apply reinforcement learning to AI was Marvin Minsky with his Stochastic Neural-Analog Reinforcement Calculator (1951). Today, reinforcement learning has remained one of the main AI learning techniques.
8) Temporal-difference learning
Reinforcement learning in its simplest form has its limits: external rewards for complex tasks can be sparse, requiring many intermediate steps, and it is challenging to decide afterwards which steps deserve how much credit for the outcome. Instead, you need an internal value function that continuously assesses expected reward or punishment; in temporal-difference learning, behavior is reinforced if it improves this internal expectation. Such learning algorithms were first developed for game-playing AIs and were later discovered in the firing rates of dopamine neurons in parts of the human brain.
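A sketch of tabular temporal-difference learning (TD(0)): the value of a state is nudged toward the observed reward plus the discounted value of the next state. The “TD error” (delta) is the quantity whose signature was later found in dopamine neuron firing rates. The learning rate and discount factor are illustrative.

```python
def td_update(V, state, reward, next_state, alpha=0.1, gamma=0.9):
    """One TD(0) step: move V[state] toward reward + gamma * V[next_state]."""
    delta = reward + gamma * V[next_state] - V[state]  # TD error
    V[state] += alpha * delta
    return V

# Toy chain: state "A" leads to "B", which already predicts reward 1.0.
V = {"A": 0.0, "B": 1.0}
V = td_update(V, "A", reward=0.0, next_state="B")
print(round(V["A"], 4))  # 0.09 -- "A" starts to inherit B's expected reward
```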
9) Memory replay
Memory replay in AI is inspired by the biological processes observed in the human brain, specifically in how memories are consolidated during sleep or periods of rest. This process has been adopted in AI to help neural networks avoid catastrophic forgetting, allowing them to retain previously learned information while acquiring new knowledge.
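A sketch of an experience replay buffer: past experiences are stored and re-sampled in random order during training, interleaving old and new memories, loosely analogous to replay during sleep. The capacity and tuple format are illustrative.

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest memories eventually fade

    def add(self, experience):
        self.buffer.append(experience)

    def sample(self, batch_size):
        """Draw a random mix of old and recent experiences."""
        return random.sample(self.buffer, batch_size)

buffer = ReplayBuffer()
for step in range(100):
    buffer.add(("state", "action", "reward", step))
batch = buffer.sample(8)
print(len(batch))  # 8
```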
Now, let’s shift to broader structural commonalities:
10) General-purpose architecture & intelligence
The human brain is a general-purpose processor, capable of learning a vast array of skills and adapting to countless environments. While biological neurons come in a variety of shapes and forms, the brain’s overall adaptability to a wide range of tasks is remarkable. For example, individuals who are blind from an early age often have enhanced abilities in their remaining senses, such as more acute hearing or more sensitive touch. Similarly, children who undergo a hemispherectomy, in which half of the brain is removed, often continue to develop language skills, even if the left hemisphere typically responsible for language is the one removed.
Neural networks are widely viewed as a general-purpose technology that can be applied to nearly any task in the economy. This used to mean having 1’000 separate neural nets trained on 1’000 separate narrow tasks. However, increasingly we see artificial general intelligence: one large neural net that performs well across a broad spectrum of tasks. GPT-4 can fluently speak all major natural languages, pass coding interviews in all major programming languages, get master’s degrees in more than a dozen different subjects, pass the bar exam, create recipes, and perform in poetry and rap battles. Conversely, there is hardly any short online task left that most humans can do easily but that no AI can do. AI now outperforms humans on the “Completely Automated Public Turing test to tell Computers and Humans Apart” (CAPTCHA), which is present in different forms on many major websites to keep out bots.
Of course, humans still tend to choose specialized architectures for specific tasks, but overall, it is remarkable how general-purpose deep artificial neural networks are. The training of large neural networks can also include a technique called dropout, in which randomized parts of the network are turned off. This increases robustness.
11) Intelligence increases with scale
Larger brains, especially in relation to body size, are generally associated with higher intelligence across biological species. This is especially true if we look less at a superficial factor such as brain volume, and more at the scale of synapses. In artificial neural networks we have increasingly formalized scaling laws that can predict the improved performance on a variety of tasks with increasing training data, training compute, and model parameters.
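A sketch of what such a scaling law looks like: loss falls as a power law in training compute, L(C) = a * C^(-alpha). The constants below are purely illustrative, not fitted values from any published paper.

```python
def predicted_loss(compute, a=100.0, alpha=0.05):
    """Illustrative power-law scaling: loss shrinks as compute grows."""
    return a * compute ** -alpha

# Each 10x increase in compute shaves off the same *fraction* of the loss.
for c in [1e20, 1e21, 1e22]:
    print(f"{c:.0e} FLOPs -> predicted loss {predicted_loss(c):.2f}")
```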
12) Cultural learning
Cultural learning represents a pinnacle of human intelligence, allowing individuals to acquire knowledge, skills, and behaviors from others, transcending individual experiences and the informational bottleneck of genetics. This capacity for cultural transmission is the “Secret of Our Success” and has led to the accumulation of knowledge across generations, enabling societies to develop complex technologies, languages, and institutions. AI is the first technology that can directly tap into this accumulated pool of human knowledge.
13) Complexity, explainability, and predictability
As humans we build a lot of complicated technology, and there are many complicated technical artifacts whose functioning no single human understands deeply in all aspects, such as a rocket, a smartphone, or an EUV machine. However, each individual aspect of these technologies is understood and designed by several humans. Each part has a specific role, the interactions between parts can be predicted, and given the inputs and conditions, we know what outputs we will get.
In contrast, complex systems are characterized by dynamic and often non-linear interactions between their components. This means small changes can have disproportionate and unpredictable effects. The system cannot be predicted merely by analyzing the parts. The whole is more than the sum of its parts. Large artificial and biological neural networks are complex systems. We know how humans reproduce and grow up, and we know how to train giant AI models. However, we have a very limited ability to explain how they get from a specific input to a specific output, and we often cannot predict outputs with high confidence.
This also influences how we study AI and the brain. Applying methods commonly used in biology and neuroscience, such as removing or destroying components one at a time and using the resulting malfunctions to infer function, to technology used to be a humorous juxtaposition highlighting the shortcomings of these methods (see “Can a biologist fix a radio?” and “Could a neuroscientist understand a microprocessor?”).
However, given the opacity of large artificial neural networks, technologists have started to take more inspiration from biology. Mechanistic interpretability seeks to reverse engineer how artificial neural networks function. In the words of Anthropic CEO Dario Amodei, it is “neuroscience for models”, and some methods are indeed somewhat similar, such as “network dissection”, which systematically observes the activity of artificial neurons in response to stimuli to identify which concepts they represent.
14) Confabulations
While some might classify AI hallucinations as a difference between human brains and AI, there is also a remarkable level of overlap. Neither humans nor large language models are particularly good at being aware of when they do not know something, and both can produce confabulations. In humans, though, this tendency seems more strongly connected to limitations of memory (e.g., John Dean).
Key structural differences
1) Software vs wetware
Artificial neurons are a logical construct that runs on hardware that looks nothing like human neurons. In the “wetware” of the human brain, hardware and software cannot be separated.
Human brains are embodied: The human brain is an inseparable part of the human body, whereas artificial neural networks can freely change their hardware. This has several implications, including:
Creating copies: it is much easier and faster to create copies of artificial neural networks
Editability: it is much easier and faster to edit artificial neural networks
Updating and replacing hardware: it is much easier and faster for artificial neural networks to update or replace their hardware and they are potentially immortal
Interoception: biological neural networks also have interoceptive inputs from the collection of senses providing information to the organism about the internal state of the body.
2) Neuron activation function
Biological neurons operate in a binary fashion: they either fire or don’t fire based on the inputs they receive. This all-or-nothing principle was replicated in some early AI systems. However, step functions are not useful for gradient descent and cannot express fine-grained distinctions. The brain has still served as a source of inspiration for non-linear activation functions, where a neuron’s output is not directly proportional to its input. Still, commonly used activation functions in artificial neural networks, such as the rectified linear unit (ReLU), are ultimately quite different from binary firing.
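A sketch contrasting the two: the all-or-nothing step function has zero gradient (almost) everywhere, so gradient descent gets no learning signal from it, while ReLU is graded and passes gradients through for positive inputs.

```python
def step(x):
    """All-or-nothing: fire (1.0) or don't fire (0.0)."""
    return 1.0 if x >= 0 else 0.0

def relu(x):
    """Rectified linear unit: graded output for positive inputs."""
    return max(0.0, x)

for x in [-1.0, 0.5, 2.0]:
    print(x, step(x), relu(x))
# -1.0 0.0 0.0
# 0.5 1.0 0.5
# 2.0 1.0 2.0
```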
3) Temporal summation in biological neurons
In biological neurons the effect of each input signal can last for several milliseconds, allowing for the accumulation of charges from inputs that arrive close together in time, including from the same presynaptic neuron. In contrast, in a standard artificial neuron there is no built-in mechanism for accumulating charge or signal over time.
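A sketch of temporal summation with a leaky integrator: each input adds charge that decays over time, so inputs arriving close together can push the neuron over threshold while the same inputs spread out in time cannot. The decay constant, threshold, and per-input charge are illustrative values, not biological measurements.

```python
def fires(input_times, decay=0.5, threshold=1.5, step_charge=1.0):
    """Return True if accumulated (leaky) charge ever reaches the threshold."""
    charge, t = 0.0, 0
    for arrival in input_times:
        charge *= decay ** (arrival - t)  # charge leaks away between inputs
        charge += step_charge
        t = arrival
        if charge >= threshold:
            return True
    return False

print(fires([0, 1]))  # True: inputs 1 ms apart sum past the threshold
print(fires([0, 5]))  # False: the first input has mostly decayed away
```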
4) Specialized neurotransmitters, neuromodulators & hormones
Beyond the standard excitatory and inhibitory neurotransmitters, the human brain also has synapses that use more specialized neurotransmitters, such as dopamine, serotonin, acetylcholine, noradrenaline, and adrenaline. Neuromodulators can subtly adjust neural circuitry's sensitivity or responsiveness over various time scales, affecting mood, motivation, and other long-term brain states. Beyond that, hormones, which are chemicals secreted by the endocrine system can affect brain function as well. Examples include cortisol (stress hormone), melatonin (sleep regulation), as well as estrogen & testosterone (sex hormones) Overall, the brain has a complex system of neurotransmitters and modulating substances with no equivalent in AI.
5) Backpropagation
When AI models are trained, the model is given an input and this signal travels through many layers of neurons until it reaches an output layer. This output is evaluated against some desired “correct output”. Then the error signal, the difference between the two, travels through the neural network in reverse order from the output layer to the input layer and ensures that weights are adjusted in the direction of the desired “correct output”.
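The mechanism just described can be sketched for the smallest possible case, a single neuron with one weight: the output error is propagated backwards through the chain rule, and the weight moves in the direction that reduces the error. The learning rate and target are illustrative.

```python
def train_step(w, x, target, lr=0.1):
    """One gradient-descent step for the model y = w * x."""
    y = w * x             # forward pass
    error = y - target    # difference from the desired "correct output"
    grad = error * x      # chain rule: derivative of 0.5 * error**2 w.r.t. w
    return w - lr * grad  # adjust the weight against the gradient

w = 0.0
for _ in range(50):
    w = train_step(w, x=2.0, target=6.0)  # learn y = 3x from one example
print(round(w, 3))  # 3.0
```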
We do not fully understand how learning in the human brain works. However, it does not use backpropagation as its learning algorithm. The transmission of an action potential across a synapse is a chemical process that works in only one direction, meaning connections are unidirectional and cannot be run in reverse for learning. The activation function of biological neurons also does not allow transmitting a precise "error" value.
6) Stricter adherence to layers in artificial neural networks
Artificial neural networks typically have a more defined, layered structure than biological neural networks. In a traditional artificial neural network, neurons are organized into layers, with each neuron forwarding signals only to the next layer, simplifying the backpropagation process for efficient learning. There are also artificial neural networks in which neurons get shortcuts and can skip a couple of layers, in which outputs are fed back in as an input, or in which all layers can be connected. Still, overall, biological neural networks have more varied and complex connections with a mix of hierarchical structuring and diverse connection patterns.
7) Human brains are pre-wired
The development of the human brain begins with significant pre-wiring for basic physiological functions and reflexes. Humans may choose AI architectures tailored to specific tasks (e.g., convolutional neural networks for image processing), but the weights and biases do not contain built-in knowledge specific to survival or any particular task. Instead, artificial neural networks are usually trained from scratch with randomized initial weights.
8) Energy efficiency
The human brain is very energy efficient. It has something like 0.1-10 petaFLOPs of computing power while operating on about 20 watts of power, similar to a light bulb. This gives us an energy efficiency of about 0.005-0.5 petaFLOPs per watt. Current supercomputers already outperform the human brain in computing power, with the fastest supercomputer on the Top500 list reaching about 1’500 petaFLOPs. However, this comes at an energy consumption of 22’703’000 watts. The leading entry on the Green500 list of the most energy-efficient supercomputers produces about 0.000065 petaFLOPs per watt.
Type of energy consumption: The final energy consumption is not an apples-to-apples comparison. Human brains get their energy from agriculture, AI gets its energy from the electricity grid. As a top-down sanity check, let’s assume that agriculture (incl. fertilizer) accounts for about 10% of global energy consumption.1 If we attribute 20% of human-consumed calories to the brain, we get a ballpark estimate of 2% of global energy consumption being spent indirectly on feeding human brains. 2% of 2’400 watts per person is about 50 watts per brain, which would bring the brain closer to 0.002-0.2 petaFLOPs per watt. Similarly, it seems reasonable to assume that it takes about 100 units of primary energy input for 30 to 40 units of electricity to be delivered for consumption. So, let’s divide digital computing efficiency by three, and we arrive at about 0.00002 petaFLOPs per watt.
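The back-of-the-envelope arithmetic above can be made explicit. All numbers are the article’s own ballpark figures, not measurements.

```python
# Brain: 0.1-10 petaFLOPs on ~50 W once upstream agricultural energy is included.
brain_flops = (0.1e15, 10e15)  # FLOPs per second
brain_watts = 50
brain_eff = tuple(f / brain_watts / 1e15 for f in brain_flops)
print(brain_eff)  # roughly (0.002, 0.2) petaFLOPs per watt

# Supercomputer: Green500 leader, adjusted from electricity to primary energy
# (~3 units of primary energy per unit of delivered electricity).
green500_eff = 0.000065        # petaFLOPs per watt of electricity
primary_eff = green500_eff / 3
print(f"{primary_eff:.7f}")    # roughly 0.00002 petaFLOPs per watt
```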
Evolution of energy efficiency: While the brain is extremely energy efficient, its efficiency does not change. The energy efficiency of computers doubles about every 18 months, in line with “Koomey’s Law”. However, as long as the growth in AI compute continues to double about every 6 months (“Huang’s Law”), it outpaces energy-efficiency gains and the absolute power demand of AI will increase.
9) Sleep
Biological neural networks need sleep. For humans this corresponds to about one third of our life. While we have not uncovered all the mysteries of sleep yet, chemical activity in the brain creates waste products that need to be disposed of. One theory is that during sleep cells shrink slightly, allowing cerebrospinal fluid to flow more freely and remove toxins. So, sleeping is in some ways the garbage removal service of the brain. In contrast, AI does not rely on biochemical processes and does not need any sleep.
10) Speed
Biological neural networks are much slower than digital hardware.
Computation cycles: The human brain has no unified clock speed, but neurons usually cannot fire more often than about 250 times per second. In contrast, modern computers work with a unified clock speed, cycling multiple billion times per second. Computer clock speeds grew exponentially for a long time but have plateaued in the last 20 years or so.
Communication speed: The signals between biological neurons can travel at speeds of up to 120 meters per second. In contrast, signals in a chip can travel optically up to the theoretical maximum of the speed of light, which is 300’000’000 meters per second.
While brains are slower, they can do more parallel processing than computer hardware. Having said that, the trend from CPUs to AI hardware is largely about enabling more parallel processing, and it’s difficult to directly compare something like the number of computer cores with the brain.
11) Working memory
Memory in digital computers is generally more reliable. However, the biggest difference is in working memory. This is volatile memory that is directly accessible to a working process with minimal latency.
Human brain: Humans have very limited working memory. The most cited study on the capacity of the human brain to hold different elements in mind simultaneously suggests an upper limit of 7 elements (plus or minus two).
Computer hardware: In hardware working memory is called “random access memory” (RAM) and it vastly exceeds human capabilities. Supercomputers can have petabytes of RAM, individual AI chips in server farms can have terabytes of RAM, and personal computers have gigabytes of RAM.
LLM context length: Large language models have something similar to working memory in the form of context, which is knowledge available during a conversation. When GPT-4 launched the context window had a maximum length of 8’000 tokens. As a rule of thumb, 1 token = 0.75 words. Today, it is already at 128’000 tokens or about 96’000 words, and all AGI companies are continuously increasing it. For comparison, Harry Potter and the Sorcerer’s Stone has about 77’000 words. In the future, an LLM will likely be able to hold an increasing share of all data about you in its working memory.
12) Upper limit of size
The size of the human brain is limited by its protective skull, which is in turn limited by the size of the birth canal. In theory, we can also create non-embodied biological neural networks outside of a skull by nurturing them in vitro. For example, you can get a DishBrain with 800’000 neurons to play Pong. However, for ethical reasons it is unclear if scientists will ever try to fully scale biological neural networks outside of a skull.
Furthermore, the slow communication speed of biological neurons puts another limit on brain size, as the latency to integrate information grows quickly. As Bostrom highlights, “for a round-trip latency of less than 10 milliseconds between any two elements in a system a biological brain needs to be smaller than 0.11 m3. In contrast, an electronic system could grow up to 6*10^17 m3, which is the size of a dwarf planet.”2
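The biological half of this claim can be checked with simple geometry. Assuming signals travel at 120 m/s (the fastest myelinated axons) and a 10 ms round-trip budget, any two points can be at most 0.6 m apart (one way takes 5 ms), which bounds the system to a sphere of that diameter.

```python
import math

signal_speed = 120   # m/s, assumed top speed of myelinated axons
round_trip = 0.010   # seconds of allowed round-trip latency

max_distance = signal_speed * round_trip / 2         # one-way distance: 0.6 m
volume = 4 / 3 * math.pi * (max_distance / 2) ** 3   # sphere of that diameter
print(round(volume, 2))  # 0.11 m^3, matching the quoted figure
```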
13) Variation in size
Digital hardware: On the lower end we have the Michigan Micro Mote with a volume of about 16 mm3. While its full specs are not known, a clock speed of 1 MHz could imply computing power as low as 500’000 FLOPs. The top supercomputer on the Top500 list takes up about 372 square meters; if we assume up to 3 meters of rack height, we get close to 1’000 m3 and a performance of about 1’500 petaFLOPs. So, the largest digital computer in the economy is more than a billion times larger and has more than a billion times more computing power than the smallest computer.
Artificial neural networks: AI does not have a fixed physical volume; that depends on the underlying hardware. However, we can measure size by the number of parameters in the neural network (roughly equivalent to synapses) or by the amount of computing power used for training or inference. We can of course create arbitrarily small artificial neural networks. However, something like 25’000 parameters is on the lower end to be useful for a task such as digit recognition. On the upper end, we find models such as GPT-4 with an estimated 1.8 trillion parameters. So, the biggest model is about 70 million times larger than the smallest useful model.
Human brain: Biological neural networks also have an impressively diverse range of volumes and computing power across species. However, this only works with the analogy of AI to a new domain of life; it does not work with the analogy to the human brain alone. Adult human brains are in the range of 1’200 to 1’500 cm3, meaning the largest ones are about 1.25 times as large as the smallest ones. There are no estimates of the range of neurons, synapses, and therefore FLOPs among humans (as you can imagine, this would be a very sensitive topic).
14) Evolution of size
The amount of computing power going into the training of large artificial neural networks grows by about 4.2x per year, and the parameter count grows by about 2.8x per year. The human brain has also grown and evolved over time but on much slower time scales. The average doubling period for brain volume, from Australopithecus to early Homo sapiens, was approximately 1.8 million years. So, artificial neural networks grow more than a million times faster than human brains.
15) Parameters to training data ratio
The human brain still slightly beats current artificial neural networks in terms of synapses (ca 100 trillion) vs. parameters (1.8 trillion – GPT-4). In contrast, artificial neural networks are trained on amounts of data that would be impossible to consume for a human. The idea of reading the whole of Wikipedia is a joke to humans, but large language models have not just read that but large swaths of books, Reddit, Twitter, and the overall Internet (e.g. CommonCrawl, RefinedWeb). This also means that state-of-the-art large language models like GPT-4 have supergeneral knowledge. They have a broader range of knowledge than any individual human.
16) Brains require fewer examples to learn
Human brains outperform artificial neural networks in the face of sparse data. For example, children require far fewer examples to be able to recognize a class of objects. However, large AI models have admittedly gotten more general and a lot better at one-shot and zero-shot tasks. Humans may have ways to learn faster through short-term synaptic plasticity (“fast weights”). For example, a high-frequency burst can open new channels on synapses that lead to higher activation levels in the future. There is no equivalent in artificial neural networks.
17) Access to connectome
Large neural networks are currently not understandable and not fully predictable. However, we have access to their full connectome, meaning we have giant CSV files with all the weights and connections of artificial neurons in the network. In some cases, this AI connectome is even shared open-source. For comparison, the first (and, so far, only) fully reconstructed connectome of a biological neural network belongs to the roundworm C. elegans. This also means mechanistic interpretability has access to much better data in its quest to reverse engineer functionality than neuroscience.
18) White-box vs. black-box shared learning
AI models can learn from each other in more direct ways than human brains can, because they have access to more intermediate states rather than just the output of a model.
Weight / gradient sharing: In some instances, AI models can directly share weight updates with each other. For example, in federated learning, copies of an AI model each start working on their own piece of the data puzzle. As they train on their local data, they figure out how to adjust their weights to make better predictions or decisions. Periodically, the models share their weight adjustments or gradients with each other.
Distillation - knowledge transfer via output probabilities: Knowledge distillation involves training a smaller AI model to replicate the behavior of a larger AI model. For example, we can take GPT-4 answers as the desired outputs on which we train the smaller model. Except that, normally, it is not just the final output but the probabilities of each predicted class (e.g., 92% bear, 5% gorilla, 0.01% snake, etc.) that are shared with the student model.3 Geoffrey Hinton has analogized distillation to human students learning from a lecture. However, human students do not get access to the neural probabilities of words in a teacher’s brain; they only get access to his or her spoken words.
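A sketch of the soft-target idea: the student’s loss is the cross-entropy against the teacher’s full probability distribution rather than a single hard label. The class labels and probabilities below are illustrative, not outputs of any real model.

```python
import math

teacher_probs = {"bear": 0.92, "gorilla": 0.05, "snake": 0.0001, "other": 0.0299}
student_probs = {"bear": 0.70, "gorilla": 0.20, "snake": 0.01, "other": 0.09}

# Cross-entropy between teacher and student distributions for one example:
# the student is rewarded for matching the teacher's *confidence levels*,
# which carry more information than the hard label "bear" alone.
loss = -sum(p * math.log(student_probs[k]) for k, p in teacher_probs.items())
print(round(loss, 3))
```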
Emulation - knowledge transfer via reasoning patterns: Emulation is similar to distillation, but it focuses more on replicating the behavior, reasoning patterns, or decision-making processes of the teacher model, rather than just its outputs. The most well-known example of this is Orca AI from Microsoft. This is probably the closest equivalent to how humans learn from each other.
19) Ownership & distribution
Brains are “owned” by individual humans. The infrastructure of artificial neural networks is owned by tech giants such as Amazon, Microsoft, and Google. Brains are geographically distributed in accordance with the global population distribution, and there are no dramatic differences in brain size or shape between humans. There are no “brain billionaires” who have more neocortex than entire countries. In contrast, artificial computing power and capital are much more unequally distributed, both within and between countries.
20) Consciousness
We have not uncovered the mysteries of consciousness yet, but we can say with very high confidence that humans have experiential consciousness. That is, we have qualia: feelings like happiness or pain are not just abstract numbers but an experience. This is what gives us moral patienthood. It is commonly assumed that current computers and AI are not conscious because they exist on a very different substrate and thus have nothing akin to our neural correlates of consciousness. However, we are still deeply ignorant about consciousness, and we cannot exclude with high confidence that artificial neural networks experience consciousness.
21) Rights & duties
“Neurorights”, protective rights specific to the human brain (e.g. brain privacy), are still a small and emerging phenomenon. However, humans as a whole have a variety of legal rights and protections. For example, it is illegal to end the life of a human or for a company or another human to own another human. As humans we have further labor protection laws, such as maximum working hours and mandatory holidays. We can own property, we can open bank accounts, we can register patents under our name, we can sue other parties, we have a right to privacy, and, in democracies, we have a right to vote. As of now, no AI models have rights, irrespective of their size or sophistication.
While humans have far more rights than artificial neural networks, they also have some additional duties. For example, humans pay income tax.
This is a ballpark number. It’s hard to find a reliable number. Greenhouse gas emissions of the sector are around 10%. Direct energy consumption is maybe closer to 3% but that doesn’t count many indirect inputs, fertilizers alone already add another 1-2%. There’s an FAO Report that claims the whole system is about 30% of the world's total energy consumption.
Nick Bostrom. (2014). Superintelligence: Paths, Dangers, Strategies. p. 72
These probabilities carry more information as they show the confidence level of the teacher model in each possible outcome, not just the most likely one.