Mimicking Evolution with Reinforcement Learning


Evolution is the only process we know of that has given rise to general intelligence (as demonstrated in animals, and specifically in humans). This fact has long inspired artificial intelligence (AI) researchers to mimic biological evolution in computers. To increase the pace of research discoveries, we need better tools and more efficient computational methods to mimic evolution. Having these tools would not only have a major impact on AI research; it would also speed up research across a multitude of other fields. The study of different types of replicators (the entities subject to evolutionary pressure, such as genes, ideas, business plans and behaviours) and their specific replication mechanisms forms the core foundation of several research fields (such as evolutionary biology, memetics, economics and sociology). These fields rest on a single universal truth: replicators that are better at surviving and self-replicating increase in numbers faster than their less capable peers. When replicators compete for a common limited resource, the less capable ones eventually disappear, leaving the more capable ones to dictate the world's future.

A Promising Path for AGI based on Convergent Evolution

Here, we introduce a promising methodology for progress in AI, inspired by convergent evolution (CE). CE occurs when different species independently evolve similar solutions to similar problems. For example, eyes have evolved independently at least fifty times across different species — as if, for organisms living in an environment full of light, evolving the ability to see it were inevitable.

Figure 1. There is an evident similarity between the octopus and the human eye despite their completely different evolutionary origins. Our common ancestor lived 750 million years ago and is believed to have had only a simple patch of tissue that could do little more than detect the absence or presence of light. Curiosity: the octopus eye seems to be slightly better designed than ours because it has no blind spot! Our eye has no photoreceptor cells in the region where the nerve fibres pass through on their way to the optic nerve.
  1. Design simplified bio-inspired agent(s) to live in a simplified bio-inspired environment.

The Success of the Arms Race as a Tool for AI Research

(if you are already familiar with the concept of an arms race in the context of AI research, you can skip this section)

Figure 2. Fake images of cats generated by a GAN. Image courtesy of Alexia Jolicoeur-Martineau.

Reinforcement Learning to Maximise Evolutionary Fitness

Evolutionary pressure acts on the replicator level, but reinforcement learning acts on the agent level. RL can be aligned with the evolutionary process by noting what evolution does to agents through its selection of replicators: it generates agents with ever-increasing capabilities to maximise the survival and reproduction success of the replicators they carry. That is what our RL algorithm should do as well.
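This alignment can be made concrete with a toy reward signal. The sketch below is illustrative only (all names are assumptions, not the paper's implementation): instead of rewarding an agent solely for its own survival, each timestep rewards it in proportion to how many living agents carry its genome, so policies are reinforced for the success of their replicators rather than of themselves alone.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    genome_id: int
    alive: bool = True


def selfish_reward(agent):
    # A standard RL-style reward, for contrast: only the agent's
    # own survival counts.
    return 1 if agent.alive else 0


def evolutionary_reward(agent, population):
    # Replicator-aligned reward: proportional to the number of living
    # carriers of the agent's genome (itself plus its kin) this timestep.
    return sum(
        1 for a in population
        if a.alive and a.genome_id == agent.genome_id
    )
```

Under this reward, an agent that dies producing two surviving offspring still scores higher than a childless survivor, which is exactly the incentive evolution imposes on replicators.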

Diagram 1. There are two different mechanisms driving the development of the brain: evolution (across generations) and learning (across one’s lifetime).

The Bacteria Colony

Figure 4. The Bacteria Colony.
Figure 5. Some of the evolved policies in action. The map boundaries are connected (e.g. a bacterium that moves over the top edge reappears at the bottom of the map). Curiosity: see how the yellow species displays an entirely different strategy for foraging; this behaviour is explained in our paper.
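The wrap-around boundary described in the caption is a standard toroidal grid. A minimal sketch (the function name and coordinate convention are assumptions for illustration):

```python
def wrap(pos, grid_size):
    """Toroidal map: a coordinate that leaves one edge re-enters from the
    opposite edge, so moving over the top lands at the bottom."""
    x, y = pos
    width, height = grid_size
    # Python's % always returns a non-negative result for a positive
    # modulus, so negative coordinates wrap correctly too.
    return (x % width, y % height)
```

For example, on a 10x10 map a step from (3, 0) to (3, -1) wraps to (3, 9).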

Results: The Colony’s History

In this section, we unfold the evolutionary history of the Bacteria Colony and narrate its key events. Then, we delve into the details of the evolved behaviours.

The Colony’s History

In the very beginning, God created Adam, Boltzmann, Claude, Darwin and Eve. These first bacteria were meant to populate the world, but they could hardly find the necessary resources to survive, let alone reproduce. Despite that, they evolved. You see, in this world, evolution works differently. God is watching every move of every being and reinforcing (or inhibiting) beneficial (or detrimental) behaviours for survival and self-replication.

Figure 6. Macro-statistics obtained in a test environment using the policies trained for a certain number of training episodes. Every 200 training episodes, we run 20 test episodes with a length of 500. The training episodes had a length between 450 and 550. The different colours correspond to the different eras; I: learning to survive (barely visible), II: learning to reproduce, III: learning to detect kin, IV: learning to self-sacrifice.

Cannibalism and suicide as a tool for gene survival

In era IV, we saw the rise of cannibalism practised by young bacteria on older bacteria. Figure 7.a) shows how the average age of cannibals and their victims grows apart in this era. This allowed for an increase in population, but how did it affect the survival and reproduction success of a given family?

Figure 7. a) The average age of intra-family cannibals and cannibals’ victims. The vertical red line marks the start of era IV. b) The size of family 1 averaged over 90 test episodes. To compute the orange line, we simply blocked all attacks between members of family 1. The shaded bands represent the 95% confidence interval. Note: the best thing about doing experiments inside simulations is that you can always run more experiments to reduce the error bars!
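The orange line in Figure 7.b comes from an in-simulation intervention. A minimal sketch of how such a counterfactual ablation can be expressed (data structures and names are hypothetical, not the paper's code): before attacks take effect, drop every attack whose attacker and victim both belong to family 1, leaving all other attacks untouched.

```python
def block_intra_family_attacks(attacks, family_of, blocked_family=1):
    """Counterfactual ablation: remove attacks where both the attacker
    and the victim belong to the blocked family; keep every other attack,
    including attacks across family boundaries."""
    return [
        (attacker, victim)
        for attacker, victim in attacks
        if not (family_of[attacker] == blocked_family
                and family_of[victim] == blocked_family)
    ]
```

Comparing family size with and without this filter isolates the net effect of intra-family cannibalism on the family's success.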

Final words


The paper behind this blogpost was written with my co-authors Arnaldo Abrantes and Frans Oliehoek.





João Abrantes

AI researcher into reinforcement learning, complex systems and collaboration tech.