Introduction to Artificial Intelligence (Part 2)

In Part 1 we looked at what AI is and why it is such an interesting field for understanding intelligence as a concept. Here we will dive into its history, its evolution and the challenges along the way. I’ll try to cover some of the main events.

We mentioned the Turing test, but the term Artificial Intelligence itself was only introduced later by a fellow named John McCarthy, as the mission of the 1956 Summer Research Project at Dartmouth College. He had invited a group of 10 researchers in computer and cognitive sciences to tackle this problem of intelligence.

1959: Arthur Samuel, who attended this workshop, coins the term machine learning and builds a self-learning checkers program. Machine learning is a subset of AI where the program learns patterns from data, rather than us explicitly programming the rules as we traditionally do.

1964-1967: Computer scientist Joseph Weizenbaum develops ELIZA, widely regarded as the first natural language processing chatbot that could simulate conversations. Wait, what? I thought that was ChatGPT! 🤔 Not quite. This chatbot used a different mechanism. It matched keywords in the user’s sentence and decomposed the sentence into parts according to fixed patterns. It then applied reassembly rules to those parts to produce its reply.
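To make the keyword matching, decomposition and reassembly idea a bit more concrete, here is a minimal sketch in Python. The rules, the reflection table and the function names are all made up for illustration; this is not Weizenbaum’s original script, just the general shape of the mechanism.

```python
import re

# Hypothetical rules for illustration only (not ELIZA's original script).
# Each rule pairs a decomposition pattern with a reassembly template.
RULES = [
    (re.compile(r".*\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r".*\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r".*\bmother\b.*", re.IGNORECASE), "Tell me more about your family."),
]
FALLBACK = "Please go on."

# Swap first-person words for second-person ones in the captured fragment.
REFLECTIONS = {"my": "your", "i": "you", "me": "you", "am": "are"}


def reflect(fragment):
    """Turn 'my coffee' into 'your coffee' so the reply reads naturally."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(sentence):
    """Find the first rule whose keyword pattern matches and reassemble a reply."""
    cleaned = sentence.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.match(cleaned)
        if match:
            parts = [reflect(group) for group in match.groups()]
            return template.format(*parts)
    return FALLBACK
```

For example, respond("I need my coffee.") matches the first rule, captures "my coffee", reflects it to "your coffee" and returns "Why do you need your coffee?". There is no understanding involved, only pattern matching, which is exactly why this is so different from ChatGPT.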

1974-1980: The first AI winter. The media and developers had created a lot of hype about what AI could achieve, but it failed to deliver on these promises. At that time the available compute and algorithms were also too limited to support AI development. Thus, interest and funding diminished.

1980s: The rise of expert systems. These are programs that are injected with rules, and we use IF-ELSE logic to decide which action to take based on the condition met. XCON was one such system that automatically decided which computer components to order for a customer based on their requirements. Hardware was also built to support these systems, namely LISP machines. LISP was the primary language for early AI and handled symbolic data efficiently, which mattered because expert systems relied on symbolic logic. Then came the second AI winter between 1987 and 1993. It became expensive and difficult to build and maintain these expert systems, because the rules had to be defined manually and there could be thousands of them. Since the rules injected into these systems were rigid, the systems could not learn or adapt to novel situations. This was also a period where general-purpose PCs were coming into the market, leading to the decline of LISP machines.
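Here is a tiny sketch of the IF-ELSE idea behind expert systems, loosely themed on the component-ordering example. The requirement keys, thresholds and component names are all invented for illustration; they are not XCON’s actual rules, which numbered in the thousands.

```python
def recommend_components(requirements):
    """Apply hand-written rules to a dict of customer requirements.

    Every key, threshold and component name here is hypothetical.
    """
    components = ["base unit"]

    # Rule 1: many users need more memory.
    if requirements.get("users", 1) > 10:
        components.append("extra memory board")
    else:
        components.append("standard memory board")

    # Rule 2: printing requires a printer interface.
    if requirements.get("needs_printing", False):
        components.append("printer interface")

    # Rule 3: large storage needs a second disk drive.
    if requirements.get("storage_gb", 0) > 1:
        components.append("second disk drive")

    return components
```

Calling recommend_components({"users": 25, "needs_printing": True}) returns ["base unit", "extra memory board", "printer interface"]. Notice that every new situation needs a new hand-written rule, which hints at why these systems became so painful to maintain.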

Development of the backpropagation algorithm

This is one of the most important, if not the most important, algorithms to train deep learning models. Deep learning is a subset of machine learning using neural networks to learn patterns from data. Loosely speaking, neural networks are inspired by the human brain. Information is propagated through layers of neurones; the strength of the connections (weights) determines how much signal passes through, similar to biological synapses. Essentially we measure the error at the output of the neural network and work our way back through the layers, using the chain rule from calculus to compute the gradient of that error with respect to each weight. Then we use these gradients to update the weights via gradient descent, and this is how the network learns.
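To see the chain rule and gradient descent in action, here is a minimal sketch with the smallest network I could think of: one input, one hidden neurone with a tanh activation and one output, so just two weights. The network shape, the squared-error loss, the learning rate and all the names are my own assumptions for illustration; real networks do the same thing with matrices and far more weights.

```python
import math


def forward(w1, w2, x):
    """Forward pass: input -> hidden neurone (tanh) -> output."""
    h = math.tanh(w1 * x)  # hidden activation
    y = w2 * h             # network output
    return h, y


def loss(w1, w2, x, target):
    """Squared error between the output and the target."""
    _, y = forward(w1, w2, x)
    return (y - target) ** 2


def backward(w1, w2, x, target):
    """Backward pass: start at the loss and apply the chain rule layer by layer."""
    h, y = forward(w1, w2, x)
    dloss_dy = 2.0 * (y - target)     # d(loss)/d(output)
    dloss_dw2 = dloss_dy * h          # output = w2 * h
    dloss_dh = dloss_dy * w2          # pass the gradient back to the hidden neurone
    dh_dz = 1.0 - h * h               # derivative of tanh
    dloss_dw1 = dloss_dh * dh_dz * x  # z = w1 * x
    return dloss_dw1, dloss_dw2


def train(w1, w2, x, target, lr=0.1, steps=200):
    """Gradient descent: repeatedly nudge each weight against its gradient."""
    for _ in range(steps):
        g1, g2 = backward(w1, w2, x, target)
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2
```

Starting from, say, w1 = 0.5 and w2 = -0.3 with x = 1.0 and target = 0.8, calling train and then loss again shows the error shrinking towards zero. Deep learning libraries automate the backward function for you, but the idea underneath is the same.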

Actually, work on backpropagation started around the 1960s to 70s. Early versions cropped up in control theory, and later on, methods for automatic differentiation (which is the mathematical backbone of backprop) were published. However, it seems that not too much attention was paid to it at the time. A thesis published in 1974 by Paul Werbos proposed how backprop could be used in neural networks, but it was in 1986 that David Rumelhart, Geoffrey Hinton (one of the godfathers of AI) and Ronald Williams demonstrated how backprop can be used to train neural networks and learn internal representations.

1997: IBM developed Deep Blue, which was an AI chess program that defeated world champion Garry Kasparov. There is, however, a question of whether this was actually learning, since the program relied largely on brute-force search to find the best next move based on the chessboard configuration.

2012: This was really where deep learning took off due to the advancements in compute and data. Stanford professor Fei-Fei Li was the pioneering force behind the revolutionary ImageNet dataset, the competition version of which consists of over 1 million images spanning 1000 categories. A competition was held to build an ML model that could accurately classify the images in this dataset, such as cats, dogs, etc. A neural network named AlexNet was trained on GPUs (these are powerful for parallel computation and matrix multiplication). AlexNet is a CNN (Convolutional Neural Network), a type of network designed specifically for spatial data such as images, and it outperformed traditional ML methods such as Support Vector Machines (SVMs).

Since then there has been a lot of work done in deep learning. The ability of deep learning models to handle various modalities such as text and images makes them versatile for many applications. Traditional ML required manual feature extraction before feeding the data into the model. Depending on the problem at hand, this requires domain knowledge and expertise. It can be incredibly time-consuming as well. Deep learning is a form of representation learning where the model learns the features automatically.

2016: AlphaGo, developed by Google DeepMind, defeated Lee Sedol, one of the world’s top Go players, 4-1. Go is an incredibly complex board game based on strategy and creativity, and it has far more possible board positions than there are atoms in the observable universe. This was an astonishing milestone and included the famous move 37, which took Go experts by surprise because it was a move a human player would have been very unlikely to play. We often have this notion that AI systems simply parrot what they have seen in training. But this showed that AI could produce novel solutions that humans had not come up with.

2017: Researchers at Google published a paper named Attention Is All You Need, which introduced the Transformer architecture. This was originally designed for machine translation between languages but turned out to be the crucial architecture that gave birth to the LLMs (Large Language Models) that we see today. In the following years, models built on this architecture were developed, such as GPT (first introduced in 2018), BERT (2018) and T5 (2019). Yes, the GPT family is what powers OpenAI’s ChatGPT! 😁

2022: ChatGPT was released and instantly became a hot commodity across the world. In just 2 months the application reached 100 million users! 😲 This was really when we started talking about generative AI, where AI produces content in the form of text, image, audio, video, etc.

Since then we have seen a wave of generative AI models, such as Anthropic’s Claude, Google’s Gemini and Meta’s Llama series. Even now we are seeing newer models being released that are progressively getting smarter and more capable across various tasks, such as coding and question answering. We are also seeing a large ecosystem of AI products that build applications on top of these LLMs, as well as the rise of AI agents. Remember agents from Part 1? Their brains are the LLMs that reason and plan, and tools are used to perform actions such as web search and code execution.

Conclusion

So yes, AI has a richer and deeper history than you may initially think. We have certainly come a long way, and it is permeating our lives more and more. We need to embrace it and learn how to use it to become more productive, learn better and solve real-world problems. It is also a technology that has caused a significant change in the way we work, but past technologies have done this too. Think of industrial machines in manufacturing and Robotic Process Automation (RPA) software that automated repetitive work such as invoicing. There is now talk of how we can achieve AGI (Artificial General Intelligence). Many AI systems in the past were what we call narrow AI, where they specialise in one specific task. AlphaGo is for Go. It cannot help you code or create a song, for instance. Humans are capable of both specialising and handling many different tasks. Can we build AI systems that equal or even exceed humans in a wide range of tasks? That is pretty much the goal of AGI.

Overall it will be interesting to witness what lies ahead of us, but it also raises questions about where our place as humans will be. Will we have a world where AI replaces humans or where humans and AI can work together to improve our society? If you ask me, I would want the latter. I’ll leave you with this scene from WALL-E…. Can’t really imagine living like this! 😬 What about you?