Breakthroughs in AI

AI (Artificial Intelligence) is a rapidly evolving field – it can be challenging to keep track of new developments and how they might fit into your use cases. While it may seem like countless options are available, many AI solutions are built upon a small set of foundational concepts introduced in research papers before being implemented in industry. Each of these ideas was a groundbreaking paradigm in its time, playing a crucial role in shaping the field of AI as we know it today.

2012 – Image classification with AlexNet

Paper: ImageNet Classification with Deep Convolutional Neural Networks

AlexNet is a convolutional neural network (CNN) introduced in 2012 for the ImageNet challenge – an object detection and image classification competition built on a dataset of over 15 million images organised into roughly 22,000 categories. AlexNet outperformed the previous state of the art by a significant margin using a novel combination of techniques: it paired a CNN of greater depth (more convolutional layers) with efficient parallelised training on graphics processing units (GPUs). The work showed that such an approach could scale with larger datasets and faster hardware, and it paved the way for further research into deep learning.
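The core building block that AlexNet scaled up – a convolutional layer followed by a non-linearity – can be illustrated with a minimal NumPy sketch. This is a toy illustration of the idea (a single hand-written edge-detection filter), not AlexNet's actual architecture; in a real CNN the kernel values are learned:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over an image and take dot products (valid padding)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Non-linearity used by AlexNet: zero out negative activations."""
    return np.maximum(x, 0)

# A toy 5x5 "image" with a dark-to-bright vertical edge,
# and a 3x3 kernel that responds to exactly that pattern.
image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

feature_map = relu(conv2d(image, kernel))  # activates strongly near the edge
```

Stacking many such layers, each learning its own filters, is what gives a deep CNN its "depth".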

Modern image classification techniques have made huge improvements since AlexNet was first introduced. Nonetheless, the model’s architecture is still used as an example when introducing deep learning and image recognition in entry-level AI courses.

2013 – Deep Q Networks

Paper: Playing Atari with Deep Reinforcement Learning

Reinforcement learning is a field of AI that deals with autonomous agents receiving inputs and taking actions within an environment. Training these agents revolves around a value function – known as a Q-function – which estimates how much cumulative reward an agent can expect from taking a given action in a specific state. Through multiple rounds of training, an agent learns to take the actions that maximise this expected reward over time. However, one of the challenges is dealing with high-dimensional inputs such as video, audio, and other sensory feedback. For this reason, examples of reinforcement learning in real-world applications such as robotics are still relatively sparse.
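The idea of a Q-function can be made concrete with tabular Q-learning – the classical method that DQN later approximated with a neural network. A minimal sketch on a hypothetical 5-cell corridor environment (the environment, reward, and hyperparameters here are all illustrative):

```python
import random

# Hypothetical toy environment: a corridor of 5 cells. The agent starts in
# cell 0 and receives a reward of +1 only upon reaching cell 4.
N_STATES = 5
ACTIONS = [+1, -1]                 # step right or left (ties favour +1)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

# The Q-function as a lookup table: expected reward for (state, action)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

random.seed(0)
for episode in range(200):
    state = 0
    for _ in range(100):           # cap episode length
        # Epsilon-greedy: mostly exploit the current Q, occasionally explore
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward = step(state, action)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state
        if state == N_STATES - 1:  # goal reached, episode ends
            break

# The learned greedy policy should step right from every non-goal state
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
```

DQN's contribution was replacing the table `Q` with a deep network mapping raw pixels to Q-values, making this same update rule workable for high-dimensional inputs like Atari frames.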

In this paper, the authors introduce the concept of Deep Q-Networks (DQN), which applies the deep learning approach from AlexNet to approximate the Q-function for an agent playing video games on the Atari 2600. The approach successfully outperformed previous approaches in 6 out of 7 games and, in some cases, performed better than humans.

To date, reinforcement learning has seen the most success in games – most famously with AlphaGo. However, another increasingly common use of reinforcement learning today is fine-tuning large language models such as the ones behind ChatGPT, in which human feedback on an LLM’s responses serves as the reward signal.

2014 – Generative Adversarial Networks

Paper: Generative Adversarial Networks

The fundamental idea behind generative adversarial networks (GANs) is having 2 neural networks – a generator and a discriminator – compete against each other. The generator aims to create data that can “fool” the discriminator into classifying it as real, while the discriminator’s goal is to correctly differentiate between real and generated data. Both networks improve with training over time, and through this process we can create models that generate highly realistic data such as images, audio, and video.
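Numerically, the two-player game can be sketched with the losses from the paper: the discriminator minimises a binary cross-entropy that pushes D(real) towards 1 and D(fake) towards 0, while the generator is trained to push D(fake) towards 1 (the “non-saturating” variant commonly used in practice). A minimal sketch with scalar probabilities standing in for real network outputs:

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: rewards the discriminator for D(real) -> 1 and D(fake) -> 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss: rewards the generator for D(fake) -> 1."""
    return -math.log(d_fake)

# Early in training, the discriminator easily spots fakes (D(fake) is low),
# so the generator's loss is high and it receives a strong learning signal.
early_g_loss = generator_loss(d_fake=0.1)

# As the generator improves, D(fake) climbs towards 0.5 and its loss falls.
late_g_loss = generator_loss(d_fake=0.5)
```

At the theoretical equilibrium, the discriminator cannot tell real from fake and outputs 0.5 for both, which is what makes the competition a useful training signal for the generator.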

GANs formed the basis of early approaches to generative AI, paving the way for text-to-image models like Midjourney, DALL-E and Stable Diffusion. Generative AI continues to be a heavily researched topic today, with offerings such as OpenAI’s Sora pushing the bounds of what was traditionally thought possible.

2017 – Transformers and Large Language Models

Paper: Attention Is All You Need

“Attention Is All You Need” was a seminal paper introducing the Transformer architecture and the attention mechanism, which have since become widely used in NLP and in building language models. Where previous approaches to processing input sequences treated each word with equal importance, the attention mechanism focuses on the most relevant parts of the input and gives them a higher weight. Additionally, Transformers are able to process input sequences in parallel, as opposed to the sequential operations that were necessary with previous state-of-the-art recurrent neural network (RNN) approaches.
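The scaled dot-product attention at the heart of the Transformer computes softmax(QKᵀ/√d_k)V: each query is compared against every key, and the resulting weights decide how much of each value flows into the output. A minimal NumPy sketch (the toy token matrices are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in the paper."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Row-wise softmax (shifted by the max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy example: 3 tokens with 4-dimensional embeddings
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))

output, weights = scaled_dot_product_attention(Q, K, V)
```

Because every row of `weights` is computed independently, all positions in the sequence can be processed at once – the parallelism that gives Transformers their training-speed advantage over RNNs.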

In a later paper, “Language Models are Few-Shot Learners”, researchers at OpenAI explain the architecture of GPT-3. The model was based on the Transformer architecture, just like its predecessor, GPT-2. The biggest difference between the 2 models, however, was scale. GPT-3 was trained with 175 billion parameters on about 570GB of text data, compared to GPT-2’s 1.5 billion parameters and 40GB of data. This proved to have a significant impact on the model’s overall performance: GPT-3 was able to generate high-quality responses across a wide range of tasks it was not explicitly trained for. This trait is known as few-shot learning – the ability of a model to generalise to a new task from only a handful of examples. At this point, the success of GPT-3 is well known far beyond the AI community, and it has laid the foundations for ChatGPT and ongoing research into LLMs today.
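Few-shot learning looks like this in practice: the task is specified entirely within the prompt, with no gradient updates to the model. The English-to-French example below is adapted from the GPT-3 paper:

```python
# The model is shown a task description and a few worked examples, then asked
# to complete a new instance; it infers the pattern from the prompt alone.
prompt = """Translate English to French:

sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""

# `prompt` would be sent to the model as-is; the expected completion is the
# French translation of the final word.
```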

The past few years have seen many exciting advancements in AI, as companies and research institutions continue to release new publications, products and features. At Knovel Engineering, we place a strong focus on integrating cutting-edge AI research into the solutions we develop. Recently, we emerged 3rd out of 182 shortlisted teams in a Large Language Model (LLM) Efficiency Challenge organised by NeurIPS.

Keen to find out more about our company’s innovations? Follow us on LinkedIn and stay tuned to learn more about our tech stack capabilities!
