As pioneers in adopting ChatGPT technology in Malaysia, XIMNET dives in to take a look: how far back does Conversational AI go?
Conversational AI has been around for some time. One of the noteworthy early breakthroughs was ELIZA, the first chatbot, built in 1966. It used pattern matching and substitution to explore communication between humans and machines, even though the program had no real understanding of the conversation context.
The next milestone was A.L.I.C.E. in 1995, coded in AIML (Artificial Intelligence Markup Language) and based on heuristic pattern matching. The open-source community subsequently took an interest and has actively contributed to all sorts of research repositories, which brings us the vast collection of machine learning models available today.
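To make "pattern matching and substitution" concrete, here is a minimal sketch of the idea in Python. The rules and pronoun swaps below are made up for illustration and are not taken from ELIZA's original script; the point is only that a reply can be produced by matching a pattern and echoing part of the input back, with no understanding involved.

```python
import re

# Hypothetical rules for illustration; these are NOT ELIZA's original script.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (.+)", re.IGNORECASE), "Tell me more about your {0}."),
]

# Pronoun swaps so the echoed fragment reads from the bot's point of view.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}


def reflect(fragment: str) -> str:
    """Swap first-person and second-person words in the captured fragment."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(message: str) -> str:
    """Return the reply of the first rule whose pattern matches the message."""
    text = message.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Substitute the reflected fragment into the canned template.
            return template.format(reflect(match.group(1)))
    return "Please go on."  # fallback when nothing matches


print(respond("I am worried about my exams"))
```

The bot only rearranges the user's own words, which is why such a conversation can feel responsive without any grasp of context.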
Timeline by Antoine Louis on A Brief History of Natural Language Processing
Siri, Google Assistant, Cortana and Alexa are the successive technologies rolled out in the 21st century. They are readily accessible via our handy devices and serve as intelligent personal assistants rather than simple question-answering tools based on internet information. Natural Language Processing (NLP) and deep neural networks are the core building blocks of the technology, allowing our machines, appliances and IoT devices to understand human language with ease. Command execution via voice recognition is the new norm, where a simple instruction like "Hey Google, play me some country music!" will fire up your Spotify app to your liking.
A nonprofit American Artificial Intelligence company called OpenAI was created with the common goal of developing artificial intelligence "in the way that is most likely to benefit humanity as a whole," according to a statement on OpenAI's website from December 11, 2015.
In November 2022, the public was introduced to ChatGPT, a pre-trained language model that had been fine-tuned on conversational data, and its jaw-dropping capabilities quickly became the talk of the town. People, whether AI experts or not, have been drawn to ChatGPT because of its remarkable capacity to produce natural and compelling responses in a conversational setting. In just 5 days, the AI model amassed over one million users, prompting people to wonder how ChatGPT can provide such accurate and human-like answers.
Illustration of neural network by DeepMind design and Novoto Studio
(A) Large Language Model (LLM)
It all started with the Large Language Model (LLM), a type of pre-trained neural network designed to understand and generate natural language in a way that resembles human writing. ChatGPT builds on the GPT-3 family, one of the largest LLMs available today, whose largest model has 175 billion parameters; this scale gives it the ability to generate text that is remarkably similar to human writing. These models are engineered to process a large corpus of text data in order to learn the patterns and structures of natural language. When fed a large dataset of text from sources such as Wikipedia and Reddit, the model can analyze and learn the relationships between the words and phrases in the text. As the model continues to learn and refine its understanding of natural language, it becomes increasingly adept at generating high-quality text outputs.
Training tasks that involve predicting a word in a sentence, be it next-word prediction or masked language modelling, are crucial in shaping a high-accuracy LLM. Both techniques were traditionally implemented using Long Short-Term Memory (LSTM) networks, which have feedback connections, i.e., they can process entire sequences of data and not just single data points such as images. However, LSTMs have their drawbacks: they process a sequence one step at a time, which makes training hard to parallelize and limits how much they can benefit from large datasets.
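As a rough illustration of next-word prediction, the sketch below counts which word most often follows each word in a tiny made-up corpus. This is a simple count-based bigram model, not an LSTM or a transformer, and the corpus and function names are assumptions for the example; real LLMs learn the same kind of "what comes next" signal with neural networks over vastly more text.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for every word, which words follow it in the corpus."""
    follow_counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            follow_counts[current_word][next_word] += 1
    return follow_counts


def predict_next(follow_counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = follow_counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Tiny made-up corpus, for illustration only.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_bigram_model(corpus)
print(predict_next(model, "sat"))
```

A model like this only looks one word back, which is exactly the kind of limited context that sequence models such as LSTMs, and later transformers, were designed to overcome.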
Prompted using Midjourney by The Decoder
In 2018, OpenAI released the paper "Improving Language Understanding by Generative Pre-Training", introducing the concept of a Generative Pre-trained Transformer (GPT), which also became one of the contributing factors to the significant advancement of transfer learning in the field of natural language processing (NLP). Simply put, GPTs are machine learning models based on a neural network architecture loosely inspired by the human brain. These models are trained on vast amounts of human-generated text data and are capable of performing various tasks such as question generation and answering.
The model later evolved into GPT-2, a more robust version with 1.5 billion parameters, trained on a corpus of 8 million web pages to predict text. However, due to concerns about malicious applications of the powerful technology, OpenAI initially released only a much smaller model for researchers to experiment with, along with a technical paper. Other than next-word prediction, notable use cases include zero-shot learning. As opposed to typical large neural models that require an insane amount of task-specific data, a "zero-shot" setup measures a model's performance on a task it has never been explicitly trained on.
Following two years of parameter adjustments and fine-tuning, GPT-3 was unveiled in May 2020, having been trained on a staggering 45 terabytes of text data and scaled up to 175 billion parameters. It was smarter, faster, and more terrifying than anything we had seen before.
The key to the success of all GPT models lies in the transformer architecture. In the original transformer, both the encoder (which processes the input sequence) and the decoder (which generates the output sequence) contain a multi-head self-attention mechanism; GPT models use the decoder side only. Self-attention enables the model to give different levels of importance to different parts of the sequence in order to understand its meaning and context.
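The idea of giving "different levels of importance to different parts of the sequence" can be sketched with a single-head, scaled dot-product self-attention in plain Python. This is a simplified sketch under stated assumptions: the token vectors are made-up numbers, and the same vectors are used directly as queries, keys and values, whereas a real transformer derives them through learned projections and runs several heads in parallel.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(queries, keys, values):
    """Single-head scaled dot-product attention over a short sequence."""
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for query in queries:
        # How strongly this position attends to every position in the sequence.
        weights = softmax([dot(query, key) / scale for key in keys])
        # The output is a weighted average of the value vectors.
        mixed = [sum(w * value[i] for w, value in zip(weights, values))
                 for i in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Toy 2-dimensional vectors for three tokens (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how one token spreads its attention across all three tokens; tokens with more similar vectors receive larger weights.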
A simple yet comprehensive animation by Raimi Karim illustrating the self-attention mechanism
Source: OpenAI
The figure above summarizes the steps taken by researchers to enhance GPT-3's ability to follow instructions and accomplish tasks rather than simply predicting the most probable word. To start, a supervised fine-tuning step is carried out, producing a supervised fine-tuned (SFT) model, the first stage of the approach OpenAI calls InstructGPT. This model uses patterns and structures learned from labelled training data to generate responses. For instance, a chatbot trained on a dataset of medical conversations will generate informative and appropriate responses to medical-related questions based on its supervised policy.
To incentivize a chatbot to produce more suitable and favourable responses, a reward model is necessary. This model takes in a prompt and one of the chatbot's responses and outputs a scalar reward based on the desirability of the response. Comparison data is collected by having labellers rank several outputs for a given input from most to least preferred.
In the last stage, a random prompt is provided to the policy to produce an output, which is then evaluated by the reward model to determine the reward. This reward is then used to update the policy using Proximal Policy Optimization (PPO). The reward model decides the reward or penalty for each response produced by the chatbot, and this signal steers the learning process towards relevant, informative and engaging responses for the user while avoiding inappropriate or offensive ones. These steps are repeated over multiple iterations on Azure AI supercomputing infrastructure, which completes the training of the ChatGPT model.
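One way to turn labeller rankings into a training signal is to expand each ranking into pairs and penalize the reward model whenever it scores the rejected response above the preferred one. The sketch below uses a common pairwise ranking loss, -log(sigmoid(r_preferred - r_rejected)); the article does not specify the exact loss, so treat the formula, the function names and the reward numbers as assumptions for illustration.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs."""
    return list(combinations(ranked_responses, 2))


def pairwise_loss(reward_preferred, reward_rejected):
    """Compute -log(sigmoid(diff)), written as log(1 + exp(-diff)).

    The loss is small when the preferred response gets the higher reward
    and large when the reward model has the order backwards.
    """
    diff = reward_preferred - reward_rejected
    return math.log(1.0 + math.exp(-diff))


# A labeller ranked three responses to one prompt, best first.
pairs = ranking_to_pairs(["A", "B", "C"])

# Hypothetical scalar rewards from a reward model (made-up numbers).
rewards = {"A": 1.8, "B": 0.4, "C": -0.9}
total_loss = sum(pairwise_loss(rewards[good], rewards[bad]) for good, bad in pairs)
print(pairs, round(total_loss, 3))
```

Minimizing this loss over many ranked examples nudges the reward model to agree with human preferences.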
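The heart of PPO is a "clipped" objective that rewards the policy for making good responses more likely, but stops it from changing too much in a single update. The sketch below shows that objective for one sample. The function name and the clipping value of 0.2 are illustrative choices, and a full training loop (advantage estimation, value function, any extra penalties) is left out.

```python
def clipped_objective(prob_ratio, advantage, clip_eps=0.2):
    """PPO clipped surrogate objective for a single sample.

    prob_ratio: new_policy_probability / old_policy_probability for the
                response that was actually generated.
    advantage:  how much better (positive) or worse (negative) the response
                was than expected, derived from the reward.
    clip_eps:   how far the ratio may move away from 1 before the
                objective stops improving (illustrative default).
    """
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, prob_ratio))
    # Taking the minimum keeps the more pessimistic of the two estimates.
    return min(prob_ratio * advantage, clipped_ratio * advantage)


# A good response (positive advantage) that the new policy already favours
# much more strongly: the gain is capped at (1 + clip_eps) * advantage.
print(clipped_objective(prob_ratio=1.5, advantage=1.0))
```

Because the gain is capped, the policy has no incentive to drift far from its previous behaviour in one step, which helps keep training stable.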
applications, for instance building a chatbot?
Let's assume that we are using OpenAI's GPT-3 model, which is also known as text-davinci-002 in their API documentation. The basic steps include creating an OpenAI API account, setting up an environment to use the API, and programming the chatbot to interact with users.
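The "programming the chatbot" step can be sketched roughly as follows. To keep the example runnable offline, the model call is passed in as a plain function: `fake_complete` below is a hypothetical stand-in, and in a real chatbot you would replace it with a call to the OpenAI API using your own API key. The prompt wording and all function names are assumptions for illustration.

```python
from typing import Callable, List, Tuple


def build_prompt(history: List[Tuple[str, str]], user_message: str) -> str:
    """Flatten the conversation so far into one text prompt for a completion model."""
    lines = ["The following is a conversation with a helpful AI assistant.", ""]
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines.append(f"User: {user_message}")
    lines.append("Assistant:")
    return "\n".join(lines)


def chat_turn(history: List[Tuple[str, str]], user_message: str,
              complete: Callable[[str], str]) -> str:
    """Run one chatbot turn: build the prompt, get a reply, remember both."""
    prompt = build_prompt(history, user_message)
    reply = complete(prompt).strip()
    history.append(("User", user_message))
    history.append(("Assistant", reply))
    return reply


def fake_complete(prompt: str) -> str:
    """Hypothetical stand-in backend so the sketch runs without network access."""
    return " (model reply goes here)"


history: List[Tuple[str, str]] = []
print(chat_turn(history, "Hi", fake_complete))
```

Keeping the conversation history in the prompt is what lets a completion-style model appear to remember earlier turns.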
How about asking ChatGPT to help us with this setup?
XIMNET is a digital solutions provider with a two-decade track record, specialising in web application development, AI chatbots and system integration in Malaysia. XIMNET is launching a brand new way of building ChatGPT-powered AI chatbots with XYAN. Get in touch with us to find out more.