Generative Artificial Intelligence

Artificial intelligence (AI) has been researched for decades and, in recent years, has been used within applications such as social media algorithms and autonomous systems. However, the widespread availability of generative artificial intelligence, on platforms such as ChatGPT and DALL-E, has enabled millions of people to experience their first direct interaction with AI. In this article I look at some of the key elements of generative artificial intelligence and compare popular generative AI platforms.

What Is Generative AI?

Generative artificial intelligence is a field of artificial intelligence in which new content, such as images, text and audio, is produced based upon the patterns and structures of the data that the AI has been trained on. In essence, the difference between generative AI and the computer systems of previous decades is that, rather than following a fixed series of instructions, generative systems recognise patterns, enabling them to generate something new in response to user input. The term prompt engineering is widely used to describe the process of writing and refining the text instructions (prompts) given to a generative AI system, such as a chatbot, in order to obtain the desired content.

Generative AI Models

Generative artificial intelligence uses machine learning models to generate new samples of data, such as text, images and audio, by recognising the structure underlying the datasets used for training. By contrast, the purpose of discriminative AI is to classify data by placing different types of data into different categories, such as human, car, tree or bird. Machine learning models are also commonly grouped by how they learn. Supervised learning models are trained on examples in which each input is paired with a correctly labelled output, so that they can predict the output for inputs they have not seen before. Unsupervised learning models find patterns in data, without the data being given explicit labels. Reinforcement learning models learn through trial and error by interacting with a real or virtual environment, for example robots navigating a room or computers playing games.
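The contrast between classifying existing data and generating new data can be illustrated with a very small sketch. It is only a toy, not how any real platform works: the measurements, the labels and the function names (classify, fit_gaussian, generate) are all hypothetical, and a simple bell curve stands in for the far richer patterns that real generative models learn.

```python
import random
import statistics

# Hypothetical toy data: one measurement (height in cm) for two labelled groups.
cats = [23.0, 25.0, 24.0, 26.0, 22.0]
dogs = [55.0, 60.0, 58.0, 62.0, 65.0]


def classify(height, threshold):
    """Discriminative: place an existing measurement into a category."""
    return "cat" if height < threshold else "dog"


def fit_gaussian(samples):
    """Generative, step 1: estimate the distribution behind the data."""
    return statistics.mean(samples), statistics.stdev(samples)


def generate(mean, stdev, n, rng):
    """Generative, step 2: draw new samples from the learned distribution."""
    return [rng.gauss(mean, stdev) for _ in range(n)]


# Use the midpoint between the two group means as a simple decision boundary.
threshold = (statistics.mean(cats) + statistics.mean(dogs)) / 2

rng = random.Random(0)  # fixed seed so the run is repeatable
cat_mean, cat_stdev = fit_gaussian(cats)
new_cats = generate(cat_mean, cat_stdev, 3, rng)

print(classify(30.0, threshold))  # labels a measurement that already exists
print(new_cats)                   # invents measurements that did not exist
```

The classifier only ever answers "which category?", whereas the generative half produces values that were never in the training data but resemble it.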

Generative AI is a class of artificial intelligence that is trained on a dataset and, after learning the underlying data distribution, can generate new samples that resemble the data in that dataset. Transformers enable systems to evaluate the importance of each element of input data relative to the others and are widely used in language modelling and text generation. By looking at and relating different pieces of data, they can generate coherent long-form text. Diffusion models, typically used in image generation, are trained by adding noise to images and learning to reverse that process. They generate new images by starting from random noise and removing it step by step.
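The way a transformer weighs the importance of each input element is known as attention. The sketch below shows one simplified form of it, scaled dot-product attention for a single query, in plain Python. The vectors are made-up numbers chosen for illustration, and real transformers add learned projections, many attention heads and many layers, none of which appear here.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    # The output is a weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Hypothetical 2-dimensional vectors standing in for three input tokens.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

weights, output = attend(query, keys, values)
print(weights)  # how much importance each token receives
print(output)   # the blended result
```

Tokens whose keys point in a similar direction to the query receive larger weights, so they contribute more to the output. Here the first and third tokens are weighted more heavily than the second.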

Neural networks are loosely modelled on the way a biological brain works and process data to find patterns, which they can then use to produce novel outputs. Machine Learning (ML) is a subset of AI in which systems make decisions based upon the data they analyse, rather than being explicitly programmed. Examples include spam filters and recommendation algorithms. Deep Learning (DL) is a subset of ML that models complex data patterns using multi-layered artificial neural networks and large datasets. Examples include autonomous cars and voice assistants. DL enables a system to build hierarchical representations of data and learn the features of large datasets, in order to produce realistic content, such as text, images and audio.

Natural Language Processing (NLP) involves analysing and generating human language, in both written and spoken form. Examples include chatbots and translation software. Large Language Models (LLMs), such as the GPT models behind ChatGPT, are a particular subset of AI and ML that build on NLP techniques, enabling them to manage complex language tasks, such as human-like conversation and content generation. LLMs make use of DL to analyse vast quantities of text data. Based upon this analysis, they learn to predict and generate patterns of language. GPT (Generative Pre-trained Transformer) models are widely used to generate coherent content, which can be output in different languages and formats.
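The idea of learning to predict patterns of language can be shown at a tiny scale. The sketch below is a simple word-pair (bigram) counter, not a transformer or a real LLM: it records which word follows which in a short, made-up sentence and then generates text by repeatedly picking a likely next word. The corpus and function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count which word follows which in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def generate(following, start, length, rng):
    """Repeatedly sample a next word in proportion to the observed counts."""
    out = [start]
    for _ in range(length - 1):
        options = following.get(out[-1])
        if not options:
            break  # the last word was never followed by anything in training
        candidates = list(options.keys())
        counts = list(options.values())
        out.append(rng.choices(candidates, weights=counts, k=1)[0])
    return " ".join(out)


# Hypothetical training text, far smaller than anything a real model would use.
corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigrams(corpus)

print(model["the"].most_common(1))  # the most frequent word after "the"
print(generate(model, "the", 6, random.Random(0)))
```

An LLM works with vastly more text and takes far more context into account than the single previous word, but the underlying task of predicting what comes next is similar in spirit.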

Generative AI Examples

Large Language Models (LLMs) and the many AI-powered platforms that can be used to generate text, images and audio from typed prompts, in addition to analysing data, have had a huge impact on content creation. Some individuals and businesses have found them to be useful tools that enable them to increase their productivity, but others fear the implications of such technology for our future. There has also been controversy, as many people resent their creative work being used as data to train AI without their prior agreement. Many platforms are available, and some of the popular generative AI examples are described below:

ChatGPT was developed by OpenAI and is built upon the transformer architecture. Trained on huge datasets that include text taken from the Internet, it can be adapted to a specific niche and scaled as required. It is commonly used for text generation tasks, such as providing summaries, writing code and answering questions on a wide range of subjects.

Bard was developed by Google, which renamed it Gemini in February 2024, and was designed to focus on conversation, making it suitable for uses such as customer service and virtual assistants, which require dialogue and nuance. Bard is well integrated with other Google applications, such as Maps and Search. However, it is less flexible than GPT models and not well suited to the generation of content.

Claude was developed by Anthropic, a company founded by former OpenAI employees that prioritises the development of AI that is safe, reliable and ethical. Although less flexible than some other AI platforms, Claude excels at language-related tasks and is suitable for use cases requiring high ethical standards, such as law, education and health care.

LLaMA was developed by Meta using transformer-based models and designed to deliver high performance with minimal resource usage. More efficient than some of the other platforms, LLaMA is also more customisable, making it a popular choice for open-source projects and research. LLaMA is a good option for businesses keen to develop their own AI applications.

Image and Audio Generation

While ChatGPT, Bard, Claude and LLaMA are widely used to generate text, AI is also frequently used to generate images and audio. DALL-E, Midjourney and Canva are popular image generation platforms. Midjourney and DALL-E excel at the production of artistic images from prompts. Canva is a design tool, often used to make images for marketing material and posts on social media. Using text-to-speech systems, such as WaveNet from Google and Polly from Amazon, written text can be converted into synthetic human voices. Other AI tools are also in use or in development for specialised activities, such as data analysis.

Musicians have long used technology to assist them in the making, recording and delivery of music. AI music generators such as Suno and Udio are now being used by non-musicians, but to many the results lack the soul of music created by people, who can draw upon their personal experience and creative intelligence. Although inconsistent beyond short clips and often falling into the ‘uncanny valley’, AI-generated video is becoming increasingly realistic. AI video generation platforms include Gen-2 from Runway and Stable Video Diffusion from Stability AI, the company behind the open-source Stable Diffusion. When comparing these different tools, it is important to remember that the ethical debate around their use continues.

Conclusions

Generative artificial intelligence has the potential to bring benefits to individuals and society, but there are challenges associated with the technology. Many organisations have developed their own AI business strategy or plan to do so. Some view AI as an assistant that can help people build a better world, but others recommend caution. Concerns have been voiced regarding the importance of human beings retaining control of the technology and the need to share the potential benefits, rather than deepening existing inequalities.

In this article I have tried to provide an overview of generative artificial intelligence and the possible consequences of its increasing use. New developments are emerging at a rapid rate, making it difficult to predict with any confidence what will happen during the coming years. However, the more each of us understands about the nature of AI and its potential impact on how we live and work, the better prepared we should be. The decisions we make will shape the lives of current and future generations and the world we all share.