Transformer

T: ChatGPT is a Transformer-based model

The T in GPT stands for “Transformer”. At its core, ChatGPT, like essentially all contemporary large language models, uses a Transformer: a type of neural network that lets it learn language from large amounts of text. The Transformer has two features that make it especially powerful for language: it can look at all the words in a passage at the same time, and it learns which words matter most to each other through a mechanism called attention. Because of this design, a Transformer can keep track of meaning and of the relations between portions of text, even across very long sentences and documents.

Internally, Transformer models are built from many stacked steps (called “layers”) that handle progressively more abstract structure: words are turned into increasingly abstract patterns, and then eventually back into words, which is what you see when the model generates language.
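To make “attention” a bit more concrete, here is a minimal sketch of the scaled dot-product self-attention that sits at the heart of each Transformer layer, written in plain Python with NumPy. The matrix names (Wq, Wk, Wv) and the sizes are illustrative choices for this sketch, not the dimensions of any particular model:

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of word vectors.

    X          : (seq_len, d_model) array, one row per word
    Wq, Wk, Wv : (d_model, d_k) learned projection matrices
    """
    Q = X @ Wq  # "queries": what each word is looking for
    K = X @ Wk  # "keys": what each word offers
    V = X @ Wv  # "values": the content each word carries
    d_k = Q.shape[-1]
    # Every word scores every other word in one matrix multiply,
    # which is how a Transformer looks at the whole passage at once.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each row sums to 1: how much to attend
    return weights @ V                  # each output mixes all words' content

# Toy example: a 4-word "sentence" with 8-dimensional word vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated vector per word
```

In a real model, many such attention operations run in parallel (“heads”) inside each layer, and dozens of layers are stacked on top of one another, which is where the progressively more abstract patterns described above come from.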