Transformer
A transformer is a neural network architecture that uses self-attention mechanisms to process sequential data, forming the basis of virtually all modern large language models.
The transformer architecture, introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al., revolutionized natural language processing by replacing recurrent neural networks with a mechanism called self-attention. This allows the model to look at all parts of an input simultaneously rather than processing it sequentially, making training far more parallelizable and enabling models to capture long-range dependencies in text.
A transformer consists of an encoder that processes input text and a decoder that generates output text, though many modern LLMs use decoder-only architectures. The key innovation is the attention mechanism, which lets the model weigh the importance of different words in relation to each other. For example, in the sentence "The cat sat on the mat because it was tired," attention helps the model understand that "it" refers to "the cat."
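To make the mechanism concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core computation introduced in the original paper. For brevity it passes the same vectors as queries, keys, and values, skipping the learned projection matrices and multi-head structure a real transformer layer would include:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attend over all positions at once.

    Q, K, V: (seq_len, d_k) query, key, and value matrices. In a real
    transformer these come from learned linear projections of the
    token embeddings; here they are passed in directly.
    """
    d_k = Q.shape[-1]
    # Similarity of every query to every key, scaled by sqrt(d_k)
    # to keep the softmax in a well-behaved range.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over keys: each row becomes a weighting of all positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted average of the value vectors.
    return weights @ V

# Toy example: 4 tokens with 8-dimensional representations.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q, K, V share a source
print(out.shape)  # (4, 8)
```

Because every output row is computed from all input positions in a single matrix product, the whole sequence is processed in parallel, which is what makes transformer training so much faster than recurrent alternatives.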
Transformers have become the dominant architecture not just for language but also for computer vision (Vision Transformers), audio processing, and even protein structure prediction. Their ability to scale efficiently with more data and compute is what enabled the current generation of powerful AI systems.
Real-World Examples
- BERT, an encoder-only transformer used for text classification and search
- GPT series, decoder-only transformers used for text generation (see the masking sketch after this list)
- T5, an encoder-decoder transformer used for translation and summarization
- Vision Transformer (ViT), applying the architecture to image recognition
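The encoder/decoder distinction in the list above comes down largely to masking. A decoder-only model such as GPT adds a causal mask so each token can attend only to earlier positions, which is what makes left-to-right text generation possible. A minimal NumPy sketch (the zero scores are placeholders; a real model would compute them from queries and keys as above):

```python
import numpy as np

def causal_mask(seq_len):
    # -inf strictly above the diagonal: position i may attend
    # only to positions <= i.
    return np.triu(np.full((seq_len, seq_len), -np.inf), k=1)

# Toy attention scores for a 4-token sequence (all zeros for illustration).
scores = np.zeros((4, 4)) + causal_mask(4)
# Softmax turns -inf into zero weight, so each token ignores its future.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(weights.round(2))
# [[1.   0.   0.   0.  ]
#  [0.5  0.5  0.   0.  ]
#  [0.33 0.33 0.33 0.  ]
#  [0.25 0.25 0.25 0.25]]
```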