ai:Transformers

Artificial Intelligence

Transformers

What are Transformers?

Transformers are a type of machine learning model best known for their use in natural language processing (NLP). They enabled powerful models such as OpenAI's GPT and Google's BERT. The key innovation is the “attention mechanism,” which lets the model weight the most relevant parts of the input, making it effective at capturing context and relationships in data.
Snippet from Wikipedia: Transformer (deep learning architecture)

The transformer is a deep learning architecture that was developed by researchers at Google and is based on the multi-head attention mechanism, which was proposed in the 2017 paper "Attention Is All You Need". Text is converted to numerical representations called tokens, and each token is converted into a vector via lookup from a word embedding table. At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and less important tokens to be diminished.

Transformers have the advantage of having no recurrent units, therefore requiring less training time than earlier recurrent neural architectures (RNNs) such as long short-term memory (LSTM). Later variations have been widely adopted for training large language models (LLM) on large (language) datasets.

Transformers were first developed as an improvement over previous architectures for machine translation, but have found many applications since. They are used in large-scale natural language processing, computer vision (vision transformers), reinforcement learning, audio, multimodal learning, robotics, and even playing chess. The architecture has also led to the development of pre-trained systems, such as generative pre-trained transformers (GPTs) and BERT (bidirectional encoder representations from transformers).
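
The embedding lookup and attention step described in the snippet above can be illustrated with a small sketch. This is a simplified, single-head version of scaled dot-product attention in plain Python, not a full transformer layer: it omits the learned query/key/value projections, the multiple heads, and everything else in a real model. The embedding table `EMBEDDINGS`, the helper names `embed`, `softmax`, and `attention`, and the toy token IDs are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy embedding table: token ID -> vector (assumed values).
EMBEDDINGS = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}


def embed(token_ids):
    """Convert token IDs to vectors by table lookup."""
    return [EMBEDDINGS[t] for t in token_ids]


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, causal=False):
    """Single-head scaled dot-product attention over lists of vectors.

    With causal=True, position i only attends to positions 0..i,
    which is one common way of masking tokens.
    """
    d_k = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        visible = list(range(i + 1)) if causal else list(range(len(keys)))
        # Similarity of this query to every visible key, scaled by sqrt(d_k).
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k)
            for j in visible
        ]
        weights = softmax(scores)
        # Output is the weighted average of the visible value vectors.
        out = [
            sum(w * values[j][c] for w, j in zip(weights, visible))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


x = embed([0, 1, 2])
contextualized = attention(x, x, x)            # every token sees every token
masked = attention(x, x, x, causal=True)       # each token sees only earlier ones
```

Each output vector is a weighted average of the value vectors, so tokens with higher similarity scores contribute more. Because each position's output depends only on the inputs and not on another position's output, the loop over positions could run in parallel, which is the property the snippet contrasts with recurrent architectures.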

External links:

  • Transformers - huggingface.co
    • We’re on a journey to advance and democratize artificial intelligence through open source and open science.

  • ai/Transformers.txt
  • Last modified: 2025/03/30 10:54
  • by Henrik Yllemo