Transformers

  • Embedding: Embeddings are low-dimensional, dense, semantics-aware representations of objects (a toy lookup is sketched below)
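
A minimal sketch of such a lookup in NumPy; the vocabulary and the 4-dimensional vectors here are invented purely for illustration:

```python
import numpy as np

# Toy embedding table: each word maps to a dense, low-dimensional vector.
# In a real model these vectors are learned; here they are made up.
embeddings = {
    "apple":  np.array([0.9, 0.1, 0.8, 0.2]),
    "orange": np.array([0.8, 0.2, 0.9, 0.1]),
    "phone":  np.array([0.1, 0.9, 0.2, 0.8]),
}

print(embeddings["apple"])  # a 4-dimensional dense representation of 'apple'
```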

  • Similarity between words (compared in the sketch after this list)
    • dot product
    • cosine similarity
    • Pearson correlation
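
A small sketch of the three measures on the toy vectors from above (numbers invented for illustration); Pearson correlation is just cosine similarity after mean-centering:

```python
import numpy as np

def dot_product(a, b):
    # Raw dot product: large when the vectors point the same way and are long.
    return np.dot(a, b)

def cosine_similarity(a, b):
    # Dot product normalized by the vector lengths; ranges from -1 to 1.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def pearson_correlation(a, b):
    # Cosine similarity of the mean-centered vectors.
    return cosine_similarity(a - a.mean(), b - b.mean())

apple  = np.array([0.9, 0.1, 0.8, 0.2])
orange = np.array([0.8, 0.2, 0.9, 0.1])
phone  = np.array([0.1, 0.9, 0.2, 0.8])

print(cosine_similarity(apple, orange))  # high: similar toy vectors
print(cosine_similarity(apple, phone))   # lower: dissimilar toy vectors
```
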
  • Context
    • please buy an apple and an orange
      • move ‘apple’ towards ‘orange’ in the embedding vector space
    • Apple unveiled the new phone
      • move ‘apple’ towards ‘phone’ in the embedding vector space
  • Attention Mechanism
    • Use the similarity matrix to move the words around in the embedding space (see the sketch below)
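
A toy sketch of this idea, reusing the invented embeddings from above: the similarity matrix is turned into row-wise weights, and each word is replaced by a weighted average of all words, which pulls ‘apple’ towards ‘orange’ in the fruit sentence:

```python
import numpy as np

# Toy embeddings for the content words of "please buy an apple and an orange".
words = ["buy", "apple", "orange"]
X = np.array([
    [0.2, 0.7, 0.1, 0.3],   # buy
    [0.9, 0.1, 0.8, 0.2],   # apple
    [0.8, 0.2, 0.9, 0.1],   # orange
])

# Similarity matrix: every word against every other word.
scores = X @ X.T

# Softmax each row so the similarities become weights that sum to 1.
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# Each word's new embedding is a weighted average of all embeddings:
# 'apple' moves towards 'orange' because their similarity is high.
X_new = weights @ X
print(X_new[words.index("apple")])
```
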
  • Key and Query matrices (as linear transformations) => give us the Left Embeddings
    • orange embedding vector * key matrix
    • query matrix.T * phone embedding vector.T
  • The Key and Query matrices transform the embeddings into a space where it is convenient to calculate the similarities (sketched below)
  • The Left Embeddings capture the features of the word
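
A sketch of those two projections; random matrices stand in for the learned Key and Query matrices, and the dimensions are chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_k = 4, 3

# Random stand-ins for the learned Key and Query matrices.
W_K = rng.normal(size=(d_model, d_k))
W_Q = rng.normal(size=(d_model, d_k))

orange = np.array([0.8, 0.2, 0.9, 0.1])
phone  = np.array([0.1, 0.9, 0.2, 0.8])

# Project both embeddings into the space where similarity is measured.
key   = orange @ W_K   # orange embedding vector * key matrix
query = phone @ W_Q    # equivalent to query matrix.T * phone embedding.T

# Scaled dot product between the query and the key.
score = (query @ key) / np.sqrt(d_k)
print(score)
```
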

  • Value matrices (as linear transformations) => give us the Right Embeddings
    • The embeddings on the right are optimized to find the next word in the sentence
    • Left Embeddings x Value Matrix => Right Embeddings
  • The Right Embeddings know when two words could appear in the same context (the full attention step is sketched below)
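
Putting the pieces together, a sketch of one full scaled dot-product self-attention step, with random stand-ins for the learned matrices and the toy embeddings from before:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_k, d_v = 4, 3, 4

# Random stand-ins for the learned Query, Key, and Value matrices.
W_Q = rng.normal(size=(d_model, d_k))
W_K = rng.normal(size=(d_model, d_k))
W_V = rng.normal(size=(d_model, d_v))

X = np.array([              # toy embeddings, invented for illustration
    [0.2, 0.7, 0.1, 0.3],   # buy
    [0.9, 0.1, 0.8, 0.2],   # apple
    [0.8, 0.2, 0.9, 0.1],   # orange
])

Q, K, V = X @ W_Q, X @ W_K, X @ W_V

# Scaled dot-product self-attention.
scores  = Q @ K.T / np.sqrt(d_k)
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
output  = weights @ V       # the "right" embeddings

print(output.shape)         # (3, 4): one contextualized vector per word
```
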

Summary

  • Self-attention
  • Multi-head attention (as used in Transformers; sketched below)
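
A compact sketch of both, assuming random weights in place of learned ones: several self-attention heads run in parallel, their outputs are concatenated, and a final projection (called W_O here) mixes them:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_Q, W_K, W_V):
    # One head of scaled dot-product self-attention.
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return weights @ V

def multi_head_attention(X, heads, W_O):
    # Run the heads in parallel, concatenate, then project back to d_model.
    outputs = [self_attention(X, W_Q, W_K, W_V) for (W_Q, W_K, W_V) in heads]
    return np.concatenate(outputs, axis=-1) @ W_O

rng = np.random.default_rng(0)
d_model, d_head, n_heads, seq_len = 8, 4, 2, 3

X = rng.normal(size=(seq_len, d_model))       # toy input embeddings
heads = [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
         for _ in range(n_heads)]
W_O = rng.normal(size=(n_heads * d_head, d_model))

print(multi_head_attention(X, heads, W_O).shape)  # (3, 8)
```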
