Transformers

  • Embedding: Embeddings are low-dimensional, dense, semantics-aware representations of objects (a toy lookup is sketched below)
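
A minimal sketch of such a lookup in NumPy; the vocabulary and the 4-dimensional vectors here are invented purely for illustration:

```python
import numpy as np

# Toy embedding table: each word maps to a dense, low-dimensional vector.
# In a real model these vectors are learned; here they are made up.
embeddings = {
    "apple":  np.array([0.9, 0.1, 0.8, 0.2]),
    "orange": np.array([0.8, 0.2, 0.9, 0.1]),
    "phone":  np.array([0.1, 0.9, 0.2, 0.8]),
}

print(embeddings["apple"])  # a 4-dimensional dense representation of 'apple'
```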

  • Similarity between words (compared in the sketch after this list)
    • dot product
    • cosine similarity
    • Pearson correlation
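
A small sketch of the three measures on the toy vectors from above (numbers invented for illustration); Pearson correlation is just cosine similarity after mean-centering:

```python
import numpy as np

def dot_product(a, b):
    # Raw dot product: large when the vectors point the same way and are long.
    return np.dot(a, b)

def cosine_similarity(a, b):
    # Dot product normalized by the vector lengths; ranges from -1 to 1.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def pearson_correlation(a, b):
    # Cosine similarity of the mean-centered vectors.
    return cosine_similarity(a - a.mean(), b - b.mean())

apple  = np.array([0.9, 0.1, 0.8, 0.2])
orange = np.array([0.8, 0.2, 0.9, 0.1])
phone  = np.array([0.1, 0.9, 0.2, 0.8])

print(cosine_similarity(apple, orange))  # high: similar toy vectors
print(cosine_similarity(apple, phone))   # lower: dissimilar toy vectors
```
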
  • Context
    • please buy an apple and an orange
      • move ‘apple’ towards ‘orange’ in the embedding vector space
    • Apple unveiled the new phone
      • move ‘apple’ towards ‘phone’ in the embedding vector space
  • Attention Mechanism
    • Use the similarity matrix to move the words around in the embedding space (see the sketch below)
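
A toy sketch of this idea, reusing the invented embeddings from above: the similarity matrix is turned into row-wise weights, and each word is replaced by a weighted average of all words, which pulls ‘apple’ towards ‘orange’ in the fruit sentence:

```python
import numpy as np

# Toy embeddings for the content words of "please buy an apple and an orange".
words = ["buy", "apple", "orange"]
X = np.array([
    [0.2, 0.7, 0.1, 0.3],   # buy
    [0.9, 0.1, 0.8, 0.2],   # apple
    [0.8, 0.2, 0.9, 0.1],   # orange
])

# Similarity matrix: every word against every other word.
scores = X @ X.T

# Softmax each row so the similarities become weights that sum to 1.
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# Each word's new embedding is a weighted average of all embeddings:
# 'apple' moves towards 'orange' because their similarity is high.
X_new = weights @ X
print(X_new[words.index("apple")])
```
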
  • Key and Query matrices (as linear transformations) => give us the Left Embeddings
    • orange embedding vector * key matrix
    • query matrix.T * phone embedding vector.T
  • The Key and Query matrices transform the embeddings into a space where it is convenient to calculate the similarities (sketched below)
  • The Left Embeddings capture the features of the word
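
A sketch of those two projections; random matrices stand in for the learned Key and Query matrices, and the dimensions are chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_k = 4, 3

# Random stand-ins for the learned Key and Query matrices.
W_K = rng.normal(size=(d_model, d_k))
W_Q = rng.normal(size=(d_model, d_k))

orange = np.array([0.8, 0.2, 0.9, 0.1])
phone  = np.array([0.1, 0.9, 0.2, 0.8])

# Project both embeddings into the space where similarity is measured.
key   = orange @ W_K   # orange embedding vector * key matrix
query = phone @ W_Q    # equivalent to query matrix.T * phone embedding.T

# Scaled dot product between the query and the key.
score = (query @ key) / np.sqrt(d_k)
print(score)
```
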

  • Value matrices (as linear transformations) => give us the Right Embeddings
    • The embeddings on the right are optimized to find the next word in the sentence
    • Left Embeddings x Value Matrix => Right Embeddings
  • The Right Embeddings know when two words could appear in the same context (the full attention step is sketched below)
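
Putting the pieces together, a sketch of one full scaled dot-product self-attention step, with random stand-ins for the learned matrices and the toy embeddings from before:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_k, d_v = 4, 3, 4

# Random stand-ins for the learned Query, Key, and Value matrices.
W_Q = rng.normal(size=(d_model, d_k))
W_K = rng.normal(size=(d_model, d_k))
W_V = rng.normal(size=(d_model, d_v))

X = np.array([              # toy embeddings, invented for illustration
    [0.2, 0.7, 0.1, 0.3],   # buy
    [0.9, 0.1, 0.8, 0.2],   # apple
    [0.8, 0.2, 0.9, 0.1],   # orange
])

Q, K, V = X @ W_Q, X @ W_K, X @ W_V

# Scaled dot-product self-attention.
scores  = Q @ K.T / np.sqrt(d_k)
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
output  = weights @ V       # the "right" embeddings

print(output.shape)         # (3, 4): one contextualized vector per word
```
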

Summary

  • Self-attention
  • Multi-head attention (as used in Transformers; sketched below)
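
A compact sketch of both, assuming random weights in place of learned ones: several self-attention heads run in parallel, their outputs are concatenated, and a final projection (called W_O here) mixes them:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_Q, W_K, W_V):
    # One head of scaled dot-product self-attention.
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return weights @ V

def multi_head_attention(X, heads, W_O):
    # Run the heads in parallel, concatenate, then project back to d_model.
    outputs = [self_attention(X, W_Q, W_K, W_V) for (W_Q, W_K, W_V) in heads]
    return np.concatenate(outputs, axis=-1) @ W_O

rng = np.random.default_rng(0)
d_model, d_head, n_heads, seq_len = 8, 4, 2, 3

X = rng.normal(size=(seq_len, d_model))       # toy input embeddings
heads = [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
         for _ in range(n_heads)]
W_O = rng.normal(size=(n_heads * d_head, d_model))

print(multi_head_attention(X, heads, W_O).shape)  # (3, 8)
```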
