- Lesson 1 - Attention Mechanism with Pictures
- Lesson 2 - Attention Mechanism with Math
- Lesson 3 - Transformer Model
Transformers
- Embedding: embeddings are low-dimensional, dense, semantics-aware representations of objects
- Similarity between words
- dot product
- cosine similarity
- Pearson correlation
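A minimal sketch of the three similarity measures in NumPy; the `apple`/`orange` vectors are made up for illustration:

```python
import numpy as np

def dot_product(a, b):
    # unnormalized similarity: large when the vectors point the same way
    return np.dot(a, b)

def cosine_similarity(a, b):
    # dot product normalized by the vector lengths; ranges from -1 to 1
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def pearson_correlation(a, b):
    # cosine similarity of the mean-centered vectors
    return cosine_similarity(a - a.mean(), b - b.mean())

apple = np.array([1.0, 2.0, 0.5])    # made-up embeddings
orange = np.array([0.8, 1.9, 0.6])

print(dot_product(apple, orange))        # 4.9
print(cosine_similarity(apple, orange))  # ~0.996
print(pearson_correlation(apple, orange))
```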
- Context
- please buy an apple and an orange
- move ‘apple’ towards ‘orange’ in the embedding vector space
- Apple unveiled the new phone
- move ‘apple’ towards ‘phone’ in the embedding vector space
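A toy sketch of this idea, assuming made-up 2-d embeddings where the first axis is roughly “fruit-ness” and the second “tech-ness”: each context word pulls ‘apple’ towards itself, weighted by similarity.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# made-up 2-d embeddings; 'apple' starts ambiguous, halfway between regions
emb = {
    "apple":  np.array([0.5, 0.5]),
    "orange": np.array([0.9, 0.1]),  # fruit region
    "phone":  np.array([0.1, 0.9]),  # tech region
}

def contextualize(word, sentence):
    # weight every word in the sentence by its similarity to `word`,
    # then move `word` to the similarity-weighted average
    vecs = np.stack([emb[w] for w in sentence])
    weights = softmax(vecs @ emb[word])
    return weights @ vecs

print(contextualize("apple", ["apple", "orange"]))  # [0.7, 0.3]: pulled towards 'orange'
print(contextualize("apple", ["apple", "phone"]))   # [0.3, 0.7]: pulled towards 'phone'
```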
- Attention Mechanism
- Use the similarity matrix to move the words around in the embedding space
- Keys and Queries matrices (as linear transformations) => give us the Left Embeddings
- ‘orange’ embedding vector * keys matrix => key vector for ‘orange’
- queries matrix.T * ‘phone’ embedding vector.T => query vector for ‘phone’ (transposed)
- multiplying the two gives the similarity score between ‘orange’ and ‘phone’
- the Keys and Queries matrices transform the embeddings into a space where it’s convenient to calculate the similarities
- The Left Embeddings know the features of the word
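A sketch of the key/query step; the weight matrices here are random placeholders (in a real model they are learned):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_k = 4, 3                    # embedding and key/query sizes

X = rng.normal(size=(5, d_model))      # 5 token embeddings, one per row
W_K = rng.normal(size=(d_model, d_k))  # keys matrix
W_Q = rng.normal(size=(d_model, d_k))  # queries matrix

K = X @ W_K                            # key vector per token
Q = X @ W_Q                            # query vector per token

# similarity of every token with every other token, measured in the
# transformed (key/query) space rather than the raw embedding space
scores = Q @ K.T
print(scores.shape)                    # (5, 5) similarity matrix
```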
- Values matrix (as a linear transformation) => gives us the Right Embeddings
- the embeddings on the right are optimized to find the next word in the sentence
- Left Embeddings x Values matrix => Right Embeddings
- The Right Embeddings know when two words could appear in the same context
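Putting keys, queries, and values together, a minimal self-attention sketch; the weights are random placeholders, and the 1/sqrt(d_k) scaling follows the standard scaled dot-product formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_Q, W_K, W_V):
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # similarity matrix
    return weights @ V                         # right embeddings

rng = np.random.default_rng(0)
d_model, d_k = 4, 3
X = rng.normal(size=(5, d_model))              # 5 token embeddings
W_Q = rng.normal(size=(d_model, d_k))
W_K = rng.normal(size=(d_model, d_k))
W_V = rng.normal(size=(d_model, d_k))

out = self_attention(X, W_Q, W_K, W_V)
print(out.shape)                               # (5, 3)
```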
Summary
- Self-attention
- Multi-head attention (as used in Transformers)
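A minimal multi-head sketch built from the self-attention above: each head has its own Q/K/V matrices, and the per-head outputs are concatenated. (A full transformer also mixes the concatenation with a learned output matrix, omitted here.)

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, heads):
    # run self-attention once per head, then concatenate the outputs
    outputs = []
    for W_Q, W_K, W_V in heads:
        Q, K, V = X @ W_Q, X @ W_K, X @ W_V
        weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
        outputs.append(weights @ V)
    return np.concatenate(outputs, axis=-1)

rng = np.random.default_rng(0)
d_model, d_k, n_heads = 8, 4, 2
X = rng.normal(size=(5, d_model))
heads = [tuple(rng.normal(size=(d_model, d_k)) for _ in range(3))
         for _ in range(n_heads)]

out = multi_head_attention(X, heads)
print(out.shape)   # (5, n_heads * d_k) = (5, 8)
```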