Learned Embeddings

First and second principal components of the movie embeddings

We see that even without genre or age-restriction rating as features in the model, the embeddings have learned the following concepts (a rough interpretation of the latent features):

Chick Flicks vs. Mass Appeal on PC 1 (x-axis)

  • Chick Flicks - A Walk to Remember, The Prince & Me, Fifty Shades of Grey, 17 Again
  • Mass Appeal - Dark Knight, Skyfall, Ratatouille

Kids & Family vs. Restricted on PC 2 (y-axis)

  • Kids & Family - Shrek, Harry Potter, How to Train Your Dragon, Santa Clause 2, Up
  • Restricted - Fifty Shades of Grey, Texas Chainsaw Massacre, Seed of Chucky

[Figure: movies_embed_pca — movie embeddings projected onto the first two principal components]

# Principal Component Analysis to represent movie embeddings in 2-D
from sklearn.decomposition import PCA

X = model.m_emb.weight.data.cpu().numpy()  # embedding matrix, (n_movies, embed_dim)
pca = PCA(n_components=2)
result = pca.fit_transform(X)              # projected coordinates, (n_movies, 2)
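
Plotting result and labeling a few known titles is what makes the axes interpretable. Below is a minimal sketch, assuming a hypothetical movie_idx dict that maps a title to its row in the embedding matrix (not part of the original code):

import matplotlib.pyplot as plt

plt.scatter(result[:, 0], result[:, 1], s=2, alpha=0.3)
for title in ["Fifty Shades of Grey", "Skyfall", "Up"]:
  i = movie_idx[title]  # hypothetical title -> embedding-row lookup
  plt.annotate(title, (result[i, 0], result[i, 1]))
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()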

Network Architecture

[Figure: ann_movielens — neural network architecture]

# Make a neural network
import torch
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
  def __init__(self, n_users, n_items, embed_dim, n_hidden=1024):
    super().__init__()
    self.N = n_users    # number of users
    self.M = n_items    # number of movies
    self.D = embed_dim  # embedding dimension

    self.u_emb = nn.Embedding(self.N, self.D)
    self.m_emb = nn.Embedding(self.M, self.D)
    self.fc1 = nn.Linear(2 * self.D, n_hidden)
    self.fc2 = nn.Linear(n_hidden, 1)

    # shrink the initial embedding weights to N(0, 0.01^2),
    # since the default N(0, 1) leads to poor results
    nn.init.normal_(self.u_emb.weight, std=0.01)
    nn.init.normal_(self.m_emb.weight, std=0.01)

  def forward(self, u, m):
    u = self.u_emb(u)  # (num_samples, D)
    m = self.m_emb(m)  # (num_samples, D)

    # merge user and movie embeddings
    out = torch.cat((u, m), 1)  # (num_samples, 2D)

    # feed-forward network
    out = self.fc1(out)
    out = F.relu(out)
    out = self.fc2(out)
    return out
Instantiating it on MovieLens 100K (671 users, 9,066 movies, 10-dimensional embeddings) prints:

Model(
  (u_emb): Embedding(671, 10)
  (m_emb): Embedding(9066, 10)
  (fc1): Linear(in_features=20, out_features=1024, bias=True)
  (fc2): Linear(in_features=1024, out_features=1, bias=True)
)

Recommender Accuracy using Embeddings & ANN

On the MovieLens 100K ratings dataset, the ANN reaches a test RMSE of 0.8839, even lower than the SVD++ RMSE of 0.8928. The power of deep learning shows with large datasets, so as an additional test I ran the same model on the MovieLens 20M ratings dataset, where it reached an RMSE of 0.7941.
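
For reference, here is a minimal sketch of a training loop consistent with the log that follows (the test RMSE is just the square root of the test MSE loss, e.g. 0.8839² ≈ 0.7812). The Adam optimizer and the train_loader/test_loader names are assumptions, not taken from the original code:

import math
import torch
import torch.nn as nn

model = Model(n_users=671, n_items=9066, embed_dim=10)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())  # assumed; the post does not show the optimizer

for epoch in range(10):
  # training pass
  model.train()
  for u, m, r in train_loader:  # assumed loader yielding (user ids, movie ids, float ratings)
    optimizer.zero_grad()
    loss = criterion(model(u, m).squeeze(1), r)
    loss.backward()
    optimizer.step()

  # evaluation pass: accumulate squared error over the test set
  model.eval()
  sq_err, n = 0.0, 0
  with torch.no_grad():
    for u, m, r in test_loader:
      pred = model(u, m).squeeze(1)
      sq_err += ((pred - r) ** 2).sum().item()
      n += r.shape[0]
  print(f"Epoch {epoch + 1}/10, Test RMSE: {math.sqrt(sq_err / n):.4f}")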

Epoch 1/10, Train Loss: 1.0377, Test Loss: 0.8340, Test RMSE: 0.9132, Duration: 0:00:00.694932
Epoch 2/10, Train Loss: 0.8072, Test Loss: 0.7952, Test RMSE: 0.8917, Duration: 0:00:00.736841
Epoch 3/10, Train Loss: 0.7554, Test Loss: 0.7935, Test RMSE: 0.8908, Duration: 0:00:00.683794
Epoch 4/10, Train Loss: 0.7204, Test Loss: 0.7739, Test RMSE: 0.8797, Duration: 0:00:00.660207
Epoch 5/10, Train Loss: 0.6986, Test Loss: 0.7796, Test RMSE: 0.8830, Duration: 0:00:00.622400
Epoch 6/10, Train Loss: 0.6831, Test Loss: 0.7816, Test RMSE: 0.8841, Duration: 0:00:00.655083
Epoch 7/10, Train Loss: 0.6696, Test Loss: 0.7722, Test RMSE: 0.8787, Duration: 0:00:00.648430
Epoch 8/10, Train Loss: 0.6604, Test Loss: 0.7941, Test RMSE: 0.8911, Duration: 0:00:00.627179
Epoch 9/10, Train Loss: 0.6475, Test Loss: 0.7886, Test RMSE: 0.8880, Duration: 0:00:00.641276
Epoch 10/10, Train Loss: 0.6312, Test Loss: 0.7812, Test RMSE: 0.8839, Duration: 0:00:00.628338

[Figure: ann_loss — train and test loss per epoch]

Recommendations using Deep Learning

We recommend: 

Singin' in the Rain (1952)
Princess Bride, The (1987)
Henry V (1989)
Name of the Rose, The (Name der Rose, Der) (1986)
Lock, Stock & Two Smoking Barrels (1998)
Grand Illusion (La grande illusion) (1937)
Dog Day Afternoon (1975)
Lilo & Stitch (2002)
Band of Brothers (2001)
Midnight in Paris (2011)
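
A sketch of how such a top-10 list can be produced: score every movie for one user, mask out titles the user has already rated, and keep the ten highest predictions. Here user_id and seen (the user's already-rated movie indices) are assumptions, not names from the original code:

import torch

model.eval()
with torch.no_grad():
  u = torch.full((9066,), user_id, dtype=torch.long)  # the same user repeated per movie
  m = torch.arange(9066)                              # every movie index
  scores = model(u, m).squeeze(1)                     # predicted rating per movie

scores[list(seen)] = float('-inf')       # never re-recommend already-rated movies
top10 = torch.topk(scores, 10).indices   # map these back to titles for a list like the one above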
