Coursera NLP Module 1 Week 4 Notes
Machine Translation: An Overview
Transforming word vectors
Given a matrix X of English word embeddings, a transformation matrix R, and a matrix Y of the corresponding French word embeddings, we want the transformation to satisfy:
- \[XR \approx Y\]
- We initialize R randomly and, in a loop, execute the following steps (a full numpy sketch follows the gradient formula below):
- \[Loss = || XR - Y||_F\]
- \[g = \frac{d}{dR} Loss\]
- \[R = R - \alpha g\]
The Frobenius norm squares every element of the matrix, sums them all up, and takes the square root:
- \[||A||_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2}\]
To simplify the gradient computation, we can work with the squared norm instead:
- \[||A||^2_F = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2\]
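A quick numpy check of both expressions on a small example matrix (the values here are arbitrary):

```python
import numpy as np

A = np.array([[2.0, 2.0],
              [2.0, 2.0]])

frob = np.sqrt(np.sum(np.square(A)))  # square, sum, square root -> 4.0
frob_sq = np.sum(np.square(A))        # squared norm, no root -> 16.0

print(frob, np.linalg.norm(A, "fro"))  # matches numpy's built-in Frobenius norm
```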
Gradient, taking the loss as the squared norm averaged over the m word-vector pairs (this is where the 2/m factor comes from):
- \[Loss = \frac{1}{m} ||XR - Y||^2_F\]
- \[g = \frac{d}{dR} Loss = \frac{2}{m} X^T (XR-Y)\]
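A minimal numpy sketch putting the loss, gradient, and update together. The function name `align_embeddings` and the random stand-in data are illustrative, not from the course:

```python
import numpy as np

def align_embeddings(X, Y, steps=1000, alpha=0.0003):
    """Learn R so that X @ R approximates Y by gradient descent."""
    m = X.shape[0]
    rng = np.random.default_rng(0)
    R = rng.standard_normal((X.shape[1], Y.shape[1]))  # random init
    for _ in range(steps):
        diff = X @ R - Y                # XR - Y
        g = (2 / m) * (X.T @ diff)      # g = (2/m) X^T (XR - Y)
        R -= alpha * g                  # R = R - alpha * g
    return R

# Toy usage with random stand-ins for the English/French embedding matrices.
rng = np.random.default_rng(1)
X = rng.standard_normal((10, 5))
Y = rng.standard_normal((10, 5))
R = align_embeddings(X, Y)
print(np.linalg.norm(X @ R - Y, "fro"))  # Frobenius loss after training
```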
Hash tables and hash functions
A basic hash function may ignore properties of the items being hashed, such as how close they are to each other in vector space. To ensure that similar items land in the same bucket, we use locality-sensitive hashing.
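A tiny illustration of the problem, assuming a simple modulo hash over 1-D values (illustrative, not from the course): neighboring values can land in different buckets.

```python
def basic_hash(value, n_buckets=10):
    """Simple hash that ignores locality: just value mod n_buckets."""
    return int(value) % n_buckets

# 9 and 10 are neighbors in value, yet they land in different buckets.
print(basic_hash(9), basic_hash(10))  # 9 0
```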
Locality sensitive hashing
With multiple planes, we can combine a vector's side of each plane into a binary encoding that serves as its hash value. For each plane \(P_i\), set \(h_i = 1\) if the dot product of the vector with \(P_i\)'s normal vector is non-negative, and \(h_i = 0\) otherwise; the hash is then
- \[hash = \sum_{i=0}^{n-1} 2^i h_i\]
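A minimal sketch of this multi-plane hash, assuming each plane is represented by a normal-vector row in `planes` (the random planes and the `hash_vector` name are illustrative):

```python
import numpy as np

def hash_vector(v, planes):
    """Hash v by which side of each plane it falls on.

    planes: (n_planes, dim) matrix of plane normal vectors.
    Returns an integer bucket in [0, 2**n_planes).
    """
    # h_i = 1 if the dot product with plane i's normal is non-negative, else 0
    signs = (planes @ v >= 0).astype(int)
    # binary encoding: hash = sum_i 2^i * h_i
    return int(sum((2 ** i) * h for i, h in enumerate(signs)))

rng = np.random.default_rng(0)
planes = rng.standard_normal((3, 2))   # 3 random planes in 2-D
print(hash_vector(np.array([1.0, 0.5]), planes))  # bucket in [0, 8)
```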