## Vector Space Models

Can be used to identify similarity, question answering, paraphrasing and summarization.

You shall know a word by the company it keeps.

## Co-occurence Matrix

Word by Word - Number of times they occur together within a certain distance k.

Word by Document - Number of times a word occur within a certain category.

## Euclidean Distance

The distance between them is the straight line that connects them which is given by $d(B, A) = \sqrt{(B_1 - A_1)^2 + (B_2 - A_2)^2}$ in a more general way can be expressed as $d(\overrightarrow{v}, \overrightarrow{w}) = \sqrt{\sum_{i=1}^{n}{(v_i - w_i)^2}}$ for a n dimensional vector space.

In python it can be given as

d = np.linalg.norm(v-w)


## Cosine Similarity: Intuition

The cosine distance measures the inner distance between two vectors.

## Cosine Similarity

Vector norm:

• $||\overrightarrow{v}|| = \sqrt{\sum_{i=1}^{n}v_i^2}$

Dot Product:

• $\overrightarrow{v} . \overrightarrow{w} = \sum_{i=1}^{n}v_i.w_i$

Cossine Similarity:

TODO:

• Extrair os frames com ffmpeg
• Escrever script para isso
• Perguntar sobre metas
• Perguntar sobre OKRs