### What is an Embedding

The Google ML Crash Course module on Embeddings says:

An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors. Embeddings make it easier to do machine learning on large inputs like sparse vectors representing words. Ideally, an embedding captures some of the semantics of the input by placing semantically similar inputs close together in the embedding space. An embedding can be learned and reused across models.

Will Koehrsen (Data Scientist at Cortex Intel) describes embeddings in his Medium article:

An embedding is a mapping of a discrete - categorical - variable to a vector of continuous numbers. In the context of neural networks, embeddings are low-dimensional, learned continuous vector representations of discrete variables. Neural network embeddings are useful because they can reduce the dimensionality of categorical variables and meaningfully represent categories in the transformed space.

In the book *Machine Learning Design Patterns*, Valliappa Lakshmanan, Sara Robinson & Michael Munn describe embeddings in Design Pattern 2: Embeddings:

Embeddings are a learnable data representation that map high-cardinality data into a lower-dimensional space in such a way that the information relevant to the learning problem is preserved. Embeddings are at the heart of modern-day machine learning and have various incarnations through the field.

In brief, embeddings are *"a learnable continuous vector representation, used to reduce the dimensionality and expected to capture the semantic relationships of the input data"*.

### Why Embeddings

The main reason embeddings are so widely used is to **reduce the dimensionality** of the input data in a **meaningful** way.

Typically we start by encoding the categorical input with one-hot encoding, which maps each input string to a vector of 1s and 0s.

Using one-hot encoding can cause two issues:

- **Sparse matrix** - high dimensionality with a lot of zero values.
- **Independence** - each variable is treated as independent of the others.

#### Sparse Matrix

For example, suppose we have the following input:

```
input = [
    ['Dog'],
    ['Cat'],
    ['Cat'],
    ['Dog']
]
```

When we encode it with one-hot encoding, the result will be:

```
Dog = [1, 0]
Cat = [0, 1]

# The encoded input will be
encoded_input = [
    [1, 0],
    [0, 1],
    [0, 1],
    [1, 0]
]
```

As you can see, 2 dimensions is perfectly manageable, but if we have many more categories, it can become a problem.

Imagine a recommendation system where users interact with many products. Let's say we have 5k products; the one-hot encoding of each product then contains 5k dimensions. If we need to consider the last 5 interactions for each user, the input will be 25k dimensions!

```
inputs = {
    "user_id": "1",
    "product_id_1": [0, 0, ...., 0],  # (length = 5,000, only one value is 1)
    "product_id_2": [1, 0, ...., 0],  # (length = 5,000, only one value is 1)
    "product_id_3": [0, 0, ...., 1],  # (length = 5,000, only one value is 1)
    "product_id_4": [0, 0, ...., 0],  # (length = 5,000, only one value is 1)
    "product_id_5": [0, 0, ...., 0],  # (length = 5,000, only one value is 1)
}
```
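For contrast, here is a minimal sketch (in NumPy, with made-up product indices and random values standing in for learned embeddings) of how an embedding lookup shrinks the same five interactions: with a hypothetical 8-dimensional embedding per product, the input drops from 25,000 values to 40.

```
import numpy as np

# Hypothetical embedding table: one 8-dimensional vector per product.
# In practice these values are learned; random numbers stand in here.
embedding_table = np.random.rand(5000, 8)

# Indices of the user's last 5 product interactions (made-up example).
last_5_products = [17, 42, 256, 1024, 4999]

# Look up and concatenate the 5 embeddings: 5 * 8 = 40 inputs instead of 25,000.
user_input = embedding_table[last_5_products].flatten()
print(user_input.shape)  # (40,)
```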

#### Independence

One-hot encoding also treats every category as independent: each vector is equally distant from every other, so the representation cannot express that, say, 'Cat' is more similar to 'Dog' than to 'Car'. This post covers the idea in depth:

- http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
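A small sketch of the difference, with made-up embedding values just for illustration:

```
import numpy as np

# One-hot vectors: every pair of categories is orthogonal,
# so all pairs look equally unrelated.
dog, cat, car = np.eye(3)
print(np.dot(dog, cat), np.dot(dog, car))  # 0.0 0.0

# Learned embeddings (hypothetical values): similar categories can end up close together.
emb = {
    "dog": np.array([0.9, 0.1]),
    "cat": np.array([0.8, 0.2]),
    "car": np.array([-0.7, 0.6]),
}

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(emb["dog"], emb["cat"]))  # high similarity
print(cosine(emb["dog"], emb["car"]))  # low similarity
```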

### How to Create Embeddings

The embedding layer is just another hidden layer of the neural network. We can use the embedding layer provided by any neural network library (Keras, TensorFlow, TensorFlow Feature Columns, PyTorch, etc.).

**Here are the generic steps** (a minimal sketch in code follows the list):

- Create a lookup table (maps each string to an index)
- Transform those categories into a one-hot encoding (maps the index to a one-hot vector)
- Connect the one-hot encoding to an embedding layer (typically just a hidden layer with a lower-dimensional space than the one-hot layer)
- Connect it to a hidden layer to produce the output
- Finally, connect it to the output layer with a softmax activation function
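Below is a minimal sketch of these steps in tf.keras. The vocabulary, embedding dimension, and layer sizes are made up for illustration; note that the Keras `Embedding` layer performs the index-to-vector lookup directly, which is mathematically equivalent to multiplying a one-hot vector by a weight matrix.

```
import tensorflow as tf

vocab = ["Dog", "Cat", "Bird", "Fish"]                    # lookup table: string -> index
lookup = tf.keras.layers.StringLookup(vocabulary=vocab)

inputs = tf.keras.Input(shape=(1,), dtype=tf.string)
x = lookup(inputs)                                        # string -> integer index
x = tf.keras.layers.Embedding(                            # index -> dense 2-d vector
    input_dim=lookup.vocabulary_size(), output_dim=2)(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(8, activation="relu")(x)        # hidden layer
outputs = tf.keras.layers.Dense(                          # output layer with softmax
    len(vocab), activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.summary()
```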

For more detail, consider looking into articles that walk through these steps in depth.

#### To train generic embeddings

We can also leverage off-the-shelf packages to train general-purpose embeddings.
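For instance, here is a minimal sketch using Gensim's Word2Vec (assuming Gensim 4.x, where the embedding size parameter is `vector_size`; the toy corpus is made up):

```
from gensim.models import Word2Vec

# Toy corpus: each "sentence" is a list of tokens.
sentences = [
    ["dog", "barks", "at", "cat"],
    ["cat", "chases", "mouse"],
    ["dog", "chases", "cat"],
]

# Train general-purpose word embeddings (2 dimensions here, just for illustration).
model = Word2Vec(sentences, vector_size=2, window=2, min_count=1, epochs=50)

print(model.wv["dog"])               # the learned embedding vector for "dog"
print(model.wv.most_similar("dog"))  # nearest neighbours in the embedding space
```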

### Alternatives to Embedding

- Autoencoders

###### Other Notes:

- Choosing the embedding dimension (see the sketch at the end of these notes):
  - Use the fourth root of the total number of unique categorical elements
  - Use 1.6 times the square root of the number of unique elements in the category, and no less than 600

- The *Introducing TensorFlow Feature Columns* article says:

How do the values in the embeddings vectors magically get assigned? Actually, the assignments happen during training. That is, the model learns the best way to map your input numeric categorical values to the embeddings vector value in order to solve your problem. Embedding columns increase your model's capabilities, since an embeddings vector learns new relationships between categories from the training data.
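
As a minimal sketch that ties both notes together (assuming the `tf.feature_column` API the article introduces, which is deprecated in newer TensorFlow releases, and a made-up product vocabulary):

```
import tensorflow as tf

products = [f"product_{i}" for i in range(5000)]

# Rule of thumb from the notes above: embedding dimension ~ fourth root of the category count.
embedding_dim = round(len(products) ** 0.25)  # 5000 ** 0.25 ~= 8

product_col = tf.feature_column.categorical_column_with_vocabulary_list(
    key="product_id", vocabulary_list=products)

# The values inside this embedding vector are learned during training, as the quote explains.
product_embedding = tf.feature_column.embedding_column(product_col, dimension=embedding_dim)
```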