Summary
- TensorFlow Serving can poll for updated models automatically
- TensorFlow Serving can poll models from cloud storage (e.g., AWS S3, Google Cloud Storage)
- TensorFlow Serving can deploy multiple models, or multiple versions of a model
- TensorFlow Serving supports batching inference requests
Deploy a model with TensorFlow Serving
The command below serves a SavedModel from a local directory, exposing gRPC on port 8500 and the REST API on port 8501.
# MODEL_BASE_PATH defaults to /models, so the server watches
# /models/my_model, which is bind-mounted from /tmp/models on the host.
docker run -p 8500:8500 \
  -p 8501:8501 \
  --mount type=bind,source=/tmp/models,target=/models/my_model \
  -e MODEL_NAME=my_model \
  -t tensorflow/serving
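Once the container is up, we can verify that the model has loaded through the model-status endpoint of the REST API:

curl http://localhost:8501/v1/models/my_model

The response lists each loaded version and its state (e.g., AVAILABLE).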
1 Model Polling
TensorFlow Serving has a flag called --file_system_poll_wait_seconds
that controls how often (in seconds) the file system is polled for new model versions.
# Poll the model directory once per hour
docker run ... \
  --file_system_poll_wait_seconds=3600
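Polling only picks up new versions if the base path follows the versioned SavedModel layout: one numeric subdirectory per version, with the highest number served by default. A sketch of the layout on the host, with illustrative version numbers:

/tmp/models/            # mounted as /models/my_model in the container
├── 1556250435/
│   ├── saved_model.pb
│   └── variables/
└── 1556251435/
    ├── saved_model.pb
    └── variables/

Dropping a new numbered directory into the base path is enough; the server loads it on the next poll.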
2 Remote files
We can host the model files, along with other configuration files, in cloud storage (e.g., AWS S3 or Google Cloud Storage).
2.1 Amazon Web Services (S3)
docker run ... \
  -e MODEL_BASE_PATH=s3://... \
  -e AWS_ACCESS_KEY_ID=xxx \
  -e AWS_SECRET_ACCESS_KEY=xxx \
  -e AWS_REGION=xxx
2.2 Google Cloud Platform (Google Cloud Storage)
docker run ... \
  -e MODEL_BASE_PATH=gs://... \
  -e GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
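Note that GOOGLE_APPLICATION_CREDENTIALS must point to a path inside the container, so the service-account key file has to be mounted in as well. A minimal sketch, with hypothetical paths:

docker run ... \
  --mount type=bind,source=/path/to/credentials.json,target=/config/credentials.json \
  -e MODEL_BASE_PATH=gs://... \
  -e GOOGLE_APPLICATION_CREDENTIALS=/config/credentials.json \
  -t tensorflow/serving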
3 Model Configuration
There are several configurations we can play around with. First, create a configuration file (e.g., named model_config_list) and pass it when running TensorFlow Serving:
docker run ... \
  --model_config_file=gs://...../model_config_list \
  --model_config_file_poll_wait_seconds=3600
Within the model_config_list file we can specify the different configurations.
3.1 Deploy multiple models
model_config_list {
  config {
    name: 'my_first_model'
    base_path: 'gs://.../my_first_model'
    model_platform: 'tensorflow'
  }
  config {
    name: 'my_second_model'
    base_path: 'gs://.../my_second_model'
    model_platform: 'tensorflow'
  }
}
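Each model then gets its own endpoint, so clients choose a model by name. For example, a REST call against my_first_model (host, port, and payload are illustrative; the input format depends on the model's signature):

curl -X POST http://localhost:8501/v1/models/my_first_model:predict \
  -d '{"instances": [[1.0, 2.0]]}'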
3.2 Deploy multiple versions of a model
model_config_list {
  config {
    name: 'my_first_model'
    base_path: 'gs://.../my_first_model'
    model_platform: 'tensorflow'
    model_version_policy {
      specific {
        versions: 1556250435
        versions: 1556251435
      }
    }
  }
}
3.2.1 Version labels
model_config_list {
  config {
    name: 'my_first_model'
    base_path: 'gs://.../my_first_model'
    model_platform: 'tensorflow'
    model_version_policy {
      specific {
        versions: 1556250435
        versions: 1556251435
      }
    }
    version_labels {
      key: 'stable'
      value: 1556250435
    }
    version_labels {
      key: 'testing'
      value: 1556251435
    }
  }
}
Then the client can target a specific model version by specifying the model name together with either a version number or a version label.
REST Usage
With a version number:
v1/models/<model_name>/versions/<version_number>
With a version label:
v1/models/<model_name>/labels/<version_label>
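For example, as curl calls (host, port, and payload are illustrative):

curl -X POST \
  http://localhost:8501/v1/models/my_first_model/versions/1556250435:predict \
  -d '{"instances": [[1.0, 2.0]]}'

curl -X POST \
  http://localhost:8501/v1/models/my_first_model/labels/stable:predict \
  -d '{"instances": [[1.0, 2.0]]}'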
Read more in the TensorFlow Serving official documentation.
4 Batching inference requests
docker run ... \
  --enable_batching=true \
  --batching_parameters_file=gs://..../batching_parameters.txt
Within batching_parameters.txt we can write down the batching configuration:
# Maximum number of requests grouped into one batch
max_batch_size { value: 32 }
# How long (in microseconds) to wait for a batch to fill up
batch_timeout_micros { value: 5000 }
# Pad variable-length inputs so requests with different shapes can be batched
pad_variable_length_inputs: true
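Batching is transparent to clients: the server holds incoming requests for up to batch_timeout_micros, groups up to max_batch_size of them, and runs them through the model in one pass. A quick way to exercise it is to fire requests concurrently (endpoint and payload are illustrative):

for i in $(seq 1 32); do
  curl -s -X POST http://localhost:8501/v1/models/my_model:predict \
    -d '{"instances": [[1.0, 2.0]]}' &
done
wait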
#tensorflow #machine-learning