Summary
- TensorFlow Serving can poll for updated models automatically
- TensorFlow Serving can load models from cloud storage (e.g., AWS S3, Google Cloud Storage)
- TensorFlow Serving can deploy multiple models, or multiple versions of a model
- TensorFlow Serving supports batching of inference requests
Deploy a model with TensorFlow Serving
docker run -p 8500:8500 \
           -p 8501:8501 \
           --mount type=bind,source=/tmp/models,target=/models/my_model \
           -e MODEL_NAME=my_model \
           -e MODEL_BASE_PATH=/models \
           -t tensorflow/serving
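TensorFlow Serving expects the mounted directory to contain numbered version subdirectories, each holding a SavedModel. Below is a minimal sketch of the layout this command assumes (the version number 1 is just an example), plus a status check against the REST port mapped above once the container is up.

# Expected layout under /tmp/models (version "1" is an example):
# /tmp/models/
# └── 1/
#     ├── saved_model.pb
#     └── variables/
#         ├── variables.data-00000-of-00001
#         └── variables.index

# Check that the model is loaded and serving:
curl http://localhost:8501/v1/models/my_model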
1 Model Polling
TensorFlow Serving has an option, --file_system_poll_wait_seconds, which controls how often (in seconds) the server polls the model base path for new versions.
docker run ... \
           --file_system_poll_wait_seconds=3600
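For example, dropping a new numbered version directory into the mounted path should be picked up at the next poll. A sketch, assuming the same layout as above (the source path is a placeholder):

# Copy a new version next to the existing one; "2" is just the next version number.
cp -r /path/to/new_export /tmp/models/2

# Within file_system_poll_wait_seconds the server loads it (and, by default,
# unloads the previous version). Verify with the model status endpoint:
curl http://localhost:8501/v1/models/my_model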
2 Remote files
We can host the model files, along with other configuration files, in cloud storage (e.g., AWS S3 or Google Cloud Storage).
2.1 Amazon Web Services (S3)
docker run ... \
           -e MODEL_BASE_PATH=s3://.... \
           -e AWS_ACCESS_KEY_ID=xxx \
           -e AWS_SECRET_ACCESS_KEY=xxx \
           -e AWS_REGION=xxx
2.2 Google Cloud Platform (Google Cloud Storage)
docker run ... \
           -e MODEL_BASE_PATH=gs://... \
           -e GOOGLE_APPLICATION_CREDENTIALS=path/to/credentials.json
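Note that GOOGLE_APPLICATION_CREDENTIALS must point to a path inside the container, so the credentials file itself has to be mounted in. A minimal sketch, with a placeholder host path:

docker run ... \
           -v /path/on/host/credentials.json:/secrets/credentials.json \
           -e MODEL_BASE_PATH=gs://... \
           -e GOOGLE_APPLICATION_CREDENTIALS=/secrets/credentials.json \
           -t tensorflow/serving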
3 Model Configuration
There are several configurations we can play around with. Create a configuration file (e.g., named model_config_list) and pass it when running TensorFlow Serving:
docker run ... \
           --model_config_file=gs://...../model_config_list \
           --model_config_file_poll_wait_seconds=3600
Within the file named model_config_list we can specify the different configurations.
3.1 Deploy multiple models
model_config_list {
  config {
    name: 'my_first_model'
    base_path: 'gs://.../my_first_model'
    model_platform: 'tensorflow'
  }
  config {
    name: 'my_second_model'
    base_path: 'gs://.../my_second_model'
    model_platform: 'tensorflow'
  }
}
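With this configuration, each model is served under its own name in the REST API. For example (the request payload is a placeholder; the actual "instances" shape depends on each model's signature):

curl -d '{"instances": [[1.0, 2.0, 3.0]]}' \
     http://localhost:8501/v1/models/my_first_model:predict

curl -d '{"instances": [[1.0, 2.0, 3.0]]}' \
     http://localhost:8501/v1/models/my_second_model:predict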
3.2 Deploy multiple versions of a model
model_config_list {
  config {
    name: 'my_first_model'
    base_path: 'gs://.../my_first_model'
    model_platform: 'tensorflow'
    model_version_policy {
      specific {
        versions: 1556250435
        versions: 1556251435
      }
    }
  }
}
3.2.1 Version labels
model_config_list {
  config {
    name: 'my_first_model'
    base_path: 'gs://.../my_first_model'
    model_platform: 'tensorflow'
    model_version_policy {
      specific {
        versions: 1556250435
        versions: 1556251435
      }
    }
    version_labels {
      key: 'stable'
      value: 1556250435
    }
    version_labels {
      key: 'testing'
      value: 1556251435
    }
  }
}
Then the client can call a specific model version by specifying the model name and either the version number or the version label.
REST Usage
With version number
v1/models/<model_name>/versions/<version_number>
With version label
v1/models/<model_name>/labels/<version_label>
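For example, with the configuration above, the predict endpoint can be addressed either way (the payload below is a placeholder):

# By version number
curl -d '{"instances": [[1.0, 2.0, 3.0]]}' \
     http://localhost:8501/v1/models/my_first_model/versions/1556250435:predict

# By version label
curl -d '{"instances": [[1.0, 2.0, 3.0]]}' \
     http://localhost:8501/v1/models/my_first_model/labels/stable:predict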
Read more in the official TensorFlow Serving documentation.
4 Batching inference requests
docker run ... \
           --enable_batching=true \
           --batching_parameters_file=gs://..../batching_parameters.txt
Within batching_parameters.txt you can write down the batching configuration:
max_batch_size { value: 32 }
batch_timeout_micros { value: 5000 }
pad_variable_length_inputs: true
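With these parameters, the server groups requests that arrive within batch_timeout_micros of each other, up to max_batch_size per batch. A quick way to exercise this from a shell is to fire a few concurrent requests (model name and payload follow the earlier placeholder example):

for i in 1 2 3 4; do
  curl -s -d '{"instances": [[1.0, 2.0, 3.0]]}' \
       http://localhost:8501/v1/models/my_model:predict &
done
wait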
#tensorflow #machine-learning