TensorFlow Serving

2022/01/18


Summary

  • TensorFlow Serving can poll for updated models automatically
  • TensorFlow Serving can load models from cloud storage (e.g., AWS S3, Google Cloud Storage)
  • TensorFlow Serving can deploy multiple models, or multiple versions of a model
  • TensorFlow Serving supports batching inference requests

Deploy model with Tensorflow Serving

docker run -p 8500:8500 \
           -p 8501:8501 \
           --mount type=bind,source=/tmp/models,target=/models/my_model \
           -e MODEL_NAME=my_model \
           -e MODEL_BASE_PATH=/models \
           -t tensorflow/serving
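
Once the container is up (port 8500 serves gRPC, port 8501 serves the REST API), we can sanity-check the deployment against the REST status endpoint; a minimal check, assuming the port mapping above:

curl http://localhost:8501/v1/models/my_model

The response lists each loaded version together with its state (e.g., AVAILABLE).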

1 Model Polling

TensorFlow Serving has an option called --file_system_poll_wait_seconds, which controls how often (in seconds) the server polls the model base path for new versions.

docker run ... \
           --file_system_poll_wait_seconds=3600
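
Polling watches the numbered version sub-directories under the model base path: each numbered directory is a SavedModel (saved_model.pb plus variables/), and dropping in a new one is enough for the server to load it on the next poll. The layout looks roughly like this (paths are illustrative):

/tmp/models/
    0001/
        saved_model.pb
        variables/
    0002/
        saved_model.pb
        variables/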

2 Remote files

We can host the model files, along with other configuration files, in cloud storage (e.g., AWS S3, Google Cloud Storage).

2.1 Amazon Web Services (S3)

docker run ... \
           -e MODEL_BASE_PATH=s3://.... \
           -e AWS_ACCESS_KEY_ID=xxx \
           -e AWS_SECRET_ACCESS_KEY=xxx \
           -e AWS_REGION=xxx
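
Depending on the setup (a non-default endpoint, an S3-compatible store such as MinIO, or HTTP-only access), a few extra environment variables understood by TensorFlow's S3 filesystem may also be needed; the values below are placeholders, not recommendations:

docker run ... \
           -e S3_ENDPOINT=s3.us-west-2.amazonaws.com \
           -e S3_USE_HTTPS=1 \
           -e S3_VERIFY_SSL=1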

2.2 Google Cloud Platform (Google Cloud Storage)

docker run ... \
           -e MODEL_BASE_PATH=gs://... \
           -e GOOGLE_APPLICATION_CREDENTIALS=path/to/credentials.json
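
Note that GOOGLE_APPLICATION_CREDENTIALS must point at a path inside the container, so the service-account key file has to be mounted in as well. A sketch, with illustrative file names:

docker run ... \
           -v /local/path/credentials.json:/secrets/credentials.json:ro \
           -e GOOGLE_APPLICATION_CREDENTIALS=/secrets/credentials.json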

3 Model Configuration

There are several configurations we can play around with. Create a model configuration file (e.g., named model_config_list) and pass it when running TensorFlow Serving:

docker run ... \
           --model_config_file=gs://...../model_config_list \
           --model_config_file_poll_wait_seconds=3600

Within the file named model_config_list we can specify the different configurations.

3.1 Deploy multiple models

model_config_list {
  config {
    name: 'my_first_model'
    base_path: 'gs://.../my_first_model'
    model_platform: 'tensorflow'
  }
  config {
    name: 'my_second_model'
    base_path: 'gs://.../my_second_model'
    model_platform: 'tensorflow'
  }
}
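
Each model is then reachable under its own name. For example, prediction requests against the REST API (the input payload is only illustrative):

curl -X POST http://localhost:8501/v1/models/my_first_model:predict \
     -d '{"instances": [[1.0, 2.0, 3.0]]}'

curl -X POST http://localhost:8501/v1/models/my_second_model:predict \
     -d '{"instances": [[1.0, 2.0, 3.0]]}'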

3.2 Deploy multiple versions of a model

model_config_list {
  config {
    name: 'my_first_model'
    base_path: 'gs://.../my_first_model'
    model_version_policy {
      specific {
        versions: 1556250435
        versions: 1556251435
      }
    }
  }
}
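
specific is one of several version policies; when no policy is set, only the latest version is served. Two other policies worth knowing, shown as a sketch:

model_version_policy {
  all {}    # serve every version found under base_path
}

model_version_policy {
  latest {
    num_versions: 2    # serve only the two most recent versions
  }
}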

3.2.1 Version labels

model_config_list {
  config {
    name: 'my_first_model'
    base_path: 'gs://.../my_first_model'
    model_version_policy {
      specific {
        versions: 1556250435
        versions: 1556251435
      }
    }
    version_labels {
      key: 'stable'
      value: 1556250435
    }
    version_labels {
      key: 'testing'
      value: 1556251435
    }
  }
}

The client can then target a specific model version by specifying the model name together with either the version number or the version label.

REST Usage

With a version number

/v1/models/<model_name>/versions/<version_number>

With a version label

/v1/models/<model_name>/labels/<version_label>
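
For example, with curl (model name, version, and label values are illustrative):

curl -X POST http://localhost:8501/v1/models/my_first_model/versions/1556250435:predict \
     -d '{"instances": [[1.0, 2.0, 3.0]]}'

curl -X POST http://localhost:8501/v1/models/my_first_model/labels/stable:predict \
     -d '{"instances": [[1.0, 2.0, 3.0]]}'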

Read more in the official TensorFlow Serving documentation.

4 Batching inference requests

docker run ... \
           --enable_batching=true \
           --batching_parameters_file=gs://..../batching_parameters.txt

Within batching_parameters.txt, write down the batching configuration:

max_batch_size { value: 32 }
batch_timeout_micros { value: 5000 }
pad_variable_length_inputs: true
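
A couple of other parameters from the same batching configuration are commonly tuned alongside these (the values below are only examples):

num_batch_threads { value: 4 }
max_enqueued_batches { value: 100 }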

#tensorflow #machine-learning