The standard distributed training method for TensorFlow is the so-called Parameter Server (PS) model. In this model, worker processes perform the heavy compute work, while separate parameter server processes hold the shared model parameters and combine the workers' results.
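In TensorFlow, the roles in a PS cluster are typically described through the `TF_CONFIG` environment variable, which each process reads to learn whether it is a worker or a parameter server. The sketch below is a minimal, hypothetical example; the host addresses and cluster shape are placeholders, not values from this guide.

```python
import json
import os

# Hypothetical cluster: two workers do the compute, one parameter
# server holds the shared variables. Addresses are placeholders.
tf_config = {
    "cluster": {
        "worker": ["worker-0.example.com:2222", "worker-1.example.com:2222"],
        "ps": ["ps-0.example.com:2222"],
    },
    # Each process sets its own role; this one is the first worker.
    "task": {"type": "worker", "index": 0},
}

os.environ["TF_CONFIG"] = json.dumps(tf_config)

# A TensorFlow distribution strategy (for example,
# tf.distribute.experimental.ParameterServerStrategy) reads TF_CONFIG
# at startup to determine this process's role in the cluster.
print(os.environ["TF_CONFIG"])
```

When you submit a training job to a managed service, the service usually populates `TF_CONFIG` for each process automatically, so you rarely construct it by hand outside of local testing.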
For more details about granting roles to service accounts, see the IAM documentation.

Example: Training a sample MNIST model

This section shows you how to train a sample MNIST model using a TPU and runtime version 2.5. The example job uses the predefined BASIC_TPU scale tier for your machine configuration. Later sections of the guide show you how to set up a custom configuration.
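A job like the one described above is typically submitted with the `gcloud ai-platform jobs submit training` command. The sketch below shows roughly what that looks like; the job name, bucket, region, and trainer package layout are assumptions, not values from this guide.

```shell
# Hypothetical job name, bucket, region, and trainer package; replace
# them with your own values before running.
gcloud ai-platform jobs submit training mnist_tpu_example \
  --region us-central1 \
  --scale-tier BASIC_TPU \
  --runtime-version 2.5 \
  --module-name trainer.task \
  --package-path trainer/ \
  --staging-bucket gs://your-bucket
```

`--scale-tier BASIC_TPU` selects the predefined machine configuration with a TPU mentioned above, and `--runtime-version 2.5` pins the training runtime; the later sections on custom configuration replace the scale tier with explicit machine settings.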