---

GCP TPU reference links

paper link

original article link

dataset link


Workflow stages

  1. Question or problem definition.
  2. Acquire training and testing data.
  3. Wrangle, prepare, cleanse the data.
  4. Analyze, identify patterns, and explore the data.
  5. Model, predict and solve the problem.
  6. Visualize, report, and present the problem solving steps and final solution.
  7. Supply or submit the results.

Use the code below to load the packages that should be installed by default.

%tensorflow_version 2.x
import tensorflow as tf
print("Tensorflow version " + tf.__version__)

Locating the TPU


try:
  tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
  print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
except ValueError:
  raise BaseException('ERROR: Not connected to a TPU runtime; please see the previous cell in this notebook for instructions!')

tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)

# Simply swap out the distribution strategy and the model will run on the given device
strategy = tf.distribute.experimental.TPUStrategy(tpu)
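
# Sanity check: the number of TPU cores the strategy will use
# (a Colab TPU typically exposes 8 replicas)
print("Number of replicas:", strategy.num_replicas_in_sync)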

# After the TPU is initialized, you can use manual device placement to place
# the computation on a single TPU device.
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
with tf.device('/TPU:0'):
  c = tf.matmul(a, b)
print("c device: ", c.device)
print(c)
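
For reference, a is 2×3 and b is 3×2, so c is the 2×2 product [[22., 28.], [49., 64.]], and c.device should report a TPU device path (something like /job:worker/replica:0/task:0/device:TPU:0, though the exact string depends on the environment).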


# To replicate a computation so it runs on all TPU cores, pass it to the
# strategy.run API. In the example below, every core receives the same inputs
# (a, b) and performs the matmul independently; the output is the values from
# all the replicas.
# Assign the tensors to the cores
@tf.function
def matmul_fn(x, y):
  z = tf.matmul(x, y)
  return z

z = strategy.run(matmul_fn, args=(a, b))
print(z)
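
strategy.run returns a per-replica value (one result per core). If a single combined tensor is needed, it can be reduced with strategy.reduce; a minimal sketch (MEAN is chosen here only because every replica computes the identical matmul):

# Combine the per-replica results into one tensor; since all replicas computed
# the same matmul, MEAN reproduces the single-core result
z_mean = strategy.reduce(tf.distribute.ReduceOp.MEAN, z, axis=None)
print(z_mean)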

Putting the model on the TPU and training it


with strategy.scope():
  model = create_model()
  model.compile(optimizer='adam',
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['sparse_categorical_accuracy'])

batch_size = 200
steps_per_epoch = 60000 // batch_size

train_dataset = get_dataset(batch_size, is_training=True)
test_dataset = get_dataset(batch_size, is_training=False)

model.fit(train_dataset,
          epochs=5,
          steps_per_epoch=steps_per_epoch,
          validation_data=test_dataset)
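
The snippet above assumes two helper functions, create_model() and get_dataset(), that this post does not define. A minimal sketch, assuming MNIST (the 60000 // batch_size step count matches the MNIST training-set size):

def create_model():
  # A small Keras model for 28x28 grayscale images (illustrative only)
  return tf.keras.Sequential([
      tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
      tf.keras.layers.Dense(128, activation='relu'),
      tf.keras.layers.Dense(10)])  # logits, matching from_logits=True above

def get_dataset(batch_size, is_training=True):
  # Build a tf.data pipeline over MNIST; TPUs need fixed batch shapes,
  # hence drop_remainder=True
  (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
  x, y = (x_train, y_train) if is_training else (x_test, y_test)
  x = x[..., tf.newaxis].astype('float32') / 255.0
  ds = tf.data.Dataset.from_tensor_slices((x, y.astype('int64')))
  if is_training:
    ds = ds.shuffle(10000).repeat()  # repeat so steps_per_epoch can be honored
  return ds.batch(batch_size, drop_remainder=True)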
