Benchmarking the Titan V (Volta) GPU with TensorFlow

Titan V Deep Learning Benchmarks

Here are the benchmarks comparing the GTX 1080 Ti to the new Titan V (Volta Architecture).

Contributors:
Stephen Balaban s@lambdal.com, Chuan Li c@lambdal.com, Steven Clarkson sc@lambdal.com

System Software:
OS: Ubuntu 16.04
Framework(s): TensorFlow 1.4.1, CUDA 9, cuDNN 7

TensorFlow Benchmark Script

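MAX_GPU=4; BATCH_SIZE=32; NUM_BATCHES=500;
OUT_DIR=$HOME/bench_output; RUN_DATE=$(date +%Y%m%d)   # fix the date once so all logs of a run share it
mkdir -p "${OUT_DIR}"   # the log directory must exist before output is redirected into it
for MODEL in inception3 resnet101 googlenet alexnet vgg19 inception4
do
  # Count down from MAX_GPU GPUs to 1 GPU for each model
  for NUM_GPU in $(seq ${MAX_GPU} -1 1)
  do
    python tf_cnn_benchmarks.py --model ${MODEL} --batch_size ${BATCH_SIZE} --num_batches ${NUM_BATCHES} --num_gpus ${NUM_GPU} --data_name imagenet &> "${OUT_DIR}/inference_imagenet_train-${RUN_DATE}-${NUM_GPU}gpu-${NUM_BATCHES}batches-${BATCH_SIZE}bs-${MODEL}.log"
  done
done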

Display Results:

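MAX_GPU=4; BATCH_SIZE=32; NUM_BATCHES=500;
OUT_DIR=$HOME/bench_output
RUN_DATE=20171220   # YYYYMMDD stamp of the run you want to display
for MODEL in inception3 resnet101 googlenet alexnet vgg19 inception4
do
  echo ${MODEL}
  for i in $(seq ${MAX_GPU})
  do
    # Each log contains a line like "total images/sec: <value>"; print the value
    echo "${i} GPU(s): $(grep 'total images' "${OUT_DIR}/inference_imagenet_train-${RUN_DATE}-${i}gpu-${NUM_BATCHES}batches-${BATCH_SIZE}bs-${MODEL}.log" | awk '{ print $3 }')"
  done
done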

System Hardware:
Lambda Quad Deep Learning Workstation
CPU: 1x Intel Core i7-6850K
GPU: 4x Titan V, 4x GTX 1080 Ti

Inception v3 (Synthetic Data)

GPU            Throughput (images/sec)
GTX 1080        85
GTX 1080 Ti    136
Titan V        190

VGG19 (Synthetic Data)

GPU            Throughput (images/sec)
GTX 1080        63
GTX 1080 Ti    107
Titan V        147
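As a quick check, the relative gain of the Titan V over the 1080 Ti can be computed straight from the two tables above. This is a minimal bash/awk sketch; `speedup_pct` is a hypothetical helper name, not part of any tool.

```shell
#!/usr/bin/env bash
# Relative throughput gain of one GPU over another, in percent:
#   gain = (new / old - 1) * 100
speedup_pct() {
  awk -v new="$1" -v old="$2" 'BEGIN { printf "%.1f\n", (new / old - 1) * 100 }'
}

# Throughputs (images/sec) copied from the Inception v3 and VGG19 tables
echo "Inception v3: Titan V vs 1080 Ti: +$(speedup_pct 190 136)%"
echo "VGG19:        Titan V vs 1080 Ti: +$(speedup_pct 147 107)%"
```

It prints a gain of 39.7% for Inception v3 and 37.4% for VGG19, which is where the 37% to 40% range comes from.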

That works out to between 37% and 40% more throughput on the Titan V than on the 1080 Ti. We’ll publish a more thorough analysis on our blog later; it will cover more than just CNN models and will include runs on real ImageNet data rather than synthetic data. Here are some Titan V scaling charts that show how throughput increases from 1 to 4 GPUs.

Inception v3 Multi-GPU Scaling

Resnet101 Multi-GPU Scaling
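The numbers behind scaling charts like these can be pulled from the logs the benchmark script writes. Below is a sketch that reports, for each GPU count N, the speedup T(N)/T(1) and the scaling efficiency T(N)/(N·T(1)), where T(N) is the "total images/sec" value for N GPUs. It assumes the log naming used by the script above; `throughput`, `scaling_report`, `OUT_DIR` and `RUN_DATE` are hypothetical names to adjust for your own run.

```shell
#!/usr/bin/env bash
# Efficiency for N GPUs = T(N) / (N * T(1)); 1.00 means perfectly linear scaling.
# ASSUMPTION: logs follow the naming of the benchmark script above.
OUT_DIR=${OUT_DIR:-$HOME/bench_output}
RUN_DATE=${RUN_DATE:-20171220}
BATCH_SIZE=32; NUM_BATCHES=500

# Print the last "total images/sec" value found in one log (empty if none)
throughput() {
  local model=$1 ngpu=$2
  local log="${OUT_DIR}/inference_imagenet_train-${RUN_DATE}-${ngpu}gpu-${NUM_BATCHES}batches-${BATCH_SIZE}bs-${model}.log"
  awk '/total images/ { v = $3 } END { print v }' "${log}" 2>/dev/null
}

# Print one "GPUs speedup efficiency" row per GPU count, 1..max_gpu
scaling_report() {
  local model=$1 max_gpu=$2 base t n
  base=$(throughput "${model}" 1)
  if [ -z "${base}" ]; then
    echo "no 1-GPU result for ${model}" >&2
    return 1
  fi
  for n in $(seq "${max_gpu}"); do
    t=$(throughput "${model}" "${n}")
    if [ -z "${t}" ]; then echo "${n} missing"; continue; fi
    awk -v n="${n}" -v t="${t}" -v b="${base}" \
      'BEGIN { printf "%d %.2f %.2f\n", n, t / b, t / (n * b) }'
  done
}
```

For example, `scaling_report resnet101 4` prints four rows; an efficiency well below 1.00 at 4 GPUs points to communication or input-pipeline overhead rather than raw GPU speed.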

Once again, these are preliminary numbers; we just wanted to get the info out there!

Images / Sec / $

As suggested by @Gary, here’s a chart of images / second / $ spent on the GPU. As you can see, the 1080 Ti with 11 GB of memory is the clear winner.
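The metric itself is just throughput divided by GPU price. The chart’s exact prices aren’t given here, so this sketch assumes US launch list prices ($699 for the GTX 1080 Ti, $2,999 for the Titan V); street prices differ, so edit the two price variables for your own quotes. `per_dollar` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# ASSUMPTION: launch list prices in US dollars; replace with your own quotes.
PRICE_1080TI=699
PRICE_TITANV=2999

# Throughput (images/sec) divided by price, to 4 decimal places
per_dollar() {
  awk -v ips="$1" -v usd="$2" 'BEGIN { printf "%.4f\n", ips / usd }'
}

# Throughputs are taken from the Inception v3 and VGG19 tables above
printf '%-12s %-13s %s\n' "GPU" "Inception v3" "VGG19"
printf '%-12s %-13s %s\n' "GTX 1080 Ti" "$(per_dollar 136 "${PRICE_1080TI}")" "$(per_dollar 107 "${PRICE_1080TI}")"
printf '%-12s %-13s %s\n' "Titan V" "$(per_dollar 190 "${PRICE_TITANV}")" "$(per_dollar 147 "${PRICE_TITANV}")"
```

With these assumed prices the 1080 Ti comes out at about 0.195 and 0.153 images/sec per dollar versus about 0.063 and 0.049 for the Titan V, roughly a 3x advantage for the 1080 Ti on both models.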

In short: 37% to 40% more throughput on the Titan V than on the 1080 Ti, but at over 3x the price!


So the Titan V is not 5 times faster than the 1080 Ti?! Or did Nvidia just claim 5x over the Tesla P4?

Cost-wise, maybe it is better to stick with the 1080 Ti.

Bill


It depends. The Titan V is going to be much faster than the 1080 Ti for 64-bit (double-precision) math. In addition, for some people the Titan V’s 50% improvement will be worth it. For the average deep learning researcher on a budget, however, the 1080 Ti is still king on the FLOPS / $ battleground.

Also, there is a significant advantage with FP16 and the Volta Tensor Core architecture over Pascal.

From what I can tell, the cuDNN 7 library already uses the mixed-precision Tensor Cores, so this is the best we are likely to see out of Volta (Titan V or V100) with this version of cuDNN.
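One way to test the FP16 / Tensor Core question directly is to run the same model at FP32 and at FP16 and compare the "total images/sec" lines. This sketch assumes your tf_cnn_benchmarks.py checkout accepts a `--use_fp16` flag (check its `--help` output first) and that FP32 is its default; `precision_pair`, `DRY_RUN` and the `precision-*.log` names are hypothetical.

```shell
#!/usr/bin/env bash
# ASSUMPTION: tf_cnn_benchmarks.py supports --use_fp16 and defaults to FP32.
BATCH_SIZE=32; NUM_BATCHES=500
OUT_DIR=${OUT_DIR:-$HOME/bench_output}
DRY_RUN=${DRY_RUN:-0}   # 1 = print the commands instead of running them

# Run one model on a single GPU, first at FP32, then at FP16
precision_pair() {
  local model=$1 precision
  local -a extra cmd
  [ "${DRY_RUN}" = 1 ] || mkdir -p "${OUT_DIR}"
  for precision in fp32 fp16; do
    extra=()
    if [ "${precision}" = fp16 ]; then extra=(--use_fp16); fi
    cmd=(python tf_cnn_benchmarks.py --model "${model}" --batch_size "${BATCH_SIZE}"
         --num_batches "${NUM_BATCHES}" --num_gpus 1 --data_name imagenet "${extra[@]}")
    if [ "${DRY_RUN}" = 1 ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}" &> "${OUT_DIR}/precision-${model}-${precision}.log"
    fi
  done
}
```

Running `precision_pair resnet101` leaves two logs side by side; if the FP16 throughput is not clearly higher on the Titan V, that would support the view that these numbers are already close to what this software stack can get out of Volta.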

It would be interesting to put a $/image chart … shows the tradeoff for speed.

I added this chart to the original article. Thanks for the suggestion.