TensorFlow binary not compiled to support AVX2 FMA

I'm getting the messages below when running my Python code. Apparently, this TensorFlow binary was not compiled to use the AVX2 and FMA instructions. I also get the CUDA messages shown below. Currently my code only runs 20 seconds faster than it does on my HP.

2018-04-14 08:01:56.859478: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-04-14 08:01:56.960850: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-14 08:01:56.961120: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: GeForce GTX 1070 with Max-Q Design major: 6 minor: 1 memoryClockRate(GHz): 1.2655
pciBusID: 0000:01:00.0
totalMemory: 7.92GiB freeMemory: 206.62MiB
2018-04-14 08:01:56.961151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-14 08:01:57.170502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-14 08:01:57.170549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-04-14 08:01:57.170557: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-04-14 08:01:57.170725: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 152 MB memory) → physical GPU (device: 0, name: GeForce GTX 1070 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-04-14 08:01:57.171981: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 152.62M (160038912 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-14 08:01:57.172873: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 137.36M (144035072 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY


One thing I notice is that you are running out of GPU memory. You can see that from the CUDA_ERROR_OUT_OF_MEMORY errors that you're getting:

2018-04-14 08:01:57.171981: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 152.62M (160038912 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-14 08:01:57.172873: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 137.36M (144035072 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
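
Your log also reports how much memory was free when the device was found. Here is a rough sketch in plain Python that pulls those numbers out of the lines you posted; the helper names (`to_mib`, `summarize`) are just made up for illustration and are not part of TensorFlow:

```python
import re

# Lines copied from the log in the question.
LOG = """\
totalMemory: 7.92GiB freeMemory: 206.62MiB
2018-04-14 08:01:57.171981: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 152.62M (160038912 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-14 08:01:57.172873: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 137.36M (144035072 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
"""

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def to_mib(value, unit):
    """Convert a number with a MiB/GiB suffix to MiB."""
    return float(value) * UNIT_TO_MIB[unit]


def summarize(log):
    """Return (total MiB, free MiB, list of failed allocation sizes in MiB)."""
    mem = re.search(
        r"totalMemory: ([\d.]+)(MiB|GiB) freeMemory: ([\d.]+)(MiB|GiB)", log
    )
    total = to_mib(mem.group(1), mem.group(2))
    free = to_mib(mem.group(3), mem.group(4))
    # The byte counts in parentheses are exact, so use those rather than "152.62M".
    failed = [
        int(n) / 2**20
        for n in re.findall(r"failed to allocate \S+ \((\d+) bytes\)", log)
    ]
    return total, free, failed


total, free, failed = summarize(LOG)
print(f"free at startup: {free:.2f} MiB of {total:.2f} MiB ({100 * free / total:.1f}%)")
for size in failed:
    print(f"failed allocation: {size:.2f} MiB")
```

Only a small fraction of the 7.92GiB card was free at startup, which suggests that most of the memory was already in use before your job began allocating.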

Can you try to re-run your job with a smaller batch size?
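
In case it helps, this is roughly what I mean by a smaller batch size. It is a minimal sketch in plain Python, assuming your data can be sliced like a list; `batches` and `samples` are hypothetical names, not anything from your job or from TensorFlow. The memory needed per step generally grows with the number of examples in a batch, so halving the batch size is a quick first experiment:

```python
def batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


samples = list(range(10))  # placeholder data standing in for the real inputs

# Halve the batch size step by step; every example is still visited once,
# but fewer of them are held on the device at the same time.
for batch_size in (8, 4, 2):
    sizes = [len(batch) for batch in batches(samples, batch_size)]
    print(batch_size, sizes)
```

Smaller batches mean more steps per pass over the data, so each epoch may take longer even if the out-of-memory errors go away.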