Question: is there any way that your combination of BERT and Horovod can accommodate the fact that, for SQuAD, the processing of each question against the paragraphs of a document is completely independent of the processing of each other question?
I ask because I have seen no improvement in prediction performance when using the BERT Multi-GPU implementation with TensorFlow and Horovod.
Here are the details:
I’ve set up the BERT Multi-GPU implementation using TensorFlow and Horovod in hopes that it would both speed up run_squad prediction and make use of both GPUs on a host machine. After following the instructions, both GPUs do appear to be operating at or near capacity, and more than one CPU is being used (so nice to see multi-CPU processing, too).
What I also observe is that the artifacts created in the result/output directory and those created in worker 1’s subdirectory are identical - the eval.tf_record files, the nbest_prediction files, …
BTW, the elapsed processing time is actually slightly shorter without the Horovod adaptations.
So it appears that this very cool approach using Horovod does not help with prediction. Is that correct?
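To make the question concrete, here is a minimal sketch of the kind of split I have in mind, since each question can be processed independently. It is plain Python, not code from the BERT or Horovod repositories: `shard_examples` and `merge_predictions` are hypothetical helpers, the question ids are toy data, and in a real run the rank and size would presumably come from `hvd.rank()` and `hvd.size()`.

```python
def shard_examples(examples, rank, size):
    """Return the disjoint slice of `examples` for worker `rank` out of `size` workers."""
    if size <= 0 or not 0 <= rank < size:
        raise ValueError("rank must satisfy 0 <= rank < size")
    # Strided slicing: worker 0 gets items 0, size, 2*size, ...; worker 1 gets 1, size+1, ...
    return examples[rank::size]


def merge_predictions(per_worker_predictions):
    """Combine per-worker {question_id: answer} dicts into a single dict."""
    merged = {}
    for predictions in per_worker_predictions:
        merged.update(predictions)
    return merged


# Toy stand-ins for SQuAD question ids (assumption: real ids would come from the eval file).
question_ids = ["q0", "q1", "q2", "q3", "q4"]
size = 2  # assumption: two workers, one per GPU

# Each worker would only run prediction on its own shard.
shards = [shard_examples(question_ids, rank, size) for rank in range(size)]

# Placeholder "predictions" so the merge step can be shown end to end.
per_worker = [{qid: "answer for " + qid for qid in shard} for shard in shards]
all_predictions = merge_predictions(per_worker)
```

With a split like this, each worker would write predictions for only its own questions, and the per-worker outputs would be merged at the end instead of every worker producing the same complete set of files.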
You can see a related conversation in a Horovod issue.