Let’s say you’re running a PyTorch script that crashes like this:
$ python test-pytorch.py
Killed
$ echo $?
137
Exit code 137 means that the process was killed by signal 9, SIGKILL (137 - 128 = 9): shells report a process terminated by signal N as exit status 128 + N. On Linux, an unexpected SIGKILL usually comes from the kernel’s out-of-memory (OOM) killer, so the most likely cause is that you’re allocating too much memory or there’s a memory leak. I observed with htop that the process’s memory usage kept climbing toward the total system memory before it was killed.
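The arithmetic above, and a rough way to watch memory from inside the script, can be sketched in Python. This is a minimal sketch that assumes Linux or macOS; the helper names describe_exit_code and peak_rss_mib are hypothetical and not part of PyTorch or any library.

```python
import resource  # Unix-only standard library module
import signal
import sys


def describe_exit_code(code: int) -> str:
    """Explain a shell exit status using the 128 + N convention for signals."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not defined on this platform.
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {code}"


def peak_rss_mib() -> float:
    """Return this process's peak resident set size in MiB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


print(describe_exit_code(137))
print(f"peak memory so far: {peak_rss_mib():.1f} MiB")
```

Calling something like peak_rss_mib() once per training iteration and printing the result is one way to confirm that memory grows steadily before the kill, instead of watching htop by hand. Note that if you launch the script through Python’s subprocess module, a SIGKILL shows up as a return code of -9, not 137.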