Let’s say you’re running a pytorch script that seems to be crashing like this:
$ python test-pytorch.py
$ echo $?
137 means that the process was killed with a signal value of
137 - 128 = 9). That means your process was likely hogging system resources and was killed. The most likely situation is that you’re allocating too much memory / there’s a memory leak. I observed with htop that the memory allocation kept climbing up to the total system memory before the process was killed.