Lambda Stack has a PyTorch/CUDA version incompatibility?

I am experiencing a seemingly similar problem. I’ve had my Lambda Tensorbook for about a year, and a few times in that period it has suddenly stopped recognizing the GPU. This usually happens after the laptop goes to sleep (power cord removed and left to idle) and is then rebooted, but it has also happened after the machine was simply turned off for a couple of days and then booted normally. When this occurs, nvidia-smi usually fails to run at all (missing components), and sudo nvidia-settings reports assertion failures.

To resolve it, after some trial and error, I resort to updating the NVIDIA driver, usually by updating Lambda Stack with sudo apt-get update && sudo apt-get dist-upgrade. A reboot after that restores everything to working order.

I’m certainly in favor of keeping drivers and other packages updated, but I don’t understand why a working system would suddenly stop working when nothing has changed. Something in the GPU/CUDA configuration seems very fragile, especially when power is disrupted. How can I make my installation more robust?
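For reference, the symptom check and the workaround I described can be sketched as a small shell script. This is my own hypothetical helper, not anything shipped with Lambda Stack; it only detects the broken-driver state and prints the upgrade command rather than running it:

```shell
#!/bin/sh
# Hypothetical health check for the failure mode described above:
# nvidia-smi either missing or failing to run at all.
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    status="driver OK"
else
    status="driver broken"
    # The fix that has worked for me: upgrade Lambda Stack, then reboot.
    echo "Try: sudo apt-get update && sudo apt-get dist-upgrade, then reboot"
fi
echo "$status"
```

Running this after a suspend/reboot cycle would tell me immediately whether the driver has broken again, without waiting for a training job to fail.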
Thank you.