Lambda server keeps rebooting

Hey there,

My server has recently started rebooting over and over while I work from home. It sits in my office and has four 2080 Ti GPUs. It had been running smoothly for the past several months.

System info:
Linux thanos 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

A few days ago, the server started rebooting repeatedly for no known reason. We did not upgrade any important components.
/var/log/kern.log does not give any useful info.

Any ideas on how to locate the issue?


Maybe you have a power problem? If something else is drawing from the same circuit, the server could reboot whenever the combined load gets too high. A bad power supply could do that too. What are you seeing in dmesg?
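
One note on dmesg: it only shows messages from the current boot, so after a reboot the interesting lines are more likely to be in the saved kernel log. Here is a rough Python sketch for pulling out lines that might be worth a look. The keyword list is my own guess at common hardware-related messages and is not exhaustive, and the function names are made up for this example. Also keep in mind that an abrupt power cut often leaves nothing in the log at all, so an empty result does not rule out a power problem.

```python
import re

# Assumed keyword list: a guess at messages that often accompany hardware
# trouble. Extend or trim it for your own machine.
PATTERNS = [
    r"machine check",
    r"\bmce\b",
    r"hardware error",
    r"thermal",
    r"overheat",
    r"NVRM: Xid",
    r"out of memory",
    r"kernel panic",
]
_SUSPICIOUS = re.compile("|".join(PATTERNS), re.IGNORECASE)


def suspicious_lines(lines):
    """Return the lines that match any of the assumed keywords."""
    return [line.rstrip("\n") for line in lines if _SUSPICIOUS.search(line)]


def scan_file(path="/var/log/kern.log"):
    """Scan a kernel log file (default path is the usual Ubuntu location)."""
    with open(path, errors="replace") as fh:
        return suspicious_lines(fh)
```

Calling `scan_file()` with no arguments reads the current kernel log; passing the path of an older rotated log would cover earlier boots, assuming one exists on your system.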

Does it reboot when it’s running a training job? If it’s only when it’s under load I would look into your power situation at the office. If it’s randomly rebooting on the other hand, it could be a thermal issue with the CPU and that would need to be repaired (for free of course if it’s a Lambda.)
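
To tell "reboots under load" apart from "reboots at random", it can help to leave a small logger running and look at the last readings after the next reboot. Below is a minimal Python sketch that polls nvidia-smi for GPU temperature and power draw and appends them to a file. The file name, the 5 second interval, and the function names are all assumptions for illustration. It does not cover CPU temperature, which would need a separate tool.

```python
import os
import subprocess
import time

QUERY = "index,temperature.gpu,power.draw"


def _to_float(value):
    """Convert a field to float; nvidia-smi may print [N/A] for some fields."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse the CSV output of nvidia-smi (noheader, nounits) into dicts."""
    readings = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue  # skip anything that is not index, temp, power
        idx, temp, power = parts
        readings.append(
            {"index": int(idx), "temp_c": _to_float(temp), "power_w": _to_float(power)}
        )
    return readings


def sample_gpus():
    """Run nvidia-smi once and return one reading per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


def log_forever(path="gpu_log.txt", interval_s=5):
    """Append timestamped readings; fsync so they survive a sudden reboot."""
    with open(path, "a") as fh:
        while True:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            for r in sample_gpus():
                fh.write(f"{stamp} gpu{r['index']} {r['temp_c']}C {r['power_w']}W\n")
            fh.flush()
            os.fsync(fh.fileno())
            time.sleep(interval_s)
```

Start `log_forever()` before kicking off a training job. If the last lines before a reboot show high power draw on all four cards, that points toward the power side; if the reboot comes while the GPUs are idle, the load on the circuit is a less likely explanation.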

For a Lambda workstation, you can always email Lambda and they can help figure it out.