4x 1080 Ti - #3 always red-lighting shortly after restart

Quad running Ubuntu 18.04 LTS with 4x 1080 Ti.

Despite multiple board and power reseats, card #3 of 4 shows a red light on its right side shortly after system restart.

Presuming at this point that the board is bad, but maybe there are additional troubleshooting steps I can try?

I think my options are:

  1. solicit additional troubleshooting, try that, maybe successful. Life goes on.

  2. pull that card and run with 3x

  3. use this as an excuse to pull the 1080s and put RTX 2080 Ti in place. :wink:

I’ve got a working and productive PyTorch environment and am not sure I’m up for possible setup hell for (3).

Are there additional hardware troubleshooting steps I can try? A link or FAQ would get me going.

Thanks much - RH


Your list sounds like the right approach.

Wondering why you haven’t done steps 2 and/or 3.

PS - did it ever work? 4 cards may be pushing the power supply.
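
On the power-supply question, one rough check is to add up what the cards report drawing while a training job is running. Below is a minimal sketch, assuming `nvidia-smi` is on the PATH and reports the `power.draw` and `power.limit` query fields for these cards. The function names are made up for illustration, and the total covers GPU draw only, not the CPU, drives, or fans.

```python
import subprocess

def parse_power(text):
    """Parse 'draw, limit' lines (watts) as printed by nvidia-smi with
    --format=csv,noheader,nounits. Lines that do not parse are skipped."""
    readings = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2:
            continue
        try:
            readings.append((float(parts[0]), float(parts[1])))
        except ValueError:
            # e.g. "[N/A]" when a card does not report a value
            continue
    return readings

def total_draw(readings):
    """Sum the reported draw (watts) over all cards that gave a reading."""
    return sum(draw for draw, _limit in readings)

def read_power():
    """Query the driver once; call this while the GPUs are under load."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.draw,power.limit",
         "--format=csv,noheader,nounits"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    ).stdout
    return parse_power(out)
```

Comparing `total_draw(read_power())` against the power supply's rated wattage (from its label, not guessed) gives a rough idea of how much headroom is left for the rest of the system.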

Thanks - yes, it worked - it’s a Lambda-Quad that I bought here with 4x GPUs.

I did the pull-card, reseat cycle a couple of times and it eventually worked. It’s been running all four without issue for five days now. I’m not convinced it’s actually fixed, but if it goes again I’ll pull it. I did pull it once as part of troubleshooting and ran with three GPUs without issue.
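
For anyone wanting to keep jobs off a suspect card without opening the case, CUDA can be told to hide it. This is a rough sketch rather than a tested recipe for this machine: it assumes the suspect card still enumerates and is index 2 (the third of four, counting from zero), and the helper name is made up.

```python
import os

def visible_devices_excluding(total_gpus, suspect_index):
    """Build a CUDA_VISIBLE_DEVICES value listing every index except one."""
    return ",".join(str(i) for i in range(total_gpus) if i != suspect_index)

# Order devices by PCI bus ID so the indices line up with nvidia-smi.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# Assumption: 4 cards total, suspect card is index 2.
os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices_excluding(4, 2)

# These must be set before CUDA is initialized, so import torch afterwards:
#   import torch
#   print(torch.cuda.device_count())
```

This only hides the card from CUDA programs started with that environment. It does not power the card down, so it is no substitute for physically pulling it if the hardware is faulty.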

Sounds to me like you are on the hairy edge, but what do I know?

Fluctuations in external power from your utility (frequency and voltage) may be pushing you over the edge.

Maybe look into a UPS?

Good suggestion on the UPS - I bought a very good one just for this system (and network disks that hold the data) and have used it from the very beginning.

I’ve got a fair amount of electronics and use power regulators all over the place. I doubt it’s power. Probably the connector cable :slight_smile: - which would be ironic, after such care taken on the power and environment.

My next move would be a “slot swap” to isolate the issue to either the board or the slot and its connectors and cables.

That is, move the problem board to the first slot and the first-slot board to the problem slot.
Leave any cables with the slot.

See if the problem travels with the board or stays with the slot and its connectors and cables.
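
To keep track of which physical board is in which slot across the swap, it may help to record each card's UUID against its PCI bus ID before and after. A minimal sketch, assuming `nvidia-smi` is on the PATH and the cards enumerate at the time of the query; `parse_inventory`, `read_inventory`, and `classify` are hypothetical helper names, not part of any NVIDIA tool.

```python
import subprocess

def parse_inventory(text):
    """Map PCI bus ID -> GPU UUID from 'index, bus_id, uuid, name' lines."""
    slots = {}
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) >= 4:
            slots[parts[1]] = parts[2]
    return slots

def read_inventory():
    """Ask the driver which card (UUID) currently sits at which bus ID."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,pci.bus_id,uuid,name",
         "--format=csv,noheader"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    ).stdout
    return parse_inventory(out)

def classify(bad_before, bad_after):
    """Each argument is (bus_id, uuid) for the failing position, recorded
    before and after the swap. Returns 'board', 'slot', or 'inconclusive'."""
    same_slot = bad_before[0] == bad_after[0]
    same_board = bad_before[1] == bad_after[1]
    if same_board and not same_slot:
        return "board"   # the fault moved with the card
    if same_slot and not same_board:
        return "slot"    # the fault stayed with the slot and its cables
    return "inconclusive"
```

Run `read_inventory()` while all four cards are up, both before and after the swap, and save the results. A card that has already failed may not show up in the query, in which case the bus ID missing from the saved inventory identifies the failing position.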

I would also inspect the slots closely for slightly bent metal, chipped plastic, or small pieces of dirt or packing material jammed in.

Just thinking out loud.
