Thanks - yes, it worked - it’s a Lambda-Quad that I bought here with 4x GPUs.
I pulled the card and reseated it a couple of times, and it eventually worked. It’s been running all four without issue for five days now. I’m not convinced it’s actually fixed, but if it fails again I’ll pull the card. I already did that once as part of troubleshooting and ran with three GPUs without issue.
Good suggestion on the UPS - I bought a very good one just for this system (and network disks that hold the data) and have used it from the very beginning.
I’ve got a fair amount of electronics and use power regulators all over the place. I doubt it’s power. Probably the connector cable - which would be ironic, after such care taken on the power and environment.
My next move would be a “slot swap” to isolate the issue to either the board or the slot and its connectors and cables.
i.e. move the problem board to the first slot, and the board from the first slot to the problem slot.
Leave any cables with the slot.
See if the problem travels with the board or stays with the slot and its connectors and cables.
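
If it helps to keep track of which board is in which slot during the swap, here is a rough Python sketch. It assumes the failure shows up as a GPU dropping out of the nvidia-smi listing, and that the PCI bus ID stays tied to the physical slot on this board; neither is confirmed in this thread. The function names are made up for illustration.

```python
import subprocess

# nvidia-smi query: one CSV line per GPU with PCI bus ID, UUID and name.
QUERY = ["nvidia-smi", "--query-gpu=pci.bus_id,uuid,name", "--format=csv,noheader"]


def parse_inventory(text):
    """Turn nvidia-smi CSV output into {pci_bus_id (slot): gpu_uuid (board)}."""
    slots = {}
    for line in text.strip().splitlines():
        bus_id, uuid, _name = [field.strip() for field in line.split(",", 2)]
        slots[bus_id] = uuid
    return slots


def read_inventory():
    """Run nvidia-smi and return the current slot -> board map."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_inventory(result.stdout)


def missing_gpus(expected, current):
    """Return the slot -> board entries from `expected` that are absent now."""
    return {bus: uuid for bus, uuid in expected.items() if current.get(bus) != uuid}


def verdict(suspect_uuid, suspect_bus, dropped):
    """Classify a failure seen AFTER the swap.

    suspect_uuid: UUID of the board that failed before the swap.
    suspect_bus:  bus ID of the slot it was in before the swap.
    dropped:      output of missing_gpus() taken after the new failure.
    """
    if suspect_uuid in dropped.values():
        return "board"         # the problem travelled with the card
    if suspect_bus in dropped:
        return "slot"          # the problem stayed with the slot and its cables
    return "inconclusive"      # something else dropped, or nothing did
```

Rough usage: call read_inventory() right after the swap while all four GPUs are up and save the result. If a GPU drops again, call read_inventory() once more, pass both maps to missing_gpus(), and hand that to verdict() along with the UUID and original bus ID of the suspect board.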
I would also look closely at the slots for slightly bent metal, chipped plastic, or small pieces of dirt or packing material jammed in.