I’m just a user but i’m going to chime in.
This challenge lives at the heart of the concept of the lambda stack.
We would benefit from a deterministic list of instruction of how to build a system, say tensorbook from scratch to serve as the basis for tests and experiments we can recover from without endless trials and errors.
Path 1 - save the entire existing current working system in a way that can flatten and start over no matter what happens.
Path 2 - the instructions to start from bare metal and build a system with no preexsitng saves up to the point a lambda stack can be downloaded and installed
Path 3 - procedures for updating and upgrading (OS, python, etc) that are guaranteed to preserve the existing stack
Tools could include Containers, Virtual machines, environments, etc.
Just thinking out loud.