How to replace the SSD in LambdaQuad?

I am in a situation where the main SSD in my LambdaQuad (4 GPU workstation server) is in danger of dying.
I see the following error message once in a while:
EXT4-fs error device nvme0n1p2 ext4-find-entry:1455 inode: nnnnnn
System_Journal ID [nnn]: Failed to write entry…

It is giving me a warning.
After rebooting, smartctl says the device “PASSED”.
But I suspect it is better to replace it now than to wait for it to crash.
The OS is Ubuntu 20.04.

The SSD details:
Model Number: Samsung SSD 970 EVO 1TB
Serial Number: S46DNF0K504699M

How to replace it and avoid reinstalling the OS (Ubuntu 20.04)?
Thanks,
Karun

I meant that smartctl --all /dev/nvme0n1p2 reports:
SMART overall-health self-assessment test result: PASSED

We have a ticket on this. It would be helpful to have the output of:
smartctl -x /dev/nvme0

Also, it may be that you just need to run fsck. I am waiting on data to
determine how and why the data was corrupted (an inode is basically a pointer to a file).
Either way, it is best to back up your data soon.
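
As a rough sketch of the fsck step: a repair should only be run on an unmounted filesystem, and since nvme0n1p2 appears to be the root filesystem here, that would normally mean booting from a live USB first. The helper names is_mounted and check_fs are made up for illustration, and the device name is taken from the error message above.

```shell
# is_mounted DEVICE [MOUNTS_FILE]: succeed if DEVICE is listed as a mount source.
# MOUNTS_FILE defaults to /proc/mounts (Linux).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# check_fs DEVICE: refuse to repair a mounted filesystem, otherwise force a full check.
check_fs() {
    if is_mounted "$1"; then
        echo "$1 is mounted; boot from a live USB before running a repair" >&2
        return 1
    fi
    # -f forces a check even if the filesystem is marked clean.
    sudo e2fsck -f "$1"
}

# Example (from a live USB session, where the old root is not mounted):
#   check_fs /dev/nvme0n1p2
```

Running e2fsck -f -n instead would only report problems without changing anything, which may be a safer first pass on a failing drive.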

  • smartctl -x will show previous errors, temperatures over threshold, etc.
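
A small sketch of how the smartctl -x output could be saved for the ticket and skimmed for the fields that usually matter on an NVMe drive. The function name summarize_smart and the file name smart-report.txt are illustrative only; the field names are the ones smartctl normally prints for NVMe devices, but the exact set can vary by version.

```shell
# summarize_smart: read smartctl output on stdin and keep the key health lines.
summarize_smart() {
    grep -E '^(Critical Warning|Temperature|Available Spare|Percentage Used|Media and Data Integrity Errors|Error Information Log Entries):'
}

# Example: save the full report for the ticket and print a short summary.
#   sudo smartctl -x /dev/nvme0 | tee smart-report.txt | summarize_smart
```

A non-zero "Media and Data Integrity Errors" count or a high "Percentage Used" value would support replacing the drive even though the overall result says PASSED.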

For copying there are a few ways…

  1. The old ‘hack’ way:
    a. Copy the partition table
    dd if=/dev/olddev of=/dev/newdev
    where olddev/newdev are your actual device names.
    Press Control-C after a few seconds (this copies the partition table at the start of the disk).
    Or you can use fdisk, parted, or gparted to create the same partitions.
    b. mkfs/mount the partitions
    - mkfs the new partitions
    - mount the partitions
    - Then copy the files with cp -ax or find . | cpio -pdum directory
  2. Use one of the tools… I will test Clonezilla and GParted to do the moves.
    https://clonezilla.org/
    GParted -- GParted Manual
    * Again I will test these.
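
Option 1 could be sketched roughly as below. This is not a tested procedure, and it swaps the interrupted dd for sgdisk (from the gdisk package), because on a GPT disk the backup partition table sits at the end of the drive and a truncated dd would not copy it. The device names, the mount point, and the assumption that partition 1 is the EFI partition and partition 2 is the ext4 root are all guesses; check lsblk before running anything. By default the script only prints the commands.

```shell
# Hypothetical device names and mount point: adjust after checking `lsblk`.
OLD="${OLD:-/dev/nvme0n1}"
NEW="${NEW:-/dev/nvme1n1}"
MNT="${MNT:-/mnt/newroot}"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

clone_disk() {
    # a. Copy the partition table, then give the new disk its own GUIDs.
    run sudo sgdisk --replicate="$NEW" "$OLD"
    run sudo sgdisk --randomize-guids "$NEW"
    # b. Make the filesystems (assumes p1 = EFI, p2 = ext4 root).
    run sudo mkfs.vfat -F 32 "${NEW}p1"
    run sudo mkfs.ext4 "${NEW}p2"
    run sudo mkdir -p "$MNT"
    run sudo mount "${NEW}p2" "$MNT"
    # -a preserves ownership and permissions, -x stays on the root filesystem.
    run sudo cp -ax /. "$MNT"/
}

clone_disk
```

Even if this runs cleanly, the new disk would still need the EFI partition contents copied, the UUIDs in /etc/fstab updated, and GRUB reinstalled before it can boot; those steps are not shown here.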

Normally I just do a fresh install, and copy over the home or data directories.
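
For the fresh-install route, copying the home or data directories to another disk beforehand might look like the sketch below. The function name backup_dir and the destination path in the example are placeholders, not anything specific to the LambdaQuad.

```shell
# backup_dir SRC DEST: copy the contents of SRC (including dotfiles) into DEST,
# preserving ownership, permissions and timestamps.
backup_dir() {
    mkdir -p "$2" && cp -a "$1"/. "$2"/
}

# Example (the destination is a hypothetical external drive mount point):
#   sudo sh -c '. ./backup.sh; backup_dir /home /mnt/backup/home'
```

After the reinstall, the same function can be used in the other direction to put the data back.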

Mark