I am in a situation where the main SSD in my LambdaQuad (4 GPU workstation server) is in danger of dying.
I see the following error message once in a while: EXT4-fs error device nvme0n1p2 ext4-find-entry:1455 inode: nnnnnn System_Journal ID [nnn]: Failed to write entry…
It is giving me a warning.
After rebooting, smartctl says the device “PASSED”.
But I suspect it is better to replace it now than to wait for it to crash.
The OS is Ubuntu 20.04.
The SSD details:
Model Number: Samsung SSD 970 EVO 1TB
Serial Number: S46DNF0K504699M
How can I replace it and avoid reinstalling the OS (Ubuntu 20.04)?
Thanks,
Karun
We have a ticket on this. It would be helpful to have the output of:
smartctl -x /dev/nvme0
Also, it may be that you just need to run fsck. I am waiting on data to
determine how/why the data was corrupted (an inode is basically a pointer to a file).
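Before letting fsck repair anything, a read-only pass can show how bad the damage is. The following is a minimal sketch, assuming the filesystem is ext4 (as the error message suggests) and that it is run as root from a live USB with the partition unmounted. The function name check_ext4_readonly is made up for illustration.

```shell
#!/usr/bin/env bash
# Read-only ext4 check: reports problems but changes nothing on disk.
# Assumption: run as root from a live USB, with the partition unmounted.

check_ext4_readonly() {
    local part="$1"   # e.g. /dev/nvme0n1p2, the partition named in the error

    # Refuse to check a mounted filesystem; the results would be unreliable.
    if awk -v d="$part" '$1 == d { found = 1 } END { exit !found }' /proc/mounts 2>/dev/null; then
        echo "$part is mounted; unmount it or boot from a live USB first" >&2
        return 1
    fi

    # -f forces a check even if the filesystem is marked clean;
    # -n opens it read-only and answers "no" to every repair prompt.
    e2fsck -f -n "$part"
}

# Example (hypothetical invocation):
#   check_ext4_readonly /dev/nvme0n1p2
```

If the read-only pass reports errors, running e2fsck without -n would attempt repairs, which is safer to do after the backup has been taken.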
Either way, it is best to back up your data sooner rather than later.
The smartctl output will show previous errors, temperatures over threshold, etc.
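To capture that output for the ticket, something like the sketch below may help. It saves the full report to a file and prints the lines that are usually most relevant for an NVMe drive. The function name save_smart_report and the output path are made up for illustration, and the exact field labels can vary between smartctl versions.

```shell
#!/usr/bin/env bash
# Save the full SMART report and print a short summary of health fields.
# Assumption: run as root, with smartmontools installed.

save_smart_report() {
    local dev="$1" out="$2"

    # Full report, suitable for attaching to the support ticket.
    smartctl -x "$dev" > "$out"

    # Quick look at the health-related lines of the NVMe report.
    grep -E 'Critical Warning|Available Spare|Percentage Used|Media and Data Integrity Errors|Error Information Log Entries|Temperature Time' "$out"
}

# Example (hypothetical output path):
#   save_smart_report /dev/nvme0 /tmp/nvme0-smart.txt
```

A nonzero count for media errors or error log entries would be worth reporting even when the overall health assessment still says PASSED.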
For copying, there are a few ways…
The old ‘hack’ way:
a. Copy the partition table
dd if=/dev/olddev of=/dev/newdev
where olddev/newdev are your actual device names
Press Control-C after a few seconds (this copies the partition table at the start of the disk)
Or you can use fdisk, parted, or gparted to create the same partitions on the new device.
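A less timing-dependent alternative to interrupting dd is to dump the partition table with sfdisk and replay it onto the new disk. This is a different technique from the dd hack above, sketched under the assumption that the new disk is at least as large as the old one and holds nothing you need. The function name copy_partition_table is made up for illustration.

```shell
#!/usr/bin/env bash
# Copy only the partition table from one disk to another using sfdisk.
# WARNING: this overwrites the partition table on the target disk.
# Assumption: run as root; device names are placeholders.

copy_partition_table() {
    local old="$1" new="$2" dump

    dump="$(mktemp)" || return 1

    # Write the old disk's layout as a text script (also a handy backup).
    sfdisk --dump "$old" > "$dump" || return 1

    # Replay that layout onto the new disk.
    sfdisk "$new" < "$dump" || return 1

    echo "partition table dump kept in $dump"
}

# Example (hypothetical device names):
#   copy_partition_table /dev/olddev /dev/newdev
```

The dump also carries the disk and partition identifiers, so the two disks end up with matching IDs. That is usually fine when the old disk is removed afterwards, but it can confuse tools while both are attached.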
b. mkfs/Mount the partitions
- mkfs the new partitions
- mount the partitions
- Then copy the files with cp -ax or find . | cpio -pdum directory
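Step b could look roughly like the sketch below for a single partition. It assumes the partition is ext4; an EFI system partition, if present, is FAT and would need mkfs.vfat instead. The function name clone_filesystem and the mount point are made up for illustration.

```shell
#!/usr/bin/env bash
# Create a fresh ext4 filesystem on a new partition and copy files onto it.
# WARNING: mke2fs destroys whatever is on the target partition.
# Assumption: run as root; paths and device names are placeholders.

clone_filesystem() {
    local src_mount="$1" new_part="$2" dst_mount="$3"

    # Make the new filesystem (equivalent to mkfs.ext4).
    mke2fs -t ext4 "$new_part" || return 1

    # Mount it somewhere temporary.
    mkdir -p "$dst_mount" || return 1
    mount "$new_part" "$dst_mount" || return 1

    # -a preserves ownership, permissions, timestamps and symlinks;
    # -x stays on one filesystem, so /proc, /sys and other mounts are skipped.
    cp -ax "$src_mount"/. "$dst_mount"/
}

# Example (hypothetical names):
#   clone_filesystem / /dev/newdev2 /mnt/newroot
```

Because a freshly made filesystem gets a new UUID, /etc/fstab and the bootloader configuration on the new disk would most likely need updating before it can boot. That step is not covered by this sketch.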