* The mysterious case of the disappearing superblock ... @ 2022-01-18 19:51 anthony 2022-01-18 20:00 ` Phil Turmel 2022-01-18 23:00 ` NeilBrown 0 siblings, 2 replies; 10+ messages in thread From: anthony @ 2022-01-18 19:51 UTC (permalink / raw) To: Linux RAID; +Cc: Phil Turmel, NeilBrown You all know the story of how the cobbler's children are the worst shod, I expect :-) Well, the superblock to my raid (containing /home, etc) has disappeared, and I don't have a backup ... (well I do but it's now well out of date). So, a new hard drive is on order, for backup ... Firstly, given that superblocks seem to disappear every now and then, does anybody have any ideas for something that might help us track it down? The 1.2 superblock is 4K into the device I believe? So if I copy the first 8K ( dd if=/dev/sda4 of=sda4.img bs=4K count=2 ) of each partition, that might help provide any clues as to what's happened to it? What am I looking for? What is the superblock supposed to look like? Secondly, once I've backed up my partitions, I obviously need to do --create --assume-clean ... The only snag is, the array has been rebuilt, so I doubt my data offset is the default. The history of the array is simple. It's pretty new, so it will have been created with the latest mdadm, and was originally a mirror of sda4 and sdb4. A new drive was added and the array upgraded to raid-5, and I BELIEVE the order is sdc4, sda4, sdb1 - sdb1 being the new drive that was added. Am I safe to assume that sdc4 and sda4 will have the same data offset? What is it likely to be? And seeing as it was the last added am I safe to assume that sdb1 is the last drive, so all I have to do is see which way round the other two should be? At least the silver lining behind this, is that having been forced to recover my own array, I'll understand it much better helping other people recover theirs! Cheers, Wol ^ permalink raw reply [flat|nested] 10+ messages in thread
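The capture step proposed above can be scripted per member device. This is only a sketch under stated assumptions: the helper name `dump_sb_region` and the output file names are made up for illustration, and it spells the block size as 4096 rather than relying on the `bs=4K` shorthand. It only reads from the source.

```shell
# Sketch: copy the first 8 KiB of a device (or image file) and hex-dump the
# start of the second 4 KiB block, where a v1.2 md superblock is expected.
# dump_sb_region is a hypothetical helper name, not part of mdadm.
dump_sb_region() {
    src=$1    # e.g. /dev/sda4 (reading a real device needs root)
    out=$2    # e.g. sda4.img
    # Two 4096-byte blocks: byte offsets 0..8191
    dd if="$src" of="$out" bs=4096 count=2 2>/dev/null || return 1
    # Skip the first 4096 bytes and show the next 64, one byte per column
    od -A d -t x1 -v -j 4096 -N 64 "$out"
}
```

Something like `dump_sb_region /dev/sda4 sda4.img` would be run once per member; the first output line then shows whatever currently sits at the 4K mark.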
* Re: The mysterious case of the disappearing superblock ... 2022-01-18 19:51 The mysterious case of the disappearing superblock anthony @ 2022-01-18 20:00 ` Phil Turmel 2022-01-18 20:11 ` anthony 2022-01-18 23:00 ` NeilBrown 1 sibling, 1 reply; 10+ messages in thread From: Phil Turmel @ 2022-01-18 20:00 UTC (permalink / raw) To: anthony, Linux RAID; +Cc: NeilBrown Hi Anthony, On 1/18/22 2:51 PM, anthony wrote: > You all know the story of how the cobbler's children are the worst shod, > I expect :-) Well, the superblock to my raid (containing /home, etc) has > disappeared, and I don't have a backup ... (well I do but it's now well > out of date). Glitch when writing something else. Who knows. > So, a new hard drive is on order, for backup ... > > Firstly, given that superblocks seem to disappear every now and then, > does anybody have any ideas for something that might help us track it > down? The 1.2 superblock is 4K into the device I believe? So if I copy > the first 8K ( dd if=/dev/sda4 of=sda4.img bs=4K count=2 ) of each > partition, that might help provide any clues as to what's happened to > it? What am I looking for? What is the superblock supposed to look like? Well, I've gone to the kernel code for the structure definition a few times, but never really got much out of it that mdadm -E didn't supply. Those seem to be missing from your mail, at least for the still-working drives.... Wait: they're gone from all three? > Secondly, once I've backed up my partitions, I obviously need to do > --create --assume-clean ... The only snag is, the array has been > rebuilt, so I doubt my data offset is the default. The history of the > array is simple. It's pretty new, so it will have been created with the > latest mdadm, and was originally a mirror of sda4 and sdb4. > > A new drive was added and the array upgraded to raid-5, and I BELIEVE > the order is sdc4, sda4, sdb1 - sdb1 being the new drive that was added. No mdadm -E at all? 
Never ran lsdrv and tucked away the output? > Am I safe to assume that sdc4 and sda4 will have the same data offset? > What is it likely to be? And seeing as it was the last added am I safe > to assume that sdb1 is the last drive, so all I have to do is see which > way round the other two should be? Not safe. But there's only six combinations. > At least the silver lining behind this, is that having been forced to > recover my own array, I'll understand it much better helping other > people recover theirs! > > Cheers, > Wol Phil ^ permalink raw reply [flat|nested] 10+ messages in thread
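The "only six combinations" remark can be made concrete by enumerating the device orders. A minimal sketch, assuming three members; `list_orders` is a hypothetical helper that only prints text and never touches a disk. Each line is one candidate order for a trial --create --assume-clean as discussed above.

```shell
# Sketch: print every ordering of the given member devices, one per line.
# list_orders is a hypothetical helper; it expects exactly three arguments.
list_orders() {
    for a in "$@"; do
        for b in "$@"; do
            [ "$b" = "$a" ] && continue
            for c in "$@"; do
                [ "$c" = "$a" ] && continue
                [ "$c" = "$b" ] && continue
                printf '%s %s %s\n' "$a" "$b" "$c"
            done
        done
    done
}

list_orders /dev/sdc4 /dev/sda4 /dev/sdb1
```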
* Re: The mysterious case of the disappearing superblock ... 2022-01-18 20:00 ` Phil Turmel @ 2022-01-18 20:11 ` anthony 0 siblings, 0 replies; 10+ messages in thread From: anthony @ 2022-01-18 20:11 UTC (permalink / raw) To: Phil Turmel, Linux RAID; +Cc: NeilBrown On 18/01/2022 20:00, Phil Turmel wrote: >> Firstly, given that superblocks seem to disappear every now and then, >> does anybody have any ideas for something that might help us track it >> down? The 1.2 superblock is 4K into the device I believe? So if I copy >> the first 8K ( dd if=/dev/sda4 of=sda4.img bs=4K count=2 ) of each >> partition, that might help provide any clues as to what's happened to >> it? What am I looking for? What is the superblock supposed to look like? > > Well, I've gone to the kernel code for the structure definition a few > times, but never really got much out of it that mdadm -E didn't supply. > Well, I was hoping if I looked at it with a hex editor, and knew what I was looking for where, I might get a clue ... > Those seem to be missing from your mail, at least for the still-working > drives.... > > Wait: they're gone from all three? mdadm --examine ... No md superblock detected Ouch! And no, all the stuff I tell people they should do, I haven't ... I had so much grief with systemd, and dm-integrity, and getting stuff working, that I never got round to being sensible ... :-( Cheers, Wol ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: The mysterious case of the disappearing superblock ... 2022-01-18 19:51 The mysterious case of the disappearing superblock anthony 2022-01-18 20:00 ` Phil Turmel @ 2022-01-18 23:00 ` NeilBrown 2022-01-19 8:52 ` PANIC OVER! " Wols Lists 2022-01-21 17:04 ` Roger Heflin 1 sibling, 2 replies; 10+ messages in thread From: NeilBrown @ 2022-01-18 23:00 UTC (permalink / raw) To: anthony; +Cc: Linux RAID, Phil Turmel On Wed, 19 Jan 2022, anthony wrote: > You all know the story of how the cobbler's children are the worst shod, > I expect :-) Well, the superblock to my raid (containing /home, etc) has > disappeared, and I don't have a backup ... (well I do but it's now well > out of date). > > So, a new hard drive is on order, for backup ... > > Firstly, given that superblocks seem to disappear every now and then, > does anybody have any ideas for something that might help us track it > down? The 1.2 superblock is 4K into the device I believe? So if I copy > the first 8K ( dd if=/dev/sda4 of=sda4.img bs=4K count=2 ) of each > partition, that might help provide any clues as to what's happened to > it? What am I looking for? What is the superblock supposed to look like? Yes, 4K offset. Yes, that dd command will get what you want it to. It hardly matters what the superblock should look like, because it won't be there. The thing you want to know is: what is there? i.e. you see random bytes and need to guess what they mean, so you can guess where they came from. Best to post the "od -x" output and crowd-source. Are you sure the partition starts haven't changed? Was the array made of whole-devices or of partitions? If you want to find out if the superblock got moved, then maybe searching for the magic number is best. Look at the start of super1.c in mdadm. The first 4 bytes of the superblock are 0xa92b4efc little-endian. So: FC 4E 2B A9 The next 4 bytes are 01 00 00 00 (the major version) Then the feature map - possibly 0. Then 4 zero bytes. 
If you see something that looks like that, it is worth trying to point mdadm at it. Create a loop device over it with an appropriate offset, and ask mdadm --examine to look at it. > > Secondly, once I've backed up my partitions, I obviously need to do > --create --assume-clean ... The only snag is, the array has been > rebuilt, so I doubt my data offset is the default. The history of the > array is simple. It's pretty new, so it will have been created with the > latest mdadm, and was originally a mirror of sda4 and sdb4. > > A new drive was added and the array upgraded to raid-5, and I BELIEVE > the order is sdc4, sda4, sdb1 - sdb1 being the new drive that was added. > > Am I safe to assume that sdc4 and sda4 will have the same data offset? > What is it likely to be? And seeing as it was the last added am I safe > to assume that sdb1 is the last drive, so all I have to do is see which > way round the other two should be? I would suggest creating some sparse files the same size as the device, creating loop devices over them, and creating the array in the sequence you remember doing it - using "--assume-clean" to avoid rebuilds that would make those sparse files less sparse. Then look at the metadata written and assume it will be similar to that which was written to your array. NeilBrown > > At least the silver lining behind this, is that having been forced to > recover my own array, I'll understand it much better helping other > people recover theirs! > > Cheers, > Wol > > ^ permalink raw reply [flat|nested] 10+ messages in thread
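The header layout given above (magic FC 4E 2B A9, then major version 01 00 00 00) can be checked mechanically against a saved image. A minimal sketch: `looks_like_md_sb` is a hypothetical helper name, and the default offset of 4096 assumes a 1.2 superblock in its usual place.

```shell
# Sketch: succeed if the 8 bytes at the given offset look like the start of a
# v1.x md superblock (magic FC 4E 2B A9, then major version 01 00 00 00).
# looks_like_md_sb is a hypothetical helper name, not part of mdadm.
looks_like_md_sb() {
    img=$1
    off=${2:-4096}    # where a 1.2 superblock normally sits
    # od prints the 8 bytes as hex pairs; tr joins them into one string
    got=$(od -A n -t x1 -v -j "$off" -N 8 "$img" 2>/dev/null | tr -d ' \n')
    [ "$got" = "fc4e2ba901000000" ]
}
```

Usage would be along the lines of `looks_like_md_sb sda4.img && echo "header present at 4K"`. If the header turns up at some other offset, a read-only loop device (`losetup --read-only --offset ...`) starting 4096 bytes before the hit is what mdadm --examine would then be pointed at; the exact offset arithmetic is an assumption to verify against the real layout.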
* PANIC OVER! Re: The mysterious case of the disappearing superblock ... 2022-01-18 23:00 ` NeilBrown @ 2022-01-19 8:52 ` Wols Lists 2022-01-21 19:28 ` Nix 2022-01-21 19:42 ` Roger Heflin 2022-01-21 17:04 ` Roger Heflin 1 sibling, 2 replies; 10+ messages in thread From: Wols Lists @ 2022-01-19 8:52 UTC (permalink / raw) To: NeilBrown, anthony; +Cc: Linux RAID, Phil Turmel On 18/01/2022 23:00, NeilBrown wrote: >> Firstly, given that superblocks seem to disappear every now and then, >> does anybody have any ideas for something that might help us track it >> down? The 1.2 superblock is 4K into the device I believe? So if I copy >> the first 8K ( dd if=/dev/sda4 of=sda4.img bs=4K count=2 ) of each >> partition, that might help provide any clues as to what's happened to >> it? What am I looking for? What is the superblock supposed to look like? > Yes, 4K offset. Yes, that dd command will get what you want it to. > It hardly matters what the superblock should looks like, because it > won't be there. The thing you want to know is: what is there? > i.e. you see random bytes and need to guess what they mean, so you can > guess where they came from. > Best to post the "od -x" output and crowd-source. That's exactly what I was thinking. But I was thinking if it had been damaged rather than destroyed maybe stuff would have been recoverable. > > Are you sure the partition starts haven't changed? Was the array made of > whole-devices or of partitions? That's what I missed. I forgot my array was on top of dm-integrity, so although I think of it as sda4, sdb1, sdc4, they each in fact have an extra layer between them and the raid. Dunno what or why, but my systemd service that fires that up failed. status tells me it was killed after 2msec. So if that wasn't running, the integrity devices weren't there, and mdadm couldn't start the array. Oh well, the good thing is that backup drive is on its way. 
I'm planning to put plain LVM on it, and write a bunch of services that create backup volumes then do an overwrite-in-place rsync. So as I keep advising people, it does an incremental backup, but the COW volumes mean I have full backups. Cheers, Wol ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: PANIC OVER! Re: The mysterious case of the disappearing superblock ... 2022-01-19 8:52 ` PANIC OVER! " Wols Lists @ 2022-01-21 19:28 ` Nix 2022-01-21 19:37 ` Wols Lists 2022-01-21 19:55 ` Wols Lists 2022-01-21 19:42 ` Roger Heflin 1 sibling, 2 replies; 10+ messages in thread From: Nix @ 2022-01-21 19:28 UTC (permalink / raw) To: Wols Lists; +Cc: NeilBrown, anthony, Linux RAID, Phil Turmel On 19 Jan 2022, Wols Lists said: > Oh well, the good thing is that backup drive is on its way. I'm planning to put plain lvm on it, and write a bunch of services that > create backup volumes then do a overwrite-in-place rsync. So as I keep advising > people, it does an incremental backup, but the COW volumes mean I have full backups. rsync works by rename-then-rewrite on a whole-file basis (it doesn't just modify changed bits of files), so I'm afraid it's going to be terribly inefficient for large slightly-changed files, with many unchanging blocks CoWed nonetheless. The right way to do a deduplicating backup is to use a deduplicating backup system (borg, restic, bup, bupstash -- I swear by bup myself). There's a really good list here: <https://github.com/restic/others>. -- NULL && (void) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: PANIC OVER! Re: The mysterious case of the disappearing superblock ... 2022-01-21 19:28 ` Nix @ 2022-01-21 19:37 ` Wols Lists 2022-01-21 19:55 ` Wols Lists 1 sibling, 0 replies; 10+ messages in thread From: Wols Lists @ 2022-01-21 19:37 UTC (permalink / raw) To: Nix; +Cc: NeilBrown, anthony, Linux RAID, Phil Turmel On 21/01/2022 19:28, Nix wrote: > rsync works by rename-then-rewrite on a whole-file basis (it doesn't > just modify changed bits of files), so I'm afraid it's going to be > terribly inefficient for large slightly-changed files, with many > unchanging blocks CoWed nonetheless. You're almost certainly right as the default. I need to investigate, but I'm sure I've been told it does have an option to only update-in-place stuff that's been changed. Specifically for updates like I want to do :-) Cheers, Wol ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: PANIC OVER! Re: The mysterious case of the disappearing superblock ... 2022-01-21 19:28 ` Nix 2022-01-21 19:37 ` Wols Lists @ 2022-01-21 19:55 ` Wols Lists 1 sibling, 0 replies; 10+ messages in thread From: Wols Lists @ 2022-01-21 19:55 UTC (permalink / raw) To: Nix; +Cc: NeilBrown, anthony, Linux RAID, Phil Turmel On 21/01/2022 19:28, Nix wrote: > rsync works by rename-then-rewrite on a whole-file basis (it doesn't > just modify changed bits of files), so I'm afraid it's going to be > terribly inefficient for large slightly-changed files, with many > unchanging blocks CoWed nonetheless. rsync --inplace Cheers, Wol ^ permalink raw reply [flat|nested] 10+ messages in thread
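A sketch of what the in-place update might look like for a disk-to-disk backup. The wrapper name `sync_inplace` is hypothetical. One assumption worth checking against the rsync man page: when both paths are local, rsync defaults to --whole-file, so --no-whole-file is added here to keep the delta algorithm on and leave unchanged blocks alone.

```shell
# Sketch: update a backup copy in place so unchanged blocks are not rewritten.
# sync_inplace is a hypothetical wrapper; the caller supplies both paths.
sync_inplace() {
    src=$1    # e.g. /home/
    dst=$2    # e.g. the mounted backup volume
    # --inplace        write into the existing destination file instead of
    #                  building a temporary copy and renaming it
    # --no-whole-file  keep delta transfer enabled for local-to-local copies
    rsync -a --inplace --no-whole-file "$src" "$dst"
}
```

The rsync documentation warns that a file being updated with --inplace is inconsistent while the transfer runs, which matters less when the previous state is preserved in a snapshot first.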
* Re: PANIC OVER! Re: The mysterious case of the disappearing superblock ... 2022-01-19 8:52 ` PANIC OVER! " Wols Lists 2022-01-21 19:28 ` Nix @ 2022-01-21 19:42 ` Roger Heflin 1 sibling, 0 replies; 10+ messages in thread From: Roger Heflin @ 2022-01-21 19:42 UTC (permalink / raw) To: Wols Lists; +Cc: NeilBrown, anthony, Linux RAID, Phil Turmel

I do something with fewer moving parts for my backups:

# date stamps used for the directory layout and the log file name
dstring=`date +%Y%m%d`
DIRDATE=`date +%Y%m%d`
MONTH=`date +%Y%m`
DAY=`date +%d`
DIR=/2TB-backup
mkdir -p "${DIR}/backup/${MONTH}/${DIRDATE}"
# -b with --backup-dir moves the old version of each changed file into today's directory
/usr/bin/rsync -xab --backup-dir="${DIR}/backup/${MONTH}/${DIRDATE}/" --exclude-from=<excludefilename> <listofdirstobackup> "${DIR}" >> "${DIR}/backup/backups-${dstring}.out"

If you run this every day or so, then the previous version of any file that has changed ends up in the backup/month/day structure, and you can see all file changes going back as far as you have enough space for. When you are running low on space you delete the directories associated with older dates. You can easily see any old file changes, like so:

ls -l ./backup/*/*/*/datafile03.txt ./randomuser/datafile03.txt
-rw-r--r--. 1 randomuser randomuser 150 Jan 21 2016 ./backup/201802/20180205/randomuser/datafile03.txt
-rw-r--r--. 1 randomuser randomuser 1019 Dec 19 2017 ./backup/201911/20191110/randomuser/datafile03.txt
-rw-r--r--. 1 randomuser randomuser 1104 Sep 5 2019 ./backup/202201/20220101/randomuser/datafile03.txt
-rw-r--r--. 1 randomuser randomuser 1874 Dec 7 11:06 ./randomuser/datafile03.txt

On Fri, Jan 21, 2022 at 1:02 PM Wols Lists <antlists@youngman.org.uk> wrote: > > On 18/01/2022 23:00, NeilBrown wrote: > >> Firstly, given that superblocks seem to disappear every now and then, > >> does anybody have any ideas for something that might help us track it > >> down? The 1.2 superblock is 4K into the device I believe? 
So if I copy > >> the first 8K ( dd if=/dev/sda4 of=sda4.img bs=4K count=2 ) of each > >> partition, that might help provide any clues as to what's happened to > >> it? What am I looking for? What is the superblock supposed to look like? > > > Yes, 4K offset. Yes, that dd command will get what you want it to. > > It hardly matters what the superblock should looks like, because it > > won't be there. The thing you want to know is: what is there? > > i.e. you see random bytes and need to guess what they mean, so you can > > guess where they came from. > > Best to post the "od -x" output and crowd-source. > > That's exactly what I was thinking. But I was thinking if it had been > damaged rather than destroyed maybe stuff would have been recoverable. > > > > Are you sure the partition starts haven't changed? Was the array made of > > whole-devices or of partitions? > > That's what I missed. I forgot my array was on top of dm-integrity, so > although I think of it as sda4, sdb1, sdc4, they each in fact have an > extra layer between them and the raid. > > Dunno what or why, but my systemd service that fires that up failed. > status tells me it was killed after 2msec. > > So if that wasn't running, the integrity devices weren't there, and > mdadm couldn't start the array. > > Oh well, the good thing is that backup drive is on its way. I'm planning > to put plain lvm on it, and write a bunch of services that create backup > volumes then do a overwrite-in-place rsync. So as I keep advising > people, it does an incremental backup, but the COW volumes mean I have > full backups. > > Cheers, > Wol ^ permalink raw reply [flat|nested] 10+ messages in thread
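For the "delete the directories associated with older dates" step in the backup scheme above, a sketch that only lists candidates. It assumes the backup/YYYYMM/YYYYMMDD layout shown in that message; `list_old_backups` and its cutoff argument are hypothetical, and nothing is deleted.

```shell
# Sketch: list <root>/<YYYYMM>/<YYYYMMDD> directories dated before a cutoff.
# list_old_backups is a hypothetical helper; it prints paths and deletes nothing.
list_old_backups() {
    root=$1      # e.g. /2TB-backup/backup
    cutoff=$2    # YYYYMMDD; directories strictly older than this are listed
    for d in "$root"/*/*/; do
        [ -d "$d" ] || continue
        day=$(basename "$d")
        # Ignore anything that is not an 8-digit date directory
        case $day in
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
            *) continue ;;
        esac
        [ "$day" -lt "$cutoff" ] && printf '%s\n' "${d%/}"
    done
    return 0
}
```

The output of `list_old_backups /2TB-backup/backup 20200101` could be reviewed by hand before anything is removed.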
* Re: The mysterious case of the disappearing superblock ... 2022-01-18 23:00 ` NeilBrown 2022-01-19 8:52 ` PANIC OVER! " Wols Lists @ 2022-01-21 17:04 ` Roger Heflin 1 sibling, 0 replies; 10+ messages in thread From: Roger Heflin @ 2022-01-21 17:04 UTC (permalink / raw) To: NeilBrown; +Cc: anthony, Linux RAID, Phil Turmel I would first look for the superblock magic Neil mentions. Usually with lost PVs, filesystems and other data volumes the issue is that something like the partition start moved, and the magic is now either outside the given partition or not in the right location within it. So you may want to take one disk and scan a wide range to see if you can find it. If you find it on that disk, you now have an idea where it may be on the others. Since one member is sda4: is that the last partition on the disk, and if it is not the last, are you missing any other partitions? I have never seen a disk that disappeared for no reason. I have always been able to find something pointing to what the human error was. A lot of being able to do that comes from the machines/teams I oversee having weekly data collections, similar to sosreport, of the active kernel tables and config files, so I can see that prior to reboot the partition table was not where it is after boot. And that is usually as simple as fixing the partition table to match where it was, and then all is good. Even without that you can look for the header magic and from that tell where that partition should start. I oversee a huge number of systems, with countless different hands of various experience levels doing work on those 20k systems, so I have seen pretty much every variation of issue, and I have always been able to find evidence of a root cause. 
On Fri, Jan 21, 2022 at 5:13 AM NeilBrown <neilb@suse.de> wrote: > > On Wed, 19 Jan 2022, anthony wrote: > > You all know the story of how the cobbler's children are the worst shod, > > I expect :-) Well, the superblock to my raid (containing /home, etc) has > > disappeared, and I don't have a backup ... (well I do but it's now well > > out of date). > > > > So, a new hard drive is on order, for backup ... > > > > Firstly, given that superblocks seem to disappear every now and then, > > does anybody have any ideas for something that might help us track it > > down? The 1.2 superblock is 4K into the device I believe? So if I copy > > the first 8K ( dd if=/dev/sda4 of=sda4.img bs=4K count=2 ) of each > > partition, that might help provide any clues as to what's happened to > > it? What am I looking for? What is the superblock supposed to look like? > > Yes, 4K offset. Yes, that dd command will get what you want it to. > It hardly matters what the superblock should looks like, because it > won't be there. The thing you want to know is: what is there? > i.e. you see random bytes and need to guess what they mean, so you can > guess where they came from. > Best to post the "od -x" output and crowd-source. > > Are you sure the partition starts haven't changed? Was the array made of > whole-devices or of partitions? > > If you want to find out if the superblock got moved, the maybe searching > for the magic number is best. > Look a the start of super1.c in mdadm. The first 4 bytes of the > superblock are 0xa92b4efc little-endian. So: FC 4E 2B A9 > The next 4 bytes as 01 00 00 00 ( the major version) > Then the feature map - possibly 0. Then 4 zero bytes. > > If you see something that looks like that, it worth trying to point > mdadm at it. Create a loop device over the it with an appropriate > offset, and ask mdadm --example to look at it. > > > > > > Secondly, once I've backed up my partitions, I obviously need to do > > --create --assume-clean ... 
The only snag is, the array has been > > rebuilt, so I doubt my data offset is the default. The history of the > > array is simple. It's pretty new, so it will have been created with the > > latest mdadm, and was originally a mirror of sda4 and sdb4. > > > > A new drive was added and the array upgraded to raid-5, and I BELIEVE > > the order is sdc4, sda4, sdb1 - sdb1 being the new drive that was added. > > > > Am I safe to assume that sdc4 and sda4 will have the same data offset? > > What is it likely to be? And seeing as it was the last added am I safe > > to assume that sdb1 is the last drive, so all I have to do is see which > > way round the other two should be? > > I would suggest creating some sparse files the same size as the device, > create loop devices over them, and creating the array in the sequence > you remember doing it - using "--assume-clean" to avoid rebuilds that > would make those sparse files less sparse. > Then look at the metadata written and assume it is will similar to > that which was written to your array. > > NeilBrown > > > > > > At least the silver lining behind this, is that having been forced to > > recover my own array, I'll understand it much better helping other > > people recover theirs! > > > > Cheers, > > Wol > > > > ^ permalink raw reply [flat|nested] 10+ messages in thread
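The wide-range scan suggested at the top of this message can be sketched with od and awk. Assumptions: GNU od (for `-w512`), and that a displaced superblock would still start on a 512-byte sector boundary relative to the start of the scanned device or image. `find_md_magic` is a hypothetical helper that only reads its argument.

```shell
# Sketch: print the byte offset of every 512-byte sector that begins with the
# md v1.x magic (FC 4E 2B A9) followed by major version 1 (01 00 00 00).
# find_md_magic is a hypothetical helper name, not part of mdadm.
find_md_magic() {
    # od emits one sector per line: a decimal offset, then 512 hex bytes
    od -A d -t x1 -v -w512 "$1" |
        awk '$2=="fc" && $3=="4e" && $4=="2b" && $5=="a9" &&
             $6=="01" && $7=="00" && $8=="00" && $9=="00" {
                 off = $1
                 sub(/^0+/, "", off)      # drop the zero padding od adds
                 if (off == "") off = "0"
                 print off
             }'
}
```

Since a 1.2 superblock sits 4K into its member, a hit at byte offset X on a whole-disk scan would suggest the member starts at X minus 4096. Scanning an entire disk this way is slow; od's `-j` and `-N` options could be added to limit the range.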