* Re: Too many missing devices, writeable mount is not allowed
2015-09-25 21:45 Too many missing devices, writeable mount is not allowed Marcel Bischoff
@ 2015-09-25 22:43 ` Hugo Mills
2015-09-26 0:34 ` Duncan
2015-09-26 7:46 ` Roman Mamedov
2 siblings, 0 replies; 4+ messages in thread
From: Hugo Mills @ 2015-09-25 22:43 UTC (permalink / raw)
To: Marcel Bischoff; +Cc: linux-btrfs
On Fri, Sep 25, 2015 at 11:45:44PM +0200, Marcel Bischoff wrote:
> Hello all,
>
> I have kind of a serious problem with one of my disks.
>
> The controller of one of my external drives died (WD Studio). The
> disk is alright though. I cracked open the case, got the drive out
> and connected it via a SATA-USB interface.
>
> Now, mounting the filesystem is not possible. Here's the message:
>
> $ btrfs fi show
> warning devid 3 not found already
> Label: none uuid: bd6090df-5179-490e-a5f8-8fbad433657f
> Total devices 3 FS bytes used 3.02TiB
> devid 1 size 596.17GiB used 532.03GiB path /dev/sdd
> devid 2 size 931.51GiB used 867.03GiB path /dev/sde
> *** Some devices missing
>
> Yes, I did bundle up three drives with very different sizes with the
> --single option on creating the file system.
OK, that's entirely possible. Not a problem in itself.
Now, assuming that the missing device is actually unrecoverable:
Since you've said it's single, you've lost some large fraction of
the file data on your filesystem, so this isn't going to end well in
any case. I hope you have good backups.
Was the metadata on the filesystem also single? If so, then I have
no hesitation in declaring this filesystem completely dead. If it was
RAID-1 (or RAID-5 or RAID-6), then the metadata should still be OK,
and you should be able to mount the FS with -o degraded. That will
give you a working (read-only) filesystem in which reads of data that
lived on the missing device return EIO. ddrescue should help you
recover partial files in those cases where partial recovery is
acceptable.
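That salvage might look something like the sketch below. The device, mount points and rsync invocation are illustrative, not from the original thread, and the `run` wrapper only prints each command so nothing touches a disk by accident:

```shell
# Sketch of a degraded, read-only salvage. /dev/sdd, /mnt/recovery and
# /mnt/backup are placeholders -- substitute your own devices and paths.
run() { echo "+ $*"; }   # dry-run wrapper: prints each command; remove to execute

run mount -o degraded,ro /dev/sdd /mnt/recovery
# Files whose extents lived on the missing device will fail with EIO;
# rsync continues past individual read errors and reports them at the
# end (exit code 23) rather than aborting the whole copy.
run rsync -a /mnt/recovery/ /mnt/backup/
run umount /mnt/recovery
```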
But it might be recoverable, because...
> I have already asked for help on StackExchange but replies have been
> few. Now I thought people on this list, close to btrfs development
> may be able and willing to help. This would be so much appreciated.
>
> Here's the issue with lots of information and a record of what I/we
> have tried up until now: http://unix.stackexchange.com/questions/231174/btrfs-too-many-missing-devices-writeable-mount-is-not-allowed
I think Vincent Yu there has the right idea -- there's no
superblock showing up on the device in the place that's expected.
However, your update 3 shows that there is a superblock offset by 1
MiB (1114176-65600 = 1048576 = 1024*1024). So the recovery approach
here would be to construct a block device using an offset of 1 MiB
into /dev/sdc. dmsetup should be able to do this, I think.
It's been a long time since I used dmsetup in anger, but something
like this may work:
# dmsetup create sdc_offset --table "0 <N> linear /dev/sdc 2048"
where <N> is the number of sectors of /dev/sdc, less the 2048
512-byte sectors (1 MiB) skipped at the start. I recommend reading the
man page in detail and double-checking that what I've got there is
actually what's needed.
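The 1 MiB offset arithmetic can be sanity-checked in the shell, and the resulting linear table line printed. The disk size below is invented for illustration; on the real machine you would take it from `blockdev --getsz /dev/sdc`:

```shell
# 512-byte sectors throughout, as dmsetup expects.
DEV_SECTORS=1953525168                  # example size; really: blockdev --getsz /dev/sdc
OFFSET_BYTES=$((1114176 - 65600))       # superblock found at 1114176, expected at 65600
OFFSET_SECTORS=$((OFFSET_BYTES / 512))  # 1048576 bytes = 1 MiB = 2048 sectors
TABLE_LEN=$((DEV_SECTORS - OFFSET_SECTORS))
echo "0 $TABLE_LEN linear /dev/sdc $OFFSET_SECTORS"
```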
That will (I think) give you a device /dev/mapper/sdc_offset, which
should then show up in btrfs fi show, and allow you to keep using the
FS.
Hugo.
--
Hugo Mills | If you see something, say nothing and drink to
hugo@... carfax.org.uk | forget
http://carfax.org.uk/ |
PGP: E2AB1DE4 | Welcome to Night Vale
* Re: Too many missing devices, writeable mount is not allowed
From: Duncan @ 2015-09-26 0:34 UTC (permalink / raw)
To: linux-btrfs
Marcel Bischoff posted on Fri, 25 Sep 2015 23:45:44 +0200 as excerpted:
> Hello all,
>
> I have kind of a serious problem with one of my disks.
>
> The controller of one of my external drives died (WD Studio). The disk
> is alright though. I cracked open the case, got the drive out and
> connected it via a SATA-USB interface.
>
> Now, mounting the filesystem is not possible. Here's the message:
>
> $ btrfs fi show
> warning devid 3 not found already
> Label: none uuid: bd6090df-5179-490e-a5f8-8fbad433657f
> Total devices 3 FS bytes used 3.02TiB
> devid 1 size 596.17GiB used 532.03GiB path /dev/sdd
> devid 2 size 931.51GiB used 867.03GiB path /dev/sde
> *** Some devices missing
>
> Yes, I did bundle up three drives with very different sizes with the
> --single option on creating the file system.
[FWIW, the additional comments on the stackexchange link didn't load for
me, presumably due to my default security settings. I could of course
fiddle with them to try to get it to work, but meh... So I only saw the
first three comments or so. As a result, some of this might be repeat
territory for you.]
?? --single doesn't appear to be a valid option for mkfs.btrfs. Did you
mean --metadata single and/or --data single? Which? Both?
If you were running single metadata, like raid0, you're effectively
declaring the filesystem dead and not worth the effort to fix if a device
dies and disappears. In which case you got what you requested, a multi-
device filesystem that dies when one of the devices dies. =:^) Tho it
may still be possible to revive the filesystem if you can get the bad
device recovered enough to be pulled back into the filesystem.
That's why metadata defaults to raid1 (tho btrfs raid1 is only pair-
mirror, even if there's more than two devices) on a multi-device
filesystem. So if you didn't specify --metadata single, then it should
be raid1 (unless the filesystem started as a single device and was never
balance-converted when the other devices were added).
--data single is the default on both single and multi-device filesystems,
however, which, given raid1 metadata, should at least let you recover
files that were 100% on the remaining devices. I'm assuming this, as it
would normally allow read-only mounting due to the second copy of the
metadata, but isn't going to allow writable mounting because with single
data, that would damage any possible chance of getting the data on the
missing device back. Chances of getting writable if the missing device
is as damaged as it could be are slim, but it's possible, if you can
bandaid it up. However, even then I'd consider it suspect and would
strongly recommend taking the chance you've been given to freshen your
backups, then at least btrfs device delete (or btrfs replace with another
device), if not blow away the filesystem and start over with a fresh
mkfs. Meanwhile, do a full write/read test (badblocks or the like) of
the bad device, before trying to use it again.
The other (remote) possibility is mixed-bg mode, combining data and
metadata in the same block-groups. But that's default only with 1 GiB
and under filesystems (and with filesystems converted from ext* with some
versions of btrfs-convert), so it's extremely unlikely unless you
specified that at mkfs.btrfs time, in which case mentioning that would
have been useful.
A btrfs filesystem df (or usage) should confirm both data and metadata
status. The filesystem must be mounted to run it, but read-only degraded
mount should do.
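That check might run like this (a sketch; /dev/sdd and /mnt are placeholders, and the `run` wrapper only prints each command rather than executing it):

```shell
run() { echo "+ $*"; }   # dry-run wrapper: prints commands instead of executing them

run mount -o degraded,ro /dev/sdd /mnt
# The "Data, ..." and "Metadata, ..." lines show the profiles (single, RAID1, ...)
run btrfs filesystem df /mnt
run umount /mnt
```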
[More specific suggestions below.]
> I have already asked for help on StackExchange but replies have been
> few. Now I thought people on this list, close to btrfs development may
> be able and willing to help. This would be so much appreciated.
>
> Here's the issue with lots of information and a record of what I/we have
> tried up until now:
> http://unix.stackexchange.com/questions/231174/btrfs-too-many-missing-devices-writeable-mount-is-not-allowed
OK, first the safe stuff, then some more risky possibilities...
1) Sysadmin's rule of backups: If you value data, by definition, you
have it backed up. If it's not backed up, by definition, you definitely
value it less than the time and resources saved by not doing the backups,
not withstanding any claims to the contrary. (And by the same token, a
would-be backup that hasn't been tested restorable isn't yet a backup, as
the job isn't complete until you know it can be restored.)
1a) Btrfs addendum: Because btrfs is still a maturing filesystem not yet
fully stabilized, the above backup rule applies even more strongly than
it does to a more mature filesystem.
So in worst-case, just blow away the existing filesystem and start over,
either restoring from those backups, or happy in the knowledge that since
you didn't have them, you self-evidently didn't value the data on the
filesystem, and can go on without it.[1]
2) Since you can mount read-only, I'll guess your metadata is raid1, with
single data. Which (as mentioned above) means you should at least have
access to the files that didn't have any extents on the missing device.
If you don't yet have backups, now is your best chance to salvage what
you can by doing a backup of the files you can read, while you can. From
the looks of that btrfs fi show, you might be able to save a TiB worth,
out of the three TiB data it says you had. Depending on fragmentation,
it could be much less than that, but in any case, might as well retrieve
what you can while you know you can.
That's the end of the easy/safe stuff. If you didn't have backups and
didn't choose to backup what you could still get at above while you can
still mount read-only at least, the below risks losing access to what you
have now, so I'd strongly urge you to reconsider before proceeding.
3) Try btrfs-show-super -a (all superblocks, there are three copies, the
first of which is normally used but which appears to be blank in your
case) on the bad device.
With luck, it'll reveal at least one intact superblock. If it does, you
can use btrfs rescue super-recover to try to restore the first/primary
superblock.
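Concretely, that inspection could look like the sketch below, using the btrfs-progs tools of the era. /dev/sdc is the damaged device from the question, and the `run` wrapper just prints the commands so you can review them before running anything for real:

```shell
run() { echo "+ $*"; }   # dry-run wrapper: prints each command; remove to execute

# Dump all three superblock copies (mirrors sit at 64 KiB, 64 MiB and
# 256 GiB into the device) and look for one with a sane magic and fsid:
run btrfs-show-super -a /dev/sdc
# If a backup copy is intact, try restoring the primary from it:
run btrfs rescue super-recover -v /dev/sdc
```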
But even with a restored superblock, there's a good chance the rest of
the filesystem on that device is too mangled to work. There's btrfs
rescue chunk-recover, and a couple btrfs check --repair options, but I've
never had to use them, and would thus be pretty much shooting in the dark
trying to use them myself, so won't attempt to tell you how to use them.
Bottom line, sysadmin's backups rule above, if you value the data, it's
backed up, if it's not backed up, you self-evidently don't value the
data, despite claims to the contrary. And if you want your btrfs multi-
device filesystem to work after loss of a device, use a raid mode that
will allow you to recover using either redundancy (raid1,10) or parity
(raid5,6), for both data and metadata. Because using single or (worse)
raid0, even for just data with metadata having better protection,
basically means you're willing to simply scrap the filesystem and restore
from backups if you lose a device. And as anybody who has run raid0 for
long can tell you, losing one device out of many is a LOT more likely
than losing the only device in a single-device setup. Yes, it's
sometimes possible to recover still, especially if the metadata was
parity/redundancy protected, but you can't count on it, and even if so,
it's a huge hassle, such that if you have backups it's generally easier
just to blow it away and restore from the backups, and if not, well,
since you're defining the value of that data as pretty low by not having
those backups, no big loss, meaning it's still often easier to simply
blow it away and start over.
---
[1] Seriously! Big-picture, there are more important things in life than
computer data. My neighbor had his house burn down a couple months ago.
Got out with a pair of shorts he was wearing to bed, not so much as ID to
help him get started again! I don't know about you, but while losing un-
backed-up-data isn't pleasant, I'd a whole lot rather be picking up my
life after some lost data than picking it up after losing everything in a
fire, as he is! But he counts himself lucky getting out alive and not
even burned, as a lot of people in bed asleep when the fire starts don't
make it. As I said, big picture, a bit of data on a lost filesystem is
downright trivial compared to that!
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman