* Re: Too many missing devices, writeable mount is not allowed
2015-09-25 21:45 Too many missing devices, writeable mount is not allowed Marcel Bischoff
@ 2015-09-25 22:43 ` Hugo Mills
2015-09-26 0:34 ` Duncan
2015-09-26 7:46 ` Roman Mamedov
2 siblings, 0 replies; 4+ messages in thread
From: Hugo Mills @ 2015-09-25 22:43 UTC (permalink / raw)
To: Marcel Bischoff; +Cc: linux-btrfs
On Fri, Sep 25, 2015 at 11:45:44PM +0200, Marcel Bischoff wrote:
> Hello all,
>
> I have kind of a serious problem with one of my disks.
>
> The controller of one of my external drives died (WD Studio). The
> disk is alright though. I cracked open the case, got the drive out
> and connected it via a SATA-USB interface.
>
> Now, mounting the filesystem is not possible. Here's the message:
>
> $ btrfs fi show
> warning devid 3 not found already
> Label: none uuid: bd6090df-5179-490e-a5f8-8fbad433657f
> Total devices 3 FS bytes used 3.02TiB
> devid 1 size 596.17GiB used 532.03GiB path /dev/sdd
> devid 2 size 931.51GiB used 867.03GiB path /dev/sde
> *** Some devices missing
>
> Yes, I did bundle up three drives with very different sizes with the
> --single option on creating the file system.
OK, that's entirely possible. Not a problem in itself.
Now, assuming that the missing device is actually unrecoverable:
Since you've said it's single, you've lost some large fraction of
the file data on your filesystem, so this isn't going to end well in
any case. I hope you have good backups.
Was the metadata on the filesystem also single? If so, then I have
no hesitation in declaring this filesystem completely dead. If it was
RAID-1 (or RAID-5 or RAID-6), then the metadata should still be OK,
and you should be able to mount the FS with -o degraded. That will
give you a working (read-only) filesystem in which reads of data that
lived on the missing device return EIO. ddrescue should help you
recover partial files in those cases where partial recovery is
acceptable.
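That salvage might look something like the sketch below. The device, mount points and rsync invocation are illustrative, not from the original thread, and the `run` wrapper only prints each command so nothing touches a disk by accident:

```shell
# Sketch of a degraded, read-only salvage. /dev/sdd, /mnt/recovery and
# /mnt/backup are placeholders -- substitute your own devices and paths.
run() { echo "+ $*"; }   # dry-run wrapper: prints each command; remove to execute

run mount -o degraded,ro /dev/sdd /mnt/recovery
# Files whose extents lived on the missing device will fail with EIO;
# rsync continues past individual read errors and reports them at the
# end (exit code 23) rather than aborting the whole copy.
run rsync -a /mnt/recovery/ /mnt/backup/
run umount /mnt/recovery
```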
But it might be recoverable, because...
> I have already asked for help on StackExchange but replies have been
> few. Now I thought people on this list, close to btrfs development
> may be able and willing to help. This would be so much appreciated.
>
> Here's the issue with lots of information and a record of what I/we
> have tried up until now: http://unix.stackexchange.com/questions/231174/btrfs-too-many-missing-devices-writeable-mount-is-not-allowed
I think Vincent Yu there has the right idea -- there's no
superblock showing up on the device in the place that's expected.
However, your update 3 shows that there is a superblock offset by 1
MiB (1114176-65600 = 1048576 = 1024*1024). So the recovery approach
here would be to construct a block device using an offset of 1 MiB
into /dev/sdc. dmsetup should be able to do this, I think.
It's been a long time since I used dmsetup in anger, but something
like this may work:
# dmsetup create sdc_offset --table "0 <N> linear /dev/sdc 2048"
where <N> is the number of sectors of /dev/sdc, less the 2048
512-byte sectors (1 MiB) skipped at the start. I recommend reading the
man page in detail and double-checking that what I've got there is
actually what's needed.
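The 1 MiB offset arithmetic can be sanity-checked in the shell, and the resulting linear table line printed. The disk size below is invented for illustration; on the real machine you would take it from `blockdev --getsz /dev/sdc`:

```shell
# 512-byte sectors throughout, as dmsetup expects.
DEV_SECTORS=1953525168                  # example size; really: blockdev --getsz /dev/sdc
OFFSET_BYTES=$((1114176 - 65600))       # superblock found at 1114176, expected at 65600
OFFSET_SECTORS=$((OFFSET_BYTES / 512))  # 1048576 bytes = 1 MiB = 2048 sectors
TABLE_LEN=$((DEV_SECTORS - OFFSET_SECTORS))
echo "0 $TABLE_LEN linear /dev/sdc $OFFSET_SECTORS"
```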
That will (I think) give you a device /dev/mapper/sdc_offset, which
should then show up in btrfs fi show, and allow you to keep using the
FS.
Hugo.
--
Hugo Mills | If you see something, say nothing and drink to
hugo@... carfax.org.uk | forget
http://carfax.org.uk/ |
PGP: E2AB1DE4 | Welcome to Night Vale
* Re: Too many missing devices, writeable mount is not allowed
From: Duncan @ 2015-09-26 0:34 UTC (permalink / raw)
To: linux-btrfs
Marcel Bischoff posted on Fri, 25 Sep 2015 23:45:44 +0200 as excerpted:
> Hello all,
>
> I have kind of a serious problem with one of my disks.
>
> The controller of one of my external drives died (WD Studio). The disk
> is alright though. I cracked open the case, got the drive out and
> connected it via a SATA-USB interface.
>
> Now, mounting the filesystem is not possible. Here's the message:
>
> $ btrfs fi show
> warning devid 3 not found already
> Label: none uuid: bd6090df-5179-490e-a5f8-8fbad433657f
> Total devices 3 FS bytes used 3.02TiB
> devid 1 size 596.17GiB used 532.03GiB path /dev/sdd
> devid 2 size 931.51GiB used 867.03GiB path /dev/sde
> *** Some devices missing
>
> Yes, I did bundle up three drives with very different sizes with the
> --single option on creating the file system.
[FWIW, the additional comments on the stackexchange link didn't load for
me, presumably due to my default security settings. I could of course
fiddle with them to try to get it to work, but meh... So I only saw the
first three comments or so. As a result, some of this might be repeat
territory for you.]
?? --single doesn't appear to be a valid option for mkfs.btrfs. Did you
mean --metadata single and/or --data single? Which? Both?
If you were running single metadata, like raid0, you're effectively
declaring the filesystem dead and not worth the effort to fix if a device
dies and disappears. In which case you got what you requested, a multi-
device filesystem that dies when one of the devices dies. =:^) Tho it
may still be possible to revive the filesystem if you can get the bad
device recovered enough to be pulled back into the filesystem.
That's why metadata defaults to raid1 (tho btrfs raid1 is only pair-
mirror, even if there's more than two devices) on a multi-device
filesystem. So if you didn't specify --metadata single, then it should
be raid1 (unless the filesystem started as a single device and was never
balance-converted when the other devices were added).
--data single is the default on both single and multi-device filesystems,
however, which, given raid1 metadata, should at least let you recover
files that were 100% on the remaining devices. I'm assuming this, as it
would normally allow read-only mounting due to the second copy of the
metadata, but isn't going to allow writable mounting because with single
data, that would damage any possible chance of getting the data on the
missing device back. Chances of getting writable if the missing device
is as damaged as it could be are slim, but it's possible, if you can
bandaid it up. However, even then I'd consider it suspect and would
strongly recommend taking the chance you've been given to freshen your
backups, then at least btrfs device delete (or btrfs replace with another
device), if not blow away the filesystem and start over with a fresh
mkfs. Meanwhile, do a full write/read test (badblocks or the like) of
the bad device, before trying to use it again.
The other (remote) possibility is mixed-bg mode, combining data and
metadata in the same block-groups. But that's default only with 1 GiB
and under filesystems (and with filesystems converted from ext* with some
versions of btrfs-convert), so it's extremely unlikely unless you
specified that at mkfs.btrfs time, in which case mentioning that would
have been useful.
A btrfs filesystem df (or usage) should confirm both data and metadata
status. The filesystem must be mounted to run it, but read-only degraded
mount should do.
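That check might run like this (a sketch; /dev/sdd and /mnt are placeholders, and the `run` wrapper only prints each command rather than executing it):

```shell
run() { echo "+ $*"; }   # dry-run wrapper: prints commands instead of executing them

run mount -o degraded,ro /dev/sdd /mnt
# The "Data, ..." and "Metadata, ..." lines show the profiles (single, RAID1, ...)
run btrfs filesystem df /mnt
run umount /mnt
```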
[More specific suggestions below.]
> I have already asked for help on StackExchange but replies have been
> few. Now I thought people on this list, close to btrfs development may
> be able and willing to help. This would be so much appreciated.
>
> Here's the issue with lots of information and a record of what I/we have
> tried up until now:
> http://unix.stackexchange.com/questions/231174/btrfs-too-many-missing-devices-writeable-mount-is-not-allowed
OK, first the safe stuff, then some more risky possibilities...
1) Sysadmin's rule of backups: If you value data, by definition, you
have it backed up. If it's not backed up, by definition, you definitely
value it less than the time and resources saved by not doing the backups,
not withstanding any claims to the contrary. (And by the same token, a
would-be backup that hasn't been tested restorable isn't yet a backup, as
the job isn't complete until you know it can be restored.)
1a) Btrfs addendum: Because btrfs is still a maturing filesystem not yet
fully stabilized, the above backup rule applies even more strongly than
it does to a more mature filesystem.
So in worst-case, just blow away the existing filesystem and start over,
either restoring from those backups, or happy in the knowledge that since
you didn't have them, you self-evidently didn't value the data on the
filesystem, and can go on without it.[1]
2) Since you can mount read-only, I'll guess your metadata is raid1, with
single data. Which (as mentioned above) means you should at least have
access to the files that didn't have any extents on the missing device.
If you don't yet have backups, now is your best chance to salvage what
you can by doing a backup of the files you can read, while you can. From
the looks of that btrfs fi show, you might be able to save a TiB worth,
out of the three TiB data it says you had. Depending on fragmentation,
it could be much less than that, but in any case, might as well retrieve
what you can while you know you can.
That's the end of the easy/safe stuff. If you didn't have backups and
didn't choose to backup what you could still get at above while you can
still mount read-only at least, the below risks losing access to what you
have now, so I'd strongly urge you to reconsider before proceeding.
3) Try btrfs-show-super -a (all superblocks, there are three copies, the
first of which is normally used but which appears to be blank in your
case) on the bad device.
With luck, it'll reveal at least one intact superblock. If it does, you
can use btrfs rescue super-recover to try to restore the first/primary
superblock.
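Concretely, that inspection could look like the sketch below, using the btrfs-progs tools of the era. /dev/sdc is the damaged device from the question, and the `run` wrapper just prints the commands so you can review them before running anything for real:

```shell
run() { echo "+ $*"; }   # dry-run wrapper: prints each command; remove to execute

# Dump all three superblock copies (mirrors sit at 64 KiB, 64 MiB and
# 256 GiB into the device) and look for one with a sane magic and fsid:
run btrfs-show-super -a /dev/sdc
# If a backup copy is intact, try restoring the primary from it:
run btrfs rescue super-recover -v /dev/sdc
```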
But even with a restored superblock, there's a good chance the rest of
the filesystem on that device is too mangled to work. There's btrfs
rescue chunk-recover, and a couple btrfs check --repair options, but I've
never had to use them, and would thus be pretty much shooting in the dark
trying to use them myself, so won't attempt to tell you how to use them.
Bottom line, sysadmin's backups rule above, if you value the data, it's
backed up, if it's not backed up, you self-evidently don't value the
data, despite claims to the contrary. And if you want your btrfs multi-
device filesystem to work after loss of a device, use a raid mode that
will allow you to recover using either redundancy (raid1,10) or parity
(raid5,6), for both data and metadata. Because using single or (worse)
raid0, even for just data with metadata having better protection,
basically means you're willing to simply scrap the filesystem and restore
from backups if you lose a device. And as anybody who has run raid0 for
long can tell you, losing one device out of many is a LOT more likely
than losing the only device in a single-device setup. Yes, it's
sometimes possible to recover still, especially if the metadata was
parity/redundancy protected, but you can't count on it, and even if so,
it's a huge hassle, such that if you have backups it's generally easier
just to blow it away and restore from the backups, and if not, well,
since you're defining the value of that data as pretty low by not having
those backups, no big loss, meaning it's still often easier to simply
blow it away and start over.
---
[1] Seriously! Big-picture, there are more important things in life than
computer data. My neighbor had his house burn down a couple months ago.
Got out with a pair of shorts he was wearing to bed, not so much as ID to
help him get started again! I don't know about you, but while losing un-
backed-up-data isn't pleasant, I'd a whole lot rather be picking up my
life after some lost data than picking it up after losing everything in a
fire, as he is! But he counts himself lucky getting out alive and not
even burned, as a lot of people in bed asleep when the fire starts don't
make it. As I said, big picture, a bit of data on a lost filesystem is
downright trivial compared to that!
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman