* Error mounting multi-device fs after restart
@ 2011-02-07 19:46 Diwaker Gupta
2011-02-08 20:25 ` Diwaker Gupta
0 siblings, 1 reply; 7+ messages in thread
From: Diwaker Gupta @ 2011-02-07 19:46 UTC
To: linux-btrfs
Hello,
We have 10 1-TB drives hosting a multi-device btrfs filesystem,
configured with raid1+0 for both data and metadata. After some package
upgrades over the weekend I restarted the system and it did not come
back up afterwards. I booted using a rescue disk and ran btrfsck (next
branch from Chris's git repository). Unfortunately btrfsck aborts on
every single drive with errors like this:
parent transid verify failed on 12050980864 wanted 377535 found 128327
parent transid verify failed on 12074557440 wanted 422817 found 126691
parent transid verify failed on 12057542656 wanted 422786 found 126395
parent transid verify failed on 12075556864 wanted 423004 found 126691
bad block 12095545344
parent transid verify failed on 12079190016 wanted 422826 found 105147
leaf parent key incorrect 12097544192
bad block 12097544192
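For context, a filesystem laid out like ours is typically created with
something along these lines (the device names here are illustrative):

  mkfs.btrfs -m raid10 -d raid10 /dev/sd[b-k]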
I'm running 10.04 Ubuntu Lucid with the lts-backport x86_64 kernel:
2.6.35-23-server
Attempting to mount the filesystem blocks indefinitely, with
/var/log/messages getting filled with the 'parent transid verify'
errors.
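For concreteness, the attempts look roughly like this (device name and
mount point are illustrative):

  mount -t btrfs /dev/sdb /mnt/data
  tail -f /var/log/messages

The mount never returns, and the tail shows an endless stream of the
transid errors quoted above.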
IIUC the 'btrfs-select-super' utility is not really helpful in our
case. At this point, my only priority is to somehow rescue the data
from the filesystem. I'd really appreciate if someone on the list
could help me out.
I'm happy to provide any other information required. Please CC me on
replies as I'm not subscribed to the list.
Thanks,
Diwaker
* Re: Error mounting multi-device fs after restart
2011-02-07 19:46 Error mounting multi-device fs after restart Diwaker Gupta
@ 2011-02-08 20:25 ` Diwaker Gupta
2011-02-08 21:04 ` Felix Blanke
2011-02-08 21:51 ` Hubert Kario
0 siblings, 2 replies; 7+ messages in thread
From: Diwaker Gupta @ 2011-02-08 20:25 UTC
To: linux-btrfs
Help, anyone? Sorry for the quick repost, but there was some important
data on that filesystem that I don't have a backup for. I'd really
appreciate any pointers that can help recover the data.
Searching through the archives, it seems others have faced similar
issues due to sudden power outages. AFAIK we did not have any power
outage.
I've run badblocks on all of the 10 drives and three of them had a few
bad blocks. I'm inclined to rule out bad disks as the root cause. In
any case, isn't this exactly the kind of situation btrfs should
protect users against?
A 'btrfsck' aborts on all of the drives. I've tried running it with
'-s 1' as well as '-s 2' with no success. Does that mean that none of
the drives have any copy of the superblock intact?
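For reference, the runs were along these lines ('-s' selects which
superblock copy btrfsck reads; the device name is illustrative):

  btrfsck -s 1 /dev/sdb
  btrfsck -s 2 /dev/sdb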
Diwaker
On Mon, Feb 7, 2011 at 11:46 AM, Diwaker Gupta <diwaker@maginatics.com> wrote:
> Hello,
>
> We have 10 1-TB drives hosting a multi-device btrfs filesystem,
> configured with raid1+0 for both data and metadata. After some package
> upgrades over the weekend I restarted the system and it did not come
> back up afterwards. I booted using a rescue disk and ran btrfsck (next
> branch from Chris's git repository). Unfortunately btrfsck aborts on
> every single drive with errors like this:
>
> parent transid verify failed on 12050980864 wanted 377535 found 128327
> parent transid verify failed on 12074557440 wanted 422817 found 126691
> parent transid verify failed on 12057542656 wanted 422786 found 126395
> parent transid verify failed on 12075556864 wanted 423004 found 126691
> bad block 12095545344
> parent transid verify failed on 12079190016 wanted 422826 found 105147
> leaf parent key incorrect 12097544192
> bad block 12097544192
>
> I'm running 10.04 Ubuntu Lucid with the lts-backport x86_64 kernel:
> 2.6.35-23-server
>
> Attempting to mount the filesystem blocks indefinitely, with
> /var/log/messages getting filled with the 'parent transid verify'
> errors.
>
> IIUC the 'btrfs-select-super' utility is not really helpful in our
> case. At this point, my only priority is to somehow rescue the data
> from the filesystem. I'd really appreciate if someone on the list
> could help me out.
>
> I'm happy to provide any other information required. Please CC me on
> replies as I'm not subscribed to the list.
>
> Thanks,
> Diwaker
>
* Re: Error mounting multi-device fs after restart
2011-02-08 20:25 ` Diwaker Gupta
@ 2011-02-08 21:04 ` Felix Blanke
2011-02-08 21:51 ` Hubert Kario
1 sibling, 0 replies; 7+ messages in thread
From: Felix Blanke @ 2011-02-08 21:04 UTC
To: Diwaker Gupta; +Cc: linux-btrfs
I can't help you with your problem, but:
It is a really really really bad idea to store data without a backup on a
filesystem that is still in some kind of alpha stage (don't get me wrong, I
like btrfs and you guys do a really good job, but the lack of a working fsck
keeps btrfs at that stage in my eyes).
I can't believe there are people out there who do such stupid things :/
Felix
On 08. February 2011 - 12:25, Diwaker Gupta wrote:
> Date: Tue, 8 Feb 2011 12:25:55 -0800
> From: Diwaker Gupta <diwaker@maginatics.com>
> To: linux-btrfs@vger.kernel.org
> Subject: Re: Error mounting multi-device fs after restart
>
> Help, anyone? Sorry for the quick repost, but there was some important
> data on that filesystem that I don't have a backup for. I'd really
> appreciate any pointers that can help recover the data.
>
> Searching through the archives, it seems others have faced similar
> issues due to sudden power outages. AFAIK we did not have any power
> outage.
>
> I've run badblocks on all of the 10 drives and three of them had a few
> bad blocks. I'm inclined to rule out bad disks as the root cause. In
> any case, isn't this exactly the kind of situation btrfs should
> protect users against?
>
> A 'btrfsck' aborts on all of the drives. I've tried running it with
> '-s 1' as well as '-s 2' with no success. Does that mean that none of
> the drives have any copy of the superblock intact?
>
> Diwaker
>
> On Mon, Feb 7, 2011 at 11:46 AM, Diwaker Gupta <diwaker@maginatics.com> wrote:
> > Hello,
> >
> > We have 10 1-TB drives hosting a multi-device btrfs filesystem,
> > configured with raid1+0 for both data and metadata. After some package
> > upgrades over the weekend I restarted the system and it did not come
> > back up afterwards. I booted using a rescue disk and ran btrfsck (next
> > branch from Chris's git repository). Unfortunately btrfsck aborts on
> > every single drive with errors like this:
> >
> > parent transid verify failed on 12050980864 wanted 377535 found 128327
> > parent transid verify failed on 12074557440 wanted 422817 found 126691
> > parent transid verify failed on 12057542656 wanted 422786 found 126395
> > parent transid verify failed on 12075556864 wanted 423004 found 126691
> > bad block 12095545344
> > parent transid verify failed on 12079190016 wanted 422826 found 105147
> > leaf parent key incorrect 12097544192
> > bad block 12097544192
> >
> > I'm running 10.04 Ubuntu Lucid with the lts-backport x86_64 kernel:
> > 2.6.35-23-server
> >
> > Attempting to mount the filesystem blocks indefinitely, with
> > /var/log/messages getting filled with the 'parent transid verify'
> > errors.
> >
> > IIUC the 'btrfs-select-super' utility is not really helpful in our
> > case. At this point, my only priority is to somehow rescue the data
> > from the filesystem. I'd really appreciate if someone on the list
> > could help me out.
> >
> > I'm happy to provide any other information required. Please CC me on
> > replies as I'm not subscribed to the list.
> >
> > Thanks,
> > Diwaker
> >
---end quoted text---
* Re: Error mounting multi-device fs after restart
2011-02-08 20:25 ` Diwaker Gupta
2011-02-08 21:04 ` Felix Blanke
@ 2011-02-08 21:51 ` Hubert Kario
2011-02-08 21:59 ` Diwaker Gupta
1 sibling, 1 reply; 7+ messages in thread
From: Hubert Kario @ 2011-02-08 21:51 UTC
To: Diwaker Gupta; +Cc: linux-btrfs
On Tuesday 08 of February 2011 21:25:55 Diwaker Gupta wrote:
> Searching through the archives, it seems others have faced similar
> issues due to sudden power outages. AFAIK we did not have any power
> outage.
SysRq+B will have the same effect; an OOPS or BUG will have a similar effect.
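For illustration -- don't run this on a machine you care about -- the
following reboots immediately, with no sync and no unmount, much like a
power cut:

  echo b > /proc/sysrq-trigger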
> I've run badblocks on all of the 10 drives and three of them had a few
> bad blocks. I'm inclined to rule out bad disks as the root cause. In
> any case, isn't this exactly the kind of situation btrfs should
> protect users against?
And in the end it will; unfortunately, at the moment it will only report
in dmesg that the read data doesn't match the stored checksum. If you have
redundancy in place it will try to read the other copy of the data. That's
it.
As a side note, if a drive made in the past 5 years has bad blocks
detectable by `badblocks`, it's long gone; it has probably been silently
corrupting data for a long time.
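A quick way to check is the drive's own SMART counters, assuming
smartmontools is installed (device name illustrative):

  smartctl -a /dev/sdb | grep -i -e reallocated -e pending

Non-zero Reallocated_Sector_Ct or Current_Pending_Sector values usually
mean the drive is on its way out.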
> A 'btrfsck' aborts on all of the drives. I've tried running it with
> '-s 1' as well as '-s 2' with no success. Does that mean that none of
> the drives have any copy of the superblock intact?
-s 1 and -s 2 will try to read the backup copies of the superblock, not
superblock copies on other devices. The regular code should do the latter
by itself.
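If one of the backup copies does turn out to be good, the
btrfs-select-super tool mentioned earlier is the counterpart that writes
it back over the primary. A sketch only -- it's destructive, so image the
device first; the device name is illustrative:

  btrfs-select-super -s 1 /dev/sdb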
> Diwaker
>
> On Mon, Feb 7, 2011 at 11:46 AM, Diwaker Gupta <diwaker@maginatics.com> wrote:
> > Hello,
> >
> > We have 10 1-TB drives hosting a multi-device btrfs filesystem,
> > configured with raid1+0 for both data and metadata. After some package
> > upgrades over the weekend I restarted the system and it did not come
> > back up afterwards. I booted using a rescue disk and ran btrfsck (next
> > branch from Chris's git repository). Unfortunately btrfsck aborts on
> > every single drive with errors like this:
> >
> > parent transid verify failed on 12050980864 wanted 377535 found 128327
> > parent transid verify failed on 12074557440 wanted 422817 found 126691
> > parent transid verify failed on 12057542656 wanted 422786 found 126395
> > parent transid verify failed on 12075556864 wanted 423004 found 126691
> > bad block 12095545344
> > parent transid verify failed on 12079190016 wanted 422826 found 105147
> > leaf parent key incorrect 12097544192
> > bad block 12097544192
> >
> > I'm running 10.04 Ubuntu Lucid with the lts-backport x86_64 kernel:
> > 2.6.35-23-server
> >
> > Attempting to mount the filesystem blocks indefinitely, with
> > /var/log/messages getting filled with the 'parent transid verify'
> > errors.
Define *indefinitely*.
Are the drives not working?
If the drives are working, have you tried waiting 2-3 days, possibly longer?
10TB is a *lot* of data
> >
> > IIUC the 'btrfs-select-super' utility is not really helpful in our
> > case. At this point, my only priority is to somehow rescue the data
> > from the filesystem. I'd really appreciate if someone on the list
> > could help me out.
Getting the FS mountable is your best bet at the moment (apart from diving
into the drive with dd in one hand and hexdump in the other...)
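The kind of spelunking meant here looks roughly like the following; the
btrfs primary superblock lives at offset 64KiB, and the device name is
illustrative:

  dd if=/dev/sdb bs=4096 skip=16 count=1 2>/dev/null | hexdump -C | less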
> >
> > I'm happy to provide any other information required. Please CC me on
> > replies as I'm not subscribed to the list.
> >
> > Thanks,
> > Diwaker
>
--
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl
* Re: Error mounting multi-device fs after restart
2011-02-08 21:51 ` Hubert Kario
@ 2011-02-08 21:59 ` Diwaker Gupta
2011-02-08 22:04 ` cwillu
0 siblings, 1 reply; 7+ messages in thread
From: Diwaker Gupta @ 2011-02-08 21:59 UTC
To: Hubert Kario; +Cc: linux-btrfs
> Define *indefinitely*.
Meaning the messages continued for as long as the system was under
observation.
> Are the drives not working?
I believe they are, in the sense that I can read data off using 'dd',
inspect partition tables, etc.
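For example, each drive passes simple checks like these (device name
illustrative):

  dd if=/dev/sdb of=/dev/null bs=1M count=1024
  fdisk -l /dev/sdb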
> If the drives are working, have you tried waiting 2-3 days, possibly longer?
> 10TB is a *lot* of data
The system was running overnight when I first hit the problem. On
subsequent reboots, I've only waited less than half an hour. Usually
the mount is instantaneous, so I wasn't sure if waiting would help at
all. The error messages did not indicate that the system could recover
at that stage. If there's even a slight chance that the fs would
eventually mount, I'm happy to let it run for a day or two. Note that
if I mount using the 'degraded' option, the mount succeeds but
subsequent attempts to read the data fail.
> Getting the FS mountable is your best bet at the moment (apart from diving
> into the drive with dd in one hand and hexdump in the other...)
sigh, I feared as much.
* Re: Error mounting multi-device fs after restart
2011-02-08 21:59 ` Diwaker Gupta
@ 2011-02-08 22:04 ` cwillu
2011-02-08 22:11 ` Diwaker Gupta
0 siblings, 1 reply; 7+ messages in thread
From: cwillu @ 2011-02-08 22:04 UTC
To: Diwaker Gupta; +Cc: Hubert Kario, linux-btrfs
On Tue, Feb 8, 2011 at 3:59 PM, Diwaker Gupta <diwaker@maginatics.com> wrote:
>> Define *indefinitely*.
>
> Meaning the messages continued for as long as the system was under observation.
>
>> Are the drives not working?
>
> I believe they are. Working in the sense that I can read off data
> using 'dd', I can inspect partition tables etc.
>
>> If the drives are working, have you tried waiting 2-3 days, possibly longer?
>> 10TB is a *lot* of data
>
> The system was running overnight when I first hit the problem. On
> subsequent reboots, I've only waited less than half an hour. Usually
> the mount is instantaneous, so I wasn't sure if waiting would help at
> all. The error messages did not indicate that the system could recover
> at that stage. If there's even a slight chance that the fs would
> eventually mount, I'm happy to let it run for a day or two. Note that
> if I mount using the 'degraded' option, the mount succeeds but
> subsequent attempts to read the data fail.
Huh. How do those attempts fail?
Try mounting ro, or degraded,ro, and reading the data off. That
worked for me recently on a broken btrfs raid10 (and didn't on another
one, so your mileage may vary).
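Something along these lines, with illustrative device and mount-point
names:

  mount -t btrfs -o ro /dev/sdb /mnt/recovery
  mount -t btrfs -o degraded,ro /dev/sdb /mnt/recovery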
There's also the perpetually imminent fsck development which might save the day.
* Re: Error mounting multi-device fs after restart
2011-02-08 22:04 ` cwillu
@ 2011-02-08 22:11 ` Diwaker Gupta
0 siblings, 0 replies; 7+ messages in thread
From: Diwaker Gupta @ 2011-02-08 22:11 UTC
To: cwillu; +Cc: Hubert Kario, linux-btrfs
>> The system was running overnight when I first hit the problem. On
>> subsequent reboots, I've only waited less than half an hour. Usually
>> the mount is instantaneous, so I wasn't sure if waiting would help at
>> all. The error messages did not indicate that the system could recover
>> at that stage. If there's even a slight chance that the fs would
>> eventually mount, I'm happy to let it run for a day or two. Note that
>> if I mount using the 'degraded' option, the mount succeeds but
>> subsequent attempts to read the data fail.
>
> Huh. How do those attempts fail?
Same way when I try to do a regular mount: the read blocks and I see a
continuous stream of the 'parent transid verify failed' messages in
dmesg.
> Try mounting ro, or degraded,ro, and reading the data off. That
> worked for me recently on a broken btrfs raid10 (and didn't on another
> one, so your mileage may vary).
OK, I'll give these a shot. I still don't quite understand what it
means if btrfsck aborts; if it can't find the superblock on any of the
drives, how would btrfs ever be able to mount the fs?
Diwaker