From: Gregory L Shomo <greg@techsquare.com>
To: Chris Mason <chris.mason@oracle.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: parent transid troubles
Date: Wed, 20 Apr 2011 16:53:29 -0400
Message-ID: <sa2vcy87igm.fsf@techsquare.com>
In-Reply-To: <1303308238-sup-3555@think> (message from Chris Mason on Wed, 20 Apr 2011 10:04:52 -0400)
Chris Mason <chris.mason@oracle.com> writes:
> Excerpts from Gregory L Shomo's message of 2011-04-20 09:20:20 -0400:
>> Chris Mason <chris.mason@oracle.com> writes:
>>
>> > Excerpts from Gregory L Shomo's message of 2011-04-20 08:56:02 -0400:
>> >> Chris Mason <chris.mason@oracle.com> writes:
>> >>
>> >> > Excerpts from Gregory L Shomo's message of 2011-04-19 15:08:13 -0400:
>> >> >> Hello list-
>> >> >>
>> >> >> Under heavy I/O load, one of our fileservers lost two drives
>> >> >> in a raid6 configuration. After the drives were synchronized,
>> >> >> we can no longer mount the multiple-device btrfs filesystem
>> >> >> due to (at least) parent transid verification failures.
>> >> >>
>> >> >> btrfsck built from git commit 1b444cd2e6ab8dcafdd47dbaeaae369dd1517c17
>> >> >> runs for a while and then aborts on 'failed to find block number'.
>> >> >> Sample output includes:
>> >> >
>> >> > Looks like the rebuild gave you older copies of some of the blocks.
>> >> > btrfsck will exit out pretty early when it sees problems, but I'd say
>> >> > most of your FS is there.
>> >> >
>> >> > Can you please do a btrfs-debug-tree /dev/xxx > out, I'd like to see how
>> >> > far we get.
>> >> >
>> >> > What errors do you get when trying to mount the FS?
>> >> >
>> >> > -chris
>> >>
>> >> I'm not sure how far we will get, but btrfs-debug-tree
>> >> has been running for over 12 hours now and the screenlog is
>> >> at 80 GB. This may not be surprising, as the filesystem
>> >> is large (60 TB) and has millions of files.
>> >>
>> >> From the logs at boottime, we have
>> >>
>> >> btrfs: failed to read the system array on sdd1
>> >> btrfs: open_ctree failed
>> >>
>> >> Should we wait for btrfs-debug-tree to finish
>> >> before trying another mount command?
>> >
>> > For btrfs-debug-tree to run this long, big parts of your FS must be
>> > valid. Also, btrfs-debug-tree must have been able to read the sys
>> > array (which mount was complaining about).
>> >
>> > How easily can you try a newer kernel? We need to make sure to do only
>> > read-only operations (mount -o ro), but we may be able to pull out a
>> > bunch of files.
>> >
>> > -chris
>>
>>
>> Sure, we're up for that. Should we rebuild the kernel, or just
>> the btrfs module? If the kernel, is linux-2.6.38.3 a good
>> choice, or should we build 2.6.39-rc4? If we only need to
>> rebuild the btrfs module, should we use Monday's commit to
>> btrfs-unstable?
>
> The best choice right now is 2.6.38 plus the master branch of the btrfs
> unstable tree. There are a lot of fixes for dealing with busted blocks
> thanks to Josef and Fujitsu.
>
> It may still have trouble, please make sure to mount -o ro.
>
> -chris
OK, we've recompiled linux-2.6.38 patched up to btrfs-unstable
commit f65647c29b14f5a32ff6f3237b0ef3b375ed5a79 and can now mount
the filesystem.
Mounting the filesystem read-only from /dev/sdd1 fails, but
succeeds from /dev/sdc1... after about 4855 parent transid
verification failures.
kernel: [ 293.827069] Btrfs loaded
kernel: [ 293.828014] device fsid 2e4187db574846d8-404f05c2e6ec579d devid 2 transid 176065 /dev/sdd1
kernel: [ 293.828781] btrfs: failed to read the system array on sdd1
kernel: [ 293.835956] btrfs: open_ctree failed
kernel: [ 305.296345] device fsid 2e4187db574846d8-404f05c2e6ec579d devid 1 transid 176066 /dev/sdc1
kernel: [ 305.476360] parent transid verify failed on 20403515125760 wanted 176066 found 174710
kernel: [ 305.476608] parent transid verify failed on 20403515125760 wanted 176066 found 174710
[snip]
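To get a rough picture of how widespread the damage is, the failures in a saved kernel log can be tallied per block. What follows is a minimal Python sketch, assuming only the message format shown in the excerpt above; the function name tally_transid_failures is made up for illustration and is not part of btrfs-progs. It counts a block as stale when the transid found on disk is lower than the one wanted, which is our reading of the earlier remark that the rebuild gave us older copies of some blocks.

```python
import re

# Kernel message format as it appears in the log excerpt above.
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def tally_transid_failures(lines):
    """Hypothetical helper: summarize btrfs parent transid failures.

    Returns a tuple of
      (messages, distinct block numbers, stale blocks, largest transid gap),
    where a block counts as stale when the transid found on disk is lower
    than the one its parent pointer expects.
    """
    messages = 0
    blocks = set()
    stale = set()
    max_gap = 0
    for line in lines:
        match = TRANSID_RE.search(line)
        if match is None:
            continue  # unrelated kernel message, e.g. "Btrfs loaded"
        block, wanted, found = (int(g) for g in match.groups())
        messages += 1
        blocks.add(block)
        if found < wanted:
            # The copy on disk is older than the tree expects.
            stale.add(block)
            max_gap = max(max_gap, wanted - found)
    return messages, len(blocks), len(stale), max_gap
```

It can be fed any iterable of log lines, for example an open file handle on a saved kernel log. On the two failure lines shown above it reports 2 messages on 1 distinct block, with a transid gap of 1356, so the roughly 4855 messages may cover fewer distinct blocks than the raw count suggests.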
Is there any chance we can resolve some of the parent transid
verification failures? What should our next steps be?
Thank you very much for all your help.
- greg
Thread overview: 10+ messages
2011-04-19 19:08 parent transid troubles Gregory L Shomo
2011-04-19 19:34 ` Chris Mason
2011-04-20 12:56 ` Gregory L Shomo
2011-04-20 13:06 ` Chris Mason
2011-04-20 13:20 ` Gregory L Shomo
2011-04-20 14:04 ` Chris Mason
2011-04-20 20:53 ` Gregory L Shomo [this message]
2011-04-20 20:54 ` Chris Mason
2011-05-04 18:04 ` Gregory L Shomo
2011-05-25 18:03 ` Gregory L Shomo