linux-btrfs.vger.kernel.org archive mirror
* Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707
From: Marc MERLIN @ 2012-10-25 19:58 UTC
  To: linux-btrfs

Howdy,

I can wait a day or maybe 2 before I have to wipe and restore from backup.
Please let me know if you have a patch against 3.6.3 you'd like me to try
to mount/recover this filesystem, or whether you'd like me to try btrfsck.


My laptop had a problem with its boot drive which prevented linux
from writing to it, and in turn caused btrfs to have incomplete writes 
to it.
After reboot, the boot drive was fine, but the btrfs filesystem has
a corruption that prevents it from being mounted.

Unfortunately the mount crash prevents writing of crash data to even another
drive since linux stops before the crash data can be written to syslog.

Picture #1 shows a dump when my laptop crashed (before reboot).
btrfs no csum found for inode X start Y
http://marc.merlins.org/tmp/crash.jpg

Mounting with 3.5.0 and 3.6.3 gives the same error:

gandalfthegreat:~# mount -o recovery,skip_balance,ro /dev/mapper/bootdsk                                  

shows
btrfs: bdev /dev/mapper/bootdsk errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
btrfs: bdev /dev/mapper/bootdsk errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
(there are 2 lines, not sure why)
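
A minimal sketch for pulling one counter out of such lines (for example from saved dmesg output), so the values can be compared across mount attempts. The function name btrfs_err_counter is made up for illustration, and it assumes the kernel prints the line in exactly the format shown above.

```shell
# Read kernel log lines on stdin and print the value of one counter
# (wr, rd, flush, corrupt or gen) for every "errs:" line found.
# Uses a global variable because POSIX sh has no "local".
btrfs_err_counter() {
    field=$1
    sed -n "s/.*errs:.*[ ,]$field \([0-9][0-9]*\).*/\1/p"
}
```

Usage would be along the lines of: dmesg | btrfs_err_counter corrupt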

kernel BUG at fs/btrfs/volumes.c:3707
int btrfs_num_copies(struct btrfs_mapping_tree *map_tree, u64 logical, u64 len)
{
	struct extent_map *em;
	struct map_lookup *map;
	struct extent_map_tree *em_tree = &map_tree->map_tree;
	int ret;

	read_lock(&em_tree->lock);
	em = lookup_extent_mapping(em_tree, logical, len);
	read_unlock(&em_tree->lock);
	BUG_ON(!em);  <---

If the snapshot helps (sorry, hard to read, but usable):
http://marc.merlins.org/tmp/btrfs_bug.jpg

Questions:
1) Any better way to get a proper dump without serial console?
(I hate to give you pictures)

2) Should I try btrfsck now, or are there other mount options than
mount -o recovery,skip_balance,ro /dev/mapper/bootdsk 
I should try?

3) Want me to try btrfsck although it may make it impossible for me to
reproduce the bug and test a fix, as well as potentially break the filesystem
more (last time I tried btrfsck, it outputted thousands of lines and never converged
to a state it was happy with)
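
On question 1, one option that needs no serial port is the kernel's netconsole module, which sends kernel messages over UDP to another machine. The sketch below only builds and prints the module parameter; it does not load anything. The helper names, ports, addresses and interface name are placeholders, not values from this report, and it assumes networking is already up on the machine that crashes.

```shell
# Build the netconsole= parameter string.
# Arguments: local-port local-ip local-dev remote-port remote-ip [remote-mac]
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "${6:-}"
}

# Print (do not run) the modprobe command for the sending machine.
netconsole_cmd() {
    printf 'modprobe netconsole %s\n' "$(netconsole_param "$@")"
}
```

For example, netconsole_cmd 6665 192.168.1.10 eth0 6666 192.168.1.20 00:11:22:33:44:55 prints the command to run before the mount attempt, with a UDP listener on port 6666 of the receiving machine.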

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

* Re: Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707
From: cwillu @ 2012-10-25 20:03 UTC
  To: Marc MERLIN; +Cc: linux-btrfs

On Thu, Oct 25, 2012 at 1:58 PM, Marc MERLIN <marc@merlins.org> wrote:
> Howdy,
>
> I can wait a day or maybe 2 before I have to wipe and restore from backup.
> Please let me know if you have a patch against 3.6.3 you'd like me to try
> to mount/recover this filesystem, or whether you'd like me to try btrfsck.
>
>
> My laptop had a problem with its boot drive which prevented linux
> from writing to it, and in turn caused btrfs to have incomplete writes
> to it.
> After reboot, the boot drive was fine, but the btrfs filesystem has
> a corruption that prevents it from being mounted.
>
> Unfortunately the mount crash prevents writing of crash data to even another
> drive since linux stops before the crash data can be written to syslog.
>
> Picture #1 shows a dump when my laptop crashed (before reboot).
> btrfs no csum found for inode X start Y
> http://marc.merlins.org/tmp/crash.jpg
>
> Mounting with 3.5.0 and 3.6.3 gives the same error:
>
> gandalfthegreat:~# mount -o recovery,skip_balance,ro /dev/mapper/bootdsk
>
> shows
> btrfs: bdev /dev/mapper/bootdsk errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
> btrfs: bdev /dev/mapper/bootdsk errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
> (there are 2 lines, not sure why)
>
> kernel BUG at fs/btrfs/volumes.c:3707
> int btrfs_num_copies(struct btrfs_mapping_tree *map_tree, u64 logical, u64 len)
> {
>         struct extent_map *em;
>         struct map_lookup *map;
>         struct extent_map_tree *em_tree = &map_tree->map_tree;
>         int ret;
>
>         read_lock(&em_tree->lock);
>         em = lookup_extent_mapping(em_tree, logical, len);
>         read_unlock(&em_tree->lock);
>         BUG_ON(!em);  <---
>
> If the snapshot helps (sorry, hard to read, but usable):
> http://marc.merlins.org/tmp/btrfs_bug.jpg
>
> Questions:
> 1) Any better way to get a proper dump without serial console?
> (I hate to give you pictures)
>
> 2) Should I try btrfsck now, or are there other mount options than
> mount -o recovery,skip_balance,ro /dev/mapper/bootdsk
> I should try?
>
> 3) Want me to try btrfsck although it may make it impossible for me to
> reproduce the bug and test a fix, as well as potentially break the filesystem
> more (last time I tried btrfsck, it outputted thousands of lines and never converged
> to a state it was happy with)

This looks like something btrfs-zero-log would work around (although
-o recovery should do mostly the same things).  That would destroy the
evidence though, and may just make things (slightly) worse, so I'd
wait to see if anyone suggests something better before trying it.  If
you're ultimately ending up restoring from backup though, it may save
you that effort at least.
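
Since clearing the log destroys the evidence, one way to keep it is to image the device first. A minimal sketch that only prints the two commands in order, so they can be reviewed before running anything; plan_zero_log and the image path are hypothetical, and the image must live on a different disk with enough free space.

```shell
# Print (do not run) the commands to image a device, then clear its log.
# $1 = block device, $2 = image file on another disk (both placeholders).
plan_zero_log() {
    dev=$1
    img=$2
    printf 'dd if=%s of=%s bs=1M conv=noerror,sync\n' "$dev" "$img"
    printf 'btrfs-zero-log %s\n' "$dev"
}
```

For example: plan_zero_log /dev/mapper/bootdsk /backup/bootdsk.img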

* Re: Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707
From: Marc MERLIN @ 2012-10-25 20:12 UTC
  To: cwillu; +Cc: linux-btrfs

On Thu, Oct 25, 2012 at 02:03:49PM -0600, cwillu wrote:
> > 3) Want me to try btrfsck although it may make it impossible for me to
> > reproduce the bug and test a fix, as well as potentially break the filesystem
> > more (last time I tried btrfsck, it outputted thousands of lines and never converged
> > to a state it was happy with)
> 
> This looks like something btrfs-zero-log would work around (although
> -o recovery should do mostly the same things).  That would destroy the
> evidence though, and may just make things (slightly) worse, so I'd
> wait to see if anyone suggests something better before trying it.  If
> you're ultimately ending up restoring from backup though, it may save
> you that effort at least.

Thanks for pointing out btrfs-zero-log, I hadn't re-read the wiki page since
this got added.
But I'll hold off at least until tomorrow morning (GMT-7).

If someone would like me to hold off a bit longer, please let me know and
I'll wait for whatever patch you'd like me to try.

As for backups, yes, I have some :) and I also have hourly, daily, weekly
btrfs subvolume snapshots, but I can't use those currently since I can't
mount the base filesystem.
If my latest snapshot is corrupted, once I know which subvolume has the
problem (I can't quite tell since the crash doesn't say which subvolume is
causing it), I can revert to the last hourly snapshot.

Thanks for your reply.
Marc

* Re: Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707
From: Marc MERLIN @ 2012-10-26 18:29 UTC
  To: linux-btrfs

If any devs want info out of my drive, please ask today, I really need to
fix it tomorrow.
I'll try btrfs-zero-log otherwise and if not, wipe and start over.

Marc


* Re: Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707
From: Marc MERLIN @ 2012-10-29  4:30 UTC
  To: linux-btrfs

On Thu, Oct 25, 2012 at 01:12:23PM -0700, Marc MERLIN wrote:
> On Thu, Oct 25, 2012 at 02:03:49PM -0600, cwillu wrote:
> > > 3) Want me to try btrfsck although it may make it impossible for me to
> > > reproduce the bug and test a fix, as well as potentially break the filesystem
> > > more (last time I tried btrfsck, it outputted thousands of lines and never converged
> > > to a state it was happy with)
> > 
> > This looks like something btrfs-zero-log would work around (although
> > -o recovery should do mostly the same things).  That would destroy the
> > evidence though, and may just make things (slightly) worse, so I'd
> > wait to see if anyone suggests something better before trying it.  If
> > you're ultimately ending up restoring from backup though, it may save
> > you that effort at least.
> 
> Thanks for pointing out btrfs-zero-log, I hadn't re-read the wiki page since
> this got added.
> But I'll hold off at least until tomorrow morning (GMT-7).

I'm a bit surprised that no one seems to be replying on btrfs crashes,
that's a bit worrisome. I'm willing to risk my data somewhat, but if finding
a problem doesn't help fixing the code, I'm not sure if I'm helping anymore
:-/

Since I ran out of time, I tried:
gandalfthegreat:~# btrfs-zero-log 
usage: btrfs-zero-log dev
Btrfs Btrfs v0.19
gandalfthegreat:~# btrfs-zero-log /dev/mapper/bootdsk 
Check tree block failed, want=7533391872, have=17347973115472321934
Check tree block failed, want=7533391872, have=17347973115472321934
Check tree block failed, want=7533391872, have=8450612919225897562
Check tree block failed, want=7533391872, have=17347973115472321934
Check tree block failed, want=7533391872, have=17347973115472321934
read block failed check_tree_block
gandalfthegreat:~# 

So from here, unless someone chimes in tomorrow, I'm going to have to wipe
my filesystem and start over. I suppose that means btrfs can likely still
cause unknown and unfixable corruption.

Marc

* Re: Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707
From: Chris Murphy @ 2012-10-29  5:05 UTC
  To: linux-btrfs@vger.kernel.org


On Oct 25, 2012, at 2:12 PM, Marc MERLIN <marc@merlins.org> wrote:

> I also have hourly, daily, weekly
> btrfs subvolume snapshots, but I can't use those currently since I can't
> mount the base filesystem.

It might be worth unmounting it, then mounting only a snapshot taken well before the problem started, yet still current enough to be useful: use '-o subvol=' instead of trying to mount from the top. Each subvolume is a root directory, so it might be possible to find one that will mount directly.
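
A minimal sketch of trying subvolumes one at a time with '-o subvol='. It only prints the mount commands so they can be run by hand, with an umount in between. The helper name plan_subvol_mounts, the mount point and the subvolume names are all hypothetical; the real snapshot names have to be known from somewhere else.

```shell
# Print (do not run) one read-only mount command per subvolume name.
# $1 = device, $2 = mount point, remaining arguments = subvolume names.
plan_subvol_mounts() {
    dev=$1
    mnt=$2
    shift 2
    for sv in "$@"; do
        printf 'mount -o ro,subvol=%s %s %s\n' "$sv" "$dev" "$mnt"
    done
}
```

For example: plan_subvol_mounts /dev/mapper/bootdsk /mnt/rescue root_hourly home_daily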

> I'm a bit surprised that no one seems to be replying on btrfs crashes,
> that's a bit worrisome. I'm willing to risk my data somewhat, but if finding
> a problem doesn't help fixing the code, I'm not sure if I'm helping anymore
> :-/

Lurking, I've learned this means you either didn't provide enough information for anyone to go on, or the problem is known. I suspect the former. Kernel 3.5.0 or 3.6.3 doesn't say where it came from, what distribution, or what version of btrfs is included in that distro's kernel. And I'm not seeing that you're using a debug kernel, which will actually produce useful error messages.

And it's over a weekend for another thing.


Chris Murphy


* Re: Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707
From: Marc MERLIN @ 2012-10-29 17:42 UTC
  To: Chris Murphy; +Cc: linux-btrfs@vger.kernel.org

Hi, 

Thanks for the reply and hints.

On Sun, Oct 28, 2012 at 11:05:42PM -0600, Chris Murphy wrote:
> > I also have hourly, daily, weekly
> > btrfs subvolume snapshots, but I can't use those currently since I can't
> > mount the base filesystem.
> 
> It might be worth unmounting it, then mounting only a snapshot taken
> well before the problem started, yet still current enough to be useful:
> use '-o subvol=' instead of trying to mount from the top. Each subvolume
> is a root directory, so it might be possible to find one that will
> mount directly.
 
So, I had thought about going back to an old snapshot, but the problem
is that my snapshots have pseudo random names based on cron times when
they're taken.
Because I couldn't mount the root, I couldn't find the snapshot names.
Is there a way to get a list of snapshots from a btrfs FS without
mounting it?

> > I'm a bit surprised that no one seems to be replying on btrfs crashes,
> > that's a bit worrisome. I'm willing to risk my data somewhat, but if finding
> > a problem doesn't help fixing the code, I'm not sure if I'm helping anymore
> > :-/
> 
> Lurking, I've learned this means you either didn't provide enough
> information for anyone to go on, or the problem is known. I suspect
> the former. Kernel 3.5.0 or 3.6.3 doesn't say where it came from, what

Fair enough. At the time I thought it didn't really matter how the bug
happened, and more that btrfs shouldn't crash my kernel when there is
some minor problem with the filesystem.
In my case, I'm convinced it was simply a problem that all the writes
did not make it to disk before the device disconnected for some unknown
reason (not related to btrfs).

> distribution, or what version of btrfs is included in that distro's

debian unstable although it didn't seem relevant since it's the kernel
in initrd that can't mount the filesystem. Userland seems to be btrfs
0.19 as per the output I posted.

> kernel. And I'm not seeing that you're using a debug kernel, which
> will actually produce useful error messages.

Thanks for pointing that out. I'll admit that I'm not sure what kernel
build options I'm supposed to add to help. I asked about that in the
past, but never heard back.
What do you recommend I add in .config?

> And it's over a weekend for another thing.

Well, it was Thursday when I posted :)

Now, I get the general point that I have no paid support, and I'm not
even sure there is any official support for kernel.org kernels from
yesterday or last week (just a few vendor kernels).
At the same time, if brave testers are risking their data to help test
the filesystem, it's also good if they feel reassured that they'll get
help or that if their data is gone, whatever bug they found was useful
to someone.

Now, there is a good ending to this story, thanks to you no less. I'll
post it in a separate message so it doesn't get buried down here.

Thanks,
Marc

* Re: Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707 - FIXED
From: Marc MERLIN @ 2012-10-29 17:48 UTC
  To: Chris Murphy; +Cc: linux-btrfs@vger.kernel.org

First, I used another tool to see what the FS looked like, in the hope
of getting a list of subvolumes without mounting it:

gandalfthegreat:~# btrfs-calc-size /dev/mapper/bootdsk
Calculating size of root tree
	180.00KB total size, 0.00 inline data, 1 nodes, 44 leaves, 2 levels
Calculating size of extent tree
	387.90MB total size, 0.00 inline data, 1423 nodes, 97879 leaves, 4 levels
Calculating size of csum tree
	440.88MB total size, 0.00 inline data, 1425 nodes, 111441 leaves, 4 levels
Calculatin' size of fs tree
	20.00KB total size, 0.00 inline data, 1 nodes, 4 leaves, 2 levels

Then, I figured I'd try mounting all the active snapshots one by one,
and they worked:

[330514.202529] device label btrfs_pool2 devid 1 transid 39698 /dev/dm-0
[330514.203337] device label btrfs_pool1 devid 1 transid 145479 /dev/dm-1
[330629.438572] device label btrfs_pool1 devid 1 transid 145479 /dev/mapper/bootdsk
[330629.439208] btrfs: use lzo compression
[330629.439213] btrfs: not using ssd allocation scheme
[330629.439216] btrfs: disk space caching is enabled
[330653.208718] device label btrfs_pool1 devid 1 transid 145479 /dev/mapper/bootdsk
[330658.854162] device label btrfs_pool1 devid 1 transid 145479 /dev/mapper/bootdsk
[330661.786204] btrfs: unlinked 25 orphans
[330708.314984] device label btrfs_pool1 devid 1 transid 145480 /dev/mapper/bootdsk
[330708.675443] btrfs: unlinked 165 orphans
[330721.558581] device label btrfs_pool1 devid 1 transid 145480 /dev/mapper/bootdsk
[330721.583214] btrfs: unlinked 9 orphans

After that, I was able to mount the root (volid 0) without a crash and
my filesystem looks fine again.

So as far as I can tell, my filesystem is not badly corrupted, and there
was just a small bit that triggered a bug in the mounting code.
Somehow mounting subvolumes separately cleared the state that triggered
the bug, which I can't quite explain.

If someone cares, I made a dd image of the FS to a file on a backup
server, but if not, I'll just delete it.

Marc

* Re: Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707 - FIXED
From: Marc MERLIN @ 2012-10-30 15:46 UTC
  To: linux-btrfs@vger.kernel.org

On Mon, Oct 29, 2012 at 10:48:02AM -0700, Marc MERLIN wrote:
> Then, I figured I'd try mounting all the active snapshots one by one,
> and they worked:
>
> After that, I was able to mount the root (volid 0) without a crash and
> my filesystem looks fine again.
 
Ok, I was wrong.
What happened is that my SSD is crapping out and failing to write after a
certain number of uptime hours.
I just had the same problem happen again yesterday.

Turns out btrfs-zero-log does fix the problem, but because it output
the errors I saw, I thought it did nothing and forgot that I had run it.

So
1) btrfs-zero-log does fix the problem
2) my drive causes btrfs to reliably enter a state where the filesystem
becomes unmountable and crashes the kernel on the next mount.
It would be nice if the kernel refused to mount instead of crashing,
or even automatically ran the equivalent of btrfs-zero-log if necessary.
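
Since btrfs-zero-log printed "Check tree block failed" lines and still appears to have done its job, the output alone is a poor success signal. A minimal sketch of a wrapper that reports the exit status separately from whatever the tool prints. run_and_report is a made-up helper name, and whether btrfs-zero-log v0.19 returns a meaningful exit status is an assumption that has not been checked here.

```shell
# Run a command unchanged, then print its exit status on its own line
# and return that same status to the caller.
run_and_report() {
    "$@"
    status=$?
    printf 'exit status: %s\n' "$status"
    return "$status"
}
```

For example: run_and_report btrfs-zero-log /dev/mapper/bootdsk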


Details below if that helps.

gandalfthegreat:~# btrfs-calc-size /dev/mapper/bootdsk        
Check tree block failed, want=259264512, have=12301165138967429629
Check tree block failed, want=259264512, have=12301165138967429629
Check tree block failed, want=259264512, have=7949122546735189447
Check tree block failed, want=259264512, have=12301165138967429629
Check tree block failed, want=259264512, have=12301165138967429629
read block failed check_tree_block
Calculating size of root tree
	216.00KB total size, 0.00 inline data, 1 nodes, 53 leaves, 2 levels
Calculating size of extent tree
	390.99MB total size, 0.00 inline data, 1443 nodes, 98651 leaves, 4 levels
Calculating size of csum tree
	458.78MB total size, 0.00 inline data, 1472 nodes, 115976 leaves, 4 levels
Calculatin' size of fs tree
	20.00KB total size, 0.00 inline data, 1 nodes, 4 leaves, 2 levels

gandalfthegreat:~# btrfs-find-root /dev/mapper/bootdsk
Super think's the tree root is at 147779584, chunk root 20979712
Found tree root at 147779584

gandalfthegreat:~# btrfs filesystem show 
Label: 'btrfs_pool1'  uuid: 92584fa9-85cd-4df6-b182-d32198b76a0b
	Total devices 1 FS bytes used 344.85GB
	devid    1 size 441.70GB used 441.70GB path /dev/dm-1

Label: 'btrfs_pool2'  uuid: 04071703-df6b-4022-9632-6c3aeabff206
	Total devices 1 FS bytes used 654.12GB
	devid    1 size 872.51GB used 872.51GB path /dev/dm-0

Btrfs Btrfs v0.19

gandalfthegreat:~# btrfs-zero-log /dev/mapper/bootdsk 
Check tree block failed, want=259264512, have=12301165138967429629
Check tree block failed, want=259264512, have=12301165138967429629
Check tree block failed, want=259264512, have=7949122546735189447
Check tree block failed, want=259264512, have=12301165138967429629
Check tree block failed, want=259264512, have=12301165138967429629
read block failed check_tree_block
gandalfthegreat:~#  btrfs-calc-size /dev/mapper/bootdsk      
Calculating size of root tree
	216.00KB total size, 0.00 inline data, 1 nodes, 53 leaves, 2 levels
Calculating size of extent tree
	390.99MB total size, 0.00 inline data, 1443 nodes, 98651 leaves, 4 levels
Calculating size of csum tree
	458.78MB total size, 0.00 inline data, 1472 nodes, 115976 leaves, 4 levels
Calculatin' size of fs tree
	20.00KB total size, 0.00 inline data, 1 nodes, 4 leaves, 2 levels
gandalfthegreat:~# btrfs-zero-log /dev/mapper/bootdsk 
gandalfthegreat:~#


* Re: Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707 - FIXED
From: Sander @ 2012-10-31  9:24 UTC
  To: Marc MERLIN; +Cc: linux-btrfs@vger.kernel.org

Marc MERLIN wrote (ao):
> What happened is that my SSD is crapping out and failing to write after
> a certain number of uptime hours.

What model ssd is that if I may ask?

	Sander

* Re: Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707 - FIXED
From: Marc MERLIN @ 2012-10-31 15:40 UTC
  To: Sander; +Cc: linux-btrfs@vger.kernel.org

On Wed, Oct 31, 2012 at 10:24:40AM +0100, Sander wrote:
> Marc MERLIN wrote (ao):
> > What happened is that my SSD is crapping out and failing to write after
> > a certain number of uptime hours.
> 
> What model ssd is that if I may ask?

I had my first one, a Crucial C300, just die with all my data about 3 months
later.
I spent 2-3 weeks trying to get acceptable performance (i.e. faster than a
HD) off 2 Samsung 830s (you might remember some spam from me here about them
when I thought it might be an issue with btrfs initially).
Now, I have an OCZ Vertex 4.

That said, it's working fine again for now after I went back to kernel 3.5.3 
(down from 3.6.3). It hasn't been long enough to say for sure, but there is
a remote possibility that changes in 3.6 actually caused my drive to freeze
after several hours of use.
When that happened (3 times), 2 of those times, btrfs did not manage to
write all its data before access was cut off, and I got the bug I reported
here, which in turn crashes any kernel you try to mount the FS with.
Cleaning the log manually fixed it both times so far.

For now, I'll stick with 3.5.3 for a while to make sure my drive is actually
ok (it seems to be after all), and once I'm happy that it's the case, I'll go
back to 3.6.3 with serial console remote logging and try to capture the full
sata failure I got with 3.6.3.

Marc

* Re: Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707 - FIXED
From: Sander @ 2012-11-01 10:56 UTC
  To: Marc MERLIN; +Cc: linux-btrfs@vger.kernel.org

Marc MERLIN wrote (ao):
> That said, it's working fine again for now after I went back to kernel 3.5.3 
> (down from 3.6.3). It hasn't been long enough to say for sure, but there is
> a remote possibility that changes in 3.6 actually caused my drive to freeze
> after several hours of use.
> When that happened (3 times), 2 of those times, btrfs did not manage to
> write all its data before access was cut off, and I got the bug I reported
> here, which in turn crashes any kernel you try to mount the FS with.
> Cleaning the log manually fixed it both times so far.
> 
> For now, I'll stick with 3.5.3 for a while to make sure my drive is actually
> ok (it seems to be after all), and once I'm happy that it's the case, I'll go
> back to 3.6.3 with serial console remote logging and try to capture the full
> sata failure I got with 3.6.3.

Thanks for the info. You could put some load on the ssd to see if you
can trigger an issue under 3.6.3(+) with btrfs filesystem scrub or
badblocks (in the default non-destructive mode).

Can you collect SMART data (with smartctl) from the ssd?
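
A minimal sketch of the checks suggested above, printed as commands rather than executed so they can be reviewed first. plan_ssd_checks is a hypothetical helper name and both arguments are placeholders. It assumes the installed btrfs-progs provides "btrfs scrub start" (-B keeps it in the foreground) and relies on badblocks defaulting to a read-only test.

```shell
# Print (do not run) the load and health checks for one SSD.
# $1 = mounted btrfs path, $2 = underlying block device (placeholders).
plan_ssd_checks() {
    mnt=$1
    dev=$2
    printf 'btrfs scrub start -B %s\n' "$mnt"
    printf 'badblocks -sv %s\n' "$dev"
    printf 'smartctl -a %s\n' "$dev"
}
```

For example: plan_ssd_checks / /dev/sda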

	Sander

* Re: Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707 - FIXED
From: Marc MERLIN @ 2012-11-01 16:16 UTC
  To: Sander; +Cc: linux-btrfs@vger.kernel.org

On Thu, Nov 01, 2012 at 11:56:18AM +0100, Sander wrote:
> > For now, I'll stick with 3.5.3 for a while to make sure my drive is actually
> > ok (it seems to be after all), and once I'm happy that it's the case, I'll go
> > back to 3.6.3 with serial console remote logging and try to capture the full
> > sata failure I got with 3.6.3.
> 
> Thanks for the info. You could put some load on the ssd to see if you
> can trigger an issue under 3.6.3(+) with btrfs filesystem scrub or
> badblocks (in the default non-destructive mode).

I'll try this in a few days when I've first confirmed that my SSD is still
100% stable under 3.5.3 (so far it is).
After that, I'll go back to 3.6.3 and see what it takes to crash it.
But as per my original report and
http://marc.merlins.org/tmp/crash.jpg
this does look like a sata layer problem, which btrfs isn't responsible for.

Also, there is still the unaddressed bug that, when this does happen, btrfs
can end up in a state where the filesystem is unmountable without
manually fixing it.
 
> Can you collect SMART data (with smartctl) from the ssd?

I did actually have a look, but to be honest, SSDs have pretty useless SMART
data overall. Mine's likely a bit worse than average, even.

gandalfthegreat:~# smartctl -a /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.5.3-amd64-preempt-noide-20120903] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     OCZ-VERTEX4
Serial Number:    OCZ-26W4VJ3SP32E1WC2
LU WWN Device Id: 5 e83a97 59be3b57e
Firmware Version: 1.5
User Capacity:    512,110,190,592 bytes [512 GB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   9
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Nov  1 09:14:43 2012 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 249)	Self-test routine in progress...
					90% of test remaining.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x1d) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Abort Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					No Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x00)	Error logging NOT supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   0) minutes.
Extended self-test routine
recommended polling time: 	 (   0) minutes.

SMART Attributes Data Structure revision number: 18
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0000   006   000   000    Old_age   Offline      -       6
  3 Spin_Up_Time            0x0000   100   100   000    Old_age   Offline      -       0
  4 Start_Stop_Count        0x0000   100   100   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0000   100   100   000    Old_age   Offline      -       8
  9 Power_On_Hours          0x0000   100   100   000    Old_age   Offline      -       1210
 12 Power_Cycle_Count       0x0000   100   100   000    Old_age   Offline      -       240
232 Available_Reservd_Space 0x0000   100   100   000    Old_age   Offline      -       8019542246
233 Media_Wearout_Indicator 0x0000   099   000   000    Old_age   Offline      -       99

SMART Error Log not supported
Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


Device does not support Selective Self Tests/Logging
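
To track a single attribute from the table above over time (for example
Reallocated_Sector_Ct, whose raw value is 8 here), a small filter along
these lines might help. This is only a sketch: the function name smart_raw
is made up, and it assumes the attribute name is in the second column and
the raw value in the last column, as in the output above.

```shell
# Hypothetical helper: print the RAW_VALUE column for one named SMART attribute.
# Reads smartctl attribute output on stdin; $1 is the attribute name.
smart_raw() {
    awk -v name="$1" '$2 == name { print $NF }'
}

# Example usage (needs root and a real device, so it is left commented out):
#   smartctl -A /dev/sda | smart_raw Reallocated_Sector_Ct
```

Logging that value before and after a load run would show whether the
reallocated sector count is actually moving.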

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Thread overview: 13+ messages
2012-10-25 19:58 Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707 Marc MERLIN
2012-10-25 20:03 ` cwillu
2012-10-25 20:12   ` Marc MERLIN
2012-10-29  4:30     ` Marc MERLIN
2012-10-29  5:05     ` Chris Murphy
2012-10-29 17:42       ` Marc MERLIN
2012-10-29 17:48       ` Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707 - FIXED Marc MERLIN
2012-10-30 15:46         ` Marc MERLIN
2012-10-31  9:24           ` Sander
2012-10-31 15:40             ` Marc MERLIN
2012-11-01 10:56               ` Sander
2012-11-01 16:16                 ` Marc MERLIN
2012-10-26 18:29 ` Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707 Marc MERLIN
     [not found] <E1TTCzR-0001nz-1p@gandalfthegreat.merlins.org>
     [not found] ` <20121030144914.GA18659@merlins.org>
