linux-btrfs.vger.kernel.org archive mirror
* transid problem after a power-failure
@ 2011-01-05 21:02 Mikael Cluseau
  2011-01-08  9:19 ` Mikael Cluseau
  0 siblings, 1 reply; 2+ messages in thread
From: Mikael Cluseau @ 2011-01-05 21:02 UTC
  To: BTRFS MAILING LIST

Hello people of BTRFS :)

I'm writing because (as the subject says) I have a problem mounting a
btrfs filesystem after a power failure.

About the context...

This is a pretty simple BTRFS setup: a 3ware 9690SA-4I RAID controller
with a RAID-5 array composed of 4x1TB drives. No software RAID, no
BTRFS RAID.

My uname -a is:
Linux nwrk 2.6.36-gentoo-r5-nwrk #1 SMP PREEMPT Sat Dec 18 09:52:24 NCT 2010 x86_64 Intel(R) Core(TM) i7 CPU 975 @ 3.33GHz GenuineIntel GNU/Linux

btrfs-show from the latest btrfs-progs-unstable (pulled from Mason's
git) gives the following information:

failed to read /dev/sdb
Label: none  uuid: 375315d5-6cc9-4e3c-b318-1823508a4e50
	Total devices 1 FS bytes used 789.63GB
	devid    1 size 2.73TB used 793.54GB path /dev/dm-0

Btrfs v0.19-35-g1b444cd

About the problem...

Whatever I tried, I got the following in the kernel log or on stdout:

Jan 06 07:34:32 [kernel] device fsid 3c4ec96cd5155337-504e8a50231818b3 devid 1 transid 85879 /dev/mapper/raid-data
Jan 06 07:34:32 [kernel] parent transid verify failed on 657818017792 wanted 85879 found 85878
                - Last output repeated 2 times -
Jan 06 07:34:32 [kernel] btrfs: open_ctree failed
^-- this line is more detailed with btrfsck:
                         btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.

I've searched the web of course, and the only info I found was the
following thread on this list:
http://kerneltrap.org/mailarchive/linux-btrfs/2010/12/9/6886529/thread
( From: Tommy Jonsson ; Subject: Fsck, parent transid verify failed )

So I tried the following things, with the same result as above each
time:

      * mount /dev/dm-0 /mnt/raid
      * mount -o degraded /dev/dm-0 /mnt/raid
      * btrfsck /dev/dm-0
      * git clone
        git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git; make and...
      * ./btrfsck /dev/mapper/raid-data
      * ./btrfsck -s 1 /dev/mapper/raid-data
      * ./btrfsck -s 2 /dev/mapper/raid-data
      * for i in `seq 0 50`; do ./btrfsck -s $i /dev/dm-0; done

None of them gave anything different from the previous log.
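
A side note on the `-s` loop above: as far as I understand it, btrfs
keeps only a few superblock copies at fixed offsets, so most of the
values from `seq 0 50` cannot point at anything. The sketch below shows
the offset rule as I remember it from btrfs-progs' ctree.h. The
constants and the names sb_copy_offset and sb_copy_exists are my own
illustration and should be checked against the actual tree, not taken
as the tool's real code.

```c
#include <stdint.h>

/* Assumed values, recalled from btrfs-progs' ctree.h; verify before use. */
#define SB_PRIMARY_OFFSET (64ULL * 1024)  /* primary superblock at 64 KiB */
#define SB_MIRROR_SHIFT   12
#define SB_MIRROR_MAX     3               /* copies 0, 1 and 2 */
#define SB_SIZE           4096ULL

/* Byte offset of superblock copy `mirror`, or 0 if there is no such copy. */
static uint64_t sb_copy_offset(int mirror)
{
	uint64_t start = 16ULL * 1024;

	if (mirror < 0 || mirror >= SB_MIRROR_MAX)
		return 0;
	if (mirror == 0)
		return SB_PRIMARY_OFFSET;
	/* copy 1 -> 64 MiB, copy 2 -> 256 GiB */
	return start << (SB_MIRROR_SHIFT * mirror);
}

/* A copy is only usable if the whole superblock fits on the device. */
static int sb_copy_exists(int mirror, uint64_t device_bytes)
{
	uint64_t off = sb_copy_offset(mirror);

	return off != 0 && off + SB_SIZE <= device_bytes;
}
```

Under these assumptions a 2.73TB device has room for copies 0, 1 and 2,
which matches the three `-s` values tried individually above.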

I don't know exactly what the problem is. I suppose it's a kind of
half-committed transaction issue, or maybe a problem with a cache flush
that wrote the superblock (or even the superblocks) before writing the
transaction...
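
To make that hypothesis concrete, here is a minimal sketch of the kind
of check that produces the "parent transid verify failed" message. The
struct and function names are hypothetical simplifications, not the
real btrfs-progs types. The idea is only that the parent (here the
superblock) records the transid it expects its child block to carry,
and the block read from disk carries an older one.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, stripped-down stand-in for a tree block header. */
struct block_header {
	uint64_t bytenr;      /* logical address of the block */
	uint64_t generation;  /* transid that last wrote the block */
};

/* Return 0 if the block matches the transid its parent recorded, else 1.
 * A parent_transid of 0 means "no expectation", so nothing is checked. */
static int check_parent_transid(const struct block_header *hdr,
				uint64_t parent_transid)
{
	if (parent_transid == 0 || hdr->generation == parent_transid)
		return 0;
	fprintf(stderr,
		"parent transid verify failed on %llu wanted %llu found %llu\n",
		(unsigned long long)hdr->bytenr,
		(unsigned long long)parent_transid,
		(unsigned long long)hdr->generation);
	return 1;
}
```

With bytenr 657818017792, an expected transid of 85879 and a block
still at 85878, this fails in the same way as the log above, which
would be consistent with the superblock reaching the disk before the
new tree root did.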

Regards,
Mikael.


* Re: transid problem after a power-failure
  2011-01-05 21:02 transid problem after a power-failure Mikael Cluseau
@ 2011-01-08  9:19 ` Mikael Cluseau
  0 siblings, 0 replies; 2+ messages in thread
From: Mikael Cluseau @ 2011-01-08  9:19 UTC
  To: BTRFS MAILING LIST

[-- Attachment #1: Type: text/plain, Size: 3133 bytes --]

Hello again,

(this is a status update)

From what I am beginning to understand, the real problem is not the
transid, which is a kind of warning, but the failed assertion on
"tree_root", meaning that the read_tree_block call at disk-io.c:736
fails.

The GDB backtrace is the following:

Reading symbols from /root/btrfs-progs-unstable/btrfsck...done.
(gdb) Starting program: /root/btrfs-progs-unstable/btrfsck /dev/dm-0
parent transid verify failed on 657818017792 wanted 85879 found 85878
parent transid verify failed on 657818017792 wanted 85879 found 85878
parent transid verify failed on 657818017792 wanted 85879 found 85878
btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff7885ee5 in raise () from /lib/libc.so.6
(gdb) #0  0x00007ffff7885ee5 in raise () from /lib/libc.so.6
#1  0x00007ffff7887896 in abort () from /lib/libc.so.6
#2  0x00007ffff787e7a5 in __assert_fail () from /lib/libc.so.6
#3  0x000000000040a96a in open_ctree_fd (fp=5, path=0x14f77 <Address 0x14f77 out of bounds>, sb_bytenr=<value optimized out>, writes=<value optimized out>) at disk-io.c:663
#4  0x000000000040adca in open_ctree (filename=0x7fffffffde9e "/dev/dm-0", sb_bytenr=0, writes=0) at disk-io.c:587
#5  0x00000000004052ec in main (ac=<value optimized out>, av=0x7fffffffdb68) at btrfsck.c:2859


Then I made some changes to read_tree_block in disk-io.c to tell it
that its parent_transid is 85878 when it is called with 85879 (a
completely blind change as far as consistency goes, but my only guess
for now). I also added some printk calls to see what is going on. For
details, see the attached disk-io.[ch].patch files. Now btrfsck fails
on this:

-- entering read_tree_block...
called with parent_transid=85879, setting it to 85878
btrfs_buffer_uptodate: extent_buffer_uptodate FAIL
btrfs_buffer_uptodate @657818017792 / transid wanted 85878
-- search loop [transid=85878] --
`-> extend buffer informations follow
    |-> start:      657818017792
    |-> dev_bytenr: 659437019136
    |-> len:        4096
    |-> refs:       2
    `-> flags:      0
`-> eb found and set uptodate!
btrfsck: root-tree.c:46: btrfs_find_last_root: Assertion `!(path->slots[0] == 0)' failed.

I dd'ed the block to have a backup and attached it here, in case it can
be useful for anything. Here's the dd command; if I'm wrong somewhere,
please tell me:

dd bs=1 count=4096 skip=659437019136 if=/dev/dm-0 of=block
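
For anyone who prefers to grab the block from C, the dd command above
amounts to one positioned read of 4096 bytes at byte offset
659437019136. A minimal sketch, assuming a POSIX system; the helper
name read_block_at is made up for this example:

```c
#define _FILE_OFFSET_BITS 64  /* allow large offsets on 32-bit builds */
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly `len` bytes at byte offset `off` of `path` into `buf`.
 * Returns 0 on success, -1 on error or if the file ends too early. */
static int read_block_at(const char *path, uint64_t off, void *buf, size_t len)
{
	size_t done = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	while (done < len) {
		ssize_t n = pread(fd, (char *)buf + done, len - done,
				  (off_t)(off + done));
		if (n <= 0) {   /* read error, or EOF before len bytes */
			close(fd);
			return -1;
		}
		done += (size_t)n;
	}
	close(fd);
	return 0;
}
```

Since 659437019136 = 4096 * 160995366, the offset is 4 KiB aligned, so
`dd bs=4096 count=1 skip=160995366` should read the same bytes with a
single read instead of 4096 one-byte reads.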

My next move will be to analyse the BTRFS structure more deeply, to
understand what exactly is impacted and how to get access to at least
what is not impacted (corruption of the impacted data is something I
can live with much better than losing access to these 3TB of data
"just" because of this ;)).

Any link/help is welcome, of course. Also tell me if you consider my
status updates useless for this list, so I don't spam you guys. My
first goal is to get the data back, of course, but a good secondary goal
is to help btrfsck handle these cases (but my C is pretty rusty now).

Regards,
Mikaël.

[-- Attachment #2: disk-io.c.patch --]
[-- Type: text/x-patch, Size: 2786 bytes --]

diff --git a/disk-io.c b/disk-io.c
index a6e1000..17808fd 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -185,16 +185,29 @@ struct extent_buffer *read_tree_block(struct btrfs_root *root, u64 bytenr,
 	int mirror_num = 0;
 	int num_copies;
 
+	printk("-- entering read_tree_block...\n");
+
+	if (parent_transid == 85879) {
+		printk("called with parent_transid=85879,"
+			" setting it to 85878\n");
+		parent_transid = 85878;
+	}
+
 	eb = btrfs_find_create_tree_block(root, bytenr, blocksize);
 	if (!eb)
 		return NULL;
 
 	if (btrfs_buffer_uptodate(eb, parent_transid))
 		return eb;
+	
+	printk("btrfs_buffer_uptodate @%llu / transid wanted %llu\n",
+		(unsigned long long) eb->start,
+		(unsigned long long) parent_transid);
 
 	dev_nr = 0;
 	length = blocksize;
 	while (1) {
+		printk("-- search loop [transid=%llu] --\n", parent_transid);
 		ret = btrfs_map_block(&root->fs_info->mapping_tree, READ,
 				      eb->start, &length, &multi, mirror_num);
 		BUG_ON(ret);
@@ -204,10 +217,14 @@ struct extent_buffer *read_tree_block(struct btrfs_root *root, u64 bytenr,
 		eb->dev_bytenr = multi->stripes[0].physical;
 		kfree(multi);
 		ret = read_extent_from_disk(eb);
+		if (parent_transid == 85878) {
+			print_extent_buffer_info(eb);
+		}
 		if (ret == 0 && check_tree_block(root, eb) == 0 &&
 		    csum_tree_block(root, eb, 1) == 0 &&
 		    verify_parent_transid(eb->tree, eb, parent_transid) == 0) {
 			btrfs_set_buffer_uptodate(eb);
+			printk("`-> eb found and set uptodate!\n");
 			return eb;
 		}
 		num_copies = btrfs_num_copies(&root->fs_info->mapping_tree,
@@ -221,6 +238,7 @@ struct extent_buffer *read_tree_block(struct btrfs_root *root, u64 bytenr,
 		}
 	}
 	free_extent_buffer(eb);
+	printk("-- not found --\n");
 	return NULL;
 }
 
@@ -1016,10 +1034,14 @@ int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid)
 	int ret;
 
 	ret = extent_buffer_uptodate(buf);
-	if (!ret)
+	if (!ret) {
+		printk("btrfs_buffer_uptodate: extent_buffer_uptodate FAIL\n");
 		return ret;
+	}
 
 	ret = verify_parent_transid(buf->tree, buf, parent_transid);
+	if (ret)
+		printk("btrfs_buffer_uptodate: verify_parent_transid FAIL\n");
 	return !ret;
 }
 
@@ -1027,3 +1049,13 @@ int btrfs_set_buffer_uptodate(struct extent_buffer *eb)
 {
 	return set_extent_buffer_uptodate(eb);
 }
+
+void print_extent_buffer_info(struct extent_buffer *eb) {
+	printk("`-> extend buffer informations follow\n");
+	printk("    |-> start:      %llu\n", eb->start     );
+	printk("    |-> dev_bytenr: %llu\n", eb->dev_bytenr);
+	printk("    |-> len:        %i\n",   eb->len       );
+	printk("    |-> refs:       %i\n",   eb->refs      );
+	printk("    |-> flags:      %i\n",   eb->flags     );
+}
+

[-- Attachment #3: disk-io.h.patch --]
[-- Type: text/x-patch, Size: 434 bytes --]

diff --git a/disk-io.h b/disk-io.h
index 49e5692..b8a9f95 100644
--- a/disk-io.h
+++ b/disk-io.h
@@ -75,4 +75,7 @@ int csum_tree_block_size(struct extent_buffer *buf, u16 csum_sectorsize,
 int csum_tree_block(struct btrfs_root *root, struct extent_buffer *buf,
 		    int verify);
 int btrfs_read_buffer(struct extent_buffer *buf, u64 parent_transid);
+
+void print_extent_buffer_info(struct extent_buffer *eb);
+
 #endif

[-- Attachment #4: block.gz --]
[-- Type: application/x-gzip, Size: 1827 bytes --]
