From: "Jérôme Carretero" <cJ-ko@zougloub.eu>
To: linux-btrfs <linux-btrfs@vger.kernel.org>,
	Josef Bacik <jbacik@fusionio.com>
Subject: Troublesome failure mode and recovery
Date: Sat, 13 Jul 2013 12:14:04 -0400
Message-ID: <20130713121404.65fc89ea@Bidule>

Hi there,

I am experiencing a broken FS in a state I haven't seen before.

I was running linux-3.10 on my laptop, which I had tried to put to sleep
with an external btrfs partition attached.
On resume, the external partition was lost.
I was able to unmount it, despite many kernel warnings.
Then I remounted it... and unplugged the USB cable.
Then I couldn't unmount it.
Well, too bad, not a big deal.
I ran alt+sysrq+s, waited a little, ran alt+sysrq+b.
And on reboot, my root partition (also btrfs) was unmountable, with the error:

  [    1.150000] btrfs bad tree block start 0 1531035648
  [    1.150000] btrfs: failed to read log tree
  [    1.150000] btrfs: open_ctree failed

Then I did the following:

- Tested various mount flags (some from memory, some found by reading the
  `fs/btrfs/super.c` code: recovery, clear_cache...)

- Took the drive (Lenovo-branded Micron RealSSD 400) to another computer
  and made an image of this partition, because this issue could be of use,
  and I have some recent documents that I'd like to recover in some way.

- Ran various btrfs-progs utilities on the partition

- Edited the kernel btrfs code and attempted to mount the partition from a
  user-mode Linux (UML) kernel.

The results are the following:

- `btrfs-restore` only works with `-u 1`, which suggests that the first
  superblock copy is damaged
- `btrfsck` crashed because the code carried on even when fs_root
  was NULL; fixed with this patch:

diff --git a/cmds-check.c b/cmds-check.c
index 8015288..be3e329 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -5777,6 +5777,11 @@ int cmd_check(int argc, char **argv)
 
 	root = info->fs_root;
 
+	if (root == NULL) {
+		fprintf(stderr, "Error finding FS root\n");
+		return -EIO;
+	}
+
 	if (init_extent_tree) {
 		printf("Creating a new extent tree\n");
 		ret = reinit_extent_tree(info);

- The Linux kernel, patched with the following ugly hack, would (somehow)
  boot:

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b8b60b6..0807f4d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2627,6 +2627,14 @@ retry_root_backup:
 	tree_root->node = read_tree_block(tree_root,
 					  btrfs_super_root(disk_super),
 					  blocksize, generation);
+
+	if (1) { // ugly hack to force using the second superblock
+		static int i = 0;
+		if (i++ == 0) {
+			goto recovery_tree_root;
+		}
+	}
+
 	if (!tree_root->node ||
 	    !test_bit(EXTENT_BUFFER_UPTODATE, &tree_root->node->bflags)) {
 		printk(KERN_WARNING "btrfs: failed to read tree root on %s\n",

  But /sbin/init and /bin/bash wouldn't start because of btrfs errors.
  It looks like some inodes are broken.
  Somehow /usr/bin/python could start, which made me happy.
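
The `-u 1` observation can be cross-checked directly against the disk image.
What follows is a minimal sketch, not part of btrfs-progs: it reads the
leading fields of each superblock copy and reports whether the magic is
intact, plus the generation, tree root and log root addresses. It assumes
the image is of the btrfs partition itself (so the copies sit at 64 KiB,
64 MiB and 256 GiB) and it does not verify the checksum. The names
`read_superblocks`, `SUPERBLOCK_OFFSETS` and `HEADER` are made up for this
example.

```python
import struct

# Byte offsets of the superblock copies on a btrfs device: primary, then two mirrors.
SUPERBLOCK_OFFSETS = (64 * 1024, 64 * 1024**2, 256 * 1024**3)
BTRFS_MAGIC = b"_BHRfS_M"
# Leading fields of struct btrfs_super_block, little-endian:
# csum[32], fsid[16], bytenr, flags, magic[8], generation, root, chunk_root, log_root
HEADER = struct.Struct("<32s16sQQ8sQQQQ")


def read_superblocks(image_path):
    """Return one dict per superblock copy that fits inside the image."""
    found = []
    with open(image_path, "rb") as img:
        for mirror, offset in enumerate(SUPERBLOCK_OFFSETS):
            img.seek(offset)
            raw = img.read(HEADER.size)
            if len(raw) < HEADER.size:
                continue  # the image is too small to hold this copy
            fields = HEADER.unpack(raw)
            found.append({
                "mirror": mirror,
                "magic_ok": fields[4] == BTRFS_MAGIC,
                "generation": fields[5],
                "root": fields[6],
                "log_root": fields[8],
            })
    return found
```

Calling `read_superblocks()` on the image path and printing each entry shows
at a glance which copies still carry a valid magic and whether their
generations differ. The checksum is not verified here, so a copy with an
intact magic can still be damaged elsewhere.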

Within the UML instance with python, I cannot do `ls` (`os.listdir()`)
on my home folder (`/home/cJ`), and btrfs-restore only restores
a few dot files in there.
But I can get inode numbers and read files or subdirectories beyond
this folder. And it looks like btrfs-debug-tree can find transactions
containing older updated directory inodes.
I can also make stat() calls on files and call `/sbin/btrfs`
(using `subprocess.Popen`, not `os.system()`).
If this were a FAT partition, I would be able to recover data in subfolders
even if the parent folder inode is broken.
I assume the same thing is possible with btrfs, and even more,
given that there are probably older copies of the `/home/cJ` directory
entries from older transactions hanging around somewhere.
But I am no btrfs specialist, so I can't get this data.
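
Since paths below the broken directory are still readable when named
explicitly, the part that works today can be scripted. This is a minimal
sketch, assuming Python 3 inside the UML instance and a destination on a
healthy filesystem; `salvage`, `known_paths` and `dest_root` are
hypothetical names. It copies whatever is readable under a list of
already-known paths and records what fails instead of aborting. It cannot
discover entries whose names are not already known.

```python
import os
import shutil


def salvage(known_paths, dest_root):
    """Copy every readable file under known_paths into dest_root.

    Returns (copied, failed): two lists of source paths.
    """
    copied, failed = [], []

    def copy_file(src):
        # Recreate the source path below dest_root so nothing collides.
        dst = os.path.join(dest_root, src.lstrip(os.sep))
        try:
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(src)
        except OSError:
            failed.append(src)

    def note_unlistable(err):
        # os.walk calls this when a directory cannot be listed.
        failed.append(err.filename)

    for path in known_paths:
        if os.path.isdir(path):
            for dirpath, _dirs, files in os.walk(path, onerror=note_unlistable):
                for name in files:
                    copy_file(os.path.join(dirpath, name))
        elif os.path.lexists(path):
            copy_file(path)
        else:
            failed.append(path)
    return copied, failed
```

The `failed` list then doubles as a record of which directories and files
would still need to be recovered from older transactions.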

Ideally I would like to be able to mount an older generation, or
re-patch older directory inodes where the newer directories cannot be
read.
Having btrfs-restore able to restore sub-directories of a certain generation
would also be very helpful.
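
As a stopgap, `btrfs-restore` can at least be scripted over each superblock
copy so that the results can be compared. This is a minimal sketch rather
than a generation-aware restore: only the `-u` option comes from the
experiments above, while the binary being on PATH, the helper names
`build_restore_cmd` and `run_restore`, and the paths passed in are
assumptions.

```python
import subprocess


def build_restore_cmd(image, out_dir, mirror, binary="btrfs-restore"):
    """Build the argv for one btrfs-restore run against superblock copy `mirror`."""
    return [binary, "-u", str(mirror), image, out_dir]


def run_restore(image, out_dir, mirror):
    """Run btrfs-restore and return (exit status, captured stderr text)."""
    proc = subprocess.Popen(
        build_restore_cmd(image, out_dir, mirror),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    _out, err = proc.communicate()
    return proc.returncode, err
```

Running `run_restore()` with mirror 0, 1 and 2, each into its own output
directory, makes it easy to diff what each superblock copy yields.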

So I have my disk image, linux and btrfs-progs from git, a bootable UML,
and can allocate some time to this issue.

Your help is welcome.

Thanks,

-- 
cJ
