* scrub fails, any way to recover?
@ 2013-01-20 15:54 Neil Schemenauer
  2013-01-21 11:45 ` David Sterba
From: Neil Schemenauer @ 2013-01-20 15:54 UTC
  To: linux-btrfs

I have a ~350 GB Btrfs filesystem that is corrupted.  I think the
damage was caused by a bad SATA cable.  I can mount the filesystem
and read most of the data (I already have backups of most everything).

The scrub is aborted after a few seconds with the following error in
the kernel log:

    parent transid verify failed on 795639808 wanted 102145 found 101462
    parent transid verify failed on 795639808 wanted 102145 found 101462
    verify_parent_transid: 16273 callbacks suppressed
    ...

Trying to remove the corrupted directory tree results in the following:

    device label DATA devid 1 transid 102169 /dev/sda2
    btrfs: enabling auto recovery
    btrfs: disk space caching is enabled
    verify_parent_transid: 12197 callbacks suppressed
    parent transid verify failed on 795062272 wanted 102145 found 101462
    ...
    ------------[ cut here ]------------
    WARNING: at fs/btrfs/super.c:256 __btrfs_abort_transaction+0x43/0xb6()
    Hardware name: MS-7388
    btrfs: Transaction aborted
    ...
    Pid: 24332, comm: rm Not tainted 3.8.0-rc4 #66
    Call Trace:
    [<ffffffff81162000>] ? __btrfs_abort_transaction+0x39/0xb6
    [<ffffffff8102cf85>] warn_slowpath_common+0x7e/0x97
    [<ffffffff8102d032>] warn_slowpath_fmt+0x41/0x43
    [<ffffffff81198d6e>] ? set_extent_dirty+0x1b/0x1d
    [<ffffffff8116200a>] __btrfs_abort_transaction+0x43/0xb6
    [<ffffffff81170db9>] __btrfs_free_extent+0x612/0x64e
    [<ffffffff811950eb>] ? btrfs_get_token_32+0x79/0xc7
    [<ffffffff811b8fa9>] ? btrfs_merge_delayed_refs+0x24b/0x266
    [<ffffffff81173cfe>] run_clustered_refs+0x7e3/0x8b9
    [<ffffffff81176b20>] btrfs_run_delayed_refs+0xde/0x268
    [<ffffffff811843f8>] __btrfs_end_transaction+0xd8/0x2cf
    [<ffffffff8118461a>] btrfs_end_transaction+0xb/0xd
    [<ffffffff81186b15>] __unlink_end_trans+0x5e/0x63
    [<ffffffff8118baf1>] btrfs_unlink+0x86/0xa0
    [<ffffffff810bc29f>] vfs_unlink+0x6f/0xdc
    [<ffffffff810bc3f9>] do_unlinkat+0xed/0x199
    [<ffffffff810b1d2e>] ? vfs_write+0x100/0x127
    [<ffffffff810b1f32>] ? sys_write+0x44/0x75
    [<ffffffff810bdf8a>] sys_unlinkat+0x1d/0x29
    [<ffffffff8147f9d2>] system_call_fastpath+0x16/0x1b
    ---[ end trace ce4d352b0ec7d230 ]---
    BTRFS error (device sda2) in __btrfs_free_extent:5184: IO failure
    btrfs is forced readonly
    btrfs: run_one_delayed_ref returned -5

I've tried btrfsck but it fails as well.  Is there some way I can
remove the damaged data and save the good or is a re-format the only
solution?

  Neil


* Re: scrub fails, any way to recover?
  2013-01-20 15:54 scrub fails, any way to recover? Neil Schemenauer
@ 2013-01-21 11:45 ` David Sterba
From: David Sterba @ 2013-01-21 11:45 UTC
  To: Neil Schemenauer; +Cc: linux-btrfs

On Sun, Jan 20, 2013 at 09:54:38AM -0600, Neil Schemenauer wrote:
> I have a ~350 GB Btrfs filesystem that is corrupted.  I think the
> damage was caused by a bad SATA cable.  I can mount the filesystem
> and read most of the data (I already have backups of most everything).
> 
> The scrub is aborted after a few seconds with the following error in
> the kernel log:
> 
>     parent transid verify failed on 795639808 wanted 102145 found 101462
>     parent transid verify failed on 795639808 wanted 102145 found 101462
>     verify_parent_transid: 16273 callbacks suppressed

The difference between 102145 and 101462 is small and looks like a bunch
of lost writes (i.e. not random corruption); this supports the 'bad cable'
root cause.

From '16273 callbacks suppressed', there is a large number of broken
b-tree connections.
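
To make the arithmetic above concrete, here is a minimal sketch that pulls
the wanted/found transids and the suppressed-message count out of the log
lines quoted earlier. The regular expressions and the summarize() helper
are illustrative assumptions based only on the message format shown in
this thread, not part of any btrfs tool.

```python
import re

# Sample lines copied from the kernel log quoted in this thread.
LOG = """\
parent transid verify failed on 795639808 wanted 102145 found 101462
parent transid verify failed on 795639808 wanted 102145 found 101462
verify_parent_transid: 16273 callbacks suppressed
"""

# Patterns assumed from the message format above (not taken from kernel source).
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)
SUPPRESSED_RE = re.compile(r"verify_parent_transid: (\d+) callbacks suppressed")


def summarize(log_text):
    """Return ({block: (wanted, found, gap)}, total suppressed messages)."""
    blocks = {}
    suppressed = 0
    for line in log_text.splitlines():
        m = TRANSID_RE.search(line)
        if m:
            bytenr, wanted, found = (int(g) for g in m.groups())
            blocks[bytenr] = (wanted, found, wanted - found)
            continue
        m = SUPPRESSED_RE.search(line)
        if m:
            suppressed += int(m.group(1))
    return blocks, suppressed


if __name__ == "__main__":
    blocks, suppressed = summarize(LOG)
    for bytenr, (wanted, found, gap) in sorted(blocks.items()):
        print(f"block {bytenr}: wanted {wanted}, found {found}, gap {gap}")
    print(f"suppressed messages: {suppressed}")
```

For the lines quoted here the gap is 102145 - 101462 = 683 transactions.
Run against a full dmesg dump, the same helper would show how many
distinct blocks are affected and whether they all share the same stale
transid.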

So far the rescue operation is to run btrfs-restore and copy the data
out.

> I've tried btrfsck but it fails as well.  Is there some way I can
> remove the damaged data and save the good or is a re-format the only
> solution?

IIRC removing the damaged data hasn't been proposed yet, but there was a
patch to ignore the failures in a read-only mount:

https://patchwork.kernel.org/patch/913642/
(probably does not apply today)

I think that the -o recovery mode could be extended so that a
read-only + recovery mount would ignore the failures.

I see two ways how to fix the on-disk b-tree structure (via fsck):

1) wipe the broken blocks and unlink them from the b-tree -- but a broken node on
   high level would kill lots of data unpredictably

2) in some cases it would be possible to promote the old transids to the
   current ones (to satisfy the transid verify check), however there may
   be some blocks already overwritten so it only pushes the problem
   farther
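
The trade-off between the two options can be sketched with a toy model.
This is not btrfs code: the Block class, the helper names and the tree
built in build() are all hypothetical, and 'generation' merely stands in
for the transid stored in a block header. It only illustrates why
unlinking a stale high-level node loses everything beneath it, while
promoting transids silences the check without restoring the lost writes.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Toy tree block; 'generation' stands in for the header transid."""
    generation: int
    # key -> (generation the parent expects, child block)
    children: dict = field(default_factory=dict)


def count_leaves(block):
    """Count the leaf blocks reachable from this block."""
    if not block.children:
        return 1
    return sum(count_leaves(child) for _, child in block.children.values())


def find_stale(block, path=()):
    """Yield (path, wanted, found) for each child whose generation mismatches."""
    for key, (wanted, child) in block.children.items():
        if child.generation != wanted:
            yield path + (key,), wanted, child.generation
        else:
            yield from find_stale(child, path + (key,))


def unlink_stale(block):
    """Option 1: drop mismatching children; return the number of leaves lost."""
    lost = 0
    for key, (wanted, child) in list(block.children.items()):
        if child.generation != wanted:
            lost += count_leaves(child)
            del block.children[key]
        else:
            lost += unlink_stale(child)
    return lost


def promote_stale(block):
    """Option 2: rewrite generations so the check passes (contents stay old)."""
    for wanted, child in block.children.values():
        child.generation = wanted
        promote_stale(child)


def build():
    """Hypothetical tree: node 'a' missed its write at generation 102145."""
    leaves = {i: (101462, Block(101462)) for i in range(3)}
    stale_node = Block(101462, leaves)
    good_leaf = Block(102145)
    return Block(102169, {"a": (102145, stale_node), "b": (102145, good_leaf)})


if __name__ == "__main__":
    print(list(find_stale(build())))
    print("leaves lost by unlinking:", unlink_stale(build()))
    root = build()
    promote_stale(root)
    print("mismatches after promotion:", list(find_stale(root)))
```

In this toy tree a single stale interior node takes three leaves with it
under option 1. Under option 2 the mismatch disappears, but node 'a' still
holds whatever was on disk at generation 101462, which is the "pushes the
problem farther" case.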

The level of success depends on the amount of data lost during the cable
unplug and whether data or metadata were affected. The filesystem is more
likely to be rescued if less metadata was affected.

david
