Re: crash while trying to access corrupt fs

From: Stefan Behrens <sbehrens@giantdisaster.de>
To: Liu Bo <bo.li.liu@oracle.com>
Cc: tubalcane@earthlink.net, linux-btrfs@vger.kernel.org
Subject: Re: crash while trying to access corrupt fs
Date: Mon, 27 Aug 2012 18:12:30 +0200
Message-ID: <503B9C6E.10200@giantdisaster.de>
In-Reply-To: <503B92DD.4010804@oracle.com>

On Mon, 27 Aug 2012 23:31:41 +0800, Liu Bo wrote:
> On 08/27/2012 07:12 PM, Stefan Behrens wrote:
>> On Sun, 26 Aug 2012 16:07:33 -0400 (EDT), tubalcane wrote:
>>> I'm primarily interested in the block level checksums of files and the
>>> scrubbing
>>> feature to detect corrupt files.  Currently I use ext4 and create and keep
>>> md5sums of everything which is tedious but I care about my data (quadruple
>>> backups including offsite)
>>>
> [...]
>>> Aug 25 11:37:24 bubblegum kernel: [ 1183.835479]  [<ffffffffa04d344a>]
>>> btrfs_find_device_for_logical+0x4a/0xa0 [btrfs]
>>> Aug 25 11:37:24 bubblegum kernel: [ 1183.836717]  [<ffffffffa04c6955>]
>>> end_bio_extent_readpage+0x105/0xa80 [btrfs]
>>> Aug 25 11:37:24 bubblegum kernel: [ 1183.837938]  [<ffffffff81173569>] ?
>>> kfree+0x139/0x160
>>> Aug 25 11:37:24 bubblegum kernel: [ 1183.839157]  [<ffffffff811baaad>]
>>> bio_endio+0x1d/0x40
>>> Aug 25 11:37:24 bubblegum kernel: [ 1183.840395]  [<ffffffffa049be81>]
>>> end_workqueue_fn+0x41/0x50 [btrfs]
>>> Aug 25 11:37:24 bubblegum kernel: [ 1183.841635]  [<ffffffffa04d4d46>]
>>> worker_loop+0x136/0x580 [btrfs]
>>
>> That crash is a bug which I have introduced with the IO error stats. It can happen after checksum errors are detected.
>> I'll send a patch to (temporarily) remove the counting for checksum errors in the IO error stats.
> 
> Just out of curiosity, isn't it fixable due to your design, Stefan?
> Why not try to fix the bug?

Yes, it is fixable. But the fix is complicated (and a likely source of
new errors), and I wanted to quickly prevent any more harm from this
bug. People who hit it get a kernel crash whenever they access the
corrupted part of the filesystem.

The right btrfs_device pointer is needed in order to find the statistic
counters to increment. One would need to take some code from
bio_readpage_error() and some code from repair_io_failure() to retrieve
the btrfs_device pointer, and that would be a rather large amount of
additional code. But maybe I am just not seeing the simple way to do it.
Any simple solution would be appreciated.
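
To illustrate why the counters depend on the device pointer, here is a
minimal userspace sketch in C. It is a toy model under stated
assumptions, not the btrfs code: struct toy_device, toy_find_device()
and toy_count_csum_error() are hypothetical names, and the real mapping
from a logical address to a device is considerably more involved. The
only point is that a per-device counter can be incremented safely only
after the device lookup has succeeded.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real btrfs types. */
enum toy_dev_stat {
	TOY_STAT_READ_ERRS,
	TOY_STAT_CORRUPTION_ERRS,
	TOY_STAT_MAX
};

struct toy_device {
	uint64_t devid;
	uint64_t start;			/* first logical byte on this device */
	uint64_t len;			/* length of the mapped range */
	uint64_t stats[TOY_STAT_MAX];	/* per-device error counters */
};

/* Return the device whose range contains 'logical', or NULL if none. */
static struct toy_device *toy_find_device(struct toy_device *devs,
					  size_t n, uint64_t logical)
{
	for (size_t i = 0; i < n; i++) {
		if (logical >= devs[i].start &&
		    logical - devs[i].start < devs[i].len)
			return &devs[i];
	}
	return NULL;
}

/*
 * Count a checksum error against the device that holds 'logical'.
 * Returns 0 on success, -1 if no device could be found; in that case
 * no counter is touched and no pointer is dereferenced.
 */
static int toy_count_csum_error(struct toy_device *devs, size_t n,
				uint64_t logical)
{
	struct toy_device *dev = toy_find_device(devs, n, logical);

	if (!dev)
		return -1;
	dev->stats[TOY_STAT_CORRUPTION_ERRS]++;
	return 0;
}
```

In this model the caller can simply skip the accounting when the lookup
fails, which is roughly the trade-off discussed above: dropping the
count is cheap, while finding the right device reliably is the
expensive part.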

