From: Curt Wohlgemuth <curtw@google.com>
To: ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: ext4 inode corruption
Date: Wed, 23 Sep 2009 15:50:53 -0700
Message-ID: <6601abe90909231550g5b55f277l218560c827693322@mail.gmail.com>
In-Reply-To: <6601abe90909230927m6d45cd75wef3525fc23837110@mail.gmail.com>

Sorry to reply to self, but I'm now pretty sure that I understand this
problem.  (Of course this insight came mere hours after I sent this
email -- and not in the previous 4 days of staring at it.)

It's likely the same issue fixed by

       commit 1b774f669b4b02f4d2abf2792362ab72a2e124ab
       ext4: Use bforget() in no journal mode for ext4_journal_{forget,revoke}()

In the case fixed by that commit, in no-journal mode an about-to-be-freed
metadata block is marked dirty and available for writeback.  The block is
then marked free and re-used as a data block for a different inode; the
writeback then takes place, corrupting the data block.
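
A rough user-space model of that sequence may make the ordering easier to
see.  This is only a sketch: the toy_* names, the tiny in-memory "disk",
and the forget flag are all made up for illustration and are not kernel
interfaces.  The flag stands in for what bforget() achieves, namely that
the freed block's buffer is no longer dirty when writeback runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NBLOCKS 4
#define TOY_BLKSZ   16

/* Hypothetical stand-in for a cached buffer; not the kernel's buffer_head. */
struct toy_buffer {
    int  blocknr;               /* disk block this buffer caches */
    bool dirty;                 /* still queued for writeback? */
    char data[TOY_BLKSZ];
};

char toy_disk[TOY_NBLOCKS][TOY_BLKSZ];

/* Free the block.  Without "forget", the dirty bit is left set. */
void toy_free_block(struct toy_buffer *bh, bool forget)
{
    if (forget)
        bh->dirty = false;      /* roughly the effect of bforget() */
}

/* Writeback flushes the buffer if it is still dirty. */
void toy_writeback(struct toy_buffer *bh)
{
    assert(bh->blocknr >= 0 && bh->blocknr < TOY_NBLOCKS);
    if (bh->dirty) {
        memcpy(toy_disk[bh->blocknr], bh->data, TOY_BLKSZ);
        bh->dirty = false;
    }
}

/* Returns true if the new owner's contents survive the late writeback. */
bool toy_reuse_survives(bool forget)
{
    struct toy_buffer bh = { .blocknr = 2, .dirty = true };
    const char newdata[TOY_BLKSZ] = "new owner data";

    memcpy(bh.data, "old metadata", 13);        /* stale metadata contents */
    toy_free_block(&bh, forget);                /* block is marked free */
    memcpy(toy_disk[2], newdata, TOY_BLKSZ);    /* re-used by another inode */
    toy_writeback(&bh);                         /* stale buffer flushed late */

    return memcmp(toy_disk[2], newdata, TOY_BLKSZ) == 0;
}
```

In this model toy_reuse_survives(false) reports that the new contents were
overwritten, while toy_reuse_survives(true) leaves them intact.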

In this case, the newly-freed block is re-used as a *metadata* block
for a different inode.  Hence the same pattern we were seeing before:
eh_entries = 0, eh_max = 340.
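
As a side note on where those numbers come from: an extent header and each
extent or index entry take 12 bytes on disk, so eh_max is the space left
after the header divided by 12.  The sketch below assumes a 4096-byte
block size (not stated in this thread, but consistent with eh_max = 340)
and the 60-byte i_block area in the inode; ext_capacity() is a
hypothetical helper, not a kernel function.

```c
#include <assert.h>

/* On-disk sizes: the header and each extent/index entry are 12 bytes. */
#define EXT_HEADER_SIZE 12u
#define EXT_ENTRY_SIZE  12u

/* Entries that fit in `space` bytes once the header is accounted for. */
unsigned ext_capacity(unsigned space)
{
    assert(space >= EXT_HEADER_SIZE);
    return (space - EXT_HEADER_SIZE) / EXT_ENTRY_SIZE;
}
```

With these assumptions, ext_capacity(4096) gives 340 for a leaf block and
ext_capacity(60) gives 4 for the in-inode root, matching the eh_max values
in the dumps quoted below.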

These corrupted inodes were left behind by kernels without the above
patch.  The patch prevents new corruption but does not repair inodes
that are already damaged, so accessing the files on *patched* kernels
will still make the BUG fire, hence the confusion.

Thanks,
Curt


On Wed, Sep 23, 2009 at 9:27 AM, Curt Wohlgemuth <curtw@google.com> wrote:
> We've been seeing sporadic inode corruption on our ext4 partitions which
> we've been trying to analyze, without much success.  I'm wondering if
> anybody might have some clues as to where things might be going wrong.
>
> We find out about the corruption via a BUG firing in ext4_ext_get_blocks():
>
>        /*
>         * consistent leaf must not be empty;
>         * this situation is possible, though, _during_ tree modification;
>         * this is why assert can't be put in ext4_ext_find_extent()
>         */
>        BUG_ON(path[depth].p_ext == NULL && depth != 0);
>
> Of course, this fires long after the inode in question is corrupted.  With
> some diagnostics added in front of this bug, we can find the inodes; they
> all have characteristics like this:
>
> Output from debugfs' stat command:
>
>   Inode: 1195575   Type: regular    Mode:  0600   Flags: 0x80000
>   Generation: 2821101782    Version: 0x00000001
>   User: 35800   Group:  5000   Size: 8400896
>   File ACL: 0    Directory ACL: 0
>   Links: 1   Blockcount: 8
>   Fragment:  Address: 0    Number: 0    Size: 0
>   ctime: 0x4a9f8009 -- Thu Sep  3 01:36:25 2009
>   atime: 0x4a9f7ff7 -- Thu Sep  3 01:36:07 2009
>   mtime: 0x4a9f8009 -- Thu Sep  3 01:36:25 2009
>   EXTENTS:
>
> Note that no data blocks are printed out here.
>
> Following the actual extent tree, it always looks like this:
>
>   in-inode extent header:
>     eh_magic: 0xf30a
>     eh_entries: 1
>     eh_max: 4
>     eh_depth: 1
>
>   in-inode extent index 0:
>     ei_block: 0
>     ei_leaf_lo: 36738577
>     ei_leaf_hi: 0
>
>      leaf node header (at block 36738577):
>        eh_magic: 0xf30a
>        eh_entries: 0
>        eh_max: 340
>        eh_depth: 0
>
> The i_size value of the inode will vary, from 8192 to 8400896.  But the
> i_blocks value is *always* 8.
>
> The extent tree always has depth of 1 in the in-inode header, and a valid
> leaf node header; but the leaf node header always has 0 entries.  This is
> what's causing the BUG above to fire.
>
> We believe the general pattern of user space calls to create these files is
> something like this:
>
>   open(O_DIRECT)
>   fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 8400896)
>   < various writes to the file >
>   fallocate(fd, 0, 0, actual_size + BLOCK_SIZE)
>   ftruncate(fd, actual_size)
>
> The second fallocate() call without KEEP_SIZE allows the following
> ftruncate to actually truncate the file -- a known issue recently fixed by
> Jiaying Zhang (but her fix is not in our kernel yet).  "actual_size" can be
> 0 at times.
>
> I can't think of any actions that would cause the i_size to be so large, yet
> the i_blocks always be 8.  Looking at the code in
>
>   ext4_ext_remove_space()
>   ext4_ext_rm_leaf()
>   ext4_ext_rm_idx()
>
> I don't see a way for the extent tree to take the shape above.  There are no
> errors that I can see around the time the corrupted inodes are created.  It
> *seems* as though the corruption is coming during truncation, but all our
> efforts to reproduce this with small test cases have so far failed.
>
> We're using a 2.6.26 code base, with most of the latest ext4 patches
> applied.
>
> Any insights/ruminations/guesses as to what might be happening are welcome.
>
> Thanks,
> Curt
>
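
For anyone wanting to scan for inodes with the shape described in the
quoted message, a check along the following lines might serve as a
starting point.  It is a sketch only: the toy_* names are hypothetical,
the caller is assumed to have already read the first 12 bytes of the leaf
block, and the test simply mirrors the condition in the quoted BUG_ON
(an empty leaf below a non-zero-depth root).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_EXT_MAGIC 0xf30a

/* First four fields of the on-disk extent header, in host byte order. */
struct toy_ext_header {
    uint16_t magic;
    uint16_t entries;
    uint16_t max;
    uint16_t depth;
};

/* Decode a little-endian 16-bit value from raw bytes. */
uint16_t toy_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Parse the header fields from the start of a raw node buffer. */
struct toy_ext_header toy_parse_header(const unsigned char *raw)
{
    struct toy_ext_header h;

    assert(raw != NULL);
    h.magic   = toy_le16(raw + 0);
    h.entries = toy_le16(raw + 2);
    h.max     = toy_le16(raw + 4);
    h.depth   = toy_le16(raw + 6);
    return h;
}

/* True for a valid-looking but empty leaf under a root with depth != 0. */
bool toy_leaf_trips_bug(const struct toy_ext_header *leaf, unsigned tree_depth)
{
    return leaf->magic == TOY_EXT_MAGIC &&
           leaf->depth == 0 &&
           leaf->entries == 0 &&
           tree_depth != 0;
}
```

Fed the leaf header from the dump above (magic 0xf30a, 0 entries, max 340,
depth 0) with the root's eh_depth of 1, toy_leaf_trips_bug() returns true.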