linux-ext4.vger.kernel.org archive mirror
From: Curt Wohlgemuth <curtw@google.com>
To: Lukas Czerner <lczerner@redhat.com>
Cc: tytso@mit.edu, adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org
Subject: Re: [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block()
Date: Mon, 10 Oct 2011 08:28:23 -0700
Message-ID: <CAO81RMbLGDUihKORjQ5rGeRHPHg5ivvQtOGJL_oBe4QUiiO0hg@mail.gmail.com>
In-Reply-To: <alpine.LFD.2.00.1110100841380.4851@dhcp-27-109.brq.redhat.com>

Hi Lukas:

On Mon, Oct 10, 2011 at 12:19 AM, Lukas Czerner <lczerner@redhat.com> wrote:
> On Sat, 8 Oct 2011, Curt Wohlgemuth wrote:
>
>> In ext4_ext_next_allocated_block(), the path[depth] might
>> have a p_ext that is NULL -- see ext4_ext_binsearch().  In
>> such a case, dereferencing it will crash the machine.
>>
>> This patch checks for p_ext == NULL in
>> ext4_ext_next_allocated_block() before dereferencing it.
>>
>> Tested using a hand-crafted inode with eh_entries == 0 in
>> an extent block, verified that running FIEMAP on it crashes
>> without this patch, works fine with it.
>
> Hi Curt,
>
> It seems to me that even though the patch fixes the NULL dereference, it
> is not a complete solution. Is it possible that in "normal" case p_ext
> would be NULL ? I think that this is only possible in extent split/add
> case (as noted in ext4_ext_binsearch()) which should be atomic to the
> other operations (locked i_mutex?).

Yes, unfortunately, it is possible in "normal" cases for p_ext to be NULL.

We've seen this problem during what appears to be a race between an
inode growth (or truncate?) and another task doing a FIEMAP ioctl.
The real problem is that FIEMAP handling in ext4 is just, well, buggy?

ext4_ext_walk_space() will get the i_data_sem, construct the path
array, then release the semaphore.  But then it does a bazillion
accesses on the extent/header/index pointers in the path array, with
no protection against truncate, growth, or any other changes.  As far
as I can tell, this is the only use of a path array retrieved from
ext4_ext_find_extent() that isn't completely covered by i_data_sem.
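
To make the window concrete, here is a minimal userspace sketch of the
pattern described above: the path is built under the lock, the lock is
dropped, and the snapshot is then used without protection. Everything
named toy_* is hypothetical and only models the idea; it is not the
kernel's structures or locking API, and the "lock" is a plain flag so
the interleaving can be replayed deterministically in one thread.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, NOT the kernel structures. */
struct toy_leaf {
    int entries;          /* models eh_entries */
    unsigned start[4];    /* models the starting block of each extent */
};

struct toy_path {
    struct toy_leaf *hdr; /* models p_hdr */
    unsigned *ext;        /* models p_ext; NULL when the leaf is empty */
};

static int toy_sem_held;  /* models i_data_sem as a simple flag */

static void toy_down(void) { assert(!toy_sem_held); toy_sem_held = 1; }
static void toy_up(void)   { assert(toy_sem_held);  toy_sem_held = 0; }

/* Build the path under the lock, then drop the lock. */
static struct toy_path toy_build_path(struct toy_leaf *leaf)
{
    struct toy_path p;

    toy_down();
    p.hdr = leaf;
    p.ext = leaf->entries > 0 ? &leaf->start[0] : NULL;
    toy_up();
    return p;             /* from here on the snapshot is unprotected */
}

/* A modifier (for example a truncate) that empties the leaf. */
static void toy_truncate(struct toy_leaf *leaf)
{
    toy_down();
    leaf->entries = 0;
    toy_up();
}

/* Returns 1 if the snapshot no longer agrees with the leaf it points into. */
static int toy_path_is_stale(const struct toy_path *p)
{
    int has_ext = (p->ext != NULL);
    int leaf_has_entries = (p->hdr->entries > 0);

    return has_ext != leaf_has_entries;
}
```

Calling toy_build_path(), then toy_truncate(), then toy_path_is_stale()
on the same leaf reports a stale snapshot: both operations took the
lock correctly, yet the path outlived the locked region.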

> However this seems like an inode corruption so we should probably be
> more verbose about it and print an appropriate EXT4_ERROR_INODE() or
> even better check for the corrupted tree in the ext4_ext_find_extent()
> (instead of in ext4_ext_map_blocks()), however this will need to distinguish
> between the normal and the tree modification case.

What we've observed many times is a crash during a FIEMAP call to
ext4_ext_next_allocated_block(), which appears to me to be during a
race with another thread that's splitting the extent tree.  This
causes the machine to go down with the inode in a bad state.  But of
course, fsck won't detect and fix this, so when the machine comes back
up, and a FIEMAP call is done on this same inode -- without any other
threads -- it'll crash again.  Hence a nasty crash loop.

So you're right, in that this isn't the "real solution."  But devising
a safe, non-racy design for FIEMAP is not so simple, unless of course
you want to just hold the i_data_sem during the entire loop body of
ext4_ext_walk_space(), which would be pretty ugly.  Hence the
"band-aid" approach in my patch, which at least seems correct, if not
thorough.
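
For illustration, the effect of the NULL check can be modelled in
userspace for a single leaf. This is a sketch under assumptions, not
the kernel code: the toy_* types, the TOY_NO_NEXT sentinel, and the
linear scan standing in for ext4_ext_binsearch() are all hypothetical,
and the index levels of the tree are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NO_NEXT UINT32_MAX  /* hypothetical "no further extent" value */

struct toy_extent { uint32_t ee_block; };

struct toy_leaf {
    uint16_t eh_entries;
    struct toy_extent ext[4];
};

struct toy_path {
    struct toy_leaf *p_hdr;
    struct toy_extent *p_ext;   /* stays NULL when the leaf has no entries */
};

/*
 * Simplified lookup (linear scan, not a binary search). For an empty
 * leaf p_ext is left NULL; otherwise it ends up on the last extent that
 * starts at or before 'block', or on the first extent if none does.
 */
static void toy_find(struct toy_leaf *leaf, uint32_t block,
                     struct toy_path *path)
{
    path->p_hdr = leaf;
    path->p_ext = NULL;
    if (leaf->eh_entries == 0)
        return;
    path->p_ext = &leaf->ext[0];
    for (uint16_t i = 1; i < leaf->eh_entries; i++) {
        if (leaf->ext[i].ee_block <= block)
            path->p_ext = &leaf->ext[i];
    }
}

static uint32_t toy_next_allocated_block(const struct toy_path *path)
{
    const struct toy_leaf *leaf = path->p_hdr;

    /*
     * The guard: without the NULL test, an empty leaf would reach the
     * p_ext[1] read below through a NULL pointer.
     */
    if (path->p_ext && path->p_ext != &leaf->ext[leaf->eh_entries - 1])
        return path->p_ext[1].ee_block;
    return TOY_NO_NEXT;
}
```

With two extents starting at blocks 0 and 100, a lookup of block 5
yields 100 as the next allocated block; once eh_entries is set to 0,
the same call returns TOY_NO_NEXT instead of dereferencing NULL.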

Thanks,
Curt

>
> Thanks!
> -Lukas
>
>>
>> Signed-off-by: Curt Wohlgemuth <curtw@google.com>
>> ---
>>  fs/ext4/extents.c |    4 +++-
>>  1 files changed, 3 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index 57cf568..063a5b8 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -1395,7 +1395,9 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
>>       while (depth >= 0) {
>>               if (depth == path->p_depth) {
>>                       /* leaf */
>> -                     if (path[depth].p_ext !=
>> +                     /* p_ext can be NULL */
>> +                     if (path[depth].p_ext &&
>> +                             path[depth].p_ext !=
>>                                       EXT_LAST_EXTENT(path[depth].p_hdr))
>>                         return le32_to_cpu(path[depth].p_ext[1].ee_block);
>>               } else {
>>
>

Thread overview: 6+ messages
2011-10-09  1:01 [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block() Curt Wohlgemuth
2011-10-10  7:19 ` Lukas Czerner
2011-10-10 15:28   ` Curt Wohlgemuth [this message]
2011-10-11  7:01     ` Dmitry Monakhov
2011-10-26  8:26       ` Ted Ts'o
2011-10-26  8:38 ` Ted Ts'o