* [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block()
@ 2011-10-09 1:01 Curt Wohlgemuth
2011-10-10 7:19 ` Lukas Czerner
2011-10-26 8:38 ` Ted Ts'o
0 siblings, 2 replies; 6+ messages in thread
From: Curt Wohlgemuth @ 2011-10-09 1:01 UTC (permalink / raw)
To: tytso, adilger.kernel; +Cc: linux-ext4, Curt Wohlgemuth
In ext4_ext_next_allocated_block(), the path[depth] might
have a p_ext that is NULL -- see ext4_ext_binsearch(). In
such a case, dereferencing it will crash the machine.
This patch checks for p_ext == NULL in
ext4_ext_next_allocated_block() before dereferencinging it.
Tested using a hand-crafted an inode with eh_entries == 0 in
an extent block, verified that running FIEMAP on it crashes
without this patch, works fine with it.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
---
fs/ext4/extents.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 57cf568..063a5b8 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1395,7 +1395,9 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
while (depth >= 0) {
if (depth == path->p_depth) {
/* leaf */
- if (path[depth].p_ext !=
+ /* p_ext can be NULL */
+ if (path[depth].p_ext &&
+ path[depth].p_ext !=
EXT_LAST_EXTENT(path[depth].p_hdr))
return le32_to_cpu(path[depth].p_ext[1].ee_block);
} else {
--
1.7.3.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block()
2011-10-09 1:01 [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block() Curt Wohlgemuth
@ 2011-10-10 7:19 ` Lukas Czerner
2011-10-10 15:28 ` Curt Wohlgemuth
2011-10-26 8:38 ` Ted Ts'o
1 sibling, 1 reply; 6+ messages in thread
From: Lukas Czerner @ 2011-10-10 7:19 UTC (permalink / raw)
To: Curt Wohlgemuth; +Cc: tytso, adilger.kernel, linux-ext4
On Sat, 8 Oct 2011, Curt Wohlgemuth wrote:
> In ext4_ext_next_allocated_block(), the path[depth] might
> have a p_ext that is NULL -- see ext4_ext_binsearch(). In
> such a case, dereferencing it will crash the machine.
>
> This patch checks for p_ext == NULL in
> ext4_ext_next_allocated_block() before dereferencinging it.
>
> Tested using a hand-crafted an inode with eh_entries == 0 in
> an extent block, verified that running FIEMAP on it crashes
> without this patch, works fine with it.
Hi Curt,
It seems to me that even that the patch fixes the NULL dereference, it
is not a complete solution. Is it possible that in "normal" case p_ext
would be NULL ? I think that this is only possible in extent split/add
case (as noted in ext4_ext_binsearch()) which should be atomic to the
other operations (locked i_mutex?).
However this seems like an inode corruption so we should probably be
more verbose about it and print an appropriate EXT4_ERROR_INODE() or
even better check for the corrupted tree in the ext4_ext_find_extent()
(instead in ext4_ext_map_blocks()), however this will need to distinguish
between the normal and the tree modification case.
Thanks!
-Lukas
>
> Signed-off-by: Curt Wohlgemuth <curtw@google.com>
> ---
> fs/ext4/extents.c | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 57cf568..063a5b8 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -1395,7 +1395,9 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
> while (depth >= 0) {
> if (depth == path->p_depth) {
> /* leaf */
> - if (path[depth].p_ext !=
> + /* p_ext can be NULL */
> + if (path[depth].p_ext &&
> + path[depth].p_ext !=
> EXT_LAST_EXTENT(path[depth].p_hdr))
> return le32_to_cpu(path[depth].p_ext[1].ee_block);
> } else {
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block()
2011-10-10 7:19 ` Lukas Czerner
@ 2011-10-10 15:28 ` Curt Wohlgemuth
2011-10-11 7:01 ` Dmitry Monakhov
0 siblings, 1 reply; 6+ messages in thread
From: Curt Wohlgemuth @ 2011-10-10 15:28 UTC (permalink / raw)
To: Lukas Czerner; +Cc: tytso, adilger.kernel, linux-ext4
Hi Lukas:
On Mon, Oct 10, 2011 at 12:19 AM, Lukas Czerner <lczerner@redhat.com> wrote:
> On Sat, 8 Oct 2011, Curt Wohlgemuth wrote:
>
>> In ext4_ext_next_allocated_block(), the path[depth] might
>> have a p_ext that is NULL -- see ext4_ext_binsearch(). In
>> such a case, dereferencing it will crash the machine.
>>
>> This patch checks for p_ext == NULL in
>> ext4_ext_next_allocated_block() before dereferencinging it.
>>
>> Tested using a hand-crafted an inode with eh_entries == 0 in
>> an extent block, verified that running FIEMAP on it crashes
>> without this patch, works fine with it.
>
> Hi Curt,
>
> It seems to me that even that the patch fixes the NULL dereference, it
> is not a complete solution. Is it possible that in "normal" case p_ext
> would be NULL ? I think that this is only possible in extent split/add
> case (as noted in ext4_ext_binsearch()) which should be atomic to the
> other operations (locked i_mutex?).
Yes, unfortunately, it is possible in "normal" cases for p_ext to be NULL.
We've seen this problem during what appears to be a race between an
inode growth (or truncate?) and another task doing a FIEMAP ioctl.
The real problem is that FIEMAP handing in ext4 is just, well, buggy?
ext4_ext_walk_space() will get the i_data_sem, construct the path
array, then release the semaphore. But then it does a bazillion
accesses on the extent/header/index pointers in the path array, with
no protection against truncate, growth, or any other changes. As far
as I can tell, this is the only use of a path array retrieved from
ext4_ext_find_extent() that isn't completely covered by i_data_sem.
> However this seems like an inode corruption so we should probably be
> more verbose about it and print an appropriate EXT4_ERROR_INODE() or
> even better check for the corrupted tree in the ext4_ext_find_extent()
> (instead in ext4_ext_map_blocks()), however this will need to distinguish
> between the normal and the tree modification case.
What we've observed many times is a crash during a FIEMAP call to
ext4_ext_next_allocated_block(), which appears to me to be during a
race with another thread that's splitting the extent tree. This
causes the machine to go down with the inode in a bad state. But of
course, fsck won't detect and fix this, so when the machine comes back
up, and a FIEMAP call is done on this same inode -- without any other
threads -- it'll crash again. Hence a nasty crash loop.
So you're right, in that this isn't the "real solution." But devising
a safe, non-racy design for FIEMAP is not so simple, unless of course
you want to just hold the i_data_sem during the entire loop body of
ext4_ext_walk_space(), which would be pretty ugly. Hence the
"band-aid" approach in my patch, which at least seems correct, if not
thorough.
Thanks,
Curt
>
> Thanks!
> -Lukas
>
>>
>> Signed-off-by: Curt Wohlgemuth <curtw@google.com>
>> ---
>> fs/ext4/extents.c | 4 +++-
>> 1 files changed, 3 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index 57cf568..063a5b8 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -1395,7 +1395,9 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
>> while (depth >= 0) {
>> if (depth == path->p_depth) {
>> /* leaf */
>> - if (path[depth].p_ext !=
>> + /* p_ext can be NULL */
>> + if (path[depth].p_ext &&
>> + path[depth].p_ext !=
>> EXT_LAST_EXTENT(path[depth].p_hdr))
>> return le32_to_cpu(path[depth].p_ext[1].ee_block);
>> } else {
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block()
2011-10-10 15:28 ` Curt Wohlgemuth
@ 2011-10-11 7:01 ` Dmitry Monakhov
2011-10-26 8:26 ` Ted Ts'o
0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Monakhov @ 2011-10-11 7:01 UTC (permalink / raw)
To: Curt Wohlgemuth, Lukas Czerner; +Cc: tytso, adilger.kernel, linux-ext4
[-- Attachment #1: Type: text/plain, Size: 2280 bytes --]
On Mon, 10 Oct 2011 08:28:23 -0700, Curt Wohlgemuth <curtw@google.com> wrote:
> Hi Lukas:
>
> On Mon, Oct 10, 2011 at 12:19 AM, Lukas Czerner <lczerner@redhat.com> wrote:
> > On Sat, 8 Oct 2011, Curt Wohlgemuth wrote:
> >
> >> In ext4_ext_next_allocated_block(), the path[depth] might
> >> have a p_ext that is NULL -- see ext4_ext_binsearch(). In
> >> such a case, dereferencing it will crash the machine.
> >>
> >> This patch checks for p_ext == NULL in
> >> ext4_ext_next_allocated_block() before dereferencinging it.
> >>
> >> Tested using a hand-crafted an inode with eh_entries == 0 in
> >> an extent block, verified that running FIEMAP on it crashes
> >> without this patch, works fine with it.
> >
> > Hi Curt,
> >
> > It seems to me that even that the patch fixes the NULL dereference, it
> > is not a complete solution. Is it possible that in "normal" case p_ext
> > would be NULL ? I think that this is only possible in extent split/add
> > case (as noted in ext4_ext_binsearch()) which should be atomic to the
> > other operations (locked i_mutex?).
>
> Yes, unfortunately, it is possible in "normal" cases for p_ext to be NULL.
>
> We've seen this problem during what appears to be a race between an
> inode growth (or truncate?) and another task doing a FIEMAP ioctl.
> The real problem is that FIEMAP handing in ext4 is just, well, buggy?
Wow, IMHO it not just buggy, it is obviously incorrect, IMHO it is more
fair just return -ENOTSUPP, at least it is much safer.
Yes calling FIEMAP and truncate/write in parallel is stupid, but not
prohibited.
>
> ext4_ext_walk_space() will get the i_data_sem, construct the path
> array, then release the semaphore. But then it does a bazillion
> accesses on the extent/header/index pointers in the path array, with
> no protection against truncate, growth, or any other changes. As far
> as I can tell, this is the only use of a path array retrieved from
> ext4_ext_find_extent() that isn't completely covered by i_data_sem.
In that case i_data sem protects us from nothing. Path collected can
simply disappear under us. And in fact i don't understand the
reason why we drop i_data_sem too soon. Are any reason to do that?
Seems like following patch should fix the issue.
[-- Attachment #2: 0001-ext4-do-not-drop-i_data_sem-too-soon.patch --]
[-- Type: text/plain, Size: 2115 bytes --]
>From 12cd56ccd86c4d132f186034a9c11b0a2441a19f Mon Sep 17 00:00:00 2001
From: Dmitry Monakhov <dmonakhov@openvz.org>
Date: Tue, 11 Oct 2011 10:44:31 +0400
Subject: [PATCH] ext4: do not drop i_data_sem too soon
Path returned from ext4_find_extent is valid only while we hold
i_data_sem, so we can drop it only after we nolonger need it.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
fs/ext4/extents.c | 20 +++++++++++---------
1 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index da4583f..c716a1f 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1847,23 +1847,32 @@ static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
num = last - block;
/* find extent for this block */
down_read(&EXT4_I(inode)->i_data_sem);
+ if (ext_depth(inode) != depth) {
+ /* depth was changed. we have to realloc path */
+ kfree(path);
+ path = NULL;
+ }
+
path = ext4_ext_find_extent(inode, block, path);
- up_read(&EXT4_I(inode)->i_data_sem);
if (IS_ERR(path)) {
err = PTR_ERR(path);
+ up_read(&EXT4_I(inode)->i_data_sem);
path = NULL;
break;
}
depth = ext_depth(inode);
if (unlikely(path[depth].p_hdr == NULL)) {
+ up_read(&EXT4_I(inode)->i_data_sem);
EXT4_ERROR_INODE(inode, "path[%d].p_hdr == NULL", depth);
err = -EIO;
break;
}
+
ex = path[depth].p_ext;
next = ext4_ext_next_allocated_block(path);
-
+ up_read(&EXT4_I(inode)->i_data_sem);
+ ext4_ext_drop_refs(path);
exists = 0;
if (!ex) {
/* there is no extent yet, so try to allocate
@@ -1915,7 +1924,6 @@ static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
break;
}
err = func(inode, next, &cbex, ex, cbdata);
- ext4_ext_drop_refs(path);
if (err < 0)
break;
@@ -1927,12 +1935,6 @@ static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
break;
}
- if (ext_depth(inode) != depth) {
- /* depth was changed. we have to realloc path */
- kfree(path);
- path = NULL;
- }
-
block = cbex.ec_block + cbex.ec_len;
}
--
1.7.1
[-- Attachment #3: Type: text/plain, Size: 2554 bytes --]
>
> > However this seems like an inode corruption so we should probably be
> > more verbose about it and print an appropriate EXT4_ERROR_INODE() or
> > even better check for the corrupted tree in the ext4_ext_find_extent()
> > (instead in ext4_ext_map_blocks()), however this will need to distinguish
> > between the normal and the tree modification case.
>
> What we've observed many times is a crash during a FIEMAP call to
> ext4_ext_next_allocated_block(), which appears to me to be during a
> race with another thread that's splitting the extent tree. This
> causes the machine to go down with the inode in a bad state. But of
> course, fsck won't detect and fix this, so when the machine comes back
> up, and a FIEMAP call is done on this same inode -- without any other
> threads -- it'll crash again. Hence a nasty crash loop.
>
> So you're right, in that this isn't the "real solution." But devising
> a safe, non-racy design for FIEMAP is not so simple, unless of course
> you want to just hold the i_data_sem during the entire loop body of
> ext4_ext_walk_space(), which would be pretty ugly. Hence the
> "band-aid" approach in my patch, which at least seems correct, if not
> thorough.
>
> Thanks,
> Curt
>
> >
> > Thanks!
> > -Lukas
> >
> >>
> >> Signed-off-by: Curt Wohlgemuth <curtw@google.com>
> >> ---
> >> fs/ext4/extents.c | 4 +++-
> >> 1 files changed, 3 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> >> index 57cf568..063a5b8 100644
> >> --- a/fs/ext4/extents.c
> >> +++ b/fs/ext4/extents.c
> >> @@ -1395,7 +1395,9 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
> >> while (depth >= 0) {
> >> if (depth == path->p_depth) {
> >> /* leaf */
> >> - if (path[depth].p_ext !=
> >> + /* p_ext can be NULL */
> >> + if (path[depth].p_ext &&
> >> + path[depth].p_ext !=
> >> EXT_LAST_EXTENT(path[depth].p_hdr))
> >> return le32_to_cpu(path[depth].p_ext[1].ee_block);
> >> } else {
> >>
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block()
2011-10-11 7:01 ` Dmitry Monakhov
@ 2011-10-26 8:26 ` Ted Ts'o
0 siblings, 0 replies; 6+ messages in thread
From: Ted Ts'o @ 2011-10-26 8:26 UTC (permalink / raw)
To: Dmitry Monakhov
Cc: Curt Wohlgemuth, Lukas Czerner, adilger.kernel, linux-ext4
On Tue, Oct 11, 2011 at 11:01:57AM +0400, Dmitry Monakhov wrote:
> > ext4_ext_walk_space() will get the i_data_sem, construct the path
> > array, then release the semaphore. But then it does a bazillion
> > accesses on the extent/header/index pointers in the path array, with
> > no protection against truncate, growth, or any other changes. As far
> > as I can tell, this is the only use of a path array retrieved from
> > ext4_ext_find_extent() that isn't completely covered by i_data_sem.
>
> In that case i_data sem protects us from nothing. Path collected can
> simply disappear under us. And in fact i don't understand the
> reason why we drop i_data_sem too soon. Are any reason to do that?
The one concern I have is that I don't want FIEMAP to slow down "real"
ext4 processing. So what I've hoping we'll be able to do is use a
seq_lock sort of design, where if the pointer changes out from under
us, FIEMAP is forced to redo its sampling. But if there is some crazy
userspace program which is calling FIEMAP all the time, I'd much
rather that it not block ext4_map_blocks() if possible (which is what
I using a seqlock to protect the FIEMAP routines would help).
- Ted
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block()
2011-10-09 1:01 [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block() Curt Wohlgemuth
2011-10-10 7:19 ` Lukas Czerner
@ 2011-10-26 8:38 ` Ted Ts'o
1 sibling, 0 replies; 6+ messages in thread
From: Ted Ts'o @ 2011-10-26 8:38 UTC (permalink / raw)
To: Curt Wohlgemuth; +Cc: adilger.kernel, linux-ext4
On Sat, Oct 08, 2011 at 06:01:14PM -0700, Curt Wohlgemuth wrote:
> In ext4_ext_next_allocated_block(), the path[depth] might
> have a p_ext that is NULL -- see ext4_ext_binsearch(). In
> such a case, dereferencing it will crash the machine.
>
> This patch checks for p_ext == NULL in
> ext4_ext_next_allocated_block() before dereferencinging it.
>
> Tested using a hand-crafted an inode with eh_entries == 0 in
> an extent block, verified that running FIEMAP on it crashes
> without this patch, works fine with it.
>
> Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Thanks, applied.
- Ted
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2011-10-26 8:38 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-10-09 1:01 [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block() Curt Wohlgemuth
2011-10-10 7:19 ` Lukas Czerner
2011-10-10 15:28 ` Curt Wohlgemuth
2011-10-11 7:01 ` Dmitry Monakhov
2011-10-26 8:26 ` Ted Ts'o
2011-10-26 8:38 ` Ted Ts'o
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).