* [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block() @ 2011-10-09 1:01 Curt Wohlgemuth 2011-10-10 7:19 ` Lukas Czerner 2011-10-26 8:38 ` Ted Ts'o 0 siblings, 2 replies; 6+ messages in thread From: Curt Wohlgemuth @ 2011-10-09 1:01 UTC (permalink / raw) To: tytso, adilger.kernel; +Cc: linux-ext4, Curt Wohlgemuth In ext4_ext_next_allocated_block(), the path[depth] might have a p_ext that is NULL -- see ext4_ext_binsearch(). In such a case, dereferencing it will crash the machine. This patch checks for p_ext == NULL in ext4_ext_next_allocated_block() before dereferencinging it. Tested using a hand-crafted an inode with eh_entries == 0 in an extent block, verified that running FIEMAP on it crashes without this patch, works fine with it. Signed-off-by: Curt Wohlgemuth <curtw@google.com> --- fs/ext4/extents.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 57cf568..063a5b8 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -1395,7 +1395,9 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path) while (depth >= 0) { if (depth == path->p_depth) { /* leaf */ - if (path[depth].p_ext != + /* p_ext can be NULL */ + if (path[depth].p_ext && + path[depth].p_ext != EXT_LAST_EXTENT(path[depth].p_hdr)) return le32_to_cpu(path[depth].p_ext[1].ee_block); } else { -- 1.7.3.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block() 2011-10-09 1:01 [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block() Curt Wohlgemuth @ 2011-10-10 7:19 ` Lukas Czerner 2011-10-10 15:28 ` Curt Wohlgemuth 2011-10-26 8:38 ` Ted Ts'o 1 sibling, 1 reply; 6+ messages in thread From: Lukas Czerner @ 2011-10-10 7:19 UTC (permalink / raw) To: Curt Wohlgemuth; +Cc: tytso, adilger.kernel, linux-ext4 On Sat, 8 Oct 2011, Curt Wohlgemuth wrote: > In ext4_ext_next_allocated_block(), the path[depth] might > have a p_ext that is NULL -- see ext4_ext_binsearch(). In > such a case, dereferencing it will crash the machine. > > This patch checks for p_ext == NULL in > ext4_ext_next_allocated_block() before dereferencinging it. > > Tested using a hand-crafted an inode with eh_entries == 0 in > an extent block, verified that running FIEMAP on it crashes > without this patch, works fine with it. Hi Curt, It seems to me that even that the patch fixes the NULL dereference, it is not a complete solution. Is it possible that in "normal" case p_ext would be NULL ? I think that this is only possible in extent split/add case (as noted in ext4_ext_binsearch()) which should be atomic to the other operations (locked i_mutex?). However this seems like an inode corruption so we should probably be more verbose about it and print an appropriate EXT4_ERROR_INODE() or even better check for the corrupted tree in the ext4_ext_find_extent() (instead in ext4_ext_map_blocks()), however this will need to distinguish between the normal and the tree modification case. Thanks! -Lukas > > Signed-off-by: Curt Wohlgemuth <curtw@google.com> > --- > fs/ext4/extents.c | 4 +++- > 1 files changed, 3 insertions(+), 1 deletions(-) > > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c > index 57cf568..063a5b8 100644 > --- a/fs/ext4/extents.c > +++ b/fs/ext4/extents.c > @@ -1395,7 +1395,9 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path) > while (depth >= 0) { > if (depth == path->p_depth) { > /* leaf */ > - if (path[depth].p_ext != > + /* p_ext can be NULL */ > + if (path[depth].p_ext && > + path[depth].p_ext != > EXT_LAST_EXTENT(path[depth].p_hdr)) > return le32_to_cpu(path[depth].p_ext[1].ee_block); > } else { > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block() 2011-10-10 7:19 ` Lukas Czerner @ 2011-10-10 15:28 ` Curt Wohlgemuth 2011-10-11 7:01 ` Dmitry Monakhov 0 siblings, 1 reply; 6+ messages in thread From: Curt Wohlgemuth @ 2011-10-10 15:28 UTC (permalink / raw) To: Lukas Czerner; +Cc: tytso, adilger.kernel, linux-ext4 Hi Lukas: On Mon, Oct 10, 2011 at 12:19 AM, Lukas Czerner <lczerner@redhat.com> wrote: > On Sat, 8 Oct 2011, Curt Wohlgemuth wrote: > >> In ext4_ext_next_allocated_block(), the path[depth] might >> have a p_ext that is NULL -- see ext4_ext_binsearch(). In >> such a case, dereferencing it will crash the machine. >> >> This patch checks for p_ext == NULL in >> ext4_ext_next_allocated_block() before dereferencinging it. >> >> Tested using a hand-crafted an inode with eh_entries == 0 in >> an extent block, verified that running FIEMAP on it crashes >> without this patch, works fine with it. > > Hi Curt, > > It seems to me that even that the patch fixes the NULL dereference, it > is not a complete solution. Is it possible that in "normal" case p_ext > would be NULL ? I think that this is only possible in extent split/add > case (as noted in ext4_ext_binsearch()) which should be atomic to the > other operations (locked i_mutex?). Yes, unfortunately, it is possible in "normal" cases for p_ext to be NULL. We've seen this problem during what appears to be a race between an inode growth (or truncate?) and another task doing a FIEMAP ioctl. The real problem is that FIEMAP handing in ext4 is just, well, buggy? ext4_ext_walk_space() will get the i_data_sem, construct the path array, then release the semaphore. But then it does a bazillion accesses on the extent/header/index pointers in the path array, with no protection against truncate, growth, or any other changes. As far as I can tell, this is the only use of a path array retrieved from ext4_ext_find_extent() that isn't completely covered by i_data_sem. > However this seems like an inode corruption so we should probably be > more verbose about it and print an appropriate EXT4_ERROR_INODE() or > even better check for the corrupted tree in the ext4_ext_find_extent() > (instead in ext4_ext_map_blocks()), however this will need to distinguish > between the normal and the tree modification case. What we've observed many times is a crash during a FIEMAP call to ext4_ext_next_allocated_block(), which appears to me to be during a race with another thread that's splitting the extent tree. This causes the machine to go down with the inode in a bad state. But of course, fsck won't detect and fix this, so when the machine comes back up, and a FIEMAP call is done on this same inode -- without any other threads -- it'll crash again. Hence a nasty crash loop. So you're right, in that this isn't the "real solution." But devising a safe, non-racy design for FIEMAP is not so simple, unless of course you want to just hold the i_data_sem during the entire loop body of ext4_ext_walk_space(), which would be pretty ugly. Hence the "band-aid" approach in my patch, which at least seems correct, if not thorough. Thanks, Curt > > Thanks! > -Lukas > >> >> Signed-off-by: Curt Wohlgemuth <curtw@google.com> >> --- >> fs/ext4/extents.c | 4 +++- >> 1 files changed, 3 insertions(+), 1 deletions(-) >> >> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c >> index 57cf568..063a5b8 100644 >> --- a/fs/ext4/extents.c >> +++ b/fs/ext4/extents.c >> @@ -1395,7 +1395,9 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path) >> while (depth >= 0) { >> if (depth == path->p_depth) { >> /* leaf */ >> - if (path[depth].p_ext != >> + /* p_ext can be NULL */ >> + if (path[depth].p_ext && >> + path[depth].p_ext != >> EXT_LAST_EXTENT(path[depth].p_hdr)) >> return le32_to_cpu(path[depth].p_ext[1].ee_block); >> } else { >> > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block() 2011-10-10 15:28 ` Curt Wohlgemuth @ 2011-10-11 7:01 ` Dmitry Monakhov 2011-10-26 8:26 ` Ted Ts'o 0 siblings, 1 reply; 6+ messages in thread From: Dmitry Monakhov @ 2011-10-11 7:01 UTC (permalink / raw) To: Curt Wohlgemuth, Lukas Czerner; +Cc: tytso, adilger.kernel, linux-ext4 [-- Attachment #1: Type: text/plain, Size: 2280 bytes --] On Mon, 10 Oct 2011 08:28:23 -0700, Curt Wohlgemuth <curtw@google.com> wrote: > Hi Lukas: > > On Mon, Oct 10, 2011 at 12:19 AM, Lukas Czerner <lczerner@redhat.com> wrote: > > On Sat, 8 Oct 2011, Curt Wohlgemuth wrote: > > > >> In ext4_ext_next_allocated_block(), the path[depth] might > >> have a p_ext that is NULL -- see ext4_ext_binsearch(). In > >> such a case, dereferencing it will crash the machine. > >> > >> This patch checks for p_ext == NULL in > >> ext4_ext_next_allocated_block() before dereferencinging it. > >> > >> Tested using a hand-crafted an inode with eh_entries == 0 in > >> an extent block, verified that running FIEMAP on it crashes > >> without this patch, works fine with it. > > > > Hi Curt, > > > > It seems to me that even that the patch fixes the NULL dereference, it > > is not a complete solution. Is it possible that in "normal" case p_ext > > would be NULL ? I think that this is only possible in extent split/add > > case (as noted in ext4_ext_binsearch()) which should be atomic to the > > other operations (locked i_mutex?). > > Yes, unfortunately, it is possible in "normal" cases for p_ext to be NULL. > > We've seen this problem during what appears to be a race between an > inode growth (or truncate?) and another task doing a FIEMAP ioctl. > The real problem is that FIEMAP handing in ext4 is just, well, buggy? Wow, IMHO it not just buggy, it is obviously incorrect, IMHO it is more fair just return -ENOTSUPP, at least it is much safer. Yes calling FIEMAP and truncate/write in parallel is stupid, but not prohibited. > > ext4_ext_walk_space() will get the i_data_sem, construct the path > array, then release the semaphore. But then it does a bazillion > accesses on the extent/header/index pointers in the path array, with > no protection against truncate, growth, or any other changes. As far > as I can tell, this is the only use of a path array retrieved from > ext4_ext_find_extent() that isn't completely covered by i_data_sem. In that case i_data sem protects us from nothing. Path collected can simply disappear under us. And in fact i don't understand the reason why we drop i_data_sem too soon. Are any reason to do that? Seems like following patch should fix the issue. [-- Attachment #2: 0001-ext4-do-not-drop-i_data_sem-too-soon.patch --] [-- Type: text/plain, Size: 2115 bytes --] >From 12cd56ccd86c4d132f186034a9c11b0a2441a19f Mon Sep 17 00:00:00 2001 From: Dmitry Monakhov <dmonakhov@openvz.org> Date: Tue, 11 Oct 2011 10:44:31 +0400 Subject: [PATCH] ext4: do not drop i_data_sem too soon Path returned from ext4_find_extent is valid only while we hold i_data_sem, so we can drop it only after we nolonger need it. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> --- fs/ext4/extents.c | 20 +++++++++++--------- 1 files changed, 11 insertions(+), 9 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index da4583f..c716a1f 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -1847,23 +1847,32 @@ static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block, num = last - block; /* find extent for this block */ down_read(&EXT4_I(inode)->i_data_sem); + if (ext_depth(inode) != depth) { + /* depth was changed. we have to realloc path */ + kfree(path); + path = NULL; + } + path = ext4_ext_find_extent(inode, block, path); - up_read(&EXT4_I(inode)->i_data_sem); if (IS_ERR(path)) { err = PTR_ERR(path); + up_read(&EXT4_I(inode)->i_data_sem); path = NULL; break; } depth = ext_depth(inode); if (unlikely(path[depth].p_hdr == NULL)) { + up_read(&EXT4_I(inode)->i_data_sem); EXT4_ERROR_INODE(inode, "path[%d].p_hdr == NULL", depth); err = -EIO; break; } + ex = path[depth].p_ext; next = ext4_ext_next_allocated_block(path); - + up_read(&EXT4_I(inode)->i_data_sem); + ext4_ext_drop_refs(path); exists = 0; if (!ex) { /* there is no extent yet, so try to allocate @@ -1915,7 +1924,6 @@ static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block, break; } err = func(inode, next, &cbex, ex, cbdata); - ext4_ext_drop_refs(path); if (err < 0) break; @@ -1927,12 +1935,6 @@ static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block, break; } - if (ext_depth(inode) != depth) { - /* depth was changed. we have to realloc path */ - kfree(path); - path = NULL; - } - block = cbex.ec_block + cbex.ec_len; } -- 1.7.1 [-- Attachment #3: Type: text/plain, Size: 2554 bytes --] > > > However this seems like an inode corruption so we should probably be > > more verbose about it and print an appropriate EXT4_ERROR_INODE() or > > even better check for the corrupted tree in the ext4_ext_find_extent() > > (instead in ext4_ext_map_blocks()), however this will need to distinguish > > between the normal and the tree modification case. > > What we've observed many times is a crash during a FIEMAP call to > ext4_ext_next_allocated_block(), which appears to me to be during a > race with another thread that's splitting the extent tree. This > causes the machine to go down with the inode in a bad state. But of > course, fsck won't detect and fix this, so when the machine comes back > up, and a FIEMAP call is done on this same inode -- without any other > threads -- it'll crash again. Hence a nasty crash loop. > > So you're right, in that this isn't the "real solution." But devising > a safe, non-racy design for FIEMAP is not so simple, unless of course > you want to just hold the i_data_sem during the entire loop body of > ext4_ext_walk_space(), which would be pretty ugly. Hence the > "band-aid" approach in my patch, which at least seems correct, if not > thorough. > > Thanks, > Curt > > > > > Thanks! > > -Lukas > > > >> > >> Signed-off-by: Curt Wohlgemuth <curtw@google.com> > >> --- > >> fs/ext4/extents.c | 4 +++- > >> 1 files changed, 3 insertions(+), 1 deletions(-) > >> > >> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c > >> index 57cf568..063a5b8 100644 > >> --- a/fs/ext4/extents.c > >> +++ b/fs/ext4/extents.c > >> @@ -1395,7 +1395,9 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path) > >> while (depth >= 0) { > >> if (depth == path->p_depth) { > >> /* leaf */ > >> - if (path[depth].p_ext != > >> + /* p_ext can be NULL */ > >> + if (path[depth].p_ext && > >> + path[depth].p_ext != > >> EXT_LAST_EXTENT(path[depth].p_hdr)) > >> return le32_to_cpu(path[depth].p_ext[1].ee_block); > >> } else { > >> > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block() 2011-10-11 7:01 ` Dmitry Monakhov @ 2011-10-26 8:26 ` Ted Ts'o 0 siblings, 0 replies; 6+ messages in thread From: Ted Ts'o @ 2011-10-26 8:26 UTC (permalink / raw) To: Dmitry Monakhov Cc: Curt Wohlgemuth, Lukas Czerner, adilger.kernel, linux-ext4 On Tue, Oct 11, 2011 at 11:01:57AM +0400, Dmitry Monakhov wrote: > > ext4_ext_walk_space() will get the i_data_sem, construct the path > > array, then release the semaphore. But then it does a bazillion > > accesses on the extent/header/index pointers in the path array, with > > no protection against truncate, growth, or any other changes. As far > > as I can tell, this is the only use of a path array retrieved from > > ext4_ext_find_extent() that isn't completely covered by i_data_sem. > > In that case i_data sem protects us from nothing. Path collected can > simply disappear under us. And in fact i don't understand the > reason why we drop i_data_sem too soon. Are any reason to do that? The one concern I have is that I don't want FIEMAP to slow down "real" ext4 processing. So what I've hoping we'll be able to do is use a seq_lock sort of design, where if the pointer changes out from under us, FIEMAP is forced to redo its sampling. But if there is some crazy userspace program which is calling FIEMAP all the time, I'd much rather that it not block ext4_map_blocks() if possible (which is what I using a seqlock to protect the FIEMAP routines would help). - Ted ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block() 2011-10-09 1:01 [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block() Curt Wohlgemuth 2011-10-10 7:19 ` Lukas Czerner @ 2011-10-26 8:38 ` Ted Ts'o 1 sibling, 0 replies; 6+ messages in thread From: Ted Ts'o @ 2011-10-26 8:38 UTC (permalink / raw) To: Curt Wohlgemuth; +Cc: adilger.kernel, linux-ext4 On Sat, Oct 08, 2011 at 06:01:14PM -0700, Curt Wohlgemuth wrote: > In ext4_ext_next_allocated_block(), the path[depth] might > have a p_ext that is NULL -- see ext4_ext_binsearch(). In > such a case, dereferencing it will crash the machine. > > This patch checks for p_ext == NULL in > ext4_ext_next_allocated_block() before dereferencinging it. > > Tested using a hand-crafted an inode with eh_entries == 0 in > an extent block, verified that running FIEMAP on it crashes > without this patch, works fine with it. > > Signed-off-by: Curt Wohlgemuth <curtw@google.com> Thanks, applied. - Ted ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2011-10-26 8:38 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-10-09 1:01 [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block() Curt Wohlgemuth 2011-10-10 7:19 ` Lukas Czerner 2011-10-10 15:28 ` Curt Wohlgemuth 2011-10-11 7:01 ` Dmitry Monakhov 2011-10-26 8:26 ` Ted Ts'o 2011-10-26 8:38 ` Ted Ts'o
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).