* [PATCH 0/2] xfs: fixes for for-next @ 2014-03-03 5:39 Dave Chinner 2014-03-03 5:39 ` [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace Dave Chinner 2014-03-03 5:39 ` [PATCH 2/2] xfs: use NOIO contexts for vm_map_ram Dave Chinner 0 siblings, 2 replies; 14+ messages in thread From: Dave Chinner @ 2014-03-03 5:39 UTC (permalink / raw) To: xfs Hi folks, A couple of quick fixes - the new verifier code leaks EFSBADCRC to userspace, so any code that is checking for EFSCORRUPTED is not going to be working correctly. The first patch fixes this regression. The second patch fixes a long standing issues with vmapped RAM potentially causing deadlocks due to GFP_KERNEL allocations being done for page tables inside GFP_NOFS contexts. This will fix many of the lockdep reports about it, but won't get rid of them all. It will, however, guarantee that we won't deadlock due to the problem which is a big step up. Cheers, Dave. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace 2014-03-03 5:39 [PATCH 0/2] xfs: fixes for for-next Dave Chinner @ 2014-03-03 5:39 ` Dave Chinner 2014-03-03 17:34 ` Eric Sandeen ` (2 more replies) 2014-03-03 5:39 ` [PATCH 2/2] xfs: use NOIO contexts for vm_map_ram Dave Chinner 1 sibling, 3 replies; 14+ messages in thread From: Dave Chinner @ 2014-03-03 5:39 UTC (permalink / raw) To: xfs From: Dave Chinner <dchinner@redhat.com> While the verifier reoutines may return EFSBADCRC when a buffer ahs a bad CRC, we need to translate that to EFSCORRUPTED so that the higher layers treat the error appropriately and so we return a consistent error to userspace. This fixes a xfs/005 regression. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/xfs_mount.c | 3 +++ fs/xfs/xfs_symlink.c | 4 ++++ fs/xfs/xfs_trans_buf.c | 11 +++++++++++ 3 files changed, 18 insertions(+) diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index f96c056..993cb19 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -314,6 +314,9 @@ reread: error = bp->b_error; if (loud) xfs_warn(mp, "SB validate failed with error %d.", error); + /* bad CRC means corrupted metadata */ + if (error == EFSBADCRC) + error = EFSCORRUPTED; goto release_buf; } diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c index 14e58f2..5fda189 100644 --- a/fs/xfs/xfs_symlink.c +++ b/fs/xfs/xfs_symlink.c @@ -80,6 +80,10 @@ xfs_readlink_bmap( if (error) { xfs_buf_ioerror_alert(bp, __func__); xfs_buf_relse(bp); + + /* bad CRC means corrupted metadata */ + if (error == EFSBADCRC) + error = EFSCORRUPTED; goto out; } byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt); diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c index 647b6f1..b8eef05 100644 --- a/fs/xfs/xfs_trans_buf.c +++ b/fs/xfs/xfs_trans_buf.c @@ -275,6 +275,10 @@ xfs_trans_read_buf_map( XFS_BUF_UNDONE(bp); xfs_buf_stale(bp); xfs_buf_relse(bp); + + /* bad CRC means corrupted metadata */ + if (error == EFSBADCRC) + error = EFSCORRUPTED; return error; } #ifdef DEBUG @@ -338,6 +342,9 @@ xfs_trans_read_buf_map( if (tp->t_flags & XFS_TRANS_DIRTY) xfs_force_shutdown(tp->t_mountp, SHUTDOWN_META_IO_ERROR); + /* bad CRC means corrupted metadata */ + if (error == EFSBADCRC) + error = EFSCORRUPTED; return error; } } @@ -375,6 +382,10 @@ xfs_trans_read_buf_map( if (tp->t_flags & XFS_TRANS_DIRTY) xfs_force_shutdown(tp->t_mountp, SHUTDOWN_META_IO_ERROR); xfs_buf_relse(bp); + + /* bad CRC means corrupted metadata */ + if (error == EFSBADCRC) + error = EFSCORRUPTED; return error; } #ifdef DEBUG -- 1.9.0 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace 2014-03-03 5:39 ` [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace Dave Chinner @ 2014-03-03 17:34 ` Eric Sandeen 2014-03-03 22:13 ` Dave Chinner 2014-03-03 17:44 ` Brian Foster 2014-03-05 15:48 ` Brian Foster 2 siblings, 1 reply; 14+ messages in thread From: Eric Sandeen @ 2014-03-03 17:34 UTC (permalink / raw) To: Dave Chinner, xfs On 3/2/14, 11:39 PM, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > While the verifier reoutines may return EFSBADCRC when a buffer ahs > a bad CRC, we need to translate that to EFSCORRUPTED so that the > higher layers treat the error appropriately and so we return a > consistent error to userspace. This fixes a xfs/005 regression. Can you say a little more about the philosophy here? xfs/005 regresses because it expects "structure needs cleaning" So if we instead return our (icky) CRC error code, we get something else. But it is truly a different root cause. So the goal is to NEVER leak EFSBADCRC to userspace? Maybe a comment above that error definition would help document that. And I'm bit worried that we'll leak more in the future if things changed, or if things got missed here. Everything you have here looks fine, but it's not obvious that every path has been caught; it seems a bit random. I know we _just_ merged my "differentiator" patches, but I wonder if it would be better to add XFS_BSTATE_BADCRC to b_state or some other field, and go back to always assigning EFSCORRUPTED. What do you think? When I wrote those I wasn't thinking about keeping it all internal to the filesystem. -Eric > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- > fs/xfs/xfs_mount.c | 3 +++ > fs/xfs/xfs_symlink.c | 4 ++++ > fs/xfs/xfs_trans_buf.c | 11 +++++++++++ > 3 files changed, 18 insertions(+) > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c > index f96c056..993cb19 100644 > --- a/fs/xfs/xfs_mount.c > +++ b/fs/xfs/xfs_mount.c > @@ -314,6 +314,9 @@ reread: > error = bp->b_error; > if (loud) > xfs_warn(mp, "SB validate failed with error %d.", error); > + /* bad CRC means corrupted metadata */ > + if (error == EFSBADCRC) > + error = EFSCORRUPTED; > goto release_buf; > } > > diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c > index 14e58f2..5fda189 100644 > --- a/fs/xfs/xfs_symlink.c > +++ b/fs/xfs/xfs_symlink.c > @@ -80,6 +80,10 @@ xfs_readlink_bmap( > if (error) { > xfs_buf_ioerror_alert(bp, __func__); > xfs_buf_relse(bp); > + > + /* bad CRC means corrupted metadata */ > + if (error == EFSBADCRC) > + error = EFSCORRUPTED; > goto out; > } > byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt); > diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c > index 647b6f1..b8eef05 100644 > --- a/fs/xfs/xfs_trans_buf.c > +++ b/fs/xfs/xfs_trans_buf.c > @@ -275,6 +275,10 @@ xfs_trans_read_buf_map( > XFS_BUF_UNDONE(bp); > xfs_buf_stale(bp); > xfs_buf_relse(bp); > + > + /* bad CRC means corrupted metadata */ > + if (error == EFSBADCRC) > + error = EFSCORRUPTED; > return error; > } > #ifdef DEBUG > @@ -338,6 +342,9 @@ xfs_trans_read_buf_map( > if (tp->t_flags & XFS_TRANS_DIRTY) > xfs_force_shutdown(tp->t_mountp, > SHUTDOWN_META_IO_ERROR); > + /* bad CRC means corrupted metadata */ > + if (error == EFSBADCRC) > + error = EFSCORRUPTED; > return error; > } > } > @@ -375,6 +382,10 @@ xfs_trans_read_buf_map( > if (tp->t_flags & XFS_TRANS_DIRTY) > xfs_force_shutdown(tp->t_mountp, SHUTDOWN_META_IO_ERROR); > xfs_buf_relse(bp); > + > + /* bad CRC means corrupted metadata */ > + if (error == EFSBADCRC) > + error = EFSCORRUPTED; > return error; > } > #ifdef DEBUG > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace 2014-03-03 17:34 ` Eric Sandeen @ 2014-03-03 22:13 ` Dave Chinner 2014-03-03 22:20 ` Eric Sandeen 0 siblings, 1 reply; 14+ messages in thread From: Dave Chinner @ 2014-03-03 22:13 UTC (permalink / raw) To: Eric Sandeen; +Cc: xfs On Mon, Mar 03, 2014 at 11:34:35AM -0600, Eric Sandeen wrote: > On 3/2/14, 11:39 PM, Dave Chinner wrote: > > From: Dave Chinner <dchinner@redhat.com> > > > > While the verifier reoutines may return EFSBADCRC when a buffer ahs > > a bad CRC, we need to translate that to EFSCORRUPTED so that the > > higher layers treat the error appropriately and so we return a > > consistent error to userspace. This fixes a xfs/005 regression. > > Can you say a little more about the philosophy here? > > xfs/005 regresses because it expects "structure needs cleaning" > > So if we instead return our (icky) CRC error code, we get something else. > > But it is truly a different root cause. > > So the goal is to NEVER leak EFSBADCRC to userspace? Maybe a comment > above that error definition would help document that. Not permanently. At the moment, none of the code handles it correctly, and the leak to userspace is just a symptom that tells us we got somethign wrong. We have plenty of places where we check for EFSCORRUPTED and do something special, but if we get EFSBADCRC instead it will do the wrong thing.... > And I'm bit worried that we'll leak more in the future if things changed, > or if things got missed here. Everything you have here looks fine, but > it's not obvious that every path has been caught; it seems a bit random. It's not random. It's buffer reads that matter, and I checked all the calls to xfs_buf_read, xfs_buf_read_map, xfs_trans_read_buf and xfs_trans_read_buf. There aren't any other read interfaces that use verifiers, and so nothing else can return EFSBADCRC. For the log recovery cases, the buffer reads don' use verifiers, and those that do won't return EFSBADCRC (e.g. inode buffers). > I know we _just_ merged my "differentiator" patches, but I wonder if > it would be better to add XFS_BSTATE_BADCRC to b_state or some other > field, and go back to always assigning EFSCORRUPTED. What do you think? It's just the first layer of adding differentiating support. We've just put the mechanism in place to do the differentiation because we need it for *userspace functionality* before we need it for in-kernel functionality. We put it in the kernel because it has value to us developers to indicate what type of corruption error was detected in the dmesg output. We can't however, do everything at once, so for the moment the kernel code needs to translate it back to something the higher layers understand and treat correctly. > When I wrote those I wasn't thinking about keeping it all internal > to the filesystem. Only for the moment, until there's code in the kernel that makes it a meaningfully different error. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace 2014-03-03 22:13 ` Dave Chinner @ 2014-03-03 22:20 ` Eric Sandeen 2014-03-03 22:31 ` Dave Chinner 0 siblings, 1 reply; 14+ messages in thread From: Eric Sandeen @ 2014-03-03 22:20 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On 3/3/14, 4:13 PM, Dave Chinner wrote: > On Mon, Mar 03, 2014 at 11:34:35AM -0600, Eric Sandeen wrote: >> On 3/2/14, 11:39 PM, Dave Chinner wrote: >>> From: Dave Chinner <dchinner@redhat.com> >>> >>> While the verifier reoutines may return EFSBADCRC when a buffer ahs >>> a bad CRC, we need to translate that to EFSCORRUPTED so that the >>> higher layers treat the error appropriately and so we return a >>> consistent error to userspace. This fixes a xfs/005 regression. >> >> Can you say a little more about the philosophy here? >> >> xfs/005 regresses because it expects "structure needs cleaning" >> >> So if we instead return our (icky) CRC error code, we get something else. >> >> But it is truly a different root cause. >> >> So the goal is to NEVER leak EFSBADCRC to userspace? Maybe a comment >> above that error definition would help document that. > > Not permanently. At the moment, none of the code handles it > correctly, and the leak to userspace is just a symptom that tells us > we got somethign wrong. We have plenty of places where we check for > EFSCORRUPTED and do something special, but if we get EFSBADCRC > instead it will do the wrong thing.... > >> And I'm bit worried that we'll leak more in the future if things changed, >> or if things got missed here. Everything you have here looks fine, but >> it's not obvious that every path has been caught; it seems a bit random. > > It's not random. It's buffer reads that matter, and I > checked all the calls to xfs_buf_read, xfs_buf_read_map, > xfs_trans_read_buf and xfs_trans_read_buf. There aren't any other > read interfaces that use verifiers, and so nothing else can return > EFSBADCRC. For the log recovery cases, the buffer reads don' use > verifiers, and those that do won't return EFSBADCRC (e.g. inode > buffers). > >> I know we _just_ merged my "differentiator" patches, but I wonder if >> it would be better to add XFS_BSTATE_BADCRC to b_state or some other >> field, and go back to always assigning EFSCORRUPTED. What do you think? > > It's just the first layer of adding differentiating support. We've > just put the mechanism in place to do the differentiation because we > need it for *userspace functionality* before we need it for > in-kernel functionality. We put it in the kernel because it has > value to us developers to indicate what type of corruption error was > detected in the dmesg output. We can't however, do everything at > once, so for the moment the kernel code needs to translate it back > to something the higher layers understand and treat correctly. > >> When I wrote those I wasn't thinking about keeping it all internal >> to the filesystem. > > Only for the moment, until there's code in the kernel that makes it > a meaningfully different error. Ok, thanks. Modulo Brian's question about other paths, what is here so far looks ok to me, then. A commit message that indicates that this is somewhat temporary might be in order? -Eric _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace 2014-03-03 22:20 ` Eric Sandeen @ 2014-03-03 22:31 ` Dave Chinner 0 siblings, 0 replies; 14+ messages in thread From: Dave Chinner @ 2014-03-03 22:31 UTC (permalink / raw) To: Eric Sandeen; +Cc: xfs On Mon, Mar 03, 2014 at 04:20:34PM -0600, Eric Sandeen wrote: > On 3/3/14, 4:13 PM, Dave Chinner wrote: > > On Mon, Mar 03, 2014 at 11:34:35AM -0600, Eric Sandeen wrote: > >> On 3/2/14, 11:39 PM, Dave Chinner wrote: > >>> From: Dave Chinner <dchinner@redhat.com> > >>> > >>> While the verifier reoutines may return EFSBADCRC when a buffer ahs > >>> a bad CRC, we need to translate that to EFSCORRUPTED so that the > >>> higher layers treat the error appropriately and so we return a > >>> consistent error to userspace. This fixes a xfs/005 regression. > >> > >> Can you say a little more about the philosophy here? > >> > >> xfs/005 regresses because it expects "structure needs cleaning" > >> > >> So if we instead return our (icky) CRC error code, we get something else. > >> > >> But it is truly a different root cause. > >> > >> So the goal is to NEVER leak EFSBADCRC to userspace? Maybe a comment > >> above that error definition would help document that. > > > > Not permanently. At the moment, none of the code handles it > > correctly, and the leak to userspace is just a symptom that tells us > > we got somethign wrong. We have plenty of places where we check for > > EFSCORRUPTED and do something special, but if we get EFSBADCRC > > instead it will do the wrong thing.... > > > >> And I'm bit worried that we'll leak more in the future if things changed, > >> or if things got missed here. Everything you have here looks fine, but > >> it's not obvious that every path has been caught; it seems a bit random. > > > > It's not random. It's buffer reads that matter, and I > > checked all the calls to xfs_buf_read, xfs_buf_read_map, > > xfs_trans_read_buf and xfs_trans_read_buf. There aren't any other > > read interfaces that use verifiers, and so nothing else can return > > EFSBADCRC. For the log recovery cases, the buffer reads don' use > > verifiers, and those that do won't return EFSBADCRC (e.g. inode > > buffers). > > > >> I know we _just_ merged my "differentiator" patches, but I wonder if > >> it would be better to add XFS_BSTATE_BADCRC to b_state or some other > >> field, and go back to always assigning EFSCORRUPTED. What do you think? > > > > It's just the first layer of adding differentiating support. We've > > just put the mechanism in place to do the differentiation because we > > need it for *userspace functionality* before we need it for > > in-kernel functionality. We put it in the kernel because it has > > value to us developers to indicate what type of corruption error was > > detected in the dmesg output. We can't however, do everything at > > once, so for the moment the kernel code needs to translate it back > > to something the higher layers understand and treat correctly. > > > >> When I wrote those I wasn't thinking about keeping it all internal > >> to the filesystem. > > > > Only for the moment, until there's code in the kernel that makes it > > a meaningfully different error. > > Ok, thanks. Modulo Brian's question about other paths, what is here > so far looks ok to me, then. A commit message that indicates that > this is somewhat temporary might be in order? Sure, I can improve the commit message by including a summary of this discussion. ;) Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace 2014-03-03 5:39 ` [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace Dave Chinner 2014-03-03 17:34 ` Eric Sandeen @ 2014-03-03 17:44 ` Brian Foster 2014-03-03 22:29 ` Dave Chinner 2014-03-05 15:48 ` Brian Foster 2 siblings, 1 reply; 14+ messages in thread From: Brian Foster @ 2014-03-03 17:44 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On Mon, Mar 03, 2014 at 04:39:53PM +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > While the verifier reoutines may return EFSBADCRC when a buffer ahs > a bad CRC, we need to translate that to EFSCORRUPTED so that the > higher layers treat the error appropriately and so we return a > consistent error to userspace. This fixes a xfs/005 regression. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- This change looks Ok to me, but when I start looking through the users of bp->b_error, I see examples like xfs_dir3_data_read() being called in xfs_dir2_leaf_addname() where it looks like an error could bubble all the way up to xfs_vn_mknod() and its callers. If the intent is to use EFSBADCRC as an internal-only error to differentiate corruption from crc failure, why not push this more closely to the boundaries that we have already defined? For example, we already convert positive errnos to negative at the internal/external boundaries. Could we convert those to use some kind of XFS_USERSPACE_ERROR(error) macro/helper that converts errors appropriately? Another thought could be to reconsider whether we still need some of these extra warnings, as in the xfs_mount.c hunk below, now that we have the generic xfs_verifier_error() messaging. E.g., if we could remove those, perhaps we could snub out EFSBADCRC in or around the verifier after it makes a distinction. Brian > fs/xfs/xfs_mount.c | 3 +++ > fs/xfs/xfs_symlink.c | 4 ++++ > fs/xfs/xfs_trans_buf.c | 11 +++++++++++ > 3 files changed, 18 insertions(+) > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c > index f96c056..993cb19 100644 > --- a/fs/xfs/xfs_mount.c > +++ b/fs/xfs/xfs_mount.c > @@ -314,6 +314,9 @@ reread: > error = bp->b_error; > if (loud) > xfs_warn(mp, "SB validate failed with error %d.", error); > + /* bad CRC means corrupted metadata */ > + if (error == EFSBADCRC) > + error = EFSCORRUPTED; > goto release_buf; > } > > diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c > index 14e58f2..5fda189 100644 > --- a/fs/xfs/xfs_symlink.c > +++ b/fs/xfs/xfs_symlink.c > @@ -80,6 +80,10 @@ xfs_readlink_bmap( > if (error) { > xfs_buf_ioerror_alert(bp, __func__); > xfs_buf_relse(bp); > + > + /* bad CRC means corrupted metadata */ > + if (error == EFSBADCRC) > + error = EFSCORRUPTED; > goto out; > } > byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt); > diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c > index 647b6f1..b8eef05 100644 > --- a/fs/xfs/xfs_trans_buf.c > +++ b/fs/xfs/xfs_trans_buf.c > @@ -275,6 +275,10 @@ xfs_trans_read_buf_map( > XFS_BUF_UNDONE(bp); > xfs_buf_stale(bp); > xfs_buf_relse(bp); > + > + /* bad CRC means corrupted metadata */ > + if (error == EFSBADCRC) > + error = EFSCORRUPTED; > return error; > } > #ifdef DEBUG > @@ -338,6 +342,9 @@ xfs_trans_read_buf_map( > if (tp->t_flags & XFS_TRANS_DIRTY) > xfs_force_shutdown(tp->t_mountp, > SHUTDOWN_META_IO_ERROR); > + /* bad CRC means corrupted metadata */ > + if (error == EFSBADCRC) > + error = EFSCORRUPTED; > return error; > } > } > @@ -375,6 +382,10 @@ xfs_trans_read_buf_map( > if (tp->t_flags & XFS_TRANS_DIRTY) > xfs_force_shutdown(tp->t_mountp, SHUTDOWN_META_IO_ERROR); > xfs_buf_relse(bp); > + > + /* bad CRC means corrupted metadata */ > + if (error == EFSBADCRC) > + error = EFSCORRUPTED; > return error; > } > #ifdef DEBUG > -- > 1.9.0 > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace 2014-03-03 17:44 ` Brian Foster @ 2014-03-03 22:29 ` Dave Chinner 2014-03-04 1:20 ` Brian Foster 0 siblings, 1 reply; 14+ messages in thread From: Dave Chinner @ 2014-03-03 22:29 UTC (permalink / raw) To: Brian Foster; +Cc: xfs On Mon, Mar 03, 2014 at 12:44:26PM -0500, Brian Foster wrote: > On Mon, Mar 03, 2014 at 04:39:53PM +1100, Dave Chinner wrote: > > From: Dave Chinner <dchinner@redhat.com> > > > > While the verifier reoutines may return EFSBADCRC when a buffer ahs > > a bad CRC, we need to translate that to EFSCORRUPTED so that the > > higher layers treat the error appropriately and so we return a > > consistent error to userspace. This fixes a xfs/005 regression. > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > --- > > This change looks Ok to me, but when I start looking through the users > of bp->b_error, I see examples like xfs_dir3_data_read() being called in > xfs_dir2_leaf_addname() where it looks like an error could bubble all > the way up to xfs_vn_mknod() and its callers. Sure, but: xfs_dir3_data_read xfs_da_read_buf xfs_trans_read_buf_map Which means the patch prevents the EFSBADCRC leaking back out through that path because it converts it in xfs_trans_read_buf_map. > If the intent is to use EFSBADCRC as an internal-only error to > differentiate corruption from crc failure, why not push this more > closely to the boundaries that we have already defined? For example, we > already convert positive errnos to negative at the internal/external > boundaries. Could we convert those to use some kind of > XFS_USERSPACE_ERROR(error) macro/helper that converts errors > appropriately? That doesn't solve the problem needing an error conversion layer in the first place. The long term goal is to remove the error conversions in XFS by converting the core code to the same error passing conventions as the rest of the kernel code. We manage to screw the negation up fairly regularly because it is convoluted and we cal into generic code that returns negative errors from the core that returns positive errors in lots of places. The conversion surface is just too large to manage sanely. > Another thought could be to reconsider whether we still need some of > these extra warnings, as in the xfs_mount.c hunk below, now that we have > the generic xfs_verifier_error() messaging. E.g., if we could remove > those, perhaps we could snub out EFSBADCRC in or around the verifier > after it makes a distinction. Redundant errors aren't s significant problem. It's the lack of meaningful error messages that are much more of an issue. We get more meaningful error messages as a result of the EFSBADCRC changes that have been made, but for the moment that error simply means EFSCORRUPTED to the higher layers. Hence the translation back to EFSCORRUPTED at the (low) layers where the error no longer has a distinct meaning. As we add more functionality, EFSBADCRC will become more meaningful and so get propagated higher into the kernel code. But for now, it should remain an error that doesn't escape the lower layers... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace 2014-03-03 22:29 ` Dave Chinner @ 2014-03-04 1:20 ` Brian Foster 2014-03-04 4:32 ` Dave Chinner 0 siblings, 1 reply; 14+ messages in thread From: Brian Foster @ 2014-03-04 1:20 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On Tue, Mar 04, 2014 at 09:29:46AM +1100, Dave Chinner wrote: > On Mon, Mar 03, 2014 at 12:44:26PM -0500, Brian Foster wrote: > > On Mon, Mar 03, 2014 at 04:39:53PM +1100, Dave Chinner wrote: > > > From: Dave Chinner <dchinner@redhat.com> > > > > > > While the verifier reoutines may return EFSBADCRC when a buffer ahs > > > a bad CRC, we need to translate that to EFSCORRUPTED so that the > > > higher layers treat the error appropriately and so we return a > > > consistent error to userspace. This fixes a xfs/005 regression. > > > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > > --- > > > > This change looks Ok to me, but when I start looking through the users > > of bp->b_error, I see examples like xfs_dir3_data_read() being called in > > xfs_dir2_leaf_addname() where it looks like an error could bubble all > > the way up to xfs_vn_mknod() and its callers. > > Sure, but: > > xfs_dir3_data_read > xfs_da_read_buf > xfs_trans_read_buf_map > > Which means the patch prevents the EFSBADCRC leaking back out > through that path because it converts it in xfs_trans_read_buf_map. > Ok, I see. FWIW, I was also trolling through some of the log recovery code and it appears that code uses b_ops for write verification purposes only (i.e., crc generation I suspect), correct? > > If the intent is to use EFSBADCRC as an internal-only error to > > differentiate corruption from crc failure, why not push this more > > closely to the boundaries that we have already defined? For example, we > > already convert positive errnos to negative at the internal/external > > boundaries. Could we convert those to use some kind of > > XFS_USERSPACE_ERROR(error) macro/helper that converts errors > > appropriately? > > That doesn't solve the problem needing an error conversion layer in > the first place. The long term goal is to remove the error > conversions in XFS by converting the core code to the same error > passing conventions as the rest of the kernel code. We manage to > screw the negation up fairly regularly because it is convoluted and > we cal into generic code that returns negative errors from the core > that returns positive errors in lots of places. The conversion > surface is just too large to manage sanely. > Well it wasn't clear that the error conversion layer was a problem that itself needed solving, but fair enough. ;) If the objective is to move away from the positive/negative business to something more "kernel native," then building more infrastructure around that is probably not the way to go, unless that was actually part of the changeover strategy (e.g., if we happened to use something that conditionally negated error values to support incremental codepath conversions... as a semi-random thought). > > Another thought could be to reconsider whether we still need some of > > these extra warnings, as in the xfs_mount.c hunk below, now that we have > > the generic xfs_verifier_error() messaging. E.g., if we could remove > > those, perhaps we could snub out EFSBADCRC in or around the verifier > > after it makes a distinction. > > Redundant errors aren't s significant problem. It's the lack of > meaningful error messages that are much more of an issue. We get > more meaningful error messages as a result of the EFSBADCRC changes > that have been made, but for the moment that error simply means > EFSCORRUPTED to the higher layers. Hence the translation back to > EFSCORRUPTED at the (low) layers where the error no longer has > a distinct meaning. > Right, the purpose is not to solve the problem of redundant errors. Rather, if the only thing preventing the consolidation of the EFSBADCRC trap to a single hunk are redundant errors, then it wouldn't be a problem to remove those errors and push the EFSBADCRC trap into the generic code for the purpose of clarity. Brian > As we add more functionality, EFSBADCRC will become more meaningful > and so get propagated higher into the kernel code. But for now, it > should remain an error that doesn't escape the lower layers... > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace 2014-03-04 1:20 ` Brian Foster @ 2014-03-04 4:32 ` Dave Chinner 2014-03-04 14:01 ` Brian Foster 0 siblings, 1 reply; 14+ messages in thread From: Dave Chinner @ 2014-03-04 4:32 UTC (permalink / raw) To: Brian Foster; +Cc: xfs On Mon, Mar 03, 2014 at 08:20:57PM -0500, Brian Foster wrote: > On Tue, Mar 04, 2014 at 09:29:46AM +1100, Dave Chinner wrote: > > On Mon, Mar 03, 2014 at 12:44:26PM -0500, Brian Foster wrote: > > > On Mon, Mar 03, 2014 at 04:39:53PM +1100, Dave Chinner wrote: > > > > From: Dave Chinner <dchinner@redhat.com> > > > > > > > > While the verifier reoutines may return EFSBADCRC when a buffer ahs > > > > a bad CRC, we need to translate that to EFSCORRUPTED so that the > > > > higher layers treat the error appropriately and so we return a > > > > consistent error to userspace. This fixes a xfs/005 regression. > > > > > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > > > --- > > > > > > This change looks Ok to me, but when I start looking through the users > > > of bp->b_error, I see examples like xfs_dir3_data_read() being called in > > > xfs_dir2_leaf_addname() where it looks like an error could bubble all > > > the way up to xfs_vn_mknod() and its callers. > > > > Sure, but: > > > > xfs_dir3_data_read > > xfs_da_read_buf > > xfs_trans_read_buf_map > > > > Which means the patch prevents the EFSBADCRC leaking back out > > through that path because it converts it in xfs_trans_read_buf_map. > > > > Ok, I see. FWIW, I was also trolling through some of the log recovery > code and it appears that code uses b_ops for write verification purposes > only (i.e., crc generation I suspect), correct? Yes. > > > If the intent is to use EFSBADCRC as an internal-only error to > > > differentiate corruption from crc failure, why not push this more > > > closely to the boundaries that we have already defined? For example, we > > > already convert positive errnos to negative at the internal/external > > > boundaries. Could we convert those to use some kind of > > > XFS_USERSPACE_ERROR(error) macro/helper that converts errors > > > appropriately? > > > > That doesn't solve the problem needing an error conversion layer in > > the first place. The long term goal is to remove the error > > conversions in XFS by converting the core code to the same error > > passing conventions as the rest of the kernel code. We manage to > > screw the negation up fairly regularly because it is convoluted and > > we cal into generic code that returns negative errors from the core > > that returns positive errors in lots of places. The conversion > > surface is just too large to manage sanely. > > > > Well it wasn't clear that the error conversion layer was a problem that > itself needed solving, but fair enough. ;) If the objective is to move > away from the positive/negative business to something more "kernel > native," then building more infrastructure around that is probably not > the way to go, unless that was actually part of the changeover strategy > (e.g., if we happened to use something that conditionally negated error > values to support incremental codepath conversions... as a semi-random > thought). I'm not sure there's any real benefit to adding infrastructure when doing the conversion. It's a lot of little changes involving pushing the conversion down from the top layers, combined with converting interfaces (e.g. bmapi) as they get exposed. Some functions will be able to lose variables as the return becomes a tri-state value (e.g. the btree functions for looking up records); others will be able to use IS_ERR() for pointer returns, and so on. This will have a flow on effect on the code in the callers, and so it really comes down to spending the time to peel back each layer carefully. It's a lot of work, hence the "long term goal" aspect of it. We've been talking about doing this cleanup for several years now, but it hasn't yet been done because it's going to take hundreds of patches to do... ;) > > > Another thought could be to reconsider whether we still need some of > > > these extra warnings, as in the xfs_mount.c hunk below, now that we have > > > the generic xfs_verifier_error() messaging. E.g., if we could remove > > > those, perhaps we could snub out EFSBADCRC in or around the verifier > > > after it makes a distinction. > > > > Redundant errors aren't s significant problem. It's the lack of > > meaningful error messages that are much more of an issue. We get > > more meaningful error messages as a result of the EFSBADCRC changes > > that have been made, but for the moment that error simply means > > EFSCORRUPTED to the higher layers. Hence the translation back to > > EFSCORRUPTED at the (low) layers where the error no longer has > > a distinct meaning. > > > > Right, the purpose is not to solve the problem of redundant errors. > Rather, if the only thing preventing the consolidation of the EFSBADCRC > trap to a single hunk are redundant errors, then it wouldn't be a > problem to remove those errors and push the EFSBADCRC trap into the > generic code for the purpose of clarity. I don't see much point in trying to have a single trap for EFSBADCRC because the moment we want to handle one of those cases specially we are going to have to - as a first step - remove the single trap and push it outwards to all the places that need it like this patch does.... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace 2014-03-04 4:32 ` Dave Chinner @ 2014-03-04 14:01 ` Brian Foster 0 siblings, 0 replies; 14+ messages in thread From: Brian Foster @ 2014-03-04 14:01 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On Tue, Mar 04, 2014 at 03:32:25PM +1100, Dave Chinner wrote: > On Mon, Mar 03, 2014 at 08:20:57PM -0500, Brian Foster wrote: > > On Tue, Mar 04, 2014 at 09:29:46AM +1100, Dave Chinner wrote: > > > On Mon, Mar 03, 2014 at 12:44:26PM -0500, Brian Foster wrote: > > > > On Mon, Mar 03, 2014 at 04:39:53PM +1100, Dave Chinner wrote: > > > > > From: Dave Chinner <dchinner@redhat.com> > > > > > > > > > > While the verifier reoutines may return EFSBADCRC when a buffer ahs > > > > > a bad CRC, we need to translate that to EFSCORRUPTED so that the > > > > > higher layers treat the error appropriately and so we return a > > > > > consistent error to userspace. This fixes a xfs/005 regression. > > > > > > > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > > > > --- > > > > > > > > This change looks Ok to me, but when I start looking through the users > > > > of bp->b_error, I see examples like xfs_dir3_data_read() being called in > > > > xfs_dir2_leaf_addname() where it looks like an error could bubble all > > > > the way up to xfs_vn_mknod() and its callers. > > > > > > Sure, but: > > > > > > xfs_dir3_data_read > > > xfs_da_read_buf > > > xfs_trans_read_buf_map > > > > > > Which means the patch prevents the EFSBADCRC leaking back out > > > through that path because it converts it in xfs_trans_read_buf_map. > > > > > > > Ok, I see. FWIW, I was also trolling through some of the log recovery > > code and it appears that code uses b_ops for write verification purposes > > only (i.e., crc generation I suspect), correct? > > Yes. > > > > > If the intent is to use EFSBADCRC as an internal-only error to > > > > differentiate corruption from crc failure, why not push this more > > > > closely to the boundaries that we have already defined? For example, we > > > > already convert positive errnos to negative at the internal/external > > > > boundaries. Could we convert those to use some kind of > > > > XFS_USERSPACE_ERROR(error) macro/helper that converts errors > > > > appropriately? > > > > > > That doesn't solve the problem needing an error conversion layer in > > > the first place. The long term goal is to remove the error > > > conversions in XFS by converting the core code to the same error > > > passing conventions as the rest of the kernel code. We manage to > > > screw the negation up fairly regularly because it is convoluted and > > > we cal into generic code that returns negative errors from the core > > > that returns positive errors in lots of places. The conversion > > > surface is just too large to manage sanely. > > > > > > > Well it wasn't clear that the error conversion layer was a problem that > > itself needed solving, but fair enough. ;) If the objective is to move > > away from the positive/negative business to something more "kernel > > native," then building more infrastructure around that is probably not > > the way to go, unless that was actually part of the changeover strategy > > (e.g., if we happened to use something that conditionally negated error > > values to support incremental codepath conversions... as a semi-random > > thought). > > I'm not sure there's any real benefit to adding infrastructure when > doing the conversion. It's a lot of little changes involving pushing > the conversion down from the top layers, combined with converting > interfaces (e.g. bmapi) as they get exposed. > > Some functions will be able to lose variables as the return becomes > a tri-state value (e.g. the btree functions for looking up records); > others will be able to use IS_ERR() for pointer returns, and so on. > This will have a flow on effect on the code in the callers, and so > it really comes down to spending the time to peel back each layer > carefully. > > It's a lot of work, hence the "long term goal" aspect of it. We've > been talking about doing this cleanup for several years now, but it > hasn't yet been done because it's going to take hundreds of patches > to do... ;) > Indeed. I just wasn't aware of that, so I was thinking aloud a bit (along the same lines of pushing the conversion from top-to-bottom). In any event, it's good to have this in mind going forward. > > > > Another thought could be to reconsider whether we still need some of > > > > these extra warnings, as in the xfs_mount.c hunk below, now that we have > > > > the generic xfs_verifier_error() messaging. E.g., if we could remove > > > > those, perhaps we could snub out EFSBADCRC in or around the verifier > > > > after it makes a distinction. > > > > > > Redundant errors aren't s significant problem. It's the lack of > > > meaningful error messages that are much more of an issue. We get > > > more meaningful error messages as a result of the EFSBADCRC changes > > > that have been made, but for the moment that error simply means > > > EFSCORRUPTED to the higher layers. Hence the translation back to > > > EFSCORRUPTED at the (low) layers where the error no longer has > > > a distinct meaning. > > > > > > > Right, the purpose is not to solve the problem of redundant errors. > > Rather, if the only thing preventing the consolidation of the EFSBADCRC > > trap to a single hunk are redundant errors, then it wouldn't be a > > problem to remove those errors and push the EFSBADCRC trap into the > > generic code for the purpose of clarity. > > I don't see much point in trying to have a single trap for EFSBADCRC > because the moment we want to handle one of those cases specially we > are going to have to - as a first step - remove the single trap and > push it outwards to all the places that need it like this patch > does.... > Agreed. I'm less concerned about it with the understanding that EFSBADCRC is not to be forever isolated to the world of verifiers. I just wanted to make sure my point was clear. ;) Brian > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace 2014-03-03 5:39 ` [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace Dave Chinner 2014-03-03 17:34 ` Eric Sandeen 2014-03-03 17:44 ` Brian Foster @ 2014-03-05 15:48 ` Brian Foster 2 siblings, 0 replies; 14+ messages in thread From: Brian Foster @ 2014-03-05 15:48 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On Mon, Mar 03, 2014 at 04:39:53PM +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > While the verifier reoutines may return EFSBADCRC when a buffer ahs > a bad CRC, we need to translate that to EFSCORRUPTED so that the > higher layers treat the error appropriately and so we return a > consistent error to userspace. This fixes a xfs/005 regression. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- Looking through again, I don't see any obvious cases where EFSBADCRC could slip through. The verifier users mostly call xfs_trans_read_buf_map(). The readahead case looks like it should be handled by the xfs_buf_iowait() checks therein. The one case I came across that looked suspicious is the readahead/read cycle for inode log recovery in xlog_recover_commit_trans(). This calls into xfs_buf_read() and ultimately uses bp->b_error directly in xlog_recover_inode_pass2(). That said, I don't see any crc checking in these verifiers and thus they don't source EFSBADCRC. Looks good to me... Reviewed-by: Brian Foster <bfoster@redhat.com> > fs/xfs/xfs_mount.c | 3 +++ > fs/xfs/xfs_symlink.c | 4 ++++ > fs/xfs/xfs_trans_buf.c | 11 +++++++++++ > 3 files changed, 18 insertions(+) > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c > index f96c056..993cb19 100644 > --- a/fs/xfs/xfs_mount.c > +++ b/fs/xfs/xfs_mount.c > @@ -314,6 +314,9 @@ reread: > error = bp->b_error; > if (loud) > xfs_warn(mp, "SB validate failed with error %d.", error); > + /* bad CRC means corrupted metadata */ > + if (error == EFSBADCRC) > + error = EFSCORRUPTED; > goto release_buf; > } > > diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c > index 14e58f2..5fda189 100644 > --- a/fs/xfs/xfs_symlink.c > +++ b/fs/xfs/xfs_symlink.c > @@ -80,6 +80,10 @@ xfs_readlink_bmap( > if (error) { > xfs_buf_ioerror_alert(bp, __func__); > xfs_buf_relse(bp); > + > + /* bad CRC means corrupted metadata */ > + if (error == EFSBADCRC) > + error = EFSCORRUPTED; > goto out; > } > byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt); > diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c > index 647b6f1..b8eef05 100644 > --- a/fs/xfs/xfs_trans_buf.c > +++ b/fs/xfs/xfs_trans_buf.c > @@ -275,6 +275,10 @@ xfs_trans_read_buf_map( > XFS_BUF_UNDONE(bp); > xfs_buf_stale(bp); > xfs_buf_relse(bp); > + > + /* bad CRC means corrupted metadata */ > + if (error == EFSBADCRC) > + error = EFSCORRUPTED; > return error; > } > #ifdef DEBUG > @@ -338,6 +342,9 @@ xfs_trans_read_buf_map( > if (tp->t_flags & XFS_TRANS_DIRTY) > xfs_force_shutdown(tp->t_mountp, > SHUTDOWN_META_IO_ERROR); > + /* bad CRC means corrupted metadata */ > + if (error == EFSBADCRC) > + error = EFSCORRUPTED; > return error; > } > } > @@ -375,6 +382,10 @@ xfs_trans_read_buf_map( > if (tp->t_flags & XFS_TRANS_DIRTY) > xfs_force_shutdown(tp->t_mountp, SHUTDOWN_META_IO_ERROR); > xfs_buf_relse(bp); > + > + /* bad CRC means corrupted metadata */ > + if (error == EFSBADCRC) > + error = EFSCORRUPTED; > return error; > } > #ifdef DEBUG > -- > 1.9.0 > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/2] xfs: use NOIO contexts for vm_map_ram 2014-03-03 5:39 [PATCH 0/2] xfs: fixes for for-next Dave Chinner 2014-03-03 5:39 ` [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace Dave Chinner @ 2014-03-03 5:39 ` Dave Chinner 2014-03-05 16:58 ` Christoph Hellwig 1 sibling, 1 reply; 14+ messages in thread From: Dave Chinner @ 2014-03-03 5:39 UTC (permalink / raw) To: xfs From: Dave Chinner <dchinner@redhat.com> When we map pages in the buffer cache, we can do so in GFP_NOFS contexts. However, the vmap interfaces do not provide any method of communicating this information to memory reclaim, and hence we get lockdep complaining about it regularly and occassionally see hangs that may be vmap related reclaim deadlocks. We can also see these same problems from anywhere where we use vmalloc for a large buffer (e.g. attribute code) inside a transaction context. A typical lockdep report shows up as a reclaim state warning like so: [14046.101458] ================================= [14046.102850] [ INFO: inconsistent lock state ] [14046.102850] 3.14.0-rc4+ #2 Not tainted [14046.102850] --------------------------------- [14046.102850] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. [14046.102850] kswapd0/14 [HC0[0]:SC0[0]:HE1:SE1] takes: [14046.102850] (&xfs_dir_ilock_class){++++?+}, at: [<791a04bb>] xfs_ilock+0xff/0x16a [14046.102850] {RECLAIM_FS-ON-W} state was registered at: [14046.102850] [<7904cdb1>] mark_held_locks+0x81/0xe7 [14046.102850] [<7904d390>] lockdep_trace_alloc+0x5c/0xb4 [14046.102850] [<790c2c28>] kmem_cache_alloc_trace+0x2b/0x11e [14046.102850] [<790ba7f4>] vm_map_ram+0x119/0x3e6 [14046.102850] [<7914e124>] _xfs_buf_map_pages+0x5b/0xcf [14046.102850] [<7914ed74>] xfs_buf_get_map+0x67/0x13f [14046.102850] [<7917506f>] xfs_attr_rmtval_set+0x396/0x4d5 [14046.102850] [<7916e8bb>] xfs_attr_leaf_addname+0x18f/0x37d [14046.102850] [<7916ed9e>] xfs_attr_set_int+0x2f5/0x3e8 [14046.102850] [<7916eefc>] xfs_attr_set+0x6b/0x74 [14046.102850] [<79168355>] xfs_xattr_set+0x61/0x81 [14046.102850] [<790e5b10>] generic_setxattr+0x59/0x68 [14046.102850] [<790e4c06>] __vfs_setxattr_noperm+0x58/0xce [14046.102850] [<790e4d0a>] vfs_setxattr+0x8e/0x92 [14046.102850] [<790e4ddd>] setxattr+0xcf/0x159 [14046.102850] [<790e5423>] SyS_lsetxattr+0x88/0xbb [14046.102850] [<79268438>] sysenter_do_call+0x12/0x36 Now, we can't completely remove these traces - mainly because vm_map_ram() will do GFP_KERNEL allocation and that generates the above warning before we get into the reclaim code, but we can turn them all into false positive warnings. To do that, use the method that DM and other IO context code uses to avoid this problem: there is a process flag to tell memory reclaim not to do IO that we can set appropriately. That prevents GFP_KERNEL context reclaim being done from deep inside the vmalloc code in places we can't directly pass a GFP_NOFS context to. That interface has a pair of wrapper functions: memalloc_noio_save() and memalloc_noio_restore(). Adding them around vm_map_ram and the vzalloc call in kmem_alloc_large() will prevent deadlocks and most lockdep reports for this issue. Also, convert the vzalloc() call in kmem_alloc_large() to use __vmalloc() so that we can pass the correct gfp context to the data page allocation routine inside __vmalloc() so that it is clear that GFP_NOFS context is important to this vmalloc call. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/kmem.c | 21 ++++++++++++++++++++- fs/xfs/xfs_buf.c | 11 +++++++++++ 2 files changed, 31 insertions(+), 1 deletion(-) diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c index 66a36be..844e288 100644 --- a/fs/xfs/kmem.c +++ b/fs/xfs/kmem.c @@ -65,12 +65,31 @@ kmem_alloc(size_t size, xfs_km_flags_t flags) void * kmem_zalloc_large(size_t size, xfs_km_flags_t flags) { + unsigned noio_flag = 0; void *ptr; + gfp_t lflags; ptr = kmem_zalloc(size, flags | KM_MAYFAIL); if (ptr) return ptr; - return vzalloc(size); + + /* + * __vmalloc() will allocate data pages and auxillary structures (e.g. + * pagetables) with GFP_KERNEL, yet we may be under GFP_NOFS context + * here. Hence we need to tell memory reclaim that we are in such a + * context via PF_MEMALLOC_NOIO to prevent memory reclaim re-entering + * the filesystem here and potentially deadlocking. + */ + if ((current->flags & PF_FSTRANS) || (flags & KM_NOFS)) + noio_flag = memalloc_noio_save(); + + lflags = kmem_flags_convert(flags); + ptr = __vmalloc(size, lflags | __GFP_HIGHMEM | __GFP_ZERO, PAGE_KERNEL); + + if ((current->flags & PF_FSTRANS) || (flags & KM_NOFS)) + memalloc_noio_restore(noio_flag); + + return ptr; } void diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index 9c061ef..107f2fd 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -396,7 +396,17 @@ _xfs_buf_map_pages( bp->b_addr = NULL; } else { int retried = 0; + unsigned noio_flag; + /* + * vm_map_ram() will allocate auxillary structures (e.g. + * pagetables) with GFP_KERNEL, yet we are likely to be under + * GFP_NOFS context here. Hence we need to tell memory reclaim + * that we are in such a context via PF_MEMALLOC_NOIO to prevent + * memory reclaim re-entering the filesystem here and + * potentially deadlocking. + */ + noio_flag = memalloc_noio_save(); do { bp->b_addr = vm_map_ram(bp->b_pages, bp->b_page_count, -1, PAGE_KERNEL); @@ -404,6 +414,7 @@ _xfs_buf_map_pages( break; vm_unmap_aliases(); } while (retried++ <= 1); + memalloc_noio_restore(noio_flag); if (!bp->b_addr) return -ENOMEM; -- 1.9.0 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] xfs: use NOIO contexts for vm_map_ram 2014-03-03 5:39 ` [PATCH 2/2] xfs: use NOIO contexts for vm_map_ram Dave Chinner @ 2014-03-05 16:58 ` Christoph Hellwig 0 siblings, 0 replies; 14+ messages in thread From: Christoph Hellwig @ 2014-03-05 16:58 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs Ugly, but given that it seems like we can't get the flags passing fixed properly: Reviewed-by: Christoph Hellwig <hch@lst.de> _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2014-03-05 16:58 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-03-03 5:39 [PATCH 0/2] xfs: fixes for for-next Dave Chinner 2014-03-03 5:39 ` [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace Dave Chinner 2014-03-03 17:34 ` Eric Sandeen 2014-03-03 22:13 ` Dave Chinner 2014-03-03 22:20 ` Eric Sandeen 2014-03-03 22:31 ` Dave Chinner 2014-03-03 17:44 ` Brian Foster 2014-03-03 22:29 ` Dave Chinner 2014-03-04 1:20 ` Brian Foster 2014-03-04 4:32 ` Dave Chinner 2014-03-04 14:01 ` Brian Foster 2014-03-05 15:48 ` Brian Foster 2014-03-03 5:39 ` [PATCH 2/2] xfs: use NOIO contexts for vm_map_ram Dave Chinner 2014-03-05 16:58 ` Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).