linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] xfs: fixes for for-next
@ 2014-03-03  5:39 Dave Chinner
  2014-03-03  5:39 ` [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace Dave Chinner
  2014-03-03  5:39 ` [PATCH 2/2] xfs: use NOIO contexts for vm_map_ram Dave Chinner
  0 siblings, 2 replies; 14+ messages in thread
From: Dave Chinner @ 2014-03-03  5:39 UTC (permalink / raw)
  To: xfs

Hi folks,

A couple of quick fixes - the new verifier code leaks EFSBADCRC to
userspace, so any code that is checking for EFSCORRUPTED is not
going to be working correctly. The first patch fixes this
regression.

The second patch fixes a long standing issues with vmapped RAM
potentially causing deadlocks due to GFP_KERNEL allocations being
done for page tables inside GFP_NOFS contexts. This will fix many of
the lockdep reports about it, but won't get rid of them all. It
will, however, guarantee that we won't deadlock due to the problem
which is a big step up.

Cheers,

Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace
  2014-03-03  5:39 [PATCH 0/2] xfs: fixes for for-next Dave Chinner
@ 2014-03-03  5:39 ` Dave Chinner
  2014-03-03 17:34   ` Eric Sandeen
                     ` (2 more replies)
  2014-03-03  5:39 ` [PATCH 2/2] xfs: use NOIO contexts for vm_map_ram Dave Chinner
  1 sibling, 3 replies; 14+ messages in thread
From: Dave Chinner @ 2014-03-03  5:39 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

While the verifier reoutines may return EFSBADCRC when a buffer ahs
a bad CRC, we need to translate that to EFSCORRUPTED so that the
higher layers treat the error appropriately and so we return a
consistent error to userspace. This fixes a xfs/005 regression.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_mount.c     |  3 +++
 fs/xfs/xfs_symlink.c   |  4 ++++
 fs/xfs/xfs_trans_buf.c | 11 +++++++++++
 3 files changed, 18 insertions(+)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index f96c056..993cb19 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -314,6 +314,9 @@ reread:
 		error = bp->b_error;
 		if (loud)
 			xfs_warn(mp, "SB validate failed with error %d.", error);
+		/* bad CRC means corrupted metadata */
+		if (error == EFSBADCRC)
+			error = EFSCORRUPTED;
 		goto release_buf;
 	}
 
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 14e58f2..5fda189 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -80,6 +80,10 @@ xfs_readlink_bmap(
 		if (error) {
 			xfs_buf_ioerror_alert(bp, __func__);
 			xfs_buf_relse(bp);
+
+			/* bad CRC means corrupted metadata */
+			if (error == EFSBADCRC)
+				error = EFSCORRUPTED;
 			goto out;
 		}
 		byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt);
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 647b6f1..b8eef05 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -275,6 +275,10 @@ xfs_trans_read_buf_map(
 			XFS_BUF_UNDONE(bp);
 			xfs_buf_stale(bp);
 			xfs_buf_relse(bp);
+
+			/* bad CRC means corrupted metadata */
+			if (error == EFSBADCRC)
+				error = EFSCORRUPTED;
 			return error;
 		}
 #ifdef DEBUG
@@ -338,6 +342,9 @@ xfs_trans_read_buf_map(
 				if (tp->t_flags & XFS_TRANS_DIRTY)
 					xfs_force_shutdown(tp->t_mountp,
 							SHUTDOWN_META_IO_ERROR);
+				/* bad CRC means corrupted metadata */
+				if (error == EFSBADCRC)
+					error = EFSCORRUPTED;
 				return error;
 			}
 		}
@@ -375,6 +382,10 @@ xfs_trans_read_buf_map(
 		if (tp->t_flags & XFS_TRANS_DIRTY)
 			xfs_force_shutdown(tp->t_mountp, SHUTDOWN_META_IO_ERROR);
 		xfs_buf_relse(bp);
+
+		/* bad CRC means corrupted metadata */
+		if (error == EFSBADCRC)
+			error = EFSCORRUPTED;
 		return error;
 	}
 #ifdef DEBUG
-- 
1.9.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] xfs: use NOIO contexts for vm_map_ram
  2014-03-03  5:39 [PATCH 0/2] xfs: fixes for for-next Dave Chinner
  2014-03-03  5:39 ` [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace Dave Chinner
@ 2014-03-03  5:39 ` Dave Chinner
  2014-03-05 16:58   ` Christoph Hellwig
  1 sibling, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2014-03-03  5:39 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When we map pages in the buffer cache, we can do so in GFP_NOFS
contexts. However, the vmap interfaces do not provide any method of
communicating this information to memory reclaim, and hence we get
lockdep complaining about it regularly and occassionally see hangs
that may be vmap related reclaim deadlocks. We can also see these
same problems from anywhere where we use vmalloc for a large buffer
(e.g. attribute code) inside a transaction context.

A typical lockdep report shows up as a reclaim state warning like so:

[14046.101458] =================================
[14046.102850] [ INFO: inconsistent lock state ]
[14046.102850] 3.14.0-rc4+ #2 Not tainted
[14046.102850] ---------------------------------
[14046.102850] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[14046.102850] kswapd0/14 [HC0[0]:SC0[0]:HE1:SE1] takes:
[14046.102850]  (&xfs_dir_ilock_class){++++?+}, at: [<791a04bb>] xfs_ilock+0xff/0x16a
[14046.102850] {RECLAIM_FS-ON-W} state was registered at:
[14046.102850]   [<7904cdb1>] mark_held_locks+0x81/0xe7
[14046.102850]   [<7904d390>] lockdep_trace_alloc+0x5c/0xb4
[14046.102850]   [<790c2c28>] kmem_cache_alloc_trace+0x2b/0x11e
[14046.102850]   [<790ba7f4>] vm_map_ram+0x119/0x3e6
[14046.102850]   [<7914e124>] _xfs_buf_map_pages+0x5b/0xcf
[14046.102850]   [<7914ed74>] xfs_buf_get_map+0x67/0x13f
[14046.102850]   [<7917506f>] xfs_attr_rmtval_set+0x396/0x4d5
[14046.102850]   [<7916e8bb>] xfs_attr_leaf_addname+0x18f/0x37d
[14046.102850]   [<7916ed9e>] xfs_attr_set_int+0x2f5/0x3e8
[14046.102850]   [<7916eefc>] xfs_attr_set+0x6b/0x74
[14046.102850]   [<79168355>] xfs_xattr_set+0x61/0x81
[14046.102850]   [<790e5b10>] generic_setxattr+0x59/0x68
[14046.102850]   [<790e4c06>] __vfs_setxattr_noperm+0x58/0xce
[14046.102850]   [<790e4d0a>] vfs_setxattr+0x8e/0x92
[14046.102850]   [<790e4ddd>] setxattr+0xcf/0x159
[14046.102850]   [<790e5423>] SyS_lsetxattr+0x88/0xbb
[14046.102850]   [<79268438>] sysenter_do_call+0x12/0x36

Now, we can't completely remove these traces - mainly because
vm_map_ram() will do GFP_KERNEL allocation and that generates the
above warning before we get into the reclaim code, but we can turn
them all into false positive warnings.

To do that, use the method that DM and other IO context code uses to
avoid this problem: there is a process flag to tell memory reclaim
not to do IO that we can set appropriately. That prevents GFP_KERNEL
context reclaim being done from deep inside the vmalloc code in
places we can't directly pass a GFP_NOFS context to. That interface
has a pair of wrapper functions: memalloc_noio_save() and
memalloc_noio_restore().

Adding them around vm_map_ram and the vzalloc call in
kmem_alloc_large() will prevent deadlocks and most lockdep reports
for this issue. Also, convert the vzalloc() call in
kmem_alloc_large() to use __vmalloc() so that we can pass the
correct gfp context to the data page allocation routine inside
__vmalloc() so that it is clear that GFP_NOFS context is important
to this vmalloc call.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/kmem.c    | 21 ++++++++++++++++++++-
 fs/xfs/xfs_buf.c | 11 +++++++++++
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
index 66a36be..844e288 100644
--- a/fs/xfs/kmem.c
+++ b/fs/xfs/kmem.c
@@ -65,12 +65,31 @@ kmem_alloc(size_t size, xfs_km_flags_t flags)
 void *
 kmem_zalloc_large(size_t size, xfs_km_flags_t flags)
 {
+	unsigned noio_flag = 0;
 	void	*ptr;
+	gfp_t	lflags;
 
 	ptr = kmem_zalloc(size, flags | KM_MAYFAIL);
 	if (ptr)
 		return ptr;
-	return vzalloc(size);
+
+	/*
+	 * __vmalloc() will allocate data pages and auxillary structures (e.g.
+	 * pagetables) with GFP_KERNEL, yet we may be under GFP_NOFS context
+	 * here. Hence we need to tell memory reclaim that we are in such a
+	 * context via PF_MEMALLOC_NOIO to prevent memory reclaim re-entering
+	 * the filesystem here and potentially deadlocking.
+	 */
+	if ((current->flags & PF_FSTRANS) || (flags & KM_NOFS))
+		noio_flag = memalloc_noio_save();
+
+	lflags = kmem_flags_convert(flags);
+	ptr = __vmalloc(size, lflags | __GFP_HIGHMEM | __GFP_ZERO, PAGE_KERNEL);
+
+	if ((current->flags & PF_FSTRANS) || (flags & KM_NOFS))
+		memalloc_noio_restore(noio_flag);
+
+	return ptr;
 }
 
 void
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 9c061ef..107f2fd 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -396,7 +396,17 @@ _xfs_buf_map_pages(
 		bp->b_addr = NULL;
 	} else {
 		int retried = 0;
+		unsigned noio_flag;
 
+		/*
+		 * vm_map_ram() will allocate auxillary structures (e.g.
+		 * pagetables) with GFP_KERNEL, yet we are likely to be under
+		 * GFP_NOFS context here. Hence we need to tell memory reclaim
+		 * that we are in such a context via PF_MEMALLOC_NOIO to prevent
+		 * memory reclaim re-entering the filesystem here and
+		 * potentially deadlocking.
+		 */
+		noio_flag = memalloc_noio_save();
 		do {
 			bp->b_addr = vm_map_ram(bp->b_pages, bp->b_page_count,
 						-1, PAGE_KERNEL);
@@ -404,6 +414,7 @@ _xfs_buf_map_pages(
 				break;
 			vm_unmap_aliases();
 		} while (retried++ <= 1);
+		memalloc_noio_restore(noio_flag);
 
 		if (!bp->b_addr)
 			return -ENOMEM;
-- 
1.9.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace
  2014-03-03  5:39 ` [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace Dave Chinner
@ 2014-03-03 17:34   ` Eric Sandeen
  2014-03-03 22:13     ` Dave Chinner
  2014-03-03 17:44   ` Brian Foster
  2014-03-05 15:48   ` Brian Foster
  2 siblings, 1 reply; 14+ messages in thread
From: Eric Sandeen @ 2014-03-03 17:34 UTC (permalink / raw)
  To: Dave Chinner, xfs

On 3/2/14, 11:39 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> While the verifier reoutines may return EFSBADCRC when a buffer ahs
> a bad CRC, we need to translate that to EFSCORRUPTED so that the
> higher layers treat the error appropriately and so we return a
> consistent error to userspace. This fixes a xfs/005 regression.

Can you say a little more about the philosophy here?

xfs/005 regresses because it expects "structure needs cleaning"

So if we instead return our (icky) CRC error code, we get something else.

But it is truly a different root cause.

So the goal is to NEVER leak EFSBADCRC to userspace?  Maybe a comment
above that error definition would help document that.

And I'm bit worried that we'll leak more in the future if things changed,
or if things got missed here.  Everything you have here looks fine, but
it's not obvious that every path has been caught; it seems a bit random.

I know we _just_ merged my "differentiator" patches, but I wonder if
it would be better to add XFS_BSTATE_BADCRC to b_state or some other 
field, and go back to always assigning EFSCORRUPTED.  What do you think?

When I wrote those I wasn't thinking about keeping it all internal
to the filesystem.

-Eric

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_mount.c     |  3 +++
>  fs/xfs/xfs_symlink.c   |  4 ++++
>  fs/xfs/xfs_trans_buf.c | 11 +++++++++++
>  3 files changed, 18 insertions(+)
> 
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index f96c056..993cb19 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -314,6 +314,9 @@ reread:
>  		error = bp->b_error;
>  		if (loud)
>  			xfs_warn(mp, "SB validate failed with error %d.", error);
> +		/* bad CRC means corrupted metadata */
> +		if (error == EFSBADCRC)
> +			error = EFSCORRUPTED;
>  		goto release_buf;
>  	}
>  
> diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
> index 14e58f2..5fda189 100644
> --- a/fs/xfs/xfs_symlink.c
> +++ b/fs/xfs/xfs_symlink.c
> @@ -80,6 +80,10 @@ xfs_readlink_bmap(
>  		if (error) {
>  			xfs_buf_ioerror_alert(bp, __func__);
>  			xfs_buf_relse(bp);
> +
> +			/* bad CRC means corrupted metadata */
> +			if (error == EFSBADCRC)
> +				error = EFSCORRUPTED;
>  			goto out;
>  		}
>  		byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt);
> diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
> index 647b6f1..b8eef05 100644
> --- a/fs/xfs/xfs_trans_buf.c
> +++ b/fs/xfs/xfs_trans_buf.c
> @@ -275,6 +275,10 @@ xfs_trans_read_buf_map(
>  			XFS_BUF_UNDONE(bp);
>  			xfs_buf_stale(bp);
>  			xfs_buf_relse(bp);
> +
> +			/* bad CRC means corrupted metadata */
> +			if (error == EFSBADCRC)
> +				error = EFSCORRUPTED;
>  			return error;
>  		}
>  #ifdef DEBUG
> @@ -338,6 +342,9 @@ xfs_trans_read_buf_map(
>  				if (tp->t_flags & XFS_TRANS_DIRTY)
>  					xfs_force_shutdown(tp->t_mountp,
>  							SHUTDOWN_META_IO_ERROR);
> +				/* bad CRC means corrupted metadata */
> +				if (error == EFSBADCRC)
> +					error = EFSCORRUPTED;
>  				return error;
>  			}
>  		}
> @@ -375,6 +382,10 @@ xfs_trans_read_buf_map(
>  		if (tp->t_flags & XFS_TRANS_DIRTY)
>  			xfs_force_shutdown(tp->t_mountp, SHUTDOWN_META_IO_ERROR);
>  		xfs_buf_relse(bp);
> +
> +		/* bad CRC means corrupted metadata */
> +		if (error == EFSBADCRC)
> +			error = EFSCORRUPTED;
>  		return error;
>  	}
>  #ifdef DEBUG
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace
  2014-03-03  5:39 ` [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace Dave Chinner
  2014-03-03 17:34   ` Eric Sandeen
@ 2014-03-03 17:44   ` Brian Foster
  2014-03-03 22:29     ` Dave Chinner
  2014-03-05 15:48   ` Brian Foster
  2 siblings, 1 reply; 14+ messages in thread
From: Brian Foster @ 2014-03-03 17:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, Mar 03, 2014 at 04:39:53PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> While the verifier reoutines may return EFSBADCRC when a buffer ahs
> a bad CRC, we need to translate that to EFSCORRUPTED so that the
> higher layers treat the error appropriately and so we return a
> consistent error to userspace. This fixes a xfs/005 regression.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

This change looks Ok to me, but when I start looking through the users
of bp->b_error, I see examples like xfs_dir3_data_read() being called in
xfs_dir2_leaf_addname() where it looks like an error could bubble all
the way up to xfs_vn_mknod() and its callers.

If the intent is to use EFSBADCRC as an internal-only error to
differentiate corruption from crc failure, why not push this more
closely to the boundaries that we have already defined? For example, we
already convert positive errnos to negative at the internal/external
boundaries. Could we convert those to use some kind of
XFS_USERSPACE_ERROR(error) macro/helper that converts errors
appropriately?

Another thought could be to reconsider whether we still need some of
these extra warnings, as in the xfs_mount.c hunk below, now that we have
the generic xfs_verifier_error() messaging. E.g., if we could remove
those, perhaps we could snub out EFSBADCRC in or around the verifier
after it makes a distinction.

Brian

>  fs/xfs/xfs_mount.c     |  3 +++
>  fs/xfs/xfs_symlink.c   |  4 ++++
>  fs/xfs/xfs_trans_buf.c | 11 +++++++++++
>  3 files changed, 18 insertions(+)
> 
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index f96c056..993cb19 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -314,6 +314,9 @@ reread:
>  		error = bp->b_error;
>  		if (loud)
>  			xfs_warn(mp, "SB validate failed with error %d.", error);
> +		/* bad CRC means corrupted metadata */
> +		if (error == EFSBADCRC)
> +			error = EFSCORRUPTED;
>  		goto release_buf;
>  	}
>  
> diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
> index 14e58f2..5fda189 100644
> --- a/fs/xfs/xfs_symlink.c
> +++ b/fs/xfs/xfs_symlink.c
> @@ -80,6 +80,10 @@ xfs_readlink_bmap(
>  		if (error) {
>  			xfs_buf_ioerror_alert(bp, __func__);
>  			xfs_buf_relse(bp);
> +
> +			/* bad CRC means corrupted metadata */
> +			if (error == EFSBADCRC)
> +				error = EFSCORRUPTED;
>  			goto out;
>  		}
>  		byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt);
> diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
> index 647b6f1..b8eef05 100644
> --- a/fs/xfs/xfs_trans_buf.c
> +++ b/fs/xfs/xfs_trans_buf.c
> @@ -275,6 +275,10 @@ xfs_trans_read_buf_map(
>  			XFS_BUF_UNDONE(bp);
>  			xfs_buf_stale(bp);
>  			xfs_buf_relse(bp);
> +
> +			/* bad CRC means corrupted metadata */
> +			if (error == EFSBADCRC)
> +				error = EFSCORRUPTED;
>  			return error;
>  		}
>  #ifdef DEBUG
> @@ -338,6 +342,9 @@ xfs_trans_read_buf_map(
>  				if (tp->t_flags & XFS_TRANS_DIRTY)
>  					xfs_force_shutdown(tp->t_mountp,
>  							SHUTDOWN_META_IO_ERROR);
> +				/* bad CRC means corrupted metadata */
> +				if (error == EFSBADCRC)
> +					error = EFSCORRUPTED;
>  				return error;
>  			}
>  		}
> @@ -375,6 +382,10 @@ xfs_trans_read_buf_map(
>  		if (tp->t_flags & XFS_TRANS_DIRTY)
>  			xfs_force_shutdown(tp->t_mountp, SHUTDOWN_META_IO_ERROR);
>  		xfs_buf_relse(bp);
> +
> +		/* bad CRC means corrupted metadata */
> +		if (error == EFSBADCRC)
> +			error = EFSCORRUPTED;
>  		return error;
>  	}
>  #ifdef DEBUG
> -- 
> 1.9.0
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace
  2014-03-03 17:34   ` Eric Sandeen
@ 2014-03-03 22:13     ` Dave Chinner
  2014-03-03 22:20       ` Eric Sandeen
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2014-03-03 22:13 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

On Mon, Mar 03, 2014 at 11:34:35AM -0600, Eric Sandeen wrote:
> On 3/2/14, 11:39 PM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > While the verifier reoutines may return EFSBADCRC when a buffer ahs
> > a bad CRC, we need to translate that to EFSCORRUPTED so that the
> > higher layers treat the error appropriately and so we return a
> > consistent error to userspace. This fixes a xfs/005 regression.
> 
> Can you say a little more about the philosophy here?
> 
> xfs/005 regresses because it expects "structure needs cleaning"
> 
> So if we instead return our (icky) CRC error code, we get something else.
> 
> But it is truly a different root cause.
> 
> So the goal is to NEVER leak EFSBADCRC to userspace?  Maybe a comment
> above that error definition would help document that.

Not permanently.  At the moment, none of the code handles it
correctly, and the leak to userspace is just a symptom that tells us
we got somethign wrong. We have plenty of places where we check for
EFSCORRUPTED and do something special, but if we get EFSBADCRC
instead it will do the wrong thing....

> And I'm bit worried that we'll leak more in the future if things changed,
> or if things got missed here.  Everything you have here looks fine, but
> it's not obvious that every path has been caught; it seems a bit random.

It's not random. It's buffer reads that matter, and I
checked all the calls to xfs_buf_read, xfs_buf_read_map,
xfs_trans_read_buf and xfs_trans_read_buf. There aren't any other
read interfaces that use verifiers, and so nothing else can return
EFSBADCRC. For the log recovery cases, the buffer reads don' use
verifiers, and those that do won't return EFSBADCRC (e.g. inode
buffers).

> I know we _just_ merged my "differentiator" patches, but I wonder if
> it would be better to add XFS_BSTATE_BADCRC to b_state or some other 
> field, and go back to always assigning EFSCORRUPTED.  What do you think?

It's just the first layer of adding differentiating support. We've
just put the mechanism in place to do the differentiation because we
need it for *userspace functionality* before we need it for
in-kernel functionality. We put it in the kernel because it has
value to us developers to indicate what type of corruption error was
detected in the dmesg output. We can't however, do everything at
once, so for the moment the kernel code needs to translate it back
to something the higher layers understand and treat correctly.

> When I wrote those I wasn't thinking about keeping it all internal
> to the filesystem.

Only for the moment, until there's code in the kernel that makes it
a meaningfully different error.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace
  2014-03-03 22:13     ` Dave Chinner
@ 2014-03-03 22:20       ` Eric Sandeen
  2014-03-03 22:31         ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Sandeen @ 2014-03-03 22:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 3/3/14, 4:13 PM, Dave Chinner wrote:
> On Mon, Mar 03, 2014 at 11:34:35AM -0600, Eric Sandeen wrote:
>> On 3/2/14, 11:39 PM, Dave Chinner wrote:
>>> From: Dave Chinner <dchinner@redhat.com>
>>>
>>> While the verifier reoutines may return EFSBADCRC when a buffer ahs
>>> a bad CRC, we need to translate that to EFSCORRUPTED so that the
>>> higher layers treat the error appropriately and so we return a
>>> consistent error to userspace. This fixes a xfs/005 regression.
>>
>> Can you say a little more about the philosophy here?
>>
>> xfs/005 regresses because it expects "structure needs cleaning"
>>
>> So if we instead return our (icky) CRC error code, we get something else.
>>
>> But it is truly a different root cause.
>>
>> So the goal is to NEVER leak EFSBADCRC to userspace?  Maybe a comment
>> above that error definition would help document that.
> 
> Not permanently.  At the moment, none of the code handles it
> correctly, and the leak to userspace is just a symptom that tells us
> we got somethign wrong. We have plenty of places where we check for
> EFSCORRUPTED and do something special, but if we get EFSBADCRC
> instead it will do the wrong thing....
> 
>> And I'm bit worried that we'll leak more in the future if things changed,
>> or if things got missed here.  Everything you have here looks fine, but
>> it's not obvious that every path has been caught; it seems a bit random.
> 
> It's not random. It's buffer reads that matter, and I
> checked all the calls to xfs_buf_read, xfs_buf_read_map,
> xfs_trans_read_buf and xfs_trans_read_buf. There aren't any other
> read interfaces that use verifiers, and so nothing else can return
> EFSBADCRC. For the log recovery cases, the buffer reads don' use
> verifiers, and those that do won't return EFSBADCRC (e.g. inode
> buffers).
> 
>> I know we _just_ merged my "differentiator" patches, but I wonder if
>> it would be better to add XFS_BSTATE_BADCRC to b_state or some other 
>> field, and go back to always assigning EFSCORRUPTED.  What do you think?
> 
> It's just the first layer of adding differentiating support. We've
> just put the mechanism in place to do the differentiation because we
> need it for *userspace functionality* before we need it for
> in-kernel functionality. We put it in the kernel because it has
> value to us developers to indicate what type of corruption error was
> detected in the dmesg output. We can't however, do everything at
> once, so for the moment the kernel code needs to translate it back
> to something the higher layers understand and treat correctly.
> 
>> When I wrote those I wasn't thinking about keeping it all internal
>> to the filesystem.
> 
> Only for the moment, until there's code in the kernel that makes it
> a meaningfully different error.

Ok, thanks.  Modulo Brian's question about other paths, what is here
so far looks ok to me, then.  A commit message that indicates that
this is somewhat temporary might be in order?

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace
  2014-03-03 17:44   ` Brian Foster
@ 2014-03-03 22:29     ` Dave Chinner
  2014-03-04  1:20       ` Brian Foster
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2014-03-03 22:29 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Mon, Mar 03, 2014 at 12:44:26PM -0500, Brian Foster wrote:
> On Mon, Mar 03, 2014 at 04:39:53PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > While the verifier reoutines may return EFSBADCRC when a buffer ahs
> > a bad CRC, we need to translate that to EFSCORRUPTED so that the
> > higher layers treat the error appropriately and so we return a
> > consistent error to userspace. This fixes a xfs/005 regression.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> 
> This change looks Ok to me, but when I start looking through the users
> of bp->b_error, I see examples like xfs_dir3_data_read() being called in
> xfs_dir2_leaf_addname() where it looks like an error could bubble all
> the way up to xfs_vn_mknod() and its callers.

Sure, but:

xfs_dir3_data_read
  xfs_da_read_buf
    xfs_trans_read_buf_map

Which means the patch prevents the EFSBADCRC leaking back out
through that path because it converts it in xfs_trans_read_buf_map.

> If the intent is to use EFSBADCRC as an internal-only error to
> differentiate corruption from crc failure, why not push this more
> closely to the boundaries that we have already defined? For example, we
> already convert positive errnos to negative at the internal/external
> boundaries. Could we convert those to use some kind of
> XFS_USERSPACE_ERROR(error) macro/helper that converts errors
> appropriately?

That doesn't solve the problem needing an error conversion layer in
the first place. The long term goal is to remove the error
conversions in XFS by converting the core code to the same error
passing conventions as the rest of the kernel code. We manage to
screw the negation up fairly regularly because it is convoluted and
we cal into generic code that returns negative errors from the core
that returns positive errors in lots of places. The conversion
surface is just too large to manage sanely.

> Another thought could be to reconsider whether we still need some of
> these extra warnings, as in the xfs_mount.c hunk below, now that we have
> the generic xfs_verifier_error() messaging. E.g., if we could remove
> those, perhaps we could snub out EFSBADCRC in or around the verifier
> after it makes a distinction.

Redundant errors aren't s significant problem. It's the lack of
meaningful error messages that are much more of an issue. We get
more meaningful error messages as a result of the EFSBADCRC changes
that have been made, but for the moment that error simply means
EFSCORRUPTED to the higher layers. Hence the translation back to
EFSCORRUPTED at the (low) layers where the error no longer has
a distinct meaning.

As we add more functionality, EFSBADCRC will become more meaningful
and so get propagated higher into the kernel code. But for now, it
should remain an error that doesn't escape the lower layers...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace
  2014-03-03 22:20       ` Eric Sandeen
@ 2014-03-03 22:31         ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2014-03-03 22:31 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

On Mon, Mar 03, 2014 at 04:20:34PM -0600, Eric Sandeen wrote:
> On 3/3/14, 4:13 PM, Dave Chinner wrote:
> > On Mon, Mar 03, 2014 at 11:34:35AM -0600, Eric Sandeen wrote:
> >> On 3/2/14, 11:39 PM, Dave Chinner wrote:
> >>> From: Dave Chinner <dchinner@redhat.com>
> >>>
> >>> While the verifier reoutines may return EFSBADCRC when a buffer ahs
> >>> a bad CRC, we need to translate that to EFSCORRUPTED so that the
> >>> higher layers treat the error appropriately and so we return a
> >>> consistent error to userspace. This fixes a xfs/005 regression.
> >>
> >> Can you say a little more about the philosophy here?
> >>
> >> xfs/005 regresses because it expects "structure needs cleaning"
> >>
> >> So if we instead return our (icky) CRC error code, we get something else.
> >>
> >> But it is truly a different root cause.
> >>
> >> So the goal is to NEVER leak EFSBADCRC to userspace?  Maybe a comment
> >> above that error definition would help document that.
> > 
> > Not permanently.  At the moment, none of the code handles it
> > correctly, and the leak to userspace is just a symptom that tells us
> > we got somethign wrong. We have plenty of places where we check for
> > EFSCORRUPTED and do something special, but if we get EFSBADCRC
> > instead it will do the wrong thing....
> > 
> >> And I'm bit worried that we'll leak more in the future if things changed,
> >> or if things got missed here.  Everything you have here looks fine, but
> >> it's not obvious that every path has been caught; it seems a bit random.
> > 
> > It's not random. It's buffer reads that matter, and I
> > checked all the calls to xfs_buf_read, xfs_buf_read_map,
> > xfs_trans_read_buf and xfs_trans_read_buf. There aren't any other
> > read interfaces that use verifiers, and so nothing else can return
> > EFSBADCRC. For the log recovery cases, the buffer reads don' use
> > verifiers, and those that do won't return EFSBADCRC (e.g. inode
> > buffers).
> > 
> >> I know we _just_ merged my "differentiator" patches, but I wonder if
> >> it would be better to add XFS_BSTATE_BADCRC to b_state or some other 
> >> field, and go back to always assigning EFSCORRUPTED.  What do you think?
> > 
> > It's just the first layer of adding differentiating support. We've
> > just put the mechanism in place to do the differentiation because we
> > need it for *userspace functionality* before we need it for
> > in-kernel functionality. We put it in the kernel because it has
> > value to us developers to indicate what type of corruption error was
> > detected in the dmesg output. We can't however, do everything at
> > once, so for the moment the kernel code needs to translate it back
> > to something the higher layers understand and treat correctly.
> > 
> >> When I wrote those I wasn't thinking about keeping it all internal
> >> to the filesystem.
> > 
> > Only for the moment, until there's code in the kernel that makes it
> > a meaningfully different error.
> 
> Ok, thanks.  Modulo Brian's question about other paths, what is here
> so far looks ok to me, then.  A commit message that indicates that
> this is somewhat temporary might be in order?

Sure, I can improve the commit message by including a summary of
this discussion. ;)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace
  2014-03-03 22:29     ` Dave Chinner
@ 2014-03-04  1:20       ` Brian Foster
  2014-03-04  4:32         ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Brian Foster @ 2014-03-04  1:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Mar 04, 2014 at 09:29:46AM +1100, Dave Chinner wrote:
> On Mon, Mar 03, 2014 at 12:44:26PM -0500, Brian Foster wrote:
> > On Mon, Mar 03, 2014 at 04:39:53PM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > While the verifier reoutines may return EFSBADCRC when a buffer ahs
> > > a bad CRC, we need to translate that to EFSCORRUPTED so that the
> > > higher layers treat the error appropriately and so we return a
> > > consistent error to userspace. This fixes a xfs/005 regression.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > 
> > This change looks Ok to me, but when I start looking through the users
> > of bp->b_error, I see examples like xfs_dir3_data_read() being called in
> > xfs_dir2_leaf_addname() where it looks like an error could bubble all
> > the way up to xfs_vn_mknod() and its callers.
> 
> Sure, but:
> 
> xfs_dir3_data_read
>   xfs_da_read_buf
>     xfs_trans_read_buf_map
> 
> Which means the patch prevents the EFSBADCRC leaking back out
> through that path because it converts it in xfs_trans_read_buf_map.
> 

Ok, I see. FWIW, I was also trolling through some of the log recovery
code and it appears that code uses b_ops for write verification purposes
only (i.e., crc generation I suspect), correct?

> > If the intent is to use EFSBADCRC as an internal-only error to
> > differentiate corruption from crc failure, why not push this more
> > closely to the boundaries that we have already defined? For example, we
> > already convert positive errnos to negative at the internal/external
> > boundaries. Could we convert those to use some kind of
> > XFS_USERSPACE_ERROR(error) macro/helper that converts errors
> > appropriately?
> 
> That doesn't solve the problem needing an error conversion layer in
> the first place. The long term goal is to remove the error
> conversions in XFS by converting the core code to the same error
> passing conventions as the rest of the kernel code. We manage to
> screw the negation up fairly regularly because it is convoluted and
> we cal into generic code that returns negative errors from the core
> that returns positive errors in lots of places. The conversion
> surface is just too large to manage sanely.
> 

Well it wasn't clear that the error conversion layer was a problem that
itself needed solving, but fair enough. ;) If the objective is to move
away from the positive/negative business to something more "kernel
native," then building more infrastructure around that is probably not
the way to go, unless that was actually part of the changeover strategy
(e.g., if we happened to use something that conditionally negated error
values to support incremental codepath conversions... as a semi-random
thought).

> > Another thought could be to reconsider whether we still need some of
> > these extra warnings, as in the xfs_mount.c hunk below, now that we have
> > the generic xfs_verifier_error() messaging. E.g., if we could remove
> > those, perhaps we could snub out EFSBADCRC in or around the verifier
> > after it makes a distinction.
> 
> Redundant errors aren't s significant problem. It's the lack of
> meaningful error messages that are much more of an issue. We get
> more meaningful error messages as a result of the EFSBADCRC changes
> that have been made, but for the moment that error simply means
> EFSCORRUPTED to the higher layers. Hence the translation back to
> EFSCORRUPTED at the (low) layers where the error no longer has
> a distinct meaning.
> 

Right, the purpose is not to solve the problem of redundant errors.
Rather, if the only thing preventing the consolidation of the EFSBADCRC
trap to a single hunk are redundant errors, then it wouldn't be a
problem to remove those errors and push the EFSBADCRC trap into the
generic code for the purpose of clarity.

Brian

> As we add more functionality, EFSBADCRC will become more meaningful
> and so get propagated higher into the kernel code. But for now, it
> should remain an error that doesn't escape the lower layers...
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace
  2014-03-04  1:20       ` Brian Foster
@ 2014-03-04  4:32         ` Dave Chinner
  2014-03-04 14:01           ` Brian Foster
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2014-03-04  4:32 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Mon, Mar 03, 2014 at 08:20:57PM -0500, Brian Foster wrote:
> On Tue, Mar 04, 2014 at 09:29:46AM +1100, Dave Chinner wrote:
> > On Mon, Mar 03, 2014 at 12:44:26PM -0500, Brian Foster wrote:
> > > On Mon, Mar 03, 2014 at 04:39:53PM +1100, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > While the verifier reoutines may return EFSBADCRC when a buffer ahs
> > > > a bad CRC, we need to translate that to EFSCORRUPTED so that the
> > > > higher layers treat the error appropriately and so we return a
> > > > consistent error to userspace. This fixes a xfs/005 regression.
> > > > 
> > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > > ---
> > > 
> > > This change looks Ok to me, but when I start looking through the users
> > > of bp->b_error, I see examples like xfs_dir3_data_read() being called in
> > > xfs_dir2_leaf_addname() where it looks like an error could bubble all
> > > the way up to xfs_vn_mknod() and its callers.
> > 
> > Sure, but:
> > 
> > xfs_dir3_data_read
> >   xfs_da_read_buf
> >     xfs_trans_read_buf_map
> > 
> > Which means the patch prevents the EFSBADCRC leaking back out
> > through that path because it converts it in xfs_trans_read_buf_map.
> > 
> 
> Ok, I see. FWIW, I was also trolling through some of the log recovery
> code and it appears that code uses b_ops for write verification purposes
> only (i.e., crc generation I suspect), correct?

Yes.

> > > If the intent is to use EFSBADCRC as an internal-only error to
> > > differentiate corruption from crc failure, why not push this more
> > > closely to the boundaries that we have already defined? For example, we
> > > already convert positive errnos to negative at the internal/external
> > > boundaries. Could we convert those to use some kind of
> > > XFS_USERSPACE_ERROR(error) macro/helper that converts errors
> > > appropriately?
> > 
> > That doesn't solve the problem needing an error conversion layer in
> > the first place. The long term goal is to remove the error
> > conversions in XFS by converting the core code to the same error
> > passing conventions as the rest of the kernel code. We manage to
> > screw the negation up fairly regularly because it is convoluted and
> > we cal into generic code that returns negative errors from the core
> > that returns positive errors in lots of places. The conversion
> > surface is just too large to manage sanely.
> > 
> 
> Well it wasn't clear that the error conversion layer was a problem that
> itself needed solving, but fair enough. ;) If the objective is to move
> away from the positive/negative business to something more "kernel
> native," then building more infrastructure around that is probably not
> the way to go, unless that was actually part of the changeover strategy
> (e.g., if we happened to use something that conditionally negated error
> values to support incremental codepath conversions... as a semi-random
> thought).

I'm not sure there's any real benefit to adding infrastructure when
doing the conversion. It's a lot of little changes involving pushing
the conversion down from the top layers, combined with converting
interfaces (e.g. bmapi) as they get exposed.

Some functions will be able to lose variables as the return becomes
a tri-state value (e.g. the btree functions for looking up records);
others will be able to use IS_ERR() for pointer returns, and so on.
This will have a flow on effect on the code in the callers, and so
it really comes down to spending the time to peel back each layer
carefully.

It's a lot of work, hence the "long term goal" aspect of it. We've
been talking about doing this cleanup for several years now, but it
hasn't yet been done because it's going to take hundreds of patches
to do... ;)

> > > Another thought could be to reconsider whether we still need some of
> > > these extra warnings, as in the xfs_mount.c hunk below, now that we have
> > > the generic xfs_verifier_error() messaging. E.g., if we could remove
> > > those, perhaps we could snub out EFSBADCRC in or around the verifier
> > > after it makes a distinction.
> > 
> > Redundant errors aren't s significant problem. It's the lack of
> > meaningful error messages that are much more of an issue. We get
> > more meaningful error messages as a result of the EFSBADCRC changes
> > that have been made, but for the moment that error simply means
> > EFSCORRUPTED to the higher layers. Hence the translation back to
> > EFSCORRUPTED at the (low) layers where the error no longer has
> > a distinct meaning.
> > 
> 
> Right, the purpose is not to solve the problem of redundant errors.
> Rather, if the only thing preventing the consolidation of the EFSBADCRC
> trap to a single hunk are redundant errors, then it wouldn't be a
> problem to remove those errors and push the EFSBADCRC trap into the
> generic code for the purpose of clarity.

I don't see much point in trying to have a single trap for EFSBADCRC
because the moment we want to handle one of those cases specially we
are going to have to - as a first step - remove the single trap and
push it outwards to all the places that need it like this patch
does....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace
  2014-03-04  4:32         ` Dave Chinner
@ 2014-03-04 14:01           ` Brian Foster
  0 siblings, 0 replies; 14+ messages in thread
From: Brian Foster @ 2014-03-04 14:01 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Mar 04, 2014 at 03:32:25PM +1100, Dave Chinner wrote:
> On Mon, Mar 03, 2014 at 08:20:57PM -0500, Brian Foster wrote:
> > On Tue, Mar 04, 2014 at 09:29:46AM +1100, Dave Chinner wrote:
> > > On Mon, Mar 03, 2014 at 12:44:26PM -0500, Brian Foster wrote:
> > > > On Mon, Mar 03, 2014 at 04:39:53PM +1100, Dave Chinner wrote:
> > > > > From: Dave Chinner <dchinner@redhat.com>
> > > > > 
> > > > > While the verifier reoutines may return EFSBADCRC when a buffer ahs
> > > > > a bad CRC, we need to translate that to EFSCORRUPTED so that the
> > > > > higher layers treat the error appropriately and so we return a
> > > > > consistent error to userspace. This fixes a xfs/005 regression.
> > > > > 
> > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > > > ---
> > > > 
> > > > This change looks Ok to me, but when I start looking through the users
> > > > of bp->b_error, I see examples like xfs_dir3_data_read() being called in
> > > > xfs_dir2_leaf_addname() where it looks like an error could bubble all
> > > > the way up to xfs_vn_mknod() and its callers.
> > > 
> > > Sure, but:
> > > 
> > > xfs_dir3_data_read
> > >   xfs_da_read_buf
> > >     xfs_trans_read_buf_map
> > > 
> > > Which means the patch prevents the EFSBADCRC leaking back out
> > > through that path because it converts it in xfs_trans_read_buf_map.
> > > 
> > 
> > Ok, I see. FWIW, I was also trolling through some of the log recovery
> > code and it appears that code uses b_ops for write verification purposes
> > only (i.e., crc generation I suspect), correct?
> 
> Yes.
> 
> > > > If the intent is to use EFSBADCRC as an internal-only error to
> > > > differentiate corruption from crc failure, why not push this more
> > > > closely to the boundaries that we have already defined? For example, we
> > > > already convert positive errnos to negative at the internal/external
> > > > boundaries. Could we convert those to use some kind of
> > > > XFS_USERSPACE_ERROR(error) macro/helper that converts errors
> > > > appropriately?
> > > 
> > > That doesn't solve the problem needing an error conversion layer in
> > > the first place. The long term goal is to remove the error
> > > conversions in XFS by converting the core code to the same error
> > > passing conventions as the rest of the kernel code. We manage to
> > > screw the negation up fairly regularly because it is convoluted and
> > > we cal into generic code that returns negative errors from the core
> > > that returns positive errors in lots of places. The conversion
> > > surface is just too large to manage sanely.
> > > 
> > 
> > Well it wasn't clear that the error conversion layer was a problem that
> > itself needed solving, but fair enough. ;) If the objective is to move
> > away from the positive/negative business to something more "kernel
> > native," then building more infrastructure around that is probably not
> > the way to go, unless that was actually part of the changeover strategy
> > (e.g., if we happened to use something that conditionally negated error
> > values to support incremental codepath conversions... as a semi-random
> > thought).
> 
> I'm not sure there's any real benefit to adding infrastructure when
> doing the conversion. It's a lot of little changes involving pushing
> the conversion down from the top layers, combined with converting
> interfaces (e.g. bmapi) as they get exposed.
> 
> Some functions will be able to lose variables as the return becomes
> a tri-state value (e.g. the btree functions for looking up records);
> others will be able to use IS_ERR() for pointer returns, and so on.
> This will have a flow on effect on the code in the callers, and so
> it really comes down to spending the time to peel back each layer
> carefully.
> 
> It's a lot of work, hence the "long term goal" aspect of it. We've
> been talking about doing this cleanup for several years now, but it
> hasn't yet been done because it's going to take hundreds of patches
> to do... ;)
> 

Indeed. I just wasn't aware of that, so I was thinking aloud a bit
(along the same lines of pushing the conversion from top-to-bottom). In
any event, it's good to have this in mind going forward.

> > > > Another thought could be to reconsider whether we still need some of
> > > > these extra warnings, as in the xfs_mount.c hunk below, now that we have
> > > > the generic xfs_verifier_error() messaging. E.g., if we could remove
> > > > those, perhaps we could snub out EFSBADCRC in or around the verifier
> > > > after it makes a distinction.
> > > 
> > > Redundant errors aren't s significant problem. It's the lack of
> > > meaningful error messages that are much more of an issue. We get
> > > more meaningful error messages as a result of the EFSBADCRC changes
> > > that have been made, but for the moment that error simply means
> > > EFSCORRUPTED to the higher layers. Hence the translation back to
> > > EFSCORRUPTED at the (low) layers where the error no longer has
> > > a distinct meaning.
> > > 
> > 
> > Right, the purpose is not to solve the problem of redundant errors.
> > Rather, if the only thing preventing the consolidation of the EFSBADCRC
> > trap to a single hunk are redundant errors, then it wouldn't be a
> > problem to remove those errors and push the EFSBADCRC trap into the
> > generic code for the purpose of clarity.
> 
> I don't see much point in trying to have a single trap for EFSBADCRC
> because the moment we want to handle one of those cases specially we
> are going to have to - as a first step - remove the single trap and
> push it outwards to all the places that need it like this patch
> does....
> 

Agreed. I'm less concerned about it with the understanding that
EFSBADCRC is not to be forever isolated to the world of verifiers. I
just wanted to make sure my point was clear. ;)

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace
  2014-03-03  5:39 ` [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace Dave Chinner
  2014-03-03 17:34   ` Eric Sandeen
  2014-03-03 17:44   ` Brian Foster
@ 2014-03-05 15:48   ` Brian Foster
  2 siblings, 0 replies; 14+ messages in thread
From: Brian Foster @ 2014-03-05 15:48 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, Mar 03, 2014 at 04:39:53PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> While the verifier reoutines may return EFSBADCRC when a buffer ahs
> a bad CRC, we need to translate that to EFSCORRUPTED so that the
> higher layers treat the error appropriately and so we return a
> consistent error to userspace. This fixes a xfs/005 regression.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Looking through again, I don't see any obvious cases where EFSBADCRC
could slip through. The verifier users mostly call
xfs_trans_read_buf_map(). The readahead case looks like it should be
handled by the xfs_buf_iowait() checks therein.

The one case I came across that looked suspicious is the readahead/read
cycle for inode log recovery in xlog_recover_commit_trans(). This calls
into xfs_buf_read() and ultimately uses bp->b_error directly in
xlog_recover_inode_pass2(). That said, I don't see any crc checking in
these verifiers and thus they don't source EFSBADCRC. Looks good to
me...

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_mount.c     |  3 +++
>  fs/xfs/xfs_symlink.c   |  4 ++++
>  fs/xfs/xfs_trans_buf.c | 11 +++++++++++
>  3 files changed, 18 insertions(+)
> 
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index f96c056..993cb19 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -314,6 +314,9 @@ reread:
>  		error = bp->b_error;
>  		if (loud)
>  			xfs_warn(mp, "SB validate failed with error %d.", error);
> +		/* bad CRC means corrupted metadata */
> +		if (error == EFSBADCRC)
> +			error = EFSCORRUPTED;
>  		goto release_buf;
>  	}
>  
> diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
> index 14e58f2..5fda189 100644
> --- a/fs/xfs/xfs_symlink.c
> +++ b/fs/xfs/xfs_symlink.c
> @@ -80,6 +80,10 @@ xfs_readlink_bmap(
>  		if (error) {
>  			xfs_buf_ioerror_alert(bp, __func__);
>  			xfs_buf_relse(bp);
> +
> +			/* bad CRC means corrupted metadata */
> +			if (error == EFSBADCRC)
> +				error = EFSCORRUPTED;
>  			goto out;
>  		}
>  		byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt);
> diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
> index 647b6f1..b8eef05 100644
> --- a/fs/xfs/xfs_trans_buf.c
> +++ b/fs/xfs/xfs_trans_buf.c
> @@ -275,6 +275,10 @@ xfs_trans_read_buf_map(
>  			XFS_BUF_UNDONE(bp);
>  			xfs_buf_stale(bp);
>  			xfs_buf_relse(bp);
> +
> +			/* bad CRC means corrupted metadata */
> +			if (error == EFSBADCRC)
> +				error = EFSCORRUPTED;
>  			return error;
>  		}
>  #ifdef DEBUG
> @@ -338,6 +342,9 @@ xfs_trans_read_buf_map(
>  				if (tp->t_flags & XFS_TRANS_DIRTY)
>  					xfs_force_shutdown(tp->t_mountp,
>  							SHUTDOWN_META_IO_ERROR);
> +				/* bad CRC means corrupted metadata */
> +				if (error == EFSBADCRC)
> +					error = EFSCORRUPTED;
>  				return error;
>  			}
>  		}
> @@ -375,6 +382,10 @@ xfs_trans_read_buf_map(
>  		if (tp->t_flags & XFS_TRANS_DIRTY)
>  			xfs_force_shutdown(tp->t_mountp, SHUTDOWN_META_IO_ERROR);
>  		xfs_buf_relse(bp);
> +
> +		/* bad CRC means corrupted metadata */
> +		if (error == EFSBADCRC)
> +			error = EFSCORRUPTED;
>  		return error;
>  	}
>  #ifdef DEBUG
> -- 
> 1.9.0
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] xfs: use NOIO contexts for vm_map_ram
  2014-03-03  5:39 ` [PATCH 2/2] xfs: use NOIO contexts for vm_map_ram Dave Chinner
@ 2014-03-05 16:58   ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2014-03-05 16:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Ugly, but given that it seems like we can't get the flags passing fixed
properly:

Reviewed-by: Christoph Hellwig <hch@lst.de>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-03-05 16:58 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-03  5:39 [PATCH 0/2] xfs: fixes for for-next Dave Chinner
2014-03-03  5:39 ` [PATCH 1/2] xfs: don't leak EFSBADCRC to userspace Dave Chinner
2014-03-03 17:34   ` Eric Sandeen
2014-03-03 22:13     ` Dave Chinner
2014-03-03 22:20       ` Eric Sandeen
2014-03-03 22:31         ` Dave Chinner
2014-03-03 17:44   ` Brian Foster
2014-03-03 22:29     ` Dave Chinner
2014-03-04  1:20       ` Brian Foster
2014-03-04  4:32         ` Dave Chinner
2014-03-04 14:01           ` Brian Foster
2014-03-05 15:48   ` Brian Foster
2014-03-03  5:39 ` [PATCH 2/2] xfs: use NOIO contexts for vm_map_ram Dave Chinner
2014-03-05 16:58   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).