Request for including dbd5c8c9a288 ("xfs: pass total block res. as total xfs_bmapi

linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Request for including dbd5c8c9a288 ("xfs: pass total block res. as total xfs_bmapi_write() parameter") into stable
@ 2017-03-15 10:41 Nikolay Borisov
  2017-03-15 11:54 ` Brian Foster
  2017-03-15 12:24 ` xfs: pass total block res. as total xfs_bmapi_write() parameter Nikolay Borisov
  0 siblings, 2 replies; 5+ messages in thread
From: Nikolay Borisov @ 2017-03-15 10:41 UTC (permalink / raw)
  To: stable; +Cc: darrick.wong, linux-xfs, david

Hello,

I'd like to request the commit from the subject to be included into
stable kernels, starting with 3.12. Without generic/299 would deadlock
almost always, whereas with the commit applied I can get it to run 20
consecutive times without a hitch.

Regards,
Nikolay

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Request for including dbd5c8c9a288 ("xfs: pass total block res. as total xfs_bmapi_write() parameter") into stable
  2017-03-15 10:41 Request for including dbd5c8c9a288 ("xfs: pass total block res. as total xfs_bmapi_write() parameter") into stable Nikolay Borisov
@ 2017-03-15 11:54 ` Brian Foster
  2017-03-15 12:24 ` xfs: pass total block res. as total xfs_bmapi_write() parameter Nikolay Borisov
  1 sibling, 0 replies; 5+ messages in thread
From: Brian Foster @ 2017-03-15 11:54 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: stable, darrick.wong, linux-xfs, david

On Wed, Mar 15, 2017 at 12:41:02PM +0200, Nikolay Borisov wrote:
> Hello,
> 
> I'd like to request the commit from the subject to be included into
> stable kernels, starting with 3.12. Without generic/299 would deadlock
> almost always, whereas with the commit applied I can get it to run 20
> consecutive times without a hitch.
> 

Seems reasonable to me. Care to post a backport(s)? (Unless Greg
suggests otherwise, but I think that's how stable works...).

Brian

> 
> Regards,
> Nikolay
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 5+ messages in thread

* xfs: pass total block res. as total xfs_bmapi_write() parameter
  2017-03-15 10:41 Request for including dbd5c8c9a288 ("xfs: pass total block res. as total xfs_bmapi_write() parameter") into stable Nikolay Borisov
  2017-03-15 11:54 ` Brian Foster
@ 2017-03-15 12:24 ` Nikolay Borisov
  2017-03-15 15:00   ` Brian Foster
  2017-03-16 10:07   ` Jiri Slaby
  1 sibling, 2 replies; 5+ messages in thread
From: Nikolay Borisov @ 2017-03-15 12:24 UTC (permalink / raw)
  To: stable; +Cc: darrick.wong, linux-xfs, david, Brian Foster, Nikolay Borisov

From: Brian Foster <bfoster@redhat.com>

The total field from struct xfs_alloc_arg is a bit of an unknown
commodity. It is documented as the total block requirement for the
transaction and is used in this manner from most call sites by virtue of
passing the total block reservation of the transaction associated with
an allocation. Several xfs_bmapi_write() callers pass hardcoded values
of 0 or 1 for the total block requirement, which is a historical oddity
without any clear reasoning.

The xfs_iomap_write_direct() caller, for example, passes 0 for the total
block requirement. This has been determined to cause problems in the
form of ABBA deadlocks of AGF buffers due to incorrect AG selection in
the block allocator. Specifically, the xfs_alloc_space_available()
function incorrectly selects an AG that doesn't actually have sufficient
space for the allocation. This occurs because the args.total field is 0
and thus the remaining free space check on the AG doesn't actually
consider the size of the allocation request. This locks the AGF buffer,
the allocation attempt proceeds and ultimately fails (in
xfs_alloc_fix_minleft()), and xfs_alloc_vexent() moves on to the next
AG. In turn, this can lead to incorrect AG locking order (if the
allocator wraps around, attempting to lock AG 0 after acquiring AG N)
and thus deadlock if racing with another operation. This problem has
been reproduced via generic/299 on smallish (1GB) ramdisk test devices.

To avoid this problem, replace the undocumented hardcoded total
parameters from the iomap and utility callers to pass the block
reservation used for the associated transaction. This is consistent with
other xfs_bmapi_write() callers throughout XFS. The assumption is that
the total field allows the selection of an AG that can handle the entire
operation rather than simply the allocation/range being requested (e.g.,
resulting btree splits, etc.). This addresses the aforementioned
generic/299 hang by ensuring AG selection only occurs when the
allocation can be satisfied by the AG.

Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---

Here is the backport for 3.12 

 fs/xfs/xfs_bmap_util.c | 2 +-
 fs/xfs/xfs_iomap.c     | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 2843a0426bca..66172581dacb 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1146,7 +1146,7 @@ retry:
 		xfs_bmap_init(&free_list, &firstfsb);
 		error = xfs_bmapi_write(tp, ip, startoffset_fsb,
 					allocatesize_fsb, alloc_type, &firstfsb,
-					0, imapp, &nimaps, &free_list);
+					resblks, imapp, &nimaps, &free_list);
 		if (error) {
 			goto error0;
 		}
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 43e31b0b288b..da87415ac561 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -217,7 +217,7 @@ xfs_iomap_write_direct(
 	xfs_bmap_init(&free_list, &firstfsb);
 	nimaps = 1;
 	error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb, bmapi_flag,
-				&firstfsb, 0, imap, &nimaps, &free_list);
+				&firstfsb, resblks, imap, &nimaps, &free_list);
 	if (error)
 		goto out_bmap_cancel;

@@ -762,7 +762,7 @@ xfs_iomap_write_allocate(
 			error = xfs_bmapi_write(tp, ip, map_start_fsb,
 						count_fsb,
 						XFS_BMAPI_STACK_SWITCH,
-						&first_block, 1,
+						&first_block, nres,
 						imap, &nimaps, &free_list);
 			if (error)
 				goto trans_cancel;
@@ -877,8 +877,8 @@ xfs_iomap_write_unwritten(
 		xfs_bmap_init(&free_list, &firstfsb);
 		nimaps = 1;
 		error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb,
-				  XFS_BMAPI_CONVERT, &firstfsb,
-				  1, &imap, &nimaps, &free_list);
+					XFS_BMAPI_CONVERT, &firstfsb, resblks,
+					&imap, &nimaps, &free_list);
 		if (error)
 			goto error_on_bmapi_transaction;

-- 
2.7.4

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: xfs: pass total block res. as total xfs_bmapi_write() parameter
  2017-03-15 12:24 ` xfs: pass total block res. as total xfs_bmapi_write() parameter Nikolay Borisov
@ 2017-03-15 15:00   ` Brian Foster
  2017-03-16 10:07   ` Jiri Slaby
  1 sibling, 0 replies; 5+ messages in thread
From: Brian Foster @ 2017-03-15 15:00 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: stable, darrick.wong, linux-xfs, david

On Wed, Mar 15, 2017 at 02:24:14PM +0200, Nikolay Borisov wrote:
> From: Brian Foster <bfoster@redhat.com>
> 
> The total field from struct xfs_alloc_arg is a bit of an unknown
> commodity. It is documented as the total block requirement for the
> transaction and is used in this manner from most call sites by virtue of
> passing the total block reservation of the transaction associated with
> an allocation. Several xfs_bmapi_write() callers pass hardcoded values
> of 0 or 1 for the total block requirement, which is a historical oddity
> without any clear reasoning.
> 
> The xfs_iomap_write_direct() caller, for example, passes 0 for the total
> block requirement. This has been determined to cause problems in the
> form of ABBA deadlocks of AGF buffers due to incorrect AG selection in
> the block allocator. Specifically, the xfs_alloc_space_available()
> function incorrectly selects an AG that doesn't actually have sufficient
> space for the allocation. This occurs because the args.total field is 0
> and thus the remaining free space check on the AG doesn't actually
> consider the size of the allocation request. This locks the AGF buffer,
> the allocation attempt proceeds and ultimately fails (in
> xfs_alloc_fix_minleft()), and xfs_alloc_vexent() moves on to the next
> AG. In turn, this can lead to incorrect AG locking order (if the
> allocator wraps around, attempting to lock AG 0 after acquiring AG N)
> and thus deadlock if racing with another operation. This problem has
> been reproduced via generic/299 on smallish (1GB) ramdisk test devices.
> 
> To avoid this problem, replace the undocumented hardcoded total
> parameters from the iomap and utility callers to pass the block
> reservation used for the associated transaction. This is consistent with
> other xfs_bmapi_write() callers throughout XFS. The assumption is that
> the total field allows the selection of an AG that can handle the entire
> operation rather than simply the allocation/range being requested (e.g.,
> resulting btree splits, etc.). This addresses the aforementioned
> generic/299 hang by ensuring AG selection only occurs when the
> allocation can be satisfied by the AG.
> 
> Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
> 
> Here is the backport for 3.12 
> 

Seems fine to me, thanks. FWIW:

Acked-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_bmap_util.c | 2 +-
>  fs/xfs/xfs_iomap.c     | 8 ++++----
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 2843a0426bca..66172581dacb 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -1146,7 +1146,7 @@ retry:
>  		xfs_bmap_init(&free_list, &firstfsb);
>  		error = xfs_bmapi_write(tp, ip, startoffset_fsb,
>  					allocatesize_fsb, alloc_type, &firstfsb,
> -					0, imapp, &nimaps, &free_list);
> +					resblks, imapp, &nimaps, &free_list);
>  		if (error) {
>  			goto error0;
>  		}
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 43e31b0b288b..da87415ac561 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -217,7 +217,7 @@ xfs_iomap_write_direct(
>  	xfs_bmap_init(&free_list, &firstfsb);
>  	nimaps = 1;
>  	error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb, bmapi_flag,
> -				&firstfsb, 0, imap, &nimaps, &free_list);
> +				&firstfsb, resblks, imap, &nimaps, &free_list);
>  	if (error)
>  		goto out_bmap_cancel;
>  
> @@ -762,7 +762,7 @@ xfs_iomap_write_allocate(
>  			error = xfs_bmapi_write(tp, ip, map_start_fsb,
>  						count_fsb,
>  						XFS_BMAPI_STACK_SWITCH,
> -						&first_block, 1,
> +						&first_block, nres,
>  						imap, &nimaps, &free_list);
>  			if (error)
>  				goto trans_cancel;
> @@ -877,8 +877,8 @@ xfs_iomap_write_unwritten(
>  		xfs_bmap_init(&free_list, &firstfsb);
>  		nimaps = 1;
>  		error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb,
> -				  XFS_BMAPI_CONVERT, &firstfsb,
> -				  1, &imap, &nimaps, &free_list);
> +					XFS_BMAPI_CONVERT, &firstfsb, resblks,
> +					&imap, &nimaps, &free_list);
>  		if (error)
>  			goto error_on_bmapi_transaction;
>  
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: xfs: pass total block res. as total xfs_bmapi_write() parameter
  2017-03-15 12:24 ` xfs: pass total block res. as total xfs_bmapi_write() parameter Nikolay Borisov
  2017-03-15 15:00   ` Brian Foster
@ 2017-03-16 10:07   ` Jiri Slaby
  1 sibling, 0 replies; 5+ messages in thread
From: Jiri Slaby @ 2017-03-16 10:07 UTC (permalink / raw)
  To: Nikolay Borisov, stable; +Cc: darrick.wong, linux-xfs, david, Brian Foster

On 03/15/2017, 01:24 PM, Nikolay Borisov wrote:
> From: Brian Foster <bfoster@redhat.com>
> 
> The total field from struct xfs_alloc_arg is a bit of an unknown
> commodity. It is documented as the total block requirement for the
> transaction and is used in this manner from most call sites by virtue of
> passing the total block reservation of the transaction associated with
> an allocation. Several xfs_bmapi_write() callers pass hardcoded values
> of 0 or 1 for the total block requirement, which is a historical oddity
> without any clear reasoning.
> 
> The xfs_iomap_write_direct() caller, for example, passes 0 for the total
> block requirement. This has been determined to cause problems in the
> form of ABBA deadlocks of AGF buffers due to incorrect AG selection in
> the block allocator. Specifically, the xfs_alloc_space_available()
> function incorrectly selects an AG that doesn't actually have sufficient
> space for the allocation. This occurs because the args.total field is 0
> and thus the remaining free space check on the AG doesn't actually
> consider the size of the allocation request. This locks the AGF buffer,
> the allocation attempt proceeds and ultimately fails (in
> xfs_alloc_fix_minleft()), and xfs_alloc_vexent() moves on to the next
> AG. In turn, this can lead to incorrect AG locking order (if the
> allocator wraps around, attempting to lock AG 0 after acquiring AG N)
> and thus deadlock if racing with another operation. This problem has
> been reproduced via generic/299 on smallish (1GB) ramdisk test devices.
> 
> To avoid this problem, replace the undocumented hardcoded total
> parameters from the iomap and utility callers to pass the block
> reservation used for the associated transaction. This is consistent with
> other xfs_bmapi_write() callers throughout XFS. The assumption is that
> the total field allows the selection of an AG that can handle the entire
> operation rather than simply the allocation/range being requested (e.g.,
> resulting btree splits, etc.). This addresses the aforementioned
> generic/299 hang by ensuring AG selection only occurs when the
> allocation can be satisfied by the AG.
> 
> Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
> 
> Here is the backport for 3.12 

Applied, thanks!

-- 
js
suse labs

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-03-16 10:07 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-03-15 10:41 Request for including dbd5c8c9a288 ("xfs: pass total block res. as total xfs_bmapi_write() parameter") into stable Nikolay Borisov
2017-03-15 11:54 ` Brian Foster
2017-03-15 12:24 ` xfs: pass total block res. as total xfs_bmapi_write() parameter Nikolay Borisov
2017-03-15 15:00   ` Brian Foster
2017-03-16 10:07   ` Jiri Slaby

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).