linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] xfs: avoid AGI/AGF deadlock scenario for inode chunk allocation
@ 2014-02-11 18:07 Brian Foster
  2014-02-18 16:48 ` Mark Tinguely
  2014-02-28 19:24 ` Brian Foster
  0 siblings, 2 replies; 4+ messages in thread
From: Brian Foster @ 2014-02-11 18:07 UTC (permalink / raw)
  To: xfs

The inode chunk allocation path can lead to deadlock conditions if
a transaction is dirtied with an AGF (to fix up the freelist) for
an AG that cannot satisfy the actual allocation request. This code
path is written to try and avoid this scenario, but it can be
reproduced by running xfstests generic/270 in a loop on a 512b fs.

An example situation is:
- process A attempts an inode allocation on AG 3, modifies
  the freelist, fails the allocation and ultimately moves on to
  AG 0 with the AG 3 AGF held
- process B is doing a free space operation (i.e., truncate) and
  acquires the AG 0 AGF, waits on the AG 3 AGF
- process A acquires the AG 0 AGI, waits on the AG 0 AGF (deadlock)

The problem here is that process A acquired the AG 3 AGF while
moving on to AG 0 (and releasing the AG 3 AGI with the AG 3 AGF
held). xfs_dialloc() makes one pass through each of the AGs when
attempting to allocate an inode chunk. The expectation is a clean
transaction if a particular AG cannot satisfy the allocation
request. xfs_ialloc_ag_alloc() is written to support this through
use of the minalignslop allocation args field.

When using the agi->agi_newino optimization, we attempt an exact
bno allocation request based on the location of the previously
allocated chunk. minalignslop is set to inform the allocator that
we will require alignment on this chunk, and thus to not allow the
request for this AG if the extra space is not available. Suppose
that the AG in question has just enough space for this request, but
not at the requested bno. xfs_alloc_fix_freelist() will proceed as
normal as it determines the request should succeed, and thus it is
allowed to modify the agf. xfs_alloc_ag_vextent() ultimately fails
because the requested bno is not available. In response, the caller
moves on to a NEAR_BNO allocation request for the same AG. The
alignment is set, but the minalignslop field is never reset. This
increases the overall requirement of the request from the first
attempt. If this delta is the difference between allocation success
and failure for the AG, xfs_alloc_fix_freelist() rejects this
request outright the second time around and causes the allocation
request to unnecessarily fail for this AG.

To address this situation, reset the minalignslop field immediately
after use and prevent it from leaking into subsequent requests.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---

v2:
- Reset minalignslop immediately after use rather than prior to the
  subsequent request and add a comment. [dchinner]

 fs/xfs/xfs_ialloc.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
index 5d7f105..a57843f 100644
--- a/fs/xfs/xfs_ialloc.c
+++ b/fs/xfs/xfs_ialloc.c
@@ -363,6 +363,18 @@ xfs_ialloc_ag_alloc(
 		args.minleft = args.mp->m_in_maxlevels - 1;
 		if ((error = xfs_alloc_vextent(&args)))
 			return error;
+
+		/*
+		 * This request might have dirtied the transaction if the AG can
+		 * satisfy the request, but the exact block was not available. 
+		 * If the allocation did fail, subsequent requests will relax 
+		 * the exact agbno requirement and increase the alignment
+		 * instead. It is critical that the total size of the request
+		 * (len + alignment + slop) does not increase from this point
+		 * on, so reset minalignslop to ensure it is not included in
+		 * subsequent requests.
+		 */
+		args.minalignslop = 0;
 	} else
 		args.fsbno = NULLFSBLOCK;
 
-- 
1.8.3.1

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] xfs: avoid AGI/AGF deadlock scenario for inode chunk allocation
  2014-02-11 18:07 [PATCH v2] xfs: avoid AGI/AGF deadlock scenario for inode chunk allocation Brian Foster
@ 2014-02-18 16:48 ` Mark Tinguely
  2014-02-28 19:24 ` Brian Foster
  1 sibling, 0 replies; 4+ messages in thread
From: Mark Tinguely @ 2014-02-18 16:48 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On 02/11/14 12:07, Brian Foster wrote:
> The inode chunk allocation path can lead to deadlock conditions if
> a transaction is dirtied with an AGF (to fix up the freelist) for
> an AG that cannot satisfy the actual allocation request. This code
> path is written to try and avoid this scenario, but it can be
> reproduced by running xfstests generic/270 in a loop on a 512b fs.
>
> An example situation is:
> - process A attempts an inode allocation on AG 3, modifies
>    the freelist, fails the allocation and ultimately moves on to
>    AG 0 with the AG 3 AGF held
> - process B is doing a free space operation (i.e., truncate) and
>    acquires the AG 0 AGF, waits on the AG 3 AGF
> - process A acquires the AG 0 AGI, waits on the AG 0 AGF (deadlock)
>
> The problem here is that process A acquired the AG 3 AGF while
> moving on to AG 0 (and releasing the AG 3 AGI with the AG 3 AGF
> held). xfs_dialloc() makes one pass through each of the AGs when
> attempting to allocate an inode chunk. The expectation is a clean
> transaction if a particular AG cannot satisfy the allocation
> request. xfs_ialloc_ag_alloc() is written to support this through
> use of the minalignslop allocation args field.
>
> When using the agi->agi_newino optimization, we attempt an exact
> bno allocation request based on the location of the previously
> allocated chunk. minalignslop is set to inform the allocator that
> we will require alignment on this chunk, and thus to not allow the
> request for this AG if the extra space is not available. Suppose
> that the AG in question has just enough space for this request, but
> not at the requested bno. xfs_alloc_fix_freelist() will proceed as
> normal as it determines the request should succeed, and thus it is
> allowed to modify the agf. xfs_alloc_ag_vextent() ultimately fails
> because the requested bno is not available. In response, the caller
> moves on to a NEAR_BNO allocation request for the same AG. The
> alignment is set, but the minalignslop field is never reset. This
> increases the overall requirement of the request from the first
> attempt. If this delta is the difference between allocation success
> and failure for the AG, xfs_alloc_fix_freelist() rejects this
> request outright the second time around and causes the allocation
> request to unnecessarily fail for this AG.
>
> To address this situation, reset the minalignslop field immediately
> after use and prevent it from leaking into subsequent requests.
>
> Signed-off-by: Brian Foster<bfoster@redhat.com>
> ---
>
> v2:
> - Reset minalignslop immediately after use rather than prior to the
>    subsequent request and add a comment. [dchinner]
>
>   fs/xfs/xfs_ialloc.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
>
> diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
> index 5d7f105..a57843f 100644
> --- a/fs/xfs/xfs_ialloc.c
> +++ b/fs/xfs/xfs_ialloc.c
> @@ -363,6 +363,18 @@ xfs_ialloc_ag_alloc(
>   		args.minleft = args.mp->m_in_maxlevels - 1;
>   		if ((error = xfs_alloc_vextent(&args)))
>   			return error;
> +
> +		/*
> +		 * This request might have dirtied the transaction if the AG can
> +		 * satisfy the request, but the exact block was not available.
> +		 * If the allocation did fail, subsequent requests will relax
> +		 * the exact agbno requirement and increase the alignment
> +		 * instead. It is critical that the total size of the request
> +		 * (len + alignment + slop) does not increase from this point
> +		 * on, so reset minalignslop to ensure it is not included in
> +		 * subsequent requests.
> +		 */
> +		args.minalignslop = 0;
>   	} else
>   		args.fsbno = NULLFSBLOCK;
>

Effective, this was removed by commit 83a9ba:

  xfs: don't zero structure members after a memset(0)

It was zeroed if the xfs_alloc_vextent() failed and would have stayed 
zero if xfs_alloc_vextent() needed to be called a third time.

Does not matter to me if it is zeroed here or where it used to be zeroed.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] xfs: avoid AGI/AGF deadlock scenario for inode chunk allocation
  2014-02-11 18:07 [PATCH v2] xfs: avoid AGI/AGF deadlock scenario for inode chunk allocation Brian Foster
  2014-02-18 16:48 ` Mark Tinguely
@ 2014-02-28 19:24 ` Brian Foster
  2014-02-28 22:20   ` Dave Chinner
  1 sibling, 1 reply; 4+ messages in thread
From: Brian Foster @ 2014-02-28 19:24 UTC (permalink / raw)
  To: xfs

On Tue, Feb 11, 2014 at 01:07:46PM -0500, Brian Foster wrote:
> The inode chunk allocation path can lead to deadlock conditions if
> a transaction is dirtied with an AGF (to fix up the freelist) for
> an AG that cannot satisfy the actual allocation request. This code
> path is written to try and avoid this scenario, but it can be
> reproduced by running xfstests generic/270 in a loop on a 512b fs.
> 
> An example situation is:
> - process A attempts an inode allocation on AG 3, modifies
>   the freelist, fails the allocation and ultimately moves on to
>   AG 0 with the AG 3 AGF held
> - process B is doing a free space operation (i.e., truncate) and
>   acquires the AG 0 AGF, waits on the AG 3 AGF
> - process A acquires the AG 0 AGI, waits on the AG 0 AGF (deadlock)
> 
> The problem here is that process A acquired the AG 3 AGF while
> moving on to AG 0 (and releasing the AG 3 AGI with the AG 3 AGF
> held). xfs_dialloc() makes one pass through each of the AGs when
> attempting to allocate an inode chunk. The expectation is a clean
> transaction if a particular AG cannot satisfy the allocation
> request. xfs_ialloc_ag_alloc() is written to support this through
> use of the minalignslop allocation args field.
> 
> When using the agi->agi_newino optimization, we attempt an exact
> bno allocation request based on the location of the previously
> allocated chunk. minalignslop is set to inform the allocator that
> we will require alignment on this chunk, and thus to not allow the
> request for this AG if the extra space is not available. Suppose
> that the AG in question has just enough space for this request, but
> not at the requested bno. xfs_alloc_fix_freelist() will proceed as
> normal as it determines the request should succeed, and thus it is
> allowed to modify the agf. xfs_alloc_ag_vextent() ultimately fails
> because the requested bno is not available. In response, the caller
> moves on to a NEAR_BNO allocation request for the same AG. The
> alignment is set, but the minalignslop field is never reset. This
> increases the overall requirement of the request from the first
> attempt. If this delta is the difference between allocation success
> and failure for the AG, xfs_alloc_fix_freelist() rejects this
> request outright the second time around and causes the allocation
> request to unnecessarily fail for this AG.
> 
> To address this situation, reset the minalignslop field immediately
> after use and prevent it from leaking into subsequent requests.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> v2:
> - Reset minalignslop immediately after use rather than prior to the
>   subsequent request and add a comment. [dchinner]
> 

ping? Any chance to get this committed?

Brian

>  fs/xfs/xfs_ialloc.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
> index 5d7f105..a57843f 100644
> --- a/fs/xfs/xfs_ialloc.c
> +++ b/fs/xfs/xfs_ialloc.c
> @@ -363,6 +363,18 @@ xfs_ialloc_ag_alloc(
>  		args.minleft = args.mp->m_in_maxlevels - 1;
>  		if ((error = xfs_alloc_vextent(&args)))
>  			return error;
> +
> +		/*
> +		 * This request might have dirtied the transaction if the AG can
> +		 * satisfy the request, but the exact block was not available. 
> +		 * If the allocation did fail, subsequent requests will relax 
> +		 * the exact agbno requirement and increase the alignment
> +		 * instead. It is critical that the total size of the request
> +		 * (len + alignment + slop) does not increase from this point
> +		 * on, so reset minalignslop to ensure it is not included in
> +		 * subsequent requests.
> +		 */
> +		args.minalignslop = 0;
>  	} else
>  		args.fsbno = NULLFSBLOCK;
>  
> -- 
> 1.8.3.1
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] xfs: avoid AGI/AGF deadlock scenario for inode chunk allocation
  2014-02-28 19:24 ` Brian Foster
@ 2014-02-28 22:20   ` Dave Chinner
  0 siblings, 0 replies; 4+ messages in thread
From: Dave Chinner @ 2014-02-28 22:20 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Fri, Feb 28, 2014 at 02:24:18PM -0500, Brian Foster wrote:
> On Tue, Feb 11, 2014 at 01:07:46PM -0500, Brian Foster wrote:
> > The inode chunk allocation path can lead to deadlock conditions if
> > a transaction is dirtied with an AGF (to fix up the freelist) for
> > an AG that cannot satisfy the actual allocation request. This code
> > path is written to try and avoid this scenario, but it can be
> > reproduced by running xfstests generic/270 in a loop on a 512b fs.
> > 
> > An example situation is:
> > - process A attempts an inode allocation on AG 3, modifies
> >   the freelist, fails the allocation and ultimately moves on to
> >   AG 0 with the AG 3 AGF held
> > - process B is doing a free space operation (i.e., truncate) and
> >   acquires the AG 0 AGF, waits on the AG 3 AGF
> > - process A acquires the AG 0 AGI, waits on the AG 0 AGF (deadlock)
> > 
> > The problem here is that process A acquired the AG 3 AGF while
> > moving on to AG 0 (and releasing the AG 3 AGI with the AG 3 AGF
> > held). xfs_dialloc() makes one pass through each of the AGs when
> > attempting to allocate an inode chunk. The expectation is a clean
> > transaction if a particular AG cannot satisfy the allocation
> > request. xfs_ialloc_ag_alloc() is written to support this through
> > use of the minalignslop allocation args field.
> > 
> > When using the agi->agi_newino optimization, we attempt an exact
> > bno allocation request based on the location of the previously
> > allocated chunk. minalignslop is set to inform the allocator that
> > we will require alignment on this chunk, and thus to not allow the
> > request for this AG if the extra space is not available. Suppose
> > that the AG in question has just enough space for this request, but
> > not at the requested bno. xfs_alloc_fix_freelist() will proceed as
> > normal as it determines the request should succeed, and thus it is
> > allowed to modify the agf. xfs_alloc_ag_vextent() ultimately fails
> > because the requested bno is not available. In response, the caller
> > moves on to a NEAR_BNO allocation request for the same AG. The
> > alignment is set, but the minalignslop field is never reset. This
> > increases the overall requirement of the request from the first
> > attempt. If this delta is the difference between allocation success
> > and failure for the AG, xfs_alloc_fix_freelist() rejects this
> > request outright the second time around and causes the allocation
> > request to unnecessarily fail for this AG.
> > 
> > To address this situation, reset the minalignslop field immediately
> > after use and prevent it from leaking into subsequent requests.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > 
> > v2:
> > - Reset minalignslop immediately after use rather than prior to the
> >   subsequent request and add a comment. [dchinner]
> > 
> 
> ping? Any chance to get this committed?

I'm sorry, Brian, I thought I had committed it - I'll get it in the
next round.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2014-02-28 22:23 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-11 18:07 [PATCH v2] xfs: avoid AGI/AGF deadlock scenario for inode chunk allocation Brian Foster
2014-02-18 16:48 ` Mark Tinguely
2014-02-28 19:24 ` Brian Foster
2014-02-28 22:20   ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).