public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs: Fix double unlock in defer capture code
@ 2021-11-03 21:33 Allison Henderson
  2021-11-04  0:16 ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Allison Henderson @ 2021-11-03 21:33 UTC (permalink / raw)
  To: linux-xfs

The new deferred attr patch set uncovered a double unlock in the
recent port of the defer ops capture and continue code.  During log
recovery, we're allowed to hold buffers to a transaction that's being
used to replay an intent item.  When we capture the resources as part
of scheduling a continuation of an intent chain, we call xfs_buf_hold
to retain our reference to the buffer beyond the transaction commit,
but we do /not/ call xfs_trans_bhold to maintain the buffer lock.
This means that xfs_defer_ops_continue needs to relock the buffers
before xfs_defer_restore_resources joins then tothe new transaction.

Additionally, the buffers should not be passed back via the dres
structure since they need to remain locked unlike the inodes.  So
simply set dr_bufs to zero after populating the dres structure.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_defer.c | 40 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 0805ade2d300..734ac9fd2628 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -22,6 +22,7 @@
 #include "xfs_refcount.h"
 #include "xfs_bmap.h"
 #include "xfs_alloc.h"
+#include "xfs_buf.h"
 
 static struct kmem_cache	*xfs_defer_pending_cache;
 
@@ -762,6 +763,33 @@ xfs_defer_ops_capture_and_commit(
 	return 0;
 }
 
+static void
+xfs_defer_relock_buffers(
+	struct xfs_defer_capture	*dfc)
+{
+	struct xfs_defer_resources	*dres = &dfc->dfc_held;
+	unsigned int			i, j;
+
+	/*
+	 * Sort the elements via bubble sort.  (Remember, there are at most 2
+	 * elements to sort, so this is adequate.)
+	 */
+	for (i = 0; i < dres->dr_bufs; i++) {
+		for (j = 1; j < dres->dr_bufs; j++) {
+			if (xfs_buf_daddr(dres->dr_bp[j]) <
+				xfs_buf_daddr(dres->dr_bp[j - 1])) {
+				struct xfs_buf  *temp = dres->dr_bp[j];
+
+				dres->dr_bp[j] = dres->dr_bp[j - 1];
+				dres->dr_bp[j - 1] = temp;
+			}
+		}
+	}
+
+	for (i = 0; i < dres->dr_bufs; i++)
+		xfs_buf_lock(dres->dr_bp[i]);
+}
+
 /*
  * Attach a chain of captured deferred ops to a new transaction and free the
  * capture structure.  If an inode was captured, it will be passed back to the
@@ -777,15 +805,25 @@ xfs_defer_ops_continue(
 	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
 	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
 
-	/* Lock and join the captured inode to the new transaction. */
+	/* Lock the captured resources to the new transaction. */
 	if (dfc->dfc_held.dr_inos == 2)
 		xfs_lock_two_inodes(dfc->dfc_held.dr_ip[0], XFS_ILOCK_EXCL,
 				    dfc->dfc_held.dr_ip[1], XFS_ILOCK_EXCL);
 	else if (dfc->dfc_held.dr_inos == 1)
 		xfs_ilock(dfc->dfc_held.dr_ip[0], XFS_ILOCK_EXCL);
+
+	xfs_defer_relock_buffers(dfc);
+
+	/* Join the captured resources to the new transaction. */
 	xfs_defer_restore_resources(tp, &dfc->dfc_held);
 	memcpy(dres, &dfc->dfc_held, sizeof(struct xfs_defer_resources));
 
+	/*
+	 * Inodes must be passed back to the log recovery code to be unlocked,
+	 * but buffers do not.  Ignore the captured buffers
+	 */
+	dres->dr_bufs = 0;
+
 	/* Move captured dfops chain and state to the transaction. */
 	list_splice_init(&dfc->dfc_dfops, &tp->t_dfops);
 	tp->t_flags |= dfc->dfc_tpflags;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfs: Fix double unlock in defer capture code
  2021-11-03 21:33 [PATCH] xfs: Fix double unlock in defer capture code Allison Henderson
@ 2021-11-04  0:16 ` Dave Chinner
  2021-11-04  1:30   ` Darrick J. Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2021-11-04  0:16 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Wed, Nov 03, 2021 at 02:33:09PM -0700, Allison Henderson wrote:
> The new deferred attr patch set uncovered a double unlock in the
> recent port of the defer ops capture and continue code.  During log
> recovery, we're allowed to hold buffers to a transaction that's being
> used to replay an intent item.  When we capture the resources as part
> of scheduling a continuation of an intent chain, we call xfs_buf_hold
> to retain our reference to the buffer beyond the transaction commit,
> but we do /not/ call xfs_trans_bhold to maintain the buffer lock.
> This means that xfs_defer_ops_continue needs to relock the buffers
> before xfs_defer_restore_resources joins then tothe new transaction.
> 
> Additionally, the buffers should not be passed back via the dres
> structure since they need to remain locked unlike the inodes.  So
> simply set dr_bufs to zero after populating the dres structure.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_defer.c | 40 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 0805ade2d300..734ac9fd2628 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -22,6 +22,7 @@
>  #include "xfs_refcount.h"
>  #include "xfs_bmap.h"
>  #include "xfs_alloc.h"
> +#include "xfs_buf.h"
>  
>  static struct kmem_cache	*xfs_defer_pending_cache;
>  
> @@ -762,6 +763,33 @@ xfs_defer_ops_capture_and_commit(
>  	return 0;
>  }
>  
> +static void
> +xfs_defer_relock_buffers(
> +	struct xfs_defer_capture	*dfc)
> +{
> +	struct xfs_defer_resources	*dres = &dfc->dfc_held;
> +	unsigned int			i, j;
> +
> +	/*
> +	 * Sort the elements via bubble sort.  (Remember, there are at most 2
> +	 * elements to sort, so this is adequate.)
> +	 */

Seems like overkill if we only have two buffers that can be held.
All we need if there are only two buffers is a swap() call.

However, locking arbitrary buffers based on disk address order is
also theoretically incorrect.

For example, if the two buffers we have held the AGF and AGI buffers
for a given AG, then this will lock the AGF before the AGI. However,
the lock order for AGI vs AGF is AGI first, hence we'd be locking
these buffers in the wrong order here. Another example is that btree
buffers are generally locked in parent->child order or
sibling->sibling order, not disk offset order.

Hence I'm wondering is this generalisation is a safe method of
locking buffers.

In general, the first locked and joined buffer in a transaction is
always the first that should be locked. Hence I suspect we need to
ensure that the dres->dr_bp array always reflects the order in which
buffers were joined to a transaction so that we can simply lock them
in ascending array index order and not need to care what the
relationship between the buffers are...

> +	for (i = 0; i < dres->dr_bufs; i++) {
> +		for (j = 1; j < dres->dr_bufs; j++) {
> +			if (xfs_buf_daddr(dres->dr_bp[j]) <
> +				xfs_buf_daddr(dres->dr_bp[j - 1])) {
> +				struct xfs_buf  *temp = dres->dr_bp[j];
> +
> +				dres->dr_bp[j] = dres->dr_bp[j - 1];
> +				dres->dr_bp[j - 1] = temp;
> +			}
> +		}
> +	}
> +
> +	for (i = 0; i < dres->dr_bufs; i++)
> +		xfs_buf_lock(dres->dr_bp[i]);
> +}
> +
>  /*
>   * Attach a chain of captured deferred ops to a new transaction and free the
>   * capture structure.  If an inode was captured, it will be passed back to the
> @@ -777,15 +805,25 @@ xfs_defer_ops_continue(
>  	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
>  	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
>  
> -	/* Lock and join the captured inode to the new transaction. */
> +	/* Lock the captured resources to the new transaction. */
>  	if (dfc->dfc_held.dr_inos == 2)
>  		xfs_lock_two_inodes(dfc->dfc_held.dr_ip[0], XFS_ILOCK_EXCL,
>  				    dfc->dfc_held.dr_ip[1], XFS_ILOCK_EXCL);
>  	else if (dfc->dfc_held.dr_inos == 1)
>  		xfs_ilock(dfc->dfc_held.dr_ip[0], XFS_ILOCK_EXCL);
> +
> +	xfs_defer_relock_buffers(dfc);
> +
> +	/* Join the captured resources to the new transaction. */
>  	xfs_defer_restore_resources(tp, &dfc->dfc_held);
>  	memcpy(dres, &dfc->dfc_held, sizeof(struct xfs_defer_resources));
>  
> +	/*
> +	 * Inodes must be passed back to the log recovery code to be unlocked,
> +	 * but buffers do not.  Ignore the captured buffers
> +	 */
> +	dres->dr_bufs = 0;

I'm not sure what this comment is supposed to indicate. This seems
to be infrastructure specific to log recovery, not general runtime
functionality, but even in that context I don't really understand
what it means or why it is done...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfs: Fix double unlock in defer capture code
  2021-11-04  0:16 ` Dave Chinner
@ 2021-11-04  1:30   ` Darrick J. Wong
  2021-11-04 16:59     ` Allison Henderson
  0 siblings, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2021-11-04  1:30 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Allison Henderson, linux-xfs

On Thu, Nov 04, 2021 at 11:16:33AM +1100, Dave Chinner wrote:
> On Wed, Nov 03, 2021 at 02:33:09PM -0700, Allison Henderson wrote:
> > The new deferred attr patch set uncovered a double unlock in the
> > recent port of the defer ops capture and continue code.  During log
> > recovery, we're allowed to hold buffers to a transaction that's being
> > used to replay an intent item.  When we capture the resources as part
> > of scheduling a continuation of an intent chain, we call xfs_buf_hold
> > to retain our reference to the buffer beyond the transaction commit,
> > but we do /not/ call xfs_trans_bhold to maintain the buffer lock.
> > This means that xfs_defer_ops_continue needs to relock the buffers
> > before xfs_defer_restore_resources joins then tothe new transaction.
> > 
> > Additionally, the buffers should not be passed back via the dres
> > structure since they need to remain locked unlike the inodes.  So
> > simply set dr_bufs to zero after populating the dres structure.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_defer.c | 40 ++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 39 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> > index 0805ade2d300..734ac9fd2628 100644
> > --- a/fs/xfs/libxfs/xfs_defer.c
> > +++ b/fs/xfs/libxfs/xfs_defer.c
> > @@ -22,6 +22,7 @@
> >  #include "xfs_refcount.h"
> >  #include "xfs_bmap.h"
> >  #include "xfs_alloc.h"
> > +#include "xfs_buf.h"
> >  
> >  static struct kmem_cache	*xfs_defer_pending_cache;
> >  
> > @@ -762,6 +763,33 @@ xfs_defer_ops_capture_and_commit(
> >  	return 0;
> >  }
> >  
> > +static void
> > +xfs_defer_relock_buffers(
> > +	struct xfs_defer_capture	*dfc)
> > +{
> > +	struct xfs_defer_resources	*dres = &dfc->dfc_held;
> > +	unsigned int			i, j;
> > +
> > +	/*
> > +	 * Sort the elements via bubble sort.  (Remember, there are at most 2
> > +	 * elements to sort, so this is adequate.)
> > +	 */
> 
> Seems like overkill if we only have two buffers that can be held.
> All we need if there are only two buffers is a swap() call.
> 
> However, locking arbitrary buffers based on disk address order is
> also theoretically incorrect.
> 
> For example, if the two buffers we have held the AGF and AGI buffers
> for a given AG, then this will lock the AGF before the AGI. However,
> the lock order for AGI vs AGF is AGI first, hence we'd be locking
> these buffers in the wrong order here. Another example is that btree
> buffers are generally locked in parent->child order or
> sibling->sibling order, not disk offset order.
> 
> Hence I'm wondering is this generalisation is a safe method of
> locking buffers.
> 
> In general, the first locked and joined buffer in a transaction is
> always the first that should be locked. Hence I suspect we need to
> ensure that the dres->dr_bp array always reflects the order in which
> buffers were joined to a transaction so that we can simply lock them
> in ascending array index order and not need to care what the
> relationship between the buffers are...

/me agrees with that; I think you ought to be able to skip the sort
entirely because the dr_bp array is loaded in order of the transaction
items, which means that we'd be locking them in the same order as the
transaction.

> > +	for (i = 0; i < dres->dr_bufs; i++) {
> > +		for (j = 1; j < dres->dr_bufs; j++) {
> > +			if (xfs_buf_daddr(dres->dr_bp[j]) <
> > +				xfs_buf_daddr(dres->dr_bp[j - 1])) {
> > +				struct xfs_buf  *temp = dres->dr_bp[j];
> > +
> > +				dres->dr_bp[j] = dres->dr_bp[j - 1];
> > +				dres->dr_bp[j - 1] = temp;
> > +			}
> > +		}
> > +	}
> > +
> > +	for (i = 0; i < dres->dr_bufs; i++)
> > +		xfs_buf_lock(dres->dr_bp[i]);
> > +}
> > +
> >  /*
> >   * Attach a chain of captured deferred ops to a new transaction and free the
> >   * capture structure.  If an inode was captured, it will be passed back to the
> > @@ -777,15 +805,25 @@ xfs_defer_ops_continue(
> >  	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
> >  	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
> >  
> > -	/* Lock and join the captured inode to the new transaction. */
> > +	/* Lock the captured resources to the new transaction. */
> >  	if (dfc->dfc_held.dr_inos == 2)
> >  		xfs_lock_two_inodes(dfc->dfc_held.dr_ip[0], XFS_ILOCK_EXCL,
> >  				    dfc->dfc_held.dr_ip[1], XFS_ILOCK_EXCL);
> >  	else if (dfc->dfc_held.dr_inos == 1)
> >  		xfs_ilock(dfc->dfc_held.dr_ip[0], XFS_ILOCK_EXCL);
> > +
> > +	xfs_defer_relock_buffers(dfc);
> > +
> > +	/* Join the captured resources to the new transaction. */
> >  	xfs_defer_restore_resources(tp, &dfc->dfc_held);
> >  	memcpy(dres, &dfc->dfc_held, sizeof(struct xfs_defer_resources));
> >  
> > +	/*
> > +	 * Inodes must be passed back to the log recovery code to be unlocked,
> > +	 * but buffers do not.  Ignore the captured buffers
> > +	 */
> > +	dres->dr_bufs = 0;
> 
> I'm not sure what this comment is supposed to indicate. This seems
> to be infrastructure specific to log recovery, not general runtime
> functionality, but even in that context I don't really understand
> what it means or why it is done...

The defer_capture machinery picks up inodes that were ijoined with
lock_flags==0 (i.e. caller will unlock explicitly), which is why they
have to be passed back out after the entire transaction sequence
completes.

By contrast, the defer capture machinery picks up buffers with BLI_HOLD
set on the log item.  These are only meant to maintain the hold across
the next transaction roll (or the next defer_finish invocation), which
means that the caller is responsible for unlocking and releasing the
buffer (or I guess re-holding it) during that next transaction.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfs: Fix double unlock in defer capture code
  2021-11-04  1:30   ` Darrick J. Wong
@ 2021-11-04 16:59     ` Allison Henderson
  2021-11-04 22:18       ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Allison Henderson @ 2021-11-04 16:59 UTC (permalink / raw)
  To: Darrick J. Wong, Dave Chinner; +Cc: linux-xfs



On 11/3/21 6:30 PM, Darrick J. Wong wrote:
> On Thu, Nov 04, 2021 at 11:16:33AM +1100, Dave Chinner wrote:
>> On Wed, Nov 03, 2021 at 02:33:09PM -0700, Allison Henderson wrote:
>>> The new deferred attr patch set uncovered a double unlock in the
>>> recent port of the defer ops capture and continue code.  During log
>>> recovery, we're allowed to hold buffers to a transaction that's being
>>> used to replay an intent item.  When we capture the resources as part
>>> of scheduling a continuation of an intent chain, we call xfs_buf_hold
>>> to retain our reference to the buffer beyond the transaction commit,
>>> but we do /not/ call xfs_trans_bhold to maintain the buffer lock.
>>> This means that xfs_defer_ops_continue needs to relock the buffers
>>> before xfs_defer_restore_resources joins then tothe new transaction.
>>>
>>> Additionally, the buffers should not be passed back via the dres
>>> structure since they need to remain locked unlike the inodes.  So
>>> simply set dr_bufs to zero after populating the dres structure.
>>>
>>> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
>>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>>> ---
>>>   fs/xfs/libxfs/xfs_defer.c | 40 ++++++++++++++++++++++++++++++++++++++-
>>>   1 file changed, 39 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
>>> index 0805ade2d300..734ac9fd2628 100644
>>> --- a/fs/xfs/libxfs/xfs_defer.c
>>> +++ b/fs/xfs/libxfs/xfs_defer.c
>>> @@ -22,6 +22,7 @@
>>>   #include "xfs_refcount.h"
>>>   #include "xfs_bmap.h"
>>>   #include "xfs_alloc.h"
>>> +#include "xfs_buf.h"
>>>   
>>>   static struct kmem_cache	*xfs_defer_pending_cache;
>>>   
>>> @@ -762,6 +763,33 @@ xfs_defer_ops_capture_and_commit(
>>>   	return 0;
>>>   }
>>>   
>>> +static void
>>> +xfs_defer_relock_buffers(
>>> +	struct xfs_defer_capture	*dfc)
>>> +{
>>> +	struct xfs_defer_resources	*dres = &dfc->dfc_held;
>>> +	unsigned int			i, j;
>>> +
>>> +	/*
>>> +	 * Sort the elements via bubble sort.  (Remember, there are at most 2
>>> +	 * elements to sort, so this is adequate.)
>>> +	 */
>>
>> Seems like overkill if we only have two buffers that can be held.
>> All we need if there are only two buffers is a swap() call.
>>
>> However, locking arbitrary buffers based on disk address order is
>> also theoretically incorrect.
>>
>> For example, if the two buffers we have held the AGF and AGI buffers
>> for a given AG, then this will lock the AGF before the AGI. However,
>> the lock order for AGI vs AGF is AGI first, hence we'd be locking
>> these buffers in the wrong order here. Another example is that btree
>> buffers are generally locked in parent->child order or
>> sibling->sibling order, not disk offset order.
>>
>> Hence I'm wondering is this generalisation is a safe method of
>> locking buffers.
>>
>> In general, the first locked and joined buffer in a transaction is
>> always the first that should be locked. Hence I suspect we need to
>> ensure that the dres->dr_bp array always reflects the order in which
>> buffers were joined to a transaction so that we can simply lock them
>> in ascending array index order and not need to care what the
>> relationship between the buffers are...
> 
> /me agrees with that; I think you ought to be able to skip the sort
> entirely because the dr_bp array is loaded in order of the transaction
> items, which means that we'd be locking them in the same order as the
> transaction.
Ok, we don't have anything that uses two buffers ATM, so there really 
isn't a need for it.  Will remove.

> 
>>> +	for (i = 0; i < dres->dr_bufs; i++) {
>>> +		for (j = 1; j < dres->dr_bufs; j++) {
>>> +			if (xfs_buf_daddr(dres->dr_bp[j]) <
>>> +				xfs_buf_daddr(dres->dr_bp[j - 1])) {
>>> +				struct xfs_buf  *temp = dres->dr_bp[j];
>>> +
>>> +				dres->dr_bp[j] = dres->dr_bp[j - 1];
>>> +				dres->dr_bp[j - 1] = temp;
>>> +			}
>>> +		}
>>> +	}
>>> +
>>> +	for (i = 0; i < dres->dr_bufs; i++)
>>> +		xfs_buf_lock(dres->dr_bp[i]);
>>> +}
>>> +
>>>   /*
>>>    * Attach a chain of captured deferred ops to a new transaction and free the
>>>    * capture structure.  If an inode was captured, it will be passed back to the
>>> @@ -777,15 +805,25 @@ xfs_defer_ops_continue(
>>>   	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
>>>   	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
>>>   
>>> -	/* Lock and join the captured inode to the new transaction. */
>>> +	/* Lock the captured resources to the new transaction. */
>>>   	if (dfc->dfc_held.dr_inos == 2)
>>>   		xfs_lock_two_inodes(dfc->dfc_held.dr_ip[0], XFS_ILOCK_EXCL,
>>>   				    dfc->dfc_held.dr_ip[1], XFS_ILOCK_EXCL);
>>>   	else if (dfc->dfc_held.dr_inos == 1)
>>>   		xfs_ilock(dfc->dfc_held.dr_ip[0], XFS_ILOCK_EXCL);
>>> +
>>> +	xfs_defer_relock_buffers(dfc);
>>> +
>>> +	/* Join the captured resources to the new transaction. */
>>>   	xfs_defer_restore_resources(tp, &dfc->dfc_held);
>>>   	memcpy(dres, &dfc->dfc_held, sizeof(struct xfs_defer_resources));
>>>   
>>> +	/*
>>> +	 * Inodes must be passed back to the log recovery code to be unlocked,
>>> +	 * but buffers do not.  Ignore the captured buffers
>>> +	 */
>>> +	dres->dr_bufs = 0;
>>
>> I'm not sure what this comment is supposed to indicate. This seems
>> to be infrastructure specific to log recovery, not general runtime
>> functionality, but even in that context I don't really understand
>> what it means or why it is done...
> 
> The defer_capture machinery picks up inodes that were ijoined with
> lock_flags==0 (i.e. caller will unlock explicitly), which is why they
> have to be passed back out after the entire transaction sequence
> completes.
> 
> By contrast, the defer capture machinery picks up buffers with BLI_HOLD
> set on the log item.  These are only meant to maintain the hold across
> the next transaction roll (or the next defer_finish invocation), which
> means that the caller is responsible for unlocking and releasing the
> buffer (or I guess re-holding it) during that next transaction.
> 
Ok, so should we remove or expand the comment?  I thought it made sense 
with the commentary at the top of the function that talks about why 
inodes are passed back, but I am not picky.  How about:

/*
  * Inodes must be passed back to the log recovery code to be unlocked,
  * but buffers do not.  Buffers are released by the calling code, and
  * only need to be transferred to the next transaction.  Ignore
  * captured buffers here
  */

?

Thanks!
Allison


> --D
> 
>> Cheers,
>>
>> Dave.
>> -- 
>> Dave Chinner
>> david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfs: Fix double unlock in defer capture code
  2021-11-04 16:59     ` Allison Henderson
@ 2021-11-04 22:18       ` Dave Chinner
  2021-11-05  6:59         ` Allison Henderson
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2021-11-04 22:18 UTC (permalink / raw)
  To: Allison Henderson; +Cc: Darrick J. Wong, linux-xfs

On Thu, Nov 04, 2021 at 09:59:50AM -0700, Allison Henderson wrote:
> On 11/3/21 6:30 PM, Darrick J. Wong wrote:
> > On Thu, Nov 04, 2021 at 11:16:33AM +1100, Dave Chinner wrote:
> > > On Wed, Nov 03, 2021 at 02:33:09PM -0700, Allison Henderson wrote:
> > > > @@ -777,15 +805,25 @@ xfs_defer_ops_continue(
> > > >   	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
> > > >   	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
> > > > -	/* Lock and join the captured inode to the new transaction. */
> > > > +	/* Lock the captured resources to the new transaction. */
> > > >   	if (dfc->dfc_held.dr_inos == 2)
> > > >   		xfs_lock_two_inodes(dfc->dfc_held.dr_ip[0], XFS_ILOCK_EXCL,
> > > >   				    dfc->dfc_held.dr_ip[1], XFS_ILOCK_EXCL);
> > > >   	else if (dfc->dfc_held.dr_inos == 1)
> > > >   		xfs_ilock(dfc->dfc_held.dr_ip[0], XFS_ILOCK_EXCL);
> > > > +
> > > > +	xfs_defer_relock_buffers(dfc);
> > > > +
> > > > +	/* Join the captured resources to the new transaction. */
> > > >   	xfs_defer_restore_resources(tp, &dfc->dfc_held);
> > > >   	memcpy(dres, &dfc->dfc_held, sizeof(struct xfs_defer_resources));
> > > > +	/*
> > > > +	 * Inodes must be passed back to the log recovery code to be unlocked,
> > > > +	 * but buffers do not.  Ignore the captured buffers
> > > > +	 */
> > > > +	dres->dr_bufs = 0;
> > > 
> > > I'm not sure what this comment is supposed to indicate. This seems
> > > to be infrastructure specific to log recovery, not general runtime
> > > functionality, but even in that context I don't really understand
> > > what it means or why it is done...
> > 
> > The defer_capture machinery picks up inodes that were ijoined with
> > lock_flags==0 (i.e. caller will unlock explicitly), which is why they
> > have to be passed back out after the entire transaction sequence
> > completes.

I'm still not grokking what "passed back out" is supposed to mean
or how it is implemented.

> > By contrast, the defer capture machinery picks up buffers with BLI_HOLD
> > set on the log item.  These are only meant to maintain the hold across
> > the next transaction roll (or the next defer_finish invocation), which
> > means that the caller is responsible for unlocking and releasing the
> > buffer (or I guess re-holding it) during that next transaction.

Sure, but buffers that have XFS_BLI_HOLD is set are not unlocked on
transaction commit. So this makes little sense to me.

A bunch of notes follows as I tried to make sense of this....

We have deferop "save/restore" resources functions that store held
buffers/inodes on save and hold them again on restore via a struct
xfs_defer_resources. This is only used to wrap transaction commits
in xfs_defer_trans_roll(), which means that the held objects stay
held across the entire transaction commit and defer ops processing.

Then we have "capture/free/continue/rele" which use the same struct
xfs_defer_resources but only takes direct references to buffers and
inodes they "hold" and rather than transaction scope references.
Hence before commit, they have to be relocked and rejoined to the
transaction. Ugh - same xfs_defer_resources structure, different
semantic meaning of contents.

Uses xfs_defer_restore_resources() internally, so it joins *and
holds* those items at the transaction level, meaning they do not get
unlocked by the subsequent transaction commit.  And then it is
committed like so:

                xfs_defer_ops_continue(dfc, tp, &dres);
		error = xfs_trans_commit(tp);
		xfs_defer_resources_rele(&dres);

And then because the objects are held and not unlocked by the
transaction commit, they need to be unlocked and released by the
xfs_defer_resources_rele() call.  But we've hacked dres.nbufs = 0,
so buffers are not released after transaction commit. This makes no
obvious sense - transaction commit does not free/release held
buffers, nor does xfs_defer_resources_rele(), so this just looks
like a buffer leak to me.

[ the API is a mess here - why does xfs_defer_ops_continue() memcpy
dfc->dres to dres, then get freed, then dres get passed to
xfs_defer_resources_rele()? Why isn't this simply:

		xfs_defer_ops_capture_continue(dfc, tp);
		error = xfs_trans_commit(tp);
		xfs_defer_ops_capture_rele(dfc);

The deferops functions are all single caller functions from log
recovery, so it doesn't make a huge amount of sense to me how or why
the code is structured this way. Indeed, I don't know why this
capture interface isn't part of the log recovery API, not core
deferops... ]

> Ok, so should we remove or expand the comment?  I thought it made sense with
> the commentary at the top of the function that talks about why inodes are
> passed back, but I am not picky.  How about:
> 
> /*
>  * Inodes must be passed back to the log recovery code to be unlocked,
>  * but buffers do not.  Buffers are released by the calling code, and
>  * only need to be transferred to the next transaction.  Ignore
>  * captured buffers here
>  */

This still just describes what the code does, and so I have no
more insight into what is actually doing the releasing of these
buffers and why they behave differently to inodes....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfs: Fix double unlock in defer capture code
  2021-11-04 22:18       ` Dave Chinner
@ 2021-11-05  6:59         ` Allison Henderson
  0 siblings, 0 replies; 6+ messages in thread
From: Allison Henderson @ 2021-11-05  6:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, linux-xfs



On 11/4/21 3:18 PM, Dave Chinner wrote:
> On Thu, Nov 04, 2021 at 09:59:50AM -0700, Allison Henderson wrote:
>> On 11/3/21 6:30 PM, Darrick J. Wong wrote:
>>> On Thu, Nov 04, 2021 at 11:16:33AM +1100, Dave Chinner wrote:
>>>> On Wed, Nov 03, 2021 at 02:33:09PM -0700, Allison Henderson wrote:
>>>>> @@ -777,15 +805,25 @@ xfs_defer_ops_continue(
>>>>>    	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
>>>>>    	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
>>>>> -	/* Lock and join the captured inode to the new transaction. */
>>>>> +	/* Lock the captured resources to the new transaction. */
>>>>>    	if (dfc->dfc_held.dr_inos == 2)
>>>>>    		xfs_lock_two_inodes(dfc->dfc_held.dr_ip[0], XFS_ILOCK_EXCL,
>>>>>    				    dfc->dfc_held.dr_ip[1], XFS_ILOCK_EXCL);
>>>>>    	else if (dfc->dfc_held.dr_inos == 1)
>>>>>    		xfs_ilock(dfc->dfc_held.dr_ip[0], XFS_ILOCK_EXCL);
>>>>> +
>>>>> +	xfs_defer_relock_buffers(dfc);
>>>>> +
>>>>> +	/* Join the captured resources to the new transaction. */
>>>>>    	xfs_defer_restore_resources(tp, &dfc->dfc_held);
>>>>>    	memcpy(dres, &dfc->dfc_held, sizeof(struct xfs_defer_resources));
>>>>> +	/*
>>>>> +	 * Inodes must be passed back to the log recovery code to be unlocked,
>>>>> +	 * but buffers do not.  Ignore the captured buffers
>>>>> +	 */
>>>>> +	dres->dr_bufs = 0;
>>>>
>>>> I'm not sure what this comment is supposed to indicate. This seems
>>>> to be infrastructure specific to log recovery, not general runtime
>>>> functionality, but even in that context I don't really understand
>>>> what it means or why it is done...
>>>
>>> The defer_capture machinery picks up inodes that were ijoined with
>>> lock_flags==0 (i.e. caller will unlock explicitly), which is why they
>>> have to be passed back out after the entire transaction sequence
>>> completes.
> 
> I'm still not grokking what "passed back out" is supposed to mean
> or how it is implemented.
> 
>>> By contrast, the defer capture machinery picks up buffers with BLI_HOLD
>>> set on the log item.  These are only meant to maintain the hold across
>>> the next transaction roll (or the next defer_finish invocation), which
>>> means that the caller is responsible for unlocking and releasing the
>>> buffer (or I guess re-holding it) during that next transaction.
> 
> Sure, but buffers that have XFS_BLI_HOLD is set are not unlocked on
> transaction commit. So this makes little sense to me.
> 
> A bunch of notes follows as I tried to make sense of this....
> 
> We have deferop "save/restore" resources functions that store held
> buffers/inodes on save and hold them again on restore via a struct
> xfs_defer_resources. This is only used to wrap transaction commits
> in xfs_defer_trans_roll(), which means that the held objects stay
> held across the entire transaction commit and defer ops processing.
> 
> Then we have "capture/free/continue/rele" which use the same struct
> xfs_defer_resources but only takes direct references to buffers and
> inodes they "hold" and rather than transaction scope references.
> Hence before commit, they have to be relocked and rejoined to the
> transaction. Ugh - same xfs_defer_resources structure, different
> semantic meaning of contents.
> 
> Uses xfs_defer_restore_resources() internally, so it joins *and
> holds* those items at the transaction level, meaning they do not get
> unlocked by the subsequent transaction commit.  And then it is
> committed like so:
> 
>                  xfs_defer_ops_continue(dfc, tp, &dres);
> 		error = xfs_trans_commit(tp);
> 		xfs_defer_resources_rele(&dres);

I feel like the part that's getting over looked here is that the attr 
code acquired the buffer in the middle of a multi transaction operation 
(when shortform turns into a leaf).  It holds that buffer across the 
roll and then lets it go when it no longer needs it.  This is something 
that the underlying attr operation is aware of, but the capture code is 
not.  So that why it needs to leave this for the underlying operation to 
have control of.  If xfs_defer_resources_rele were to release it again, 
that's where we get the double unlock.  I hope that helps some?  Unless 
I'm missing something else?  (i am sill recovering from my booster vax 
today :-p)

Allison

> 
> And then because the objects are held and not unlocked by the
> transaction commit, they need to be unlocked and released by the
> xfs_defer_resources_rele() call.  But we've hacked dres.nbufs = 0,
> so buffers are not released after transaction commit. This makes no
> obvious sense - transaction commit does not free/release held
> buffers, nor does xfs_defer_resources_rele(), so this just looks
> like a buffer leak to me.
> 
> [ the API is a mess here - why does xfs_defer_ops_continue() memcpy
> dfc->dres to dres, then get freed, then dres get passed to
> xfs_defer_resources_rele()? Why isn't this simply:
> 
> 		xfs_defer_ops_capture_continue(dfc, tp);
> 		error = xfs_trans_commit(tp);
> 		xfs_defer_ops_capture_rele(dfc);
> 
> The deferops functions are all single caller functions from log
> recovery, so it doesn't make a huge amount of sense to me how or why
> the code is structured this way. Indeed, I don't know why this
> capture interface isn't part of the log recovery API, not core
> deferops... ]
> 
>> Ok, so should we remove or expand the comment?  I thought it made sense with
>> the commentary at the top of the function that talks about why inodes are
>> passed back, but I am not picky.  How about:
>>
>> /*
>>   * Inodes must be passed back to the log recovery code to be unlocked,
>>   * but buffers do not.  Buffers are released by the calling code, and
>>   * only need to be transferred to the next transaction.  Ignore
>>   * captured buffers here
>>   */
> 
> This still just describes what the code does, and so I have no
> more insight into what is actually doing the releasing of these
> buffers and why they behave differently to inodes....
> 
> Cheers,
> 
> Dave.
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-11-05  6:59 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-11-03 21:33 [PATCH] xfs: Fix double unlock in defer capture code Allison Henderson
2021-11-04  0:16 ` Dave Chinner
2021-11-04  1:30   ` Darrick J. Wong
2021-11-04 16:59     ` Allison Henderson
2021-11-04 22:18       ` Dave Chinner
2021-11-05  6:59         ` Allison Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox