public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs: close xc_cil list_empty() races with cil commit sequence
@ 2015-06-24 14:04 Brian Foster
  2015-07-08  0:09 ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: Brian Foster @ 2015-06-24 14:04 UTC (permalink / raw)
  To: xfs

We have seen somewhat rare reports of the following assert from
xlog_cil_push_background() failing during ltp tests or somewhat
innocuous desktop root fs workloads (e.g., virt operations, initramfs
construction):

	ASSERT(!list_empty(&cil->xc_cil));

The reasoning behind the assert is that the transaction has inserted
items to the CIL and hit background push codepath all with
cil->xc_ctx_lock held for reading. This locks out background commit from
emptying the CIL, which acquires the lock for writing. Therefore, the
reasoning is that the items previously inserted in the CIL should still
be present.

The cil->xc_ctx_lock read lock is not sufficient to protect the xc_cil
list, however, due to how CIL insertion is handled.
xlog_cil_insert_items() inserts and reorders the dirty transaction items
to the tail of the CIL under xc_cil_lock. It uses list_move_tail() to
achieve insertion and reordering in the same block of code. This
function removes and reinserts an item to the tail of the list. If a
transaction commits an item that was already logged and thus already
resides in the CIL, and said item is the sole item on the list, the
removal and reinsertion creates a temporary state where the list is
actually empty.

This state is not valid and thus should never be observed by concurrent
transaction commit-side checks in the circumstances outlined above.
Update all of the xc_cil checks to acquire xc_cil_lock before assessing
the state of xc_cil.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_log_cil.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index abc2ccb..324d449 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -687,7 +687,7 @@ xlog_cil_push_background(
 	 * The cil won't be empty because we are called while holding the
 	 * context lock so whatever we added to the CIL will still be there
 	 */
-	ASSERT(!list_empty(&cil->xc_cil));
+	ASSERT(!xlog_cil_empty(log));
 
 	/*
 	 * don't do a background push if we haven't used up all the
@@ -731,13 +731,17 @@ xlog_cil_push_now(
 	 * there's no work we need to do.
 	 */
 	spin_lock(&cil->xc_push_lock);
+	spin_lock(&cil->xc_cil_lock);
 	if (list_empty(&cil->xc_cil) || push_seq <= cil->xc_push_seq) {
+		spin_unlock(&cil->xc_cil_lock);
 		spin_unlock(&cil->xc_push_lock);
 		return;
 	}
+	spin_unlock(&cil->xc_cil_lock);
 
 	cil->xc_push_seq = push_seq;
 	queue_work(log->l_mp->m_cil_workqueue, &cil->xc_push_work);
+
 	spin_unlock(&cil->xc_push_lock);
 }
 
@@ -749,8 +753,10 @@ xlog_cil_empty(
 	bool		empty = false;
 
 	spin_lock(&cil->xc_push_lock);
+	spin_lock(&cil->xc_cil_lock);
 	if (list_empty(&cil->xc_cil))
 		empty = true;
+	spin_unlock(&cil->xc_cil_lock);
 	spin_unlock(&cil->xc_push_lock);
 	return empty;
 }
@@ -887,12 +893,15 @@ restart:
 	 * it means we haven't yet started the push, because if it had started
 	 * we would have found the context on the committing list.
 	 */
+	spin_lock(&cil->xc_cil_lock);
 	if (sequence == cil->xc_current_sequence &&
 	    !list_empty(&cil->xc_cil)) {
+		spin_unlock(&cil->xc_cil_lock);
 		spin_unlock(&cil->xc_push_lock);
 		goto restart;
 	}
 
+	spin_unlock(&cil->xc_cil_lock);
 	spin_unlock(&cil->xc_push_lock);
 	return commit_lsn;
 
-- 
1.9.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] xfs: close xc_cil list_empty() races with cil commit sequence
  2015-06-24 14:04 [PATCH] xfs: close xc_cil list_empty() races with cil commit sequence Brian Foster
@ 2015-07-08  0:09 ` Dave Chinner
  2015-07-08 18:31   ` Brian Foster
  0 siblings, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2015-07-08  0:09 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Wed, Jun 24, 2015 at 10:04:01AM -0400, Brian Foster wrote:
> We have seen somewhat rare reports of the following assert from
> xlog_cil_push_background() failing during ltp tests or somewhat
> innocuous desktop root fs workloads (e.g., virt operations, initramfs
> construction):
> 
> 	ASSERT(!list_empty(&cil->xc_cil));
> 
> The reasoning behind the assert is that the transaction has inserted
> items to the CIL and hit background push codepath all with
> cil->xc_ctx_lock held for reading. This locks out background commit from
> emptying the CIL, which acquires the lock for writing. Therefore, the
> reasoning is that the items previously inserted in the CIL should still
> be present.
> 
> The cil->xc_ctx_lock read lock is not sufficient to protect the xc_cil
> list, however, due to how CIL insertion is handled.
> xlog_cil_insert_items() inserts and reorders the dirty transaction items
> to the tail of the CIL under xc_cil_lock. It uses list_move_tail() to
> achieve insertion and reordering in the same block of code. This
> function removes and reinserts an item to the tail of the list. If a
> transaction commits an item that was already logged and thus already
> resides in the CIL, and said item is the sole item on the list, the
> removal and reinsertion creates a temporary state where the list is
> actually empty.

The only way I can see this occurring is that we have to be committing a transaction that
modifies the same object as the previous transaction commit that
is still running through xfs_log_commit_cil(). e.g. racing
timestamp modifications on an inode, and the CIL is empty:

thread A			thread B

lock(inode)			lock(inode)
xfs_trans_join(inode)		<schedule>
xfs_trans_log(inode)
xfs_trans_commit(tp)
  xfs_log_commit_cil()
    lock(xc_ctx_lock)
    <add inode to cil>
    xfs_trans_free_items()
      unlock(inode)
      <schedule>
				<runs with inode lock>
				xfs_trans_join(inode)
				xfs_trans_log(inode)
				xfs_trans_commit(tp)
				  xfs_log_commit_cil()
				    lock(xc_ctx_lock)
				    xlog_cil_insert_items()
				      xlog_cil_insert_format_items()
      <runs again>
      xlog_cil_push_background
	ASSERT(!list_empty(cil))      list_move_tail(item, cil)


If that is the race, then the fix appears simple to me: call
xlog_cil_push_background() before xfs_trans_free_items() so that we
push the CIL before we unlock the items we just added to the CIL.
i.e.:

thread A			thread B

lock(inode)			lock(inode)
xfs_trans_join(inode)		<schedule>
xfs_trans_log(inode)
xfs_trans_commit(tp)
  xfs_log_commit_cil()
    lock(xc_ctx_lock)
    <add inode to cil>
    log_cil_push_background
	ASSERT(!list_empty(cil))
    xfs_trans_free_items()
      unlock(inode)
      <schedule>
				<runs with inode lock>
				xfs_trans_join(inode)
				xfs_trans_log(inode)
				xfs_trans_commit(tp)
				  xfs_log_commit_cil()
				    lock(xc_ctx_lock)
				    xlog_cil_insert_items()
				      xlog_cil_insert_format_items()
				      list_move_tail(item, cil)
> This state is not valid and thus should never be observed by concurrent
> transaction commit-side checks in the circumstances outlined above.
> Update all of the xc_cil checks to acquire xc_cil_lock before assessing
> the state of xc_cil.

That will reintroduce the problem of lock contention between the
commit and the push sides of the CIL, which the cil/push lock
separation was added to solve:

commit 4bb928cdb900d0614f4766d5f1ca5bc3844f7656
Author: Dave Chinner <dchinner@redhat.com>
Date:   Mon Aug 12 20:50:08 2013 +1000

    xfs: split the CIL lock
    
    The xc_cil_lock is used for two purposes - to protect the CIL
    itself, and to protect the push/commit state and lists. These are
    two logically separate structures and operations, so can have their
    own locks. This means that pushing on the CIL and the commit wait
    ordering won't contend for a lock with other transactions that are
    completing concurrently. As the CIL insertion is the hottest path
    throught eh CIL, this is a big win.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Mark Tinguely <tinguely@sgi.com>
    Signed-off-by: Ben Myers <bpm@sgi.com>


Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] xfs: close xc_cil list_empty() races with cil commit sequence
  2015-07-08  0:09 ` Dave Chinner
@ 2015-07-08 18:31   ` Brian Foster
  2015-07-09  1:51     ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: Brian Foster @ 2015-07-08 18:31 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Jul 08, 2015 at 10:09:09AM +1000, Dave Chinner wrote:
> On Wed, Jun 24, 2015 at 10:04:01AM -0400, Brian Foster wrote:
> > We have seen somewhat rare reports of the following assert from
> > xlog_cil_push_background() failing during ltp tests or somewhat
> > innocuous desktop root fs workloads (e.g., virt operations, initramfs
> > construction):
> > 
> > 	ASSERT(!list_empty(&cil->xc_cil));
> > 
> > The reasoning behind the assert is that the transaction has inserted
> > items to the CIL and hit background push codepath all with
> > cil->xc_ctx_lock held for reading. This locks out background commit from
> > emptying the CIL, which acquires the lock for writing. Therefore, the
> > reasoning is that the items previously inserted in the CIL should still
> > be present.
> > 
> > The cil->xc_ctx_lock read lock is not sufficient to protect the xc_cil
> > list, however, due to how CIL insertion is handled.
> > xlog_cil_insert_items() inserts and reorders the dirty transaction items
> > to the tail of the CIL under xc_cil_lock. It uses list_move_tail() to
> > achieve insertion and reordering in the same block of code. This
> > function removes and reinserts an item to the tail of the list. If a
> > transaction commits an item that was already logged and thus already
> > resides in the CIL, and said item is the sole item on the list, the
> > removal and reinsertion creates a temporary state where the list is
> > actually empty.
> 
> The only way I can see this occurring is that we have to be committing a transaction that
> modifies the same object as the previous transaction commit that
> is still running through xfs_log_commit_cil(). e.g. racing
> timestamp modifications on an inode, and the CIL is empty:
> 
> thread A			thread B
> 
> lock(inode)			lock(inode)
> xfs_trans_join(inode)		<schedule>
> xfs_trans_log(inode)
> xfs_trans_commit(tp)
>   xfs_log_commit_cil()
>     lock(xc_ctx_lock)
>     <add inode to cil>
>     xfs_trans_free_items()
>       unlock(inode)
>       <schedule>
> 				<runs with inode lock>
> 				xfs_trans_join(inode)
> 				xfs_trans_log(inode)
> 				xfs_trans_commit(tp)
> 				  xfs_log_commit_cil()
> 				    lock(xc_ctx_lock)
> 				    xlog_cil_insert_items()
> 				      xlog_cil_insert_format_items()
>       <runs again>
>       xlog_cil_push_background
> 	ASSERT(!list_empty(cil))      list_move_tail(item, cil)
> 

Yes. I don't have the exact details of the original reproducer but it
basically required transactions to recommit an already CIL-resident item
with that item as the only entry in the CIL. The first transaction
inserts or reinserts and heads to the background push while the second
reinserts and creates the transient empty list state. I don't think the
CIL necessarily has to be empty to start, but either way the above looks
about right to me.

> 
> If that is the race, then the fix appears simple to me: call
> xlog_cil_push_background() before xfs_trans_free_items() so that we
> push the CIL before we unlock the items we just added to the CIL.
> i.e.:
> 
> thread A			thread B
> 
> lock(inode)			lock(inode)
> xfs_trans_join(inode)		<schedule>
> xfs_trans_log(inode)
> xfs_trans_commit(tp)
>   xfs_log_commit_cil()
>     lock(xc_ctx_lock)
>     <add inode to cil>
>     log_cil_push_background
> 	ASSERT(!list_empty(cil))
>     xfs_trans_free_items()
>       unlock(inode)
>       <schedule>
> 				<runs with inode lock>
> 				xfs_trans_join(inode)
> 				xfs_trans_log(inode)
> 				xfs_trans_commit(tp)
> 				  xfs_log_commit_cil()
> 				    lock(xc_ctx_lock)
> 				    xlog_cil_insert_items()
> 				      xlog_cil_insert_format_items()
> 				      list_move_tail(item, cil)

That seems like it should work for this particular case. I suppose it
depends on the nature of the transactions as opposed to closing the race
directly. It would require independent transactions to relog the same
item without locking to expose this instance of the race again, which
probably should never happen.

That doesn't help us for the other push side users of list_empty(),
however. For example, what about xlog_cil_push_now(), etc.? AFAICS, that
code still seems racy.

> > This state is not valid and thus should never be observed by concurrent
> > transaction commit-side checks in the circumstances outlined above.
> > Update all of the xc_cil checks to acquire xc_cil_lock before assessing
> > the state of xc_cil.
> 
> That will reintroduce the problem of lock contention between the
> commit and the push sides of the CIL, which the cil/push lock
> separation was added to solve:
> 

If the lock is a problem, how about trying to close the race on the
insert side? E.g., something like the following in
xlog_cil_insert_items():

	if (!list_is_last(&lip->li_cil, &cil->xc_cil))
		list_move_tail(&lip->li_cil, &cil->xc_cil);

... such that the list with a single item is never transiently empty.
Thoughts?

Brian

> commit 4bb928cdb900d0614f4766d5f1ca5bc3844f7656
> Author: Dave Chinner <dchinner@redhat.com>
> Date:   Mon Aug 12 20:50:08 2013 +1000
> 
>     xfs: split the CIL lock
>     
>     The xc_cil_lock is used for two purposes - to protect the CIL
>     itself, and to protect the push/commit state and lists. These are
>     two logically separate structures and operations, so can have their
>     own locks. This means that pushing on the CIL and the commit wait
>     ordering won't contend for a lock with other transactions that are
>     completing concurrently. As the CIL insertion is the hottest path
>     throught eh CIL, this is a big win.
>     
>     Signed-off-by: Dave Chinner <dchinner@redhat.com>
>     Reviewed-by: Mark Tinguely <tinguely@sgi.com>
>     Signed-off-by: Ben Myers <bpm@sgi.com>
> 
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] xfs: close xc_cil list_empty() races with cil commit sequence
  2015-07-08 18:31   ` Brian Foster
@ 2015-07-09  1:51     ` Dave Chinner
  0 siblings, 0 replies; 4+ messages in thread
From: Dave Chinner @ 2015-07-09  1:51 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Wed, Jul 08, 2015 at 02:31:53PM -0400, Brian Foster wrote:
> On Wed, Jul 08, 2015 at 10:09:09AM +1000, Dave Chinner wrote:
> > On Wed, Jun 24, 2015 at 10:04:01AM -0400, Brian Foster wrote:
> > > We have seen somewhat rare reports of the following assert from
> > > xlog_cil_push_background() failing during ltp tests or somewhat
> > > innocuous desktop root fs workloads (e.g., virt operations, initramfs
> > > construction):
> > > 
> > > 	ASSERT(!list_empty(&cil->xc_cil));
> > > 
> > > The reasoning behind the assert is that the transaction has inserted
> > > items to the CIL and hit background push codepath all with
> > > cil->xc_ctx_lock held for reading. This locks out background commit from
> > > emptying the CIL, which acquires the lock for writing. Therefore, the
> > > reasoning is that the items previously inserted in the CIL should still
> > > be present.
> > > 
> > > The cil->xc_ctx_lock read lock is not sufficient to protect the xc_cil
> > > list, however, due to how CIL insertion is handled.
> > > xlog_cil_insert_items() inserts and reorders the dirty transaction items
> > > to the tail of the CIL under xc_cil_lock. It uses list_move_tail() to
> > > achieve insertion and reordering in the same block of code. This
> > > function removes and reinserts an item to the tail of the list. If a
> > > transaction commits an item that was already logged and thus already
> > > resides in the CIL, and said item is the sole item on the list, the
> > > removal and reinsertion creates a temporary state where the list is
> > > actually empty.
> > 
> > The only way I can see this occurring is that we have to be committing a transaction that
> > modifies the same object as the previous transaction commit that
> > is still running through xfs_log_commit_cil(). e.g. racing
> > timestamp modifications on an inode, and the CIL is empty:
> > 
> > thread A			thread B
> > 
> > lock(inode)			lock(inode)
> > xfs_trans_join(inode)		<schedule>
> > xfs_trans_log(inode)
> > xfs_trans_commit(tp)
> >   xfs_log_commit_cil()
> >     lock(xc_ctx_lock)
> >     <add inode to cil>
> >     xfs_trans_free_items()
> >       unlock(inode)
> >       <schedule>
> > 				<runs with inode lock>
> > 				xfs_trans_join(inode)
> > 				xfs_trans_log(inode)
> > 				xfs_trans_commit(tp)
> > 				  xfs_log_commit_cil()
> > 				    lock(xc_ctx_lock)
> > 				    xlog_cil_insert_items()
> > 				      xlog_cil_insert_format_items()
> >       <runs again>
> >       xlog_cil_push_background
> > 	ASSERT(!list_empty(cil))      list_move_tail(item, cil)
> > 
> 
> Yes. I don't have the exact details of the original reproducer but it
> basically required transactions to recommit an already CIL-resident item
> with that item as the only entry in the CIL. The first transaction
> inserts or reinserts and heads to the background push while the second
> reinserts and creates the transient empty list state. I don't think the
> CIL necessarily has to be empty to start, but either way the above looks
> about right to me.
> 
> > 
> > If that is the race, then the fix appears simple to me: call
> > xlog_cil_push_background() before xfs_trans_free_items() so that we
> > push the CIL before we unlock the items we just added to the CIL.
> > i.e.:
> > 
> > thread A			thread B
> > 
> > lock(inode)			lock(inode)
> > xfs_trans_join(inode)		<schedule>
> > xfs_trans_log(inode)
> > xfs_trans_commit(tp)
> >   xfs_log_commit_cil()
> >     lock(xc_ctx_lock)
> >     <add inode to cil>
> >     log_cil_push_background
> > 	ASSERT(!list_empty(cil))
> >     xfs_trans_free_items()
> >       unlock(inode)
> >       <schedule>
> > 				<runs with inode lock>
> > 				xfs_trans_join(inode)
> > 				xfs_trans_log(inode)
> > 				xfs_trans_commit(tp)
> > 				  xfs_log_commit_cil()
> > 				    lock(xc_ctx_lock)
> > 				    xlog_cil_insert_items()
> > 				      xlog_cil_insert_format_items()
> > 				      list_move_tail(item, cil)
> 
> That seems like it should work for this particular case. I suppose it
> depends on the nature of the transactions as opposed to closing the race
> directly. It would require independent transactions to relog the same
> item without locking to expose this instance of the race again, which
> probably should never happen.
> 
> That doesn't help us for the other push side users of list_empty(),
> however. For example, what about xlog_cil_push_now(), etc.? AFAICS, that
> code still seems racy.

If we get a transient empty list there, we shouldn't have to care as
the retry condition in xlog_cil_force_lsn() should catch it. I guess
it is possible that both the original push and the retry
list-empty() check could *both* see transient empty lists (however
unlikely that is!).

However, a log force is a fairly rare occurrence outside of
sync/fsync heavy workloads, so adding the xc_cil_lock there
shouldn't cause any serious performance issues because everything in
a sync/fsync heavy workload will back waiting on the log forces
completing rather than formatting changes into the CIL...

The other places we check for empty xc_cil:

	- xlog_cil_push() holds xc_ctx_lock in write mode, so cannot
	  race at all with xfs_log_commit_cil(). No need for
	  additional locking there.
	- xlog_cil_empty() is only used by the log covering code,
	  so we simply don't care if we get a transient empty there
	  because that implies the log is active and shouldn't be
	  covered.
	- xfs_log_item_in_current_chkpt() is only called from
	  xfs_buf_item_format() and so is done with the buffer
	  locked and the xc_ctx_lock in read mode.
	  Hence it can't race with itself being removed or a CIl push
	  removing all the entries and so a
	  transient false empty implies that the buffer itself is
	  not in the CIL. Therefore it functions correctly even if
	  we get a transient CIL empty occurring...

So, really, there isn't a fundamental problem with the locking
during the corner case where a transient empty can be detected as
everything functions correctly even when a transient is detected.
The only place I'd be concerned is the unlikely case of a double
transient in the log force situation, but even then I'd question if
it is even possible because we only force the log when we know that
we have something that needs pushing....

> > > This state is not valid and thus should never be observed by concurrent
> > > transaction commit-side checks in the circumstances outlined above.
> > > Update all of the xc_cil checks to acquire xc_cil_lock before assessing
> > > the state of xc_cil.
> > 
> > That will reintroduce the problem of lock contention between the
> > commit and the push sides of the CIL, which the cil/push lock
> > separation was added to solve:
> > 
> 
> If the lock is a problem, how about trying to close the race on the
> insert side? E.g., something like the following in
> xlog_cil_insert_items():
> 
> 	if (!list_is_last(&lip->li_cil, &cil->xc_cil))
> 		list_move_tail(&lip->li_cil, &cil->xc_cil);
> 
> ... such that the list with a single item is never transiently empty.
> Thoughts?

Yeah, that would work. I like KISS solutions and sometimes I get so
far into the code I can't see them. Good thinking, Brain!

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-07-09  1:52 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-06-24 14:04 [PATCH] xfs: close xc_cil list_empty() races with cil commit sequence Brian Foster
2015-07-08  0:09 ` Dave Chinner
2015-07-08 18:31   ` Brian Foster
2015-07-09  1:51     ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox