* [PATCH] Use atomic_t and wait_event to track dquot pincount
@ 2008-09-24 4:28 Peter Leckie
2008-09-24 6:05 ` Dave Chinner
2008-09-24 7:26 ` [PATCH v2] " Peter Leckie
0 siblings, 2 replies; 34+ messages in thread
From: Peter Leckie @ 2008-09-24 4:28 UTC (permalink / raw)
To: xfs, xfs-dev
During a heavy workload with quotas enabled the system can stall
with every process requesting log space, however the log is full
of dquots. The aild is trying to push the tail of the log, however
every item in the log had previously been pushed by the aild and
was marked as clean, so the aild aborts pushing the items in the
log.

The reason the log items are still in the log is because the iodone
routine xfs_qm_dqflush_done() detected that the log sequence number
(lsn) had changed and assumed the log item had been re-dirtied, so
it should be left in the log. However the log item had not been
re-dirtied, so it was still marked as clean, preventing it from
being pushed again and leaving it stuck in the log. After a while
the log eventually filled with these dquot log items.

The reason the lsn had changed was that it had not been initialized
by the time a copy of the lsn was taken in xfs_qm_dqflush(). The
lsn was then later updated, causing the test in
xfs_qm_dqflush_done() to fail.

Synchronization between the two functions is done by the pincount
in struct xfs_dquot_t: xfs_qm_dqflush() calls xfs_qm_dqunpin_wait()
to wait for the pincount to reach 0. However, after being woken up
it does not validate that the pincount is actually 0, allowing a
false wakeup by the scheduler to cause xfs_qm_dqflush() to
prematurely start processing the dquot.

So this patch uses an atomic_t to track the pincount, which allows
us to easily use the wait_event() macro to wait; this guarantees
that when we return from xfs_qm_dqunpin_wait() the pincount is 0.
We also remove the global qi_pinlock from xfs_quotainfo, which may
also reduce contention when pinning dquots.
Index: 2.6.x-xfs/fs/xfs/quota/xfs_dquot_item.c
===================================================================
--- 2.6.x-xfs.orig/fs/xfs/quota/xfs_dquot_item.c 2008-09-24
12:02:41.987960702 +1000
+++ 2.6.x-xfs/fs/xfs/quota/xfs_dquot_item.c 2008-09-24
14:22:01.643627312 +1000
@@ -98,9 +98,7 @@ xfs_qm_dquot_logitem_pin(
dqp = logitem->qli_dquot;
ASSERT(XFS_DQ_IS_LOCKED(dqp));
- spin_lock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
- dqp->q_pincount++;
- spin_unlock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
+ atomic_inc(&dqp->q_pincount);
}
/*
@@ -117,13 +115,9 @@ xfs_qm_dquot_logitem_unpin(
xfs_dquot_t *dqp;
dqp = logitem->qli_dquot;
- ASSERT(dqp->q_pincount > 0);
- spin_lock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
- dqp->q_pincount--;
- if (dqp->q_pincount == 0) {
- sv_broadcast(&dqp->q_pinwait);
- }
- spin_unlock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
+ ASSERT(atomic_read(&dqp->q_pincount) > 0);
+ if (atomic_dec_and_test(&dqp->q_pincount))
+ wake_up(&dqp->q_pinwait);
}
/* ARGSUSED */
@@ -193,21 +187,14 @@ xfs_qm_dqunpin_wait(
xfs_dquot_t *dqp)
{
ASSERT(XFS_DQ_IS_LOCKED(dqp));
- if (dqp->q_pincount == 0) {
+ if (atomic_read(&dqp->q_pincount) == 0)
return;
- }
/*
* Give the log a push so we don't wait here too long.
*/
xfs_log_force(dqp->q_mount, (xfs_lsn_t)0, XFS_LOG_FORCE);
- spin_lock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
- if (dqp->q_pincount == 0) {
- spin_unlock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
- return;
- }
- sv_wait(&(dqp->q_pinwait), PINOD,
- &(XFS_DQ_TO_QINF(dqp)->qi_pinlock), s);
+ wait_event(dqp->q_pinwait, (atomic_read(&dqp->q_pincount) == 0));
}
/*
@@ -310,7 +297,7 @@ xfs_qm_dquot_logitem_trylock(
uint retval;
dqp = qip->qli_dquot;
- if (dqp->q_pincount > 0)
+ if (atomic_read(&dqp->q_pincount) > 0)
return (XFS_ITEM_PINNED);
if (! xfs_qm_dqlock_nowait(dqp))
Index: 2.6.x-xfs/fs/xfs/quota/xfs_dquot.h
===================================================================
--- 2.6.x-xfs.orig/fs/xfs/quota/xfs_dquot.h 2008-09-24
12:02:41.991960179 +1000
+++ 2.6.x-xfs/fs/xfs/quota/xfs_dquot.h 2008-09-24 14:20:28.979820915
+1000
@@ -83,8 +83,8 @@ typedef struct xfs_dquot {
xfs_qcnt_t q_res_rtbcount;/* total realtime blks used+reserved */
mutex_t q_qlock; /* quota lock */
struct completion q_flush; /* flush completion queue */
- uint q_pincount; /* pin count for this dquot */
- sv_t q_pinwait; /* sync var for pinning */
+ atomic_t q_pincount; /* dquot pin count */
+ wait_queue_head_t q_pinwait; /* dquot pinning wait queue */
#ifdef XFS_DQUOT_TRACE
struct ktrace *q_trace; /* trace header structure */
#endif
Index: 2.6.x-xfs/fs/xfs/quota/xfs_dquot.c
===================================================================
--- 2.6.x-xfs.orig/fs/xfs/quota/xfs_dquot.c 2008-09-24
12:09:14.025200071 +1000
+++ 2.6.x-xfs/fs/xfs/quota/xfs_dquot.c 2008-09-24 14:20:56.948140364
+1000
@@ -101,7 +101,7 @@ xfs_qm_dqinit(
if (brandnewdquot) {
dqp->dq_flnext = dqp->dq_flprev = dqp;
mutex_init(&dqp->q_qlock);
- sv_init(&dqp->q_pinwait, SV_DEFAULT, "pdq");
+ init_waitqueue_head(&dqp->q_pinwait);
/*
* Because we want to use a counting completion, complete
@@ -131,7 +131,7 @@ xfs_qm_dqinit(
dqp->q_res_bcount = 0;
dqp->q_res_icount = 0;
dqp->q_res_rtbcount = 0;
- dqp->q_pincount = 0;
+ atomic_set(&dqp->q_pincount, 0);
dqp->q_hash = NULL;
ASSERT(dqp->dq_flnext == dqp->dq_flprev);
@@ -1490,7 +1490,7 @@ xfs_qm_dqpurge(
"xfs_qm_dqpurge: dquot %p flush failed", dqp);
xfs_dqflock(dqp);
}
- ASSERT(dqp->q_pincount == 0);
+ ASSERT(atomic_read(&dqp->q_pincount) == 0);
ASSERT(XFS_FORCED_SHUTDOWN(mp) ||
!(dqp->q_logitem.qli_item.li_flags & XFS_LI_IN_AIL));
Index: 2.6.x-xfs/fs/xfs/xfsidbg.c
===================================================================
--- 2.6.x-xfs.orig/fs/xfs/xfsidbg.c 2008-09-24 12:02:42.115943985 +1000
+++ 2.6.x-xfs/fs/xfs/xfsidbg.c 2008-09-24 14:21:17.413447303 +1000
@@ -6577,7 +6577,7 @@ xfsidbg_xqm_dquot(xfs_dquot_t *dqp)
(unsigned long long)dqp->q_res_rtbcount);
kdb_printf("qlock 0x%p &q_flush 0x%p (%d) pincount 0x%x\n",
&dqp->q_qlock, &dqp->q_flush,
- dqp->q_flush.done, dqp->q_pincount);
+ dqp->q_flush.done, atomic_read(&dqp->q_pincount));
#ifdef XFS_DQUOT_TRACE
qprintf("dqtrace 0x%p\n", dqp->q_trace);
#endif
@@ -6761,10 +6761,9 @@ xfsidbg_xqm_qinfo(xfs_mount_t *mp)
return;
}
- kdb_printf("uqip 0x%p, gqip 0x%p, &pinlock 0x%p &dqlist 0x%p\n",
+ kdb_printf("uqip 0x%p, gqip 0x%p, &dqlist 0x%p\n",
mp->m_quotainfo->qi_uquotaip,
mp->m_quotainfo->qi_gquotaip,
- &mp->m_quotainfo->qi_pinlock,
&mp->m_quotainfo->qi_dqlist);
kdb_printf("btmlimit 0x%x, itmlimit 0x%x, RTbtmlim 0x%x\n",
Index: 2.6.x-xfs/fs/xfs/quota/xfs_qm.h
===================================================================
--- 2.6.x-xfs.orig/fs/xfs/quota/xfs_qm.h 2008-09-24
12:02:41.999959135 +1000
+++ 2.6.x-xfs/fs/xfs/quota/xfs_qm.h 2008-09-24 14:00:36.168448681 +1000
@@ -106,7 +106,6 @@ typedef struct xfs_qm {
typedef struct xfs_quotainfo {
xfs_inode_t *qi_uquotaip; /* user quota inode */
xfs_inode_t *qi_gquotaip; /* group quota inode */
- spinlock_t qi_pinlock; /* dquot pinning lock */
xfs_dqlist_t qi_dqlist; /* all dquots in filesys */
int qi_dqreclaims; /* a change here indicates
a removal in the dqlist */
Index: 2.6.x-xfs/fs/xfs/quota/xfs_qm.c
===================================================================
--- 2.6.x-xfs.orig/fs/xfs/quota/xfs_qm.c 2008-09-24
12:02:42.000000000 +1000
+++ 2.6.x-xfs/fs/xfs/quota/xfs_qm.c 2008-09-24 14:03:44.795703839 +1000
@@ -1137,7 +1137,6 @@ xfs_qm_init_quotainfo(
return error;
}
- spin_lock_init(&qinf->qi_pinlock);
xfs_qm_list_init(&qinf->qi_dqlist, "mpdqlist", 0);
qinf->qi_dqreclaims = 0;
@@ -1234,7 +1233,6 @@ xfs_qm_destroy_quotainfo(
*/
xfs_qm_rele_quotafs_ref(mp);
- spinlock_destroy(&qi->qi_pinlock);
xfs_qm_list_destroy(&qi->qi_dqlist);
if (qi->qi_uquotaip) {
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] Use atomic_t and wait_event to track dquot pincount
2008-09-24 4:28 [PATCH] Use atomic_t and wait_event to track dquot pincount Peter Leckie
@ 2008-09-24 6:05 ` Dave Chinner
2008-09-24 6:53 ` Peter Leckie
2008-09-24 7:26 ` [PATCH v2] " Peter Leckie
1 sibling, 1 reply; 34+ messages in thread
From: Dave Chinner @ 2008-09-24 6:05 UTC (permalink / raw)
To: Peter Leckie; +Cc: xfs, xfs-dev
[ Pete, please wrap your text at 68-72 columns]
On Wed, Sep 24, 2008 at 02:28:13PM +1000, Peter Leckie wrote:
> The reason the lsn had changed was due to it not being initialized
> by the time a copy of the lsn was taken in xfs_qm_dqflush().
> The lsn was then later updated causing the test in
> xfs_qm_dqflush_done() to fail.
>
> Synchronization between the two functions is done by the pincount
> in struct xfs_dquot_t and xfs_qm_dqflush() calls
> xfs_qm_dqunpin_wait() to wait for the pincount == 0. However after
> being woken up it does not validate the pincount is actually 0,
Sure - that's because we only ever send a wakeup when the pin count
falls to zero. Because we have to be holding the dquot lock when we
either pin a dquot or wait for it to be unpinned, the act of waiting
for it to be unpinned with the dquot lock held guarantees that it
is not pinned when we wake up.
IOWs, the pin count cannot be incremented while we are waiting for
it to be unpinned, and hence it must be zero when we are woken......
> allowing a false wake up by the scheduler to cause
> xfs_qm_dqflush() to prematurely start processing the dquot.
.... which means I can't see how that would happen...
What am I missing here?
> So this patch uses an atomic_t to track the pincount which allows
> us to easily use the wait_event macro to wait, this will guarantee
> that when we return from xfs_qm_dqunpin_wait() that the pincount
> == 0. We also remove the global qi_pinlock from xfs_quotainfo
> which may also reduce contention when pinning dquot's.
I have a patch series that I've been running under test for the
past two months that does exactly this - it's an optimisation,
not a bug fix. I was actually planning on posting it this afternoon.
As to the patch, your mailer has whitespace-damaged it so you need
to fix that up.
> Index: 2.6.x-xfs/fs/xfs/quota/xfs_dquot_item.c
> ===================================================================
> --- 2.6.x-xfs.orig/fs/xfs/quota/xfs_dquot_item.c 2008-09-24
> 12:02:41.987960702 +1000
> +++ 2.6.x-xfs/fs/xfs/quota/xfs_dquot_item.c 2008-09-24
> 14:22:01.643627312 +1000
> @@ -98,9 +98,7 @@ xfs_qm_dquot_logitem_pin(
>
> dqp = logitem->qli_dquot;
> ASSERT(XFS_DQ_IS_LOCKED(dqp));
> - spin_lock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
> - dqp->q_pincount++;
> - spin_unlock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
> + atomic_inc(&dqp->q_pincount);
> }
The header comment on this function needs updating - it references
the spinlock that just got removed. You can also do
xfs_dquot_t *dqp = logitem->qli_dquot;
> @@ -117,13 +115,9 @@ xfs_qm_dquot_logitem_unpin(
> xfs_dquot_t *dqp;
>
> dqp = logitem->qli_dquot;
> - ASSERT(dqp->q_pincount > 0);
> - spin_lock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
> - dqp->q_pincount--;
> - if (dqp->q_pincount == 0) {
> - sv_broadcast(&dqp->q_pinwait);
> - }
> - spin_unlock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
> + ASSERT(atomic_read(&dqp->q_pincount) > 0);
> + if (atomic_dec_and_test(&dqp->q_pincount))
> + wake_up(&dqp->q_pinwait);
> }
The header comment for this function references functions that
don't exist. Needs updating. same again about logitem->qli_dquot...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH] Use atomic_t and wait_event to track dquot pincount
2008-09-24 6:05 ` Dave Chinner
@ 2008-09-24 6:53 ` Peter Leckie
2008-09-24 7:43 ` Dave Chinner
0 siblings, 1 reply; 34+ messages in thread
From: Peter Leckie @ 2008-09-24 6:53 UTC (permalink / raw)
To: xfs, xfs-dev
Dave Chinner wrote:
> [ Pete, please wrap your text at 68-72 columns]
>
Ok, will do in future.
> On Wed, Sep 24, 2008 at 02:28:13PM +1000, Peter Leckie wrote:
>
>> The reason the lsn had changed was due to it not being initialized
>> by the time a copy of the lsn was taken in xfs_qm_dqflush().
>> The lsn was then later updated causing the test in
>> xfs_qm_dqflush_done() to fail.
>>
>> Synchronization between the two functions is done by the pincount
>> in struct xfs_dquot_t and xfs_qm_dqflush() calls
>> xfs_qm_dqunpin_wait() to wait for the pincount == 0. However after
>> being woken up it does not validate the pincount is actually 0,
>>
>
> Sure - that's because we only ever send a wakeup when the pin count
> falls to zero. Because we have to be holding the dquot lock when we
> either pin a dquot or wait for it to be unpinned, the act of waiting
> for it to be unpinned with the dquot lock held guarantees that it
> is not pinned when we wake up.
>
> IOWs, the pin count cannot be incremented while we are waiting for
> it to be unpinned, and hence it must be zero when we are woken......
>
>
>> allowing a false wake up by the scheduler to cause
>> xfs_qm_dqflush() to prematurely start processing the dquot.
>>
>
> .... which means I can't see how that would happen...
>
> What am I missing here?
>
Yeah, it's quite intriguing, isn't it. I added the pincount to the
dquot tracing code and sure enough it's 1 all the way through
xfs_qm_dqflush(), so I can tell you that none of the XFS code is
causing it to wake up. I was going to add some tracing code to
the scheduler to determine who is causing us to wake up, however
I had other priorities to work on.

Reading the code it appears things like a thread dying or a
system suspending can cause it. However, none of these had
happened. After reading other examples in the Linux kernel
it appeared pretty standard to always re-evaluate the condition
after being woken, so I suspect that Linux simply does not
guarantee that we will only be woken by the thread calling
wake_up() on us.
Any insight into this would be much appreciated as I'm also very curious
as to why this is happening.
> As to the patch, your mailer has whitespace damaged it so you need
> to be fix that up.
>
Ahh yeah, copy and paste from a terminal. I'll fix up the patch and
re-send shortly.
Thanks Dave.
Pete
* [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-24 4:28 [PATCH] Use atomic_t and wait_event to track dquot pincount Peter Leckie
2008-09-24 6:05 ` Dave Chinner
@ 2008-09-24 7:26 ` Peter Leckie
2008-09-24 7:42 ` Lachlan McIlroy
1 sibling, 1 reply; 34+ messages in thread
From: Peter Leckie @ 2008-09-24 7:26 UTC (permalink / raw)
Cc: xfs, xfs-dev
During a heavy workload with quotas enabled the system can stall
with every process requesting log space, however the log is full
of dquots. The aild is trying to push the tail of the log however
every item in the log had previously been pushed by the aild and was
marked as clean so the aild aborts pushing the items in the log.

The reason the log items are still in the log is because the iodone
routine xfs_qm_dqflush_done detected that the log sequence number
(lsn) had changed and assumed the log item had been re-dirtied so it
should be left in the log. However the log item had not been
re-dirtied so it was still marked as clean preventing it from being
pushed again causing the log item to be stuck in the log. After a
while the log eventually filled with these dquot log items.

The reason the lsn had changed was due to it not being initialized by
the time a copy of the lsn was taken in xfs_qm_dqflush(). The lsn was
then later updated causing the test in xfs_qm_dqflush_done() to fail.

Synchronization between the two functions is done by the pincount in
struct xfs_dquot_t and xfs_qm_dqflush() calls xfs_qm_dqunpin_wait()
to wait for the pincount == 0. However after being woken up it does
not validate the pincount is actually 0, allowing a false wakeup by
the scheduler to cause xfs_qm_dqflush() to prematurely start
processing the dquot.

So this patch uses an atomic_t to track the pincount which allows us
to easily use the wait_event macro to wait; this will guarantee that
when we return from xfs_qm_dqunpin_wait() the pincount == 0.
We also remove the global qi_pinlock from xfs_quotainfo which may
also reduce contention when pinning dquots.
Signed-off-by: Peter Leckie <pleckie@sgi.com>
Index: 2.6.x-xfs/fs/xfs/quota/xfs_dquot_item.c
===================================================================
--- 2.6.x-xfs.orig/fs/xfs/quota/xfs_dquot_item.c 2008-09-24 14:45:29.270573668 +1000
+++ 2.6.x-xfs/fs/xfs/quota/xfs_dquot_item.c 2008-09-24 16:36:06.748350869 +1000
@@ -88,25 +88,22 @@ xfs_qm_dquot_logitem_format(
/*
* Increment the pin count of the given dquot.
- * This value is protected by pinlock spinlock in the xQM structure.
*/
STATIC void
xfs_qm_dquot_logitem_pin(
xfs_dq_logitem_t *logitem)
{
- xfs_dquot_t *dqp;
+ xfs_dquot_t *dqp = logitem->qli_dquot;
- dqp = logitem->qli_dquot;
ASSERT(XFS_DQ_IS_LOCKED(dqp));
- spin_lock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
- dqp->q_pincount++;
- spin_unlock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
+ atomic_inc(&dqp->q_pincount);
}
/*
* Decrement the pin count of the given dquot, and wake up
* anyone in xfs_dqwait_unpin() if the count goes to 0. The
- * dquot must have been previously pinned with a call to xfs_dqpin().
+ * dquot must have been previously pinned with a call to
+ * xfs_qm_dquot_logitem_pin().
*/
/* ARGSUSED */
STATIC void
@@ -114,16 +111,11 @@ xfs_qm_dquot_logitem_unpin(
xfs_dq_logitem_t *logitem,
int stale)
{
- xfs_dquot_t *dqp;
+ xfs_dquot_t *dqp = logitem->qli_dquot;
- dqp = logitem->qli_dquot;
- ASSERT(dqp->q_pincount > 0);
- spin_lock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
- dqp->q_pincount--;
- if (dqp->q_pincount == 0) {
- sv_broadcast(&dqp->q_pinwait);
- }
- spin_unlock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
+ ASSERT(atomic_read(&dqp->q_pincount) > 0);
+ if (atomic_dec_and_test(&dqp->q_pincount))
+ wake_up(&dqp->q_pinwait);
}
/* ARGSUSED */
@@ -193,21 +185,14 @@ xfs_qm_dqunpin_wait(
xfs_dquot_t *dqp)
{
ASSERT(XFS_DQ_IS_LOCKED(dqp));
- if (dqp->q_pincount == 0) {
+ if (atomic_read(&dqp->q_pincount) == 0)
return;
- }
/*
* Give the log a push so we don't wait here too long.
*/
xfs_log_force(dqp->q_mount, (xfs_lsn_t)0, XFS_LOG_FORCE);
- spin_lock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
- if (dqp->q_pincount == 0) {
- spin_unlock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
- return;
- }
- sv_wait(&(dqp->q_pinwait), PINOD,
- &(XFS_DQ_TO_QINF(dqp)->qi_pinlock), s);
+ wait_event(dqp->q_pinwait, (atomic_read(&dqp->q_pincount) == 0));
}
/*
@@ -310,7 +295,7 @@ xfs_qm_dquot_logitem_trylock(
uint retval;
dqp = qip->qli_dquot;
- if (dqp->q_pincount > 0)
+ if (atomic_read(&dqp->q_pincount) > 0)
return (XFS_ITEM_PINNED);
if (! xfs_qm_dqlock_nowait(dqp))
Index: 2.6.x-xfs/fs/xfs/quota/xfs_dquot.h
===================================================================
--- 2.6.x-xfs.orig/fs/xfs/quota/xfs_dquot.h 2008-09-24 14:45:29.270573668 +1000
+++ 2.6.x-xfs/fs/xfs/quota/xfs_dquot.h 2008-09-24 16:24:58.623139695 +1000
@@ -83,8 +83,8 @@ typedef struct xfs_dquot {
xfs_qcnt_t q_res_rtbcount;/* total realtime blks used+reserved */
mutex_t q_qlock; /* quota lock */
struct completion q_flush; /* flush completion queue */
- uint q_pincount; /* pin count for this dquot */
- sv_t q_pinwait; /* sync var for pinning */
+ atomic_t q_pincount; /* dquot pin count */
+ wait_queue_head_t q_pinwait; /* dquot pinning wait queue */
#ifdef XFS_DQUOT_TRACE
struct ktrace *q_trace; /* trace header structure */
#endif
Index: 2.6.x-xfs/fs/xfs/quota/xfs_dquot.c
===================================================================
--- 2.6.x-xfs.orig/fs/xfs/quota/xfs_dquot.c 2008-09-24 14:45:29.270573668 +1000
+++ 2.6.x-xfs/fs/xfs/quota/xfs_dquot.c 2008-09-24 16:24:58.623139695 +1000
@@ -101,7 +101,7 @@ xfs_qm_dqinit(
if (brandnewdquot) {
dqp->dq_flnext = dqp->dq_flprev = dqp;
mutex_init(&dqp->q_qlock);
- sv_init(&dqp->q_pinwait, SV_DEFAULT, "pdq");
+ init_waitqueue_head(&dqp->q_pinwait);
/*
* Because we want to use a counting completion, complete
@@ -131,7 +131,7 @@ xfs_qm_dqinit(
dqp->q_res_bcount = 0;
dqp->q_res_icount = 0;
dqp->q_res_rtbcount = 0;
- dqp->q_pincount = 0;
+ atomic_set(&dqp->q_pincount, 0);
dqp->q_hash = NULL;
ASSERT(dqp->dq_flnext == dqp->dq_flprev);
@@ -1489,7 +1489,7 @@ xfs_qm_dqpurge(
"xfs_qm_dqpurge: dquot %p flush failed", dqp);
xfs_dqflock(dqp);
}
- ASSERT(dqp->q_pincount == 0);
+ ASSERT(atomic_read(&dqp->q_pincount) == 0);
ASSERT(XFS_FORCED_SHUTDOWN(mp) ||
!(dqp->q_logitem.qli_item.li_flags & XFS_LI_IN_AIL));
Index: 2.6.x-xfs/fs/xfs/xfsidbg.c
===================================================================
--- 2.6.x-xfs.orig/fs/xfs/xfsidbg.c 2008-09-24 14:45:29.270573668 +1000
+++ 2.6.x-xfs/fs/xfs/xfsidbg.c 2008-09-24 16:24:58.627139176 +1000
@@ -6577,7 +6577,7 @@ xfsidbg_xqm_dquot(xfs_dquot_t *dqp)
(unsigned long long)dqp->q_res_rtbcount);
kdb_printf("qlock 0x%p &q_flush 0x%p (%d) pincount 0x%x\n",
&dqp->q_qlock, &dqp->q_flush,
- dqp->q_flush.done, dqp->q_pincount);
+ dqp->q_flush.done, atomic_read(&dqp->q_pincount));
#ifdef XFS_DQUOT_TRACE
qprintf("dqtrace 0x%p\n", dqp->q_trace);
#endif
@@ -6761,10 +6761,9 @@ xfsidbg_xqm_qinfo(xfs_mount_t *mp)
return;
}
- kdb_printf("uqip 0x%p, gqip 0x%p, &pinlock 0x%p &dqlist 0x%p\n",
+ kdb_printf("uqip 0x%p, gqip 0x%p, &dqlist 0x%p\n",
mp->m_quotainfo->qi_uquotaip,
mp->m_quotainfo->qi_gquotaip,
- &mp->m_quotainfo->qi_pinlock,
&mp->m_quotainfo->qi_dqlist);
kdb_printf("btmlimit 0x%x, itmlimit 0x%x, RTbtmlim 0x%x\n",
Index: 2.6.x-xfs/fs/xfs/quota/xfs_qm.h
===================================================================
--- 2.6.x-xfs.orig/fs/xfs/quota/xfs_qm.h 2008-09-24 14:45:29.270573668 +1000
+++ 2.6.x-xfs/fs/xfs/quota/xfs_qm.h 2008-09-24 16:24:58.627139176 +1000
@@ -106,7 +106,6 @@ typedef struct xfs_qm {
typedef struct xfs_quotainfo {
xfs_inode_t *qi_uquotaip; /* user quota inode */
xfs_inode_t *qi_gquotaip; /* group quota inode */
- spinlock_t qi_pinlock; /* dquot pinning lock */
xfs_dqlist_t qi_dqlist; /* all dquots in filesys */
int qi_dqreclaims; /* a change here indicates
a removal in the dqlist */
Index: 2.6.x-xfs/fs/xfs/quota/xfs_qm.c
===================================================================
--- 2.6.x-xfs.orig/fs/xfs/quota/xfs_qm.c 2008-09-24 14:45:29.270573668 +1000
+++ 2.6.x-xfs/fs/xfs/quota/xfs_qm.c 2008-09-24 16:24:58.631138657 +1000
@@ -1137,7 +1137,6 @@ xfs_qm_init_quotainfo(
return error;
}
- spin_lock_init(&qinf->qi_pinlock);
xfs_qm_list_init(&qinf->qi_dqlist, "mpdqlist", 0);
qinf->qi_dqreclaims = 0;
@@ -1234,7 +1233,6 @@ xfs_qm_destroy_quotainfo(
*/
xfs_qm_rele_quotafs_ref(mp);
- spinlock_destroy(&qi->qi_pinlock);
xfs_qm_list_destroy(&qi->qi_dqlist);
if (qi->qi_uquotaip) {
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-24 7:26 ` [PATCH v2] " Peter Leckie
@ 2008-09-24 7:42 ` Lachlan McIlroy
2008-09-24 7:46 ` Dave Chinner
0 siblings, 1 reply; 34+ messages in thread
From: Lachlan McIlroy @ 2008-09-24 7:42 UTC (permalink / raw)
To: Peter Leckie; +Cc: xfs, xfs-dev
Looks good Pete.
Peter Leckie wrote:
> During a heavy workload with quotas enabled the system can stall
>
> with every process requesting log space, however the log is full
> of dquots. The aild is trying to push the tail of the log however
> every item in the log had previously been pushed by the aild and was
> marked as clean so the aild aborts pushing the items in the log.
>
> The reason the log items are still in the log is because the iodone
> routine xfs_qm_dqflush_done detected that the log sequence number(lsn)
> had changed and assumed the log item had been re-dirtied so it should
> be left in the log. However the log item had not been re-dirtied so
> it was still marked as clean preventing it from being pushed again
> causing the log item to be stuck in the log. After a while the log
> eventually filled with these dquot log items.
>
> The reason the lsn had changed was due to it not being initialized by
> the time a copy of the lsn was taken in xfs_qm_dqflush(). The lsn was
> then later updated causing the test in xfs_qm_dqflush_done() to fail.
>
> Synchronization between the two functions is done by the pincount in
> struct xfs_dquot_t and xfs_qm_dqflush() calls xfs_qm_dqunpin_wait()
> to wait for the pincount == 0. However after being woken up it does not
> validate the pincount is actually 0, allowing a false wake up by the
> scheduler to cause xfs_qm_dqflush() to prematurely start processing
> the dquot.
>
> So this patch uses an atomic_t to track the pincount which allows us
> to easily use the wait_event macro to wait, this will guarantee that
> when we return from xfs_qm_dqunpin_wait() that the pincount == 0.
> We also remove the global qi_pinlock from xfs_quotainfo which may
> also reduce contention when pinning dquots.
>
> Signed-off-by: Peter Leckie <pleckie@sgi.com>
>
>
> Index: 2.6.x-xfs/fs/xfs/quota/xfs_dquot_item.c
> ===================================================================
> --- 2.6.x-xfs.orig/fs/xfs/quota/xfs_dquot_item.c 2008-09-24
> 14:45:29.270573668 +1000
> +++ 2.6.x-xfs/fs/xfs/quota/xfs_dquot_item.c 2008-09-24
> 16:36:06.748350869 +1000
> @@ -88,25 +88,22 @@ xfs_qm_dquot_logitem_format(
>
> /*
> * Increment the pin count of the given dquot.
> - * This value is protected by pinlock spinlock in the xQM structure.
> */
> STATIC void
> xfs_qm_dquot_logitem_pin(
> xfs_dq_logitem_t *logitem)
> {
> - xfs_dquot_t *dqp;
> + xfs_dquot_t *dqp = logitem->qli_dquot;
>
> - dqp = logitem->qli_dquot;
> ASSERT(XFS_DQ_IS_LOCKED(dqp));
> - spin_lock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
> - dqp->q_pincount++;
> - spin_unlock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
> + atomic_inc(&dqp->q_pincount);
> }
>
> /*
> * Decrement the pin count of the given dquot, and wake up
> * anyone in xfs_dqwait_unpin() if the count goes to 0. The
> - * dquot must have been previously pinned with a call to xfs_dqpin().
> + * dquot must have been previously pinned with a call to
> + * xfs_qm_dquot_logitem_pin().
> */
> /* ARGSUSED */
> STATIC void
> @@ -114,16 +111,11 @@ xfs_qm_dquot_logitem_unpin(
> xfs_dq_logitem_t *logitem,
> int stale)
> {
> - xfs_dquot_t *dqp;
> + xfs_dquot_t *dqp = logitem->qli_dquot;
>
> - dqp = logitem->qli_dquot;
> - ASSERT(dqp->q_pincount > 0);
> - spin_lock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
> - dqp->q_pincount--;
> - if (dqp->q_pincount == 0) {
> - sv_broadcast(&dqp->q_pinwait);
> - }
> - spin_unlock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
> + ASSERT(atomic_read(&dqp->q_pincount) > 0);
> + if (atomic_dec_and_test(&dqp->q_pincount))
> + wake_up(&dqp->q_pinwait);
> }
>
> /* ARGSUSED */
> @@ -193,21 +185,14 @@ xfs_qm_dqunpin_wait(
> xfs_dquot_t *dqp)
> {
> ASSERT(XFS_DQ_IS_LOCKED(dqp));
> - if (dqp->q_pincount == 0) {
> + if (atomic_read(&dqp->q_pincount) == 0)
> return;
> - }
>
> /*
> * Give the log a push so we don't wait here too long.
> */
> xfs_log_force(dqp->q_mount, (xfs_lsn_t)0, XFS_LOG_FORCE);
> - spin_lock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
> - if (dqp->q_pincount == 0) {
> - spin_unlock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
> - return;
> - }
> - sv_wait(&(dqp->q_pinwait), PINOD,
> - &(XFS_DQ_TO_QINF(dqp)->qi_pinlock), s);
> + wait_event(dqp->q_pinwait, (atomic_read(&dqp->q_pincount) == 0));
> }
>
> /*
> @@ -310,7 +295,7 @@ xfs_qm_dquot_logitem_trylock(
> uint retval;
>
> dqp = qip->qli_dquot;
> - if (dqp->q_pincount > 0)
> + if (atomic_read(&dqp->q_pincount) > 0)
> return (XFS_ITEM_PINNED);
>
> if (! xfs_qm_dqlock_nowait(dqp))
> Index: 2.6.x-xfs/fs/xfs/quota/xfs_dquot.h
> ===================================================================
> --- 2.6.x-xfs.orig/fs/xfs/quota/xfs_dquot.h 2008-09-24
> 14:45:29.270573668 +1000
> +++ 2.6.x-xfs/fs/xfs/quota/xfs_dquot.h 2008-09-24 16:24:58.623139695
> +1000
> @@ -83,8 +83,8 @@ typedef struct xfs_dquot {
> xfs_qcnt_t q_res_rtbcount;/* total realtime blks used+reserved */
> mutex_t q_qlock; /* quota lock */
> struct completion q_flush; /* flush completion queue */
> - uint q_pincount; /* pin count for this dquot */
> - sv_t q_pinwait; /* sync var for pinning */
> + atomic_t q_pincount; /* dquot pin count */
> + wait_queue_head_t q_pinwait; /* dquot pinning wait queue */
> #ifdef XFS_DQUOT_TRACE
> struct ktrace *q_trace; /* trace header structure */
> #endif
> Index: 2.6.x-xfs/fs/xfs/quota/xfs_dquot.c
> ===================================================================
> --- 2.6.x-xfs.orig/fs/xfs/quota/xfs_dquot.c 2008-09-24
> 14:45:29.270573668 +1000
> +++ 2.6.x-xfs/fs/xfs/quota/xfs_dquot.c 2008-09-24 16:24:58.623139695
> +1000
> @@ -101,7 +101,7 @@ xfs_qm_dqinit(
> if (brandnewdquot) {
> dqp->dq_flnext = dqp->dq_flprev = dqp;
> mutex_init(&dqp->q_qlock);
> - sv_init(&dqp->q_pinwait, SV_DEFAULT, "pdq");
> + init_waitqueue_head(&dqp->q_pinwait);
>
> /*
> * Because we want to use a counting completion, complete
> @@ -131,7 +131,7 @@ xfs_qm_dqinit(
> dqp->q_res_bcount = 0;
> dqp->q_res_icount = 0;
> dqp->q_res_rtbcount = 0;
> - dqp->q_pincount = 0;
> + atomic_set(&dqp->q_pincount, 0);
> dqp->q_hash = NULL;
> ASSERT(dqp->dq_flnext == dqp->dq_flprev);
>
> @@ -1489,7 +1489,7 @@ xfs_qm_dqpurge(
> "xfs_qm_dqpurge: dquot %p flush failed", dqp);
> xfs_dqflock(dqp);
> }
> - ASSERT(dqp->q_pincount == 0);
> + ASSERT(atomic_read(&dqp->q_pincount) == 0);
> ASSERT(XFS_FORCED_SHUTDOWN(mp) ||
> !(dqp->q_logitem.qli_item.li_flags & XFS_LI_IN_AIL));
>
> Index: 2.6.x-xfs/fs/xfs/xfsidbg.c
> ===================================================================
> --- 2.6.x-xfs.orig/fs/xfs/xfsidbg.c 2008-09-24 14:45:29.270573668 +1000
> +++ 2.6.x-xfs/fs/xfs/xfsidbg.c 2008-09-24 16:24:58.627139176 +1000
> @@ -6577,7 +6577,7 @@ xfsidbg_xqm_dquot(xfs_dquot_t *dqp)
> (unsigned long long)dqp->q_res_rtbcount);
> kdb_printf("qlock 0x%p &q_flush 0x%p (%d) pincount 0x%x\n",
> &dqp->q_qlock, &dqp->q_flush,
> - dqp->q_flush.done, dqp->q_pincount);
> + dqp->q_flush.done, atomic_read(&dqp->q_pincount));
> #ifdef XFS_DQUOT_TRACE
> qprintf("dqtrace 0x%p\n", dqp->q_trace);
> #endif
> @@ -6761,10 +6761,9 @@ xfsidbg_xqm_qinfo(xfs_mount_t *mp)
> return;
> }
>
> - kdb_printf("uqip 0x%p, gqip 0x%p, &pinlock 0x%p &dqlist 0x%p\n",
> + kdb_printf("uqip 0x%p, gqip 0x%p, &dqlist 0x%p\n",
> mp->m_quotainfo->qi_uquotaip,
> mp->m_quotainfo->qi_gquotaip,
> - &mp->m_quotainfo->qi_pinlock,
> &mp->m_quotainfo->qi_dqlist);
>
> kdb_printf("btmlimit 0x%x, itmlimit 0x%x, RTbtmlim 0x%x\n",
> Index: 2.6.x-xfs/fs/xfs/quota/xfs_qm.h
> ===================================================================
> --- 2.6.x-xfs.orig/fs/xfs/quota/xfs_qm.h 2008-09-24
> 14:45:29.270573668 +1000
> +++ 2.6.x-xfs/fs/xfs/quota/xfs_qm.h 2008-09-24 16:24:58.627139176 +1000
> @@ -106,7 +106,6 @@ typedef struct xfs_qm {
> typedef struct xfs_quotainfo {
> xfs_inode_t *qi_uquotaip; /* user quota inode */
> xfs_inode_t *qi_gquotaip; /* group quota inode */
> - spinlock_t qi_pinlock; /* dquot pinning lock */
> xfs_dqlist_t qi_dqlist; /* all dquots in filesys */
> int qi_dqreclaims; /* a change here indicates
> a removal in the dqlist */
> Index: 2.6.x-xfs/fs/xfs/quota/xfs_qm.c
> ===================================================================
> --- 2.6.x-xfs.orig/fs/xfs/quota/xfs_qm.c 2008-09-24
> 14:45:29.270573668 +1000
> +++ 2.6.x-xfs/fs/xfs/quota/xfs_qm.c 2008-09-24 16:24:58.631138657 +1000
> @@ -1137,7 +1137,6 @@ xfs_qm_init_quotainfo(
> return error;
> }
>
> - spin_lock_init(&qinf->qi_pinlock);
> xfs_qm_list_init(&qinf->qi_dqlist, "mpdqlist", 0);
> qinf->qi_dqreclaims = 0;
>
> @@ -1234,7 +1233,6 @@ xfs_qm_destroy_quotainfo(
> */
> xfs_qm_rele_quotafs_ref(mp);
>
> - spinlock_destroy(&qi->qi_pinlock);
> xfs_qm_list_destroy(&qi->qi_dqlist);
>
> if (qi->qi_uquotaip) {
>
>
>
* Re: [PATCH] Use atomic_t and wait_event to track dquot pincount
2008-09-24 6:53 ` Peter Leckie
@ 2008-09-24 7:43 ` Dave Chinner
0 siblings, 0 replies; 34+ messages in thread
From: Dave Chinner @ 2008-09-24 7:43 UTC (permalink / raw)
To: Peter Leckie; +Cc: xfs, xfs-dev
On Wed, Sep 24, 2008 at 04:53:22PM +1000, Peter Leckie wrote:
> Dave Chinner wrote:
>> [ Pete, please wrap your text at 68-72 columns]
>>
> Ok will do in future
>> On Wed, Sep 24, 2008 at 02:28:13PM +1000, Peter Leckie wrote:
>>
>>> The reason the lsn had changed was due to it not being initialized
>>> by the time a copy of the lsn was taken in xfs_qm_dqflush().
>>> The lsn was then later updated, causing the test in
>>> xfs_qm_dqflush_done() to fail.
>>>
>>> Synchronization between the two functions is done by the pincount
>>> in struct xfs_dquot_t, and xfs_qm_dqflush() calls
>>> xfs_qm_dqunpin_wait() to wait for the pincount == 0. However, after
>>> being woken up it does not validate that the pincount is actually 0,
>>>
>>
>> Sure - that's because we only ever send a wakeup when the pin count
>> falls to zero. Because we have to be holding the dquot lock when we
>> either pin a dquot or wait for it to be unpinned, the act of waiting
>> for it to be unpinned with the dquot lock held guarantees that it
>> is not pinned when we wake up.
>>
>> IOWs, the pin count cannot be incremented while we are waiting for
>> it to be unpinned, and hence it must be zero when we are woken......
>>
>>
>>> allowing a false wake up by the scheduler to cause
>>> xfs_qm_dqflush() to prematurely start processing the dquot.
>>>
>>
>> .... which means I can't see how that would happen...
>>
>> What am I missing here?
>>
> Yeah, it's quite intriguing isn't it. I added the pincount to the
> dquot tracing code and sure enough it's 1 all the way through
> xfs_qm_dqflush(). I can tell you that none of the XFS code is
> causing it to wake up.
Only the XFS code can cause it to be woken....
> I was going to add some tracing code to
> the scheduler to determine who is causing us to wake up however
> I had other priorities to work on.
>
> Reading the code it appears things like a thread dying or the
> system suspending can cause it.
Wait queues are not affected by such events when they are
configured to be uninterruptible. The sv_t:
#define sv_wait(sv, pri, lock, s) \
_sv_wait(sv, lock, TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT)
Is as uninterruptible as it comes. Which means only an explicit
wakeup will cause any waiters to wake up.
> However none of these had
> happened, after reading other examples in the Linux kernel
> it appeared pretty standard to always re-evaluate the condition
> after being woken so I suspect that Linux simply does not
> guarantee that we will only be woken by the thread calling
> wake_up on us.
Not true - wake_up() will only wake tasks on the wait queue
it was called for.
> Any insight into this would be much appreciated as I'm also very curious
> as to why this is happening.
Assert fail the machine when a dquot with a non-zero pincount is
woken in xfs_qm_dqunpin_wait(). If the assert trips, then we need
to work out how the pin count is getting elevated while we have
a waiter....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-24 7:42 ` Lachlan McIlroy
@ 2008-09-24 7:46 ` Dave Chinner
2008-09-24 8:03 ` Lachlan McIlroy
` (2 more replies)
0 siblings, 3 replies; 34+ messages in thread
From: Dave Chinner @ 2008-09-24 7:46 UTC (permalink / raw)
To: Lachlan McIlroy; +Cc: Peter Leckie, xfs, xfs-dev
On Wed, Sep 24, 2008 at 05:42:38PM +1000, Lachlan McIlroy wrote:
> Looks good Pete.
No, it is not yet good. Pete cannot explain the underlying problem
and we need to understand if this is fixing the problem or just
changing the timing so it doesn't show up....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-24 7:46 ` Dave Chinner
@ 2008-09-24 8:03 ` Lachlan McIlroy
2008-09-24 14:42 ` Christoph Hellwig
2008-09-24 8:15 ` Peter Leckie
2008-09-24 14:41 ` Christoph Hellwig
2 siblings, 1 reply; 34+ messages in thread
From: Lachlan McIlroy @ 2008-09-24 8:03 UTC (permalink / raw)
To: Lachlan McIlroy, Peter Leckie, xfs, xfs-dev
Dave Chinner wrote:
> On Wed, Sep 24, 2008 at 05:42:38PM +1000, Lachlan McIlroy wrote:
>> Looks good Pete.
>
> No, it is not yet good. Pete cannot explain the underlying problem
> and we need to understand if this is fixing the problem or just
> changing the timing so it doesn't show up....
>
Pete clearly demonstrated to me that the sv_wait is being prematurely
woken up when it should not be. There may be an underlying problem or
this could just be another peculiarity of the Linux kernel. Either way
Pete will continue to look into this and whatever the explanation turns
out to be, we desperately need a way to prevent our customers from
running into this deadlock. Even if there is another problem lurking
this code change is fine and if nothing else will serve as a performance
improvement.
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-24 7:46 ` Dave Chinner
2008-09-24 8:03 ` Lachlan McIlroy
@ 2008-09-24 8:15 ` Peter Leckie
2008-09-25 1:03 ` Dave Chinner
2008-09-24 14:41 ` Christoph Hellwig
2 siblings, 1 reply; 34+ messages in thread
From: Peter Leckie @ 2008-09-24 8:15 UTC (permalink / raw)
To: xfs, xfs-dev
Dave Chinner wrote:
> No, it is not yet good. Pete cannot explain the underlying problem
> and we need to understand if this is fixing the problem or just
>
>
No, this patch does not change the timing so that the problem does
not happen; it fixes a problem we don't properly understand. Using
wait_event prevents the thread from waking up unless the test
condition is actually true. The bug here is that we are being woken
up with our test condition still false. This patch catches that case
and hence fixes the problem; the question is why this is happening
in the first place, and yes, that is an unanswered question.
> Only the XFS code can cause it to be woken....
Do you know this for sure?
> Wait queues are not affected by such events when they are
> configured to be uninterruptible. The sv_t:
>
> #define sv_wait(sv, pri, lock, s) \
> _sv_wait(sv, lock, TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT)
>
> Is as uninterruptible as it comes. Which means only an explicit
> wakeup will cause any waiters to wake up.
>
>
I added some tracing to sv_broadcast to trace anyone waking up the
thread, however the only time it was called was from the unpinning
code, well after the thread had already been woken up.
> Assert fail the machine when a dquot with a non-zero pincount is
> woken in xfs_qm_dqunpin_wait(). If the assert trips, then we need
> to work out how the pin count is getting elevated while we have
> a waiter....
Done that, and it trips; however it does not help us as there is no
insight into who woke us.
XFS does have references to wake_up in other spots, for example in
xfs_super.c and xfs_buf.c, so I could add some tracing there; however
the most ideal spot is in __wake_up_common(), so as soon as I have a
chance I'll add some tracing there so we can work out what's going
on here.
Either way this patch resolves this issue and is a nice code cleanup.
Thanks,
Pete
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-24 7:46 ` Dave Chinner
2008-09-24 8:03 ` Lachlan McIlroy
2008-09-24 8:15 ` Peter Leckie
@ 2008-09-24 14:41 ` Christoph Hellwig
2008-09-25 1:08 ` Dave Chinner
2 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2008-09-24 14:41 UTC (permalink / raw)
To: Lachlan McIlroy, Peter Leckie, xfs, xfs-dev
On Wed, Sep 24, 2008 at 05:46:04PM +1000, Dave Chinner wrote:
> On Wed, Sep 24, 2008 at 05:42:38PM +1000, Lachlan McIlroy wrote:
> > Looks good Pete.
>
> No, it is not yet good. Pete cannot explain the underlying problem
> and we need to understand if this is fixing the problem or just
> changing the timing so it doesn't show up....
The patch does not only change the timing; it also makes sure
xfs_qm_dqunpin_wait sleeps again when it is woken up but the
condition it was waiting on is not met. That's the reason why we have
wait_event in Linux instead of the more traditional sv-style
condition variables.
Now the spurious wakeup from scheduler argument doesn't make any sense,
so this spurious wakeup we're protecting from must come from XFS itself.
The way this could happen is when a task trying to pin the dquot gets
qi_pinlock before the one waiting for q_pincount to reach zero.
So the patch does look good to me, but the current explanation needs
some updating.
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-24 8:03 ` Lachlan McIlroy
@ 2008-09-24 14:42 ` Christoph Hellwig
0 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2008-09-24 14:42 UTC (permalink / raw)
To: Lachlan McIlroy; +Cc: Peter Leckie, xfs, xfs-dev
On Wed, Sep 24, 2008 at 06:03:22PM +1000, Lachlan McIlroy wrote:
> Pete clearly demonstrated to me that the sv_wait is being prematurely
> woken up when it should not be. There may be an underlying problem or
> this could just be another peculiarity of the Linux kernel. Either way
> Pete will continue to look into this and whatever the explanation turns
> out to be, we desperately need a way to prevent our customers from
> running into this deadlock. Even if there is another problem lurking
> this code change is fine and if nothing else will serve as a performance
> improvement.
Folks, please at least invest half the time you spend flaming each other
into looking at underlying causes ;-)
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-24 8:15 ` Peter Leckie
@ 2008-09-25 1:03 ` Dave Chinner
2008-09-25 8:43 ` Peter Leckie
2008-09-29 21:45 ` Christoph Hellwig
0 siblings, 2 replies; 34+ messages in thread
From: Dave Chinner @ 2008-09-25 1:03 UTC (permalink / raw)
To: Peter Leckie; +Cc: xfs, xfs-dev
[ Pete, please wrap your text at 68-72 columns. ]
On Wed, Sep 24, 2008 at 06:15:20PM +1000, Peter Leckie wrote:
> Dave Chinner wrote:
>> No, it is not yet good. Pete cannot explain the underlying problem
>> and we need to understand if this is fixing the problem or just
>>
>>
> No this patch does not change the timing so it does not happen, it fixes
> a problem
> we don't properly understand.
Therein lies my objection - you can't fix what you don't understand.
What you are proposing is a work-around. The root cause has not yet
been found....
A recent example, perhaps. We discovered that the way semaphores are
implemented on x86_64 can result in a thread calling up() still
using the semaphore when the waiting down() has already been woken
and hence can free the semaphore while we are still using it. The
first patch I saw fixed the symptom seen in the buffer cache by
adding an extra reference to the xfs_buf_t across I/O so that
the up() in I/O completion was guaranteed to complete before we
dropped the reference and freed the xfs_buf_t.
However, looking at it more deeply led us to the fundamental problem
that semaphores are optimised in a non-obvious way that led to
this problem (i.e. that down() can complete before up()). The result
of understanding this is that semaphores were not safe for use
within XFS for flush locks, I/O completion semaphores, etc, so we
had to replace them all with completions.
This is a similar situation - if the sv_t is broken, we need to
replace all users, not just work around one symptom of the
brokenness. This is especially important as the remaining users
of sv_t's are in the log for iclog synchronisation....
>> Only the XFS code can cause it to be woken....
> Do you know this for sure?
Yes! wake_up() can only wake tasks on that wait queue's task list.
Each different wait queue has its own task list....
> I added some tracing to sv_broadcast to trace anyone waking up the
> thread, however the only time it was called was from the unpinning
> code, well after the thread had already been woken up.
So did the unpin wait even sleep? i.e. did the unpin waiter
receive a wakeup to get into the state it was in or did it just pass
straight through the wait code?
>> Assert fail the machine when a dquot with a non-zero pincount is
>> woken in xfs_qm_dqunpin_wait(). If the assert trips, then we need
>> to work out how the pin count is getting elevated while we have
>> a waiter....
> Done that, and it trips; however it does not help us as there is
> no insight into who woke us.
Ok - so what was the pin count value before it went to sleep?
i.e. Did it change at all across the sleep?
> Either way this patch resolves this issue and is a nice code cleanup.
Still, don't check it in until we understand whether sv_t's are
completely broken or not...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-24 14:41 ` Christoph Hellwig
@ 2008-09-25 1:08 ` Dave Chinner
0 siblings, 0 replies; 34+ messages in thread
From: Dave Chinner @ 2008-09-25 1:08 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Lachlan McIlroy, Peter Leckie, xfs, xfs-dev
On Wed, Sep 24, 2008 at 10:41:21AM -0400, Christoph Hellwig wrote:
> On Wed, Sep 24, 2008 at 05:46:04PM +1000, Dave Chinner wrote:
> > On Wed, Sep 24, 2008 at 05:42:38PM +1000, Lachlan McIlroy wrote:
> > > Looks good Pete.
> >
> > No, it is not yet good. Pete cannot explain the underlying problem
> > and we need to understand if this is fixing the problem or just
> > changing the timing so it doesn't show up....
>
> The patch does not only cause timing but also makes sure
> xfs_qm_dqunpin_wait sleeps again when woken up but the condition it was
> waiting on is not met. That's the reason why we have wait_event in
> Linux instead of the more traditional sv-style conditional variables.
>
> Now the spurious wakeup from scheduler argument doesn't make any sense,
> so this spurious wakeup we're protecting from must come from XFS itself.
> The way this could happen is when a task trying to pin the dquot gets
> qi_pinlock before the one waiting for q_pincount to reach zero.
Can't happen. To pin the dquot you have to hold the dquot lock. That
dquot lock is held by the one waiting for the q_pincount to reach
zero. i.e. pin and unpin_wait are mutually exclusive.
Also, qi_pinlock is a spinlock, so it should not be triggering
any spurious scheduler events that wake up threads sleeping on some
unrelated wait queue....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-25 1:03 ` Dave Chinner
@ 2008-09-25 8:43 ` Peter Leckie
2008-09-25 9:12 ` Christoph Hellwig
` (2 more replies)
2008-09-29 21:45 ` Christoph Hellwig
1 sibling, 3 replies; 34+ messages in thread
From: Peter Leckie @ 2008-09-25 8:43 UTC (permalink / raw)
To: xfs, xfs-dev
> Still, don't check it in until we understand whether sv_t's are
> completely broken or not...
Well, I added some tracing code to __wake_up_common, however it never
tripped, which made me think "are we even being woken up from the
wait queue", or is someone directly waking us up from the task
struct? So I had a look and found the following.
xfsaild_wakeup(
	xfs_mount_t	*mp,
	xfs_lsn_t	threshold_lsn)
{
	mp->m_ail.xa_target = threshold_lsn;
	wake_up_process(mp->m_ail.xa_task);
}
Which is indirectly called from xlog_grant_push_ail, which is called
from various other places.
In fact this bug is not restricted to the aild; the xfssyncd also hit
this issue a number of times during today's testing, where it was
woken while waiting on sv_wait for the pincount to drop to zero.
It is also woken up from a number of functions in xfs_super.c,
including xfs_syncd_queue_work(), xfs_sync_worker() and
xfs_fs_sync_super().
The change that introduced the wake_up on the aild was introduced in

modid: xfs-linux-melb:xfs-kern:30371a
Move AIL pushing into it's own thread

However, xfssyncd has had a long history of the task being woken up
from other code, so it looks like it's simply not safe for either the
aild or xfssyncd to sleep on a queue assuming that no one else will
wake the processes up.
So I would say the fix I proposed is a good solution for this issue.
However, there are other functions that use sv_wait and should also
be fixed in a similar way, so I'll look into the other callers and
prepare a patch tomorrow.
Thanks,
Pete
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-25 8:43 ` Peter Leckie
@ 2008-09-25 9:12 ` Christoph Hellwig
2008-09-26 0:34 ` Dave Chinner
2008-09-26 1:10 ` Lachlan McIlroy
2 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2008-09-25 9:12 UTC (permalink / raw)
To: Peter Leckie; +Cc: xfs, xfs-dev
On Thu, Sep 25, 2008 at 06:43:43PM +1000, Peter Leckie wrote:
> So I would say the fix I proposed is a good solution for this issue.
>
> However there are other functions that use sv_wait and should also be
> fixed in a similar way so I'll
> look into the other callers and prepare a patch tomorrow.
Note that most users of sv_wait do actually re-check the condition.
It's just that wait_event enforces it in the API while sv_wait doesn't
make it as explicit. But another audit of this would be a good thing.
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-25 8:43 ` Peter Leckie
2008-09-25 9:12 ` Christoph Hellwig
@ 2008-09-26 0:34 ` Dave Chinner
2008-09-26 1:09 ` Peter Leckie
` (2 more replies)
2008-09-26 1:10 ` Lachlan McIlroy
2 siblings, 3 replies; 34+ messages in thread
From: Dave Chinner @ 2008-09-26 0:34 UTC (permalink / raw)
To: Peter Leckie; +Cc: xfs, xfs-dev
On Thu, Sep 25, 2008 at 06:43:43PM +1000, Peter Leckie wrote:
>
>> Still, don't check it in until we understand whether sv_t's are
>> completely broken or not...
> Well I added some tracing code to the __wake_up_common, however it never
> tripped
> which made me think "are we even being woken up from the wait queue", or
> is someone
> directly waking us up from the task struct. So I had a look and found
> the following.
>
> xfsaild_wakeup(
> xfs_mount_t *mp,
> xfs_lsn_t threshold_lsn)
> {
> mp->m_ail.xa_target = threshold_lsn;
> wake_up_process(mp->m_ail.xa_task);
> }
>
> Which is indirectly called from xlog_grant_push_ail, which is called
> from various other
> places.
Ok, so that one will only wake up the xfsaild, which does not flush
pinned items - it will never end up in an unpin wait for any type
of item, so we can rule that one out.
> In fact this bug is not restricted to the aild the xfssyncd also hit
> this issue a number of times
> during todays testing where it was woken while waiting on sv_wait for
> the pincount to drop
> to zero.
Ok, so there is the fundamental issue. This one is problematic
because xfssyncd calls into xfs_sync() -> xfs_qm_sync(). It does
so with the flag SYNC_BDFLUSH set, which means:
1013 /*
1014 * We won't block unless we are asked to.
1015 */
1016 nowait = (boolean_t)(flags & SYNC_BDFLUSH || (flags & SYNC_WAIT) == 0);
1017
We should not be blocking when flushing dquots. IOWs, we should not
be waiting on pinned dquots in xfs_qm_sync() when it calls
xfs_dqflush(). i.e. it should behave exactly like the inode flush
code.
i.e. the reason why we are seeing this is that xfs_dqflush is not
obeying the non-blocking semantics of the sync that it is being
asked to run. If we enter xfs_sync() from anywhere else, then we
won't have task wakeups occurring to interrupt a pin wait on a
synchronous sync....
> It also is woken up from a number of functions in xfs_super.c including
> xfs_syncd_queue_work(), xfs_sync_worker(), xfs_fs_sync_super()
Yeah, when different work needs doing.
> The change that introduced the wake_up on the aild was introduced from
>
> modid: xfs-linux-melb:xfs-kern:30371a
> Move AIL pushing into it's own thread
>
> However xfssyncd has had a long history of the task being woken up from
> other code,
> so it looks like it's simply not safe for either the aild or xfssyncd to
> sleep on a queue assuming that
> no one else will wake the processes up.
Given that both xfsaild and xfssyncd are supposed to be doing
non-blocking flushes, neither of them should ever be waiting on a
pinned item, therefore fixing that problem in xfs_qm_dqflush()
should make this problem go away. It will also substantially
reduce the number of log forces being triggered by dquot writeback,
which will have a positive impact on performance, too.
> So I would say the fix I proposed is a good solution for this issue.
but it doesn't fix the underlying problem that was causing the
spurious wakeups, which is the fact that xfs_qm_dqflush() is not
obeying non-blocking flush directions. The patch below should fix
that. Can you please test it before you add your patch?
> However there are other functions that use sv_wait and should also be
> fixed in a similar way so I'll
> look into the other callers and prepare a patch tomorrow.
The log force and write sv_t's are already in loops that would catch
spurious wakeups, so I don't think there's a problem there....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
XFS: don't block in xfs_qm_dqflush() during async writeback
Normally dquots are written back via delayed write mechanisms. They
are flushed to their backing buffer by xfssyncd, which is then
pushed out by either AIL or xfsbufd flushing. The flush from the
xfssyncd is supposed to be non-blocking, but xfs_qm_dqflush() always
waits for pinned dquots, which means that it will block for the
length of time it takes to do a synchronous log force. This causes
unnecessary extra log I/O to be issued whenever we try to flush a
busy dquot.
Avoid the log forces and blocking xfssyncd by making xfs_qm_dqflush()
pay attention to what type of sync it is doing when it sees a pinned
dquot and not waiting when doing non-blocking flushes.
---
fs/xfs/quota/xfs_dquot.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/fs/xfs/quota/xfs_dquot.c b/fs/xfs/quota/xfs_dquot.c
index d738d37..52c8902 100644
--- a/fs/xfs/quota/xfs_dquot.c
+++ b/fs/xfs/quota/xfs_dquot.c
@@ -1229,8 +1229,13 @@ xfs_qm_dqflush(
}
/*
- * Cant flush a pinned dquot. Wait for it.
+ * Cant flush a pinned dquot. If we are not supposed to block,
+ * don't wait for it.
*/
+ if (!(flags & XFS_QMOPT_SYNC) && dqp->q_pincount > 0) {
+ xfs_dqfunlock(dqp);
+ return (0);
+ }
xfs_qm_dqunpin_wait(dqp);
/*
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-26 0:34 ` Dave Chinner
@ 2008-09-26 1:09 ` Peter Leckie
2008-09-26 1:26 ` Lachlan McIlroy
2008-09-26 1:32 ` Lachlan McIlroy
2008-09-26 11:27 ` Christoph Hellwig
2 siblings, 1 reply; 34+ messages in thread
From: Peter Leckie @ 2008-09-26 1:09 UTC (permalink / raw)
To: xfs, xfs-dev
Dave Chinner wrote:
> but it doesn't fix the underlying problem that was causing the
> spurious wakeups, which is the fact that xfs_qm_dqflush() is not
> obeying non-blocking flush directions. The patch below should fix
> that. Can you please test it before you add your patch?
>
Yeah, I already had this idea; I just have not posted a patch because
Lachlan thought it might introduce a deadlock. If you think this is a
good idea I will update my patch to be non-blocking.
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-25 8:43 ` Peter Leckie
2008-09-25 9:12 ` Christoph Hellwig
2008-09-26 0:34 ` Dave Chinner
@ 2008-09-26 1:10 ` Lachlan McIlroy
2008-09-26 11:28 ` Christoph Hellwig
2 siblings, 1 reply; 34+ messages in thread
From: Lachlan McIlroy @ 2008-09-26 1:10 UTC (permalink / raw)
To: Peter Leckie; +Cc: xfs, xfs-dev
Peter Leckie wrote:
>
>> Still, don't check it in until we understand whether sv_t's are
>> completely broken or not...
> Well, I added some tracing code to __wake_up_common, however it
> never tripped, which made me think "are we even being woken up from
> the wait queue", or is someone directly waking us up from the task
> struct? So I had a look and found the following.
>
> xfsaild_wakeup(
> xfs_mount_t *mp,
> xfs_lsn_t threshold_lsn)
> {
> mp->m_ail.xa_target = threshold_lsn;
> wake_up_process(mp->m_ail.xa_task);
> }
>
> Which is indirectly called from xlog_grant_push_ail, which is
> called from various other places.
>
> In fact this bug is not restricted to the aild; the xfssyncd also
> hit this issue a number of times during today's testing, where it
> was woken while waiting on sv_wait for the pincount to drop to
> zero.
>
> It also is woken up from a number of functions in xfs_super.c including
> xfs_syncd_queue_work(), xfs_sync_worker(), xfs_fs_sync_super()
>
>
>
> The change that introduced the wake_up on the aild was introduced from
>
> modid: xfs-linux-melb:xfs-kern:30371a
> Move AIL pushing into it's own thread
>
>
> However, xfssyncd has had a long history of the task being woken up
> from other code, so it looks like it's simply not safe for either
> the aild or xfssyncd to sleep on a queue assuming that no one else
> will wake the processes up.
>
> So I would say the fix I proposed is a good solution for this issue.
>
> However, there are other functions that use sv_wait and should also
> be fixed in a similar way, so I'll look into the other callers and
> prepare a patch tomorrow.
Good work Pete. We should also consider replacing all calls to
wake_up_process() with wake_up() and a wait queue so we don't go
waking up threads when we shouldn't be.
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-26 1:09 ` Peter Leckie
@ 2008-09-26 1:26 ` Lachlan McIlroy
2008-09-27 1:08 ` Dave Chinner
0 siblings, 1 reply; 34+ messages in thread
From: Lachlan McIlroy @ 2008-09-26 1:26 UTC (permalink / raw)
To: Peter Leckie; +Cc: xfs, xfs-dev
Peter Leckie wrote:
> Dave Chinner wrote:
>> but it doesn't fix the underlying problem that was causing the
>> spurious wakeups, which is the fact that xfs_qm_dqflush() is not
>> obeying non-blocking flush directions. The patch below should fix
>> that. Can you please test it before you add your patch?
>>
> Yeah I already had this idea I just have not posted a patch because
> Lachlan though
> it might introduce a deadlock.
I suggested some changes a while back to make tail pushing non-blocking
and Dave thought it might cause a deadlock.
http://oss.sgi.com/archives/xfs/2008-07/msg00472.html
I actually did hit a deadlock with this change but could not figure out
why. It may have been the same issue Pete is trying to fix here.
> If you think this is a good Idea I will
> update my patch
> to be non blocking.
>
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-26 0:34 ` Dave Chinner
2008-09-26 1:09 ` Peter Leckie
@ 2008-09-26 1:32 ` Lachlan McIlroy
2008-09-26 1:38 ` Peter Leckie
` (2 more replies)
2008-09-26 11:27 ` Christoph Hellwig
2 siblings, 3 replies; 34+ messages in thread
From: Lachlan McIlroy @ 2008-09-26 1:32 UTC (permalink / raw)
To: Peter Leckie, xfs, xfs-dev
Dave Chinner wrote:
> On Thu, Sep 25, 2008 at 06:43:43PM +1000, Peter Leckie wrote:
>>> Still, don't check it in until we understand whether sv_t's are
>>> completely broken or not...
>> Well, I added some tracing code to __wake_up_common, however it
>> never tripped, which made me think "are we even being woken up
>> from the wait queue", or is someone directly waking us up from the
>> task struct? So I had a look and found the following.
>>
>> xfsaild_wakeup(
>> xfs_mount_t *mp,
>> xfs_lsn_t threshold_lsn)
>> {
>> mp->m_ail.xa_target = threshold_lsn;
>> wake_up_process(mp->m_ail.xa_task);
>> }
>>
>> Which is indirectly called from xlog_grant_push_ail, which is
>> called from various other places.
>
> Ok, so that one will only wake up the xfsaild, which does not flush
> pinned items - it will never end up in an unpin wait for any type
> of item, so we can rule that one out.
>
>> In fact this bug is not restricted to the aild; the xfssyncd also
>> hit this issue a number of times during today's testing, where it
>> was woken while waiting on sv_wait for the pincount to drop to
>> zero.
>
> Ok, so there is the fundamental issue. This one is problematic
> because xfssyncd calls into xfs_sync() -> xfs_qm_sync(). It does
> so with the flag SYNC_BDFLUSH set, which means:
>
> 1013 /*
> 1014 * We won't block unless we are asked to.
> 1015 */
> 1016 nowait = (boolean_t)(flags & SYNC_BDFLUSH || (flags & SYNC_WAIT) == 0);
> 1017
>
>
> We should not be blocking when flushing dquots. IOWs, we should not
> be waiting on pinned dquots in xfs_qm_sync() when it calls
> xfs_dqflush(). i.e. it should behave exactly like the inode flush
> code.
>
> i.e. the reason why we are seeing this is that xfs_dqflush is not
> obeying the non-blocking semantics of the sync that it is being
> asked to run. If we enter xfs_sync() from anywhere else, then we
> won't have task wakeups occurring to interrupt a pin wait on a
> synchronous sync....
>
>> It also is woken up from a number of functions in xfs_super.c including
>> xfs_syncd_queue_work(), xfs_sync_worker(), xfs_fs_sync_super()
>
> Yeah, when different work needs doing.
>
>> The change that introduced the wake_up on the aild was introduced from
>>
>> modid: xfs-linux-melb:xfs-kern:30371a
>> Move AIL pushing into it's own thread
>>
>> However, xfssyncd has had a long history of the task being woken
>> up from other code, so it looks like it's simply not safe for
>> either the aild or xfssyncd to sleep on a queue assuming that no
>> one else will wake the processes up.
>
> Given that both xfsaild and xfssyncd are supposed to be doing
> non-blocking flushes, neither of them should ever be waiting on a
> pinned item, therefore fixing that problem in xfs_qm_dqflush()
> should make this problem go away. It will also substantially
> reduce the number of log forces being triggered by dquot writeback,
> which will have a positive impact on performance, too.
>
>> So I would say the fix I proposed is a good solution for this issue.
>
> but it doesn't fix the underlying problem that was causing the
> spurious wakeups, which is the fact that xfs_qm_dqflush() is not
> obeying non-blocking flush directions.
The underlying problem has nothing to do with xfs_qm_dqflush() - the
spurious wakeups are caused by calls to wake_up_process() that arbitrarily
wake up a process that is in a state where it shouldn't be woken up. If
we don't fix the spurious wakeups then we could easily re-introduce this
problem again. If xfs_qm_dqflush() should be non-blocking then that's a
separate change and it sounds like a good change too.
> The patch below should fix
> that. Can you please test it before you add your patch?
>
>> However, there are other functions that use sv_wait and should
>> also be fixed in a similar way, so I'll look into the other
>> callers and prepare a patch tomorrow.
>
> The log force and write sv_t's are already in loops that would catch
> spurious wakeups, so I don't think there's a problem there....
>
> Cheers,
>
> Dave.
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-26 1:32 ` Lachlan McIlroy
@ 2008-09-26 1:38 ` Peter Leckie
2008-09-26 1:44 ` Mark Goodwin
2008-09-26 11:31 ` Christoph Hellwig
2008-09-26 2:57 ` Dave Chinner
2008-09-26 11:30 ` Christoph Hellwig
2 siblings, 2 replies; 34+ messages in thread
From: Peter Leckie @ 2008-09-26 1:38 UTC (permalink / raw)
Cc: xfs, xfs-dev
Lachlan McIlroy wrote:
> The underlying problem has nothing to do with xfs_qm_dqflush() - the
> spurious wakeups are caused by calls to wake_up_process() that
> arbitrarily
> wake up a process that is in a state where it shouldn't be woken up. If
> we don't fix the spurious wakeups then we could easily re-introduce this
> problem again. If xfs_qm_dqflush() should be non-blocking then that's a
> separate change and it sounds like a good change too.
OK, so what do we want to do? It sounds like there are 3 issues I
need to solve: first, clean up the code; second, make xfs_qm_dqflush()
non-blocking; and third, fix up the spurious wakeups.
Should I propose 3 patches to fix each of these issues?
Pete
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-26 1:38 ` Peter Leckie
@ 2008-09-26 1:44 ` Mark Goodwin
2008-09-26 1:54 ` Peter Leckie
2008-09-26 11:31 ` Christoph Hellwig
1 sibling, 1 reply; 34+ messages in thread
From: Mark Goodwin @ 2008-09-26 1:44 UTC (permalink / raw)
To: Peter Leckie; +Cc: xfs, xfs-dev
Peter Leckie wrote:
> Lachlan McIlroy wrote:
>> The underlying problem has nothing to do with xfs_qm_dqflush() - the
>> spurious wakeups are caused by calls to wake_up_process() that
>> arbitrarily
>> wake up a process that is in a state where it shouldn't be woken up. If
>> we don't fix the spurious wakeups then we could easily re-introduce this
>> problem again. If xfs_qm_dqflush() should be non-blocking then that's a
>> separate change and it sounds like a good change too.
> OK, so what do we want to do? It sounds like there are 3 issues I
> need to solve: first, clean up the code; second, make xfs_qm_dqflush()
> non-blocking; and third, fix up the spurious wakeups.
>
> Should I propose 3 patches to fix each of these issues?
Yes, and can we take the quota fix "as-is" for now? Those of us shipping
NAS servers with quotas enabled need this fix.
The other two issues need more investigation and look like they may have
more fundamental implications - they should be separate bugs.
Thanks
--
Mark Goodwin markgw@sgi.com
Engineering Manager for XFS and PCP Phone: +61-3-99631937
SGI Australian Software Group Cell: +61-4-18969583
-------------------------------------------------------------
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-26 1:44 ` Mark Goodwin
@ 2008-09-26 1:54 ` Peter Leckie
0 siblings, 0 replies; 34+ messages in thread
From: Peter Leckie @ 2008-09-26 1:54 UTC (permalink / raw)
To: markgw; +Cc: xfs, xfs-dev
> Yes, and can we take the quota fix "as-is" for now? Those of us shipping
> NAS servers with quotas enabled need this fix.
OK, is everyone happy for me to push the original patch now? I'll
propose 2 more patches later; the other 2 patches should not touch code
involved with this patch.
Thanks,
Pete
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-26 1:32 ` Lachlan McIlroy
2008-09-26 1:38 ` Peter Leckie
@ 2008-09-26 2:57 ` Dave Chinner
2008-09-26 3:38 ` Lachlan McIlroy
2008-09-26 11:30 ` Christoph Hellwig
2 siblings, 1 reply; 34+ messages in thread
From: Dave Chinner @ 2008-09-26 2:57 UTC (permalink / raw)
To: Lachlan McIlroy; +Cc: Peter Leckie, xfs, xfs-dev
On Fri, Sep 26, 2008 at 11:32:43AM +1000, Lachlan McIlroy wrote:
> Dave Chinner wrote:
>> On Thu, Sep 25, 2008 at 06:43:43PM +1000, Peter Leckie wrote:
>>> However xfssyncd has had a long history of the task being woken up
>>> from other code,
>>> so it looks like it's simply not safe for either the aild or xfssyncd
>>> to sleep on a queue assuming that
>>> no one else will wake the processes up.
>>
>> Given that both xfsaild and xfssyncd are supposed to be doing
>> non-blocking flushes, neither of them should ever be waiting on a
>> pinned item, therefore fixing that problem in xfs_qm_dqflush()
>> should make this problem go away. It will also substantially
>> reduce the number of log forces being triggered by dquot writeback
>> which will have a positive impact on performance, too.
>>
>>> So I would say the fix I proposed is a good solution for this issue.
>>
>> but it doesn't fix the underlying problem that was causing the
>> spurious wakeups, which is the fact that xfs_qm_dqflush() is not
>> obeying non-blocking flush directions.
>
> The underlying problem has nothing to do with xfs_qm_dqflush() - the
> spurious wakeups are caused by calls to wake_up_process() that arbitrarily
> wake up a process that is in a state where it shouldn't be woken up.
Spurious wakeups are causing problems in a place where we should not
even be sleeping. If you don't sleep there, you can't get spurious
wakeups....
> If we don't fix the spurious wakeups then we could easily re-introduce this
> problem again.
Right, but keep in mind that the patch doesn't prevent spurious
wakeups - it merely causes the thread to wakeup and go back to sleep
when a spurious wakeup occurs. The patch I posted avoids the
spurious wakeup problem completely, which is what we should be
aiming to do given it avoids the overhead of 2 context switches
and speeds up the rate at which we can flush unpinned dquots.
That being said, I agree that the original patch is still desirable,
though not from a bug-fix perspective. It's a cleanup and
optimisation patch, with the nice side effect of preventing future
occurrences of the spurious wakeup problem....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-26 2:57 ` Dave Chinner
@ 2008-09-26 3:38 ` Lachlan McIlroy
2008-09-27 1:11 ` Dave Chinner
0 siblings, 1 reply; 34+ messages in thread
From: Lachlan McIlroy @ 2008-09-26 3:38 UTC (permalink / raw)
To: Lachlan McIlroy, Peter Leckie, xfs, xfs-dev
Dave Chinner wrote:
> On Fri, Sep 26, 2008 at 11:32:43AM +1000, Lachlan McIlroy wrote:
>> Dave Chinner wrote:
>>> On Thu, Sep 25, 2008 at 06:43:43PM +1000, Peter Leckie wrote:
>>>> However xfssyncd has had a long history of the task being woken up
>>>> from other code,
>>>> so it looks like it's simply not safe for either the aild or xfssyncd
>>>> to sleep on a queue assuming that
>>>> no one else will wake the processes up.
>>> Given that both xfsaild and xfssyncd are supposed to be doing
>>> non-blocking flushes, neither of them should ever be waiting on a
>>> pinned item, therefore fixing that problem in xfs_qm_dqflush()
>>> should make this problem go away. It will also substantially
>>> reduce the number of log forces being triggered by dquot writeback
>>> which will have a positive impact on performance, too.
>>>
>>>> So I would say the fix I proposed is a good solution for this issue.
>>> but it doesn't fix the underlying problem that was causing the
>>> spurious wakeups, which is the fact that xfs_qm_dqflush() is not
>>> obeying non-blocking flush directions.
>> The underlying problem has nothing to do with xfs_qm_dqflush() - the
>> spurious wakeups are caused by calls to wake_up_process() that arbitrarily
>> wake up a process that is in a state where it shouldn't be woken up.
>
> Spurious wakeups are causing problems in a place where we should not
> even be sleeping. If you don't sleep there, you can't get spurious
> wakeups....
>
>> If we don't fix the spurious wakeups then we could easily re-introduce this
>> problem again.
>
> Right, but keep in mind that the patch doesn't prevent spurious
> wakeups - it merely causes the thread to wakeup and go back to sleep
Yes, that's right, and it's why I suggested replacing the uses of wake_up_process
with wake_up and a wait queue where both the xfsaild and xfssyncd threads can
have a wait queue specific to them. This way we only wake them up if they are
sleeping on that wait queue and not somewhere else waiting for a different event.
I'm pretty sure that will be a safe change to make.
> when a spurious wakeup occurs. The patch I posted avoids the
> spurious wakeup problem completely, which is what we should be
> aiming to do given it avoids the overhead of 2 context switches
> and speeds up the rate at which we can flush unpinned dquots.
>
> That being said, I agree that the original patch is still desirable,
> though not from a bug-fix perspective. It's a cleanup and
> optimisation patch, with the nice side effect of preventing future
> occurrences of the spurious wakeup problem....
>
> Cheers,
>
> Dave.
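The distinction Lachlan is drawing, that wake_up_process() pokes a specific task regardless of what it is waiting for while wake_up() only wakes tasks actually parked on a given wait queue, can be sketched as a toy single-threaded userspace model (all names and structures here are invented for illustration; this is not the kernel scheduler):

```c
#include <stddef.h>

/* Toy model of the wake_up_process()/wake_up() distinction.
 * Invented names; not the kernel implementation. */
struct task { int sleeping; };
struct waitq { struct task *waiter; };

static void waitq_sleep(struct waitq *q, struct task *t)
{
	t->sleeping = 1;
	q->waiter = t;
}

/* models wake_up_process(): wakes the task no matter what it sleeps on */
static void wake_process(struct task *t)
{
	t->sleeping = 0;
}

/* models wake_up(): only wakes a task actually parked on this queue */
static int wake_queue(struct waitq *q)
{
	if (!q->waiter)
		return 0;	/* nobody here: cannot wake the wrong sleeper */
	q->waiter->sleeping = 0;
	q->waiter = NULL;
	return 1;
}
```

In this model a wake_queue() on the wrong queue is a harmless no-op, which is exactly the property Lachlan wants for xfsaild and xfssyncd.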
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-26 0:34 ` Dave Chinner
2008-09-26 1:09 ` Peter Leckie
2008-09-26 1:32 ` Lachlan McIlroy
@ 2008-09-26 11:27 ` Christoph Hellwig
2008-09-27 1:18 ` Dave Chinner
2 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2008-09-26 11:27 UTC (permalink / raw)
To: Peter Leckie, xfs, xfs-dev
> /*
> - * Cant flush a pinned dquot. Wait for it.
> + * Cant flush a pinned dquot. If we are not supposed to block,
> + * don't wait for it.
> */
> + if (!(flags & XFS_QMOPT_SYNC) && dqp->q_pincount > 0) {
> + xfs_dqfunlock(dqp);
> + return (0);
> + }
> xfs_qm_dqunpin_wait(dqp);
Looks good, but please remove the braces around the 0. (And yes, I know
that the statement just above it does it too..)
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-26 1:10 ` Lachlan McIlroy
@ 2008-09-26 11:28 ` Christoph Hellwig
2008-09-29 3:08 ` Lachlan McIlroy
0 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2008-09-26 11:28 UTC (permalink / raw)
To: Lachlan McIlroy; +Cc: Peter Leckie, xfs, xfs-dev
On Fri, Sep 26, 2008 at 11:10:26AM +1000, Lachlan McIlroy wrote:
> Good work Pete. We should also consider replacing all calls to
> wake_up_process() with wake_up() and a wait queue so we don't go
> waking up threads when we shouldn't be.
No. The daemons should not block anyway in these places, and using
a waitqueue just causes additional locking overhead.
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-26 1:32 ` Lachlan McIlroy
2008-09-26 1:38 ` Peter Leckie
2008-09-26 2:57 ` Dave Chinner
@ 2008-09-26 11:30 ` Christoph Hellwig
2 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2008-09-26 11:30 UTC (permalink / raw)
To: Lachlan McIlroy; +Cc: Peter Leckie, xfs, xfs-dev
On Fri, Sep 26, 2008 at 11:32:43AM +1000, Lachlan McIlroy wrote:
>> but it doesn't fix the underlying problem that was causing the
>> spurious wakeups, which is the fact that xfs_qm_dqflush() is not
>> obeying non-blocking flush directions.
> The underlying problem has nothing to do with xfs_qm_dqflush() - the
> spurious wakeups are caused by calls to wake_up_process() that arbitrarily
> wake up a process that is in a state where it shouldn't be woken up. If
> we don't fix the spurious wakeups then we could easily re-introduce this
> problem again. If xfs_qm_dqflush() should be non-blocking then that's a
> separate change and it sounds like a good change too.
That sounds like there is one single underlying problem, but there
isn't. The first problem was that we didn't re-check the condition
after sv_wait. We must always do this, if only for robustness reasons.
The second problem we have here is that xfssyncd blocks when flushing
quotas.
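The "always re-check the condition after sv_wait" rule Christoph describes has the same shape as the standard condition-variable idiom. A minimal userspace sketch using pthreads, with pthread_cond_wait standing in for sv_wait and all names invented for illustration:

```c
#include <pthread.h>

/* Userspace analogue of "re-check the condition after sv_wait".
 * pthread_cond_wait stands in for sv_wait; names are invented. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  pin_cv = PTHREAD_COND_INITIALIZER;
static int pincount = 1;

static void *unpin_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	pincount = 0;
	pthread_cond_broadcast(&pin_cv);
	pthread_mutex_unlock(&lock);
	return NULL;
}

static void unpin_wait(void)
{
	pthread_mutex_lock(&lock);
	/* The loop is the point: a wakeup alone proves nothing, so the
	 * predicate must be tested again before proceeding. */
	while (pincount > 0)
		pthread_cond_wait(&pin_cv, &lock);
	pthread_mutex_unlock(&lock);
}
```

A spurious (or misdirected) wakeup simply re-evaluates `pincount > 0` and goes back to sleep, which is what the original bug report said was missing.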
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-26 1:38 ` Peter Leckie
2008-09-26 1:44 ` Mark Goodwin
@ 2008-09-26 11:31 ` Christoph Hellwig
1 sibling, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2008-09-26 11:31 UTC (permalink / raw)
To: Peter Leckie; +Cc: xfs, xfs-dev
On Fri, Sep 26, 2008 at 11:38:27AM +1000, Peter Leckie wrote:
> Lachlan McIlroy wrote:
>> The underlying problem has nothing to do with xfs_qm_dqflush() - the
>> spurious wakeups are caused by calls to wake_up_process() that
>> arbitrarily
>> wake up a process that is in a state where it shouldn't be woken up. If
>> we don't fix the spurious wakeups then we could easily re-introduce this
>> problem again. If xfs_qm_dqflush() should be non-blocking then that's a
>> separate change and it sounds like a good change too.
> OK, so what do we want to do? It sounds like there are 3 issues I
> need to solve: first, clean up the code; second, make xfs_qm_dqflush()
> non-blocking; and third, fix up the spurious wakeups.
>
> Should I propose 3 patches to fix each of these issues?
Well, your patch for 1 is in, Dave has one for 2, and I don't think
three is an issue - at least for xfssyncd.
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-26 1:26 ` Lachlan McIlroy
@ 2008-09-27 1:08 ` Dave Chinner
0 siblings, 0 replies; 34+ messages in thread
From: Dave Chinner @ 2008-09-27 1:08 UTC (permalink / raw)
To: Lachlan McIlroy; +Cc: Peter Leckie, xfs, xfs-dev
On Fri, Sep 26, 2008 at 11:26:38AM +1000, Lachlan McIlroy wrote:
> Peter Leckie wrote:
>> Dave Chinner wrote:
>>> but it doesn't fix the underlying problem that was causing the
>>> spurious wakeups, which is the fact that xfs_qm_dqflush() is not
>>> obeying non-blocking flush directions. The patch below should fix
>>> that. Can you please test it before you add your patch?
>>>
Yeah, I already had this idea; I just have not posted a patch because
Lachlan thought it might introduce a deadlock.
> I suggested some changes a while back to make tail pushing non-blocking
> and Dave thought it might cause a deadlock.
>
> http://oss.sgi.com/archives/xfs/2008-07/msg00472.html
That is a different case - the aborting of writeback due to a locked
inode cluster buffer could be problematic for the AIL code
because it already has special code to handle cluster buffer pushing
in the case of DELWRI flushed inodes.
The case I described is what the xfsaild "watchdog timeout" really
catches - before the aild the filesystem would simply lock up, and
one way to trigger that was to have the AIL traversal restart too
many times without making progress. The AIL cursor patch series I
posted fixes the excessive restart problem, but doesn't prevent the
problem from occurring if the async push doesn't actually write the
item back in IOP_PUSH(). Effectively we need to tweak the ail
push-wait-push loops in the log grant code to avoid this problem.
FWIW, IOP_TRYLOCK() will return ITEM_PINNED for any object
that is still pinned, and hence the AIL does not call IOP_PUSH()
for such items. Instead it schedules a non-blocking log force
for the end of the traversal to get things moving for the next
push to flush it out. Hence the AIL pushing should never, ever be
trying to push a pinned inode or dquot to disk, and hence the
proposed change will not affect AIL pushing at all....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
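Dave's description of the AIL traversal, where IOP_TRYLOCK() reports pinned items so the AIL never calls IOP_PUSH() on them and instead schedules one deferred log force for the end of the pass, can be modelled in a few lines of illustrative C (a toy sketch, not the real xfsaild; all names are invented):

```c
/* Toy model of the AIL push pass. Invented names; not real XFS code. */
enum { TRY_SUCCESS, TRY_PINNED, TRY_LOCKED };

struct log_item { int pinned; int locked; int pushed; };

static int iop_trylock(struct log_item *it)
{
	if (it->pinned)
		return TRY_PINNED;
	if (it->locked)
		return TRY_LOCKED;
	return TRY_SUCCESS;
}

/* Returns 1 if a deferred (non-blocking) log force was scheduled. */
static int ail_push(struct log_item *items, int n)
{
	int need_log_force = 0;

	for (int i = 0; i < n; i++) {
		switch (iop_trylock(&items[i])) {
		case TRY_PINNED:
			need_log_force = 1;	/* never push; force the log later */
			break;
		case TRY_LOCKED:
			break;			/* skip, retry on the next pass */
		default:
			items[i].pushed = 1;	/* models IOP_PUSH() */
			break;
		}
	}
	return need_log_force;
}
```

The pinned item is never handed to the push path at all, which is why Dave argues the proposed xfs_qm_dqflush() change cannot affect AIL pushing.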
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-26 3:38 ` Lachlan McIlroy
@ 2008-09-27 1:11 ` Dave Chinner
0 siblings, 0 replies; 34+ messages in thread
From: Dave Chinner @ 2008-09-27 1:11 UTC (permalink / raw)
To: Lachlan McIlroy; +Cc: Peter Leckie, xfs, xfs-dev
On Fri, Sep 26, 2008 at 01:38:45PM +1000, Lachlan McIlroy wrote:
> Dave Chinner wrote:
>> Right, but keep in mind that the patch doesn't prevent spurious
>> wakeups - it merely causes the thread to wakeup and go back to sleep
> Yes that's right and it's why I suggested replacing the uses of wake_up_process
> with wake_up and a wait queue where both the xfsaild and xfssyncd threads can
> have a wait queue specific to them. This way we only wake them up if they are
> sleeping on that wait queue and not somewhere else waiting for a different event.
> I'm pretty sure that will be a safe change to make.
Yes, that sounds like a good idea.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-26 11:27 ` Christoph Hellwig
@ 2008-09-27 1:18 ` Dave Chinner
0 siblings, 0 replies; 34+ messages in thread
From: Dave Chinner @ 2008-09-27 1:18 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Peter Leckie, xfs, xfs-dev
On Fri, Sep 26, 2008 at 07:27:29AM -0400, Christoph Hellwig wrote:
> > /*
> > - * Cant flush a pinned dquot. Wait for it.
> > + * Cant flush a pinned dquot. If we are not supposed to block,
> > + * don't wait for it.
> > */
> > + if (!(flags & XFS_QMOPT_SYNC) && dqp->q_pincount > 0) {
> > + xfs_dqfunlock(dqp);
> > + return (0);
> > + }
> > xfs_qm_dqunpin_wait(dqp);
>
> Looks good, but please remove the braces around the 0. (And yes, I know
> that the statement just above it does it too..)
No prize for guessing where I copied the code from, then ;)
New version that combines the dirty check with the pin count
check below....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
XFS: don't block in xfs_qm_dqflush() during async writeback
Normally dquots are written back via delayed write mechanisms. They
are flushed to their backing buffer by xfssyncd, which is then
pushed out by either AIL or xfsbufd flushing. The flush from the
xfssyncd is supposed to be non-blocking, but xfs_qm_dqflush() always
waits for pinned dquots, which means that it will block for the
length of time it takes to do a synchronous log force. This causes
unnecessary extra log I/O to be issued whenever we try to flush a
busy dquot.
Avoid the log forces and blocking xfssyncd by making xfs_qm_dqflush()
pay attention to what type of sync it is doing when it sees a pinned
dquot and not waiting when doing non-blocking flushes.
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/quota/xfs_dquot.c | 12 +++++-------
1 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/quota/xfs_dquot.c b/fs/xfs/quota/xfs_dquot.c
index d738d37..aa72162 100644
--- a/fs/xfs/quota/xfs_dquot.c
+++ b/fs/xfs/quota/xfs_dquot.c
@@ -1221,16 +1221,14 @@ xfs_qm_dqflush(
xfs_dqtrace_entry(dqp, "DQFLUSH");
/*
- * If not dirty, nada.
+ * If not dirty, or it's pinned and we are not supposed to
+ * block, nada.
*/
- if (!XFS_DQ_IS_DIRTY(dqp)) {
+ if (!XFS_DQ_IS_DIRTY(dqp) ||
+ (!(flags & XFS_QMOPT_SYNC) && dqp->q_pincount > 0)) {
xfs_dqfunlock(dqp);
- return (0);
+ return 0;
}
-
- /*
- * Cant flush a pinned dquot. Wait for it.
- */
xfs_qm_dqunpin_wait(dqp);
/*
^ permalink raw reply related [flat|nested] 34+ messages in thread
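For readers following along, the combined check in the patch above can be modelled in userspace with C11 atomics standing in for the kernel's atomic_t. This is a hedged sketch only: the flag and the dirty bit below are stand-ins for XFS_QMOPT_SYNC and XFS_DQ_IS_DIRTY, not the real XFS definitions.

```c
#include <stdatomic.h>

/* Userspace sketch of the non-blocking flush check. Invented names;
 * FLUSH_SYNC and `dirty` are stand-ins, not XFS definitions. */
enum { FLUSH_SYNC = 1 };

static atomic_int pincount;
static int dirty = 1;

static void dq_pin(void)   { atomic_fetch_add(&pincount, 1); }
static void dq_unpin(void) { atomic_fetch_sub(&pincount, 1); }

/* Returns 1 if the dquot was flushed, 0 if the flush was skipped. */
static int dqflush(int flags)
{
	if (!dirty ||
	    (!(flags & FLUSH_SYNC) && atomic_load(&pincount) > 0))
		return 0;	/* clean, or pinned during an async flush */
	/* a sync flush would wait here for pincount == 0, then write back */
	dirty = 0;
	return 1;
}
```

An async flush of a pinned dquot returns immediately instead of forcing the log, which is the behaviour change the commit message describes.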
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-26 11:28 ` Christoph Hellwig
@ 2008-09-29 3:08 ` Lachlan McIlroy
0 siblings, 0 replies; 34+ messages in thread
From: Lachlan McIlroy @ 2008-09-29 3:08 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Peter Leckie, xfs, xfs-dev
Christoph Hellwig wrote:
> On Fri, Sep 26, 2008 at 11:10:26AM +1000, Lachlan McIlroy wrote:
>> Good work Pete. We should also consider replacing all calls to
>> wake_up_process() with wake_up() and a wait queue so we don't go
>> waking up threads when we shouldn't be.
>
> No. The daemons should not block anyway in these places, and using
> a waitqueue just causes additional locking overhead.
>
>
The daemons shouldn't block anymore in the code we are going to fix
but what about somewhere else? Maybe in a memory allocation, semaphore,
mutex, etc.? Can you guarantee that there is no other code that does
not correctly handle waking up prematurely?
Just as it is prudent to be defensive and add a loop around the sv_wait()
we should also be prudent by not potentially causing this same problem in
some other buggy code somewhere else. Using wait queues may add additional
locking overhead but if we are waking up threads that shouldn't be woken
up then we're wasting cycles on unnecessary context switches anyway.
Our customers won't notice if they lose a couple of cycles here or there but
they will notice deadlocks, corruption or panics. And I would feel at ease
knowing this problem won't happen again.
* Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
2008-09-25 1:03 ` Dave Chinner
2008-09-25 8:43 ` Peter Leckie
@ 2008-09-29 21:45 ` Christoph Hellwig
1 sibling, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2008-09-29 21:45 UTC (permalink / raw)
To: Peter Leckie, xfs, xfs-dev
On Thu, Sep 25, 2008 at 11:03:18AM +1000, Dave Chinner wrote:
> This is a similar situation - if the sv_t is broken, we need to
> replace all users, not just work around one symptom of the
> brokenness. This is expecially important as the remaining users
> of sv_t's are in the log for iclog synchronisation....
The sv_t is not broken per se but a quite dangerous primitive as
people can easily miss re-checking the condition after a wakeup.
After a quick check 5 out of 10 callers of sv_wait re-check their
condition, and most do in a quite broad loop. It might make sense
to just fix that little bit of code and use prepare_to_wait / finish_wait
explicitly.
end of thread, other threads:[~2008-09-29 21:43 UTC | newest]
Thread overview: 34+ messages
2008-09-24 4:28 [PATCH] Use atomic_t and wait_event to track dquot pincount Peter Leckie
2008-09-24 6:05 ` Dave Chinner
2008-09-24 6:53 ` Peter Leckie
2008-09-24 7:43 ` Dave Chinner
2008-09-24 7:26 ` [PATCH v2] " Peter Leckie
2008-09-24 7:42 ` Lachlan McIlroy
2008-09-24 7:46 ` Dave Chinner
2008-09-24 8:03 ` Lachlan McIlroy
2008-09-24 14:42 ` Christoph Hellwig
2008-09-24 8:15 ` Peter Leckie
2008-09-25 1:03 ` Dave Chinner
2008-09-25 8:43 ` Peter Leckie
2008-09-25 9:12 ` Christoph Hellwig
2008-09-26 0:34 ` Dave Chinner
2008-09-26 1:09 ` Peter Leckie
2008-09-26 1:26 ` Lachlan McIlroy
2008-09-27 1:08 ` Dave Chinner
2008-09-26 1:32 ` Lachlan McIlroy
2008-09-26 1:38 ` Peter Leckie
2008-09-26 1:44 ` Mark Goodwin
2008-09-26 1:54 ` Peter Leckie
2008-09-26 11:31 ` Christoph Hellwig
2008-09-26 2:57 ` Dave Chinner
2008-09-26 3:38 ` Lachlan McIlroy
2008-09-27 1:11 ` Dave Chinner
2008-09-26 11:30 ` Christoph Hellwig
2008-09-26 11:27 ` Christoph Hellwig
2008-09-27 1:18 ` Dave Chinner
2008-09-26 1:10 ` Lachlan McIlroy
2008-09-26 11:28 ` Christoph Hellwig
2008-09-29 3:08 ` Lachlan McIlroy
2008-09-29 21:45 ` Christoph Hellwig
2008-09-24 14:41 ` Christoph Hellwig
2008-09-25 1:08 ` Dave Chinner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox