From: Lachlan McIlroy <lachlan@sgi.com>
To: Peter Leckie <pleckie@sgi.com>
Cc: xfs@oss.sgi.com, xfs-dev@sgi.com
Subject: Re: [PATCH v2] Use atomic_t and wait_event to track dquot pincount
Date: Wed, 24 Sep 2008 17:42:38 +1000 [thread overview]
Message-ID: <48D9EF6E.8010505@sgi.com> (raw)
In-Reply-To: <48D9EB8F.1070104@sgi.com>
Looks good Pete.
Peter Leckie wrote:
> During a heavy workload with quota's enabled the system can stall
>
> with every process requesting log space, however the log is full
> of dquots.The aild is trying to push the tail of the log however
> every item in the log had previously been pushed by the aild and was
> marked as clean so the aild aborts pushing the items in the log.
>
> The reason the log items are still in the log is because the iodone
> routine xfs_qm_dqflush_done detected that the log sequence number(lsn)
> had changed and assumed the log item had been re-dirtied so it should
> be left in the log. However the log item had not been re-dirtied so
> it was still marked as clean preventing it from being pushed again
> causing the log item to be stuck in the log. After a while the log
> eventually filled with these dquot log items.
>
> The reason the lsn had changed was due to it not being initialized by
> the time a copy of the lsn was taken in xfs_qm_dqflush(). The lsn was
> then latter updated causing the test in xfs_qm_dqflush_done() to fail.
>
> Synchronizations between the 2 functions is done by the pincount in
> struct xfs_dquot_t and xfs_qm_dqflush() calls xfs_qm_dqunpin_wait()
> to wait for the pincount == 0. However after been woken up it does not
> validate the pincount is actually 0, allowing a false wake up by the
> scheduler to cause xfs_qm_dqflush() to prematurely start processing
> the dquot.
>
> So this patch uses an atomic_t to track the pincount which allows us
> to easily use the wait_event macro to wait, this will guarantee that
> when we return from xfs_qm_dqunpin_wait() that the pincount == 0.
> We also remove the global qi_pinlock from xfs_quotainfo which may
> also reduce contention when pinning dquot's.
>
> Signed-off-by: Peter Leckie <pleckie@sgi.com>
>
>
> Index: 2.6.x-xfs/fs/xfs/quota/xfs_dquot_item.c
> ===================================================================
> --- 2.6.x-xfs.orig/fs/xfs/quota/xfs_dquot_item.c 2008-09-24
> 14:45:29.270573668 +1000
> +++ 2.6.x-xfs/fs/xfs/quota/xfs_dquot_item.c 2008-09-24
> 16:36:06.748350869 +1000
> @@ -88,25 +88,22 @@ xfs_qm_dquot_logitem_format(
>
> /*
> * Increment the pin count of the given dquot.
> - * This value is protected by pinlock spinlock in the xQM structure.
> */
> STATIC void
> xfs_qm_dquot_logitem_pin(
> xfs_dq_logitem_t *logitem)
> {
> - xfs_dquot_t *dqp;
> + xfs_dquot_t *dqp = logitem->qli_dquot;
>
> - dqp = logitem->qli_dquot;
> ASSERT(XFS_DQ_IS_LOCKED(dqp));
> - spin_lock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
> - dqp->q_pincount++;
> - spin_unlock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
> + atomic_inc(dqp->q_pincount);
> }
>
> /*
> * Decrement the pin count of the given dquot, and wake up
> * anyone in xfs_dqwait_unpin() if the count goes to 0. The
> - * dquot must have been previously pinned with a call to xfs_dqpin().
> + * dquot must have been previously pinned with a call to
> + * xfs_qm_dquot_logitem_pin().
> */
> /* ARGSUSED */
> STATIC void
> @@ -114,16 +111,11 @@ xfs_qm_dquot_logitem_unpin(
> xfs_dq_logitem_t *logitem,
> int stale)
> {
> - xfs_dquot_t *dqp;
> + xfs_dquot_t *dqp = logitem->qli_dquot;
>
> - dqp = logitem->qli_dquot;
> - ASSERT(dqp->q_pincount > 0);
> - spin_lock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
> - dqp->q_pincount--;
> - if (dqp->q_pincount == 0) {
> - sv_broadcast(&dqp->q_pinwait);
> - }
> - spin_unlock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
> + ASSERT(atomic_read(&dqp->q_pincount) > 0);
> + if (atomic_dec_and_test(&dqp->q_pincount))
> + wake_up(&dqp->q_pinwait);
> }
>
> /* ARGSUSED */
> @@ -193,21 +185,14 @@ xfs_qm_dqunpin_wait(
> xfs_dquot_t *dqp)
> {
> ASSERT(XFS_DQ_IS_LOCKED(dqp));
> - if (dqp->q_pincount == 0) {
> + if (atomic_read(&dqp->q_pincount) == 0)
> return;
> - }
>
> /*
> * Give the log a push so we don't wait here too long.
> */
> xfs_log_force(dqp->q_mount, (xfs_lsn_t)0, XFS_LOG_FORCE);
> - spin_lock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
> - if (dqp->q_pincount == 0) {
> - spin_unlock(&(XFS_DQ_TO_QINF(dqp)->qi_pinlock));
> - return;
> - }
> - sv_wait(&(dqp->q_pinwait), PINOD,
> - &(XFS_DQ_TO_QINF(dqp)->qi_pinlock), s);
> + wait_event(dqp->q_pinwait, (atomic_read(&dqp->q_pincount) == 0));
> }
>
> /*
> @@ -310,7 +295,7 @@ xfs_qm_dquot_logitem_trylock(
> uint retval;
>
> dqp = qip->qli_dquot;
> - if (dqp->q_pincount > 0)
> + if (atomic_read(&dqp->q_pincount) > 0)
> return (XFS_ITEM_PINNED);
>
> if (! xfs_qm_dqlock_nowait(dqp))
> Index: 2.6.x-xfs/fs/xfs/quota/xfs_dquot.h
> ===================================================================
> --- 2.6.x-xfs.orig/fs/xfs/quota/xfs_dquot.h 2008-09-24
> 14:45:29.270573668 +1000
> +++ 2.6.x-xfs/fs/xfs/quota/xfs_dquot.h 2008-09-24 16:24:58.623139695
> +1000
> @@ -83,8 +83,8 @@ typedef struct xfs_dquot {
> xfs_qcnt_t q_res_rtbcount;/* total realtime blks used+reserved */
> mutex_t q_qlock; /* quota lock */
> struct completion q_flush; /* flush completion queue */
> - uint q_pincount; /* pin count for this dquot */
> - sv_t q_pinwait; /* sync var for pinning */
> + atomic_t q_pincount; /* dquot pin count */
> + wait_queue_head_t q_pinwait; /* dquot pinning wait queue */
> #ifdef XFS_DQUOT_TRACE
> struct ktrace *q_trace; /* trace header structure */
> #endif
> Index: 2.6.x-xfs/fs/xfs/quota/xfs_dquot.c
> ===================================================================
> --- 2.6.x-xfs.orig/fs/xfs/quota/xfs_dquot.c 2008-09-24
> 14:45:29.270573668 +1000
> +++ 2.6.x-xfs/fs/xfs/quota/xfs_dquot.c 2008-09-24 16:24:58.623139695
> +1000
> @@ -101,7 +101,7 @@ xfs_qm_dqinit(
> if (brandnewdquot) {
> dqp->dq_flnext = dqp->dq_flprev = dqp;
> mutex_init(&dqp->q_qlock);
> - sv_init(&dqp->q_pinwait, SV_DEFAULT, "pdq");
> + init_waitqueue_head(&dqp->q_pinwait);
>
> /*
> * Because we want to use a counting completion, complete
> @@ -131,7 +131,7 @@ xfs_qm_dqinit(
> dqp->q_res_bcount = 0;
> dqp->q_res_icount = 0;
> dqp->q_res_rtbcount = 0;
> - dqp->q_pincount = 0;
> + atomic_set(&dqp->q_pincount, 0);
> dqp->q_hash = NULL;
> ASSERT(dqp->dq_flnext == dqp->dq_flprev);
>
> @@ -1489,7 +1489,7 @@ xfs_qm_dqpurge(
> "xfs_qm_dqpurge: dquot %p flush failed", dqp);
> xfs_dqflock(dqp);
> }
> - ASSERT(dqp->q_pincount == 0);
> + ASSERT(atomic_read(&dqp->q_pincount) == 0);
> ASSERT(XFS_FORCED_SHUTDOWN(mp) ||
> !(dqp->q_logitem.qli_item.li_flags & XFS_LI_IN_AIL));
>
> Index: 2.6.x-xfs/fs/xfs/xfsidbg.c
> ===================================================================
> --- 2.6.x-xfs.orig/fs/xfs/xfsidbg.c 2008-09-24 14:45:29.270573668 +1000
> +++ 2.6.x-xfs/fs/xfs/xfsidbg.c 2008-09-24 16:24:58.627139176 +1000
> @@ -6577,7 +6577,7 @@ xfsidbg_xqm_dquot(xfs_dquot_t *dqp)
> (unsigned long long)dqp->q_res_rtbcount);
> kdb_printf("qlock 0x%p &q_flush 0x%p (%d) pincount 0x%x\n",
> &dqp->q_qlock, &dqp->q_flush,
> - dqp->q_flush.done, dqp->q_pincount);
> + dqp->q_flush.done, atomic_read(&dqp->q_pincount));
> #ifdef XFS_DQUOT_TRACE
> qprintf("dqtrace 0x%p\n", dqp->q_trace);
> #endif
> @@ -6761,10 +6761,9 @@ xfsidbg_xqm_qinfo(xfs_mount_t *mp)
> return;
> }
>
> - kdb_printf("uqip 0x%p, gqip 0x%p, &pinlock 0x%p &dqlist 0x%p\n",
> + kdb_printf("uqip 0x%p, gqip 0x%p, &dqlist 0x%p\n",
> mp->m_quotainfo->qi_uquotaip,
> mp->m_quotainfo->qi_gquotaip,
> - &mp->m_quotainfo->qi_pinlock,
> &mp->m_quotainfo->qi_dqlist);
>
> kdb_printf("btmlimit 0x%x, itmlimit 0x%x, RTbtmlim 0x%x\n",
> Index: 2.6.x-xfs/fs/xfs/quota/xfs_qm.h
> ===================================================================
> --- 2.6.x-xfs.orig/fs/xfs/quota/xfs_qm.h 2008-09-24
> 14:45:29.270573668 +1000
> +++ 2.6.x-xfs/fs/xfs/quota/xfs_qm.h 2008-09-24 16:24:58.627139176 +1000
> @@ -106,7 +106,6 @@ typedef struct xfs_qm {
> typedef struct xfs_quotainfo {
> xfs_inode_t *qi_uquotaip; /* user quota inode */
> xfs_inode_t *qi_gquotaip; /* group quota inode */
> - spinlock_t qi_pinlock; /* dquot pinning lock */
> xfs_dqlist_t qi_dqlist; /* all dquots in filesys */
> int qi_dqreclaims; /* a change here indicates
> a removal in the dqlist */
> Index: 2.6.x-xfs/fs/xfs/quota/xfs_qm.c
> ===================================================================
> --- 2.6.x-xfs.orig/fs/xfs/quota/xfs_qm.c 2008-09-24
> 14:45:29.270573668 +1000
> +++ 2.6.x-xfs/fs/xfs/quota/xfs_qm.c 2008-09-24 16:24:58.631138657 +1000
> @@ -1137,7 +1137,6 @@ xfs_qm_init_quotainfo(
> return error;
> }
>
> - spin_lock_init(&qinf->qi_pinlock);
> xfs_qm_list_init(&qinf->qi_dqlist, "mpdqlist", 0);
> qinf->qi_dqreclaims = 0;
>
> @@ -1234,7 +1233,6 @@ xfs_qm_destroy_quotainfo(
> */
> xfs_qm_rele_quotafs_ref(mp);
>
> - spinlock_destroy(&qi->qi_pinlock);
> xfs_qm_list_destroy(&qi->qi_dqlist);
>
> if (qi->qi_uquotaip) {
>
>
>
next prev parent reply other threads:[~2008-09-24 7:32 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-09-24 4:28 [PATCH] Use atomic_t and wait_event to track dquot pincount Peter Leckie
2008-09-24 6:05 ` Dave Chinner
2008-09-24 6:53 ` Peter Leckie
2008-09-24 7:43 ` Dave Chinner
2008-09-24 7:26 ` [PATCH v2] " Peter Leckie
2008-09-24 7:42 ` Lachlan McIlroy [this message]
2008-09-24 7:46 ` Dave Chinner
2008-09-24 8:03 ` Lachlan McIlroy
2008-09-24 14:42 ` Christoph Hellwig
2008-09-24 8:15 ` Peter Leckie
2008-09-25 1:03 ` Dave Chinner
2008-09-25 8:43 ` Peter Leckie
2008-09-25 9:12 ` Christoph Hellwig
2008-09-26 0:34 ` Dave Chinner
2008-09-26 1:09 ` Peter Leckie
2008-09-26 1:26 ` Lachlan McIlroy
2008-09-27 1:08 ` Dave Chinner
2008-09-26 1:32 ` Lachlan McIlroy
2008-09-26 1:38 ` Peter Leckie
2008-09-26 1:44 ` Mark Goodwin
2008-09-26 1:54 ` Peter Leckie
2008-09-26 11:31 ` Christoph Hellwig
2008-09-26 2:57 ` Dave Chinner
2008-09-26 3:38 ` Lachlan McIlroy
2008-09-27 1:11 ` Dave Chinner
2008-09-26 11:30 ` Christoph Hellwig
2008-09-26 11:27 ` Christoph Hellwig
2008-09-27 1:18 ` Dave Chinner
2008-09-26 1:10 ` Lachlan McIlroy
2008-09-26 11:28 ` Christoph Hellwig
2008-09-29 3:08 ` Lachlan McIlroy
2008-09-29 21:45 ` Christoph Hellwig
2008-09-24 14:41 ` Christoph Hellwig
2008-09-25 1:08 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=48D9EF6E.8010505@sgi.com \
--to=lachlan@sgi.com \
--cc=pleckie@sgi.com \
--cc=xfs-dev@sgi.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox