From: NeilBrown <neilb@suse.de>
To: Ben Myers <bpm@sgi.com>
Cc: nfbrown@suse.com, Alex Elder <elder@linux.com>,
xfs@oss.sgi.com, Christoph Hellwig <hch@infradead.org>,
ataschner@novell.com
Subject: Re: [REVIEW] xfssyncd lost wakes circa 2.6.32
Date: Thu, 24 Nov 2011 06:42:16 +1100 [thread overview]
Message-ID: <20111124064216.5ff54077@notabene.brown> (raw)
In-Reply-To: <20111123163046.GQ29840@sgi.com>
[-- Attachment #1.1: Type: text/plain, Size: 5264 bytes --]
On Wed, 23 Nov 2011 10:30:46 -0600 Ben Myers <bpm@sgi.com> wrote:
> Hi,
>
> I'd like to request a review for this patch. This is related to ENOSPC
> condition and also project quotas, where we call xfs_flush_inodes from
> xfs_iomap_write_delay. Neil and Andreas did some very heavy lifting on
> this bug (suse 722910) and found that there is a repeatable ~30ish
> second delay in xfs_write that is related to xfssyncd at ENOSPC. From
> there I captured the interaction in this trace:
>
> Nov 22 15:06:39 nfs4 kernel: [ 478.757337] 5571: xfs_flush_inodes (sdb1) start
> Nov 22 15:06:39 nfs4 kernel: [ 478.757338] 5571: xfs_syncd_queue_work (sdb1) start
> Nov 22 15:06:39 nfs4 kernel: [ 478.757341] 5571: xfs_syncd_queue_work (sdb1) end
> Nov 22 15:06:39 nfs4 kernel: [ 478.757344] 1767: xfssyncd awake
> Nov 22 15:06:39 nfs4 kernel: [ 478.757346] 1767: xfs_flush_inodes_work (sdb1) start
> Nov 22 15:06:39 nfs4 kernel: [ 478.757352] 1767: xfs_flush_inodes_work (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [ 478.757357] 5571: xfs_flush_inodes (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [ 478.757367] 5571: xfs_flush_inodes (sdb1) start
> Nov 22 15:06:40 nfs4 kernel: [ 478.757368] 5571: xfs_syncd_queue_work (sdb1) start
> Nov 22 15:06:40 nfs4 kernel: [ 478.757370] 5571: xfs_syncd_queue_work (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [ 478.757394] 1767: xfssyncd go to sleep
> Nov 22 15:06:40 nfs4 kernel: [ 508.708008] 830: xfssyncd awake
> Nov 22 15:06:40 nfs4 kernel: [ 508.708011] 830 xfs_sync_worker (sda3) start
> Nov 22 15:06:40 nfs4 kernel: [ 508.708016] 830 xfs_sync_worker (sda3) end
> Nov 22 15:06:40 nfs4 kernel: [ 508.708018] 830: xfssyncd go to sleep
> Nov 22 15:06:40 nfs4 kernel: [ 514.664300] 1767: xfssyncd awake
> Nov 22 15:06:40 nfs4 kernel: [ 514.664303] 1767: xfs_flush_inodes_work (sdb1) start
> Nov 22 15:06:40 nfs4 kernel: [ 514.664317] 1767: xfs_flush_inodes_work (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [ 514.664322] 1767 xfs_sync_worker (sdb1) start
> Nov 22 15:06:40 nfs4 kernel: [ 514.664324] 5571: xfs_flush_inodes (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [ 514.664330] 1767 xfs_sync_worker (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [ 514.664332] 1767: xfssyncd go to sleep
> Nov 22 15:06:40 nfs4 kernel: [ 514.664349] 5091ef25 35s 907016877s
> ^^^ xid ^^^^^^ service time delay in nfsd_vfs_write
>
> Note that xfssyncd was going to sleep at 478.757394, even though work
> had just been queued. It looks to me like xfs_syncd_queue_work can try
> to wake xfssyncd when it's already running, and xfssyncd can
> subsequently go back to sleep, holding off the xfs_flush_inodes_work
> until the timer pops again. David has already rewritten this subsystem
> using work queues, but I'd rather fix this very specific issue for
> support purposes than backport a new implementation.
>
> To fix this we need to check m_sync_list under lock and only sleep if it
> is empty. Also set TASK_INTERRUPTIBLE before the check so that if we're
> woken we won't sleep either. This is discussed here:
>
> http://www.linuxjournal.com/node/8144/print
>
> I'm also adding work items to the tail of the temp list so that they are
> processed in the order they were added. My testing of this patch shows
> that the ~30s delay is gone, but I did see a ~2s delay in there
> occasionally.
>
> Thanks!
> -Ben
>
> Index: linux/fs/xfs/linux-2.6/xfs_sync.c
> ===================================================================
> --- linux.orig/fs/xfs/linux-2.6/xfs_sync.c
> +++ linux/fs/xfs/linux-2.6/xfs_sync.c
> @@ -620,13 +620,25 @@ xfssyncd(
> set_freezable();
> timeleft = xfs_syncd_centisecs * msecs_to_jiffies(10);
> for (;;) {
> - timeleft = schedule_timeout_interruptible(timeleft);
> - /* swsusp */
> - try_to_freeze();
> - if (kthread_should_stop() && list_empty(&mp->m_sync_list))
> + set_current_state(TASK_INTERRUPTIBLE);
> + spin_lock(&mp->m_sync_lock);
> +
> + if (list_empty(&mp->m_sync_list) && !kthread_should_stop()) {
> + spin_unlock(&mp->m_sync_lock);
> +
> + timeleft = schedule_timeout_interruptible(timeleft);
This should be just "schedule_timeout(timeleft)".
This call sets TASK_INTERRUPTIBLE so we will go to sleep even if we were
only just woken up.
I don't really know the XFS code well enough to the rest looks right, but
with that small fix it certainly doesn't look wrong :-)
NeilBrown
> + /* swsusp */
> + try_to_freeze();
> +
> + spin_lock(&mp->m_sync_lock);
> + }
> + set_current_state(TASK_RUNNING);
> +
> + if (kthread_should_stop() && list_empty(&mp->m_sync_list)) {
> + spin_unlock(&mp->m_sync_lock);
> break;
> + }
>
> - spin_lock(&mp->m_sync_lock);
> /*
> * We can get woken by laptop mode, to do a sync -
> * that's the (only!) case where the list would be
> @@ -641,7 +653,7 @@ xfssyncd(
> &mp->m_sync_list);
> }
> list_for_each_entry_safe(work, n, &mp->m_sync_list, w_list)
> - list_move(&work->w_list, &tmp);
> + list_move_tail(&work->w_list, &tmp);
> spin_unlock(&mp->m_sync_lock);
>
> list_for_each_entry_safe(work, n, &tmp, w_list) {
>
>
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
[-- Attachment #2: Type: text/plain, Size: 121 bytes --]
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2011-11-23 19:42 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-11-23 16:30 [REVIEW] xfssyncd lost wakes circa 2.6.32 Ben Myers
2011-11-23 19:42 ` NeilBrown [this message]
2011-11-23 20:30 ` Ben Myers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20111124064216.5ff54077@notabene.brown \
--to=neilb@suse.de \
--cc=ataschner@novell.com \
--cc=bpm@sgi.com \
--cc=elder@linux.com \
--cc=hch@infradead.org \
--cc=nfbrown@suse.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox