* [REVIEW] xfssyncd lost wakes circa 2.6.32
@ 2011-11-23 16:30 Ben Myers
2011-11-23 19:42 ` NeilBrown
0 siblings, 1 reply; 3+ messages in thread
From: Ben Myers @ 2011-11-23 16:30 UTC (permalink / raw)
To: Dave Chinner, Christoph Hellwig, Alex Elder; +Cc: ataschner, nfbrown, xfs
Hi,
I'd like to request a review for this patch. This is related to ENOSPC
condition and also project quotas, where we call xfs_flush_inodes from
xfs_iomap_write_delay. Neil and Andreas did some very heavy lifting on
this bug (suse 722910) and found that there is a repeatable ~30ish
second delay in xfs_write that is related to xfssyncd at ENOSPC. From
there I captured the interaction in this trace:
Nov 22 15:06:39 nfs4 kernel: [ 478.757337] 5571: xfs_flush_inodes (sdb1) start
Nov 22 15:06:39 nfs4 kernel: [ 478.757338] 5571: xfs_syncd_queue_work (sdb1) start
Nov 22 15:06:39 nfs4 kernel: [ 478.757341] 5571: xfs_syncd_queue_work (sdb1) end
Nov 22 15:06:39 nfs4 kernel: [ 478.757344] 1767: xfssyncd awake
Nov 22 15:06:39 nfs4 kernel: [ 478.757346] 1767: xfs_flush_inodes_work (sdb1) start
Nov 22 15:06:39 nfs4 kernel: [ 478.757352] 1767: xfs_flush_inodes_work (sdb1) end
Nov 22 15:06:40 nfs4 kernel: [ 478.757357] 5571: xfs_flush_inodes (sdb1) end
Nov 22 15:06:40 nfs4 kernel: [ 478.757367] 5571: xfs_flush_inodes (sdb1) start
Nov 22 15:06:40 nfs4 kernel: [ 478.757368] 5571: xfs_syncd_queue_work (sdb1) start
Nov 22 15:06:40 nfs4 kernel: [ 478.757370] 5571: xfs_syncd_queue_work (sdb1) end
Nov 22 15:06:40 nfs4 kernel: [ 478.757394] 1767: xfssyncd go to sleep
Nov 22 15:06:40 nfs4 kernel: [ 508.708008] 830: xfssyncd awake
Nov 22 15:06:40 nfs4 kernel: [ 508.708011] 830 xfs_sync_worker (sda3) start
Nov 22 15:06:40 nfs4 kernel: [ 508.708016] 830 xfs_sync_worker (sda3) end
Nov 22 15:06:40 nfs4 kernel: [ 508.708018] 830: xfssyncd go to sleep
Nov 22 15:06:40 nfs4 kernel: [ 514.664300] 1767: xfssyncd awake
Nov 22 15:06:40 nfs4 kernel: [ 514.664303] 1767: xfs_flush_inodes_work (sdb1) start
Nov 22 15:06:40 nfs4 kernel: [ 514.664317] 1767: xfs_flush_inodes_work (sdb1) end
Nov 22 15:06:40 nfs4 kernel: [ 514.664322] 1767 xfs_sync_worker (sdb1) start
Nov 22 15:06:40 nfs4 kernel: [ 514.664324] 5571: xfs_flush_inodes (sdb1) end
Nov 22 15:06:40 nfs4 kernel: [ 514.664330] 1767 xfs_sync_worker (sdb1) end
Nov 22 15:06:40 nfs4 kernel: [ 514.664332] 1767: xfssyncd go to sleep
Nov 22 15:06:40 nfs4 kernel: [ 514.664349] 5091ef25 35s 907016877s
^^^ xid ^^^^^^ service time delay in nfsd_vfs_write
Note that xfssyncd was going to sleep at 478.757394, even though work
had just been queued. It looks to me like xfs_syncd_queue_work can try
to wake xfssyncd when it's already running, and xfssyncd can
subsequently go back to sleep, holding off the xfs_flush_inodes_work
until the timer pops again. David has already rewritten this subsystem
using work queues, but I'd rather fix this very specific issue for
support purposes than backport a new implementation.
To fix this we need to check m_sync_list under lock and only sleep if it
is empty. Also set TASK_INTERRUPTIBLE before the check so that if we're
woken we won't sleep either. This is discussed here:
http://www.linuxjournal.com/node/8144/print
I'm also adding work items to the tail of the temp list so that they are
processed in the order they were added. My testing of this patch shows
that the ~30s delay is gone, but I did see a ~2s delay in there
occasionally.
Thanks!
-Ben
Index: linux/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- linux.orig/fs/xfs/linux-2.6/xfs_sync.c
+++ linux/fs/xfs/linux-2.6/xfs_sync.c
@@ -620,13 +620,25 @@ xfssyncd(
set_freezable();
timeleft = xfs_syncd_centisecs * msecs_to_jiffies(10);
for (;;) {
- timeleft = schedule_timeout_interruptible(timeleft);
- /* swsusp */
- try_to_freeze();
- if (kthread_should_stop() && list_empty(&mp->m_sync_list))
+ set_current_state(TASK_INTERRUPTIBLE);
+ spin_lock(&mp->m_sync_lock);
+
+ if (list_empty(&mp->m_sync_list) && !kthread_should_stop()) {
+ spin_unlock(&mp->m_sync_lock);
+
+ timeleft = schedule_timeout_interruptible(timeleft);
+ /* swsusp */
+ try_to_freeze();
+
+ spin_lock(&mp->m_sync_lock);
+ }
+ set_current_state(TASK_RUNNING);
+
+ if (kthread_should_stop() && list_empty(&mp->m_sync_list)) {
+ spin_unlock(&mp->m_sync_lock);
break;
+ }
- spin_lock(&mp->m_sync_lock);
/*
* We can get woken by laptop mode, to do a sync -
* that's the (only!) case where the list would be
@@ -641,7 +653,7 @@ xfssyncd(
&mp->m_sync_list);
}
list_for_each_entry_safe(work, n, &mp->m_sync_list, w_list)
- list_move(&work->w_list, &tmp);
+ list_move_tail(&work->w_list, &tmp);
spin_unlock(&mp->m_sync_lock);
list_for_each_entry_safe(work, n, &tmp, w_list) {
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [REVIEW] xfssyncd lost wakes circa 2.6.32
2011-11-23 16:30 [REVIEW] xfssyncd lost wakes circa 2.6.32 Ben Myers
@ 2011-11-23 19:42 ` NeilBrown
2011-11-23 20:30 ` Ben Myers
0 siblings, 1 reply; 3+ messages in thread
From: NeilBrown @ 2011-11-23 19:42 UTC (permalink / raw)
To: Ben Myers; +Cc: nfbrown, Alex Elder, xfs, Christoph Hellwig, ataschner
[-- Attachment #1.1: Type: text/plain, Size: 5264 bytes --]
On Wed, 23 Nov 2011 10:30:46 -0600 Ben Myers <bpm@sgi.com> wrote:
> Hi,
>
> I'd like to request a review for this patch. This is related to ENOSPC
> condition and also project quotas, where we call xfs_flush_inodes from
> xfs_iomap_write_delay. Neil and Andreas did some very heavy lifting on
> this bug (suse 722910) and found that there is a repeatable ~30ish
> second delay in xfs_write that is related to xfssyncd at ENOSPC. From
> there I captured the interaction in this trace:
>
> Nov 22 15:06:39 nfs4 kernel: [ 478.757337] 5571: xfs_flush_inodes (sdb1) start
> Nov 22 15:06:39 nfs4 kernel: [ 478.757338] 5571: xfs_syncd_queue_work (sdb1) start
> Nov 22 15:06:39 nfs4 kernel: [ 478.757341] 5571: xfs_syncd_queue_work (sdb1) end
> Nov 22 15:06:39 nfs4 kernel: [ 478.757344] 1767: xfssyncd awake
> Nov 22 15:06:39 nfs4 kernel: [ 478.757346] 1767: xfs_flush_inodes_work (sdb1) start
> Nov 22 15:06:39 nfs4 kernel: [ 478.757352] 1767: xfs_flush_inodes_work (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [ 478.757357] 5571: xfs_flush_inodes (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [ 478.757367] 5571: xfs_flush_inodes (sdb1) start
> Nov 22 15:06:40 nfs4 kernel: [ 478.757368] 5571: xfs_syncd_queue_work (sdb1) start
> Nov 22 15:06:40 nfs4 kernel: [ 478.757370] 5571: xfs_syncd_queue_work (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [ 478.757394] 1767: xfssyncd go to sleep
> Nov 22 15:06:40 nfs4 kernel: [ 508.708008] 830: xfssyncd awake
> Nov 22 15:06:40 nfs4 kernel: [ 508.708011] 830 xfs_sync_worker (sda3) start
> Nov 22 15:06:40 nfs4 kernel: [ 508.708016] 830 xfs_sync_worker (sda3) end
> Nov 22 15:06:40 nfs4 kernel: [ 508.708018] 830: xfssyncd go to sleep
> Nov 22 15:06:40 nfs4 kernel: [ 514.664300] 1767: xfssyncd awake
> Nov 22 15:06:40 nfs4 kernel: [ 514.664303] 1767: xfs_flush_inodes_work (sdb1) start
> Nov 22 15:06:40 nfs4 kernel: [ 514.664317] 1767: xfs_flush_inodes_work (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [ 514.664322] 1767 xfs_sync_worker (sdb1) start
> Nov 22 15:06:40 nfs4 kernel: [ 514.664324] 5571: xfs_flush_inodes (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [ 514.664330] 1767 xfs_sync_worker (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [ 514.664332] 1767: xfssyncd go to sleep
> Nov 22 15:06:40 nfs4 kernel: [ 514.664349] 5091ef25 35s 907016877s
> ^^^ xid ^^^^^^ service time delay in nfsd_vfs_write
>
> Note that xfssyncd was going to sleep at 478.757394, even though work
> had just been queued. It looks to me like xfs_syncd_queue_work can try
> to wake xfssyncd when it's already running, and xfssyncd can
> subsequently go back to sleep, holding off the xfs_flush_inodes_work
> until the timer pops again. David has already rewritten this subsystem
> using work queues, but I'd rather fix this very specific issue for
> support purposes than backport a new implementation.
>
> To fix this we need to check m_sync_list under lock and only sleep if it
> is empty. Also set TASK_INTERRUPTIBLE before the check so that if we're
> woken we won't sleep either. This is discussed here:
>
> http://www.linuxjournal.com/node/8144/print
>
> I'm also adding work items to the tail of the temp list so that they are
> processed in the order they were added. My testing of this patch shows
> that the ~30s delay is gone, but I did see a ~2s delay in there
> occasionally.
>
> Thanks!
> -Ben
>
> Index: linux/fs/xfs/linux-2.6/xfs_sync.c
> ===================================================================
> --- linux.orig/fs/xfs/linux-2.6/xfs_sync.c
> +++ linux/fs/xfs/linux-2.6/xfs_sync.c
> @@ -620,13 +620,25 @@ xfssyncd(
> set_freezable();
> timeleft = xfs_syncd_centisecs * msecs_to_jiffies(10);
> for (;;) {
> - timeleft = schedule_timeout_interruptible(timeleft);
> - /* swsusp */
> - try_to_freeze();
> - if (kthread_should_stop() && list_empty(&mp->m_sync_list))
> + set_current_state(TASK_INTERRUPTIBLE);
> + spin_lock(&mp->m_sync_lock);
> +
> + if (list_empty(&mp->m_sync_list) && !kthread_should_stop()) {
> + spin_unlock(&mp->m_sync_lock);
> +
> + timeleft = schedule_timeout_interruptible(timeleft);
This should be just "schedule_timeout(timeleft)".
This call sets TASK_INTERRUPTIBLE so we will go to sleep even if we were
only just woken up.
I don't really know the XFS code well enough to the rest looks right, but
with that small fix it certainly doesn't look wrong :-)
NeilBrown
> + /* swsusp */
> + try_to_freeze();
> +
> + spin_lock(&mp->m_sync_lock);
> + }
> + set_current_state(TASK_RUNNING);
> +
> + if (kthread_should_stop() && list_empty(&mp->m_sync_list)) {
> + spin_unlock(&mp->m_sync_lock);
> break;
> + }
>
> - spin_lock(&mp->m_sync_lock);
> /*
> * We can get woken by laptop mode, to do a sync -
> * that's the (only!) case where the list would be
> @@ -641,7 +653,7 @@ xfssyncd(
> &mp->m_sync_list);
> }
> list_for_each_entry_safe(work, n, &mp->m_sync_list, w_list)
> - list_move(&work->w_list, &tmp);
> + list_move_tail(&work->w_list, &tmp);
> spin_unlock(&mp->m_sync_lock);
>
> list_for_each_entry_safe(work, n, &tmp, w_list) {
>
>
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
[-- Attachment #2: Type: text/plain, Size: 121 bytes --]
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [REVIEW] xfssyncd lost wakes circa 2.6.32
2011-11-23 19:42 ` NeilBrown
@ 2011-11-23 20:30 ` Ben Myers
0 siblings, 0 replies; 3+ messages in thread
From: Ben Myers @ 2011-11-23 20:30 UTC (permalink / raw)
To: NeilBrown; +Cc: nfbrown, Alex Elder, xfs, Christoph Hellwig, ataschner
On Thu, Nov 24, 2011 at 06:42:16AM +1100, NeilBrown wrote:
> On Wed, 23 Nov 2011 10:30:46 -0600 Ben Myers <bpm@sgi.com> wrote:
> > Index: linux/fs/xfs/linux-2.6/xfs_sync.c
> > ===================================================================
> > --- linux.orig/fs/xfs/linux-2.6/xfs_sync.c
> > +++ linux/fs/xfs/linux-2.6/xfs_sync.c
> > @@ -620,13 +620,25 @@ xfssyncd(
> > set_freezable();
> > timeleft = xfs_syncd_centisecs * msecs_to_jiffies(10);
> > for (;;) {
> > - timeleft = schedule_timeout_interruptible(timeleft);
> > - /* swsusp */
> > - try_to_freeze();
> > - if (kthread_should_stop() && list_empty(&mp->m_sync_list))
> > + set_current_state(TASK_INTERRUPTIBLE);
> > + spin_lock(&mp->m_sync_lock);
> > +
> > + if (list_empty(&mp->m_sync_list) && !kthread_should_stop()) {
> > + spin_unlock(&mp->m_sync_lock);
> > +
> > + timeleft = schedule_timeout_interruptible(timeleft);
>
> This should be just "schedule_timeout(timeleft)".
> This call sets TASK_INTERRUPTIBLE so we will go to sleep even if we were
> only just woken up.
>
> I don't really know the XFS code well enough to the rest looks right, but
> with that small fix it certainly doesn't look wrong :-)
Thanks Neil! I'll get that fixed up. ;)
-Ben
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2011-11-23 20:30 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-23 16:30 [REVIEW] xfssyncd lost wakes circa 2.6.32 Ben Myers
2011-11-23 19:42 ` NeilBrown
2011-11-23 20:30 ` Ben Myers
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox