All of lore.kernel.org
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.de>
To: Ben Myers <bpm@sgi.com>
Cc: nfbrown@suse.com, Alex Elder <elder@linux.com>,
	xfs@oss.sgi.com, Christoph Hellwig <hch@infradead.org>,
	ataschner@novell.com
Subject: Re: [REVIEW] xfssyncd lost wakes circa 2.6.32
Date: Thu, 24 Nov 2011 06:42:16 +1100	[thread overview]
Message-ID: <20111124064216.5ff54077@notabene.brown> (raw)
In-Reply-To: <20111123163046.GQ29840@sgi.com>


[-- Attachment #1.1: Type: text/plain, Size: 5264 bytes --]

On Wed, 23 Nov 2011 10:30:46 -0600 Ben Myers <bpm@sgi.com> wrote:

> Hi,
> 
> I'd like to request a review for this patch.  This is related to ENOSPC
> condition and also project quotas, where we call xfs_flush_inodes from
> xfs_iomap_write_delay.  Neil and Andreas did some very heavy lifting on
> this bug (suse 722910) and found that there is a repeatable ~30ish
> second delay in xfs_write that is related to xfssyncd at ENOSPC.  From
> there I captured the interaction in this trace:
> 
> Nov 22 15:06:39 nfs4 kernel: [  478.757337] 5571: xfs_flush_inodes (sdb1) start
> Nov 22 15:06:39 nfs4 kernel: [  478.757338] 5571: xfs_syncd_queue_work (sdb1) start
> Nov 22 15:06:39 nfs4 kernel: [  478.757341] 5571: xfs_syncd_queue_work (sdb1) end
> Nov 22 15:06:39 nfs4 kernel: [  478.757344] 1767: xfssyncd awake
> Nov 22 15:06:39 nfs4 kernel: [  478.757346] 1767: xfs_flush_inodes_work (sdb1) start
> Nov 22 15:06:39 nfs4 kernel: [  478.757352] 1767: xfs_flush_inodes_work (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [  478.757357] 5571: xfs_flush_inodes (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [  478.757367] 5571: xfs_flush_inodes (sdb1) start
> Nov 22 15:06:40 nfs4 kernel: [  478.757368] 5571: xfs_syncd_queue_work (sdb1) start
> Nov 22 15:06:40 nfs4 kernel: [  478.757370] 5571: xfs_syncd_queue_work (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [  478.757394] 1767: xfssyncd go to sleep
> Nov 22 15:06:40 nfs4 kernel: [  508.708008] 830: xfssyncd awake
> Nov 22 15:06:40 nfs4 kernel: [  508.708011] 830 xfs_sync_worker (sda3) start
> Nov 22 15:06:40 nfs4 kernel: [  508.708016] 830 xfs_sync_worker (sda3) end
> Nov 22 15:06:40 nfs4 kernel: [  508.708018] 830: xfssyncd go to sleep
> Nov 22 15:06:40 nfs4 kernel: [  514.664300] 1767: xfssyncd awake
> Nov 22 15:06:40 nfs4 kernel: [  514.664303] 1767: xfs_flush_inodes_work (sdb1) start
> Nov 22 15:06:40 nfs4 kernel: [  514.664317] 1767: xfs_flush_inodes_work (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [  514.664322] 1767 xfs_sync_worker (sdb1) start
> Nov 22 15:06:40 nfs4 kernel: [  514.664324] 5571: xfs_flush_inodes (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [  514.664330] 1767 xfs_sync_worker (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [  514.664332] 1767: xfssyncd go to sleep
> Nov 22 15:06:40 nfs4 kernel: [  514.664349] 5091ef25 35s 907016877s
>                                             ^^^ xid  ^^^^^^ service time delay in nfsd_vfs_write
> 
> Note that xfssyncd was going to sleep at 478.757394, even though work
> had just been queued.  It looks to me like xfs_syncd_queue_work can try
> to wake xfssyncd when it's already running, and xfssyncd can
> subsequently go back to sleep, holding off the xfs_flush_inodes_work
> until the timer pops again.  David has already rewritten this subsystem
> using work queues, but I'd rather fix this very specific issue for
> support purposes than backport a new implementation.
> 
> To fix this we need to check m_sync_list under lock and only sleep if it
> is empty.  Also set TASK_INTERRUPTIBLE before the check so that if we're
> woken we won't sleep either.  This is discussed here:
> 
> http://www.linuxjournal.com/node/8144/print
> 
> I'm also adding work items to the tail of the temp list so that they are
> processed in the order they were added.  My testing of this patch shows
> that the ~30s delay is gone, but I did see a ~2s delay in there
> occasionally.
> 
> Thanks!
> -Ben
> 
> Index: linux/fs/xfs/linux-2.6/xfs_sync.c
> ===================================================================
> --- linux.orig/fs/xfs/linux-2.6/xfs_sync.c
> +++ linux/fs/xfs/linux-2.6/xfs_sync.c
> @@ -620,13 +620,25 @@ xfssyncd(
>  	set_freezable();
>  	timeleft = xfs_syncd_centisecs * msecs_to_jiffies(10);
>  	for (;;) {
> -		timeleft = schedule_timeout_interruptible(timeleft);
> -		/* swsusp */
> -		try_to_freeze();
> -		if (kthread_should_stop() && list_empty(&mp->m_sync_list))
> +		set_current_state(TASK_INTERRUPTIBLE);
> +		spin_lock(&mp->m_sync_lock);
> +
> +		if (list_empty(&mp->m_sync_list) && !kthread_should_stop()) {
> +			spin_unlock(&mp->m_sync_lock);
> +
> +			timeleft = schedule_timeout_interruptible(timeleft);

This should be just "schedule_timeout(timeleft)".
This call sets TASK_INTERRUPTIBLE so we will go to sleep even if we were
only just woken up.

I don't really know the XFS code well enough to the rest looks right, but
with that small fix it certainly doesn't look wrong :-)

NeilBrown


> +			/* swsusp */
> +			try_to_freeze();
> +
> +			spin_lock(&mp->m_sync_lock);
> +		}
> +		set_current_state(TASK_RUNNING);
> +
> +		if (kthread_should_stop() && list_empty(&mp->m_sync_list)) {
> +			spin_unlock(&mp->m_sync_lock);
>  			break;
> +		}
>  
> -		spin_lock(&mp->m_sync_lock);
>  		/*
>  		 * We can get woken by laptop mode, to do a sync -
>  		 * that's the (only!) case where the list would be
> @@ -641,7 +653,7 @@ xfssyncd(
>  					&mp->m_sync_list);
>  		}
>  		list_for_each_entry_safe(work, n, &mp->m_sync_list, w_list)
> -			list_move(&work->w_list, &tmp);
> +			list_move_tail(&work->w_list, &tmp);
>  		spin_unlock(&mp->m_sync_lock);
>  
>  		list_for_each_entry_safe(work, n, &tmp, w_list) {
> 
> 


[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2011-11-23 19:42 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-11-23 16:30 [REVIEW] xfssyncd lost wakes circa 2.6.32 Ben Myers
2011-11-23 19:42 ` NeilBrown [this message]
2011-11-23 20:30   ` Ben Myers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111124064216.5ff54077@notabene.brown \
    --to=neilb@suse.de \
    --cc=ataschner@novell.com \
    --cc=bpm@sgi.com \
    --cc=elder@linux.com \
    --cc=hch@infradead.org \
    --cc=nfbrown@suse.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.