From: Alex Elder <aelder@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 02/10] xfs: Use delayed write for inodes rather than async V2
Date: Fri, 05 Feb 2010 15:38:44 -0600 [thread overview]
Message-ID: <1265405924.2714.117.camel@doink1> (raw)
In-Reply-To: <1265153104-29680-3-git-send-email-david@fromorbit.com>
On Wed, 2010-02-03 at 10:24 +1100, Dave Chinner wrote:
> We currently do background inode flush asynchronously, resulting in
> inodes being written in whatever order the background writeback
> issues them. Not only that, there are also blocking and non-blocking
> asynchronous inode flushes, depending on where the flush comes from.
>
> This patch completely removes asynchronous inode writeback. It
> removes all the strange writeback modes and replaces them with
> either a synchronous flush or a non-blocking delayed write flush.
> That is, inode flushes will only issue IO directly if they are
> synchronous, and background flushing may do nothing if the operation
> would block (e.g. on a pinned inode or buffer lock).
>
> Delayed write flushes will now result in the inode buffer sitting in
> the delwri queue of the buffer cache to be flushed by either an AIL
> push or by the xfsbufd timing out the buffer. This will allow
> accumulation of dirty inode buffers in memory and allow optimisation
> of inode cluster writeback at the xfsbufd level where we have much
> greater queue depths than the block layer elevators. We will also
> get adjacent inode cluster buffer IO merging for free when a later
> patch in the series allows sorting of the delayed write buffers
> before dispatch.
>
> This effectively means that any inode that is written back by
> background writeback will be seen as flush locked during AIL
> pushing, and will result in the buffers being pushed from there.
> This writeback path is currently non-optimal, but the next patch
> in the series will fix that problem.
>
> A side effect of this delayed write mechanism is that background
> inode reclaim will no longer directly flush inodes, nor can it wait
> on the flush lock. The result is that inode reclaim must leave the
> inode in the reclaimable state until it is clean. Hence attempts to
> reclaim a dirty inode in the background will simply skip the inode
> until it is clean and this allows other mechanisms (i.e. xfsbufd) to
> do more optimal writeback of the dirty buffers. As a result, the
> inode reclaim code has been rewritten so that it no longer relies on
> the ambiguous return values of xfs_iflush() to determine whether it
> is safe to reclaim an inode.
>
> Portions of this patch are derived from patches by Christoph
> Hellwig.
>
> Version 2:
> - cleanup reclaim code as suggested by Christoph
> - log background reclaim inode flush errors
> - just pass sync flags to xfs_iflush
I now see that the missing "break;" in the previous patch
doesn't matter as long as this patch is also applied...
This change looks right to me, and it's pretty cool
to see where you're headed with it. I wasn't quite
following a few weeks back when you and Christoph
were discussing this.
I have a few comments on the comments, below. Maybe
you could either confirm or correct what I say below.
> Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
> ---
> fs/xfs/linux-2.6/xfs_super.c | 4 +-
> fs/xfs/linux-2.6/xfs_sync.c | 104 ++++++++++++++++++++++++++++++-----------
> fs/xfs/xfs_inode.c | 75 ++----------------------------
> fs/xfs/xfs_inode.h | 10 ----
> fs/xfs/xfs_inode_item.c | 10 +++-
> fs/xfs/xfs_mount.c | 13 +++++-
> 6 files changed, 102 insertions(+), 114 deletions(-)
>
. . .
> diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
> index 98b8937..a9f6d20 100644
> --- a/fs/xfs/linux-2.6/xfs_sync.c
> +++ b/fs/xfs/linux-2.6/xfs_sync.c
. . .
> @@ -719,21 +720,42 @@ __xfs_inode_clear_reclaim_tag(
Regardless of the state, if the inode has a flush underway
we requeue.
> * shutdown EIO unpin and reclaim
> * clean, unpinned 0 reclaim
> * stale, unpinned 0 reclaim
> - * clean, pinned(*) 0 unpin and reclaim
> - * stale, pinned 0 unpin and reclaim
> - * dirty, async 0 block on flush lock, reclaim
> - * dirty, sync flush 0 block on flush lock, reclaim
> + * clean, pinned(*) 0 requeue
> + * stale, pinned EAGAIN requeue
Actually, if it's pinned (clean or stale) then we either
unpin and reclaim (if synchronous) or requeue (otherwise),
i.e.:
* clean, pinned, async(*) N/A requeue
* stale, pinned, async(*) N/A requeue
* clean, pinned, sync flush 0 unpin and reclaim
* stale, pinned, sync flush EAGAIN unpin and reclaim
> + * dirty, delwri ok 0 requeue
> + * dirty, delwri blocked EAGAIN requeue
> + * dirty, sync flush 0 reclaim
> *
> * (*) dgc: I don't think the clean, pinned state is possible but it gets
> * handled anyway given the order of checks implemented.
> *
> + * As can be seen from the table, the return value of xfs_iflush() is not
> + * sufficient to correctly decide the reclaim action here. The checks in
> + * xfs_iflush() might look like duplicates, but they are not.
> + *
> + * Also, because we get the flush lock first, we know that any inode that has
> + * been flushed delwri has had the flush completed by the time we check that
> + * the inode is clean. The clean inode check needs to be done before flushing
> + * the inode delwri otherwise we would loop forever requeuing clean inodes as
> + * we cannot tell apart a successful delwri flush and a clean inode from the
> + * return value of xfs_iflush().
> + *
> + * Note that because the inode is flushed delayed write by background
> + * writeback, the flush lock may already be held here and waiting on it can
> + * result in very long latencies. Hence for sync reclaims, where we wait on the
> + * flush lock, the caller should push out delayed write inodes first before
> + * trying to reclaim them to minimise the amount of time spent waiting. For
> + * background relaim, we just requeue the inode for the next pass.
> + *
> * Hence the order of actions after gaining the locks should be:
> * bad => reclaim
> * shutdown => unpin and reclaim
> - * pinned => unpin
> + * pinned, delwri => requeue
> + * pinned, sync => unpin
> * stale => reclaim
> * clean => reclaim
> - * dirty => flush, wait and reclaim
> + * dirty, delwri => flush and requeue
> + * dirty, sync => flush, wait and reclaim
> */
> STATIC int
> xfs_reclaim_inode(
. . .
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2010-02-05 21:41 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-02-02 23:24 [PATCH 0/10] Delayed write metadata writeback V4 Dave Chinner
2010-02-02 23:24 ` [PATCH 01/10] xfs: Make inode reclaim states explicit Dave Chinner
2010-02-05 19:06 ` Alex Elder
2010-02-06 0:07 ` Dave Chinner
2010-02-02 23:24 ` [PATCH 02/10] xfs: Use delayed write for inodes rather than async V2 Dave Chinner
2010-02-03 11:17 ` Christoph Hellwig
2010-02-05 21:38 ` Alex Elder [this message]
2010-02-02 23:24 ` [PATCH 03/10] xfs: Don't issue buffer IO direct from AIL push V2 Dave Chinner
2010-02-05 22:51 ` Alex Elder
2010-02-02 23:24 ` [PATCH 04/10] xfs: Sort delayed write buffers before dispatch Dave Chinner
2010-02-05 23:53 ` Alex Elder
2010-02-02 23:24 ` [PATCH 05/10] xfs: Use delay write promotion for dquot flushing Dave Chinner
2010-02-05 23:55 ` Alex Elder
2010-02-02 23:25 ` [PATCH 06/10] xfs: kill the unused XFS_QMOPT_* flush flags V2 Dave Chinner
2010-02-03 11:17 ` Christoph Hellwig
2010-02-02 23:25 ` [PATCH 07/10] xfs: remove invalid barrier optimization from xfs_fsync Dave Chinner
2010-02-02 23:25 ` [PATCH 08/10] xfs: move the inode locking outside xfs_fsync() Dave Chinner
2010-02-03 11:29 ` Christoph Hellwig
2010-02-03 23:08 ` Dave Chinner
2010-02-04 16:07 ` Christoph Hellwig
2010-02-02 23:25 ` [PATCH 09/10] xfs: xfs_fs_write_inode() can fail to write inodes synchronously V2 Dave Chinner
2010-02-03 11:27 ` Christoph Hellwig
2010-02-03 18:07 ` bpm
2010-02-03 20:55 ` Christoph Hellwig
2010-02-03 20:56 ` Christoph Hellwig
2010-02-03 23:02 ` Dave Chinner
2010-02-04 17:36 ` Christoph Hellwig
2010-02-02 23:25 ` [PATCH 10/10] xfs: kill xfs_bawrite Dave Chinner
2010-02-03 11:19 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1265405924.2714.117.camel@doink1 \
--to=aelder@sgi.com \
--cc=david@fromorbit.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox