From: Christoph Hellwig <hch@infradead.org>
To: Curt Wohlgemuth <curtw@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	fengguang.wu@intel.com
Subject: Re: [PATCH] writeback: Don't wait for completion in writeback_inodes_sb_nr
Date: Wed, 29 Jun 2011 04:11:55 -0400
Message-ID: <20110629081155.GA5558@infradead.org>
In-Reply-To: <BANLkTimKtwVZo3D+w9AB2-C83d08YROcMw@mail.gmail.com>

On Tue, Jun 28, 2011 at 06:56:41PM -0700, Curt Wohlgemuth wrote:
> I don't quite understand this.  It's true that all IO done as a result
> of calling wb_writeback() on this work item won't finish before the
> completion takes place, but sending all those pages in flight *will*
> take place.  And that's a lot of time.  To wait on this before we then
> call sync_inodes_sb(), and do it all over again, seems odd at best.
> 
> Pre-2.6.35 kernels would start non-integrity sync writeback and
> immediately return, which would seem like a reasonable "prefetch-y"
> thing to do, considering it's going to be immediately followed by a
> data integrity sync writeback operation.

The only old kernel I have around is 2.6.16, so looking at that one
for old semantics:

 - it first does a wakeup_pdflush(), which does a truly asynchronous
   writeback of everything in the system.  This is replaced by a
   wakeup_flusher_threads(0) call in 3.0, which has pretty similar
   semantics.
 - after that things change a bit, as we reordered sync quite a bit to
   a) improve data integrity by properly ordering the steps of the sync,
   and b) share code between the global sync and the per-fs sync used by
   umount, sys_syncfs and a few other callers.  But both the old and the
   new code do a semi-sync pass and a sync pass of per-sb writeback
   (sketched below).  In the old code this is done from the calling
   thread's context, while the newer code moved it to the flusher thread
   but still waits for it with the completion.  No major change in
   semantics as far as I can see.
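
Roughly, from memory, the per-sb part in 2.6.35..3.0 looks like the
sketch below (paraphrased from fs/sync.c, so the details may be off):

	static int __sync_filesystem(struct super_block *sb, int wait)
	{
		if (wait)
			sync_inodes_sb(sb);	/* integrity pass, waits for I/O */
		else
			writeback_inodes_sb(sb);	/* semi-sync pass: queues the
							   work item and waits for its
							   completion, not for the I/O */

		if (sb->s_op->sync_fs)
			sb->s_op->sync_fs(sb, wait);
		return __sync_blockdev(sb->s_bdev, wait);
	}

and sys_sync then is basically:

	wakeup_flusher_threads(0);	/* truly async, system-wide */
	sync_filesystems(0);		/* semi-sync pass for every sb */
	sync_filesystems(1);		/* sync pass for every sb */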

The idea of the async pass is to start writeback on as many pages as
possible before actively having to wait on them.  I'd agree with your
assessment that the writeback_inodes_sb call might not really help all
that much, given that a lot of the waiting might not actually be for I/O
completion but e.g. for block-level throttling (or maybe cgroups in your
case?).
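
As a userspace illustration of the same idea (just an analogue, not the
kernel path, and sync_file_range(2) covers neither metadata nor the
drive cache):

	#define _GNU_SOURCE
	#include <fcntl.h>

	/* start writeback on every file first, without waiting ... */
	static void kick_writeback(int *fds, int nr)
	{
		int i;

		for (i = 0; i < nr; i++)
			sync_file_range(fds[i], 0, 0, SYNC_FILE_RANGE_WRITE);
	}

	/* ... and only then wait; while we wait on the first file, the
	   I/O for all the others is already in flight */
	static void wait_writeback(int *fds, int nr)
	{
		int i;

		for (i = 0; i < nr; i++)
			sync_file_range(fds[i], 0, 0,
					SYNC_FILE_RANGE_WAIT_BEFORE |
					SYNC_FILE_RANGE_WRITE |
					SYNC_FILE_RANGE_WAIT_AFTER);
	}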

For sys_sync I'm pretty sure we could simply remove the
writeback_inodes_sb call and get just as good if not better performance,
but we'd still need a solution for the other sync_filesystem callers,
assuming the first, non-waiting pass actually benefits them.  Maybe you
can run some sys_syncfs microbenchmarks to check that?  Note that doing
multiple passes generally doesn't actually help live-lock avoidance
either, but Wu has recently done some work in that area.
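
A first cut at such a microbenchmark could look like the sketch below -
the file name and sizes are made up, and it goes through syscall()
directly because not every glibc has a syncfs() wrapper yet:

	/* Rough sys_syncfs timing sketch: dirty ~1GB of page cache on the
	 * target filesystem, then time a syncfs() of it.  Run it on kernels
	 * with and without the extra writeback_inodes_sb() pass and compare.
	 * Build with: gcc -O2 -o syncfs-bench syncfs-bench.c -lrt */
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/syscall.h>
	#include <time.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		static char buf[1 << 20];	/* 1MB chunk */
		struct timespec t0, t1;
		int fd, i;

		if (argc < 2) {
			fprintf(stderr, "usage: %s <file on target fs>\n", argv[0]);
			return 1;
		}

		fd = open(argv[1], O_CREAT | O_WRONLY | O_TRUNC, 0644);
		if (fd < 0) {
			perror("open");
			return 1;
		}

		memset(buf, 'x', sizeof(buf));
		for (i = 0; i < 1024; i++) {
			if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
				perror("write");
				return 1;
			}
		}

		clock_gettime(CLOCK_MONOTONIC, &t0);
		if (syscall(__NR_syncfs, fd) < 0) {	/* syncfs() needs 2.6.39+ */
			perror("syncfs");
			return 1;
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);

		printf("syncfs: %.3f seconds\n",
		       (t1.tv_sec - t0.tv_sec) +
		       (t1.tv_nsec - t0.tv_nsec) / 1e9);
		return 0;
	}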

Another thing we discussed a while ago is moving the writeback done by
sync_inodes_sb, or at least the waiting, back into the calling thread's
context so that it doesn't block the flusher threads from getting real
work done.


Thread overview: 34+ messages
2011-06-28 23:43 [PATCH] writeback: Don't wait for completion in writeback_inodes_sb_nr Curt Wohlgemuth
2011-06-29  0:54 ` Dave Chinner
2011-06-29  1:56   ` Curt Wohlgemuth
2011-06-29  8:11     ` Christoph Hellwig [this message]
2011-06-29 16:57       ` Jan Kara
2011-06-29 17:55         ` Christoph Hellwig
2011-06-29 19:15           ` Jan Kara
2011-06-29 20:12             ` Christoph Hellwig
2011-06-30 12:15               ` Jan Kara
2011-06-30 12:37                 ` Christoph Hellwig
2011-07-01 22:55             ` Curt Wohlgemuth
2011-07-02 11:32               ` Christoph Hellwig
2011-07-11 17:00               ` Jan Kara
2011-07-11 17:11                 ` Curt Wohlgemuth
2011-07-11 19:48                   ` Jan Kara
2011-07-11 19:51                     ` Curt Wohlgemuth
2011-07-11 20:11                     ` Christoph Hellwig
2011-07-12 10:34                       ` Jan Kara
2011-07-12 10:41                         ` Christoph Hellwig
2011-07-12 22:37                           ` Jan Kara
2011-07-14 16:29                             ` Curt Wohlgemuth
2011-07-14 23:08                               ` Jan Kara
2011-07-19 16:56                                 ` Christoph Hellwig
2011-07-21 18:35                                   ` Jan Kara
2011-07-22 15:16                                     ` Christoph Hellwig
2011-07-19 16:53                               ` Christoph Hellwig
2011-07-19 16:51                             ` Christoph Hellwig
2011-07-20 22:00                               ` Jan Kara
2011-07-22 15:11                                 ` Christoph Hellwig
2011-06-29 17:26       ` Curt Wohlgemuth
2011-06-29 18:00         ` Christoph Hellwig
2011-06-29 21:30           ` Curt Wohlgemuth
2011-07-19 15:54             ` Christoph Hellwig
2011-06-29  6:42   ` Christoph Hellwig
