linux-fsdevel.vger.kernel.org archive mirror
From: Christoph Hellwig <hch@lst.de>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: trying to understand READ_META, READ_SYNC, WRITE_SYNC & co
Date: Fri, 25 Jun 2010 13:03:20 +0200
Message-ID: <20100625110319.GA12855@lst.de>
In-Reply-To: <20100624014420.GB3297@redhat.com>

On Wed, Jun 23, 2010 at 09:44:20PM -0400, Vivek Goyal wrote:
> Let me explain the general idling logic and then see if it makes sense in case
> of WRITE_SYNC.
> 
> Once a request has completed, if the cfq queue is empty, we have two choices.
> Either expire the cfq queue and move on to dispatch requests from a
> different queue or we idle on the queue hoping we will get more IO from
> same process/queue.

Queues are basically processes in this context?

> Idling can help (on SATA disks with high seek cost), if
> our guess was right and soon we got another request from same process. We
> cut down on number of seeks hence increased throughput.

I don't really understand the logic behind this.  If we have lots of I/O
that is actually close together, we should generally submit it in
one batch.  That is true for pagecache writeback, that is true for
metadata (at least in XFS..), and it's true for any sane application
doing O_DIRECT / O_SYNC style I/O.

What workloads produce local (not random) writes with small delays
between the I/O requests?

I see the point of this logic for reads where various workloads have
dependent reads that might be close to each other, but I don't really
see any point for writes.
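
A minimal sketch of the choice described above: after a request
completes and the queue is empty, either idle on the queue or expire it.
This is a toy model, not CFQ code. The struct, the enum, and the rule
"idle only after reads" are assumptions made for illustration; the rule
encodes the position taken in the preceding paragraph.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a per-process scheduler queue. */
struct toy_queue {
    int  pending;        /* requests still waiting in this queue */
    bool last_was_read;  /* direction of the request that just completed */
    bool noidle_hint;    /* submitter signalled that no more I/O follows */
};

enum toy_action { TOY_DISPATCH, TOY_IDLE, TOY_EXPIRE };

/* Decide what to do with q once one of its requests has completed. */
enum toy_action toy_on_completion(const struct toy_queue *q)
{
    if (q->pending > 0)
        return TOY_DISPATCH;  /* more work is already queued */
    if (q->noidle_hint)
        return TOY_EXPIRE;    /* told not to wait: move to another queue */
    /* Assumed policy: dependent reads may follow soon, writes should
     * already have been submitted as one batch. */
    return q->last_was_read ? TOY_IDLE : TOY_EXPIRE;
}
```

With this policy an empty queue whose last request was a write is
expired immediately, so no time is spent waiting for a follow-up write.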

> So looks like fsync path will do bunch of IO and then will wait for jbd thread
> to finish the work. In this case idling is waste of time.

Given that ->writepage already does WRITE_SYNC_PLUG I/O, which includes
REQ_NOIDLE, I'm still confused why we still have that issue.
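
A small sketch of how a composite request type can carry a no-idle hint.
The TOY_ names and bit positions are made up for illustration and do not
match the kernel's definitions. That the composite also sets a sync bit,
and that the scheduler only considers idling on sync requests without
the no-idle bit, are assumptions of this sketch.

```c
#include <stdbool.h>

/* Illustrative flag bits only; the kernel's real numbering differs. */
enum {
    TOY_REQ_WRITE  = 1 << 0,
    TOY_REQ_SYNC   = 1 << 1,
    TOY_REQ_NOIDLE = 1 << 2,
};

/* Composite type modelled on WRITE_SYNC_PLUG: sync write, no idling. */
#define TOY_WRITE_SYNC_PLUG (TOY_REQ_WRITE | TOY_REQ_SYNC | TOY_REQ_NOIDLE)

/* Assumed rule: idling is only considered for sync requests that do
 * not carry the no-idle hint. */
static inline bool toy_may_idle(unsigned int flags)
{
    return (flags & TOY_REQ_SYNC) && !(flags & TOY_REQ_NOIDLE);
}
```

Under these assumptions toy_may_idle(TOY_WRITE_SYNC_PLUG) is false, which
is why idling after such writes is unexpected.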

> I guess same will
> be true for umount and sync() path. But same probably is not necessarily true
> for a O_DIRECT writer (database comes to mind), and for O_SYNC writer
> (virtual machines?).

For virtual machines idling seems like a waste of resources.  If we
have sequential I/O we dispatch in batches - in fact qemu even merges
sequential small block I/O it gets from the guest into one large request
that it hands off to the host kernel.  For reads the same caveat as above
applies, as read requests are handed through 1:1 from the guest.

> O_SYNC writers will get little disk share in presence of heavy buffered
> WRITES. If we choose to not special case WRITE_SYNC and continue to
> idle on the queue then we probably are wasting time and reducing overall
> throughput. (The fsync() case Jeff is running into).

Remember that O_SYNC writes are implemented as normal buffered write +
fsync (a range fsync to be exact, but that doesn't change a thing).

And that's what they conceptually are anyway, so treating a normal
buffered write + fsync differently from an O_SYNC write is wrong not only
conceptually but also in implementation.  You have the exact same issue
of handing off work to the journal commit thread in extN.  Note that
the log write (or at least parts of it) will always use WRITE_BARRIER,
which completely bypasses the I/O scheduler.
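
Seen from userspace, the equivalence of an O_SYNC write and a buffered
write followed by fsync can be sketched as below. This is a minimal
illustration, not the kernel implementation: the function names are
hypothetical, a whole-file fsync() stands in for the range fsync
mentioned above, and a short write is simply treated as an error.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes through a descriptor opened with O_SYNC.
 * Returns 0 on success, -1 on any failure. */
int write_osync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0600);
    if (fd < 0)
        return -1;
    int rc = (write(fd, buf, len) == (ssize_t)len) ? 0 : -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}

/* Same data, but as a plain buffered write followed by fsync().
 * Returns 0 on success, -1 on any failure. */
int write_then_fsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    int rc = (write(fd, buf, len) == (ssize_t)len) ? 0 : -1;
    if (rc == 0 && fsync(fd) < 0)
        rc = -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

Both functions return only once the data has been handed to stable
storage, so an I/O scheduler policy that treats one path differently
from the other has nothing to base that on.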

> So one possible way could be that don't try to special case synchronous
> writes and continue to idle on the queue based on other parameters. If
> kernel/higher layers have knowledge that we are not going to issue more
> IO in same context, then they should explicitly call blk_yield(), to
> stop idling and give up slice.

We have no way to know what userspace will do if we are doing
O_SYNC/O_DIRECT style I/O or use fsync.  We know that we will most
likely continue kicking things from the same queue when doing page
writeback.  One thing that should help with this is Jens' explicit
per-process plugging stuff, which I noticed he recently updated to a
current kernel.

