Re: trying to understand READ_META, READ_SYNC, WRITE_SYNC & co

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jens Axboe <axboe@kernel.dk>
To: Christoph Hellwig <hch@lst.de>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: trying to understand READ_META, READ_SYNC, WRITE_SYNC & co
Date: Mon, 21 Jun 2010 20:56:55 +0200	[thread overview]
Message-ID: <4C1FB5F7.3070908@kernel.dk> (raw)
In-Reply-To: <20100621110436.GA4056@lst.de>

On 21/06/10 13.04, Christoph Hellwig wrote:
> On Mon, Jun 21, 2010 at 12:04:06PM +0200, Jens Axboe wrote:
>> It's also annotation for blktrace, so you can tell which parts of the IO
>> is meta data etc. The scheduler impact is questionable, I doubt it makes
>> a whole lot of difference.
> 
>> For analysis purposes, annotating all meta data IOs as such would be
>> beneficial (reads as well as writes). As mentioned above, the current
>> scheduler impact isn't huge. There may be some interesting test and
>> benchmarks there for improving that part.
> 
> As mentioned in the previous mail the use of the flag is not very
> wide spread. Currently it's only ext3/ext3 inodes and directories as well
> as all metadata I/O in gfs2 that gets marked this way.  And I'd be much more
> comfortable to add more annotations if it didn't also have some form
> of schedule impact.  The I/O schedules in general and cfq in particular
> have caused us far too many issues with such subtile differences.

FWIW, Windows marks meta data writes and they go out with FUA set
on SATA disks. And SATA firmware prioritizes FUA writes, it's essentially
a priority bit as well as a platter access bit. So at least we have some
one else using a meta data boost. I agree that it would be a lot more
trivial to add the annotations if they didn't have scheduler impact
as well, but I still think it's a sane thing.

>>> 	This one is used in quite a few places, with many of them
>>> 	obsfucated by macros like rw_is_sync, rq_is_sync and
>>> 	cfq_bio_sync.  In general all uses seem to imply giving
>>> 	a write request the same priority as a read request and
>>> 	treat it as synchronous.  I could not spot a place where
>>> 	it actually has any effect on reads.
>>
>> Reads are sync by nature in the block layer, so they don't get that
>> special annotation.
> 
> Well, we do give them this special annotation in a few places, but we
> don't actually use it.

For unplugging?

>> So a large part of that problem is the overloaded meaning of sync. For
>> some cases it means "unplug on issue", and for others it means that the
>> IO itself is syncronous. The other nasty bit is the implicit plugging
>> that happens behind the back of the submitter, but that's an issue to
>> tackle separately. I'd suggest avoiding unnecessary churn in naming of
>> those.
> 
> Well, the current naming is extremly confusing.  The best thing I could
> come up with is to completely drop READ_SYNC and WRITE_SYNC and just
> pass REQ_UNPLUG explicitly together with READ / WRITE_SYNC_PLUG.
> There's only 5 respective 8 users of them, so explicitly documenting
> our intentions there seems useful.  Especially if we want to add more
> _META annotation in which case the simple READ_* / WRITE_* macros
> don't do anymore either.  Similarly it might be useful to remove
> READ_META/WRITE_META and replace them with explicit | REQ_META, which
> is just about as short and a lot more descriptive, especially for
> synchronous metadata writes.

Sure, we can add the explicit plug naming, I don't have an issue
with that.

>>> 	Why do O_DIRECT writes not want to set REQ_NOIDLE (and that
>>> 	exactly does REQ_NOIDLE mean anyway).  It's the only sync writes
>>> 	that do not set it, so if this special case went away we
>>> 	could get rid of the flag and key it off REQ_SYNC.
>>
>> See above for NOIDLE. You kill O_DIRECT write throughput if you don't
>> idle at the end of a write, if you have other activity on the disk.
> 
> Ok, makes sense.  Would you mind taking a patch to kill the
> WRITE_ODIRECT_PLUG and just do a
> 
> 	/*
> 	 * O_DIRECT writes are synchronous, but we must not disable the
> 	 * idling logic in CFQ to avoid killing performance.
> 	 */
> 	if (rw & WRITE)
> 		rw |= REQ_SYNC;

At the caller site? Sure.

> But that leaves the question why disabling the idling logical for
> data integrity ->writepage is fine?  This gets called from ->fsync
> or O_SYNC writes and will have the same impact as O_DIRECT writes.

We have never enabled idling for those. O_SYNC should get a nice
boost too, it just needs to be benchmarked and tested and then
there would be no reason not to add it.

-- 
Jens Axboe

next prev parent reply	other threads:[~2010-06-21 18:56 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-21  9:48 trying to understand READ_META, READ_SYNC, WRITE_SYNC & co Christoph Hellwig
2010-06-21 10:04 ` Jens Axboe
2010-06-21 11:04   ` Christoph Hellwig
2010-06-21 18:56     ` Jens Axboe [this message]
2010-06-21 19:14       ` Christoph Hellwig
2010-06-21 19:16         ` Jens Axboe
2010-06-21 19:20           ` Christoph Hellwig
2010-06-21 21:36         ` Vivek Goyal
2010-06-23 10:01           ` Christoph Hellwig
2010-06-24  1:44             ` Vivek Goyal
2010-06-25 11:03               ` Christoph Hellwig
2010-06-26  3:35                 ` Vivek Goyal
2010-06-26 10:05                   ` Christoph Hellwig
2010-06-26 11:20                     ` Jens Axboe
2010-06-26 11:56                       ` Christoph Hellwig
2010-06-27 15:44                   ` Jeff Moyer
2010-06-29  9:06                     ` Corrado Zoccolo
2010-06-29 12:30                       ` Vivek Goyal
2010-06-30 15:30                         ` Corrado Zoccolo
2010-06-26  9:25                 ` Nick Piggin
2010-06-26  9:27                   ` Christoph Hellwig
2010-06-26 10:10                     ` Nick Piggin
2010-06-26 10:16                       ` Christoph Hellwig
2010-06-21 18:52   ` Jeff Moyer
2010-06-21 18:58     ` Jens Axboe
2010-06-21 19:08       ` Jeff Moyer
2010-06-23  9:26       ` Christoph Hellwig
2010-06-21 20:25   ` Vivek Goyal
2010-06-23 10:02     ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4C1FB5F7.3070908@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).