From: Andrew Morton <akpm@linux-foundation.org>
To: Jens Axboe <jens.axboe@oracle.com>
Cc: Theodore Tso <tytso@mit.edu>,
linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org,
jack@suse.cz
Subject: Re: [PATCH] block_write_full_page: switch synchronous writes to use WRITE_SYNC_PLUG
Date: Wed, 8 Apr 2009 15:34:28 -0700
Message-ID: <20090408153428.6195a442.akpm@linux-foundation.org>
In-Reply-To: <20090408080844.GW5178@kernel.dk>
On Wed, 8 Apr 2009 10:08:44 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> > So how does WRITE_SYNC_PLUG differ from WRITE, and what effect does
> > this change have upon kernel behaviour?
>
> How about something like this. Comments welcome.
It's lovely.
> Should we move this to
> a dedicated header file? fs.h is amazingly cluttered as it is.
Sometime, perhaps.
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 562d285..6b6597a 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -87,6 +87,57 @@ struct inodes_stat_t {
> */
> #define FMODE_NOCMTIME ((__force fmode_t)2048)
>
> +/*
> + * The below are the various read and write types that we support. Some of
> + * them include behavioral modifiers that send information down to the
> + * block layer and IO scheduler. Terminology:
> + *
> + * The block layer uses device plugging to defer IO a little bit, in
> + * the hope that we will see more IO very shortly. This increases
> + * coalescing of adjacent IO and thus reduces the number of IOs we
> + * have to send to the device. It also allows for better queuing,
> + * if the IO isn't mergeable. If the caller is going to be waiting
> + * for the IO, then he must ensure that the device is unplugged so
> + * that the IO is dispatched to the driver.
> + *
> + * All IO is handled async in Linux. This is fine for background
> + * writes, but for reads or writes that someone waits for completion
> + * on, we want to notify the block layer and IO scheduler so that they
> + * know about it. That allows them to make better scheduling
> + * decisions. So when the below references 'sync' and 'async', it
> + * is referencing this priority hint.
> + *
> + * With that in mind, the available types are:
> + *
> + * READ A normal read operation. Device will be plugged.
> + * READ_SYNC A synchronous read. Device is not plugged, caller can
> + * immediately wait on this read without caring about
> + * unplugging.
> + * READA Used for read-ahead operations. Lower priority, and the
> + * block layer could (in theory) choose to ignore this
> + * request if it runs into resource problems.
> + * WRITE A normal async write. Device will be plugged.
> + * SWRITE Like WRITE, but a special case for ll_rw_block() that
> + * tells it to lock the buffer first. Normally a buffer
> + * must be locked before doing IO.
> + * WRITE_SYNC_PLUG Synchronous write. Identical to WRITE, but passes down
> + * the hint that someone will be waiting on this IO
> + * shortly.
From the text, I'd expect WRITE_SYNC_PLUG to, err, unplug!
> + * WRITE_SYNC Like WRITE_SYNC_PLUG, but also unplugs the device
> + * immediately after submission. The write equivalent
> + * of READ_SYNC.
But this contradicts my expectation.
So what does WRITE_SYNC_PLUG really do differently from WRITE?
> + * WRITE_ODIRECT Special case write for O_DIRECT only.
> + * SWRITE_SYNC
> + * SWRITE_SYNC_PLUG Like WRITE_SYNC/WRITE_SYNC_PLUG, but locks the buffer.
> + * See SWRITE.
> + * WRITE_BARRIER Like WRITE, but tells the block layer that all
> + * previously submitted writes must be safely on storage
> + * before this one is started. Also guarantees that when
> + * this write is complete, it itself is also safely on
> + * storage. Prevents reordering of writes on both sides
> + * of this IO.
> + *
> + */
> #define RW_MASK 1
> #define RWA_MASK 2
> #define READ 0
> @@ -102,6 +153,11 @@ struct inodes_stat_t {
> (SWRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_NOIDLE))
> #define SWRITE_SYNC (SWRITE_SYNC_PLUG | (1 << BIO_RW_UNPLUG))
> #define WRITE_BARRIER (WRITE | (1 << BIO_RW_BARRIER))
> +
> +/*
> + * These aren't really reads or writes, they pass down information about
> + * parts of device that are now unused by the file system.
> + */
> #define DISCARD_NOBARRIER (1 << BIO_RW_DISCARD)
> #define DISCARD_BARRIER ((1 << BIO_RW_DISCARD) | (1 << BIO_RW_BARRIER))
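
To make the WRITE_SYNC_PLUG question concrete, here is a minimal userspace sketch of one possible reading of the flag composition. It assumes WRITE_SYNC_PLUG and WRITE_SYNC are built the same way as the SWRITE_SYNC_PLUG and SWRITE_SYNC definitions visible in the hunk above. The SK_ names, the helper functions and the bit positions are made up for illustration; they are not the kernel's identifiers or values.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical bit positions, for illustration only. The real ones come
 * from the kernel's BIO_RW_* flags, whose values are not shown in this patch.
 */
enum {
	SK_RW_SYNCIO = 4,	/* hint: someone will wait on this IO */
	SK_RW_UNPLUG = 5,	/* unplug the device right after submission */
	SK_RW_NOIDLE = 6,	/* hint: do not idle waiting for more IO */
};

#define SK_WRITE 1	/* assumed: plain async write, device gets plugged */

/*
 * Assumed to mirror the SWRITE_SYNC_PLUG / SWRITE_SYNC pattern from the
 * quoted hunk, with WRITE in place of SWRITE.
 */
#define SK_WRITE_SYNC_PLUG \
	(SK_WRITE | (1 << SK_RW_SYNCIO) | (1 << SK_RW_NOIDLE))
#define SK_WRITE_SYNC \
	(SK_WRITE_SYNC_PLUG | (1 << SK_RW_UNPLUG))

/* True if the request carries the "caller will wait" priority hint. */
static bool sk_is_sync(unsigned int rw)
{
	return (rw & (1u << SK_RW_SYNCIO)) != 0;
}

/* True if the request asks for an immediate unplug after submission. */
static bool sk_unplugs(unsigned int rw)
{
	return (rw & (1u << SK_RW_UNPLUG)) != 0;
}
```

If that reading is right, sk_is_sync() is true for both SK_WRITE_SYNC_PLUG and SK_WRITE_SYNC but false for SK_WRITE, while sk_unplugs() is true only for SK_WRITE_SYNC. In other words the _PLUG variant differs from WRITE only in the hint bits and leaves the device plugged, so the caller still has to unplug before waiting.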