linux-ext4.vger.kernel.org archive mirror
From: Jens Axboe <jens.axboe@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Theodore Tso <tytso@mit.edu>,
	linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org,
	jack@suse.cz
Subject: Re: [PATCH] block_write_full_page: switch synchronous writes to use WRITE_SYNC_PLUG
Date: Wed, 8 Apr 2009 10:08:44 +0200
Message-ID: <20090408080844.GW5178@kernel.dk>
In-Reply-To: <20090407160944.de3c5139.akpm@linux-foundation.org>

On Tue, Apr 07 2009, Andrew Morton wrote:
> On Tue, 7 Apr 2009 18:19:33 -0400
> Theodore Tso <tytso@mit.edu> wrote:
> 
> > Now that we have a distinction between WRITE_SYNC and WRITE_SYNC_PLUG,
> > use WRITE_SYNC_PLUG in __block_write_full_page() to avoid unplugging
> > the block device I/O queue between each page that gets flushed out.
> > 
> > The upstream callers of block_write_full_page() which wait for the
> > writes to finish call wait_on_buffer(), wait_on_writeback_range()
> > (which ultimately calls sync_page(), which calls
> > blk_run_backing_dev(), which will unplug the device queue), and so on.
> > 
> 
> <sob>
> 
> > 
> > We should get this applied to avoid any performance regressions
> > resulting from commit a64c8610.
> > 
> >  fs/buffer.c |    3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> > 
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index 977e12a..95b5390 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -1646,7 +1646,8 @@ static int __block_write_full_page(struct inode *inode, struct page *page,
> >  	struct buffer_head *bh, *head;
> >  	const unsigned blocksize = 1 << inode->i_blkbits;
> >  	int nr_underway = 0;
> > -	int write_op = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE);
> > +	int write_op = (wbc->sync_mode == WB_SYNC_ALL ?
> > +			WRITE_SYNC_PLUG : WRITE);
> >  
> >  	BUG_ON(!PageLocked(page));
> 
> So how does WRITE_SYNC_PLUG differ from WRITE, and what effect does
> this change have upon kernel behaviour?

How about something like this? Comments welcome. Should we move this to
a dedicated header file? fs.h is amazingly cluttered as it is.

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 562d285..6b6597a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -87,6 +87,57 @@ struct inodes_stat_t {
  */
 #define FMODE_NOCMTIME		((__force fmode_t)2048)
 
+/*
+ * The below are the various read and write types that we support. Some of
+ * them include behavioral modifiers that send information down to the
+ * block layer and IO scheduler. Terminology:
+ *
+ *	The block layer uses device plugging to defer IO a little bit, in
+ *	the hope that we will see more IO very shortly. This increases
+ *	coalescing of adjacent IO and thus reduces the number of IOs we
+ *	have to send to the device. It also allows for better queuing,
+ *	if the IO isn't mergeable. If the caller is going to be waiting
+ *	for the IO, then he must ensure that the device is unplugged so
+ *	that the IO is dispatched to the driver.
+ *
+ *	All IO is handled async in Linux. This is fine for background
+ *	writes, but for reads or writes that someone waits for completion
+ *	on, we want to notify the block layer and IO scheduler so that they
+ *	know about it. That allows them to make better scheduling
+ *	decisions. So when the below references 'sync' and 'async', it
+ *	is referencing this priority hint.
+ *
+ * With that in mind, the available types are:
+ *
+ * READ			A normal read operation. Device will be plugged.
+ * READ_SYNC		A synchronous read. Device is not plugged, caller can
+ *			immediately wait on this read without caring about
+ *			unplugging.
+ * READA		Used for read-ahead operations. Lower priority, and the
+ *			block layer could (in theory) choose to ignore this
+ *			request if it runs into resource problems.
+ * WRITE		A normal async write. Device will be plugged.
+ * SWRITE		Like WRITE, but a special case for ll_rw_block() that
+ *			tells it to lock the buffer first. Normally a buffer
+ *			must be locked before doing IO.
+ * WRITE_SYNC_PLUG	Synchronous write. Identical to WRITE, but passes down
+ *			the hint that someone will be waiting on this IO
+ *			shortly.
+ * WRITE_SYNC		Like WRITE_SYNC_PLUG, but also unplugs the device
+ *			immediately after submission. The write equivalent
+ *			of READ_SYNC.
+ * WRITE_ODIRECT	Special case write for O_DIRECT only.
+ * SWRITE_SYNC
+ * SWRITE_SYNC_PLUG	Like WRITE_SYNC/WRITE_SYNC_PLUG, but locks the buffer.
+ *			See SWRITE.
+ * WRITE_BARRIER	Like WRITE, but tells the block layer that all
+ *			previously submitted writes must be safely on storage
+ *			before this one is started. Also guarantees that when
+ *			this write is complete, it itself is also safely on
+ *			storage. Prevents reordering of writes on both sides
+ *			of this IO.
+ *
+ */
 #define RW_MASK		1
 #define RWA_MASK	2
 #define READ 0
@@ -102,6 +153,11 @@ struct inodes_stat_t {
 			(SWRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_NOIDLE))
 #define SWRITE_SYNC	(SWRITE_SYNC_PLUG | (1 << BIO_RW_UNPLUG))
 #define WRITE_BARRIER	(WRITE | (1 << BIO_RW_BARRIER))
+
+/*
+ * These aren't really reads or writes, they pass down information about
+ * parts of the device that are now unused by the file system.
+ */
 #define DISCARD_NOBARRIER (1 << BIO_RW_DISCARD)
 #define DISCARD_BARRIER ((1 << BIO_RW_DISCARD) | (1 << BIO_RW_BARRIER))
 

-- 
Jens Axboe



Thread overview: 41+ messages
2009-03-27 20:24 [PATCH 0/3] Ext3 latency improvement patches Theodore Ts'o
2009-03-27 20:24 ` [PATCH 1/3] block_write_full_page: Use synchronous writes for WBC_SYNC_ALL writebacks Theodore Ts'o
2009-03-27 20:24   ` [PATCH 2/3] ext3: Use WRITE_SYNC for commits which are caused by fsync() Theodore Ts'o
2009-03-27 20:24     ` [PATCH 3/3] ext3: Avoid starting a transaction in writepage when not necessary Theodore Ts'o
2009-03-27 22:23       ` Jan Kara
2009-03-27 23:03         ` Theodore Tso
2009-03-30 13:22           ` Jan Kara
2009-03-27 22:20     ` [PATCH 2/3] ext3: Use WRITE_SYNC for commits which are caused by fsync() Jan Kara
2009-03-27 20:55   ` [PATCH 1/3] block_write_full_page: Use synchronous writes for WBC_SYNC_ALL writebacks Jan Kara
2009-04-07  6:21   ` Andrew Morton
2009-04-07  6:50     ` Andrew Morton
2009-04-07  7:08       ` Jens Axboe
2009-04-07  7:17         ` Jens Axboe
2009-04-07  8:16           ` Jens Axboe
2009-04-07  7:23         ` Andrew Morton
2009-04-07  7:57           ` Jens Axboe
2009-04-07 19:09             ` Theodore Tso
2009-04-07 19:32               ` Jens Axboe
2009-04-07 21:44                 ` Theodore Tso
2009-04-07 22:19                   ` [PATCH] block_write_full_page: switch synchronous writes to use WRITE_SYNC_PLUG Theodore Tso
2009-04-07 23:09                     ` Andrew Morton
2009-04-07 23:46                       ` Theodore Tso
2009-04-08  8:08                       ` Jens Axboe [this message]
2009-04-08 22:34                         ` Andrew Morton
2009-04-09 17:59                           ` Jens Axboe
2009-04-08  6:00                     ` Jens Axboe
2009-04-08 15:26                       ` Theodore Tso
2009-04-08  5:58                   ` [PATCH 1/3] block_write_full_page: Use synchronous writes for WBC_SYNC_ALL writebacks Jens Axboe
2009-04-08 15:25                     ` Theodore Tso
2009-04-07 14:19           ` Theodore Tso
2009-03-27 20:50 ` [PATCH 0/3] Ext3 latency improvement patches Chris Mason
2009-03-27 21:03   ` Chris Mason
2009-03-27 21:19     ` Jan Kara
2009-03-27 21:30     ` Theodore Tso
2009-03-27 21:54       ` Jan Kara
2009-03-27 23:09         ` Theodore Tso
2009-03-28  0:14           ` Jeff Garzik
2009-03-28  0:24             ` David Rees
2009-03-30 14:16               ` Ric Wheeler
2009-03-30 11:23       ` Aneesh Kumar K.V
     [not found]       ` <20090330112330.GA11357@skywalker>
2009-03-30 11:44         ` Chris Mason
