linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@lst.de>
To: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>,
	jaxboe@fusionio.com, linux-fsdevel@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	James.Bottomley@suse.de, tytso@mit.edu, chris.mason@oracle.com,
	swhiteho@redhat.com, konishi.ryusuke@lab.ntt.co.jp,
	dm-devel@redhat.com, vst@vlnb.net, jack@suse.cz,
	rwheeler@redhat.com, hare@suse.de
Subject: Re: [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush
Date: Sat, 14 Aug 2010 12:36:54 +0200	[thread overview]
Message-ID: <20100814103654.GA13292@lst.de> (raw)
In-Reply-To: <4C655BE5.4070809@kernel.org>

On Fri, Aug 13, 2010 at 04:51:17PM +0200, Tejun Heo wrote:
> Do you want to change the whole thing in a single commit?  That would
> be a pretty big invasive patch touching multiple subsystems.

We can just stop draining in the block layer in the first patch, then
stop doing the stuff in md/dm/etc in the following and then do the
final renaming patches.  It would still be less patches then now, but
keep things working through the whole transition, which would really
help biseting any problems.

> +			if (req->cmd_flags & REQ_FUA)
> +				vbr->out_hdr.type |= VIRTIO_BLK_T_FUA;

I'd suggest not adding FUA support to virtio yet.  Just using the flush
feature gives you a fully working barrier implementation.

Eventually we might want to add a flag in the block queue to send
REQ_FLUSH|REQ_FUA request through to virtio directly so that we can
avoid separate pre- and post flushes, but I really want to benchmark if
it makes an impact on real life setups first.

> Index: block/drivers/md/linear.c
> ===================================================================
> --- block.orig/drivers/md/linear.c
> +++ block/drivers/md/linear.c
> @@ -294,8 +294,8 @@ static int linear_make_request (mddev_t
>  	dev_info_t *tmp_dev;
>  	sector_t start_sector;
> 
> -	if (unlikely(bio->bi_rw & REQ_HARDBARRIER)) {
> -		md_barrier_request(mddev, bio);
> +	if (unlikely(bio->bi_rw & REQ_FLUSH)) {
> +		md_flush_request(mddev, bio);

We only need the special md_flush_request handling for
empty REQ_FLUSH requests.  REQ_WRITE | REQ_FLUSH just need the
flag propagated to the underlying devices.

> +static void md_end_flush(struct bio *bio, int err)
>  {
>  	mdk_rdev_t *rdev = bio->bi_private;
>  	mddev_t *mddev = rdev->mddev;
> 
>  	rdev_dec_pending(rdev, mddev);
> 
>  	if (atomic_dec_and_test(&mddev->flush_pending)) {
> +		/* The pre-request flush has finished */
> +		schedule_work(&mddev->flush_work);

Once we only handle empty barriers here we can directly call bio_endio
instead of first scheduling a work queue.Once we only handle empty
barriers here we can directly call bio_endio and the super wakeup
instead of first scheduling a work queue.

>  	while ((bio = bio_list_pop(writes))) {
> -		if (unlikely(bio_empty_barrier(bio))) {
> +		if ((bio->bi_rw & REQ_FLUSH) && !bio_has_data(bio)) {

I kept bio_empty_barrier as bio_empty_flush, which actually is a quite
useful macro for the bio based drivers.

> @@ -621,7 +621,7 @@ static void dec_pending(struct dm_io *io
>  			 */
>  			spin_lock_irqsave(&md->deferred_lock, flags);
>  			if (__noflush_suspending(md)) {
> -				if (!(io->bio->bi_rw & REQ_HARDBARRIER))
> +				if (!(io->bio->bi_rw & REQ_FLUSH))

I suspect we don't actually need to special case flushes here anymore.


> @@ -633,14 +633,14 @@ static void dec_pending(struct dm_io *io
>  		io_error = io->error;
>  		bio = io->bio;
> 
> -		if (bio->bi_rw & REQ_HARDBARRIER) {
> +		if (bio->bi_rw & REQ_FLUSH) {
>  			/*
> -			 * There can be just one barrier request so we use
> +			 * There can be just one flush request so we use
>  			 * a per-device variable for error reporting.
>  			 * Note that you can't touch the bio after end_io_acct
>  			 */
> -			if (!md->barrier_error && io_error != -EOPNOTSUPP)
> -				md->barrier_error = io_error;
> +			if (!md->flush_error)
> +				md->flush_error = io_error;

And we certainly do not need any special casing here.  See my patch.

>  {
>  	int rw = rq_data_dir(clone);
>  	int run_queue = 1;
> -	bool is_barrier = clone->cmd_flags & REQ_HARDBARRIER;
> +	bool is_flush = clone->cmd_flags & REQ_FLUSH;
>  	struct dm_rq_target_io *tio = clone->end_io_data;
>  	struct mapped_device *md = tio->md;
>  	struct request *rq = tio->orig;
> 
> -	if (rq->cmd_type == REQ_TYPE_BLOCK_PC && !is_barrier) {
> +	if (rq->cmd_type == REQ_TYPE_BLOCK_PC && !is_flush) {

We never send flush requests as REQ_TYPE_BLOCK_PC anymore, so no need
for the second half of this conditional.

> +	if (!is_flush)
> +		blk_end_request_all(rq, error);
> +	else {
>  		if (unlikely(error))
> -			store_barrier_error(md, error);
> +			store_flush_error(md, error);
>  		run_queue = 0;
> -	} else
> -		blk_end_request_all(rq, error);
> +	}

Flush requests can now be completed normally.

> @@ -1308,11 +1302,11 @@ static void __split_and_process_bio(stru
> 
>  	ci.map = dm_get_live_table(md);
>  	if (unlikely(!ci.map)) {
> -		if (!(bio->bi_rw & REQ_HARDBARRIER))
> +		if (!(bio->bi_rw & REQ_FLUSH))
>  			bio_io_error(bio);
>  		else
> -			if (!md->barrier_error)
> -				md->barrier_error = -EIO;
> +			if (!md->flush_error)
> +				md->flush_error = -EIO;

No need for the special error handling here, flush requests can now
be completed normally.

> @@ -1417,11 +1419,11 @@ static int _dm_request(struct request_qu
>  	part_stat_unlock();
> 
>  	/*
> -	 * If we're suspended or the thread is processing barriers
> +	 * If we're suspended or the thread is processing flushes
>  	 * we have to queue this io for later.
>  	 */
>  	if (unlikely(test_bit(DMF_QUEUE_IO_TO_THREAD, &md->flags)) ||
> -	    unlikely(bio->bi_rw & REQ_HARDBARRIER)) {
> +	    (bio->bi_rw & REQ_FLUSH)) {
>  		up_read(&md->io_lock);

AFAICS this is only needed for the old barrier code, no need for this
for pure flushes.

> @@ -1464,10 +1466,7 @@ static int dm_request(struct request_que
> 
>  static bool dm_rq_is_flush_request(struct request *rq)
>  {
> -	if (rq->cmd_flags & REQ_FLUSH)
> -		return true;
> -	else
> -		return false;
> +	return rq->cmd_flags & REQ_FLUSH;
>  }

It's probably worth just killing this wrapper.


>  void dm_dispatch_request(struct request *rq)
> @@ -1520,7 +1519,7 @@ static int setup_clone(struct request *c
>  	if (dm_rq_is_flush_request(rq)) {
>  		blk_rq_init(NULL, clone);
>  		clone->cmd_type = REQ_TYPE_FS;
> -		clone->cmd_flags |= (REQ_HARDBARRIER | WRITE);
> +		clone->cmd_flags |= (REQ_FLUSH | WRITE);
>  	} else {
>  		r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
>  				      dm_rq_bio_constructor, tio);

My suspicion is that we can get rif of all that special casing here
and just use blk_rq_prep_clone once it's been updated to propagate
REQ_FLUSH, similar to the DISCARD flag.

I also suspect that there is absolutely no need to the barrier work
queue once we stop waiting for outstanding request.  But then again
the request based dm code still somewhat confuses me.

> +static void process_flush(struct mapped_device *md, struct bio *bio)
>  {
> +	md->flush_error = 0;
> +
> +	/* handle REQ_FLUSH */
>  	dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE);
> 
> -	bio_init(&md->barrier_bio);
> -	md->barrier_bio.bi_bdev = md->bdev;
> -	md->barrier_bio.bi_rw = WRITE_BARRIER;
> -	__split_and_process_bio(md, &md->barrier_bio);
> +	bio_init(&md->flush_bio);
> +	md->flush_bio.bi_bdev = md->bdev;
> +	md->flush_bio.bi_rw = WRITE_FLUSH;
> +	__split_and_process_bio(md, &md->flush_bio);

There's not need to use a separate flush_bio here.
__split_and_process_bio does the right thing for empty REQ_FLUSH
requests.  See my patch for how to do this differenty.  And yeah,
my version has been tested.

  reply	other threads:[~2010-08-14 10:36 UTC|newest]

Thread overview: 109+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-08-12 12:41 [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush Tejun Heo
2010-08-12 12:41 ` [PATCH 01/11] block/loop: queue ordered mode should be DRAIN_FLUSH Tejun Heo
2010-08-12 12:41 ` [PATCH 02/11] block: kill QUEUE_ORDERED_BY_TAG Tejun Heo
2010-08-13 12:56   ` Vladislav Bolkhovitin
2010-08-13 13:06     ` Christoph Hellwig
2010-08-12 12:41 ` [PATCH 03/11] block: deprecate barrier and replace blk_queue_ordered() with blk_queue_flush() Tejun Heo
2010-08-14  1:07   ` Jeremy Fitzhardinge
2010-08-14  9:42     ` hch
2010-08-16 20:38       ` Jeremy Fitzhardinge
2010-08-12 12:41 ` [PATCH 04/11] block: remove spurious uses of REQ_HARDBARRIER Tejun Heo
2010-08-12 12:41 ` [PATCH 05/11] block: misc cleanups in barrier code Tejun Heo
2010-08-12 12:41 ` [PATCH 06/11] block: drop barrier ordering by queue draining Tejun Heo
2010-08-12 12:41 ` [PATCH 07/11] block: rename blk-barrier.c to blk-flush.c Tejun Heo
2010-08-12 12:41 ` [PATCH 08/11] block: rename barrier/ordered to flush Tejun Heo
2010-08-17 13:26   ` Christoph Hellwig
2010-08-17 16:23     ` Tejun Heo
2010-08-17 17:08       ` Christoph Hellwig
2010-08-18  6:23         ` Tejun Heo
2010-08-12 12:41 ` [PATCH 09/11] block: implement REQ_FLUSH/FUA based interface for FLUSH/FUA requests Tejun Heo
2010-08-12 12:41 ` [PATCH 10/11] fs, block: propagate REQ_FLUSH/FUA interface to upper layers Tejun Heo
2010-08-12 21:24   ` Jan Kara
2010-08-13  7:19     ` Tejun Heo
2010-08-13  7:47       ` Christoph Hellwig
2010-08-16 16:33   ` [PATCH UPDATED " Tejun Heo
2010-08-12 12:41 ` [PATCH 11/11] block: use REQ_FLUSH in blkdev_issue_flush() Tejun Heo
2010-08-13 11:48 ` [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush Christoph Hellwig
2010-08-13 13:48   ` Tejun Heo
2010-08-13 14:38     ` Christoph Hellwig
2010-08-13 14:51       ` Tejun Heo
2010-08-14 10:36         ` Christoph Hellwig [this message]
2010-08-17  9:59           ` Tejun Heo
2010-08-17 13:19             ` Christoph Hellwig
2010-08-17 16:41               ` Tejun Heo
2010-08-17 16:59                 ` Christoph Hellwig
2010-08-18  6:35                   ` Tejun Heo
2010-08-18  8:11                     ` Tejun Heo
2010-08-20  8:26                   ` Kiyoshi Ueda
2010-08-23 12:14                     ` Tejun Heo
2010-08-23 14:17                       ` Mike Snitzer
2010-08-24 10:24                         ` Kiyoshi Ueda
2010-08-24 16:59                           ` Tejun Heo
2010-08-24 17:52                             ` Mike Snitzer
2010-08-24 18:14                               ` Tejun Heo
2010-08-25  8:00                             ` Kiyoshi Ueda
2010-08-25 15:28                               ` Mike Snitzer
2010-08-27  9:47                                 ` Kiyoshi Ueda
2010-08-27 13:49                                   ` Mike Snitzer
2010-08-30  6:13                                     ` Kiyoshi Ueda
2010-09-01  0:55                                       ` safety of retrying SYNCHRONIZE CACHE [was: Re: [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush] Mike Snitzer
2010-09-01  7:32                                         ` Hannes Reinecke
2010-09-01  7:38                                           ` Hannes Reinecke
2010-08-25 15:59                               ` [RFC] training mpath to discern between SCSI errors (was: Re: [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush) Mike Snitzer
2010-08-25 19:15                                 ` [RFC] training mpath to discern between SCSI errors Mike Christie
2010-08-30 11:38                                 ` Hannes Reinecke
2010-08-30 12:07                                   ` Sergei Shtylyov
2010-08-30 12:39                                     ` Hannes Reinecke
2010-08-30 14:52                                       ` [dm-devel] " Hannes Reinecke
2010-10-18  8:09                                         ` Jun'ichi Nomura
2010-10-18 11:55                                           ` Hannes Reinecke
2010-10-19  4:03                                             ` Jun'ichi Nomura
2010-11-19  3:11                                             ` [dm-devel] " Malahal Naineni
2010-11-30 22:59                                               ` Mike Snitzer
2010-12-07 23:16                                                 ` [RFC PATCH 0/3] differentiate between I/O errors Mike Snitzer
2010-12-07 23:16                                                   ` [RFC PATCH v2 1/3] scsi: Detailed " Mike Snitzer
2010-12-07 23:16                                                   ` [RFC PATCH v2 2/3] dm mpath: propagate target errors immediately Mike Snitzer
2010-12-07 23:16                                                   ` [RFC PATCH 3/3] block: improve detail in I/O error messages Mike Snitzer
2010-12-08 11:28                                                     ` Sergei Shtylyov
2010-12-08 15:05                                                       ` [PATCH v2 " Mike Snitzer
2010-12-10 23:40                                                   ` [RFC PATCH 0/3] differentiate between I/O errors Malahal Naineni
2011-01-14  1:15                                                     ` Mike Snitzer
2010-12-17  9:47                                                 ` training mpath to discern between SCSI errors Hannes Reinecke
2010-12-17 14:06                                                   ` Mike Snitzer
2011-01-14  1:09                                                     ` Mike Snitzer
2011-01-14  7:45                                                       ` Hannes Reinecke
2011-01-14 13:59                                                         ` Mike Snitzer
2010-08-24 17:11                       ` [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush Vladislav Bolkhovitin
2010-08-24 23:14                         ` Alan Cox
2010-08-13 12:55 ` Vladislav Bolkhovitin
2010-08-13 13:17   ` Christoph Hellwig
2010-08-18 19:29     ` Vladislav Bolkhovitin
2010-08-13 13:21   ` Tejun Heo
2010-08-18 19:30     ` Vladislav Bolkhovitin
2010-08-19  9:51       ` Tejun Heo
2010-08-30  9:54         ` Hannes Reinecke
2010-08-30 20:34           ` Vladislav Bolkhovitin
2010-08-18  9:46 ` Christoph Hellwig
2010-08-19  9:57   ` Tejun Heo
2010-08-19 10:20     ` Christoph Hellwig
2010-08-19 10:22       ` Tejun Heo
2010-08-20 13:22 ` Christoph Hellwig
2010-08-20 15:18   ` Ric Wheeler
2010-08-20 16:00     ` Chris Mason
2010-08-20 16:02       ` Ric Wheeler
2010-08-23 12:30     ` Tejun Heo
2010-08-23 12:48       ` Christoph Hellwig
2010-08-23 13:58         ` Ric Wheeler
2010-08-23 14:01           ` Jens Axboe
2010-08-23 14:08             ` Christoph Hellwig
2010-08-23 14:13               ` Tejun Heo
2010-08-23 14:19                 ` Christoph Hellwig
2010-08-25 11:31               ` Jens Axboe
2010-08-30 10:04               ` Hannes Reinecke
2010-08-23 15:19             ` Ric Wheeler
2010-08-23 16:45               ` Sergey Vlasov
2010-08-23 16:49                 ` [dm-devel] " Ric Wheeler
2010-08-23 12:36   ` Tejun Heo
2010-08-23 14:05     ` Christoph Hellwig
2010-08-23 14:15 ` [PATCH] block: simplify queue_next_fseq Christoph Hellwig
2010-08-23 16:28   ` OT grammar nit " John Robinson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100814103654.GA13292@lst.de \
    --to=hch@lst.de \
    --cc=James.Bottomley@suse.de \
    --cc=chris.mason@oracle.com \
    --cc=dm-devel@redhat.com \
    --cc=hare@suse.de \
    --cc=jack@suse.cz \
    --cc=jaxboe@fusionio.com \
    --cc=konishi.ryusuke@lab.ntt.co.jp \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=rwheeler@redhat.com \
    --cc=swhiteho@redhat.com \
    --cc=tj@kernel.org \
    --cc=tytso@mit.edu \
    --cc=vst@vlnb.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).