[PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier
@ 2005-07-26 15:45 Tejun Heo
  2005-07-26 15:45 ` [PATCH linux-2.6-block:master 01/10] blk: add @uptodate to end_that_request_last() and @error to rq_end_io_fn() Tejun Heo
                   ` (9 more replies)
  0 siblings, 10 replies; 16+ messages in thread
From: Tejun Heo @ 2005-07-26 15:45 UTC (permalink / raw)
  To: axboe, jgarzik, James.Bottomley, bzolnier; +Cc: linux-kernel

 Hello, Jens, James, Jeff and Bartlomiej.

 This is the third posting of blk ordered reimplementation.  Changes
since the scond posting are...

 * Dispatch queue patchset is reordered in front of this patchset.
 * Draining bug fix
 * Proper requeueing of requests in an ordered sequence for TAG
   ordered queues.
 * Fallback mechanism stripped out.  This also makes -EOPNOTSUPP
   changes in SCSI and IDE unnecessary.
 * TAG ordering request issue fix
 * SCSI/libata/IDE patches splitted as requested
 * Hopefully, changes to libata and IDE are more acceptable
 * Documentation/block/barrier.txt added

 This patchset is part of the patch series described in the following mail.
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238377602033&w=2

 As all previous patches don't really concern subsystems other than
block layer.  They are only sent to Jens and LKML.  If they're needed,
preceding patches are...

fix-elevator_find:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238419731035&w=2

fix-cfq_find_next_crq:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238456712203&w=2

generic-dispatch-queue:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238633622498&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238647101632&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238693809889&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238675926411&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238660407245&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238708918364&w=2

reimplement-elevator-switch:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238771305863&w=2

 For general description please read barrier.txt and the previous posting.
	http://marc.theaimsgroup.com/?l=linux-kernel&m=111795127124020&w=2

 ==== Some excerpts from barrier.txt. ====

* SCSI layer currently can't use TAG ordering even if the drive,
  controller and driver support it.  The problem is that SCSI midlayer
  request dispatch function is not atomic.  It releases queue lock and
  switch to SCSI host lock during issue and it's possible and likely
  to happen in time that requests change their relative positions.
  Once this problem is solved, TAG ordering can be enabled.

* Error handling.  Currently, block layer will report error to upper
  layer if any of requests in an ordered sequence fails.
  Unfortunately, this doesn't seem to be enough.  Look at the
  following request flow.  QUEUE_ORDERED_TAG_FLUSH is in use.

 [0] [1] [2] [3] [pre] [barrier] [post] < [4] [5] [6] ... >
                                          still in elevator

  Let's say request [2], [3] are write requests to update file system
  metadata (journal or whatever) and [barrier] is used to mark that
  those updates are valid.  Consider the following sequence.

 i.     Requests [0] ~ [post] leaves the request queue and enters
        low-level driver.
 ii.    After a while, unfortunately, something goes wrong and the
        drive fails [2].  Note that any of [0], [1] and [3] could have
        completed by this time, but [pre] couldn't have been finished
        as the drive must process it in order and it failed before
        processing that command.
 iii.   Error handling kicks in and determines that the error is
        unrecoverable and fails [2], and resumes operation.
 iv.    [pre] [barrier] [post] gets processed.
 v.     *BOOM* power fails

  The problem here is that the barrier request is *supposed* to
  indicate that filesystem update requests [2] and [3] made it safely
  to the physical medium and, if the machine crashes after the barrier
  is written, filesystem recovery code can depend on that.  Sadly,
  that isn't true in this case anymore.  IOW, the success of a I/O
  barrier should also be dependent on success of some of the preceding
  requests, where only upper layer (filesystem) knows what 'some' is.

  This can be solved by implementing a way to tell the block layer
  which requests affect the success of the following barrier request
  and making lower lever drivers to resume operation on error only
  after block layer tells it to do so.

  As the probability of this happening is very low and the drive
  should be faulty, implementing the fix is probably an overkill.
  But, still, it's there.

[ Start of patch descriptions ]

01_blk_add-uptodate-to-end_that_request_last.patch
	: add @uptodate to end_that_request_last() and @error to rq_end_io_fn()

	Add @uptodate argument to end_that_request_last() and @error
        to rq_end_io_fn().  There's no generic way to pass error code
        to request completion function, making generic error handling
        of non-fs request difficult (rq->errors is driver-specific and
        each driver uses it differently).  This patch adds @uptodate
        to end_that_request_last() and @error to rq_end_io_fn().

        For fs requests, this doesn't really matter, so just using the
        same uptodate argument used in the last call to
        end_that_request_first() should suffice.  IMHO, this can also
        help the generic command-carrying request Jens is working on.

02_blk_implement-init_request_from_bio.patch
	: separate out bio init part from __make_request

	Separate out bio initialization part from __make_request.  It
        will be used by the following blk_ordered_reimpl.

03_blk_reimplement-ordered.patch
	: reimplement handling of barrier request

	 Reimplement handling of barrier requests.

        * Flexible handling to deal with various capabilities of
          target devices.
	* Retry support for falling back.
	* Tagged queues which don't support ordered tag can do ordered.

04_blk_scsi-update-ordered.patch
	: update SCSI to use new blk_ordered

	All ordered request related stuff delegated to HLD.  Midlayer
        now doens't deal with ordered setting or prepare_flush
        callback.  sd.c updated to deal with blk_queue_ordered
        setting.  Currently, ordered tag isn't used as SCSI midlayer
        cannot guarantee request ordering.

05_blk_scsi-add-fua-support.patch
	: add FUA support to SCSI disk

	Add FUA support to SCSI disk.

06_blk_libata-update-ordered.patch
	: update libata to use new blk_ordered

	Reflect changes in SCSI midlayer and updated to use new
        ordered request implementation

07_blk_libata-add-fua-support.patch
	: add FUA support to libata

	Add FUA support to libata.

08_blk_ide-update-ordered.patch
	: update IDE to use new blk_ordered

	Update IDE to use new blk_ordered.

09_blk_ide-add-fua-support.patch
	: add FUA support to IDE

	Add FUA support to IDE

10_blk_add-barrier-doc.patch
	: I/O barrier documentation

	I/O barrier documentation

[ End of patch descriptions ]

 Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH linux-2.6-block:master 01/10] blk: add @uptodate to end_that_request_last() and @error to rq_end_io_fn()
  2005-07-26 15:45 [PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier Tejun Heo
@ 2005-07-26 15:45 ` Tejun Heo
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 02/10] blk: separate out bio init part from __make_request Tejun Heo
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2005-07-26 15:45 UTC (permalink / raw)
  To: axboe, jgarzik, James.Bottomley, bzolnier; +Cc: linux-kernel

01_blk_add-uptodate-to-end_that_request_last.patch

	Add @uptodate argument to end_that_request_last() and @error
        to rq_end_io_fn().  There's no generic way to pass error code
        to request completion function, making generic error handling
        of non-fs request difficult (rq->errors is driver-specific and
        each driver uses it differently).  This patch adds @uptodate
        to end_that_request_last() and @error to rq_end_io_fn().

        For fs requests, this doesn't really matter, so just using the
        same uptodate argument used in the last call to
        end_that_request_first() should suffice.  IMHO, this can also
        help the generic command-carrying request Jens is working on.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 drivers/block/DAC960.c          |    2 +-
 drivers/block/cciss.c           |    2 +-
 drivers/block/cpqarray.c        |    2 +-
 drivers/block/elevator.c        |    2 +-
 drivers/block/floppy.c          |    2 +-
 drivers/block/ll_rw_blk.c       |   20 ++++++++++++++------
 drivers/block/nbd.c             |    2 +-
 drivers/block/sx8.c             |    2 +-
 drivers/block/ub.c              |    2 +-
 drivers/block/viodasd.c         |    2 +-
 drivers/cdrom/cdu31a.c          |    2 +-
 drivers/ide/ide-cd.c            |    4 ++--
 drivers/ide/ide-io.c            |    6 +++---
 drivers/message/i2o/i2o_block.c |    6 +++---
 drivers/mmc/mmc_block.c         |    4 ++--
 drivers/s390/block/dasd.c       |    2 +-
 drivers/s390/char/tape_block.c  |    2 +-
 drivers/scsi/ide-scsi.c         |    4 ++--
 drivers/scsi/scsi_lib.c         |    8 ++++----
 drivers/scsi/sd.c               |    2 +-
 include/linux/blkdev.h          |    6 +++---
 21 files changed, 46 insertions(+), 38 deletions(-)

Index: blk-fixes/drivers/block/DAC960.c
===================================================================
--- blk-fixes.orig/drivers/block/DAC960.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/block/DAC960.c	2005-07-27 00:44:50.000000000 +0900
@@ -3480,7 +3480,7 @@ static inline boolean DAC960_ProcessComp
 
 	 if (!end_that_request_first(Request, UpToDate, Command->BlockCount)) {
 
- 	 	end_that_request_last(Request);
+ 	 	end_that_request_last(Request, UpToDate);
 
 		if (Command->Completion) {
 			complete(Command->Completion);
Index: blk-fixes/drivers/block/cciss.c
===================================================================
--- blk-fixes.orig/drivers/block/cciss.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/block/cciss.c	2005-07-27 00:44:50.000000000 +0900
@@ -2071,7 +2071,7 @@ static inline void complete_command( ctl
 	printk("Done with %p\n", cmd->rq);
 #endif /* CCISS_DEBUG */ 
 
-	end_that_request_last(cmd->rq);
+	end_that_request_last(cmd->rq, status ? 1 : -EIO);
 	cmd_free(h,cmd,1);
 }
 
Index: blk-fixes/drivers/block/cpqarray.c
===================================================================
--- blk-fixes.orig/drivers/block/cpqarray.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/block/cpqarray.c	2005-07-27 00:44:50.000000000 +0900
@@ -1036,7 +1036,7 @@ static inline void complete_command(cmdl
 	complete_buffers(cmd->rq->bio, ok);
 
         DBGPX(printk("Done with %p\n", cmd->rq););
-	end_that_request_last(cmd->rq);
+	end_that_request_last(cmd->rq, ok ? 1 : -EIO);
 }
 
 /*
Index: blk-fixes/drivers/block/elevator.c
===================================================================
--- blk-fixes.orig/drivers/block/elevator.c	2005-07-27 00:44:49.000000000 +0900
+++ blk-fixes/drivers/block/elevator.c	2005-07-27 00:44:50.000000000 +0900
@@ -502,7 +502,7 @@ struct request *elv_next_request(request
 			blkdev_dequeue_request(rq);
 			rq->flags |= REQ_QUIET;
 			end_that_request_chunk(rq, 0, nr_bytes);
-			end_that_request_last(rq);
+			end_that_request_last(rq, 0);
 		} else {
 			printk(KERN_ERR "%s: bad return=%d\n", __FUNCTION__,
 								ret);
Index: blk-fixes/drivers/block/floppy.c
===================================================================
--- blk-fixes.orig/drivers/block/floppy.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/block/floppy.c	2005-07-27 00:44:50.000000000 +0900
@@ -2299,7 +2299,7 @@ static void floppy_end_request(struct re
 	add_disk_randomness(req->rq_disk);
 	floppy_off((long)req->rq_disk->private_data);
 	blkdev_dequeue_request(req);
-	end_that_request_last(req);
+	end_that_request_last(req, uptodate);
 
 	/* We're done with the request */
 	current_req = NULL;
Index: blk-fixes/drivers/block/ll_rw_blk.c
===================================================================
--- blk-fixes.orig/drivers/block/ll_rw_blk.c	2005-07-27 00:44:49.000000000 +0900
+++ blk-fixes/drivers/block/ll_rw_blk.c	2005-07-27 00:44:50.000000000 +0900
@@ -342,7 +342,7 @@ EXPORT_SYMBOL(blk_queue_issue_flush_fn);
 /*
  * Cache flushing for ordered writes handling
  */
-static void blk_pre_flush_end_io(struct request *flush_rq)
+static void blk_pre_flush_end_io(struct request *flush_rq, int error)
 {
 	struct request *rq = flush_rq->end_io_data;
 	request_queue_t *q = rq->q;
@@ -360,7 +360,7 @@ static void blk_pre_flush_end_io(struct 
 	}
 }
 
-static void blk_post_flush_end_io(struct request *flush_rq)
+static void blk_post_flush_end_io(struct request *flush_rq, int error)
 {
 	struct request *rq = flush_rq->end_io_data;
 	request_queue_t *q = rq->q;
@@ -2419,7 +2419,7 @@ EXPORT_SYMBOL(blk_put_request);
  * blk_end_sync_rq - executes a completion event on a request
  * @rq: request to complete
  */
-void blk_end_sync_rq(struct request *rq)
+void blk_end_sync_rq(struct request *rq, int error)
 {
 	struct completion *waiting = rq->waiting;
 
@@ -3104,9 +3104,17 @@ EXPORT_SYMBOL(end_that_request_chunk);
 /*
  * queue lock must be held
  */
-void end_that_request_last(struct request *req)
+void end_that_request_last(struct request *req, int uptodate)
 {
 	struct gendisk *disk = req->rq_disk;
+	int error;
+
+	/*
+	 * extend uptodate bool to allow < 0 value to be direct io error
+	 */
+	error = 0;
+	if (end_io_error(uptodate))
+		error = !uptodate ? -EIO : uptodate;
 
 	if (unlikely(laptop_mode) && blk_fs_request(req))
 		laptop_io_completion();
@@ -3127,7 +3135,7 @@ void end_that_request_last(struct reques
 		disk->in_flight--;
 	}
 	if (req->end_io)
-		req->end_io(req);
+		req->end_io(req, error);
 	else
 		__blk_put_request(req->q, req);
 }
@@ -3139,7 +3147,7 @@ void end_request(struct request *req, in
 	if (!end_that_request_first(req, uptodate, req->hard_cur_sectors)) {
 		add_disk_randomness(req->rq_disk);
 		blkdev_dequeue_request(req);
-		end_that_request_last(req);
+		end_that_request_last(req, uptodate);
 	}
 }
 
Index: blk-fixes/drivers/block/nbd.c
===================================================================
--- blk-fixes.orig/drivers/block/nbd.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/block/nbd.c	2005-07-27 00:44:50.000000000 +0900
@@ -136,7 +136,7 @@ static void nbd_end_request(struct reque
 
 	spin_lock_irqsave(q->queue_lock, flags);
 	if (!end_that_request_first(req, uptodate, req->nr_sectors)) {
-		end_that_request_last(req);
+		end_that_request_last(req, uptodate);
 	}
 	spin_unlock_irqrestore(q->queue_lock, flags);
 }
Index: blk-fixes/drivers/block/sx8.c
===================================================================
--- blk-fixes.orig/drivers/block/sx8.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/block/sx8.c	2005-07-27 00:44:50.000000000 +0900
@@ -746,7 +746,7 @@ static inline void carm_end_request_queu
 	rc = end_that_request_first(req, uptodate, req->hard_nr_sectors);
 	assert(rc == 0);
 
-	end_that_request_last(req);
+	end_that_request_last(req, uptodate);
 
 	rc = carm_put_request(host, crq);
 	assert(rc == 0);
Index: blk-fixes/drivers/block/ub.c
===================================================================
--- blk-fixes.orig/drivers/block/ub.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/block/ub.c	2005-07-27 00:44:50.000000000 +0900
@@ -875,7 +875,7 @@ static void ub_end_rq(struct request *rq
 
 	rc = end_that_request_first(rq, uptodate, rq->hard_nr_sectors);
 	// assert(rc == 0);
-	end_that_request_last(rq);
+	end_that_request_last(rq, uptodate);
 }
 
 /*
Index: blk-fixes/drivers/block/viodasd.c
===================================================================
--- blk-fixes.orig/drivers/block/viodasd.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/block/viodasd.c	2005-07-27 00:44:50.000000000 +0900
@@ -305,7 +305,7 @@ static void viodasd_end_request(struct r
 	if (end_that_request_first(req, uptodate, num_sectors))
 		return;
 	add_disk_randomness(req->rq_disk);
-	end_that_request_last(req);
+	end_that_request_last(req, uptodate);
 }
 
 /*
Index: blk-fixes/drivers/cdrom/cdu31a.c
===================================================================
--- blk-fixes.orig/drivers/cdrom/cdu31a.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/cdrom/cdu31a.c	2005-07-27 00:44:50.000000000 +0900
@@ -1402,7 +1402,7 @@ static void do_cdu31a_request(request_qu
 			if (!end_that_request_first(req, 1, nblock)) {
 				spin_lock_irq(q->queue_lock);
 				blkdev_dequeue_request(req);
-				end_that_request_last(req);
+				end_that_request_last(req, 1);
 				spin_unlock_irq(q->queue_lock);
 			}
 			continue;
Index: blk-fixes/drivers/ide/ide-cd.c
===================================================================
--- blk-fixes.orig/drivers/ide/ide-cd.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/ide/ide-cd.c	2005-07-27 00:44:50.000000000 +0900
@@ -616,7 +616,7 @@ static void cdrom_end_request (ide_drive
 			 */
 			spin_lock_irqsave(&ide_lock, flags);
 			end_that_request_chunk(failed, 0, failed->data_len);
-			end_that_request_last(failed);
+			end_that_request_last(failed, 0);
 			spin_unlock_irqrestore(&ide_lock, flags);
 		}
 
@@ -1741,7 +1741,7 @@ end_request:
 
 	spin_lock_irqsave(&ide_lock, flags);
 	blkdev_dequeue_request(rq);
-	end_that_request_last(rq);
+	end_that_request_last(rq, 1);
 	HWGROUP(drive)->rq = NULL;
 	spin_unlock_irqrestore(&ide_lock, flags);
 	return ide_stopped;
Index: blk-fixes/drivers/ide/ide-io.c
===================================================================
--- blk-fixes.orig/drivers/ide/ide-io.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/ide/ide-io.c	2005-07-27 00:44:50.000000000 +0900
@@ -89,7 +89,7 @@ int __ide_end_request(ide_drive_t *drive
 
 		blkdev_dequeue_request(rq);
 		HWGROUP(drive)->rq = NULL;
-		end_that_request_last(rq);
+		end_that_request_last(rq, uptodate);
 		ret = 0;
 	}
 	return ret;
@@ -247,7 +247,7 @@ static void ide_complete_pm_request (ide
 	}
 	blkdev_dequeue_request(rq);
 	HWGROUP(drive)->rq = NULL;
-	end_that_request_last(rq);
+	end_that_request_last(rq, 1);
 	spin_unlock_irqrestore(&ide_lock, flags);
 }
 
@@ -379,7 +379,7 @@ void ide_end_drive_cmd (ide_drive_t *dri
 	blkdev_dequeue_request(rq);
 	HWGROUP(drive)->rq = NULL;
 	rq->errors = err;
-	end_that_request_last(rq);
+	end_that_request_last(rq, !rq->errors);
 	spin_unlock_irqrestore(&ide_lock, flags);
 }
 
Index: blk-fixes/drivers/message/i2o/i2o_block.c
===================================================================
--- blk-fixes.orig/drivers/message/i2o/i2o_block.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/message/i2o/i2o_block.c	2005-07-27 00:44:50.000000000 +0900
@@ -452,7 +452,7 @@ static int i2o_block_reply(struct i2o_co
 
 		while (end_that_request_chunk(req, !req->errors,
 					      le32_to_cpu(pmsg->body[1]))) ;
-		end_that_request_last(req);
+		end_that_request_last(req, !req->errors);
 
 		dev->open_queue_depth--;
 		list_del(&ireq->queue);
@@ -489,7 +489,7 @@ static int i2o_block_reply(struct i2o_co
 		spin_lock_irqsave(q->queue_lock, flags);
 		while (end_that_request_chunk
 		       (req, !req->errors, le32_to_cpu(msg->body[1]))) ;
-		end_that_request_last(req);
+		end_that_request_last(req, !req->errors);
 
 		dev->open_queue_depth--;
 		list_del(&ireq->queue);
@@ -554,7 +554,7 @@ static int i2o_block_reply(struct i2o_co
 		add_disk_randomness(req->rq_disk);
 		spin_lock_irqsave(q->queue_lock, flags);
 
-		end_that_request_last(req);
+		end_that_request_last(req, !req->errors);
 
 		dev->open_queue_depth--;
 		list_del(&ireq->queue);
Index: blk-fixes/drivers/mmc/mmc_block.c
===================================================================
--- blk-fixes.orig/drivers/mmc/mmc_block.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/mmc/mmc_block.c	2005-07-27 00:44:50.000000000 +0900
@@ -254,7 +254,7 @@ static int mmc_blk_issue_rq(struct mmc_q
 			 */
 			add_disk_randomness(req->rq_disk);
 			blkdev_dequeue_request(req);
-			end_that_request_last(req);
+			end_that_request_last(req, 1);
 		}
 		spin_unlock_irq(&md->lock);
 	} while (ret);
@@ -280,7 +280,7 @@ static int mmc_blk_issue_rq(struct mmc_q
 
 	add_disk_randomness(req->rq_disk);
 	blkdev_dequeue_request(req);
-	end_that_request_last(req);
+	end_that_request_last(req, 0);
 	spin_unlock_irq(&md->lock);
 
 	return 0;
Index: blk-fixes/drivers/s390/block/dasd.c
===================================================================
--- blk-fixes.orig/drivers/s390/block/dasd.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/s390/block/dasd.c	2005-07-27 00:44:50.000000000 +0900
@@ -1039,7 +1039,7 @@ dasd_end_request(struct request *req, in
 	if (end_that_request_first(req, uptodate, req->hard_nr_sectors))
 		BUG();
 	add_disk_randomness(req->rq_disk);
-	end_that_request_last(req);
+	end_that_request_last(req, uptodate);
 }
 
 /*
Index: blk-fixes/drivers/s390/char/tape_block.c
===================================================================
--- blk-fixes.orig/drivers/s390/char/tape_block.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/s390/char/tape_block.c	2005-07-27 00:44:50.000000000 +0900
@@ -78,7 +78,7 @@ tapeblock_end_request(struct request *re
 {
 	if (end_that_request_first(req, uptodate, req->hard_nr_sectors))
 		BUG();
-	end_that_request_last(req);
+	end_that_request_last(req, uptodate);
 }
 
 static void
Index: blk-fixes/drivers/scsi/ide-scsi.c
===================================================================
--- blk-fixes.orig/drivers/scsi/ide-scsi.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/scsi/ide-scsi.c	2005-07-27 00:44:50.000000000 +0900
@@ -1039,7 +1039,7 @@ static int idescsi_eh_reset (struct scsi
 
 	/* kill current request */
 	blkdev_dequeue_request(req);
-	end_that_request_last(req);
+	end_that_request_last(req, 0);
 	if (req->flags & REQ_SENSE)
 		kfree(scsi->pc->buffer);
 	kfree(scsi->pc);
@@ -1049,7 +1049,7 @@ static int idescsi_eh_reset (struct scsi
 	/* now nuke the drive queue */
 	while ((req = elv_next_request(drive->queue))) {
 		blkdev_dequeue_request(req);
-		end_that_request_last(req);
+		end_that_request_last(req, 0);
 	}
 
 	HWGROUP(drive)->rq = NULL;
Index: blk-fixes/drivers/scsi/scsi_lib.c
===================================================================
--- blk-fixes.orig/drivers/scsi/scsi_lib.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/scsi/scsi_lib.c	2005-07-27 00:44:50.000000000 +0900
@@ -258,7 +258,7 @@ static void scsi_wait_done(struct scsi_c
 /* This is the end routine we get to if a command was never attached
  * to the request.  Simply complete the request without changing
  * rq_status; this will cause a DRIVER_ERROR. */
-static void scsi_wait_req_end_io(struct request *req)
+static void scsi_wait_req_end_io(struct request *req, int error)
 {
 	BUG_ON(!req->waiting);
 
@@ -574,7 +574,7 @@ static struct scsi_cmnd *scsi_end_reques
 	spin_lock_irqsave(q->queue_lock, flags);
 	if (blk_rq_tagged(req))
 		blk_queue_end_tag(q, req);
-	end_that_request_last(req);
+	end_that_request_last(req, uptodate);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
 	/*
@@ -1259,7 +1259,7 @@ static void scsi_kill_requests(request_q
 		req->flags |= REQ_QUIET;
 		while (end_that_request_first(req, 0, req->nr_sectors))
 			;
-		end_that_request_last(req);
+		end_that_request_last(req, 0);
 	}
 }
 
@@ -1314,7 +1314,7 @@ static void scsi_request_fn(struct reque
 			req->flags |= REQ_QUIET;
 			while (end_that_request_first(req, 0, req->nr_sectors))
 				;
-			end_that_request_last(req);
+			end_that_request_last(req, 0);
 			continue;
 		}
 
Index: blk-fixes/drivers/scsi/sd.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sd.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/scsi/sd.c	2005-07-27 00:44:50.000000000 +0900
@@ -758,7 +758,7 @@ static void sd_end_flush(request_queue_t
 		 * force journal abort of barriers
 		 */
 		end_that_request_first(rq, -EOPNOTSUPP, rq->hard_nr_sectors);
-		end_that_request_last(rq);
+		end_that_request_last(rq, -EOPNOTSUPP);
 	}
 }
 
Index: blk-fixes/include/linux/blkdev.h
===================================================================
--- blk-fixes.orig/include/linux/blkdev.h	2005-07-27 00:44:49.000000000 +0900
+++ blk-fixes/include/linux/blkdev.h	2005-07-27 00:44:50.000000000 +0900
@@ -94,7 +94,7 @@ void copy_io_context(struct io_context *
 void swap_io_context(struct io_context **ioc1, struct io_context **ioc2);
 
 struct request;
-typedef void (rq_end_io_fn)(struct request *);
+typedef void (rq_end_io_fn)(struct request *, int);
 
 struct request_list {
 	int count[2];
@@ -550,7 +550,7 @@ extern void blk_unregister_queue(struct 
 extern void register_disk(struct gendisk *dev);
 extern void generic_make_request(struct bio *bio);
 extern void blk_put_request(struct request *);
-extern void blk_end_sync_rq(struct request *rq);
+extern void blk_end_sync_rq(struct request *rq, int error);
 extern void blk_attempt_remerge(request_queue_t *, struct request *);
 extern void __blk_attempt_remerge(request_queue_t *, struct request *);
 extern struct request *blk_get_request(request_queue_t *, int, int);
@@ -601,7 +601,7 @@ static inline void blk_run_address_space
  */
 extern int end_that_request_first(struct request *, int, int);
 extern int end_that_request_chunk(struct request *, int, int);
-extern void end_that_request_last(struct request *);
+extern void end_that_request_last(struct request *, int);
 extern void end_request(struct request *req, int uptodate);
 
 /*


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH linux-2.6-block:master 02/10] blk: separate out bio init part from __make_request
  2005-07-26 15:45 [PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier Tejun Heo
  2005-07-26 15:45 ` [PATCH linux-2.6-block:master 01/10] blk: add @uptodate to end_that_request_last() and @error to rq_end_io_fn() Tejun Heo
@ 2005-07-26 15:46 ` Tejun Heo
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 03/10] blk: reimplement handling of barrier request Tejun Heo
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2005-07-26 15:46 UTC (permalink / raw)
  To: axboe, jgarzik, James.Bottomley, bzolnier; +Cc: linux-kernel

02_blk_implement-init_request_from_bio.patch

	Separate out bio initialization part from __make_request.  It
        will be used by the following blk_ordered_reimpl.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 ll_rw_blk.c |   59 ++++++++++++++++++++++++++++++++---------------------------
 1 files changed, 32 insertions(+), 27 deletions(-)

Index: blk-fixes/drivers/block/ll_rw_blk.c
===================================================================
--- blk-fixes.orig/drivers/block/ll_rw_blk.c	2005-07-27 00:44:50.000000000 +0900
+++ blk-fixes/drivers/block/ll_rw_blk.c	2005-07-27 00:44:50.000000000 +0900
@@ -36,6 +36,8 @@
 
 static void blk_unplug_work(void *data);
 static void blk_unplug_timeout(unsigned long data);
+static void init_request_from_bio(struct request *req, struct bio *bio);
+static int __make_request(request_queue_t *q, struct bio *bio);
 
 /*
  * For the allocated request tables
@@ -1655,8 +1657,6 @@ static int blk_init_free_list(request_qu
 	return 0;
 }
 
-static int __make_request(request_queue_t *, struct bio *);
-
 request_queue_t *blk_alloc_queue(int gfp_mask)
 {
 	request_queue_t *q = kmem_cache_alloc(requestq_cachep, gfp_mask);
@@ -2555,6 +2555,35 @@ void blk_attempt_remerge(request_queue_t
 
 EXPORT_SYMBOL(blk_attempt_remerge);
 
+static void init_request_from_bio(struct request *req, struct bio *bio)
+{
+	req->flags |= REQ_CMD;
+
+	/*
+	 * inherit FAILFAST from bio (for read-ahead, and explicit FAILFAST)
+	 */
+	if (bio_rw_ahead(bio) || bio_failfast(bio))
+		req->flags |= REQ_FAILFAST;
+
+	/*
+	 * REQ_BARRIER implies no merging, but lets make it explicit
+	 */
+	if (unlikely(bio_barrier(bio)))
+		req->flags |= (REQ_HARDBARRIER | REQ_NOMERGE);
+
+	req->errors = 0;
+	req->hard_sector = req->sector = bio->bi_sector;
+	req->hard_nr_sectors = req->nr_sectors = bio_sectors(bio);
+	req->current_nr_sectors = req->hard_cur_sectors = bio_cur_sectors(bio);
+	req->nr_phys_segments = bio_phys_segments(req->q, bio);
+	req->nr_hw_segments = bio_hw_segments(req->q, bio);
+	req->buffer = bio_data(bio);	/* see ->buffer comment above */
+	req->waiting = NULL;
+	req->bio = req->biotail = bio;
+	req->rq_disk = bio->bi_bdev->bd_disk;
+	req->start_time = jiffies;
+}
+
 /*
  * Non-locking blk_attempt_remerge variant.
  */
@@ -2678,31 +2707,7 @@ get_rq:
 		goto again;
 	}
 
-	req->flags |= REQ_CMD;
-
-	/*
-	 * inherit FAILFAST from bio (for read-ahead, and explicit FAILFAST)
-	 */
-	if (bio_rw_ahead(bio) || bio_failfast(bio))
-		req->flags |= REQ_FAILFAST;
-
-	/*
-	 * REQ_BARRIER implies no merging, but lets make it explicit
-	 */
-	if (barrier)
-		req->flags |= (REQ_HARDBARRIER | REQ_NOMERGE);
-
-	req->errors = 0;
-	req->hard_sector = req->sector = sector;
-	req->hard_nr_sectors = req->nr_sectors = nr_sectors;
-	req->current_nr_sectors = req->hard_cur_sectors = cur_nr_sectors;
-	req->nr_phys_segments = bio_phys_segments(q, bio);
-	req->nr_hw_segments = bio_hw_segments(q, bio);
-	req->buffer = bio_data(bio);	/* see ->buffer comment above */
-	req->waiting = NULL;
-	req->bio = req->biotail = bio;
-	req->rq_disk = bio->bi_bdev->bd_disk;
-	req->start_time = jiffies;
+	init_request_from_bio(req, bio);
 
 	add_request(q, req);
 out:


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH linux-2.6-block:master 03/10] blk: reimplement handling of barrier request
  2005-07-26 15:45 [PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier Tejun Heo
  2005-07-26 15:45 ` [PATCH linux-2.6-block:master 01/10] blk: add @uptodate to end_that_request_last() and @error to rq_end_io_fn() Tejun Heo
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 02/10] blk: separate out bio init part from __make_request Tejun Heo
@ 2005-07-26 15:46 ` Tejun Heo
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 04/10] blk: update SCSI to use new blk_ordered Tejun Heo
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2005-07-26 15:46 UTC (permalink / raw)
  To: axboe, jgarzik, James.Bottomley, bzolnier; +Cc: linux-kernel

03_blk_reimplement-ordered.patch

	 Reimplement handling of barrier requests.

        * Flexible handling to deal with various capabilities of
          target devices.
	* Retry support for falling back.
	* Tagged queues which don't support ordered tag can do ordered.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 drivers/block/elevator.c  |   83 +++++---
 drivers/block/ll_rw_blk.c |  468 ++++++++++++++++++++++++++++++++--------------
 include/linux/blkdev.h    |   83 +++++---
 include/linux/elevator.h  |    1 
 4 files changed, 443 insertions(+), 192 deletions(-)

Index: blk-fixes/drivers/block/elevator.c
===================================================================
--- blk-fixes.orig/drivers/block/elevator.c	2005-07-27 00:44:50.000000000 +0900
+++ blk-fixes/drivers/block/elevator.c	2005-07-27 00:44:51.000000000 +0900
@@ -326,22 +326,24 @@ void elv_requeue_request(request_queue_t
 
 	rq->flags &= ~REQ_STARTED;
 
-	/*
-	 * if this is the flush, requeue the original instead and drop the flush
-	 */
-	if (rq->flags & REQ_BAR_FLUSH) {
-		clear_bit(QUEUE_FLAG_FLUSH, &q->queue_flags);
-		rq = rq->end_io_data;
-	}
-
-	__elv_add_request(q, rq, ELEVATOR_INSERT_FRONT, 0);
+	__elv_add_request(q, rq, ELEVATOR_INSERT_REQUEUE, 0);
 }
 
 void __elv_add_request(request_queue_t *q, struct request *rq, int where,
 		       int plug)
 {
+	struct list_head *pos;
+	unsigned ordseq;
+
+	rq->flags |= q->ordcolor ? REQ_ORDERED_COLOR : 0;
+
 	if (rq->flags & (REQ_SOFTBARRIER | REQ_HARDBARRIER)) {
 		/*
+		 * toggle ordered color
+		 */
+		q->ordcolor ^= 1;
+
+		/*
 		 * barriers implicitly indicate back insertion
 		 */
 		if (where == ELEVATOR_INSERT_SORT)
@@ -397,6 +399,30 @@ void __elv_add_request(request_queue_t *
 			q->last_merge = rq;
 		break;
 
+	case ELEVATOR_INSERT_REQUEUE:
+		/*
+		 * If ordered flush isn't in progress, we do front
+		 * insertion; otherwise, requests should be requeued
+		 * in ordseq order.
+		 */
+		rq->flags |= REQ_SOFTBARRIER;
+
+		if (q->ordseq == 0) {
+			list_add(&rq->queuelist, &q->queue_head);
+			break;
+		}
+
+		ordseq = blk_ordered_req_seq(rq);
+
+		list_for_each(pos, &q->queue_head) {
+			struct request *pos_rq = list_entry_rq(pos);
+			if (ordseq <= blk_ordered_req_seq(pos_rq))
+				break;
+		}
+
+		list_add_tail(&rq->queuelist, pos);
+		break;
+
 	default:
 		printk(KERN_ERR "%s: bad insertion point %d\n",
 		       __FUNCTION__, where);
@@ -426,25 +452,16 @@ static inline struct request *__elv_next
 {
 	struct request *rq;
 
-	if (unlikely(list_empty(&q->queue_head) &&
-		     !q->elevator->ops->elevator_dispatch_fn(q, 0)))
-		return NULL;
-
-	rq = list_entry_rq(q->queue_head.next);
-
-	/*
-	 * if this is a barrier write and the device has to issue a
-	 * flush sequence to support it, check how far we are
-	 */
-	if (blk_fs_request(rq) && blk_barrier_rq(rq)) {
-		BUG_ON(q->ordered == QUEUE_ORDERED_NONE);
+	while (1) {
+		while (!list_empty(&q->queue_head)) {
+			rq = list_entry_rq(q->queue_head.next);
+			if (blk_do_ordered(q, &rq))
+				return rq;
+		}
 
-		if (q->ordered == QUEUE_ORDERED_FLUSH &&
-		    !blk_barrier_preflush(rq))
-			rq = blk_start_pre_flush(q, rq);
+		if (!q->elevator->ops->elevator_dispatch_fn(q, 0))
+			return NULL;
 	}
-
-	return rq;
 }
 
 struct request *elv_next_request(request_queue_t *q)
@@ -610,7 +627,21 @@ void elv_completed_request(request_queue
 	 * request is released from the driver, io must be done
 	 */
 	if (blk_account_rq(rq)) {
+		struct request *first_rq = list_entry_rq(q->queue_head.next);
+
 		q->in_flight--;
+
+		/*
+		 * Check if the queue is waiting for fs requests to be
+		 * drained for flush sequence.
+		 */
+		if (q->ordseq && q->in_flight == 0 &&
+		    blk_ordered_cur_seq(q) == QUEUE_ORDSEQ_DRAIN &&
+		    blk_ordered_req_seq(first_rq) > QUEUE_ORDSEQ_DRAIN) {
+			blk_ordered_complete_seq(q, QUEUE_ORDSEQ_DRAIN, 0);
+			q->request_fn(q);
+		}
+
 		if (blk_sorted_rq(rq) && e->ops->elevator_completed_req_fn)
 			e->ops->elevator_completed_req_fn(q, rq);
 	}
Index: blk-fixes/drivers/block/ll_rw_blk.c
===================================================================
--- blk-fixes.orig/drivers/block/ll_rw_blk.c	2005-07-27 00:44:50.000000000 +0900
+++ blk-fixes/drivers/block/ll_rw_blk.c	2005-07-27 00:44:51.000000000 +0900
@@ -286,10 +286,104 @@ static inline void rq_init(request_queue
 	rq->end_io_data = NULL;
 }
 
+static int __blk_queue_ordered(request_queue_t *q, int ordered,
+			       prepare_flush_fn *prepare_flush_fn,
+			       unsigned gfp_mask, int locked)
+{
+	void *rq0 = NULL, *rq1 = NULL, *rq2 = NULL;
+	unsigned long flags = 0; /* shut up, gcc */
+	int ret = 0;
+
+	might_sleep_if(gfp_mask & __GFP_WAIT);
+
+	if (ordered & (QUEUE_ORDERED_PREFLUSH | QUEUE_ORDERED_POSTFLUSH) &&
+	    prepare_flush_fn == NULL) {
+		printk(KERN_ERR "blk_queue_ordered: prepare_flush_fn required\n");
+		ret = -EINVAL;
+		goto out_unlocked;
+	}
+
+	if (!locked)
+		spin_lock_irqsave(q->queue_lock, flags);
+
+	if (q->ordseq) {
+		/* Queue flushing in progress. */
+		ret = -EINPROGRESS;
+		goto out;
+	}
+
+	switch (ordered) {
+	case QUEUE_ORDERED_NONE:
+		if (!q->pre_flush_rq) {
+			rq0 = q->pre_flush_rq;
+			rq1 = q->bar_rq;
+			rq2 = q->post_flush_rq;
+			q->pre_flush_rq = NULL;
+			q->bar_rq = NULL;
+			q->post_flush_rq = NULL;
+		}
+		break;
+
+	case QUEUE_ORDERED_DRAIN:
+	case QUEUE_ORDERED_DRAIN_FLUSH:
+	case QUEUE_ORDERED_DRAIN_FUA:
+	case QUEUE_ORDERED_TAG:
+	case QUEUE_ORDERED_TAG_FLUSH:
+	case QUEUE_ORDERED_TAG_FUA:
+		if (q->pre_flush_rq)
+			break;
+
+		if (!locked)
+			spin_unlock_irqrestore(q->queue_lock, flags);
+		if (!(rq0 = kmem_cache_alloc(request_cachep, gfp_mask)) ||
+		    !(rq1 = kmem_cache_alloc(request_cachep, gfp_mask)) ||
+		    !(rq2 = kmem_cache_alloc(request_cachep, gfp_mask))) {
+			printk(KERN_ERR "blk_queue_ordered: failed to "
+			       "switch to 0x%x (out of memory)\n", ordered);
+			ret = -ENOMEM;
+			goto out_unlocked;
+		}
+		if (!locked)
+			spin_lock_irqsave(q->queue_lock, flags);
+
+		if (!q->pre_flush_rq) {
+			q->pre_flush_rq = rq0;
+			q->bar_rq = rq1;
+			q->post_flush_rq = rq2;
+			rq0 = NULL;
+			rq1 = NULL;
+			rq2 = NULL;
+		}
+		break;
+
+	default:
+		printk(KERN_ERR "blk_queue_ordered: bad value %d\n", ordered);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	q->ordered = ordered;
+	q->prepare_flush_fn = prepare_flush_fn;
+
+ out:
+	if (!locked)
+		spin_unlock_irqrestore(q->queue_lock, flags);
+ out_unlocked:
+	if (rq0)
+		kmem_cache_free(request_cachep, rq0);
+	if (rq1)
+		kmem_cache_free(request_cachep, rq1);
+	if (rq2)
+		kmem_cache_free(request_cachep, rq2);
+
+	return ret;
+}
+
 /**
  * blk_queue_ordered - does this queue support ordered writes
- * @q:     the request queue
- * @flag:  see below
+ * @q:        the request queue
+ * @ordered:  one of QUEUE_ORDERED_*
+ * @gfp_mask: allocation mask
  *
  * Description:
  *   For journalled file systems, doing ordered writes on a commit
@@ -298,32 +392,24 @@ static inline void rq_init(request_queue
  *   feature should call this function and indicate so.
  *
  **/
-void blk_queue_ordered(request_queue_t *q, int flag)
+extern int blk_queue_ordered(request_queue_t *q, unsigned ordered,
+			     prepare_flush_fn *prepare_flush_fn,
+			     unsigned gfp_mask)
 {
-	switch (flag) {
-		case QUEUE_ORDERED_NONE:
-			if (q->flush_rq)
-				kmem_cache_free(request_cachep, q->flush_rq);
-			q->flush_rq = NULL;
-			q->ordered = flag;
-			break;
-		case QUEUE_ORDERED_TAG:
-			q->ordered = flag;
-			break;
-		case QUEUE_ORDERED_FLUSH:
-			q->ordered = flag;
-			if (!q->flush_rq)
-				q->flush_rq = kmem_cache_alloc(request_cachep,
-								GFP_KERNEL);
-			break;
-		default:
-			printk("blk_queue_ordered: bad value %d\n", flag);
-			break;
-	}
+	return __blk_queue_ordered(q, ordered, prepare_flush_fn, gfp_mask, 0);
 }
 
 EXPORT_SYMBOL(blk_queue_ordered);
 
+extern int blk_queue_ordered_locked(request_queue_t *q, unsigned ordered,
+				    prepare_flush_fn *prepare_flush_fn,
+				    unsigned gfp_mask)
+{
+	return __blk_queue_ordered(q, ordered, prepare_flush_fn, gfp_mask, 1);
+}
+
+EXPORT_SYMBOL(blk_queue_ordered_locked);
+
 /**
  * blk_queue_issue_flush_fn - set function for issuing a flush
  * @q:     the request queue
@@ -344,167 +430,263 @@ EXPORT_SYMBOL(blk_queue_issue_flush_fn);
 /*
  * Cache flushing for ordered writes handling
  */
-static void blk_pre_flush_end_io(struct request *flush_rq, int error)
+inline unsigned blk_ordered_cur_seq(request_queue_t *q)
 {
-	struct request *rq = flush_rq->end_io_data;
-	request_queue_t *q = rq->q;
-
-	elv_completed_request(q, flush_rq);
-
-	rq->flags |= REQ_BAR_PREFLUSH;
-
-	if (!flush_rq->errors)
-		elv_requeue_request(q, rq);
-	else {
-		q->end_flush_fn(q, flush_rq);
-		clear_bit(QUEUE_FLAG_FLUSH, &q->queue_flags);
-		q->request_fn(q);
-	}
+	if (!q->ordseq)
+		return 0;
+	return 1 << ffz(q->ordseq);
 }
 
-static void blk_post_flush_end_io(struct request *flush_rq, int error)
+unsigned blk_ordered_req_seq(struct request *rq)
 {
-	struct request *rq = flush_rq->end_io_data;
 	request_queue_t *q = rq->q;
 
-	elv_completed_request(q, flush_rq);
-
-	rq->flags |= REQ_BAR_POSTFLUSH;
+	BUG_ON(q->ordseq == 0);
 
-	q->end_flush_fn(q, flush_rq);
-	clear_bit(QUEUE_FLAG_FLUSH, &q->queue_flags);
-	q->request_fn(q);
+	if (rq == q->pre_flush_rq)
+		return QUEUE_ORDSEQ_PREFLUSH;
+	if (rq == q->bar_rq)
+		return QUEUE_ORDSEQ_BAR;
+	if (rq == q->post_flush_rq)
+		return QUEUE_ORDSEQ_POSTFLUSH;
+
+	if ((rq->flags & REQ_ORDERED_COLOR) ==
+	    (q->orig_bar_rq->flags & REQ_ORDERED_COLOR))
+		return QUEUE_ORDSEQ_DRAIN;
+	else
+		return QUEUE_ORDSEQ_DONE;
 }
 
-struct request *blk_start_pre_flush(request_queue_t *q, struct request *rq)
+void blk_ordered_complete_seq(request_queue_t *q, unsigned seq, int error)
 {
-	struct request *flush_rq = q->flush_rq;
-
-	BUG_ON(!blk_barrier_rq(rq));
+	struct request *rq;
+	int uptodate;
 
-	if (test_and_set_bit(QUEUE_FLAG_FLUSH, &q->queue_flags))
-		return NULL;
+	if (error && !q->orderr)
+		q->orderr = error;
 
-	rq_init(q, flush_rq);
-	flush_rq->elevator_private = NULL;
-	flush_rq->flags = REQ_BAR_FLUSH;
-	flush_rq->rq_disk = rq->rq_disk;
-	flush_rq->rl = NULL;
+	BUG_ON(q->ordseq & seq);
+	q->ordseq |= seq;
 
-	/*
-	 * prepare_flush returns 0 if no flush is needed, just mark both
-	 * pre and post flush as done in that case
-	 */
-	if (!q->prepare_flush_fn(q, flush_rq)) {
-		rq->flags |= REQ_BAR_PREFLUSH | REQ_BAR_POSTFLUSH;
-		clear_bit(QUEUE_FLAG_FLUSH, &q->queue_flags);
-		return rq;
-	}
+	if (blk_ordered_cur_seq(q) != QUEUE_ORDSEQ_DONE)
+		return;
 
 	/*
-	 * some drivers dequeue requests right away, some only after io
-	 * completion. make sure the request is dequeued.
+	 * Okay, sequence complete.
 	 */
-	if (!list_empty(&rq->queuelist))
-		blkdev_dequeue_request(rq);
+	rq = q->orig_bar_rq;
+	uptodate = q->orderr ? q->orderr : 1;
 
-	flush_rq->end_io_data = rq;
-	flush_rq->end_io = blk_pre_flush_end_io;
+	q->ordseq = 0;
 
-	__elv_add_request(q, flush_rq, ELEVATOR_INSERT_FRONT, 0);
-	return flush_rq;
+	end_that_request_first(rq, uptodate, rq->hard_nr_sectors);
+	end_that_request_last(rq, uptodate);
 }
 
-static void blk_start_post_flush(request_queue_t *q, struct request *rq)
+static void pre_flush_end_io(struct request *rq, int error)
 {
-	struct request *flush_rq = q->flush_rq;
+	elv_completed_request(rq->q, rq);
+	blk_ordered_complete_seq(rq->q, QUEUE_ORDSEQ_PREFLUSH, error);
+}
 
-	BUG_ON(!blk_barrier_rq(rq));
+static void bar_end_io(struct request *rq, int error)
+{
+	elv_completed_request(rq->q, rq);
+	blk_ordered_complete_seq(rq->q, QUEUE_ORDSEQ_BAR, error);
+}
 
-	rq_init(q, flush_rq);
-	flush_rq->elevator_private = NULL;
-	flush_rq->flags = REQ_BAR_FLUSH;
-	flush_rq->rq_disk = rq->rq_disk;
-	flush_rq->rl = NULL;
+static void post_flush_end_io(struct request *rq, int error)
+{
+	elv_completed_request(rq->q, rq);
+	blk_ordered_complete_seq(rq->q, QUEUE_ORDSEQ_POSTFLUSH, error);
+}
 
-	if (q->prepare_flush_fn(q, flush_rq)) {
-		flush_rq->end_io_data = rq;
-		flush_rq->end_io = blk_post_flush_end_io;
+static void queue_flush(request_queue_t *q, unsigned which)
+{
+	struct request *rq;
+	rq_end_io_fn *end_io;
 
-		__elv_add_request(q, flush_rq, ELEVATOR_INSERT_FRONT, 0);
-		q->request_fn(q);
+	if (which == QUEUE_ORDERED_PREFLUSH) {
+		rq = q->pre_flush_rq;
+		end_io = pre_flush_end_io;
+	} else {
+		rq = q->post_flush_rq;
+		end_io = post_flush_end_io;
 	}
+
+	rq_init(q, rq);
+	rq->flags = REQ_HARDBARRIER;
+	rq->elevator_private = NULL;
+	rq->rq_disk = q->bar_rq->rq_disk;
+	rq->rl = NULL;
+	rq->end_io = end_io;
+	q->prepare_flush_fn(q, rq);
+
+	__elv_add_request(q, rq, ELEVATOR_INSERT_FRONT, 0);
 }
 
-static inline int blk_check_end_barrier(request_queue_t *q, struct request *rq,
-					int sectors)
+static inline struct request *start_ordered(request_queue_t *q,
+					    struct request *rq)
 {
-	if (sectors > rq->nr_sectors)
-		sectors = rq->nr_sectors;
+	q->bi_size = 0;
+	q->orderr = 0;
+	q->ordseq |= QUEUE_ORDSEQ_STARTED;
+
+	/*
+	 * Prep proxy barrier request.
+	 */
+	blkdev_dequeue_request(rq);
+	q->orig_bar_rq = rq;
+	rq = q->bar_rq;
+	rq_init(q, rq);
+	rq->flags = bio_data_dir(q->orig_bar_rq->bio);
+	rq->flags |= q->ordered & QUEUE_ORDERED_FUA ? REQ_FUA : 0;
+	rq->elevator_private = NULL;
+	rq->rl = NULL;
+	init_request_from_bio(rq, q->orig_bar_rq->bio);
+	rq->end_io = bar_end_io;
 
-	rq->nr_sectors -= sectors;
-	return rq->nr_sectors;
+	/*
+	 * Queue ordered sequence.  As we stack them at the head, we
+	 * need to queue in reverse order.  Note that we rely on that
+	 * no fs request uses ELEVATOR_INSERT_FRONT and thus no fs
+	 * request gets inbetween ordered sequence.
+	 */
+	if (q->ordered & QUEUE_ORDERED_POSTFLUSH)
+		queue_flush(q, QUEUE_ORDERED_POSTFLUSH);
+	else
+		q->ordseq |= QUEUE_ORDSEQ_POSTFLUSH;
+
+	__elv_add_request(q, rq, ELEVATOR_INSERT_FRONT, 0);
+
+	if (q->ordered & QUEUE_ORDERED_PREFLUSH) {
+		queue_flush(q, QUEUE_ORDERED_PREFLUSH);
+		rq = q->pre_flush_rq;
+	} else
+		q->ordseq |= QUEUE_ORDSEQ_PREFLUSH;
+
+	if ((q->ordered & QUEUE_ORDERED_TAG) || q->in_flight == 0)
+		q->ordseq |= QUEUE_ORDSEQ_DRAIN;
+	else
+		rq = NULL;
+
+	return rq;
 }
 
-static int __blk_complete_barrier_rq(request_queue_t *q, struct request *rq,
-				     int sectors, int queue_locked)
+int blk_do_ordered(request_queue_t *q, struct request **rqp)
 {
-	if (q->ordered != QUEUE_ORDERED_FLUSH)
-		return 0;
-	if (!blk_fs_request(rq) || !blk_barrier_rq(rq))
-		return 0;
-	if (blk_barrier_postflush(rq))
-		return 0;
-
-	if (!blk_check_end_barrier(q, rq, sectors)) {
-		unsigned long flags = 0;
+	struct request *rq = *rqp, *allowed_rq;
+	int is_barrier = blk_fs_request(rq) && blk_barrier_rq(rq);
 
-		if (!queue_locked)
-			spin_lock_irqsave(q->queue_lock, flags);
+	if (!q->ordseq) {
+		if (!is_barrier)
+			return 1;
 
-		blk_start_post_flush(q, rq);
+		if (q->ordered != QUEUE_ORDERED_NONE) {
+			*rqp = start_ordered(q, rq);
+			return 1;
+		} else {
+			/*
+			 * This can happen when the queue switches to
+			 * ORDERED_NONE while this request is on it.
+			 */
+			blkdev_dequeue_request(rq);
+			end_that_request_first(rq, -EOPNOTSUPP,
+					       rq->hard_nr_sectors);
+			end_that_request_last(rq, -EOPNOTSUPP);
+			*rqp = NULL;
+			return 0;
+		}
+	}
 
-		if (!queue_locked)
-			spin_unlock_irqrestore(q->queue_lock, flags);
+	if (q->ordered & QUEUE_ORDERED_TAG) {
+		if (is_barrier && rq != q->bar_rq)
+			*rqp = NULL;
+		return 1;
 	}
 
+	switch (blk_ordered_cur_seq(q)) {
+	case QUEUE_ORDSEQ_PREFLUSH:
+		allowed_rq = q->pre_flush_rq;
+		break;
+	case QUEUE_ORDSEQ_BAR:
+		allowed_rq = q->bar_rq;
+		break;
+	case QUEUE_ORDSEQ_POSTFLUSH:
+		allowed_rq = q->post_flush_rq;
+		break;
+	default:
+		allowed_rq = NULL;
+		break;
+	}
+
+	if (rq != allowed_rq && (blk_fs_request(rq) || rq == q->pre_flush_rq ||
+				 rq == q->post_flush_rq))
+		*rqp = NULL;
+
 	return 1;
 }
 
-/**
- * blk_complete_barrier_rq - complete possible barrier request
- * @q:  the request queue for the device
- * @rq:  the request
- * @sectors:  number of sectors to complete
- *
- * Description:
- *   Used in driver end_io handling to determine whether to postpone
- *   completion of a barrier request until a post flush has been done. This
- *   is the unlocked variant, used if the caller doesn't already hold the
- *   queue lock.
- **/
-int blk_complete_barrier_rq(request_queue_t *q, struct request *rq, int sectors)
+static int flush_dry_bio_endio(struct bio *bio, unsigned int bytes, int error)
 {
-	return __blk_complete_barrier_rq(q, rq, sectors, 0);
+	request_queue_t *q = bio->bi_private;
+	struct bio_vec *bvec;
+	int i;
+
+	/*
+	 * This is dry run, restore bio_sector and size.  We'll finish
+	 * this request again with the original bi_end_io after an
+	 * error occurs or post flush is complete.
+	 */
+	q->bi_size += bytes;
+
+	if (bio->bi_size)
+		return 1;
+
+	/* Rewind bvec's */
+	bio->bi_idx = 0;
+	bio_for_each_segment(bvec, bio, i) {
+		bvec->bv_len += bvec->bv_offset;
+		bvec->bv_offset = 0;
+	}
+
+	/* Reset bio */
+	set_bit(BIO_UPTODATE, &bio->bi_flags);
+	bio->bi_size = q->bi_size;
+	bio->bi_sector -= (q->bi_size >> 9);
+	q->bi_size = 0;
+
+	return 0;
 }
-EXPORT_SYMBOL(blk_complete_barrier_rq);
 
-/**
- * blk_complete_barrier_rq_locked - complete possible barrier request
- * @q:  the request queue for the device
- * @rq:  the request
- * @sectors:  number of sectors to complete
- *
- * Description:
- *   See blk_complete_barrier_rq(). This variant must be used if the caller
- *   holds the queue lock.
- **/
-int blk_complete_barrier_rq_locked(request_queue_t *q, struct request *rq,
-				   int sectors)
+static inline int ordered_bio_endio(struct request *rq, struct bio *bio,
+				    unsigned int nbytes, int error)
 {
-	return __blk_complete_barrier_rq(q, rq, sectors, 1);
+	request_queue_t *q = rq->q;
+	bio_end_io_t *endio;
+	void *private;
+
+	if (q->bar_rq != rq)
+		return 0;
+
+	/*
+	 * Okay, this is the barrier request in progress, dry finish it.
+	 */
+	if (error && !q->orderr)
+		q->orderr = error;
+
+	endio = bio->bi_end_io;
+	private = bio->bi_private;
+	bio->bi_end_io = flush_dry_bio_endio;
+	bio->bi_private = q;
+
+	bio_endio(bio, nbytes, error);
+
+	bio->bi_end_io = endio;
+	bio->bi_private = private;
+
+	return 1;
 }
-EXPORT_SYMBOL(blk_complete_barrier_rq_locked);
 
 /**
  * blk_queue_bounce_limit - set bounce buffer limit for queue
@@ -1041,6 +1223,7 @@ static char *rq_flags[] = {
 	"REQ_SORTED",
 	"REQ_SOFTBARRIER",
 	"REQ_HARDBARRIER",
+	"REQ_FUA",
 	"REQ_CMD",
 	"REQ_NOMERGE",
 	"REQ_STARTED",
@@ -1060,6 +1243,7 @@ static char *rq_flags[] = {
 	"REQ_PM_SUSPEND",
 	"REQ_PM_RESUME",
 	"REQ_PM_SHUTDOWN",
+	"REQ_ORDERED_COLOR",
 };
 
 void blk_dump_rq_flags(struct request *rq, char *msg)
@@ -1632,7 +1816,7 @@ void blk_cleanup_queue(request_queue_t *
 	if (q->queue_tags)
 		__blk_queue_free_tags(q);
 
-	blk_queue_ordered(q, QUEUE_ORDERED_NONE);
+	blk_queue_ordered(q, QUEUE_ORDERED_NONE, NULL, 0);
 
 	kmem_cache_free(requestq_cachep, q);
 }
@@ -2997,7 +3181,8 @@ static int __end_that_request_first(stru
 		if (nr_bytes >= bio->bi_size) {
 			req->bio = bio->bi_next;
 			nbytes = bio->bi_size;
-			bio_endio(bio, nbytes, error);
+			if (!ordered_bio_endio(req, bio, nbytes, error))
+				bio_endio(bio, nbytes, error);
 			next_idx = 0;
 			bio_nbytes = 0;
 		} else {
@@ -3052,7 +3237,8 @@ static int __end_that_request_first(stru
 	 * if the request wasn't completed, update state
 	 */
 	if (bio_nbytes) {
-		bio_endio(bio, bio_nbytes, error);
+		if (!ordered_bio_endio(req, bio, bio_nbytes, error))
+			bio_endio(bio, bio_nbytes, error);
 		bio->bi_idx += next_idx;
 		bio_iovec(bio)->bv_offset += nr_bytes;
 		bio_iovec(bio)->bv_len -= nr_bytes;
Index: blk-fixes/include/linux/blkdev.h
===================================================================
--- blk-fixes.orig/include/linux/blkdev.h	2005-07-27 00:44:50.000000000 +0900
+++ blk-fixes/include/linux/blkdev.h	2005-07-27 00:44:51.000000000 +0900
@@ -196,6 +196,7 @@ enum rq_flag_bits {
 	__REQ_SORTED,		/* elevator knows about this request */
 	__REQ_SOFTBARRIER,	/* may not be passed by ioscheduler */
 	__REQ_HARDBARRIER,	/* may not be passed by drive either */
+	__REQ_FUA,		/* forced unit access */
 	__REQ_CMD,		/* is a regular fs rw request */
 	__REQ_NOMERGE,		/* don't touch this for merging */
 	__REQ_STARTED,		/* drive already may have started this one */
@@ -219,9 +220,7 @@ enum rq_flag_bits {
 	__REQ_PM_SUSPEND,	/* suspend request */
 	__REQ_PM_RESUME,	/* resume request */
 	__REQ_PM_SHUTDOWN,	/* shutdown request */
-	__REQ_BAR_PREFLUSH,	/* barrier pre-flush done */
-	__REQ_BAR_POSTFLUSH,	/* barrier post-flush */
-	__REQ_BAR_FLUSH,	/* rq is the flush request */
+	__REQ_ORDERED_COLOR,	/* is before or after barrier */
 	__REQ_NR_BITS,		/* stops here */
 };
 
@@ -230,6 +229,7 @@ enum rq_flag_bits {
 #define REQ_SORTED	(1 << __REQ_SORTED)
 #define REQ_SOFTBARRIER	(1 << __REQ_SOFTBARRIER)
 #define REQ_HARDBARRIER	(1 << __REQ_HARDBARRIER)
+#define REQ_FUA		(1 << __REQ_FUA)
 #define REQ_CMD		(1 << __REQ_CMD)
 #define REQ_NOMERGE	(1 << __REQ_NOMERGE)
 #define REQ_STARTED	(1 << __REQ_STARTED)
@@ -249,9 +249,7 @@ enum rq_flag_bits {
 #define REQ_PM_SUSPEND	(1 << __REQ_PM_SUSPEND)
 #define REQ_PM_RESUME	(1 << __REQ_PM_RESUME)
 #define REQ_PM_SHUTDOWN	(1 << __REQ_PM_SHUTDOWN)
-#define REQ_BAR_PREFLUSH	(1 << __REQ_BAR_PREFLUSH)
-#define REQ_BAR_POSTFLUSH	(1 << __REQ_BAR_POSTFLUSH)
-#define REQ_BAR_FLUSH	(1 << __REQ_BAR_FLUSH)
+#define REQ_ORDERED_COLOR	(1 << __REQ_ORDERED_COLOR)
 
 /*
  * State information carried for REQ_PM_SUSPEND and REQ_PM_RESUME
@@ -281,8 +279,7 @@ struct bio_vec;
 typedef int (merge_bvec_fn) (request_queue_t *, struct bio *, struct bio_vec *);
 typedef void (activity_fn) (void *data, int rw);
 typedef int (issue_flush_fn) (request_queue_t *, struct gendisk *, sector_t *);
-typedef int (prepare_flush_fn) (request_queue_t *, struct request *);
-typedef void (end_flush_fn) (request_queue_t *, struct request *);
+typedef void (prepare_flush_fn) (request_queue_t *, struct request *);
 
 enum blk_queue_state {
 	Queue_down,
@@ -327,7 +324,6 @@ struct request_queue
 	activity_fn		*activity_fn;
 	issue_flush_fn		*issue_flush_fn;
 	prepare_flush_fn	*prepare_flush_fn;
-	end_flush_fn		*end_flush_fn;
 
 	/*
 	 * Dispatch queue sorting
@@ -411,14 +407,11 @@ struct request_queue
 	/*
 	 * reserved for flush operations
 	 */
-	struct request		*flush_rq;
-	unsigned char		ordered;
-};
-
-enum {
-	QUEUE_ORDERED_NONE,
-	QUEUE_ORDERED_TAG,
-	QUEUE_ORDERED_FLUSH,
+	unsigned int		ordered, ordseq;
+	int			orderr, ordcolor;
+	struct request		*pre_flush_rq, *bar_rq, *post_flush_rq;
+	struct request		*orig_bar_rq;
+	unsigned int		bi_size;
 };
 
 #define RQ_INACTIVE		(-1)
@@ -436,12 +429,51 @@ enum {
 #define QUEUE_FLAG_REENTER	6	/* Re-entrancy avoidance */
 #define QUEUE_FLAG_PLUGGED	7	/* queue is plugged */
 #define QUEUE_FLAG_BYPASS	8	/* don't use elevator, just do FIFO */
-#define QUEUE_FLAG_FLUSH	9	/* doing barrier flush sequence */
+
+enum {
+	/*
+	 * Hardbarrier is supported with one of the following methods.
+	 *
+	 * NONE		: hardbarrier unsupported
+	 * DRAIN	: ordering by draining is enough
+	 * DRAIN_FLUSH	: ordering by draining w/ pre and post flushes
+	 * DRAIN_FUA	: ordering by draining w/ pre flush and FUA write
+	 * TAG		: ordering by tag is enough
+	 * TAG_FLUSH	: ordering by tag w/ pre and post flushes
+	 * TAG_FUA	: ordering by tag w/ pre flush and FUA write
+	 */
+	QUEUE_ORDERED_NONE	= 0x00,
+	QUEUE_ORDERED_DRAIN	= 0x01,
+	QUEUE_ORDERED_TAG	= 0x02,
+
+	QUEUE_ORDERED_PREFLUSH	= 0x10,
+	QUEUE_ORDERED_POSTFLUSH	= 0x20,
+	QUEUE_ORDERED_FUA	= 0x40,
+
+	QUEUE_ORDERED_DRAIN_FLUSH = QUEUE_ORDERED_DRAIN |
+			QUEUE_ORDERED_PREFLUSH | QUEUE_ORDERED_POSTFLUSH,
+	QUEUE_ORDERED_DRAIN_FUA	= QUEUE_ORDERED_DRAIN |
+			QUEUE_ORDERED_PREFLUSH | QUEUE_ORDERED_FUA,
+	QUEUE_ORDERED_TAG_FLUSH	= QUEUE_ORDERED_TAG |
+			QUEUE_ORDERED_PREFLUSH | QUEUE_ORDERED_POSTFLUSH,
+	QUEUE_ORDERED_TAG_FUA	= QUEUE_ORDERED_TAG |
+			QUEUE_ORDERED_PREFLUSH | QUEUE_ORDERED_FUA,
+
+	/*
+	 * Ordered operation sequence
+	 */
+	QUEUE_ORDSEQ_STARTED	= 0x01,	/* flushing in progress */
+	QUEUE_ORDSEQ_DRAIN	= 0x02,	/* waiting for the queue to be drained */
+	QUEUE_ORDSEQ_PREFLUSH	= 0x04,	/* pre-flushing in progress */
+	QUEUE_ORDSEQ_BAR	= 0x08,	/* original barrier req in progress */
+	QUEUE_ORDSEQ_POSTFLUSH	= 0x10,	/* post-flushing in progress */
+	QUEUE_ORDSEQ_DONE	= 0x20,
+};
 
 #define blk_queue_plugged(q)	test_bit(QUEUE_FLAG_PLUGGED, &(q)->queue_flags)
 #define blk_queue_tagged(q)	test_bit(QUEUE_FLAG_QUEUED, &(q)->queue_flags)
 #define blk_queue_stopped(q)	test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags)
-#define blk_queue_flushing(q)	test_bit(QUEUE_FLAG_FLUSH, &(q)->queue_flags)
+#define blk_queue_flushing(q)	((q)->ordseq)
 
 #define blk_fs_request(rq)	((rq)->flags & REQ_CMD)
 #define blk_pc_request(rq)	((rq)->flags & REQ_BLOCK_PC)
@@ -457,8 +489,7 @@ enum {
 
 #define blk_sorted_rq(rq)	((rq)->flags & REQ_SORTED)
 #define blk_barrier_rq(rq)	((rq)->flags & REQ_HARDBARRIER)
-#define blk_barrier_preflush(rq)	((rq)->flags & REQ_BAR_PREFLUSH)
-#define blk_barrier_postflush(rq)	((rq)->flags & REQ_BAR_POSTFLUSH)
+#define blk_fua_rq(rq)		((rq)->flags & REQ_FUA)
 
 #define list_entry_rq(ptr)	list_entry((ptr), struct request, queuelist)
 
@@ -635,12 +666,14 @@ extern void blk_queue_prep_rq(request_qu
 extern void blk_queue_merge_bvec(request_queue_t *, merge_bvec_fn *);
 extern void blk_queue_dma_alignment(request_queue_t *, int);
 extern struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev);
-extern void blk_queue_ordered(request_queue_t *, int);
+extern int blk_queue_ordered(request_queue_t *, unsigned, prepare_flush_fn *, unsigned);
+extern int blk_queue_ordered_locked(request_queue_t *, unsigned, prepare_flush_fn *, unsigned);
 extern void blk_queue_issue_flush_fn(request_queue_t *, issue_flush_fn *);
 extern int blkdev_scsi_issue_flush_fn(request_queue_t *, struct gendisk *, sector_t *);
-extern struct request *blk_start_pre_flush(request_queue_t *,struct request *);
-extern int blk_complete_barrier_rq(request_queue_t *, struct request *, int);
-extern int blk_complete_barrier_rq_locked(request_queue_t *, struct request *, int);
+extern int blk_do_ordered(request_queue_t *, struct request **);
+extern unsigned blk_ordered_cur_seq(request_queue_t *);
+extern unsigned blk_ordered_req_seq(struct request *);
+extern void blk_ordered_complete_seq(request_queue_t *, unsigned, int);
 
 extern int blk_rq_map_sg(request_queue_t *, struct request *, struct scatterlist *);
 extern void blk_dump_rq_flags(struct request *, char *);
Index: blk-fixes/include/linux/elevator.h
===================================================================
--- blk-fixes.orig/include/linux/elevator.h	2005-07-27 00:44:49.000000000 +0900
+++ blk-fixes/include/linux/elevator.h	2005-07-27 00:44:51.000000000 +0900
@@ -130,6 +130,7 @@ extern int elv_try_last_merge(request_qu
 #define ELEVATOR_INSERT_FRONT	1
 #define ELEVATOR_INSERT_BACK	2
 #define ELEVATOR_INSERT_SORT	3
+#define ELEVATOR_INSERT_REQUEUE	4
 
 /*
  * return values from elevator_may_queue_fn


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH linux-2.6-block:master 04/10] blk: update SCSI to use new blk_ordered
  2005-07-26 15:45 [PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier Tejun Heo
                   ` (2 preceding siblings ...)
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 03/10] blk: reimplement handling of barrier request Tejun Heo
@ 2005-07-26 15:46 ` Tejun Heo
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 05/10] blk: add FUA support to SCSI disk Tejun Heo
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2005-07-26 15:46 UTC (permalink / raw)
  To: axboe, jgarzik, James.Bottomley, bzolnier; +Cc: linux-kernel

04_blk_scsi-update-ordered.patch

	All ordered request related stuff delegated to HLD.  Midlayer
        now doens't deal with ordered setting or prepare_flush
        callback.  sd.c updated to deal with blk_queue_ordered
        setting.  Currently, ordered tag isn't used as SCSI midlayer
        cannot guarantee request ordering.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 drivers/scsi/hosts.c       |    9 ------
 drivers/scsi/scsi_lib.c    |   46 -------------------------------
 drivers/scsi/sd.c          |   66 ++++++++++++++++-----------------------------
 include/scsi/scsi_driver.h |    1 
 include/scsi/scsi_host.h   |    1 
 5 files changed, 24 insertions(+), 99 deletions(-)

Index: blk-fixes/drivers/scsi/hosts.c
===================================================================
--- blk-fixes.orig/drivers/scsi/hosts.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/scsi/hosts.c	2005-07-27 00:44:51.000000000 +0900
@@ -264,17 +264,8 @@ struct Scsi_Host *scsi_host_alloc(struct
 	shost->cmd_per_lun = sht->cmd_per_lun;
 	shost->unchecked_isa_dma = sht->unchecked_isa_dma;
 	shost->use_clustering = sht->use_clustering;
-	shost->ordered_flush = sht->ordered_flush;
 	shost->ordered_tag = sht->ordered_tag;
 
-	/*
-	 * hosts/devices that do queueing must support ordered tags
-	 */
-	if (shost->can_queue > 1 && shost->ordered_flush) {
-		printk(KERN_ERR "scsi: ordered flushes don't support queueing\n");
-		shost->ordered_flush = 0;
-	}
-
 	if (sht->max_host_blocked)
 		shost->max_host_blocked = sht->max_host_blocked;
 	else
Index: blk-fixes/drivers/scsi/scsi_lib.c
===================================================================
--- blk-fixes.orig/drivers/scsi/scsi_lib.c	2005-07-27 00:44:50.000000000 +0900
+++ blk-fixes/drivers/scsi/scsi_lib.c	2005-07-27 00:44:51.000000000 +0900
@@ -717,9 +717,6 @@ void scsi_io_completion(struct scsi_cmnd
 	int sense_valid = 0;
 	int sense_deferred = 0;
 
-	if (blk_complete_barrier_rq(q, req, good_bytes >> 9))
-		return;
-
 	/*
 	 * Free up any indirection buffers we allocated for DMA purposes. 
 	 * For the case of a READ, we need to copy the data out of the
@@ -983,38 +980,6 @@ static int scsi_init_io(struct scsi_cmnd
 	return BLKPREP_KILL;
 }
 
-static int scsi_prepare_flush_fn(request_queue_t *q, struct request *rq)
-{
-	struct scsi_device *sdev = q->queuedata;
-	struct scsi_driver *drv;
-
-	if (sdev->sdev_state == SDEV_RUNNING) {
-		drv = *(struct scsi_driver **) rq->rq_disk->private_data;
-
-		if (drv->prepare_flush)
-			return drv->prepare_flush(q, rq);
-	}
-
-	return 0;
-}
-
-static void scsi_end_flush_fn(request_queue_t *q, struct request *rq)
-{
-	struct scsi_device *sdev = q->queuedata;
-	struct request *flush_rq = rq->end_io_data;
-	struct scsi_driver *drv;
-
-	if (flush_rq->errors) {
-		printk("scsi: barrier error, disabling flush support\n");
-		blk_queue_ordered(q, QUEUE_ORDERED_NONE);
-	}
-
-	if (sdev->sdev_state == SDEV_RUNNING) {
-		drv = *(struct scsi_driver **) rq->rq_disk->private_data;
-		drv->end_flush(q, rq);
-	}
-}
-
 static int scsi_issue_flush_fn(request_queue_t *q, struct gendisk *disk,
 			       sector_t *error_sector)
 {
@@ -1442,17 +1407,6 @@ struct request_queue *scsi_alloc_queue(s
 	blk_queue_segment_boundary(q, shost->dma_boundary);
 	blk_queue_issue_flush_fn(q, scsi_issue_flush_fn);
 
-	/*
-	 * ordered tags are superior to flush ordering
-	 */
-	if (shost->ordered_tag)
-		blk_queue_ordered(q, QUEUE_ORDERED_TAG);
-	else if (shost->ordered_flush) {
-		blk_queue_ordered(q, QUEUE_ORDERED_FLUSH);
-		q->prepare_flush_fn = scsi_prepare_flush_fn;
-		q->end_flush_fn = scsi_end_flush_fn;
-	}
-
 	if (!shost->use_clustering)
 		clear_bit(QUEUE_FLAG_CLUSTER, &q->queue_flags);
 	return q;
Index: blk-fixes/drivers/scsi/sd.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sd.c	2005-07-27 00:44:50.000000000 +0900
+++ blk-fixes/drivers/scsi/sd.c	2005-07-27 00:44:51.000000000 +0900
@@ -122,8 +122,7 @@ static void sd_shutdown(struct device *d
 static void sd_rescan(struct device *);
 static int sd_init_command(struct scsi_cmnd *);
 static int sd_issue_flush(struct device *, sector_t *);
-static void sd_end_flush(request_queue_t *, struct request *);
-static int sd_prepare_flush(request_queue_t *, struct request *);
+static void sd_prepare_flush(request_queue_t *, struct request *);
 static void sd_read_capacity(struct scsi_disk *sdkp, char *diskname,
 		 struct scsi_request *SRpnt, unsigned char *buffer);
 
@@ -138,8 +137,6 @@ static struct scsi_driver sd_template = 
 	.rescan			= sd_rescan,
 	.init_command		= sd_init_command,
 	.issue_flush		= sd_issue_flush,
-	.prepare_flush		= sd_prepare_flush,
-	.end_flush		= sd_end_flush,
 };
 
 /*
@@ -739,43 +736,12 @@ static int sd_issue_flush(struct device 
 	return sd_sync_cache(sdp);
 }
 
-static void sd_end_flush(request_queue_t *q, struct request *flush_rq)
+static void sd_prepare_flush(request_queue_t *q, struct request *rq)
 {
-	struct request *rq = flush_rq->end_io_data;
-	struct scsi_cmnd *cmd = rq->special;
-	unsigned int bytes = rq->hard_nr_sectors << 9;
-
-	if (!flush_rq->errors) {
-		spin_unlock(q->queue_lock);
-		scsi_io_completion(cmd, bytes, 0);
-		spin_lock(q->queue_lock);
-	} else if (blk_barrier_postflush(rq)) {
-		spin_unlock(q->queue_lock);
-		scsi_io_completion(cmd, 0, bytes);
-		spin_lock(q->queue_lock);
-	} else {
-		/*
-		 * force journal abort of barriers
-		 */
-		end_that_request_first(rq, -EOPNOTSUPP, rq->hard_nr_sectors);
-		end_that_request_last(rq, -EOPNOTSUPP);
-	}
-}
-
-static int sd_prepare_flush(request_queue_t *q, struct request *rq)
-{
-	struct scsi_device *sdev = q->queuedata;
-	struct scsi_disk *sdkp = dev_get_drvdata(&sdev->sdev_gendev);
-
-	if (sdkp->WCE) {
-		memset(rq->cmd, 0, sizeof(rq->cmd));
-		rq->flags |= REQ_BLOCK_PC | REQ_SOFTBARRIER;
-		rq->timeout = SD_TIMEOUT;
-		rq->cmd[0] = SYNCHRONIZE_CACHE;
-		return 1;
-	}
-
-	return 0;
+	memset(rq->cmd, 0, sizeof(rq->cmd));
+	rq->flags |= REQ_BLOCK_PC;
+	rq->timeout = SD_TIMEOUT;
+	rq->cmd[0] = SYNCHRONIZE_CACHE;
 }
 
 static void sd_rescan(struct device *dev)
@@ -1469,6 +1435,7 @@ static int sd_revalidate_disk(struct gen
 	struct scsi_device *sdp = sdkp->device;
 	struct scsi_request *sreq;
 	unsigned char *buffer;
+	unsigned ordered;
 
 	SCSI_LOG_HLQUEUE(3, printk("sd_revalidate_disk: disk=%s\n", disk->disk_name));
 
@@ -1514,7 +1481,21 @@ static int sd_revalidate_disk(struct gen
 					sreq, buffer);
 		sd_read_cache_type(sdkp, disk->disk_name, sreq, buffer);
 	}
-		
+
+	/*
+	 * We now have all cache related info, determine how we deal
+	 * with ordered requests.  Note that as the current SCSI
+	 * dispatch function can alter request order, we cannot use
+	 * QUEUE_ORDERED_TAG_* even when ordered tag is supported.
+	 */
+	if (sdkp->WCE)
+		ordered = QUEUE_ORDERED_DRAIN_FLUSH;
+	else
+		ordered = QUEUE_ORDERED_DRAIN;
+
+	blk_queue_ordered(sdkp->disk->queue, ordered, sd_prepare_flush,
+			  GFP_KERNEL);
+
 	set_capacity(disk, sdkp->capacity);
 	kfree(buffer);
 
@@ -1615,6 +1596,7 @@ static int sd_probe(struct device *dev)
 	strcpy(gd->devfs_name, sdp->devfs_name);
 
 	gd->private_data = &sdkp->driver;
+	gd->queue = sdkp->device->request_queue;
 
 	sd_revalidate_disk(gd);
 
@@ -1622,7 +1604,6 @@ static int sd_probe(struct device *dev)
 	gd->flags = GENHD_FL_DRIVERFS;
 	if (sdp->removable)
 		gd->flags |= GENHD_FL_REMOVABLE;
-	gd->queue = sdkp->device->request_queue;
 
 	dev_set_drvdata(dev, sdkp);
 	add_disk(gd);
@@ -1657,6 +1638,7 @@ static int sd_remove(struct device *dev)
 {
 	struct scsi_disk *sdkp = dev_get_drvdata(dev);
 
+	blk_queue_ordered(sdkp->disk->queue, QUEUE_ORDERED_NONE, NULL, 0);
 	del_gendisk(sdkp->disk);
 	sd_shutdown(dev);
 	down(&sd_ref_sem);
Index: blk-fixes/include/scsi/scsi_driver.h
===================================================================
--- blk-fixes.orig/include/scsi/scsi_driver.h	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/include/scsi/scsi_driver.h	2005-07-27 00:44:51.000000000 +0900
@@ -15,7 +15,6 @@ struct scsi_driver {
 	void (*rescan)(struct device *);
 	int (*issue_flush)(struct device *, sector_t *);
 	int (*prepare_flush)(struct request_queue *, struct request *);
-	void (*end_flush)(struct request_queue *, struct request *);
 };
 #define to_scsi_driver(drv) \
 	container_of((drv), struct scsi_driver, gendrv)
Index: blk-fixes/include/scsi/scsi_host.h
===================================================================
--- blk-fixes.orig/include/scsi/scsi_host.h	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/include/scsi/scsi_host.h	2005-07-27 00:44:51.000000000 +0900
@@ -391,7 +391,6 @@ struct scsi_host_template {
 	/*
 	 * ordered write support
 	 */
-	unsigned ordered_flush:1;
 	unsigned ordered_tag:1;
 
 	/*


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH linux-2.6-block:master 05/10] blk: add FUA support to SCSI disk
  2005-07-26 15:45 [PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier Tejun Heo
                   ` (3 preceding siblings ...)
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 04/10] blk: update SCSI to use new blk_ordered Tejun Heo
@ 2005-07-26 15:46 ` Tejun Heo
  2005-07-26 15:55   ` Jeff Garzik
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 06/10] blk: update libata to use new blk_ordered Tejun Heo
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2005-07-26 15:46 UTC (permalink / raw)
  To: axboe, jgarzik, James.Bottomley, bzolnier; +Cc: linux-kernel

05_blk_scsi-add-fua-support.patch

	Add FUA support to SCSI disk.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 sd.c |   29 ++++++++++++++++++++++++++---
 1 files changed, 26 insertions(+), 3 deletions(-)

Index: blk-fixes/drivers/scsi/sd.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sd.c	2005-07-27 00:44:51.000000000 +0900
+++ blk-fixes/drivers/scsi/sd.c	2005-07-27 00:44:51.000000000 +0900
@@ -103,6 +103,7 @@ struct scsi_disk {
 	u8		write_prot;
 	unsigned	WCE : 1;	/* state of disk WCE bit */
 	unsigned	RCD : 1;	/* state of disk RCD bit, unused */
+	unsigned	DPOFUA : 1;	/* state of disk DPOFUA bit */
 };
 
 static DEFINE_IDR(sd_index_idr);
@@ -343,6 +344,7 @@ static int sd_init_command(struct scsi_c
 	
 	if (block > 0xffffffff) {
 		SCpnt->cmnd[0] += READ_16 - READ_6;
+		SCpnt->cmnd[1] |= blk_fua_rq(rq) ? 0x8 : 0;
 		SCpnt->cmnd[2] = sizeof(block) > 4 ? (unsigned char) (block >> 56) & 0xff : 0;
 		SCpnt->cmnd[3] = sizeof(block) > 4 ? (unsigned char) (block >> 48) & 0xff : 0;
 		SCpnt->cmnd[4] = sizeof(block) > 4 ? (unsigned char) (block >> 40) & 0xff : 0;
@@ -362,6 +364,7 @@ static int sd_init_command(struct scsi_c
 			this_count = 0xffff;
 
 		SCpnt->cmnd[0] += READ_10 - READ_6;
+		SCpnt->cmnd[1] |= blk_fua_rq(rq) ? 0x8 : 0;
 		SCpnt->cmnd[2] = (unsigned char) (block >> 24) & 0xff;
 		SCpnt->cmnd[3] = (unsigned char) (block >> 16) & 0xff;
 		SCpnt->cmnd[4] = (unsigned char) (block >> 8) & 0xff;
@@ -370,6 +373,17 @@ static int sd_init_command(struct scsi_c
 		SCpnt->cmnd[7] = (unsigned char) (this_count >> 8) & 0xff;
 		SCpnt->cmnd[8] = (unsigned char) this_count & 0xff;
 	} else {
+		if (unlikely(blk_fua_rq(rq))) {
+			/*
+			 * This happens only if this drive failed
+			 * 10byte rw command with ILLEGAL_REQUEST
+			 * during operation and thus turned off
+			 * use_10_for_rw.
+			 */
+			printk(KERN_ERR "sd: FUA write on READ/WRITE(6) drive\n");
+			return 0;
+		}
+
 		if (this_count > 0xff)
 			this_count = 0xff;
 
@@ -1399,10 +1413,18 @@ sd_read_cache_type(struct scsi_disk *sdk
 			sdkp->RCD = 0;
 		}
 
+		sdkp->DPOFUA = (data.device_specific & 0x10) != 0;
+		if (sdkp->DPOFUA && !sdkp->device->use_10_for_rw) {
+			printk(KERN_NOTICE "SCSI device %s: uses "
+			       "READ/WRITE(6), disabling FUA\n", diskname);
+			sdkp->DPOFUA = 0;
+		}
+
 		ct =  sdkp->RCD + 2*sdkp->WCE;
 
-		printk(KERN_NOTICE "SCSI device %s: drive cache: %s\n",
-		       diskname, types[ct]);
+		printk(KERN_NOTICE "SCSI device %s: drive cache: %s%s\n",
+		       diskname, types[ct],
+		       sdkp->DPOFUA ? " w/ FUA" : "");
 
 		return;
 	}
@@ -1489,7 +1511,8 @@ static int sd_revalidate_disk(struct gen
 	 * QUEUE_ORDERED_TAG_* even when ordered tag is supported.
 	 */
 	if (sdkp->WCE)
-		ordered = QUEUE_ORDERED_DRAIN_FLUSH;
+		ordered = sdkp->DPOFUA
+			? QUEUE_ORDERED_DRAIN_FUA : QUEUE_ORDERED_DRAIN_FLUSH;
 	else
 		ordered = QUEUE_ORDERED_DRAIN;
 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH linux-2.6-block:master 05/10] blk: add FUA support to SCSI disk
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 05/10] blk: add FUA support to SCSI disk Tejun Heo
@ 2005-07-26 15:55   ` Jeff Garzik
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Garzik @ 2005-07-26 15:55 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, James.Bottomley, bzolnier, linux-kernel

Tejun Heo wrote:
> 05_blk_scsi-add-fua-support.patch
> 
> 	Add FUA support to SCSI disk.

ACK



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH linux-2.6-block:master 06/10] blk: update libata to use new blk_ordered
  2005-07-26 15:45 [PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier Tejun Heo
                   ` (4 preceding siblings ...)
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 05/10] blk: add FUA support to SCSI disk Tejun Heo
@ 2005-07-26 15:46 ` Tejun Heo
  2005-07-26 15:55   ` Jeff Garzik
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 07/10] blk: add FUA support to libata Tejun Heo
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2005-07-26 15:46 UTC (permalink / raw)
  To: axboe, jgarzik, James.Bottomley, bzolnier; +Cc: linux-kernel

06_blk_libata-update-ordered.patch

	Reflect changes in SCSI midlayer and updated to use new
        ordered request implementation

Signed-off-by: Tejun Heo <htejun@gmail.com>

 ahci.c         |    1 -
 ata_piix.c     |    1 -
 sata_nv.c      |    1 -
 sata_promise.c |    1 -
 sata_sil.c     |    1 -
 sata_sis.c     |    1 -
 sata_svw.c     |    1 -
 sata_sx4.c     |    1 -
 sata_uli.c     |    1 -
 sata_via.c     |    1 -
 sata_vsc.c     |    1 -
 11 files changed, 11 deletions(-)

Index: blk-fixes/drivers/scsi/ahci.c
===================================================================
--- blk-fixes.orig/drivers/scsi/ahci.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/scsi/ahci.c	2005-07-27 00:44:51.000000000 +0900
@@ -206,7 +206,6 @@ static Scsi_Host_Template ahci_sht = {
 	.dma_boundary		= AHCI_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations ahci_ops = {
Index: blk-fixes/drivers/scsi/ata_piix.c
===================================================================
--- blk-fixes.orig/drivers/scsi/ata_piix.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/scsi/ata_piix.c	2005-07-27 00:44:51.000000000 +0900
@@ -123,7 +123,6 @@ static Scsi_Host_Template piix_sht = {
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations piix_pata_ops = {
Index: blk-fixes/drivers/scsi/sata_nv.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_nv.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_nv.c	2005-07-27 00:44:51.000000000 +0900
@@ -206,7 +206,6 @@ static Scsi_Host_Template nv_sht = {
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations nv_ops = {
Index: blk-fixes/drivers/scsi/sata_promise.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_promise.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_promise.c	2005-07-27 00:44:51.000000000 +0900
@@ -103,7 +103,6 @@ static Scsi_Host_Template pdc_ata_sht = 
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations pdc_ata_ops = {
Index: blk-fixes/drivers/scsi/sata_sil.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_sil.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_sil.c	2005-07-27 00:44:51.000000000 +0900
@@ -135,7 +135,6 @@ static Scsi_Host_Template sil_sht = {
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations sil_ops = {
Index: blk-fixes/drivers/scsi/sata_sis.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_sis.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_sis.c	2005-07-27 00:44:51.000000000 +0900
@@ -90,7 +90,6 @@ static Scsi_Host_Template sis_sht = {
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations sis_ops = {
Index: blk-fixes/drivers/scsi/sata_svw.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_svw.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_svw.c	2005-07-27 00:44:51.000000000 +0900
@@ -288,7 +288,6 @@ static Scsi_Host_Template k2_sata_sht = 
 	.proc_info		= k2_sata_proc_info,
 #endif
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 
Index: blk-fixes/drivers/scsi/sata_sx4.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_sx4.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_sx4.c	2005-07-27 00:44:51.000000000 +0900
@@ -188,7 +188,6 @@ static Scsi_Host_Template pdc_sata_sht =
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations pdc_20621_ops = {
Index: blk-fixes/drivers/scsi/sata_uli.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_uli.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_uli.c	2005-07-27 00:44:51.000000000 +0900
@@ -82,7 +82,6 @@ static Scsi_Host_Template uli_sht = {
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations uli_ops = {
Index: blk-fixes/drivers/scsi/sata_via.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_via.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_via.c	2005-07-27 00:44:51.000000000 +0900
@@ -102,7 +102,6 @@ static Scsi_Host_Template svia_sht = {
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations svia_sata_ops = {
Index: blk-fixes/drivers/scsi/sata_vsc.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_vsc.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_vsc.c	2005-07-27 00:44:51.000000000 +0900
@@ -206,7 +206,6 @@ static Scsi_Host_Template vsc_sata_sht =
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH linux-2.6-block:master 06/10] blk: update libata to use new blk_ordered
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 06/10] blk: update libata to use new blk_ordered Tejun Heo
@ 2005-07-26 15:55   ` Jeff Garzik
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Garzik @ 2005-07-26 15:55 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, James.Bottomley, bzolnier, linux-kernel

Tejun Heo wrote:
> 06_blk_libata-update-ordered.patch
> 
> 	Reflect changes in SCSI midlayer and updated to use new
>         ordered request implementation

ACK



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH linux-2.6-block:master 07/10] blk: add FUA support to libata
  2005-07-26 15:45 [PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier Tejun Heo
                   ` (5 preceding siblings ...)
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 06/10] blk: update libata to use new blk_ordered Tejun Heo
@ 2005-07-26 15:46 ` Tejun Heo
  2005-07-26 15:55   ` Jeff Garzik
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 08/10] blk: update IDE to use new blk_ordered Tejun Heo
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2005-07-26 15:46 UTC (permalink / raw)
  To: axboe, jgarzik, James.Bottomley, bzolnier; +Cc: linux-kernel

07_blk_libata-add-fua-support.patch

	Add FUA support to libata.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 drivers/scsi/libata-core.c |   14 +++++++++-----
 drivers/scsi/libata-scsi.c |   26 ++++++++++++++++++++++++--
 include/linux/ata.h        |    4 +++-
 include/linux/libata.h     |    1 +
 4 files changed, 37 insertions(+), 8 deletions(-)

Index: blk-fixes/drivers/scsi/libata-core.c
===================================================================
--- blk-fixes.orig/drivers/scsi/libata-core.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/scsi/libata-core.c	2005-07-27 00:44:52.000000000 +0900
@@ -602,19 +602,21 @@ void ata_tf_from_fis(u8 *fis, struct ata
 }
 
 /**
- *	ata_prot_to_cmd - determine which read/write opcodes to use
+ *	ata_prot_to_cmd - determine which read/write/fua-write opcodes to use
  *	@protocol: ATA_PROT_xxx taskfile protocol
  *	@lba48: true is lba48 is present
  *
- *	Given necessary input, determine which read/write commands
- *	to use to transfer data.
+ *	Given necessary input, determine which read/write/fua-write
+ *	commands to use to transfer data.  Note that we only support
+ *	fua-writes on DMA LBA48 protocol.  In other cases, we simply
+ *	return 0 which is NOP.
  *
  *	LOCKING:
  *	None.
  */
 static int ata_prot_to_cmd(int protocol, int lba48)
 {
-	int rcmd = 0, wcmd = 0;
+	int rcmd = 0, wcmd = 0, wfua = 0;
 
 	switch (protocol) {
 	case ATA_PROT_PIO:
@@ -631,6 +633,7 @@ static int ata_prot_to_cmd(int protocol,
 		if (lba48) {
 			rcmd = ATA_CMD_READ_EXT;
 			wcmd = ATA_CMD_WRITE_EXT;
+			wfua = ATA_CMD_WRITE_FUA_EXT;
 		} else {
 			rcmd = ATA_CMD_READ;
 			wcmd = ATA_CMD_WRITE;
@@ -641,7 +644,7 @@ static int ata_prot_to_cmd(int protocol,
 		return -1;
 	}
 
-	return rcmd | (wcmd << 8);
+	return rcmd | (wcmd << 8) | (wfua << 16);
 }
 
 /**
@@ -674,6 +677,7 @@ static void ata_dev_set_protocol(struct 
 
 	dev->read_cmd = cmd & 0xff;
 	dev->write_cmd = (cmd >> 8) & 0xff;
+	dev->write_fua_cmd = (cmd >> 16) & 0xff;
 }
 
 static const char * xfer_mode_str[] = {
Index: blk-fixes/include/linux/ata.h
===================================================================
--- blk-fixes.orig/include/linux/ata.h	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/include/linux/ata.h	2005-07-27 00:44:52.000000000 +0900
@@ -117,6 +117,7 @@ enum {
 	ATA_CMD_READ_EXT	= 0x25,
 	ATA_CMD_WRITE		= 0xCA,
 	ATA_CMD_WRITE_EXT	= 0x35,
+	ATA_CMD_WRITE_FUA_EXT	= 0x3D,
 	ATA_CMD_PIO_READ	= 0x20,
 	ATA_CMD_PIO_READ_EXT	= 0x24,
 	ATA_CMD_PIO_WRITE	= 0x30,
@@ -227,7 +228,8 @@ struct ata_taskfile {
 #define ata_id_is_sata(id)	((id)[93] == 0)
 #define ata_id_rahead_enabled(id) ((id)[85] & (1 << 6))
 #define ata_id_wcache_enabled(id) ((id)[85] & (1 << 5))
-#define ata_id_has_flush(id) ((id)[83] & (1 << 12))
+#define ata_id_has_fua(id)	((id)[84] & (1 << 6))
+#define ata_id_has_flush(id)	((id)[83] & (1 << 12))
 #define ata_id_has_flush_ext(id) ((id)[83] & (1 << 13))
 #define ata_id_has_lba48(id)	((id)[83] & (1 << 10))
 #define ata_id_has_wcache(id)	((id)[82] & (1 << 5))
Index: blk-fixes/include/linux/libata.h
===================================================================
--- blk-fixes.orig/include/linux/libata.h	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/include/linux/libata.h	2005-07-27 00:44:52.000000000 +0900
@@ -278,6 +278,7 @@ struct ata_device {
 	u8			xfer_protocol;	/* taskfile xfer protocol */
 	u8			read_cmd;	/* opcode to use on read */
 	u8			write_cmd;	/* opcode to use on write */
+	u8			write_fua_cmd;	/* opcode to use on FUA write */
 };
 
 struct ata_port {
Index: blk-fixes/drivers/scsi/libata-scsi.c
===================================================================
--- blk-fixes.orig/drivers/scsi/libata-scsi.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/scsi/libata-scsi.c	2005-07-27 00:44:52.000000000 +0900
@@ -537,6 +537,7 @@ static unsigned int ata_scsi_rw_xlat(str
 {
 	struct ata_taskfile *tf = &qc->tf;
 	unsigned int lba48 = tf->flags & ATA_TFLAG_LBA48;
+	int fua = scsicmd[1] & (1 << 3);
 
 	tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE;
 	tf->protocol = qc->dev->xfer_protocol;
@@ -544,9 +545,28 @@ static unsigned int ata_scsi_rw_xlat(str
 
 	if (scsicmd[0] == READ_10 || scsicmd[0] == READ_6 ||
 	    scsicmd[0] == READ_16) {
-		tf->command = qc->dev->read_cmd;
+		if (!fua) {
+			tf->command = qc->dev->read_cmd;
+		} else {
+			printk(KERN_WARNING
+			       "ata%u(%u): WARNING: FUA READ unsupported\n",
+			       qc->ap->id, qc->dev->devno);
+			return 1;
+		}
 	} else {
-		tf->command = qc->dev->write_cmd;
+		if (!fua) {
+			tf->command = qc->dev->write_cmd;
+		} else {
+			if (qc->dev->write_fua_cmd == 0 || !lba48) {
+				printk(KERN_WARNING
+				       "ata%u(%u): WARNING: FUA WRITE "
+				       "unsupported with the current "
+				       "protocol/addressing\n",
+				       qc->ap->id, qc->dev->devno);
+				return 1;
+			}
+			tf->command = qc->dev->write_fua_cmd;
+		}
 		tf->flags |= ATA_TFLAG_WRITE;
 	}
 
@@ -1141,10 +1161,12 @@ unsigned int ata_scsiop_mode_sense(struc
 	if (six_byte) {
 		output_len--;
 		rbuf[0] = output_len;
+		rbuf[2] |= ata_id_has_fua(args->id) ? 1 << 4 : 0;
 	} else {
 		output_len -= 2;
 		rbuf[0] = output_len >> 8;
 		rbuf[1] = output_len;
+		rbuf[3] |= ata_id_has_fua(args->id) ? 1 << 4 : 0;
 	}
 
 	return 0;


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH linux-2.6-block:master 07/10] blk: add FUA support to libata
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 07/10] blk: add FUA support to libata Tejun Heo
@ 2005-07-26 15:55   ` Jeff Garzik
  2005-07-27  7:44     ` Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Garzik @ 2005-07-26 15:55 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, James.Bottomley, bzolnier, linux-kernel

Tejun Heo wrote:
> 07_blk_libata-add-fua-support.patch
> 
> 	Add FUA support to libata.

NAK -- doesn't appear to take into account that read/write(6) don't 
support FUA.

Correct me if I'm wrong.

Otherwise, looks OK.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH linux-2.6-block:master 07/10] blk: add FUA support to libata
  2005-07-26 15:55   ` Jeff Garzik
@ 2005-07-27  7:44     ` Tejun Heo
  0 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2005-07-27  7:44 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: axboe, James.Bottomley, bzolnier, linux-kernel

On Tue, Jul 26, 2005 at 11:55:50AM -0400, Jeff Garzik wrote:
> Tejun Heo wrote:
> >07_blk_libata-add-fua-support.patch
> >
> >	Add FUA support to libata.
> 
> NAK -- doesn't appear to take into account that read/write(6) don't 
> support FUA.
> 
> Correct me if I'm wrong.
> 
> Otherwise, looks OK.
> 

 Hello, Jeff.

 I'm sorry.  It's my bad.  Here's the corrected one.  Thanks for
pointing out.

 As this patch is the last one which modifies libata, this changes
does not affect other patches in this patchset.  Just ignoring the
original one and applying this one should suffice.

Signed-off-by: Tejun Heo <htejun@gmail.com>

Index: blk-fixes/drivers/scsi/libata-core.c
===================================================================
--- blk-fixes.orig/drivers/scsi/libata-core.c	2005-07-27 15:53:29.000000000 +0900
+++ blk-fixes/drivers/scsi/libata-core.c	2005-07-27 15:53:31.000000000 +0900
@@ -602,19 +602,21 @@ void ata_tf_from_fis(u8 *fis, struct ata
 }
 
 /**
- *	ata_prot_to_cmd - determine which read/write opcodes to use
+ *	ata_prot_to_cmd - determine which read/write/fua-write opcodes to use
  *	@protocol: ATA_PROT_xxx taskfile protocol
  *	@lba48: true is lba48 is present
  *
- *	Given necessary input, determine which read/write commands
- *	to use to transfer data.
+ *	Given necessary input, determine which read/write/fua-write
+ *	commands to use to transfer data.  Note that we only support
+ *	fua-writes on DMA LBA48 protocol.  In other cases, we simply
+ *	return 0 which is NOP.
  *
  *	LOCKING:
  *	None.
  */
 static int ata_prot_to_cmd(int protocol, int lba48)
 {
-	int rcmd = 0, wcmd = 0;
+	int rcmd = 0, wcmd = 0, wfua = 0;
 
 	switch (protocol) {
 	case ATA_PROT_PIO:
@@ -631,6 +633,7 @@ static int ata_prot_to_cmd(int protocol,
 		if (lba48) {
 			rcmd = ATA_CMD_READ_EXT;
 			wcmd = ATA_CMD_WRITE_EXT;
+			wfua = ATA_CMD_WRITE_FUA_EXT;
 		} else {
 			rcmd = ATA_CMD_READ;
 			wcmd = ATA_CMD_WRITE;
@@ -641,7 +644,7 @@ static int ata_prot_to_cmd(int protocol,
 		return -1;
 	}
 
-	return rcmd | (wcmd << 8);
+	return rcmd | (wcmd << 8) | (wfua << 16);
 }
 
 /**
@@ -674,6 +677,7 @@ static void ata_dev_set_protocol(struct 
 
 	dev->read_cmd = cmd & 0xff;
 	dev->write_cmd = (cmd >> 8) & 0xff;
+	dev->write_fua_cmd = (cmd >> 16) & 0xff;
 }
 
 static const char * xfer_mode_str[] = {
Index: blk-fixes/include/linux/ata.h
===================================================================
--- blk-fixes.orig/include/linux/ata.h	2005-07-27 15:53:29.000000000 +0900
+++ blk-fixes/include/linux/ata.h	2005-07-27 15:53:31.000000000 +0900
@@ -117,6 +117,7 @@ enum {
 	ATA_CMD_READ_EXT	= 0x25,
 	ATA_CMD_WRITE		= 0xCA,
 	ATA_CMD_WRITE_EXT	= 0x35,
+	ATA_CMD_WRITE_FUA_EXT	= 0x3D,
 	ATA_CMD_PIO_READ	= 0x20,
 	ATA_CMD_PIO_READ_EXT	= 0x24,
 	ATA_CMD_PIO_WRITE	= 0x30,
@@ -227,7 +228,8 @@ struct ata_taskfile {
 #define ata_id_is_sata(id)	((id)[93] == 0)
 #define ata_id_rahead_enabled(id) ((id)[85] & (1 << 6))
 #define ata_id_wcache_enabled(id) ((id)[85] & (1 << 5))
-#define ata_id_has_flush(id) ((id)[83] & (1 << 12))
+#define ata_id_has_fua(id)	((id)[84] & (1 << 6))
+#define ata_id_has_flush(id)	((id)[83] & (1 << 12))
 #define ata_id_has_flush_ext(id) ((id)[83] & (1 << 13))
 #define ata_id_has_lba48(id)	((id)[83] & (1 << 10))
 #define ata_id_has_wcache(id)	((id)[82] & (1 << 5))
Index: blk-fixes/include/linux/libata.h
===================================================================
--- blk-fixes.orig/include/linux/libata.h	2005-07-27 15:53:29.000000000 +0900
+++ blk-fixes/include/linux/libata.h	2005-07-27 15:53:31.000000000 +0900
@@ -278,6 +278,7 @@ struct ata_device {
 	u8			xfer_protocol;	/* taskfile xfer protocol */
 	u8			read_cmd;	/* opcode to use on read */
 	u8			write_cmd;	/* opcode to use on write */
+	u8			write_fua_cmd;	/* opcode to use on FUA write */
 };
 
 struct ata_port {
Index: blk-fixes/drivers/scsi/libata-scsi.c
===================================================================
--- blk-fixes.orig/drivers/scsi/libata-scsi.c	2005-07-27 15:53:29.000000000 +0900
+++ blk-fixes/drivers/scsi/libata-scsi.c	2005-07-27 16:18:45.000000000 +0900
@@ -542,12 +542,40 @@ static unsigned int ata_scsi_rw_xlat(str
 	tf->protocol = qc->dev->xfer_protocol;
 	tf->device |= ATA_LBA;
 
-	if (scsicmd[0] == READ_10 || scsicmd[0] == READ_6 ||
-	    scsicmd[0] == READ_16) {
+	switch (scsicmd[0]) {
+	case READ_10:
+	case READ_16:
+		if (unlikely(scsicmd[1] & (1 << 3))) {
+			printk(KERN_WARNING
+			       "ata%u(%u): WARNING: FUA READ unsupported\n",
+			       qc->ap->id, qc->dev->devno);
+			return 1;
+		}
+		/* fall through */
+	case READ_6:
 		tf->command = qc->dev->read_cmd;
-	} else {
+		break;
+
+	case WRITE_10:
+	case WRITE_16:
+		if (unlikely(scsicmd[1] & (1 << 3))) {
+			if (qc->dev->write_fua_cmd == 0 || !lba48) {
+				printk(KERN_WARNING
+				       "ata%u(%u): WARNING: FUA WRITE "
+				       "unsupported with the current "
+				       "protocol/addressing\n",
+				       qc->ap->id, qc->dev->devno);
+				return 1;
+			}
+			tf->command = qc->dev->write_fua_cmd;
+			tf->flags |= ATA_TFLAG_WRITE;
+			break;
+		}
+		/* fall through */
+	case WRITE_6:
 		tf->command = qc->dev->write_cmd;
 		tf->flags |= ATA_TFLAG_WRITE;
+		break;
 	}
 
 	if (scsicmd[0] == READ_10 || scsicmd[0] == WRITE_10) {
@@ -1141,10 +1169,12 @@ unsigned int ata_scsiop_mode_sense(struc
 	if (six_byte) {
 		output_len--;
 		rbuf[0] = output_len;
+		rbuf[2] |= ata_id_has_fua(args->id) ? 1 << 4 : 0;
 	} else {
 		output_len -= 2;
 		rbuf[0] = output_len >> 8;
 		rbuf[1] = output_len;
+		rbuf[3] |= ata_id_has_fua(args->id) ? 1 << 4 : 0;
 	}
 
 	return 0;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH linux-2.6-block:master 08/10] blk: update IDE to use new blk_ordered
  2005-07-26 15:45 [PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier Tejun Heo
                   ` (6 preceding siblings ...)
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 07/10] blk: add FUA support to libata Tejun Heo
@ 2005-07-26 15:46 ` Tejun Heo
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 09/10] blk: add FUA support to IDE Tejun Heo
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 10/10] blk: I/O barrier documentation Tejun Heo
  9 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2005-07-26 15:46 UTC (permalink / raw)
  To: axboe, jgarzik, James.Bottomley, bzolnier; +Cc: linux-kernel

08_blk_ide-update-ordered.patch

	Update IDE to use new blk_ordered.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 ide-disk.c |   57 +++++++++------------------------------------------------
 ide-io.c   |    5 +----
 2 files changed, 10 insertions(+), 52 deletions(-)

Index: blk-fixes/drivers/ide/ide-disk.c
===================================================================
--- blk-fixes.orig/drivers/ide/ide-disk.c	2005-07-27 00:44:48.000000000 +0900
+++ blk-fixes/drivers/ide/ide-disk.c	2005-07-27 00:44:52.000000000 +0900
@@ -677,50 +677,9 @@ static ide_proc_entry_t idedisk_proc[] =
 
 #endif	/* CONFIG_PROC_FS */
 
-static void idedisk_end_flush(request_queue_t *q, struct request *flush_rq)
+static void idedisk_prepare_flush(request_queue_t *q, struct request *rq)
 {
 	ide_drive_t *drive = q->queuedata;
-	struct request *rq = flush_rq->end_io_data;
-	int good_sectors = rq->hard_nr_sectors;
-	int bad_sectors;
-	sector_t sector;
-
-	if (flush_rq->errors & ABRT_ERR) {
-		printk(KERN_ERR "%s: barrier support doesn't work\n", drive->name);
-		blk_queue_ordered(drive->queue, QUEUE_ORDERED_NONE);
-		blk_queue_issue_flush_fn(drive->queue, NULL);
-		good_sectors = 0;
-	} else if (flush_rq->errors) {
-		good_sectors = 0;
-		if (blk_barrier_preflush(rq)) {
-			sector = ide_get_error_location(drive,flush_rq->buffer);
-			if ((sector >= rq->hard_sector) &&
-			    (sector < rq->hard_sector + rq->hard_nr_sectors))
-				good_sectors = sector - rq->hard_sector;
-		}
-	}
-
-	if (flush_rq->errors)
-		printk(KERN_ERR "%s: failed barrier write: "
-				"sector=%Lx(good=%d/bad=%d)\n",
-				drive->name, (unsigned long long)rq->sector,
-				good_sectors,
-				(int) (rq->hard_nr_sectors-good_sectors));
-
-	bad_sectors = rq->hard_nr_sectors - good_sectors;
-
-	if (good_sectors)
-		__ide_end_request(drive, rq, 1, good_sectors);
-	if (bad_sectors)
-		__ide_end_request(drive, rq, 0, bad_sectors);
-}
-
-static int idedisk_prepare_flush(request_queue_t *q, struct request *rq)
-{
-	ide_drive_t *drive = q->queuedata;
-
-	if (!drive->wcache)
-		return 0;
 
 	memset(rq->cmd, 0, sizeof(rq->cmd));
 
@@ -731,9 +690,8 @@ static int idedisk_prepare_flush(request
 		rq->cmd[0] = WIN_FLUSH_CACHE;
 
 
-	rq->flags |= REQ_DRIVE_TASK | REQ_SOFTBARRIER;
+	rq->flags |= REQ_DRIVE_TASK;
 	rq->buffer = rq->cmd;
-	return 1;
 }
 
 static int idedisk_issue_flush(request_queue_t *q, struct gendisk *disk,
@@ -1008,11 +966,12 @@ static void idedisk_setup (ide_drive_t *
 	printk(KERN_INFO "%s: cache flushes %ssupported\n",
 		drive->name, barrier ? "" : "not ");
 	if (barrier) {
-		blk_queue_ordered(drive->queue, QUEUE_ORDERED_FLUSH);
-		drive->queue->prepare_flush_fn = idedisk_prepare_flush;
-		drive->queue->end_flush_fn = idedisk_end_flush;
+		blk_queue_ordered(drive->queue, QUEUE_ORDERED_DRAIN_FLUSH,
+				  idedisk_prepare_flush, GFP_KERNEL);
 		blk_queue_issue_flush_fn(drive->queue, idedisk_issue_flush);
-	}
+	} else if (!drive->wcache)
+		blk_queue_ordered(drive->queue, QUEUE_ORDERED_DRAIN,
+				  NULL, GFP_KERNEL);
 }
 
 static void ide_cacheflush_p(ide_drive_t *drive)
@@ -1030,6 +989,8 @@ static int ide_disk_remove(struct device
 	struct ide_disk_obj *idkp = drive->driver_data;
 	struct gendisk *g = idkp->disk;
 
+	blk_queue_ordered(drive->queue, QUEUE_ORDERED_NONE, NULL, 0);
+
 	ide_cacheflush_p(drive);
 
 	ide_unregister_subdriver(drive, idkp->driver);
Index: blk-fixes/drivers/ide/ide-io.c
===================================================================
--- blk-fixes.orig/drivers/ide/ide-io.c	2005-07-27 00:44:50.000000000 +0900
+++ blk-fixes/drivers/ide/ide-io.c	2005-07-27 00:44:52.000000000 +0900
@@ -119,10 +119,7 @@ int ide_end_request (ide_drive_t *drive,
 	if (!nr_sectors)
 		nr_sectors = rq->hard_cur_sectors;
 
-	if (blk_complete_barrier_rq_locked(drive->queue, rq, nr_sectors))
-		ret = rq->nr_sectors != 0;
-	else
-		ret = __ide_end_request(drive, rq, uptodate, nr_sectors);
+	ret = __ide_end_request(drive, rq, uptodate, nr_sectors);
 
 	spin_unlock_irqrestore(&ide_lock, flags);
 	return ret;


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH linux-2.6-block:master 09/10] blk: add FUA support to IDE
  2005-07-26 15:45 [PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier Tejun Heo
                   ` (7 preceding siblings ...)
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 08/10] blk: update IDE to use new blk_ordered Tejun Heo
@ 2005-07-26 15:46 ` Tejun Heo
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 10/10] blk: I/O barrier documentation Tejun Heo
  9 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2005-07-26 15:46 UTC (permalink / raw)
  To: axboe, jgarzik, James.Bottomley, bzolnier; +Cc: linux-kernel

09_blk_ide-add-fua-support.patch

	Add FUA support to IDE

Signed-off-by: Tejun Heo <htejun@gmail.com>

 drivers/ide/ide-disk.c |   57 +++++++++++++++++++++++++++++++++++++++++--------
 include/linux/ata.h    |    1 
 include/linux/hdreg.h  |   16 ++++++++++++-
 include/linux/ide.h    |    3 ++
 4 files changed, 66 insertions(+), 11 deletions(-)

Index: blk-fixes/drivers/ide/ide-disk.c
===================================================================
--- blk-fixes.orig/drivers/ide/ide-disk.c	2005-07-27 00:44:52.000000000 +0900
+++ blk-fixes/drivers/ide/ide-disk.c	2005-07-27 00:44:52.000000000 +0900
@@ -160,13 +160,14 @@ static ide_startstop_t __ide_do_rw_disk(
 	ide_hwif_t *hwif	= HWIF(drive);
 	unsigned int dma	= drive->using_dma;
 	u8 lba48		= (drive->addressing == 1) ? 1 : 0;
+	int fua			= blk_fua_rq(rq);
 	task_ioreg_t command	= WIN_NOP;
 	ata_nsector_t		nsectors;
 
 	nsectors.all		= (u16) rq->nr_sectors;
 
 	if (hwif->no_lba48_dma && lba48 && dma) {
-		if (block + rq->nr_sectors > 1ULL << 28)
+		if (block + rq->nr_sectors > 1ULL << 28 || fua)
 			dma = 0;
 		else
 			lba48 = 0;
@@ -222,6 +223,16 @@ static ide_startstop_t __ide_do_rw_disk(
 			hwif->OUTB(tasklets[6], IDE_HCYL_REG);
 			hwif->OUTB(0x00|drive->select.all,IDE_SELECT_REG);
 		} else {
+			if (unlikely(fua)) {
+				/*
+				 * This happens if LBA48 addressing is
+				 * turned off during operation.
+				 */
+				printk(KERN_ERR "%s: FUA write but LBA48 off\n",
+				       drive->name);
+				goto fail;
+			}
+
 			hwif->OUTB(0x00, IDE_FEATURE_REG);
 			hwif->OUTB(nsectors.b.low, IDE_NSECTOR_REG);
 			hwif->OUTB(block, IDE_SECTOR_REG);
@@ -249,9 +260,12 @@ static ide_startstop_t __ide_do_rw_disk(
 	if (dma) {
 		if (!hwif->dma_setup(drive)) {
 			if (rq_data_dir(rq)) {
-				command = lba48 ? WIN_WRITEDMA_EXT : WIN_WRITEDMA;
-				if (drive->vdma)
-					command = lba48 ? WIN_WRITE_EXT: WIN_WRITE;
+				if (!fua) {
+					command = lba48 ? WIN_WRITEDMA_EXT : WIN_WRITEDMA;
+					if (drive->vdma)
+						command = lba48 ? WIN_WRITE_EXT: WIN_WRITE;
+				} else
+					command = ATA_CMD_WRITE_FUA_EXT;
 			} else {
 				command = lba48 ? WIN_READDMA_EXT : WIN_READDMA;
 				if (drive->vdma)
@@ -280,8 +294,20 @@ static ide_startstop_t __ide_do_rw_disk(
 	} else {
 		if (drive->mult_count) {
 			hwif->data_phase = TASKFILE_MULTI_OUT;
-			command = lba48 ? WIN_MULTWRITE_EXT : WIN_MULTWRITE;
+			if (!fua)
+				command = lba48 ? WIN_MULTWRITE_EXT : WIN_MULTWRITE;
+			else
+				command = ATA_CMD_MULT_WRITE_FUA_EXT;
 		} else {
+			if (unlikely(fua)) {
+				/*
+				 * This happens if multisector PIO is
+				 * turned off during operation.
+				 */
+				printk(KERN_ERR "%s: FUA write but in single "
+				       "sector PIO mode\n", drive->name);
+				goto fail;
+			}
 			hwif->data_phase = TASKFILE_OUT;
 			command = lba48 ? WIN_WRITE_EXT : WIN_WRITE;
 		}
@@ -291,6 +317,10 @@ static ide_startstop_t __ide_do_rw_disk(
 
 		return pre_task_out_intr(drive, rq);
 	}
+
+ fail:
+	ide_end_request(drive, 0, 0);
+	return ide_stopped;
 }
 
 /*
@@ -842,7 +872,7 @@ static void idedisk_setup (ide_drive_t *
 {
 	struct hd_driveid *id = drive->id;
 	unsigned long long capacity;
-	int barrier;
+	int barrier, fua;
 
 	idedisk_add_settings(drive);
 
@@ -963,10 +993,19 @@ static void idedisk_setup (ide_drive_t *
 			barrier = 0;
 	}
 
-	printk(KERN_INFO "%s: cache flushes %ssupported\n",
-		drive->name, barrier ? "" : "not ");
+	fua = barrier && idedisk_supports_lba48(id) && ide_id_has_fua(id);
+	/* When using PIO, FUA needs multisector. */
+	if ((!drive->using_dma || drive->hwif->no_lba48_dma) &&
+	    drive->mult_count == 0)
+		fua = 0;
+
+	printk(KERN_INFO "%s: cache flushes %ssupported%s\n",
+	       drive->name, barrier ? "" : "not ",
+	       fua ? " w/ FUA" : "");
 	if (barrier) {
-		blk_queue_ordered(drive->queue, QUEUE_ORDERED_DRAIN_FLUSH,
+		unsigned ordered = fua ? QUEUE_ORDERED_DRAIN_FUA
+				       : QUEUE_ORDERED_DRAIN_FLUSH;
+		blk_queue_ordered(drive->queue, ordered,
 				  idedisk_prepare_flush, GFP_KERNEL);
 		blk_queue_issue_flush_fn(drive->queue, idedisk_issue_flush);
 	} else if (!drive->wcache)
Index: blk-fixes/include/linux/ata.h
===================================================================
--- blk-fixes.orig/include/linux/ata.h	2005-07-27 00:44:52.000000000 +0900
+++ blk-fixes/include/linux/ata.h	2005-07-27 00:44:52.000000000 +0900
@@ -122,6 +122,7 @@ enum {
 	ATA_CMD_PIO_READ_EXT	= 0x24,
 	ATA_CMD_PIO_WRITE	= 0x30,
 	ATA_CMD_PIO_WRITE_EXT	= 0x34,
+	ATA_CMD_MULT_WRITE_FUA_EXT = 0xCE,
 	ATA_CMD_SET_FEATURES	= 0xEF,
 	ATA_CMD_PACKET		= 0xA0,
 	ATA_CMD_VERIFY		= 0x40,
Index: blk-fixes/include/linux/hdreg.h
===================================================================
--- blk-fixes.orig/include/linux/hdreg.h	2005-07-27 00:44:47.000000000 +0900
+++ blk-fixes/include/linux/hdreg.h	2005-07-27 00:44:52.000000000 +0900
@@ -550,7 +550,13 @@ struct hd_driveid {
 					 * cmd set-feature supported extensions
 					 * 15:	Shall be ZERO
 					 * 14:	Shall be ONE
-					 * 13:6	reserved
+					 * 13:  IDLE IMMEDIATE w/ UNLOAD FEATURE
+					 * 12:11 reserved for technical report
+					 * 10:  URG for WRITE STREAM
+					 *  9:  URG for READ STREAM
+					 *  8:  64-bit World wide name
+					 *  7:  WRITE DMA QUEUED FUA EXT
+					 *  6:  WRITE DMA/MULTIPLE FUA EXT
 					 *  5:	General Purpose Logging
 					 *  4:	Streaming Feature Set
 					 *  3:	Media Card Pass Through
@@ -600,7 +606,13 @@ struct hd_driveid {
 					 * command set-feature default
 					 * 15:	Shall be ZERO
 					 * 14:	Shall be ONE
-					 * 13:6	reserved
+					 * 13:  IDLE IMMEDIATE w/ UNLOAD FEATURE
+					 * 12:11 reserved for technical report
+					 * 10:  URG for WRITE STREAM
+					 *  9:  URG for READ STREAM
+					 *  8:  64-bit World wide name
+					 *  7:  WRITE DMA QUEUED FUA EXT
+					 *  6:  WRITE DMA/MULTIPLE FUA EXT
 					 *  5:	General Purpose Logging enabled
 					 *  4:	Valid CONFIGURE STREAM executed
 					 *  3:	Media Card Pass Through enabled
Index: blk-fixes/include/linux/ide.h
===================================================================
--- blk-fixes.orig/include/linux/ide.h	2005-07-27 00:44:47.000000000 +0900
+++ blk-fixes/include/linux/ide.h	2005-07-27 00:44:52.000000000 +0900
@@ -1497,6 +1497,9 @@ extern struct bus_type ide_bus_type;
 /* check if CACHE FLUSH (EXT) command is supported (bits defined in ATA-6) */
 #define ide_id_has_flush_cache(id)	((id)->cfs_enable_2 & 0x3000)
 
+/* check if WRITE DMA FUA EXT command is supported (defined in ATA-8) */
+#define ide_id_has_fua(id)		((id)->cfsse & 0x0040)
+
 /* some Maxtor disks have bit 13 defined incorrectly so check bit 10 too */
 #define ide_id_has_flush_cache_ext(id)	\
 	(((id)->cfs_enable_2 & 0x2400) == 0x2400)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH linux-2.6-block:master 10/10] blk: I/O barrier documentation
  2005-07-26 15:45 [PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier Tejun Heo
                   ` (8 preceding siblings ...)
  2005-07-26 15:46 ` [PATCH linux-2.6-block:master 09/10] blk: add FUA support to IDE Tejun Heo
@ 2005-07-26 15:46 ` Tejun Heo
  9 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2005-07-26 15:46 UTC (permalink / raw)
  To: axboe, jgarzik, James.Bottomley, bzolnier; +Cc: linux-kernel

10_blk_add-barrier-doc.patch

	I/O barrier documentation

Signed-off-by: Tejun Heo <htejun@gmail.com>

 barrier.txt |  271 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 biodoc.txt  |   10 --
 2 files changed, 273 insertions(+), 8 deletions(-)

Index: blk-fixes/Documentation/block/barrier.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ blk-fixes/Documentation/block/barrier.txt	2005-07-27 00:44:53.000000000 +0900
@@ -0,0 +1,271 @@
+I/O Barriers
+============
+Tejun Heo <htejun@gmail.com>, July 22 2005
+
+I/O barrier requests are used to guarantee ordering around the barrier
+requests.  Unless you're crazy enough to use disk drives for
+implementing synchronization constructs (wow, sounds interesting...),
+the ordering is meaningful only for write requests for things like
+journal checkpoints.  All requests queued before a barrier request
+must be finished (made it to the physical medium) before the barrier
+request is started, and all requests queued after the barrier request
+must be started only after the barrier request is finished (again,
+made it to the physical medium).
+
+In other words, I/O barrier requests have the following two properties.
+
+1. Request ordering
+
+Requests cannot pass the barrier request.  Preceding requests are
+processed before the barrier and following requests after.
+
+Depending on what features a drive supports, this can be done in one
+of the following three ways.
+
+i. For devices which have queue depth greater than 1 (TCQ devices) and
+support ordered tags, block layer can just issue the barrier as an
+ordered request and the lower level driver, controller and drive
+itself are responsible for making sure that the ordering contraint is
+met.  Most modern SCSI controllers/drives should support this.
+
+NOTE: SCSI ordered tag isn't currently used due to limitation in the
+      SCSI midlayer, see the following random notes section.
+
+ii. For devices which have queue depth greater than 1 but don't
+support ordered tags, block layer ensures that the requests preceding
+a barrier request finishes before issuing the barrier request.  Also,
+it defers requests following the barrier until the barrier request is
+finished.  Older SCSI controllers/drives and SATA drives fall in this
+category.
+
+iii. Devices which have queue depth of 1.  This is a degenerate case
+of ii.  Just keeping issue order suffices.  Ancient SCSI
+controllers/drives and IDE drives are in this category.
+
+2. Forced flushing to physcial medium
+
+Again, if you're not gonna do synchronization with disk drives (dang,
+it sounds even more appealing now!), the reason you use I/O barriers
+is mainly to protect filesystem integrity when power failure or some
+other events abruptly stop the drive from operating and possibly make
+the drive lose data in its cache.  So, I/O barriers need to guarantee
+that requests actually get written to non-volatile medium in order.
+
+There are four cases,
+
+i. No write-back cache.  Keeping requests ordered is enough.
+
+ii. Write-back cache but no flush operation.  There's no way to
+gurantee physical-medium commit order.  This kind of devices can't to
+I/O barriers.
+
+iii. Write-back cache and flush operation but no FUA (forced unit
+access).  We need two cache flushes - before and after the barrier
+request.
+
+iv. Write-back cache, flush operation and FUA.  We still need one
+flush to make sure requests preceding a barrier are written to medium,
+but post-barrier flush can be avoided by using FUA write on the
+barrier itself.
+
+
+How to support barrier requests in drivers
+------------------------------------------
+
+All barrier handling is done inside block layer proper.  All low level
+drivers have to are implementing its prepare_flush_fn and using one
+the following two functions to indicate what barrier type it supports
+and how to prepare flush requests.  Note that the term 'ordered' is
+used to indicate the whole sequence of performing barrier requests
+including draining and flushing.
+
+typedef void (prepare_flush_fn)(request_queue_t *q, struct request *rq);
+
+int blk_queue_ordered(request_queue_t *q, unsigned ordered,
+		      prepare_flush_fn *prepare_flush_fn,
+		      unsigned gfp_mask);
+
+int blk_queue_ordered_locked(request_queue_t *q, unsigned ordered,
+			     prepare_flush_fn *prepare_flush_fn,
+			     unsigned gfp_mask);
+
+The only difference between the two functions is whether or not the
+caller is holding q->queue_lock on entry.  The latter expects the
+caller is holding the lock.
+
+@q			: the queue in question
+@ordered		: the ordered mode the driver/device supports
+@prepare_flush_fn	: this function should prepare @rq such that it
+			  flushes cache to physical medium when executed
+@gfp_mask		: gfp_mask used when allocating data structures
+			  for ordered processing
+
+For example, SCSI disk driver's prepare_flush_fn looks like the
+following.
+
+static void sd_prepare_flush(request_queue_t *q, struct request *rq)
+{
+	memset(rq->cmd, 0, sizeof(rq->cmd));
+	rq->flags |= REQ_BLOCK_PC;
+	rq->timeout = SD_TIMEOUT;
+	rq->cmd[0] = SYNCHRONIZE_CACHE;
+}
+
+The following seven ordered modes are supported.  The following table
+shows which mode should be used depending on what features a
+device/driver supports.  In the leftmost column of table,
+QUEUE_ORDERED_ prefix is omitted from the mode names to save space.
+
+The table is followed by description of each mode.  Note that in the
+descriptions of QUEUE_ORDERED_DRAIN*, '=>' is used whereas '->' is
+used for QUEUE_ORDERED_TAG* descriptions.  '=>' indicates that the
+preceding step must be complete before proceeding to the next step.
+'->' indicates that the next step can start as soon as the previous
+step is issued.
+
+	    write-back cache	ordered tag	flush		FUA
+-----------------------------------------------------------------------
+NONE		yes/no		N/A		no		N/A
+DRAIN		no		no		N/A		N/A
+DRAIN_FLUSH	yes		no		yes		no
+DRAIN_FUA	yes		no		yes		yes
+TAG		no		yes		N/A		N/A
+TAG_FLUSH	yes		yes		yes		no
+TAG_FUA		yes		yes		yes		yes
+
+
+QUEUE_ORDERED_NONE
+	I/O barriers are not needed and/or supported.
+
+	Sequence: N/A
+
+QUEUE_ORDERED_DRAIN
+	Requests are ordered by draining the request queue and cache
+	flushing isn't needed.
+
+	Sequence: drain => barrier
+
+QUEUE_ORDERED_DRAIN_FLUSH
+	Requests are ordered by draining the request queue and both
+	pre-barrier and post-barrier cache flushings are needed.
+
+	Sequence: drain => preflush => barrier => postflush
+
+QUEUE_ORDERED_DRAIN_FUA
+	Requests are ordered by draining the request queue and
+	pre-barrier cache flushing is needed.  By using FUA on barrier
+	request, post-barrier flushing can be skipped.
+
+	Sequence: drain => preflush => barrier
+
+QUEUE_ORDERED_TAG
+	Requests are ordered by ordered tag and cache flushing isn't
+	needed.
+
+	Sequence: barrier
+
+QUEUE_ORDERED_TAG_FLUSH
+	Requests are ordered by ordered tag and both pre-barrier and
+	post-barrier cache flushings are needed.
+
+	Sequence: preflush -> barrier -> postflush
+
+QUEUE_ORDERED_TAG_FUA
+	Requests are ordered by ordered tag and pre-barrier cache
+	flushing is needed.  By using FUA on barrier request,
+	post-barrier flushing can be skipped.
+
+	Sequence: preflush -> barrier
+
+
+Random notes/caveats
+--------------------
+
+* SCSI layer currently can't use TAG ordering even if the drive,
+controller and driver support it.  The problem is that SCSI midlayer
+request dispatch function is not atomic.  It releases queue lock and
+switch to SCSI host lock during issue and it's possible and likely to
+happen in time that requests change their relative positions.  Once
+this problem is solved, TAG ordering can be enabled.
+
+* Currently, no matter which ordered mode is used, there can be only
+one barrier request in progress.  All I/O barriers are held off by
+block layer until the previous I/O barrier is complete.  This doesn't
+make any difference for DRAIN ordered devices, but, for TAG ordered
+devices with very high command latency, passing multiple I/O barriers
+to low level *might* be helpful if they are very frequent.  Well, this
+certainly is a non-issue.  I'm writing this just to make clear that no
+two I/O barrier is ever passed to low-level driver.
+
+* Completion order.  Requests in ordered sequence are issued in order
+but not required to finish in order.  Barrier implementation can
+handle out-of-order completion of ordered sequence.  IOW, the requests
+MUST be processed in order but the hardware/software completion paths
+are allowed to reorder completion notifications - eg. current SCSI
+midlayer doesn't preserve completion order during error handling.
+
+* Requeueing order.  Low-level drivers are free to requeue any request
+after they removed it from the request queue with
+blkdev_dequeue_request().  As barrier sequence should be kept in order
+when requeued, generic elevator code takes care of putting requests in
+order around barrier.  See blk_ordered_req_seq() and
+ELEVATOR_INSERT_REQUEUE handling in __elv_add_request() for details.
+
+Note that block drivers must not requeue preceding requests while
+completing latter requests in an ordered sequence.  Currently, no
+error checking is done against this.
+
+* Error handling.  Currently, block layer will report error to upper
+layer if any of requests in an ordered sequence fails.  Unfortunately,
+this doesn't seem to be enough.  Look at the following request flow.
+QUEUE_ORDERED_TAG_FLUSH is in use.
+
+ [0] [1] [2] [3] [pre] [barrier] [post] < [4] [5] [6] ... >
+					  still in elevator
+
+Let's say request [2], [3] are write requests to update file system
+metadata (journal or whatever) and [barrier] is used to mark that
+those updates are valid.  Consider the following sequence.
+
+ i.	Requests [0] ~ [post] leaves the request queue and enters
+	low-level driver.
+ ii.	After a while, unfortunately, something goes wrong and the
+	drive fails [2].  Note that any of [0], [1] and [3] could have
+	completed by this time, but [pre] couldn't have been finished
+	as the drive must process it in order and it failed before
+	processing that command.
+ iii.	Error handling kicks in and determines that the error is
+	unrecoverable and fails [2], and resumes operation.
+ iv.	[pre] [barrier] [post] gets processed.
+ v.	*BOOM* power fails
+
+The problem here is that the barrier request is *supposed* to indicate
+that filesystem update requests [2] and [3] made it safely to the
+physical medium and, if the machine crashes after the barrier is
+written, filesystem recovery code can depend on that.  Sadly, that
+isn't true in this case anymore.  IOW, the success of a I/O barrier
+should also be dependent on success of some of the preceding requests,
+where only upper layer (filesystem) knows what 'some' is.
+
+This can be solved by implementing a way to tell the block layer which
+requests affect the success of the following barrier request and
+making lower lever drivers to resume operation on error only after
+block layer tells it to do so.
+
+As the probability of this happening is very low and the drive should
+be faulty, implementing the fix is probably an overkill.  But, still,
+it's there.
+
+* In previous drafts of barrier implementation, there was fallback
+mechanism such that, if FUA or ordered TAG fails, less fancy ordered
+mode can be selected and the failed barrier request is retried
+automatically.  The rationale for this feature was that as FUA is
+pretty new in ATA world and ordered tag was never used widely, there
+could be devices which report to support those features but choke when
+actually given such requests.
+
+ This was removed for two reasons 1. it's an overkill 2. it's
+impossible to implement properly when TAG ordering is used as low
+level drivers resume after an error automatically.  If it's ever
+needed adding it back and modifying low level drivers accordingly
+shouldn't be difficult.
Index: blk-fixes/Documentation/block/biodoc.txt
===================================================================
--- blk-fixes.orig/Documentation/block/biodoc.txt	2005-07-27 00:44:49.000000000 +0900
+++ blk-fixes/Documentation/block/biodoc.txt	2005-07-27 00:44:53.000000000 +0900
@@ -263,14 +263,8 @@ A flag in the bio structure, BIO_BARRIER
 The generic i/o scheduler would make sure that it places the barrier request and
 all other requests coming after it after all the previous requests in the
 queue. Barriers may be implemented in different ways depending on the
-driver. A SCSI driver for example could make use of ordered tags to
-preserve the necessary ordering with a lower impact on throughput. For IDE
-this might be two sync cache flush: a pre and post flush when encountering
-a barrier write.
-
-There is a provision for queues to indicate what kind of barriers they
-can provide. This is as of yet unmerged, details will be added here once it
-is in the kernel.
+driver. For more details regarding I/O barriers, please read barrier.txt
+in this directory.
 
 1.2.2 Request Priority/Latency
 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier
@ 2005-10-19 12:47 Tejun Heo
  2005-10-19 12:48 ` [PATCH linux-2.6-block:master 06/10] blk: update libata to use new blk_ordered Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2005-10-19 12:47 UTC (permalink / raw)
  To: axboe, jgarzik, James.Bottomley, bzolnier; +Cc: linux-kernel

 Hello, Jens, James, Jeff and Bartlomiej.

 This is the fourth posting of blk ordered reimplementation.  The last
posting was on 27th July.

 http://marc.theaimsgroup.com/?l=linux-kernel&m=112239673727945&w=2

 Other than regenerated against the current tree, nothing has changed.

 This patchset is part of the patch series described in the following mail.
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112972451727866&w=2

 As all previous patches don't really concern subsystems other than
block layer.  They are only sent to Jens and LKML.  If they're needed,
preceding patches are...

fix-elevator_find:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112972451705426&w=2

generic-dispatch-queue:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112972555921965&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112972555927228&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112972555815824&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112972555801764&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112972555903273&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112972555917170&w=2

reimplement-elevator-switch:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112972555831088&w=2

 For general description please read barrier.txt and the previous posting.
	http://marc.theaimsgroup.com/?l=linux-kernel&m=111795127124020&w=2

 ==== Some excerpts from barrier.txt. ====

* SCSI layer currently can't use TAG ordering even if the drive,
  controller and driver support it.  The problem is that SCSI midlayer
  request dispatch function is not atomic.  It releases queue lock and
  switch to SCSI host lock during issue and it's possible and likely
  to happen in time that requests change their relative positions.
  Once this problem is solved, TAG ordering can be enabled.

* Error handling.  Currently, block layer will report error to upper
  layer if any of requests in an ordered sequence fails.
  Unfortunately, this doesn't seem to be enough.  Look at the
  following request flow.  QUEUE_ORDERED_TAG_FLUSH is in use.

 [0] [1] [2] [3] [pre] [barrier] [post] < [4] [5] [6] ... >
                                          still in elevator

  Let's say request [2], [3] are write requests to update file system
  metadata (journal or whatever) and [barrier] is used to mark that
  those updates are valid.  Consider the following sequence.

 i.     Requests [0] ~ [post] leaves the request queue and enters
        low-level driver.
 ii.    After a while, unfortunately, something goes wrong and the
        drive fails [2].  Note that any of [0], [1] and [3] could have
        completed by this time, but [pre] couldn't have been finished
        as the drive must process it in order and it failed before
        processing that command.
 iii.   Error handling kicks in and determines that the error is
        unrecoverable and fails [2], and resumes operation.
 iv.    [pre] [barrier] [post] gets processed.
 v.     *BOOM* power fails

  The problem here is that the barrier request is *supposed* to
  indicate that filesystem update requests [2] and [3] made it safely
  to the physical medium and, if the machine crashes after the barrier
  is written, filesystem recovery code can depend on that.  Sadly,
  that isn't true in this case anymore.  IOW, the success of a I/O
  barrier should also be dependent on success of some of the preceding
  requests, where only upper layer (filesystem) knows what 'some' is.

  This can be solved by implementing a way to tell the block layer
  which requests affect the success of the following barrier request
  and making lower lever drivers to resume operation on error only
  after block layer tells it to do so.

  As the probability of this happening is very low and the drive
  should be faulty, implementing the fix is probably an overkill.
  But, still, it's there.

[ Start of patch descriptions ]

01_blk_add-uptodate-to-end_that_request_last.patch
	: add @uptodate to end_that_request_last() and @error to rq_end_io_fn()

	Add @uptodate argument to end_that_request_last() and @error
        to rq_end_io_fn().  There's no generic way to pass error code
        to request completion function, making generic error handling
        of non-fs request difficult (rq->errors is driver-specific and
        each driver uses it differently).  This patch adds @uptodate
        to end_that_request_last() and @error to rq_end_io_fn().

        For fs requests, this doesn't really matter, so just using the
        same uptodate argument used in the last call to
        end_that_request_first() should suffice.  IMHO, this can also
        help the generic command-carrying request Jens is working on.

02_blk_implement-init_request_from_bio.patch
	: separate out bio init part from __make_request

	Separate out bio initialization part from __make_request.  It
        will be used by the following blk_ordered_reimpl.

03_blk_reimplement-ordered.patch
	: reimplement handling of barrier request

	 Reimplement handling of barrier requests.

        * Flexible handling to deal with various capabilities of
          target devices.
	* Retry support for falling back.
	* Tagged queues which don't support ordered tag can do ordered.

04_blk_scsi-update-ordered.patch
	: update SCSI to use new blk_ordered

	All ordered request related stuff delegated to HLD.  Midlayer
        now doens't deal with ordered setting or prepare_flush
        callback.  sd.c updated to deal with blk_queue_ordered
        setting.  Currently, ordered tag isn't used as SCSI midlayer
        cannot guarantee request ordering.

05_blk_scsi-add-fua-support.patch
	: add FUA support to SCSI disk

	Add FUA support to SCSI disk.

06_blk_libata-update-ordered.patch
	: update libata to use new blk_ordered

	Reflect changes in SCSI midlayer and updated to use new
        ordered request implementation

07_blk_libata-add-fua-support.patch
	: add FUA support to libata

	Add FUA support to libata.

08_blk_ide-update-ordered.patch
	: update IDE to use new blk_ordered

	Update IDE to use new blk_ordered.

09_blk_ide-add-fua-support.patch
	: add FUA support to IDE

	Add FUA support to IDE

10_blk_add-barrier-doc.patch
	: I/O barrier documentation

	I/O barrier documentation

[ End of patch descriptions ]

 Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH linux-2.6-block:master 06/10] blk: update libata to use new blk_ordered
  2005-10-19 12:47 [PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier Tejun Heo
@ 2005-10-19 12:48 ` Tejun Heo
  0 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2005-10-19 12:48 UTC (permalink / raw)
  To: axboe, jgarzik, James.Bottomley, bzolnier; +Cc: linux-kernel

06_blk_libata-update-ordered.patch

	Reflect changes in SCSI midlayer and updated to use new
        ordered request implementation

Signed-off-by: Tejun Heo <htejun@gmail.com>

 ahci.c         |    1 -
 ata_piix.c     |    1 -
 sata_mv.c      |    1 -
 sata_nv.c      |    1 -
 sata_promise.c |    1 -
 sata_sil.c     |    1 -
 sata_sis.c     |    1 -
 sata_svw.c     |    1 -
 sata_sx4.c     |    1 -
 sata_uli.c     |    1 -
 sata_via.c     |    1 -
 sata_vsc.c     |    1 -
 12 files changed, 12 deletions(-)

Index: blk-fixes/drivers/scsi/ahci.c
===================================================================
--- blk-fixes.orig/drivers/scsi/ahci.c	2005-10-19 21:46:49.000000000 +0900
+++ blk-fixes/drivers/scsi/ahci.c	2005-10-19 21:47:14.000000000 +0900
@@ -213,7 +213,6 @@ static Scsi_Host_Template ahci_sht = {
 	.dma_boundary		= AHCI_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations ahci_ops = {
Index: blk-fixes/drivers/scsi/ata_piix.c
===================================================================
--- blk-fixes.orig/drivers/scsi/ata_piix.c	2005-10-19 21:46:49.000000000 +0900
+++ blk-fixes/drivers/scsi/ata_piix.c	2005-10-19 21:47:14.000000000 +0900
@@ -144,7 +144,6 @@ static Scsi_Host_Template piix_sht = {
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations piix_pata_ops = {
Index: blk-fixes/drivers/scsi/sata_nv.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_nv.c	2005-10-19 21:46:49.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_nv.c	2005-10-19 21:47:14.000000000 +0900
@@ -235,7 +235,6 @@ static Scsi_Host_Template nv_sht = {
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations nv_ops = {
Index: blk-fixes/drivers/scsi/sata_promise.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_promise.c	2005-10-19 21:46:49.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_promise.c	2005-10-19 21:47:14.000000000 +0900
@@ -110,7 +110,6 @@ static Scsi_Host_Template pdc_ata_sht = 
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations pdc_sata_ops = {
Index: blk-fixes/drivers/scsi/sata_sil.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_sil.c	2005-10-19 21:46:49.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_sil.c	2005-10-19 21:47:14.000000000 +0900
@@ -147,7 +147,6 @@ static Scsi_Host_Template sil_sht = {
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations sil_ops = {
Index: blk-fixes/drivers/scsi/sata_sis.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_sis.c	2005-10-19 21:46:49.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_sis.c	2005-10-19 21:47:14.000000000 +0900
@@ -99,7 +99,6 @@ static Scsi_Host_Template sis_sht = {
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations sis_ops = {
Index: blk-fixes/drivers/scsi/sata_svw.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_svw.c	2005-10-19 21:46:49.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_svw.c	2005-10-19 21:47:14.000000000 +0900
@@ -293,7 +293,6 @@ static Scsi_Host_Template k2_sata_sht = 
 	.proc_info		= k2_sata_proc_info,
 #endif
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 
Index: blk-fixes/drivers/scsi/sata_sx4.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_sx4.c	2005-10-19 21:46:49.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_sx4.c	2005-10-19 21:47:14.000000000 +0900
@@ -193,7 +193,6 @@ static Scsi_Host_Template pdc_sata_sht =
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations pdc_20621_ops = {
Index: blk-fixes/drivers/scsi/sata_uli.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_uli.c	2005-10-19 21:46:49.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_uli.c	2005-10-19 21:47:14.000000000 +0900
@@ -87,7 +87,6 @@ static Scsi_Host_Template uli_sht = {
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations uli_ops = {
Index: blk-fixes/drivers/scsi/sata_via.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_via.c	2005-10-19 21:46:49.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_via.c	2005-10-19 21:47:14.000000000 +0900
@@ -106,7 +106,6 @@ static Scsi_Host_Template svia_sht = {
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations svia_sata_ops = {
Index: blk-fixes/drivers/scsi/sata_vsc.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_vsc.c	2005-10-19 21:46:49.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_vsc.c	2005-10-19 21:47:14.000000000 +0900
@@ -227,7 +227,6 @@ static Scsi_Host_Template vsc_sata_sht =
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 
Index: blk-fixes/drivers/scsi/sata_mv.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_mv.c	2005-10-19 21:46:49.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_mv.c	2005-10-19 21:47:14.000000000 +0900
@@ -207,7 +207,6 @@ static Scsi_Host_Template mv_sht = {
 	.dma_boundary		= MV_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations mv_ops = {


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2005-10-19 12:48 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-07-26 15:45 [PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier Tejun Heo
2005-07-26 15:45 ` [PATCH linux-2.6-block:master 01/10] blk: add @uptodate to end_that_request_last() and @error to rq_end_io_fn() Tejun Heo
2005-07-26 15:46 ` [PATCH linux-2.6-block:master 02/10] blk: separate out bio init part from __make_request Tejun Heo
2005-07-26 15:46 ` [PATCH linux-2.6-block:master 03/10] blk: reimplement handling of barrier request Tejun Heo
2005-07-26 15:46 ` [PATCH linux-2.6-block:master 04/10] blk: update SCSI to use new blk_ordered Tejun Heo
2005-07-26 15:46 ` [PATCH linux-2.6-block:master 05/10] blk: add FUA support to SCSI disk Tejun Heo
2005-07-26 15:55   ` Jeff Garzik
2005-07-26 15:46 ` [PATCH linux-2.6-block:master 06/10] blk: update libata to use new blk_ordered Tejun Heo
2005-07-26 15:55   ` Jeff Garzik
2005-07-26 15:46 ` [PATCH linux-2.6-block:master 07/10] blk: add FUA support to libata Tejun Heo
2005-07-26 15:55   ` Jeff Garzik
2005-07-27  7:44     ` Tejun Heo
2005-07-26 15:46 ` [PATCH linux-2.6-block:master 08/10] blk: update IDE to use new blk_ordered Tejun Heo
2005-07-26 15:46 ` [PATCH linux-2.6-block:master 09/10] blk: add FUA support to IDE Tejun Heo
2005-07-26 15:46 ` [PATCH linux-2.6-block:master 10/10] blk: I/O barrier documentation Tejun Heo
  -- strict thread matches above, loose matches on Subject: below --
2005-10-19 12:47 [PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier Tejun Heo
2005-10-19 12:48 ` [PATCH linux-2.6-block:master 06/10] blk: update libata to use new blk_ordered Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox