* [PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier
@ 2005-07-26 15:45 Tejun Heo
From: Tejun Heo @ 2005-07-26 15:45 UTC
  To: axboe, jgarzik, James.Bottomley, bzolnier; +Cc: linux-kernel

 Hello, Jens, James, Jeff and Bartlomiej.

 This is the third posting of the blk ordered reimplementation.  Changes
since the second posting are...

 * Dispatch queue patchset is reordered in front of this patchset.
 * Draining bug fix
 * Proper requeueing of requests in an ordered sequence for TAG
   ordered queues.
 * Fallback mechanism stripped out.  This also makes -EOPNOTSUPP
   changes in SCSI and IDE unnecessary.
 * TAG ordering request issue fix
 * SCSI/libata/IDE patches split as requested
 * Hopefully, changes to libata and IDE are more acceptable
 * Documentation/block/barrier.txt added

 This patchset is part of the patch series described in the following mail.
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238377602033&w=2

 As the preceding patches don't really concern subsystems other than
the block layer, they were sent only to Jens and LKML.  If they're
needed, the preceding patches are...

fix-elevator_find:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238419731035&w=2

fix-cfq_find_next_crq:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238456712203&w=2

generic-dispatch-queue:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238633622498&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238647101632&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238693809889&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238675926411&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238660407245&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238708918364&w=2

reimplement-elevator-switch:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112238771305863&w=2

 For general description please read barrier.txt and the previous posting.
	http://marc.theaimsgroup.com/?l=linux-kernel&m=111795127124020&w=2

 ==== Some excerpts from barrier.txt. ====

* The SCSI layer currently can't use TAG ordering even if the drive,
  controller and driver support it.  The problem is that the SCSI
  midlayer request dispatch function is not atomic.  It releases the
  queue lock and switches to the SCSI host lock during issue, so it's
  possible, and in time likely, that requests change their relative
  positions.  Once this problem is solved, TAG ordering can be enabled.

* Error handling.  Currently, the block layer reports an error to the
  upper layer if any of the requests in an ordered sequence fails.
  Unfortunately, this doesn't seem to be enough.  Look at the
  following request flow.  QUEUE_ORDERED_TAG_FLUSH is in use.

 [0] [1] [2] [3] [pre] [barrier] [post] < [4] [5] [6] ... >
                                          still in elevator

  Let's say requests [2] and [3] are write requests that update file
  system metadata (journal or whatever) and [barrier] is used to mark
  that those updates are valid.  Consider the following sequence.

 i.     Requests [0] ~ [post] leave the request queue and enter the
        low-level driver.
 ii.    After a while, unfortunately, something goes wrong and the
        drive fails [2].  Note that any of [0], [1] and [3] could have
        completed by this time, but [pre] couldn't have been finished
        as the drive must process it in order and the drive failed
        before processing that command.
 iii.   Error handling kicks in, determines that the error is
        unrecoverable, fails [2] and resumes operation.
 iv.    [pre] [barrier] [post] get processed.
 v.     *BOOM* power fails

  The problem here is that the barrier request is *supposed* to
  indicate that filesystem update requests [2] and [3] made it safely
  to the physical medium and, if the machine crashes after the barrier
  is written, filesystem recovery code can depend on that.  Sadly,
  that isn't true in this case anymore.  IOW, the success of an I/O
  barrier should also depend on the success of some of the preceding
  requests, where only the upper layer (filesystem) knows what 'some' is.

  This can be solved by implementing a way to tell the block layer
  which requests affect the success of the following barrier request
  and making low-level drivers resume operation on error only after
  the block layer tells them to do so.

  As the probability of this happening is very low and it requires a
  faulty drive, implementing the fix is probably overkill.  But,
  still, it's there.
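
  The scenario above can be replayed with a small userspace model.
  This is only a sketch, not block layer code: struct sim_req, the
  barrier_dep flag and all function names are made up for
  illustration.  It compares the current rule (only failures inside
  the ordered sequence fail the barrier) with the proposed rule
  (failed dependencies named by the filesystem fail it too).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a request; not a kernel structure. */
struct sim_req {
	const char *name;
	bool in_ordered_seq;	/* part of [pre] [barrier] [post] */
	bool barrier_dep;	/* filesystem says the barrier depends on it */
	bool failed;		/* the drive failed this request */
};

/* Current rule: only failures inside the ordered sequence count. */
static bool barrier_ok_current(const struct sim_req *rq, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (rq[i].in_ordered_seq && rq[i].failed)
			return false;
	return true;
}

/* Proposed rule: a failed dependency also fails the barrier. */
static bool barrier_ok_with_deps(const struct sim_req *rq, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if ((rq[i].in_ordered_seq || rq[i].barrier_dep) && rq[i].failed)
			return false;
	return true;
}

/* Replays the flow from the text: [2] fails, everything else succeeds. */
static bool replay_scenario(bool track_deps)
{
	const struct sim_req rq[] = {
		/* name       ordered dep    failed */
		{ "0",        false,  false, false },
		{ "1",        false,  false, false },
		{ "2",        false,  true,  true  },
		{ "3",        false,  true,  false },
		{ "pre",      true,   false, false },
		{ "barrier",  true,   false, false },
		{ "post",     true,   false, false },
	};
	size_t n = sizeof(rq) / sizeof(rq[0]);

	assert(n == 7);
	return track_deps ? barrier_ok_with_deps(rq, n)
			  : barrier_ok_current(rq, n);
}
```

  replay_scenario(false) reports the barrier as successful although
  [2] never reached the medium, which is exactly the problem;
  replay_scenario(true) reports failure.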

[ Start of patch descriptions ]

01_blk_add-uptodate-to-end_that_request_last.patch
	: add @uptodate to end_that_request_last() and @error to rq_end_io_fn()

	Add @uptodate argument to end_that_request_last() and @error
        to rq_end_io_fn().  There's currently no generic way to pass
        an error code to the request completion function, which makes
        generic error handling of non-fs requests difficult
        (rq->errors is driver-specific and each driver uses it
        differently).

        For fs requests, this doesn't really matter, so just using the
        same uptodate argument used in the last call to
        end_that_request_first() should suffice.  IMHO, this can also
        help the generic command-carrying request Jens is working on.
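
	The idea can be sketched in userspace as below.  This is not
	the patch itself: struct mock_request and the mock_* names
	are stand-ins, and the mapping from @uptodate to @error
	(positive means success, 0 means generic -EIO, negative is
	passed through as the error code) is an assumption about the
	convention, so check the patch for the real semantics.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct mock_request;

/* Completion callback that now also receives an error code. */
typedef void (mock_end_io_fn)(struct mock_request *rq, int error);

/* Hypothetical stand-in for struct request. */
struct mock_request {
	mock_end_io_fn *end_io;
	void *end_io_data;
};

/* Stand-in for end_that_request_last() taking @uptodate. */
static void mock_end_request_last(struct mock_request *rq, int uptodate)
{
	int error = 0;

	assert(rq != NULL);
	/* Assumed convention: >0 ok, 0 generic failure, <0 specific error. */
	if (uptodate <= 0)
		error = uptodate ? uptodate : -EIO;
	if (rq->end_io)
		rq->end_io(rq, error);
}

/* Example callback: store the error where the issuer can see it. */
static void record_error(struct mock_request *rq, int error)
{
	*(int *)rq->end_io_data = error;
}

/* Completes one mock request and returns what the callback saw. */
static int complete_and_get_error(int uptodate)
{
	int seen = 1;	/* sentinel, overwritten by the callback */
	struct mock_request rq = {
		.end_io = record_error,
		.end_io_data = &seen,
	};

	mock_end_request_last(&rq, uptodate);
	return seen;
}
```

	With this shape the issuer of a non-fs request gets a plain
	error code in its callback and doesn't need to interpret a
	driver-specific field.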

02_blk_implement-init_request_from_bio.patch
	: separate out bio init part from __make_request

	Separate out bio initialization part from __make_request.  It
        will be used by the following blk_ordered_reimpl.

03_blk_reimplement-ordered.patch
	: reimplement handling of barrier request

	Reimplement handling of barrier requests.

	* Flexible handling to deal with various capabilities of
	  target devices.
	* Retry support for falling back.
	* Tagged queues which don't support ordered tag can do ordered.
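
	As a rough illustration of what "flexible handling" could
	mean, the sketch below picks the steps of an ordered sequence
	from device capabilities.  It is a guess at the general
	shape, not code from the patch: struct dev_caps, struct
	barrier_plan and plan_barrier() are hypothetical names.  The
	assumptions are that a queue without ordered tags is drained
	instead, that a write-back cache needs a flush before the
	barrier, and that writing the barrier with FUA makes the
	flush after it unnecessary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability description of a target device. */
struct dev_caps {
	bool ordered_tag;	/* device/driver can order by tag */
	bool write_cache;	/* write-back cache is enabled */
	bool fua;		/* forced unit access writes supported */
};

/* Which steps to run for one barrier request. */
struct barrier_plan {
	bool drain;		/* wait for the queue to empty first */
	bool pre_flush;		/* [pre] cache flush */
	bool barrier_fua;	/* write [barrier] with FUA */
	bool post_flush;	/* [post] cache flush */
};

static struct barrier_plan plan_barrier(struct dev_caps caps)
{
	struct barrier_plan p = { false, false, false, false };

	/* No ordered tags: order by draining the queue instead. */
	p.drain = !caps.ordered_tag;

	if (caps.write_cache) {
		/* Earlier writes may still sit in the cache. */
		p.pre_flush = true;
		if (caps.fua)
			p.barrier_fua = true;	/* barrier goes to media */
		else
			p.post_flush = true;	/* flush barrier afterwards */
	}

	/* FUA and a trailing flush would be redundant together. */
	assert(!(p.barrier_fua && p.post_flush));
	return p;
}
```

	A device with a write-back cache and ordered tags but no FUA
	ends up with the [pre] [barrier] [post] flow shown in the
	barrier.txt excerpt.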

04_blk_scsi-update-ordered.patch
	: update SCSI to use new blk_ordered

	All ordered-request related stuff is delegated to the HLD
        (high-level driver).  The midlayer no longer deals with the
        ordered setting or the prepare_flush callback.  sd.c is
        updated to deal with the blk_queue_ordered setting.
        Currently, ordered tag isn't used as the SCSI midlayer cannot
        guarantee request ordering.

05_blk_scsi-add-fua-support.patch
	: add FUA support to SCSI disk

	Add FUA support to SCSI disk.
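
	For readers unfamiliar with FUA: at the command level it is a
	single bit in the write CDB that tells the drive to put the
	data on the medium before reporting completion.  The sketch
	below builds a WRITE(10) CDB with that bit set.  It only
	illustrates the command format and is not taken from the
	patch; struct write10_cdb and build_write10() are made-up
	names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WRITE_10_OPCODE	0x2a
#define WRITE_10_FUA	0x08	/* bit 3 of byte 1 */

/* Hypothetical wrapper so the CDB can be returned by value. */
struct write10_cdb {
	uint8_t b[10];
};

static struct write10_cdb build_write10(uint32_t lba, uint16_t nblocks,
					bool fua)
{
	struct write10_cdb cdb = { { 0 } };

	/* A zero transfer length would write nothing; reject it here. */
	assert(nblocks != 0);

	cdb.b[0] = WRITE_10_OPCODE;
	cdb.b[1] = fua ? WRITE_10_FUA : 0;
	/* Logical block address, big-endian, bytes 2..5. */
	cdb.b[2] = (uint8_t)(lba >> 24);
	cdb.b[3] = (uint8_t)(lba >> 16);
	cdb.b[4] = (uint8_t)(lba >> 8);
	cdb.b[5] = (uint8_t)lba;
	/* Transfer length in blocks, big-endian, bytes 7..8. */
	cdb.b[7] = (uint8_t)(nblocks >> 8);
	cdb.b[8] = (uint8_t)nblocks;
	return cdb;
}
```

	A barrier write issued this way doesn't need a cache flush
	after it, which is why FUA is interesting for ordered
	sequences.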

06_blk_libata-update-ordered.patch
	: update libata to use new blk_ordered

	Reflect changes in the SCSI midlayer and update libata to use
        the new ordered request implementation.

07_blk_libata-add-fua-support.patch
	: add FUA support to libata

	Add FUA support to libata.

08_blk_ide-update-ordered.patch
	: update IDE to use new blk_ordered

	Update IDE to use new blk_ordered.

09_blk_ide-add-fua-support.patch
	: add FUA support to IDE

	Add FUA support to IDE

10_blk_add-barrier-doc.patch
	: I/O barrier documentation

	I/O barrier documentation

[ End of patch descriptions ]

 Thanks.

--
tejun


* [PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier
@ 2005-10-19 12:47 Tejun Heo
From: Tejun Heo @ 2005-10-19 12:47 UTC
  To: axboe, jgarzik, James.Bottomley, bzolnier; +Cc: linux-kernel

 Hello, Jens, James, Jeff and Bartlomiej.

 This is the fourth posting of blk ordered reimplementation.  The last
posting was on 27th July.

 http://marc.theaimsgroup.com/?l=linux-kernel&m=112239673727945&w=2

 Other than being regenerated against the current tree, nothing has changed.

 This patchset is part of the patch series described in the following mail.
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112972451727866&w=2

 As the preceding patches don't really concern subsystems other than
the block layer, they were sent only to Jens and LKML.  If they're
needed, the preceding patches are...

fix-elevator_find:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112972451705426&w=2

generic-dispatch-queue:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112972555921965&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112972555927228&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112972555815824&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112972555801764&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112972555903273&w=2
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112972555917170&w=2

reimplement-elevator-switch:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112972555831088&w=2

 For general description please read barrier.txt and the previous posting.
	http://marc.theaimsgroup.com/?l=linux-kernel&m=111795127124020&w=2


 Thanks.

--
tejun


Thread overview: 16+ messages
2005-07-26 15:45 [PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier Tejun Heo
2005-07-26 15:45 ` [PATCH linux-2.6-block:master 01/10] blk: add @uptodate to end_that_request_last() and @error to rq_end_io_fn() Tejun Heo
2005-07-26 15:46 ` [PATCH linux-2.6-block:master 02/10] blk: separate out bio init part from __make_request Tejun Heo
2005-07-26 15:46 ` [PATCH linux-2.6-block:master 03/10] blk: reimplement handling of barrier request Tejun Heo
2005-07-26 15:46 ` [PATCH linux-2.6-block:master 04/10] blk: update SCSI to use new blk_ordered Tejun Heo
2005-07-26 15:46 ` [PATCH linux-2.6-block:master 05/10] blk: add FUA support to SCSI disk Tejun Heo
2005-07-26 15:55   ` Jeff Garzik
2005-07-26 15:46 ` [PATCH linux-2.6-block:master 06/10] blk: update libata to use new blk_ordered Tejun Heo
2005-07-26 15:55   ` Jeff Garzik
2005-07-26 15:46 ` [PATCH linux-2.6-block:master 07/10] blk: add FUA support to libata Tejun Heo
2005-07-26 15:55   ` Jeff Garzik
2005-07-27  7:44     ` Tejun Heo
2005-07-26 15:46 ` [PATCH linux-2.6-block:master 08/10] blk: update IDE to use new blk_ordered Tejun Heo
2005-07-26 15:46 ` [PATCH linux-2.6-block:master 09/10] blk: add FUA support to IDE Tejun Heo
2005-07-26 15:46 ` [PATCH linux-2.6-block:master 10/10] blk: I/O barrier documentation Tejun Heo
  -- strict thread matches above, loose matches on Subject: below --
2005-10-19 12:47 [PATCH linux-2.6-block:master 00/10] blk: reimplementation of I/O barrier Tejun Heo
2005-10-19 12:48 ` [PATCH linux-2.6-block:master 06/10] blk: update libata to use new blk_ordered Tejun Heo
