[PATCH 0/4] bidi support: block layer bidirectional io.
From: Boaz Harrosh @ 2007-04-15 17:17 UTC
To: Jens Axboe, James Bottomley, Andrew Morton, Mike Christie,
Christoph Hellwig
Cc: linux-scsi, Linux-ide, Benny Halevy, osd-dev, bharrosh
Following are 4 (large) patches adding support for bidirectional
block I/O in the kernel (not including SCSI-ml or iSCSI).
The submitted work is against the linux-2.6-block tree as of
2007/04/15; the patches only apply cleanly in succession.
The patches are based on the RFC I sent 3 months ago and only
cover the block layer at this point. I suggest they get included
in Morton's tree until they reach mainline, so they can get
compiled on all architectures/platforms. There is still a chance
that architectures I did not compile were not fully converted.
(FWIW, my search for leftover uses of the converted struct request
members found none.) If you find such a case, please send me the
file name and I will fix it ASAP.
Patch summary:
1. [PATCH 1/4] bidi support: request dma_data_direction
- Convert the REQ_RW bit flag into a dma_data_direction member, as SCSI-ml does.
- Remove rq_data_dir() and add new APIs for querying a request's direction
  (see the usage sketch after this summary).
- Convert code that used rq_data_dir() or peeked at req->cmd_flags & REQ_RW
  to the new API.
- Clean up incorrect uses of DMA_BIDIRECTIONAL and of bzero'ing non-queued
  requests; use the new blk_rq_init_unqueued_req() instead.
2. [PATCH 2/4] bidi support: fix req->cmd == INT cases
- Digging into these old drivers, I found traces of a past life in which
  request->cmd was the command type. This patch fixes some of those places.
  The drivers touched by this patch are a clear indication of drivers that
  have not been used in a while. Should we remove them from the kernel?
  These are:
drivers/acorn/block/fd1772.c, drivers/acorn/block/mfmhd.c,
drivers/block/nbd.c, drivers/cdrom/aztcd.c, drivers/cdrom/cm206.c,
drivers/cdrom/gscd.c, drivers/cdrom/mcdx.c, drivers/cdrom/optcd.c,
drivers/cdrom/sjcd.c, drivers/ide/legacy/hd.c, drivers/block/amiflop.c
3. [PATCH 3/4] bidi support: request_io_part
- Extract the I/O-related fields of struct request into struct request_io_part,
  in preparation for full bidi support.
- New rq_uni() API to access the sub-structure (please read the comment below
  on why an API rather than open-coding the access).
- Convert all users to the new API.
4. [PATCH 4/4] bidi support: bidirectional block layer
- Add one more request_io_part member to struct request for bidi support.
- Add block layer API functions for mapping and accessing bidi data buffers
  and for ending a block request as a whole (end_that_request_block()).
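To make the conversion pattern of patch 1/4 concrete, here is a minimal sketch of
how a driver moves from rq_data_dir() to the new helpers. The helper names
(rq_dma_dir(), rq_uni_rw_dir(), blk_rq_init_unqueued_req()) are the ones used
throughout the series; their exact definitions live in the patches, and
do_write()/do_read() below are hypothetical driver callbacks, not part of the
series.

	/* Illustrative conversion pattern only. */
	static void driver_handle_rq(struct request *req)
	{
		/* old API:  if (rq_data_dir(req) == WRITE) ...             */
		/* new API:  the request carries an enum dma_data_direction */
		switch (rq_dma_dir(req)) {
		case DMA_TO_DEVICE:
			do_write(req);		/* hypothetical callback */
			break;
		case DMA_FROM_DEVICE:
			do_read(req);		/* hypothetical callback */
			break;
		default:
			/* DMA_BIDIRECTIONAL only arrives once patch 4/4 is in
			 * and the driver opts in to bidi. */
			break;
		}
	}

	/* Code that still thinks in READ/WRITE terms uses
	 * rq_uni_rw_dir(req) == WRITE instead of rq_data_dir(). */

	/* On-stack ("unqueued") requests are no longer bzero'ed by hand: */
	static void driver_special_command(void)
	{
		struct request sreq;

		blk_rq_init_unqueued_req(&sreq); /* replaces memset(&sreq, 0, ...) */
		/* fill in sreq and issue it as before */
	}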
--------------------------------------------------------------------------------------------
Developer comments:
Patch 1/4: Borrows struct scsi_cmnd's use of enum dma_data_direction. Further work (in
progress) is removing the corresponding member from struct scsi_cmnd and converting
all users to access rq_dma_dir(sc->req) directly.
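The direction helpers the series leans on follow directly from enum
dma_data_direction. The sketch below shows one plausible shape for dma_uni_dir()
and dma_write_dir(), both of which appear in the diff of patch 1/4; treat the
bodies as assumptions for illustration, the authoritative definitions are the
ones the patch adds to include/linux/dma-mapping.h.

	/* Illustrative only -- the real helpers are added by patch 1/4. */
	static inline int dma_uni_dir(enum dma_data_direction dir)
	{
		/* uni-directional: data moves one way at most */
		return dir == DMA_TO_DEVICE || dir == DMA_FROM_DEVICE ||
		       dir == DMA_NONE;
	}

	static inline int dma_write_dir(enum dma_data_direction dir)
	{
		/* READ/WRITE view for code that still thinks in rw terms */
		return (dir == DMA_TO_DEVICE) ? WRITE : READ;
	}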
Patch 3/4: The reasons for introducing the rq_uni(req) API rather than directly accessing
req->uni are:
* WARN(!bidi_dir(req)) is life-saving when developing bidi-enabled paths. Once we, bidi
  users, start to push bidi requests down kernel paths, we immediately get warned about
  paths we did not anticipate. Otherwise they would be very hard to find, and would hurt
  kernel stability.
* A cleaner and saner future implementation could use in/out members rather than
  uni/bidi_read. This way the dma_direction member can be deprecated and the uni sub-
  structure can be maintained through a pointer in struct request.
  With this API we are free to change the implementation in the future without
  touching any users of the API, and we can experiment with what works best. The API
  also makes it much easier to convert uni-directional drivers to bidi (see
  ll_rw_blk.c in patch 4/4).
* Note that internal uses inside the block layer access req->uni directly; these will
  need to be changed if the implementation of req->{uni, bidi_read} changes.
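As a rough illustration of the accessor argument above, a minimal sketch of what
rq_uni() could look like follows. The real definition is introduced by patch 3/4,
so both the body and the exact sense of the warning check are assumptions drawn
from the rationale given here.

	/* Sketch only -- the authoritative rq_uni() is added by patch 3/4. */
	static inline struct request_io_part *rq_uni(struct request *rq)
	{
		/* Assumption: warn when a bidi request leaks into a path that
		 * only knows how to handle uni-directional I/O.  This is the
		 * life-saving warning described above. */
		WARN_ON(rq_bidi_dir(rq));
		return &rq->uni;
	}

A driver that has not been converted keeps calling rq_uni(rq) and therefore flags
any bidi request it is handed, instead of silently mishandling it.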
^ permalink raw reply [flat|nested] 22+ messages in thread* [PATCH 1/4] bidi support: request dma_data_direction 2007-04-15 17:17 [PATCH 0/4] bidi support: block layer bidirectional io Boaz Harrosh @ 2007-04-15 17:25 ` Boaz Harrosh 2007-04-15 17:31 ` [PATCH 2/4] bidi support: fix req->cmd == INT cases Boaz Harrosh ` (3 subsequent siblings) 4 siblings, 0 replies; 22+ messages in thread From: Boaz Harrosh @ 2007-04-15 17:25 UTC (permalink / raw) To: Boaz Harrosh, Jens Axboe, James Bottomley, Andrew Morton, Mike Christie, Christoph Hellwig Cc: linux-scsi, Linux-ide, Benny Halevy, osd-dev - Introduce a new enum dma_data_direction data_dir member in struct request. and remove the RW bit from request->cmd_flag - Add new API to query request direction. - Adjust existing API and implementation. - Cleanup wrong use of DMA_BIDIRECTIONAL - Introduce new blk_rq_init_unqueued_req() and use it in places ad-hoc requests were used and bzero'ed. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> --- arch/um/drivers/ubd_kern.c | 4 +- block/as-iosched.c | 2 +- block/cfq-iosched.c | 10 ++-- block/deadline-iosched.c | 8 ++-- block/elevator.c | 11 ++-- block/ll_rw_blk.c | 113 ++++++++++++++++++++++++--------------- block/scsi_ioctl.c | 10 ++-- drivers/block/DAC960.c | 2 +- drivers/block/amiflop.c | 4 +- drivers/block/ataflop.c | 2 +- drivers/block/cciss.c | 6 +- drivers/block/cpqarray.c | 6 +- drivers/block/floppy.c | 9 +-- drivers/block/nbd.c | 3 +- drivers/block/paride/pcd.c | 2 +- drivers/block/paride/pd.c | 29 +++++------ drivers/block/paride/pf.c | 9 +--- drivers/block/pktcdvd.c | 2 +- drivers/block/ps2esdi.c | 9 ++-- drivers/block/swim3.c | 26 +++++----- drivers/block/sx8.c | 6 +- drivers/block/ub.c | 8 +-- drivers/block/viodasd.c | 2 +- drivers/block/xd.c | 4 +- drivers/block/z2ram.c | 2 +- drivers/cdrom/cdrom.c | 2 +- drivers/cdrom/cdu31a.c | 2 +- drivers/cdrom/gscd.c | 2 +- drivers/cdrom/sbpcd.c | 2 +- drivers/cdrom/sonycd535.c | 2 +- drivers/cdrom/viocd.c | 2 +- drivers/ide/arm/icside.c | 7 +-- drivers/ide/cris/ide-cris.c | 4 +- drivers/ide/ide-cd.c | 13 +++-- drivers/ide/ide-disk.c | 10 ++-- drivers/ide/ide-dma.c | 4 +- drivers/ide/ide-floppy.c | 4 +- drivers/ide/ide-io.c | 5 +- drivers/ide/ide-tape.c | 2 +- drivers/ide/ide-taskfile.c | 6 +- drivers/ide/ide.c | 4 +- drivers/ide/legacy/hd.c | 6 +- drivers/ide/mips/au1xxx-ide.c | 7 +-- drivers/ide/pci/alim15x3.c | 2 +- drivers/ide/pci/hpt366.c | 2 +- drivers/ide/pci/pdc202xx_old.c | 2 +- drivers/ide/pci/scc_pata.c | 2 +- drivers/ide/pci/sgiioc4.c | 4 +- drivers/ide/pci/trm290.c | 2 +- drivers/ide/ppc/pmac.c | 5 +- drivers/md/dm-emc.c | 2 +- drivers/message/i2o/i2o_block.c | 8 ++-- drivers/mmc/mmc_block.c | 13 +++-- drivers/mtd/mtd_blkdevs.c | 8 ++-- drivers/s390/block/dasd.c | 2 +- drivers/s390/block/dasd_diag.c | 4 +- drivers/s390/block/dasd_eckd.c | 10 ++-- drivers/s390/block/dasd_fba.c | 14 +++--- drivers/s390/char/tape_block.c | 2 +- drivers/sbus/char/jsflash.c | 2 +- drivers/scsi/aic7xxx_old.c | 4 +- drivers/scsi/scsi_error.c | 3 +- drivers/scsi/scsi_lib.c | 21 +++---- drivers/scsi/scsi_tgt_lib.c | 29 ++++------- drivers/scsi/sd.c | 20 ++++--- drivers/scsi/sg.c | 2 - drivers/scsi/sr.c | 15 +++--- drivers/scsi/sun3_NCR5380.c | 6 +- include/linux/blkdev.h | 53 ++++++++++++++++--- include/linux/blktrace_api.h | 8 +++- include/linux/dma-mapping.h | 22 ++++++++ include/linux/elevator.h | 4 +- 72 files changed, 359 insertions(+), 285 deletions(-) diff --git a/arch/um/drivers/ubd_kern.c 
b/arch/um/drivers/ubd_kern.c index 8bd9204..222ad17 100644 --- a/arch/um/drivers/ubd_kern.c +++ b/arch/um/drivers/ubd_kern.c @@ -1030,7 +1030,7 @@ static int prepare_request(struct request *req, struct io_thread_req *io_req) int len; /* This should be impossible now */ - if((rq_data_dir(req) == WRITE) && !ubd_dev->openflags.w){ + if((rq_rw_dir(req) == WRITE) && !ubd_dev->openflags.w){ printk("Write attempted on readonly ubd device %s\n", disk->disk_name); end_request(req, 0); @@ -1049,7 +1049,7 @@ static int prepare_request(struct request *req, struct io_thread_req *io_req) io_req->error = 0; io_req->sector_mask = 0; - io_req->op = (rq_data_dir(req) == READ) ? UBD_READ : UBD_WRITE; + io_req->op = (rq_uni_rw_dir(req) == READ) ? UBD_READ : UBD_WRITE; io_req->offsets[0] = 0; io_req->offsets[1] = ubd_dev->cow.data_offset; io_req->buffer = req->buffer; diff --git a/block/as-iosched.c b/block/as-iosched.c index ef12627..824d93e 100644 --- a/block/as-iosched.c +++ b/block/as-iosched.c @@ -1285,7 +1285,7 @@ static void as_work_handler(struct work_struct *work) spin_unlock_irqrestore(q->queue_lock, flags); } -static int as_may_queue(request_queue_t *q, int rw) +static int as_may_queue(request_queue_t *q, int rw, int is_sync) { int ret = ELV_MQUEUE_MAY; struct as_data *ad = q->elevator->elevator_data; diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index b6491c0..882a15a 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -226,7 +226,7 @@ static inline pid_t cfq_queue_pid(struct task_struct *task, int rw, int is_sync) /* * Use the per-process queue, for read requests and syncronous writes */ - if (!(rw & REQ_RW) || is_sync) + if (!(rw == WRITE) || is_sync) return task->pid; return CFQ_KEY_ASYNC; @@ -1787,14 +1787,14 @@ static inline int __cfq_may_queue(struct cfq_queue *cfqq) return ELV_MQUEUE_MAY; } -static int cfq_may_queue(request_queue_t *q, int rw) +static int cfq_may_queue(request_queue_t *q, int rw, int is_sync) { struct cfq_data *cfqd = q->elevator->elevator_data; struct task_struct *tsk = current; struct cfq_queue *cfqq; unsigned int key; - key = cfq_queue_pid(tsk, rw, rw & REQ_RW_SYNC); + key = cfq_queue_pid(tsk, rw, is_sync); /* * don't force setup of a queue from here, as a call to may_queue @@ -1821,7 +1821,7 @@ static void cfq_put_request(struct request *rq) struct cfq_queue *cfqq = RQ_CFQQ(rq); if (cfqq) { - const int rw = rq_data_dir(rq); + const int rw = rq_rw_dir(rq); BUG_ON(!cfqq->allocated[rw]); cfqq->allocated[rw]--; @@ -1844,7 +1844,7 @@ cfq_set_request(request_queue_t *q, struct request *rq, gfp_t gfp_mask) struct cfq_data *cfqd = q->elevator->elevator_data; struct task_struct *tsk = current; struct cfq_io_context *cic; - const int rw = rq_data_dir(rq); + const int rw = rq_rw_dir(rq); const int is_sync = rq_is_sync(rq); pid_t key = cfq_queue_pid(tsk, rw, is_sync); struct cfq_queue *cfqq; diff --git a/block/deadline-iosched.c b/block/deadline-iosched.c index 6d673e9..e605c09 100644 --- a/block/deadline-iosched.c +++ b/block/deadline-iosched.c @@ -53,7 +53,7 @@ struct deadline_data { static void deadline_move_request(struct deadline_data *, struct request *); -#define RQ_RB_ROOT(dd, rq) (&(dd)->sort_list[rq_data_dir((rq))]) +#define RQ_RB_ROOT(dd, rq) (&(dd)->sort_list[rq_rw_dir((rq))]) static void deadline_add_rq_rb(struct deadline_data *dd, struct request *rq) @@ -72,7 +72,7 @@ retry: static inline void deadline_del_rq_rb(struct deadline_data *dd, struct request *rq) { - const int data_dir = rq_data_dir(rq); + const int data_dir = rq_rw_dir(rq); if 
(dd->next_rq[data_dir] == rq) { struct rb_node *rbnext = rb_next(&rq->rb_node); @@ -92,7 +92,7 @@ static void deadline_add_request(struct request_queue *q, struct request *rq) { struct deadline_data *dd = q->elevator->elevator_data; - const int data_dir = rq_data_dir(rq); + const int data_dir = rq_rw_dir(rq); deadline_add_rq_rb(dd, rq); @@ -197,7 +197,7 @@ deadline_move_to_dispatch(struct deadline_data *dd, struct request *rq) static void deadline_move_request(struct deadline_data *dd, struct request *rq) { - const int data_dir = rq_data_dir(rq); + const int data_dir = rq_rw_dir(rq); struct rb_node *rbnext = rb_next(&rq->rb_node); dd->next_rq[READ] = NULL; diff --git a/block/elevator.c b/block/elevator.c index 96a00c8..b73feec 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -76,7 +76,7 @@ inline int elv_rq_merge_ok(struct request *rq, struct bio *bio) /* * different data direction or already started, don't merge */ - if (bio_data_dir(bio) != rq_data_dir(rq)) + if (bio_data_dir(bio) != rq_rw_dir(rq)) return 0; /* @@ -413,7 +413,7 @@ void elv_dispatch_sort(request_queue_t *q, struct request *rq) list_for_each_prev(entry, &q->queue_head) { struct request *pos = list_entry_rq(entry); - if (rq_data_dir(rq) != rq_data_dir(pos)) + if (rq_uni_rw_dir(rq) != rq_uni_rw_dir(pos)) break; if (pos->cmd_flags & (REQ_SOFTBARRIER|REQ_HARDBARRIER|REQ_STARTED)) break; @@ -733,7 +733,8 @@ struct request *elv_next_request(request_queue_t *q) blk_add_trace_rq(q, rq, BLK_TA_ISSUE); } - if (!q->boundary_rq || q->boundary_rq == rq) { + if ((!q->boundary_rq || q->boundary_rq == rq) && + !rq_bidi_dir(rq)) { q->end_sector = rq_end_sector(rq); q->boundary_rq = NULL; } @@ -845,12 +846,12 @@ void elv_put_request(request_queue_t *q, struct request *rq) e->ops->elevator_put_req_fn(rq); } -int elv_may_queue(request_queue_t *q, int rw) +int elv_may_queue(request_queue_t *q, int rw, int is_sync) { elevator_t *e = q->elevator; if (e->ops->elevator_may_queue_fn) - return e->ops->elevator_may_queue_fn(q, rw); + return e->ops->elevator_may_queue_fn(q, rw, is_sync); return ELV_MQUEUE_MAY; } diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c index 3de0695..e7ae7d4 100644 --- a/block/ll_rw_blk.c +++ b/block/ll_rw_blk.c @@ -431,8 +431,9 @@ static inline struct request *start_ordered(request_queue_t *q, rq = &q->bar_rq; rq->cmd_flags = 0; rq_init(q, rq); - if (bio_data_dir(q->orig_bar_rq->bio) == WRITE) - rq->cmd_flags |= REQ_RW; + rq->data_dir = bio_data_dir(q->orig_bar_rq->bio) ? + DMA_TO_DEVICE : DMA_FROM_DEVICE; + rq->cmd_flags |= q->ordered & QUEUE_ORDERED_FUA ? 
REQ_FUA : 0; rq->elevator_private = NULL; rq->elevator_private2 = NULL; @@ -1958,18 +1959,25 @@ static inline void blk_free_request(request_queue_t *q, struct request *rq) } static struct request * -blk_alloc_request(request_queue_t *q, int rw, int priv, gfp_t gfp_mask) +blk_alloc_request(request_queue_t *q, enum dma_data_direction dir, + int priv, gfp_t gfp_mask) { struct request *rq = mempool_alloc(q->rq.rq_pool, gfp_mask); if (!rq) return NULL; + rq->cmd_flags = REQ_ALLOCED; + + BUG_ON(!(dma_uni_dir(dir) || (dir == DMA_BIDIRECTIONAL))); + rq->data_dir = dir; + /* - * first three bits are identical in rq->cmd_flags and bio->bi_rw, - * see bio.h and blkdev.h + * FIXME: Safeguard from unqueued requests + * that were not allocated/initted by us */ - rq->cmd_flags = rw | REQ_ALLOCED; + if (dir == DMA_BIDIRECTIONAL) + rq->cmd_flags |= REQ_BIDI; if (priv) { if (unlikely(elv_set_request(q, rq, gfp_mask))) { @@ -2055,16 +2063,17 @@ static void freed_request(request_queue_t *q, int rw, int priv) * Returns NULL on failure, with queue_lock held. * Returns !NULL on success, with queue_lock *not held*. */ -static struct request *get_request(request_queue_t *q, int rw_flags, - struct bio *bio, gfp_t gfp_mask) +static struct request *get_request(request_queue_t *q, + enum dma_data_direction dir, struct bio *bio, gfp_t gfp_mask) { struct request *rq = NULL; struct request_list *rl = &q->rq; struct io_context *ioc = NULL; - const int rw = rw_flags & 0x01; int may_queue, priv; + int rw = dma_write_dir(dir); + int is_sync = (rw==READ) || (bio && bio_sync(bio)); - may_queue = elv_may_queue(q, rw_flags); + may_queue = elv_may_queue(q, rw, is_sync); if (may_queue == ELV_MQUEUE_NO) goto rq_starved; @@ -2112,7 +2121,7 @@ static struct request *get_request(request_queue_t *q, int rw_flags, spin_unlock_irq(q->queue_lock); - rq = blk_alloc_request(q, rw_flags, priv, gfp_mask); + rq = blk_alloc_request(q, dir, priv, gfp_mask); if (unlikely(!rq)) { /* * Allocation failed presumably due to memory. Undo anything @@ -2160,13 +2169,13 @@ out: * * Called with q->queue_lock held, and returns with it unlocked. 
*/ -static struct request *get_request_wait(request_queue_t *q, int rw_flags, - struct bio *bio) +static struct request *get_request_wait(request_queue_t *q, + enum dma_data_direction dir, struct bio *bio) { - const int rw = rw_flags & 0x01; + const int rw = dma_write_dir(dir); struct request *rq; - rq = get_request(q, rw_flags, bio, GFP_NOIO); + rq = get_request(q, dir, bio, GFP_NOIO); while (!rq) { DEFINE_WAIT(wait); struct request_list *rl = &q->rq; @@ -2174,7 +2183,7 @@ static struct request *get_request_wait(request_queue_t *q, int rw_flags, prepare_to_wait_exclusive(&rl->wait[rw], &wait, TASK_UNINTERRUPTIBLE); - rq = get_request(q, rw_flags, bio, GFP_NOIO); + rq = get_request(q, dir, bio, GFP_NOIO); if (!rq) { struct io_context *ioc; @@ -2202,17 +2211,16 @@ static struct request *get_request_wait(request_queue_t *q, int rw_flags, return rq; } -struct request *blk_get_request(request_queue_t *q, int rw, gfp_t gfp_mask) +struct request *blk_get_request(request_queue_t *q, + enum dma_data_direction dir, gfp_t gfp_mask) { struct request *rq; - BUG_ON(rw != READ && rw != WRITE); - spin_lock_irq(q->queue_lock); if (gfp_mask & __GFP_WAIT) { - rq = get_request_wait(q, rw, NULL); + rq = get_request_wait(q, dir, NULL); } else { - rq = get_request(q, rw, NULL, gfp_mask); + rq = get_request(q, dir, NULL, gfp_mask); if (!rq) spin_unlock_irq(q->queue_lock); } @@ -2223,6 +2231,22 @@ struct request *blk_get_request(request_queue_t *q, int rw, gfp_t gfp_mask) EXPORT_SYMBOL(blk_get_request); /** + * blk_rq_init_unqueued_req - Initialize a request that does not come from a Q + * + * Description: + * Drivers that need to send a request that does not belong to a Q, like on + * the stack, should not bzero the request. + * They should call this function instead to initialize the request. + * It should never be called with a request that was allocated by a Q. + */ +void blk_rq_init_unqueued_req(struct request * rq) +{ + memset(rq, 0, sizeof(*rq)); + rq->data_dir = DMA_FROM_DEVICE; +} +EXPORT_SYMBOL(blk_rq_init_unqueued_req); + +/** * blk_start_queueing - initiate dispatch of requests to device * @q: request queue to kick into gear * @@ -2335,7 +2359,7 @@ static int __blk_rq_map_user(request_queue_t *q, struct request *rq, struct bio *bio, *orig_bio; int reading, ret; - reading = rq_data_dir(rq) == READ; + reading = rq_rw_dir(rq) == READ; /* * if alignment requirement is satisfied, map in user pages for @@ -2479,7 +2503,7 @@ int blk_rq_map_user_iov(request_queue_t *q, struct request *rq, /* we don't allow misaligned data like bio_map_user() does. 
If the * user is using sg, they're expected to know the alignment constraints * and respect them accordingly */ - bio = bio_map_user_iov(q, NULL, iov, iov_count, rq_data_dir(rq)== READ); + bio = bio_map_user_iov(q, NULL, iov, iov_count, rq_rw_dir(rq)== READ); if (IS_ERR(bio)) return PTR_ERR(bio); @@ -2552,7 +2576,7 @@ int blk_rq_map_kern(request_queue_t *q, struct request *rq, void *kbuf, if (IS_ERR(bio)) return PTR_ERR(bio); - if (rq_data_dir(rq) == WRITE) + if (dma_write_dir(rq->data_dir)) bio->bi_rw |= (1 << BIO_RW); blk_rq_bio_prep(q, rq, bio); @@ -2663,7 +2687,7 @@ EXPORT_SYMBOL(blkdev_issue_flush); static void drive_stat_acct(struct request *rq, int nr_sectors, int new_io) { - int rw = rq_data_dir(rq); + int rw = rq_rw_dir(rq); if (!blk_fs_request(rq) || !rq->rq_disk) return; @@ -2741,7 +2765,7 @@ void __blk_put_request(request_queue_t *q, struct request *req) * it didn't come out of our reserved rq pools */ if (req->cmd_flags & REQ_ALLOCED) { - int rw = rq_data_dir(req); + int rw = rq_rw_dir(req); int priv = req->cmd_flags & REQ_ELVPRIV; BUG_ON(!list_empty(&req->queuelist)); @@ -2807,7 +2831,7 @@ static int attempt_merge(request_queue_t *q, struct request *req, if (req->sector + req->nr_sectors != next->sector) return 0; - if (rq_data_dir(req) != rq_data_dir(next) + if (rq_dma_dir(req) != rq_dma_dir(next) || req->rq_disk != next->rq_disk || next->special) return 0; @@ -2908,7 +2932,6 @@ static int __make_request(request_queue_t *q, struct bio *bio) int el_ret, nr_sectors, barrier, err; const unsigned short prio = bio_prio(bio); const int sync = bio_sync(bio); - int rw_flags; nr_sectors = bio_sectors(bio); @@ -2983,19 +3006,11 @@ static int __make_request(request_queue_t *q, struct bio *bio) get_rq: /* - * This sync check and mask will be re-done in init_request_from_bio(), - * but we need to set it earlier to expose the sync flag to the - * rq allocator and io schedulers. - */ - rw_flags = bio_data_dir(bio); - if (sync) - rw_flags |= REQ_RW_SYNC; - - /* * Grab a free request. This is might sleep but can not fail. * Returns with the queue unlocked. */ - req = get_request_wait(q, rw_flags, bio); + req = get_request_wait(q, + bio_data_dir(bio) ? DMA_TO_DEVICE : DMA_FROM_DEVICE, bio); /* * After dropping the lock and possibly sleeping here, our request @@ -3346,7 +3361,7 @@ static int __end_that_request_first(struct request *req, int uptodate, if (!blk_pc_request(req)) req->errors = 0; - if (!uptodate) { + if (error) { if (blk_fs_request(req) && !(req->cmd_flags & REQ_QUIET)) printk("end_request: I/O error, dev %s, sector %llu\n", req->rq_disk ? req->rq_disk->disk_name : "?", @@ -3354,7 +3369,7 @@ static int __end_that_request_first(struct request *req, int uptodate, } if (blk_fs_request(req) && req->rq_disk) { - const int rw = rq_data_dir(req); + const int rw = rq_rw_dir(req); disk_stat_add(req->rq_disk, sectors[rw], nr_bytes >> 9); } @@ -3578,7 +3593,7 @@ void end_that_request_last(struct request *req, int uptodate) */ if (disk && blk_fs_request(req) && req != &req->q->bar_rq) { unsigned long duration = jiffies - req->start_time; - const int rw = rq_data_dir(req); + const int rw = rq_rw_dir(req); __disk_stat_inc(disk, ios[rw]); __disk_stat_add(disk, ticks[rw], duration); @@ -3606,8 +3621,20 @@ EXPORT_SYMBOL(end_request); void blk_rq_bio_prep(request_queue_t *q, struct request *rq, struct bio *bio) { - /* first two bits are identical in rq->cmd_flags and bio->bi_rw */ - rq->cmd_flags |= (bio->bi_rw & 3); + rq->data_dir = bio_data_dir(bio) ? 
DMA_TO_DEVICE : DMA_FROM_DEVICE; + + if (bio->bi_rw & (1<<BIO_RW_SYNC)) + rq->cmd_flags |= REQ_RW_SYNC; + else + rq->cmd_flags &= ~REQ_RW_SYNC; + /* FIXME: what about other flags, should we sync these too? */ + /* + BIO_RW_AHEAD ==> ?? + BIO_RW_BARRIER ==> REQ_SOFTBARRIER/REQ_HARDBARRIER + BIO_RW_FAILFAST ==> REQ_FAILFAST + BIO_RW_SYNC ==> REQ_RW_SYNC + BIO_RW_META ==> REQ_RW_META + */ rq->nr_phys_segments = bio_phys_segments(q, bio); rq->nr_hw_segments = bio_hw_segments(q, bio); diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c index 65c6a3c..8fc6b65 100644 --- a/block/scsi_ioctl.c +++ b/block/scsi_ioctl.c @@ -254,7 +254,8 @@ static int sg_io(struct file *file, request_queue_t *q, break; } - rq = blk_get_request(q, writing ? WRITE : READ, GFP_KERNEL); + rq = blk_get_request(q, + writing ? DMA_TO_DEVICE : DMA_FROM_DEVICE, GFP_KERNEL); if (!rq) return -ENOMEM; @@ -410,7 +411,8 @@ int sg_scsi_ioctl(struct file *file, struct request_queue *q, memset(buffer, 0, bytes); } - rq = blk_get_request(q, in_len ? WRITE : READ, __GFP_WAIT); + rq = blk_get_request(q, + in_len ? DMA_TO_DEVICE : DMA_FROM_DEVICE, __GFP_WAIT); cmdlen = COMMAND_SIZE(opcode); @@ -495,10 +497,8 @@ static int __blk_send_generic(request_queue_t *q, struct gendisk *bd_disk, int c struct request *rq; int err; - rq = blk_get_request(q, WRITE, __GFP_WAIT); + rq = blk_get_request(q, DMA_NONE, __GFP_WAIT); rq->cmd_type = REQ_TYPE_BLOCK_PC; - rq->data = NULL; - rq->data_len = 0; rq->timeout = BLK_DEFAULT_TIMEOUT; memset(rq->cmd, 0, sizeof(rq->cmd)); rq->cmd[0] = cmd; diff --git a/drivers/block/DAC960.c b/drivers/block/DAC960.c index 92bf868..8ba5142 100644 --- a/drivers/block/DAC960.c +++ b/drivers/block/DAC960.c @@ -3322,7 +3322,7 @@ static int DAC960_process_queue(DAC960_Controller_T *Controller, struct request_ if (Command == NULL) return 0; - if (rq_data_dir(Request) == READ) { + if (rq_uni_rw_dir(Request) == READ) { Command->DmaDirection = PCI_DMA_FROMDEVICE; Command->CommandType = DAC960_ReadCommand; } else { diff --git a/drivers/block/amiflop.c b/drivers/block/amiflop.c index 5d65621..54f2fb3 100644 --- a/drivers/block/amiflop.c +++ b/drivers/block/amiflop.c @@ -1379,7 +1379,7 @@ static void redo_fd_request(void) "0x%08lx\n", track, sector, data); #endif - if ((rq_data_dir(CURRENT) != READ) && (rq_data_dir(CURRENT) != WRITE)) { + if (!dma_uni_dir(CURRENT->data_dir)) { printk(KERN_WARNING "do_fd_request: unknown command\n"); end_request(CURRENT, 0); goto repeat; @@ -1389,7 +1389,7 @@ static void redo_fd_request(void) goto repeat; } - switch (rq_data_dir(CURRENT)) { + switch (rq_rw_dir(CURRENT)) { case READ: memcpy(data, floppy->trackbuf + sector * 512, 512); break; diff --git a/drivers/block/ataflop.c b/drivers/block/ataflop.c index 14d6b94..b940802 100644 --- a/drivers/block/ataflop.c +++ b/drivers/block/ataflop.c @@ -1453,7 +1453,7 @@ repeat: del_timer( &motor_off_timer ); ReqCnt = 0; - ReqCmd = rq_data_dir(CURRENT); + ReqCmd = rq_uni_rw_dir(CURRENT); ReqBlock = CURRENT->sector; ReqBuffer = CURRENT->buffer; setup_req_params( drive ); diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c index 072e18e..d87eecc 100644 --- a/drivers/block/cciss.c +++ b/drivers/block/cciss.c @@ -1237,7 +1237,7 @@ static void cciss_softirq_done(struct request *rq) complete_buffers(rq->bio, rq->errors); if (blk_fs_request(rq)) { - const int rw = rq_data_dir(rq); + const int rw = rq_rw_dir(rq); disk_stat_add(rq->rq_disk, sectors[rw], rq->nr_sectors); } @@ -2501,10 +2501,10 @@ static void do_cciss_request(request_queue_t *q) 
c->Request.Type.Type = TYPE_CMD; // It is a command. c->Request.Type.Attribute = ATTR_SIMPLE; c->Request.Type.Direction = - (rq_data_dir(creq) == READ) ? XFER_READ : XFER_WRITE; + (rq_uni_rw_dir(creq) == READ) ? XFER_READ : XFER_WRITE; c->Request.Timeout = 0; // Don't time out c->Request.CDB[0] = - (rq_data_dir(creq) == READ) ? h->cciss_read : h->cciss_write; + (rq_uni_rw_dir(creq) == READ) ? h->cciss_read : h->cciss_write; start_blk = creq->sector; #ifdef CCISS_DEBUG printk(KERN_DEBUG "ciss: sector =%d nr_sectors=%d\n", (int)creq->sector, diff --git a/drivers/block/cpqarray.c b/drivers/block/cpqarray.c index b94cd1c..59b62cf 100644 --- a/drivers/block/cpqarray.c +++ b/drivers/block/cpqarray.c @@ -922,7 +922,7 @@ DBGPX( seg = blk_rq_map_sg(q, creq, tmp_sg); /* Now do all the DMA Mappings */ - if (rq_data_dir(creq) == READ) + if (rq_uni_rw_dir(creq) == READ) dir = PCI_DMA_FROMDEVICE; else dir = PCI_DMA_TODEVICE; @@ -937,7 +937,7 @@ DBGPX( DBGPX( printk("Submitting %d sectors in %d segments\n", creq->nr_sectors, seg); ); c->req.hdr.sg_cnt = seg; c->req.hdr.blk_cnt = creq->nr_sectors; - c->req.hdr.cmd = (rq_data_dir(creq) == READ) ? IDA_READ : IDA_WRITE; + c->req.hdr.cmd = (rq_uni_rw_dir(creq) == READ) ? IDA_READ : IDA_WRITE; c->type = CMD_RWREQ; /* Put the request on the tail of the request queue */ @@ -1033,7 +1033,7 @@ static inline void complete_command(cmdlist_t *cmd, int timeout) complete_buffers(rq->bio, ok); if (blk_fs_request(rq)) { - const int rw = rq_data_dir(rq); + const int rw = rq_rw_dir(rq); disk_stat_add(rq->rq_disk, sectors[rw], rq->nr_sectors); } diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c index 5231ed7..d27b8aa 100644 --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -2325,7 +2325,7 @@ static void request_done(int uptodate) floppy_end_request(req, 1); spin_unlock_irqrestore(q->queue_lock, flags); } else { - if (rq_data_dir(req) == WRITE) { + if (rq_uni_rw_dir(req) == WRITE) { /* record write error information */ DRWE->write_errors++; if (DRWE->write_errors == 1) { @@ -2628,15 +2628,12 @@ static int make_raw_rw_request(void) raw_cmd->flags = FD_RAW_SPIN | FD_RAW_NEED_DISK | FD_RAW_NEED_DISK | FD_RAW_NEED_SEEK; raw_cmd->cmd_count = NR_RW; - if (rq_data_dir(current_req) == READ) { + if (rq_uni_rw_dir(current_req) == READ) { raw_cmd->flags |= FD_RAW_READ; COMMAND = FM_MODE(_floppy, FD_READ); - } else if (rq_data_dir(current_req) == WRITE) { + } else { raw_cmd->flags |= FD_RAW_WRITE; COMMAND = FM_MODE(_floppy, FD_WRITE); - } else { - DPRINT("make_raw_rw_request: unknown command\n"); - return 0; } max_sector = _floppy->sect * _floppy->head; diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 090796b..411e138 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -434,7 +434,7 @@ static void do_nbd_request(request_queue_t * q) BUG_ON(lo->magic != LO_MAGIC); nbd_cmd(req) = NBD_CMD_READ; - if (rq_data_dir(req) == WRITE) { + if (rq_uni_rw_dir(req) == WRITE) { nbd_cmd(req) = NBD_CMD_WRITE; if (lo->flags & NBD_READ_ONLY) { printk(KERN_ERR "%s: Write on read-only\n", @@ -502,6 +502,7 @@ static int nbd_ioctl(struct inode *inode, struct file *file, dprintk(DBG_IOCTL, "%s: nbd_ioctl cmd=%s(0x%x) arg=%lu\n", lo->disk->disk_name, ioctl_cmd_to_ascii(cmd), cmd, arg); + blk_rq_init_unqueued_req(&sreq); switch (cmd) { case NBD_DISCONNECT: printk(KERN_INFO "%s: NBD_DISCONNECT\n", lo->disk->disk_name); diff --git a/drivers/block/paride/pcd.c b/drivers/block/paride/pcd.c index c852eed..375499e 100644 --- a/drivers/block/paride/pcd.c +++ 
b/drivers/block/paride/pcd.c @@ -722,7 +722,7 @@ static void do_pcd_request(request_queue_t * q) if (!pcd_req) return; - if (rq_data_dir(pcd_req) == READ) { + if (rq_dma_dir(pcd_req) == DMA_FROM_DEVICE) { struct pcd_unit *cd = pcd_req->rq_disk->private_data; if (cd != pcd_current) pcd_bufblk = -1; diff --git a/drivers/block/paride/pd.c b/drivers/block/paride/pd.c index 31e0148..0db4e8a 100644 --- a/drivers/block/paride/pd.c +++ b/drivers/block/paride/pd.c @@ -441,21 +441,18 @@ static enum action do_pd_io_start(void) return pd_special(); } - pd_cmd = rq_data_dir(pd_req); - if (pd_cmd == READ || pd_cmd == WRITE) { - pd_block = pd_req->sector; - pd_count = pd_req->current_nr_sectors; - if (pd_block + pd_count > get_capacity(pd_req->rq_disk)) - return Fail; - pd_run = pd_req->nr_sectors; - pd_buf = pd_req->buffer; - pd_retries = 0; - if (pd_cmd == READ) - return do_pd_read_start(); - else - return do_pd_write_start(); - } - return Fail; + pd_cmd = rq_uni_rw_dir(pd_req); + pd_block = pd_req->sector; + pd_count = pd_req->current_nr_sectors; + if (pd_block + pd_count > get_capacity(pd_req->rq_disk)) + return Fail; + pd_run = pd_req->nr_sectors; + pd_buf = pd_req->buffer; + pd_retries = 0; + if (pd_cmd == READ) + return do_pd_read_start(); + else + return do_pd_write_start(); } static enum action pd_special(void) @@ -716,7 +713,7 @@ static int pd_special_command(struct pd_unit *disk, struct request rq; int err = 0; - memset(&rq, 0, sizeof(rq)); + blk_rq_init_unqueued_req(&rq); rq.errors = 0; rq.rq_disk = disk->gd; rq.ref_count = 1; diff --git a/drivers/block/paride/pf.c b/drivers/block/paride/pf.c index 7cdaa19..47e9bcb 100644 --- a/drivers/block/paride/pf.c +++ b/drivers/block/paride/pf.c @@ -779,20 +779,15 @@ repeat: goto repeat; } - pf_cmd = rq_data_dir(pf_req); + pf_cmd = rq_uni_rw_dir(pf_req); pf_buf = pf_req->buffer; pf_retries = 0; pf_busy = 1; if (pf_cmd == READ) pi_do_claimed(pf_current->pi, do_pf_read); - else if (pf_cmd == WRITE) + else pi_do_claimed(pf_current->pi, do_pf_write); - else { - pf_busy = 0; - pf_end_request(0); - goto repeat; - } } static int pf_next_buf(void) diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c index a4fb703..4c2413b 100644 --- a/drivers/block/pktcdvd.c +++ b/drivers/block/pktcdvd.c @@ -758,7 +758,7 @@ static int pkt_generic_packet(struct pktcdvd_device *pd, struct packet_command * int ret = 0; rq = blk_get_request(q, (cgc->data_direction == CGC_DATA_WRITE) ? - WRITE : READ, __GFP_WAIT); + DMA_TO_DEVICE : DMA_FROM_DEVICE, __GFP_WAIT); if (cgc->buflen) { if (blk_rq_map_kern(q, rq, cgc->buffer, cgc->buflen, __GFP_WAIT)) diff --git a/drivers/block/ps2esdi.c b/drivers/block/ps2esdi.c index 688a4fb..cf06abd 100644 --- a/drivers/block/ps2esdi.c +++ b/drivers/block/ps2esdi.c @@ -506,11 +506,12 @@ static void do_ps2esdi_request(request_queue_t * q) return; } - switch (rq_data_dir(req)) { - case READ: + switch (rq_dma_dir(req)) { + case DMA_FROM_DEVICE: + case DMA_NONE: ps2esdi_readwrite(READ, req); break; - case WRITE: + case DMA_TO_DEVICE: ps2esdi_readwrite(WRITE, req); break; default: @@ -859,7 +860,7 @@ static void ps2esdi_normal_interrupt_handler(u_int int_ret_code) case INT_TRANSFER_REQ: ps2esdi_prep_dma(current_req->buffer, current_req->current_nr_sectors, - (rq_data_dir(current_req) == READ) + (rq_uni_rw_dir(current_req) == READ) ? 
MCA_DMA_MODE_16 | MCA_DMA_MODE_WRITE | MCA_DMA_MODE_XFER : MCA_DMA_MODE_16 | MCA_DMA_MODE_READ); outb(CTRL_ENABLE_DMA | CTRL_ENABLE_INTR, ESDI_CONTROL); diff --git a/drivers/block/swim3.c b/drivers/block/swim3.c index 1a65979..91eb6f9 100644 --- a/drivers/block/swim3.c +++ b/drivers/block/swim3.c @@ -336,7 +336,7 @@ static void start_request(struct floppy_state *fs) continue; } - if (rq_data_dir(req) == WRITE) { + if (rq_uni_rw_dir(req) == WRITE) { if (fs->write_prot < 0) fs->write_prot = swim3_readbit(fs, WRITE_PROT); if (fs->write_prot) { @@ -432,7 +432,7 @@ static inline void setup_transfer(struct floppy_state *fs) printk(KERN_ERR "swim3: transfer 0 sectors?\n"); return; } - if (rq_data_dir(fd_req) == WRITE) + if (rq_uni_rw_dir(fd_req) == WRITE) n = 1; else { n = fs->secpertrack - fs->req_sector + 1; @@ -445,7 +445,7 @@ static inline void setup_transfer(struct floppy_state *fs) out_8(&sw->nsect, n); out_8(&sw->gap3, 0); out_le32(&dr->cmdptr, virt_to_bus(cp)); - if (rq_data_dir(fd_req) == WRITE) { + if (rq_uni_rw_dir(fd_req) == WRITE) { /* Set up 3 dma commands: write preamble, data, postamble */ init_dma(cp, OUTPUT_MORE, write_preamble, sizeof(write_preamble)); ++cp; @@ -460,7 +460,7 @@ static inline void setup_transfer(struct floppy_state *fs) out_8(&sw->control_bic, DO_ACTION | WRITE_SECTORS); in_8(&sw->error); out_8(&sw->control_bic, DO_ACTION | WRITE_SECTORS); - if (rq_data_dir(fd_req) == WRITE) + if (rq_uni_rw_dir(fd_req) == WRITE) out_8(&sw->control_bis, WRITE_SECTORS); in_8(&sw->intr); out_le32(&dr->control, (RUN << 16) | RUN); @@ -609,7 +609,7 @@ static void xfer_timeout(unsigned long data) out_8(&sw->intr_enable, 0); out_8(&sw->control_bic, WRITE_SECTORS | DO_ACTION); out_8(&sw->select, RELAX); - if (rq_data_dir(fd_req) == WRITE) + if (rq_uni_rw_dir(fd_req) == WRITE) ++cp; if (ld_le16(&cp->xfer_status) != 0) s = fs->scount - ((ld_le16(&cp->res_count) + 511) >> 9); @@ -617,8 +617,8 @@ static void xfer_timeout(unsigned long data) s = 0; fd_req->sector += s; fd_req->current_nr_sectors -= s; - printk(KERN_ERR "swim3: timeout %sing sector %ld\n", - (rq_data_dir(fd_req)==WRITE? "writ": "read"), (long)fd_req->sector); + printk(KERN_ERR "swim3: timeout %s sector %ld\n", + dma_dir_to_string(fd_rq->data_dir), (long)fd_req->sector); end_request(fd_req, 0); fs->state = idle; start_request(fs); @@ -636,8 +636,8 @@ static irqreturn_t swim3_interrupt(int irq, void *dev_id) intr = in_8(&sw->intr); err = (intr & ERROR_INTR)? in_8(&sw->error): 0; if ((intr & ERROR_INTR) && fs->state != do_transfer) - printk(KERN_ERR "swim3_interrupt, state=%d, dir=%x, intr=%x, err=%x\n", - fs->state, rq_data_dir(fd_req), intr, err); + printk(KERN_ERR "swim3_interrupt, state=%d, dir=%d, intr=%x, err=%x\n", + fs->state, rq_dma_dir(fd_req), intr, err); switch (fs->state) { case locating: if (intr & SEEN_SECTOR) { @@ -698,7 +698,7 @@ static irqreturn_t swim3_interrupt(int irq, void *dev_id) fs->timeout_pending = 0; dr = fs->dma; cp = fs->dma_cmd; - if (rq_data_dir(fd_req) == WRITE) + if (rq_uni_rw_dir(fd_req) == WRITE) ++cp; /* * Check that the main data transfer has finished. @@ -733,7 +733,7 @@ static irqreturn_t swim3_interrupt(int irq, void *dev_id) act(fs); } else { printk("swim3: error %sing block %ld (err=%x)\n", - rq_data_dir(fd_req) == WRITE? "writ": "read", + rq_rw_dir(fd_req) == WRITE? 
"writ": "read", (long)fd_req->sector, err); end_request(fd_req, 0); fs->state = idle; @@ -742,8 +742,8 @@ static irqreturn_t swim3_interrupt(int irq, void *dev_id) if ((stat & ACTIVE) == 0 || resid != 0) { /* musta been an error */ printk(KERN_ERR "swim3: fd dma: stat=%x resid=%d\n", stat, resid); - printk(KERN_ERR " state=%d, dir=%x, intr=%x, err=%x\n", - fs->state, rq_data_dir(fd_req), intr, err); + printk(KERN_ERR " state=%d, dir=%d, intr=%x, err=%x\n", + fs->state, rq_dma_dir(fd_req), intr, err); end_request(fd_req, 0); fs->state = idle; start_request(fs); diff --git a/drivers/block/sx8.c b/drivers/block/sx8.c index 54509eb..61df2a0 100644 --- a/drivers/block/sx8.c +++ b/drivers/block/sx8.c @@ -565,7 +565,7 @@ static struct carm_request *carm_get_special(struct carm_host *host) if (!crq) return NULL; - rq = blk_get_request(host->oob_q, WRITE /* bogus */, GFP_KERNEL); + rq = blk_get_request(host->oob_q, DMA_TO_DEVICE /* bogus */, GFP_KERNEL); if (!rq) { spin_lock_irqsave(&host->lock, flags); carm_put_request(host, crq); @@ -860,7 +860,7 @@ queue_one_request: blkdev_dequeue_request(rq); - if (rq_data_dir(rq) == WRITE) { + if (rq_uni_rw_dir(rq) == WRITE) { writing = 1; pci_dir = PCI_DMA_TODEVICE; } else { @@ -1053,7 +1053,7 @@ static inline void carm_handle_rw(struct carm_host *host, VPRINTK("ENTER\n"); - if (rq_data_dir(crq->rq) == WRITE) + if (rq_uni_rw_dir(crq->rq) == WRITE) pci_dir = PCI_DMA_TODEVICE; else pci_dir = PCI_DMA_FROMDEVICE; diff --git a/drivers/block/ub.c b/drivers/block/ub.c index 2098eff..c789a37 100644 --- a/drivers/block/ub.c +++ b/drivers/block/ub.c @@ -709,11 +709,7 @@ static void ub_cmd_build_block(struct ub_dev *sc, struct ub_lun *lun, struct request *rq = urq->rq; unsigned int block, nblks; - if (rq_data_dir(rq) == WRITE) - cmd->dir = UB_DIR_WRITE; - else - cmd->dir = UB_DIR_READ; - + cmd->dir = rq_uni_rw_dir(rq) ? 
UB_DIR_WRITE : UB_DIR_READ; cmd->nsg = urq->nsg; memcpy(cmd->sgv, urq->sgv, sizeof(struct scatterlist) * cmd->nsg); @@ -747,7 +743,7 @@ static void ub_cmd_build_packet(struct ub_dev *sc, struct ub_lun *lun, if (rq->data_len == 0) { cmd->dir = UB_DIR_NONE; } else { - if (rq_data_dir(rq) == WRITE) + if (rq_uni_rw_dir(rq) == WRITE) cmd->dir = UB_DIR_WRITE; else cmd->dir = UB_DIR_READ; diff --git a/drivers/block/viodasd.c b/drivers/block/viodasd.c index 68592c3..ff960c6 100644 --- a/drivers/block/viodasd.c +++ b/drivers/block/viodasd.c @@ -302,7 +302,7 @@ static int send_request(struct request *req) start = (u64)req->sector << 9; - if (rq_data_dir(req) == READ) { + if (rq_uni_rw_dir(req) == READ) { direction = DMA_FROM_DEVICE; viocmd = viomajorsubtype_blockio | vioblockread; statindex = 0; diff --git a/drivers/block/xd.c b/drivers/block/xd.c index 0d97b7e..3a4a377 100644 --- a/drivers/block/xd.c +++ b/drivers/block/xd.c @@ -308,7 +308,7 @@ static void do_xd_request (request_queue_t * q) while ((req = elv_next_request(q)) != NULL) { unsigned block = req->sector; unsigned count = req->nr_sectors; - int rw = rq_data_dir(req); + int rw = rq_rw_dir(req); XD_INFO *disk = req->rq_disk->private_data; int res = 0; int retry; @@ -321,7 +321,7 @@ static void do_xd_request (request_queue_t * q) end_request(req, 0); continue; } - if (rw != READ && rw != WRITE) { + if (!dma_uni_dir(req->data_dir)) { printk("do_xd_request: unknown request\n"); end_request(req, 0); continue; diff --git a/drivers/block/z2ram.c b/drivers/block/z2ram.c index 7cc2685..6cc92b3 100644 --- a/drivers/block/z2ram.c +++ b/drivers/block/z2ram.c @@ -89,7 +89,7 @@ static void do_z2_request(request_queue_t *q) if (len < size) size = len; addr += z2ram_map[ start >> Z2RAM_CHUNKSHIFT ]; - if (rq_data_dir(req) == READ) + if (rq_uni_rw_dir(req) == READ) memcpy(req->buffer, (char *)addr, size); else memcpy((char *)addr, req->buffer, size); diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c index b36f44d..53f383d 100644 --- a/drivers/cdrom/cdrom.c +++ b/drivers/cdrom/cdrom.c @@ -2103,7 +2103,7 @@ static int cdrom_read_cdda_bpc(struct cdrom_device_info *cdi, __u8 __user *ubuf, if (!q) return -ENXIO; - rq = blk_get_request(q, READ, GFP_KERNEL); + rq = blk_get_request(q, DMA_FROM_DEVICE, GFP_KERNEL); if (!rq) return -ENOMEM; diff --git a/drivers/cdrom/cdu31a.c b/drivers/cdrom/cdu31a.c index 2157c58..2eb37bf 100644 --- a/drivers/cdrom/cdu31a.c +++ b/drivers/cdrom/cdu31a.c @@ -1342,7 +1342,7 @@ static void do_cdu31a_request(request_queue_t * q) end_request(req, 0); continue; } - if (rq_data_dir(req) == WRITE) { + if (rq_rw_dir(req) == WRITE) { end_request(req, 0); continue; } diff --git a/drivers/cdrom/gscd.c b/drivers/cdrom/gscd.c index b3ab6e9..8411f8c 100644 --- a/drivers/cdrom/gscd.c +++ b/drivers/cdrom/gscd.c @@ -265,7 +265,7 @@ repeat: goto out; if (req->cmd != READ) { - printk("GSCD: bad cmd %u\n", rq_data_dir(req)); + printk("GSCD: bad cmd %u\n", rq_dma_dir(req)); end_request(req, 0); goto repeat; } diff --git a/drivers/cdrom/sbpcd.c b/drivers/cdrom/sbpcd.c index a1283b1..f345876 100644 --- a/drivers/cdrom/sbpcd.c +++ b/drivers/cdrom/sbpcd.c @@ -4550,7 +4550,7 @@ static void do_sbpcd_request(request_queue_t * q) spin_unlock_irq(q->queue_lock); down(&ioctl_read_sem); - if (rq_data_dir(elv_next_request(q)) != READ) + if (rq_rw_dir(elv_next_request(q)) != READ) { msg(DBG_INF, "bad cmd %d\n", req->cmd[0]); goto err_done; diff --git a/drivers/cdrom/sonycd535.c b/drivers/cdrom/sonycd535.c index f77ada9..3516cc7 100644 --- 
a/drivers/cdrom/sonycd535.c +++ b/drivers/cdrom/sonycd535.c @@ -816,7 +816,7 @@ do_cdu535_request(request_queue_t * q) end_request(req, 0); continue; } - if (rq_data_dir(req) == WRITE) { + if (rq_rw_dir(req) == WRITE) { end_request(req, 0); continue; } diff --git a/drivers/cdrom/viocd.c b/drivers/cdrom/viocd.c index 44cd7b2..9afb9e9 100644 --- a/drivers/cdrom/viocd.c +++ b/drivers/cdrom/viocd.c @@ -338,7 +338,7 @@ static int send_request(struct request *req) BUG_ON(req->nr_phys_segments > 1); - if (rq_data_dir(req) == READ) { + if (rq_uni_rw_dir(req) == READ) { direction = DMA_FROM_DEVICE; cmd = viomajorsubtype_cdio | viocdread; } else { diff --git a/drivers/ide/arm/icside.c b/drivers/ide/arm/icside.c index e2953fc..ff14c3c 100644 --- a/drivers/ide/arm/icside.c +++ b/drivers/ide/arm/icside.c @@ -214,10 +214,7 @@ static void icside_build_sglist(ide_drive_t *drive, struct request *rq) ide_map_sg(drive, rq); - if (rq_data_dir(rq) == READ) - hwif->sg_dma_direction = DMA_FROM_DEVICE; - else - hwif->sg_dma_direction = DMA_TO_DEVICE; + hwif->sg_dma_direction = rq_dma_dir(rq); hwif->sg_nents = dma_map_sg(state->dev, sg, hwif->sg_nents, hwif->sg_dma_direction); @@ -392,7 +389,7 @@ static int icside_dma_setup(ide_drive_t *drive) struct request *rq = hwif->hwgroup->rq; unsigned int dma_mode; - if (rq_data_dir(rq)) + if (rq_uni_rw_dir(rq)) dma_mode = DMA_MODE_WRITE; else dma_mode = DMA_MODE_READ; diff --git a/drivers/ide/cris/ide-cris.c b/drivers/ide/cris/ide-cris.c index 556455f..090269b 100644 --- a/drivers/ide/cris/ide-cris.c +++ b/drivers/ide/cris/ide-cris.c @@ -1060,7 +1060,7 @@ static int cris_dma_setup(ide_drive_t *drive) { struct request *rq = drive->hwif->hwgroup->rq; - cris_ide_initialize_dma(!rq_data_dir(rq)); + cris_ide_initialize_dma(!rq_uni_rw_dir(rq)); if (cris_ide_build_dmatable (drive)) { ide_map_sg(drive, rq); return 1; @@ -1082,7 +1082,7 @@ static void cris_dma_exec_cmd(ide_drive_t *drive, u8 command) static void cris_dma_start(ide_drive_t *drive) { struct request *rq = drive->hwif->hwgroup->rq; - int writing = rq_data_dir(rq); + int writing = rq_uni_rw_dir(rq); int type = TYPE_DMA; if (drive->current_speed >= XFER_UDMA_0) diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c index 45a928c..0387997 100644 --- a/drivers/ide/ide-cd.c +++ b/drivers/ide/ide-cd.c @@ -767,7 +767,7 @@ static int cdrom_decode_status(ide_drive_t *drive, int good_stat, int *stat_ret) if (sense_key == NOT_READY) { /* Tray open. */ - if (rq_data_dir(rq) == READ) { + if (rq_uni_rw_dir(rq) == READ) { cdrom_saw_media_change (drive); /* Fail the request. 
*/ @@ -1730,7 +1730,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive) /* * check which way to transfer data */ - if (rq_data_dir(rq) == WRITE) { + if (rq_uni_rw_dir(rq) == WRITE) { /* * write to drive */ @@ -2021,10 +2021,13 @@ ide_do_rw_cdrom (ide_drive_t *drive, struct request *rq, sector_t block) } CDROM_CONFIG_FLAGS(drive)->seeking = 0; } - if ((rq_data_dir(rq) == READ) && IDE_LARGE_SEEK(info->last_block, block, IDECD_SEEK_THRESHOLD) && drive->dsc_overlap) { + if ((rq_uni_rw_dir(rq) == READ) && + IDE_LARGE_SEEK(info->last_block, block, + IDECD_SEEK_THRESHOLD) && + drive->dsc_overlap) { action = cdrom_start_seek(drive, block); } else { - if (rq_data_dir(rq) == READ) + if (rq_uni_rw_dir(rq) == READ) action = cdrom_start_read(drive, block); else action = cdrom_start_write(drive, rq); @@ -3066,7 +3069,7 @@ static int ide_cdrom_prep_fs(request_queue_t *q, struct request *rq) memset(rq->cmd, 0, sizeof(rq->cmd)); - if (rq_data_dir(rq) == READ) + if (rq_uni_rw_dir(rq) == READ) rq->cmd[0] = GPCMD_READ_10; else rq->cmd[0] = GPCMD_WRITE_10; diff --git a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c index 37aa6dd..5d66966 100644 --- a/drivers/ide/ide-disk.c +++ b/drivers/ide/ide-disk.c @@ -253,7 +253,7 @@ static ide_startstop_t __ide_do_rw_disk(ide_drive_t *drive, struct request *rq, if (dma) { if (!hwif->dma_setup(drive)) { - if (rq_data_dir(rq)) { + if (rq_uni_rw_dir(rq)) { command = lba48 ? WIN_WRITEDMA_EXT : WIN_WRITEDMA; if (drive->vdma) command = lba48 ? WIN_WRITE_EXT: WIN_WRITE; @@ -270,7 +270,7 @@ static ide_startstop_t __ide_do_rw_disk(ide_drive_t *drive, struct request *rq, ide_init_sg_cmd(drive, rq); } - if (rq_data_dir(rq) == READ) { + if (rq_uni_rw_dir(rq) == READ) { if (drive->mult_count) { hwif->data_phase = TASKFILE_MULTI_IN; @@ -318,8 +318,8 @@ static ide_startstop_t ide_do_rw_disk (ide_drive_t *drive, struct request *rq, s ledtrig_ide_activity(); - pr_debug("%s: %sing: block=%llu, sectors=%lu, buffer=0x%08lx\n", - drive->name, rq_data_dir(rq) == READ ? 
"read" : "writ", + pr_debug("%s: %s: block=%llu, sectors=%lu, buffer=0x%08lx\n", + drive->name, dma_dir_to_string(rq->data_dir), (unsigned long long)block, rq->nr_sectors, (unsigned long)rq->buffer); @@ -713,7 +713,7 @@ static int idedisk_issue_flush(request_queue_t *q, struct gendisk *disk, if (!drive->wcache) return 0; - rq = blk_get_request(q, WRITE, __GFP_WAIT); + rq = blk_get_request(q, DMA_TO_DEVICE, __GFP_WAIT); idedisk_prepare_flush(q, rq); diff --git a/drivers/ide/ide-dma.c b/drivers/ide/ide-dma.c index fd21308..abedd21 100644 --- a/drivers/ide/ide-dma.c +++ b/drivers/ide/ide-dma.c @@ -207,7 +207,7 @@ int ide_build_sglist(ide_drive_t *drive, struct request *rq) ide_map_sg(drive, rq); - if (rq_data_dir(rq) == READ) + if (rq_uni_rw_dir(rq) == READ) hwif->sg_dma_direction = PCI_DMA_FROMDEVICE; else hwif->sg_dma_direction = PCI_DMA_TODEVICE; @@ -545,7 +545,7 @@ int ide_dma_setup(ide_drive_t *drive) unsigned int reading; u8 dma_stat; - if (rq_data_dir(rq)) + if (rq_uni_rw_dir(rq)) reading = 0; else reading = 1 << 3; diff --git a/drivers/ide/ide-floppy.c b/drivers/ide/ide-floppy.c index 57cd21c..dfd1562 100644 --- a/drivers/ide/ide-floppy.c +++ b/drivers/ide/ide-floppy.c @@ -1233,7 +1233,7 @@ static void idefloppy_create_rw_cmd (idefloppy_floppy_t *floppy, idefloppy_pc_t { int block = sector / floppy->bs_factor; int blocks = rq->nr_sectors / floppy->bs_factor; - int cmd = rq_data_dir(rq); + int cmd = rq_uni_rw_dir(rq); debug_log("create_rw1%d_cmd: block == %d, blocks == %d\n", 2 * test_bit (IDEFLOPPY_USE_READ12, &floppy->flags), @@ -1251,7 +1251,7 @@ static void idefloppy_create_rw_cmd (idefloppy_floppy_t *floppy, idefloppy_pc_t pc->callback = &idefloppy_rw_callback; pc->rq = rq; pc->b_count = cmd == READ ? 0 : rq->bio->bi_size; - if (rq->cmd_flags & REQ_RW) + if (cmd == WRITE) set_bit(PC_WRITING, &pc->flags); pc->buffer = NULL; pc->request_transfer = pc->buffer_size = blocks * floppy->block_size; diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c index 0e02800..0097e51 100644 --- a/drivers/ide/ide-io.c +++ b/drivers/ide/ide-io.c @@ -516,7 +516,8 @@ static ide_startstop_t ide_ata_error(ide_drive_t *drive, struct request *rq, u8 } } - if ((stat & DRQ_STAT) && rq_data_dir(rq) == READ && hwif->err_stops_fifo == 0) + if ((stat & DRQ_STAT) && rq_uni_rw_dir(rq) == READ && + hwif->err_stops_fifo == 0) try_to_flush_leftover_data(drive); if (rq->errors >= ERROR_MAX || blk_noretry_request(rq)) { @@ -1706,7 +1707,7 @@ irqreturn_t ide_intr (int irq, void *dev_id) void ide_init_drive_cmd (struct request *rq) { - memset(rq, 0, sizeof(*rq)); + blk_rq_init_unqueued_req(rq); rq->cmd_type = REQ_TYPE_ATA_CMD; rq->ref_count = 1; } diff --git a/drivers/ide/ide-tape.c b/drivers/ide/ide-tape.c index 4e59239..3c33d57 100644 --- a/drivers/ide/ide-tape.c +++ b/drivers/ide/ide-tape.c @@ -1775,7 +1775,7 @@ static void idetape_create_request_sense_cmd (idetape_pc_t *pc) static void idetape_init_rq(struct request *rq, u8 cmd) { - memset(rq, 0, sizeof(*rq)); + blk_rq_init_unqueued_req(rq); rq->cmd_type = REQ_TYPE_SPECIAL; rq->cmd[0] = cmd; } diff --git a/drivers/ide/ide-taskfile.c b/drivers/ide/ide-taskfile.c index 30175c7..158338f 100644 --- a/drivers/ide/ide-taskfile.c +++ b/drivers/ide/ide-taskfile.c @@ -473,7 +473,7 @@ static int ide_diag_taskfile(ide_drive_t *drive, ide_task_t *args, unsigned long { struct request rq; - memset(&rq, 0, sizeof(rq)); + ide_init_drive_cmd(&rq); rq.cmd_type = REQ_TYPE_ATA_TASKFILE; rq.buffer = buf; @@ -498,8 +498,8 @@ static int ide_diag_taskfile(ide_drive_t *drive, ide_task_t 
*args, unsigned long rq.hard_nr_sectors = rq.nr_sectors; rq.hard_cur_sectors = rq.current_nr_sectors = rq.nr_sectors; - if (args->command_type == IDE_DRIVE_TASK_RAW_WRITE) - rq.cmd_flags |= REQ_RW; + rq.data_dir = (args->command_type == IDE_DRIVE_TASK_RAW_WRITE) ? + DMA_TO_DEVICE : DMA_FROM_DEVICE; } rq.special = args; diff --git a/drivers/ide/ide.c b/drivers/ide/ide.c index a6f098f..67422df 100644 --- a/drivers/ide/ide.c +++ b/drivers/ide/ide.c @@ -1242,7 +1242,7 @@ static int generic_ide_suspend(struct device *dev, pm_message_t mesg) if (!(drive->dn % 2)) ide_acpi_get_timing(hwif); - memset(&rq, 0, sizeof(rq)); + ide_init_drive_cmd(&rq); memset(&rqpm, 0, sizeof(rqpm)); memset(&args, 0, sizeof(args)); rq.cmd_type = REQ_TYPE_PM_SUSPEND; @@ -1270,7 +1270,7 @@ static int generic_ide_resume(struct device *dev) ide_acpi_exec_tfs(drive); - memset(&rq, 0, sizeof(rq)); + ide_init_drive_cmd(&rq); memset(&rqpm, 0, sizeof(rqpm)); memset(&args, 0, sizeof(args)); rq.cmd_type = REQ_TYPE_PM_RESUME; diff --git a/drivers/ide/legacy/hd.c b/drivers/ide/legacy/hd.c index 45ed035..ca8dd47 100644 --- a/drivers/ide/legacy/hd.c +++ b/drivers/ide/legacy/hd.c @@ -627,13 +627,13 @@ repeat: cyl, head, sec, nsect, req->buffer); #endif if (blk_fs_request(req)) { - switch (rq_data_dir(req)) { - case READ: + switch (rq_dma_dir(req)) { + case DMA_FROM_DEVICE: hd_out(disk,nsect,sec,head,cyl,WIN_READ,&read_intr); if (reset) goto repeat; break; - case WRITE: + case DMA_TO_DEVICE: hd_out(disk,nsect,sec,head,cyl,WIN_WRITE,&write_intr); if (reset) goto repeat; diff --git a/drivers/ide/mips/au1xxx-ide.c b/drivers/ide/mips/au1xxx-ide.c index d54d9fe..ebf2d9b 100644 --- a/drivers/ide/mips/au1xxx-ide.c +++ b/drivers/ide/mips/au1xxx-ide.c @@ -248,10 +248,7 @@ static int auide_build_sglist(ide_drive_t *drive, struct request *rq) ide_map_sg(drive, rq); - if (rq_data_dir(rq) == READ) - hwif->sg_dma_direction = DMA_FROM_DEVICE; - else - hwif->sg_dma_direction = DMA_TO_DEVICE; + hwif->sg_dma_direction = rq_dma_dir(rq); return dma_map_sg(ahwif->dev, sg, hwif->sg_nents, hwif->sg_dma_direction); @@ -267,7 +264,7 @@ static int auide_build_dmatable(ide_drive_t *drive) _auide_hwif *ahwif = (_auide_hwif*)hwif->hwif_data; struct scatterlist *sg; - iswrite = (rq_data_dir(rq) == WRITE); + iswrite = (rq_uni_rw_dir(rq) == WRITE); /* Save for interrupt context */ ahwif->drive = drive; diff --git a/drivers/ide/pci/alim15x3.c b/drivers/ide/pci/alim15x3.c index 83e0aa6..72b3576 100644 --- a/drivers/ide/pci/alim15x3.c +++ b/drivers/ide/pci/alim15x3.c @@ -585,7 +585,7 @@ no_dma_set: static int ali15x3_dma_setup(ide_drive_t *drive) { if (m5229_revision < 0xC2 && drive->media != ide_disk) { - if (rq_data_dir(drive->hwif->hwgroup->rq)) + if (rq_uni_rw_dir(drive->hwif->hwgroup->rq)) return 1; /* try PIO instead of DMA */ } return ide_dma_setup(drive); diff --git a/drivers/ide/pci/hpt366.c b/drivers/ide/pci/hpt366.c index 60ecdc2..dd83c41 100644 --- a/drivers/ide/pci/hpt366.c +++ b/drivers/ide/pci/hpt366.c @@ -908,7 +908,7 @@ static void hpt3xxn_set_clock(ide_hwif_t *hwif, u8 mode) static void hpt3xxn_rw_disk(ide_drive_t *drive, struct request *rq) { - hpt3xxn_set_clock(HWIF(drive), rq_data_dir(rq) ? 0x23 : 0x21); + hpt3xxn_set_clock(HWIF(drive), rq_uni_rw_dir(rq) ? 
0x23 : 0x21); } /* diff --git a/drivers/ide/pci/pdc202xx_old.c b/drivers/ide/pci/pdc202xx_old.c index a7a639f..ddd301a 100644 --- a/drivers/ide/pci/pdc202xx_old.c +++ b/drivers/ide/pci/pdc202xx_old.c @@ -357,7 +357,7 @@ static void pdc202xx_old_ide_dma_start(ide_drive_t *drive) outb(clock | (hwif->channel ? 0x08 : 0x02), high_16 + 0x11); word_count = (rq->nr_sectors << 8); - word_count = (rq_data_dir(rq) == READ) ? + word_count = (rq_uni_rw_dir(rq) == READ) ? word_count | 0x05000000 : word_count | 0x06000000; outl(word_count, atapi_reg); diff --git a/drivers/ide/pci/scc_pata.c b/drivers/ide/pci/scc_pata.c index f84bf79..1edfbf6 100644 --- a/drivers/ide/pci/scc_pata.c +++ b/drivers/ide/pci/scc_pata.c @@ -398,7 +398,7 @@ static int scc_dma_setup(ide_drive_t *drive) unsigned int reading; u8 dma_stat; - if (rq_data_dir(rq)) + if (rq_uni_rw_dir(rq)) reading = 0; else reading = 1 << 3; diff --git a/drivers/ide/pci/sgiioc4.c b/drivers/ide/pci/sgiioc4.c index fd09b29..7533d79 100644 --- a/drivers/ide/pci/sgiioc4.c +++ b/drivers/ide/pci/sgiioc4.c @@ -554,7 +554,7 @@ static int sgiioc4_ide_dma_setup(ide_drive_t *drive) unsigned int count = 0; int ddir; - if (rq_data_dir(rq)) + if (rq_uni_rw_dir(rq)) ddir = PCI_DMA_TODEVICE; else ddir = PCI_DMA_FROMDEVICE; @@ -565,7 +565,7 @@ static int sgiioc4_ide_dma_setup(ide_drive_t *drive) return 1; } - if (rq_data_dir(rq)) + if (rq_uni_rw_dir(rq)) /* Writes TO the IOC4 FROM Main Memory */ ddir = IOC4_DMA_READ; else diff --git a/drivers/ide/pci/trm290.c b/drivers/ide/pci/trm290.c index cbb1b11..3c39547 100644 --- a/drivers/ide/pci/trm290.c +++ b/drivers/ide/pci/trm290.c @@ -191,7 +191,7 @@ static int trm290_ide_dma_setup(ide_drive_t *drive) struct request *rq = hwif->hwgroup->rq; unsigned int count, rw; - if (rq_data_dir(rq)) { + if (rq_uni_rw_dir(rq)) { #ifdef TRM290_NO_DMA_WRITES /* always use PIO for writes */ trm290_prepare_drive(drive, 0); /* select PIO xfer */ diff --git a/drivers/ide/ppc/pmac.c b/drivers/ide/ppc/pmac.c index 071a030..909d43d 100644 --- a/drivers/ide/ppc/pmac.c +++ b/drivers/ide/ppc/pmac.c @@ -1596,7 +1596,7 @@ pmac_ide_build_dmatable(ide_drive_t *drive, struct request *rq) pmac_ide_hwif_t* pmif = (pmac_ide_hwif_t *)hwif->hwif_data; volatile struct dbdma_regs __iomem *dma = pmif->dma_regs; struct scatterlist *sg; - int wr = (rq_data_dir(rq) == WRITE); + int wr = (rq_uni_rw_dir(rq) == WRITE); /* DMA table is already aligned */ table = (struct dbdma_cmd *) pmif->dma_table_cpu; @@ -1873,7 +1873,8 @@ pmac_ide_dma_setup(ide_drive_t *drive) /* Apple adds 60ns to wrDataSetup on reads */ if (ata4 && (pmif->timings[unit] & TR_66_UDMA_EN)) { - writel(pmif->timings[unit] + (!rq_data_dir(rq) ? 0x00800000UL : 0), + writel(pmif->timings[unit] + + (!rq_uni_rw_dir(rq) ? 0x00800000UL : 0), PMAC_IDE_REG(IDE_TIMING_CONFIG)); (void)readl(PMAC_IDE_REG(IDE_TIMING_CONFIG)); } diff --git a/drivers/md/dm-emc.c b/drivers/md/dm-emc.c index 265c467..347c2b6 100644 --- a/drivers/md/dm-emc.c +++ b/drivers/md/dm-emc.c @@ -103,7 +103,7 @@ static struct request *get_failover_req(struct emc_handler *h, struct request_queue *q = bdev_get_queue(bdev); /* FIXME: Figure out why it fails with GFP_ATOMIC. 
*/ - rq = blk_get_request(q, WRITE, __GFP_WAIT); + rq = blk_get_request(q, DMA_TO_DEVICE, __GFP_WAIT); if (!rq) { DMERR("get_failover_req: blk_get_request failed"); return NULL; diff --git a/drivers/message/i2o/i2o_block.c b/drivers/message/i2o/i2o_block.c index b17c4b2..1772620 100644 --- a/drivers/message/i2o/i2o_block.c +++ b/drivers/message/i2o/i2o_block.c @@ -342,7 +342,7 @@ static inline int i2o_block_sglist_alloc(struct i2o_controller *c, ireq->dev = &c->pdev->dev; nents = blk_rq_map_sg(ireq->req->q, ireq->req, ireq->sg_table); - if (rq_data_dir(ireq->req) == READ) + if (rq_uni_rw_dir(ireq->req) == READ) direction = PCI_DMA_FROMDEVICE; else direction = PCI_DMA_TODEVICE; @@ -362,7 +362,7 @@ static inline void i2o_block_sglist_free(struct i2o_block_request *ireq) { enum dma_data_direction direction; - if (rq_data_dir(ireq->req) == READ) + if (rq_uni_rw_dir(ireq->req) == READ) direction = PCI_DMA_FROMDEVICE; else direction = PCI_DMA_TODEVICE; @@ -779,7 +779,7 @@ static int i2o_block_transfer(struct request *req) mptr = &msg->body[0]; - if (rq_data_dir(req) == READ) { + if (rq_uni_rw_dir(req) == READ) { cmd = I2O_CMD_BLOCK_READ << 24; switch (dev->rcache) { @@ -844,7 +844,7 @@ static int i2o_block_transfer(struct request *req) * SIMPLE_TAG * RETURN_SENSE_DATA_IN_REPLY_MESSAGE_FRAME */ - if (rq_data_dir(req) == READ) { + if (rq_uni_rw_dir(req) == READ) { cmd[0] = READ_10; scsi_flags = 0x60a0000a; } else { diff --git a/drivers/mmc/mmc_block.c b/drivers/mmc/mmc_block.c index 86439a0..566d7e5 100644 --- a/drivers/mmc/mmc_block.c +++ b/drivers/mmc/mmc_block.c @@ -248,7 +248,8 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) if (brq.data.blocks > card->host->max_blk_count) brq.data.blocks = card->host->max_blk_count; - mmc_set_data_timeout(&brq.data, card, rq_data_dir(req) != READ); + mmc_set_data_timeout(&brq.data, card, + rq_uni_rw_dir(req) != READ); /* * If the host doesn't support multiple block writes, force @@ -256,7 +257,7 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) * this rule as they support querying the number of * successfully written sectors. */ - if (rq_data_dir(req) != READ && + if (rq_uni_rw_dir(req) != READ && !(card->host->caps & MMC_CAP_MULTIWRITE) && !mmc_card_sd(card)) brq.data.blocks = 1; @@ -272,7 +273,7 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) writecmd = MMC_WRITE_BLOCK; } - if (rq_data_dir(req) == READ) { + if (rq_uni_rw_dir(req) == READ) { brq.cmd.opcode = readcmd; brq.data.flags |= MMC_DATA_READ; } else { @@ -302,7 +303,7 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) goto cmd_err; } - if (rq_data_dir(req) != READ) { + if (rq_uni_rw_dir(req) != READ) { do { int err; @@ -357,7 +358,7 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) * For reads we just fail the entire chunk as that should * be safe in all cases. 
*/ - if (rq_data_dir(req) != READ && mmc_card_sd(card)) { + if (rq_uni_rw_dir(req) != READ && mmc_card_sd(card)) { u32 blocks; unsigned int bytes; @@ -371,7 +372,7 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req) ret = end_that_request_chunk(req, 1, bytes); spin_unlock_irq(&md->lock); } - } else if (rq_data_dir(req) != READ && + } else if (rq_uni_rw_dir(req) != READ && (card->host->caps & MMC_CAP_MULTIWRITE)) { spin_lock_irq(&md->lock); ret = end_that_request_chunk(req, 1, brq.data.bytes_xfered); diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c index b879a66..5df310e 100644 --- a/drivers/mtd/mtd_blkdevs.c +++ b/drivers/mtd/mtd_blkdevs.c @@ -53,14 +53,14 @@ static int do_blktrans_request(struct mtd_blktrans_ops *tr, if (req->sector + req->current_nr_sectors > get_capacity(req->rq_disk)) return 0; - switch(rq_data_dir(req)) { - case READ: + switch(rq_dma_dir(req)) { + case DMA_FROM_DEVICE: for (; nsect > 0; nsect--, block++, buf += tr->blksize) if (tr->readsect(dev, block, buf)) return 0; return 1; - case WRITE: + case DMA_TO_DEVICE: if (!tr->writesect) return 0; @@ -70,7 +70,7 @@ static int do_blktrans_request(struct mtd_blktrans_ops *tr, return 1; default: - printk(KERN_NOTICE "Unknown request %u\n", rq_data_dir(req)); + printk(KERN_NOTICE "Unknown request %u\n", rq_dma_dir(req)); return 0; } } diff --git a/drivers/s390/block/dasd.c b/drivers/s390/block/dasd.c index eb5dc62..65ab1b1 100644 --- a/drivers/s390/block/dasd.c +++ b/drivers/s390/block/dasd.c @@ -1215,7 +1215,7 @@ __dasd_process_blk_queue(struct dasd_device * device) req = elv_next_request(queue); if (device->features & DASD_FEATURE_READONLY && - rq_data_dir(req) == WRITE) { + rq_rw_dir(req) == WRITE) { DBF_DEV_EVENT(DBF_ERR, device, "Rejecting write request %p", req); diff --git a/drivers/s390/block/dasd_diag.c b/drivers/s390/block/dasd_diag.c index e810e4a..423debf 100644 --- a/drivers/s390/block/dasd_diag.c +++ b/drivers/s390/block/dasd_diag.c @@ -478,9 +478,9 @@ dasd_diag_build_cp(struct dasd_device * device, struct request *req) unsigned char rw_cmd; int i; - if (rq_data_dir(req) == READ) + if (rq_dma_dir(req) == DMA_FROM_DEVICE) rw_cmd = MDSK_READ_REQ; - else if (rq_data_dir(req) == WRITE) + else if (rq_dma_dir(req) == DMA_TO_DEVICE) rw_cmd = MDSK_WRITE_REQ; else return ERR_PTR(-EINVAL); diff --git a/drivers/s390/block/dasd_eckd.c b/drivers/s390/block/dasd_eckd.c index cecab22..e73a161 100644 --- a/drivers/s390/block/dasd_eckd.c +++ b/drivers/s390/block/dasd_eckd.c @@ -1113,9 +1113,9 @@ dasd_eckd_build_cp(struct dasd_device * device, struct request *req) int i; private = (struct dasd_eckd_private *) device->private; - if (rq_data_dir(req) == READ) + if (rq_dma_dir(req) == DMA_FROM_DEVICE) cmd = DASD_ECKD_CCW_READ_MT; - else if (rq_data_dir(req) == WRITE) + else if (rq_dma_dir(req) == DMA_TO_DEVICE) cmd = DASD_ECKD_CCW_WRITE_MT; else return ERR_PTR(-EINVAL); @@ -1187,7 +1187,7 @@ dasd_eckd_build_cp(struct dasd_device * device, struct request *req) if (dasd_page_cache) { char *copy = kmem_cache_alloc(dasd_page_cache, GFP_DMA | __GFP_NOWARN); - if (copy && rq_data_dir(req) == WRITE) + if (copy && rq_uni_rw_dir(req) == WRITE) memcpy(copy + bv->bv_offset, dst, bv->bv_len); if (copy) dst = copy + bv->bv_offset; @@ -1203,7 +1203,7 @@ dasd_eckd_build_cp(struct dasd_device * device, struct request *req) rcmd |= 0x8; count = dasd_eckd_cdl_reclen(recid); if (count < blksize && - rq_data_dir(req) == READ) + rq_uni_rw_dir(req) == READ) memset(dst + count, 0xe5, blksize - count); } @@ -1283,7 
+1283,7 @@ dasd_eckd_free_cp(struct dasd_ccw_req *cqr, struct request *req) else cda = (char *)((addr_t) ccw->cda); if (dst != cda) { - if (rq_data_dir(req) == READ) + if (rq_uni_rw_dir(req) == READ) memcpy(dst, cda, bv->bv_len); kmem_cache_free(dasd_page_cache, (void *)((addr_t)cda & PAGE_MASK)); diff --git a/drivers/s390/block/dasd_fba.c b/drivers/s390/block/dasd_fba.c index be0909e..35a7bfe 100644 --- a/drivers/s390/block/dasd_fba.c +++ b/drivers/s390/block/dasd_fba.c @@ -244,9 +244,9 @@ dasd_fba_build_cp(struct dasd_device * device, struct request *req) int i; private = (struct dasd_fba_private *) device->private; - if (rq_data_dir(req) == READ) { + if (rq_dma_dir(req) == DMA_FROM_DEVICE) { cmd = DASD_FBA_CCW_READ; - } else if (rq_data_dir(req) == WRITE) { + } else if (rq_dma_dir(req) == DMA_TO_DEVICE) { cmd = DASD_FBA_CCW_WRITE; } else return ERR_PTR(-EINVAL); @@ -293,7 +293,7 @@ dasd_fba_build_cp(struct dasd_device * device, struct request *req) return cqr; ccw = cqr->cpaddr; /* First ccw is define extent. */ - define_extent(ccw++, cqr->data, rq_data_dir(req), + define_extent(ccw++, cqr->data, rq_uni_rw_dir(req), device->bp_block, req->sector, req->nr_sectors); /* Build locate_record + read/write ccws. */ idaws = (unsigned long *) (cqr->data + sizeof(struct DE_fba_data)); @@ -301,7 +301,7 @@ dasd_fba_build_cp(struct dasd_device * device, struct request *req) /* Locate record for all blocks for smart devices. */ if (private->rdc_data.mode.bits.data_chain != 0) { ccw[-1].flags |= CCW_FLAG_CC; - locate_record(ccw++, LO_data++, rq_data_dir(req), 0, count); + locate_record(ccw++, LO_data++, rq_uni_rw_dir(req), 0, count); } recid = first_rec; rq_for_each_bio(bio, req) bio_for_each_segment(bv, bio, i) { @@ -309,7 +309,7 @@ dasd_fba_build_cp(struct dasd_device * device, struct request *req) if (dasd_page_cache) { char *copy = kmem_cache_alloc(dasd_page_cache, GFP_DMA | __GFP_NOWARN); - if (copy && rq_data_dir(req) == WRITE) + if (copy && rq_uni_rw_dir(req) == WRITE) memcpy(copy + bv->bv_offset, dst, bv->bv_len); if (copy) dst = copy + bv->bv_offset; @@ -319,7 +319,7 @@ dasd_fba_build_cp(struct dasd_device * device, struct request *req) if (private->rdc_data.mode.bits.data_chain == 0) { ccw[-1].flags |= CCW_FLAG_CC; locate_record(ccw, LO_data++, - rq_data_dir(req), + rq_uni_rw_dir(req), recid - first_rec, 1); ccw->flags = CCW_FLAG_CC; ccw++; @@ -386,7 +386,7 @@ dasd_fba_free_cp(struct dasd_ccw_req *cqr, struct request *req) else cda = (char *)((addr_t) ccw->cda); if (dst != cda) { - if (rq_data_dir(req) == READ) + if (rq_uni_rw_dir(req) == READ) memcpy(dst, cda, bv->bv_len); kmem_cache_free(dasd_page_cache, (void *)((addr_t)cda & PAGE_MASK)); diff --git a/drivers/s390/char/tape_block.c b/drivers/s390/char/tape_block.c index dd0ecae..e9a67d0 100644 --- a/drivers/s390/char/tape_block.c +++ b/drivers/s390/char/tape_block.c @@ -174,7 +174,7 @@ tapeblock_requeue(struct work_struct *work) { nr_queued < TAPEBLOCK_MIN_REQUEUE ) { req = elv_next_request(queue); - if (rq_data_dir(req) == WRITE) { + if (rq_rw_dir(req) == WRITE) { DBF_EVENT(1, "TBLOCK: Rejecting write request\n"); blkdev_dequeue_request(req); tapeblock_end_request(req, 0); diff --git a/drivers/sbus/char/jsflash.c b/drivers/sbus/char/jsflash.c index 512857a..36bd378 100644 --- a/drivers/sbus/char/jsflash.c +++ b/drivers/sbus/char/jsflash.c @@ -199,7 +199,7 @@ static void jsfd_do_request(request_queue_t *q) continue; } - if (rq_data_dir(req) != READ) { + if (rq_rw_dir(req) != READ) { printk(KERN_ERR "jsfd: write\n"); end_request(req, 0); 
continue; diff --git a/drivers/scsi/aic7xxx_old.c b/drivers/scsi/aic7xxx_old.c index a988d5a..2e77803 100644 --- a/drivers/scsi/aic7xxx_old.c +++ b/drivers/scsi/aic7xxx_old.c @@ -2849,7 +2849,7 @@ aic7xxx_done(struct aic7xxx_host *p, struct aic7xxx_scb *scb) int x, i; - if (rq_data_dir(cmd->request) == WRITE) + if (rq_uni_rw_dir(cmd->request) == WRITE) { aic_dev->w_total++; ptr = aic_dev->w_bins; @@ -3858,7 +3858,7 @@ aic7xxx_calculate_residual (struct aic7xxx_host *p, struct aic7xxx_scb *scb) { printk(INFO_LEAD "Underflow - Wanted %u, %s %u, residual SG " "count %d.\n", p->host_no, CTL_OF_SCB(scb), cmd->underflow, - (rq_data_dir(cmd->request) == WRITE) ? "wrote" : "read", actual, + (rq_rw_dir(cmd->request) == WRITE) ? "wrote" : "read", actual, hscb->residual_SG_segment_count); printk(INFO_LEAD "status 0x%x.\n", p->host_no, CTL_OF_SCB(scb), hscb->target_status); diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 918bb60..e4baab6 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -1676,6 +1676,7 @@ scsi_reset_provider(struct scsi_device *dev, int flag) unsigned long flags; int rtn; + blk_rq_init_unqueued_req(&req); scmd->request = &req; memset(&scmd->eh_timeout, 0, sizeof(scmd->eh_timeout)); @@ -1688,7 +1689,7 @@ scsi_reset_provider(struct scsi_device *dev, int flag) scmd->cmd_len = 0; - scmd->sc_data_direction = DMA_BIDIRECTIONAL; + scmd->sc_data_direction = DMA_NONE; init_timer(&scmd->eh_timeout); diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 9f7482d..2c5bc49 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -181,10 +181,9 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd, unsigned char *sense, int timeout, int retries, int flags) { struct request *req; - int write = (data_direction == DMA_TO_DEVICE); int ret = DRIVER_ERROR << 24; - req = blk_get_request(sdev->request_queue, write, __GFP_WAIT); + req = blk_get_request(sdev->request_queue, data_direction, __GFP_WAIT); if (bufflen && blk_rq_map_kern(sdev->request_queue, req, buffer, bufflen, __GFP_WAIT)) @@ -259,8 +258,10 @@ static int scsi_merge_bio(struct request *rq, struct bio *bio) struct request_queue *q = rq->q; bio->bi_flags &= ~(1 << BIO_SEG_VALID); - if (rq_data_dir(rq) == WRITE) + if (rq_dma_dir(rq) == DMA_TO_DEVICE) bio->bi_rw |= (1 << BIO_RW); + else + bio->bi_rw &= ~(1 << BIO_RW); blk_queue_bounce(q, &bio); if (!rq->bio) @@ -386,13 +387,14 @@ int scsi_execute_async(struct scsi_device *sdev, const unsigned char *cmd, struct request *req; struct scsi_io_context *sioc; int err = 0; - int write = (data_direction == DMA_TO_DEVICE); sioc = kmem_cache_zalloc(scsi_io_context_cache, gfp); if (!sioc) return DRIVER_ERROR << 24; - req = blk_get_request(sdev->request_queue, write, gfp); + WARN_ON((data_direction == DMA_NONE) && bufflen); + WARN_ON((data_direction != DMA_NONE) && !bufflen); + req = blk_get_request(sdev->request_queue, data_direction, gfp); if (!req) goto free_sense; req->cmd_type = REQ_TYPE_BLOCK_PC; @@ -1118,18 +1120,13 @@ static int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req) cmd->request_bufflen = 0; cmd->request_buffer = NULL; cmd->use_sg = 0; - req->buffer = NULL; + req->data_dir = DMA_NONE; } BUILD_BUG_ON(sizeof(req->cmd) > sizeof(cmd->cmnd)); memcpy(cmd->cmnd, req->cmd, sizeof(cmd->cmnd)); cmd->cmd_len = req->cmd_len; - if (!req->data_len) - cmd->sc_data_direction = DMA_NONE; - else if (rq_data_dir(req) == WRITE) - cmd->sc_data_direction = DMA_TO_DEVICE; - else - 
cmd->sc_data_direction = DMA_FROM_DEVICE; + cmd->sc_data_direction = rq_dma_dir(req); cmd->transfersize = req->data_len; cmd->allowed = req->retries; diff --git a/drivers/scsi/scsi_tgt_lib.c b/drivers/scsi/scsi_tgt_lib.c index d402aff..72cdde1 100644 --- a/drivers/scsi/scsi_tgt_lib.c +++ b/drivers/scsi/scsi_tgt_lib.c @@ -80,7 +80,6 @@ struct scsi_cmnd *scsi_host_get_command(struct Scsi_Host *shost, enum dma_data_direction data_dir, gfp_t gfp_mask) { - int write = (data_dir == DMA_TO_DEVICE); struct request *rq; struct scsi_cmnd *cmd; struct scsi_tgt_cmd *tcmd; @@ -93,7 +92,7 @@ struct scsi_cmnd *scsi_host_get_command(struct Scsi_Host *shost, if (!tcmd) goto put_dev; - rq = blk_get_request(shost->uspace_req_q, write, gfp_mask); + rq = blk_get_request(shost->uspace_req_q, data_dir, gfp_mask); if (!rq) goto free_tcmd; @@ -191,17 +190,8 @@ static void scsi_tgt_cmd_destroy(struct work_struct *work) container_of(work, struct scsi_tgt_cmd, work); struct scsi_cmnd *cmd = tcmd->rq->special; - dprintk("cmd %p %d %lu\n", cmd, cmd->sc_data_direction, - rq_data_dir(cmd->request)); - /* - * We fix rq->cmd_flags here since when we told bio_map_user - * to write vm for WRITE commands, blk_rq_bio_prep set - * rq_data_dir the flags to READ. - */ - if (cmd->sc_data_direction == DMA_TO_DEVICE) - cmd->request->cmd_flags |= REQ_RW; - else - cmd->request->cmd_flags &= ~REQ_RW; + dprintk("cmd %p %d %d\n", cmd, cmd->sc_data_direction, + rq_dma_dir(cmd->request)); scsi_unmap_user_pages(tcmd); scsi_host_put_command(scsi_tgt_cmd_to_host(cmd), cmd); @@ -346,7 +336,7 @@ static void scsi_tgt_cmd_done(struct scsi_cmnd *cmd) { struct scsi_tgt_cmd *tcmd = cmd->request->end_io_data; - dprintk("cmd %p %lu\n", cmd, rq_data_dir(cmd->request)); + dprintk("cmd %p %d\n", cmd, rq_dma_dir(cmd->request)); scsi_tgt_uspace_send_status(cmd, tcmd->tag); queue_work(scsi_tgtd, &tcmd->work); @@ -357,7 +347,7 @@ static int __scsi_tgt_transfer_response(struct scsi_cmnd *cmd) struct Scsi_Host *shost = scsi_tgt_cmd_to_host(cmd); int err; - dprintk("cmd %p %lu\n", cmd, rq_data_dir(cmd->request)); + dprintk("cmd %p %d\n", cmd, rq_dma_dir(cmd->request)); err = shost->hostt->transfer_response(cmd, scsi_tgt_cmd_done); switch (err) { @@ -398,8 +388,8 @@ static int scsi_tgt_init_cmd(struct scsi_cmnd *cmd, gfp_t gfp_mask) cmd->request_bufflen = rq->data_len; - dprintk("cmd %p addr %p cnt %d %lu\n", cmd, tcmd->buffer, cmd->use_sg, - rq_data_dir(rq)); + dprintk("cmd %p addr %p cnt %d %d\n", cmd, tcmd->buffer, cmd->use_sg, + rq_dma_dir(rq)); count = blk_rq_map_sg(rq->q, rq, cmd->request_buffer); if (likely(count <= cmd->use_sg)) { cmd->use_sg = count; @@ -617,8 +607,9 @@ int scsi_tgt_kspace_exec(int host_no, u64 tag, int result, u32 len, } cmd = rq->special; - dprintk("cmd %p result %d len %d bufflen %u %lu %x\n", cmd, - result, len, cmd->request_bufflen, rq_data_dir(rq), cmd->cmnd[0]); + dprintk("cmd %p result %d len %d bufflen %u %d %x\n", cmd, + result, len, cmd->request_bufflen, rq_dma_dir(rq), + cmd->cmnd[0]); if (result == TASK_ABORTED) { scsi_tgt_abort_cmd(shost, cmd); diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 5a8f55f..e9e60d7 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -432,23 +432,27 @@ static int sd_init_command(struct scsi_cmnd * SCpnt) this_count = this_count >> 3; } } - if (rq_data_dir(rq) == WRITE) { + + SCpnt->sc_data_direction = rq_dma_dir(rq); + switch (SCpnt->sc_data_direction) { + case DMA_TO_DEVICE: if (!sdp->writeable) { return 0; } SCpnt->cmnd[0] = WRITE_6; - SCpnt->sc_data_direction = DMA_TO_DEVICE; - } 
else if (rq_data_dir(rq) == READ) { + break; + case DMA_FROM_DEVICE: SCpnt->cmnd[0] = READ_6; - SCpnt->sc_data_direction = DMA_FROM_DEVICE; - } else { - printk(KERN_ERR "sd: Unknown command %x\n", rq->cmd_flags); + break; + default: + printk(KERN_ERR "sd: Unknown command %x data_dir %d\n", + rq->cmd_flags ,rq_dma_dir(rq)); return 0; } SCSI_LOG_HLQUEUE(2, printk("%s : %s %d/%ld 512 byte blocks.\n", - disk->disk_name, (rq_data_dir(rq) == WRITE) ? - "writing" : "reading", this_count, rq->nr_sectors)); + disk->disk_name, dma_dir_to_string(rq->data_dir), + this_count, rq->nr_sectors)); SCpnt->cmnd[1] = 0; diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 81e3bc7..46a1f7e 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -733,8 +733,6 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp, data_dir = DMA_TO_DEVICE; break; case SG_DXFER_UNKNOWN: - data_dir = DMA_BIDIRECTIONAL; - break; default: data_dir = DMA_NONE; break; diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c index 1857d68..40ade44 100644 --- a/drivers/scsi/sr.c +++ b/drivers/scsi/sr.c @@ -334,16 +334,18 @@ static int sr_init_command(struct scsi_cmnd * SCpnt) return 0; } - if (rq_data_dir(SCpnt->request) == WRITE) { + SCpnt->sc_data_direction = rq_dma_dir(SCpnt->request); + switch (SCpnt->sc_data_direction) { + case DMA_TO_DEVICE: if (!cd->device->writeable) return 0; SCpnt->cmnd[0] = WRITE_10; - SCpnt->sc_data_direction = DMA_TO_DEVICE; cd->cdi.media_written = 1; - } else if (rq_data_dir(SCpnt->request) == READ) { + break; + case DMA_FROM_DEVICE: SCpnt->cmnd[0] = READ_10; - SCpnt->sc_data_direction = DMA_FROM_DEVICE; - } else { + break; + default: blk_dump_rq_flags(SCpnt->request, "Unknown sr command"); return 0; } @@ -377,8 +379,7 @@ static int sr_init_command(struct scsi_cmnd * SCpnt) SCSI_LOG_HLQUEUE(2, printk("%s : %s %d/%ld 512 byte blocks.\n", cd->cdi.name, - (rq_data_dir(SCpnt->request) == WRITE) ? 
- "writing" : "reading", + dma_dir_to_string(SCpnt->request->data_dir), this_count, SCpnt->request->nr_sectors)); SCpnt->cmnd[1] = 0; diff --git a/drivers/scsi/sun3_NCR5380.c b/drivers/scsi/sun3_NCR5380.c index 98e3fe1..6d18b33 100644 --- a/drivers/scsi/sun3_NCR5380.c +++ b/drivers/scsi/sun3_NCR5380.c @@ -1206,7 +1206,7 @@ static void NCR5380_dma_complete( struct Scsi_Host *instance ) HOSTNO, NCR5380_read(BUS_AND_STATUS_REG), NCR5380_read(STATUS_REG)); - if((sun3scsi_dma_finish(rq_data_dir(hostdata->connected->request)))) { + if((sun3scsi_dma_finish(rq_uni_rw_dir(hostdata->connected->request)))) { printk("scsi%d: overrun in UDC counter -- not prepared to deal with this!\n", HOSTNO); printk("please e-mail sammy@sammy.net with a description of how this\n"); printk("error was produced.\n"); @@ -2024,7 +2024,7 @@ static void NCR5380_information_transfer (struct Scsi_Host *instance) { if(blk_fs_request(cmd->request)) { sun3scsi_dma_setup(d, count, - rq_data_dir(cmd->request)); + rq_uni_rw_dir(cmd->request)); sun3_dma_setup_done = cmd; } } @@ -2636,7 +2636,7 @@ static void NCR5380_reselect (struct Scsi_Host *instance) /* setup this command for dma if not already */ if((count > SUN3_DMA_MINSIZE) && (sun3_dma_setup_done != tmp)) { - sun3scsi_dma_setup(d, count, rq_data_dir(tmp->request)); + sun3scsi_dma_setup(d, count, rq_uni_rw_dir(tmp->request)); sun3_dma_setup_done = tmp; } #endif diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 83dcd8c..0969008 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -14,6 +14,7 @@ #include <linux/bio.h> #include <linux/module.h> #include <linux/stringify.h> +#include <linux/dma-mapping.h> #include <asm/scatterlist.h> @@ -177,10 +178,9 @@ enum { }; /* - * request type modified bits. first three bits match BIO_RW* bits, important + * request type modified bits. */ enum rq_flag_bits { - __REQ_RW, /* not set, read. set, write */ __REQ_FAILFAST, /* no low level driver retries */ __REQ_SORTED, /* elevator knows about this request */ __REQ_SOFTBARRIER, /* may not be passed by ioscheduler */ @@ -199,9 +199,10 @@ enum rq_flag_bits { __REQ_ALLOCED, /* request came from our alloc pool */ __REQ_RW_META, /* metadata io request */ __REQ_NR_BITS, /* stops here */ + __REQ_BIDI, /* FIXME: Will be removed. It is only for some quirks checking */ }; -#define REQ_RW (1 << __REQ_RW) +#define REQ_BIDI (1 << __REQ_BIDI) #define REQ_FAILFAST (1 << __REQ_FAILFAST) #define REQ_SORTED (1 << __REQ_SORTED) #define REQ_SOFTBARRIER (1 << __REQ_SOFTBARRIER) @@ -232,6 +233,7 @@ struct request { request_queue_t *q; unsigned int cmd_flags; + enum dma_data_direction data_dir; enum rq_cmd_type_bits cmd_type; /* Maintain bio traversal state for part by part I/O submission. @@ -545,12 +547,46 @@ enum { #define list_entry_rq(ptr) list_entry((ptr), struct request, queuelist) -#define rq_data_dir(rq) ((rq)->cmd_flags & 1) +#define rq_dma_dir(rq) ((rq)->data_dir) +#define rq_uni_dir(rq) dma_uni_dir((rq)->data_dir) + +static inline int rq_bidi_dir(struct request* rq) +{ + /* + * FIXME: the (req->cmd_flags & REQ_BIDI) will be removed once all + * the warnings go away. + */ + return (rq_dma_dir(rq) == DMA_BIDIRECTIONAL) && + (rq->cmd_flags & REQ_BIDI); +} + +static inline int rq_rw_dir(struct request* rq) +{ + return dma_write_dir(rq->data_dir) ? WRITE : READ; +} + +static inline int rq_uni_rw_dir(struct request* rq) +{ + WARN_ON(!dma_uni_dir(rq->data_dir)); + return (rq->data_dir == DMA_TO_DEVICE) ? 
WRITE : READ; +} + +/* + * DMA_BIDIRECTIONAL==0, and some drivers just bzero the request, so this will + * catch these cases. One must use blk_rq_init_unqueued_req() for cases that + * a request did not come from a request_queue. + */ +#define WARN_ON_BIDI_FLAG(req) \ + WARN_ON( \ + (rq_dma_dir(req) == DMA_BIDIRECTIONAL) != \ + ((req->cmd_flags & REQ_BIDI) != 0) \ + ) /* * We regard a request as sync, if it's a READ or a SYNC write. */ -#define rq_is_sync(rq) (rq_data_dir((rq)) == READ || (rq)->cmd_flags & REQ_RW_SYNC) +#define rq_is_sync(rq) (rq_dma_dir(rq) == DMA_FROM_DEVICE || \ + (rq)->cmd_flags & REQ_RW_SYNC) #define rq_is_meta(rq) ((rq)->cmd_flags & REQ_RW_META) static inline int blk_queue_full(struct request_queue *q, int rw) @@ -584,7 +620,8 @@ static inline void blk_clear_queue_full(struct request_queue *q, int rw) #define RQ_NOMERGE_FLAGS \ (REQ_NOMERGE | REQ_STARTED | REQ_HARDBARRIER | REQ_SOFTBARRIER) #define rq_mergeable(rq) \ - (!((rq)->cmd_flags & RQ_NOMERGE_FLAGS) && blk_fs_request((rq))) + (!((rq)->cmd_flags & RQ_NOMERGE_FLAGS) && blk_fs_request((rq)) && \ + ((rq)->data_dir != DMA_BIDIRECTIONAL)) /* * q->prep_rq_fn return values @@ -630,7 +667,9 @@ extern void generic_make_request(struct bio *bio); extern void blk_put_request(struct request *); extern void __blk_put_request(request_queue_t *, struct request *); extern void blk_end_sync_rq(struct request *rq, int error); -extern struct request *blk_get_request(request_queue_t *, int, gfp_t); +extern struct request *blk_get_request(request_queue_t *, + enum dma_data_direction, gfp_t); +extern void blk_rq_init_unqueued_req(struct request *); extern void blk_insert_request(request_queue_t *, struct request *, int, void *); extern void blk_requeue_request(request_queue_t *, struct request *); extern void blk_plug_device(request_queue_t *); diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h index 3680ff9..d9665b1 100644 --- a/include/linux/blktrace_api.h +++ b/include/linux/blktrace_api.h @@ -161,7 +161,13 @@ static inline void blk_add_trace_rq(struct request_queue *q, struct request *rq, u32 what) { struct blk_trace *bt = q->blk_trace; - int rw = rq->cmd_flags & 0x03; + /* blktrace.c prints them according to bio flags */ + int rw = (((rq_rw_dir(rq) == WRITE) << BIO_RW) | + (((rq->cmd_flags & (REQ_SOFTBARRIER|REQ_HARDBARRIER)) != 0) << + BIO_RW_BARRIER) | + (((rq->cmd_flags & REQ_FAILFAST) != 0) << BIO_RW_FAILFAST) | + (((rq->cmd_flags & REQ_RW_SYNC) != 0) << BIO_RW_SYNC) | + (((rq->cmd_flags & REQ_RW_META) != 0) << BIO_RW_META)); if (likely(!bt)) return; diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 9a663c6..0d22fd3 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -13,6 +13,28 @@ enum dma_data_direction { DMA_NONE = 3, }; +static inline int dma_write_dir(enum dma_data_direction dir) +{ + return (dir == DMA_TO_DEVICE) || (dir == DMA_BIDIRECTIONAL); +} + +static inline int dma_uni_dir(enum dma_data_direction dir) +{ + return (dir == DMA_TO_DEVICE) || (dir == DMA_FROM_DEVICE) || + (dir == DMA_NONE); +} + +static inline char* dma_dir_to_string(enum dma_data_direction dir) +{ + switch(dir){ + case DMA_BIDIRECTIONAL: return "bidirectional"; + case DMA_TO_DEVICE: return "writing"; + case DMA_FROM_DEVICE: return "reading"; + case DMA_NONE: return "no-data"; + default: return "invalid"; + } +} + #define DMA_64BIT_MASK 0xffffffffffffffffULL #define DMA_48BIT_MASK 0x0000ffffffffffffULL #define DMA_40BIT_MASK 0x000000ffffffffffULL diff --git 
a/include/linux/elevator.h b/include/linux/elevator.h index e88fcbc..c947f71 100644 --- a/include/linux/elevator.h +++ b/include/linux/elevator.h @@ -20,7 +20,7 @@ typedef void (elevator_add_req_fn) (request_queue_t *, struct request *); typedef int (elevator_queue_empty_fn) (request_queue_t *); typedef struct request *(elevator_request_list_fn) (request_queue_t *, struct request *); typedef void (elevator_completed_req_fn) (request_queue_t *, struct request *); -typedef int (elevator_may_queue_fn) (request_queue_t *, int); +typedef int (elevator_may_queue_fn) (request_queue_t *, int, int); typedef int (elevator_set_req_fn) (request_queue_t *, struct request *, gfp_t); typedef void (elevator_put_req_fn) (struct request *); @@ -111,7 +111,7 @@ extern struct request *elv_former_request(request_queue_t *, struct request *); extern struct request *elv_latter_request(request_queue_t *, struct request *); extern int elv_register_queue(request_queue_t *q); extern void elv_unregister_queue(request_queue_t *q); -extern int elv_may_queue(request_queue_t *, int); +extern int elv_may_queue(request_queue_t *, int, int); extern void elv_completed_request(request_queue_t *, struct request *); extern int elv_set_request(request_queue_t *, struct request *, gfp_t); extern void elv_put_request(request_queue_t *, struct request *); -- 1.5.0.4.402.g8035 ^ permalink raw reply related [flat|nested] 22+ messages in thread
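To make the converted call sites above easier to follow, here is a minimal sketch, not part of the original posting, of a request function written against the patch 1/4 helpers. Every example_* name is hypothetical; rq_dma_dir(), rq_uni_rw_dir() and dma_dir_to_string() are the accessors the patch introduces.

/*
 * Hypothetical driver fragment; only the direction handling is the point.
 */
#include <linux/blkdev.h>
#include <linux/dma-mapping.h>

static void example_handle_read(struct request *req);	/* hypothetical */
static void example_handle_write(struct request *req);	/* hypothetical */

static void example_do_request(request_queue_t *q)
{
	struct request *req;

	while ((req = elv_next_request(q)) != NULL) {
		if (!blk_fs_request(req)) {
			end_request(req, 0);
			continue;
		}

		/* old code compared rq_data_dir(req) against READ/WRITE */
		switch (rq_dma_dir(req)) {
		case DMA_FROM_DEVICE:		/* was READ */
			example_handle_read(req);
			break;
		case DMA_TO_DEVICE:		/* was WRITE */
			example_handle_write(req);
			break;
		default:
			printk(KERN_NOTICE "example: rejecting %s request\n",
			       dma_dir_to_string(rq_dma_dir(req)));
			end_request(req, 0);
			continue;
		}
		end_request(req, 1);
	}
}

A driver that only ever sees uni-directional requests can instead keep a single test such as rq_uni_rw_dir(req) == WRITE, which also triggers the WARN_ON() inside rq_uni_rw_dir() should a bidi request ever stray down that path.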
* [PATCH 2/4] bidi support: fix req->cmd == INT cases 2007-04-15 17:17 [PATCH 0/4] bidi support: block layer bidirectional io Boaz Harrosh 2007-04-15 17:25 ` [PATCH 1/4] bidi support: request dma_data_direction Boaz Harrosh @ 2007-04-15 17:31 ` Boaz Harrosh 2007-04-15 17:32 ` [PATCH 3/4] bidi support: request_io_part Boaz Harrosh ` (2 subsequent siblings) 4 siblings, 0 replies; 22+ messages in thread From: Boaz Harrosh @ 2007-04-15 17:31 UTC (permalink / raw) To: Boaz Harrosh, Jens Axboe, James Bottomley, Andrew Morton, Mike Christie, Christoph Hellwig Cc: linux-scsi, Linux-ide, Benny Halevy, osd-dev - we have unearthed very old bugs in stale drivers that still used request->cmd as a READ|WRITE int - these drivers should probably go away... Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> --- drivers/acorn/block/fd1772.c | 2 +- drivers/acorn/block/mfmhd.c | 8 ++++---- drivers/block/amiflop.c | 2 +- drivers/block/nbd.c | 2 +- drivers/cdrom/aztcd.c | 2 +- drivers/cdrom/cm206.c | 2 +- drivers/cdrom/gscd.c | 2 +- drivers/cdrom/mcdx.c | 2 +- drivers/cdrom/optcd.c | 2 +- drivers/cdrom/sjcd.c | 2 +- drivers/ide/legacy/hd.c | 4 ++-- 11 files changed, 15 insertions(+), 15 deletions(-) diff --git a/drivers/acorn/block/fd1772.c b/drivers/acorn/block/fd1772.c index 674bf81..1717679 100644 --- a/drivers/acorn/block/fd1772.c +++ b/drivers/acorn/block/fd1772.c @@ -1246,7 +1246,7 @@ repeat: del_timer(&motor_off_timer); ReqCnt = 0; - ReqCmd = CURRENT->cmd; + ReqCmd = rq_uni_rw_dir(CURRENT); ReqBlock = CURRENT->sector; ReqBuffer = CURRENT->buffer; setup_req_params(drive); diff --git a/drivers/acorn/block/mfmhd.c b/drivers/acorn/block/mfmhd.c index 689a4c3..50001eb 100644 --- a/drivers/acorn/block/mfmhd.c +++ b/drivers/acorn/block/mfmhd.c @@ -439,7 +439,7 @@ static void mfm_rw_intr(void) a choice of command end or some data which is ready to be collected */ /* I think we have to transfer data while the interrupt line is on and its not any other type of interrupt */ - if (CURRENT->cmd == WRITE) { + if (rq_uni_rw_dir(CURRENT) == WRITE) { extern void hdc63463_writedma(void); if ((hdc63463_dataleft <= 0) && (!(mfm_status & STAT_CED))) { printk("mfm_rw_intr: Apparent DMA write request when no more to DMA\n"); @@ -799,7 +799,7 @@ static void issue_request(unsigned int block, unsigned int nsect, raw_cmd.head = start_head; raw_cmd.cylinder = track / p->heads; raw_cmd.cmdtype = CURRENT->cmd; - raw_cmd.cmdcode = CURRENT->cmd == WRITE ? CMD_WD : CMD_RD; + raw_cmd.cmdcode = rq_uni_rw_dir(CURRENT) == WRITE ? CMD_WD : CMD_RD; raw_cmd.cmddata[0] = dev + 1; /* DAG: +1 to get US */ raw_cmd.cmddata[1] = raw_cmd.head; raw_cmd.cmddata[2] = raw_cmd.cylinder >> 8; @@ -830,7 +830,7 @@ static void issue_request(unsigned int block, unsigned int nsect, hdc63463_dataleft = nsect * 256; /* Better way? */ DBG("mfm%c: %sing: CHS=%d/%d/%d, sectors=%d, buffer=0x%08lx (%p)\n", - raw_cmd.dev + 'a', (CURRENT->cmd == READ) ? 
"read" : "writ", + raw_cmd.dev + 'a', dma_dir_to_string(rq_dma_dir(CURRENT)), raw_cmd.cylinder, raw_cmd.head, raw_cmd.sector, nsect, (unsigned long) Copy_buffer, CURRENT); @@ -917,7 +917,7 @@ static void mfm_request(void) DBG("mfm_request: block after offset=%d\n", block); - if (CURRENT->cmd != READ && CURRENT->cmd != WRITE) { + if (!dma_uni_dir(rq_dma_dir(CURRENT))) { printk("unknown mfm-command %d\n", CURRENT->cmd); end_request(CURRENT, 0); Busy = 0; diff --git a/drivers/block/amiflop.c b/drivers/block/amiflop.c index 54f2fb3..fa0da1f 100644 --- a/drivers/block/amiflop.c +++ b/drivers/block/amiflop.c @@ -1363,7 +1363,7 @@ static void redo_fd_request(void) #ifdef DEBUG printk("fd: sector %ld + %d requested for %s\n", CURRENT->sector,cnt, - (CURRENT->cmd==READ)?"read":"write"); + (rq_uni_rw_dir(CURRENT) == READ) ? "read" : "write"); #endif block = CURRENT->sector + cnt; if ((int)block > floppy->blocks) { diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 411e138..fc5e1b2 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -411,7 +411,7 @@ static void nbd_clear_que(struct nbd_device *lo) /* * We always wait for result of write, for now. It would be nice to make it optional * in future - * if ((req->cmd == WRITE) && (lo->flags & NBD_WRITE_NOCHK)) + * if ((rq_uni_rw_dir(req) == WRITE) && (lo->flags & NBD_WRITE_NOCHK)) * { printk( "Warning: Ignoring result!\n"); nbd_end_request( req ); } */ diff --git a/drivers/cdrom/aztcd.c b/drivers/cdrom/aztcd.c index 1f9fb7a..8ccae77 100644 --- a/drivers/cdrom/aztcd.c +++ b/drivers/cdrom/aztcd.c @@ -229,7 +229,7 @@ static struct request_queue *azt_queue; static int current_valid(void) { return CURRENT && - CURRENT->cmd == READ && + rq_rw_dir(CURRENT) == READ && CURRENT->sector != -1; } diff --git a/drivers/cdrom/cm206.c b/drivers/cdrom/cm206.c index 2301311..1bdf0b7 100644 --- a/drivers/cdrom/cm206.c +++ b/drivers/cdrom/cm206.c @@ -851,7 +851,7 @@ static void do_cm206_request(request_queue_t * q) if (!req) return; - if (req->cmd != READ) { + if (rq_rw_dir(req) != READ) { debug(("Non-read command %d on cdrom\n", req->cmd)); end_request(req, 0); continue; diff --git a/drivers/cdrom/gscd.c b/drivers/cdrom/gscd.c index 8411f8c..d08ebbe 100644 --- a/drivers/cdrom/gscd.c +++ b/drivers/cdrom/gscd.c @@ -264,7 +264,7 @@ repeat: if (req->sector == -1) goto out; - if (req->cmd != READ) { + if (rq_rw_dir(req) != READ) { printk("GSCD: bad cmd %u\n", rq_dma_dir(req)); end_request(req, 0); goto repeat; diff --git a/drivers/cdrom/mcdx.c b/drivers/cdrom/mcdx.c index f574962..52ecbf9 100644 --- a/drivers/cdrom/mcdx.c +++ b/drivers/cdrom/mcdx.c @@ -596,7 +596,7 @@ static void do_mcdx_request(request_queue_t * q) xtrace(REQUEST, "do_request() (%lu + %lu)\n", req->sector, req->nr_sectors); - if (req->cmd != READ) { + if (rq_rw_dir(req) != READ) { xwarn("do_request(): non-read command to cd!!\n"); xtrace(REQUEST, "end_request(0): write\n"); end_request(req, 0); diff --git a/drivers/cdrom/optcd.c b/drivers/cdrom/optcd.c index 3541690..73cc108 100644 --- a/drivers/cdrom/optcd.c +++ b/drivers/cdrom/optcd.c @@ -977,7 +977,7 @@ static int update_toc(void) static int current_valid(void) { return CURRENT && - CURRENT->cmd == READ && + rq_rw_dir(CURRENT) == READ && CURRENT->sector != -1; } diff --git a/drivers/cdrom/sjcd.c b/drivers/cdrom/sjcd.c index 5409fca..b1d3aa0 100644 --- a/drivers/cdrom/sjcd.c +++ b/drivers/cdrom/sjcd.c @@ -1064,7 +1064,7 @@ static void sjcd_invalidate_buffers(void) static int current_valid(void) { return CURRENT && - CURRENT->cmd == READ && 
+ rq_rw_dir(CURRENT) == READ && CURRENT->sector != -1; } diff --git a/drivers/ide/legacy/hd.c b/drivers/ide/legacy/hd.c index ca8dd47..fac56a2 100644 --- a/drivers/ide/legacy/hd.c +++ b/drivers/ide/legacy/hd.c @@ -622,8 +622,8 @@ repeat: head = track % disk->head; cyl = track / disk->head; #ifdef DEBUG - printk("%s: %sing: CHS=%d/%d/%d, sectors=%d, buffer=%p\n", - req->rq_disk->disk_name, (req->cmd == READ)?"read":"writ", + printk("%s: %s: CHS=%d/%d/%d, sectors=%d, buffer=%p\n", + req->rq_disk->disk_name, dma_dir_to_string(rq_dma_dir(req)), cyl, head, sec, nsect, req->buffer); #endif if (blk_fs_request(req)) { -- 1.5.0.4.402.g8035 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 3/4] bidi support: request_io_part 2007-04-15 17:17 [PATCH 0/4] bidi support: block layer bidirectional io Boaz Harrosh 2007-04-15 17:25 ` [PATCH 1/4] bidi support: request dma_data_direction Boaz Harrosh 2007-04-15 17:31 ` [PATCH 2/4] bidi support: fix req->cmd == INT cases Boaz Harrosh @ 2007-04-15 17:32 ` Boaz Harrosh 2007-04-29 15:49 ` Boaz Harrosh 2007-04-15 17:33 ` [PATCH 4/4] bidi support: bidirectional request Boaz Harrosh 2007-04-16 18:03 ` [PATCH 0/4] bidi support: block layer bidirectional io Douglas Gilbert 4 siblings, 1 reply; 22+ messages in thread From: Boaz Harrosh @ 2007-04-15 17:32 UTC (permalink / raw) To: Boaz Harrosh, Jens Axboe, James Bottomley, Andrew Morton, Mike Christie, Christoph Hellwig Cc: linux-scsi, Linux-ide, Benny Halevy, osd-dev [-- Attachment #1: Type: text/plain, Size: 378 bytes --] - Extract all I/O members of struct request into a request_io_part member. - Define API to access the I/O part - Adjust block layer accordingly. - Change all users to new API. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> --------------------------------------------------- Patch is attached compressed because of size [-- Attachment #2: 0003-bidi-support-request_io_part.patch.bz2 --] [-- Type: application/x-bzip2, Size: 33742 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] bidi support: request_io_part 2007-04-15 17:32 ` [PATCH 3/4] bidi support: request_io_part Boaz Harrosh @ 2007-04-29 15:49 ` Boaz Harrosh 0 siblings, 0 replies; 22+ messages in thread From: Boaz Harrosh @ 2007-04-29 15:49 UTC (permalink / raw) To: Jens Axboe, James Bottomley, Andrew Morton Cc: Christoph Hellwig, Benny Halevy, linux-scsi, Linux-ide Boaz Harrosh wrote: > - Extract all I/O members of struct request into a request_io_part member. > - Define API to access the I/O part > - Adjust block layer accordingly. > - Change all users to new API. > > Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> > Signed-off-by: Benny Halevy <bhalevy@panasas.com> > > --------------------------------------------------- > > Patch is attached compressed because of size It looks like this patch is very big and is hard for review/maintenance. I was thinking of a way it could be divided into small patches but still make it compile and run at each stage/patch for bisects. It could be done in three stages: 1. Make a dummy API that mimics the new API but still lets old drivers/code compile. 2. Stage 2 - convert driver by driver or group by group to new API. this can be done in an arbitrary number of patches. 3. Final stage. do the actual move of members and implement the new API. At this stage, if any drivers are not converted, (out-of-tree drivers), they will not compile. Please tell me if you need this done? should I send the all patchset or just this one divided? (Below is a demonstration of 1st and 3rd stages at blkdev.h) <FIRST_STAGE> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index c1121d2..579ee2d 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -235,23 +235,6 @@ struct request { unsigned int cmd_flags; enum rq_cmd_type_bits cmd_type; - /* Maintain bio traversal state for part by part I/O submission. - * hard_* are block layer internals, no driver should touch them! - */ - - sector_t sector; /* next sector to submit */ - sector_t hard_sector; /* next sector to complete */ - unsigned long nr_sectors; /* no. of sectors left to submit */ - unsigned long hard_nr_sectors; /* no. of sectors left to complete */ - /* no. of sectors left to submit in the current segment */ - unsigned int current_nr_sectors; - - /* no. of sectors left to complete in the current segment */ - unsigned int hard_cur_sectors; - - struct bio *bio; - struct bio *biotail; - struct hlist_node hash; /* merge hash */ /* * The rb_node is only used inside the io scheduler, requests @@ -273,22 +256,11 @@ struct request { struct gendisk *rq_disk; unsigned long start_time; - /* Number of scatter-gather DMA addr+len pairs after - * physical address coalescing is performed. - */ - unsigned short nr_phys_segments; - - /* Number of scatter-gather addr+len pairs after - * physical and DMA remapping hardware coalescing is performed. - * This is the number of scatter-gather entries the driver - * will actually have to deal with after DMA mapping is done. - */ - unsigned short nr_hw_segments; - unsigned short ioprio; void *special; - char *buffer; + char *buffer; /* FIXME: should be Deprecated */ + void *data; /* FIXME: should be Deprecated */ int tag; int errors; @@ -301,9 +273,7 @@ struct request { unsigned int cmd_len; unsigned char cmd[BLK_MAX_CDB]; - unsigned int data_len; unsigned int sense_len; - void *data; void *sense; unsigned int timeout; @@ -314,8 +284,49 @@ struct request { */ rq_end_io_fn *end_io; void *end_io_data; + +/* + request io members. 
FIXME: will go later in the patchset into a sub-structure +*/ +/* struct request_io_part uni; */ +/* struct request_io_part { */ + unsigned int data_len; + + /* Maintain bio traversal state for part by part I/O submission. + * hard_* are block layer internals, no driver should touch them! + */ + sector_t sector; /* next sector to submit */ + sector_t hard_sector; /* next sector to complete */ + unsigned long nr_sectors; /* no. of sectors left to submit */ + unsigned long hard_nr_sectors; /* no. of sectors left to complete */ + /* no. of sectors left to submit in the current segment */ + unsigned int current_nr_sectors; + + /* no. of sectors left to complete in the current segment */ + unsigned int hard_cur_sectors; + + struct bio *bio; + struct bio *biotail; + + /* Number of scatter-gather DMA addr+len pairs after + * physical address coalescing is performed. + */ + unsigned short nr_phys_segments; + + /* Number of scatter-gather addr+len pairs after + * physical and DMA remapping hardware coalescing is performed. + * This is the number of scatter-gather entries the driver + * will actually have to deal with after DMA mapping is done. + */ + unsigned short nr_hw_segments; +/* }; */ }; +/* FIXME: here only for the duration of the patchset. + will be removed in last patch +*/ +#define request_io_part request + /* * State information carried for REQ_TYPE_PM_SUSPEND and REQ_TYPE_PM_RESUME * requests. Some step values could eventually be made generic. @@ -589,6 +600,15 @@ static inline const char* rq_dir_to_string(struct request* rq) } /* + * access for the apropreate bio and io members + */ +static inline struct request_io_part* rq_uni(struct request* req) +{ + /* FIXME: will be changed to real implementation in last patch */ + return req; +} + +/* * We regard a request as sync, if it's a READ or a SYNC write. */ #define rq_is_sync(rq) (rq_rw_dir((rq)) == READ || (rq)->cmd_flags & REQ_RW_SYNC) </FIRST_STAGE> <SECOND_STAGE> - Convert all users file by file or group by group. For example: diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c index b36f44d..eea101f 100644 --- a/drivers/cdrom/cdrom.c +++ b/drivers/cdrom/cdrom.c @@ -2137,7 +2137,7 @@ static int cdrom_read_cdda_bpc(struct cdrom_device_info *cdi, __u8 __user *ubuf, rq->cmd_len = 12; rq->cmd_type = REQ_TYPE_BLOCK_PC; rq->timeout = 60 * HZ; - bio = rq->bio; + bio = rq_uni(rq)->bio; if (blk_execute_rq(q, cdi->disk, rq, 0)) { struct request_sense *s = rq->sense; </SECOND_STAGE> <THIRD STAGE> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 579ee2d..bcd2a2a 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -137,6 +137,41 @@ struct request_list { }; /* + * request io members. one for uni read/write and one for bidi_read + */ +struct request_io_part { + unsigned int data_len; + + /* Maintain bio traversal state for part by part I/O submission. + * hard_* are block layer internals, no driver should touch them! + */ + sector_t sector; /* next sector to submit */ + sector_t hard_sector; /* next sector to complete */ + unsigned long nr_sectors; /* no. of sectors left to submit */ + unsigned long hard_nr_sectors; /* no. of sectors left to complete */ + /* no. of sectors left to submit in the current segment */ + unsigned int current_nr_sectors; + + /* no. of sectors left to complete in the current segment */ + unsigned int hard_cur_sectors; + + struct bio *bio; + struct bio *biotail; + + /* Number of scatter-gather DMA addr+len pairs after + * physical address coalescing is performed. 
+ */ + unsigned short nr_phys_segments; + + /* Number of scatter-gather addr+len pairs after + * physical and DMA remapping hardware coalescing is performed. + * This is the number of scatter-gather entries the driver + * will actually have to deal with after DMA mapping is done. + */ + unsigned short nr_hw_segments; +}; + +/* * request command types */ enum rq_cmd_type_bits { @@ -285,48 +320,9 @@ struct request { rq_end_io_fn *end_io; void *end_io_data; -/* - request io members. FIXME: will go later in the patchset into a sub-structure -*/ -/* struct request_io_part uni; */ -/* struct request_io_part { */ - unsigned int data_len; - - /* Maintain bio traversal state for part by part I/O submission. - * hard_* are block layer internals, no driver should touch them! - */ - sector_t sector; /* next sector to submit */ - sector_t hard_sector; /* next sector to complete */ - unsigned long nr_sectors; /* no. of sectors left to submit */ - unsigned long hard_nr_sectors; /* no. of sectors left to complete */ - /* no. of sectors left to submit in the current segment */ - unsigned int current_nr_sectors; - - /* no. of sectors left to complete in the current segment */ - unsigned int hard_cur_sectors; - - struct bio *bio; - struct bio *biotail; - - /* Number of scatter-gather DMA addr+len pairs after - * physical address coalescing is performed. - */ - unsigned short nr_phys_segments; - - /* Number of scatter-gather addr+len pairs after - * physical and DMA remapping hardware coalescing is performed. - * This is the number of scatter-gather entries the driver - * will actually have to deal with after DMA mapping is done. - */ - unsigned short nr_hw_segments; -/* }; */ + struct request_io_part uni; }; -/* FIXME: here only for the duration of the patchset. - will be removed in last patch -*/ -#define request_io_part request - /* * State information carried for REQ_TYPE_PM_SUSPEND and REQ_TYPE_PM_RESUME * requests. Some step values could eventually be made generic. @@ -584,17 +580,17 @@ static inline int rq_data_dir(struct request* rq) static inline enum dma_data_direction rq_dma_dir(struct request* rq) { WARN_ON(rq_is_bidi(rq)); - if (!rq->bio) + if (!rq->uni.bio) return DMA_NONE; else - return bio_data_dir(rq->bio) ? DMA_TO_DEVICE : DMA_FROM_DEVICE; + return bio_data_dir(rq->uni.bio) ? DMA_TO_DEVICE : DMA_FROM_DEVICE; } static inline const char* rq_dir_to_string(struct request* rq) { - if (!rq->bio) + if (!rq->uni.bio) return "no data command"; else - return bio_data_dir(rq->bio) ? + return bio_data_dir(rq->uni.bio) ? "writing" : "reading"; } @@ -604,8 +600,8 @@ static inline const char* rq_dir_to_string(struct request* rq) */ static inline struct request_io_part* rq_uni(struct request* req) { - /* FIXME: will be changed to real implementation in last patch */ - return req; + WARN_ON( rq_is_bidi(req) ); + return &req->uni; } /* @@ -681,8 +677,8 @@ static inline void blk_queue_bounce(request_queue_t *q, struct bio **bio) #endif /* CONFIG_MMU */ #define rq_for_each_bio(_bio, rq) \ - if ((rq->bio)) \ - for (_bio = (rq)->bio; _bio; _bio = _bio->bi_next) + if ((rq_uni(rq)->bio)) \ + for (_bio = rq_uni(rq)->bio; _bio; _bio = _bio->bi_next) extern int blk_register_queue(struct gendisk *disk); extern void blk_unregister_queue(struct gendisk *disk); </THIRD_STAGE> ^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 4/4] bidi support: bidirectional request 2007-04-15 17:17 [PATCH 0/4] bidi support: block layer bidirectional io Boaz Harrosh ` (2 preceding siblings ...) 2007-04-15 17:32 ` [PATCH 3/4] bidi support: request_io_part Boaz Harrosh @ 2007-04-15 17:33 ` Boaz Harrosh 2007-04-28 19:48 ` FUJITA Tomonori 2007-04-16 18:03 ` [PATCH 0/4] bidi support: block layer bidirectional io Douglas Gilbert 4 siblings, 1 reply; 22+ messages in thread From: Boaz Harrosh @ 2007-04-15 17:33 UTC (permalink / raw) To: Boaz Harrosh, Jens Axboe, James Bottomley, Andrew Morton, Mike Christie, Christoph Hellwig Cc: linux-scsi, Linux-ide, Benny Halevy, osd-dev - Instantiate another request_io_part in struct request for bidi_read. - Define & Implement new API for accessing bidi parts. - API to Build bidi requests and map to sglists. - Define new end_that_request_block() function to end a complete request. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> --- block/elevator.c | 7 +-- block/ll_rw_blk.c | 120 ++++++++++++++++++++++++++++++++++++++++------- drivers/scsi/scsi_lib.c | 2 +- include/linux/blkdev.h | 56 +++++++++++++++++++++- 4 files changed, 160 insertions(+), 25 deletions(-) diff --git a/block/elevator.c b/block/elevator.c index 237f15c..e39ef57 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -757,14 +757,9 @@ struct request *elv_next_request(request_queue_t *q) rq = NULL; break; } else if (ret == BLKPREP_KILL) { - int nr_bytes = rq_uni(rq)->hard_nr_sectors << 9; - - if (!nr_bytes) - nr_bytes = rq_uni(rq)->data_len; - blkdev_dequeue_request(rq); rq->cmd_flags |= REQ_QUIET; - end_that_request_chunk(rq, 0, nr_bytes); + end_that_request_block(rq, 0); end_that_request_last(rq, 0); } else { printk(KERN_ERR "%s: bad return=%d\n", __FUNCTION__, diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c index c8ed8a9..21fdbc2 100644 --- a/block/ll_rw_blk.c +++ b/block/ll_rw_blk.c @@ -261,6 +261,7 @@ static void rq_init(request_queue_t *q, struct request *rq) rq->end_io_data = NULL; rq->completion_data = NULL; rq_init_io_part(&rq->uni); + rq_init_io_part(&rq->bidi_read); } /** @@ -1312,14 +1313,16 @@ static int blk_hw_contig_segment(request_queue_t *q, struct bio *bio, } /* - * map a request to scatterlist, return number of sg entries setup. Caller - * must make sure sg can hold rq->nr_phys_segments entries + * map a request_io_part to scatterlist, return number of sg entries setup. + * Caller must make sure sg can hold rq_io(rq, dir)->nr_phys_segments entries */ -int blk_rq_map_sg(request_queue_t *q, struct request *rq, struct scatterlist *sg) +int blk_rq_map_sg_bidi(request_queue_t *q, struct request *rq, + struct scatterlist *sg, enum dma_data_direction dir) { struct bio_vec *bvec, *bvprv; struct bio *bio; int nsegs, i, cluster; + struct request_io_part* req_io = rq_io(rq, dir); nsegs = 0; cluster = q->queue_flags & (1 << QUEUE_FLAG_CLUSTER); @@ -1328,7 +1331,7 @@ int blk_rq_map_sg(request_queue_t *q, struct request *rq, struct scatterlist *sg * for each bio in rq */ bvprv = NULL; - rq_for_each_bio(bio, rq) { + for (bio = req_io->bio; bio; bio = bio->bi_next) { /* * for each segment in bio */ @@ -1360,7 +1363,17 @@ new_segment: return nsegs; } +EXPORT_SYMBOL(blk_rq_map_sg_bidi); +/* + * map a request to scatterlist, return number of sg entries setup. 
Caller + * must make sure sg can hold rq->nr_phys_segments entries + */ +int blk_rq_map_sg(request_queue_t *q, struct request *rq, + struct scatterlist *sg) +{ + return blk_rq_map_sg_bidi(q, rq, sg, rq->data_dir); +} EXPORT_SYMBOL(blk_rq_map_sg); /* @@ -1415,11 +1428,12 @@ static inline int ll_new_hw_segment(request_queue_t *q, return 1; } -int ll_back_merge_fn(request_queue_t *q, struct request *req, struct bio *bio) +int ll_back_merge_fn(request_queue_t *q, struct request *req, struct bio *bio, + enum dma_data_direction dir) { unsigned short max_sectors; int len; - struct request_io_part* req_io = rq_uni(req); + struct request_io_part* req_io = rq_io(req, dir); if (unlikely(blk_pc_request(req))) max_sectors = q->max_hw_sectors; @@ -2404,7 +2418,7 @@ static int __blk_rq_map_user(request_queue_t *q, struct request *rq, req_io = rq_uni(rq); if (!req_io->bio) blk_rq_bio_prep(q, rq, bio); - else if (!ll_back_merge_fn(q, rq, bio)) { + else if (!ll_back_merge_fn(q, rq, bio, rq->data_dir)) { ret = -EINVAL; goto unmap_bio; } else { @@ -2574,15 +2588,18 @@ int blk_rq_unmap_user(struct bio *bio) EXPORT_SYMBOL(blk_rq_unmap_user); /** - * blk_rq_map_kern - map kernel data to a request, for REQ_BLOCK_PC usage + * blk_rq_map_kern_bidi - maps kernel data to a request_io_part, for BIDI usage * @q: request queue where request should be inserted * @rq: request to fill * @kbuf: the kernel buffer * @len: length of user data * @gfp_mask: memory allocation flags + * @dir: if it is a BIDIRECTIONAL request than DMA_TO_DEVICE to prepare + * the bidi_write side or DMA_FROM_DEVICE to prepare the bidi_read + * side, else it should be same as req->data_dir */ -int blk_rq_map_kern(request_queue_t *q, struct request *rq, void *kbuf, - unsigned int len, gfp_t gfp_mask) +int blk_rq_map_kern_bidi(request_queue_t *q, struct request *rq, void *kbuf, + unsigned int len, gfp_t gfp_mask, enum dma_data_direction dir) { struct bio *bio; @@ -2595,14 +2612,29 @@ int blk_rq_map_kern(request_queue_t *q, struct request *rq, void *kbuf, if (IS_ERR(bio)) return PTR_ERR(bio); - if (dma_write_dir(rq->data_dir)) + if (dma_write_dir(dir)) bio->bi_rw |= (1 << BIO_RW); - blk_rq_bio_prep(q, rq, bio); + blk_rq_bio_prep_bidi(q, rq, bio ,dir); rq->buffer = rq->data = NULL; return 0; } +EXPORT_SYMBOL(blk_rq_map_kern_bidi); + +/** + * blk_rq_map_kern - map kernel data to a request, for REQ_BLOCK_PC usage + * @q: request queue where request should be inserted + * @rq: request to fill + * @kbuf: the kernel buffer + * @len: length of user data + * @gfp_mask: memory allocation flags + */ +int blk_rq_map_kern(request_queue_t *q, struct request *rq, void *kbuf, + unsigned int len, gfp_t gfp_mask) +{ + return blk_rq_map_kern_bidi( q, rq, kbuf, len, gfp_mask, rq->data_dir); +} EXPORT_SYMBOL(blk_rq_map_kern); /** @@ -2988,7 +3020,7 @@ static int __make_request(request_queue_t *q, struct bio *bio) case ELEVATOR_BACK_MERGE: BUG_ON(!rq_mergeable(req)); - if (!ll_back_merge_fn(q, req, bio)) + if (!ll_back_merge_fn(q, req, bio, req->data_dir)) break; blk_add_trace_bio(q, bio, BLK_TA_BACKMERGE); @@ -3375,11 +3407,11 @@ static void blk_recalc_rq_sectors(struct request *rq, int nsect) } static int __end_that_request_first(struct request *req, int uptodate, - int nr_bytes) + int nr_bytes, enum dma_data_direction dir) { int total_bytes, bio_nbytes, error, next_idx = 0; struct bio *bio; - struct request_io_part* req_io = rq_uni(req); + struct request_io_part* req_io = rq_io(req, dir); blk_add_trace_rq(req->q, req, BLK_TA_COMPLETE); @@ -3469,6 +3501,8 @@ static int 
__end_that_request_first(struct request *req, int uptodate, if (!req_io->bio) return 0; + WARN_ON(rq_bidi_dir(req)); + /* * if the request wasn't completed, update state */ @@ -3501,7 +3535,7 @@ static int __end_that_request_first(struct request *req, int uptodate, **/ int end_that_request_first(struct request *req, int uptodate, int nr_sectors) { - return __end_that_request_first(req, uptodate, nr_sectors << 9); + return end_that_request_chunk(req, uptodate, nr_sectors << 9); } EXPORT_SYMBOL(end_that_request_first); @@ -3523,11 +3557,55 @@ EXPORT_SYMBOL(end_that_request_first); **/ int end_that_request_chunk(struct request *req, int uptodate, int nr_bytes) { - return __end_that_request_first(req, uptodate, nr_bytes); + WARN_ON_BIDI_FLAG(req); + WARN_ON(!rq_uni_dir(req)); + return __end_that_request_first(req, uptodate, nr_bytes, + rq_uni_dir(req) ? rq_dma_dir(req) : DMA_TO_DEVICE); } EXPORT_SYMBOL(end_that_request_chunk); +static void __end_req_io_block(struct request_io_part *req_io, int error) +{ + struct bio *next, *bio = req_io->bio; + req_io->bio = NULL; + + for (; bio; bio = next) { + next = bio->bi_next; + bio_endio(bio, bio->bi_size, error); + } +} + +/** + * end_that_request_block - end ALL I/O on a request in one "shloop", + * including the bidi part. + * @req: the request being processed + * @uptodate: 1 for success, 0 for I/O error, < 0 for specific error + * + * Description: + * Ends ALL I/O on @req, both read/write or bidi. frees all bio resources. + **/ +void end_that_request_block(struct request *req, int uptodate) +{ + if (blk_pc_request(req)) { + int error = 0; + if (end_io_error(uptodate)) + error = !uptodate ? -EIO : uptodate; + blk_add_trace_rq(req->q, req, BLK_TA_COMPLETE); + + __end_req_io_block(&req->uni, error); + if (rq_bidi_dir(req)) + __end_req_io_block(&req->bidi_read, 0); + } else { /* needs elevator bookeeping */ + int nr_bytes = req->uni.hard_nr_sectors << 9; + if (!nr_bytes) + nr_bytes = req->uni.data_len; + end_that_request_chunk(req, uptodate, nr_bytes); + } +} + +EXPORT_SYMBOL(end_that_request_block); + /* * splice the completion data to a local structure and hand off to * process_completion_queue() to complete the requests @@ -3656,6 +3734,14 @@ void end_request(struct request *req, int uptodate) EXPORT_SYMBOL(end_request); +void blk_rq_bio_prep_bidi(request_queue_t *q, struct request *rq, + struct bio *bio, enum dma_data_direction dir) +{ + init_req_io_part_from_bio(q, rq_io(rq, dir), bio); + rq->buffer = NULL; +} +EXPORT_SYMBOL(blk_rq_bio_prep_bidi); + void blk_rq_bio_prep(request_queue_t *q, struct request *rq, struct bio *bio) { rq->data_dir = bio_data_dir(bio) ? 
DMA_TO_DEVICE : DMA_FROM_DEVICE; diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 5863827..42aefd4 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -268,7 +268,7 @@ static int scsi_merge_bio(struct request *rq, struct bio *bio) if (!req_io->bio) blk_rq_bio_prep(q, rq, bio); - else if (!ll_back_merge_fn(q, rq, bio)) + else if (!ll_back_merge_fn(q, rq, bio, rq_dma_dir(rq))) return -EINVAL; else { req_io->biotail->bi_next = bio; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 645d24b..16a02ee 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -322,6 +322,7 @@ struct request { void *end_io_data; struct request_io_part uni; + struct request_io_part bidi_read; }; /* @@ -600,6 +601,34 @@ static inline struct request_io_part* rq_uni(struct request* req) return &req->uni; } +static inline struct request_io_part* rq_out(struct request* req) +{ + WARN_ON_BIDI_FLAG(req); + return &req->uni; +} + +static inline struct request_io_part* rq_in(struct request* req) +{ + WARN_ON_BIDI_FLAG(req); + if (likely(rq_dma_dir(req) != DMA_BIDIRECTIONAL)) + return &req->uni; + + if (likely(req->cmd_flags & REQ_BIDI)) + return &req->bidi_read; + + return &req->uni; +} + +static inline struct request_io_part* rq_io(struct request* req, + enum dma_data_direction dir) +{ + if (dir == DMA_FROM_DEVICE) + return rq_in(req); + + WARN_ON( (dir != DMA_TO_DEVICE) && (dir != DMA_NONE) ); + return &req->uni; +} + /* * We regard a request as sync, if it's a READ or a SYNC write. */ @@ -700,7 +729,8 @@ extern int sg_scsi_ioctl(struct file *, struct request_queue *, /* * Temporary export, until SCSI gets fixed up. */ -extern int ll_back_merge_fn(request_queue_t *, struct request *, struct bio *); +extern int ll_back_merge_fn(request_queue_t *, struct request *, struct bio *, + enum dma_data_direction); /* * A queue has just exitted congestion. Note this in the global counter of @@ -771,6 +801,15 @@ extern void end_request(struct request *req, int uptodate); extern void blk_complete_request(struct request *); /* + * end_request_block will complete and free all bio resources held + * by the request in one call. User will still need to call + * end_that_request_last(..). + * It is the only one that can deal with BIDI. + * can be called for parial bidi allocation and cleanup. + */ +extern void end_that_request_block(struct request *req, int uptodate); + +/* * end_that_request_first/chunk() takes an uptodate argument. we account * any value <= as an io error. 0 means -EIO for compatability reasons, * any other < 0 value is the direct error type. An uptodate value of @@ -849,6 +888,21 @@ static inline struct request *blk_map_queue_find_tag(struct blk_queue_tag *bqt, extern void blk_rq_bio_prep(request_queue_t *, struct request *, struct bio *); extern int blkdev_issue_flush(struct block_device *, sector_t *); +/* BIDI API + * build a request. for bidi requests must be called twice to map/prepare + * the data-in and data-out buffers, one at a time according to + * the given dma_data_direction. 
+ */ +extern void blk_rq_bio_prep_bidi(request_queue_t *, struct request *, + struct bio *, enum dma_data_direction); +extern int blk_rq_map_kern_bidi(request_queue_t *, struct request *, + void *, unsigned int, gfp_t, enum dma_data_direction); +/* retrieve the mapped pages for bidi according to + * the given dma_data_direction + */ +extern int blk_rq_map_sg_bidi(request_queue_t *, struct request *, + struct scatterlist *, enum dma_data_direction); + #define MAX_PHYS_SEGMENTS 128 #define MAX_HW_SEGMENTS 128 #define SAFE_MAX_SECTORS 255 -- 1.5.0.4.402.g8035 ^ permalink raw reply related [flat|nested] 22+ messages in thread
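To make the shape of the new interfaces concrete, here is a rough sketch, not taken from the patch, of how a bidi-aware submitter might build one bidirectional request with them. The queue, disk and buffers are hypothetical, error handling and CDB setup are trimmed, and since the posting does not spell out whether the caller or the block layer sets REQ_BIDI, the sketch sets it explicitly.

#include <linux/blkdev.h>
#include <linux/dma-mapping.h>

static int example_submit_bidi(request_queue_t *q, struct gendisk *disk,
			       void *out_buf, unsigned int out_len,
			       void *in_buf, unsigned int in_len)
{
	struct request *rq;
	int err;

	rq = blk_get_request(q, DMA_BIDIRECTIONAL, __GFP_WAIT);
	if (!rq)
		return -ENOMEM;
	rq->cmd_type = REQ_TYPE_BLOCK_PC;
	rq->cmd_flags |= REQ_BIDI;	/* assumption, see note above */
	/* rq->cmd[], rq->cmd_len, rq->timeout etc. would be filled in here */

	/* map the data-out side, then the data-in side, of the same request */
	err = blk_rq_map_kern_bidi(q, rq, out_buf, out_len, __GFP_WAIT,
				   DMA_TO_DEVICE);
	if (!err)
		err = blk_rq_map_kern_bidi(q, rq, in_buf, in_len, __GFP_WAIT,
					   DMA_FROM_DEVICE);
	if (!err)
		err = blk_execute_rq(q, disk, rq, 0);

	blk_put_request(rq);
	return err;
}

On the other end, a bidi-capable LLD would call blk_rq_map_sg_bidi() once per direction to build its two scatterlists, and would complete the whole command with end_that_request_block() followed by end_that_request_last().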
* Re: [PATCH 4/4] bidi support: bidirectional request 2007-04-15 17:33 ` [PATCH 4/4] bidi support: bidirectional request Boaz Harrosh @ 2007-04-28 19:48 ` FUJITA Tomonori 2007-04-29 15:48 ` Boaz Harrosh 0 siblings, 1 reply; 22+ messages in thread From: FUJITA Tomonori @ 2007-04-28 19:48 UTC (permalink / raw) To: bharrosh Cc: jens.axboe, James.Bottomley, akpm, michaelc, hch, linux-scsi, linux-ide, bhalevy, osd-dev From: Boaz Harrosh <bharrosh@panasas.com> Subject: [PATCH 4/4] bidi support: bidirectional request Date: Sun, 15 Apr 2007 20:33:28 +0300 > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > index 645d24b..16a02ee 100644 > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -322,6 +322,7 @@ struct request { > void *end_io_data; > > struct request_io_part uni; > + struct request_io_part bidi_read; > }; Would be more straightforward to have: struct request_io_part in; struct request_io_part out; > /* > @@ -600,6 +601,34 @@ static inline struct request_io_part* rq_uni(struct request* req) > return &req->uni; > } > > +static inline struct request_io_part* rq_out(struct request* req) > +{ > + WARN_ON_BIDI_FLAG(req); > + return &req->uni; > +} > + > +static inline struct request_io_part* rq_in(struct request* req) > +{ > + WARN_ON_BIDI_FLAG(req); > + if (likely(rq_dma_dir(req) != DMA_BIDIRECTIONAL)) > + return &req->uni; > + > + if (likely(req->cmd_flags & REQ_BIDI)) > + return &req->bidi_read; > + > + return &req->uni; > +} > + > +static inline struct request_io_part* rq_io(struct request* req, > + enum dma_data_direction dir) > +{ > + if (dir == DMA_FROM_DEVICE) > + return rq_in(req); > + > + WARN_ON( (dir != DMA_TO_DEVICE) && (dir != DMA_NONE) ); > + return &req->uni; > +} static inline struct request_io_part* rq_io(struct request* req) { return (req is WRITE) ? &req->out : &req->in; } ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/4] bidi support: bidirectional request 2007-04-28 19:48 ` FUJITA Tomonori @ 2007-04-29 15:48 ` Boaz Harrosh 2007-04-29 18:49 ` James Bottomley 0 siblings, 1 reply; 22+ messages in thread From: Boaz Harrosh @ 2007-04-29 15:48 UTC (permalink / raw) To: FUJITA Tomonori Cc: jens.axboe, James.Bottomley, akpm, michaelc, hch, linux-scsi, linux-ide, bhalevy FUJITA Tomonori wrote: > From: Boaz Harrosh <bharrosh@panasas.com> > Subject: [PATCH 4/4] bidi support: bidirectional request > Date: Sun, 15 Apr 2007 20:33:28 +0300 > >> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h >> index 645d24b..16a02ee 100644 >> --- a/include/linux/blkdev.h >> +++ b/include/linux/blkdev.h >> @@ -322,6 +322,7 @@ struct request { >> void *end_io_data; >> >> struct request_io_part uni; >> + struct request_io_part bidi_read; >> }; > > Would be more straightforward to have: > > struct request_io_part in; > struct request_io_part out; > Yes, I wish I could do that. For bidi-supporting drivers this is the most logical. But for the 99.9% of uni-directional drivers, which call rq_uni() on somewhat hot paths, this means we would need a pointer to a uni request_io_part. This is bad because: 1st- There is no defined stage in a request's life at which to definitively set that pointer, especially in the preparation stages. 2nd- hacks like scsi_error.c/scsi_send_eh_cmnd() will not work at all. Now this is a very bad spot already, and I have a short-term fix for it in the SCSI-bidi patches (not sent yet), but a longer-term solution is needed. Once such hacks are cleaned up we can do what you say. This is exactly why I use the access functions rq_uni/rq_io/rq_in/rq_out and do not open-code the access. > >> /* >> @@ -600,6 +601,34 @@ static inline struct request_io_part* rq_uni(struct request* req) >> return &req->uni; >> } >> >> +static inline struct request_io_part* rq_out(struct request* req) >> +{ >> + WARN_ON_BIDI_FLAG(req); >> + return &req->uni; >> +} >> + >> +static inline struct request_io_part* rq_in(struct request* req) >> +{ >> + WARN_ON_BIDI_FLAG(req); >> + if (likely(rq_dma_dir(req) != DMA_BIDIRECTIONAL)) >> + return &req->uni; >> + >> + if (likely(req->cmd_flags & REQ_BIDI)) >> + return &req->bidi_read; >> + >> + return &req->uni; >> +} >> + >> +static inline struct request_io_part* rq_io(struct request* req, >> + enum dma_data_direction dir) >> +{ >> + if (dir == DMA_FROM_DEVICE) >> + return rq_in(req); >> + >> + WARN_ON( (dir != DMA_TO_DEVICE) && (dir != DMA_NONE) ); >> + return &req->uni; >> +} > > static inline struct request_io_part* rq_io(struct request* req) > { > return (req is WRITE) ? &req->out : &req->in; > } Again, I'm all for it. But this is too deep a change. Too many things changing at once. If we keep the access functions then it does not matter; we can do it later. Boaz ^ permalink raw reply [flat|nested] 22+ messages in thread
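[Editorial sketch of the point about accessors versus open-coded access: a legacy completion path keeps compiling unchanged, but the WARN inside the accessor flags the first bidi request that strays into it. This assumes rq_uni() carries the same WARN_ON_BIDI_FLAG() check as rq_out() in the quoted hunk; the function below is invented for illustration.]

/* A typical unconverted path, written for uni-directional requests only. */
static void legacy_end_io(struct request *req, unsigned int bytes)
{
	/*
	 * Open-coding req->uni.bio here would silently ignore a bidi
	 * request's read side; going through rq_uni() at least makes
	 * the unanticipated path show up in the log during bringup.
	 */
	struct request_io_part *uni = rq_uni(req);

	if (uni->bio)
		end_that_request_first(req, 1, bytes >> 9);
}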
* Re: [PATCH 4/4] bidi support: bidirectional request 2007-04-29 15:48 ` Boaz Harrosh @ 2007-04-29 18:49 ` James Bottomley 2007-04-30 11:11 ` Jens Axboe 0 siblings, 1 reply; 22+ messages in thread From: James Bottomley @ 2007-04-29 18:49 UTC (permalink / raw) To: Boaz Harrosh Cc: FUJITA Tomonori, jens.axboe, akpm, michaelc, hch, linux-scsi, linux-ide, bhalevy On Sun, 2007-04-29 at 18:48 +0300, Boaz Harrosh wrote: > FUJITA Tomonori wrote: > > From: Boaz Harrosh <bharrosh@panasas.com> > > Subject: [PATCH 4/4] bidi support: bidirectional request > > Date: Sun, 15 Apr 2007 20:33:28 +0300 > > > >> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > >> index 645d24b..16a02ee 100644 > >> --- a/include/linux/blkdev.h > >> +++ b/include/linux/blkdev.h > >> @@ -322,6 +322,7 @@ struct request { > >> void *end_io_data; > >> > >> struct request_io_part uni; > >> + struct request_io_part bidi_read; > >> }; > > > > Would be more straightforward to have: > > > > struct request_io_part in; > > struct request_io_part out; > > > > Yes I wish I could do that. For bidi supporting drivers this is the most logical. > But for the 99.9% of uni-directional drivers, calling rq_uni(), and being some what on > the hotish paths, this means we will need a pointer to a uni request_io_part. > This is bad because: > 1st- There is no defined stage in a request life where to definitely set that pointer, > specially in the preparation stages. > 2nd- hacks like scsi_error.c/scsi_send_eh_cmnd() will not work at all. Now this is a > very bad spot already, and I have a short term fix for it in the SCSI-bidi patches > (not sent yet) but a more long term solution is needed. Once such hacks are > cleaned up we can do what you say. This is exactly why I use the access functions > rq_uni/rq_io/rq_in/rq_out and not open code access. I'm still not really convinced about this approach. The primary job of the block layer is to manage and merge READ and WRITE requests. It serves a beautiful secondary function of queueing for arbitrary requests it doesn't understand (REQ_TYPE_BLOCK_PC or REQ_TYPE_SPECIAL ... or indeed any non REQ_TYPE_FS). bidirectional requests fall into the latter category (there's nothing really we can do to merge them ... they're just transported by the block layer). The only unusual feature is that they carry two bios. I think the drivers that actually support bidirectional will be a rarity, so it might even be advisable to add it to the queue capability (refuse bidirectional requests at the top rather than perturbing all the drivers to process them). So, what about REQ_TYPE_BIDIRECTIONAL rather than REQ_BIDI? That will remove it from the standard path and put it on the special command type path where we can process it specially. Additionally, if you take this approach, you can probably simply chain the second bio through req->special as an additional request in the stream. The only thing that would then need modification would be the dequeue of the block driver (it would have to dequeue both requests and prepare them) and that needs to be done only for drivers handling bidirectional requests. James ^ permalink raw reply [flat|nested] 22+ messages in thread
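[Editorial sketch of the suggestion above, nothing here was posted: the DATA IN phase rides as a second request hung off ->special, the pair is marked with a new request type, and only a driver that advertises bidi support ever has to look at it. REQ_TYPE_BIDIRECTIONAL does not exist, so it is defined as a placeholder to keep the fragment self-contained.]

#include <linux/blkdev.h>

/* Placeholder for the proposed new command type. */
#define REQ_TYPE_BIDIRECTIONAL	REQ_TYPE_SPECIAL

static void bidi_capable_request_fn(request_queue_t *q)
{
	struct request *req;

	while ((req = elv_next_request(q)) != NULL) {
		blkdev_dequeue_request(req);

		if (req->cmd_type == REQ_TYPE_BIDIRECTIONAL) {
			/* Second data phase chained through ->special. */
			struct request *in_req = req->special;

			/* prepare and issue both phases here ... */
			(void)in_req;
			continue;
		}

		/* normal uni-directional handling as before ... */
	}
}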
* Re: [PATCH 4/4] bidi support: bidirectional request 2007-04-29 18:49 ` James Bottomley @ 2007-04-30 11:11 ` Jens Axboe 2007-04-30 11:53 ` Benny Halevy ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread From: Jens Axboe @ 2007-04-30 11:11 UTC (permalink / raw) To: James Bottomley Cc: Boaz Harrosh, FUJITA Tomonori, akpm, michaelc, hch, linux-scsi, linux-ide, bhalevy On Sun, Apr 29 2007, James Bottomley wrote: > On Sun, 2007-04-29 at 18:48 +0300, Boaz Harrosh wrote: > > FUJITA Tomonori wrote: > > > From: Boaz Harrosh <bharrosh@panasas.com> > > > Subject: [PATCH 4/4] bidi support: bidirectional request > > > Date: Sun, 15 Apr 2007 20:33:28 +0300 > > > > > >> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > > >> index 645d24b..16a02ee 100644 > > >> --- a/include/linux/blkdev.h > > >> +++ b/include/linux/blkdev.h > > >> @@ -322,6 +322,7 @@ struct request { > > >> void *end_io_data; > > >> > > >> struct request_io_part uni; > > >> + struct request_io_part bidi_read; > > >> }; > > > > > > Would be more straightforward to have: > > > > > > struct request_io_part in; > > > struct request_io_part out; > > > > > > > Yes I wish I could do that. For bidi supporting drivers this is the most logical. > > But for the 99.9% of uni-directional drivers, calling rq_uni(), and being some what on > > the hotish paths, this means we will need a pointer to a uni request_io_part. > > This is bad because: > > 1st- There is no defined stage in a request life where to definitely set that pointer, > > specially in the preparation stages. > > 2nd- hacks like scsi_error.c/scsi_send_eh_cmnd() will not work at all. Now this is a > > very bad spot already, and I have a short term fix for it in the SCSI-bidi patches > > (not sent yet) but a more long term solution is needed. Once such hacks are > > cleaned up we can do what you say. This is exactly why I use the access functions > > rq_uni/rq_io/rq_in/rq_out and not open code access. > > I'm still not really convinced about this approach. The primary job of > the block layer is to manage and merge READ and WRITE requests. It > serves a beautiful secondary function of queueing for arbitrary requests > it doesn't understand (REQ_TYPE_BLOCK_PC or REQ_TYPE_SPECIAL ... or > indeed any non REQ_TYPE_FS). > > bidirectional requests fall into the latter category (there's nothing > really we can do to merge them ... they're just transported by the block > layer). The only unusual feature is that they carry two bios. I think > the drivers that actually support bidirectional will be a rarity, so it > might even be advisable to add it to the queue capability (refuse > bidirectional requests at the top rather than perturbing all the drivers > to process them). > > So, what about REQ_TYPE_BIDIRECTIONAL rather than REQ_BIDI? That will > remove it from the standard path and put it on the special command type > path where we can process it specially. Additionally, if you take this > approach, you can probably simply chain the second bio through > req->special as an additional request in the stream. The only thing > that would then need modification would be the dequeue of the block > driver (it would have to dequeue both requests and prepare them) and > that needs to be done only for drivers handling bidirectional requests. I agree, I'm really not crazy about shuffling the entire request setup around just for something as exotic as bidirection commands. How about just keeping it simple - have a second request linked off the first one for the second data phase? 
So keep it completely separate, not just overload ->special for the 2nd bio list. So basically just add a struct request pointer, so you can do rq = rq->next_rq or something for the next data phase. I bet this would be a LOT less invasive as well, and we can get by with a few helpers to support it. And it should definitely be a request type. -- Jens Axboe ^ permalink raw reply [flat|nested] 22+ messages in thread
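[Editorial sketch of the ->next_rq idea: the field and the helper name are hypothetical, and the pointer does not exist in the tree this thread is against. The second data phase is simply one more request that the issuer allocates and hangs off the first.]

#include <linux/fs.h>
#include <linux/blkdev.h>

static struct request *bidi_get_request(request_queue_t *q, gfp_t gfp)
{
	struct request *req, *in_req;

	req = blk_get_request(q, WRITE, gfp);		/* DATA OUT phase */
	if (!req)
		return NULL;

	in_req = blk_get_request(q, READ, gfp);		/* DATA IN phase */
	if (!in_req) {
		blk_put_request(req);
		return NULL;
	}

	req->next_rq = in_req;	/* the proposed pointer, not in mainline yet */
	return req;
}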
* Re: [PATCH 4/4] bidi support: bidirectional request 2007-04-30 11:11 ` Jens Axboe @ 2007-04-30 11:53 ` Benny Halevy 2007-04-30 11:59 ` Jens Axboe 2007-04-30 13:05 ` Mark Lord 2007-05-01 19:50 ` FUJITA Tomonori 2 siblings, 1 reply; 22+ messages in thread From: Benny Halevy @ 2007-04-30 11:53 UTC (permalink / raw) To: Jens Axboe, James Bottomley Cc: Boaz Harrosh, FUJITA Tomonori, akpm, michaelc, hch, linux-scsi, linux-ide [-- Attachment #1: Type: text/plain, Size: 4371 bytes --] Jens Axboe wrote: > On Sun, Apr 29 2007, James Bottomley wrote: >> On Sun, 2007-04-29 at 18:48 +0300, Boaz Harrosh wrote: >>> FUJITA Tomonori wrote: >>>> From: Boaz Harrosh <bharrosh@panasas.com> >>>> Subject: [PATCH 4/4] bidi support: bidirectional request >>>> Date: Sun, 15 Apr 2007 20:33:28 +0300 >>>> >>>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h >>>>> index 645d24b..16a02ee 100644 >>>>> --- a/include/linux/blkdev.h >>>>> +++ b/include/linux/blkdev.h >>>>> @@ -322,6 +322,7 @@ struct request { >>>>> void *end_io_data; >>>>> >>>>> struct request_io_part uni; >>>>> + struct request_io_part bidi_read; >>>>> }; >>>> Would be more straightforward to have: >>>> >>>> struct request_io_part in; >>>> struct request_io_part out; >>>> >>> Yes I wish I could do that. For bidi supporting drivers this is the most logical. >>> But for the 99.9% of uni-directional drivers, calling rq_uni(), and being some what on >>> the hotish paths, this means we will need a pointer to a uni request_io_part. >>> This is bad because: >>> 1st- There is no defined stage in a request life where to definitely set that pointer, >>> specially in the preparation stages. >>> 2nd- hacks like scsi_error.c/scsi_send_eh_cmnd() will not work at all. Now this is a >>> very bad spot already, and I have a short term fix for it in the SCSI-bidi patches >>> (not sent yet) but a more long term solution is needed. Once such hacks are >>> cleaned up we can do what you say. This is exactly why I use the access functions >>> rq_uni/rq_io/rq_in/rq_out and not open code access. >> I'm still not really convinced about this approach. The primary job of >> the block layer is to manage and merge READ and WRITE requests. It >> serves a beautiful secondary function of queueing for arbitrary requests >> it doesn't understand (REQ_TYPE_BLOCK_PC or REQ_TYPE_SPECIAL ... or >> indeed any non REQ_TYPE_FS). >> >> bidirectional requests fall into the latter category (there's nothing >> really we can do to merge them ... they're just transported by the block >> layer). The only unusual feature is that they carry two bios. I think >> the drivers that actually support bidirectional will be a rarity, so it >> might even be advisable to add it to the queue capability (refuse >> bidirectional requests at the top rather than perturbing all the drivers >> to process them). >> >> So, what about REQ_TYPE_BIDIRECTIONAL rather than REQ_BIDI? That will >> remove it from the standard path and put it on the special command type >> path where we can process it specially. Additionally, if you take this >> approach, you can probably simply chain the second bio through >> req->special as an additional request in the stream. The only thing >> that would then need modification would be the dequeue of the block >> driver (it would have to dequeue both requests and prepare them) and >> that needs to be done only for drivers handling bidirectional requests. 
> > I agree, I'm really not crazy about shuffling the entire request setup > around just for something as exotic as bidirection commands. How about > just keeping it simple - have a second request linked off the first one > for the second data phase? So keep it completely seperate, not just > overload ->special for 2nd bio list. > > So basically just add a struct request pointer, so you can do rq = > rq->next_rq or something for the next data phase. I bet this would be a > LOT less invasive as well, and we can get by with a few helpers to > support it. > > And it should definitely be a request type. > I'm a bit confused since what you both suggest is very similar to what we proposed back in October 2006, and the impression we got was that it would be better to support bidirectional block requests natively (yet to be honest, James, you wanted a linked request all along). Before we go on that route again, how do you see the support for bidi at the scsi mid-layer being done? Again, we prefer to support that officially using two struct scsi_cmnd_buff instances in struct scsi_cmnd and not as a one-off feature, using special-purpose state and logic (e.g. a linked struct scsi_cmnd for the bidi_read sg list). I'm attaching the patch we sent back then for your reference. (For some reason I couldn't find the original post in any of the linux-scsi archives.) Regards, Benny [-- Attachment #2: linux-bidi-2.6.18.patch --] [-- Type: application/x-patch, Size: 28634 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
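[Editorial sketch of the shape being described; struct scsi_cmnd_buff from the Panasas patches is not quoted in this thread, so the members below are guesses and the _view struct is only a stand-in for struct scsi_cmnd.]

struct scatterlist;

/* Guessed contents of the per-direction buffer descriptor. */
struct scsi_cmnd_buff {
	struct scatterlist	*sglist;
	unsigned short		sg_count;
	unsigned int		length;		/* total bytes to transfer */
};

/* Stand-in for struct scsi_cmnd carrying one buffer per direction. */
struct scsi_cmnd_bidi_view {
	/* ... the usual scsi_cmnd members ... */
	struct scsi_cmnd_buff	data_in;	/* DATA IN  buffer */
	struct scsi_cmnd_buff	data_out;	/* DATA OUT buffer */
};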
* Re: [PATCH 4/4] bidi support: bidirectional request 2007-04-30 11:53 ` Benny Halevy @ 2007-04-30 11:59 ` Jens Axboe 2007-04-30 14:52 ` Douglas Gilbert 0 siblings, 1 reply; 22+ messages in thread From: Jens Axboe @ 2007-04-30 11:59 UTC (permalink / raw) To: Benny Halevy Cc: James Bottomley, Boaz Harrosh, FUJITA Tomonori, akpm, michaelc, hch, linux-scsi, linux-ide On Mon, Apr 30 2007, Benny Halevy wrote: > Jens Axboe wrote: > > On Sun, Apr 29 2007, James Bottomley wrote: > >> On Sun, 2007-04-29 at 18:48 +0300, Boaz Harrosh wrote: > >>> FUJITA Tomonori wrote: > >>>> From: Boaz Harrosh <bharrosh@panasas.com> > >>>> Subject: [PATCH 4/4] bidi support: bidirectional request > >>>> Date: Sun, 15 Apr 2007 20:33:28 +0300 > >>>> > >>>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > >>>>> index 645d24b..16a02ee 100644 > >>>>> --- a/include/linux/blkdev.h > >>>>> +++ b/include/linux/blkdev.h > >>>>> @@ -322,6 +322,7 @@ struct request { > >>>>> void *end_io_data; > >>>>> > >>>>> struct request_io_part uni; > >>>>> + struct request_io_part bidi_read; > >>>>> }; > >>>> Would be more straightforward to have: > >>>> > >>>> struct request_io_part in; > >>>> struct request_io_part out; > >>>> > >>> Yes I wish I could do that. For bidi supporting drivers this is the most logical. > >>> But for the 99.9% of uni-directional drivers, calling rq_uni(), and being some what on > >>> the hotish paths, this means we will need a pointer to a uni request_io_part. > >>> This is bad because: > >>> 1st- There is no defined stage in a request life where to definitely set that pointer, > >>> specially in the preparation stages. > >>> 2nd- hacks like scsi_error.c/scsi_send_eh_cmnd() will not work at all. Now this is a > >>> very bad spot already, and I have a short term fix for it in the SCSI-bidi patches > >>> (not sent yet) but a more long term solution is needed. Once such hacks are > >>> cleaned up we can do what you say. This is exactly why I use the access functions > >>> rq_uni/rq_io/rq_in/rq_out and not open code access. > >> I'm still not really convinced about this approach. The primary job of > >> the block layer is to manage and merge READ and WRITE requests. It > >> serves a beautiful secondary function of queueing for arbitrary requests > >> it doesn't understand (REQ_TYPE_BLOCK_PC or REQ_TYPE_SPECIAL ... or > >> indeed any non REQ_TYPE_FS). > >> > >> bidirectional requests fall into the latter category (there's nothing > >> really we can do to merge them ... they're just transported by the block > >> layer). The only unusual feature is that they carry two bios. I think > >> the drivers that actually support bidirectional will be a rarity, so it > >> might even be advisable to add it to the queue capability (refuse > >> bidirectional requests at the top rather than perturbing all the drivers > >> to process them). > >> > >> So, what about REQ_TYPE_BIDIRECTIONAL rather than REQ_BIDI? That will > >> remove it from the standard path and put it on the special command type > >> path where we can process it specially. Additionally, if you take this > >> approach, you can probably simply chain the second bio through > >> req->special as an additional request in the stream. The only thing > >> that would then need modification would be the dequeue of the block > >> driver (it would have to dequeue both requests and prepare them) and > >> that needs to be done only for drivers handling bidirectional requests. 
> > > > I agree, I'm really not crazy about shuffling the entire request setup > > around just for something as exotic as bidirection commands. How about > > just keeping it simple - have a second request linked off the first one > > for the second data phase? So keep it completely seperate, not just > > overload ->special for 2nd bio list. > > > > So basically just add a struct request pointer, so you can do rq = > > rq->next_rq or something for the next data phase. I bet this would be a > > LOT less invasive as well, and we can get by with a few helpers to > > support it. > > > > And it should definitely be a request type. > > > > I'm a bit confused since what you both suggest is very similar to what we've > proposed back in October 2006 and the impression we got was that it will be > better to support bidirectional block requests natively (yet to be honest, > James, you wanted a linked request all along). It still has to be implemented natively at the block layer, just differently like described above. So instead of messing all over the block layer adding rq_uni() stuff, just add that struct request pointer to the request structure for the 2nd data phase. You can relatively easy then modify the block layer helpers to support mapping and setup of such requests. > Before we go on that route again, how do you see the support for bidi > at the scsi mid-layer done? Again, we prefer to support that officially > using two struct scsi_cmnd_buff instances in struct scsi_cmnd and not as > a one-off feature, using special-purpose state and logic (e.g. a linked > struct scsi_cmd for the bidi_read sg list). The SCSI part is up to James, that can be done as either inside a single scsi command, or as linked scsi commands as well. I don't care too much about that bit, just the block layer parts :-). And the proposed block layer design can be used both ways by the scsi layer. -- Jens Axboe ^ permalink raw reply [flat|nested] 22+ messages in thread
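[Editorial sketch of what "modify the block layer helpers" might amount to for an in-kernel issuer, using helpers that already exist plus the proposed ->next_rq pointer; the function name is invented and error unwinding is trimmed.]

#include <linux/blkdev.h>

static int bidi_execute_kern(request_queue_t *q, struct gendisk *disk,
			     struct request *req,
			     void *out_buf, unsigned int out_len,
			     void *in_buf, unsigned int in_len, gfp_t gfp)
{
	int err;

	/* The DATA OUT phase maps exactly like any other request ... */
	err = blk_rq_map_kern(q, req, out_buf, out_len, gfp);
	if (err)
		return err;

	/* ... and the DATA IN phase is just one more request to map. */
	err = blk_rq_map_kern(q, req->next_rq, in_buf, in_len, gfp);
	if (err)
		return err;

	return blk_execute_rq(q, disk, req, 0);
}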
* Re: [PATCH 4/4] bidi support: bidirectional request 2007-04-30 11:59 ` Jens Axboe @ 2007-04-30 14:52 ` Douglas Gilbert 2007-04-30 14:51 ` Jens Axboe 0 siblings, 1 reply; 22+ messages in thread From: Douglas Gilbert @ 2007-04-30 14:52 UTC (permalink / raw) To: Jens Axboe Cc: Benny Halevy, James Bottomley, Boaz Harrosh, FUJITA Tomonori, akpm, michaelc, hch, linux-scsi, linux-ide Jens Axboe wrote: > On Mon, Apr 30 2007, Benny Halevy wrote: >> Jens Axboe wrote: >>> On Sun, Apr 29 2007, James Bottomley wrote: >>>> On Sun, 2007-04-29 at 18:48 +0300, Boaz Harrosh wrote: >>>>> FUJITA Tomonori wrote: >>>>>> From: Boaz Harrosh <bharrosh@panasas.com> >>>>>> Subject: [PATCH 4/4] bidi support: bidirectional request >>>>>> Date: Sun, 15 Apr 2007 20:33:28 +0300 >>>>>> >>>>>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h >>>>>>> index 645d24b..16a02ee 100644 >>>>>>> --- a/include/linux/blkdev.h >>>>>>> +++ b/include/linux/blkdev.h >>>>>>> @@ -322,6 +322,7 @@ struct request { >>>>>>> void *end_io_data; >>>>>>> >>>>>>> struct request_io_part uni; >>>>>>> + struct request_io_part bidi_read; >>>>>>> }; >>>>>> Would be more straightforward to have: >>>>>> >>>>>> struct request_io_part in; >>>>>> struct request_io_part out; >>>>>> >>>>> Yes I wish I could do that. For bidi supporting drivers this is the most logical. >>>>> But for the 99.9% of uni-directional drivers, calling rq_uni(), and being some what on >>>>> the hotish paths, this means we will need a pointer to a uni request_io_part. >>>>> This is bad because: >>>>> 1st- There is no defined stage in a request life where to definitely set that pointer, >>>>> specially in the preparation stages. >>>>> 2nd- hacks like scsi_error.c/scsi_send_eh_cmnd() will not work at all. Now this is a >>>>> very bad spot already, and I have a short term fix for it in the SCSI-bidi patches >>>>> (not sent yet) but a more long term solution is needed. Once such hacks are >>>>> cleaned up we can do what you say. This is exactly why I use the access functions >>>>> rq_uni/rq_io/rq_in/rq_out and not open code access. >>>> I'm still not really convinced about this approach. The primary job of >>>> the block layer is to manage and merge READ and WRITE requests. It >>>> serves a beautiful secondary function of queueing for arbitrary requests >>>> it doesn't understand (REQ_TYPE_BLOCK_PC or REQ_TYPE_SPECIAL ... or >>>> indeed any non REQ_TYPE_FS). >>>> >>>> bidirectional requests fall into the latter category (there's nothing >>>> really we can do to merge them ... they're just transported by the block >>>> layer). The only unusual feature is that they carry two bios. I think >>>> the drivers that actually support bidirectional will be a rarity, so it >>>> might even be advisable to add it to the queue capability (refuse >>>> bidirectional requests at the top rather than perturbing all the drivers >>>> to process them). >>>> >>>> So, what about REQ_TYPE_BIDIRECTIONAL rather than REQ_BIDI? That will >>>> remove it from the standard path and put it on the special command type >>>> path where we can process it specially. Additionally, if you take this >>>> approach, you can probably simply chain the second bio through >>>> req->special as an additional request in the stream. The only thing >>>> that would then need modification would be the dequeue of the block >>>> driver (it would have to dequeue both requests and prepare them) and >>>> that needs to be done only for drivers handling bidirectional requests. 
>>> I agree, I'm really not crazy about shuffling the entire request setup >>> around just for something as exotic as bidirection commands. How about >>> just keeping it simple - have a second request linked off the first one >>> for the second data phase? So keep it completely seperate, not just >>> overload ->special for 2nd bio list. >>> >>> So basically just add a struct request pointer, so you can do rq = >>> rq->next_rq or something for the next data phase. I bet this would be a >>> LOT less invasive as well, and we can get by with a few helpers to >>> support it. >>> >>> And it should definitely be a request type. >>> >> I'm a bit confused since what you both suggest is very similar to what we've >> proposed back in October 2006 and the impression we got was that it will be >> better to support bidirectional block requests natively (yet to be honest, >> James, you wanted a linked request all along). > > It still has to be implemented natively at the block layer, just > differently like described above. So instead of messing all over the > block layer adding rq_uni() stuff, just add that struct request pointer > to the request structure for the 2nd data phase. You can relatively easy > then modify the block layer helpers to support mapping and setup of such > requests. > >> Before we go on that route again, how do you see the support for bidi >> at the scsi mid-layer done? Again, we prefer to support that officially >> using two struct scsi_cmnd_buff instances in struct scsi_cmnd and not as >> a one-off feature, using special-purpose state and logic (e.g. a linked >> struct scsi_cmd for the bidi_read sg list). > > The SCSI part is up to James, that can be done as either inside a single > scsi command, or as linked scsi commands as well. I don't care too much > about that bit, just the block layer parts :-). And the proposed block > layer design can be used both ways by the scsi layer. Linked SCSI commands have been obsolete since SPC-4 rev 6 (18 July 2006) after proposal 06-259r1 was accepted. That proposal starts: "The reasons for linked commands have been overtaken by time and events." I haven't see anyone mourning their demise on the t10 reflector. Mapping two requests to one bidi SCSI command might make error handling more of a challenge. Doug Gilbert ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/4] bidi support: bidirectional request 2007-04-30 14:52 ` Douglas Gilbert @ 2007-04-30 14:51 ` Jens Axboe 2007-04-30 15:12 ` Benny Halevy 2007-05-01 18:22 ` Boaz Harrosh 0 siblings, 2 replies; 22+ messages in thread From: Jens Axboe @ 2007-04-30 14:51 UTC (permalink / raw) To: Douglas Gilbert Cc: Benny Halevy, James Bottomley, Boaz Harrosh, FUJITA Tomonori, akpm, michaelc, hch, linux-scsi, linux-ide On Mon, Apr 30 2007, Douglas Gilbert wrote: > Jens Axboe wrote: > > On Mon, Apr 30 2007, Benny Halevy wrote: > >> Jens Axboe wrote: > >>> On Sun, Apr 29 2007, James Bottomley wrote: > >>>> On Sun, 2007-04-29 at 18:48 +0300, Boaz Harrosh wrote: > >>>>> FUJITA Tomonori wrote: > >>>>>> From: Boaz Harrosh <bharrosh@panasas.com> > >>>>>> Subject: [PATCH 4/4] bidi support: bidirectional request > >>>>>> Date: Sun, 15 Apr 2007 20:33:28 +0300 > >>>>>> > >>>>>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > >>>>>>> index 645d24b..16a02ee 100644 > >>>>>>> --- a/include/linux/blkdev.h > >>>>>>> +++ b/include/linux/blkdev.h > >>>>>>> @@ -322,6 +322,7 @@ struct request { > >>>>>>> void *end_io_data; > >>>>>>> > >>>>>>> struct request_io_part uni; > >>>>>>> + struct request_io_part bidi_read; > >>>>>>> }; > >>>>>> Would be more straightforward to have: > >>>>>> > >>>>>> struct request_io_part in; > >>>>>> struct request_io_part out; > >>>>>> > >>>>> Yes I wish I could do that. For bidi supporting drivers this is the most logical. > >>>>> But for the 99.9% of uni-directional drivers, calling rq_uni(), and being some what on > >>>>> the hotish paths, this means we will need a pointer to a uni request_io_part. > >>>>> This is bad because: > >>>>> 1st- There is no defined stage in a request life where to definitely set that pointer, > >>>>> specially in the preparation stages. > >>>>> 2nd- hacks like scsi_error.c/scsi_send_eh_cmnd() will not work at all. Now this is a > >>>>> very bad spot already, and I have a short term fix for it in the SCSI-bidi patches > >>>>> (not sent yet) but a more long term solution is needed. Once such hacks are > >>>>> cleaned up we can do what you say. This is exactly why I use the access functions > >>>>> rq_uni/rq_io/rq_in/rq_out and not open code access. > >>>> I'm still not really convinced about this approach. The primary job of > >>>> the block layer is to manage and merge READ and WRITE requests. It > >>>> serves a beautiful secondary function of queueing for arbitrary requests > >>>> it doesn't understand (REQ_TYPE_BLOCK_PC or REQ_TYPE_SPECIAL ... or > >>>> indeed any non REQ_TYPE_FS). > >>>> > >>>> bidirectional requests fall into the latter category (there's nothing > >>>> really we can do to merge them ... they're just transported by the block > >>>> layer). The only unusual feature is that they carry two bios. I think > >>>> the drivers that actually support bidirectional will be a rarity, so it > >>>> might even be advisable to add it to the queue capability (refuse > >>>> bidirectional requests at the top rather than perturbing all the drivers > >>>> to process them). > >>>> > >>>> So, what about REQ_TYPE_BIDIRECTIONAL rather than REQ_BIDI? That will > >>>> remove it from the standard path and put it on the special command type > >>>> path where we can process it specially. Additionally, if you take this > >>>> approach, you can probably simply chain the second bio through > >>>> req->special as an additional request in the stream. 
The only thing > >>>> that would then need modification would be the dequeue of the block > >>>> driver (it would have to dequeue both requests and prepare them) and > >>>> that needs to be done only for drivers handling bidirectional requests. > >>> I agree, I'm really not crazy about shuffling the entire request setup > >>> around just for something as exotic as bidirection commands. How about > >>> just keeping it simple - have a second request linked off the first one > >>> for the second data phase? So keep it completely seperate, not just > >>> overload ->special for 2nd bio list. > >>> > >>> So basically just add a struct request pointer, so you can do rq = > >>> rq->next_rq or something for the next data phase. I bet this would be a > >>> LOT less invasive as well, and we can get by with a few helpers to > >>> support it. > >>> > >>> And it should definitely be a request type. > >>> > >> I'm a bit confused since what you both suggest is very similar to what we've > >> proposed back in October 2006 and the impression we got was that it will be > >> better to support bidirectional block requests natively (yet to be honest, > >> James, you wanted a linked request all along). > > > > It still has to be implemented natively at the block layer, just > > differently like described above. So instead of messing all over the > > block layer adding rq_uni() stuff, just add that struct request pointer > > to the request structure for the 2nd data phase. You can relatively easy > > then modify the block layer helpers to support mapping and setup of such > > requests. > > > >> Before we go on that route again, how do you see the support for bidi > >> at the scsi mid-layer done? Again, we prefer to support that officially > >> using two struct scsi_cmnd_buff instances in struct scsi_cmnd and not as > >> a one-off feature, using special-purpose state and logic (e.g. a linked > >> struct scsi_cmd for the bidi_read sg list). > > > > The SCSI part is up to James, that can be done as either inside a single > > scsi command, or as linked scsi commands as well. I don't care too much > > about that bit, just the block layer parts :-). And the proposed block > > layer design can be used both ways by the scsi layer. > > Linked SCSI commands have been obsolete since SPC-4 rev 6 > (18 July 2006) after proposal 06-259r1 was accepted. That > proposal starts: "The reasons for linked commands have been > overtaken by time and events." I haven't see anyone mourning > their demise on the t10 reflector. This has nothing to do with linked commands as defined in the SCSI spec. > Mapping two requests to one bidi SCSI command might make error > handling more of a challenge. Then go the other way, a command for each. Not a big deal. -- Jens Axboe ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/4] bidi support: bidirectional request 2007-04-30 14:51 ` Jens Axboe @ 2007-04-30 15:12 ` Benny Halevy 2007-05-01 18:22 ` Boaz Harrosh 1 sibling, 0 replies; 22+ messages in thread From: Benny Halevy @ 2007-04-30 15:12 UTC (permalink / raw) To: Jens Axboe Cc: Douglas Gilbert, James Bottomley, Boaz Harrosh, FUJITA Tomonori, akpm, michaelc, hch, linux-scsi, linux-ide Jens Axboe wrote: > On Mon, Apr 30 2007, Douglas Gilbert wrote: >> Jens Axboe wrote: >>> On Mon, Apr 30 2007, Benny Halevy wrote: >>>> Jens Axboe wrote: >>>>> On Sun, Apr 29 2007, James Bottomley wrote: >>>>>> On Sun, 2007-04-29 at 18:48 +0300, Boaz Harrosh wrote: >>>>>>> FUJITA Tomonori wrote: >>>>>>>> From: Boaz Harrosh <bharrosh@panasas.com> >>>>>>>> Subject: [PATCH 4/4] bidi support: bidirectional request >>>>>>>> Date: Sun, 15 Apr 2007 20:33:28 +0300 >>>>>>>> >>>>>>>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h >>>>>>>>> index 645d24b..16a02ee 100644 >>>>>>>>> --- a/include/linux/blkdev.h >>>>>>>>> +++ b/include/linux/blkdev.h >>>>>>>>> @@ -322,6 +322,7 @@ struct request { >>>>>>>>> void *end_io_data; >>>>>>>>> >>>>>>>>> struct request_io_part uni; >>>>>>>>> + struct request_io_part bidi_read; >>>>>>>>> }; >>>>>>>> Would be more straightforward to have: >>>>>>>> >>>>>>>> struct request_io_part in; >>>>>>>> struct request_io_part out; >>>>>>>> >>>>>>> Yes I wish I could do that. For bidi supporting drivers this is the most logical. >>>>>>> But for the 99.9% of uni-directional drivers, calling rq_uni(), and being some what on >>>>>>> the hotish paths, this means we will need a pointer to a uni request_io_part. >>>>>>> This is bad because: >>>>>>> 1st- There is no defined stage in a request life where to definitely set that pointer, >>>>>>> specially in the preparation stages. >>>>>>> 2nd- hacks like scsi_error.c/scsi_send_eh_cmnd() will not work at all. Now this is a >>>>>>> very bad spot already, and I have a short term fix for it in the SCSI-bidi patches >>>>>>> (not sent yet) but a more long term solution is needed. Once such hacks are >>>>>>> cleaned up we can do what you say. This is exactly why I use the access functions >>>>>>> rq_uni/rq_io/rq_in/rq_out and not open code access. >>>>>> I'm still not really convinced about this approach. The primary job of >>>>>> the block layer is to manage and merge READ and WRITE requests. It >>>>>> serves a beautiful secondary function of queueing for arbitrary requests >>>>>> it doesn't understand (REQ_TYPE_BLOCK_PC or REQ_TYPE_SPECIAL ... or >>>>>> indeed any non REQ_TYPE_FS). >>>>>> >>>>>> bidirectional requests fall into the latter category (there's nothing >>>>>> really we can do to merge them ... they're just transported by the block >>>>>> layer). The only unusual feature is that they carry two bios. I think >>>>>> the drivers that actually support bidirectional will be a rarity, so it >>>>>> might even be advisable to add it to the queue capability (refuse >>>>>> bidirectional requests at the top rather than perturbing all the drivers >>>>>> to process them). >>>>>> >>>>>> So, what about REQ_TYPE_BIDIRECTIONAL rather than REQ_BIDI? That will >>>>>> remove it from the standard path and put it on the special command type >>>>>> path where we can process it specially. Additionally, if you take this >>>>>> approach, you can probably simply chain the second bio through >>>>>> req->special as an additional request in the stream. 
The only thing >>>>>> that would then need modification would be the dequeue of the block >>>>>> driver (it would have to dequeue both requests and prepare them) and >>>>>> that needs to be done only for drivers handling bidirectional requests. >>>>> I agree, I'm really not crazy about shuffling the entire request setup >>>>> around just for something as exotic as bidirection commands. How about >>>>> just keeping it simple - have a second request linked off the first one >>>>> for the second data phase? So keep it completely seperate, not just >>>>> overload ->special for 2nd bio list. >>>>> >>>>> So basically just add a struct request pointer, so you can do rq = >>>>> rq->next_rq or something for the next data phase. I bet this would be a >>>>> LOT less invasive as well, and we can get by with a few helpers to >>>>> support it. >>>>> >>>>> And it should definitely be a request type. >>>>> >>>> I'm a bit confused since what you both suggest is very similar to what we've >>>> proposed back in October 2006 and the impression we got was that it will be >>>> better to support bidirectional block requests natively (yet to be honest, >>>> James, you wanted a linked request all along). >>> It still has to be implemented natively at the block layer, just >>> differently like described above. So instead of messing all over the >>> block layer adding rq_uni() stuff, just add that struct request pointer >>> to the request structure for the 2nd data phase. You can relatively easy >>> then modify the block layer helpers to support mapping and setup of such >>> requests. >>> >>>> Before we go on that route again, how do you see the support for bidi >>>> at the scsi mid-layer done? Again, we prefer to support that officially >>>> using two struct scsi_cmnd_buff instances in struct scsi_cmnd and not as >>>> a one-off feature, using special-purpose state and logic (e.g. a linked >>>> struct scsi_cmd for the bidi_read sg list). >>> The SCSI part is up to James, that can be done as either inside a single >>> scsi command, or as linked scsi commands as well. I don't care too much >>> about that bit, just the block layer parts :-). And the proposed block >>> layer design can be used both ways by the scsi layer. >> Linked SCSI commands have been obsolete since SPC-4 rev 6 >> (18 July 2006) after proposal 06-259r1 was accepted. That >> proposal starts: "The reasons for linked commands have been >> overtaken by time and events." I haven't see anyone mourning >> their demise on the t10 reflector. > > This has nothing to do with linked commands as defined in the SCSI spec. > >> Mapping two requests to one bidi SCSI command might make error >> handling more of a challenge. > > Then go the other way, a command for each. Not a big deal. > Let's take a stab at it then and see how it goes. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/4] bidi support: bidirectional request 2007-04-30 14:51 ` Jens Axboe 2007-04-30 15:12 ` Benny Halevy @ 2007-05-01 18:22 ` Boaz Harrosh 2007-05-01 18:57 ` Jens Axboe 1 sibling, 1 reply; 22+ messages in thread From: Boaz Harrosh @ 2007-05-01 18:22 UTC (permalink / raw) To: Jens Axboe, James Bottomley, Christoph Hellwig Cc: Douglas Gilbert, Benny Halevy, FUJITA Tomonori, akpm, michaelc, linux-scsi, linux-ide [-- Attachment #1: Type: text/plain, Size: 7200 bytes --] Jens Axboe wrote: > On Mon, Apr 30 2007, Douglas Gilbert wrote: >> Jens Axboe wrote: >>> On Mon, Apr 30 2007, Benny Halevy wrote: >>>> Jens Axboe wrote: >>>>> On Sun, Apr 29 2007, James Bottomley wrote: >>>>>> I'm still not really convinced about this approach. The primary job of >>>>>> the block layer is to manage and merge READ and WRITE requests. It >>>>>> serves a beautiful secondary function of queueing for arbitrary requests >>>>>> it doesn't understand (REQ_TYPE_BLOCK_PC or REQ_TYPE_SPECIAL ... or >>>>>> indeed any non REQ_TYPE_FS). >>>>>> >>>>>> bidirectional requests fall into the latter category (there's nothing >>>>>> really we can do to merge them ... they're just transported by the block >>>>>> layer). The only unusual feature is that they carry two bios. I think >>>>>> the drivers that actually support bidirectional will be a rarity, so it >>>>>> might even be advisable to add it to the queue capability (refuse >>>>>> bidirectional requests at the top rather than perturbing all the drivers >>>>>> to process them). >>>>>> >>>>>> So, what about REQ_TYPE_BIDIRECTIONAL rather than REQ_BIDI? That will >>>>>> remove it from the standard path and put it on the special command type >>>>>> path where we can process it specially. Additionally, if you take this >>>>>> approach, you can probably simply chain the second bio through >>>>>> req->special as an additional request in the stream. The only thing >>>>>> that would then need modification would be the dequeue of the block >>>>>> driver (it would have to dequeue both requests and prepare them) and >>>>>> that needs to be done only for drivers handling bidirectional requests. >>>>> I agree, I'm really not crazy about shuffling the entire request setup >>>>> around just for something as exotic as bidirection commands. How about >>>>> just keeping it simple - have a second request linked off the first one >>>>> for the second data phase? So keep it completely seperate, not just >>>>> overload ->special for 2nd bio list. >>>>> >>>>> So basically just add a struct request pointer, so you can do rq = >>>>> rq->next_rq or something for the next data phase. I bet this would be a >>>>> LOT less invasive as well, and we can get by with a few helpers to >>>>> support it. >>>>> >>>>> And it should definitely be a request type. >>>>> >>>> I'm a bit confused since what you both suggest is very similar to what we've >>>> proposed back in October 2006 and the impression we got was that it will be >>>> better to support bidirectional block requests natively (yet to be honest, >>>> James, you wanted a linked request all along). >>> It still has to be implemented natively at the block layer, just >>> differently like described above. So instead of messing all over the >>> block layer adding rq_uni() stuff, just add that struct request pointer >>> to the request structure for the 2nd data phase. You can relatively easy >>> then modify the block layer helpers to support mapping and setup of such >>> requests. 
>>> >>>> Before we go on that route again, how do you see the support for bidi >>>> at the scsi mid-layer done? Again, we prefer to support that officially >>>> using two struct scsi_cmnd_buff instances in struct scsi_cmnd and not as >>>> a one-off feature, using special-purpose state and logic (e.g. a linked >>>> struct scsi_cmd for the bidi_read sg list). >>> The SCSI part is up to James, that can be done as either inside a single >>> scsi command, or as linked scsi commands as well. I don't care too much >>> about that bit, just the block layer parts :-). And the proposed block >>> layer design can be used both ways by the scsi layer. >> Linked SCSI commands have been obsolete since SPC-4 rev 6 >> (18 July 2006) after proposal 06-259r1 was accepted. That >> proposal starts: "The reasons for linked commands have been >> overtaken by time and events." I haven't see anyone mourning >> their demise on the t10 reflector. > > This has nothing to do with linked commands as defined in the SCSI spec. > >> Mapping two requests to one bidi SCSI command might make error >> handling more of a challenge. > > Then go the other way, a command for each. Not a big deal. > Hi Jens, James, Thanks for your response! Please consider the attached proposal. It is a complete block-level bidi implementation that is, I hope, a middle ground which will keep everyone happy (including Christoph). It is both quite small and not invasive, yet has a full bidi API that is easy to use and maintain. The patches take into account Douglas's concern as well as Jens's and James's. 1. Flags and "direction" are kept the same as before. I have only shifted them around a bit so they can work with bidi semantics as well. It is more of a necessary cleanup of weak code. (Patches 1 && 2). Thanks for the offer to use a new REQ_TYPE_XXX, but, as you can see below, it is not needed and bidi can safely be handled by the REQ_TYPE_BLOCK_PC paths. 2. The C language has what are called nameless (anonymous) structs and unions. I have used them to enable the same bidi approach as before, but in a way that is absolutely backward code compatible (see the sketch below). So the huge patch #3 has just disappeared. The BIDI API is then implemented, considering the first and second adjustments, in much the same way as before. I have tested these patches with the IBM OSD driver with bidi-enabled SCSI-ml and iSCSI. All OSD and iSCSI tests pass. The overall adjustments are minor. Since I have basically used the old core bidi code I can say with confidence that I have not lost the stability gained by the testers/developers that are using bidi already. I would like to summarize why a second request hanging off the first request is a less than optimal solution. 1. A full request is not needed. Only the io members are needed. 2. A request is an expensive resource in the kernel (allocation, queuing, locking ...), which is a waste if you need bidi. 3. Error handling is a mess, both in the building and in the recovery of io requests. Especially considering the double layering of SCSI-ml. (With struct scsi_cmnd already hanging on req->special) 4. Lots of other code, not at all touched by this patch, will have to change so it safely ignores the extra brain-dead request. 5. Bugs can always creep into ll_rw_blk.c since it is not clear from the code itself what functions are allowed and safe to use with the second io-only request and what functions are only allowed on the main request. With my approach the division of code is very clear.
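[Editorial sketch of the nameless struct/union trick from point 2 above, as I read it; the attached patches are not quoted in full here, so member names are illustrative. Old code that says req->bio or req->nr_sectors keeps compiling untouched, while bidi-aware code can take the very same storage as a request_io_part through req->uni.]

struct bio;

struct request_io_part_sketch {
	struct bio	*bio;
	struct bio	*biotail;
	unsigned long	nr_sectors;
	/* ... the remaining per-direction io fields ... */
};

/* Stand-in for struct request built on anonymous struct/union members. */
struct request_sketch {
	union {
		struct request_io_part_sketch uni;	/* bidi-aware view */
		struct {				/* legacy field names */
			struct bio	*bio;
			struct bio	*biotail;
			unsigned long	nr_sectors;
			/* ... must mirror request_io_part exactly ... */
		};
	};
	struct request_io_part_sketch bidi_read;	/* second data phase */
};

With such a layout rq_uni(req) can simply return &req->uni and legacy call sites need no text changes at all, which is presumably how "the huge patch #3 has just disappeared".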
Concerning what James said about bidi capability been a property of the Q of only these devices that support it. Yes! Block level should not allow bidi access to devices that do not support it. Otherwise, through bsg, (when it will be available), user-mode can DOS the system by sending bidi commands to legacy devices. How should a device advertise this capability? Please note that these patches are over 2.6.21-rc5 linux-2.6-block tree and will need to be updated and cleaned for proper submission. Please every one comment so we can proceed in the direction of the final solution. Pros are as welcome as Cons ;) Thanks in advance Boaz Harrosh [-- Attachment #2: 0001-rq_direction-is_sync-and-rw-flags-cleanup.patch --] [-- Type: text/plain, Size: 7954 bytes --] >From 73c94d6b7e41523d44e7787617c8a1abb351326f Mon Sep 17 00:00:00 2001 From: Boaz Harrosh <bharrosh@bh-buildlin2.(none)> Date: Sun, 29 Apr 2007 16:11:11 +0300 Subject: [PATCH] rq_direction - is_sync and rw flags cleanup - is_sync is it's own bool in call to elev{,ator}_may_queue{,fn} - set some policy on when rw flag is set - alloc starts as read (0) - get_request() or __make_request() will set to write acourding to parameter or bio information --- block/as-iosched.c | 2 +- block/cfq-iosched.c | 6 +++--- block/elevator.c | 4 ++-- block/ll_rw_blk.c | 39 +++++++++++++-------------------------- include/linux/elevator.h | 4 ++-- 5 files changed, 21 insertions(+), 34 deletions(-) diff --git a/block/as-iosched.c b/block/as-iosched.c index ef12627..824d93e 100644 --- a/block/as-iosched.c +++ b/block/as-iosched.c @@ -1285,7 +1285,7 @@ static void as_work_handler(struct work_struct *work) spin_unlock_irqrestore(q->queue_lock, flags); } -static int as_may_queue(request_queue_t *q, int rw) +static int as_may_queue(request_queue_t *q, int rw, int is_sync) { int ret = ELV_MQUEUE_MAY; struct as_data *ad = q->elevator->elevator_data; diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index b6491c0..1392ee9 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -226,7 +226,7 @@ static inline pid_t cfq_queue_pid(struct task_struct *task, int rw, int is_sync) /* * Use the per-process queue, for read requests and syncronous writes */ - if (!(rw & REQ_RW) || is_sync) + if (!(rw == WRITE) || is_sync) return task->pid; return CFQ_KEY_ASYNC; @@ -1787,14 +1787,14 @@ static inline int __cfq_may_queue(struct cfq_queue *cfqq) return ELV_MQUEUE_MAY; } -static int cfq_may_queue(request_queue_t *q, int rw) +static int cfq_may_queue(request_queue_t *q, int rw, int is_sync) { struct cfq_data *cfqd = q->elevator->elevator_data; struct task_struct *tsk = current; struct cfq_queue *cfqq; unsigned int key; - key = cfq_queue_pid(tsk, rw, rw & REQ_RW_SYNC); + key = cfq_queue_pid(tsk, rw, is_sync); /* * don't force setup of a queue from here, as a call to may_queue diff --git a/block/elevator.c b/block/elevator.c index 96a00c8..eae857f 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -845,12 +845,12 @@ void elv_put_request(request_queue_t *q, struct request *rq) e->ops->elevator_put_req_fn(rq); } -int elv_may_queue(request_queue_t *q, int rw) +int elv_may_queue(request_queue_t *q, int rw, int is_sync) { elevator_t *e = q->elevator; if (e->ops->elevator_may_queue_fn) - return e->ops->elevator_may_queue_fn(q, rw); + return e->ops->elevator_may_queue_fn(q, rw, is_sync); return ELV_MQUEUE_MAY; } diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c index 3de0695..32daa55 100644 --- a/block/ll_rw_blk.c +++ b/block/ll_rw_blk.c @@ -1958,18 +1958,14 @@ static inline 
void blk_free_request(request_queue_t *q, struct request *rq) } static struct request * -blk_alloc_request(request_queue_t *q, int rw, int priv, gfp_t gfp_mask) +blk_alloc_request(request_queue_t *q, int priv, gfp_t gfp_mask) { struct request *rq = mempool_alloc(q->rq.rq_pool, gfp_mask); if (!rq) return NULL; - /* - * first three bits are identical in rq->cmd_flags and bio->bi_rw, - * see bio.h and blkdev.h - */ - rq->cmd_flags = rw | REQ_ALLOCED; + rq->cmd_flags = REQ_ALLOCED; if (priv) { if (unlikely(elv_set_request(q, rq, gfp_mask))) { @@ -2055,16 +2051,17 @@ static void freed_request(request_queue_t *q, int rw, int priv) * Returns NULL on failure, with queue_lock held. * Returns !NULL on success, with queue_lock *not held*. */ -static struct request *get_request(request_queue_t *q, int rw_flags, - struct bio *bio, gfp_t gfp_mask) +static struct request *get_request(request_queue_t *q, int rw, + struct bio *bio, gfp_t gfp_mask) { struct request *rq = NULL; struct request_list *rl = &q->rq; struct io_context *ioc = NULL; - const int rw = rw_flags & 0x01; int may_queue, priv; + int is_sync = (rw==READ) || (bio && bio_sync(bio)); + WARN_ON(bio && (bio_data_dir(bio) != (rw==WRITE))); - may_queue = elv_may_queue(q, rw_flags); + may_queue = elv_may_queue(q, rw, is_sync); if (may_queue == ELV_MQUEUE_NO) goto rq_starved; @@ -2112,7 +2109,7 @@ static struct request *get_request(request_queue_t *q, int rw_flags, spin_unlock_irq(q->queue_lock); - rq = blk_alloc_request(q, rw_flags, priv, gfp_mask); + rq = blk_alloc_request(q, priv, gfp_mask); if (unlikely(!rq)) { /* * Allocation failed presumably due to memory. Undo anything @@ -2147,6 +2144,7 @@ rq_starved: if (ioc_batching(q, ioc)) ioc->nr_batch_requests--; + rq->cmd_flags |= (rw==WRITE); rq_init(q, rq); blk_add_trace_generic(q, bio, rw, BLK_TA_GETRQ); @@ -2160,13 +2158,12 @@ out: * * Called with q->queue_lock held, and returns with it unlocked. */ -static struct request *get_request_wait(request_queue_t *q, int rw_flags, +static struct request *get_request_wait(request_queue_t *q, int rw, struct bio *bio) { - const int rw = rw_flags & 0x01; struct request *rq; - rq = get_request(q, rw_flags, bio, GFP_NOIO); + rq = get_request(q, rw, bio, GFP_NOIO); while (!rq) { DEFINE_WAIT(wait); struct request_list *rl = &q->rq; @@ -2174,7 +2171,7 @@ static struct request *get_request_wait(request_queue_t *q, int rw_flags, prepare_to_wait_exclusive(&rl->wait[rw], &wait, TASK_UNINTERRUPTIBLE); - rq = get_request(q, rw_flags, bio, GFP_NOIO); + rq = get_request(q, rw, bio, GFP_NOIO); if (!rq) { struct io_context *ioc; @@ -2908,7 +2905,6 @@ static int __make_request(request_queue_t *q, struct bio *bio) int el_ret, nr_sectors, barrier, err; const unsigned short prio = bio_prio(bio); const int sync = bio_sync(bio); - int rw_flags; nr_sectors = bio_sectors(bio); @@ -2983,19 +2979,10 @@ static int __make_request(request_queue_t *q, struct bio *bio) get_rq: /* - * This sync check and mask will be re-done in init_request_from_bio(), - * but we need to set it earlier to expose the sync flag to the - * rq allocator and io schedulers. - */ - rw_flags = bio_data_dir(bio); - if (sync) - rw_flags |= REQ_RW_SYNC; - - /* * Grab a free request. This is might sleep but can not fail. * Returns with the queue unlocked. 
*/ - req = get_request_wait(q, rw_flags, bio); + req = get_request_wait(q, bio_data_dir(bio), bio); /* * After dropping the lock and possibly sleeping here, our request diff --git a/include/linux/elevator.h b/include/linux/elevator.h index e88fcbc..c947f71 100644 --- a/include/linux/elevator.h +++ b/include/linux/elevator.h @@ -20,7 +20,7 @@ typedef void (elevator_add_req_fn) (request_queue_t *, struct request *); typedef int (elevator_queue_empty_fn) (request_queue_t *); typedef struct request *(elevator_request_list_fn) (request_queue_t *, struct request *); typedef void (elevator_completed_req_fn) (request_queue_t *, struct request *); -typedef int (elevator_may_queue_fn) (request_queue_t *, int); +typedef int (elevator_may_queue_fn) (request_queue_t *, int, int); typedef int (elevator_set_req_fn) (request_queue_t *, struct request *, gfp_t); typedef void (elevator_put_req_fn) (struct request *); @@ -111,7 +111,7 @@ extern struct request *elv_former_request(request_queue_t *, struct request *); extern struct request *elv_latter_request(request_queue_t *, struct request *); extern int elv_register_queue(request_queue_t *q); extern void elv_unregister_queue(request_queue_t *q); -extern int elv_may_queue(request_queue_t *, int); +extern int elv_may_queue(request_queue_t *, int, int); extern void elv_completed_request(request_queue_t *, struct request *); extern int elv_set_request(request_queue_t *, struct request *, gfp_t); extern void elv_put_request(request_queue_t *, struct request *); -- 1.5.0.4.402.g8035 [-- Attachment #3: 0002-rq_direction-direction-API-and-cleanups.patch --] [-- Type: text/plain, Size: 12162 bytes --] >From 8d2b3d084da6d7ff9f7fb817b877c6e1b7759028 Mon Sep 17 00:00:00 2001 From: Boaz Harrosh <bharrosh@bh-buildlin2.(none)> Date: Sun, 29 Apr 2007 16:18:31 +0300 Subject: [PATCH] rq_direction - direction API and cleanups - define rq_rw_dir() to extract direction from cmd_flgs and translate to WRITE|READ - rq_data_dir() will WARN_ON bidi before returning rq_rw_dir() - rq_dma_dir() translate request state to a dma_data_direction enum - change some users of rq_data dir() to rq_rw_dir() in ll_rw_blk.c elevator.c and scsi_lib.c - simplify scsi_lib.c command prep in regard to direction. - clean wrong use of DMA_BIDIRECTIONAL. - BIO flags and REQ flags no longer match. Remove comments and do a proper translation between the 2 systems. 
(Please look in ll_rw_blk.c/blk_rq_bio_prep below if we need more flags) --- block/deadline-iosched.c | 8 +++--- block/elevator.c | 5 ++- block/ll_rw_blk.c | 37 +++++++++++++++++++++++--------- drivers/scsi/scsi_error.c | 2 +- drivers/scsi/scsi_lib.c | 13 +++++------ drivers/scsi/sg.c | 2 - include/linux/blkdev.h | 47 +++++++++++++++++++++++++++++++++++++++-- include/linux/blktrace_api.h | 8 ++++++- 8 files changed, 91 insertions(+), 31 deletions(-) diff --git a/block/deadline-iosched.c b/block/deadline-iosched.c index 6d673e9..e605c09 100644 --- a/block/deadline-iosched.c +++ b/block/deadline-iosched.c @@ -53,7 +53,7 @@ struct deadline_data { static void deadline_move_request(struct deadline_data *, struct request *); -#define RQ_RB_ROOT(dd, rq) (&(dd)->sort_list[rq_data_dir((rq))]) +#define RQ_RB_ROOT(dd, rq) (&(dd)->sort_list[rq_rw_dir((rq))]) static void deadline_add_rq_rb(struct deadline_data *dd, struct request *rq) @@ -72,7 +72,7 @@ retry: static inline void deadline_del_rq_rb(struct deadline_data *dd, struct request *rq) { - const int data_dir = rq_data_dir(rq); + const int data_dir = rq_rw_dir(rq); if (dd->next_rq[data_dir] == rq) { struct rb_node *rbnext = rb_next(&rq->rb_node); @@ -92,7 +92,7 @@ static void deadline_add_request(struct request_queue *q, struct request *rq) { struct deadline_data *dd = q->elevator->elevator_data; - const int data_dir = rq_data_dir(rq); + const int data_dir = rq_rw_dir(rq); deadline_add_rq_rb(dd, rq); @@ -197,7 +197,7 @@ deadline_move_to_dispatch(struct deadline_data *dd, struct request *rq) static void deadline_move_request(struct deadline_data *dd, struct request *rq) { - const int data_dir = rq_data_dir(rq); + const int data_dir = rq_rw_dir(rq); struct rb_node *rbnext = rb_next(&rq->rb_node); dd->next_rq[READ] = NULL; diff --git a/block/elevator.c b/block/elevator.c index eae857f..18485f0 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -76,7 +76,7 @@ inline int elv_rq_merge_ok(struct request *rq, struct bio *bio) /* * different data direction or already started, don't merge */ - if (bio_data_dir(bio) != rq_data_dir(rq)) + if (bio_data_dir(bio) != rq_rw_dir(rq)) return 0; /* @@ -733,7 +733,8 @@ struct request *elv_next_request(request_queue_t *q) blk_add_trace_rq(q, rq, BLK_TA_ISSUE); } - if (!q->boundary_rq || q->boundary_rq == rq) { + if ((!q->boundary_rq || q->boundary_rq == rq) && + !rq_is_bidi(rq)) { q->end_sector = rq_end_sector(rq); q->boundary_rq = NULL; } diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c index 32daa55..0c78540 100644 --- a/block/ll_rw_blk.c +++ b/block/ll_rw_blk.c @@ -2332,7 +2332,7 @@ static int __blk_rq_map_user(request_queue_t *q, struct request *rq, struct bio *bio, *orig_bio; int reading, ret; - reading = rq_data_dir(rq) == READ; + reading = rq_rw_dir(rq) == READ; /* * if alignment requirement is satisfied, map in user pages for @@ -2476,7 +2476,7 @@ int blk_rq_map_user_iov(request_queue_t *q, struct request *rq, /* we don't allow misaligned data like bio_map_user() does. 
If the * user is using sg, they're expected to know the alignment constraints * and respect them accordingly */ - bio = bio_map_user_iov(q, NULL, iov, iov_count, rq_data_dir(rq)== READ); + bio = bio_map_user_iov(q, NULL, iov, iov_count, rq_rw_dir(rq)== READ); if (IS_ERR(bio)) return PTR_ERR(bio); @@ -2549,7 +2549,7 @@ int blk_rq_map_kern(request_queue_t *q, struct request *rq, void *kbuf, if (IS_ERR(bio)) return PTR_ERR(bio); - if (rq_data_dir(rq) == WRITE) + if (rq_rw_dir(rq) == WRITE) bio->bi_rw |= (1 << BIO_RW); blk_rq_bio_prep(q, rq, bio); @@ -2660,7 +2660,7 @@ EXPORT_SYMBOL(blkdev_issue_flush); static void drive_stat_acct(struct request *rq, int nr_sectors, int new_io) { - int rw = rq_data_dir(rq); + int rw = rq_rw_dir(rq); if (!blk_fs_request(rq) || !rq->rq_disk) return; @@ -2738,7 +2738,7 @@ void __blk_put_request(request_queue_t *q, struct request *req) * it didn't come out of our reserved rq pools */ if (req->cmd_flags & REQ_ALLOCED) { - int rw = rq_data_dir(req); + int rw = rq_rw_dir(req); int priv = req->cmd_flags & REQ_ELVPRIV; BUG_ON(!list_empty(&req->queuelist)); @@ -2804,7 +2804,7 @@ static int attempt_merge(request_queue_t *q, struct request *req, if (req->sector + req->nr_sectors != next->sector) return 0; - if (rq_data_dir(req) != rq_data_dir(next) + if (rq_rw_dir(req) != rq_rw_dir(next) || req->rq_disk != next->rq_disk || next->special) return 0; @@ -3333,7 +3333,7 @@ static int __end_that_request_first(struct request *req, int uptodate, if (!blk_pc_request(req)) req->errors = 0; - if (!uptodate) { + if (error) { if (blk_fs_request(req) && !(req->cmd_flags & REQ_QUIET)) printk("end_request: I/O error, dev %s, sector %llu\n", req->rq_disk ? req->rq_disk->disk_name : "?", @@ -3341,7 +3341,7 @@ static int __end_that_request_first(struct request *req, int uptodate, } if (blk_fs_request(req) && req->rq_disk) { - const int rw = rq_data_dir(req); + const int rw = rq_rw_dir(req); disk_stat_add(req->rq_disk, sectors[rw], nr_bytes >> 9); } @@ -3565,7 +3565,7 @@ void end_that_request_last(struct request *req, int uptodate) */ if (disk && blk_fs_request(req) && req != &req->q->bar_rq) { unsigned long duration = jiffies - req->start_time; - const int rw = rq_data_dir(req); + const int rw = rq_rw_dir(req); __disk_stat_inc(disk, ios[rw]); __disk_stat_add(disk, ticks[rw], duration); @@ -3593,8 +3593,23 @@ EXPORT_SYMBOL(end_request); void blk_rq_bio_prep(request_queue_t *q, struct request *rq, struct bio *bio) { - /* first two bits are identical in rq->cmd_flags and bio->bi_rw */ - rq->cmd_flags |= (bio->bi_rw & 3); + if (bio_data_dir(bio)) + rq->cmd_flags |= REQ_RW; + else + rq->cmd_flags &= ~REQ_RW; + + if (bio->bi_rw & (1<<BIO_RW_SYNC)) + rq->cmd_flags |= REQ_RW_SYNC; + else + rq->cmd_flags &= ~REQ_RW_SYNC; + /* FIXME: what about other flags, should we sync these too? */ + /* + BIO_RW_AHEAD ==> ?? 
+ BIO_RW_BARRIER ==> REQ_SOFTBARRIER/REQ_HARDBARRIER + BIO_RW_FAILFAST ==> REQ_FAILFAST + BIO_RW_SYNC ==> REQ_RW_SYNC + BIO_RW_META ==> REQ_RW_META + */ rq->nr_phys_segments = bio_phys_segments(q, bio); rq->nr_hw_segments = bio_hw_segments(q, bio); diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 918bb60..c528ab1 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -1688,7 +1688,7 @@ scsi_reset_provider(struct scsi_device *dev, int flag) scmd->cmd_len = 0; - scmd->sc_data_direction = DMA_BIDIRECTIONAL; + scmd->sc_data_direction = DMA_NONE; init_timer(&scmd->eh_timeout); diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 9f7482d..1fc0471 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -259,8 +259,10 @@ static int scsi_merge_bio(struct request *rq, struct bio *bio) struct request_queue *q = rq->q; bio->bi_flags &= ~(1 << BIO_SEG_VALID); - if (rq_data_dir(rq) == WRITE) + if (rq_rw_dir(rq) == WRITE) bio->bi_rw |= (1 << BIO_RW); + else + bio->bi_rw &= ~(1 << BIO_RW); blk_queue_bounce(q, &bio); if (!rq->bio) @@ -392,6 +394,8 @@ int scsi_execute_async(struct scsi_device *sdev, const unsigned char *cmd, if (!sioc) return DRIVER_ERROR << 24; + WARN_ON((data_direction == DMA_NONE) && bufflen); + WARN_ON((data_direction != DMA_NONE) && !bufflen); req = blk_get_request(sdev->request_queue, write, gfp); if (!req) goto free_sense; @@ -1124,12 +1128,7 @@ static int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req) BUILD_BUG_ON(sizeof(req->cmd) > sizeof(cmd->cmnd)); memcpy(cmd->cmnd, req->cmd, sizeof(cmd->cmnd)); cmd->cmd_len = req->cmd_len; - if (!req->data_len) - cmd->sc_data_direction = DMA_NONE; - else if (rq_data_dir(req) == WRITE) - cmd->sc_data_direction = DMA_TO_DEVICE; - else - cmd->sc_data_direction = DMA_FROM_DEVICE; + cmd->sc_data_direction = rq_dma_dir(req); cmd->transfersize = req->data_len; cmd->allowed = req->retries; diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 81e3bc7..46a1f7e 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -733,8 +733,6 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp, data_dir = DMA_TO_DEVICE; break; case SG_DXFER_UNKNOWN: - data_dir = DMA_BIDIRECTIONAL; - break; default: data_dir = DMA_NONE; break; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 83dcd8c..c1121d2 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -14,6 +14,7 @@ #include <linux/bio.h> #include <linux/module.h> #include <linux/stringify.h> +#include <linux/dma-mapping.h> #include <asm/scatterlist.h> @@ -177,7 +178,7 @@ enum { }; /* - * request type modified bits. first three bits match BIO_RW* bits, important + * request type modified bits. */ enum rq_flag_bits { __REQ_RW, /* not set, read. set, write */ @@ -545,12 +546,52 @@ enum { #define list_entry_rq(ptr) list_entry((ptr), struct request, queuelist) -#define rq_data_dir(rq) ((rq)->cmd_flags & 1) +static inline int rq_is_bidi(struct request* rq) +{ + /* + * FIXME: It is needed below. Will be changed later in the + * patchset to a real check, and fixme will be removed. + */ + return false; +} + +static inline int rq_rw_dir(struct request* rq) +{ + int old_check = (rq->cmd_flags & REQ_RW) ? WRITE : READ; +/*#ifdef 0 + int ret = (rq->bio && bio_data_dir(rq->bio)) ? 
WRITE : READ; + WARN_ON(ret != old_check ); +#endif*/ + return old_check; +} + +static inline int rq_data_dir(struct request* rq) +{ + WARN_ON(rq_is_bidi(rq)); + return rq_rw_dir(rq); +} +static inline enum dma_data_direction rq_dma_dir(struct request* rq) +{ + WARN_ON(rq_is_bidi(rq)); + if (!rq->bio) + return DMA_NONE; + else + return bio_data_dir(rq->bio) ? DMA_TO_DEVICE : DMA_FROM_DEVICE; +} +static inline const char* rq_dir_to_string(struct request* rq) +{ + if (!rq->bio) + return "no data command"; + else + return bio_data_dir(rq->bio) ? + "writing" : + "reading"; +} /* * We regard a request as sync, if it's a READ or a SYNC write. */ -#define rq_is_sync(rq) (rq_data_dir((rq)) == READ || (rq)->cmd_flags & REQ_RW_SYNC) +#define rq_is_sync(rq) (rq_rw_dir((rq)) == READ || (rq)->cmd_flags & REQ_RW_SYNC) #define rq_is_meta(rq) ((rq)->cmd_flags & REQ_RW_META) static inline int blk_queue_full(struct request_queue *q, int rw) diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h index 3680ff9..d9665b1 100644 --- a/include/linux/blktrace_api.h +++ b/include/linux/blktrace_api.h @@ -161,7 +161,13 @@ static inline void blk_add_trace_rq(struct request_queue *q, struct request *rq, u32 what) { struct blk_trace *bt = q->blk_trace; - int rw = rq->cmd_flags & 0x03; + /* blktrace.c prints them according to bio flags */ + int rw = (((rq_rw_dir(rq) == WRITE) << BIO_RW) | + (((rq->cmd_flags & (REQ_SOFTBARRIER|REQ_HARDBARRIER)) != 0) << + BIO_RW_BARRIER) | + (((rq->cmd_flags & REQ_FAILFAST) != 0) << BIO_RW_FAILFAST) | + (((rq->cmd_flags & REQ_RW_SYNC) != 0) << BIO_RW_SYNC) | + (((rq->cmd_flags & REQ_RW_META) != 0) << BIO_RW_META)); if (likely(!bt)) return; -- 1.5.0.4.402.g8035 [-- Attachment #4: 0003-block-bidi-support.patch --] [-- Type: text/plain, Size: 23631 bytes --] >From 7aeec62fe483359289aad9286f8dda149f2ce0d4 Mon Sep 17 00:00:00 2001 From: Boaz Harrosh <bharrosh@bh-buildlin2.(none)> Date: Tue, 1 May 2007 21:09:50 +0300 Subject: [PATCH] block bidi support - seperate request io members into a substructure (but in a backward compatible way) and add a second set of members for bidi_read. 
- Add some bidi helpers to work on a bidi request: rq_in(), rq_out(), rq_io() blk_rq_bio_prep_bidi() blk_rq_map_kern_bidi() blk_rq_map_sg_bidi() - change ll_back_merge_fn to support bidi / change only user - scsi_lib.c (Both will be removed in a future scsi cleanup) - Add end_that_request_block that can clean after a bidi request --- block/elevator.c | 7 +-- block/ll_rw_blk.c | 214 ++++++++++++++++++++++++++++++++++++----------- drivers/scsi/scsi_lib.c | 2 +- include/linux/blkdev.h | 152 ++++++++++++++++++++++++---------- 4 files changed, 276 insertions(+), 99 deletions(-) diff --git a/block/elevator.c b/block/elevator.c index 18485f0..90f333e 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -755,14 +755,9 @@ struct request *elv_next_request(request_queue_t *q) rq = NULL; break; } else if (ret == BLKPREP_KILL) { - int nr_bytes = rq->hard_nr_sectors << 9; - - if (!nr_bytes) - nr_bytes = rq->data_len; - blkdev_dequeue_request(rq); rq->cmd_flags |= REQ_QUIET; - end_that_request_chunk(rq, 0, nr_bytes); + end_that_request_block(rq, 0); end_that_request_last(rq, 0); } else { printk(KERN_ERR "%s: bad return=%d\n", __FUNCTION__, diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c index 0c78540..7d98ba6 100644 --- a/block/ll_rw_blk.c +++ b/block/ll_rw_blk.c @@ -235,13 +235,19 @@ void blk_queue_make_request(request_queue_t * q, make_request_fn * mfn) EXPORT_SYMBOL(blk_queue_make_request); +static void rq_init_io_part(struct request_io_part* req_io) +{ + req_io->data_len = 0; + req_io->nr_phys_segments = 0; + req_io->bio = req_io->biotail = NULL; +} + static void rq_init(request_queue_t *q, struct request *rq) { INIT_LIST_HEAD(&rq->queuelist); INIT_LIST_HEAD(&rq->donelist); rq->errors = 0; - rq->bio = rq->biotail = NULL; INIT_HLIST_NODE(&rq->hash); RB_CLEAR_NODE(&rq->rb_node); rq->ioprio = 0; @@ -249,13 +255,13 @@ static void rq_init(request_queue_t *q, struct request *rq) rq->ref_count = 1; rq->q = q; rq->special = NULL; - rq->data_len = 0; rq->data = NULL; - rq->nr_phys_segments = 0; rq->sense = NULL; rq->end_io = NULL; rq->end_io_data = NULL; rq->completion_data = NULL; + rq_init_io_part(&rq->uni); + rq_init_io_part(&rq->bidi_read); } /** @@ -1304,14 +1310,16 @@ static int blk_hw_contig_segment(request_queue_t *q, struct bio *bio, } /* - * map a request to scatterlist, return number of sg entries setup. Caller - * must make sure sg can hold rq->nr_phys_segments entries + * map a request_io_part to scatterlist, return number of sg entries setup. + * Caller must make sure sg can hold rq_io(rq, rw)->nr_phys_segments entries */ -int blk_rq_map_sg(request_queue_t *q, struct request *rq, struct scatterlist *sg) +int blk_rq_map_sg_bidi(request_queue_t *q, struct request *rq, + struct scatterlist *sg, int rw) { struct bio_vec *bvec, *bvprv; struct bio *bio; int nsegs, i, cluster; + struct request_io_part* req_io = rq_io(rq, rw); nsegs = 0; cluster = q->queue_flags & (1 << QUEUE_FLAG_CLUSTER); @@ -1320,7 +1328,7 @@ int blk_rq_map_sg(request_queue_t *q, struct request *rq, struct scatterlist *sg * for each bio in rq */ bvprv = NULL; - rq_for_each_bio(bio, rq) { + for (bio = req_io->bio; bio; bio = bio->bi_next) { /* * for each segment in bio */ @@ -1352,7 +1360,17 @@ new_segment: return nsegs; } +EXPORT_SYMBOL(blk_rq_map_sg_bidi); +/* + * map a request to scatterlist, return number of sg entries setup. 
Caller + * must make sure sg can hold rq->nr_phys_segments entries + */ +int blk_rq_map_sg(request_queue_t *q, struct request *rq, + struct scatterlist *sg) +{ + return blk_rq_map_sg_bidi(q, rq, sg, rq_data_dir(rq)); +} EXPORT_SYMBOL(blk_rq_map_sg); /* @@ -1362,11 +1380,12 @@ EXPORT_SYMBOL(blk_rq_map_sg); static inline int ll_new_mergeable(request_queue_t *q, struct request *req, - struct bio *bio) + struct bio *bio, + struct request_io_part* req_io) { int nr_phys_segs = bio_phys_segments(q, bio); - if (req->nr_phys_segments + nr_phys_segs > q->max_phys_segments) { + if (req_io->nr_phys_segments + nr_phys_segs > q->max_phys_segments) { req->cmd_flags |= REQ_NOMERGE; if (req == q->last_merge) q->last_merge = NULL; @@ -1377,19 +1396,20 @@ static inline int ll_new_mergeable(request_queue_t *q, * A hw segment is just getting larger, bump just the phys * counter. */ - req->nr_phys_segments += nr_phys_segs; + req_io->nr_phys_segments += nr_phys_segs; return 1; } static inline int ll_new_hw_segment(request_queue_t *q, struct request *req, - struct bio *bio) + struct bio *bio, + struct request_io_part* req_io) { int nr_hw_segs = bio_hw_segments(q, bio); int nr_phys_segs = bio_phys_segments(q, bio); - if (req->nr_hw_segments + nr_hw_segs > q->max_hw_segments - || req->nr_phys_segments + nr_phys_segs > q->max_phys_segments) { + if (req_io->nr_hw_segments + nr_hw_segs > q->max_hw_segments + || req_io->nr_phys_segments + nr_phys_segs > q->max_phys_segments) { req->cmd_flags |= REQ_NOMERGE; if (req == q->last_merge) q->last_merge = NULL; @@ -1400,46 +1420,48 @@ static inline int ll_new_hw_segment(request_queue_t *q, * This will form the start of a new hw segment. Bump both * counters. */ - req->nr_hw_segments += nr_hw_segs; - req->nr_phys_segments += nr_phys_segs; + req_io->nr_hw_segments += nr_hw_segs; + req_io->nr_phys_segments += nr_phys_segs; return 1; } -int ll_back_merge_fn(request_queue_t *q, struct request *req, struct bio *bio) +int ll_back_merge_fn(request_queue_t *q, struct request *req, struct bio *bio, int rw) { unsigned short max_sectors; int len; + struct request_io_part* req_io = rq_io(req, rw); if (unlikely(blk_pc_request(req))) max_sectors = q->max_hw_sectors; else max_sectors = q->max_sectors; - if (req->nr_sectors + bio_sectors(bio) > max_sectors) { + if (req_io->nr_sectors + bio_sectors(bio) > max_sectors) { req->cmd_flags |= REQ_NOMERGE; if (req == q->last_merge) q->last_merge = NULL; return 0; } - if (unlikely(!bio_flagged(req->biotail, BIO_SEG_VALID))) - blk_recount_segments(q, req->biotail); + if (unlikely(!bio_flagged(req_io->biotail, BIO_SEG_VALID))) + blk_recount_segments(q, req_io->biotail); if (unlikely(!bio_flagged(bio, BIO_SEG_VALID))) blk_recount_segments(q, bio); - len = req->biotail->bi_hw_back_size + bio->bi_hw_front_size; - if (BIOVEC_VIRT_MERGEABLE(__BVEC_END(req->biotail), __BVEC_START(bio)) && + len = req_io->biotail->bi_hw_back_size + bio->bi_hw_front_size; + if (BIOVEC_VIRT_MERGEABLE(__BVEC_END(req_io->biotail), + __BVEC_START(bio)) && !BIOVEC_VIRT_OVERSIZE(len)) { - int mergeable = ll_new_mergeable(q, req, bio); + int mergeable = ll_new_mergeable(q, req, bio, req_io); if (mergeable) { - if (req->nr_hw_segments == 1) - req->bio->bi_hw_front_size = len; + if (req_io->nr_hw_segments == 1) + req_io->bio->bi_hw_front_size = len; if (bio->bi_hw_segments == 1) bio->bi_hw_back_size = len; } return mergeable; } - return ll_new_hw_segment(q, req, bio); + return ll_new_hw_segment(q, req, bio, req_io); } EXPORT_SYMBOL(ll_back_merge_fn); @@ -1454,6 +1476,7 @@ static int 
ll_front_merge_fn(request_queue_t *q, struct request *req, else max_sectors = q->max_sectors; + WARN_ON(rq_is_bidi(req)); if (req->nr_sectors + bio_sectors(bio) > max_sectors) { req->cmd_flags |= REQ_NOMERGE; @@ -1468,7 +1491,7 @@ static int ll_front_merge_fn(request_queue_t *q, struct request *req, blk_recount_segments(q, req->bio); if (BIOVEC_VIRT_MERGEABLE(__BVEC_END(bio), __BVEC_START(req->bio)) && !BIOVEC_VIRT_OVERSIZE(len)) { - int mergeable = ll_new_mergeable(q, req, bio); + int mergeable = ll_new_mergeable(q, req, bio, &req->uni); if (mergeable) { if (bio->bi_hw_segments == 1) @@ -1479,7 +1502,7 @@ static int ll_front_merge_fn(request_queue_t *q, struct request *req, return mergeable; } - return ll_new_hw_segment(q, req, bio); + return ll_new_hw_segment(q, req, bio, &req->uni); } static int ll_merge_requests_fn(request_queue_t *q, struct request *req, @@ -2358,7 +2381,7 @@ static int __blk_rq_map_user(request_queue_t *q, struct request *rq, if (!rq->bio) blk_rq_bio_prep(q, rq, bio); - else if (!ll_back_merge_fn(q, rq, bio)) { + else if (!ll_back_merge_fn(q, rq, bio, rq_data_dir(rq))) { ret = -EINVAL; goto unmap_bio; } else { @@ -2528,15 +2551,18 @@ int blk_rq_unmap_user(struct bio *bio) EXPORT_SYMBOL(blk_rq_unmap_user); /** - * blk_rq_map_kern - map kernel data to a request, for REQ_BLOCK_PC usage + * blk_rq_map_kern_bidi - maps kernel data to a request_io_part, for BIDI usage * @q: request queue where request should be inserted * @rq: request to fill * @kbuf: the kernel buffer * @len: length of user data * @gfp_mask: memory allocation flags + * @rw: if it is a bidirectional request than WRITE to prepare + * the bidi_write side or READ to prepare the bidi_read + * side, else it should be same as rq_data_dir(rq) */ -int blk_rq_map_kern(request_queue_t *q, struct request *rq, void *kbuf, - unsigned int len, gfp_t gfp_mask) +int blk_rq_map_kern_bidi(request_queue_t *q, struct request *rq, void *kbuf, + unsigned int len, gfp_t gfp_mask, int rw) { struct bio *bio; @@ -2549,14 +2575,29 @@ int blk_rq_map_kern(request_queue_t *q, struct request *rq, void *kbuf, if (IS_ERR(bio)) return PTR_ERR(bio); - if (rq_rw_dir(rq) == WRITE) + if (rw == WRITE) bio->bi_rw |= (1 << BIO_RW); - blk_rq_bio_prep(q, rq, bio); + blk_rq_bio_prep_bidi(q, rq, bio ,rw); rq->buffer = rq->data = NULL; return 0; } +EXPORT_SYMBOL(blk_rq_map_kern_bidi); + +/** + * blk_rq_map_kern - map kernel data to a request, for REQ_BLOCK_PC usage + * @q: request queue where request should be inserted + * @rq: request to fill + * @kbuf: the kernel buffer + * @len: length of user data + * @gfp_mask: memory allocation flags + */ +int blk_rq_map_kern(request_queue_t *q, struct request *rq, void *kbuf, + unsigned int len, gfp_t gfp_mask) +{ + return blk_rq_map_kern_bidi( q, rq, kbuf, len, gfp_mask, rq_data_dir(rq)); +} EXPORT_SYMBOL(blk_rq_map_kern); /** @@ -2865,6 +2906,19 @@ static inline int attempt_front_merge(request_queue_t *q, struct request *rq) return 0; } +static void init_req_io_part_from_bio(struct request_queue* q, + struct request_io_part *req_io, struct bio *bio) +{ + req_io->hard_sector = req_io->sector = bio->bi_sector; + req_io->hard_nr_sectors = req_io->nr_sectors = bio_sectors(bio); + req_io->current_nr_sectors = + req_io->hard_cur_sectors = bio_cur_sectors(bio); + req_io->nr_phys_segments = bio_phys_segments(q, bio); + req_io->nr_hw_segments = bio_hw_segments(q, bio); + req_io->bio = req_io->biotail = bio; + req_io->data_len = bio->bi_size; +} + static void init_request_from_bio(struct request *req, struct bio *bio) 
{ req->cmd_type = REQ_TYPE_FS; @@ -2887,14 +2941,10 @@ static void init_request_from_bio(struct request *req, struct bio *bio) req->cmd_flags |= REQ_RW_META; req->errors = 0; - req->hard_sector = req->sector = bio->bi_sector; - req->hard_nr_sectors = req->nr_sectors = bio_sectors(bio); - req->current_nr_sectors = req->hard_cur_sectors = bio_cur_sectors(bio); - req->nr_phys_segments = bio_phys_segments(req->q, bio); - req->nr_hw_segments = bio_hw_segments(req->q, bio); req->buffer = bio_data(bio); /* see ->buffer comment above */ - req->bio = req->biotail = bio; req->ioprio = bio_prio(bio); + WARN_ON(rq_is_bidi(req)); + init_req_io_part_from_bio(req->q, &req->uni, bio); req->rq_disk = bio->bi_bdev->bd_disk; req->start_time = jiffies; } @@ -2931,7 +2981,7 @@ static int __make_request(request_queue_t *q, struct bio *bio) case ELEVATOR_BACK_MERGE: BUG_ON(!rq_mergeable(req)); - if (!ll_back_merge_fn(q, req, bio)) + if (!ll_back_merge_fn(q, req, bio, rq_data_dir(req))) break; blk_add_trace_bio(q, bio, BLK_TA_BACKMERGE); @@ -3405,6 +3455,8 @@ static int __end_that_request_first(struct request *req, int uptodate, if (!req->bio) return 0; + WARN_ON(rq_is_bidi(req)); + /* * if the request wasn't completed, update state */ @@ -3464,6 +3516,47 @@ int end_that_request_chunk(struct request *req, int uptodate, int nr_bytes) EXPORT_SYMBOL(end_that_request_chunk); +static void __end_req_io_block(struct request_io_part *req_io, int error) +{ + struct bio *next, *bio = req_io->bio; + req_io->bio = NULL; + + for (; bio; bio = next) { + next = bio->bi_next; + bio_endio(bio, bio->bi_size, error); + } +} + +/** + * end_that_request_block - end ALL I/O on a request in one "shloop", + * including the bidi part. + * @req: the request being processed + * @uptodate: 1 for success, 0 for I/O error, < 0 for specific error + * + * Description: + * Ends ALL I/O on @req, both read/write or bidi. frees all bio resources. + **/ +void end_that_request_block(struct request *req, int uptodate) +{ + if (blk_pc_request(req)) { + int error = 0; + if (end_io_error(uptodate)) + error = !uptodate ? 
-EIO : uptodate; + blk_add_trace_rq(req->q, req, BLK_TA_COMPLETE); + + __end_req_io_block(&req->uni, error); + if (rq_is_bidi(req)) + __end_req_io_block(&req->bidi_read, 0); + } else { /* needs elevator bookeeping */ + int nr_bytes = req->uni.hard_nr_sectors << 9; + if (!nr_bytes) + nr_bytes = req->uni.data_len; + end_that_request_chunk(req, uptodate, nr_bytes); + } +} + +EXPORT_SYMBOL(end_that_request_block); + /* * splice the completion data to a local structure and hand off to * process_completion_queue() to complete the requests @@ -3591,8 +3684,40 @@ void end_request(struct request *req, int uptodate) EXPORT_SYMBOL(end_request); +static struct request_io_part* blk_rq_choose_set_io(struct request *rq, int rw) +{ + if (rw == WRITE){ + /* this is a memory leak it must not happen */ + BUG_ON((rq_rw_dir(rq) == WRITE) && (rq->uni.bio != NULL)); + if(rq->uni.bio != NULL) + rq->bidi_read = rq->uni; + rq->cmd_flags |= REQ_RW ; + return &rq->uni; + } + else { + BUG_ON((rq_rw_dir(rq) == READ) && (rq->uni.bio != NULL)); + BUG_ON(rq->bidi_read.bio != NULL); + if(rq->uni.bio != NULL) + return &rq->bidi_read; + else { + rq->cmd_flags &= ~REQ_RW ; + return &rq->uni; + } + } +} + +void blk_rq_bio_prep_bidi(request_queue_t *q, struct request *rq, + struct bio *bio, int rw) +{ + init_req_io_part_from_bio(q, blk_rq_choose_set_io(rq, rw), bio); + rq->buffer = NULL; +} +EXPORT_SYMBOL(blk_rq_bio_prep_bidi); + void blk_rq_bio_prep(request_queue_t *q, struct request *rq, struct bio *bio) { + WARN_ON(rq_is_bidi(rq)); + if (bio_data_dir(bio)) rq->cmd_flags |= REQ_RW; else @@ -3611,15 +3736,8 @@ void blk_rq_bio_prep(request_queue_t *q, struct request *rq, struct bio *bio) BIO_RW_META ==> REQ_RW_META */ - rq->nr_phys_segments = bio_phys_segments(q, bio); - rq->nr_hw_segments = bio_hw_segments(q, bio); - rq->current_nr_sectors = bio_cur_sectors(bio); - rq->hard_cur_sectors = rq->current_nr_sectors; - rq->hard_nr_sectors = rq->nr_sectors = bio_sectors(bio); + init_req_io_part_from_bio(q, &rq->uni, bio); rq->buffer = bio_data(bio); - rq->data_len = bio->bi_size; - - rq->bio = rq->biotail = bio; } EXPORT_SYMBOL(blk_rq_bio_prep); diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 1fc0471..5c80712 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -267,7 +267,7 @@ static int scsi_merge_bio(struct request *rq, struct bio *bio) if (!rq->bio) blk_rq_bio_prep(q, rq, bio); - else if (!ll_back_merge_fn(q, rq, bio)) + else if (!ll_back_merge_fn(q, rq, bio, rq_data_dir(rq))) return -EINVAL; else { rq->biotail->bi_next = bio; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index c1121d2..23c2891 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -224,6 +224,44 @@ enum rq_flag_bits { #define BLK_MAX_CDB 16 /* + * request io members. one for uni read/write and one for bidi_read + * This is a bidi hack so refactoring of code is simple but the main + * request interface stays the same. + * NOTE: member names here must not be reused inside struct request + * as they will conflict + */ +#define REQUEST_IO_PART_MEMBERS \ + unsigned int data_len; \ + \ + /* Maintain bio traversal state for part by part I/O submission. \ + * hard_* are block layer internals, no driver should touch them! \ + */ \ + sector_t sector; /* next sector to submit */ \ + sector_t hard_sector; /* next sector to complete */ \ + unsigned long nr_sectors; /* no. of sectors left to submit */ \ + unsigned long hard_nr_sectors; /* no. of sectors left to complete */ \ + /* no. 
of sectors left to submit in the current segment */ \ + unsigned int current_nr_sectors; \ + \ + /* no. of sectors left to complete in the current segment */ \ + unsigned int hard_cur_sectors; \ + \ + struct bio *bio; \ + struct bio *biotail; \ + \ + /* Number of scatter-gather DMA addr+len pairs after \ + * physical address coalescing is performed. \ + */ \ + unsigned short nr_phys_segments; \ + \ + /* Number of scatter-gather addr+len pairs after \ + * physical and DMA remapping hardware coalescing is performed. \ + * This is the number of scatter-gather entries the driver \ + * will actually have to deal with after DMA mapping is done. \ + */ \ + unsigned short nr_hw_segments; + +/* * try to put the fields that are referenced together in the same cacheline */ struct request { @@ -235,23 +273,6 @@ struct request { unsigned int cmd_flags; enum rq_cmd_type_bits cmd_type; - /* Maintain bio traversal state for part by part I/O submission. - * hard_* are block layer internals, no driver should touch them! - */ - - sector_t sector; /* next sector to submit */ - sector_t hard_sector; /* next sector to complete */ - unsigned long nr_sectors; /* no. of sectors left to submit */ - unsigned long hard_nr_sectors; /* no. of sectors left to complete */ - /* no. of sectors left to submit in the current segment */ - unsigned int current_nr_sectors; - - /* no. of sectors left to complete in the current segment */ - unsigned int hard_cur_sectors; - - struct bio *bio; - struct bio *biotail; - struct hlist_node hash; /* merge hash */ /* * The rb_node is only used inside the io scheduler, requests @@ -273,22 +294,11 @@ struct request { struct gendisk *rq_disk; unsigned long start_time; - /* Number of scatter-gather DMA addr+len pairs after - * physical address coalescing is performed. - */ - unsigned short nr_phys_segments; - - /* Number of scatter-gather addr+len pairs after - * physical and DMA remapping hardware coalescing is performed. - * This is the number of scatter-gather entries the driver - * will actually have to deal with after DMA mapping is done. - */ - unsigned short nr_hw_segments; - unsigned short ioprio; void *special; - char *buffer; + char *buffer; /* FIXME: should be Deprecated */ + void *data; /* FIXME: should be Deprecated */ int tag; int errors; @@ -301,9 +311,7 @@ struct request { unsigned int cmd_len; unsigned char cmd[BLK_MAX_CDB]; - unsigned int data_len; unsigned int sense_len; - void *data; void *sense; unsigned int timeout; @@ -314,6 +322,21 @@ struct request { */ rq_end_io_fn *end_io; void *end_io_data; + + /* Hack for bidi: this tells the compiler to keep all these members + * aligned the same as the struct request_io_part so we can access + * them either directly or through the structure. + */ + + union { + struct request_io_part { + REQUEST_IO_PART_MEMBERS; + } uni; + struct { + REQUEST_IO_PART_MEMBERS; + }; + }; + struct request_io_part bidi_read; }; /* @@ -548,21 +571,12 @@ enum { static inline int rq_is_bidi(struct request* rq) { - /* - * FIXME: It is needed below. Will be changed later in the - * patchset to a real check, and fixme will be removed. - */ - return false; + return rq->bidi_read.bio != NULL; } static inline int rq_rw_dir(struct request* rq) { - int old_check = (rq->cmd_flags & REQ_RW) ? WRITE : READ; -/*#ifdef 0 - int ret = (rq->bio && bio_data_dir(rq->bio)) ? WRITE : READ; - WARN_ON(ret != old_check ); -#endif*/ - return old_check; + return (rq->cmd_flags & REQ_RW) ? 
WRITE : READ; } static inline int rq_data_dir(struct request* rq) @@ -582,12 +596,36 @@ static inline const char* rq_dir_to_string(struct request* rq) { if (!rq->bio) return "no data command"; + else if (rq_is_bidi(rq)) + return "bidirectional"; else return bio_data_dir(rq->bio) ? "writing" : "reading"; } +static inline struct request_io_part* rq_out(struct request* req) +{ + return &req->uni; +} + +static inline struct request_io_part* rq_in(struct request* req) +{ + if (rq_rw_dir(req)) + return &req->bidi_read; + + return &req->uni; +} + +static inline struct request_io_part* rq_io(struct request* req, int rw) +{ + if (rw == READ) + return rq_in(req); + + WARN_ON(rw != WRITE); + return rq_out(req); +} + /* * We regard a request as sync, if it's a READ or a SYNC write. */ @@ -684,7 +722,8 @@ extern int sg_scsi_ioctl(struct file *, struct request_queue *, /* * Temporary export, until SCSI gets fixed up. */ -extern int ll_back_merge_fn(request_queue_t *, struct request *, struct bio *); +extern int ll_back_merge_fn(request_queue_t *, struct request *, struct bio *, + int rw); /* * A queue has just exitted congestion. Note this in the global counter of @@ -755,6 +794,15 @@ extern void end_request(struct request *req, int uptodate); extern void blk_complete_request(struct request *); /* + * end_request_block will complete and free all bio resources held + * by the request in one call. User will still need to call + * end_that_request_last(..). + * It is the only one that can deal with BIDI. + * can be called for parial bidi allocation and cleanup. + */ +extern void end_that_request_block(struct request *req, int uptodate); + +/* * end_that_request_first/chunk() takes an uptodate argument. we account * any value <= as an io error. 0 means -EIO for compatability reasons, * any other < 0 value is the direct error type. An uptodate value of @@ -833,6 +881,22 @@ static inline struct request *blk_map_queue_find_tag(struct blk_queue_tag *bqt, extern void blk_rq_bio_prep(request_queue_t *, struct request *, struct bio *); extern int blkdev_issue_flush(struct block_device *, sector_t *); +/* + * BIDI API + * build a request. for bidi requests must be called twice to map/prepare + * the data-in and data-out buffers, one at a time according to + * the given rw READ/WRITE param. + */ +extern void blk_rq_bio_prep_bidi(request_queue_t *, struct request *, + struct bio *, int rw); +extern int blk_rq_map_kern_bidi(request_queue_t *, struct request *, + void *, unsigned int, gfp_t, int rw); +/* retrieve the mapped pages for bidi according to + * the given dma_data_direction + */ +extern int blk_rq_map_sg_bidi(request_queue_t *, struct request *, + struct scatterlist *, int rw); + #define MAX_PHYS_SEGMENTS 128 #define MAX_HW_SEGMENTS 128 #define SAFE_MAX_SECTORS 255 -- 1.5.0.4.402.g8035 ^ permalink raw reply related [flat|nested] 22+ messages in thread
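For reference, here is a minimal sketch of how a REQ_TYPE_BLOCK_PC submitter might drive the bidi API proposed in the attachment above. It is an illustration only, not code from the patchset: the queue, buffers and lengths are placeholders, and the CDB setup and error handling a real caller needs are omitted.

static int submit_bidi_rq(request_queue_t *q, void *out_buf, unsigned out_len,
			  void *in_buf, unsigned in_len)
{
	struct request *rq;
	int err;

	rq = blk_get_request(q, WRITE, GFP_KERNEL);
	if (!rq)
		return -ENOMEM;
	rq->cmd_type = REQ_TYPE_BLOCK_PC;

	/* map the data-out side first, then the data-in side */
	err = blk_rq_map_kern_bidi(q, rq, out_buf, out_len, GFP_KERNEL, WRITE);
	if (!err)
		err = blk_rq_map_kern_bidi(q, rq, in_buf, in_len, GFP_KERNEL, READ);
	if (err) {
		blk_put_request(rq);
		return err;
	}

	/*
	 * rq_is_bidi(rq) is now true; a bidi-aware LLD would build two sg
	 * tables with blk_rq_map_sg_bidi(q, rq, sg, WRITE) and
	 * blk_rq_map_sg_bidi(q, rq, sg, READ), and complete the request with
	 * end_that_request_block() followed by end_that_request_last().
	 */
	blk_execute_rq(q, NULL, rq, 0);
	blk_put_request(rq);
	return 0;
}

Mapping the WRITE side first keeps the data-out buffer in rq->uni and lets blk_rq_choose_set_io() place the data-in buffer in rq->bidi_read; the helper also copes with the reverse order by moving the already-mapped bio list to bidi_read.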
* Re: [PATCH 4/4] bidi support: bidirectional request
2007-05-01 18:22 ` Boaz Harrosh
@ 2007-05-01 18:57 ` Jens Axboe
2007-05-01 19:01 ` FUJITA Tomonori
0 siblings, 1 reply; 22+ messages in thread
From: Jens Axboe @ 2007-05-01 18:57 UTC (permalink / raw)
To: Boaz Harrosh
Cc: James Bottomley, Christoph Hellwig, Douglas Gilbert, Benny Halevy,
FUJITA Tomonori, akpm, michaelc, linux-scsi, linux-ide

On Tue, May 01 2007, Boaz Harrosh wrote:
> Please consider the attached proposal. It is a complete block-level bidi
> implementation that is, I hope, a middle ground which will keep everyone
> happy (including Christoph). It is both quite small and not invasive,
> yet has a full bidi API that is easy to use and maintain.

This isn't much of an improvement imo, if any at all. Why didn't you do
the ->next_rq approach I suggested? Your patch still makes struct
request considerably fatter (30% here, from 280 to 368 bytes on x86-64
from a quick look) for something that will have relatively few uses. And
it still has its paws all over the block layer code.

Please just implement the 2nd data phase as a linked request off the
first one. I think that approach is both much cleaner from a design
perspective, and also much leaner and has zero (well almost, it costs a
pointer) impact on the regular read-write paths.

--
Jens Axboe

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/4] bidi support: bidirectional request
2007-05-01 18:57 ` Jens Axboe
@ 2007-05-01 19:01 ` FUJITA Tomonori
0 siblings, 0 replies; 22+ messages in thread
From: FUJITA Tomonori @ 2007-05-01 19:01 UTC (permalink / raw)
To: jens.axboe
Cc: bharrosh, James.Bottomley, hch, dougg, bhalevy, tomof, akpm,
michaelc, linux-scsi, linux-ide

From: Jens Axboe <jens.axboe@oracle.com>
Subject: Re: [PATCH 4/4] bidi support: bidirectional request
Date: Tue, 1 May 2007 20:57:20 +0200

> On Tue, May 01 2007, Boaz Harrosh wrote:
> > Please consider the attached proposal. It is a complete block-level bidi
> > implementation that is, I hope, a middle ground which will keep everyone
> > happy (including Christoph). It is both quite small and not invasive,
> > yet has a full bidi API that is easy to use and maintain.
>
> This isn't much of an improvement imo, if any at all. Why didn't you do
> the ->next_rq approach I suggested? Your patch still makes struct
> request considerably fatter (30% here, from 280 to 368 bytes on x86-64
> from a quick look) for something that will have relatively few uses. And
> it still has its paws all over the block layer code.
>
> Please just implement the 2nd data phase as a linked request off the
> first one. I think that approach is both much cleaner from a design
> perspective, and also much leaner and has zero (well almost, it costs a
> pointer) impact on the regular read-write paths.

I will send a next_rq patch shortly.

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/4] bidi support: bidirectional request
2007-04-30 11:11 ` Jens Axboe
2007-04-30 11:53 ` Benny Halevy
@ 2007-04-30 13:05 ` Mark Lord
2007-04-30 13:07 ` Jens Axboe
2007-05-01 19:50 ` FUJITA Tomonori
2 siblings, 1 reply; 22+ messages in thread
From: Mark Lord @ 2007-04-30 13:05 UTC (permalink / raw)
To: Jens Axboe
Cc: James Bottomley, Boaz Harrosh, FUJITA Tomonori, akpm, michaelc,
hch, linux-scsi, linux-ide, bhalevy

Jens Axboe wrote:
>
> So basically just add a struct request pointer, so you can do rq =
> rq->next_rq or something for the next data phase. I bet this would be a
> LOT less invasive as well, and we can get by with a few helpers to
> support it.

Hey, I want a way to issue those (linked requests) from userspace (SG_IO), too.
Specifically for use with the new SMART Command Transport (SCT) feature set
on modern SATA drives. As well as for a disk recovery utility I'm working on.

Sounds generally useful, that.

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/4] bidi support: bidirectional request
2007-04-30 13:05 ` Mark Lord
@ 2007-04-30 13:07 ` Jens Axboe
0 siblings, 0 replies; 22+ messages in thread
From: Jens Axboe @ 2007-04-30 13:07 UTC (permalink / raw)
To: Mark Lord
Cc: James Bottomley, Boaz Harrosh, FUJITA Tomonori, akpm, michaelc,
hch, linux-scsi, linux-ide, bhalevy

On Mon, Apr 30 2007, Mark Lord wrote:
> Jens Axboe wrote:
> >
> > So basically just add a struct request pointer, so you can do rq =
> > rq->next_rq or something for the next data phase. I bet this would be a
> > LOT less invasive as well, and we can get by with a few helpers to
> > support it.
>
> Hey, I want a way to issue those (linked requests) from userspace (SG_IO),
> too.
> Specifically for use with the new SMART Command Transport (SCT) feature set
> on modern SATA drives. As well as for a disk recovery utility I'm working
> on.
>
> Sounds generally useful, that.

Yep, one of the reasons why I like (my :-) proposal as well, we could
potentially find other uses for linking commands like that.

--
Jens Axboe

^ permalink raw reply [flat|nested] 22+ messages in thread
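On the userspace side that Mark raises, the bsg interface FUJITA mentions testing below is the natural fit: a single sg_io_v4 descriptor carries both a data-out and a data-in buffer. The sketch that follows uses the sg_io_v4 layout as it later appeared in mainline <linux/bsg.h>; at the time of this thread bsg was still out of tree, the device path is a placeholder, and whether a given kernel accepts both transfer directions at once depends on the bidi support being discussed here.

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>		/* SG_IO */
#include <linux/bsg.h>		/* struct sg_io_v4 */

/* Illustrative only: issue one bidirectional command through a bsg node. */
static int bsg_bidi_example(const char *bsg_dev,
			    unsigned char *cdb, unsigned cdb_len,
			    void *dout, unsigned dout_len,
			    void *din, unsigned din_len,
			    unsigned char *sense, unsigned sense_len)
{
	struct sg_io_v4 io;
	int fd, ret;

	fd = open(bsg_dev, O_RDWR);
	if (fd < 0)
		return -1;

	memset(&io, 0, sizeof(io));
	io.guard = 'Q';				/* sg_io_v4, not the old v3 */
	io.protocol = BSG_PROTOCOL_SCSI;
	io.subprotocol = BSG_SUB_PROTOCOL_SCSI_CMD;
	io.request = (unsigned long)cdb;
	io.request_len = cdb_len;
	io.dout_xferp = (unsigned long)dout;	/* data-out phase */
	io.dout_xfer_len = dout_len;
	io.din_xferp = (unsigned long)din;	/* data-in phase */
	io.din_xfer_len = din_len;
	io.response = (unsigned long)sense;
	io.max_response_len = sense_len;
	io.timeout = 30000;			/* milliseconds */

	ret = ioctl(fd, SG_IO, &io);
	close(fd);
	return ret;
}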
* Re: [PATCH 4/4] bidi support: bidirectional request 2007-04-30 11:11 ` Jens Axboe 2007-04-30 11:53 ` Benny Halevy 2007-04-30 13:05 ` Mark Lord @ 2007-05-01 19:50 ` FUJITA Tomonori 2 siblings, 0 replies; 22+ messages in thread From: FUJITA Tomonori @ 2007-05-01 19:50 UTC (permalink / raw) To: jens.axboe Cc: James.Bottomley, bharrosh, tomof, akpm, michaelc, hch, linux-scsi, linux-ide, bhalevy From: Jens Axboe <jens.axboe@oracle.com> Subject: Re: [PATCH 4/4] bidi support: bidirectional request Date: Mon, 30 Apr 2007 13:11:57 +0200 > On Sun, Apr 29 2007, James Bottomley wrote: > > On Sun, 2007-04-29 at 18:48 +0300, Boaz Harrosh wrote: > > > FUJITA Tomonori wrote: > > > > From: Boaz Harrosh <bharrosh@panasas.com> > > > > Subject: [PATCH 4/4] bidi support: bidirectional request > > > > Date: Sun, 15 Apr 2007 20:33:28 +0300 > > > > > > > >> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > > > >> index 645d24b..16a02ee 100644 > > > >> --- a/include/linux/blkdev.h > > > >> +++ b/include/linux/blkdev.h > > > >> @@ -322,6 +322,7 @@ struct request { > > > >> void *end_io_data; > > > >> > > > >> struct request_io_part uni; > > > >> + struct request_io_part bidi_read; > > > >> }; > > > > > > > > Would be more straightforward to have: > > > > > > > > struct request_io_part in; > > > > struct request_io_part out; > > > > > > > > > > Yes I wish I could do that. For bidi supporting drivers this is the most logical. > > > But for the 99.9% of uni-directional drivers, calling rq_uni(), and being some what on > > > the hotish paths, this means we will need a pointer to a uni request_io_part. > > > This is bad because: > > > 1st- There is no defined stage in a request life where to definitely set that pointer, > > > specially in the preparation stages. > > > 2nd- hacks like scsi_error.c/scsi_send_eh_cmnd() will not work at all. Now this is a > > > very bad spot already, and I have a short term fix for it in the SCSI-bidi patches > > > (not sent yet) but a more long term solution is needed. Once such hacks are > > > cleaned up we can do what you say. This is exactly why I use the access functions > > > rq_uni/rq_io/rq_in/rq_out and not open code access. > > > > I'm still not really convinced about this approach. The primary job of > > the block layer is to manage and merge READ and WRITE requests. It > > serves a beautiful secondary function of queueing for arbitrary requests > > it doesn't understand (REQ_TYPE_BLOCK_PC or REQ_TYPE_SPECIAL ... or > > indeed any non REQ_TYPE_FS). > > > > bidirectional requests fall into the latter category (there's nothing > > really we can do to merge them ... they're just transported by the block > > layer). The only unusual feature is that they carry two bios. I think > > the drivers that actually support bidirectional will be a rarity, so it > > might even be advisable to add it to the queue capability (refuse > > bidirectional requests at the top rather than perturbing all the drivers > > to process them). > > > > So, what about REQ_TYPE_BIDIRECTIONAL rather than REQ_BIDI? That will > > remove it from the standard path and put it on the special command type > > path where we can process it specially. Additionally, if you take this > > approach, you can probably simply chain the second bio through > > req->special as an additional request in the stream. 
The only thing > > that would then need modification would be the dequeue of the block > > driver (it would have to dequeue both requests and prepare them) and > > that needs to be done only for drivers handling bidirectional requests. > > I agree, I'm really not crazy about shuffling the entire request setup > around just for something as exotic as bidirection commands. How about > just keeping it simple - have a second request linked off the first one > for the second data phase? So keep it completely seperate, not just > overload ->special for 2nd bio list. > > So basically just add a struct request pointer, so you can do rq = > rq->next_rq or something for the next data phase. I bet this would be a > LOT less invasive as well, and we can get by with a few helpers to > support it. This patch tried this approach. It's just for seeing how it works. I added bidi support to open-iscsi and bsg and tested this patch lightly. I've attached only a patch for the block layer and scsl-ml. You can get all the patches are: http://www.kernel.org/pub/linux/kernel/people/tomo/bidi If we go with this approach, we need just minor changes to the block layer. The overloading rq->special approach needs more but it's reasonable too. I need to add the proper error handling code, which might be a bit tricky, but I think that it will not be so complicated. > And it should definitely be a request type. I'm not sure about this. I think that bidi can't be a request type to trace bidi pc requests (we have bidi special requests like SMP). I use REQ_BIDI though I've not implemented bidi trace code. >From 7d278323ff8aad86fb82c823538f7ddfb6ded11c Mon Sep 17 00:00:00 2001 From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Date: Wed, 2 May 2007 03:55:56 +0900 Subject: [PATCH] add bidi support Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> --- block/ll_rw_blk.c | 1 + drivers/scsi/scsi_lib.c | 72 +++++++++++++++++++++++++++++++++++++++------- include/linux/blkdev.h | 7 ++++ include/scsi/scsi_cmnd.h | 9 ++++++ 4 files changed, 78 insertions(+), 11 deletions(-) diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c index cf8752a..82842d6 100644 --- a/block/ll_rw_blk.c +++ b/block/ll_rw_blk.c @@ -256,6 +256,7 @@ static void rq_init(request_queue_t *q, rq->end_io = NULL; rq->end_io_data = NULL; rq->completion_data = NULL; + rq->next_rq = NULL; } /** diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index be8e655..96541cb 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -701,34 +701,36 @@ static struct scsi_cmnd *scsi_end_reques return NULL; } -struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask) +static struct scatterlist *do_scsi_alloc_sgtable(unsigned short use_sg, + unsigned short *sglist_len, + gfp_t gfp_mask) { struct scsi_host_sg_pool *sgp; struct scatterlist *sgl; - BUG_ON(!cmd->use_sg); + BUG_ON(!use_sg); - switch (cmd->use_sg) { + switch (use_sg) { case 1 ... 8: - cmd->sglist_len = 0; + *sglist_len = 0; break; case 9 ... 16: - cmd->sglist_len = 1; + *sglist_len = 1; break; case 17 ... 32: - cmd->sglist_len = 2; + *sglist_len = 2; break; #if (SCSI_MAX_PHYS_SEGMENTS > 32) case 33 ... 64: - cmd->sglist_len = 3; + *sglist_len = 3; break; #if (SCSI_MAX_PHYS_SEGMENTS > 64) case 65 ... 128: - cmd->sglist_len = 4; + *sglist_len = 4; break; #if (SCSI_MAX_PHYS_SEGMENTS > 128) case 129 ... 
256: - cmd->sglist_len = 5; + *sglist_len = 5; break; #endif #endif @@ -737,11 +739,15 @@ #endif return NULL; } - sgp = scsi_sg_pools + cmd->sglist_len; + sgp = scsi_sg_pools + *sglist_len; sgl = mempool_alloc(sgp->pool, gfp_mask); return sgl; } +struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask) +{ + return do_scsi_alloc_sgtable(cmd->use_sg, &cmd->sglist_len, gfp_mask); +} EXPORT_SYMBOL(scsi_alloc_sgtable); void scsi_free_sgtable(struct scatterlist *sgl, int index) @@ -778,6 +784,9 @@ static void scsi_release_buffers(struct if (cmd->use_sg) scsi_free_sgtable(cmd->request_buffer, cmd->sglist_len); + if (cmd->ext_request_buffer.use_sg) + scsi_free_sgtable(cmd->ext_request_buffer.request_buffer, + cmd->ext_request_buffer.sglist_len); /* * Zero these out. They now point to freed memory, and it is * dangerous to hang onto the pointers. @@ -1106,9 +1115,48 @@ static int scsi_setup_blk_pc_cmnd(struct BUG_ON(!req->nr_phys_segments); + if (blk_bidi_rq(req)) { + BUG_ON(!req->next_rq); + + if (rq_data_dir(req) != WRITE || + rq_data_dir(req->next_rq) != READ) { + scsi_release_buffers(cmd); + scsi_put_command(cmd); + return BLKPREP_KILL; + } + } + ret = scsi_init_io(cmd); if (unlikely(ret)) return ret; + + if (blk_bidi_rq(req)) { + struct scsi_data_buffer *sdb = &cmd->ext_request_buffer; + struct scatterlist *sgpnt; + int count; + + sdb->use_sg = req->nr_phys_segments; + + sgpnt = do_scsi_alloc_sgtable(sdb->use_sg, + &sdb->sglist_len, + GFP_ATOMIC); + if (unlikely(!sgpnt)) { + scsi_release_buffers(cmd); + scsi_unprep_request(req); + return BLKPREP_DEFER; + } + + sdb->request_buffer = sgpnt; + sdb->request_bufflen = req->next_rq->data_len; + count = blk_rq_map_sg(req->q, req->next_rq, sgpnt); + if (unlikely(count > sdb->use_sg)) { + scsi_free_sgtable(sgpnt, sdb->sglist_len); + scsi_release_buffers(cmd); + scsi_put_command(cmd); + return BLKPREP_KILL; + } + sdb->use_sg = count; + } } else { BUG_ON(req->data_len); BUG_ON(req->data); @@ -1122,7 +1170,9 @@ static int scsi_setup_blk_pc_cmnd(struct BUILD_BUG_ON(sizeof(req->cmd) > sizeof(cmd->cmnd)); memcpy(cmd->cmnd, req->cmd, sizeof(cmd->cmnd)); cmd->cmd_len = req->cmd_len; - if (!req->data_len) + if (blk_bidi_rq(req)) + cmd->sc_data_direction = DMA_BIDIRECTIONAL; + else if (!req->data_len) cmd->sc_data_direction = DMA_NONE; else if (rq_data_dir(req) == WRITE) cmd->sc_data_direction = DMA_TO_DEVICE; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 83dcd8c..9d3bb4a 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -199,6 +199,7 @@ enum rq_flag_bits { __REQ_ALLOCED, /* request came from our alloc pool */ __REQ_RW_META, /* metadata io request */ __REQ_NR_BITS, /* stops here */ + __REQ_BIDI, /* bidirectional io request */ }; #define REQ_RW (1 << __REQ_RW) @@ -219,6 +220,7 @@ #define REQ_ORDERED_COLOR (1 << __REQ_OR #define REQ_RW_SYNC (1 << __REQ_RW_SYNC) #define REQ_ALLOCED (1 << __REQ_ALLOCED) #define REQ_RW_META (1 << __REQ_RW_META) +#define REQ_BIDI (1 << __REQ_BIDI) #define BLK_MAX_CDB 16 @@ -313,6 +315,9 @@ struct request { */ rq_end_io_fn *end_io; void *end_io_data; + + /* for bidi */ + struct request *next_rq; }; /* @@ -478,6 +483,7 @@ #define QUEUE_FLAG_DEAD 5 /* queue bein #define QUEUE_FLAG_REENTER 6 /* Re-entrancy avoidance */ #define QUEUE_FLAG_PLUGGED 7 /* queue is plugged */ #define QUEUE_FLAG_ELVSWITCH 8 /* don't use elevator, just do FIFO */ +#define QUEUE_FLAG_BIDI 9 /* bidirectional requests */ enum { /* @@ -542,6 +548,7 @@ #define blk_pm_request(rq) \ #define blk_sorted_rq(rq) 
((rq)->cmd_flags & REQ_SORTED) #define blk_barrier_rq(rq) ((rq)->cmd_flags & REQ_HARDBARRIER) #define blk_fua_rq(rq) ((rq)->cmd_flags & REQ_FUA) +#define blk_bidi_rq(rq) ((rq)->cmd_flags & REQ_BIDI) #define list_entry_rq(ptr) list_entry((ptr), struct request, queuelist) diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h index a2e0c10..0e259e5 100644 --- a/include/scsi/scsi_cmnd.h +++ b/include/scsi/scsi_cmnd.h @@ -11,6 +11,12 @@ struct scatterlist; struct Scsi_Host; struct scsi_device; +struct scsi_data_buffer { + unsigned short use_sg; /* Number of pieces of scatter-gather */ + unsigned short sglist_len; /* size of malloc'd scatter-gather list */ + void *request_buffer; /* Actual requested buffer */ + unsigned request_bufflen; /* Actual request size */ +}; /* embedded in scsi_cmnd */ struct scsi_pointer { @@ -117,6 +123,9 @@ #define SCSI_SENSE_BUFFERSIZE 96 unsigned char tag; /* SCSI-II queued command tag */ unsigned long pid; /* Process ID, starts at 0. Unique per host. */ + + /* bidi in buffer */ + struct scsi_data_buffer ext_request_buffer; }; extern struct scsi_cmnd *scsi_get_command(struct scsi_device *, gfp_t); -- 1.4.3.2 ^ permalink raw reply related [flat|nested] 22+ messages in thread
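Under this patch it is the submitter, not the block layer, that pairs the two data phases. A minimal sketch (not part of the patch) of what a kernel caller might do; buffer names are placeholders and CDB setup plus error handling are omitted:

static int submit_bidi_pair(request_queue_t *q, void *out_buf, unsigned out_len,
			    void *in_buf, unsigned in_len)
{
	struct request *rq = blk_get_request(q, WRITE, GFP_KERNEL);	/* data-out */
	struct request *inrq = blk_get_request(q, READ, GFP_KERNEL);	/* data-in */

	rq->cmd_type = REQ_TYPE_BLOCK_PC;
	rq->cmd_flags |= REQ_BIDI;		/* makes blk_bidi_rq(rq) true */
	blk_rq_map_kern(q, rq, out_buf, out_len, GFP_KERNEL);
	blk_rq_map_kern(q, inrq, in_buf, in_len, GFP_KERNEL);
	rq->next_rq = inrq;			/* link the second data phase */

	/*
	 * scsi_setup_blk_pc_cmnd() then sees blk_bidi_rq(rq), sets
	 * cmd->sc_data_direction = DMA_BIDIRECTIONAL and builds a second
	 * sg table from rq->next_rq into cmd->ext_request_buffer.
	 */
	blk_execute_rq(q, NULL, rq, 0);

	blk_put_request(inrq);
	blk_put_request(rq);
	return 0;
}

Compared with the request_io_part approach above, the uni-directional fast path is untouched apart from the extra next_rq pointer in struct request.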
* Re: [PATCH 0/4] bidi support: block layer bidirectional io. 2007-04-15 17:17 [PATCH 0/4] bidi support: block layer bidirectional io Boaz Harrosh ` (3 preceding siblings ...) 2007-04-15 17:33 ` [PATCH 4/4] bidi support: bidirectional request Boaz Harrosh @ 2007-04-16 18:03 ` Douglas Gilbert 4 siblings, 0 replies; 22+ messages in thread From: Douglas Gilbert @ 2007-04-16 18:03 UTC (permalink / raw) To: Boaz Harrosh Cc: Jens Axboe, James Bottomley, Andrew Morton, Mike Christie, Christoph Hellwig, linux-scsi, Linux-ide, Benny Halevy, osd-dev Boaz Harrosh wrote: > Following are 4 (large) patches for support of bidirectional > block I/O in kernel. (not including SCSI-ml or iSCSI) > > The submitted work is against linux-2.6-block tree as of > 2007/04/15, and will only cleanly apply in succession. > > The patches are based on the RFC I sent 3 months ago. They only > cover the block layer at this point. I suggest they get included > in Morton's tree until they reach the kernel so they can get > compiled on all architectures/platforms. There is still a chance > that architectures I did not compile were not fully converted. > (FWIW, my search for use of struct request members failed to find > them). If you find such a case, please send me the file > name and I will fix it ASAP. > > Patches summary: > 1. [PATCH 1/4] bidi support: request dma_data_direction > - Convert REQ_RW bit flag to a dma_data_direction member like in SCSI-ml use. > - removed rq_data_dir() and added other APIs for querying request's direction. > - fix usage of rq_data_dir() and peeking at req->cmd_flags & REQ_RW to using > new api. > - clean-up bad usage of DMA_BIDIRECTIONAL and bzero of none-queue requests, > to use the new blk_rq_init_unqueued_req() > > 2. [PATCH 2/4] bidi support: fix req->cmd == INT cases > - Digging into all these old drivers, I have found traces of past life > where request->cmd was the command type. This patch fixes some of these > places. All drivers touched by this patch are clear indication of drivers > that were not used for a while. Should we removed them from Kernel? > These Are: > drivers/acorn/block/fd1772.c, drivers/acorn/block/mfmhd.c, > drivers/block/nbd.c, drivers/cdrom/aztcd.c, drivers/cdrom/cm206.c > drivers/cdrom/gscd.c, drivers/cdrom/mcdx.c, drivers/cdrom/optcd.c > drivers/cdrom/sjcd.c, drivers/ide/legacy/hd.c, drivers/block/amiflop.c > > 2. [PATCH 3/4] bidi support: request_io_part > - extract io related fields in struct request into struct request_io_part > in preparation to full bidi support. > - new rq_uni() API to access the sub-structure. (Please read below comment > on why an API and not open code the access) > - Convert All users to new API. > > 3. [PATCH 4/4] bidi support: bidirectional block layer > - add one more request_io_part member for bidi support in struct request. > - add block layer API functions for mapping and accessing bidi data buffers > and for ending a block request as a whole (end_that_request_block()) > > -------------------------------------------------------------------------------------------- > Developer comments: > > patch 1/4: Borrow from struct scsi_cmnd use of enum dma_data_direction. Further work (in > progress) is the removal of the corresponding member from struct scsi_cmnd and converting > all users to directly access rq_dma_dir(sc->req). > > patch 3/4: The reasons for introducing the rq_uni(req) API rather than directly accessing > req->uni are: > > * WARN(!bidi_dir(req)) is life saving when developing bidi enabled paths. 
Once we, bidi > users, start to push bidi requests down the kernel paths, we immediately get warned of > paths we did not anticipate. Otherwise, they will be very hard to find, and will hurt > kernel stability. > > * A cleaner and saner future implementation could be in/out members rather than > uni/bidi_read. This way the dma_direction member can deprecated and the uni sub- > structure can be maintained using a pointer in struct req. > With this API we are free to change the implementation in the future without > touching any users of the API. We can also experiment with what's best. Also, with the > API it is much easier to convert uni-directional drivers for bidi (look in > ll_rw_block.c in patch 4/4). > > * Note, that internal uses inside the block layer access req->uni directly, as they will > need to be changed if the implementation of req->{uni, bidi_read} changes. Boaz, Recently I have been looking at things from the perspective of a SAS target and thinking about bidi commands. Taking XDWRITEREAD(10) in sbc3r09.pdf (section 5.44) as an example, with DISABLE_WRITE=0, the "device server" in the target should do the following: a) decode the cdb ** b) read from storage [lba, transfer_length] c) fetch data_out from initiator [transfer_length] *** d) XOR data from (b) and (c) and place result in (z) e) write the data from (c) to storage [lba, transfer_length] f) send (z) in data_in to initiator [transfer_length] g) send SCSI completion status to initiator Logically a) must occur first and g) last. The b) to f) sequence could be repeated (perhaps) by the device server subdividing the transfer_length (i.e. it may not be reasonable for the OS to assume that the data_out transfer will be complete before there is any data_in transfer). With this command (and with most other bidi commands I suspect) there is little opportunity for full duplex data movement within this command (i.e. little or no data associated with this command moving both ways "on the wire" at the same time). Seen from sgv4 and the initiator we basically set up the resources that the target's device server uses during the execution of that command: 0) cdb [XDWRITEREAD(lba,transfer_length)] 1) data_out buffer ("write" out to device) 2) data_in buffer ("read" in from device) 3) sense buffer (in case of problems) 4) SCSI (and transport) completion status After setting up 1), 2) and 3), the initiator pushes 0) to the target (lu) and then waits until it is told 4) is finished. The order in which 1) and 2) are used is dictated by the device server in the target. [Pushing 0) and 1) can be partially combined in the ** case, see below.] I assume 1) and 2) will have their own scatter gather lists in the "request" layer. Describing that as DMA_BIDIR makes me feel a bit uneasy. It is definitely better that a binary "rw" flag. Couldn't the HBA have different inbound and outbound DMA engines associated with different PCI devices? My concern stated a different way is that a "bidi" SCSI command using two scatter gather lists at the initiator end looks very similar to two queued SCSI commands: typically a READ queued behind a WRITE. We don't consider the latter pair (PCI_)DMA_BIDIRECTIONAL. On rereading your comments above, are you planning to retire the DMA_BIDIRECTIONAL flag in the future? ** various SCSI transports have a mechanism for sending some data_out with the cdb. Some transports allow the device server to control that via the "first burst size" field in the disconnect-reconnect mode page. 
I have been told that all known SAS implementations set that field to zero (and some want the capability removed from the standard). FCP? ***Both SAS and FCP control data_out transfers indirectly via the XFER_RDY (t->i) frame. Doug Gilbert ^ permalink raw reply [flat|nested] 22+ messages in thread
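Step d) is the only computational part of the command. A toy, host-side illustration (plain C, not kernel or target code) of what the device server produces for the data-in phase over one transfer:

#include <stddef.h>

/*
 * xor_buf receives what the target returns in the data-in phase:
 * the old media contents XORed with the initiator's data-out buffer.
 */
static void xdwriteread_xor(const unsigned char *media_data,
			    const unsigned char *data_out,
			    unsigned char *xor_buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		xor_buf[i] = media_data[i] ^ data_out[i];
}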
end of thread, other threads:[~2007-05-01 19:50 UTC | newest]

Thread overview: 22+ messages
2007-04-15 17:17 [PATCH 0/4] bidi support: block layer bidirectional io Boaz Harrosh
2007-04-15 17:25 ` [PATCH 1/4] bidi support: request dma_data_direction Boaz Harrosh
2007-04-15 17:31 ` [PATCH 2/4] bidi support: fix req->cmd == INT cases Boaz Harrosh
2007-04-15 17:32 ` [PATCH 3/4] bidi support: request_io_part Boaz Harrosh
2007-04-29 15:49 ` Boaz Harrosh
2007-04-15 17:33 ` [PATCH 4/4] bidi support: bidirectional request Boaz Harrosh
2007-04-28 19:48 ` FUJITA Tomonori
2007-04-29 15:48 ` Boaz Harrosh
2007-04-29 18:49 ` James Bottomley
2007-04-30 11:11 ` Jens Axboe
2007-04-30 11:53 ` Benny Halevy
2007-04-30 11:59 ` Jens Axboe
2007-04-30 14:52 ` Douglas Gilbert
2007-04-30 14:51 ` Jens Axboe
2007-04-30 15:12 ` Benny Halevy
2007-05-01 18:22 ` Boaz Harrosh
2007-05-01 18:57 ` Jens Axboe
2007-05-01 19:01 ` FUJITA Tomonori
2007-04-30 13:05 ` Mark Lord
2007-04-30 13:07 ` Jens Axboe
2007-05-01 19:50 ` FUJITA Tomonori
2007-04-16 18:03 ` [PATCH 0/4] bidi support: block layer bidirectional io Douglas Gilbert