All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boaz Harrosh <bharrosh@panasas.com>
To: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>,
	James.Bottomley@SteelEye.com
Cc: linux-scsi@vger.kernel.org, bhalevy@panasas.com,
	jens.axboe@oracle.com, hch@infradead.org,
	akpm@linux-foundation.org, michaelc@cs.wisc.edu
Subject: Re: [PATCH v2] add bidi support for block pc requests
Date: Tue, 08 May 2007 21:53:24 +0300	[thread overview]
Message-ID: <4640C724.8030409@panasas.com> (raw)
In-Reply-To: <20070508112553C.fujita.tomonori@lab.ntt.co.jp>

[-- Attachment #1: Type: text/plain, Size: 4543 bytes --]

FUJITA Tomonori wrote:
> Here is an updated version of the patch to add bidi support to block
> pc requests. Bugs spotted by Benny were fixed.
> 
> This patch can be applied cleanly to the scsi-misc git tree and is on
> the top of the following patch to add linked request support:
> 
> http://marc.info/?l=linux-scsi&m=117835587615642&w=2
> 
> ---
> From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> Date: Mon, 7 May 2007 16:42:24 +0900
> Subject: [PATCH] add bidi support to scsi pc commands
> 
> This adds bidi support to block pc requests.
> 
> A bidi request uses req->next_rq pointer for an in request.
> 
> This patch introduces a new structure, scsi_data_buffer to hold the
> data buffer information. To avoid make scsi_cmnd structure fatter, the
> scsi mid-layer uses cmnd->request->next_rq->special pointer for
> a scsi_data_buffer structure. LLDs don't touch the second request
> (req->next_rq) so it's safe to use req->special.
> 
> scsi_blk_pc_done() always completes the whole command so
> scsi_end_request() simply completes the bidi chunk too.
> 
> A helper function, scsi_bidi_data_buffer() is for LLDs to access to
> the scsi_data_buffer structure easily.
> 
> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> ---
> @@ -685,6 +696,14 @@ static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int uptodate,
>  		}
>  	}
>  
> +	/*
> +	 * a REQ_BLOCK_PC command is always completed fully so just do
> +	 * end the bidi chunk.
> +	 */
> +	if (sdb)
> +		end_that_request_chunk(req->next_rq, uptodate,
> +				       sdb->request_bufflen);
> +

sdb->request_bufflen was zeroed in scsi_release_buffers which is called before
scsi_end_request.

>  static void scsi_blk_pc_done(struct scsi_cmnd *cmd)
>  {
>  	BUG_ON(!blk_pc_request(cmd->request));
> @@ -1090,10 +1159,22 @@ static void scsi_blk_pc_done(struct scsi_cmnd *cmd)
>  static int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
>  {
>  	struct scsi_cmnd *cmd;
> +	struct scsi_data_buffer *sdb = NULL;
> +
> +	if (blk_bidi_rq(req)) {
> +		sdb = kzalloc(sizeof(*sdb), GFP_ATOMIC);
> +		if (unlikely(!sdb))
> +			return BLKPREP_DEFER;
> +	}
>  
>  	cmd = scsi_get_cmd_from_req(sdev, req);
> -	if (unlikely(!cmd))
> +	if (unlikely(!cmd)) {
> +		kfree(sdb);
>  		return BLKPREP_DEFER;
> +	}
> +
> +	if (sdb)
> +		req->next_rq->special = sdb;
>  
>  	/*
>  	 * BLOCK_PC requests may transfer data, in which case they must

Before I get to my main concern here I have one comment. in scsi_get_cmd_from_req()
there is a code path in which a scsi_cmnd is taken from special and is not newly
allocated. It is best to move bidi allocation to scsi_get_cmd_from_req() and allocate
new sdb only if we get a new Command. (See my attached patch for an example)

OK! My main concern is with kzalloc(sizeof(*sdb), GFP_ATOMIC);
All IO allocations should come from slabs in case of a memory starved system.
There are 3 possible solutions I see:
1. Use struct scsi_cmnd for bidi_read and not a special scsi_data_buffer.
   Attached is above solution. It is by far the simplest of the three.
   So simple that even Jens's BigIO patch can almost apply at scsi_lib.c. (and vise versa)
   What's hanged on the request->next_rq->special is a second scsi_cmnd.
   The command is taken from regular command pool. This solution, though
   wasteful of some memory is very stable.

2. Force the users of BIDI to allocate scsi_data_buffer(s)
   Users will allocate a slab for scsi_data_buffers and hang them on nex_rq->special before
   hand. Than free them together with second request. This approach can be very stable, But
   it is bad because it is a layering violation. When block and SCSI layers decide to change
   the wiring of bidi. Users will have to change with them.

3. Let SCSI-ml manage a slab for scsi_data_buff's
   Have a flag to  scsi_setup_command_freelist() or a new API. When a device is flagged
   bidi-able we could 1-Allocate bigger slots to have extra memory for SDBs together
   with the command itself. or 2-allocate yet another slab for SDBs.

The 3rd approach is clean, and I can submit a patch for it. But I think it is not currently
necessary. The first solution (See attached patch) we can all live with, I hope. Since for
me it gives me the stability I need. And for the rest of the kernel it is the minimum
change, layered on existing building blocks.

If any one wants to try it out I put up the usual patchset that exercise this approach here.
http://www.bhalevy.com/open-osd/download/double_rq_bidi

Thanks
Boaz


[-- Attachment #2: 0002-scsi-ml-bidi-support.patch --]
[-- Type: text/plain, Size: 10884 bytes --]

>From ac8f0d3df711c5d01a269fde6a4ecce3000c3f68 Mon Sep 17 00:00:00 2001
From: Boaz Harrosh <bharrosh@panasas.com>
Date: Tue, 8 May 2007 14:04:46 +0300
Subject: [PATCH] scsi-ml bidi support

    At the block level bidi request uses req->next_rq pointer for a second
    bidi_read request.
    At Scsi-midlayer a second scsi_cmnd structure is used for the sglists
    of the bidi_read part. This command is put on request->next_rq->special.
    So struct scsi_cmnd is not changed. And scsi-ml minimally changes.

    - Define scsi_end_bidi_request() to do what scsi_end_request() does but for a
      bidi request. This is necessary because bidi commands are a bit tricky here.
      (See comments in body)

    - scsi_release_buffers() will also release the bidi_read buffers.
      I have changed the check from "if (cmd->use_sg)" to
      "if(cmd->request_buffer)" because it can now be called on half
      prepared commands. (and use_sg is no longer relevant here)

    - scsi_io_completion() will now call scsi_end_bidi_request on bidi commands
      and return.

    - The previous work done in scsi_init_io() is now done in a new
      scsi_init_data_buf() (which is 95% identical to old scsi_init_io())
      The new scsi_init_io() will call the above twice if needed also for
      the bidi_read command.

    - scsi_get_cmd_from_req() (called from scsi_setup_blk_pc_cmnd()) in the
      case that a new command is allocated, will also allocate a bidi_read
      command and hang it on cmd->request->next_rq->special. Only at this
      point is a command bidi.

    - scsi_setup_blk_pc_cmnd() will update sc_dma_direction to DMA_BIDIRECTIONAL
      which is not correct. It is only here for compatibility with iSCSI bidi
      driver. At final patch this will be a DMA_TO_DEVICE for this command and
      DMA_FROM_DEVICE for the bidi_read command. A driver must call scsi_bidi_cmnd()
      to work on bidi commands.

    - Define scsi_bidi_cmnd() to return true if it is a bidi command and a
      second command was allocated.

    - Define scsi_in()/scsi_out() to return the in or out commands from this command
      This API is to isolate users from the mechanics of bidi, and is also a short
      hand to common used code. (Even in scsi_lib.c)

    - In scsi_error.c at scsi_send_eh_cmnd() make sure bidi-lld is not confused by
      a get-sense command that looks like bidi. This is done by puting NULL at
      request->next_rq, and restoring before exit.
---
 drivers/scsi/scsi_error.c |    5 ++
 drivers/scsi/scsi_lib.c   |  133 ++++++++++++++++++++++++++++++++++++++++++---
 include/scsi/scsi_cmnd.h  |   19 ++++++-
 3 files changed, 148 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index e8350c5..169f4b4 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -620,6 +620,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 	unsigned char old_cmd_len;
 	unsigned old_bufflen;
 	void *old_buffer;
+	struct request* old_next_rq;
 	int rtn;
 
 	/*
@@ -636,6 +637,9 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 	old_cmd_len = scmd->cmd_len;
 	old_use_sg = scmd->use_sg;
 
+	old_next_rq = scmd->request->next_rq;
+	scmd->request->next_rq = NULL;
+
 	memset(scmd->cmnd, 0, sizeof(scmd->cmnd));
 	memcpy(scmd->cmnd, cmnd, cmnd_size);
 
@@ -734,6 +738,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 	/*
 	 * Restore original data
 	 */
+	scmd->request->next_rq = old_next_rq;
 	scmd->request_buffer = old_buffer;
 	scmd->request_bufflen = old_bufflen;
 	memcpy(scmd->cmnd, old_cmnd, sizeof(scmd->cmnd));
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 61fbcdc..0f49195 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -85,6 +85,10 @@ static void scsi_unprep_request(struct request *req)
 	req->cmd_flags &= ~REQ_DONTPREP;
 	req->special = NULL;
 
+	if (scsi_bidi_cmnd(cmd)) {
+		scsi_put_command(scsi_in(cmd));
+		cmd->request->next_rq->special = NULL;
+	}
 	scsi_put_command(cmd);
 }
 
@@ -701,6 +705,52 @@ static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int uptodate,
 	return NULL;
 }
 
+/*
+ * Bidi commands Must be complete as a whole, both sides at once.
+ * If part of the bytes were written and lld returned
+ * scsi_in()->resid or scsi_out()->resid this information will be left
+ * in req->data_len and req->next_rq->data_len. The upper-layer driver can
+ * decide what to do with this information.
+ */
+void scsi_end_bidi_request(struct scsi_cmnd *cmd)
+{
+	struct scsi_cmnd* bidi_in = scsi_in(cmd);
+	request_queue_t *q = cmd->device->request_queue;
+	struct request *req = cmd->request;
+	unsigned long flags;
+
+	end_that_request_chunk(req, 1, req->data_len);
+	req->data_len = cmd->resid;
+	end_that_request_chunk(req->next_rq, 1, req->next_rq->data_len);
+	req->next_rq->data_len = bidi_in->resid;
+
+	req->next_rq->special = NULL;
+	scsi_put_command(bidi_in);
+
+	/*
+	 *FIXME: If ll_rw_blk.c is changed to also put_request(req->next_rq)
+	 *       in end_that_request_last() then this WARN_ON must be removed.
+	 *       Otherwise, upper-driver must have registered an end_io.
+	 */
+	WARN_ON(!req->end_io);
+
+	/* FIXME: the following code can be shared with scsi_end_request */
+	add_disk_randomness(req->rq_disk);
+
+	spin_lock_irqsave(q->queue_lock, flags);
+	if (blk_rq_tagged(req))
+		blk_queue_end_tag(q, req);
+
+	end_that_request_last(req, 1); /*allways good for now*/
+	spin_unlock_irqrestore(q->queue_lock, flags);
+
+	/*
+	 * This will goose the queue request function at the end, so we don't
+	 * need to worry about launching another command.
+	 */
+	scsi_next_command(cmd);
+}
+
 struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
 {
 	struct scsi_host_sg_pool *sgp;
@@ -775,7 +825,7 @@ EXPORT_SYMBOL(scsi_free_sgtable);
  */
 static void scsi_release_buffers(struct scsi_cmnd *cmd)
 {
-	if (cmd->use_sg)
+	if (cmd->request_buffer)
 		scsi_free_sgtable(cmd->request_buffer, cmd->sglist_len);
 
 	/*
@@ -784,6 +834,15 @@ static void scsi_release_buffers(struct scsi_cmnd *cmd)
 	 */
 	cmd->request_buffer = NULL;
 	cmd->request_bufflen = 0;
+
+	if (scsi_bidi_cmnd(cmd)) {
+		struct scsi_cmnd* bidi_in = scsi_in(cmd);
+		if (bidi_in->request_buffer)
+			scsi_free_sgtable(bidi_in->request_buffer,
+			                           bidi_in->sglist_len);
+		bidi_in->request_buffer = NULL;
+		bidi_in->request_bufflen = 0;
+	}
 }
 
 /*
@@ -849,9 +908,15 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 				req->sense_len = len;
 			}
 		}
+		if (scsi_bidi_cmnd(cmd)) {
+			scsi_end_bidi_request(cmd);
+			return;
+		}
 		req->data_len = cmd->resid;
 	}
 
+	BUG_ON(blk_bidi_rq(req));
+
 	/*
 	 * Next deal with any sectors which we were able to correctly
 	 * handle.
@@ -978,17 +1043,18 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 EXPORT_SYMBOL(scsi_io_completion);
 
 /*
- * Function:    scsi_init_io()
+ * Function:    scsi_init_data_buf()
  *
- * Purpose:     SCSI I/O initialize function.
+ * Purpose:     SCSI I/O initialize helper.
+ *              maps the request buffers for the given cmd.
  *
- * Arguments:   cmd   - Command descriptor we wish to initialize
+ * Arguments:   cmd   - Command whos data we wish to initialize
  *
  * Returns:     0 on success
  *		BLKPREP_DEFER if the failure is retryable
  *		BLKPREP_KILL if the failure is fatal
  */
-static int scsi_init_io(struct scsi_cmnd *cmd)
+static int scsi_init_data_buff(struct scsi_cmnd *cmd)
 {
 	struct request     *req = cmd->request;
 	struct scatterlist *sgpnt;
@@ -1032,12 +1098,45 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
 	printk(KERN_ERR "req nr_sec %lu, cur_nr_sec %u\n", req->nr_sectors,
 			req->current_nr_sectors);
 
-	/* release the command and kill it */
-	scsi_release_buffers(cmd);
-	scsi_put_command(cmd);
 	return BLKPREP_KILL;
 }
 
+/*
+ * Function:    scsi_init_io()
+ *
+ * Purpose:     SCSI I/O initialize function.
+ *
+ * Arguments:   cmd   - Command descriptor we wish to initialize
+ *
+ * Returns:     0 on success
+ *		BLKPREP_DEFER if the failure is retryable
+ *		BLKPREP_KILL if the failure is fatal
+ */
+static int scsi_init_io(struct scsi_cmnd *cmd)
+{
+	struct request *req = cmd->request;
+	int error;
+
+	error = scsi_init_data_buff(cmd);
+	if (error)
+		goto err_exit;
+
+	if (scsi_bidi_cmnd(cmd)) {
+		error = scsi_init_data_buff(scsi_in(cmd));
+		if (error)
+			goto err_exit;
+	}
+
+	req->buffer = NULL;
+	return 0;
+
+err_exit:
+	scsi_release_buffers(cmd);
+	if (error == BLKPREP_KILL)
+		scsi_unprep_request(req);
+	return error;
+}
+
 static int scsi_issue_flush_fn(request_queue_t *q, struct gendisk *disk,
 			       sector_t *error_sector)
 {
@@ -1064,6 +1163,22 @@ static struct scsi_cmnd *scsi_get_cmd_from_req(struct scsi_device *sdev,
 		if (unlikely(!cmd))
 			return NULL;
 		req->special = cmd;
+		if (blk_bidi_rq(req)) {
+			struct scsi_cmnd *bidi_in_cmd;
+			/*
+			 * second bidi request must be blk_pc_request for use of
+			 * correct sizes.
+			 */
+			WARN_ON(!blk_pc_request(req->next_rq));
+
+			bidi_in_cmd = scsi_get_command(sdev, GFP_ATOMIC);
+			if (unlikely(!bidi_in_cmd)) {
+				scsi_put_command(cmd);
+				return NULL;
+			}
+			req->next_rq->special = bidi_in_cmd;
+			bidi_in_cmd->request = req->next_rq;
+		}
 	} else {
 		cmd = req->special;
 	}
@@ -1124,6 +1239,8 @@ static int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
 	cmd->cmd_len = req->cmd_len;
 	if (!req->data_len)
 		cmd->sc_data_direction = DMA_NONE;
+	else if (blk_bidi_rq(req))
+		cmd->sc_data_direction = DMA_BIDIRECTIONAL;
 	else if (rq_data_dir(req) == WRITE)
 		cmd->sc_data_direction = DMA_TO_DEVICE;
 	else
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index a2e0c10..1223412 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -2,11 +2,11 @@
 #define _SCSI_SCSI_CMND_H
 
 #include <linux/dma-mapping.h>
+#include <linux/blkdev.h>
 #include <linux/list.h>
 #include <linux/types.h>
 #include <linux/timer.h>
 
-struct request;
 struct scatterlist;
 struct Scsi_Host;
 struct scsi_device;
@@ -135,4 +135,21 @@ extern void scsi_kunmap_atomic_sg(void *virt);
 extern struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *, gfp_t);
 extern void scsi_free_sgtable(struct scatterlist *, int);
 
+static inline int scsi_bidi_cmnd(struct scsi_cmnd *cmd)
+{
+	return blk_bidi_rq(cmd->request) &&
+	       (cmd->request->next_rq->special != NULL);
+}
+
+static inline struct scsi_cmnd *scsi_in(struct scsi_cmnd *cmd)
+{
+	return scsi_bidi_cmnd(cmd) ?
+	       cmd->request->next_rq->special : cmd;
+}
+
+static inline struct scsi_cmnd *scsi_out(struct scsi_cmnd *cmd)
+{
+	return cmd;
+}
+
 #endif /* _SCSI_SCSI_CMND_H */
-- 
1.5.0.4.402.g8035


  reply	other threads:[~2007-05-08 18:55 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-05-08  2:25 [PATCH v2] add bidi support for block pc requests FUJITA Tomonori
2007-05-08 18:53 ` Boaz Harrosh [this message]
2007-05-08 20:01   ` James Bottomley
2007-05-09  7:46     ` Boaz Harrosh
2007-05-09 10:52       ` FUJITA Tomonori
2007-05-09 13:58         ` Boaz Harrosh
2007-05-09 14:54           ` FUJITA Tomonori
2007-05-09 14:55           ` James Bottomley
2007-05-09 16:54             ` Boaz Harrosh
2007-05-10  6:53               ` FUJITA Tomonori
2007-05-10  7:45                 ` FUJITA Tomonori
2007-05-10 12:37                   ` Boaz Harrosh
2007-05-10 13:01                     ` FUJITA Tomonori
2007-05-10 15:10                     ` Douglas Gilbert
2007-05-10 15:48                       ` Boaz Harrosh
2007-05-11 16:26                       ` James Bottomley
2007-05-16 17:29       ` Boaz Harrosh
2007-05-16 17:53         ` Jens Axboe
2007-05-16 18:06           ` James Bottomley
2007-05-16 18:13             ` Jens Axboe
2007-05-17  8:46               ` Boaz Harrosh
2007-05-17  2:57           ` FUJITA Tomonori
2007-05-17  5:48             ` Jens Axboe
2007-05-17  5:59               ` FUJITA Tomonori
2007-05-17  8:49                 ` Boaz Harrosh
2007-05-17 11:12                   ` FUJITA Tomonori
2007-05-17 17:37                     ` Benny Halevy
2007-05-24 16:37                     ` Boaz Harrosh
2007-05-24 16:46                       ` Boaz Harrosh
2007-05-24 16:47                       ` FUJITA Tomonori
2007-05-24 16:59                         ` Boaz Harrosh
2007-05-17 11:29                   ` FUJITA Tomonori
2007-05-17 13:27                   ` James Bottomley
2007-05-17 14:00                     ` Boaz Harrosh
2007-05-17 14:11                       ` FUJITA Tomonori
2007-05-17 15:49                         ` Boaz Harrosh
2007-06-01 20:21                           ` Jeff Garzik
2007-06-03  7:45                             ` Boaz Harrosh
2007-06-03 13:17                               ` James Bottomley
2007-07-07 15:27                                 ` Jeff Garzik
2007-07-07 15:42                                   ` James Bottomley
2007-07-07 16:59                                     ` Jeff Garzik
2007-05-09 10:49     ` FUJITA Tomonori

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4640C724.8030409@panasas.com \
    --to=bharrosh@panasas.com \
    --cc=James.Bottomley@SteelEye.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhalevy@panasas.com \
    --cc=fujita.tomonori@lab.ntt.co.jp \
    --cc=hch@infradead.org \
    --cc=jens.axboe@oracle.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=michaelc@cs.wisc.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.