All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boaz Harrosh <bharrosh@panasas.com>
To: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>,
	bhalevy@panasas.com, James.Bottomley@SteelEye.com
Cc: tomof@acm.org, jens.axboe@oracle.com, linux-scsi@vger.kernel.org
Subject: Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
Date: Wed, 25 Jul 2007 22:50:47 +0300	[thread overview]
Message-ID: <46A7A997.4000709@panasas.com> (raw)
In-Reply-To: <20070725174258T.fujita.tomonori@lab.ntt.co.jp>

FUJITA Tomonori wrote:
> From: Benny Halevy <bhalevy@panasas.com>
> Subject: Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
> Date: Wed, 25 Jul 2007 11:26:44 +0300
> 
>>> However, I'm perfectly happy to go with whatever the empirical evidence
>>> says is best .. and hopefully, now we don't have to pick this once and
>>> for all time ... we can alter it if whatever is chosen proves to be
>>> suboptimal.
>> I agree.  This isn't a catholic marriage :)
>> We'll run some performance experiments comparing the sgtable chaining
>> implementation vs. a scsi_data_buff implementation pointing
>> at a possibly chained sglist and let's see if we can measure
>> any difference.  We'll send results as soon as we have them.
> 
> I did some tests with your sgtable patchset and the approach to use
> separate buffer for sglists. As expected, there was no performance
> difference with small I/Os. I've not tried very large I/Os, which
> might give some difference.
> 
> The patchset to use separate buffer for sglists is available:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-bidi.git simple-sgtable
> 
> 
> Can you clean up your patchset and upload somewhere?

Sorry sure. Here is scsi_sgtable complete work over linux-block:
http://www.bhalevy.com/open-osd/download/scsi_sgtable/linux-block
 
Here is just scsi_sgtable, no chaining, over scsi-misc + more
drivers:
http://www.bhalevy.com/open-osd/download/scsi_sgtable/scsi-misc

Next week I will try to mount lots of scsi_debug devices and
use large parallel IO to try and find a difference. I will
test Jens's sglist-arch tree against above sglist-arch+scsi_sgtable

I have lots of reservations about Tomo's last patches. For me
they are a regression. They use 3 allocations per command instead
of 2. They use an extra pointer and an extra global slab_pool. And
for what, for grouping some scsi_cmnd members in a substructure.
If we want to go the "pointing" way, keeping our extra scatterlist
and our base_2 count on most ARCHs. Than we can just use the 
scsi_data_buff embedded inside scsi_cmnd. 

The second scsi_data_buff for bidi can come either from an extra 
slab_pool like in Tomo's patch - bidi can pay - or in scsi.c at 
scsi_setup_command_freelist() the code can inspect Tomo's flag 
to the request_queue QUEUE_FLAG_BIDI and will than allocate a 
bigger scsi_cmnd in the free list.

I have coded that approach and it is very simple:
http://www.bhalevy.com/open-osd/download/scsi_data_buff

They are over Jens's sglist-arch branch
I have revised all scsi-ml places and it all compiles
But is totally untested.

I will add this branch to the above tests, but I suspect that
they are identical in every way to current code.

For review here is the main scsi_data_buff patch:
------
From: Boaz Harrosh <bharrosh@panasas.com>
Date: Wed, 25 Jul 2007 20:19:14 +0300
Subject: [PATCH] SCSI: scsi_data_buff

  In preparation for bidi we abstract all IO members of scsi_cmnd
  that will need to duplicate into a substructure.
  - Group all IO members of scsi_cmnd into a scsi_data_buff
    structure.
  - Adjust accessors to new members.
  - scsi_{alloc,free}_sgtable receive a scsi_data_buff instead of
    scsi_cmnd. And work on it. (Supporting chaining like before)
  - Adjust scsi_init_io() and  scsi_release_buffers() for above
    change.
  - Fix other parts of scsi_lib to members migration. Use accessors
    where appropriate.

 Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 drivers/scsi/scsi_lib.c  |   68 +++++++++++++++++++--------------------------
 include/scsi/scsi_cmnd.h |   34 +++++++++++-----------
 2 files changed, 46 insertions(+), 56 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index d62b184..2b8a865 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -714,16 +714,14 @@ static unsigned scsi_sgtable_index(unsigned nents)
 	return -1;
 }
 
-struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
+struct scatterlist *scsi_alloc_sgtable(struct scsi_data_buff *sdb, gfp_t gfp_mask)
 {
 	struct scsi_host_sg_pool *sgp;
 	struct scatterlist *sgl, *prev, *ret;
 	unsigned int index;
 	int this, left;
 
-	BUG_ON(!cmd->use_sg);
-
-	left = cmd->use_sg;
+	left = sdb->sg_count;
 	ret = prev = NULL;
 	do {
 		this = left;
@@ -747,7 +745,7 @@ struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
 		 * first loop through, set initial index and return value
 		 */
 		if (!ret) {
-			cmd->sg_pool = index;
+			sdb->sg_pool = index;
 			ret = sgl;
 		}
 
@@ -769,10 +767,10 @@ struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
 	} while (left);
 
 	/*
-	 * ->use_sg may get modified after dma mapping has potentially
+	 * sdb->sg_count may get modified after blk_rq_map_sg() potentially
 	 * shrunk the number of segments, so keep a copy of it for free.
 	 */
-	cmd->__use_sg = cmd->use_sg;
+	sdb->sgs_allocated = sdb->sg_count;
 	return ret;
 enomem:
 	if (ret) {
@@ -797,21 +795,21 @@ enomem:
 
 EXPORT_SYMBOL(scsi_alloc_sgtable);
 
-void scsi_free_sgtable(struct scsi_cmnd *cmd)
+void scsi_free_sgtable(struct scsi_data_buff *sdb)
 {
-	struct scatterlist *sgl = cmd->request_buffer;
+	struct scatterlist *sgl = sdb->sglist;
 	struct scsi_host_sg_pool *sgp;
 
 	/*
 	 * if this is the biggest size sglist, check if we have
 	 * chained parts we need to free
 	 */
-	if (cmd->__use_sg > SCSI_MAX_SG_SEGMENTS) {
+	if (sdb->sgs_allocated > SCSI_MAX_SG_SEGMENTS) {
 		unsigned short this, left;
 		struct scatterlist *next;
 		unsigned int index;
 
-		left = cmd->__use_sg - (SCSI_MAX_SG_SEGMENTS - 1);
+		left = sdb->sgs_allocated - (SCSI_MAX_SG_SEGMENTS - 1);
 		next = sg_chain_ptr(&sgl[SCSI_MAX_SG_SEGMENTS - 1]);
 		while (left && next) {
 			sgl = next;
@@ -833,7 +831,7 @@ void scsi_free_sgtable(struct scsi_cmnd *cmd)
 		}
 	}
 
-	mempool_free(cmd->request_buffer, scsi_sg_pools[cmd->sg_pool].pool);
+	mempool_free(sdb->sglist, scsi_sg_pools[sdb->sg_pool].pool);
 }
 
 EXPORT_SYMBOL(scsi_free_sgtable);
@@ -857,16 +855,10 @@ EXPORT_SYMBOL(scsi_free_sgtable);
  */
 static void scsi_release_buffers(struct scsi_cmnd *cmd)
 {
-	if (cmd->use_sg)
-		scsi_free_sgtable(cmd);
+	if (cmd->sdb.sglist)
+		scsi_free_sgtable(&cmd->sdb);
 
-	/*
-	 * Zero these out.  They now point to freed memory, and it is
-	 * dangerous to hang onto the pointers.
-	 */
-	cmd->request_buffer = NULL;
-	cmd->request_bufflen = 0;
-	cmd->use_sg = 0;
+	memset(&cmd->sdb, 0, sizeof(cmd->sdb));
 }
 
 /*
@@ -900,7 +892,7 @@ static void scsi_release_buffers(struct scsi_cmnd *cmd)
 void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 {
 	int result = cmd->result;
-	int this_count = cmd->request_bufflen;
+	int this_count = scsi_bufflen(cmd);
 	request_queue_t *q = cmd->device->request_queue;
 	struct request *req = cmd->request;
 	int clear_errors = 1;
@@ -908,8 +900,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	int sense_valid = 0;
 	int sense_deferred = 0;
 
-	scsi_release_buffers(cmd);
-
 	if (result) {
 		sense_valid = scsi_command_normalize_sense(cmd, &sshdr);
 		if (sense_valid)
@@ -932,9 +922,11 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 				req->sense_len = len;
 			}
 		}
-		req->data_len = cmd->resid;
+		req->data_len = scsi_get_resid(cmd);
 	}
 
+	scsi_release_buffers(cmd);
+
 	/*
 	 * Next deal with any sectors which we were able to correctly
 	 * handle.
@@ -942,7 +934,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	SCSI_LOG_HLCOMPLETE(1, printk("%ld sectors total, "
 				      "%d bytes done.\n",
 				      req->nr_sectors, good_bytes));
-	SCSI_LOG_HLCOMPLETE(1, printk("use_sg is %d\n", cmd->use_sg));
 
 	if (clear_errors)
 		req->errors = 0;
@@ -1075,41 +1066,42 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
 {
 	struct request     *req = cmd->request;
 	int		   count;
+	struct scsi_data_buff* sdb = &cmd->sdb;
 
 	/*
 	 * We used to not use scatter-gather for single segment request,
 	 * but now we do (it makes highmem I/O easier to support without
 	 * kmapping pages)
 	 */
-	cmd->use_sg = req->nr_phys_segments;
+	sdb->sg_count = req->nr_phys_segments;
 
 	/*
 	 * If sg table allocation fails, requeue request later.
 	 */
-	cmd->request_buffer = scsi_alloc_sgtable(cmd, GFP_ATOMIC);
-	if (unlikely(!cmd->request_buffer)) {
+	sdb->sglist = scsi_alloc_sgtable(sdb, GFP_ATOMIC);
+	if (unlikely(!sdb->sglist)) {
 		scsi_unprep_request(req);
 		return BLKPREP_DEFER;
 	}
 
 	req->buffer = NULL;
 	if (blk_pc_request(req))
-		cmd->request_bufflen = req->data_len;
+		sdb->length = req->data_len;
 	else
-		cmd->request_bufflen = req->nr_sectors << 9;
+		sdb->length = req->nr_sectors << 9;
 
 	/* 
 	 * Next, walk the list, and fill in the addresses and sizes of
 	 * each segment.
 	 */
-	count = blk_rq_map_sg(req->q, req, cmd->request_buffer);
-	if (likely(count <= cmd->use_sg)) {
-		cmd->use_sg = count;
+	count = blk_rq_map_sg(req->q, req, sdb->sglist);
+	if (likely(count <= sdb->sg_count)) {
+		sdb->sg_count = count;
 		return BLKPREP_OK;
 	}
 
 	printk(KERN_ERR "Incorrect number of segments after building list\n");
-	printk(KERN_ERR "counted %d, received %d\n", count, cmd->use_sg);
+	printk(KERN_ERR "counted %d, received %d\n", count, sdb->sg_count);
 	printk(KERN_ERR "req nr_sec %lu, cur_nr_sec %u\n", req->nr_sectors,
 			req->current_nr_sectors);
 
@@ -1165,7 +1157,7 @@ static void scsi_blk_pc_done(struct scsi_cmnd *cmd)
 	 * successfully. Since this is a REQ_BLOCK_PC command the
 	 * caller should check the request's errors value
 	 */
-	scsi_io_completion(cmd, cmd->request_bufflen);
+	scsi_io_completion(cmd, scsi_bufflen(cmd));
 }
 
 static int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
@@ -1194,9 +1186,7 @@ static int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
 		BUG_ON(req->data_len);
 		BUG_ON(req->data);
 
-		cmd->request_bufflen = 0;
-		cmd->request_buffer = NULL;
-		cmd->use_sg = 0;
+		memset(&cmd->sdb, 0, sizeof(cmd->sdb));
 		req->buffer = NULL;
 	}
 
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index c7bfc60..89187dc 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -11,6 +11,14 @@ struct scatterlist;
 struct Scsi_Host;
 struct scsi_device;
 
+struct scsi_data_buff {
+	unsigned length;
+	int resid;
+	short sg_count;
+	short sgs_allocated;
+	short sg_pool;
+	struct scatterlist* sglist;
+};
 
 /* embedded in scsi_cmnd */
 struct scsi_pointer {
@@ -64,16 +72,10 @@ struct scsi_cmnd {
 	/* These elements define the operation we are about to perform */
 #define MAX_COMMAND_SIZE	16
 	unsigned char cmnd[MAX_COMMAND_SIZE];
-	unsigned request_bufflen;	/* Actual request size */
 
 	struct timer_list eh_timeout;	/* Used to time out the command. */
-	void *request_buffer;		/* Actual requested buffer */
 
 	/* These elements define the operation we ultimately want to perform */
-	unsigned short use_sg;	/* Number of pieces of scatter-gather */
-	unsigned short sg_pool;	/* pool index of allocated sg array */
-	unsigned short __use_sg;
-
 	unsigned underflow;	/* Return error if less than
 				   this amount is transferred */
 
@@ -83,10 +85,6 @@ struct scsi_cmnd {
 				   reconnects.   Probably == sector
 				   size */
 
-	int resid;		/* Number of bytes requested to be
-				   transferred less actual number
-				   transferred (0 if not supported) */
-
 	struct request *request;	/* The command we are
 				   	   working on */
 
@@ -118,6 +116,8 @@ struct scsi_cmnd {
 
 	unsigned char tag;	/* SCSI-II queued command tag */
 	unsigned long pid;	/* Process ID, starts at 0. Unique per host. */
+
+	struct scsi_data_buff sdb;
 };
 
 extern struct scsi_cmnd *scsi_get_command(struct scsi_device *, gfp_t);
@@ -133,35 +133,35 @@ extern void *scsi_kmap_atomic_sg(struct scatterlist *sg, int sg_count,
 				 size_t *offset, size_t *len);
 extern void scsi_kunmap_atomic_sg(void *virt);
 
-extern struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *, gfp_t);
-extern void scsi_free_sgtable(struct scsi_cmnd *);
+extern struct scatterlist *scsi_alloc_sgtable(struct scsi_data_buff *, gfp_t);
+extern void scsi_free_sgtable(struct scsi_data_buff *);
 
 extern int scsi_dma_map(struct scsi_cmnd *cmd);
 extern void scsi_dma_unmap(struct scsi_cmnd *cmd);
 
 static inline unsigned scsi_sg_count(struct scsi_cmnd *cmd)
 {
-	return cmd->use_sg;
+	return cmd->sdb.sg_count;
 }
 
 static inline struct scatterlist *scsi_sglist(struct scsi_cmnd *cmd)
 {
-	return ((struct scatterlist *)cmd->request_buffer);
+	return cmd->sdb.sglist;
 }
 
 static inline unsigned scsi_bufflen(struct scsi_cmnd *cmd)
 {
-	return cmd->request_bufflen;
+	return cmd->sdb.length;
 }
 
 static inline void scsi_set_resid(struct scsi_cmnd *cmd, int resid)
 {
-	cmd->resid = resid;
+	cmd->sdb.resid = resid;
 }
 
 static inline int scsi_get_resid(struct scsi_cmnd *cmd)
 {
-	return cmd->resid;
+	return cmd->sdb.resid;
 }
 
 #define scsi_for_each_sg(cmd, sg, nseg, __i)			\
-- 
1.5.2.2.249.g45fd


      parent reply	other threads:[~2007-07-25 19:51 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-07-24  8:47 [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining Boaz Harrosh
2007-07-24  8:52 ` [PATCH AB1/5] SCSI: SG pools allocation cleanup Boaz Harrosh
2007-07-24 13:08   ` Boaz Harrosh
2007-07-25  8:08   ` Boaz Harrosh
2007-07-25  9:05     ` [PATCH AB1/5 ver2] " Boaz Harrosh
2007-07-25  9:06     ` [PATCH A2/5 ver2] SCSI: scsi_sgtable implementation Boaz Harrosh
2007-07-24  8:56 ` [PATCH A2/5] " Boaz Harrosh
2007-07-24  8:59 ` [PATCH A3/5] SCSI: sg-chaining over scsi_sgtable Boaz Harrosh
2007-07-24  9:01 ` [PATCH B2/5] SCSI: support for allocating large scatterlists Boaz Harrosh
2007-07-24  9:03 ` [PATCH B3/5] SCSI: scsi_sgtable over sg-chainning Boaz Harrosh
2007-07-24  9:16 ` [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining FUJITA Tomonori
2007-07-24 10:01   ` Boaz Harrosh
2007-07-24 11:12     ` FUJITA Tomonori
2007-07-24 13:41       ` FUJITA Tomonori
2007-07-24 14:01         ` Benny Halevy
2007-07-24 16:10           ` James Bottomley
2007-07-25  8:26             ` Benny Halevy
2007-07-25  8:42               ` FUJITA Tomonori
2007-07-25 19:22                 ` Boaz Harrosh
2007-07-26 11:33                   ` FUJITA Tomonori
2007-07-31 20:12                   ` Boaz Harrosh
2007-08-05 16:03                     ` FUJITA Tomonori
2007-08-06  7:22                     ` FUJITA Tomonori
2007-08-07  6:55                       ` Jens Axboe
2007-08-07  8:36                         ` FUJITA Tomonori
2007-08-08  7:16                           ` Jens Axboe
2007-07-25 19:50                 ` Boaz Harrosh [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=46A7A997.4000709@panasas.com \
    --to=bharrosh@panasas.com \
    --cc=James.Bottomley@SteelEye.com \
    --cc=bhalevy@panasas.com \
    --cc=fujita.tomonori@lab.ntt.co.jp \
    --cc=jens.axboe@oracle.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=tomof@acm.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.