linux-scsi.vger.kernel.org archive mirror
* [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
@ 2007-07-24  8:47 Boaz Harrosh
  2007-07-24  8:52 ` [PATCH AB1/5] SCSI: SG pools allocation cleanup Boaz Harrosh
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Boaz Harrosh @ 2007-07-24  8:47 UTC (permalink / raw)
  To: James Bottomley, Jens Axboe, FUJITA Tomonori, linux-scsi

As Jens said, there is nothing common between scsi_sgtable and
sglists, save the fact that they conflict massively at scsi-ml:
they touch all the same places.

Proposed here is a simple way out: two patchsets that produce the
same output at the end.

One: scsi_sgtable_then_sg-chaining
Two: sg-chaining_then_scsi_sgtable

They are laid out like this:

common to both
  [PATCH AB1/5] SCSI: SG pools allocation cleanup

scsi_sgtable_then_sg-chaining:
  [PATCH A2/5] SCSI: scsi_sgtable implementation
  [PATCH A3/5] SCSI: sg-chaining over scsi_sgtable

sg-chaining_then_scsi_sgtable:
  [PATCH B2/5] SCSI: support for allocating large scatterlists
  [PATCH B3/5] SCSI: scsi_sgtable over sg-chaining

Both patchsets are based on linux-2.6-block sglist-drivers (or sglist-arch)
on top of a:
  git-revert 630cfe6f8a137a3d494aefda1e62bc4b9ba02f58
  (a revert of "SCSI: support for allocating large scatterlists")

The first patch will not apply on scsi-misc-2.6 because there is a
missing NULL in the call to kmem_cache_create() (linux-2.6.23-rcx).
(If anyone needs a patchset for that, please ask.)

What we need to do first is agree on the combined outcome of both
sgtable and sglist chaining. The easiest way is to look at the first
patchset (AB1/5, A2/5, A3/5); it is the most incremental and the
easiest to read.
 
(Note: This patchset is part of a larger patchset as I sent in my
last RFC "scsi-ml: scsi_sgtable implementation". You can find a
revised set here: www.bhalevy.com/open-osd/downloads/scsi_sgtable)

If we all agree on the combined outcome, then we can proceed to decide
which route to take.

What I would like to ask of Jens: please review the first two patches
of the B patchset, AB1/5 and B2/5. Together they are almost exactly like
your reverted patch above, but a bit more friendly to the scsi_sgtable
at the end.

For James: the first patch [AB1/5] is a good patch that solves real-world
problems today, like HIGHMEM64G=y, and maximizes allocations for all
kinds of arches. Maybe it could be applied to the tree now for the next
merge window, regardless of what is done with the rest.

I'm just starting to test large IOs on top of iSCSI, so some bugs will
probably be found.

Thanks everybody
Boaz


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH AB1/5] SCSI: SG pools allocation cleanup.
  2007-07-24  8:47 [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining Boaz Harrosh
@ 2007-07-24  8:52 ` Boaz Harrosh
  2007-07-24 13:08   ` Boaz Harrosh
  2007-07-25  8:08   ` Boaz Harrosh
  2007-07-24  8:56 ` [PATCH A2/5] " Boaz Harrosh
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Boaz Harrosh @ 2007-07-24  8:52 UTC (permalink / raw)
  To: James Bottomley, Jens Axboe, FUJITA Tomonori, linux-scsi


  - The code automatically calculates, at compile time, the maximum
    sg-array size that will fit in a memory page, and allocates pools of
    power-of-2 sizes up to that maximum.
  - Split scsi_alloc_sgtable() into a helper scsi_sgtable_index() that
    returns the index of the pool for a given sg_count.
  - Remove the now unused SCSI_MAX_PHYS_SEGMENTS.
  - Rename sglist_len to sg_pool, which is what it always was.
  - Add some extra prints at scsi_init_queue(). These prints will be
    removed once everything stabilizes.

  Now that the arrays are automatically calculated to fit in a page, what
  about arches that have a very big page size? I have, just for
  demonstration, calculated up to 512 entries. But I suspect that other
  kernel subsystems are bounded by 256 or 128; is that true? Should I
  allow more or fewer than 512 here?

some common numbers:
Arch                      | SCSI_MAX_SG_SEGMENTS | sizeof(struct scatterlist)
--------------------------|----------------------|---------------------------
x86_64                    | 128                  | 32
i386 CONFIG_HIGHMEM64G=y  | 205                  | 20
i386 other                | 256                  | 16

  Could someone give example numbers for an arch with a big page size?
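As a side note, the compile-time pool sizing described above can be
mimicked in plain user-space C. The function names and the segment
limits below are illustrative stand-ins for the patch's SG_MEMPOOL_NR
enum and scsi_sgtable_index(), not the kernel's own definitions:

```c
#include <assert.h>

/* Count how many power-of-2 pools (8, 16, ... capped at 512) fit under
 * the per-page maximum; mirrors the SG_MEMPOOL_NR enum in the patch. */
static unsigned sg_mempool_nr(unsigned max_segs)
{
	unsigned n = 0, size;

	for (size = 8; size <= max_segs && size <= 512; size <<= 1)
		n++;
	return n;
}

/* Pick the smallest pool that fits a request of nents entries;
 * mirrors scsi_sgtable_index(). */
static unsigned sg_pool_index(unsigned nents, unsigned nr_pools)
{
	unsigned i, size;

	for (i = 0, size = 8; i < nr_pools - 1; i++, size <<= 1)
		if (size >= nents)
			return i;
	return nr_pools - 1;	/* the largest pool serves the rest */
}
```

With the i386 HIGHMEM64G numbers from the table above (205 entries per
page), this yields 5 pools of 8 through 128 entries.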

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 drivers/scsi/scsi_lib.c  |  142 +++++++++++++++++++++-------------------------
 include/scsi/scsi.h      |    7 --
 include/scsi/scsi_cmnd.h |   19 +++++-
 3 files changed, 79 insertions(+), 89 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 5edadfe..71532f9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -30,40 +30,31 @@
 #include "scsi_priv.h"
 #include "scsi_logging.h"
 
-
-#define SG_MEMPOOL_NR		ARRAY_SIZE(scsi_sg_pools)
 #define SG_MEMPOOL_SIZE		2
 
+/*
+ * Should fit within a single page.
+ */
+enum { SCSI_MAX_SG_SEGMENTS = (PAGE_SIZE / sizeof(struct scatterlist)) };
+
+enum { SG_MEMPOOL_NR =
+	(SCSI_MAX_SG_SEGMENTS >= 8) +
+	(SCSI_MAX_SG_SEGMENTS >= 16) +
+	(SCSI_MAX_SG_SEGMENTS >= 32) +
+	(SCSI_MAX_SG_SEGMENTS >= 64) +
+	(SCSI_MAX_SG_SEGMENTS >= 128) +
+	(SCSI_MAX_SG_SEGMENTS >= 256) +
+	(SCSI_MAX_SG_SEGMENTS >= 512)
+};
+
 struct scsi_host_sg_pool {
-	size_t		size;
-	char		*name; 
+	unsigned		size;
 	struct kmem_cache	*slab;
 	mempool_t	*pool;
 };
+static struct scsi_host_sg_pool scsi_sg_pools[SG_MEMPOOL_NR];
+
 
-#if (SCSI_MAX_PHYS_SEGMENTS < 32)
-#error SCSI_MAX_PHYS_SEGMENTS is too small
-#endif
-
-#define SP(x) { x, "sgpool-" #x } 
-static struct scsi_host_sg_pool scsi_sg_pools[] = {
-	SP(8),
-	SP(16),
-	SP(32),
-#if (SCSI_MAX_PHYS_SEGMENTS > 32)
-	SP(64),
-#if (SCSI_MAX_PHYS_SEGMENTS > 64)
-	SP(128),
-#if (SCSI_MAX_PHYS_SEGMENTS > 128)
-	SP(256),
-#if (SCSI_MAX_PHYS_SEGMENTS > 256)
-#error SCSI_MAX_PHYS_SEGMENTS is too large
-#endif
-#endif
-#endif
-#endif
-}; 	
-#undef SP
 
 static void scsi_run_queue(struct request_queue *q);
 
@@ -703,44 +694,32 @@ static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int uptodate,
 	return NULL;
 }
 
+static unsigned scsi_sgtable_index(unsigned nents)
+{
+	int i, size;
+
+	for (i = 0, size = 8; i < SG_MEMPOOL_NR-1; i++, size <<= 1)
+		if (size >= nents)
+			return i;
+
+	if (SCSI_MAX_SG_SEGMENTS >= nents)
+		return SG_MEMPOOL_NR-1;
+
+	printk(KERN_ERR "scsi: bad segment count=%d\n", nents);
+	BUG();
+	return -1;
+}
+
 struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
 {
-	struct scsi_host_sg_pool *sgp;
+	unsigned int pool = scsi_sgtable_index(cmd->use_sg);
 	struct scatterlist *sgl;
 
-	BUG_ON(!cmd->use_sg);
-
-	switch (cmd->use_sg) {
-	case 1 ... 8:
-		cmd->sglist_len = 0;
-		break;
-	case 9 ... 16:
-		cmd->sglist_len = 1;
-		break;
-	case 17 ... 32:
-		cmd->sglist_len = 2;
-		break;
-#if (SCSI_MAX_PHYS_SEGMENTS > 32)
-	case 33 ... 64:
-		cmd->sglist_len = 3;
-		break;
-#if (SCSI_MAX_PHYS_SEGMENTS > 64)
-	case 65 ... 128:
-		cmd->sglist_len = 4;
-		break;
-#if (SCSI_MAX_PHYS_SEGMENTS  > 128)
-	case 129 ... 256:
-		cmd->sglist_len = 5;
-		break;
-#endif
-#endif
-#endif
-	default:
+	sgl = mempool_alloc(scsi_sg_pools[pool].pool, gfp_mask);
+	if (unlikely(!sgl))
 		return NULL;
-	}
 
-	sgp = scsi_sg_pools + cmd->sglist_len;
-	sgl = mempool_alloc(sgp->pool, gfp_mask);
+	cmd->sg_pool = pool;
 	return sgl;
 }
 
@@ -748,13 +727,7 @@ EXPORT_SYMBOL(scsi_alloc_sgtable);
 
 void scsi_free_sgtable(struct scsi_cmnd *cmd)
 {
-	struct scatterlist *sgl = cmd->request_buffer;
-	struct scsi_host_sg_pool *sgp;
-
-	BUG_ON(cmd->sglist_len >= SG_MEMPOOL_NR);
-
-	sgp = scsi_sg_pools + cmd->sglist_len;
-	mempool_free(sgl, sgp->pool);
+	mempool_free(cmd->request_buffer, scsi_sg_pools[cmd->sg_pool].pool);
 }
 
 EXPORT_SYMBOL(scsi_free_sgtable);
@@ -787,6 +760,7 @@ static void scsi_release_buffers(struct scsi_cmnd *cmd)
 	 */
 	cmd->request_buffer = NULL;
 	cmd->request_bufflen = 0;
+	cmd->use_sg = 0;
 }
 
 /*
@@ -994,7 +968,6 @@ EXPORT_SYMBOL(scsi_io_completion);
 static int scsi_init_io(struct scsi_cmnd *cmd)
 {
 	struct request     *req = cmd->request;
-	struct scatterlist *sgpnt;
 	int		   count;
 
 	/*
@@ -1007,14 +980,13 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
 	/*
 	 * If sg table allocation fails, requeue request later.
 	 */
-	sgpnt = scsi_alloc_sgtable(cmd, GFP_ATOMIC);
-	if (unlikely(!sgpnt)) {
+	cmd->request_buffer = scsi_alloc_sgtable(cmd, GFP_ATOMIC);
+	if (unlikely(!cmd->request_buffer)) {
 		scsi_unprep_request(req);
 		return BLKPREP_DEFER;
 	}
 
 	req->buffer = NULL;
-	cmd->request_buffer = (char *) sgpnt;
 	if (blk_pc_request(req))
 		cmd->request_bufflen = req->data_len;
 	else
@@ -1579,7 +1551,7 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
 		return NULL;
 
 	blk_queue_max_hw_segments(q, shost->sg_tablesize);
-	blk_queue_max_phys_segments(q, SCSI_MAX_PHYS_SEGMENTS);
+	blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS);
 	blk_queue_max_sectors(q, shost->max_sectors);
 	blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
 	blk_queue_segment_boundary(q, shost->dma_boundary);
@@ -1658,9 +1630,15 @@ void scsi_unblock_requests(struct Scsi_Host *shost)
 }
 EXPORT_SYMBOL(scsi_unblock_requests);
 
+const char* sg_names[] = {
+	"sgpool-8", "sgpool-16", "sgpool-32", "sgpool-64",
+	"sgpool-128", "sgpool-256", "sgpool-512"
+};
+
 int __init scsi_init_queue(void)
 {
 	int i;
+	unsigned size;
 
 	scsi_io_context_cache = kmem_cache_create("scsi_io_context",
 					sizeof(struct scsi_io_context),
@@ -1670,25 +1648,33 @@ int __init scsi_init_queue(void)
 		return -ENOMEM;
 	}
 
-	for (i = 0; i < SG_MEMPOOL_NR; i++) {
+	for (i = 0, size = 8; i < SG_MEMPOOL_NR; i++, size <<= 1) {
 		struct scsi_host_sg_pool *sgp = scsi_sg_pools + i;
-		int size = sgp->size * sizeof(struct scatterlist);
-
-		sgp->slab = kmem_cache_create(sgp->name, size, 0,
-				SLAB_HWCACHE_ALIGN, NULL);
+		sgp->size = size;
+		sgp->slab = kmem_cache_create(sg_names[i],
+				sgp->size*sizeof(struct scatterlist),
+				0, 0, NULL);
 		if (!sgp->slab) {
 			printk(KERN_ERR "SCSI: can't init sg slab %s\n",
-					sgp->name);
+					sg_names[i]);
 		}
 
 		sgp->pool = mempool_create_slab_pool(SG_MEMPOOL_SIZE,
 						     sgp->slab);
 		if (!sgp->pool) {
 			printk(KERN_ERR "SCSI: can't init sg mempool %s\n",
-					sgp->name);
+					sg_names[i]);
 		}
 	}
 
+	/* FIXME: Here for the debugging phase only */
+	printk(KERN_ERR
+		"SCSI: max_sg_count=%d SG_MEMPOOL_NR=%d page=%ld "
+		"so_scaterlist=%Zd\n",
+		SCSI_MAX_SG_SEGMENTS, SG_MEMPOOL_NR, PAGE_SIZE,
+		sizeof(struct scatterlist)
+	);
+
 	return 0;
 }
 
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index 9f8f80a..702fcfe 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -11,13 +11,6 @@
 #include <linux/types.h>
 
 /*
- *	The maximum sg list length SCSI can cope with
- *	(currently must be a power of 2 between 32 and 256)
- */
-#define SCSI_MAX_PHYS_SEGMENTS	MAX_PHYS_SEGMENTS
-
-
-/*
  *	SCSI command lengths
  */
 
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 760d4a5..279a4df 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -71,7 +71,7 @@ struct scsi_cmnd {
 
 	/* These elements define the operation we ultimately want to perform */
 	unsigned short use_sg;	/* Number of pieces of scatter-gather */
-	unsigned short sglist_len;	/* size of malloc'd scatter-gather list */
+	unsigned short sg_pool;	/* pool index of allocated sg array */
 
 	unsigned underflow;	/* Return error if less than
 				   this amount is transferred */
@@ -138,9 +138,20 @@ extern void scsi_free_sgtable(struct scsi_cmnd *);
 extern int scsi_dma_map(struct scsi_cmnd *cmd);
 extern void scsi_dma_unmap(struct scsi_cmnd *cmd);
 
-#define scsi_sg_count(cmd) ((cmd)->use_sg)
-#define scsi_sglist(cmd) ((struct scatterlist *)(cmd)->request_buffer)
-#define scsi_bufflen(cmd) ((cmd)->request_bufflen)
+static inline unsigned scsi_sg_count(struct scsi_cmnd *cmd)
+{
+	return cmd->use_sg;
+}
+
+static inline struct scatterlist *scsi_sglist(struct scsi_cmnd *cmd)
+{
+	return ((struct scatterlist *)cmd->request_buffer);
+}
+
+static inline unsigned scsi_bufflen(struct scsi_cmnd *cmd)
+{
+	return cmd->request_bufflen;
+}
 
 static inline void scsi_set_resid(struct scsi_cmnd *cmd, int resid)
 {
-- 
1.5.2.2.249.g45fd




* [PATCH A2/5] SCSI: scsi_sgtable implementation
  2007-07-24  8:47 [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining Boaz Harrosh
  2007-07-24  8:52 ` [PATCH AB1/5] SCSI: SG pools allocation cleanup Boaz Harrosh
@ 2007-07-24  8:56 ` Boaz Harrosh
  2007-07-24  8:59 ` [PATCH A3/5] SCSI: sg-chaining over scsi_sgtable Boaz Harrosh
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Boaz Harrosh @ 2007-07-24  8:56 UTC (permalink / raw)
  To: James Bottomley, Jens Axboe, FUJITA Tomonori, linux-scsi


  As proposed by James Bottomley, all I/O members of struct scsi_cmnd,
  together with the resid member, need to be duplicated for bidirectional
  transfers, and can be allocated together with the sg-list they point
  to. This way, when bidi comes, the structure can be duplicated with
  minimal change to the code, and with no extra baggage when bidi is not
  used. The resulting mechanism is called scsi_sgtable.

  scsi_cmnd.h
  - Define a new scsi_sgtable structure that will hold the IO descriptors
    plus the actual scatter-gather array.
  - Hold a pointer to the scsi_sgtable in scsi_cmnd.
  - Deprecate old, now unnecessary, IO members of scsi_cmnd. These members are
    kept for compatibility with unconverted drivers, still lurking around in
    the code tree. They can be removed completely in the future.
  - Modify data accessors to now use new members of scsi_sgtable.

  scsi_lib.c
  - scsi_lib is converted to use the new scsi_sgtable instead of the old
    members and sg-arrays.
  - scsi_{alloc,free}_sgtable() API has changed. This will break scsi_stgt
    which will need to be converted to new implementation.
  - Special code is inserted to initialize the old compatibility members from
    the new structures. This code will be removed.
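A minimal user-space sketch of the layout this message describes (one
allocation holding the IO descriptors plus the sg array) might look like
the following; struct scatterlist and the helper names here are dummy
placeholders, not the kernel's definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct scatterlist {		/* placeholder, not the kernel's layout */
	unsigned long page_link;
	unsigned int offset, length;
};

struct scsi_sgtable {
	unsigned length;	/* total transfer length */
	int resid;		/* bytes requested but not transferred */
	short sg_count;
	short sg_pool;
	struct scatterlist sglist[];	/* sg array tails the header */
};

#define SG_TABLE_SIZEOF(n) \
	((n) * sizeof(struct scatterlist) + sizeof(struct scsi_sgtable))

static struct scsi_sgtable *alloc_sgtable(short sg_count)
{
	struct scsi_sgtable *sgt = calloc(1, SG_TABLE_SIZEOF(sg_count));

	if (sgt)
		sgt->sg_count = sg_count;
	return sgt;
}

/* The owning table can be recovered from its sg array pointer, which
 * relies on sglist sitting exactly sizeof(struct scsi_sgtable) into
 * the allocation. */
static struct scsi_sgtable *sgtable_of(struct scatterlist *sgl)
{
	return (struct scsi_sgtable *)sgl - 1;
}

static int roundtrip_ok(short n)
{
	struct scsi_sgtable *sgt = alloc_sgtable(n);
	int ok = sgt && sgt->sg_count == n && sgtable_of(sgt->sglist) == sgt;

	free(sgt);
	return ok;
}
```

The header-recovery trick is what later allows freeing a chained list of
such tables given only the chained scatterlist pointers.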

 Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 drivers/scsi/scsi_lib.c  |  113 +++++++++++++++++++++++-----------------------
 include/scsi/scsi_cmnd.h |   40 ++++++++++-------
 2 files changed, 80 insertions(+), 73 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 71532f9..262128c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -38,13 +38,13 @@
 enum { SCSI_MAX_SG_SEGMENTS = (PAGE_SIZE / sizeof(struct scatterlist)) };
 
 enum { SG_MEMPOOL_NR =
-	(SCSI_MAX_SG_SEGMENTS >= 8) +
-	(SCSI_MAX_SG_SEGMENTS >= 16) +
-	(SCSI_MAX_SG_SEGMENTS >= 32) +
-	(SCSI_MAX_SG_SEGMENTS >= 64) +
-	(SCSI_MAX_SG_SEGMENTS >= 128) +
-	(SCSI_MAX_SG_SEGMENTS >= 256) +
-	(SCSI_MAX_SG_SEGMENTS >= 512)
+	(SCSI_MAX_SG_SEGMENTS > 8) +
+	(SCSI_MAX_SG_SEGMENTS > 16) +
+	(SCSI_MAX_SG_SEGMENTS > 32) +
+	(SCSI_MAX_SG_SEGMENTS > 64) +
+	(SCSI_MAX_SG_SEGMENTS > 128) +
+	(SCSI_MAX_SG_SEGMENTS > 256) +
+	(SCSI_MAX_SG_SEGMENTS > 512)
 };
 
 struct scsi_host_sg_pool {
@@ -54,7 +54,10 @@ struct scsi_host_sg_pool {
 };
 static struct scsi_host_sg_pool scsi_sg_pools[SG_MEMPOOL_NR];
 
-
+static inline unsigned scsi_pool_size(int pool)
+{
+	return scsi_sg_pools[pool].size;
+}
 
 static void scsi_run_queue(struct request_queue *q);
 
@@ -699,7 +702,7 @@ static unsigned scsi_sgtable_index(unsigned nents)
 	int i, size;
 
 	for (i = 0, size = 8; i < SG_MEMPOOL_NR-1; i++, size <<= 1)
-		if (size >= nents)
+		if (size > nents)
 			return i;
 
 	if (SCSI_MAX_SG_SEGMENTS >= nents)
@@ -710,26 +713,26 @@ static unsigned scsi_sgtable_index(unsigned nents)
 	return -1;
 }
 
-struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
+struct scsi_sgtable *scsi_alloc_sgtable(int sg_count, gfp_t gfp_mask)
 {
-	unsigned int pool = scsi_sgtable_index(cmd->use_sg);
-	struct scatterlist *sgl;
+	unsigned int pool = scsi_sgtable_index(sg_count);
+	struct scsi_sgtable *sgt;
 
-	sgl = mempool_alloc(scsi_sg_pools[pool].pool, gfp_mask);
-	if (unlikely(!sgl))
+	sgt = mempool_alloc(scsi_sg_pools[pool].pool, gfp_mask);
+	if (unlikely(!sgt))
 		return NULL;
 
-	cmd->sg_pool = pool;
-	return sgl;
+	memset(sgt, 0, SG_TABLE_SIZEOF(scsi_pool_size(pool)));
+	sgt->sg_count = sg_count;
+	sgt->sg_pool = pool;
+	return sgt;
 }
-
 EXPORT_SYMBOL(scsi_alloc_sgtable);
 
-void scsi_free_sgtable(struct scsi_cmnd *cmd)
+void scsi_free_sgtable(struct scsi_sgtable *sgt)
 {
-	mempool_free(cmd->request_buffer, scsi_sg_pools[cmd->sg_pool].pool);
+	mempool_free(sgt, scsi_sg_pools[sgt->sg_pool].pool);
 }
-
 EXPORT_SYMBOL(scsi_free_sgtable);
 
 /*
@@ -751,13 +754,12 @@ EXPORT_SYMBOL(scsi_free_sgtable);
  */
 static void scsi_release_buffers(struct scsi_cmnd *cmd)
 {
-	if (cmd->use_sg)
-		scsi_free_sgtable(cmd);
+	if (cmd->sgtable)
+		scsi_free_sgtable(cmd->sgtable);
 
-	/*
-	 * Zero these out.  They now point to freed memory, and it is
-	 * dangerous to hang onto the pointers.
-	 */
+	cmd->sgtable = NULL;
+
+	/*FIXME: make code backward compatible with old system */
 	cmd->request_buffer = NULL;
 	cmd->request_bufflen = 0;
 	cmd->use_sg = 0;
@@ -794,7 +796,7 @@ static void scsi_release_buffers(struct scsi_cmnd *cmd)
 void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 {
 	int result = cmd->result;
-	int this_count = cmd->request_bufflen;
+	int this_count = scsi_bufflen(cmd);
 	request_queue_t *q = cmd->device->request_queue;
 	struct request *req = cmd->request;
 	int clear_errors = 1;
@@ -802,8 +804,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	int sense_valid = 0;
 	int sense_deferred = 0;
 
-	scsi_release_buffers(cmd);
-
 	if (result) {
 		sense_valid = scsi_command_normalize_sense(cmd, &sshdr);
 		if (sense_valid)
@@ -826,9 +826,11 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 				req->sense_len = len;
 			}
 		}
-		req->data_len = cmd->resid;
+		req->data_len = scsi_get_resid(cmd);
 	}
 
+	scsi_release_buffers(cmd);
+
 	/*
 	 * Next deal with any sectors which we were able to correctly
 	 * handle.
@@ -836,7 +838,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	SCSI_LOG_HLCOMPLETE(1, printk("%ld sectors total, "
 				      "%d bytes done.\n",
 				      req->nr_sectors, good_bytes));
-	SCSI_LOG_HLCOMPLETE(1, printk("use_sg is %d\n", cmd->use_sg));
 
 	if (clear_errors)
 		req->errors = 0;
@@ -969,41 +970,42 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
 {
 	struct request     *req = cmd->request;
 	int		   count;
-
-	/*
-	 * We used to not use scatter-gather for single segment request,
-	 * but now we do (it makes highmem I/O easier to support without
-	 * kmapping pages)
-	 */
-	cmd->use_sg = req->nr_phys_segments;
+	struct scsi_sgtable *sgt;
 
 	/*
 	 * If sg table allocation fails, requeue request later.
 	 */
-	cmd->request_buffer = scsi_alloc_sgtable(cmd, GFP_ATOMIC);
-	if (unlikely(!cmd->request_buffer)) {
+	sgt = scsi_alloc_sgtable(req->nr_phys_segments, GFP_ATOMIC);
+	if (unlikely(!sgt)) {
 		scsi_unprep_request(req);
 		return BLKPREP_DEFER;
 	}
 
 	req->buffer = NULL;
 	if (blk_pc_request(req))
-		cmd->request_bufflen = req->data_len;
+		sgt->length = req->data_len;
 	else
-		cmd->request_bufflen = req->nr_sectors << 9;
+		sgt->length = req->nr_sectors << 9;
 
+	cmd->sgtable = sgt;
 	/* 
 	 * Next, walk the list, and fill in the addresses and sizes of
 	 * each segment.
 	 */
-	count = blk_rq_map_sg(req->q, req, cmd->request_buffer);
-	if (likely(count <= cmd->use_sg)) {
-		cmd->use_sg = count;
+	count = blk_rq_map_sg(req->q, req, sgt->sglist);
+	if (likely(count <= sgt->sg_count)) {
+		sgt->sg_count = count;
+
+		/* FIXME: make code backward compatible with old system */
+		cmd->request_buffer = sgt->sglist;
+		cmd->request_bufflen = sgt->length;
+		cmd->use_sg = sgt->sg_count;
+
 		return BLKPREP_OK;
 	}
 
 	printk(KERN_ERR "Incorrect number of segments after building list\n");
-	printk(KERN_ERR "counted %d, received %d\n", count, cmd->use_sg);
+	printk(KERN_ERR "counted %d, received %d\n", count, scsi_sg_count(cmd));
 	printk(KERN_ERR "req nr_sec %lu, cur_nr_sec %u\n", req->nr_sectors,
 			req->current_nr_sectors);
 
@@ -1059,7 +1061,7 @@ static void scsi_blk_pc_done(struct scsi_cmnd *cmd)
 	 * successfully. Since this is a REQ_BLOCK_PC command the
 	 * caller should check the request's errors value
 	 */
-	scsi_io_completion(cmd, cmd->request_bufflen);
+	scsi_io_completion(cmd, scsi_bufflen(cmd));
 }
 
 static int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
@@ -1088,9 +1090,7 @@ static int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
 		BUG_ON(req->data_len);
 		BUG_ON(req->data);
 
-		cmd->request_bufflen = 0;
-		cmd->request_buffer = NULL;
-		cmd->use_sg = 0;
+		cmd->sgtable = NULL;
 		req->buffer = NULL;
 	}
 
@@ -1631,8 +1631,8 @@ void scsi_unblock_requests(struct Scsi_Host *shost)
 EXPORT_SYMBOL(scsi_unblock_requests);
 
 const char* sg_names[] = {
-	"sgpool-8", "sgpool-16", "sgpool-32", "sgpool-64",
-	"sgpool-128", "sgpool-256", "sgpool-512"
+	"sgtable-7", "sgtable-15", "sgtable-31", "sgtable-63",
+	"sgtable-127", "sgtable-255", "sgtable-511"
 };
 
 int __init scsi_init_queue(void)
@@ -1650,10 +1650,9 @@ int __init scsi_init_queue(void)
 
 	for (i = 0, size = 8; i < SG_MEMPOOL_NR; i++, size <<= 1) {
 		struct scsi_host_sg_pool *sgp = scsi_sg_pools + i;
-		sgp->size = size;
+		sgp->size = size-1;
 		sgp->slab = kmem_cache_create(sg_names[i],
-				sgp->size*sizeof(struct scatterlist),
-				0, 0, NULL);
+				SG_TABLE_SIZEOF(sgp->size), 0, 0, NULL);
 		if (!sgp->slab) {
 			printk(KERN_ERR "SCSI: can't init sg slab %s\n",
 					sg_names[i]);
@@ -1670,9 +1669,9 @@ int __init scsi_init_queue(void)
 	/* FIXME: Here for the debugging phase only */
 	printk(KERN_ERR
 		"SCSI: max_sg_count=%d SG_MEMPOOL_NR=%d page=%ld "
-		"so_scaterlist=%Zd\n",
+		"so_scaterlist=%Zd so_sgtable=%Zd\n",
 		SCSI_MAX_SG_SEGMENTS, SG_MEMPOOL_NR, PAGE_SIZE,
-		sizeof(struct scatterlist)
+		sizeof(struct scatterlist), sizeof(struct scsi_sgtable)
 	);
 
 	return 0;
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 279a4df..574ea9d 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -11,6 +11,16 @@ struct scatterlist;
 struct Scsi_Host;
 struct scsi_device;
 
+struct scsi_sgtable {
+	unsigned length;
+	int resid;
+	short sg_count;
+	short sg_pool;
+	struct scatterlist sglist[0];
+};
+
+#define SG_TABLE_SIZEOF(sg_count) ((sg_count)*sizeof(struct scatterlist) \
+                                   + sizeof(struct scsi_sgtable))
 
 /* embedded in scsi_cmnd */
 struct scsi_pointer {
@@ -64,15 +74,11 @@ struct scsi_cmnd {
 	/* These elements define the operation we are about to perform */
 #define MAX_COMMAND_SIZE	16
 	unsigned char cmnd[MAX_COMMAND_SIZE];
-	unsigned request_bufflen;	/* Actual request size */
 
 	struct timer_list eh_timeout;	/* Used to time out the command. */
-	void *request_buffer;		/* Actual requested buffer */
+	struct scsi_sgtable *sgtable;
 
 	/* These elements define the operation we ultimately want to perform */
-	unsigned short use_sg;	/* Number of pieces of scatter-gather */
-	unsigned short sg_pool;	/* pool index of allocated sg array */
-
 	unsigned underflow;	/* Return error if less than
 				   this amount is transferred */
 
@@ -82,10 +88,6 @@ struct scsi_cmnd {
 				   reconnects.   Probably == sector
 				   size */
 
-	int resid;		/* Number of bytes requested to be
-				   transferred less actual number
-				   transferred (0 if not supported) */
-
 	struct request *request;	/* The command we are
 				   	   working on */
 
@@ -117,6 +119,11 @@ struct scsi_cmnd {
 
 	unsigned char tag;	/* SCSI-II queued command tag */
 	unsigned long pid;	/* Process ID, starts at 0. Unique per host. */
+
+	unsigned short __deprecated use_sg;
+	unsigned __deprecated request_bufflen;
+	void __deprecated *request_buffer;
+	int __deprecated resid;
 };
 
 extern struct scsi_cmnd *scsi_get_command(struct scsi_device *, gfp_t);
@@ -132,35 +139,36 @@ extern void *scsi_kmap_atomic_sg(struct scatterlist *sg, int sg_count,
 				 size_t *offset, size_t *len);
 extern void scsi_kunmap_atomic_sg(void *virt);
 
-extern struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *, gfp_t);
-extern void scsi_free_sgtable(struct scsi_cmnd *);
+extern struct scsi_sgtable *scsi_alloc_sgtable(int sg_count, gfp_t gfp_mask);
+extern void scsi_free_sgtable(struct scsi_sgtable *sgt);
 
 extern int scsi_dma_map(struct scsi_cmnd *cmd);
 extern void scsi_dma_unmap(struct scsi_cmnd *cmd);
 
 static inline unsigned scsi_sg_count(struct scsi_cmnd *cmd)
 {
-	return cmd->use_sg;
+	return cmd->sgtable ? cmd->sgtable->sg_count : 0;
 }
 
 static inline struct scatterlist *scsi_sglist(struct scsi_cmnd *cmd)
 {
-	return ((struct scatterlist *)cmd->request_buffer);
+	return cmd->sgtable ? cmd->sgtable->sglist : 0;
 }
 
 static inline unsigned scsi_bufflen(struct scsi_cmnd *cmd)
 {
-	return cmd->request_bufflen;
+	return cmd->sgtable ? cmd->sgtable->length : 0;
 }
 
 static inline void scsi_set_resid(struct scsi_cmnd *cmd, int resid)
 {
-	cmd->resid = resid;
+	if (cmd->sgtable)
+		cmd->sgtable->resid = resid;
 }
 
 static inline int scsi_get_resid(struct scsi_cmnd *cmd)
 {
-	return cmd->resid;
+	return cmd->sgtable ? cmd->sgtable->resid : 0;
 }
 
 #define scsi_for_each_sg(cmd, sg, nseg, __i)			\
-- 
1.5.2.2.249.g45fd




* [PATCH A3/5] SCSI: sg-chaining over scsi_sgtable
  2007-07-24  8:47 [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining Boaz Harrosh
  2007-07-24  8:52 ` [PATCH AB1/5] SCSI: SG pools allocation cleanup Boaz Harrosh
  2007-07-24  8:56 ` [PATCH A2/5] " Boaz Harrosh
@ 2007-07-24  8:59 ` Boaz Harrosh
  2007-07-24  9:01 ` [PATCH B2/5] SCSI: support for allocating large scatterlists Boaz Harrosh
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Boaz Harrosh @ 2007-07-24  8:59 UTC (permalink / raw)
  To: James Bottomley, Jens Axboe, FUJITA Tomonori, linux-scsi


   Based on Jens' code for sg-chaining, but over the scsi_sgtable
   implementation.
   - The previous scsi_{alloc,free}_sgtable() are renamed to
     scsi_{alloc,free}_sgtable_page().
   - scsi_{alloc,free}_sgtable(), using the above, now support sg-chaining
     with multiple sgtable allocations.
   - Report an arbitrary default of 2048 segments to the block layer.

    from Jens:
      This is what enables large commands. If we need to allocate an
      sgtable that doesn't fit in a single page, allocate several
      SCSI_MAX_SG_SEGMENTS sized tables and chain them together.
      SCSI defaults to large chained sg tables, if the arch supports it.
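The chaining arithmetic described here (every page-sized table except
the last donates one entry to the chain pointer) can be checked with a
small stand-alone sketch; the per-page capacity below is an example
value, not a kernel constant:

```c
#include <assert.h>

enum { MAX_SEGS = 170 };	/* example per-page capacity; arch-dependent */

/* How many chained tables an allocator in this style would need for
 * sg_count entries, when every table but the last reserves its final
 * slot for the chain entry. */
static int sgtables_needed(int sg_count)
{
	int n = 1;

	while (sg_count > MAX_SEGS) {
		sg_count -= MAX_SEGS - 1;	/* room for the chain entry */
		n++;
	}
	return n;
}
```

With a 170-entry capacity, the arbitrary 2048-segment default reported
to the block layer works out to 13 chained tables.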

 Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 drivers/scsi/scsi_lib.c |   89 +++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 262128c..13870b5 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -59,6 +59,12 @@ static inline unsigned scsi_pool_size(int pool)
 	return scsi_sg_pools[pool].size;
 }
 
+/*
+ * IO limit for archs that have sg chaining. This limit is totally arbitrary;
+ * a setting of 2048 will get you at least 8MB IOs.
+ */
+#define SCSI_MAX_SG_CHAIN_SEGMENTS	2048
+
 static void scsi_run_queue(struct request_queue *q);
 
 /*
@@ -713,7 +719,7 @@ static unsigned scsi_sgtable_index(unsigned nents)
 	return -1;
 }
 
-struct scsi_sgtable *scsi_alloc_sgtable(int sg_count, gfp_t gfp_mask)
+static struct scsi_sgtable *scsi_alloc_sgtable_page(int sg_count, gfp_t gfp_mask)
 {
 	unsigned int pool = scsi_sgtable_index(sg_count);
 	struct scsi_sgtable *sgt;
@@ -727,12 +733,77 @@ struct scsi_sgtable *scsi_alloc_sgtable(int sg_count, gfp_t gfp_mask)
 	sgt->sg_pool = pool;
 	return sgt;
 }
+
+struct scsi_sgtable *scsi_alloc_sgtable(int sg_count, gfp_t gfp_mask)
+{
+	struct scsi_sgtable *sgt, *prev, *ret;
+
+	if (sg_count <= SCSI_MAX_SG_SEGMENTS)
+		return scsi_alloc_sgtable_page(sg_count, gfp_mask);
+
+	ret = prev = NULL;
+	do {
+		int this;
+
+		if (sg_count > SCSI_MAX_SG_SEGMENTS) {
+			this = SCSI_MAX_SG_SEGMENTS - 1; /* room for chain */
+		} else {
+			this = sg_count;
+		}
+
+		sgt = scsi_alloc_sgtable_page(this, gfp_mask);
+		/*
+		 * FIXME: since the second and subsequent allocations are done
+		 * with ~__GFP_WAIT we can fail more easily, but nothing
+		 * prevents us from trying smaller pools and chaining
+		 * more arrays. The last patch in the series does just
+		 * that.
+ 		 */
+		if (unlikely(!sgt))
+			goto enomem;
+
+		/* first loop through, set return value */
+		if (!ret)
+			ret = sgt;
+
+		/* chain previous sglist, if any */
+		if (prev)
+			sg_chain(prev->sglist, scsi_pool_size(prev->sg_pool),
+			                                           sgt->sglist);
+
+		/*
+		 * don't allow subsequent mempool allocs to sleep, it would
+		 * violate the mempool principle.
+		 */
+		gfp_mask &= ~__GFP_WAIT;
+		gfp_mask |= __GFP_HIGH;
+		sg_count -= this;
+		prev = sgt;
+	} while (sg_count);
+
+	return ret;
+enomem:
+	if (ret)
+		scsi_free_sgtable(ret);
+	return NULL;
+}
 EXPORT_SYMBOL(scsi_alloc_sgtable);
 
-void scsi_free_sgtable(struct scsi_sgtable *sgt)
+static void scsi_free_sgtable_page(struct scsi_sgtable *sgt)
 {
 	mempool_free(sgt, scsi_sg_pools[sgt->sg_pool].pool);
 }
+
+void scsi_free_sgtable(struct scsi_sgtable *sgt)
+{
+	do {
+		struct scatterlist *next, *here_last;
+		here_last = &sgt->sglist[scsi_pool_size(sgt->sg_pool) - 1];
+		next = sg_is_chain(here_last) ? sg_chain_ptr(here_last) : NULL;
+		scsi_free_sgtable_page(sgt);
+		sgt = next ? ((struct scsi_sgtable*)next) - 1 : NULL;
+	} while(sgt);
+}
 EXPORT_SYMBOL(scsi_free_sgtable);
 
 /*
@@ -1550,8 +1621,22 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
 	if (!q)
 		return NULL;
 
+	/*
+	 * this limit is imposed by hardware restrictions
+	 */
 	blk_queue_max_hw_segments(q, shost->sg_tablesize);
+
+	/*
+	 * In the future, sg chaining support will be mandatory and this
+	 * ifdef can then go away. Right now we don't have all archs
+	 * converted, so better keep it safe.
+	 */
+#ifdef ARCH_HAS_SG_CHAIN
+	blk_queue_max_phys_segments(q, SCSI_MAX_SG_CHAIN_SEGMENTS);
+#else
 	blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS);
+#endif
+
 	blk_queue_max_sectors(q, shost->max_sectors);
 	blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
 	blk_queue_segment_boundary(q, shost->dma_boundary);
-- 
1.5.2.2.249.g45fd




* [PATCH B2/5] SCSI: support for allocating large scatterlists
  2007-07-24  8:47 [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining Boaz Harrosh
                   ` (2 preceding siblings ...)
  2007-07-24  8:59 ` [PATCH A3/5] SCSI: sg-chaining over scsi_sgtable Boaz Harrosh
@ 2007-07-24  9:01 ` Boaz Harrosh
  2007-07-24  9:03 ` [PATCH B3/5] SCSI: scsi_sgtable over sg-chaining Boaz Harrosh
  2007-07-24  9:16 ` [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining FUJITA Tomonori
  5 siblings, 0 replies; 27+ messages in thread
From: Boaz Harrosh @ 2007-07-24  9:01 UTC (permalink / raw)
  To: James Bottomley, Jens Axboe, FUJITA Tomonori, linux-scsi


    A slightly revised sg-chaining patch to accommodate
    the cleanup of the sg-pool allocations.

    from Jens:
      This is what enables large commands. If we need to allocate an
      sgtable that doesn't fit in a single page, allocate several
      SCSI_MAX_SG_SEGMENTS sized tables and chain them together.

      SCSI defaults to large chained sg tables, if the arch supports it.

    Was-Signed-by: Jens Axboe <jens.axboe@oracle.com>
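As a rough sketch of the chunking arithmetic described above (SCSI_MAX_SG_SEGMENTS here is a hypothetical stand-in; the real value depends on PAGE_SIZE and sizeof(struct scatterlist)), every table except the last gives up its final entry to the chain link, so a full table carries only SCSI_MAX_SG_SEGMENTS - 1 payload entries:

```c
#include <assert.h>

/* Illustrative value only; really PAGE_SIZE / sizeof(struct scatterlist). */
#define SCSI_MAX_SG_SEGMENTS	128

/* How many sg tables the allocation loop walks through for a command
 * with use_sg segments: each full table sacrifices its last entry to
 * the chain pointer, mirroring the "this = SCSI_MAX_SG_SEGMENTS - 1"
 * step in the allocation loop. */
static int sg_tables_needed(int use_sg)
{
	int tables = 1;		/* the final table holds the remainder */

	while (use_sg > SCSI_MAX_SG_SEGMENTS) {
		use_sg -= SCSI_MAX_SG_SEGMENTS - 1;	/* room for chain entry */
		tables++;
	}
	return tables;
}
```

With the 2048-segment cap, this gives 17 chained tables in the worst case.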

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 drivers/scsi/scsi_lib.c  |  136 +++++++++++++++++++++++++++++++++++++++++++---
 include/scsi/scsi_cmnd.h |    1 +
 2 files changed, 129 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 71532f9..7ee5591 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -54,7 +54,11 @@ struct scsi_host_sg_pool {
 };
 static struct scsi_host_sg_pool scsi_sg_pools[SG_MEMPOOL_NR];
 
-
+/*
+ * IO limit for archs that have sg chaining. This limit is totally arbitrary;
+ * a setting of 2048 will get you at least 8MB IOs.
+ */
+#define SCSI_MAX_SG_CHAIN_SEGMENTS	2048
 
 static void scsi_run_queue(struct request_queue *q);
 
@@ -712,21 +716,123 @@ static unsigned scsi_sgtable_index(unsigned nents)
 
 struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
 {
-	unsigned int pool = scsi_sgtable_index(cmd->use_sg);
-	struct scatterlist *sgl;
+	struct scsi_host_sg_pool *sgp;
+	struct scatterlist *sgl, *prev, *ret;
+	unsigned int index;
+	int this, left;
+
+	BUG_ON(!cmd->use_sg);
+
+	left = cmd->use_sg;
+	ret = prev = NULL;
+	do {
+		this = left;
+		if (this > SCSI_MAX_SG_SEGMENTS) {
+			this = SCSI_MAX_SG_SEGMENTS - 1;
+			index = SG_MEMPOOL_NR - 1;
+		} else
+			index = scsi_sgtable_index(this);
 
-	sgl = mempool_alloc(scsi_sg_pools[pool].pool, gfp_mask);
-	if (unlikely(!sgl))
-		return NULL;
+		left -= this;
+
+		sgp = scsi_sg_pools + index;
+
+		sgl = mempool_alloc(sgp->pool, gfp_mask);
+		if (unlikely(!sgl))
+			goto enomem;
+
+		memset(sgl, 0, sizeof(*sgl) * sgp->size);
+
+		/*
+		 * first loop through, set initial index and return value
+		 */
+		if (!ret) {
+			cmd->sg_pool = index;
+			ret = sgl;
+		}
+
+		/*
+		 * chain previous sglist, if any. we know the previous
+		 * sglist must be the biggest one, or we would not have
+		 * ended up doing another loop.
+		 */
+		if (prev)
+			sg_chain(prev, SCSI_MAX_SG_SEGMENTS, sgl);
+
+		/*
+		 * don't allow subsequent mempool allocs to sleep, it would
+		 * violate the mempool principle.
+		 */
+		gfp_mask &= ~__GFP_WAIT;
+		gfp_mask |= __GFP_HIGH;
+		prev = sgl;
+	} while (left);
+
+	/*
+	 * ->use_sg may get modified after dma mapping has potentially
+	 * shrunk the number of segments, so keep a copy of it for free.
+	 */
+	cmd->__use_sg = cmd->use_sg;
+	return ret;
+enomem:
+	if (ret) {
+		/*
+		 * Free entries chained off ret. Since we were trying to
+		 * allocate another sglist, we know that all entries are of
+		 * the max size.
+		 */
+		sgp = scsi_sg_pools + SG_MEMPOOL_NR - 1;
+		prev = ret;
+		ret = &ret[SCSI_MAX_SG_SEGMENTS - 1];
+
+		while ((sgl = sg_chain_ptr(ret)) != NULL) {
+			ret = &sgl[SCSI_MAX_SG_SEGMENTS - 1];
+			mempool_free(sgl, sgp->pool);
+		}
 
-	cmd->sg_pool = pool;
-	return sgl;
+		mempool_free(prev, sgp->pool);
+	}
+	return NULL;
 }
 
 EXPORT_SYMBOL(scsi_alloc_sgtable);
 
 void scsi_free_sgtable(struct scsi_cmnd *cmd)
 {
+	struct scatterlist *sgl = cmd->request_buffer;
+	struct scsi_host_sg_pool *sgp;
+
+	/*
+	 * if this is the biggest size sglist, check if we have
+	 * chained parts we need to free
+	 */
+	if (cmd->__use_sg > SCSI_MAX_SG_SEGMENTS) {
+		unsigned short this, left;
+		struct scatterlist *next;
+		unsigned int index;
+
+		left = cmd->__use_sg - (SCSI_MAX_SG_SEGMENTS - 1);
+		next = sg_chain_ptr(&sgl[SCSI_MAX_SG_SEGMENTS - 1]);
+		while (left && next) {
+			sgl = next;
+			this = left;
+			if (this > SCSI_MAX_SG_SEGMENTS) {
+				this = SCSI_MAX_SG_SEGMENTS - 1;
+				index = SG_MEMPOOL_NR - 1;
+			} else
+				index = scsi_sgtable_index(this);
+
+			left -= this;
+
+			sgp = scsi_sg_pools + index;
+
+			if (left)
+				next = sg_chain_ptr(&sgl[sgp->size - 1]);
+
+			mempool_free(sgl, sgp->pool);
+		}
+	}
+
 	mempool_free(cmd->request_buffer, scsi_sg_pools[cmd->sg_pool].pool);
 }
 
@@ -1550,8 +1656,22 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
 	if (!q)
 		return NULL;
 
+	/*
+	 * this limit is imposed by hardware restrictions
+	 */
 	blk_queue_max_hw_segments(q, shost->sg_tablesize);
+
+	/*
+	 * In the future, sg chaining support will be mandatory and this
+	 * ifdef can then go away. Right now we don't have all archs
+	 * converted, so better keep it safe.
+	 */
+#ifdef ARCH_HAS_SG_CHAIN
+	blk_queue_max_phys_segments(q, SCSI_MAX_SG_CHAIN_SEGMENTS);
+#else
 	blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS);
+#endif
+
 	blk_queue_max_sectors(q, shost->max_sectors);
 	blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
 	blk_queue_segment_boundary(q, shost->dma_boundary);
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 279a4df..7d0b2de 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -72,6 +72,7 @@ struct scsi_cmnd {
 	/* These elements define the operation we ultimately want to perform */
 	unsigned short use_sg;	/* Number of pieces of scatter-gather */
 	unsigned short sg_pool;	/* pool index of allocated sg array */
+	unsigned short __use_sg;
 
 	unsigned underflow;	/* Return error if less than
 				   this amount is transferred */
-- 
1.5.2.2.249.g45fd



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH B3/5] SCSI: scsi_sgtable over sg-chainning
  2007-07-24  8:47 [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining Boaz Harrosh
                   ` (3 preceding siblings ...)
  2007-07-24  9:01 ` [PATCH B2/5] SCSI: support for allocating large scatterlists Boaz Harrosh
@ 2007-07-24  9:03 ` Boaz Harrosh
  2007-07-24  9:16 ` [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining FUJITA Tomonori
  5 siblings, 0 replies; 27+ messages in thread
From: Boaz Harrosh @ 2007-07-24  9:03 UTC (permalink / raw)
  To: James Bottomley, Jens Axboe, FUJITA Tomonori, linux-scsi


  As proposed by James Bottomley, all I/O members of struct scsi_cmnd,
  and the resid member, which need to be duplicated for bidirectional
  transfers, can be allocated together with the sg-list they point to.
  This way, when bidi comes, the structure can be duplicated with
  minimal change to the code, and with no extra baggage when bidi is
  not used. The resulting code uses a new mechanism called scsi_sgtable.

  This code sits on top of Jens's sg-chaining patches to scsi-ml, and it
  keeps functionality and compatibility with large IO through sg-chaining.

  scsi_cmnd.h
  - define a new scsi_sgtable structure that will hold IO descriptors + the
    actual scatter-gather array.
  - Hold a pointer to the scsi_sgtable in scsi_cmnd.
  - Deprecate old, now unnecessary, IO members of scsi_cmnd. These members are
    kept for compatibility with unconverted drivers, still lurking around in
    the code tree. They can be removed completely in the future.
  - Modify data accessors to now use new members of scsi_sgtable.

  scsi_lib.c
  - scsi_lib is converted to use the new scsi_sgtable, instead of the old
    members and sg-arrays.
  - scsi_{alloc,free}_sgtable() API has changed. This will break scsi_stgt
    which will need to be converted to new implementation.
  - Special code is inserted to initialize the old compatibility members from
    the new structures. This code will be removed.
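A minimal userspace sketch of the scsi_sgtable layout described above, assuming illustrative field sizes (the scatterlist stand-in and the sgtable_alloc() helper are hypothetical, not the patch's API): the IO descriptors and the sg array share one allocation, sized by an SG_TABLE_SIZEOF()-style macro.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for the kernel's scatterlist. */
struct scatterlist {
	unsigned long page_link;
	unsigned int offset;
	unsigned int length;
};

/* One allocation holds both the IO descriptors and the sg array, so a
 * future bidi command can simply carry two of these. */
struct scsi_sgtable {
	unsigned length;
	int resid;
	short sg_count;
	short sg_pool;
	struct scatterlist sglist[];	/* C99 flexible array member */
};

#define SG_TABLE_SIZEOF(sg_count) \
	(sizeof(struct scsi_sgtable) + (sg_count) * sizeof(struct scatterlist))

/* Hypothetical helper standing in for scsi_alloc_sgtable_page(). */
static struct scsi_sgtable *sgtable_alloc(int sg_count)
{
	struct scsi_sgtable *sgt = calloc(1, SG_TABLE_SIZEOF(sg_count));

	if (sgt)
		sgt->sg_count = sg_count;
	return sgt;
}
```

Because the array is a flexible member, SG_TABLE_SIZEOF(0) is exactly the header size, which is why chained blocks can be walked back to their header with simple pointer arithmetic.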

 Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 drivers/scsi/scsi_lib.c  |  242 ++++++++++++++++++++--------------------------
 include/scsi/scsi_cmnd.h |   41 +++++---
 2 files changed, 127 insertions(+), 156 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 7ee5591..13870b5 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -38,13 +38,13 @@
 enum { SCSI_MAX_SG_SEGMENTS = (PAGE_SIZE / sizeof(struct scatterlist)) };
 
 enum { SG_MEMPOOL_NR =
-	(SCSI_MAX_SG_SEGMENTS >= 8) +
-	(SCSI_MAX_SG_SEGMENTS >= 16) +
-	(SCSI_MAX_SG_SEGMENTS >= 32) +
-	(SCSI_MAX_SG_SEGMENTS >= 64) +
-	(SCSI_MAX_SG_SEGMENTS >= 128) +
-	(SCSI_MAX_SG_SEGMENTS >= 256) +
-	(SCSI_MAX_SG_SEGMENTS >= 512)
+	(SCSI_MAX_SG_SEGMENTS > 8) +
+	(SCSI_MAX_SG_SEGMENTS > 16) +
+	(SCSI_MAX_SG_SEGMENTS > 32) +
+	(SCSI_MAX_SG_SEGMENTS > 64) +
+	(SCSI_MAX_SG_SEGMENTS > 128) +
+	(SCSI_MAX_SG_SEGMENTS > 256) +
+	(SCSI_MAX_SG_SEGMENTS > 512)
 };
 
 struct scsi_host_sg_pool {
@@ -54,6 +54,11 @@ struct scsi_host_sg_pool {
 };
 static struct scsi_host_sg_pool scsi_sg_pools[SG_MEMPOOL_NR];
 
+static inline unsigned scsi_pool_size(int pool)
+{
+	return scsi_sg_pools[pool].size;
+}
+
 /*
  * IO limit for archs that have sg chaining. This limit is totally arbitrary;
  * a setting of 2048 will get you at least 8MB IOs.
@@ -703,7 +708,7 @@ static unsigned scsi_sgtable_index(unsigned nents)
 	int i, size;
 
 	for (i = 0, size = 8; i < SG_MEMPOOL_NR-1; i++, size <<= 1)
-		if (size >= nents)
+		if (size > nents)
 			return i;
 
 	if (SCSI_MAX_SG_SEGMENTS >= nents)
@@ -714,50 +719,57 @@ static unsigned scsi_sgtable_index(unsigned nents)
 	return -1;
 }
 
-struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
+static struct scsi_sgtable *scsi_alloc_sgtable_page(int sg_count, gfp_t gfp_mask)
 {
-	struct scsi_host_sg_pool *sgp;
-	struct scatterlist *sgl, *prev, *ret;
-	unsigned int index;
-	int this, left;
-
-	BUG_ON(!cmd->use_sg);
+	unsigned int pool = scsi_sgtable_index(sg_count);
+	struct scsi_sgtable *sgt;
 
-	left = cmd->use_sg;
-	ret = prev = NULL;
-	do {
-		this = left;
-		if (this > SCSI_MAX_SG_SEGMENTS) {
-			this = SCSI_MAX_SG_SEGMENTS - 1;
-			index = SG_MEMPOOL_NR - 1;
-		} else
-			index = scsi_sgtable_index(this);
+	sgt = mempool_alloc(scsi_sg_pools[pool].pool, gfp_mask);
+	if (unlikely(!sgt))
+		return NULL;
 
-		left -= this;
+	memset(sgt, 0, SG_TABLE_SIZEOF(scsi_pool_size(pool)));
+	sgt->sg_count = sg_count;
+	sgt->sg_pool = pool;
+	return sgt;
+}
 
-		sgp = scsi_sg_pools + index;
+struct scsi_sgtable *scsi_alloc_sgtable(int sg_count, gfp_t gfp_mask)
+{
+	struct scsi_sgtable *sgt, *prev, *ret;
 
-		sgl = mempool_alloc(sgp->pool, gfp_mask);
-		if (unlikely(!sgl))
-			goto enomem;
+	if (sg_count <= SCSI_MAX_SG_SEGMENTS)
+		return scsi_alloc_sgtable_page(sg_count, gfp_mask);
 
-		memset(sgl, 0, sizeof(*sgl) * sgp->size);
+	ret = prev = NULL;
+	do {
+		int this;
 
-		/*
-		 * first loop through, set initial index and return value
-		 */
-		if (!ret) {
-			cmd->sg_pool = index;
-			ret = sgl;
+		if (sg_count > SCSI_MAX_SG_SEGMENTS) {
+			this = SCSI_MAX_SG_SEGMENTS - 1; /* room for chain */
+		} else {
+			this = sg_count;
 		}
 
+		sgt = scsi_alloc_sgtable_page(this, gfp_mask);
 		/*
-		 * chain previous sglist, if any. we know the previous
-		 * sglist must be the biggest one, or we would not have
-		 * ended up doing another loop.
-		 */
+		 * FIXME: since the second and later allocations are done
+		 * with ~__GFP_WAIT we can fail more easily, but nothing
+		 * prevents us from trying smaller pools and chaining
+		 * more arrays. The last patch in the series does just
+		 * that.
+		 */
+		if (unlikely(!sgt))
+			goto enomem;
+
+		/* first loop through, set return value */
+		if (!ret)
+			ret = sgt;
+
+		/* chain previous sglist, if any */
 		if (prev)
-			sg_chain(prev, SCSI_MAX_SG_SEGMENTS, sgl);
+			sg_chain(prev->sglist, scsi_pool_size(prev->sg_pool),
+			                                           sgt->sglist);
 
 		/*
 		 * don't allow subsequent mempool allocs to sleep, it would
@@ -765,77 +777,33 @@ struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
 		 */
 		gfp_mask &= ~__GFP_WAIT;
 		gfp_mask |= __GFP_HIGH;
-		prev = sgl;
-	} while (left);
+		sg_count -= this;
+		prev = sgt;
+	} while (sg_count);
 
-	/*
-	 * ->use_sg may get modified after dma mapping has potentially
-	 * shrunk the number of segments, so keep a copy of it for free.
-	 */
-	cmd->__use_sg = cmd->use_sg;
 	return ret;
 enomem:
-	if (ret) {
-		/*
-		 * Free entries chained off ret. Since we were trying to
-		 * allocate another sglist, we know that all entries are of
-		 * the max size.
-		 */
-		sgp = scsi_sg_pools + SG_MEMPOOL_NR - 1;
-		prev = ret;
-		ret = &ret[SCSI_MAX_SG_SEGMENTS - 1];
-
-		while ((sgl = sg_chain_ptr(ret)) != NULL) {
-			ret = &sgl[SCSI_MAX_SG_SEGMENTS - 1];
-			mempool_free(sgl, sgp->pool);
-		}
-
-		mempool_free(prev, sgp->pool);
-	}
+	if (ret)
+		scsi_free_sgtable(ret);
 	return NULL;
 }
-
 EXPORT_SYMBOL(scsi_alloc_sgtable);
 
-void scsi_free_sgtable(struct scsi_cmnd *cmd)
+static void scsi_free_sgtable_page(struct scsi_sgtable *sgt)
 {
-	struct scatterlist *sgl = cmd->request_buffer;
-	struct scsi_host_sg_pool *sgp;
-
-	/*
-	 * if this is the biggest size sglist, check if we have
-	 * chained parts we need to free
-	 */
-	if (cmd->__use_sg > SCSI_MAX_SG_SEGMENTS) {
-		unsigned short this, left;
-		struct scatterlist *next;
-		unsigned int index;
-
-		left = cmd->__use_sg - (SCSI_MAX_SG_SEGMENTS - 1);
-		next = sg_chain_ptr(&sgl[SCSI_MAX_SG_SEGMENTS - 1]);
-		while (left && next) {
-			sgl = next;
-			this = left;
-			if (this > SCSI_MAX_SG_SEGMENTS) {
-				this = SCSI_MAX_SG_SEGMENTS - 1;
-				index = SG_MEMPOOL_NR - 1;
-			} else
-				index = scsi_sgtable_index(this);
-
-			left -= this;
-
-			sgp = scsi_sg_pools + index;
-
-			if (left)
-				next = sg_chain_ptr(&sgl[sgp->size - 1]);
-
-			mempool_free(sgl, sgp->pool);
-		}
-	}
-
-	mempool_free(cmd->request_buffer, scsi_sg_pools[cmd->sg_pool].pool);
+	mempool_free(sgt, scsi_sg_pools[sgt->sg_pool].pool);
 }
 
+static void scsi_free_sgtable(struct scsi_sgtable *sgt)
+{
+	do {
+		struct scatterlist *next, *here_last;
+		here_last = &sgt->sglist[scsi_pool_size(sgt->sg_pool) - 1];
+		next = sg_is_chain(here_last) ? sg_chain_ptr(here_last) : NULL;
+		scsi_free_sgtable_page(sgt);
+		sgt = next ? ((struct scsi_sgtable*)next) - 1 : NULL;
+	} while(sgt);
+}
 EXPORT_SYMBOL(scsi_free_sgtable);
 
 /*
@@ -857,13 +825,12 @@ EXPORT_SYMBOL(scsi_free_sgtable);
  */
 static void scsi_release_buffers(struct scsi_cmnd *cmd)
 {
-	if (cmd->use_sg)
-		scsi_free_sgtable(cmd);
+	if (cmd->sgtable)
+		scsi_free_sgtable(cmd->sgtable);
 
-	/*
-	 * Zero these out.  They now point to freed memory, and it is
-	 * dangerous to hang onto the pointers.
-	 */
+	cmd->sgtable = NULL;
+
+	/*FIXME: make code backward compatible with old system */
 	cmd->request_buffer = NULL;
 	cmd->request_bufflen = 0;
 	cmd->use_sg = 0;
@@ -900,7 +867,7 @@ static void scsi_release_buffers(struct scsi_cmnd *cmd)
 void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 {
 	int result = cmd->result;
-	int this_count = cmd->request_bufflen;
+	int this_count = scsi_bufflen(cmd);
 	request_queue_t *q = cmd->device->request_queue;
 	struct request *req = cmd->request;
 	int clear_errors = 1;
@@ -908,8 +875,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	int sense_valid = 0;
 	int sense_deferred = 0;
 
-	scsi_release_buffers(cmd);
-
 	if (result) {
 		sense_valid = scsi_command_normalize_sense(cmd, &sshdr);
 		if (sense_valid)
@@ -932,9 +897,11 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 				req->sense_len = len;
 			}
 		}
-		req->data_len = cmd->resid;
+		req->data_len = scsi_get_resid(cmd);
 	}
 
+	scsi_release_buffers(cmd);
+
 	/*
 	 * Next deal with any sectors which we were able to correctly
 	 * handle.
@@ -942,7 +909,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	SCSI_LOG_HLCOMPLETE(1, printk("%ld sectors total, "
 				      "%d bytes done.\n",
 				      req->nr_sectors, good_bytes));
-	SCSI_LOG_HLCOMPLETE(1, printk("use_sg is %d\n", cmd->use_sg));
 
 	if (clear_errors)
 		req->errors = 0;
@@ -1075,41 +1041,42 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
 {
 	struct request     *req = cmd->request;
 	int		   count;
-
-	/*
-	 * We used to not use scatter-gather for single segment request,
-	 * but now we do (it makes highmem I/O easier to support without
-	 * kmapping pages)
-	 */
-	cmd->use_sg = req->nr_phys_segments;
+	struct scsi_sgtable *sgt;
 
 	/*
 	 * If sg table allocation fails, requeue request later.
 	 */
-	cmd->request_buffer = scsi_alloc_sgtable(cmd, GFP_ATOMIC);
-	if (unlikely(!cmd->request_buffer)) {
+	sgt = scsi_alloc_sgtable(req->nr_phys_segments, GFP_ATOMIC);
+	if (unlikely(!sgt)) {
 		scsi_unprep_request(req);
 		return BLKPREP_DEFER;
 	}
 
 	req->buffer = NULL;
 	if (blk_pc_request(req))
-		cmd->request_bufflen = req->data_len;
+		sgt->length = req->data_len;
 	else
-		cmd->request_bufflen = req->nr_sectors << 9;
+		sgt->length = req->nr_sectors << 9;
 
+	cmd->sgtable = sgt;
 	/* 
 	 * Next, walk the list, and fill in the addresses and sizes of
 	 * each segment.
 	 */
-	count = blk_rq_map_sg(req->q, req, cmd->request_buffer);
-	if (likely(count <= cmd->use_sg)) {
-		cmd->use_sg = count;
+	count = blk_rq_map_sg(req->q, req, sgt->sglist);
+	if (likely(count <= sgt->sg_count)) {
+		sgt->sg_count = count;
+
+		/*FIXME: make code backward compatible with old system */
+		cmd->request_buffer = sgt->sglist;
+		cmd->request_bufflen = sgt->length;
+		cmd->use_sg = sgt->sg_count;
+
 		return BLKPREP_OK;
 	}
 
 	printk(KERN_ERR "Incorrect number of segments after building list\n");
-	printk(KERN_ERR "counted %d, received %d\n", count, cmd->use_sg);
+	printk(KERN_ERR "counted %d, received %d\n", count, scsi_sg_count(cmd));
 	printk(KERN_ERR "req nr_sec %lu, cur_nr_sec %u\n", req->nr_sectors,
 			req->current_nr_sectors);
 
@@ -1165,7 +1132,7 @@ static void scsi_blk_pc_done(struct scsi_cmnd *cmd)
 	 * successfully. Since this is a REQ_BLOCK_PC command the
 	 * caller should check the request's errors value
 	 */
-	scsi_io_completion(cmd, cmd->request_bufflen);
+	scsi_io_completion(cmd, scsi_bufflen(cmd));
 }
 
 static int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
@@ -1194,9 +1161,7 @@ static int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
 		BUG_ON(req->data_len);
 		BUG_ON(req->data);
 
-		cmd->request_bufflen = 0;
-		cmd->request_buffer = NULL;
-		cmd->use_sg = 0;
+		cmd->sgtable = NULL;
 		req->buffer = NULL;
 	}
 
@@ -1751,8 +1716,8 @@ void scsi_unblock_requests(struct Scsi_Host *shost)
 EXPORT_SYMBOL(scsi_unblock_requests);
 
 const char* sg_names[] = {
-	"sgpool-8", "sgpool-16", "sgpool-32", "sgpool-64",
-	"sgpool-128", "sgpool-256", "sgpool-512"
+	"sgtable-7", "sgtable-15", "sgtable-31", "sgtable-63",
+	"sgtable-127", "sgtable-255", "sgtable-511"
 };
 
 int __init scsi_init_queue(void)
@@ -1770,10 +1735,9 @@ int __init scsi_init_queue(void)
 
 	for (i = 0, size = 8; i < SG_MEMPOOL_NR; i++, size <<= 1) {
 		struct scsi_host_sg_pool *sgp = scsi_sg_pools + i;
-		sgp->size = size;
+		sgp->size = size-1;
 		sgp->slab = kmem_cache_create(sg_names[i],
-				sgp->size*sizeof(struct scatterlist),
-				0, 0, NULL);
+				SG_TABLE_SIZEOF(sgp->size), 0, 0, NULL);
 		if (!sgp->slab) {
 			printk(KERN_ERR "SCSI: can't init sg slab %s\n",
 					sg_names[i]);
@@ -1790,9 +1754,9 @@ int __init scsi_init_queue(void)
 	/* FIXME: Here for the debugging phase only */
 	printk(KERN_ERR
 		"SCSI: max_sg_count=%d SG_MEMPOOL_NR=%d page=%ld "
-		"so_scaterlist=%Zd\n",
+		"so_scaterlist=%Zd so_sgtable=%Zd\n",
 		SCSI_MAX_SG_SEGMENTS, SG_MEMPOOL_NR, PAGE_SIZE,
-		sizeof(struct scatterlist)
+		sizeof(struct scatterlist), sizeof(struct scsi_sgtable)
 	);
 
 	return 0;
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 7d0b2de..574ea9d 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -11,6 +11,16 @@ struct scatterlist;
 struct Scsi_Host;
 struct scsi_device;
 
+struct scsi_sgtable {
+	unsigned length;
+	int resid;
+	short sg_count;
+	short sg_pool;
+	struct scatterlist sglist[0];
+};
+
+#define SG_TABLE_SIZEOF(sg_count) ((sg_count)*sizeof(struct scatterlist) \
+                                   + sizeof(struct scsi_sgtable))
 
 /* embedded in scsi_cmnd */
 struct scsi_pointer {
@@ -64,16 +74,11 @@ struct scsi_cmnd {
 	/* These elements define the operation we are about to perform */
 #define MAX_COMMAND_SIZE	16
 	unsigned char cmnd[MAX_COMMAND_SIZE];
-	unsigned request_bufflen;	/* Actual request size */
 
 	struct timer_list eh_timeout;	/* Used to time out the command. */
-	void *request_buffer;		/* Actual requested buffer */
+	struct scsi_sgtable *sgtable;
 
 	/* These elements define the operation we ultimately want to perform */
-	unsigned short use_sg;	/* Number of pieces of scatter-gather */
-	unsigned short sg_pool;	/* pool index of allocated sg array */
-	unsigned short __use_sg;
-
 	unsigned underflow;	/* Return error if less than
 				   this amount is transferred */
 
@@ -83,10 +88,6 @@ struct scsi_cmnd {
 				   reconnects.   Probably == sector
 				   size */
 
-	int resid;		/* Number of bytes requested to be
-				   transferred less actual number
-				   transferred (0 if not supported) */
-
 	struct request *request;	/* The command we are
 				   	   working on */
 
@@ -118,6 +119,11 @@ struct scsi_cmnd {
 
 	unsigned char tag;	/* SCSI-II queued command tag */
 	unsigned long pid;	/* Process ID, starts at 0. Unique per host. */
+
+	unsigned short __deprecated use_sg;
+	unsigned __deprecated request_bufflen;
+	void __deprecated *request_buffer;
+	int __deprecated resid;
 };
 
 extern struct scsi_cmnd *scsi_get_command(struct scsi_device *, gfp_t);
@@ -133,35 +139,36 @@ extern void *scsi_kmap_atomic_sg(struct scatterlist *sg, int sg_count,
 				 size_t *offset, size_t *len);
 extern void scsi_kunmap_atomic_sg(void *virt);
 
-extern struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *, gfp_t);
-extern void scsi_free_sgtable(struct scsi_cmnd *);
+extern struct scsi_sgtable *scsi_alloc_sgtable(int sg_count, gfp_t gfp_mask);
+extern void scsi_free_sgtable(struct scsi_sgtable *sgt);
 
 extern int scsi_dma_map(struct scsi_cmnd *cmd);
 extern void scsi_dma_unmap(struct scsi_cmnd *cmd);
 
 static inline unsigned scsi_sg_count(struct scsi_cmnd *cmd)
 {
-	return cmd->use_sg;
+	return cmd->sgtable ? cmd->sgtable->sg_count : 0;
 }
 
 static inline struct scatterlist *scsi_sglist(struct scsi_cmnd *cmd)
 {
-	return ((struct scatterlist *)cmd->request_buffer);
+	return cmd->sgtable ? cmd->sgtable->sglist : NULL;
 }
 
 static inline unsigned scsi_bufflen(struct scsi_cmnd *cmd)
 {
-	return cmd->request_bufflen;
+	return cmd->sgtable ? cmd->sgtable->length : 0;
 }
 
 static inline void scsi_set_resid(struct scsi_cmnd *cmd, int resid)
 {
-	cmd->resid = resid;
+	if (cmd->sgtable)
+		cmd->sgtable->resid = resid;
 }
 
 static inline int scsi_get_resid(struct scsi_cmnd *cmd)
 {
-	return cmd->resid;
+	return cmd->sgtable ? cmd->sgtable->resid : 0;
 }
 
 #define scsi_for_each_sg(cmd, sg, nseg, __i)			\
-- 
1.5.2.2.249.g45fd



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
  2007-07-24  8:47 [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining Boaz Harrosh
                   ` (4 preceding siblings ...)
  2007-07-24  9:03 ` [PATCH B3/5] SCSI: scsi_sgtable over sg-chainning Boaz Harrosh
@ 2007-07-24  9:16 ` FUJITA Tomonori
  2007-07-24 10:01   ` Boaz Harrosh
  5 siblings, 1 reply; 27+ messages in thread
From: FUJITA Tomonori @ 2007-07-24  9:16 UTC (permalink / raw)
  To: bharrosh; +Cc: James.Bottomley, jens.axboe, fujita.tomonori, linux-scsi

From: Boaz Harrosh <bharrosh@panasas.com>
Subject: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
Date: Tue, 24 Jul 2007 11:47:50 +0300

> As Jens said, there is nothing common to scsi_sgtable and 
> sglists. Save the fact that it is a massive conflict at 
> scsi-ml. They touch all the same places.
> 
> Proposed is a simple way out. Two patchsets That produce the
> same output at the end.
> 
> One: scsi_sgtable_than_sg-chaining
> Two: sg-chaining_than_scsi_sgtable

Hmm, I thought that I've already posted a scsi_sgtable patch working
with sg-chaining together.

http://marc.info/?l=linux-scsi&m=118519987632758&w=2

I quoted from my mail:

---
I think that the main issue of integrating sgtable and sglist is how
to put scatterlist to scsi_sgtable structure.

If we allocate a scsi_sgtable structure and sglists separately, the
code is pretty simple. But probably it's not the best way from the
perspective of performance.

If we put sglists into the scsi_sgtable structure (like Boaz's patch
does) and allocate and chain sglists only for large I/Os, we would
have the better performance (especially for small I/Os). But we will
have more complicated code.
---

From a quick look over your patchset, you chose the latter. And my
patch uses the former.
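The two allocation schemes being contrasted can be sketched as plain C structs (types and field names here are illustrative, not the kernel's):

```c
#include <assert.h>

struct scatterlist {
	unsigned long page_link;
	unsigned int offset, length;
};

/* (a) Header and sg array allocated separately: two allocations per
 * command, but simple code -- chaining only ever touches the sglist. */
struct sgtable_separate {
	short sg_count;
	struct scatterlist *sglist;	/* second allocation */
};

/* (b) sg array embedded at the tail of the header: one allocation per
 * command and better locality for small IO, but large IO must chain
 * whole header-plus-array blocks. */
struct sgtable_embedded {
	short sg_count;
	struct scatterlist sglist[];	/* same allocation */
};
```

The embedded layout needs no pointer field at all, which is part of its appeal for small I/Os.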

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
  2007-07-24  9:16 ` [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining FUJITA Tomonori
@ 2007-07-24 10:01   ` Boaz Harrosh
  2007-07-24 11:12     ` FUJITA Tomonori
  0 siblings, 1 reply; 27+ messages in thread
From: Boaz Harrosh @ 2007-07-24 10:01 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: James.Bottomley, jens.axboe, linux-scsi

FUJITA Tomonori wrote:
> From: Boaz Harrosh <bharrosh@panasas.com>
> Subject: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
> Date: Tue, 24 Jul 2007 11:47:50 +0300
> 
>> As Jens said, there is nothing common to scsi_sgtable and 
>> sglists. Save the fact that it is a massive conflict at 
>> scsi-ml. They touch all the same places.
>>
>> Proposed is a simple way out. Two patchsets That produce the
>> same output at the end.
>>
>> One: scsi_sgtable_than_sg-chaining
>> Two: sg-chaining_than_scsi_sgtable
> 
> Hmm, I thought that I've already posted a scsi_sgtable patch working
> with sg-chaining together.
> 
> http://marc.info/?l=linux-scsi&m=118519987632758&w=2
> 
> I quoted from my mail:
> 
> ---
> I think that the main issue of integrating sgtable and sglist is how
> to put scatterlist to scsi_sgtable structure.
> 
> If we allocate a scsi_sgtable structure and sglists separately, the
> code is pretty simple. But probably it's not the best way from the
> perspective of performance.
> 
I was just answering your other mail when this came in, so I'll answer
here. This approach is exactly the scsi_data_buffer approach we both
had solutions for. At the time I was all for it, because it is safer
and could be kept compatible with old drivers (less work for me), but
it was decided against. Suggesting it again is plainly going back.

Now that we have done the work, I must say that James was right. The
sgtable work is nice and elegant, and it makes scsi-ml simpler, not
more complicated. And bidi looks beautifully trivial on top of
sgtables.

> If we put sglists into the scsi_sgtable structure (like Boaz's patch
> does) and allocate and chain sglists only for large I/Os, we would
> have the better performance (especially for small I/Os). But we will
> have more complicated code.

Only that my proposed solution is simpler, more elegant, and more
robust than even Jens's solution without any sgtable at all. Actually,
the sgtable benefits chaining: scsi_lib with sglist+sgtable is
smaller than Jens's sglist by itself.

> ---
> 
> From a quick look over your patchset, you chose the latter. And my
> patch uses the former.

This patchset is not new. I have sent that solution before.
(http://marc.info/?l=linux-scsi&m=117933686313822&w=2)
In fact, when first programming the sgtable, I had Jens's work in mind
to make sure it would fit and converge.

Please let's not go back to old solutions again.

Boaz

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
  2007-07-24 10:01   ` Boaz Harrosh
@ 2007-07-24 11:12     ` FUJITA Tomonori
  2007-07-24 13:41       ` FUJITA Tomonori
  0 siblings, 1 reply; 27+ messages in thread
From: FUJITA Tomonori @ 2007-07-24 11:12 UTC (permalink / raw)
  To: bharrosh; +Cc: fujita.tomonori, James.Bottomley, jens.axboe, linux-scsi

From: Boaz Harrosh <bharrosh@panasas.com>
Subject: Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
Date: Tue, 24 Jul 2007 13:01:34 +0300

> FUJITA Tomonori wrote:
> > From: Boaz Harrosh <bharrosh@panasas.com>
> > Subject: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
> > Date: Tue, 24 Jul 2007 11:47:50 +0300
> > 
> >> As Jens said, there is nothing common to scsi_sgtable and 
> >> sglists. Save the fact that it is a massive conflict at 
> >> scsi-ml. They touch all the same places.
> >>
> >> Proposed is a simple way out. Two patchsets That produce the
> >> same output at the end.
> >>
> >> One: scsi_sgtable_than_sg-chaining
> >> Two: sg-chaining_than_scsi_sgtable
> > 
> > Hmm, I thought that I've already posted a scsi_sgtable patch working
> > with sg-chaining together.
> > 
> > http://marc.info/?l=linux-scsi&m=118519987632758&w=2
> > 
> > I quoted from my mail:
> > 
> > ---
> > I think that the main issue of integrating sgtable and sglist is how
> > to put scatterlist to scsi_sgtable structure.
> > 
> > If we allocate a scsi_sgtable structure and sglists separately, the
> > code is pretty simple. But probably it's not the best way from the
> > perspective of performance.
> > 
> I was just answering your other mail when this came in so I'll answer
> here. 
> This Approach is exactly the scsi_data_buffer approach we both
> had solutions for. At the time I was all for that approach because it
> is safer and could be kept compatible to old drivers. (Less work for
> me) But it was decided against it. So suggesting it again is plain
> going back.

Well, the approach to shuffle the entire request setup around was
rejected. But was the approach to use two data buffers for bidi
completely rejected?


> > If we put sglists into the scsi_sgtable structure (like Boaz's patch
> > does) and allocate and chain sglists only for large I/Os, we would
> > have the better performance (especially for small I/Os). But we will
> > have more complicated code.
> 
> Only that my proposed solution is more simple more elegant and more
> robust than even Jens's solution without any sgtable at all. Actually 
> the sgtable does good to chaining. scsi_lib with sglist+sgtable is 
> smaller than Jens's sglist by itself.

Your patch chains sgtables instead of sglists. It uses more memory in
exchange for simpler code. But I think that it's a nice approach.
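A back-of-the-envelope sketch of that memory cost, under assumed sizes (4K pages, a 32-byte scatterlist, and a 16-byte sgtable header; all three numbers are assumptions, not the values on any particular arch):

```c
#include <assert.h>

/* Assumed sizes for illustration only. */
#define PAGE_SZ		4096u
#define SG_ENTRY	32u	/* assumed sizeof(struct scatterlist) */
#define SGT_HEADER	16u	/* assumed scsi_sgtable header size */

/* Raw sglist chaining: a full page of sg entries per chained block. */
static unsigned entries_per_page_sglist(void)
{
	return PAGE_SZ / SG_ENTRY;
}

/* Chained sgtables: the embedded header shaves entries off each block. */
static unsigned entries_per_page_sgtable(void)
{
	return (PAGE_SZ - SGT_HEADER) / SG_ENTRY;
}
```

Under these assumptions each chained block loses roughly one sg entry of capacity to the header, which is the extra memory being traded for the simpler free path.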

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH AB1/5] SCSI: SG pools allocation cleanup.
  2007-07-24  8:52 ` [PATCH AB1/5] SCSI: SG pools allocation cleanup Boaz Harrosh
@ 2007-07-24 13:08   ` Boaz Harrosh
  2007-07-25  8:08   ` Boaz Harrosh
  1 sibling, 0 replies; 27+ messages in thread
From: Boaz Harrosh @ 2007-07-24 13:08 UTC (permalink / raw)
  To: James Bottomley, Jens Axboe, FUJITA Tomonori, linux-scsi

Boaz Harrosh wrote:
> First patch will not apply on scsi-misc-2.6 because there
> is a missing NULL in the call to kmem_cache_create(). (linux-2.6.23-rcx)
> (If any one need a patchset for that please ask)
It will have more problems if Jens's
ac133644304cd1721dfb77193e0502f8afd4ea9b ("scsi: simplify scsi_free_sgtable()")
is not applied. Anyway, here is a version over scsi-misc.
Also, A2/5 will have issues.

From 20f05cf739fc414033b4ca11e9f52e9f6d8db95b Mon Sep 17 00:00:00 2001
From: Boaz Harrosh <bharrosh@panasas.com>
Date: Mon, 23 Jul 2007 18:51:59 +0300
Subject: [PATCH] SCSI: SG pools allocation cleanup.

  - The code automatically calculates at compile time the maximum-size
    sg-array that will fit in a memory page, and allocates power-of-2
    sized pools up to that maximum size.
  - Split scsi_alloc() into a helper, scsi_sgtable_index(), that returns
    the index of the pool for a given sg_count.
  - Remove the now-unused SCSI_MAX_PHYS_SEGMENTS.
  - Rename sglist_len to sg_pool, which is what it always was.
  - Some extra prints at scsi_init_queue(). These prints will be removed
    once everything stabilizes.

  Now that the arrays are automatically calculated to fit in a page, what
  about ARCHs that have a very big page size? I have, just for demonstration,
  calculated up to 512 entries. But I suspect that other kernel subsystems
  are bounded to 256 or 128; is that true? Should I allow more or less than
  512 here?

Some common numbers:
Arch                      | SCSI_MAX_SG_SEGMENTS | sizeof(struct scatterlist)
--------------------------|----------------------|---------------------------
x86_64                    | 128                  | 32
i386 CONFIG_HIGHMEM64G=y  | 205                  | 20
i386 other                | 256                  | 16

  Could someone give example numbers for an ARCH with a big page size?

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 drivers/scsi/scsi_lib.c  |  140 +++++++++++++++++++++-------------------------
 include/scsi/scsi.h      |    7 --
 include/scsi/scsi_cmnd.h |   19 +++++-
 3 files changed, 79 insertions(+), 87 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1f5a07b..c065de5 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -29,40 +29,31 @@
 #include "scsi_priv.h"
 #include "scsi_logging.h"
 
-
-#define SG_MEMPOOL_NR		ARRAY_SIZE(scsi_sg_pools)
 #define SG_MEMPOOL_SIZE		2
 
+/*
+ * Should fit within a single page.
+ */
+enum { SCSI_MAX_SG_SEGMENTS = (PAGE_SIZE / sizeof(struct scatterlist)) };
+
+enum { SG_MEMPOOL_NR =
+	(SCSI_MAX_SG_SEGMENTS >= 8) +
+	(SCSI_MAX_SG_SEGMENTS >= 16) +
+	(SCSI_MAX_SG_SEGMENTS >= 32) +
+	(SCSI_MAX_SG_SEGMENTS >= 64) +
+	(SCSI_MAX_SG_SEGMENTS >= 128) +
+	(SCSI_MAX_SG_SEGMENTS >= 256) +
+	(SCSI_MAX_SG_SEGMENTS >= 512)
+};
+
 struct scsi_host_sg_pool {
-	size_t		size;
-	char		*name; 
+	unsigned		size;
 	struct kmem_cache	*slab;
 	mempool_t	*pool;
 };
+static struct scsi_host_sg_pool scsi_sg_pools[SG_MEMPOOL_NR];
+
 
-#if (SCSI_MAX_PHYS_SEGMENTS < 32)
-#error SCSI_MAX_PHYS_SEGMENTS is too small
-#endif
-
-#define SP(x) { x, "sgpool-" #x } 
-static struct scsi_host_sg_pool scsi_sg_pools[] = {
-	SP(8),
-	SP(16),
-	SP(32),
-#if (SCSI_MAX_PHYS_SEGMENTS > 32)
-	SP(64),
-#if (SCSI_MAX_PHYS_SEGMENTS > 64)
-	SP(128),
-#if (SCSI_MAX_PHYS_SEGMENTS > 128)
-	SP(256),
-#if (SCSI_MAX_PHYS_SEGMENTS > 256)
-#error SCSI_MAX_PHYS_SEGMENTS is too large
-#endif
-#endif
-#endif
-#endif
-}; 	
-#undef SP
 
 static void scsi_run_queue(struct request_queue *q);
 
@@ -701,44 +692,32 @@ static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int uptodate,
 	return NULL;
 }
 
+static unsigned scsi_sgtable_index(unsigned nents)
+{
+	int i, size;
+
+	for (i = 0, size = 8; i < SG_MEMPOOL_NR-1; i++, size <<= 1)
+		if (size >= nents)
+			return i;
+
+	if (SCSI_MAX_SG_SEGMENTS >= nents)
+		return SG_MEMPOOL_NR-1;
+
+	printk(KERN_ERR "scsi: bad segment count=%d\n", nents);
+	BUG();
+	return -1;
+}
+
 struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
 {
-	struct scsi_host_sg_pool *sgp;
+	unsigned int pool = scsi_sgtable_index(cmd->use_sg);
 	struct scatterlist *sgl;
 
-	BUG_ON(!cmd->use_sg);
-
-	switch (cmd->use_sg) {
-	case 1 ... 8:
-		cmd->sglist_len = 0;
-		break;
-	case 9 ... 16:
-		cmd->sglist_len = 1;
-		break;
-	case 17 ... 32:
-		cmd->sglist_len = 2;
-		break;
-#if (SCSI_MAX_PHYS_SEGMENTS > 32)
-	case 33 ... 64:
-		cmd->sglist_len = 3;
-		break;
-#if (SCSI_MAX_PHYS_SEGMENTS > 64)
-	case 65 ... 128:
-		cmd->sglist_len = 4;
-		break;
-#if (SCSI_MAX_PHYS_SEGMENTS  > 128)
-	case 129 ... 256:
-		cmd->sglist_len = 5;
-		break;
-#endif
-#endif
-#endif
-	default:
+	sgl = mempool_alloc(scsi_sg_pools[pool].pool, gfp_mask);
+	if (unlikely(!sgl))
 		return NULL;
-	}
 
-	sgp = scsi_sg_pools + cmd->sglist_len;
-	sgl = mempool_alloc(sgp->pool, gfp_mask);
+	cmd->sg_pool = pool;
 	return sgl;
 }
 
@@ -746,12 +725,8 @@ EXPORT_SYMBOL(scsi_alloc_sgtable);
 
 void scsi_free_sgtable(struct scatterlist *sgl, int index)
 {
-	struct scsi_host_sg_pool *sgp;
-
 	BUG_ON(index >= SG_MEMPOOL_NR);
-
-	sgp = scsi_sg_pools + index;
-	mempool_free(sgl, sgp->pool);
+	mempool_free(sgl, scsi_sg_pools[index].pool);
 }
 
 EXPORT_SYMBOL(scsi_free_sgtable);
@@ -784,6 +759,7 @@ static void scsi_release_buffers(struct scsi_cmnd *cmd)
 	 */
 	cmd->request_buffer = NULL;
 	cmd->request_bufflen = 0;
+	cmd->use_sg = 0;
 }
 
 /*
@@ -991,7 +967,6 @@ EXPORT_SYMBOL(scsi_io_completion);
 static int scsi_init_io(struct scsi_cmnd *cmd)
 {
 	struct request     *req = cmd->request;
-	struct scatterlist *sgpnt;
 	int		   count;
 
 	/*
@@ -1004,14 +979,13 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
 	/*
 	 * If sg table allocation fails, requeue request later.
 	 */
-	sgpnt = scsi_alloc_sgtable(cmd, GFP_ATOMIC);
-	if (unlikely(!sgpnt)) {
+	cmd->request_buffer = scsi_alloc_sgtable(cmd, GFP_ATOMIC);
+	if (unlikely(!cmd->request_buffer)) {
 		scsi_unprep_request(req);
 		return BLKPREP_DEFER;
 	}
 
 	req->buffer = NULL;
-	cmd->request_buffer = (char *) sgpnt;
 	if (blk_pc_request(req))
 		cmd->request_bufflen = req->data_len;
 	else
@@ -1576,7 +1550,7 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
 		return NULL;
 
 	blk_queue_max_hw_segments(q, shost->sg_tablesize);
-	blk_queue_max_phys_segments(q, SCSI_MAX_PHYS_SEGMENTS);
+	blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS);
 	blk_queue_max_sectors(q, shost->max_sectors);
 	blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
 	blk_queue_segment_boundary(q, shost->dma_boundary);
@@ -1655,9 +1629,15 @@ void scsi_unblock_requests(struct Scsi_Host *shost)
 }
 EXPORT_SYMBOL(scsi_unblock_requests);
 
+const char* sg_names[] = {
+	"sgpool-8", "sgpool-16", "sgpool-32", "sgpool-64",
+	"sgpool-128", "sgpool-256", "sgpool-512"
+};
+
 int __init scsi_init_queue(void)
 {
 	int i;
+	unsigned size;
 
 	scsi_io_context_cache = kmem_cache_create("scsi_io_context",
 					sizeof(struct scsi_io_context),
@@ -1667,25 +1647,33 @@ int __init scsi_init_queue(void)
 		return -ENOMEM;
 	}
 
-	for (i = 0; i < SG_MEMPOOL_NR; i++) {
+	for (i = 0, size = 8; i < SG_MEMPOOL_NR; i++, size <<= 1) {
 		struct scsi_host_sg_pool *sgp = scsi_sg_pools + i;
-		int size = sgp->size * sizeof(struct scatterlist);
-
-		sgp->slab = kmem_cache_create(sgp->name, size, 0,
-				SLAB_HWCACHE_ALIGN, NULL, NULL);
+		sgp->size = size;
+		sgp->slab = kmem_cache_create(sg_names[i],
+				sgp->size*sizeof(struct scatterlist),
+				0, 0, NULL, NULL);
 		if (!sgp->slab) {
 			printk(KERN_ERR "SCSI: can't init sg slab %s\n",
-					sgp->name);
+					sg_names[i]);
 		}
 
 		sgp->pool = mempool_create_slab_pool(SG_MEMPOOL_SIZE,
 						     sgp->slab);
 		if (!sgp->pool) {
 			printk(KERN_ERR "SCSI: can't init sg mempool %s\n",
-					sgp->name);
+					sg_names[i]);
 		}
 	}
 
+	/* FIXME: Here for the debugging phase only */
+	printk(KERN_ERR
+		"SCSI: max_sg_count=%d SG_MEMPOOL_NR=%d page=%ld "
+		"sizeof_scatterlist=%Zd\n",
+		SCSI_MAX_SG_SEGMENTS, SG_MEMPOOL_NR, PAGE_SIZE,
+		sizeof(struct scatterlist)
+	);
+
 	return 0;
 }
 
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index 9f8f80a..702fcfe 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -11,13 +11,6 @@
 #include <linux/types.h>
 
 /*
- *	The maximum sg list length SCSI can cope with
- *	(currently must be a power of 2 between 32 and 256)
- */
-#define SCSI_MAX_PHYS_SEGMENTS	MAX_PHYS_SEGMENTS
-
-
-/*
  *	SCSI command lengths
  */
 
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 53e1705..9214ad6 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -71,7 +71,7 @@ struct scsi_cmnd {
 
 	/* These elements define the operation we ultimately want to perform */
 	unsigned short use_sg;	/* Number of pieces of scatter-gather */
-	unsigned short sglist_len;	/* size of malloc'd scatter-gather list */
+	unsigned short sg_pool;	/* pool index of allocated sg array */
 
 	unsigned underflow;	/* Return error if less than
 				   this amount is transferred */
@@ -138,9 +138,20 @@ extern void scsi_free_sgtable(struct scatterlist *, int);
 extern int scsi_dma_map(struct scsi_cmnd *cmd);
 extern void scsi_dma_unmap(struct scsi_cmnd *cmd);
 
-#define scsi_sg_count(cmd) ((cmd)->use_sg)
-#define scsi_sglist(cmd) ((struct scatterlist *)(cmd)->request_buffer)
-#define scsi_bufflen(cmd) ((cmd)->request_bufflen)
+static inline unsigned scsi_sg_count(struct scsi_cmnd *cmd)
+{
+	return cmd->use_sg;
+}
+
+static inline struct scatterlist *scsi_sglist(struct scsi_cmnd *cmd)
+{
+	return (struct scatterlist *)cmd->request_buffer;
+}
+
+static inline unsigned scsi_bufflen(struct scsi_cmnd *cmd)
+{
+	return cmd->request_bufflen;
+}
 
 static inline void scsi_set_resid(struct scsi_cmnd *cmd, int resid)
 {
-- 
1.5.2.2.249.g45fd



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
  2007-07-24 11:12     ` FUJITA Tomonori
@ 2007-07-24 13:41       ` FUJITA Tomonori
  2007-07-24 14:01         ` Benny Halevy
  0 siblings, 1 reply; 27+ messages in thread
From: FUJITA Tomonori @ 2007-07-24 13:41 UTC (permalink / raw)
  To: fujita.tomonori; +Cc: bharrosh, James.Bottomley, jens.axboe, linux-scsi

From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Subject: Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
Date: Tue, 24 Jul 2007 20:12:47 +0900

> From: Boaz Harrosh <bharrosh@panasas.com>
> Subject: Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
> Date: Tue, 24 Jul 2007 13:01:34 +0300
> 
> > FUJITA Tomonori wrote:
> > > From: Boaz Harrosh <bharrosh@panasas.com>
> > > Subject: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
> > > Date: Tue, 24 Jul 2007 11:47:50 +0300
> > > 
> > >> As Jens said, there is nothing common to scsi_sgtable and 
> > >> sglists. Save the fact that it is a massive conflict at 
> > >> scsi-ml. They touch all the same places.
> > >>
> > >> Proposed is a simple way out. Two patchsets That produce the
> > >> same output at the end.
> > >>
> > >> One: scsi_sgtable_than_sg-chaining
> > >> Two: sg-chaining_than_scsi_sgtable
> > > 
> > > Hmm, I thought that I've already posted a scsi_sgtable patch working
> > > with sg-chaining together.
> > > 
> > > http://marc.info/?l=linux-scsi&m=118519987632758&w=2
> > > 
> > > I quoted from my mail:
> > > 
> > > ---
> > > I think that the main issue of integrating sgtable and sglist is how
> > > to put scatterlist to scsi_sgtable structure.
> > > 
> > > If we allocate a scsi_sgtable structure and sglists separately, the
> > > code is pretty simple. But probably it's not the best way from the
> > > perspective of performance.
> > > 
> > I was just answering your other mail when this came in so I'll answer
> > here. 
> > This Approach is exactly the scsi_data_buffer approach we both
> > had solutions for. At the time I was all for that approach because it
> > is safer and could be kept compatible to old drivers. (Less work for
> > me) But it was decided against it. So suggesting it again is plain
> > going back.
> 
> Well, the approach to shuffle the entire request setup around was
> rejected. But was the approach to use two data buffers for bidi
> completely rejected?

I should have said: was the approach of using a separate buffer for the
sglists, instead of putting the sglists and the parameters in one
buffer, completely rejected?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
  2007-07-24 13:41       ` FUJITA Tomonori
@ 2007-07-24 14:01         ` Benny Halevy
  2007-07-24 16:10           ` James Bottomley
  0 siblings, 1 reply; 27+ messages in thread
From: Benny Halevy @ 2007-07-24 14:01 UTC (permalink / raw)
  To: FUJITA Tomonori, James.Bottomley
  Cc: fujita.tomonori, bharrosh, jens.axboe, linux-scsi

FUJITA Tomonori wrote:
> I should have said that, was the approach to use separate buffer for
> sglists instead of putting the sglists and the parameters in one
> buffer completely rejected?

I think that James should be asked this question.
My understanding was that he preferred allocating the sgtable
header along with the scatterlist array.

There are pros and cons either way.  In my opinion separating
the headers is better for mapping buffers that have a power-of-2
number of pages (which seems to be the typical case): when you
lose one entry in the sgtable to the header, you waste a lot more
memory as soon as you cross a bucket boundary. E.g. for 64 pages
you need to allocate from the "64 to 127" bucket rather than the
"33 to 64" bucket.  Separated, one sgtable header structure
can just be embedded in struct scsi_cmnd for uni-directional transfers
(wasting some space when transferring no data, but saving space and
cycles in the common case vs. allocating it from a separate memory pool)
and the one for bidi read buffers can be allocated separately just for
bidi commands.

Benny

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
  2007-07-24 14:01         ` Benny Halevy
@ 2007-07-24 16:10           ` James Bottomley
  2007-07-25  8:26             ` Benny Halevy
  0 siblings, 1 reply; 27+ messages in thread
From: James Bottomley @ 2007-07-24 16:10 UTC (permalink / raw)
  To: Benny Halevy
  Cc: FUJITA Tomonori, fujita.tomonori, bharrosh, jens.axboe,
	linux-scsi

On Tue, 2007-07-24 at 17:01 +0300, Benny Halevy wrote:
> FUJITA Tomonori wrote:
> > I should have said that, was the approach to use separate buffer for
> > sglists instead of putting the sglists and the parameters in one
> > buffer completely rejected?
> 
> I think that James should be asked this question.
> My understanding was that he preferred allocating the sgtable
> header along with the scatterlist array.

All I really cared about was insulating the drivers from future changes
in this area.  It strikes me that for chained sglist implementations,
this can all become a block layer responsibility, since more than SCSI
will want to make use of it.

Just remember, though, that whatever is picked has to work in both
memory-constrained embedded systems and high-end clusters.  It seems to
me (but this isn't a mandate) that a single tunable sg element chunk
size will accomplish this best (as in, get rid of the entire SCSI
sglist sizing machinery).

However, I'm perfectly happy to go with whatever the empirical evidence
says is best .. and hopefully, now we don't have to pick this once and
for all time ... we can alter it if whatever is chosen proves to be
suboptimal.

> There are pro's and con's either way.  In my opinion separating
> the headers is better for mapping buffers that have a power of 2
> #pages (which seems to be the typical case) since when you're
> losing one entry in the sgtable for the header you'd waste a lot
> more when you just cross the bucket boundary. E.g. for 64 pages
> you need to allocate from the "64 to 127" bucket rather than the
> "33 to 64" bucket).  Separated, one sgtable header structure
> can just be embedded in struct scsi_cmnd for uni-directional transfers
> (wasting some space when transferring no data, but saving space and
> cycles in the common case vs. allocating it from a separate memory pool)
> and the one for bidi read buffers can be allocated separately just for
> bidi commands.

This is all opinion ... could someone actually run some performance
tests?

James



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH AB1/5] SCSI: SG pools allocation cleanup.
  2007-07-24  8:52 ` [PATCH AB1/5] SCSI: SG pools allocation cleanup Boaz Harrosh
  2007-07-24 13:08   ` Boaz Harrosh
@ 2007-07-25  8:08   ` Boaz Harrosh
  2007-07-25  9:05     ` [PATCH AB1/5 ver2] " Boaz Harrosh
  2007-07-25  9:06     ` [PATCH A2/5 ver2] SCSI: scsi_sgtable implementation Boaz Harrosh
  1 sibling, 2 replies; 27+ messages in thread
From: Boaz Harrosh @ 2007-07-25  8:08 UTC (permalink / raw)
  To: James Bottomley, Jens Axboe, FUJITA Tomonori, linux-scsi

I have a very serious and stupid bug in this patch,
which did not show up in my tests. Please forgive me.
Below is a diff of the fix. In reply to this mail
I will resend two revised patches: AB1, and A2, which is also affected.

Sorry
Boaz

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c065de5..29adcc6 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1649,7 +1649,7 @@ int __init scsi_init_queue(void)
 
 	for (i = 0, size = 8; i < SG_MEMPOOL_NR; i++, size <<= 1) {
 		struct scsi_host_sg_pool *sgp = scsi_sg_pools + i;
-		sgp->size = size;
+		sgp->size = (i != SG_MEMPOOL_NR-1) ? size : SCSI_MAX_SG_SEGMENTS;
 		sgp->slab = kmem_cache_create(sg_names[i],
 				sgp->size*sizeof(struct scatterlist),
 				0, 0, NULL);


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
  2007-07-24 16:10           ` James Bottomley
@ 2007-07-25  8:26             ` Benny Halevy
  2007-07-25  8:42               ` FUJITA Tomonori
  0 siblings, 1 reply; 27+ messages in thread
From: Benny Halevy @ 2007-07-25  8:26 UTC (permalink / raw)
  To: James Bottomley
  Cc: FUJITA Tomonori, fujita.tomonori, bharrosh, jens.axboe,
	linux-scsi

James Bottomley wrote:
> On Tue, 2007-07-24 at 17:01 +0300, Benny Halevy wrote:
>> FUJITA Tomonori wrote:
>>> I should have said that, was the approach to use separate buffer for
>>> sglists instead of putting the sglists and the parameters in one
>>> buffer completely rejected?
>> I think that James should be asked this question.
>> My understanding was that he preferred allocating the sgtable
>> header along with the scatterlist array.
> 
> All I really cared about was insulating the drivers from future changes
> in this area.  It strikes me that for chained sglist implementations,
> this can all become a block layer responsibility, since more than SCSI
> will want to make use of it.

agreed.

> 
> Just remember, though that whatever is picked has to work in both memory
> constrained embedded systems as well as high end clusters.  It seems to
> me (but this isn't a mandate) that a single tunable sg element chunk
> size will accomplish this the best (as in get rid of the entire SCSI
> sglist sizing machinery) .

maybe :)
I'm not as familiar as you are with all the different uses of linux.
IMO, having a tunable is worse for the administrator and I'd prefer
an automatic mechanism that will dynamically adapt the allocation
size(s) to the actual use, very much like the one you have today.

> 
> However, I'm perfectly happy to go with whatever the empirical evidence
> says is best .. and hopefully, now we don't have to pick this once and
> for all time ... we can alter it if whatever is chosen proves to be
> suboptimal.

I agree.  This isn't a catholic marriage :)
We'll run some performance experiments comparing the sgtable chaining
implementation vs. a scsi_data_buff implementation pointing
at a possibly chained sglist and let's see if we can measure
any difference.  We'll send results as soon as we have them.

> 
>> There are pro's and con's either way.  In my opinion separating
>> the headers is better for mapping buffers that have a power of 2
>> #pages (which seems to be the typical case) since when you're
>> losing one entry in the sgtable for the header you'd waste a lot
>> more when you just cross the bucket boundary. E.g. for 64 pages
>> you need to allocate from the "64 to 127" bucket rather than the
>> "33 to 64" bucket).  Separated, one sgtable header structure
>> can just be embedded in struct scsi_cmnd for uni-directional transfers
>> (wasting some space when transferring no data, but saving space and
>> cycles in the common case vs. allocating it from a separate memory pool)
>> and the one for bidi read buffers can be allocated separately just for
>> bidi commands.
> 
> This is all opinion ... could someone actually run some performance
> tests?
> 
> James
> 
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
  2007-07-25  8:26             ` Benny Halevy
@ 2007-07-25  8:42               ` FUJITA Tomonori
  2007-07-25 19:22                 ` Boaz Harrosh
  2007-07-25 19:50                 ` Boaz Harrosh
  0 siblings, 2 replies; 27+ messages in thread
From: FUJITA Tomonori @ 2007-07-25  8:42 UTC (permalink / raw)
  To: bhalevy
  Cc: James.Bottomley, tomof, fujita.tomonori, bharrosh, jens.axboe,
	linux-scsi

From: Benny Halevy <bhalevy@panasas.com>
Subject: Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
Date: Wed, 25 Jul 2007 11:26:44 +0300

> > However, I'm perfectly happy to go with whatever the empirical evidence
> > says is best .. and hopefully, now we don't have to pick this once and
> > for all time ... we can alter it if whatever is chosen proves to be
> > suboptimal.
> 
> I agree.  This isn't a catholic marriage :)
> We'll run some performance experiments comparing the sgtable chaining
> implementation vs. a scsi_data_buff implementation pointing
> at a possibly chained sglist and let's see if we can measure
> any difference.  We'll send results as soon as we have them.

I did some tests with your sgtable patchset and the approach to use
separate buffer for sglists. As expected, there was no performance
difference with small I/Os. I've not tried very large I/Os, which
might give some difference.

The patchset to use separate buffer for sglists is available:

git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-bidi.git simple-sgtable


Can you clean up your patchset and upload somewhere?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH AB1/5 ver2] SCSI: SG pools allocation cleanup.
  2007-07-25  8:08   ` Boaz Harrosh
@ 2007-07-25  9:05     ` Boaz Harrosh
  2007-07-25  9:06     ` [PATCH A2/5 ver2] SCSI: scsi_sgtable implementation Boaz Harrosh
  1 sibling, 0 replies; 27+ messages in thread
From: Boaz Harrosh @ 2007-07-25  9:05 UTC (permalink / raw)
  To: James Bottomley, Jens Axboe, FUJITA Tomonori, linux-scsi


  - The code automatically calculates at compile time the maximum
    sg-array size that will fit in a memory page, and allocates pools of
    power-of-2 sizes up to that maximum.
  - Split scsi_alloc_sgtable() into a helper scsi_sgtable_index() that
    returns the index of the pool for a given sg_count.
  - Remove the now unused SCSI_MAX_PHYS_SEGMENTS.
  - Rename sglist_len to sg_pool, which is what it always was.
  - Some extra prints at scsi_init_queue(). These prints will be removed
    once everything stabilizes.

  Now that the arrays are automatically sized to fit in a page, what
  about ARCHs that have a very big page size? Just for demonstration, I
  have allowed up to 512 entries. But I suspect that other kernel
  subsystems are bounded to 256 or 128; is that true? Should I allow
  more or fewer than 512 here?

Some common numbers:
Arch                      | SCSI_MAX_SG_SEGMENTS | sizeof(struct scatterlist)
--------------------------|----------------------|---------------------------
x86_64                    | 128                  | 32
i386 CONFIG_HIGHMEM64G=y  | 205                  | 20
i386 other                | 256                  | 16

  Could someone give example numbers for an ARCH with a big page size?

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 drivers/scsi/scsi_lib.c  |  143 +++++++++++++++++++++-------------------------
 include/scsi/scsi.h      |    7 --
 include/scsi/scsi_cmnd.h |   19 +++++-
 3 files changed, 80 insertions(+), 89 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 5edadfe..694bffa 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -30,40 +30,31 @@
 #include "scsi_priv.h"
 #include "scsi_logging.h"
 
-
-#define SG_MEMPOOL_NR		ARRAY_SIZE(scsi_sg_pools)
 #define SG_MEMPOOL_SIZE		2
 
+/*
+ * Should fit within a single page.
+ */
+enum { SCSI_MAX_SG_SEGMENTS = (PAGE_SIZE / sizeof(struct scatterlist)) };
+
+enum { SG_MEMPOOL_NR =
+	(SCSI_MAX_SG_SEGMENTS >= 8) +
+	(SCSI_MAX_SG_SEGMENTS >= 16) +
+	(SCSI_MAX_SG_SEGMENTS >= 32) +
+	(SCSI_MAX_SG_SEGMENTS >= 64) +
+	(SCSI_MAX_SG_SEGMENTS >= 128) +
+	(SCSI_MAX_SG_SEGMENTS >= 256) +
+	(SCSI_MAX_SG_SEGMENTS >= 512)
+};
+
 struct scsi_host_sg_pool {
-	size_t		size;
-	char		*name; 
+	unsigned		size;
 	struct kmem_cache	*slab;
 	mempool_t	*pool;
 };
+static struct scsi_host_sg_pool scsi_sg_pools[SG_MEMPOOL_NR];
+
 
-#if (SCSI_MAX_PHYS_SEGMENTS < 32)
-#error SCSI_MAX_PHYS_SEGMENTS is too small
-#endif
-
-#define SP(x) { x, "sgpool-" #x } 
-static struct scsi_host_sg_pool scsi_sg_pools[] = {
-	SP(8),
-	SP(16),
-	SP(32),
-#if (SCSI_MAX_PHYS_SEGMENTS > 32)
-	SP(64),
-#if (SCSI_MAX_PHYS_SEGMENTS > 64)
-	SP(128),
-#if (SCSI_MAX_PHYS_SEGMENTS > 128)
-	SP(256),
-#if (SCSI_MAX_PHYS_SEGMENTS > 256)
-#error SCSI_MAX_PHYS_SEGMENTS is too large
-#endif
-#endif
-#endif
-#endif
-}; 	
-#undef SP
 
 static void scsi_run_queue(struct request_queue *q);
 
@@ -703,44 +694,32 @@ static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int uptodate,
 	return NULL;
 }
 
+static unsigned scsi_sgtable_index(unsigned nents)
+{
+	int i, size;
+
+	for (i = 0, size = 8; i < SG_MEMPOOL_NR-1; i++, size <<= 1)
+		if (size >= nents)
+			return i;
+
+	if (SCSI_MAX_SG_SEGMENTS >= nents)
+		return SG_MEMPOOL_NR-1;
+
+	printk(KERN_ERR "scsi: bad segment count=%d\n", nents);
+	BUG();
+	return -1;
+}
+
 struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
 {
-	struct scsi_host_sg_pool *sgp;
+	unsigned int pool = scsi_sgtable_index(cmd->use_sg);
 	struct scatterlist *sgl;
 
-	BUG_ON(!cmd->use_sg);
-
-	switch (cmd->use_sg) {
-	case 1 ... 8:
-		cmd->sglist_len = 0;
-		break;
-	case 9 ... 16:
-		cmd->sglist_len = 1;
-		break;
-	case 17 ... 32:
-		cmd->sglist_len = 2;
-		break;
-#if (SCSI_MAX_PHYS_SEGMENTS > 32)
-	case 33 ... 64:
-		cmd->sglist_len = 3;
-		break;
-#if (SCSI_MAX_PHYS_SEGMENTS > 64)
-	case 65 ... 128:
-		cmd->sglist_len = 4;
-		break;
-#if (SCSI_MAX_PHYS_SEGMENTS  > 128)
-	case 129 ... 256:
-		cmd->sglist_len = 5;
-		break;
-#endif
-#endif
-#endif
-	default:
+	sgl = mempool_alloc(scsi_sg_pools[pool].pool, gfp_mask);
+	if (unlikely(!sgl))
 		return NULL;
-	}
 
-	sgp = scsi_sg_pools + cmd->sglist_len;
-	sgl = mempool_alloc(sgp->pool, gfp_mask);
+	cmd->sg_pool = pool;
 	return sgl;
 }
 
@@ -748,13 +727,7 @@ EXPORT_SYMBOL(scsi_alloc_sgtable);
 
 void scsi_free_sgtable(struct scsi_cmnd *cmd)
 {
-	struct scatterlist *sgl = cmd->request_buffer;
-	struct scsi_host_sg_pool *sgp;
-
-	BUG_ON(cmd->sglist_len >= SG_MEMPOOL_NR);
-
-	sgp = scsi_sg_pools + cmd->sglist_len;
-	mempool_free(sgl, sgp->pool);
+	mempool_free(cmd->request_buffer, scsi_sg_pools[cmd->sg_pool].pool);
 }
 
 EXPORT_SYMBOL(scsi_free_sgtable);
@@ -787,6 +760,7 @@ static void scsi_release_buffers(struct scsi_cmnd *cmd)
 	 */
 	cmd->request_buffer = NULL;
 	cmd->request_bufflen = 0;
+	cmd->use_sg = 0;
 }
 
 /*
@@ -994,7 +968,6 @@ EXPORT_SYMBOL(scsi_io_completion);
 static int scsi_init_io(struct scsi_cmnd *cmd)
 {
 	struct request     *req = cmd->request;
-	struct scatterlist *sgpnt;
 	int		   count;
 
 	/*
@@ -1007,14 +980,13 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
 	/*
 	 * If sg table allocation fails, requeue request later.
 	 */
-	sgpnt = scsi_alloc_sgtable(cmd, GFP_ATOMIC);
-	if (unlikely(!sgpnt)) {
+	cmd->request_buffer = scsi_alloc_sgtable(cmd, GFP_ATOMIC);
+	if (unlikely(!cmd->request_buffer)) {
 		scsi_unprep_request(req);
 		return BLKPREP_DEFER;
 	}
 
 	req->buffer = NULL;
-	cmd->request_buffer = (char *) sgpnt;
 	if (blk_pc_request(req))
 		cmd->request_bufflen = req->data_len;
 	else
@@ -1579,7 +1551,7 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
 		return NULL;
 
 	blk_queue_max_hw_segments(q, shost->sg_tablesize);
-	blk_queue_max_phys_segments(q, SCSI_MAX_PHYS_SEGMENTS);
+	blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS);
 	blk_queue_max_sectors(q, shost->max_sectors);
 	blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
 	blk_queue_segment_boundary(q, shost->dma_boundary);
@@ -1658,9 +1630,15 @@ void scsi_unblock_requests(struct Scsi_Host *shost)
 }
 EXPORT_SYMBOL(scsi_unblock_requests);
 
+const char* sg_names[] = {
+	"sgpool-8", "sgpool-16", "sgpool-32", "sgpool-64",
+	"sgpool-128", "sgpool-256", "sgpool-512"
+};
+
 int __init scsi_init_queue(void)
 {
 	int i;
+	unsigned size;
 
 	scsi_io_context_cache = kmem_cache_create("scsi_io_context",
 					sizeof(struct scsi_io_context),
@@ -1670,25 +1648,34 @@ int __init scsi_init_queue(void)
 		return -ENOMEM;
 	}
 
-	for (i = 0; i < SG_MEMPOOL_NR; i++) {
+	for (i = 0, size = 8; i < SG_MEMPOOL_NR; i++, size <<= 1) {
 		struct scsi_host_sg_pool *sgp = scsi_sg_pools + i;
-		int size = sgp->size * sizeof(struct scatterlist);
-
-		sgp->slab = kmem_cache_create(sgp->name, size, 0,
-				SLAB_HWCACHE_ALIGN, NULL);
+		sgp->size = (i != SG_MEMPOOL_NR-1) ? size :
+		                                           SCSI_MAX_SG_SEGMENTS;
+		sgp->slab = kmem_cache_create(sg_names[i],
+				sgp->size*sizeof(struct scatterlist),
+				0, 0, NULL);
 		if (!sgp->slab) {
 			printk(KERN_ERR "SCSI: can't init sg slab %s\n",
-					sgp->name);
+					sg_names[i]);
 		}
 
 		sgp->pool = mempool_create_slab_pool(SG_MEMPOOL_SIZE,
 						     sgp->slab);
 		if (!sgp->pool) {
 			printk(KERN_ERR "SCSI: can't init sg mempool %s\n",
-					sgp->name);
+					sg_names[i]);
 		}
 	}
 
+	/* FIXME: Here for the debugging phase only */
+	printk(KERN_ERR
+		"SCSI: max_sg_count=%d SG_MEMPOOL_NR=%d page=%ld "
+		"sizeof_scatterlist=%Zd\n",
+		SCSI_MAX_SG_SEGMENTS, SG_MEMPOOL_NR, PAGE_SIZE,
+		sizeof(struct scatterlist)
+	);
+
 	return 0;
 }
 
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index 9f8f80a..702fcfe 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -11,13 +11,6 @@
 #include <linux/types.h>
 
 /*
- *	The maximum sg list length SCSI can cope with
- *	(currently must be a power of 2 between 32 and 256)
- */
-#define SCSI_MAX_PHYS_SEGMENTS	MAX_PHYS_SEGMENTS
-
-
-/*
  *	SCSI command lengths
  */
 
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 760d4a5..279a4df 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -71,7 +71,7 @@ struct scsi_cmnd {
 
 	/* These elements define the operation we ultimately want to perform */
 	unsigned short use_sg;	/* Number of pieces of scatter-gather */
-	unsigned short sglist_len;	/* size of malloc'd scatter-gather list */
+	unsigned short sg_pool;	/* pool index of allocated sg array */
 
 	unsigned underflow;	/* Return error if less than
 				   this amount is transferred */
@@ -138,9 +138,20 @@ extern void scsi_free_sgtable(struct scsi_cmnd *);
 extern int scsi_dma_map(struct scsi_cmnd *cmd);
 extern void scsi_dma_unmap(struct scsi_cmnd *cmd);
 
-#define scsi_sg_count(cmd) ((cmd)->use_sg)
-#define scsi_sglist(cmd) ((struct scatterlist *)(cmd)->request_buffer)
-#define scsi_bufflen(cmd) ((cmd)->request_bufflen)
+static inline unsigned scsi_sg_count(struct scsi_cmnd *cmd)
+{
+	return cmd->use_sg;
+}
+
+static inline struct scatterlist *scsi_sglist(struct scsi_cmnd *cmd)
+{
+	return (struct scatterlist *)cmd->request_buffer;
+}
+
+static inline unsigned scsi_bufflen(struct scsi_cmnd *cmd)
+{
+	return cmd->request_bufflen;
+}
 
 static inline void scsi_set_resid(struct scsi_cmnd *cmd, int resid)
 {
-- 
1.5.2.2.249.g45fd



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH A2/5 ver2] SCSI: scsi_sgtable implementation
  2007-07-25  8:08   ` Boaz Harrosh
  2007-07-25  9:05     ` [PATCH AB1/5 ver2] " Boaz Harrosh
@ 2007-07-25  9:06     ` Boaz Harrosh
  1 sibling, 0 replies; 27+ messages in thread
From: Boaz Harrosh @ 2007-07-25  9:06 UTC (permalink / raw)
  To: James Bottomley, Jens Axboe, FUJITA Tomonori, linux-scsi


  As proposed by James Bottomley, all I/O members of struct scsi_cmnd,
  including the resid member, which need to be duplicated for bidirectional
  transfers, can be allocated together with the sg-list they point to.
  This way, when bidi arrives, the whole structure can be duplicated
  with minimal change to code, and with no extra baggage when bidi is not
  used. The result is a new mechanism called scsi_sgtable.

  scsi_cmnd.h
  - define a new scsi_sgtable structure that will hold IO descriptors + the
    actual scattergather array.
  - Hold a pointer to the scsi_sgtable in scsi_cmnd.
  - Deprecate the old, now unnecessary, IO members of scsi_cmnd. These members
    are kept for compatibility with unconverted drivers still lurking in the
    code tree; the last patch in the series removes them completely.
  - Modify data accessors to now use new members of scsi_sgtable.

  scsi_lib.c
  - scsi_lib is converted to use the new scsi_sgtable instead of the old
    members and sg-arrays.
  - scsi_{alloc,free}_sgtable() API has changed. This will break scsi_stgt
    which will need to be converted to new implementation.
  - Special code is inserted to initialize the old compatibility members from
    the new structures. This code will be removed.

 Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 drivers/scsi/scsi_lib.c  |  116 +++++++++++++++++++++++-----------------------
 include/scsi/scsi_cmnd.h |   40 ++++++++++------
 2 files changed, 82 insertions(+), 74 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 694bffa..899b7df 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -35,16 +35,17 @@
 /*
  * Should fit within a single page.
  */
-enum { SCSI_MAX_SG_SEGMENTS = (PAGE_SIZE / sizeof(struct scatterlist)) };
+enum { SCSI_MAX_SG_SEGMENTS = ((PAGE_SIZE - sizeof(struct scsi_sgtable)) /
+                               sizeof(struct scatterlist)) };
 
 enum { SG_MEMPOOL_NR =
-	(SCSI_MAX_SG_SEGMENTS >= 8) +
-	(SCSI_MAX_SG_SEGMENTS >= 16) +
-	(SCSI_MAX_SG_SEGMENTS >= 32) +
-	(SCSI_MAX_SG_SEGMENTS >= 64) +
-	(SCSI_MAX_SG_SEGMENTS >= 128) +
-	(SCSI_MAX_SG_SEGMENTS >= 256) +
-	(SCSI_MAX_SG_SEGMENTS >= 512)
+	(SCSI_MAX_SG_SEGMENTS > 8) +
+	(SCSI_MAX_SG_SEGMENTS > 16) +
+	(SCSI_MAX_SG_SEGMENTS > 32) +
+	(SCSI_MAX_SG_SEGMENTS > 64) +
+	(SCSI_MAX_SG_SEGMENTS > 128) +
+	(SCSI_MAX_SG_SEGMENTS > 256) +
+	(SCSI_MAX_SG_SEGMENTS > 512)
 };
 
 struct scsi_host_sg_pool {
@@ -54,7 +55,10 @@ struct scsi_host_sg_pool {
 };
 static struct scsi_host_sg_pool scsi_sg_pools[SG_MEMPOOL_NR];
 
-
+static inline unsigned scsi_pool_size(int pool)
+{
+	return scsi_sg_pools[pool].size;
+}
 
 static void scsi_run_queue(struct request_queue *q);
 
@@ -699,7 +703,7 @@ static unsigned scsi_sgtable_index(unsigned nents)
 	int i, size;
 
 	for (i = 0, size = 8; i < SG_MEMPOOL_NR-1; i++, size <<= 1)
-		if (size >= nents)
+		if (size > nents)
 			return i;
 
 	if (SCSI_MAX_SG_SEGMENTS >= nents)
@@ -710,26 +714,26 @@ static unsigned scsi_sgtable_index(unsigned nents)
 	return -1;
 }
 
-struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
+struct scsi_sgtable *scsi_alloc_sgtable(int sg_count, gfp_t gfp_mask)
 {
-	unsigned int pool = scsi_sgtable_index(cmd->use_sg);
-	struct scatterlist *sgl;
+	unsigned int pool = scsi_sgtable_index(sg_count);
+	struct scsi_sgtable *sgt;
 
-	sgl = mempool_alloc(scsi_sg_pools[pool].pool, gfp_mask);
-	if (unlikely(!sgl))
+	sgt = mempool_alloc(scsi_sg_pools[pool].pool, gfp_mask);
+	if (unlikely(!sgt))
 		return NULL;
 
-	cmd->sg_pool = pool;
-	return sgl;
+	memset(sgt, 0, SG_TABLE_SIZEOF(scsi_pool_size(pool)));
+	sgt->sg_count = sg_count;
+	sgt->sg_pool = pool;
+	return sgt;
 }
-
 EXPORT_SYMBOL(scsi_alloc_sgtable);
 
-void scsi_free_sgtable(struct scsi_cmnd *cmd)
+void scsi_free_sgtable(struct scsi_sgtable *sgt)
 {
-	mempool_free(cmd->request_buffer, scsi_sg_pools[cmd->sg_pool].pool);
+	mempool_free(sgt, scsi_sg_pools[sgt->sg_pool].pool);
 }
-
 EXPORT_SYMBOL(scsi_free_sgtable);
 
 /*
@@ -751,13 +755,12 @@ EXPORT_SYMBOL(scsi_free_sgtable);
  */
 static void scsi_release_buffers(struct scsi_cmnd *cmd)
 {
-	if (cmd->use_sg)
-		scsi_free_sgtable(cmd);
+	if (cmd->sgtable)
+		scsi_free_sgtable(cmd->sgtable);
 
-	/*
-	 * Zero these out.  They now point to freed memory, and it is
-	 * dangerous to hang onto the pointers.
-	 */
+	cmd->sgtable = NULL;
+
+	/*FIXME: make code backward compatible with old system */
 	cmd->request_buffer = NULL;
 	cmd->request_bufflen = 0;
 	cmd->use_sg = 0;
@@ -794,7 +797,7 @@ static void scsi_release_buffers(struct scsi_cmnd *cmd)
 void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 {
 	int result = cmd->result;
-	int this_count = cmd->request_bufflen;
+	int this_count = scsi_bufflen(cmd);
 	request_queue_t *q = cmd->device->request_queue;
 	struct request *req = cmd->request;
 	int clear_errors = 1;
@@ -802,8 +805,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	int sense_valid = 0;
 	int sense_deferred = 0;
 
-	scsi_release_buffers(cmd);
-
 	if (result) {
 		sense_valid = scsi_command_normalize_sense(cmd, &sshdr);
 		if (sense_valid)
@@ -826,9 +827,11 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 				req->sense_len = len;
 			}
 		}
-		req->data_len = cmd->resid;
+		req->data_len = scsi_get_resid(cmd);
 	}
 
+	scsi_release_buffers(cmd);
+
 	/*
 	 * Next deal with any sectors which we were able to correctly
 	 * handle.
@@ -836,7 +839,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	SCSI_LOG_HLCOMPLETE(1, printk("%ld sectors total, "
 				      "%d bytes done.\n",
 				      req->nr_sectors, good_bytes));
-	SCSI_LOG_HLCOMPLETE(1, printk("use_sg is %d\n", cmd->use_sg));
 
 	if (clear_errors)
 		req->errors = 0;
@@ -969,41 +971,42 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
 {
 	struct request     *req = cmd->request;
 	int		   count;
-
-	/*
-	 * We used to not use scatter-gather for single segment request,
-	 * but now we do (it makes highmem I/O easier to support without
-	 * kmapping pages)
-	 */
-	cmd->use_sg = req->nr_phys_segments;
+	struct scsi_sgtable *sgt;
 
 	/*
 	 * If sg table allocation fails, requeue request later.
 	 */
-	cmd->request_buffer = scsi_alloc_sgtable(cmd, GFP_ATOMIC);
-	if (unlikely(!cmd->request_buffer)) {
+	sgt = scsi_alloc_sgtable(req->nr_phys_segments, GFP_ATOMIC);
+	if (unlikely(!sgt)) {
 		scsi_unprep_request(req);
 		return BLKPREP_DEFER;
 	}
 
 	req->buffer = NULL;
 	if (blk_pc_request(req))
-		cmd->request_bufflen = req->data_len;
+		sgt->length = req->data_len;
 	else
-		cmd->request_bufflen = req->nr_sectors << 9;
+		sgt->length = req->nr_sectors << 9;
 
+	cmd->sgtable = sgt;
 	/* 
 	 * Next, walk the list, and fill in the addresses and sizes of
 	 * each segment.
 	 */
-	count = blk_rq_map_sg(req->q, req, cmd->request_buffer);
-	if (likely(count <= cmd->use_sg)) {
-		cmd->use_sg = count;
+	count = blk_rq_map_sg(req->q, req, sgt->sglist);
+	if (likely(count <= sgt->sg_count)) {
+		sgt->sg_count = count;
+
+		/*FIXME: make code backward compatible with old system */
+		cmd->request_buffer = sgt->sglist;
+		cmd->request_bufflen = sgt->length;
+		cmd->use_sg = sgt->sg_count;
+
 		return BLKPREP_OK;
 	}
 
 	printk(KERN_ERR "Incorrect number of segments after building list\n");
-	printk(KERN_ERR "counted %d, received %d\n", count, cmd->use_sg);
+	printk(KERN_ERR "counted %d, received %d\n", count, scsi_sg_count(cmd));
 	printk(KERN_ERR "req nr_sec %lu, cur_nr_sec %u\n", req->nr_sectors,
 			req->current_nr_sectors);
 
@@ -1059,7 +1062,7 @@ static void scsi_blk_pc_done(struct scsi_cmnd *cmd)
 	 * successfully. Since this is a REQ_BLOCK_PC command the
 	 * caller should check the request's errors value
 	 */
-	scsi_io_completion(cmd, cmd->request_bufflen);
+	scsi_io_completion(cmd, scsi_bufflen(cmd));
 }
 
 static int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
@@ -1088,9 +1091,7 @@ static int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
 		BUG_ON(req->data_len);
 		BUG_ON(req->data);
 
-		cmd->request_bufflen = 0;
-		cmd->request_buffer = NULL;
-		cmd->use_sg = 0;
+		cmd->sgtable = NULL;
 		req->buffer = NULL;
 	}
 
@@ -1631,8 +1632,8 @@ void scsi_unblock_requests(struct Scsi_Host *shost)
 EXPORT_SYMBOL(scsi_unblock_requests);
 
 const char* sg_names[] = {
-	"sgpool-8", "sgpool-16", "sgpool-32", "sgpool-64",
-	"sgpool-128", "sgpool-256", "sgpool-512"
+	"sgtable-7", "sgtable-15", "sgtable-31", "sgtable-63",
+	"sgtable-127", "sgtable-255", "sgtable-511"
 };
 
 int __init scsi_init_queue(void)
@@ -1650,11 +1651,10 @@ int __init scsi_init_queue(void)
 
 	for (i = 0, size = 8; i < SG_MEMPOOL_NR; i++, size <<= 1) {
 		struct scsi_host_sg_pool *sgp = scsi_sg_pools + i;
-		sgp->size = (i != SG_MEMPOOL_NR-1) ? size :
+		sgp->size = (i != SG_MEMPOOL_NR-1) ? size - 1:
 		                                           SCSI_MAX_SG_SEGMENTS;
 		sgp->slab = kmem_cache_create(sg_names[i],
-				sgp->size*sizeof(struct scatterlist),
-				0, 0, NULL);
+				SG_TABLE_SIZEOF(sgp->size), 0, 0, NULL);
 		if (!sgp->slab) {
 			printk(KERN_ERR "SCSI: can't init sg slab %s\n",
 					sg_names[i]);
@@ -1671,9 +1671,9 @@ int __init scsi_init_queue(void)
 	/* FIXME: Here for the debugging phase only */
 	printk(KERN_ERR
 		"SCSI: max_sg_count=%d SG_MEMPOOL_NR=%d page=%ld "
-		"so_scatterlist=%Zd\n",
+		"so_scatterlist=%Zd so_sgtable=%Zd\n",
 		SCSI_MAX_SG_SEGMENTS, SG_MEMPOOL_NR, PAGE_SIZE,
-		sizeof(struct scatterlist)
+		sizeof(struct scatterlist), sizeof(struct scsi_sgtable)
 	);
 
 	return 0;
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 279a4df..574ea9d 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -11,6 +11,16 @@ struct scatterlist;
 struct Scsi_Host;
 struct scsi_device;
 
+struct scsi_sgtable {
+	unsigned length;
+	int resid;
+	short sg_count;
+	short sg_pool;
+	struct scatterlist sglist[0];
+};
+
+#define SG_TABLE_SIZEOF(sg_count) ((sg_count)*sizeof(struct scatterlist) \
+                                   + sizeof(struct scsi_sgtable))
 
 /* embedded in scsi_cmnd */
 struct scsi_pointer {
@@ -64,15 +74,11 @@ struct scsi_cmnd {
 	/* These elements define the operation we are about to perform */
 #define MAX_COMMAND_SIZE	16
 	unsigned char cmnd[MAX_COMMAND_SIZE];
-	unsigned request_bufflen;	/* Actual request size */
 
 	struct timer_list eh_timeout;	/* Used to time out the command. */
-	void *request_buffer;		/* Actual requested buffer */
+	struct scsi_sgtable *sgtable;
 
 	/* These elements define the operation we ultimately want to perform */
-	unsigned short use_sg;	/* Number of pieces of scatter-gather */
-	unsigned short sg_pool;	/* pool index of allocated sg array */
-
 	unsigned underflow;	/* Return error if less than
 				   this amount is transferred */
 
@@ -82,10 +88,6 @@ struct scsi_cmnd {
 				   reconnects.   Probably == sector
 				   size */
 
-	int resid;		/* Number of bytes requested to be
-				   transferred less actual number
-				   transferred (0 if not supported) */
-
 	struct request *request;	/* The command we are
 				   	   working on */
 
@@ -117,6 +119,11 @@ struct scsi_cmnd {
 
 	unsigned char tag;	/* SCSI-II queued command tag */
 	unsigned long pid;	/* Process ID, starts at 0. Unique per host. */
+
+	unsigned short __deprecated use_sg;
+	unsigned __deprecated request_bufflen;
+	void __deprecated *request_buffer;
+	int __deprecated resid;
 };
 
 extern struct scsi_cmnd *scsi_get_command(struct scsi_device *, gfp_t);
@@ -132,35 +139,36 @@ extern void *scsi_kmap_atomic_sg(struct scatterlist *sg, int sg_count,
 				 size_t *offset, size_t *len);
 extern void scsi_kunmap_atomic_sg(void *virt);
 
-extern struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *, gfp_t);
-extern void scsi_free_sgtable(struct scsi_cmnd *);
+extern struct scsi_sgtable *scsi_alloc_sgtable(int sg_count, gfp_t gfp_mask);
+extern void scsi_free_sgtable(struct scsi_sgtable *sgt);
 
 extern int scsi_dma_map(struct scsi_cmnd *cmd);
 extern void scsi_dma_unmap(struct scsi_cmnd *cmd);
 
 static inline unsigned scsi_sg_count(struct scsi_cmnd *cmd)
 {
-	return cmd->use_sg;
+	return cmd->sgtable ? cmd->sgtable->sg_count : 0;
 }
 
 static inline struct scatterlist *scsi_sglist(struct scsi_cmnd *cmd)
 {
-	return ((struct scatterlist *)cmd->request_buffer);
+	return cmd->sgtable ? cmd->sgtable->sglist : 0;
 }
 
 static inline unsigned scsi_bufflen(struct scsi_cmnd *cmd)
 {
-	return cmd->request_bufflen;
+	return cmd->sgtable ? cmd->sgtable->length : 0;
 }
 
 static inline void scsi_set_resid(struct scsi_cmnd *cmd, int resid)
 {
-	cmd->resid = resid;
+	if (cmd->sgtable)
+		cmd->sgtable->resid = resid;
 }
 
 static inline int scsi_get_resid(struct scsi_cmnd *cmd)
 {
-	return cmd->resid;
+	return cmd->sgtable ? cmd->sgtable->resid : 0;
 }
 
 #define scsi_for_each_sg(cmd, sg, nseg, __i)			\
-- 
1.5.2.2.249.g45fd



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
  2007-07-25  8:42               ` FUJITA Tomonori
@ 2007-07-25 19:22                 ` Boaz Harrosh
  2007-07-26 11:33                   ` FUJITA Tomonori
  2007-07-31 20:12                   ` Boaz Harrosh
  2007-07-25 19:50                 ` Boaz Harrosh
  1 sibling, 2 replies; 27+ messages in thread
From: Boaz Harrosh @ 2007-07-25 19:22 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: bhalevy, James.Bottomley, jens.axboe, linux-scsi

FUJITA Tomonori wrote:
> From: Benny Halevy <bhalevy@panasas.com>
> Subject: Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
> Date: Wed, 25 Jul 2007 11:26:44 +0300
> 
>>> However, I'm perfectly happy to go with whatever the empirical evidence
>>> says is best .. and hopefully, now we don't have to pick this once and
>>> for all time ... we can alter it if whatever is chosen proves to be
>>> suboptimal.
>> I agree.  This isn't a catholic marriage :)
>> We'll run some performance experiments comparing the sgtable chaining
>> implementation vs. a scsi_data_buff implementation pointing
>> at a possibly chained sglist and let's see if we can measure
>> any difference.  We'll send results as soon as we have them.
> 
> I did some tests with your sgtable patchset and the approach to use
> separate buffer for sglists. As expected, there was no performance
> difference with small I/Os. I've not tried very large I/Os, which
> might give some difference.
> 
> The patchset to use separate buffer for sglists is available:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-bidi.git simple-sgtable
> 
> 
> Can you clean up your patchset and upload somewhere?

Sorry, sure. Here is the complete scsi_sgtable work over linux-block:
http://www.bhalevy.com/open-osd/download/scsi_sgtable/linux-block
 
Here is just scsi_sgtable, no chaining, over scsi-misc + more
drivers:
http://www.bhalevy.com/open-osd/download/scsi_sgtable/scsi-misc


Next week I will try to mount lots of scsi_debug devices and
use large parallel IO to try and find a difference. I will
test Jens's sglist-arch tree against the above sglist-arch + scsi_sgtable.

I have lots of reservations about Tomo's last patches. To me
they are a regression: they use 3 allocations per command instead
of 2, plus an extra pointer and an extra global slab_pool. And
for what? For grouping some scsi_cmnd members in a substructure.
If we want to go the "pointing" way, keeping our extra scatterlist
and our base_2 count on most ARCHs, then we can just use the
scsi_data_buff embedded inside scsi_cmnd.

The second scsi_data_buff for bidi can come either from an extra
slab_pool, as in Tomo's patch (bidi can pay), or, in scsi.c at
scsi_setup_command_freelist(), the code can inspect Tomo's
QUEUE_FLAG_BIDI flag on the request_queue and then allocate a
bigger scsi_cmnd in the free list.

I have coded that approach and it is very simple:
http://www.bhalevy.com/open-osd/download/scsi_data_buff

They are over Jens's sglist-arch branch. I have revised all
scsi-ml places and it all compiles, but it is totally untested.

I will add this branch to the above tests, but I suspect that
they are identical in every way to current code.


For review here is the main scsi_data_buff patch:

------
From: Boaz Harrosh <bharrosh@panasas.com>
Date: Wed, 25 Jul 2007 20:19:14 +0300
Subject: [PATCH] SCSI: scsi_data_buff

  In preparation for bidi we abstract all IO members of scsi_cmnd
  that will need to duplicate into a substructure.
  - Group all IO members of scsi_cmnd into a scsi_data_buff
    structure.
  - Adjust accessors to new members.
  - scsi_{alloc,free}_sgtable receive a scsi_data_buff instead of
    scsi_cmnd. And work on it. (Supporting chaining like before)
  - Adjust scsi_init_io() and  scsi_release_buffers() for above
    change.
  - Fix other parts of scsi_lib to members migration. Use accessors
    where appropriate.

 Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 drivers/scsi/scsi_lib.c  |   68 +++++++++++++++++++--------------------------
 include/scsi/scsi_cmnd.h |   34 +++++++++++-----------
 2 files changed, 46 insertions(+), 56 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index d62b184..2b8a865 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -714,16 +714,14 @@ static unsigned scsi_sgtable_index(unsigned nents)
 	return -1;
 }
 
-struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
+struct scatterlist *scsi_alloc_sgtable(struct scsi_data_buff *sdb, gfp_t gfp_mask)
 {
 	struct scsi_host_sg_pool *sgp;
 	struct scatterlist *sgl, *prev, *ret;
 	unsigned int index;
 	int this, left;
 
-	BUG_ON(!cmd->use_sg);
-
-	left = cmd->use_sg;
+	left = sdb->sg_count;
 	ret = prev = NULL;
 	do {
 		this = left;
@@ -747,7 +745,7 @@ struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
 		 * first loop through, set initial index and return value
 		 */
 		if (!ret) {
-			cmd->sg_pool = index;
+			sdb->sg_pool = index;
 			ret = sgl;
 		}
 
@@ -769,10 +767,10 @@ struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
 	} while (left);
 
 	/*
-	 * ->use_sg may get modified after dma mapping has potentially
+	 * sdb->sg_count may get modified after blk_rq_map_sg() potentially
 	 * shrunk the number of segments, so keep a copy of it for free.
 	 */
-	cmd->__use_sg = cmd->use_sg;
+	sdb->sgs_allocated = sdb->sg_count;
 	return ret;
 enomem:
 	if (ret) {
@@ -797,21 +795,21 @@ enomem:
 
 EXPORT_SYMBOL(scsi_alloc_sgtable);
 
-void scsi_free_sgtable(struct scsi_cmnd *cmd)
+void scsi_free_sgtable(struct scsi_data_buff *sdb)
 {
-	struct scatterlist *sgl = cmd->request_buffer;
+	struct scatterlist *sgl = sdb->sglist;
 	struct scsi_host_sg_pool *sgp;
 
 	/*
 	 * if this is the biggest size sglist, check if we have
 	 * chained parts we need to free
 	 */
-	if (cmd->__use_sg > SCSI_MAX_SG_SEGMENTS) {
+	if (sdb->sgs_allocated > SCSI_MAX_SG_SEGMENTS) {
 		unsigned short this, left;
 		struct scatterlist *next;
 		unsigned int index;
 
-		left = cmd->__use_sg - (SCSI_MAX_SG_SEGMENTS - 1);
+		left = sdb->sgs_allocated - (SCSI_MAX_SG_SEGMENTS - 1);
 		next = sg_chain_ptr(&sgl[SCSI_MAX_SG_SEGMENTS - 1]);
 		while (left && next) {
 			sgl = next;
@@ -833,7 +831,7 @@ void scsi_free_sgtable(struct scsi_cmnd *cmd)
 		}
 	}
 
-	mempool_free(cmd->request_buffer, scsi_sg_pools[cmd->sg_pool].pool);
+	mempool_free(sdb->sglist, scsi_sg_pools[sdb->sg_pool].pool);
 }
 
 EXPORT_SYMBOL(scsi_free_sgtable);
@@ -857,16 +855,10 @@ EXPORT_SYMBOL(scsi_free_sgtable);
  */
 static void scsi_release_buffers(struct scsi_cmnd *cmd)
 {
-	if (cmd->use_sg)
-		scsi_free_sgtable(cmd);
+	if (cmd->sdb.sglist)
+		scsi_free_sgtable(&cmd->sdb);
 
-	/*
-	 * Zero these out.  They now point to freed memory, and it is
-	 * dangerous to hang onto the pointers.
-	 */
-	cmd->request_buffer = NULL;
-	cmd->request_bufflen = 0;
-	cmd->use_sg = 0;
+	memset(&cmd->sdb, 0, sizeof(cmd->sdb));
 }
 
 /*
@@ -900,7 +892,7 @@ static void scsi_release_buffers(struct scsi_cmnd *cmd)
 void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 {
 	int result = cmd->result;
-	int this_count = cmd->request_bufflen;
+	int this_count = scsi_bufflen(cmd);
 	request_queue_t *q = cmd->device->request_queue;
 	struct request *req = cmd->request;
 	int clear_errors = 1;
@@ -908,8 +900,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	int sense_valid = 0;
 	int sense_deferred = 0;
 
-	scsi_release_buffers(cmd);
-
 	if (result) {
 		sense_valid = scsi_command_normalize_sense(cmd, &sshdr);
 		if (sense_valid)
@@ -932,9 +922,11 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 				req->sense_len = len;
 			}
 		}
-		req->data_len = cmd->resid;
+		req->data_len = scsi_get_resid(cmd);
 	}
 
+	scsi_release_buffers(cmd);
+
 	/*
 	 * Next deal with any sectors which we were able to correctly
 	 * handle.
@@ -942,7 +934,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	SCSI_LOG_HLCOMPLETE(1, printk("%ld sectors total, "
 				      "%d bytes done.\n",
 				      req->nr_sectors, good_bytes));
-	SCSI_LOG_HLCOMPLETE(1, printk("use_sg is %d\n", cmd->use_sg));
 
 	if (clear_errors)
 		req->errors = 0;
@@ -1075,41 +1066,42 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
 {
 	struct request     *req = cmd->request;
 	int		   count;
+	struct scsi_data_buff* sdb = &cmd->sdb;
 
 	/*
 	 * We used to not use scatter-gather for single segment request,
 	 * but now we do (it makes highmem I/O easier to support without
 	 * kmapping pages)
 	 */
-	cmd->use_sg = req->nr_phys_segments;
+	sdb->sg_count = req->nr_phys_segments;
 
 	/*
 	 * If sg table allocation fails, requeue request later.
 	 */
-	cmd->request_buffer = scsi_alloc_sgtable(cmd, GFP_ATOMIC);
-	if (unlikely(!cmd->request_buffer)) {
+	sdb->sglist = scsi_alloc_sgtable(sdb, GFP_ATOMIC);
+	if (unlikely(!sdb->sglist)) {
 		scsi_unprep_request(req);
 		return BLKPREP_DEFER;
 	}
 
 	req->buffer = NULL;
 	if (blk_pc_request(req))
-		cmd->request_bufflen = req->data_len;
+		sdb->length = req->data_len;
 	else
-		cmd->request_bufflen = req->nr_sectors << 9;
+		sdb->length = req->nr_sectors << 9;
 
 	/* 
 	 * Next, walk the list, and fill in the addresses and sizes of
 	 * each segment.
 	 */
-	count = blk_rq_map_sg(req->q, req, cmd->request_buffer);
-	if (likely(count <= cmd->use_sg)) {
-		cmd->use_sg = count;
+	count = blk_rq_map_sg(req->q, req, sdb->sglist);
+	if (likely(count <= sdb->sg_count)) {
+		sdb->sg_count = count;
 		return BLKPREP_OK;
 	}
 
 	printk(KERN_ERR "Incorrect number of segments after building list\n");
-	printk(KERN_ERR "counted %d, received %d\n", count, cmd->use_sg);
+	printk(KERN_ERR "counted %d, received %d\n", count, sdb->sg_count);
 	printk(KERN_ERR "req nr_sec %lu, cur_nr_sec %u\n", req->nr_sectors,
 			req->current_nr_sectors);
 
@@ -1165,7 +1157,7 @@ static void scsi_blk_pc_done(struct scsi_cmnd *cmd)
 	 * successfully. Since this is a REQ_BLOCK_PC command the
 	 * caller should check the request's errors value
 	 */
-	scsi_io_completion(cmd, cmd->request_bufflen);
+	scsi_io_completion(cmd, scsi_bufflen(cmd));
 }
 
 static int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
@@ -1194,9 +1186,7 @@ static int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
 		BUG_ON(req->data_len);
 		BUG_ON(req->data);
 
-		cmd->request_bufflen = 0;
-		cmd->request_buffer = NULL;
-		cmd->use_sg = 0;
+		memset(&cmd->sdb, 0, sizeof(cmd->sdb));
 		req->buffer = NULL;
 	}
 
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index c7bfc60..89187dc 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -11,6 +11,14 @@ struct scatterlist;
 struct Scsi_Host;
 struct scsi_device;
 
+struct scsi_data_buff {
+	unsigned length;
+	int resid;
+	short sg_count;
+	short sgs_allocated;
+	short sg_pool;
+	struct scatterlist* sglist;
+};
 
 /* embedded in scsi_cmnd */
 struct scsi_pointer {
@@ -64,16 +72,10 @@ struct scsi_cmnd {
 	/* These elements define the operation we are about to perform */
 #define MAX_COMMAND_SIZE	16
 	unsigned char cmnd[MAX_COMMAND_SIZE];
-	unsigned request_bufflen;	/* Actual request size */
 
 	struct timer_list eh_timeout;	/* Used to time out the command. */
-	void *request_buffer;		/* Actual requested buffer */
 
 	/* These elements define the operation we ultimately want to perform */
-	unsigned short use_sg;	/* Number of pieces of scatter-gather */
-	unsigned short sg_pool;	/* pool index of allocated sg array */
-	unsigned short __use_sg;
-
 	unsigned underflow;	/* Return error if less than
 				   this amount is transferred */
 
@@ -83,10 +85,6 @@ struct scsi_cmnd {
 				   reconnects.   Probably == sector
 				   size */
 
-	int resid;		/* Number of bytes requested to be
-				   transferred less actual number
-				   transferred (0 if not supported) */
-
 	struct request *request;	/* The command we are
 				   	   working on */
 
@@ -118,6 +116,8 @@ struct scsi_cmnd {
 
 	unsigned char tag;	/* SCSI-II queued command tag */
 	unsigned long pid;	/* Process ID, starts at 0. Unique per host. */
+
+	struct scsi_data_buff sdb;
 };
 
 extern struct scsi_cmnd *scsi_get_command(struct scsi_device *, gfp_t);
@@ -133,35 +133,35 @@ extern void *scsi_kmap_atomic_sg(struct scatterlist *sg, int sg_count,
 				 size_t *offset, size_t *len);
 extern void scsi_kunmap_atomic_sg(void *virt);
 
-extern struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *, gfp_t);
-extern void scsi_free_sgtable(struct scsi_cmnd *);
+extern struct scatterlist *scsi_alloc_sgtable(struct scsi_data_buff *, gfp_t);
+extern void scsi_free_sgtable(struct scsi_data_buff *);
 
 extern int scsi_dma_map(struct scsi_cmnd *cmd);
 extern void scsi_dma_unmap(struct scsi_cmnd *cmd);
 
 static inline unsigned scsi_sg_count(struct scsi_cmnd *cmd)
 {
-	return cmd->use_sg;
+	return cmd->sdb.sg_count;
 }
 
 static inline struct scatterlist *scsi_sglist(struct scsi_cmnd *cmd)
 {
-	return ((struct scatterlist *)cmd->request_buffer);
+	return cmd->sdb.sglist;
 }
 
 static inline unsigned scsi_bufflen(struct scsi_cmnd *cmd)
 {
-	return cmd->request_bufflen;
+	return cmd->sdb.length;
 }
 
 static inline void scsi_set_resid(struct scsi_cmnd *cmd, int resid)
 {
-	cmd->resid = resid;
+	cmd->sdb.resid = resid;
 }
 
 static inline int scsi_get_resid(struct scsi_cmnd *cmd)
 {
-	return cmd->resid;
+	return cmd->sdb.resid;
 }
 
 #define scsi_for_each_sg(cmd, sg, nseg, __i)			\
-- 
1.5.2.2.249.g45fd


^ permalink raw reply related	[flat|nested] 27+ messages in thread

 	struct request *req = cmd->request;
 	int clear_errors = 1;
@@ -908,8 +900,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	int sense_valid = 0;
 	int sense_deferred = 0;
 
-	scsi_release_buffers(cmd);
-
 	if (result) {
 		sense_valid = scsi_command_normalize_sense(cmd, &sshdr);
 		if (sense_valid)
@@ -932,9 +922,11 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 				req->sense_len = len;
 			}
 		}
-		req->data_len = cmd->resid;
+		req->data_len = scsi_get_resid(cmd);
 	}
 
+	scsi_release_buffers(cmd);
+
 	/*
 	 * Next deal with any sectors which we were able to correctly
 	 * handle.
@@ -942,7 +934,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	SCSI_LOG_HLCOMPLETE(1, printk("%ld sectors total, "
 				      "%d bytes done.\n",
 				      req->nr_sectors, good_bytes));
-	SCSI_LOG_HLCOMPLETE(1, printk("use_sg is %d\n", cmd->use_sg));
 
 	if (clear_errors)
 		req->errors = 0;
@@ -1075,41 +1066,42 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
 {
 	struct request     *req = cmd->request;
 	int		   count;
+	struct scsi_data_buff* sdb = &cmd->sdb;
 
 	/*
 	 * We used to not use scatter-gather for single segment request,
 	 * but now we do (it makes highmem I/O easier to support without
 	 * kmapping pages)
 	 */
-	cmd->use_sg = req->nr_phys_segments;
+	sdb->sg_count = req->nr_phys_segments;
 
 	/*
 	 * If sg table allocation fails, requeue request later.
 	 */
-	cmd->request_buffer = scsi_alloc_sgtable(cmd, GFP_ATOMIC);
-	if (unlikely(!cmd->request_buffer)) {
+	sdb->sglist = scsi_alloc_sgtable(sdb, GFP_ATOMIC);
+	if (unlikely(!sdb->sglist)) {
 		scsi_unprep_request(req);
 		return BLKPREP_DEFER;
 	}
 
 	req->buffer = NULL;
 	if (blk_pc_request(req))
-		cmd->request_bufflen = req->data_len;
+		sdb->length = req->data_len;
 	else
-		cmd->request_bufflen = req->nr_sectors << 9;
+		sdb->length = req->nr_sectors << 9;
 
 	/* 
 	 * Next, walk the list, and fill in the addresses and sizes of
 	 * each segment.
 	 */
-	count = blk_rq_map_sg(req->q, req, cmd->request_buffer);
-	if (likely(count <= cmd->use_sg)) {
-		cmd->use_sg = count;
+	count = blk_rq_map_sg(req->q, req, sdb->sglist);
+	if (likely(count <= sdb->sg_count)) {
+		sdb->sg_count = count;
 		return BLKPREP_OK;
 	}
 
 	printk(KERN_ERR "Incorrect number of segments after building list\n");
-	printk(KERN_ERR "counted %d, received %d\n", count, cmd->use_sg);
+	printk(KERN_ERR "counted %d, received %d\n", count, sdb->sg_count);
 	printk(KERN_ERR "req nr_sec %lu, cur_nr_sec %u\n", req->nr_sectors,
 			req->current_nr_sectors);
 
@@ -1165,7 +1157,7 @@ static void scsi_blk_pc_done(struct scsi_cmnd *cmd)
 	 * successfully. Since this is a REQ_BLOCK_PC command the
 	 * caller should check the request's errors value
 	 */
-	scsi_io_completion(cmd, cmd->request_bufflen);
+	scsi_io_completion(cmd, scsi_bufflen(cmd));
 }
 
 static int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
@@ -1194,9 +1186,7 @@ static int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
 		BUG_ON(req->data_len);
 		BUG_ON(req->data);
 
-		cmd->request_bufflen = 0;
-		cmd->request_buffer = NULL;
-		cmd->use_sg = 0;
+		memset(&cmd->sdb, 0, sizeof(cmd->sdb));
 		req->buffer = NULL;
 	}
 
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index c7bfc60..89187dc 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -11,6 +11,14 @@ struct scatterlist;
 struct Scsi_Host;
 struct scsi_device;
 
+struct scsi_data_buff {
+	unsigned length;
+	int resid;
+	short sg_count;
+	short sgs_allocated;
+	short sg_pool;
+	struct scatterlist* sglist;
+};
 
 /* embedded in scsi_cmnd */
 struct scsi_pointer {
@@ -64,16 +72,10 @@ struct scsi_cmnd {
 	/* These elements define the operation we are about to perform */
 #define MAX_COMMAND_SIZE	16
 	unsigned char cmnd[MAX_COMMAND_SIZE];
-	unsigned request_bufflen;	/* Actual request size */
 
 	struct timer_list eh_timeout;	/* Used to time out the command. */
-	void *request_buffer;		/* Actual requested buffer */
 
 	/* These elements define the operation we ultimately want to perform */
-	unsigned short use_sg;	/* Number of pieces of scatter-gather */
-	unsigned short sg_pool;	/* pool index of allocated sg array */
-	unsigned short __use_sg;
-
 	unsigned underflow;	/* Return error if less than
 				   this amount is transferred */
 
@@ -83,10 +85,6 @@ struct scsi_cmnd {
 				   reconnects.   Probably == sector
 				   size */
 
-	int resid;		/* Number of bytes requested to be
-				   transferred less actual number
-				   transferred (0 if not supported) */
-
 	struct request *request;	/* The command we are
 				   	   working on */
 
@@ -118,6 +116,8 @@ struct scsi_cmnd {
 
 	unsigned char tag;	/* SCSI-II queued command tag */
 	unsigned long pid;	/* Process ID, starts at 0. Unique per host. */
+
+	struct scsi_data_buff sdb;
 };
 
 extern struct scsi_cmnd *scsi_get_command(struct scsi_device *, gfp_t);
@@ -133,35 +133,35 @@ extern void *scsi_kmap_atomic_sg(struct scatterlist *sg, int sg_count,
 				 size_t *offset, size_t *len);
 extern void scsi_kunmap_atomic_sg(void *virt);
 
-extern struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *, gfp_t);
-extern void scsi_free_sgtable(struct scsi_cmnd *);
+extern struct scatterlist *scsi_alloc_sgtable(struct scsi_data_buff *, gfp_t);
+extern void scsi_free_sgtable(struct scsi_data_buff *);
 
 extern int scsi_dma_map(struct scsi_cmnd *cmd);
 extern void scsi_dma_unmap(struct scsi_cmnd *cmd);
 
 static inline unsigned scsi_sg_count(struct scsi_cmnd *cmd)
 {
-	return cmd->use_sg;
+	return cmd->sdb.sg_count;
 }
 
 static inline struct scatterlist *scsi_sglist(struct scsi_cmnd *cmd)
 {
-	return ((struct scatterlist *)cmd->request_buffer);
+	return cmd->sdb.sglist;
 }
 
 static inline unsigned scsi_bufflen(struct scsi_cmnd *cmd)
 {
-	return cmd->request_bufflen;
+	return cmd->sdb.length;
 }
 
 static inline void scsi_set_resid(struct scsi_cmnd *cmd, int resid)
 {
-	cmd->resid = resid;
+	cmd->sdb.resid = resid;
 }
 
 static inline int scsi_get_resid(struct scsi_cmnd *cmd)
 {
-	return cmd->resid;
+	return cmd->sdb.resid;
 }
 
 #define scsi_for_each_sg(cmd, sg, nseg, __i)			\
-- 
1.5.2.2.249.g45fd


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
  2007-07-25 19:22                 ` Boaz Harrosh
@ 2007-07-26 11:33                   ` FUJITA Tomonori
  2007-07-31 20:12                   ` Boaz Harrosh
  1 sibling, 0 replies; 27+ messages in thread
From: FUJITA Tomonori @ 2007-07-26 11:33 UTC (permalink / raw)
  To: bharrosh
  Cc: fujita.tomonori, bhalevy, James.Bottomley, jens.axboe, linux-scsi

From: Boaz Harrosh <bharrosh@panasas.com>
Subject: Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
Date: Wed, 25 Jul 2007 22:22:20 +0300

> FUJITA Tomonori wrote:
> > From: Benny Halevy <bhalevy@panasas.com>
> > Subject: Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
> > Date: Wed, 25 Jul 2007 11:26:44 +0300
> > 
> >>> However, I'm perfectly happy to go with whatever the empirical evidence
> >>> says is best .. and hopefully, now we don't have to pick this once and
> >>> for all time ... we can alter it if whatever is chosen proves to be
> >>> suboptimal.
> >> I agree.  This isn't a catholic marriage :)
> >> We'll run some performance experiments comparing the sgtable chaining
> >> implementation vs. a scsi_data_buff implementation pointing
> >> at a possibly chained sglist and let's see if we can measure
> >> any difference.  We'll send results as soon as we have them.
> > 
> > I did some tests with your sgtable patchset and the approach to use
> > separate buffer for sglists. As expected, there was no performance
> > difference with small I/Os. I've not tried very large I/Os, which
> > might give some difference.
> > 
> > The patchset to use separate buffer for sglists is available:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-bidi.git simple-sgtable
> > 
> > 
> > Can you clean up your patchset and upload somewhere?
> 
> Sorry sure. Here is scsi_sgtable complete work over linux-block:
> http://www.bhalevy.com/open-osd/download/scsi_sgtable/linux-block
>  
> Here is just scsi_sgtable, no chaining, over scsi-misc + more
> drivers:
> http://www.bhalevy.com/open-osd/download/scsi_sgtable/scsi-misc
> 
> 
> Next week I will try to mount lots of scsi_debug devices and
> use large parallel IO to try and find a difference. I will
> test Jens's sglist-arch tree against above sglist-arch+scsi_sgtable
> 
> I have lots of reservations about Tomo's last patches. For me
> they are a regression. They use 3 allocations per command instead
> of 2. They use an extra pointer and an extra global slab_pool. And
> for what, for grouping some scsi_cmnd members in a substructure.
> If we want to go the "pointing" way, keeping our extra scatterlist
> and our base_2 count on most ARCHs, then we can just use the 
> scsi_data_buff embedded inside scsi_cmnd. 

Yeah, I changed and tested the patch to embed one sgtable header
structure in struct scsi_cmnd for uni-directional transfers.

Somehow, I uploaded the old patchset to my bidi tree yesterday.


> The second scsi_data_buff for bidi can come either from an extra 
> slab_pool like in Tomo's patch - bidi can pay - or in scsi.c at 
> scsi_setup_command_freelist() the code can inspect Tomo's flag 
> to the request_queue QUEUE_FLAG_BIDI and will than allocate a 
> bigger scsi_cmnd in the free list.
> 
> I have coded that approach and it is very simple:
> http://www.bhalevy.com/open-osd/download/scsi_data_buff
> 
> They are over Jens's sglist-arch branch
> I have revised all scsi-ml places and it all compiles
> But it is totally untested.

This is similar to the patch that I tested.


> I will add this branch to the above tests, but I suspect that
> they are identical in every way to current code.

With the large I/Os, there might be some difference.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
  2007-07-25 19:22                 ` Boaz Harrosh
  2007-07-26 11:33                   ` FUJITA Tomonori
@ 2007-07-31 20:12                   ` Boaz Harrosh
  2007-08-05 16:03                     ` FUJITA Tomonori
  2007-08-06  7:22                     ` FUJITA Tomonori
  1 sibling, 2 replies; 27+ messages in thread
From: Boaz Harrosh @ 2007-07-31 20:12 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: bhalevy, James.Bottomley, jens.axboe, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 4383 bytes --]

Boaz Harrosh wrote:
> FUJITA Tomonori wrote:
>> From: Benny Halevy <bhalevy@panasas.com>
>> Subject: Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
>> Date: Wed, 25 Jul 2007 11:26:44 +0300
>>
>>>> However, I'm perfectly happy to go with whatever the empirical evidence
>>>> says is best .. and hopefully, now we don't have to pick this once and
>>>> for all time ... we can alter it if whatever is chosen proves to be
>>>> suboptimal.
>>> I agree.  This isn't a catholic marriage :)
>>> We'll run some performance experiments comparing the sgtable chaining
>>> implementation vs. a scsi_data_buff implementation pointing
>>> at a possibly chained sglist and let's see if we can measure
>>> any difference.  We'll send results as soon as we have them.
>> I did some tests with your sgtable patchset and the approach to use
>> separate buffer for sglists. As expected, there was no performance
>> difference with small I/Os. I've not tried very large I/Os, which
>> might give some difference.
>>
> 
> Next week I will try to mount lots of scsi_debug devices and
> use large parallel IO to try and find a difference. I will
> test Jens's sglist-arch tree against above sglist-arch+scsi_sgtable
> 


I was able to run some tests; here are my results.

The results:
PPT is Pages Per Transfer (sg_count).

The numbers are the accumulated time of 20 transfers of 32GB each,
averaged over 4 such runs (lower time is better).
Transfers are sg_dd into scsi_debug.

Kernel         | total time 128-PPT | total time 2048-PPT
---------------|--------------------|---------------------
sglist-arch    |      47.26         | Test Failed
scsi_data_buff |      41.68         | 35.05
scsi_sgtable   |      42.42         | 36.45


The test:
1. scsi_debug
  I loaded the scsi_debug module, which was converted and fixed for
  chaining, with the following options:
  $ modprobe scsi_debug virtual_gb=32 delay=0 dev_size_mb=32 fake_rw=1

  This gives a 32 GB virtual drive on 32M of memory with zero delay;
  with fake_rw=1, reads and writes do nothing.
  After that I just enabled chained IO on the device.

  So what I'm actually testing is only sg + scsi-ml request
  queuing and sglist allocation/deallocation, which is exactly
  what I want to test.

2. sg_dd
  In the test script (see prof_test_scsi_debug attached)
  I use sg_dd in direct IO mode to send a direct SCSI command
  to the above device.
  I did 2 tests, in both I transfer 32GB of data.
  1st test with 128 (4K) pages IO size.
  2nd test with 2048 pages IO size.
  The second test will successfully run only if chaining is enabled
  and working. Otherwise it will fail.

The tested Kernels:

1. Jens's sglist-arch
  I was not able to pass all tests with this kernel. For some reason,
  when commands bigger than 256 pages are queued the machine runs out
  of memory and the test is killed. After the test is killed the system
  is left with 10M of memory and can hardly reboot.
  I have done some prints at the queuecommand entry in scsi_debug.c
  and I can see that I receive the expected large sg_count and bufflen,
  but unlike in the other tests I get a different pointer from scsi_sglist().
  In the other tests, since nothing else is happening on this machine
  during the test, the sglist pointer is always the same: commands come
  in, allocate memory, do nothing in scsi_debug, get freed, and return.
  I suspect an sglist leak or allocation bug.

2. scsi_data_buff
  This tree is what I posted last. It is basically: 
  0. sglist-arch
  1. revert of scsi-ml support for chaining.
  2. sg-pools cleanup [PATCH AB1]
  3. scsi-ml sglist-arch [PATCH B1]
  4. scsi_data_buff patch. scsi_lib.c (Last patch sent)
  5. scsi_data_buff patch for sr.c sd.c & scsi_error.c
  6. Plus converted libata, ide-scsi, so Kernel can compile.
  7. convert of scsi_debug.c and fix for chaining.
  ( see http://www.bhalevy.com/open-osd/download/scsi_data_buff)

  All Tests run

3. scsi_sgtable 
  This tree is what I posted as patches that open this mailing thread.
  0. sglist-arch
  1. revert of scsi-ml support for chaining.
  2. sg-pools cleanup [PATCH AB1]
  3. sgtable [PATCH A2]
  4. chaining [PATCH A3]
  5. scsi_sgtable for sd, sr and scsi_error
  6. Converted libata and ide-scsi so the kernel can compile.
  7. Convert of scsi_debug.c and fix for chaining.
  ( see http://www.bhalevy.com/open-osd/download/scsi_sgtable/linux-block/)

  All Tests run


[-- Attachment #2: install_sdebug_chaining --]
[-- Type: text/plain, Size: 473 bytes --]

#!/bin/sh
sdx=sdb
#load the device with these params
modprobe scsi_debug virtual_gb=32 delay=0 dev_size_mb=32 fake_rw=1

# go set some live params
# $ cd /sys/bus/pseudo/drivers/scsi_debug
# $ echo 1 > fake_rw

# mess with sglist chaining
cd /sys/block/$sdx/queue
echo 4096 > max_segments
cat max_hw_sectors_kb  > max_sectors_kb
echo "max_hw_sectors_kb="$(cat max_hw_sectors_kb)
echo "max_sectors_kb="$(cat max_sectors_kb)
echo "max_segments="$(cat max_segments)

[-- Attachment #3: prof_test_scsi_debug --]
[-- Type: text/plain, Size: 1265 bytes --]

#!/bin/sh

#load the device with these params
#$ modprobe scsi_debug virtual_gb=32 delay=0 dev_size_mb=32 fake_rw=1

# go set some live params
# $ cd /sys/bus/pseudo/drivers/scsi_debug
# $ echo 1 > fake_rw

# mess with sglist chaining
# $ cd /sys/block/sdb/queue
# $ echo 4096 > max_segments
# $ cat max_hw_sectors_kb  > max_sectors_kb
# $ cat max_hw_sectors_kb 


if=/dev/zero
of=/dev/sdb

outputfile=$1.txt
echo "Testing $1"

# send 32G in transfers of $1 pages each
do_dd()
{
# blocks of one sector
bs=512
#memory page in blocks
page=8
#number of scatterlist elements in a transfer
sgs=$1
#calculate the bpt param
bpt=$(($sgs*$page))
#total blocks to transfer 32 Giga bytes
count=64M


echo $3: "bpt=$bpt"

\time bash -c \
	"sg_dd blk_sgio=1 dio=1 if=$if of=$of bpt=$bpt bs=$bs count=$count 2>/dev/null" \
	2>> $2
}

echo "BEGIN RUN $1" >> $outputfile

# warm run
for i in {1..5}; do
do_dd 2048 /dev/null $i;
done

# 128-page transfers
echo "128 page transfers"
echo "128 page transfers" >> $outputfile
for i in {1..20}; do
do_dd 128 $outputfile $i;
done

# chained
# 16K / 8 = 2K pages
# 2K / 128 = 16 chained sglists
echo "16 chained sglists"
echo "16 chained sglists" >> $outputfile
for i in {1..20}; do
do_dd 2048 $outputfile $i;
done

echo "END RUN" >> $outputfile

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
  2007-07-31 20:12                   ` Boaz Harrosh
@ 2007-08-05 16:03                     ` FUJITA Tomonori
  2007-08-06  7:22                     ` FUJITA Tomonori
  1 sibling, 0 replies; 27+ messages in thread
From: FUJITA Tomonori @ 2007-08-05 16:03 UTC (permalink / raw)
  To: bharrosh
  Cc: fujita.tomonori, bhalevy, James.Bottomley, jens.axboe, linux-scsi

From: Boaz Harrosh <bharrosh@panasas.com>
Subject: Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
Date: Tue, 31 Jul 2007 23:12:26 +0300

> Boaz Harrosh wrote:
> > FUJITA Tomonori wrote:
> >> From: Benny Halevy <bhalevy@panasas.com>
> >> Subject: Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
> >> Date: Wed, 25 Jul 2007 11:26:44 +0300
> >>
> >>>> However, I'm perfectly happy to go with whatever the empirical evidence
> >>>> says is best .. and hopefully, now we don't have to pick this once and
> >>>> for all time ... we can alter it if whatever is chosen proves to be
> >>>> suboptimal.
> >>> I agree.  This isn't a catholic marriage :)
> >>> We'll run some performance experiments comparing the sgtable chaining
> >>> implementation vs. a scsi_data_buff implementation pointing
> >>> at a possibly chained sglist and let's see if we can measure
> >>> any difference.  We'll send results as soon as we have them.
> >> I did some tests with your sgtable patchset and the approach to use
> >> separate buffer for sglists. As expected, there was no performance
> >> difference with small I/Os. I've not tried very large I/Os, which
> >> might give some difference.
> >>
> > 
> > Next week I will try to mount lots of scsi_debug devices and
> > use large parallel IO to try and find a difference. I will
> > test Jens's sglist-arch tree against above sglist-arch+scsi_sgtable
> > 
> 
> 
> I was able to run some tests here are my results.
> 
> The results:
> PPT - is Pages Per Transfer (sg_count)
> 
> The numbers are accumulated time of 20 transfers of 32GB each,
> and the average of 4 such runs. (Lower time is better)
> Transfers are sg_dd into scsi_debug
> 
> Kernel         | total time 128-PPT | total time 2048-PPT
> ---------------|--------------------|---------------------
> sglist-arch    |      47.26         | Test Failed
> scsi_data_buff |      41.68         | 35.05
> scsi_sgtable   |      42.42         | 36.45
> 
> 
> The test:
> 1. scsi_debug
>   I mounted the scsi_debug module which was converted and fixed for 
>   chaining with the following options:
>   $ modprobe scsi_debug virtual_gb=32 delay=0 dev_size_mb=32 fake_rw=1
>   
>   32 GB of virtual drive on 32M of memory with 0 delay
>   and read/write do nothing with the fake_rw=1.
>   After that I just enabled chained IO on the device
> 
>   So what I'm actually testing is only sg + scsi-ml request
>   queuing and sglist allocation/deallocation. Which is what I want
>   to test.
> 
> 2. sg_dd
>   In the test script (see prof_test_scsi_debug attached)
>   I use sg_dd in direct io mode to send a direct scsi-command
>   to above device.

Your script is doing:

sg_dd blk_sgio=1 dio=1 if=/dev/zero of=/dev/sdb ...

dio works for SG_IO? Only sg supports it, I think.

And sd_read is more appropriate than sg_dd for performance tests.

I'm not sure about the results. How can sglist-arch be slower than
scsi_data_buff and scsi_sgtable?


>   I did 2 tests, in both I transfer 32GB of data.
>   1st test with 128 (4K) pages IO size.
>   2nd test with 2048 pages IO size.
>   The second test will successfully run only if chaining is enabled
>   and working. Otherwise it will fail.
> 
> The tested Kernels:
> 
> 1. Jens's sglist-arch
>   I was not able to pass all tests with this Kernel. For some reason when
>   bigger than 256 pages commands are queued the Machine will run out
>   of memory and will kill the test. After the test is killed the system
>   is left with 10M of memory and can hardly reboot.

sglist-arch works for me. I hit a memory leak due to an sg bug (I used
sg instead of SG_IO), though that bug happens even with scsi_data_buff
and sgtable.


> 2. scsi_data_buff
>   This tree is what I posted last. It is basically: 
>   0. sglist-arch
>   1. revert of scsi-ml support for chaining.
>   2. sg-pools cleanup [PATCH AB1]

I don't think this is the proper way. As I said, we can implement
the scsi_data_buffer stuff on top of sglist cleanly. There is no need
to revert anything in sglist.

I don't think the sg-pools cleanup is necessary. Thanks to sglist, we
can live without the sg segment size hack.

My scsi data buffer patch is at:

git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-bidi.git sdbuffer

The differences are: I use a 'scsi_data_buffer' structure instead of
scsi_data_buff, and it sits cleanly on top of sglist.


I also put your scsi_data_buff and sgtable patchset:

git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-bidi.git boaz-sdbuff
git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-bidi.git boaz-sgtable

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
  2007-07-31 20:12                   ` Boaz Harrosh
  2007-08-05 16:03                     ` FUJITA Tomonori
@ 2007-08-06  7:22                     ` FUJITA Tomonori
  2007-08-07  6:55                       ` Jens Axboe
  1 sibling, 1 reply; 27+ messages in thread
From: FUJITA Tomonori @ 2007-08-06  7:22 UTC (permalink / raw)
  To: jens.axboe, bharrosh
  Cc: fujita.tomonori, bhalevy, James.Bottomley, linux-scsi

On Tue, 31 Jul 2007 23:12:26 +0300
Boaz Harrosh <bharrosh@panasas.com> wrote:

> The tested Kernels:
> 
> 1. Jens's sglist-arch
>   I was not able to pass all tests with this Kernel. For some reason when
>   bigger than 256 pages commands are queued the Machine will run out
>   of memory and will kill the test. After the test is killed the system
>   is left with 10M of memory and can hardly reboot.
>   I have done some prints at the queuecommand entry in scsi_debug.c
>   and I can see that I receive the expected large sg_count and bufflen
>   but unlike other tests I get a different pointer at scsi_sglist().
>   In other tests since nothing is happening at this machine while in
>   the test, the sglist pointer is always the same. commands comes in,
>   allocates memory, do nothing in scsi_debug, freed, and returns. 
>   I suspect sglist leak or allocation bug.

Ok, I found the leak.


>From 011c05c2e514d1db4834147ed83526473711b0a3 Mon Sep 17 00:00:00 2001
From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date: Mon, 6 Aug 2007 16:16:24 +0900
Subject: [PATCH] fix sg chaining leak

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
 drivers/scsi/scsi_lib.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 5884b1b..25988b9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -48,7 +48,6 @@ static struct scsi_host_sg_pool scsi_sg_pools[] = {
 	SP(32),
 	SP(64),
 	SP(128),
-	SP(256),
 };
 #undef SP
 
-- 
1.5.2.4



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large  IO sg-chaining
  2007-08-06  7:22                     ` FUJITA Tomonori
@ 2007-08-07  6:55                       ` Jens Axboe
  2007-08-07  8:36                         ` FUJITA Tomonori
  0 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2007-08-07  6:55 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: bharrosh, bhalevy, James.Bottomley, linux-scsi

On Mon, Aug 06 2007, FUJITA Tomonori wrote:
> On Tue, 31 Jul 2007 23:12:26 +0300
> Boaz Harrosh <bharrosh@panasas.com> wrote:
> 
> > The tested Kernels:
> > 
> > 1. Jens's sglist-arch
> >   I was not able to pass all tests with this Kernel. For some reason when
> >   bigger than 256 pages commands are queued the Machine will run out
> >   of memory and will kill the test. After the test is killed the system
> >   is left with 10M of memory and can hardly reboot.
> >   I have done some prints at the queuecommand entry in scsi_debug.c
> >   and I can see that I receive the expected large sg_count and bufflen
> >   but unlike other tests I get a different pointer at scsi_sglist().
> >   In other tests since nothing is happening at this machine while in
> >   the test, the sglist pointer is always the same. commands comes in,
> >   allocates memory, do nothing in scsi_debug, freed, and returns. 
> >   I suspect sglist leak or allocation bug.
> 
> Ok, I found the leak.
> 
> 
> From 011c05c2e514d1db4834147ed83526473711b0a3 Mon Sep 17 00:00:00 2001
> From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> Date: Mon, 6 Aug 2007 16:16:24 +0900
> Subject: [PATCH] fix sg chaining leak
> 
> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> ---
>  drivers/scsi/scsi_lib.c |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 5884b1b..25988b9 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -48,7 +48,6 @@ static struct scsi_host_sg_pool scsi_sg_pools[] = {
>  	SP(32),
>  	SP(64),
>  	SP(128),
> -	SP(256),
>  };
>  #undef SP

Thanks Tomo! Trying to catch up with mails, will apply this one right
away.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
  2007-08-07  6:55                       ` Jens Axboe
@ 2007-08-07  8:36                         ` FUJITA Tomonori
  2007-08-08  7:16                           ` Jens Axboe
  0 siblings, 1 reply; 27+ messages in thread
From: FUJITA Tomonori @ 2007-08-07  8:36 UTC (permalink / raw)
  To: jens.axboe
  Cc: fujita.tomonori, bharrosh, bhalevy, James.Bottomley, linux-scsi

On Tue, 7 Aug 2007 08:55:49 +0200
Jens Axboe <jens.axboe@oracle.com> wrote:

> On Mon, Aug 06 2007, FUJITA Tomonori wrote:
> > On Tue, 31 Jul 2007 23:12:26 +0300
> > Boaz Harrosh <bharrosh@panasas.com> wrote:
> > 
> > > The tested Kernels:
> > > 
> > > 1. Jens's sglist-arch
> > >   I was not able to pass all tests with this Kernel. For some reason when
> > >   bigger than 256 pages commands are queued the Machine will run out
> > >   of memory and will kill the test. After the test is killed the system
> > >   is left with 10M of memory and can hardly reboot.
> > >   I have done some prints at the queuecommand entry in scsi_debug.c
> > >   and I can see that I receive the expected large sg_count and bufflen
> > >   but unlike other tests I get a different pointer at scsi_sglist().
> > >   In other tests since nothing is happening at this machine while in
> > >   the test, the sglist pointer is always the same. commands comes in,
> > >   allocates memory, do nothing in scsi_debug, freed, and returns. 
> > >   I suspect sglist leak or allocation bug.
> > 
> > Ok, I found the leak.
> > 
> > 
> > From 011c05c2e514d1db4834147ed83526473711b0a3 Mon Sep 17 00:00:00 2001
> > From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> > Date: Mon, 6 Aug 2007 16:16:24 +0900
> > Subject: [PATCH] fix sg chaining leak
> > 
> > Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> > ---
> >  drivers/scsi/scsi_lib.c |    1 -
> >  1 files changed, 0 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 5884b1b..25988b9 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -48,7 +48,6 @@ static struct scsi_host_sg_pool scsi_sg_pools[] = {
> >  	SP(32),
> >  	SP(64),
> >  	SP(128),
> > -	SP(256),
> >  };
> >  #undef SP
> 
> Thanks Tomo! Trying to catch up with mails, will apply this one right
> away.

You can add the following patch to your sglist branches:


>From abd73c05d5f08ee307776150e1deecac7a709b60 Mon Sep 17 00:00:00 2001
From: FUJITA Tomonori <tomof@acm.org>
Date: Mon, 30 Jul 2007 23:01:32 +0900
Subject: [PATCH] zfcp: sg chaining support

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
 drivers/s390/scsi/zfcp_def.h  |    1 +
 drivers/s390/scsi/zfcp_qdio.c |    6 ++----
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_def.h b/drivers/s390/scsi/zfcp_def.h
index b36dfc4..0d80150 100644
--- a/drivers/s390/scsi/zfcp_def.h
+++ b/drivers/s390/scsi/zfcp_def.h
@@ -34,6 +34,7 @@
 #include <linux/slab.h>
 #include <linux/mempool.h>
 #include <linux/syscalls.h>
+#include <linux/scatterlist.h>
 #include <linux/ioctl.h>
 #include <scsi/scsi.h>
 #include <scsi/scsi_tcq.h>
diff --git a/drivers/s390/scsi/zfcp_qdio.c b/drivers/s390/scsi/zfcp_qdio.c
index 81daa82..60bc269 100644
--- a/drivers/s390/scsi/zfcp_qdio.c
+++ b/drivers/s390/scsi/zfcp_qdio.c
@@ -591,7 +591,7 @@ zfcp_qdio_sbals_from_segment(struct zfcp_fsf_req *fsf_req, unsigned long sbtype,
  */
 int
 zfcp_qdio_sbals_from_sg(struct zfcp_fsf_req *fsf_req, unsigned long sbtype,
-                        struct scatterlist *sg,	int sg_count, int max_sbals)
+                        struct scatterlist *sgl, int sg_count, int max_sbals)
 {
 	int sg_index;
 	struct scatterlist *sg_segment;
@@ -607,9 +607,7 @@ zfcp_qdio_sbals_from_sg(struct zfcp_fsf_req *fsf_req, unsigned long sbtype,
 	sbale->flags |= sbtype;
 
 	/* process all segements of scatter-gather list */
-	for (sg_index = 0, sg_segment = sg, bytes = 0;
-	     sg_index < sg_count;
-	     sg_index++, sg_segment++) {
+	for_each_sg(sgl, sg_segment, sg_count, sg_index) {
 		retval = zfcp_qdio_sbals_from_segment(
 				fsf_req,
 				sbtype,
-- 
1.5.2.4


* Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
  2007-08-07  8:36                         ` FUJITA Tomonori
@ 2007-08-08  7:16                           ` Jens Axboe
  0 siblings, 0 replies; 27+ messages in thread
From: Jens Axboe @ 2007-08-08  7:16 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: bharrosh, bhalevy, James.Bottomley, linux-scsi

On Tue, Aug 07 2007, FUJITA Tomonori wrote:
> On Tue, 7 Aug 2007 08:55:49 +0200
> Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > On Mon, Aug 06 2007, FUJITA Tomonori wrote:
> > > On Tue, 31 Jul 2007 23:12:26 +0300
> > > Boaz Harrosh <bharrosh@panasas.com> wrote:
> > > 
> > > > The tested Kernels:
> > > > 
> > > > 1. Jens's sglist-arch
> > > >   I was not able to pass all tests with this kernel. For some reason,
> > > >   when commands bigger than 256 pages are queued, the machine runs out
> > > >   of memory and the test gets killed. After the test is killed the
> > > >   system is left with 10M of memory and can hardly reboot.
> > > >   I have added some prints at the queuecommand entry in scsi_debug.c
> > > >   and I can see that I receive the expected large sg_count and bufflen,
> > > >   but unlike in other tests I get a different pointer from scsi_sglist().
> > > >   In the other tests, since nothing else happens on this machine during
> > > >   the run, the sglist pointer is always the same: commands come in,
> > > >   allocate memory, do nothing in scsi_debug, are freed, and return.
> > > >   I suspect an sglist leak or an allocation bug.
> > > 
> > > Ok, I found the leak.
> > > 
> > > 
> > > From 011c05c2e514d1db4834147ed83526473711b0a3 Mon Sep 17 00:00:00 2001
> > > From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> > > Date: Mon, 6 Aug 2007 16:16:24 +0900
> > > Subject: [PATCH] fix sg chaining leak
> > > 
> > > Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> > > ---
> > >  drivers/scsi/scsi_lib.c |    1 -
> > >  1 files changed, 0 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > index 5884b1b..25988b9 100644
> > > --- a/drivers/scsi/scsi_lib.c
> > > +++ b/drivers/scsi/scsi_lib.c
> > > @@ -48,7 +48,6 @@ static struct scsi_host_sg_pool scsi_sg_pools[] = {
> > >  	SP(32),
> > >  	SP(64),
> > >  	SP(128),
> > > -	SP(256),
> > >  };
> > >  #undef SP
> > 
> > Thanks Tomo! Trying to catch up with mails, will apply this one right
> > away.
> 
> You can add the following patch to your sglist branches:
> 
> 
> From abd73c05d5f08ee307776150e1deecac7a709b60 Mon Sep 17 00:00:00 2001
> From: FUJITA Tomonori <tomof@acm.org>
> Date: Mon, 30 Jul 2007 23:01:32 +0900
> Subject: [PATCH] zfcp: sg chaining support

Thanks, applied!

-- 
Jens Axboe



end of thread

Thread overview: 27+ messages
2007-07-24  8:47 [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining Boaz Harrosh
2007-07-24  8:52 ` [PATCH AB1/5] SCSI: SG pools allocation cleanup Boaz Harrosh
2007-07-24 13:08   ` Boaz Harrosh
2007-07-25  8:08   ` Boaz Harrosh
2007-07-25  9:05     ` [PATCH AB1/5 ver2] " Boaz Harrosh
2007-07-25  9:06     ` [PATCH A2/5 ver2] SCSI: scsi_sgtable implementation Boaz Harrosh
2007-07-24  8:56 ` [PATCH A2/5] " Boaz Harrosh
2007-07-24  8:59 ` [PATCH A3/5] SCSI: sg-chaining over scsi_sgtable Boaz Harrosh
2007-07-24  9:01 ` [PATCH B2/5] SCSI: support for allocating large scatterlists Boaz Harrosh
2007-07-24  9:03 ` [PATCH B3/5] SCSI: scsi_sgtable over sg-chainning Boaz Harrosh
2007-07-24  9:16 ` [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining FUJITA Tomonori
2007-07-24 10:01   ` Boaz Harrosh
2007-07-24 11:12     ` FUJITA Tomonori
2007-07-24 13:41       ` FUJITA Tomonori
2007-07-24 14:01         ` Benny Halevy
2007-07-24 16:10           ` James Bottomley
2007-07-25  8:26             ` Benny Halevy
2007-07-25  8:42               ` FUJITA Tomonori
2007-07-25 19:22                 ` Boaz Harrosh
2007-07-26 11:33                   ` FUJITA Tomonori
2007-07-31 20:12                   ` Boaz Harrosh
2007-08-05 16:03                     ` FUJITA Tomonori
2007-08-06  7:22                     ` FUJITA Tomonori
2007-08-07  6:55                       ` Jens Axboe
2007-08-07  8:36                         ` FUJITA Tomonori
2007-08-08  7:16                           ` Jens Axboe
2007-07-25 19:50                 ` Boaz Harrosh
