From: Boaz Harrosh <bharrosh@panasas.com>
To: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: bhalevy@panasas.com, James.Bottomley@SteelEye.com,
jens.axboe@oracle.com, linux-scsi@vger.kernel.org
Subject: Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
Date: Wed, 25 Jul 2007 22:22:20 +0300 [thread overview]
Message-ID: <46A7A2EC.6040400@panasas.com> (raw)
In-Reply-To: <20070725174258T.fujita.tomonori@lab.ntt.co.jp>
FUJITA Tomonori wrote:
> From: Benny Halevy <bhalevy@panasas.com>
> Subject: Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
> Date: Wed, 25 Jul 2007 11:26:44 +0300
>
>>> However, I'm perfectly happy to go with whatever the empirical evidence
>>> says is best .. and hopefully, now we don't have to pick this once and
>>> for all time ... we can alter it if whatever is chosen proves to be
>>> suboptimal.
>> I agree. This isn't a catholic marriage :)
>> We'll run some performance experiments comparing the sgtable chaining
>> implementation vs. a scsi_data_buff implementation pointing
>> at a possibly chained sglist and let's see if we can measure
>> any difference. We'll send results as soon as we have them.
>
> I did some tests with your sgtable patchset and the approach to use
> separate buffer for sglists. As expected, there was no performance
> difference with small I/Os. I've not tried very large I/Os, which
> might give some difference.
>
> The patchset to use separate buffer for sglists is available:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-bidi.git simple-sgtable
>
>
> Can you clean up your patchset and upload somewhere?
Sorry sure. Here is scsi_sgtable complete work over linux-block:
http://www.bhalevy.com/open-osd/download/scsi_sgtable/linux-block
Here is just scsi_sgtable, no chaining, over scsi-misc + more
drivers:
http://www.bhalevy.com/open-osd/download/scsi_sgtable/scsi-misc
Next week I will try to mount lots of scsi_debug devices and
use large parallel IO to try and find a difference. I will
test Jens's sglist-arch tree against above sglist-arch+scsi_sgtable
I have lots of reservations about Tomo's last patches. For me
they are a regression. They use 3 allocations per command instead
of 2. They use an extra pointer and an extra global slab_pool. And
for what, for grouping some scsi_cmnd members in a substructure.
If we want to go the "pointing" way, keeping our extra scatterlist
and our base_2 count on most ARCHs. Than we can just use the
scsi_data_buff embedded inside scsi_cmnd.
The second scsi_data_buff for bidi can come either from an extra
slab_pool like in Tomo's patch - bidi can pay - or in scsi.c at
scsi_setup_command_freelist() the code can inspect Tomo's flag
to the request_queue QUEUE_FLAG_BIDI and will than allocate a
bigger scsi_cmnd in the free list.
I have coded that approach and it is very simple:
http://www.bhalevy.com/open-osd/download/scsi_data_buff
They are over Jens's sglist-arch branch
I have revised all scsi-ml places and it all compiles
But is totally untested.
I will add this branch to the above tests, but I suspect that
they are identical in every way to current code.
For review here is the main scsi_data_buff patch:
------
From: Boaz Harrosh <bharrosh@panasas.com>
Date: Wed, 25 Jul 2007 20:19:14 +0300
Subject: [PATCH] SCSI: scsi_data_buff
In preparation for bidi we abstract all IO members of scsi_cmnd
that will need to duplicate into a substructure.
- Group all IO members of scsi_cmnd into a scsi_data_buff
structure.
- Adjust accessors to new members.
- scsi_{alloc,free}_sgtable receive a scsi_data_buff instead of
scsi_cmnd. And work on it. (Supporting chaining like before)
- Adjust scsi_init_io() and scsi_release_buffers() for above
change.
- Fix other parts of scsi_lib to members migration. Use accessors
where appropriate.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
drivers/scsi/scsi_lib.c | 68 +++++++++++++++++++--------------------------
include/scsi/scsi_cmnd.h | 34 +++++++++++-----------
2 files changed, 46 insertions(+), 56 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index d62b184..2b8a865 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -714,16 +714,14 @@ static unsigned scsi_sgtable_index(unsigned nents)
return -1;
}
-struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
+struct scatterlist *scsi_alloc_sgtable(struct scsi_data_buff *sdb, gfp_t gfp_mask)
{
struct scsi_host_sg_pool *sgp;
struct scatterlist *sgl, *prev, *ret;
unsigned int index;
int this, left;
- BUG_ON(!cmd->use_sg);
-
- left = cmd->use_sg;
+ left = sdb->sg_count;
ret = prev = NULL;
do {
this = left;
@@ -747,7 +745,7 @@ struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
* first loop through, set initial index and return value
*/
if (!ret) {
- cmd->sg_pool = index;
+ sdb->sg_pool = index;
ret = sgl;
}
@@ -769,10 +767,10 @@ struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
} while (left);
/*
- * ->use_sg may get modified after dma mapping has potentially
+ * sdb->sg_count may get modified after blk_rq_map_sg() potentially
* shrunk the number of segments, so keep a copy of it for free.
*/
- cmd->__use_sg = cmd->use_sg;
+ sdb->sgs_allocated = sdb->sg_count;
return ret;
enomem:
if (ret) {
@@ -797,21 +795,21 @@ enomem:
EXPORT_SYMBOL(scsi_alloc_sgtable);
-void scsi_free_sgtable(struct scsi_cmnd *cmd)
+void scsi_free_sgtable(struct scsi_data_buff *sdb)
{
- struct scatterlist *sgl = cmd->request_buffer;
+ struct scatterlist *sgl = sdb->sglist;
struct scsi_host_sg_pool *sgp;
/*
* if this is the biggest size sglist, check if we have
* chained parts we need to free
*/
- if (cmd->__use_sg > SCSI_MAX_SG_SEGMENTS) {
+ if (sdb->sgs_allocated > SCSI_MAX_SG_SEGMENTS) {
unsigned short this, left;
struct scatterlist *next;
unsigned int index;
- left = cmd->__use_sg - (SCSI_MAX_SG_SEGMENTS - 1);
+ left = sdb->sgs_allocated - (SCSI_MAX_SG_SEGMENTS - 1);
next = sg_chain_ptr(&sgl[SCSI_MAX_SG_SEGMENTS - 1]);
while (left && next) {
sgl = next;
@@ -833,7 +831,7 @@ void scsi_free_sgtable(struct scsi_cmnd *cmd)
}
}
- mempool_free(cmd->request_buffer, scsi_sg_pools[cmd->sg_pool].pool);
+ mempool_free(sdb->sglist, scsi_sg_pools[sdb->sg_pool].pool);
}
EXPORT_SYMBOL(scsi_free_sgtable);
@@ -857,16 +855,10 @@ EXPORT_SYMBOL(scsi_free_sgtable);
*/
static void scsi_release_buffers(struct scsi_cmnd *cmd)
{
- if (cmd->use_sg)
- scsi_free_sgtable(cmd);
+ if (cmd->sdb.sglist)
+ scsi_free_sgtable(&cmd->sdb);
- /*
- * Zero these out. They now point to freed memory, and it is
- * dangerous to hang onto the pointers.
- */
- cmd->request_buffer = NULL;
- cmd->request_bufflen = 0;
- cmd->use_sg = 0;
+ memset(&cmd->sdb, 0, sizeof(cmd->sdb));
}
/*
@@ -900,7 +892,7 @@ static void scsi_release_buffers(struct scsi_cmnd *cmd)
void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
{
int result = cmd->result;
- int this_count = cmd->request_bufflen;
+ int this_count = scsi_bufflen(cmd);
request_queue_t *q = cmd->device->request_queue;
struct request *req = cmd->request;
int clear_errors = 1;
@@ -908,8 +900,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
int sense_valid = 0;
int sense_deferred = 0;
- scsi_release_buffers(cmd);
-
if (result) {
sense_valid = scsi_command_normalize_sense(cmd, &sshdr);
if (sense_valid)
@@ -932,9 +922,11 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
req->sense_len = len;
}
}
- req->data_len = cmd->resid;
+ req->data_len = scsi_get_resid(cmd);
}
+ scsi_release_buffers(cmd);
+
/*
* Next deal with any sectors which we were able to correctly
* handle.
@@ -942,7 +934,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
SCSI_LOG_HLCOMPLETE(1, printk("%ld sectors total, "
"%d bytes done.\n",
req->nr_sectors, good_bytes));
- SCSI_LOG_HLCOMPLETE(1, printk("use_sg is %d\n", cmd->use_sg));
if (clear_errors)
req->errors = 0;
@@ -1075,41 +1066,42 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
{
struct request *req = cmd->request;
int count;
+ struct scsi_data_buff* sdb = &cmd->sdb;
/*
* We used to not use scatter-gather for single segment request,
* but now we do (it makes highmem I/O easier to support without
* kmapping pages)
*/
- cmd->use_sg = req->nr_phys_segments;
+ sdb->sg_count = req->nr_phys_segments;
/*
* If sg table allocation fails, requeue request later.
*/
- cmd->request_buffer = scsi_alloc_sgtable(cmd, GFP_ATOMIC);
- if (unlikely(!cmd->request_buffer)) {
+ sdb->sglist = scsi_alloc_sgtable(sdb, GFP_ATOMIC);
+ if (unlikely(!sdb->sglist)) {
scsi_unprep_request(req);
return BLKPREP_DEFER;
}
req->buffer = NULL;
if (blk_pc_request(req))
- cmd->request_bufflen = req->data_len;
+ sdb->length = req->data_len;
else
- cmd->request_bufflen = req->nr_sectors << 9;
+ sdb->length = req->nr_sectors << 9;
/*
* Next, walk the list, and fill in the addresses and sizes of
* each segment.
*/
- count = blk_rq_map_sg(req->q, req, cmd->request_buffer);
- if (likely(count <= cmd->use_sg)) {
- cmd->use_sg = count;
+ count = blk_rq_map_sg(req->q, req, sdb->sglist);
+ if (likely(count <= sdb->sg_count)) {
+ sdb->sg_count = count;
return BLKPREP_OK;
}
printk(KERN_ERR "Incorrect number of segments after building list\n");
- printk(KERN_ERR "counted %d, received %d\n", count, cmd->use_sg);
+ printk(KERN_ERR "counted %d, received %d\n", count, sdb->sg_count);
printk(KERN_ERR "req nr_sec %lu, cur_nr_sec %u\n", req->nr_sectors,
req->current_nr_sectors);
@@ -1165,7 +1157,7 @@ static void scsi_blk_pc_done(struct scsi_cmnd *cmd)
* successfully. Since this is a REQ_BLOCK_PC command the
* caller should check the request's errors value
*/
- scsi_io_completion(cmd, cmd->request_bufflen);
+ scsi_io_completion(cmd, scsi_bufflen(cmd));
}
static int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
@@ -1194,9 +1186,7 @@ static int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
BUG_ON(req->data_len);
BUG_ON(req->data);
- cmd->request_bufflen = 0;
- cmd->request_buffer = NULL;
- cmd->use_sg = 0;
+ memset(&cmd->sdb, 0, sizeof(cmd->sdb));
req->buffer = NULL;
}
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index c7bfc60..89187dc 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -11,6 +11,14 @@ struct scatterlist;
struct Scsi_Host;
struct scsi_device;
+struct scsi_data_buff {
+ unsigned length;
+ int resid;
+ short sg_count;
+ short sgs_allocated;
+ short sg_pool;
+ struct scatterlist* sglist;
+};
/* embedded in scsi_cmnd */
struct scsi_pointer {
@@ -64,16 +72,10 @@ struct scsi_cmnd {
/* These elements define the operation we are about to perform */
#define MAX_COMMAND_SIZE 16
unsigned char cmnd[MAX_COMMAND_SIZE];
- unsigned request_bufflen; /* Actual request size */
struct timer_list eh_timeout; /* Used to time out the command. */
- void *request_buffer; /* Actual requested buffer */
/* These elements define the operation we ultimately want to perform */
- unsigned short use_sg; /* Number of pieces of scatter-gather */
- unsigned short sg_pool; /* pool index of allocated sg array */
- unsigned short __use_sg;
-
unsigned underflow; /* Return error if less than
this amount is transferred */
@@ -83,10 +85,6 @@ struct scsi_cmnd {
reconnects. Probably == sector
size */
- int resid; /* Number of bytes requested to be
- transferred less actual number
- transferred (0 if not supported) */
-
struct request *request; /* The command we are
working on */
@@ -118,6 +116,8 @@ struct scsi_cmnd {
unsigned char tag; /* SCSI-II queued command tag */
unsigned long pid; /* Process ID, starts at 0. Unique per host. */
+
+ struct scsi_data_buff sdb;
};
extern struct scsi_cmnd *scsi_get_command(struct scsi_device *, gfp_t);
@@ -133,35 +133,35 @@ extern void *scsi_kmap_atomic_sg(struct scatterlist *sg, int sg_count,
size_t *offset, size_t *len);
extern void scsi_kunmap_atomic_sg(void *virt);
-extern struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *, gfp_t);
-extern void scsi_free_sgtable(struct scsi_cmnd *);
+extern struct scatterlist *scsi_alloc_sgtable(struct scsi_data_buff *, gfp_t);
+extern void scsi_free_sgtable(struct scsi_data_buff *);
extern int scsi_dma_map(struct scsi_cmnd *cmd);
extern void scsi_dma_unmap(struct scsi_cmnd *cmd);
static inline unsigned scsi_sg_count(struct scsi_cmnd *cmd)
{
- return cmd->use_sg;
+ return cmd->sdb.sg_count;
}
static inline struct scatterlist *scsi_sglist(struct scsi_cmnd *cmd)
{
- return ((struct scatterlist *)cmd->request_buffer);
+ return cmd->sdb.sglist;
}
static inline unsigned scsi_bufflen(struct scsi_cmnd *cmd)
{
- return cmd->request_bufflen;
+ return cmd->sdb.length;
}
static inline void scsi_set_resid(struct scsi_cmnd *cmd, int resid)
{
- cmd->resid = resid;
+ cmd->sdb.resid = resid;
}
static inline int scsi_get_resid(struct scsi_cmnd *cmd)
{
- return cmd->resid;
+ return cmd->sdb.resid;
}
#define scsi_for_each_sg(cmd, sg, nseg, __i) \
--
1.5.2.2.249.g45fd
next prev parent reply other threads:[~2007-07-25 19:23 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-07-24 8:47 [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining Boaz Harrosh
2007-07-24 8:52 ` [PATCH AB1/5] SCSI: SG pools allocation cleanup Boaz Harrosh
2007-07-24 13:08 ` Boaz Harrosh
2007-07-25 8:08 ` Boaz Harrosh
2007-07-25 9:05 ` [PATCH AB1/5 ver2] " Boaz Harrosh
2007-07-25 9:06 ` [PATCH A2/5 ver2] SCSI: scsi_sgtable implementation Boaz Harrosh
2007-07-24 8:56 ` [PATCH A2/5] " Boaz Harrosh
2007-07-24 8:59 ` [PATCH A3/5] SCSI: sg-chaining over scsi_sgtable Boaz Harrosh
2007-07-24 9:01 ` [PATCH B2/5] SCSI: support for allocating large scatterlists Boaz Harrosh
2007-07-24 9:03 ` [PATCH B3/5] SCSI: scsi_sgtable over sg-chainning Boaz Harrosh
2007-07-24 9:16 ` [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining FUJITA Tomonori
2007-07-24 10:01 ` Boaz Harrosh
2007-07-24 11:12 ` FUJITA Tomonori
2007-07-24 13:41 ` FUJITA Tomonori
2007-07-24 14:01 ` Benny Halevy
2007-07-24 16:10 ` James Bottomley
2007-07-25 8:26 ` Benny Halevy
2007-07-25 8:42 ` FUJITA Tomonori
2007-07-25 19:22 ` Boaz Harrosh [this message]
2007-07-26 11:33 ` FUJITA Tomonori
2007-07-31 20:12 ` Boaz Harrosh
2007-08-05 16:03 ` FUJITA Tomonori
2007-08-06 7:22 ` FUJITA Tomonori
2007-08-07 6:55 ` Jens Axboe
2007-08-07 8:36 ` FUJITA Tomonori
2007-08-08 7:16 ` Jens Axboe
2007-07-25 19:50 ` Boaz Harrosh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=46A7A2EC.6040400@panasas.com \
--to=bharrosh@panasas.com \
--cc=James.Bottomley@SteelEye.com \
--cc=bhalevy@panasas.com \
--cc=fujita.tomonori@lab.ntt.co.jp \
--cc=jens.axboe@oracle.com \
--cc=linux-scsi@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).