public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
From: Mike Christie <michaelc@cs.wisc.edu>
To: Jens Axboe <jens.axboe@oracle.com>
Cc: Grant Grundler <grundler@google.com>,
	FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>,
	mike.miller@hp.com, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org,
	hare@novell.com, iss_storagedev@hp.com, iss.sbteam@hp.com
Subject: Re: [PATCH] hpsa: SCSI driver for HP Smart Array controllers
Date: Mon, 02 Mar 2009 14:33:23 -0600	[thread overview]
Message-ID: <49AC4293.4030902@cs.wisc.edu> (raw)
In-Reply-To: <20090302183649.GB11787@kernel.dk>

[-- Attachment #1: Type: text/plain, Size: 3706 bytes --]

Jens Axboe wrote:
> On Mon, Mar 02 2009, Mike Christie wrote:
>> Grant Grundler wrote:
>>> On Sun, Mar 1, 2009 at 10:32 PM, FUJITA Tomonori
>>> <fujita.tomonori@lab.ntt.co.jp> wrote:
>>> ...
>>>>> +/*
>>>>> + * For operations that cannot sleep, a command block is allocated at init,
>>>>> + * and managed by cmd_alloc() and cmd_free() using a simple bitmap to track
>>>>> + * which ones are free or in use.  Lock must be held when calling this.
>>>>> + * cmd_free() is the complement.
>>>>> + */
>>>>> +static struct CommandList_struct *cmd_alloc(struct ctlr_info *h)
>>>>> +{
>>>>> +     struct CommandList_struct *c;
>>>>> +     int i;
>>>>> +     union u64bit temp64;
>>>>> +     dma_addr_t cmd_dma_handle, err_dma_handle;
>>>>> +
>>>>> +     do {
>>>>> +             i = find_first_zero_bit(h->cmd_pool_bits, h->nr_cmds);
>>>>> +             if (i == h->nr_cmds)
>>>>> +                     return NULL;
>>>>> +     } while (test_and_set_bit
>>>>> +              (i & (BITS_PER_LONG - 1),
>>>>> +               h->cmd_pool_bits + (i / BITS_PER_LONG)) != 0);
>>>> Using bitmap to manage free commands looks too complicated a bit to
>>>> me. Can we just use lists for command management?
>>> Bit maps are generally more efficient than lists since we touch less data.
>>> For both search and moving elements from free<->busy lists. This probably
>>> won't matter if we are talking less than 10K IOPS. And willy demonstrated
>>> other layers have pretty high overhead (block, libata and SCSI midlayer)
>>> at high transaction rates.
>>>
>> If it was just needing this for the queuecommand path it would be  
>> simple. For the queuecommand path we could just use the scsi host  
>> tagging code for the index. You do not need a lock in the queuecommand  
>> path for getting a index and command, and you do not need to duplicate  
>> the tag/index allocation code in the block/scsi code
>>
>> A problem with the host tagging is what to do if you need a tag/index  
>> for a internal command. In the slow path like the device reset and cache  
>> flush case you could use a list or preallocated command or whatever  
>> other drivers are using that makes you happy.
>>
>> Or for the reset/shutdown/internal path could we come up with a  
>> extension to the existing API. Maybe just add some wrapper around some  
>> of blk_queue_start_tag that takes a the bqt (the bqt would come from the  
>> host wide one) and allocates the tag (need a something similar for the  
>> release side).
> 
> This is precisely what I did for libata, here is is interleaved with
> some other stuff:
> 
> http://git.kernel.dk/?p=linux-2.6-block.git;a=commitdiff;h=f557570ec6042370333b6b9c33bbbae175120a89
> 
> It needs a little more polish and so on, but the concept is identical to
> what you describe for this case. And I agree, it's much better to use
> the same index instead of generating/maintaining seperate bitmaps for
> this type of thing.
> 

In that patch where does the tag come from? Is it from libata?

What if we wanted and/or needed the bqt to give us a tag value and we 
need it for the lookup? It looks like for hpsa we could kill its 
find_first_zero_bit code and use and use the code in blk_queue_start_tag.

iscsi also needs the unique tag and then it needs the 
blk_map_queue_find_tag functionality too. iscsi needs the lookup and tag 
for host/transport level commands that do not have a scsi 
command/request. The tag value has to be unique accross the 
host/transport (acutally just the transport, but ignore that for now to 
make it simple and because for software iscsi we do a host per transport 
connection). Do you think something like the attached patch would be ok 
(it is only compile tested)?

[-- Attachment #2: make-tagging-more-generic.patch --]
[-- Type: text/plain, Size: 6486 bytes --]

diff --git a/block/blk-tag.c b/block/blk-tag.c
index 3c518e3..0614faf 100644
--- a/block/blk-tag.c
+++ b/block/blk-tag.c
@@ -106,7 +106,7 @@ EXPORT_SYMBOL(blk_queue_free_tags);
 static int
 init_tag_map(struct request_queue *q, struct blk_queue_tag *tags, int depth)
 {
-	struct request **tag_index;
+	void **tag_index;
 	unsigned long *tag_map;
 	int nr_ulongs;
 
@@ -116,7 +116,7 @@ init_tag_map(struct request_queue *q, struct blk_queue_tag *tags, int depth)
 		       __func__, depth);
 	}
 
-	tag_index = kzalloc(depth * sizeof(struct request *), GFP_ATOMIC);
+	tag_index = kzalloc(depth * sizeof(void *), GFP_ATOMIC);
 	if (!tag_index)
 		goto fail;
 
@@ -219,7 +219,7 @@ EXPORT_SYMBOL(blk_queue_init_tags);
 int blk_queue_resize_tags(struct request_queue *q, int new_depth)
 {
 	struct blk_queue_tag *bqt = q->queue_tags;
-	struct request **tag_index;
+	void **tag_index;
 	unsigned long *tag_map;
 	int max_depth, nr_ulongs;
 
@@ -254,7 +254,7 @@ int blk_queue_resize_tags(struct request_queue *q, int new_depth)
 	if (init_tag_map(q, bqt, new_depth))
 		return -ENOMEM;
 
-	memcpy(bqt->tag_index, tag_index, max_depth * sizeof(struct request *));
+	memcpy(bqt->tag_index, tag_index, max_depth * sizeof(void *));
 	nr_ulongs = ALIGN(max_depth, BITS_PER_LONG) / BITS_PER_LONG;
 	memcpy(bqt->tag_map, tag_map, nr_ulongs * sizeof(unsigned long));
 
@@ -265,24 +265,12 @@ int blk_queue_resize_tags(struct request_queue *q, int new_depth)
 EXPORT_SYMBOL(blk_queue_resize_tags);
 
 /**
- * blk_queue_end_tag - end tag operations for a request
- * @q:  the request queue for the device
- * @rq: the request that has completed
- *
- *  Description:
- *    Typically called when end_that_request_first() returns %0, meaning
- *    all transfers have been done for a request. It's important to call
- *    this function before end_that_request_last(), as that will put the
- *    request back on the free list thus corrupting the internal tag list.
- *
- *  Notes:
- *   queue lock must be held.
+ * blk_map_end_tag - end tag operation
+ * @bqt: block queue tag
+ * @tag: tag to clear
  **/
-void blk_queue_end_tag(struct request_queue *q, struct request *rq)
+void blk_map_end_tag(struct blk_queue_tag *bqt, int tag)
 {
-	struct blk_queue_tag *bqt = q->queue_tags;
-	int tag = rq->tag;
-
 	BUG_ON(tag == -1);
 
 	if (unlikely(tag >= bqt->real_max_depth))
@@ -292,10 +280,6 @@ void blk_queue_end_tag(struct request_queue *q, struct request *rq)
 		 */
 		return;
 
-	list_del_init(&rq->queuelist);
-	rq->cmd_flags &= ~REQ_QUEUED;
-	rq->tag = -1;
-
 	if (unlikely(bqt->tag_index[tag] == NULL))
 		printk(KERN_ERR "%s: tag %d is missing\n",
 		       __func__, tag);
@@ -313,9 +297,65 @@ void blk_queue_end_tag(struct request_queue *q, struct request *rq)
 	 */
 	clear_bit_unlock(tag, bqt->tag_map);
 }
+EXPORT_SYMBOL(blk_map_end_tag);
+
+/**
+ * blk_queue_end_tag - end tag operations for a request
+ * @q:  the request queue for the device
+ * @rq: the request that has completed
+ *
+ *  Description:
+ *    Typically called when end_that_request_first() returns %0, meaning
+ *    all transfers have been done for a request. It's important to call
+ *    this function before end_that_request_last(), as that will put the
+ *    request back on the free list thus corrupting the internal tag list.
+ *
+ *  Notes:
+ *   queue lock must be held.
+ **/
+void blk_queue_end_tag(struct request_queue *q, struct request *rq)
+{
+	blk_map_end_tag(q->queue_tags, rq->tag);
+
+	list_del_init(&rq->queuelist);
+	rq->cmd_flags &= ~REQ_QUEUED;
+	rq->tag = -1;
+}
 EXPORT_SYMBOL(blk_queue_end_tag);
 
 /**
+ * blk_map_start_tag - find a free tag
+ * @bqt: block queue tag
+ * @object: object to store in bqt tag_index at index returned by tag
+ * @offset: offset into bqt tag map
+ **/
+int blk_map_start_tag(struct blk_queue_tag *bqt, void *object, unsigned offset)
+{
+	unsigned max_depth;
+	int tag;
+
+	/*
+	 * Protect against shared tag maps, as we may not have exclusive
+	 * access to the tag map.
+	 */
+	max_depth = bqt->max_depth;
+	do {
+		tag = find_next_zero_bit(bqt->tag_map, max_depth, offset);
+		if (tag >= max_depth)
+			return -1;
+
+	} while (test_and_set_bit_lock(tag, bqt->tag_map));
+	/*
+	 * We need lock ordering semantics given by test_and_set_bit_lock.
+	 * See blk_map_end_tag for details.
+	 */
+
+	bqt->tag_index[tag] = object;
+	return tag;
+}
+EXPORT_SYMBOL(blk_map_start_tag);
+
+/**
  * blk_queue_start_tag - find a free tag and assign it
  * @q:  the request queue for the device
  * @rq:  the block request that needs tagging
@@ -347,10 +387,8 @@ int blk_queue_start_tag(struct request_queue *q, struct request *rq)
 		BUG();
 	}
 
+
 	/*
-	 * Protect against shared tag maps, as we may not have exclusive
-	 * access to the tag map.
-	 *
 	 * We reserve a few tags just for sync IO, since we don't want
 	 * to starve sync IO on behalf of flooding async IO.
 	 */
@@ -360,20 +398,12 @@ int blk_queue_start_tag(struct request_queue *q, struct request *rq)
 	else
 		offset = max_depth >> 2;
 
-	do {
-		tag = find_next_zero_bit(bqt->tag_map, max_depth, offset);
-		if (tag >= max_depth)
-			return 1;
-
-	} while (test_and_set_bit_lock(tag, bqt->tag_map));
-	/*
-	 * We need lock ordering semantics given by test_and_set_bit_lock.
-	 * See blk_queue_end_tag for details.
-	 */
+	tag = blk_map_start_tag(bqt, rq, offset);
+	if (tag < 0)
+		return 1;
 
 	rq->cmd_flags |= REQ_QUEUED;
 	rq->tag = tag;
-	bqt->tag_index[tag] = rq;
 	blkdev_dequeue_request(rq);
 	list_add(&rq->queuelist, &q->tag_busy_list);
 	return 0;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 465d6ba..d748261 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -290,7 +290,7 @@ enum blk_queue_state {
 };
 
 struct blk_queue_tag {
-	struct request **tag_index;	/* map of busy tags */
+	void **tag_index;		/* map of busy tags */
 	unsigned long *tag_map;		/* bit map of free/busy tags */
 	int busy;			/* current depth */
 	int max_depth;			/* what we will send to device */
@@ -904,6 +904,8 @@ extern int blk_queue_resize_tags(struct request_queue *, int);
 extern void blk_queue_invalidate_tags(struct request_queue *);
 extern struct blk_queue_tag *blk_init_tags(int);
 extern void blk_free_tags(struct blk_queue_tag *);
+extern int blk_map_start_tag(struct blk_queue_tag *, void *, unsigned);
+extern void blk_map_end_tag(struct blk_queue_tag *, int);
 
 static inline struct request *blk_map_queue_find_tag(struct blk_queue_tag *bqt,
 						int tag)

  reply	other threads:[~2009-03-02 20:34 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-27 23:09 [PATCH] hpsa: SCSI driver for HP Smart Array controllers Mike Miller
2009-03-01 13:49 ` Rolf Eike Beer
2009-03-02  6:32 ` FUJITA Tomonori
2009-03-02 17:19   ` Grant Grundler
2009-03-02 18:20     ` Mike Christie
2009-03-02 18:36       ` Jens Axboe
2009-03-02 20:33         ` Mike Christie [this message]
2009-03-02 20:37           ` Mike Christie
2009-03-03  9:43           ` Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2009-03-02 14:56 scameron
2009-03-03  6:35 ` FUJITA Tomonori
2009-03-03 16:28   ` scameron
2009-03-05  5:48     ` FUJITA Tomonori
2009-03-05 14:21       ` scameron
2009-03-05 16:54         ` Andrew Patterson
2009-03-06  8:55         ` Jens Axboe
2009-03-06  9:13           ` FUJITA Tomonori
2009-03-06  9:21             ` Jens Axboe
2009-03-06  9:27               ` FUJITA Tomonori
2009-03-06  9:35                 ` Jens Axboe
2009-03-06 14:38                   ` scameron
2009-03-06 19:06                     ` Jens Axboe
2009-03-06 20:59                     ` Grant Grundler
2009-03-06 21:18                       ` scameron
2009-03-06 21:55                         ` Grant Grundler
2009-03-06 21:59                         ` James Bottomley
2009-03-05 14:55       ` Miller, Mike (OS Dev)
2009-03-03 16:49 ` Mike Christie
2009-03-03 21:28   ` scameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49AC4293.4030902@cs.wisc.edu \
    --to=michaelc@cs.wisc.edu \
    --cc=akpm@linux-foundation.org \
    --cc=fujita.tomonori@lab.ntt.co.jp \
    --cc=grundler@google.com \
    --cc=hare@novell.com \
    --cc=iss.sbteam@hp.com \
    --cc=iss_storagedev@hp.com \
    --cc=jens.axboe@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=mike.miller@hp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox