From: Boaz Harrosh <bharrosh@panasas.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <Jens.Axboe@oracle.com>,
linux-scsi <linux-scsi@vger.kernel.org>
Subject: Re: Actually using the sg table/chain code
Date: Wed, 16 Jan 2008 18:11:24 +0200
Message-ID: <478E2CAC.7030607@panasas.com>
In-Reply-To: <1200496176.3136.5.camel@localhost.localdomain>
On Wed, Jan 16 2008 at 17:09 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> On Wed, 2008-01-16 at 16:01 +0200, Boaz Harrosh wrote:
>> On Tue, Jan 15 2008 at 19:35 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote:
>>> On Tue, Jan 15 2008 at 18:49 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>>>> On Tue, 2008-01-15 at 18:09 +0200, Boaz Harrosh wrote:
>>>>> On Tue, Jan 15 2008 at 17:52 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>>>>>> I thought, now we had this new shiny code to increase the scatterlist
>>>>>> table size I'd try it out. It turns out there's a pretty vast block
>>>>>> conspiracy that prevents us going over 128 entries in a scatterlist.
>>>>>>
>>>>>> The first problems are in SCSI: The host parameters sg_tablesize and
>>>>>> max_sectors are used to set the queue limits max_hw_segments and
>>>>>> max_sectors respectively (the former is the maximum number of entries
>>>>>> the HBA can tolerate in a scatterlist for each transaction, the latter
>>>>>> is a total transfer cap on the maximum number of 512 byte sectors).
>>>>>> The default settings, assuming the HBA doesn't vary them, are
>>>>>> sg_tablesize at SG_ALL (255) and max_sectors at SCSI_DEFAULT_MAX_SECTORS
>>>>>> (1024). A quick calculation shows the latter is actually 512k or 128
>>>>>> pages (at 4k pages), hence the persistent 128 entry limit.
>>>>>>
>>>>>> However, raising max_sectors and sg_tablesize together still doesn't
>>>>>> help: There's actually an insidious limit sitting in the block layer as
>>>>>> well. This is what blk_queue_max_sectors says:
>>>>>>
>>>>>> void blk_queue_max_sectors(struct request_queue *q, unsigned int max_sectors)
>>>>>> {
>>>>>> 	if ((max_sectors << 9) < PAGE_CACHE_SIZE) {
>>>>>> 		max_sectors = 1 << (PAGE_CACHE_SHIFT - 9);
>>>>>> 		printk("%s: set to minimum %d\n", __FUNCTION__, max_sectors);
>>>>>> 	}
>>>>>>
>>>>>> 	if (BLK_DEF_MAX_SECTORS > max_sectors)
>>>>>> 		q->max_hw_sectors = q->max_sectors = max_sectors;
>>>>>> 	else {
>>>>>> 		q->max_sectors = BLK_DEF_MAX_SECTORS;
>>>>>> 		q->max_hw_sectors = max_sectors;
>>>>>> 	}
>>>>>> }
>>>>>>
>>>>>> So it imposes a maximum possible setting of BLK_DEF_MAX_SECTORS which is
>>>>>> defined in blkdev.h to .... 1024, thus also forcing the queue down to
>>>>>> 128 scatterlist entries.
>>>>>>
>>>>>> Once I raised this limit as well, I was able to transfer over 128
>>>>>> scatterlist elements during benchmark test runs of normal I/O (actually
>>>>>> kernel compiles seem best: they hit 608 scatterlist entries).
>>>>>>
>>>>>> So my question, is there any reason not to raise this limit to something
>>>>>> large (like 65536) or even eliminate it altogether?
>>>>>>
>>>>>> James
>>>>>>
>>>>> I have an old branch here where I swept through the SCSI drivers just
>>>>> to remove the SG_ALL limit. Unfortunately, some drivers literally mean
>>>>> 255 when they use SG_ALL, so I went driver by driver and carefully inspected
>>>>> the code, changing it to a driver-specific constant where they really meant
>>>>> 255.
>>>>>
>>>>> I used sg_tablesize = ~0 to mean "I don't care, any size will do",
>>>>> and a driver constant where there is a real limit, then removed
>>>>> SG_ALL at the end.
>>>>>
>>>>> Should I freshen up this branch and send it?
>>>> By all means; however, I think having the defined constant SG_ALL is
>>>> useful (even if it is eventually just set to ~0): it means "I can support
>>>> any scatterlist size". Having the drivers that can't support SG_ALL set
>>>> sg_tablesize correctly is pretty vital.
>>>>
>>>> Thanks,
>>>>
>>>> James
>>> OK will do.
>>>
>>> I have found the old branch and am looking at it. I agree with you about
>>> SG_ALL. I will redo it as one patch per changed driver, without changing
>>> SG_ALL, and then a final patch that just changes SG_ALL.
>>>
>>> Boaz
>>
>> James hi
>> Reinspecting the code: what should I do with drivers that do not support
>> chaining because their code still does sglist++?
>>
>> Should I set their sg_tablesize to SG_MAX_SINGLE_ALLOC, or hard-code it to 128,
>> and put a FIXME: in the commit message?
>>
>> Or should we fix them first and serialize this effort on top of those fixes?
>> (Also in light of the other email where you removed the chaining flag.)
>
> How many of them are left?
>
> The correct value is clearly SCSI_MAX_SG_SEGMENTS, which fortunately
> "[PATCH] remove use_sg_chaining" moved into a shared header. Worst
> case, just use that and add a fixme comment giving the real value (if
> there is one).
>
> James
>
>
I have found 9 so far and have 10 more drivers to check. All but one are
software problems of the one-by-one SCp.buffer++ kind, so once that is fixed
they should be able to go back to SG_ALL. For now, though, I will set them
to SCSI_MAX_SG_SEGMENTS as you suggested. I have not checked drivers that
did not use SG_ALL, but I trust their limits are usually smaller.
Boaz