* Actually using the sg table/chain code
@ 2008-01-15 15:52 James Bottomley
2008-01-15 16:09 ` Boaz Harrosh
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: James Bottomley @ 2008-01-15 15:52 UTC
To: Jens Axboe; +Cc: linux-scsi
I thought that, now we have this shiny new code to increase the
scatterlist table size, I'd try it out. It turns out there's a pretty
vast block conspiracy that prevents us going over 128 entries in a
scatterlist.
The first problems are in SCSI: The host parameters sg_tablesize and
max_sectors are used to set the queue limits max_hw_segments and
max_sectors respectively (the former is the maximum number of entries
the HBA can tolerate in a scatterlist for each transaction, the latter
is a total transfer cap on the maximum number of 512-byte sectors).
The default settings, assuming the HBA doesn't vary them, are
sg_tablesize at SG_ALL (255) and max_sectors at SCSI_DEFAULT_MAX_SECTORS
(1024). A quick calculation shows the latter is actually 512k or 128
pages (at 4k pages), hence the persistent 128 entry limit.
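The arithmetic above can be checked with a small userspace sketch. It assumes 4k pages and no physical merging, so every page becomes its own scatterlist entry; effective_sg_entries() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

#define SECTOR_SIZE 512u /* bytes per sector, as in the discussion above */

/*
 * Hypothetical helper: the number of scatterlist entries one request can
 * use is bounded both by the HBA's table size and by how many pages fit
 * under the max_sectors transfer cap (assuming one entry per page).
 */
static unsigned int effective_sg_entries(unsigned int sg_tablesize,
                                         unsigned int max_sectors,
                                         unsigned int page_size)
{
    assert(page_size != 0);
    unsigned int pages = (unsigned int)(((unsigned long long)max_sectors *
                                         SECTOR_SIZE) / page_size);
    return pages < sg_tablesize ? pages : sg_tablesize;
}
```

With the defaults (sg_tablesize 255, max_sectors 1024, 4k pages) the transfer cap wins and the result is 128 entries.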
However, raising max_sectors and sg_tablesize together still doesn't
help: There's actually an insidious limit sitting in the block layer as
well. This is what blk_queue_max_sectors says:
void blk_queue_max_sectors(struct request_queue *q, unsigned int max_sectors)
{
    if ((max_sectors << 9) < PAGE_CACHE_SIZE) {
        max_sectors = 1 << (PAGE_CACHE_SHIFT - 9);
        printk("%s: set to minimum %d\n", __FUNCTION__, max_sectors);
    }

    if (BLK_DEF_MAX_SECTORS > max_sectors)
        q->max_hw_sectors = q->max_sectors = max_sectors;
    else {
        q->max_sectors = BLK_DEF_MAX_SECTORS;
        q->max_hw_sectors = max_sectors;
    }
}
So it imposes a maximum possible setting of BLK_DEF_MAX_SECTORS, which is
defined in blkdev.h as... 1024, thus also forcing the queue down to
128 scatterlist entries.
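To make the clamp concrete, here is a userspace model of the function quoted above. struct queue_model and model_queue_max_sectors() are hypothetical stand-ins, and the 4k page size (PAGE_CACHE_SHIFT of 12) is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed values: 4k page cache pages and the 1024-sector default. */
#define PAGE_CACHE_SHIFT 12
#define PAGE_CACHE_SIZE (1u << PAGE_CACHE_SHIFT)
#define BLK_DEF_MAX_SECTORS 1024u

/* Hypothetical stand-in for the two request_queue fields touched. */
struct queue_model {
    unsigned int max_sectors;    /* soft limit used for normal fs I/O */
    unsigned int max_hw_sectors; /* hard limit the driver reported */
};

/* Mirrors the logic of blk_queue_max_sectors() quoted above. */
static void model_queue_max_sectors(struct queue_model *q,
                                    unsigned int max_sectors)
{
    assert(q != NULL);
    if ((max_sectors << 9) < PAGE_CACHE_SIZE) {
        /* Never allow less than one page per request. */
        max_sectors = 1u << (PAGE_CACHE_SHIFT - 9);
        printf("%s: set to minimum %u\n", __func__, max_sectors);
    }

    if (BLK_DEF_MAX_SECTORS > max_sectors)
        q->max_hw_sectors = q->max_sectors = max_sectors;
    else {
        /* Anything at or above the default only raises the hw limit. */
        q->max_sectors = BLK_DEF_MAX_SECTORS;
        q->max_hw_sectors = max_sectors;
    }
}
```

In this model, asking for 65536 sectors leaves max_hw_sectors at 65536 but caps max_sectors at 1024, which is the limit described above.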
Once I raised this limit as well, I was able to transfer over 128
scatterlist elements during benchmark test runs of normal I/O (actually,
kernel compiles seem best; they hit 608 scatterlist entries).
So my question: is there any reason not to raise this limit to something
large (like 65536) or even eliminate it altogether?
James
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Actually using the sg table/chain code
From: Boaz Harrosh @ 2008-01-15 16:09 UTC
To: James Bottomley; +Cc: Jens Axboe, linux-scsi
On Tue, Jan 15 2008 at 17:52 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> I thought that, now we have this shiny new code to increase the
> scatterlist table size, I'd try it out. It turns out there's a pretty
> vast block conspiracy that prevents us going over 128 entries in a
> scatterlist.
>
> The first problems are in SCSI: The host parameters sg_tablesize and
> max_sectors are used to set the queue limits max_hw_segments and
> max_sectors respectively (the former is the maximum number of entries
> the HBA can tolerate in a scatterlist for each transaction, the latter
> is a total transfer cap on the maximum number of 512-byte sectors).
> The default settings, assuming the HBA doesn't vary them, are
> sg_tablesize at SG_ALL (255) and max_sectors at SCSI_DEFAULT_MAX_SECTORS
> (1024). A quick calculation shows the latter is actually 512k or 128
> pages (at 4k pages), hence the persistent 128 entry limit.
>
> However, raising max_sectors and sg_tablesize together still doesn't
> help: There's actually an insidious limit sitting in the block layer as
> well. This is what blk_queue_max_sectors says:
>
> void blk_queue_max_sectors(struct request_queue *q, unsigned int max_sectors)
> {
>     if ((max_sectors << 9) < PAGE_CACHE_SIZE) {
>         max_sectors = 1 << (PAGE_CACHE_SHIFT - 9);
>         printk("%s: set to minimum %d\n", __FUNCTION__, max_sectors);
>     }
>
>     if (BLK_DEF_MAX_SECTORS > max_sectors)
>         q->max_hw_sectors = q->max_sectors = max_sectors;
>     else {
>         q->max_sectors = BLK_DEF_MAX_SECTORS;
>         q->max_hw_sectors = max_sectors;
>     }
> }
>
> So it imposes a maximum possible setting of BLK_DEF_MAX_SECTORS, which is
> defined in blkdev.h as... 1024, thus also forcing the queue down to
> 128 scatterlist entries.
>
> Once I raised this limit as well, I was able to transfer over 128
> scatterlist elements during benchmark test runs of normal I/O (actually,
> kernel compiles seem best; they hit 608 scatterlist entries).
>
> So my question: is there any reason not to raise this limit to something
> large (like 65536) or even eliminate it altogether?
>
> James
>
I have an old branch here where I've swept through the scsi drivers just
to remove the SG_ALL limit. Unfortunately, some drivers literally mean
255 when using SG_ALL. So I went through driver by driver and carefully
inspected the code to change it to something driver-specific if they
really meant 255.
I have used sg_tablesize = ~0 to indicate "I don't care, any size will
do", and some driver constant if there is a real limit, removing SG_ALL
at the end.
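A userspace sketch of that convention: a driver with no real limit asks for everything with ~0, while a driver with a hardware limit declares its own constant. struct host_template_model and EXAMPLE_HBA_MAX_SG are hypothetical, and the unsigned short field type is an assumption about sg_tablesize. Note that ~0 silently becomes 65535 in such a field.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of a SCSI host template. */
struct host_template_model {
    const char *name;
    unsigned short sg_tablesize; /* assumed to be unsigned short here */
};

/* Hypothetical constant for an HBA whose hardware really stops at 255. */
#define EXAMPLE_HBA_MAX_SG 255

static const struct host_template_model dont_care_driver = {
    .name = "any-size",
    .sg_tablesize = (unsigned short)~0, /* "I don't care, any will do" */
};

static const struct host_template_model limited_driver = {
    .name = "limited",
    .sg_tablesize = EXAMPLE_HBA_MAX_SG, /* a real hardware limit */
};

/* Returns 1 if the template advertises the largest value the field holds. */
static int wants_unlimited(const struct host_template_model *t)
{
    assert(t != NULL);
    return t->sg_tablesize == USHRT_MAX;
}
```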
Should I freshen up this branch and send it?
Boaz
* Re: Actually using the sg table/chain code
From: James Bottomley @ 2008-01-15 16:49 UTC
To: Boaz Harrosh; +Cc: Jens Axboe, linux-scsi
On Tue, 2008-01-15 at 18:09 +0200, Boaz Harrosh wrote:
> On Tue, Jan 15 2008 at 17:52 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> > I thought, now we had this new shiny code to increase the scatterlist
> > table size I'd try it out. It turns out there's a pretty vast block
> > conspiracy that prevents us going over 128 entries in a scatterlist.
> >
> > The first problems are in SCSI: The host parameters sg_tablesize and
> > max_sectors are used to set the queue limits max_hw_segments and
> > max_sectors respectively (the former is the maximum number of entries
> > the HBA can tolerate in a scatterlist for each transaction, the latter
> > is a total transfer cap on the maxiumum number of 512 byte sectors).
> > The default settings, assuming the HBA doesn't vary them are
> > sg_tablesize at SG_ALL (255) and max_sectors at SCSI_DEFAULT_MAX_SECTORS
> > (1024). A quick calculation shows the latter is actually 512k or 128
> > pages (at 4k pages), hence the persistent 128 entry limit.
> >
> > However, raising max_sectors and sg_tablesize together still doesn't
> > help: There's actually an insidious limit sitting in the block layer as
> > well. This is what blk_queue_max_sectors says:
> >
> > void blk_queue_max_sectors(struct request_queue *q, unsigned int
> > max_sectors)
> > {
> > if ((max_sectors << 9) < PAGE_CACHE_SIZE) {
> > max_sectors = 1 << (PAGE_CACHE_SHIFT - 9);
> > printk("%s: set to minimum %d\n", __FUNCTION__, max_sectors);
> > }
> >
> > if (BLK_DEF_MAX_SECTORS > max_sectors)
> > q->max_hw_sectors = q->max_sectors = max_sectors;
> > else {
> > q->max_sectors = BLK_DEF_MAX_SECTORS;
> > q->max_hw_sectors = max_sectors;
> > }
> > }
> >
> > So it imposes a maximum possible setting of BLK_DEF_MAX_SECTORS which is
> > defined in blkdev.h to .... 1024, thus also forcing the queue down to
> > 128 scatterlist entries.
> >
> > Once I raised this limit as well, I was able to transfer over 128
> > scatterlist elements during benchmark test runs of normal I/O (actually
> > kernel compiles seem best, they hit 608 scatterlist entries).
> >
> > So my question, is there any reason not to raise this limit to something
> > large (like 65536) or even eliminate it altogether?
> >
> > James
> >
> I have an old branch here where I've swept through the scsi drivers just
> to remove the SG_ALL limit. Unfortunately, some drivers literally mean
> 255 when using SG_ALL. So I went through driver by driver and carefully
> inspected the code to change it to something driver-specific if they
> really meant 255.
>
> I have used sg_tablesize = ~0 to indicate "I don't care, any size will
> do", and some driver constant if there is a real limit, removing SG_ALL
> at the end.
>
> Should I freshen up this branch and send it?
By all means; however, I think having the defined constant SG_ALL is
useful (even if it is eventually just set to ~0): it means "I can support
any scatterlist size". Having the drivers that can't support SG_ALL set
sg_tablesize correctly is pretty vital.
Thanks,
James
* Re: Actually using the sg table/chain code
From: Boaz Harrosh @ 2008-01-15 17:35 UTC
To: James Bottomley; +Cc: Jens Axboe, linux-scsi
On Tue, Jan 15 2008 at 18:49 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> By all means; however, I think having the defined constant SG_ALL is
> useful (even if it is eventually just set to ~0): it means "I can support
> any scatterlist size". Having the drivers that can't support SG_ALL set
> sg_tablesize correctly is pretty vital.
>
> Thanks,
>
> James
OK, will do.
I have found the old branch and am looking. I agree with you about
SG_ALL. I will restructure it to have a patch per changed driver, without
changing SG_ALL, and then a final patch that just changes SG_ALL.
Boaz
* Re: Actually using the sg table/chain code
From: Jeff Garzik @ 2008-01-15 19:52 UTC
To: James Bottomley; +Cc: Jens Axboe, linux-scsi
James Bottomley wrote:
>     if (BLK_DEF_MAX_SECTORS > max_sectors)
>         q->max_hw_sectors = q->max_sectors = max_sectors;
>     else {
>         q->max_sectors = BLK_DEF_MAX_SECTORS;
>         q->max_hw_sectors = max_sectors;
>     }
> }
>
> So it imposes a maximum possible setting of BLK_DEF_MAX_SECTORS, which is
> defined in blkdev.h as... 1024, thus also forcing the queue down to
> 128 scatterlist entries.
>
> Once I raised this limit as well, I was able to transfer over 128
> scatterlist elements during benchmark test runs of normal I/O (actually,
> kernel compiles seem best; they hit 608 scatterlist entries).
>
> So my question: is there any reason not to raise this limit to something
> large (like 65536) or even eliminate it altogether?
ISTR a thread long ago, perhaps including Andrea A (as well as Jens),
where 1024 sectors was arrived at as a reasonable balance between
tying up gobs of VM memory on a single command (multiplied, then, across
N commands) and getting decent per-command throughput.
Jens probably recalls better than I... but I'm pretty sure that the
1024 limit played into "being nice with the VM" somehow.
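Jeff's trade-off can be put into rough numbers. This sketch assumes every outstanding command carries a full-size transfer; pinned_bytes() is a hypothetical helper, and the queue depth of 64 used below is an illustrative value, not one taken from this thread.

```c
#include <assert.h>

#define SECTOR_SIZE 512ull

/*
 * Hypothetical helper: bytes of data buffers tied up if `commands`
 * requests of `max_sectors` sectors each are in flight at once
 * (worst case: every request is full size).
 */
static unsigned long long pinned_bytes(unsigned int max_sectors,
                                       unsigned int commands)
{
    assert(max_sectors > 0);
    return (unsigned long long)max_sectors * SECTOR_SIZE * commands;
}
```

At 1024 sectors and 64 commands that is 32 MiB; at 65536 sectors it becomes 2 GiB, which is the kind of VM pressure Jeff recalls the limit being balanced against.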
Jeff
* Re: Actually using the sg table/chain code
From: James Bottomley @ 2008-01-15 20:14 UTC
To: Jeff Garzik; +Cc: Jens Axboe, linux-scsi
On Tue, 2008-01-15 at 14:52 -0500, Jeff Garzik wrote:
> ISTR a thread long ago, perhaps including Andrea A (as well as Jens),
> where 1024 sectors was arrived at as a reasonable balance between
> tying up gobs of VM memory on a single command (multiplied, then, across
> N commands) and getting decent per-command throughput.
>
> Jens probably recalls better than I... but I'm pretty sure that the
> 1024 limit played into "being nice with the VM" somehow.
There's certainly the writeout deadlock avoidance issue. 1024 sectors
is 128 scatterlist entries (at minimum physical merging), which is
currently the maximum SCSI scatterlist from the mempools, so we would
always have a logically provable clear path to forward progress. Once we
go over this, we start to get into corner cases where we have to hope
that having deep enough mempools makes the issue magically disappear.
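To see why going past 128 entries leaves the single-allocation path, here is a sketch of how many separate chunks a chained table needs. It assumes, as the sg chaining code does, that the last slot of each full chunk holds the chain link rather than data; the 128-entry chunk size is the mempool maximum mentioned above, and sg_chunks_needed() is a hypothetical helper.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of chunk allocations needed to hold `nents`
 * data entries when each chunk has `chunk` slots and every chunk except
 * the last gives up one slot to the chain link.
 */
static unsigned int sg_chunks_needed(unsigned int nents, unsigned int chunk)
{
    assert(chunk >= 2);   /* need room for data plus a chain link */
    if (nents <= chunk)
        return 1;         /* fits in one unchained table */
    /* k chunks hold k * (chunk - 1) + 1 entries; find the smallest k. */
    return (nents - 1 + (chunk - 2)) / (chunk - 1);
}
```

Under these assumptions, the 608-entry requests seen during kernel compiles would need five allocations, each of which has to succeed before the command can proceed.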
The question is, how should we police all of this? Should the block
layer really be blindly enforcing the 1024 sector limit, particularly
now that the default allocator won't begin chaining until we get to 2048
sectors or higher?
James
* Re: Actually using the sg table/chain code
From: Boaz Harrosh @ 2008-01-16 14:01 UTC
To: James Bottomley; +Cc: Jens Axboe, linux-scsi
On Tue, Jan 15 2008 at 19:35 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote:
> On Tue, Jan 15 2008 at 18:49 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>> On Tue, 2008-01-15 at 18:09 +0200, Boaz Harrosh wrote:
>>> On Tue, Jan 15 2008 at 17:52 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>>>> I thought, now we had this new shiny code to increase the scatterlist
>>>> table size I'd try it out. It turns out there's a pretty vast block
>>>> conspiracy that prevents us going over 128 entries in a scatterlist.
>>>>
>>>> The first problems are in SCSI: The host parameters sg_tablesize and
>>>> max_sectors are used to set the queue limits max_hw_segments and
>>>> max_sectors respectively (the former is the maximum number of entries
>>>> the HBA can tolerate in a scatterlist for each transaction, the latter
>>>> is a total transfer cap on the maxiumum number of 512 byte sectors).
>>>> The default settings, assuming the HBA doesn't vary them are
>>>> sg_tablesize at SG_ALL (255) and max_sectors at SCSI_DEFAULT_MAX_SECTORS
>>>> (1024). A quick calculation shows the latter is actually 512k or 128
>>>> pages (at 4k pages), hence the persistent 128 entry limit.
>>>>
>>>> However, raising max_sectors and sg_tablesize together still doesn't
>>>> help: There's actually an insidious limit sitting in the block layer as
>>>> well. This is what blk_queue_max_sectors says:
>>>>
>>>> void blk_queue_max_sectors(struct request_queue *q, unsigned int
>>>> max_sectors)
>>>> {
>>>> if ((max_sectors << 9) < PAGE_CACHE_SIZE) {
>>>> max_sectors = 1 << (PAGE_CACHE_SHIFT - 9);
>>>> printk("%s: set to minimum %d\n", __FUNCTION__, max_sectors);
>>>> }
>>>>
>>>> if (BLK_DEF_MAX_SECTORS > max_sectors)
>>>> q->max_hw_sectors = q->max_sectors = max_sectors;
>>>> else {
>>>> q->max_sectors = BLK_DEF_MAX_SECTORS;
>>>> q->max_hw_sectors = max_sectors;
>>>> }
>>>> }
>>>>
>>>> So it imposes a maximum possible setting of BLK_DEF_MAX_SECTORS which is
>>>> defined in blkdev.h to .... 1024, thus also forcing the queue down to
>>>> 128 scatterlist entries.
>>>>
>>>> Once I raised this limit as well, I was able to transfer over 128
>>>> scatterlist elements during benchmark test runs of normal I/O (actually
>>>> kernel compiles seem best, they hit 608 scatterlist entries).
>>>>
>>>> So my question, is there any reason not to raise this limit to something
>>>> large (like 65536) or even eliminate it altogether?
>>>>
>>>> James
>>>>
>>> I have an old branch here where I've swiped through the scsi drivers just
>>> to remove the SG_ALL limit. Unfortunately some drivers mean laterally
>>> 255 when using SG_ALL. So I passed driver by driver and carfully inspected
>>> the code to change it to something driver specific if they really meant
>>> 255.
>>>
>>> I have used sg_tablesize = ~0; to indicate, I don't care any will do,
>>> and some driver constant if there is a real limit. Though removing
>>> SG_ALL at the end.
>>>
>>> Should I freshen up this branch and send it.
>> By all means; however, I think having the defined constant SG_ALL is
>> useful (even if it is eventually just set to ~0) it means I can support
>> any scatterlist size. Having the drivers set sg_tablesize correctly
>> that can't support SG_ALL is pretty vital.
>>
>> Thanks,
>>
>> James
> OK, will do.
>
> I have found the old branch and am looking. I agree with you about
> SG_ALL. I will restructure it to have a patch per changed driver, without
> changing SG_ALL, and then a final patch that just changes SG_ALL.
>
> Boaz
Hi James,
re-inspecting the code, what should I do with drivers that do not support
chaining because their code still does sglist++?
Should I set their sg_tablesize to SG_MAX_SINGLE_ALLOC, or hard-code it to
128, and put a FIXME in the submit message?
Or should we fix them first and serialize this effort on top of those fixes?
(Also in light of the other email where you removed the chaining flag.)
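The problem with sglist++ can be shown with a userspace model of a chained list. This is not the kernel's struct scatterlist: the is_chain/is_last flags and the chain pointer are hypothetical stand-ins for the way the real code marks the last slot of a chunk as a link. A plain pointer increment would land on the link slot and then run off the end of the first chunk, whereas a next helper in the spirit of the kernel's sg_next() follows the link.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a scatterlist entry. */
struct sg_model {
    unsigned int length;    /* bytes described by this entry */
    int is_chain;           /* nonzero: this slot only links to `chain` */
    int is_last;            /* nonzero: final data entry of the list */
    struct sg_model *chain; /* next chunk when is_chain is set */
};

/* Advance to the next data entry, following chain links (cf. sg_next()). */
static struct sg_model *sg_model_next(struct sg_model *sg)
{
    assert(sg != NULL);
    if (sg->is_last)
        return NULL;
    sg++;
    if (sg->is_chain)
        sg = sg->chain;
    return sg;
}

/* Sum the lengths of all data entries, crossing chunk boundaries. */
static unsigned int sg_model_total(struct sg_model *sg)
{
    unsigned int total = 0;

    for (; sg != NULL; sg = sg_model_next(sg))
        total += sg->length;
    return total;
}
```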
Boaz
* Re: Actually using the sg table/chain code
From: Jens Axboe @ 2008-01-16 15:06 UTC
To: James Bottomley; +Cc: linux-scsi
On Tue, Jan 15 2008, James Bottomley wrote:
> I thought that, now we have this shiny new code to increase the
> scatterlist table size, I'd try it out. It turns out there's a pretty
> vast block conspiracy that prevents us going over 128 entries in a
> scatterlist.
>
> The first problems are in SCSI: The host parameters sg_tablesize and
> max_sectors are used to set the queue limits max_hw_segments and
> max_sectors respectively (the former is the maximum number of entries
> the HBA can tolerate in a scatterlist for each transaction, the latter
> is a total transfer cap on the maximum number of 512-byte sectors).
> The default settings, assuming the HBA doesn't vary them, are
> sg_tablesize at SG_ALL (255) and max_sectors at SCSI_DEFAULT_MAX_SECTORS
> (1024). A quick calculation shows the latter is actually 512k or 128
> pages (at 4k pages), hence the persistent 128 entry limit.
>
> However, raising max_sectors and sg_tablesize together still doesn't
> help: There's actually an insidious limit sitting in the block layer as
> well. This is what blk_queue_max_sectors says:
>
> void blk_queue_max_sectors(struct request_queue *q, unsigned int max_sectors)
> {
>     if ((max_sectors << 9) < PAGE_CACHE_SIZE) {
>         max_sectors = 1 << (PAGE_CACHE_SHIFT - 9);
>         printk("%s: set to minimum %d\n", __FUNCTION__, max_sectors);
>     }
>
>     if (BLK_DEF_MAX_SECTORS > max_sectors)
>         q->max_hw_sectors = q->max_sectors = max_sectors;
>     else {
>         q->max_sectors = BLK_DEF_MAX_SECTORS;
>         q->max_hw_sectors = max_sectors;
>     }
> }
>
> So it imposes a maximum possible setting of BLK_DEF_MAX_SECTORS, which is
> defined in blkdev.h as... 1024, thus also forcing the queue down to
> 128 scatterlist entries.
>
> Once I raised this limit as well, I was able to transfer over 128
> scatterlist elements during benchmark test runs of normal I/O (actually,
> kernel compiles seem best; they hit 608 scatterlist entries).
>
> So my question: is there any reason not to raise this limit to something
> large (like 65536) or even eliminate it altogether?
That function is meant for low level drivers to set their hw limits. So
ideally it should just set ->max_hw_sectors to what the driver asks for.
As Jeff mentions, a long time ago we experimentally decided that going
above 512k typically didn't yield any benefit, so Linux should not
generate commands larger than that for normal fs I/O. That is what
BLK_DEF_MAX_SECTORS does.
IOW, the driver calls blk_queue_max_sectors() with its real limit (64MB,
for instance). Linux then sets that as the hw limit, and puts a
reasonable limit on the generated size based on
throughput/latency/memory concerns. I think that is quite reasonable, and
there's nothing preventing users from setting a larger size via sysfs
by echoing something into queue/max_sectors_kb. You can set more than
512KB there easily, as long as max_hw_sectors_kb is honored.
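A sketch of the check Jens describes, where a user-supplied max_sectors_kb is accepted only if it does not exceed the hardware limit. It models the idea in userspace; the struct, the function name, the -EINVAL return and the lower bound of one 4k page are assumptions for illustration, not a copy of the kernel's sysfs handler.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PAGE_KB 4u /* assumed 4k pages: smallest sensible setting */

/* Hypothetical stand-in for the queue's two sector limits. */
struct queue_limits_model {
    unsigned int max_sectors;    /* soft limit, in 512-byte sectors */
    unsigned int max_hw_sectors; /* hard limit from the driver */
};

/* Hypothetical model of writing `kb` to queue/max_sectors_kb. */
static int model_store_max_sectors_kb(struct queue_limits_model *q,
                                      unsigned int kb)
{
    unsigned int hw_kb;

    assert(q != NULL);
    hw_kb = q->max_hw_sectors >> 1; /* two 512-byte sectors per KiB */
    if (kb > hw_kb || kb < PAGE_KB)
        return -EINVAL;             /* outside what the hardware allows */
    q->max_sectors = kb << 1;
    return 0;
}
```

With the 64MB hardware limit from Jens's example (131072 sectors), writing 1024 raises the soft limit to 2048 sectors, while anything above 65536 KB is rejected.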
--
Jens Axboe
* Re: Actually using the sg table/chain code
From: James Bottomley @ 2008-01-16 15:09 UTC
To: Boaz Harrosh; +Cc: Jens Axboe, linux-scsi
On Wed, 2008-01-16 at 16:01 +0200, Boaz Harrosh wrote:
> On Tue, Jan 15 2008 at 19:35 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote:
> > On Tue, Jan 15 2008 at 18:49 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >> On Tue, 2008-01-15 at 18:09 +0200, Boaz Harrosh wrote:
> >>> On Tue, Jan 15 2008 at 17:52 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >>>> I thought, now we had this new shiny code to increase the scatterlist
> >>>> table size I'd try it out. It turns out there's a pretty vast block
> >>>> conspiracy that prevents us going over 128 entries in a scatterlist.
> >>>>
> >>>> The first problems are in SCSI: The host parameters sg_tablesize and
> >>>> max_sectors are used to set the queue limits max_hw_segments and
> >>>> max_sectors respectively (the former is the maximum number of entries
> >>>> the HBA can tolerate in a scatterlist for each transaction, the latter
> >>>> is a total transfer cap on the maxiumum number of 512 byte sectors).
> >>>> The default settings, assuming the HBA doesn't vary them are
> >>>> sg_tablesize at SG_ALL (255) and max_sectors at SCSI_DEFAULT_MAX_SECTORS
> >>>> (1024). A quick calculation shows the latter is actually 512k or 128
> >>>> pages (at 4k pages), hence the persistent 128 entry limit.
> >>>>
> >>>> However, raising max_sectors and sg_tablesize together still doesn't
> >>>> help: There's actually an insidious limit sitting in the block layer as
> >>>> well. This is what blk_queue_max_sectors says:
> >>>>
> >>>> void blk_queue_max_sectors(struct request_queue *q, unsigned int
> >>>> max_sectors)
> >>>> {
> >>>> if ((max_sectors << 9) < PAGE_CACHE_SIZE) {
> >>>> max_sectors = 1 << (PAGE_CACHE_SHIFT - 9);
> >>>> printk("%s: set to minimum %d\n", __FUNCTION__, max_sectors);
> >>>> }
> >>>>
> >>>> if (BLK_DEF_MAX_SECTORS > max_sectors)
> >>>> q->max_hw_sectors = q->max_sectors = max_sectors;
> >>>> else {
> >>>> q->max_sectors = BLK_DEF_MAX_SECTORS;
> >>>> q->max_hw_sectors = max_sectors;
> >>>> }
> >>>> }
> >>>>
> >>>> So it imposes a maximum possible setting of BLK_DEF_MAX_SECTORS which is
> >>>> defined in blkdev.h to .... 1024, thus also forcing the queue down to
> >>>> 128 scatterlist entries.
> >>>>
> >>>> Once I raised this limit as well, I was able to transfer over 128
> >>>> scatterlist elements during benchmark test runs of normal I/O (actually
> >>>> kernel compiles seem best, they hit 608 scatterlist entries).
> >>>>
> >>>> So my question, is there any reason not to raise this limit to something
> >>>> large (like 65536) or even eliminate it altogether?
> >>>>
> >>>> James
> >>>>
> >>> I have an old branch here where I've swiped through the scsi drivers just
> >>> to remove the SG_ALL limit. Unfortunately some drivers mean laterally
> >>> 255 when using SG_ALL. So I passed driver by driver and carfully inspected
> >>> the code to change it to something driver specific if they really meant
> >>> 255.
> >>>
> >>> I have used sg_tablesize = ~0; to indicate, I don't care any will do,
> >>> and some driver constant if there is a real limit. Though removing
> >>> SG_ALL at the end.
> >>>
> >>> Should I freshen up this branch and send it.
> >> By all means; however, I think having the defined constant SG_ALL is
> >> useful (even if it is eventually just set to ~0) it means I can support
> >> any scatterlist size. Having the drivers set sg_tablesize correctly
> >> that can't support SG_ALL is pretty vital.
> >>
> >> Thanks,
> >>
> >> James
> > OK will do.
> >
> > I have found the old branch and am looking. I agree with you about the
> > SG_ALL. I will fix it to have a patch per changed driver, with out changing
> > SG_ALL, and then final patch to just change SG_ALL.
> >
> > Boaz
>
>
> Hi James,
> re-inspecting the code, what should I do with drivers that do not support
> chaining because their code still does sglist++?
>
> Should I set their sg_tablesize to SG_MAX_SINGLE_ALLOC, or hard-code it to
> 128, and put a FIXME in the submit message?
>
> Or should we fix them first and serialize this effort on top of those fixes?
> (Also in light of the other email where you removed the chaining flag.)
How many of them are left?
The correct value is clearly SCSI_MAX_SG_SEGMENTS, which fortunately
"[PATCH] remove use_sg_chaining" moved into a shared header. Worst
case, just use that and add a FIXME comment giving the real value (if
there is one).
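What that might look like for a driver that still walks its list with sglist++, sketched as a userspace stand-in. struct host_template_model, example_nonchaining_driver and fits_template() are hypothetical, and the value 128 for SCSI_MAX_SG_SEGMENTS is an assumption matching the 128-entry figure in this thread; a real driver would take the constant from the shared header instead.

```c
#include <assert.h>
#include <stddef.h>

#ifndef SCSI_MAX_SG_SEGMENTS
#define SCSI_MAX_SG_SEGMENTS 128 /* assumed value; normally from a header */
#endif

/* Hypothetical stand-in for the relevant part of a SCSI host template. */
struct host_template_model {
    const char *name;
    unsigned short sg_tablesize;
};

static const struct host_template_model example_nonchaining_driver = {
    .name = "example",
    /*
     * FIXME: this driver walks its scatterlist with sglist++, so it cannot
     * cross a chain link; cap it at one unchained table until that is
     * fixed. The hardware's real limit, if any, would be recorded here.
     */
    .sg_tablesize = SCSI_MAX_SG_SEGMENTS,
};

/* Hypothetical check: does a request with `nents` entries fit the template? */
static int fits_template(const struct host_template_model *t,
                         unsigned int nents)
{
    assert(t != NULL);
    return nents <= t->sg_tablesize;
}
```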
James
* Re: Actually using the sg table/chain code
From: James Bottomley @ 2008-01-16 15:47 UTC
To: Jens Axboe; +Cc: linux-scsi
On Wed, 2008-01-16 at 16:06 +0100, Jens Axboe wrote:
> On Tue, Jan 15 2008, James Bottomley wrote:
> > I thought, now we had this new shiny code to increase the scatterlist
> > table size I'd try it out. It turns out there's a pretty vast block
> > conspiracy that prevents us going over 128 entries in a scatterlist.
> >
> > The first problems are in SCSI: The host parameters sg_tablesize and
> > max_sectors are used to set the queue limits max_hw_segments and
> > max_sectors respectively (the former is the maximum number of entries
> > the HBA can tolerate in a scatterlist for each transaction, the latter
> > is a total transfer cap on the maxiumum number of 512 byte sectors).
> > The default settings, assuming the HBA doesn't vary them are
> > sg_tablesize at SG_ALL (255) and max_sectors at SCSI_DEFAULT_MAX_SECTORS
> > (1024). A quick calculation shows the latter is actually 512k or 128
> > pages (at 4k pages), hence the persistent 128 entry limit.
> >
> > However, raising max_sectors and sg_tablesize together still doesn't
> > help: There's actually an insidious limit sitting in the block layer as
> > well. This is what blk_queue_max_sectors says:
> >
> > void blk_queue_max_sectors(struct request_queue *q, unsigned int
> > max_sectors)
> > {
> > if ((max_sectors << 9) < PAGE_CACHE_SIZE) {
> > max_sectors = 1 << (PAGE_CACHE_SHIFT - 9);
> > printk("%s: set to minimum %d\n", __FUNCTION__, max_sectors);
> > }
> >
> > if (BLK_DEF_MAX_SECTORS > max_sectors)
> > q->max_hw_sectors = q->max_sectors = max_sectors;
> > else {
> > q->max_sectors = BLK_DEF_MAX_SECTORS;
> > q->max_hw_sectors = max_sectors;
> > }
> > }
> >
> > So it imposes a maximum possible setting of BLK_DEF_MAX_SECTORS which is
> > defined in blkdev.h to .... 1024, thus also forcing the queue down to
> > 128 scatterlist entries.
> >
> > Once I raised this limit as well, I was able to transfer over 128
> > scatterlist elements during benchmark test runs of normal I/O (actually
> > kernel compiles seem best, they hit 608 scatterlist entries).
> >
> > So my question, is there any reason not to raise this limit to something
> > large (like 65536) or even eliminate it altogether?
>
> That function is meant for low level drivers to set their hw limits. So
> ideally it should just set ->max_hw_sectors to what the driver asks for.
>
> As Jeff mentions, a long time ago we experimentally decided that going
> above 512k typically didn't yield any benefit, so Linux should not
> generate commands larger than that for normal fs I/O. That is what
> BLK_DEF_MAX_SECTORS does.
>
> IOW, the driver calls blk_queue_max_sectors() with its real limit (64MB,
> for instance). Linux then sets that as the hw limit, and puts a
> reasonable limit on the generated size based on
> throughput/latency/memory concerns. I think that is quite reasonable, and
> there's nothing preventing users from setting a larger size via sysfs
> by echoing something into queue/max_sectors_kb. You can set more than
> 512KB there easily, as long as max_hw_sectors_kb is honored.
Yes, I can buy the argument for filesystem I/O. What about tapes, which
currently use the block queue and have internal home-grown stuff to
handle larger transfers? How are they supposed to set the larger
default sector size? Just modify the bare q->max_sectors?
James
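The clamping James quotes, and the split Jens describes between the driver's hardware limit and the default soft limit, can be modelled in user space as below. This is a simplified sketch, not kernel code: struct queue_limits_model, model_max_sectors(), model_store_max_sectors_kb() and model_sg_entries() are hypothetical names, 4 KiB pages and 512-byte sectors are assumed, and BLK_DEF_MAX_SECTORS is taken as 1024, as stated in the thread. The sysfs check is only an approximation of the behaviour Jens describes for queue/max_sectors_kb.

```c
#include <assert.h>

#define SECTOR_SHIFT        9       /* 512-byte sectors */
#define MODEL_PAGE_SHIFT    12      /* assumption: 4 KiB pages */
#define MODEL_PAGE_SIZE     (1u << MODEL_PAGE_SHIFT)
#define BLK_DEF_MAX_SECTORS 1024u   /* value quoted from blkdev.h in the thread */

/* Hypothetical stand-in for the two request_queue fields discussed. */
struct queue_limits_model {
	unsigned int max_sectors;    /* soft limit used for normal fs I/O */
	unsigned int max_hw_sectors; /* hard limit reported by the driver */
};

/* Mirrors the logic of blk_queue_max_sectors() as quoted above. */
static void model_max_sectors(struct queue_limits_model *q, unsigned int max_sectors)
{
	assert(q != 0);

	/* Never allow less than one page per request. */
	if ((max_sectors << SECTOR_SHIFT) < MODEL_PAGE_SIZE)
		max_sectors = 1u << (MODEL_PAGE_SHIFT - SECTOR_SHIFT);

	if (BLK_DEF_MAX_SECTORS > max_sectors) {
		/* Driver limit is below the default: use it for both. */
		q->max_hw_sectors = q->max_sectors = max_sectors;
	} else {
		/* Driver limit is large: keep it as the hw limit, but cap
		 * normal I/O at BLK_DEF_MAX_SECTORS (512 KiB). */
		q->max_sectors = BLK_DEF_MAX_SECTORS;
		q->max_hw_sectors = max_sectors;
	}
}

/* Approximation of the sysfs path: a user may raise the soft limit via
 * queue/max_sectors_kb, but not above max_hw_sectors_kb and not below
 * one page. Returns 0 on success, -1 if the value is rejected. */
static int model_store_max_sectors_kb(struct queue_limits_model *q, unsigned int kb)
{
	unsigned int max_hw_kb;

	assert(q != 0);
	max_hw_kb = q->max_hw_sectors >> 1;  /* two sectors per KiB */
	if (kb > max_hw_kb || kb < (MODEL_PAGE_SIZE >> 10))
		return -1;
	q->max_sectors = kb << 1;
	return 0;
}

/* Page-sized scatterlist entries needed by a maximum-sized request,
 * assuming no two pages are physically contiguous. */
static unsigned int model_sg_entries(unsigned int sectors)
{
	return (sectors << SECTOR_SHIFT) / MODEL_PAGE_SIZE;
}
```

With a driver limit of 65536 sectors (32 MiB), the model ends up with max_hw_sectors = 65536 but max_sectors = 1024, and 1024 × 512 B = 512 KiB is 128 pages of 4 KiB: the 128-entry ceiling James hit. Writing a larger value through the modelled sysfs path lifts the soft limit only as far as the hardware limit allows.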
* Re: Actually using the sg table/chain code
2008-01-16 15:47 ` James Bottomley
@ 2008-01-16 16:08 ` Jens Axboe
2008-02-22 16:13 ` Mike Christie
1 sibling, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2008-01-16 16:08 UTC
To: James Bottomley; +Cc: linux-scsi
On Wed, Jan 16 2008, James Bottomley wrote:
>
> On Wed, 2008-01-16 at 16:06 +0100, Jens Axboe wrote:
> > That function is meant for low level drivers to set their hw limits. So
> > ideally it should just set ->max_hw_sectors to what the driver asks for.
> >
> > As Jeff mentions, a long time ago we experimentally decided that going
> > above 512k typically didn't yield any benefit, so Linux should not
> > generate commands larger than that for normal fs io. That is what
> > BLK_DEF_MAX_SECTORS does.
> >
> > IOW, the driver calls blk_queue_max_sectors() with its real limit - 64mb
> > for instance. Linux then sets that as the hw limit, and puts a
> > reasonable limit on the generated size based on a
> > throughput/latency/memory concern. I think that is quite reasonable, and
> > there's nothing preventing users from setting a larger size using sysfs
> > by echoing something into queue/max_sectors_kb. You can set > 512kb
> > there easily, as long as the max_hw_sectors_kb is honored.
>
> Yes, I can buy the argument for filesystem I/Os. What about tapes which
> currently use the block queue and have internal home grown stuff to
> handle larger transfers ... how are they supposed to set the larger
> default sector size? Just modify the bare q->max_sectors?
Yep, either that or we add a function for setting that.
--
Jens Axboe
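The second option Jens mentions, a dedicated setter rather than a bare write to q->max_sectors, might look like the sketch below. It is hypothetical: model_queue_set_default_max_sectors() is an illustrative name, not an existing block-layer function, and the struct models only the two relevant request_queue fields.

```c
#include <assert.h>

/* Hypothetical model of the two request_queue limits. */
struct queue_limits_model {
	unsigned int max_sectors;    /* default (soft) limit */
	unsigned int max_hw_sectors; /* hardware limit set by the driver */
};

/* Hypothetical helper: let a driver such as st raise its default
 * transfer limit above BLK_DEF_MAX_SECTORS, clamped to the hardware
 * limit. Returns the value actually applied. */
static unsigned int model_queue_set_default_max_sectors(struct queue_limits_model *q,
							 unsigned int sectors)
{
	assert(q != 0);
	if (sectors > q->max_hw_sectors)
		sectors = q->max_hw_sectors;
	q->max_sectors = sectors;
	return sectors;
}
```

The advantage over assigning q->max_sectors directly is that the helper keeps the invariant max_sectors <= max_hw_sectors in one place, which a bare assignment in each driver would not enforce.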
* Re: Actually using the sg table/chain code
2008-01-16 15:09 ` James Bottomley
@ 2008-01-16 16:11 ` Boaz Harrosh
2008-01-16 16:37 ` Boaz Harrosh
0 siblings, 1 reply; 15+ messages in thread
From: Boaz Harrosh @ 2008-01-16 16:11 UTC
To: James Bottomley; +Cc: Jens Axboe, linux-scsi
On Wed, Jan 16 2008 at 17:09 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> On Wed, 2008-01-16 at 16:01 +0200, Boaz Harrosh wrote:
>> Hi James,
>> re-inspecting the code, what should I do with drivers that do not support chaining
>> due to SW that still does sglist++?
>>
>> Should I set their sg_tablesize to SG_MAX_SINGLE_ALLOC, or hard-code it to 128, and put
>> a FIXME: in the submit message?
>>
>> Or should we fix them first and serialize this effort on top of those fixes?
>> (also in light of the other email where you removed the chaining flag)
>
> How many of them are left?
>
> The correct value is clearly SCSI_MAX_SG_SEGMENTS which fortunately
> "[PATCH] remove use_sg_chaining" moved into a shared header. Worst
> case, just use that and add a fixme comment giving the real value (if
> there is one).
>
> James
>
>
I have found 9 so far, with 10 more drivers to check. All but one of them
walk the list one entry at a time with SCp.buffer++, so once that is fixed
they should be able to go back to SG_ALL. But for now I will set them to
SCSI_MAX_SG_SEGMENTS as you requested. I have not checked drivers that did
not use SG_ALL, but I trust their limits are usually smaller.
Boaz
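Why drivers that advance with sglist++ or SCp.buffer++ have to be held to SCSI_MAX_SG_SEGMENTS can be shown with a toy model of a chained scatterlist. This is not the kernel's representation: the real struct scatterlist encodes the chain link in the low bits of page_link and is walked with sg_next() or for_each_sg(). Here a hypothetical explicit is_chain flag and model_sg_next() stand in for them, assuming, as James's reply implies, that a list of at most SCSI_MAX_SG_SEGMENTS entries fits in one table and is never chained.

```c
#include <assert.h>
#include <stddef.h>

/* Toy scatterlist slot: either a data segment or a link to another table. */
struct model_sg {
	unsigned int length;     /* bytes in this data segment */
	int is_chain;            /* nonzero: slot is a link, not data */
	int is_last;             /* nonzero: final data entry of the list */
	struct model_sg *chain;  /* next table, valid when is_chain */
};

/* Chaining-aware step, in the spirit of the kernel's sg_next(). */
static struct model_sg *model_sg_next(struct model_sg *sg)
{
	assert(sg != NULL);
	if (sg->is_last)
		return NULL;
	sg++;
	if (sg->is_chain)
		sg = sg->chain;  /* follow the link into the next table */
	return sg;
}

/* Total bytes using the chaining-aware iterator. */
static unsigned int model_total_chained(struct model_sg *sg)
{
	unsigned int total = 0;

	for (; sg != NULL; sg = model_sg_next(sg))
		total += sg->length;
	return total;
}

/* What an unconverted driver effectively does: plain pointer increment
 * over nslots slots. Correct only while the list is one flat table. */
static unsigned int model_total_flat(struct model_sg *sg, size_t nslots)
{
	unsigned int total = 0;

	for (size_t i = 0; i < nslots; i++, sg++)
		total += sg->length;
	return total;
}
```

For a list of four 4 KiB segments split across two tables, the chained walk sees all 16384 bytes, while the flat walk over the first table sees only 8192 and treats the link slot as an empty segment. A driver that instead incremented its pointer nents times would run past the end of the first table. Keeping sg_tablesize at SCSI_MAX_SG_SEGMENTS avoids the link slot entirely until such drivers are converted.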
* Re: Actually using the sg table/chain code
2008-01-16 16:11 ` Boaz Harrosh
@ 2008-01-16 16:37 ` Boaz Harrosh
2008-01-16 16:46 ` James Bottomley
0 siblings, 1 reply; 15+ messages in thread
From: Boaz Harrosh @ 2008-01-16 16:37 UTC
To: James Bottomley; +Cc: Jens Axboe, linux-scsi
On Wed, Jan 16 2008 at 18:11 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote:
> I have found 9 so far, with 10 more drivers to check. All but one of them
> walk the list one entry at a time with SCp.buffer++, so once that is fixed
> they should be able to go back to SG_ALL. But for now I will set them to
> SCSI_MAX_SG_SEGMENTS as you requested. I have not checked drivers that did
> not use SG_ALL, but I trust their limits are usually smaller.
>
> Boaz
>
>
Hi James,

Looking at the patches, I just realized that I made a mistake and did
not work on top of your "[PATCH] remove use_sg_chaining".
Rebasing should be easy, but I think my patch should go first, because
there are some 10-15 drivers that are not chaining-ready but will work
perfectly after my patch sets their sg_tablesize to SCSI_MAX_SG_SEGMENTS.

Should I rebase, or should "[PATCH] remove use_sg_chaining" be rebased?

Thanks
Boaz
* Re: Actually using the sg table/chain code
2008-01-16 16:37 ` Boaz Harrosh
@ 2008-01-16 16:46 ` James Bottomley
0 siblings, 0 replies; 15+ messages in thread
From: James Bottomley @ 2008-01-16 16:46 UTC
To: Boaz Harrosh; +Cc: Jens Axboe, linux-scsi
On Wed, 2008-01-16 at 18:37 +0200, Boaz Harrosh wrote:
> Hi James,
>
> Looking at the patches, I just realized that I made a mistake and did
> not work on top of your "[PATCH] remove use_sg_chaining".
> Rebasing should be easy, but I think my patch should go first, because
> there are some 10-15 drivers that are not chaining-ready but will work
> perfectly after my patch sets their sg_tablesize to SCSI_MAX_SG_SEGMENTS.
>
> Should I rebase, or should "[PATCH] remove use_sg_chaining" be rebased?
The order doesn't matter; the two patches are completely orthogonal.
Just send the list what you have ... I'm rebasing a lot of stuff fairly
often at this stage in the merge cycle.
James
* Re: Actually using the sg table/chain code
2008-01-16 15:47 ` James Bottomley
2008-01-16 16:08 ` Jens Axboe
@ 2008-02-22 16:13 ` Mike Christie
1 sibling, 0 replies; 15+ messages in thread
From: Mike Christie @ 2008-02-22 16:13 UTC
To: James Bottomley; +Cc: Jens Axboe, linux-scsi
James Bottomley wrote:
> Yes, I can buy the argument for filesystem I/Os. What about tapes which
> currently use the block queue and have internal home grown stuff to
> handle larger transfers ... how are they supposed to set the larger
> default sector size? Just modify the bare q->max_sectors?
>
Sorry for the late response. I have been doing userspace stuff and not
keeping up with linux-scsi :(
For scsi tape and passthrough (any place we use REQ_TYPE_BLOCK_PC like
with st or sg or block/scsi_ioctl or bsg), the block/bio/scatterlist
building code ignores q->max_sectors and uses q->max_hw_sectors.
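Mike's point can be summarized as "the cap that applies depends on the request type". The sketch below is a hypothetical model: enum model_req_type and model_limit_for() are illustrative names, and the real block code applies this distinction inside its bio/request mapping paths rather than through a single helper like this.

```c
#include <assert.h>

/* Hypothetical request classes, modelled on the distinction Mike draws. */
enum model_req_type {
	MODEL_REQ_FS,        /* normal filesystem I/O */
	MODEL_REQ_BLOCK_PC,  /* passthrough: st, sg, block/scsi_ioctl, bsg */
};

/* Hypothetical model of the two request_queue limits. */
struct queue_limits_model {
	unsigned int max_sectors;    /* soft limit (BLK_DEF_MAX_SECTORS cap) */
	unsigned int max_hw_sectors; /* hardware limit from the driver */
};

/* Which limit a request of the given type is built against. */
static unsigned int model_limit_for(const struct queue_limits_model *q,
				    enum model_req_type type)
{
	assert(q != 0);
	return type == MODEL_REQ_BLOCK_PC ? q->max_hw_sectors : q->max_sectors;
}
```

If this holds, a tape driver going through REQ_TYPE_BLOCK_PC only needs a large enough max_hw_sectors; the BLK_DEF_MAX_SECTORS clamp on max_sectors affects filesystem I/O only.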
Thread overview: 15+ messages
2008-01-15 15:52 Actually using the sg table/chain code James Bottomley
2008-01-15 16:09 ` Boaz Harrosh
2008-01-15 16:49 ` James Bottomley
2008-01-15 17:35 ` Boaz Harrosh
2008-01-16 14:01 ` Boaz Harrosh
2008-01-16 15:09 ` James Bottomley
2008-01-16 16:11 ` Boaz Harrosh
2008-01-16 16:37 ` Boaz Harrosh
2008-01-16 16:46 ` James Bottomley
2008-01-15 19:52 ` Jeff Garzik
2008-01-15 20:14 ` James Bottomley
2008-01-16 15:06 ` Jens Axboe
2008-01-16 15:47 ` James Bottomley
2008-01-16 16:08 ` Jens Axboe
2008-02-22 16:13 ` Mike Christie