* libata+SGIO: is .dma_boundary respected?
@ 2006-03-19 20:48 Mark Lord
2006-03-19 21:14 ` Jeff Garzik
0 siblings, 1 reply; 29+ messages in thread
From: Mark Lord @ 2006-03-19 20:48 UTC (permalink / raw)
To: Jens Axboe, Jeff Garzik, IDE/ATA development list
Jens / Jeff,
Each libata driver registers a .dma_boundary field with SCSI.
This field is used to prevent merging of bio segments across
a hardware limitation boundary, usually 0xffff.
This looks like it works for regular block I/O,
but I'm not so sure about SGIO originated requests.
Any thoughts, or code you can point me to?
Thanks
Mark
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-19 20:48 libata+SGIO: is .dma_boundary respected? Mark Lord
@ 2006-03-19 21:14 ` Jeff Garzik
2006-03-19 21:19 ` Mark Lord
0 siblings, 1 reply; 29+ messages in thread
From: Jeff Garzik @ 2006-03-19 21:14 UTC (permalink / raw)
To: Mark Lord; +Cc: Jens Axboe, IDE/ATA development list
Mark Lord wrote:
> Jens / Jeff,
>
> Each libata driver registers a .dma_boundary field with SCSI.
> This field is used to prevent merging of bio segments across
> a hardware limitation boundary, usually 0xffff.
>
> This looks like it works for regular block I/O,
> but I'm not so sure about SGIO originated requests.
>
> Any thoughts, or code you can point me to?
Everything goes through the block layer, including SG_IO, so everyone
agrees on the boundaries that must be respected.
scsi sets blk_queue_segment_boundary() then gets out of the way, for the
most part. BIOVEC_SEG_BOUNDARY() is the macro that accesses this.
Trace back and forth from there. You will probably run into a call to
blk_recount_segments() in __bio_add_page(), or maybe you'll check the
seg boundary from another path.
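For reference, the boundary test behind BIOVEC_SEG_BOUNDARY() boils down to a
mask comparison; here is a minimal sketch of the idea (illustrative helper
name, not the kernel's actual macro):
	/* Two regions may share one DMA segment only if they sit inside the
	 * same boundary-aligned window.  'mask' is the value handed to
	 * blk_queue_segment_boundary(), e.g. 0xffff for a 64K boundary. */
	static inline int within_same_seg_window(unsigned long start,
						 unsigned long end, /* exclusive */
						 unsigned long mask)
	{
		/* ORing the mask onto both ends yields the same value iff the
		 * region does not cross a (mask+1)-sized boundary */
		return (start | mask) == ((end - 1) | mask);
	}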
Jeff
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-19 21:14 ` Jeff Garzik
@ 2006-03-19 21:19 ` Mark Lord
2006-03-19 21:38 ` Jeff Garzik
0 siblings, 1 reply; 29+ messages in thread
From: Mark Lord @ 2006-03-19 21:19 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Jens Axboe, IDE/ATA development list
Jeff Garzik wrote:
> Mark Lord wrote:
>> Jens / Jeff,
>>
>> Each libata driver registers a .dma_boundary field with SCSI.
>> This field is used to prevent merging of bio segments across
>> a hardware limitation boundary, usually 0xffff.
>>
>> This looks like it works for regular block I/O,
>> but I'm not so sure about SGIO originated requests.
>>
>> Any thoughts, or code you can point me to?
>
> Everything goes through the block layer, including SG_IO, so everyone
> agrees on the boundaries that must be respected.
>
> scsi sets blk_queue_segment_boundary() then gets out of the way, for the
> most part. BIOVEC_SEG_BOUNDARY() is the macro that accesses this. Trace
> back and forth from there. You will probably run into a call to
> blk_recount_segments() in __bio_add_page(), or maybe you'll check the
> seg boundary from another path.
Yeah, I'm familiar with that part, and thanks for the note about SGIO.
So therefore, code to manage the dma_boundary is NOT necessary in sata drivers.
Right? Currently we have in sata_mv.c:
MV_DMA_BOUNDARY = 0xffff;

while (sg_len) {
        offset = addr & MV_DMA_BOUNDARY;
        len = sg_len;
        if ((offset + sg_len) > 0x10000)
                len = 0x10000 - offset;
        ...
That whole block should be able to go, then.
Cheers
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-19 21:19 ` Mark Lord
@ 2006-03-19 21:38 ` Jeff Garzik
2006-03-19 21:45 ` Mark Lord
0 siblings, 1 reply; 29+ messages in thread
From: Jeff Garzik @ 2006-03-19 21:38 UTC (permalink / raw)
To: Mark Lord; +Cc: Jens Axboe, IDE/ATA development list
Mark Lord wrote:
> Jeff Garzik wrote:
>
>> Mark Lord wrote:
>>
>>> Jens / Jeff,
>>>
>>> Each libata driver registers a .dma_boundary field with SCSI.
>>> This field is used to prevent merging of bio segments across
>>> a hardware limitation boundary, usually 0xffff.
>>>
>>> This looks like it works for regular block I/O,
>>> but I'm not so sure about SGIO originated requests.
>>>
>>> Any thoughts, or code you can point me to?
>>
>>
>> Everything goes through the block layer, including SG_IO, so everyone
>> agrees on the boundaries that must be respected.
>>
>> scsi sets blk_queue_segment_boundary() then gets out of the way, for
>> the most part. BIOVEC_SEG_BOUNDARY() is the macro that accesses this.
>> Trace back and forth from there. You will probably run into a call to
>> blk_recount_segments() in __bio_add_page(), or maybe you'll check the
>> seg boundary from another path.
>
>
> Yeah, I'm familiar with that part, and thanks for the note about SGIO.
>
> So therefore, code to manage the dma_boundary is NOT necessary in sata
> drivers. Right? Currently we have in sata_mv.c:
>
> MV_DMA_BOUNDARY = 0xffff;
> while (sg_len) {
>         offset = addr & MV_DMA_BOUNDARY;
>         len = sg_len;
>         if ((offset + sg_len) > 0x10000)
>                 len = 0x10000 - offset;
>         ...
>
>
> That whole block should be able to go, then.
Incorrect. :)
The idiot IOMMU layer may merge too aggressively, which is the reason
for this code and similar code in ata_fill_sg(). The IOMMU stuff always
happens at pci_map_sg() time, after the block layer gets out of the way.
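For illustration, the splitting such drivers do after pci_map_sg() amounts to
clipping each mapped segment at every 64K boundary; a rough sketch in the
spirit of the code quoted above (the function and emit_prd() are hypothetical
stand-ins, not the exact libata code):

	/* Clip one mapped segment so that no PRD entry crosses a 64K
	 * boundary.  emit_prd() represents writing one hardware PRD entry. */
	static void fill_prds_for_segment(u64 addr, u32 sg_len)
	{
		while (sg_len) {
			u32 offset = addr & 0xffff;	/* position inside the 64K window */
			u32 len = sg_len;

			if ((offset + sg_len) > 0x10000)
				len = 0x10000 - offset;	/* clip at the boundary */

			emit_prd(addr, len);		/* hypothetical helper */
			addr   += len;
			sg_len -= len;
		}
	}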
Jeff
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-19 21:38 ` Jeff Garzik
@ 2006-03-19 21:45 ` Mark Lord
2006-03-19 21:54 ` Mark Lord
2006-03-21 1:15 ` Jeff Garzik
0 siblings, 2 replies; 29+ messages in thread
From: Mark Lord @ 2006-03-19 21:45 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Jens Axboe, IDE/ATA development list
Jeff Garzik wrote:
> Mark Lord wrote:
>
>> So therefore, code to manage the dma_boundary is NOT necessary in sata
>> drivers. Right? Currently we have in sata_mv.c:
>>
>> MV_DMA_BOUNDARY = 0xffff;
>> while (sg_len) {
>>         offset = addr & MV_DMA_BOUNDARY;
>>         len = sg_len;
>>         if ((offset + sg_len) > 0x10000)
>>                 len = 0x10000 - offset;
>>         ...
>>
>>
>> That whole block should be able to go, then.
>
> Incorrect. :)
>
> The idiot IOMMU layer may merge too aggressively, which is the reason
> for this code and similar code in ata_fill_sg(). The IOMMU stuff always
> happens at pci_map_sg() time, after the block layer gets out of the way.
Ahh.. then how does the low-level driver know what to use for ".sg_tablesize"?
It cannot use the real hardware/driver value, because it may need to do
request splitting. I wonder what the worst case number of splits required
is, for each sg[] entry?
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-19 21:45 ` Mark Lord
@ 2006-03-19 21:54 ` Mark Lord
2006-03-21 1:18 ` Jeff Garzik
2006-03-21 1:15 ` Jeff Garzik
1 sibling, 1 reply; 29+ messages in thread
From: Mark Lord @ 2006-03-19 21:54 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Jens Axboe, IDE/ATA development list
Mark Lord wrote:
> Jeff Garzik wrote:
>
>> The idiot IOMMU layer may merge too aggressively, which is the reason
>> for this code and similar code in ata_fill_sg(). The IOMMU stuff
>> always happens at pci_map_sg() time, after the block layer gets out of
>> the way.
>
> Ahh.. then how does the low-level driver know what to use for
> ".sg_tablesize"?
>
> It cannot use the real hardware/driver value, because it may need to do
> request splitting. I wonder what the worst case number of splits required
> is, for each sg[] entry?
Mmmm. I suppose the answer is that the block layer guarantees
no more than .sg_tablesize entries, and the IOMMU layer may reduce
the segment count, but never increase it.
So the low-level driver should be able to safely use its own internal
hardware/driver limit when registering .sg_tablesize.
Mmm.. sata_mv.c currently divides by two.
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-19 21:45 ` Mark Lord
2006-03-19 21:54 ` Mark Lord
@ 2006-03-21 1:15 ` Jeff Garzik
1 sibling, 0 replies; 29+ messages in thread
From: Jeff Garzik @ 2006-03-21 1:15 UTC (permalink / raw)
To: Mark Lord; +Cc: Jens Axboe, IDE/ATA development list
Mark Lord wrote:
> Jeff Garzik wrote:
>
>> Mark Lord wrote:
>>
>>> So therefore, code to manage the dma_boundary is NOT necessary in
>>> sata drivers. Right? Currently we have in sata_mv.c:
>>>
>>> MV_DMA_BOUNDARY = 0xffff;
>>> while (sg_len) {
>>>         offset = addr & MV_DMA_BOUNDARY;
>>>         len = sg_len;
>>>         if ((offset + sg_len) > 0x10000)
>>>                 len = 0x10000 - offset;
>>>         ...
>>>
>>>
>>> That whole block should be able to go, then.
>>
>>
>> Incorrect. :)
>>
>> The idiot IOMMU layer may merge too aggressively, which is the reason
>> for this code and similar code in ata_fill_sg(). The IOMMU stuff
>> always happens at pci_map_sg() time, after the block layer gets out of
>> the way.
>
>
> Ahh.. then how does the low-level driver know what to use for
> ".sg_tablesize"?
>
> It cannot use the real hardware/driver value, because it may need to do
> request splitting. I wonder what the worst case number of splits required
> is, for each sg[] entry?
To answer that question, you have to take into account the 64k DMA
boundary requirement, the worst case split (==sg_tablesize), and how
many splits are required for each s/g entry -- in the case of sata_mv.c and
ata_fill_sg(), the worst case is one split per s/g entry.
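As a back-of-the-envelope illustration of why sata_mv divides by two (the
constant name is purely a stand-in for the real driver value):

	/* If the hardware PRD table holds HW_PRD_ENTRIES and, worst case,
	 * every sg entry returned by pci_map_sg() needs exactly one
	 * 64K-boundary split (i.e. becomes two PRD entries), then advertising
	 * half the table keeps the worst case within bounds:
	 *
	 *	sg_tablesize    = HW_PRD_ENTRIES / 2
	 *	worst-case PRDs = 2 * sg_tablesize = HW_PRD_ENTRIES
	 */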
Jeff
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-19 21:54 ` Mark Lord
@ 2006-03-21 1:18 ` Jeff Garzik
2006-03-21 4:43 ` Mark Lord
0 siblings, 1 reply; 29+ messages in thread
From: Jeff Garzik @ 2006-03-21 1:18 UTC (permalink / raw)
To: Mark Lord; +Cc: Jens Axboe, IDE/ATA development list
Mark Lord wrote:
> Mark Lord wrote:
>
>> Jeff Garzik wrote:
>>
>>> The idiot IOMMU layer may merge too aggressively, which is the reason
>>> for this code and similar code in ata_fill_sg(). The IOMMU stuff
>>> always happens at pci_map_sg() time, after the block layer gets out
>>> of the way.
>>
>>
>> Ahh.. then how does the low-level driver know what to use for
>> ".sg_tablesize"?
>>
>> It cannot use the real hardware/driver value, because it may need to do
>> request splitting. I wonder what the worst case number of splits
>> required
>> is, for each sg[] entry?
>
>
> Mmmm. I suppose the answer is that the block layer guarantees
> no more than .sg_tablesize entries, and the IOMMU layer may reduce
> the segment count, but never increase it.
>
> So the low-level driver should be able to safely use it's own internal
> hardware/driver limit when registering .sg_tablesize.
The IOMMU layer can merge across 64k boundaries, yet still produce a
worst case s/g entry count. Thus, you wind up with sg_tablesize
entries, and splits still to be done.
That's why drivers that worry about the 64k boundary have to give a false
sg_tablesize to the SCSI layer: to reserve sufficient "true" s/g entries
for the worst case IOMMU split.
Jeff
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-21 1:18 ` Jeff Garzik
@ 2006-03-21 4:43 ` Mark Lord
2006-03-21 6:14 ` Jeff Garzik
0 siblings, 1 reply; 29+ messages in thread
From: Mark Lord @ 2006-03-21 4:43 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Jens Axboe, IDE/ATA development list
Jeff Garzik wrote:
> Mark Lord wrote:
..
>>> Ahh.. then how does the low-level driver know what to use for
>>> ".sg_tablesize"?
>>>
>>> It cannot use the real hardware/driver value, because it may need to do
>>> request splitting. I wonder what the worst case number of splits
>>> required
>>> is, for each sg[] entry?
>>
>>
>> Mmmm. I suppose the answer is that the block layer guarantees
>> no more than .sg_tablesize entries, and the IOMMU layer may reduce
>> the segment count, but never increase it.
>>
>> So the low-level driver should be able to safely use it's own internal
>> hardware/driver limit when registering .sg_tablesize.
>
> The IOMMU layer can merge across 64k boundaries, yet still produce a
> worst case s/g entry count. Thus, you wind up with sg_tablesize
> entries, and splits still to be done.
>
> That's why drivers that worry about 64k boundary have to give a false
> sg_tablesize to the SCSI layer: to reserve sufficient "true" s/g entries
> for the worst case IOMMU split.
But what is the worst case? What's to stop the IOMMU layer from merging,
say, thirty 64KB segments into a single SG entry, and then doing that
several times? Nothing.
But (as I replied to myself earlier), I think it is a non-issue,
because the IOMMU merging cannot produce more SG entries than
there were originally. It may produce fewer, and the driver may then
end up splitting them apart again, but it will never exceed what
the block layer permitted in the first place.
So, I think that means the driver can report the real SG tablesize,
and not worry about divide-by-??? margins.
Cheers
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-21 4:43 ` Mark Lord
@ 2006-03-21 6:14 ` Jeff Garzik
2006-03-21 13:59 ` Mark Lord
0 siblings, 1 reply; 29+ messages in thread
From: Jeff Garzik @ 2006-03-21 6:14 UTC (permalink / raw)
To: Mark Lord; +Cc: Jens Axboe, IDE/ATA development list
Mark Lord wrote:
> But (as I replied to myself earlier), I think it is a non issue,
> because the IOMMU merging cannot produce more SG entries than
> there were originally. It may produce less, and the driver may then
> end up splitting them apart again, but it will never exceed what
> the block layer permitted in the first place.
That says nothing about the boundaries upon which the IOMMU layer will
or will not merge. Without the fix, the problem case happens when (for
example) the IOMMU output produces sg_tablesize segments, but some of
those segments cross a 64k boundary and need to be split.
Jeff
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-21 6:14 ` Jeff Garzik
@ 2006-03-21 13:59 ` Mark Lord
2006-03-21 18:42 ` Jens Axboe
0 siblings, 1 reply; 29+ messages in thread
From: Mark Lord @ 2006-03-21 13:59 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Jens Axboe, IDE/ATA development list
Jeff Garzik wrote:
> Mark Lord wrote:
>> But (as I replied to myself earlier), I think it is a non issue,
>> because the IOMMU merging cannot produce more SG entries than
>> there were originally. It may produce less, and the driver may then
>> end up splitting them apart again, but it will never exceed what
>> the block layer permitted in the first place.
>
> That says nothing about the boundaries upon which the IOMMU layer will
> or will not merge. Without the fix, the problem case happens when (for
> example) the IOMMU output produces sg_tablesize segments, but some of
> those segments cross a 64k boundary and need to be split.
Yes, but the merging happens *after* the block layer has already
guaranteed an sg list that respects what the driver told it.
So worst case, the IOMMU merges the entire sg list into a single
multi-megabyte segment, and then the driver's fill_sg() function
splits it all apart again while making up its PRD list.
Even in that worst case, the number of segments for the PRD list
will *always* be less than or equal to the size of the original
block layer sg list.
So no need for hocus-pocus divide by 2.
Good thing, too, because "divide by 2" would also fail
if the above stuff were not true.
Think about it some more.
Jens, you know more about this stuff than most folks:
What am I missing?
Thanks.
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-21 13:59 ` Mark Lord
@ 2006-03-21 18:42 ` Jens Axboe
2006-03-21 19:18 ` Mark Lord
0 siblings, 1 reply; 29+ messages in thread
From: Jens Axboe @ 2006-03-21 18:42 UTC (permalink / raw)
To: Mark Lord; +Cc: Jeff Garzik, IDE/ATA development list
On Tue, Mar 21 2006, Mark Lord wrote:
> Jeff Garzik wrote:
> >Mark Lord wrote:
> >>But (as I replied to myself earlier), I think it is a non issue,
> >>because the IOMMU merging cannot produce more SG entries than
> >>there were originally. It may produce less, and the driver may then
> >>end up splitting them apart again, but it will never exceed what
> >>the block layer permitted in the first place.
> >
> >That says nothing about the boundaries upon which the IOMMU layer will
> >or will not merge. Without the fix, the problem case happens when (for
> >example) the IOMMU output produces sg_tablesize segments, but some of
> >those segments cross a 64k boundary and need to be split.
>
> Yes, but the merging happens *after* the block layer has already
> guaranteed an sg list that respects what the driver told it.
>
> So worst case, the IOMMU merges the entire sg list into a single
> multi-megabyte segment, and then the driver's fill_sg() function
> splits it all apart again while making up it's PRD list.
>
> Even in that worst case, the number of segments for the PRD list
> will *always* be less than or equal to the size of the original
> block layer sg list.
>
> So no need for hocus-pocus divide by 2.
> Good thing, too, because "divide by 2" would also fail
> if the above stuff were not true.
>
> Think about it some more.
>
> Jens, you know more about this stuff than most folks:
> What am I missing?
Seems to me that your reasoning is correct. It's a fact that the
original block-mapped sg list satisfies all requirements of the device
driver and/or hardware; otherwise it would be a bug. The iommu may go nuts
of course, but logically that new sg list should be choppable back into the
same requirements.
It would be much nicer if the iommu actually had some more knowledge,
ideally the same requirements that the block layer is faced with. No
driver should have to check the mapped sg list.
--
Jens Axboe
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-21 18:42 ` Jens Axboe
@ 2006-03-21 19:18 ` Mark Lord
2006-03-21 19:29 ` Jeff Garzik
` (2 more replies)
0 siblings, 3 replies; 29+ messages in thread
From: Mark Lord @ 2006-03-21 19:18 UTC (permalink / raw)
To: Jens Axboe; +Cc: Jeff Garzik, IDE/ATA development list
Jens Axboe wrote:
..
> Seems to me that your reasoning is correct. It's a fact that the
> original block mapped sg lists satisfies all requirements of the device
> driver and/or hardware, otherwise would be a bug. The iommu may go nuts
> of course, but logically that new sg list should be choppable into the
> same requirements.
I just finished going through all of the arch implementations and,
as near as I can tell, they only ever *merge* sg list items,
and never create additional sg entries.
So low-level drivers (at present) can safely report their real limits,
and then in their fill_sg() routines they can run around and split up
any IOMMU merges that their hardware cannot tolerate.
> It would be much nicer if the iommu actually had some more knowledge,
> ideally the same requirements that the block layer is faced with. No
> driver should have to check the mapped sg list.
Yup. Absolutely. So long as they continue to never *add* new sg entries
(only doing merges instead), then I believe they just need to know the
device's .dma_boundary parameter. We could pass this to them as an extra
parameter, or perhaps embed it into the sg_list data structure somehow.
In the case of sata_mv on the Marvell 6081 (which I'm looking at this week)
its hardware limit is actually 0xffffffff rather than 0xffff.
I wonder how well Linux drivers in general deal with that on a 64-bit machine?
Cheers
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-21 19:18 ` Mark Lord
@ 2006-03-21 19:29 ` Jeff Garzik
2006-03-21 19:31 ` Mark Lord
2006-03-21 19:31 ` Jens Axboe
2006-03-22 11:25 ` Tejun Heo
2 siblings, 1 reply; 29+ messages in thread
From: Jeff Garzik @ 2006-03-21 19:29 UTC (permalink / raw)
To: Mark Lord
Cc: Jens Axboe, IDE/ATA development list, James Bottomley,
Benjamin Herrenschmidt
Mark Lord wrote:
> Jens Axboe wrote:
> ..
>
>> Seems to me that your reasoning is correct. It's a fact that the
>> original block mapped sg lists satisfies all requirements of the device
>> driver and/or hardware, otherwise would be a bug. The iommu may go nuts
>> of course, but logically that new sg list should be choppable into the
>> same requirements.
>
>
> I just finished going through all of the arch implementations and,
> as near as I can tell, they only ever *merge* sg list items,
> and never create additional sg entries.
>
> So low-level drivers (at present) can safely report their real limits,
> and then in their fill_sg() routines they can run around and split up
> any IOMMU merges that their hardware cannot tolerate.
I remain highly skeptical, and would be interested to see James and Ben
weigh in on the subject, as they were the key iommu vmerge people
around the time libata's ata_fill_sg() was originally written (and fixed
by BenH).
>> It would be much nicer if the iommu actually had some more knowledge,
>> ideally the same requirements that the block layer is faced with. No
>> driver should have to check the mapped sg list.
> Yup. Absolutely. So long as they continue to never *add* new sg entries
> (only doing merges instead), then I believe they just need to know the
> device's .dma_boundary parameter. We could pass this to them as an extra
> parameters, or perhaps embed it into the sg_list data structure somehow.
>
> In the case of sata_mv on the Marvell 6081 (which I'm looking at this week)
> it's hardware limit is actually 0xffffffff rather than 0xffff.
If the limit is not 0xffff, then there's no need for any of this
limitation junk. No s/g entry splitting after pci_map_sg(), no
artificial sg_tablesize limitation, etc.
> I wonder how well Linux drivers in general deal with that on a 64-bit
> machine?
Works just fine.
Jeff
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-21 19:29 ` Jeff Garzik
@ 2006-03-21 19:31 ` Mark Lord
2006-03-21 19:33 ` Mark Lord
0 siblings, 1 reply; 29+ messages in thread
From: Mark Lord @ 2006-03-21 19:31 UTC (permalink / raw)
To: Jeff Garzik
Cc: Jens Axboe, IDE/ATA development list, James Bottomley,
Benjamin Herrenschmidt
Jeff Garzik wrote:
>
>> In the case of sata_mv on the Marvell 6081 (which I'm looking at this
>> week)
>> it's hardware limit is actually 0xffffffff rather than 0xffff.
>
> If the limit is not 0xffff, then there's no need for any of this
> limitation junk. No s/g entry splitting after pci_map_sg(), no
> artificial sg_tablesize limitation, etc.
Not even for a merged IOMMU segment that crosses the 4GB "boundary"?
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-21 19:18 ` Mark Lord
2006-03-21 19:29 ` Jeff Garzik
@ 2006-03-21 19:31 ` Jens Axboe
2006-03-21 19:36 ` Mark Lord
2006-03-22 11:25 ` Tejun Heo
2 siblings, 1 reply; 29+ messages in thread
From: Jens Axboe @ 2006-03-21 19:31 UTC (permalink / raw)
To: Mark Lord; +Cc: Jeff Garzik, IDE/ATA development list
On Tue, Mar 21 2006, Mark Lord wrote:
> Jens Axboe wrote:
> ..
> >Seems to me that your reasoning is correct. It's a fact that the
> >original block mapped sg lists satisfies all requirements of the device
> >driver and/or hardware, otherwise would be a bug. The iommu may go nuts
> >of course, but logically that new sg list should be choppable into the
> >same requirements.
>
> I just finished going through all of the arch implementations and,
> as near as I can tell, they only ever *merge* sg list items,
> and never create additional sg entries.
>
> So low-level drivers (at present) can safely report their real limits,
> and then in their fill_sg() routines they can run around and split up
> any IOMMU merges that their hardware cannot tolerate.
Definitely, it would be highly illegal for the iommu code to do that.
And pretty odd, too :-)
> >It would be much nicer if the iommu actually had some more knowledge,
> >ideally the same requirements that the block layer is faced with. No
> >driver should have to check the mapped sg list.
>
> Yup. Absolutely. So long as they continue to never *add* new sg entries
> (only doing merges instead), then I believe they just need to know the
> device's .dma_boundary parameter. We could pass this to them as an extra
> parameters, or perhaps embed it into the sg_list data structure somehow.
You want max size as well, so boundary and max size should be enough.
> In the case of sata_mv on the Marvell 6081 (which I'm looking at this
> week)
> it's hardware limit is actually 0xffffffff rather than 0xffff.
>
> I wonder how well Linux drivers in general deal with that on a 64-bit
> machine?
0xffffffff is the default boundary exactly because we assume (based on
experience and real hardware, aic7xxx springs to mind) that most
hardware cannot deal with a 4GB wrap.
--
Jens Axboe
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-21 19:31 ` Mark Lord
@ 2006-03-21 19:33 ` Mark Lord
2006-03-21 19:35 ` Jens Axboe
0 siblings, 1 reply; 29+ messages in thread
From: Mark Lord @ 2006-03-21 19:33 UTC (permalink / raw)
To: Jeff Garzik
Cc: Jens Axboe, IDE/ATA development list, James Bottomley,
Benjamin Herrenschmidt
Mark Lord wrote:
> Jeff Garzik wrote:
>>
>>> In the case of sata_mv on the Marvell 6081 (which I'm looking at this
>>> week)
>>> it's hardware limit is actually 0xffffffff rather than 0xffff.
>>
>> If the limit is not 0xffff, then there's no need for any of this
>> limitation junk. No s/g entry splitting after pci_map_sg(), no
>> artificial sg_tablesize limitation, etc.
>
> Not even for a merged IOMMU segment that crosses the 4GB "boundary" ?
Clarification: this is a 64-bit PCI(e/X) device, and the above query
applies mainly to its use in a 64-bit slot on a 64-bit kernel.
It's not clear to me whether this can be an issue on a 32-bit kernel
on 36-bit hardware, though.
Cheers
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-21 19:33 ` Mark Lord
@ 2006-03-21 19:35 ` Jens Axboe
2006-03-21 19:38 ` Jeff Garzik
0 siblings, 1 reply; 29+ messages in thread
From: Jens Axboe @ 2006-03-21 19:35 UTC (permalink / raw)
To: Mark Lord
Cc: Jeff Garzik, IDE/ATA development list, James Bottomley,
Benjamin Herrenschmidt
On Tue, Mar 21 2006, Mark Lord wrote:
> Mark Lord wrote:
> >Jeff Garzik wrote:
> >>
> >>>In the case of sata_mv on the Marvell 6081 (which I'm looking at this
> >>>week)
> >>>it's hardware limit is actually 0xffffffff rather than 0xffff.
> >>
> >>If the limit is not 0xffff, then there's no need for any of this
> >>limitation junk. No s/g entry splitting after pci_map_sg(), no
> >>artificial sg_tablesize limitation, etc.
> >
> >Not even for a merged IOMMU segment that crosses the 4GB "boundary" ?
>
> Clarification: this is a 64-bit PCI(e/X) device, and the above query
> applies mainly to it's use in a 64-bit slot on a 64-bit kernel.
>
> It's not clear to me whether this can be an issue on a 32-bit kernel
> on 36-bit hardware, though.
My explanation was for the block layer part, of course; I'm hoping (I did
not check) that the iommu has similarly sane defaults.
But this still really wants a unification of the dma restrictions...
--
Jens Axboe
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-21 19:31 ` Jens Axboe
@ 2006-03-21 19:36 ` Mark Lord
2006-03-21 19:43 ` Jeff Garzik
0 siblings, 1 reply; 29+ messages in thread
From: Mark Lord @ 2006-03-21 19:36 UTC (permalink / raw)
To: Jens Axboe; +Cc: Jeff Garzik, IDE/ATA development list
Jens Axboe wrote:
> On Tue, Mar 21 2006, Mark Lord wrote:
>> Jens Axboe wrote:
..
>>> It would be much nicer if the iommu actually had some more knowledge,
>>> ideally the same requirements that the block layer is faced with. No
>>> driver should have to check the mapped sg list.
>> Yup. Absolutely. So long as they continue to never *add* new sg entries
>> (only doing merges instead), then I believe they just need to know the
>> device's .dma_boundary parameter. We could pass this to them as an extra
>> parameters, or perhaps embed it into the sg_list data structure somehow.
>
> You want max size as well, so boundary and max size should be enough.
Oh, now there's a thought. How do we specify "max segment size" today?
I'll need to make sure that sata_mv still enforces that limit (64KB), even
though it doesn't care about crossing 64KB boundaries.
Thanks
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-21 19:35 ` Jens Axboe
@ 2006-03-21 19:38 ` Jeff Garzik
2006-03-21 19:42 ` Jens Axboe
2006-03-21 19:43 ` James Bottomley
0 siblings, 2 replies; 29+ messages in thread
From: Jeff Garzik @ 2006-03-21 19:38 UTC (permalink / raw)
To: Jens Axboe
Cc: Mark Lord, IDE/ATA development list, James Bottomley,
Benjamin Herrenschmidt
Jens Axboe wrote:
> My explanation was for the block layer part of course, I'm hoping (did
> not check) that the iommu has similar sane defaults.
Part of the problem is that the iommu doesn't know as much as the block
layer.
> But this still really wants a unification of the dma restrictions...
Strongly agreed. ISTR JamesB had some concrete thoughts in that
direction, but they never made it beyond an IRC channel and/or a few emails.
Jeff
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-21 19:38 ` Jeff Garzik
@ 2006-03-21 19:42 ` Jens Axboe
2006-03-21 19:43 ` James Bottomley
1 sibling, 0 replies; 29+ messages in thread
From: Jens Axboe @ 2006-03-21 19:42 UTC (permalink / raw)
To: Jeff Garzik
Cc: Mark Lord, IDE/ATA development list, James Bottomley,
Benjamin Herrenschmidt
On Tue, Mar 21 2006, Jeff Garzik wrote:
> Jens Axboe wrote:
> >My explanation was for the block layer part of course, I'm hoping (did
> >not check) that the iommu has similar sane defaults.
>
> Part of the problem is that the iommu doesn't know as much as the block
> layer.
Right, this is what needs fixing.
> >But this still really wants a unification of the dma restrictions...
>
> Strongly agreed. ISTR JamesB had some concrete thoughts in that
> direction, but they never made it beyond an IRC channel and/or a few
> emails.
The pci dev already holds the dma address; we could add the dma boundary and
dma segment size as well?
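Something along these lines, purely as a sketch of the suggestion (none of
these names exist in the tree as-is):

	/* Hypothetical: hang the DMA restrictions off the device so the arch
	 * dma_map_sg()/iommu path can consult them when deciding what to merge. */
	struct dma_restrictions {
		unsigned long	seg_boundary_mask;	/* e.g. 0xffff or 0xffffffff */
		unsigned int	max_segment_size;	/* e.g. 65536 */
	};
	/* e.g. embedded in (or pointed to from) struct pci_dev, filled in by
	 * the driver before its first dma_map_sg() call. */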
--
Jens Axboe
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-21 19:38 ` Jeff Garzik
2006-03-21 19:42 ` Jens Axboe
@ 2006-03-21 19:43 ` James Bottomley
2006-03-21 19:46 ` Jens Axboe
1 sibling, 1 reply; 29+ messages in thread
From: James Bottomley @ 2006-03-21 19:43 UTC (permalink / raw)
To: Jeff Garzik
Cc: Jens Axboe, Mark Lord, IDE/ATA development list,
Benjamin Herrenschmidt
On Tue, 2006-03-21 at 14:38 -0500, Jeff Garzik wrote:
> Strongly agreed. ISTR JamesB had some concrete thoughts in that
> direction, but they never made it beyond an IRC channel and/or a few emails.
Actually, I had a patch for it ... but it never really went anywhere.
The argument being that the machines which would actually need it didn't
use IDE anyway ...
James
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-21 19:36 ` Mark Lord
@ 2006-03-21 19:43 ` Jeff Garzik
2006-03-21 20:51 ` Mark Lord
0 siblings, 1 reply; 29+ messages in thread
From: Jeff Garzik @ 2006-03-21 19:43 UTC (permalink / raw)
To: Mark Lord; +Cc: Jens Axboe, IDE/ATA development list
Mark Lord wrote:
> Oh, now there's a thought. How do we specify "max segment size" today?
> I'll need to make sure that sata_mv still does that (64KB), even though
> it doesn't care about crossing 64KB boundaries.
Agreed. 50xx is the same: 64k segment size limit, but the dma boundary is
really 0xffffffff. Good catch; I had made the mistaken assumption that
the standard IDE 64k boundary applied there as well.
Jeff
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-21 19:43 ` James Bottomley
@ 2006-03-21 19:46 ` Jens Axboe
2006-03-21 20:44 ` James Bottomley
0 siblings, 1 reply; 29+ messages in thread
From: Jens Axboe @ 2006-03-21 19:46 UTC (permalink / raw)
To: James Bottomley
Cc: Jeff Garzik, Mark Lord, IDE/ATA development list,
Benjamin Herrenschmidt
On Tue, Mar 21 2006, James Bottomley wrote:
> On Tue, 2006-03-21 at 14:38 -0500, Jeff Garzik wrote:
> > Strongly agreed. ISTR JamesB had some concrete thoughts in that
> > direction, but they never made it beyond an IRC channel and/or a few
> > emails.
>
> Actually, I had a patch for it ... but it never really went anywhere.
> The argument being that the machines which would actually need it didn't
> use IDE anyway ...
Do <insert random device here> really never have segment or boundary
restrictions outside of IDE? Seems to me that supporting that would be
the conservative and sane thing to do.
--
Jens Axboe
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-21 19:46 ` Jens Axboe
@ 2006-03-21 20:44 ` James Bottomley
2006-03-21 21:54 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 29+ messages in thread
From: James Bottomley @ 2006-03-21 20:44 UTC (permalink / raw)
To: Jens Axboe
Cc: Jeff Garzik, Mark Lord, IDE/ATA development list,
Benjamin Herrenschmidt
On Tue, 2006-03-21 at 20:46 +0100, Jens Axboe wrote:
> Do <insert random device here> really never have segment or boundary
> restrictions outside of IDE? Seems to me that supporting that would be
> the conservative and sane thing to do.
Well, the only machines that actually turn on virtual merging are sparc
and parisc. They have a small list of "certified" devices for them,
none of which seems to have arbitrary segment boundary restrictions
(although most of them have the standard 4GB one).
When I brought this up the last time, it degenerated into a slanging
match over the value of virtual merging (which no one seems able to
give a definitive answer on).
James
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-21 19:43 ` Jeff Garzik
@ 2006-03-21 20:51 ` Mark Lord
0 siblings, 0 replies; 29+ messages in thread
From: Mark Lord @ 2006-03-21 20:51 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Jens Axboe, IDE/ATA development list
Jeff Garzik wrote:
> Mark Lord wrote:
>> Oh, now there's a thought. How do we specify "max segment size" today?
>> I'll need to make sure that sata_mv still does that (64KB), even though
>> it doesn't care about crossing 64KB boundaries.
>
> Agreed. 50xx is the same: 64k segment size limit, dma boundary is
> really 0xffffffff. Good catch, I had made the mistaken assumption that
> there was the standard IDE 64k boundary as well.
Okay, good. I can do this with a scsi slave config function,
or we could (longer term) add it to libata somewhere.
Short term, I've already added a mv_slave_config() SCSI function here.
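Roughly along these lines (a sketch only; the hook name and the way it would
chain to libata's own slave configuration are placeholders, not the final
sata_mv code):

	#include <linux/blkdev.h>
	#include <scsi/scsi_device.h>

	static int mv_slave_config(struct scsi_device *sdev)
	{
		/* cap each DMA segment at 64K even though the chip imposes no
		 * 64K *boundary* restriction; real code would also invoke
		 * libata's default slave configuration here */
		blk_queue_max_segment_size(sdev->request_queue, 65536);
		return 0;
	}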
Cheers
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-21 20:44 ` James Bottomley
@ 2006-03-21 21:54 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 29+ messages in thread
From: Benjamin Herrenschmidt @ 2006-03-21 21:54 UTC (permalink / raw)
To: James Bottomley
Cc: Jens Axboe, Jeff Garzik, Mark Lord, IDE/ATA development list
On Tue, 2006-03-21 at 14:44 -0600, James Bottomley wrote:
> On Tue, 2006-03-21 at 20:46 +0100, Jens Axboe wrote:
> > Do <insert random device here> really never have segment or boundary
> > restrictions outside of IDE? Seems to me that supporting that would be
> > the conservative and sane thing to do.
>
> Well the only machines that actually turn on virtual merging are sparc
> and parisc. They have a small list of "certified" devices for them,
> none of which seems to have aribtrary segment boundary restrictions
> (although most of them have the standard 4GB one).
>
> When I brought this up the last time it degenerated into a slanging
> match over the value of virtual merging (which no-one can seem to
> provide a definitive answer to).
Well, on ppc, what I do is advertise no virtual merging to the block
layer, but I still merge as much as I can in the iommu code. That's
what I call "best try" merging. Though that also means that to be
totally correct with devices having boundary restrictions, I would have
to know about them at the iommu level, which I don't (and which is, I think,
why we hacked something at the ata level back then to work around it).
The problem with my approach is that since the driver can't know in
advance whether the iommu will be able to merge or not, it can't ask the
block layer for larger requests unless it has the ability to do
partial completion and partial failure, that sort of thing... at least
that's what I remember from the discussion we had two years ago :)
There is interest in virtual merging though. My measurements back then on
the dual G5 were that it did compensate for the cost of the iommu on the
bus and actually did a bit better (I _think_ the workload was kernbench
but I can't remember for sure). Newer G5s have a better iommu, so
virtual merging may be even more of a benefit. Having the ability to
get larger requests from the block layer would be good too. The
problem is that if I turn virtual merging on, with the current
implementation, then I _have_ to merge. The iommu isn't allowed to fail
to merge, because the block layer will have provided something
that maxes out the device sglist capabilities assuming a complete merge...
In fact, the best approach would be to move the merge logic so that it's
enslaved to the driver & iommu code, but that would require a different
interface, I suppose... Something like:
1 - driver "opens" an sglist with the iommu
2 - driver requests a segment from the block layer, sends it down to
the iommu
3 - iommu can merge -> go back to 2 until no more segments are coming from
the block layer
4 - iommu can't merge -> if driver can cope with more sglist entries,
add one and go to 2
5 - driver limit reached, "close" the sglist (proceed to actual hw
mapping at this point maybe) and submit request to the hardware
I agree though that the loop between 2 and 3 can have interesting
"issues" if N drivers are hitting the iommu layer at the same time,
unless we invent creative ways of either locking or scattering
allocation starting points at stage 1.
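Expressed as code, the flow in steps 1-5 might look something like the sketch
below; every type and function here is hypothetical, since no such interface
exists:

	struct iommu_sg_builder;	/* opaque per-request mapping state */

	static void build_hw_sglist(struct my_port *port, struct request *rq)
	{
		struct iommu_sg_builder *b = iommu_sg_open(port->dev);	/* 1 */
		struct blk_segment seg;
		int nelem = 0;

		while (blk_next_segment(rq, &seg)) {			/* 2 */
			if (iommu_sg_try_merge(b, &seg))		/* 3 */
				continue;
			if (nelem < port->max_sg_entries) {		/* 4 */
				iommu_sg_add_entry(b, &seg);
				nelem++;
				continue;
			}
			break;			/* 5: driver limit reached */
		}
		iommu_sg_close(b);		/* do the actual hw mapping */
		/* ...build the hardware sglist/PRDs and submit... */
	}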
Ben.
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-21 19:18 ` Mark Lord
2006-03-21 19:29 ` Jeff Garzik
2006-03-21 19:31 ` Jens Axboe
@ 2006-03-22 11:25 ` Tejun Heo
2006-03-22 14:52 ` Mark Lord
2 siblings, 1 reply; 29+ messages in thread
From: Tejun Heo @ 2006-03-22 11:25 UTC (permalink / raw)
To: Mark Lord; +Cc: Jens Axboe, Jeff Garzik, IDE/ATA development list
Hello, all.
Mark Lord wrote:
> Jens Axboe wrote:
> ..
>> Seems to me that your reasoning is correct. It's a fact that the
>> original block mapped sg lists satisfies all requirements of the device
>> driver and/or hardware, otherwise would be a bug. The iommu may go nuts
>> of course, but logically that new sg list should be choppable into the
>> same requirements.
>
> I just finished going through all of the arch implementations and,
> as near as I can tell, they only ever *merge* sg list items,
> and never create additional sg entries.
>
One question though. Do IOMMUs preserve alignment? i.e. do they align
a 33k block on a 64k boundary? I guess they do, just wanna make sure.
--
tejun
* Re: libata+SGIO: is .dma_boundary respected?
2006-03-22 11:25 ` Tejun Heo
@ 2006-03-22 14:52 ` Mark Lord
0 siblings, 0 replies; 29+ messages in thread
From: Mark Lord @ 2006-03-22 14:52 UTC (permalink / raw)
To: Tejun Heo; +Cc: Jens Axboe, Jeff Garzik, IDE/ATA development list
Tejun Heo wrote:
>
> One question though. Do IOMMU's preserve alignment? ie. Do they align
> 33k block on 64k boundary? I guess they do, just wanna make sure.
They don't move the physical memory (that the sg list targets) around,
so it doesn't really matter, does it?
Cheers