* [RFC] speeding up pci_unmap_sg() for SAC mappings
@ 2004-02-09 15:27 Jes Sorensen
2004-02-09 16:38 ` Alex Williamson
2004-02-09 16:52 ` Grant Grundler
0 siblings, 2 replies; 3+ messages in thread
From: Jes Sorensen @ 2004-02-09 15:27 UTC (permalink / raw)
To: linux-ia64
Hi,
I was looking at the sn2 PCI mapping code and realized how costly it is
to do a basic pci_unmap because the code has to search a table to figure
out which struct dmamap entry matches a given dma address. Clearly the
sn code could be improved in terms of how it is currently implemented,
however there is still the fundamental problem of mapping from a
dma_addr_t to a dma-map entry which I believe all IOMMU code
implementations suffer from. The pretty way to clean this up would
probably require changing the whole mapping API, however one of the most
interesting cases is pci_unmap_sg.
Christoph suggested that we add an arch dependent pointer to struct
scatterlist that we can use to short circuit the unmap process.
Anyone have any strong objections to this? While it can be considered a
bit hackerish it really should help on performance without making any
visible changes to the end user.
Comments?
Cheers,
Jes
* Re: [RFC] speeding up pci_unmap_sg() for SAC mappings
From: Alex Williamson @ 2004-02-09 16:38 UTC (permalink / raw)
To: linux-ia64
On Mon, 2004-02-09 at 08:27, Jes Sorensen wrote:
[snip]
>
> Christoph suggested that we add an arch dependent pointer to struct
> scatterlist that we can use to short circuit the unmap process.
>
> Anyone have any strong objections to this? While it can be considered a
> bit hackerish it really should help on performance without making any
> visible changes to the end user.
I don't necessarily have any strong objections, but I also don't see
this as a problem that all iommus have. The sba_iommu, for instance,
has a direct translation between dma_addr_t and pdir entries. There's
no lookup necessary, just a mask and shift. The swiotlb uses the same
type of approach. Even with a change to scatterlist, won't you still
have the issue w/ pci_unmap_single? Perhaps a lookup table within the
iommu code would provide the speedup you're looking for.
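
A minimal sketch of the "mask and shift" idea follows. It is not the
sba_iommu code: the names (sketch_ioc, sketch_pdir_index), the 4 KB IO
page size, and the assumption that the IOVA window is a power of two in
size and aligned to that size are all hypothetical, chosen only to show
why no lookup is needed.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_IOVP_SHIFT 12	/* assumed 4 KB IO page size */

/* Hypothetical IOMMU context: one IOVA window backed by an IO pdir. */
struct sketch_ioc {
	uint64_t ibase;		/* first bus address of the window */
	uint64_t isize;		/* window size, assumed power of two */
};

/* Translate a bus address to its pdir slot: mask, then shift. No search. */
uint64_t sketch_pdir_index(const struct sketch_ioc *ioc, uint64_t iova)
{
	uint64_t imask = ioc->isize - 1;

	/* the address must lie inside the window for the mask to be valid */
	assert((iova & ~imask) == ioc->ibase);

	/* keep the offset within the window, then drop the in-page bits */
	return (iova & imask) >> SKETCH_IOVP_SHIFT;
}
```

With a window starting at 0x80000000, the address 0x80003abc lands in
slot 3, so unmap can clear that entry directly.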
Alex
--
Alex Williamson HP Linux & Open Source Lab
* Re: [RFC] speeding up pci_unmap_sg() for SAC mappings
From: Grant Grundler @ 2004-02-09 16:52 UTC (permalink / raw)
To: linux-ia64
On Mon, Feb 09, 2004 at 10:27:39AM -0500, Jes Sorensen wrote:
> I was looking at the sn2 PCI mapping code and realized how costly it is
> to do a basic pci_unmap because the code has to search a table to figure
> out which struct dmamap entry matches a given dma address. Clearly the
> sn code could be improved in terms of how it is currently implemented,
> however there is still the fundamental problem of mapping from a
> dma_addr_t to a dma-map entry which I believe all IOMMU code
> implementations suffer from.
Not the two implementations I helped write.
Did you have some particular other (non-ia64) implementations in mind?
Neither ccio-dma (parisc only) nor sba_iommu (parisc, ia64) maintain
any separate tables outside the bitmap to manage "free/used".
All relevant info is stored directly in the IO Pdir (or NOT if
the IO Pdir is being bypassed - ia64 only).
> The pretty way to clean this up would
> probably require changing the whole mapping API, however one of the most
> interesting cases is pci_unmap_sg.
HPUX uses a "DMA Handle" to reference a "DMA Object".
That works too but is not as simple and not lighter weight.
> Christoph suggested that we add an arch dependent pointer to struct
> scatterlist that we can use to short circuit the unmap process.
yeah, I understand how that might help.
But it doesn't solve the problem for networking drivers.
And it will grow the cacheline footprint of the SG list.
Right now we are at 32 bytes (28 bytes used) - 4 per cacheline.
Alignment requirements would push that to 40 bytes per entry.
While this isn't a big deal, it will impact all platforms.
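
The size arithmetic can be checked with a small sketch. The field list
below is an assumption picked to match the 28 used bytes quoted above,
with fixed-width integers standing in for the pointer and dma_addr_t of
a 64-bit platform; arch_private is a hypothetical name for the proposed
pointer. It assumes a host ABI that aligns uint64_t to 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed current layout: 28 bytes of fields, padded to 32. */
struct sketch_sg_current {
	uint64_t page;		/* stands in for struct page * */
	uint32_t offset;
	uint32_t length;
	uint64_t dma_address;	/* stands in for dma_addr_t */
	uint32_t dma_length;
	/* 4 bytes of tail padding here */
};

/* Same layout with a hypothetical arch-dependent pointer appended. */
struct sketch_sg_with_ptr {
	uint64_t page;
	uint32_t offset;
	uint32_t length;
	uint64_t dma_address;
	uint32_t dma_length;
	/* 4 bytes of padding so the pointer is 8-byte aligned */
	uint64_t arch_private;	/* stands in for void * */
};

/* Verify the sizes discussed in the thread on an 8-byte-aligned ABI. */
void sketch_check_layout(void)
{
	assert(sizeof(struct sketch_sg_current) == 32);
	assert(sizeof(struct sketch_sg_with_ptr) == 40);
	assert(offsetof(struct sketch_sg_with_ptr, arch_private) == 32);
}
```

The existing padding cannot absorb the pointer because it is only 4
bytes wide, which is why the entry grows by a full 8 bytes.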
> Anyone have any strong objections to this? While it can be considered a
> bit hackerish it really should help on performance without making any
> visible changes to the end user.
Another even more hackerish idea is to use the remaining "int" (4 bytes)
as an index into a table.
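
A rough sketch of that index idea, under stated assumptions: the table,
its size, and every name here (sketch_dmamap, map_index, sketch_map,
sketch_unmap) are hypothetical and do not come from the sn2 code. The
point is only that unmap becomes a direct array access instead of a
search, while the entry stays at 32 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SLOTS 8		/* arbitrary table size for the sketch */

struct sketch_dmamap {
	uint64_t dma_address;
	int in_use;
};

struct sketch_sg {
	uint64_t page;
	uint32_t offset;
	uint32_t length;
	uint64_t dma_address;
	uint32_t dma_length;
	uint32_t map_index;	/* hypothetical use of the 4 spare bytes */
};

static struct sketch_dmamap sketch_table[SKETCH_SLOTS];

/* Map: claim a free slot and remember its index in the sg entry. */
int sketch_map(struct sketch_sg *sg, uint64_t dma_address)
{
	assert(sg);
	for (uint32_t i = 0; i < SKETCH_SLOTS; i++) {
		if (!sketch_table[i].in_use) {
			sketch_table[i].in_use = 1;
			sketch_table[i].dma_address = dma_address;
			sg->dma_address = dma_address;
			sg->map_index = i;
			return 0;
		}
	}
	return -1;		/* table full */
}

/* Unmap: go straight to the recorded slot, no table search. */
int sketch_unmap(struct sketch_sg *sg)
{
	struct sketch_dmamap *m;

	assert(sg);
	if (sg->map_index >= SKETCH_SLOTS)
		return -1;
	m = &sketch_table[sg->map_index];
	/* reject stale or mismatched entries */
	if (!m->in_use || m->dma_address != sg->dma_address)
		return -1;
	m->in_use = 0;
	return 0;
}
```

As with the pointer variant, this only helps callers that carry a
scatterlist entry; pci_unmap_single() still receives a bare dma_addr_t.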
> Comments?
Can one extract an "index" from contents of dma_address field?
If so, then the same "index" should work for pci_map_single() as well.
Is it necessary to touch the IOMMU for 64-bit capable devices?
Any way to differentiate 32 vs 64-bit and PCI vs PCI-X mappings
so the problem can be handled separately for each "class" of mapping?
If only 32-bit PCI devices have this problem, I think I'd rather
not see 'struct scatterlist' grow.
hth,
grant