* dma_mask semantic problems
@ 2004-03-01 0:37 Benjamin Herrenschmidt
From: Benjamin Herrenschmidt @ 2004-03-01 0:37 UTC (permalink / raw)
To: Linux Arch list
Hi !
Time to bring out another one...
There is, I think, a problem with the exact semantics of *_set_dma_mask,
and the way it's used (especially its result code) in various drivers.
It mixes a lot of different things that are unrelated, and the whole
thing only works by luck, imho, on some archs.
So here are the different kinds of information being exchanged between
the driver and the arch:
- The mask that actually gets passed by the driver to *_set_dma_mask.
This mask indicates what addresses the HW supports as DMA targets. That
means it is an indication of what is supported as an _output_ of
pci_map_* (or dma_map_*).
- On archs without an iommu, that also puts a constraint on the
_input_ of pci/dma_map_*, though. Of course, nothing in the kernel
properly differentiates between those. There are various assumptions,
like network drivers assuming that a failing result code when passing a
64-bit mask means no HIGHDMA, which is only half-related (and an
arch-specific assumption in the driver).
On archs with an iommu, the constraint is only on the iommu's own
virtual space allocator, which may or may not be able to honour "zoned"
allocations...
What I think we need here is for the arch to pass back up the mask of
addresses it can accept on input of pci_map_* given the driver's
requirements. Typically, a non-iommu platform would just pass back
the driver's mask (possibly cropped if the arch has additional
restrictions on usable DMA addresses). The driver can then use
that mask either to pass up to the BIO / network layer to control
allocation/bouncing of buffers, or for its own allocation scheme if
it is not under the control of an upper layer.
- Finally, some SCSI drivers are "using" the result code of
pci_set_dma_mask() in a weird way. They assume that a failing
set_dma_mask for a 64-bit mask means they won't ever get 64-bit
addresses, and use that as an indication that they can "optimize" the
HW to not use 64-bit addressing, thus in some cases increasing
the max possible request queue depth. Because of that, some archs with
an iommu play the trick of failing a 64-bit mask passed to set_dma_mask,
which doesn't make sense imho. The driver passes the mask of addresses
the HW supports. If the arch always generates addresses that fit in
that mask (and 32-bit addresses _do_ fit in a 64-bit mask), then the
function should not fail. What we need here is completely different:
we need to pass back _up_ to the driver the mask of addresses we'll
actually generate, so it can use that as a "hint" for its optimisations.
So, to summarize, I think we need to differentiate these 3 things:
- dma_set_dma_mask: gets passed the mask of addresses the driven HW
supports (that is, the output of dma_map_*). Failure means the
arch cannot accommodate those restrictions (it doesn't provide a zoned
allocator on non-iommu archs for addressing such a zone, or the iommu
code cannot enforce such a restriction).
- either the above is modified to _return_ the modified mask (as
described in step 2) or we add a separate function
dma_adjust_dma_mask(mask *). This is where the arch will actually
provide a mask that tells the driver what kind of addresses will
be supported on _input_ of dma_map_*. This is usually to be passed
upstream to control bouncing etc... For example, an iommu arch would
usually return the full 64 bits in there to indicate that any
memory page can be mapped for DMA.
- dma_get_arch_limit() (or find a better name) is a _hinting_
function that returns what kind of addresses will be produced by
dma_map_* for this device. It can be used by a driver, like some
SCSI ones, to disable 64-bit addressing and thus win more queue
space on iommu archs that will never generate an address wider
than 32 (or even 31) bits.
Any comments ? Better ideas ?
The current stuff is wrong in some cases, it happens to work
because both archs and drivers are cheating... It gets even worse
when DAC gets into the picture...
Ben.
* Re: dma_mask semantic problems
From: David S. Miller @ 2004-03-01 5:47 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linux-arch
On Mon, 01 Mar 2004 11:37:36 +1100
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> - dma_set_dma_mask: gets passed the mask of addresses the driven HW
> supports (that is, the output of dma_map_*). Failure means the
> arch cannot accommodate those restrictions (it doesn't provide a zoned
> allocator on non-iommu archs for addressing such a zone, or the iommu
> code cannot enforce such a restriction).
OK, I'm fine with this of course.
> - either the above is modified to _return_ the modified mask (as
> described in step 2) or we add a separate function
> dma_adjust_dma_mask(mask *). This is where the arch will actually
> provide a mask that tells the driver what kind of addresses will
> be supported on _input_ of dma_map_*. This is usually to be passed
> upstream to control bouncing etc... For example, an iommu arch would
> usually return the full 64 bits in there to indicate that any
> memory page can be mapped for DMA.
Hmmm, Ok I guess.
> - dma_get_arch_limit() (or find a better name) is a _hinting_
> function that returns what kind of addresses will be produced by
> dma_map_* for this device. It can be used by a driver, like some
> SCSI ones, to disable 64-bit addressing and thus win more queue
> space on iommu archs that will never generate an address wider
> than 32 (or even 31) bits.
This needs work. What we want to express for what SCSI is doing is:
1) A way to say "I can map all of physical space into the 32-bit
PCI bus DMA addressing space", ie. an iommu platform.
2) A way to say "I can map all of physical space using 64-bit
addressing too".
I think the first thing is a separate question. The IOMMU exists
precisely to serve this purpose: map physical memory space (however
large, bitness-wise) into the 32-bit SAC PCI address space.
I would go so far as to propose:

	int dma_can_use_iommu(dev);
	u64 dma_iommu_mask(dev);

so scsi could say:

	if (dma_can_use_iommu(dev) &&
	    !(dma_iommu_mask(dev) & ~(u64)0xffffffff)) {
		/* do the 32-bit space saving trick... */
		dma_set_blah_mask(0xffffffff);
		... etc. ...
	}
Sound ok?
* Re: dma_mask semantic problems
From: Benjamin Herrenschmidt @ 2004-03-01 5:51 UTC (permalink / raw)
To: David S. Miller; +Cc: Linux Arch list
On Mon, 2004-03-01 at 16:47, David S. Miller wrote:
> 1) A way to say "I can map all of physical space into the 32-bit
> PCI bus DMA addressing space", ie. an iommu platform.
>
> 2) A way to say "I can map all of physical space using 64-bit
> addressing too".
>
> I think the first thing is a separate question. The IOMMU exists
> precisely to serve this purpose: map physical memory space (however
> large, bitness-wise) into the 32-bit SAC PCI address space.
>
> I would go so far as to propose a:
>
> int dma_can_use_iommu(dev);
> u64 dma_iommu_mask(dev);
What about 32-bit archs without an iommu ?
> so scsi could say:
>
> 	if (dma_can_use_iommu(dev) &&
> 	    !(dma_iommu_mask(dev) & ~(u64)0xffffffff)) {
> 		/* do the 32-bit space saving trick... */
> 		dma_set_blah_mask(0xffffffff);
> 		... etc. ...
> 	}
>
> Sound ok?
Except for 32-bit archs without an iommu ...
Well... Assuming we fix that, then what if we can do both 64-bit
addresses and 32-bit addresses: who decides which to use? Say we
support DAC. Who decides whether to use it or not?
If it's the pci_map_* code that will either go through the iommu or
not, on what basis does it make that decision?
Once the example you pasted above has told us that we can indeed get
only 32-bit addresses, we do a dma_set_dma_mask(32bits) to force
pci_map_sg() to only return addresses within the low 32-bit mask;
that's fine, just make it clear that this dma_set_blah_mask is
actually the real & good old dma/pci_set_dma_mask()...
Ben.
* Re: dma_mask semantic problems
From: David S. Miller @ 2004-03-01 6:34 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linux-arch
On Mon, 01 Mar 2004 16:51:28 +1100
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> > I would go so far as to propose a:
> >
> > int dma_can_use_iommu(dev);
> > u64 dma_iommu_mask(dev);
>
> What about 32-bit archs without an iommu ?
They do not return true for the first function, which means
the second function may not be called.
> Well... Assuming we fix that, then what if we can do both 64-bit
> addresses and 32-bit addresses: who decides which to use? Say we
> support DAC. Who decides whether to use it or not?
Good question.
Long ago we decided that the way we handle this on sparc64 is to
only support 32-bit stuff via the dma interfaces, you have to use
the explicit pci_dac_*() stuff to get at the 64-bit addresses and
these interfaces are extremely frowned upon except in very specific
kinds of drivers. See the section entitled "DAC Addressing for
Address Space Hungry Devices".
So on sparc64 we define dma_addr_t as a u32. We want to use the
IOMMU for everything normal because unless you use the IOMMU you
don't get the PCI controller DMA cache usage (which does prefetching
for reads and coalescing for writes to encourage cacheline sized
transactions on the system bus).
On the other side, in SCSI we are talking about a driver-specific
performance issue. Basically what they want to know, in our
terms, is: "If 32-bit vs 64-bit DMA addressing has roughly the
same performance, and gets access to the whole range of physical
memory, let me use 32-bit."
Maybe that more precise question leads more directly to a more useful
dma_*() interface name and semantics? :-)
You're right, my original suggestion does not handle this properly.
* Re: dma_mask semantic problems
From: Benjamin Herrenschmidt @ 2004-03-01 6:28 UTC (permalink / raw)
To: David S. Miller; +Cc: Linux Arch list
On Mon, 2004-03-01 at 17:34, David S. Miller wrote:
> On Mon, 01 Mar 2004 16:51:28 +1100
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
> > > I would go so far as to propose a:
> > >
> > > int dma_can_use_iommu(dev);
> > > u64 dma_iommu_mask(dev);
> >
> > What about 32-bit archs without an iommu ?
>
> They do not return true for the first function, which means
> the second function may not be called.
Yes, which means, according to your example, that we don't enable
the 32-bit DMA optimization for 32-bit archs :)
> > Well... Assuming we fix that, then what if we can do both 64-bit
> > addresses and 32-bit addresses: who decides which to use? Say we
> > support DAC. Who decides whether to use it or not?
>
> Good question.
>
> Long ago we decided that the way we handle this on sparc64 is to
> only support 32-bit stuff via the dma interfaces, you have to use
> the explicit pci_dac_*() stuff to get at the 64-bit addresses and
> these interfaces are extremely frowned upon except in very specific
> kinds of drivers. See the section entitled "DAC Addressing for
> Address Space Hungry Devices".
Hrm...
> So on sparc64 we define dma_addr_t as a u32. We want to use the
> IOMMU for everything normal because unless you use the IOMMU you
> don't get the PCI controller DMA cache usage (which does prefetching
> for reads and coalescing for writes to encourage cacheline sized
> transactions on the system bus).
Ok, your controller is better than ours :)
> On the other side, in SCSI we are talking about a driver-specific
> performance issue. Basically what they want to know, in our
> terms, is: "If 32-bit vs 64-bit DMA addressing has roughly the
> same performance, and gets access to the whole range of physical
> memory, let me use 32-bit."
>
> Maybe that more precise question leads more directly to a more useful
> dma_*() interface name and semantics? :-)
>
> You're right, my original suggestion does not handle this properly.
The thing is, at the moment I'm not sure what those better semantics
would be ... I started this discussion to get more input from
people like you who have had to deal with it already ;)
I'll sleep on it and see if I get some good ideas, but any suggestion
is welcome.
Ben.
* Re: dma_mask semantic problems
From: Ivan Kokshaysky @ 2004-03-01 10:23 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Linux Arch list
On Mon, Mar 01, 2004 at 11:37:36AM +1100, Benjamin Herrenschmidt wrote:
> The current stuff is wrong in some cases, it happens to work
> because both archs and drivers are cheating... It gets even worse
> when DAC gets into the picture...
Precisely. :-(
Note that DAC DMA has additional restrictions on some platforms.
That is, to use DAC on Alpha, the PCI device must be able to generate
at least 40-bit addresses to access the "monster window" of the IOMMU.
On sparc64, IIRC, the DAC-capable device must support full 64-bit DMA.
But there are some devices, like older LSI chips, which are limited
to 39 or so bits of DMA in DAC mode.
For PCI, we handle that with pci_dac_dma_supported(), which is
completely ignored by the generic dma stuff...
Ivan.