* can device drivers return non-ram via vm_ops->nopage?
@ 2004-03-20 13:30 Andrea Arcangeli
2004-03-20 14:40 ` William Lee Irwin III
2004-03-20 17:39 ` Linus Torvalds
0 siblings, 2 replies; 105+ messages in thread
From: Andrea Arcangeli @ 2004-03-20 13:30 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton, Linus Torvalds
The only bugreport I've got so far for the latest anon_vma code is from
Jens, and it's a device driver bug in my opinion, but I'd like to have a
definitive confirmation from you about the ->nopage API.
I changed ->nopage like this to catch bugs:

retry:
	new_page = vma->vm_ops->nopage(vma, address & PAGE_MASK, &ret);

	/*
	 * non-ram cannot be mapped via ->nopage, it must
	 * be mapped via remap_page_range instead synchronously
	 * in the ->mmap device driver callback.
	 *
	 * PageReserved pages can be mapped as far as they're under
	 * a VM_RESERVED vma.
	 */
	BUG_ON(!pfn_valid(page_to_pfn(new_page)));

	/* ->nopage cannot return swapcache */
	BUG_ON(PageSwapCache(new_page));
	/* ->nopage cannot return anonymous pages */
	BUG_ON(PageAnon(new_page));

	/*
	 * This is the entry point for memory under VM_RESERVED vmas.
	 * That memory will not be tracked by the vm. These aren't
	 * real anonymous pages, they're "device" reserved pages
	 * instead.
	 * These pages under VM_RESERVED vmas are the only pages mapped
	 * by the VM into userspace with page->as.mapping = NULL.
	 */
	reserved = vma->vm_flags & VM_RESERVED;
	BUG_ON(!reserved && (!new_page->mapping || PageReserved(new_page)));

Really it would not be mandatory for me to enforce the last BUG_ON,
since we don't do the pagetable walk anymore to unmap stuff, but I think
it's nicer to enforce the model in the drivers so that if we ever want to
do the pagetable walk again, we can; if we give up now, we'll be unable
to go back to the pagetable walk. I'm not saying that we'll ever want to
go back, but since most drivers are already setting VM_RESERVED
correctly to work with 2.4, I believe it's worth maintaining this
abstraction so that if we really want to, we can go back.
Anyway, returning to the non-ram returned by ->nopage, see the email
exchange with Jens below. The BUG_ON that triggers is of course
BUG_ON(!pfn_valid(page_to_pfn(new_page))).
If we want to return non-ram, we could, but I believe we should change
the API to return a pfn not a page_t * if we want to.
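
A minimal userspace sketch of why that check trips, assuming a flat
mem_map array. This is not kernel code: model_page, MODEL_MAX_PFN and
the model_* helpers are hypothetical names made up for illustration.
The page <-> pfn conversion is only a multiply or divide by the
descriptor size, so a page pointer manufactured for a non-ram address
converts to a pfn past the end of RAM.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model only; every name here is hypothetical. */
struct model_page {
	unsigned long flags;
	void *mapping;
};

#define MODEL_MAX_PFN 1024UL	/* pretend the box has 1024 pages of RAM */
static struct model_page model_mem_map[MODEL_MAX_PFN];

/* pfn -> page: pure arithmetic, backed by real memory only for RAM pfns. */
uintptr_t model_pfn_to_page(unsigned long pfn)
{
	return (uintptr_t)model_mem_map +
	       (uintptr_t)pfn * sizeof(struct model_page);
}

/* page -> pfn: byte distance from mem_map divided by the descriptor size. */
unsigned long model_page_to_pfn(uintptr_t page)
{
	return (unsigned long)((page - (uintptr_t)model_mem_map) /
			       sizeof(struct model_page));
}

/* Only pfns covered by mem_map describe RAM in this model. */
int model_pfn_valid(unsigned long pfn)
{
	return pfn < MODEL_MAX_PFN;
}

/* Mirrors the BUG_ON() above: abort if the returned page is not RAM. */
void model_check_nopage_result(uintptr_t page)
{
	assert(model_pfn_valid(model_page_to_pfn(page)));
}
```

In this model a descriptor address computed for pfn 5000 converts back
to pfn 5000 without complaint, and only the range check notices that
nothing in mem_map backs it.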
----- Forwarded message from Andrea Arcangeli <andrea@suse.de> -----
Date: Sat, 20 Mar 2004 14:21:56 +0100
From: Andrea Arcangeli <andrea@suse.de>
To: Jens Axboe <axboe@suse.de>
On Fri, Mar 19, 2004 at 01:32:13PM +0100, Jens Axboe wrote:
> kernel BUG at mm/memory.c:1412!
> invalid operand: 0000 [#1]
> SMP
> CPU: 1
> EIP: 0060:[<c01407fe>] Not tainted
> EFLAGS: 00010216 (2.6.4-0-axboe)
> EIP is at do_no_page+0x42c/0x4dc
> eax: 01f80000 ebx: 00000000 ecx: 00001000 edx: f0c5be78
> esi: f0860280 edi: c0453880 ebp: f0c5bec0 esp: f0c5be88
> ds: 007b es: 007b ss: 0068
> Process mplayer (pid: 1500, threadinfo=f0c5a000 task=f149d300)
> Stack: f7f997a0 f7f997a0 00000282 f06fec80 f0c5bea8 00000000 f7d8c984
> 40b0e000
> f0860280 f1267800 00000001 f0c5a000 f0866c38 c0453880 f0c5bf0c
> c01415da
> 00000000 f0866c38 f08b3408 00000000 000081ff 405ae7a3 1be20e55
> f7de0480
> Call Trace:
> [<c01415da>] handle_mm_fault+0xf3/0x694
> [<c011674d>] do_page_fault+0x16c/0x535
> [<c0144179>] __do_mmap_pgoff+0x34c/0x643
> [<c010c44a>] do_mmap2+0x7a/0xa8
> [<c01165e1>] do_page_fault+0x0/0x535
> [<c0106aad>] error_code+0x2d/0x38
A device driver is returning a non-ram page via ->nopage.
I don't think this has ever been safe, it's just that my more robust
anon_vma code is trapping this bug. I think non-ram pages should use
remap_page_range, not ->nopage.
Let's assume I'm wrong and you can return non-ram via ->nopage (even
ignoring that the API would be totally incorrect, since one should return a
'pfn' not a 'page_t *' if really ->nopage can return non-ram), and let's
take plain 2.6.5-rc1 (w/o my anon_vma code):

	new_page = vma->vm_ops->nopage(vma, address & PAGE_MASK, &ret);
	[..]
	if (pte_none(*page_table)) {
		if (!PageReserved(new_page))
			++mm->rss;
		flush_icache_page(vma, new_page);
		entry = mk_pte(new_page, vma->vm_page_prot);
		if (write_access)
			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
		set_pte(page_table, entry);
		pte_chain = page_add_rmap(new_page, page_table, pte_chain);
		pte_unmap(page_table);

PageReserved(new_page) is already reading random memory, which could even
generate a machine exception on amd64 or ia64 and lock the box hard.
Then it goes ahead and even does page_add_rmap(new_page), writing
new_page->pte_chain in non-ram, again potentially crashing the box.
It's ironic that Andrew removed a pfn_valid check in front of
page_add_rmap just in 2.6.5-rc1 (previous kernels wouldn't overwrite
random non-ram there, but the unrecoverable machine check was
still there simply due to the PageReserved).
So I think my anon_vma code is right forbidding non-ram to be returned
by ->nopage, and the device driver should be fixed.
If you disagree, I can change ->nopage to survive a non-ram page, but
besides the fact that returning a page_t * would be a misleading API, this
would be a new feature, and it wouldn't work reliably with mainline kernels
(regardless of whether page_add_rmap starts with a pfn_valid check or not).
----- End forwarded message -----
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 13:30 can device drivers return non-ram via vm_ops->nopage? Andrea Arcangeli
@ 2004-03-20 14:40 ` William Lee Irwin III
2004-03-20 15:06 ` Andrea Arcangeli
2004-03-20 17:39 ` Linus Torvalds
1 sibling, 1 reply; 105+ messages in thread
From: William Lee Irwin III @ 2004-03-20 14:40 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel, Andrew Morton, Linus Torvalds
On Sat, Mar 20, 2004 at 02:30:25PM +0100, Andrea Arcangeli wrote:
> Anyways returning to the non-ram returned by ->nopage see the below
> email exchange with Jens. the bug triggering of course is the
> BUG_ON(!pfn_valid(page_to_pfn(new_page))).
> If we want to return non-ram, we could, but I believe we should change
> the API to return a pfn not a page_t * if we want to.
This would be very helpful for other reasons also. There's a general
API issue with drivers that want or need to do this. The one I've
heard most about is /dev/mem when it's used to mmap() physical areas
lying in memory holes not covered by ->node_mem_map. Once ->mmap() and
->nopage() supplied by drivers are liberated from reliance on struct
page, numerous hacks, validation overheads, and stability issues may be
eliminated. I'd rather strongly advocate such an API change for mainline,
as it's something that fixes a number of drivers at once, but only if
the implementation carries out a full sweep of all affected callees
and only if it actually resolves the issues with these drivers.
But there's another question that should be asked up-front: in order to
give drivers sufficient expressiveness to correctly implement their
->mmap() methods, is this even sufficient? There is a serious question
of whether the core can actually handle the driver-specific issues,
which suggests devolving a larger swath of the fault handling codepath
to drivers supporting ->mmap() if it is insufficient after all. For
instance, will cache-disabled mappings or bolted/locked TLB entries
that the core doesn't understand be required? I'd like to get someone
with more driver experience or who may have architecture-specific
issues with driver ->nopage() methods to chime in here with respect to
the sufficiency of a pfn-based ->nopage() vs. stronger methods, since
it's pointless to make the pfn-based ->nopage() change if it's
insufficient anyway.
There is also a special case that's hitting a number of architectures
simultaneously that may or may not be a mainline concern. This is that
a number of people actually want to handle faults on hugetlb and do
ZFOD fault handling so that, for instance, various kinds of NUMA and
latency issues can be addressed. The current methods are to trap the
fault before calling handle_mm_fault() in arch code, but a cleaner
solution would very nicely reuse more general or stronger forms of
driver fault handling that would fix driver issues also. It's basically
an upstream call as to whether this will be allowed to have any
influence on the design of a solution to the more critical "drivers are
getting bitten by the requirement of a struct page * return value of
->nopage()" issue, and it looks like upstream is cc:'d on this thread. =)
-- wli
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 14:40 ` William Lee Irwin III
@ 2004-03-20 15:06 ` Andrea Arcangeli
2004-03-20 15:27 ` William Lee Irwin III
` (2 more replies)
0 siblings, 3 replies; 105+ messages in thread
From: Andrea Arcangeli @ 2004-03-20 15:06 UTC (permalink / raw)
To: William Lee Irwin III, linux-kernel, Andrew Morton,
Linus Torvalds
On Sat, Mar 20, 2004 at 06:40:22AM -0800, William Lee Irwin III wrote:
> On Sat, Mar 20, 2004 at 02:30:25PM +0100, Andrea Arcangeli wrote:
> > Anyways returning to the non-ram returned by ->nopage see the below
> > email exchange with Jens. the bug triggering of course is the
> > BUG_ON(!pfn_valid(page_to_pfn(new_page))).
> > If we want to return non-ram, we could, but I believe we should change
> > the API to return a pfn not a page_t * if we want to.
>
> This would be very helpful for other reasons also. There's a general
> API issue with drivers that want or need to do this. The one I've
I'm afraid I'll have to teach ->nopage how to deal with non-ram with
this page_t API too (changing it to pfn sounds too intrusive in the
short term). It seems to me that alsa can return non-ram (in the nopage
callback there's a virt_to_page on some iomm region), and changing alsa
to use remap_page_range sounds too intrusive too.
So in short I believe alsa can corrupt memory randomly starting with
2.6.5-rc1, and it could only generate machine check crashes in previous
kernels.
So for the short term (i.e. next few weeks) we'll have to deal with
page_t still there...
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 15:06 ` Andrea Arcangeli
@ 2004-03-20 15:27 ` William Lee Irwin III
2004-03-20 15:44 ` Russell King
2004-03-20 20:13 ` Andrew Morton
2 siblings, 0 replies; 105+ messages in thread
From: William Lee Irwin III @ 2004-03-20 15:27 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel, Andrew Morton, Linus Torvalds
On Sat, Mar 20, 2004 at 04:06:21PM +0100, Andrea Arcangeli wrote:
> I'm afraid I'll have to teach ->nopage how to deal with non-ram with
> this page_t API too (changing it to pfn sounds too intrusive in the
> short term), it seems to me that alsa can return non-ram (in the nopage
> callback there's a virt_to_page on some iomm region), and changing alsa
> to use remap_file_pages sounds too intrusive too.
> So in short I believe alsa can corrupt memory randomly starting with
> 2.6.5-rc1, and it could only generate machine check crashes in previous
> kernels.
> So for the short term (i.e. next few weeks) we'll have to deal with
> page_t still there...
I've developed an interest in drivers recently, so I may be able to do
some of the footwork here in a timely fashion if we want to go the pfn-
based API route. That actually sounded like the less intrusive of the
two methods I mentioned as well as easily mergeable within a stable
series. OTOH, if there are objections, it may have to wait.
I don't believe devolving larger swaths of the fault path to drivers
would be very difficult to restructure drivers to use. The hard parts
are that it would be time-consuming and would likely merit a support
API exported by architectures to make driver writers' lives easier (i.e.
not introduce more bugs than it resolves) that would need to be agreed
upon, or at least backed by a feature request survey. And, of course,
it would need an implementation for every architecture, which could be
difficult to arrange for the less documented and/or less frequently
updated architectures if features the core doesn't already rely upon
would be required. And that's a certainty, since the core not
understanding the needs of those drivers would be the primary motive
for the more intrusive approach.
-- wli
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 15:06 ` Andrea Arcangeli
2004-03-20 15:27 ` William Lee Irwin III
@ 2004-03-20 15:44 ` Russell King
2004-03-20 15:57 ` Andrea Arcangeli
2004-03-20 15:58 ` Jaroslav Kysela
2004-03-20 20:13 ` Andrew Morton
2 siblings, 2 replies; 105+ messages in thread
From: Russell King @ 2004-03-20 15:44 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: William Lee Irwin III, linux-kernel, Andrew Morton,
Linus Torvalds
On Sat, Mar 20, 2004 at 04:06:21PM +0100, Andrea Arcangeli wrote:
> On Sat, Mar 20, 2004 at 06:40:22AM -0800, William Lee Irwin III wrote:
> > On Sat, Mar 20, 2004 at 02:30:25PM +0100, Andrea Arcangeli wrote:
> > > Anyways returning to the non-ram returned by ->nopage see the below
> > > email exchange with Jens. the bug triggering of course is the
> > > BUG_ON(!pfn_valid(page_to_pfn(new_page))).
> > > If we want to return non-ram, we could, but I believe we should change
> > > the API to return a pfn not a page_t * if we want to.
> >
> > This would be very helpful for other reasons also. There's a general
> > API issue with drivers that want or need to do this. The one I've
>
> I'm afraid I'll have to teach ->nopage how to deal with non-ram with
> this page_t API too (changing it to pfn sounds too intrusive in the
> short term), it seems to me that alsa can return non-ram (in the nopage
> callback there's a virt_to_page on some iomm region), and changing alsa
> to use remap_file_pages sounds too intrusive too.
Actually, ALSA is broken in that respect - it isn't portable as it
stands. It isn't the API which is broken - it's ALSA which is broken.
Performing virt_to_page() on any non-direct mapped RAM page (which
means the value returned from dma_alloc_coherent or pci_alloc_consistent)
is undefined.
One of my current projects is fixing this crap in ALSA.
Besides, returning an invalid struct page will lead to Bad Things(tm)
in set_pte().
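
A rough userspace sketch of the point about virt_to_page(), assuming
i386-style constants (direct mapping starting at 0xC0000000, 4K pages)
and a made-up RAM size of 128MB. The model_* names are hypothetical.
The conversion is plain arithmetic relative to the direct mapping, so
an address from any other mapping still produces a number, just not
one that refers to a page descriptor.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative i386-style constants; the helper names are hypothetical. */
#define MODEL_PAGE_OFFSET 0xC0000000u	/* start of the kernel direct mapping */
#define MODEL_PAGE_SHIFT  12
#define MODEL_RAM_BYTES   (128u << 20)	/* pretend 128MB of direct-mapped RAM */

/* What virt_to_page() boils down to here: an index into mem_map. */
uint32_t model_virt_to_pfn(uint32_t vaddr)
{
	return (vaddr - MODEL_PAGE_OFFSET) >> MODEL_PAGE_SHIFT;
}

/* The index is only meaningful for addresses inside the direct mapping. */
int model_is_direct_mapped(uint32_t vaddr)
{
	return vaddr >= MODEL_PAGE_OFFSET &&
	       vaddr - MODEL_PAGE_OFFSET < MODEL_RAM_BYTES;
}

/* Checked variant: refuses addresses the arithmetic is not defined for. */
uint32_t model_checked_virt_to_pfn(uint32_t vaddr)
{
	assert(model_is_direct_mapped(vaddr));
	return model_virt_to_pfn(vaddr);
}
```

With these numbers an address such as 0xF8000000 (outside the 128MB
direct mapping) converts to index 0x38000, well past the 0x8000 pages
that actually exist in the model.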
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 15:44 ` Russell King
@ 2004-03-20 15:57 ` Andrea Arcangeli
2004-03-20 16:15 ` Russell King
2004-03-20 15:58 ` Jaroslav Kysela
1 sibling, 1 reply; 105+ messages in thread
From: Andrea Arcangeli @ 2004-03-20 15:57 UTC (permalink / raw)
To: William Lee Irwin III, linux-kernel, Andrew Morton,
Linus Torvalds
On Sat, Mar 20, 2004 at 03:44:19PM +0000, Russell King wrote:
> On Sat, Mar 20, 2004 at 04:06:21PM +0100, Andrea Arcangeli wrote:
> > On Sat, Mar 20, 2004 at 06:40:22AM -0800, William Lee Irwin III wrote:
> > > On Sat, Mar 20, 2004 at 02:30:25PM +0100, Andrea Arcangeli wrote:
> > > > Anyways returning to the non-ram returned by ->nopage see the below
> > > > email exchange with Jens. the bug triggering of course is the
> > > > BUG_ON(!pfn_valid(page_to_pfn(new_page))).
> > > > If we want to return non-ram, we could, but I believe we should change
> > > > the API to return a pfn not a page_t * if we want to.
> > >
> > > This would be very helpful for other reasons also. There's a general
> > > API issue with drivers that want or need to do this. The one I've
> >
> > I'm afraid I'll have to teach ->nopage how to deal with non-ram with
> > this page_t API too (changing it to pfn sounds too intrusive in the
> > short term), it seems to me that alsa can return non-ram (in the nopage
> > callback there's a virt_to_page on some iomm region), and changing alsa
> > to use remap_file_pages sounds too intrusive too.
>
> Actually, ALSA is broken in that respect - it isn't portable as it
> stands. It isn't the API which is broken - it's ALSA which is broken.
> Performing virt_to_page() on any non-direct mapped RAM page (which
> means the value returned from dma_alloc_coherent or pci_alloc_consistent)
> is undefined.
this is exactly the problem.
> One of my current projects is fixing this crap in ALSA.
Do you agree it should be fixed by returning a PFN from ->nopage? Or are
you doing it differently with remap_page_range, or perhaps you're just
multiplying the right pfn by sizeof(page_t), ignoring the misleading API?
> Besides, returning an invalid struct page will lead to Bad Things(tm)
> in set_pte().
you mean in the non-x86 archs right?
there is no way I can change ->nopage to return a pfn right now (this
stuff must work stable ASAP), so I'm currently teaching do_no_page to
handle non-ram pages (for the first time ever it will be able to do
that), I expect this at least will make it work right on x86 w/o iommu.
for mainline 2.6 if we want to keep using ->nopage I agree with Wli that
an API change is reasonable.
BTW, I was wrong talking about machine checks, the problem here is
reading a random _virtual_ address (not a physical one), so it could oops
too on x86, and starting from 2.6.5-rc1 it'll corrupt mem too.
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 15:44 ` Russell King
2004-03-20 15:57 ` Andrea Arcangeli
@ 2004-03-20 15:58 ` Jaroslav Kysela
2004-03-20 16:09 ` Russell King
1 sibling, 1 reply; 105+ messages in thread
From: Jaroslav Kysela @ 2004-03-20 15:58 UTC (permalink / raw)
To: Russell King
Cc: Andrea Arcangeli, William Lee Irwin III, linux-kernel,
Andrew Morton, Linus Torvalds
On Sat, 20 Mar 2004, Russell King wrote:
> Actually, ALSA is broken in that respect - it isn't portable as it
> stands. It isn't the API which is broken - it's ALSA which is broken.
> Performing virt_to_page() on any non-direct mapped RAM page (which
> means the value returned from dma_alloc_coherent or pci_alloc_consistent)
> is undefined.
>
> One of my current projects is fixing this crap in ALSA.
Yes, but if there's no API in the kernel code allowing us to obtain page
pointers from a value returned by dma_alloc_coherent(), then we
cannot fix this problem.
So it's not so much a subsystem (ALSA) problem as the kernel core not
being mature enough.
The same problem exists for the cache coherency of mmaped pages.
Jaroslav
-----
Jaroslav Kysela <perex@suse.cz>
Linux Kernel Sound Maintainer
ALSA Project, SuSE Labs
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 15:58 ` Jaroslav Kysela
@ 2004-03-20 16:09 ` Russell King
2004-03-20 19:44 ` Jaroslav Kysela
0 siblings, 1 reply; 105+ messages in thread
From: Russell King @ 2004-03-20 16:09 UTC (permalink / raw)
To: Jaroslav Kysela
Cc: Andrea Arcangeli, William Lee Irwin III, linux-kernel,
Andrew Morton, Linus Torvalds
On Sat, Mar 20, 2004 at 04:58:21PM +0100, Jaroslav Kysela wrote:
> On Sat, 20 Mar 2004, Russell King wrote:
> > Actually, ALSA is broken in that respect - it isn't portable as it
> > stands. It isn't the API which is broken - it's ALSA which is broken.
> > Performing virt_to_page() on any non-direct mapped RAM page (which
> > means the value returned from dma_alloc_coherent or pci_alloc_consistent)
> > is undefined.
> >
> > One of my current projects is fixing this crap in ALSA.
>
> Yes, but if there's no API in the kernel code allowing to obtain page
> pointers using any value returned from dma_alloc_coherent(), then we
> cannot fix this problem.
It is fixable, if someone sits down and works through it, which is
precisely what I've been doing.
> So, it's not much subsystem (ALSA) problem, but kernel core is not matured
> enough.
It is well known that virt_to_page() is only valid on virtual addresses
which correspond to kernel direct mapped RAM pages, and undefined on
everything else. Unfortunately, ALSA has been using it with
pci_alloc_consistent() for a long time, and this behaviour is what
makes ALSA broken. The fact it works on x86 is merely incidental.
If ALSA wants this functionality, the ALSA people should ideally have
put their requirements forward during the 2.5 development cycle so the
problem could be addressed. However, luckily in this instance, it is
not a big problem to solve. It just requires time to sort through all
the abstraction layers upon abstraction layers which ALSA has.
- and I'm doing exactly this, right now. Be patient. -
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 15:57 ` Andrea Arcangeli
@ 2004-03-20 16:15 ` Russell King
2004-03-20 16:25 ` Andrea Arcangeli
0 siblings, 1 reply; 105+ messages in thread
From: Russell King @ 2004-03-20 16:15 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: William Lee Irwin III, linux-kernel, Andrew Morton,
Linus Torvalds
On Sat, Mar 20, 2004 at 04:57:39PM +0100, Andrea Arcangeli wrote:
> > One of my current projects is fixing this crap in ALSA.
>
> Do you agree it should be fixed by returning a PFN from ->nopage?
No. How would you return the PFN from a remapped page? It's far
easier to provide an interface which returns the struct page* for
the underlying pages, thusly:
static struct page *
dma_coherent_to_page(struct device *dev, void *cpu_addr,
dma_addr_t handle, unsigned int offset)
And this is precisely what I would be working on if I weren't writing
this mail. 8)
Take a moment to think about the problem. We've allocated some memory
for coherent DMA via the dma_alloc_coherent() interface. At some point,
we've had to get a struct page* in this allocator. However, the
allocator has had to do some architecture defined operations to provide
coherent memory.
Only the architecture can translate the results from dma_alloc_coherent()
back to a struct page* - which it needs to be able to do if
dma_free_coherent() is going to work.
Therefore, what we need to do to solve the ALSA problem is require all
architectures to provide dma_coherent_to_page() and make ALSA use that.
(A related problem is that some architectures need pgprot_dmacoherent()
to modify the page protections so that the user space mapping is also
DMA-coherent. However, that discussion should be the subject of a
new thread.)
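
One way an architecture could back such a helper, sketched as a
userspace model. Everything here is an assumption for illustration:
the bookkeeping table, the model_* names, and the simplified signature
(no struct device, no dma handle, a page index instead of a struct
page pointer). It also assumes each allocation is backed by
consecutive pages. The idea is only that the allocator, which had the
pages in hand when it built the mapping, records enough to translate
back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  ((uintptr_t)1 << MODEL_PAGE_SHIFT)
#define MODEL_MAX_ALLOCS 8

/* One record per coherent allocation; hypothetical bookkeeping. */
struct model_coherent_alloc {
	uintptr_t cpu_addr;	/* remapped address handed to the driver */
	size_t size;		/* length in bytes */
	long first_page;	/* index of the first underlying RAM page */
};

static struct model_coherent_alloc model_allocs[MODEL_MAX_ALLOCS];
static int model_nr_allocs;

/* Called by the (model) allocator when it hands out a buffer. */
int model_record_alloc(uintptr_t cpu_addr, size_t size, long first_page)
{
	assert((cpu_addr & (MODEL_PAGE_SIZE - 1)) == 0);	/* page aligned */
	if (model_nr_allocs == MODEL_MAX_ALLOCS)
		return -1;
	model_allocs[model_nr_allocs].cpu_addr = cpu_addr;
	model_allocs[model_nr_allocs].size = size;
	model_allocs[model_nr_allocs].first_page = first_page;
	model_nr_allocs++;
	return 0;
}

/*
 * Translate (cpu_addr, offset) back to the index of the underlying page.
 * Returns -1 if the address was not handed out by this allocator or the
 * offset lies outside the allocation.
 */
long model_coherent_to_page_index(uintptr_t cpu_addr, size_t offset)
{
	int i;

	for (i = 0; i < model_nr_allocs; i++) {
		const struct model_coherent_alloc *a = &model_allocs[i];

		if (cpu_addr == a->cpu_addr && offset < a->size)
			return a->first_page + (long)(offset >> MODEL_PAGE_SHIFT);
	}
	return -1;
}
```

A driver's ->nopage would then ask for the page at the faulting offset
instead of running virt_to_page() on the remapped address.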
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 16:15 ` Russell King
@ 2004-03-20 16:25 ` Andrea Arcangeli
2004-03-20 16:57 ` William Lee Irwin III
2004-03-20 17:48 ` Andrea Arcangeli
0 siblings, 2 replies; 105+ messages in thread
From: Andrea Arcangeli @ 2004-03-20 16:25 UTC (permalink / raw)
To: William Lee Irwin III, linux-kernel, Andrew Morton,
Linus Torvalds
On Sat, Mar 20, 2004 at 04:15:38PM +0000, Russell King wrote:
> On Sat, Mar 20, 2004 at 04:57:39PM +0100, Andrea Arcangeli wrote:
> > > One of my current projects is fixing this crap in ALSA.
> >
> > Do you agree it should be fixed by returning a PFN from ->nopage?
>
> No. How would you return the PFN from a remapped page? It's far
> easier to provide an interface which returns the struct page* for
> the underlying pages, thusly:
>
> static struct page *
> dma_coherent_to_page(struct device *dev, void *cpu_addr,
> dma_addr_t handle, unsigned int offset)
>
> And this is precisely what I would be working on if I weren't writing
> this mail. 8)
>
> Take a moment to think about the problem. We've allocated some memory
> for coherent DMA via the dma_alloc_coherent() interface. At some point,
They're using MMIO pci space, or it wouldn't trigger my BUG_ON on x86.
The whole point is that it is non-ram: if it were ram, x86 couldn't
notice the virt_to_page, since the page_t would be in the range of the
mem_map_t and pfn_valid would be happy with it.
If it was dma_alloc_coherent it would return ram I think, not non-ram.
> we've had to get a struct page* in this allocator. However, the
> allocator has had to do some architecture defined operations to provide
> coherent memory.
>
> Only the architecture can translate the results from dma_alloc_coherent()
> back to a struct page* - which it needs to be able to do if
> dma_free_coherent() is going to work.
>
> Therefore, what we need to do to solve the ALSA problem is require all
> architectures to provide dma_coherent_to_page() and make ALSA use that.
will this dma_coherent_to_page be allowed to run on a non-ram page?
It's pretty ugly to use page_t for non-ram. one can always convert with
a mul or div with sizeof(page_t) though. My point is that if you want to
allow stuff to deal with non-ram you must never have this stuff work
with page_t but it should work with pfn instead.
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 16:25 ` Andrea Arcangeli
@ 2004-03-20 16:57 ` William Lee Irwin III
2004-03-20 17:48 ` Andrea Arcangeli
1 sibling, 0 replies; 105+ messages in thread
From: William Lee Irwin III @ 2004-03-20 16:57 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel, Andrew Morton, Linus Torvalds
On Sat, Mar 20, 2004 at 05:25:34PM +0100, Andrea Arcangeli wrote:
> they're using MMIO pci space or it wouldn't catch my BUG_ON on x86.
> The whole point is that it is non ram, if it would be ram, x86 couldn't
> notice the virt_to_page, since the page_t would be in the range of the
> mem_map_t and pfn_valid would be happy with it.
> If it was dma_alloc_coherent it would return ram I think, not non-ram.
Any idea what driver? /dev/mem, which is where X typically gets its
mappings of mmiospace, doesn't actually use ->nopage(). Maybe rmk's
notion of doing it all from within the drivers is the right idea in
general, or at least until we hit cases that can't be handled that
way at all.
-- wli
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 13:30 can device drivers return non-ram via vm_ops->nopage? Andrea Arcangeli
2004-03-20 14:40 ` William Lee Irwin III
@ 2004-03-20 17:39 ` Linus Torvalds
2004-03-20 17:56 ` Andrea Arcangeli
` (2 more replies)
1 sibling, 3 replies; 105+ messages in thread
From: Linus Torvalds @ 2004-03-20 17:39 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel, Andrew Morton
On Sat, 20 Mar 2004, Andrea Arcangeli wrote:
>
> The only bugreport I've got so far for the latest anon_vma code is from
> Jens, and it's a device driver bug in my opinion, but I'd like to have a
> definitive confirmation from you about the ->nopage API.
I'd say that this is definitely a driver bug.
If a driver wants to map non-RAM pages, that's perfectly ok, but it MUST
NOT happen through "nopage()". The driver should map them with
"remap_page_range()", and thus never take a page fault for such pages at
all.
There is no reason to ever lazily map non-RAM pages - clearly they aren't
using any "real memory", so there is no reason to not fill the page tables
at mmap() time.
In other words, the driver is horribly broken.
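
The distinction can be sketched with a toy page table in userspace.
This is a model under stated assumptions, not kernel code: model_vma
and the model_* helpers are hypothetical, and a "pte" is just a number.
Filling every entry when the mapping is set up, which is the role
remap_page_range() plays in ->mmap, means the fault path is never
entered for those pages.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NPTES 16

/* Toy page table: 0 means "not present", anything else is pfn + 1. */
struct model_vma {
	unsigned long pte[MODEL_NPTES];
	unsigned long faults;	/* how many times the fault path ran */
};

/* Eager variant: fill every entry up front, at "mmap() time". */
void model_remap_range(struct model_vma *vma, unsigned long first_pfn,
		       size_t npages)
{
	size_t i;

	assert(npages <= MODEL_NPTES);
	for (i = 0; i < npages; i++)
		vma->pte[i] = first_pfn + i + 1;
}

/*
 * An access to page idx. It takes a "fault" only if the entry is still
 * empty, in which case the entry is filled lazily (the ->nopage-style
 * path). Returns the pfn that ends up mapped.
 */
unsigned long model_touch(struct model_vma *vma, size_t idx,
			  unsigned long lazy_first_pfn)
{
	assert(idx < MODEL_NPTES);
	if (vma->pte[idx] == 0) {
		vma->faults++;
		vma->pte[idx] = lazy_first_pfn + idx + 1;
	}
	return vma->pte[idx] - 1;
}
```

Both variants end up with the same entry for a touched page; the
eagerly filled one gets there with a fault count of zero.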
Linus
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 16:25 ` Andrea Arcangeli
2004-03-20 16:57 ` William Lee Irwin III
@ 2004-03-20 17:48 ` Andrea Arcangeli
2004-03-20 19:03 ` Andrea Arcangeli
1 sibling, 1 reply; 105+ messages in thread
From: Andrea Arcangeli @ 2004-03-20 17:48 UTC (permalink / raw)
To: William Lee Irwin III, linux-kernel, Andrew Morton,
Linus Torvalds
I noticed there was effectively one bug in anon_vma that would have
caused a VM_FAULT_OOM/SIGBUS to be mistaken for non-ram (I checked
VM_FAULT after pfn_valid oh well). So it's possible what Jens
experienced was a sigbus and not a non-ram condition, sorry.
So it's not certain anymore that alsa was returning non-ram, but it's
still possible in theory and it would have worked in practice up to
2.6.4 (not anymore in 2.6.5-rc1). The non-ram page_t would fall into the
direct mapping (for a 4G phys address, the page_t would be at address
3G+120M, triggering an oops only on machines with less than 128m of ram,
and if the mmio region had been lower than 4G even less ram would
be needed to hide the bug). This untested patch should make it work
with non-ram too, so it sounds safer for the short term. I will test it
a little bit and I'll upload an update.
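
The arithmetic behind that observation can be written down as a small
userspace sketch. The mem_map base and descriptor size are parameters
here because the thread does not state them; the values used in any
example are made-up inputs, not an attempt to reproduce the 3G+120M
figure. The model_* names are hypothetical and the model ignores
everything above the direct mapping (vmalloc space and so on).

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT  12
#define MODEL_PAGE_OFFSET 0xC0000000ull	/* i386-style start of the direct mapping */

/*
 * Virtual address where the descriptor for physical address phys would
 * sit if mem_map simply continued past the end of RAM.
 */
uint64_t model_page_struct_vaddr(uint64_t mem_map_vaddr,
				 uint64_t page_struct_size, uint64_t phys)
{
	return mem_map_vaddr + (phys >> MODEL_PAGE_SHIFT) * page_struct_size;
}

/*
 * Touching that address only oopses in this model if it lies beyond the
 * direct-mapped RAM; otherwise it silently hits unrelated memory.
 */
int model_deref_would_oops(uint64_t vaddr, uint64_t ram_bytes)
{
	assert(vaddr >= MODEL_PAGE_OFFSET);
	return vaddr - MODEL_PAGE_OFFSET >= ram_bytes;
}
```

For instance, with an assumed mem_map at 0xC1000000 and 32-byte
descriptors, the descriptor for a physical address just below 4G lands
about 48MB into the direct mapping: a machine with 64MB of ram would
read or write stray memory there, while one with 32MB would oops.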
--- x/mm/memory.c.~1~ 2004-03-20 15:47:31.000000000 +0100
+++ x/mm/memory.c 2004-03-20 18:35:19.000000000 +0100
@@ -1384,7 +1384,7 @@ do_no_page(struct mm_struct *mm, struct
struct page * new_page;
struct address_space *mapping = NULL;
pte_t entry;
- int sequence = 0, reserved, anon;
+ int sequence = 0, reserved, anon, pageable, ram, as;
int ret = VM_FAULT_MINOR;
if (!vma->vm_ops || !vma->vm_ops->nopage)
@@ -1401,36 +1401,40 @@ do_no_page(struct mm_struct *mm, struct
retry:
new_page = vma->vm_ops->nopage(vma, address & PAGE_MASK, &ret);
+ /* no page was available -- either SIGBUS or OOM */
+ if (new_page == NOPAGE_SIGBUS)
+ return VM_FAULT_SIGBUS;
+ if (new_page == NOPAGE_OOM)
+ return VM_FAULT_OOM;
+
/*
- * non-ram cannot be mapped via ->nopage, it must
- * be mapped via remap_page_range instead synchronously
- * in the ->mmap device driver callback.
- *
- * PageReserved pages can be mapped as far as they're under
- * a VM_RESERVED vma.
+ * ->nopage should return a PFN not a page_t if here we
+ * wanted to handle non-ram, though we've to make non-ram
+ * work with page_t too for a number of device drivers
+ * that may return non-ram via ->nopage.
*/
- BUG_ON(!pfn_valid(page_to_pfn(new_page)));
-
- /* ->nopage cannot return swapcache */
- BUG_ON(PageSwapCache(new_page));
- /* ->nopage cannot return anonymous pages */
- BUG_ON(PageAnon(new_page));
+ pageable = ram = pfn_valid(page_to_pfn(new_page));
+ if (likely(ram)) {
+ pageable = !PageReserved(new_page);
+ as = !!new_page->mapping;
+
+ BUG_ON(!pageable && as);
+
+ pageable &= as;
+
+ /* ->nopage cannot return swapcache */
+ BUG_ON(PageSwapCache(new_page));
+ /* ->nopage cannot return anonymous pages */
+ BUG_ON(PageAnon(new_page));
+ }
/*
* This is the entry point for memory under VM_RESERVED vmas.
* That memory will not be tracked by the vm. These aren't
* real anonymous pages, they're "device" reserved pages instead.
- * These pages under VM_RESERVED vmas are the only pages mapped
- * by the VM into userspace with page->as.mapping = NULL.
*/
- reserved = vma->vm_flags & VM_RESERVED;
- BUG_ON(!reserved && (!new_page->mapping || PageReserved(new_page)));
-
- /* no page was available -- either SIGBUS or OOM */
- if (new_page == NOPAGE_SIGBUS)
- return VM_FAULT_SIGBUS;
- if (new_page == NOPAGE_OOM)
- return VM_FAULT_OOM;
+ reserved = !!(vma->vm_flags & VM_RESERVED);
+ BUG_ON(reserved == pageable);
/*
* Should we do an early C-O-W break?
@@ -1438,6 +1442,8 @@ retry:
anon = 0;
if (write_access && !(vma->vm_flags & VM_SHARED)) {
struct page * page;
+ if (unlikely(!ram))
+ return VM_FAULT_SIGBUS;
if (unlikely(anon_vma_prepare(vma)))
goto oom;
page = alloc_page(GFP_HIGHUSER);
@@ -1460,7 +1466,8 @@ retry:
(unlikely(sequence != atomic_read(&mapping->truncate_count)))) {
sequence = atomic_read(&mapping->truncate_count);
spin_unlock(&mm->page_table_lock);
- page_cache_release(new_page);
+ if (likely(ram))
+ page_cache_release(new_page);
goto retry;
}
page_table = pte_offset_map(pmd, address);
@@ -1477,20 +1484,21 @@ retry:
*/
/* Only go through if we didn't race with anybody else... */
if (pte_none(*page_table)) {
- if (!PageReserved(new_page))
+ if (likely(ram && !PageReserved(new_page)))
++mm->rss;
flush_icache_page(vma, new_page);
entry = mk_pte(new_page, vma->vm_page_prot);
if (write_access)
entry = maybe_mkwrite(pte_mkdirty(entry), vma);
set_pte(page_table, entry);
- if (likely(!reserved))
+ if (likely(ram))
page_add_rmap(new_page, vma, address, anon);
pte_unmap(page_table);
} else {
/* One of our sibling threads was faster, back out. */
pte_unmap(page_table);
- page_cache_release(new_page);
+ if (likely(ram))
+ page_cache_release(new_page);
spin_unlock(&mm->page_table_lock);
goto out;
}
@@ -1502,7 +1510,8 @@ retry:
return ret;
oom:
- page_cache_release(new_page);
+ if (likely(ram))
+ page_cache_release(new_page);
ret = VM_FAULT_OOM;
goto out;
}
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 17:39 ` Linus Torvalds
@ 2004-03-20 17:56 ` Andrea Arcangeli
2004-03-20 18:22 ` William Lee Irwin III
2004-03-21 3:13 ` Chris Wedgwood
2 siblings, 0 replies; 105+ messages in thread
From: Andrea Arcangeli @ 2004-03-20 17:56 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton
On Sat, Mar 20, 2004 at 09:39:51AM -0800, Linus Torvalds wrote:
>
>
> On Sat, 20 Mar 2004, Andrea Arcangeli wrote:
> >
> > The only bugreport I've got so far for the latest anon_vma code is from
> > Jens, and it's a device driver bug in my opinion, but I'd like to have a
> > definitive confirmation from you about the ->nopage API.
>
> I'd say that this is definitely a driver bug.
>
> If a driver wants to map non-RAM pages, that's perfectly ok, but it MUST
> NOT happen through "nopage()". The driver should map them with
> "remap_page_range()", and thus never take a page fault for such pages at
> all.
>
> There is no reason to ever lazily map non-RAM pages - clearly they aren't
> using any "real memory", so there is no reason to not fill the page tables
> at mmap() time.
>
> In other words, the driver is horribly broken.
thanks for the clarification.
At the moment I'm not sure anymore if this was non-ram or a
VM_FAULT_SIGBUS because I noticed I was doing BUG_ON(!pfn_valid)
_before_ checking new_page == NOPAGE_SIGBUS. Though my theory about
do_no_page working fine with non-ram page_t with >=128m machines up to
2.6.4 still holds, and it's not obvious that Jens triggered a SIGBUS
either.
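
The ordering problem described above can be sketched in plain user-space C.
This is only a toy model, not the kernel code: toy_page, TOY_NOPAGE_SIGBUS,
TOY_NOPAGE_OOM, toy_mem_map[] and the toy_fault codes are names invented for
illustration. The point it tries to show is that the sentinel return values
have to be tested before anything treats the pointer as a real page.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these are the kernel's definitions. */
struct toy_page { unsigned long flags; };

#define TOY_NOPAGE_SIGBUS ((struct toy_page *)NULL)
#define TOY_NOPAGE_OOM    ((struct toy_page *)(uintptr_t)-1)

enum toy_fault {
    TOY_FAULT_MINOR,    /* a real page inside the toy mem_map[] */
    TOY_FAULT_SIGBUS,   /* handler reported "no page here" */
    TOY_FAULT_OOM,      /* handler reported out of memory */
    TOY_FAULT_NOT_RAM   /* pointer does not belong to mem_map[] */
};

#define TOY_NR_PAGES 8
static struct toy_page toy_mem_map[TOY_NR_PAGES];

/* True only for pointers that land inside the toy mem_map[] array. */
static int toy_page_in_mem_map(const struct toy_page *page)
{
    uintptr_t p  = (uintptr_t)page;
    uintptr_t lo = (uintptr_t)&toy_mem_map[0];
    uintptr_t hi = (uintptr_t)&toy_mem_map[TOY_NR_PAGES];

    return p >= lo && p < hi;
}

static enum toy_fault toy_classify(const struct toy_page *new_page)
{
    /* Sentinels first: they are not pages and must never be inspected. */
    if (new_page == TOY_NOPAGE_SIGBUS)
        return TOY_FAULT_SIGBUS;
    if (new_page == TOY_NOPAGE_OOM)
        return TOY_FAULT_OOM;

    /* Only now is it meaningful to ask whether this is RAM. */
    if (!toy_page_in_mem_map(new_page))
        return TOY_FAULT_NOT_RAM;

    return TOY_FAULT_MINOR;
}
```

If the mem_map test ran first, both sentinels would be reported as
TOY_FAULT_NOT_RAM, which is the confusion between a SIGBUS and a non-ram
page discussed here.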
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 17:39 ` Linus Torvalds
2004-03-20 17:56 ` Andrea Arcangeli
@ 2004-03-20 18:22 ` William Lee Irwin III
2004-03-21 3:13 ` Chris Wedgwood
2 siblings, 0 replies; 105+ messages in thread
From: William Lee Irwin III @ 2004-03-20 18:22 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrea Arcangeli, linux-kernel, Andrew Morton
On Sat, 20 Mar 2004, Andrea Arcangeli wrote:
>> The only bugreport I've got so far for the latest anon_vma code is from
>> Jens, and it's a device driver bug in my opinion, but I'd like to have a
>> definitive confirmation from you about the ->nopage API.
On Sat, Mar 20, 2004 at 09:39:51AM -0800, Linus Torvalds wrote:
> I'd say that this is definitely a driver bug.
> If a driver wants to map non-RAM pages, that's perfectly ok, but it MUST
> NOT happen through "nopage()". The driver should map them with
> "remap_page_range()", and thus never take a page fault for such pages at
> all.
> There is no reason to ever lazily map non-RAM pages - clearly they aren't
> using any "real memory", so there is no reason to not fill the page tables
> at mmap() time.
> In other words, the driver is horribly broken.
If our official story is prefaulting, there should be very little to do.
I'll grep around for drivers doing the wrong thing and see if any rmk's
not handling are in need of conversion from fault handling to
remap_page_range().
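
As a rough illustration of what "prefaulting" means here, the following
user-space sketch fills a toy page table completely at mapping time, so that
later accesses never find a missing entry. It is not kernel code: toy_vma,
toy_remap_range() and toy_access() are hypothetical names, and the PTE layout
is a simplification chosen for the example.

```c
#include <stddef.h>

#define TOY_PAGE_SHIFT  12
#define TOY_PTE_PRESENT 0x1UL
#define TOY_NPTES       16

/* Toy single-level "page table" for one mapping; invented for illustration. */
struct toy_vma {
    unsigned long ptes[TOY_NPTES];
    unsigned long nr_pages;
    unsigned long faults;       /* lazy faults taken so far */
};

/*
 * Eagerly fill every entry, in the spirit of calling remap_page_range()
 * from the driver's ->mmap callback. Returns 0 on success, -1 if the
 * request does not fit the toy table.
 */
static int toy_remap_range(struct toy_vma *vma, unsigned long first_pfn,
                           unsigned long nr_pages)
{
    unsigned long i;

    if (nr_pages > TOY_NPTES)
        return -1;
    for (i = 0; i < nr_pages; i++)
        vma->ptes[i] = ((first_pfn + i) << TOY_PAGE_SHIFT) | TOY_PTE_PRESENT;
    vma->nr_pages = nr_pages;
    return 0;
}

/*
 * Look up page idx of the mapping. Returns the pfn, or -1 if the index is
 * out of range or the entry is missing (the latter also counts as a fault).
 */
static long toy_access(struct toy_vma *vma, unsigned long idx)
{
    if (idx >= vma->nr_pages)
        return -1;
    if (!(vma->ptes[idx] & TOY_PTE_PRESENT)) {
        vma->faults++;
        return -1;
    }
    return (long)(vma->ptes[idx] >> TOY_PAGE_SHIFT);
}
```

Because every entry is present after toy_remap_range(), the fault counter
stays at zero and no fault handler ever has to produce a struct page for the
non-RAM range.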
-- wli
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 17:48 ` Andrea Arcangeli
@ 2004-03-20 19:03 ` Andrea Arcangeli
0 siblings, 0 replies; 105+ messages in thread
From: Andrea Arcangeli @ 2004-03-20 19:03 UTC (permalink / raw)
To: William Lee Irwin III, linux-kernel, Andrew Morton,
Linus Torvalds
On Sat, Mar 20, 2004 at 06:48:57PM +0100, Andrea Arcangeli wrote:
> be needed to hide the bug). This untested patch should make it working
> with non-ram too, so it sounds safer for the short term. I will test it
btw, that works only if NUMA is disabled. there's no way to do
page_to_pfn with a non-ram page with numa enabled since page_zone starts
by reading page->flags, only pte_pfn works (as I found from Martin's
oops).
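
A toy model of why this fails, written as user-space C. It assumes a
discontigmem-style layout where the zone is looked up from the top bits of
page->flags; all names (toy_zone, toy_zone_table, TOY_ZONE_SHIFT and so on)
are made up for the example and the real kernel macros may differ in detail.
What matters is that computing the pfn has to dereference the struct page.

```c
#include <stddef.h>

#define TOY_ZONE_SHIFT     24   /* toy: zone index kept above bit 24 of flags */
#define TOY_NR_ZONES       2
#define TOY_PAGES_PER_ZONE 4

struct toy_page { unsigned long flags; };

struct toy_zone {
    struct toy_page *zone_mem_map;   /* first struct page of this zone */
    unsigned long zone_start_pfn;    /* pfn of that first page */
};

static struct toy_page toy_pages[TOY_NR_ZONES][TOY_PAGES_PER_ZONE];
static struct toy_zone toy_zone_table[TOY_NR_ZONES];

/* Give each zone its own mem_map and a deliberately discontiguous pfn base. */
static void toy_init_zones(void)
{
    unsigned long z, i;

    for (z = 0; z < TOY_NR_ZONES; z++) {
        toy_zone_table[z].zone_mem_map = toy_pages[z];
        toy_zone_table[z].zone_start_pfn = z * 0x10000UL;
        for (i = 0; i < TOY_PAGES_PER_ZONE; i++)
            toy_pages[z][i].flags = z << TOY_ZONE_SHIFT;
    }
}

/* Reads page->flags: a bogus pointer would be dereferenced right here. */
static struct toy_zone *toy_page_zone(const struct toy_page *page)
{
    return &toy_zone_table[page->flags >> TOY_ZONE_SHIFT];
}

static unsigned long toy_page_to_pfn(const struct toy_page *page)
{
    struct toy_zone *zone = toy_page_zone(page);

    return (unsigned long)(page - zone->zone_mem_map) + zone->zone_start_pfn;
}
```

With a single flat mem_map the pfn is pure pointer arithmetic, so a pointer
outside the array still yields a number that pfn_valid() can reject. In the
model above the lookup itself touches the page, so there is nothing safe to
check beforehand.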
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 16:09 ` Russell King
@ 2004-03-20 19:44 ` Jaroslav Kysela
2004-03-20 22:23 ` Russell King
2004-03-22 4:43 ` Benjamin Herrenschmidt
0 siblings, 2 replies; 105+ messages in thread
From: Jaroslav Kysela @ 2004-03-20 19:44 UTC (permalink / raw)
To: Russell King; +Cc: LKML
On Sat, 20 Mar 2004, Russell King wrote:
> It is well known that virt_to_page() is only valid on virtual addresses
> which correspond to kernel direct mapped RAM pages, and undefined on
> everything else. Unfortunately, ALSA has been using it with
> pci_alloc_consistent() for a long time, and this behaviour is what
> makes ALSA broken. The fact it works on x86 is merely incidental.
It works on PPC as well (at least we have no error reports).
> If ALSA wants this functionality, the ALSA people should ideally have
> put their requirements forward during the 2.5 development cycle so the
> problem could be addressed.
Yes, I'm sorry about that, but the ->nopage usage was requested by Jeff
Garzik and we're not gurus for the VM stuff. Because we're probably the first
to start using this mapping scheme, it resulted in problems.
> However, luckily in this instance, it is not a big problem to solve.
> It just requires time to sort through all the abstraction layers upon
> abstraction layers which ALSA has.
>
> - and I'm doing exactly this, right now. Be patient. -
Thanks a lot.
Jaroslav
-----
Jaroslav Kysela <perex@suse.cz>
Linux Kernel Sound Maintainer
ALSA Project, SuSE Labs
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 15:06 ` Andrea Arcangeli
2004-03-20 15:27 ` William Lee Irwin III
2004-03-20 15:44 ` Russell King
@ 2004-03-20 20:13 ` Andrew Morton
2004-03-20 20:28 ` Andrea Arcangeli
2004-03-20 20:50 ` William Lee Irwin III
2 siblings, 2 replies; 105+ messages in thread
From: Andrew Morton @ 2004-03-20 20:13 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: wli, linux-kernel, torvalds
Andrea Arcangeli <andrea@suse.de> wrote:
>
> On Sat, Mar 20, 2004 at 06:40:22AM -0800, William Lee Irwin III wrote:
> > On Sat, Mar 20, 2004 at 02:30:25PM +0100, Andrea Arcangeli wrote:
> > > Anyways returning to the non-ram returned by ->nopage see the below
> > > email exchange with Jens. the bug triggering of course is the
> > > BUG_ON(!pfn_valid(page_to_pfn(new_page))).
> > > If we want to return non-ram, we could, but I believe we should change
> > > the API to return a pfn not a page_t * if we want to.
> >
> > This would be very helpful for other reasons also. There's a general
> > API issue with drivers that want or need to do this. The one I've
>
> I'm afraid I'll have to teach ->nopage how to deal with non-ram with
> this page_t API too (changing it to pfn sounds too intrusive in the
> short term), it seems to me that alsa can return non-ram (in the nopage
> callback there's a virt_to_page on some iomm region), and changing alsa
> to use remap_file_pages sounds too intrusive too.
I had a check for a valid pfn in page_add_rmap() for several weeks before I
actually removed the test. The debug check never triggered. But looking
at the code I don't see why not. Weird.
fyi, we don't need the check in page_referenced() and try_to_unmap()
because do_no_page() does not place pages on the LRU. It is the ->nopage
implementation which is responsible for that. Presumably the ALSA driver
was not adding the "page" to the LRU.
I agree that ->nopage implementations should not be doing what that driver
is doing. ->nopage is defined to return a page*: it's crazy to be
returning something from there which isn't covered by mem_map[].
I just don't think it's important enough to be able to cope with
non-mem_map[] "memory" in do_no_page(), so I agree that requiring ->mmap()
to synchronously instantiate the pte's and retaining the debug check in
do_no_page() is a good idea.
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 20:13 ` Andrew Morton
@ 2004-03-20 20:28 ` Andrea Arcangeli
2004-03-20 20:50 ` William Lee Irwin III
1 sibling, 0 replies; 105+ messages in thread
From: Andrea Arcangeli @ 2004-03-20 20:28 UTC (permalink / raw)
To: Andrew Morton; +Cc: wli, linux-kernel, torvalds
On Sat, Mar 20, 2004 at 12:13:45PM -0800, Andrew Morton wrote:
> fyi, we don't need the check in page_referenced() and try_to_unmap()
> because do_no_page() does not place pages on the LRU. It is the ->nopage
yes I've noticed.
> I agree that ->nopage implementations should not be doing what that driver
> is doing. ->nopage is defined to return a page*: it's crazy to be
> returning something from there which isn't covered by mem_map[].
I may have been wrong about that sorry, it's still not certain though,
but I had a bug in the code that would mistake a sigbus for a page_t
outside the mem_map, so it could have been a sigbus not a non-ram page.
Also in the meantime I noticed with NUMA it's impossible to handle
non-ram correctly in ->nopage, at least if using the current
page_to_pfn.
> I just don't think it's important enough to be able to cope with
> non-mem_map[] "memory" in do_no_page(), so I agree that requiring ->mmap()
> to synchronously instantiate the pte's and retaining the debug check in
> do_no_page() is a good idea.
I agree, I reinstated the debug check because we cannot handle
non-ram from there if it's numa (actually discontigmem). If alsa uses
non-ram pages it must be fixed, but I have a hope it was a sigbus
problem. We'll know more in a few more hours.
(btw, Martin definitely triggered the sigbus with numa, the 0x3()
dereference was the page_t address to read page->flags >> 24)
it's good to be reminded that the API cannot handle non-ram.
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 20:13 ` Andrew Morton
2004-03-20 20:28 ` Andrea Arcangeli
@ 2004-03-20 20:50 ` William Lee Irwin III
2004-03-20 22:26 ` Russell King
1 sibling, 1 reply; 105+ messages in thread
From: William Lee Irwin III @ 2004-03-20 20:50 UTC (permalink / raw)
To: Andrew Morton; +Cc: Andrea Arcangeli, linux-kernel, torvalds
On Sat, Mar 20, 2004 at 12:13:45PM -0800, Andrew Morton wrote:
> I agree that ->nopage implementations should not be doing what that driver
> is doing. ->nopage is defined to return a page*: it's crazy to be
> returning something from there which isn't covered by mem_map[].
> I just don't think it's important enough to be able to cope with
> non-mem_map[] "memory" in do_no_page(), so I agree that requiring ->mmap()
> to synchronously instantiate the pte's and retaining the debug check in
> do_no_page() is a good idea.
There are other reasons for doing it, e.g. unusual TLB attributes
and/or unusual pagetable structures backing the virtual region. I don't
see anyone standing up and screaming for more functionality than cache
coherency and/or disablement now, so as far as I'm concerned,
remap_area_pages() (or rmk's stuff) kills the issue.
-- wli
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 19:44 ` Jaroslav Kysela
@ 2004-03-20 22:23 ` Russell King
2004-03-20 22:45 ` William Lee Irwin III
2004-03-22 4:43 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 105+ messages in thread
From: Russell King @ 2004-03-20 22:23 UTC (permalink / raw)
To: Jaroslav Kysela, Linus Torvalds; +Cc: LKML
On Sat, Mar 20, 2004 at 08:44:44PM +0100, Jaroslav Kysela wrote:
> Yes, I'm sorry about that, but the ->nopage usage was requested by Jeff
> Garzik and we're not gurus for the VM stuff. Because we're probably the first
> to start using this mapping scheme, it resulted in problems.
Well, I've been told to effectively screw my idea by David Woodhouse,
so may I make the radical suggestion that rm -rf linux/sound would
also fix the problem. No, didn't think that was acceptable either.
Ok, so, how the fsck do we fix the sound drivers? How do we mmap()
memory provided by dma_alloc_coherent() into user space portably?
It appears from what David Woodhouse has been going on about, even
providing an architecture dma_coherent_to_page() interface isn't
acceptable.
If we can't answer that question, we might as well remove ALSA and
OSS from the kernel because they are abusing existing kernel
interfaces in ways which can not be solved.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 20:50 ` William Lee Irwin III
@ 2004-03-20 22:26 ` Russell King
2004-03-20 22:45 ` William Lee Irwin III
2004-03-22 6:36 ` William Lee Irwin III
0 siblings, 2 replies; 105+ messages in thread
From: Russell King @ 2004-03-20 22:26 UTC (permalink / raw)
To: William Lee Irwin III, Andrew Morton, Andrea Arcangeli,
linux-kernel, torvalds
On Sat, Mar 20, 2004 at 12:50:53PM -0800, William Lee Irwin III wrote:
> On Sat, Mar 20, 2004 at 12:13:45PM -0800, Andrew Morton wrote:
> > I agree that ->nopage implementations should not be doing what that driver
> > is doing. ->nopage is defined to return a page*: it's crazy to be
> > returning something from there which isn't covered by mem_map[].
> > I just don't think it's important enough to be able to cope with
> > non-mem_map[] "memory" in do_no_page(), so I agree that requiring ->mmap()
> > to synchronously instantiate the pte's and retaining the debug check in
> > do_no_page() is a good idea.
>
> There are other reasons for doing it, e.g. unusual TLB attributes
> and/or unusual pagetable structures backing the virtual region. I don't
> see anyone standing up and screaming for more functionality than cache
> coherency and/or disablement now, so as far as I'm concerned,
> remap_area_pages() (or rmk's stuff) kills the issue.
I'm no longer planning on this. In fact, I see a future where I tell
people who want to use sound on ARM to go screw themselves because
there doesn't seem to be an acceptable solution to this problem.
Of course, this will lead to dirty hacks by many people who *REQUIRE*
sound to work, but I guess we just don't care about that.
(Yes, I'm pissed off over this issue.)
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 22:26 ` Russell King
@ 2004-03-20 22:45 ` William Lee Irwin III
2004-03-21 20:45 ` David Woodhouse
2004-03-22 6:36 ` William Lee Irwin III
1 sibling, 1 reply; 105+ messages in thread
From: William Lee Irwin III @ 2004-03-20 22:45 UTC (permalink / raw)
To: rmk, Andrew Morton, Andrea Arcangeli, linux-kernel, torvalds
On Sat, Mar 20, 2004 at 12:50:53PM -0800, William Lee Irwin III wrote:
>> There are other reasons for doing it, e.g. unusual TLB attributes
>> and/or unusual pagetable structures backing the virtual region. I don't
>> see anyone standing up and screaming for more functionality than cache
>> coherency and/or disablement now, so as far as I'm concerned,
>> remap_area_pages() (or rmk's stuff) kills the issue.
On Sat, Mar 20, 2004 at 10:26:39PM +0000, Russell King wrote:
> I'm no longer planning on this. In fact, I see a future where I tell
> people who want to use sound on ARM to go screw themselves because
> there doesn't seem to be an acceptable solution to this problem.
> Of course, this will lead to dirty hacks by many people who *REQUIRE*
> sound to work, but I guess we just don't care about that.
> (Yes, I'm pissed off over this issue.)
This is the exact opposite of what I'd hoped would come of this discussion.
ISTR something about remap_area_pages() missing several pieces, but
I pretty much need some kind of clarification to know what. Well, that,
and I presumed your fixups for ALSA were headed toward mainline
regardless after coping with whatever issue dwmw2 had (e.g. returning
pfn's or something).
-- wli
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 22:23 ` Russell King
@ 2004-03-20 22:45 ` William Lee Irwin III
2004-03-20 23:54 ` Russell King
0 siblings, 1 reply; 105+ messages in thread
From: William Lee Irwin III @ 2004-03-20 22:45 UTC (permalink / raw)
To: rmk, Jaroslav Kysela, Linus Torvalds, LKML
On Sat, Mar 20, 2004 at 08:44:44PM +0100, Jaroslav Kysela wrote:
>> Yes, I'm sorry about that, but the ->nopage usage was requested by Jeff
>> Garzik and we're not gurus for the VM stuff. Because we're probably the first
>> to start using this mapping scheme, it resulted in problems.
On Sat, Mar 20, 2004 at 10:23:41PM +0000, Russell King wrote:
> Well, I've been told to effectively screw my idea by David Woodhouse,
> so may I make the radical suggestion that rm -rf linux/sound would
> also fix the problem. No, didn't think that was acceptable either.
> Ok, so, how the fsck do we fix the sound drivers? How do we mmap()
> memory provided by dma_alloc_coherent() into user space portably?
> It appears from what David Woodhouse has been going on about, even
> providing an architecture dma_coherent_to_page() interface isn't
> acceptable.
> If we can't answer that question, we might as well remove ALSA and
> OSS from the kernel because they are abusing existing kernel
> interfaces in ways which can not be solved.
Is there any possibility of an extension to remap_area_pages() that
could resolve this? I can't say I fully understood and/or remember
the issue with it that you pointed out.
-- wli
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 22:45 ` William Lee Irwin III
@ 2004-03-20 23:54 ` Russell King
2004-03-21 0:22 ` Zwane Mwaikambo
` (3 more replies)
0 siblings, 4 replies; 105+ messages in thread
From: Russell King @ 2004-03-20 23:54 UTC (permalink / raw)
To: William Lee Irwin III, Jaroslav Kysela, Linus Torvalds, LKML
On Sat, Mar 20, 2004 at 02:45:18PM -0800, William Lee Irwin III wrote:
> Is there any possibility of an extension to remap_area_pages() that
> could resolve this? I can't say I fully understood and/or remember
> the issue with it that you pointed out.
The issues are:
1. ALSA wants to mmap the buffer used to transfer data to/from the
card into user space. This buffer may be direct-mapped RAM,
memory allocated via dma_alloc_coherent(), an on-device buffer,
or anything else.
The user space mapping must likewise be DMA-coherent.
Currently, ALSA just does virt_to_page() on whatever address it
feels like in its nopage() function, which is obviously not
acceptable for two out of the three specific cases above.
2. ALSA wants to _coherently_ share data between the kernel-side
drivers, and user space ALSA library, mainly the DMA buffer
head/tail pointers so both kernel space and user space knows
when the buffer is full/empty.
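
To make the virt_to_page() limitation in issue 1 concrete, here is a small
user-space sketch. It is a toy model under stated assumptions: a flat
toy_mem_map[], a made-up direct-mapping base TOY_PAGE_OFFSET, and a checked
helper toy_virt_to_page() that does not exist in the kernel. The real
virt_to_page() does the same arithmetic without any range check, which is why
its result is undefined for addresses outside direct-mapped RAM.

```c
#include <stddef.h>

#define TOY_PAGE_SHIFT  12
#define TOY_PAGE_OFFSET 0xC0000000UL   /* start of the toy direct mapping */
#define TOY_NR_PAGES    1024UL         /* 4 MiB of toy RAM */

struct toy_page { unsigned long flags; };
static struct toy_page toy_mem_map[TOY_NR_PAGES];

/* Is this virtual address inside the direct-mapped RAM window? */
static int toy_virt_addr_valid(unsigned long vaddr)
{
    return vaddr >= TOY_PAGE_OFFSET &&
           ((vaddr - TOY_PAGE_OFFSET) >> TOY_PAGE_SHIFT) < TOY_NR_PAGES;
}

/*
 * Checked variant for illustration: returns NULL where the unchecked
 * arithmetic would index outside toy_mem_map[].
 */
static struct toy_page *toy_virt_to_page(unsigned long vaddr)
{
    if (!toy_virt_addr_valid(vaddr))
        return NULL;
    return &toy_mem_map[(vaddr - TOY_PAGE_OFFSET) >> TOY_PAGE_SHIFT];
}
```

An address handed out for an on-device buffer, or by an allocator that does
not use the direct mapping, would fall into the NULL case of this model. In
the unchecked version the caller would instead get a pointer to something
that is not a struct page at all.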
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 23:54 ` Russell King
@ 2004-03-21 0:22 ` Zwane Mwaikambo
2004-03-22 4:46 ` Benjamin Herrenschmidt
2004-03-21 0:23 ` William Lee Irwin III
` (2 subsequent siblings)
3 siblings, 1 reply; 105+ messages in thread
From: Zwane Mwaikambo @ 2004-03-21 0:22 UTC (permalink / raw)
To: Russell King; +Cc: William Lee Irwin III, Jaroslav Kysela, Linus Torvalds, LKML
On Sat, 20 Mar 2004, Russell King wrote:
> The issues are:
>
> 1. ALSA wants to mmap the buffer used to transfer data to/from the
> card into user space. This buffer may be direct-mapped RAM,
> memory allocated via dma_alloc_coherent(), an on-device buffer,
> or anything else.
>
> The user space mapping must likewise be DMA-coherent.
>
> Currently, ALSA just does virt_to_page() on whatever address it
> feels like in its nopage() function, which is obviously not
> acceptable for two out of the three specific cases above.
>
> 2. ALSA wants to _coherently_ share data between the kernel-side
> drivers, and user space ALSA library, mainly the DMA buffer
> head/tail pointers so both kernel space and user space knows
> when the buffer is full/empty.
Doesn't DRI also suffer from the same issues?
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 23:54 ` Russell King
2004-03-21 0:22 ` Zwane Mwaikambo
@ 2004-03-21 0:23 ` William Lee Irwin III
2004-03-21 9:52 ` Arjan van de Ven
2004-03-21 10:39 ` Jaroslav Kysela
3 siblings, 0 replies; 105+ messages in thread
From: William Lee Irwin III @ 2004-03-21 0:23 UTC (permalink / raw)
To: rmk, Jaroslav Kysela, Linus Torvalds, LKML
On Sat, Mar 20, 2004 at 11:54:45PM +0000, Russell King wrote:
> The issues are:
> 1. ALSA wants to mmap the buffer used to transfer data to/from the
> card into user space. This buffer may be direct-mapped RAM,
> memory allocated via dma_alloc_coherent(), an on-device buffer,
> or anything else.
> The user space mapping must likewise be DMA-coherent.
> Currently, ALSA just does virt_to_page() on whatever address it
> feels like in its nopage() function, which is obviously not
> acceptable for two out of the three specific cases above.
> 2. ALSA wants to _coherently_ share data between the kernel-side
> drivers, and user space ALSA library, mainly the DMA buffer
> head/tail pointers so both kernel space and user space knows
> when the buffer is full/empty.
Okay, so we've got these pinned down. So I've got two small ideas
(I mentioned them earlier, but maybe vger dropped the message):
(a) I think prefaulting should work for that in general, though the API
doesn't fit the extra things needed for e.g. DMA. Is there some way we
could extend remap_area_pages() (or provide an alternative interface to
similar functionality with the missing pieces included) to do the extra
things needed to make the coherency and/or DMA (or whatever else is
missing) work?
(b) Alternatively, would dma_coherent_to_pfn() instead of
dma_coherent_to_page() and making ->nopage() return pfns help salvage
the method using non-cachable and/or dma-coherent page protections in
vma->vm_page_prot?
-- wli
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 17:39 ` Linus Torvalds
2004-03-20 17:56 ` Andrea Arcangeli
2004-03-20 18:22 ` William Lee Irwin III
@ 2004-03-21 3:13 ` Chris Wedgwood
2004-03-21 6:23 ` Christoph Hellwig
2 siblings, 1 reply; 105+ messages in thread
From: Chris Wedgwood @ 2004-03-21 3:13 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrea Arcangeli, linux-kernel, Andrew Morton
On Sat, Mar 20, 2004 at 09:39:51AM -0800, Linus Torvalds wrote:
> If a driver wants to map non-RAM pages, that's perfectly ok, but it
> MUST NOT happen through "nopage()". The driver should map them with
> "remap_page_range()", and thus never take a page fault for such
> pages at all.
This is what the fetchop driver does.
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 3:13 ` Chris Wedgwood
@ 2004-03-21 6:23 ` Christoph Hellwig
2004-03-21 7:00 ` Chris Wedgwood
0 siblings, 1 reply; 105+ messages in thread
From: Christoph Hellwig @ 2004-03-21 6:23 UTC (permalink / raw)
To: Chris Wedgwood
Cc: Linus Torvalds, Andrea Arcangeli, linux-kernel, Andrew Morton
On Sat, Mar 20, 2004 at 07:13:55PM -0800, Chris Wedgwood wrote:
> > If a driver wants to map non-RAM pages, that's perfectly ok, but it
> > MUST NOT happen through "nopage()". The driver should map them with
> > "remap_page_range()", and thus never take a page fault for such
> > pages at all.
>
> This is what the fetchop driver does.
Not sure how you get to fetchop here, but that driver does map ram pages
so it should take pagefaults and not use remap_page_range().
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 6:23 ` Christoph Hellwig
@ 2004-03-21 7:00 ` Chris Wedgwood
0 siblings, 0 replies; 105+ messages in thread
From: Chris Wedgwood @ 2004-03-21 7:00 UTC (permalink / raw)
To: Christoph Hellwig, Linus Torvalds, Andrea Arcangeli, linux-kernel,
Andrew Morton
On Sun, Mar 21, 2004 at 06:23:22AM +0000, Christoph Hellwig wrote:
> Not sure how you get to fetchop here, but that driver does map ram
> pages so it should take pagefaults and not use remap_page_range().
It's been a while since I looked at this.... the fetchop driver maps
AMO space which is excluded from the EFI memory map (and any SHub
aliases) and thus shouldn't be touching anything normally considered
RAM.
<pause>
Checking the source I see:
if (remap_page_range(vm_start, __pa(maddr), PAGE_SIZE, vma->vm_page_prot)) {
fetchop_free_pages(vma->vm_private_data);
vfree(vdata);
fetchop_update_stats(-1, -pages);
return -EAGAIN;
}
as part of the driver's 'mmap fop'. The underlying page is actually
from region-6 so I'm pretty sure it's safe. If you think it is doing
something weird please let me know.
--cw
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 23:54 ` Russell King
2004-03-21 0:22 ` Zwane Mwaikambo
2004-03-21 0:23 ` William Lee Irwin III
@ 2004-03-21 9:52 ` Arjan van de Ven
2004-03-21 10:39 ` Jaroslav Kysela
3 siblings, 0 replies; 105+ messages in thread
From: Arjan van de Ven @ 2004-03-21 9:52 UTC (permalink / raw)
To: Russell King; +Cc: William Lee Irwin III, Jaroslav Kysela, Linus Torvalds, LKML
On Sun, 2004-03-21 at 00:54, Russell King wrote:
> On Sat, Mar 20, 2004 at 02:45:18PM -0800, William Lee Irwin III wrote:
> > Is there any possibility of an extension to remap_area_pages() that
> > could resolve this? I can't say I fully understood and/or remember
> > the issue with it that you pointed out.
>
> The issues are:
>
> 1. ALSA wants to mmap the buffer used to transfer data to/from the
> card into user space. This buffer may be direct-mapped RAM,
> memory allocated via dma_alloc_coherent(), an on-device buffer,
> or anything else.
fwiw an ideal DRI/DRM driver would do the same with the video card's
ring buffer so the problem isn't unique to alsa, it's a generic issue.
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 23:54 ` Russell King
` (2 preceding siblings ...)
2004-03-21 9:52 ` Arjan van de Ven
@ 2004-03-21 10:39 ` Jaroslav Kysela
3 siblings, 0 replies; 105+ messages in thread
From: Jaroslav Kysela @ 2004-03-21 10:39 UTC (permalink / raw)
To: Russell King; +Cc: William Lee Irwin III, Linus Torvalds, LKML
On Sat, 20 Mar 2004, Russell King wrote:
> The issues are:
>
> 1. ALSA wants to mmap the buffer used to transfer data to/from the
> card into user space. This buffer may be direct-mapped RAM,
> memory allocated via dma_alloc_coherent(), an on-device buffer,
> or anything else.
We don't need to remap the mmio ring buffer (actually only RME32 has
a PCI memory window with the ring buffer, but this driver uses
memcpy_(to|from)io already). So, we need to remap RAM and DMA pages
(should be special RAM also) only.
> The user space mapping must likewise be DMA-coherent.
>
> Currently, ALSA just does virt_to_page() on whatever address it
> feels like in its nopage() function, which is obviously not
> acceptable for two out of the three specific cases above.
Yes.
> 2. ALSA wants to _coherently_ share data between the kernel-side
> drivers, and user space ALSA library, mainly the DMA buffer
> head/tail pointers so both kernel space and user space knows
> when the buffer is full/empty.
Yes.
Jaroslav
-----
Jaroslav Kysela <perex@suse.cz>
Linux Kernel Sound Maintainer
ALSA Project, SuSE Labs
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 22:45 ` William Lee Irwin III
@ 2004-03-21 20:45 ` David Woodhouse
2004-03-21 20:49 ` Christoph Hellwig
0 siblings, 1 reply; 105+ messages in thread
From: David Woodhouse @ 2004-03-21 20:45 UTC (permalink / raw)
To: William Lee Irwin III
Cc: rmk, Andrew Morton, Andrea Arcangeli, linux-kernel, torvalds
On Sat, 2004-03-20 at 14:45 -0800, William Lee Irwin III wrote:
> This is the exact opposite of what I'd hoped would come of this discussion.
> ISTR something about remap_area_pages() missing several pieces, but
> I pretty much need some kind of clarification to know what. Well, that,
> and I presumed your fixups for ALSA were headed toward mainline
> regardless after coping with whatever issue dwmw2 had (e.g. returning
> pfn's or something).
My request was that we shouldn't assume an architecture will have a
'struct page' corresponding to whatever it chooses to return from
dma_alloc_coherent().
There are machines where DMA to/from main memory _cannot_ be coherent
but we have some memory elsewhere, perhaps some SRAM which itself is
hanging off an I/O bus somewhere, which can be used. One of my toys is
currently running with dma_alloc_coherent() giving out memory from a PCI
video card, in fact.
Using a PFN should be OK.
--
dwmw2
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 20:45 ` David Woodhouse
@ 2004-03-21 20:49 ` Christoph Hellwig
2004-03-21 20:57 ` David Woodhouse
0 siblings, 1 reply; 105+ messages in thread
From: Christoph Hellwig @ 2004-03-21 20:49 UTC (permalink / raw)
To: David Woodhouse
Cc: William Lee Irwin III, rmk, Andrew Morton, Andrea Arcangeli,
linux-kernel, torvalds
On Sun, Mar 21, 2004 at 08:45:14PM +0000, David Woodhouse wrote:
> There are machines where DMA to/from main memory _cannot_ be coherent
> but we have some memory elsewhere, perhaps some SRAM which itself is
> hanging off an I/O bus somewhere, which can be used. One of my toys is
> currently running with dma_alloc_coherent() giving out memory from a PCI
> video card, in fact.
>
> Using a PFN should be OK.
And what exactly is a PFN without associated struct page supposed to mean?
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 20:49 ` Christoph Hellwig
@ 2004-03-21 20:57 ` David Woodhouse
2004-03-21 21:53 ` Linus Torvalds
0 siblings, 1 reply; 105+ messages in thread
From: David Woodhouse @ 2004-03-21 20:57 UTC (permalink / raw)
To: Christoph Hellwig
Cc: William Lee Irwin III, rmk, Andrew Morton, Andrea Arcangeli,
linux-kernel, torvalds
On Sun, 2004-03-21 at 20:49 +0000, Christoph Hellwig wrote:
> And what exactly is a PFN without associated struct page supposed to mean?
It's something you can put into a PTE, and that's about it. Which unless
I'm misunderstanding ALSA/rmk's requirements, should be enough.
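
A minimal sketch of that idea in user-space C, assuming a simplified PTE
layout (frame number above bit 12, protection bits below). The toy_ names are
invented for the example; the kernel's own pfn_pte() and pte_pfn() helpers
use architecture-specific layouts. Nothing in the model needs a struct page.

```c
#define TOY_PAGE_SHIFT 12
#define TOY_PROT_MASK  ((1UL << TOY_PAGE_SHIFT) - 1)

typedef unsigned long toy_pte_t;

/* Build a toy PTE from a frame number and protection bits. */
static toy_pte_t toy_pfn_pte(unsigned long pfn, unsigned long prot)
{
    return (pfn << TOY_PAGE_SHIFT) | (prot & TOY_PROT_MASK);
}

/* Recover the frame number from a toy PTE. */
static unsigned long toy_pte_pfn(toy_pte_t pte)
{
    return pte >> TOY_PAGE_SHIFT;
}

/* Recover the protection bits from a toy PTE. */
static unsigned long toy_pte_prot(toy_pte_t pte)
{
    return pte & TOY_PROT_MASK;
}
```

The round trip works for any frame number that fits the PTE, whether or not
that frame is covered by mem_map[]. What it cannot provide is anything that
relies on per-page state such as a reference count.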
--
dwmw2
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 20:57 ` David Woodhouse
@ 2004-03-21 21:53 ` Linus Torvalds
2004-03-21 22:17 ` Jeff Garzik
` (2 more replies)
0 siblings, 3 replies; 105+ messages in thread
From: Linus Torvalds @ 2004-03-21 21:53 UTC (permalink / raw)
To: David Woodhouse
Cc: Christoph Hellwig, William Lee Irwin III, rmk, Andrew Morton,
Andrea Arcangeli, linux-kernel
On Sun, 21 Mar 2004, David Woodhouse wrote:
>
> On Sun, 2004-03-21 at 20:49 +0000, Christoph Hellwig wrote:
> > And what exactly is a PFN without associated struct page supposed to mean?
>
> It's something you can put into a PTE, and that's about it. Which unless
> I'm misunderstanding ALSA/rmk's requirements, should be enough.
It would really be wrong to have nopage() return a pte. The thing is, the
VM really works on "struct page", all over the map. It does things like
"page_cache_release()" on the page if the file-backed VMA has been
truncated, and it just knows that the return value from "nopage()" has
_structure_.
Some architectures have per-page flags for things like "this page may need
to have icache flushed from it" etc.
So I really put my veto on "nopage()" returning a PFN. That's just wrong,
wrong, wrong. It returns a "struct page" pointer, and it has lots of
reasons for that.
And none of the reasons for _not_ doing it are valid, since such a user
can just pre-populate the page tables anyway.
So don't go there.
Linus
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 21:53 ` Linus Torvalds
@ 2004-03-21 22:17 ` Jeff Garzik
2004-03-21 22:23 ` David Woodhouse
2004-03-21 22:23 ` Russell King
2 siblings, 0 replies; 105+ messages in thread
From: Jeff Garzik @ 2004-03-21 22:17 UTC (permalink / raw)
To: Linus Torvalds
Cc: David Woodhouse, Christoph Hellwig, William Lee Irwin III, rmk,
Andrew Morton, Andrea Arcangeli, linux-kernel
I wonder if we could jump back a step...
Years ago, I wanted to avoid remap_page_range() when I was writing
via82cxxx_audio.c, and so Linus suggested the ->nopage approach (which I
liked, and which is still present today in the sound/oss dir).
AFAICS device drivers have three needs that keep getting reinvented over
and over again, WRT mmap(2):
1) letting userspace directly address a region allocated by the kernel
DMA APIs
2) ditto, for MMIO (ioremap)
3) ditto, for PIO (inl/outl)
Alas, #3 must be faked on x86[-64], but this is done anyway for e.g.
mmap'd PCI config access. Many platforms implement in[bwl] essentially
as read[bwl], so for them mmap'd PIO is easy.
#1-3 above are really what device drivers want to do. My
suggestion/request to the VM wizards would be to directly provide mmap
helpers for dma/mmio/pio, that Does The Right Thing. And require their
use in every driver. Don't give driver writers the opportunity to think
about this stuff and/or screw it up.
If there are special DMA requirements of a particular bus or platform,
hide that in there. If some methods of DMA or MMIO or PIO do not lend
themselves to directly mapping to a struct page, the MM guys may dicker
about the interface, but the device driver guys just want #1-3 and don't
really care :) Either it's directly addressable [via some page table
magic] from userland, or it isn't.
So please forgive the tangent, but this thread is IMO talking more about
implementation than the real problem :) pci_dma_mmap() helper or
something like it should be the only thing the driver should care about.
I'm tired of the same platform bugs and issues, in mmap handlers,
reappearing over and over again... Tired of platform-specific ifdefs in
mmap-capable drivers, too.
Jeff
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 21:53 ` Linus Torvalds
2004-03-21 22:17 ` Jeff Garzik
@ 2004-03-21 22:23 ` David Woodhouse
2004-03-21 22:23 ` Russell King
2 siblings, 0 replies; 105+ messages in thread
From: David Woodhouse @ 2004-03-21 22:23 UTC (permalink / raw)
To: Linus Torvalds
Cc: Christoph Hellwig, William Lee Irwin III, rmk, Andrew Morton,
Andrea Arcangeli, linux-kernel
On Sun, 2004-03-21 at 13:53 -0800, Linus Torvalds wrote:
> So I really put my veto on "nopage()" returning a PFN. That's just wrong,
> wrong, wrong. It returns a "struct page" pointer, and it has lots of
> reasons for that.
That's fine -- I wasn't suggesting nopage() should return a PFN.
I was suggesting that if someone wants to map something they're given by
dma_alloc_coherent() into memory, they should be given a PFN to deal
with -- _not_ a "struct page". Therefore, you can't use nopage() for
mapping dma_coherent memory into userspace.
Basically, we should consider the stuff returned by dma_alloc_coherent
to be 'non-RAM' in the context of your previous statement:
'If a driver wants to map non-RAM pages, that's perfectly ok,
but it MUST NOT happen through "nopage()".'
There are machines where you _cannot_ sensibly use host memory for
dma_coherent() allocations, but on which there _is_ a few megabytes of
SRAM hanging off the PCI bus which was put there specifically for that
purpose. So dma_alloc_coherent() returns something for which there is
not a valid 'struct page'.
--
dwmw2
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 21:53 ` Linus Torvalds
2004-03-21 22:17 ` Jeff Garzik
2004-03-21 22:23 ` David Woodhouse
@ 2004-03-21 22:23 ` Russell King
2004-03-21 22:34 ` Jeff Garzik
2 siblings, 1 reply; 105+ messages in thread
From: Russell King @ 2004-03-21 22:23 UTC (permalink / raw)
To: Linus Torvalds
Cc: David Woodhouse, Christoph Hellwig, William Lee Irwin III,
Andrew Morton, Andrea Arcangeli, linux-kernel
On Sun, Mar 21, 2004 at 01:53:41PM -0800, Linus Torvalds wrote:
> So I really put my veto on "nopage()" returning a PFN. That's just wrong,
> wrong, wrong. It returns a "struct page" pointer, and it has lots of
> reasons for that.
Ok then. We leave nopage() as is, and define that for returning RAM
backed pages.
We also have a fault() handler which is used for faulting in driver
mappings, which returns a PFN suitable for set_pte(). The fault()
would be separate from do_no_page() in much the same way as
do_anonymous_page() is separate, and it knows that PFNs returned
from this have nothing to do with struct pages. All it does is
set the relevant PTE entry in the page tables to create the mapping.
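To make the distinction from nopage() concrete, here is a toy userspace
model of that idea. It is a sketch only, not kernel code: toy_vma,
toy_handle_fault(), PTE_PRESENT and the 16-entry table are all made up
for illustration. The point is just that the handler hands back a PFN
and the core writes it into the PTE without ever looking at a struct
page.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT   12
#define PAGE_SIZE    (1UL << PAGE_SHIFT)
#define PAGE_MASK    (~(PAGE_SIZE - 1))
#define PTE_PRESENT  0x1UL   /* hypothetical "present" bit */
#define MAX_PTES     16      /* size of the toy page table */

struct toy_vma {
    unsigned long vm_start;
    unsigned long vm_end;
    /* hypothetical ->fault(): returns a PFN, not a struct page */
    unsigned long (*fault)(struct toy_vma *vma, unsigned long address);
    unsigned long base_pfn;       /* driver-private: first PFN of the buffer */
    unsigned long pte[MAX_PTES];  /* toy page table, one slot per page */
};

/* Example driver handler for a physically contiguous buffer. */
static unsigned long toy_driver_fault(struct toy_vma *vma, unsigned long address)
{
    return vma->base_pfn + ((address - vma->vm_start) >> PAGE_SHIFT);
}

/*
 * Toy core path: ask the driver for a PFN and install it in the PTE.
 * Returns 0 on success, -1 if the address is outside the vma.
 */
static int toy_handle_fault(struct toy_vma *vma, unsigned long address)
{
    unsigned long idx, pfn;

    assert(vma != NULL && vma->fault != NULL);
    if (address < vma->vm_start || address >= vma->vm_end)
        return -1;
    idx = (address - vma->vm_start) >> PAGE_SHIFT;
    if (idx >= MAX_PTES)
        return -1;
    pfn = vma->fault(vma, address & PAGE_MASK);
    vma->pte[idx] = (pfn << PAGE_SHIFT) | PTE_PRESENT;
    return 0;
}
```

Nothing in this path takes a reference count or touches per-page flags,
which is the difference from the nopage() path.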
I don't think remap_area_pages() solves the problem - think about
the DMA ring buffer returned by dma_alloc_coherent(). This returns
an architecturally defined virtual address and a DMA address.
Neither of these two addresses can be converted today to a struct
page or a PFN. Sure, we can invent some architecture defined
interface to get hold of this information, but take a moment to
consider all the cases where this type of activity goes on.
What about the case where the buffer is scatter-gather in nature,
just like we're so fond of telling driver writers who want to grab
(eg) 1MB of contiguous kernel memory for video buffers and the like?
Do we really want to tell driver writers to walk over 1MB of pages,
page by page, inserting them into the process's page tables via
remap_area_pages()?
Or does the ->fault() method make sense in all these cases?
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 22:23 ` Russell King
@ 2004-03-21 22:34 ` Jeff Garzik
2004-03-21 22:42 ` David Woodhouse
2004-03-21 22:51 ` Russell King
0 siblings, 2 replies; 105+ messages in thread
From: Jeff Garzik @ 2004-03-21 22:34 UTC (permalink / raw)
To: Russell King
Cc: Linus Torvalds, David Woodhouse, Christoph Hellwig,
William Lee Irwin III, Andrew Morton, Andrea Arcangeli,
linux-kernel
Russell King wrote:
> What about the case where the buffer is scatter-gather in nature,
> just like we're so fond of telling driver writers who want to grab
> (eg) 1MB of contiguous kernel memory for video buffers and the like?
> Do we really want to tell driver writers to walk over 1MB of pages,
> page by page, inserting them into the process's page tables via
> remap_area_pages()?
Tell driver writers to call a standard platform function with a
{dma|mmio|pio|vmalloc} handle+size+len for {dma|mmio|pio|vmalloc} mmap
setup, and {fault|nopage} handler. ;-) IMO they shouldn't have to care
about the details.
Jeff
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 22:34 ` Jeff Garzik
@ 2004-03-21 22:42 ` David Woodhouse
2004-03-21 23:06 ` Jeff Garzik
2004-03-21 22:51 ` Russell King
1 sibling, 1 reply; 105+ messages in thread
From: David Woodhouse @ 2004-03-21 22:42 UTC (permalink / raw)
To: Jeff Garzik
Cc: Russell King, Linus Torvalds, Christoph Hellwig,
William Lee Irwin III, Andrew Morton, Andrea Arcangeli,
linux-kernel
On Sun, 2004-03-21 at 17:34 -0500, Jeff Garzik wrote:
> Tell driver writers to call a standard platform function with a
> {dma|mmio|pio|vmalloc} handle+size+len for {dma|mmio|pio|vmalloc} mmap
> setup, and {fault|nopage} handler. ;-) IMO they shouldn't have to care
> about the details.
Don't let drivers see the {fault|nopage} handler. On most arches it can
probably continue to be nopage(); other arches may use the
newly-proposed fault() or perhaps just put all the PTEs in place up
front. The driver shouldn't be given an opportunity to care.
--
dwmw2
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 22:34 ` Jeff Garzik
2004-03-21 22:42 ` David Woodhouse
@ 2004-03-21 22:51 ` Russell King
2004-03-21 23:09 ` Jeff Garzik
2004-03-21 23:11 ` Linus Torvalds
1 sibling, 2 replies; 105+ messages in thread
From: Russell King @ 2004-03-21 22:51 UTC (permalink / raw)
To: Jeff Garzik
Cc: Linus Torvalds, David Woodhouse, Christoph Hellwig,
William Lee Irwin III, Andrew Morton, Andrea Arcangeli,
linux-kernel
On Sun, Mar 21, 2004 at 05:34:01PM -0500, Jeff Garzik wrote:
> Russell King wrote:
> > What about the case where the buffer is scatter-gather in nature,
> > just like we're so fond of telling driver writers who want to grab
> > (eg) 1MB of contiguous kernel memory for video buffers and the like?
> > Do we really want to tell driver writers to walk over 1MB of pages,
> > page by page, inserting them into the process's page tables via
> > remap_area_pages()?
>
> Tell driver writers to call a standard platform function with a
> {dma|mmio|pio|vmalloc} handle+size+len for {dma|mmio|pio|vmalloc} mmap
> setup, and {fault|nopage} handler. ;-) IMO they shouldn't have to care
> about the details.
I don't think this addresses the scatter-gather case I mentioned above.
Or if it does, we've rewritten ALSA beforehand to use Linux scatterlists
alongside several dma_alloc_coherent mappings and have the ability to
mmap these as well.
Remember that we're fond of telling driver writers to use scatter-gather
lists rather than grabbing one large contiguous memory chunk... So
they did exactly as we told them: they used pci_alloc_consistent and/or
dma_alloc_coherent and built their own scatter lists.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 22:42 ` David Woodhouse
@ 2004-03-21 23:06 ` Jeff Garzik
0 siblings, 0 replies; 105+ messages in thread
From: Jeff Garzik @ 2004-03-21 23:06 UTC (permalink / raw)
To: David Woodhouse
Cc: Russell King, Linus Torvalds, Christoph Hellwig,
William Lee Irwin III, Andrew Morton, Andrea Arcangeli,
linux-kernel
David Woodhouse wrote:
> On Sun, 2004-03-21 at 17:34 -0500, Jeff Garzik wrote:
>
>>Tell driver writers to call a standard platform function with a
>>{dma|mmio|pio|vmalloc} handle+size+len for {dma|mmio|pio|vmalloc} mmap
>>setup, and {fault|nopage} handler. ;-) IMO they shouldn't have to care
>>about the details.
>
>
> Don't let drivers see the {fault|nopage} handler. On most arches it can
> probably continue to be nopage(); other arches may use the
> newly-proposed fault() or perhaps just put all the PTEs in place up
> front. The driver shouldn't be given an opportunity to care.
If that's possible within the MM APIs... certainly. Have a standard
struct vm_operations_struct for dma, dma s/g, mmio, ... I presume?
Jeff
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 22:51 ` Russell King
@ 2004-03-21 23:09 ` Jeff Garzik
2004-03-21 23:11 ` Linus Torvalds
1 sibling, 0 replies; 105+ messages in thread
From: Jeff Garzik @ 2004-03-21 23:09 UTC (permalink / raw)
To: Russell King
Cc: Linus Torvalds, David Woodhouse, Christoph Hellwig,
William Lee Irwin III, Andrew Morton, Andrea Arcangeli,
linux-kernel
Russell King wrote:
> I don't think this addresses the scatter-gather case I mentioned above.
> Or if it does, we've rewritten ALSA beforehand to use Linux scatterlists
> alongside several dma_alloc_coherent mappings and have the ability to
> mmap these as well.
>
> Remember that we're fond of telling driver writers to use scatter-gather
> lists rather than grabbing one large contiguous memory chunk... So
> they did exactly as we told them: they used pci_alloc_consistent and/or
> dma_alloc_coherent and built their own scatter lists.
Agreed... though IMO that can be handled by considering DMA S/G as just
one more set of helper functions that the driver writer should not have
to implement ;) dma_sg_setup_mmap() could function as a peer alongside
dma_setup_mmap(), mmio_setup_mmap(), etc. Providing such to driver
writers gives them incentive to use S/G lists as well as incentive not
to invent their own mmap(2) setup and handling code.
Jeff
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 22:51 ` Russell King
2004-03-21 23:09 ` Jeff Garzik
@ 2004-03-21 23:11 ` Linus Torvalds
2004-03-21 23:22 ` Jeff Garzik
2004-03-21 23:45 ` Russell King
1 sibling, 2 replies; 105+ messages in thread
From: Linus Torvalds @ 2004-03-21 23:11 UTC (permalink / raw)
To: Russell King
Cc: Jeff Garzik, David Woodhouse, Christoph Hellwig,
William Lee Irwin III, Andrew Morton, Andrea Arcangeli,
linux-kernel
On Sun, 21 Mar 2004, Russell King wrote:
>
> Remember that we're fond of telling driver writers to use scatter-gather
> lists rather than grabbing one large contiguous memory chunk... So
> they did exactly as we told them: they used pci_alloc_consistent and/or
> dma_alloc_coherent and built their own scatter lists.
I do think that we should introduce a "map_dma_coherent()" thing, which
basically takes a list of pages that have been allocated by
dma_alloc_coherent(), and remaps them into user space. How hard can that
be?
In fact, on a lot of architectures (well, at least x86, and likely
anything else that doesn't use any IOTLB and just allocates a chunk of
physical memory), I think the "map_dma_coherent()" thing should basically
just become a "remap_page_range()". Ie something like
#define map_dma_coherent(vma, vaddr, len) \
remap_page_range(vma, vma->vm_start, __pa(vaddr), len, vma->vm_page_prot)
for the simple case.
Ehh?
Linus
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 23:11 ` Linus Torvalds
@ 2004-03-21 23:22 ` Jeff Garzik
2004-03-21 23:51 ` Linus Torvalds
2004-03-21 23:45 ` Russell King
1 sibling, 1 reply; 105+ messages in thread
From: Jeff Garzik @ 2004-03-21 23:22 UTC (permalink / raw)
To: Linus Torvalds
Cc: Russell King, David Woodhouse, Christoph Hellwig,
William Lee Irwin III, Andrew Morton, Andrea Arcangeli,
linux-kernel
Linus Torvalds wrote:
> In fact, on a lot of architectures (well, at least x86, and likely
> anything else that doesn't use any IOTLB and just allocates a chunk of
> physical memory), I think the "map_dma_coherent()" thing should basically
> just become a "remap_page_range()". Ie something like
>
> #define map_dma_coherent(vma, vaddr, len) \
> remap_page_range(vma, vma->vm_start, __pa(vaddr), len, vma->vm_page_prot)
>
> for the simple case.
That would be nice, though the reason I avoided remap_page_range() in
via82cxxx_audio is that it discourages S/G. Because remap_page_range()
is easier and more portable, several drivers allocate one-big-area and
then create an S/G list describing individual portions of that area.
I want to avoid that. Most decent h/w is s/g these days.
Jeff
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 23:11 ` Linus Torvalds
2004-03-21 23:22 ` Jeff Garzik
@ 2004-03-21 23:45 ` Russell King
2004-03-22 0:23 ` William Lee Irwin III
1 sibling, 1 reply; 105+ messages in thread
From: Russell King @ 2004-03-21 23:45 UTC (permalink / raw)
To: Linus Torvalds
Cc: Jeff Garzik, David Woodhouse, Christoph Hellwig,
William Lee Irwin III, Andrew Morton, Andrea Arcangeli,
linux-kernel
On Sun, Mar 21, 2004 at 03:11:58PM -0800, Linus Torvalds wrote:
> On Sun, 21 Mar 2004, Russell King wrote:
> > Remember that we're fond of telling driver writers to use scatter-gather
> > lists rather than grabbing one large contiguous memory chunk... So
> > they did exactly as we told them: they used pci_alloc_consistent and/or
> > dma_alloc_coherent and built their own scatter lists.
>
> I do think that we should introduce a "map_dma_coherent()" thing, which
> basically takes a list of pages that have been allocated by
> dma_alloc_coherent(), and remaps them into user space. How hard can that
> be?
>
> In fact, on a lot of architectures (well, at least x86, and likely
> anything else that doesn't use any IOTLB and just allocates a chunk of
> physical memory), I think the "map_dma_coherent()" thing should basically
> just become a "remap_page_range()". Ie something like
>
> #define map_dma_coherent(vma, vaddr, len) \
> remap_page_range(vma, vma->vm_start, __pa(vaddr), len, vma->vm_page_prot)
>
> for the simple case.
Ok, splitting hairs, for the coherent contiguous case, what about:
int dma_coherent_map(struct vm_area_struct *vma, void *cpu_addr,
dma_addr_t dma_addr, size_t size);
and x86 would be:
#define dma_coherent_map(vma,cpu_addr,dma_addr,size) \
remap_page_range(vma, vma->vm_start, __pa(cpu_addr), \
size, vma->vm_page_prot)
This then leaves the PCI BAR case and the DMA coherent SG buffer case,
though neither of those fall within my personal problem space at present.
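The x86 macro above leans on the buffer sitting in the kernel's direct
mapping, so that __pa() is plain arithmetic and dma_addr goes unused.
A rough userspace sketch of that arithmetic, plus the sanity checks a
real helper would presumably want, follows. It is a model, not a
proposed implementation: vma_model, model_pa(), model_dma_coherent_map()
and the MODEL_PAGE_OFFSET value are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT         12
#define PAGE_SIZE          (1UL << PAGE_SHIFT)
#define MODEL_PAGE_OFFSET  0xC0000000UL  /* assumed base of the direct mapping */

struct vma_model {
    unsigned long vm_start;
    unsigned long vm_end;
};

/* Stand-in for __pa(): only valid for direct-mapped addresses. */
static unsigned long model_pa(unsigned long cpu_addr)
{
    assert(cpu_addr >= MODEL_PAGE_OFFSET);
    return cpu_addr - MODEL_PAGE_OFFSET;
}

/*
 * Work out what would be handed to remap_page_range(): the physical
 * start address and the number of pages. Returns 0 on success, -1 if
 * the buffer is empty, unaligned, or larger than the vma.
 */
static int model_dma_coherent_map(const struct vma_model *vma,
                                  unsigned long cpu_addr, size_t size,
                                  unsigned long *phys,
                                  unsigned long *nr_pages)
{
    unsigned long pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;

    if (size == 0 || (cpu_addr & (PAGE_SIZE - 1)))
        return -1;
    if (pages > ((vma->vm_end - vma->vm_start) >> PAGE_SHIFT))
        return -1;
    *phys = model_pa(cpu_addr);
    *nr_pages = pages;
    return 0;
}
```

On an architecture where the coherent buffer is not in the direct
mapping, model_pa() is exactly the step that stops working, which is
why the cpu_addr/dma_addr pair is passed through to the arch code.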
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 23:22 ` Jeff Garzik
@ 2004-03-21 23:51 ` Linus Torvalds
2004-03-21 23:58 ` Russell King
` (2 more replies)
0 siblings, 3 replies; 105+ messages in thread
From: Linus Torvalds @ 2004-03-21 23:51 UTC (permalink / raw)
To: Jeff Garzik
Cc: Russell King, David Woodhouse, Christoph Hellwig,
William Lee Irwin III, Andrew Morton, Andrea Arcangeli,
linux-kernel
On Sun, 21 Mar 2004, Jeff Garzik wrote:
>
> That would be nice, though the reason I avoided remap_page_range() in
> via82cxxx_audio is that it discourages S/G. Because remap_page_range()
> is easier and more portable, several drivers allocate one-big-area and
> then create an S/G list describing individual portions of that area.
Note that there are really two different kinds of IO memory:
- real IO-mapped memory on the other side of a bus
- real RAM which is on the CPU side of the bus, but that has additionally
been "mapped" some way as to be visible from devices.
The second kind is what you seem to be talking about, and it actually
_does_ have a "struct page" associated with it, and as such you can
happily return it from "nopage()". It's just that you had better be sure
that you find the page properly. Just doing a "virt_to_page()" doesn't do
it - you have to make sure to undo the mapping that was done for DMA
reasons.
So the minimal fix for any misuses would be to just have a
"dma_map_to_page()" reverse mapping for "dma_alloc_coherent()". For x86,
that's just the same thing as "virt_to_page()". For others, you have to
look more carefully at undoing whatever mapping the iommu has been set up
for.
That might be the minimal fix, since it would basically involve:
- change whatever offensive "virt_to_page()" calls into
"dma_map_to_page()".
- implement "dma_map_to_page()" for all architectures.
Would that make people happy?
(Architectures that have cache coherency issues will obviously also have
to set cache disable bits in the vma information; that's their
broken-architecture problem.)
Linus
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 23:51 ` Linus Torvalds
@ 2004-03-21 23:58 ` Russell King
2004-03-22 0:34 ` Andrea Arcangeli
2004-03-23 17:59 ` Andy Whitcroft
2004-03-22 0:02 ` David Woodhouse
2004-03-22 0:10 ` Jeff Garzik
2 siblings, 2 replies; 105+ messages in thread
From: Russell King @ 2004-03-21 23:58 UTC (permalink / raw)
To: Linus Torvalds
Cc: Jeff Garzik, David Woodhouse, Christoph Hellwig,
William Lee Irwin III, Andrew Morton, Andrea Arcangeli,
linux-kernel
On Sun, Mar 21, 2004 at 03:51:31PM -0800, Linus Torvalds wrote:
> That might be the minimal fix, since it would basically involve:
> - change whatever offensive "virt_to_page()" calls into
> "dma_map_to_page()".
> - implement "dma_map_to_page()" for all architectures.
>
> Would that make people happy?
Unfortunately this doesn't make dwmw2 happy - he claims to have machines
which implement dma_alloc_coherent using RAM which doesn't have any
struct page associated with it.
I've already got the interface you suggest above for ARM, and I'd have
taken this further had dwmw2 not chimed in.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 23:51 ` Linus Torvalds
2004-03-21 23:58 ` Russell King
@ 2004-03-22 0:02 ` David Woodhouse
2004-03-22 3:28 ` Linus Torvalds
2004-03-22 0:10 ` Jeff Garzik
2 siblings, 1 reply; 105+ messages in thread
From: David Woodhouse @ 2004-03-22 0:02 UTC (permalink / raw)
To: Linus Torvalds
Cc: Jeff Garzik, Russell King, Christoph Hellwig,
William Lee Irwin III, Andrew Morton, Andrea Arcangeli,
linux-kernel
On Sun, 2004-03-21 at 15:51 -0800, Linus Torvalds wrote:
> Note that there are really two different kinds of IO memory:
> - real IO-mapped memory on the other side of a bus
> - real RAM which is on the CPU side of the bus, but that has additionally
> been "mapped" some way as to be visible from devices.
>
> The second kind is what you seem to be talking about,
<...>
> So the minimal fix for any misuses would be to just have a
> "dma_map_to_page()" reverse mapping for "dma_alloc_coherent()". For x86,
> that's just the same thing as "virt_to_page()". For others, you have to
> look more carefully at undoing whatever mapping the iommu has been set up
> for.
You are assuming that dma_alloc_coherent() will always return memory of
that second kind -- host-side RAM, not PCI-side. That hasn't previously
been a requirement, and there are machines out there on which it makes a
lot more sense for dma_alloc_coherent() to use some SRAM which happens
to be hanging off the I/O bus than it does to use host RAM.
Doing dma_map_to_pfn() instead of dma_map_to_page() would work. That
means you can't use nopage() for mappings of dma_coherent memory. That's
fine though.
> Would that make people happy?
No. It'd be OK if you make it dma_map_to_pfn() instead of
dma_map_to_page() though. As discussed, that means you can't use
nopage() for mappings of dma_coherent memory. That's fine though.
I think it would be better to provide arch-specific functions for
mapping dma_coherent allocations and SG lists. On most architectures we
can just do it with virt_to_page() and nopage() and it'll be OK. On
others we can do the right thing as appropriate.
--
dwmw2
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 23:51 ` Linus Torvalds
2004-03-21 23:58 ` Russell King
2004-03-22 0:02 ` David Woodhouse
@ 2004-03-22 0:10 ` Jeff Garzik
2004-03-22 0:20 ` Russell King
2 siblings, 1 reply; 105+ messages in thread
From: Jeff Garzik @ 2004-03-22 0:10 UTC (permalink / raw)
To: Linus Torvalds
Cc: Russell King, David Woodhouse, Christoph Hellwig,
William Lee Irwin III, Andrew Morton, Andrea Arcangeli,
linux-kernel
Linus Torvalds wrote:
>
> On Sun, 21 Mar 2004, Jeff Garzik wrote:
>
>>That would be nice, though the reason I avoided remap_page_range() in
>>via82cxxx_audio is that it discourages S/G. Because remap_page_range()
>>is easier and more portable, several drivers allocate one-big-area and
>>then create an S/G list describing individual portions of that area.
>
>
> Note that there are really two different kinds of IO memory:
> - real IO-mapped memory on the other side of a bus
> - real RAM which is on the CPU side of the bus, but that has additionally
> been "mapped" some way as to be visible from devices.
Yes. via audio example is DMA (second kind), and an fbdev driver would
need to worry about the first kind (MMIO).
For the second kind, your solution (snipped) seems sane, though I wonder
where dma_unmap_to_page() is called.
For the first kind, please read fb_mmap in drivers/video/fbmem.c. Look
at the _horror_ of ifdefs in exporting the framebuffer. And that horror
is what's often needed when letting userspace mmap(2) PCI memory IO regions.
So, an mmio_map() in addition to dma_map*?
Jeff
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 0:10 ` Jeff Garzik
@ 2004-03-22 0:20 ` Russell King
2004-03-22 0:33 ` Jeff Garzik
2004-03-22 4:57 ` Benjamin Herrenschmidt
0 siblings, 2 replies; 105+ messages in thread
From: Russell King @ 2004-03-22 0:20 UTC (permalink / raw)
To: Jeff Garzik
Cc: Linus Torvalds, David Woodhouse, Christoph Hellwig,
William Lee Irwin III, Andrew Morton, Andrea Arcangeli,
linux-kernel
On Sun, Mar 21, 2004 at 07:10:53PM -0500, Jeff Garzik wrote:
> For the first kind, please read fb_mmap in drivers/video/fbmem.c. Look
> at the _horror_ of ifdefs in exporting the framebuffer. And that horror
> is what's often needed when letting userspace mmap(2) PCI memory IO regions.
Most of this:
#if defined(__mc68000__)
...
#elif defined(__mips__)
pgprot_val(vma->vm_page_prot) &= ~_CACHE_MASK;
pgprot_val(vma->vm_page_prot) |= _CACHE_UNCACHED;
#elif defined(__sh__)
pgprot_val(vma->vm_page_prot) &= ~_PAGE_CACHABLE;
#elif defined(__hppa__)
pgprot_val(vma->vm_page_prot) |= _PAGE_NO_CACHE;
#elif defined(__ia64__) || defined(__arm__)
vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
#else
#warning What do we have to do here??
#endif
exists because architectures haven't defined their private
pgprot_writecombine() implementations, preferring to add
to the preprocessor junk instead.
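As a sketch of the alternative: each architecture supplies
pgprot_writecombine() in its own header, a generic fallback covers the
rest, and the driver makes a single unconditional call. The following
is a userspace model; pgprot_t is reduced to a wrapper struct, and the
MODEL_CACHE_* bits and model_arch_writecombine() are hypothetical, not
any real architecture's definitions.

```c
#include <assert.h>

typedef struct { unsigned long pgprot; } pgprot_t;
#define pgprot_val(x) ((x).pgprot)

/* --- "arch header": hypothetical cache-attribute bits --- */
#define MODEL_CACHE_MASK      0x3UL
#define MODEL_CACHE_UNCACHED  0x2UL

static inline pgprot_t model_arch_writecombine(pgprot_t prot)
{
    assert((MODEL_CACHE_UNCACHED & ~MODEL_CACHE_MASK) == 0);
    pgprot_val(prot) = (pgprot_val(prot) & ~MODEL_CACHE_MASK)
                       | MODEL_CACHE_UNCACHED;
    return prot;
}
#define pgprot_writecombine(prot) model_arch_writecombine(prot)

/* --- "generic header": only used if the arch defined nothing --- */
#ifndef pgprot_writecombine
#define pgprot_writecombine(prot) (prot)
#endif

/* --- "driver": no per-architecture ifdefs left --- */
static pgprot_t fb_mmap_prot(pgprot_t prot)
{
    return pgprot_writecombine(prot);
}
```

The whole #if/#elif chain then collapses to one line in the driver, and
the per-architecture knowledge lives with the architecture.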
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 23:45 ` Russell King
@ 2004-03-22 0:23 ` William Lee Irwin III
2004-03-22 0:29 ` Jeff Garzik
0 siblings, 1 reply; 105+ messages in thread
From: William Lee Irwin III @ 2004-03-22 0:23 UTC (permalink / raw)
To: rmk, Linus Torvalds, Jeff Garzik, David Woodhouse,
Christoph Hellwig, Andrew Morton, Andrea Arcangeli, linux-kernel
On Sun, Mar 21, 2004 at 11:45:15PM +0000, Russell King wrote:
> Ok, splitting hairs, for the coherent contiguous case, what about:
> int dma_coherent_map(struct vm_area_struct *vma, void *cpu_addr,
> dma_addr_t dma_addr, size_t size);
> and x86 would be:
> #define dma_coherent_map(vma,cpu_addr,dma_addr,size) \
> remap_page_range(vma, vma->vm_start, __pa(cpu_addr), \
> size, vma->vm_page_prot)
> This then leaves the PCI BAR case and the DMA coherent SG buffer case,
> though neither of those fall within my personal problem space at present.
Can we get an offset into the area as one of the args? Then scatter/gather
should be trivially constructible (via iteration) from the interface.
Maybe something like:
struct dma_scatterlist {
dma_addr_t dma_addr; /* DMA address */
void *cpu_addr; /* cpu address */
unsigned long length; /* in units of pages */
};
int dma_mmap_coherent_sg(struct dma_scatterlist *sglist,
int nr_sglist_elements, /* length of sglist */
struct vm_area_struct *vma, /* for address space */
unsigned long address, /* user virtual address */
unsigned long offset, /* offset (in pages) */
unsigned long nr_pages); /* length (in pages) */
int dma_munmap_coherent_sg(struct dma_scatterlist *sglist,
int nr_sglist_elements, /* length of sglist */
struct vm_area_struct *vma, /* for address space */
unsigned long address, /* user virtual address */
unsigned long offset, /* offset (in pages) */
unsigned long nr_pages); /* length (in pages) */
int dma_alloc_coherent_sg(struct dma_scatterlist **sglist,
unsigned long length); /* length in pages */
int dma_free_coherent_sg(struct dma_scatterlist **sglist,
unsigned long length); /* length in pages */
Would that be useful? These in turn would drive dma_alloc_coherent()
and helpers like:
int dma_mmap_coherent(struct vm_area_struct *vma,
unsigned long address,
dma_addr_t dma_addr, /* DMA address */
void *cpu_addr, /* cpu address */
unsigned long nr_pages); /* length (in pages) */
int dma_munmap_coherent(struct vm_area_struct *vma,
unsigned long address,
dma_addr_t dma_addr, /* DMA address */
void *cpu_addr, /* cpu address */
unsigned long nr_pages); /* length (in pages) */
Does any of this sound like it's on the right track API-wise?
My thought on attacking the scatter/gather issue is basically centered
around "they're going to try to do it anyway, and if they don't have
something there to do it for them, they'll get it wrong."
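As a rough illustration of the "via iteration" part, this userspace
sketch turns a page offset into (element, page within element) for a
list shaped like the struct above. It is only a model: dma_addr_t is
replaced by unsigned long, and sg_locate_page() is a hypothetical
helper name, not part of any existing API.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the struct sketched above. */
struct dma_scatterlist {
    unsigned long dma_addr;  /* DMA address (dma_addr_t in the kernel) */
    void *cpu_addr;          /* cpu address */
    unsigned long length;    /* in units of pages */
};

/*
 * Hypothetical helper: find the element that covers page 'offset' of
 * the whole list, and how many pages into that element it lies.
 * Returns 0 on success, -1 if the offset is past the end of the list.
 */
static int sg_locate_page(const struct dma_scatterlist *sglist,
                          int nr_sglist_elements, unsigned long offset,
                          int *elem, unsigned long *page_in_elem)
{
    int i;

    assert(sglist != NULL || nr_sglist_elements == 0);
    for (i = 0; i < nr_sglist_elements; i++) {
        if (offset < sglist[i].length) {
            *elem = i;
            *page_in_elem = offset;
            return 0;
        }
        offset -= sglist[i].length;  /* skip this element entirely */
    }
    return -1;
}
```

An mmap helper would run this once for the starting offset and then
walk forward element by element for nr_pages.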
-- wli
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 0:23 ` William Lee Irwin III
@ 2004-03-22 0:29 ` Jeff Garzik
2004-03-22 1:28 ` William Lee Irwin III
2004-03-22 3:45 ` William Lee Irwin III
0 siblings, 2 replies; 105+ messages in thread
From: Jeff Garzik @ 2004-03-22 0:29 UTC (permalink / raw)
To: William Lee Irwin III
Cc: rmk, Linus Torvalds, David Woodhouse, Christoph Hellwig,
Andrew Morton, Andrea Arcangeli, linux-kernel
William Lee Irwin III wrote:
> int dma_mmap_coherent_sg(struct dma_scatterlist *sglist,
> int nr_sglist_elements, /* length of sglist */
> struct vm_area_struct *vma, /* for address space */
> unsigned long address, /* user virtual address */
> unsigned long offset, /* offset (in pages) */
> unsigned long nr_pages); /* length (in pages) */
>
> int dma_munmap_coherent_sg(struct dma_scatterlist *sglist,
> int nr_sglist_elements, /* length of sglist */
> struct vm_area_struct *vma, /* for address space */
> unsigned long address, /* user virtual address */
> unsigned long offset, /* offset (in pages) */
> unsigned long nr_pages); /* length (in pages) */
>
> int dma_alloc_coherent_sg(struct dma_scatterlist **sglist,
> unsigned long length); /* length in pages */
>
> int dma_free_coherent_sg(struct dma_scatterlist **sglist,
> unsigned long length); /* length in pages */
No comment on struct dma_scatterlist, but the above is the most natural
API for audio drivers at least.
Audio drivers allocate buffers at ->probe() or open(2), and the only
entity that actually cares about the contents of the buffers are (a) the
hardware and (b) userland. via82cxxx_audio only uses
pci_alloc_consistent because there's not a more appropriate DMA
allocator for the use to which that memory is put.
Audio drivers only need to read/write the buffers inside the kernel when
implementing read(2) and write(2) via copy_{to,from}_user().
Jeff
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 0:20 ` Russell King
@ 2004-03-22 0:33 ` Jeff Garzik
2004-03-22 4:57 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 105+ messages in thread
From: Jeff Garzik @ 2004-03-22 0:33 UTC (permalink / raw)
To: Russell King
Cc: Linus Torvalds, David Woodhouse, Christoph Hellwig,
William Lee Irwin III, Andrew Morton, Andrea Arcangeli,
linux-kernel
Russell King wrote:
> On Sun, Mar 21, 2004 at 07:10:53PM -0500, Jeff Garzik wrote:
>
>>For the first kind, please read fb_mmap in drivers/video/fbmem.c. Look
>>at the _horror_ of ifdefs in exporting the framebuffer. And that horror
>>is what's often needed when letting userspace mmap(2) PCI memory IO regions.
>
>
> Most of this:
[...]
> exists because architectures haven't defined their private
> pgprot_writecombine() implementations, preferring instead to add
> to the preprocessor junk instead.
Agreed but the larger point is that that code should not be in fbmem.c
at all.
There are two main types of usage for bus IO memory (MMIO), data and
hardware registers. Driver writers currently export both types to
userspace via mmap(2). Caching and write combining are simply
driver-controlled attributes one must consider.
Jeff
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 23:58 ` Russell King
@ 2004-03-22 0:34 ` Andrea Arcangeli
2004-03-22 3:05 ` Linus Torvalds
2004-03-23 17:59 ` Andy Whitcroft
1 sibling, 1 reply; 105+ messages in thread
From: Andrea Arcangeli @ 2004-03-22 0:34 UTC (permalink / raw)
To: Linus Torvalds, Jeff Garzik, David Woodhouse, Christoph Hellwig,
William Lee Irwin III, Andrew Morton, linux-kernel
On Sun, Mar 21, 2004 at 11:58:54PM +0000, Russell King wrote:
> On Sun, Mar 21, 2004 at 03:51:31PM -0800, Linus Torvalds wrote:
> > That might be the minimal fix, since it would basically involve:
> > - change whatever offensive "virt_to_page()" calls into
> > "dma_map_to_page()".
> > - implement "dma_map_to_page()" for all architectures.
> >
> > Would that make people happy?
>
> Unfortunately this doesn't make dwmw2 happy - he claims to have machines
> which implement dma_alloc_coherent using RAM which doesn't have any
> struct page associated with it.
I would suggest to add a ->nopage_dma (or whatever other name for an
additional callback in the vm_ops) that will return a non pageable "pfn"
number (not a page_t*). This is all the VM needs to setup the pte
properly, this callback will not know anything about the pageable stuff
(i.e. it will not have to call page_add_rmap or stuff like that).
I definitely agree a driver currently has no way to work safely if it
returns non-ram via ->nopage and it must use remap_file_pages, but OTOH
I don't like remap_file_pages myself, it's a lot nicer to use paging
even for mapping non-ram, even if you don't use scatter gather, even if
you've just a huge block of contiguous physical ram, at the very least
for the scheduler latencies in a loop under the page_table_lock.
nopage_dma will be like this:
do_no_page_dma(vma, ...)
{
	pfn = vma->vm_ops->nopage_dma();
	if (pfn_valid(pfn)) {
		/*
		 * going from valid pfn to page is always ok
		 * the other way around not
		 */
		page = pfn_to_page(pfn);
		BUG_ON(page->mapping);
		if (!PageReserved(page))
			mm->rss++;
	}
	/*
	 * setup the pte using the pfn here, no vm accounting or pte tracking
	 * required since it's either non valid pfn or reserved page that
	 * will be ignored by the zap_pte stuff
	 */
}
do_no_page()
{
	if (!vma->vm_ops || !vma->vm_ops->nopage)
		return do_anonymous_page(mm, vma, page_table,
					 pmd, write_access, address);
	if (vma->vm_ops->nopage_dma)
		return do_no_page_dma(...);
}
Then the mmu VM troubles are over, how you keep the cache of this pte
view coherent with the iommu view isn't something solvable by the mmu,
but certainly you can add whatever cache flushing callback in the
do_no_page_dma core, that's a slow path so you can play with it from any
arch adding whatever needed library calls.
btw, on a slightly related note, I don't think this is safe in
get_user_pages in 2.6:
if (!PageReserved(pages[i]))
page_cache_get(pages[i]);
there's nothing preventing munmap from freeing the page while somebody does
I/O on the page via get_user_pages.
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 0:29 ` Jeff Garzik
@ 2004-03-22 1:28 ` William Lee Irwin III
2004-03-22 3:45 ` William Lee Irwin III
1 sibling, 0 replies; 105+ messages in thread
From: William Lee Irwin III @ 2004-03-22 1:28 UTC (permalink / raw)
To: Jeff Garzik
Cc: rmk, Linus Torvalds, David Woodhouse, Christoph Hellwig,
Andrew Morton, Andrea Arcangeli, linux-kernel
On Sun, Mar 21, 2004 at 07:29:59PM -0500, Jeff Garzik wrote:
> No comment on struct dma_scatterlist, but the above is the most natural
> API for audio drivers at least.
> Audio drivers allocate buffers at ->probe() or open(2), and the only
> entity that actually cares about the contents of the buffers are (a) the
> hardware and (b) userland. via82cxxx_audio only uses
> pci_alloc_consistent because there's not a more appropriate DMA
> allocator for the use to which that memory is put.
> Audio drivers only need to read/write the buffers inside the kernel when
> implementing read(2) and write(2) via copy_{to,from}_user().
I based it on rmk's set of arguments to his functions; I'm hoping for
feedback (or another API/implementation) from him and hopefully at
least one other arch maintainer having problems in this area. I'm
hoping to focus mostly on the driver sweep, and to devolve e.g. finer
details of the design like the above arguments and/or structures to
those with more detailed knowledge or direct experience (and the
broader details came from elsewhere too).
-- wli
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 0:34 ` Andrea Arcangeli
@ 2004-03-22 3:05 ` Linus Torvalds
0 siblings, 0 replies; 105+ messages in thread
From: Linus Torvalds @ 2004-03-22 3:05 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Jeff Garzik, David Woodhouse, Christoph Hellwig,
William Lee Irwin III, Andrew Morton, linux-kernel
On Mon, 22 Mar 2004, Andrea Arcangeli wrote:
>
> On Sun, Mar 21, 2004 at 11:58:54PM +0000, Russell King wrote:
> > On Sun, Mar 21, 2004 at 03:51:31PM -0800, Linus Torvalds wrote:
> > > That might be the minimal fix, since it would basically involve:
> > > - change whatever offensive "virt_to_page()" calls into
> > > "dma_map_to_page()".
> > > - implement "dma_map_to_page()" for all architectures.
> > >
> > > Would that make people happy?
> >
> > Unfortunately this doesn't make dwmw2 happy - he claims to have machines
> > which implement dma_alloc_coherent using RAM which doesn't have any
> > struct page associated with it.
>
> I would suggest to add a ->nopage_dma (or whatever other name for an
> additional callback in the vm_ops) that will return a non pageable "pfn"
No.
Fix the broken architecture instead.
Linus
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 0:02 ` David Woodhouse
@ 2004-03-22 3:28 ` Linus Torvalds
0 siblings, 0 replies; 105+ messages in thread
From: Linus Torvalds @ 2004-03-22 3:28 UTC (permalink / raw)
To: David Woodhouse
Cc: Jeff Garzik, Russell King, Christoph Hellwig,
William Lee Irwin III, Andrew Morton, Andrea Arcangeli,
linux-kernel
On Mon, 22 Mar 2004, David Woodhouse wrote:
>
> You are assuming that dma_alloc_coherent() will always return memory of
> that second kind -- host-side RAM, not PCI-side. That hasn't previously
> been a requirement, and there are machines out there on which it makes a
> lot more sense for dma_alloc_coherent() to use some SRAM which happens
> to be hanging off the I/O bus than it does to use host RAM.
So? Those architectures can just allocate "struct page" entries for that
memory too.
There is a point where we should not care about idiotic architectures any
more. We should care about what happens in 99% of all architectures, and
the rest get to work around their _own_ quirks. We do not make the VM
uglier for some insane "it can happen" case.
Linus
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 0:29 ` Jeff Garzik
2004-03-22 1:28 ` William Lee Irwin III
@ 2004-03-22 3:45 ` William Lee Irwin III
2004-03-22 4:41 ` James Bottomley
1 sibling, 1 reply; 105+ messages in thread
From: William Lee Irwin III @ 2004-03-22 3:45 UTC (permalink / raw)
To: linux-arch, Jeff Garzik
Cc: rmk, Linus Torvalds, David Woodhouse, Christoph Hellwig,
Andrew Morton, Andrea Arcangeli
Sorry about the top posting and long quote; I wanted to fully quote the
API under discussion while getting the central issues aired in the first
few lines. The suggested dma_scatterlist structure, for the API
proposed below, was:
struct dma_scatterlist {
dma_addr_t dma_addr; /* DMA address */
void *cpu_addr; /* cpu address */
unsigned long length; /* in units of pages */
};
What we're trying to resolve here is drivers supporting ->mmap() doing
virt_to_page() on the results of dma_alloc_coherent() and other things
they shouldn't, and so passing back bogus page pointers as the return
value from ->nopage(), and having no method of resolving it due to the
fact mem_map[] may not cover the area referred to and there is no
portable method for reliably determining pfn's or other information
necessary even to establish mappings by hand. I think it's worth noting
that (according to rmk) ->cpu_addr may not be in any way relevant to
RAM, pfn's, or virtual mappings (I'm not actually sure what it is)
and has to be treated as arch-private otherwise-opaque data.
The way this is expected to solve the problem is by providing a method
for the arch to establish mappings of these areas not reliant on struct
page or fault handling. That is, these functions prefault the areas
into the process address space, thus insulating the core from the
details of fault handling on these areas and eliminating fault handling
on these areas altogether.
I tried to translate a function prototype for prefaulting these areas
into userspace that rmk gave as an example into a full set of operations
based on his proposed piece of the API. So what I'm looking for here
is to find out whether this is good enough for all of the various
arches, and if not, how we can get something together that will fix the
bugs in these drivers that will work portably.
jgarzik's comments on suitability for sound drivers follow the API
itself.
William Lee Irwin III wrote:
>>int dma_mmap_coherent_sg(struct dma_scatterlist *sglist,
>> int nr_sglist_elements, /* length of sglist */
>> struct vm_area_struct *vma, /* for address space */
>> unsigned long address, /* user virtual address */
>> unsigned long offset, /* offset (in pages) */
>> unsigned long nr_pages); /* length (in pages) */
>>
>>int dma_munmap_coherent_sg(struct dma_scatterlist *sglist,
>> int nr_sglist_elements, /* length of sglist */
>> struct vm_area_struct *vma, /* for address space */
>> unsigned long address, /* user virtual address */
>> unsigned long offset, /* offset (in pages) */
>> unsigned long nr_pages); /* length (in pages) */
>>
>>int dma_alloc_coherent_sg(struct dma_scatterlist **sglist,
>> unsigned long length); /* length in pages */
>>
>>int dma_free_coherent_sg(struct dma_scatterlist **sglist,
>> unsigned long length); /* length in pages */
Where it was proposed that these would be helper functions that sit
atop primitive functions like:
int dma_mmap_coherent(struct vm_area_struct *vma,
unsigned long address,
dma_addr_t dma_addr, /* DMA address */
void *cpu_addr, /* cpu address */
unsigned long nr_pages); /* length (in pages) */
int dma_munmap_coherent(struct vm_area_struct *vma,
unsigned long address,
dma_addr_t dma_addr, /* DMA address */
void *cpu_addr, /* cpu address */
unsigned long nr_pages); /* length (in pages) */
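As a rough illustration of the layering described above, here is a minimal userspace sketch of how a dma_mmap_coherent_sg() helper could walk the list and call the primitive once per element. It is only a model, not kernel code: dma_mmap_coherent() is replaced by a stub that records its arguments, SKETCH_PAGE_SIZE and map_log are made-up names, and it assumes that dma_addr and cpu_addr may be advanced by a byte offset inside one element, which is not guaranteed for cpu_addr if it really is opaque arch-private data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;         /* stand-in for the kernel type */
#define SKETCH_PAGE_SIZE 4096UL      /* assumed page size, illustration only */

struct vm_area_struct;               /* opaque in this sketch */

struct dma_scatterlist {
	dma_addr_t dma_addr;         /* DMA address */
	void *cpu_addr;              /* cpu address */
	unsigned long length;        /* in units of pages */
};

/* Hypothetical log so the stub primitive can be inspected afterwards. */
struct map_call {
	unsigned long address;
	dma_addr_t dma_addr;
	unsigned long nr_pages;
};
static struct map_call map_log[16];
static int map_log_len;

/* Stub for the proposed primitive: records the request, maps nothing. */
int dma_mmap_coherent(struct vm_area_struct *vma, unsigned long address,
		      dma_addr_t dma_addr, void *cpu_addr,
		      unsigned long nr_pages)
{
	(void)vma;
	(void)cpu_addr;
	if (map_log_len >= 16)
		return -1;
	map_log[map_log_len++] = (struct map_call){ address, dma_addr, nr_pages };
	return 0;
}

/* Map nr_pages pages, starting `offset` pages into the list, at `address`. */
int dma_mmap_coherent_sg(struct dma_scatterlist *sglist,
			 int nr_sglist_elements,
			 struct vm_area_struct *vma,
			 unsigned long address,
			 unsigned long offset,
			 unsigned long nr_pages)
{
	assert(sglist != NULL || nr_sglist_elements == 0);

	for (int i = 0; i < nr_sglist_elements && nr_pages > 0; i++) {
		struct dma_scatterlist *sg = &sglist[i];
		unsigned long chunk, byte_off;
		int err;

		if (offset >= sg->length) {  /* element lies before the window */
			offset -= sg->length;
			continue;
		}
		chunk = sg->length - offset;
		if (chunk > nr_pages)
			chunk = nr_pages;
		byte_off = offset * SKETCH_PAGE_SIZE;

		/* Assumption: both addresses advance linearly within an element. */
		err = dma_mmap_coherent(vma, address, sg->dma_addr + byte_off,
					(void *)((uintptr_t)sg->cpu_addr + byte_off),
					chunk);
		if (err)
			return err;

		address += chunk * SKETCH_PAGE_SIZE;
		nr_pages -= chunk;
		offset = 0;                  /* later elements start at page 0 */
	}
	return nr_pages ? -1 : 0;            /* list shorter than the request */
}
```

With a two-element list of 2 and 3 pages, a request for 3 pages at offset 1 turns into two calls to the primitive: one page from the tail of the first element, then two pages from the head of the second. A real implementation would also have to undo the partial mapping on failure, which this sketch does not attempt.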
jgarzik's assessment was:
On Sun, Mar 21, 2004 at 07:29:59PM -0500, Jeff Garzik wrote:
> No comment on struct dma_scatterlist, but the above is the most natural
> API for audio drivers at least.
> Audio drivers allocate buffers at ->probe() or open(2), and the only
> entity that actually cares about the contents of the buffers are (a) the
> hardware and (b) userland. via82cxxx_audio only uses
> pci_alloc_consistent because there's not a more appropriate DMA
> allocator for the use to which that memory is put.
> Audio drivers only need to read/write the buffers inside the kernel when
> implementing read(2) and write(2) via copy_{to,from}_user().
One thing that concerns me about this is that jgarzik seems to be
saying that via82cxxx_audio's needs aren't covered, so some alteration
to accommodate it may be necessary.
-- wli
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 3:45 ` William Lee Irwin III
@ 2004-03-22 4:41 ` James Bottomley
2004-03-22 4:46 ` William Lee Irwin III
2004-03-22 9:30 ` Russell King
0 siblings, 2 replies; 105+ messages in thread
From: James Bottomley @ 2004-03-22 4:41 UTC (permalink / raw)
To: William Lee Irwin III
Cc: linux-arch, Jeff Garzik, Russell King, Linus Torvalds,
David Woodhouse, Christoph Hellwig, Andrew Morton,
Andrea Arcangeli
On Sun, 2004-03-21 at 22:45, William Lee Irwin III wrote:
> What we're trying to resolve here is drivers supporting ->mmap() doing
> virt_to_page() on the results of dma_alloc_coherent() and other things
> they shouldn't, and so passing back bogus page pointers as the return
> value from ->nopage(), and having no method of resolving it due to the
> fact mem_map[] may not cover the area referred to and there is no
> portable method for reliably determining pfn's or other information
> necessary even to establish mappings by hand. I think it's worth noting
> that (according to rmk) ->cpu_addr may not be in any way relevant to
> RAM, pfn's, or virtual mappings (I'm not actually sure what it is)
> and has to be treated as arch-private otherwise-opaque data.
Hang on a minute, what makes you think it's legal in any way shape or
form to construct a user mapping for a coherent area?
Such an entity, if it were made, wouldn't follow the rules for normal
mmaps.
Let me illustrate what would go wrong on parisc: we have a VIPT cache
and the concept of an address space. This means that when we allocate
coherent memory, we mean it will *only* be coherent with respect to the
single specified address space (which is currently the kernel). We have
to make this explicit in the iommu by programming a so called coherence
index for each IOMMU pte (which tells the CPU's cache which line to
flush when the device writes to this address). Thus, if you mmap our
coherent memory and the device does a write to this memory, the write
will not be seen by the user if the user's address space has a cache
entry for it already.
Therefore, a user trying to make use of a coherent area mmap would have
to flush/invalidate everything all the time just to try to make sure
they weren't missing device updates (because we have no mechanism for
the kernel to know the data has changed and call flush_dcache_page).
James
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 19:44 ` Jaroslav Kysela
2004-03-20 22:23 ` Russell King
@ 2004-03-22 4:43 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 105+ messages in thread
From: Benjamin Herrenschmidt @ 2004-03-22 4:43 UTC (permalink / raw)
To: Jaroslav Kysela; +Cc: Russell King, LKML
On Sun, 2004-03-21 at 06:44, Jaroslav Kysela wrote:
> On Sat, 20 Mar 2004, Russell King wrote:
>
> > It is well known that virt_to_page() is only valid on virtual addresses
> > which correspond to kernel direct mapped RAM pages, and undefined on
> > everything else. Unfortunately, ALSA has been using it with
> > pci_alloc_consistent() for a long time, and this behaviour is what
> > makes ALSA broken. The fact it works on x86 is merely incidental.
>
> It works on PPC as well (at least we have no error reports).
It works only on PPCs that have consistent IOs, not embedded ones which
will use special mappings for getting non-cacheable pages.
> Yes, I'm sorry about that, but the ->nopage usage was requested by Jeff
> Garzik and we're not gurus for the VM stuff. Because we're probably first
> starting using of this mapping scheme, it resulted to problems.
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 4:41 ` James Bottomley
@ 2004-03-22 4:46 ` William Lee Irwin III
2004-03-22 4:56 ` James Bottomley
2004-03-22 9:30 ` Russell King
1 sibling, 1 reply; 105+ messages in thread
From: William Lee Irwin III @ 2004-03-22 4:46 UTC (permalink / raw)
To: James Bottomley
Cc: linux-arch, Jeff Garzik, Russell King, Linus Torvalds,
David Woodhouse, Christoph Hellwig, Andrew Morton,
Andrea Arcangeli
On Sun, Mar 21, 2004 at 11:41:35PM -0500, James Bottomley wrote:
> Hang on a minute, what makes you think it's legal in any way shape or
> form to construct a user mapping for a coherent area?
> Such an entity, if it were made, wouldn't follow the rules for normal
> mmaps.
Okay, this is bad news for sound (and possibly some graphics) drivers
on PA-RISC, since this mapping of coherent areas into userspace is
exactly what they're trying to do for the device interfaces they
export to the user.
Are you seeing breakage there, or are the drivers doing this
unused on PA-RISC?
-- wli
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 0:22 ` Zwane Mwaikambo
@ 2004-03-22 4:46 ` Benjamin Herrenschmidt
2004-03-22 18:23 ` Richard Curnow
0 siblings, 1 reply; 105+ messages in thread
From: Benjamin Herrenschmidt @ 2004-03-22 4:46 UTC (permalink / raw)
To: Zwane Mwaikambo
Cc: Russell King, William Lee Irwin III, Jaroslav Kysela,
Linus Torvalds, LKML
> Doesn't DRI also suffer from the same issues?
Well, it depends. Most of the time, DRI uses AGP which is a different
story altogether.
DRI suffers from a similar issue when using PCI GART, but then, it also
doesn't use the consistent alloc routines, it gets pages with GFP,
mmap those into userland, and does pci_map_single in the kernel on
each individual page to obtain the bus addresses. This will not be
pretty on non-coherent architectures though.
Ben.
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 4:46 ` William Lee Irwin III
@ 2004-03-22 4:56 ` James Bottomley
2004-03-22 5:26 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 105+ messages in thread
From: James Bottomley @ 2004-03-22 4:56 UTC (permalink / raw)
To: William Lee Irwin III
Cc: linux-arch, Jeff Garzik, Russell King, Linus Torvalds,
David Woodhouse, Christoph Hellwig, Andrew Morton,
Andrea Arcangeli
On Sun, 2004-03-21 at 23:46, William Lee Irwin III wrote:
> Okay, this is bad news for sound (and possibly some graphics) drivers
> on PA-RISC, since this mapping of coherent areas into userspace is
> exactly what they're trying to do for the device interfaces they
> export to the user.
>
> Are you seeing breakage there, or are the drivers doing this
> unused on PA-RISC?
Well, our older sound drivers have never worked since ALSA (they hang
off the GSC bus which ALSA doesn't have an abstraction for). Mostly we
use serial console, and a HP specific thing called a STI framebuffer for
video.
The problems I describe only occur if you try to mmap coherent memory.
mmaping streaming memory is fine.
But, I would expect that any arch with a virtually indexed cache would
have similar problems: there may be many address aliases in the cache
and the DMA controller probably only knows about one of them.
James
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 0:20 ` Russell King
2004-03-22 0:33 ` Jeff Garzik
@ 2004-03-22 4:57 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 105+ messages in thread
From: Benjamin Herrenschmidt @ 2004-03-22 4:57 UTC (permalink / raw)
To: Russell King
Cc: Jeff Garzik, Linus Torvalds, David Woodhouse, Christoph Hellwig,
William Lee Irwin III, Andrew Morton, Andrea Arcangeli,
Linux Kernel list
> exists because architectures haven't defined their private
> pgprot_writecombine() implementations, preferring instead to add
> to the preprocessor junk instead.
And it's not even about writecombine... Like writecombine is an
attribute of the PCI host bridge on pmacs, not a pgprot, while
cacheability issues cannot be always abstracted the same way on
different archs. Actually, GUARDED could be used for !writecombine
on ppc, but !GUARDED would allow prefetch and out of order IOs....
Ben.
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 4:56 ` James Bottomley
@ 2004-03-22 5:26 ` Benjamin Herrenschmidt
2004-03-22 11:58 ` Andrea Arcangeli
0 siblings, 1 reply; 105+ messages in thread
From: Benjamin Herrenschmidt @ 2004-03-22 5:26 UTC (permalink / raw)
To: James Bottomley
Cc: William Lee Irwin III, Linux Arch list, Jeff Garzik, Russell King,
Linus Torvalds, David Woodhouse, Christoph Hellwig, Andrew Morton,
Andrea Arcangeli
Well, I just went over this whole discussion and I think it's just
going to hell.
So here are my 2 cents of suggestions:
- We _WANT_ the ability to map coherent memory to userspace, that's
the normal way to map sound buffers to userland for low latency (though
mapping the actual DMA ptrs is a different matter and is definitely not
working with a bunch of sound interfaces). This is also necessary for
the infiniband/myrinet kind of things. DRI sort-of needs that when not
using AGP, AGP itself is a special case but could be considered as
coherent memory in some platforms too (and will be with PCI Express
afaik) etc...
- Some architectures apparently cannot do that (parisc ?)
- Too bad for them... They won't have low latency audio and fast
networking and be done with it. Let's implement a couple of simple
to use (driver-wise) helpers
dma_can_mmap_coherent() -> parisc returns false here
dma_mmap_coherent()
dma_mmap_coherent_sg()
And be done with it. I don't see where is the debate here ? The
API takes the same sglist as used for dma_map_sg, I don't see the
point of anything different, I agree with linus that it's not worth
even thinking about not having struct page here.
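To make the intended driver-side flow concrete, here is a small userspace model of a driver ->mmap path that asks the capability helper first and fails cleanly where coherent mmap is unsupported. Everything prefixed sketch_ is hypothetical: the thread gives no signatures for these helpers, struct sketch_arch is not a kernel structure, and the -ENXIO return value is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-architecture description; not a kernel structure. */
struct sketch_arch {
	const char *name;
	bool can_mmap_coherent;
};

/* Stand-in for the proposed dma_can_mmap_coherent(); signature assumed. */
static bool sketch_dma_can_mmap_coherent(const struct sketch_arch *arch)
{
	return arch->can_mmap_coherent;
}

/* Stand-in for dma_mmap_coherent(); here it only counts mapped pages. */
static int sketch_dma_mmap_coherent(unsigned long nr_pages,
				    unsigned long *mapped)
{
	*mapped += nr_pages;
	return 0;
}

/*
 * Driver ->mmap sketch: refuse up front on architectures that cannot
 * map coherent memory into userspace, otherwise prefault the buffer.
 */
int sketch_driver_mmap(const struct sketch_arch *arch,
		       unsigned long nr_pages, unsigned long *mapped)
{
	assert(arch != NULL && mapped != NULL);

	if (!sketch_dma_can_mmap_coherent(arch))
		return -ENXIO;       /* assumed error code */
	return sketch_dma_mmap_coherent(nr_pages, mapped);
}
```

The point of the shape is that the failure is decided in one place and reported to the caller of mmap(2), instead of each driver guessing per architecture whether virt_to_page() happens to work.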
Ben.
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-20 22:26 ` Russell King
2004-03-20 22:45 ` William Lee Irwin III
@ 2004-03-22 6:36 ` William Lee Irwin III
1 sibling, 0 replies; 105+ messages in thread
From: William Lee Irwin III @ 2004-03-22 6:36 UTC (permalink / raw)
To: rmk, Andrew Morton, Andrea Arcangeli, linux-kernel, torvalds
On Sat, Mar 20, 2004 at 10:26:39PM +0000, Russell King wrote:
> I'm no longer planning on this. In fact, I see a future where I tell
> people who want to use sound on ARM to go screw themselves because
> there doesn't seem to be an acceptable solution to this problem.
> Of course, this will lead to dirty hacks by many people who *REQUIRE*
> sound to work, but I guess we just don't care about that.
> (Yes, I'm pissed off over this issue.)
I was really hoping I could help make things work for everyone
because they are not working for everyone now.
Unfortunately, now I also see a future without working sound drivers on
ARM and several others. I'm sorry. I tried, but I've completely run out
of ideas (hell, most of them weren't even mine to begin with, so that
goes back to even before I gave up) and thus far every possible method
of fixing this has been shot down.
... and here I thought fixing drivers was the Right Thing to Do (TM).
-- wli
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 4:41 ` James Bottomley
2004-03-22 4:46 ` William Lee Irwin III
@ 2004-03-22 9:30 ` Russell King
2004-03-22 15:04 ` James Bottomley
1 sibling, 1 reply; 105+ messages in thread
From: Russell King @ 2004-03-22 9:30 UTC (permalink / raw)
To: James Bottomley
Cc: William Lee Irwin III, linux-arch, Jeff Garzik, Linus Torvalds,
David Woodhouse, Christoph Hellwig, Andrew Morton,
Andrea Arcangeli
On Sun, Mar 21, 2004 at 11:41:35PM -0500, James Bottomley wrote:
> Let me illustrate what would go wrong on parisc: we have a VIPT cache
> and the concept of an address space.
Is it not the case that VIPT caches are coloured, and mapping a page
into the appropriate place results in the same virtual index for both?
If this isn't true, this means that SHM is also broken on PARISC since
there is no value of SHMLBA which makes SHM mappings coherent with
each other.
> Therefore, a user trying to make use of a coherent area mmap would have
> to flush/invalidate everything all the time just to try to make sure
> they weren't missing device updates (because we have no mechanism for
> the kernel to know the data has changed and call flush_dcache_page).
Unfortunately, there is a class of drivers where mmaping a large DMA
buffer into user space makes sense. These are video capture and
sound drivers.
By saying that "we can't support DMA coherent mmap" you're forcing
driver writers to write their own DMA coherent mmap implementations,
which they _have_ done already, and they've screwed up the interfaces
such that it only works on x86 today.
What I want is an interface which allows most of the architectures
which are capable of doing this to indeed do this. Those which can't
should fail the mmap attempt. It has to be said that by doing this
we're actually better off - more drivers work across more platforms
and we have a well defined failure mode for platforms where it doesn't
work.
If those platforms want to use those drivers, they aren't actually in
a worse situation - they had to find some way to work around this
before now, and they still have to find some way to work around this
afterwards, or maybe decide that the subset of drivers which need
this are incompatible with the architecture.
However, please don't prevent all architectures from being able to
use these drivers just because a small number can't.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 5:26 ` Benjamin Herrenschmidt
@ 2004-03-22 11:58 ` Andrea Arcangeli
2004-03-22 12:05 ` Russell King
0 siblings, 1 reply; 105+ messages in thread
From: Andrea Arcangeli @ 2004-03-22 11:58 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: James Bottomley, William Lee Irwin III, Linux Arch list,
Jeff Garzik, Russell King, Linus Torvalds, David Woodhouse,
Christoph Hellwig, Andrew Morton
On Mon, Mar 22, 2004 at 04:26:29PM +1100, Benjamin Herrenschmidt wrote:
> Well, I just went over this whole discussion and I think it's just
> going to hell.
>
> So here are my 2 cents of suggestions:
>
> - We _WANT_ the ability to map coherent memory to userspace, that's
> the normal way to map sound buffers to userland for low latency (though
> mapping the actual DMA ptrs is a different matter and is definitely not
> working with a bunch of sound interfaces). This is also necessary for
> the infiniband/myrinet kind of things. DRI sort-of needs that when not
> using AGP, AGP itself is a special case but could be considered as
> coherent memory in some platforms too (and will be with PCI Express
> afaik) etc...
>
> - Some architectures apparently cannot do that (parisc ?)
>
> - Too bad for them... They won't have low latency audio and fast
> networking and be done with it. Let's implement a couple of simple
> to use (driver-wise) helpers
>
> dma_can_mmap_coherent() -> parisc returns false here
> dma_mmap_coherent()
> dma_mmap_coherent_sg()
>
> And be done with it. I don't see where is the debate here ? The
> API takes the same sglist as used for dma_map_sg, I don't see the
> point of anything different, I agree with linus that it's not worth
> even thinking about not having struct page here.
I like your three functions and the clear description.
The only reason I believe a paging mechanism would have been nicer, is that
it would avoid latencies in dma_mmap_coherent (not necessarily scheduler
latencies, but you would pay all the cost of the pagetables immediately
during the mmap syscall, so if you've to map gigs of ram that would tend
to hang the task doing the mmap a little bit, I found it nicer to use
the paging for this so we also only allocate the memory for the
pagetables that we need, but OTOH Linus is right that in most cases it
isn't worth a single branch in a fast path).
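For a sense of scale, the up-front cost of prefaulting can be estimated with simple arithmetic. The sketch below assumes 4 KiB pages and 1024 PTEs per page-table page (as on 32-bit x86 without PAE); both constants and the sketch_ names are illustrative assumptions, and the second function gives a lower bound that ignores alignment of the mapping.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE   4096UL    /* assumed 4 KiB pages */
#define SKETCH_PTES_PER_PT 1024UL    /* assumed: 32-bit x86 without PAE */

/* Number of ptes that must be filled to map `bytes` bytes. */
unsigned long sketch_ptes_for(unsigned long bytes)
{
	assert(bytes > 0);
	return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Minimum number of page-table pages needed to hold those ptes. */
unsigned long sketch_pte_pages_for(unsigned long bytes)
{
	unsigned long ptes = sketch_ptes_for(bytes);

	return (ptes + SKETCH_PTES_PER_PT - 1) / SKETCH_PTES_PER_PT;
}
```

Under these assumptions, mapping 1 GiB at mmap time means writing 262144 ptes and allocating at least 256 page-table pages before the syscall returns, whereas demand paging would spread that work over the pages actually touched.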
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 11:58 ` Andrea Arcangeli
@ 2004-03-22 12:05 ` Russell King
2004-03-22 12:34 ` Andrea Arcangeli
0 siblings, 1 reply; 105+ messages in thread
From: Russell King @ 2004-03-22 12:05 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Benjamin Herrenschmidt, James Bottomley, William Lee Irwin III,
Linux Arch list, Jeff Garzik, Linus Torvalds, David Woodhouse,
Christoph Hellwig, Andrew Morton
On Mon, Mar 22, 2004 at 12:58:07PM +0100, Andrea Arcangeli wrote:
> The only reason I believe a paging mechanism would have been nicer, is that
> it would avoid latencies in dma_mmap_coherent (not necessarily scheduler
> latencies, but you would pay all the cost of the pagetables immediately
> during the mmap syscall, so if you've to map gigs of ram that would tend
> to hang the task doing the mmap a little bit, I found it nicer to use
> the paging for this so we also only allocate the memory for the
> pagetables that we need, but OTOH Linus is right that in most cases it
> isn't worth a single branch in a fast path).
However, if you go on to read what Linus said later, he seems to be saying
that we can guarantee that dma_alloc_coherent() will be backed by memory
which has page structures associated with it. This means that we _can_
use the ->nopage function for the DMA coherent implementation after all.
However, it isn't useful for the PCI device-side buffer case, which would
need to be handled via remap_page_range().
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 12:05 ` Russell King
@ 2004-03-22 12:34 ` Andrea Arcangeli
0 siblings, 0 replies; 105+ messages in thread
From: Andrea Arcangeli @ 2004-03-22 12:34 UTC (permalink / raw)
To: Benjamin Herrenschmidt, James Bottomley, William Lee Irwin III,
Linux Arch list, Jeff Garzik, Linus Torvalds, David Woodhouse,
Christoph Hellwig, Andrew Morton
On Mon, Mar 22, 2004 at 12:05:37PM +0000, Russell King wrote:
> However, it isn't useful for the PCI device-side buffer case, which would
> need to be handled via remap_page_range().
one could allocate page_t for the PCI device-side buffer case too (with
discontigmem to avoid a terrible waste), but it would still be a small
waste. So for non-ram it's better to always map all ptes during ->mmap
and avoid the page faults, as with remap_file_pages, than to
allocate a page_t for non-ram ranges with discontigmem.
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 9:30 ` Russell King
@ 2004-03-22 15:04 ` James Bottomley
2004-03-22 15:15 ` Russell King
0 siblings, 1 reply; 105+ messages in thread
From: James Bottomley @ 2004-03-22 15:04 UTC (permalink / raw)
To: Russell King
Cc: William Lee Irwin III, linux-arch, Jeff Garzik, Linus Torvalds,
David Woodhouse, Christoph Hellwig, Andrew Morton,
Andrea Arcangeli
On Mon, 2004-03-22 at 04:30, Russell King wrote:
> On Sun, Mar 21, 2004 at 11:41:35PM -0500, James Bottomley wrote:
> > Let me illustrate what would go wrong on parisc: we have a VIPT cache
> > and the concept of an address space.
>
> Is it not the case that VIPT caches are coloured, and mapping a page
> into the appropriate place results in the same virtual index for both?
Not coloured exactly since the caches are associative, but we have a
congruence modulus. As long as two virtual addresses are equal modulo
this, the cache will detect and unify virtual aliasing (basically it
assigns the addresses the same coherence index). So, as long as the
proposed API gives the arch control over where in the user vm the
mapping goes, we would be able to accommodate it.
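The congruence rule described above can be sketched in a few lines of C. This is only an illustration: the helper name vaddrs_congruent is hypothetical, and the modulus is passed in as a parameter on the assumption that it is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when two virtual addresses are equal
 * modulo `modulus`, i.e. when they land on the same cache congruence class.
 * Assumption: `modulus` is a non-zero power of two.
 */
static bool vaddrs_congruent(uintptr_t a, uintptr_t b, uintptr_t modulus)
{
    assert(modulus != 0 && (modulus & (modulus - 1)) == 0);
    /* The low bits below the modulus must match in both addresses. */
    return ((a ^ b) & (modulus - 1)) == 0;
}
```

An arch that controls where the user mapping is placed would pick an address for which this check holds against the kernel-side address.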
However, my understanding of the API was that you *already* had a vm
range and were trying to place a coherently mapped page into it.
> However, please don't prevent all architectures from being able to
> use these drivers just because a small number can't.
I don't believe I was. I was merely pointing out the problems as I saw
them with mmap'ing a coherent memory area.
James
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 15:04 ` James Bottomley
@ 2004-03-22 15:15 ` Russell King
2004-03-22 15:27 ` James Bottomley
0 siblings, 1 reply; 105+ messages in thread
From: Russell King @ 2004-03-22 15:15 UTC (permalink / raw)
To: James Bottomley
Cc: William Lee Irwin III, linux-arch, Jeff Garzik, Linus Torvalds,
David Woodhouse, Christoph Hellwig, Andrew Morton,
Andrea Arcangeli
On Mon, Mar 22, 2004 at 10:04:23AM -0500, James Bottomley wrote:
> On Mon, 2004-03-22 at 04:30, Russell King wrote:
> > On Sun, Mar 21, 2004 at 11:41:35PM -0500, James Bottomley wrote:
> > > Let me illustrate what would go wrong on parisc: we have a VIPT cache
> > > and the concept of an address space.
> >
> > Is it not the case that VIPT caches are coloured, and mapping a page
> > into the appropriate place results in the same virtual index for both?
>
> Not coloured exactly since the caches are associative, but we have a
> congruence modulus. As long as two virtual addresses are equal modulo
> this, the cache will detect and unify virtual aliasing (basically it
> assigns the addresses the same coherence index). So, as long as the
> proposed API gives the arch control over where in the user vm the
> mapping goes, we would be able to accommodate it.
>
> However, my understanding of the API was that you *already* had a vm
> range and were trying to place a coherently mapped page into it.
Correct. However, note that the kernel's view of the DMA mapping would
not be accessed in this instance. I guess this still causes you some
problems, though I suspect that given an adequate API, you could
tweak your iommu appropriately.
For example, if we had:
int dma_coherent_mmap(vma, cpuaddr, dmaaddr, size)
then the architecture could do whatever it needed to mmap that address
space. It could:
(a) call remap_page_range() with appropriate pgprot
(b) use a vm_operations_struct internally to fault the pages in,
again using the appropriate pgprot.
(c) disallow the mmap unless it is within the architecture's rules
(eg, all mmappings are of the same cache colour/congruence
modulus)
(d) adjust whatever hardware for device DMA such that the mapping
is coherent and then do (a) or (b) and/or (c).
(e) disallow the mmap entirely.
I suspect x86, ARM and similar could be either (a) or (b). PA RISC would
be (c) and (d).
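As a rough userspace model of how an architecture might choose between options (a), (b), (c) and (e), consider the sketch below. Every name in it (coherent_mmap_strategy, arch_policy, choose_strategy) is hypothetical and none of it is kernel code. It reads (c) as "refuse mappings that break the congruence rule", and it does not model (d), since that depends on the DMA hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative outcomes for a hypothetical per-arch mmap hook. */
enum coherent_mmap_strategy {
    STRATEGY_REMAP_EAGER,   /* (a): populate every pte at mmap time */
    STRATEGY_FAULT_LAZY,    /* (b): install a fault handler, fill on demand */
    STRATEGY_REFUSE         /* (c) or (e): reject the mmap */
};

/* Assumed description of what an architecture can do. */
struct arch_policy {
    bool mmap_supported;    /* false models option (e) */
    bool prefer_lazy;       /* true picks (b) over (a) */
    uintptr_t congruence;   /* 0 = no aliasing constraint, else a power of two */
};

static enum coherent_mmap_strategy
choose_strategy(const struct arch_policy *p, uintptr_t user_addr,
                uintptr_t cpu_addr)
{
    if (!p->mmap_supported)
        return STRATEGY_REFUSE;                     /* option (e) */

    if (p->congruence != 0) {
        assert((p->congruence & (p->congruence - 1)) == 0);
        /* option (c): both addresses must share the same low bits */
        if (((user_addr ^ cpu_addr) & (p->congruence - 1)) != 0)
            return STRATEGY_REFUSE;
    }

    return p->prefer_lazy ? STRATEGY_FAULT_LAZY : STRATEGY_REMAP_EAGER;
}
```

The point of the model is that the driver only ever sees one entry point, and the choice between eager and lazy population stays inside the architecture.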
Note: I don't see the need for dma_coherent_munmap() - the mappings are
destroyed on process exit, and we should not be freeing the coherent
mapping until the mmap of it has gone - and you get to know this via
the ->release method. However, with (b) an architecture can positively
check that this rule is followed via suitable refcounting and checking
in dma_free_coherent.
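The refcounting check mentioned for (b) could look roughly like the following userspace model. The structure and the model_* function names are hypothetical stand-ins; a real implementation would hang this off the vma callbacks and the arch's dma_free_coherent, neither of which is shown here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping for one coherent buffer. */
struct coherent_buf {
    void *cpu_addr;             /* stand-in for the coherent allocation */
    size_t size;
    unsigned int mmap_count;    /* number of live user mappings */
};

/* Called when a user mapping of the buffer comes into existence. */
static void model_mapping_open(struct coherent_buf *b)
{
    b->mmap_count++;
}

/* Called when a user mapping of the buffer goes away. */
static void model_mapping_close(struct coherent_buf *b)
{
    assert(b->mmap_count > 0);
    b->mmap_count--;
}

/* Refuses to free while mappings remain: 0 on success, -EBUSY otherwise. */
static int model_free_coherent(struct coherent_buf *b)
{
    if (b->mmap_count != 0)
        return -EBUSY;
    free(b->cpu_addr);
    b->cpu_addr = NULL;
    return 0;
}
```

Whether a violation should return an error or trigger a BUG is a policy decision this sketch does not try to settle.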
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 15:15 ` Russell King
@ 2004-03-22 15:27 ` James Bottomley
2004-03-22 21:50 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 105+ messages in thread
From: James Bottomley @ 2004-03-22 15:27 UTC (permalink / raw)
To: Russell King
Cc: William Lee Irwin III, linux-arch, Jeff Garzik, Linus Torvalds,
David Woodhouse, Christoph Hellwig, Andrew Morton,
Andrea Arcangeli
On Mon, 2004-03-22 at 10:15, Russell King wrote:
> Correct. However, note that the kernel's view of the DMA mapping would
> not be accessed in this instance. I guess this still causes you some
> problems, though I suspect that given an adequate API, you could
> tweak your iommu appropriately.
Ah, well now we're getting into one of the problems with the kernel's
API. Currently we have a two stage approach: the DMA API makes the
kernel space coherent, and then vm APIs make the user spaces coherent.
We could do this exactly as you propose: make the mapping directly
coherent with the user address space and never visible to the kernel and
everything would work correctly. We could do this simply by loading the
user coherency index into the IOMMU ptes on the mapping.
I've already begun thinking that we may want to shift the API to this
model (i.e. have a preferred address space to do DMA operations to).
Even in most filesystem streaming mappings, only one address space
usually wants to see the data (sharing is the rarity rather than the
rule).
> (a) call remap_page_range() with appropriate pgprot
> (b) use a vm_operations_struct internally to fault the pages in,
> again using the appropriate pgprot.
> (c) disallow the mmap unless it is within the architecture's rules
> (eg, all mmappings are of the same cache colour/congruence
> modulus)
> (d) adjust whatever hardware for device DMA such that the mapping
> is coherent and then do (a) or (b) and/or (c).
> (e) disallow the mmap entirely.
>
> I suspect x86, ARM and similar could be either (a) or (b). PA RISC would
> be (c) and (d).
Yes, we could probably do (c). Like I said, (d) is a bit of a paradigm
shift for the API, but it's also doable.
> Note: I don't see the need for dma_coherent_munmap() - the mappings are
> destroyed on process exit, and we should not be freeing the coherent
> mapping until the mmap of it has gone - and you get to know this via
> the ->release method. However, with (b) an architecture can positively
> check that this rule is followed via suitable refcounting and checking
> in dma_free_coherent.
I could see a point: since we can only keep one address space coherent,
we cannot allow multiple mmappings of the same region. Thus, processes
would be able to hand off the coherent mmap, but wouldn't be allowed
to map it simultaneously. The unmap API would tell the arch that the
mapping was free to be remapped.
James
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 4:46 ` Benjamin Herrenschmidt
@ 2004-03-22 18:23 ` Richard Curnow
0 siblings, 0 replies; 105+ messages in thread
From: Richard Curnow @ 2004-03-22 18:23 UTC (permalink / raw)
To: LKML
* Benjamin Herrenschmidt <benh@kernel.crashing.org> [2004-03-22]:
> DRI suffers from similar issue when using PCI GART, but then, it also
> doesn't use the consistent alloc routines, it gets pages with GFP,
> mmap those into userland, and does pci_map_single in the kernel on
> each individual page to obtain the bus addresses. This will not be
> pretty on non-coherent architectures though.
... or on platforms where PCI bounce-buffers are being used.
--
Richard \\\ SH-4/SH-5 Core & Debug Architect
Curnow \\\ SuperH (UK) Ltd, Bristol
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 15:27 ` James Bottomley
@ 2004-03-22 21:50 ` Benjamin Herrenschmidt
2004-03-22 22:18 ` Jeff Garzik
0 siblings, 1 reply; 105+ messages in thread
From: Benjamin Herrenschmidt @ 2004-03-22 21:50 UTC (permalink / raw)
To: James Bottomley
Cc: Russell King, William Lee Irwin III, Linux Arch list, Jeff Garzik,
Linus Torvalds, David Woodhouse, Christoph Hellwig, Andrew Morton,
Andrea Arcangeli
> I could see a point: since we can only keep one address space coherent,
> we cannot allow multiple mmappings of the same region. Thus, processes
> would be able to hand off the coherent mmap, but wouldn't be allowed
> to map it simultaneously. The unmap API would tell the arch that the
> mapping was free to be remapped.
You cannot have the mapping coherent in both kernel and user space ? Hrm,
I'm afraid drivers won't like that. The DRI will definitely be unhappy,
and while I don't think sound drivers need to tap the buffers from the
kernel mapping in normal cases, I'm pretty sure things like infiniband
or myrinet will have a problem too.
Ben.
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 21:50 ` Benjamin Herrenschmidt
@ 2004-03-22 22:18 ` Jeff Garzik
2004-03-22 22:35 ` William Lee Irwin III
2004-03-22 23:19 ` Russell King
0 siblings, 2 replies; 105+ messages in thread
From: Jeff Garzik @ 2004-03-22 22:18 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: James Bottomley, Russell King, William Lee Irwin III,
Linux Arch list, Linus Torvalds, David Woodhouse,
Christoph Hellwig, Andrew Morton, Andrea Arcangeli
Benjamin Herrenschmidt wrote:
>>I could see a point: since we can only keep one address space coherent,
>>we cannot allow multiple mmappings of the same region. Thus, processes
>>would be able to hand off the coherent mmap, but wouldn't be allowed
>>to map it simultaneously. The unmap API would tell the arch that the
>>mapping was free to be remapped.
>
>
> You cannot have the mapping coherent in both kernel and user space ? Hrm,
> I'm afraid drivers won't like that. The DRI will definitely be unhappy,
> and while I don't think sound drivers need to tap the buffers from the
> kernel mapping in normal cases, I'm pretty sure things like infiniband
> or myrinet will have a problem too.
You need both kernel and userspace... for audio drivers, mmap(2) is
direct to userspace, but read(2) and write(2) must copy_from_user() into
the allocated DMA area.
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 22:18 ` Jeff Garzik
@ 2004-03-22 22:35 ` William Lee Irwin III
2004-03-22 23:57 ` Benjamin Herrenschmidt
2004-03-22 23:19 ` Russell King
1 sibling, 1 reply; 105+ messages in thread
From: William Lee Irwin III @ 2004-03-22 22:35 UTC (permalink / raw)
To: Jeff Garzik
Cc: Benjamin Herrenschmidt, James Bottomley, Russell King,
Linux Arch list, Linus Torvalds, David Woodhouse,
Christoph Hellwig, Andrew Morton, Andrea Arcangeli
Benjamin Herrenschmidt wrote:
>> You cannot have the mapping coherent in both kernel and user space ? Hrm,
>> I'm afraid drivers won't like that. The DRI will definitely be unhappy,
>> and while I don't think sound drivers need to tap the buffers from the
>> kernel mapping in normal cases, I'm pretty sure things like infiniband
>> or myrinet will have a problem too.
On Mon, Mar 22, 2004 at 05:18:30PM -0500, Jeff Garzik wrote:
> You need both kernel and userspace... for audio drivers, mmap(2) is
> direct to userspace, but read(2) and write(2) must copy_from_user() into
> the allocated DMA area.
This is burned into silicon, so supporting it's not an option. Frankly
I think what's best is another device interface for userspace to fall
back to when this coherent userspace mmap() is unimplementable, e.g.
read()/write() on some device node.
-- wli
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 22:18 ` Jeff Garzik
2004-03-22 22:35 ` William Lee Irwin III
@ 2004-03-22 23:19 ` Russell King
2004-03-22 23:35 ` Jeff Garzik
1 sibling, 1 reply; 105+ messages in thread
From: Russell King @ 2004-03-22 23:19 UTC (permalink / raw)
To: Jeff Garzik
Cc: Benjamin Herrenschmidt, James Bottomley, William Lee Irwin III,
Linux Arch list, Linus Torvalds, David Woodhouse,
Christoph Hellwig, Andrew Morton, Andrea Arcangeli
On Mon, Mar 22, 2004 at 05:18:30PM -0500, Jeff Garzik wrote:
> You need both kernel and userspace... for audio drivers, mmap(2) is
> direct to userspace, but read(2) and write(2) must copy_from_user() into
> the allocated DMA area.
Not actually true in this case - audio drivers are either mmap() only
or read/write only, never both at the same time.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 23:19 ` Russell King
@ 2004-03-22 23:35 ` Jeff Garzik
2004-03-23 2:26 ` James Bottomley
0 siblings, 1 reply; 105+ messages in thread
From: Jeff Garzik @ 2004-03-22 23:35 UTC (permalink / raw)
To: Russell King
Cc: Benjamin Herrenschmidt, James Bottomley, William Lee Irwin III,
Linux Arch list, Linus Torvalds, David Woodhouse,
Christoph Hellwig, Andrew Morton, Andrea Arcangeli
Russell King wrote:
> On Mon, Mar 22, 2004 at 05:18:30PM -0500, Jeff Garzik wrote:
>
>>You need both kernel and userspace... for audio drivers, mmap(2) is
>>direct to userspace, but read(2) and write(2) must copy_from_user() into
>>the allocated DMA area.
>
>
> Not actually true in this case - audio drivers are either mmap() only
> or read/write only, never both at the same time.
Agreed, but due to OSS dain bramage you can read/write as much as you
like, up until the mmap point, AFAICS. It's much easier for the driver
to allocate one set of buffers, than to allocate a set at open(2), throw
away those allocs at mmap(2) and make new ones.
Jeff
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 22:35 ` William Lee Irwin III
@ 2004-03-22 23:57 ` Benjamin Herrenschmidt
2004-03-23 0:22 ` David Woodhouse
2004-03-23 2:07 ` William Lee Irwin III
0 siblings, 2 replies; 105+ messages in thread
From: Benjamin Herrenschmidt @ 2004-03-22 23:57 UTC (permalink / raw)
To: William Lee Irwin III
Cc: Jeff Garzik, James Bottomley, Russell King, Linux Arch list,
Linus Torvalds, David Woodhouse, Christoph Hellwig, Andrew Morton,
Andrea Arcangeli
> On Mon, Mar 22, 2004 at 05:18:30PM -0500, Jeff Garzik wrote:
> > You need both kernel and userspace... for audio drivers, mmap(2) is
> > direct to userspace, but read(2) and write(2) must copy_from_user() into
> > the allocated DMA area.
>
> This is burned into silicon, so supporting it's not an option. Frankly
> I think what's best is another device interface for userspace to fall
> back to when this coherent userspace mmap() is unimplementable, e.g.
> read()/write() on some device node.
Exactly. We can implement the simple/nice interface discussed here, and
just not support it on those platforms, they'll have to fall back to
read/write or simply not support those drivers who require that
functionality.
Eventually a nopage variant may be worthwhile for things doing really
large mappings, but I tend to think that when we need to do that mapping
to userland, it is because we need short latencies, which is the opposite
of what a nopage implementation provides, dunno if it's worth the pain
(though it's not _that_ painful).
Ben.
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 23:57 ` Benjamin Herrenschmidt
@ 2004-03-23 0:22 ` David Woodhouse
2004-03-23 2:07 ` William Lee Irwin III
1 sibling, 0 replies; 105+ messages in thread
From: David Woodhouse @ 2004-03-23 0:22 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: William Lee Irwin III, Jeff Garzik, James Bottomley, Russell King,
Linux Arch list, Linus Torvalds, Christoph Hellwig, Andrew Morton,
Andrea Arcangeli
On Tue, 2004-03-23 at 10:57 +1100, Benjamin Herrenschmidt wrote:
> Eventually a nopage variant may be worthwhile for things doing really
> large mappings, but I tend to think that when we need to do that mapping
> to userland, it is because we need short latencies, which is the opposite
> of what a nopage implementation provides, dunno if it's worth the pain
> (though it's not _that_ painful).
Ideally the nopage variant is an implementation detail which the driver
doesn't care about. Latency only counts on the first time each 'page' is
touched, so shouldn't be too much of a problem in general, even for
sound buffers?
--
dwmw2
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 23:57 ` Benjamin Herrenschmidt
2004-03-23 0:22 ` David Woodhouse
@ 2004-03-23 2:07 ` William Lee Irwin III
2004-03-23 9:28 ` Russell King
2004-03-23 11:35 ` Andrea Arcangeli
1 sibling, 2 replies; 105+ messages in thread
From: William Lee Irwin III @ 2004-03-23 2:07 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Jeff Garzik, James Bottomley, Russell King, Linux Arch list,
Linus Torvalds, David Woodhouse, Christoph Hellwig, Andrew Morton,
Andrea Arcangeli
On Mon, Mar 22, 2004 at 05:18:30PM -0500, Jeff Garzik wrote:
>> This is burned into silicon, so supporting it's not an option. Frankly
>> I think what's best is another device interface for userspace to fall
>> back to when this coherent userspace mmap() is unimplementable, e.g.
>> read()/write() on some device node.
On Tue, Mar 23, 2004 at 10:57:19AM +1100, Benjamin Herrenschmidt wrote:
> Exactly. We can implement the simple/nice interface discussed here, and
> just not support it on those platforms, they'll have to fall back to
> read/write or simply not support those drivers who require that
> functionality.
> Eventually a nopage variant may be worthwhile for things doing really
> large mappings, but I tend to think that when we need to do that mapping
> to userland, it is because we need short latencies, which is the opposite
> of what a nopage implementation provides, dunno if it's worth the pain
> (though it's not _that_ painful).
More generality in fault handling would be useful in various ways even
beyond fixing ALSA's issues. I'm not sure why Linus doesn't like the
notion. I didn't insist on API but just moved on to trying to push for
a solution to the driver issues to get merged at all so I can get on
with cleaning up drivers using whatever API people want for the solution.
I've already been over every ->nopage() in the kernel once (wrt. what's
been merged anyway; a number of times for other reasons), so I really
think I can do a bit of useful footwork here.
-- wli
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-22 23:35 ` Jeff Garzik
@ 2004-03-23 2:26 ` James Bottomley
0 siblings, 0 replies; 105+ messages in thread
From: James Bottomley @ 2004-03-23 2:26 UTC (permalink / raw)
To: Jeff Garzik
Cc: Russell King, Benjamin Herrenschmidt, William Lee Irwin III,
Linux Arch list, Linus Torvalds, David Woodhouse,
Christoph Hellwig, Andrew Morton, Andrea Arcangeli
On Mon, 2004-03-22 at 18:35, Jeff Garzik wrote:
> Agreed, but due to OSS dain bramage you can read/write as much as you
> like, up until the mmap point, AFAICS. It's much easier for the driver
> to allocate one set of buffers, than to allocate a set at open(2), throw
> away those allocs at mmap(2) and make new ones.
I didn't say throw the buffers away, merely the mapping.
I think you're looking at this the wrong way. We only get into this
whole mess of being coherent with respect to a single address space if
we don't obey the virtual address congruence modulus rules.
As Russell already pointed out, as long as we can force the virtual
addresses of the mappings (that's all mappings, in both the kernel and
in user space) to obey the congruence modulus rules then we're home free.
On PA, we already force any mmapping that will be shared (MAP_SHARED) to
obey the congruence rules (we allocate them all at 0 mod 4MB, which is
our congruence modulus) by hijacking arch_get_unmapped_area.
Thus, as long as the sound card application designates its mappings as
MAP_SHARED, we're half way there. The other wrinkle is that we'll have
to allocate the coherent memory *also* on a virtual address of 0 mod
4MB. i.e. if we can be told *before* we hand out the coherent area that
it will be mmapped, we can make it work. This is going to have to be an
extra flag to dma_alloc_coherent() or something.
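Placing a mapping at 0 mod 4MB comes down to rounding a candidate address up to the next multiple of the congruence modulus. The following is a minimal sketch of that arithmetic only; the macro and function names are hypothetical, and this is not the PA-RISC arch_get_unmapped_area code.

```c
#include <assert.h>
#include <stdint.h>

/* 4MB, the congruence modulus quoted in the message above. */
#define CONGRUENCE_MODULUS ((uintptr_t)4 << 20)

/*
 * Hypothetical helper: round a candidate address up to the next address
 * that is 0 mod CONGRUENCE_MODULUS. Addresses already on a boundary are
 * returned unchanged.
 */
static uintptr_t round_up_to_congruence(uintptr_t hint)
{
    /* Guard against wrapping past the top of the address space. */
    assert(hint <= UINTPTR_MAX - (CONGRUENCE_MODULUS - 1));
    return (hint + CONGRUENCE_MODULUS - 1) & ~(CONGRUENCE_MODULUS - 1);
}
```

If both the user mapping and the coherent allocation are placed this way, the two virtual addresses are congruent by construction.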
The wrong thinking is that this is something we can fix at mapping time,
it's not, it's something we have to set up at buffer allocation time.
James
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-23 2:07 ` William Lee Irwin III
@ 2004-03-23 9:28 ` Russell King
2004-03-23 9:34 ` David Woodhouse
2004-03-23 11:35 ` Andrea Arcangeli
1 sibling, 1 reply; 105+ messages in thread
From: Russell King @ 2004-03-23 9:28 UTC (permalink / raw)
To: William Lee Irwin III
Cc: Benjamin Herrenschmidt, Jeff Garzik, James Bottomley,
Linux Arch list, Linus Torvalds, David Woodhouse,
Christoph Hellwig, Andrew Morton, Andrea Arcangeli
On Mon, Mar 22, 2004 at 06:07:56PM -0800, William Lee Irwin III wrote:
> I've already been over every ->nopage() in the kernel once (wrt. what's
> been merged anyway; a number of times for other reasons), so I really
> think I can do a bit of useful footwork here.
Note that currently I have dma_coherent_to_page(), dma_coherent_to_pfn()
and dma_coherent_mmap() (and maybe dma_coherent_munmap()) implemented
here. I'm now taking a back seat in these discussions waiting for one
of them to take centre stage and be the One True chosen method.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-23 9:28 ` Russell King
@ 2004-03-23 9:34 ` David Woodhouse
2004-03-23 10:04 ` Russell King
0 siblings, 1 reply; 105+ messages in thread
From: David Woodhouse @ 2004-03-23 9:34 UTC (permalink / raw)
To: Russell King
Cc: William Lee Irwin III, Benjamin Herrenschmidt, Jeff Garzik,
James Bottomley, Linux Arch list, Linus Torvalds,
Christoph Hellwig, Andrew Morton, Andrea Arcangeli
On Tue, 2004-03-23 at 09:28 +0000, Russell King wrote:
> On Mon, Mar 22, 2004 at 06:07:56PM -0800, William Lee Irwin III wrote:
> > I've already been over every ->nopage() in the kernel once (wrt. what's
> > been merged anyway; a number of times for other reasons), so I really
> > think I can do a bit of useful footwork here.
>
> Note that currently I have dma_coherent_to_page(), dma_coherent_to_pfn()
> and dma_coherent_mmap() (and maybe dma_coherent_munmap()) implemented
> here. I'm now taking a back seat in these discussions waiting for one
> of them to take centre stage and be the One True chosen method.
dma_coherent_m{un,}map() makes most sense to me. Given that it's hard
for some arches to make a 'struct page' available, is there any _reason_
to make them jump through that particular hoop expecting them to provide
a dma_coherent_to_page()?
Populating PTEs on demand through nopage() can be an implementation
detail. You don't have to make 'struct page' available in the generic
API to achieve that optimisation.
--
dwmw2
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-23 9:34 ` David Woodhouse
@ 2004-03-23 10:04 ` Russell King
2004-03-23 10:05 ` William Lee Irwin III
2004-03-23 11:29 ` Benjamin Herrenschmidt
0 siblings, 2 replies; 105+ messages in thread
From: Russell King @ 2004-03-23 10:04 UTC (permalink / raw)
To: David Woodhouse
Cc: William Lee Irwin III, Benjamin Herrenschmidt, Jeff Garzik,
James Bottomley, Linux Arch list, Linus Torvalds,
Christoph Hellwig, Andrew Morton, Andrea Arcangeli
On Tue, Mar 23, 2004 at 09:34:52AM +0000, David Woodhouse wrote:
> Populating PTEs on demand through nopage() can be an implementation
> detail. You don't have to make 'struct page' available in the generic
> API to achieve that optimisation.
Indeed - and this is what my implementation of dma_coherent_mmap() does
on ARM.
Once everyone has decided on a solution, we can then move it forward.
Currently it does look like dma_coherent_mmap() is the one of choice,
so... Are there any remaining objections to it?
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-23 10:04 ` Russell King
@ 2004-03-23 10:05 ` William Lee Irwin III
2004-03-23 11:29 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 105+ messages in thread
From: William Lee Irwin III @ 2004-03-23 10:05 UTC (permalink / raw)
To: rmk, David Woodhouse, Benjamin Herrenschmidt, Jeff Garzik,
James Bottomley, Linux Arch list, Linus Torvalds,
Christoph Hellwig, Andrew Morton, Andrea Arcangeli
On Tue, Mar 23, 2004 at 09:34:52AM +0000, David Woodhouse wrote:
>> Populating PTEs on demand through nopage() can be an implementation
>> detail. You don't have to make 'struct page' available in the generic
>> API to achieve that optimisation.
On Tue, Mar 23, 2004 at 10:04:29AM +0000, Russell King wrote:
> Indeed - and this is what my implementation of dma_coherent_mmap() does
> on ARM.
> Once everyone has decided on a solution, we can then move it forward.
> Currently it does look like dma_coherent_mmap() is the one of choice,
> so... Are there any remaining objections to it?
I like dma_coherent_mmap().
-- wli
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-23 10:04 ` Russell King
2004-03-23 10:05 ` William Lee Irwin III
@ 2004-03-23 11:29 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 105+ messages in thread
From: Benjamin Herrenschmidt @ 2004-03-23 11:29 UTC (permalink / raw)
To: Russell King
Cc: David Woodhouse, William Lee Irwin III, Jeff Garzik,
James Bottomley, Linux Arch list, Linus Torvalds,
Christoph Hellwig, Andrew Morton, Andrea Arcangeli
On Tue, 2004-03-23 at 21:04, Russell King wrote:
> On Tue, Mar 23, 2004 at 09:34:52AM +0000, David Woodhouse wrote:
> > Populating PTEs on demand through nopage() can be an implementation
> > detail. You don't have to make 'struct page' available in the generic
> > API to achieve that optimisation.
>
> Indeed - and this is what my implementation of dma_coherent_mmap() does
> on ARM.
>
> Once everyone has decided on a solution, we can then move it forward.
> Currently it does look like dma_coherent_mmap() is the one of choice,
> so... Are there any remaining objections to it?
Looks fine to me. We may want to refine dma_coherent_alloc() in the
first place though, like introducing a "real" __dma_coherent_alloc()
that takes additional flags and have dma_coherent_alloc() just be
a macro, that way James can pass in flags telling at alloc time that
a given alloc will potentially be mapped to userland (if I understand
James's requirements properly).
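The "flags variant plus thin wrapper" shape suggested here can be modelled in userspace as below. The names model_alloc_coherent_flags, model_alloc_coherent and MODEL_ALLOC_USER_MAPPABLE are invented for the illustration, and the 4MB alignment is borrowed from James's description of PA-RISC; a real allocator would of course not use malloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_CONGRUENCE          ((size_t)4 << 20)  /* assumed 4MB modulus */
#define MODEL_ALLOC_USER_MAPPABLE 0x1u               /* hypothetical flag */

/*
 * "Real" allocator taking flags. When the caller announces that the buffer
 * may later be mapped to userland, hand out memory at 0 mod MODEL_CONGRUENCE.
 */
static void *model_alloc_coherent_flags(size_t size, unsigned int flags)
{
    if (size == 0)
        return NULL;

    if (flags & MODEL_ALLOC_USER_MAPPABLE) {
        assert(size <= SIZE_MAX - (MODEL_CONGRUENCE - 1));
        /* aligned_alloc expects the size to be a multiple of the alignment */
        size_t rounded = (size + MODEL_CONGRUENCE - 1) & ~(MODEL_CONGRUENCE - 1);
        return aligned_alloc(MODEL_CONGRUENCE, rounded);
    }
    return malloc(size);
}

/* Existing callers keep the short form, which passes no flags. */
#define model_alloc_coherent(size) model_alloc_coherent_flags((size), 0)
```

The design choice being illustrated is only that the placement decision is made at allocation time, which is the constraint James raised.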
Ben.
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-23 2:07 ` William Lee Irwin III
2004-03-23 9:28 ` Russell King
@ 2004-03-23 11:35 ` Andrea Arcangeli
2004-03-23 11:44 ` William Lee Irwin III
1 sibling, 1 reply; 105+ messages in thread
From: Andrea Arcangeli @ 2004-03-23 11:35 UTC (permalink / raw)
To: William Lee Irwin III
Cc: Benjamin Herrenschmidt, Jeff Garzik, James Bottomley,
Russell King, Linux Arch list, Linus Torvalds, David Woodhouse,
Christoph Hellwig, Andrew Morton
On Mon, Mar 22, 2004 at 06:07:56PM -0800, William Lee Irwin III wrote:
> On Mon, Mar 22, 2004 at 05:18:30PM -0500, Jeff Garzik wrote:
> >> This is burned into silicon, so supporting it's not an option. Frankly
> >> I think what's best is another device interface for userspace to fall
> >> back to when this coherent userspace mmap() is unimplementable, e.g.
> >> read()/write() on some device node.
>
> On Tue, Mar 23, 2004 at 10:57:19AM +1100, Benjamin Herrenschmidt wrote:
> > Exactly. We can implement the simple/nice interface discussed here, and
> > just not support it on those platforms, they'll have to fall back to
> > read/write or simply not support those drivers who require that
> > functionality.
> > Eventually a nopage variant may be worthwhile for things doing really
> > large mappings, but I tend to think that when we need to do that mapping
> > to userland, it is because we need short latencies, which is the opposite
> > of what a nopage implementation provides, dunno if it's worth the pain
> > (though it's not _that_ painful).
>
> More generality in fault handling would be useful in various ways even
> beyond fixing ALSA's issues. I'm not sure why Linus doesn't like the
> notion. I didn't insist on API but just moved on to trying to push for
my guess is that he doesn't like a branch in the fast path and he thinks
the remap_file_pages approach is simpler for drivers to use.
as for the initial page fault mentioned by Benjamin, that's a non-issue:
if one prefers to preallocate all the ptes rather than take a page fault
the very first time the pages are touched, I already said a few emails ago
that one can call mlock on the mapping and there will not be a single
page fault afterwards.
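From userspace, the mlock suggestion amounts to something like the sketch below. It uses an anonymous mapping as a stand-in for a device mapping, since no driver file descriptor is assumed here, and the helper name map_and_prefault is made up for the example. mlock can fail when RLIMIT_MEMLOCK is too small, so the caller has to handle a NULL return.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `len` bytes and lock them so that every page is
 * populated up front instead of on first touch.
 * Returns the mapping, or NULL with errno set if mmap or mlock fails.
 */
static void *map_and_prefault(size_t len)
{
    assert(len > 0);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if (mlock(p, len) != 0) {
        int saved = errno;      /* keep mlock's error across munmap */
        munmap(p, len);
        errno = saved;
        return NULL;
    }
    return p;
}
```

With a real driver the mmap call would take the device's file descriptor instead of -1 and MAP_ANONYMOUS; the mlock step stays the same.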
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-23 11:35 ` Andrea Arcangeli
@ 2004-03-23 11:44 ` William Lee Irwin III
2004-03-23 12:34 ` Andrea Arcangeli
0 siblings, 1 reply; 105+ messages in thread
From: William Lee Irwin III @ 2004-03-23 11:44 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Benjamin Herrenschmidt, Jeff Garzik, James Bottomley,
Russell King, Linux Arch list, Linus Torvalds, David Woodhouse,
Christoph Hellwig, Andrew Morton
On Mon, Mar 22, 2004 at 06:07:56PM -0800, William Lee Irwin III wrote:
>> More generality in fault handling would be useful in various ways even
>> beyond fixing ALSA's issues. I'm not sure why Linus doesn't like the
>> notion. I didn't insist on API but just moved on to trying to push for
On Tue, Mar 23, 2004 at 12:35:34PM +0100, Andrea Arcangeli wrote:
> my guess is that he doesn't like a branch in the fast path and he thinks
> the remap_file_pages approach is simpler for drivers to use.
Hmm. It should move preexisting method calls further up the call chain.
I can't say I have a pressing enough need to pursue it personally
unless it's the way to resolve issues like the one under discussion and
so on. It looks like there's another way that's preferred, so I'm not
looking into it anymore.
On Tue, Mar 23, 2004 at 12:35:34PM +0100, Andrea Arcangeli wrote:
> as for the initial page fault mentioned by Benjamin, that's a non-issue:
> if one prefers to preallocate all the ptes rather than take a page fault
> the very first time the pages are touched, I already said some emails ago
> that one can call mlock on the mapping and there will not be a single
> page fault afterwards.
mlock actually loops through the fault path, so in a sense it still
requires fault handling on the part of the driver, though AFAICT it can
largely be done by library code. I agree it should be an implementation
detail of dma_mmap_coherent() etc. I pretty much believed up front that
drivers would need code to assist them with fault handling if they did
it, though it didn't originally occur to me that an mmap() function
could install the entire handler on their behalf, transparently to them.
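
The mlock point can be checked from userspace. What follows is only a
minimal sketch, assuming Linux and using an anonymous mapping as a
stand-in for a driver mapping; map_prefaulted() and resident_pages() are
made-up helper names. Once mlock() returns, mincore() reports every page
as resident, so later touches should not fault; the populating itself
still happens inside mlock().

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* mincore() and MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define TOY_MAX_PAGES 64         /* arbitrary cap for this sketch */

/* Map npages of anonymous memory and mlock() it, so all ptes are
 * populated before the caller touches anything. NULL on failure. */
static void *map_prefaulted(size_t npages)
{
    long psz = sysconf(_SC_PAGESIZE);
    size_t len;
    void *p;

    if (psz <= 0 || npages == 0 || npages > TOY_MAX_PAGES)
        return NULL;
    len = npages * (size_t)psz;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (mlock(p, len) != 0) {    /* e.g. RLIMIT_MEMLOCK is too small */
        munmap(p, len);
        return NULL;
    }
    return p;
}

/* Number of resident pages among the npages starting at p, -1 on error. */
static long resident_pages(void *p, size_t npages)
{
    unsigned char vec[TOY_MAX_PAGES];
    long psz = sysconf(_SC_PAGESIZE);
    long n = 0;
    size_t i;

    if (psz <= 0 || npages > TOY_MAX_PAGES)
        return -1;
    if (mincore(p, npages * (size_t)psz, vec) != 0)
        return -1;
    for (i = 0; i < npages; i++)
        n += vec[i] & 1;         /* bit 0 set means "resident" */
    return n;
}
```

map_prefaulted() can fail where the locked-memory limit is low, so
callers should treat NULL as "could not lock" rather than as a bug.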
-- wli
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-23 11:44 ` William Lee Irwin III
@ 2004-03-23 12:34 ` Andrea Arcangeli
2004-03-23 12:40 ` Russell King
2004-03-23 12:49 ` William Lee Irwin III
0 siblings, 2 replies; 105+ messages in thread
From: Andrea Arcangeli @ 2004-03-23 12:34 UTC (permalink / raw)
To: William Lee Irwin III
Cc: Benjamin Herrenschmidt, Jeff Garzik, James Bottomley,
Russell King, Linux Arch list, Linus Torvalds, David Woodhouse,
Christoph Hellwig, Andrew Morton
On Tue, Mar 23, 2004 at 03:44:52AM -0800, William Lee Irwin III wrote:
> mlock actually loops through the fault path, so in a sense it still
> requires fault handling on the part of the driver, though AFAICT it can
it requires fault handling of course, but that's just the API with the
driver; on the performance side (and it was the performance/latency side
of the page faults that was complained about) no page faults are
generated, so it's not going to be a lot different from the map_sg stuff
at runtime.
anyways Linus vetoed the lazy approach so we probably should give it up
(the one thing I like most is to avoid the branch in the fast path).
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-23 12:34 ` Andrea Arcangeli
@ 2004-03-23 12:40 ` Russell King
2004-03-23 15:25 ` Linus Torvalds
2004-03-25 20:25 ` Russell King
2004-03-23 12:49 ` William Lee Irwin III
1 sibling, 2 replies; 105+ messages in thread
From: Russell King @ 2004-03-23 12:40 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: William Lee Irwin III, Benjamin Herrenschmidt, Jeff Garzik,
James Bottomley, Linux Arch list, Linus Torvalds, David Woodhouse,
Christoph Hellwig, Andrew Morton
On Tue, Mar 23, 2004 at 01:34:39PM +0100, Andrea Arcangeli wrote:
> anyways Linus vetoed the lazy approach so we probably should give it up
> (the one thing I like most is to avoid the branch in the fast path).
I don't think he did - he vetoed adding another special condition to
the fast path, or returning non-RAM pages via ->nopage.
However, I do not believe he has vetoed an architecture implementing
dma_coherent_mmap() in such a way that it uses the ->nopage method,
_provided_ ->nopage returns valid struct pages.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-23 12:34 ` Andrea Arcangeli
2004-03-23 12:40 ` Russell King
@ 2004-03-23 12:49 ` William Lee Irwin III
1 sibling, 0 replies; 105+ messages in thread
From: William Lee Irwin III @ 2004-03-23 12:49 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Benjamin Herrenschmidt, Jeff Garzik, James Bottomley,
Russell King, Linux Arch list, Linus Torvalds, David Woodhouse,
Christoph Hellwig, Andrew Morton
On Tue, Mar 23, 2004 at 03:44:52AM -0800, William Lee Irwin III wrote:
>> mlock actually loops through the fault path, so in a sense it still
>> requires fault handling on the part of the driver, though AFAICT it can
On Tue, Mar 23, 2004 at 01:34:39PM +0100, Andrea Arcangeli wrote:
> it requires fault handling of course, but that's just the API with the
> driver; on the performance side (and it was the performance/latency side
> of the page faults that was complained about) no page faults are
> generated, so it's not going to be a lot different from the map_sg stuff
> at runtime.
> anyways Linus vetoed the lazy approach so we probably should give it up
> (the one thing I like most is to avoid the branch in the fast path).
dma_mmap_coherent() being implemented via fault handling is unrelated
to ->fault() methods. It just uses the preexisting ->nopage() method
internally and transparently to the driver, and without any hooks
needed in the API either. Basically, however the arch wants to do it
so long as it fits into ->nopage(), doesn't need changes to the core, etc.
-- wli
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-23 12:40 ` Russell King
@ 2004-03-23 15:25 ` Linus Torvalds
2004-03-23 15:36 ` Andrea Arcangeli
2004-03-25 20:25 ` Russell King
1 sibling, 1 reply; 105+ messages in thread
From: Linus Torvalds @ 2004-03-23 15:25 UTC (permalink / raw)
To: Russell King
Cc: Andrea Arcangeli, William Lee Irwin III, Benjamin Herrenschmidt,
Jeff Garzik, James Bottomley, Linux Arch list, David Woodhouse,
Christoph Hellwig, Andrew Morton
On Tue, 23 Mar 2004, Russell King wrote:
>
> On Tue, Mar 23, 2004 at 01:34:39PM +0100, Andrea Arcangeli wrote:
> > anyways Linus vetoed the lazy approach so we probably should give it up
> > (the one thing I like most is to avoid the branch in the fast path).
>
> I don't think he did - he vetoed adding another special condition to
> the fast path, or returning non-RAM pages via ->nopage.
Indeed.
What I _don't_ want is to add a new VM op function pointer as a special
case. I abhor special cases, since they never go away, and end up making
the code really hard to follow.
> However, I do not believe he has vetoed an architecture implementing
> dma_coherent_mmap() in such a way that it uses the ->nopage method,
> _provided_ ->nopage returns valid struct pages.
Yes. For all I care, the "struct page" might even be dynamically
allocated, or something else very special (eg in a zone of its own that
the rest of the VM never ever actually sees). As long as "page_to_pfn()"
works and does the right thing wrt such pages, that would be fine by me
(ie as long as the VM doesn't need to have any special case code).
Linus
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-23 15:25 ` Linus Torvalds
@ 2004-03-23 15:36 ` Andrea Arcangeli
2004-03-23 15:46 ` Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 105+ messages in thread
From: Andrea Arcangeli @ 2004-03-23 15:36 UTC (permalink / raw)
To: Linus Torvalds
Cc: Russell King, William Lee Irwin III, Benjamin Herrenschmidt,
Jeff Garzik, James Bottomley, Linux Arch list, David Woodhouse,
Christoph Hellwig, Andrew Morton
On Tue, Mar 23, 2004 at 07:25:31AM -0800, Linus Torvalds wrote:
>
>
> On Tue, 23 Mar 2004, Russell King wrote:
> >
> > On Tue, Mar 23, 2004 at 01:34:39PM +0100, Andrea Arcangeli wrote:
> > > anyways Linus vetoed the lazy approach so we probably should give it up
> > > (the one thing I like most is to avoid the branch in the fast path).
> >
> > I don't think he did - he vetoed adding another special condition to
> > the fast path, or returning non-RAM pages via ->nopage.
>
> Indeed.
note that I was talking about non-ram, obviously ram pages can be
returned via ->nopage and that's what drivers are using already.
I know there is a problem with ram pages too, but as far as the ->nopage
API is concerned the only problem is the non-ram pages. Russell's
problem has nothing to do with ->nopage itself.
> What I _don't_ want is to add a new VM op function pointer as a special
> case. I abhor special cases, since they never go away, and end up making
> the code really hard to follow.
>
> > However, I do not believe he has vetoed an architecture implementing
> > dma_coherent_mmap() in such a way that it uses the ->nopage method,
> > _provided_ ->nopage returns valid struct pages.
>
> Yes. For all I care, the "struct page" might even be dynamically
> allocated, or something else very special (eg in a zone of its own that
I don't think it's sane to use discontigmem just to make ->nopage work
with non-ram, if one has to use discontigmem just for that then I think
it's much simpler to fill all the pagetables in ->mmap using the pfn w/o
page_t.
> the rest of the VM never ever actually sees). As long as "page_to_pfn()"
zones cannot create holes in the middle of mem_map, only discontigmem
can. I'd expect most archs to have holes between ram and mmio
regions (at least in various common ram configurations). That's why I
guess discontigmem would be needed for that.
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-23 15:36 ` Andrea Arcangeli
@ 2004-03-23 15:46 ` Linus Torvalds
2004-03-23 15:50 ` Russell King
2004-03-23 22:10 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 105+ messages in thread
From: Linus Torvalds @ 2004-03-23 15:46 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Russell King, William Lee Irwin III, Benjamin Herrenschmidt,
Jeff Garzik, James Bottomley, Linux Arch list, David Woodhouse,
Christoph Hellwig, Andrew Morton
On Tue, 23 Mar 2004, Andrea Arcangeli wrote:
> >
> > Yes. For all I care, the "struct page" might even be dynamically
> > allocated, or something else very special (eg in a zone of its own that
>
> I don't think it's sane to use discontigmem just to make ->nopage work
> with non-ram, if one has to use discontigmem just for that then I think
> it's much simpler to fill all the pagetables in ->mmap using the pfn w/o
> page_t.
Oh, I absolutely agree. My point was that I really don't care how a driver
does things, as long as it does _not_ create any VM special cases. And I
definitely think that for non-RAM pages it tends to make most sense to
just statically set up the mapping at ->mmap() time.
But I'm also saying that if a driver _wants_ to do dynamic mapping for
some really strange architecture reasons, then such an architecture could
choose to have a magic zone or something like that for that case. At that
point it is an _architecture_ special case, which contains the problem
enough that I don't need to care.
That kind of "strange 'struct page'" approach would cover the case where
you really want to have a "struct page" associated with a DMA coherent
allocation, even if such a page would never be part of any _normal_ memory
allocations (and I seriously doubt that any sane architecture would want
to do anything like that, but I could well imagine that some Amiga with
"chip ram" or similar might go this route).
In general, I'd _prefer_ for really special mappings to be as static as
possible. So we should probably aim for having "IO mappings" be set up at
"->mmap()" time if at all possible. The less clever stuff that happens
dynamically, the better, imho.
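
To make the static-versus-dynamic distinction concrete, here is a toy
userspace model, not kernel code. Every name in it (toy_vma,
toy_mmap_static, toy_mmap_lazy, toy_touch) is invented for illustration,
and a pte value of 0 is assumed to mean "not present", so base_pfn must
be non-zero. The static variant fills every entry at mmap time and never
faults; the lazy variant faults once per page on first touch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NPAGES 8

struct toy_vma {
    unsigned long base_pfn;          /* first pfn of the device region */
    unsigned long pte[TOY_NPAGES];   /* 0 means "not present" */
    unsigned long faults;            /* faults taken so far */
};

/* Static approach: every pte is set up in the mmap step. */
static void toy_mmap_static(struct toy_vma *vma, unsigned long base_pfn)
{
    size_t i;

    vma->base_pfn = base_pfn;
    vma->faults = 0;
    for (i = 0; i < TOY_NPAGES; i++)
        vma->pte[i] = base_pfn + i;
}

/* Lazy approach: the mapping starts empty and is filled on fault. */
static void toy_mmap_lazy(struct toy_vma *vma, unsigned long base_pfn)
{
    size_t i;

    vma->base_pfn = base_pfn;
    vma->faults = 0;
    for (i = 0; i < TOY_NPAGES; i++)
        vma->pte[i] = 0;
}

/* Access page idx; a missing pte costs one fault, then it is installed. */
static unsigned long toy_touch(struct toy_vma *vma, size_t idx)
{
    if (vma->pte[idx] == 0) {
        vma->faults++;
        vma->pte[idx] = vma->base_pfn + idx;
    }
    return vma->pte[idx];
}
```

In this model the only runtime check sits in toy_touch(); the static
variant simply never takes that branch.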
Linus
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-23 15:36 ` Andrea Arcangeli
2004-03-23 15:46 ` Linus Torvalds
@ 2004-03-23 15:50 ` Russell King
2004-03-23 22:10 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 105+ messages in thread
From: Russell King @ 2004-03-23 15:50 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Linus Torvalds, William Lee Irwin III, Benjamin Herrenschmidt,
Jeff Garzik, James Bottomley, Linux Arch list, David Woodhouse,
Christoph Hellwig, Andrew Morton
On Tue, Mar 23, 2004 at 04:36:41PM +0100, Andrea Arcangeli wrote:
> On Tue, Mar 23, 2004 at 07:25:31AM -0800, Linus Torvalds wrote:
> > On Tue, 23 Mar 2004, Russell King wrote:
> > > On Tue, Mar 23, 2004 at 01:34:39PM +0100, Andrea Arcangeli wrote:
> > > > anyways Linus vetoed the lazy approach so we probably should give it up
> > > > (the one thing I like most is to avoid the branch in the fast path).
> > >
> > > I don't think he did - he vetoed adding another special condition to
> > > the fast path, or returning non-RAM pages via ->nopage.
> >
> > Indeed.
>
> note that I was talking about non-ram, obviously ram pages can be
> returned via ->nopage and that's what drivers are using already.
Let's not get distracted into the other problem areas. What we're
talking about here is solving the "how to map memory returned from
dma_alloc_coherent()".
There's the related problem (which Jeff has - via82cxxx_audio.c)
which is effectively a scatter-gather dma_alloc_coherent() +
dma_coherent_mmap() problem.
Then there's the unrelated problem where ALSA wants to map buffers
on PCI devices coherently into user space.
These are three distinct problems, and we should not confuse
them.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-23 17:59 ` Andy Whitcroft
@ 2004-03-23 17:58 ` David Woodhouse
2004-03-23 18:11 ` William Lee Irwin III
1 sibling, 0 replies; 105+ messages in thread
From: David Woodhouse @ 2004-03-23 17:58 UTC (permalink / raw)
To: Andy Whitcroft
Cc: Russell King, Linus Torvalds, Jeff Garzik, Christoph Hellwig,
William Lee Irwin III, Andrew Morton, Andrea Arcangeli,
linux-kernel
On Tue, 2004-03-23 at 17:59 +0000, Andy Whitcroft wrote:
> Would it not be possible to allocate struct page's for these special areas
> of memory? Worst, worst, worst case could they not represent pages in a
> memory only node in the NUMA sense? I am sure there is some way they could
> be 'tacked' onto the end of the cmap in reality?
It would be possible. But why? What benefit do we gain from this
pretence?
Just hide it all from the driver with dma_coherent_mmap() and forget
about it. Let the arch deal with it -- the _common_ case will be that we
use nopage for the actual mapping, perhaps. But why mandate it?
--
dwmw2
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-21 23:58 ` Russell King
2004-03-22 0:34 ` Andrea Arcangeli
@ 2004-03-23 17:59 ` Andy Whitcroft
2004-03-23 17:58 ` David Woodhouse
2004-03-23 18:11 ` William Lee Irwin III
1 sibling, 2 replies; 105+ messages in thread
From: Andy Whitcroft @ 2004-03-23 17:59 UTC (permalink / raw)
To: Russell King, Linus Torvalds
Cc: Jeff Garzik, David Woodhouse, Christoph Hellwig,
William Lee Irwin III, Andrew Morton, Andrea Arcangeli,
linux-kernel
--On 21 March 2004 23:58 +0000 Russell King <rmk+lkml@arm.linux.org.uk>
wrote:
> On Sun, Mar 21, 2004 at 03:51:31PM -0800, Linus Torvalds wrote:
>> That might be the minimal fix, since it would basically involve:
>> - change whatever offensive "virt_to_page()" calls into
>> "dma_map_to_page()".
>> - implement "dma_map_to_page()" for all architectures.
>>
>> Would that make people happy?
>
> Unfortunately this doesn't make dwmw2 happy - he claims to have machines
> which implement dma_alloc_coherent using RAM which doesn't have any
> struct page associated with it.
Would it not be possible to allocate struct page's for these special areas
of memory? Worst, worst, worst case could they not represent pages in a
memory only node in the NUMA sense? I am sure there is some way they could
be 'tacked' onto the end of the cmap in reality?
-apw
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-23 17:59 ` Andy Whitcroft
2004-03-23 17:58 ` David Woodhouse
@ 2004-03-23 18:11 ` William Lee Irwin III
1 sibling, 0 replies; 105+ messages in thread
From: William Lee Irwin III @ 2004-03-23 18:11 UTC (permalink / raw)
To: Andy Whitcroft
Cc: Russell King, Linus Torvalds, Jeff Garzik, David Woodhouse,
Christoph Hellwig, Andrew Morton, Andrea Arcangeli, linux-kernel
On 21 March 2004 23:58 +0000 Russell King <rmk+lkml@arm.linux.org.uk>
>> Unfortunately this doesn't make dwmw2 happy - he claims to have machines
>> which implement dma_alloc_coherent using RAM which doesn't have any
>> struct page associated with it.
On Tue, Mar 23, 2004 at 05:59:20PM +0000, Andy Whitcroft wrote:
> Would it not be possible to allocate struct page's for these special areas
> of memory? Worst, worst, worst case could they not represent pages in a
> memory only node in the NUMA sense? I am sure there is some way they could
> be 'tacked' onto the end of the cmap in reality?
This has already been beaten to death and resolved. dma_mmap_coherent()
is the preferred solution and will have no reliance on the coremap apart
from requiring it when faults are handled (to feed the core API), and
requiring prefaulting when coremap elements are absent for the mapped
areas. More importantly, it allows sane fallback to read()/write() and
understands the results of dma_alloc_coherent(), which virt_to_page(),
whose current use on dma_alloc_coherent()'s results causes driver bugs,
does not.
-- wli
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-23 15:36 ` Andrea Arcangeli
2004-03-23 15:46 ` Linus Torvalds
2004-03-23 15:50 ` Russell King
@ 2004-03-23 22:10 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 105+ messages in thread
From: Benjamin Herrenschmidt @ 2004-03-23 22:10 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Linus Torvalds, Russell King, William Lee Irwin III, Jeff Garzik,
James Bottomley, Linux Arch list, David Woodhouse,
Christoph Hellwig, Andrew Morton
> zones cannot create holes in the middle of mem_map, only discontigmem
> can. I'd expect most archs to have holes between ram and mmio
> regions (at least in various common ram configurations). That's why I
> guess discontigmem would be needed for that.
Well, just waste some mem_map or use a non-trivial page_to_pfn using
some high bit in the address on those archs. No need for DISCONTIGMEM
for that.
For example, on various ppc64's, there is an IO hole of 1 or 2Gb,
so you have 2 or 3Gb of RAM, then the IO hole, then the rest of RAM.
So far I implement that without using DISCONTIGMEM, just giving the
hole size when initializing the zone. That wastes some memmap space,
but that's fine for now (the ppc64 discontigmem code would need
some surgery to be split from the numa stuff to be able to use
it).
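
The "waste some mem_map" option can be pictured with a toy userspace
model. This is a sketch only: toy_page, toy_zone and the toy_* helpers
are invented here and are not the kernel's structures. One flat array
spans RAM plus the IO hole, the hole's entries are just marked reserved,
and page_to_pfn stays plain pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PG_RESERVED 1UL

struct toy_page { unsigned long flags; };

struct toy_zone {
    unsigned long start_pfn;     /* first pfn covered by mem_map */
    unsigned long spanned;       /* pfns covered, hole included */
    unsigned long hole_pages;    /* entries that describe no RAM */
    struct toy_page *mem_map;    /* one flat array, no DISCONTIGMEM */
};

/* Allocate a mem_map spanning RAM and hole; mark the hole reserved. */
static int toy_zone_init(struct toy_zone *z, unsigned long start_pfn,
                         unsigned long spanned, unsigned long hole_start,
                         unsigned long hole_pages)
{
    unsigned long i;

    if (hole_start < start_pfn || hole_pages > spanned ||
        hole_start - start_pfn > spanned - hole_pages)
        return -1;
    z->mem_map = calloc(spanned, sizeof(*z->mem_map));
    if (!z->mem_map)
        return -1;
    z->start_pfn = start_pfn;
    z->spanned = spanned;
    z->hole_pages = hole_pages;
    for (i = 0; i < hole_pages; i++)
        z->mem_map[hole_start - start_pfn + i].flags |= TOY_PG_RESERVED;
    return 0;
}

/* Trivial conversions, valid because the array has no gaps. */
static unsigned long toy_page_to_pfn(const struct toy_zone *z,
                                     const struct toy_page *page)
{
    return z->start_pfn + (unsigned long)(page - z->mem_map);
}

static struct toy_page *toy_pfn_to_page(const struct toy_zone *z,
                                        unsigned long pfn)
{
    return &z->mem_map[pfn - z->start_pfn];
}

/* Memory spent on entries for the hole: the price of the flat array. */
static size_t toy_wasted_bytes(const struct toy_zone *z)
{
    return (size_t)z->hole_pages * sizeof(struct toy_page);
}
```

The trade-off is visible in toy_wasted_bytes(): the cost grows with the
hole size, which is why it is "fine for now" rather than free.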
Ben.
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-23 12:40 ` Russell King
2004-03-23 15:25 ` Linus Torvalds
@ 2004-03-25 20:25 ` Russell King
2004-03-28 10:17 ` Russell King
1 sibling, 1 reply; 105+ messages in thread
From: Russell King @ 2004-03-25 20:25 UTC (permalink / raw)
To: Andrea Arcangeli, William Lee Irwin III, Benjamin Herrenschmidt,
Jeff Garzik, James Bottomley, Linux Arch list, Linus Torvalds,
David Woodhouse, Christoph Hellwig, Andrew Morton
On Tue, Mar 23, 2004 at 12:40:27PM +0000, Russell King wrote:
> On Tue, Mar 23, 2004 at 01:34:39PM +0100, Andrea Arcangeli wrote:
> > anyways Linus vetoed the lazy approach so we probably should give it up
> > (the one thing I like most is to avoid the branch in the fast path).
>
> I don't think he did - he vetoed adding another special condition to
> the fast path, or returning non-RAM pages via ->nopage.
>
> However, I do not believe he has vetoed an architecture implementing
> dma_coherent_mmap() in such a way that it uses the ->nopage method,
> _provided_ ->nopage returns valid struct pages.
Ok, since this thread seems to have died without much action happening,
it's time to re-start it (but note - I probably won't be around tomorrow.)
I'd like to get the dma_coherent_mmap() API sorted out such that everyone
is happy, and we can progress it.
From what I've gathered, we seem to be happy with the dma_coherent_mmap()
approach. Is everyone happy with these prototypes?
int dma_coherent_mmap(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t dma_addr, size_t size);
and, for the PA-RISC architecture (c/o James Bottomley):
void dma_coherent_munmap(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t dma_addr, size_t size);
where:
- dev: the device for which this coherent region was created
- vma: VM area struct describing the requested user mapping
- cpu_addr: the address returned from dma_alloc_coherent
- dma_addr: the DMA cookie returned from dma_alloc_coherent
- size: the size of the DMA allocation
As far as ARM goes, we (currently) only need cpu_addr to look up the
data associated with the kernel's coherent DMA mapping. Whether the
other arguments are useful depends on what other architectures require.
Is everyone happy with the name, or would people prefer it to be more
consistent with the other dma_xxx_coherent() functions (iow,
dma_mmap_coherent?)
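
For illustration, this is roughly how a driver's mmap handler might use
the proposed call. It is a userspace sketch under loud assumptions: the
types are stand-ins so that it compiles outside the kernel, the body of
dma_coherent_mmap() is a stub with an assumed bounds check (a real
architecture would build the user mapping there), and toy_dev and
toy_mmap are hypothetical names. Only the prototype comes from the
proposal above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for kernel types, only so the sketch compiles. */
typedef uint64_t dma_addr_t;
struct device { int unused; };
struct vm_area_struct { unsigned long vm_start, vm_end; };

/* Stub with the proposed prototype. The size check is an assumption. */
static int dma_coherent_mmap(struct device *dev, struct vm_area_struct *vma,
                             void *cpu_addr, dma_addr_t dma_addr, size_t size)
{
    (void)dev;
    (void)dma_addr;
    if (!cpu_addr || vma->vm_end - vma->vm_start > size)
        return -EINVAL;
    return 0;
}

/* Hypothetical driver state: what dma_alloc_coherent() handed back. */
struct toy_dev {
    struct device dev;
    void *cpu_addr;        /* address returned from dma_alloc_coherent */
    dma_addr_t dma_addr;   /* DMA cookie from the same call */
    size_t size;           /* size of the DMA allocation */
};

/* The driver forwards everything; it does no page lookups of its own. */
static int toy_mmap(struct toy_dev *td, struct vm_area_struct *vma)
{
    return dma_coherent_mmap(&td->dev, vma, td->cpu_addr,
                             td->dma_addr, td->size);
}
```

The point of the shape is that the driver never calls virt_to_page() on
the coherent buffer; how the mapping is built stays an arch detail.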
PS, one of my pet annoyances with the DMA API is that dma_alloc_coherent()
doesn't return/take some architecturally defined structure, and that
there aren't accessor macros like dma_cpu_addr() and dma_device_addr().
This means that we end up carrying around several bits of data, which
may be the same on some architectures. People objected to this in 2.4,
and we ended up adding that yucky "DECLARE_PCI_UNMAP_ADDR" stuff - which
may happen during 2.6 to the DMA API. Adding these further APIs is just
making this mistake worse IMO. It's really a 2.7 problem though. And
yes, I've just talked people out of the prototypes I've proposed above.
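
A sketch of what such a structure and its accessors could look like.
Nothing like this exists in the DMA API as discussed here; the struct
name toy_dma_handle, its layout and toy_dma_handle_set() are invented,
and dma_addr_t is a userspace stand-in. Only the macro names
dma_cpu_addr() and dma_device_addr() are taken from the paragraph above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;     /* userspace stand-in */

/* Hypothetical handle that dma_alloc_coherent() could return or fill,
 * so callers carry one object instead of several loose values. */
struct toy_dma_handle {
    void *cpu;                   /* CPU-side virtual address */
    dma_addr_t device;           /* address as seen by the device */
    size_t size;                 /* size of the allocation */
};

/* Accessors hide the layout; an arch where both addresses coincide
 * could store one field and derive the other without touching drivers. */
#define dma_cpu_addr(h)     ((h)->cpu)
#define dma_device_addr(h)  ((h)->device)

/* Fill a handle from the values an allocator produced. */
static void toy_dma_handle_set(struct toy_dma_handle *h, void *cpu,
                               dma_addr_t device, size_t size)
{
    h->cpu = cpu;
    h->device = device;
    h->size = size;
}
```

With a handle like this, the mmap call discussed earlier could take the
handle in place of the separate cpu_addr, dma_addr and size arguments.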
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: can device drivers return non-ram via vm_ops->nopage?
2004-03-25 20:25 ` Russell King
@ 2004-03-28 10:17 ` Russell King
0 siblings, 0 replies; 105+ messages in thread
From: Russell King @ 2004-03-28 10:17 UTC (permalink / raw)
To: Andrea Arcangeli, William Lee Irwin III, Benjamin Herrenschmidt,
Jeff Garzik, James Bottomley, Linux Arch list, Linus Torvalds,
David Woodhouse, Christoph Hellwig, Andrew Morton
On Thu, Mar 25, 2004 at 08:25:44PM +0000, Russell King wrote:
> From what I've gathered, we seem to be happy with the dma_coherent_mmap()
> approach. Is everyone happy with these prototypes?
>
> int dma_coherent_mmap(struct device *dev, struct vm_area_struct *vma,
> void *cpu_addr, dma_addr_t dma_addr, size_t size);
>
> and, for the PA-RISC architecture (c/o James Bottomley):
>
> void dma_coherent_munmap(struct device *dev, struct vm_area_struct *vma,
> void *cpu_addr, dma_addr_t dma_addr, size_t size);
I'm not happy with dma_coherent_munmap() actually - we don't really
know the lifetime of the vma, so drivers should not be tempted into
keeping a reference to it.
Since interest in this subject appears to have dropped to zero (as
can be seen from the numerous (0) responses to my last post) it is
my intention to provide just the dma_mmap_coherent interface and
let PA-RISC people figure out how to handle their architecture.
I'm shortly going to post a couple of patches to support
dma_coherent_mmap() on x86 and ARM on linux-arch. Could other
architectures follow up with their patches please?
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core