* consistent_alloc() revisited
@ 2002-07-16 2:13 David Gibson
2002-07-16 14:09 ` Dan Malek
2002-07-20 15:22 ` Benjamin Herrenschmidt
0 siblings, 2 replies; 11+ messages in thread
From: David Gibson @ 2002-07-16 2:13 UTC (permalink / raw)
To: linuxppc-embedded
[-- Attachment #1: Type: text/plain, Size: 3111 bytes --]
A little while back Tom Rini suggested that we implement the various
tricks ARM uses in its version of consistent_alloc() in our version.
I've made a start on that, but there are some complications.
There are basically two things that ARM does and we don't:
1. Frees unused pages in the allocation, if the requested size
isn't a power of two pages
2. Marks the pages Reserved
In addition, the following seem to me like desirable (to a greater or
lesser extent) properties for consistent_alloc():
a. one of the return values can be used as a single "handle"
so that consistent_free() only needs one parameter.
b. works on both cache coherent and non cache coherent
machines
c. be able to use highmem pages
(1) is certainly desirable and the attached patch implements it (for
2.5). I think the patch may also correct some potential bugs in the
current implementation - I'm not 100% convinced the current
implementation can't leak memory if map_page() fails at the wrong
time. I'll commit this if no-one has objections.
(2) is a little more complicated. The reason asserted in the comments
in the ARM code is so that remap_page_range() works. However, the
only code I can see that would call remap_page_range() on memory which
came from consistent_alloc() (actually from pci_alloc_consistent()) is
in some (apparently x86 specific) sound drivers. [actually these are
broken on PPC (2.5) anyway since they use virt_to_bus() on the virtual
address returned by pci_alloc_consistent() but that's easily fixed].
These drivers set and clear PageReserved themselves, so they're
clearly not expecting pci_alloc_consistent() to return Reserved pages.
Arguably setting PageReserved is the Right Thing, since
consistent_alloc() memory obviously shouldn't be swapped or paged.
However, it's only relevant if the memory is mapped into userspace,
and then a driver can set VM_IO on the vma to prevent this.
Furthermore, at the moment it is diffuclt for a general driver to
correctly map consistent_alloc() memory into userspace, since it must
set the flags on the new mappings correctly to disable caching on
machines which aren't DMA consistent. At the moment that means the
driver must be (a) arch-specific and (b) have an ifdef on
CONFIG_NOT_COHERENT_CACHE or various different options for some of the
other archs.
Therefore, I think a _PAGE_DMA_CONSISTENT page flag would be a useful
thing. Patches probably forthcoming.
We currently have (a) by use of vfree() on the virtual address, we
don't have (b) or (c). Some earlier patches of mine added (b), but at
the expense of (a). paulus pointer out however, that we can add (b)
while keeping (a) by having consistent_alloc() remap the memory even
on cache coherent machines - it will just omit the cache flush and the
_PAGE_NO_CACHE flag. A _PAGE_DMA_CONSISTENT would make this even
easier. As a bonus, having this remapping gives us (c) almost for
free.
Comments? Objections?
--
David Gibson | For every complex problem there is a
david@gibson.dropbear.id.au | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson
[-- Attachment #2: patch0 --]
[-- Type: text/plain, Size: 2888 bytes --]
diff -urN /home/dgibson/kernel/linuxppc-2.5/arch/ppc/mm/cachemap.c linux-bluefish/arch/ppc/mm/cachemap.c
--- /home/dgibson/kernel/linuxppc-2.5/arch/ppc/mm/cachemap.c Fri Jun 28 10:40:59 2002
+++ linux-bluefish/arch/ppc/mm/cachemap.c Mon Jul 15 16:20:39 2002
@@ -61,57 +61,69 @@
*/
void *consistent_alloc(int gfp, size_t size, dma_addr_t *dma_handle)
{
- int order, err, i;
- unsigned long page, va, pa, flags;
- struct vm_struct *area;
- void *ret;
+ int order, err;
+ struct page *page, *free, *end;
+ unsigned long pa, flags, offset;
+ struct vm_struct *area = NULL;
+ unsigned long va = 0;
- if (in_interrupt())
- BUG();
+ BUG_ON(in_interrupt());
- /* Only allocate page size areas.
- */
+ /* Only allocate page size areas */
size = PAGE_ALIGN(size);
order = get_order(size);
- page = __get_free_pages(gfp, order);
- if (!page) {
- BUG();
+ free = page = alloc_pages(gfp, order);
+ if (! page)
return NULL;
- }
+
+ pa = page_to_phys(page);
+ *dma_handle = page_to_bus(page);
+ end = page + (1 << order);
/*
* we need to ensure that there are no cachelines in use,
* or worse dirty in this area.
*/
- invalidate_dcache_range(page, page + size);
+ invalidate_dcache_range((unsigned long)page_address(page),
+ (unsigned long)page_address(page) + size);
- /* Allocate some common virtual space to map the new pages.
- */
+ /*
+ * alloc_pages() expects the block to be handled as a unit, so
+ * it only sets the page count on the first page. We set the
+ * counts on each page so they can be freed individually
+ */
+ for (; page < end; page++)
+ set_page_count(page, 1);
+
+
+ /* Allocate some common virtual space to map the new pages*/
area = get_vm_area(size, VM_ALLOC);
- if (area == 0) {
- free_pages(page, order);
- return NULL;
- }
- va = VMALLOC_VMADDR(area->addr);
- ret = (void *)va;
+ if (! area)
+ goto out;
- /* This gives us the real physical address of the first page.
- */
- *dma_handle = pa = virt_to_bus((void *)page);
+ va = VMALLOC_VMADDR(area->addr);
flags = _PAGE_KERNEL | _PAGE_NO_CACHE;
-
- err = 0;
- for (i = 0; i < size && err == 0; i += PAGE_SIZE)
- err = map_page(va+i, pa+i, flags);
- if (err) {
- vfree((void *)va);
- return NULL;
+ for (offset = 0; offset < size; offset += PAGE_SIZE) {
+ err = map_page(va+offset, pa+offset, flags);
+ if (err) {
+ vfree((void *)va);
+ va = 0;
+ goto out;
+ }
+
+ free++;
+ }
+
+ out:
+ /* Free pages which weren't mapped */
+ for (; free < end; free++) {
+ __free_page(free);
}
- return ret;
+ return (void *)va;
}
/*
@@ -119,8 +131,7 @@
*/
void consistent_free(void *vaddr)
{
- if (in_interrupt())
- BUG();
+ BUG_ON(in_interrupt());
vfree(vaddr);
}
@@ -157,6 +168,6 @@
{
unsigned long start;
- start = page_address(page) + offset;
+ start = (unsigned long)page_address(page) + offset;
consistent_sync((void *)start, size, direction);
}
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: consistent_alloc() revisited
2002-07-16 2:13 consistent_alloc() revisited David Gibson
@ 2002-07-16 14:09 ` Dan Malek
2002-07-16 14:58 ` Tom Rini
` (3 more replies)
2002-07-20 15:22 ` Benjamin Herrenschmidt
1 sibling, 4 replies; 11+ messages in thread
From: Dan Malek @ 2002-07-16 14:09 UTC (permalink / raw)
To: David Gibson; +Cc: linuxppc-embedded
David Gibson wrote:
> In addition, the following seem to me like desirable (to a greater or
> lesser extent) properties for consistent_alloc():
> a. one of the return values can be used as a single "handle"
> so that consistent_free() only needs one parameter.
We want the implementation API to be identical across all architectures.
Until there is a consistent way of handling devices that are on local
processor busses, platforms with these types of devices may call these
functions directly. This is particularly true of some host USB controllers
on embedded systems that don't have PCI busses.
> b. works on both cache coherent and non cache coherent
> machines
The original purpose of the consistent_* functions was to provide support
for processors that aren't cache coherent. The pci_* (and any other functions)
can call these if appropriate, but I still believe the decision should be
made at a higher level. Obviously, these consistent_* functions will
be implemented differently on non- or cache coherent processors, and across
different architectures. The PCI functions (or others) should still do their
thing as they always have, and call these consistent_* functions when
appropriate.
> c. be able to use highmem pages
You think there will be non-coherent processors with this much memory? :-)
> - if (in_interrupt())
> - BUG();
> + BUG_ON(in_interrupt());
We should be able to call these functions from interrupt, let's push
the remainder of the changes though get_free_pages() to make this happen.
Thanks.
-- Dan
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: consistent_alloc() revisited
2002-07-16 14:09 ` Dan Malek
@ 2002-07-16 14:58 ` Tom Rini
2002-07-16 15:23 ` Matt Porter
` (2 subsequent siblings)
3 siblings, 0 replies; 11+ messages in thread
From: Tom Rini @ 2002-07-16 14:58 UTC (permalink / raw)
To: Dan Malek; +Cc: David Gibson, linuxppc-embedded, Paul Mackerras
On Tue, Jul 16, 2002 at 10:09:03AM -0400, Dan Malek wrote:
>
> David Gibson wrote:
> >- if (in_interrupt())
> >- BUG();
> >+ BUG_ON(in_interrupt());
>
> We should be able to call these functions from interrupt, let's push
> the remainder of the changes though get_free_pages() to make this happen.
I think the only reason we should keep that like it is, for now, is that
fixing that requires other generic changes. Which Paul was going to ask
Dave M about doing. Paul, any word back from him yet?
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: consistent_alloc() revisited
2002-07-16 14:09 ` Dan Malek
2002-07-16 14:58 ` Tom Rini
@ 2002-07-16 15:23 ` Matt Porter
2002-07-16 15:27 ` Matt Porter
2002-07-17 1:24 ` David Gibson
3 siblings, 0 replies; 11+ messages in thread
From: Matt Porter @ 2002-07-16 15:23 UTC (permalink / raw)
To: Dan Malek; +Cc: linuxppc-embedded
On Tue, Jul 16, 2002 at 10:09:03AM -0400, Dan Malek wrote:
>
> David Gibson wrote:
> > c. be able to use highmem pages
>
> You think there will be non-coherent processors with this much memory? :-)
I guess you're kidding, but to be clear to everybody, the 440GP does
handle up to 2GB of DDR. I run my ref platform with 1GB using highmem.
To be fair, I don't like this proliferation of non-coherent processors
(especially in the high-end PPC application space) any better than you
do. :-/
Regards,
--
Matt Porter
porter@cox.net
This is Linux Country. On a quiet night, you can hear Windows reboot.
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: consistent_alloc() revisited
2002-07-16 14:09 ` Dan Malek
2002-07-16 14:58 ` Tom Rini
2002-07-16 15:23 ` Matt Porter
@ 2002-07-16 15:27 ` Matt Porter
2002-07-17 1:24 ` David Gibson
3 siblings, 0 replies; 11+ messages in thread
From: Matt Porter @ 2002-07-16 15:27 UTC (permalink / raw)
To: Dan Malek; +Cc: David Gibson, linuxppc-embedded
On Tue, Jul 16, 2002 at 10:09:03AM -0400, Dan Malek wrote:
>
> David Gibson wrote:
> > - if (in_interrupt())
> > - BUG();
> > + BUG_ON(in_interrupt());
>
> We should be able to call these functions from interrupt, let's push
> the remainder of the changes though get_free_pages() to make this happen.
Yes, not just "should", but "must". They must be callable from an
interrupt since many generic drivers follow the rules as they have
been laid down and allocate from an interrupt context.
Regards,
--
Matt Porter
porter@cox.net
This is Linux Country. On a quiet night, you can hear Windows reboot.
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: consistent_alloc() revisited
2002-07-16 14:09 ` Dan Malek
` (2 preceding siblings ...)
2002-07-16 15:27 ` Matt Porter
@ 2002-07-17 1:24 ` David Gibson
2002-07-20 15:28 ` Benjamin Herrenschmidt
3 siblings, 1 reply; 11+ messages in thread
From: David Gibson @ 2002-07-17 1:24 UTC (permalink / raw)
To: linuxppc-embedded
On Tue, Jul 16, 2002 at 10:09:03AM -0400, Dan Malek wrote:
>
> David Gibson wrote:
>
> >In addition, the following seem to me like desirable (to a greater or
> >lesser extent) properties for consistent_alloc():
> > a. one of the return values can be used as a single "handle"
> >so that consistent_free() only needs one parameter.
>
> We want the implementation API to be identical across all architectures.
> Until there is a consistent way of handling devices that are on local
> processor busses, platforms with these types of devices may call these
> functions directly. This is particularly true of some host USB
> controllers
> on embedded systems that don't have PCI busses.
Well, the unified device tree stuff should give us that.
> > b. works on both cache coherent and non cache coherent
> >machines
>
> The original purpose of the consistent_* functions was to provide support
> for processors that aren't cache coherent. The pci_* (and any other
> functions)
> can call these if appropriate, but I still believe the decision should be
> made at a higher level. Obviously, these consistent_* functions will
> be implemented differently on non- or cache coherent processors, and across
> different architectures. The PCI functions (or others) should still do
> their
> thing as they always have, and call these consistent_* functions when
> appropriate.
I know that's their original purpose, but it's *really easy* to make
consistent_* work on cache coherent chips too, so why not do it.
It'll make our life easier when we get an OCP device appearing on a
cache coherent chip.
> > c. be able to use highmem pages
>
> You think there will be non-coherent processors with this much memory? :-)
I think it's possible, yes.
> >- if (in_interrupt())
> >- BUG();
> >+ BUG_ON(in_interrupt());
>
> We should be able to call these functions from interrupt, let's push
> the remainder of the changes though get_free_pages() to make this happen.
As far as I can tell there's no problem with get_free_pages() (or
rather, alloc_pages()), except that we need to add GFP_ATOMIC to the
flags if we're in interrupt context.
The problems are in get_vm_area() which does a kmalloc() without
GFP_ATOMIC and in map_page() which can sleep.
--
David Gibson | For every complex problem there is a
david@gibson.dropbear.id.au | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: consistent_alloc() revisited
2002-07-17 1:24 ` David Gibson
@ 2002-07-20 15:28 ` Benjamin Herrenschmidt
2002-07-21 13:26 ` David Gibson
0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2002-07-20 15:28 UTC (permalink / raw)
To: David Gibson, linuxppc-embedded; +Cc: David S. Miller
>As far as I can tell there's no problem with get_free_pages() (or
>rather, alloc_pages()), except that we need to add GFP_ATOMIC to the
>flags if we're in interrupt context.
>
>The problems are in get_vm_area() which does a kmalloc() without
>GFP_ATOMIC and in map_page() which can sleep.
(Note do DaveM: this thread from linuxppc-embedded talks about
out consistent_alloc() functions used to get non-cacheable
DMA consistent memory for peripherials on the various non-PCI
type busses you can find on such embedded CPUs)
We can't rely on in_interrupt(). We can be called with a spinlock
held or on the VM path. In those cases, we need GFP_ATOMIC but
in_interrupt() won't be set.
I would rather pass a gfp_mask argument to consistent_alloc. I know
the pci_xxx version lacks that argument, I yet have to talk to DaveM
about it, in the meantime, it should be implemented as calling
consistent_alloc(..., GFP_ATOMIC).
I want the gfp argument to be passed since some drivers will want
a quite large pool to be allocated at insmod/kernel init time,
and I want to let those drivers to be able to sleep while VM
gets the pages back in this case instead of failing as they would
probably do with a GFP_ATOMIC on a machine that have been running
for some time.
Ben.
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: consistent_alloc() revisited
2002-07-20 15:28 ` Benjamin Herrenschmidt
@ 2002-07-21 13:26 ` David Gibson
0 siblings, 0 replies; 11+ messages in thread
From: David Gibson @ 2002-07-21 13:26 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-embedded, David S. Miller
On Sat, Jul 20, 2002 at 05:28:30PM +0200, Benjamin Herrenschmidt wrote:
>
> >As far as I can tell there's no problem with get_free_pages() (or
> >rather, alloc_pages()), except that we need to add GFP_ATOMIC to the
> >flags if we're in interrupt context.
> >
> >The problems are in get_vm_area() which does a kmalloc() without
> >GFP_ATOMIC and in map_page() which can sleep.
>
> (Note do DaveM: this thread from linuxppc-embedded talks about
> out consistent_alloc() functions used to get non-cacheable
> DMA consistent memory for peripherials on the various non-PCI
> type busses you can find on such embedded CPUs)
And which is also used to implement pci_alloc_consistent().
> We can't rely on in_interrupt(). We can be called with a spinlock
> held or on the VM path. In those cases, we need GFP_ATOMIC but
> in_interrupt() won't be set.
That's true.
> I would rather pass a gfp_mask argument to consistent_alloc. I know
> the pci_xxx version lacks that argument, I yet have to talk to DaveM
> about it, in the meantime, it should be implemented as calling
> consistent_alloc(..., GFP_ATOMIC).
That makes sense. But we would also need a GFP argument to
get_vm_area() and to pte_alloc_kernel()
> I want the gfp argument to be passed since some drivers will want
> a quite large pool to be allocated at insmod/kernel init time,
> and I want to let those drivers to be able to sleep while VM
> gets the pages back in this case instead of failing as they would
> probably do with a GFP_ATOMIC on a machine that have been running
> for some time.
>
> Ben.
>
>
>
>
--
David Gibson | For every complex problem there is a
david@gibson.dropbear.id.au | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: consistent_alloc() revisited
2002-07-16 2:13 consistent_alloc() revisited David Gibson
2002-07-16 14:09 ` Dan Malek
@ 2002-07-20 15:22 ` Benjamin Herrenschmidt
2002-07-21 13:23 ` David Gibson
1 sibling, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2002-07-20 15:22 UTC (permalink / raw)
To: David Gibson, linuxppc-embedded
>
>Arguably setting PageReserved is the Right Thing, since
>consistent_alloc() memory obviously shouldn't be swapped or paged.
>However, it's only relevant if the memory is mapped into userspace,
>and then a driver can set VM_IO on the vma to prevent this.
An other issue I had with AGP is, I think, killing the userspace
process mmap'ing some of this memory will cause the kernel to try
to free the pages if not PageReserved
Ben.
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: consistent_alloc() revisited
2002-07-20 15:22 ` Benjamin Herrenschmidt
@ 2002-07-21 13:23 ` David Gibson
0 siblings, 0 replies; 11+ messages in thread
From: David Gibson @ 2002-07-21 13:23 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-embedded
On Sat, Jul 20, 2002 at 05:22:13PM +0200, Benjamin Herrenschmidt wrote:
>
> >
> >Arguably setting PageReserved is the Right Thing, since
> >consistent_alloc() memory obviously shouldn't be swapped or paged.
> >However, it's only relevant if the memory is mapped into userspace,
> >and then a driver can set VM_IO on the vma to prevent this.
>
> An other issue I had with AGP is, I think, killing the userspace
> process mmap'ing some of this memory will cause the kernel to try
> to free the pages if not PageReserved
Surely it shouldn't if VM_IO is set.
--
David Gibson | For every complex problem there is a
david@gibson.dropbear.id.au | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: consistent_alloc() revisited
@ 2002-07-16 16:14 Ralph Blach
0 siblings, 0 replies; 11+ messages in thread
From: Ralph Blach @ 2002-07-16 16:14 UTC (permalink / raw)
To: linuxppc-embedded
Dan and Matt
I agree with you 100%
Chip
---------------------- Forwarded by Ralph Blach/Raleigh/IBM on 07/16/2002
12:13 PM ---------------------------
Matt Porter <porter@cox.net>@lists.linuxppc.org on 07/16/2002 11:27:26 AM
Sent by: owner-linuxppc-embedded@lists.linuxppc.org
To: Dan Malek <dan@embeddededge.com>
cc: David Gibson <david@gibson.dropbear.id.au>,
linuxppc-embedded@lists.linuxppc.org
Subject: Re: consistent_alloc() revisited
On Tue, Jul 16, 2002 at 10:09:03AM -0400, Dan Malek wrote:
>
> David Gibson wrote:
> > - if (in_interrupt())
> > - BUG();
> > + BUG_ON(in_interrupt());
>
> We should be able to call these functions from interrupt, let's push
> the remainder of the changes though get_free_pages() to make this happen.
Yes, not just "should", but "must". They must be callable from an
interrupt since many generic drivers follow the rules as they have
been laid down and allocate from an interrupt context.
Regards,
--
Matt Porter
porter@cox.net
This is Linux Country. On a quiet night, you can hear Windows reboot.
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2002-07-21 13:26 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-07-16 2:13 consistent_alloc() revisited David Gibson
2002-07-16 14:09 ` Dan Malek
2002-07-16 14:58 ` Tom Rini
2002-07-16 15:23 ` Matt Porter
2002-07-16 15:27 ` Matt Porter
2002-07-17 1:24 ` David Gibson
2002-07-20 15:28 ` Benjamin Herrenschmidt
2002-07-21 13:26 ` David Gibson
2002-07-20 15:22 ` Benjamin Herrenschmidt
2002-07-21 13:23 ` David Gibson
-- strict thread matches above, loose matches on Subject: below --
2002-07-16 16:14 Ralph Blach
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).