* consistent_free re-revisited @ 2002-09-11 21:45 Todd Poynor 2002-09-12 3:26 ` David Gibson 0 siblings, 1 reply; 10+ messages in thread From: Todd Poynor @ 2002-09-11 21:45 UTC (permalink / raw) To: linuxppc-dev The APIs for consistent_free have diverged between PPC and ARM. It has, at least in the past, been a goal to keep the APIs in sync. Revisiting this is prompted by a couple of new platforms recently added (Xilinx Virtex II Pro and IBM 405LP/Beech) that use consistent_alloc for framebuffers mmap'ed to X servers. It would be nice (but is hardly a critical need) if consistent_alloc/free would set/clear the reserved bit on the struct page's allocated, such that the returned memory can be mmap'ed to userspace without the driver explicitly setting these bits, a la ARM. On PPC this is complicated by the fact that consistent_free does not take a size parameter, describing the size of the allocated area, nor a dma_handle parameter, which ARM uses to derive struct page's. Furthermore, consistent_alloc does not fill out vm_struct fields for the VM area that map it to the struct pages, so consistent_free can't use those to derive the physical pages. And neither can vfree(), which is what consistent_free() calls to do its work. And so consistent_free() does not free the physical pages allocated by consistent_alloc(), which is a potentially more serious matter (although I suppose consistent_free isn't normally in heavy use, but hi-res framebuffers can get large). Setting/clearing reserved bits and init'ing the vm_struct fields such that physical pages are reclaimed by vfree() can be accomplished without API changes, at the expense of intruding into private VM data structures and inefficiency of allocating an array of struct page pointers to describe a contiguous chunk of memory. So it seems preferable to add the size and dma_handle params to consistent_free and do the physical page freeing there (and update the drivers that call it). Comments? Thanks, -- Todd ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: consistent_free re-revisited 2002-09-11 21:45 consistent_free re-revisited Todd Poynor @ 2002-09-12 3:26 ` David Gibson 2002-09-12 14:29 ` Tom Rini 2002-09-12 21:33 ` Todd Poynor 0 siblings, 2 replies; 10+ messages in thread From: David Gibson @ 2002-09-12 3:26 UTC (permalink / raw) To: Todd Poynor; +Cc: linuxppc-dev On Wed, Sep 11, 2002 at 02:45:31PM -0700, Todd Poynor wrote: > The APIs for consistent_free have diverged between PPC and ARM. It has, > at least in the past, been a goal to keep the APIs in sync. That's the least of our worries - the current consistent_alloc() is badly broken. It breaks the DMA-mapping.txt rules, because it can't safely be called from interrupt context. I think we need generic changes (gfp masks to various functions) to fix this sanely. It would be nice to have the same interface as ARM, though - on the other hand the consistent_*() interface as such may be obsoleted by unified device model interfaces: I believe per-bus consistent memory interfaces are supposed to be in there. > Revisiting this is prompted by a couple of new platforms recently added > (Xilinx Virtex II Pro and IBM 405LP/Beech) that use consistent_alloc for > framebuffers mmap'ed to X servers. It would be nice (but is hardly a > critical need) if consistent_alloc/free would set/clear the reserved bit > on the struct page's allocated, such that the returned memory can be > mmap'ed to userspace without the driver explicitly setting these bits, a > la ARM. I've thought about this before, but on the whole I don't think it's a good idea. I think setting VM_IO on the vma when it is mapped into userspace is a better idea. In PCI space, the assumption appears to be that PageReserved is *not* set by pci_alloc_consistent() - a couple of the sound drivers assume this, because they explicitly set and clear the Reserved bit when they remap various DMA buffers. Incidentally I think there are also cases where consistent_alloc() (both ours and ARM's, IIRC) could leak memory if failures occur at just the right point. > On PPC this is complicated by the fact that consistent_free does not > take a size parameter, describing the size of the allocated area, nor a > dma_handle parameter, which ARM uses to derive struct page's. Yes, on the other hand having a single "handle" on the memory in order to free it is nice. > Furthermore, consistent_alloc does not fill out vm_struct fields for the > VM area that map it to the struct pages, so consistent_free can't use > those to derive the physical pages. And neither can vfree(), which is > what consistent_free() calls to do its work. And so consistent_free() > does not free the physical pages allocated by consistent_alloc(), which > is a potentially more serious matter (although I suppose consistent_free > isn't normally in heavy use, but hi-res framebuffers can get large). Erm... vfree() walks the page tables to find the pages to free, which should work fine. > Setting/clearing reserved bits and init'ing the vm_struct fields such > that physical pages are reclaimed by vfree() can be accomplished without > API changes, at the expense of intruding into private VM data structures > and inefficiency of allocating an array of struct page pointers to > describe a contiguous chunk of memory. So it seems preferable to add > the size and dma_handle params to consistent_free and do the physical > page freeing there (and update the drivers that call it). Comments? > Thanks, > > -- David Gibson | For every complex problem there is a david@gibson.dropbear.id.au | solution which is simple, neat and | wrong. http://www.ozlabs.org/people/dgibson ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: consistent_free re-revisited 2002-09-12 3:26 ` David Gibson @ 2002-09-12 14:29 ` Tom Rini 2002-09-12 7:23 ` Benjamin Herrenschmidt 2002-09-12 17:05 ` Matt Porter 2002-09-12 21:33 ` Todd Poynor 1 sibling, 2 replies; 10+ messages in thread From: Tom Rini @ 2002-09-12 14:29 UTC (permalink / raw) To: Todd Poynor, linuxppc-dev On Thu, Sep 12, 2002 at 01:26:40PM +1000, David Gibson wrote: > > On Wed, Sep 11, 2002 at 02:45:31PM -0700, Todd Poynor wrote: > > The APIs for consistent_free have diverged between PPC and ARM. It has, > > at least in the past, been a goal to keep the APIs in sync. > > That's the least of our worries - the current consistent_alloc() is > badly broken. It breaks the DMA-mapping.txt rules, because it can't > safely be called from interrupt context. I think we need generic > changes (gfp masks to various functions) to fix this sanely. IIRC, to fix the interrupt context bit, we need something like the following, which was Paul's idea, and he said he would talk to Dave M. about it. Did anything ever happen there? But on the other hand, IIRC, the allocation can fail, and the driver has to be able to deal with that. But almost none of them do. > It would be nice to have the same interface as ARM, though - on the > other hand the consistent_*() interface as such may be obsoleted by > unified device model interfaces: I believe per-bus consistent memory > interfaces are supposed to be in there. Well, if we can have the same interface as ARM, wouldn't that go a ways towards having the code already done for the unified device model? And this is a 2.4 problem too. [snip] > Incidentally I think there are also cases where consistent_alloc() > (both ours and ARM's, IIRC) could leak memory if failures occur at > just the right point. Do you remember what that point was? -- Tom Rini (TR1265) http://gate.crashing.org/~trini/ --- arch/ppc/mm/cachemap.c 2002/05/10 16:12:32 1.2 +++ arch/ppc/mm/cachemap.c 2002/07/12 19:01:00 @@ -68,9 +68,6 @@ struct vm_struct *area; void *ret; - if (in_interrupt()) - BUG(); - /* Only allocate page size areas. */ size = PAGE_ALIGN(size); --- mm/vmalloc.c 2002/05/09 00:26:12 1.1.1.1 +++ mm/vmalloc.c 2002/07/12 19:01:00 @@ -12,6 +12,7 @@ #include <linux/spinlock.h> #include <linux/highmem.h> #include <linux/smp_lock.h> +#include <linux/interrupt.h> #include <asm/uaccess.h> #include <asm/pgalloc.h> @@ -169,15 +170,16 @@ struct vm_struct * get_vm_area(unsigned long size, unsigned long flags) { - unsigned long addr; + unsigned long addr, lflags; struct vm_struct **p, *tmp, *area; - area = (struct vm_struct *) kmalloc(sizeof(*area), GFP_KERNEL); + area = (struct vm_struct *) kmalloc(sizeof(*area), + (in_interrupt() ? GFP_ATOMIC : GFP_KERNEL)); if (!area) return NULL; size += PAGE_SIZE; addr = VMALLOC_START; - write_lock(&vmlist_lock); + write_lock_irqsave(&vmlist_lock, lflags); for (p = &vmlist; (tmp = *p) ; p = &tmp->next) { if ((size + addr) < addr) goto out; @@ -192,11 +194,11 @@ area->size = size; area->next = *p; *p = area; - write_unlock(&vmlist_lock); + write_unlock_irqrestore(&vmlist_lock, lflags); return area; out: - write_unlock(&vmlist_lock); + write_unlock_irqrestore(&vmlist_lock, lflags); kfree(area); return NULL; } --- include/asm-ppc/pgalloc.h.orig Wed Aug 7 11:04:12 2002 +++ include/asm-ppc/pgalloc.h Wed Aug 7 11:09:02 2002 @@ -114,7 +114,8 @@ extern void *early_get_page(void); if (mem_init_done) - pte = (pte_t *) __get_free_page(GFP_KERNEL); + pte = (pte_t *) __get_free_page( + (in_interrupt() ? GFP_ATOMIC : GFP_KERNEL)); else pte = (pte_t *) early_get_page(); if (pte != NULL) ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: consistent_free re-revisited 2002-09-12 14:29 ` Tom Rini @ 2002-09-12 7:23 ` Benjamin Herrenschmidt 2002-09-12 14:51 ` Tom Rini 2002-09-12 17:05 ` Matt Porter 1 sibling, 1 reply; 10+ messages in thread From: Benjamin Herrenschmidt @ 2002-09-12 7:23 UTC (permalink / raw) To: Tom Rini, Todd Poynor, linuxppc-dev >> That's the least of our worries - the current consistent_alloc() is >> badly broken. It breaks the DMA-mapping.txt rules, because it can't >> safely be called from interrupt context. I think we need generic >> changes (gfp masks to various functions) to fix this sanely. > >IIRC, to fix the interrupt context bit, we need something like the >following, which was Paul's idea, and he said he would talk to Dave M. >about it. Did anything ever happen there? I tend to hate anything that relies on in_interrupt() as they are other contexts that will have in_interrupt() cleared but still have the same limitations. Typically, anything on the VM path must do either GFP_ATOMIC or GFP_NOIO allocations, wether it's running at interrupt time or not. Ben. ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: consistent_free re-revisited 2002-09-12 7:23 ` Benjamin Herrenschmidt @ 2002-09-12 14:51 ` Tom Rini 2002-09-12 7:49 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 10+ messages in thread From: Tom Rini @ 2002-09-12 14:51 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: Todd Poynor, linuxppc-dev On Thu, Sep 12, 2002 at 09:23:15AM +0200, Benjamin Herrenschmidt wrote: > >> That's the least of our worries - the current consistent_alloc() is > >> badly broken. It breaks the DMA-mapping.txt rules, because it can't > >> safely be called from interrupt context. I think we need generic > >> changes (gfp masks to various functions) to fix this sanely. > > > >IIRC, to fix the interrupt context bit, we need something like the > >following, which was Paul's idea, and he said he would talk to Dave M. > >about it. Did anything ever happen there? > > I tend to hate anything that relies on in_interrupt() as they > are other contexts that will have in_interrupt() cleared but still > have the same limitations. Typically, anything on the VM path must > do either GFP_ATOMIC or GFP_NOIO allocations, wether it's running > at interrupt time or not. The problem is that the atomic pool is limited, iirc. -- Tom Rini (TR1265) http://gate.crashing.org/~trini/ ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: consistent_free re-revisited 2002-09-12 14:51 ` Tom Rini @ 2002-09-12 7:49 ` Benjamin Herrenschmidt 2002-09-12 15:23 ` Tom Rini 0 siblings, 1 reply; 10+ messages in thread From: Benjamin Herrenschmidt @ 2002-09-12 7:49 UTC (permalink / raw) To: Tom Rini; +Cc: Todd Poynor, linuxppc-dev >> I tend to hate anything that relies on in_interrupt() as they >> are other contexts that will have in_interrupt() cleared but still >> have the same limitations. Typically, anything on the VM path must >> do either GFP_ATOMIC or GFP_NOIO allocations, wether it's running >> at interrupt time or not. > >The problem is that the atomic pool is limited, iirc. Well, ATOMIC can fail, sure, but if you do GFP_KERNEL within a VM code path, then be prepared for deadlocks. A driver that would need such allocations during it's request handling path would suck anyway and should pre-allocate ;) But if this is the case, then it must be able to say that it wants ATOMIC or NOIO allocations, and if the driver is done properly, it should be able to deal gracefully with failure, typically for a block driver by triggering a timer to try the request again later or just fail the request (but that is generally bad for a block driver). Ben. ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: consistent_free re-revisited 2002-09-12 7:49 ` Benjamin Herrenschmidt @ 2002-09-12 15:23 ` Tom Rini 2002-09-12 8:08 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 10+ messages in thread From: Tom Rini @ 2002-09-12 15:23 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: Todd Poynor, linuxppc-dev On Thu, Sep 12, 2002 at 09:49:07AM +0200, Benjamin Herrenschmidt wrote: > >> I tend to hate anything that relies on in_interrupt() as they > >> are other contexts that will have in_interrupt() cleared but still > >> have the same limitations. Typically, anything on the VM path must > >> do either GFP_ATOMIC or GFP_NOIO allocations, wether it's running > >> at interrupt time or not. > > > >The problem is that the atomic pool is limited, iirc. > > Well, ATOMIC can fail, sure, but if you do GFP_KERNEL within a > VM code path, then be prepared for deadlocks. Er, so you're objecting to existing GFP_KERNEL's then? Or is it too early and I need more coffee? :) -- Tom Rini (TR1265) http://gate.crashing.org/~trini/ ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: consistent_free re-revisited 2002-09-12 15:23 ` Tom Rini @ 2002-09-12 8:08 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 10+ messages in thread From: Benjamin Herrenschmidt @ 2002-09-12 8:08 UTC (permalink / raw) To: Tom Rini; +Cc: Todd Poynor, linuxppc-dev >Er, so you're objecting to existing GFP_KERNEL's then? > >Or is it too early and I need more coffee? :) Existing GFP_KERNEL is no good at all, sure ;) Ben. ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: consistent_free re-revisited 2002-09-12 14:29 ` Tom Rini 2002-09-12 7:23 ` Benjamin Herrenschmidt @ 2002-09-12 17:05 ` Matt Porter 1 sibling, 0 replies; 10+ messages in thread From: Matt Porter @ 2002-09-12 17:05 UTC (permalink / raw) To: Tom Rini; +Cc: Todd Poynor, linuxppc-dev On Thu, Sep 12, 2002 at 07:29:45AM -0700, Tom Rini wrote: > IIRC, to fix the interrupt context bit, we need something like the > following, which was Paul's idea, and he said he would talk to Dave M. > about it. Did anything ever happen there? <snip> In addition to that, you need to scrub the entire consistent_alloc path...that includes map_page()->pte_alloc_kernel() which may do a GFP_KERNEL allocation. Then it's more complex yet with Book E's need to have those possible page table allocations from the pinned TLB region. Regards, -- Matt Porter porter@cox.net This is Linux Country. On a quiet night, you can hear Windows reboot. ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: consistent_free re-revisited 2002-09-12 3:26 ` David Gibson 2002-09-12 14:29 ` Tom Rini @ 2002-09-12 21:33 ` Todd Poynor 1 sibling, 0 replies; 10+ messages in thread From: Todd Poynor @ 2002-09-12 21:33 UTC (permalink / raw) To: David Gibson; +Cc: linuxppc-dev David Gibson wrote: >>Furthermore, consistent_alloc does not fill out vm_struct fields for the >>VM area that map it to the struct pages, so consistent_free can't use >>those to derive the physical pages. And neither can vfree(), which is >>what consistent_free() calls to do its work. And so consistent_free() >>does not free the physical pages allocated by consistent_alloc()... > > Erm... vfree() walks the page tables to find the pages to free, which > should work fine. Been curious about this... vfree() walks the page tables to find PTE entries to unmap (modifying the PTE entry). But its code to call __free_page() on the physical pages for the area (the deallocate_pages condition in __vunmap()) is not being executed (because area->nr_pages == 0), and it seems likely this is needed. After modifying consistent_alloc() to fill out the area->pages array and set area->nr_pages, __vunmap() will then call __free_page() on those pages. Whereupon: vaddr = consistent_alloc(gfp, size, &physaddr); consistent_free(vaddr); vaddr = consistent_alloc(gfp, size, &physaddr); then allocates the same physaddr both times, which is not currently the case. There is also an overall difference in memory usage in the neighborhood of the allocated buffer size between the 2 schemes (per free(1) at a shell prompt), but I wouldn't swear to it... -- Todd ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2002-09-12 21:33 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2002-09-11 21:45 consistent_free re-revisited Todd Poynor 2002-09-12 3:26 ` David Gibson 2002-09-12 14:29 ` Tom Rini 2002-09-12 7:23 ` Benjamin Herrenschmidt 2002-09-12 14:51 ` Tom Rini 2002-09-12 7:49 ` Benjamin Herrenschmidt 2002-09-12 15:23 ` Tom Rini 2002-09-12 8:08 ` Benjamin Herrenschmidt 2002-09-12 17:05 ` Matt Porter 2002-09-12 21:33 ` Todd Poynor
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).