* Re: can device drivers return non-ram via vm_ops->nopage?
       [not found] ` <405E3387.1050505@pobox.com>
@ 2004-03-22  3:45 ` William Lee Irwin III
  2004-03-22  4:41   ` James Bottomley
  0 siblings, 1 reply; 38+ messages in thread
From: William Lee Irwin III @ 2004-03-22 3:45 UTC
To: linux-arch, Jeff Garzik
Cc: rmk, Linus Torvalds, David Woodhouse, Christoph Hellwig,
    Andrew Morton, Andrea Arcangeli

Sorry about the top posting and long quote; I wanted to fully quote the
API under discussion while getting the central issues aired in the first
few lines.

The suggested dma_scatterlist structure, for the API proposed below, was:

struct dma_scatterlist {
	dma_addr_t	dma_addr;	/* DMA address */
	void		*cpu_addr;	/* cpu address */
	unsigned long	length;		/* in units of pages */
};

What we're trying to resolve here is drivers supporting ->mmap() doing
virt_to_page() on the results of dma_alloc_coherent() and other things
they shouldn't, and so passing back bogus page pointers as the return
value from ->nopage(), and having no method of resolving it due to the
fact mem_map[] may not cover the area referred to and there is no
portable method for reliably determining pfn's or other information
necessary even to establish mappings by hand. I think it's worth noting
that (according to rmk) ->cpu_addr may not be in any way relevant to
RAM, pfn's, or virtual mappings (I'm not actually sure what it is)
and has to be treated as arch-private otherwise-opaque data.

The way this is expected to solve the problem is by providing a method
for the arch to establish mappings of these areas not reliant on struct
page or fault handling. That is, these functions prefault the areas into
the process address space, thus insulating the core from the details of
fault handling on these areas and eliminating fault handling on these
areas altogether.

I tried to translate a function prototype for prefaulting these areas
into userspace that rmk gave as an example into a full set of operations
based on his proposed piece of the API. So what I'm looking for here is
to find out whether this is good enough for all of the various arches,
and if not, how we can get something together that will fix the bugs in
these drivers that will work portably. jgarzik's comments on suitability
for sound drivers follow the API itself.

William Lee Irwin III wrote:
>>int dma_mmap_coherent_sg(struct dma_scatterlist *sglist,
>>		int nr_sglist_elements,		/* length of sglist */
>>		struct vm_area_struct *vma,	/* for address space */
>>		unsigned long address,		/* user virtual address */
>>		unsigned long offset,		/* offset (in pages) */
>>		unsigned long nr_pages);	/* length (in pages) */
>>
>>int dma_munmap_coherent_sg(struct dma_scatterlist *sglist,
>>		int nr_sglist_elements,		/* length of sglist */
>>		struct vm_area_struct *vma,	/* for address space */
>>		unsigned long address,		/* user virtual address */
>>		unsigned long offset,		/* offset (in pages) */
>>		unsigned long nr_pages);	/* length (in pages) */
>>
>>int dma_alloc_coherent_sg(struct dma_scatterlist **sglist,
>>		unsigned long length);		/* length in pages */
>>
>>int dma_free_coherent_sg(struct dma_scatterlist **sglist,
>>		unsigned long length);		/* length in pages */

Where it was proposed that these would be helper functions that sit atop
primitive functions like:

int dma_mmap_coherent(struct vm_area_struct *vma,
		unsigned long address,
		dma_addr_t dma_addr,		/* DMA address */
		void *cpu_addr,			/* cpu address */
		unsigned long nr_pages);	/* length (in pages) */

int dma_munmap_coherent(struct vm_area_struct *vma,
		unsigned long address,
		dma_addr_t dma_addr,		/* DMA address */
		void *cpu_addr,			/* cpu address */
		unsigned long nr_pages);	/* length (in pages) */

jgarzik's assessment was:

On Sun, Mar 21, 2004 at 07:29:59PM -0500, Jeff Garzik wrote:
> No comment on struct dma_scatterlist, but the above is the most natural
> API for audio drivers at least.
> Audio drivers allocate buffers at ->probe() or open(2), and the only
> entity that actually cares about the contents of the buffers are (a) the
> hardware and (b) userland. via82cxxx_audio only uses
> pci_alloc_consistent because there's not a more appropriate DMA
> allocator for the use to which that memory is put.
> Audio drivers only need to read/write the buffers inside the kernel when
> implementing read(2) and write(2) via copy_{to,from}_user().

One thing that concerns me about this is that jgarzik seems to be saying
that via82cxxx_audio's needs aren't covered, so some alteration to
accommodate it may be necessary.

-- wli
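
A minimal sketch of the bookkeeping a dma_mmap_coherent_sg()-style helper
would need before it could call the per-element primitive: translating
the "offset (in pages)" argument into an sglist element and a page index
within that element. This is a userspace model only. The helper name
sg_find_page() and the typedef standing in for dma_addr_t are assumptions
made for illustration and are not part of the proposal quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel type (assumption for this sketch). */
typedef unsigned long long dma_addr_t;

/* Structure as proposed in the message above. */
struct dma_scatterlist {
	dma_addr_t	dma_addr;	/* DMA address */
	void		*cpu_addr;	/* cpu address */
	unsigned long	length;		/* in units of pages */
};

/*
 * Hypothetical helper: find the sglist element containing page `offset`
 * of the whole buffer.  Returns the element index and stores the page
 * index inside that element in *page_in_elem, or returns -1 when the
 * offset lies beyond the end of the list.
 */
int sg_find_page(const struct dma_scatterlist *sglist, int nr_sglist_elements,
		 unsigned long offset, unsigned long *page_in_elem)
{
	int i;

	assert(sglist != NULL && page_in_elem != NULL);

	for (i = 0; i < nr_sglist_elements; i++) {
		if (offset < sglist[i].length) {
			*page_in_elem = offset;
			return i;
		}
		/* Skip this element and keep the remainder of the offset. */
		offset -= sglist[i].length;
	}
	return -1;
}
```

A real helper would then walk forward from that element, handing each
(dma_addr, cpu_addr, length) run to the primitive until nr_pages have
been covered.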

* Re: can device drivers return non-ram via vm_ops->nopage?
From: James Bottomley @ 2004-03-22 4:41 UTC
To: William Lee Irwin III
Cc: linux-arch, Jeff Garzik, Russell King, Linus Torvalds,
    David Woodhouse, Christoph Hellwig, Andrew Morton, Andrea Arcangeli

On Sun, 2004-03-21 at 22:45, William Lee Irwin III wrote:
> What we're trying to resolve here is drivers supporting ->mmap() doing
> virt_to_page() on the results of dma_alloc_coherent() and other things
> they shouldn't, and so passing back bogus page pointers as the return
> value from ->nopage(), and having no method of resolving it due to the
> fact mem_map[] may not cover the area referred to and there is no
> portable method for reliably determining pfn's or other information
> necessary even to establish mappings by hand. I think it's worth noting
> that (according to rmk) ->cpu_addr may not be in any way relevant to
> RAM, pfn's, or virtual mappings (I'm not actually sure what it is)
> and has to be treated as arch-private otherwise-opaque data.

Hang on a minute, what makes you think it's legal in any way shape or
form to construct a user mapping for a coherent area?

Such an entity, if it were made, wouldn't follow the rules for normal
mmaps.

Let me illustrate what would go wrong on parisc: we have a VIPT cache
and the concept of an address space. This means that when we allocate
coherent memory, we mean it will *only* be coherent with respect to the
single specified address space (which is currently the kernel). We have
to make this explicit in the iommu by programming a so-called coherence
index for each IOMMU pte (which tells the CPU's cache which line to
flush when the device writes to this address).

Thus, if you mmap our coherent memory and the device does a write to
this memory, the write will not be seen by the user if the user's
address space has a cache entry for it already.

Therefore, a user trying to make use of a coherent area mmap would have
to flush/invalidate everything all the time just to try to make sure
they weren't missing device updates (because we have no mechanism for
the kernel to know the data has changed and call flush_dcache_page).

James

* Re: can device drivers return non-ram via vm_ops->nopage?
From: William Lee Irwin III @ 2004-03-22 4:46 UTC
To: James Bottomley
Cc: linux-arch, Jeff Garzik, Russell King, Linus Torvalds,
    David Woodhouse, Christoph Hellwig, Andrew Morton, Andrea Arcangeli

On Sun, Mar 21, 2004 at 11:41:35PM -0500, James Bottomley wrote:
> Hang on a minute, what makes you think it's legal in any way shape or
> form to construct a user mapping for a coherent area?
> Such an entity, if it were made, wouldn't follow the rules for normal
> mmaps.

Okay, this is bad news for sound (and possibly some graphics) drivers
on PA-RISC, since this mapping of coherent areas into userspace is
exactly what they're trying to do for the device interfaces they
export to the user.

Are you seeing breakage there, or are the drivers doing this
unused on PA-RISC?

-- wli

* Re: can device drivers return non-ram via vm_ops->nopage?
From: James Bottomley @ 2004-03-22 4:56 UTC
To: William Lee Irwin III
Cc: linux-arch, Jeff Garzik, Russell King, Linus Torvalds,
    David Woodhouse, Christoph Hellwig, Andrew Morton, Andrea Arcangeli

On Sun, 2004-03-21 at 23:46, William Lee Irwin III wrote:
> Okay, this is bad news for sound (and possibly some graphics) drivers
> on PA-RISC, since this mapping of coherent areas into userspace is
> exactly what they're trying to do for the device interfaces they
> export to the user.
>
> Are you seeing breakage there, or are the drivers doing this
> unused on PA-RISC?

Well, our older sound drivers have never worked since ALSA (they hang
off the GSC bus which ALSA doesn't have an abstraction for). Mostly we
use serial console, and a HP specific thing called a STI framebuffer
for video.

The problems I describe only occur if you try to mmap coherent memory.
mmaping streaming memory is fine.

But, I would expect that any arch with a virtually indexed cache would
have similar problems: there may be many address aliases in the cache
and the DMA controller probably only knows about one of them.

James

* Re: can device drivers return non-ram via vm_ops->nopage?
From: Benjamin Herrenschmidt @ 2004-03-22 5:26 UTC
To: James Bottomley
Cc: William Lee Irwin III, Linux Arch list, Jeff Garzik, Russell King,
    Linus Torvalds, David Woodhouse, Christoph Hellwig, Andrew Morton,
    Andrea Arcangeli

Well, I just went over this whole discussion and I think it's just
going to hell.

So here are my 2 cents of suggestions:

 - We _WANT_ the ability to map coherent memory to userspace, that's
   the normal way to map sound buffers to userland for low latency
   (though mapping the actual DMA ptrs is a different matter and is
   definitely not working with a bunch of sound interfaces). This is
   also necessary for the infiniband/myrinet kind of things. DRI
   sort-of need that when not using AGP, AGP itself is a special case
   but could be considered as coherent memory in some platforms too
   (and will be with PCI Express afaik) etc...

 - Some architectures apparently cannot do that (parisc ?)

 - Too bad for them... They won't have low latency audio and fast
   networking and be done with it. Let's implement a couple of simple
   to use (driver-wise) helpers

	dma_can_mmap_coherent() -> parisc returns false here
	dma_mmap_coherent()
	dma_mmap_coherent_sg()

And be done with it. I don't see where is the debate here ? The
API takes the same sglist as used for dma_map_sg, I don't see the
point of anything different, I agree with Linus that it's not worth
even thinking about not having struct page here.

Ben.

* Re: can device drivers return non-ram via vm_ops->nopage?
From: Andrea Arcangeli @ 2004-03-22 11:58 UTC
To: Benjamin Herrenschmidt
Cc: James Bottomley, William Lee Irwin III, Linux Arch list,
    Jeff Garzik, Russell King, Linus Torvalds, David Woodhouse,
    Christoph Hellwig, Andrew Morton

On Mon, Mar 22, 2004 at 04:26:29PM +1100, Benjamin Herrenschmidt wrote:
> Well, I just went over this whole discussion and I think it's just
> going to hell.
>
> So here are my 2 cents of suggestions:
>
> - We _WANT_ the ability to map coherent memory to userspace, that's
> the normal way to map sound buffers to userland for low latency (though
> mapping the actual DMA ptrs is a different matter and is definitely not
> working with a bunch of sound interfaces). This is also necessary for
> the infiniband/myrinet kind of things. DRI sort-of need that when not
> using AGP, AGP itself is a special case but could be considered as
> coherent memory in some platforms too (and will be with PCI Express
> afaik) etc...
>
> - Some architectures apparently cannot do that (parisc ?)
>
> - Too bad for them... They won't have low latency audio and fast
> networking and be done with it. Let's implement a couple of simple
> to use (driver-wise) helpers
>
> dma_can_mmap_coherent() -> parisc returns false here
> dma_mmap_coherent()
> dma_mmap_coherent_sg()
>
> And be done with it. I don't see where is the debate here ? The
> API takes the same sglist as used for dma_map_sg, I don't see the
> point of anything different, I agree with linus that it's not worth
> even thinking about not having struct page here.

I like your three functions and the clear description.

The only reason I believe a paging mechanism would have been nicer is
that it would avoid latencies in dma_mmap_coherent (not necessarily
scheduler latencies, but you would pay all the cost of the pagetables
immediately during the mmap syscall, so if you have to map gigs of ram
that would tend to hang the task doing the mmap a little bit). I found
it nicer to use the paging for this so we also only allocate the memory
for the pagetables that we need, but OTOH Linus is right that in most
cases it isn't worth a single branch in a fast path.
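
To put a rough number on the up-front cost mentioned above, here is a
small worked calculation. It is only a sketch: the 4 KiB page size and
4-byte pte in the usage note are example values rather than figures from
the thread, the function name prefault_pte_bytes() is made up for this
illustration, and only leaf ptes are counted (upper pagetable levels are
ignored).

```c
#include <assert.h>

/*
 * Bytes of leaf pagetable entries that must be written when a mapping
 * of map_bytes is fully prefaulted at mmap time.
 */
unsigned long long prefault_pte_bytes(unsigned long long map_bytes,
				      unsigned long page_size,
				      unsigned long pte_size)
{
	unsigned long long pages;

	assert(page_size != 0);

	/* Round up so a partial trailing page still needs a pte. */
	pages = (map_bytes + page_size - 1) / page_size;
	return pages * pte_size;
}
```

Under those assumed sizes, prefaulting 1 GiB means filling 262144 ptes
(1 MiB of pagetable memory) inside the mmap call, whereas a fault-driven
scheme fills only the entries for pages that are actually touched.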

* Re: can device drivers return non-ram via vm_ops->nopage?
From: Russell King @ 2004-03-22 12:05 UTC
To: Andrea Arcangeli
Cc: Benjamin Herrenschmidt, James Bottomley, William Lee Irwin III,
    Linux Arch list, Jeff Garzik, Linus Torvalds, David Woodhouse,
    Christoph Hellwig, Andrew Morton

On Mon, Mar 22, 2004 at 12:58:07PM +0100, Andrea Arcangeli wrote:
> The only reason I believe a paging mechanism would have been nicer is
> that it would avoid latencies in dma_mmap_coherent (not necessarily
> scheduler latencies, but you would pay all the cost of the pagetables
> immediately during the mmap syscall, so if you have to map gigs of ram
> that would tend to hang the task doing the mmap a little bit). I found
> it nicer to use the paging for this so we also only allocate the memory
> for the pagetables that we need, but OTOH Linus is right that in most
> cases it isn't worth a single branch in a fast path.

However, if you go on to read what Linus said later, he seems to be
saying that we can guarantee that dma_alloc_coherent() will be backed
by memory which has page structures associated with it.

This means that we _can_ use the ->nopage function for the DMA coherent
implementation after all.

However, it isn't useful for the PCI device-side buffer case, which would
need to be handled via remap_page_range().

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core

* Re: can device drivers return non-ram via vm_ops->nopage?
From: Andrea Arcangeli @ 2004-03-22 12:34 UTC
To: Benjamin Herrenschmidt, James Bottomley, William Lee Irwin III,
    Linux Arch list, Jeff Garzik, Linus Torvalds, David Woodhouse,
    Christoph Hellwig, Andrew Morton

On Mon, Mar 22, 2004 at 12:05:37PM +0000, Russell King wrote:
> However, it isn't useful for the PCI device-side buffer case, which would
> need to be handled via remap_page_range().

one could allocate page_t for the PCI device-side buffer case too (with
discontigmem to avoid a terrible waste), but it would still be a small
waste. So for non-ram it's better you always map all ptes during ->mmap
and you avoid the page faults like with remap_file_pages than to
allocate a page_t for non-ram ranges with discontigmem.

* Re: can device drivers return non-ram via vm_ops->nopage?
From: Russell King @ 2004-03-22 9:30 UTC
To: James Bottomley
Cc: William Lee Irwin III, linux-arch, Jeff Garzik, Linus Torvalds,
    David Woodhouse, Christoph Hellwig, Andrew Morton, Andrea Arcangeli

On Sun, Mar 21, 2004 at 11:41:35PM -0500, James Bottomley wrote:
> Let me illustrate what would go wrong on parisc: we have a VIPT cache
> and the concept of an address space.

Is it not the case that VIPT caches are coloured, and mapping a page
into the appropriate place results in the same virtual index for both?

If this isn't true, this means that SHM is also broken on PARISC since
there is no value of SHMLBA which makes SHM mappings coherent with each
other.

> Therefore, a user trying to make use of a coherent area mmap would have
> to flush/invalidate everything all the time just to try to make sure
> they weren't missing device updates (because we have no mechanism for
> the kernel to know the data has changed and call flush_dcache_page).

Unfortunately, there is a class of drivers where mmaping a large DMA
buffer into user space makes sense. These are video capture and sound
drivers.

By saying that "we can't support DMA coherent mmap" you're forcing
driver writers to write their own DMA coherent mmap implementations,
which they _have_ done already, and they've screwed up the interfaces
such that it only works on x86 today.

What I want is an interface which allows most of the architectures
which are capable of doing this to indeed do this. Those which can't
should fail the mmap attempt.

It has to be said that by doing this we're actually better off - more
drivers work across more platforms and we have a well defined failure
mode for platforms where it doesn't work. If those platforms want to
use those drivers, they aren't actually in a worse situation - they had
to find some way to work around this before now, and they still have to
find some way to work around this afterwards, or maybe decide that the
subset of drivers which need this are incompatible with the architecture.

However, please don't prevent all architectures from being able to
use these drivers just because a small number can't.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core

* Re: can device drivers return non-ram via vm_ops->nopage?
From: James Bottomley @ 2004-03-22 15:04 UTC
To: Russell King
Cc: William Lee Irwin III, linux-arch, Jeff Garzik, Linus Torvalds,
    David Woodhouse, Christoph Hellwig, Andrew Morton, Andrea Arcangeli

On Mon, 2004-03-22 at 04:30, Russell King wrote:
> On Sun, Mar 21, 2004 at 11:41:35PM -0500, James Bottomley wrote:
> > Let me illustrate what would go wrong on parisc: we have a VIPT cache
> > and the concept of an address space.
>
> Is it not the case that VIPT caches are coloured, and mapping a page
> into the appropriate place results in the same virtual index for both?

Not coloured exactly since the caches are associative, but we have a
congruence modulus. As long as two virtual addresses are equal modulo
this, the cache will detect and unify virtual aliasing (basically it
assigns the addresses the same coherence index). So, as long as the
proposed API gives the arch control over where in the user vm the
mapping goes, we would be able to accommodate it.

However, my understanding of the API was that you *already* had a vm
range and were trying to place a coherently mapped page into it.

> However, please don't prevent all architectures from being able to
> use these drivers just because a small number can't.

I don't believe I was. I was merely pointing out the problems as I saw
them with mmap'ing a coherent memory area.

James
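
The "equal modulo the congruence modulus" rule described above can be
sketched in a few lines of userspace C. The function names are invented
for this illustration, the modulus is assumed to be a power of two, and
the 4 MB value in the usage note is an example only; none of this is
taken from the parisc code, and address wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True when two virtual addresses are equal modulo `modulus`, i.e. a
 * virtually indexed cache would treat them as the same alias.
 * Assumes modulus is a power of two.
 */
bool vaddrs_congruent(uintptr_t a, uintptr_t b, uintptr_t modulus)
{
	assert(modulus != 0 && (modulus & (modulus - 1)) == 0);
	return ((a ^ b) & (modulus - 1)) == 0;
}

/*
 * Lowest address >= hint that is congruent to kernel_va.  This is what
 * "giving the arch control over where in the user vm the mapping goes"
 * would amount to when picking a user address.
 */
uintptr_t congruent_user_addr(uintptr_t hint, uintptr_t kernel_va,
			      uintptr_t modulus)
{
	uintptr_t want, addr;

	assert(modulus != 0 && (modulus & (modulus - 1)) == 0);

	want = kernel_va & (modulus - 1);	/* required low bits */
	addr = (hint & ~(modulus - 1)) + want;	/* same window as hint */
	if (addr < hint)
		addr += modulus;		/* move to the next window */
	return addr;
}
```

With an example modulus of 0x400000, a kernel address ending in ...1000
and a user hint of 0x20002000 would yield 0x20401000. If the caller has
already fixed the user range, as in the API under discussion, only the
check is available and a mismatch has to be refused.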

* Re: can device drivers return non-ram via vm_ops->nopage?
From: Russell King @ 2004-03-22 15:15 UTC
To: James Bottomley
Cc: William Lee Irwin III, linux-arch, Jeff Garzik, Linus Torvalds,
    David Woodhouse, Christoph Hellwig, Andrew Morton, Andrea Arcangeli

On Mon, Mar 22, 2004 at 10:04:23AM -0500, James Bottomley wrote:
> On Mon, 2004-03-22 at 04:30, Russell King wrote:
> > On Sun, Mar 21, 2004 at 11:41:35PM -0500, James Bottomley wrote:
> > > Let me illustrate what would go wrong on parisc: we have a VIPT cache
> > > and the concept of an address space.
> >
> > Is it not the case that VIPT caches are coloured, and mapping a page
> > into the appropriate place results in the same virtual index for both?
>
> Not coloured exactly since the caches are associative, but we have a
> congruence modulus. As long as two virtual addresses are equal modulo
> this, the cache will detect and unify virtual aliasing (basically it
> assigns the addresses the same coherence index). So, as long as the
> proposed API gives the arch control over where in the user vm the
> mapping goes, we would be able to accommodate it.
>
> However, my understanding of the API was that you *already* had a vm
> range and were trying to place a coherently mapped page into it.

Correct. However, note that the kernel's view of the DMA mapping would
not be accessed in this instance. I guess this still causes you some
problems, though I suspect that given an adequate API, you could
tweak your iommu appropriately.

For example, if we had:

	int dma_coherent_mmap(vma, cpuaddr, dmaaddr, size)

then the architecture could do whatever it needed to mmap that
address space. It could:

 (a) call remap_page_range() with appropriate pgprot
 (b) use a vm_operations_struct internally to fault the pages in,
     again using the appropriate pgprot.
 (c) disallow the mmap unless it is within the architecture's rules
     (eg, all mmappings are of the same cache colour/congruence
     modulus)
 (d) adjust whatever hardware for device DMA such that the mapping
     is coherent and then do (a) or (b) and/or (c).
 (e) disallow the mmap entirely.

I suspect x86, ARM and similar could be either (a) or (b). PA RISC would
be (c) and (d).

Note: I don't see the need for dma_coherent_munmap() - the mappings are
destroyed on process exit, and we should not be freeing the coherent
mapping until the mmap of it has gone - and you get to know this via
the ->release method. However, with (b) an architecture can positively
check that this rule is followed via suitable refcounting and checking
in dma_free_coherent.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core
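
The split between "just map it" (a)/(b), "map it only if the addresses
obey the arch's aliasing rule" (c), and "refuse" (e) can be modelled as
a per-architecture policy check. This is a userspace sketch only: the
enum, the function name coherent_mmap_allowed() and the choice of error
codes are assumptions for illustration and do not come from the thread
or from any kernel tree. Option (d), reprogramming the DMA hardware, is
not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-architecture policy, mirroring options (a)-(e). */
enum coherent_mmap_policy {
	COHERENT_MMAP_ALWAYS,		/* (a) or (b): no aliasing constraint */
	COHERENT_MMAP_IF_CONGRUENT,	/* (c): user and cpu addresses must alias */
	COHERENT_MMAP_NEVER,		/* (e): arch cannot do this at all */
};

/*
 * Decide whether a coherent buffer whose kernel-side address is cpu_va
 * may be mapped at user_va.  Returns 0 when the mapping may proceed, or
 * a negative errno giving the driver a well defined failure to report.
 * `modulus` is only consulted for the congruence policy and is assumed
 * to be a power of two.
 */
int coherent_mmap_allowed(enum coherent_mmap_policy policy,
			  uintptr_t user_va, uintptr_t cpu_va,
			  uintptr_t modulus)
{
	switch (policy) {
	case COHERENT_MMAP_ALWAYS:
		return 0;
	case COHERENT_MMAP_IF_CONGRUENT:
		assert(modulus != 0 && (modulus & (modulus - 1)) == 0);
		return ((user_va ^ cpu_va) & (modulus - 1)) == 0 ? 0 : -EINVAL;
	case COHERENT_MMAP_NEVER:
		return -ENXIO;
	}
	return -EINVAL;	/* unknown policy value */
}
```

The point of the shape is that the driver calls one function and either
gets a mapping or a clean error, instead of open-coding virt_to_page()
tricks that only happen to work on x86.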

* Re: can device drivers return non-ram via vm_ops->nopage?
From: James Bottomley @ 2004-03-22 15:27 UTC
To: Russell King
Cc: William Lee Irwin III, linux-arch, Jeff Garzik, Linus Torvalds,
    David Woodhouse, Christoph Hellwig, Andrew Morton, Andrea Arcangeli

On Mon, 2004-03-22 at 10:15, Russell King wrote:
> Correct. However, note that the kernel's view of the DMA mapping would
> not be accessed in this instance. I guess this still causes you some
> problems, though I suspect that given an adequate API, you could
> tweak your iommu appropriately.

Ah, well now we're getting into one of the problems with the kernel's
API. Currently we have a two stage approach: the DMA API makes the
kernel space coherent, and then vm APIs make the user spaces coherent.

We could do this exactly as you propose: make the mapping directly
coherent with the user address space and never visible to the kernel
and everything would work correctly. We could do this simply by loading
the user coherency index into the IOMMU ptes on the mapping.

I've already begun thinking that we may want to shift the API to this
model (i.e. have a preferred address space to do DMA operations to).
Even in most filesystem streaming mappings, only one address space
usually wants to see the data (sharing is the rarity rather than the
rule).

> (a) call remap_page_range() with appropriate pgprot
> (b) use a vm_operations_struct internally to fault the pages in,
> again using the appropriate pgprot.
> (c) disallow the mmap unless it is within the architecture's rules
> (eg, all mmappings are of the same cache colour/congruence
> modulus)
> (d) adjust whatever hardware for device DMA such that the mapping
> is coherent and then do (a) or (b) and/or (c).
> (e) disallow the mmap entirely.
>
> I suspect x86, ARM and similar could be either (a) or (b). PA RISC would
> be (c) and (d).

Yes, we could probably do (c). Like I said, (d) is a bit of a paradigm
shift for the API, but it's also doable.

> Note: I don't see the need for dma_coherent_munmap() - the mappings are
> destroyed on process exit, and we should not be freeing the coherent
> mapping until the mmap of it has gone - and you get to know this via
> the ->release method. However, with (b) an architecture can positively
> check that this rule is followed via suitable refcounting and checking
> in dma_free_coherent.

I could see a point: since we can only keep one address space coherent,
we cannot allow multiple mmappings of the same region. Thus, processes
would be able to hand off the coherent mmap, but wouldn't be allowed
simultaneously to map. The unmap API would be telling the arch that the
mapping was free to be remapped.

James
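
The hand-off rule described above (one address space at a time, with
unmap releasing the region for the next mapper) can be modelled as
single-owner bookkeeping. This is a userspace sketch only: struct
coherent_region, both function names and the -EBUSY return value are
hypothetical, and the owner pointer is just an opaque token standing in
for whatever identifies an address space.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-region state kept by the arch. */
struct coherent_region {
	const void *owner;	/* address space mapping it now, or NULL */
};

/*
 * Claim the region for address space `mm`.  A second address space is
 * refused while the first still holds the mapping; remapping by the
 * current owner is allowed.
 */
int coherent_region_map(struct coherent_region *r, const void *mm)
{
	assert(r != NULL && mm != NULL);

	if (r->owner != NULL && r->owner != mm)
		return -EBUSY;
	r->owner = mm;
	return 0;
}

/*
 * Release the region so another address space may map it.  Calls from
 * anyone other than the current owner are ignored.
 */
void coherent_region_unmap(struct coherent_region *r, const void *mm)
{
	assert(r != NULL);

	if (r->owner == mm)
		r->owner = NULL;
}
```

This is the one job an explicit unmap call would have in such a scheme:
it is the signal that the coherence index can be reprogrammed for a
different address space.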

* Re: can device drivers return non-ram via vm_ops->nopage?
From: Benjamin Herrenschmidt @ 2004-03-22 21:50 UTC
To: James Bottomley
Cc: Russell King, William Lee Irwin III, Linux Arch list, Jeff Garzik,
    Linus Torvalds, David Woodhouse, Christoph Hellwig, Andrew Morton,
    Andrea Arcangeli

> I could see a point: since we can only keep one address space coherent,
> we cannot allow multiple mmappings of the same region. Thus, processes
> would be able to hand off the coherent mmap, but wouldn't be allowed
> simultaneously to map. the unmap API would be telling the arch that the
> mapping was free to be remapped.

You cannot have the mapping coherent in both kernel and user space ? Hrm,
I'm afraid drivers won't like that. The DRI will definitely be unhappy,
and while I don't think sound drivers need to tap the buffers from the
kernel mapping in normal cases, I'm pretty sure things like infiniband
or myrinet will have a problem too.

Ben.

* Re: can device drivers return non-ram via vm_ops->nopage?
From: Jeff Garzik @ 2004-03-22 22:18 UTC
To: Benjamin Herrenschmidt
Cc: James Bottomley, Russell King, William Lee Irwin III,
    Linux Arch list, Linus Torvalds, David Woodhouse, Christoph Hellwig,
    Andrew Morton, Andrea Arcangeli

Benjamin Herrenschmidt wrote:
>>I could see a point: since we can only keep one address space coherent,
>>we cannot allow multiple mmappings of the same region. Thus, processes
>>would be able to hand off the coherent mmap, but wouldn't be allowed
>>simultaneously to map. the unmap API would be telling the arch that the
>>mapping was free to be remapped.
>
>
> You cannot have the mapping coherent in both kernel and user space ? Hrm,
> I'm afraid drivers won't like that. The DRI will definitely be unhappy,
> and while I don't think sound drivers need to tap the buffers from the
> kernel mapping in normal cases, I'm pretty sure things like infiniband
> or myrinet will have a problem too.

You need both kernel and userspace... for audio drivers, mmap(2) is
direct to userspace, but read(2) and write(2) must copy_from_user() into
the allocated DMA area.

* Re: can device drivers return non-ram via vm_ops->nopage?
From: William Lee Irwin III @ 2004-03-22 22:35 UTC
To: Jeff Garzik
Cc: Benjamin Herrenschmidt, James Bottomley, Russell King,
    Linux Arch list, Linus Torvalds, David Woodhouse, Christoph Hellwig,
    Andrew Morton, Andrea Arcangeli

Benjamin Herrenschmidt wrote:
>> You cannot have the mapping coherent in both kernel and user space ? Hrm,
>> I'm afraid drivers won't like that. The DRI will definitely be unhappy,
>> and while I don't think sound drivers need to tap the buffers from the
>> kernel mapping in normal cases, I'm pretty sure things like infiniband
>> or myrinet will have a problem too.

On Mon, Mar 22, 2004 at 05:18:30PM -0500, Jeff Garzik wrote:
> You need both kernel and userspace... for audio drivers, mmap(2) is
> direct to userspace, but read(2) and write(2) must copy_from_user() into
> the allocated DMA area.

This is burned into silicon, so supporting it's not an option. Frankly
I think what's best is another device interface for userspace to fall
back to when this coherent userspace mmap() is unimplementable, e.g.
read()/write() on some device node.

-- wli

* Re: can device drivers return non-ram via vm_ops->nopage?
From: Benjamin Herrenschmidt @ 2004-03-22 23:57 UTC
To: William Lee Irwin III
Cc: Jeff Garzik, James Bottomley, Russell King, Linux Arch list,
    Linus Torvalds, David Woodhouse, Christoph Hellwig, Andrew Morton,
    Andrea Arcangeli

> On Mon, Mar 22, 2004 at 05:18:30PM -0500, Jeff Garzik wrote:
> > You need both kernel and userspace... for audio drivers, mmap(2) is
> > direct to userspace, but read(2) and write(2) must copy_from_user() into
> > the allocated DMA area.
>
> This is burned into silicon, so supporting it's not an option. Frankly
> I think what's best is another device interface for userspace to fall
> back to when this coherent userspace mmap() is unimplementable, e.g.
> read()/write() on some device node.

Exactly. We can implement the simple/nice interface discussed here, and
just not support it on those platforms, they'll have to fall back to
read/write or simply not support those drivers who require that
functionality.

Eventually a nopage variant may be worth for things doing really
large mappings, but I tend to think that when we need to do that mapping
to userland, it is because we need short latencies, which is the opposite
of what a nopage implementation provides, dunno if it's worth the pain
(though it's not _that_ painful).

Ben.
* Re: can device drivers return non-ram via vm_ops->nopage? 2004-03-22 23:57 ` Benjamin Herrenschmidt @ 2004-03-23 0:22 ` David Woodhouse 2004-03-23 2:07 ` William Lee Irwin III 1 sibling, 0 replies; 38+ messages in thread From: David Woodhouse @ 2004-03-23 0:22 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: William Lee Irwin III, Jeff Garzik, James Bottomley, Russell King, Linux Arch list, Linus Torvalds, Christoph Hellwig, Andrew Morton, Andrea Arcangeli On Tue, 2004-03-23 at 10:57 +1100, Benjamin Herrenschmidt wrote: > Eventually a nopage variant may be worth for things doing really > large mappings, but I tend to think that when we need to do that mapping > to userland, it is because we need short latencies, which is the opposite > of what a nopage implementation provides, dunno if it's worth the pain > (though it's not _that_ painful). Ideally the nopage variant is an implementation detail which the driver doesn't care about. Latency only counts on the first time each 'page' is touched, so shouldn't be too much of a problem in general, even for sound buffers? -- dwmw2 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage? 2004-03-22 23:57 ` Benjamin Herrenschmidt 2004-03-23 0:22 ` David Woodhouse @ 2004-03-23 2:07 ` William Lee Irwin III 2004-03-23 9:28 ` Russell King 2004-03-23 11:35 ` Andrea Arcangeli 1 sibling, 2 replies; 38+ messages in thread From: William Lee Irwin III @ 2004-03-23 2:07 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Jeff Garzik, James Bottomley, Russell King, Linux Arch list, Linus Torvalds, David Woodhouse, Christoph Hellwig, Andrew Morton, Andrea Arcangeli On Mon, Mar 22, 2004 at 05:18:30PM -0500, Jeff Garzik wrote: >> This is burned into silicon, so supporting it's not an option. Frankly >> I think what's best is another device interface for userspace to fall >> back to when this coherent userspace mmap() is unimplementable, e.g. >> read()/write() on some device node. On Tue, Mar 23, 2004 at 10:57:19AM +1100, Benjamin Herrenschmidt wrote: > Exactly. We can implement the simple/nice interface discussed here, and > just not support it on those platforms, they'll have to fall back to > read/write or simply not support those drivers who require that > functionality. > Eventually a nopage variant may be worth for things doing really > large mappings, but I tend to think that when we need to do that mapping > to userland, it is because we need short latencies, which is the opposite > of what a nopage implementation provides, dunno if it's worth the pain > (though it's not _that_ painful). More generality in fault handling would be useful in various ways even beyond fixing ALSA's issues. I'm not sure why Linus doesn't like the notion. I didn't insist on API but just moved on to trying to push for a solution to the driver issues to get merged at all so I can get on with cleaning up drivers using whatever API people want for the solution. I've already been over every ->nopage() in the kernel once (wrt. 
what's been merged anyway; a number of times for other reasons), so I really think I can do a bit of useful footwork here. -- wli ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage? 2004-03-23 2:07 ` William Lee Irwin III @ 2004-03-23 9:28 ` Russell King 2004-03-23 9:34 ` David Woodhouse 2004-03-23 11:35 ` Andrea Arcangeli 1 sibling, 1 reply; 38+ messages in thread From: Russell King @ 2004-03-23 9:28 UTC (permalink / raw) To: William Lee Irwin III Cc: Benjamin Herrenschmidt, Jeff Garzik, James Bottomley, Linux Arch list, Linus Torvalds, David Woodhouse, Christoph Hellwig, Andrew Morton, Andrea Arcangeli On Mon, Mar 22, 2004 at 06:07:56PM -0800, William Lee Irwin III wrote: > I've already been over every ->nopage() in the kernel once (wrt. what's > been merged anyway; a number of times for other reasons), so I really > think I can do a bit of useful footwork here. Note that currently I have dma_coherent_to_page(), dma_coherent_to_pfn() and dma_coherent_mmap() (and maybe dma_coherent_munmap()) implemented here. I'm now taking a back seat in these discussions waiting for one of them to take centre stage and be the One True chosen method. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/ 2.6 Serial core ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage? 2004-03-23 9:28 ` Russell King @ 2004-03-23 9:34 ` David Woodhouse 2004-03-23 10:04 ` Russell King 0 siblings, 1 reply; 38+ messages in thread From: David Woodhouse @ 2004-03-23 9:34 UTC (permalink / raw) To: Russell King Cc: William Lee Irwin III, Benjamin Herrenschmidt, Jeff Garzik, James Bottomley, Linux Arch list, Linus Torvalds, Christoph Hellwig, Andrew Morton, Andrea Arcangeli On Tue, 2004-03-23 at 09:28 +0000, Russell King wrote: > On Mon, Mar 22, 2004 at 06:07:56PM -0800, William Lee Irwin III wrote: > > I've already been over every ->nopage() in the kernel once (wrt. what's > > been merged anyway; a number of times for other reasons), so I really > > think I can do a bit of useful footwork here. > > Note that currently I have dma_coherent_to_page(), dma_coherent_to_pfn() > and dma_coherent_mmap() (and maybe dma_coherent_munmap()) implemented > here. I'm now taking a back seat in these discussions waiting for one > of them to take centre stage and be the One True chosen method. dma_coherent_m{un,}map() makes most sense to me. Given that it's hard for some arches to make a 'struct page' available, is there any _reason_ to make them jump through that particular hoop expecting them to provide a dma_coherent_to_page()? Populating PTEs on demand through nopage() can be an implementation detail. You don't have to make 'struct page' available in the generic API to achieve that optimisation. -- dwmw2 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage? 2004-03-23 9:34 ` David Woodhouse @ 2004-03-23 10:04 ` Russell King 2004-03-23 10:05 ` William Lee Irwin III 2004-03-23 11:29 ` Benjamin Herrenschmidt 0 siblings, 2 replies; 38+ messages in thread From: Russell King @ 2004-03-23 10:04 UTC (permalink / raw) To: David Woodhouse Cc: William Lee Irwin III, Benjamin Herrenschmidt, Jeff Garzik, James Bottomley, Linux Arch list, Linus Torvalds, Christoph Hellwig, Andrew Morton, Andrea Arcangeli On Tue, Mar 23, 2004 at 09:34:52AM +0000, David Woodhouse wrote: > Populating PTEs on demand through nopage() can be an implementation > detail. You don't have to make 'struct page' available in the generic > API to achieve that optimisation. Indeed - and this is what my implementation of dma_coherent_mmap() does on ARM. Once everyone has decided on a solution, we can then move it forward. Currently it does look like dma_coherent_mmap() is the one of choice, so... Are there any remaining objections to it? -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/ 2.6 Serial core ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage? 2004-03-23 10:04 ` Russell King @ 2004-03-23 10:05 ` William Lee Irwin III 2004-03-23 11:29 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 38+ messages in thread From: William Lee Irwin III @ 2004-03-23 10:05 UTC (permalink / raw) To: rmk, David Woodhouse, Benjamin Herrenschmidt, Jeff Garzik, James Bottomley, Linux Arch list, Linus Torvalds, Christoph Hellwig, Andrew Morton, Andrea Arcangeli On Tue, Mar 23, 2004 at 09:34:52AM +0000, David Woodhouse wrote: >> Populating PTEs on demand through nopage() can be an implementation >> detail. You don't have to make 'struct page' available in the generic >> API to achieve that optimisation. On Tue, Mar 23, 2004 at 10:04:29AM +0000, Russell King wrote: > Indeed - and this is what my implementation of dma_coherent_mmap() does > on ARM. > Once everyone has decided on a solution, we can then move it forward. > Currently it does look like dma_coherent_mmap() is the one of choice, > so... Are there any remaining objections to it? I like dma_coherent_mmap(). -- wli ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage? 2004-03-23 10:04 ` Russell King 2004-03-23 10:05 ` William Lee Irwin III @ 2004-03-23 11:29 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 38+ messages in thread From: Benjamin Herrenschmidt @ 2004-03-23 11:29 UTC (permalink / raw) To: Russell King Cc: David Woodhouse, William Lee Irwin III, Jeff Garzik, James Bottomley, Linux Arch list, Linus Torvalds, Christoph Hellwig, Andrew Morton, Andrea Arcangeli On Tue, 2004-03-23 at 21:04, Russell King wrote: > On Tue, Mar 23, 2004 at 09:34:52AM +0000, David Woodhouse wrote: > > Populating PTEs on demand through nopage() can be an implementation > > detail. You don't have to make 'struct page' available in the generic > > API to achieve that optimisation. > > Indeed - and this is what my implementation of dma_coherent_mmap() does > on ARM. > > Once everyone has decided on a solution, we can then move it forward. > Currently it does look like dma_coherent_mmap() is the one of choice, > so... Are there any remaining objections to it? Looks fine to me. We may want to refine dma_coherent_alloc() in the first place though, like introducing a "real" __dma_coherent_alloc() that takes additional flags and have dma_coherent_alloc() just be a macro, that way James can pass in flags telling at alloc time that a given alloc will potentially be mapped to userland (if I understand James requirements properly). Ben. ^ permalink raw reply [flat|nested] 38+ messages in thread
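Ben's suggestion above could look roughly like the following. This is a minimal userspace sketch, not kernel code: `dma_coherent_alloc_flags()` stands in for his "real" `__dma_coherent_alloc()`, and `DMA_ALLOC_USER_MAPPABLE`, `struct dma_buf_model`, and the `calloc()`-backed allocation are all hypothetical names and stand-ins invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t dma_addr_t;  /* stand-in for the kernel's dma_addr_t */

/* Hypothetical flag: this allocation may later be mapped into userspace. */
#define DMA_ALLOC_USER_MAPPABLE 0x1u

/* Hypothetical bundle of what an allocation returns, plus its flags. */
struct dma_buf_model {
	void *cpu_addr;
	dma_addr_t dma_addr;
	unsigned int flags;
};

/* The flag-taking allocator; calloc() stands in for the arch code. */
static int dma_coherent_alloc_flags(struct dma_buf_model *buf, size_t size,
				    unsigned int flags)
{
	assert(buf != NULL);
	buf->cpu_addr = calloc(1, size);
	if (buf->cpu_addr == NULL)
		return -1;
	/* A real arch would return a bus address here, not a CPU pointer. */
	buf->dma_addr = (dma_addr_t)(uintptr_t)buf->cpu_addr;
	buf->flags = flags;
	return 0;
}

/* The existing entry point becomes a macro that passes no flags. */
#define dma_coherent_alloc(buf, size) \
	dma_coherent_alloc_flags((buf), (size), 0)
```

Existing callers would keep using the macro unchanged, while a driver that intends to map the buffer to userland would call the flag-taking variant directly.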
* Re: can device drivers return non-ram via vm_ops->nopage? 2004-03-23 2:07 ` William Lee Irwin III 2004-03-23 9:28 ` Russell King @ 2004-03-23 11:35 ` Andrea Arcangeli 2004-03-23 11:44 ` William Lee Irwin III 1 sibling, 1 reply; 38+ messages in thread From: Andrea Arcangeli @ 2004-03-23 11:35 UTC (permalink / raw) To: William Lee Irwin III Cc: Benjamin Herrenschmidt, Jeff Garzik, James Bottomley, Russell King, Linux Arch list, Linus Torvalds, David Woodhouse, Christoph Hellwig, Andrew Morton On Mon, Mar 22, 2004 at 06:07:56PM -0800, William Lee Irwin III wrote: > On Mon, Mar 22, 2004 at 05:18:30PM -0500, Jeff Garzik wrote: > >> This is burned into silicon, so supporting it's not an option. Frankly > >> I think what's best is another device interface for userspace to fall > >> back to when this coherent userspace mmap() is unimplementable, e.g. > >> read()/write() on some device node. > > On Tue, Mar 23, 2004 at 10:57:19AM +1100, Benjamin Herrenschmidt wrote: > > Exactly. We can implement the simple/nice interface discussed here, and > > just not support it on those platforms, they'll have to fall back to > > read/write or simply not support those drivers who require that > > functionality. > > Eventually a nopage variant may be worth for things doing really > > large mappings, but I tend to think that when we need to do that mapping > > to userland, it is because we need short latencies, which is the opposite > > of what a nopage implementation provides, dunno if it's worth the pain > > (though it's not _that_ painful). > > More generality in fault handling would be useful in various ways even > beyond fixing ALSA's issues. I'm not sure why Linus doesn't like the > notion. I didn't insist on API but just moved on to trying to push for my guess is that he doesn't like a branch in the fast path and he thinks the remap_file_pages approach is simpler for drivers to use.
as for the initial page fault mentioned by Benjamin, that's a non-issue: if one prefers to preallocate all the ptes rather than take a page fault the very first time the pages are touched, I already said a few emails ago that one can call mlock on the mapping and there will not be a single page fault afterwards. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage? 2004-03-23 11:35 ` Andrea Arcangeli @ 2004-03-23 11:44 ` William Lee Irwin III 2004-03-23 12:34 ` Andrea Arcangeli 0 siblings, 1 reply; 38+ messages in thread From: William Lee Irwin III @ 2004-03-23 11:44 UTC (permalink / raw) To: Andrea Arcangeli Cc: Benjamin Herrenschmidt, Jeff Garzik, James Bottomley, Russell King, Linux Arch list, Linus Torvalds, David Woodhouse, Christoph Hellwig, Andrew Morton On Mon, Mar 22, 2004 at 06:07:56PM -0800, William Lee Irwin III wrote: >> More generality in fault handling would be useful in various ways even >> beyond fixing ALSA's issues. I'm not sure why Linus doesn't like the >> notion. I didn't insist on API but just moved on to trying to push for On Tue, Mar 23, 2004 at 12:35:34PM +0100, Andrea Arcangeli wrote: > my guess is that he doesn't like a branch in the fast path and he thinks > remap_file_pages approch is simpler for drivers to use. Hmm. It should move preexisting method calls further up the call chain. I can't say I have a pressing enough need to pursue it personally unless it's the way to resolve issues like the one under discussion and so on. It looks like there's another way that's preferred, so I'm not looking into it anymore. On Tue, Mar 23, 2004 at 12:35:34PM +0100, Andrea Arcangeli wrote: > as for the initial page fault mentioned by Benjamin, that's a non issue, > if one prefers to preallocate all the ptes thank to take a page fault > the very first time the pages are touched, I already said some email ago > that one can call mlock on the mapping and there will be not a single > page fault anymore afterwards. mlock actually loops through the fault path, so in a sense it still requires fault handling on the part of the driver, though AFAICT it can largely be done by library code. I agree it should be an implementation detail of dma_mmap_coherent() etc. 
and pretty much believed up front that drivers would need code to assist them with fault handling if they did it, though it didn't originally occur to me that an mmap() function could install the entire handler on their behalf transparently to them. -- wli ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage? 2004-03-23 11:44 ` William Lee Irwin III @ 2004-03-23 12:34 ` Andrea Arcangeli 2004-03-23 12:40 ` Russell King 2004-03-23 12:49 ` William Lee Irwin III 0 siblings, 2 replies; 38+ messages in thread From: Andrea Arcangeli @ 2004-03-23 12:34 UTC (permalink / raw) To: William Lee Irwin III Cc: Benjamin Herrenschmidt, Jeff Garzik, James Bottomley, Russell King, Linux Arch list, Linus Torvalds, David Woodhouse, Christoph Hellwig, Andrew Morton On Tue, Mar 23, 2004 at 03:44:52AM -0800, William Lee Irwin III wrote: > mlock actually loops through the fault path, so in a sense it still > requires fault handling on the part of the driver, though AFAICT it can it requires fault handling of course, but that's just the API with the driver; on the performance side (and it was the performance/latency side of the page faults that was complained about) no page faults are generated, so it's not going to be a lot different from the map_sg stuff at runtime. anyway, Linus vetoed the lazy approach so we probably should give it up (the one thing I like most is avoiding the branch in the fast path). ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage? 2004-03-23 12:34 ` Andrea Arcangeli @ 2004-03-23 12:40 ` Russell King 2004-03-23 15:25 ` Linus Torvalds 2004-03-25 20:25 ` Russell King 2004-03-23 12:49 ` William Lee Irwin III 1 sibling, 2 replies; 38+ messages in thread From: Russell King @ 2004-03-23 12:40 UTC (permalink / raw) To: Andrea Arcangeli Cc: William Lee Irwin III, Benjamin Herrenschmidt, Jeff Garzik, James Bottomley, Linux Arch list, Linus Torvalds, David Woodhouse, Christoph Hellwig, Andrew Morton On Tue, Mar 23, 2004 at 01:34:39PM +0100, Andrea Arcangeli wrote: > anyways Linus vetoed the lazy approch so we probably should give it up > (the one thing I like most is to avoid the branch in the fast path). I don't think he did - he vetoed adding another special condition to the fast path, or returning non-RAM pages via ->nopage. However, I do not believe he has vetoed an architecture implementing dma_coherent_mmap() in such a way that it uses the ->nopage method, _provided_ ->nopage returns valid struct pages. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/ 2.6 Serial core ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage? 2004-03-23 12:40 ` Russell King @ 2004-03-23 15:25 ` Linus Torvalds 2004-03-23 15:36 ` Andrea Arcangeli 2004-03-25 20:25 ` Russell King 1 sibling, 1 reply; 38+ messages in thread From: Linus Torvalds @ 2004-03-23 15:25 UTC (permalink / raw) To: Russell King Cc: Andrea Arcangeli, William Lee Irwin III, Benjamin Herrenschmidt, Jeff Garzik, James Bottomley, Linux Arch list, David Woodhouse, Christoph Hellwig, Andrew Morton On Tue, 23 Mar 2004, Russell King wrote: > > On Tue, Mar 23, 2004 at 01:34:39PM +0100, Andrea Arcangeli wrote: > > anyways Linus vetoed the lazy approch so we probably should give it up > > (the one thing I like most is to avoid the branch in the fast path). > > I don't think he did - he vetoed adding another special condition to > the fast path, or returning non-RAM pages via ->nopage. Indeed. What I _don't_ want is to add a new VM op function pointer as a special case. I abhor special cases, since they never go away, and end up making the code really hard to follow. > However, I do not believe he has vetoed an architecture implementing > dma_coherent_mmap() in such a way that it uses the ->nopage method, > _provided_ ->nopage returns valid struct pages. Yes. For all I care, the "struct page" might even be dynamically allocated, or something else very special (eg in a zone of its own that the rest of the VM never ever actually sees). As long as "page_to_pfn()" works and does the right thing wrt such pages, that would be fine by me (ie as long as the VM doesn't need to have any special case code). Linus ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage? 2004-03-23 15:25 ` Linus Torvalds @ 2004-03-23 15:36 ` Andrea Arcangeli 2004-03-23 15:46 ` Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 38+ messages in thread From: Andrea Arcangeli @ 2004-03-23 15:36 UTC (permalink / raw) To: Linus Torvalds Cc: Russell King, William Lee Irwin III, Benjamin Herrenschmidt, Jeff Garzik, James Bottomley, Linux Arch list, David Woodhouse, Christoph Hellwig, Andrew Morton On Tue, Mar 23, 2004 at 07:25:31AM -0800, Linus Torvalds wrote: > > > On Tue, 23 Mar 2004, Russell King wrote: > > > > On Tue, Mar 23, 2004 at 01:34:39PM +0100, Andrea Arcangeli wrote: > > > anyways Linus vetoed the lazy approch so we probably should give it up > > > (the one thing I like most is to avoid the branch in the fast path). > > > > I don't think he did - he vetoed adding another special condition to > > the fast path, or returning non-RAM pages via ->nopage. > > Indeed. note that I was talking about non-ram; obviously ram pages can be returned via ->nopage and that's what drivers are using already. I know there is a problem with ram pages too, but as far as the ->nopage API is concerned the only problem is the non-ram pages. Russell's problem has nothing to do with ->nopage itself. > What I _don't_ want is top add a new VM op function pointer as a special > case. I abhor special cases, since they never go away, and end up making > the code really hard to follow. > > > However, I do not believe he has vetoed an architecture implementing > > dma_coherent_mmap() in such a way that it uses the ->nopage method, > > _provided_ ->nopage returns valid struct pages. > > Yes.
For all I care, the "struct page" migth even be dynamically > allocated, or something else very special (eg in a zone of its own that I don't think it's sane to use discontigmem just to make ->nopage work with non-ram; if one has to use discontigmem just for that then I think it's much simpler to fill all the pagetables in ->mmap using the pfn w/o page_t. > the rest of the VM never ever actually sees). As long as "page_to_pfn()" zones cannot create holes in the middle of mem_map, only discontigmem can. I'd expect most archs to have holes between ram and mmio regions (at least in various common ram configurations). That's why I guess discontigmem would be needed for that. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage? 2004-03-23 15:36 ` Andrea Arcangeli @ 2004-03-23 15:46 ` Linus Torvalds 2004-03-23 15:50 ` Russell King 2004-03-23 22:10 ` Benjamin Herrenschmidt 2 siblings, 0 replies; 38+ messages in thread From: Linus Torvalds @ 2004-03-23 15:46 UTC (permalink / raw) To: Andrea Arcangeli Cc: Russell King, William Lee Irwin III, Benjamin Herrenschmidt, Jeff Garzik, James Bottomley, Linux Arch list, David Woodhouse, Christoph Hellwig, Andrew Morton On Tue, 23 Mar 2004, Andrea Arcangeli wrote: > > > > Yes. For all I care, the "struct page" migth even be dynamically > > allocated, or something else very special (eg in a zone of its own that > > I don't think it's sane to use discontigmem just to make ->nopage work > with non-ram, if one has to use discontigmem just for that then I think > it's much simpler to fill all the pagetables in ->mmap using the pfn w/o > page_t. Oh, I absolutely agree. My point was that I really don't care how a driver does things, as long as it does _not_ create any VM special cases. And I definitely think that for non-RAM pages it tends to make most sense to just statically set up the mapping at ->mmap() time. But I'm also saying that if a driver _wants_ to do dynamic mapping for some really strange architecture reasons, then such an architecture could choose to have a magic zone or something like that for that case. At that point it is an _architecture_ special case, which contains the problem enough that I don't need to care. That kind of "strange 'struct page'" approach would cover the case where you really want to have a "struct page" associated with a DMA coherent allocation, even if such a page would never be part of any _normal_ memory allocations (and I seriously doubt that any sane architecture would want to do anything like that, but I could well imagine that some Amiga with "chip ram" or similar might go this route). 
In general, I'd _prefer_ for really special mappings to be as static as possible. So we should probably aim for having "IO mappings" be set up at "->mmap()" time if at all possible. The less clever stuff that happens dynamically, the better, imho. Linus ^ permalink raw reply [flat|nested] 38+ messages in thread
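The two strategies weighed in this subthread, populating PTEs lazily from the fault path versus setting everything up at ->mmap() time, can be contrasted with a toy model. This is only a sketch under loud assumptions: `struct model_vma`, `model_touch()`, and `model_populate_all()` are invented for illustration and do not correspond to any kernel interface; a boolean array stands in for the page tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_PAGES 8

/* Toy mapping: one "PTE present" bit per page, plus a fault counter. */
struct model_vma {
	bool pte_present[MODEL_NR_PAGES];
	unsigned long faults;
};

/* Lazy style: the first touch of each page takes one fault (->nopage()). */
static void model_touch(struct model_vma *vma, size_t idx)
{
	assert(idx < MODEL_NR_PAGES);
	if (!vma->pte_present[idx]) {
		vma->faults++;
		vma->pte_present[idx] = true;
	}
}

/* Static style: install every PTE up front, as at ->mmap() time. */
static void model_populate_all(struct model_vma *vma)
{
	for (size_t i = 0; i < MODEL_NR_PAGES; i++)
		vma->pte_present[i] = true;
}
```

In this model the lazy mapping pays exactly one fault per page and none on later touches, while the populated mapping never faults, which is the same end state Andrea describes reaching by calling mlock on a lazily populated mapping.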
* Re: can device drivers return non-ram via vm_ops->nopage? 2004-03-23 15:36 ` Andrea Arcangeli 2004-03-23 15:46 ` Linus Torvalds @ 2004-03-23 15:50 ` Russell King 2004-03-23 22:10 ` Benjamin Herrenschmidt 2 siblings, 0 replies; 38+ messages in thread From: Russell King @ 2004-03-23 15:50 UTC (permalink / raw) To: Andrea Arcangeli Cc: Linus Torvalds, William Lee Irwin III, Benjamin Herrenschmidt, Jeff Garzik, James Bottomley, Linux Arch list, David Woodhouse, Christoph Hellwig, Andrew Morton On Tue, Mar 23, 2004 at 04:36:41PM +0100, Andrea Arcangeli wrote: > On Tue, Mar 23, 2004 at 07:25:31AM -0800, Linus Torvalds wrote: > > On Tue, 23 Mar 2004, Russell King wrote: > > > On Tue, Mar 23, 2004 at 01:34:39PM +0100, Andrea Arcangeli wrote: > > > > anyways Linus vetoed the lazy approch so we probably should give it up > > > > (the one thing I like most is to avoid the branch in the fast path). > > > > > > I don't think he did - he vetoed adding another special condition to > > > the fast path, or returning non-RAM pages via ->nopage. > > > > Indeed. > > note that I was talking about non-ram, obviously ram pages can be > returned via ->nopage and that's what drivers are using already. Let's not get distracted by the other problem areas. What we're talking about here is solving the "how to map memory returned from dma_alloc_coherent()" problem. There's the related problem (which Jeff has - via82cxxx_audio.c), which is effectively a scatter-gather dma_alloc_coherent() + dma_coherent_mmap() problem. Then there's the unrelated problem where ALSA wants to map buffers on PCI devices coherently into user space. These are three distinct problems, and we should not confuse them. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/ 2.6 Serial core ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage? 2004-03-23 15:36 ` Andrea Arcangeli 2004-03-23 15:46 ` Linus Torvalds 2004-03-23 15:50 ` Russell King @ 2004-03-23 22:10 ` Benjamin Herrenschmidt 2 siblings, 0 replies; 38+ messages in thread From: Benjamin Herrenschmidt @ 2004-03-23 22:10 UTC (permalink / raw) To: Andrea Arcangeli Cc: Linus Torvalds, Russell King, William Lee Irwin III, Jeff Garzik, James Bottomley, Linux Arch list, David Woodhouse, Christoph Hellwig, Andrew Morton > zones cannot create holes in the middle of mem_map, only discontigmem > can. I'd expect in most archs to have holes between ram and mmio > regions (at least in various common ram configuration). That's why I > guess discontigmem would be needed for that. Well, just waste some mem_map or use a non-trivial page_to_pfn using some high bit in the address on those archs. No need for DISCONTIGMEM for that. For example, on various ppc64's, there is an IO hole of 1 or 2GB, so you have 2 or 3GB of RAM, then the IO hole, then the rest of RAM. So far I implement that without using DISCONTIGMEM, just giving the hole size when initializing the zone. That wastes some memmap space, but that's fine for now (the ppc64 discontigmem code would need some surgery to be split from the numa stuff before we could use it). Ben. ^ permalink raw reply [flat|nested] 38+ messages in thread
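The "waste some mem_map" option Ben describes can be pictured with a small model: one flat array of `struct page` covers RAM and the IO hole alike, so `page_to_pfn()` stays plain pointer arithmetic at the cost of entries that describe no RAM. This is a sketch only; the page counts, the `is_hole` field, and the `model_*` helpers are made up for illustration and are not the ppc64 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Made-up layout, in pages: RAM, then an IO hole, then more RAM. */
#define MODEL_LOW_RAM_PAGES  8
#define MODEL_HOLE_PAGES     4
#define MODEL_HIGH_RAM_PAGES 4
#define MODEL_TOTAL_PAGES \
	(MODEL_LOW_RAM_PAGES + MODEL_HOLE_PAGES + MODEL_HIGH_RAM_PAGES)

struct page {
	bool is_hole;  /* entry exists only to keep mem_map contiguous */
};

/* One flat array covering RAM and the hole alike. */
static struct page mem_map[MODEL_TOTAL_PAGES];

static void model_init_mem_map(void)
{
	for (size_t pfn = 0; pfn < MODEL_TOTAL_PAGES; pfn++)
		mem_map[pfn].is_hole =
			pfn >= MODEL_LOW_RAM_PAGES &&
			pfn < MODEL_LOW_RAM_PAGES + MODEL_HOLE_PAGES;
}

/* Because the array is flat, both conversions stay trivial. */
static unsigned long page_to_pfn(const struct page *page)
{
	return (unsigned long)(page - mem_map);
}

static struct page *pfn_to_page(unsigned long pfn)
{
	assert(pfn < MODEL_TOTAL_PAGES);
	return &mem_map[pfn];
}

/* The cost: entries that describe no RAM at all. */
static size_t model_wasted_bytes(void)
{
	return MODEL_HOLE_PAGES * sizeof(struct page);
}
```

The trade-off is the one stated in the message: the wasted space grows with the size of the hole, but nothing in the conversion path needs a special case.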
* Re: can device drivers return non-ram via vm_ops->nopage? 2004-03-23 12:40 ` Russell King 2004-03-23 15:25 ` Linus Torvalds @ 2004-03-25 20:25 ` Russell King 2004-03-28 10:17 ` Russell King 1 sibling, 1 reply; 38+ messages in thread From: Russell King @ 2004-03-25 20:25 UTC (permalink / raw) To: Andrea Arcangeli, William Lee Irwin III, Benjamin Herrenschmidt, Jeff Garzik, James Bottomley, Linux Arch list, Linus Torvalds, David Woodhouse, Christoph Hellwig, Andrew Morton On Tue, Mar 23, 2004 at 12:40:27PM +0000, Russell King wrote: > On Tue, Mar 23, 2004 at 01:34:39PM +0100, Andrea Arcangeli wrote: > > anyways Linus vetoed the lazy approch so we probably should give it up > > (the one thing I like most is to avoid the branch in the fast path). > > I don't think he did - he vetoed adding another special condition to > the fast path, or returning non-RAM pages via ->nopage. > > However, I do not believe he has vetoed an architecture implementing > dma_coherent_mmap() in such a way that it uses the ->nopage method, > _provided_ ->nopage returns valid struct pages. Ok, since this thread seems to have died without much action happening, it's time to re-start it (but note - I probably won't be around tomorrow.) I'd like to get the dma_coherent_mmap() API sorted out such that everyone is happy, and we can progress it. From what I've gathered, we seem to be happy with the dma_coherent_mmap() approach. Is everyone happy with these prototypes?
int dma_coherent_mmap(struct device *dev, struct vm_area_struct *vma, void *cpu_addr, dma_addr_t dma_addr, size_t size); and, for the PA-RISC architecture (c/o James Bottomley): void dma_coherent_munmap(struct device *dev, struct vm_area_struct *vma, void *cpu_addr, dma_addr_t dma_addr, size_t size); where: - dev: the device for which this coherent region was created - vma: VM area struct describing the requested user mapping - cpu_addr: the address returned from dma_alloc_coherent - dma_addr: the DMA cookie returned from dma_alloc_coherent - size: the size of the DMA allocation As far as ARM goes, we (currently) only need cpu_addr to look up the data associated with the kernel's coherent DMA mapping. Whether the other arguments are useful depends on what other architectures require. Is everyone happy with the name, or would people prefer it to be more consistent with the other dma_xxx_coherent() functions (iow, dma_mmap_coherent?) PS, one of my pet annoyances with the DMA API is that dma_alloc_coherent() doesn't return/take some architecturally defined structure, and that there aren't accessor macros like dma_cpu_addr() and dma_device_addr(). This means that we end up carrying around several bits of data, which may be the same on some architectures. People objected to this in 2.4, and we ended up adding that yucky "DECLARE_PCI_UNMAP_ADDR" stuff - which may happen during 2.6 to the DMA API. Adding these further APIs is just making this mistake worse IMO. It's really a 2.7 problem though. And yes, I've just talked people out of the prototypes I've proposed above. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/ 2.6 Serial core ^ permalink raw reply [flat|nested] 38+ messages in thread
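To show what the proposed prototype would mean for a driver, here is a minimal userspace sketch of a driver ->mmap() path that simply hands its saved allocation to dma_coherent_mmap(). Only the dma_coherent_mmap() signature comes from the message above. Everything else is an assumption for illustration: the cut-down `struct device` and `struct vm_area_struct`, the `model_backing` field, `struct mydrv`, `mydrv_mmap()`, and the stub body with its size check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;  /* stand-in for the kernel's dma_addr_t */

/* Minimal stand-ins for the kernel structures. */
struct device {
	const char *name;
};

struct vm_area_struct {
	unsigned long vm_start;
	unsigned long vm_end;
	void *model_backing;  /* hypothetical: records what got mapped */
};

/* Stub with the proposed prototype; a real arch would set up PTEs here. */
static int dma_coherent_mmap(struct device *dev, struct vm_area_struct *vma,
			     void *cpu_addr, dma_addr_t dma_addr, size_t size)
{
	assert(dev != NULL && vma != NULL && cpu_addr != NULL);
	(void)dma_addr;  /* unused in this model */
	/* Assumed policy: refuse to map more than was allocated. */
	if (vma->vm_end - vma->vm_start > size)
		return -EINVAL;
	vma->model_backing = cpu_addr;
	return 0;
}

/* Hypothetical per-device state saved at dma_alloc_coherent() time. */
struct mydrv {
	struct device *dev;
	void *cpu_addr;
	dma_addr_t dma_addr;
	size_t size;
};

/* What a driver's ->mmap() might reduce to: no struct page, no ->nopage(). */
static int mydrv_mmap(struct mydrv *drv, struct vm_area_struct *vma)
{
	return dma_coherent_mmap(drv->dev, vma, drv->cpu_addr,
				 drv->dma_addr, drv->size);
}
```

The point of the sketch is that the driver never calls virt_to_page() or supplies a fault handler; whether the architecture populates the mapping eagerly or through ->nopage() stays hidden behind the one call.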
* Re: can device drivers return non-ram via vm_ops->nopage? 2004-03-25 20:25 ` Russell King @ 2004-03-28 10:17 ` Russell King 0 siblings, 0 replies; 38+ messages in thread From: Russell King @ 2004-03-28 10:17 UTC (permalink / raw) To: Andrea Arcangeli, William Lee Irwin III, Benjamin Herrenschmidt, Jeff Garzik, James Bottomley, Linux Arch list, Linus Torvalds, David Woodhouse, Christoph Hellwig, Andrew Morton On Thu, Mar 25, 2004 at 08:25:44PM +0000, Russell King wrote: > From what I've gathered, we seem to be happy with the dma_coherent_mmap() > approach. Is everyone happy with these prototypes? > > int dma_coherent_mmap(struct device *dev, struct vm_area_struct *vma, > void *cpu_addr, dma_addr_t dma_addr, size_t size); > > and, for the PA-RISC architecture (c/o James Bottomley): > > void dma_coherent_munmap(struct device *dev, struct vm_area_struct *vma, > void *cpu_addr, dma_addr_t dma_addr, size_t size); I'm not happy with dma_coherent_munmap() actually - we don't really know the lifetime of the vma, so drivers should not be tempted into keeping a reference to it. Since interest in this subject appears to have dropped to zero (as can be seen from the numerous (0) responses to my last post) it is my intention to provide just the dma_mmap_coherent interface and let PA-RISC people figure out how to handle their architecture. I'm shortly going to post a couple of patches to support dma_coherent_mmap() on x86 and ARM on linux-arch. Could other architectures follow up with their patches please? -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/ 2.6 Serial core ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage? 2004-03-23 12:34 ` Andrea Arcangeli 2004-03-23 12:40 ` Russell King @ 2004-03-23 12:49 ` William Lee Irwin III 1 sibling, 0 replies; 38+ messages in thread From: William Lee Irwin III @ 2004-03-23 12:49 UTC (permalink / raw) To: Andrea Arcangeli Cc: Benjamin Herrenschmidt, Jeff Garzik, James Bottomley, Russell King, Linux Arch list, Linus Torvalds, David Woodhouse, Christoph Hellwig, Andrew Morton On Tue, Mar 23, 2004 at 03:44:52AM -0800, William Lee Irwin III wrote: >> mlock actually loops through the fault path, so in a sense it still >> requires fault handling on the part of the driver, though AFAICT it can On Tue, Mar 23, 2004 at 01:34:39PM +0100, Andrea Arcangeli wrote: > it requires fault handling of course, but that's just the API with the > driver, on the performance side (and it was the performance/latency side > of the page faults to be complained) no page faults are generated, so > it's not going to be a lot different from the map_sg stuff at runtime. > anyways Linus vetoed the lazy approch so we probably should give it up > (the one thing I like most is to avoid the branch in the fast path). dma_mmap_coherent() being implemented via fault handling is unrelated to ->fault() methods. It just uses the preexisting ->nopage() method internally and transparently to the driver, and without any hooks needed in the API either. Basically, however the arch wants to do it so long as it fits into ->nopage(), doesn't need changes to the core, etc. -- wli ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: can device drivers return non-ram via vm_ops->nopage?
  2004-03-22 22:18 ` Jeff Garzik
  2004-03-22 22:35   ` William Lee Irwin III
@ 2004-03-22 23:19   ` Russell King
  2004-03-22 23:35     ` Jeff Garzik
  1 sibling, 1 reply; 38+ messages in thread
From: Russell King @ 2004-03-22 23:19 UTC
To: Jeff Garzik
Cc: Benjamin Herrenschmidt, James Bottomley, William Lee Irwin III,
    Linux Arch list, Linus Torvalds, David Woodhouse, Christoph Hellwig,
    Andrew Morton, Andrea Arcangeli

On Mon, Mar 22, 2004 at 05:18:30PM -0500, Jeff Garzik wrote:
> You need both kernel and userspace...  for audio drivers, mmap(2) is
> direct to userspace, but read(2) and write(2) must copy_from_user() into
> the allocated DMA area.

Not actually true in this case - audio drivers are either mmap() only
or read/write only, never both at the same time.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core
* Re: can device drivers return non-ram via vm_ops->nopage?
  2004-03-22 23:19 ` Russell King
@ 2004-03-22 23:35 ` Jeff Garzik
  2004-03-23  2:26   ` James Bottomley
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff Garzik @ 2004-03-22 23:35 UTC
To: Russell King
Cc: Benjamin Herrenschmidt, James Bottomley, William Lee Irwin III,
    Linux Arch list, Linus Torvalds, David Woodhouse, Christoph Hellwig,
    Andrew Morton, Andrea Arcangeli

Russell King wrote:
> On Mon, Mar 22, 2004 at 05:18:30PM -0500, Jeff Garzik wrote:
>
>>You need both kernel and userspace...  for audio drivers, mmap(2) is
>>direct to userspace, but read(2) and write(2) must copy_from_user() into
>>the allocated DMA area.
>
>
> Not actually true in this case - audio drivers are either mmap() only
> or read/write only, never both at the same time.

Agreed, but due to OSS dain bramage you can read/write as much as you
like, up until the mmap point, AFAICS.  It's much easier for the driver
to allocate one set of buffers, than to allocate a set at open(2), throw
away those allocs at mmap(2) and make new ones.

	Jeff
* Re: can device drivers return non-ram via vm_ops->nopage?
  2004-03-22 23:35 ` Jeff Garzik
@ 2004-03-23  2:26 ` James Bottomley
  0 siblings, 0 replies; 38+ messages in thread
From: James Bottomley @ 2004-03-23 2:26 UTC
To: Jeff Garzik
Cc: Russell King, Benjamin Herrenschmidt, William Lee Irwin III,
    Linux Arch list, Linus Torvalds, David Woodhouse, Christoph Hellwig,
    Andrew Morton, Andrea Arcangeli

On Mon, 2004-03-22 at 18:35, Jeff Garzik wrote:
> Agreed, but due to OSS dain bramage you can read/write as much as you
> like, up until the mmap point, AFAICS.  It's much easier for the driver
> to allocate one set of buffers, than to allocate a set at open(2), throw
> away those allocs at mmap(2) and make new ones.

I didn't say throw the buffers away, merely the mapping.

I think you're looking at this the wrong way.  We only get into this
whole mess of being coherent with respect to a single address space if
we don't obey the virtual address congruence modulus rules.

As Russell already pointed out, as long as we can force the virtual
addresses of the mappings (that's all mappings, in both the kernel and
in user space) to obey the congruence modulus rules then we're home
free.

On PA, we already force any mmapping that will be shared (MAP_SHARED)
to obey the congruence rules (we allocate them all at 0 mod 4MB, which
is our congruence modulus) by hijacking arch_get_unmapped_area.  Thus,
as long as the sound card application designates its mappings as
MAP_SHARED, we're half way there.

The other wrinkle is that we'll have to allocate the coherent memory
*also* on a virtual address of 0 mod 4MB.  i.e. if we can be told
*before* we hand out the coherent area that it will be mmapped, we can
make it work.  This is going to have to be an extra flag to
dma_alloc_coherent() or something.

The wrong thinking is that this is something we can fix at mapping
time; it's not, it's something we have to set up at buffer allocation
time.

James
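[Editorial sketch] The congruence rule discussed above can be written down as simple address arithmetic. The following is a minimal illustration, assuming a power-of-two congruence modulus of 4MB (the PA-RISC figure quoted in the message); the names CONGRUENCE_MODULUS, vaddrs_congruent and align_to_modulus are made up for this example and are not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed congruence modulus: 4MB, as quoted for PA-RISC above. */
#define CONGRUENCE_MODULUS ((uintptr_t)4 << 20)

/*
 * Two virtual addresses obey the congruence rule when they are equal
 * modulo the congruence modulus. For a power-of-two modulus this is the
 * same as saying their low bits match, so XOR plus a mask is enough.
 */
bool vaddrs_congruent(uintptr_t a, uintptr_t b)
{
    /* The mask trick below is only valid for a power-of-two modulus. */
    assert((CONGRUENCE_MODULUS & (CONGRUENCE_MODULUS - 1)) == 0);
    return ((a ^ b) & (CONGRUENCE_MODULUS - 1)) == 0;
}

/*
 * Round a placement hint up to the next address that is 0 mod the
 * modulus, the placement policy described for shared mappings.
 */
uintptr_t align_to_modulus(uintptr_t hint)
{
    return (hint + CONGRUENCE_MODULUS - 1) & ~(CONGRUENCE_MODULUS - 1);
}
```

If the kernel-side address of the coherent buffer and every user mapping of it are all placed with something like align_to_modulus(), vaddrs_congruent() holds for any pair of them. This is why the constraint has to be known when the buffer is allocated rather than when it is mapped.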
end of thread, other threads:[~2004-03-28 10:18 UTC | newest]
Thread overview: 38+ messages
[not found] <20040321204931.A11519@infradead.org>
[not found] ` <1079902670.17681.324.camel@imladris.demon.co.uk>
[not found] ` <Pine.LNX.4.58.0403211349340.1106@ppc970.osdl.org>
[not found] ` <20040321222327.D26708@flint.arm.linux.org.uk>
[not found] ` <405E1859.5030906@pobox.com>
[not found] ` <20040321225117.F26708@flint.arm.linux.org.uk>
[not found] ` <Pine.LNX.4.58.0403211504550.1106@ppc970.osdl.org>
[not found] ` <20040321234515.G26708@flint.arm.linux.org.uk>
[not found] ` <20040322002349.GZ2045@holomorphy.com>
[not found] ` <405E3387.1050505@pobox.com>
2004-03-22 3:45 ` can device drivers return non-ram via vm_ops->nopage? William Lee Irwin III
2004-03-22 4:41 ` James Bottomley
2004-03-22 4:46 ` William Lee Irwin III
2004-03-22 4:56 ` James Bottomley
2004-03-22 5:26 ` Benjamin Herrenschmidt
2004-03-22 11:58 ` Andrea Arcangeli
2004-03-22 12:05 ` Russell King
2004-03-22 12:34 ` Andrea Arcangeli
2004-03-22 9:30 ` Russell King
2004-03-22 15:04 ` James Bottomley
2004-03-22 15:15 ` Russell King
2004-03-22 15:27 ` James Bottomley
2004-03-22 21:50 ` Benjamin Herrenschmidt
2004-03-22 22:18 ` Jeff Garzik
2004-03-22 22:35 ` William Lee Irwin III
2004-03-22 23:57 ` Benjamin Herrenschmidt
2004-03-23 0:22 ` David Woodhouse
2004-03-23 2:07 ` William Lee Irwin III
2004-03-23 9:28 ` Russell King
2004-03-23 9:34 ` David Woodhouse
2004-03-23 10:04 ` Russell King
2004-03-23 10:05 ` William Lee Irwin III
2004-03-23 11:29 ` Benjamin Herrenschmidt
2004-03-23 11:35 ` Andrea Arcangeli
2004-03-23 11:44 ` William Lee Irwin III
2004-03-23 12:34 ` Andrea Arcangeli
2004-03-23 12:40 ` Russell King
2004-03-23 15:25 ` Linus Torvalds
2004-03-23 15:36 ` Andrea Arcangeli
2004-03-23 15:46 ` Linus Torvalds
2004-03-23 15:50 ` Russell King
2004-03-23 22:10 ` Benjamin Herrenschmidt
2004-03-25 20:25 ` Russell King
2004-03-28 10:17 ` Russell King
2004-03-23 12:49 ` William Lee Irwin III
2004-03-22 23:19 ` Russell King
2004-03-22 23:35 ` Jeff Garzik
2004-03-23 2:26 ` James Bottomley