* Creating kernel mappings for memory initially marked with bootmem NOMAP?
@ 2017-03-08 19:03 Florian Fainelli

From: Florian Fainelli <f.fainelli@gmail.com>
To: linux-arm-kernel

Hi,

On our platforms (brcmstb) we have a use case where we boot with some
memory (a lot, actually) carved out and initially marked with memblock
NOMAP so that this memory is not covered by the kernel's linear mapping.

Now, we have some peripherals that want large chunks of physically and
virtually contiguous memory that belong to these memblock NOMAP ranges.
I have no problem using mmap() against this memory, because the kernel
will do what is necessary for a process to map it for me. The struggle
is for a kernel driver which specifies a range of physical memory and a
size, and expects a virtually contiguous mapping in return (not using
the DMA API, for reasons).

Essentially the problem is that there are no PTEs created for these
memory regions (and pfn_valid() returns 0, since this is NOMAP memory),
so I have been playing with __add_pages() from the memory hotplug code
in an attempt to get proper page references to this memory, but I am
clearly missing something.

Yes, I know it's a terrible idea, but what if I wanted to get it working?

Thanks in advance!
--
Florian
* Creating kernel mappings for memory initially marked with bootmem NOMAP?
@ 2017-03-08 19:14 Ard Biesheuvel

From: Ard Biesheuvel
To: linux-arm-kernel

> On 8 Mar 2017, at 20:03, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> [...]
>
> Essentially the problem is that there are no PTEs created for these
> memory regions (and pfn_valid() returns 0, since this is NOMAP memory),
> so I have been playing with __add_pages() from the memory hotplug code
> in an attempt to get proper page references to this memory, but I am
> clearly missing something.
>
> Yes I know it's a terrible idea, but what if I wanted to get that working?

Did you try memremap?
* Creating kernel mappings for memory initially marked with bootmem NOMAP?
@ 2017-03-08 19:52 Florian Fainelli

From: Florian Fainelli
To: linux-arm-kernel

On 03/08/2017 11:14 AM, Ard Biesheuvel wrote:
> [...]
>
> Did you try memremap?

Not yet, because this is done on 4.1 at the moment, but I will
definitely give it a try, thanks a lot!

Side note: on a kernel that does not have memremap() (such as 4.1),
would not an ioremap_cache() on the physical range marked as NOMAP
result in the same behavior anyway? ioremap() won't catch the fact that
we are mapping RAM, since with NOMAP pfn_valid() returns 0.

My understanding of the pfn_valid() check in ioremap() is that it
avoids mapping the same DRAM location twice with potentially
conflicting attributes; but if the range has not been mapped at all, as
is the case with NOMAP, doesn't that get me the same result?

Thanks!
--
Florian
* Creating kernel mappings for memory initially marked with bootmem NOMAP?
@ 2017-03-08 22:06 Ard Biesheuvel

From: Ard Biesheuvel
To: linux-arm-kernel

On 8 March 2017 at 20:52, Florian Fainelli <f.fainelli@gmail.com> wrote:
> [...]
>
> Side note: on a kernel that does not have memremap() (such as 4.1),
> would not an ioremap_cache() on the physical range marked as NOMAP
> result in the same behavior anyway? ioremap() won't catch the fact that
> we are mapping RAM, since this is NOMAP, pfn_valid() returns 0.
>
> My understanding of the pfn_valid() check for ioremap() is to avoid
> mapping the same DRAM location twice with potentially conflicting
> attributes, but if it has not been mapped at all, as is the case with
> NOMAP, does not that get me the same results?

Yes, it does. But ioremap_cache() is deprecated for mapping normal
memory. There remains a case for ioremap_cache() on ARM for mapping NOR
flash (which is arguably a device) with cacheable attributes, but for
the general case of mapping DRAM, you should not expect new code using
ioremap_cache() to be accepted upstream.
* Creating kernel mappings for memory initially marked with bootmem NOMAP?
@ 2017-03-08 22:10 Florian Fainelli

From: Florian Fainelli
To: linux-arm-kernel

On 03/08/2017 02:06 PM, Ard Biesheuvel wrote:
> [...]
>
> Yes, it does. But ioremap_cache() is deprecated for mapping normal
> memory. There remains a case for ioremap_cache() on ARM for mapping
> NOR flash (which is arguably a device) with cacheable attributes, but
> for the general case of mapping DRAM, you should not expect new code
> using ioremap_cache() to be accepted upstream.

This is very likely going to remain out of tree, and I will keep an eye
on migrating this to memremap() when we update to a newer kernel. Thanks!
--
Florian
* Creating kernel mappings for memory initially marked with bootmem NOMAP?
@ 2017-03-16 19:04 Florian Fainelli

From: Florian Fainelli
To: linux-arm-kernel

On 03/08/2017 02:10 PM, Florian Fainelli wrote:
> [...]
>
> This is very likely going to remain out of tree, and I will keep an eye
> on migrating this to memremap() when we update to a newer kernel. Thanks!

And now I have another interesting problem, self-inflicted of course. We
have this piece of code here in mm/gup.c [1] which is meant to allow
doing O_DIRECT on pages that are marked as NOMAP.

Our middleware does an mmap() of some regions initially marked as NOMAP
such that it can access this memory and do a mapping "on demand" only
when using these physical memory regions. The use case for O_DIRECT is
to play back a file directly from e.g. a local hard drive; it provides a
significant enough performance boost that we want to keep bypassing the
page cache.

After removing the check in the above-mentioned piece of code for
!pfn_valid() and making it a !memblock_is_memory(__pfn_to_phys(pfn)), I
can move on and everything seems to be fine, except that eventually we
get the following call trace:

ata_qc_issue -> arm_dma_map_sg -> arm_dma_map_page ->
__dma_page_cpu_to_dev -> dma_cache_maint_page

[  170.253148] [00000000] *pgd=07b0e003, *pmd=0bc31003, *pte=00000000
[  170.262157] Internal error: Oops: 207 [#1] SMP ARM
[  170.279088] CPU: 1 PID: 1688 Comm: nx_io_worker0 Tainted: P O 4.1.20-1.8pre-01028-g970868a93bbc-dirty #6
[  170.289708] Hardware name: Broadcom STB (Flattened Device Tree)
[  170.295635] task: cd16d500 ti: c7340000 task.ti: c7340000
[  170.301048] PC is at dma_cache_maint_page+0x70/0x140
[  170.306019] LR is at __dma_page_cpu_to_dev+0x2c/0xa8
[  170.310989] pc : [<c001cbec>]  lr : [<c001cce8>]  psr: 60010093
[  170.310989] sp : c7341af8  ip : 00000000  fp : c0e3a300
[  170.322479] r10: 00000002  r9 : c00219a4  r8 : c0e6c740
[  170.327709] r7 : 00000000  r6 : 00010000  r5 : feb8cca0  r4 : fff5c665
[  170.334244] r3 : c0e0a4a8  r2 : 0000007f  r1 : 0000fff5  r0 : ce97aca0
[  170.340779] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  170.348009] Control: 30c5387d  Table: 07c351c0  DAC: 55555555

This is actually coming from the fact that we have SPARSEMEM (actually
SPARSEMEM && SPARSEMEM_MANUAL && SPARSEMEM_EXTREME) enabled for this
platform, and __section_mem_map_addr() dereferences
section->section_mem_map, where section is NULL here as a result of a
call to __page_to_pfn() and __pfn_to_page().

So I guess my question is: if a process is mapping some physical memory
through /dev/mem, could sparsemem somehow populate the section
corresponding to this PFN? Everything I see seems to occur at boot time
or when memory hotplug is used (maybe I should start using memory
hotplug).

Thanks!

[1]: https://github.com/Broadcom/stblinux-4.1/blob/master/linux/mm/gup.c#L388
--
Florian
* Creating kernel mappings for memory initially marked with bootmem NOMAP?
@ 2017-03-16 20:00 Russell King - ARM Linux

From: Russell King - ARM Linux
To: linux-arm-kernel

On Thu, Mar 16, 2017 at 12:04:26PM -0700, Florian Fainelli wrote:
> And now I have another interesting problem, self inflicted of course. We
> have this piece of code here in mm/gup.c [1] which is meant to allow
> doing O_DIRECT on pages that are now marked as NOMAP.

I think you're wrong. get_user_pages() retrieves a list of "struct page"
pointers for the range of user addresses. NOMAP regions do not have an
associated "struct page" (they're not declared to the Linux page
allocator.)

> After removing the check in the above mentioned piece of code for
> !pfn_valid() and making it a !memblock_is_memory(__pfn_to_phys(pfn)) I
> can move on and everything seems to be fine, except that eventually, we
> have the following call trace:

pfn_valid()'s whole point of existing is to return true only for pfns
that correspond with pages managed by the Linux page allocator. You've
bypassed that, making the test return true for other pfns. This means
that:

	page = pte_page(pte);

is going to return rubbish for "page", which will lead to...

> ata_qc_issue -> arm_dma_map_sg -> arm_dma_map_page ->
> __dma_page_cpu_to_dev -> dma_cache_maint_page
>
> [  170.253148] [00000000] *pgd=07b0e003, *pmd=0bc31003, *pte=00000000
> [  170.262157] Internal error: Oops: 207 [#1] SMP ARM
> [...]
> [  170.301048] PC is at dma_cache_maint_page+0x70/0x140
> [  170.306019] LR is at __dma_page_cpu_to_dev+0x2c/0xa8

exactly this, because DMA cache maintenance relies upon having a valid
and dereferenceable struct page.

> So I guess my question is: if a process is mapping some physical memory
> through /dev/mem, could sparsemem somehow populate that section
> corresponding to this PFN? Everything I see seems to occur at boot time
> and when memory hotplug is used (maybe I should start using memory hotplug).

If you hotplug the memory into the Linux page allocator, then you will
need the memory to be mapped, and Linux will integrate it into the page
allocator, and it will be no different from any other memory. At that
point, you might as well have ignored the NOMAP.

Linux's block IO is just not designed to do device DMA to random bits of
memory that are not part of the page allocator.

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
* Creating kernel mappings for memory initially marked with bootmem NOMAP?
@ 2017-03-08 19:26 Russell King - ARM Linux

From: Russell King - ARM Linux
To: linux-arm-kernel

On Wed, Mar 08, 2017 at 11:03:43AM -0800, Florian Fainelli wrote:
> Now, we have some peripherals that want large chunks of physically and
> virtually contiguous memory that belong to these memblock NOMAP ranges.
> I have no problems using mmap() against this memory, because the kernel
> will do what is necessary for a process to map it for me. The struggle
> is for a kernel driver which specifies a range of physical memory and
> size, and expects a virtually contiguous mapping in return (not using
> DMA-API, because reasons).

Will vm_iomap_memory() do the job?
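For the userspace mmap() side, vm_iomap_memory() is typically called from a
driver's .mmap file operation; a minimal sketch is below, where mydrv_mmap,
carveout_base, and carveout_size are hypothetical names, not code from the
thread:

```c
#include <linux/fs.h>
#include <linux/mm.h>

/* Hypothetical .mmap implementation exposing a physical carveout to
 * userspace. vm_iomap_memory() checks vma->vm_pgoff and the requested
 * length against the region's bounds before remapping it into the VMA. */
static int mydrv_mmap(struct file *file, struct vm_area_struct *vma)
{
	return vm_iomap_memory(vma, carveout_base, carveout_size);
}
```

This only covers process mappings, which (as noted in the follow-up below
this message in the archive) was not the part the original poster was
struggling with.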
* Creating kernel mappings for memory initially marked with bootmem NOMAP?
@ 2017-03-08 19:29 Russell King - ARM Linux

From: Russell King - ARM Linux
To: linux-arm-kernel

On Wed, Mar 08, 2017 at 07:26:53PM +0000, Russell King - ARM Linux wrote:
> [...]
>
> Will vm_iomap_memory() do the job?

Sorry, I thought you were asking about userspace. The memremap() family
of functions is what you want for mapping it into the kernel.

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/