* AArch64 memory
@ 2018-05-17 15:58 Tim Harvey
From: Tim Harvey @ 2018-05-17 15:58 UTC (permalink / raw)
To: linux-arm-kernel
Greetings,
I'm trying to understand some details of the AArch64 memory
configuration in the kernel.
I've looked at Documentation/arm64/memory.txt which describes the
virtual memory layout used in terms of translation levels. This
relates to CONFIG_ARM64_{4K,16K,64K}_PAGES and CONFIG_ARM64_VA_BITS*.
My first question has to do with virtual memory layout: What are the
advantages and disadvantages for a system with a fixed 2GB of DRAM
when using 4KB pages + 3 levels (CONFIG_ARM64_4K_PAGES=y
CONFIG_ARM64_VA_BITS=39) resulting in 512GB user / 512GB kernel vs
using 64KB pages + 3 levels (CONFIG_ARM64_64K_PAGES=y
CONFIG_ARM64_VA_BITS=48)? The physical memory is far less than what
any of these combinations can address, but I'm not clear whether the
number of translation levels affects performance or how fragmentation
could play into it.
My second question has to do with CMA and coherent_pool. I have
understood CMA as being a chunk of physical memory carved out by the
kernel for allocations from dma_alloc_coherent by drivers that need
chunks of contiguous memory for DMA buffers. I believe that before CMA
was introduced we had to do this by defining memory holes. I'm not
understanding the difference between CMA and the coherent pool. I've
noticed that if CONFIG_DMA_CMA=y then the coherent pool allocates from
CMA. Is there some disadvantage of CONFIG_DMA_CMA=y other than if
defined you need to make sure your CMA is larger than coherent_pool?
What drivers/calls use coherent_pool vs cma?
Best Regards,
Tim
* AArch64 memory
From: Robin Murphy @ 2018-05-18 11:59 UTC (permalink / raw)
To: linux-arm-kernel

Hi Tim,

On 17/05/18 16:58, Tim Harvey wrote:
> Greetings,
>
> I'm trying to understand some details of the AArch64 memory configuration in the kernel.
>
> I've looked at Documentation/arm64/memory.txt which describes the virtual memory layout used in terms of translation levels. This relates to CONFIG_ARM64_{4K,16K,64K}_PAGES and CONFIG_ARM64_VA_BITS*.
>
> My first question has to do with virtual memory layout: What are the advantages and disadvantages for a system with a fixed 2GB of DRAM when using 4KB pages + 3 levels (CONFIG_ARM64_4K_PAGES=y CONFIG_ARM64_VA_BITS=39) resulting in 512GB user / 512GB kernel vs using 64KB pages + 3 levels (CONFIG_ARM64_64K_PAGES=y CONFIG_ARM64_VA_BITS=48)? The physical memory is far less than what any of these combinations can address, but I'm not clear whether the number of translation levels affects performance or how fragmentation could play into it.

There have been a number of discussions on the lists about the general topic in the contexts of several architectures, and I'm sure the last one I saw regarding arm64 actually had some measurements in it, although it's proving remarkably tricky to actually dig up again this morning :/

I think the rough executive summary remains that for certain memory-intensive workloads on AArch64, 64K pages *can* give a notable performance benefit in terms of reduced TLB pressure (and potentially also some for TLB miss overhead with 42-bit VA and 2-level tables). The (major) tradeoff is that for most other workloads, including much of the kernel itself, the increased allocation granularity leads to a significant increase in wasted RAM.

My gut feeling is that if you have relatively limited RAM and don't know otherwise, then 39-bit VA is probably the way to go - notably, there are also still drivers/filesystems/etc. which don't play too well with PAGE_SIZE != 4096 - but I'm by no means an expert in this area. If you're targeting a particular application area (e.g. networking) and can benchmark some representative workloads to look at performance vs. RAM usage for different configs, that would probably help inform your decision the most.

> My second question has to do with CMA and coherent_pool. I have understood CMA as being a chunk of physical memory carved out by the kernel for allocations from dma_alloc_coherent by drivers that need chunks of contiguous memory for DMA buffers. I believe that before CMA was introduced we had to do this by defining memory holes. I'm not understanding the difference between CMA and the coherent pool. I've noticed that if CONFIG_DMA_CMA=y then the coherent pool allocates from CMA. Is there some disadvantage of CONFIG_DMA_CMA=y other than if defined you need to make sure your CMA is larger than coherent_pool? What drivers/calls use coherent_pool vs cma?

coherent_pool is a special thing which exists for the sake of non-hardware-coherent devices - normally for those we satisfy DMA-coherent allocations by setting up a non-cacheable remap of the allocated buffer - see dma_common_contiguous_remap().
However, drivers may call dma_alloc_coherent(..., GFP_ATOMIC) from interrupt handlers, at which point we can't call get_vm_area() to remap on demand, since that might sleep, so we reserve a pool pre-mapped as non-cacheable to satisfy such atomic allocations from. I'm not sure why its user-visible name is "coherent pool" rather than the more descriptive "atomic pool" which it's named internally, but there's probably some history there. If you're lucky enough not to have any non-coherent DMA masters then you can safely ignore the whole thing; otherwise it's still generally rare that it should need adjusting.

CMA is, as you surmise, a much more general thing for providing large physically-contiguous areas, which the arch code correspondingly uses to get DMA-contiguous buffers. Unless all your DMA masters are behind IOMMUs (such that we can make any motley collection of pages look DMA-contiguous), you probably don't want to turn it off. None of these details should be relevant as far as drivers are concerned, since from their viewpoint it's all abstracted behind dma_alloc_coherent().

Robin.
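For illustration, here is a minimal driver-style sketch - hypothetical function names and buffer sizes, not taken from any driver discussed in this thread - contrasting the two allocation contexts described above: a process-context allocation, which the DMA layer is free to remap on demand, and an interrupt-context allocation, which cannot sleep and therefore has to be satisfied from the pre-mapped pool on a non-coherent device:

    #include <linux/dma-mapping.h>
    #include <linux/interrupt.h>
    #include <linux/sizes.h>

    /* Process context: GFP_KERNEL may sleep, so a non-cacheable remap can
     * be set up on demand for a non-coherent device. */
    static int example_alloc_ring(struct device *dev, void **cpu, dma_addr_t *dma)
    {
            *cpu = dma_alloc_coherent(dev, SZ_64K, dma, GFP_KERNEL);
            return *cpu ? 0 : -ENOMEM;
    }

    /* Interrupt context: GFP_ATOMIC must not sleep, so for a non-coherent
     * device this allocation comes from the pre-mapped coherent_pool. */
    static irqreturn_t example_irq(int irq, void *data)
    {
            struct device *dev = data;
            dma_addr_t dma;
            void *buf = dma_alloc_coherent(dev, SZ_4K, &dma, GFP_ATOMIC);

            if (!buf)
                    return IRQ_HANDLED;     /* pool exhausted - drop this event */
            /* ... hand 'dma' to the hardware, then free when done ... */
            dma_free_coherent(dev, SZ_4K, buf, dma);
            return IRQ_HANDLED;
    }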
* AArch64 memory
From: Tim Harvey @ 2018-05-18 16:43 UTC (permalink / raw)
To: linux-arm-kernel

On Fri, May 18, 2018 at 4:59 AM, Robin Murphy <robin.murphy@arm.com> wrote:
> Hi Tim,
>
> On 17/05/18 16:58, Tim Harvey wrote:
>> Greetings,
>>
>> I'm trying to understand some details of the AArch64 memory configuration in the kernel.
>>
>> I've looked at Documentation/arm64/memory.txt which describes the virtual memory layout used in terms of translation levels. This relates to CONFIG_ARM64_{4K,16K,64K}_PAGES and CONFIG_ARM64_VA_BITS*.
>>
>> My first question has to do with virtual memory layout: What are the advantages and disadvantages for a system with a fixed 2GB of DRAM when using 4KB pages + 3 levels (CONFIG_ARM64_4K_PAGES=y CONFIG_ARM64_VA_BITS=39) resulting in 512GB user / 512GB kernel vs using 64KB pages + 3 levels (CONFIG_ARM64_64K_PAGES=y CONFIG_ARM64_VA_BITS=48)? The physical memory is far less than what any of these combinations can address, but I'm not clear whether the number of translation levels affects performance or how fragmentation could play into it.
>
> There have been a number of discussions on the lists about the general topic in the contexts of several architectures, and I'm sure the last one I saw regarding arm64 actually had some measurements in it, although it's proving remarkably tricky to actually dig up again this morning :/
>
> I think the rough executive summary remains that for certain memory-intensive workloads on AArch64, 64K pages *can* give a notable performance benefit in terms of reduced TLB pressure (and potentially also some for TLB miss overhead with 42-bit VA and 2-level tables). The (major) tradeoff is that for most other workloads, including much of the kernel itself, the increased allocation granularity leads to a significant increase in wasted RAM.
>
> My gut feeling is that if you have relatively limited RAM and don't know otherwise, then 39-bit VA is probably the way to go - notably, there are also still drivers/filesystems/etc. which don't play too well with PAGE_SIZE != 4096 - but I'm by no means an expert in this area. If you're targeting a particular application area (e.g. networking) and can benchmark some representative workloads to look at performance vs. RAM usage for different configs, that would probably help inform your decision the most.

Robin,

Thanks for the explanation - this makes sense and I understand that it's not easy to determine what is best. I'll do some tests with the boards I'm working with (which are Cavium Octeon-TX CN80XX quad-core 1.5GHz boards with 1MB L2 cache and 2GB 32bit DDR4 with up to 5x GbE).

>> My second question has to do with CMA and coherent_pool. I have understood CMA as being a chunk of physical memory carved out by the kernel for allocations from dma_alloc_coherent by drivers that need chunks of contiguous memory for DMA buffers. I believe that before CMA was introduced we had to do this by defining memory holes. I'm not understanding the difference between CMA and the coherent pool. I've noticed that if CONFIG_DMA_CMA=y then the coherent pool allocates from CMA. Is there some disadvantage of CONFIG_DMA_CMA=y other than if defined you need to make sure your CMA is larger than coherent_pool? What drivers/calls use coherent_pool vs cma?
> coherent_pool is a special thing which exists for the sake of non-hardware-coherent devices - normally for those we satisfy DMA-coherent allocations by setting up a non-cacheable remap of the allocated buffer - see dma_common_contiguous_remap(). However, drivers may call dma_alloc_coherent(..., GFP_ATOMIC) from interrupt handlers, at which point we can't call get_vm_area() to remap on demand, since that might sleep, so we reserve a pool pre-mapped as non-cacheable to satisfy such atomic allocations from. I'm not sure why its user-visible name is "coherent pool" rather than the more descriptive "atomic pool" which it's named internally, but there's probably some history there. If you're lucky enough not to have any non-coherent DMA masters then you can safely ignore the whole thing; otherwise it's still generally rare that it should need adjusting.

Is there an easy way to tell if I have non-coherent DMA masters? The Cavium SDK uses a kernel cmdline param of coherent_pool=16M so I'm guessing something in the CN80XX/CN81XX (BGX NICs or CPT perhaps) needs atomic pool mem.

> CMA is, as you surmise, a much more general thing for providing large physically-contiguous areas, which the arch code correspondingly uses to get DMA-contiguous buffers. Unless all your DMA masters are behind IOMMUs (such that we can make any motley collection of pages look DMA-contiguous), you probably don't want to turn it off. None of these details should be relevant as far as drivers are concerned, since from their viewpoint it's all abstracted behind dma_alloc_coherent().

I don't want to turn off CONFIG_CMA but I'm still not clear if I should turn off CONFIG_DMA_CMA. I noticed the Cavium SDK 4.9 kernel has CONFIG_CMA=y but does not enable CONFIG_DMA_CMA, which I believe means that the atomic pool does not pull its chunks from the CMA pool.

Thanks,

Tim
* AArch64 memory
From: Robin Murphy @ 2018-05-18 18:15 UTC (permalink / raw)
To: linux-arm-kernel

On 18/05/18 17:43, Tim Harvey wrote:
[...]
>>> My second question has to do with CMA and coherent_pool. I have understood CMA as being a chunk of physical memory carved out by the kernel for allocations from dma_alloc_coherent by drivers that need chunks of contiguous memory for DMA buffers. I believe that before CMA was introduced we had to do this by defining memory holes. I'm not understanding the difference between CMA and the coherent pool. I've noticed that if CONFIG_DMA_CMA=y then the coherent pool allocates from CMA. Is there some disadvantage of CONFIG_DMA_CMA=y other than if defined you need to make sure your CMA is larger than coherent_pool? What drivers/calls use coherent_pool vs cma?
>>
>> coherent_pool is a special thing which exists for the sake of non-hardware-coherent devices - normally for those we satisfy DMA-coherent allocations by setting up a non-cacheable remap of the allocated buffer - see dma_common_contiguous_remap(). However, drivers may call dma_alloc_coherent(..., GFP_ATOMIC) from interrupt handlers, at which point we can't call get_vm_area() to remap on demand, since that might sleep, so we reserve a pool pre-mapped as non-cacheable to satisfy such atomic allocations from. I'm not sure why its user-visible name is "coherent pool" rather than the more descriptive "atomic pool" which it's named internally, but there's probably some history there. If you're lucky enough not to have any non-coherent DMA masters then you can safely ignore the whole thing; otherwise it's still generally rare that it should need adjusting.
>
> Is there an easy way to tell if I have non-coherent DMA masters? The Cavium SDK uses a kernel cmdline param of coherent_pool=16M so I'm guessing something in the CN80XX/CN81XX (BGX NICs or CPT perhaps) needs atomic pool mem.

AFAIK the big-boy CN88xx is fully coherent everywhere, but whether the peripherals and interconnect in the littler Octeon TX variants are different I have no idea. If the contents of your dts-newport repo on GitHub are the right thing to be looking at, then you do have the "dma-coherent" property on the PCI nodes, which should cover everything beneath (I'd expect that in reality the SMMU may actually be coherent as well, but fortunately that's irrelevant here). Thus everything which matters *should* be being picked up as coherent already, and if not it would be a Linux problem. I can't imagine what the SDK is up to there, but 16MB of coherent pool does sound like something being done wrong, like incorrectly compensating for bad firmware failing to describe the hardware properly in the first place.

>> CMA is, as you surmise, a much more general thing for providing large physically-contiguous areas, which the arch code correspondingly uses to get DMA-contiguous buffers. Unless all your DMA masters are behind IOMMUs (such that we can make any motley collection of pages look DMA-contiguous), you probably don't want to turn it off. None of these details should be relevant as far as drivers are concerned, since from their viewpoint it's all abstracted behind dma_alloc_coherent().
>
> I don't want to turn off CONFIG_CMA but I'm still not clear if I should turn off CONFIG_DMA_CMA.
> I noticed the Cavium SDK 4.9 kernel has CONFIG_CMA=y but does not enable CONFIG_DMA_CMA, which I believe means that the atomic pool does not pull its chunks from the CMA pool.

I wouldn't think there's much good reason to turn DMA_CMA off either, even if nothing actually needs huge DMA buffers. Where the atomic pool comes from shouldn't really matter, as it's a very early one-off allocation. To speculate wildly I suppose there *might* possibly be some performance difference between cma_alloc() and falling back to the regular page allocator - if that were the case it ought to be measurable by profiling something which calls dma_alloc_coherent() in process context a lot, under both configurations. Even then I'd imagine it's something that would matter most on the 2-socket 96-core systems, and not so much on the diddy ones.

Robin.
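As a rough aid to the "how can I tell" question above, here is a hedged sketch (hypothetical probe function, not from any driver in this thread) of how the DT "dma-coherent" property can be inspected from kernel code; on DT-based systems this is essentially the information the DMA core uses when deciding whether a device is hardware-coherent:

    #include <linux/of.h>
    #include <linux/of_address.h>
    #include <linux/platform_device.h>

    /* Hypothetical probe for a DT-described platform device. Note that
     * of_dma_is_coherent() checks the given node and its parents, which is
     * why a "dma-coherent" property on the PCI host controller node covers
     * every master behind it; for PCI devices the DMA core performs the
     * equivalent walk via the host bridge's node. */
    static int example_probe(struct platform_device *pdev)
    {
            struct device *dev = &pdev->dev;

            if (of_dma_is_coherent(dev->of_node))
                    dev_info(dev, "DT marks this master as hardware-coherent\n");
            else
                    dev_info(dev, "no dma-coherent property; non-cacheable remaps and coherent_pool apply\n");

            return 0;
    }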
* AArch64 memory
From: Tim Harvey @ 2018-05-18 18:49 UTC (permalink / raw)
To: linux-arm-kernel

On Fri, May 18, 2018 at 11:15 AM, Robin Murphy <robin.murphy@arm.com> wrote:
> On 18/05/18 17:43, Tim Harvey wrote:
> [...]
>
>>>> My second question has to do with CMA and coherent_pool. I have understood CMA as being a chunk of physical memory carved out by the kernel for allocations from dma_alloc_coherent by drivers that need chunks of contiguous memory for DMA buffers. I believe that before CMA was introduced we had to do this by defining memory holes. I'm not understanding the difference between CMA and the coherent pool. I've noticed that if CONFIG_DMA_CMA=y then the coherent pool allocates from CMA. Is there some disadvantage of CONFIG_DMA_CMA=y other than if defined you need to make sure your CMA is larger than coherent_pool? What drivers/calls use coherent_pool vs cma?
>>>
>>> coherent_pool is a special thing which exists for the sake of non-hardware-coherent devices - normally for those we satisfy DMA-coherent allocations by setting up a non-cacheable remap of the allocated buffer - see dma_common_contiguous_remap(). However, drivers may call dma_alloc_coherent(..., GFP_ATOMIC) from interrupt handlers, at which point we can't call get_vm_area() to remap on demand, since that might sleep, so we reserve a pool pre-mapped as non-cacheable to satisfy such atomic allocations from. I'm not sure why its user-visible name is "coherent pool" rather than the more descriptive "atomic pool" which it's named internally, but there's probably some history there. If you're lucky enough not to have any non-coherent DMA masters then you can safely ignore the whole thing; otherwise it's still generally rare that it should need adjusting.
>>
>> Is there an easy way to tell if I have non-coherent DMA masters? The Cavium SDK uses a kernel cmdline param of coherent_pool=16M so I'm guessing something in the CN80XX/CN81XX (BGX NICs or CPT perhaps) needs atomic pool mem.
>
> AFAIK the big-boy CN88xx is fully coherent everywhere, but whether the peripherals and interconnect in the littler Octeon TX variants are different I have no idea. If the contents of your dts-newport repo on GitHub are the right thing to be looking at, then you do have the "dma-coherent" property on the PCI nodes, which should cover everything beneath (I'd expect that in reality the SMMU may actually be coherent as well, but fortunately that's irrelevant here). Thus everything which matters *should* be being picked up as coherent already, and if not it would be a Linux problem. I can't imagine what the SDK is up to there, but 16MB of coherent pool does sound like something being done wrong, like incorrectly compensating for bad firmware failing to describe the hardware properly in the first place.

Yes, https://github.com/Gateworks/dts-newport/ is the board that I'm working with :)

Ok, I think I understand now that the dma-coherent property on the PCI host controller is saying that all allocations by PCI device drivers will come from the atomic pool defined by coherent_pool=.

Why does coherent_pool=16M seem wrong to you?
>>> CMA is, as you surmise, a much more general thing for providing large physically-contiguous areas, which the arch code correspondingly uses to get DMA-contiguous buffers. Unless all your DMA masters are behind IOMMUs (such that we can make any motley collection of pages look DMA-contiguous), you probably don't want to turn it off. None of these details should be relevant as far as drivers are concerned, since from their viewpoint it's all abstracted behind dma_alloc_coherent().
>>
>> I don't want to turn off CONFIG_CMA but I'm still not clear if I should turn off CONFIG_DMA_CMA. I noticed the Cavium SDK 4.9 kernel has CONFIG_CMA=y but does not enable CONFIG_DMA_CMA, which I believe means that the atomic pool does not pull its chunks from the CMA pool.
>
> I wouldn't think there's much good reason to turn DMA_CMA off either, even if nothing actually needs huge DMA buffers. Where the atomic pool comes from shouldn't really matter, as it's a very early one-off allocation. To speculate wildly I suppose there *might* possibly be some performance difference between cma_alloc() and falling back to the regular page allocator - if that were the case it ought to be measurable by profiling something which calls dma_alloc_coherent() in process context a lot, under both configurations. Even then I'd imagine it's something that would matter most on the 2-socket 96-core systems, and not so much on the diddy ones.

If you enable DMA_CMA then you have to make sure to size CMA large enough to handle coherent_pool (and any additional CMA you will need). I made the mistake of setting CONFIG_CMA_SIZE_MBYTES=16 then passing in coherent_pool=64M, which causes the coherent pool DMA allocation to fail, and I'm not clear whether that even has an impact on the system. It seems to me that the kernel should perhaps catch the case where CMA < coherent_pool when CONFIG_DMA_CMA=y and either warn about that condition or bump CMA to at least coherent_pool to resolve it.

Tim
* AArch64 memory
From: Robin Murphy @ 2018-05-18 20:59 UTC (permalink / raw)
To: linux-arm-kernel

On Fri, 18 May 2018 11:49:05 -0700 Tim Harvey <tharvey@gateworks.com> wrote:
> On Fri, May 18, 2018 at 11:15 AM, Robin Murphy <robin.murphy@arm.com> wrote:
> > On 18/05/18 17:43, Tim Harvey wrote:
> > [...]
> >
> >>>> My second question has to do with CMA and coherent_pool. I have understood CMA as being a chunk of physical memory carved out by the kernel for allocations from dma_alloc_coherent by drivers that need chunks of contiguous memory for DMA buffers. I believe that before CMA was introduced we had to do this by defining memory holes. I'm not understanding the difference between CMA and the coherent pool. I've noticed that if CONFIG_DMA_CMA=y then the coherent pool allocates from CMA. Is there some disadvantage of CONFIG_DMA_CMA=y other than if defined you need to make sure your CMA is larger than coherent_pool? What drivers/calls use coherent_pool vs cma?
> >>>
> >>> coherent_pool is a special thing which exists for the sake of non-hardware-coherent devices - normally for those we satisfy DMA-coherent allocations by setting up a non-cacheable remap of the allocated buffer - see dma_common_contiguous_remap(). However, drivers may call dma_alloc_coherent(..., GFP_ATOMIC) from interrupt handlers, at which point we can't call get_vm_area() to remap on demand, since that might sleep, so we reserve a pool pre-mapped as non-cacheable to satisfy such atomic allocations from. I'm not sure why its user-visible name is "coherent pool" rather than the more descriptive "atomic pool" which it's named internally, but there's probably some history there. If you're lucky enough not to have any non-coherent DMA masters then you can safely ignore the whole thing; otherwise it's still generally rare that it should need adjusting.
> >>
> >> Is there an easy way to tell if I have non-coherent DMA masters? The Cavium SDK uses a kernel cmdline param of coherent_pool=16M so I'm guessing something in the CN80XX/CN81XX (BGX NICs or CPT perhaps) needs atomic pool mem.
> >
> > AFAIK the big-boy CN88xx is fully coherent everywhere, but whether the peripherals and interconnect in the littler Octeon TX variants are different I have no idea. If the contents of your dts-newport repo on GitHub are the right thing to be looking at, then you do have the "dma-coherent" property on the PCI nodes, which should cover everything beneath (I'd expect that in reality the SMMU may actually be coherent as well, but fortunately that's irrelevant here). Thus everything which matters *should* be being picked up as coherent already, and if not it would be a Linux problem. I can't imagine what the SDK is up to there, but 16MB of coherent pool does sound like something being done wrong, like incorrectly compensating for bad firmware failing to describe the hardware properly in the first place.
> Yes, https://github.com/Gateworks/dts-newport/ is the board that I'm working with :)
>
> Ok, I think I understand now that the dma-coherent property on the PCI host controller is saying that all allocations by PCI device drivers will come from the atomic pool defined by coherent_pool=.

No no, quite the opposite! With that property present, all the devices should be treated as hardware-coherent, meaning that CPU accesses to DMA buffers can be via the regular (cacheable) kernel address, and the non-cacheable remaps aren't necessary. Thus *nothing* will be touching the atomic pool at all.

> Why does coherent_pool=16M seem wrong to you?

Because it's two hundred and fifty-six times the default value, and atomic allocations should be very rare to begin with. IOW it stinks of badly-written drivers.

> >>> CMA is, as you surmise, a much more general thing for providing large physically-contiguous areas, which the arch code correspondingly uses to get DMA-contiguous buffers. Unless all your DMA masters are behind IOMMUs (such that we can make any motley collection of pages look DMA-contiguous), you probably don't want to turn it off. None of these details should be relevant as far as drivers are concerned, since from their viewpoint it's all abstracted behind dma_alloc_coherent().
> >>
> >> I don't want to turn off CONFIG_CMA but I'm still not clear if I should turn off CONFIG_DMA_CMA. I noticed the Cavium SDK 4.9 kernel has CONFIG_CMA=y but does not enable CONFIG_DMA_CMA, which I believe means that the atomic pool does not pull its chunks from the CMA pool.
> >
> > I wouldn't think there's much good reason to turn DMA_CMA off either, even if nothing actually needs huge DMA buffers. Where the atomic pool comes from shouldn't really matter, as it's a very early one-off allocation. To speculate wildly I suppose there *might* possibly be some performance difference between cma_alloc() and falling back to the regular page allocator - if that were the case it ought to be measurable by profiling something which calls dma_alloc_coherent() in process context a lot, under both configurations. Even then I'd imagine it's something that would matter most on the 2-socket 96-core systems, and not so much on the diddy ones.
>
> If you enable DMA_CMA then you have to make sure to size CMA large enough to handle coherent_pool (and any additional CMA you will need). I made the mistake of setting CONFIG_CMA_SIZE_MBYTES=16 then passing in coherent_pool=64M, which causes the coherent pool DMA allocation to fail, and I'm not clear whether that even has an impact on the system. It seems to me that the kernel should perhaps catch the case where CMA < coherent_pool when CONFIG_DMA_CMA=y and either warn about that condition or bump CMA to at least coherent_pool to resolve it.

Unfortunately that's not really practical - the default DMA_CMA region is pulled out of memblock way early by generic code, while the atomic pool is an Arm-specific thing which only comes into the picture much later. Users already get a warning when creating the atomic pool fails, so if they really want to go to crazy town with command-line values they can always just reboot with "cma=<bigger>" as well (and without CMA you're way beyond MAX_ORDER with those kind of sizes anyway).

Robin.
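To make that sizing relationship concrete - the figures below are purely illustrative, not recommendations from this thread - with CONFIG_DMA_CMA=y the atomic pool is carved out of the CMA region, so the two command-line values need to stay consistent:

    cma=64M coherent_pool=2M     (pool fits comfortably inside the CMA region)
    cma=16M coherent_pool=64M    (pool cannot be carved out of CMA; the kernel
                                  logs a warning at boot, and atomic allocations
                                  for non-coherent devices may later fail)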