* [PATCH 1/2] x86: don't unnecessarily call dma_alloc_from_contiguous()
@ 2014-09-28 15:52 Akinobu Mita
[not found] ` <1411919524-3666-1-git-send-email-akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Akinobu Mita @ 2014-09-28 15:52 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
Cc: Andi Kleen, Peter Hurley, Yinghai Lu, x86-DgEjT+Ai2ygdnm+yROfE0A,
Akinobu Mita, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Ingo Molnar, H. Peter Anvin, Thomas Gleixner, David Woodhouse
If CONFIG_DMA_CMA is enabled, dma_generic_alloc_coherent() tries to
allocate a memory region with dma_alloc_from_contiguous() before
falling back to alloc_pages().

This wastes the CMA region on small DMA-coherent buffers that
alloc_pages() could satisfy. It also causes a performance degradation,
as it drives _all_ DMA mapping allocations through a _very_ small
window, as reported by Peter Hurley.

Fix this by trying alloc_pages() first in dma_generic_alloc_coherent();
dma_alloc_from_contiguous() should only be called for huge allocations.
Signed-off-by: Akinobu Mita <akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Reported-by: Peter Hurley <peter-WaGBZJeGNqdsbIuE7sb01tBPR1lH4CV8@public.gmane.org>
Cc: Peter Hurley <peter-WaGBZJeGNqdsbIuE7sb01tBPR1lH4CV8@public.gmane.org>
Cc: Marek Szyprowski <m.szyprowski-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: David Woodhouse <dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Cc: Don Dutile <ddutile-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
Cc: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: "H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
Cc: Andi Kleen <andi-Vw/NltI1exuRpAAqCnN02g@public.gmane.org>
Cc: Yinghai Lu <yinghai-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
---
arch/x86/kernel/pci-dma.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index a25e202..0402266 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -99,20 +99,20 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
flag &= ~__GFP_ZERO;
again:
- page = NULL;
+ page = alloc_pages_node(dev_to_node(dev), flag | __GFP_NOWARN,
+ get_order(size));
/* CMA can be used only in the context which permits sleeping */
- if (flag & __GFP_WAIT) {
+ if (!page && (flag & __GFP_WAIT)) {
page = dma_alloc_from_contiguous(dev, count, get_order(size));
if (page && page_to_phys(page) + size > dma_mask) {
dma_release_from_contiguous(dev, page, count);
page = NULL;
}
}
- /* fallback */
- if (!page)
- page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
- if (!page)
+ if (!page) {
+ warn_alloc_failed(flag, get_order(size), NULL);
return NULL;
+ }
addr = page_to_phys(page);
if (addr + size > dma_mask) {
--
1.9.1
* [PATCH 2/2] intel-iommu: don't unnecessarily call dma_alloc_from_contiguous()
[not found] ` <1411919524-3666-1-git-send-email-akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-09-28 15:52 ` Akinobu Mita
0 siblings, 0 replies; 5+ messages in thread
From: Akinobu Mita @ 2014-09-28 15:52 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
Cc: Andi Kleen, Peter Hurley, Yinghai Lu, x86-DgEjT+Ai2ygdnm+yROfE0A,
Akinobu Mita, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Ingo Molnar, H. Peter Anvin, Thomas Gleixner, David Woodhouse
If CONFIG_DMA_CMA is enabled, intel_alloc_coherent() tries to
allocate a memory region with dma_alloc_from_contiguous() before
falling back to alloc_pages().

This wastes the CMA region on small DMA-coherent buffers that
alloc_pages() could satisfy. It also causes a performance degradation,
as it drives _all_ DMA mapping allocations through a _very_ small
window, as reported by Peter Hurley.

Fix this by trying alloc_pages() first in intel_alloc_coherent();
dma_alloc_from_contiguous() should only be called for huge allocations.
Signed-off-by: Akinobu Mita <akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Reported-by: Peter Hurley <peter-WaGBZJeGNqdsbIuE7sb01tBPR1lH4CV8@public.gmane.org>
Cc: Peter Hurley <peter-WaGBZJeGNqdsbIuE7sb01tBPR1lH4CV8@public.gmane.org>
Cc: Marek Szyprowski <m.szyprowski-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: David Woodhouse <dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Cc: Don Dutile <ddutile-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
Cc: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: "H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
Cc: Andi Kleen <andi-Vw/NltI1exuRpAAqCnN02g@public.gmane.org>
Cc: Yinghai Lu <yinghai-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
---
drivers/iommu/intel-iommu.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3441615..f9794fa 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3250,7 +3250,8 @@ static void *intel_alloc_coherent(struct device *dev, size_t size,
flags |= GFP_DMA32;
}
- if (flags & __GFP_WAIT) {
+ page = alloc_pages(flags | __GFP_NOWARN, order);
+ if (!page && (flags & __GFP_WAIT)) {
unsigned int count = size >> PAGE_SHIFT;
page = dma_alloc_from_contiguous(dev, count, order);
@@ -3261,10 +3262,10 @@ static void *intel_alloc_coherent(struct device *dev, size_t size,
}
}
- if (!page)
- page = alloc_pages(flags, order);
- if (!page)
+ if (!page) {
+ warn_alloc_failed(flags, order, NULL);
return NULL;
+ }
memset(page_address(page), 0, size);
*dma_handle = __intel_map_single(dev, page_to_phys(page), size,
--
1.9.1
* Re: [PATCH 1/2] x86: don't unnecessarily call dma_alloc_from_contiguous()
2014-09-28 15:52 [PATCH 1/2] x86: don't unnecessarily call dma_alloc_from_contiguous() Akinobu Mita
[not found] ` <1411919524-3666-1-git-send-email-akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-09-28 20:41 ` Chuck Ebbert
2014-09-28 20:45 ` Chuck Ebbert
2 siblings, 0 replies; 5+ messages in thread
From: Chuck Ebbert @ 2014-09-28 20:41 UTC (permalink / raw)
To: Akinobu Mita
Cc: linux-kernel, akpm, Peter Hurley, Marek Szyprowski,
Konrad Rzeszutek Wilk, David Woodhouse, Don Dutile,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
Yinghai Lu, x86, iommu
On Mon, 29 Sep 2014 00:52:03 +0900
Akinobu Mita <akinobu.mita@gmail.com> wrote:
> If CONFIG_DMA_CMA is enabled, dma_generic_alloc_coherent() tries to
> allocate a memory region with dma_alloc_from_contiguous() before
> falling back to alloc_pages().
>
> This wastes the CMA region on small DMA-coherent buffers that
> alloc_pages() could satisfy. It also causes a performance degradation,
> as it drives _all_ dma mapping allocations through a _very_ small
> window, as reported by Peter Hurley.
>
> Fix this by trying alloc_pages() first in dma_generic_alloc_coherent();
> dma_alloc_from_contiguous() should only be called for huge allocations.
>
> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> Reported-by: Peter Hurley <peter@hurleysoftware.com>
> Cc: Peter Hurley <peter@hurleysoftware.com>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: David Woodhouse <dwmw2@infradead.org>
> Cc: Don Dutile <ddutile@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Andi Kleen <andi@firstfloor.org>
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: x86@kernel.org
> Cc: iommu@lists.linux-foundation.org
> ---
> arch/x86/kernel/pci-dma.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index a25e202..0402266 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -99,20 +99,20 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
>
> flag &= ~__GFP_ZERO;
> again:
> - page = NULL;
> + page = alloc_pages_node(dev_to_node(dev), flag | __GFP_NOWARN,
> + get_order(size));
Shouldn't you be doing this only in the case where order is relatively
small? Like < PAGE_ALLOC_COSTLY_ORDER or so?
> /* CMA can be used only in the context which permits sleeping */
> - if (flag & __GFP_WAIT) {
> + if (!page && (flag & __GFP_WAIT)) {
> page = dma_alloc_from_contiguous(dev, count, get_order(size));
> if (page && page_to_phys(page) + size > dma_mask) {
> dma_release_from_contiguous(dev, page, count);
> page = NULL;
> }
> }
> - /* fallback */
> - if (!page)
> - page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
> - if (!page)
> + if (!page) {
> + warn_alloc_failed(flag, get_order(size), NULL);
> return NULL;
> + }
>
> addr = page_to_phys(page);
> if (addr + size > dma_mask) {
* Re: [PATCH 1/2] x86: don't unnecessarily call dma_alloc_from_contiguous()
2014-09-28 15:52 [PATCH 1/2] x86: don't unnecessarily call dma_alloc_from_contiguous() Akinobu Mita
[not found] ` <1411919524-3666-1-git-send-email-akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-09-28 20:41 ` [PATCH 1/2] x86: " Chuck Ebbert
@ 2014-09-28 20:45 ` Chuck Ebbert
2014-09-29 13:21 ` Akinobu Mita
2 siblings, 1 reply; 5+ messages in thread
From: Chuck Ebbert @ 2014-09-28 20:45 UTC (permalink / raw)
To: Akinobu Mita
Cc: linux-kernel, akpm, Peter Hurley, Marek Szyprowski,
Konrad Rzeszutek Wilk, David Woodhouse, Don Dutile,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
Yinghai Lu, x86, iommu
On Mon, 29 Sep 2014 00:52:03 +0900
Akinobu Mita <akinobu.mita@gmail.com> wrote:
> If CONFIG_DMA_CMA is enabled, dma_generic_alloc_coherent() tries to
> allocate a memory region with dma_alloc_from_contiguous() before
> falling back to alloc_pages().
>
> This wastes the CMA region on small DMA-coherent buffers that
> alloc_pages() could satisfy. It also causes a performance degradation,
> as it drives _all_ dma mapping allocations through a _very_ small
> window, as reported by Peter Hurley.
>
> Fix this by trying alloc_pages() first in dma_generic_alloc_coherent();
> dma_alloc_from_contiguous() should only be called for huge allocations.
>
> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> Reported-by: Peter Hurley <peter@hurleysoftware.com>
> Cc: Peter Hurley <peter@hurleysoftware.com>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: David Woodhouse <dwmw2@infradead.org>
> Cc: Don Dutile <ddutile@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Andi Kleen <andi@firstfloor.org>
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: x86@kernel.org
> Cc: iommu@lists.linux-foundation.org
> ---
> arch/x86/kernel/pci-dma.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index a25e202..0402266 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -99,20 +99,20 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
>
> flag &= ~__GFP_ZERO;
> again:
> - page = NULL;
> + page = alloc_pages_node(dev_to_node(dev), flag | __GFP_NOWARN,
> + get_order(size));
Only try small allocs here, like when order < PAGE_ALLOC_COSTLY_ORDER ?
> /* CMA can be used only in the context which permits sleeping */
> - if (flag & __GFP_WAIT) {
> + if (!page && (flag & __GFP_WAIT)) {
> page = dma_alloc_from_contiguous(dev, count, get_order(size));
> if (page && page_to_phys(page) + size > dma_mask) {
> dma_release_from_contiguous(dev, page, count);
> page = NULL;
> }
> }
> - /* fallback */
> - if (!page)
> - page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
(I forgot to add this in my first reply). I think it should try for a
small alloc without CMA first, then try CMA, and then this final
fallback for larger allocs.
> - if (!page)
> + if (!page) {
> + warn_alloc_failed(flag, get_order(size), NULL);
> return NULL;
> + }
>
> addr = page_to_phys(page);
> if (addr + size > dma_mask) {
* Re: [PATCH 1/2] x86: don't unnecessarily call dma_alloc_from_contiguous()
2014-09-28 20:45 ` Chuck Ebbert
@ 2014-09-29 13:21 ` Akinobu Mita
0 siblings, 0 replies; 5+ messages in thread
From: Akinobu Mita @ 2014-09-29 13:21 UTC (permalink / raw)
To: Chuck Ebbert
Cc: Andi Kleen, Peter Hurley, Yinghai Lu, x86-DgEjT+Ai2ygdnm+yROfE0A,
LKML, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Ingo Molnar, H. Peter Anvin, Andrew Morton, David Woodhouse,
Thomas Gleixner
2014-09-29 5:45 GMT+09:00 Chuck Ebbert <cebbert.lkml-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> On Mon, 29 Sep 2014 00:52:03 +0900
> Akinobu Mita <akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
>> If CONFIG_DMA_CMA is enabled, dma_generic_alloc_coherent() tries to
>> allocate a memory region with dma_alloc_from_contiguous() before
>> falling back to alloc_pages().
>>
>> This wastes the CMA region on small DMA-coherent buffers that
>> alloc_pages() could satisfy. It also causes a performance degradation,
>> as it drives _all_ dma mapping allocations through a _very_ small
>> window, as reported by Peter Hurley.
>>
>> Fix this by trying alloc_pages() first in dma_generic_alloc_coherent();
>> dma_alloc_from_contiguous() should only be called for huge allocations.
>>
>> Signed-off-by: Akinobu Mita <akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> Reported-by: Peter Hurley <peter-WaGBZJeGNqdsbIuE7sb01tBPR1lH4CV8@public.gmane.org>
>> Cc: Peter Hurley <peter-WaGBZJeGNqdsbIuE7sb01tBPR1lH4CV8@public.gmane.org>
>> Cc: Marek Szyprowski <m.szyprowski-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
>> Cc: Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>> Cc: David Woodhouse <dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
>> Cc: Don Dutile <ddutile-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> Cc: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
>> Cc: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> Cc: "H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
>> Cc: Andi Kleen <andi-Vw/NltI1exuRpAAqCnN02g@public.gmane.org>
>> Cc: Yinghai Lu <yinghai-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>> Cc: x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
>> Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
>> ---
>> arch/x86/kernel/pci-dma.c | 12 ++++++------
>> 1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
>> index a25e202..0402266 100644
>> --- a/arch/x86/kernel/pci-dma.c
>> +++ b/arch/x86/kernel/pci-dma.c
>> @@ -99,20 +99,20 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
>>
>> flag &= ~__GFP_ZERO;
>> again:
>> - page = NULL;
>> + page = alloc_pages_node(dev_to_node(dev), flag | __GFP_NOWARN,
>> + get_order(size));
>
> Only try small allocs here, like when order < PAGE_ALLOC_COSTLY_ORDER ?
>
>> /* CMA can be used only in the context which permits sleeping */
>> - if (flag & __GFP_WAIT) {
>> + if (!page && (flag & __GFP_WAIT)) {
>> page = dma_alloc_from_contiguous(dev, count, get_order(size));
>> if (page && page_to_phys(page) + size > dma_mask) {
>> dma_release_from_contiguous(dev, page, count);
>> page = NULL;
>> }
>> }
>> - /* fallback */
>> - if (!page)
>> - page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
>
> (I forgot to add this in my first reply). I think it should try for a
> small alloc without CMA first, then try CMA, and then this final
> fallback for larger allocs.
I'm concerned about the performance problem reported by Peter Hurley.
Your suggestion could be a solution, but I would like to hear Peter's
opinion. For now, I prefer the approach in this patch because it has
less impact when CONFIG_DMA_CMA is enabled; it can be improved later.