iommu.lists.linux-foundation.org archive mirror
* [PATCH 1/2] x86: don't unnecessarily call dma_alloc_from_contiguous()
@ 2014-09-28 15:52 Akinobu Mita
       [not found] ` <1411919524-3666-1-git-send-email-akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Akinobu Mita @ 2014-09-28 15:52 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
  Cc: Andi Kleen, Peter Hurley, Yinghai Lu, x86-DgEjT+Ai2ygdnm+yROfE0A,
	Akinobu Mita, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Ingo Molnar, H. Peter Anvin, Thomas Gleixner, David Woodhouse

If CONFIG_DMA_CMA is enabled, dma_generic_alloc_coherent() first tries
to allocate a memory region with dma_alloc_from_contiguous() before
falling back to alloc_pages().

This wastes the CMA region on small DMA-coherent buffers that could
just as well be allocated by alloc_pages().  It also causes the
performance degradation reported by Peter Hurley, as it drives _all_
DMA mapping allocations through a _very_ small window.

Fix this by trying alloc_pages() first in dma_generic_alloc_coherent();
dma_alloc_from_contiguous() should be called only for huge allocations.
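The reordering described above can be modeled outside the kernel as
follows.  This is a minimal standalone sketch, not kernel code:
`try_pages` and `try_cma` are illustrative stand-ins for
alloc_pages_node() and dma_alloc_from_contiguous(), and the GFP flags
are reduced to a single `can_sleep` bit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for alloc_pages_node() and
 * dma_alloc_from_contiguous(). */
typedef void *(*alloc_fn)(size_t size);

static void *coherent_alloc_order(alloc_fn try_pages, alloc_fn try_cma,
				  size_t size, bool can_sleep)
{
	void *p = try_pages(size);	/* first: plain page allocator */

	/* CMA can be used only in a context that permits sleeping,
	 * and now only as a fallback for what try_pages() rejected. */
	if (!p && can_sleep)
		p = try_cma(size);

	return p;	/* NULL here maps to warn_alloc_failed() + NULL */
}
```

The point of the ordering is visible in the control flow: CMA is never
consulted when the ordinary page allocator can satisfy the request.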

Signed-off-by: Akinobu Mita <akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Reported-by: Peter Hurley <peter-WaGBZJeGNqdsbIuE7sb01tBPR1lH4CV8@public.gmane.org>
Cc: Peter Hurley <peter-WaGBZJeGNqdsbIuE7sb01tBPR1lH4CV8@public.gmane.org>
Cc: Marek Szyprowski <m.szyprowski-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: David Woodhouse <dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Cc: Don Dutile <ddutile-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
Cc: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: "H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
Cc: Andi Kleen <andi-Vw/NltI1exuRpAAqCnN02g@public.gmane.org>
Cc: Yinghai Lu <yinghai-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
---
 arch/x86/kernel/pci-dma.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index a25e202..0402266 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -99,20 +99,20 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
 
 	flag &= ~__GFP_ZERO;
 again:
-	page = NULL;
+	page = alloc_pages_node(dev_to_node(dev), flag | __GFP_NOWARN,
+				get_order(size));
 	/* CMA can be used only in the context which permits sleeping */
-	if (flag & __GFP_WAIT) {
+	if (!page && (flag & __GFP_WAIT)) {
 		page = dma_alloc_from_contiguous(dev, count, get_order(size));
 		if (page && page_to_phys(page) + size > dma_mask) {
 			dma_release_from_contiguous(dev, page, count);
 			page = NULL;
 		}
 	}
-	/* fallback */
-	if (!page)
-		page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
-	if (!page)
+	if (!page) {
+		warn_alloc_failed(flag, get_order(size), NULL);
 		return NULL;
+	}
 
 	addr = page_to_phys(page);
 	if (addr + size > dma_mask) {
-- 
1.9.1


* [PATCH 2/2] intel-iommu: don't unnecessarily call dma_alloc_from_contiguous()
       [not found] ` <1411919524-3666-1-git-send-email-akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-09-28 15:52   ` Akinobu Mita
  0 siblings, 0 replies; 5+ messages in thread
From: Akinobu Mita @ 2014-09-28 15:52 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
  Cc: Andi Kleen, Peter Hurley, Yinghai Lu, x86-DgEjT+Ai2ygdnm+yROfE0A,
	Akinobu Mita, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Ingo Molnar, H. Peter Anvin, Thomas Gleixner, David Woodhouse

If CONFIG_DMA_CMA is enabled, intel_alloc_coherent() first tries to
allocate a memory region with dma_alloc_from_contiguous() before
falling back to alloc_pages().

This wastes the CMA region on small DMA-coherent buffers that could
just as well be allocated by alloc_pages().  It also causes the
performance degradation reported by Peter Hurley, as it drives _all_
DMA mapping allocations through a _very_ small window.

Fix this by trying alloc_pages() first in intel_alloc_coherent();
dma_alloc_from_contiguous() should be called only for huge allocations.

Signed-off-by: Akinobu Mita <akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Reported-by: Peter Hurley <peter-WaGBZJeGNqdsbIuE7sb01tBPR1lH4CV8@public.gmane.org>
Cc: Peter Hurley <peter-WaGBZJeGNqdsbIuE7sb01tBPR1lH4CV8@public.gmane.org>
Cc: Marek Szyprowski <m.szyprowski-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: David Woodhouse <dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Cc: Don Dutile <ddutile-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
Cc: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: "H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
Cc: Andi Kleen <andi-Vw/NltI1exuRpAAqCnN02g@public.gmane.org>
Cc: Yinghai Lu <yinghai-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
---
 drivers/iommu/intel-iommu.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3441615..f9794fa 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3250,7 +3250,8 @@ static void *intel_alloc_coherent(struct device *dev, size_t size,
 			flags |= GFP_DMA32;
 	}
 
-	if (flags & __GFP_WAIT) {
+	page = alloc_pages(flags | __GFP_NOWARN, order);
+	if (!page && (flags & __GFP_WAIT)) {
 		unsigned int count = size >> PAGE_SHIFT;
 
 		page = dma_alloc_from_contiguous(dev, count, order);
@@ -3261,10 +3262,10 @@ static void *intel_alloc_coherent(struct device *dev, size_t size,
 		}
 	}
 
-	if (!page)
-		page = alloc_pages(flags, order);
-	if (!page)
+	if (!page) {
+		warn_alloc_failed(flags, order, NULL);
 		return NULL;
+	}
 	memset(page_address(page), 0, size);
 
 	*dma_handle = __intel_map_single(dev, page_to_phys(page), size,
-- 
1.9.1


* Re: [PATCH 1/2] x86: don't unnecessarily call dma_alloc_from_contiguous()
  2014-09-28 15:52 [PATCH 1/2] x86: don't unnecessarily call dma_alloc_from_contiguous() Akinobu Mita
       [not found] ` <1411919524-3666-1-git-send-email-akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-09-28 20:41 ` Chuck Ebbert
  2014-09-28 20:45 ` Chuck Ebbert
  2 siblings, 0 replies; 5+ messages in thread
From: Chuck Ebbert @ 2014-09-28 20:41 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: linux-kernel, akpm, Peter Hurley, Marek Szyprowski,
	Konrad Rzeszutek Wilk, David Woodhouse, Don Dutile,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Yinghai Lu, x86, iommu

On Mon, 29 Sep 2014 00:52:03 +0900
Akinobu Mita <akinobu.mita@gmail.com> wrote:

> If CONFIG_DMA_CMA is enabled, dma_generic_alloc_coherent() first tries
> to allocate a memory region with dma_alloc_from_contiguous() before
> falling back to alloc_pages().
> 
> This wastes the CMA region on small DMA-coherent buffers that could
> just as well be allocated by alloc_pages().  It also causes the
> performance degradation reported by Peter Hurley, as it drives _all_
> DMA mapping allocations through a _very_ small window.
> 
> Fix this by trying alloc_pages() first in dma_generic_alloc_coherent();
> dma_alloc_from_contiguous() should be called only for huge allocations.
> 
> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> Reported-by: Peter Hurley <peter@hurleysoftware.com>
> Cc: Peter Hurley <peter@hurleysoftware.com>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: David Woodhouse <dwmw2@infradead.org>
> Cc: Don Dutile <ddutile@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Andi Kleen <andi@firstfloor.org>
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: x86@kernel.org
> Cc: iommu@lists.linux-foundation.org
> ---
>  arch/x86/kernel/pci-dma.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index a25e202..0402266 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -99,20 +99,20 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
>  
>  	flag &= ~__GFP_ZERO;
>  again:
> -	page = NULL;
> +	page = alloc_pages_node(dev_to_node(dev), flag | __GFP_NOWARN,
> +				get_order(size));

Shouldn't you be doing this only in the case where order is relatively
small? Like < PAGE_ALLOC_COSTLY_ORDER or so?
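For reference, the gating being suggested would look something like
the sketch below (illustrative only; the helper name is hypothetical,
but PAGE_ALLOC_COSTLY_ORDER really is 3 in the kernel).

```c
/* The kernel defines PAGE_ALLOC_COSTLY_ORDER as 3: allocations of
 * 8 or more contiguous pages are "costly" and likely to fail or to
 * trigger reclaim/compaction work. */
#define PAGE_ALLOC_COSTLY_ORDER 3

/* Suggested gate: only take the eager alloc_pages() path for cheap
 * orders; let costly orders go straight to the CMA path. */
static int should_try_pages_first(unsigned int order)
{
	return order < PAGE_ALLOC_COSTLY_ORDER;
}
```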

>  	/* CMA can be used only in the context which permits sleeping */
> -	if (flag & __GFP_WAIT) {
> +	if (!page && (flag & __GFP_WAIT)) {
>  		page = dma_alloc_from_contiguous(dev, count, get_order(size));
>  		if (page && page_to_phys(page) + size > dma_mask) {
>  			dma_release_from_contiguous(dev, page, count);
>  			page = NULL;
>  		}
>  	}
> -	/* fallback */
> -	if (!page)
> -		page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
> -	if (!page)
> +	if (!page) {
> +		warn_alloc_failed(flag, get_order(size), NULL);
>  		return NULL;
> +	}
>  
>  	addr = page_to_phys(page);
>  	if (addr + size > dma_mask) {


* Re: [PATCH 1/2] x86: don't unnecessarily call dma_alloc_from_contiguous()
  2014-09-28 15:52 [PATCH 1/2] x86: don't unnecessarily call dma_alloc_from_contiguous() Akinobu Mita
       [not found] ` <1411919524-3666-1-git-send-email-akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2014-09-28 20:41 ` [PATCH 1/2] x86: " Chuck Ebbert
@ 2014-09-28 20:45 ` Chuck Ebbert
  2014-09-29 13:21   ` Akinobu Mita
  2 siblings, 1 reply; 5+ messages in thread
From: Chuck Ebbert @ 2014-09-28 20:45 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: linux-kernel, akpm, Peter Hurley, Marek Szyprowski,
	Konrad Rzeszutek Wilk, David Woodhouse, Don Dutile,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Yinghai Lu, x86, iommu

On Mon, 29 Sep 2014 00:52:03 +0900
Akinobu Mita <akinobu.mita@gmail.com> wrote:

> If CONFIG_DMA_CMA is enabled, dma_generic_alloc_coherent() first tries
> to allocate a memory region with dma_alloc_from_contiguous() before
> falling back to alloc_pages().
> 
> This wastes the CMA region on small DMA-coherent buffers that could
> just as well be allocated by alloc_pages().  It also causes the
> performance degradation reported by Peter Hurley, as it drives _all_
> DMA mapping allocations through a _very_ small window.
> 
> Fix this by trying alloc_pages() first in dma_generic_alloc_coherent();
> dma_alloc_from_contiguous() should be called only for huge allocations.
> 
> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> Reported-by: Peter Hurley <peter@hurleysoftware.com>
> Cc: Peter Hurley <peter@hurleysoftware.com>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: David Woodhouse <dwmw2@infradead.org>
> Cc: Don Dutile <ddutile@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Andi Kleen <andi@firstfloor.org>
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: x86@kernel.org
> Cc: iommu@lists.linux-foundation.org
> ---
>  arch/x86/kernel/pci-dma.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index a25e202..0402266 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -99,20 +99,20 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
>  
>  	flag &= ~__GFP_ZERO;
>  again:
> -	page = NULL;
> +	page = alloc_pages_node(dev_to_node(dev), flag | __GFP_NOWARN,
> +				get_order(size));

Only try small allocs here, like when order < PAGE_ALLOC_COSTLY_ORDER ?

>  	/* CMA can be used only in the context which permits sleeping */
> -	if (flag & __GFP_WAIT) {
> +	if (!page && (flag & __GFP_WAIT)) {
>  		page = dma_alloc_from_contiguous(dev, count, get_order(size));
>  		if (page && page_to_phys(page) + size > dma_mask) {
>  			dma_release_from_contiguous(dev, page, count);
>  			page = NULL;
>  		}
>  	}
> -	/* fallback */
> -	if (!page)
> -		page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));

(I forgot to add this in my first reply.)  I think it should try a
small allocation without CMA first, then try CMA, and then this final
fallback for the larger allocations.
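Sketched as policy, the three-step ordering would be roughly this
(a standalone model, not kernel code; `pages` and `cma` stand in for
alloc_pages_node() and dma_alloc_from_contiguous()):

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_ALLOC_COSTLY_ORDER 3	/* kernel's "costly" threshold */

typedef void *(*alloc_fn)(unsigned int order);

/* The suggested ordering:
 *   1. small orders try the page allocator first, skipping CMA;
 *   2. anything still unsatisfied tries CMA (if it may sleep);
 *   3. a final page-allocator fallback, mainly for large orders. */
static void *three_step_alloc(alloc_fn pages, alloc_fn cma,
			      unsigned int order, bool can_sleep)
{
	void *p = NULL;

	if (order < PAGE_ALLOC_COSTLY_ORDER)
		p = pages(order);
	if (!p && can_sleep)
		p = cma(order);
	if (!p)
		p = pages(order);
	return p;
}
```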

> -	if (!page)
> +	if (!page) {
> +		warn_alloc_failed(flag, get_order(size), NULL);
>  		return NULL;
> +	}
>  
>  	addr = page_to_phys(page);
>  	if (addr + size > dma_mask) {


* Re: [PATCH 1/2] x86: don't unnecessarily call dma_alloc_from_contiguous()
  2014-09-28 20:45 ` Chuck Ebbert
@ 2014-09-29 13:21   ` Akinobu Mita
  0 siblings, 0 replies; 5+ messages in thread
From: Akinobu Mita @ 2014-09-29 13:21 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: Andi Kleen, Peter Hurley, Yinghai Lu, x86-DgEjT+Ai2ygdnm+yROfE0A,
	LKML, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Ingo Molnar, H. Peter Anvin, Andrew Morton, David Woodhouse,
	Thomas Gleixner

2014-09-29 5:45 GMT+09:00 Chuck Ebbert <cebbert.lkml-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> On Mon, 29 Sep 2014 00:52:03 +0900
> Akinobu Mita <akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
>> If CONFIG_DMA_CMA is enabled, dma_generic_alloc_coherent() first tries
>> to allocate a memory region with dma_alloc_from_contiguous() before
>> falling back to alloc_pages().
>>
>> This wastes the CMA region on small DMA-coherent buffers that could
>> just as well be allocated by alloc_pages().  It also causes the
>> performance degradation reported by Peter Hurley, as it drives _all_
>> DMA mapping allocations through a _very_ small window.
>>
>> Fix this by trying alloc_pages() first in dma_generic_alloc_coherent();
>> dma_alloc_from_contiguous() should be called only for huge allocations.
>>
>> Signed-off-by: Akinobu Mita <akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> Reported-by: Peter Hurley <peter-WaGBZJeGNqdsbIuE7sb01tBPR1lH4CV8@public.gmane.org>
>> Cc: Peter Hurley <peter-WaGBZJeGNqdsbIuE7sb01tBPR1lH4CV8@public.gmane.org>
>> Cc: Marek Szyprowski <m.szyprowski-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
>> Cc: Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>> Cc: David Woodhouse <dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
>> Cc: Don Dutile <ddutile-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> Cc: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
>> Cc: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> Cc: "H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
>> Cc: Andi Kleen <andi-Vw/NltI1exuRpAAqCnN02g@public.gmane.org>
>> Cc: Yinghai Lu <yinghai-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>> Cc: x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
>> Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
>> ---
>>  arch/x86/kernel/pci-dma.c | 12 ++++++------
>>  1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
>> index a25e202..0402266 100644
>> --- a/arch/x86/kernel/pci-dma.c
>> +++ b/arch/x86/kernel/pci-dma.c
>> @@ -99,20 +99,20 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
>>
>>       flag &= ~__GFP_ZERO;
>>  again:
>> -     page = NULL;
>> +     page = alloc_pages_node(dev_to_node(dev), flag | __GFP_NOWARN,
>> +                             get_order(size));
>
> Only try small allocs here, like when order < PAGE_ALLOC_COSTLY_ORDER ?
>
>>       /* CMA can be used only in the context which permits sleeping */
>> -     if (flag & __GFP_WAIT) {
>> +     if (!page && (flag & __GFP_WAIT)) {
>>               page = dma_alloc_from_contiguous(dev, count, get_order(size));
>>               if (page && page_to_phys(page) + size > dma_mask) {
>>                       dma_release_from_contiguous(dev, page, count);
>>                       page = NULL;
>>               }
>>       }
>> -     /* fallback */
>> -     if (!page)
>> -             page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
>
> (I forgot to add this in my first reply). I think it should try for a
> small alloc without CMA first, then try CMA, and then this final
> fallback for larger allocs.

I'm concerned about the performance problem reported by Peter Hurley.
This could be a solution, but I would like to hear Peter's opinion.

For now, I prefer the approach in this patch because it has less
impact when CONFIG_DMA_CMA is enabled.  But it can be improved later on.

