[PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
@ 2008-09-10 11:19 FUJITA Tomonori
  2008-09-10 11:57 ` Ingo Molnar
  2008-09-10 12:03 ` Joerg Roedel
  0 siblings, 2 replies; 23+ messages in thread
From: FUJITA Tomonori @ 2008-09-10 11:19 UTC (permalink / raw)
  To: joerg.roedel; +Cc: linux-kernel, mingo

This is for tip/iommu.

btw, Joery, what happens if map_sg (or map_single) gets
not-DMA-capable buffer in the case of !iommu || !domain?


==
From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Subject: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent

x86's common alloc_coherent (dma_alloc_coherent in dma-mapping.h) sets
up the gfp flag according to the device dma_mask but AMD IOMMU doesn't
need it for devices that the IOMMU can do virtual mappings for. This
patch avoids unnecessary low zone allocation.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
 arch/x86/kernel/amd_iommu.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/amd_iommu.c b/arch/x86/kernel/amd_iommu.c
index 01c68c3..8efd249 100644
--- a/arch/x86/kernel/amd_iommu.c
+++ b/arch/x86/kernel/amd_iommu.c
@@ -999,6 +999,11 @@ static void *alloc_coherent(struct device *dev, size_t size,
 	u16 devid;
 	phys_addr_t paddr;
 
+	get_device_resources(dev, &iommu, &domain, &devid);
+
+	if (iommu && domain)
+		flag &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
+
 	virt_addr = (void *)__get_free_pages(flag, get_order(size));
 	if (!virt_addr)
 		return 0;
@@ -1006,8 +1011,6 @@ static void *alloc_coherent(struct device *dev, size_t size,
 	memset(virt_addr, 0, size);
 	paddr = virt_to_phys(virt_addr);
 
-	get_device_resources(dev, &iommu, &domain, &devid);
-
 	if (!iommu || !domain) {
 		*dma_addr = (dma_addr_t)paddr;
 		return virt_addr;
-- 
1.5.4.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 11:19 [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent FUJITA Tomonori
@ 2008-09-10 11:57 ` Ingo Molnar
  2008-09-10 12:03 ` Joerg Roedel
  1 sibling, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2008-09-10 11:57 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: joerg.roedel, linux-kernel, Thomas Gleixner, H. Peter Anvin

* FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:

> This is for tip/iommu.

applied these commits to tip/x86/iommu:

edee956: x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
9821626: x86: convert dma_alloc_coherent to use is_device_dma_capable
8a493d3: x86: remove duplicated extern force_iommu

thanks!

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 11:19 [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent FUJITA Tomonori
  2008-09-10 11:57 ` Ingo Molnar
@ 2008-09-10 12:03 ` Joerg Roedel
  2008-09-10 12:18   ` Ingo Molnar
  2008-09-10 12:38   ` FUJITA Tomonori
  1 sibling, 2 replies; 23+ messages in thread
From: Joerg Roedel @ 2008-09-10 12:03 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: linux-kernel, mingo

On Wed, Sep 10, 2008 at 08:19:40PM +0900, FUJITA Tomonori wrote:
> This is for tip/iommu.
> 
> btw, Joery, what happens if map_sg (or map_single) gets
> not-DMA-capable buffer in the case of !iommu || !domain?

It blows up :-)
I know it but currently it is no problem because this will never happen
in any real AMD IOMMU system (because the hardware just remaps every
device in the system). It needs a fix anyway and the
right solution here is to fall back to one of the software iommu
implementations. The stackable dma_ops patches I have currently in work
will do exactly that.

> 
> 
> ==
> From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> Subject: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
> 
> x86's common alloc_coherent (dma_alloc_coherent in dma-mapping.h) sets
> up the gfp flag according to the device dma_mask but AMD IOMMU doesn't
> need it for devices that the IOMMU can do virtual mappings for. This
> patch avoids unnecessary low zone allocation.
> 
> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> ---
>  arch/x86/kernel/amd_iommu.c |    7 +++++--
>  1 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/amd_iommu.c b/arch/x86/kernel/amd_iommu.c
> index 01c68c3..8efd249 100644
> --- a/arch/x86/kernel/amd_iommu.c
> +++ b/arch/x86/kernel/amd_iommu.c
> @@ -999,6 +999,11 @@ static void *alloc_coherent(struct device *dev, size_t size,
>  	u16 devid;
>  	phys_addr_t paddr;
>  
> +	get_device_resources(dev, &iommu, &domain, &devid);
> +
> +	if (iommu && domain)
> +		flag &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
> +

These flags are already removed in the dma_alloc_coherent function which
calls this one. Further I think in the case of a remapping IOMMU like
this one we should avoid implementing any gfp hacks and just trust the
caller.

>  	virt_addr = (void *)__get_free_pages(flag, get_order(size));
>  	if (!virt_addr)
>  		return 0;
> @@ -1006,8 +1011,6 @@ static void *alloc_coherent(struct device *dev, size_t size,
>  	memset(virt_addr, 0, size);
>  	paddr = virt_to_phys(virt_addr);
>  
> -	get_device_resources(dev, &iommu, &domain, &devid);
> -
>  	if (!iommu || !domain) {
>  		*dma_addr = (dma_addr_t)paddr;
>  		return virt_addr;
> -- 
> 1.5.4.2
> 
> 

-- 
           |           AMD Saxony Limited Liability Company & Co. KG
 Operating |         Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    |                  Register Court Dresden: HRA 4896
 Research  |              General Partner authorized to represent:
 Center    |             AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 12:03 ` Joerg Roedel
@ 2008-09-10 12:18   ` Ingo Molnar
  2008-09-10 12:38   ` FUJITA Tomonori
  1 sibling, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2008-09-10 12:18 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: FUJITA Tomonori, linux-kernel


* Joerg Roedel <joerg.roedel@amd.com> wrote:

> > +	get_device_resources(dev, &iommu, &domain, &devid);
> > +
> > +	if (iommu && domain)
> > +		flag &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
> > +
> 
> These flags are already removed in the dma_alloc_coherent function 
> which calls this one. Further I think in the case of a remapping IOMMU 
> like this one we should avoid implementing any gfp hacks and just 
> trust the caller.

or add a WARN_ON_ONCE() instead, to document and enforce that 
assumption.

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 12:03 ` Joerg Roedel
  2008-09-10 12:18   ` Ingo Molnar
@ 2008-09-10 12:38   ` FUJITA Tomonori
  2008-09-10 12:48     ` Joerg Roedel
  1 sibling, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2008-09-10 12:38 UTC (permalink / raw)
  To: joerg.roedel; +Cc: fujita.tomonori, linux-kernel, mingo

On Wed, 10 Sep 2008 14:03:10 +0200
Joerg Roedel <joerg.roedel@amd.com> wrote:

> On Wed, Sep 10, 2008 at 08:19:40PM +0900, FUJITA Tomonori wrote:
> > This is for tip/iommu.
> > 
> > btw, Joery, what happens if map_sg (or map_single) gets
> > not-DMA-capable buffer in the case of !iommu || !domain?
> 
> It blows up :-)
> I know it but currently it is no problem because this will never happen
> in any real AMD IOMMU system (because the hardware just remaps every
> device in the system).

I see, thanks.

> It needs a fix anyway and the
> right solution here is to fall back to one of the software iommu
> implementations. The stackable dma_ops patches I have currently in work
> will do exactly that.

I'm not sure you need the stackable dma_ops support. Calgary IOMMU had
the same problem and already solved it with dma_ops-per-device option.


> > ==
> > From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> > Subject: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
> > 
> > x86's common alloc_coherent (dma_alloc_coherent in dma-mapping.h) sets
> > up the gfp flag according to the device dma_mask but AMD IOMMU doesn't
> > need it for devices that the IOMMU can do virtual mappings for. This
> > patch avoids unnecessary low zone allocation.
> > 
> > Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> > ---
> >  arch/x86/kernel/amd_iommu.c |    7 +++++--
> >  1 files changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/amd_iommu.c b/arch/x86/kernel/amd_iommu.c
> > index 01c68c3..8efd249 100644
> > --- a/arch/x86/kernel/amd_iommu.c
> > +++ b/arch/x86/kernel/amd_iommu.c
> > @@ -999,6 +999,11 @@ static void *alloc_coherent(struct device *dev, size_t size,
> >  	u16 devid;
> >  	phys_addr_t paddr;
> >  
> > +	get_device_resources(dev, &iommu, &domain, &devid);
> > +
> > +	if (iommu && domain)
> > +		flag &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
> > +
> 
> These flags are already removed in the dma_alloc_coherent function which
> calls this one. Further I think in the case of a remapping IOMMU like

Not true about x86/tip/iommu. dma_alloc_coherent in dma-mapping.h does
that so that swiotlb and pci-nommu don't need the gfp hack. Clearing
the gfp flags is much simpler than setting up the flags correctly
mainly because of the fallback device, setting up the flags is really
difficult.


> this one we should avoid implementing any gfp hacks and just trust the
> caller.
> 
> >  	virt_addr = (void *)__get_free_pages(flag, get_order(size));
> >  	if (!virt_addr)
> >  		return 0;
> > @@ -1006,8 +1011,6 @@ static void *alloc_coherent(struct device *dev, size_t size,
> >  	memset(virt_addr, 0, size);
> >  	paddr = virt_to_phys(virt_addr);
> >  
> > -	get_device_resources(dev, &iommu, &domain, &devid);
> > -
> >  	if (!iommu || !domain) {
> >  		*dma_addr = (dma_addr_t)paddr;
> >  		return virt_addr;
> > -- 
> > 1.5.4.2
> > 
> > 
> 
> -- 
>            |           AMD Saxony Limited Liability Company & Co. KG
>  Operating |         Wilschdorfer Landstr. 101, 01109 Dresden, Germany
>  System    |                  Register Court Dresden: HRA 4896
>  Research  |              General Partner authorized to represent:
>  Center    |             AMD Saxony LLC (Wilmington, Delaware, US)
>            | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 12:38   ` FUJITA Tomonori
@ 2008-09-10 12:48     ` Joerg Roedel
  2008-09-10 13:03       ` FUJITA Tomonori
  0 siblings, 1 reply; 23+ messages in thread
From: Joerg Roedel @ 2008-09-10 12:48 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: linux-kernel, mingo

On Wed, Sep 10, 2008 at 09:38:11PM +0900, FUJITA Tomonori wrote:
> On Wed, 10 Sep 2008 14:03:10 +0200
> Joerg Roedel <joerg.roedel@amd.com> wrote:
> > It needs a fix anyway and the
> > right solution here is to fall back to one of the software iommu
> > implementations. The stackable dma_ops patches I have currently in work
> > will do exactly that.
> 
> I'm not sure you need the stackable dma_ops support. Calgary IOMMU had
> the same problem and already solved it with dma_ops-per-device option.

We need stackable dma_ops anyway for paravirt IOMMU support in KVM. And
they will fix this issue too.

> > These flags are already removed in the dma_alloc_coherent function which
> > calls this one. Further I think in the case of a remapping IOMMU like
> 
> Not true about x86/tip/iommu. dma_alloc_coherent in dma-mapping.h does
> that so that swiotlb and pci-nommu don't need the gfp hack. Clearing
> the gfp flags is much simpler than setting up the flags correctly
> mainly because of the fallback device, setting up the flags is really
> difficult.

Yes, dma_alloc_coherent in dma-mapping.h clears the flags. And this
function also calls ops->alloc_coherent which points to the AMD IOMMUs
alloc_coherent function if the driver is in place.

Joerg

-- 
           |           AMD Saxony Limited Liability Company & Co. KG
 Operating |         Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    |                  Register Court Dresden: HRA 4896
 Research  |              General Partner authorized to represent:
 Center    |             AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 12:48     ` Joerg Roedel
@ 2008-09-10 13:03       ` FUJITA Tomonori
  2008-09-10 13:10         ` Joerg Roedel
  0 siblings, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2008-09-10 13:03 UTC (permalink / raw)
  To: joerg.roedel; +Cc: fujita.tomonori, linux-kernel, mingo

On Wed, 10 Sep 2008 14:48:22 +0200
Joerg Roedel <joerg.roedel@amd.com> wrote:

> On Wed, Sep 10, 2008 at 09:38:11PM +0900, FUJITA Tomonori wrote:
> > On Wed, 10 Sep 2008 14:03:10 +0200
> > Joerg Roedel <joerg.roedel@amd.com> wrote:
> > > It needs a fix anyway and the
> > > right solution here is to fall back to one of the software iommu
> > > implementations. The stackable dma_ops patches I have currently in work
> > > will do exactly that.
> > 
> > I'm not sure you need the stackable dma_ops support. Calgary IOMMU had
> > the same problem and already solved it with dma_ops-per-device option.
> 
> We need stackable dma_ops anyway for paravirt IOMMU support in KVM.

I know. We discussed it when adding dma_ops-per-device support.


> And they will fix this issue too.

Ok, I'll wait until I see how the patches solve the problem cleanly.


> > > These flags are already removed in the dma_alloc_coherent function which
> > > calls this one. Further I think in the case of a remapping IOMMU like
> > 
> > Not true about x86/tip/iommu. dma_alloc_coherent in dma-mapping.h does
> > that so that swiotlb and pci-nommu don't need the gfp hack. Clearing
> > the gfp flags is much simpler than setting up the flags correctly
> > mainly because of the fallback device, setting up the flags is really
> > difficult.
> 
> Yes, dma_alloc_coherent in dma-mapping.h clears the flags. And this
> function also calls ops->alloc_coherent which points to the AMD IOMMUs
> alloc_coherent function if the driver is in place.

Hmm, I'm not sure what code you look at. Here's dma_alloc_coherent()
in tip/x86/iommu:

dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
		gfp_t gfp)
{
	struct dma_mapping_ops *ops = get_dma_ops(dev);
	void *memory;

	gfp &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);

Surely we here clear the flag but...

	if (dma_alloc_from_coherent(dev, size, dma_handle, &memory))
		return memory;

	if (!dev) {
		dev = &x86_dma_fallback_dev;
		gfp |= GFP_DMA;
	}

we play with it here though (not happens with pci devices),

	if (!dev->dma_mask)
		return NULL;

	if (!ops->alloc_coherent)
		return NULL;

Then dma_alloc_coherent_gfp_flags() sets it again according to
device->coherent_dma_mask and gfp before ops->alloc_coherent hook:

	return ops->alloc_coherent(dev, size, dma_handle,
				   dma_alloc_coherent_gfp_flags(dev, gfp));


This code can set up the exact same gfp flag for swiotbl and nommu as
before.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 13:03       ` FUJITA Tomonori
@ 2008-09-10 13:10         ` Joerg Roedel
  2008-09-10 13:37           ` FUJITA Tomonori
  0 siblings, 1 reply; 23+ messages in thread
From: Joerg Roedel @ 2008-09-10 13:10 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: linux-kernel, mingo

On Wed, Sep 10, 2008 at 10:03:32PM +0900, FUJITA Tomonori wrote:

Ok, I see.

> Hmm, I'm not sure what code you look at. Here's dma_alloc_coherent()
> in tip/x86/iommu:
> 
> dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
> 		gfp_t gfp)
> {
> 	struct dma_mapping_ops *ops = get_dma_ops(dev);
> 	void *memory;
> 
> 	gfp &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
> 
> Surely we here clear the flag but...
> 
> 	if (dma_alloc_from_coherent(dev, size, dma_handle, &memory))
> 		return memory;
> 
> 	if (!dev) {
> 		dev = &x86_dma_fallback_dev;
> 		gfp |= GFP_DMA;
> 	}
> 
> we play with it here though (not happens with pci devices),
> 
> 	if (!dev->dma_mask)
> 		return NULL;
> 
> 	if (!ops->alloc_coherent)
> 		return NULL;
> 
> Then dma_alloc_coherent_gfp_flags() sets it again according to
> device->coherent_dma_mask and gfp before ops->alloc_coherent hook:
> 
> 	return ops->alloc_coherent(dev, size, dma_handle,
> 				   dma_alloc_coherent_gfp_flags(dev, gfp));
> 
> 
> This code can set up the exact same gfp flag for swiotbl and nommu as
> before.

So its possible that alloc_coherent is called with region specifiers in
the gfp flags. Can't we simply make the gfp hacks depend on
dma_ops->is_phys and avoid further gfp hacks in the hardware iommu
implementations?

Joerg

-- 
           |           AMD Saxony Limited Liability Company & Co. KG
 Operating |         Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    |                  Register Court Dresden: HRA 4896
 Research  |              General Partner authorized to represent:
 Center    |             AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 13:10         ` Joerg Roedel
@ 2008-09-10 13:37           ` FUJITA Tomonori
  2008-09-10 13:53             ` Joerg Roedel
  0 siblings, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2008-09-10 13:37 UTC (permalink / raw)
  To: joerg.roedel; +Cc: fujita.tomonori, linux-kernel, mingo

On Wed, 10 Sep 2008 15:10:32 +0200
Joerg Roedel <joerg.roedel@amd.com> wrote:

> On Wed, Sep 10, 2008 at 10:03:32PM +0900, FUJITA Tomonori wrote:
> 
> Ok, I see.
> 
> > Hmm, I'm not sure what code you look at. Here's dma_alloc_coherent()
> > in tip/x86/iommu:
> > 
> > dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
> > 		gfp_t gfp)
> > {
> > 	struct dma_mapping_ops *ops = get_dma_ops(dev);
> > 	void *memory;
> > 
> > 	gfp &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
> > 
> > Surely we here clear the flag but...
> > 
> > 	if (dma_alloc_from_coherent(dev, size, dma_handle, &memory))
> > 		return memory;
> > 
> > 	if (!dev) {
> > 		dev = &x86_dma_fallback_dev;
> > 		gfp |= GFP_DMA;
> > 	}
> > 
> > we play with it here though (not happens with pci devices),
> > 
> > 	if (!dev->dma_mask)
> > 		return NULL;
> > 
> > 	if (!ops->alloc_coherent)
> > 		return NULL;
> > 
> > Then dma_alloc_coherent_gfp_flags() sets it again according to
> > device->coherent_dma_mask and gfp before ops->alloc_coherent hook:
> > 
> > 	return ops->alloc_coherent(dev, size, dma_handle,
> > 				   dma_alloc_coherent_gfp_flags(dev, gfp));
> > 
> > 
> > This code can set up the exact same gfp flag for swiotbl and nommu as
> > before.
> 
> So its possible that alloc_coherent is called with region specifiers in
> the gfp flags.

Yes, that's how we solved the ZONE_DMA exhaustion problem in
swiotlb. And we don't duplicate the gfp hack in both swiotlb and
pci-nommu.

> Can't we simply make the gfp hacks depend on
> dma_ops->is_phys and avoid further gfp hacks in the hardware iommu
> implementations?

I thought about it but adding a new dma_ops->we_don't_want_gfp_flag
hook doesn't make the code simpler much. Currently, we have the gfp
setting hack in just one place. It's not bad. Adding such new hook
means adding more lines than we can remove.

Yeah, I was against your patch to adding the gfp setting hack to
swiotlb but it's because gfp is kinda architecture specific stuff and
swiotlb should not. It's the bad design IMO. It's ok for me that
architecture specific IOMMUs can do the architecture specific stuff
(and it's about just clearing the gfp flag).

Intel IOMMU already clears up the flag so I'll send a patch to do that
for Calgary shortly then I think that we can finish x86 alloc_coherent
rewrite.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 13:37           ` FUJITA Tomonori
@ 2008-09-10 13:53             ` Joerg Roedel
  2008-09-10 14:24               ` FUJITA Tomonori
  0 siblings, 1 reply; 23+ messages in thread
From: Joerg Roedel @ 2008-09-10 13:53 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: linux-kernel, mingo

On Wed, Sep 10, 2008 at 10:37:45PM +0900, FUJITA Tomonori wrote:
> On Wed, 10 Sep 2008 15:10:32 +0200
> Joerg Roedel <joerg.roedel@amd.com> wrote:
> > Can't we simply make the gfp hacks depend on
> > dma_ops->is_phys and avoid further gfp hacks in the hardware iommu
> > implementations?
> 
> I thought about it but adding a new dma_ops->we_don't_want_gfp_flag
> hook doesn't make the code simpler much. Currently, we have the gfp
> setting hack in just one place. It's not bad. Adding such new hook
> means adding more lines than we can remove.

The is_phys flas is already in place and its meaning is "the dma_ops
return bus addresses equal to physical addresses". This is exactly the
case when we need the gfp hacks. So I don't see a problem in just
skipping the gfp rewrite if is_phys is zero. I don't see a point in
adding gfp flags in dma_alloc_coherent and remove them again
dma_ops->alloc_coherent code. Specially in this case where we already
know in dma_alloc_coherent if we really need the flag rewrite.

> Yeah, I was against your patch to adding the gfp setting hack to
> swiotlb but it's because gfp is kinda architecture specific stuff and
> swiotlb should not. It's the bad design IMO. It's ok for me that
> architecture specific IOMMUs can do the architecture specific stuff
> (and it's about just clearing the gfp flag).

The generic swiotlb code already contained a gfp hack for IA64 and I
added another one for x86 which is ok in my opinion. But the #ifdef was
ugly, I agree with that now :-)
Your solution of removing the flag hacks from the generic code completly
was the other possible way.

Joerg

-- 
           |           AMD Saxony Limited Liability Company & Co. KG
 Operating |         Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    |                  Register Court Dresden: HRA 4896
 Research  |              General Partner authorized to represent:
 Center    |             AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 13:53             ` Joerg Roedel
@ 2008-09-10 14:24               ` FUJITA Tomonori
  2008-09-10 14:38                 ` Joerg Roedel
  2008-09-10 14:39                 ` FUJITA Tomonori
  0 siblings, 2 replies; 23+ messages in thread
From: FUJITA Tomonori @ 2008-09-10 14:24 UTC (permalink / raw)
  To: joerg.roedel; +Cc: fujita.tomonori, linux-kernel, mingo

On Wed, 10 Sep 2008 15:53:47 +0200
Joerg Roedel <joerg.roedel@amd.com> wrote:

> On Wed, Sep 10, 2008 at 10:37:45PM +0900, FUJITA Tomonori wrote:
> > On Wed, 10 Sep 2008 15:10:32 +0200
> > Joerg Roedel <joerg.roedel@amd.com> wrote:
> > > Can't we simply make the gfp hacks depend on
> > > dma_ops->is_phys and avoid further gfp hacks in the hardware iommu
> > > implementations?
> > 
> > I thought about it but adding a new dma_ops->we_don't_want_gfp_flag
> > hook doesn't make the code simpler much. Currently, we have the gfp
> > setting hack in just one place. It's not bad. Adding such new hook
> > means adding more lines than we can remove.
> 
> The is_phys flas is already in place and its meaning is "the dma_ops
> return bus addresses equal to physical addresses". This is exactly the
> case when we need the gfp hacks. So I don't see a problem in just
> skipping the gfp rewrite if is_phys is zero. I don't see a point in
> adding gfp flags in dma_alloc_coherent and remove them again
> dma_ops->alloc_coherent code. Specially in this case where we already
> know in dma_alloc_coherent if we really need the flag rewrite.

dma_ops->is_phys doesn't work well for GART and Intel IOMMU, that do
virtual mappings for some devices and doesn't for some.

We need to a hook that can pass a point to a device to IOMMUs like:

dma_ops->is_phys(struct device *dev)


Because they need to look at a device to know if they will do virtual
mappings or not for it.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 14:24               ` FUJITA Tomonori
@ 2008-09-10 14:38                 ` Joerg Roedel
  2008-09-10 14:45                   ` FUJITA Tomonori
  2008-09-10 14:39                 ` FUJITA Tomonori
  1 sibling, 1 reply; 23+ messages in thread
From: Joerg Roedel @ 2008-09-10 14:38 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: joerg.roedel, linux-kernel, mingo

On Wed, Sep 10, 2008 at 11:24:57PM +0900, FUJITA Tomonori wrote:
> On Wed, 10 Sep 2008 15:53:47 +0200
> Joerg Roedel <joerg.roedel@amd.com> wrote:
> 
> > On Wed, Sep 10, 2008 at 10:37:45PM +0900, FUJITA Tomonori wrote:
> > > On Wed, 10 Sep 2008 15:10:32 +0200
> > > Joerg Roedel <joerg.roedel@amd.com> wrote:
> > > > Can't we simply make the gfp hacks depend on
> > > > dma_ops->is_phys and avoid further gfp hacks in the hardware iommu
> > > > implementations?
> > > 
> > > I thought about it but adding a new dma_ops->we_don't_want_gfp_flag
> > > hook doesn't make the code simpler much. Currently, we have the gfp
> > > setting hack in just one place. It's not bad. Adding such new hook
> > > means adding more lines than we can remove.
> > 
> > The is_phys flas is already in place and its meaning is "the dma_ops
> > return bus addresses equal to physical addresses". This is exactly the
> > case when we need the gfp hacks. So I don't see a problem in just
> > skipping the gfp rewrite if is_phys is zero. I don't see a point in
> > adding gfp flags in dma_alloc_coherent and remove them again
> > dma_ops->alloc_coherent code. Specially in this case where we already
> > know in dma_alloc_coherent if we really need the flag rewrite.
> 
> dma_ops->is_phys doesn't work well for GART and Intel IOMMU, that do
> virtual mappings for some devices and doesn't for some.
> 
> We need to a hook that can pass a point to a device to IOMMUs like:
> 
> dma_ops->is_phys(struct device *dev)
> 
> 
> Because they need to look at a device to know if they will do virtual
> mappings or not for it.

Ok, thats a valid point. I queue your patch with the AMD IOMMU updates
for 2.6.28. Thanks.

Joerg


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 14:24               ` FUJITA Tomonori
  2008-09-10 14:38                 ` Joerg Roedel
@ 2008-09-10 14:39                 ` FUJITA Tomonori
  2008-09-10 14:52                   ` Joerg Roedel
  1 sibling, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2008-09-10 14:39 UTC (permalink / raw)
  To: joerg.roedel; +Cc: linux-kernel, mingo

On Wed, 10 Sep 2008 23:24:57 +0900
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:

> On Wed, 10 Sep 2008 15:53:47 +0200
> Joerg Roedel <joerg.roedel@amd.com> wrote:
> 
> > On Wed, Sep 10, 2008 at 10:37:45PM +0900, FUJITA Tomonori wrote:
> > > On Wed, 10 Sep 2008 15:10:32 +0200
> > > Joerg Roedel <joerg.roedel@amd.com> wrote:
> > > > Can't we simply make the gfp hacks depend on
> > > > dma_ops->is_phys and avoid further gfp hacks in the hardware iommu
> > > > implementations?
> > > 
> > > I thought about it but adding a new dma_ops->we_don't_want_gfp_flag
> > > hook doesn't make the code simpler much. Currently, we have the gfp
> > > setting hack in just one place. It's not bad. Adding such new hook
> > > means adding more lines than we can remove.
> > 
> > The is_phys flas is already in place and its meaning is "the dma_ops
> > return bus addresses equal to physical addresses". This is exactly the
> > case when we need the gfp hacks. So I don't see a problem in just
> > skipping the gfp rewrite if is_phys is zero. I don't see a point in
> > adding gfp flags in dma_alloc_coherent and remove them again
> > dma_ops->alloc_coherent code. Specially in this case where we already
> > know in dma_alloc_coherent if we really need the flag rewrite.
> 
> dma_ops->is_phys doesn't work well for GART and Intel IOMMU, that do
> virtual mappings for some devices and doesn't for some.
> 
> We need to a hook that can pass a point to a device to IOMMUs like:
> 
> dma_ops->is_phys(struct device *dev)
> 
> 
> Because they need to look at a device to know if they will do virtual
> mappings or not for it.

btw, in tip/x86/iommu, GART's alloc_coherent always does virtual
mappings to allocate a size-aligned memory (as DMA-mapping.txt
defines).

Because someone strongly insisted, I modified GART's alloc_coherent to
do so but as I said again and again, it's completely meaningless (only
POWER IOMMU does it and drivers don't depend on such requirement).

I guess that it would be better to do virtual mappings only when
necessary as the current mainline does since GART I/O space is
precious in some systems. But I don't care much. What's your opinion
(as a AMD developer)?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 14:38                 ` Joerg Roedel
@ 2008-09-10 14:45                   ` FUJITA Tomonori
  2008-09-11  9:10                     ` Joerg Roedel
  0 siblings, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2008-09-10 14:45 UTC (permalink / raw)
  To: joro; +Cc: fujita.tomonori, joerg.roedel, linux-kernel, mingo

On Wed, 10 Sep 2008 16:38:18 +0200
Joerg Roedel <joro@8bytes.org> wrote:

> On Wed, Sep 10, 2008 at 11:24:57PM +0900, FUJITA Tomonori wrote:
> > On Wed, 10 Sep 2008 15:53:47 +0200
> > Joerg Roedel <joerg.roedel@amd.com> wrote:
> > 
> > > On Wed, Sep 10, 2008 at 10:37:45PM +0900, FUJITA Tomonori wrote:
> > > > On Wed, 10 Sep 2008 15:10:32 +0200
> > > > Joerg Roedel <joerg.roedel@amd.com> wrote:
> > > > > Can't we simply make the gfp hacks depend on
> > > > > dma_ops->is_phys and avoid further gfp hacks in the hardware iommu
> > > > > implementations?
> > > > 
> > > > I thought about it but adding a new dma_ops->we_don't_want_gfp_flag
> > > > hook doesn't make the code simpler much. Currently, we have the gfp
> > > > setting hack in just one place. It's not bad. Adding such new hook
> > > > means adding more lines than we can remove.
> > > 
> > > The is_phys flas is already in place and its meaning is "the dma_ops
> > > return bus addresses equal to physical addresses". This is exactly the
> > > case when we need the gfp hacks. So I don't see a problem in just
> > > skipping the gfp rewrite if is_phys is zero. I don't see a point in
> > > adding gfp flags in dma_alloc_coherent and remove them again
> > > dma_ops->alloc_coherent code. Specially in this case where we already
> > > know in dma_alloc_coherent if we really need the flag rewrite.
> > 
> > dma_ops->is_phys doesn't work well for GART and Intel IOMMU, that do
> > virtual mappings for some devices and doesn't for some.
> > 
> > We need to a hook that can pass a point to a device to IOMMUs like:
> > 
> > dma_ops->is_phys(struct device *dev)
> > 
> > 
> > Because they need to look at a device to know if they will do virtual
> > mappings or not for it.
> 
> Ok, thats a valid point. I queue your patch with the AMD IOMMU updates
> for 2.6.28. Thanks.

Ingo already has queued it his tree, I think.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 14:39                 ` FUJITA Tomonori
@ 2008-09-10 14:52                   ` Joerg Roedel
  2008-09-10 15:09                     ` FUJITA Tomonori
  0 siblings, 1 reply; 23+ messages in thread
From: Joerg Roedel @ 2008-09-10 14:52 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: linux-kernel, mingo

On Wed, Sep 10, 2008 at 11:39:00PM +0900, FUJITA Tomonori wrote:
 
> btw, in tip/x86/iommu, GART's alloc_coherent always does virtual
> mappings to allocate a size-aligned memory (as DMA-mapping.txt
> defines).
> 
> Because someone strongly insisted, I modified GART's alloc_coherent to
> do so but as I said again and again, it's completely meaningless (only
> POWER IOMMU does it and drivers don't depend on such requirement).
> 
> I guess that it would be better to do virtual mappings only when
> necessary as the current mainline does since GART I/O space is
> precious in some systems. But I don't care much. What's your opinion
> (as a AMD developer)?

Very true. My original rewrite did the mapping only when necessary too.
What were the reasons to do the mapping always?

Joerg

-- 
           |           AMD Saxony Limited Liability Company & Co. KG
 Operating |         Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    |                  Register Court Dresden: HRA 4896
 Research  |              General Partner authorized to represent:
 Center    |             AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 14:52                   ` Joerg Roedel
@ 2008-09-10 15:09                     ` FUJITA Tomonori
  2008-09-10 15:29                       ` Joerg Roedel
  0 siblings, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2008-09-10 15:09 UTC (permalink / raw)
  To: joerg.roedel; +Cc: fujita.tomonori, linux-kernel, mingo

On Wed, 10 Sep 2008 16:52:49 +0200
Joerg Roedel <joerg.roedel@amd.com> wrote:

> On Wed, Sep 10, 2008 at 11:39:00PM +0900, FUJITA Tomonori wrote:
>  
> > btw, in tip/x86/iommu, GART's alloc_coherent always does virtual
> > mappings to allocate a size-aligned memory (as DMA-mapping.txt
> > defines).
> > 
> > Because someone strongly insisted, I modified GART's alloc_coherent to
> > do so but as I said again and again, it's completely meaningless (only
> > POWER IOMMU does it and drivers don't depend on such requirement).
> > 
> > I guess that it would be better to do virtual mappings only when
> > necessary as the current mainline does since GART I/O space is
> > precious in some systems. But I don't care much. What's your opinion
> > (as a AMD developer)?
> 
> Very true. My original rewrite did the mapping only when necessary too.
> What were the reasons to do the mapping always?

As I said above, it's for allocating a size-aligned memory. Look at
the description of pci_alloc_consistent in DMA-mapping.txt:

The cpu return address and the DMA bus master address are both
guaranteed to be aligned to the smallest PAGE_SIZE order which
is greater than or equal to the requested size.  This invariant
exists (for example) to guarantee that if you allocate a chunk
which is smaller than or equal to 64 kilobytes, the extent of the
buffer you receive will not cross a 64K boundary.

You can't do this with __get_free_pages easily (you need some hacks to
do this). You can do this via iommu_area_alloc() for free.

Well, actually you agreed with adding such requirement (though I said
again and again that it's totally meaningless...):

http://lkml.org/lkml/2008/7/24/162

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 15:09                     ` FUJITA Tomonori
@ 2008-09-10 15:29                       ` Joerg Roedel
  2008-09-10 16:29                         ` FUJITA Tomonori
  0 siblings, 1 reply; 23+ messages in thread
From: Joerg Roedel @ 2008-09-10 15:29 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: linux-kernel, mingo

On Thu, Sep 11, 2008 at 12:09:43AM +0900, FUJITA Tomonori wrote:
> On Wed, 10 Sep 2008 16:52:49 +0200
> Joerg Roedel <joerg.roedel@amd.com> wrote:
> 
> > On Wed, Sep 10, 2008 at 11:39:00PM +0900, FUJITA Tomonori wrote:
> >  
> > > btw, in tip/x86/iommu, GART's alloc_coherent always does virtual
> > > mappings to allocate a size-aligned memory (as DMA-mapping.txt
> > > defines).
> > > 
> > > Because someone strongly insisted, I modified GART's alloc_coherent to
> > > do so but as I said again and again, it's completely meaningless (only
> > > POWER IOMMU does it and drivers don't depend on such requirement).
> > > 
> > > I guess that it would be better to do virtual mappings only when
> > > necessary as the current mainline does since GART I/O space is
> > > precious in some systems. But I don't care much. What's your opinion
> > > (as a AMD developer)?
> > 
> > Very true. My original rewrite did the mapping only when necessary too.
> > What were the reasons to do the mapping always?
> 
> As I said above, it's for allocating a size-aligned memory. Look at
> the description of pci_alloc_consistent in DMA-mapping.txt:
> 
> The cpu return address and the DMA bus master address are both
> guaranteed to be aligned to the smallest PAGE_SIZE order which
> is greater than or equal to the requested size.  This invariant
> exists (for example) to guarantee that if you allocate a chunk
> which is smaller than or equal to 64 kilobytes, the extent of the
> buffer you receive will not cross a 64K boundary.
> 
> You can't do this with __get_free_pages easily (you need some hacks to
> do this). You can do this via iommu_area_alloc() for free.

What hacks do you need with __get_free_pages? The memory it returns is
_always_ aligned at its size.

> 
> Well, actually you agreed with adding such requirement (though I said
> again and again that it's totally meaningless...):

The other possible way is removing this requirement from the
documentation. But if the spec has this requirement the code _has_ to
fullfill it, even if its meaningless for most drivers.

> http://lkml.org/lkml/2008/7/24/162

This message stated nothing about _always_ map GART alloc_coherent
allocations using the aperture.

Joerg

-- 
           |           AMD Saxony Limited Liability Company & Co. KG
 Operating |         Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    |                  Register Court Dresden: HRA 4896
 Research  |              General Partner authorized to represent:
 Center    |             AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 15:29                       ` Joerg Roedel
@ 2008-09-10 16:29                         ` FUJITA Tomonori
  2008-09-10 17:05                           ` Joerg Roedel
  0 siblings, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2008-09-10 16:29 UTC (permalink / raw)
  To: joerg.roedel; +Cc: fujita.tomonori, linux-kernel, mingo

On Wed, 10 Sep 2008 17:29:18 +0200
Joerg Roedel <joerg.roedel@amd.com> wrote:

> On Thu, Sep 11, 2008 at 12:09:43AM +0900, FUJITA Tomonori wrote:
> > On Wed, 10 Sep 2008 16:52:49 +0200
> > Joerg Roedel <joerg.roedel@amd.com> wrote:
> > 
> > > On Wed, Sep 10, 2008 at 11:39:00PM +0900, FUJITA Tomonori wrote:
> > >  
> > > > btw, in tip/x86/iommu, GART's alloc_coherent always does virtual
> > > > mappings to allocate a size-aligned memory (as DMA-mapping.txt
> > > > defines).
> > > > 
> > > > Because someone strongly insisted, I modified GART's alloc_coherent to
> > > > do so but as I said again and again, it's completely meaningless (only
> > > > POWER IOMMU does it and drivers don't depend on such requirement).
> > > > 
> > > > I guess that it would be better to do virtual mappings only when
> > > > necessary as the current mainline does since GART I/O space is
> > > > precious in some systems. But I don't care much. What's your opinion
> > > > (as a AMD developer)?
> > > 
> > > Very true. My original rewrite did the mapping only when necessary too.
> > > What were the reasons to do the mapping always?
> > 
> > As I said above, it's for allocating a size-aligned memory. Look at
> > the description of pci_alloc_consistent in DMA-mapping.txt:
> > 
> > The cpu return address and the DMA bus master address are both
> > guaranteed to be aligned to the smallest PAGE_SIZE order which
> > is greater than or equal to the requested size.  This invariant
> > exists (for example) to guarantee that if you allocate a chunk
> > which is smaller than or equal to 64 kilobytes, the extent of the
> > buffer you receive will not cross a 64K boundary.
> > 
> > You can't do this with __get_free_pages easily (you need some hacks to
> > do this). You can do this via iommu_area_alloc() for free.
> 
> What hacks do you need with __get_free_pages? The memory it returns is
> _always_ aligned at its size.

Is it guaranteed (documented somewhere) ?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 16:29                         ` FUJITA Tomonori
@ 2008-09-10 17:05                           ` Joerg Roedel
  2008-09-10 17:15                             ` FUJITA Tomonori
  0 siblings, 1 reply; 23+ messages in thread
From: Joerg Roedel @ 2008-09-10 17:05 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: linux-kernel, mingo

On Thu, Sep 11, 2008 at 01:29:54AM +0900, FUJITA Tomonori wrote:
> On Wed, 10 Sep 2008 17:29:18 +0200
> Joerg Roedel <joerg.roedel@amd.com> wrote:
> 
> > On Thu, Sep 11, 2008 at 12:09:43AM +0900, FUJITA Tomonori wrote:
> > > On Wed, 10 Sep 2008 16:52:49 +0200
> > > Joerg Roedel <joerg.roedel@amd.com> wrote:
> > > 
> > > > On Wed, Sep 10, 2008 at 11:39:00PM +0900, FUJITA Tomonori wrote:
> > > >  
> > > > > btw, in tip/x86/iommu, GART's alloc_coherent always does virtual
> > > > > mappings to allocate a size-aligned memory (as DMA-mapping.txt
> > > > > defines).
> > > > > 
> > > > > Because someone strongly insisted, I modified GART's alloc_coherent to
> > > > > do so but as I said again and again, it's completely meaningless (only
> > > > > POWER IOMMU does it and drivers don't depend on such requirement).
> > > > > 
> > > > > I guess that it would be better to do virtual mappings only when
> > > > > necessary as the current mainline does since GART I/O space is
> > > > > precious in some systems. But I don't care much. What's your opinion
> > > > > (as a AMD developer)?
> > > > 
> > > > Very true. My original rewrite did the mapping only when necessary too.
> > > > What were the reasons to do the mapping always?
> > > 
> > > As I said above, it's for allocating a size-aligned memory. Look at
> > > the description of pci_alloc_consistent in DMA-mapping.txt:
> > > 
> > > The cpu return address and the DMA bus master address are both
> > > guaranteed to be aligned to the smallest PAGE_SIZE order which
> > > is greater than or equal to the requested size.  This invariant
> > > exists (for example) to guarantee that if you allocate a chunk
> > > which is smaller than or equal to 64 kilobytes, the extent of the
> > > buffer you receive will not cross a 64K boundary.
> > > 
> > > You can't do this with __get_free_pages easily (you need some hacks to
> > > do this). You can do this via iommu_area_alloc() for free.
> > 
> > What hacks do you need with __get_free_pages? The memory it returns is
> > _always_ aligned at its size.
> 
> Is it guaranteed (documented somewhere) ?

I don't know if there is a formal definition for it. It is documented in
some books about the Linux kernel (I read this in some book the first
time). This alignment results from the buddy algorithm the page alloctor
uses. You can definitly rely on that.

Joerg

-- 
           |           AMD Saxony Limited Liability Company & Co. KG
 Operating |         Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    |                  Register Court Dresden: HRA 4896
 Research  |              General Partner authorized to represent:
 Center    |             AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 17:05                           ` Joerg Roedel
@ 2008-09-10 17:15                             ` FUJITA Tomonori
  2008-09-10 17:25                               ` Joerg Roedel
  0 siblings, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2008-09-10 17:15 UTC (permalink / raw)
  To: joerg.roedel; +Cc: fujita.tomonori, linux-kernel, mingo

On Wed, 10 Sep 2008 19:05:04 +0200
Joerg Roedel <joerg.roedel@amd.com> wrote:

> On Thu, Sep 11, 2008 at 01:29:54AM +0900, FUJITA Tomonori wrote:
> > On Wed, 10 Sep 2008 17:29:18 +0200
> > Joerg Roedel <joerg.roedel@amd.com> wrote:
> > 
> > > On Thu, Sep 11, 2008 at 12:09:43AM +0900, FUJITA Tomonori wrote:
> > > > On Wed, 10 Sep 2008 16:52:49 +0200
> > > > Joerg Roedel <joerg.roedel@amd.com> wrote:
> > > > 
> > > > > On Wed, Sep 10, 2008 at 11:39:00PM +0900, FUJITA Tomonori wrote:
> > > > >  
> > > > > > btw, in tip/x86/iommu, GART's alloc_coherent always does virtual
> > > > > > mappings to allocate a size-aligned memory (as DMA-mapping.txt
> > > > > > defines).
> > > > > > 
> > > > > > Because someone strongly insisted, I modified GART's alloc_coherent to
> > > > > > do so but as I said again and again, it's completely meaningless (only
> > > > > > POWER IOMMU does it and drivers don't depend on such requirement).
> > > > > > 
> > > > > > I guess that it would be better to do virtual mappings only when
> > > > > > necessary as the current mainline does since GART I/O space is
> > > > > > precious in some systems. But I don't care much. What's your opinion
> > > > > > (as a AMD developer)?
> > > > > 
> > > > > Very true. My original rewrite did the mapping only when necessary too.
> > > > > What were the reasons to do the mapping always?
> > > > 
> > > > As I said above, it's for allocating a size-aligned memory. Look at
> > > > the description of pci_alloc_consistent in DMA-mapping.txt:
> > > > 
> > > > The cpu return address and the DMA bus master address are both
> > > > guaranteed to be aligned to the smallest PAGE_SIZE order which
> > > > is greater than or equal to the requested size.  This invariant
> > > > exists (for example) to guarantee that if you allocate a chunk
> > > > which is smaller than or equal to 64 kilobytes, the extent of the
> > > > buffer you receive will not cross a 64K boundary.
> > > > 
> > > > You can't do this with __get_free_pages easily (you need some hacks to
> > > > do this). You can do this via iommu_area_alloc() for free.
> > > 
> > > What hacks do you need with __get_free_pages? The memory it returns is
> > > _always_ aligned at its size.
> > 
> > Is it guaranteed (documented somewhere) ?
> 
> I don't know if there is a formal definition for it. It is documented in
> some books about the Linux kernel (I read this in some book the first
> time). This alignment results from the buddy algorithm the page alloctor
> uses. You can definitly rely on that.

I meant, if it's not documented as a guaranteed feature (not just the
characteristic of the current code), it could change any time.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 17:15                             ` FUJITA Tomonori
@ 2008-09-10 17:25                               ` Joerg Roedel
  0 siblings, 0 replies; 23+ messages in thread
From: Joerg Roedel @ 2008-09-10 17:25 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: linux-kernel, mingo

On Thu, Sep 11, 2008 at 02:15:22AM +0900, FUJITA Tomonori wrote:
> On Wed, 10 Sep 2008 19:05:04 +0200
> Joerg Roedel <joerg.roedel@amd.com> wrote:
> 
> > On Thu, Sep 11, 2008 at 01:29:54AM +0900, FUJITA Tomonori wrote:
> > > On Wed, 10 Sep 2008 17:29:18 +0200
> > > Joerg Roedel <joerg.roedel@amd.com> wrote:
> > > 
> > > > What hacks do you need with __get_free_pages? The memory it returns is
> > > > _always_ aligned at its size.
> > > 
> > > Is it guaranteed (documented somewhere) ?
> > 
> > I don't know if there is a formal definition for it. It is documented in
> > some books about the Linux kernel (I read this in some book the first
> > time). This alignment results from the buddy algorithm the page alloctor
> > uses. You can definitly rely on that.
> 
> I meant, if it's not documented as a guaranteed feature (not just the
> characteristic of the current code), it could change any time.

It will never change. I am sure about this. A lot of code relys on
this like the HugeTLBfs implementations for example. In fact, it may be
possible that the requirement to the dma_alloc_coherent function derive
from this behavior of the Linux page allocator (to not break anything
when the dma api was introduced).

Joerg

-- 
           |           AMD Saxony Limited Liability Company & Co. KG
 Operating |         Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    |                  Register Court Dresden: HRA 4896
 Research  |              General Partner authorized to represent:
 Center    |             AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-10 14:45                   ` FUJITA Tomonori
@ 2008-09-11  9:10                     ` Joerg Roedel
  2008-09-11 13:36                       ` FUJITA Tomonori
  0 siblings, 1 reply; 23+ messages in thread
From: Joerg Roedel @ 2008-09-11  9:10 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: joro, linux-kernel, mingo

On Wed, Sep 10, 2008 at 11:45:34PM +0900, FUJITA Tomonori wrote:
> On Wed, 10 Sep 2008 16:38:18 +0200
> Joerg Roedel <joro@8bytes.org> wrote:
> > Ok, thats a valid point. I queue your patch with the AMD IOMMU updates
> > for 2.6.28. Thanks.
> 
> Ingo already has queued it his tree, I think.

I asked Ingo to remove it from his tree to avoid merge conflicts with
updates I am currently preparing. Here is the patch I queued into my
2.6.28 update chain.

Joerg

===
commit 42a7fa8d8a7f1680cba76e4824dba30bfd1c9d70
Author: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date:   Wed Sep 10 20:19:40 2008 +0900

    x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
    
    x86's common alloc_coherent (dma_alloc_coherent in dma-mapping.h) sets
    up the gfp flag according to the device dma_mask but AMD IOMMU doesn't
    need it for devices that the IOMMU can do virtual mappings for. This
    patch avoids unnecessary low zone allocation.
    
    Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
    Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

diff --git a/arch/x86/kernel/amd_iommu.c b/arch/x86/kernel/amd_iommu.c
index ae38b24..09261dc 100644
--- a/arch/x86/kernel/amd_iommu.c
+++ b/arch/x86/kernel/amd_iommu.c
@@ -1360,6 +1360,9 @@ static void *alloc_coherent(struct device *dev, size_t size,
 	if (!check_device(dev))
 		return NULL;
 
+	if (!get_device_resources(dev, &iommu, &domain, &devid))
+		flag &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
+
 	virt_addr = (void *)__get_free_pages(flag, get_order(size));
 	if (!virt_addr)
 		return 0;
@@ -1367,8 +1370,6 @@ static void *alloc_coherent(struct device *dev, size_t size,
 	memset(virt_addr, 0, size);
 	paddr = virt_to_phys(virt_addr);
 
-	get_device_resources(dev, &iommu, &domain, &devid);
-
 	if (!iommu || !domain) {
 		*dma_addr = (dma_addr_t)paddr;
 		return virt_addr;

-- 
           |           AMD Saxony Limited Liability Company & Co. KG
 Operating |         Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    |                  Register Court Dresden: HRA 4896
 Research  |              General Partner authorized to represent:
 Center    |             AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
  2008-09-11  9:10                     ` Joerg Roedel
@ 2008-09-11 13:36                       ` FUJITA Tomonori
  0 siblings, 0 replies; 23+ messages in thread
From: FUJITA Tomonori @ 2008-09-11 13:36 UTC (permalink / raw)
  To: joerg.roedel; +Cc: fujita.tomonori, joro, linux-kernel, mingo

On Thu, 11 Sep 2008 11:10:00 +0200
Joerg Roedel <joerg.roedel@amd.com> wrote:

> On Wed, Sep 10, 2008 at 11:45:34PM +0900, FUJITA Tomonori wrote:
> > On Wed, 10 Sep 2008 16:38:18 +0200
> > Joerg Roedel <joro@8bytes.org> wrote:
> > > Ok, thats a valid point. I queue your patch with the AMD IOMMU updates
> > > for 2.6.28. Thanks.
> > 
> > Ingo already has queued it his tree, I think.
> 
> I asked Ingo to remove it from his tree to avoid merge conflicts with
> updates I am currently preparing. Here is the patch I queued into my
> 2.6.28 update chain.
> 
> Joerg
> 
> ===
> commit 42a7fa8d8a7f1680cba76e4824dba30bfd1c9d70
> Author: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> Date:   Wed Sep 10 20:19:40 2008 +0900
> 
>     x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent
>     
>     x86's common alloc_coherent (dma_alloc_coherent in dma-mapping.h) sets
>     up the gfp flag according to the device dma_mask but AMD IOMMU doesn't
>     need it for devices that the IOMMU can do virtual mappings for. This
>     patch avoids unnecessary low zone allocation.
>     
>     Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
>     Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
> 
> diff --git a/arch/x86/kernel/amd_iommu.c b/arch/x86/kernel/amd_iommu.c
> index ae38b24..09261dc 100644
> --- a/arch/x86/kernel/amd_iommu.c
> +++ b/arch/x86/kernel/amd_iommu.c

I see, thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2008-09-11 13:37 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-10 11:19 [PATCH] x86: avoid unnecessary low zone allocation in AMD IOMMU's alloc_coherent FUJITA Tomonori
2008-09-10 11:57 ` Ingo Molnar
2008-09-10 12:03 ` Joerg Roedel
2008-09-10 12:18   ` Ingo Molnar
2008-09-10 12:38   ` FUJITA Tomonori
2008-09-10 12:48     ` Joerg Roedel
2008-09-10 13:03       ` FUJITA Tomonori
2008-09-10 13:10         ` Joerg Roedel
2008-09-10 13:37           ` FUJITA Tomonori
2008-09-10 13:53             ` Joerg Roedel
2008-09-10 14:24               ` FUJITA Tomonori
2008-09-10 14:38                 ` Joerg Roedel
2008-09-10 14:45                   ` FUJITA Tomonori
2008-09-11  9:10                     ` Joerg Roedel
2008-09-11 13:36                       ` FUJITA Tomonori
2008-09-10 14:39                 ` FUJITA Tomonori
2008-09-10 14:52                   ` Joerg Roedel
2008-09-10 15:09                     ` FUJITA Tomonori
2008-09-10 15:29                       ` Joerg Roedel
2008-09-10 16:29                         ` FUJITA Tomonori
2008-09-10 17:05                           ` Joerg Roedel
2008-09-10 17:15                             ` FUJITA Tomonori
2008-09-10 17:25                               ` Joerg Roedel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox