* RFC [PATCH] x86/pci: reserve extra page to avoid error caused by P2P pref DMA reads
From: Yinghai Lu @ 2008-08-27  7:29 UTC
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton
  Cc: linux-kernel, Yinghai Lu, Pavel Machek, Benjamin Herrenschmidt,
	Jesse Barnes

The diag team found a system that reports GART walk errors under high load.
The root cause is a P2P bridge prefetching on behalf of several Intel e1000
devices behind it, while the skb being read sits near the GART IOMMU area.

As a first step, try to reserve the pages at the boundaries: the last page
below TOM2, the last page below MMIO, and the first and last pages of the
GART.

We need a better way that covers every arch that supports PCI and has
memory with a lot of holes etc.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Pavel Machek <pavel@suse.cz>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>

---
 arch/x86/kernel/pci-dma.c     |   28 ++++++++++++++++++++++++++++
 arch/x86/kernel/pci-gart_64.c |    6 ++++++
 2 files changed, 34 insertions(+)

Index: linux-2.6/arch/x86/kernel/pci-dma.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/pci-dma.c
+++ linux-2.6/arch/x86/kernel/pci-dma.c
@@ -72,12 +72,40 @@ static int __init parse_dma32_size_opt(c
 }
 early_param("dma32_size", parse_dma32_size_opt);
 
+static void __init reserve_last_page(unsigned long pfn)
+{
+	unsigned long phys;
+	void *ptr;
+
+	phys = (pfn - 1)<<PAGE_SHIFT;
+	ptr = __alloc_bootmem_nopanic(PAGE_SIZE, PAGE_SIZE, phys);
+
+	if (!ptr || virt_to_phys(ptr) != phys)
+		printk(KERN_WARNING "Can not hold last page near %lx for workaround P2P pref DMA reads!\n", phys);
+	else
+		printk(KERN_WARNING "Last page is reserved near %lx for workaround P2P pref DMA reads!\n", phys);
+}
 void __init dma32_reserve_bootmem(void)
 {
 	unsigned long size, align;
+
+	/*
+	 * Try to reserve the last page to work around P2P bridge prefetching
+	 * DMA reads. Normally the page near MMIO needs no reservation
+	 * because ACPI data etc. already sits there, but some systems
+	 * place the ACPI data in the middle of RAM below 4G,
+	 * so just reserve it.
+	 */
+	if (max_low_pfn_mapped < max_pfn_mapped)
+		reserve_last_page(max_low_pfn_mapped);
+
+	/* less than 4G, don't need iommu */
 	if (max_pfn <= MAX_DMA32_PFN)
 		return;
 
+	/* try to reserve last page to workaround P2P bridge pref DMA reads */
+	reserve_last_page(max_pfn);
+
 	/*
 	 * check aperture_64.c allocate_aperture() for reason about
 	 * using 512M as goal
Index: linux-2.6/arch/x86/kernel/pci-gart_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/pci-gart_64.c
+++ linux-2.6/arch/x86/kernel/pci-gart_64.c
@@ -826,6 +826,9 @@ void __init gart_iommu_init(void)
 	 */
 	set_bit_string(iommu_gart_bitmap, 0, EMERGENCY_PAGES);
 
+	/* reserve one page at tail, for P2P bridge pref DMA reads */
+	set_bit_string(iommu_gart_bitmap, iommu_pages - 1, 1);
+
 	agp_memory_reserved = iommu_size;
 	printk(KERN_INFO
 	       "PCI-DMA: Reserving %luMB of IOMMU area in the AGP aperture\n",
@@ -870,6 +873,9 @@ void __init gart_iommu_init(void)
 	for (i = EMERGENCY_PAGES; i < iommu_pages; i++)
 		iommu_gatt_base[i] = gart_unmapped_entry;
 
+	/* the head entry must be set to unmapped too, for P2P bridge pref DMA reads */
+	iommu_gatt_base[0] = gart_unmapped_entry;
+
 	flush_gart();
 	dma_ops = &gart_dma_ops;
 }
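Editor's note: the address arithmetic in reserve_last_page() can be checked in isolation. The following is a minimal user-space sketch, not kernel code. It assumes 4 KiB pages, and the names SKETCH_PAGE_SHIFT and last_page_phys() are hypothetical, introduced only for illustration. It uses a 64-bit type so the shift cannot overflow for boundaries above 4G.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Physical address of the last page below the boundary at page frame
 * number `pfn`, mirroring `(pfn - 1) << PAGE_SHIFT` from the patch.
 */
static uint64_t last_page_phys(uint64_t pfn)
{
	assert(pfn > 0);	/* there is no page below pfn 0 */
	return (pfn - 1) << SKETCH_PAGE_SHIFT;
}
```

For a boundary at 4G (pfn 0x100000) this gives 0xfffff000, the page the patch tries to keep away from the allocator.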


* Re: RFC [PATCH] x86/pci: reserve extra page to avoid error caused by P2P pref DMA reads
From: Ingo Molnar @ 2008-08-27  7:37 UTC
  To: Yinghai Lu
  Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-kernel,
	Pavel Machek, Benjamin Herrenschmidt, Jesse Barnes


* Yinghai Lu <yhlu.kernel@gmail.com> wrote:

> The diag team found a system that reports GART walk errors under high
> load. The root cause is a P2P bridge prefetching on behalf of several
> Intel e1000 devices behind it, while the skb being read sits near the
> GART IOMMU area.
> 
> As a first step, try to reserve the pages at the boundaries: the last
> page below TOM2, the last page below MMIO, and the first and last pages
> of the GART.
> 
> We need a better way that covers every arch that supports PCI and has
> memory with a lot of holes etc.

>  void __init dma32_reserve_bootmem(void)
>  {
>  	unsigned long size, align;
> +
> +	/*
> +	 * Try to reserve the last page to work around P2P bridge prefetching
> +	 * DMA reads. Normally the page near MMIO needs no reservation
> +	 * because ACPI data etc. already sits there, but some systems
> +	 * place the ACPI data in the middle of RAM below 4G,
> +	 * so just reserve it.
> +	 */

Nice! Could this be the root cause of those skb corruptions and e1000 
crashes you've been reporting? So the _usual_ setup accidentally 
protects us from these prefetch induced failures.

I think your patch is fine for the iommu bits, but the 
reserve_last_page() thing should be done in a cleaner way. Cannot we 
just reserve it all at the e820 stage, before passing that RAM to the 
bootmem allocator?

Also, what is the guarantee that 4K of space is enough to stop all 
prefetching across that boundary?

	Ingo
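Editor's note: the idea of trimming the map before the RAM reaches the allocator can be sketched as below. This is a user-space illustration under stated assumptions, not the e820 API: struct sketch_range and sketch_trim_tail() are hypothetical names, and the guard size is a parameter because the thread leaves open whether 4K is enough.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one usable-RAM entry in a firmware memory map. */
struct sketch_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Cut `guard` bytes off the tail of a RAM range before it is handed to
 * an allocator, so nothing is ever allocated right below the boundary.
 * Returns the number of bytes actually removed.
 */
static uint64_t sketch_trim_tail(struct sketch_range *r, uint64_t guard)
{
	uint64_t len = r->end - r->start;
	uint64_t cut = guard < len ? guard : len;

	r->end -= cut;
	return cut;
}
```

Calling sketch_trim_tail(&r, 4096) on each RAM range would leave one guard page per boundary; a bridge that prefetches further than one page would need a larger guard.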


* Re: RFC [PATCH] x86/pci: reserve extra page to avoid error caused by P2P pref DMA reads
From: Benjamin Herrenschmidt @ 2008-08-27  7:40 UTC
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	linux-kernel, Pavel Machek, Jesse Barnes

On Wed, 2008-08-27 at 00:29 -0700, Yinghai Lu wrote:
> The diag team found a system that reports GART walk errors under high load.
> The root cause is a P2P bridge prefetching on behalf of several Intel e1000
> devices behind it, while the skb being read sits near the GART IOMMU area.
> 
> As a first step, try to reserve the pages at the boundaries: the last page
> below TOM2, the last page below MMIO, and the first and last pages of the
> GART.
> 
> We need a better way that covers every arch that supports PCI and has
> memory with a lot of holes etc.

Sounds similar to the problem that makes us never unmap GART pages:
we just map them read-only and route them to a dummy page. In that
case, the right approach is to point the last page of the GART
at the dummy page as well and mark it occupied. That shouldn't
involve actually reserving memory.

I'm not familiar with the x86 PCI DMA code but afaik, it's fairly
similar to ours (powerpc) in that area.

Ben.
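Editor's note: the approach described above (point the last GART entry at a dummy page and mark it occupied) can be modelled in user space as below. This is a sketch under stated assumptions, not the kernel's GART code: the table size, struct sketch_gart and the sketch_gart_* helpers are hypothetical, and a byte array stands in for the allocation bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_GART_PAGES 16	/* assumption: tiny table for illustration */

struct sketch_gart {
	uint32_t gatt[SKETCH_GART_PAGES];	/* translation entries */
	uint8_t busy[SKETCH_GART_PAGES];	/* 1 = never hand out */
};

/*
 * Point every entry at the dummy page, then mark the leading emergency
 * pages and the last page as occupied so the allocator skips them.
 * No real memory is reserved for the tail entry.
 */
static void sketch_gart_init(struct sketch_gart *g, uint32_t dummy_entry,
			     size_t emergency_pages)
{
	size_t i;

	for (i = 0; i < SKETCH_GART_PAGES; i++) {
		g->gatt[i] = dummy_entry;
		g->busy[i] = 0;
	}
	for (i = 0; i < emergency_pages && i < SKETCH_GART_PAGES; i++)
		g->busy[i] = 1;
	g->busy[SKETCH_GART_PAGES - 1] = 1;
}

/* First-fit allocation of one entry; returns its index or -1 if full. */
static long sketch_gart_alloc(struct sketch_gart *g)
{
	size_t i;

	for (i = 0; i < SKETCH_GART_PAGES; i++) {
		if (!g->busy[i]) {
			g->busy[i] = 1;
			return (long)i;
		}
	}
	return -1;
}
```

With this layout a prefetch that runs past the last mapping allocated to a device lands on an entry that still translates to the dummy page, instead of faulting in the table walk.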
 



* Re: RFC [PATCH] x86/pci: reserve extra page to avoid error caused by P2P pref DMA reads
From: Yinghai Lu @ 2008-08-27  7:45 UTC
  To: benh
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	linux-kernel, Pavel Machek, Jesse Barnes

On Wed, Aug 27, 2008 at 12:40 AM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Wed, 2008-08-27 at 00:29 -0700, Yinghai Lu wrote:
>> The diag team found a system that reports GART walk errors under high load.
>> The root cause is a P2P bridge prefetching on behalf of several Intel e1000
>> devices behind it, while the skb being read sits near the GART IOMMU area.
>>
>> As a first step, try to reserve the pages at the boundaries: the last page
>> below TOM2, the last page below MMIO, and the first and last pages of the
>> GART.
>>
>> We need a better way that covers every arch that supports PCI and has
>> memory with a lot of holes etc.
>
> Sounds similar to the problem that makes us never unmap GART pages:
> we just map them read-only and route them to a dummy page. In that
> case, the right approach is to point the last page of the GART
> at the dummy page as well and mark it occupied. That shouldn't
> involve actually reserving memory.

I should split it into two patches. This patch really contains two changes,
and the second part is the one that maps to scratch memory.

But memory near the top of memory (TOM) and near TOM2 could have the same
problem.

YH
