public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] pci-mmconfig fix for 2.6.9
@ 2004-11-10 22:20 long
  2004-11-11  1:30 ` Greg KH
  2004-11-11  7:57 ` Andi Kleen
  0 siblings, 2 replies; 23+ messages in thread
From: long @ 2004-11-10 22:20 UTC (permalink / raw)
  To: linux-kernel, linux-pci
  Cc: ak, akpm, greg, sundarapandian.durairaj, tom.l.nguyen

Here I have attached pci mmconfig fix for 2.6.9 kernel. 

This will fix the flush error in pci_mmcfg_write.

When pci_mmcfg_write is used to program the PMCSR in the Power
Management Capability structure of PCI config space in the PCI Express
device to a different power state, the dummy readl to flush the previous
write violates the transition delay specified in the PCI power
management spec. Please see PCI Power Management Spec. 1.2 Table 5-6.
For example, while changing the power state of the device through PMCSR
register, a transition delay of 10msec is required before any access can
be made to the device.

Since the configuration write access for PCI Express is non posted,
flushing is not necessary and  it will be safe to remove the dummy
readl. 

This patch will remove dummy readl function implemented in
"pci_mmcfg_write" and use set_fixmap_nocahe instead of set_fixmap.

Signed-off-by: Sundarapandian Durairaj <sundarapandian.duraijai@intel.com>
Signed-off-by: T. Long Nguyen <tom.l.nguyen@intel.com>

--------------------------------------------------------------------------
diff -urpN linux-2.6.9/arch/i386/pci/mmconfig.c linux-2.6.9-mmconfig-patch/arch/i386/pci/mmconfig.c
--- linux-2.6.9/arch/i386/pci/mmconfig.c	2004-10-18 17:54:38.000000000 -0400
+++ linux-2.6.9-mmconfig-patch/arch/i386/pci/mmconfig.c	2004-11-10 13:36:29.287713784 -0500
@@ -30,7 +30,7 @@ static inline void pci_exp_set_dev_base(
 	u32 dev_base = pci_mmcfg_base_addr | (bus << 20) | (devfn << 12);
 	if (dev_base != mmcfg_last_accessed_device) {
 		mmcfg_last_accessed_device = dev_base;
-		set_fixmap(FIX_PCIE_MCFG, dev_base);
+		set_fixmap_nocache(FIX_PCIE_MCFG, dev_base);
 	}
 }
 
@@ -85,9 +85,6 @@ static int pci_mmcfg_write(int seg, int 
 		break;
 	}
 
-	/* Dummy read to flush PCI write */
-	readl(mmcfg_virt_addr);
-
 	spin_unlock_irqrestore(&pci_config_lock, flags);
 
 	return 0;
diff -urpN linux-2.6.9/arch/x86_64/pci/mmconfig.c linux-2.6.9-mmconfig-patch/arch/x86_64/pci/mmconfig.c
--- linux-2.6.9/arch/x86_64/pci/mmconfig.c	2004-10-18 17:53:41.000000000 -0400
+++ linux-2.6.9-mmconfig-patch/arch/x86_64/pci/mmconfig.c	2004-11-10 13:37:49.585506664 -0500
@@ -63,9 +63,6 @@ static int pci_mmcfg_write(int seg, int 
 		break;
 	}
 
-	/* Dummy read to flush PCI write */
-	readl(addr);
-
 	return 0;
 }
 

^ permalink raw reply	[flat|nested] 23+ messages in thread
* RE: [PATCH] pci-mmconfig fix for 2.6.9
@ 2004-11-15  7:07 Michael Chan
  2004-11-15  8:01 ` Grant Grundler
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Chan @ 2004-11-15  7:07 UTC (permalink / raw)
  To: Grant Grundler, Andi Kleen
  Cc: linux-kernel, linux-pci, akpm, greg, Durairaj, Sundarapandian

Grant Grundler wrote:
 
> There are two tests in arch/i386/pci/mmconfig.c:pci_mmcfg_init() before
> the raw_pci_ops is set to &pci_mmcfg. Perhaps some additional crude tests
> could select a different set of pci_raw_ops to deal with posted writes
> to mmconfig space. Someone more familiar with those chipsets might
> find a more elegant solution.

Do you mean something like pci_mmcfg1 for Intel chipsets that implement non-posted mmconfig and pci_mmcfg2 for other chipsets that may implement posted mmconfig? pci_mmcfg2's write method will guarantee that the write has reached the target before returning. If pci_mmcfg2's write method uses read from the target to flush the write, we are back to the original problem of out-of-spec read when writing the PMCSR register. If the flush does not require reading from the target device, then it's fine.
 
Michael


 


^ permalink raw reply	[flat|nested] 23+ messages in thread
* RE: [PATCH] pci-mmconfig fix for 2.6.9
@ 2004-11-13 16:22 Michael Chan
  2004-11-13 19:46 ` Grant Grundler
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Chan @ 2004-11-13 16:22 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Grant Grundler, linux-kernel, linux-pci, akpm, greg,
	Durairaj, Sundarapandian

> If I got the discussion so far correctly then the PCI-SGI spec does not
> guarantee that there is no posting, but you know that the chipset
> you are using right now doesn't do it.

Yes, that's my understanding of the spec. Grant Grundler does not agree and thinks that non-posting is the only compliant implementation. I wish he was right as it would be the easiest to deal with. We contacted Intel about the out-of-spec readl when writing to the PMCSR to change power state as they were the original author of the mmconfig code. Their solution was to remove the readl after confirming that mmconfig was non-posted on their chipsets.

> The problem with the explanation is that there will be very soon
> chipsets not from Intel that also implement PCI-Express. And also
> even systems with non Intel CPUs that also do PCI-Express and
> mmconfig.

I agree with you. Having the readl would be the safest and the most conservative approach. But it is out-of-spec to have a readl immediately following writel to the PMCSR when power state is changed. So should we have a relaxed version of pci_write_config_* with no flushing read for programming the PMCSR?
 
Michael



^ permalink raw reply	[flat|nested] 23+ messages in thread
* RE: [PATCH] pci-mmconfig fix for 2.6.9
@ 2004-11-12 23:56 Michael Chan
  2004-11-13  9:22 ` Andi Kleen
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Chan @ 2004-11-12 23:56 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Andi Kleen, linux-kernel, linux-pci, akpm, greg,
	Durairaj, Sundarapandian

On Fri, 12 Nov 2004, 15:31:59 -0700, Grant Grundler wrote:

> 
> On Fri, Nov 12, 2004 at 01:49:18PM -0800, Michael Chan wrote:
> > In short, I believe mmconfig is allowed to be posted or 
> non-posted. If 
> > it is posted, there must be a method to allow software to flush it.
> 
> Yes. Agreed.
> But existing direct access methods must implement 
> non-postable writes to be compliant.
> 
> E.g. the second paragraph of the Implementation Note:
> | In those cases in which the software must know that a posted 
> | transaction is completed by the completer, ...
> 
> IMHO, "In those cases" refers to the second class of systems. 
> i386 and x86_64 are (still) in the first class of "legacy" systems.

Hi Grant,

I think we are almost in agreement. IMHO, the entire ECN applies the
PC-compatible systems and any examples in the Implementation Note is a
valid implementation. mmconfig can be posted, as long as there is a way
to guarantee that the write has completed, such as flush by reading. We
disagree on this point.

The Intel mmconfig implementation is non-posted which is a valid
implementation. Therefore readl is unnecessary and can be removed.

Michael


^ permalink raw reply	[flat|nested] 23+ messages in thread
* RE: [PATCH] pci-mmconfig fix for 2.6.9
@ 2004-11-12 21:49 Michael Chan
  2004-11-12 22:31 ` Grant Grundler
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Chan @ 2004-11-12 21:49 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Andi Kleen, linux-kernel, linux-pci, akpm, greg,
	Durairaj, Sundarapandian

On Fri, 12 Nov 2004, 13:55:09 -0700, Grant Grundler wrote:

> Yes. I found it on page 5 of PciEx_ECN_MMCONFIG_040217.pdf. 
> AFAICT, this section only applies to "systems that implement 
> a processor-architecture-specific firmware interface 
> standard". e.g. ia64 SAL calls.
> 

Hi Grant,

I disagree with your interpretations of the ECN. Here's my
interpretations:

1. PC-compatible systems or systems that do not implement a
processor-architecture-specific firmware interface must implement
mmconfig as specified in the ECN.

2. mmconfig implementation must provide a method for software to
guarantee that the config access has completed before software execution
continues. In Implementation Note, it provides some examples on how to
do this. One example is to make mmconfig non-posted. But there are other
examples.

In short, I believe mmconfig is allowed to be posted or non-posted. If
it is posted, there must be a method to allow software to flush it.

Michael



^ permalink raw reply	[flat|nested] 23+ messages in thread
* RE: [PATCH] pci-mmconfig fix for 2.6.9
@ 2004-11-12 19:23 Michael Chan
  2004-11-12 20:55 ` Grant Grundler
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Chan @ 2004-11-12 19:23 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Andi Kleen, linux-kernel, linux-pci, akpm, greg,
	Durairaj, Sundarapandian

On Fri, 12 Nov 2004, 11:35:47 -0700, Grant Grundler wrote:

> Michael,
> Thanks for digging that up.
> I think Andi was looking for references to the PCI-E spec.
> I found such a statement in "PCI Express(TM) Base 
> Specification Revision 1.0a".
> 

Hi Grant,

I think it is well documented that config cycles are non-posted in PCI,
PCIX, and PCI Express specs as you pointed out. The only ambiguity is
whether the mmconfig memory cycle from the CPU to the chipset is posted
or not. The Implementation Note in the MMCONFIG ECN from pcisig (link
below) allows the mmconfig write cycle to be posted, meaning mmconfig
write cycle can complete before the real config write cycle completes.
That's why we needed confirmations from chipset engineers.

Michael

http://www.pcisig.com/specifications/pciexpress/specifications/specifica
tions


^ permalink raw reply	[flat|nested] 23+ messages in thread
* RE: [PATCH] pci-mmconfig fix for 2.6.9
@ 2004-11-12 17:52 Michael Chan
  2004-11-12 18:35 ` Grant Grundler
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Chan @ 2004-11-12 17:52 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, linux-pci, akpm, greg, Durairaj, Sundarapandian

On Wednesday, November 10, 2004 11:58 PM Andi Kleen wrote:
> Where is it guaranteed that these writes are non posted?

Intel chipset engineer confirmed that they are non-posted. Here are
excerpts from some email exchange with Intel. We reported the
out-of-spec dummy read problem to Intel a while back.

Begin forwarded message:

Hi Michael,

I have checked our chipset engineer. He specified that the mmconfig is
truly non posted and memory cycles will be completed only after the
config write are finished.

So I think flushing is not necessary and readl can be removed.

Thanks,
Sundar

-----Original Message-----
From: Michael Chan [mailto:mchan@broadcom.com] 
Sent: Friday, November 05, 2004 5:08 AM
To: Durairaj, Sundarapandian
Subject: RE: MMCONFIG Bug

Hi Sundar,

Thanks for the update. I agree that config cycles are non-posted and
therefore flushing is unnecessary. However, since config cycles are not
directly generated by the CPU, it is a bit more complicated. When the
CPU issues the memory cycle (writel) to the chipset, the chipset will
translate the memory cycle into a PCI Express config request on the
appropriate PCI Express link. The target device will then return a
completion.

Is the memory cycle to the chipset posted or not? In other words, does
the chipset complete the memory cycle before issuing the config cycle,
or does it issue the config cycle and wait for completion before
completing the memory cycle (writel) from the CPU? If the latter is
true, then it is unnecessary to flush the writel as it is truly
non-posted. If the former is true, I think flushing is still necessary.
Just wanted to confirm this.

Thanks,
Michael



^ permalink raw reply	[flat|nested] 23+ messages in thread
* RE: [PATCH] pci-mmconfig fix for 2.6.9
@ 2004-11-11 16:33 Nguyen, Tom L
  0 siblings, 0 replies; 23+ messages in thread
From: Nguyen, Tom L @ 2004-11-11 16:33 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, linux-pci, akpm, greg, Durairaj, Sundarapandian,
	Nguyen, Tom L

On Wednesday, November 10, 2004 11:58 PM Andi Kleen wrote:
>Where is it guaranteed that these writes are non posted?

Please refer to section 2.6.1 Flow Control Rules of PCI Express Base
Specs Rev1.0. 

Thanks,
Long

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH] pci-mmconfig fix for 2.6.9
@ 2004-11-10 19:38 long
  2004-11-10 19:35 ` Greg KH
  0 siblings, 1 reply; 23+ messages in thread
From: long @ 2004-11-10 19:38 UTC (permalink / raw)
  To: greg, linux-kernel, linux-pci
  Cc: akpm, sundarapandian.durairaj, tom.l.nguyen, willy

On Wed, Nov 10, 2004 at 9:36 Greg KH wrote:
> Your patch is line wrapped, and mime encoded.

On behalf of Sundar, I redo it.
Signed-off-by: Sundarapandian Durairaj

Thanks,
Long
---------------------------------------------------------------
diff -Naur linux-2.6.9/arch/i386/pci/mmconfig.c pcie-fix-linux-2.6.9/arch/i386/pci/mmconfig.c
--- linux-2.6.9/arch/i386/pci/mmconfig.c	2004-10-19 03:24:38.000000000 +0530
+++ pcie-fix-linux-2.6.9/arch/i386/pci/mmconfig.c	2004-11-04 11:35:36.029054848 +0530
@@ -30,7 +30,7 @@
 	u32 dev_base = pci_mmcfg_base_addr | (bus << 20) | (devfn << 12);
 	if (dev_base != mmcfg_last_accessed_device) {
 		mmcfg_last_accessed_device = dev_base;
-		set_fixmap(FIX_PCIE_MCFG, dev_base);
+		set_fixmap_nocache(FIX_PCIE_MCFG, dev_base);
 	}
 }
 
@@ -85,9 +85,6 @@
 		break;
 	}
 
-	/* Dummy read to flush PCI write */
-	readl(mmcfg_virt_addr);
-
 	spin_unlock_irqrestore(&pci_config_lock, flags);
 
 	return 0;
diff -Naur linux-2.6.9/arch/x86_64/pci/mmconfig.c pcie-fix-linux-2.6.9/arch/x86_64/pci/mmconfig.c
--- linux-2.6.9/arch/x86_64/pci/mmconfig.c	2004-10-19 03:23:41.000000000 +0530
+++ pcie-fix-linux-2.6.9/arch/x86_64/pci/mmconfig.c	2004-11-04 11:39:21.989703608 +0530
@@ -63,9 +63,6 @@
 		break;
 	}
 
-	/* Dummy read to flush PCI write */
-	readl(addr);
-
 	return 0;
 }
 

^ permalink raw reply	[flat|nested] 23+ messages in thread
* [PATCH] pci-mmconfig fix for 2.6.9
@ 2004-11-10 17:26 Durairaj, Sundarapandian
  2004-11-10 17:35 ` Greg KH
  0 siblings, 1 reply; 23+ messages in thread
From: Durairaj, Sundarapandian @ 2004-11-10 17:26 UTC (permalink / raw)
  To: linux-pci, linux-kernel
  Cc: Seshadri, Harinarayanan, Carbonari, Steven, Michael Chan,
	Goswami, Pranjal, Greg KH, Andrew Morton, Matthew Wilcox

Here I have attached pci mmconfig fix for 2.6.9 kernel. 

This will fix the flush error in pci_mmcfg_write.

When pci_mmcfg_write is used to program the PMCSR in the Power
Management Capability structure of PCI config space in the PCI Express
device to a different power state, the dummy readl to flush the previous
write violates the transition delay specified in the PCI power
management spec. Please see PCI Power Management Spec. 1.2 Table 5-6.
For example, while changing the power state of the device through PMCSR
register, a transition delay of 10msec is required before any access can
be made to the device.

Since the configuration write access for PCI Express is non posted,
flushing is not necessary and  it will be safe to remove the dummy
readl. 

This patch will remove dummy readl function implemented in
"pci_mmcfg_write" and use set_fixmap_nocahe instead of set_fixmap.

Thanks,
Sundar

diff -Naur linux-2.6.9/arch/i386/pci/mmconfig.c
pcie-fix-linux-2.6.9/arch/i386/pci/mmconfig.c
--- linux-2.6.9/arch/i386/pci/mmconfig.c	2004-10-19
03:24:38.000000000 +0530
+++ pcie-fix-linux-2.6.9/arch/i386/pci/mmconfig.c	2004-11-04
11:35:36.029054848 +0530
@@ -30,7 +30,7 @@
 	u32 dev_base = pci_mmcfg_base_addr | (bus << 20) | (devfn <<
12);
 	if (dev_base != mmcfg_last_accessed_device) {
 		mmcfg_last_accessed_device = dev_base;
-		set_fixmap(FIX_PCIE_MCFG, dev_base);
+		set_fixmap_nocache(FIX_PCIE_MCFG, dev_base);
 	}
 }
 
@@ -85,9 +85,6 @@
 		break;
 	}
 
-	/* Dummy read to flush PCI write */
-	readl(mmcfg_virt_addr);
-
 	spin_unlock_irqrestore(&pci_config_lock, flags);
 
 	return 0;
diff -Naur linux-2.6.9/arch/x86_64/pci/mmconfig.c
pcie-fix-linux-2.6.9/arch/x86_64/pci/mmconfig.c
--- linux-2.6.9/arch/x86_64/pci/mmconfig.c	2004-10-19
03:23:41.000000000 +0530
+++ pcie-fix-linux-2.6.9/arch/x86_64/pci/mmconfig.c	2004-11-04
11:39:21.989703608 +0530
@@ -63,9 +63,6 @@
 		break;
 	}
 
-	/* Dummy read to flush PCI write */
-	readl(addr);
-
 	return 0;
 }


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2004-11-15  9:00 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-11-10 22:20 [PATCH] pci-mmconfig fix for 2.6.9 long
2004-11-11  1:30 ` Greg KH
2004-11-11  7:57 ` Andi Kleen
  -- strict thread matches above, loose matches on Subject: below --
2004-11-15  7:07 Michael Chan
2004-11-15  8:01 ` Grant Grundler
2004-11-13 16:22 Michael Chan
2004-11-13 19:46 ` Grant Grundler
2004-11-14  8:58   ` Andi Kleen
2004-11-15  6:00     ` Grant Grundler
2004-11-15  8:33       ` Andi Kleen
2004-11-12 23:56 Michael Chan
2004-11-13  9:22 ` Andi Kleen
2004-11-12 21:49 Michael Chan
2004-11-12 22:31 ` Grant Grundler
2004-11-12 19:23 Michael Chan
2004-11-12 20:55 ` Grant Grundler
2004-11-12 17:52 Michael Chan
2004-11-12 18:35 ` Grant Grundler
2004-11-11 16:33 Nguyen, Tom L
2004-11-10 19:38 long
2004-11-10 19:35 ` Greg KH
2004-11-10 17:26 Durairaj, Sundarapandian
2004-11-10 17:35 ` Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox