public inbox for linux-kernel@vger.kernel.org
From: Stephen Hemminger <shemminger@linux-foundation.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Houston <mikeserv@bmts.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Linux 2.6.22-rc2
Date: Wed, 23 May 2007 07:58:20 -0700
Message-ID: <20070523075820.3d9fc3f8@freepuppy>
In-Reply-To: <alpine.LFD.0.98.0705221817150.3890@woody.linux-foundation.org>

On Tue, 22 May 2007 18:53:33 -0700 (PDT)
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 
> 
> On Tue, 22 May 2007, Stephen Hemminger wrote:
> > 
> > It looks like the chip reads the wrong memory sometimes. The problem happens
> > only on the on-board NICs and only on this kind of motherboard.
> 
> Do you know if it happens for particular addresses? (Ie, can you tell what 
> the physical address of the descriptor is for the errors?)

I'll look but there didn't seem to be an obvious pattern when I last looked.


> 
> > For testing, I put in code to check that the receive data had actually
> > arrived before the IRQ; it triggered on my Gigabyte 925 motherboard. It
> > appears that DMA access is messed up.
> 
> Yes, that certainly would also explain memory corruption. Either because 
> writes went to the wrong address, or because writes went to the right 
> address, but because an earlier IO descriptor read had gotten corrupted, 
> the "right address" was in fact the wrong one ;)
> 
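
For illustration only: one common way to test whether receive data has
landed by the time the interrupt fires is to pre-fill each receive buffer
with a known pattern and see whether it is still there. This is a sketch
of that idea, not the actual sky2 test code; the names POISON, poison()
and data_arrived() are hypothetical.

```python
POISON = 0x5A  # hypothetical fill byte; any value unlikely to fill a whole frame works


def poison(buf: bytearray) -> None:
    """Fill a receive buffer with the poison byte before handing it to the NIC."""
    buf[:] = bytes([POISON]) * len(buf)


def data_arrived(buf: bytearray, length: int) -> bool:
    """Return True if any of the first `length` bytes no longer hold the poison.

    If the device reports a completed receive of `length` bytes but the
    buffer is still fully poisoned, the DMA write either has not landed yet
    or went somewhere else.
    """
    return any(b != POISON for b in buf[:length])


rx = bytearray(64)
poison(rx)
stale = not data_arrived(rx, 60)   # nothing written yet
rx[0:4] = b"\x00\x01\x02\x03"      # simulate the device writing a header
fresh = data_arrived(rx, 60)
```

A frame that happens to consist entirely of the poison byte would be
misreported, so this is a debugging aid rather than a correctness check.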
> The reason I ask whether you have some way of telling the pattern for the 
> physical address is that one traditional cause of DMA errors is due to 
> broken RAM remapping setup.
> 
> As an example of that - imagine that you have 1GB of RAM in the machine, 
> and realize that the memory behind the 640kB -> 1MB area isn't accessible, 
> because it's taken up by the legacy ISA region.
> 
> You have two possible outcomes: either (a) the memory is just "gone", and 
> you lost it, or (b) there is some RAM remapping in the core chipset that 
> makes the lost 384kB show up _above_ the 1GB mark instead.
> 
> The same "legacy ISA" hole situation happens for the "legacy PCI" hole, 
> which is why if you have 4GB of RAM in the machine, usually you'll see 
> 3GB at addresses 0-3GB (roughly), and then you'll see the rest at above 
> the 4GB mark, in order to have a nice PCI hole in the 32-bit access range.
> 
> There's also the "legacy 286" hole at the 15-16MB mark (which nobody uses 
> any more, but chipsets still inexplicably support), and the SMM remapping. 
> 
> Anyway, core chipsets generally do CPU memory accesses _differently_ from 
> DMA accesses from the PCI bus (at a minimum, SMM is something that only 
> the CPU can do), so I could see a situation where the remapping was set up 
> correctly for the CPU (and perhaps for "core chipset" devices like the 
> integrated southbridge), but devices that do DMA from the outside get 
> screwed over.
>
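
A toy model of the remapping described above may make the failure mode
concrete. It assumes a single PCI hole ending at 4GB and a chipset that
relocates the RAM hidden by the hole to just above 4GB; the hole start
(3GB here) and all function names are assumptions for the example, not
values read from this board.

```python
GIB = 1 << 30


def cpu_to_dram(addr, ram_size, hole_start, hole_end=4 * GIB):
    """Map a CPU physical address to a DRAM offset, or None if no RAM is there.

    RAM below the hole is identity-mapped; RAM that the hole would hide is
    remapped to start at `hole_end`.
    """
    low = min(ram_size, hole_start)      # RAM visible below the hole
    if addr < low:
        return addr
    remapped = ram_size - low            # bytes relocated above the hole
    if hole_end <= addr < hole_end + remapped:
        return low + (addr - hole_end)
    return None                          # MMIO hole or beyond the end of RAM


def dma_to_dram_without_remap(addr, ram_size, hole_start):
    """Hypothetical broken DMA path that knows low RAM but not the remap window."""
    low = min(ram_size, hole_start)
    return addr if addr < low else None


# 4GB of RAM, hole assumed to start at 3GB: the top 1GB shows up above 4GB.
cpu_view = cpu_to_dram(4 * GIB, 4 * GIB, 3 * GIB)
dma_view = dma_to_dram_without_remap(4 * GIB, 4 * GIB, 3 * GIB)
```

In this model the CPU and the broken DMA path agree on every address below
the hole and disagree only in the remapped window, which is why the
physical address of a bad descriptor would be a useful clue. With 2GB
installed and a hole that starts above 2GB, the model remaps nothing.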

This board doesn't have any onboard video so that helps. I am running
with 2GB of memory.

I can put a card with a similar chip in an x1 slot, and there are no
problems.  Same driver, but different bridges and a slightly different
Marvell chip.
 
> But it might not happen for all addresses. Non-remapped stuff might work 
> well, so if there is some way of figuring out what the bad DMA address was 
> for an erroneous access, that might offer some clues.
> 
> > This board has lots of "overclocker" friendly stuff; maybe the BIOS 
> > never really sets up the PCI bridges and clocks properly.
> 
> It's hard to set up a normal PCI-PCI bridge subtly incorrectly. But 
> special RAM timing or remapping stuff for the host bridge - sure.
> 
> > It doesn't seem like a software or driver problem. I have tried tweaking PCI
> > registers but nothing worked in this case.
> 
> Yeah, the PCI registers that would affect things like this tend to be in 
> the host bridge, not on the normal device.
> 
> That said, Intel doesn't generally do the really insane things. And a lot 
> of the old remapping stuff is simply not done any more. For example, I 
> doubt that the 925 chipset even supports remapping the 640k-1M range any 
> more: 384kB just isn't worth it when people talk about gigs of RAM, the 
> way it was when 16MB was considered a lot.
> 
> And looking quickly at the Intel 925X MCH (memory controller hub) 
> registers, nothing jumps out as a good candidate for some obvious bug. 
> 
> 			Linus

Here is the PCI controller chain to the device:

00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 02) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0, Cache Line Size: 32 bytes
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	I/O behind bridge: 00005000-00005fff
	Memory behind bridge: fff00000-000fffff
	Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-
	Capabilities: [40] Express Root Port (Slot+) IRQ 0
		Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
		Device: Latency L0s unlimited, L1 unlimited
		Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
		Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
		Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
		Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s, Port 1
		Link: Latency L0s <1us, L1 <4us
		Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
		Link: Speed 2.5Gb/s, Width x0
		Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug+ Surpise+
		Slot: Number 16, PowerLimit 10.000000
		Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
		Slot: AttnInd Unknown, PwrInd Unknown, Power-
		Root: Correctable- Non-Fatal- Fatal- PME-
	Capabilities: [80] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+
		Address: fee0300c  Data: 4169
	Capabilities: [90] Subsystem: Giga-byte Technology Unknown device 5001
	Capabilities: [a0] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100] Virtual Channel
	Capabilities: [180] Unknown (5)

00:1c.4 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 5 (rev 02) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0, Cache Line Size: 32 bytes
	Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
	I/O behind bridge: 0000a000-0000afff
	Memory behind bridge: f8000000-f9ffffff
	Prefetchable memory behind bridge: 0000000080100000-00000000801fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-
	Capabilities: [40] Express Root Port (Slot+) IRQ 0
		Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
		Device: Latency L0s unlimited, L1 unlimited
		Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
		Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
		Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
		Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s, Port 5
		Link: Latency L0s <256ns, L1 <4us
		Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch-
		Link: Speed 2.5Gb/s, Width x1
		Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug+ Surpise+
		Slot: Number 20, PowerLimit 10.000000
		Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
		Slot: AttnInd Unknown, PwrInd Unknown, Power-
		Root: Correctable- Non-Fatal- Fatal- PME-
	Capabilities: [80] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+
		Address: fee0300c  Data: 4181
	Capabilities: [90] Subsystem: Giga-byte Technology Unknown device 5001
	Capabilities: [a0] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100] Virtual Channel
	Capabilities: [180] Unknown (5)

05:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 14)
	Subsystem: Giga-byte Technology Unknown device e000
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0, Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 14
	Region 0: Memory at f9000000 (64-bit, non-prefetchable) [size=16K]
	Region 2: I/O ports at a000 [size=256]
	[virtual] Expansion ROM at 80100000 [disabled] [size=128K]
	Capabilities: [48] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] Vital Product Data
	Capabilities: [5c] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
		Address: 0000000000000000  Data: 0000
	Capabilities: [e0] Express Legacy Endpoint IRQ 0
		Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
		Device: Latency L0s unlimited, L1 unlimited
		Device: AtnBtn- AtnInd- PwrInd-
		Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
		Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
		Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
		Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s L1, Port 0
		Link: Latency L0s <256ns, L1 unlimited
		Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
		Link: Speed 2.5Gb/s, Width x1
	Capabilities: [100] Advanced Error Reporting
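
For illustration only, the bridge windows in the listing above can be
checked mechanically. On a PCI-to-PCI bridge a window whose base is above
its limit is disabled, which is what port 1 (00:1c.0) shows for memory;
port 5 (00:1c.4) should contain the NIC's regions. The helper names below
are hypothetical and are not part of pciutils or the kernel.

```python
import re

# Matches the "... behind bridge: base-limit" lines printed by lspci -vv.
WINDOW_RE = re.compile(
    r"^\s*(I/O|Memory|Prefetchable memory) behind bridge: ([0-9a-f]+)-([0-9a-f]+)",
    re.M,
)


def bridge_windows(lspci_text):
    """Return {kind: (base, limit)}; a disabled window (base > limit) maps to None."""
    windows = {}
    for kind, base, limit in WINDOW_RE.findall(lspci_text):
        lo, hi = int(base, 16), int(limit, 16)
        windows[kind] = (lo, hi) if lo <= hi else None
    return windows


def inside(window, start, size):
    """True if [start, start + size) lies entirely within an enabled window."""
    return window is not None and window[0] <= start and start + size - 1 <= window[1]


PORT1 = """
    I/O behind bridge: 00005000-00005fff
    Memory behind bridge: fff00000-000fffff
    Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
"""
PORT5 = """
    I/O behind bridge: 0000a000-0000afff
    Memory behind bridge: f8000000-f9ffffff
    Prefetchable memory behind bridge: 0000000080100000-00000000801fffff
"""

port1 = bridge_windows(PORT1)
port5 = bridge_windows(PORT5)
bar0_ok = inside(port5["Memory"], 0xF9000000, 16 * 1024)             # Region 0
rom_ok = inside(port5["Prefetchable memory"], 0x80100000, 128 * 1024)  # Expansion ROM
io_ok = inside(port5["I/O"], 0xA000, 256)                             # Region 2
```

This only confirms that the address decode on the path to the device is
self-consistent; it says nothing about how the host bridge routes DMA in
the other direction.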


-- 
Stephen Hemminger <shemminger@linux-foundation.org>
