* RE: S2464 (K7 Thunder) hangs -- some lessons learned
@ 2001-08-14 11:55 Ryan C. Bonham
0 siblings, 0 replies; 13+ messages in thread
From: Ryan C. Bonham @ 2001-08-14 11:55 UTC
To: Paul G. Allen, Mark Hahn, linux-kernel; +Cc: Alan Cox (E-mail)
> -----Original Message-----
> From: Paul G. Allen [mailto:pgallen@randomlogic.com]
> Sent: Monday, August 13, 2001 10:06 PM
> To: Mark Hahn; linux-kernel@vger.kernel.org
> Subject: Re: S2464 (K7 Thunder) hangs -- some lessons learned
>
>
> Mark Hahn wrote:
> >
> > > I don't find the errata. Can you hold my hand and point me to it? :)
> >
> > hopefully, you looked, and noticed the several "revision guides"
> > that AMD makes available. they contain the errata. Intel calls
> > them "spec updates", I think.
>
> I have the guide and unless it's been updated (the date on the site does
> not show it has) I have the latest revision of the chipset. (I also noted
> a DMA and an AGP issue in a post about two weeks ago).
The AGP issue was resolved in the -ac kernels, and I think the DMA issue was
also... Alan?
Ryan
>
> PGA
>
> --
> Paul G. Allen
> UNIX Admin II/Programmer
> Akamai Technologies, Inc.
> www.akamai.com
> Work: (858)909-3630
> Cell: (858)395-5043
* S2464 (K7 Thunder) hangs -- some lessons learned
@ 2001-08-13 1:24 Eric S. Raymond
2001-08-13 1:41 ` Paul G. Allen
2001-08-13 12:34 ` Alan Cox
0 siblings, 2 replies; 13+ messages in thread
From: Eric S. Raymond @ 2001-08-13 1:24 UTC
To: Linux Kernel List
Alas, the 2.4.8+ emu10k1 driver does not completely banish the K7 Thunder
lockups problem. It makes them a lot rarer, though, and enabled us to get to
the next level of diagnosis.
More from the article in progress:
<para>But as it turned out, the story didn't end there. The 2.4.8+ driver
doesn't completely banish the hangs; early in the morning of the third day,
while I was asleep, Gary tripped over a way to re-induce them by logging
into the machine via <command>ssh</command> while an X build is running. I
didn't yet know this when I next read my mail and saw a report from Jeffrey
Ingber of the linux-kernel list that he had continued to see emu10k1
lockups after installing 2.4.8 -- but that they were banished by the ALSA
drivers.</para>
<para>Further testing proved, in fact, that the presence of the SB Live!
in the machine can make it vulnerable to lockups triggered by network
activity even when the emu10k1 support is not configured in at all! This
takes the operating system out of the picture and suggests a hardware-
or BIOS-level problem. Our suspicions were immediately directed to PCI
IRQ sharing, a well-known source of lossage.</para>
<para>Upon investigation (via <filename>/proc/pci</filename>), we
discovered that the IRQ assignments looked distinctly dubious. IRQs
shared between on-board devices didn't bother us; we presumed the board
designers had been smart enough to avoid conflicts. But IRQs shared
between on-board and daughtercard devices looked like they might be
part of the problem.</para>
<para>Unlike some other PCI BIOSes, the S2464's doesn't give you the
capability to wire IRQs to specific card slots. While looking for this,
however, we found a BIOS setting that seemed relevant -- "Use PCI Interrupt
Entries In MP Table". When we switched it to `Yes', rebooted, and looked at
<filename>/proc/pci</filename>, the IRQ assignments looked a lot saner --
and when we tested, the <command>ssh</command> hang was gone!</para>
OK, so the lessons here are:
1. The S2464 needs to be configured with "Use PCI Interrupt Entries In MP
Table" for sanity to prevail, and
2. When you see a box hang that's clearly related to a daughtercard, *run*
(do not walk) to your local /proc directory, cat /proc/pci and check out
the IRQ assignments.
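
As a rough illustration of lesson 2, here is a minimal Python sketch that
scans /proc/pci-style text and reports IRQs claimed by more than one device.
The layout it parses ("Bus N, device N, function N:" followed by an indented
"IRQ N." line) is an assumption about the 2.4-era file, and the sample text,
device names and the function name shared_irqs are invented for the example;
they are not readings from the S2464.

```python
import re
from collections import defaultdict

# Invented sample in the assumed 2.4-era /proc/pci layout (not real S2464 data).
SAMPLE_PROC_PCI = """\
PCI devices found:
  Bus  0, device   0, function  0:
    Host bridge: Example Host Bridge (rev 17).
  Bus  0, device   8, function  0:
    Multimedia audio controller: Example Sound Card (rev 7).
      IRQ 10.
      I/O at 0xd800 [0xd81f].
  Bus  0, device   9, function  0:
    Ethernet controller: Example NIC (rev 16).
      IRQ 10.
  Bus  0, device  10, function  0:
    SCSI storage controller: Example SCSI (rev 1).
      IRQ 11.
"""

DEVICE_RE = re.compile(r"^\s*Bus\s+(\d+),\s+device\s+(\d+),\s+function\s+(\d+):")
IRQ_RE = re.compile(r"^\s*IRQ\s+(\d+)\.")


def shared_irqs(text):
    """Map each IRQ used by two or more devices to their bus:device.function."""
    by_irq = defaultdict(list)
    current = None  # location of the device block we are currently inside
    for line in text.splitlines():
        device = DEVICE_RE.match(line)
        if device:
            bus, dev, func = (int(g) for g in device.groups())
            current = "{}:{}.{}".format(bus, dev, func)
            continue
        irq = IRQ_RE.match(line)
        if irq and current is not None:
            by_irq[int(irq.group(1))].append(current)
    # Keep only the IRQs that more than one device is routed to.
    return {irq: devs for irq, devs in by_irq.items() if len(devs) > 1}


if __name__ == "__main__":
    for irq, devs in sorted(shared_irqs(SAMPLE_PROC_PCI).items()):
        print("IRQ {} shared by: {}".format(irq, ", ".join(devs)))
```

On a kernel that still provides the file, the same function could be fed
open("/proc/pci").read() instead of the sample. Sharing by itself is not
proof of a problem; the output is only a list of places to look first.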
I'm not certain we've nailed the entire problem yet -- we still need to test
with the emu10k1 sound driver linked in. But it's looking pretty good.
BTW, somebody mailed me an explanation of that BIOS setting ("Use PCI
Interrupt Entries In MP Table") but I managed to lose it. Whoever you
are, could you remail? I want to include some sort of explanation in
the article.
--
<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>
The people cannot delegate to government the power to do anything
which would be unlawful for them to do themselves.
-- John Locke, "A Treatise Concerning Civil Government"
* Re: S2464 (K7 Thunder) hangs -- some lessons learned
2001-08-13 1:24 Eric S. Raymond
@ 2001-08-13 1:41 ` Paul G. Allen
2001-08-13 5:12 ` Christopher Abbey
2001-08-13 12:34 ` Alan Cox
1 sibling, 1 reply; 13+ messages in thread
From: Paul G. Allen @ 2001-08-13 1:41 UTC
Cc: Linux Kernel List
(Small note. The K7 Thunder is S2462, unless there is another, possibly
newer, version released?)
"Eric S. Raymond" wrote:
>
[SNIP]
>
> OK, so the lessons here are:
>
> 1. The S2464 needs to be configured with "Use PCI Interrupt Entries In MP
> Table" for sanity to prevail, and
I have been running my K7 in this mode since purchase. Could this be why
I see no SB Live!/ EMU10K problems (though I am running 2.4.7 kernels
now)?
>
> 2. When you see a box hang that's clearly related to a daughtercard, *run*
> (do not walk) to your local /proc directory, cat /proc/pci and check out
> the IRQ assignments.
Problem is, when it does hang, I can't get there as the system is
completely locked, including ssh and telnet.
PGA
--
Paul G. Allen
UNIX Admin II/Network Security
Akamai Technologies, Inc.
www.akamai.com
* Re: S2464 (K7 Thunder) hangs -- some lessons learned
2001-08-13 1:41 ` Paul G. Allen
@ 2001-08-13 5:12 ` Christopher Abbey
0 siblings, 0 replies; 13+ messages in thread
From: Christopher Abbey @ 2001-08-13 5:12 UTC
To: Linux Kernel List
Yesterday, Paul G. Allen wrote:
> > 2. When you see a box hang that's clearly related to a daughtercard, *run*
> > (do not walk) to your local /proc directory, cat /proc/pci and check out
> > the IRQ assignments.
lspci -vvv is also useful.
> Problem is, when it does hang, I can't get there as the system is
> completely locked, including ssh and telnet.
But the point is to go look at the PCI interrupt assignments *before*
the hang occurs. I've seen the same situation, where two devices are
sharing an interrupt, one on the mobo, the other in a PCI slot... it's
never been a good thing in my experience. As Eric pointed out, if they're
both on the mobo you have to hope the designers built the hardware to
handle that, or if they're both in PCI slots you can usually expect
the cards will play well with others. It's the third case that's
trouble, and then it's time to do as Eric did: get into the BIOS and
change the assignments (or in this case something that would cause a
change to happen).
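
The three cases above can be sketched in Python against lspci -vvv output.
This is only a sketch under stated assumptions: it relies on the
"Interrupt: pin A routed to IRQ N" lines that lspci prints in verbose mode,
and lspci cannot tell which devices are soldered to the board, so the set of
on-board addresses is something you supply yourself. The sample text, the
addresses and the name mixed_irq_sharing are made up for illustration.

```python
import re
from collections import defaultdict

# A device header line starts in column 0 with bus:device.function
# (optionally prefixed by a PCI domain such as "0000:").
HEADER_RE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")
IRQ_RE = re.compile(r"Interrupt: pin \w routed to IRQ (\d+)")

# Invented sample resembling `lspci -vvv` output (not real S2464 data).
SAMPLE_LSPCI = """\
00:07.1 IDE interface: Example IDE Controller (rev 01)
\tInterrupt: pin A routed to IRQ 11
00:08.0 Multimedia audio controller: Example Sound Card (rev 07)
\tInterrupt: pin A routed to IRQ 10
00:0f.0 Ethernet controller: Example Onboard NIC (rev 78)
\tInterrupt: pin A routed to IRQ 10
00:10.0 Ethernet controller: Example Onboard NIC (rev 78)
\tInterrupt: pin A routed to IRQ 11
"""

# Hypothetical: which addresses are on the motherboard is the user's knowledge.
ONBOARD = {"00:07.1", "00:0f.0", "00:10.0"}


def mixed_irq_sharing(lspci_text, onboard):
    """Return IRQs shared between at least one on-board and one slot device."""
    by_irq = defaultdict(list)
    current = None
    for line in lspci_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            continue
        irq = IRQ_RE.search(line)
        if irq and current is not None:
            by_irq[int(irq.group(1))].append(current)
    report = {}
    for irq, devices in by_irq.items():
        on = [d for d in devices if d in onboard]
        slot = [d for d in devices if d not in onboard]
        if on and slot:  # the "third case": mobo device plus slot card
            report[irq] = {"onboard": on, "slot": slot}
    return report


if __name__ == "__main__":
    for irq, groups in sorted(mixed_irq_sharing(SAMPLE_LSPCI, ONBOARD).items()):
        print("IRQ {}: onboard {} / slot {}".format(
            irq, groups["onboard"], groups["slot"]))
```

IRQs shared only among on-board devices, or only among slot cards, are
deliberately left out of the report, matching the reasoning above.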
--
now the forces of openness have a powerful and
unexpected new ally - http://ibm.com/linux
* Re: S2464 (K7 Thunder) hangs -- some lessons learned
2001-08-13 1:24 Eric S. Raymond
2001-08-13 1:41 ` Paul G. Allen
@ 2001-08-13 12:34 ` Alan Cox
2001-08-13 15:18 ` Eric S. Raymond
1 sibling, 1 reply; 13+ messages in thread
From: Alan Cox @ 2001-08-13 12:34 UTC
To: esr; +Cc: Linux Kernel List
> Alas, the 2.4.8+ emu10k1 driver does not completely banish the K7 Thunder
> lockups problem. It makes them a lot rarer, though, and enabled us to get to
> the next level of diagnosis.
What version of the chipset do you have? The current ones can hang the PCI bus
during IDE transfers if you have IDE read/write prefetch enabled in the BIOS
setup.
It also has problems with the APIC implementation where an IRQ masked in
the APIC re-occurs, which can hang the system. Worryingly, this one is marked
'nofix'. You might want to try running with "noapic".
Alan
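
If you do boot with "noapic", it is worth confirming that the option actually
reached the kernel. A minimal Python sketch, assuming the running kernel
exposes its boot command line in /proc/cmdline; the helper name
has_boot_param and the sample command lines are invented for the example.

```python
def has_boot_param(cmdline, param):
    """True if `param` appears as its own token (bare or as param=value)."""
    return any(
        token == param or token.startswith(param + "=")
        for token in cmdline.split()
    )


if __name__ == "__main__":
    # Invented sample command line; on a live system read /proc/cmdline instead.
    sample = "ro root=/dev/sda1 noapic"
    print("noapic present:", has_boot_param(sample, "noapic"))
```

Matching whole tokens rather than substrings keeps a different option such as
"nolapic" from being mistaken for "noapic".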
* Re: S2464 (K7 Thunder) hangs -- some lessons learned
2001-08-13 12:34 ` Alan Cox
@ 2001-08-13 15:18 ` Eric S. Raymond
2001-08-13 15:46 ` Alan Cox
0 siblings, 1 reply; 13+ messages in thread
From: Eric S. Raymond @ 2001-08-13 15:18 UTC
To: Alan Cox; +Cc: Linux Kernel List
Alan Cox <alan@lxorguk.ukuu.org.uk>:
> > Alas, the 2.4.8+ emu10k1 driver does not completely banish the K7 Thunder
> > lockups problem. It makes them a lot rarer, though, and enabled us to get
> > to the next level of diagnosis.
>
> What version of the chipset do you have? The current ones can hang
> the PCI bus during IDE transfers if you have IDE read/write prefetch
> enabled in the bios setup.
I don't know what version we have. Is there a way to query it through /proc?
We have IDE disabled in the BIOS, so we're not likely to see this bug.
> It also has problems with the APIC implementation where an IRQ masked in
> the APIC re-occurs, which can hang the system. Worryingly, this one is marked
> 'nofix'. You might want to try running with "noapic".
I'll bear that in mind if the lockups recur. I'll copy this to Gary, who
might find himself building IDE systems around this board.
--
<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>
"America is at that awkward stage. It's too late to work within the system,
but too early to shoot the bastards."
-- Claire Wolfe
* Re: S2464 (K7 Thunder) hangs -- some lessons learned
2001-08-13 15:18 ` Eric S. Raymond
@ 2001-08-13 15:46 ` Alan Cox
2001-08-13 15:52 ` Eric S. Raymond
2001-08-14 1:45 ` Paul G. Allen
0 siblings, 2 replies; 13+ messages in thread
From: Alan Cox @ 2001-08-13 15:46 UTC
To: esr; +Cc: Alan Cox, Linux Kernel List
> I don't know what version we have. Is there a way to query it through /proc?
You need to look at the lspci hex data. There's an errata document for the
MP chipset on www.amd.com if you really want to scare yourself 8)
Alan
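
For anyone unsure how to read the hex data: in the standard PCI configuration
header the vendor ID sits at offsets 0x00-0x01 and the device ID at
0x02-0x03 (both little-endian), and the revision ID is the single byte at
offset 0x08. A minimal Python sketch that pulls those fields out of the dump
printed by "lspci -x -s <address>" for one device; the sample bytes and the
name read_pci_ids are invented for illustration and are not readings from
this board.

```python
import re

# A hex row looks like "00: 22 10 0c 70 ..." (offset, colon, then byte pairs).
ROW_RE = re.compile(r"([0-9a-f]+):((?: [0-9a-f]{2})+)")

# Invented sample for a single device (not real S2464 data).
SAMPLE_HEXDUMP = """\
00:00.0 Host bridge: Example Host Bridge (rev 11)
00: 22 10 0c 70 06 00 10 22 11 00 00 06 00 40 00 00
10: 08 00 00 f8 08 10 00 f4 01 e8 00 00 00 00 00 00
"""


def read_pci_ids(hexdump):
    """Extract vendor, device and revision IDs from one device's hex dump."""
    config = {}
    for line in hexdump.splitlines():
        row = ROW_RE.fullmatch(line.strip().lower())
        if not row:
            continue  # device description line or blank line
        base = int(row.group(1), 16)
        for i, byte in enumerate(row.group(2).split()):
            config[base + i] = int(byte, 16)
    if any(offset not in config for offset in (0, 1, 2, 3, 8)):
        raise ValueError("hex dump does not cover offsets 0x00-0x08")
    return {
        "vendor": config[0] | (config[1] << 8),    # little-endian 16-bit
        "device": config[2] | (config[3] << 8),
        "revision": config[8],
    }


if __name__ == "__main__":
    ids = read_pci_ids(SAMPLE_HEXDUMP)
    print("vendor {vendor:04x} device {device:04x} rev {revision:02x}".format(**ids))
```

The revision byte is what you would compare against the revision listed in
the chipset's errata document. The sketch expects the dump of one device
only; feeding it several devices at once would mix their bytes together.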
--
"Have you noticed the way people's intelligence capabilities decline
sharply the minute they start waving guns around?"
-- Dr. Who
* Re: S2464 (K7 Thunder) hangs -- some lessons learned
2001-08-13 15:46 ` Alan Cox
@ 2001-08-13 15:52 ` Eric S. Raymond
2001-08-13 16:00 ` Alan Cox
2001-08-14 1:45 ` Paul G. Allen
1 sibling, 1 reply; 13+ messages in thread
From: Eric S. Raymond @ 2001-08-13 15:52 UTC
To: Alan Cox; +Cc: Linux Kernel List
Alan Cox <alan@lxorguk.ukuu.org.uk>:
> You need to look at the lspci hex data. There's an errata document for the
> MP chipset on www.amd.com if you really want to scare yourself 8)
Is there a more formal name for the chipset than just "760"?
> "Have you noticed the way people's intelligence capabilities decline
> sharply the minute they start waving guns around?"
> -- Dr. Who
People who wave guns around to coerce others don't think they *have* to
be intelligent, so they stop thinking. Unfortunately, they're right in the
short term often enough to make it almost useless that they're always wrong
in the long term. Sigh...
--
<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>
Everything you know is wrong. But some of it is a useful first approximation.
* Re: S2464 (K7 Thunder) hangs -- some lessons learned
2001-08-13 15:46 ` Alan Cox
2001-08-13 15:52 ` Eric S. Raymond
@ 2001-08-14 1:45 ` Paul G. Allen
1 sibling, 0 replies; 13+ messages in thread
From: Paul G. Allen @ 2001-08-14 1:45 UTC
Cc: Linux Kernel List
Alan Cox wrote:
>
> > I don't know what version we have. Is there a way to query it through /proc?
>
> You need to look at the lspci hex data. There's an errata document for the
> MP chipset on www.amd.com if you really want to scare yourself 8)
>
I don't find the errata. Can you hold my hand and point me to it? :)
PGA
--
Paul G. Allen
UNIX Admin II/Programmer
Akamai Technologies, Inc.
www.akamai.com
Work: (858)909-3630
Cell: (858)395-5043
end of thread, other threads:[~2001-08-14 22:11 UTC | newest]
Thread overview: 13+ messages
[not found] <Pine.LNX.4.10.10108140151000.10879-100000@coffee.psychology.mcmaster.ca>
2001-08-14 2:05 ` S2464 (K7 Thunder) hangs -- some lessons learned Paul G. Allen
2001-08-14 11:55 Ryan C. Bonham
-- strict thread matches above, loose matches on Subject: below --
2001-08-13 1:24 Eric S. Raymond
2001-08-13 1:41 ` Paul G. Allen
2001-08-13 5:12 ` Christopher Abbey
2001-08-13 12:34 ` Alan Cox
2001-08-13 15:18 ` Eric S. Raymond
2001-08-13 15:46 ` Alan Cox
2001-08-13 15:52 ` Eric S. Raymond
2001-08-13 16:00 ` Alan Cox
2001-08-14 21:27 ` Eric S. Raymond
2001-08-14 22:13 ` Alan Cox
2001-08-14 1:45 ` Paul G. Allen