public inbox for linux-kernel@vger.kernel.org
* PROBLEM: BUG: Constant freezes and kernel panics on a quad core (with dumps)
From: Bruno Barberi Gnecco @ 2009-11-28 22:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: brunobg

Summary: a PC I'm testing is unstable and freezes often. Freezes occur in various
situations, including under no load (not even with X running). Dumps are provided below.

Sometimes it's a plain freeze, sometimes a kernel panic with the blinking LEDs and a dump.
The problem seems to be related to memory: most of the messages concern NULL memory
references (see more below). The system once ran for several hours before freezing.

* System specification: Intel Core 2 Quad Q9550 @ 2.83GHz, 4GB DDR3, Motherboard Asus 
P5QC. No overclocking.

* Tested under the following kernels/distros, all had the bug. Since I used their vanilla 
kernels, I'm not posting a copy of the .configs:

- 2.6.27.7-smp (slackware 12.2 hugesmp kernel; distro heavily updated), 32 bits
- 2.6.29.6-smp (slackware 13.0 hugesmp kernel; fresh install), both 32 and 64 bits
- 2.6.29.6-nosmp (slackware 13.0 huge kernel; fresh install), both 32 and 64 bits
- 2.6.31-14-generic SMP (ubuntu 9.10 netbook remix live cd), 32 bits

* Here are some screen dumps (sorry, no serial cable for a text dump).

http://ultraxs.com/share-208E_4B118ED2.html
http://ultraxs.com/share-2FDD_4B118ED2.html
http://ultraxs.com/share-D0D9_4B118ED2.html
http://ultraxs.com/share-498F_4B118ED2.html
http://ultraxs.com/share-0AD8_4B118ED2.html

* Turning off ACPI and APIC, either through the BIOS or through kernel parameters, did not
help, even when only one processor is recognized.

* I ran memtest and no errors were detected. I completed the tests three times.

* Nothing is ever output to the logs.

* A sure way to get a freeze is to recompile the kernel, but the freeze does not happen at 
the same point of the compilation and I can't even say if the compilation triggers the bug 
or if it's just a coincidence and the compilation just takes long enough for it to happen.
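
Since nothing reaches the logs and the freeze point varies, one way to measure time-to-freeze across runs is a heartbeat file that is forced to disk on every write. The following is only a minimal sketch, not something used in this thread: the file name, the function names and the one-second interval are all assumptions chosen for illustration.

```python
import os
import time


def write_heartbeat(path, note, now=None):
    """Append one timestamped line and force it to disk before returning."""
    stamp = time.time() if now is None else now
    line = f"{stamp:.3f} {note}\n"
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line)
        fh.flush()
        # fsync asks the kernel to push the data to the device, so the
        # line has a chance of surviving a hard freeze.
        os.fsync(fh.fileno())
    return line


def last_heartbeat(path):
    """Return (timestamp, note) of the last parseable line, or None."""
    try:
        with open(path, encoding="utf-8") as fh:
            lines = fh.read().splitlines()
    except FileNotFoundError:
        return None
    for line in reversed(lines):
        stamp, _, note = line.partition(" ")
        try:
            return float(stamp), note
        except ValueError:
            continue  # skip a torn or garbled final line
    return None


def run_heartbeat(path, interval=1.0, beats=None):
    """Write a heartbeat every `interval` seconds (forever if beats is None)."""
    count = 0
    while beats is None or count < beats:
        write_heartbeat(path, f"beat {count}")
        count += 1
        time.sleep(interval)
```

Running `run_heartbeat("heartbeat.log")` in one terminal while the kernel compiles in another, then calling `last_heartbeat("heartbeat.log")` after the reboot, would give the approximate moment of the freeze. Comparing that against an idle run could help tell whether the compilation actually shortens the time to failure.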

* If you need any other information, testing or debugging, just ask. I just can't use the
PC; it usually freezes within minutes. Please CC any replies to me [brunobg at gmail]. Thank
you very much for any help.


* Re: PROBLEM: BUG: Constant freezes and kernel panics on a quad core (with dumps)
From: Michael Breuer @ 2009-11-28 22:01 UTC (permalink / raw)
  To: Bruno Barberi Gnecco; +Cc: linux-kernel

I'd think this is a hardware problem. Some things that have caused me 
similar grief in the past:

Bad IDE cable
Bad SATA cable
Bad power supply
Bad motherboard
Bad memory (memtest doesn't necessarily access things the same way as 
the kernel)
Bad cards (pci, agp, whatever)
Any of the above with loose connections
And did I mention bad power supply?


* Re: PROBLEM: BUG: Constant freezes and kernel panics on a quad core (with dumps)
From: Bruno Barberi Gnecco @ 2009-11-29 15:52 UTC (permalink / raw)
  To: Michael Breuer; +Cc: linux-kernel, brunobg

> I'd think this is a hardware problem. Some things that have caused me 
> similar grief in the past:

	It is possible, but I can rule most of them out. I think it's unlikely.

> Bad IDE cable
> Bad SATA cable

	Ruled out. Only SATA drives. I used two different HDs (separately) with different
cables, and the live CD. It's not the cables or the drives.

> Bad power supply
> Bad motherboard

	Both are brand new, but of course that doesn't rule them out.

	Regarding the PS, I have checked voltages with a multimeter and they are more than fine, 
and the wattage is enough for the system, so it'd have to be a very weird transient glitch 
that affects only memory access. See also below.

	Any ideas to rule the MB out, other than "get a new one"?

> Bad memory (memtest doesn't necessarily access things the same way as 
> the kernel)

	Ruled out. I replaced it with a 2GB DDR2 module and still got the bug: "BUG: Bad page map in process".

> Bad cards (pci, agp, whatever)

	Ruled out. The only card is the video card. I replaced it with a very old PCI board and
still got the error. This also pretty much rules out an underpowered PS, since I
powered only the MB and the HD.

	Could it be one of the onboard things? I disabled everything but the LAN, and still got it.

> Any of the above with loose connections

	I already reconnected everything twice. Could still be a loose connection of one of the 
wires in the connector, but it's very very unlikely to give such a specific error on 
memory access.

> And did I mention bad power supply?

	Yes you did, and I'll try to get another one to be sure, but it could still be a software 
bug too.

	I tried to install an old Win2K I had here. It doesn't handle the big HD well and ends up 
not booting after installation (it can't partition it, also), but it didn't freeze (even 
though it formatted the disk and copied the files, which was not slow). So +1 to being a 
kernel bug.

	The most common error message I get is: "BUG: unable to handle kernel NULL pointer 
dereference."

	Any other suggestions? Any other information which could help?

	Thanks for the answer,

-- 
Bruno Barberi Gnecco <brunobg_at_users.sourceforge.net>



* Re: PROBLEM: BUG: Constant freezes and kernel panics on a quad core (with dumps)
From: Robert Hancock @ 2009-11-29 16:47 UTC (permalink / raw)
  To: Bruno Barberi Gnecco; +Cc: Michael Breuer, linux-kernel

On 11/29/2009 09:52 AM, Bruno Barberi Gnecco wrote:
>> I'd think this is a hardware problem. Some things that have caused me
>> similar grief in the past:
>
> It is possible, but I can rule most of them out. I think it's unlikely.
>
>> Bad IDE cable
>> Bad SATA cable
>
> Ruled out. Only SATA drives. I used two different HDs (separately) with
> different cables, and the live CD. It's not the cables or the drives.
>
>> Bad power supply
>> Bad motherboard
>
> Both are brand new, but of course that doesn't rule them out.

Checked CPU temperature under load?
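
A minimal sketch of one way to check this from userspace, assuming the kernel exposes sensors through the hwmon sysfs interface (for example with the coretemp driver loaded). The glob pattern is an assumption: on some kernels the files sit one level deeper, under `hwmon*/device/`. The function names are hypothetical. Values in `temp*_input` files are in millidegrees Celsius.

```python
import glob


def parse_millidegrees(text):
    """Convert a sysfs temperature string (millidegrees C) to degrees C."""
    return int(text.strip()) / 1000.0


def read_temperatures(pattern="/sys/class/hwmon/hwmon*/temp*_input"):
    """Return {path: degrees C} for every readable sensor file matching pattern."""
    readings = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as fh:
                readings[path] = parse_millidegrees(fh.read())
        except (OSError, ValueError):
            continue  # sensor missing, unreadable, or not a number
    return readings


def hottest(readings):
    """Return the highest reading in degrees C, or None if there are none."""
    return max(readings.values()) if readings else None
```

Polling `hottest(read_temperatures())` every second or so while a kernel build runs would show whether the temperature climbs sharply just before a freeze.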

>
> Regarding the PS, I have checked voltages with a multimeter and they are
> more than fine, and the wattage is enough for the system, so it'd have
> to be a very weird transient glitch that affects only memory access. See
> also below.

When a power supply causes problems, transients are usually the issue, and
those can't be seen with a normal voltmeter. It's not typical for the
rails to be low all the time unless the power supply is heavily
overloaded.

>
> Any ideas to rule the MB out, other than "get a new one"?
>
>> Bad memory (memtest doesn't necessarily access things the same way as
>> the kernel)
>
> Ruled out. I replaced with a 2GB DDR2, still got the bug: "BUG: Bad page
> map in process".
>
>> Bad cards (pci, agp, whatever)
>
> Ruled out. The only card is the video card. I replaced it with a very
> old PCI board and still got error. This also pretty much rules out that
> the PS is underpowered, since I powered only the MB and the HD.
>
> Could it be one of the onboard things? I disabled everything but the
> LAN, and still got it.
>
>> Any of the above with loose connections
>
> I already reconnected everything twice. Could still be a loose
> connection of one of the wires in the connector, but it's very very
> unlikely to give such a specific error on memory access.
>
>> And did I mention bad power supply?
>
> Yes you did, and I'll try to get another one to be sure, but it could
> still be a software bug too.
>
> I tried to install an old Win2K I had here. It doesn't handle the big HD
> well and ends up not booting after installation (it can't partition it,
> also), but it didn't freeze (even though it formatted the disk and
> copied the files, which was not slow). So +1 to being a kernel bug.
>
> The most common error message I get is: "BUG: unable to handle kernel
> NULL pointer dereference."

If you're getting random crashes in different places then it usually is 
some kind of hardware problem, unless there's some kind of random memory 
corruption going on, but that seems a bit unlikely.


* Re: PROBLEM: BUG: Constant freezes and kernel panics on a quad core (with dumps)
From: Bruno Barberi Gnecco @ 2009-12-03 16:41 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Robert Hancock, linux-kernel


>>> Regarding the PS, I have checked voltages with a multimeter and they are
>>> more than fine, and the wattage is enough for the system, so it'd have
>>> to be a very weird transient glitch that affects only memory access. See
>>> also below.
>> Most of the time transients will be the issue when a power supply causes 
>> problems and that can't be seen with a normal voltmeter. It's not 
>> typical for the rails to be low all the time unless the power supply is 
>> heavily overloaded.
> 
> Or stone cold dead.
> 
> You can't check any PSU with any multimeter I've ever seen unless it's a
> catastrophic failure, or as you said, so overloaded that it can't
> regulate (in which case it would have shut down if it were decent
> quality...).  Non-catastrophic PSU failures are often filter problems
> that a multimeter isn't fast enough to see.  Many switchers are
> deplorably noisy, and rely on the caps at the end of the transmission
> line, so one poor quality or dried out cap on MB can screw the pooch
> too.
> 
>>> Any ideas to rule the MB out, other than "get a new one"?
>>>
>>>> Bad memory (memtest doesn't necessarily access things the same way as
>>>> the kernel)
>>> Ruled out. I replaced with a 2GB DDR2, still got the bug: "BUG: Bad page
>>> map in process".
>>>
>>>> Bad cards (pci, agp, whatever)
>>> Ruled out. The only card is the video card. I replaced it with a very
>>> old PCI board and still got error. This also pretty much rules out that
>>> the PS is underpowered, since I powered only the MB and the HD.
>>>
>>> Could it be one of the onboard things? I disabled everything but the
>>> LAN, and still got it.
>>>
>>>> Any of the above with loose connections
> 
> Pay very close attention to cleanliness.  Dust works its way into
> connectors with vibration.  Pull ram, and reseat.  Resist the urge to
> clean any connector with anything other than no-residue contact cleaner.
> 
> Another thing to watch out for is crappy heat sink compound.  That dries
> out, doesn't conduct heat well enough.  Under load, such a problem may
> build VERY fast with modern CPU current draw.  If all else fails, pull
> your CPU heatsink, clean and re-apply fresh compound.
> 
>>> I already reconnected everything twice. Could still be a loose
>>> connection of one of the wires in the connector, but it's very very
>>> unlikely to give such a specific error on memory access.
>>>
>>>> And did I mention bad power supply?
>>> Yes you did, and I'll try to get another one to be sure, but it could
>>> still be a software bug too.
> 
> Yes, but try another unit.  PSU is THE odds on favorite for random crap
> with everything from PC hardware to very high dollar HW.  It's the point
> of maximum electrical stress.  It's also a spot where many people try to
> save money... big mistake that.
> 
> (removes HW guy hat;)

	Follow-up, with thanks to everybody who helped: I tried a different PSU and still got the 
problem, and I also got a BSOD with Windows. So it seems to be a problem with the 
motherboard or the processor.

	Thanks a lot again,
