From: David Johnson <dj@david-web.co.uk>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: Hardware bug or kernel bug?
Date: Fri, 13 Oct 2006 10:20:50 +0100
Message-ID: <200610131020.51110.dj@david-web.co.uk>
In-Reply-To: <Pine.LNX.4.64.0610121158490.3952@g5.osdl.org>

On Thursday 12 October 2006 20:13, you wrote:
>
> A reboot usually indicates a serious hardware problem - it could be an
> overheating sensor tripping, but it could be some serious corruption
> causing a triple-fault or something like that too.
>
> But the _most_ likely problem is just the power supply. If your power
> supply is border-line, having something that stresses CPU, disk,
> southbridge and networking at the same time may be just the way to cause a
> power-fail signal, which usually causes an instant reboot.

The power supplies in both machines on which I'm seeing the problem are brand 
new, supposedly good quality and from different manufacturers. Could it be 
that the motherboard has some fault which causes it to overload even good 
power supplies?

> I think it just changes timings, and there is something timing-related
> going on - like just instant power draw. The timer frequency should not
> have any serious impact on heat, so I doubt it's about overheating, but
> it's certainly worth opening the case and using one of those
> compressed-air things to cool down the CPU and/or southbridge chips.

The motherboard has all the usual heat sensors and will alarm if something 
gets too hot - I suspected overheating the first time this happened and 
checked the temps in the BIOS, but everything was well within limits.

> Interrupts generally aren't problematic, I'd be more likely to suspect CPU
> overclocking or similar (does the cpuinfo match the frequency claimed by
> the BIOS?) or just some strange motherboard problem (which could be
> firmware: bad programming of memory timings etc). So a BIOS upgrade is
> worth looking into.

The cpuinfo does indeed match the reported BIOS speed. The boards are already 
running the latest BIOS, so if it is a BIOS issue the motherboard 
manufacturer isn't aware of it...
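
For anyone repeating the cpuinfo check, here is a minimal sketch of one way
to script it. It assumes an x86 kernel that reports a "cpu MHz" field in
/proc/cpuinfo; the function names and the 2% tolerance are made up for
illustration, and the BIOS figure has to be typed in by hand. Note that with
frequency scaling active the reported value may legitimately be lower than
the BIOS speed.

```python
import re


def cpu_mhz_values(cpuinfo_text):
    """Return the 'cpu MHz' value of every processor entry as floats."""
    pattern = r"^cpu MHz\s*:\s*([0-9.]+)"
    return [float(v) for v in re.findall(pattern, cpuinfo_text, flags=re.MULTILINE)]


def matches_bios(cpuinfo_text, bios_mhz, tolerance=0.02):
    """True if every reported clock is within `tolerance` of the BIOS figure.

    `tolerance` is a fraction of bios_mhz (hypothetical default of 2%).
    Returns False when no 'cpu MHz' field is found at all.
    """
    values = cpu_mhz_values(cpuinfo_text)
    return bool(values) and all(
        abs(v - bios_mhz) <= tolerance * bios_mhz for v in values
    )


def read_proc_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo text from a running Linux system."""
    with open(path) as f:
        return f.read()


# Example use on a live machine (2000 is a placeholder for the BIOS speed):
#   print(matches_bios(read_proc_cpuinfo(), 2000))
```

The parsing is kept separate from the file read so the comparison can be
tried against saved cpuinfo output from another machine.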

> Sometimes issues like this can be worked around - for example, maybe the
> problem is the chipset having issues with concurrent DMA or something, so
> turning off DMA on the disk drives could possibly at least _hide_ the
> problem.

I should have mentioned that of the two machines that are having the problem, 
one is using IDE and the other SATA. The SATA machine seems worst affected by 
it.

> But check the power supply first. And check to see if there is a BIOS
> upgrade available. You can double-check the cooling: check that all
> heat-sinks are properly seated and have appropriate amounts of thermal
> grease. And blowing air from a compressed-air can on top of the things
> until you see them frost over is certainly a good spot-check.

OK I'll give all that a go.

Thanks for your help,
David.
