public inbox for linux-kernel@vger.kernel.org
From: Dave Jones <davej@redhat.com>
To: "linux-os (Dick Johnson)" <linux-os@analogic.com>
Cc: Andreas Mohr <andi@rhlx01.fht-esslingen.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC/SERIOUS] grilling troubled CPUs for fun and profit?
Date: Mon, 19 Jun 2006 16:23:54 -0400
Message-ID: <20060619202354.GD26759@redhat.com>
In-Reply-To: <Pine.LNX.4.61.0606191542050.4926@chaos.analogic.com>

On Mon, Jun 19, 2006 at 04:00:06PM -0400, linux-os (Dick Johnson) wrote:

 > > arch/i386/kernel/doublefault.c/doublefault_fn():
 > >
 > >        for (;;) /* nothing */;
 > > }
 > >
 > > Let's assume that we have a less than moderate fan failure that causes
 > > the CPU to heat up beyond the critical limit...
 > > That might result in - you guessed it - crashes or doublefaults.
 > > In which case we enter the corresponding handler and do... what?
 > 
 > The double-fault is just a place-holder. The CPU will actually
 > reset without even executing this (try it).

Wrong.

Why do you think we go to the bother of installing a double fault handler if
we're going to reset? Why would we go to the bother of printk'ing
information about the double fault if we're about to reset faster than
it could reach a serial console?

The box intentionally locks up, so we have a chance to know wtf happened.

 > A CPU without a fan will go into
 > a cold, cold, shutdown, requiring a hardware reset to get it out of
 > that latched, no internal clock running, mode.

Wrong.

 > Try it. I have had
 > broken plastic heat-sink hold-downs let the entire heat-sink fall off
 > the CPU. The machine just stops.

Your single datapoint is just that, a single datapoint.
There are a number of reported cases of CPUs frying themselves.
Here's one: http://www.tomshardware.com/2001/09/17/hot_spot/page4.html
Google no doubt has more.

Another anecdote: after a fan failure, I once had an Athlon MP *completely shatter*
(as in, it broke into two pieces) under extreme heat.

This _does_ happen.

 > Also, the CPU was only warm to the touch, having been completely shut down for the
 > several minutes it took to locate tools to remove the cover, even
 > though I deliberately left the power ON.

So you got lucky. I've blistered a thumb on hot CPUs before now
after a fan failure.

 > In the first place, when the Intel and AMD CPUs overheat, they
 > shut down. 

Reality disagrees with you.

 > For sure, it might be nicer to have some call-and-never-return
 > function for waiting with the rep-nop code, but it isn't necessary
 > for CPU protection.

In my experience, cpu_relax() and friends aren't going to save a box
in the face of a fan failure.
However, for a box which has (intentionally) locked up, looping on
instructions that do save power has obvious advantages.
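A minimal user-space sketch of that difference, assuming x86 and GCC/Clang inline asm: the empty `for (;;) /* nothing */;` body is replaced by a relax hint. On i386, cpu_relax() expands to `rep; nop` (the PAUSE instruction); HLT, which saves more power, is privileged and cannot be demonstrated outside the kernel. The names relax_hint() and park_spin() are hypothetical, and the loop is bounded only so the example terminates.

```c
/* Relax hint for a spin loop. On x86 this is "rep; nop" (PAUSE), the
 * same sequence the i386 kernel's cpu_relax() emits. On other
 * architectures this sketch falls back to a plain compiler barrier,
 * which only keeps the loop from being optimized away. */
static inline void relax_hint(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("rep; nop" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Hypothetical helper: a bounded stand-in for a "for (;;)" park loop,
 * so the example terminates when run. Returns the iterations spun. */
static unsigned long park_spin(unsigned long iterations)
{
    unsigned long i;

    for (i = 0; i < iterations; i++)
        relax_hint();   /* hint instead of an empty loop body */
    return i;
}
```

Calling park_spin(1000) returns 1000; a real lockup loop would simply never exit.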

		Dave

-- 
http://www.codemonkey.org.uk
