From: Kai Militzer <km@westend.com>
To: linux-kernel@vger.kernel.org
Subject: Where do the "Machine Check Exceptions" come from?
Date: 29 Jan 2004 11:01:34 +0100
Message-ID: <1075370497.775.89.camel@bart>

Hello!

We have a server running here that shows very strange behavior.

It started with the machine crashing at two-day intervals with the
following message in the log:

Jan  6 22:39:01 CPU 0: Machine Check Exception: 0000000000000004
Jan  6 22:39:01 Bank 4: b200000000040151
Jan  6 22:39:01 Kernel panic: CPU context corrupt
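
For reference, the two hex values in this log can be split into their
flag bits. The following is only a sketch: it assumes the CPU follows
Intel's architectural machine-check register layout (the first value
being MCG_STATUS, the second the bank's MCi_STATUS), which the log
itself does not confirm. The helper names are made up for illustration.

```python
# Sketch only: bit positions assume Intel's architectural machine-check
# layout (IA32_MCG_STATUS and IA32_MCi_STATUS). Helper names are hypothetical.

MCG_STATUS_BITS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}
MCI_STATUS_BITS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # MISC register holds additional data
    58: "ADDRV",  # ADDR register holds an address
    57: "PCC",    # processor context corrupt
}


def set_flags(value, names):
    """Return the names of all flag bits set in value, highest bit first."""
    return [name for bit, name in sorted(names.items(), reverse=True)
            if (value >> bit) & 1]


def decode_bank_status(status):
    """Split a 64-bit bank status value into flags and the two code fields."""
    return {
        "flags": set_flags(status, MCI_STATUS_BITS),
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }


# Values copied from the log lines above.
mcg = set_flags(0x0000000000000004, MCG_STATUS_BITS)
bank4 = decode_bank_status(0xB200000000040151)

print("MCG status flags:", mcg)
print("Bank 4 flags:", bank4["flags"])
print("Bank 4 MCA error code:", hex(bank4["mca_error_code"]))
print("Bank 4 model-specific code:", hex(bank4["model_specific_code"]))
```

Under that assumed layout, the bank 4 value has the PCC bit set, which
would be consistent with the "CPU context corrupt" panic text. What the
error code fields mean depends on the CPU model and is not interpreted
here.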

So we took the machine out of production and started to search for the
problem. We first thought it must be a hardware error, so we ran
memtest86 for a long time (over 10 passes) without any errors. We
then booted from a Knoppix CD and ran burnMMX and burnP4. Nothing
happened; everything ran smoothly.

So we thought it might be some other system component, so we removed
everything not needed (network card, SCSI controller) and swapped the
video card. We then booted from Knoppix again and ran a lot of kernel
compiles overnight (on the hard disk, not on a ramdisk) --> all went
smoothly.

We then booted the original system (without all the unneeded hardware),
started compiling kernels, and it crashed after a day. This was strange.
So we looked in our changelog and realized that the crashing
started when we changed the running kernel from a vanilla 2.4.19 to a
vanilla 2.4.23.

We thought it could be something in the new kernel. So we took a new
2.4.24 with the config from 2.4.23 (make oldconfig) and tested --> the
system crashed after compiling kernels for a day.

So there must be something else. The next step was to take the config
from the 2.4.19 kernel and do a "make oldconfig" with the 2.4.24. The
system has now been running for two days without a crash. So it must be
something that has changed between the two configs.

To compare them, I took the config from the faulty 2.4.23 kernel and
the config produced by running "make oldconfig" with the working 2.4.19
config on the 2.4.23 kernel.

The output of "diff faulty_config running_config" is attached at the
end of this mail.

Any ideas which new option made the kernel crash? I will test the
three options that are compiled directly into the kernel (not as
modules) over the next few days and will give an update if I can find
out what causes this behavior.
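
As an illustration of how the built-in candidates can be picked out of
two config files, here is a minimal sketch. The function names are made
up for this example, and the two config texts are only a short excerpt
taken from the diff at the end of this mail, not the full files.

```python
import re

# Sketch only: helper names are hypothetical, and the sample texts below
# are a short excerpt of the options listed in the diff, not full configs.

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name -> value; '# CONFIG_X is not set' is stored as 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def changed_options(faulty, running):
    """Return {name: (running_value, faulty_value)} for options that differ.

    An option missing from one file is treated like 'n'.
    """
    changed = {}
    for name in sorted(set(faulty) | set(running)):
        before = running.get(name, "n")
        after = faulty.get(name, "n")
        if before != after:
            changed[name] = (before, after)
    return changed


def builtin_candidates(faulty, running):
    """Options that changed and are built in (=y) in the faulty config."""
    return [name for name, (_, after) in changed_options(faulty, running).items()
            if after == "y"]


faulty_text = """\
CONFIG_BLK_STATS=y
CONFIG_IP_NF_TFTP=m
CONFIG_BLK_DEV_GENERIC=y
CONFIG_E1000=m
# CONFIG_E1000_NAPI is not set
CONFIG_DEBUG_STACKOVERFLOW=y
"""

running_text = """\
# CONFIG_BLK_STATS is not set
# CONFIG_IP_NF_TFTP is not set
# CONFIG_BLK_DEV_GENERIC is not set
# CONFIG_E1000 is not set
# CONFIG_DEBUG_STACKOVERFLOW is not set
"""

faulty = parse_config(faulty_text)
running = parse_config(running_text)

for name in builtin_candidates(faulty, running):
    print(name)
```

Modules only affect the running system once they are loaded, so the
built-in options are a reasonable place to start, but this does not
rule out a module that gets loaded automatically.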

Best regards

Kai Militzer

++++output of diff+++++

153c153
< CONFIG_BLK_STATS=y
---
> # CONFIG_BLK_STATS is not set
194c194
< CONFIG_IP_NF_TFTP=m
---
> # CONFIG_IP_NF_TFTP is not set
200c200
< CONFIG_IP_NF_MATCH_PKTTYPE=m
---
> # CONFIG_IP_NF_MATCH_PKTTYPE is not set
204,206c204,206
< CONFIG_IP_NF_MATCH_RECENT=m
< CONFIG_IP_NF_MATCH_ECN=m
< CONFIG_IP_NF_MATCH_DSCP=m
---
> # CONFIG_IP_NF_MATCH_RECENT is not set
> # CONFIG_IP_NF_MATCH_ECN is not set
> # CONFIG_IP_NF_MATCH_DSCP is not set
211c211
< CONFIG_IP_NF_MATCH_HELPER=m
---
> # CONFIG_IP_NF_MATCH_HELPER is not set
213c213
< CONFIG_IP_NF_MATCH_CONNTRACK=m
---
> # CONFIG_IP_NF_MATCH_CONNTRACK is not set
227d226
< CONFIG_IP_NF_NAT_TFTP=m
230,231c229,230
< CONFIG_IP_NF_TARGET_ECN=m
< CONFIG_IP_NF_TARGET_DSCP=m
---
> # CONFIG_IP_NF_TARGET_ECN is not set
> # CONFIG_IP_NF_TARGET_DSCP is not set
238c237
< CONFIG_IP_NF_ARP_MANGLE=m
---
> # CONFIG_IP_NF_ARP_MANGLE is not set
329c328
< CONFIG_BLK_DEV_GENERIC=y
---
> # CONFIG_BLK_DEV_GENERIC is not set
557c556
< CONFIG_B44=m
---
> # CONFIG_B44 is not set
565c564
< CONFIG_E100=m
---
> # CONFIG_E100 is not set
593,594c592
< CONFIG_E1000=m
< # CONFIG_E1000_NAPI is not set
---
> # CONFIG_E1000 is not set
599c597
< CONFIG_R8169=m
---
> # CONFIG_R8169 is not set
712c710
< CONFIG_HW_RANDOM=m
---
> # CONFIG_HW_RANDOM is not set
910c908
< CONFIG_DEBUG_STACKOVERFLOW=y
---
> # CONFIG_DEBUG_STACKOVERFLOW is not set
927c925
< CONFIG_CRC32=m
---
> # CONFIG_CRC32 is not set
929c927
< CONFIG_ZLIB_DEFLATE=m
---
> # CONFIG_ZLIB_DEFLATE is not set

+++++end output of diff++++

-- 
Kai Militzer                 WESTEND GmbH  |  Internet-Business-Provider
Technik                      CISCO Systems Partner - Authorized Reseller
                             Lütticher Straße 10      Tel 0241/701333-11
km@westend.com               D-52064 Aachen              Fax 0241/911879


