Re: Oops after 30 days of uptime

From: Willy Tarreau <w@1wt.eu>
To: Ondrej Zary <linux@rainbow-software.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	kaber@trash.net
Subject: Re: Oops after 30 days of uptime
Date: Sun, 10 Sep 2006 10:26:49 +0200
Message-ID: <20060910082649.GA20814@1wt.eu>
In-Reply-To: <200609091338.56539.linux@rainbow-software.org>

Hi Ondrej,

OK, I've analysed your oops against your kernel. My conclusion is that you
have a hardware problem (most probably the CPU), because you've hit an
impossible case:

ip_nat_cheat_check() pushed the size of the data (8) onto the stack, followed
by the pointer to the data, then called csum_partial():

c01e657f:       6a 08                   push   $0x8
c01e6581:       52                      push   %edx
c01e6582:       e8 a5 85 00 00          call   c01eeb2c <csum_partial>
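
The call target can be cross-checked from the instruction bytes alone. This
is a minimal sketch, assuming the standard x86 encoding in which a relative
branch lands at the address of the next instruction plus a signed
little-endian displacement; the helper name rel_target is hypothetical and
not part of any kernel tool.

```python
def rel_target(insn_addr: int, insn_len: int, rel: int, rel_bits: int) -> int:
    """Return the target of an x86 relative branch.

    The displacement is sign-extended from rel_bits and added to the
    address of the instruction that follows the branch.
    """
    sign = 1 << (rel_bits - 1)
    disp = (rel ^ sign) - sign  # sign-extend the raw displacement
    return (insn_addr + insn_len + disp) & 0xFFFFFFFF


# "e8 a5 85 00 00" at c01e6582: opcode e8, then a 32-bit little-endian rel32.
call_rel32 = int.from_bytes(bytes.fromhex("a5850000"), "little")
call_target = rel_target(0xC01E6582, 5, call_rel32, 32)
print(hex(call_target))  # 0xc01eeb2c
```

The result matches the address that the disassembler labels <csum_partial>.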

In csum_partial(), ECX is loaded with the size (8) and ESI with the data
pointer (0xc0227ce8):

c01eeb32:       8b 4c 24 10             mov    0x10(%esp),%ecx
c01eeb36:       8b 74 24 0c             mov    0xc(%esp),%esi

Then the size is divided by 32 (shifted right by 5) to count how many 32-byte
blocks can be read. If the size is less than 32, the result is zero and the
code branches to a special location which reads one word at a time:

c01eeb78:       89 ca                   mov    %ecx,%edx
c01eeb7a:       c1 e9 05                shr    $0x5,%ecx
c01eeb7d:       74 32                   je     c01eebb1 <csum_partial+0x85>
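
The expected behaviour of these three instructions can be modelled in a few
lines. This is only a sketch under stated assumptions: SHR with a non-zero
count sets the zero flag when its result is zero, and JE jumps when that flag
is set. The helper names shr32 and je_target are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def shr32(value: int, count: int):
    """Model a 32-bit logical shift right; return (result, zero_flag)."""
    result = (value & MASK32) >> count
    return result, result == 0


def je_target(insn_addr: int, rel8: int) -> int:
    """Target of a 2-byte short JE: next instruction plus signed 8-bit offset."""
    disp = rel8 - 0x100 if rel8 & 0x80 else rel8
    return (insn_addr + 2 + disp) & MASK32


ecx = 8                  # size loaded from the stack
edx = ecx                # mov %ecx,%edx keeps the original size
ecx, zf = shr32(ecx, 5)  # shr $0x5,%ecx
# je c01eebb1: taken if the zero flag is set, else fall through.
next_addr = je_target(0xC01EEB7D, 0x32) if zf else 0xC01EEB7D + 2
print(ecx, zf, hex(next_addr))  # 0 True 0xc01eebb1
```

For a size of 8 the model always ends up at c01eebb1, the word-at-a-time
path, which is why falling through into the 32-byte loop should not happen.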

Your oops occurs a few instructions further down. The branch was not taken,
although it should have been, because (8 >> 5) == 0. You can also see from EDX
in the oops that the size really was 0x8 when it was copied from ECX. The rest
is pretty obvious. The data is read 32 bytes at a time starting at ESI, and
ECX is decremented by 1 for every 32 bytes. Since ECX started at 0, it wraps
around instead of reaching zero, so the loop keeps going. When ESI+0x18
reaches an unmapped area (0xc2000000), you get the oops, with
ECX = 0xfff113e8 as in your oops.
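
The ECX value can be reconstructed from the addresses quoted above. This is a
sketch, assuming that the loop advances ESI by 32 and decrements ECX once per
completed block, and that the fault hits the read at ESI+0x18 before that
iteration's decrement; the function name ecx_at_fault is hypothetical.

```python
MASK32 = 0xFFFFFFFF
BLOCK = 32  # bytes consumed per loop iteration


def ecx_at_fault(start_esi: int, fault_addr: int, fault_offset: int,
                 initial_ecx: int):
    """Return (ECX at the fault, number of completed 32-byte blocks)."""
    esi_at_fault = fault_addr - fault_offset
    blocks_done = (esi_at_fault - start_esi) // BLOCK
    # ECX is a 32-bit register, so decrementing below zero wraps around.
    return (initial_ecx - blocks_done) & MASK32, blocks_done


ecx, blocks = ecx_at_fault(0xC0227CE8, 0xC2000000, 0x18, 8 >> 5)
print(hex(blocks))  # 0xeec18
print(hex(ecx))     # 0xfff113e8
```

Starting from ECX = 0 and the data pointer 0xc0227ce8, the computed value
agrees with the ECX reported in the oops.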

Given that the failing instruction is the most common conditional jump, you
are very fortunate that your system can run for 30 days before crashing. I
think that your CPU might be running too hot and might get wrong results
during branch prediction. It's also possible that you have a poor power
supply. However, I'm pretty sure that this is not a RAM problem.

Best regards,
Willy
