From: Roger Heflin <rogerheflin@gmail.com>
To: "Joel K. Greene" <joel.greene@catapult.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
Mike Steckly <michaels@catapult.com>
Subject: Re: Clock has stopped (time/date looping over 5 seconds), things are broken - what to check to debug?
Date: Mon, 07 Apr 2008 10:41:58 -0500
Message-ID: <47FA40C6.5030104@gmail.com>
In-Reply-To: <1207568231.4496.9.camel@linuxdt036.nc.catapult.com>
Joel K. Greene wrote:
> Hi Roger,
>
> Does this sound familiar:
>
> http://lkml.org/lkml/2008/3/14/178
That sounds like it matches what I have.
>
>
> We've been chasing this for quite a while. Our PIC gets in a bad state
> where it thinks the CPU is in the ISR, and so won't give another int. We
> haven't much of an idea of how we get in that state other than that
> HZ=1000 makes it happen faster and HZ=100 causes it less often.
I do have HZ=1000 set. Pavel mentions setting it to 4000 to make it happen
faster, so I will try that. I am rebuilding 2.6.24.4 with HZ=4000 in the .config
file, and once it is up I will verify that it is actually running with 4000.
My machine has a fair amount of CPU load (video transcoding) and a fair amount
of interrupt traffic (5 disks and 3 TV recording cards).
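A minimal sketch of that verification, assuming the kernel was built with CONFIG_IKCONFIG_PROC so its configuration can be read back with `zcat /proc/config.gz` (otherwise, use the .config the kernel was built from). The helper name `config_hz` is hypothetical; it only pulls the CONFIG_HZ= value out of the config text:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the value of CONFIG_HZ found in kernel
 * config text (a .config file or the output of `zcat /proc/config.gz`),
 * or -1 if the option is absent. */
long config_hz(const char *config_text)
{
    const char *key = "CONFIG_HZ=";
    size_t keylen = strlen(key);
    const char *line = config_text;

    while (line != NULL && *line != '\0') {
        /* Only a line beginning with exactly "CONFIG_HZ=" matches, so
         * lines such as "CONFIG_HZ_1000=y" or "# CONFIG_HZ_100 is not set"
         * are skipped. */
        if (strncmp(line, key, keylen) == 0)
            return strtol(line + keylen, NULL, 10);
        line = strchr(line, '\n');
        if (line != NULL)
            line++; /* advance to the start of the next line */
    }
    return -1;
}
```

Note that this only confirms what the configuration asks for, not a measured tick rate.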
>
> I think that if you look at jiffies you will see it is not incrementing.
> The 4 second loop seems to be in the conversion from jiffies to wall
> time.
I did check the "now at" counter in /proc/timer_list, and it was looping too.
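A rough sketch of that check, assuming /proc/timer_list prints a line of the form `now at <N> nsecs` near the top (as 2.6-era kernels do) and that the program runs as root. The function names are hypothetical. It samples the value several times and reports whether it ever fails to increase, which is what a clock looping over a few seconds would show:

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Extract the value from the "now at <N> nsecs" line of /proc/timer_list
 * text. Returns 0 on success, -1 if no such line is present. */
int parse_now_at(const char *text, unsigned long long *out)
{
    const char *p = strstr(text, "now at ");
    if (p == NULL)
        return -1;
    return sscanf(p, "now at %llu", out) == 1 ? 0 : -1;
}

/* Read one sample. The "now at" line is near the top of the file,
 * so the first few KB are enough. */
int sample_now_at(unsigned long long *out)
{
    char buf[4096];
    FILE *f = fopen("/proc/timer_list", "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_now_at(buf, out);
}

/* Take `samples` readings `secs` apart. Returns 1 if every reading was
 * larger than the previous one, 0 if the value stalled or went backwards
 * (the looping symptom), -1 if the file could not be read or parsed. */
int clock_advances(int samples, unsigned int secs)
{
    unsigned long long prev, cur;
    if (sample_now_at(&prev) != 0)
        return -1;
    for (int i = 1; i < samples; i++) {
        sleep(secs);
        if (sample_now_at(&cur) != 0)
            return -1;
        if (cur <= prev)
            return 0;
        prev = cur;
    }
    return 1;
}
```

Sampling every second for longer than the loop period (e.g. `clock_advances(8, 1)`) should catch the wrap. On a box already in the failed state, sleep() itself may never return because timers stop firing, so run it somewhere you can interrupt it.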
>
>
> It _appears_ that there is a race in the kernel that can be triggered by
> any number of hardware issues. There's another thread by Gregory Stark
> with the same symptoms - he thinks his was fixed by replacing a bad
> DIMM.
I don't think I have bad hardware. I will run a test job for a few hours that
checks its own results, to make sure the correct answers are coming back and
that it is not crashing.
I do have a couple of disks (on a SIL controller) that occasionally give odd
errors, but they recover and continue on.
>
> Note that we first saw this on 2.6.16, and Gregory found it on 2.6.5.
> We've seen systems run for a couple of months before seeing this, so
> it's a bear to debug.
>
> How often is this happening for you? How repeatable?
Every 14-30 days. I don't have precise enough data to say whether it always
happens, but I don't think the machine has made it past 30 days in the last 6
months. If I go back far enough, though, I believe it was stable before I added
a couple of TV recording cards (PVR150, HD5500) and a disk controller (SIL) to
it.
>
> What hardware are you running on?
AMD-754 Sempron64 processor.
ASUS K8V-SE Deluxe MB (VT8385/VT8387 chipset), so very different hardware from
the ServerWorks P3 that you have.