From: john stultz <johnstul@us.ibm.com>
To: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Andrew Morton <akpm@osdl.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Roman Zippel <zippel@linux-m68k.org>
Subject: Re: Hang and Soft Lockup problems with generic time code
Date: Fri, 07 Jul 2006 16:39:39 -0700
Message-ID: <1152315579.7493.9.camel@localhost.localdomain>
In-Reply-To: <1152313879.3866.53.camel@mulgrave.il.steeleye.com>

On Fri, 2006-07-07 at 18:11 -0500, James Bottomley wrote:
> Ever since the 2.6.17 kernel pulled in the generic timer code, I've been
> experiencing hangs and softlockups with the aic94xx driver (which I
> thought were driver related).  Finally, after a lot of debugging I've
> isolated the culprit to linux/time.h:timespec_add_ns()
> 
> What is happening is that a->tv_nsec is coming in here negative and
> looping for huge amounts of time.

Yep. This has been seen where a large number of ticks are lost. Roman
and I are working on a solution for this (I sent a patch out to the list
earlier today for it, and Roman *just* posted his version a moment ago -
if you can give one or both of them a try it would be appreciated).
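
To illustrate why a negative tv_nsec spins for so long, here is a minimal
user-space sketch. It assumes the helper adds tv_nsec into an unsigned
64-bit nanosecond count and then normalizes by subtracting one second per
loop pass; that is a reading of the 2.6.17-era code, not a copy of it.
loop_iterations() and the local NSEC_PER_SEC define are hypothetical names
for this example only, and the pass count is computed by division rather
than by actually running the loop.

```c
#include <stdint.h>

/* Local definition for the sketch; the kernel has its own constant. */
#define NSEC_PER_SEC 1000000000ULL

/*
 * Assumed shape of the normalizing loop being discussed:
 *
 *     ns += a->tv_nsec;
 *     while (ns >= NSEC_PER_SEC) { ns -= NSEC_PER_SEC; a->tv_sec++; }
 *     a->tv_nsec = ns;
 *
 * This hypothetical helper returns how many passes that loop would make.
 */
static uint64_t loop_iterations(long tv_nsec, uint64_t ns)
{
	/* A negative tv_nsec wraps to a huge value once converted to unsigned. */
	ns += (uint64_t)tv_nsec;

	/* One pass per whole second contained in the sum. */
	return ns / NSEC_PER_SEC;
}
```

With a sane input such as tv_nsec = 500000000 and ns = 700000000 the loop
runs once. With tv_nsec = -1 and ns = 0 the sum wraps to 2^64 - 1 and the
loop would need 18446744073 passes, which would look like a hang.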

> Why tv_nsec is negative appears to be related to massive cycle
> adjustments in kernel/timer.c:update_wall_time().  With the TSC as my
> clocksource I've seen clocksource_read() return increments in the
> 200s range.  No idea why this is happening.  The same strange
> discontinuous jumps in cycle count also occur with pm_acpi as the clock
> source.

Did you really mean jumps of 200 seconds? Hmmm. The issue Roman and I
have been looking into does occur when we lose a number of ticks and
that confuses the clocksource adjustment code. The fix we're working on
corrects the adjustment confusion, but doesn't fix the lost ticks.

However, 200 seconds of lost ticks sounds very off. Could the driver be
disabling interrupts for such a long period of time?

> I can't get a good enough handle on all the generic time code changes to
> reverse them.  However, this machine is a P4, so I was able to boot it
> with an x86_64 kernel (which doesn't yet use the generic time code) and
> confirm that all the hangs and softlockups go away.
> 
> The machine in question is an IBM x206m dual core P4.

I appreciate the report and apologize for the trouble.

thanks
-john


