public inbox for linux-kernel@vger.kernel.org
* 18rc1 soft lockup
@ 2006-07-11 19:03 Dave Jones
  2006-07-11 19:13 ` john stultz
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Jones @ 2006-07-11 19:03 UTC (permalink / raw)
  To: Linux Kernel

Just saw this during boot of a HT P4 box.

BUG: soft lockup detected on CPU#0!
 [<c04051af>] show_trace_log_lvl+0x54/0xfd
 [<c0405766>] show_trace+0xd/0x10
 [<c0405885>] dump_stack+0x19/0x1b
 [<c0450ec7>] softlockup_tick+0xa5/0xb9
 [<c042d496>] run_local_timers+0x12/0x14
 [<c042d81b>] update_process_times+0x3c/0x61
 [<c04179e0>] smp_apic_timer_interrupt+0x6d/0x75
 [<c0404ada>] apic_timer_interrupt+0x2a/0x30
BUG: soft lockup detected on CPU#1!
 [<c04051af>] show_trace_log_lvl+0x54/0xfd
 [<c0405766>] show_trace+0xd/0x10
 [<c0405885>] dump_stack+0x19/0x1b
 [<c0450ec7>] softlockup_tick+0xa5/0xb9
 [<c042d496>] run_local_timers+0x12/0x14
 [<c042d81b>] update_process_times+0x3c/0x61
 [<c04179e0>] smp_apic_timer_interrupt+0x6d/0x75
 [<c0404ada>] apic_timer_interrupt+0x2a/0x30
BUG: soft lockup detected on CPU#0!
 [<c04051af>] show_trace_log_lvl+0x54/0xfd
 [<c0405766>] show_trace+0xd/0x10
 [<c0405885>] dump_stack+0x19/0x1b
 [<c0450ec7>] softlockup_tick+0xa5/0xb9
 [<c042d496>] run_local_timers+0x12/0x14
 [<c042d81b>] update_process_times+0x3c/0x61
 [<c04179e0>] smp_apic_timer_interrupt+0x6d/0x75
 [<c0404ada>] apic_timer_interrupt+0x2a/0x30

It then continued booting just fine..

		Dave

-- 
http://www.codemonkey.org.uk


* Re: 18rc1 soft lockup
  2006-07-11 19:03 18rc1 soft lockup Dave Jones
@ 2006-07-11 19:13 ` john stultz
  2006-07-11 19:16   ` Dave Jones
  0 siblings, 1 reply; 12+ messages in thread
From: john stultz @ 2006-07-11 19:13 UTC (permalink / raw)
  To: Dave Jones; +Cc: Linux Kernel

On Tue, 2006-07-11 at 15:03 -0400, Dave Jones wrote:
> Just saw this during boot of a HT P4 box.
> 
> BUG: soft lockup detected on CPU#0!
>  [<c04051af>] show_trace_log_lvl+0x54/0xfd
>  [<c0405766>] show_trace+0xd/0x10
>  [<c0405885>] dump_stack+0x19/0x1b
>  [<c0450ec7>] softlockup_tick+0xa5/0xb9
>  [<c042d496>] run_local_timers+0x12/0x14
>  [<c042d81b>] update_process_times+0x3c/0x61
>  [<c04179e0>] smp_apic_timer_interrupt+0x6d/0x75
>  [<c0404ada>] apic_timer_interrupt+0x2a/0x30

That's the clocksource_adjust/lost tick bug. Roman's fix landed in Linus'
-git yesterday.

thanks
-john



* Re: 18rc1 soft lockup
  2006-07-11 19:13 ` john stultz
@ 2006-07-11 19:16   ` Dave Jones
  2006-07-13 22:07     ` Dave Jones
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Jones @ 2006-07-11 19:16 UTC (permalink / raw)
  To: john stultz; +Cc: Linux Kernel

On Tue, Jul 11, 2006 at 12:13:47PM -0700, john stultz wrote:
 > On Tue, 2006-07-11 at 15:03 -0400, Dave Jones wrote:
 > > Just saw this during boot of a HT P4 box.
 > > 
 > > BUG: soft lockup detected on CPU#0!
 > >  [<c04051af>] show_trace_log_lvl+0x54/0xfd
 > >  [<c0405766>] show_trace+0xd/0x10
 > >  [<c0405885>] dump_stack+0x19/0x1b
 > >  [<c0450ec7>] softlockup_tick+0xa5/0xb9
 > >  [<c042d496>] run_local_timers+0x12/0x14
 > >  [<c042d81b>] update_process_times+0x3c/0x61
 > >  [<c04179e0>] smp_apic_timer_interrupt+0x6d/0x75
 > >  [<c0404ada>] apic_timer_interrupt+0x2a/0x30
 > 
 > That's clocksource_adjust/lost tick bug. Roman's fix landed in Linus'
 > -git yesterday.

Ah, that was actually a .18rc1-git3 tree. I notice a git4 just appeared,
I'll try and reproduce with that.

		Dave

-- 
http://www.codemonkey.org.uk


* Re: 18rc1 soft lockup
  2006-07-11 19:16   ` Dave Jones
@ 2006-07-13 22:07     ` Dave Jones
  2006-07-13 22:15       ` john stultz
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Jones @ 2006-07-13 22:07 UTC (permalink / raw)
  To: john stultz, Linux Kernel

On Tue, Jul 11, 2006 at 03:16:58PM -0400, Dave Jones wrote:
 > On Tue, Jul 11, 2006 at 12:13:47PM -0700, john stultz wrote:
 >  > On Tue, 2006-07-11 at 15:03 -0400, Dave Jones wrote:
 >  > > Just saw this during boot of a HT P4 box.
 >  > > 
 >  > > BUG: soft lockup detected on CPU#0!
 >  > >  [<c04051af>] show_trace_log_lvl+0x54/0xfd
 >  > >  [<c0405766>] show_trace+0xd/0x10
 >  > >  [<c0405885>] dump_stack+0x19/0x1b
 >  > >  [<c0450ec7>] softlockup_tick+0xa5/0xb9
 >  > >  [<c042d496>] run_local_timers+0x12/0x14
 >  > >  [<c042d81b>] update_process_times+0x3c/0x61
 >  > >  [<c04179e0>] smp_apic_timer_interrupt+0x6d/0x75
 >  > >  [<c0404ada>] apic_timer_interrupt+0x2a/0x30
 >  > 
 >  > That's clocksource_adjust/lost tick bug. Roman's fix landed in Linus'
 >  > -git yesterday.
 > 
 > Ah, that was actually a .18rc1-git3 tree. I notice a git4 just appeared,
 > I'll try and reproduce with that.

Just when I thought it had gotten fixed..
2.6.18rc1-git6 this time on x86-64..

BUG: soft lockup detected on CPU#3!

Call Trace:
 [<ffffffff80270865>] show_trace+0xaa/0x23d
 [<ffffffff80270a0d>] dump_stack+0x15/0x17
 [<ffffffff802c44e6>] softlockup_tick+0xd5/0xea
 [<ffffffff80250bea>] run_local_timers+0x13/0x15
 [<ffffffff8029cc1d>] update_process_times+0x4c/0x79
 [<ffffffff8027bfeb>] smp_local_timer_interrupt+0x2b/0x50
 [<ffffffff8027c766>] smp_apic_timer_interrupt+0x58/0x62
 [<ffffffff802628ae>] apic_timer_interrupt+0x6a/0x70

		Dave


-- 
http://www.codemonkey.org.uk


* Re: 18rc1 soft lockup
  2006-07-13 22:07     ` Dave Jones
@ 2006-07-13 22:15       ` john stultz
  2006-07-13 22:28         ` Dave Jones
  2006-07-13 23:05         ` Roman Zippel
  0 siblings, 2 replies; 12+ messages in thread
From: john stultz @ 2006-07-13 22:15 UTC (permalink / raw)
  To: Dave Jones; +Cc: Linux Kernel, Roman Zippel

On Thu, 2006-07-13 at 18:07 -0400, Dave Jones wrote:
> On Tue, Jul 11, 2006 at 03:16:58PM -0400, Dave Jones wrote:
>  > On Tue, Jul 11, 2006 at 12:13:47PM -0700, john stultz wrote:
>  >  > On Tue, 2006-07-11 at 15:03 -0400, Dave Jones wrote:
>  >  > > Just saw this during boot of a HT P4 box.
>  >  > > 
>  >  > > BUG: soft lockup detected on CPU#0!
>  >  > >  [<c04051af>] show_trace_log_lvl+0x54/0xfd
>  >  > >  [<c0405766>] show_trace+0xd/0x10
>  >  > >  [<c0405885>] dump_stack+0x19/0x1b
>  >  > >  [<c0450ec7>] softlockup_tick+0xa5/0xb9
>  >  > >  [<c042d496>] run_local_timers+0x12/0x14
>  >  > >  [<c042d81b>] update_process_times+0x3c/0x61
>  >  > >  [<c04179e0>] smp_apic_timer_interrupt+0x6d/0x75
>  >  > >  [<c0404ada>] apic_timer_interrupt+0x2a/0x30
>  >  > 
>  >  > That's clocksource_adjust/lost tick bug. Roman's fix landed in Linus'
>  >  > -git yesterday.
>  > 
>  > Ah, that was actually a .18rc1-git3 tree. I notice a git4 just appeared,
>  > I'll try and reproduce with that.
> 
> Just when I thought it had gotten fixed..
> 2.6.18rc1-git6 this time on x86-64..
> 
> BUG: soft lockup detected on CPU#3!
> 
> Call Trace:
>  [<ffffffff80270865>] show_trace+0xaa/0x23d
>  [<ffffffff80270a0d>] dump_stack+0x15/0x17
>  [<ffffffff802c44e6>] softlockup_tick+0xd5/0xea
>  [<ffffffff80250bea>] run_local_timers+0x13/0x15
>  [<ffffffff8029cc1d>] update_process_times+0x4c/0x79
>  [<ffffffff8027bfeb>] smp_local_timer_interrupt+0x2b/0x50
>  [<ffffffff8027c766>] smp_apic_timer_interrupt+0x58/0x62
>  [<ffffffff802628ae>] apic_timer_interrupt+0x6a/0x70

Hmmm.. grumble. Was this on bootup, or after some time period?

I'm looking into it.

thanks for the report
-john



* Re: 18rc1 soft lockup
  2006-07-13 22:15       ` john stultz
@ 2006-07-13 22:28         ` Dave Jones
  2006-07-14  9:22           ` Roman Zippel
  2006-07-13 23:05         ` Roman Zippel
  1 sibling, 1 reply; 12+ messages in thread
From: Dave Jones @ 2006-07-13 22:28 UTC (permalink / raw)
  To: john stultz; +Cc: Linux Kernel, Roman Zippel

On Thu, Jul 13, 2006 at 03:15:43PM -0700, john stultz wrote:

 > > Just when I thought it had gotten fixed..
 > > 2.6.18rc1-git6 this time on x86-64..
 > > 
 > > BUG: soft lockup detected on CPU#3!
 > > 
 > > Call Trace:
 > >  [<ffffffff80270865>] show_trace+0xaa/0x23d
 > >  [<ffffffff80270a0d>] dump_stack+0x15/0x17
 > >  [<ffffffff802c44e6>] softlockup_tick+0xd5/0xea
 > >  [<ffffffff80250bea>] run_local_timers+0x13/0x15
 > >  [<ffffffff8029cc1d>] update_process_times+0x4c/0x79
 > >  [<ffffffff8027bfeb>] smp_local_timer_interrupt+0x2b/0x50
 > >  [<ffffffff8027c766>] smp_apic_timer_interrupt+0x58/0x62
 > >  [<ffffffff802628ae>] apic_timer_interrupt+0x6a/0x70
 > 
 > Hmmm.. grumble. Was this on bootup, or after some time period?

Right at the end of boot up, between the switch from runlevel 3 to 5.

		Dave

-- 
http://www.codemonkey.org.uk


* Re: 18rc1 soft lockup
  2006-07-13 22:15       ` john stultz
  2006-07-13 22:28         ` Dave Jones
@ 2006-07-13 23:05         ` Roman Zippel
  2006-07-14  0:02           ` john stultz
  1 sibling, 1 reply; 12+ messages in thread
From: Roman Zippel @ 2006-07-13 23:05 UTC (permalink / raw)
  To: john stultz; +Cc: Dave Jones, Linux Kernel

Hi,

On Thu, 13 Jul 2006, john stultz wrote:

> > Just when I thought it had gotten fixed..
> > 2.6.18rc1-git6 this time on x86-64..
> > 
> > BUG: soft lockup detected on CPU#3!
> > 
> > Call Trace:
> >  [<ffffffff80270865>] show_trace+0xaa/0x23d
> >  [<ffffffff80270a0d>] dump_stack+0x15/0x17
> >  [<ffffffff802c44e6>] softlockup_tick+0xd5/0xea
> >  [<ffffffff80250bea>] run_local_timers+0x13/0x15
> >  [<ffffffff8029cc1d>] update_process_times+0x4c/0x79
> >  [<ffffffff8027bfeb>] smp_local_timer_interrupt+0x2b/0x50
> >  [<ffffffff8027c766>] smp_apic_timer_interrupt+0x58/0x62
> >  [<ffffffff802628ae>] apic_timer_interrupt+0x6a/0x70
> 
> Hmmm.. grumble. Was this on bootup, or after some time period?
> 
> I'm looking into it.

I don't quite understand how this is clock related, soft lockup uses 
jiffies and there is nothing clock related in the trace???

bye, Roman
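
[Editor's note, not part of the original message] Roman's point that the soft lockup check is driven by jiffies can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the names `cpu_watchdog`, `watchdog_touch`, `watchdog_tick` and `ticks_after`, the `HZ` value, and the 10-second threshold are all hypothetical and are not taken from the kernel source under discussion. It shows one way a per-CPU check of this kind can fire, either because the watchdog thread really did not run for a long time, or because the tick counter advanced by a large amount at once.

```c
#include <stdbool.h>

#define HZ 250UL                     /* assumed tick rate, for illustration only */
#define LOCKUP_THRESHOLD (10UL * HZ) /* assumed: 10 seconds' worth of ticks */

/* Wraparound-safe "a is later than b" comparison on tick counters. */
bool ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical per-CPU watchdog state. */
struct cpu_watchdog {
	unsigned long touch_stamp; /* tick count when the watchdog thread last ran */
	unsigned long print_stamp; /* touch_stamp at the last report */
	bool reported;             /* true once any report has been made */
};

/* Called by the (hypothetical) watchdog thread each time it gets to run. */
void watchdog_touch(struct cpu_watchdog *wd, unsigned long jiffies_now)
{
	wd->touch_stamp = jiffies_now;
}

/* Called from the timer tick; returns true when a lockup should be reported. */
bool watchdog_tick(struct cpu_watchdog *wd, unsigned long jiffies_now)
{
	/* Report each stall only once. */
	if (wd->reported && wd->print_stamp == wd->touch_stamp)
		return false;

	/* Still within the threshold since the thread last ran. */
	if (!ticks_after(jiffies_now, wd->touch_stamp + LOCKUP_THRESHOLD))
		return false;

	wd->print_stamp = wd->touch_stamp;
	wd->reported = true;
	return true;
}
```

In this model, a caller that passes a tick count which has jumped forward by more than `LOCKUP_THRESHOLD` since the last `watchdog_touch()` gets a report even though nothing was actually stuck. Whether that is what happened on Dave's machines is exactly the open question in this thread.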


* Re: 18rc1 soft lockup
  2006-07-13 23:05         ` Roman Zippel
@ 2006-07-14  0:02           ` john stultz
  2006-07-14  0:12             ` Dave Jones
  0 siblings, 1 reply; 12+ messages in thread
From: john stultz @ 2006-07-14  0:02 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Dave Jones, Linux Kernel

On Fri, 2006-07-14 at 01:05 +0200, Roman Zippel wrote:
> Hi,
> 
> On Thu, 13 Jul 2006, john stultz wrote:
> 
> > > Just when I thought it had gotten fixed..
> > > 2.6.18rc1-git6 this time on x86-64..
> > > 
> > > BUG: soft lockup detected on CPU#3!
> > > 
> > > Call Trace:
> > >  [<ffffffff80270865>] show_trace+0xaa/0x23d
> > >  [<ffffffff80270a0d>] dump_stack+0x15/0x17
> > >  [<ffffffff802c44e6>] softlockup_tick+0xd5/0xea
> > >  [<ffffffff80250bea>] run_local_timers+0x13/0x15
> > >  [<ffffffff8029cc1d>] update_process_times+0x4c/0x79
> > >  [<ffffffff8027bfeb>] smp_local_timer_interrupt+0x2b/0x50
> > >  [<ffffffff8027c766>] smp_apic_timer_interrupt+0x58/0x62
> > >  [<ffffffff802628ae>] apic_timer_interrupt+0x6a/0x70
> > 
> > Hmmm.. grumble. Was this on bootup, or after some time period?
> > 
> > I'm looking into it.
> 
> I don't quite understand how this is clock related, soft lockup uses 
> jiffies and there is nothing clock related in the trace???

Hmmm. Well, it's easy to check:

Dave, could you comment out the "clocksource_adjust(...)" line in
kernel/timer.c::update_wall_time() just to check if it's the same issue?

thanks
-john




* Re: 18rc1 soft lockup
  2006-07-14  0:02           ` john stultz
@ 2006-07-14  0:12             ` Dave Jones
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Jones @ 2006-07-14  0:12 UTC (permalink / raw)
  To: john stultz; +Cc: Roman Zippel, Linux Kernel

On Thu, Jul 13, 2006 at 05:02:38PM -0700, john stultz wrote:
 
 > > I don't quite understand how this is clock related, soft lockup uses 
 > > jiffies and there is nothing clock related in the trace???
 > 
 > Hmmm. Well, it's easy to check:
 > 
 > Dave, could you comment out the "clocksource_adjust(...)" line in
 > kernel/timer.c::update_wall_time() just to check if it's the same issue?

I'll try, but just like every other bug I've hit together, it's
non-deterministic.  I'll do a half dozen boots to see if it turns up again.

Whatever happened to the good old days of reproducible bugs? :)

		Dave
-- 
http://www.codemonkey.org.uk


* Re: 18rc1 soft lockup
  2006-07-13 22:28         ` Dave Jones
@ 2006-07-14  9:22           ` Roman Zippel
  2006-07-14 14:09             ` Dave Jones
  0 siblings, 1 reply; 12+ messages in thread
From: Roman Zippel @ 2006-07-14  9:22 UTC (permalink / raw)
  To: Dave Jones; +Cc: john stultz, Linux Kernel

Hi,

On Thu, 13 Jul 2006, Dave Jones wrote:

> On Thu, Jul 13, 2006 at 03:15:43PM -0700, john stultz wrote:
> 
>  > > Just when I thought it had gotten fixed..
>  > > 2.6.18rc1-git6 this time on x86-64..
>  > > 
>  > > BUG: soft lockup detected on CPU#3!
>  > > 
>  > > Call Trace:
>  > >  [<ffffffff80270865>] show_trace+0xaa/0x23d
>  > >  [<ffffffff80270a0d>] dump_stack+0x15/0x17
>  > >  [<ffffffff802c44e6>] softlockup_tick+0xd5/0xea
>  > >  [<ffffffff80250bea>] run_local_timers+0x13/0x15
>  > >  [<ffffffff8029cc1d>] update_process_times+0x4c/0x79
>  > >  [<ffffffff8027bfeb>] smp_local_timer_interrupt+0x2b/0x50
>  > >  [<ffffffff8027c766>] smp_apic_timer_interrupt+0x58/0x62
>  > >  [<ffffffff802628ae>] apic_timer_interrupt+0x6a/0x70
>  > 
>  > Hmmm.. grumble. Was this on bootup, or after some time period?
> 
> Right at the end of boot up, between the switch from runlevel 3 to 5.

When it waits, a SysRq+T might be useful.

bye, Roman


* Re: 18rc1 soft lockup
  2006-07-14  9:22           ` Roman Zippel
@ 2006-07-14 14:09             ` Dave Jones
  2006-07-14 23:30               ` john stultz
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Jones @ 2006-07-14 14:09 UTC (permalink / raw)
  To: Roman Zippel; +Cc: john stultz, Linux Kernel

On Fri, Jul 14, 2006 at 11:22:47AM +0200, Roman Zippel wrote:
 > Hi,
 > 
 > On Thu, 13 Jul 2006, Dave Jones wrote:
 > 
 > > On Thu, Jul 13, 2006 at 03:15:43PM -0700, john stultz wrote:
 > > 
 > >  > > Just when I thought it had gotten fixed..
 > >  > > 2.6.18rc1-git6 this time on x86-64..
 > >  > > 
 > >  > > BUG: soft lockup detected on CPU#3!
 > >  > > 
 > >  > > Call Trace:
 > >  > >  [<ffffffff80270865>] show_trace+0xaa/0x23d
 > >  > >  [<ffffffff80270a0d>] dump_stack+0x15/0x17
 > >  > >  [<ffffffff802c44e6>] softlockup_tick+0xd5/0xea
 > >  > >  [<ffffffff80250bea>] run_local_timers+0x13/0x15
 > >  > >  [<ffffffff8029cc1d>] update_process_times+0x4c/0x79
 > >  > >  [<ffffffff8027bfeb>] smp_local_timer_interrupt+0x2b/0x50
 > >  > >  [<ffffffff8027c766>] smp_apic_timer_interrupt+0x58/0x62
 > >  > >  [<ffffffff802628ae>] apic_timer_interrupt+0x6a/0x70
 > >  > 
 > >  > Hmmm.. grumble. Was this on bootup, or after some time period?
 > > 
 > > Right at the end of boot up, between the switch from runlevel 3 to 5.
 > 
 > When it waits, a SysRq+T might be useful.

It doesn't wait. This scrolls by within a split second.

		Dave

-- 
http://www.codemonkey.org.uk


* Re: 18rc1 soft lockup
  2006-07-14 14:09             ` Dave Jones
@ 2006-07-14 23:30               ` john stultz
  0 siblings, 0 replies; 12+ messages in thread
From: john stultz @ 2006-07-14 23:30 UTC (permalink / raw)
  To: Dave Jones; +Cc: Roman Zippel, Linux Kernel

On Fri, 2006-07-14 at 10:09 -0400, Dave Jones wrote:
> On Fri, Jul 14, 2006 at 11:22:47AM +0200, Roman Zippel wrote:
>  > Hi,
>  > 
>  > On Thu, 13 Jul 2006, Dave Jones wrote:
>  > 
>  > > On Thu, Jul 13, 2006 at 03:15:43PM -0700, john stultz wrote:
>  > > 
>  > >  > > Just when I thought it had gotten fixed..
>  > >  > > 2.6.18rc1-git6 this time on x86-64..
>  > >  > > 
>  > >  > > BUG: soft lockup detected on CPU#3!
>  > >  > > 
>  > >  > > Call Trace:
>  > >  > >  [<ffffffff80270865>] show_trace+0xaa/0x23d
>  > >  > >  [<ffffffff80270a0d>] dump_stack+0x15/0x17
>  > >  > >  [<ffffffff802c44e6>] softlockup_tick+0xd5/0xea
>  > >  > >  [<ffffffff80250bea>] run_local_timers+0x13/0x15
>  > >  > >  [<ffffffff8029cc1d>] update_process_times+0x4c/0x79
>  > >  > >  [<ffffffff8027bfeb>] smp_local_timer_interrupt+0x2b/0x50
>  > >  > >  [<ffffffff8027c766>] smp_apic_timer_interrupt+0x58/0x62
>  > >  > >  [<ffffffff802628ae>] apic_timer_interrupt+0x6a/0x70
>  > >  > 
>  > >  > Hmmm.. grumble. Was this on bootup, or after some time period?
>  > > 
>  > > Right at the end of boot up, between the switch from runlevel 3 to 5.
>  > 
>  > When it waits, a SysRq+T might be useful.
> 
> it doesn't wait. this scrolls by within a split second.

Huh. I wonder if the x86-64 lost-tick compensation code is being
triggered. Could you boot w/ "report_lost_ticks" and see if it spits
anything out right before this?

thanks
-john


