* [PATCH] BUG() on soft lockup upon suspend/resume
@ 2006-10-09 21:22 Glauber de Oliveira Costa
  2006-10-09 22:22 ` Keir Fraser
  0 siblings, 1 reply; 4+ messages in thread
From: Glauber de Oliveira Costa @ 2006-10-09 21:22 UTC (permalink / raw)
  To: xen-devel

[-- Attachment #1: Type: text/plain, Size: 319 bytes --]

Hi,

In systems with vcpu > 1, a BUG due to a detected soft lockup seems to be 
triggered after system resume/suspend. This is probably due to the lack of 
seqlocking around the region that does the local time processing. 

The following patch fixes this by moving the local timer processing inside
the xtime_lock write section.

-- 
Glauber de Oliveira Costa
Red Hat Inc.
"Free as in Freedom"

[-- Attachment #2: softlockup.patch --]
[-- Type: text/plain, Size: 846 bytes --]

--- linux-2.6.18-orig/arch/i386/kernel/time-xen.c	2006-10-06 14:37:41.000000000 -0400
+++ linux-2.6.18.x86_64/arch/i386/kernel/time-xen.c	2006-10-09 17:13:54.000000000 -0400
@@ -666,6 +666,13 @@ irqreturn_t timer_interrupt(int irq, voi
 		clock_was_set();
 	}
 
+	/* Local timer processing (see update_process_times()). */
+	run_local_timers();
+	if (rcu_pending(cpu))
+		rcu_check_callbacks(cpu, user_mode(regs));
+	scheduler_tick();
+	run_posix_cpu_timers(current); 
+
 	write_sequnlock(&xtime_lock);
 
 	/*
@@ -709,13 +716,6 @@ irqreturn_t timer_interrupt(int irq, voi
 					    (cputime_t)delta_cpu);
 	}
 
-	/* Local timer processing (see update_process_times()). */
-	run_local_timers();
-	if (rcu_pending(cpu))
-		rcu_check_callbacks(cpu, user_mode(regs));
-	scheduler_tick();
-	run_posix_cpu_timers(current);
-
 	return IRQ_HANDLED;
 }
 


* Re: [PATCH] BUG() on soft lockup upon suspend/resume
  2006-10-09 21:22 [PATCH] BUG() on soft lockup upon suspend/resume Glauber de Oliveira Costa
@ 2006-10-09 22:22 ` Keir Fraser
  2006-10-10  0:29   ` Glauber de Oliveira Costa
  0 siblings, 1 reply; 4+ messages in thread
From: Keir Fraser @ 2006-10-09 22:22 UTC (permalink / raw)
  To: Glauber de Oliveira Costa, xen-devel

On 9/10/06 10:22 pm, "Glauber de Oliveira Costa" <gcosta@redhat.com> wrote:

> In systems with vcpu > 1, a BUG due to a detected soft lockup seems to be
> triggered after system resume/suspend. This is probably due to the lack of
> seqlocking around the region that does the local time processing.

We do SMP save/restore tests regularly and do not see this issue. It ought
to be avoided by the fact that, when we bring up a CPU, we
touch_softlockup_watchdog() in cpu_bringup(), before enabling interrupts.
For CPU0 on resume, the touch is done in time_resume() in
arch/i386/kernel/time-xen.c.

In general that local accounting work does not need to be done under the
xtime_lock. Native x86 does not take the lock in smp_local_timer_interrupt()
(apic.c) for example.
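
For illustration, the split between shared time state and per-CPU work can
be sketched in userspace C. This is only a toy model under stated
assumptions, not the kernel code: every toy_ name is hypothetical, the lock
is a bare sequence counter that assumes one writer at a time (the kernel's
seqlock_t also serializes writers with a spinlock and uses memory barriers),
and the per-CPU counter stands in for the work done by run_local_timers(),
scheduler_tick() and the other local calls.

```c
#include <assert.h>
#include <stdatomic.h>

#define TOY_NR_CPUS 4

/* Hypothetical stand-in for seqlock_t: odd while a writer is active. */
struct toy_seqlock {
	atomic_uint seq;
};

static struct toy_seqlock toy_xtime_lock;
static unsigned long toy_xtime_sec;   /* shared, guarded by toy_xtime_lock */
static unsigned long toy_xtime_nsec;  /* shared, guarded by toy_xtime_lock */
static unsigned long toy_local_ticks[TOY_NR_CPUS]; /* per-CPU, owner-only */

static void toy_write_seqlock(struct toy_seqlock *l)
{
	unsigned prev = atomic_fetch_add(&l->seq, 1);

	assert(!(prev & 1)); /* single-writer assumption of this sketch */
	(void)prev;
}

static void toy_write_sequnlock(struct toy_seqlock *l)
{
	unsigned prev = atomic_fetch_add(&l->seq, 1);

	assert(prev & 1); /* must have been inside the write section */
	(void)prev;
}

static unsigned toy_read_seqbegin(struct toy_seqlock *l)
{
	unsigned s;

	/* Wait until no writer is inside the write section. */
	while ((s = atomic_load(&l->seq)) & 1)
		;
	return s;
}

static int toy_read_seqretry(struct toy_seqlock *l, unsigned start)
{
	/* Nonzero if a writer ran since toy_read_seqbegin(). */
	return atomic_load(&l->seq) != start;
}

/* One timer tick on 'cpu': only the shared clock update takes the lock. */
static void toy_timer_tick(int cpu, unsigned long ns_elapsed)
{
	toy_write_seqlock(&toy_xtime_lock);
	toy_xtime_nsec += ns_elapsed;
	while (toy_xtime_nsec >= 1000000000UL) {
		toy_xtime_nsec -= 1000000000UL;
		toy_xtime_sec++;
	}
	toy_write_sequnlock(&toy_xtime_lock);

	/* Local accounting touches only this CPU's data: no lock needed. */
	toy_local_ticks[cpu]++;
}

/* Reader side: retry until a consistent (sec, nsec) pair is seen. */
static void toy_read_time(unsigned long *sec, unsigned long *nsec)
{
	unsigned start;

	do {
		start = toy_read_seqbegin(&toy_xtime_lock);
		*sec = toy_xtime_sec;
		*nsec = toy_xtime_nsec;
	} while (toy_read_seqretry(&toy_xtime_lock, start));
}
```

In this model only readers of the shared (sec, nsec) pair ever retry, so
moving the per-CPU increment inside the write section would lengthen the
time readers may spin without protecting any additional shared data.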

I think we need to understand the issue you are hitting a bit more before
deciding on the right fix.

 -- Keir


* Re: [PATCH] BUG() on soft lockup upon suspend/resume
  2006-10-09 22:22 ` Keir Fraser
@ 2006-10-10  0:29   ` Glauber de Oliveira Costa
  2006-10-10 10:06     ` Keir Fraser
  0 siblings, 1 reply; 4+ messages in thread
From: Glauber de Oliveira Costa @ 2006-10-10  0:29 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

> 
> > In systems with vcpu > 1, a BUG due to a detected soft lockup seems to be
> > triggered after system resume/suspend. This is probably due to the lack of
> > seqlocking around the region that does the local time processing.
> 
> We do SMP save/restore tests regularly and do not see this issue. It ought
> to be avoided by the fact that, when we bring up a CPU, we
> touch_softlockup_watchdog() in cpu_bringup(), before enabling interrupts.
> For CPU0 on resume, the touch is done in time_resume() in
> arch/i386/kernel/time-xen.c.

This does not happen only once, when the system comes back; it keeps
happening long afterwards. So even if the first touch is right, I suspect
the issue arises once the system has already been resumed for a long time,
with everything set up.
> 
> I think we need to understand the issue you are hitting a bit more before
> deciding on the right fix.

Right, here is more info:

I'm on an 8-way x86_64 machine, and this is the sort of output I see
repeatedly:

BUG: soft lockup detected on CPU#1!

Call Trace:
 <IRQ>  [<ffffffff802ace9d>] softlockup_tick+0xf8/0x113
 [<ffffffff8026d591>] timer_interrupt+0x38a/0x3d8
 [<ffffffff80210e87>] handle_IRQ_event+0x2d/0x60
 [<ffffffff802ad1e6>] __do_IRQ+0xa5/0x107
 [<ffffffff8028be7a>] _local_bh_enable+0x61/0xc5
 [<ffffffff8026b4c9>] do_IRQ+0xe7/0xf5
 [<ffffffff8039386e>] evtchn_do_upcall+0x86/0xe0
 [<ffffffff8025e2a2>] do_hypervisor_callback+0x1e/0x2c
 <EOI>  [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff8026cb13>] raw_safe_halt+0x84/0xa8
 [<ffffffff8026a121>] xen_idle+0x38/0x4a
 [<ffffffff80248e66>] cpu_idle+0x97/0xba

It obviously never happens on CPU#0, but I see it on all the others (vcpus=4).

If you have any other ideas about what may be causing this, they are
very welcome. I'll keep investigating.


-- 
Glauber de Oliveira Costa
Red Hat Inc.
"Free as in Freedom"


* Re: [PATCH] BUG() on soft lockup upon suspend/resume
  2006-10-10  0:29   ` Glauber de Oliveira Costa
@ 2006-10-10 10:06     ` Keir Fraser
  0 siblings, 0 replies; 4+ messages in thread
From: Keir Fraser @ 2006-10-10 10:06 UTC (permalink / raw)
  To: Glauber de Oliveira Costa; +Cc: xen-devel

On 10/10/06 01:29, "Glauber de Oliveira Costa" <gcosta@redhat.com> wrote:

> BUG: soft lockup detected on CPU#1!
> 
> Call Trace:

The trace indicates the CPU is idling, so certainly this is a bogus
softlockup warning. I guess we already knew that. ;-)

> It obviously never happens on CPU#0, but I see it on all the others (vcpus=4).

It is probably worth instrumenting the warning message to print jiffies and
the timestamp, plus the timestamp values for all other CPUs, to see how much
they vary. We want to find out whether one CPU is simply lagging in touching
the softlockup watchdog, or whether jiffies is perhaps updating in big jumps.
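
As a rough sketch of that instrumentation, here is a userspace model of the
check. It is not the kernel's softlockup code: the toy_ names, the tick rate
and the 10-second threshold are all assumptions made for the example. The
point is only to show which values would be printed side by side.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_HZ        250UL            /* assumed tick rate */
#define TOY_NR_CPUS   4
#define TOY_THRESHOLD (10UL * TOY_HZ)  /* assumed 10 s warning threshold */

static unsigned long toy_jiffies;                 /* global tick counter */
static unsigned long toy_touch_ts[TOY_NR_CPUS];   /* last touch, per CPU */

/* Model of touching the watchdog: record "now" for this CPU. */
static void toy_touch_watchdog(int cpu)
{
	toy_touch_ts[cpu] = toy_jiffies;
}

/* Wrap-safe "a is later than b", in the style of time_after(). */
static int toy_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Returns 1 and fills 'buf' with a diagnostic line if 'cpu' has not
 * touched its watchdog for longer than the threshold, else returns 0.
 */
static int toy_softlockup_check(int cpu, char *buf, size_t len)
{
	int i;

	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	assert(len > 0);

	if (!toy_time_after(toy_jiffies, toy_touch_ts[cpu] + TOY_THRESHOLD))
		return 0;

	/* The offending CPU: current jiffies, its timestamp, and the gap. */
	snprintf(buf, len, "CPU#%d jiffies=%lu touched=%lu lag=%lu |",
		 cpu, toy_jiffies, toy_touch_ts[cpu],
		 toy_jiffies - toy_touch_ts[cpu]);

	/* Every CPU's timestamp, so the spread between them is visible. */
	for (i = 0; i < TOY_NR_CPUS; i++) {
		size_t used = strlen(buf);

		if (used + 1 >= len)
			break;
		snprintf(buf + used, len - used, " cpu%d=%lu",
			 i, toy_touch_ts[i]);
	}
	return 1;
}
```

If every CPU reports a similar timestamp and the lag is large, that would
point towards jiffies jumping; if only one CPU's timestamp is far behind the
others, that CPU is the one lagging.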

Given that this is so repro'able for you, it's weird no one else has reported
it.

 -- Keir
