From mboxrd@z Thu Jan 1 00:00:00 1970
From: Glauber de Oliveira Costa
Subject: Re: [PATCH] BUG() on soft lockup upon suspend/resume
Date: Mon, 9 Oct 2006 21:29:10 -0300
Message-ID: <20061010002910.GC28540@redhat.com>
References: <20061009212242.GB28540@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Keir Fraser
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

>
> > In systems with vcpu > 1, a BUG due to a detected soft lockup seems to be
> > triggered after system resume/suspend. This is probably due to the lack of
> > seqlocking around the region that does the local time processing.
>
> We do SMP save/restore tests regularly and do not see this issue. It ought
> to be avoided by the fact that, when we bring up a CPU, we
> touch_softlockup_watchdog() in cpu_bringup(), before enabling interrupts.
> For CPU0 on resume, the touch is done in time_resume() in
> arch/i386/kernel/time-xen.c.

This does not happen just once, when the system comes back. It keeps
happening a lot after that. So even if the first touch is right, I suspect
this issue is more related to a situation in which we have already been
resumed for a long time, with everything set up.

>
> I think we need to understand the issue you are hitting a bit more before
> deciding on the right fix.

Right, here goes more info:

I'm on an 8-way x86_64 machine, and this is the sort of info I see
repeatedly:

BUG: soft lockup detected on CPU#1!
Call Trace:
 [] softlockup_tick+0xf8/0x113
 [] timer_interrupt+0x38a/0x3d8
 [] handle_IRQ_event+0x2d/0x60
 [] __do_IRQ+0xa5/0x107
 [] _local_bh_enable+0x61/0xc5
 [] do_IRQ+0xe7/0xf5
 [] evtchn_do_upcall+0x86/0xe0
 [] do_hypervisor_callback+0x1e/0x2c
 [] hypercall_page+0x3aa/0x1000
 [] hypercall_page+0x3aa/0x1000
 [] raw_safe_halt+0x84/0xa8
 [] xen_idle+0x38/0x4a
 [] cpu_idle+0x97/0xba

It obviously never happens on CPU#0, but I see it on all the others
(vcpus=4).

If you have any other ideas about what else may be causing this, they are
very welcome. I'll keep investigating.

-- 
Glauber de Oliveira Costa
Red Hat Inc.
"Free as in Freedom"