From mboxrd@z Thu Jan 1 00:00:00 1970
From: George Dunlap
Subject: Re: Only CPU0 active after ACPI S3, xen 4.1.3
Date: Wed, 16 Jan 2013 11:58:19 +0000
Message-ID: <50F695DB.6000402@eu.citrix.com>
References: <50B7AF8A.5010304@invisiblethingslab.com>
 <50B8DAEA0200007800090B69@nat28.tlf.novell.com>
 <50B8DC55.8000308@invisiblethingslab.com>
 <50BC653E02000078000AD28C@nat28.tlf.novell.com>
 <50BDFA38.7030009@invisiblethingslab.com>
 <50D335E6.902@invisiblethingslab.com>
 <50D39C73.906@invisiblethingslab.com>
 <50D3EB03.4000109@invisiblethingslab.com>
 <50D4322102000078000B1F80@nat28.tlf.novell.com>
 <50D46534.2010304@invisiblethingslab.com>
 <50D4757202000078000B2042@nat28.tlf.novell.com>
 <50D46B47.8000003@invisiblethingslab.com>
 <50D48090.6060603@invisiblethingslab.com>
 <50D4967602000078000B2114@nat28.tlf.novell.com>
 <3368417890369848263@unknownmsgid>
 <50D6713C.2000202@invisiblethingslab.com>
 <50E554CC02000078000B29BD@nat28.tlf.novell.com>
 <50F68D8F.7030704@eu.citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <50F68D8F.7030704@eu.citrix.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: Ben Guthro , Marek Marczykowski , "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

On 16/01/13 11:22, George Dunlap wrote:
> On 03/01/13 08:52, Jan Beulich wrote:
>>>>> On 31.12.12 at 13:51, Ben Guthro wrote:
>>> My current suspicion is irq delivery, because of the following
>>> messages I
>>> see on the console on the way down:
>>>
>>> (XEN) Preparing system for ACPI S3 state.
>>> (XEN) Disabling non-boot CPUs ...
>>> (XEN) Broke affinity for irq 1
>>> (XEN) Broke affinity for irq 9
>>> (XEN) Broke affinity for irq 12
>>> (XEN) Broke affinity for irq 26
>>> (XEN) Broke affinity for irq 30
>>> (XEN) Broke affinity for irq 1
>>> (XEN) Broke affinity for irq 1
>>> (XEN) Entering ACPI S3 state.
>> No, that's normal behavior. But you ought to be able to verify by
>> pinning Dom0's vCPU 0 to pCPU 0, and within Dom0 setting the
>> affinities of all interrupts to CPU 0 - that should make all of these
>> messages go away.
>>
>>> Jan - any suggestions on how to procede with this? FWIW, Xen 4.0.y
>>> suspends
>>> on this machine reliably.
>> With two scheduler related changesets having got spotted as
>> problematic by now (23255:1f95b55ef427 and 23269:d67e4d12723f,
>> albeit the latter not really scheduler specific), I'm really very much
>> hoping for George to have an idea, the more that ...
>
> Marek,
>
> Sorry I haven't been following the thread -- have you tested this with
> 4.2, with and without the corresponding patch reverted
> (25079:d5ccb2d1dbd1)? That might tell us whether the patch itself was
> wrong, or whether there was a mistake in back-porting the patch
> (possibly because of different invariants outside of the patched code).
>
> Jan, the commit message isn't very informative -- can you point me to
> a conversation describing the problem you're fixing wrt
> suspend/resume, and/or describe what you were trying to do? Given the
> results, the whole thing about not disabling scheduling during suspend
> seems a bit suspect...

In particular, just on a fairly cursory bit of function call skimming,
it looks like:
* This change means that cpupool.c:cpu_callback() won't call
  cpupool_cpu_add() when resuming
* cpupool_cpu_add() does a bunch of paperwork (which would be
  unnecessary given the changes re suspend), but also calls
  cpupool_assign_cpu_locked()
* cpupool_assign_cpu_locked() calls schedule_cpu_switch()
* schedule_cpu_switch() calls the scheduler's tick_resume()

So is it possible that on resume ticks are not being re-enabled, or
something like that? (And possibly related to Ben's problem, ticks are
not being disabled on suspend?)

 -George
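
The call chain listed above can be sketched as a toy model in C. This is not Xen source: the function names are the ones mentioned in the message, but every signature, the `resuming` flag, the `tick_active` array and the helper `tick_running_after_cpu_up()` are hypothetical stand-ins, and the model assumes the tick was stopped before the CPU came back up. It only illustrates why skipping cpupool_cpu_add() on resume would also skip tick_resume().

```c
/* Toy model only: NOT Xen source. Function names come from the message
 * above; all signatures, fields and bodies are hypothetical stand-ins. */
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical per-CPU state: is the scheduler tick running? */
bool tick_active[NR_CPUS];

/* Stand-in for the scheduler's tick_resume() hook. */
void tick_resume(unsigned int cpu)
{
    tick_active[cpu] = true;
}

/* Stand-in: switching a CPU to a scheduler restarts its tick. */
void schedule_cpu_switch(unsigned int cpu)
{
    tick_resume(cpu);
}

void cpupool_assign_cpu_locked(unsigned int cpu)
{
    schedule_cpu_switch(cpu);
}

void cpupool_cpu_add(unsigned int cpu)
{
    /* The "paperwork" mentioned in the message is omitted here. */
    cpupool_assign_cpu_locked(cpu);
}

/* Stand-in for cpupool.c:cpu_callback() handling a CPU coming online.
 * The 'resuming' flag models the suspected behaviour after the change:
 * on resume, cpupool_cpu_add() is not called at all. */
void cpu_callback(unsigned int cpu, bool resuming)
{
    if (!resuming)
        cpupool_cpu_add(cpu);
}

/* Start with the tick stopped, bring the CPU up, report the tick state. */
bool tick_running_after_cpu_up(unsigned int cpu, bool resuming)
{
    tick_active[cpu] = false;
    cpu_callback(cpu, resuming);
    return tick_active[cpu];
}
```

In this model, tick_running_after_cpu_up(1, false) reports the tick as running, while tick_running_after_cpu_up(1, true) leaves it stopped, which is the scenario the question above is asking about. Whether the real code behaves this way is exactly what still needs checking.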