From mboxrd@z Thu Jan  1 00:00:00 1970
From: Juergen Gross <juergen.gross@ts.fujitsu.com>
Subject: Re: [PATCH v2] Fix scheduler crash after s3 resume
Date: Fri, 25 Jan 2013 11:35:17 +0100
Message-ID: <51025FE5.1070200@ts.fujitsu.com>
References: <5100070F.7010808@citrix.com>
	<5100D229.4030906@ts.fujitsu.com>	<510144A3.9060302@citrix.com>	<5101630D02000078000B93AD@nat28.tlf.novell.com>	<51016065.3080902@citrix.com>	<510175E802000078000B94A1@nat28.tlf.novell.com>	<51024B56.20706@citrix.com>	<5102603302000078000B985C@nat28.tlf.novell.com>	<5102541D.1070408@citrix.com>	<5102694B02000078000B98B7@nat28.tlf.novell.com>	<51025D2C.3040005@ts.fujitsu.com>
	<51026CFC02000078000B9917@nat28.tlf.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <51026CFC02000078000B9917@nat28.tlf.novell.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich <JBeulich@suse.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>, Tomasz Wroblewski <tomasz.wroblewski@citrix.com>, "Keir (Xen.org)" <keir@xen.org>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
List-Id: xen-devel@lists.xenproject.org

On 25.01.2013 11:31, Jan Beulich wrote:
>>>> On 25.01.13 at 11:23, Juergen Gross<juergen.gross@ts.fujitsu.com>  wrote:
>> On 25.01.2013 11:15, Jan Beulich wrote:
>>>>>> On 25.01.13 at 10:45, Tomasz Wroblewski<tomasz.wroblewski@citrix.com>   wrote:
>>>
>>>>> I think I had already raised the question of the placement of
>>>>> this rcu_barrier() here, and the lack of a counterpart in the
>>>>> suspend portion of the path. Keir? Or should
>>>>> rcu_barrier_action() avoid calling process_pending_softirqs()
>>>>> while still resuming, and instead call __do_softirq() with all but
>>>>> RCU_SOFTIRQ masked (perhaps through a suitable wrapper,
>>>>> or alternatively by open-coding its effect)?
>>>>>
>>>> Though I recall these vcpu_wake crashes also happen from entry
>>>> points in enter_state other than rcu_barrier, so I don't think removing
>>>> that helps much. I was just unable to get a proper log of them today
>>>> due to most of them being cut in half. Will try a bit more.
>>>
>>> In which case making __do_softirq() itself honor being in the
>>> suspend/resume path might still be an option.
>>>
>>>> My belief is that as long as vcpu_migrate is not called in
>>>> cpu_disable_scheduler, vcpu->processor will continue to point to an
>>>> offline cpu, which will crash if vcpu_wake is called for that vcpu.
>>>> If vcpu_migrate is called, then vcpu_wake will still be called with some
>>>> frequency, but since vcpu->processor will point to an online cpu, it
>>>> won't crash. So likely avoiding the wakes here completely is not the
>>>> goal, just the offline ones.
>>>
>>> But you neglect the fact that waking vCPU-s at this point is
>>> unnecessary anyway (they have nowhere to run on).
>>
>> What about adding a global scheduler_disable() in freeze_domains() and a
>> scheduler_enable() in thaw_domains() which will switch scheduler locking to
>> a global lock (or disable it altogether?). This should solve all problems
>> without any complex changes of current behavior.
>
> I don't see how this would address the so far described
> shortcomings.

The crash happens due to an access to the scheduler percpu area of a cpu
whose area isn't allocated at that moment (the cpu is offline). The element
accessed is the address of the scheduler lock for this cpu. Disabling the
scheduler's percpu locking scheme while the non-boot cpus are offline will
avoid the crash.
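
To make the idea a bit more concrete, here is a minimal sketch of what I have
in mind. It is not a patch and not real Xen code: spinlock_t, struct
sched_percpu, percpu_area[], global_sched_lock and sched_lock_for() are
hypothetical stand-ins made up for illustration, and scheduler_disable() /
scheduler_enable() are only the names proposed above, not existing functions.
A NULL entry in percpu_area[] models a percpu area which isn't allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical stand-in for a lock type; not the real Xen spinlock. */
typedef struct { int held; } spinlock_t;

/* Hypothetical per-cpu scheduler data holding the address of the lock. */
struct sched_percpu {
    spinlock_t lock_storage;
    spinlock_t *schedule_lock;  /* address of the scheduler lock for this cpu */
};

/* NULL models a percpu area that is not allocated (cpu is offline). */
static struct sched_percpu *percpu_area[NR_CPUS];

/* Single lock used while percpu locking is switched off. */
static spinlock_t global_sched_lock;
static bool sched_percpu_locking = true;

/* Proposed to be called from freeze_domains(). */
static void scheduler_disable(void)
{
    sched_percpu_locking = false;
}

/* Proposed to be called from thaw_domains(). */
static void scheduler_enable(void)
{
    sched_percpu_locking = true;
}

/* Return the lock to take before touching scheduler state of 'cpu'. */
static spinlock_t *sched_lock_for(unsigned int cpu)
{
    if (!sched_percpu_locking)
        return &global_sched_lock;  /* percpu_area[] is never dereferenced */

    /* Normal operation: the percpu area must exist for this cpu. */
    assert(cpu < NR_CPUS && percpu_area[cpu] != NULL);
    return percpu_area[cpu]->schedule_lock;
}
```

Between scheduler_disable() and scheduler_enable() a lookup for an offline cpu
returns the global lock instead of reading from the missing percpu area, so a
vcpu_wake-like path has nothing invalid to dereference when looking up the
lock. Whether a global lock or no locking at all is the better choice in that
window is left open here.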


Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
PBG PDG ES&S SWE OS6                   Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html