xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Juergen Gross <juergen.gross@ts.fujitsu.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
	Tomasz Wroblewski <tomasz.wroblewski@citrix.com>,
	"Keir (Xen.org)" <keir@xen.org>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH v2] Fix scheduler crash after s3 resume
Date: Fri, 25 Jan 2013 12:05:52 +0100	[thread overview]
Message-ID: <51026710.8040806@ts.fujitsu.com> (raw)
In-Reply-To: <51026F2202000078000B996C@nat28.tlf.novell.com>

Am 25.01.2013 11:40, schrieb Jan Beulich:
>>>> On 25.01.13 at 11:35, Juergen Gross<juergen.gross@ts.fujitsu.com>  wrote:
>> Am 25.01.2013 11:31, schrieb Jan Beulich:
>>>>>> On 25.01.13 at 11:23, Juergen Gross<juergen.gross@ts.fujitsu.com>   wrote:
>>>> Am 25.01.2013 11:15, schrieb Jan Beulich:
>>>>>>>> On 25.01.13 at 10:45, Tomasz Wroblewski<tomasz.wroblewski@citrix.com>    wrote:
>>>>>
>>>>>>> I think I had already raised the question of the placement of
>>>>>>> this rcu_barrier() here, and the lack of a counterpart in the
>>>>>>> suspend portion of the path. Keir? Or should
>>>>>>> rcu_barrier_action() avoid calling process_pending_softirqs()
>>>>>>> while still resuming, and instead call __do_softirq() with all but
>>>>>>> RCU_SOFTIRQ masked (perhaps through a suitable wrapper,
>>>>>>> or alternatively by open-coding its effect)?
>>>>>>>
>>>>>> Though I recall these vcpu_wake crashes happen also from other entry
>>>>>> points in enter_state but rcu_barrier, so I dont think removing that
>>>>>> helps much. Just was unable to get a proper log of them today due to
>>>>>> most of them being cut in half. Will try bit more.
>>>>>
>>>>> In which case making __do_softirq() itself honor being in the
>>>>> suspend/resume path might still be an option.
>>>>>
>>>>>> My belief is that as long as vcpu_migrate is not called in
>>>>>> cpu_disable_scheduler, the vcpu->processor shall continue to point to
>>>>>> offline cpu. Which will crash if the vcpu_wake is called for that vcpu.
>>>>>> If vcpu_migrate is called, then vcpu_wake will still be called with some
>>>>>> frequency but since vcpu->processor shall point to online cpu, and it
>>>>>> won't crash. So likely avoiding the wakes here completely is not the
>>>>>> goal, just the offline ones.
>>>>>
>>>>> But you neglect the fact that waking vCPU-s at this point is
>>>>> unnecessary anyway (they have nowhere to run on).
>>>>
>>>> What about adding a global scheduler_disable() in freeze_domains() and a
>>>> scheduler_enable() in thaw_domains() which will switch scheduler locking to
>>>> a global lock (or disable it at all?). This should solve all problems
>>>> without
>>>> any complex changes of current behavior.
>>>
>>> I don't see how this would address the so far described
>>> shortcomings.
>>
>> The crash happens due to an access to the scheduler percpu area which isn't
>> allocated at the moment. The accessed element is the address of the
>> scheduler
>> lock for this cpu. Disabling the percpu locking scheme of the scheduler
>> while
>> the non-boot cpus are offline will avoid the crash.
>
> Ah, okay. But that wouldn't prevent other bad effects that could
> result from vCPU-s pointing to offline pCPU-s. Hence I think such
> a solution, even if sufficient for now, would set us up for future
> similar (and similarly hard to debug) issues.

To avoid this problem you would have to change the suspend logic, I think.

We could take the cpus just physically offline without removing all the
state information from the system. During resume the old state and percpu areas
would just be reused.

While this would be a clean solution, it's not a really small task to do...


Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
PBG PDG ES&S SWE OS6                   Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

  reply	other threads:[~2013-01-25 11:05 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-23 15:51 [PATCH] Fix scheduler crash after s3 resume Tomasz Wroblewski
2013-01-23 16:11 ` Jan Beulich
2013-01-23 16:57   ` Tomasz Wroblewski
2013-01-23 17:01     ` Tomasz Wroblewski
2013-01-23 17:50     ` Tomasz Wroblewski
2013-01-24  6:18 ` Juergen Gross
2013-01-24 14:26   ` [PATCH v2] " Tomasz Wroblewski
2013-01-24 15:36     ` Jan Beulich
2013-01-24 15:57       ` George Dunlap
2013-01-24 16:25       ` Tomasz Wroblewski
2013-01-24 16:56         ` Jan Beulich
2013-01-25  9:07           ` Tomasz Wroblewski
2013-01-25  9:36             ` Jan Beulich
2013-01-25  9:45               ` Tomasz Wroblewski
2013-01-25 10:15                 ` Jan Beulich
2013-01-25 10:18                   ` Tomasz Wroblewski
2013-01-25 10:29                     ` Jan Beulich
2013-01-25 10:23                   ` Juergen Gross
2013-01-25 10:29                     ` Tomasz Wroblewski
2013-01-25 10:31                     ` Jan Beulich
2013-01-25 10:35                       ` Juergen Gross
2013-01-25 10:40                         ` Jan Beulich
2013-01-25 11:05                           ` Juergen Gross [this message]
2013-01-25 11:56                         ` Tomasz Wroblewski
2013-01-25 12:27                           ` Jan Beulich
2013-01-25 13:58                             ` Tomasz Wroblewski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51026710.8040806@ts.fujitsu.com \
    --to=juergen.gross@ts.fujitsu.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=JBeulich@suse.com \
    --cc=keir@xen.org \
    --cc=tomasz.wroblewski@citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).