From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juergen Gross
Subject: Re: [PATCH v2] Fix scheduler crash after s3 resume
Date: Fri, 25 Jan 2013 11:23:40 +0100
Message-ID: <51025D2C.3040005@ts.fujitsu.com>
References: <5100070F.7010808@citrix.com>
 <5100D229.4030906@ts.fujitsu.com>
 <510144A3.9060302@citrix.com>
 <5101630D02000078000B93AD@nat28.tlf.novell.com>
 <51016065.3080902@citrix.com>
 <510175E802000078000B94A1@nat28.tlf.novell.com>
 <51024B56.20706@citrix.com>
 <5102603302000078000B985C@nat28.tlf.novell.com>
 <5102541D.1070408@citrix.com>
 <5102694B02000078000B98B7@nat28.tlf.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <5102694B02000078000B98B7@nat28.tlf.novell.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: George Dunlap, Tomasz Wroblewski, "Keir (Xen.org)",
 "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

On 25.01.2013 11:15, Jan Beulich wrote:
>>>> On 25.01.13 at 10:45, Tomasz Wroblewski wrote:
>
>>> I think I had already raised the question of the placement of
>>> this rcu_barrier() here, and the lack of a counterpart in the
>>> suspend portion of the path. Keir? Or should
>>> rcu_barrier_action() avoid calling process_pending_softirqs()
>>> while still resuming, and instead call __do_softirq() with all but
>>> RCU_SOFTIRQ masked (perhaps through a suitable wrapper,
>>> or alternatively by open-coding its effect)?
>>>
>> Though I recall these vcpu_wake crashes happen also from other entry
>> points in enter_state but rcu_barrier, so I dont think removing that
>> helps much. Just was unable to get a proper log of them today due to
>> most of them being cut in half. Will try bit more.
>
> In which case making __do_softirq() itself honor being in the
> suspend/resume path might still be an option.
>
>> My belief is that as long as vcpu_migrate is not called in
>> cpu_disable_scheduler, the vcpu->processor shall continue to point to
>> offline cpu. Which will crash if the vcpu_wake is called for that vcpu.
>> If vcpu_migrate is called, then vcpu_wake will still be called with some
>> frequency but since vcpu->processor shall point to online cpu, and it
>> won't crash. So likely avoiding the wakes here completely is not the
>> goal, just the offline ones.
>
> But you neglect the fact that waking vCPU-s at this point is
> unnecessary anyway (they have nowhere to run on).

What about adding a global scheduler_disable() in freeze_domains() and a
scheduler_enable() in thaw_domains(), which would switch scheduler locking
to a global lock (or disable it altogether)? This should solve all of these
problems without any complex changes to the current behavior.

Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
PBG PDG ES&S SWE OS6                   Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html
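
To make the scheduler_disable()/scheduler_enable() suggestion above a bit
more concrete, here is a rough, standalone toy model of it. Nothing below is
Xen code: the toy_* types and helpers, the wake_result enum and the flag
names are all made up for illustration, and the "locks" are plain flags
rather than real spinlocks. It only tries to show the intended effect: while
the scheduler is disabled, a wake takes one global lock and is merely
recorded, so it never touches the per-CPU state of an offline CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Toy stand-in for a spinlock: it only records whether it is held. */
struct toy_lock { bool held; };

/* Hypothetical state, not taken from the Xen sources. */
static struct toy_lock percpu_lock[NR_CPUS];  /* normal per-CPU locks     */
static struct toy_lock global_lock;           /* used while "disabled"    */
static bool sched_disabled;                   /* set from freeze to thaw  */
static bool cpu_online[NR_CPUS] = { true, true, true, true };

struct toy_vcpu {
    int  processor;  /* CPU this vCPU was last assigned to */
    bool runnable;   /* set once a wake has been recorded  */
};

enum wake_result {
    WAKE_QUEUED,     /* normal wake on an online CPU                  */
    WAKE_DEFERRED,   /* scheduler disabled: wake only remembered      */
    WAKE_BAD_CPU     /* would have touched an offline CPU's state     */
};

/* Names as proposed in the mail; imagined callers are
 * freeze_domains() and thaw_domains(). */
void scheduler_disable(void) { sched_disabled = true; }
void scheduler_enable(void)  { sched_disabled = false; }

void toy_cpu_set_online(int cpu, bool online)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    cpu_online[cpu] = online;
}

/* Pick the lock a wake has to take: global while disabled, per-CPU else. */
static struct toy_lock *schedule_lock_for(int cpu)
{
    return sched_disabled ? &global_lock : &percpu_lock[cpu];
}

enum wake_result toy_vcpu_wake(struct toy_vcpu *v)
{
    struct toy_lock *lock = schedule_lock_for(v->processor);
    enum wake_result res;

    lock->held = true;
    if (sched_disabled) {
        /* Record the wake without looking at any per-CPU state. */
        v->runnable = true;
        res = WAKE_DEFERRED;
    } else if (!cpu_online[v->processor]) {
        /* This models the case that crashes: v->processor is offline. */
        res = WAKE_BAD_CPU;
    } else {
        v->runnable = true;
        res = WAKE_QUEUED;
    }
    lock->held = false;
    return res;
}
```

In this model, waking a vCPU whose processor field still names an offline
CPU reports WAKE_BAD_CPU, while the same wake issued between
scheduler_disable() and scheduler_enable() is only deferred. Whether the
real locking can be switched this simply is of course the open question.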