From mboxrd@z Thu Jan 1 00:00:00 1970 From: Juergen Gross Subject: Re: [PATCH 3/3] xen: cpupools: avoid crashing if shutting down with free CPUs Date: Fri, 08 May 2015 15:18:53 +0200 Message-ID: <554CB7BD.3010201@suse.com> References: <20150506143312.9537.1503.stgit@Solace.station> <20150506151041.9537.62969.stgit@Solace.station> <554C8DF0.6010304@suse.com> <554CAD480200007800078288@suse.com> <554C9450.6080405@suse.com> <1431090740.4957.90.camel@citrix.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1431090740.4957.90.camel@citrix.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Dario Faggioli Cc: George Dunlap , Andrew Cooper , Keir Fraser , Jan Beulich , xen-devel@lists.xen.org List-Id: xen-devel@lists.xenproject.org On 05/08/2015 03:12 PM, Dario Faggioli wrote: > On Fri, 2015-05-08 at 12:47 +0200, Juergen Gross wrote: >> On 05/08/2015 12:34 PM, Jan Beulich wrote: > >>>> (XEN) Xen call trace: >>>> (XEN) [] cpu_up+0xaf/0xfe >>>> (XEN) [] enable_nonboot_cpus+0x4f/0xfc >>>> (XEN) [] enter_state_helper+0x2cb/0x370 >>>> (XEN) [] continue_hypercall_tasklet_handler+0x4a/0xb1 >>>> (XEN) [] do_tasklet_work+0x78/0xab >>>> (XEN) [] do_tasklet+0x5e/0x8a >>>> (XEN) [] idle_loop+0x56/0x70 >>>> (XEN) >>>> (XEN) >>>> (XEN) **************************************** >>>> (XEN) Panic on CPU 0: >>>> (XEN) Xen BUG at cpu.c:149 >>>> (XEN) **************************************** >>> >>> Which would seem to more likely be a result of patch 2. Having >>> taken a closer look - is setting ret to -EINVAL at the top of >>> cpupool_cpu_add() really correct? I.e. it is guaranteed that >>> at least one of the two places altering ret will always be run >>> into? If it is, then I'd still suspect one of the two >>> cpupool_assign_cpu_locked() invocations to be failing. >> >> Indeed. >> > Not really. > > Well, the problem is, of course, related, as your test shows, and I now > see why this happens, but it's all patch 3 fault (see below). > > So what's in tree right now is ok and there is no need to revert. I > believe the best thing to do is for me to send a new, fixed, version of > patch 3. The fix would probably still be just changing "int ret = > -EINVAL" to "int ret = 0" in cpupool_cpu_add(), but that should be done > within patch 3, not as a fix to patch 2, which was indeed right. > > What do you both think? > >> Setting ret to 0 initially does the trick. >> > Yes. However, as far as patch 2 is concerned, that initialization to > -EINVAL is ok, as we are sure and it is guaranted that at least one of > the two places altering ret is executed, as Jan was wandering. (Well, > because of that, the initialization is not that important, I just added > it to be extra-cautious.) > > The problem is, in patch 3, when that code becomes: > > int ret = -EINVAL; > > if ( system_state == SYS_STATE_resume ) > { > > ret = cpupool_assign_cpu_locked(*c, cpu); > } > else > { > > ret = cpupool_assign_cpu_locked(cpupool0, cpu); > } > > In fact, now, if the cpu was free when suspending, we won't find it > anywhere when looking for it in the system_state==SYS_STATE_resume case, > and hence we won't call cpupool_assign_cpu_locked(). Then, because of > the 'if() else', we don't call it below either (as we did before), and > hence no one alters 'ret'. > > That is my point, actually: in patch 2, we are sure ret will be altered. > In patch 3, it's no longer guaranteed that we alter ret, and the case in > which we don't is perfectly fine, so ret should be inited to 0. > >> With this >> modification suspend/resume and power off are working with cpus >> not allocated to any cpupool. >> > Great to know, thanks for testing... and sorry for not having been able > to do so myself. My test box allows me to "echo mem >/sys/power/state", > and it seems to suspend ok (e.g., power led is blinking)... but then it > just does not resume. :-/ > >> Dario, I suggest you write another patch to correct patch 2. >> >> For patch 3 with patch 2 corrected: >> >> Reviewed-by: Juergen Gross >> Tested-by: Juergen Gross >> > If you agree on my plan of sending v2 of patch3, and if that will really > be just the same of v1, but with "int ret=0", I'll stick these tags > there, unless you tell me not to. I don't mind how you are doing it. The machine crashed even without patch 2 when suspending with at least one free cpu, so this patch isn't making anything worse. You can still apply my 2 *.by: tags, of course. Juergen