From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juergen Gross
Subject: Re: Hypervisor crash(!) on xl cpupool-numa-split
Date: Wed, 16 Feb 2011 15:28:39 +0100
Message-ID: <4D5BDF17.5000909@ts.fujitsu.com>
References: <4D41FD3A.5090506@amd.com> <4D4A72D8.3020502@ts.fujitsu.com>
 <4D4C08B6.30600@amd.com> <4D4FE7E2.9070605@amd.com>
 <4D4FF452.6060508@ts.fujitsu.com> <4D50D80F.9000007@ts.fujitsu.com>
 <4D517051.10402@amd.com> <4D529BD9.5050200@amd.com>
 <4D52A2CD.9090507@ts.fujitsu.com> <4D5388DF.8040900@ts.fujitsu.com>
 <4D53AF27.7030909@amd.com> <4D53F3BC.4070807@amd.com>
 <4D54D478.9000402@ts.fujitsu.com> <4D54E79E.3000800@amd.com>
 <4D5A29C0.4050702@ts.fujitsu.com> <4D5B9D2B.107@ts.fujitsu.com>
 <4D5BDAF8.50800@ts.fujitsu.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <4D5BDAF8.50800@ts.fujitsu.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: George Dunlap
Cc: Andre Przywara, "xen-devel@lists.xensource.com", "Diestelhorst, Stephan"
List-Id: xen-devel@lists.xenproject.org

On 02/16/11 15:11, Juergen Gross wrote:
> On 02/16/11 14:54, George Dunlap wrote:
>> Andre (and Juergen), can you try again with the attached patch?
>>
>> What the patch basically does is try to make "cpu_disable_scheduler()"
>> do what it seems to say it does. :-) Namely, the various
>> scheduler-related interrupts (both per-cpu ticks and the master tick)
>> are part of the scheduler, so disable them before doing anything, and
>> don't enable them until the cpu is really ready to go again.
>>
>> To be precise:
>> * cpu_disable_scheduler() disables ticks
>> * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool,
>>   and does it after inserting the idle vcpu
>> * Modify semantics, s.t., {alloc,free}_pdata() don't actually start or
>>   stop tickers
>>   + Call tick_{resume,suspend} in cpu_{up,down}, respectively
>
> I tried this before :-)
> It didn't work for Andre, but maybe there were some bits missing.
>
>> * Modify credit1's tick_{suspend,resume} to handle the master ticker
>>   as well.
>>
>> With this patch (if dom0 doesn't get wedged due to all 8 vcpus being
>> on one pcpu), I can perform thousands of operations successfully.
>
> Nice. I'll try later. At the moment I'm testing another patch (attached
> for review, if you like). I think I've identified two possible races.

My patch works for me. I think I have to rework the locking for credit1,
but that shouldn't be too hard.
My machine survived 10000 iterations of your script with additional
consistency checks in the scheduler. Without my patch the machine crashed
after fewer than 500 iterations.


Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html
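
The tick ordering described in the quoted "To be precise" list can be
modeled roughly as below. This is a standalone toy model under stated
assumptions, not Xen code and not either of the patches discussed in this
thread: struct toy_cpu and every toy_* function are hypothetical names
that only mirror the functions mentioned above, and the per-cpu and master
tick are collapsed into a single flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-cpu state; not a real Xen structure. */
struct toy_cpu {
    bool ticks_enabled;       /* per-cpu tick and master tick, as one flag */
    bool has_pdata;           /* scheduler private data allocated */
    bool idle_vcpu_inserted;  /* idle vcpu known to the pool's scheduler */
};

/* Models "cpu_disable_scheduler() disables ticks". */
static void toy_cpu_disable_scheduler(struct toy_cpu *c)
{
    c->ticks_enabled = false;
}

/* Models "{alloc,free}_pdata() don't actually start or stop tickers":
 * both helpers leave ticks_enabled untouched. */
static void toy_alloc_pdata(struct toy_cpu *c)
{
    c->has_pdata = true;
}

static void toy_free_pdata(struct toy_cpu *c)
{
    c->has_pdata = false;
}

/* Models the switch step: ticks must already be off on entry, and they
 * are turned on again only when the cpu is added to a pool, and only
 * after the idle vcpu has been inserted. */
static void toy_scheduler_cpu_switch(struct toy_cpu *c, bool add_to_pool)
{
    assert(!c->ticks_enabled);    /* caller disabled the scheduler first */

    toy_free_pdata(c);
    c->idle_vcpu_inserted = false;

    if (add_to_pool) {
        toy_alloc_pdata(c);
        c->idle_vcpu_inserted = true;
        c->ticks_enabled = true;  /* last step: cpu is ready to go again */
    }
}
```

In this model a cpu move is toy_cpu_disable_scheduler() followed by
toy_scheduler_cpu_switch(); the assert at the top of the switch step is
the kind of consistency check that would trip if a tick were still armed
while the cpu is between pools.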