From mboxrd@z Thu Jan 1 00:00:00 1970
From: Frank van der Linden
Subject: Re: Re: [PATCH 1/4] CPU online/offline support in Xen
Date: Wed, 10 Sep 2008 10:05:06 -0600
Message-ID: <48C7F032.5000400@Sun.COM>
References: <823A93EED437D048963A3697DB0E35DE01BE83CC@pdsmsx414.ccr.corp.intel.com> <481ad8630809100559k2ecdb5ffidab0a2754f0cf869@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; format=flowed; charset=ISO-8859-1
Content-Transfer-Encoding: 7BIT
In-reply-to: <481ad8630809100559k2ecdb5ffidab0a2754f0cf869@mail.gmail.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Haitao Shan
Cc: xen-devel@lists.xensource.com, "Shan, Haitao", Keir Fraser
List-Id: xen-devel@lists.xenproject.org

Haitao Shan wrote:
> Agree. Placing migration in stop_machine context will definitely make
> our jobs easier. I will start making a new patch tomorrow. :)
> I placed the migration code outside the stop_machine_run context, partly
> because I am not quite sure how long it will take to migrate all the
> vcpus away. If it takes too much time, all useful work is blocked
> since all cpus are in the stop_machine context. Of course, I borrowed
> the ideas from the kernel, which also led me to that decision.
>
> 2008/9/10 Keir Fraser:
>
>> I feel this is more complicated than it needs to be.
>>
>> How about clearing VCPUs from the offlined CPU's runqueue from the very end
>> of __cpu_disable()? At that point all other CPUs are safely in softirq
>> context with IRQs disabled, and we are running on the correct CPU (being
>> offlined). We could have a hook into the scheduler subsystem at that point
>> to break affinities, assign to different runqueues, etc. We would just need
>> to be careful not to try an IPI. :-) This approach would not need a
>> cpu_schedule_map (which is really increasing code fragility imo, by creating
>> possible extra confusion about which cpumask is the right one to use in a
>> given situation).
>>
>> My feeling, unless I've missed something, is that this would make the patch
>> quite a bit smaller and with a smaller spread of code changes.
>>
>> -- Keir
>>

This would also address some problems I saw with the patch: race
conditions during migration of VCPUs, because other CPUs may call
runq_tickle, or because a hypercall may come in and change the VCPU
affinity, since things are done in 2 stages.

The changes I have are more complicated, because I was working off
3.1.4, which is our current Xen version. It doesn't have things like
stop_machine_run. But if the patch is simplified in this manner, it is
easier for us to use, and we can just backport things like
stop_machine_run for the time being.

The other issue I was seeing was that cpu_up sometimes did not succeed
in actually getting a CPU to boot. But there have been a few fixes to
smpboot.c, so I'll have to see if that always works now.

- Frank