From mboxrd@z Thu Jan 1 00:00:00 1970 From: George Dunlap Subject: Re: [PATCH v2] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers. Date: Mon, 7 Dec 2015 12:21:26 +0000 Message-ID: <566579C6.6080600@citrix.com> References: <20151113171006.15775.61105.stgit@Solace.station> <56548323.7050702@citrix.com> <1448385272.7833.19.camel@citrix.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mail6.bemta14.messagelabs.com ([193.109.254.103]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1a5unM-0000nd-1v for xen-devel@lists.xenproject.org; Mon, 07 Dec 2015 12:21:32 +0000 In-Reply-To: <1448385272.7833.19.camel@citrix.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Dario Faggioli , xen-devel@lists.xenproject.org Cc: George Dunlap , Juergen Gross , Jan Beulich List-Id: xen-devel@lists.xenproject.org On 24/11/15 17:14, Dario Faggioli wrote: > On Tue, 2015-11-24 at 15:32 +0000, George Dunlap wrote: >> On 13/11/15 17:10, Dario Faggioli wrote: >>> >>> During suspend, the pCPUs are not removed from their >>> pools with the standard procedure (which would involve >>> schedule_cpu_switch(). During resume, they: >>> 1) are assigned to the default cpupool (CPU_UP_PREPARE >>> phase); >>> 2) are moved to the pool they were in before suspend, >>> via schedule_cpu_switch() (CPU_ONLINE phase) >>> >>> During resume, scheduling (even if just the idle loop) >>> can happen right after the CPU_STARTING phase(before >>> CPU_ONLINE), i.e., before the pCPU is put back in its >>> pool. >> >> So why are we restoring scheduling stuff during CPU_STARTING, but >> only >> putting cpus back in their pools at CPU_ONLINE? >> > Indeed. Much worse: we open the CPU for scheduling before it's back in > its pool (this is all what this bug is about!). this never made much > sense to me. > >> At some point I think I knew the answer to this, but it's worth >> revisiting it. >> > So, I once had a look, and tried shuffling things around, in a way in > which the order made more sense to me, but it does not work 'out of the > box'. > > The issues have, AFAICR, to do with the fact that memory allocations > (for the per-pCPU scheduling data, need IRQs enabled (which means > CPU_UP_PREPARE, much rather than CPU_STARTING, is what we want) on the > scheduling side, and other data that need to be ready and initialized > in order to setup cpupools (e.g., per_cpu(cpupool, cpu)). > > As said, I don't recall all the details, sorry. I recall thinking that > a solution would involve putting the CPU in the pool earlier, but that > in turn calls for other work (e.g., tweaking the priorities of the > callbacks for avoiding races). > > It's on my list of things to do, but not with super high priority. Are > you saying that we should drop this patch, and do the callback > reordering/refactoring first? Sorry, meant to respond to this -- no, I don't think refactoring is a prerequisite. Let me give it another look-over today. -George