From: George Dunlap
Subject: Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
Date: Fri, 11 Oct 2013 11:36:56 +0100
Message-ID: <5257D4C8.7050908@eu.citrix.com>
In-Reply-To: <5257C0FE02000078000FA711@nat28.tlf.novell.com>
References: <5256EB63.7070508@citrix.com> <5257C0FE02000078000FA711@nat28.tlf.novell.com>
To: Jan Beulich, Andrew Cooper, David Vrabel, Keir Fraser
Cc: xen-devel, Juergen Gross

On 11/10/13 08:12, Jan Beulich wrote:
>>>> On 10.10.13 at 20:27, Keir Fraser wrote:
>> On 10/10/2013 19:01, "Andrew Cooper" wrote:
>>
>>>> Just taking the lock for the old processor seemed sufficient to me,
>>>> as anything seeing the new value would lock and unlock using the
>>>> same new value. But do we need to take the schedule_lock for the
>>>> new processor as well (in the right order, of course)?
>>> David and I have been discussing this for a while, involving a
>>> whiteboard, and have not come to a firm conclusion either way.
>>>
>>> From my point of view, holding the appropriate vcpu schedule lock
>>> entitles you to play with vcpu scheduling state, which involves
>>> following v->sched_priv, which we update outside the critical
>>> region later.
>>>
>>> Only taking the one lock still leaves a race condition where another
>>> cpu can follow the new v->processor and obtain the schedule lock, at
>>> which point we have two threads both working on the internals of a
>>> vcpu. The change below will certainly fix the current bug of locking
>>> one spinlock and unlocking another.
>>>
>>> My gut feeling is that we do need to take both locks to be safe in
>>> terms of data access, but we would appreciate advice from someone
>>> more familiar with the scheduler locking.
>> If it's that tricky to work out, why not just take the two locks,
>> appropriately ordered? This isn't a hot path.
> Shouldn't we rather fix the locking mechanism itself, making
> vcpu_schedule_lock...() return the lock, such that the unlock
> will unavoidably use the correct lock?

That's an idea; but I half wonder if it wouldn't be better to actually
keep vcpu_schedule_unlock(), but pass it the old lock. Then for debug
builds we can ASSERT that the lock hasn't changed.

 -George
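
P.S. Something along these lines, maybe -- just a sketch of the two
ideas combined, written against the current
per_cpu(schedule_data, ...).schedule_lock arrangement, with the
irq/irqsave flavours and the pcpu variants left out:

static inline spinlock_t *vcpu_schedule_lock(struct vcpu *v)
{
    for ( ; ; )
    {
        /* v->processor can change under our feet, so sample the lock... */
        spinlock_t *lock = per_cpu(schedule_data, v->processor).schedule_lock;

        spin_lock(lock);

        /* ...and only trust the sample once it is stable under the lock. */
        if ( likely(lock == per_cpu(schedule_data, v->processor).schedule_lock) )
            return lock;

        spin_unlock(lock);
    }
}

static inline void vcpu_schedule_unlock(spinlock_t *lock, struct vcpu *v)
{
    /*
     * Catch any path on which the vcpu moved to a differently-locked
     * runqueue between lock and unlock.
     */
    ASSERT(lock == per_cpu(schedule_data, v->processor).schedule_lock);
    spin_unlock(lock);
}

Callers would then pair the two up explicitly:

    spinlock_t *lock = vcpu_schedule_lock(v);
    /* ... play with v's scheduling state ... */
    vcpu_schedule_unlock(lock, v);

so the unlock unavoidably uses the lock actually taken, while the
ASSERT still tolerates a vcpu that merely moved between cpus sharing a
runqueue lock (as credit2's can).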