From mboxrd@z Thu Jan 1 00:00:00 1970 From: Keir Fraser Subject: Re: deadlock in the credit2 Date: Fri, 14 Oct 2011 13:11:01 +0100 Message-ID: References: Mime-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: George Dunlap , Eunbyung Park Cc: xen-devel@lists.xensource.com List-Id: xen-devel@lists.xenproject.org On 14/10/2011 12:47, "George Dunlap" wrote: > 2011/10/14 Eunbyung Park : >> IMHO, it seems to be deadlock when changing dom0's weight in credit2 >> scheduler. >> >> when the sched_adjust() in schedule.c is called, it grabs the >> schedule_lock after pausing all of the vcpus >> >> and then, csched_dom_cntl in sched_credit2.c, it also grab the >> schedule_lock by using vcpu_schedule_lock_irq(). >> >> In the credit2, all of the percpu schedule_lock points out same runqueue >> lock if they belong to same runqueue. >> >> Eventually, all of vcpu are paused except for itself running the code, >> and it try to grab schedule_lock that was grabbed by itself. >> >> Am I right? If I was wrong, please tell me my misunderstanding. > > Hmm, I think you may have discovered the source of a bug that people > have been reporting but I haven't had time to look into yet. > > Keir, I think that lock in schedule.c around SCHED(adjust) must be > wrong. By the time we grab that lock, grabbing it will be completely > pointless. What are we going to be racing against? In any case, the > actual scheduler should be responsible for grabbing locks; there's no > reason that the scheduler can't grab whatever lock it needs inside > that function. I haven't done a deep analysis, but my initial > instinct is to just get rid of it. What do you think? Fine by me. The synchronisation in that function looks pretty fragile. It's probably outdated too. -- Keir >> if ( d == current->domain ) >> vcpu_schedule_lock_irq(current); >> >> It was very hard to understan for me..:) What does it exactly mean? > > You're asking what "current" means? "current" is a macro that always > resolves to the vcpu which is running on the current processor. > > sched_adjust() seems to be trying to avoid scheduling races in general > by pausing all vcpus before calling the per-scheduler function. But > if a VM is calling the op on itself, the vcpu making the hypercall > can't pause itself. So in that case (current->domain == d) will be > true, so sched_adjust() grab the schedule lock of that vm instead. > > But really all that locking should be handled in the scheduler > function, not by the generic code. It knows best what needs to be > locked when. > > -George