From mboxrd@z Thu Jan  1 00:00:00 1970
From: Mukesh Rathor
Subject: Re: dom0 hang
Date: Wed, 01 Jul 2009 20:19:31 -0700
Message-ID: <4A4C2743.5030703@oracle.com>
References: <4A426D50.80401@oracle.com>
In-Reply-To: <4A426D50.80401@oracle.com>
Reply-To: mukesh.rathor@oracle.com
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: mukesh.rathor@oracle.com
Cc: ackaouy@gmail.com, Dan Magenheimer, xen-devel, andrew thomas, "Kurt C. Hackel"
List-Id: xen-devel@lists.xenproject.org

Mukesh Rathor wrote:
> Here are the details on the dom0 hang:
>
> xen: 3.4.0
> dom0: 2.6.18-128
>
> dom0.vcpu0: spinning in schedule() on spinlock: spin_lock_irq(&rq->lock);
> dom0.vcpu1: eip == ret after __HYPERVISOR_event_channel_op hypercall
>
> Just out of curiosity, I set a breakpoint at the above ret in kdb, and it
> never got hit. So I wondered why vcpu1 is not getting scheduled, and
> noticed that xen.schedule always schedules vcpu0. Two cpus on the box,
> other one is mostly in idle.
>
> anyways, I've turned lock debugging on in dom0 and reproducing it right
> now.
>
> thanks,
> Mukesh
>

Ok, here's what I have found on this:

dom0 hang:
   vcpu0 is trying to wake up a task and, in try_to_wake_up(), calls
   task_rq_lock(). Since the task has cpu set to 1, it gets the runq lock
   for vcpu1. Next it calls resched_task(), which results in sending an IPI
   to vcpu1. For that, vcpu0 gets into the HYPERVISOR_event_channel_op
   HCALL and is waiting to return. Meanwhile, vcpu1 got running and is
   spinning on its runq lock in "schedule():spin_lock_irq(&rq->lock);",
   which vcpu0 is holding (while it waits to return from the HCALL).

   As I had noticed before, vcpu0 never gets scheduled in xen. So
   looking further into xen:

xen:
   Both vcpus are on the same runq, in this case cpu1.
   But the priority of vcpu1 has been set to CSCHED_PRI_TS_BOOST. As a
   result, the scheduler always picks vcpu1, and vcpu0 is starved. Also, I
   see in kdb that the scheduler timer is not set on cpu 0. That would've
   allowed csched_load_balance() to kick in on cpu0. [Also, on cpu1, the
   accounting timer, csched_tick, is not set. Although csched_tick() is
   running on cpu0, it only checks the runq for cpu0.]

   Looks like c/s 19500 changed csched_schedule():

-    ret.time = MILLISECS(CSCHED_MSECS_PER_TSLICE);
+    ret.time = (is_idle_vcpu(snext->vcpu) ?
+                -1 : MILLISECS(CSCHED_MSECS_PER_TSLICE));

   The quickest fix for us would be to just back that out.


BTW, just a comment on the following (all in sched_credit.c):

    if ( svc->pri == CSCHED_PRI_TS_UNDER &&
         !(svc->flags & CSCHED_FLAG_VCPU_PARKED) )
    {
        svc->pri = CSCHED_PRI_TS_BOOST;
    }

combined with

    if ( snext->pri > CSCHED_PRI_TS_OVER )
        __runq_remove(snext);

Setting CSCHED_PRI_TS_BOOST as the pri of a vcpu seems dangerous. To me,
since csched_schedule() never checks the time accumulated by a vcpu at
pri CSCHED_PRI_TS_BOOST, that is the same as pinning a vcpu to a pcpu. If
that vcpu never makes progress, the system has essentially lost a
physical cpu. Optionally, csched_schedule() should always check the cpu
time accumulated and reduce the priority over time. I can't tell right
off if it already does that. Or something like that :)... my 2 cents.

thanks,
Mukesh

*** : starting 3 star campaign against overuse of macros!
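
To make the starvation argument above concrete, here is a toy model of a strict-priority pick on a single runq. It is only a sketch and not code from sched_credit.c: the names toy_vcpu, toy_pick and toy_run, the demote_after knob and the numeric priority values are all made up for illustration. It shows a boosted vcpu starving an UNDER vcpu, and how demoting it after a few slices lets the other vcpu run.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative priority values; only the ordering BOOST > UNDER matters here. */
enum toy_pri { TOY_PRI_UNDER = -1, TOY_PRI_BOOST = 0 };

/* Hypothetical per-vcpu state for the toy model. */
struct toy_vcpu {
    int pri;        /* current priority */
    int ran;        /* total slices this vcpu has been given */
    int boost_ran;  /* slices run while at BOOST */
};

/* Pick the highest-priority vcpu; among equals, the one that has run least. */
size_t toy_pick(const struct toy_vcpu *v, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (v[i].pri > v[best].pri ||
            (v[i].pri == v[best].pri && v[i].ran < v[best].ran))
            best = i;
    }
    return best;
}

/*
 * Make `slices` scheduling decisions on one runq.
 * demote_after == 0: BOOST is never revisited (the behaviour I am worried about).
 * demote_after  > 0: a vcpu drops to UNDER after that many slices at BOOST
 *                    (the "reduce the priority over time" suggestion).
 */
void toy_run(struct toy_vcpu *v, size_t n, int slices, int demote_after)
{
    assert(n > 0);
    for (int s = 0; s < slices; s++) {
        size_t i = toy_pick(v, n);
        v[i].ran++;
        if (v[i].pri == TOY_PRI_BOOST && demote_after > 0) {
            v[i].boost_ran++;
            if (v[i].boost_ran >= demote_after) {
                v[i].pri = TOY_PRI_UNDER;
                v[i].boost_ran = 0;
            }
        }
    }
}
```

With vcpu0 at UNDER and vcpu1 at BOOST, calling toy_run() with demote_after set to 0 gives every slice to vcpu1, which matches the hang: vcpu0 holds the lock but never runs. With a non-zero demote_after, vcpu0 gets slices as soon as vcpu1 is demoted. The missing scheduler timer from c/s 19500 is not modelled here.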