From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mukesh Rathor Subject: Re: dom0 hang Date: Thu, 02 Jul 2009 18:18:33 -0700 Message-ID: <4A4D5C69.5020409@oracle.com> References: <4A426D50.80401@oracle.com> <4A4C2743.5030703@oracle.com> <4A4D0710.10309@oracle.com> <4A4D2253.8070807@oracle.com> <4A4D4D78.1060609@oracle.com> Reply-To: mukesh.rathor@oracle.com Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4A4D4D78.1060609@oracle.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: mukesh.rathor@oracle.com Cc: George Dunlap , "Kurt C. Hackel" , "Tian, Kevin" , xen-devel@lists.xensource.com, Yu Ke List-Id: xen-devel@lists.xenproject.org Hi Kevin/Yu: acpi_processor_idle() { sched_tick_suspend(); /* * sched_tick_suspend may raise TIMER_SOFTIRQ by __stop_timer, * which will break the later assumption of no sofirq pending, * so add do_softirq */ if ( softirq_pending(smp_processor_id()) ) do_softirq(); <=============== local_irq_disable(); if ( softirq_pending(smp_processor_id()) ) { local_irq_enable(); sched_tick_resume(); cpufreq_dbs_timer_resume(); return; } wouldn't the do_softirq() call scheduler with tick suspended, and the scheduler then context switches to another vcpu0 (with *_BOOST) which would result in the stuck vcpu I described? thanks Mukesh Mukesh Rathor wrote: > ah, i totally missed csched_tick(): > if ( !is_idle_vcpu(current) ) > csched_vcpu_acct(cpu); > > yeah, looks like that's what is going on. i'm still waiting to > reproduce. at first glance, looking at c/s 19460, seems like > suspend/resume, well at least the resume, should happen in > csched_schedule()..... > > thanks, > Mukesh > > > George Dunlap wrote: >> [Oops, adding back in distro list, also adding Kevin Tian and Yu Ke >> who wrote cs 19460] >> >> The functionality I was talking about, subtracting credits and >> clearing BOOST, happens in csched_vcpu_acct() (which is different than >> csched_acct()). vcpu_acct() is called from csched_tick(), which >> should still happen every 10ms on every cpu. >> >> The patch I referred to (cs 19460) disables and re-enables tickers in >> xen/arch/x86/acpi/cpu_idle.c:acpi_processor_idle() every time the >> processor idles. I can't see anywhere else that tickers are disabled, >> so it's probably something not properly re-enabling them again. >> >> Try applying the attached patch to see if that changes anything. (I'm >> on the road, so I can't repro the lockup issue.) If that doesn't >> work, try disabling c-states and see if that helps. Then at least >> we'll know where the problem lies. >> >> -George >> >> On Thu, Jul 2, 2009 at 10:10 PM, Mukesh >> Rathor wrote: >>> that seems to only suspend csched_pcpu.ticker which is csched_tick >>> that is >>> only sorting local runq. >>> >>> again, we are concerned about csched_priv.master_ticker that calls >>> csched_acct? correct, so i can trace that? >>> >>> thanks, >>> mukesh >>> >>> >>> George Dunlap wrote: >>>> Ah, I see that there's been some changes to tick stuff with the >>>> c-state (e.g., cs 19460). It looks like they're supposed to be going >>>> still, but perhaps the tick_suspend() and tick_resume() aren't being >>>> called properly. Let me take a closer look. >>>> >>>> -George >>>> >>>> On Thu, Jul 2, 2009 at 8:14 PM, Mukesh Rathor >>>> wrote: >>>>> George Dunlap wrote: >>>>>> On Thu, Jul 2, 2009 at 4:19 AM, Mukesh >>>>>> Rathor >>>>>> wrote: >>>>>>> dom0 hang: >>>>>>> vcpu0 is trying to wakeup a task and in try_to_wake_up() calls >>>>>>> task_rq_lock(). since the task has cpu set to 1, it gets runq lock >>>>>>> for vcpu1. next it calls resched_task() which results in sending >>>>>>> IPI >>>>>>> to vcpu1. for that, vcpu0 gets into the HYPERVISOR_event_channel_op >>>>>>> HCALL and is waiting to return. Meanwhile, vcpu1 got running, >>>>>>> and is >>>>>>> spinning on it's runq lock in >>>>>>> "schedule():spin_lock_irq(&rq->lock);", >>>>>>> that vcpu0 is holding (and is waiting to return from the HCALL). >>>>>>> >>>>>>> As I had noticed before, vcpu0 never gets scheduled in xen. So >>>>>>> looking further into xen: >>>>>>> >>>>>>> xen: >>>>>>> Both vcpu's are on the same runq, in this case cpu1. But the >>>>>>> priority of vcpu1 has been set to CSCHED_PRI_TS_BOOST. As a result, >>>>>>> the scheduler always picks vcpu1, and vcpu0 is starved. Also, I >>>>>>> see in >>>>>>> kdb that the scheduler timer is not set on cpu 0. That would've >>>>>>> allowed csched_load_balance() to kick in on cpu0. [Also, on >>>>>>> cpu1, the accounting timer, csched_tick, is not set. Altho, >>>>>>> csched_tick() is running on cpu0, it only checks runq for cpu0.] >>>>>>> >>>>>>> Looks like c/s 19500 changed csched_schedule(): >>>>>>> >>>>>>> - ret.time = MILLISECS(CSCHED_MSECS_PER_TSLICE); >>>>>>> + ret.time = (is_idle_vcpu(snext->vcpu) ? >>>>>>> + -1 : MILLISECS(CSCHED_MSECS_PER_TSLICE)); >>>>>>> >>>>>>> The quickest fix for us would be to just back that out. >>>>>>> >>>>>>> >>>>>>> BTW, just a comment on following (all in sched_credit.c): >>>>>>> >>>>>>> if ( svc->pri == CSCHED_PRI_TS_UNDER && >>>>>>> !(svc->flags & CSCHED_FLAG_VCPU_PARKED) ) >>>>>>> { >>>>>>> svc->pri = CSCHED_PRI_TS_BOOST; >>>>>>> } >>>>>>> comibined with >>>>>>> if ( snext->pri > CSCHED_PRI_TS_OVER ) >>>>>>> __runq_remove(snext); >>>>>>> >>>>>>> Setting CSCHED_PRI_TS_BOOST as pri of vcpu seems dangerous. To >>>>>>> me, >>>>>>> since csched_schedule() never checks for time accumulated by a >>>>>>> vcpu at pri CSCHED_PRI_TS_BOOST, that is same as pinning a >>>>>>> vcpu to a >>>>>>> pcpu. if that vcpu never makes progress, essentially, the system >>>>>>> has lost a physical cpu. Optionally, csched_schedule() should >>>>>>> always >>>>>>> check for cpu time accumulated and reduce the priority over time. >>>>>>> I can't tell right off if it already does that. or something like >>>>>>> that :)... my 2 cents. >>>>>> Hmm... what's supposed to happen is that eventually a timer tick will >>>>>> interrupt vcpu1. If cpu1 is set to be "active", then it will be >>>>>> debited 10ms worth of credit. Eventually, it will go into OVER, and >>>>>> lose BOOST. If it's "inactive", then when the tick happens, it will >>>>>> be set to "active" and be debited 10ms again, setting it directly >>>>>> into >>>>>> OVER (and thus also losing boost). >>>>>> >>>>>> Can you see if the timer ticks are still happening, and perhaps put >>>>>> some tracing it to verify that what I described above is happening? >>>>>> >>>>>> -George >>>>> George, >>>>> >>>>> Is that in csched_acct()? Looks like that's somehow gotten removed. If >>>>> true, then may be that's the fundamental problem to chase. >>>>> >>>>> Here's what the trq looks like when hung, not in any schedule >>>>> function: >>>>> >>>>> [0]xkdb> dtrq >>>>> CPU[00]: NOW:0x00003f2db9af369e >>>>> 1: exp=0x00003ee31cb32200 fn:csched_tick data:0000000000000000 >>>>> 2: exp=0x00003ee347ece164 fn:time_calibration data:0000000000000000 >>>>> 3: exp=0x00003ee69a28f04b fn:mce_work_fn data:0000000000000000 >>>>> 4: exp=0x00003f055895e25f fn:plt_overflow data:0000000000000000 >>>>> 5: exp=0x00003ee353810216 fn:rtc_update_second data:ffff83007f0226d8 >>>>> >>>>> CPU[01]: NOW:0x00003f2db9af369e >>>>> 1: exp=0x00003ee30b847988 fn:s_timer_fn data:0000000000000000 >>>>> 2: exp=0x00003f1b309ebd45 fn:pmt_timer_callback data:ffff83007f022a68 >>>>> >>>>> >>>>> thanks >>>>> Mukesh >>>>> > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel