From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mukesh Rathor Subject: Re: dom0 hang Date: Thu, 02 Jul 2009 17:14:48 -0700 Message-ID: <4A4D4D78.1060609@oracle.com> References: <4A426D50.80401@oracle.com> <4A4C2743.5030703@oracle.com> <4A4D0710.10309@oracle.com> <4A4D2253.8070807@oracle.com> Reply-To: mukesh.rathor@oracle.com Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: George Dunlap Cc: "Tian, Kevin" , xen-devel@lists.xensource.com, Yu Ke List-Id: xen-devel@lists.xenproject.org ah, i totally missed csched_tick(): if ( !is_idle_vcpu(current) ) csched_vcpu_acct(cpu); yeah, looks like that's what is going on. i'm still waiting to reproduce. at first glance, looking at c/s 19460, seems like suspend/resume, well at least the resume, should happen in csched_schedule()..... thanks, Mukesh George Dunlap wrote: > [Oops, adding back in distro list, also adding Kevin Tian and Yu Ke > who wrote cs 19460] > > The functionality I was talking about, subtracting credits and > clearing BOOST, happens in csched_vcpu_acct() (which is different than > csched_acct()). vcpu_acct() is called from csched_tick(), which > should still happen every 10ms on every cpu. > > The patch I referred to (cs 19460) disables and re-enables tickers in > xen/arch/x86/acpi/cpu_idle.c:acpi_processor_idle() every time the > processor idles. I can't see anywhere else that tickers are disabled, > so it's probably something not properly re-enabling them again. > > Try applying the attached patch to see if that changes anything. (I'm > on the road, so I can't repro the lockup issue.) If that doesn't > work, try disabling c-states and see if that helps. Then at least > we'll know where the problem lies. > > -George > > On Thu, Jul 2, 2009 at 10:10 PM, Mukesh Rathor wrote: >> that seems to only suspend csched_pcpu.ticker which is csched_tick that is >> only sorting local runq. >> >> again, we are concerned about csched_priv.master_ticker that calls >> csched_acct? correct, so i can trace that? >> >> thanks, >> mukesh >> >> >> George Dunlap wrote: >>> Ah, I see that there's been some changes to tick stuff with the >>> c-state (e.g., cs 19460). It looks like they're supposed to be going >>> still, but perhaps the tick_suspend() and tick_resume() aren't being >>> called properly. Let me take a closer look. >>> >>> -George >>> >>> On Thu, Jul 2, 2009 at 8:14 PM, Mukesh Rathor >>> wrote: >>>> George Dunlap wrote: >>>>> On Thu, Jul 2, 2009 at 4:19 AM, Mukesh Rathor >>>>> wrote: >>>>>> dom0 hang: >>>>>> vcpu0 is trying to wakeup a task and in try_to_wake_up() calls >>>>>> task_rq_lock(). since the task has cpu set to 1, it gets runq lock >>>>>> for vcpu1. next it calls resched_task() which results in sending IPI >>>>>> to vcpu1. for that, vcpu0 gets into the HYPERVISOR_event_channel_op >>>>>> HCALL and is waiting to return. Meanwhile, vcpu1 got running, and is >>>>>> spinning on it's runq lock in "schedule():spin_lock_irq(&rq->lock);", >>>>>> that vcpu0 is holding (and is waiting to return from the HCALL). >>>>>> >>>>>> As I had noticed before, vcpu0 never gets scheduled in xen. So >>>>>> looking further into xen: >>>>>> >>>>>> xen: >>>>>> Both vcpu's are on the same runq, in this case cpu1. But the >>>>>> priority of vcpu1 has been set to CSCHED_PRI_TS_BOOST. As a result, >>>>>> the scheduler always picks vcpu1, and vcpu0 is starved. Also, I see in >>>>>> kdb that the scheduler timer is not set on cpu 0. That would've >>>>>> allowed csched_load_balance() to kick in on cpu0. [Also, on >>>>>> cpu1, the accounting timer, csched_tick, is not set. Altho, >>>>>> csched_tick() is running on cpu0, it only checks runq for cpu0.] >>>>>> >>>>>> Looks like c/s 19500 changed csched_schedule(): >>>>>> >>>>>> - ret.time = MILLISECS(CSCHED_MSECS_PER_TSLICE); >>>>>> + ret.time = (is_idle_vcpu(snext->vcpu) ? >>>>>> + -1 : MILLISECS(CSCHED_MSECS_PER_TSLICE)); >>>>>> >>>>>> The quickest fix for us would be to just back that out. >>>>>> >>>>>> >>>>>> BTW, just a comment on following (all in sched_credit.c): >>>>>> >>>>>> if ( svc->pri == CSCHED_PRI_TS_UNDER && >>>>>> !(svc->flags & CSCHED_FLAG_VCPU_PARKED) ) >>>>>> { >>>>>> svc->pri = CSCHED_PRI_TS_BOOST; >>>>>> } >>>>>> comibined with >>>>>> if ( snext->pri > CSCHED_PRI_TS_OVER ) >>>>>> __runq_remove(snext); >>>>>> >>>>>> Setting CSCHED_PRI_TS_BOOST as pri of vcpu seems dangerous. To me, >>>>>> since csched_schedule() never checks for time accumulated by a >>>>>> vcpu at pri CSCHED_PRI_TS_BOOST, that is same as pinning a vcpu to a >>>>>> pcpu. if that vcpu never makes progress, essentially, the system >>>>>> has lost a physical cpu. Optionally, csched_schedule() should >>>>>> always >>>>>> check for cpu time accumulated and reduce the priority over time. >>>>>> I can't tell right off if it already does that. or something like >>>>>> that :)... my 2 cents. >>>>> Hmm... what's supposed to happen is that eventually a timer tick will >>>>> interrupt vcpu1. If cpu1 is set to be "active", then it will be >>>>> debited 10ms worth of credit. Eventually, it will go into OVER, and >>>>> lose BOOST. If it's "inactive", then when the tick happens, it will >>>>> be set to "active" and be debited 10ms again, setting it directly into >>>>> OVER (and thus also losing boost). >>>>> >>>>> Can you see if the timer ticks are still happening, and perhaps put >>>>> some tracing it to verify that what I described above is happening? >>>>> >>>>> -George >>>> George, >>>> >>>> Is that in csched_acct()? Looks like that's somehow gotten removed. If >>>> true, then may be that's the fundamental problem to chase. >>>> >>>> Here's what the trq looks like when hung, not in any schedule function: >>>> >>>> [0]xkdb> dtrq >>>> CPU[00]: NOW:0x00003f2db9af369e >>>> 1: exp=0x00003ee31cb32200 fn:csched_tick data:0000000000000000 >>>> 2: exp=0x00003ee347ece164 fn:time_calibration data:0000000000000000 >>>> 3: exp=0x00003ee69a28f04b fn:mce_work_fn data:0000000000000000 >>>> 4: exp=0x00003f055895e25f fn:plt_overflow data:0000000000000000 >>>> 5: exp=0x00003ee353810216 fn:rtc_update_second data:ffff83007f0226d8 >>>> >>>> CPU[01]: NOW:0x00003f2db9af369e >>>> 1: exp=0x00003ee30b847988 fn:s_timer_fn data:0000000000000000 >>>> 2: exp=0x00003f1b309ebd45 fn:pmt_timer_callback data:ffff83007f022a68 >>>> >>>> >>>> thanks >>>> Mukesh >>>>