From: Preeti U Murthy <preeti@linux.vnet.ibm.com>
To: lkp@lists.01.org
Subject: Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection
Date: Mon, 15 Dec 2014 15:13:38 +0530 [thread overview]
Message-ID: <548EAD4A.6070506@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAKohpo=C8Jv-+CtmDO+QJ-_=3dNwsk6_W0gtRRHmtSQ8_LgObQ@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 3184 bytes --]
On 12/15/2014 03:02 PM, Viresh Kumar wrote:
> On 15 December 2014 at 12:55, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
>> Hi Viresh,
>>
>> Let me explain why I think this is happening.
>>
>> 1. tick_nohz_irq_enter/exit() both get called *only if the cpu is idle*
>> and receives an interrupt.
>
> Bang on target. Yeah that's the part we missed while writing this patch :)
>
>> 2. Commit 2a16fc93d2c9568e1, cancels programming of tick_sched timer
>> in its handler, assuming that tick_nohz_irq_exit() will take care of
>> programming the clock event device appropriately, and hence it would
>> requeue or cancel the tick_sched timer.
>
> Correct.
>
>> 3. But the intel_powerclamp driver injects an idle period only.
>> *The CPU however is not idle*. It has work on its runqueue and the
>> rq->curr != idle. This means that *tick_nohz_irq_enter()/exit() will not
>> get called on any interrupt*.
>
> Still good..
>
>> 4. As a consequence, when we get a hrtimer interrupt during the period
>> that the powerclamp driver is mimicking idle, the exit path of the
>> interrupt never calls tick_nohz_irq_exit(). Hence the tick_sched timer
>> that would have got removed due to the above commit will not get
>> enqueued back on for any pending timers that there might be. Besides
>> this, *jiffies never gets updated*.
>
> Jiffies can be updated by any CPU and there is something called a control
> cpu with powerclamp driver. BUT we may have got interrupted before the
> powerclamp timer expired and so we are stuck in the
>
> while (time_before(jiffies, target_jiffies))
>
> loop for ever.
>
>> Hope the above explanation makes sense.
>
> Mostly good. Thanks for helping out.
>
> Now, what's the right solution going forward ?
>
> - Revert the offending commit ..
> - Or still try to avoid reprogramming if we can ..
>
> This is what I could come up with to still avoid reprogramming of tick:
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index cc0a5b6f741b..49f4278f69e2 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -1100,7 +1100,7 @@ static enum hrtimer_restart
> tick_sched_timer(struct hrtimer *timer)
> tick_sched_handle(ts, regs);
>
> /* No need to reprogram if we are in idle or full dynticks mode */
> - if (unlikely(ts->tick_stopped))
> + if (unlikely(ts->tick_stopped && (is_idle_task(current) ||
> !ts->inidle)))
> return HRTIMER_NORESTART;
>
> hrtimer_forward(timer, now, tick_period);
>
>
Looks good to me. You can add my Reviewed-by to the above patch.
>
> Above change checks why we have stopped tick..
> - The cpu has gone idle (really): is_idle_task(current)
> - The cpu isn't in idle mode, i.e. its in nohz-full mode: !ts->inidle
>
> This fixed the issues with powerclamp in my case.
>
> @Fengguang: Can you please check if this fixes it for you as well?
>
> @Thomas: Please let me know if you want me to send this fix
> or you want to revert the original commit itself.
Regards
Preeti U Murthy
>
> Thanks.
>
> --
> Viresh
>
WARNING: multiple messages have this Message-ID (diff)
From: Preeti U Murthy <preeti@linux.vnet.ibm.com>
To: Viresh Kumar <viresh.kumar@linaro.org>,
Thomas Gleixner <tglx@linutronix.de>,
Fengguang Wu <fengguang.wu@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>,
"Pan, Jacob jun" <jacob.jun.pan@intel.com>,
LKML <linux-kernel@vger.kernel.org>, LKP <lkp@01.org>
Subject: Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection
Date: Mon, 15 Dec 2014 15:13:38 +0530 [thread overview]
Message-ID: <548EAD4A.6070506@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAKohpo=C8Jv-+CtmDO+QJ-_=3dNwsk6_W0gtRRHmtSQ8_LgObQ@mail.gmail.com>
On 12/15/2014 03:02 PM, Viresh Kumar wrote:
> On 15 December 2014 at 12:55, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
>> Hi Viresh,
>>
>> Let me explain why I think this is happening.
>>
>> 1. tick_nohz_irq_enter/exit() both get called *only if the cpu is idle*
>> and receives an interrupt.
>
> Bang on target. Yeah that's the part we missed while writing this patch :)
>
>> 2. Commit 2a16fc93d2c9568e1, cancels programming of tick_sched timer
>> in its handler, assuming that tick_nohz_irq_exit() will take care of
>> programming the clock event device appropriately, and hence it would
>> requeue or cancel the tick_sched timer.
>
> Correct.
>
>> 3. But the intel_powerclamp driver injects an idle period only.
>> *The CPU however is not idle*. It has work on its runqueue and the
>> rq->curr != idle. This means that *tick_nohz_irq_enter()/exit() will not
>> get called on any interrupt*.
>
> Still good..
>
>> 4. As a consequence, when we get a hrtimer interrupt during the period
>> that the powerclamp driver is mimicking idle, the exit path of the
>> interrupt never calls tick_nohz_irq_exit(). Hence the tick_sched timer
>> that would have got removed due to the above commit will not get
>> enqueued back on for any pending timers that there might be. Besides
>> this, *jiffies never gets updated*.
>
> Jiffies can be updated by any CPU and there is something called a control
> cpu with powerclamp driver. BUT we may have got interrupted before the
> powerclamp timer expired and so we are stuck in the
>
> while (time_before(jiffies, target_jiffies))
>
> loop for ever.
>
>> Hope the above explanation makes sense.
>
> Mostly good. Thanks for helping out.
>
> Now, what's the right solution going forward ?
>
> - Revert the offending commit ..
> - Or still try to avoid reprogramming if we can ..
>
> This is what I could come up with to still avoid reprogramming of tick:
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index cc0a5b6f741b..49f4278f69e2 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -1100,7 +1100,7 @@ static enum hrtimer_restart
> tick_sched_timer(struct hrtimer *timer)
> tick_sched_handle(ts, regs);
>
> /* No need to reprogram if we are in idle or full dynticks mode */
> - if (unlikely(ts->tick_stopped))
> + if (unlikely(ts->tick_stopped && (is_idle_task(current) ||
> !ts->inidle)))
> return HRTIMER_NORESTART;
>
> hrtimer_forward(timer, now, tick_period);
>
>
Looks good to me. You can add my Reviewed-by to the above patch.
>
> Above change checks why we have stopped tick..
> - The cpu has gone idle (really): is_idle_task(current)
> - The cpu isn't in idle mode, i.e. its in nohz-full mode: !ts->inidle
>
> This fixed the issues with powerclamp in my case.
>
> @Fengguang: Can you please check if this fixes it for you as well?
>
> @Thomas: Please let me know if you want me to send this fix
> or you want to revert the original commit itself.
Regards
Preeti U Murthy
>
> Thanks.
>
> --
> Viresh
>
next prev parent reply other threads:[~2014-12-15 9:43 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-12-11 19:42 [nohz] 2a16fc93d2c: kernel lockup on idle injection Fengguang Wu
2014-12-11 19:42 ` Fengguang Wu
2014-12-12 11:57 ` Viresh Kumar
2014-12-12 11:57 ` Viresh Kumar
2014-12-15 7:25 ` Preeti U Murthy
2014-12-15 7:25 ` Preeti U Murthy
2014-12-15 9:32 ` Viresh Kumar
2014-12-15 9:32 ` Viresh Kumar
2014-12-15 9:43 ` Preeti U Murthy [this message]
2014-12-15 9:43 ` Preeti U Murthy
2014-12-15 21:24 ` Pan, Jacob jun
2014-12-15 21:24 ` Pan, Jacob jun
2014-12-16 4:18 ` Viresh Kumar
2014-12-16 4:18 ` Viresh Kumar
2014-12-16 17:15 ` Jacob Pan
2014-12-16 17:15 ` Jacob Pan
2014-12-16 21:15 ` Thomas Gleixner
2014-12-16 21:15 ` Thomas Gleixner
2014-12-15 23:44 ` Frederic Weisbecker
2014-12-15 23:44 ` Frederic Weisbecker
2014-12-16 4:53 ` Viresh Kumar
2014-12-16 4:53 ` Viresh Kumar
2014-12-16 9:36 ` Preeti U Murthy
2014-12-16 9:36 ` Preeti U Murthy
2014-12-16 12:49 ` Thomas Gleixner
2014-12-16 12:49 ` Thomas Gleixner
2014-12-16 14:20 ` Frederic Weisbecker
2014-12-16 14:20 ` Frederic Weisbecker
2014-12-16 14:50 ` Thomas Gleixner
2014-12-16 14:50 ` Thomas Gleixner
2014-12-16 21:21 ` Thomas Gleixner
2014-12-16 21:21 ` Thomas Gleixner
2014-12-16 22:49 ` Peter Zijlstra
2014-12-16 22:49 ` Peter Zijlstra
2014-12-16 22:54 ` Thomas Gleixner
2014-12-16 22:54 ` Thomas Gleixner
2014-12-17 0:26 ` Frederic Weisbecker
2014-12-17 0:26 ` Frederic Weisbecker
2014-12-17 0:12 ` Frederic Weisbecker
2014-12-17 0:12 ` Frederic Weisbecker
2014-12-17 9:11 ` Thomas Gleixner
2014-12-17 9:11 ` Thomas Gleixner
2014-12-17 12:47 ` Frederic Weisbecker
2014-12-17 12:47 ` Frederic Weisbecker
2014-12-16 14:32 ` Thomas Gleixner
2014-12-16 14:32 ` Thomas Gleixner
2014-12-16 14:56 ` Peter Zijlstra
2014-12-16 14:56 ` Peter Zijlstra
2014-12-16 16:54 ` Thomas Gleixner
2014-12-16 16:54 ` Thomas Gleixner
2014-12-17 12:31 ` Preeti Murthy
2014-12-17 12:31 ` Preeti Murthy
2014-12-17 15:42 ` Thomas Gleixner
2014-12-17 15:42 ` Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=548EAD4A.6070506@linux.vnet.ibm.com \
--to=preeti@linux.vnet.ibm.com \
--cc=lkp@lists.01.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.