From: Pekka Riikonen <priikone@iki.fi>
To: Tejun Heo <tj@kernel.org>
Cc: greearb@candelatech.com, linux-kernel@vger.kernel.org,
eric.dumazet@gmail.com, stable@vger.kernel.org,
torvalds@linux-foundation.org
Subject: Re: [PATCH v3] Fix lockup related to stop_machine being stuck in __do_softirq.
Date: Fri, 7 Jun 2013 07:23:21 +0200 (CEST)
Message-ID: <alpine.GSO.2.00.1306070717400.20297@git.silcnet.org>
In-Reply-To: <20130606214014.GK5045@htj.dyndns.org>
On Thu, 6 Jun 2013, Tejun Heo wrote:
> On Thu, Jun 06, 2013 at 02:29:49PM -0700, greearb@candelatech.com wrote:
>> From: Ben Greear <greearb@candelatech.com>
>>
>> The stop machine logic can lock up if all but one of
>> the migration threads make it through the disable-irq
>> step and the one remaining thread gets stuck in
>> __do_softirq. The reason __do_softirq can hang is
>> that it has a bail-out based on jiffies timeout, but
>> in the lockup case, jiffies itself is not incremented.
>>
>> To work around this, re-add the max_restart counter in __do_softirq
>> and stop processing irqs after 10 restarts.
>>
>> Thanks to Tejun Heo and Rusty Russell and others for
>> helping me track this down.
>>
>> This was introduced in 3.9 by commit: c10d73671ad30f5469
>> (softirq: reduce latencies).
>>
>> It may be worth looking into ath9k to see if it has issues with
>> its irq handler at a later date.
>>
>> The hang stack traces look something like this:
> ...
>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>
> Acked-by: Tejun Heo <tj@kernel.org>
>
> Linus, while this doesn't fix the root cause of the problem - softirq
> runaway - I still think this is a worthwhile protection to have. Ben
> is in the process of finding out why the softirq runaway happens in
> the first place. We probably want to add Cc: stable@vger.kernel.org
> tag.
>
The counter also shortens the time the interrupted task stays
interrupted. 10 iterations may take far less than the 2 ms limit, or
10 ms with HZ=100, so it helps interactivity as well. This is a good
change to bring back in any case.
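
To make the point concrete, here is a rough userspace sketch (not the
actual patch) of a restart loop bounded by both a time limit and a
counter. All names except max_restart are hypothetical stand-ins:
fake_jiffies models a jiffies value that never advances, as in the
lockup case, and pending_softirqs() always reports work to model the
runaway. The real kernel code would use time_before() for the
comparison, which is wraparound-safe.

```c
#include <stdio.h>

#define MAX_SOFTIRQ_RESTART 10  /* restart limit from the patch description */

/* Hypothetical stand-in for jiffies; never advanced, to model the lockup. */
static unsigned long fake_jiffies = 0;

/* Hypothetical: always reports pending work, to model a softirq runaway. */
static int pending_softirqs(void)
{
    return 1;
}

/* Round milliseconds up to whole ticks for a given HZ (assumed helper). */
static unsigned long ms_to_jiffies_roundup(unsigned long ms, unsigned long hz)
{
    return (ms * hz + 999) / 1000;
}

/* Returns the number of passes made before the loop bails out. */
static int run_softirqs(unsigned long time_limit_jiffies)
{
    unsigned long end = fake_jiffies + time_limit_jiffies;
    int max_restart = MAX_SOFTIRQ_RESTART;
    int passes = 0;

restart:
    passes++;
    /* Pending handlers would run here. */

    if (pending_softirqs()) {
        /*
         * The time check alone never fails while fake_jiffies is stuck;
         * the counter is what guarantees termination.
         */
        if (fake_jiffies < end && --max_restart)
            goto restart;
        /* The kernel would defer the remaining work to ksoftirqd here. */
    }
    return passes;
}
```

With the tick stuck, run_softirqs() still stops after 10 passes. The
helper also shows where the 10 ms comes from: a 2 ms limit rounds up to
one tick, and with HZ=100 one tick is 10 ms, while with HZ=1000 it is
two ticks, i.e. 2 ms.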
Pekka