From: Ben Greear <greearb@candelatech.com>
To: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>,
Joe Lawrence <joe.lawrence@stratus.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
stable@vger.kernel.org
Subject: Re: stop_machine lockup issue in 3.9.y.
Date: Wed, 05 Jun 2013 13:58:31 -0700
Message-ID: <51AFA677.9010605@candelatech.com>
In-Reply-To: <51AF91F5.6090801@candelatech.com>

On 06/05/2013 12:31 PM, Ben Greear wrote:
> This is no longer really about the module unlink, so changing
> subject.
>
> On 06/05/2013 12:11 PM, Ben Greear wrote:
>> On 06/05/2013 11:48 AM, Tejun Heo wrote:
>>> Hello, Ben.
>>>
>>> On Wed, Jun 05, 2013 at 09:59:00AM -0700, Ben Greear wrote:
>>>> One pattern I notice repeating for at least most of the hangs is that all but one
>>>> CPU thread has irqs disabled and is in state 2. But, there will be one thread
>>>> in state 1 that still has IRQs enabled and it is reported to be in soft-lockup
>>>> instead of hard-lockup. In 'sysrq l' it always shows some IRQ processing,
>>>> but typically that of the sysrq itself. I added printk that would always
>>>> print if the thread notices that smdata->state != curstate, and the soft-lockup
>>>> thread (cpu 2 below) never shows that message.
>>>
>>> It sounds like one of the cpus get live-locked by IRQs. I can't tell
>>> why the situation is made worse by other CPUs being tied up. Do you
>>> ever see CPUs being live locked by IRQs during normal operation?

Hmm, I wonder if I found it. I previously saw times where jiffies
appeared not to increment. __do_softirq has a break-out based on a
jiffies timeout. Maybe that break-out is failing to get us out of
__do_softirq in my lockup case because, for whatever reason, the
system cannot update jiffies?

I added this (probably whitespace-damaged) hack, and I have not been
able to reproduce the problem since.

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 14d7758..621ea3b 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -212,6 +212,7 @@ asmlinkage void __do_softirq(void)
 	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
 	int cpu;
 	unsigned long old_flags = current->flags;
+	unsigned long loops = 0;
 
 	/*
 	 * Mask out PF_MEMALLOC s current task context is borrowed for the
@@ -241,6 +242,7 @@ restart:
 			unsigned int vec_nr = h - softirq_vec;
 			int prev_count = preempt_count();
 
+			loops++;
 			kstat_incr_softirqs_this_cpu(vec_nr);
 
 			trace_softirq_entry(vec_nr);
@@ -265,7 +267,7 @@ restart:
 
 	pending = local_softirq_pending();
 	if (pending) {
-		if (time_before(jiffies, end) && !need_resched())
+		if (time_before(jiffies, end) && !need_resched() && (loops < 500))
 			goto restart;
 
 		wakeup_softirqd();

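To illustrate the theory, here is a rough userspace sketch of the
restart logic. It is not kernel code. The names sim_do_softirq,
struct sim_result, SIM_MAX_TIME and the safety_cap parameter are made
up for this example, and it simplifies things by running exactly one
handler per pass and assuming work is always pending again afterwards.
It only shows that a time-based exit cannot fire while the clock is
frozen, whereas a loop-count exit still can.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel values discussed above. */
#define SIM_MAX_TIME  2UL   /* ticks allowed per call, in the role of MAX_SOFTIRQ_TIME */
#define SIM_MAX_LOOPS 500UL /* handler-run cap, same number as in the hack */

struct sim_result {
	unsigned long handlers_run; /* handlers executed before the loop ended */
	bool hit_safety_cap;        /* true if only the simulation's hard stop ended it */
};

/*
 * clock_advances: whether the simulated jiffies counter moves at all.
 * use_loop_bound: whether the extra "loops < SIM_MAX_LOOPS" test is applied.
 * safety_cap:     hard stop so a "stuck" run still returns to the caller.
 */
struct sim_result sim_do_softirq(bool clock_advances, bool use_loop_bound,
				 unsigned long safety_cap)
{
	unsigned long sim_jiffies = 0;
	unsigned long end = sim_jiffies + SIM_MAX_TIME;
	unsigned long loops = 0;
	struct sim_result res = { 0, false };

	for (;;) {
		/* One pass: run one handler; the work is assumed to be re-raised. */
		loops++;
		res.handlers_run++;
		if (clock_advances)
			sim_jiffies++;

		if (res.handlers_run >= safety_cap) {
			res.hit_safety_cap = true;
			break;
		}

		/* Restart only while every enabled bound still holds. */
		bool in_time = (long)(sim_jiffies - end) < 0; /* time_before() idiom */
		bool in_loops = !use_loop_bound || loops < SIM_MAX_LOOPS;

		if (in_time && in_loops)
			continue;
		break; /* the kernel would hand off to ksoftirqd here */
	}
	return res;
}
```

With an advancing clock the time bound ends the loop after a couple of
passes. With a frozen clock and no loop bound, only the safety cap
stops it. With a frozen clock and the loop bound, it stops after 500
handler runs.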
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com