From: john stultz <johnstul@us.ibm.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Steven Rostedt <rostedt@goodmis.org>,
"Paul E. McKenney" <paulmck@us.ibm.com>,
lkml <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH -rt] Fix TASKLET_STATE_SCHED WARN_ON()
Date: Thu, 14 Jun 2007 14:20:20 -0700 [thread overview]
Message-ID: <1181856020.6276.14.camel@localhost.localdomain> (raw)
In-Reply-To: <1181096244.6018.20.camel@localhost>
On Tue, 2007-06-05 at 19:17 -0700, john stultz wrote:
> Hey Ingo,
> So we've been seeing the following trace fairly frequently on our SMP
> boxes when running kernbench:
>
> BUG: at kernel/softirq.c:639 __tasklet_action()
>
> Call Trace:
> [<ffffffff8106d5da>] dump_trace+0xaa/0x32a
> [<ffffffff8106d89b>] show_trace+0x41/0x5c
> [<ffffffff8106d8cb>] dump_stack+0x15/0x17
> [<ffffffff81094a97>] __tasklet_action+0xdf/0x12e
> [<ffffffff81094f76>] tasklet_action+0x27/0x29
> [<ffffffff8109530a>] ksoftirqd+0x16c/0x271
> [<ffffffff81033d4d>] kthread+0xf5/0x128
> [<ffffffff8105ff68>] child_rip+0xa/0x12
>
>
> Paul also pointed this out awhile back: http://lkml.org/lkml/2007/2/25/1
>
>
> Anyway, I think I finally found the issue. Its a bit hard to explain,
> but the idea is while __tasklet_action is running the tasklet function
> on CPU1, if a call to tasklet_schedule() on CPU2 is made, and if right
> after we mark the TASKLET_STATE_SCHED bit we are preempted,
> __tasklet_action on CPU1 might be able to re-run the function, clear the
> bit and unlock the tasklet before CPU2 enters __tasklet_common_schedule.
> Once __tasklet_common_schedule locks the tasklet, we will add the
> tasklet to the list with the TASKLET_STATE_SCHED *unset*.
>
> I've verified this race occurs w/ a WARN_ON in
> __tasklet_common_schedule().
>
>
> This fix avoids this race by making sure *after* we've locked the
> tasklet that the STATE_SCHED bit is set before adding it to the list.
>
> Does it look ok to you?
>
> thanks
> -john
>
> Signed-off-by: John Stultz <johnstul@us.ibm.com>
>
> Index: 2.6-rt/kernel/softirq.c
> ===================================================================
> --- 2.6-rt.orig/kernel/softirq.c 2007-06-05 18:30:54.000000000 -0700
> +++ 2.6-rt/kernel/softirq.c 2007-06-05 18:36:44.000000000 -0700
> @@ -544,10 +544,17 @@ static void inline
> __tasklet_common_schedule(struct tasklet_struct *t, struct tasklet_head *head, unsigned int nr)
> {
> if (tasklet_trylock(t)) {
> - WARN_ON(t->next != NULL);
> - t->next = head->list;
> - head->list = t;
> - raise_softirq_irqoff(nr);
> + /* We may have been preempted before tasklet_trylock
> + * and __tasklet_action may have already run.
> + * So double check the sched bit while the takslet
> + * is locked before adding it to the list.
> + */
> + if (test_bit(TASKLET_STATE_SCHED, &t->state)) {
> + WARN_ON(t->next != NULL);
> + t->next = head->list;
> + head->list = t;
> + raise_softirq_irqoff(nr);
> + }
> tasklet_unlock(t);
> }
> }
So while digging on a strange OOM issue we were seeing (which actually
ended up being fixed by Steven's softirq patch), I noticed that the fix
above is incomplete. With only the patch above, we may no longer have
unscheduled tasklets added to the list, but we may end up with scheduled
tasklets that are not on the list (and will stay that way!).
The following additional patch should correct this issue. Although since
we weren't actually hitting it, the issue is a bit theoretical, so I've
not been able to prove it really fixes anything.
thanks
-john
Index: 2.6-rt/kernel/softirq.c
===================================================================
--- 2.6-rt.orig/kernel/softirq.c 2007-06-12 20:30:11.000000000 -0700
+++ 2.6-rt/kernel/softirq.c 2007-06-12 20:37:22.000000000 -0700
@@ -544,6 +544,7 @@
__tasklet_common_schedule(struct tasklet_struct *t, struct tasklet_head *head, unsigned int nr)
{
if (tasklet_trylock(t)) {
+again:
/* We may have been preempted before tasklet_trylock
* and __tasklet_action may have already run.
* So double check the sched bit while the takslet
@@ -554,8 +555,21 @@
t->next = head->list;
head->list = t;
raise_softirq_irqoff(nr);
+ tasklet_unlock(t);
+ } else {
+ /* This is subtle. If we hit the corner case above
+ * It is possible that we get preempted right here,
+ * and another task has successfully called
+ * tasklet_schedule(), then this function, and
+ * failed on the trylock. Thus we must be sure
+ * before releasing the tasklet lock, that the
+ * SCHED_BIT is clear. Otherwise the tasklet
+ * may get its SCHED_BIT set, but not added to the
+ * list
+ */
+ if (!tasklet_tryunlock(t))
+ goto again;
}
- tasklet_unlock(t);
}
}
next prev parent reply other threads:[~2007-06-14 21:21 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-06-06 2:17 [PATCH -rt] Fix TASKLET_STATE_SCHED WARN_ON() john stultz
2007-06-06 9:45 ` Ingo Molnar
2007-06-06 17:39 ` john stultz
2007-06-06 10:31 ` Jesper Juhl
2007-06-14 21:20 ` john stultz [this message]
-- strict thread matches above, loose matches on Subject: below --
2007-06-15 15:52 Oleg Nesterov
2007-06-15 21:51 ` john stultz
2007-06-15 23:59 ` Oleg Nesterov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1181856020.6276.14.camel@localhost.localdomain \
--to=johnstul@us.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=paulmck@us.ibm.com \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox