From: john cooper <john.cooper@timesys.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: john cooper <john.cooper@timesys.com>,
Daniel Walker <dwalker@mvista.com>,
linux-kernel@vger.kernel.org
Subject: Re: RT and Cascade interrupts
Date: Tue, 24 May 2005 12:32:21 -0400
Message-ID: <42935715.2000505@timesys.com>
In-Reply-To: <4284A7B6.4090408@timesys.com>
[-- Attachment #1: Type: text/plain, Size: 1550 bytes --]
john cooper wrote:
> I'm seeing the BUG assert in kernel/timer.c:cascade()
> kick in (tmp->base is somehow 0) during a test which
> creates a few tasks of priority higher than ksoftirqd.
> This race doesn't happen if ksoftirqd's priority is
> elevated (eg: chrt -f -p 75 2) so the -RT patch might
> be opening up a window here.
There is a window in rpc_run_timer() which allows
it to lose track of timer ownership when ksoftirqd
(and thus rpc_run_timer() itself) is preempted. This
doesn't immediately cause a problem, but it does
corrupt the timer cascade list when the timer struct
is recycled/requeued. The damage shows up some time
later, as the list is processed. The failure mode is
cascade() attempting to percolate a timer with
poisoned next/prev pointers and a NULL base, causing
the assertion BUG(tmp->base != base) to kick in.
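A minimal user-space model of one interleaving consistent with the description above may make the window concrete. It assumes, as in kernels of this era, that the timer core clears timer->base before invoking the handler. The model_* types and functions are hypothetical stand-ins, not kernel APIs, and the "preemption" is replayed sequentially rather than actually raced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are not kernel types. */
struct model_timer {
	void *base;		/* non-NULL while queued, like timer->base here */
};

struct model_task {
	bool has_timer;		/* models RPC_TASK_HAS_TIMER in tk_runstate */
	struct model_timer timer;
};

static int model_cascade;	/* stands in for the per-CPU timer base */

/* __rpc_add_timer(): set the flag, then queue the timer via mod_timer(). */
static void model_add_timer(struct model_task *t)
{
	t->has_timer = true;
	t->timer.base = &model_cascade;
}

/* Assumed timer-core behaviour: dequeue before invoking rpc_run_timer(). */
static void model_expire(struct model_task *t)
{
	t->timer.base = NULL;
}

/* Tail of pre-patch rpc_run_timer(): clear_bit(RPC_TASK_HAS_TIMER, ...). */
static void model_run_timer_tail(struct model_task *t)
{
	t->has_timer = false;
}

/* Pre-patch rpc_delete_timer() decision: test_and_clear_bit(). */
static bool model_old_teardown_deletes(struct model_task *t)
{
	bool had = t->has_timer;

	t->has_timer = false;
	return had;
}

/* Replay one interleaving in which ksoftirqd is preempted mid-handler. */
static void model_replay(struct model_task *t)
{
	model_add_timer(t);		/* task arms its timeout */
	model_expire(t);		/* timer fires; callback wakes the task */
	model_add_timer(t);		/* ksoftirqd preempted; task re-arms */
	model_run_timer_tail(t);	/* ksoftirqd resumes; flag is now stale */
}
```

After model_replay() the timer is queued again but the flag reads clear, so the pre-patch teardown would skip del_singleshot_timer_sync() and leave a queued timer embedded in an rpc_task that is about to be freed or recycled.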
The RPC code attempts to replicate the state of
timer ownership for a given rpc_task via the
RPC_TASK_HAS_TIMER bit in rpc_task.tk_runstate.
Besides not working correctly in a preemptible
context, this duplicates state the timer code
already maintains: whether the timer is pending in
the cascade structure (ie: timer->base is non-NULL).
The fix changes the RPC code to test timer->base
when deciding whether an outstanding timer
registration exists during rpc_task teardown.
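The patched decision can be sketched the same way. Again the sketch_* names are hypothetical and the model assumes timer->base is non-NULL exactly while the timer is queued; the point is only that the teardown test reads the timer's own state, so there is no separate bit for a preempted rpc_run_timer() to clear after the timer has been re-armed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: only the timer's own base pointer is tracked. */
struct sketch_timer {
	void *base;		/* non-NULL while queued in the cascade */
};

struct sketch_task {
	struct sketch_timer timer;
	int deletions;		/* counts modeled del_singleshot_timer_sync() calls */
};

static int sketch_cascade;	/* stands in for the per-CPU timer base */

/* mod_timer(): queue the timer. */
static void sketch_add_timer(struct sketch_task *t)
{
	t->timer.base = &sketch_cascade;
}

/* Assumed timer-core behaviour: dequeue before running the handler. */
static void sketch_expire(struct sketch_task *t)
{
	t->timer.base = NULL;
}

/* Patched rpc_delete_timer(): ask the timer, not a shadow flag. */
static void sketch_delete_timer(struct sketch_task *t)
{
	if (t->timer.base) {
		t->timer.base = NULL;	/* models del_singleshot_timer_sync() */
		t->deletions++;
	}
}
```

The sketch is single-threaded; it does not show that reading timer->base outside the timer base lock is itself race-free, which is an assumption rather than something modeled here.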
Note: this failure occurred in the 40-04 version of
the patch, though the problem applies to more current
versions. It was seen when executing stress tests on a
number of PPC targets running with an NFS-mounted root,
but was not observed on an x86 target under similar
conditions.
-john
--
john.cooper@timesys.com
[-- Attachment #2: RPC.patch --]
[-- Type: text/plain, Size: 1630 bytes --]
./include/linux/sunrpc/sched.h
./net/sunrpc/sched.c
=================================================================
--- ./include/linux/sunrpc/sched.h.ORG 2005-05-24 10:29:24.000000000 -0400
+++ ./include/linux/sunrpc/sched.h 2005-05-24 10:47:56.000000000 -0400
@@ -142,7 +142,6 @@ typedef void (*rpc_action)(struct rpc_
#define RPC_TASK_RUNNING 0
#define RPC_TASK_QUEUED 1
#define RPC_TASK_WAKEUP 2
-#define RPC_TASK_HAS_TIMER 3
#define RPC_IS_RUNNING(t) (test_bit(RPC_TASK_RUNNING, &(t)->tk_runstate))
#define rpc_set_running(t) (set_bit(RPC_TASK_RUNNING, &(t)->tk_runstate))
=================================================================
--- ./net/sunrpc/sched.c.ORG 2005-05-24 10:29:52.000000000 -0400
+++ ./net/sunrpc/sched.c 2005-05-24 11:02:44.000000000 -0400
@@ -103,9 +103,6 @@ static void rpc_run_timer(struct rpc_tas
dprintk("RPC: %4d running timer\n", task->tk_pid);
callback(task);
}
- smp_mb__before_clear_bit();
- clear_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate);
- smp_mb__after_clear_bit();
}
/*
@@ -124,7 +121,6 @@ __rpc_add_timer(struct rpc_task *task, r
task->tk_timeout_fn = timer;
else
task->tk_timeout_fn = __rpc_default_timer;
- set_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate);
mod_timer(&task->tk_timer, jiffies + task->tk_timeout);
}
@@ -135,7 +131,7 @@ __rpc_add_timer(struct rpc_task *task, r
static inline void
rpc_delete_timer(struct rpc_task *task)
{
- if (test_and_clear_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate)) {
+ if (task->tk_timer.base) {
del_singleshot_timer_sync(&task->tk_timer);
dprintk("RPC: %4d deleting timer\n", task->tk_pid);
}