From: john cooper <john.cooper@timesys.com>
To: Oleg Nesterov <oleg@tv-sign.ru>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
john cooper <john.cooper@timesys.com>
Subject: Re: RT and Cascade interrupts
Date: Fri, 27 May 2005 19:37:29 -0400 [thread overview]
Message-ID: <4297AF39.4070304@timesys.com> (raw)
In-Reply-To: <42974F08.1C89CF2A@tv-sign.ru>
[-- Attachment #1: Type: text/plain, Size: 3984 bytes --]
Oleg Nesterov wrote:
> john cooper wrote:
>
>> rpc_delete_timer(struct rpc_task *task)
>> {
>>- if (test_and_clear_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate)) {
>>+ if (task->tk_timer.base) {
>> del_singleshot_timer_sync(&task->tk_timer);
>> dprintk("RPC: %4d deleting timer\n", task->tk_pid);
>> }
>
>
> I know nothing about rpc, but this looks completely wrong to me.
>
> Please don't use the ->base directly, use timer_pending().
A nit here but fair enough.
> This field is never NULL in -mm tree.
Which is ok if timer_pending() still provides the needed
information. Otherwise another means will be needed to
determine whether the struct timer is logically "off" the
base to which it was previous queued. I added such state
information to the timer structure and associated code in
the process of hunting this problem, but removed it in the
patch as for the tree to which this applies the state of
base already provided the information.
> Next, timer_pending() == 0 does not mean that del_timer_sync() is
> not needed, it may be running on the other CPU.
Whether timer_pending() is SMP safe depends on the caller's
context relative to the overall usage of the timer structure.
For example we aren't holding base->lock in rpc_delete_timer()
as would normally be expected. The reason this is safe is
the transition to "timer off base" and "timer.base <- NULL"
follows this sequence in __run_timers(). So the worst we
can do is to be racing with another cpu trying to expire this
timer and we will delete an already inactive timer which is
innocuous here.
The state transition of "timer on base" and
"timer.base <- base" can only happen after we free
the struct rpc_task in which the struct timer is embedded.
We haven't done so yet so no race here exists.
> Looking at the code for the first time, I can guess that RPC_TASK_HAS_TIMER
> was invented to indicate when it is safe not to wait for timer
> completition. This bit is cleared after ->tk_timeout_fn call in
> the timer's handler. After that rpc_run_timer will not touch *task.
The problem being it is possible in the now preemptible
context in which rpc_run_timer() executes for the call
of callback(task) to be preempted allowing another task
restart the timer (via a call to __rpc_add_timer()).
So the previous implied assumption in a nonpreemptive
rpc_run_timer() of the timer known to be inactive (as
that's how we arrived here) is no longer reliable.
The failure is for the timer to have been requeued during
the preemption but RPC_TASK_HAS_TIMER to afterwards
be cleared in rpc_run_timer(). This results in the call to
rpc_delete_timer() from rpc_release_task() never occurring
and the rpc_task struct then released for reuse with the
embedded timer struct still linked in the timer cascade list.
The next reuse of this rpc_task struct requeues the
embedded timer struct again in the cascade list causing
the corruption.
The fix removes the attempt to replicate timer queue
status from the RPC code by removing the usage of the
pre-calculated RPC_TASK_HAS_TIMER. This information is
only for the benefit of rpc_delete_timer() which is
called in the case of rpc_task tear down. So the test
is deferred until just before freeing the rpc_task for
reuse.
The potential race within rpc_release_task() of the timer
being requeued between the call to rpc_delete_timer()
and the call to rpc_free() is closed by setting
tk_timeout = 0 in the rpc_task before the rpc_delete_timer()
call. It may be unnecessary but I couldn't conclude a
race did not exist so it errs on the side of caution.
Updated patch attached relative to 40-04.
This is a rather complex failure scenario which unwinds
over time and requires considerable instrumentation to
isolate. I think the addition of configurable debug
state information to the timer structure along with
assertion checking in the primitives which manipulate
them would be quite useful for catching similar problems.
-john
--
john.cooper@timesys.com
[-- Attachment #2: RPC.patch2 --]
[-- Type: text/plain, Size: 1734 bytes --]
--- ./include/linux/sunrpc/sched.h.ORG 2005-05-24 10:29:24.000000000 -0400
+++ ./include/linux/sunrpc/sched.h 2005-05-24 10:47:56.000000000 -0400
@@ -142,7 +142,6 @@ typedef void (*rpc_action)(struct rpc_
#define RPC_TASK_RUNNING 0
#define RPC_TASK_QUEUED 1
#define RPC_TASK_WAKEUP 2
-#define RPC_TASK_HAS_TIMER 3
#define RPC_IS_RUNNING(t) (test_bit(RPC_TASK_RUNNING, &(t)->tk_runstate))
#define rpc_set_running(t) (set_bit(RPC_TASK_RUNNING, &(t)->tk_runstate))
=================================================================
--- ./net/sunrpc/sched.c.ORG 2005-05-24 10:29:52.000000000 -0400
+++ ./net/sunrpc/sched.c 2005-05-27 18:27:41.000000000 -0400
@@ -103,9 +103,6 @@ static void rpc_run_timer(struct rpc_tas
dprintk("RPC: %4d running timer\n", task->tk_pid);
callback(task);
}
- smp_mb__before_clear_bit();
- clear_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate);
- smp_mb__after_clear_bit();
}
/*
@@ -124,7 +121,6 @@ __rpc_add_timer(struct rpc_task *task, r
task->tk_timeout_fn = timer;
else
task->tk_timeout_fn = __rpc_default_timer;
- set_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate);
mod_timer(&task->tk_timer, jiffies + task->tk_timeout);
}
@@ -135,7 +131,7 @@ __rpc_add_timer(struct rpc_task *task, r
static inline void
rpc_delete_timer(struct rpc_task *task)
{
- if (test_and_clear_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate)) {
+ if (timer_pending(&task->tk_timer)) {
del_singleshot_timer_sync(&task->tk_timer);
dprintk("RPC: %4d deleting timer\n", task->tk_pid);
}
@@ -849,6 +845,7 @@ void rpc_release_task(struct rpc_task *t
task->tk_active = 0;
/* Synchronously delete any running timer */
+ task->tk_timeout = 0;
rpc_delete_timer(task);
/* Release resources */
next prev parent reply other threads:[~2005-05-27 23:39 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-05-27 16:47 RT and Cascade interrupts Oleg Nesterov
2005-05-27 23:37 ` john cooper [this message]
2005-05-28 8:52 ` Oleg Nesterov
2005-05-28 14:02 ` john cooper
2005-05-28 16:34 ` Oleg Nesterov
2005-05-28 17:48 ` john cooper
2005-05-28 20:35 ` Trond Myklebust
2005-05-29 3:12 ` john cooper
2005-05-29 7:40 ` Trond Myklebust
2005-05-30 21:32 ` john cooper
2005-05-31 23:09 ` john cooper
2005-06-01 14:22 ` Oleg Nesterov
2005-06-01 18:05 ` john cooper
2005-06-01 18:31 ` Trond Myklebust
2005-06-01 19:20 ` john cooper
2005-06-01 19:46 ` Trond Myklebust
2005-06-01 20:21 ` Trond Myklebust
2005-06-01 20:59 ` john cooper
2005-06-01 22:51 ` Trond Myklebust
2005-06-01 23:09 ` Trond Myklebust
2005-06-02 3:31 ` john cooper
2005-06-02 4:26 ` Trond Myklebust
2005-06-09 23:17 ` George Anzinger
2005-06-09 23:52 ` john cooper
2005-05-29 11:31 ` Oleg Nesterov
2005-05-29 13:58 ` Trond Myklebust
2005-05-30 14:50 ` Ingo Molnar
2005-05-28 22:17 ` Trond Myklebust
-- strict thread matches above, loose matches on Subject: below --
2005-05-12 14:43 Daniel Walker
2005-05-13 7:44 ` Ingo Molnar
2005-05-13 13:12 ` john cooper
2005-05-24 16:32 ` john cooper
2005-05-27 7:25 ` Ingo Molnar
2005-05-27 13:53 ` john cooper
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4297AF39.4070304@timesys.com \
--to=john.cooper@timesys.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=oleg@tv-sign.ru \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox