From: john cooper <john.cooper@timesys.com>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: john cooper <john.cooper@timesys.com>,
Oleg Nesterov <oleg@tv-sign.ru>,
linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
Olaf Kirch <okir@suse.de>
Subject: Re: RT and Cascade interrupts
Date: Tue, 31 May 2005 19:09:44 -0400
Message-ID: <429CEEB8.1010404@timesys.com>
In-Reply-To: <429B8678.1000706@timesys.com>
john cooper wrote:
> Trond Myklebust wrote:
>
>> I've appended a patch that
>> should check for strict compliance of the above rules. Could you try it
>> out and see if it triggers any Oopses?
>
>
> Yes, the assert in rpc_delete_timer() occurs just before
> the cascade list corruption. This is consistent with
> what I have seen. ie: the timer in a released rpc_task
> is still active.

I've captured more data with the instrumentation and found
that the rpc_task's timer is being requeued by an application
task which preempts ksoftirqd when it wakes up in
xprt_transmit(). This is what I had originally suspected
but likely didn't communicate effectively.
The scenario unfolds as:

[high priority app task]
    :
call_transmit()
    xprt_transmit()
        /* blocks in xprt_transmit() */

                ksoftirqd()
                    __run_timers()
                        list_del("rpc_task_X.timer") /* logically off cascade */
                        rpc_run_timer(data)
                            task->tk_timeout_fn(task)
                            /* ksoftirqd preempted */
    :
---------------------------------------------------------
    /* Don't race with disconnect */
    if (!xprt_connected(xprt))
        task->tk_status = -ENOTCONN;
    else if (!req->rq_received)
        rpc_sleep_on(&xprt->pending, task, NULL, xprt_timer);
---------------------------------------------------------
        __rpc_sleep_on()
            __mod_timer("rpc_task_X.timer") /* requeued in cascade */
        /* blocks */

                            /* rpc_run_timer resumes from preempt */
                            clear_bit(RPC_TASK_HAS_TIMER, "rpc_task_X.tk_runstate");

                            /* rpc_task_X.timer is now enqueued in cascade without
                               RPC_TASK_HAS_TIMER set and will not be dequeued
                               in rpc_release_task()/rpc_delete_timer() */

The name "rpc_task_X.timer" denotes the same kernel virtual
address (KVA), observed for the timer struct at each of the
corresponding points in the instrumented code.
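
For anyone who wants to step through the ordering, here is a minimal
user-space sketch that replays the interleaving deterministically. It is
only a model: struct model_task and every function name below are
hypothetical stand-ins, not kernel code, and it assumes that re-arming
the timer in __rpc_sleep_on() also sets RPC_TASK_HAS_TIMER and that
rpc_delete_timer() dequeues only when that bit is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant rpc_task state (not the kernel struct). */
struct model_task {
    bool timer_queued;   /* timer is linked into the cascade */
    bool has_timer_bit;  /* models RPC_TASK_HAS_TIMER in tk_runstate */
};

/* ksoftirqd: __run_timers() unlinks the timer before calling rpc_run_timer(). */
static void softirq_dequeue(struct model_task *t)
{
    t->timer_queued = false;
}

/* App task: rpc_sleep_on() re-arms the timer (assumed to set the bit as well). */
static void app_sleep_on(struct model_task *t)
{
    t->has_timer_bit = true;
    t->timer_queued = true;
}

/* ksoftirqd: rpc_run_timer() clears the bit after the timeout callback returns. */
static void softirq_clear_bit(struct model_task *t)
{
    t->has_timer_bit = false;
}

/* Release path: dequeue only if the bit says a timer is armed (assumption). */
static void model_delete_timer(struct model_task *t)
{
    if (t->has_timer_bit) {
        t->has_timer_bit = false;
        t->timer_queued = false;
    }
}

/* Returns true if the timer is still queued after the task is released. */
bool replay_race(bool softirq_preempted)
{
    struct model_task t = { .timer_queued = true, .has_timer_bit = true };

    softirq_dequeue(&t);
    assert(!t.timer_queued);      /* off the cascade while the callback runs */

    if (softirq_preempted) {
        app_sleep_on(&t);         /* app task wakes up and requeues the timer */
        softirq_clear_bit(&t);    /* rpc_run_timer() resumes and clears the bit */
    } else {
        softirq_clear_bit(&t);    /* rpc_run_timer() completes first */
        app_sleep_on(&t);
    }

    model_delete_timer(&t);       /* rpc_release_task() path */
    return t.timer_queued;
}
```

In this model replay_race(true) leaves the timer queued after release,
which matches the cascade corruption above, while replay_race(false)
does not.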
The above was gathered by logging usage of the
kernel/timer.c primitives. Thus I don't have more
detailed state of the rpc_task in RPC context.
However I did verify which of the three calls to
rpc_sleep_on() in xprt_transmit() was being invoked
(as above).

So the root cause appears to be the rpc_task's timer
being requeued in xprt_transmit() while rpc_run_timer()
is preempted. From looking at the code I'm unsure
whether modifying xprt_transmit()/out_receive is the
appropriate way to synchronize with rpc_release_task().
It seems more natural to allow rpc_sleep_on() to occur
and have rpc_release_task() detect the pending timer and
remove it before proceeding. I'm still trying to digest
the logic here, but I thought there was enough information
to be of use. Suggestions and warnings welcome.
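
To make the second option concrete, here is a rough user-space sketch
contrasting a release path gated on the flag with one that checks the
timer's queue state directly. Both functions and struct model_task are
hypothetical illustrations, not the sunrpc code, and the model ignores
concurrency entirely; a real change would still have to synchronize
with a timer callback that may be running on another CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant rpc_task state (not the kernel struct). */
struct model_task {
    bool timer_queued;   /* timer is linked into the cascade */
    bool has_timer_bit;  /* models RPC_TASK_HAS_TIMER in tk_runstate */
};

/*
 * Flag-gated release (assumed current behaviour): trusts the bit alone.
 * Returns true if a timer is still queued once the task is released.
 */
bool release_flag_gated(struct model_task *t)
{
    if (t->has_timer_bit) {
        t->has_timer_bit = false;
        t->timer_queued = false;
    }
    return t->timer_queued;
}

/*
 * Pending-aware release (the alternative floated above): look at the
 * timer's own queue state, so a requeue that lost its flag is still caught.
 * Returns true if a timer is still queued once the task is released.
 */
bool release_pending_aware(struct model_task *t)
{
    t->has_timer_bit = false;
    if (t->timer_queued)
        t->timer_queued = false;   /* dequeue regardless of the flag */
    assert(!t->timer_queued);      /* nothing may remain on the cascade */
    return t->timer_queued;
}
```

Starting from the state the race leaves behind (timer queued, bit
clear), the flag-gated version reports a timer still queued and the
pending-aware version does not.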
-john
--
john.cooper@timesys.com