From: George Anzinger <george@mvista.com>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: john cooper <john.cooper@timesys.com>,
Oleg Nesterov <oleg@tv-sign.ru>,
linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
Olaf Kirch <okir@suse.de>
Subject: Re: RT and Cascade interrupts
Date: Thu, 09 Jun 2005 16:17:45 -0700
Message-ID: <42A8CE19.1000807@mvista.com>
In-Reply-To: <1117686367.10822.104.camel@lade.trondhjem.org>
Excuse me for interrupting this thread, but have you seen:
http://marc.theaimsgroup.com/?l=linux-kernel&m=111717961227508&w=2
I think this will fix your problem.
George
--
Trond Myklebust wrote:
> On Wed, 01.06.2005 at 23:31 (-0400), john cooper wrote:
>
>>I fully share your frustration of wanting to "use the
>>latest patch -- dammit". However there are other practical
>>constraints coming into play. This tree has accumulated a
>>substantial amount of fixes for scheduler violation assertions
>>along with associated testing and has fared well thus far.
>>The bug under discussion here is the last major operational
>>problem found in the associated testing process. Arriving
>>at this point also required development of target specific
>>driver/board code so a resync to a later version is not a
>>trivial operation. However it would be justifiable in the
>>case of encountering an impasse with the current tree.
>
>
> My point is that you are considering timer bugs due to synchronization
> problems in code which is obviously not designed to accommodate
> synchronization. Once that fact is established, one moves on and
> considers the code which does support synchronization.
>
>
>>>Could you then apply the following debugging patch? It should warn you
>>>in case something happens to corrupt base->running_timer (something
>>>which would screw up del_timer_sync()). I'm not sure that can happen,
>>>but it might be worth checking.
>>
>>Yes, thanks. Though the event trace does not suggest a
>>reentrance in __run_timer() but rather a preemption of it
>>during the call to rpc_run_timer() by a high priority
>>application task in the midst of an RPC. The preempting
>>task requeues the timer in the cascade at the tail of
>>xprt_transmit(). rpc_run_timer() upon resuming execution
>>unconditionally clears the RPC_TASK_HAS_TIMER flag. This
>>creates the inconsistent state.
>
>
> There are NO cases where that is supposed to be allowed to occur. This
> case is precisely what del_timer_sync() is supposed to treat.
>
>
>>No explicit deletion attempt of the timer (synchronous or
>>otherwise) is coming into play in the failure scenario as
>>witnessed by the event trace. Rather it is the implicit
>>dequeue of the timer from the cascade in __run_timer() and
>>attempt to track ownership of it in rpc_run_timer() via
>>RPC_TASK_HAS_TIMER which is undermined in the case of
>>preemption.
>
>
> No!!! The responsibility for tracking timers that have been dequeued and
> that are currently running inside __run_timer() lies fairly and squarely
> with del_timer_sync().
> There is NOTHING within the RT patches that implies that the existing
> callers of del_timer_sync() should be burdened with having to do
> additional tracking of pending timers. To do so would be a major change
> of the existing API, and would require a lot of justification.
>
> IOW: nobody but you is claiming that the RPC code is trying to deal with
> this case by tracking RPC_TASK_HAS_TIMER. That is not its purpose, nor
> should it be in the RT case.
>
>
>> From earlier mail:
>>
>> > There should be no instances of RPC entering call_transmit() or any
>> > other tk_action callback with a pending timer.
>>
>>My description wasn't clear. The timeout isn't pending
>>before call_transmit(). Rather the RPC appears to be
>>blocked elsewhere and upon wakeup via __run_timer()/xprt_timer()
>>preempts ksoftirqd and does the __rpc_sleep_on()/__mod_timer()
>>at the very tail of xprt_transmit().
>
>
> No!!! How is this supposed to happen? There is only one thread that is
> allowed to call rpc_sleep_on(), and that is the exact same thread that
> is calling __rpc_execute(). It may call rpc_sleep_on() only from inside
> a task->tk_action() call, and therefore only _after_ it has called
> rpc_delete_timer(). There is supposed to be strict ordering here!
>
> Trond
>
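
The interleaving john cooper describes above, and the strict ordering Trond says must hold instead, can be compared in a small single-threaded model. This is only a sketch under stated assumptions: every sim_* name is hypothetical, the two booleans merely stand in for RPC_TASK_HAS_TIMER and for the timer being queued in the cascade, and none of it is the actual sunrpc or kernel timer code. Whether the preempting interleaving can really occur is exactly what the thread disputes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the state discussed in the thread. */
struct sim_task {
    bool has_timer_flag;   /* models RPC_TASK_HAS_TIMER */
    bool timer_queued;     /* models the timer sitting in the cascade */
};

/* The timer core dequeues the timer before invoking its handler. */
static void sim_timer_fire(struct sim_task *t)
{
    t->timer_queued = false;
}

/* Models the requeue at the tail of xprt_transmit(). */
static void sim_requeue(struct sim_task *t)
{
    t->timer_queued = true;
    t->has_timer_flag = true;
}

/* Models the handler tail as described: the flag is cleared unconditionally. */
static void sim_handler_finish(struct sim_task *t)
{
    t->has_timer_flag = false;
}

/* True when the flag and the queue state disagree. */
static bool sim_inconsistent(const struct sim_task *t)
{
    return t->timer_queued != t->has_timer_flag;
}

/* Reported scenario: the requeue may land between fire and handler finish. */
static struct sim_task sim_run(bool preempt)
{
    struct sim_task t = { .has_timer_flag = true, .timer_queued = true };

    sim_timer_fire(&t);
    if (preempt)
        sim_requeue(&t);       /* preempting task requeues the timer */
    sim_handler_finish(&t);    /* handler resumes and clears the flag */
    return t;
}

/* Ordering Trond describes: the handler has finished before any requeue. */
static struct sim_task sim_run_ordered(void)
{
    struct sim_task t = { .has_timer_flag = true, .timer_queued = true };

    sim_timer_fire(&t);
    sim_handler_finish(&t);
    sim_requeue(&t);
    return t;
}
```

Calling sim_run(true) leaves the timer queued with the flag clear, which is the inconsistent state reported from the event trace. sim_run(false) and sim_run_ordered() both end with the flag and the queue state in agreement.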
--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/