From: George Anzinger <george@mvista.com>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: john cooper <john.cooper@timesys.com>,
Oleg Nesterov <oleg@tv-sign.ru>,
linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
Olaf Kirch <okir@suse.de>
Subject: Re: RT and Cascade interrupts
Date: Thu, 09 Jun 2005 16:17:45 -0700
Message-ID: <42A8CE19.1000807@mvista.com>
In-Reply-To: <1117686367.10822.104.camel@lade.trondhjem.org>
Excuse me for interrupting this thread, but have you seen:
http://marc.theaimsgroup.com/?l=linux-kernel&m=111717961227508&w=2
I think this will fix your problem.
George
--
Trond Myklebust wrote:
> On Wed, 01.06.2005 at 23:31 (-0400), john cooper wrote:
>
>>I fully share your frustration of wanting to "use the
>>latest patch -- dammit". However there are other practical
>>constraints coming into play. This tree has accumulated a
>>substantial amount of fixes for scheduler violation assertions
>>along with associated testing and has fared well thus far.
>>The bug under discussion here is the last major operational
>>problem found in the associated testing process. Arriving
>>at this point also required development of target specific
>>driver/board code so a resync to a later version is not a
>>trivial operation. However, it would be justifiable if we
>>reached an impasse with the current tree.
>
>
> My point is that you are considering timer bugs due to synchronization
> problems in code which is obviously not designed to accommodate
> synchronization. Once that fact is established, one moves on and
> considers the code which does support synchronization.
>
>
>>>Could you then apply the following debugging patch? It should warn you
>>>in case something happens to corrupt base->running_timer (something
>>>which would screw up del_timer_sync()). I'm not sure that can happen,
>>>but it might be worth checking.
>>
>>Yes, thanks. However, the event trace suggests not a
>>reentrance into __run_timer() but rather a preemption of it
>>during the call to rpc_run_timer() by a high priority
>>application task in the midst of an RPC. The preempting
>>task requeues the timer in the cascade at the tail of
>>xprt_transmit(). rpc_run_timer() upon resuming execution
>>unconditionally clears the RPC_TASK_HAS_TIMER flag. This
>>creates the inconsistent state.
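The interleaving John describes can be reduced to a single-threaded model. This is only a sketch under stated assumptions: `struct race_model`, `model_run_timer()`, `model_rearm()` and `model_consistent()` are hypothetical stand-ins, not kernel code. The `preempted` argument simply forces the ordering reported in the event trace. Whether that ordering can legitimately occur is exactly what Trond disputes in his reply.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of the reported interleaving.
 * The fields are stand-ins, not the kernel's timer_list or rpc_task. */
struct race_model {
    bool timer_pending;    /* timer is queued in the timer cascade */
    bool has_timer_flag;   /* stands in for RPC_TASK_HAS_TIMER */
};

/* The preempting task re-arms the timeout, as the trace shows happening
 * via __rpc_sleep_on()/__mod_timer() at the tail of xprt_transmit(). */
static void model_rearm(struct race_model *m)
{
    m->timer_pending = true;
    m->has_timer_flag = true;
}

/* __run_timer() dequeues the timer and calls rpc_run_timer(). When
 * preempted is true, model_rearm() runs between the dequeue and the
 * unconditional flag clear, which is the ordering seen in the trace. */
static void model_run_timer(struct race_model *m, bool preempted)
{
    assert(m->timer_pending);          /* only a queued timer can expire */
    m->timer_pending = false;          /* implicit dequeue in __run_timer() */
    if (preempted)
        model_rearm(m);                /* higher-priority task runs here */
    m->has_timer_flag = false;         /* cleared regardless of the re-arm */
}

/* True when the flag agrees with whether the timer is actually queued. */
static bool model_consistent(const struct race_model *m)
{
    return m->has_timer_flag == m->timer_pending;
}
```

With `preempted` false, the flag and the queue agree afterwards. With it true, the timer ends up queued while the flag reads clear, which is the inconsistent state described above.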
>
>
> There are NO cases where that is supposed to be allowed to occur. This
> case is precisely what del_timer_sync() is supposed to treat.
>
>
>>No explicit deletion attempt of the timer (synchronous or
>>otherwise) is coming into play in the failure scenario as
>>witnessed by the event trace. Rather it is the implicit
>>dequeue of the timer from the cascade in __run_timer() and
>>attempt to track ownership of it in rpc_run_timer() via
>>RPC_TASK_HAS_TIMER which is undermined in the case of
>>preemption.
>
>
> No!!! The responsibility for tracking timers that have been dequeued and
> that are currently running inside __run_timer() lies fairly and squarely
> with del_timer_sync().
> There is NOTHING within the RT patches that implies that the existing
> callers of del_timer_sync() should be burdened with having to do
> additional tracking of pending timers. To do so would be a major change
> of the existing API, and would require a lot of justification.
>
> IOW: nobody but you is claiming that the RPC code is trying to deal with
> this case by tracking RPC_TASK_HAS_TIMER. That is not its purpose, nor
> should it be in the RT case.
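The contract Trond attributes to del_timer_sync() can be illustrated with a userspace model. The sketch assumes POSIX threads and C11 atomics. `struct mtimer`, `struct mbase`, `model_run_one()`, `model_try_to_del_sync()`, `model_del_timer_sync()` and `model_demo()` are hypothetical names that only mimic the idea. In that idea, the expiry path publishes the timer as `running_timer` before calling the handler, and the synchronous delete retries until the timer is no longer running. This is not the actual kernel implementation or its RT variant.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical userspace model; not the kernel's timer_list or timer base. */
struct mtimer {
    void (*fn)(struct mtimer *);
    bool pending;                    /* queued on the modelled timer wheel */
};

struct mbase {
    pthread_mutex_t lock;
    struct mtimer *running_timer;    /* models base->running_timer */
};

static struct mbase base = { PTHREAD_MUTEX_INITIALIZER, NULL };

/* Models one expiry inside __run_timer(): dequeue the timer, publish it as
 * running_timer, then call the handler with the lock dropped. */
static void model_run_one(struct mtimer *t)
{
    pthread_mutex_lock(&base.lock);
    t->pending = false;              /* implicit dequeue */
    base.running_timer = t;
    pthread_mutex_unlock(&base.lock);

    t->fn(t);                        /* lock dropped; may be preempted on RT */

    pthread_mutex_lock(&base.lock);
    base.running_timer = NULL;
    pthread_mutex_unlock(&base.lock);
}

/* -1 while the handler runs; otherwise 1 if a queued timer was removed, else 0. */
static int model_try_to_del_sync(struct mtimer *t)
{
    int ret = -1;

    pthread_mutex_lock(&base.lock);
    if (base.running_timer != t) {
        ret = t->pending ? 1 : 0;
        t->pending = false;
    }
    pthread_mutex_unlock(&base.lock);
    return ret;
}

/* On return the timer is neither queued nor running its handler. */
static int model_del_timer_sync(struct mtimer *t)
{
    int ret;

    while ((ret = model_try_to_del_sync(t)) < 0)
        sched_yield();
    return ret;
}

static atomic_int handler_started, handler_done;

static void slow_handler(struct mtimer *t)
{
    struct timespec pause = { 0, 50L * 1000 * 1000 };   /* 50 ms */

    (void)t;
    atomic_store(&handler_started, 1);
    nanosleep(&pause, NULL);         /* stands in for being preempted */
    atomic_store(&handler_done, 1);
}

static void *timer_thread(void *arg)
{
    model_run_one(arg);
    return NULL;
}

/* Returns 1 if the handler had finished by the time the delete returned. */
static int model_demo(void)
{
    struct mtimer t = { slow_handler, true };
    pthread_t th;
    int done;

    if (pthread_create(&th, NULL, timer_thread, &t) != 0)
        return -1;
    while (!atomic_load(&handler_started))
        sched_yield();
    model_del_timer_sync(&t);
    done = atomic_load(&handler_done);
    pthread_join(th, NULL);
    return done;
}
```

`model_demo()` starts a handler that sleeps to stand in for being preempted, then calls `model_del_timer_sync()`. It returns 1 because the delete cannot complete until the handler has returned and `running_timer` has been cleared. The caller needs no flag of its own to track the dequeued-but-running timer.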
>
>
>> From earlier mail:
>>
>> > There should be no instances of RPC entering call_transmit() or any
>> > other tk_action callback with a pending timer.
>>
>>My description wasn't clear. The timeout isn't pending
>>before call_transmit(). Rather the RPC appears to be
>>blocked elsewhere and upon wakeup via __run_timer()/xprt_timer()
>>preempts ksoftirqd and does the __rpc_sleep_on()/__mod_timer()
>>at the very tail of xprt_transmit().
>
>
> No!!! How is this supposed to happen? There is only one thread that is
> allowed to call rpc_sleep_on(), and that is the exact same thread that
> is calling __rpc_execute(). It may call rpc_sleep_on() only from inside
> a task->tk_action() call, and therefore only _after_ it has called
> rpc_delete_timer(). There is supposed to be strict ordering here!
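The ordering Trond describes can be sketched as a loop that always deletes the timer before invoking the next action. In this hypothetical model, `struct model_task`, `model_execute()`, `model_delete_timer()`, `model_sleep_on()` and the `action_*` callbacks are illustrative names, not the SUNRPC code. Sleeping returns at once, as if the task were woken immediately. `model_delete_timer()` only clears state, because a single thread cannot wait on a running handler. In the kernel, that wait is what del_timer_sync() provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model; none of these names are the kernel's. */
struct model_task;
typedef void (*model_action_t)(struct model_task *);

struct model_task {
    bool timer_pending;       /* timeout timer queued */
    bool has_timer_flag;      /* stands in for RPC_TASK_HAS_TIMER */
    model_action_t action;    /* stands in for task->tk_action */
    int sleeps;               /* how many times the task went to sleep */
};

/* Stands in for rpc_delete_timer(); in the kernel, del_timer_sync() would
 * also wait for a running handler, so afterwards nothing is pending or running. */
static void model_delete_timer(struct model_task *t)
{
    t->timer_pending = false;
    t->has_timer_flag = false;
}

/* Stands in for rpc_sleep_on() arming a timeout. The assert is the
 * precondition the strict ordering is meant to guarantee. */
static void model_sleep_on(struct model_task *t)
{
    assert(!t->timer_pending && !t->has_timer_flag);
    t->timer_pending = true;
    t->has_timer_flag = true;
    t->sleeps++;
}

static void action_done(struct model_task *t) { t->action = NULL; }

static void action_transmit(struct model_task *t)
{
    model_sleep_on(t);
    t->action = action_done;
}

static void action_start(struct model_task *t)
{
    model_sleep_on(t);
    t->action = action_transmit;
}

/* Stands in for the __rpc_execute() loop: delete the timer first, then run
 * the action, which is the only place the task may go back to sleep. */
static int model_execute(struct model_task *t)
{
    while (t->action != NULL) {
        model_delete_timer(t);
        t->action(t);
    }
    model_delete_timer(t);
    return t->sleeps;
}
```

The `assert` in `model_sleep_on()` encodes the invariant that the ordering protects. Removing the `model_delete_timer()` call at the top of the loop makes it fire on the second action.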
>
> Trond
>
--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/