Re: [Adeos-main] [BUG] evsync is not SMP-safe

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jan Kiszka <jan.kiszka@domain.hid>
To: rpm@xenomai.org
Cc: adeos-main <adeos-main@gna.org>
Subject: Re: [Adeos-main] [BUG] evsync is not SMP-safe
Date: Mon, 28 Jan 2008 09:35:33 +0100	[thread overview]
Message-ID: <479D93D5.2040207@domain.hid> (raw)
In-Reply-To: <479D7D48.1050200@domain.hid>

[-- Attachment #1: Type: text/plain, Size: 5663 bytes --]

Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Jan Kiszka wrote:
>>> Jan Kiszka wrote:
>>>> Philippe,
>>>>
>>>> this
>>>>
>>>> int fastcall __ipipe_dispatch_event (unsigned event, void *data)
>>>> ...
>>>> 	ipipe_cpudom_var(next_domain, evsync) |= (1LL << event);
>>>> 	local_irq_restore_hw(flags);
>>>> 	propagate = !evhand(event, start_domain, data);
>>>> 	local_irq_save_hw(flags);
>>>> 	ipipe_cpudom_var(next_domain, evsync) &= ~(1LL << event);
>>>>
>>>> doesn't fly on SMP. While the invoked event handler is running, it may
>>>> happen that the caller gets migrated to another CPU. The result is an
>>>> inconsistent evsync state that causes ipipe_catch_event to stall (test
>>>> case: invoke Jerome's system() test a few times, them try to unload
>>>> Xenomai skins and nucleus).
>>>>
>>>> First idea (I've nothing implemented to far, would happily leave it to
>>>> someone else's hand): Track event handler entry/exit with an, say, 8 bit
>>>> per-cpu counter. On event deregistration, just summarize over the
>>>> per-cpu counters and wait for the sum to become 0. This has just the
>>>> drawback that it may cause livelocks on large SMP boxes when trying to
>>>> wait for a busy event. I've no perfect idea so far.
>>> Second approach to solve this (currently) last known ipipe issue cleanly:
>>>
>>> As long as some task in the system has __ipipe_dispatch_event (and thus
>>> some of the registered event handlers) on its stack, it is not safe to
>>> unregister some handler. For simplicity reasons, I don't think we should
>>> make any difference which handlers are precisely busy. So, how to find
>>> out if there are such tasks?
>>>
>>> I checked the code and derived three classes of preconditions for the
>>> invocation of __ipipe_dispatch_event:
>>>
>>>  1) ipipe_init_notify and ipipe_cleanup_notify -> no preconditions
>>>
>>>  2) ipipe_trap_notify -> some Linux task has PF_EVNOTIFY set or
>>>                          there are tasks with custom stacks present
>>>
>>>  3) all other invocations -> some Linux task has PF_EVNOTIFY set
> 
> 4) ipipe_syscall_watched_p used in syscall pipelining -> PF_EVNOTIFY ||
> non-regular syscall

Damn it.

> 
>>> This means by walking the full Linux task list and checking that are no
>>> more PF_EVNOTIFY-tasks present, we can easily exclude 3). In fact, 2)
>>> can also be excluded this way because we can reasonably demand that any
>>> I-pipe user fiddling with event handlers has killed all non-Linux tasks
>>> beforehand.
>>> This leaves us with 1) which should be handled via classic
>>> RCU as Linux offers it. Viable? It would even shorten the fast path a bit!
>>>
>>> Still no code, I would like to collect feedback on this new idea first.
>> OK, here is some code. Seems to work fine (now that I fixed the cleanup
>> race in Xenomai as well - the issue happened to pop up right after
>> applying this patch :-/).
>>
> 
> This patch changes the predicate from "a given registered handler should
> no be currently running" to "no task in the system should be currently
> asking for any kind of I-pipe notification", which has not quite the
> same scope. For instance, we could no more deregister a handler only for
> a particular event leaving others installed. It becomes an "all or
> nothing" test, even if that still remains compatible with the way
> Xenomai works though.
> 
> However, we have a real problem with the syscall event. This one has an
> additional precondition to PF_EVNOTIFY which is unfortunately dynamic,
> i.e. syscall_nr >= NR_SYSCALL. This is used to relay non-regular
> syscalls to whoever it may concern in the pipeline, when PF_EVNOTIFY is
> not set. Xenomai uses this property for its shadow binding syscall,
> which will eventually set PF_EVNOTIFY for the caller. IOW, an attempt to
> bind to the skin from userland while the nucleus is deregistering the
> syscall handler would escape the RCU sync with the current patch.
> 
> But holding a read-side RCU lock to postpone the quiescent RCU
> state won't fly, since the syscall handler may reschedule Linux-wise.
> 
> Looking back at the actual issue, we only have a single case to handle
> for the current code to be correct: detect when a handler appears to
> linger indefinitely on a given CPU (albeit it does not really), due to a
> migration.
> 
> This should be done in a way that does not require complex
> locking/tracking on the fast path since we are really dealing with a
> corner case here. The attached patch does this by allowing a grace
> period before assuming that the handler will never complete on the
> current CPU, looking for some quiescent state. The latter part is almost
> overkill, since clearing the handler in the domain struct already avoids
> livelocking and would prevent such a handler to be fired quite early
> anyway. The underlying assumption is that tasks which should be notified
> of system events will be killed before someone attempts to deregister
> the event handlers they rely on, so they should not be able to run/sleep
> more than the grace period on behalf of the notifier.
> 
> Does this patch also fix the case you pointed out?

May solve my particular test case, but given sleeping event handlers,
its clear to me now that the whole bit-flag based locking scheme remains
fragile once you have >1 tasks on one CPU inside the same
to-be-unregistered handler. Then the first leaving task will release the
lock, shadowing the existence of further tasks inside the handler. We
would need SRCU-like locking here if if were available for our
environment...

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 254 bytes --]

     prev parent reply	other threads:[~2008-01-28  8:35 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-12 13:36 [Adeos-main] [BUG] evsync is not SMP-safe Jan Kiszka
2008-01-25 22:06 ` Jan Kiszka
2008-01-27 15:24   ` Jan Kiszka
2008-01-28  6:59     ` Philippe Gerum
2008-01-28  8:35       ` Jan Kiszka [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=479D93D5.2040207@domain.hid \
    --to=jan.kiszka@domain.hid \
    --cc=adeos-main@gna.org \
    --cc=rpm@xenomai.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.