[Adeos-main] [BUG] evsync is not SMP-safe

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Adeos-main] [BUG] evsync is not SMP-safe
@ 2008-01-12 13:36 Jan Kiszka
  2008-01-25 22:06 ` Jan Kiszka
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kiszka @ 2008-01-12 13:36 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: adeos-main

[-- Attachment #1: Type: text/plain, Size: 1054 bytes --]

Philippe,

this

int fastcall __ipipe_dispatch_event (unsigned event, void *data)
...
	ipipe_cpudom_var(next_domain, evsync) |= (1LL << event);
	local_irq_restore_hw(flags);
	propagate = !evhand(event, start_domain, data);
	local_irq_save_hw(flags);
	ipipe_cpudom_var(next_domain, evsync) &= ~(1LL << event);

doesn't fly on SMP. While the invoked event handler is running, it may
happen that the caller gets migrated to another CPU. The result is an
inconsistent evsync state that causes ipipe_catch_event to stall (test
case: invoke Jerome's system() test a few times, them try to unload
Xenomai skins and nucleus).

First idea (I've nothing implemented to far, would happily leave it to
someone else's hand): Track event handler entry/exit with an, say, 8 bit
per-cpu counter. On event deregistration, just summarize over the
per-cpu counters and wait for the sum to become 0. This has just the
drawback that it may cause livelocks on large SMP boxes when trying to
wait for a busy event. I've no perfect idea so far.

Jan

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 254 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Adeos-main] [BUG] evsync is not SMP-safe
  2008-01-12 13:36 [Adeos-main] [BUG] evsync is not SMP-safe Jan Kiszka
@ 2008-01-25 22:06 ` Jan Kiszka
  2008-01-27 15:24   ` Jan Kiszka
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kiszka @ 2008-01-25 22:06 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: adeos-main

[-- Attachment #1: Type: text/plain, Size: 2576 bytes --]

Jan Kiszka wrote:
> Philippe,
> 
> this
> 
> int fastcall __ipipe_dispatch_event (unsigned event, void *data)
> ...
> 	ipipe_cpudom_var(next_domain, evsync) |= (1LL << event);
> 	local_irq_restore_hw(flags);
> 	propagate = !evhand(event, start_domain, data);
> 	local_irq_save_hw(flags);
> 	ipipe_cpudom_var(next_domain, evsync) &= ~(1LL << event);
> 
> doesn't fly on SMP. While the invoked event handler is running, it may
> happen that the caller gets migrated to another CPU. The result is an
> inconsistent evsync state that causes ipipe_catch_event to stall (test
> case: invoke Jerome's system() test a few times, them try to unload
> Xenomai skins and nucleus).
> 
> First idea (I've nothing implemented to far, would happily leave it to
> someone else's hand): Track event handler entry/exit with an, say, 8 bit
> per-cpu counter. On event deregistration, just summarize over the
> per-cpu counters and wait for the sum to become 0. This has just the
> drawback that it may cause livelocks on large SMP boxes when trying to
> wait for a busy event. I've no perfect idea so far.

Second approach to solve this (currently) last known ipipe issue cleanly:

As long as some task in the system has __ipipe_dispatch_event (and thus
some of the registered event handlers) on its stack, it is not safe to
unregister some handler. For simplicity reasons, I don't think we should
make any difference which handlers are precisely busy. So, how to find
out if there are such tasks?

I checked the code and derived three classes of preconditions for the
invocation of __ipipe_dispatch_event:

 1) ipipe_init_notify and ipipe_cleanup_notify -> no preconditions

 2) ipipe_trap_notify -> some Linux task has PF_EVNOTIFY set or
                         there are tasks with custom stacks present

 3) all other invocations -> some Linux task has PF_EVNOTIFY set

This means by walking the full Linux task list and checking that are no
more PF_EVNOTIFY-tasks present, we can easily exclude 3). In fact, 2)
can also be excluded this way because we can reasonably demand that any
I-pipe user fiddling with event handlers has killed all non-Linux tasks
beforehand. This leaves us with 1) which should be handled via classic
RCU as Linux offers it. Viable? It would even shorten the fast path a bit!

Still no code, I would like to collect feedback on this new idea first.

Jan

PS: IPIPE_EVENT_INIT is not used by Xenomai (and I found no trace that
it ever was). Can we deprecate this event and then remove it later,
Philippe?

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 254 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Adeos-main] [BUG] evsync is not SMP-safe
  2008-01-25 22:06 ` Jan Kiszka
@ 2008-01-27 15:24   ` Jan Kiszka
  2008-01-28  6:59     ` Philippe Gerum
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kiszka @ 2008-01-27 15:24 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: adeos-main


[-- Attachment #1.1: Type: text/plain, Size: 2685 bytes --]

Jan Kiszka wrote:
> Jan Kiszka wrote:
>> Philippe,
>>
>> this
>>
>> int fastcall __ipipe_dispatch_event (unsigned event, void *data)
>> ...
>> 	ipipe_cpudom_var(next_domain, evsync) |= (1LL << event);
>> 	local_irq_restore_hw(flags);
>> 	propagate = !evhand(event, start_domain, data);
>> 	local_irq_save_hw(flags);
>> 	ipipe_cpudom_var(next_domain, evsync) &= ~(1LL << event);
>>
>> doesn't fly on SMP. While the invoked event handler is running, it may
>> happen that the caller gets migrated to another CPU. The result is an
>> inconsistent evsync state that causes ipipe_catch_event to stall (test
>> case: invoke Jerome's system() test a few times, them try to unload
>> Xenomai skins and nucleus).
>>
>> First idea (I've nothing implemented to far, would happily leave it to
>> someone else's hand): Track event handler entry/exit with an, say, 8 bit
>> per-cpu counter. On event deregistration, just summarize over the
>> per-cpu counters and wait for the sum to become 0. This has just the
>> drawback that it may cause livelocks on large SMP boxes when trying to
>> wait for a busy event. I've no perfect idea so far.
> 
> Second approach to solve this (currently) last known ipipe issue cleanly:
> 
> As long as some task in the system has __ipipe_dispatch_event (and thus
> some of the registered event handlers) on its stack, it is not safe to
> unregister some handler. For simplicity reasons, I don't think we should
> make any difference which handlers are precisely busy. So, how to find
> out if there are such tasks?
> 
> I checked the code and derived three classes of preconditions for the
> invocation of __ipipe_dispatch_event:
> 
>  1) ipipe_init_notify and ipipe_cleanup_notify -> no preconditions
> 
>  2) ipipe_trap_notify -> some Linux task has PF_EVNOTIFY set or
>                          there are tasks with custom stacks present
> 
>  3) all other invocations -> some Linux task has PF_EVNOTIFY set
> 
> This means by walking the full Linux task list and checking that are no
> more PF_EVNOTIFY-tasks present, we can easily exclude 3). In fact, 2)
> can also be excluded this way because we can reasonably demand that any
> I-pipe user fiddling with event handlers has killed all non-Linux tasks
> beforehand. This leaves us with 1) which should be handled via classic
> RCU as Linux offers it. Viable? It would even shorten the fast path a bit!
> 
> Still no code, I would like to collect feedback on this new idea first.

OK, here is some code. Seems to work fine (now that I fixed the cleanup
race in Xenomai as well - the issue happened to pop up right after
applying this patch :-/).

Jan

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: rework-event-deregistration.patch --]
[-- Type: text/x-patch; name="rework-event-deregistration.patch", Size: 4830 bytes --]

---
 include/linux/ipipe.h        |   11 +++++++++--
 include/linux/ipipe_percpu.h |    1 -
 kernel/ipipe/core.c          |   42 ++++++++++++++++++++++++------------------
 3 files changed, 33 insertions(+), 21 deletions(-)

Index: linux-2.6.24-xeno_64/include/linux/ipipe_percpu.h
===================================================================
--- linux-2.6.24-xeno_64.orig/include/linux/ipipe_percpu.h
+++ linux-2.6.24-xeno_64/include/linux/ipipe_percpu.h
@@ -33,7 +33,6 @@ struct ipipe_percpu_domain_data {
 	unsigned long irqpend_lomask[IPIPE_IRQ_IWORDS];
 	unsigned long irqheld_mask[IPIPE_IRQ_IWORDS];
 	unsigned long irqall[IPIPE_NR_IRQS];
-	u64 evsync;
 };
 
 #ifdef CONFIG_SMP
Index: linux-2.6.24-xeno_64/kernel/ipipe/core.c
===================================================================
--- linux-2.6.24-xeno_64.orig/kernel/ipipe/core.c
+++ linux-2.6.24-xeno_64/kernel/ipipe/core.c
@@ -260,8 +260,6 @@ void __ipipe_init_stage(struct ipipe_dom
 
 		for (n = 0; n < IPIPE_NR_IRQS; n++)
 			ipipe_percpudom(ipd, irqall, cpu)[n] = 0;
-
-		ipipe_percpudom(ipd, evsync, cpu) = 0;
 	}
 
 	for (n = 0; n < IPIPE_NR_IRQS; n++) {
@@ -554,7 +552,6 @@ void fastcall __ipipe_walk_pipeline(stru
 				__ipipe_sync_pipeline(IPIPE_IRQMASK_ANY);
 			else {
 
-				ipipe_cpudom_var(this_domain, evsync) = 0;
 				ipipe_current_domain = next_domain;
 				ipipe_suspend_domain();	/* Sync stage and propagate interrupts. */
 
@@ -822,11 +819,9 @@ int fastcall __ipipe_dispatch_event (uns
 
 		if (evhand != NULL) {
 			ipipe_current_domain = next_domain;
-			ipipe_cpudom_var(next_domain, evsync) |= (1LL << event);
 			local_irq_restore_hw(flags);
 			propagate = !evhand(event, start_domain, data);
 			local_irq_save_hw(flags);
-			ipipe_cpudom_var(next_domain, evsync) &= ~(1LL << event);
 			if (ipipe_current_domain != next_domain)
 				this_domain = ipipe_current_domain;
 		}
@@ -1252,7 +1247,7 @@ ipipe_event_handler_t ipipe_catch_event(
 {
 	ipipe_event_handler_t old_handler;
 	unsigned long flags;
-	int self = 0, cpu;
+	int self = 0;
 
 	if (event & IPIPE_EVENT_SELF) {
 		event &= ~IPIPE_EVENT_SELF;
@@ -1293,20 +1288,31 @@ ipipe_event_handler_t ipipe_catch_event(
 		 * domain, we have to wait for any current invocation
 		 * to drain, since our caller might subsequently unmap
 		 * the target domain. To this aim, this code
-		 * synchronizes with __ipipe_dispatch_event(),
-		 * guaranteeing that either the dispatcher sees a null
-		 * handler in which case it discards the invocation
-		 * (which also prevents from entering a livelock), or
-		 * finds a valid handler and calls it. Symmetrically,
-		 * ipipe_catch_event() ensures that the called code
-		 * won't be unmapped under our feet until the event
-		 * synchronization flag is cleared for the given event
-		 * on all CPUs.
+		 * synchronizes on RCU sections to ensure that event
+		 * handlers protected by this mechanisms were all left.
+		 * Moreover, it is checked that there are no more tasks
+		 * in system which have the PF_EVNOTIFY flag set, i.e.
+		 * may still have the deregistered handler on their
+		 * stack.
 		 */
 
-		for_each_online_cpu(cpu) {
-			while (ipipe_percpudom(ipd, evsync, cpu) & (1LL << event))
-				schedule_timeout_interruptible(HZ / 50);
+		synchronize_rcu();
+
+		while (1) {
+			int evnotify_tasks = 0;
+			struct task_struct *p, *t;
+
+			read_lock(&tasklist_lock);
+			do_each_thread(p, t) {
+				if (t->flags & PF_EVNOTIFY)
+					evnotify_tasks++;
+			} while_each_thread(p, t);
+			read_unlock(&tasklist_lock);
+
+			if (!evnotify_tasks)
+				break;
+
+			schedule_timeout_interruptible(HZ / 50);
 		}
 	}
 
Index: linux-2.6.24-xeno_64/include/linux/ipipe.h
===================================================================
--- linux-2.6.24-xeno_64.orig/include/linux/ipipe.h
+++ linux-2.6.24-xeno_64/include/linux/ipipe.h
@@ -27,6 +27,7 @@
 #include <linux/percpu.h>
 #include <linux/mutex.h>
 #include <linux/linkage.h>
+#include <linux/rcupdate.h>
 #include <linux/ipipe_base.h>
 #include <linux/ipipe_compat.h>
 #include <asm/ipipe.h>
@@ -294,16 +295,22 @@ do {									\
 
 static inline void ipipe_init_notify(struct task_struct *p)
 {
-	if (__ipipe_event_monitored_p(IPIPE_EVENT_INIT))
+	if (__ipipe_event_monitored_p(IPIPE_EVENT_INIT)) {
+		rcu_read_lock();
 		__ipipe_dispatch_event(IPIPE_EVENT_INIT,p);
+		rcu_read_unlock();
+	}
 }
 
 struct mm_struct;
 
 static inline void ipipe_cleanup_notify(struct mm_struct *mm)
 {
-	if (__ipipe_event_monitored_p(IPIPE_EVENT_CLEANUP))
+	if (__ipipe_event_monitored_p(IPIPE_EVENT_CLEANUP)) {
+		rcu_read_lock();
 		__ipipe_dispatch_event(IPIPE_EVENT_CLEANUP,mm);
+		rcu_read_unlock();
+	}
 }
 
 /* Public interface */

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 254 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Adeos-main] [BUG] evsync is not SMP-safe
  2008-01-27 15:24   ` Jan Kiszka
@ 2008-01-28  6:59     ` Philippe Gerum
  2008-01-28  8:35       ` Jan Kiszka
  0 siblings, 1 reply; 5+ messages in thread
From: Philippe Gerum @ 2008-01-28  6:59 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: adeos-main

[-- Attachment #1: Type: text/plain, Size: 4971 bytes --]

Jan Kiszka wrote:
> Jan Kiszka wrote:
>> Jan Kiszka wrote:
>>> Philippe,
>>>
>>> this
>>>
>>> int fastcall __ipipe_dispatch_event (unsigned event, void *data)
>>> ...
>>> 	ipipe_cpudom_var(next_domain, evsync) |= (1LL << event);
>>> 	local_irq_restore_hw(flags);
>>> 	propagate = !evhand(event, start_domain, data);
>>> 	local_irq_save_hw(flags);
>>> 	ipipe_cpudom_var(next_domain, evsync) &= ~(1LL << event);
>>>
>>> doesn't fly on SMP. While the invoked event handler is running, it may
>>> happen that the caller gets migrated to another CPU. The result is an
>>> inconsistent evsync state that causes ipipe_catch_event to stall (test
>>> case: invoke Jerome's system() test a few times, them try to unload
>>> Xenomai skins and nucleus).
>>>
>>> First idea (I've nothing implemented to far, would happily leave it to
>>> someone else's hand): Track event handler entry/exit with an, say, 8 bit
>>> per-cpu counter. On event deregistration, just summarize over the
>>> per-cpu counters and wait for the sum to become 0. This has just the
>>> drawback that it may cause livelocks on large SMP boxes when trying to
>>> wait for a busy event. I've no perfect idea so far.
>> Second approach to solve this (currently) last known ipipe issue cleanly:
>>
>> As long as some task in the system has __ipipe_dispatch_event (and thus
>> some of the registered event handlers) on its stack, it is not safe to
>> unregister some handler. For simplicity reasons, I don't think we should
>> make any difference which handlers are precisely busy. So, how to find
>> out if there are such tasks?
>>
>> I checked the code and derived three classes of preconditions for the
>> invocation of __ipipe_dispatch_event:
>>
>>  1) ipipe_init_notify and ipipe_cleanup_notify -> no preconditions
>>
>>  2) ipipe_trap_notify -> some Linux task has PF_EVNOTIFY set or
>>                          there are tasks with custom stacks present
>>
>>  3) all other invocations -> some Linux task has PF_EVNOTIFY set

4) ipipe_syscall_watched_p used in syscall pipelining -> PF_EVNOTIFY ||
non-regular syscall

>>
>> This means by walking the full Linux task list and checking that are no
>> more PF_EVNOTIFY-tasks present, we can easily exclude 3). In fact, 2)
>> can also be excluded this way because we can reasonably demand that any
>> I-pipe user fiddling with event handlers has killed all non-Linux tasks
>> beforehand.
>> This leaves us with 1) which should be handled via classic
>> RCU as Linux offers it. Viable? It would even shorten the fast path a bit!
>>
>> Still no code, I would like to collect feedback on this new idea first.
> 
> OK, here is some code. Seems to work fine (now that I fixed the cleanup
> race in Xenomai as well - the issue happened to pop up right after
> applying this patch :-/).
> 

This patch changes the predicate from "a given registered handler should
no be currently running" to "no task in the system should be currently
asking for any kind of I-pipe notification", which has not quite the
same scope. For instance, we could no more deregister a handler only for
a particular event leaving others installed. It becomes an "all or
nothing" test, even if that still remains compatible with the way
Xenomai works though.

However, we have a real problem with the syscall event. This one has an
additional precondition to PF_EVNOTIFY which is unfortunately dynamic,
i.e. syscall_nr >= NR_SYSCALL. This is used to relay non-regular
syscalls to whoever it may concern in the pipeline, when PF_EVNOTIFY is
not set. Xenomai uses this property for its shadow binding syscall,
which will eventually set PF_EVNOTIFY for the caller. IOW, an attempt to
bind to the skin from userland while the nucleus is deregistering the
syscall handler would escape the RCU sync with the current patch.

But holding a read-side RCU lock to postpone the quiescent RCU
state won't fly, since the syscall handler may reschedule Linux-wise.

Looking back at the actual issue, we only have a single case to handle
for the current code to be correct: detect when a handler appears to
linger indefinitely on a given CPU (albeit it does not really), due to a
migration.

This should be done in a way that does not require complex
locking/tracking on the fast path since we are really dealing with a
corner case here. The attached patch does this by allowing a grace
period before assuming that the handler will never complete on the
current CPU, looking for some quiescent state. The latter part is almost
overkill, since clearing the handler in the domain struct already avoids
livelocking and would prevent such a handler to be fired quite early
anyway. The underlying assumption is that tasks which should be notified
of system events will be killed before someone attempts to deregister
the event handlers they rely on, so they should not be able to run/sleep
more than the grace period on behalf of the notifier.

Does this patch also fix the case you pointed out?

-- 
Philippe.








[-- Attachment #2: evsync.patch --]
[-- Type: text/x-diff, Size: 2064 bytes --]

diff --git a/include/linux/ipipe_percpu.h b/include/linux/ipipe_percpu.h
index 4b4d1f5..5d1b88d 100644
--- a/include/linux/ipipe_percpu.h
+++ b/include/linux/ipipe_percpu.h
@@ -34,6 +34,7 @@ struct ipipe_percpu_domain_data {
 	unsigned long irqheld_mask[IPIPE_IRQ_IWORDS];
 	unsigned long irqall[IPIPE_NR_IRQS];
 	u64 evsync;
+	int evstamp;
 };
 
 #ifdef CONFIG_SMP
diff --git a/kernel/ipipe/core.c b/kernel/ipipe/core.c
index 3a7b4d9..3baa443 100644
--- a/kernel/ipipe/core.c
+++ b/kernel/ipipe/core.c
@@ -262,6 +262,7 @@ void __ipipe_init_stage(struct ipipe_domain *ipd)
 			ipipe_percpudom(ipd, irqall, cpu)[n] = 0;
 
 		ipipe_percpudom(ipd, evsync, cpu) = 0;
+		ipipe_percpudom(ipd, evstamp, cpu) = 0;
 	}
 
 	for (n = 0; n < IPIPE_NR_IRQS; n++) {
@@ -827,6 +828,7 @@ int fastcall __ipipe_dispatch_event (unsigned event, void *data)
 			propagate = !evhand(event, start_domain, data);
 			local_irq_save_hw(flags);
 			ipipe_cpudom_var(next_domain, evsync) &= ~(1LL << event);
+			ipipe_cpudom_var(next_domain, evstamp)++;
 			if (ipipe_current_domain != next_domain)
 				this_domain = ipipe_current_domain;
 		}
@@ -1251,8 +1253,8 @@ ipipe_event_handler_t ipipe_catch_event(struct ipipe_domain *ipd,
 					ipipe_event_handler_t handler)
 {
 	ipipe_event_handler_t old_handler;
+	int self = 0, cpu, evstamp;
 	unsigned long flags;
-	int self = 0, cpu;
 
 	if (event & IPIPE_EVENT_SELF) {
 		event &= ~IPIPE_EVENT_SELF;
@@ -1305,8 +1307,18 @@ ipipe_event_handler_t ipipe_catch_event(struct ipipe_domain *ipd,
 		 */
 
 		for_each_online_cpu(cpu) {
-			while (ipipe_percpudom(ipd, evsync, cpu) & (1LL << event))
+			/*
+			 * We allow a grace period of 1s for the
+			 * handler to complete, before assuming that
+			 * it will never do so on the current CPU
+			 * because of a migration.
+			 */
+			evstamp = ipipe_percpudom(ipd, evstamp, cpu);
+			do {
+				if ((ipipe_percpudom(ipd, evsync, cpu) & (1LL << event)) == 0)
+					break;
 				schedule_timeout_interruptible(HZ / 50);
+			} while(++evstamp - ipipe_percpudom(ipd, evstamp, cpu) < 50);
 		}
 	}
 

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [Adeos-main] [BUG] evsync is not SMP-safe
  2008-01-28  6:59     ` Philippe Gerum
@ 2008-01-28  8:35       ` Jan Kiszka
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Kiszka @ 2008-01-28  8:35 UTC (permalink / raw)
  To: rpm; +Cc: adeos-main

[-- Attachment #1: Type: text/plain, Size: 5663 bytes --]

Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Jan Kiszka wrote:
>>> Jan Kiszka wrote:
>>>> Philippe,
>>>>
>>>> this
>>>>
>>>> int fastcall __ipipe_dispatch_event (unsigned event, void *data)
>>>> ...
>>>> 	ipipe_cpudom_var(next_domain, evsync) |= (1LL << event);
>>>> 	local_irq_restore_hw(flags);
>>>> 	propagate = !evhand(event, start_domain, data);
>>>> 	local_irq_save_hw(flags);
>>>> 	ipipe_cpudom_var(next_domain, evsync) &= ~(1LL << event);
>>>>
>>>> doesn't fly on SMP. While the invoked event handler is running, it may
>>>> happen that the caller gets migrated to another CPU. The result is an
>>>> inconsistent evsync state that causes ipipe_catch_event to stall (test
>>>> case: invoke Jerome's system() test a few times, them try to unload
>>>> Xenomai skins and nucleus).
>>>>
>>>> First idea (I've nothing implemented to far, would happily leave it to
>>>> someone else's hand): Track event handler entry/exit with an, say, 8 bit
>>>> per-cpu counter. On event deregistration, just summarize over the
>>>> per-cpu counters and wait for the sum to become 0. This has just the
>>>> drawback that it may cause livelocks on large SMP boxes when trying to
>>>> wait for a busy event. I've no perfect idea so far.
>>> Second approach to solve this (currently) last known ipipe issue cleanly:
>>>
>>> As long as some task in the system has __ipipe_dispatch_event (and thus
>>> some of the registered event handlers) on its stack, it is not safe to
>>> unregister some handler. For simplicity reasons, I don't think we should
>>> make any difference which handlers are precisely busy. So, how to find
>>> out if there are such tasks?
>>>
>>> I checked the code and derived three classes of preconditions for the
>>> invocation of __ipipe_dispatch_event:
>>>
>>>  1) ipipe_init_notify and ipipe_cleanup_notify -> no preconditions
>>>
>>>  2) ipipe_trap_notify -> some Linux task has PF_EVNOTIFY set or
>>>                          there are tasks with custom stacks present
>>>
>>>  3) all other invocations -> some Linux task has PF_EVNOTIFY set
> 
> 4) ipipe_syscall_watched_p used in syscall pipelining -> PF_EVNOTIFY ||
> non-regular syscall

Damn it.

> 
>>> This means by walking the full Linux task list and checking that are no
>>> more PF_EVNOTIFY-tasks present, we can easily exclude 3). In fact, 2)
>>> can also be excluded this way because we can reasonably demand that any
>>> I-pipe user fiddling with event handlers has killed all non-Linux tasks
>>> beforehand.
>>> This leaves us with 1) which should be handled via classic
>>> RCU as Linux offers it. Viable? It would even shorten the fast path a bit!
>>>
>>> Still no code, I would like to collect feedback on this new idea first.
>> OK, here is some code. Seems to work fine (now that I fixed the cleanup
>> race in Xenomai as well - the issue happened to pop up right after
>> applying this patch :-/).
>>
> 
> This patch changes the predicate from "a given registered handler should
> no be currently running" to "no task in the system should be currently
> asking for any kind of I-pipe notification", which has not quite the
> same scope. For instance, we could no more deregister a handler only for
> a particular event leaving others installed. It becomes an "all or
> nothing" test, even if that still remains compatible with the way
> Xenomai works though.
> 
> However, we have a real problem with the syscall event. This one has an
> additional precondition to PF_EVNOTIFY which is unfortunately dynamic,
> i.e. syscall_nr >= NR_SYSCALL. This is used to relay non-regular
> syscalls to whoever it may concern in the pipeline, when PF_EVNOTIFY is
> not set. Xenomai uses this property for its shadow binding syscall,
> which will eventually set PF_EVNOTIFY for the caller. IOW, an attempt to
> bind to the skin from userland while the nucleus is deregistering the
> syscall handler would escape the RCU sync with the current patch.
> 
> But holding a read-side RCU lock to postpone the quiescent RCU
> state won't fly, since the syscall handler may reschedule Linux-wise.
> 
> Looking back at the actual issue, we only have a single case to handle
> for the current code to be correct: detect when a handler appears to
> linger indefinitely on a given CPU (albeit it does not really), due to a
> migration.
> 
> This should be done in a way that does not require complex
> locking/tracking on the fast path since we are really dealing with a
> corner case here. The attached patch does this by allowing a grace
> period before assuming that the handler will never complete on the
> current CPU, looking for some quiescent state. The latter part is almost
> overkill, since clearing the handler in the domain struct already avoids
> livelocking and would prevent such a handler to be fired quite early
> anyway. The underlying assumption is that tasks which should be notified
> of system events will be killed before someone attempts to deregister
> the event handlers they rely on, so they should not be able to run/sleep
> more than the grace period on behalf of the notifier.
> 
> Does this patch also fix the case you pointed out?

May solve my particular test case, but given sleeping event handlers,
its clear to me now that the whole bit-flag based locking scheme remains
fragile once you have >1 tasks on one CPU inside the same
to-be-unregistered handler. Then the first leaving task will release the
lock, shadowing the existence of further tasks inside the handler. We
would need SRCU-like locking here if if were available for our
environment...

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 254 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2008-01-28  8:35 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-12 13:36 [Adeos-main] [BUG] evsync is not SMP-safe Jan Kiszka
2008-01-25 22:06 ` Jan Kiszka
2008-01-27 15:24   ` Jan Kiszka
2008-01-28  6:59     ` Philippe Gerum
2008-01-28  8:35       ` Jan Kiszka

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.