* [Xenomai-core] [RFC] Fixes for domain migration races
@ 2011-07-19 6:44 Jan Kiszka
2011-07-27 18:44 ` Gilles Chanteperdrix
0 siblings, 1 reply; 3+ messages in thread
From: Jan Kiszka @ 2011-07-19 6:44 UTC (permalink / raw)
To: Xenomai core
Hi,
I've just uploaded my upstream queue that mostly deals with the various
races I found in the domain migration code.
One of my concerns raised earlier turned out to be unfounded: we do
not allow Linux to wake up a task that has TASK_ATOMICSWITCH set, so the
deletion race can indeed be fixed by the patch I sent earlier. However,
we do not synchronize setting and testing of TASK_ATOMICSWITCH (because
we cannot hold the rq lock), so we still face a small race window that
allows premature wakeups, at least in theory. Patch 3 now addresses
that.
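To illustrate why setting and testing TASK_ATOMICSWITCH has to be synchronized, here is a small user-space C11 model that replays the interleaving in which the waker tests the flag just before the migrating task sets it. It is only a sketch under assumptions: struct model_task, the M_* bits and the compare-and-swap commit are hypothetical, and they are neither the kernel's wakeup path nor what patch 3 actually does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model, not kernel or Xenomai code: the task state and a
 * stand-in for TASK_ATOMICSWITCH share one atomic word. */
#define M_RUNNING        0x0u
#define M_INTERRUPTIBLE  0x1u
#define M_ATOMICSWITCH   0x2u

struct model_task {
    atomic_uint word;
};

/* Wakeup step 1: the waker observes the task. */
static unsigned model_wake_snapshot(struct model_task *t)
{
    return atomic_load(&t->word);
}

static bool model_wake_allowed(unsigned seen)
{
    return (seen & M_INTERRUPTIBLE) && !(seen & M_ATOMICSWITCH);
}

/* Racy step 2: decides on a possibly stale snapshot, then stores blindly. */
static bool model_wake_commit_racy(struct model_task *t, unsigned seen)
{
    if (!model_wake_allowed(seen))
        return false;
    atomic_store(&t->word, M_RUNNING);
    return true;
}

/* Synchronized step 2: the CAS fails if the flag (or the state) changed
 * since the snapshot, so testing and waking act as one step. */
static bool model_wake_commit_cas(struct model_task *t, unsigned seen)
{
    if (!model_wake_allowed(seen))
        return false;
    return atomic_compare_exchange_strong(&t->word, &seen, M_RUNNING);
}

static bool model_try_wake_up(struct model_task *t)
{
    return model_wake_commit_cas(t, model_wake_snapshot(t));
}

/* Migration path: the task sets the flag after entering INTERRUPTIBLE. */
static void model_set_atomicswitch(struct model_task *t)
{
    atomic_fetch_or(&t->word, M_ATOMICSWITCH);
}

/* Replays the problematic interleaving; returns true if the task was
 * woken although ATOMICSWITCH had been set (a premature wakeup). */
static bool model_replay(bool use_cas)
{
    struct model_task t;
    atomic_init(&t.word, M_INTERRUPTIBLE);

    unsigned seen = model_wake_snapshot(&t); /* waker tests the flag */
    model_set_atomicswitch(&t);             /* task sets it right after */
    bool woke = use_cas ? model_wake_commit_cas(&t, seen)
                        : model_wake_commit_racy(&t, seen);

    assert(woke == (atomic_load(&t.word) == M_RUNNING));
    return woke;
}
```

With the plain test-then-store commit, model_replay(false) reports a premature wakeup; with the compare-and-swap commit, model_replay(true) refuses it because the word changed after the snapshot. Packing state and flag into one word is just one way to avoid needing the rq lock here; a real fix may look quite different.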
Besides another race around set/clear_task_nowakeup, there was
apparently a window during early migration to RT in which we silently
swallowed Linux signals. Patch 4 closes it, hopefully also fixing our
spurious gdb lockups on SMP boxes; time will tell.
Please review carefully.
Jan
* Re: [Xenomai-core] [RFC] Fixes for domain migration races
From: Gilles Chanteperdrix @ 2011-07-27 18:44 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Xenomai core
On 07/19/2011 08:44 AM, Jan Kiszka wrote:
> Hi,
>
> I've just uploaded my upstream queue that mostly deals with the various
> races I found in the domain migration code.
>
> One of my concerns raised earlier turned out to be unfounded: we do
> not allow Linux to wake up a task that has TASK_ATOMICSWITCH set, so the
> deletion race can indeed be fixed by the patch I sent earlier.
So, I still have the same question: isn't synchronizing with the
gatekeeper as soon as we return from schedule() in secondary mode a
better solution than waiting for the task_exit callback? It looks more
correct, and it avoids gksched.
--
Gilles.
* Re: [Xenomai-core] [RFC] Fixes for domain migration races
From: Jan Kiszka @ 2011-07-28 16:44 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: Xenomai core
On 2011-07-27 20:44, Gilles Chanteperdrix wrote:
> On 07/19/2011 08:44 AM, Jan Kiszka wrote:
>> Hi,
>>
>> I've just uploaded my upstream queue that mostly deals with the various
>> races I found in the domain migration code.
>>
>> One of my concerns raised earlier turned out to be unfounded: we do
>> not allow Linux to wake up a task that has TASK_ATOMICSWITCH set, so the
>> deletion race can indeed be fixed by the patch I sent earlier.
>
> So, I still have the same question: isn't synchronizing with the
> gatekeeper as soon as we return from schedule() in secondary mode a
> better solution than waiting for the task_exit callback? It looks more
> correct, and it avoids gksched.
Yes, I was on the wrong track w.r.t. wakeup races during the early
migration phase.
It is a possible and valid scenario that the task returns from
schedule() without being migrated. That can only happen if a signal was
queued in the meantime. The task will not be woken up again (that is
prevented by ATOMICSWITCH), but it checks for pending signals itself
before falling asleep. In that case, it re-enters TASK_RUNNING and
either returns before the gatekeeper could run or, on SMP, continues
in parallel on a different CPU.
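The early-return path described above can be modeled in a few lines of single-threaded C. This is a hypothetical sketch: struct mig_task and the mig_* functions are made up, mig_schedule() only roughly mirrors the way the Linux scheduler keeps an interruptible task with a pending signal runnable (its signal_pending_state() check), and the gatekeeper is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model; these names do not exist in
 * Xenomai or Linux. */
enum mig_state { MIG_RUNNING, MIG_INTERRUPTIBLE };

struct mig_task {
    enum mig_state state;
    bool atomicswitch;      /* stand-in for TASK_ATOMICSWITCH */
    bool signal_pending;    /* a Linux signal was queued meanwhile */
};

/* External wakeup: refused while the switch is in progress. */
static bool mig_try_wake_up(struct mig_task *t)
{
    if (t->atomicswitch || t->state != MIG_INTERRUPTIBLE)
        return false;
    t->state = MIG_RUNNING;
    return true;
}

/* The task's own check before sleeping: with a signal pending, it is
 * put back to RUNNING and "schedule()" returns without sleeping. */
static bool mig_schedule(struct mig_task *t)
{
    if (t->state == MIG_INTERRUPTIBLE && t->signal_pending) {
        t->state = MIG_RUNNING;
        return false;           /* did not sleep */
    }
    return true;                /* sleeps until the gatekeeper resumes it */
}

/* Early migration step. Returns true if the task came back from
 * schedule() without having been migrated. */
static bool mig_harden(struct mig_task *t)
{
    t->state = MIG_INTERRUPTIBLE;
    t->atomicswitch = true;
    /* (the request to the gatekeeper would be queued here) */
    bool slept = mig_schedule(t);
    if (!slept)
        assert(t->state == MIG_RUNNING); /* left TASK_INTERRUPTIBLE */
    return !slept;
}
```

No external wakeup gets through while the flag is set, yet a pending signal still lets the task return on its own, which is exactly the case the gatekeeper has to cope with.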
What currently saves us from the fatal scenario in which the task keeps
running while the gatekeeper also resumes its Xenomai part is that the
task has left TASK_INTERRUPTIBLE. If we wait for the gatekeeper to
notice this, as you suggested, we ensure both that the object is not
deleted too early and that the task does not re-enter
TASK_INTERRUPTIBLE while doing Linux work.
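To make the suggested approach concrete, here is a user-space pthreads model of the handshake: after returning from schedule(), the task waits for the gatekeeper's verdict before doing anything else. Everything in it (struct gk_shared, gk_gatekeeper(), gk_task_harden(), gk_run(), the REQ_* states and the single mutex) is hypothetical; the interruptible flag stands in for TASK_INTERRUPTIBLE, and this is not the code in the pushed queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: names are made up, and one mutex stands in for
 * whatever synchronization the real code uses. */
enum req_state { REQ_NONE, REQ_PENDING, REQ_DONE };

struct gk_shared {
    pthread_mutex_t lock;
    pthread_cond_t req_cond;    /* gatekeeper waits for a request */
    pthread_cond_t done_cond;   /* task waits for the gatekeeper's verdict */
    bool interruptible;         /* stand-in for TASK_INTERRUPTIBLE */
    bool signal_pending;
    enum req_state req;
    bool migrated;              /* gatekeeper resumed the Xenomai part */
    bool in_linux;              /* task returned to Linux work instead */
};

static void *gk_gatekeeper(void *arg)
{
    struct gk_shared *s = arg;

    pthread_mutex_lock(&s->lock);
    while (s->req != REQ_PENDING)
        pthread_cond_wait(&s->req_cond, &s->lock);
    /* Resume the Xenomai part only if the task is still waiting for it. */
    if (s->interruptible)
        s->migrated = true;
    s->req = REQ_DONE;
    pthread_cond_broadcast(&s->done_cond);
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

static void gk_task_harden(struct gk_shared *s)
{
    pthread_mutex_lock(&s->lock);
    s->interruptible = true;
    s->req = REQ_PENDING;
    pthread_cond_signal(&s->req_cond);
    pthread_mutex_unlock(&s->lock);

    /* The gatekeeper may or may not run before the next block. */

    pthread_mutex_lock(&s->lock);
    if (!s->migrated && s->signal_pending) {
        /* schedule() returned early: TASK_INTERRUPTIBLE was left. */
        s->interruptible = false;
        s->in_linux = true;
    }
    /* Synchronize with the gatekeeper before doing any Linux work. */
    while (s->req != REQ_DONE)
        pthread_cond_wait(&s->done_cond, &s->lock);
    pthread_mutex_unlock(&s->lock);
}

/* One migration attempt. Returns true if exactly one side owns the task
 * afterwards; *migrated reports which one. */
static bool gk_run(bool signal_pending, bool *migrated)
{
    struct gk_shared s = { .signal_pending = signal_pending,
                           .req = REQ_NONE };
    pthread_t gk;

    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.req_cond, NULL);
    pthread_cond_init(&s.done_cond, NULL);

    bool ok = pthread_create(&gk, NULL, gk_gatekeeper, &s) == 0;
    if (ok) {
        gk_task_harden(&s);
        pthread_join(gk, NULL);
        /* The request was consumed, so the task object could go away. */
        assert(s.req == REQ_DONE);
        *migrated = s.migrated;
        ok = s.migrated != s.in_linux;
    }

    pthread_cond_destroy(&s.done_cond);
    pthread_cond_destroy(&s.req_cond);
    pthread_mutex_destroy(&s.lock);
    return ok;
}
```

Without a pending signal, gk_run() always ends with the gatekeeper owning the task. With a signal pending, either side may win depending on which thread takes the lock first, but never both, and in either case the task only proceeds once the gatekeeper has consumed the request.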
I've cleaned up my queue accordingly and just pushed it.
Thanks,
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux