* [Xenomai] race condition when creating and deleting VxWorks tasks in xenomai-forge
@ 2014-02-02 20:37 Matthias Schneider
2014-02-03 11:03 ` Philippe Gerum
0 siblings, 1 reply; 4+ messages in thread
From: Matthias Schneider @ 2014-02-02 20:37 UTC (permalink / raw)
To: xenomai@xenomai.org
Hi all, there seems to be a race condition in the VxWorks skin in xenomai-forge
when creating and deleting tasks in rapid succession. The problem seems to be the
adding and removing of tasks to wind_task_list in task_trampoline() and
task_finalizer().
When creating a task, before returning a handle to the created task to the
application, calling threadobj_start() from taskActivate() will wait for
&thobj->barrier by calling wait_on_barrier(). This condition is triggered from
the newly created task via task_trampoline() calling threadobj_prologue().
Having unblocked taskActivate(), the creating task/thread will return to the
calling application, which in turn can delete the task in quick succession.
Deleting the task will trigger the execution of task_finalizer, which will try
to remove the task from wind_task_table. However, due to specific timing
situations, it may be possible that the started task has not yet added itself
to the list (it may still be "just" leaving threadobj_prologue()), leading to
a segmentation fault. This behavior can be easily reproduced by adding a delay
in task_trampoline() at the following location in taskLib.c:
ret = __bt(threadobj_prologue(&task->thobj, task->name));
usleep(100000);
write_lock_nocancel(&wind_task_lock);
pvlist_append(&task->next, &wind_task_list);
write_unlock(&wind_task_lock);
writing a simple test application that simply calls
SEM_ID tid = taskSpawn(name, 50, 0, 0, task, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
taskDelete(tid);
will then reliably demonstrate the issue.
Moving the last three lines appending the task to the list above threadobj_prologue()
seems to fix the issue, although I am not 100% sure about possible side effects.
Thanks in advance for commenting my issue,
regards,
Matthias
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Xenomai] race condition when creating and deleting VxWorks tasks in xenomai-forge
2014-02-02 20:37 [Xenomai] race condition when creating and deleting VxWorks tasks in xenomai-forge Matthias Schneider
@ 2014-02-03 11:03 ` Philippe Gerum
2014-02-04 19:12 ` Matthias Schneider
0 siblings, 1 reply; 4+ messages in thread
From: Philippe Gerum @ 2014-02-03 11:03 UTC (permalink / raw)
To: Matthias Schneider, xenomai@xenomai.org
On 02/02/2014 09:37 PM, Matthias Schneider wrote:
> Hi all, there seems to be a race condition in the VxWorks skin in xenomai-forge
> when creating and deleting tasks in rapid succession. The problem seems to be the
> adding and removing of tasks to wind_task_list in task_trampoline() and
> task_finalizer().
>
> When creating a task, before returning a handle to the created task to the
> application, calling threadobj_start() from taskActivate() will wait for
> &thobj->barrier by calling wait_on_barrier(). This condition is triggered from
> the newly created task via task_trampoline() calling threadobj_prologue().
> Having unblocked taskActivate(), the creating task/thread will return to the
> calling application, which in turn can delete the task in quick succession.
> Deleting the task will trigger the execution of task_finalizer, which will try
> to remove the task from wind_task_table. However, due to specific timing
> situations, it may be possible that the started task has not yet added itself
> to the list (it may still be "just" leaving threadobj_prologue()), leading to
> a segmentation fault. This behavior can be easily reproduced by adding a delay
> in task_trampoline() at the following location in taskLib.c:
>
> ret = __bt(threadobj_prologue(&task->thobj, task->name));
>
> usleep(100000);
>
> write_lock_nocancel(&wind_task_lock);
> pvlist_append(&task->next, &wind_task_list);
> write_unlock(&wind_task_lock);
>
>
> writing a simple test application that simply calls
>
> SEM_ID tid = taskSpawn(name, 50, 0, 0, task, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
> taskDelete(tid);
>
> will then reliably demonstrate the issue.
>
> Moving the last three lines appending the task to the list above threadobj_prologue()
> seems to fix the issue, although I am not 100% sure about possible side effects.
>
> Thanks in advance for commenting my issue,
> regards,
>
All correct, the bug bites as described. As you suspected, simply swapping the statements would introduce another race. Could you try this patch instead? TIA,
diff --git a/lib/vxworks/taskLib.c b/lib/vxworks/taskLib.c
index 1ef52a5..e3457a3 100644
--- a/lib/vxworks/taskLib.c
+++ b/lib/vxworks/taskLib.c
@@ -151,9 +151,11 @@ static void task_finalizer(struct threadobj *thobj)
{
struct wind_task *task = container_of(thobj, struct wind_task, thobj);
- write_lock_nocancel(&wind_task_lock);
- pvlist_remove(&task->next);
- write_unlock(&wind_task_lock);
+ if (pvholder_linked(&task->next)) {
+ write_lock_nocancel(&wind_task_lock);
+ pvlist_remove(&task->next);
+ write_unlock(&wind_task_lock);
+ }
task->tcb->status |= WIND_DEAD;
cluster_delobj(&wind_task_table, &task->cobj);
@@ -356,6 +358,7 @@ static STATUS __taskInit(struct wind_task *task,
idata.finalizer = task_finalizer;
idata.priority = cprio;
threadobj_init(&task->thobj, &idata);
+ initpvh(&task->next);
ret = __bt(cluster_addobj(&wind_task_table, task->name, &task->cobj));
if (ret) {
--
Philippe.
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [Xenomai] race condition when creating and deleting VxWorks tasks in xenomai-forge
2014-02-03 11:03 ` Philippe Gerum
@ 2014-02-04 19:12 ` Matthias Schneider
2014-02-06 9:51 ` Philippe Gerum
0 siblings, 1 reply; 4+ messages in thread
From: Matthias Schneider @ 2014-02-04 19:12 UTC (permalink / raw)
To: xenomai@xenomai.org
> On Monday, February 3, 2014 12:02 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> > On 02/02/2014 09:37 PM, Matthias Schneider wrote:
>> Hi all, there seems to be a race condition in the VxWorks skin in xenomai-forge
>> when creating and deleting tasks in rapid succession. The problem seems to be the
>> adding and removing of tasks to wind_task_list in task_trampoline() and
>> task_finalizer().
>>
>> When creating a task, before returning a handle to the created task to the
>> application, calling threadobj_start() from taskActivate() will wait for
>> &thobj->barrier by calling wait_on_barrier(). This condition is triggered from
>> the newly created task via task_trampoline() calling threadobj_prologue().
>> Having unblocked taskActivate(), the creating task/thread will return to the
>> calling application, which in turn can delete the task in quick succession.
>> Deleting the task will trigger the execution of task_finalizer, which will try
>> to remove the task from wind_task_table. However, due to specific timing
>> situations, it may be possible that the started task has not yet added itself
>> to the list (it may still be "just" leaving
>> threadobj_prologue()), leading to
>> a segmentation fault. This behavior can be easily reproduced by adding a delay
>> in task_trampoline() at the following location in taskLib.c:
>>
>> ret = __bt(threadobj_prologue(&task->thobj, task->name));
>>
>> usleep(100000);
>>
>> write_lock_nocancel(&wind_task_lock);
>> pvlist_append(&task->next, &wind_task_list);
>> write_unlock(&wind_task_lock);
>>
>>
>> writing a simple test application that simply calls
>>
>> SEM_ID tid = taskSpawn(name, 50, 0, 0, task, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
>> taskDelete(tid);
>>
>> will then reliably demonstrate the issue.
>>
>> Moving the last three lines appending the task to the list above threadobj_prologue()
>> seems to fix the issue, although I am not 100% sure about possible side effects.
>>
>> Thanks in advance for commenting my issue,
>> regards,
>>
>
> All correct, the bug bites as described. As you suspected, simply swapping the
> statements would introduce another race. Could you try this patch instead? TIA,
>
> diff --git a/lib/vxworks/taskLib.c b/lib/vxworks/taskLib.c
> index 1ef52a5..e3457a3 100644
> --- a/lib/vxworks/taskLib.c
> +++ b/lib/vxworks/taskLib.c
> @@ -151,9 +151,11 @@ static void task_finalizer(struct threadobj *thobj)
> {
> struct wind_task *task = container_of(thobj, struct wind_task, thobj);
>
> - write_lock_nocancel(&wind_task_lock);
> - pvlist_remove(&task->next);
> - write_unlock(&wind_task_lock);
> + if (pvholder_linked(&task->next)) {
> + write_lock_nocancel(&wind_task_lock);
> + pvlist_remove(&task->next);
>
> + write_unlock(&wind_task_lock);
> + }
>
> task->tcb->status |= WIND_DEAD;
> cluster_delobj(&wind_task_table, &task->cobj);
> @@ -356,6 +358,7 @@ static STATUS __taskInit(struct wind_task *task,
> idata.finalizer = task_finalizer;
> idata.priority = cprio;
> threadobj_init(&task->thobj, &idata);
> + initpvh(&task->next);
>
> ret = __bt(cluster_addobj(&wind_task_table, task->name,
> &task->cobj));
> if (ret) {
>
>
> --
> Philippe.
>
Hi Philippe,
the patch seems to address the original problem. Should this patch be merged
into the repository?
However, I seem to stumble into a different problem. If I understand correctly,
CANCEL_DEFER() should prevent the thread/task from being cancelled. I have verified
that pthread_setcanceltype() is being entered and the CONFIG_XENO_ASYNC_CANCEL
macro has thus been set correctly. Nevertheless, putting a printf after CANCEL_DEFER
and before CANCEL_RESTORE shows that when rapidly creating and destroying tasks
the CANCEL_RESTORE printf is not printed. Also, I run into a deadlock very quickly,
due to wind_task_lock being held (consistent with the observed printf behavior).
I am using Mercury on top on an unpatched linux kernel. Any idea how I can debug the
apparently non-functioning pthread_setcanceltype ?
Thanks in advance,
Matthias
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Xenomai] race condition when creating and deleting VxWorks tasks in xenomai-forge
2014-02-04 19:12 ` Matthias Schneider
@ 2014-02-06 9:51 ` Philippe Gerum
0 siblings, 0 replies; 4+ messages in thread
From: Philippe Gerum @ 2014-02-06 9:51 UTC (permalink / raw)
To: Matthias Schneider, xenomai@xenomai.org
On 02/04/2014 08:12 PM, Matthias Schneider wrote:
> the patch seems to address the original problem. Should this patch be merged
> into the repository?
>
It is on its way to 'master', currently pending in 'next'.
> However, I seem to stumble into a different problem. If I understand correctly,
> CANCEL_DEFER() should prevent the thread/task from being cancelled.
...asynchronously. This is the important missing part.
I have verified
> that pthread_setcanceltype() is being entered and the CONFIG_XENO_ASYNC_CANCEL
> macro has thus been set correctly. Nevertheless, putting a printf after CANCEL_DEFER
> and before CANCEL_RESTORE shows that when rapidly creating and destroying tasks
> the CANCEL_RESTORE printf is not printed. Also, I run into a deadlock very quickly,
> due to wind_task_lock being held (consistent with the observed printf behavior).
> I am using Mercury on top on an unpatched linux kernel. Any idea how I can debug the
> apparently non-functioning pthread_setcanceltype ?
>
Per POSIX, printf() is a cancellation point, any pending cancellation
request will be handled unconditionally when traversing it. CANCEL_DEFER
will only postpone async cancellation requests, until a cancellation
point is traversed or async cancellation mode is enabled back, whichever
comes first.
http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_09.html
--
Philippe.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2014-02-06 9:51 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-02 20:37 [Xenomai] race condition when creating and deleting VxWorks tasks in xenomai-forge Matthias Schneider
2014-02-03 11:03 ` Philippe Gerum
2014-02-04 19:12 ` Matthias Schneider
2014-02-06 9:51 ` Philippe Gerum
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.