* [Xenomai] race condition when creating and deleting VxWorks tasks in xenomai-forge @ 2014-02-02 20:37 Matthias Schneider 2014-02-03 11:03 ` Philippe Gerum 0 siblings, 1 reply; 4+ messages in thread From: Matthias Schneider @ 2014-02-02 20:37 UTC (permalink / raw) To: xenomai@xenomai.org Hi all, there seems to be a race condition in the VxWorks skin in xenomai-forge when creating and deleting tasks in rapid succession. The problem seems to be the adding and removing of tasks to wind_task_list in task_trampoline() and task_finalizer(). When creating a task, before returning a handle to the created task to the application, calling threadobj_start() from taskActivate() will wait for &thobj->barrier by calling wait_on_barrier(). This condition is triggered from the newly created task via task_trampoline() calling threadobj_prologue(). Having unblocked taskActivate(), the creating task/thread will return to the calling application, which in turn can delete the task in quick succession. Deleting the task will trigger the execution of task_finalizer, which will try to remove the task from wind_task_table. However, due to specific timing situations, it may be possible that the started task has not yet added itself to the list (it may still be "just" leaving threadobj_prologue()), leading to a segmentation fault. This behavior can be easily reproduced by adding a delay in task_trampoline() at the following location in taskLib.c: ret = __bt(threadobj_prologue(&task->thobj, task->name)); usleep(100000); write_lock_nocancel(&wind_task_lock); pvlist_append(&task->next, &wind_task_list); write_unlock(&wind_task_lock); writing a simple test application that simply calls SEM_ID tid = taskSpawn(name, 50, 0, 0, task, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0); taskDelete(tid); will then reliably demonstrate the issue. Moving the last three lines appending the task to the list above threadobj_prologue() seems to fix the issue, although I am not 100% sure about possible side effects. Thanks in advance for commenting my issue, regards, Matthias ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Xenomai] race condition when creating and deleting VxWorks tasks in xenomai-forge 2014-02-02 20:37 [Xenomai] race condition when creating and deleting VxWorks tasks in xenomai-forge Matthias Schneider @ 2014-02-03 11:03 ` Philippe Gerum 2014-02-04 19:12 ` Matthias Schneider 0 siblings, 1 reply; 4+ messages in thread From: Philippe Gerum @ 2014-02-03 11:03 UTC (permalink / raw) To: Matthias Schneider, xenomai@xenomai.org On 02/02/2014 09:37 PM, Matthias Schneider wrote: > Hi all, there seems to be a race condition in the VxWorks skin in xenomai-forge > when creating and deleting tasks in rapid succession. The problem seems to be the > adding and removing of tasks to wind_task_list in task_trampoline() and > task_finalizer(). > > When creating a task, before returning a handle to the created task to the > application, calling threadobj_start() from taskActivate() will wait for > &thobj->barrier by calling wait_on_barrier(). This condition is triggered from > the newly created task via task_trampoline() calling threadobj_prologue(). > Having unblocked taskActivate(), the creating task/thread will return to the > calling application, which in turn can delete the task in quick succession. > Deleting the task will trigger the execution of task_finalizer, which will try > to remove the task from wind_task_table. However, due to specific timing > situations, it may be possible that the started task has not yet added itself > to the list (it may still be "just" leaving threadobj_prologue()), leading to > a segmentation fault. This behavior can be easily reproduced by adding a delay > in task_trampoline() at the following location in taskLib.c: > > ret = __bt(threadobj_prologue(&task->thobj, task->name)); > > usleep(100000); > > write_lock_nocancel(&wind_task_lock); > pvlist_append(&task->next, &wind_task_list); > write_unlock(&wind_task_lock); > > > writing a simple test application that simply calls > > SEM_ID tid = taskSpawn(name, 50, 0, 0, task, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0); > taskDelete(tid); > > will then reliably demonstrate the issue. > > Moving the last three lines appending the task to the list above threadobj_prologue() > seems to fix the issue, although I am not 100% sure about possible side effects. > > Thanks in advance for commenting my issue, > regards, > All correct, the bug bites as described. As you suspected, simply swapping the statements would introduce another race. Could you try this patch instead? TIA, diff --git a/lib/vxworks/taskLib.c b/lib/vxworks/taskLib.c index 1ef52a5..e3457a3 100644 --- a/lib/vxworks/taskLib.c +++ b/lib/vxworks/taskLib.c @@ -151,9 +151,11 @@ static void task_finalizer(struct threadobj *thobj) { struct wind_task *task = container_of(thobj, struct wind_task, thobj); - write_lock_nocancel(&wind_task_lock); - pvlist_remove(&task->next); - write_unlock(&wind_task_lock); + if (pvholder_linked(&task->next)) { + write_lock_nocancel(&wind_task_lock); + pvlist_remove(&task->next); + write_unlock(&wind_task_lock); + } task->tcb->status |= WIND_DEAD; cluster_delobj(&wind_task_table, &task->cobj); @@ -356,6 +358,7 @@ static STATUS __taskInit(struct wind_task *task, idata.finalizer = task_finalizer; idata.priority = cprio; threadobj_init(&task->thobj, &idata); + initpvh(&task->next); ret = __bt(cluster_addobj(&wind_task_table, task->name, &task->cobj)); if (ret) { -- Philippe. ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [Xenomai] race condition when creating and deleting VxWorks tasks in xenomai-forge 2014-02-03 11:03 ` Philippe Gerum @ 2014-02-04 19:12 ` Matthias Schneider 2014-02-06 9:51 ` Philippe Gerum 0 siblings, 1 reply; 4+ messages in thread From: Matthias Schneider @ 2014-02-04 19:12 UTC (permalink / raw) To: xenomai@xenomai.org > On Monday, February 3, 2014 12:02 PM, Philippe Gerum <rpm@xenomai.org> wrote: > > On 02/02/2014 09:37 PM, Matthias Schneider wrote: >> Hi all, there seems to be a race condition in the VxWorks skin in xenomai-forge >> when creating and deleting tasks in rapid succession. The problem seems to be the >> adding and removing of tasks to wind_task_list in task_trampoline() and >> task_finalizer(). >> >> When creating a task, before returning a handle to the created task to the >> application, calling threadobj_start() from taskActivate() will wait for >> &thobj->barrier by calling wait_on_barrier(). This condition is triggered from >> the newly created task via task_trampoline() calling threadobj_prologue(). >> Having unblocked taskActivate(), the creating task/thread will return to the >> calling application, which in turn can delete the task in quick succession. >> Deleting the task will trigger the execution of task_finalizer, which will try >> to remove the task from wind_task_table. However, due to specific timing >> situations, it may be possible that the started task has not yet added itself >> to the list (it may still be "just" leaving >> threadobj_prologue()), leading to >> a segmentation fault. This behavior can be easily reproduced by adding a delay >> in task_trampoline() at the following location in taskLib.c: >> >> ret = __bt(threadobj_prologue(&task->thobj, task->name)); >> >> usleep(100000); >> >> write_lock_nocancel(&wind_task_lock); >> pvlist_append(&task->next, &wind_task_list); >> write_unlock(&wind_task_lock); >> >> >> writing a simple test application that simply calls >> >> SEM_ID tid = taskSpawn(name, 50, 0, 0, task, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0); >> taskDelete(tid); >> >> will then reliably demonstrate the issue. >> >> Moving the last three lines appending the task to the list above threadobj_prologue() >> seems to fix the issue, although I am not 100% sure about possible side effects. >> >> Thanks in advance for commenting my issue, >> regards, >> > > All correct, the bug bites as described. As you suspected, simply swapping the > statements would introduce another race. Could you try this patch instead? TIA, > > diff --git a/lib/vxworks/taskLib.c b/lib/vxworks/taskLib.c > index 1ef52a5..e3457a3 100644 > --- a/lib/vxworks/taskLib.c > +++ b/lib/vxworks/taskLib.c > @@ -151,9 +151,11 @@ static void task_finalizer(struct threadobj *thobj) > { > struct wind_task *task = container_of(thobj, struct wind_task, thobj); > > - write_lock_nocancel(&wind_task_lock); > - pvlist_remove(&task->next); > - write_unlock(&wind_task_lock); > + if (pvholder_linked(&task->next)) { > + write_lock_nocancel(&wind_task_lock); > + pvlist_remove(&task->next); > > + write_unlock(&wind_task_lock); > + } > > task->tcb->status |= WIND_DEAD; > cluster_delobj(&wind_task_table, &task->cobj); > @@ -356,6 +358,7 @@ static STATUS __taskInit(struct wind_task *task, > idata.finalizer = task_finalizer; > idata.priority = cprio; > threadobj_init(&task->thobj, &idata); > + initpvh(&task->next); > > ret = __bt(cluster_addobj(&wind_task_table, task->name, > &task->cobj)); > if (ret) { > > > -- > Philippe. > Hi Philippe, the patch seems to address the original problem. Should this patch be merged into the repository? However, I seem to stumble into a different problem. If I understand correctly, CANCEL_DEFER() should prevent the thread/task from being cancelled. I have verified that pthread_setcanceltype() is being entered and the CONFIG_XENO_ASYNC_CANCEL macro has thus been set correctly. Nevertheless, putting a printf after CANCEL_DEFER and before CANCEL_RESTORE shows that when rapidly creating and destroying tasks the CANCEL_RESTORE printf is not printed. Also, I run into a deadlock very quickly, due to wind_task_lock being held (consistent with the observed printf behavior). I am using Mercury on top on an unpatched linux kernel. Any idea how I can debug the apparently non-functioning pthread_setcanceltype ? Thanks in advance, Matthias ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Xenomai] race condition when creating and deleting VxWorks tasks in xenomai-forge 2014-02-04 19:12 ` Matthias Schneider @ 2014-02-06 9:51 ` Philippe Gerum 0 siblings, 0 replies; 4+ messages in thread From: Philippe Gerum @ 2014-02-06 9:51 UTC (permalink / raw) To: Matthias Schneider, xenomai@xenomai.org On 02/04/2014 08:12 PM, Matthias Schneider wrote: > the patch seems to address the original problem. Should this patch be merged > into the repository? > It is on its way to 'master', currently pending in 'next'. > However, I seem to stumble into a different problem. If I understand correctly, > CANCEL_DEFER() should prevent the thread/task from being cancelled. ...asynchronously. This is the important missing part. I have verified > that pthread_setcanceltype() is being entered and the CONFIG_XENO_ASYNC_CANCEL > macro has thus been set correctly. Nevertheless, putting a printf after CANCEL_DEFER > and before CANCEL_RESTORE shows that when rapidly creating and destroying tasks > the CANCEL_RESTORE printf is not printed. Also, I run into a deadlock very quickly, > due to wind_task_lock being held (consistent with the observed printf behavior). > I am using Mercury on top on an unpatched linux kernel. Any idea how I can debug the > apparently non-functioning pthread_setcanceltype ? > Per POSIX, printf() is a cancellation point, any pending cancellation request will be handled unconditionally when traversing it. CANCEL_DEFER will only postpone async cancellation requests, until a cancellation point is traversed or async cancellation mode is enabled back, whichever comes first. http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_09.html -- Philippe. ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2014-02-06 9:51 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-02-02 20:37 [Xenomai] race condition when creating and deleting VxWorks tasks in xenomai-forge Matthias Schneider 2014-02-03 11:03 ` Philippe Gerum 2014-02-04 19:12 ` Matthias Schneider 2014-02-06 9:51 ` Philippe Gerum
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.