All of lore.kernel.org
 help / color / mirror / Atom feed
* [Xenomai] race condition when creating and deleting VxWorks tasks in xenomai-forge
@ 2014-02-02 20:37 Matthias Schneider
  2014-02-03 11:03 ` Philippe Gerum
  0 siblings, 1 reply; 4+ messages in thread
From: Matthias Schneider @ 2014-02-02 20:37 UTC (permalink / raw)
  To: xenomai@xenomai.org

Hi all, there seems to be a race condition in the VxWorks skin in xenomai-forge 
when creating and deleting tasks in rapid succession. The problem seems to be the 
adding and removing of tasks to wind_task_list in task_trampoline() and 
task_finalizer().

When creating a task, before returning a handle to the created task to the 
application, calling threadobj_start() from taskActivate() will wait for 
&thobj->barrier by calling wait_on_barrier(). This condition is triggered from 
the newly created task via task_trampoline() calling threadobj_prologue(). 
Having unblocked taskActivate(), the creating task/thread will return to the 
calling application, which in turn can delete the task in quick succession. 
Deleting the task will trigger the execution of task_finalizer, which will try 
to remove the task from wind_task_table. However, due to specific timing 
situations, it may be possible  that the started task has not yet added itself 
to the list (it may still be "just" leaving threadobj_prologue()), leading to
a segmentation fault. This behavior can be easily reproduced by adding a delay
in task_trampoline() at the following location in taskLib.c:

ret = __bt(threadobj_prologue(&task->thobj, task->name));

usleep(100000);

write_lock_nocancel(&wind_task_lock);
pvlist_append(&task->next, &wind_task_list);
write_unlock(&wind_task_lock);


writing a simple test application that simply calls

SEM_ID tid = taskSpawn(name, 50, 0, 0, task, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
taskDelete(tid);

will then reliably demonstrate the issue.

Moving the last three lines appending the task to the list above threadobj_prologue() 
seems to fix the issue, although I am not 100% sure about possible side effects.

Thanks in advance for commenting my issue,
regards,

Matthias


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Xenomai] race condition when creating and deleting VxWorks tasks in xenomai-forge
  2014-02-02 20:37 [Xenomai] race condition when creating and deleting VxWorks tasks in xenomai-forge Matthias Schneider
@ 2014-02-03 11:03 ` Philippe Gerum
  2014-02-04 19:12   ` Matthias Schneider
  0 siblings, 1 reply; 4+ messages in thread
From: Philippe Gerum @ 2014-02-03 11:03 UTC (permalink / raw)
  To: Matthias Schneider, xenomai@xenomai.org

On 02/02/2014 09:37 PM, Matthias Schneider wrote:
> Hi all, there seems to be a race condition in the VxWorks skin in xenomai-forge
> when creating and deleting tasks in rapid succession. The problem seems to be the
> adding and removing of tasks to wind_task_list in task_trampoline() and
> task_finalizer().
> 
> When creating a task, before returning a handle to the created task to the
> application, calling threadobj_start() from taskActivate() will wait for
> &thobj->barrier by calling wait_on_barrier(). This condition is triggered from
> the newly created task via task_trampoline() calling threadobj_prologue().
> Having unblocked taskActivate(), the creating task/thread will return to the
> calling application, which in turn can delete the task in quick succession.
> Deleting the task will trigger the execution of task_finalizer, which will try
> to remove the task from wind_task_table. However, due to specific timing
> situations, it may be possible  that the started task has not yet added itself
> to the list (it may still be "just" leaving threadobj_prologue()), leading to
> a segmentation fault. This behavior can be easily reproduced by adding a delay
> in task_trampoline() at the following location in taskLib.c:
> 
> ret = __bt(threadobj_prologue(&task->thobj, task->name));
> 
> usleep(100000);
> 
> write_lock_nocancel(&wind_task_lock);
> pvlist_append(&task->next, &wind_task_list);
> write_unlock(&wind_task_lock);
> 
> 
> writing a simple test application that simply calls
> 
> SEM_ID tid = taskSpawn(name, 50, 0, 0, task, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
> taskDelete(tid);
> 
> will then reliably demonstrate the issue.
> 
> Moving the last three lines appending the task to the list above threadobj_prologue()
> seems to fix the issue, although I am not 100% sure about possible side effects.
> 
> Thanks in advance for commenting my issue,
> regards,
> 

All correct, the bug bites as described. As you suspected, simply swapping the statements would introduce another race. Could you try this patch instead? TIA,

diff --git a/lib/vxworks/taskLib.c b/lib/vxworks/taskLib.c
index 1ef52a5..e3457a3 100644
--- a/lib/vxworks/taskLib.c
+++ b/lib/vxworks/taskLib.c
@@ -151,9 +151,11 @@ static void task_finalizer(struct threadobj *thobj)
 {
 	struct wind_task *task = container_of(thobj, struct wind_task, thobj);
 
-	write_lock_nocancel(&wind_task_lock);
-	pvlist_remove(&task->next);
-	write_unlock(&wind_task_lock);
+	if (pvholder_linked(&task->next)) {
+		write_lock_nocancel(&wind_task_lock);
+		pvlist_remove(&task->next);
+		write_unlock(&wind_task_lock);
+	}
 
 	task->tcb->status |= WIND_DEAD;
 	cluster_delobj(&wind_task_table, &task->cobj);
@@ -356,6 +358,7 @@ static STATUS __taskInit(struct wind_task *task,
 	idata.finalizer = task_finalizer;
 	idata.priority = cprio;
 	threadobj_init(&task->thobj, &idata);
+	initpvh(&task->next);
 
 	ret = __bt(cluster_addobj(&wind_task_table, task->name, &task->cobj));
 	if (ret) {


-- 
Philippe.


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [Xenomai] race condition when creating and deleting VxWorks tasks in xenomai-forge
  2014-02-03 11:03 ` Philippe Gerum
@ 2014-02-04 19:12   ` Matthias Schneider
  2014-02-06  9:51     ` Philippe Gerum
  0 siblings, 1 reply; 4+ messages in thread
From: Matthias Schneider @ 2014-02-04 19:12 UTC (permalink / raw)
  To: xenomai@xenomai.org

> On Monday, February 3, 2014 12:02 PM, Philippe Gerum <rpm@xenomai.org> wrote:

> > On 02/02/2014 09:37 PM, Matthias Schneider wrote:
>>  Hi all, there seems to be a race condition in the VxWorks skin in xenomai-forge
>>  when creating and deleting tasks in rapid succession. The problem seems to be the
>>  adding and removing of tasks to wind_task_list in task_trampoline() and
>>  task_finalizer().
>> 
>>  When creating a task, before returning a handle to the created task to the
>>  application, calling threadobj_start() from taskActivate() will wait for
>>  &thobj->barrier by calling wait_on_barrier(). This condition is triggered from
>>  the newly created task via task_trampoline() calling threadobj_prologue().
>>  Having unblocked taskActivate(), the creating task/thread will return to the
>>  calling application, which in turn can delete the task in quick succession.
>>  Deleting the task will trigger the execution of task_finalizer, which will try
>>  to remove the task from wind_task_table. However, due to specific timing
>>  situations, it may be possible  that the started task has not yet added itself
>>  to the list (it may still be "just" leaving 
>>  threadobj_prologue()), leading to
>>  a segmentation fault. This behavior can be easily reproduced by adding a delay
>>  in task_trampoline() at the following location in taskLib.c:
>> 
>>  ret = __bt(threadobj_prologue(&task->thobj, task->name));
>> 
>>  usleep(100000);
>> 
>>  write_lock_nocancel(&wind_task_lock);
>>  pvlist_append(&task->next, &wind_task_list);
>>  write_unlock(&wind_task_lock);
>> 
>> 
>>  writing a simple test application that simply calls
>> 
>>  SEM_ID tid = taskSpawn(name, 50, 0, 0, task, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
>>  taskDelete(tid);
>> 
>>  will then reliably demonstrate the issue.
>> 
>>  Moving the last three lines appending the task to the list above threadobj_prologue()
>>  seems to fix the issue, although I am not 100% sure about possible side effects.
>> 
>>  Thanks in advance for commenting my issue,
>>  regards,
>> 
> 
> All correct, the bug bites as described. As you suspected, simply swapping the 
> statements would introduce another race. Could you try this patch instead? TIA,
> 
> diff --git a/lib/vxworks/taskLib.c b/lib/vxworks/taskLib.c
> index 1ef52a5..e3457a3 100644
> --- a/lib/vxworks/taskLib.c
> +++ b/lib/vxworks/taskLib.c
> @@ -151,9 +151,11 @@ static void task_finalizer(struct threadobj *thobj)
> {
>     struct wind_task *task = container_of(thobj, struct wind_task, thobj);
> 
> -    write_lock_nocancel(&wind_task_lock);
> -    pvlist_remove(&task->next);
> -    write_unlock(&wind_task_lock);
> +    if (pvholder_linked(&task->next)) {
> +        write_lock_nocancel(&wind_task_lock);
> +        pvlist_remove(&task->next);
> 
> +        write_unlock(&wind_task_lock);
> +    }
> 
>     task->tcb->status |= WIND_DEAD;
>     cluster_delobj(&wind_task_table, &task->cobj);
> @@ -356,6 +358,7 @@ static STATUS __taskInit(struct wind_task *task,
>     idata.finalizer = task_finalizer;
>     idata.priority = cprio;
>     threadobj_init(&task->thobj, &idata);
> +    initpvh(&task->next);
> 
>     ret = __bt(cluster_addobj(&wind_task_table, task->name, 
> &task->cobj));
>     if (ret) {
> 
> 
> -- 
> Philippe.
> 


Hi Philippe,

the patch seems to address the original problem. Should this patch be merged
into the repository?

However, I seem to stumble into a different problem. If I understand correctly,
CANCEL_DEFER() should prevent the thread/task from being cancelled. I have verified
that pthread_setcanceltype() is being entered and the CONFIG_XENO_ASYNC_CANCEL
macro has thus been set correctly. Nevertheless, putting a printf after CANCEL_DEFER
and before CANCEL_RESTORE shows that when rapidly creating and destroying tasks
the CANCEL_RESTORE printf is not printed. Also, I run into a deadlock very quickly,
due to wind_task_lock being held (consistent with the observed printf behavior).
I am using Mercury on top on an unpatched linux kernel. Any idea how I can debug the
apparently non-functioning pthread_setcanceltype ?

Thanks in advance,

Matthias


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Xenomai] race condition when creating and deleting VxWorks tasks in xenomai-forge
  2014-02-04 19:12   ` Matthias Schneider
@ 2014-02-06  9:51     ` Philippe Gerum
  0 siblings, 0 replies; 4+ messages in thread
From: Philippe Gerum @ 2014-02-06  9:51 UTC (permalink / raw)
  To: Matthias Schneider, xenomai@xenomai.org

On 02/04/2014 08:12 PM, Matthias Schneider wrote:
> the patch seems to address the original problem. Should this patch be merged
> into the repository?
>

It is on its way to 'master', currently pending in 'next'.

> However, I seem to stumble into a different problem. If I understand correctly,
> CANCEL_DEFER() should prevent the thread/task from being cancelled.

...asynchronously. This is the important missing part.

I have verified
> that pthread_setcanceltype() is being entered and the CONFIG_XENO_ASYNC_CANCEL
> macro has thus been set correctly. Nevertheless, putting a printf after CANCEL_DEFER
> and before CANCEL_RESTORE shows that when rapidly creating and destroying tasks
> the CANCEL_RESTORE printf is not printed. Also, I run into a deadlock very quickly,
> due to wind_task_lock being held (consistent with the observed printf behavior).
> I am using Mercury on top on an unpatched linux kernel. Any idea how I can debug the
> apparently non-functioning pthread_setcanceltype ?
>

Per POSIX, printf() is a cancellation point, any pending cancellation 
request will be handled unconditionally when traversing it. CANCEL_DEFER 
will only postpone async cancellation requests, until a cancellation 
point is traversed or async cancellation mode is enabled back, whichever 
comes first.

http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_09.html

-- 
Philippe.


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2014-02-06  9:51 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-02 20:37 [Xenomai] race condition when creating and deleting VxWorks tasks in xenomai-forge Matthias Schneider
2014-02-03 11:03 ` Philippe Gerum
2014-02-04 19:12   ` Matthias Schneider
2014-02-06  9:51     ` Philippe Gerum

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.