From: Philippe Gerum <rpm@xenomai.org>
To: Matthias Schneider <ma30002000@yahoo.de>,
"xenomai@xenomai.org" <xenomai@xenomai.org>
Subject: Re: [Xenomai] Issue with cobalt_monitor_wait()
Date: Thu, 10 Jul 2014 11:32:54 +0200
Message-ID: <53BE5DC6.9040406@xenomai.org>
In-Reply-To: <1404942153.53492.YahooMailNeo@web171605.mail.ir2.yahoo.com>
On 07/09/2014 11:42 PM, Matthias Schneider wrote:
> ----- Original Message -----
>
>> From: Philippe Gerum <rpm@xenomai.org>
>> To: Matthias Schneider <ma30002000@yahoo.de>; "xenomai@xenomai.org" <xenomai@xenomai.org>
>> Cc:
>> Sent: Wednesday, July 9, 2014 11:52 AM
>> Subject: Re: [Xenomai] Issue with cobalt_monitor_wait()
>>
>> On 07/08/2014 06:10 PM, Matthias Schneider wrote:
>>> ----- Original Message -----
>>>
>>>> From: Philippe Gerum <rpm@xenomai.org>
>>>> To: Matthias Schneider <ma30002000@yahoo.de>; "xenomai@xenomai.org" <xenomai@xenomai.org>
>>>> Cc:
>>>> Sent: Sunday, July 6, 2014 11:15 PM
>>>> Subject: Re: [Xenomai] Issue with cobalt_monitor_wait()
>>>>
>>>> On 07/06/2014 10:16 PM, Matthias Schneider wrote:
>>>>
>>>> [snip]
>>>>
>>>>> One thing I do not understand is:
>>>>>
>>>>> in kernel cobalt_monitor_wait(), the synch object is unlocked via
>>>>> xnsynch_release(). What happens if this synchobj was locked via
>>>>> mon->gate.fastlock? Shouldn't that also be released?
>>>>>
>>>>
>>>> xnsynch_release() handles fastlocks as well.
>>>>
>>>>
>>>>> What other reason could there be if the synch object was released
>>>>> via xnsynch_release, xnsynch_acquire was interrupted for
>>>>> xnsynch_release to block?
>>>>>
>>>>
>>>> Since the issue seems to be easily reproducible, could you send a
>>>> self-contained piece of code illustrating it?
>>>>
>>>> Also, please mention if you are seeing this issue only when running
>>>> your app over GDB, or if it currently happens without any debugger
>>>> attached.
>>>>
>>>> TIA,
>>>
>>>
>>> It seems I have not described the problematic scenario completely -
>>>
>>> there were two other threads that called syncobj_lock()
>>> / cobalt_monitor_enter() at about the same time. (Actually there
>>> are three concurrent operations on the queue that is being tested, two
>>> receive operations and one send operation). I am pretty sure that the
>>> issue is extremely timing dependent.
>>>
>>> Anyway, the testcase would be
>>>
>>> queue_test_receive_peek_multiple_tasks()
>>>
>>
>> I could not reproduce the issue yet, but could you check if this patch
>> has any influence on this bug? TIA,
>>
>> diff --git a/kernel/cobalt/posix/syscall.c b/kernel/cobalt/posix/syscall.c
>> index d921d81..3856794 100644
>> --- a/kernel/cobalt/posix/syscall.c
>> +++ b/kernel/cobalt/posix/syscall.c
>> @@ -156,7 +156,7 @@ static struct xnsyscall cobalt_syscalls[] = {
>> SKINCALL_DEF(sc_cobalt_monitor_enter, cobalt_monitor_enter, primary),
>> SKINCALL_DEF(sc_cobalt_monitor_wait, cobalt_monitor_wait, nonrestartable),
>> SKINCALL_DEF(sc_cobalt_monitor_sync, cobalt_monitor_sync, nonrestartable),
>> - SKINCALL_DEF(sc_cobalt_monitor_exit, cobalt_monitor_exit, primary),
>> + SKINCALL_DEF(sc_cobalt_monitor_exit, cobalt_monitor_exit, nonrestartable),
>> SKINCALL_DEF(sc_cobalt_event_init, cobalt_event_init, current),
>> SKINCALL_DEF(sc_cobalt_event_destroy, cobalt_event_destroy, current),
>> SKINCALL_DEF(sc_cobalt_event_wait, cobalt_event_wait, primary),
>> diff --git a/lib/cobalt/internal.c b/lib/cobalt/internal.c
>> index e0d990d..6c1331d 100644
>> --- a/lib/cobalt/internal.c
>> +++ b/lib/cobalt/internal.c
>> @@ -230,6 +230,7 @@ int cobalt_monitor_exit(cobalt_monitor_t *mon)
>> struct cobalt_monitor_data *datp;
>> unsigned long status;
>> xnhandle_t cur;
>> + int ret;
>>
>> __sync_synchronize();
>>
>> @@ -246,9 +247,13 @@ int cobalt_monitor_exit(cobalt_monitor_t *mon)
>> if (xnsynch_fast_release(&datp->owner, cur))
>> return 0;
>> syscall:
>> - return XENOMAI_SKINCALL1(__cobalt_muxid,
>> - sc_cobalt_monitor_exit,
>> - mon);
>> + do
>> + ret = XENOMAI_SKINCALL1(__cobalt_muxid,
>> + sc_cobalt_monitor_exit,
>> + mon);
>> + while (ret == -EINTR);
>> +
>> + return ret;
>> }
>>
>> int cobalt_monitor_wait(cobalt_monitor_t *mon, int event,
>>
>
>
> Hm, it seems that when I run into the issue, cobalt_monitor_exit() isn't
> called at all...
>
> Having compiled the cobalt kernel without optimization,
> I noticed that cobalt_monitor_wait() actually sets u_ret = -EINTR and
> cobalt_monitor_enter_inner() apparently succeeds, thus setting ret
> to 0. However, in internal.c:cobalt_monitor_wait (in user mode),
> both ret and opret seem to be set to -EINTR. This would explain why
> the second call from internal.c:cobalt_monitor_wait to cobalt_monitor_enter
> blocks indefinitely, since the sync object is already locked.
>
> Investigating what else happens on the way back to user mode, it seems
> that the return code is changed from 0 to -EINTR by the following stack:
>
> #0 __xn_error_return (regs=0xde0fffb0, v=-4) at arch/arm/xenomai/include/asm/xenomai/syscall.h:62
> #1 prepare_for_signal (p=<optimized out>, thread=thread@entry=0xde702e08, regs=regs@entry=0xde0fffb0, sysflags=sysflags@entry=134) at kernel/xenomai/shadow.c:1842
> #2 0xc00c68a8 in handle_head_syscall (regs=0xde0fffb0, ipd=0xc07d63c0 <xnarch_machdata>) at kernel/xenomai/shadow.c:1996
> #3 ipipe_syscall_hook (ipd=0xc07d63c0 <xnarch_machdata>, regs=0xde0fffb0) at kernel/xenomai/shadow.c:2164
> #4 0xc00959a8 in __ipipe_notify_syscall (regs=regs@entry=0xde0fffb0) at kernel/ipipe/core.c:982
> #5 0xc0015c90 in __ipipe_syscall_root (scno=<optimized out>, regs=0xde0fffb0) at arch/arm/kernel/ipipe.c:417
>
> Apparently, the assumption of internal.c:cobalt_monitor_wait that a
> syscall returning -EINTR indicates a failure to re-lock the sync object
> does not hold in this case. There are probably other cases where
> the same scenario may occur.
>
> Unfortunately I do not yet know how to resolve this issue...
>
Actually, you did it. Thanks for the analysis. As you mentioned, the basic issue is with relocking the monitor gate in kernel space upon EINTR, which is wrong: there is a reason why we do this from userland.

The reason applies to any blocking Cobalt syscall which must be aborted upon Linux signal receipt: the signal causes XNBREAK to be present in the thread state flags (handle_sigwake_event -> __xnshadow_kick()), and we must not hold the gate lock until the signal handler has run.

When a signal hits the sleeping syscall, we must unwind the context all the way down the regular Linux syscall path, so that a signal frame is built for it. As part of this process, prepare_for_signal() switches the signaled context from primary to secondary mode.

In short, receiving EINTR in kernel space while waiting for a monitor means unwinding back to the userland call site first, keeping the monitor gate free while running the handler, then grabbing the gate lock anew prior to returning to the caller.
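
The relocking protocol described above can be sketched with a small mock. This is not the Cobalt code: struct mock_monitor and the mock_* functions are hypothetical stand-ins, the gate is reduced to a counter, and the "signal" is a plain flag. The only point it illustrates is that an interrupted wait returns to userland with the gate free, and that the userland wrapper takes the gate again before returning to its caller.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a monitor: tracks only gate ownership. */
struct mock_monitor {
	int gate_depth;		/* 0 = gate free, 1 = gate held */
	bool signal_pending;	/* simulates a Linux signal hitting the sleeper */
};

/*
 * Mock gate acquisition. A real gate would block if already held;
 * the mock reports -EDEADLK instead so that a double lock is visible.
 */
static int mock_gate_enter(struct mock_monitor *mon)
{
	if (mon->gate_depth > 0)
		return -EDEADLK;
	mon->gate_depth = 1;
	return 0;
}

/*
 * Mock of the kernel-side wait. The caller holds the gate on entry.
 * The gate is dropped for the duration of the sleep. If a signal
 * interrupts the sleep, we return -EINTR and leave the gate free;
 * otherwise the gate is re-acquired before returning.
 */
static int mock_kernel_wait(struct mock_monitor *mon, int *opret)
{
	mon->gate_depth = 0;	/* gate released while sleeping */

	if (mon->signal_pending) {
		mon->signal_pending = false;
		*opret = -EINTR;
		return -EINTR;	/* no relock here */
	}

	*opret = 0;
	return mock_gate_enter(mon);
}

/*
 * Mock of the userland wrapper. On -EINTR the signal handler is
 * assumed to have run on the way back with the gate free, so the
 * wrapper grabs the gate anew before reporting the operation status.
 */
static int mock_user_wait(struct mock_monitor *mon)
{
	int opret = 0;
	int ret = mock_kernel_wait(mon, &opret);

	if (ret == -EINTR) {
		ret = mock_gate_enter(mon);
		return ret ? ret : opret;
	}

	return ret ? ret : opret;
}
```

Calling mock_user_wait() on a monitor whose gate is held and whose signal_pending flag is set returns -EINTR with the gate held exactly once; with the flag clear it returns 0, again with the gate held once.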
Unblocking a thread forcibly can also happen when the latter receives the internal/special/not-so-hidden SIGRELS notification (see __cobalt_kill()), in which case XNBREAK is raised too. In such a case, we will relock from userland the same way.

I need to review the entire machinery for more nonsense of mine, but in the meantime, could you try this patch?

TIA,

diff --git a/kernel/cobalt/posix/monitor.c b/kernel/cobalt/posix/monitor.c
index 0ecaa6a..a61d028 100644
--- a/kernel/cobalt/posix/monitor.c
+++ b/kernel/cobalt/posix/monitor.c
@@ -283,9 +283,11 @@ int cobalt_monitor_wait(struct cobalt_monitor_shadow __user *u_mon,
if (list_empty(&mon->waiters) && !xnsynch_pended_p(&mon->drain))
datp->flags &= ~COBALT_MONITOR_PENDED;
- if (info & XNBREAK)
+ if (info & XNBREAK) {
opret = -EINTR;
- else if (info & XNTIMEO)
+ goto out;
+ }
+ if (info & XNTIMEO)
opret = -ETIMEDOUT;
}
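
For illustration only, the control-flow change made by the hunk above can be modelled outside the kernel. In this sketch, MOCK_XNBREAK and MOCK_XNTIMEO are placeholder bit values (the real XNBREAK/XNTIMEO constants come from the Xenomai headers), mock_wait_tail() and mock_user_relock() are hypothetical names, and the gate is reduced to a counter of acquisitions. It assumes, as the discussion above suggests, that userland relocks whenever it sees -EINTR, so a count of 2 stands for the double acquisition that would block forever on a real gate.

```c
#include <errno.h>

/* Placeholder bit values, not the real Xenomai constants. */
#define MOCK_XNBREAK 0x1
#define MOCK_XNTIMEO 0x2

/*
 * Models the tail of the wait operation after the sleep has ended.
 * The gate is free on entry (*gate_lock_count == 0 is expected).
 * With patched == 0 the old flow is followed: the gate is relocked
 * even when the sleep was broken by a signal. With patched != 0 the
 * relock is skipped on MOCK_XNBREAK, as the hunk does with "goto out".
 */
static int mock_wait_tail(int info, int patched, int *gate_lock_count)
{
	int opret = 0;

	if (info & MOCK_XNBREAK) {
		opret = -EINTR;
		if (patched)
			goto out;	/* leave the gate free for userland */
	} else if (info & MOCK_XNTIMEO) {
		opret = -ETIMEDOUT;
	}

	*gate_lock_count += 1;		/* in-kernel relock of the gate */
out:
	return opret;
}

/*
 * Models the userland side: on -EINTR, the gate is taken again
 * before returning to the caller.
 */
static void mock_user_relock(int opret, int *gate_lock_count)
{
	if (opret == -EINTR)
		*gate_lock_count += 1;
}
```

Running the unpatched flow with MOCK_XNBREAK set ends with two acquisitions of the gate, while the patched flow ends with one; the timeout path is unaffected and still relocks once in the mocked kernel side.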
--
Philippe.