All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kiszka <jan.kiszka@domain.hid>
To: rpm@xenomai.org
Cc: xenomai-core <xenomai@xenomai.org>
Subject: Re: [Xenomai-core] [BUG] module_put called over non-root domain
Date: Mon, 21 May 2007 18:21:34 +0200	[thread overview]
Message-ID: <4651C70E.8080908@domain.hid> (raw)
In-Reply-To: <1179763621.473.145.camel@domain.hid>

[-- Attachment #1: Type: text/plain, Size: 5632 bytes --]

Philippe Gerum wrote:
> On Mon, 2007-05-21 at 16:43 +0200, Jan Kiszka wrote:
>> Hi Philippe,
>>
>> Jan Kiszka wrote:
>>> Just ran into this with CONFIG_IPIPE_DEBUG_CONTEXT (maybe due to some
>>> bug of my own):
>> Here is some code to trigger the issue reliably:
>>
>> #include <sys/mman.h>
>> #include <native/task.h>
>>
>> void task_fnct(void *arg)
>> {
>>         rt_task_delete(NULL);
>> }
>>
>> main()
>> {
>>         RT_TASK task;
>>         mlockall(MCL_CURRENT|MCL_FUTURE);
>>         rt_task_spawn(&task, "task", 0, 10, 0, task_fnct, NULL);
>> }
>>
>>
>>> [  102.616000] I-pipe: Detected illicit call from domain 'Xenomai'
>>> [  102.616000]         into a service reserved for domain 'Linux' and below.
>>> [  102.616000]        c741bdc8 00000000 00000000 c8860ef8 c741bdec c0105683 c032c200 c13fe22c
>>> [  102.616000]        c0361f00 c741be08 c01519ed c032f5b8 c032c742 c03380b3 c8885100 c78beac0
>>> [  102.616000]        c741be14 c0142ce9 c7a80b30 c741be3c c884d075 c885f150 c8860ef8 c741be3c
>>> [  102.616000] Call Trace:
>>> [  102.616000]  [<c0104d9f>] show_trace_log_lvl+0x1f/0x40
>>> [  102.616000]  [<c0104e71>] show_stack_log_lvl+0xb1/0xe0
>>> [  102.616000]  [<c0105683>] show_stack+0x33/0x40
>>> [  102.616000]  [<c01519ed>] ipipe_check_context+0xad/0xc0
>>> [  102.616000]  [<c0142ce9>] module_put+0x19/0x90
>>> [  102.616000]  [<c884d075>] xnshadow_unmap+0xb5/0x130 [xeno_nucleus]
>>> [  102.616000]  [<c8871dc5>] __shadow_delete_hook+0x25/0x30 [xeno_native]
>>> [  102.616000]  [<c8842f78>] xnpod_schedule+0xb58/0x12f0 [xeno_nucleus]
>>> [  102.616000]  [<c8844bfb>] xnpod_delete_thread+0x2cb/0x3d0 [xeno_nucleus]
>>> [  102.616000]  [<c886f5bd>] rt_task_delete+0x20d/0x220 [xeno_native]
>>>
>>> I would dare to say that module_put in xnshadow_unmap is not well placed
>>> as it can wakeup a Linux process. The module ref-counter maintenance
>>> needs some postponing, I guess.
> 
> Mm, I should definitely read mails entirely.
> 
>> Attached is a patch proposal. It solves the issue by postponing the
>> module_put via a new schedule_linux_call. Note that this approach issues
>> LO_WAKEUP_REQ where the old test (p->state != TASK_RUNNING) would not
>> have done so. I don't see negative side effects yet,
> 
> This code has to work with an awful lot of runtime situations, including
> those raised by ptracing/GDB, and thread signaling in general, so test
> harnessing is key here.

Well, what about listing those scenarios and their requirements in a
comment if this is so critical? That may make /me feel better as I could
more easily go through them and check changes like those of today
against them.

> 
>>  and I'm furthermore
>> not sure of the old code was handling SMP scenarios safely (What if the
>> thread to be unmapped was running on different CPU than xnshadow_unmap?
>> How to ensure test-atomicity then?).
>>
> 
> This wakeup call is there to release emerging shadows waiting on their
> start barrier that get killed before start. Other regular cases imply
> that either the current thread is exiting - in which case this is a no
> brainer, or it has been sent a group kill signal, in which case it has
> to wake up anyway.

I didn't _removed_ cases where the wakeup takes place, I _added_ some
more my removing the test.

> 
>> Well, this thing seems to work,
> 
> It does, actually.
> 
>>  but it doesn't leave me with a good
>> feeling. IMHO, there are far too many cleanup cases with all the
>> combinations of non-shadow termination, shadow self-termination,
>> termination on task exit, various SMP scenarios, etc.
>>  Moreover, quite a
>> lot of stuff is executed in atomic scheduling hooks (namely on cleanup).
>> At least for shadow threads, I don't think all of this is necessary.
>> Actually, deferring most of the uncritical cleanup work into Linux
>> context would make things more regular, easier to review, and likely
>> less disturbing for overall system latencies. I know that atomic
>> scheduling hooks are required to implement certain skins, but I don't
>> see why janitor work should be done there as well. And we also still
>> have that tiny uncleanness on shadow threads termination around TCB
>> release vs. hook execution.
> 
> Talking about xnshadow_unmap, and as your patch illustrates, what you
> could postpone is basically the first part of the routine, which updates
> the reference counts. The rest _must_ run over the deletion hook in
> atomic context (already runs fine all Linux debug switches on). The

rpi_pop()? No chance to call it before entering the scheduler and then
the hooks?

> latency hit is certainly not seen there; the real issues are left in the
> native and POSIX skin thread deletion routines (heap management and
> other housekeeping calls under nklock and so on).

That's surely true. Besides RPI manipulation, there is really no
problematic code left here latency-wise. Other hooks are trickier. But
my point is that we first need some infrastructure to provide an
alternative cleanup context before we can start moving things. My patch
today is a hack...err...workaround from this perspective.

> 
>> As we are already at it: LO_MAX_REQUESTS is a hidden limitation that
>> may (theoretically) bite large setups when e.g. too many threads gets
>> killed from the "wrong" context. At least some overflow detection is
>> required here to warn the unfortunate user that he reached hard-coded
>> scalability limits.
> 
> Yes.

CONFIG_XENO_OPT_DEBUG[_NUCLEUS] or unconditionally (error return code)?

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

  reply	other threads:[~2007-05-21 16:21 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-05-20 18:56 [Xenomai-core] [BUG] module_put called over non-root domain Jan Kiszka
2007-05-21 14:43 ` Jan Kiszka
2007-05-21 15:17   ` Gilles Chanteperdrix
2007-05-21 15:44     ` Jan Kiszka
2007-05-21 15:25   ` Philippe Gerum
2007-05-21 16:07   ` Philippe Gerum
2007-05-21 16:21     ` Jan Kiszka [this message]
2007-05-21 16:51       ` Philippe Gerum
2007-05-21 17:49         ` Jan Kiszka
2007-05-24  9:20           ` Philippe Gerum
2007-05-24  9:43             ` Jan Kiszka
2007-05-25  0:37               ` Philippe Gerum

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4651C70E.8080908@domain.hid \
    --to=jan.kiszka@domain.hid \
    --cc=rpm@xenomai.org \
    --cc=xenomai@xenomai.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.