All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kiszka <jan.kiszka@domain.hid>
To: rpm@xenomai.org
Cc: xenomai-core <xenomai@xenomai.org>
Subject: Re: [Xenomai-core] [BUG] module_put called over non-root domain
Date: Mon, 21 May 2007 19:49:59 +0200	[thread overview]
Message-ID: <4651DBC7.1000305@domain.hid> (raw)
In-Reply-To: <1179766310.473.168.camel@domain.hid>

[-- Attachment #1: Type: text/plain, Size: 6344 bytes --]

Philippe Gerum wrote:
> On Mon, 2007-05-21 at 18:21 +0200, Jan Kiszka wrote:
>>> This code has to work with an awful lot of runtime situations, including
>>> those raised by ptracing/GDB, and thread signaling in general, so test
>>> harnessing is key here.
>> Well, what about listing those scenarios and their requirements in a
>> comment if this is so critical?
> 
> Eh, because I'm likely lazy as you are? :o>
> 

Sigh, that's not a fair argument... ;->

>>  That may make /me feel better as I could
>> more easily go through them and check changes like those of today
>> against them.
>>
> 
> Those situations have been encountered when debugging, there is no
> limited set of golden rules to document, I'm afraid, this would have
> been far too easy. The entire signal handling wrt ptracing is a terrible
> mess - somebody must have endured a very bad Karma to implement this the
> way it is (long life utrace). The ChangeLog might be a good start for
> comments related to fixes in shadow.c/unmap and various head banging
> sessions I went through regarding this (2.4/2.6).
> 
>>>>  and I'm furthermore
>>>> not sure of the old code was handling SMP scenarios safely (What if the
>>>> thread to be unmapped was running on different CPU than xnshadow_unmap?
>>>> How to ensure test-atomicity then?).
>>>>
>>> This wakeup call is there to release emerging shadows waiting on their
>>> start barrier that get killed before start. Other regular cases imply
>>> that either the current thread is exiting - in which case this is a no
>>> brainer, or it has been sent a group kill signal, in which case it has
>>> to wake up anyway.
>> I didn't _removed_ cases where the wakeup takes place, I _added_ some
>> more my removing the test.
>>
> 
> In the light of what I have just written, you would only cause useless
> wakeups doing so.

My question was if the test for "useless" was really race-safe on SMP.
Then we could introduce it again in the APC handler. Otherwise, it might
be safer to live with some degree of uselessness.

> 
>>>> Well, this thing seems to work,
>>> It does, actually.
>>>
>>>>  but it doesn't leave me with a good
>>>> feeling. IMHO, there are far too many cleanup cases with all the
>>>> combinations of non-shadow termination, shadow self-termination,
>>>> termination on task exit, various SMP scenarios, etc.
>>>>  Moreover, quite a
>>>> lot of stuff is executed in atomic scheduling hooks (namely on cleanup).
>>>> At least for shadow threads, I don't think all of this is necessary.
>>>> Actually, deferring most of the uncritical cleanup work into Linux
>>>> context would make things more regular, easier to review, and likely
>>>> less disturbing for overall system latencies. I know that atomic
>>>> scheduling hooks are required to implement certain skins, but I don't
>>>> see why janitor work should be done there as well. And we also still
>>>> have that tiny uncleanness on shadow threads termination around TCB
>>>> release vs. hook execution.
>>> Talking about xnshadow_unmap, and as your patch illustrates, what you
>>> could postpone is basically the first part of the routine, which updates
>>> the reference counts. The rest _must_ run over the deletion hook in
>>> atomic context (already runs fine all Linux debug switches on). The
>> rpi_pop()? No chance to call it before entering the scheduler and then
>> the hooks?
> 
> What do you have in mind precisely?

Preemptibility whenever having all things in a single atomic region
doesn't buy us much. Even if rpi_pop per se might not be heavy, adding
it to an already complex path doesn't improve things.

So my spontaneous idea was, as you said the rest _must_ be atomic, if
that piece couldn't be actually moved into the task deletion service and
the task_exit Linux hook, ie. before taking the nklock again and do the
final reschedule. Just an example for what might be improvable - once we
dig here again.

> 
>>> latency hit is certainly not seen there; the real issues are left in the
>>> native and POSIX skin thread deletion routines (heap management and
>>> other housekeeping calls under nklock and so on).
>> That's surely true. Besides RPI manipulation, there is really no
>> problematic code left here latency-wise. Other hooks are trickier. But
>> my point is that we first need some infrastructure to provide an
>> alternative cleanup context before we can start moving things. My patch
>> today is a hack...err...workaround from this perspective.
>>
> 
> I don't see it this way. This code is very specific, in the sense it
> lays on the inter-domain boundary between Linux and Xenomai's primary
> mode for a critical operation like thread deletion, this is what make
> things a bit confusing. What's below and above this layer has to be as
> seamless as possible, this code in particular can't.

Well, we could move the memory release of the shadow thread in Linux
context e.g. (release on idle, a bit like RCU - uuuh). Then the
reference counter can find a new home as well. Maybe some more things,
dunno yet.

> 
>>>> As we are already at it: LO_MAX_REQUESTS is a hidden limitation that
>>>> may (theoretically) bite large setups when e.g. too many threads gets
>>>> killed from the "wrong" context. At least some overflow detection is
>>>> required here to warn the unfortunate user that he reached hard-coded
>>>> scalability limits.
>>> Yes.
>> CONFIG_XENO_OPT_DEBUG[_NUCLEUS] or unconditionally (error return code)?
>>
> 
> Making it depend on the debug switch looks good. People who are serious
> about making production software do want to activate this switch once in
> a while while developing it, really.
> 

--- xenomai.orig/ksrc/nucleus/shadow.c
+++ xenomai/ksrc/nucleus/shadow.c
@@ -890,11 +890,19 @@ static void schedule_linux_call(int type
 		);

 	splhigh(s);
+
 	reqnum = rq->in;
+
+	if (XENO_DEBUG(NUCLEUS) &&
+	    ((reqnum + 1) & (LO_MAX_REQUESTS - 1)) == rq->out)
+		xnpod_fatal("lostage queue overflow on CPU %d! "
+			    "Increase LO_MAX_REQUESTS", cpuid);
+
 	rq->req[reqnum].type = type;
 	rq->req[reqnum].task = p;
 	rq->req[reqnum].arg = arg;
 	rq->in = (reqnum + 1) & (LO_MAX_REQUESTS - 1);
+
 	splexit(s);

 	rthal_apc_schedule(lostage_apc);



Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

  reply	other threads:[~2007-05-21 17:49 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-05-20 18:56 [Xenomai-core] [BUG] module_put called over non-root domain Jan Kiszka
2007-05-21 14:43 ` Jan Kiszka
2007-05-21 15:17   ` Gilles Chanteperdrix
2007-05-21 15:44     ` Jan Kiszka
2007-05-21 15:25   ` Philippe Gerum
2007-05-21 16:07   ` Philippe Gerum
2007-05-21 16:21     ` Jan Kiszka
2007-05-21 16:51       ` Philippe Gerum
2007-05-21 17:49         ` Jan Kiszka [this message]
2007-05-24  9:20           ` Philippe Gerum
2007-05-24  9:43             ` Jan Kiszka
2007-05-25  0:37               ` Philippe Gerum

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4651DBC7.1000305@domain.hid \
    --to=jan.kiszka@domain.hid \
    --cc=rpm@xenomai.org \
    --cc=xenomai@xenomai.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.