From: Philippe Gerum <rpm@xenomai.org>
To: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
Cc: Xenomai <xenomai@xenomai.org>
Subject: Re: [Xenomai] kernel NULL pointer dereference
Date: Sat, 08 Sep 2012 12:41:19 +0200 [thread overview]
Message-ID: <504B20CF.8010100@xenomai.org> (raw)
In-Reply-To: <5047B2B3.2050309@xenomai.org>
On 09/05/2012 10:14 PM, Gilles Chanteperdrix wrote:
> On 09/05/2012 02:10 PM, Henri Roosen wrote:
>
>> On Wed, Sep 5, 2012 at 1:21 PM, Gilles Chanteperdrix
>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>> Anyway, what seems to happen is that your application calls exit, while
>>>> some thread was waiting for a a PI mutex, the nucleus tries to send a
>>>> signal to the mutex holder. However, something gets wrong, and the mutex
>>>> holder task pointer is invalid.
>>>>
>>>> What is strange, also, is how a task can be waiting for a mutex and
>>>> calling exit at the same time. Could you try to increase the number of
>>>> trace points to say 1000 points?
>>>
>>>
>>> Answering myself. The thread killed is the one holding the mutex. The
>>> signal is sent to this precise thread, so this may fail because the
>>> thread is in the process of being destroyed, and its user_task pointer
>>> is no longer valid.
>>
>> Please find attached ipipe_trace_2.txt that has the number of
>> tracepoints to 1000. Note that this log also doesn't trace whether
>> irqs are off (arch_irqs_disable_flags is not in the current ipipe tree
>> yet either).
>>
>> I will find out why the application is doing a sys_exit. However I'm
>> not sure how this is related to the thread affinity; when not setting
>> the affinity, the problem is not reproducable.
>
>
> Please try the following patch, it should avoid the bug, but I will
> wait for Philippe's ack before pushing it:
>
> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
> index 260fdef..1f1a737 100644
> --- a/ksrc/nucleus/shadow.c
> +++ b/ksrc/nucleus/shadow.c
> @@ -274,7 +274,9 @@ static void rpi_update(struct xnthread *thread)
> if (rpi_p(thread)) {
> xnsched_pop_rpi(thread);
> thread->rpi = NULL;
> - rpi_push(sched, thread);
> +
> + if (xnthread_user_task(thread))
> + rpi_push(sched, thread);
> }
>
> xnlock_put_irqrestore(&sched->rpilock, s);
> @@ -1516,15 +1518,18 @@ EXPORT_SYMBOL_GPL(xnshadow_start);
> /* Called with nklock locked, Xenomai interrupts off. */
> void xnshadow_renice(struct xnthread *thread)
> {
> - /*
> - * We need to bound the priority values in the
> - * [1..MAX_RT_PRIO-1] range, since the Xenomai priority scale
> - * is a superset of the Linux priority scale.
> - */
> - int prio = normalize_priority(thread->cprio);
> + if (xnthread_user_task(thread)) {
> + /*
> + * We need to bound the priority values in the
> + * [1..MAX_RT_PRIO-1] range, since the Xenomai priority scale
> + * is a superset of the Linux priority scale.
> + */
> + int prio = normalize_priority(thread->cprio);
>
> - xnshadow_send_sig(thread, SIGSHADOW,
> - sigshadow_int(SIGSHADOW_ACTION_RENICE, prio), 1);
> + xnshadow_send_sig
> + (thread, SIGSHADOW,
> + sigshadow_int(SIGSHADOW_ACTION_RENICE, prio), 1);
> + }
>
> if (!xnthread_test_state(thread, XNDORMANT) &&
> xnthread_sched(thread) == xnpod_current_sched())
>
>
This patch is correct, but it does not fix the root cause. RPI seems to
be a victim here, not the culprit. As you already noticed, the main
issue is with a thread holding a contented mutex, which exits. This
causes the PIP boost to be cleared for it over its own taskexit handler,
after its task_struct backlink has been cleared. This leads to
schedule_linux_call() being called for a NULL task; enabling
CONFIG_XENO_OPT_DEBUG_NUCLEUS triggers the assertion properly in this
routine.
The source of all issues in this case is xnsynch_clear_boost() not
handling the particular case of a thread exiting with a contended mutex
held. We can reuse the thread zombie bit to flag this, and avoid any
useless renice.
Normally, there should not be any SMP-specific triggers of this bug,
the nucleus lock is held all through the deletion and synch de-boost paths.
If I'm right, the patch below should fix the original issue the same
way:
diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 5f53002..1f4a749 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -1187,12 +1187,12 @@ void xnpod_delete_thread(xnthread_t *thread)
if (xnthread_test_state(thread, XNPEND))
xnsynch_forget_sleeper(thread);
+ xnthread_set_state(thread, XNZOMBIE);
+
xnsynch_release_all_ownerships(thread);
__xnpod_giveup_fpu(sched, thread);
- xnthread_set_state(thread, XNZOMBIE);
-
if (sched->curr == thread) {
/*
* We first need to pick a new curr before
diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c
index 695824e..e10be47 100644
--- a/ksrc/nucleus/synch.c
+++ b/ksrc/nucleus/synch.c
@@ -608,7 +608,8 @@ static void xnsynch_clear_boost(struct xnsynch *synch,
target = owner;
}
- if (w_cprio(owner) != wprio)
+ if (w_cprio(owner) != wprio &&
+ !xnthread_test_state(owner, XNZOMBIE))
xnsynch_renice_thread(owner, target);
}
--
Philippe.
next prev parent reply other threads:[~2012-09-08 10:41 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-09-04 13:42 [Xenomai] kernel NULL pointer dereference Henri Roosen
2012-09-04 14:10 ` Gilles Chanteperdrix
2012-09-04 14:28 ` Henri Roosen
2012-09-04 18:33 ` Gilles Chanteperdrix
2012-09-04 18:43 ` Gilles Chanteperdrix
2012-09-08 6:18 ` Gilles Chanteperdrix
2012-09-04 19:21 ` Gilles Chanteperdrix
2012-09-05 7:26 ` Henri Roosen
2012-09-05 7:28 ` Gilles Chanteperdrix
2012-09-05 7:42 ` Henri Roosen
2012-09-05 8:28 ` Gilles Chanteperdrix
2012-09-05 9:29 ` Henri Roosen
2012-09-05 11:03 ` Gilles Chanteperdrix
2012-09-05 11:21 ` Gilles Chanteperdrix
2012-09-05 12:10 ` Henri Roosen
2012-09-05 12:25 ` Gilles Chanteperdrix
2012-09-05 19:22 ` Gilles Chanteperdrix
2012-09-05 20:38 ` Gilles Chanteperdrix
2012-09-06 8:40 ` Henri Roosen
2012-09-06 8:57 ` Gilles Chanteperdrix
2012-09-06 14:33 ` Henri Roosen
2012-09-06 18:47 ` Gilles Chanteperdrix
2012-09-05 20:14 ` Gilles Chanteperdrix
2012-09-08 10:41 ` Philippe Gerum [this message]
2012-09-08 10:43 ` Gilles Chanteperdrix
2012-09-08 11:57 ` Gilles Chanteperdrix
2012-09-08 12:10 ` Philippe Gerum
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=504B20CF.8010100@xenomai.org \
--to=rpm@xenomai.org \
--cc=gilles.chanteperdrix@xenomai.org \
--cc=xenomai@xenomai.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.