Re: [Xenomai-help] Issue with Auto relax and nested mutexes

From: Philippe Gerum <rpm@xenomai.org>
To: Makarand Pradhan <makarandpradhan@domain.hid>
Cc: "xenomai@xenomai.org" <xenomai@xenomai.org>
Subject: Re: [Xenomai-help] Issue with Auto relax and nested mutexes
Date: Thu, 19 Jan 2012 12:25:18 +0100
Message-ID: <4F17FD9E.6040007@domain.hid>
In-Reply-To: <4F174AA3.7030309@domain.hid>

On 01/18/2012 11:41 PM, Makarand Pradhan wrote:
> Hi,
>
> Another problem was encountered with rescnt related to nested mutexes.
>
> This time the rescnt is not incremented because the XNOTHER bit is not
> set, so a SIGDEBUG or SIGXCPU is delivered to the thread, which crashes
> my application.
>
> The scenario is as follows:
>
> 1. Thread started with priority 0. (Relaxed)
> 2. This thread uses mutexes which causes Priority Inversions.
> 3. At some point, a rt_task_set_priority is done to change the priority
>    (RT 85).
> 4. Some time later the priority is set back to 0.

If I understand it properly, I'm afraid your runtime scenario is badly
broken. In contrast to priority ceiling, priority inheritance is about
leaving it to the _kernel_ to pick the best dynamic priority for your
thread in order to solve a priority inversion.

Therefore, by changing your dynamic priority while holding a mutex, your
application is preventing the kernel from doing the job you previously
assigned to it. Worse, you could be causing unexpected latencies to
other threads your application has no clue about, or for which it just
can't tell whether they compete with your thread for accessing the
resource at that specific time.

After all, it is your application that defined the contended mutex, and
as such accepted the fact that priority inheritance might be involved at
some point. If you don't trust the kernel and want to deal with
priorities manually during resource contention, then maybe you should
use a different mutual exclusion mechanism that does not implement
priority inheritance, e.g. a plain binary semaphore.

>
> The problem again revolves around setting XNOTHER. In the problem
> scenario, the XNOTHER bit is not set in xnsynch_acquire. Hence the
> rescnt is not incremented.
>
> The reason for that is, while doing a rt_task_set_priority,
> __xnsched_rt_setparam is invoked before the thread is reniced.
>
> To resolve this issue, I had to set the XNOTHER bit in
> __xnpod_set_thread_schedparam after the thread was reniced or in
> rt_task_set_priority. Both the code changes are given below:
>
>
> rt_task_set_priority(....
>
> +	if (0==prio)
> +	{
> +		xnthread_set_state(&task->thread_base, XNOTHER);
> +	}
>
>
> xnpod_set_thread_schedparam(...
>
> #ifdef CONFIG_XENO_OPT_PERVASIVE
> 	if (propagate) {
> 		if (xnthread_test_state(thread, XNRELAX))
> 			xnshadow_renice(thread);
> 		else if (xnthread_test_state(thread, XNSHADOW))
> 			xnthread_set_info(thread, XNPRIOSET);
> 	}
>
> +	if (xnthread_test_state(thread, XNSHADOW)) {
> +		// if (thread->bprio || !xnthread_test_state(thread, XNBOOST))
> +		if (thread->bprio)
> +			xnthread_clear_state(thread, XNOTHER);
> +		else
> +			xnthread_set_state(thread, XNOTHER);
> +	}
>
>
> Setting XNOTHER in rt_task_set_priority does not look appropriate. I
> believe the right place is in the xnpod_set_thread_schedparam.
>
> Would highly appreciate your views.
>
> Rgds,
> Mak
>
>
> On 10/01/12 02:10 PM, Makarand Pradhan wrote:
>> The patch does work. Thanks.
>>
>> Will it be available in the next release of xenomai?
>>
>> Rgds,
>> Mak
>>
>> root@domain.hid:~# ./relax 0 1
>> Spawning: tasks
>> bP: 0, cp: 0, mode: 0
>> Acquire complete
>> Release complete
>> bP: 0, cp: 0, mode: 0
>> Acquire complete
>> Release complete
>> bP: 0, cp: 0, mode: 0
>> Acquire complete
>> Release complete
>> bP: 0, cp: 0, mode: 0
>> Acquire complete
>> Grabbing mux in HP
>> Mux held by Task2
>> Release complete
>> bP: 0, cp: 0, mode: 0
>> Acquire complete
>> Release complete
>> bP: 0, cp: 0, mode: 0
>> Acquire complete
>> ^C
>> root@domain.hid:~#
>>
>>
>> On 10/01/12 01:39 PM, Makarand Pradhan wrote:
>>> Hi Philippe,
>>>
>>> A bit surprised to see a change in sched-rt.h. I had another problem
>>> earlier where the XNOTHER was not getting set after a priority change. I
>>> had to look at the code that you have modified. Although I had
>>> temporarily worked around it by setting the XNOTHER in
>>> rt_task_set_priority. I think this would fix that problem as well.
>>>
>>> Will test the patch and get back with the results.
>>>
>>> Thanks and Rgds,
>>> Mak.
>>>
>>> On 10/01/12 01:08 PM, Philippe Gerum wrote:
>>>> On 01/10/2012 04:51 PM, Makarand Pradhan wrote:
>>>>> Based on my testing, it is noted that the rescnt is not released when
>>>>> task1 gets a priority boost and starts running with priority 1. That's
>>>>> when the rescnt is not decremented.
>>>>>
>>>>> It would imply that we may be checking the current priority while
>>>>> testing if we want to invoke rt_mutex_release in kernel. Will try to
>>>>> check it out.
>>>> Does this help in your case?
>>>>
>>>> diff --git a/include/nucleus/sched-rt.h b/include/nucleus/sched-rt.h
>>>> index cc1cefa..6ac8fd7 100644
>>>> --- a/include/nucleus/sched-rt.h
>>>> +++ b/include/nucleus/sched-rt.h
>>>> @@ -87,7 +87,7 @@ static inline void __xnsched_rt_setparam(struct xnthread *thread,
>>>>  {
>>>>  	thread->cprio = p->rt.prio;
>>>>  	if (xnthread_test_state(thread, XNSHADOW)) {
>>>> -		if (thread->cprio)
>>>> +		if (thread->bprio || !xnthread_test_state(thread, XNBOOST))
>>>>  			xnthread_clear_state(thread, XNOTHER);
>>>>  		else
>>>>  			xnthread_set_state(thread, XNOTHER);
>>>>> Rgds,
>>>>> Mak.
>>>>>
>>>>> On 10/01/12 10:42 AM, Philippe Gerum wrote:
>>>>>> On 01/10/2012 04:40 PM, Philippe Gerum wrote:
>>>>>>> On 01/10/2012 04:40 PM, Makarand Pradhan wrote:
>>>>>>>> Another point:
>>>>>>>>
>>>>>>>> "These are fast mutexes, the thread does not have to jump to kernel
>>>>>>>> space unless the released mutex was actually contended."
>>>>>>>>
>>>>>>>> When the first task is started with prio 0, I always see that
>>>>>>>> rt_mutex_release is invoked in the kernel, even when there is no
>>>>>>>> contention.
>>>>>>> I should have added: "unless there is no contention ... or the
>>>>>>> caller is a non-rt thread". This is because we have to jump to
>>>>>>> kernel space to track rescnt.
>>>>>>>
>>>>>> Ok, next try: "unless the mutex was contended ... or the caller is
>>>>>> a non-rt thread".
>>>>>>
>>>>>>>> I have an instrumented kernel. The kernel trace is given below. In
>>>>>>>> this trace only task1 is running at prio 0. It should be easy to
>>>>>>>> follow:
>>>>>>>>
>>>>>>>> Jan 10 10:36:59 ruggedcom kernel: lo: rescnt: 0, switched: 0
>>>>>>>> Jan 10 10:36:59 ruggedcom kernel: hi: rescnt: 0, switched: 0
>>>>>>>> Jan 10 10:36:59 ruggedcom kernel: lo: rescnt: 1, switched: 1
>>>>>>>> Jan 10 10:36:59 ruggedcom kernel: hi: rescnt: 2, switched: 0
>>>>>>>> Jan 10 10:36:59 ruggedcom kernel: hi: rescnt: 3, switched: 0
>>>>>>>> Jan 10 10:37:01 ruggedcom kernel: hi: rescnt: 3, switched: 0
>>>>>>>> Jan 10 10:37:01 ruggedcom kernel: __rt_mutex_release
>>>>>>>> Jan 10 10:37:01 ruggedcom kernel: RML
>>>>>>>> Jan 10 10:37:01 ruggedcom kernel: rt_mutex_release: lockcnt: 1
>>>>>>>> Jan 10 10:37:01 ruggedcom kernel: xnsynch_release_thread: BP: 0
>>>>>>>> Jan 10 10:37:01 ruggedcom kernel: hi: rescnt: 2, switched: 0
>>>>>>>> Jan 10 10:37:01 ruggedcom kernel: __rt_mutex_release
>>>>>>>> Jan 10 10:37:01 ruggedcom kernel: RML
>>>>>>>> Jan 10 10:37:01 ruggedcom kernel: rt_mutex_release: lockcnt: 1
>>>>>>>> Jan 10 10:37:01 ruggedcom kernel: xnsynch_release_thread: BP: 0
>>>>>>>> Jan 10 10:37:01 ruggedcom kernel: hi: rescnt: 1, switched: 0
>>>>>>>> Jan 10 10:37:01 ruggedcom kernel: __rt_mutex_release
>>>>>>>> Jan 10 10:37:01 ruggedcom kernel: RML
>>>>>>>> Jan 10 10:37:01 ruggedcom kernel: rt_mutex_release: lockcnt: 1
>>>>>>>> Jan 10 10:37:01 ruggedcom kernel: xnsynch_release_thread: BP: 0
>>>>>>>> Jan 10 10:37:01 ruggedcom kernel: hi: rescnt: 0, switched: 0
>>>>>>>> Jan 10 10:37:01 ruggedcom kernel: lo: rescnt: 1, switched: 1
>>>>>>>> Jan 10 10:37:01 ruggedcom kernel: hi: rescnt: 2, switched: 0
>>>>>>>> Jan 10 10:37:01 ruggedcom kernel: hi: rescnt: 3, switched: 0
>>>>>>>> Jan 10 10:37:03 ruggedcom kernel: hi: rescnt: 3, switched: 0
>>>>>>>> Jan 10 10:37:03 ruggedcom kernel: __rt_mutex_release
>>>>>>>> Jan 10 10:37:03 ruggedcom kernel: RML
>>>>>>>> Jan 10 10:37:03 ruggedcom kernel: rt_mutex_release: lockcnt: 1
>>>>>>>> Jan 10 10:37:03 ruggedcom kernel: xnsynch_release_thread: BP: 0
>>>>>>>> Jan 10 10:37:03 ruggedcom kernel: hi: rescnt: 2, switched: 0
>>>>>>>> Jan 10 10:37:03 ruggedcom kernel: __rt_mutex_release
>>>>>>>> Jan 10 10:37:03 ruggedcom kernel: RML
>>>>>>>> Jan 10 10:37:03 ruggedcom kernel: rt_mutex_release: lockcnt: 1
>>>>>>>> Jan 10 10:37:03 ruggedcom kernel: xnsynch_release_thread: BP: 0
>>>>>>>> Jan 10 10:37:03 ruggedcom kernel: hi: rescnt: 1, switched: 0
>>>>>>>> Jan 10 10:37:03 ruggedcom kernel: __rt_mutex_release
>>>>>>>> Jan 10 10:37:03 ruggedcom kernel: RML
>>>>>>>> Jan 10 10:37:03 ruggedcom kernel: rt_mutex_release: lockcnt: 1
>>>>>>>> Jan 10 10:37:03 ruggedcom kernel: xnsynch_release_thread: BP: 0
>>>>>>>> Jan 10 10:37:03 ruggedcom kernel: hi: rescnt: 0, switched: 0
>>>>>>>> Jan 10 10:37:03 ruggedcom kernel: lo: rescnt: 1, switched: 1
>>>>>>>> Jan 10 10:37:03 ruggedcom kernel: hi: rescnt: 2, switched: 0
>>>>>>>> Jan 10 10:37:03 ruggedcom kernel: hi: rescnt: 3, switched: 0
>>>>>>>> Jan 10 10:37:04 ruggedcom kernel: hi: rescnt: 3, switched: 0
>>>>>>>>
>>>>>>>>
>>>>>>>> root@domain.hid:~# ./a.out 0 1
>>>>>>>> Spawning: tasks
>>>>>>>> bP: 0, cp: 0, mode: 0
>>>>>>>> Acquire complete
>>>>>>>> Release complete
>>>>>>>> bP: 0, cp: 0, mode: 0
>>>>>>>> Acquire complete
>>>>>>>> Release complete
>>>>>>>> bP: 0, cp: 0, mode: 0
>>>>>>>> Acquire complete
>>>>>>>> ^C
>>>>>>>>
>>>>>>>>
>>>>>>>> Rgds,
>>>>>>>> Mak.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/01/12 10:26 AM, Makarand Pradhan wrote:
>>>>>>>>> Hi Philippe,
>>>>>>>>>
>>>>>>>>> You are right. Task 1 needs to be started with prio 0. I start
>>>>>>>>> seeing the problem after task2 grabs the mutexes and releases them.
>>>>>>>>> The first task never jumps back to secondary. Here is my output.
>>>>>>>>> The mode never goes back to 0 after "Grabbing mux in HP" and the
>>>>>>>>> rescnt stays stuck at 1 in the kernel.
>>>>>>>>>
>>>>>>>>> root@domain.hid:~# ./relax 0 1
>>>>>>>>> Spawning: tasks
>>>>>>>>> bP: 0, cp: 0, mode: 0
>>>>>>>>> Acquire complete
>>>>>>>>> Release complete
>>>>>>>>> bP: 0, cp: 0, mode: 0
>>>>>>>>> Acquire complete
>>>>>>>>> Release complete
>>>>>>>>> bP: 0, cp: 0, mode: 0
>>>>>>>>> Acquire complete
>>>>>>>>> Release complete
>>>>>>>>> bP: 0, cp: 0, mode: 0
>>>>>>>>> Acquire complete
>>>>>>>>> Grabbing mux in HP
>>>>>>>>> Mux held by Task2
>>>>>>>>> Release complete
>>>>>>>>> bP: 0, cp: 0, mode: 1
>>>>>>>>> Acquire complete
>>>>>>>>> Release complete
>>>>>>>>> bP: 0, cp: 0, mode: 1
>>>>>>>>> Acquire complete
>>>>>>>>>
>>>>>>>>> Rgds,
>>>>>>>>> Mak.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/01/12 10:11 AM, Philippe Gerum wrote:
>>>>>>>>>> On 01/09/2012 09:50 PM, Makarand Pradhan wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I am running kernel 3.0.0, xenomai: 2.6, powerpc 8360.
>>>>>>>>>>>
>>>>>>>>>>> I am noticing an issue while using the auto relax feature related
>>>>>>>>>>> to mutexes. I am using nested mutexes. The code is attached to
>>>>>>>>>>> this email.
>>>>>>>>>>>
>>>>>>>>>>> The problem is that I am not relaxing after a RT thread grabs and
>>>>>>>>>>> releases a mutex. On further investigation, it was noted that the
>>>>>>>>>>> rescnt is not going down to 0.
>>>>>>>>>> From your code, task1 would auto-relax only if started with
>>>>>>>>>> priority 0, which is what I get here:
>>>>>>>>>>
>>>>>>>>>> -bash-3.2# ./relax 0 1
>>>>>>>>>> Spawning: tasks
>>>>>>>>>> bP: 0, cp: 0, mode: 0
>>>>>>>>>> Acquire complete
>>>>>>>>>> Release complete
>>>>>>>>>> bP: 0, cp: 0, mode: 0
>>>>>>>>>> Acquire complete
>>>>>>>>>> Release complete
>>>>>>>>>> bP: 0, cp: 0, mode: 0
>>>>>>>>>> Acquire complete
>>>>>>>>>> Release complete
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>> Conversely, I get the right behavior if setting a non-zero
>>>>>>>>>> priority to task1:
>>>>>>>>>>
>>>>>>>>>> -bash-3.2# ./relax 1 0
>>>>>>>>>> Spawning: tasks
>>>>>>>>>> bP: 1, cp: 1, mode: 1
>>>>>>>>>> Acquire complete
>>>>>>>>>> Release complete
>>>>>>>>>> bP: 1, cp: 1, mode: 1
>>>>>>>>>> Acquire complete
>>>>>>>>>> Release complete
>>>>>>>>>> bP: 1, cp: 1, mode: 1
>>>>>>>>>> Acquire complete
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>> In any case, the priority of task2 should have no impact on the
>>>>>>>>>> result.
>>>>>>>>>>
>>>>>>>>>> I'm running current 2.6 HEAD commit (168da46de), kernel
>>>>>>>>>> 3.1.5/powerpc32 (52xx), pipeline 2.13-06.
>>>>>>>>>>
>>>>>>>>>> Which priority arguments are you passing to your test program?
>>>>>>>>>>
>>>>>>>>>>> Another observation is that I do not hit rt_mutex_release in the
>>>>>>>>>>> kernel in the problem scenario, I believe when the thread
>>>>>>>>>>> undergoes a priority inversion. This may be a problem as the
>>>>>>>>>>> rescnt would not get decremented. Not sure how the mutex is
>>>>>>>>>>> released without hitting rt_mutex_release, or am I missing
>>>>>>>>>>> anything?
>>>>>>>>>>>
>>>>>>>>>> These are fast mutexes, the thread does not have to jump to
>>>>>>>>>> kernel space unless the released mutex was actually contended.
>>>>>>>>>>
>>>>>>>>>>> If I have both the tasks running at priority 0, I stay in the
>>>>>>>>>>> secondary domain, rt_mutex_release is invoked as expected, the
>>>>>>>>>>> rescnt goes down to 0 when all the mutexes are released.
>>>>>>>>>>>
>>>>>>>>>>> Has anyone faced this problem?
>>>>>>>>>>>
>>>>>>>>>> I'm unsure there is any yet. Auto-relax applies to non-rt
>>>>>>>>>> Xenomai threads only (i.e. prio == 0).
>>>>>>>>>>
>>>>>>>>>>> Rgds,
>>>>>>>>>>> Makarand
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Xenomai-help mailing list
>>>>>>>>>>> Xenomai-help@domain.hid
>>>>>>>>>>> https://mail.gna.org/listinfo/xenomai-help
>>
>
>


-- 
Philippe.

Thread overview: 23+ messages
2012-01-09 20:50 [Xenomai-help] Issue with Auto relax and nested mutexes Makarand Pradhan
2012-01-10 15:11 ` Philippe Gerum
2012-01-10 15:26   ` Makarand Pradhan
2012-01-10 15:38     ` Philippe Gerum
2012-01-10 15:40     ` Makarand Pradhan
2012-01-10 15:40       ` Philippe Gerum
2012-01-10 15:42         ` Philippe Gerum
2012-01-10 15:51           ` Makarand Pradhan
2012-01-10 17:51             ` Philippe Gerum
2012-01-10 18:08             ` Philippe Gerum
2012-01-10 18:39               ` Makarand Pradhan
2012-01-10 19:10                 ` Makarand Pradhan
2012-01-10 20:30                   ` Philippe Gerum
2012-01-18 22:41                   ` Makarand Pradhan
2012-01-19 10:17                     ` Gilles Chanteperdrix
2012-01-19 11:25                     ` Philippe Gerum [this message]
2012-01-19 12:29                       ` Gilles Chanteperdrix
2012-01-19 15:35                         ` Makarand Pradhan
2012-01-19 15:22                       ` Makarand Pradhan
2012-01-19 15:49                         ` Philippe Gerum
2012-01-19 16:22                           ` Makarand Pradhan
2012-01-19 16:39                             ` Makarand Pradhan
2012-01-23 15:01                               ` Makarand Pradhan
