From: Jan Kiszka <jan.kiszka@web.de>
To: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
Cc: "xenomai@xenomai.org" <xenomai@xenomai.org>
Subject: Re: [Xenomai] "inconsistent lock state" on boot-up
Date: Mon, 10 Nov 2014 21:42:22 +0100
Message-ID: <5461232E.5000006@web.de>
In-Reply-To: <20141110203747.GV17476@sisyphus.hd.free.fr>
On 2014-11-10 21:37, Gilles Chanteperdrix wrote:
> On Mon, Nov 10, 2014 at 09:28:50PM +0100, Jan Kiszka wrote:
>> On 2014-11-10 21:23, Gilles Chanteperdrix wrote:
>>> On Mon, Nov 10, 2014 at 09:17:18PM +0100, Jan Kiszka wrote:
>>>> On 2014-11-10 21:14, Gilles Chanteperdrix wrote:
>>>>> On Mon, Nov 10, 2014 at 09:10:31PM +0100, Jan Kiszka wrote:
>>>>>> On 2014-11-10 21:06, Gilles Chanteperdrix wrote:
>>>>>>> On Mon, Nov 10, 2014 at 09:02:58PM +0100, Jan Kiszka wrote:
>>>>>>>> On 2014-11-10 21:00, Gilles Chanteperdrix wrote:
>>>>>>>>> On Mon, Nov 10, 2014 at 08:55:26PM +0100, Jan Kiszka wrote:
>>>>>>>>>> On 2014-11-10 20:46, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On Mon, Nov 10, 2014 at 07:29:58PM +0100, Jan Kiszka wrote:
>>>>>>>>>>>> On 2014-11-10 16:56, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>> On Mon, Nov 10, 2014 at 03:52:41PM +0100, Jan Kiszka wrote:
>>>>>>>>>>>>>> On 2014-11-10 13:43, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>> On Mon, Nov 10, 2014 at 09:08:47AM +0000, Stoidner, Christoph wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Do you have the same message with exactly the same kernel
>>>>>>>>>>>>>>>>> configuration, only with CONFIG_XENOMAI and CONFIG_IPIPE disabled?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> When CONFIG_XENOMAI and CONFIG_IPIPE are disabled the message does not
>>>>>>>>>>>>>>>> appear on boot-up.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Do you have FCSE enabled? If yes, did you try disabling it? same
>>>>>>>>>>>>>>>>> with unlocked context switch.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> FCSE is already disabled.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Do you have an idea how to overcome the problem?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am not sure the lockdep message really is a problem. lockdep could
>>>>>>>>>>>>>>> be confused by the fact that the hardware interrupts are not off
>>>>>>>>>>>>>>> when running the I-pipe, or because we are missing some bit in the
>>>>>>>>>>>>>>> I-pipe arm specific code to get it looking at the virtual mask
>>>>>>>>>>>>>>> instead of the hardware mask.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As for the scheduling while atomic and random segmentation fault,
>>>>>>>>>>>>>>> you should use the I-pipe tracer, configure it with enough back
>>>>>>>>>>>>>>> trace points, something like 1000 or 10000, and trigger a trace
>>>>>>>>>>>>>>> freeze in the kernel code when the problem happens.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, for the "scheduling while atomic", it may happen if you call
>>>>>>>>>>>>>>> some Linux service which reschedules from primary mode, you can try
>>>>>>>>>>>>>>> enabling I-pipe debugging, and in fact all Xenomai debugging, to try
>>>>>>>>>>>>>>> and catch such mistakes. This is especially important if you are
>>>>>>>>>>>>>>> running a custom skin.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "Scheduling while atomic" may have the same reason why lockdep stumbles:
>>>>>>>>>>>>>> some changes of I-pipe mess up the IRQ state tracing of Linux. I just
>>>>>>>>>>>>>> started to look into this issue again. We tried earlier but got distracted.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I doubt that very much. Though I never run with lockdep, I sometimes
>>>>>>>>>>>>> run with CONFIG_PREEMPT, and never saw this message. From what I can
>>>>>>>>>>>>> see, the "scheduling while atomic" message is based on the
>>>>>>>>>>>>> preempt_count only and does not use irqs_disabled() (which by the
>>>>>>>>>>>>> way is known to work with I-pipe on ARM as well, so, if something is
>>>>>>>>>>>>> broken, that should be something more obscure).
>>>>>>>>>>>>
>>>>>>>>>>>> Let's see. I think I've identified one wrong path:
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S
>>>>>>>>>>>> index d32f8bd..ab911f8 100644
>>>>>>>>>>>> --- a/arch/arm/kernel/entry-header.S
>>>>>>>>>>>> +++ b/arch/arm/kernel/entry-header.S
>>>>>>>>>>>> @@ -198,7 +198,10 @@
>>>>>>>>>>>> #ifdef CONFIG_TRACE_IRQFLAGS
>>>>>>>>>>>> @ The parent context IRQs must have been enabled to get here in
>>>>>>>>>>>> @ the first place, so there's no point checking the PSR I bit.
>>>>>>>>>>>> - bl trace_hardirqs_on
>>>>>>>>>>>> + tst \rpsr, #PSR_I_BIT
>>>>>>>>>>>> + bleq trace_hardirqs_off
>>>>>>>>>>>> + tst \rpsr, #PSR_I_BIT
>>>>>>>>>>>> + blne trace_hardirqs_on
>>>>>>>>>>>> #endif
>>>>>>>>>>>> .else
>>>>>>>>>>>> @ IRQs off again before pulling preserved data off the stack
>>>>>>>>>>>>
>>>>>>>>>>>> This is probably no fix, but with that change applied, the warning is
>>>>>>>>>>>> gone. Now the question is what to really test for when returning here. I
>>>>>>>>>>>> suppose we want the pipeline state of root here - should I
>>>>>>>>>>>> __ipipe_check_root_interruptible?
>>>>>>>>>>>
>>>>>>>>>>> This does not make sense, read the comment above that change: there
>>>>>>>>>>> is no way an interrupt can be taken, and so entering svc_entry, with
>>>>>>>>>>> interrupts off. Besides this is mainline code, so it would be a
>>>>>>>>>>> problem for mainline too. We are necessarily returning to a place
>>>>>>>>>>> where hardware irqs were on.
>>>>>>>>>>
>>>>>>>>>> Did you also look at the trace I posted?
>>>>>>>>>
>>>>>>>>> Yes, but I did not see what I am supposed to see. The only thing I
>>>>>>>>> see is that these trace functions should never have been called from
>>>>>>>>> rt domain in the first place.
>>>>>>>>>
>>>>>>>>
>>>>>>>> There is no RT domain in the trace, only an inconsistent Linux trace
>>>>>>>> state after return from IRQ.
>>>>>>>
>>>>>>> What can I say, when returning from IRQ, you are necessarily
>>>>>>> returning to a point where irqs are ON, as the comment says, and it
>>>>>>> makes perfect sense. So your "fix" should be a nop. So, something
>>>>>>> else is broken.
>>>>>>
>>>>>> The test for selecting trace_hardirqs_off/on is wrong, that's why I
>>>>>> was asking for a better check. Also, if that path can be taken by RT
>>>>>> domains as well, calling trace_hardirqs_off/on was always wrong, and we
>>>>>> additionally need to check for the caller's domain.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> Note that the reason this trace_irqs stuff is not working well
>>>>>>>>> may be that part of them are commented out with CONFIG_IPIPE
>>>>>>>>> (see asm_trace_hardirqs_on_cond, asm_trace_hardirqs_off)
>>>>>>>>
>>>>>>>> No, that doesn't solve all issues. Even with my hack (which may not
>>>>>>>> address all cases properly) plus the reversion of that commit, there are
>>>>>>>> still inconsistencies.
>>>>>>>
>>>>>>> You cannot revert that commit, otherwise you will end up calling
>>>>>>> trace_hardirqs_on/trace_hardirqs_off from RT domain, which, I
>>>>>>> repeat, can not work.
>>>>>>
>>>>>> I can help to understand if that is sufficient to resolve the tracing
>>>>>> breakage - it isn't, there are more paths missing or wrongly instrumented.
>>>>>
>>>>> My idea of all this is that CONFIG_TRACE_IRQFLAGS should depend on
>>>>> !IPIPE, since the I-pipe tracer provides the same functionality. And
>>>>> is not broken.
>>>>
>>>> No, the I-pipe trace does not provide a Linux lock dependency checker,
>>>> nor does it support might_sleep and such. If you have Linux drivers
>>>> which depend on Xenomai directly or indirectly, you cannot validate them
>>>> anymore. That's why we support this on x86.
>>>
>>> Since the I-pipe is already keeping track of irq state with
>>> CONFIG_IPIPE_TRACE_IRQSOFF, can we not use that information instead
>>> of trying and using this trace_hardirqs stuff which looks
>>> irremediably broken to me?
>>
>> The former reflects the hw state, the latter traces the Linux state -
>> from Linux POV.
>
> The I-pipe tracer keeps track of the root domain stall bit as well.
>
>>
>> This is fixable. We just need to call the tracing functions where Linux
>> would call it or where we replaced some Linux call with an I-pipe
>> specific path and avoid calling it when the domain != root. Identifying
>> those spots is tricky.
>
> If we take the example of an irq, we probably want not to call
> trace_hardirqs_on/trace_hardirqs_off anywhere, and just rely on the
> root domain stall bit.
Linux tracks the IRQ state separately from the (now virtualized) real
state - to validate the consistency independently of some spurious hard
irq enable/disable. And it tracks per task, not per CPU. It will be more
messy to fake this than to fix it, I'm quite sure.
Jan