From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <5461232E.5000006@web.de>
Date: Mon, 10 Nov 2014 21:42:22 +0100
From: Jan Kiszka <jan.kiszka@web.de>
MIME-Version: 1.0
References: <20141110194606.GO17476@sisyphus.hd.free.fr>
 <5461182E.1060201@web.de> <20141110200028.GQ17476@sisyphus.hd.free.fr>
 <546119F2.4080709@web.de> <20141110200632.GR17476@sisyphus.hd.free.fr>
 <54611BB7.1090500@web.de> <20141110201409.GS17476@sisyphus.hd.free.fr>
 <54611D4E.6090308@web.de> <20141110202326.GU17476@sisyphus.hd.free.fr>
 <54612002.6010102@web.de> <20141110203747.GV17476@sisyphus.hd.free.fr>
In-Reply-To: <20141110203747.GV17476@sisyphus.hd.free.fr>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Xenomai] "inconsistent lock state" on boot-up
List-Id: Discussions about the Xenomai project <xenomai.xenomai.org>
List-Unsubscribe: <http://www.xenomai.org/mailman/options/xenomai>,
 <mailto:xenomai-request@xenomai.org?subject=unsubscribe>
List-Archive: <http://www.xenomai.org/pipermail/xenomai/>
List-Post: <mailto:xenomai@xenomai.org>
List-Help: <mailto:xenomai-request@xenomai.org?subject=help>
List-Subscribe: <http://www.xenomai.org/mailman/listinfo/xenomai>,
 <mailto:xenomai-request@xenomai.org?subject=subscribe>
To: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
Cc: "xenomai@xenomai.org" <xenomai@xenomai.org>

On 2014-11-10 21:37, Gilles Chanteperdrix wrote:
> On Mon, Nov 10, 2014 at 09:28:50PM +0100, Jan Kiszka wrote:
>> On 2014-11-10 21:23, Gilles Chanteperdrix wrote:
>>> On Mon, Nov 10, 2014 at 09:17:18PM +0100, Jan Kiszka wrote:
>>>> On 2014-11-10 21:14, Gilles Chanteperdrix wrote:
>>>>> On Mon, Nov 10, 2014 at 09:10:31PM +0100, Jan Kiszka wrote:
>>>>>> On 2014-11-10 21:06, Gilles Chanteperdrix wrote:
>>>>>>> On Mon, Nov 10, 2014 at 09:02:58PM +0100, Jan Kiszka wrote:
>>>>>>>> On 2014-11-10 21:00, Gilles Chanteperdrix wrote:
>>>>>>>>> On Mon, Nov 10, 2014 at 08:55:26PM +0100, Jan Kiszka wrote:
>>>>>>>>>> On 2014-11-10 20:46, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On Mon, Nov 10, 2014 at 07:29:58PM +0100, Jan Kiszka wrote:
>>>>>>>>>>>> On 2014-11-10 16:56, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>> On Mon, Nov 10, 2014 at 03:52:41PM +0100, Jan Kiszka wrote:
>>>>>>>>>>>>>> On 2014-11-10 13:43, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>> On Mon, Nov 10, 2014 at 09:08:47AM +0000, Stoidner, Christoph wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Do you have the same message with exactly the same kernel
>>>>>>>>>>>>>>>>> configuration, only with CONFIG_XENOMAI and CONFIG_IPIPE disabled?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> When CONFIG_XENOMAI and CONFIG_IPIPE are disabled the message does not
>>>>>>>>>>>>>>>> appear on boot-up.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Do you have FCSE enabled? If yes, did you try disabling it? same
>>>>>>>>>>>>>>>>> with unlocked context switch.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> FCSE is already disabled.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Do you have an idea how to overcome the problem?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am not sure the lockdep message really is a problem. lockdep could
>>>>>>>>>>>>>>> be confused by the fact that the hardware interrupts are not off
>>>>>>>>>>>>>>> when running the I-pipe, or because we are missing some bit in the
>>>>>>>>>>>>>>> I-pipe arm specific code to get it looking at the virtual mask
>>>>>>>>>>>>>>> instead of the hardware mask.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As for the scheduling while atomic and random segmentation fault,
>>>>>>>>>>>>>>> you should use the I-pipe tracer, configure it with enough back
>>>>>>>>>>>>>>> trace points, something like 1000 or 10000, and trigger a trace
>>>>>>>>>>>>>>> freeze in the kernel code when the problem happens.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, for the "scheduling while atomic", it may happen if you call
>>>>>>>>>>>>>>> some Linux service which reschedules from primary mode, you can try
>>>>>>>>>>>>>>> enabling I-pipe debugging, and in fact all Xenomai debugging, to try
>>>>>>>>>>>>>>> and catch such mistakes. This is especially important if you are
>>>>>>>>>>>>>>> running a custom skin.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "Scheduling while atomic" may have the same reason why lockdep stumbles:
>>>>>>>>>>>>>> some changes of I-pipe mess up the IRQ state tracing of Linux. I just
>>>>>>>>>>>>>> started to look into this issue again. We tried earlier but got distracted.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I doubt that very much. Though I never run with lockdep, I sometimes
>>>>>>>>>>>>> run with CONFIG_PREEMPT, and never saw this message. From what I can
>>>>>>>>>>>>> see, the "scheduling while atomic" message is based on the
>>>>>>>>>>>>> preempt_count only and does not use irqs_disabled() (which by the
>>>>>>>>>>>>> way is known to work with I-pipe on ARM as well, so, if something is
>>>>>>>>>>>>> broken, that should be something more obscure).
>>>>>>>>>>>>
>>>>>>>>>>>> Let's see. I think I've identified one wrong path:
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S
>>>>>>>>>>>> index d32f8bd..ab911f8 100644
>>>>>>>>>>>> --- a/arch/arm/kernel/entry-header.S
>>>>>>>>>>>> +++ b/arch/arm/kernel/entry-header.S
>>>>>>>>>>>> @@ -198,7 +198,10 @@
>>>>>>>>>>>>  #ifdef CONFIG_TRACE_IRQFLAGS
>>>>>>>>>>>>  	@ The parent context IRQs must have been enabled to get here in
>>>>>>>>>>>>  	@ the first place, so there's no point checking the PSR I bit.
>>>>>>>>>>>> -	bl	trace_hardirqs_on
>>>>>>>>>>>> +	tst	\rpsr, #PSR_I_BIT
>>>>>>>>>>>> +	bleq	trace_hardirqs_off
>>>>>>>>>>>> +	tst	\rpsr, #PSR_I_BIT
>>>>>>>>>>>> +	blne	trace_hardirqs_on
>>>>>>>>>>>>  #endif
>>>>>>>>>>>>  	.else
>>>>>>>>>>>>  	@ IRQs off again before pulling preserved data off the stack
>>>>>>>>>>>>
>>>>>>>>>>>> This is probably no fix, but with that change applied, the warning is
>>>>>>>>>>>> gone. Now the question is what to really test for when returning here. I
>>>>>>>>>>>> suppose we want the pipeline state of root here - should I use
>>>>>>>>>>>> __ipipe_check_root_interruptible?
>>>>>>>>>>>
>>>>>>>>>>> This does not make sense, read the comment above that change: there
>>>>>>>>>>> is no way an interrupt can be taken, and svc_entry entered, with
>>>>>>>>>>> interrupts off. Besides this is mainline code, so it would be a
>>>>>>>>>>> problem for mainline too. We are necessarily returning to a place
>>>>>>>>>>> where hardware irqs were on.
>>>>>>>>>>
>>>>>>>>>> Did you also look at the trace I posted?
>>>>>>>>>
>>>>>>>>> Yes, but I did not see what I am supposed to see. The only thing I
>>>>>>>>> see is that these trace functions should never have been called from
>>>>>>>>> rt domain in the first place.
>>>>>>>>>
>>>>>>>>
>>>>>>>> There is no RT domain in the trace, only an inconsistent Linux trace
>>>>>>>> state after return from IRQ.
>>>>>>>
>>>>>>> What can I say, when returning from IRQ, you are necessarily
>>>>>>> returning to a point where irqs are ON, as the comment says, and it
>>>>>>> makes perfect sense. So your "fix" should be a nop. So, something
>>>>>>> else is broken.
>>>>>>
>>>>>> The test for selecting trace_hardirqs_off/on is wrong, that's why I
>>>>>> was asking for a better check. Also, if that path can be taken by RT
>>>>>> domains as well, calling trace_hardirqs_off/on was always wrong, and we
>>>>>> additionally need to check for the caller's domain.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> Note that this trace_irqs stuff not working well may be due to the
>>>>>>>>> fact that part of it is commented out with CONFIG_IPIPE
>>>>>>>>> (see asm_trace_hardirqs_on_cond, asm_trace_hardirqs_off)
>>>>>>>>
>>>>>>>> No, that doesn't solve all issues. Even with my hack (which may not
>>>>>>>> address all cases properly) plus the reversion of that commit, there are
>>>>>>>> still inconsistencies.
>>>>>>>
>>>>>>> You cannot revert that commit, otherwise you will end up calling
>>>>>>> trace_hardirqs_on/trace_hardirqs_off from RT domain, which, I
>>>>>>> repeat, cannot work.
>>>>>>
>>>>>> I can help to understand if that is sufficient to resolve the tracing
>>>>>> breakage - it isn't, there are more paths missing or wrongly instrumented.
>>>>>
>>>>> My idea of all this is that CONFIG_TRACE_IRQFLAGS should depend on
>>>>> !IPIPE, since the I-pipe tracer provides the same functionality. And
>>>>> is not broken.
>>>>
>>>> No, the I-pipe trace does not provide a Linux lock dependency checker,
>>>> nor does it support might_sleep and such. If you have Linux drivers
>>>> which depend on Xenomai directly or indirectly, you cannot validate them
>>>> anymore. That's why we support this on x86.
>>>
>>> Since the I-pipe is already keeping track of irq state with
>>> CONFIG_IPIPE_TRACE_IRQSOFF, can we not use that information instead
>>> of trying and using this trace_hardirqs stuff which looks
>>> irremediably broken to me?
>>
>> The former reflects the hw state, the latter traces the Linux state -
>> from Linux POV.
>
> The I-pipe tracer keeps track of the root domain stall bit as well.
>
>>
>> This is fixable. We just need to call the tracing functions where Linux
>> would call it or where we replaced some Linux call with an I-pipe
>> specific path and avoid calling it when the domain != root. Identifying
>> those spots is tricky.
>
> If we take the example of an irq, we probably want not to call
> trace_hardirqs_on/trace_hardirqs_off anywhere, and just rely on the
> root domain stall bit.

Linux tracks the IRQ state separately from the (now virtualized) real
state so that it can validate consistency independently of any spurious
hard IRQ enable/disable. And it tracks this per task, not per CPU. Faking
this will be messier than fixing it, I'm quite sure.
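
To make the direction a bit more concrete, here is a rough user-space
model in C. It is only a sketch under assumptions: every name in it
(struct task_model, struct cpu_model, model_irq_exit_trace, ...) is made
up for illustration and is not an existing I-pipe or lockdep symbol, and
the real instrumentation points still have to be identified. It models
the two points discussed above: the Linux-side trace state lives in the
task, and it is only updated when the caller is the root domain,
following the root stall bit rather than the hardware mask.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only: none of these names exist in I-pipe or lockdep. */
enum domain_model { DOMAIN_ROOT, DOMAIN_RT };

/* Per-task view: what Linux believes about the IRQ state of this task. */
struct task_model {
	bool hardirqs_enabled;
	int trace_updates;	/* number of times the trace state was touched */
};

/* Per-CPU view: pipeline state as seen by the interrupt return path. */
struct cpu_model {
	enum domain_model current_domain;
	bool root_stalled;	/* virtual IRQ mask of the root domain */
	bool hw_irqs_off;	/* real hardware mask, deliberately ignored below */
};

/* Stand-in for trace_hardirqs_on(): records the Linux-side state. */
static void model_trace_hardirqs_on(struct task_model *t)
{
	t->hardirqs_enabled = true;
	t->trace_updates++;
}

/* Stand-in for trace_hardirqs_off(). */
static void model_trace_hardirqs_off(struct task_model *t)
{
	t->hardirqs_enabled = false;
	t->trace_updates++;
}

/*
 * Models the decision on return from IRQ: skip tracing entirely for
 * non-root domains, otherwise mirror the root stall bit into the
 * per-task trace state.
 */
static void model_irq_exit_trace(const struct cpu_model *cpu,
				 struct task_model *t)
{
	assert(cpu && t);

	if (cpu->current_domain != DOMAIN_ROOT)
		return;	/* RT domain: Linux tracing must not be called */

	if (cpu->root_stalled)
		model_trace_hardirqs_off(t);
	else
		model_trace_hardirqs_on(t);
}
```

In this model a return into the RT domain leaves the task's trace state
untouched, and toggling hw_irqs_off alone never changes what Linux sees.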

Jan

