From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <4CDD4F85.6030009@domain.hid>
Date: Fri, 12 Nov 2010 15:30:29 +0100
From: Jan Kiszka <jan.kiszka@domain.hid>
MIME-Version: 1.0
References: <20101007115728.GA24500@domain.hid>	
	<20101013092617.GB6902@domain.hid>	
	<1286981521.1759.83.camel@domain.hid>	
	<1288025329.26618.132.camel@domain.hid> <4CC5C80E.2070004@domain.hid>	
	<1288033731.26618.161.camel@domain.hid> <4CC5D742.9080307@domain.hid>	
	<1288034435.26618.164.camel@domain.hid> <4CC5D8FF.5080109@domain.hid>	
	<1288041166.26618.182.camel@domain.hid> <4CC5F525.7040206@domain.hid>	
	<1288042858.26618.204.camel@domain.hid> <4CC5FAE6.6010305@domain.hid>	
	<1288068231.26618.224.camel@domain.hid>
	<4CC665A1.9040707@domain.hid>	 <4CC72D27.3010607@domain.hid>
	<1288243034.1816.14.camel@domain.hid>	 <4CC926BE.7040105@domain.hid>
	<1288251968.1816.22.camel@domain.hid>	
	<1289142959.1842.295.camel@domain.hid> <4CD6D22C.2030708@domain.hid>	
	<4CD8FFC4.5040202@domain.hid> <1289291217.1957.16.camel@domain.hid shift>	
	<4CD908AD.9000202@domain.hid> <1289295412.1957.32.camel@domain.hid>	
	<4CD948BB.4040201@domain.hid> <1289551711.1937.108.camel@domain.hid>	
	<4CDD055C.6040103@domain.hid> <1289570246.1937.422.camel@domain.hid>
In-Reply-To: <1289570246.1937.422.camel@domain.hid>
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature";
	boundary="------------enig32D620DF28BC32F5118470B1"
Sender: jan.kiszka@domain.hid
Subject: Re: [Xenomai-help] kernel oopses when killing realtime task
List-Id: Help regarding installation and common use of Xenomai
	<xenomai.xenomai.org>
List-Unsubscribe: <https://mail.gna.org/options/xenomai-help>,
	<mailto:xenomai-help-request@domain.hid>
List-Archive: </public/xenomai-help>
List-Post: <mailto:xenomai@xenomai.org>
List-Help: <mailto:xenomai-help-request@domain.hid>
List-Subscribe: <https://mail.gna.org/listinfo/xenomai-help>,
	<mailto:xenomai-help-request@domain.hid>
To: Philippe Gerum <rpm@xenomai.org>
Cc: xenomai@xenomai.org

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig32D620DF28BC32F5118470B1
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Am 12.11.2010 14:57, Philippe Gerum wrote:
> On Fri, 2010-11-12 at 10:14 +0100, Jan Kiszka wrote:
>> Am 12.11.2010 09:48, Philippe Gerum wrote:
>>> On Tue, 2010-11-09 at 14:12 +0100, Jan Kiszka wrote:
>>>> Am 09.11.2010 10:36, Philippe Gerum wrote:
>>>>> On Tue, 2010-11-09 at 09:39 +0100, Jan Kiszka wrote:
>>>>>> Am 09.11.2010 09:26, Philippe Gerum wrote:
>>>>>>> On Tue, 2010-11-09 at 09:01 +0100, Jan Kiszka wrote:
>>>>>>>> Am 07.11.2010 17:22, Jan Kiszka wrote:
>>>>>>>>> Am 07.11.2010 16:15, Philippe Gerum wrote:
>>>>>>>>>> The following patches implements the teardown approach. The ba=
sic idea
>>>>>>>>>> is:
>>>>>>>>>> - neither break nor improve old setups with legacy I-pipe patc=
hes not
>>>>>>>>>> providing the revised ipipe_control_irq call.
>>>>>>>>>> - fix the SMP race when detaching interrupts.
>>>>>>>>>
>>>>>>>>> Looks good.
>>>>>>>>
>>>>>>>> This actually causes one regression: I've just learned that peop=
le are
>>>>>>>> already happily using MSIs with Xenomai in the field. This is pe=
rfectly
>>>>>>>> fine as long as you don't fiddle with rtdm_irq_disable/enable in=

>>>>>>>> non-root contexts or while hard IRQs are disable. The latter req=
uirement
>>>>>>>> would be violated by this fix now.
>>>>>>>
>>>>>>> What we could do is handle this corner-case in the ipipe directly=
, going
>>>>>>> for a nop when IRQs are off on a per-arch basis only to please th=
ose
>>>>>>> users,
>>>>>>
>>>>>> Don't we disable hard IRQs also then the root domain is the only
>>>>>> registered one? I'm worried about pushing regressions around, then=
 to
>>>>>> plain Linux use-cases of MSI (which are not broken in anyway - exc=
ept
>>>>>> for powerpc).
>>>>>
>>>>> The idea is to provide an ad hoc ipipe service for this, to be used=
 by
>>>>> the HAL. A service that would check the controller for the target I=
RQ,
>>>>> and handle MSI ones conditionally. For sure, we just can't put thos=
e
>>>>> conditionally bluntly into the chip mask handler and expect the ker=
nel
>>>>> to be happy.
>>>>>
>>>>> In fact, we already have __ipipe_enable/disable_irq from the intern=
al
>>>>> Adeos interface avail, but they are mostly wrappers for now. We cou=
ld
>>>>> make them a bit more smart, and handle the MSI issue as well. We wo=
uld
>>>>> then tell the HAL to switch to using those arch-agnostic helpers
>>>>> generally, instead of peeking directly into the chip controller str=
ucts
>>>>> like today.
>>>>
>>>> This belongs to I-pipe, like we already have ipipe_end, just properl=
y
>>>> wrapped to avoid descriptor access. That's specifically important if=
 we
>>>> want to emulate MSI masking in software. I've the generic I-pipe
>>>> infrastructure ready, but the backend, so far consisting of x86 MSI
>>>> hardening, unfortunately needs to be rewritten.
>>>>
>>>>>
>>>>> If that ipipe "feature" is not detected by the HAL, then we would
>>>>> refrain from disabling the IRQ in xnintr_detach. In effect, this wo=
uld
>>>>> leave the SMP race window open, but since we need recent ipipes to =
get
>>>>> it plugged already anyway (for the revised ipipe_control_irq), we w=
ould
>>>>> still remain in the current situation:
>>>>> - old patches? no SMP race fix, no regression
>>>>> - new patches? SMP race fix avail, no regression
>>>>
>>>> Sounds good.
>>>
>>> Now that I slept on it, I find the approach of working around pipelin=
e
>>> limitations this way, to be incorrect.
>>>
>>> Basically, the issue is that we still don't have 100% reliable handli=
ng
>>> of MSI interrupts (actually, we only have partial handling, and solel=
y
>>> for x86), but this is no reason to introduce code in the pipeline
>>> interface which would perpetuate this fact. I see this as a "all or
>>> nothing" issue: either MSI is fully handled and there shall be no
>>> restriction on applying common operations such as masking/unmasking o=
n
>>> the related IRQs, or it is not, and we should not export "conditional=
ly
>>> working" APIs.
>>>
>>> In the latter case, the responsibility to rely on MSI support belongs=
 to
>>> the user, which then should know about the pending restrictions, and
>>> decides for himself whether to use MSI. So I'm heading to this soluti=
on
>>> instead:
>>>
>>> - when detaching the last handler for a given IRQ, instead of forcibl=
y
>>> disabling the IRQ line, the nucleus would just make sure that such IR=
Q
>>> is already in a disabled state, and bail out on error if not (probabl=
y
>>> with a kernel warning to make the issue obvious).
>>
>> Fiddling with the IRQ "line" state is a workaround for the missing
>> synchronize_irq service in Xenomai/I-pipe. If we had this, all this
>> disabling become unneeded.
>=20
> Actually, no. The synchronize irq service in the pipeline could have
> been introduced weeks ago in a snap, provided Xenomai did not call
> ipipe_virtualize_irq holding a lock with irqs off - this is in essence
> what my first patch to Pavel did, but could not work for the reason
> mentioned. That is what the patch we are discussing now is all about:
> clearing issues in Xenomai first.
>=20
> Additionally, having an IRQ disabled is a requirement to properly
> synchronize that IRQ later, it is not an option. It is part of the sync=

> protocol actually. If you don't do this, it lacks a basic guarantee, it=

> can't work.

The required pattern from driver POV is
 - disable IRQ at device level
 - flush pending IRQs
 - deregister handler

Linux does the last two steps in reverse order as it synchronizes
descriptor changes against descriptor usage internally. I-pipe requires
the above ordering (unless my patch is used to harden
ipipe_virtualize_irq against NULL handlers). Still, the driver should
find step 2 and 3 as part of rtdm_irq_free. That's all.

>=20
>>
>>>
>>> - track the IRQ line state from xnintr_enable/xnintr_disable routines=
,
>>> so that xnintr_detach can determine whether the call is legit. Of
>>> course, this also means that any attempt to take sideways to
>>> enable/disable nucleus managed interrupts at PIC level would break th=
at
>>> logic, but doing so would be the root bug anyway.
>>>
>>> The advantage of doing so would be three-fold:
>>>
>>> - no pipeline code to acknowledge (or even perpetuate) the fact that =
MSI
>>> support is half working, half broken. We need to fix it properly, so
>>> that we can use it 100% reliably, from whatever context commonly allo=
wed
>>> for enabling/disabling IRQs (and not "from root domain with IRQs on"
>>> only). Typically, I fail to see how one would cope with such limitati=
on,
>>> if a real-time handler detects that some device is going wild and rea=
lly
>>> needs to shut it down before the whole system crashes.
>>
>> MSIs are edge-triggered. Only broken hardware continuously sending bog=
us
>> messages can theoretically cause troubles. In practice (ie. in absence=

>> of broken HW), we see a single spurious IRQ at worst.
>>
>=20
> The driver should be able to work the same way whether it deals with MS=
I
> support, or is given pinned (level) IRQs.

Exactly my point, thus rtdm_irq_disable in driver code is a no-go.

>=20
>>>
>>> - we enforce the API usage requirement to disable an interrupt line w=
ith
>>> rtdm_irq_disable(), before eventually detaching the last IRQ handler =
for
>>> it, which is common sense anyway.
>>
>> That's an easy-to-get-wrong API. It would apply to non-shared IRQs onl=
y
>> (aka MSIs). No-go IMHO.
>=20
> - the software does not want to share MSIs, or something is quite wrong=

> - xnintr_detach() is broken by design in its current implementation for=

> shared handlers, because it leaves the decision to mask the IRQ line to=

> the user. This could not work for multiple non-cooperating drivers
> sharing an IRQ, at least not in a raceless way.

It works (minus the missing synchronize_irq) if you disable the IRQ at
device level as drivers normally do.

>=20
> So the issue is not that much about MSI vs pinned-type IRQs, it is
> rather about shared vs non-shared interrupts. What we want is a sane AP=
I
> and behavior for both types, which happens to fit the limited support w=
e
> have now for MSI, not the other way around.
>=20
> Therefore, until the MSI issue is fixed in the pipeline:
>=20
> - non-shared mode users should call xnintr_disable(irq) then
> xnintr_detach(irq) explicitly. This is what everybody should do today
> already (via rtdm_irq_disable() or whatever), since xnintr_detach() doe=
s
> not disable the IRQ in the current implementation.

No one should call rtdm_irq_disable today for whatever IRQ unless in
very few exceptional cases. I think I should extend the documentation of
rtdm_irq_enabled/disable ASAP to clarify their exceptional nature.

>=20
> - shared mode users should only call xnintr_detach(irq), which would
> then decide racelessly whether to disable the line. In essence, this
> could never affect MSIs.
>=20
> When the MSI support is generally available with no limitation on
> masking/unmasking contexts, then xnintr_detach() could be updated to
> forcibly disable the detached IRQ in the non-shared case as well. At
> worst, the legacy code would end up calling xnintr_disable() uselessly
> when the pipeline fix is merged.
>=20
> What we can do in the meantime, is 1) fixing xnintr_detach() in the
> shared case, 2) introducing an error check to prevent detaching
> non-shared IRQs that are still enabled. That won't have any impact on
> the partial MSI support we have today.
>=20
>>
>>>
>>> - absolutely no change for people who currently rely on partial MSI
>>> support, provided they duly disable IRQ lines before detaching their
>>> last handler via the appropriate RTDM interface.
>>>
>>> Can we deal on this?
>>>
>>
>> Nope, don't think so. The only option I see (besides using my original=

>> proposal of a dummy handler for deregistering - still much simpler tha=
n
>> the current patches) is to emulate MSI masking in the same run, thus
>> providing solutions for both issues.
>=20
> The dummy handler solution is incomplete because it does not address th=
e
> IRQ disabling issue in xnintr_detach() for shared IRQs. This is the gis=
t
> of the patch I sent, actually.

It is complete in this regard as it does not touch the line state. It
actually implements the Linux pattern.

>=20
> Besides, I'm still scratching my head: how would introducing a dummy
> handler help dealing with that sequence?
>=20
> CPU0					CPU1
>=20
> xnintr_disable =3D> mask irq		<irq>

No disable here, mask must be at device level.

> xnintr_detach				real-time handler
> 	<install dummy handler>			<touch device>
> 					ipipe_end_irq() =3D> unmask irq

xnarch_synchronize_irq

>=20
> Then, on whichever CPU, we could receive a device IRQ as we touched the=

> hardware before leaving the handler, passing that IRQ to the root domai=
n
> which has no proper handler for it:
>=20
> - lucky scenario: the device is in a state that won't cause it to kick
> yet another interrupt even if the last one was not serviced, and we jus=
t
> end up with a nifty spurious IRQ message in the kernel log.
>=20
> - out of luck: we touched the device on CPU1 in a way that claims for a=

> specific action by the software, which won't be able to perform it
> because it is now branching to the dummy handler. Whether we have some
> code to silence the device (and not only the IRQ line) before or after
> we detach the IRQ on CPU0 does not fix the issue, since we have no
> synchronization between the CPUs whatsoever.
>=20

The driver is responsible for synchronizing its shutdown sequence
between the cleanup code and any of its in-flight IRQs. If that handler
reenables the IRQ that the shutdown code just disabled (all at device
level, of course), something is seriously broken - but not on Xenomai
side. As I explained before: That potential in-flight IRQ becomes
spurious at the point the drivers starts the shutdown, thus can safely
be ignored.

Jan


--------------enig32D620DF28BC32F5118470B1
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.15 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org/

iEYEARECAAYFAkzdT4gACgkQitSsb3rl5xQ/RgCfVHQmNVaEL6rr0J9qdsg5oBvo
198AoIDJd19dbtpA17pdDKxo1LQFctLH
=26/o
-----END PGP SIGNATURE-----

--------------enig32D620DF28BC32F5118470B1--