can/should a disabled irq become pending?

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* can/should a disabled irq become pending?
@ 2024-11-11 13:03 Uwe Kleine-König
  2024-11-12 19:35 ` Thomas Gleixner
  0 siblings, 1 reply; 13+ messages in thread
From: Uwe Kleine-König @ 2024-11-11 13:03 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, Nuno Sá, Jonathan Cameron

[-- Attachment #1: Type: text/plain, Size: 1743 bytes --]

Hello Thomas,

[This mail is about an issue identified in a thread on the linux-iio
mailing list, but I guess you missed that because the Subject isn't
"interesting". See
https://lore.kernel.org/linux-iio/20241030204429.70cdcf35@jic23-huawei.
So here comes a dedicated mail about only this topic in the hope it
attracts your attention.]

The ad7124 ADC and a few cousins of it have the peculiarity to report
their interrupt on the SPI MISO line. So while the SPI clock is toggling
it behaves as MISO and after a transfer it's a falling edge irq signal.

So the driver masks the irq during SPI transfers (using irq_disable()).
It also uses irq_set_status_flags(sigma_delta->irq_line,
IRQ_DISABLE_UNLAZY);

Now on some irq controllers (e.g. the rpi GPIO controller) the detection
logic is off between calls to irq_disable() and irq_enable() and so on
that platform calling irq_enable() makes the irq handler called only on
a real irq, while for other irq controllers (e.g. the Altera GPIO
controller) the SPI transfers make the irq pending and irq_enable()
results in a call of the handler immediately (but very likely before the
device actually asserted the interrupt).

I think (but please correct me) that the latter behaviour has to be
expected and the former is even broken as it might miss irq events.

My conclusions from this are:

 - the ad7124 driver needs some additional logic to check the actual
   line state in the irq handler to differentiate between "real" irqs
   and spurious ones triggered by SPI transfers[1];

 - the rpi GPIO controller should be fixed not to honor
   IRQ_DISABLE_UNLAZY.

Can you confirm these?

Best regards
Uwe

[1] it can only check the level, not a passed edge, but that seems to
    work in practise

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: can/should a disabled irq become pending?
  2024-11-11 13:03 can/should a disabled irq become pending? Uwe Kleine-König
@ 2024-11-12 19:35 ` Thomas Gleixner
  2024-11-12 22:08   ` Uwe Kleine-König
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2024-11-12 19:35 UTC (permalink / raw)
  To: Uwe Kleine-König; +Cc: linux-kernel, Nuno Sá, Jonathan Cameron

Uwe!

On Mon, Nov 11 2024 at 14:03, Uwe Kleine-König wrote:
> [This mail is about an issue identified in a thread on the linux-iio
> mailing list, but I guess you missed that because the Subject isn't
> "interesting".

Pretty much so :)

> The ad7124 ADC and a few cousins of it have the peculiarity to report
> their interrupt on the SPI MISO line. So while the SPI clock is toggling
> it behaves as MISO and after a transfer it's a falling edge irq
> signal.

Saves a pin and the fallout can be handled in software, or not :)

> So the driver masks the irq during SPI transfers (using irq_disable()).
> It also uses irq_set_status_flags(sigma_delta->irq_line,
> IRQ_DISABLE_UNLAZY);
>
> Now on some irq controllers (e.g. the rpi GPIO controller) the detection
> logic is off between calls to irq_disable() and irq_enable() and so on
> that platform calling irq_enable() makes the irq handler called only on
> a real irq, while for other irq controllers (e.g. the Altera GPIO
> controller) the SPI transfers make the irq pending and irq_enable()
> results in a call of the handler immediately (but very likely before the
> device actually asserted the interrupt).
>
> I think (but please correct me) that the latter behaviour has to be
> expected and the former is even broken as it might miss irq events.

The default lazy disabling of interrupts has two reasons:

    1) Avoid the potentially expensive mask/unmask operations for the
       common case where no interrupt comes in between mask() and
       unmask()

    2) Addressing the problem of edge interrupt controllers, which
       disable the edge detection logic when masked, which in turn
       causes interrupts to be lost.

IRQ_DISABLE_UNLAZY solves a different problem. See commit e9849777d0e2
("genirq: Add flag to force mask in disable_irq[_nosync]()") See the
full discussion at:

  https://lore.kernel.org/all/alpine.DEB.2.11.1510092348370.6097@nanos/T/#m08531ad84aa9a53c26f91fd55be60fad6b5607b9

> My conclusions from this are:

>  - the rpi GPIO controller should be fixed not to honor
>    IRQ_DISABLE_UNLAZY.

I don't think that there is something to fix which is specific to the
RPI GPIO controller. IRQ_DISABLE_UNLAZY is clearly defined and of course
it will cause the problem #2 for edge type interrupts on such chips.

Ignoring IRQ_DISABLE_UNLAZY might be the right thing to do for this
AD7*** use case, but is it universially correct?

It's probably correct to do so for edge type interrupts. For level type,
not so much.

>  - the ad7124 driver needs some additional logic to check the actual
>    line state in the irq handler to differentiate between "real" irqs
>    and spurious ones triggered by SPI transfers[1];
>
> [1] it can only check the level, not a passed edge, but that seems to
>     work in practise

Right, because once the pin is in /RDY mode again it stays low until the
next update happens.

But this requires that the chip is connected to a GPIO based interrupt
line. Non GPIO based interrupt controllers are not providing a line
state readback.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: can/should a disabled irq become pending?
  2024-11-12 19:35 ` Thomas Gleixner
@ 2024-11-12 22:08   ` Uwe Kleine-König
  2024-11-13  3:40     ` Thomas Gleixner
  0 siblings, 1 reply; 13+ messages in thread
From: Uwe Kleine-König @ 2024-11-12 22:08 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, Nuno Sá, Jonathan Cameron

[-- Attachment #1: Type: text/plain, Size: 5343 bytes --]

Hello Thomas,

On Tue, Nov 12, 2024 at 08:35:13PM +0100, Thomas Gleixner wrote:
> On Mon, Nov 11 2024 at 14:03, Uwe Kleine-König wrote:
> > The ad7124 ADC and a few cousins of it have the peculiarity to report
> > their interrupt on the SPI MISO line. So while the SPI clock is toggling
> > it behaves as MISO and after a transfer it's a falling edge irq
> > signal.
> 
> Saves a pin and the fallout can be handled in software, or not :)
> 
> > So the driver masks the irq during SPI transfers (using irq_disable()).
> > It also uses irq_set_status_flags(sigma_delta->irq_line,
> > IRQ_DISABLE_UNLAZY);
> >
> > Now on some irq controllers (e.g. the rpi GPIO controller) the detection
> > logic is off between calls to irq_disable() and irq_enable() and so on
> > that platform calling irq_enable() makes the irq handler called only on
> > a real irq, while for other irq controllers (e.g. the Altera GPIO
> > controller) the SPI transfers make the irq pending and irq_enable()
> > results in a call of the handler immediately (but very likely before the
> > device actually asserted the interrupt).
> >
> > I think (but please correct me) that the latter behaviour has to be
> > expected and the former is even broken as it might miss irq events.
> 
> The default lazy disabling of interrupts has two reasons:
> 
>     1) Avoid the potentially expensive mask/unmask operations for the
>        common case where no interrupt comes in between mask() and
>        unmask()
> 
>     2) Addressing the problem of edge interrupt controllers, which
>        disable the edge detection logic when masked, which in turn
>        causes interrupts to be lost.

Ack, so far I understood.

> IRQ_DISABLE_UNLAZY solves a different problem. See commit e9849777d0e2
> ("genirq: Add flag to force mask in disable_irq[_nosync]()") See the
> full discussion at:
> 
>   https://lore.kernel.org/all/alpine.DEB.2.11.1510092348370.6097@nanos/T/#m08531ad84aa9a53c26f91fd55be60fad6b5607b9

I found that thread already before. I didn't understand the
motivation/problem fixed there though as the thread start is missing on
lore. I guess the problem was "taking each interrupt twice" as mentioned
in the commit log, but I don't grokk that either. What means "taken"?
The controller's irq routine called; or the driver's handler? Or
something in hardware? I don't see something being done twice.

If I understand correctly, IRQ_DISABLE_UNLAZY is only an optimisation,
but isn't supposed to affect correctness.

> > My conclusions from this are:
> 
> >  - the rpi GPIO controller should be fixed not to honor
> >    IRQ_DISABLE_UNLAZY.
> 
> I don't think that there is something to fix which is specific to the
> RPI GPIO controller. IRQ_DISABLE_UNLAZY is clearly defined and of course
> it will cause the problem #2 for edge type interrupts on such chips.

So a driver making use of IRQ_DISABLE_UNLAZY must know itself if the
irq is provided by such a problematic controller? IMHO that sounds
wrong and the controller (or the irq subsystem) should stick to the lazy
disable approach in that case even though the driver requested to
disable it because not doing the optimisation is better than possibly
missing interrupts.

> Ignoring IRQ_DISABLE_UNLAZY might be the right thing to do for this
> AD7*** use case, but is it universially correct?

If the device driver is agnostic to which irq controller is used (which
it should be) it must be prepared to handle an irq that triggered while
it was masked. So with my understanding (which might be a relevant
restriction) ignoring IRQ_DISABLE_UNLAZY is correct. However that
doesn't mean that drivers (as for example the current ad7124) would
break by such a change as it currently depends on that #2 brokeness.

> It's probably correct to do so for edge type interrupts. For level type,
> not so much.

I'm missing a few details here to follow. E.g. is a level irq supposed
to become pending when the irq line gets only shortly active while the
irq is masked or while irqs are disabled globally?

> >  - the ad7124 driver needs some additional logic to check the actual
> >    line state in the irq handler to differentiate between "real" irqs
> >    and spurious ones triggered by SPI transfers[1];

Do you agree that this is necessary? I created a patch for that and
Jonathan had some doubts this is a valid approach.

> > [1] it can only check the level, not a passed edge, but that seems to
> >     work in practise
> 
> Right, because once the pin is in /RDY mode again it stays low until the
> next update happens.
> 
> But this requires that the chip is connected to a GPIO based interrupt
> line. Non GPIO based interrupt controllers are not providing a line
> state readback.

Ack. If there is no way to read back the line state and it's unknown if
the irq controller suffers from problem #2, the only way to still
benefit from the irq is to not use IRQ_DISABLE_UNLAZY and only act on
each 2nd irq; or ignore irqs based on timing. That doesn't sound very
robust though, so maybe the driver has to fall back on polling the
status register and not use irqs at all in that case.

It's even unknown to me if all GPIO controller support reading the line
if they operate in irq mode.

Best regards
Uwe

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: can/should a disabled irq become pending?
  2024-11-12 22:08   ` Uwe Kleine-König
@ 2024-11-13  3:40     ` Thomas Gleixner
  2024-11-13 10:34       ` Nuno Sá
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2024-11-13  3:40 UTC (permalink / raw)
  To: Uwe Kleine-König; +Cc: linux-kernel, Nuno Sá, Jonathan Cameron

Uwe!

On Tue, Nov 12 2024 at 23:08, Uwe Kleine-König wrote:
> On Tue, Nov 12, 2024 at 08:35:13PM +0100, Thomas Gleixner wrote:
>> IRQ_DISABLE_UNLAZY solves a different problem. See commit e9849777d0e2
>> ("genirq: Add flag to force mask in disable_irq[_nosync]()") See the
>> full discussion at:
>> 
>>   https://lore.kernel.org/all/alpine.DEB.2.11.1510092348370.6097@nanos/T/#m08531ad84aa9a53c26f91fd55be60fad6b5607b9
>
> I found that thread already before. I didn't understand the
> motivation/problem fixed there though as the thread start is missing on
> lore. I guess the problem was "taking each interrupt twice" as mentioned
> in the commit log, but I don't grokk that either. What means "taken"?
> The controller's irq routine called; or the driver's handler? Or
> something in hardware? I don't see something being done twice.

The problem on that particular hardware is that disabling interrupts at
the device level is not working. That's discussed somewhere in the
thread. The device seems to always raise an interrupt during the
disabled time. So with lazy disable this will invoke the corresponding
low level flow handler which sees 'disabled' and masks it.

On enable the interrupt is unmasked and retriggered. That double
handling costs quite some performance.

> If I understand correctly, IRQ_DISABLE_UNLAZY is only an optimisation,
> but isn't supposed to affect correctness.

That's correct.

>> > My conclusions from this are:
>> 
>> >  - the rpi GPIO controller should be fixed not to honor
>> >    IRQ_DISABLE_UNLAZY.
>> 
>> I don't think that there is something to fix which is specific to the
>> RPI GPIO controller. IRQ_DISABLE_UNLAZY is clearly defined and of course
>> it will cause the problem #2 for edge type interrupts on such chips.
>
> So a driver making use of IRQ_DISABLE_UNLAZY must know itself if the
> irq is provided by such a problematic controller?

What I meant is that this is a problem which affects all interrupt chips
which disable the edge detection logic on mask().

>> Ignoring IRQ_DISABLE_UNLAZY might be the right thing to do for this
>> AD7*** use case, but is it universially correct?
>
> If the device driver is agnostic to which irq controller is used (which
> it should be) it must be prepared to handle an irq that triggered while
> it was masked.

The interrupt does not get to the device handler even in the lazy
disable case. Once the driver invoked disable_irq*() the low level flow
handlers (edge, level ...) mask the interrupt line and marks the
interrupt pending. enable_irq() retriggers the interrupt when the
pending bit is set, except when the interrupt line is level triggered.

>> It's probably correct to do so for edge type interrupts. For level type,
>> not so much.
>
> I'm missing a few details here to follow. E.g. is a level irq supposed
> to become pending when the irq line gets only shortly active while the
> irq is masked or while irqs are disabled globally?

Level type interrupts are not transient by definition. They stay
asserted until the driver acknowledged them in the device. That's why
level triggered interrupts are excluded from retrigger. The hardware
takes care of that already. So UNLAZY just works for level triggered
interrupts.

On controllers which suffer from the #2 problem UNLAZY should indeed be
ignored for edge type interrupts. That's something which the controller
should signal via a irqchip flag and the core code can act upon it and
ignore UNLAZY for edge type interrupts.

But that won't fix the problem at hand. Let's take a step back and look
at the larger picture whether this can be reliably "fixed" at all.

The lazy disable case:

   disable_irq();

   // After this point the drivers interrupt handler is not longer
   // invoked. An eventually raised interrupt causes the line to
   // be masked and the interrupt is marked pending.

   do_spi_transfer();   // Triggers interrupt because of stupid hardware

   enable_irq();        // Interrupt is retriggered in software

   device_handler()

The UNLAZY case where the interrupt controller does not disable edge
detection is not really different:

   disable_irq();

   // After this point the drivers interrupt handler is not longer
   // invoked. The line is masked

   do_spi_transfer();   // Triggers interrupt because of stupid hardware
                        // which is marked pending in the interrupt controller

   enable_irq();        // Interrupt is raised by the interrupt controller

   device_handler()

In both cases the device handler is up the creek without a paddle
because it cannot tell whether this is a real data ready interrupt or
just the artifact of the SPI transfer.

The UNLAZY case where the interrupt controller disables edge
detection suffers from a different problem:

   disable_irq();

   // After this point the drivers interrupt handler is not longer
   // invoked. The line is masked

   do_spi_transfer();   // Triggers interrupt because of stupid hardware
                        // which is ignored by the interrupt controller

   enable_irq();        // Races against data ready interrupt

If enable_irq() is invoked _after_ the device triggered the data ready
interrupt, then the interrupt is lost.

IOW, there is no reliable software solution for this problem unless you
have a microcontroller where you have finegrained control over all of
this.

The proper way to handle this for a general purpose CPU/OS is to gate off
the interrupt line in hardware when there is a SPI transfer in progress:

    /RDY ---|>o---|&&
                  |&&------ CPU interrupt pin (raising edge)
    /CS  ---------|&&

>> >  - the ad7124 driver needs some additional logic to check the actual
>> >    line state in the irq handler to differentiate between "real" irqs
>> >    and spurious ones triggered by SPI transfers[1];
>
> Do you agree that this is necessary? I created a patch for that and
> Jonathan had some doubts this is a valid approach.

Something needs to be done obviously.

>> > [1] it can only check the level, not a passed edge, but that seems to
>> >     work in practise
>> 
>> Right, because once the pin is in /RDY mode again it stays low until the
>> next update happens.
>> 
>> But this requires that the chip is connected to a GPIO based interrupt
>> line. Non GPIO based interrupt controllers are not providing a line
>> state readback.
>
> Ack. If there is no way to read back the line state and it's unknown if
> the irq controller suffers from problem #2, the only way to still
> benefit from the irq is to not use IRQ_DISABLE_UNLAZY and only act on
> each 2nd irq; or ignore irqs based on timing. That doesn't sound very
> robust though, so maybe the driver has to fall back on polling the
> status register and not use irqs at all in that case.

Actually ignoring the first interrupt after a SPI transfer and waiting
for the next conversion to raise the interrupt again should be robust
enough. The ADC has to be in continous conversion mode for that
obviously.

> It's even unknown to me if all GPIO controller support reading the line
> if they operate in irq mode.

Probably not.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: can/should a disabled irq become pending?
  2024-11-13  3:40     ` Thomas Gleixner
@ 2024-11-13 10:34       ` Nuno Sá
  2024-11-13 15:50         ` Thomas Gleixner
  0 siblings, 1 reply; 13+ messages in thread
From: Nuno Sá @ 2024-11-13 10:34 UTC (permalink / raw)
  To: Thomas Gleixner, Uwe Kleine-König; +Cc: linux-kernel, Jonathan Cameron

Hi Thomas,

On Wed, 2024-11-13 at 04:40 +0100, Thomas Gleixner wrote:
> Uwe!
> 
> On Tue, Nov 12 2024 at 23:08, Uwe Kleine-König wrote:
> > On Tue, Nov 12, 2024 at 08:35:13PM +0100, Thomas Gleixner wrote:
> > > IRQ_DISABLE_UNLAZY solves a different problem. See commit e9849777d0e2
> > > ("genirq: Add flag to force mask in disable_irq[_nosync]()") See the
> > > full discussion at:
> > > 
> > >  
> > > https://lore.kernel.org/all/alpine.DEB.2.11.1510092348370.6097@nanos/T/#m08531ad84aa9a53c26f91fd55be60fad6b5607b9
> > 
> > I found that thread already before. I didn't understand the
> > motivation/problem fixed there though as the thread start is missing on
> > lore. I guess the problem was "taking each interrupt twice" as mentioned
> > in the commit log, but I don't grokk that either. What means "taken"?
> > The controller's irq routine called; or the driver's handler? Or
> > something in hardware? I don't see something being done twice.
> 
> The problem on that particular hardware is that disabling interrupts at
> the device level is not working. That's discussed somewhere in the
> thread. The device seems to always raise an interrupt during the
> disabled time. So with lazy disable this will invoke the corresponding
> low level flow handler which sees 'disabled' and masks it.
> 
> On enable the interrupt is unmasked and retriggered. That double
> handling costs quite some performance.
> 
> > If I understand correctly, IRQ_DISABLE_UNLAZY is only an optimisation,
> > but isn't supposed to affect correctness.
> 
> That's correct.
> 
> > > > My conclusions from this are:
> > > 
> > > >  - the rpi GPIO controller should be fixed not to honor
> > > >    IRQ_DISABLE_UNLAZY.
> > > 
> > > I don't think that there is something to fix which is specific to the
> > > RPI GPIO controller. IRQ_DISABLE_UNLAZY is clearly defined and of course
> > > it will cause the problem #2 for edge type interrupts on such chips.
> > 
> > So a driver making use of IRQ_DISABLE_UNLAZY must know itself if the
> > irq is provided by such a problematic controller?
> 
> What I meant is that this is a problem which affects all interrupt chips
> which disable the edge detection logic on mask().
> 
> > > Ignoring IRQ_DISABLE_UNLAZY might be the right thing to do for this
> > > AD7*** use case, but is it universially correct?
> > 
> > If the device driver is agnostic to which irq controller is used (which
> > it should be) it must be prepared to handle an irq that triggered while
> > it was masked.
> 
> The interrupt does not get to the device handler even in the lazy
> disable case. Once the driver invoked disable_irq*() the low level flow
> handlers (edge, level ...) mask the interrupt line and marks the
> interrupt pending. enable_irq() retriggers the interrupt when the
> pending bit is set, except when the interrupt line is level triggered.

There's something that I'm still trying to figure... For IRQ controllers that not
disable edge detection, can't we get the device handler called twice if we don't set
unlazy?

irq_enable() - > check_irq_resend()

and then

handle_edge_irq() raised by the controller

Or is the core handling this somehow? I thought IRQS_REPLAY could be doing the trick
but I'm not seeing how...

> 
> > > It's probably correct to do so for edge type interrupts. For level type,
> > > not so much.
> > 
> > I'm missing a few details here to follow. E.g. is a level irq supposed
> > to become pending when the irq line gets only shortly active while the
> > irq is masked or while irqs are disabled globally?
> 
> Level type interrupts are not transient by definition. They stay
> asserted until the driver acknowledged them in the device. That's why
> level triggered interrupts are excluded from retrigger. The hardware
> takes care of that already. So UNLAZY just works for level triggered
> interrupts.
> 
> On controllers which suffer from the #2 problem UNLAZY should indeed be
> ignored for edge type interrupts. That's something which the controller
> should signal via a irqchip flag and the core code can act upon it and
> ignore UNLAZY for edge type interrupts.
> 
> But that won't fix the problem at hand. Let's take a step back and look
> at the larger picture whether this can be reliably "fixed" at all.
> 

Yeah, I'm still trying to figure when it's correct for a device to do UNLAZY? If I'm
understanding things, devices that rely on disable_irq*() should set it? Because
problem #2 is something that needs to be handled at the controller and core level if
I got you right.

> The lazy disable case:
> 
>    disable_irq();
> 
>    // After this point the drivers interrupt handler is not longer
>    // invoked. An eventually raised interrupt causes the line to
>    // be masked and the interrupt is marked pending.
>    
>    do_spi_transfer();   // Triggers interrupt because of stupid hardware
> 
>    enable_irq();        // Interrupt is retriggered in software
> 
>    device_handler()
> 
> The UNLAZY case where the interrupt controller does not disable edge
> detection is not really different:
> 
>    disable_irq();
> 
>    // After this point the drivers interrupt handler is not longer
>    // invoked. The line is masked
>    
>    do_spi_transfer();   // Triggers interrupt because of stupid hardware
>                         // which is marked pending in the interrupt controller
> 
>    enable_irq();        // Interrupt is raised by the interrupt controller
> 
>    device_handler()
> 
> In both cases the device handler is up the creek without a paddle
> because it cannot tell whether this is a real data ready interrupt or
> just the artifact of the SPI transfer.
> 
> The UNLAZY case where the interrupt controller disables edge
> detection suffers from a different problem:
> 
>    disable_irq();
> 
>    // After this point the drivers interrupt handler is not longer
>    // invoked. The line is masked
>    
>    do_spi_transfer();   // Triggers interrupt because of stupid hardware
>                         // which is ignored by the interrupt controller
> 
>    enable_irq();        // Races against data ready interrupt
> 
> If enable_irq() is invoked _after_ the device triggered the data ready
> interrupt, then the interrupt is lost.

And this had the side effect for things to work with such controllers. See [1]. That
was affecting "single shot" conversions rather than IIO buffering (continuous mode)
where the pending IRQ was firing right after enable_irq() causing a completion to be
signaled and then not reading a valid sample.

I see now that was a bandaid and not a proper fix and won't work with all
controllers.
  
> 
> IOW, there is no reliable software solution for this problem unless you
> have a microcontroller where you have finegrained control over all of
> this.
> 
> The proper way to handle this for a general purpose CPU/OS is to gate off
> the interrupt line in hardware when there is a SPI transfer in progress:
> 
>     /RDY ---|>o---|&&
>                   |&&------ CPU interrupt pin (raising edge)
>     /CS  ---------|&&
> 
> > > >  - the ad7124 driver needs some additional logic to check the actual
> > > >    line state in the irq handler to differentiate between "real" irqs
> > > >    and spurious ones triggered by SPI transfers[1];
> > 
> > Do you agree that this is necessary? I created a patch for that and
> > Jonathan had some doubts this is a valid approach.
> 
> Something needs to be done obviously.
> 
> > > > [1] it can only check the level, not a passed edge, but that seems to
> > > >     work in practise
> > > 
> > > Right, because once the pin is in /RDY mode again it stays low until the
> > > next update happens.
> > > 
> > > But this requires that the chip is connected to a GPIO based interrupt
> > > line. Non GPIO based interrupt controllers are not providing a line
> > > state readback.
> > 
> > Ack. If there is no way to read back the line state and it's unknown if
> > the irq controller suffers from problem #2, the only way to still
> > benefit from the irq is to not use IRQ_DISABLE_UNLAZY and only act on
> > each 2nd irq; or ignore irqs based on timing. That doesn't sound very
> > robust though, so maybe the driver has to fall back on polling the
> > status register and not use irqs at all in that case.
> 
> Actually ignoring the first interrupt after a SPI transfer and waiting
> for the next conversion to raise the interrupt again should be robust
> enough. The ADC has to be in continous conversion mode for that
> obviously.
> 

Might be the only sane option we have, Uwe? If we do this, we could be dropping valid
samples but only with controllers that suffer from #2. And in that case, we can loose
samples anyways due to the irq_enable() race.

[1]: https://lore.kernel.org/linux-iio/20230306044737.862-1-honda@mechatrax.com/

Thanks for all the clarifications!
- Nuno Sá
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: can/should a disabled irq become pending?
  2024-11-13 10:34       ` Nuno Sá
@ 2024-11-13 15:50         ` Thomas Gleixner
  2024-11-14  7:49           ` Nuno Sá
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2024-11-13 15:50 UTC (permalink / raw)
  To: Nuno Sá, Uwe Kleine-König; +Cc: linux-kernel, Jonathan Cameron

On Wed, Nov 13 2024 at 11:34, Nuno Sá wrote:
> On Wed, 2024-11-13 at 04:40 +0100, Thomas Gleixner wrote:
>> The interrupt does not get to the device handler even in the lazy
>> disable case. Once the driver invoked disable_irq*() the low level flow
>> handlers (edge, level ...) mask the interrupt line and marks the
>> interrupt pending. enable_irq() retriggers the interrupt when the
>> pending bit is set, except when the interrupt line is level triggered.
>
> There's something that I'm still trying to figure... For IRQ controllers that not
> disable edge detection, can't we get the device handler called twice if we don't set
> unlazy?
>
> irq_enable() - > check_irq_resend()
>
> and then
>
> handle_edge_irq() raised by the controller

You're right. We should have a flag which controls the replay
requirements of an interrupt controller. So far it only skips for level
triggered interrupts, but for those controllers it should skip for edge
too. Something like IRQCHIP_NO_RESEND ...

> Or is the core handling this somehow? I thought IRQS_REPLAY could be
> doing the trick but I'm not seeing how...

IRQS_REPLAY is just internal state to avoid double replay.

>> On controllers which suffer from the #2 problem UNLAZY should indeed be
>> ignored for edge type interrupts. That's something which the controller
>> should signal via a irqchip flag and the core code can act upon it and
>> ignore UNLAZY for edge type interrupts.
>> 
>> But that won't fix the problem at hand. Let's take a step back and look
>> at the larger picture whether this can be reliably "fixed" at all.
>> 
>
> Yeah, I'm still trying to figure when it's correct for a device to do UNLAZY? If I'm
> understanding things, devices that rely on disable_irq*() should set
> it?

Not necessarily. In most cases devices are not re-raising interrupts
before the previous one has been handled and acknowledged in the device.

> Because problem #2 is something that needs to be handled at the
> controller and core level if I got you right.

Yes. We need a irqchip flag for that.

>> > Ack. If there is no way to read back the line state and it's unknown if
>> > the irq controller suffers from problem #2, the only way to still
>> > benefit from the irq is to not use IRQ_DISABLE_UNLAZY and only act on
>> > each 2nd irq; or ignore irqs based on timing. That doesn't sound very
>> > robust though, so maybe the driver has to fall back on polling the
>> > status register and not use irqs at all in that case.
>> 
>> Actually ignoring the first interrupt after a SPI transfer and waiting
>> for the next conversion to raise the interrupt again should be robust
>> enough. The ADC has to be in continous conversion mode for that
>> obviously.
>> 
> Might be the only sane option we have, Uwe? If we do this, we could be
> dropping valid samples but only with controllers that suffer from
> #2.

No. You have the same problem with the controllers which do not disable
the edge detection logic.

The interrupt controller raises the interrupt on unmask (enable_irq()).
Depending on timing the device handler might be invoked _before_ the
sample is ready, no?

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: can/should a disabled irq become pending?
  2024-11-13 15:50         ` Thomas Gleixner
@ 2024-11-14  7:49           ` Nuno Sá
  2024-11-14 10:59             ` Uwe Kleine-König
  0 siblings, 1 reply; 13+ messages in thread
From: Nuno Sá @ 2024-11-14  7:49 UTC (permalink / raw)
  To: Thomas Gleixner, Uwe Kleine-König; +Cc: linux-kernel, Jonathan Cameron

On Wed, 2024-11-13 at 16:50 +0100, Thomas Gleixner wrote:
> On Wed, Nov 13 2024 at 11:34, Nuno Sá wrote:
> > On Wed, 2024-11-13 at 04:40 +0100, Thomas Gleixner wrote:
> > > The interrupt does not get to the device handler even in the lazy
> > > disable case. Once the driver invoked disable_irq*() the low level flow
> > > handlers (edge, level ...) mask the interrupt line and marks the
> > > interrupt pending. enable_irq() retriggers the interrupt when the
> > > pending bit is set, except when the interrupt line is level triggered.
> > 
> > There's something that I'm still trying to figure... For IRQ controllers
> > that not
> > disable edge detection, can't we get the device handler called twice if we
> > don't set
> > unlazy?
> > 
> > irq_enable() - > check_irq_resend()
> > 
> > and then
> > 
> > handle_edge_irq() raised by the controller
> 
> You're right. We should have a flag which controls the replay
> requirements of an interrupt controller. So far it only skips for level
> triggered interrupts, but for those controllers it should skip for edge
> too. Something like IRQCHIP_NO_RESEND ...
> 
> > Or is the core handling this somehow? I thought IRQS_REPLAY could be
> > doing the trick but I'm not seeing how...
> 
> IRQS_REPLAY is just internal state to avoid double replay.
> 
> > > On controllers which suffer from the #2 problem UNLAZY should indeed be
> > > ignored for edge type interrupts. That's something which the controller
> > > should signal via a irqchip flag and the core code can act upon it and
> > > ignore UNLAZY for edge type interrupts.
> > > 
> > > But that won't fix the problem at hand. Let's take a step back and look
> > > at the larger picture whether this can be reliably "fixed" at all.
> > > 
> > 
> > Yeah, I'm still trying to figure when it's correct for a device to do
> > UNLAZY? If I'm
> > understanding things, devices that rely on disable_irq*() should set
> > it?
> 
> Not necessarily. In most cases devices are not re-raising interrupts
> before the previous one has been handled and acknowledged in the device.
> 
> > Because problem #2 is something that needs to be handled at the
> > controller and core level if I got you right.
> 
> Yes. We need a irqchip flag for that.
> 
> > > > Ack. If there is no way to read back the line state and it's unknown if
> > > > the irq controller suffers from problem #2, the only way to still
> > > > benefit from the irq is to not use IRQ_DISABLE_UNLAZY and only act on
> > > > each 2nd irq; or ignore irqs based on timing. That doesn't sound very
> > > > robust though, so maybe the driver has to fall back on polling the
> > > > status register and not use irqs at all in that case.
> > > 
> > > Actually ignoring the first interrupt after a SPI transfer and waiting
> > > for the next conversion to raise the interrupt again should be robust
> > > enough. The ADC has to be in continous conversion mode for that
> > > obviously.
> > > 
> > Might be the only sane option we have, Uwe? If we do this, we could be
> > dropping valid samples but only with controllers that suffer from
> > #2.
> 
> No. You have the same problem with the controllers which do not disable
> the edge detection logic.
> 
> The interrupt controller raises the interrupt on unmask (enable_irq()).
> Depending on timing the device handler might be invoked _before_ the
> sample is ready, no?
> 

For those controllers, I think it's almost always guaranteed that the first IRQ
after enable is not really a valid sample. We'll always have some SPI transfer
(that should latch an IRQ on the controller) before enable_irq().

- Nuno Sá


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: can/should a disabled irq become pending?
  2024-11-14  7:49           ` Nuno Sá
@ 2024-11-14 10:59             ` Uwe Kleine-König
  2024-11-14 12:04               ` Nuno Sá
  0 siblings, 1 reply; 13+ messages in thread
From: Uwe Kleine-König @ 2024-11-14 10:59 UTC (permalink / raw)
  To: Nuno Sá; +Cc: Thomas Gleixner, linux-kernel, Jonathan Cameron

[-- Attachment #1: Type: text/plain, Size: 5234 bytes --]

Hello,

On Thu, Nov 14, 2024 at 08:49:34AM +0100, Nuno Sá wrote:
> On Wed, 2024-11-13 at 16:50 +0100, Thomas Gleixner wrote:
> > On Wed, Nov 13 2024 at 11:34, Nuno Sá wrote:
> > > On Wed, 2024-11-13 at 04:40 +0100, Thomas Gleixner wrote:
> > > > The interrupt does not get to the device handler even in the lazy
> > > > disable case. Once the driver invoked disable_irq*() the low level flow
> > > > handlers (edge, level ...) mask the interrupt line and marks the
> > > > interrupt pending. enable_irq() retriggers the interrupt when the
> > > > pending bit is set, except when the interrupt line is level triggered.
> > > 
> > > There's something that I'm still trying to figure... For IRQ controllers
> > > that not
> > > disable edge detection, can't we get the device handler called twice if we
> > > don't set
> > > unlazy?
> > > 
> > > irq_enable() - > check_irq_resend()
> > > 
> > > and then
> > > 
> > > handle_edge_irq() raised by the controller
> > 
> > You're right. We should have a flag which controls the replay
> > requirements of an interrupt controller. So far it only skips for level
> > triggered interrupts, but for those controllers it should skip for edge
> > too. Something like IRQCHIP_NO_RESEND ...

Agreed, if the irq gets pending while disabled in both hardware and
software, that shouldn't result in two invokations. Is this an issue for
level irqs only? For edge irqs this only happens with lazy disable and
if two events happen. Hm, I guess in that case we still only want a single
invokation of the irq handler?

> > > Or is the core handling this somehow? I thought IRQS_REPLAY could be
> > > doing the trick but I'm not seeing how...
> > 
> > IRQS_REPLAY is just internal state to avoid double replay.
> > 
> > > > On controllers which suffer from the #2 problem UNLAZY should indeed be
> > > > ignored for edge type interrupts. That's something which the controller
> > > > should signal via a irqchip flag and the core code can act upon it and
> > > > ignore UNLAZY for edge type interrupts.
> > > > 
> > > > But that won't fix the problem at hand. Let's take a step back and look
> > > > at the larger picture whether this can be reliably "fixed" at all.
> > > > 
> > > 
> > > Yeah, I'm still trying to figure when it's correct for a device to do
> > > UNLAZY? If I'm
> > > understanding things, devices that rely on disable_irq*() should set
> > > it?
> > 
> > Not necessarily. In most cases devices are not re-raising interrupts
> > before the previous one has been handled and acknowledged in the device.

Usage of UNLAZY should never affect correctness. It's "only" a
performance optimisation which has a positive effect if it's expected
that an irq event happens while it's masked.

> > > Because problem #2 is something that needs to be handled at the
> > > controller and core level if I got you right.
> > 
> > Yes. We need a irqchip flag for that.
> > 
> > > > > Ack. If there is no way to read back the line state and it's unknown if
> > > > > the irq controller suffers from problem #2, the only way to still
> > > > > benefit from the irq is to not use IRQ_DISABLE_UNLAZY and only act on
> > > > > each 2nd irq; or ignore irqs based on timing. That doesn't sound very
> > > > > robust though, so maybe the driver has to fall back on polling the
> > > > > status register and not use irqs at all in that case.
> > > > 
> > > > Actually ignoring the first interrupt after a SPI transfer and waiting
> > > > for the next conversion to raise the interrupt again should be robust
> > > > enough. The ADC has to be in continous conversion mode for that
> > > > obviously.
> > > > 
> > > Might be the only sane option we have, Uwe? If we do this, we could be
> > > dropping valid samples but only with controllers that suffer from
> > > #2.
> > 
> > No. You have the same problem with the controllers which do not disable
> > the edge detection logic.
> > 
> > The interrupt controller raises the interrupt on unmask (enable_irq()).
> > Depending on timing the device handler might be invoked _before_ the
> > sample is ready, no?
> 
> For those controllers, I think it's almost always guaranteed that the first IRQ
> after enable is not really a valid sample. We'll always have some SPI transfer
> (that should latch an IRQ on the controller) before enable_irq().

The first irq isn't a valid sample unless the driver is preempted
between the spi transfer and the following enable_irq() such that the
irq event triggered by the SPI transfer doesn't result in calling the
irq handler before the sample is ready. I guess that's what you ruled
out by saying "almost always"? I'd recommend to not rely on that. Chips
become faster (and so conversion time shorter) which widens the race
window and if you become unsynchronized and ignore every wrong second
irq all samples become bogous.

So I still think the extra GPIO read should be implemented (as I
proposed in
https://lore.kernel.org/linux-iio/20241028160748.489596-9-u.kleine-koenig@baylibre.com/)
to guarantee reliable operation. If that isn't possible the only really
robust way to operate is using polling.

Best regards
Uwe

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: can/should a disabled irq become pending?
  2024-11-14 10:59             ` Uwe Kleine-König
@ 2024-11-14 12:04               ` Nuno Sá
  2024-11-23 11:28                 ` Jonathan Cameron
  0 siblings, 1 reply; 13+ messages in thread
From: Nuno Sá @ 2024-11-14 12:04 UTC (permalink / raw)
  To: Uwe Kleine-König; +Cc: Thomas Gleixner, linux-kernel, Jonathan Cameron

On Thu, 2024-11-14 at 11:59 +0100, Uwe Kleine-König wrote:
> Hello,
> 
> On Thu, Nov 14, 2024 at 08:49:34AM +0100, Nuno Sá wrote:
> > On Wed, 2024-11-13 at 16:50 +0100, Thomas Gleixner wrote:
> > > On Wed, Nov 13 2024 at 11:34, Nuno Sá wrote:
> > > > On Wed, 2024-11-13 at 04:40 +0100, Thomas Gleixner wrote:
> > > > > The interrupt does not get to the device handler even in the lazy
> > > > > disable case. Once the driver invoked disable_irq*() the low level
> > > > > flow
> > > > > handlers (edge, level ...) mask the interrupt line and marks the
> > > > > interrupt pending. enable_irq() retriggers the interrupt when the
> > > > > pending bit is set, except when the interrupt line is level triggered.
> > > > 
> > > > There's something that I'm still trying to figure... For IRQ controllers
> > > > that not
> > > > disable edge detection, can't we get the device handler called twice if
> > > > we
> > > > don't set
> > > > unlazy?
> > > > 
> > > > irq_enable() - > check_irq_resend()
> > > > 
> > > > and then
> > > > 
> > > > handle_edge_irq() raised by the controller
> > > 
> > > You're right. We should have a flag which controls the replay
> > > requirements of an interrupt controller. So far it only skips for level
> > > triggered interrupts, but for those controllers it should skip for edge
> > > too. Something like IRQCHIP_NO_RESEND ...
> 
> Agreed, if the irq gets pending while disabled in both hardware and
> software, that shouldn't result in two invokations. Is this an issue for
> level irqs only? For edge irqs this only happens with lazy disable and

Resending is already ignore for level...

> if two events happen. Hm, I guess in that case we still only want a single
> invokation of the irq handler?
> 
> > > > Or is the core handling this somehow? I thought IRQS_REPLAY could be
> > > > doing the trick but I'm not seeing how...
> > > 
> > > IRQS_REPLAY is just internal state to avoid double replay.
> > > 
> > > > > On controllers which suffer from the #2 problem UNLAZY should indeed
> > > > > be
> > > > > ignored for edge type interrupts. That's something which the
> > > > > controller
> > > > > should signal via a irqchip flag and the core code can act upon it and
> > > > > ignore UNLAZY for edge type interrupts.
> > > > > 
> > > > > But that won't fix the problem at hand. Let's take a step back and
> > > > > look
> > > > > at the larger picture whether this can be reliably "fixed" at all.
> > > > > 
> > > > 
> > > > Yeah, I'm still trying to figure when it's correct for a device to do
> > > > UNLAZY? If I'm
> > > > understanding things, devices that rely on disable_irq*() should set
> > > > it?
> > > 
> > > Not necessarily. In most cases devices are not re-raising interrupts
> > > before the previous one has been handled and acknowledged in the device.
> 
> Usage of UNLAZY should never affect correctness. It's "only" a
> performance optimisation which has a positive effect if it's expected
> that an irq event happens while it's masked.
> 
> > > > Because problem #2 is something that needs to be handled at the
> > > > controller and core level if I got you right.
> > > 
> > > Yes. We need a irqchip flag for that.
> > > 
> > > > > > Ack. If there is no way to read back the line state and it's unknown
> > > > > > if
> > > > > > the irq controller suffers from problem #2, the only way to still
> > > > > > benefit from the irq is to not use IRQ_DISABLE_UNLAZY and only act
> > > > > > on
> > > > > > each 2nd irq; or ignore irqs based on timing. That doesn't sound
> > > > > > very
> > > > > > robust though, so maybe the driver has to fall back on polling the
> > > > > > status register and not use irqs at all in that case.
> > > > > 
> > > > > Actually ignoring the first interrupt after a SPI transfer and waiting
> > > > > for the next conversion to raise the interrupt again should be robust
> > > > > enough. The ADC has to be in continous conversion mode for that
> > > > > obviously.
> > > > > 
> > > > Might be the only sane option we have, Uwe? If we do this, we could be
> > > > dropping valid samples but only with controllers that suffer from
> > > > #2.
> > > 
> > > No. You have the same problem with the controllers which do not disable
> > > the edge detection logic.
> > > 
> > > The interrupt controller raises the interrupt on unmask (enable_irq()).
> > > Depending on timing the device handler might be invoked _before_ the
> > > sample is ready, no?
> > 
> > For those controllers, I think it's almost always guaranteed that the first
> > IRQ
> > after enable is not really a valid sample. We'll always have some SPI
> > transfer
> > (that should latch an IRQ on the controller) before enable_irq().
> 
> The first irq isn't a valid sample unless the driver is preempted
> between the spi transfer and the following enable_irq() such that the
> irq event triggered by the SPI transfer doesn't result in calling the
> irq handler before the sample is ready. I guess that's what you ruled

I guess that race we could prevent by disabling IRQs...

> out by saying "almost always"? I'd recommend to not rely on that. Chips
> become faster (and so conversion time shorter) which widens the race
> window and if you become unsynchronized and ignore every wrong second
> irq all samples become bogous.

Right now we set UNLAZY and that brings this difference in behavior depending on
the IRQ controller we have. But if we remove that change and make sure there can
be no race between enable_irq() and the last spi_transfer, it should be safe to
assume the first time we get in the handler is not for a valid sample. Not sure
synchronization could be an issue to the point where you ignore all samples. If
you ignore one IRQ, then the next one needs to be a valid sample (as there
should be no spi_transfer in between). But not sure if it can affect
performance...

I think right now, unless the IRQ controller suffers from #2, every time we get
in the device handler after enable_irq() is not because of DRDY and having a
valid sample or not is pure luck. 

> 
> So I still think the extra GPIO read should be implemented (as I
> proposed in
> https://lore.kernel.org/linux-iio/20241028160748.489596-9-u.kleine-koenig@baylibre.com/
> )
> to guarantee reliable operation. If that isn't possible the only really
> robust way to operate is using polling.

My only issue with the gpio approach and your conversation with Thomas seems to
prove it is that we're not guaranteed to be able to read the line. I guess your
reasoning is that if we can't do that for a platform, then don't give the gpio
in DT? But in that case, are we left with a device that might or might not work?

"Funny stuff"...

- Nuno Sá


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: can/should a disabled irq become pending?
  2024-11-14 12:04               ` Nuno Sá
@ 2024-11-23 11:28                 ` Jonathan Cameron
  2024-11-24 13:18                   ` Nuno Sá
  0 siblings, 1 reply; 13+ messages in thread
From: Jonathan Cameron @ 2024-11-23 11:28 UTC (permalink / raw)
  To: Nuno Sá; +Cc: Uwe Kleine-König, Thomas Gleixner, linux-kernel

On Thu, 14 Nov 2024 13:04:58 +0100
Nuno Sá <noname.nuno@gmail.com> wrote:

> On Thu, 2024-11-14 at 11:59 +0100, Uwe Kleine-König wrote:
> > Hello,
> > 
> > On Thu, Nov 14, 2024 at 08:49:34AM +0100, Nuno Sá wrote:  
> > > On Wed, 2024-11-13 at 16:50 +0100, Thomas Gleixner wrote:  
> > > > On Wed, Nov 13 2024 at 11:34, Nuno Sá wrote:  
> > > > > On Wed, 2024-11-13 at 04:40 +0100, Thomas Gleixner wrote:  
> > > > > > The interrupt does not get to the device handler even in the lazy
> > > > > > disable case. Once the driver invoked disable_irq*() the low level
> > > > > > flow
> > > > > > handlers (edge, level ...) mask the interrupt line and marks the
> > > > > > interrupt pending. enable_irq() retriggers the interrupt when the
> > > > > > pending bit is set, except when the interrupt line is level triggered.  
> > > > > 
> > > > > There's something that I'm still trying to figure... For IRQ controllers
> > > > > that not
> > > > > disable edge detection, can't we get the device handler called twice if
> > > > > we
> > > > > don't set
> > > > > unlazy?
> > > > > 
> > > > > irq_enable() - > check_irq_resend()
> > > > > 
> > > > > and then
> > > > > 
> > > > > handle_edge_irq() raised by the controller  
> > > > 
> > > > You're right. We should have a flag which controls the replay
> > > > requirements of an interrupt controller. So far it only skips for level
> > > > triggered interrupts, but for those controllers it should skip for edge
> > > > too. Something like IRQCHIP_NO_RESEND ...  
> > 
> > Agreed, if the irq gets pending while disabled in both hardware and
> > software, that shouldn't result in two invokations. Is this an issue for
> > level irqs only? For edge irqs this only happens with lazy disable and  
> 
> Resending is already ignore for level...
> 
> > if two events happen. Hm, I guess in that case we still only want a single
> > invokation of the irq handler?
> >   
> > > > > Or is the core handling this somehow? I thought IRQS_REPLAY could be
> > > > > doing the trick but I'm not seeing how...  
> > > > 
> > > > IRQS_REPLAY is just internal state to avoid double replay.
> > > >   
> > > > > > On controllers which suffer from the #2 problem UNLAZY should indeed
> > > > > > be
> > > > > > ignored for edge type interrupts. That's something which the
> > > > > > controller
> > > > > > should signal via a irqchip flag and the core code can act upon it and
> > > > > > ignore UNLAZY for edge type interrupts.
> > > > > > 
> > > > > > But that won't fix the problem at hand. Let's take a step back and
> > > > > > look
> > > > > > at the larger picture whether this can be reliably "fixed" at all.
> > > > > >   
> > > > > 
> > > > > Yeah, I'm still trying to figure when it's correct for a device to do
> > > > > UNLAZY? If I'm
> > > > > understanding things, devices that rely on disable_irq*() should set
> > > > > it?  
> > > > 
> > > > Not necessarily. In most cases devices are not re-raising interrupts
> > > > before the previous one has been handled and acknowledged in the device.  
> > 
> > Usage of UNLAZY should never affect correctness. It's "only" a
> > performance optimisation which has a positive effect if it's expected
> > that an irq event happens while it's masked.
> >   
> > > > > Because problem #2 is something that needs to be handled at the
> > > > > controller and core level if I got you right.  
> > > > 
> > > > Yes. We need a irqchip flag for that.
> > > >   
> > > > > > > Ack. If there is no way to read back the line state and it's unknown
> > > > > > > if
> > > > > > > the irq controller suffers from problem #2, the only way to still
> > > > > > > benefit from the irq is to not use IRQ_DISABLE_UNLAZY and only act
> > > > > > > on
> > > > > > > each 2nd irq; or ignore irqs based on timing. That doesn't sound
> > > > > > > very
> > > > > > > robust though, so maybe the driver has to fall back on polling the
> > > > > > > status register and not use irqs at all in that case.  
> > > > > > 
> > > > > > Actually ignoring the first interrupt after a SPI transfer and waiting
> > > > > > for the next conversion to raise the interrupt again should be robust
> > > > > > enough. The ADC has to be in continous conversion mode for that
> > > > > > obviously.
> > > > > >   
> > > > > Might be the only sane option we have, Uwe? If we do this, we could be
> > > > > dropping valid samples but only with controllers that suffer from
> > > > > #2.  
> > > > 
> > > > No. You have the same problem with the controllers which do not disable
> > > > the edge detection logic.
> > > > 
> > > > The interrupt controller raises the interrupt on unmask (enable_irq()).
> > > > Depending on timing the device handler might be invoked _before_ the
> > > > sample is ready, no?  
> > > 
> > > For those controllers, I think it's almost always guaranteed that the first
> > > IRQ
> > > after enable is not really a valid sample. We'll always have some SPI
> > > transfer
> > > (that should latch an IRQ on the controller) before enable_irq().  
> > 
> > The first irq isn't a valid sample unless the driver is preempted
> > between the spi transfer and the following enable_irq() such that the
> > irq event triggered by the SPI transfer doesn't result in calling the
> > irq handler before the sample is ready. I guess that's what you ruled  
> 
> I guess that race we could prevent by disabling IRQs...
> 
> > out by saying "almost always"? I'd recommend to not rely on that. Chips
> > become faster (and so conversion time shorter) which widens the race
> > window and if you become unsynchronized and ignore every wrong second
> > irq all samples become bogous.  
> 
> Right now we set UNLAZY and that brings this difference in behavior depending on
> the IRQ controller we have. But if we remove that change and make sure there can
> be no race between enable_irq() and the last spi_transfer, it should be safe to
> assume the first time we get in the handler is not for a valid sample. Not sure
> synchronization could be an issue to the point where you ignore all samples. If
> you ignore one IRQ, then the next one needs to be a valid sample (as there
> should be no spi_transfer in between). But not sure if it can affect
> performance...

I think it is overly optimistic to assume that an interrupt controller will
definitely catch both edges.  IIRC some of them have an interesting acknowledge
dance before they can see an other edge at all.

So it's also possible we only see one interrupt even though there were loads
from the spi transfer (which can also trigger multiple if slow enough)

> 
> I think right now, unless the IRQ controller suffers from #2, every time we get
> in the device handler after enable_irq() is not because of DRDY and having a
> valid sample or not is pure luck. 
> 
> > 
> > So I still think the extra GPIO read should be implemented (as I
> > proposed in
> > https://lore.kernel.org/linux-iio/20241028160748.489596-9-u.kleine-koenig@baylibre.com/
> > )
> > to guarantee reliable operation. If that isn't possible the only really
> > robust way to operate is using polling.  
> 
> My only issue with the gpio approach and your conversation with Thomas seems to
> prove it is that we're not guaranteed to be able to read the line. I guess your
> reasoning is that if we can't do that for a platform, then don't give the gpio
> in DT? But in that case, are we left with a device that might or might not work?
Now we have some input from Thomas, I'm happier that we basically have no choice
for at least some controllers :(

I don't mind the GPIO being optional, but I don't want to break existing boards
that happen to work (which your patch doesn't do, so fine there).
It probably makes sense to add quite a bit of documentation to the DT binding
to provide some background though keeping it OS independent may be a little fiddly.
Perhaps even strongly advise using a GPIO so that people describe it that way
if their hardware does allow reading the status.  

I'm not sure, but does the flag Thomas suggested let us 'know' if we can get
away (subject to the race condition) with not having a GPIO?  If it does
then we could have the driver fail to probe (or poll instead) for cases where
we do need it for correctness (e.g. the RPI)

So in conclusion, some more docs in the dt-binding and I'm fine with your series Uwe.
Sounds like an additional changes to not do lazy on some controllers also makes
sense.

Jonathan

> 
> "Funny stuff"...
> 
> - Nuno Sá
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: can/should a disabled irq become pending?
  2024-11-23 11:28                 ` Jonathan Cameron
@ 2024-11-24 13:18                   ` Nuno Sá
  2024-11-25  8:50                     ` Uwe Kleine-König
  0 siblings, 1 reply; 13+ messages in thread
From: Nuno Sá @ 2024-11-24 13:18 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: Uwe Kleine-König, Thomas Gleixner, linux-kernel

On Sat, 2024-11-23 at 11:28 +0000, Jonathan Cameron wrote:
> On Thu, 14 Nov 2024 13:04:58 +0100
> Nuno Sá <noname.nuno@gmail.com> wrote:
> 
> > On Thu, 2024-11-14 at 11:59 +0100, Uwe Kleine-König wrote:
> > > Hello,
> > > 
> > > On Thu, Nov 14, 2024 at 08:49:34AM +0100, Nuno Sá wrote:  
> > > > On Wed, 2024-11-13 at 16:50 +0100, Thomas Gleixner wrote:  
> > > > > On Wed, Nov 13 2024 at 11:34, Nuno Sá wrote:  
> > > > > > On Wed, 2024-11-13 at 04:40 +0100, Thomas Gleixner wrote:  
> > > > > > > The interrupt does not get to the device handler even in the lazy
> > > > > > > disable case. Once the driver invoked disable_irq*() the low level
> > > > > > > flow
> > > > > > > handlers (edge, level ...) mask the interrupt line and marks the
> > > > > > > interrupt pending. enable_irq() retriggers the interrupt when the
> > > > > > > pending bit is set, except when the interrupt line is level triggered. 
> > > > > > 
> > > > > > There's something that I'm still trying to figure... For IRQ controllers
> > > > > > that not
> > > > > > disable edge detection, can't we get the device handler called twice if
> > > > > > we
> > > > > > don't set
> > > > > > unlazy?
> > > > > > 
> > > > > > irq_enable() - > check_irq_resend()
> > > > > > 
> > > > > > and then
> > > > > > 
> > > > > > handle_edge_irq() raised by the controller  
> > > > > 
> > > > > You're right. We should have a flag which controls the replay
> > > > > requirements of an interrupt controller. So far it only skips for level
> > > > > triggered interrupts, but for those controllers it should skip for edge
> > > > > too. Something like IRQCHIP_NO_RESEND ...  
> > > 
> > > Agreed, if the irq gets pending while disabled in both hardware and
> > > software, that shouldn't result in two invokations. Is this an issue for
> > > level irqs only? For edge irqs this only happens with lazy disable and  
> > 
> > Resending is already ignore for level...
> > 
> > > if two events happen. Hm, I guess in that case we still only want a single
> > > invokation of the irq handler?
> > >   
> > > > > > Or is the core handling this somehow? I thought IRQS_REPLAY could be
> > > > > > doing the trick but I'm not seeing how...  
> > > > > 
> > > > > IRQS_REPLAY is just internal state to avoid double replay.
> > > > >   
> > > > > > > On controllers which suffer from the #2 problem UNLAZY should indeed
> > > > > > > be
> > > > > > > ignored for edge type interrupts. That's something which the
> > > > > > > controller
> > > > > > > should signal via a irqchip flag and the core code can act upon it and
> > > > > > > ignore UNLAZY for edge type interrupts.
> > > > > > > 
> > > > > > > But that won't fix the problem at hand. Let's take a step back and
> > > > > > > look
> > > > > > > at the larger picture whether this can be reliably "fixed" at all.
> > > > > > >   
> > > > > > 
> > > > > > Yeah, I'm still trying to figure when it's correct for a device to do
> > > > > > UNLAZY? If I'm
> > > > > > understanding things, devices that rely on disable_irq*() should set
> > > > > > it?  
> > > > > 
> > > > > Not necessarily. In most cases devices are not re-raising interrupts
> > > > > before the previous one has been handled and acknowledged in the device.  
> > > 
> > > Usage of UNLAZY should never affect correctness. It's "only" a
> > > performance optimisation which has a positive effect if it's expected
> > > that an irq event happens while it's masked.
> > >   
> > > > > > Because problem #2 is something that needs to be handled at the
> > > > > > controller and core level if I got you right.  
> > > > > 
> > > > > Yes. We need a irqchip flag for that.
> > > > >   
> > > > > > > > Ack. If there is no way to read back the line state and it's unknown
> > > > > > > > if
> > > > > > > > the irq controller suffers from problem #2, the only way to still
> > > > > > > > benefit from the irq is to not use IRQ_DISABLE_UNLAZY and only act
> > > > > > > > on
> > > > > > > > each 2nd irq; or ignore irqs based on timing. That doesn't sound
> > > > > > > > very
> > > > > > > > robust though, so maybe the driver has to fall back on polling the
> > > > > > > > status register and not use irqs at all in that case.  
> > > > > > > 
> > > > > > > Actually ignoring the first interrupt after a SPI transfer and waiting
> > > > > > > for the next conversion to raise the interrupt again should be robust
> > > > > > > enough. The ADC has to be in continous conversion mode for that
> > > > > > > obviously.
> > > > > > >   
> > > > > > Might be the only sane option we have, Uwe? If we do this, we could be
> > > > > > dropping valid samples but only with controllers that suffer from
> > > > > > #2.  
> > > > > 
> > > > > No. You have the same problem with the controllers which do not disable
> > > > > the edge detection logic.
> > > > > 
> > > > > The interrupt controller raises the interrupt on unmask (enable_irq()).
> > > > > Depending on timing the device handler might be invoked _before_ the
> > > > > sample is ready, no?  
> > > > 
> > > > For those controllers, I think it's almost always guaranteed that the first
> > > > IRQ
> > > > after enable is not really a valid sample. We'll always have some SPI
> > > > transfer
> > > > (that should latch an IRQ on the controller) before enable_irq().  
> > > 
> > > The first irq isn't a valid sample unless the driver is preempted
> > > between the spi transfer and the following enable_irq() such that the
> > > irq event triggered by the SPI transfer doesn't result in calling the
> > > irq handler before the sample is ready. I guess that's what you ruled  
> > 
> > I guess that race we could prevent by disabling IRQs...
> > 
> > > out by saying "almost always"? I'd recommend to not rely on that. Chips
> > > become faster (and so conversion time shorter) which widens the race
> > > window and if you become unsynchronized and ignore every wrong second
> > > irq all samples become bogous.  
> > 
> > Right now we set UNLAZY and that brings this difference in behavior depending on
> > the IRQ controller we have. But if we remove that change and make sure there can
> > be no race between enable_irq() and the last spi_transfer, it should be safe to
> > assume the first time we get in the handler is not for a valid sample. Not sure
> > synchronization could be an issue to the point where you ignore all samples. If
> > you ignore one IRQ, then the next one needs to be a valid sample (as there
> > should be no spi_transfer in between). But not sure if it can affect
> > performance...
> 
> I think it is overly optimistic to assume that an interrupt controller will
> definitely catch both edges.  IIRC some of them have an interesting acknowledge
> dance before they can see an other edge at all.
> 
> So it's also possible we only see one interrupt even though there were loads
> from the spi transfer (which can also trigger multiple if slow enough)
> 
> > 
> > I think right now, unless the IRQ controller suffers from #2, every time we get
> > in the device handler after enable_irq() is not because of DRDY and having a
> > valid sample or not is pure luck. 
> > 
> > > 
> > > So I still think the extra GPIO read should be implemented (as I
> > > proposed in
> > > https://lore.kernel.org/linux-iio/20241028160748.489596-9-u.kleine-koenig@baylibre.com/
> > > )
> > > to guarantee reliable operation. If that isn't possible the only really
> > > robust way to operate is using polling.  
> > 
> > My only issue with the gpio approach and your conversation with Thomas seems to
> > prove it is that we're not guaranteed to be able to read the line. I guess your
> > reasoning is that if we can't do that for a platform, then don't give the gpio
> > in DT? But in that case, are we left with a device that might or might not work?
> Now we have some input from Thomas, I'm happier that we basically have no choice
> for at least some controllers :(

Agreed. I'm not opposing to the GPIO change (even though not perfect) since it's
better that what we have today and from this whole discussion, it also looks like
there's not perfect solution anyways.

> 
> I don't mind the GPIO being optional, but I don't want to break existing boards
> that happen to work (which your patch doesn't do, so fine there).
> It probably makes sense to add quite a bit of documentation to the DT binding
> to provide some background though keeping it OS independent may be a little fiddly.
> Perhaps even strongly advise using a GPIO so that people describe it that way
> if their hardware does allow reading the status.  
> 
> I'm not sure, but does the flag Thomas suggested let us 'know' if we can get
> away (subject to the race condition) with not having a GPIO?  If it does
> then we could have the driver fail to probe (or poll instead) for cases where
> we do need it for correctness (e.g. the RPI)
> 

Don't think so. IIUC, the flag would be something that controllers that cannot do
edge detection logic when masked would pass to the core so that the UNLAZY flag would
be ignored,.

> So in conclusion, some more docs in the dt-binding and I'm fine with your series
> Uwe.
> Sounds like an additional changes to not do lazy on some controllers also makes
> sense.
> 

In theory, and from this discussion, it seems we should not be doing UNLAZY in the
library (it seems that it combination with some "broken" controllers, it "fixes" some
issues) but at this point I'm afraid we could be breaking some users.

- Nuno Sá


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: can/should a disabled irq become pending?
  2024-11-24 13:18                   ` Nuno Sá
@ 2024-11-25  8:50                     ` Uwe Kleine-König
  2024-11-25  9:08                       ` Nuno Sá
  0 siblings, 1 reply; 13+ messages in thread
From: Uwe Kleine-König @ 2024-11-25  8:50 UTC (permalink / raw)
  To: Nuno Sá; +Cc: Jonathan Cameron, Thomas Gleixner, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 11282 bytes --]

Hello,

On Sun, Nov 24, 2024 at 02:18:27PM +0100, Nuno Sá wrote:
> On Sat, 2024-11-23 at 11:28 +0000, Jonathan Cameron wrote:
> > On Thu, 14 Nov 2024 13:04:58 +0100
> > Nuno Sá <noname.nuno@gmail.com> wrote:
> > 
> > > On Thu, 2024-11-14 at 11:59 +0100, Uwe Kleine-König wrote:
> > > > On Thu, Nov 14, 2024 at 08:49:34AM +0100, Nuno Sá wrote:  
> > > > > On Wed, 2024-11-13 at 16:50 +0100, Thomas Gleixner wrote:  
> > > > > > On Wed, Nov 13 2024 at 11:34, Nuno Sá wrote:  
> > > > > > > On Wed, 2024-11-13 at 04:40 +0100, Thomas Gleixner wrote:  
> > > > > > > > The interrupt does not get to the device handler even in the lazy
> > > > > > > > disable case. Once the driver invoked disable_irq*() the low level
> > > > > > > > flow
> > > > > > > > handlers (edge, level ...) mask the interrupt line and marks the
> > > > > > > > interrupt pending. enable_irq() retriggers the interrupt when the
> > > > > > > > pending bit is set, except when the interrupt line is level triggered. 
> > > > > > > 
> > > > > > > There's something that I'm still trying to figure... For IRQ controllers
> > > > > > > that not
> > > > > > > disable edge detection, can't we get the device handler called twice if
> > > > > > > we
> > > > > > > don't set
> > > > > > > unlazy?
> > > > > > > 
> > > > > > > irq_enable() - > check_irq_resend()
> > > > > > > 
> > > > > > > and then
> > > > > > > 
> > > > > > > handle_edge_irq() raised by the controller  
> > > > > > 
> > > > > > You're right. We should have a flag which controls the replay
> > > > > > requirements of an interrupt controller. So far it only skips for level
> > > > > > triggered interrupts, but for those controllers it should skip for edge
> > > > > > too. Something like IRQCHIP_NO_RESEND ...  
> > > > 
> > > > Agreed, if the irq gets pending while disabled in both hardware and
> > > > software, that shouldn't result in two invokations. Is this an issue for
> > > > level irqs only? For edge irqs this only happens with lazy disable and  
> > > 
> > > Resending is already ignore for level...
> > > 
> > > > if two events happen. Hm, I guess in that case we still only want a single
> > > > invokation of the irq handler?
> > > >   
> > > > > > > Or is the core handling this somehow? I thought IRQS_REPLAY could be
> > > > > > > doing the trick but I'm not seeing how...  
> > > > > > 
> > > > > > IRQS_REPLAY is just internal state to avoid double replay.
> > > > > >   
> > > > > > > > On controllers which suffer from the #2 problem UNLAZY should indeed
> > > > > > > > be
> > > > > > > > ignored for edge type interrupts. That's something which the
> > > > > > > > controller
> > > > > > > > should signal via a irqchip flag and the core code can act upon it and
> > > > > > > > ignore UNLAZY for edge type interrupts.
> > > > > > > > 
> > > > > > > > But that won't fix the problem at hand. Let's take a step back and
> > > > > > > > look
> > > > > > > > at the larger picture whether this can be reliably "fixed" at all.
> > > > > > > >   
> > > > > > > 
> > > > > > > Yeah, I'm still trying to figure when it's correct for a device to do
> > > > > > > UNLAZY? If I'm
> > > > > > > understanding things, devices that rely on disable_irq*() should set
> > > > > > > it?  
> > > > > > 
> > > > > > Not necessarily. In most cases devices are not re-raising interrupts
> > > > > > before the previous one has been handled and acknowledged in the device.  
> > > > 
> > > > Usage of UNLAZY should never affect correctness. It's "only" a
> > > > performance optimisation which has a positive effect if it's expected
> > > > that an irq event happens while it's masked.
> > > >   
> > > > > > > Because problem #2 is something that needs to be handled at the
> > > > > > > controller and core level if I got you right.  
> > > > > > 
> > > > > > Yes. We need a irqchip flag for that.
> > > > > >   
> > > > > > > > > Ack. If there is no way to read back the line state and it's unknown
> > > > > > > > > if
> > > > > > > > > the irq controller suffers from problem #2, the only way to still
> > > > > > > > > benefit from the irq is to not use IRQ_DISABLE_UNLAZY and only act
> > > > > > > > > on
> > > > > > > > > each 2nd irq; or ignore irqs based on timing. That doesn't sound
> > > > > > > > > very
> > > > > > > > > robust though, so maybe the driver has to fall back on polling the
> > > > > > > > > status register and not use irqs at all in that case.  
> > > > > > > > 
> > > > > > > > Actually ignoring the first interrupt after a SPI transfer and waiting
> > > > > > > > for the next conversion to raise the interrupt again should be robust
> > > > > > > > enough. The ADC has to be in continous conversion mode for that
> > > > > > > > obviously.
> > > > > > > >   
> > > > > > > Might be the only sane option we have, Uwe? If we do this, we could be
> > > > > > > dropping valid samples but only with controllers that suffer from
> > > > > > > #2.  
> > > > > > 
> > > > > > No. You have the same problem with the controllers which do not disable
> > > > > > the edge detection logic.
> > > > > > 
> > > > > > The interrupt controller raises the interrupt on unmask (enable_irq()).
> > > > > > Depending on timing the device handler might be invoked _before_ the
> > > > > > sample is ready, no?  
> > > > > 
> > > > > For those controllers, I think it's almost always guaranteed that the first
> > > > > IRQ
> > > > > after enable is not really a valid sample. We'll always have some SPI
> > > > > transfer
> > > > > (that should latch an IRQ on the controller) before enable_irq().  
> > > > 
> > > > The first irq isn't a valid sample unless the driver is preempted
> > > > between the spi transfer and the following enable_irq() such that the
> > > > irq event triggered by the SPI transfer doesn't result in calling the
> > > > irq handler before the sample is ready. I guess that's what you ruled  
> > > 
> > > I guess that race we could prevent by disabling IRQs...
> > > 
> > > > out by saying "almost always"? I'd recommend to not rely on that. Chips
> > > > become faster (and so conversion time shorter) which widens the race
> > > > window and if you become unsynchronized and ignore every wrong second
> > > > irq all samples become bogous.  
> > > 
> > > Right now we set UNLAZY and that brings this difference in behavior depending on
> > > the IRQ controller we have. But if we remove that change and make sure there can
> > > be no race between enable_irq() and the last spi_transfer, it should be safe to
> > > assume the first time we get in the handler is not for a valid sample. Not sure
> > > synchronization could be an issue to the point where you ignore all samples. If
> > > you ignore one IRQ, then the next one needs to be a valid sample (as there
> > > should be no spi_transfer in between). But not sure if it can affect
> > > performance...
> > 
> > I think it is overly optimistic to assume that an interrupt controller will
> > definitely catch both edges.  IIRC some of them have an interesting acknowledge
> > dance before they can see an other edge at all.

Plus there is also the (probably only theoretic) race condition in
combination with a controller suffering from #2 that the irq_enable()
only happens after the conversion was done. Then the irq would be
missed.

> > So it's also possible we only see one interrupt even though there were loads
> > from the spi transfer (which can also trigger multiple if slow enough)

Ack, so I think we all agree now that the rdy-gpio is needed for
reliable operation with irq. If that isn't available due to already
finalized hardware, the only option is polling.

> > > I think right now, unless the IRQ controller suffers from #2, every time we get
> > > in the device handler after enable_irq() is not because of DRDY and having a
> > > valid sample or not is pure luck. 
> > > 
> > > > 
> > > > So I still think the extra GPIO read should be implemented (as I
> > > > proposed in
> > > > https://lore.kernel.org/linux-iio/20241028160748.489596-9-u.kleine-koenig@baylibre.com/
> > > > )
> > > > to guarantee reliable operation. If that isn't possible the only really
> > > > robust way to operate is using polling.  
> > > 
> > > My only issue with the gpio approach and your conversation with Thomas seems to
> > > prove it is that we're not guaranteed to be able to read the line. I guess your
> > > reasoning is that if we can't do that for a platform, then don't give the gpio
> > > in DT? But in that case, are we left with a device that might or might not work?
> > Now we have some input from Thomas, I'm happier that we basically have no choice
> > for at least some controllers :(
> 
> Agreed. I'm not opposing to the GPIO change (even though not perfect) since it's
> better that what we have today and from this whole discussion, it also looks like
> there's not perfect solution anyways.

What is your concern that lets you say "not perfect"? Is it just that it
won't work on every hardware?
 
> > I don't mind the GPIO being optional, but I don't want to break existing boards
> > that happen to work (which your patch doesn't do, so fine there).
> > It probably makes sense to add quite a bit of documentation to the DT binding
> > to provide some background though keeping it OS independent may be a little fiddly.
> > Perhaps even strongly advise using a GPIO so that people describe it that way
> > if their hardware does allow reading the status.  

Yeah, I also think that this should make it into the documentation
provided by ADI. I didn't check yet what currently is available, but if
there is an application note about how to add such a device to a custom
hardware design, adding a hint there would be nice, too.

> > I'm not sure, but does the flag Thomas suggested let us 'know' if we can get
> > away (subject to the race condition) with not having a GPIO?  If it does
> > then we could have the driver fail to probe (or poll instead) for cases where
> > we do need it for correctness (e.g. the RPI)

In the long run we won't get away without the GPIO if irq controllers
affected by #2 learn to refuse unlazy handling. I have no brilliant idea
here. If we don't question to fix the unlazy handling the best is
probably a warning during probe + checking the status register before
reading a sample? This would essentially mean busy polling on the spi
bus for the affected devices.

> > So in conclusion, some more docs in the dt-binding and I'm fine with your series
> > Uwe.

I'll care for an updated series also to cope for the other feedback I
got.

> > Sounds like an additional changes to not do lazy on some controllers also makes
> > sense.
> 
> In theory, and from this discussion, it seems we should not be doing UNLAZY in the
> library (it seems that it combination with some "broken" controllers, it "fixes" some
> issues) but at this point I'm afraid we could be breaking some users.

And also consider all the other drivers we don't know about that might
break in the same way. Might be worth checking the ones disabling lazy
disable when #2 is addressed.

Best regards
Uwe

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: can/should a disabled irq become pending?
  2024-11-25  8:50                     ` Uwe Kleine-König
@ 2024-11-25  9:08                       ` Nuno Sá
  0 siblings, 0 replies; 13+ messages in thread
From: Nuno Sá @ 2024-11-25  9:08 UTC (permalink / raw)
  To: Uwe Kleine-König; +Cc: Jonathan Cameron, Thomas Gleixner, linux-kernel

On Mon, 2024-11-25 at 09:50 +0100, Uwe Kleine-König wrote:
> Hello,
> 
> On Sun, Nov 24, 2024 at 02:18:27PM +0100, Nuno Sá wrote:
> > On Sat, 2024-11-23 at 11:28 +0000, Jonathan Cameron wrote:
> > > On Thu, 14 Nov 2024 13:04:58 +0100
> > > Nuno Sá <noname.nuno@gmail.com> wrote:
> > > 
> > > > On Thu, 2024-11-14 at 11:59 +0100, Uwe Kleine-König wrote:
> > > > > On Thu, Nov 14, 2024 at 08:49:34AM +0100, Nuno Sá wrote:  
> > > > > > On Wed, 2024-11-13 at 16:50 +0100, Thomas Gleixner wrote:  
> > > > > > > On Wed, Nov 13 2024 at 11:34, Nuno Sá wrote:  
> > > > > > > > On Wed, 2024-11-13 at 04:40 +0100, Thomas Gleixner wrote:  
> > > > > > > > > The interrupt does not get to the device handler even in the
> > > > > > > > > lazy
> > > > > > > > > disable case. Once the driver invoked disable_irq*() the low
> > > > > > > > > level
> > > > > > > > > flow
> > > > > > > > > handlers (edge, level ...) mask the interrupt line and marks
> > > > > > > > > the
> > > > > > > > > interrupt pending. enable_irq() retriggers the interrupt when
> > > > > > > > > the
> > > > > > > > > pending bit is set, except when the interrupt line is level
> > > > > > > > > triggered. 
> > > > > > > > 
> > > > > > > > There's something that I'm still trying to figure... For IRQ
> > > > > > > > controllers
> > > > > > > > that not
> > > > > > > > disable edge detection, can't we get the device handler called
> > > > > > > > twice if
> > > > > > > > we
> > > > > > > > don't set
> > > > > > > > unlazy?
> > > > > > > > 
> > > > > > > > irq_enable() - > check_irq_resend()
> > > > > > > > 
> > > > > > > > and then
> > > > > > > > 
> > > > > > > > handle_edge_irq() raised by the controller  
> > > > > > > 
> > > > > > > You're right. We should have a flag which controls the replay
> > > > > > > requirements of an interrupt controller. So far it only skips for
> > > > > > > level
> > > > > > > triggered interrupts, but for those controllers it should skip for
> > > > > > > edge
> > > > > > > too. Something like IRQCHIP_NO_RESEND ...  
> > > > > 
> > > > > Agreed, if the irq gets pending while disabled in both hardware and
> > > > > software, that shouldn't result in two invokations. Is this an issue
> > > > > for
> > > > > level irqs only? For edge irqs this only happens with lazy disable
> > > > > and  
> > > > 
> > > > Resending is already ignore for level...
> > > > 
> > > > > if two events happen. Hm, I guess in that case we still only want a
> > > > > single
> > > > > invokation of the irq handler?
> > > > >   
> > > > > > > > Or is the core handling this somehow? I thought IRQS_REPLAY
> > > > > > > > could be
> > > > > > > > doing the trick but I'm not seeing how...  
> > > > > > > 
> > > > > > > IRQS_REPLAY is just internal state to avoid double replay.
> > > > > > >   
> > > > > > > > > On controllers which suffer from the #2 problem UNLAZY should
> > > > > > > > > indeed
> > > > > > > > > be
> > > > > > > > > ignored for edge type interrupts. That's something which the
> > > > > > > > > controller
> > > > > > > > > should signal via a irqchip flag and the core code can act
> > > > > > > > > upon it and
> > > > > > > > > ignore UNLAZY for edge type interrupts.
> > > > > > > > > 
> > > > > > > > > But that won't fix the problem at hand. Let's take a step back
> > > > > > > > > and
> > > > > > > > > look
> > > > > > > > > at the larger picture whether this can be reliably "fixed" at
> > > > > > > > > all.
> > > > > > > > >   
> > > > > > > > 
> > > > > > > > Yeah, I'm still trying to figure when it's correct for a device
> > > > > > > > to do
> > > > > > > > UNLAZY? If I'm
> > > > > > > > understanding things, devices that rely on disable_irq*() should
> > > > > > > > set
> > > > > > > > it?  
> > > > > > > 
> > > > > > > Not necessarily. In most cases devices are not re-raising
> > > > > > > interrupts
> > > > > > > before the previous one has been handled and acknowledged in the
> > > > > > > device.  
> > > > > 
> > > > > Usage of UNLAZY should never affect correctness. It's "only" a
> > > > > performance optimisation which has a positive effect if it's expected
> > > > > that an irq event happens while it's masked.
> > > > >   
> > > > > > > > Because problem #2 is something that needs to be handled at the
> > > > > > > > controller and core level if I got you right.  
> > > > > > > 
> > > > > > > Yes. We need a irqchip flag for that.
> > > > > > >   
> > > > > > > > > > Ack. If there is no way to read back the line state and it's
> > > > > > > > > > unknown
> > > > > > > > > > if
> > > > > > > > > > the irq controller suffers from problem #2, the only way to
> > > > > > > > > > still
> > > > > > > > > > benefit from the irq is to not use IRQ_DISABLE_UNLAZY and
> > > > > > > > > > only act
> > > > > > > > > > on
> > > > > > > > > > each 2nd irq; or ignore irqs based on timing. That doesn't
> > > > > > > > > > sound
> > > > > > > > > > very
> > > > > > > > > > robust though, so maybe the driver has to fall back on
> > > > > > > > > > polling the
> > > > > > > > > > status register and not use irqs at all in that case.  
> > > > > > > > > 
> > > > > > > > > Actually ignoring the first interrupt after a SPI transfer and
> > > > > > > > > waiting
> > > > > > > > > for the next conversion to raise the interrupt again should be
> > > > > > > > > robust
> > > > > > > > > enough. The ADC has to be in continous conversion mode for
> > > > > > > > > that
> > > > > > > > > obviously.
> > > > > > > > >   
> > > > > > > > Might be the only sane option we have, Uwe? If we do this, we
> > > > > > > > could be
> > > > > > > > dropping valid samples but only with controllers that suffer
> > > > > > > > from
> > > > > > > > #2.  
> > > > > > > 
> > > > > > > No. You have the same problem with the controllers which do not
> > > > > > > disable
> > > > > > > the edge detection logic.
> > > > > > > 
> > > > > > > The interrupt controller raises the interrupt on unmask
> > > > > > > (enable_irq()).
> > > > > > > Depending on timing the device handler might be invoked _before_
> > > > > > > the
> > > > > > > sample is ready, no?  
> > > > > > 
> > > > > > For those controllers, I think it's almost always guaranteed that
> > > > > > the first
> > > > > > IRQ
> > > > > > after enable is not really a valid sample. We'll always have some
> > > > > > SPI
> > > > > > transfer
> > > > > > (that should latch an IRQ on the controller) before enable_irq().  
> > > > > 
> > > > > The first irq isn't a valid sample unless the driver is preempted
> > > > > between the spi transfer and the following enable_irq() such that the
> > > > > irq event triggered by the SPI transfer doesn't result in calling the
> > > > > irq handler before the sample is ready. I guess that's what you ruled 
> > > > 
> > > > I guess that race we could prevent by disabling IRQs...
> > > > 
> > > > > out by saying "almost always"? I'd recommend to not rely on that.
> > > > > Chips
> > > > > become faster (and so conversion time shorter) which widens the race
> > > > > window and if you become unsynchronized and ignore every wrong second
> > > > > irq all samples become bogous.  
> > > > 
> > > > Right now we set UNLAZY and that brings this difference in behavior
> > > > depending on
> > > > the IRQ controller we have. But if we remove that change and make sure
> > > > there can
> > > > be no race between enable_irq() and the last spi_transfer, it should be
> > > > safe to
> > > > assume the first time we get in the handler is not for a valid sample.
> > > > Not sure
> > > > synchronization could be an issue to the point where you ignore all
> > > > samples. If
> > > > you ignore one IRQ, then the next one needs to be a valid sample (as
> > > > there
> > > > should be no spi_transfer in between). But not sure if it can affect
> > > > performance...
> > > 
> > > I think it is overly optimistic to assume that an interrupt controller
> > > will
> > > definitely catch both edges.  IIRC some of them have an interesting
> > > acknowledge
> > > dance before they can see an other edge at all.
> 
> Plus there is also the (probably only theoretic) race condition in
> combination with a controller suffering from #2 that the irq_enable()
> only happens after the conversion was done. Then the irq would be
> missed.
> 
> > > So it's also possible we only see one interrupt even though there were
> > > loads
> > > from the spi transfer (which can also trigger multiple if slow enough)
> 
> Ack, so I think we all agree now that the rdy-gpio is needed for
> reliable operation with irq. If that isn't available due to already
> finalized hardware, the only option is polling.
> 
> > > > I think right now, unless the IRQ controller suffers from #2, every time
> > > > we get
> > > > in the device handler after enable_irq() is not because of DRDY and
> > > > having a
> > > > valid sample or not is pure luck. 
> > > > 
> > > > > 
> > > > > So I still think the extra GPIO read should be implemented (as I
> > > > > proposed in
> > > > > https://lore.kernel.org/linux-iio/20241028160748.489596-9-u.kleine-koenig@baylibre.com/
> > > > > )
> > > > > to guarantee reliable operation. If that isn't possible the only
> > > > > really
> > > > > robust way to operate is using polling.  
> > > > 
> > > > My only issue with the gpio approach and your conversation with Thomas
> > > > seems to
> > > > prove it is that we're not guaranteed to be able to read the line. I
> > > > guess your
> > > > reasoning is that if we can't do that for a platform, then don't give
> > > > the gpio
> > > > in DT? But in that case, are we left with a device that might or might
> > > > not work?
> > > Now we have some input from Thomas, I'm happier that we basically have no
> > > choice
> > > for at least some controllers :(
> > 
> > Agreed. I'm not opposing to the GPIO change (even though not perfect) since
> > it's
> > better that what we have today and from this whole discussion, it also looks
> > like
> > there's not perfect solution anyways.
> 
> What is your concern that lets you say "not perfect"? Is it just that it
> won't work on every hardware?

Pretty much :)...

- Nuno Sá


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-11-25  9:03 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-11 13:03 can/should a disabled irq become pending? Uwe Kleine-König
2024-11-12 19:35 ` Thomas Gleixner
2024-11-12 22:08   ` Uwe Kleine-König
2024-11-13  3:40     ` Thomas Gleixner
2024-11-13 10:34       ` Nuno Sá
2024-11-13 15:50         ` Thomas Gleixner
2024-11-14  7:49           ` Nuno Sá
2024-11-14 10:59             ` Uwe Kleine-König
2024-11-14 12:04               ` Nuno Sá
2024-11-23 11:28                 ` Jonathan Cameron
2024-11-24 13:18                   ` Nuno Sá
2024-11-25  8:50                     ` Uwe Kleine-König
2024-11-25  9:08                       ` Nuno Sá

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox