* [PATCH] iio: imu: st_lsm6dsx: fix edge-trigger interrupts
@ 2020-10-22  9:26 Lorenzo Bianconi
  2020-11-01 16:33 ` Jonathan Cameron
  0 siblings, 1 reply; 11+ messages in thread
From: Lorenzo Bianconi @ 2020-10-22  9:26 UTC (permalink / raw)
  To: jic23; +Cc: linux-iio, lorenzo.bianconi, mario.tesi
If the device is configured to trigger edge interrupts it is possible to
miss samples since the sensor can generate an interrupt while the driver
is still processing the previous one.
Poll FIFO status register to process all pending interrupts.
Configure IRQF_ONESHOT only for level interrupts.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c | 33 +++++++++++++++-----
 1 file changed, 25 insertions(+), 8 deletions(-)
diff --git a/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c b/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c
index 5e584c6026f1..d43b08ceec01 100644
--- a/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c
+++ b/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c
@@ -2457,22 +2457,36 @@ st_lsm6dsx_report_motion_event(struct st_lsm6dsx_hw *hw)
 	return data & event_settings->wakeup_src_status_mask;
 }
 
+static irqreturn_t st_lsm6dsx_handler_irq(int irq, void *private)
+{
+	return IRQ_WAKE_THREAD;
+}
+
 static irqreturn_t st_lsm6dsx_handler_thread(int irq, void *private)
 {
 	struct st_lsm6dsx_hw *hw = private;
+	int fifo_len = 0, len = 0;
 	bool event;
-	int count;
 
 	event = st_lsm6dsx_report_motion_event(hw);
 
 	if (!hw->settings->fifo_ops.read_fifo)
 		return event ? IRQ_HANDLED : IRQ_NONE;
 
-	mutex_lock(&hw->fifo_lock);
-	count = hw->settings->fifo_ops.read_fifo(hw);
-	mutex_unlock(&hw->fifo_lock);
+	/*
+	 * If we are using edge IRQs, new samples can arrive while
+	 * processing current IRQ and those may be missed unless we
+	 * pick them here, so let's try read FIFO status again
+	 */
+	do {
+		mutex_lock(&hw->fifo_lock);
+		len = hw->settings->fifo_ops.read_fifo(hw);
+		mutex_unlock(&hw->fifo_lock);
+
+		fifo_len += len;
+	} while (len > 0);
 
-	return count || event ? IRQ_HANDLED : IRQ_NONE;
+	return fifo_len || event ? IRQ_HANDLED : IRQ_NONE;
 }
 
 static int st_lsm6dsx_irq_setup(struct st_lsm6dsx_hw *hw)
@@ -2488,10 +2502,14 @@ static int st_lsm6dsx_irq_setup(struct st_lsm6dsx_hw *hw)
 
 	switch (irq_type) {
 	case IRQF_TRIGGER_HIGH:
+		irq_type |= IRQF_ONESHOT;
+		fallthrough;
 	case IRQF_TRIGGER_RISING:
 		irq_active_low = false;
 		break;
 	case IRQF_TRIGGER_LOW:
+		irq_type |= IRQF_ONESHOT;
+		fallthrough;
 	case IRQF_TRIGGER_FALLING:
 		irq_active_low = true;
 		break;
@@ -2520,10 +2538,9 @@ static int st_lsm6dsx_irq_setup(struct st_lsm6dsx_hw *hw)
 	}
 
 	err = devm_request_threaded_irq(hw->dev, hw->irq,
-					NULL,
+					st_lsm6dsx_handler_irq,
 					st_lsm6dsx_handler_thread,
-					irq_type | IRQF_ONESHOT,
-					"lsm6dsx", hw);
+					irq_type, "lsm6dsx", hw);
 	if (err) {
 		dev_err(hw->dev, "failed to request trigger irq %d\n",
 			hw->irq);
-- 
2.26.2
* Re: [PATCH] iio: imu: st_lsm6dsx: fix edge-trigger interrupts
  2020-10-22  9:26 [PATCH] iio: imu: st_lsm6dsx: fix edge-trigger interrupts Lorenzo Bianconi
@ 2020-11-01 16:33 ` Jonathan Cameron
  2020-11-02 10:15   ` Lorenzo Bianconi
  0 siblings, 1 reply; 11+ messages in thread
From: Jonathan Cameron @ 2020-11-01 16:33 UTC (permalink / raw)
  To: Lorenzo Bianconi; +Cc: linux-iio, lorenzo.bianconi, mario.tesi
On Thu, 22 Oct 2020 11:26:53 +0200
Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> If the device is configured to trigger edge interrupts it is possible to
> miss samples since the sensor can generate an interrupt while the driver
> is still processing the previous one.
> Poll FIFO status register to process all pending interrupts.
> Configure IRQF_ONESHOT only for level interrupts.
Hmm. This sort of case is often extremely prone to race conditions.
I'd like to see more explanation of why we don't have one after this
fix.  Edge interrupts for FIFOs are horrible!
Dropping IRQF_ONESHOT should mean we enter the threaded handler with
interrupts enabled, but if another one happens we still have to wait
for the thread to finish before we schedule it again.
We should only do that if we disabled the interrupt in the top half,
which we haven't done here (you are working around the warnings
that would be printed with the otherwise pointless top half).
I 'assume' that the interrupts are latched.  So we won't get a new
interrupt until we have taken some action to clear it?  In this
case that action is removing items from the fifo?
IIRC, if we get an interrupt whilst it is masked due to IRQF_ONESHOT
then it is left pending until we exit the thread.  So that should
be sufficient to close a potential edge condition where we clear
the fifo, and it immediately fires again.  This pending behaviour
is necessary to avoid the race that would happen in any normal handler.
Hmm. Having had a look at one of the datasheets, I'm far from convinced these
parts truly support edge interrupts.  I can't see anything about minimum
off periods etc that you need for true edge interrupts. Otherwise they are
going to be prone to races.
So I think the following can happen.
A) We drain the fifo and it stays under the limit. Hence once that
   is crossed in future we will interrupt as normal.
B) We drain the fifo but it either has a very low watermark, or is
   filling very fast.   We manage to drain enough to get the interrupt
   to fire again, so all is fine if less than ideal.  With your loop we
   may end up entering the interrupt handler when we don't actually need to.
   If you want to avoid that you would need to disable the interrupt,
   then drain the fifo and finally do a dance to successfully reenable
   the interrupt, whilst ensuring no chance of missing by checking it
   should not have fired (still below the threshold)
C) We try to drain the fifo, but it is actually filling fast enough that
   we never get it under the limit, so no interrupt ever fires.
   With new code, we'll keep spinning to 0 so might eventually drain it.
   That needs a timeout so we just give up eventually.
D) watermark is one sample, we drain low enough to successfully get down
   to zero at the moment of the read, but very very soon after that we get
   one sample again. There is a window in which the interrupt line dropped
   but analogue electronics etc being what they are, it may not have been
   detectable.  Hence we miss an interrupt...  What you are doing is reducing
   the chance of hitting this.  It is nasty, but you might be able to ensure
   a reasonable period by widening this window.  Limit the watermark to 2
   samples?  
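One way to picture case C and the timeout suggested for it is the sketch
below. It is a user-space model only, not driver code: struct sim_fifo,
sim_read_fifo() and drain_bounded() are made-up names, and a simple
iteration cap stands in for whatever time-based limit a real handler
would use.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the sensor FIFO; not the driver's real state. */
struct sim_fifo {
	int level;		/* samples currently queued */
	int refill_per_read;	/* samples arriving while one read is in flight */
};

/*
 * Hypothetical stand-in for fifo_ops.read_fifo(): returns what was queued
 * and leaves behind whatever arrived during the read.
 */
static int sim_read_fifo(struct sim_fifo *f)
{
	int n = f->level;

	f->level = f->refill_per_read;
	return n;
}

/*
 * Drain until a read returns nothing, but never loop more than max_iters
 * times. *gave_up is set when the cap was hit while data kept arriving.
 */
static int drain_bounded(struct sim_fifo *f, int max_iters, bool *gave_up)
{
	int total = 0, len, i;

	assert(max_iters > 0);
	*gave_up = false;

	for (i = 0; i < max_iters; i++) {
		len = sim_read_fifo(f);
		if (len <= 0)
			return total;
		total += len;
	}

	*gave_up = true;
	return total;
}
```

With refill_per_read set to 0 the loop stops after one empty read; with
a non-zero refill it would spin forever without the cap, which is the
situation case C describes.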
Also needs a fixes tag :)
* Re: [PATCH] iio: imu: st_lsm6dsx: fix edge-trigger interrupts
  2020-11-01 16:33 ` Jonathan Cameron
@ 2020-11-02 10:15   ` Lorenzo Bianconi
  2020-11-02 17:44     ` Jonathan Cameron
  0 siblings, 1 reply; 11+ messages in thread
From: Lorenzo Bianconi @ 2020-11-02 10:15 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-iio, lorenzo.bianconi, mario.tesi, denis.ciocca,
	armando.visconti
> On Thu, 22 Oct 2020 11:26:53 +0200
> Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> 
> > If the device is configured to trigger edge interrupts it is possible to
> > miss samples since the sensor can generate an interrupt while the driver
> > is still processing the previous one.
> > Poll FIFO status register to process all pending interrupts.
> > Configure IRQF_ONESHOT only for level interrupts.
> 
Hi Jonathan,
thx for the review :)
> Hmm. This sort of case is often extremely prone to race conditions.
> I'd like to see more explanation of why we don't have one after this
> fix.  Edge interrupts for FIFOs are horrible!
> 
> Dropping IRQF_ONESHOT should mean we enter the threaded handler with
> interrupts enabled, but if another one happens we still have to wait
> for the thread to finish before we schedule it again.
> We should only do that if we disabled the interrupt in the top half,
> which we haven't done here (you are working around the warnings
> that would be printed with the otherwise pointless top half).
looking at handle_edge_irq (please correct me if I am wrong) IRQF_ONESHOT
takes effect only for level interrupts while for edge-sensitive interrupts
the irq handler runs with the line unmasked. In fact the IRQF_ONESHOT part of
the patch seems not relevant for fixing the issue, I just aligned the code to
st_sensor general handling in st_sensors_allocate_trigger()
(https://elixir.bootlin.com/linux/v5.9.3/source/drivers/iio/common/st_sensors/st_sensors_trigger.c#L182).
I think the issue is a new interrupt can fire while we are still processing
the previous one if watermark is low (e.g. 1) and the sensor is running at high
ODR (e.g. 833Hz). Reading again the status register in st_lsm6dsx_handler_thread()
fixes the issue in my tests.
I guess we can just drop the IRQF_ONESHOT chunk and keep the while loop in
st_lsm6dsx_handler_thread(). What do you think?
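To put rough numbers on the watermark/ODR point: at 833Hz a sample
arrives about every 1.2ms, so with watermark 1 a handler that takes
longer than that is still running when the next watermark event occurs.
A small sketch of the arithmetic follows; the helper names and the
handler duration passed in are assumptions for illustration, not
measured values from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Time between two watermark events, in microseconds (rounded down). */
static uint32_t watermark_period_us(uint32_t odr_hz, uint32_t watermark)
{
	assert(odr_hz != 0);
	return (uint32_t)(((uint64_t)watermark * 1000000u) / odr_hz);
}

/*
 * True when a handler taking handler_us (a hypothetical figure) is still
 * busy at the moment the next watermark event fires.
 */
static bool irq_can_overlap(uint32_t odr_hz, uint32_t watermark,
			    uint32_t handler_us)
{
	return handler_us >= watermark_period_us(odr_hz, watermark);
}
```

For example, watermark_period_us(833, 1) gives 1200, and raising the
watermark to 2 doubles the window to 2400.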
> 
> I 'assume' that the interrupts are latched.  So we won't get a new
> interrupt until we have taken some action to clear it?  In this
> case that action is removing items from the fifo?
I do not know :). Adding stm folks.
@mario, denis, armando: any pointer for this?
> 
> IIRC, if we get an interrupt whilst it is masked due to IRQF_ONESHOT
> then it is left pending until we exit the thread.  So that should
> be sufficient to close a potential edge condition where we clear
> the fifo, and it immediately fires again.  This pending behaviour
> is necessary to avoid the race that would happen in any normal handler.
I did not get you on this point.
> 
> 
> Hmm. Having had a look at one of the datasheets, I'm far from convinced these
> parts truly support edge interrupts.  I can't see anything about minimum
> off periods etc that you need for true edge interrupts. Otherwise they are
> going to be prone to races.
@mario, denis, armando: any pointer for this?
> 
> So I think the following can happen.
> 
> A) We drain the fifo and it stays under the limit. Hence once that
>    is crossed in future we will interrupt as normal.
> 
> B) We drain the fifo but it either has a very low watermark, or is
>    filling very fast.   We manage to drain enough to get the interrupt
>    to fire again, so all is fine if less than ideal.  With your loop we
>    may end up entering the interrupt handler when we don't actually need to.
>    If you want to avoid that you would need to disable the interrupt,
>    then drain the fifo and finally do a dance to successfully reenable
>    the interrupt, whilst ensuring no chance of missing by checking it
>    should not have fired (still below the threshold)
> 
> C) We try to drain the fifo, but it is actually filling fast enough that
>    we never get it under the limit, so no interrupt ever fires.
>    With new code, we'll keep spinning to 0 so might eventually drain it.
>    That needs a timeout so we just give up eventually.
> 
> D) watermark is one sample, we drain low enough to successfully get down
>    to zero at the moment of the read, but very very soon after that we get
>    one sample again. There is a window in which the interrupt line dropped
>    but analogue electronics etc being what they are, it may not have been
>    detectable.  Hence we miss an interrupt...  What you are doing is reducing
>    the chance of hitting this.  It is nasty, but you might be able to ensure
>    a reasonable period by widening this window.  Limit the watermark to 2
>    samples?  
> 
> Also needs a fixes tag :)
ack, I will add them in v2
Regards,
Lorenzo
* Re: [PATCH] iio: imu: st_lsm6dsx: fix edge-trigger interrupts
  2020-11-02 10:15   ` Lorenzo Bianconi
@ 2020-11-02 17:44     ` Jonathan Cameron
  2020-11-02 18:18       ` Lorenzo Bianconi
  0 siblings, 1 reply; 11+ messages in thread
From: Jonathan Cameron @ 2020-11-02 17:44 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Jonathan Cameron, linux-iio, lorenzo.bianconi, mario.tesi,
	denis.ciocca, armando.visconti
On Mon, 2 Nov 2020 11:15:21 +0100
Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> > On Thu, 22 Oct 2020 11:26:53 +0200
> > Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> >   
> > > If the device is configured to trigger edge interrupts it is possible to
> > > miss samples since the sensor can generate an interrupt while the driver
> > > is still processing the previous one.
> > > Poll FIFO status register to process all pending interrupts.
> > > Configure IRQF_ONESHOT only for level interrupts.  
> >   
> 
> Hi Jonathan,
> 
> thx for the review :)
> 
> > Hmm. This sort of case is often extremely prone to race conditions.
> > I'd like to see more explanation of why we don't have one after this
> > fix.  Edge interrupts for FIFOs are horrible!
> > 
> > Dropping IRQF_ONESHOT should mean we enter the threaded handler with
> > interrupts enabled, but if another one happens we still have to wait
> > for the thread to finish before we schedule it again.
> > We should only do that if we disabled the interrupt in the top half,
> > which we haven't done here (you are working around the warnings
> > that would be printed with the otherwise pointless top half).  
> 
> looking at handle_edge_irq (please correct me if I am wrong) IRQF_ONESHOT
> takes effect only for level interrupts while for edge-sensitive interrupts
> the irq handler runs with the line unmasked. In fact the IRQF_ONESHOT part of
> the patch seems not relevant for fixing the issue, I just aligned the code to
> st_sensor general handling in st_sensors_allocate_trigger()
> (https://elixir.bootlin.com/linux/v5.9.3/source/drivers/iio/common/st_sensors/st_sensors_trigger.c#L182).
> I think the issue is a new interrupt can fire while we are still processing
> the previous one if watermark is low (e.g. 1) and the sensor is running at high
> ODR (e.g. 833Hz). Reading again the status register in st_lsm6dsx_handler_thread()
> fixes the issue in my tests.
> I guess we can just drop the IRQF_ONESHOT chunk and keep the while loop in
> st_lsm6dsx_handler_thread(). What do you think?
I'd do that.
> 
> > 
> > I 'assume' that the interrupts are latched.  So we won't get a new
> > interrupt until we have taken some action to clear it?  In this
> > case that action is removing items from the fifo?  
> 
> I do not know :). Adding stm folks.
> @mario, denis, armando: any pointer for this?
> 
> > 
> > IIRC, if we get an interrupt whilst it is masked due to IRQF_ONESHOT
> > then it is left pending until we exit the thread.  So that should
> > be sufficient to close a potential edge condition where we clear
> > the fifo, and it immediately fires again.  This pending behaviour
> > is necessary to avoid the race that would happen in any normal handler.  
> 
> I did not get you on this point.
If an interrupt occurs, even whilst we have it masked, we shouldn't
lose it.  If we did so then any normal handler that clears the interrupt
at the end of doing whatever it needs to do would race against a new interrupt.
So my suspicion is that you aren't actually missing an interrupt, but rather the
drop time is too short to be detected (or effectively not there at all).
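The pending behaviour described above can be pictured with a very
simplified model. This is an assumption-laden sketch, not the genirq
implementation: struct sim_line and the sim_*() helpers are invented
names, and the only property modelled is that an edge seen while masked
is remembered as a single flag and replayed on unmask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one interrupt line. */
struct sim_line {
	bool masked;	/* line masked, e.g. while a oneshot thread runs */
	bool pending;	/* an edge was seen while masked */
	int delivered;	/* number of times the handler was invoked */
};

static void sim_mask(struct sim_line *l)
{
	assert(l != NULL);
	l->masked = true;
}

/* An edge on a masked line is remembered, not dropped. */
static void sim_edge(struct sim_line *l)
{
	assert(l != NULL);
	if (l->masked)
		l->pending = true;
	else
		l->delivered++;
}

/* Unmasking replays a remembered edge exactly once. */
static void sim_unmask(struct sim_line *l)
{
	assert(l != NULL);
	l->masked = false;
	if (l->pending) {
		l->pending = false;
		l->delivered++;
	}
}
```

Because pending is a flag and not a counter, two edges that arrive while
masked collapse into one handler run, so the handler still has to drain
everything that is queued when it eventually runs.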
> 
> > 
> > 
> > Hmm. Having had a look at one of the datasheets, I'm far from convinced these
> > parts truly support edge interrupts.  I can't see anything about minimum
> > off periods etc that you need for true edge interrupts. Otherwise they are
> > going to be prone to races.  
> 
> @mario, denis, armando: any pointer for this?
> 
> > 
> > So I think the following can happen.
> > 
> > A) We drain the fifo and it stays under the limit. Hence once that
> >    is crossed in future we will interrupt as normal.
> > 
> > B) We drain the fifo but it either has a very low watermark, or is
> >    filling very fast.   We manage to drain enough to get the interrupt
> >    to fire again, so all is fine if less than ideal.  With your loop we
> >    may end up entering the interrupt handler when we don't actually need to.
> >    If you want to avoid that you would need to disable the interrupt,
> >    then drain the fifo and finally do a dance to successfully reenable
> >    the interrupt, whilst ensuring no chance of missing by checking it
> >    should not have fired (still below the threshold)
> > 
> > C) We try to drain the fifo, but it is actually filling fast enough that
> >    we never get it under the limit, so no interrupt ever fires.
> >    With new code, we'll keep spinning to 0 so might eventually drain it.
> >    That needs a timeout so we just give up eventually.
> > 
> > D) watermark is one sample, we drain low enough to successfully get down
> >    to zero at the moment of the read, but very very soon after that we get
> >    one sample again. There is a window in which the interrupt line dropped
> >    but analogue electronics etc being what they are, it may not have been
> >    detectable.  Hence we miss an interrupt...  What you are doing is reducing
> >    the chance of hitting this.  It is nasty, but you might be able to ensure
> >    a reasonable period by widening this window.  Limit the watermark to 2
> >    samples?  
> > 
> > Also needs a fixes tag :)  
> 
> ack, I will add them in v2
> 
> Regards,
> Lorenzo
* Re: [PATCH] iio: imu: st_lsm6dsx: fix edge-trigger interrupts
  2020-11-02 17:44     ` Jonathan Cameron
@ 2020-11-02 18:18       ` Lorenzo Bianconi
  2020-11-08 16:49         ` Jonathan Cameron
  0 siblings, 1 reply; 11+ messages in thread
From: Lorenzo Bianconi @ 2020-11-02 18:18 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Lorenzo Bianconi, Jonathan Cameron, linux-iio, mario.tesi,
	denis.ciocca, armando.visconti
> On Mon, 2 Nov 2020 11:15:21 +0100
> Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> 
> > > On Thu, 22 Oct 2020 11:26:53 +0200
> > > Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> > >   
> > > > If the device is configured to trigger edge interrupts it is possible to
> > > > miss samples since the sensor can generate an interrupt while the driver
> > > > is still processing the previous one.
> > > > Poll FIFO status register to process all pending interrupts.
> > > > Configure IRQF_ONESHOT only for level interrupts.  
> > >   
> > 
> > Hi Jonathan,
> > 
> > thx for the review :)
> > 
> > > Hmm. This sort of case is often extremely prone to race conditions.
> > > I'd like to see more explanation of why we don't have one after this
> > > fix.  Edge interrupts for FIFOs are horrible!
> > > 
> > > Dropping IRQF_ONESHOT should mean we enter the threaded handler with
> > > interrupts enabled, but if another one happens we still have to wait
> > > for the thread to finish before we schedule it again.
> > > We should only do that if we disabled the interrupt in the top half,
> > > which we haven't done here (you are working around the warnings
> > > that would be printed with the otherwise pointless top half).  
> > 
> > looking at handle_edge_irq (please correct me if I am wrong) IRQF_ONESHOT
> > takes effect only for level interrupts while for edge-sensitive interrupts
> > the irq handler runs with the line unmasked. In fact the IRQF_ONESHOT part of
> > the patch seems not relevant for fixing the issue, I just aligned the code to
> > st_sensor general handling in st_sensors_allocate_trigger()
> > (https://elixir.bootlin.com/linux/v5.9.3/source/drivers/iio/common/st_sensors/st_sensors_trigger.c#L182).
> > I think the issue is a new interrupt can fire while we are still processing
> > the previous one if watermark is low (e.g. 1) and the sensor is running at high
> > ODR (e.g. 833Hz). Reading again the status register in st_lsm6dsx_handler_thread()
> > fixes the issue in my tests.
> > I guess we can just drop the IRQF_ONESHOT chunk and keep the while loop in
> > st_lsm6dsx_handler_thread(). What do you think?
> 
> I'd do that.
ack, I will post a v2 with only this change.
> 
> > 
> > > 
> > > I 'assume' that the interrupts are latched.  So we won't get a new
> > > interrupt until we have taken some action to clear it?  In this
> > > case that action is removing items from the fifo?  
> > 
> > I do not know :). Adding stm folks.
> > @mario, denis, armando: any pointer for this?
> > 
> > > 
> > > IIRC, if we get an interrupt whilst it is masked due to IRQF_ONESHOT
> > > then it is left pending until we exit the thread.  So that should
> > > be sufficient to close a potential edge condition where we clear
> > > the fifo, and it immediately fires again.  This pending behaviour
> > > is necessary to avoid the race that would happen in any normal handler.  
> > 
> > I did not get you on this point.
> 
> If an interrupt occurs, even whilst we have it masked, we shouldn't
> lose it.  If we did so then any normal handler that clears the interrupt
> at the end of doing whatever it needs to do would race against a new interrupt.
> 
> So my suspicion is that you aren't actually missing an interrupt, but rather the
> drop time is too short to be detected (or effectively not there at all).
I guess that, since edge interrupts run with the line unmasked, a new interrupt
can fire while the irq thread is still running (so wake_up_process() will just
return), but the driver has already read the fifo_status register and so it will
not read the new sample. This case should be fixed by reading the fifo_status
register again.
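The sequence described above can be replayed in a small user-space
model. It is only a sketch: struct sim_hw, sim_read_once() and the two
handle_*() functions are hypothetical names, and "late_samples" simply
models samples that land after the status register has been read.

```c
#include <assert.h>

/* Hypothetical model of the sensor side; not the driver's real state. */
struct sim_hw {
	int fifo_level;		/* samples queued in the sensor FIFO */
	int late_samples;	/* samples that land after a status read */
};

/*
 * One pass: snapshot the level (the status read), drain that many
 * samples, and let one late sample arrive while doing so.
 */
static int sim_read_once(struct sim_hw *hw)
{
	int snapshot = hw->fifo_level;

	assert(snapshot >= 0);
	hw->fifo_level -= snapshot;
	if (hw->late_samples > 0) {
		hw->late_samples--;
		hw->fifo_level++;
	}
	return snapshot;
}

/* Old behaviour: a single read per wake-up. */
static int handle_single_read(struct sim_hw *hw)
{
	return sim_read_once(hw);
}

/* Patched behaviour: read again until the FIFO reports nothing. */
static int handle_reread(struct sim_hw *hw)
{
	int total = 0, len;

	do {
		len = sim_read_once(hw);
		total += len;
	} while (len > 0);

	return total;
}
```

Starting from one queued sample and one late arrival, the single read
returns 1 and leaves a sample stranded in the FIFO, while the re-read
loop returns 2 and leaves the FIFO empty.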
Regards,
Lorenzo
> 
> > 
> > > 
> > > 
> > > Hmm. Having had a look at one of the datasheets, I'm far from convinced these
> > > parts truly support edge interrupts.  I can't see anything about minimum
> > > off periods etc that you need for true edge interrupts. Otherwise they are
> > > going to be prone to races.  
> > 
> > @mario, denis, armando: any pointer for this?
> > 
> > > 
> > > So I think the following can happen.
> > > 
> > > A) We drain the fifo and it stays under the limit. Hence once that
> > >    is crossed in future we will interrupt as normal.
> > > 
> > > B) We drain the fifo but it either has a very low watermark, or is
> > >    filling very fast.   We manage to drain enough to get the interrupt
> > >    to fire again, so all is fine if less than ideal.  With your loop we
> > >    may end up entering the interrupt handler when we don't actually need to.
> > >    If you want to avoid that you would need to disable the interrupt,
> > >    then drain the fifo and finally do a dance to successfully reenable
> > >    the interrupt, whilst ensuring no chance of missing by checking it
> > >    should not have fired (still below the threshold)
> > > 
> > > C) We try to drain the fifo, but it is actually filling fast enough that
> > >    we never get it under the limit, so no interrupt ever fires.
> > >    With new code, we'll keep spinning to 0 so might eventually drain it.
> > >    That needs a timeout so we just give up eventually.
> > > 
> > > D) watermark is one sample, we drain low enough to successfully get down
> > >    to zero at the moment of the read, but very very soon after that we get
> > >    one sample again. There is a window in which the interrupt line dropped
> > >    but analogue electronics etc being what they are, it may not have been
> > >    detectable.  Hence we miss an interrupt...  What you are doing is reducing
> > >    the chance of hitting this.  It is nasty, but you might be able to ensure
> > >    a reasonable period by widening this window.  Limit the watermark to 2
> > >    samples?  
> > > 
> > > Also needs a fixes tag :)  
> > 
> > ack, I will add them in v2
> > 
> > Regards,
> > Lorenzo
> > >   
* Re: [PATCH] iio: imu: st_lsm6dsx: fix edge-trigger interrupts
  2020-11-02 18:18       ` Lorenzo Bianconi
@ 2020-11-08 16:49         ` Jonathan Cameron
  2020-11-08 18:27           ` Lorenzo Bianconi
  0 siblings, 1 reply; 11+ messages in thread
From: Jonathan Cameron @ 2020-11-08 16:49 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Jonathan Cameron, Lorenzo Bianconi, linux-iio, mario.tesi,
	denis.ciocca, armando.visconti
On Mon, 2 Nov 2020 19:18:42 +0100
Lorenzo Bianconi <lorenzo.bianconi@redhat.com> wrote:
> > On Mon, 2 Nov 2020 11:15:21 +0100
> > Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> >   
> > > > On Thu, 22 Oct 2020 11:26:53 +0200
> > > > Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> > > >     
> > > > > If the device is configured to trigger edge interrupts it is possible to
> > > > > miss samples since the sensor can generate an interrupt while the driver
> > > > > is still processing the previous one.
> > > > > Poll FIFO status register to process all pending interrupts.
> > > > > Configure IRQF_ONESHOT only for level interrupts.    
> > > >     
> > > 
> > > Hi Jonathan,
> > > 
> > > thx for the review :)
> > >   
> > > > Hmm. This sort of case is often extremely prone to race conditions.
> > > > I'd like to see more explanation of why we don't have one after this
> > > > fix.  Edge interrupts for FIFOs are horrible!
> > > > 
> > > > Dropping IRQF_ONESHOT should mean we enter the threaded handler with
> > > > interrupts enabled, but if another one happens we still have to wait
> > > > for the thread to finish before we schedule it again.
> > > > We should only do that if we disabled the interrupt in the top half,
> > > > which we haven't done here (you are working around the warnings
> > > > that would be printed with the otherwise pointless top half).    
> > > 
> > > looking at handle_edge_irq (please correct me if I am wrong) IRQF_ONESHOT
> > > takes effect only for level interrupts while for edge-sensitive interrupts
> > > the irq handler runs with the line unmasked. In fact the IRQF_ONESHOT part of
> > > the patch seems not relevant for fixing the issue, I just aligned the code to
> > > st_sensor general handling in st_sensors_allocate_trigger()
> > > (https://elixir.bootlin.com/linux/v5.9.3/source/drivers/iio/common/st_sensors/st_sensors_trigger.c#L182).
> > > I think the issue is a new interrupt can fire while we are still processing
> > > the previous one if watermark is low (e.g. 1) and the sensor is running at high
> > > ODR (e.g. 833Hz). Reading again the status register in st_lsm6dsx_handler_thread()
> > > fixes the issue in my tests.
> > > I guess we can just drop the IRQF_ONESHOT chunk and keep the while loop in
> > > st_lsm6dsx_handler_thread(). What do you think?  
> > 
> > I'd do that.  
> 
> ack, I will post a v2 with only this change.
> 
> >   
> > >   
> > > > 
> > > > I 'assume' that the interrupts are latched.  So we won't get a new
> > > > interrupt until we have taken some action to clear it?  In this
> > > > case that action is removing items from the fifo?    
> > > 
> > > I do not know :). Adding stm folks.
> > > @mario, denis, armando: any pointer for this?
> > >   
> > > > 
> > > > IIRC, if we get an interrupt whilst it is masked due to IRQF_ONESHOT
> > > > then it is left pending until we exit the thread.  So that should
> > > > be sufficient to close a potential edge condition where we clear
> > > > the fifo, and it immediately fires again.  This pending behaviour
> > > > is necessary to avoid the race that would happen in any normal handler.    
> > > 
> > > I did not get you on this point.  
> > 
> > If an interrupt occurs, even whilst we have it masked, we shouldn't
> > lose it.  If we did so then any normal handler that clears the interrupt
> > at the end of doing whatever it needs to do would race against a new interrupt.
> > 
> > So my suspicion is that you aren't actually missing an interrupt, but rather the
> > drop time is too short to be detected (or effectively not there at all).  
> 
> I guess since edge interrupts run with the line unmasked a new interrupt can fire
> while the irq thread is still running (so wake_up_process() will just return) but
> the driver has already read fifo_status register and so it will not read new
> sample. This case should be fixed reading again the fifo_status register.

It doesn't actually help, because there is always a window between reading the
fifo_status register and exiting the thread.

I 'think' what happens (it's been a while since I dug through this stuff) is
that you end up with the task being added to the runqueue, even though it's
already running. The upshot is that the thread gets scheduled again.

If this were not the case there would be a race with any edge-based interrupt,
as the thread has to re-enable the interrupt and it will always be able to fire
whilst the thread is still running.

Jonathan
* Re: [PATCH] iio: imu: st_lsm6dsx: fix edge-trigger interrupts
  2020-11-08 16:49         ` Jonathan Cameron
@ 2020-11-08 18:27           ` Lorenzo Bianconi
  2020-11-14 15:06             ` Jonathan Cameron
  0 siblings, 1 reply; 11+ messages in thread
From: Lorenzo Bianconi @ 2020-11-08 18:27 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Lorenzo Bianconi, Jonathan Cameron, linux-iio, mario.tesi,
	denis.ciocca, armando.visconti
[...]
> > 
> > I guess since edge interrupts run with the line unmasked a new interrupt can fire
> > while the irq thread is still running (so wake_up_process() will just return) but
> > the driver has already read fifo_status register and so it will not read new
> > sample. This case should be fixed reading again the fifo_status register.
> 
> It doesn't actually help because there is always a window after the fifo_status register
> is read before we exit the thread.
> 
> I 'think' what happens (it's been a while since I dug through this stuff) is
> that you end up with the task being added to the runqueue, even though it's
> already running. The upshot is that the thread gets scheduled again.
> 
> If this were not the case there would be a race with any edge based interrupt
> as the thread has to reenable the interrupt and it will always be able to fire
> whilst the thread is still running.

I guess this is the case (a race between the irq thread and the edge interrupt),
since afaik handle_edge_irq() runs with the irq line unmasked.
I agree that this approach does not completely fix the issue, but it shrinks
the race window. With the patch, an interrupt that fires while we are processing
the previous one is still picked up; the issue only occurs if it fires between
the end of hw->settings->fifo_ops.read_fifo() and the end of the irq thread.
Before the patch, the interrupt always had to fire before we read the fifo
status register. In fact, with the patch applied I am no longer able to trigger
the issue.

@denis, mario, armando: can you please confirm the hw does not support pulsed
interrupts for the fifo watermark?

If it does not, one possible approach would be to disable interrupt generation
on the sensor at the beginning of st_lsm6dsx_handler_thread() and schedule a
work item at the end of st_lsm6dsx_handler_thread() to re-enable it.
What do you think?

Regards,
Lorenzo
* Re: [PATCH] iio: imu: st_lsm6dsx: fix edge-trigger interrupts
  2020-11-08 18:27           ` Lorenzo Bianconi
@ 2020-11-14 15:06             ` Jonathan Cameron
  2020-11-14 16:48               ` Lorenzo Bianconi
  0 siblings, 1 reply; 11+ messages in thread
From: Jonathan Cameron @ 2020-11-14 15:06 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Lorenzo Bianconi, Jonathan Cameron, linux-iio, mario.tesi,
	denis.ciocca, armando.visconti
On Sun, 8 Nov 2020 19:27:28 +0100
Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> [...]
> > > 
> > > I guess since edge interrupts run with the line unmasked a new interrupt can fire
> > > while the irq thread is still running (so wake_up_process() will just return) but
> > > the driver has already read fifo_status register and so it will not read new
> > > sample. This case should be fixed reading again the fifo_status register.  
> > 
> > It doesn't actually help because there is always a window after the fifo_status register
> > is read before we exit the thread.
> > 
> > I 'think' what happens (it's been a while since I dug through this stuff) is
> > that you end up with the task being added to the runqueue, even though it's
> > already running. The upshot is that the thread gets scheduled again.
> > 
> > If this were not the case there would be a race with any edge based interrupt
> > as the thread has to reenable the interrupt and it will always be able to fire
> > whilst the thread is still running.  
> 
> I guess this is the case (race between irq-thread and edge interrupt) since afaik
> handle_edge_irq() runs with the irq-line unmasked.
> I agree with you this approach does not fix completely the issue but it reduces
> the race-surface since now the interrupt can fire while processing the previous one
> (the issue occurs if the interrupt fires between the end of hw->settings->fifo_ops.read_fifo()
> and the end of the irq-thread) while before the interrupt must always fire before
> reading the fifo status register (in fact with the patch applied I am not able
> to trigger the issue anymore).

So the thing I've been trying (badly) to say here is that I'm fairly sure the
issue isn't what you think it is at all.  (Note I've spent a lot of
time with scopes on interrupt lines looking for similar issues - it's
not fun).

I think the actual condition here is that you have an interrupt that is not
guaranteed to go low for long enough between being cleared and set.  Thus if you
read the fifo at almost exactly the moment new data is written you may in theory
have the interrupt drop, but in practice analog electronics kicks in and you won't
get an interrupt detected at all. This is why the sensor needs to put guarantees
on that drop time (some do - but I'm not seeing it in the datasheet for this one).

On a more mundane note, I'm not sure in this case that there is a guarantee
it will ever drop even in theory - this buffer could for this short period be
filling faster than we drain it.

The reason your change makes this much less likely to happen is that, by checking
again, you are generally much closer to the time of the change of the level in
the fifo.  Thus, unless you are preempted, you should clear it long before it
would be set again, and thus get a nice clean drop on the interrupt.

So for some ASCII art:

Previously we have
data samples       |       |       |
                          _
Read of fifo   ___________|_____ 
                    _______ _____________
interrupt line ____|       |              Interrupt stuck high as edge missed.
                           ^       
                           1       
With your fix
data samples       |       |       |
                          _
Read of fifo   ___________|__|__ 
                    _______ __
interrupt line ____|       |  |____|
                           ^       ^
                           1       2

So we would have missed 1, but because we check the fifo level again immediately
after we would have made it drop, if we hit this unfortunate timing we will
very quickly pull new data from the sensor, resulting in a drop well before the
next interrupt comes in.
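
To make the re-check loop concrete, here is a rough userspace sketch (not
driver code). struct fake_fifo, fake_read_fifo() and MAX_DRAIN_PASSES are
made up for illustration; the bound stands in for the "give up eventually"
timeout mentioned earlier in the thread, and its value is arbitrary.

```c
#include <assert.h>

/* Hypothetical stand-in for the sensor FIFO; not a driver structure. */
struct fake_fifo {
	int level;         /* samples currently queued */
	int refill_passes; /* number of upcoming reads during which one new sample lands */
};

#define MAX_DRAIN_PASSES 8 /* arbitrary bound so a fast-filling FIFO cannot trap us */

/* Read out everything queued right now; a new sample may land just after. */
static int fake_read_fifo(struct fake_fifo *f)
{
	int len = f->level;

	f->level = 0;
	if (f->refill_passes > 0) {
		f->refill_passes--;
		f->level = 1; /* a sample arrived while we were reading */
	}
	return len;
}

/* Re-check the FIFO until a read returns nothing or the bound is hit. */
static int drain_fifo(struct fake_fifo *f)
{
	int total = 0, len, pass = 0;

	assert(f);

	do {
		len = fake_read_fifo(f);
		total += len;
	} while (len > 0 && ++pass < MAX_DRAIN_PASSES);

	return total;
}
```

With a FIFO that stops refilling, drain_fifo() only returns once a read finds
nothing queued, which is the "nice clean drop" case. With one that refills on
every read, it gives up after MAX_DRAIN_PASSES reads and leaves a sample
behind, which is the case where the line may never drop.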
> 
> @denis, mario, armando: can you please confirm the hw does not support pulsed
> interrupts for fifo-watermark?
> 
> If not one possible approach would be to disable the interrupt generation on
> the sensor at the beginning of st_lsm6dsx_handler_thread() and schedule a
> workqueue at the end of st_lsm6dsx_handler_thread() to re-enable the sensor
> interrupt generation. What do you think?

Re-enabling it in the thread should work as well.  It is a heavyweight solution
but it is what you are expected to do in cases like this.

I'd be very surprised if that doesn't work.  The normal operation of edge
interrupt handlers is to re-enable in the thread after we are sure we have
cleared the condition for the original interrupt.  That will take long enough
(as a bus transaction is involved) that the interrupt will definitely have dropped
for long enough to be detected.
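
A toy userspace model of that disable / drain / re-enable sequence, purely as
a sketch. struct fake_sensor and the fake_*() helpers are hypothetical and
model the interrupt line as a simple level derived from the FIFO fill state;
they are not the driver's API and say nothing about how the real part
behaves.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a sensor with an edge-sensitive watermark interrupt. */
struct fake_sensor {
	bool irq_enabled; /* interrupt generation on the sensor side */
	bool line_high;   /* current state of the interrupt line */
	int level;        /* samples queued in the FIFO */
	int watermark;    /* line goes high once level >= watermark */
	int edges_seen;   /* rising edges the host actually observed */
};

/* Recompute the line; a low-to-high transition counts as a detected edge. */
static void fake_update_line(struct fake_sensor *s)
{
	bool high = s->irq_enabled && s->level >= s->watermark;

	if (high && !s->line_high)
		s->edges_seen++;
	s->line_high = high;
}

/* A new sample lands in the FIFO. */
static void fake_push_sample(struct fake_sensor *s)
{
	s->level++;
	fake_update_line(s);
}

/* Thread body: disable at the source, drain, then re-enable. */
static int fake_handler_thread(struct fake_sensor *s)
{
	int len;

	assert(s);

	s->irq_enabled = false; /* line is forced low from here on */
	fake_update_line(s);

	len = s->level;         /* drain whatever is queued */
	s->level = 0;

	s->irq_enabled = true;  /* anything queued meanwhile raises a fresh edge */
	fake_update_line(s);

	return len;
}
```

Because the line is held low for the whole drain, the next sample that
crosses the watermark always produces a fresh rising edge in this model,
whether it arrives after the re-enable or while generation was disabled.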

In some similar cases we've just concluded the right option is to not
support edge interrupts. Do we know if we have boards out there that are using
it in that mode, and is there any chance they could support level interrupts
instead? That's going to be a lot simpler and more reliable for this.

Jonathan
* Re: [PATCH] iio: imu: st_lsm6dsx: fix edge-trigger interrupts
  2020-11-14 15:06             ` Jonathan Cameron
@ 2020-11-14 16:48               ` Lorenzo Bianconi
  2020-11-14 17:31                 ` Jonathan Cameron
  0 siblings, 1 reply; 11+ messages in thread
From: Lorenzo Bianconi @ 2020-11-14 16:48 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Lorenzo Bianconi, Jonathan Cameron, linux-iio, mario.tesi,
	denis.ciocca, armando.visconti
> On Sun, 8 Nov 2020 19:27:28 +0100
> Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> 
[...]
> 
> So the thing I've been trying to say badly here is that I'm fairly sure the
> issue isn't what you think it is at all.  (Note I've spent a lot of
> time with scopes on interrupt lines looking for similar issues - it's
> not fun).
> 
> I think the actual condition here is that you have an interrupt that is not
> guaranteed to go low for long enough between being cleared and set.  Thus if you
> read the fifo at almost exactly the moment new data is written you may in theory
> have the interrupt drop, but in practice analog electronics kicks in and you won't
> get an interrupt detected at all. This is why the sensor needs to put guarantees
> on that drop time (some do - but I'm not seeing it in the datasheet for this one).
> On a more mundane note, I'm not sure in this case that there is a guarantee
> it will ever drop even in theory - this buffer could for this short period be
> filling faster than we drain it.
ack, very nice explanation :)
> 
> The reason your change makes this much less likely to happen is that, by checking
> again, you are generally much closer to the time of the change of the level in
> the fifo.  Thus, unless you are preempted, you should clear it long before it
> would be set again, and thus get a nice clean drop on the interrupt.
> 
> So for some ASCII art
very nice :)
> 
> Previously we have
> 
> data samples       |       |       |
>                           _
> Read of fifo   ___________|_____ 
>                     _______ _____________
> interrupt line ____|       |              Interrupt stuck high as edge missed.
>                            ^       
>                            1       
> 
> With your fix
> 
> data samples       |       |       |
>                           _
> Read of fifo   ___________|__|__ 
>                     _______ __
> interrupt line ____|       |  |____|
>                            ^       ^
>                            1       2
> 
> So we would have missed 1, but because we check the fifo level again immediately
> after we would have made it drop, if we hit this unfortunate timing we will
> very quickly pull new data from the sensor, resulting in a drop well before the
> next interrupt comes in.
in the last case, even if we introduce a little bit of burstiness, I guess it
works because we read both 1 and 2, right?
> 
> 
> > 
> > @denis, mario, armando: can you please confirm the hw does not support pulsed
> > interrupts for fifo-watermark?
> > 
> > If not one possible approach would be to disable the interrupt generation on
> > the sensor at the beginning of st_lsm6dsx_handler_thread() and schedule a
> > workqueue at the end of st_lsm6dsx_handler_thread() to re-enable the sensor
> > interrupt generation. What do you think?
> 
> Reenabling it in the thread should work as well.  It is a heavy weight solution
> but it is what you are expected to do in cases like this. 
> 
> I'd be very surprised if that doesn't work.  The normal operation of edge
> interrupt handlers is to reenable in the thread after we are sure we have
> cleared the condition for the original interrupt.  That will take long enough
> (as bus transaction involved) that the interrupt will definitely have dropped
> for long enough to be detected.
agree it should work
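
To make the ordering concrete, here is a rough userspace sketch (not driver
code) of the disable / drain / re-enable sequence discussed above. Everything
in it is hypothetical: struct fake_dev, update_line(), set_irq_enabled(),
push_samples() and handle_irq() are illustration names, and the model assumes
the pin is forced low whenever sensor-side interrupt generation is disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a sensor with a watermark interrupt. */
struct fake_dev {
	int level;        /* samples currently stored in the FIFO */
	int watermark;    /* pin is asserted when level >= watermark */
	bool irq_enabled; /* sensor-side interrupt generation on/off */
	bool line;        /* current state of the interrupt pin */
	int edges;        /* rising edges the host would have latched */
};

/* Recompute the pin state and count a rising edge on low -> high. */
void update_line(struct fake_dev *d)
{
	bool now = d->irq_enabled && d->level >= d->watermark;

	if (now && !d->line)
		d->edges++;
	d->line = now;
}

void set_irq_enabled(struct fake_dev *d, bool on)
{
	d->irq_enabled = on;
	update_line(d);
}

void push_samples(struct fake_dev *d, int n)
{
	d->level += n;
	update_line(d);
}

/*
 * Handler body: disable generation at the sensor, drain, then re-enable.
 * late_samples models data that arrives between the drain and the re-enable.
 * Returns the number of samples drained.
 */
int handle_irq(struct fake_dev *d, int late_samples)
{
	int drained;

	assert(d != NULL);
	set_irq_enabled(d, false);      /* pin stays low for the whole drain */
	drained = d->level;
	d->level = 0;
	push_samples(d, late_samples);  /* no edge: generation is disabled */
	set_irq_enabled(d, true);       /* fresh edge if the condition holds */
	return drained;
}
```

In this model, samples that cross the watermark while generation is disabled
still produce a new rising edge at re-enable time, so nothing is left stuck
behind a line that never dropped.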
> 
> In some similar cases we've just concluded the right option is to not
> support edge interrupts. Do we know if we have boards out there that are using
> it in that mode and is there any chance they would support level interrupts
> as that's going to be a lot simpler and more reliable for this?
I do not know; I just received a report about the issue from the stm folks.
I am fine with dropping support for edge interrupts, but do we have a similar
issue for the st sensors (acc, magn, gyro) as well? Please consider:
https://elixir.bootlin.com/linux/latest/source/drivers/iio/common/st_sensors/st_sensors_trigger.c#L113
Regards,
Lorenzo
> 
> Jonathan
> > 
> > Regards,
> > Lorenzo
> > 
> > > 
> > > Jonathan
> > > 
> > > 
> > > 
> > >   
> > > > 
> > > > Regards,
> > > > Lorenzo
> > > >   
> > > > >     
> > > > > >     
> > > > > > > 
> > > > > > > 
> > > > > > > Hmm. Having had a look at one of the datasheets, I'm far from convinced these
> > > > > > > parts truly support edge interrupts.  I can't see anything about minimum
> > > > > > > off periods etc that you need for true edge interrupts. Otherwise they are
> > > > > > > going to be prone to races.      
> > > > > > 
> > > > > > @mario, denis, armando: any pointer for this?
> > > > > >     
> > > > > > > 
> > > > > > > So I think the following can happen.
> > > > > > > 
> > > > > > > A) We drain the fifo and it stays under the limit. Hence once that
> > > > > > >    is crossed in future we will interrupt as normal.
> > > > > > > 
> > > > > > > B) We drain the fifo but it either has a very low watermark, or is
> > > > > > >    filling very fast.   We manage to drain enough to get the interrupt
> > > > > > >    to fire again, so all is fine if less than ideal.  With your loop we
> > > > > > >    may end up entering the interrupt handler when we don't actually need to.
> > > > > > >    If you want to avoid that you would need to disable the interrupt,
> > > > > > >    then drain the fifo and finally do a dance to successfully reenable
> > > > > > >    the interrupt, whilst ensuring no chance of missing by checking it
> > > > > > >    should not have fired (still below the threshold)
> > > > > > > 
> > > > > > > C) We try to drain the fifo, but it is actually filling fast enough that
> > > > > > >    we never get it under the limit, so no interrupt ever fires.
> > > > > > >    With new code, we'll keep spinning to 0 so might eventually drain it.
> > > > > > >    That needs a timeout so we just give up eventually.
> > > > > > > 
> > > > > > > D) watermark is one sample, we drain low enough to successfully get down
> > > > > > >    to zero at the moment of the read, but very very soon after that we get
> > > > > > >    one sample again. There is a window in which the interrupt line dropped
> > > > > > >    but analogue electronics etc being what they are, it may not have been
> > > > > > >    detectable.  Hence we miss an interrupt...  What you are doing is reducing
> > > > > > >    the chance of hitting this.  It is nasty, but you might be able to ensure
> > > > > > >    a reasonable period by widening this window.  Limit the watermark to 2
> > > > > > >    samples?  
> > > > > > > 
> > > > > > > Also needs a fixes tag :)      
> > > > > > 
> > > > > > ack, I will add them in v2
> > > > > > 
> > > > > > Regards,
> > > > > > Lorenzo    
> > > > > > >       
* Re: [PATCH] iio: imu: st_lsm6dsx: fix edge-trigger interrupts
  2020-11-14 16:48               ` Lorenzo Bianconi
@ 2020-11-14 17:31                 ` Jonathan Cameron
  2020-11-14 17:58                   ` Lorenzo Bianconi
  0 siblings, 1 reply; 11+ messages in thread
From: Jonathan Cameron @ 2020-11-14 17:31 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Lorenzo Bianconi, Jonathan Cameron, linux-iio, mario.tesi,
	denis.ciocca, armando.visconti
On Sat, 14 Nov 2020 17:48:40 +0100
Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> > On Sun, 8 Nov 2020 19:27:28 +0100
> > Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> >   
> [...]
> > [...]
> 
> in the last case, even if we introduce a little bit of burstiness, I guess it
> works because we read both 1 and 2, right?
We should always be fine, because the extra check must take a bit of time. Either
the event happens after that time (in which case the interrupt will have been low
long enough) or it doesn't and we will catch it.
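
As a rough illustration of that argument, here is a userspace sketch (not
driver code) of the single-read and re-check behaviours. struct fake_fifo,
fake_read_fifo(), read_once() and drain_fifo() are hypothetical names. The
model assumes one extra sample lands while each non-empty read is in flight,
and the max_reads cap is an assumption added here, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the sensor FIFO. */
struct fake_fifo {
	int pending;   /* samples currently stored in the FIFO */
	int in_flight; /* samples that will still arrive during later reads */
};

/*
 * Model of one read_fifo() call: hand back everything stored right now.
 * A non-empty read takes time, so one more sample may land before the
 * caller looks again.
 */
int fake_read_fifo(struct fake_fifo *f)
{
	int len = f->pending;

	f->pending = 0;
	if (len > 0 && f->in_flight > 0) {
		f->in_flight--;
		f->pending = 1;
	}
	return len;
}

/* Old behaviour: a single read per interrupt. */
int read_once(struct fake_fifo *f)
{
	return fake_read_fifo(f);
}

/*
 * Patched behaviour: keep reading until a read returns nothing, so the
 * FIFO is empty when the handler returns. max_reads bounds the loop in
 * case the FIFO fills faster than it can be drained.
 */
int drain_fifo(struct fake_fifo *f, int max_reads)
{
	int total = 0, len;

	assert(f != NULL && max_reads > 0);
	do {
		len = fake_read_fifo(f);
		total += len;
	} while (len > 0 && --max_reads > 0);

	return total;
}
```

With a single read, the late sample stays in the FIFO and the line never
drops. With the re-check, the second pass picks it up and the third pass
confirms the FIFO is empty.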
> 
> > 
> >   
> > > 
> > > @denis, mario, armando: can you please confirm the hw does not support pulsed
> > > interrupts for fifo-watermark?
> > > 
> > > If not one possible approach would be to disable the interrupt generation on
> > > the sensor at the beginning of st_lsm6dsx_handler_thread() and schedule a
> > > workqueue at the end of st_lsm6dsx_handler_thread() to re-enable the sensor
> > > interrupt generation. What do you think?  
> > 
> > Reenabling it in the thread should work as well.  It is a heavy weight solution
> > but it is what you are expected to do in cases like this. 
> > 
> > I'd be very surprised if that doesn't work.  The normal operation of edge
> > interrupt handlers is to reenable in the thread after we are sure we have
> > cleared the condition for the original interrupt.  That will take long enough
> > (as bus transaction involved) that the interrupt will definitely have dropped
> > for long enough to be detected.  
> 
> agree it should work
> 
> > 
> > In some similar cases we've just concluded the right option is to not
> > support edge interrupts. Do we know if we have boards out there that are using
> > it in that mode and is there any chance they would support level interrupts
> > as that's going to be a lot simpler and more reliable for this?  
> 
> I do not know about it, I just received a report about the issue from stm folks.
> I am fine to drop support for edge interrupts but do we have a similar issue for
> st sensors (acc, magn, gyro) as well? Please consider:
> https://elixir.bootlin.com/linux/latest/source/drivers/iio/common/st_sensors/st_sensors_trigger.c#L113
I hit this issue years ago on a part that is now supported by that driver.
As a side note, there is a bug in there, albeit one we probably can't hit.
stat_drdy has to be defined; if it is not, the while loop gets a negative
value back (which evaluates as true) and loops forever.
https://elixir.bootlin.com/linux/latest/source/drivers/iio/common/st_sensors/st_sensors_trigger.c#L36
It probably wants to return 0 but print an error message.  Whilst there, it
would be even better if that function just returned a boolean, so we can't
accidentally reintroduce such a bug in future.
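
A minimal sketch of the difference, in userspace C with made-up names (struct
fake_sensor, samples_available_int(), samples_available() and poll_samples()
are hypothetical and are not the st_sensors code). It only shows why a negative
error code is dangerous in a loop condition and how a boolean return avoids it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified sensor state. */
struct fake_sensor {
	unsigned int stat_drdy_addr; /* 0 means "status register not defined" */
	int samples_left;            /* samples the fake device still reports */
};

/* Problematic shape: an int mixing "error" (< 0) with "yes" / "no". */
int samples_available_int(const struct fake_sensor *s)
{
	if (!s->stat_drdy_addr)
		return -EINVAL; /* non-zero, so "true" in a while () condition */
	return s->samples_left > 0;
}

/* Suggested shape: report the problem, but answer a plain "no". */
bool samples_available(const struct fake_sensor *s)
{
	if (!s->stat_drdy_addr) {
		fprintf(stderr, "status register not defined\n");
		return false;
	}
	return s->samples_left > 0;
}

/* Poll loop as a caller would write it; returns the samples consumed. */
int poll_samples(struct fake_sensor *s)
{
	int n = 0;

	assert(s != NULL);
	while (samples_available(s)) {
		s->samples_left--;
		n++;
	}
	return n;
}
```

With the int variant, the same loop would never terminate for a sensor whose
status register is undefined, because the error value never becomes 0.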
Let's go with your fix, but perhaps we should add a note to the dt binding to
say that level interrupts are preferred?  Saving a check or two in the common
case is definitely beneficial if the host supports level interrupts.
If you can do a v3 with an updated explanation and comments, that would be great.
Thanks,
Jonathan
* Re: [PATCH] iio: imu: st_lsm6dsx: fix edge-trigger interrupts
  2020-11-14 17:31                 ` Jonathan Cameron
@ 2020-11-14 17:58                   ` Lorenzo Bianconi
  0 siblings, 0 replies; 11+ messages in thread
From: Lorenzo Bianconi @ 2020-11-14 17:58 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Lorenzo Bianconi, Jonathan Cameron, linux-iio, mario.tesi,
	denis.ciocca, armando.visconti
> On Sat, 14 Nov 2020 17:48:40 +0100
> Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> 
> > > On Sun, 8 Nov 2020 19:27:28 +0100
> > > Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> > >   
> > [...]
> > > [...]
> > 
> > in the last case, even if we introduce a little bit of burstiness, I guess it
> > works because we read both 1 and 2, right?
> 
> We should always be fine, because the extra check must take a bit of time. Either
> the event happens after that time (in which case the interrupt will have been low
> long enough) or it doesn't and we will catch it.
> 
> > 
> > > 
> > >   
> > > > 
[...]
> > I do not know about it, I just received a report about the issue from stm folks.
> > I am fine to drop support for edge interrupts but do we have a similar issue for
> > st sensors (acc, magn, gyro) as well? Please consider:
> > https://elixir.bootlin.com/linux/latest/source/drivers/iio/common/st_sensors/st_sensors_trigger.c#L113
> 
> It was a part now supported by that driver that I hit this issue on
> years ago.  As a side note, there is a bug in there though, be it one we
> probably can't hit?  stat_drdy has to be defined, if not the while loop will get
> a negative back (which is true) and loop for ever.  
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/iio/common/st_sensors/st_sensors_trigger.c#L36
> Probably want's to return 0 but print an error message.  Whilst there even better
> if that function just returns a boolean so we cant accidentally put such a bug
> back in again in future.
ack, I agree. I can post a fix but I have no device for testing.
> 
> Lets go with your fix, but perhaps we should add a note to the dt binding to
> say level interrupts preferred?  Saving a check or two in the common case is
> definitely beneficial if the host supports level interrupts.
> 
> If you can do a v3 with updated explanation and comments that would be great.
sure, I will add some comments to v2 and post v3.
Regards,
Lorenzo
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > > > > > > > > > ---
> > > > > > > > > >  drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c | 33 +++++++++++++++-----
> > > > > > > > > >  1 file changed, 25 insertions(+), 8 deletions(-)
> > > > > > > > > > 
> > > > > > > > > > diff --git a/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c b/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c
> > > > > > > > > > index 5e584c6026f1..d43b08ceec01 100644
> > > > > > > > > > --- a/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c
> > > > > > > > > > +++ b/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c
> > > > > > > > > > @@ -2457,22 +2457,36 @@ st_lsm6dsx_report_motion_event(struct st_lsm6dsx_hw *hw)
> > > > > > > > > >  	return data & event_settings->wakeup_src_status_mask;
> > > > > > > > > >  }
> > > > > > > > > >  
> > > > > > > > > > +static irqreturn_t st_lsm6dsx_handler_irq(int irq, void *private)
> > > > > > > > > > +{
> > > > > > > > > > +	return IRQ_WAKE_THREAD;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > >  static irqreturn_t st_lsm6dsx_handler_thread(int irq, void *private)
> > > > > > > > > >  {
> > > > > > > > > >  	struct st_lsm6dsx_hw *hw = private;
> > > > > > > > > > +	int fifo_len = 0, len = 0;
> > > > > > > > > >  	bool event;
> > > > > > > > > > -	int count;
> > > > > > > > > >  
> > > > > > > > > >  	event = st_lsm6dsx_report_motion_event(hw);
> > > > > > > > > >  
> > > > > > > > > >  	if (!hw->settings->fifo_ops.read_fifo)
> > > > > > > > > >  		return event ? IRQ_HANDLED : IRQ_NONE;
> > > > > > > > > >  
> > > > > > > > > > -	mutex_lock(&hw->fifo_lock);
> > > > > > > > > > -	count = hw->settings->fifo_ops.read_fifo(hw);
> > > > > > > > > > -	mutex_unlock(&hw->fifo_lock);
> > > > > > > > > > +	/*
> > > > > > > > > > +	 * If we are using edge IRQs, new samples can arrive while
> > > > > > > > > > +	 * processing current IRQ and those may be missed unless we
> > > > > > > > > > +	 * pick them here, so let's try read FIFO status again
> > > > > > > > > > +	 */
> > > > > > > > > > +	do {
> > > > > > > > > > +		mutex_lock(&hw->fifo_lock);
> > > > > > > > > > +		len = hw->settings->fifo_ops.read_fifo(hw);
> > > > > > > > > > +		mutex_unlock(&hw->fifo_lock);
> > > > > > > > > > +
> > > > > > > > > > +		fifo_len += len;
> > > > > > > > > > +	} while (len > 0);
> > > > > > > > > >  
> > > > > > > > > > -	return count || event ? IRQ_HANDLED : IRQ_NONE;
> > > > > > > > > > +	return fifo_len || event ? IRQ_HANDLED : IRQ_NONE;
> > > > > > > > > >  }
> > > > > > > > > >  
> > > > > > > > > >  static int st_lsm6dsx_irq_setup(struct st_lsm6dsx_hw *hw)
> > > > > > > > > > @@ -2488,10 +2502,14 @@ static int st_lsm6dsx_irq_setup(struct st_lsm6dsx_hw *hw)
> > > > > > > > > >  
> > > > > > > > > >  	switch (irq_type) {
> > > > > > > > > >  	case IRQF_TRIGGER_HIGH:
> > > > > > > > > > +		irq_type |= IRQF_ONESHOT;
> > > > > > > > > > +		fallthrough;
> > > > > > > > > >  	case IRQF_TRIGGER_RISING:
> > > > > > > > > >  		irq_active_low = false;
> > > > > > > > > >  		break;
> > > > > > > > > >  	case IRQF_TRIGGER_LOW:
> > > > > > > > > > +		irq_type |= IRQF_ONESHOT;
> > > > > > > > > > +		fallthrough;
> > > > > > > > > >  	case IRQF_TRIGGER_FALLING:
> > > > > > > > > >  		irq_active_low = true;
> > > > > > > > > >  		break;
> > > > > > > > > > @@ -2520,10 +2538,9 @@ static int st_lsm6dsx_irq_setup(struct st_lsm6dsx_hw *hw)
> > > > > > > > > >  	}
> > > > > > > > > >  
> > > > > > > > > >  	err = devm_request_threaded_irq(hw->dev, hw->irq,
> > > > > > > > > > -					NULL,
> > > > > > > > > > +					st_lsm6dsx_handler_irq,
> > > > > > > > > >  					st_lsm6dsx_handler_thread,
> > > > > > > > > > -					irq_type | IRQF_ONESHOT,
> > > > > > > > > > -					"lsm6dsx", hw);
> > > > > > > > > > +					irq_type, "lsm6dsx", hw);
> > > > > > > > > >  	if (err) {
> > > > > > > > > >  		dev_err(hw->dev, "failed to request trigger irq %d\n",
> > > > > > > > > >  			hw->irq);        
> > > > > > > > >         
> > > > > > > >       
> > > > > > >       
> > > > >     
> > >   
> 
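Scenario C in the quoted review asks for the drain loop to give up
eventually rather than spin while the FIFO keeps refilling. A minimal
user-space sketch of that idea follows. It is not the driver's code:
struct fake_fifo, fake_read_fifo(), drain_bounded() and the pass limit are
hypothetical names made up for illustration, and a fixed number of passes
stands in for whatever timeout the driver would actually use.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the sensor FIFO (not a driver structure).
 * It holds `level` samples; `refill` new samples arrive during each read
 * pass for the next `refill_passes` passes.
 */
struct fake_fifo {
	int level;
	int refill;
	int refill_passes;
};

/* Hypothetical read_fifo(): returns the number of samples read this pass. */
static int fake_read_fifo(struct fake_fifo *f)
{
	int n = f->level;

	if (f->refill_passes > 0) {
		/* Samples that arrived while this pass was being read. */
		f->level = f->refill;
		f->refill_passes--;
	} else {
		f->level = 0;
	}

	return n;
}

/*
 * Drain until a pass reads nothing (a real driver would also stop on a
 * negative error code), but never run more than max_passes passes.
 * Returns true if the FIFO was seen empty, false if we gave up.
 */
static bool drain_bounded(struct fake_fifo *f, int max_passes, int *total)
{
	int len, pass;

	*total = 0;
	for (pass = 0; pass < max_passes; pass++) {
		len = fake_read_fifo(f);
		if (len <= 0)
			return true;
		*total += len;
	}

	return false;
}
```

With a FIFO that stops refilling after a couple of passes the loop ends on
an empty read; with one that refills on every pass it stops at the pass
limit and reports that it gave up, which is the point where a driver could
log the condition and return instead of spinning.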
end of thread, other threads:[~2020-11-14 17:58 UTC | newest]
Thread overview: 11+ messages
2020-10-22  9:26 [PATCH] iio: imu: st_lsm6dsx: fix edge-trigger interrupts Lorenzo Bianconi
2020-11-01 16:33 ` Jonathan Cameron
2020-11-02 10:15   ` Lorenzo Bianconi
2020-11-02 17:44     ` Jonathan Cameron
2020-11-02 18:18       ` Lorenzo Bianconi
2020-11-08 16:49         ` Jonathan Cameron
2020-11-08 18:27           ` Lorenzo Bianconi
2020-11-14 15:06             ` Jonathan Cameron
2020-11-14 16:48               ` Lorenzo Bianconi
2020-11-14 17:31                 ` Jonathan Cameron
2020-11-14 17:58                   ` Lorenzo Bianconi