* [PATCH 1/2] mrf24j40: Avoid transmission while receiving a frame @ 2013-05-09 15:19 David Hauweele 2013-05-09 15:19 ` [PATCH 2/2] mrf24j40: Keep the interrupt line enabled David Hauweele 2013-05-14 3:22 ` [Linux-zigbee-devel] [PATCH 1/2] mrf24j40: Avoid transmission while receiving a frame Alan Ott 0 siblings, 2 replies; 8+ messages in thread From: David Hauweele @ 2013-05-09 15:19 UTC (permalink / raw) To: Alexander Smirnov, Dmitry Eremin-Solenikov, linux-zigbee-devel Cc: netdev, linux-kernel, David Hauweele The transceiver may fail under heavy traffic when a frame is transmitted while receiving another frame. This patch uses a mutex to separate the transmission and reception of frames along with a secondary working queue to avoid a deadlock while waiting for the transmission interrupt. Signed-off-by: David Hauweele <david@hauweele.net> --- drivers/net/ieee802154/mrf24j40.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/drivers/net/ieee802154/mrf24j40.c b/drivers/net/ieee802154/mrf24j40.c index ede3ce4..1e3ddf3 100644 --- a/drivers/net/ieee802154/mrf24j40.c +++ b/drivers/net/ieee802154/mrf24j40.c @@ -82,8 +82,10 @@ struct mrf24j40 { struct ieee802154_dev *dev; struct mutex buffer_mutex; /* only used to protect buf */ + struct mutex txrx_mutex; /* avoid transmission while receiving a frame */ struct completion tx_complete; struct work_struct irqwork; + struct work_struct rxwork; u8 *buf; /* 3 bytes. Used for SPI single-register transfers. */ }; @@ -353,6 +355,8 @@ static int mrf24j40_tx(struct ieee802154_dev *dev, struct sk_buff *skb) /* Set TXNACKREQ if the ACK bit is set in the packet. */ if (skb->data[0] & IEEE802154_FC_ACK_REQ) val |= 0x4; + + mutex_lock(&devrec->txrx_mutex); write_short_reg(devrec, REG_TXNCON, val); INIT_COMPLETION(devrec->tx_complete); @@ -361,6 +365,9 @@ static int mrf24j40_tx(struct ieee802154_dev *dev, struct sk_buff *skb) ret = wait_for_completion_interruptible_timeout( &devrec->tx_complete, 5 * HZ); + + mutex_unlock(&devrec->txrx_mutex); + if (ret == -ERESTARTSYS) goto err; if (ret == 0) { @@ -535,6 +542,8 @@ static int mrf24j40_handle_rx(struct mrf24j40 *devrec) int ret = 0; struct sk_buff *skb; + mutex_lock(&devrec->txrx_mutex); + /* Turn off reception of packets off the air. This prevents the * device from overwriting the buffer while we're reading it. */ ret = read_short_reg(devrec, REG_BBREG1, &val); @@ -575,6 +584,8 @@ out: val &= ~0x4; /* Clear RXDECINV */ write_short_reg(devrec, REG_BBREG1, val); + mutex_unlock(&devrec->txrx_mutex); + return ret; } @@ -616,12 +627,19 @@ static void mrf24j40_isrwork(struct work_struct *work) /* Check for Rx */ if (intstat & 0x8) - mrf24j40_handle_rx(devrec); + schedule_work(&devrec->rxwork); out: enable_irq(devrec->spi->irq); } +static void mrf24j40_rxwork(struct work_struct *work) +{ + struct mrf24j40 *devrec = container_of(work, struct mrf24j40, rxwork); + + mrf24j40_handle_rx(devrec); +} + static int mrf24j40_probe(struct spi_device *spi) { int ret = -ENOMEM; @@ -648,8 +666,10 @@ static int mrf24j40_probe(struct spi_device *spi) spi->max_speed_hz = MAX_SPI_SPEED_HZ; mutex_init(&devrec->buffer_mutex); + mutex_init(&devrec->txrx_mutex); init_completion(&devrec->tx_complete); INIT_WORK(&devrec->irqwork, mrf24j40_isrwork); + INIT_WORK(&devrec->rxwork, mrf24j40_rxwork); devrec->spi = spi; spi_set_drvdata(spi, devrec); -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/2] mrf24j40: Keep the interrupt line enabled 2013-05-09 15:19 [PATCH 1/2] mrf24j40: Avoid transmission while receiving a frame David Hauweele @ 2013-05-09 15:19 ` David Hauweele 2013-05-14 3:55 ` [Linux-zigbee-devel] " Alan Ott 2013-05-14 3:22 ` [Linux-zigbee-devel] [PATCH 1/2] mrf24j40: Avoid transmission while receiving a frame Alan Ott 1 sibling, 1 reply; 8+ messages in thread From: David Hauweele @ 2013-05-09 15:19 UTC (permalink / raw) To: Alexander Smirnov, Dmitry Eremin-Solenikov, linux-zigbee-devel Cc: netdev, linux-kernel, David Hauweele Disabling the interrupt line could miss an IRQ and leave the line into a low state hence locking the driver. Signed-off-by: David Hauweele <david@hauweele.net> --- drivers/net/ieee802154/mrf24j40.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/drivers/net/ieee802154/mrf24j40.c b/drivers/net/ieee802154/mrf24j40.c index 1e3ddf3..6ef32f7 100644 --- a/drivers/net/ieee802154/mrf24j40.c +++ b/drivers/net/ieee802154/mrf24j40.c @@ -603,8 +603,6 @@ static irqreturn_t mrf24j40_isr(int irq, void *data) { struct mrf24j40 *devrec = data; - disable_irq_nosync(irq); - schedule_work(&devrec->irqwork); return IRQ_HANDLED; @@ -619,7 +617,7 @@ static void mrf24j40_isrwork(struct work_struct *work) /* Read the interrupt status */ ret = read_short_reg(devrec, REG_INTSTAT, &intstat); if (ret) - goto out; + return; /* Check for TX complete */ if (intstat & 0x1) @@ -628,9 +626,6 @@ static void mrf24j40_isrwork(struct work_struct *work) /* Check for Rx */ if (intstat & 0x8) schedule_work(&devrec->rxwork); - -out: - enable_irq(devrec->spi->irq); } static void mrf24j40_rxwork(struct work_struct *work) -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [Linux-zigbee-devel] [PATCH 2/2] mrf24j40: Keep the interrupt line enabled 2013-05-09 15:19 ` [PATCH 2/2] mrf24j40: Keep the interrupt line enabled David Hauweele @ 2013-05-14 3:55 ` Alan Ott 2013-05-16 21:34 ` David Hauweele 0 siblings, 1 reply; 8+ messages in thread From: Alan Ott @ 2013-05-14 3:55 UTC (permalink / raw) To: David Hauweele Cc: Alexander Smirnov, Dmitry Eremin-Solenikov, linux-zigbee-devel, netdev, linux-kernel On 5/9/13 11:19 AM, David Hauweele wrote: > Disabling the interrupt line could miss an IRQ and leave the line into a > low state hence locking the driver. > Have you observed this? My understanding is that the interrupt won't be lost but instead delayed until enable_irq() is called. I got this pattern from the other 802.15.4 drivers. Perhaps my understanding is wrong. > Signed-off-by: David Hauweele <david@hauweele.net> > --- > drivers/net/ieee802154/mrf24j40.c | 7 +------ > 1 file changed, 1 insertion(+), 6 deletions(-) > > diff --git a/drivers/net/ieee802154/mrf24j40.c b/drivers/net/ieee802154/mrf24j40.c > index 1e3ddf3..6ef32f7 100644 > --- a/drivers/net/ieee802154/mrf24j40.c > +++ b/drivers/net/ieee802154/mrf24j40.c > @@ -603,8 +603,6 @@ static irqreturn_t mrf24j40_isr(int irq, void *data) > { > struct mrf24j40 *devrec = data; > > - disable_irq_nosync(irq); > - > schedule_work(&devrec->irqwork); > > return IRQ_HANDLED; > @@ -619,7 +617,7 @@ static void mrf24j40_isrwork(struct work_struct *work) > /* Read the interrupt status */ > ret = read_short_reg(devrec, REG_INTSTAT, &intstat); > if (ret) > - goto out; > + return; > > /* Check for TX complete */ > if (intstat & 0x1) > @@ -628,9 +626,6 @@ static void mrf24j40_isrwork(struct work_struct *work) > /* Check for Rx */ > if (intstat & 0x8) > schedule_work(&devrec->rxwork); > - > -out: > - enable_irq(devrec->spi->irq); > } > > static void mrf24j40_rxwork(struct work_struct *work) > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Linux-zigbee-devel] [PATCH 2/2] mrf24j40: Keep the interrupt line enabled 2013-05-14 3:55 ` [Linux-zigbee-devel] " Alan Ott @ 2013-05-16 21:34 ` David Hauweele 2013-05-19 23:04 ` Alan Ott 0 siblings, 1 reply; 8+ messages in thread From: David Hauweele @ 2013-05-16 21:34 UTC (permalink / raw) To: Alan Ott Cc: Alexander Smirnov, Dmitry Eremin-Solenikov, linux-zigbee-devel, netdev, linux-kernel I have seen the interrupt line going low forever with heavy traffic. The at86rf230 driver has two isr, one for edge-triggered and another for level-triggered interrupt. The latter uses disable_irq_nosync/enable_irq which makes sense for level-triggered interrupt. Otherwise the interrupt would be triggered again and again until the scheduled work read the status register. However I don't see the use of disable_irq/enable_irq with an edge-triggered interrupt. Perhaps I'm missing something. Though I don't see any harm using it anyway. As you said the interrupt should be delayed until enable_irq() is called. In particular when CONFIG_HARDIRQS_SW_RESEND is set. David 2013/5/14 Alan Ott <alan@signal11.us>: > On 5/9/13 11:19 AM, David Hauweele wrote: >> >> Disabling the interrupt line could miss an IRQ and leave the line into a >> low state hence locking the driver. >> > > Have you observed this? My understanding is that the interrupt won't be lost > but instead delayed until enable_irq() is called. > > I got this pattern from the other 802.15.4 drivers. Perhaps my understanding > is wrong. > > > >> Signed-off-by: David Hauweele <david@hauweele.net> >> --- >> drivers/net/ieee802154/mrf24j40.c | 7 +------ >> 1 file changed, 1 insertion(+), 6 deletions(-) >> >> diff --git a/drivers/net/ieee802154/mrf24j40.c >> b/drivers/net/ieee802154/mrf24j40.c >> index 1e3ddf3..6ef32f7 100644 >> --- a/drivers/net/ieee802154/mrf24j40.c >> +++ b/drivers/net/ieee802154/mrf24j40.c >> @@ -603,8 +603,6 @@ static irqreturn_t mrf24j40_isr(int irq, void *data) >> { >> struct mrf24j40 *devrec = data; >> >> - disable_irq_nosync(irq); >> - >> schedule_work(&devrec->irqwork); >> >> return IRQ_HANDLED; >> @@ -619,7 +617,7 @@ static void mrf24j40_isrwork(struct work_struct *work) >> /* Read the interrupt status */ >> ret = read_short_reg(devrec, REG_INTSTAT, &intstat); >> if (ret) >> - goto out; >> + return; >> >> /* Check for TX complete */ >> if (intstat & 0x1) >> @@ -628,9 +626,6 @@ static void mrf24j40_isrwork(struct work_struct *work) >> /* Check for Rx */ >> if (intstat & 0x8) >> schedule_work(&devrec->rxwork); >> - >> -out: >> - enable_irq(devrec->spi->irq); >> } >> >> static void mrf24j40_rxwork(struct work_struct *work) >> > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Linux-zigbee-devel] [PATCH 2/2] mrf24j40: Keep the interrupt line enabled 2013-05-16 21:34 ` David Hauweele @ 2013-05-19 23:04 ` Alan Ott 2013-05-21 16:17 ` David Hauweele 0 siblings, 1 reply; 8+ messages in thread From: Alan Ott @ 2013-05-19 23:04 UTC (permalink / raw) To: David Hauweele Cc: Alexander Smirnov, Dmitry Eremin-Solenikov, linux-zigbee-devel, netdev, linux-kernel On 05/16/2013 05:34 PM, David Hauweele wrote: > I have seen the interrupt line going low forever with heavy traffic. Hi David, I've been doing some testing today and I can reproduce your issue if I ping -f both ways simultaneously (after about 3-4 minutes usually). I think it has more to do with the tx/rx mutual exclusion that you described in patch 1/1 than it does with the disable/enable IRQ. With respect to enable/disable irq, I did some checking of other similar drivers (non-ieee802154) and noticed that many of them use threaded interrupts. I think this may be closer to the right way to do it. Are you running a tickless kernel? What is your preemption model? What's your hardware platform? > The at86rf230 driver has two isr, one for edge-triggered and another > for level-triggered interrupt. The latter uses > disable_irq_nosync/enable_irq which makes sense for level-triggered > interrupt. Otherwise the interrupt would be triggered again and again > until the scheduled work read the status register. > > However I don't see the use of disable_irq/enable_irq with an > edge-triggered interrupt. Perhaps I'm missing something. Though I > don't see any harm using it anyway. My understanding based on the datasheet is that the interrupt can be asserted again as soon as INTSTAT is read (which is done in the workqueue). I guess even if it happens while the workqueue is running, it's ok. I think I had a flawed understanding of schedule_work() before, thinking that it would not schedule a work_struct it if the work_struct was running. > As you said the interrupt should > be delayed until enable_irq() is called. In particular when > CONFIG_HARDIRQS_SW_RESEND is set. I think I agree with you. I'll send you a patch to try with threaded interrupts to try. I'm trying to determine exactly what's the cause of the interrupt line being stuck low. Alan. > > 2013/5/14 Alan Ott <alan@signal11.us>: >> On 5/9/13 11:19 AM, David Hauweele wrote: >>> Disabling the interrupt line could miss an IRQ and leave the line into a >>> low state hence locking the driver. >>> >> Have you observed this? My understanding is that the interrupt won't be lost >> but instead delayed until enable_irq() is called. >> >> I got this pattern from the other 802.15.4 drivers. Perhaps my understanding >> is wrong. >> >> >> >>> Signed-off-by: David Hauweele <david@hauweele.net> >>> --- >>> drivers/net/ieee802154/mrf24j40.c | 7 +------ >>> 1 file changed, 1 insertion(+), 6 deletions(-) >>> >>> diff --git a/drivers/net/ieee802154/mrf24j40.c >>> b/drivers/net/ieee802154/mrf24j40.c >>> index 1e3ddf3..6ef32f7 100644 >>> --- a/drivers/net/ieee802154/mrf24j40.c >>> +++ b/drivers/net/ieee802154/mrf24j40.c >>> @@ -603,8 +603,6 @@ static irqreturn_t mrf24j40_isr(int irq, void *data) >>> { >>> struct mrf24j40 *devrec = data; >>> >>> - disable_irq_nosync(irq); >>> - >>> schedule_work(&devrec->irqwork); >>> >>> return IRQ_HANDLED; >>> @@ -619,7 +617,7 @@ static void mrf24j40_isrwork(struct work_struct *work) >>> /* Read the interrupt status */ >>> ret = read_short_reg(devrec, REG_INTSTAT, &intstat); >>> if (ret) >>> - goto out; >>> + return; >>> >>> /* Check for TX complete */ >>> if (intstat & 0x1) >>> @@ -628,9 +626,6 @@ static void mrf24j40_isrwork(struct work_struct *work) >>> /* Check for Rx */ >>> if (intstat & 0x8) >>> schedule_work(&devrec->rxwork); >>> - >>> -out: >>> - enable_irq(devrec->spi->irq); >>> } >>> >>> static void mrf24j40_rxwork(struct work_struct *work) >>> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Linux-zigbee-devel] [PATCH 2/2] mrf24j40: Keep the interrupt line enabled 2013-05-19 23:04 ` Alan Ott @ 2013-05-21 16:17 ` David Hauweele 2013-05-21 18:22 ` Alan Ott 0 siblings, 1 reply; 8+ messages in thread From: David Hauweele @ 2013-05-21 16:17 UTC (permalink / raw) To: Alan Ott Cc: Alexander Smirnov, Dmitry Eremin-Solenikov, linux-zigbee-devel, netdev, linux-kernel 2013/5/20 Alan Ott <alan@signal11.us>: > On 05/16/2013 05:34 PM, David Hauweele wrote: >> I have seen the interrupt line going low forever with heavy traffic. > > Hi David, > > I've been doing some testing today and I can reproduce your issue if I > ping -f both ways simultaneously (after about 3-4 minutes usually). I > think it has more to do with the tx/rx mutual exclusion that you > described in patch 1/1 than it does with the disable/enable IRQ. With > respect to enable/disable irq, I did some checking of other similar > drivers (non-ieee802154) and noticed that many of them use threaded > interrupts. I think this may be closer to the right way to do it. Hi Alan, Indeed, threaded interrupts would be more appropriate. What about spi_async ? We could avoid using any working queue with this call. Although the driver would need a major rewrite and I've no clue about the benefits of doing so. I wonder how much additional latency is introduced by the working queue, I will try to measure this today. > > Are you running a tickless kernel? What is your preemption model? What's > your hardware platform? This is a tickless kernel with CONFIG_PREEMPT on an RPi. > >> The at86rf230 driver has two isr, one for edge-triggered and another >> for level-triggered interrupt. The latter uses >> disable_irq_nosync/enable_irq which makes sense for level-triggered >> interrupt. Otherwise the interrupt would be triggered again and again >> until the scheduled work read the status register. >> >> However I don't see the use of disable_irq/enable_irq with an >> edge-triggered interrupt. Perhaps I'm missing something. Though I >> don't see any harm using it anyway. > > My understanding based on the datasheet is that the interrupt can be > asserted again as soon as INTSTAT is read (which is done in the > workqueue). I guess even if it happens while the workqueue is running, > it's ok. You're right, as soon as INTSTAT is read, the line should be driven high again to assert another interrupt. So it shouldn't pose any problem if the interrupts are replayed by the kernel. What may happen however is an error in the spi transfer. This would result in the line staying low since the register hasn't really been read. This is what I observed at the hardware level. If this is the case, then the problem should also occurs with and without disable/enable_irq. However this bug is a bit harder to reproduce as it may take several hours. The only solution I see would be to regularly poll the INTSTAT register for recovering. I know about other drivers which do this so I will look into this and work on a fix. > > I think I had a flawed understanding of schedule_work() before, thinking > that it would not schedule a work_struct it if the work_struct was running. > >> As you said the interrupt should >> be delayed until enable_irq() is called. In particular when >> CONFIG_HARDIRQS_SW_RESEND is set. > > I think I agree with you. I'll send you a patch to try with threaded > interrupts to try. > > I'm trying to determine exactly what's the cause of the interrupt line > being stuck low. > I tried your patch for INIT_COMPLETION. It doesn't fix the problem but I agree with you for the race condition. David >> >> 2013/5/14 Alan Ott <alan@signal11.us>: >>> On 5/9/13 11:19 AM, David Hauweele wrote: >>>> Disabling the interrupt line could miss an IRQ and leave the line into a >>>> low state hence locking the driver. >>>> >>> Have you observed this? My understanding is that the interrupt won't be lost >>> but instead delayed until enable_irq() is called. >>> >>> I got this pattern from the other 802.15.4 drivers. Perhaps my understanding >>> is wrong. >>> >>> >>> >>>> Signed-off-by: David Hauweele <david@hauweele.net> >>>> --- >>>> drivers/net/ieee802154/mrf24j40.c | 7 +------ >>>> 1 file changed, 1 insertion(+), 6 deletions(-) >>>> >>>> diff --git a/drivers/net/ieee802154/mrf24j40.c >>>> b/drivers/net/ieee802154/mrf24j40.c >>>> index 1e3ddf3..6ef32f7 100644 >>>> --- a/drivers/net/ieee802154/mrf24j40.c >>>> +++ b/drivers/net/ieee802154/mrf24j40.c >>>> @@ -603,8 +603,6 @@ static irqreturn_t mrf24j40_isr(int irq, void *data) >>>> { >>>> struct mrf24j40 *devrec = data; >>>> >>>> - disable_irq_nosync(irq); >>>> - >>>> schedule_work(&devrec->irqwork); >>>> >>>> return IRQ_HANDLED; >>>> @@ -619,7 +617,7 @@ static void mrf24j40_isrwork(struct work_struct *work) >>>> /* Read the interrupt status */ >>>> ret = read_short_reg(devrec, REG_INTSTAT, &intstat); >>>> if (ret) >>>> - goto out; >>>> + return; >>>> >>>> /* Check for TX complete */ >>>> if (intstat & 0x1) >>>> @@ -628,9 +626,6 @@ static void mrf24j40_isrwork(struct work_struct *work) >>>> /* Check for Rx */ >>>> if (intstat & 0x8) >>>> schedule_work(&devrec->rxwork); >>>> - >>>> -out: >>>> - enable_irq(devrec->spi->irq); >>>> } >>>> >>>> static void mrf24j40_rxwork(struct work_struct *work) >>>> > > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Linux-zigbee-devel] [PATCH 2/2] mrf24j40: Keep the interrupt line enabled 2013-05-21 16:17 ` David Hauweele @ 2013-05-21 18:22 ` Alan Ott 0 siblings, 0 replies; 8+ messages in thread From: Alan Ott @ 2013-05-21 18:22 UTC (permalink / raw) To: David Hauweele Cc: Alexander Smirnov, Dmitry Eremin-Solenikov, linux-zigbee-devel, netdev, linux-kernel On 05/21/2013 12:17 PM, David Hauweele wrote: > 2013/5/20 Alan Ott <alan@signal11.us>: >> On 05/16/2013 05:34 PM, David Hauweele wrote: >>> I have seen the interrupt line going low forever with heavy traffic. >> I've been doing some testing today and I can reproduce your issue if I >> ping -f both ways simultaneously (after about 3-4 minutes usually). I >> think it has more to do with the tx/rx mutual exclusion that you >> described in patch 1/1 than it does with the disable/enable IRQ. With >> respect to enable/disable irq, I did some checking of other similar >> drivers (non-ieee802154) and noticed that many of them use threaded >> interrupts. I think this may be closer to the right way to do it. > Indeed, threaded interrupts would be more appropriate. > What about spi_async ? We could avoid using any working queue with this call. > Although the driver would need a major rewrite and I've no clue about > the benefits of doing so. > I wonder how much additional latency is introduced by the working > queue, I will try to measure this today. Async would be really difficult since we need the values that are read in order to know what to write in later calls (like where we need to set one bit in a register). It's hard to imagine doing this without blocking somewhere, and we might as well use the blocking mechanisms that the kernel gives us. >> Are you running a tickless kernel? What is your preemption model? What's >> your hardware platform? > This is a tickless kernel with CONFIG_PREEMPT on an RPi. As another data point, can you try with voluntary preemption and see if the behavior changes? >>> The at86rf230 driver has two isr, one for edge-triggered and another >>> for level-triggered interrupt. The latter uses >>> disable_irq_nosync/enable_irq which makes sense for level-triggered >>> interrupt. Otherwise the interrupt would be triggered again and again >>> until the scheduled work read the status register. >>> >>> However I don't see the use of disable_irq/enable_irq with an >>> edge-triggered interrupt. Perhaps I'm missing something. Though I >>> don't see any harm using it anyway. >> My understanding based on the datasheet is that the interrupt can be >> asserted again as soon as INTSTAT is read (which is done in the >> workqueue). I guess even if it happens while the workqueue is running, >> it's ok. > You're right, as soon as INTSTAT is read, the line should be driven > high again to assert another interrupt. This is where I think I've found another race condition. In doing some testing with the mrf24j40 from PIC24 firmware last night, I think I've discovered that it's possible for the interrupt line to _not_ get reset (de-asserted). I think what may be causing it is if an interrupt occurs at the same time as (or right shortly after) INTSTAT is read. It's easy to imagine the interrupt line not being able to get all the way up before coming back down. I have a support email into Microchip asking specifically about this case. In my firmware, I'm able to see a case where the INTSTAT register shows non-zero (ie: an interrupt is pending), yet the interrupt flag is not set. It seems like it takes about 30 minutes for this to happen. On Linux, I think this is less likely, but still possible, since there is more latency between the hardware interrupt and when INTSTAT is read. The extra latency means there is more time for the second interrupt to happen, causing INTSTAT to simply show both interrupts, and have less likelihood of the second interrupt happening exactly as INTSTAT is read. > So it shouldn't pose any problem if the interrupts are replayed by the > kernel. What may happen however is > an error in the spi transfer. This would result in the line staying > low since the register hasn't really been read. > This is what I observed at the hardware level. To be clear, you've observed the line staying low, not an SPI transfer error, right? > If this is the case, then the problem should also occurs with and > without disable/enable_irq. However this bug > is a bit harder to reproduce as it may take several hours. The only > solution I see would be to regularly poll the > INTSTAT register for recovering. I know about other drivers which do > this so I will look into this and work on a fix. I'd like to avoid a straight-up poll if at all possible. Worst case, we can poll for interrupt in the case of something happening which we don't expect to ever happen, for example on something like the TX interrupt never showing up. It's harder of course to know when to poll for a system that's primarily a receiver. I've done something like this in my firmware: void mrf24j40_isr(void) { u8 intstat; int ret; /* Read the interrupt status */ ret = read_short_reg(devrec, REG_INTSTAT, &intstat); if (ret) return; do { /* Check for TX complete */ if (intstat & 0x1) tx_complete = 1; /* Check for Rx */ if (intstat & 0x8) mrf24j40_handle_rx(); /* Read the interrupt status */ ret = read_short_reg(devrec, REG_INTSTAT, &intstat); if (ret) return; if (intstat && !IFS1bits.INT1IF) { /* Put a break here. This should never happen but does */ Nop(); Nop(); Nop(); } } while (intstat); } Basically there's an extra reading of the INTSTAT every time there's an interrupt. One would think this would take care of the race condition, unless there's another.... The breakpoint in the comment is shows the error I describe. The second reading of INTSTAT shows an interrupt pending (in the case I observed, 0x8: RX), but INT1F (the interrupt flag for INT1, where I have the INT line of the mrf24j40 connected) does not get set. In the Linux driver (and in an ISR which doesn't do the extra check of INTSTAT, this causes the INT line to be stuck low (asserted) with no ISR ever getting called. We could try triggering on level (IRQF_TRIGGER_LOW) instead of edge. The mrf24j40 datasheet specifically says it gives an edge trigger, but the level sticks low until it INTSTAT is read, so we should be able to detect it as a level trigger. I'll do some testing on both of these cases. Alan. >> I think I had a flawed understanding of schedule_work() before, thinking >> that it would not schedule a work_struct it if the work_struct was running. >> >>> As you said the interrupt should >>> be delayed until enable_irq() is called. In particular when >>> CONFIG_HARDIRQS_SW_RESEND is set. >> I think I agree with you. I'll send you a patch to try with threaded >> interrupts to try. >> >> I'm trying to determine exactly what's the cause of the interrupt line >> being stuck low. >> > I tried your patch for INIT_COMPLETION. It doesn't fix the problem but > I agree with you for the race condition. > > David > >>> 2013/5/14 Alan Ott <alan@signal11.us>: >>>> On 5/9/13 11:19 AM, David Hauweele wrote: >>>>> Disabling the interrupt line could miss an IRQ and leave the line into a >>>>> low state hence locking the driver. >>>>> >>>> Have you observed this? My understanding is that the interrupt won't be lost >>>> but instead delayed until enable_irq() is called. >>>> >>>> I got this pattern from the other 802.15.4 drivers. Perhaps my understanding >>>> is wrong. >>>> >>>> >>>> >>>>> Signed-off-by: David Hauweele <david@hauweele.net> >>>>> --- >>>>> drivers/net/ieee802154/mrf24j40.c | 7 +------ >>>>> 1 file changed, 1 insertion(+), 6 deletions(-) >>>>> >>>>> diff --git a/drivers/net/ieee802154/mrf24j40.c >>>>> b/drivers/net/ieee802154/mrf24j40.c >>>>> index 1e3ddf3..6ef32f7 100644 >>>>> --- a/drivers/net/ieee802154/mrf24j40.c >>>>> +++ b/drivers/net/ieee802154/mrf24j40.c >>>>> @@ -603,8 +603,6 @@ static irqreturn_t mrf24j40_isr(int irq, void *data) >>>>> { >>>>> struct mrf24j40 *devrec = data; >>>>> >>>>> - disable_irq_nosync(irq); >>>>> - >>>>> schedule_work(&devrec->irqwork); >>>>> >>>>> return IRQ_HANDLED; >>>>> @@ -619,7 +617,7 @@ static void mrf24j40_isrwork(struct work_struct *work) >>>>> /* Read the interrupt status */ >>>>> ret = read_short_reg(devrec, REG_INTSTAT, &intstat); >>>>> if (ret) >>>>> - goto out; >>>>> + return; >>>>> >>>>> /* Check for TX complete */ >>>>> if (intstat & 0x1) >>>>> @@ -628,9 +626,6 @@ static void mrf24j40_isrwork(struct work_struct *work) >>>>> /* Check for Rx */ >>>>> if (intstat & 0x8) >>>>> schedule_work(&devrec->rxwork); >>>>> - >>>>> -out: >>>>> - enable_irq(devrec->spi->irq); >>>>> } >>>>> >>>>> static void mrf24j40_rxwork(struct work_struct *work) >>>>> >> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Linux-zigbee-devel] [PATCH 1/2] mrf24j40: Avoid transmission while receiving a frame 2013-05-09 15:19 [PATCH 1/2] mrf24j40: Avoid transmission while receiving a frame David Hauweele 2013-05-09 15:19 ` [PATCH 2/2] mrf24j40: Keep the interrupt line enabled David Hauweele @ 2013-05-14 3:22 ` Alan Ott 1 sibling, 0 replies; 8+ messages in thread From: Alan Ott @ 2013-05-14 3:22 UTC (permalink / raw) To: David Hauweele Cc: Alexander Smirnov, Dmitry Eremin-Solenikov, linux-zigbee-devel, netdev, linux-kernel On 5/9/13 11:19 AM, David Hauweele wrote: > The transceiver may fail under heavy traffic when a frame is transmitted > while receiving another frame. > > This patch uses a mutex to separate the transmission and reception of > frames along with a secondary working queue to avoid a deadlock while > waiting for the transmission interrupt. > Have you observed this failure? I have put this kind of thing in before (several times actually) but ultimately convinced myself each time that it was not necessary. I'm happy to be shown wrong. > Signed-off-by: David Hauweele <david@hauweele.net> > --- > drivers/net/ieee802154/mrf24j40.c | 22 +++++++++++++++++++++- > 1 file changed, 21 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ieee802154/mrf24j40.c b/drivers/net/ieee802154/mrf24j40.c > index ede3ce4..1e3ddf3 100644 > --- a/drivers/net/ieee802154/mrf24j40.c > +++ b/drivers/net/ieee802154/mrf24j40.c > @@ -82,8 +82,10 @@ struct mrf24j40 { > struct ieee802154_dev *dev; > > struct mutex buffer_mutex; /* only used to protect buf */ > + struct mutex txrx_mutex; /* avoid transmission while receiving a frame */ > struct completion tx_complete; > struct work_struct irqwork; > + struct work_struct rxwork; > u8 *buf; /* 3 bytes. Used for SPI single-register transfers. */ > }; > > @@ -353,6 +355,8 @@ static int mrf24j40_tx(struct ieee802154_dev *dev, struct sk_buff *skb) > /* Set TXNACKREQ if the ACK bit is set in the packet. */ > if (skb->data[0] & IEEE802154_FC_ACK_REQ) > val |= 0x4; > + > + mutex_lock(&devrec->txrx_mutex); > write_short_reg(devrec, REG_TXNCON, val); > > INIT_COMPLETION(devrec->tx_complete); > @@ -361,6 +365,9 @@ static int mrf24j40_tx(struct ieee802154_dev *dev, struct sk_buff *skb) > ret = wait_for_completion_interruptible_timeout( > &devrec->tx_complete, > 5 * HZ); > + > + mutex_unlock(&devrec->txrx_mutex); > + > if (ret == -ERESTARTSYS) > goto err; > if (ret == 0) { > @@ -535,6 +542,8 @@ static int mrf24j40_handle_rx(struct mrf24j40 *devrec) > int ret = 0; > struct sk_buff *skb; > > + mutex_lock(&devrec->txrx_mutex); > + > /* Turn off reception of packets off the air. This prevents the > * device from overwriting the buffer while we're reading it. */ > ret = read_short_reg(devrec, REG_BBREG1, &val); > @@ -575,6 +584,8 @@ out: > val &= ~0x4; /* Clear RXDECINV */ > write_short_reg(devrec, REG_BBREG1, val); > > + mutex_unlock(&devrec->txrx_mutex); > + > return ret; > } > > @@ -616,12 +627,19 @@ static void mrf24j40_isrwork(struct work_struct *work) > > /* Check for Rx */ > if (intstat & 0x8) > - mrf24j40_handle_rx(devrec); > + schedule_work(&devrec->rxwork); > mrf24f40_isrwork() is already called from a workqueue. > out: > enable_irq(devrec->spi->irq); > } > > +static void mrf24j40_rxwork(struct work_struct *work) > +{ > + struct mrf24j40 *devrec = container_of(work, struct mrf24j40, rxwork); > + > + mrf24j40_handle_rx(devrec); > +} > + > static int mrf24j40_probe(struct spi_device *spi) > { > int ret = -ENOMEM; > @@ -648,8 +666,10 @@ static int mrf24j40_probe(struct spi_device *spi) > spi->max_speed_hz = MAX_SPI_SPEED_HZ; > > mutex_init(&devrec->buffer_mutex); > + mutex_init(&devrec->txrx_mutex); > init_completion(&devrec->tx_complete); > INIT_WORK(&devrec->irqwork, mrf24j40_isrwork); > + INIT_WORK(&devrec->rxwork, mrf24j40_rxwork); > devrec->spi = spi; > spi_set_drvdata(spi, devrec); > > ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2013-05-21 18:22 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-05-09 15:19 [PATCH 1/2] mrf24j40: Avoid transmission while receiving a frame David Hauweele 2013-05-09 15:19 ` [PATCH 2/2] mrf24j40: Keep the interrupt line enabled David Hauweele 2013-05-14 3:55 ` [Linux-zigbee-devel] " Alan Ott 2013-05-16 21:34 ` David Hauweele 2013-05-19 23:04 ` Alan Ott 2013-05-21 16:17 ` David Hauweele 2013-05-21 18:22 ` Alan Ott 2013-05-14 3:22 ` [Linux-zigbee-devel] [PATCH 1/2] mrf24j40: Avoid transmission while receiving a frame Alan Ott
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox