* [ath9k-devel] Possible over driving AR9106, how to detect?
From: Daniel Smith @ 2011-09-07 14:59 UTC
To: ath9k-devel

Greetings,

I am running into a situation where I believe the RF front-end (AR9106)
is being overdriven. The configuration is a high-gain antenna with an
inline LNA attached to a SparkLAN WMIA-199NI. The interface is put into
monitor mode with the fcsfail flag set. The interface is brought up and,
after random periods of time, we stop receiving frames. In the past this
typically occurred at the same time as the DMA RX stop storm. After all
the fixes were put in place we still continued to see the issue. When
the lock-up occurs there is no sign of activity on the card (e.g. the
frame counts, interrupt counts, etc. in debugfs do not increase). The
lock-up can be cleared by simply changing the channel or cycling the
interface down and back up. One factor that seems to trigger it is
running in the lab, where the AP and station are in close proximity,
resulting in a signal strength of 70 dB.

I was wondering if there might be an interrupt that I could mask in, or
some other means to detect whether the radio is in fact being
overdriven.

Thanks in advance!

V/r,
Daniel Smith
* [ath9k-devel] Possible over driving AR9106, how to detect?
From: Adrian Chadd @ 2011-09-07 15:47 UTC
To: ath9k-devel

On 7 September 2011 22:59, Daniel Smith <viscous.liquid@gmail.com> wrote:
> Greetings,
>
> I am running into a situation where I believe the RF front-end (AR9106)
> is being overdriven. The configuration is a high-gain antenna with an
> inline LNA attached to a SparkLAN WMIA-199NI. The interface is put into
> monitor mode with the fcsfail flag set. The interface is brought up
> and, after random periods of time, we stop receiving frames. In the
> past this typically occurred at the same time as the DMA RX stop storm.
> After all the fixes were put in place we still continued to see the
> issue. When the lock-up occurs there is no sign of activity on the card
> (e.g. the frame counts, interrupt counts, etc. in debugfs do not
> increase). The lock-up can be cleared by simply changing the channel or
> cycling the interface down and back up. One factor that seems to
> trigger it is running in the lab, where the AP and station are in close
> proximity, resulting in a signal strength of 70 dB.
>
> I was wondering if there might be an interrupt that I could mask in, or
> some other means to detect whether the radio is in fact being
> overdriven.

If the DMA RX stop storm occurred then it meant the NIC thought it hit
the end of the RX descriptor list (whether it did or not) and it just
kept signalling it couldn't write packets anywhere.

I remember seeing this in FreeBSD, so I added some code to the RX
tasklet to forcibly reset the PCU receive and re-link all the RX
descriptors. It causes packet loss when it occurs (and it only occurs
when I'm thrashing the NIC with too much UDP traffic) but I bet it
could also occur if I enabled PHY errors (e.g. when doing radar
detection) in a very busy+noisy environment.

ath9k handles the RX descriptors a bit differently but when I tried
the same method in FreeBSD, it still ended up occasionally hitting
RXEOL, firing off RXORN interrupts and then getting very pissed off at
me. I'll do some further digging soon and I'll post an update to the
list when I figure it out.

If you're up for a bit of coding, here's what I did:

* when an RXEOL interrupt is received, set sc->sc_kickpcu=1 and disable
  RXEOL/RXORN;
* in ath_rx_tasklet() (that's the ath9k name; it's not called that in
  FreeBSD) run the normal descriptor list check, then once that's done,
  if sc->sc_kickpcu == 1:
   * set it to 0;
   * call pcu stop;
   * re-initialise all of the descriptors;
   * call pcu start;
   * re-enable interrupts, with RXEOL|RXORN re-enabled.

This reliably fixes all the crazy stuff I saw when I didn't do the
above but it does give (to me, unacceptable) packet loss under very
high UDP RX load.

Adrian
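The recovery steps Adrian lists can be modelled outside the driver to
make the hand-off between the interrupt handler and the RX tasklet
explicit. What follows is only a sketch under stated assumptions: the
struct, the SK_* bit values and both function names are hypothetical
stand-ins, the PCU stop/start calls are reduced to comments, and nothing
here is taken from the ath9k or FreeBSD sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interrupt bits; the real values live in the driver headers. */
#define SK_INT_RX    0x01u
#define SK_INT_RXEOL 0x02u
#define SK_INT_RXORN 0x04u

/* Mock of the few softc fields the recovery path touches (all assumed). */
struct sk_softc {
    uint32_t imask;        /* currently enabled interrupts */
    bool     kickpcu;      /* set by the ISR, consumed by the RX tasklet */
    int      pcu_restarts; /* how many times the PCU was kicked */
    int      rx_linked;    /* descriptors currently linked for the hardware */
    int      rx_total;     /* size of the RX descriptor pool */
};

/* Interrupt side: on RXEOL, flag the tasklet and silence RXEOL/RXORN. */
static void sk_intr(struct sk_softc *sc, uint32_t status)
{
    if (status & SK_INT_RXEOL) {
        sc->kickpcu = true;
        sc->imask &= ~(SK_INT_RXEOL | SK_INT_RXORN);
    }
}

/* Tasklet side: after the normal descriptor walk, restart the PCU if flagged. */
static void sk_rx_tasklet(struct sk_softc *sc)
{
    assert(sc->rx_total > 0);

    /* ... the normal walk of completed RX descriptors would go here ... */

    if (!sc->kickpcu)
        return;

    sc->kickpcu = false;
    /* pcu stop would go here (stubbed out in this model) */
    sc->rx_linked = sc->rx_total;  /* re-initialise and re-link every descriptor */
    /* pcu start would go here (stubbed out in this model) */
    sc->pcu_restarts++;
    sc->imask |= SK_INT_RXEOL | SK_INT_RXORN;  /* re-enable both interrupts */
}
```

Keeping RXEOL/RXORN masked between the interrupt and the tasklet is what
stops the interrupt storm in this model; the cost, as noted in the
message, is that frames arriving while the PCU is stopped are lost.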
* [ath9k-devel] Possible over driving AR9106, how to detect?
From: Daniel Smith @ 2011-09-07 18:04 UTC
To: ath9k-devel

Thanks for the quick response Adrian.

On Wed, Sep 7, 2011 at 11:47 AM, Adrian Chadd <adrian@freebsd.org> wrote:
> If the DMA RX stop storm occurred then it meant the NIC thought it hit
> the end of the RX descriptor list (whether it did or not) and it just
> kept signalling it couldn't write packets anywhere.

I am not certain it is in fact the DMA RX stop storm. The occurrence
often coincided with the storm back when the storm used to be much more
pervasive. Now we still see it even when there is no storm occurring,
e.g. the interrupt debug counters are not increasing and there are none
of the "unable to stop RX DMA" kernel log errors.

> I remember seeing this in FreeBSD, so I added some code to the RX
> tasklet to forcibly reset the PCU receive and re-link all the RX
> descriptors. It causes packet loss when it occurs (and it only occurs
> when I'm thrashing the NIC with too much UDP traffic) but I bet it
> could also occur if I enabled PHY errors (e.g. when doing radar
> detection) in a very busy+noisy environment.

I have seen you mention this in other postings. As stated above, I am
pretty certain it is occurring even when there is no DMA storm, but
what is intriguing is that you seem to be seeing the same trigger,
namely high volumes of traffic coming into the interface.

> ath9k handles the RX descriptors a bit differently but when I tried
> the same method in FreeBSD, it still ended up occasionally hitting
> RXEOL, firing off RXORN interrupts and then getting very pissed off at
> me. I'll do some further digging soon and I'll post an update to the
> list when I figure it out.

I will go back and see whether an RXEOL is in fact being fired when the
lock-up occurs for us.

> If you're up for a bit of coding, here's what I did:

Always up for a challenge (^_^)

> * when an RXEOL interrupt is received, set sc->sc_kickpcu=1 and disable
>   RXEOL/RXORN;
> * in ath_rx_tasklet() (that's the ath9k name; it's not called that in
>   FreeBSD) run the normal descriptor list check, then once that's done,
>   if sc->sc_kickpcu == 1:
>    * set it to 0;
>    * call pcu stop;
>    * re-initialise all of the descriptors;
>    * call pcu start;
>    * re-enable interrupts, with RXEOL|RXORN re-enabled.

This may be as simple (with some additional success checks) as a:

    if (sc->sc_kickpcu == 1) {
        ath_stoprecv(sc);              /* stop PCU receive and RX DMA */
        ath_rx_cleanup(sc);            /* tear down the RX descriptors */
        ath_rx_init(sc, ATH_RXBUF);    /* set the descriptors up again */
        ath_startrecv(sc);             /* restart receive */

        sc->sc_kickpcu = 0;
    }

placed right after unlocking the spinlock. It has to come afterwards
because three of those calls try to take the rxbuflock themselves.

> This reliably fixes all the crazy stuff I saw when I didn't do the
> above but it does give (to me, unacceptable) packet loss under very
> high UDP RX load.

Do you have this fix in the FreeBSD mainline? If so, would a similar
fix be beneficial to the mainline ath9k?

v/r,
Daniel
* [ath9k-devel] Possible over driving AR9106, how to detect?
From: Adrian Chadd @ 2011-09-08 0:37 UTC
To: ath9k-devel

On 8 September 2011 02:04, Daniel Smith <viscous.liquid@gmail.com> wrote:
> Thanks for the quick response Adrian.

No worries.

> I am not certain it is in fact the DMA RX stop storm. The occurrence
> often coincided with the storm back when the storm used to be much
> more pervasive. Now we still see it even when there is no storm
> occurring, e.g. the interrupt debug counters are not increasing and
> there are none of the "unable to stop RX DMA" kernel log errors.

Right. The reason you won't see "unable to stop RX DMA" is that it
hasn't locked up in anything like that way.

> This may be as simple (with some additional success checks) as a:
>
> if (sc->sc_kickpcu == 1) {
>     ath_stoprecv(sc);
>     ath_rx_cleanup(sc);
>     ath_rx_init(sc, ATH_RXBUF);
>     ath_startrecv(sc);
>
>     sc->sc_kickpcu = 0;
> }

That looks like my solution. But I have extra checks to see whether
startrecv() properly worked. Otherwise I do an ath_reset() call.

Also, you should make sure you reset the interrupt mask.

> placed right after unlocking the spinlock. It has to come afterwards
> because three of those calls try to take the rxbuflock themselves.
>
>> This reliably fixes all the crazy stuff I saw when I didn't do the
>> above but it does give (to me, unacceptable) packet loss under very
>> high UDP RX load.
>
> Do you have this fix in the FreeBSD mainline? If so, would a similar
> fix be beneficial to the mainline ath9k?

I'm still trying to figure out what the "real" problem is (and then fix it.)

Adrian
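The two additions Adrian mentions (checking that the restart actually
worked and falling back to a full reset, then restoring the interrupt
mask) can be illustrated with a small mock. This is a sketch only: the
mock_hw struct and every function name below are hypothetical, and
mock_full_reset() merely stands in for the ath_reset() fallback he
describes; the real driver calls and their return conventions are not
reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mock of the hardware state; none of these names are driver APIs. */
struct mock_hw {
    bool start_ok;     /* whether the mocked PCU restart succeeds */
    int  full_resets;  /* how many times the full-reset fallback ran */
    bool rx_ints_on;   /* RXEOL/RXORN enabled in the interrupt mask */
};

static bool mock_stop(struct mock_hw *hw)       { (void)hw; return true; }
static bool mock_relink(struct mock_hw *hw)     { (void)hw; return true; }
static bool mock_start(struct mock_hw *hw)      { return hw->start_ok; }
static void mock_full_reset(struct mock_hw *hw) { hw->full_resets++; }

/* Returns true if the light-weight kick sufficed, false if a full reset ran. */
static bool rx_kick(struct mock_hw *hw)
{
    assert(hw != 0);

    /* Each step is checked; the first failure short-circuits the rest. */
    bool ok = mock_stop(hw) && mock_relink(hw) && mock_start(hw);

    if (!ok)
        mock_full_reset(hw);  /* stands in for the ath_reset() fallback */

    hw->rx_ints_on = true;    /* restore the interrupt mask on either path */
    return ok;
}
```

Restoring the mask on both paths matters in this model: if RXEOL/RXORN
stayed disabled after a failed kick, the next end-of-list condition
would go unnoticed.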
* [ath9k-devel] Possible over driving AR9106, how to detect?
From: Daniel Smith @ 2011-09-08 11:42 UTC
To: ath9k-devel

On Wed, Sep 7, 2011 at 8:37 PM, Adrian Chadd <adrian@freebsd.org> wrote:
> Right. The reason you won't see "unable to stop RX DMA" is that it
> hasn't locked up in anything like that way.

Ahh, so you think this could possibly be a precursor to the DMA storm?

> That looks like my solution. But I have extra checks to see whether
> startrecv() properly worked.
> Otherwise I do an ath_reset() call.

Don't doubt it; it flowed naturally from your description. I will
probably put checks on the stoprecv and the rx_init as well.

> I'm still trying to figure out what the "real" problem is (and then fix it.)

For us, we can reliably recreate it when we have high gain reception
(70+ dB) combined with a high incoming frame rate. I am wondering
whether the RF front-end is being overdriven (which is the reason for
the post). If so, it is not a bug that can be fixed in software but a
limitation of the hardware that needs to be acknowledged and dealt
with appropriately.
* [ath9k-devel] Possible over driving AR9106, how to detect?
From: Adrian Chadd @ 2011-09-08 12:01 UTC
To: ath9k-devel

On 8 September 2011 19:42, Daniel Smith <viscous.liquid@gmail.com> wrote:
> On Wed, Sep 7, 2011 at 8:37 PM, Adrian Chadd <adrian@freebsd.org> wrote:
>> Right. The reason you won't see "unable to stop RX DMA" is that it
>> hasn't locked up in anything like that way.
>
> Ahh, so you think this could possibly be a precursor to the DMA storm?

Well, that's basically what is happening:

* The hardware hits, or thinks it's hit, the end of the RX descriptor list;
* It registers an RXEOL once it hits it;
* The PCU then stops receiving RX frames, as there's now nowhere to put them;
* and you then get an RXORN interrupt for each RX FIFO overrun
  (because there's nowhere for the PCU to DMA them into.)

The patch recently pushed into ath9k just quietens the interrupts when
they occur, hoping that when the RX process is next kicked, the RX
descriptors will be set up and re-chained correctly and the PCU will
continue along its merry way.

The trouble is - I found (in FreeBSD) that this isn't necessarily the
case. If I revert my fix and change it to just stop/start the PCU
after walking the RX descriptor list and re-chaining things, the list
eventually gets itself twisted in a way where only a handful of
buffers on the RX list are RX'ed into before the hardware thinks it
has hit EOL. So it was hitting EOL after 3, 4, .. 10 descriptors,
rather than 512.

That's why I think there's something else we've missed. I don't want
to have to shut the PCU down and then re-init the list every time this
happens. I'd like to see what is left in the RX descriptor that's
causing issues. It may be something as simple as a missing write
memory barrier.

Adrian
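The "EOL after 3, 4, .. 10 descriptors rather than 512" symptom can be
pictured with a toy descriptor chain. The sketch below assumes a
NULL-terminated singly linked list and a hypothetical sk_desc type; the
real descriptor layout and end-of-list convention are hardware-specific
and not modelled. It only shows how one stale link early in the chain
shrinks the number of buffers the hardware can reach, and it makes no
claim about what actually corrupts the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down RX descriptor: only the link field matters here. */
struct sk_desc {
    struct sk_desc *link;  /* next descriptor, or NULL at end of list */
};

/* Chain n descriptors in order and terminate the list. */
static void sk_chain(struct sk_desc *ring, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        ring[i].link = (i + 1 < n) ? &ring[i + 1] : NULL;
}

/* Count the descriptors reachable from head before end of list is seen. */
static size_t sk_reachable(const struct sk_desc *head, size_t limit)
{
    size_t count = 0;

    /* The limit guards against walking forever if the chain loops. */
    while (head != NULL && count < limit) {
        count++;
        head = head->link;
    }
    return count;
}
```

With 512 descriptors chained by sk_chain(), sk_reachable() reports all
512; clearing the link of the fourth descriptor drops that to 4, which
is the shape of the failure described above. A check of this kind after
re-chaining would be one way to notice a twisted list before restarting
receive.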
* [ath9k-devel] Possible over driving AR9106, how to detect?
From: Alex Hacker @ 2011-09-08 12:09 UTC
To: ath9k-devel

On Thu, Sep 08, 2011 at 07:42:19AM -0400, Daniel Smith wrote:
> For us, we can reliably recreate it when we have high gain reception
> (70+ dB) combined with a high incoming frame rate. I am wondering
> whether the RF front-end is being overdriven (which is the reason for
> the post). If so, it is not a bug that can be fixed in software but a
> limitation of the hardware that needs to be acknowledged and dealt
> with appropriately.

Not sure about the AR9106, but for AR9160 or AR92xx chips an RSSI of
70 dB (which should be around -25 dBm at the antenna input) is not
sufficient to overdrive the radio. Signal levels like that or higher
are very common in our lab tests, without any issues.

Regards,
Alex.
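Alex's "around -25 dBm" figure is consistent with reading the reported
70 dB as dB above the noise floor. The helper below spells out that
arithmetic; the -95 dBm noise floor is an assumed nominal value, not
something stated in this thread, and rssi_to_dbm() is a hypothetical
name used only for illustration.

```c
#include <assert.h>

/* Assumed nominal noise floor in dBm; the real floor depends on chip and channel. */
#define ASSUMED_NOISE_FLOOR_DBM (-95)

/* Convert an RSSI given in dB above the noise floor to absolute power in dBm. */
static int rssi_to_dbm(int rssi_db, int noise_floor_dbm)
{
    assert(rssi_db >= 0);
    return noise_floor_dbm + rssi_db;
}
```

Under that assumption, rssi_to_dbm(70, ASSUMED_NOISE_FLOOR_DBM) gives
-25. With an inline LNA in the chain, as in Daniel's setup, the level at
the card input and the level at the antenna would differ by the LNA gain
and any cable loss, so the two should not be treated as the same number.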