linux-wireless.vger.kernel.org archive mirror
* ath5k: scanning while transmitting causes oops on 802.11a capable card
@ 2009-05-06 16:14 Pavel Roskin
  2009-05-06 16:19 ` Johannes Berg
  2009-05-06 16:36 ` [ath5k-devel] " Bob Copeland
  0 siblings, 2 replies; 10+ messages in thread
From: Pavel Roskin @ 2009-05-06 16:14 UTC (permalink / raw)
  To: linux-wireless; +Cc: ath5k-devel

Hello!

If I scan by "iw dev wlan0 scan" while sending data through the
interface, I get a BUG in net/mac80211/tx.c:

                /* RC is busted */
                if (WARN_ON_ONCE(info->control.rates[i].idx >=
                                 sband->n_bitrates)) {
                        info->control.rates[i].idx = -1;
                        continue;
                }

I added this statement inside the condition:

printk("idx = %d, bitrates = %d, i = %d\n", info->control.rates[i].idx,
sband->n_bitrates, i);

The result is:

idx = 9, bitrates = 8, i = 0
idx = 10, bitrates = 8, i = 1
idx = 9, bitrates = 8, i = 2

The card is 802.11a capable.  My interpretation is that scanning
switches to the 802.11a band temporarily, but doesn't stop transmission.
When transmitting, the rate indices for 2.4 GHz band are checked against
the number of rates in the 5 GHz band, which is indeed 8.  There are 12
rates in the 2.4 GHz band.
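
A minimal standalone sketch of the mismatch described above, not the mac80211 code. The struct and function names (`band_sketch`, `rate_idx_valid`, `count_invalid`) are hypothetical; only the rate counts (12 and 8) and the logged indices (9, 10, 9) come from the report. The `idx >= 0` guard is an addition of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-band rate table; only the count matters. */
struct band_sketch {
	const char *name;
	int n_bitrates;
};

static const struct band_sketch band_2ghz = { "2.4 GHz", 12 };
static const struct band_sketch band_5ghz = { "5 GHz", 8 };

/* An index is usable only if it falls inside the table of the band it is
 * checked against (the lower bound is an addition of this sketch). */
static bool rate_idx_valid(const struct band_sketch *band, int idx)
{
	return idx >= 0 && idx < band->n_bitrates;
}

/* Count how many of the given indices would trip the check for a band. */
static int count_invalid(const struct band_sketch *band, const int *idx, size_t n)
{
	int bad = 0;

	for (size_t i = 0; i < n; i++)
		if (!rate_idx_valid(band, idx[i]))
			bad++;
	return bad;
}
```

Checked against `band_5ghz`, all three logged indices are out of range; checked against `band_2ghz`, none are, which fits the interpretation that the indices were chosen for one band and validated against the other.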

ath5k 0000:0b:00.0: PCI INT A disabled
ath5k 0000:0b:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
ath5k 0000:0b:00.0: setting latency timer to 64
ath5k 0000:0b:00.0: registered as 'phy2'
ath: Country alpha2 being used: US
ath: Regpair detected: 0x3a
phy2: Selected rate control algorithm 'minstrel'
ath5k phy2: Atheros AR5414 chip found (MAC: 0xa3, PHY: 0x61)

I actually had to patch the kernel, or the oops would escalate to a
panic.  Perhaps it's a good idea to have that check:

--- a/drivers/net/wireless/ath/ath5k/base.c
+++ b/drivers/net/wireless/ath/ath5k/base.c
@@ -1246,6 +1246,8 @@ ath5k_txbuf_setup(struct ath5k_softc *sc, struct ath5k_buf *bf)
                        PCI_DMA_TODEVICE);
 
        rate = ieee80211_get_tx_rate(sc->hw, info);
+       if (!rate)
+               return -EIO;
 
        if (info->flags & IEEE80211_TX_CTL_NO_ACK)
                flags |= AR5K_TXDESC_NOACK;
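
The guard in the patch above can be modelled outside the kernel roughly as follows. This is only a sketch under stated assumptions: `rate_sketch`, `lookup_rate` and `txbuf_setup_sketch` are hypothetical names, the table holds the eight 802.11a rates in units of 100 kbps, and the lookup is assumed to return NULL for an unusable index.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical rate entry; not the mac80211 structure. */
struct rate_sketch { int bitrate_100kbps; };

/* The eight 802.11a rates: 6, 9, 12, 18, 24, 36, 48 and 54 Mbit/s. */
static const struct rate_sketch rates_5ghz[] = {
	{ 60 }, { 90 }, { 120 }, { 180 }, { 240 }, { 360 }, { 480 }, { 540 },
};

/* Return the rate for idx, or NULL when idx is negative or past the table. */
static const struct rate_sketch *lookup_rate(const struct rate_sketch *tbl,
					     size_t n, int idx)
{
	if (idx < 0 || (size_t)idx >= n)
		return NULL;
	return &tbl[idx];
}

/* Caller bails out with -EIO instead of dereferencing a NULL rate. */
static int txbuf_setup_sketch(int idx, int *bitrate_out)
{
	const struct rate_sketch *rate =
		lookup_rate(rates_5ghz,
			    sizeof(rates_5ghz) / sizeof(rates_5ghz[0]), idx);

	if (!rate)
		return -EIO;
	*bitrate_out = rate->bitrate_100kbps;
	return 0;
}
```

With this model, an index of -1 or 9 yields -EIO, while index 7 succeeds and reports 540.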

-- 
Regards,
Pavel Roskin

* Re: ath5k: scanning while transmitting causes oops on 802.11a capable card
  2009-05-06 16:14 ath5k: scanning while transmitting causes oops on 802.11a capable card Pavel Roskin
@ 2009-05-06 16:19 ` Johannes Berg
  2009-05-06 16:36 ` [ath5k-devel] " Bob Copeland
  1 sibling, 0 replies; 10+ messages in thread
From: Johannes Berg @ 2009-05-06 16:19 UTC (permalink / raw)
  To: Pavel Roskin; +Cc: linux-wireless, ath5k-devel


> If I scan by "iw dev wlan0 scan" while sending data through the
> interface, I get a BUG in net/mac80211/tx.c:
> 
>                 /* RC is busted */
>                 if (WARN_ON_ONCE(info->control.rates[i].idx >=
>                                  sband->n_bitrates)) {
>                         info->control.rates[i].idx = -1;
>                         continue;
>                 }
> 
> I added this statement inside the condition:
> 
> printk("idx = %d, bitrates = %d, i = %d\n", info->control.rates[i].idx,
> sband->n_bitrates, i);
> 
> The result is:
> 
> idx = 9, bitrates = 8, i = 0
> idx = 10, bitrates = 8, i = 1
> idx = 9, bitrates = 8, i = 2
> 
> The card is 802.11a capable.  My interpretation is that scanning
> switches to the 802.11a band temporarily, but doesn't stop transmission.
> When transmitting, the rate indices for 2.4 GHz band are checked against
> the number of rates in the 5 GHz band, which is indeed 8.  There are 12
> rates in the 2.4 GHz band.
> 
> ath5k 0000:0b:00.0: PCI INT A disabled
> ath5k 0000:0b:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> ath5k 0000:0b:00.0: setting latency timer to 64
> ath5k 0000:0b:00.0: registered as 'phy2'
> ath: Country alpha2 being used: US
> ath: Regpair detected: 0x3a
> phy2: Selected rate control algorithm 'minstrel'
> ath5k phy2: Atheros AR5414 chip found (MAC: 0xa3, PHY: 0x61)
> 
> I actually had to patch the kernel, or the oops would escalate to a
> panic.  Perhaps it's a good idea to have that check:

I don't think it's just ath5k... I had an oops too. I think some of the
pid/minstrel "fixes" broke it but don't know yet.

johannes

* Re: [ath5k-devel] ath5k: scanning while transmitting causes oops on 802.11a capable card
  2009-05-06 16:14 ath5k: scanning while transmitting causes oops on 802.11a capable card Pavel Roskin
  2009-05-06 16:19 ` Johannes Berg
@ 2009-05-06 16:36 ` Bob Copeland
  2009-05-06 17:25   ` John W. Linville
  1 sibling, 1 reply; 10+ messages in thread
From: Bob Copeland @ 2009-05-06 16:36 UTC (permalink / raw)
  To: Pavel Roskin; +Cc: linux-wireless, ath5k-devel

On Wed, May 6, 2009 at 12:14 PM, Pavel Roskin <proski@gnu.org> wrote:
> Hello!
>
> If I scan by "iw dev wlan0 scan" while sending data through the
> interface, I get a BUG in net/mac80211/tx.c:

Agreed... Also I think the same thing happens for rx for ath5k,
explaining the 'unknown rate index' warnings (sc->curband changes
during scan but we process a beacon from 2ghz band, that one at
least just needs some synchronization in the driver).

>                 /* RC is busted */
>                 if (WARN_ON_ONCE(info->control.rates[i].idx >=
>                                  sband->n_bitrates)) {
>                         info->control.rates[i].idx = -1;
>                         continue;
>                 }

I had a patch here to return rate_lowest_index().  But it still
crashed eventually.

>         rate = ieee80211_get_tx_rate(sc->hw, info);
> +       if (!rate)
> +               return -EIO;

There are a few more rates here for MRR and RTS/CTS etc.

-- 
Bob Copeland %% www.bobcopeland.com
* Re: [ath5k-devel] ath5k: scanning while transmitting causes oops on 802.11a capable card
  2009-05-06 16:36 ` [ath5k-devel] " Bob Copeland
@ 2009-05-06 17:25   ` John W. Linville
  2009-05-06 20:12     ` Pavel Roskin
  2009-05-07  8:56     ` Bob Copeland
  0 siblings, 2 replies; 10+ messages in thread
From: John W. Linville @ 2009-05-06 17:25 UTC (permalink / raw)
  To: Bob Copeland; +Cc: Pavel Roskin, linux-wireless, ath5k-devel

On Wed, May 06, 2009 at 12:36:13PM -0400, Bob Copeland wrote:
> On Wed, May 6, 2009 at 12:14 PM, Pavel Roskin <proski@gnu.org> wrote:
> > Hello!
> >
> > If I scan by "iw dev wlan0 scan" while sending data through the
> > interface, I get a BUG in net/mac80211/tx.c:
> 
> Agreed... Also I think the same thing happens for rx for ath5k,
> explaining the 'unknown rate index' warnings (sc->curband changes
> during scan but we process a beacon from 2ghz band, that one at
> least just needs some synchronization in the driver).

Ah, that could be -- I sure am tired of reading bug reports about
that...

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

* Re: [ath5k-devel] ath5k: scanning while transmitting causes oops on 802.11a capable card
  2009-05-06 17:25   ` John W. Linville
@ 2009-05-06 20:12     ` Pavel Roskin
  2009-05-06 20:26       ` Johannes Berg
  2009-05-07  8:56     ` Bob Copeland
  1 sibling, 1 reply; 10+ messages in thread
From: Pavel Roskin @ 2009-05-06 20:12 UTC (permalink / raw)
  To: John W. Linville; +Cc: Bob Copeland, linux-wireless, ath5k-devel

On Wed, 2009-05-06 at 13:25 -0400, John W. Linville wrote:
> On Wed, May 06, 2009 at 12:36:13PM -0400, Bob Copeland wrote:
> > On Wed, May 6, 2009 at 12:14 PM, Pavel Roskin <proski@gnu.org> wrote:
> > > Hello!
> > >
> > > If I scan by "iw dev wlan0 scan" while sending data through the
> > > interface, I get a BUG in net/mac80211/tx.c:
> > 
> > Agreed... Also I think the same thing happens for rx for ath5k,
> > explaining the 'unknown rate index' warnings (sc->curband changes
> > during scan but we process a beacon from 2ghz band, that one at
> > least just needs some synchronization in the driver).
> 
> Ah, that could be -- I sure am tired of reading bug reports about
> that...

I've bisected it.  The problem is introduced by the commit
2038ccfbb5f7fc7d8bca26bf53bdd6c7778136ff:

Author:     Johannes Berg <johannes@sipsolutions.net>
AuthorDate: Wed Apr 29 12:26:17 2009 +0200
Commit:     John W. Linville <linville@tuxdriver.com>
CommitDate: Thu Apr 30 15:06:34 2009 -0400

    mac80211: tell driver when idle

    When we aren't doing anything in mac80211, we can turn off
    much of the hardware, depending on the driver/hw. Not doing
    anything, aka being idle, means:
...

-- 
Regards,
Pavel Roskin

* Re: [ath5k-devel] ath5k: scanning while transmitting causes oops on 802.11a capable card
  2009-05-06 20:12     ` Pavel Roskin
@ 2009-05-06 20:26       ` Johannes Berg
  2009-05-06 21:29         ` Pavel Roskin
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Berg @ 2009-05-06 20:26 UTC (permalink / raw)
  To: Pavel Roskin; +Cc: John W. Linville, Bob Copeland, linux-wireless, ath5k-devel

On Wed, 2009-05-06 at 16:12 -0400, Pavel Roskin wrote:

> > > > If I scan by "iw dev wlan0 scan" while sending data through the
> > > > interface, I get a BUG in net/mac80211/tx.c:
> > > 
> > > Agreed... Also I think the same thing happens for rx for ath5k,
> > > explaining the 'unknown rate index' warnings (sc->curband changes
> > > during scan but we process a beacon from 2ghz band, that one at
> > > least just needs some synchronization in the driver).
> > 
> > Ah, that could be -- I sure am tired of reading bug reports about
> > that...
> 
> I've bisected it.  The problem is introduced by the commit
> 2038ccfbb5f7fc7d8bca26bf53bdd6c7778136ff:
> 
> Author:     Johannes Berg <johannes@sipsolutions.net>
> AuthorDate: Wed Apr 29 12:26:17 2009 +0200
> Commit:     John W. Linville <linville@tuxdriver.com>
> CommitDate: Thu Apr 30 15:06:34 2009 -0400
> 
>     mac80211: tell driver when idle

Huh? That's confusing. Also, you say you get a BUG but point out a
WARN_ON_ONCE, was that an oversight or does something crash there?

OTOH, I can see one thing happening -- it would access scan_channel.
Patch should fix that, does it help?

johannes

--- wireless-testing.orig/net/mac80211/iface.c	2009-05-06 22:25:45.000000000 +0200
+++ wireless-testing/net/mac80211/iface.c	2009-05-06 22:25:53.000000000 +0200
@@ -964,5 +964,6 @@ void ieee80211_recalc_idle(struct ieee80
 	mutex_lock(&local->iflist_mtx);
 	chg = __ieee80211_recalc_idle(local);
 	mutex_unlock(&local->iflist_mtx);
-	ieee80211_hw_config(local, chg);
+	if (chg)
+		ieee80211_hw_config(local, chg);
 }



* Re: [ath5k-devel] ath5k: scanning while transmitting causes oops on 802.11a capable card
  2009-05-06 20:26       ` Johannes Berg
@ 2009-05-06 21:29         ` Pavel Roskin
  2009-05-07  6:01           ` Johannes Berg
  0 siblings, 1 reply; 10+ messages in thread
From: Pavel Roskin @ 2009-05-06 21:29 UTC (permalink / raw)
  To: Johannes Berg; +Cc: John W. Linville, Bob Copeland, linux-wireless, ath5k-devel

On Wed, 2009-05-06 at 22:26 +0200, Johannes Berg wrote:

> Huh? That's confusing. Also, you say you get a BUG but point out a
> WARN_ON_ONCE, was that an oversight or does something crash there?

Sorry, I meant WARN_ON_ONCE.

> OTOH, I can see one thing happening -- it would access scan_channel.
> Patch should fix that, does it help?

Yes, that does the trick!  Several scans with iw and iwlist while flood
pinging the AP don't cause any kernel messages.  Thank you!

-- 
Regards,
Pavel Roskin

* Re: [ath5k-devel] ath5k: scanning while transmitting causes oops on 802.11a capable card
  2009-05-06 21:29         ` Pavel Roskin
@ 2009-05-07  6:01           ` Johannes Berg
  0 siblings, 0 replies; 10+ messages in thread
From: Johannes Berg @ 2009-05-07  6:01 UTC (permalink / raw)
  To: Pavel Roskin; +Cc: John W. Linville, Bob Copeland, linux-wireless, ath5k-devel

On Wed, 2009-05-06 at 17:29 -0400, Pavel Roskin wrote:
> On Wed, 2009-05-06 at 22:26 +0200, Johannes Berg wrote:
> 
> > Huh? That's confusing. Also, you say you get a BUG but point out a
> > WARN_ON_ONCE, was that an oversight or does something crash there?
> 
> Sorry, I meant WARN_ON_ONCE.

Ok, no worries, just got confused for a second what you meant.

> > OTOH, I can see one thing happening -- it would access scan_channel.
> > Patch should fix that, does it help?
> 
> Yes, that does the trick!  Several scans with iw and iwlist while flood
> pinging the AP don't cause any kernel messages.  Thank you!

Ok, thanks for testing. I'll figure out the real problem and fix it.

johannes

* Re: [ath5k-devel] ath5k: scanning while transmitting causes oops on 802.11a capable card
  2009-05-06 17:25   ` John W. Linville
  2009-05-06 20:12     ` Pavel Roskin
@ 2009-05-07  8:56     ` Bob Copeland
  2009-05-07 14:04       ` Bob Copeland
  1 sibling, 1 reply; 10+ messages in thread
From: Bob Copeland @ 2009-05-07  8:56 UTC (permalink / raw)
  To: John W. Linville; +Cc: Pavel Roskin, linux-wireless, ath5k-devel

On Wed, May 06, 2009 at 01:25:13PM -0400, John W. Linville wrote:
> On Wed, May 06, 2009 at 12:36:13PM -0400, Bob Copeland wrote:
> > Agreed... Also I think the same thing happens for rx for ath5k,
> > explaining the 'unknown rate index' warnings (sc->curband changes
> > during scan but we process a beacon from 2ghz band, that one at
> > least just needs some synchronization in the driver).
> 
> Ah, that could be -- I sure am tired of reading bug reports about
> that...

It's not conceptually hard to fix but I can't think of an easy 1-liner
patch.

Here are the races in the rx path (note TX status processing has a
similar race with hw_to_driver_rix, which is separate from Pavel's
get_tx_rate report).  It should be pretty easy to reproduce by setting
up scans of one channel in each band and running them continuously under
load.

Race 1 (single cpu, unfortunately placed interrupt):

CPU 1:
drv_config 
 sc->curchan = xxx
 sc->curband = yyy
 [intr]
  rx_tasklet
   rxs.rate = hw_to_driver_rix [now reports wrong band/channel]
  [end tasklet]
 reset [changes channel]

The following (untested!) patch fixes this race in a cheesy way for both
RX and TX.  However, it doesn't fix the following, smaller race caused by
deferred processing on a separate CPU.  A proper fix, as indicated,
would both rework the reset order and flush out any unprocessed data under
the appropriate spinlocks, forcing that stuff to run after the tasklets
are completed.

CPU 1:                                CPU 2:
drv_config
 next_chan = xxx
 next_band = yyy
 reset 
                                      [intr]
  disable intr                         rx_tasklet
  stop dma             
  [flush should go here]
  curchan = next_chan
  curband = next_band
                                       rxs.rate = hw_to_driver_rix

Cheesy untested patch (actually, I know it is broken for ath5k_init, 
but you get the idea):

diff --git a/drivers/net/wireless/ath/ath5k/base.c b/drivers/net/wireless/ath/ath5k/base.c
index 6789c5d..6264d49 100644
--- a/drivers/net/wireless/ath/ath5k/base.c
+++ b/drivers/net/wireless/ath/ath5k/base.c
@@ -1076,8 +1076,8 @@ ath5k_chan_set(struct ath5k_softc *sc, struct ieee80211_channel *chan)
 	if (chan->center_freq != sc->curchan->center_freq ||
 		chan->hw_value != sc->curchan->hw_value) {
 
-		sc->curchan = chan;
-		sc->curband = &sc->sbands[chan->band];
+		sc->nextchan = chan;
+		sc->nextband = &sc->sbands[chan->band];
 
 		/*
 		 * To switch channels clear any pending DMA operations;
@@ -2648,6 +2648,8 @@ ath5k_reset(struct ath5k_softc *sc, bool stop, bool change_channel)
 		ath5k_txq_cleanup(sc);
 		ath5k_rx_stop(sc);
 	}
+	sc->curchan = sc->nextchan;
+	sc->curband = sc->nextband;
 	ret = ath5k_hw_reset(ah, sc->opmode, sc->curchan, true);
 	if (ret) {
 		ATH5K_ERR(sc, "can't reset hardware (%d)\n", ret);
diff --git a/drivers/net/wireless/ath/ath5k/base.h b/drivers/net/wireless/ath/ath5k/base.h
index 852b2c1..ce8a5d4 100644
--- a/drivers/net/wireless/ath/ath5k/base.h
+++ b/drivers/net/wireless/ath/ath5k/base.h
@@ -116,6 +116,7 @@ struct ath5k_softc {
 	struct ath5k_hw		*ah;		/* Atheros HW */
 
 	struct ieee80211_supported_band		*curband;
+	struct ieee80211_supported_band		*nextband;
 
 #ifdef CONFIG_ATH5K_DEBUG
 	struct ath5k_dbg_info	debug;		/* debug info */
@@ -137,6 +138,7 @@ struct ath5k_softc {
 	unsigned int		filter_flags;	/* HW flags, AR5K_RX_FILTER_* */
 	unsigned int		curmode;	/* current phy mode */
 	struct ieee80211_channel *curchan;	/* current h/w channel */
+	struct ieee80211_channel *nextchan;	/* next h/w channel */
 
 	struct ieee80211_vif *vif;
 
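
The staging idea in the patch above can be modelled in isolation as below. This is a single-threaded sketch, not driver code: `chan_sketch`, `softc_sketch`, `stage_channel`, `reset_commit` and `current_band` are hypothetical names, and the NULL guard in `reset_commit` is an addition of the sketch. It only shows that readers keep seeing the old channel until the reset path commits the new one; it does not address the cross-CPU race, which would still need the flush described earlier.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down channel and driver state. */
struct chan_sketch {
	int center_freq;
	int band;
};

struct softc_sketch {
	const struct chan_sketch *curchan;   /* what the status paths read */
	const struct chan_sketch *nextchan;  /* staged by the config path */
};

/* Config path: only stage the new channel; curchan is left untouched. */
static void stage_channel(struct softc_sketch *sc, const struct chan_sketch *chan)
{
	sc->nextchan = chan;
}

/* Reset path: commit the staged channel. In a driver this would happen after
 * DMA has been stopped; here it is a plain assignment. */
static void reset_commit(struct softc_sketch *sc)
{
	if (sc->nextchan)
		sc->curchan = sc->nextchan;
}

/* Stand-in for a status-path lookup that depends on the current band. */
static int current_band(const struct softc_sketch *sc)
{
	return sc->curchan->band;
}
```

Between `stage_channel()` and `reset_commit()`, `current_band()` still reports the old band, which is the window in which an interrupt-driven tasklet would otherwise have seen the new band too early.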
-- 
Bob Copeland %% www.bobcopeland.com


* Re: [ath5k-devel] ath5k: scanning while transmitting causes oops on 802.11a capable card
  2009-05-07  8:56     ` Bob Copeland
@ 2009-05-07 14:04       ` Bob Copeland
  0 siblings, 0 replies; 10+ messages in thread
From: Bob Copeland @ 2009-05-07 14:04 UTC (permalink / raw)
  To: John W. Linville; +Cc: ath5k-devel, linux-wireless

On Thu, May 7, 2009 at 4:56 AM, Bob Copeland <me@bobcopeland.com> wrote:
> Cheesy untested patch (actually, I know it is broken for ath5k_init,
> but you get the idea):

For what it's worth, I have a better version that passes the
channel struct directly to ath5k_reset, after I give it some
testing I'll post with a decent changelog.

-- 
Bob Copeland %% www.bobcopeland.com
