mac80211: 3.9.0+: Invalid WDS/flush state and non-connecting station.

linux-wireless.vger.kernel.org archive mirror

* mac80211:  3.9.0+:  Invalid WDS/flush state and non-connecting station.
@ 2013-05-02 19:50 Ben Greear
  2013-05-02 20:24 ` Johannes Berg
  0 siblings, 1 reply; 9+ messages in thread
From: Ben Greear @ 2013-05-02 19:50 UTC (permalink / raw)
  To: linux-wireless@vger.kernel.org

Kernel is hacked 3.9.0+

I've been seeing this problem for a while (and posted about it previously).  The problem
is that a station appears to associate fine, but never actually 'connects'.  This problem
is not easy to reproduce...

I added printouts in the connect logic to help debug this. For instance, here is an
interface that worked:


May  2 12:03:37 localhost kernel: [88122.116646] wiphy0: start_sw_scan: running-other-vifs: 1  running-station-vifs: 21, associated-stations: 20 scanning current channel: 2437 MHz
May  2 12:03:37 localhost kernel: [88122.149736] wiphy0: start_sw_scan: running-other-vifs: 2  running-station-vifs: 42, associated-stations: 40 scanning current channel: 2437 MHz
May  2 12:03:38 localhost kernel: [88122.218702] sta17: authenticate with 80:01:02:03:04:05
May  2 12:03:38 localhost kernel: [88122.246420] sta17: send auth to 80:01:02:03:04:05 (try 1/3)
May  2 12:03:38 localhost kernel: [88122.315220] sta17: authenticated
May  2 12:03:38 localhost kernel: [88122.329509] sta17: associate with 80:01:02:03:04:05 (try 1/3)
May  2 12:03:38 localhost kernel: [88122.350295] sta17: RX AssocResp from 80:01:02:03:04:05 (capab=0x401 status=0 aid=21)
May  2 12:03:38 localhost kernel: [88122.364322] IPv6: ADDRCONF(NETDEV_CHANGE): sta17: link becomes ready
May  2 12:03:38 localhost kernel: [88122.376190] sta17: associated
May  2 12:03:38 localhost kernel: [88122.388795] nl80211_send_connect_result, dev: sta17  status: 0
May  2 12:03:38 localhost kernel: [88122.400701] nl80211_send_connect_result, dev: sta17  sending msg...

Here is the stuck one (sta4).  What may be of interest is that the Invalid WDS/flush state shows up in
these logs.  Any idea if this could somehow cause the 'connect' logic to not be
called?

May  2 12:36:37 localhost kernel: [90101.873677] sta4: authenticated
May  2 12:36:37 localhost kernel: [90101.891233] sta4: associate with 80:01:02:03:04:05 (try 1/3)
May  2 12:36:37 localhost kernel: [90101.908620] sta4: RX AssocResp from 80:01:02:03:04:05 (capab=0x401 status=0 aid=19)
May  2 12:36:37 localhost kernel: [90101.922494] IPv6: ADDRCONF(NETDEV_CHANGE): sta4: link becomes ready
May  2 12:36:37 localhost kernel: [90101.935228] sta4: associated
May  2 12:36:42 localhost ntpd[1243]: Listen normally on 935 sta4 fe80::2ab:cdff:feef:105 UDP 123
May  2 12:37:50 localhost kernel: [90174.724370] sta4: Invalid WDS/flush state, type: 2  WDS: 5  flushed: 1
May  2 12:37:51 localhost ntpd[1243]: Deleting interface #935 sta4, fe80::2ab:cdff:feef:105#123, interface stats: received=0, sent=0, dropped=0, active_time=69 secs
May  2 12:37:55 localhost kernel: [90179.828649] IPv6: ADDRCONF(NETDEV_UP): sta4: link is not ready
May  2 12:37:55 localhost kernel: [90179.844669] wiphy0: start_sw_scan: running-other-vifs: 1  running-station-vifs: 21, associated-stations: 20 scanning current channel: 2437 MHz
May  2 12:37:55 localhost kernel: [90179.870926] wiphy0: start_sw_scan: running-other-vifs: 2  running-station-vifs: 42, associated-stations: 40 scanning current channel: 2437 MHz
May  2 12:37:55 localhost kernel: [90180.012706] IPv6: ADDRCONF(NETDEV_UP): sta4: link is not ready
May  2 12:37:55 localhost kernel: [90180.098460] sta4: authenticate with 80:01:02:03:04:05
May  2 12:37:55 localhost kernel: [90180.116506] sta4: send auth to 80:01:02:03:04:05 (try 1/3)
May  2 12:37:55 localhost kernel: [90180.144183] sta4: authenticated
May  2 12:37:55 localhost kernel: [90180.155081] sta4: associate with 80:01:02:03:04:05 (try 1/3)
May  2 12:37:55 localhost kernel: [90180.171836] sta4: RX AssocResp from 80:01:02:03:04:05 (capab=0x401 status=0 aid=19)
May  2 12:37:56 localhost kernel: [90180.192507] IPv6: ADDRCONF(NETDEV_CHANGE): sta4: link becomes ready
May  2 12:37:56 localhost kernel: [90180.196250] sta4: associated
May  2 12:38:00 localhost ntpd[1243]: Listen normally on 936 sta4 fe80::2ab:cdff:feef:105 UDP 123
May  2 12:38:38 localhost kernel: [90222.695314] sta4: Invalid WDS/flush state, type: 2  WDS: 5  flushed: 1
May  2 12:38:38 localhost kernel: [90222.728069] IPv6: ADDRCONF(NETDEV_UP): sta4: link is not ready
May  2 12:38:38 localhost kernel: [90222.746386] wiphy0: start_sw_scan: running-other-vifs: 1  running-station-vifs: 21, associated-stations: 20 scanning current channel: 2437 MHz
May  2 12:38:38 localhost kernel: [90222.772198] wiphy0: start_sw_scan: running-other-vifs: 2  running-station-vifs: 42, associated-stations: 40 scanning current channel: 2437 MHz
May  2 12:38:38 localhost kernel: [90222.862226] sta4: authenticate with 80:01:02:03:04:05
May  2 12:38:38 localhost kernel: [90222.880675] sta4: send auth to 80:01:02:03:04:05 (try 1/3)
May  2 12:38:38 localhost kernel: [90222.898643] sta4: deauthenticating from 80:01:02:03:04:05 by local choice (reason=3)
May  2 12:38:38 localhost kernel: [90222.943259] sta4: authenticate with 80:01:02:03:04:05
May  2 12:38:38 localhost kernel: [90222.956498] sta4: send auth to 80:01:02:03:04:05 (try 1/3)
May  2 12:38:38 localhost kernel: [90222.994180] sta4: authenticated
May  2 12:38:38 localhost kernel: [90223.004165] sta4: associate with 80:01:02:03:04:05 (try 1/3)
May  2 12:38:38 localhost kernel: [90223.022441] sta4: RX AssocResp from 80:01:02:03:04:05 (capab=0x401 status=0 aid=19)
May  2 12:38:38 localhost kernel: [90223.041723] IPv6: ADDRCONF(NETDEV_CHANGE): sta4: link becomes ready
May  2 12:38:38 localhost kernel: [90223.060054] sta4: associated
May  2 12:38:39 localhost ntpd[1243]: Deleting interface #936 sta4, fe80::2ab:cdff:feef:105#123, interface stats: received=0, sent=0, dropped=0, active_time=39 secs
May  2 12:38:43 localhost ntpd[1243]: Listen normally on 937 sta4 fe80::2ab:cdff:feef:105 UDP 123


Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: mac80211:  3.9.0+:  Invalid WDS/flush state and non-connecting station.
  2013-05-02 19:50 mac80211: 3.9.0+: Invalid WDS/flush state and non-connecting station Ben Greear
@ 2013-05-02 20:24 ` Johannes Berg
  2013-05-02 20:45   ` Ben Greear
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Berg @ 2013-05-02 20:24 UTC (permalink / raw)
  To: Ben Greear; +Cc: linux-wireless@vger.kernel.org

On Thu, 2013-05-02 at 12:50 -0700, Ben Greear wrote:
> Kernel is hacked 3.9.0+

Clearly :)

> I've been seeing this problem for a while (and posted about it previously).  The problem
> is that a station appears to associate fine, but never actually 'connects'.  This problem
> is not easy to reproduce...

It would be useful to know what you added ... the message you point to
(invalid wds/flush whatever) doesn't exist upstream.

johannes




* Re: mac80211:  3.9.0+:  Invalid WDS/flush state and non-connecting station.
  2013-05-02 20:24 ` Johannes Berg
@ 2013-05-02 20:45   ` Ben Greear
  2013-05-08 16:18     ` Ben Greear
  0 siblings, 1 reply; 9+ messages in thread
From: Ben Greear @ 2013-05-02 20:45 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-wireless@vger.kernel.org

On 05/02/2013 01:24 PM, Johannes Berg wrote:
> On Thu, 2013-05-02 at 12:50 -0700, Ben Greear wrote:
>> Kernel is hacked 3.9.0+
>
> Clearly :)
>
>> I've been seeing this problem for a while (and posted about it previously).  The problem
>> is that a station appears to associate fine, but never actually 'connects'.  This problem
>> is not easy to reproduce...
>
> It would be useful to know what you added ... the message you point to
> (invalid wds/flush whatever) doesn't exist upstream.

Gobs of stuff, as usual.  Thought I had that WDS thing pushed upstream,
but I guess not.

http://dmz2.candelatech.com/git/gitweb.cgi?p=linux-3.9.dev.y/.git;a=summary

That message comes from:

	/*
	 * Remove all stations associated with this interface.
	 *
	 * This must be done before calling ops->remove_interface()
	 * because otherwise we can later invoke ops->sta_notify()
	 * whenever the STAs are removed, and that invalidates driver
	 * assumptions about always getting a vif pointer that is valid
	 * (because if we remove a STA after ops->remove_interface()
	 * the driver will have removed the vif info already!)
	 *
	 * This is relevant only in WDS mode, in all other modes we've
	 * already removed all stations when disconnecting or similar,
	 * so warn otherwise.
	 *
	 * We call sta_info_flush_cleanup() later, to combine RCU waits.
	 */
	flushed = sta_info_flush_defer(sdata);
	if ((sdata->vif.type != NL80211_IFTYPE_WDS && flushed > 0) ||
	    (sdata->vif.type == NL80211_IFTYPE_WDS && flushed != 1)) {
		sdata_info(sdata,
			   "Invalid WDS/flush state, type: %i  WDS: %i  flushed: %i\n",
			   sdata->vif.type, NL80211_IFTYPE_WDS, flushed);
		WARN_ON_ONCE(1);
	}
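To decode the log line "type: 2  WDS: 5  flushed: 1", the condition above can be restated as a small userspace C sketch. It is not the kernel code: the numeric values are copied from enum nl80211_iftype (2 = station, 5 = WDS), and demo_flush_state_invalid() is a hypothetical helper. A station interface (type 2) that still flushes one entry on the way down trips the check, which suggests the AP's station entry had not already been removed by a normal disconnect.

```c
#include <assert.h>
#include <stdbool.h>

/* Copied from enum nl80211_iftype so the sketch builds in userspace. */
enum demo_iftype {
	DEMO_IFTYPE_STATION = 2,
	DEMO_IFTYPE_WDS = 5,
};

/*
 * Restatement of the debug check: a WDS interface is expected to flush
 * exactly one station entry (its peer) when it goes down; any other
 * interface type should already have removed all of its stations.
 * Returns true when the "Invalid WDS/flush state" message would print.
 */
static bool demo_flush_state_invalid(int iftype, int flushed)
{
	assert(flushed >= 0);
	if (iftype == DEMO_IFTYPE_WDS)
		return flushed != 1;
	return flushed > 0;
}
```

With the values from the logs, demo_flush_state_invalid(DEMO_IFTYPE_STATION, 1) is true, while a WDS interface flushing its single peer, demo_flush_state_invalid(DEMO_IFTYPE_WDS, 1), is false.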

I notice __cfg80211_connect_result checks the wdev state, so I added some
printouts there to see if it is bailing out due to some funny state, but it
will probably be a while before I reproduce it again and know for sure.

Thanks,
Ben




-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: mac80211:  3.9.0+:  Invalid WDS/flush state and non-connecting station.
  2013-05-02 20:45   ` Ben Greear
@ 2013-05-08 16:18     ` Ben Greear
  2013-05-08 17:58       ` Johannes Berg
  0 siblings, 1 reply; 9+ messages in thread
From: Ben Greear @ 2013-05-08 16:18 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-wireless@vger.kernel.org

On 05/02/2013 01:45 PM, Ben Greear wrote:
> On 05/02/2013 01:24 PM, Johannes Berg wrote:
>> On Thu, 2013-05-02 at 12:50 -0700, Ben Greear wrote:
>>> Kernel is hacked 3.9.0+
>>
>> Clearly :)
>>
>>> I've been seeing this problem for a while (and posted about it previously).  The problem
>>> is that a station appears to associate fine, but never actually 'connects'.  This problem
>>> is not easy to reproduce...
>>
>> It would be useful to know what you added ... the message you point to
>> (invalid wds/flush whatever) doesn't exist upstream.
>
> Gobs of stuff, as usual.  Thought I had that WDS thing pushed upstream,
> but I guess not.

Ok, I reproduced this with yet more debugging printouts in the kernel.

The symptom is this:

The sme_state is SME_CONNECTED, so it bails out below before sending the
'connected' message to user-space.

void __cfg80211_connect_result(struct net_device *dev, const u8 *bssid,
                                const u8 *req_ie, size_t req_ie_len,
                                const u8 *resp_ie, size_t resp_ie_len,
                                u16 status, bool wextev,
                                struct cfg80211_bss *bss)
{
         struct wireless_dev *wdev = dev->ieee80211_ptr;
         const u8 *country_ie;
#ifdef CONFIG_CFG80211_WEXT
         union iwreq_data wrqu;
#endif

         ASSERT_WDEV_LOCK(wdev);

         printk("connect_result: %s: type: %i  sme_state: %i\n",
                dev->name, (int)(wdev->iftype), (int)(wdev->sme_state));

         if (WARN_ON(wdev->iftype != NL80211_IFTYPE_STATION &&
                     wdev->iftype != NL80211_IFTYPE_P2P_CLIENT))
                 return;

         if (wdev->sme_state != CFG80211_SME_CONNECTING)
                 return;
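The early return can be modelled in userspace to show why a stuck CONNECTED state silently suppresses the event. This is a simplified sketch under stated assumptions: the enum mirrors the order of the 3.9-era sme_state values (which is why the debug printouts show 2 for CONNECTED), only the success path is modelled, and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same order as the 3.9-era sme_state enum: IDLE=0, CONNECTING=1, CONNECTED=2. */
enum demo_sme_state {
	DEMO_SME_IDLE,
	DEMO_SME_CONNECTING,
	DEMO_SME_CONNECTED,
};

struct demo_wdev {
	enum demo_sme_state sme_state;
	int connect_events_sent; /* stands in for the nl80211 connect message */
};

/*
 * Models only the gate and the success path of __cfg80211_connect_result():
 * a result is reported only if the SME was waiting for one.
 */
static bool demo_connect_result(struct demo_wdev *wdev)
{
	assert(wdev != NULL);
	if (wdev->sme_state != DEMO_SME_CONNECTING)
		return false; /* silently dropped, as for the stuck station */
	wdev->sme_state = DEMO_SME_CONNECTED;
	wdev->connect_events_sent++;
	return true;
}
```

A wdev starting in CONNECTING reports one event; one already in CONNECTED reports nothing, however many associations mac80211 completes.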



-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: mac80211:  3.9.0+:  Invalid WDS/flush state and non-connecting station.
  2013-05-08 16:18     ` Ben Greear
@ 2013-05-08 17:58       ` Johannes Berg
  2013-05-08 18:14         ` Ben Greear
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Berg @ 2013-05-08 17:58 UTC (permalink / raw)
  To: Ben Greear; +Cc: linux-wireless@vger.kernel.org

On Wed, 2013-05-08 at 09:18 -0700, Ben Greear wrote:

> Ok, I reproduced this with yet more debugging printouts in the kernel.
> 
> The symptom is this:
> 
> The sme_state is SME_CONNECTED, so it bails out below before sending the
> 'connected' message to user-space.

Is your system being really really really slow and/or are threads
getting pre-empted a lot? This may seem like a bit of a stretch, but
it seems possible that this happens:

ieee80211_sta_rx_queued_mgmt() is running, possibly on one CPU, and is
somewhere between printing "associated" and calling
cfg80211_send_rx_assoc() (or is already in that call, but before taking
the lock).

Then your interface is set down at the same time, possibly on a
different CPU. Here's where the scenario gets stretched: clearly your
interface is being set down over a minute later, and I don't see how you
could have stalled the other thread for that long.

But if you did, then that thread is still processing things while the
interface is going down; cfg80211 didn't know anything about the
association having completed, so it won't have disconnected, etc.

So far, I haven't found any other scenario, nor a solution.

johannes
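The hypothesis above is an ordering problem, so it can be illustrated without threads by replaying the two possible interleavings against a toy model. Everything here is an assumption for illustration: the three events, the struct fields and the demo_* names are hypothetical simplifications, not mac80211 or cfg80211 code. Replaying associated, notify, down leaves everything clean; replaying associated, down, notify produces both symptoms seen in the logs: an unexpected flush at interface-down time (the "flushed: 1" message) and cfg80211 believing it is associated afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of what each layer believes; all names are hypothetical. */
struct demo_view {
	bool mac80211_associated;  /* mac80211 has printed "associated" */
	bool cfg80211_knows_assoc; /* cfg80211 has been told about it */
	bool sta_entry_present;    /* mac80211 still holds the AP's entry */
	bool unexpected_flush;     /* analogue of "flushed: 1" on a station */
};

enum demo_event {
	EV_MAC80211_ASSOCIATED, /* thread A: AssocResp processed */
	EV_NOTIFY_CFG80211,     /* thread A: cfg80211_send_rx_assoc() completes */
	EV_IFACE_DOWN,          /* thread B: interface set down */
};

static void demo_apply(struct demo_view *v, enum demo_event ev)
{
	switch (ev) {
	case EV_MAC80211_ASSOCIATED:
		v->mac80211_associated = true;
		v->sta_entry_present = true;
		break;
	case EV_NOTIFY_CFG80211:
		v->cfg80211_knows_assoc = true;
		break;
	case EV_IFACE_DOWN:
		if (v->cfg80211_knows_assoc) {
			/* Normal path: cfg80211 disconnects, removing the AP entry. */
			v->cfg80211_knows_assoc = false;
			v->sta_entry_present = false;
		}
		/* Anything left over is flushed by the interface-down code. */
		if (v->sta_entry_present)
			v->unexpected_flush = true;
		v->sta_entry_present = false;
		v->mac80211_associated = false;
		break;
	}
}

/* Replays one interleaving of the events and returns the final view. */
static struct demo_view demo_replay(const enum demo_event *evs, size_t n)
{
	struct demo_view v = { false, false, false, false };

	assert(evs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		demo_apply(&v, evs[i]);
	return v;
}
```

As noted above, the timing in the logs (over a minute between association and interface-down) makes this particular interleaving hard to believe, so the model only shows that the symptoms are consistent with a lost notification, not that this race is the cause.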


* Re: mac80211:  3.9.0+:  Invalid WDS/flush state and non-connecting station.
  2013-05-08 17:58       ` Johannes Berg
@ 2013-05-08 18:14         ` Ben Greear
  2013-05-10 21:21           ` Ben Greear
  0 siblings, 1 reply; 9+ messages in thread
From: Ben Greear @ 2013-05-08 18:14 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-wireless@vger.kernel.org

On 05/08/2013 10:58 AM, Johannes Berg wrote:
> On Wed, 2013-05-08 at 09:18 -0700, Ben Greear wrote:
>
>> Ok, I reproduced this with yet more debugging printouts in the kernel.
>>
>> The symptom is this:
>>
>> The sme_state is SME_CONNECTED, so it bails out below before sending the
>> 'connected' message to user-space.
>
> Is your system being really really really slow and/or are threads
> getting pre-empted a lot? This may seem like a bit of a stretch, but
> it seems possible that this happens:
>
> ieee80211_sta_rx_queued_mgmt() is running, possibly on one CPU, and is
> somewhere between printing "associated" and calling
> cfg80211_send_rx_assoc() (or is already in that call, but before taking
> the lock).
>
> Then your interface is set down at the same time, possibly on a
> different CPU. Here's where the scenario gets stretched: clearly your
> interface is being set down over a minute later, and I don't see how you
> could have stalled the other thread for that long.
>
> But if you did, then that thread is still processing things while the
> interface is going down; cfg80211 didn't know anything about the
> association having completed, so it won't have disconnected, etc.
>
> So far, I haven't found any other scenario, nor a solution.

It is not that slow or overloaded (at least most of the time).
In particular, I only had 20 virtual stations up on this system,
not doing much traffic; it easily handles hundreds of stations.

And once it gets into this state, it stays there: overnight in
this case, with my app resetting the port every minute or so
(via 'ip link set down' and poking at wpa_supplicant).

I was wondering: in the cfg80211_mlme_down method (or perhaps
some similar place), should we force the SME state to IDLE
with a big WARN_ON_ONCE or similar?

That way, if it does get stuck somehow, we can recover by
downing the interface and bringing it back up?
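The proposed workaround could look roughly like the following userspace sketch. It assumes a simplified teardown in which a normal deauthentication clears current_bss and returns the state to IDLE; demo_mlme_down(), demo_warned and the struct are hypothetical stand-ins for cfg80211_mlme_down() and WARN_ON_ONCE, not the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

enum demo_sme_state { DEMO_SME_IDLE, DEMO_SME_CONNECTING, DEMO_SME_CONNECTED };

struct demo_wdev {
	enum demo_sme_state sme_state;
	const void *current_bss; /* NULL when no BSS is tracked */
};

static bool demo_warned; /* one-shot flag, emulating WARN_ON_ONCE */

/*
 * Hypothetical interface-down hook. Returns true if the workaround had to
 * force the state machine back to IDLE.
 */
static bool demo_mlme_down(struct demo_wdev *wdev)
{
	assert(wdev != NULL);
	if (wdev->current_bss != NULL) {
		/* Simplified normal path: deauth drops the BSS and goes IDLE. */
		wdev->current_bss = NULL;
		wdev->sme_state = DEMO_SME_IDLE;
	}
	if (wdev->sme_state == DEMO_SME_IDLE)
		return false;

	/* Proposed workaround: complain loudly once, then recover. */
	if (!demo_warned) {
		demo_warned = true;
		fprintf(stderr, "mlme_down: forcing sme_state %d -> IDLE\n",
			(int)wdev->sme_state);
	}
	wdev->sme_state = DEMO_SME_IDLE;
	return true;
}
```

The trade-off is that this recovers the interface on the next down/up cycle but hides how the state went wrong in the first place.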

For what it's worth, I don't recall ever seeing this problem
in 3.7, but it's way too rare to be able to bisect...

Thanks,
Ben



-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: mac80211:  3.9.0+:  Invalid WDS/flush state and non-connecting station.
  2013-05-08 18:14         ` Ben Greear
@ 2013-05-10 21:21           ` Ben Greear
  2013-05-10 21:25             ` Johannes Berg
  0 siblings, 1 reply; 9+ messages in thread
From: Ben Greear @ 2013-05-10 21:21 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-wireless@vger.kernel.org

On 05/08/2013 11:14 AM, Ben Greear wrote:
> On 05/08/2013 10:58 AM, Johannes Berg wrote:
>> On Wed, 2013-05-08 at 09:18 -0700, Ben Greear wrote:
>>
>>> Ok, I reproduced this with yet more debugging printouts in the kernel.
>>>
>>> The symptom is this:
>>>
>>> The sme_state is SME_CONNECTED, so it bails out below before sending the
>>> 'connected' message to user-space.
>>
>> Is your system being really really really slow and/or are threads
>> getting pre-empted a lot? This may seem like a bit of a stretch, but
>> it seems possible that this happens:
>>
>> ieee80211_sta_rx_queued_mgmt() is running, possibly on one CPU, and is
>> somewhere between printing "associated" and calling
>> cfg80211_send_rx_assoc() (or is already in that call, but before taking
>> the lock).
>>
>> Then your interface is set down at the same time, possibly on a
>> different CPU. Here's where the scenario gets stretched: clearly your
>> interface is being set down over a minute later, and I don't see how you
>> could have stalled the other thread for that long.
>>
>> But if you did, then that thread is still processing things while the
>> interface is going down; cfg80211 didn't know anything about the
>> association having completed, so it won't have disconnected, etc.
>>
>> So far, I haven't found any other scenario, nor a solution.
>
> It is not that slow or overloaded (at least most of the time).
> In particular, I only had 20 virtual stations up on this system,
> not doing much traffic; it easily handles hundreds of stations.
>
> And once it gets into this state, it stays there: overnight in
> this case, with my app resetting the port every minute or so
> (via 'ip link set down' and poking at wpa_supplicant).
>
> I was wondering: in the cfg80211_mlme_down method (or perhaps
> some similar place), should we force the SME state to IDLE
> with a big WARN_ON_ONCE or similar?
>
> That way, if it does get stuck somehow, we can recover by
> downing the interface and bringing it back up?
>

Here's some more debug info; I hit it again today:

I added this debug code (on top of all my other patches and 3.9.1+).

void cfg80211_mlme_down(struct cfg80211_registered_device *rdev,
			struct net_device *dev)
{
	struct wireless_dev *wdev = dev->ieee80211_ptr;
	struct cfg80211_deauth_request req;
	u8 bssid[ETH_ALEN];

	ASSERT_WDEV_LOCK(wdev);

	printk("mlme_down: %s: type: %i  sme_state: %i current-bss: %p\n",
                dev->name, (int)(wdev->iftype), (int)(wdev->sme_state),
	       wdev->current_bss);

I see this printout for the stuck station (this is dmesg | grep sta74,
so it skips errors about other interfaces that are also hung).

I am guessing we should never be calling mlme_down with state
of CFG80211_SME_CONNECTED when bss is NULL?

I'm hoping I can get by with some sort of workaround patch
for the 3.9 kernel instead of trying to patch in your big
locking changes...


sta74: authenticate with 00:de:ad:1d:ea:00
sta74: send auth to 00:de:ad:1d:ea:00 (try 1/3)
sta74: authenticated
sta74: associate with 00:de:ad:1d:ea:00 (try 1/3)
sta74: RX AssocResp from 00:de:ad:1d:ea:00 (capab=0x1 status=0 aid=67)
IPv6: ADDRCONF(NETDEV_CHANGE): sta74: link becomes ready
sta74: associated
connect_result: sta74: type: 2  sme_state: 2
__cfg80211_disconnect: sta74: type: 2  sme_state: 2  conn-state: -1
mlme_down: sta74: type: 2  sme_state: 2 current-bss:           (null)
mlme_down: sta74: type: 2  sme_state: 2 current-bss:           (null)
sta74: Invalid WDS/flush state, type: 2  WDS: 5  flushed: 1
IPv6: ADDRCONF(NETDEV_UP): sta74: link is not ready
__cfg80211_disconnect: sta74: type: 2  sme_state: 2  conn-state: -1
mlme_down: sta74: type: 2  sme_state: 2 current-bss:           (null)
mlme_down: sta74: type: 2  sme_state: 2 current-bss:           (null)
IPv6: ADDRCONF(NETDEV_UP): sta74: link is not ready
sta74: authenticate with 00:de:ad:1d:ea:00
sta74: send auth to 00:de:ad:1d:ea:00 (try 1/3)
sta74: authenticated
sta74: associate with 00:de:ad:1d:ea:00 (try 1/3)
sta74: RX AssocResp from 00:de:ad:1d:ea:00 (capab=0x1 status=0 aid=67)
sta74: associated
connect_result: sta74: type: 2  sme_state: 2
IPv6: ADDRCONF(NETDEV_CHANGE): sta74: link becomes ready
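Since the stuck interfaces are being found by grepping dmesg, a small filter can flag every line of the mlme_down debug printout above that shows CFG80211_SME_CONNECTED (2) with a NULL current_bss. This sketch assumes the exact format of that local debug printk (it is not an upstream message) and that a NULL pointer prints as "(null)", as it does in the output above; demo_line_is_stuck() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns true if a log line is the local "mlme_down:" debug printk showing
 * sme_state 2 (CONNECTED) with current-bss printed as "(null)". On a match,
 * the interface name is copied into ifname if it is non-NULL.
 */
static bool demo_line_is_stuck(const char *line, char *ifname, size_t ifname_len)
{
	char name[32];
	char bss[32];
	int type, sme_state;
	const char *p;

	assert(line != NULL);
	p = strstr(line, "mlme_down: "); /* tolerate a dmesg timestamp prefix */
	if (p == NULL)
		return false;
	if (sscanf(p, "mlme_down: %31[^:]: type: %d sme_state: %d current-bss: %31s",
		   name, &type, &sme_state, bss) != 4)
		return false;
	if (sme_state != 2 || strcmp(bss, "(null)") != 0)
		return false;
	if (ifname != NULL && ifname_len > 0)
		snprintf(ifname, ifname_len, "%s", name);
	return true;
}
```

Fed the dmesg output line by line (for example via fgets() on stdin), it would report sta74 for the mlme_down lines above and ignore the normal association messages.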




-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: mac80211:  3.9.0+:  Invalid WDS/flush state and non-connecting station.
  2013-05-10 21:21           ` Ben Greear
@ 2013-05-10 21:25             ` Johannes Berg
  2013-05-10 21:33               ` Ben Greear
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Berg @ 2013-05-10 21:25 UTC (permalink / raw)
  To: Ben Greear; +Cc: linux-wireless@vger.kernel.org

On Fri, 2013-05-10 at 14:21 -0700, Ben Greear wrote:

> void cfg80211_mlme_down(struct cfg80211_registered_device *rdev,
> 			struct net_device *dev)
> {
> 	struct wireless_dev *wdev = dev->ieee80211_ptr;
> 	struct cfg80211_deauth_request req;
> 	u8 bssid[ETH_ALEN];
> 
> 	ASSERT_WDEV_LOCK(wdev);
> 
> 	printk("mlme_down: %s: type: %i  sme_state: %i current-bss: %p\n",
>                 dev->name, (int)(wdev->iftype), (int)(wdev->sme_state),
> 	       wdev->current_bss);
> 
> I see this printout for the stuck station (this is dmesg | grep sta74,
> so it skips errors about other interfaces that are also hung).
> 
> I am guessing we should never be calling mlme_down with state
> of CFG80211_SME_CONNECTED when bss is NULL?

We should _never_ be in a state where current_bss is NULL but the state
is != IDLE. The question I can't seem to find an answer for is how we
got into that state; that we're in that state in down() is really less
interesting.

Since you seem to be able to reproduce this, maybe it'd help to mark all
state transitions and current_bss assignments, and then backtrack them
after the fact.

johannes



* Re: mac80211:  3.9.0+:  Invalid WDS/flush state and non-connecting station.
  2013-05-10 21:25             ` Johannes Berg
@ 2013-05-10 21:33               ` Ben Greear
  0 siblings, 0 replies; 9+ messages in thread
From: Ben Greear @ 2013-05-10 21:33 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-wireless@vger.kernel.org

On 05/10/2013 02:25 PM, Johannes Berg wrote:
> On Fri, 2013-05-10 at 14:21 -0700, Ben Greear wrote:
>
>> void cfg80211_mlme_down(struct cfg80211_registered_device *rdev,
>> 			struct net_device *dev)
>> {
>> 	struct wireless_dev *wdev = dev->ieee80211_ptr;
>> 	struct cfg80211_deauth_request req;
>> 	u8 bssid[ETH_ALEN];
>>
>> 	ASSERT_WDEV_LOCK(wdev);
>>
>> 	printk("mlme_down: %s: type: %i  sme_state: %i current-bss: %p\n",
>>                  dev->name, (int)(wdev->iftype), (int)(wdev->sme_state),
>> 	       wdev->current_bss);
>>
>> I see this printout for the stuck station (this is dmesg | grep sta74,
>> so it skips errors about other interfaces that are also hung).
>>
>> I am guessing we should never be calling mlme_down with state
>> of CFG80211_SME_CONNECTED when bss is NULL?
>
> We should _never_ be in a state where current_bss is NULL but the state
> is != IDLE. The question I can't seem to find an answer for is how we
> got into that state; that we're in that state in down() is really less
> interesting.
>
> Since you seem to be able to reproduce this, maybe it'd help to mark all
> state transitions and current_bss assignments, and then backtrack them
> after the fact.

I'll work on instrumenting all of those assignments. I plan to use
a helper macro to assign them and print out the call sites.
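One way to write such a helper macro, sketched here in userspace C: each assignment logs the interface, the old and new value, and the calling function and line, so the last transition before a stuck state can be found afterwards. The macro and struct names are hypothetical; in the kernel the fprintf() would become a printk(), while __func__ and __LINE__ are standard C. The macro arguments are evaluated more than once, so they should be simple expressions without side effects.

```c
#include <assert.h>
#include <stdio.h>

enum demo_sme_state { DEMO_SME_IDLE, DEMO_SME_CONNECTING, DEMO_SME_CONNECTED };

struct demo_wdev {
	const char *name;
	enum demo_sme_state sme_state;
	const void *current_bss;
};

/* Assign sme_state and log old -> new value plus the call site. */
#define DEMO_SET_SME_STATE(wdev, new_state) \
	do { \
		fprintf(stderr, "%s: sme_state %d -> %d at %s:%d\n", \
			(wdev)->name, (int)(wdev)->sme_state, \
			(int)(new_state), __func__, __LINE__); \
		(wdev)->sme_state = (new_state); \
	} while (0)

/* Same for current_bss; pointers are logged with %p. */
#define DEMO_SET_CURRENT_BSS(wdev, bss) \
	do { \
		fprintf(stderr, "%s: current_bss %p -> %p at %s:%d\n", \
			(wdev)->name, (void *)(wdev)->current_bss, \
			(void *)(bss), __func__, __LINE__); \
		(wdev)->current_bss = (bss); \
	} while (0)

/* Example call site: the log will name demo_disconnect as the caller. */
static void demo_disconnect(struct demo_wdev *wdev)
{
	assert(wdev != NULL);
	DEMO_SET_CURRENT_BSS(wdev, NULL);
	DEMO_SET_SME_STATE(wdev, DEMO_SME_IDLE);
}
```

Grepping the resulting log for one interface then gives its full history of transitions, which is the after-the-fact backtracking suggested above.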

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



end of thread

Thread overview: 9+ messages
2013-05-02 19:50 mac80211: 3.9.0+: Invalid WDS/flush state and non-connecting station Ben Greear
2013-05-02 20:24 ` Johannes Berg
2013-05-02 20:45   ` Ben Greear
2013-05-08 16:18     ` Ben Greear
2013-05-08 17:58       ` Johannes Berg
2013-05-08 18:14         ` Ben Greear
2013-05-10 21:21           ` Ben Greear
2013-05-10 21:25             ` Johannes Berg
2013-05-10 21:33               ` Ben Greear
