Re: mac80211: 3.9.0+: Invalid WDS/flush state and non-connecting station.

From: Ben Greear <greearb@candelatech.com>
To: Johannes Berg <johannes@sipsolutions.net>
Cc: "linux-wireless@vger.kernel.org" <linux-wireless@vger.kernel.org>
Subject: Re: mac80211:  3.9.0+:  Invalid WDS/flush state and non-connecting station.
Date: Fri, 10 May 2013 14:21:42 -0700
Message-ID: <518D64E6.8000102@candelatech.com>
In-Reply-To: <518A9618.1020107@candelatech.com>

On 05/08/2013 11:14 AM, Ben Greear wrote:
> On 05/08/2013 10:58 AM, Johannes Berg wrote:
>> On Wed, 2013-05-08 at 09:18 -0700, Ben Greear wrote:
>>
>>> Ok, I reproduced this with yet more debugging printouts in the kernel.
>>>
>>> The symptom is this:
>>>
>>> The sme_state is SME_CONNECTED, so it bails out below before sending the
>>> 'connected' message to user-space.
>>
>> Is your system being really really really slow and/or are threads
>> getting pre-empted a lot? This may seem like a bit of a stretch, but
>> it seems possible that this happens:
>>
>> ieee80211_sta_rx_queued_mgmt() is running, possibly on one CPU, and is
>> somewhere between printing "associated" and calling
>> cfg80211_send_rx_assoc() (or in the call already, before taking the lock
>> though.)
>>
>> Then your interface is set down at the same time, possibly on a
>> different CPU. Here's where the scenario gets stretched, clearly your
>> interface is getting set down over a minute later, I don't see how you
>> could have stalled the other thread for that long.
>>
>> But if you did, then that thread is still processing things while the
>> interface is going down, cfg80211 didn't know anything about the
>> association having completed so it won't have disconnected, etc.
>>
>> So far, I haven't found any other scenario, nor a solution.
>
> It is not that slow or overloaded (at least most of the time,
> and in particular, I only had 20 virtual stations up on this system
> not doing much traffic...it easily handles 100's of stations).
>
> And, once it gets in this state..it stays there (overnight,
> with my app resetting the port (via 'ip link set down' and
> poking at wpa_supplicant) every minute or so in this case).
>
> I was wondering..in the cfg80211_mlme_down method (or perhaps
> some place similar), should we force sme state to IDLE
> with a big WARN_ON_ONCE or similar?
>
> That way, if it does get stuck somehow, we can recover by
> downing the interface and bringing it back up?
>

Here's some more debug info..hit it again today:

I added this debug code (on top of all my other patches and 3.9.1+).

void cfg80211_mlme_down(struct cfg80211_registered_device *rdev,
			struct net_device *dev)
{
	struct wireless_dev *wdev = dev->ieee80211_ptr;
	struct cfg80211_deauth_request req;
	u8 bssid[ETH_ALEN];

	ASSERT_WDEV_LOCK(wdev);

	printk("mlme_down: %s: type: %i  sme_state: %i current-bss: %p\n",
	       dev->name, (int)(wdev->iftype), (int)(wdev->sme_state),
	       wdev->current_bss);

I see this printout for the stuck station (this is dmesg | grep sta74,
so it skips errors about other interfaces that are also hung).

I am guessing we should never be calling mlme_down with sme_state
of CFG80211_SME_CONNECTED when current_bss is NULL?

I'm hoping I can get by with some sort of work-around patch
for the 3.9 kernel instead of trying to patch in your big
locking changes....
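
To make the work-around idea concrete, here is a rough userspace sketch,
not kernel code and not a tested patch. It only models the check being
discussed: if the state is CONNECTED but there is no current BSS, warn and
force the state back to IDLE. The struct fake_wdev type and the
recover_stale_sme_state() helper are made up for illustration; only the
state ordering (CONNECTED printing as 2) is taken from the log below.

```c
#include <stdio.h>
#include <stddef.h>

/* Same ordering as the cfg80211 SME states, so CONNECTED == 2 as in the logs. */
enum sme_state { SME_IDLE, SME_CONNECTING, SME_CONNECTED };

/* Hypothetical stand-in for the two wireless_dev fields printed by the debug code. */
struct fake_wdev {
	enum sme_state sme_state;
	const void *current_bss;
};

/*
 * Hypothetical helper: detect the inconsistent "CONNECTED but no BSS" state
 * and reset it to IDLE. Returns 1 if a reset was needed, 0 otherwise.
 */
static int recover_stale_sme_state(struct fake_wdev *wdev, const char *name)
{
	if (wdev->sme_state == SME_CONNECTED && wdev->current_bss == NULL) {
		/* In the kernel this is where a WARN_ON_ONCE-style warning would go. */
		fprintf(stderr, "WARNING: %s: CONNECTED with no current_bss, forcing IDLE\n",
			name);
		wdev->sme_state = SME_IDLE;
		return 1;
	}
	return 0;
}
```

In a real patch the equivalent check would presumably sit in
cfg80211_mlme_down() under the wdev lock, so that downing the interface
and bringing it back up is enough to recover.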


sta74: authenticate with 00:de:ad:1d:ea:00
sta74: send auth to 00:de:ad:1d:ea:00 (try 1/3)
sta74: authenticated
sta74: associate with 00:de:ad:1d:ea:00 (try 1/3)
sta74: RX AssocResp from 00:de:ad:1d:ea:00 (capab=0x1 status=0 aid=67)
IPv6: ADDRCONF(NETDEV_CHANGE): sta74: link becomes ready
sta74: associated
connect_result: sta74: type: 2  sme_state: 2
__cfg80211_disconnect: sta74: type: 2  sme_state: 2  conn-state: -1
mlme_down: sta74: type: 2  sme_state: 2 current-bss:           (null)
mlme_down: sta74: type: 2  sme_state: 2 current-bss:           (null)
sta74: Invalid WDS/flush state, type: 2  WDS: 5  flushed: 1
IPv6: ADDRCONF(NETDEV_UP): sta74: link is not ready
__cfg80211_disconnect: sta74: type: 2  sme_state: 2  conn-state: -1
mlme_down: sta74: type: 2  sme_state: 2 current-bss:           (null)
mlme_down: sta74: type: 2  sme_state: 2 current-bss:           (null)
IPv6: ADDRCONF(NETDEV_UP): sta74: link is not ready
sta74: authenticate with 00:de:ad:1d:ea:00
sta74: send auth to 00:de:ad:1d:ea:00 (try 1/3)
sta74: authenticated
sta74: associate with 00:de:ad:1d:ea:00 (try 1/3)
sta74: RX AssocResp from 00:de:ad:1d:ea:00 (capab=0x1 status=0 aid=67)
sta74: associated
connect_result: sta74: type: 2  sme_state: 2
IPv6: ADDRCONF(NETDEV_CHANGE): sta74: link becomes ready
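
For picking the stuck interfaces out of a long dmesg capture, a small
userspace helper along these lines might be handy. It is only a sketch:
it assumes the exact mlme_down printk format from the debug code above,
with no timestamp prefix on the line, and the function name
line_shows_stuck_state() is made up here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if the line is an mlme_down debug line reporting sme_state 2
 * (CONNECTED) together with a (null) current-bss, 0 otherwise.
 */
static int line_shows_stuck_state(const char *line)
{
	char ifname[16];
	char bss[32];
	int iftype, sme_state;

	/* Whitespace in the format matches any run of spaces in the log line. */
	if (sscanf(line, "mlme_down: %15[^:]: type: %d sme_state: %d current-bss: %31s",
		   ifname, &iftype, &sme_state, bss) != 4)
		return 0;

	return sme_state == 2 && strcmp(bss, "(null)") == 0;
}
```

Feeding each line of the grep output through this would flag the four
mlme_down lines above and skip everything else.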


> For what it's worth, I don't recall ever seeing this problem
> in 3.7, but it's way too rare to be able to bisect...
>
> Thanks,
> Ben
>
>>
>> johannes
>>
>
>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
