From: Ben Greear <greearb@candelatech.com>
To: Johannes Berg <johannes@sipsolutions.net>
Cc: "linux-wireless@vger.kernel.org" <linux-wireless@vger.kernel.org>
Subject: Re: mac80211: 3.9.0+: Invalid WDS/flush state and non-connecting station.
Date: Fri, 10 May 2013 14:21:42 -0700
Message-ID: <518D64E6.8000102@candelatech.com>
In-Reply-To: <518A9618.1020107@candelatech.com>

On 05/08/2013 11:14 AM, Ben Greear wrote:
> On 05/08/2013 10:58 AM, Johannes Berg wrote:
>> On Wed, 2013-05-08 at 09:18 -0700, Ben Greear wrote:
>>
>>> Ok, I reproduced this with yet more debugging printouts in the kernel.
>>>
>>> The symptom is this:
>>>
>>> The sme_state is SME_CONNECTED, so it bails out below before sending the
>>> 'connected' message to user-space.
>>
>> Is your system being really really really slow and/or are threads
>> getting pre-empted a lot? This may seem like a bit of a stretch, but
>> it seems possible that this happens:
>>
>> ieee80211_sta_rx_queued_mgmt() is running, possibly on one CPU, and is
>> somewhere between printing "associated" and calling
>> cfg80211_send_rx_assoc() (or in the call already, before taking the lock
>> though.)
>>
>> Then your interface is set down at the same time, possibly on a
>> different CPU. Here's where the scenario gets stretched, clearly your
>> interface is getting set down over a minute later, I don't see how you
>> could have stalled the other thread for that long.
>>
>> But if you did, then that thread is still processing things while the
>> interface is going down, cfg80211 didn't know anything about the
>> association having completed so it won't have disconnected, etc.
>>
>> So far, I haven't found any other scenario, nor a solution.
>
> It is not that slow or overloaded (at least most of the time,
> and in particular, I only had 20 virtual stations up on this system
> not doing much traffic...it easily handles 100's of stations).
>
> And, once it gets in this state..it stays there (overnight,
> with my app resetting the port (via 'ip link set down' and
> poking at wpa_supplicant) every minute or so in this case).
>
> I was wondering..in the cfg80211_mlme_down method (or perhaps
> some place similar), should we force sme state to IDLE
> with a big WARN_ON_ONCE or similar?
>
> That way, if it does get stuck somehow, we can recover by
> downing the interface and bringing it back up?
>
Here's some more debug info..hit it again today:

I added this debug code (on top of all my other patches and 3.9.1+).

void cfg80211_mlme_down(struct cfg80211_registered_device *rdev,
			struct net_device *dev)
{
	struct wireless_dev *wdev = dev->ieee80211_ptr;
	struct cfg80211_deauth_request req;
	u8 bssid[ETH_ALEN];

	ASSERT_WDEV_LOCK(wdev);

	printk("mlme_down: %s: type: %i sme_state: %i current-bss: %p\n",
	       dev->name, (int)(wdev->iftype), (int)(wdev->sme_state),
	       wdev->current_bss);

I see this printout for the stuck station (this is dmesg | grep sta74,
so it skips errors about other interfaces that are also hung).

I am guessing we should never be calling mlme_down with state
of CFG80211_SME_CONNECTED when bss is NULL?

I'm hoping I can get by with some sort of work-around patch
for the 3.9 kernel instead of trying to patch in your big
locking changes....

sta74: authenticate with 00:de:ad:1d:ea:00
sta74: send auth to 00:de:ad:1d:ea:00 (try 1/3)
sta74: authenticated
sta74: associate with 00:de:ad:1d:ea:00 (try 1/3)
sta74: RX AssocResp from 00:de:ad:1d:ea:00 (capab=0x1 status=0 aid=67)
IPv6: ADDRCONF(NETDEV_CHANGE): sta74: link becomes ready
sta74: associated
connect_result: sta74: type: 2 sme_state: 2
__cfg80211_disconnect: sta74: type: 2 sme_state: 2 conn-state: -1
mlme_down: sta74: type: 2 sme_state: 2 current-bss: (null)
mlme_down: sta74: type: 2 sme_state: 2 current-bss: (null)
sta74: Invalid WDS/flush state, type: 2 WDS: 5 flushed: 1
IPv6: ADDRCONF(NETDEV_UP): sta74: link is not ready
__cfg80211_disconnect: sta74: type: 2 sme_state: 2 conn-state: -1
mlme_down: sta74: type: 2 sme_state: 2 current-bss: (null)
mlme_down: sta74: type: 2 sme_state: 2 current-bss: (null)
IPv6: ADDRCONF(NETDEV_UP): sta74: link is not ready
sta74: authenticate with 00:de:ad:1d:ea:00
sta74: send auth to 00:de:ad:1d:ea:00 (try 1/3)
sta74: authenticated
sta74: associate with 00:de:ad:1d:ea:00 (try 1/3)
sta74: RX AssocResp from 00:de:ad:1d:ea:00 (capab=0x1 status=0 aid=67)
sta74: associated
connect_result: sta74: type: 2 sme_state: 2
IPv6: ADDRCONF(NETDEV_CHANGE): sta74: link becomes ready
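
To make the suspected invariant concrete (sme_state 2, i.e. connected,
should imply a non-NULL current_bss), here is a rough userspace model of
the kind of check a work-around might make. It is only a sketch, not a
kernel patch: struct wdev_model, the SME_* names and mlme_down_sanitize()
are hypothetical stand-ins invented for illustration, and a real change
would operate on struct wireless_dev and would probably use WARN_ON_ONCE
rather than fprintf.

```c
#include <stddef.h>
#include <stdio.h>

/* Numeric values follow the log above, where sme_state 2 is "connected". */
enum sme_state_model { SME_IDLE = 0, SME_CONNECTING = 1, SME_CONNECTED = 2 };

/* Hypothetical stand-in for the two wireless_dev fields printed in mlme_down. */
struct wdev_model {
	enum sme_state_model sme_state;
	const void *current_bss;
};

/*
 * If the device claims to be connected but has no current BSS, complain
 * and force the state back to idle so a later connect attempt is not
 * rejected. Returns 1 if the stale state was detected and reset, else 0.
 */
int mlme_down_sanitize(struct wdev_model *wdev, const char *name)
{
	if (wdev->sme_state == SME_CONNECTED && wdev->current_bss == NULL) {
		fprintf(stderr, "%s: connected with NULL current-bss, forcing idle\n",
			name);
		wdev->sme_state = SME_IDLE;
		return 1;
	}
	return 0;
}
```

Called on a model of the stuck sta74 (sme_state 2, current-bss NULL), this
resets the state to idle; a device that is connected with a valid BSS is
left untouched. Whether forcing the state like this is safe with respect
to the locking in cfg80211 is exactly the open question in this thread.
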
> For what it's worth, I don't recall ever seeing this problem
> in 3.7, but it's way too rare to be able to bisect...
>
> Thanks,
> Ben
>
>>
>> johannes
>>
>
>
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com