* Linux Client vs. CISCO AP with band select
From: Wolfgang Breyha @ 2010-11-19 21:22 UTC
  To: linux-wireless@vger.kernel.org

Hi!

I'm working at the IT department at the University of Vienna. We've a
large installation of CISCO APs providing WLAN access to students and
employees. All of these APs provide both 2.4GHz and 5GHz channels. CISCO
provides two features called "load balancing" and "band select".

At least "band select" causes lots of troubles using a Linux client. It
needs a big portion of luck to successfully connect.

I'm using my HP Elitebook 2540p with Intel 6200 abgn
pci id: 8086:4239 (rev 35)

Starting with Fedora 13, now Fedora 14 I tried to get into all the
wireless stuff. Currently I'm running compat-wireless-20101115 and
wpa_supplicant 0.7.3. Additionally I patched NetworkManager to use a
timeout of 180 seconds instead of the default 25 and "-D nl80211" as
driver for wpa_supplicant. Firmware used is iwlwifi-6000-4.ucode.

AFAIK "band select" tries to "convince" a client to prefer 5GHz channels
by not answering to 2.4GHz probes at least two times (configurable with
2 as default) the same client asks. But the AP appears in scans since
beacons are received as usual.

In my case I see 10 BSSIDs for this SSID. 2 strong 2.4GHz APs and the
first 5GHz AP appears on third position reception wise. wpa_supplicant
starts authentication at the strongest. Then I see a probe request for
the SSID in wireshark, but no response from the selected BSSID. No
authentication packet is seen from wireshark.

Authentication times out. And then the worst case scenario takes
place... wpa_supplicant retries and retries the same AP with timeouts
and scans in between. Sometimes even 180 seconds is not enough to try
another AP.

I can provide sample wpa_supplicant.log and wireshark traces if of interest.

I just built wireless-compat 20101119 with DEBUG_VERBOSE and can get
details if needed.

Last but not least I tried with Windows. Windows is able to connect
even to the 2.4GHz channels. I've monitored the channel with my linux
machine while windows connected to the 2.4GHz AP. All I see are
unanswered probes also, but Windows seems to simply send an
authentication request afterwards and gets an answer then.

I can't figure out how CISCO hopes that a client behaves to cooperate
well with this feature.

I'm sorry that I'm not very proficient with all that wireless stuff yet,
but I'll try to improve and help as well as possible if that's appreciated.

With kind regards,
Wolfgang Breyha
University of Vienna

-- 
Wolfgang Breyha <wbreyha@gmx.net> | http://www.blafasel.at/
Vienna University Computer Center | Austria
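
The "band select" behavior as described in the "AFAIK" paragraph above can be modeled roughly as follows. This is only a toy sketch in C of that description, not Cisco's implementation: `toy_ap`, `toy_ap_answers_probe()` and the fixed-size client table are hypothetical names made up for illustration, and the real AP may well behave differently.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_CLIENTS 8
#define TOY_MAC_LEN 6

/* Per-client count of 2.4GHz probe requests seen so far (toy model). */
struct toy_probe_counter {
	unsigned char mac[TOY_MAC_LEN];
	int probes_seen;
	bool used;
};

struct toy_ap {
	/* Number of probes to ignore per client; 2 per the description above. */
	int suppress_threshold;
	struct toy_probe_counter clients[TOY_MAX_CLIENTS];
};

/* Returns true if the toy AP would answer this 2.4GHz probe request. */
bool toy_ap_answers_probe(struct toy_ap *ap, const unsigned char *mac)
{
	struct toy_probe_counter *slot = NULL;

	for (int i = 0; i < TOY_MAX_CLIENTS; i++) {
		struct toy_probe_counter *c = &ap->clients[i];

		if (c->used && memcmp(c->mac, mac, TOY_MAC_LEN) == 0) {
			slot = c;	/* known client */
			break;
		}
		if (!c->used && slot == NULL)
			slot = c;	/* remember first free slot */
	}

	if (slot == NULL)
		return true;	/* table full: the toy model simply answers */

	if (!slot->used) {
		memcpy(slot->mac, mac, TOY_MAC_LEN);
		slot->probes_seen = 0;
		slot->used = true;
	}

	slot->probes_seen++;
	return slot->probes_seen > ap->suppress_threshold;
}
```

With `suppress_threshold` set to 2, the first two probes from a given client go unanswered and the third is answered. Beacons are outside this model, which matches the observation that the AP still shows up in scans.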
* Re: Linux Client vs. CISCO AP with band select
From: Dan Williams @ 2010-11-19 21:45 UTC
  To: Wolfgang Breyha; +Cc: linux-wireless@vger.kernel.org

On Fri, 2010-11-19 at 22:22 +0100, Wolfgang Breyha wrote:
> Hi!
>
> I'm working at the IT department at the University of Vienna. We've a
> large installation of CISCO APs providing WLAN access to students and
> employees. All of these APs provide both 2.4GHz and 5GHz channels. CISCO
> provides two features called "load balancing" and "band select".
>
> At least "band select" causes lots of troubles using a Linux client. It
> needs a big portion of luck to successfully connect.
>
> I'm using my HP Elitebook 2540p with Intel 6200 abgn
> pci id: 8086:4239 (rev 35)
>
> Starting with Fedora 13, now Fedora 14 I tried to get into all the
> wireless stuff. Currently I'm running compat-wireless-20101115 and
> wpa_supplicant 0.7.3. Additionally I patched NetworkManager to use a
> timeout of 180 seconds instead of the default 25 and "-D nl80211" as

Eww, 180 seconds indicates something is clearly wrong with the network
setup or the driver. Based on your description below, we do need to
figure out something in the supplicant or driver to better handle this
behavior. We will be adding settings to NM to lock/prefer specific bands
too.

(NM 0.9 will default to nl80211 supplicant driver)

> driver for wpa_supplicant. Firmware used is iwlwifi-6000-4.ucode.
>
> AFAIK "band select" tries to "convince" a client to prefer 5GHz channels
> by not answering to 2.4GHz probes at least two times (configurable with
> 2 as default) the same client asks. But the AP appears in scans since
> beacons are received as usual.
>
> In my case I see 10 BSSIDs for this SSID. 2 strong 2.4GHz APs and the
> first 5GHz AP appears on third position reception wise. wpa_supplicant
> starts authentication at the strongest. Then I see a probe request for
> the SSID in wireshark, but no response from the selected BSSID. No
> authentication packet is seen from wireshark.
>
> Authentication times out. And then the worst case scenario takes
> place... wpa_supplicant retries and retries the same AP with timeouts
> and scans in between. Sometimes even 180 seconds is not enough to try
> another AP.

Yeah, there's gotta be some better way to handle this. The cisco
behavior seems like a huge hack that tries to work around Windows
specific 802.11 stack behavior, but unfortunately we've got to handle it
as well. Not sure how that should happen though.

Dan

> I can provide sample wpa_supplicant.log and wireshark traces if of interest.
>
> I just built wireless-compat 20101119 with DEBUG_VERBOSE and can get
> details if needed.
>
> Last but not least I tried with Windows. Windows is able to connect
> even to the 2.4GHz channels. I've monitored the channel with my linux
> machine while windows connected to the 2.4GHz AP. All I see are
> unanswered probes also, but Windows seems to simply send an
> authentication request afterwards and gets an answer then.
>
> I can't figure out how CISCO hopes that a client behaves to cooperate
> well with this feature.
>
> I'm sorry that I'm not very proficient with all that wireless stuff yet,
> but I'll try to improve and help as well as possible if that's appreciated.
>
> With kind regards,
> Wolfgang Breyha
> University of Vienna
* Re: Linux Client vs. CISCO AP with band select
From: Jouni Malinen @ 2010-11-20 11:27 UTC
  To: Wolfgang Breyha; +Cc: linux-wireless@vger.kernel.org

On Fri, Nov 19, 2010 at 10:22:32PM +0100, Wolfgang Breyha wrote:
> AFAIK "band select" tries to "convince" a client to prefer 5GHz channels
> by not answering to 2.4GHz probes at least two times (configurable with
> 2 as default) the same client asks. But the AP appears in scans since
> beacons are received as usual.

Huh.. This makes the AP completely non-compliant with IEEE Std
802.11-2007 and such madness should not really be encouraged in any way
or form. Please just disable it and request Cisco to provide a sane
solution that allows the stations to opt-in to whatever games the AP
want to play and not some non-standard hacks.

If the AP wants to suggest the station to move to another band, there
better be documented, publicly available specification describing a
clear message that stations can use as a clear input to BSS selection.
Arbitrarily breaking required standard functionality is not such a
mechanism.

> In my case I see 10 BSSIDs for this SSID. 2 strong 2.4GHz APs and the
> first 5GHz AP appears on third position reception wise. wpa_supplicant
> starts authentication at the strongest. Then I see a probe request for
> the SSID in wireshark, but no response from the selected BSSID. No
> authentication packet is seen from wireshark.

OK, this is all expected thanks to the silly AP design.

> Authentication times out. And then the worst case scenario takes
> place... wpa_supplicant retries and retries the same AP with timeouts
> and scans in between. Sometimes even 180 seconds is not enough to try
> another AP.

This is not.. wpa_supplicant should use blacklist to block the BSSID
temporarily and try to find another BSS at that point.

> I can provide sample wpa_supplicant.log and wireshark traces if of interest.

Could you please send me those? Ideally, I would like to see
wpa_supplicant debug log with -ddt on the command line (i.e., timestamps
and verbose debugging) with -Dnl80211 and preferably, without using
NetworkManager to control it to avoid any extra timeouts etc. making the
log more confusing to interpret.

> Last but not least I tried with Windows. Windows is able to connect
> even to the 2.4GHz channels. I've monitored the channel with my linux
> machine while windows connected to the 2.4GHz AP. All I see are
> unanswered probes also, but Windows seems to simply send an
> authentication request afterwards and gets an answer then.

This is all driver/802.11 specific and it is not really same for all
Windows drivers or all Linux drivers. The AP is behaving incorrectly and
the station behavior in such a case is undefined..

> I can't figure out how CISCO hopes that a client behaves to cooperate
> well with this feature.

Neither can I.. Unfortunately, some enterprise AP vendors seem to be
coming up with load balancing designs that are based on some proprietary
hacks and hope that all stations behave in a specific way. There is no
sane way of implementing these things properly without depending on some
common standard that both the APs and stations can use to exchange
information about preferred BSS candidates. IEEE 802.11 was designed to
keep the station, not the AP, in control of roaming; that cannot be
changed unilaterally at the AP without breaking things badly.

-- 
Jouni Malinen                                            PGP id EFC895FA
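
The recovery Jouni describes (temporarily block the failing BSSID, then pick another BSS) can be sketched as a selection loop that skips blacklisted entries and takes the strongest remaining candidate. This is a simplified illustration, not wpa_supplicant's code: `toy_bss`, `toy_blacklist_entry` and `toy_select_bss()` are hypothetical names, and the real selection logic considers far more than signal strength.

```c
#include <stddef.h>
#include <string.h>

#define TOY_ETH_ALEN 6

struct toy_bss {
	unsigned char bssid[TOY_ETH_ALEN];
	int signal_dbm;		/* closer to 0 means stronger */
};

struct toy_blacklist_entry {
	unsigned char bssid[TOY_ETH_ALEN];
};

/* Return nonzero if bssid is present in the blacklist. */
static int toy_is_blacklisted(const struct toy_blacklist_entry *bl,
			      size_t n_bl, const unsigned char *bssid)
{
	for (size_t i = 0; i < n_bl; i++) {
		if (memcmp(bl[i].bssid, bssid, TOY_ETH_ALEN) == 0)
			return 1;
	}
	return 0;
}

/*
 * Pick the strongest BSS from the scan results that is not blacklisted.
 * Returns NULL if every candidate is blacklisted or the list is empty.
 */
const struct toy_bss *toy_select_bss(const struct toy_bss *scan, size_t n_scan,
				     const struct toy_blacklist_entry *bl,
				     size_t n_bl)
{
	const struct toy_bss *best = NULL;

	for (size_t i = 0; i < n_scan; i++) {
		if (toy_is_blacklisted(bl, n_bl, scan[i].bssid))
			continue;
		if (best == NULL || scan[i].signal_dbm > best->signal_dbm)
			best = &scan[i];
	}
	return best;
}
```

In the scenario from the original report, blacklisting the two strong 2.4GHz BSSIDs after their authentication timeouts would make the next selection fall through to the third-strongest entry, the first 5GHz AP.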
* Re: Linux Client vs. CISCO AP with band select
From: Helmut Schaa @ 2010-11-20 12:04 UTC
  To: Wolfgang Breyha; +Cc: Jouni Malinen, linux-wireless@vger.kernel.org

On Saturday, 20 November 2010, Jouni Malinen wrote:
> On Fri, Nov 19, 2010 at 10:22:32PM +0100, Wolfgang Breyha wrote:
> > In my case I see 10 BSSIDs for this SSID. 2 strong 2.4GHz APs and the
> > first 5GHz AP appears on third position reception wise. wpa_supplicant
> > starts authentication at the strongest. Then I see a probe request for
> > the SSID in wireshark, but no response from the selected BSSID. No
> > authentication packet is seen from wireshark.
>
> OK, this is all expected thanks to the silly AP design.

I'm wondering if the Cisco APs would reply to direct probe requests (with the
bssid being set instead of the broadcast address). At least we're sending
broadcast probes before authentication (in case we did not receive a probe
response from this AP yet during a previous scan):

	/*
	 * Direct probe is sent to broadcast address as some APs
	 * will not answer to direct packet in unassociated state.
	 */
	ieee80211_send_probe_req(sdata, NULL, wk->probe_auth.ssid,
				 wk->probe_auth.ssid_len, NULL, 0);

I guess this was introduced to work around another strange AP behavior.

If the Cisco APs would reply to direct probes we could (as a workaround) just
send an additional direct probe here. I agree with Jouni that the AP behavior
is just stupid but the users will blame Linux for not being able to connect
and not the AP vendor.

Wolfgang, could you please try the (untested) patch below if it makes any
difference?

Helmut

---
diff --git a/net/mac80211/work.c b/net/mac80211/work.c
index ae344d1..57ae8d5 100644
--- a/net/mac80211/work.c
+++ b/net/mac80211/work.c
@@ -467,6 +467,9 @@ ieee80211_direct_probe(struct ieee80211_work *wk)
 	 */
 	ieee80211_send_probe_req(sdata, NULL, wk->probe_auth.ssid,
 				 wk->probe_auth.ssid_len, NULL, 0);
+	ieee80211_send_probe_req(sdata, wk->filter_ta, wk->probe_auth.ssid,
+				 wk->probe_auth.ssid_len, NULL, 0);
+
 
 	wk->timeout = jiffies + IEEE80211_AUTH_TIMEOUT;
 	run_again(local, wk->timeout);
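
The two probe requests in the patch differ only in the destination argument: the existing call passes NULL, which per the code comment ends up at the broadcast address, while the added call passes `wk->filter_ta`, intended here as the BSSID of the selected AP. A minimal sketch of that distinction, assuming a NULL destination maps to ff:ff:ff:ff:ff:ff; `toy_probe_dest()` is a hypothetical helper for illustration and not part of mac80211.

```c
#include <string.h>

#define TOY_ETH_ALEN 6

/*
 * Fill `da` with the destination address for a probe request:
 * the given unicast address if `dst` is non-NULL (directed probe),
 * otherwise the broadcast address (broadcast probe).
 */
void toy_probe_dest(const unsigned char *dst, unsigned char da[TOY_ETH_ALEN])
{
	if (dst != NULL)
		memcpy(da, dst, TOY_ETH_ALEN);
	else
		memset(da, 0xff, TOY_ETH_ALEN);
}
```

Sending both variants back to back, as the patch does, tests whether the AP treats a directed probe differently from a broadcast one without giving up the existing workaround for APs that ignore directed probes.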
* Re: Linux Client vs. CISCO AP with band select
From: Wolfgang Breyha @ 2010-11-20 16:49 UTC
  To: Helmut Schaa; +Cc: Jouni Malinen, linux-wireless@vger.kernel.org

On 2010-11-20 13:04, Helmut Schaa wrote:
> If the Cisco APs would reply to direct probes we could (as a workaround) just
> send an additional direct probe here. I agree with Jouni that the AP behavior
> is just stupid but the users will blame Linux for not being able to connect
> and not the AP vendor.

Suddenly the term "<whatever working protocol> fixup" comes to mind reading
your and Jouni's answers;-)

I agree with you, too. But I'm not the one administrating the APs here at
the university. Maybe I can convince my colleague, but as long as you want
to find a solution for Linux as long I'll try to give you remote hands;-)

We tried to deactivate "band select" already and my laptop was able to
connect instantly to the nearest 2.4GHz AP. But that's the point where the
second "feature" kicks in. "load balancing" is then used by the APs to kick
stations trying to push them to another AP. At this point Linux clients
have troubles again to get a stable connection.

> Wolfgang, could you please try the (untested) patch below if it makes any
> difference?

Sure, I'll try as soon as I'm back at the office on Monday. And I'll try to
get the logs and packet traces for Jouni, too.

Greetings, Wolfgang

-- 
Wolfgang Breyha <wbreyha@gmx.net> | http://www.blafasel.at/
Vienna University Computer Center | Austria
* Re: Linux Client vs. CISCO AP with band select
From: Jouni Malinen @ 2010-11-20 21:24 UTC
  To: Wolfgang Breyha; +Cc: Helmut Schaa, linux-wireless@vger.kernel.org

On Sat, Nov 20, 2010 at 05:49:30PM +0100, Wolfgang Breyha wrote:
> We tried to deactivate "band select" already and my laptop was able to
> connect instantly to the nearest 2.4GHz AP. But that's the point where the
> second "feature" kicks in. "load balancing" is then used by the APs to kick
> stations trying to push them to another AP. At this point Linux clients
> have troubles again to get a stable connection.

Could you please send a wpa_supplicant debug log for this one, too? If
the AP is just kicking out the station, we should be able to blacklist
the AP and try to find someone else.. I've seen number of issues with
load balancing designs in the past, but I think I fixed many of them the
last time I was testing this.. It's a bit pity that I don't have this
type of "enterprise AP" test setup at home to remind me that things are
not working. I usually end up debugging this only when traveling and
getting hit by connectivity issues myself.

-- 
Jouni Malinen                                            PGP id EFC895FA
* Re: Linux Client vs. CISCO AP with band select
From: Wolfgang Breyha @ 2010-11-23 16:13 UTC
  To: Helmut Schaa; +Cc: Jouni Malinen, linux-wireless@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 2683 bytes --]

Hi!

On 2010-11-20 13:04, Helmut Schaa wrote:
> If the Cisco APs would reply to direct probes we could (as a workaround) just
> send an additional direct probe here. I agree with Jouni that the AP behavior
> is just stupid but the users will blame Linux for not being able to connect
> and not the AP vendor.
>
> Wolfgang, could you please try the (untested) patch below if it makes any
> difference?

Sorry, it took me a day longer than promised because I had to stay at
home yesterday.

The patch from Helmut didn't change anything. I even tried to send both
broadcast and direct probes in triples to check if that's the threshold
which is configured in band select as retries. It's not;-)

After that I tried a dirty hack on wpa_supplicant 0.7.3:
-------
--- wpa_supplicant-0.7.3.orig/wpa_supplicant/sme.c	2010-09-07 17:43:39.000000000 +0200
+++ wpa_supplicant-0.7.3/wpa_supplicant/sme.c	2010-11-23 15:21:23.866829986 +0100
@@ -456,8 +456,23 @@
 void sme_event_auth_timed_out(struct wpa_supplicant *wpa_s,
 			      union wpa_event_data *data)
 {
+	int timeout = 5000;
 	wpa_printf(MSG_DEBUG, "SME: Authentication timed out");
-	wpa_supplicant_req_scan(wpa_s, 5, 0);
+	if (wpa_blacklist_add(wpa_s, wpa_s->pending_bssid) == 0) {
+		struct wpa_blacklist *b;
+		wpa_blacklist_add(wpa_s, wpa_s->pending_bssid);
+		b = wpa_blacklist_get(wpa_s, wpa_s->pending_bssid);
+		if (b && b->count < 3) {
+			/*
+			 * Speed up next attempt if there could be other APs
+			 * that could accept association.
+			 */
+			timeout = 100;
+		}
+	}
+	wpa_supplicant_req_scan(wpa_s, timeout / 1000,
+				1000 * (timeout % 1000));
+//	wpa_supplicant_req_scan(wpa_s, 5, 0);
 }
--------

In other words I reused the code found in sme_event_assoc_reject() to
add the BSSID to the blacklist. To speed up things further I add it
twice;-) I don't know why wpa_supplicant needs a blacklist count of 2 to
finally try another BSSID.

And it helps a lot. With this change wpa_supplicant stops retrying the
same BSSID all the time and tries a 5GHz one pretty fast. And I think
that's exactly what CISCO tries to achieve.

Finally there is another timeout in the EAP stage (SUPP_BE) I can't
pinpoint. I attached the wpa_supplicant.log.

Authenticated once reauthentication works very fast if needed.

Knowing where to search and how to hack mac80211 and wpa_supplicant I'll
try to find some details which probes CISCO responds to reaching the
threshold.

I can still provide packet traces if you need/want them. In case of the
load balancing feature it may take some time because I've not found a
trick to provoke it. But I think a well and fast trained blacklist will
help in this case, too.

Greetings, Wolfgang

[-- Attachment #2: wpa_eap.log.gz --]
[-- Type: application/x-gzip, Size: 3196 bytes --]
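
The patch above keeps the scan delay in milliseconds and splits it into the seconds and microseconds arguments passed to `wpa_supplicant_req_scan()`. The arithmetic can be checked in isolation with a small sketch; `toy_delay` and `toy_split_ms()` are hypothetical names used only for this illustration.

```c
/* Seconds/microseconds pair, mirroring the two delay arguments in the patch. */
struct toy_delay {
	int sec;
	int usec;
};

/* Split a delay in milliseconds the same way the patch does. */
struct toy_delay toy_split_ms(int timeout_ms)
{
	struct toy_delay d;

	d.sec = timeout_ms / 1000;		/* whole seconds */
	d.usec = 1000 * (timeout_ms % 1000);	/* remainder in microseconds */
	return d;
}
```

The default of 5000 ms yields (5, 0), the same values as the original hard-coded call, and the shortened delay of 100 ms yields (0, 100000), i.e. a rescan after a tenth of a second.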
* Re: Linux Client vs. CISCO AP with band select
From: Wolfgang Breyha @ 2010-11-24 14:56 UTC
  To: Helmut Schaa; +Cc: Jouni Malinen, linux-wireless@vger.kernel.org

Hi again;-)

I have proof now, that the APs respond to authentication requests regardless
of a successful probe before. Simply skipping the direct probe is sufficient
to connect successfully.

All my efforts to find a way to get a response to the probe requests were
unsuccessful. Maybe that's something only aironet devices handle correctly?

Greetings, Wolfgang

-- 
Wolfgang Breyha <wbreyha@gmx.net> | http://www.blafasel.at/
Vienna University Computer Center | Austria
* Re: Linux Client vs. CISCO AP with band select
From: Jouni Malinen @ 2010-11-25 16:47 UTC
  To: Wolfgang Breyha; +Cc: Helmut Schaa, linux-wireless@vger.kernel.org

On Wed, Nov 24, 2010 at 03:56:32PM +0100, Wolfgang Breyha wrote:
> I have proof now, that the APs respond to authentication requests regardless
> of a successful probe before. Simply skipping the direct probe is sufficient
> to connect successfully.

Interestingly enough, that seems to be exactly what the current
wireless-testing.git snapshot is doing.. I was trying to reproduce this
issue by modifying my AP not to reply to Probe Request frames on 2.4 GHz
band and did not see any problems in getting connected. The station saw
both the 2.4 and 5 GHz BSSes from the AP and the 2.4 GHz BSS was
selected based on signal strength. mac80211 went through the
authentication and association frame exchanges without any problems (and
without sending out a directed Probe Request frame). Whether this change
was done by design is another question, but at least this seems to be
the current behavior.

There may still be some issues in wpa_supplicant blacklist handling
which I will try to reproduce in some other way since the station should
have actually managed to follow the not so polite hint from the AP and
try to use the 5 GHz band here.

-- 
Jouni Malinen                                            PGP id EFC895FA
* Re: Linux Client vs. CISCO AP with band select
From: Jouni Malinen @ 2010-11-25 17:50 UTC
  To: Wolfgang Breyha; +Cc: Helmut Schaa, linux-wireless@vger.kernel.org

On Thu, Nov 25, 2010 at 06:47:24PM +0200, Jouni Malinen wrote:
> Interestingly enough, that seems to be exactly what the current
> wireless-testing.git snapshot is doing.. I was trying to reproduce this
> issue by modifying my AP not to reply to Probe Request frames on 2.4 GHz
> band and did not see any problems in getting connected. The station saw
> both the 2.4 and 5 GHz BSSes from the AP and the 2.4 GHz BSS was
> selected based on signal strength. mac80211 went through the
> authentication and association frame exchanges without any problems (and
> without sending out a directed Probe Request frame). Whether this change
> was done by design is another question, but at least this seems to be
> the current behavior.

Well, maybe not. I could not reproduce that after adding more debug code
to mac80211. In other words, the Probe Request is still there before
authentication is attempted. Anyway, with this, I did get to reproduce
the problem that shows up wpa_supplicant blacklisting not working
properly in this case at least with -Dnl80211.

-- 
Jouni Malinen                                            PGP id EFC895FA
* Re: Linux Client vs. CISCO AP with band select
From: Jouni Malinen @ 2010-11-25 21:18 UTC
  To: Wolfgang Breyha; +Cc: Helmut Schaa, linux-wireless@vger.kernel.org

On Tue, Nov 23, 2010 at 05:13:40PM +0100, Wolfgang Breyha wrote:
> The patch from Helmut didn't change anything. I even tried to send both
> broadcast and direct probes in triples to check if that's the threshold
> which is configured in band select as retries. It's not;-)
>
> After that I tried a dirty hack on wpa_supplicant 0.7.3:
...
> In other words I reused the code found in sme_event_assoc_reject() to
> add the BSSID to the blacklist. To speed up things further I add it
> twice;-) I don't know why wpa_supplicant needs a blacklist count of 2 to
> finally try another BSSID.

Thanks! This was indeed one of the problems (but not the only one). The
1 vs. 2 part comes from five years ago (needed to go through the commit
log messages to remember that one..). It avoids getting stuck with worse
networks when multiple network blocks are configured. So yes,
incrementing the blacklist count by two is indeed the way to go here in
some cases.

I simulated the five most likely ways current APs could attempt to
implement load balancing and fixed/optimized those in sme.c. Please take
a look at following commits if you want to see more details:

http://w1.fi/gitweb/gitweb.cgi?p=hostap.git;a=commitdiff;h=7e6646c794ccd1df8d38b9927d11e101c0d45517
http://w1.fi/gitweb/gitweb.cgi?p=hostap.git;a=commitdiff;h=f47d639d495b32f0348c09a0fd0ff5b5791720d4

With those in place, it should now be possible to recover from the
authentication failure (this no Probe Request looks like auth timeout
with mac80211) or association failure (e.g., AP rejecting association
with status code 17) in about 0.5 seconds or so (or a bit more if there
are APs in multiple channels). Though, please note that this is only the
case with nl80211 as the driver interface (-Dnl80211). WEXT will still
go through two full scans in this type of case (i.e., two full scans to
recover vs. one scan with just the known channels when using nl80211).

> And it helps a lot. With this change wpa_supplicant stops retrying the
> same BSSID all the time and tries a 5GHz one pretty fast. And I think
> that's exactly what CISCO tries to achieve.

Yes, I would assume so.

> Finally there is another timeout in the EAP stage (SUPP_BE) I can't
> pinpoint. I attached the wpa_supplicant.log.

That looks like a lost EAPOL packet to me based on that log.. Would
likely need to use a wireless sniffer to take a closer look at where the
packet is dropped.

> Knowing where to search and how to hack mac80211 and wpa_supplicant I'll
> try to find some details which probes CISCO responds to reaching the
> threshold.

I don't think that that would be very critical to figure out anymore
with the current wpa_supplicant (-Dnl80211). Sure, we could consider
removing the need-a-probe-response-before-auth case from mac80211, but
actually, in this particular case, it would result in not following the
not-so-gently hint from the AP.

> I can still provide packet traces if you need/want them. In case of the
> load balancing feature it may take some time because I've not found a
> trick to provoke it. But I think a well and fast trained blacklist will
> help in this case, too.

For band enforcement, I think the behavior is clear enough and no
additional information is needed. I can easily simulate this type of
behavior by modifying hostapd. For load balancing while being
associated, it would be interesting to hear if it behaves badly, i.e.,
if there is a long gap in connectivity etc. user visible badness. I
would assume I can easily simulate those for testing, but to do that, I
would need to first see how the particular AP/network is behaving.

-- 
Jouni Malinen                                            PGP id EFC895FA
* Re: Linux Client vs. CISCO AP with band select
From: Wolfgang Breyha @ 2010-11-25 23:24 UTC
  To: Jouni Malinen; +Cc: Helmut Schaa, linux-wireless@vger.kernel.org

On 2010-11-25 22:18, Jouni Malinen wrote:
> I simulated the five most likely ways current APs could attempt to
> implement load balancing and fixed/optimized those in sme.c. Please take
> a look at following commits if you want to see more details:

Sure! I definitely will try a git checkout tomorrow and report back!

I did some more tests meanwhile. After hacking mac80211 to not send the
direct probe I was able to connect to 2.4GHz again as I already noted. What
I didn't recognize initially was that the AP responded to probes afterwards
for some time. But after a short time (0-5 Minutes) it stopped responding
again. I think that's the way load balancing works with CISCO APs.

wpa_supplicant deauthenticates then with "due to inactivity" and blacklists
the AP. And this is another case in which the same AP is tried again at the
next reconnect attempt because the blacklist count reaches only 1 in
events.c:wpa_supplicant_event_disassoc():1298

I tried to change events.c:472ff
	e = wpa_blacklist_get(wpa_s, bss->bssid);
	if (e && e->count > 1) {
		wpa_printf(MSG_DEBUG, " skip - blacklisted");
		return 0;
	}
to "e->count >= 1" and had better results since a BSSID is never tried
again in the following retry. But your commitdiffs let me guess that it is
wanted in some other cases I'm not aware of.

Last but not least I talked to my colleague some days ago and he told me that
"band select" is not a feature he needs desperately. But "load balancing"
is indeed needed for our large audiences with up to 750 people. If the
decision is left to the clients alone some APs are pretty overcrowded very
fast.

We decided to keep both features active as long as I can help to get the
issues fixed for Linux. Afterwards we most likely will deactivate "band
select" until the fixes find their way into common Ubuntus, Fedoras & Co.

Greetings, Wolfgang

-- 
Wolfgang Breyha <wbreyha@gmx.net> | http://www.blafasel.at/
Vienna University Computer Center | Austria
* Re: Linux Client vs. CISCO AP with band select
From: Jouni Malinen @ 2010-11-26 9:48 UTC
  To: Wolfgang Breyha; +Cc: Helmut Schaa, linux-wireless@vger.kernel.org

On Fri, Nov 26, 2010 at 12:24:42AM +0100, Wolfgang Breyha wrote:
> I did some more tests meanwhile. After hacking mac80211 to not send the
> direct probe I was able to connect to 2.4GHz again as I already noted. What
> I didn't recognize initially was that the AP responded to probes afterwards
> for some time. But after a short time (0-5 Minutes) it stopped responding
> again. I think that's the way load balancing works with CISCO APs.

Interesting.. I'm not sure whether that would get many stations leaving
the current AP, so there may be other more aggressive options that the
AP ends up using in the end, though. I think that mac80211 was just
modified to use another AP probing mechanism (data nullfunc instead of
Probe Request), so mac80211-based drivers may not react to the probe
response changes while associated anymore.

> wpa_supplicant deauthenticates then with "due to inactivity" and blacklists
> the AP. And this is another case in which the same AP is tried again at the
> next reconnect attempt because the blacklist count reaches only 1 in
> events.c:wpa_supplicant_event_disassoc():1298

>	e = wpa_blacklist_get(wpa_s, bss->bssid);
>	if (e && e->count > 1) {
>		wpa_printf(MSG_DEBUG, " skip - blacklisted");

> to "e->count >= 1" and had better results since a BSSID is never tried
> again in the following retry. But your commitdiffs let me guess that it is
> wanted in some other cases I'm not aware of.

Yes, but that only applies for the case where more than a single network
configuration block is enabled. I changed wpa_supplicant to change
between 0 and 1 in this check based on the number of enabled networks.
In addition, I extended the optimized scan after auth/assoc failure
mechanism to apply for the disconnection event, too. That should speed
up recovery from this type of situation quite a bit.

> Last but not least I talked to my colleague some days ago and he told me that
> "band select" is not a feature he needs desperately. But "load balancing"
> is indeed needed for our large audiences with up to 750 people. If the
> decision is left to the clients alone some APs are pretty overcrowded very
> fast.

Could you please send me a wireless capture log with some of the Beacon
and Probe Response frames from those APs? I would like to see what kind
of information they advertise and whether there would be anything worth
using in BSS selection to avoid being kicked off from the network based
on more aggressive load balancing mechanisms.

-- 
Jouni Malinen                                            PGP id EFC895FA
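
The adjusted check Jouni describes can be sketched as a skip limit that depends on how many network blocks are enabled. This is one reading of the message above, not code copied from wpa_supplicant: the mapping (limit 1 when more than one network is enabled, otherwise 0) is inferred from the text, and `toy_blacklist_limit()` and `toy_skip_bss()` are hypothetical names.

```c
/*
 * Blacklist count a BSS entry must exceed before it is skipped.
 * Assumption: 1 when several networks are enabled, so a BSS gets one
 * more chance before selection moves on; 0 when only one is enabled.
 */
unsigned int toy_blacklist_limit(unsigned int enabled_networks)
{
	return enabled_networks > 1 ? 1 : 0;
}

/* Nonzero if a BSS with this blacklist count should be skipped. */
int toy_skip_bss(unsigned int blacklist_count, unsigned int enabled_networks)
{
	return blacklist_count > toy_blacklist_limit(enabled_networks);
}
```

With a single enabled network this behaves like Wolfgang's "e->count >= 1" variant, while with several enabled networks it keeps the original "e->count > 1" behavior.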
Thread overview: 13+ messages
2010-11-19 21:22 Linux Client vs. CISCO AP with band select Wolfgang Breyha
2010-11-19 21:45 ` Dan Williams
2010-11-20 11:27 ` Jouni Malinen
2010-11-20 12:04   ` Helmut Schaa
2010-11-20 16:49     ` Wolfgang Breyha
2010-11-20 21:24       ` Jouni Malinen
2010-11-23 16:13     ` Wolfgang Breyha
2010-11-24 14:56       ` Wolfgang Breyha
2010-11-25 16:47         ` Jouni Malinen
2010-11-25 17:50           ` Jouni Malinen
2010-11-25 21:18       ` Jouni Malinen
2010-11-25 23:24         ` Wolfgang Breyha
2010-11-26  9:48           ` Jouni Malinen