From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1758905Ab1LOOHp (ORCPT ); Thu, 15 Dec 2011 09:07:45 -0500
Received: from he.sipsolutions.net ([78.46.109.217]:37811 "EHLO sipsolutions.net"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1751199Ab1LOOHn (ORCPT ); Thu, 15 Dec 2011 09:07:43 -0500
Subject: Re: iwlagn is getting very shaky
From: Johannes Berg
To: Emmanuel Grumbach
Cc: Norbert Preining, "Guy, Wey-Yi", Pekka Enberg,
 "linux-wireless@vger.kernel.org", "linux-kernel@vger.kernel.org",
 Dave Jones, David Rientjes
In-Reply-To: (sfid-20111211_205719_337550_081FCE1C)
References: <20111125122143.GA30404@gamma.logic.tuwien.ac.at>
 <20111125123720.GA31564@gamma.logic.tuwien.ac.at>
 <1322387175.4044.16.camel@jlt3.sipsolutions.net>
 <20111128035627.GH1422@gamma.logic.tuwien.ac.at>
 <20111128042343.GA4619@gamma.logic.tuwien.ac.at>
 <20111128232525.GA12719@gamma.logic.tuwien.ac.at>
 <1322555472.4110.8.camel@jlt3.sipsolutions.net>
 (sfid-20111211_205719_337550_081FCE1C)
Content-Type: text/plain; charset="UTF-8"
Date: Thu, 15 Dec 2011 15:07:33 +0100
Message-ID: <1323958053.3337.48.camel@jlt3.sipsolutions.net>
Mime-Version: 1.0
X-Mailer: Evolution 2.30.3
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

So I finally looked at this ...

On Sun, 2011-12-11 at 21:56 +0200, Emmanuel Grumbach wrote:
> >> Could something be hogging the workqueues?
> > So I tried to understand what is going on with the workqueue and ended
> > up to see that if we are lucky, we can need the workqueue for the BA
> > handshake (could be AddBA / DelBA handling, or driver callback) while
> > we are scanning. Which basically means that we will need to wait until
> > the scan is over to handle these frames / callbacks. I got these
> > measurements while stopping the BA session:
> >
> > * scanning working for roughly 3 seconds (pardon me not being precise,
> > but with this order of magnitude I don't care much about the single
> > millisecond..)

Oh. I see, while scanning we won't process the work queue.

> > * when scanning is over, the while loop in ieee80211_iface_work
> > consumes 73 mgmt for about 34ms.
> > ( how come we have so many beacons during those 3 seconds..., or maybe
> > all the BCAST probe request ?, my network is quite busy...)
> > * then the finally my stop_tx_ba_cb was served which took 10ms (time
> > takes by the driver).
> > * another series of beacons (10ms).
>
> What about flushing the workqueue before we scan ?
> This is not a bullet proof solution of course, we will still encounter
> bad races, but at least we would flush what we can before the
> workqueue becomes unable for 4 seconds (!).

Yeah, that seems like a good thing. Actually I had an idea about this
before -- drain & stop the workqueue for any functions in mac80211/cfg.c
so that mac80211 essentially becomes single-threaded.

> We can also delay the scan if we are in the middle of {add,del}BA
> handshake, which is the only flow I can think about that needs
> responsiveness. The other frame exchanges are MLME ones and involve
> the wpa_supplicant (unless we are using the late WEXT). Hopefully the
> wpa_supplicant won't request to scan in the middle of association or
> so. There might be other features (mesh or whatever), that may be
> hidden from the wpa_supplicant and require good responsiveness from
> the wq too.

Hm, yeah, that would be an idea too, but I'm not sure it's easy to do
right now.

johannes
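
To make the flush-before-scan idea concrete, here is a minimal user-space
sketch in C. It is a toy model, not mac80211 code: the names toy_work,
toy_wq, toy_queue, toy_flush, toy_scan and toy_max_latency are all
hypothetical, work items are assumed to run in zero time, and the scan is
assumed to block the queue for its whole duration. It only shows how long
an item that is already pending (say, a BA teardown callback) waits with
and without a flush before the scan starts.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_WORK 16

/* Hypothetical toy work item; this is NOT the kernel's struct work_struct. */
struct toy_work {
	const char *name;
	long queued_at_ms;	/* time the item was queued */
	long served_at_ms;	/* time the item ran, -1 while still pending */
};

/* Hypothetical single-threaded queue; zero-initialize before use. */
struct toy_wq {
	struct toy_work items[TOY_MAX_WORK];
	size_t count;
};

/* Queue one item at now_ms; returns 0 on success, -1 if the queue is full. */
int toy_queue(struct toy_wq *wq, const char *name, long now_ms)
{
	assert(wq != NULL);
	if (wq->count == TOY_MAX_WORK)
		return -1;
	wq->items[wq->count++] = (struct toy_work){ name, now_ms, -1 };
	return 0;
}

/* Run every still-pending item at now_ms; returns how many items ran. */
size_t toy_flush(struct toy_wq *wq, long now_ms)
{
	size_t ran = 0;

	assert(wq != NULL);
	for (size_t i = 0; i < wq->count; i++) {
		if (wq->items[i].served_at_ms < 0) {
			wq->items[i].served_at_ms = now_ms;
			ran++;
		}
	}
	return ran;
}

/*
 * Model a scan that blocks the queue from start_ms for scan_ms.
 * With flush_first set, pending work runs just before the scan starts;
 * whatever is still pending runs only once the scan is over.
 * Returns the time at which the scan ends.
 */
long toy_scan(struct toy_wq *wq, long start_ms, long scan_ms, int flush_first)
{
	long end_ms = start_ms + scan_ms;

	if (flush_first)
		toy_flush(wq, start_ms);
	toy_flush(wq, end_ms);
	return end_ms;
}

/* Worst wait (served - queued) over all items that have run. */
long toy_max_latency(const struct toy_wq *wq)
{
	long worst = 0;

	assert(wq != NULL);
	for (size_t i = 0; i < wq->count; i++) {
		long waited = wq->items[i].served_at_ms - wq->items[i].queued_at_ms;

		if (wq->items[i].served_at_ms >= 0 && waited > worst)
			worst = waited;
	}
	return worst;
}
```

In this model, an item queued at t=0 followed by a 3000 ms scan starting
at t=5 waits 3005 ms without the flush and 5 ms with it. Work that arrives
after the flush still waits for the end of the scan, which is the race
Emmanuel mentions.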