From mboxrd@z Thu Jan 1 00:00:00 1970
Return-path:
Received: from mail.candelatech.com ([208.74.158.172]:57259 "EHLO
 ns3.lanforge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1755257Ab0LHRg2 (ORCPT );
 Wed, 8 Dec 2010 12:36:28 -0500
Message-ID: <4CFFC214.6000608@candelatech.com>
Date: Wed, 08 Dec 2010 09:36:20 -0800
From: Ben Greear
MIME-Version: 1.0
To: "Luis R. Rodriguez"
CC: Johannes Berg, Tejun Heo, linux-wireless@vger.kernel.org
Subject: Re: [PATCH] mac80211: Fix deadlock in ieee80211_do_stop.
References: <1289592426-5367-1-git-send-email-greearb@candelatech.com>
 <1289594998.3736.11.camel@jlt3.sipsolutions.net>
 <4CDDAA3B.9090007@candelatech.com>
 <1289596096.3736.13.camel@jlt3.sipsolutions.net>
 <4CDE699B.70401@kernel.org>
 <4CE1A344.7040201@candelatech.com>
 <4CE292F7.4090200@kernel.org>
 <1289929258.3673.1.camel@jlt3.sipsolutions.net>
 <4CE396A9.1050908@kernel.org>
 <1290020005.3777.6.camel@jlt3.sipsolutions.net>
 <4CE4C8DD.6010806@kernel.org>
 <51f5dd53c39a77fff4efc1a99b189725@localhost>
 <4CE4D41F.1080005@kernel.org>
 <1290099585.3801.1.camel@jlt3.sipsolutions.net>
 <4CE68AF4.8060507@kernel.org>
 <1290189452.3768.3.camel@jlt3.sipsolutions.net>
 <4CE6E430.6080804@candelatech.com>
In-Reply-To:
Content-Type: text/plain; charset=UTF-8; format=flowed
Sender: linux-wireless-owner@vger.kernel.org
List-ID:

On 11/19/2010 02:27 PM, Luis R. Rodriguez wrote:
> On Fri, Nov 19, 2010 at 12:55 PM, Ben Greear wrote:
>> On 11/19/2010 09:57 AM, Johannes Berg wrote:
>>>
>>> On Fri, 2010-11-19 at 15:34 +0100, Tejun Heo wrote:
>>>
>>>> Awesome. :-)
>>>>
>>>> Ben, if you have trouble generating full trace, please let me know if
>>>> there's something I can buy which isn't too expensive to reproduce the
>>>> problem. I would be happy to track it down myself.
>>>
>>> Maybe you can try Ben's setup in kvm (or directly on your box if you
>>> like) with mac80211_hwsim.
>>> From a mac80211 POV it should be almost
>>> equivalent, although it'll do different memory allocation patterns etc.
>>
>> I tried manually backing out my patch, and now I can no longer reproduce
>> the problem. Maybe something in -rc2 fixed it, or maybe some changes
>> to my environment just made it harder to hit.
>>
>> If you see no logical reason why calling flush_work with RTNL held
>> would cause trouble, then I guess we can just leave the code as is
>> for now.
>>
>> If you do want to play with this yourself, I think any ath5k type adapter
>> with 64+ virtual stations configured would be a valid test case. My
>> application calls ifdown/ifup on them a few times after being created
>> and then generates traffic (and gathers stats, calls 'iwconfig', etc).
>> As configured in the original scenario that reproduced the problem,
>> the STAs had no encryption and were all associating with a single AP.
>> wpa_supplicant was not being used.
>
> FWIW, I had to do similar tests before and Ben offered up a perl
> script to do something similar to what his proprietary app does upon
> device bring up. I've modified it just a bit and you can find it here:
>
> http://www.kernel.org/pub/linux/kernel/people/mcgrof/scripts/poo.pl

Well, I backed out my work-around patch yesterday, and then let
the system run overnight. This morning it is mostly dead, spewing
OOM errors, with a bunch of 'sh' processes using the maximum amount
of CPU, blocked trying to acquire rtnl.

There is one 'ip' process that appears to hold rtnl and is trying to
call ieee80211_do_stop, which is probably blocked down in the work-queue
logic just like last time. Lots of worker processes are attempting to
grab rtnl (and many other processes as well).

Lockdep was disabled because the kernel attempted to load a proprietary
module of mine, but it doesn't actually load due to a symbol mismatch
(it's compiled against a non-debug kernel).
If the lockdep info is critical, I can attempt to reproduce with my
module completely removed from the file system so it cannot attempt to
load, but it seems like last time the 'sysrq t' was of more interest
anyway.

I have uploaded what I believe is a full 'sysrq t' output, interspersed
with OOM warnings that are constantly spewing to the console, here:

http://www.candelatech.com/~greearb/minicom_ath9k_log.txt

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc  http://www.candelatech.com