linux-wireless.vger.kernel.org archive mirror
From: Ben Greear <greearb@candelatech.com>
To: Tejun Heo <tj@kernel.org>
Cc: Johannes Berg <johannes@sipsolutions.net>,
	linux-wireless@vger.kernel.org
Subject: Re: [PATCH] mac80211:  Fix deadlock in ieee80211_do_stop.
Date: Wed, 17 Nov 2010 09:37:24 -0800
Message-ID: <4CE412D4.7030406@candelatech.com>
In-Reply-To: <4CE39892.2070306@kernel.org>

On 11/17/2010 12:55 AM, Tejun Heo wrote:
> Hello,
>
> On 11/16/2010 05:51 PM, Ben Greear wrote:
>>> 1. Try to capture the full dump.  Usually serial console works best.
>>
>> This was from serial console, and I grabbed everything it printed
>> to the screen.  I'll look in /var/log/messages in case there is more
>> there.
>
> Yeah, weird.  It doesn't look like it's missing random lines but it
> definitely doesn't contain all the tasks.
>
>> If you have a system with an ath5k nic, I should be able to show you
>> how to reproduce it, if you're interested.
>
> Unfortunately, I don't have ath5k.  I can order one if it's not too
> expensive.  Anything you can recommend?
>
>>> 2. Does adding WQ_MEM_RECLAIM to alloc_ordered_workqueue() call in
>>>      ieee80211_register_hw() make any difference?
>>>
>>> 3. What if you replace it with the following?
>>>
>>>      alloc_workqueue(wiphy_name(local->hw.wiphy), WQ_NON_REENTRANT, 0)
>>
>> I can try these things, hopefully today.
>>
>> Can you explain briefly how this is supposed to work?  I'm certain that some
>> workers can be blocked attempting to get rtnl.  When we call flush_work(),
>> how is a worker chosen/created to flush that work?
>
> They might not be solutions themselves but they should point where the
> problem is.  flush_work() only flushes the target work.  It waits for
> the currently pending or executing work to finish execution.  Ordered
> workqueue can execute only single work at any given time, so if
> another work is taking a long time to finish, everything queued to the
> workqueue will be delayed.  This is why I asked for the full dump so

Well, from the lockdep and stack traces, we can be certain that at least
one of the workers is blocked trying to lock RTNL.  That worker cannot
finish until flush_work() returns, because the flush_work() caller already
owns RTNL.  If flush_work() is waiting on that worker to finish, then
it's a deadlock.
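
To illustrate the cycle I mean, here is a minimal user-space sketch using
pthreads.  It is not the mac80211 code: the "rtnl" mutex, the worker thread
and flush_work_timeout() are hypothetical stand-ins, and the timeout exists
only so the demo terminates instead of hanging the way the real
flush_work() would.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins: "rtnl" models the RTNL lock, the thread
 * models the single worker of an ordered workqueue. */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static bool work_done;

/* The work item: it must take "rtnl" before it can complete. */
static void *worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&rtnl);	/* blocks while the flusher holds rtnl */
	pthread_mutex_unlock(&rtnl);

	pthread_mutex_lock(&done_lock);
	work_done = true;
	pthread_cond_signal(&done_cond);
	pthread_mutex_unlock(&done_lock);
	return NULL;
}

/* Wait up to ms milliseconds for the work item to complete.
 * Returns true if it completed, false on timeout. */
static bool flush_work_timeout(int ms)
{
	struct timespec deadline;
	bool finished;
	int rc = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += (long)ms * 1000000L;
	deadline.tv_sec += deadline.tv_nsec / 1000000000L;
	deadline.tv_nsec %= 1000000000L;

	pthread_mutex_lock(&done_lock);
	while (!work_done && rc == 0)
		rc = pthread_cond_timedwait(&done_cond, &done_lock, &deadline);
	finished = work_done;
	pthread_mutex_unlock(&done_lock);
	return finished;
}

/* Flush while holding "rtnl".  Returns true if the flush timed out,
 * i.e. the worker could not make progress while we held the lock. */
static bool flush_blocks_while_holding_rtnl(void)
{
	pthread_t t;
	bool finished;
	int rc;

	work_done = false;
	pthread_mutex_lock(&rtnl);	/* the flusher already owns rtnl */
	rc = pthread_create(&t, NULL, worker, NULL);
	assert(rc == 0);
	(void)rc;

	finished = flush_work_timeout(200);

	pthread_mutex_unlock(&rtnl);	/* only now can the worker proceed */
	pthread_join(t, NULL);
	return !finished;
}
```

Calling flush_blocks_while_holding_rtnl() returns true: the wait only ends
because of the timeout, and the worker finishes only after the lock is
dropped.  Without the timeout, neither side could ever make progress.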

> that we can find out who's holding the queue.  The other reason a work
> execution can be delayed is if there is no execution resource
> available due to high memory pressure.  This again will be
> distinguishable from task dump as rescue workers would be active and
> manager worker would be in worker creation path.

I have plenty of memory available when this problem starts.
(I doubled memory to 2GB of low-memory and the problem persists.)

> The two suggested changes modify the workqueue behavior such that each
> resolves one of the two issues.  If you set WQ_MEM_RECLAIM, workqueue
> allocates a dedicated worker to use under memory pressure, so
> execution resource is guaranteed to be there.  If you use
> WQ_NON_REENTRANT, workqueue would execute multiple works in parallel
> and a single work which takes a long time to finish won't delay other
> works queued to the same workqueue.

From your description, WQ_NON_REENTRANT looks like it could fix the problem.
Johannes:  Any idea if that would be proper behaviour for this workqueue,
or would that add out-of-order and locking issues of its own?

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

