From: Johannes Berg <johannes@sipsolutions.net>
To: Sven Eckelmann <sven@narfation.org>
Cc: linux-wireless@vger.kernel.org, Felix Fietkau <nbd@openwrt.org>,
Sven Eckelmann <sven@open-mesh.com>,
Arik Nemtsov <arik@wizery.com>,
liad.kaufman@intel.com
Subject: Re: [RFC v3] mac80211: lock rate control
Date: Thu, 12 Mar 2015 19:37:48 +0100 [thread overview]
Message-ID: <1426185468.1885.17.camel@sipsolutions.net> (raw)
In-Reply-To: <1426061656-17546-1-git-send-email-sven@narfation.org> (sfid-20150311_091501_300586_8F374DB3)
On Wed, 2015-03-11 at 09:14 +0100, Sven Eckelmann wrote:
> Since fixing all rate control algorithms will be very difficult,
> just provide locking for invocations. This protects the internal
> data structures the algorithms maintain.
So ... we've been running with this and found that it causes a potential
deadlock.
The sta->lock is held in many places (particularly mesh, but in one case
also for delBA) while transmitting frames, so this causes a
lock->rate_ctrl_lock dependency.
At the same time, that lock is acquired for aggregation state machines,
i.e. the exported function calls ieee80211_stop_rx_ba_session(),
ieee80211_start_tx_ba_session() and ieee80211_stop_tx_ba_session().
These are often called from the rate control algorithm (minstrel,
rtlwifi, iwlwifi, ...) and thus cause a rate_ctrl_lock->lock dependency.
Combined, this clearly creates potential for classic ABBA deadlocks.
I'm not really sure yet how to fix this. One way would be to avoid all
TX under the sta->lock, which is a relatively simple patch for the
aggregation case since there's just one place
(http://p.sipsolutions.net/f951b8da4e78112a.txt) but mesh is far more
complex and has many places that send frames while holding sta->lock.
Another way would to avoid all such things would be to just make all the
internal mac80211 TX asynchronous by punting to the TX tasklet we have
anyway, but that seems overkill.
Another option might be to let mesh use a different spinlock, but then
we'd have to be careful not to cause a lock->mesh_lock dependency since
that would again lead to a lock->mesh_lock->rate_ctrl_lock dependency,
which is still buggy since the aggregation code paths cause the other
dependency.
Yet another option might be to make the three mentioned aggregation
functions asynchronous entirely, but that might cause interesting races
in the aggregation state machines again, and some drivers even rely on
the return value which could obviously then not be given.
If anyone has any good ideas ...
johannes
next prev parent reply other threads:[~2015-03-12 18:37 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-03-11 8:14 [RFC v3] mac80211: lock rate control Sven Eckelmann
2015-03-11 8:22 ` Sven Eckelmann
2015-03-12 18:37 ` Johannes Berg [this message]
2015-04-13 21:26 ` Bob Copeland
2015-04-14 7:54 ` Johannes Berg
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1426185468.1885.17.camel@sipsolutions.net \
--to=johannes@sipsolutions.net \
--cc=arik@wizery.com \
--cc=liad.kaufman@intel.com \
--cc=linux-wireless@vger.kernel.org \
--cc=nbd@openwrt.org \
--cc=sven@narfation.org \
--cc=sven@open-mesh.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).