* TXQ real_num_tx_queues comments/questions
@ 2008-07-11 9:47 Johannes Berg
[not found] ` <1215769627.3483.107.camel-YfaajirXv214zXjbi5bjpg@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Johannes Berg @ 2008-07-11 9:47 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, linux-wireless
[-- Attachment #1: Type: text/plain, Size: 3514 bytes --]
Sorry for not replying in a proper thread, I didn't have a copy at hand
and didn't want to bump the old version. Apologies also for the long
mail, I'm staring at the code at the moment and it may not be the most
cohesive I've ever written.
One thing I just noticed that caused me to think about it a bit more is
that you didn't change netif_tx_{start,stop,wake}_all_queues to use
real_num_tx_queues instead of num_tx_queues. However, since those queues
don't actually get used, it doesn't actually matter whether they're
started or not.
Also related to real_num_tx_queues, you mentioned in the email (if that
restriction is kept it should be explained in a comment somewhere I
think) that it must not be changed while the device is operating. While
that restriction makes it obviously safe, mac80211 is already
effectively violating it, although it isn't setting real_num_tx_queues
at all. That makes me question that restriction a bit.
However, since real_num_tx_queues is only used by the TX hashing, it
would be completely safe to change real_num_tx_queues in an RCU-like
fashion if dev_pick_tx() was under the RCU lock in dev_queue_xmit(),
i.e. just a few lines later.
Back to mac80211. There, we are clearly not using real_num_tx_queues at
all, but we are using select_queue which has the same effect. Our
select_queue, however, isn't static, and it can decide to fill a
different number of queues depending on the aggregation states.
The only interesting situation is when we need to remove a queue since
for adding it we can prepare for that before we make select_queue use
it. Removing it is however, I think, still buggy. You removed most of
the queue locking and now just lock the single txq in ieee80211_requeue,
which seems fine, but when you look at ieee80211_ht_agg_queue_remove it
seems still racy:
select_queue uses the queue_pool bit, which is only cleared in
ieee80211_ht_agg_queue_remove right before ieee80211_requeue.
ieee80211_requeue then proceeds to take all packets out of the queue and
for that of course locks the queue.
But, as far as I can tell, it would be possible for another CPU to be in
select_queue at the same time, after having tested the queue_pool bit to
be set, but not having returned yet. Now, this CPU proceeds to enqueue
the packet to the stopped queue and tries to lock the queue lock for
that, having to wait because the first CPU is in ieee80211_requeue
already holding it. mac80211 finishes requeuing and finally assumes
nothing will come in on that queue any more, but the other CPU, now
getting the lock, will enqueue a frame to that queue.
I'm not entirely sure that my analsysis is correct because the mac80211
code is running in a tasklet, but dev_queue_xmit() only disables bhs
after using select_queue, and even that is only local bhs.
To fix it, I think we should move the queue selection under the rcu
lock. Then we can, in mac80211, instead of deferring the requeuing to
the tasklet, defer it to a work struct (i.e. process context) and
synchronize_rcu() between making select_queue no longer choose the queue
and requeuing from it.
Or we can do more complicated things with the queue_pool bits or an
extra spinlock for select_queue.
I like the RCU variant better, as it means we don't need a "central"
lock that is taken for all tx queues, and it also allows other drivers
to actually change real_num_tx_queues in a similar fashion, should that
ever be required.
johannes
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: TXQ real_num_tx_queues comments/questions
[not found] ` <1215769627.3483.107.camel-YfaajirXv214zXjbi5bjpg@public.gmane.org>
@ 2008-07-15 9:16 ` David Miller
2008-07-15 9:25 ` Johannes Berg
0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2008-07-15 9:16 UTC (permalink / raw)
To: johannes-cdvu00un1VgdHxzADdlk8Q
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
linux-wireless-u79uwXL29TY76Z2rM5mHXA
From: Johannes Berg <johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
Date: Fri, 11 Jul 2008 11:47:06 +0200
> One thing I just noticed that caused me to think about it a bit more is
> that you didn't change netif_tx_{start,stop,wake}_all_queues to use
> real_num_tx_queues instead of num_tx_queues. However, since those queues
> don't actually get used, it doesn't actually matter whether they're
> started or not.
Right, I don't think it matters.
> Also related to real_num_tx_queues, you mentioned in the email (if that
> restriction is kept it should be explained in a comment somewhere I
> think) that it must not be changed while the device is operating. While
> that restriction makes it obviously safe, mac80211 is already
> effectively violating it, although it isn't setting real_num_tx_queues
> at all. That makes me question that restriction a bit.
In the mac80211 implementation, you control everything.
mac80211's ->select_queue() is the only thing that selects
queue values.
> The only interesting situation is when we need to remove a queue since
> for adding it we can prepare for that before we make select_queue use
> it. Removing it is however, I think, still buggy. You removed most of
> the queue locking and now just lock the single txq in ieee80211_requeue,
> which seems fine, but when you look at ieee80211_ht_agg_queue_remove it
> seems still racy:
...
> I'm not entirely sure that my analsysis is correct because the mac80211
> code is running in a tasklet, but dev_queue_xmit() only disables bhs
> after using select_queue, and even that is only local bhs.
>
> To fix it, I think we should move the queue selection under the rcu
> lock. Then we can, in mac80211, instead of deferring the requeuing to
> the tasklet, defer it to a work struct (i.e. process context) and
> synchronize_rcu() between making select_queue no longer choose the queue
> and requeuing from it.
>
> Or we can do more complicated things with the queue_pool bits or an
> extra spinlock for select_queue.
>
> I like the RCU variant better, as it means we don't need a "central"
> lock that is taken for all tx queues, and it also allows other drivers
> to actually change real_num_tx_queues in a similar fashion, should that
> ever be required.
It seems to me that a simple synchronize_net() call near the end of
agg queue removal would solve your problem as-is, wouldn't it?
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: TXQ real_num_tx_queues comments/questions
2008-07-15 9:16 ` David Miller
@ 2008-07-15 9:25 ` Johannes Berg
[not found] ` <1216113923.3535.14.camel-YfaajirXv214zXjbi5bjpg@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Johannes Berg @ 2008-07-15 9:25 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-wireless
[-- Attachment #1: Type: text/plain, Size: 556 bytes --]
[...]
> > I like the RCU variant better, as it means we don't need a "central"
> > lock that is taken for all tx queues, and it also allows other drivers
> > to actually change real_num_tx_queues in a similar fashion, should that
> > ever be required.
>
> It seems to me that a simple synchronize_net() call near the end of
> agg queue removal would solve your problem as-is, wouldn't it?
Well, as far as I can tell we can't do that there because it's in
tasklet context/called under spinlocks so it has to be atomic. I think.
johannes
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: TXQ real_num_tx_queues comments/questions
[not found] ` <1216113923.3535.14.camel-YfaajirXv214zXjbi5bjpg@public.gmane.org>
@ 2008-07-15 9:28 ` David Miller
[not found] ` <20080715.022854.55875400.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2008-07-15 9:28 UTC (permalink / raw)
To: johannes-cdvu00un1VgdHxzADdlk8Q
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
linux-wireless-u79uwXL29TY76Z2rM5mHXA
From: Johannes Berg <johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
Date: Tue, 15 Jul 2008 11:25:23 +0200
> > It seems to me that a simple synchronize_net() call near the end of
> > agg queue removal would solve your problem as-is, wouldn't it?
>
> Well, as far as I can tell we can't do that there because it's in
> tasklet context/called under spinlocks so it has to be atomic. I think.
Right, my bad.
It seems that there can end up being holes in the queue
space on mac80211, which slightly complicates matters.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: TXQ real_num_tx_queues comments/questions
[not found] ` <20080715.022854.55875400.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2008-07-15 9:37 ` Johannes Berg
[not found] ` <1216114621.3535.18.camel-YfaajirXv214zXjbi5bjpg@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Johannes Berg @ 2008-07-15 9:37 UTC (permalink / raw)
To: David Miller
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
linux-wireless-u79uwXL29TY76Z2rM5mHXA
[-- Attachment #1: Type: text/plain, Size: 1034 bytes --]
On Tue, 2008-07-15 at 02:28 -0700, David Miller wrote:
> From: Johannes Berg <johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
> Date: Tue, 15 Jul 2008 11:25:23 +0200
>
> > > It seems to me that a simple synchronize_net() call near the end of
> > > agg queue removal would solve your problem as-is, wouldn't it?
> >
> > Well, as far as I can tell we can't do that there because it's in
> > tasklet context/called under spinlocks so it has to be atomic. I think.
>
> Right, my bad.
Also, I don't actually think it is sufficient since select_queue isn't
called under rcu lock, so even if we synchronize_net() somewhere there
we can still end up not having waited long enough, no?
> It seems that there can end up being holes in the queue
> space on mac80211, which slightly complicates matters.
Yes, but why does it complicate matters? From mac80211's POV it's just
that select_queue must have known about the removed queue when the
requeue is done. And we can defer the requeue to a workqueue.
johannes
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: TXQ real_num_tx_queues comments/questions
[not found] ` <1216114621.3535.18.camel-YfaajirXv214zXjbi5bjpg@public.gmane.org>
@ 2008-07-15 9:39 ` David Miller
2008-07-15 9:49 ` Johannes Berg
0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2008-07-15 9:39 UTC (permalink / raw)
To: johannes-cdvu00un1VgdHxzADdlk8Q
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
linux-wireless-u79uwXL29TY76Z2rM5mHXA
From: Johannes Berg <johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
Date: Tue, 15 Jul 2008 11:37:01 +0200
> From mac80211's POV it's just that select_queue must have known
> about the removed queue when the requeue is done. And we can defer
> the requeue to a workqueue.
Ok.
So you're proposing the requeue be deferred to a workqueue
and what else? RCU protection around real_num_tx_queues
access?
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: TXQ real_num_tx_queues comments/questions
2008-07-15 9:39 ` David Miller
@ 2008-07-15 9:49 ` Johannes Berg
[not found] ` <1216115349.3535.20.camel-YfaajirXv214zXjbi5bjpg@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Johannes Berg @ 2008-07-15 9:49 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-wireless
[-- Attachment #1: Type: text/plain, Size: 641 bytes --]
On Tue, 2008-07-15 at 02:39 -0700, David Miller wrote:
> From: Johannes Berg <johannes@sipsolutions.net>
> Date: Tue, 15 Jul 2008 11:37:01 +0200
>
> > From mac80211's POV it's just that select_queue must have known
> > about the removed queue when the requeue is done. And we can defer
> > the requeue to a workqueue.
>
> Ok.
>
> So you're proposing the requeue be deferred to a workqueue
> and what else? RCU protection around real_num_tx_queues
> access?
Yes, I think we can fix it by deferring the requeue to a workqueue,
using synchronize_net() and putting RCU protection against the
select_queue call.
johannes
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: TXQ real_num_tx_queues comments/questions
[not found] ` <1216115349.3535.20.camel-YfaajirXv214zXjbi5bjpg@public.gmane.org>
@ 2008-07-15 10:05 ` David Miller
2008-07-15 10:28 ` Johannes Berg
0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2008-07-15 10:05 UTC (permalink / raw)
To: johannes-cdvu00un1VgdHxzADdlk8Q
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
linux-wireless-u79uwXL29TY76Z2rM5mHXA
From: Johannes Berg <johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
Date: Tue, 15 Jul 2008 11:49:09 +0200
> Yes, I think we can fix it by deferring the requeue to a workqueue,
> using synchronize_net() and putting RCU protection against the
> select_queue call.
I'm respinning my net-tx-2.6 tree at the moment and I'll
make patch 8 look like below.
When I get to the mac80211 bits I'll add the appropriate
synchronize_net() call.
netdev: Add netdev->select_queue() method.
Devices or device layers can set this to control the queue selection
performed by dev_pick_tx().
This function runs under RCU protection, which allows overriding
functions to have some way of synchronizing with things like dynamic
->real_num_tx_queues adjustments.
This makes the spinlock prefetch in dev_queue_xmit() a little bit
less effective, but that's the price right now for correctness.
Signed-off-by: David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fdac115..9464e64 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -724,6 +724,9 @@ struct net_device
void (*poll_controller)(struct net_device *dev);
#endif
+ u16 (*select_queue)(struct net_device *dev,
+ struct sk_buff *skb);
+
#ifdef CONFIG_NET_NS
/* Network namespace this network device is inside */
struct net *nd_net;
diff --git a/net/core/dev.c b/net/core/dev.c
index f027a1a..7ca9564 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1670,6 +1670,9 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev,
{
u16 queue_index = 0;
+ if (dev->select_queue)
+ queue_index = dev->select_queue(dev, skb);
+
skb_set_queue_mapping(skb, queue_index);
return netdev_get_tx_queue(dev, queue_index);
}
@@ -1710,14 +1713,14 @@ int dev_queue_xmit(struct sk_buff *skb)
}
gso:
- txq = dev_pick_tx(dev, skb);
- spin_lock_prefetch(&txq->lock);
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: TXQ real_num_tx_queues comments/questions
2008-07-15 10:05 ` David Miller
@ 2008-07-15 10:28 ` Johannes Berg
0 siblings, 0 replies; 9+ messages in thread
From: Johannes Berg @ 2008-07-15 10:28 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-wireless
[-- Attachment #1: Type: text/plain, Size: 455 bytes --]
On Tue, 2008-07-15 at 03:05 -0700, David Miller wrote:
> From: Johannes Berg <johannes@sipsolutions.net>
> Date: Tue, 15 Jul 2008 11:49:09 +0200
>
> > Yes, I think we can fix it by deferring the requeue to a workqueue,
> > using synchronize_net() and putting RCU protection against the
> > select_queue call.
>
> I'm respinning my net-tx-2.6 tree at the moment and I'll
> make patch 8 look like below.
Looks good to me, thanks.
johannes
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2008-07-15 10:28 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-11 9:47 TXQ real_num_tx_queues comments/questions Johannes Berg
[not found] ` <1215769627.3483.107.camel-YfaajirXv214zXjbi5bjpg@public.gmane.org>
2008-07-15 9:16 ` David Miller
2008-07-15 9:25 ` Johannes Berg
[not found] ` <1216113923.3535.14.camel-YfaajirXv214zXjbi5bjpg@public.gmane.org>
2008-07-15 9:28 ` David Miller
[not found] ` <20080715.022854.55875400.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2008-07-15 9:37 ` Johannes Berg
[not found] ` <1216114621.3535.18.camel-YfaajirXv214zXjbi5bjpg@public.gmane.org>
2008-07-15 9:39 ` David Miller
2008-07-15 9:49 ` Johannes Berg
[not found] ` <1216115349.3535.20.camel-YfaajirXv214zXjbi5bjpg@public.gmane.org>
2008-07-15 10:05 ` David Miller
2008-07-15 10:28 ` Johannes Berg
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).