Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [RFC PATCH 1/2] mac80211: make max_network_latency notifier atomic safe
From: Johannes Berg @ 2010-06-09 12:27 UTC (permalink / raw)
  To: Florian Mickler
  Cc: pm list, james.bottomley-l3A5Bk7waGM,
	markgross-FexmcLCpFRVAfugRpC6u6w, mgross-VuQAYsv1563Yd54FQh9/CA,
	John W. Linville, David S. Miller, Javier Cardona, Jouni Malinen,
	Rui Paulo, Kalle Valo, linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Thomas Gleixner
In-Reply-To: <20100609141643.14e9aedc-mGsOIKOveelVRbCss4o9kg@public.gmane.org>

On Wed, 2010-06-09 at 14:16 +0200, Florian Mickler wrote:

> That was also my first idea, but then I thought about qos and thought
> atomic notification are necessary.
> Do you see any value in having atomic notification? 
> 
> I have the following situation before my eyes:
> 
> 	Driver A gets an interrupt and needs (to service that
> 	interrupt) the cpu to guarantee a latency of X because the
> 	device is a bit icky.
> 
> Now, in that situation, if we don't immediately (without scheduling in
> between) notify the system to be in that latency-mode the driver won't
> function properly. Is this a realistic scene?
> 
> At the moment we only have process context notification and only 2
> listeners.
> 
> I think providing for atomic as well as "relaxed" notification could be
> useful. 
> 
> If atomic notification is deemed unnecessary, I have no
> problems to just use schedule_work() in update request.
> Anyway, it is probably best to split this. I.e. first make
> update_request callable from atomic contexts with doing the
> schedule_work in update_request and then
> as an add on provide for constraints_objects with atomic notifications.

Well I remember http://thread.gmane.org/gmane.linux.kernel/979935 where
Mark renamed things to "request" which seems to imply to me more of a
"please do this" than "I NEED IT NOW!!!!!".

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC PATCH 1/2] mac80211: make max_network_latency notifier atomic safe
From: Florian Mickler @ 2010-06-09 12:16 UTC (permalink / raw)
  To: Johannes Berg
  Cc: pm list, james.bottomley, markgross, mgross, John W. Linville,
	David S. Miller, Javier Cardona, Jouni Malinen, Rui Paulo,
	Kalle Valo, linux-wireless, linux-kernel, netdev, Thomas Gleixner
In-Reply-To: <1276080128.14580.5.camel@jlt3.sipsolutions.net>

On Wed, 09 Jun 2010 12:42:08 +0200
Johannes Berg <johannes@sipsolutions.net> wrote:

> On Wed, 2010-06-09 at 12:20 +0200, Florian Mickler wrote:
> 
> > A third possibility would be to make it dependent on the
> > type of the constraint, if blocking notifiers are allowed or not. 
> > But that would sacrifice API consistency (update_request for one
> > constraint is allowed to be called in interrupt context and
> > update_request for another would be not).
> 
> I don't see what's wrong with the fourth possibility: Allow calling
> pm_qos_update_request() from atomic context, but change _it_ to schedule
> off a work that calls the blocking notifier chain. That avoids the
> complexity in notify-API users since they have process context, and also
> in request-API users since they can call it from any context.
> 
> johannes

That was also my first idea, but then I thought about qos and thought
atomic notification are necessary.
Do you see any value in having atomic
notification? 

I have the following situation before my eyes:

	Driver A gets an interrupt and needs (to service that
	interrupt) the cpu to guarantee a latency of X because the
	device is a bit icky.

Now, in that situation, if we don't immediately (without scheduling in
between) notify the system to be in that latency-mode the driver won't
function properly. Is this a realistic scene?

At the moment we only have process context notification and only 2
listeners.

I think providing for atomic as well as "relaxed" notification could be
useful. 

If atomic notification is deemed unnecessary, I have no
problems to just use schedule_work() in update request.
Anyway, it is probably best to split this. I.e. first make
update_request callable from atomic contexts with doing the
schedule_work in update_request and then
as an add on provide for constraints_objects with atomic notifications.

Flo

^ permalink raw reply

* Re: [PATCH] pkt_sched: gen_kill_estimator() rcu fixes
From: Eric Dumazet @ 2010-06-09 12:09 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Changli Gao, David Miller, netdev, Stephen Hemminger,
	Patrick McHardy
In-Reply-To: <20100609104100.GB7308@ff.dom.local>

Le mercredi 09 juin 2010 à 10:41 +0000, Jarek Poplawski a écrit :

> Spelling suggestion: xt_rateest_free_rcu?
> 
> Since it's only 3 places I'd consider adding little comments,
> who needs this call_rcu (like in qdisc_destroy).
> 
> Anyway this patch looks OK to me.
> 

Thanks for reviewing, you are right about typo and adding some comments.

What about following ?

[PATCH v2] pkt_sched: gen_kill_estimator() rcu fixes

gen_kill_estimator() API is incomplete or not well documented, since
caller should make sure an RCU grace period is respected before
freeing stats_lock.

This was partially addressed in commit 5d944c640b4 
(gen_estimator: deadlock fix), but same problem exist for all
gen_kill_estimator() users, if lock they use is not already RCU
protected.

A code review shows xt_RATEEST.c, act_api.c, act_police.c have this
problem. Other are ok because they use qdisc lock, already RCU
protected.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/act_api.h              |    2 ++
 include/net/netfilter/xt_rateest.h |    1 +
 net/core/gen_estimator.c           |    1 +
 net/netfilter/xt_RATEEST.c         |   12 +++++++++++-
 net/sched/act_api.c                |   11 ++++++++++-
 net/sched/act_police.c             |   12 +++++++++++-
 6 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index c05fd71..bab385f 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -20,6 +20,7 @@ struct tcf_common {
 	struct gnet_stats_queue		tcfc_qstats;
 	struct gnet_stats_rate_est	tcfc_rate_est;
 	spinlock_t			tcfc_lock;
+	struct rcu_head			tcfc_rcu;
 };
 #define tcf_next	common.tcfc_next
 #define tcf_index	common.tcfc_index
@@ -32,6 +33,7 @@ struct tcf_common {
 #define tcf_qstats	common.tcfc_qstats
 #define tcf_rate_est	common.tcfc_rate_est
 #define tcf_lock	common.tcfc_lock
+#define tcf_rcu		common.tcfc_rcu
 
 struct tcf_police {
 	struct tcf_common	common;
diff --git a/include/net/netfilter/xt_rateest.h b/include/net/netfilter/xt_rateest.h
index ddbf37e..5e14277 100644
--- a/include/net/netfilter/xt_rateest.h
+++ b/include/net/netfilter/xt_rateest.h
@@ -9,6 +9,7 @@ struct xt_rateest {
 	struct gnet_estimator		params;
 	struct gnet_stats_rate_est	rstats;
 	struct gnet_stats_basic_packed	bstats;
+	struct rcu_head			rcu;
 };
 
 extern struct xt_rateest *xt_rateest_lookup(const char *name);
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 785e527..9fbe7f7 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -263,6 +263,7 @@ static void __gen_kill_estimator(struct rcu_head *head)
  *
  * Removes the rate estimator specified by &bstats and &rate_est.
  *
+ * Note : Caller should respect an RCU grace period before freeing stats_lock
  */
 void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
 			struct gnet_stats_rate_est *rate_est)
diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c
index 69c01e1..de079ab 100644
--- a/net/netfilter/xt_RATEEST.c
+++ b/net/netfilter/xt_RATEEST.c
@@ -60,13 +60,22 @@ struct xt_rateest *xt_rateest_lookup(const char *name)
 }
 EXPORT_SYMBOL_GPL(xt_rateest_lookup);
 
+static void xt_rateest_free_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct xt_rateest, rcu));
+}
+
 void xt_rateest_put(struct xt_rateest *est)
 {
 	mutex_lock(&xt_rateest_mutex);
 	if (--est->refcnt == 0) {
 		hlist_del(&est->list);
 		gen_kill_estimator(&est->bstats, &est->rstats);
-		kfree(est);
+		/*
+		 * gen_estimator est_timer() might access est->lock or bstats,
+		 * wait a RCU grace period before freeing 'est'
+		 */
+		call_rcu(&est->rcu, xt_rateest_free_rcu);
 	}
 	mutex_unlock(&xt_rateest_mutex);
 }
@@ -179,6 +188,7 @@ static int __init xt_rateest_tg_init(void)
 static void __exit xt_rateest_tg_fini(void)
 {
 	xt_unregister_target(&xt_rateest_tg_reg);
+	rcu_barrier(); /* Wait for completion of call_rcu()'s (xt_rateest_free_rcu) */
 }
 
 
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 972378f..23b25f8 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -26,6 +26,11 @@
 #include <net/act_api.h>
 #include <net/netlink.h>
 
+static void tcf_common_free_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct tcf_common, tcfc_rcu));
+}
+
 void tcf_hash_destroy(struct tcf_common *p, struct tcf_hashinfo *hinfo)
 {
 	unsigned int h = tcf_hash(p->tcfc_index, hinfo->hmask);
@@ -38,7 +43,11 @@ void tcf_hash_destroy(struct tcf_common *p, struct tcf_hashinfo *hinfo)
 			write_unlock_bh(hinfo->lock);
 			gen_kill_estimator(&p->tcfc_bstats,
 					   &p->tcfc_rate_est);
-			kfree(p);
+			/*
+			 * gen_estimator est_timer() might access p->tcfc_lock
+			 * or bstats, wait a RCU grace period before freeing p
+			 */
+			call_rcu(&p->tcfc_rcu, tcf_common_free_rcu);
 			return;
 		}
 	}
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 654f73d..537a487 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -97,6 +97,11 @@ nla_put_failure:
 	goto done;
 }
 
+static void tcf_police_free_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct tcf_police, tcf_rcu));
+}
+
 static void tcf_police_destroy(struct tcf_police *p)
 {
 	unsigned int h = tcf_hash(p->tcf_index, POL_TAB_MASK);
@@ -113,7 +118,11 @@ static void tcf_police_destroy(struct tcf_police *p)
 				qdisc_put_rtab(p->tcfp_R_tab);
 			if (p->tcfp_P_tab)
 				qdisc_put_rtab(p->tcfp_P_tab);
-			kfree(p);
+			/*
+			 * gen_estimator est_timer() might access p->tcf_lock
+			 * or bstats, wait a RCU grace period before freeing p
+			 */
+			call_rcu(&p->tcf_rcu, tcf_police_free_rcu);
 			return;
 		}
 	}
@@ -397,6 +406,7 @@ static void __exit
 police_cleanup_module(void)
 {
 	tcf_unregister_action(&act_police_ops);
+	rcu_barrier(); /* Wait for completion of call_rcu()'s (tcf_police_free_rcu) */
 }
 
 module_init(police_init_module);



^ permalink raw reply related

* Re: [PATCH net-2.6 v2] pkt_sched: gen_estimator: add a new lock
From: Eric Dumazet @ 2010-06-09 11:55 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Changli Gao, David Miller, netdev, Stephen Hemminger,
	Patrick McHardy
In-Reply-To: <20100609113329.GC7308@ff.dom.local>

Le mercredi 09 juin 2010 à 11:33 +0000, Jarek Poplawski a écrit :
> On Wed, Jun 09, 2010 at 11:39:10AM +0200, Eric Dumazet wrote:
> ...
> > Here is v2 of patch, adding protection in gen_estimator_active() as
> > well.
> > 
> > [PATCH net-2.6 v2] pkt_sched: gen_estimator: add a new lock
> > 
> > gen_kill_estimator() / gen_new_estimator() is not always called with
> > RTNL held.
> > 
> > net/netfilter/xt_RATEEST.c is one user of these API that do not hold
> > RTNL, so random corruptions can occur between "tc" and "iptables".
> > 
> > Add a new fine grained lock instead of trying to use RTNL in netfilter.
> 
> Btw, maybe it's a different case, but RTNL happens in netfilter already
> (nf_conntrack_helper.c, nf_conntrack_proto.c).
> 
> It would be nice to mention Changli's argument that
> gen_replace_estimator isn't atomic operation after this change.
> 

See my next comment. This function is still used in qdisc domain only.
We could add ASSERT_RTNL() here, but its not a bug fix...

> > @@ -312,8 +315,14 @@ EXPORT_SYMBOL(gen_replace_estimator);
> >  bool gen_estimator_active(const struct gnet_stats_basic_packed *bstats,
> >  			  const struct gnet_stats_rate_est *rate_est)
> >  {
> > +	bool res;
> > +
> >  	ASSERT_RTNL();
> 
> This line should go away as well.
> 

Nope. This is really needed for safety. Of course, its a debugging knob
and you can remove it if you trust all callers to be good.

Caller should still use RTNL here, or else, as soon as we exit from
gen_estimator_active() with a 'true' result, some other qdisc user could
destroy the estimator.

Fortunatly up to 2.6.35, xt_RATEEST cannot reuse/delete an estimator
created by a qdisc user, and it doesnt use gen_estimator_active().

In a future patch, we should add RCU safety here instead of ASSERT_RTNL.
Thats to make sure caller of gen_estimator_active() is inside a
rcu_read_lock() section, and estimator cannot disappear under it, even
if another cpu/thread calls gen_kill_estimator() on same estimator.

^ permalink raw reply

* Re: [PATCH net-2.6 v2] pkt_sched: gen_estimator: add a new lock
From: Jarek Poplawski @ 2010-06-09 11:33 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Changli Gao, David Miller, netdev, Stephen Hemminger,
	Patrick McHardy
In-Reply-To: <1276076350.2442.90.camel@edumazet-laptop>

On Wed, Jun 09, 2010 at 11:39:10AM +0200, Eric Dumazet wrote:
...
> Here is v2 of patch, adding protection in gen_estimator_active() as
> well.
> 
> [PATCH net-2.6 v2] pkt_sched: gen_estimator: add a new lock
> 
> gen_kill_estimator() / gen_new_estimator() is not always called with
> RTNL held.
> 
> net/netfilter/xt_RATEEST.c is one user of these API that do not hold
> RTNL, so random corruptions can occur between "tc" and "iptables".
> 
> Add a new fine grained lock instead of trying to use RTNL in netfilter.

Btw, maybe it's a different case, but RTNL happens in netfilter already
(nf_conntrack_helper.c, nf_conntrack_proto.c).

It would be nice to mention Changli's argument that
gen_replace_estimator isn't atomic operation after this change.

> @@ -312,8 +315,14 @@ EXPORT_SYMBOL(gen_replace_estimator);
>  bool gen_estimator_active(const struct gnet_stats_basic_packed *bstats,
>  			  const struct gnet_stats_rate_est *rate_est)
>  {
> +	bool res;
> +
>  	ASSERT_RTNL();

This line should go away as well.

I'm OK with this patch unless there is an alternative xt_RATEEST RTNL
fix available.

Jarek P.

^ permalink raw reply

* Re: [v2 Patch 2/2] mlx4: add dynamic LRO disable support
From: Stanislaw Gruszka @ 2010-06-09 11:05 UTC (permalink / raw)
  To: Amerigo Wang, David Miller
  Cc: netdev, nhorman, herbert.xu, bhutchings, Ramkrishna.Vepa, davem
In-Reply-To: <20100609100938.6573.73536.sendpatchset@localhost.localdomain>

On Wed, Jun 09, 2010 at 06:05:34AM -0400, Amerigo Wang wrote:
> diff --git a/drivers/net/mlx4/en_ethtool.c b/drivers/net/mlx4/en_ethtool.c
> index d5afd03..2c77805 100644
> --- a/drivers/net/mlx4/en_ethtool.c
> +++ b/drivers/net/mlx4/en_ethtool.c
> @@ -387,6 +387,37 @@ static void mlx4_en_get_ringparam(struct net_device *dev,
>  	param->tx_pending = mdev->profile.prof[priv->port].tx_ring_size;
>  }
>  
> +static int mlx4_ethtool_op_set_flags(struct net_device *dev, u32 data)
> +{
> +	struct mlx4_en_priv *priv = netdev_priv(dev);
> +	struct mlx4_en_dev *mdev = priv->mdev;
> +	int rc = 0;
> +	int changed = 0;
> +
> +	if (data & ETH_FLAG_LRO) {
> +		if (!(dev->features & NETIF_F_LRO)) {
> +			dev->features |= NETIF_F_LRO;
> +			changed = 1;
> +			mdev->profile.num_lro = min_t(int, num_lro , MLX4_EN_MAX_LRO_DESCRIPTORS);

I do not understand why you override mdev->profile.num_lro in v2 patch.
If in Rx patch NETIF_F_LRO flag is used we do not need this IMHO.

> +		}
> +	} else if (dev->features & NETIF_F_LRO) {
> +		dev->features &= ~NETIF_F_LRO;
> +		changed = 1;
> +		mdev->profile.num_lro = 0;
> +	}
[snip]
>  const struct ethtool_ops mlx4_en_ethtool_ops = {
>  	.get_drvinfo = mlx4_en_get_drvinfo,
>  	.get_settings = mlx4_en_get_settings,
> @@ -415,7 +446,7 @@ const struct ethtool_ops mlx4_en_ethtool_ops = {
>  	.get_ringparam = mlx4_en_get_ringparam,
>  	.set_ringparam = mlx4_en_set_ringparam,
>  	.get_flags = ethtool_op_get_flags,
> -	.set_flags = ethtool_op_set_flags,
> +	.set_flags = mlx4_ethtool_op_set_flags,
>  };

Since we modify .set_flags, please assure we return -EOPNOTSUPP
if someone will try to setup ETH_FLAG_NTUPLE and ETH_FLAG_RXHASH.

BTW: seems default ethtool_op_set_flags introduce a bug on many
devices regarding ETH_FLAG_RXHASH. I think default should
be EOPNOTSUPP, and these few devices that actually support RXHASH
should have custom ethtool_ops->set_flags

Stanislaw

^ permalink raw reply

* Re: [Patch 2/2] mlx4: add dynamic LRO disable support
From: Stanislaw Gruszka @ 2010-06-09 10:49 UTC (permalink / raw)
  To: Cong Wang; +Cc: Ben Hutchings, netdev, herbert.xu, nhorman, davem
In-Reply-To: <4C0F5D97.5030605@redhat.com>

Hi Amerigo

Sorry for being silent in this thread before.

On Wed, Jun 09, 2010 at 05:23:35PM +0800, Cong Wang wrote:
>>>> Is that flag test actually unsafe - and if so, how is testing num_lro
>>>> any better?  Perhaps access to net_device::features should be wrapped
>>>> with ACCESS_ONCE() to ensure that reads and writes are atomic.
>>>>
>>>
>>> At least, I don't find there is any race with 'num_lro', thus
>>> no lock is needed.
>>
>> In both cases there is a race condition but it is harmless so long as
>> the read and the write are atomic.  There is a general assumption in
>> networking code that this is the case for int and long.  Personally I
>> would prefer to see this made explicit using ACCESS_ONCE(), but I don't
>> see any specific problem in mlx4 (not that I'm familiar with this driver
>> either).
>
> I read this email again.
>
> I think you misunderstood the race condition here. Even read and write
> are atomic here, the race still exists. One can just set NETIF_F_LRO
> asynchronously right before mlx4 check this flag in mlx4_en_process_rx_cq()
> which doesn't take rtnl_lock.

If so, it's better to stop device before modify LRO settings. I suggest
something like that in mlx4_ethtool_op_set_flags:

if (!!(data & ETH_FLAG_LRO) != !!(dev->features & NETIF_F_LRO)) {
	/* Need to toggle LRO */

	if (netdev_running(dev)) {
               mutex_lock(&mdev->state_lock);
               mlx4_en_stop_port(dev);
               rc = mlx4_en_start_port(dev);
               if (rc)
                       en_err(priv, "Failed to restart port\n");
	}

	dev->features ^= NETIF_F_LRO;

	if (netdev_running(dev))
               mutex_unlock(&mdev->state_lock);
}

Stanislaw

^ permalink raw reply

* Re: [RFC PATCH 1/2] mac80211: make max_network_latency notifier atomic safe
From: Johannes Berg @ 2010-06-09 10:42 UTC (permalink / raw)
  To: Florian Mickler
  Cc: pm list, james.bottomley, markgross, mgross, John W. Linville,
	David S. Miller, Javier Cardona, Jouni Malinen, Rui Paulo,
	Kalle Valo, linux-wireless, linux-kernel, netdev, Thomas Gleixner
In-Reply-To: <20100609122050.1dd18132@schatten.dmk.lab>

On Wed, 2010-06-09 at 12:20 +0200, Florian Mickler wrote:
> On Wed, 09 Jun 2010 11:38:07 +0200
> Johannes Berg <johannes@sipsolutions.net> wrote:
> 
> > On Wed, 2010-06-09 at 11:15 +0200, florian@mickler.org wrote:
> > > In order to have the pm_qos framework be callable from interrupt
> > > context, all listeners have to also be callable in that context.
> > 
> > That makes no sense at all. Why add work structs _everywhere_ in the
> > callees and make the API harder to use and easy to get wrong completely,
> > instead of just adding a single work struct that will be queued from the
> > caller and dealing with the locking complexity etc. just once.

> There are only two listeners at the moment. I suspect that most future
> uses of the framework need to be atomic, as the driver that
> requests a specific quality of service probably doesn't want to get into
> races with the provider of that service(listener). So i suspected the
> network listener to be the special case.  

Well even if it doesn't _want_ to race with it, a lot of drivers like
USB drivers etc. can't really do anything without deferring to a
workqueue.

And what's the race anyway? You get one update, defer the work, and if
another update happens inbetween you just read the new value when the
work finally runs -- and you end up doing it only once instead of twice.
That doesn't seem like a problem.

> The race between service-provider and qos-requester for non-atomic
> contextes is already there, isn't it? so, locking complexity shouldn't
> be worse than before.

I have no idea how it works now? I thought you can't request an update
from an atomic context.

However, if you request a QoS value, it is fundamentally that -- a
request. There's no guarantee as to when or how it will be honoured.

> But my first approach to this is seen here:
> https://lists.linux-foundation.org/pipermail/linux-pm/2010-June/026902.html

Icky too.

> A third possibility would be to make it dependent on the
> type of the constraint, if blocking notifiers are allowed or not. 
> But that would sacrifice API consistency (update_request for one
> constraint is allowed to be called in interrupt context and
> update_request for another would be not).

I don't see what's wrong with the fourth possibility: Allow calling
pm_qos_update_request() from atomic context, but change _it_ to schedule
off a work that calls the blocking notifier chain. That avoids the
complexity in notify-API users since they have process context, and also
in request-API users since they can call it from any context.

johannes

^ permalink raw reply

* Re: [PATCH] pkt_sched: gen_kill_estimator() rcu fixes
From: Jarek Poplawski @ 2010-06-09 10:41 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Changli Gao, David Miller, netdev, Stephen Hemminger,
	Patrick McHardy
In-Reply-To: <1276077416.2442.97.camel@edumazet-laptop>

On Wed, Jun 09, 2010 at 11:56:56AM +0200, Eric Dumazet wrote:
...
> Updated patch follows :
> 
> [PATCH] pkt_sched: gen_kill_estimator() rcu fixes
> 
> gen_kill_estimator() API is a bit misleading, since caller

I'd call it incomplete or simply buggy. Plus a tiny note below.

> should make sure an RCU grace period is respected before
> freeing stats_lock.
> 
> This was partially addressed in commit 5d944c640b4 
> (gen_estimator: deadlock fix), but same problem exist for all
> gen_kill_estimator() users, if lock they use is not already rcu
> protected.
> 
> A code review shows xt_RATEEST.c, act_api.c, act_police.c have this
> problem. Other are ok because they use qdisc lock.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
...
> diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c
> index 69c01e1..096b18b 100644
> --- a/net/netfilter/xt_RATEEST.c
> +++ b/net/netfilter/xt_RATEEST.c
> @@ -60,13 +60,18 @@ struct xt_rateest *xt_rateest_lookup(const char *name)
>  }
>  EXPORT_SYMBOL_GPL(xt_rateest_lookup);
>  
> +static void xt_rateeset_free_rcu(struct rcu_head *head)
> +{
> +	kfree(container_of(head, struct xt_rateest, rcu));
> +}
> +
>  void xt_rateest_put(struct xt_rateest *est)
>  {
>  	mutex_lock(&xt_rateest_mutex);
>  	if (--est->refcnt == 0) {
>  		hlist_del(&est->list);
>  		gen_kill_estimator(&est->bstats, &est->rstats);
> -		kfree(est);
> +		call_rcu(&est->rcu, xt_rateeset_free_rcu);

Spelling suggestion: xt_rateest_free_rcu?

Since it's only 3 places I'd consider adding little comments,
who needs this call_rcu (like in qdisc_destroy).

Anyway this patch looks OK to me.

Jarek P.

^ permalink raw reply

* Re: [RFC PATCH 1/2] mac80211: make max_network_latency notifier atomic safe
From: Florian Mickler @ 2010-06-09 10:20 UTC (permalink / raw)
  To: Johannes Berg
  Cc: pm list, james.bottomley, markgross, mgross, John W. Linville,
	David S. Miller, Javier Cardona, Jouni Malinen, Rui Paulo,
	Kalle Valo, linux-wireless, linux-kernel, netdev, Thomas Gleixner
In-Reply-To: <1276076287.3727.15.camel@jlt3.sipsolutions.net>

On Wed, 09 Jun 2010 11:38:07 +0200
Johannes Berg <johannes@sipsolutions.net> wrote:

> On Wed, 2010-06-09 at 11:15 +0200, florian@mickler.org wrote:
> > In order to have the pm_qos framework be callable from interrupt
> > context, all listeners have to also be callable in that context.
> 
> That makes no sense at all. Why add work structs _everywhere_ in the
> callees and make the API harder to use and easy to get wrong completely,
> instead of just adding a single work struct that will be queued from the
> caller and dealing with the locking complexity etc. just once.
> 
> johannes

Just to defend this approach, but I'm certainly not married to it
(hence RFC):

There are only two listeners at the moment. I suspect that most future
uses of the framework need to be atomic, as the driver that
requests a specific quality of service probably doesn't want to get into
races with the provider of that service(listener). So i suspected the
network listener to be the special case.  

The race between service-provider and qos-requester for non-atomic
contextes is already there, isn't it? so, locking complexity shouldn't
be worse than before.

But my first approach to this is seen here:
https://lists.linux-foundation.org/pipermail/linux-pm/2010-June/026902.html

A third possibility would be to make it dependent on the
type of the constraint, if blocking notifiers are allowed or not. 
But that would sacrifice API consistency (update_request for one
constraint is allowed to be called in interrupt context and
update_request for another would be not).

Cheers,
Flo

^ permalink raw reply

* [v2 Patch 2/2] mlx4: add dynamic LRO disable support
From: Amerigo Wang @ 2010-06-09 10:05 UTC (permalink / raw)
  To: netdev
  Cc: nhorman, sgruszka, herbert.xu, Amerigo Wang, bhutchings,
	Ramkrishna.Vepa, davem
In-Reply-To: <20100609100928.6573.14199.sendpatchset@localhost.localdomain>

This patch adds dynamic LRO diable support for mlx4 net driver.
It also fixes a bug of mlx4, which checks NETIF_F_LRO flag in rx
path without rtnl lock.

I don't have mlx4 card, so only did compiling test. Anyone who wants
to test this is more than welcome.

This is based on Neil's initial work too.

Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Neil Horman <nhorman@redhat.com>
Acked-by: Neil Horman <nhorman@redhat.com>
Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>

---

diff --git a/drivers/net/mlx4/en_ethtool.c b/drivers/net/mlx4/en_ethtool.c
index d5afd03..2c77805 100644
--- a/drivers/net/mlx4/en_ethtool.c
+++ b/drivers/net/mlx4/en_ethtool.c
@@ -387,6 +387,37 @@ static void mlx4_en_get_ringparam(struct net_device *dev,
 	param->tx_pending = mdev->profile.prof[priv->port].tx_ring_size;
 }
 
+static int mlx4_ethtool_op_set_flags(struct net_device *dev, u32 data)
+{
+	struct mlx4_en_priv *priv = netdev_priv(dev);
+	struct mlx4_en_dev *mdev = priv->mdev;
+	int rc = 0;
+	int changed = 0;
+
+	if (data & ETH_FLAG_LRO) {
+		if (!(dev->features & NETIF_F_LRO)) {
+			dev->features |= NETIF_F_LRO;
+			changed = 1;
+			mdev->profile.num_lro = min_t(int, num_lro , MLX4_EN_MAX_LRO_DESCRIPTORS);
+		}
+	} else if (dev->features & NETIF_F_LRO) {
+		dev->features &= ~NETIF_F_LRO;
+		changed = 1;
+		mdev->profile.num_lro = 0;
+	}
+
+	if (changed && netif_running(dev)) {
+		mutex_lock(&mdev->state_lock);
+		mlx4_en_stop_port(dev);
+		rc = mlx4_en_start_port(dev);
+		if (rc)
+			en_err(priv, "Failed to restart port\n");
+		mutex_unlock(&mdev->state_lock);
+	}
+
+	return rc;
+}
+
 const struct ethtool_ops mlx4_en_ethtool_ops = {
 	.get_drvinfo = mlx4_en_get_drvinfo,
 	.get_settings = mlx4_en_get_settings,
@@ -415,7 +446,7 @@ const struct ethtool_ops mlx4_en_ethtool_ops = {
 	.get_ringparam = mlx4_en_get_ringparam,
 	.set_ringparam = mlx4_en_set_ringparam,
 	.get_flags = ethtool_op_get_flags,
-	.set_flags = ethtool_op_set_flags,
+	.set_flags = mlx4_ethtool_op_set_flags,
 };
 
 
diff --git a/drivers/net/mlx4/en_main.c b/drivers/net/mlx4/en_main.c
index cbabf14..4ef42e2 100644
--- a/drivers/net/mlx4/en_main.c
+++ b/drivers/net/mlx4/en_main.c
@@ -70,8 +70,9 @@ MLX4_EN_PARM_INT(rss_xor, 0, "Use XOR hash function for RSS");
 MLX4_EN_PARM_INT(rss_mask, 0xf, "RSS hash type bitmask");
 
 /* Number of LRO sessions per Rx ring (rounded up to a power of two) */
-MLX4_EN_PARM_INT(num_lro, MLX4_EN_MAX_LRO_DESCRIPTORS,
-		 "Number of LRO sessions per ring or disabled (0)");
+unsigned int num_lro = MLX4_EN_MAX_LRO_DESCRIPTORS;
+module_param(num_lro , uint, 0444);
+MODULE_PARM_DESC(num_lro, "Number of LRO sessions per ring or disabled (0)");
 
 /* Priority pausing */
 MLX4_EN_PARM_INT(pfctx, 0, "Priority based Flow Control policy on TX[7:0]."
diff --git a/drivers/net/mlx4/mlx4_en.h b/drivers/net/mlx4/mlx4_en.h
index b55e46c..bbac975 100644
--- a/drivers/net/mlx4/mlx4_en.h
+++ b/drivers/net/mlx4/mlx4_en.h
@@ -568,4 +568,6 @@ int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset);
  * Globals
  */
 extern const struct ethtool_ops mlx4_en_ethtool_ops;
+
+extern unsigned int num_lro;
 #endif

^ permalink raw reply related

* [v2 Patch 1/2] s2io: add dynamic LRO disable support
From: Amerigo Wang @ 2010-06-09 10:05 UTC (permalink / raw)
  To: netdev
  Cc: nhorman, sgruszka, herbert.xu, Amerigo Wang, bhutchings,
	Ramkrishna.Vepa, davem


This patch adds dynamic LRO diable support for s2io net driver.

I don't have s2io card, so only did compiling test. Anyone who wants
to test this is more than welcome.

This is based on Neil's initial work.

Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Neil Horman <nhorman@redhat.com>
Acked-by: Neil Horman <nhorman@redhat.com>
Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Ramkrishna Vepa <Ramkrishna.Vepa@exar.com>

---

diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
index 668327c..e558aa7 100644
--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -795,7 +795,6 @@ static int init_shared_mem(struct s2io_nic *nic)
 		ring->rx_curr_put_info.ring_len = rx_cfg->num_rxd - 1;
 		ring->nic = nic;
 		ring->ring_no = i;
-		ring->lro = lro_enable;
 
 		blk_cnt = rx_cfg->num_rxd / (rxd_count[nic->rxd_mode] + 1);
 		/*  Allocating all the Rx blocks */
@@ -6684,6 +6683,35 @@ static int s2io_ethtool_op_set_tso(struct net_device *dev, u32 data)
 
 	return 0;
 }
+static int s2io_ethtool_set_flags(struct net_device *dev, u32 data)
+{
+	struct s2io_nic *sp = netdev_priv(dev);
+	int rc = 0;
+	int changed = 0;
+
+	if (data & ETH_FLAG_LRO) {
+		if (lro_enable) {
+			if (!(dev->features & NETIF_F_LRO)) {
+				dev->features |= NETIF_F_LRO;
+				changed = 1;
+			}
+		} else
+			rc = -EINVAL;
+	} else if (dev->features & NETIF_F_LRO) {
+		dev->features &= ~NETIF_F_LRO;
+		changed = 1;
+	}
+
+	if (changed && netif_running(dev)) {
+		s2io_stop_all_tx_queue(sp);
+		s2io_card_down(sp);
+		sp->lro = dev->features & NETIF_F_LRO;
+		rc = s2io_card_up(sp);
+		s2io_start_all_tx_queue(sp);
+	}
+
+	return rc;
+}
 
 static const struct ethtool_ops netdev_ethtool_ops = {
 	.get_settings = s2io_ethtool_gset,
@@ -6701,6 +6729,8 @@ static const struct ethtool_ops netdev_ethtool_ops = {
 	.get_rx_csum = s2io_ethtool_get_rx_csum,
 	.set_rx_csum = s2io_ethtool_set_rx_csum,
 	.set_tx_csum = s2io_ethtool_op_set_tx_csum,
+	.set_flags = s2io_ethtool_set_flags,
+	.get_flags = ethtool_op_get_flags,
 	.set_sg = ethtool_op_set_sg,
 	.get_tso = s2io_ethtool_op_get_tso,
 	.set_tso = s2io_ethtool_op_set_tso,
@@ -7229,6 +7259,7 @@ static int s2io_card_up(struct s2io_nic *sp)
 		struct ring_info *ring = &mac_control->rings[i];
 
 		ring->mtu = dev->mtu;
+		ring->lro = sp->lro;
 		ret = fill_rx_buffers(sp, ring, 1);
 		if (ret) {
 			DBG_PRINT(ERR_DBG, "%s: Out of memory in Open\n",

^ permalink raw reply related

* Re: [PATCH] pkt_sched: gen_kill_estimator() rcu fixes
From: Eric Dumazet @ 2010-06-09  9:56 UTC (permalink / raw)
  To: Changli Gao, David Miller
  Cc: netdev, Stephen Hemminger, Jarek Poplawski, Patrick McHardy
In-Reply-To: <1276076415.2442.92.camel@edumazet-laptop>



Oops, I missed  kfree() deletion in  :


>  static void tcf_police_destroy(struct tcf_police *p)
>  {
>  	unsigned int h = tcf_hash(p->tcf_index, POL_TAB_MASK);
> @@ -113,6 +118,7 @@ static void tcf_police_destroy(struct tcf_police *p)
>  				qdisc_put_rtab(p->tcfp_R_tab);
>  			if (p->tcfp_P_tab)
>  				qdisc_put_rtab(p->tcfp_P_tab);
> +			call_rcu(&p->tcf_rcu, tcf_police_free_rcu);
>  			kfree(p);
>  			return;

Updated patch follows :

[PATCH] pkt_sched: gen_kill_estimator() rcu fixes

gen_kill_estimator() API is a bit misleading, since caller
should make sure an RCU grace period is respected before
freeing stats_lock.

This was partially addressed in commit 5d944c640b4 
(gen_estimator: deadlock fix), but same problem exist for all
gen_kill_estimator() users, if lock they use is not already rcu
protected.

A code review shows xt_RATEEST.c, act_api.c, act_police.c have this
problem. Other are ok because they use qdisc lock.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/act_api.h              |    2 ++
 include/net/netfilter/xt_rateest.h |    1 +
 net/core/gen_estimator.c           |    1 +
 net/netfilter/xt_RATEEST.c         |    8 +++++++-
 net/sched/act_api.c                |    7 ++++++-
 net/sched/act_police.c             |    8 +++++++-
 6 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index c05fd71..bab385f 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -20,6 +20,7 @@ struct tcf_common {
 	struct gnet_stats_queue		tcfc_qstats;
 	struct gnet_stats_rate_est	tcfc_rate_est;
 	spinlock_t			tcfc_lock;
+	struct rcu_head			tcfc_rcu;
 };
 #define tcf_next	common.tcfc_next
 #define tcf_index	common.tcfc_index
@@ -32,6 +33,7 @@ struct tcf_common {
 #define tcf_qstats	common.tcfc_qstats
 #define tcf_rate_est	common.tcfc_rate_est
 #define tcf_lock	common.tcfc_lock
+#define tcf_rcu		common.tcfc_rcu
 
 struct tcf_police {
 	struct tcf_common	common;
diff --git a/include/net/netfilter/xt_rateest.h b/include/net/netfilter/xt_rateest.h
index ddbf37e..5e14277 100644
--- a/include/net/netfilter/xt_rateest.h
+++ b/include/net/netfilter/xt_rateest.h
@@ -9,6 +9,7 @@ struct xt_rateest {
 	struct gnet_estimator		params;
 	struct gnet_stats_rate_est	rstats;
 	struct gnet_stats_basic_packed	bstats;
+	struct rcu_head			rcu;
 };
 
 extern struct xt_rateest *xt_rateest_lookup(const char *name);
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 785e527..9fbe7f7 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -263,6 +263,7 @@ static void __gen_kill_estimator(struct rcu_head *head)
  *
  * Removes the rate estimator specified by &bstats and &rate_est.
  *
+ * Note : Caller should respect an RCU grace period before freeing stats_lock
  */
 void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
 			struct gnet_stats_rate_est *rate_est)
diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c
index 69c01e1..096b18b 100644
--- a/net/netfilter/xt_RATEEST.c
+++ b/net/netfilter/xt_RATEEST.c
@@ -60,13 +60,18 @@ struct xt_rateest *xt_rateest_lookup(const char *name)
 }
 EXPORT_SYMBOL_GPL(xt_rateest_lookup);
 
+static void xt_rateeset_free_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct xt_rateest, rcu));
+}
+
 void xt_rateest_put(struct xt_rateest *est)
 {
 	mutex_lock(&xt_rateest_mutex);
 	if (--est->refcnt == 0) {
 		hlist_del(&est->list);
 		gen_kill_estimator(&est->bstats, &est->rstats);
-		kfree(est);
+		call_rcu(&est->rcu, xt_rateeset_free_rcu);
 	}
 	mutex_unlock(&xt_rateest_mutex);
 }
@@ -179,6 +184,7 @@ static int __init xt_rateest_tg_init(void)
 static void __exit xt_rateest_tg_fini(void)
 {
 	xt_unregister_target(&xt_rateest_tg_reg);
+	rcu_barrier(); /* Wait for completion of call_rcu()'s */
 }
 
 
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 972378f..0e6fd2b 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -26,6 +26,11 @@
 #include <net/act_api.h>
 #include <net/netlink.h>
 
+static void tcf_common_free_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct tcf_common, tcfc_rcu));
+}
+
 void tcf_hash_destroy(struct tcf_common *p, struct tcf_hashinfo *hinfo)
 {
 	unsigned int h = tcf_hash(p->tcfc_index, hinfo->hmask);
@@ -38,7 +43,7 @@ void tcf_hash_destroy(struct tcf_common *p, struct tcf_hashinfo *hinfo)
 			write_unlock_bh(hinfo->lock);
 			gen_kill_estimator(&p->tcfc_bstats,
 					   &p->tcfc_rate_est);
-			kfree(p);
+			call_rcu(&p->tcfc_rcu, tcf_common_free_rcu);
 			return;
 		}
 	}
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 654f73d..04929ce 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -97,6 +97,11 @@ nla_put_failure:
 	goto done;
 }
 
+static void tcf_police_free_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct tcf_police, tcf_rcu));
+}
+
 static void tcf_police_destroy(struct tcf_police *p)
 {
 	unsigned int h = tcf_hash(p->tcf_index, POL_TAB_MASK);
@@ -113,7 +118,7 @@ static void tcf_police_destroy(struct tcf_police *p)
 				qdisc_put_rtab(p->tcfp_R_tab);
 			if (p->tcfp_P_tab)
 				qdisc_put_rtab(p->tcfp_P_tab);
-			kfree(p);
+			call_rcu(&p->tcf_rcu, tcf_police_free_rcu);
 			return;
 		}
 	}
@@ -397,6 +402,7 @@ static void __exit
 police_cleanup_module(void)
 {
 	tcf_unregister_action(&act_police_ops);
+	rcu_barrier(); /* Wait for completion of call_rcu()'s */
 }
 
 module_init(police_init_module);



^ permalink raw reply related

* RE: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
From: Xin, Xiaohui @ 2010-06-09  9:54 UTC (permalink / raw)
  To: Herbert Xu, Stephen Hemminger
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mst@redhat.com, mingo@elte.hu,
	davem@davemloft.net, jdike@linux.intel.com
In-Reply-To: <20100608052744.GA21547@gondor.apana.org.au>

>-----Original Message-----
>From: Herbert Xu [mailto:herbert@gondor.apana.org.au]
>Sent: Tuesday, June 08, 2010 1:28 PM
>To: Stephen Hemminger
>Cc: Xin, Xiaohui; netdev@vger.kernel.org; kvm@vger.kernel.org;
>linux-kernel@vger.kernel.org; mst@redhat.com; mingo@elte.hu; davem@davemloft.net;
>jdike@linux.intel.com
>Subject: Re: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
>
>On Sun, Jun 06, 2010 at 04:13:48PM -0700, Stephen Hemminger wrote:
>> Still not sure this is a good idea for a couple of reasons:
>>
>> 1. We already have lots of special cases with skb's (frags and fraglist),
>>    and skb's travel through a lot of different parts of the kernel.  So any
>>    new change like this creates lots of exposed points for new bugs. Look
>>    at cases like MD5 TCP and netfilter, and forwarding these SKB's to ipsec
>>    and ppp and ...
>>
>> 2. SKB's can have infinite lifetime in the kernel. If these buffers come from
>>    a fixed size pool in an external device, they can easily all get tied up
>>    if you have a slow listener. What happens then?
>
>I agree with Stephen on this.
>
>FWIW I don't think we even need the external pages concept in
>order to implement zero-copy receive (which I gather is the intent
>here).
>
>Here is one way to do it, simply construct a completely non-linear
>packet in the driver, as you would if you were using the GRO frags
>interface (grep for napi_gro_frags under drivers/net for examples).
>
I'm not sure if I understand your way correctly:
1) Does the way only deal with driver with SG feature? Since packet 
is non-linear...

2) Is skb->data still pointing to guest user buffers?
If yes, how to avoid the modifications to net core change to skb?

3) In our way only parts of drivers need be modified to support zero-copy.
and here, need we modify all the drivers?

If I missed your idea, may you explain it in more detail?

>This way you can transfer the entire contents of the packet without
>copying through to the other side, provided that the host stack does
>not modify the packet.
>

>If the host side did modify the packet then we have to incur the
>memory cost anyway.
>
>IOW I think the only feature provided by the external pages
>construct is allowing the skb->head area to be shared without
>copying.  I'm claiming that this can be done by simply making
>skb->head empty.
>
I think to make skb->head empty at first will cause more effort to pass the check with 
skb header. Have I missed something here? I really make the skb->head NULL
just before kfree(skb) in skb_release_data(), it's done by callback we have made for skb.

Thanks
Xiaohui

>Cheers,
>--
>Visit Openswan at http://www.openswan.org/
>Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
>Home Page: http://gondor.apana.org.au/~herbert/
>PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* [PATCH] pkt_sched: gen_kill_estimator() rcu fixes
From: Eric Dumazet @ 2010-06-09  9:40 UTC (permalink / raw)
  To: Changli Gao, David Miller
  Cc: netdev, Stephen Hemminger, Jarek Poplawski, Patrick McHardy
In-Reply-To: <1275929761.2545.159.camel@edumazet-laptop>

Le lundi 07 juin 2010 à 18:56 +0200, Eric Dumazet a écrit :

> 
> Even if its a bug correction, I cooked it for net-next-2.6 since bug
> probably never occured, and patch is too large to be sent to
> net-2.6/linux-2.6 before testing.
> 
> Another bug comes from net/netfilter/xt_RATEEST.c : It apparently
> calls gen_kill_estimator() / gen_new_estimator() without holding RTNL ?
> 
> So we should add another lock to protect things (est_root, elist[], ...)
> 
> David, I can send a net-2.6 patch for this one, since it should be small
> enough. If yes, I'll respin this patch of course ;)
> 
> Thanks
> 



Here is v3 of the patch, reduced to bare minimum.

This patch should be applied after previous one (pkt_sched:
gen_estimator: add a new lock)

Thanks


[PATCH] pkt_sched: gen_kill_estimator() rcu fixes

gen_kill_estimator() API is a bit misleading, since caller
should make sure an RCU grace period is respected before
freeing stats_lock.

This was partially addressed in commit 5d944c640b4 
(gen_estimator: deadlock fix), but same problem exist for all
gen_kill_estimator() users, if lock they use is not already rcu
protected.

A code review shows xt_RATEEST.c, act_api.c, act_police.c have this
problem. Other are ok because they use qdisc lock.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/act_api.h              |    2 ++
 include/net/netfilter/xt_rateest.h |    1 +
 net/core/gen_estimator.c           |    1 +
 net/netfilter/xt_RATEEST.c         |    8 +++++++-
 net/sched/act_api.c                |    7 ++++++-
 net/sched/act_police.c             |    7 +++++++
 6 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index c05fd71..bab385f 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -20,6 +20,7 @@ struct tcf_common {
 	struct gnet_stats_queue		tcfc_qstats;
 	struct gnet_stats_rate_est	tcfc_rate_est;
 	spinlock_t			tcfc_lock;
+	struct rcu_head			tcfc_rcu;
 };
 #define tcf_next	common.tcfc_next
 #define tcf_index	common.tcfc_index
@@ -32,6 +33,7 @@ struct tcf_common {
 #define tcf_qstats	common.tcfc_qstats
 #define tcf_rate_est	common.tcfc_rate_est
 #define tcf_lock	common.tcfc_lock
+#define tcf_rcu		common.tcfc_rcu
 
 struct tcf_police {
 	struct tcf_common	common;
diff --git a/include/net/netfilter/xt_rateest.h b/include/net/netfilter/xt_rateest.h
index ddbf37e..5e14277 100644
--- a/include/net/netfilter/xt_rateest.h
+++ b/include/net/netfilter/xt_rateest.h
@@ -9,6 +9,7 @@ struct xt_rateest {
 	struct gnet_estimator		params;
 	struct gnet_stats_rate_est	rstats;
 	struct gnet_stats_basic_packed	bstats;
+	struct rcu_head			rcu;
 };
 
 extern struct xt_rateest *xt_rateest_lookup(const char *name);
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 785e527..9fbe7f7 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -263,6 +263,7 @@ static void __gen_kill_estimator(struct rcu_head *head)
  *
  * Removes the rate estimator specified by &bstats and &rate_est.
  *
+ * Note : Caller should respect an RCU grace period before freeing stats_lock
  */
 void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
 			struct gnet_stats_rate_est *rate_est)
diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c
index 69c01e1..096b18b 100644
--- a/net/netfilter/xt_RATEEST.c
+++ b/net/netfilter/xt_RATEEST.c
@@ -60,13 +60,18 @@ struct xt_rateest *xt_rateest_lookup(const char *name)
 }
 EXPORT_SYMBOL_GPL(xt_rateest_lookup);
 
+static void xt_rateeset_free_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct xt_rateest, rcu));
+}
+
 void xt_rateest_put(struct xt_rateest *est)
 {
 	mutex_lock(&xt_rateest_mutex);
 	if (--est->refcnt == 0) {
 		hlist_del(&est->list);
 		gen_kill_estimator(&est->bstats, &est->rstats);
-		kfree(est);
+		call_rcu(&est->rcu, xt_rateeset_free_rcu);
 	}
 	mutex_unlock(&xt_rateest_mutex);
 }
@@ -179,6 +184,7 @@ static int __init xt_rateest_tg_init(void)
 static void __exit xt_rateest_tg_fini(void)
 {
 	xt_unregister_target(&xt_rateest_tg_reg);
+	rcu_barrier(); /* Wait for completion of call_rcu()'s */
 }
 
 
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 972378f..0e6fd2b 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -26,6 +26,11 @@
 #include <net/act_api.h>
 #include <net/netlink.h>
 
+static void tcf_common_free_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct tcf_common, tcfc_rcu));
+}
+
 void tcf_hash_destroy(struct tcf_common *p, struct tcf_hashinfo *hinfo)
 {
 	unsigned int h = tcf_hash(p->tcfc_index, hinfo->hmask);
@@ -38,7 +43,7 @@ void tcf_hash_destroy(struct tcf_common *p, struct tcf_hashinfo *hinfo)
 			write_unlock_bh(hinfo->lock);
 			gen_kill_estimator(&p->tcfc_bstats,
 					   &p->tcfc_rate_est);
-			kfree(p);
+			call_rcu(&p->tcfc_rcu, tcf_common_free_rcu);
 			return;
 		}
 	}
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 654f73d..9df45ed 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -97,6 +97,11 @@ nla_put_failure:
 	goto done;
 }
 
+static void tcf_police_free_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct tcf_police, tcf_rcu));
+}
+
 static void tcf_police_destroy(struct tcf_police *p)
 {
 	unsigned int h = tcf_hash(p->tcf_index, POL_TAB_MASK);
@@ -113,6 +118,7 @@ static void tcf_police_destroy(struct tcf_police *p)
 				qdisc_put_rtab(p->tcfp_R_tab);
 			if (p->tcfp_P_tab)
 				qdisc_put_rtab(p->tcfp_P_tab);
+			call_rcu(&p->tcf_rcu, tcf_police_free_rcu);
 			kfree(p);
 			return;
 		}
@@ -397,6 +403,7 @@ static void __exit
 police_cleanup_module(void)
 {
 	tcf_unregister_action(&act_police_ops);
+	rcu_barrier(); /* Wait for completion of call_rcu()'s */
 }
 
 module_init(police_init_module);



^ permalink raw reply related

* Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
From: Jens Axboe @ 2010-06-09  9:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Rafael J. Wysocki, Carl Worth, Eric Anholt,
	Venkatesh Pallipadi, Dave Airlie, Jesse Barnes, David H?rdeman,
	Mauro Carvalho Chehab, Eric Dumazet, Linux Kernel Mailing List,
	Maciej Rutecki, Andrew Morton, Kernel Testers List,
	Network Development, Linux ACPI, Linux PM List, Linux SCSI List,
	Linux Wireless List, DRI
In-Reply-To: <20100609093249.GA17026@elte.hu>

On 2010-06-09 11:32, Ingo Molnar wrote:
>> This should be fixed by commit 28f4197e which was merged on friday.
> 
> The scheduler commit adding local_clock() (for .36) is:
> 
>   c676329: sched_clock: Add local_clock() API and improve documentation
> 
> So once that is upstream the block IO statistics code can use that.

Thanks, I'll have to make a note of that.

-- 
Jens Axboe


^ permalink raw reply

* [PATCH net-2.6 v2] pkt_sched: gen_estimator: add a new lock
From: Eric Dumazet @ 2010-06-09  9:39 UTC (permalink / raw)
  To: Changli Gao, David Miller
  Cc: netdev, Stephen Hemminger, Jarek Poplawski, Patrick McHardy
In-Reply-To: <1275931108.2545.168.camel@edumazet-laptop>

Le lundi 07 juin 2010 à 19:18 +0200, Eric Dumazet a écrit :


> [PATCH net-2.6] pkt_sched: gen_estimator: add a new lock
> 
> gen_kill_estimator() / gen_new_estimator() is not always called with
> RTNL held.
> 
> net/netfilter/xt_RATEEST.c is one user of these API that do not hold
> RTNL, so random corruptions can occur between "tc" and "iptables"
> 
> Add a new fine grained lock instead of trying to use RTNL in xt_RATEEST


Here is v2 of patch, adding protection in gen_estimator_active() as
well.

[PATCH net-2.6 v2] pkt_sched: gen_estimator: add a new lock

gen_kill_estimator() / gen_new_estimator() is not always called with
RTNL held.

net/netfilter/xt_RATEEST.c is one user of these API that do not hold
RTNL, so random corruptions can occur between "tc" and "iptables".

Add a new fine grained lock instead of trying to use RTNL in netfilter.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/gen_estimator.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index cf8e703..785e527 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -107,6 +107,7 @@ static DEFINE_RWLOCK(est_lock);
 
 /* Protects against soft lockup during large deletion */
 static struct rb_root est_root = RB_ROOT;
+static DEFINE_SPINLOCK(est_tree_lock);
 
 static void est_timer(unsigned long arg)
 {
@@ -201,7 +202,6 @@ struct gen_estimator *gen_find_node(const struct gnet_stats_basic_packed *bstats
  *
  * Returns 0 on success or a negative error code.
  *
- * NOTE: Called under rtnl_mutex
  */
 int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
 		      struct gnet_stats_rate_est *rate_est,
@@ -232,6 +232,7 @@ int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
 	est->last_packets = bstats->packets;
 	est->avpps = rate_est->pps<<10;
 
+	spin_lock(&est_tree_lock);
 	if (!elist[idx].timer.function) {
 		INIT_LIST_HEAD(&elist[idx].list);
 		setup_timer(&elist[idx].timer, est_timer, idx);
@@ -242,6 +243,7 @@ int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
 
 	list_add_rcu(&est->list, &elist[idx].list);
 	gen_add_node(est);
+	spin_unlock(&est_tree_lock);
 
 	return 0;
 }
@@ -261,13 +263,13 @@ static void __gen_kill_estimator(struct rcu_head *head)
  *
  * Removes the rate estimator specified by &bstats and &rate_est.
  *
- * NOTE: Called under rtnl_mutex
  */
 void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
 			struct gnet_stats_rate_est *rate_est)
 {
 	struct gen_estimator *e;
 
+	spin_lock(&est_tree_lock);
 	while ((e = gen_find_node(bstats, rate_est))) {
 		rb_erase(&e->node, &est_root);
 
@@ -278,6 +280,7 @@ void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
 		list_del_rcu(&e->list);
 		call_rcu(&e->e_rcu, __gen_kill_estimator);
 	}
+	spin_unlock(&est_tree_lock);
 }
 EXPORT_SYMBOL(gen_kill_estimator);
 
@@ -312,8 +315,14 @@ EXPORT_SYMBOL(gen_replace_estimator);
 bool gen_estimator_active(const struct gnet_stats_basic_packed *bstats,
 			  const struct gnet_stats_rate_est *rate_est)
 {
+	bool res;
+
 	ASSERT_RTNL();
 
-	return gen_find_node(bstats, rate_est) != NULL;
+	spin_lock(&est_tree_lock);
+	res = gen_find_node(bstats, rate_est) != NULL;
+	spin_unlock(&est_tree_lock);
+
+	return res;
 }
 EXPORT_SYMBOL(gen_estimator_active);



^ permalink raw reply related

* RE: [patch] caif: fix a couple range checks
From: Sjur BRENDELAND @ 2010-06-09  9:11 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: David S. Miller, netdev@vger.kernel.org,
	kernel-janitors@vger.kernel.org
In-Reply-To: <20100607145158.GO5483@bicker>

> From: Dan Carpenter [mailto:error27@gmail.com]
> 
> The extra ! character means that these conditions are always false.


Looks good, thanks. This is embarrassing, I should caught this ages ago.
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>

> 
> Signed-off-by: Dan Carpenter <error27@gmail.com>
> 
> diff --git a/net/caif/cfveil.c b/net/caif/cfveil.c
> index 0fd827f..e04f7d9 100644
> --- a/net/caif/cfveil.c
> +++ b/net/caif/cfveil.c
> @@ -84,7 +84,7 @@ static int cfvei_transmit(struct cflayer *layr,
> struct cfpkt *pkt)
>  		return ret;
>  	caif_assert(layr->dn != NULL);
>  	caif_assert(layr->dn->transmit != NULL);
> -	if (!cfpkt_getlen(pkt) > CAIF_MAX_PAYLOAD_SIZE) {
> +	if (cfpkt_getlen(pkt) > CAIF_MAX_PAYLOAD_SIZE) {
>  		pr_warning("CAIF: %s(): Packet too large - size=%d\n",
>  			   __func__, cfpkt_getlen(pkt));
>  		return -EOVERFLOW;
> diff --git a/net/caif/cfrfml.c b/net/caif/cfrfml.c
> index cd2830f..fd27b17 100644
> --- a/net/caif/cfrfml.c
> +++ b/net/caif/cfrfml.c
> @@ -83,7 +83,7 @@ static int cfrfml_transmit(struct cflayer *layr,
> struct cfpkt *pkt)
>  	if (!cfsrvl_ready(service, &ret))
>  		return ret;
> 
> -	if (!cfpkt_getlen(pkt) > CAIF_MAX_PAYLOAD_SIZE) {
> +	if (cfpkt_getlen(pkt) > CAIF_MAX_PAYLOAD_SIZE) {
>  		pr_err("CAIF: %s():Packet too large - size=%d\n",
>  			__func__, cfpkt_getlen(pkt));
>  		return -EOVERFLOW;

^ permalink raw reply

* Re: [RFC PATCH 1/2] mac80211: make max_network_latency notifier atomic safe
From: Johannes Berg @ 2010-06-09  9:38 UTC (permalink / raw)
  To: florian
  Cc: pm list, james.bottomley, markgross, mgross, John W. Linville,
	David S. Miller, Javier Cardona, Jouni Malinen, Rui Paulo,
	Kalle Valo, linux-wireless, linux-kernel, netdev
In-Reply-To: <1276074915-26879-1-git-send-email-florian@mickler.org>

On Wed, 2010-06-09 at 11:15 +0200, florian@mickler.org wrote:
> In order to have the pm_qos framework be callable from interrupt
> context, all listeners have to also be callable in that context.

That makes no sense at all. Why add work structs _everywhere_ in the
callees and make the API harder to use and easy to get wrong completely,
instead of just adding a single work struct that will be queued from the
caller and dealing with the locking complexity etc. just once.

johannes

^ permalink raw reply

* Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
From: Ingo Molnar @ 2010-06-09  9:32 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, Rafael J. Wysocki, Carl Worth, Eric Anholt,
	Venkatesh Pallipadi, Dave Airlie, Jesse Barnes, David H?rdeman,
	Mauro Carvalho Chehab, Eric Dumazet, Linux Kernel Mailing List,
	Maciej Rutecki, Andrew Morton, Kernel Testers List,
	Network Development, Linux ACPI, Linux PM List, Linux SCSI List,
	Linux Wireless List, DRI
In-Reply-To: <4C0F4872.7090909@fusionio.com>


* Jens Axboe <jaxboe@fusionio.com> wrote:

> On 2010-06-09 03:53, Linus Torvalds wrote:
> >> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16129
> >> Subject	: BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
> >> Submitter	: Jan Kreuzer <kontrollator@gmx.de>
> >> Date		: 2010-06-05 06:15 (4 days old)
> > 
> > This seems to have been introduced by
> > 
> > 	commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
> > 	Author: Ingo Molnar <mingo@elte.hu>
> > 	Date:   Sat Nov 8 17:05:38 2008 +0100
> > 
> > 	    sched: optimize sched_clock() a bit
> >     
> > 	    sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
> > 	    variant of __cycles_2_ns().
> >     
> > 	    Most of the time sched_clock() is called with irqs disabled already.
> > 	    The few places that call it with irqs enabled need to be updated.
> >     
> > 	    Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > 
> > and this seems to be one of those calling cases that need to be updated.
> > 
> > Ingo? The call trace is:
> > 
> > 	BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2-8/337
> > 	caller is native_sched_clock+0x3c/0x68
> > 	Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
> > 	Call Trace:
> > 	[<ffffffff812362c5>] debug_smp_processor_id+0xc9/0xe4
> > 	[<ffffffff8101059d>] native_sched_clock+0x3c/0x68
> > 	[<ffffffff8101043d>] sched_clock+0x9/0xd
> > 	[<ffffffff81212d7a>] blk_rq_init+0x97/0xa3
> > 	[<ffffffff81214d71>] get_request+0x1c4/0x2d0
> > 	[<ffffffff81214ea6>] get_request_wait+0x29/0x1a6
> > 	[<ffffffff81215537>] __make_request+0x338/0x45b
> > 	[<ffffffff812147c2>] generic_make_request+0x2bb/0x330
> > 	[<ffffffff81214909>] submit_bio+0xd2/0xef
> > 	[<ffffffff811413cb>] submit_bh+0xf4/0x116
> > 	[<ffffffff81144853>] block_write_full_page_endio+0x89/0x96
> > 	[<ffffffff81144875>] block_write_full_page+0x15/0x17
> > 	[<ffffffff8119b00a>] ext4_writepage+0x356/0x36b
> > 	[<ffffffff810e1f91>] __writepage+0x1a/0x39
> > 	[<ffffffff810e32a6>] write_cache_pages+0x20d/0x346
> > 	[<ffffffff810e3406>] generic_writepages+0x27/0x29
> > 	[<ffffffff811ca279>] journal_submit_data_buffers+0x110/0x17d
> > 	[<ffffffff811ca986>] jbd2_journal_commit_transaction+0x4cb/0x156d
> > 	[<ffffffff811d0cba>] kjournald2+0x147/0x37a
> > 
> > (from the bugzilla thing)
> 
> This should be fixed by commit 28f4197e which was merged on friday.

The scheduler commit adding local_clock() (for .36) is:

  c676329: sched_clock: Add local_clock() API and improve documentation

So once that is upstream the block IO statistics code can use that.

Thanks,

	Ingo

^ permalink raw reply

* RE: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
From: Xin, Xiaohui @ 2010-06-09  9:22 UTC (permalink / raw)
  To: Mitchell Erblich, Andi Kleen
  Cc: Stephen Hemminger, netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mst@redhat.com, mingo@elte.hu,
	davem@davemloft.net, herbert@gondor.apana.org.au,
	jdike@linux.intel.com
In-Reply-To: <40119F5E-A745-4C50-8053-D5A701266AD5@earthlink.net>

>-----Original Message-----
>From: Mitchell Erblich [mailto:erblichs@earthlink.net]
>Sent: Monday, June 07, 2010 4:17 PM
>To: Andi Kleen
>Cc: Stephen Hemminger; Xin, Xiaohui; netdev@vger.kernel.org; kvm@vger.kernel.org;
>linux-kernel@vger.kernel.org; mst@redhat.com; mingo@elte.hu; davem@davemloft.net;
>herbert@gondor.apana.org.au; jdike@linux.intel.com
>Subject: Re: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
>
>
>On Jun 7, 2010, at 12:51 AM, Andi Kleen wrote:
>
>> Stephen Hemminger <shemminger@vyatta.com> writes:
>>
>>> Still not sure this is a good idea for a couple of reasons:
>>>
>>> 1. We already have lots of special cases with skb's (frags and fraglist),
>>>   and skb's travel through a lot of different parts of the kernel.  So any
>>>   new change like this creates lots of exposed points for new bugs. Look
>>>   at cases like MD5 TCP and netfilter, and forwarding these SKB's to ipsec
>>>   and ppp and ...
>>>
>>> 2. SKB's can have infinite lifetime in the kernel. If these buffers come from
>>>   a fixed size pool in an external device, they can easily all get tied up
>>>   if you have a slow listener. What happens then?
>>
>> 3. If they come from an internal pool what happens when the kernel runs
>> low on memory? How is that pool balanced against other kernel
>> memory users?
>>
>> -Andi
>>
>> --
>> ak@linux.intel.com -- Speaking for myself only.
>
>In general,
>
>When an internal pool is created/used, their SHOULD be a reason.
>Maybe, to keep allocation latency to a min, OR ...
>
The internal pool here is a collection of user buffers submitted
by guest virtio-net driver. Guest put buffers here and driver get
buffers from it. If guest submit more buffers then driver needs,
we need somewhere to put the buffers, it's the internal pool here
to deal with. 

>Now IMO,
>
>internal pool objects should have a ref count and
>if that count is 0, then under memory pressure and/or num
>of objects are above a high water mark, then they are freed,
>
>OR if there is a last reference age field, then the object is to be
>cleaned if dirty, then freed,
>
>Else, the pool is allowed to grow if the number of objects in the
>pool is below a set max (max COULD equal Infinity).

Thanks for the thoughts.

Basically, the size of the internal pool is not decided by the pool itself,
To add/delete the objects in the pool is not a task of the pool itself too. 
It's decided by guest virtio-net driver and vhost-net driver both, and 
decided by the guest receive speed and submit speed.
The max size of the pool is limited by the virtqueue buffer numbers.

Thanks
Xiaohui

>
>
>
>Mitchell Erblich

^ permalink raw reply

* Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
From: Rafael J. Wysocki @ 2010-06-09  9:22 UTC (permalink / raw)
  To: sedat.dilek-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Linux Kernel Mailing List, Maciej Rutecki, Andrew Morton,
	Linus Torvalds, Kernel Testers List, Network Development,
	Linux ACPI, Linux PM List, Linux SCSI List, Linux Wireless List,
	DRI, dmonakhov-GEFAQzZX7r8dnm+yROfE0A
In-Reply-To: <AANLkTiksbL1qHg7Q0A-6TbX0uUrxra4jctUoIGVk5vnE-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Wednesday 09 June 2010, Sedat Dilek wrote:
> The patch from [1] is still missing.
> 
>    "cpufreq-call-nr_iowait_cpu-with-disabled-preemption.patch" from
> Dmitry Monakhoc
> 
> Tested-by: Sedat Dilek <sedat.dilek-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Tested-by Maciej Rutecki <maciej.rutecki-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> 
> I have already reported this issue on LKML [2] and cpufreq ML [3].
> 
> - Sedat -
> 
> [1] http://www.spinics.net/lists/cpufreq/msg01631.html
> [2] http://lkml.org/lkml/2010/5/31/77
> [3] http://www.spinics.net/lists/cpufreq/msg01637.html

Thanks, added.

Rafael

^ permalink raw reply

* Re: [Patch 2/2] mlx4: add dynamic LRO disable support
From: Cong Wang @ 2010-06-09  9:23 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, herbert.xu, nhorman, sgruszka, davem
In-Reply-To: <1275661552.2095.13.camel@achroite.uk.solarflarecom.com>

On 06/04/10 22:25, Ben Hutchings wrote:
> On Fri, 2010-06-04 at 09:56 +0800, Cong Wang wrote:
>> On 06/03/10 20:37, Ben Hutchings wrote:
>>> On Wed, 2010-06-02 at 23:39 -0400, Amerigo Wang wrote:
>>>> This patch adds dynamic LRO diable support for mlx4 net driver.
>>>> It also fixes a bug of mlx4, which checks NETIF_F_LRO flag in rx
>>>> path without rtnl lock.
>>> [...]
>>>
>>> Is that flag test actually unsafe - and if so, how is testing num_lro
>>> any better?  Perhaps access to net_device::features should be wrapped
>>> with ACCESS_ONCE() to ensure that reads and writes are atomic.
>>>
>>
>> At least, I don't find there is any race with 'num_lro', thus
>> no lock is needed.
>
> In both cases there is a race condition but it is harmless so long as
> the read and the write are atomic.  There is a general assumption in
> networking code that this is the case for int and long.  Personally I
> would prefer to see this made explicit using ACCESS_ONCE(), but I don't
> see any specific problem in mlx4 (not that I'm familiar with this driver
> either).

I read this email again.

I think you misunderstood the race condition here. Even read and write
are atomic here, the race still exists. One can just set NETIF_F_LRO
asynchronously right before mlx4 check this flag in mlx4_en_process_rx_cq()
which doesn't take rtnl_lock.

Also, I don't think ACCESS_ONCE() can make things atomic here.

Am I missing something?

Thanks.

^ permalink raw reply

* [RFC PATCH 1/2] mac80211: make max_network_latency notifier atomic safe
From: florian @ 2010-06-09  9:15 UTC (permalink / raw)
  To: pm list
  Cc: james.bottomley, markgross, mgross, Florian Mickler,
	John W. Linville, Johannes Berg, David S. Miller, Javier Cardona,
	Jouni Malinen, Rui Paulo, Kalle Valo, linux-wireless,
	linux-kernel, netdev

In order to have the pm_qos framework be callable from interrupt
context, all listeners have to also be callable in that context.

This patch schedules the update of the latency value via
ieee80211_max_network_latency() for execution on
the global workqueue (using schedule_work()).

As there was no synchronization/locking between the listener and the
caller of pm_qos_update_request before, there should be no new races
uncovered by this. Although the timing probably changes.

Signed-off-by: Florian Mickler <florian@mickler.org>
---

This needs some networking expertise to check the reasoning above and
judge the implementation.

 net/mac80211/ieee80211_i.h |    4 +++-
 net/mac80211/main.c        |    5 +++--
 net/mac80211/mlme.c        |   20 +++++++++++++++-----
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 1a9e2da..3d2155a 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -851,6 +851,7 @@ struct ieee80211_local {
 	struct work_struct dynamic_ps_disable_work;
 	struct timer_list dynamic_ps_timer;
 	struct notifier_block network_latency_notifier;
+	struct work_struct network_latency_notifier_work;
 
 	int user_power_level; /* in dBm */
 	int power_constr_level; /* in dBm */
@@ -994,8 +995,9 @@ ieee80211_rx_result ieee80211_sta_rx_mgmt(struct ieee80211_sub_if_data *sdata,
 void ieee80211_send_pspoll(struct ieee80211_local *local,
 			   struct ieee80211_sub_if_data *sdata);
 void ieee80211_recalc_ps(struct ieee80211_local *local, s32 latency);
-int ieee80211_max_network_latency(struct notifier_block *nb,
+int ieee80211_max_network_latency_notification(struct notifier_block *nb,
 				  unsigned long data, void *dummy);
+void ieee80211_max_network_latency(struct work_struct *w);
 void ieee80211_sta_process_chanswitch(struct ieee80211_sub_if_data *sdata,
 				      struct ieee80211_channel_sw_ie *sw_elem,
 				      struct ieee80211_bss *bss,
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 22a384d..5cded3a 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -607,9 +607,10 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)
 	rtnl_unlock();
 
 	ieee80211_led_init(local);
-
+	INIT_WORK(&local->network_latency_notifier_work,
+			ieee80211_max_network_latency);
 	local->network_latency_notifier.notifier_call =
-		ieee80211_max_network_latency;
+		ieee80211_max_network_latency_notification;
 	result = pm_qos_add_notifier(PM_QOS_NETWORK_LATENCY,
 				     &local->network_latency_notifier);
 
diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index 0839c4e..feb6134 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -1953,17 +1953,27 @@ void ieee80211_mlme_notify_scan_completed(struct ieee80211_local *local)
 	rcu_read_unlock();
 }
 
-int ieee80211_max_network_latency(struct notifier_block *nb,
-				  unsigned long data, void *dummy)
+void ieee80211_max_network_latency(struct work_struct *w)
 {
-	s32 latency_usec = (s32) data;
+	s32 latency_usec = (s32) atomic_long_read(&w->data);
 	struct ieee80211_local *local =
-		container_of(nb, struct ieee80211_local,
-			     network_latency_notifier);
+		container_of(w, struct ieee80211_local,
+			     network_latency_notifier_work);
 
 	mutex_lock(&local->iflist_mtx);
 	ieee80211_recalc_ps(local, latency_usec);
 	mutex_unlock(&local->iflist_mtx);
+}
+
+/* the notifier might be called from interrupt context */
+int ieee80211_max_network_latency_notification(struct notifier_block *nb,
+				  unsigned long data, void *dummy)
+{
+	struct ieee80211_local *local =
+		container_of(nb, struct ieee80211_local,
+				network_latency_notifier);
+	atomic_long_set(&local->network_latency_notifier_work.data, data);
+	schedule_work(&local->network_latency_notifier_work);
 
 	return 0;
 }
-- 
1.7.1

^ permalink raw reply related

* Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
From: Rafael J. Wysocki @ 2010-06-09  9:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Venkatesh Pallipadi, DRI, Mauro Carvalho Chehab, Eric Dumazet,
	Linux ACPI, Jens Axboe, Linux PM List, Linux Wireless List,
	Linux Kernel Mailing List, Jesse Barnes, Linux SCSI List,
	Carl Worth, Härdeman, Network Development, Ingo Molnar,
	Kernel Testers List, Andrew Morton, Maciej Rutecki
In-Reply-To: <alpine.LFD.2.00.1006081814240.4506@i5.linux-foundation.org>

On Wednesday 09 June 2010, Linus Torvalds wrote:
> 
> [ Added lots of cc's to direct specific people to look at the regressions 
>   that may or may not be theirs... ]
> 
> On Wed, 9 Jun 2010, Rafael J. Wysocki wrote:
> >
> >  * Quite a few of the already reported regressions may be related to the bug
> >    fixed by 386f40c86d6c8d5b717ef20620af1a750d0dacb4 (Revert "tty: fix a little
> >    bug in scrup, vt.c"), so reporters please retest with this commit applied.]
> 
> From a quick look, most of them look unrelated to that unfortunate bug. 
> It's hard to tell for sure, of course (memory corruption can do pretty 
> much anything), but I'm not seeing the 07200720.. pattern at least.
> 
> And some of them do seem to be bisected to likely culprits and/or have 
> patches that are claimed to have fixed them.
> 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16163
> > Subject	: [2.6.35-rc1 Regression] i915: Commit cfecde causes VGA to stay off
> > Submitter	: David John <davidjon@xenontk.org>
> > Date		: 2010-06-02 12:52 (7 days old)
> > Message-ID	: <4C065423.3000202@xenontk.org>
> > References	: http://marc.info/?l=linux-kernel&m=127548313828613&w=2
> 
> That has a "reverting the commit fixes it", and a confirmation from Nick 
> Bowler.
> 
> Eric, Carl: should I just revert that commit? Or do you have a fix?

This one is reported to have been fixed already.  Closed now.

...
> 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16104
> > Subject	: Radeon KMS does not start after merge of the new PM-Code
> > Submitter	: Jan Kreuzer <kontrollator@gmx.de>
> > Date		: 2010-06-02 07:47 (7 days old)
> 
> This one also has a patch in Bugzilla, I think Airlie is just waiting to 
> calm down his queue and already removed the dependency on the temperature 
> code. DaveA?

This should be fixed by commit f8ed8b4c5d30b5214f185997131b06e35f6f7113, so
closing now.

Rafael

------------------------------------------------------------------------------
ThinkGeek and WIRED's GeekDad team up for the Ultimate 
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
lucky parental unit.  See the prize list and enter to win: 
http://p.sf.net/sfu/thinkgeek-promo
--

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox