Netdev List
 help / color / mirror / Atom feed
* Re: [Patch 2/2] mlx4: add dynamic LRO disable support
From: Stanislaw Gruszka @ 2010-06-09 10:49 UTC (permalink / raw)
  To: Cong Wang; +Cc: Ben Hutchings, netdev, herbert.xu, nhorman, davem
In-Reply-To: <4C0F5D97.5030605@redhat.com>

Hi Amerigo

Sorry for being silent in this thread before.

On Wed, Jun 09, 2010 at 05:23:35PM +0800, Cong Wang wrote:
>>>> Is that flag test actually unsafe - and if so, how is testing num_lro
>>>> any better?  Perhaps access to net_device::features should be wrapped
>>>> with ACCESS_ONCE() to ensure that reads and writes are atomic.
>>>>
>>>
>>> At least, I don't find there is any race with 'num_lro', thus
>>> no lock is needed.
>>
>> In both cases there is a race condition but it is harmless so long as
>> the read and the write are atomic.  There is a general assumption in
>> networking code that this is the case for int and long.  Personally I
>> would prefer to see this made explicit using ACCESS_ONCE(), but I don't
>> see any specific problem in mlx4 (not that I'm familiar with this driver
>> either).
>
> I read this email again.
>
> I think you misunderstood the race condition here. Even read and write
> are atomic here, the race still exists. One can just set NETIF_F_LRO
> asynchronously right before mlx4 check this flag in mlx4_en_process_rx_cq()
> which doesn't take rtnl_lock.

If so, it's better to stop device before modify LRO settings. I suggest
something like that in mlx4_ethtool_op_set_flags:

if (!!(data & ETH_FLAG_LRO) != !!(dev->features & NETIF_F_LRO)) {
	/* Need to toggle LRO */

	if (netdev_running(dev)) {
               mutex_lock(&mdev->state_lock);
               mlx4_en_stop_port(dev);
               rc = mlx4_en_start_port(dev);
               if (rc)
                       en_err(priv, "Failed to restart port\n");
	}

	dev->features ^= NETIF_F_LRO;

	if (netdev_running(dev))
               mutex_unlock(&mdev->state_lock);
}

Stanislaw

^ permalink raw reply

* Re: [RFC PATCH 1/2] mac80211: make max_network_latency notifier atomic safe
From: Johannes Berg @ 2010-06-09 10:42 UTC (permalink / raw)
  To: Florian Mickler
  Cc: pm list, james.bottomley, markgross, mgross, John W. Linville,
	David S. Miller, Javier Cardona, Jouni Malinen, Rui Paulo,
	Kalle Valo, linux-wireless, linux-kernel, netdev, Thomas Gleixner
In-Reply-To: <20100609122050.1dd18132@schatten.dmk.lab>

On Wed, 2010-06-09 at 12:20 +0200, Florian Mickler wrote:
> On Wed, 09 Jun 2010 11:38:07 +0200
> Johannes Berg <johannes@sipsolutions.net> wrote:
> 
> > On Wed, 2010-06-09 at 11:15 +0200, florian@mickler.org wrote:
> > > In order to have the pm_qos framework be callable from interrupt
> > > context, all listeners have to also be callable in that context.
> > 
> > That makes no sense at all. Why add work structs _everywhere_ in the
> > callees and make the API harder to use and easy to get wrong completely,
> > instead of just adding a single work struct that will be queued from the
> > caller and dealing with the locking complexity etc. just once.


> There are only two listeners at the moment. I suspect that most future
> uses of the framework need to be atomic, as the driver that
> requests a specific quality of service probably doesn't want to get into
> races with the provider of that service(listener). So i suspected the
> network listener to be the special case.  

Well even if it doesn't _want_ to race with it, a lot of drivers like
USB drivers etc. can't really do anything without deferring to a
workqueue.

And what's the race anyway? You get one update, defer the work, and if
another update happens inbetween you just read the new value when the
work finally runs -- and you end up doing it only once instead of twice.
That doesn't seem like a problem.

> The race between service-provider and qos-requester for non-atomic
> contextes is already there, isn't it? so, locking complexity shouldn't
> be worse than before.

I have no idea how it works now? I thought you can't request an update
from an atomic context.

However, if you request a QoS value, it is fundamentally that -- a
request. There's no guarantee as to when or how it will be honoured.

> But my first approach to this is seen here:
> https://lists.linux-foundation.org/pipermail/linux-pm/2010-June/026902.html

Icky too.

> A third possibility would be to make it dependent on the
> type of the constraint, if blocking notifiers are allowed or not. 
> But that would sacrifice API consistency (update_request for one
> constraint is allowed to be called in interrupt context and
> update_request for another would be not).

I don't see what's wrong with the fourth possibility: Allow calling
pm_qos_update_request() from atomic context, but change _it_ to schedule
off a work that calls the blocking notifier chain. That avoids the
complexity in notify-API users since they have process context, and also
in request-API users since they can call it from any context.

johannes

^ permalink raw reply

* Re: [PATCH] pkt_sched: gen_kill_estimator() rcu fixes
From: Jarek Poplawski @ 2010-06-09 10:41 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Changli Gao, David Miller, netdev, Stephen Hemminger,
	Patrick McHardy
In-Reply-To: <1276077416.2442.97.camel@edumazet-laptop>

On Wed, Jun 09, 2010 at 11:56:56AM +0200, Eric Dumazet wrote:
...
> Updated patch follows :
> 
> [PATCH] pkt_sched: gen_kill_estimator() rcu fixes
> 
> gen_kill_estimator() API is a bit misleading, since caller

I'd call it incomplete or simply buggy. Plus a tiny note below.

> should make sure an RCU grace period is respected before
> freeing stats_lock.
> 
> This was partially addressed in commit 5d944c640b4 
> (gen_estimator: deadlock fix), but same problem exist for all
> gen_kill_estimator() users, if lock they use is not already rcu
> protected.
> 
> A code review shows xt_RATEEST.c, act_api.c, act_police.c have this
> problem. Other are ok because they use qdisc lock.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
...
> diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c
> index 69c01e1..096b18b 100644
> --- a/net/netfilter/xt_RATEEST.c
> +++ b/net/netfilter/xt_RATEEST.c
> @@ -60,13 +60,18 @@ struct xt_rateest *xt_rateest_lookup(const char *name)
>  }
>  EXPORT_SYMBOL_GPL(xt_rateest_lookup);
>  
> +static void xt_rateeset_free_rcu(struct rcu_head *head)
> +{
> +	kfree(container_of(head, struct xt_rateest, rcu));
> +}
> +
>  void xt_rateest_put(struct xt_rateest *est)
>  {
>  	mutex_lock(&xt_rateest_mutex);
>  	if (--est->refcnt == 0) {
>  		hlist_del(&est->list);
>  		gen_kill_estimator(&est->bstats, &est->rstats);
> -		kfree(est);
> +		call_rcu(&est->rcu, xt_rateeset_free_rcu);

Spelling suggestion: xt_rateest_free_rcu?

Since it's only 3 places I'd consider adding little comments,
who needs this call_rcu (like in qdisc_destroy).

Anyway this patch looks OK to me.

Jarek P.

^ permalink raw reply

* Re: [RFC PATCH 1/2] mac80211: make max_network_latency notifier atomic safe
From: Florian Mickler @ 2010-06-09 10:20 UTC (permalink / raw)
  To: Johannes Berg
  Cc: pm list, james.bottomley, markgross, mgross, John W. Linville,
	David S. Miller, Javier Cardona, Jouni Malinen, Rui Paulo,
	Kalle Valo, linux-wireless, linux-kernel, netdev, Thomas Gleixner
In-Reply-To: <1276076287.3727.15.camel@jlt3.sipsolutions.net>

On Wed, 09 Jun 2010 11:38:07 +0200
Johannes Berg <johannes@sipsolutions.net> wrote:

> On Wed, 2010-06-09 at 11:15 +0200, florian@mickler.org wrote:
> > In order to have the pm_qos framework be callable from interrupt
> > context, all listeners have to also be callable in that context.
> 
> That makes no sense at all. Why add work structs _everywhere_ in the
> callees and make the API harder to use and easy to get wrong completely,
> instead of just adding a single work struct that will be queued from the
> caller and dealing with the locking complexity etc. just once.
> 
> johannes

Just to defend this approach, but I'm certainly not married to it
(hence RFC):

There are only two listeners at the moment. I suspect that most future
uses of the framework need to be atomic, as the driver that
requests a specific quality of service probably doesn't want to get into
races with the provider of that service(listener). So i suspected the
network listener to be the special case.  

The race between service-provider and qos-requester for non-atomic
contextes is already there, isn't it? so, locking complexity shouldn't
be worse than before.

But my first approach to this is seen here:
https://lists.linux-foundation.org/pipermail/linux-pm/2010-June/026902.html

A third possibility would be to make it dependent on the
type of the constraint, if blocking notifiers are allowed or not. 
But that would sacrifice API consistency (update_request for one
constraint is allowed to be called in interrupt context and
update_request for another would be not).

Cheers,
Flo

^ permalink raw reply

* [v2 Patch 2/2] mlx4: add dynamic LRO disable support
From: Amerigo Wang @ 2010-06-09 10:05 UTC (permalink / raw)
  To: netdev
  Cc: nhorman, sgruszka, herbert.xu, Amerigo Wang, bhutchings,
	Ramkrishna.Vepa, davem
In-Reply-To: <20100609100928.6573.14199.sendpatchset@localhost.localdomain>

This patch adds dynamic LRO diable support for mlx4 net driver.
It also fixes a bug of mlx4, which checks NETIF_F_LRO flag in rx
path without rtnl lock.

I don't have mlx4 card, so only did compiling test. Anyone who wants
to test this is more than welcome.

This is based on Neil's initial work too.

Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Neil Horman <nhorman@redhat.com>
Acked-by: Neil Horman <nhorman@redhat.com>
Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>

---

diff --git a/drivers/net/mlx4/en_ethtool.c b/drivers/net/mlx4/en_ethtool.c
index d5afd03..2c77805 100644
--- a/drivers/net/mlx4/en_ethtool.c
+++ b/drivers/net/mlx4/en_ethtool.c
@@ -387,6 +387,37 @@ static void mlx4_en_get_ringparam(struct net_device *dev,
 	param->tx_pending = mdev->profile.prof[priv->port].tx_ring_size;
 }
 
+static int mlx4_ethtool_op_set_flags(struct net_device *dev, u32 data)
+{
+	struct mlx4_en_priv *priv = netdev_priv(dev);
+	struct mlx4_en_dev *mdev = priv->mdev;
+	int rc = 0;
+	int changed = 0;
+
+	if (data & ETH_FLAG_LRO) {
+		if (!(dev->features & NETIF_F_LRO)) {
+			dev->features |= NETIF_F_LRO;
+			changed = 1;
+			mdev->profile.num_lro = min_t(int, num_lro , MLX4_EN_MAX_LRO_DESCRIPTORS);
+		}
+	} else if (dev->features & NETIF_F_LRO) {
+		dev->features &= ~NETIF_F_LRO;
+		changed = 1;
+		mdev->profile.num_lro = 0;
+	}
+
+	if (changed && netif_running(dev)) {
+		mutex_lock(&mdev->state_lock);
+		mlx4_en_stop_port(dev);
+		rc = mlx4_en_start_port(dev);
+		if (rc)
+			en_err(priv, "Failed to restart port\n");
+		mutex_unlock(&mdev->state_lock);
+	}
+
+	return rc;
+}
+
 const struct ethtool_ops mlx4_en_ethtool_ops = {
 	.get_drvinfo = mlx4_en_get_drvinfo,
 	.get_settings = mlx4_en_get_settings,
@@ -415,7 +446,7 @@ const struct ethtool_ops mlx4_en_ethtool_ops = {
 	.get_ringparam = mlx4_en_get_ringparam,
 	.set_ringparam = mlx4_en_set_ringparam,
 	.get_flags = ethtool_op_get_flags,
-	.set_flags = ethtool_op_set_flags,
+	.set_flags = mlx4_ethtool_op_set_flags,
 };
 
 
diff --git a/drivers/net/mlx4/en_main.c b/drivers/net/mlx4/en_main.c
index cbabf14..4ef42e2 100644
--- a/drivers/net/mlx4/en_main.c
+++ b/drivers/net/mlx4/en_main.c
@@ -70,8 +70,9 @@ MLX4_EN_PARM_INT(rss_xor, 0, "Use XOR hash function for RSS");
 MLX4_EN_PARM_INT(rss_mask, 0xf, "RSS hash type bitmask");
 
 /* Number of LRO sessions per Rx ring (rounded up to a power of two) */
-MLX4_EN_PARM_INT(num_lro, MLX4_EN_MAX_LRO_DESCRIPTORS,
-		 "Number of LRO sessions per ring or disabled (0)");
+unsigned int num_lro = MLX4_EN_MAX_LRO_DESCRIPTORS;
+module_param(num_lro , uint, 0444);
+MODULE_PARM_DESC(num_lro, "Number of LRO sessions per ring or disabled (0)");
 
 /* Priority pausing */
 MLX4_EN_PARM_INT(pfctx, 0, "Priority based Flow Control policy on TX[7:0]."
diff --git a/drivers/net/mlx4/mlx4_en.h b/drivers/net/mlx4/mlx4_en.h
index b55e46c..bbac975 100644
--- a/drivers/net/mlx4/mlx4_en.h
+++ b/drivers/net/mlx4/mlx4_en.h
@@ -568,4 +568,6 @@ int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset);
  * Globals
  */
 extern const struct ethtool_ops mlx4_en_ethtool_ops;
+
+extern unsigned int num_lro;
 #endif

^ permalink raw reply related

* [v2 Patch 1/2] s2io: add dynamic LRO disable support
From: Amerigo Wang @ 2010-06-09 10:05 UTC (permalink / raw)
  To: netdev
  Cc: nhorman, sgruszka, herbert.xu, Amerigo Wang, bhutchings,
	Ramkrishna.Vepa, davem


This patch adds dynamic LRO diable support for s2io net driver.

I don't have s2io card, so only did compiling test. Anyone who wants
to test this is more than welcome.

This is based on Neil's initial work.

Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Neil Horman <nhorman@redhat.com>
Acked-by: Neil Horman <nhorman@redhat.com>
Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Ramkrishna Vepa <Ramkrishna.Vepa@exar.com>

---

diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
index 668327c..e558aa7 100644
--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -795,7 +795,6 @@ static int init_shared_mem(struct s2io_nic *nic)
 		ring->rx_curr_put_info.ring_len = rx_cfg->num_rxd - 1;
 		ring->nic = nic;
 		ring->ring_no = i;
-		ring->lro = lro_enable;
 
 		blk_cnt = rx_cfg->num_rxd / (rxd_count[nic->rxd_mode] + 1);
 		/*  Allocating all the Rx blocks */
@@ -6684,6 +6683,35 @@ static int s2io_ethtool_op_set_tso(struct net_device *dev, u32 data)
 
 	return 0;
 }
+static int s2io_ethtool_set_flags(struct net_device *dev, u32 data)
+{
+	struct s2io_nic *sp = netdev_priv(dev);
+	int rc = 0;
+	int changed = 0;
+
+	if (data & ETH_FLAG_LRO) {
+		if (lro_enable) {
+			if (!(dev->features & NETIF_F_LRO)) {
+				dev->features |= NETIF_F_LRO;
+				changed = 1;
+			}
+		} else
+			rc = -EINVAL;
+	} else if (dev->features & NETIF_F_LRO) {
+		dev->features &= ~NETIF_F_LRO;
+		changed = 1;
+	}
+
+	if (changed && netif_running(dev)) {
+		s2io_stop_all_tx_queue(sp);
+		s2io_card_down(sp);
+		sp->lro = dev->features & NETIF_F_LRO;
+		rc = s2io_card_up(sp);
+		s2io_start_all_tx_queue(sp);
+	}
+
+	return rc;
+}
 
 static const struct ethtool_ops netdev_ethtool_ops = {
 	.get_settings = s2io_ethtool_gset,
@@ -6701,6 +6729,8 @@ static const struct ethtool_ops netdev_ethtool_ops = {
 	.get_rx_csum = s2io_ethtool_get_rx_csum,
 	.set_rx_csum = s2io_ethtool_set_rx_csum,
 	.set_tx_csum = s2io_ethtool_op_set_tx_csum,
+	.set_flags = s2io_ethtool_set_flags,
+	.get_flags = ethtool_op_get_flags,
 	.set_sg = ethtool_op_set_sg,
 	.get_tso = s2io_ethtool_op_get_tso,
 	.set_tso = s2io_ethtool_op_set_tso,
@@ -7229,6 +7259,7 @@ static int s2io_card_up(struct s2io_nic *sp)
 		struct ring_info *ring = &mac_control->rings[i];
 
 		ring->mtu = dev->mtu;
+		ring->lro = sp->lro;
 		ret = fill_rx_buffers(sp, ring, 1);
 		if (ret) {
 			DBG_PRINT(ERR_DBG, "%s: Out of memory in Open\n",

^ permalink raw reply related

* Re: [PATCH] pkt_sched: gen_kill_estimator() rcu fixes
From: Eric Dumazet @ 2010-06-09  9:56 UTC (permalink / raw)
  To: Changli Gao, David Miller
  Cc: netdev, Stephen Hemminger, Jarek Poplawski, Patrick McHardy
In-Reply-To: <1276076415.2442.92.camel@edumazet-laptop>



Oops, I missed  kfree() deletion in  :


>  static void tcf_police_destroy(struct tcf_police *p)
>  {
>  	unsigned int h = tcf_hash(p->tcf_index, POL_TAB_MASK);
> @@ -113,6 +118,7 @@ static void tcf_police_destroy(struct tcf_police *p)
>  				qdisc_put_rtab(p->tcfp_R_tab);
>  			if (p->tcfp_P_tab)
>  				qdisc_put_rtab(p->tcfp_P_tab);
> +			call_rcu(&p->tcf_rcu, tcf_police_free_rcu);
>  			kfree(p);
>  			return;

Updated patch follows :

[PATCH] pkt_sched: gen_kill_estimator() rcu fixes

gen_kill_estimator() API is a bit misleading, since caller
should make sure an RCU grace period is respected before
freeing stats_lock.

This was partially addressed in commit 5d944c640b4 
(gen_estimator: deadlock fix), but same problem exist for all
gen_kill_estimator() users, if lock they use is not already rcu
protected.

A code review shows xt_RATEEST.c, act_api.c, act_police.c have this
problem. Other are ok because they use qdisc lock.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/act_api.h              |    2 ++
 include/net/netfilter/xt_rateest.h |    1 +
 net/core/gen_estimator.c           |    1 +
 net/netfilter/xt_RATEEST.c         |    8 +++++++-
 net/sched/act_api.c                |    7 ++++++-
 net/sched/act_police.c             |    8 +++++++-
 6 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index c05fd71..bab385f 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -20,6 +20,7 @@ struct tcf_common {
 	struct gnet_stats_queue		tcfc_qstats;
 	struct gnet_stats_rate_est	tcfc_rate_est;
 	spinlock_t			tcfc_lock;
+	struct rcu_head			tcfc_rcu;
 };
 #define tcf_next	common.tcfc_next
 #define tcf_index	common.tcfc_index
@@ -32,6 +33,7 @@ struct tcf_common {
 #define tcf_qstats	common.tcfc_qstats
 #define tcf_rate_est	common.tcfc_rate_est
 #define tcf_lock	common.tcfc_lock
+#define tcf_rcu		common.tcfc_rcu
 
 struct tcf_police {
 	struct tcf_common	common;
diff --git a/include/net/netfilter/xt_rateest.h b/include/net/netfilter/xt_rateest.h
index ddbf37e..5e14277 100644
--- a/include/net/netfilter/xt_rateest.h
+++ b/include/net/netfilter/xt_rateest.h
@@ -9,6 +9,7 @@ struct xt_rateest {
 	struct gnet_estimator		params;
 	struct gnet_stats_rate_est	rstats;
 	struct gnet_stats_basic_packed	bstats;
+	struct rcu_head			rcu;
 };
 
 extern struct xt_rateest *xt_rateest_lookup(const char *name);
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 785e527..9fbe7f7 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -263,6 +263,7 @@ static void __gen_kill_estimator(struct rcu_head *head)
  *
  * Removes the rate estimator specified by &bstats and &rate_est.
  *
+ * Note : Caller should respect an RCU grace period before freeing stats_lock
  */
 void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
 			struct gnet_stats_rate_est *rate_est)
diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c
index 69c01e1..096b18b 100644
--- a/net/netfilter/xt_RATEEST.c
+++ b/net/netfilter/xt_RATEEST.c
@@ -60,13 +60,18 @@ struct xt_rateest *xt_rateest_lookup(const char *name)
 }
 EXPORT_SYMBOL_GPL(xt_rateest_lookup);
 
+static void xt_rateeset_free_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct xt_rateest, rcu));
+}
+
 void xt_rateest_put(struct xt_rateest *est)
 {
 	mutex_lock(&xt_rateest_mutex);
 	if (--est->refcnt == 0) {
 		hlist_del(&est->list);
 		gen_kill_estimator(&est->bstats, &est->rstats);
-		kfree(est);
+		call_rcu(&est->rcu, xt_rateeset_free_rcu);
 	}
 	mutex_unlock(&xt_rateest_mutex);
 }
@@ -179,6 +184,7 @@ static int __init xt_rateest_tg_init(void)
 static void __exit xt_rateest_tg_fini(void)
 {
 	xt_unregister_target(&xt_rateest_tg_reg);
+	rcu_barrier(); /* Wait for completion of call_rcu()'s */
 }
 
 
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 972378f..0e6fd2b 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -26,6 +26,11 @@
 #include <net/act_api.h>
 #include <net/netlink.h>
 
+static void tcf_common_free_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct tcf_common, tcfc_rcu));
+}
+
 void tcf_hash_destroy(struct tcf_common *p, struct tcf_hashinfo *hinfo)
 {
 	unsigned int h = tcf_hash(p->tcfc_index, hinfo->hmask);
@@ -38,7 +43,7 @@ void tcf_hash_destroy(struct tcf_common *p, struct tcf_hashinfo *hinfo)
 			write_unlock_bh(hinfo->lock);
 			gen_kill_estimator(&p->tcfc_bstats,
 					   &p->tcfc_rate_est);
-			kfree(p);
+			call_rcu(&p->tcfc_rcu, tcf_common_free_rcu);
 			return;
 		}
 	}
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 654f73d..04929ce 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -97,6 +97,11 @@ nla_put_failure:
 	goto done;
 }
 
+static void tcf_police_free_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct tcf_police, tcf_rcu));
+}
+
 static void tcf_police_destroy(struct tcf_police *p)
 {
 	unsigned int h = tcf_hash(p->tcf_index, POL_TAB_MASK);
@@ -113,7 +118,7 @@ static void tcf_police_destroy(struct tcf_police *p)
 				qdisc_put_rtab(p->tcfp_R_tab);
 			if (p->tcfp_P_tab)
 				qdisc_put_rtab(p->tcfp_P_tab);
-			kfree(p);
+			call_rcu(&p->tcf_rcu, tcf_police_free_rcu);
 			return;
 		}
 	}
@@ -397,6 +402,7 @@ static void __exit
 police_cleanup_module(void)
 {
 	tcf_unregister_action(&act_police_ops);
+	rcu_barrier(); /* Wait for completion of call_rcu()'s */
 }
 
 module_init(police_init_module);



^ permalink raw reply related

* RE: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
From: Xin, Xiaohui @ 2010-06-09  9:54 UTC (permalink / raw)
  To: Herbert Xu, Stephen Hemminger
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mst@redhat.com, mingo@elte.hu,
	davem@davemloft.net, jdike@linux.intel.com
In-Reply-To: <20100608052744.GA21547@gondor.apana.org.au>

>-----Original Message-----
>From: Herbert Xu [mailto:herbert@gondor.apana.org.au]
>Sent: Tuesday, June 08, 2010 1:28 PM
>To: Stephen Hemminger
>Cc: Xin, Xiaohui; netdev@vger.kernel.org; kvm@vger.kernel.org;
>linux-kernel@vger.kernel.org; mst@redhat.com; mingo@elte.hu; davem@davemloft.net;
>jdike@linux.intel.com
>Subject: Re: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
>
>On Sun, Jun 06, 2010 at 04:13:48PM -0700, Stephen Hemminger wrote:
>> Still not sure this is a good idea for a couple of reasons:
>>
>> 1. We already have lots of special cases with skb's (frags and fraglist),
>>    and skb's travel through a lot of different parts of the kernel.  So any
>>    new change like this creates lots of exposed points for new bugs. Look
>>    at cases like MD5 TCP and netfilter, and forwarding these SKB's to ipsec
>>    and ppp and ...
>>
>> 2. SKB's can have infinite lifetime in the kernel. If these buffers come from
>>    a fixed size pool in an external device, they can easily all get tied up
>>    if you have a slow listener. What happens then?
>
>I agree with Stephen on this.
>
>FWIW I don't think we even need the external pages concept in
>order to implement zero-copy receive (which I gather is the intent
>here).
>
>Here is one way to do it, simply construct a completely non-linear
>packet in the driver, as you would if you were using the GRO frags
>interface (grep for napi_gro_frags under drivers/net for examples).
>
I'm not sure if I understand your way correctly:
1) Does the way only deal with driver with SG feature? Since packet 
is non-linear...

2) Is skb->data still pointing to guest user buffers?
If yes, how to avoid the modifications to net core change to skb?

3) In our way only parts of drivers need be modified to support zero-copy.
and here, need we modify all the drivers?

If I missed your idea, may you explain it in more detail?

>This way you can transfer the entire contents of the packet without
>copying through to the other side, provided that the host stack does
>not modify the packet.
>

>If the host side did modify the packet then we have to incur the
>memory cost anyway.
>
>IOW I think the only feature provided by the external pages
>construct is allowing the skb->head area to be shared without
>copying.  I'm claiming that this can be done by simply making
>skb->head empty.
>
I think to make skb->head empty at first will cause more effort to pass the check with 
skb header. Have I missed something here? I really make the skb->head NULL
just before kfree(skb) in skb_release_data(), it's done by callback we have made for skb.

Thanks
Xiaohui

>Cheers,
>--
>Visit Openswan at http://www.openswan.org/
>Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
>Home Page: http://gondor.apana.org.au/~herbert/
>PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* [PATCH] pkt_sched: gen_kill_estimator() rcu fixes
From: Eric Dumazet @ 2010-06-09  9:40 UTC (permalink / raw)
  To: Changli Gao, David Miller
  Cc: netdev, Stephen Hemminger, Jarek Poplawski, Patrick McHardy
In-Reply-To: <1275929761.2545.159.camel@edumazet-laptop>

Le lundi 07 juin 2010 à 18:56 +0200, Eric Dumazet a écrit :

> 
> Even if its a bug correction, I cooked it for net-next-2.6 since bug
> probably never occured, and patch is too large to be sent to
> net-2.6/linux-2.6 before testing.
> 
> Another bug comes from net/netfilter/xt_RATEEST.c : It apparently
> calls gen_kill_estimator() / gen_new_estimator() without holding RTNL ?
> 
> So we should add another lock to protect things (est_root, elist[], ...)
> 
> David, I can send a net-2.6 patch for this one, since it should be small
> enough. If yes, I'll respin this patch of course ;)
> 
> Thanks
> 



Here is v3 of the patch, reduced to bare minimum.

This patch should be applied after previous one (pkt_sched:
gen_estimator: add a new lock)

Thanks


[PATCH] pkt_sched: gen_kill_estimator() rcu fixes

gen_kill_estimator() API is a bit misleading, since caller
should make sure an RCU grace period is respected before
freeing stats_lock.

This was partially addressed in commit 5d944c640b4 
(gen_estimator: deadlock fix), but same problem exist for all
gen_kill_estimator() users, if lock they use is not already rcu
protected.

A code review shows xt_RATEEST.c, act_api.c, act_police.c have this
problem. Other are ok because they use qdisc lock.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/act_api.h              |    2 ++
 include/net/netfilter/xt_rateest.h |    1 +
 net/core/gen_estimator.c           |    1 +
 net/netfilter/xt_RATEEST.c         |    8 +++++++-
 net/sched/act_api.c                |    7 ++++++-
 net/sched/act_police.c             |    7 +++++++
 6 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index c05fd71..bab385f 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -20,6 +20,7 @@ struct tcf_common {
 	struct gnet_stats_queue		tcfc_qstats;
 	struct gnet_stats_rate_est	tcfc_rate_est;
 	spinlock_t			tcfc_lock;
+	struct rcu_head			tcfc_rcu;
 };
 #define tcf_next	common.tcfc_next
 #define tcf_index	common.tcfc_index
@@ -32,6 +33,7 @@ struct tcf_common {
 #define tcf_qstats	common.tcfc_qstats
 #define tcf_rate_est	common.tcfc_rate_est
 #define tcf_lock	common.tcfc_lock
+#define tcf_rcu		common.tcfc_rcu
 
 struct tcf_police {
 	struct tcf_common	common;
diff --git a/include/net/netfilter/xt_rateest.h b/include/net/netfilter/xt_rateest.h
index ddbf37e..5e14277 100644
--- a/include/net/netfilter/xt_rateest.h
+++ b/include/net/netfilter/xt_rateest.h
@@ -9,6 +9,7 @@ struct xt_rateest {
 	struct gnet_estimator		params;
 	struct gnet_stats_rate_est	rstats;
 	struct gnet_stats_basic_packed	bstats;
+	struct rcu_head			rcu;
 };
 
 extern struct xt_rateest *xt_rateest_lookup(const char *name);
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 785e527..9fbe7f7 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -263,6 +263,7 @@ static void __gen_kill_estimator(struct rcu_head *head)
  *
  * Removes the rate estimator specified by &bstats and &rate_est.
  *
+ * Note : Caller should respect an RCU grace period before freeing stats_lock
  */
 void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
 			struct gnet_stats_rate_est *rate_est)
diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c
index 69c01e1..096b18b 100644
--- a/net/netfilter/xt_RATEEST.c
+++ b/net/netfilter/xt_RATEEST.c
@@ -60,13 +60,18 @@ struct xt_rateest *xt_rateest_lookup(const char *name)
 }
 EXPORT_SYMBOL_GPL(xt_rateest_lookup);
 
+static void xt_rateeset_free_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct xt_rateest, rcu));
+}
+
 void xt_rateest_put(struct xt_rateest *est)
 {
 	mutex_lock(&xt_rateest_mutex);
 	if (--est->refcnt == 0) {
 		hlist_del(&est->list);
 		gen_kill_estimator(&est->bstats, &est->rstats);
-		kfree(est);
+		call_rcu(&est->rcu, xt_rateeset_free_rcu);
 	}
 	mutex_unlock(&xt_rateest_mutex);
 }
@@ -179,6 +184,7 @@ static int __init xt_rateest_tg_init(void)
 static void __exit xt_rateest_tg_fini(void)
 {
 	xt_unregister_target(&xt_rateest_tg_reg);
+	rcu_barrier(); /* Wait for completion of call_rcu()'s */
 }
 
 
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 972378f..0e6fd2b 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -26,6 +26,11 @@
 #include <net/act_api.h>
 #include <net/netlink.h>
 
+static void tcf_common_free_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct tcf_common, tcfc_rcu));
+}
+
 void tcf_hash_destroy(struct tcf_common *p, struct tcf_hashinfo *hinfo)
 {
 	unsigned int h = tcf_hash(p->tcfc_index, hinfo->hmask);
@@ -38,7 +43,7 @@ void tcf_hash_destroy(struct tcf_common *p, struct tcf_hashinfo *hinfo)
 			write_unlock_bh(hinfo->lock);
 			gen_kill_estimator(&p->tcfc_bstats,
 					   &p->tcfc_rate_est);
-			kfree(p);
+			call_rcu(&p->tcfc_rcu, tcf_common_free_rcu);
 			return;
 		}
 	}
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 654f73d..9df45ed 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -97,6 +97,11 @@ nla_put_failure:
 	goto done;
 }
 
+static void tcf_police_free_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct tcf_police, tcf_rcu));
+}
+
 static void tcf_police_destroy(struct tcf_police *p)
 {
 	unsigned int h = tcf_hash(p->tcf_index, POL_TAB_MASK);
@@ -113,6 +118,7 @@ static void tcf_police_destroy(struct tcf_police *p)
 				qdisc_put_rtab(p->tcfp_R_tab);
 			if (p->tcfp_P_tab)
 				qdisc_put_rtab(p->tcfp_P_tab);
+			call_rcu(&p->tcf_rcu, tcf_police_free_rcu);
 			kfree(p);
 			return;
 		}
@@ -397,6 +403,7 @@ static void __exit
 police_cleanup_module(void)
 {
 	tcf_unregister_action(&act_police_ops);
+	rcu_barrier(); /* Wait for completion of call_rcu()'s */
 }
 
 module_init(police_init_module);



^ permalink raw reply related

* Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
From: Jens Axboe @ 2010-06-09  9:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Rafael J. Wysocki, Carl Worth, Eric Anholt,
	Venkatesh Pallipadi, Dave Airlie, Jesse Barnes, David H?rdeman,
	Mauro Carvalho Chehab, Eric Dumazet, Linux Kernel Mailing List,
	Maciej Rutecki, Andrew Morton, Kernel Testers List,
	Network Development, Linux ACPI, Linux PM List, Linux SCSI List,
	Linux Wireless List, DRI
In-Reply-To: <20100609093249.GA17026@elte.hu>

On 2010-06-09 11:32, Ingo Molnar wrote:
>> This should be fixed by commit 28f4197e which was merged on friday.
> 
> The scheduler commit adding local_clock() (for .36) is:
> 
>   c676329: sched_clock: Add local_clock() API and improve documentation
> 
> So once that is upstream the block IO statistics code can use that.

Thanks, I'll have to make a note of that.

-- 
Jens Axboe


^ permalink raw reply

* [PATCH net-2.6 v2] pkt_sched: gen_estimator: add a new lock
From: Eric Dumazet @ 2010-06-09  9:39 UTC (permalink / raw)
  To: Changli Gao, David Miller
  Cc: netdev, Stephen Hemminger, Jarek Poplawski, Patrick McHardy
In-Reply-To: <1275931108.2545.168.camel@edumazet-laptop>

Le lundi 07 juin 2010 à 19:18 +0200, Eric Dumazet a écrit :


> [PATCH net-2.6] pkt_sched: gen_estimator: add a new lock
> 
> gen_kill_estimator() / gen_new_estimator() is not always called with
> RTNL held.
> 
> net/netfilter/xt_RATEEST.c is one user of these API that do not hold
> RTNL, so random corruptions can occur between "tc" and "iptables"
> 
> Add a new fine grained lock instead of trying to use RTNL in xt_RATEEST


Here is v2 of patch, adding protection in gen_estimator_active() as
well.

[PATCH net-2.6 v2] pkt_sched: gen_estimator: add a new lock

gen_kill_estimator() / gen_new_estimator() is not always called with
RTNL held.

net/netfilter/xt_RATEEST.c is one user of these API that do not hold
RTNL, so random corruptions can occur between "tc" and "iptables".

Add a new fine grained lock instead of trying to use RTNL in netfilter.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/gen_estimator.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index cf8e703..785e527 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -107,6 +107,7 @@ static DEFINE_RWLOCK(est_lock);
 
 /* Protects against soft lockup during large deletion */
 static struct rb_root est_root = RB_ROOT;
+static DEFINE_SPINLOCK(est_tree_lock);
 
 static void est_timer(unsigned long arg)
 {
@@ -201,7 +202,6 @@ struct gen_estimator *gen_find_node(const struct gnet_stats_basic_packed *bstats
  *
  * Returns 0 on success or a negative error code.
  *
- * NOTE: Called under rtnl_mutex
  */
 int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
 		      struct gnet_stats_rate_est *rate_est,
@@ -232,6 +232,7 @@ int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
 	est->last_packets = bstats->packets;
 	est->avpps = rate_est->pps<<10;
 
+	spin_lock(&est_tree_lock);
 	if (!elist[idx].timer.function) {
 		INIT_LIST_HEAD(&elist[idx].list);
 		setup_timer(&elist[idx].timer, est_timer, idx);
@@ -242,6 +243,7 @@ int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
 
 	list_add_rcu(&est->list, &elist[idx].list);
 	gen_add_node(est);
+	spin_unlock(&est_tree_lock);
 
 	return 0;
 }
@@ -261,13 +263,13 @@ static void __gen_kill_estimator(struct rcu_head *head)
  *
  * Removes the rate estimator specified by &bstats and &rate_est.
  *
- * NOTE: Called under rtnl_mutex
  */
 void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
 			struct gnet_stats_rate_est *rate_est)
 {
 	struct gen_estimator *e;
 
+	spin_lock(&est_tree_lock);
 	while ((e = gen_find_node(bstats, rate_est))) {
 		rb_erase(&e->node, &est_root);
 
@@ -278,6 +280,7 @@ void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
 		list_del_rcu(&e->list);
 		call_rcu(&e->e_rcu, __gen_kill_estimator);
 	}
+	spin_unlock(&est_tree_lock);
 }
 EXPORT_SYMBOL(gen_kill_estimator);
 
@@ -312,8 +315,14 @@ EXPORT_SYMBOL(gen_replace_estimator);
 bool gen_estimator_active(const struct gnet_stats_basic_packed *bstats,
 			  const struct gnet_stats_rate_est *rate_est)
 {
+	bool res;
+
 	ASSERT_RTNL();
 
-	return gen_find_node(bstats, rate_est) != NULL;
+	spin_lock(&est_tree_lock);
+	res = gen_find_node(bstats, rate_est) != NULL;
+	spin_unlock(&est_tree_lock);
+
+	return res;
 }
 EXPORT_SYMBOL(gen_estimator_active);



^ permalink raw reply related

* RE: [patch] caif: fix a couple range checks
From: Sjur BRENDELAND @ 2010-06-09  9:11 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: David S. Miller, netdev@vger.kernel.org,
	kernel-janitors@vger.kernel.org
In-Reply-To: <20100607145158.GO5483@bicker>

> From: Dan Carpenter [mailto:error27@gmail.com]
> 
> The extra ! character means that these conditions are always false.


Looks good, thanks. This is embarrassing, I should caught this ages ago.
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>

> 
> Signed-off-by: Dan Carpenter <error27@gmail.com>
> 
> diff --git a/net/caif/cfveil.c b/net/caif/cfveil.c
> index 0fd827f..e04f7d9 100644
> --- a/net/caif/cfveil.c
> +++ b/net/caif/cfveil.c
> @@ -84,7 +84,7 @@ static int cfvei_transmit(struct cflayer *layr,
> struct cfpkt *pkt)
>  		return ret;
>  	caif_assert(layr->dn != NULL);
>  	caif_assert(layr->dn->transmit != NULL);
> -	if (!cfpkt_getlen(pkt) > CAIF_MAX_PAYLOAD_SIZE) {
> +	if (cfpkt_getlen(pkt) > CAIF_MAX_PAYLOAD_SIZE) {
>  		pr_warning("CAIF: %s(): Packet too large - size=%d\n",
>  			   __func__, cfpkt_getlen(pkt));
>  		return -EOVERFLOW;
> diff --git a/net/caif/cfrfml.c b/net/caif/cfrfml.c
> index cd2830f..fd27b17 100644
> --- a/net/caif/cfrfml.c
> +++ b/net/caif/cfrfml.c
> @@ -83,7 +83,7 @@ static int cfrfml_transmit(struct cflayer *layr,
> struct cfpkt *pkt)
>  	if (!cfsrvl_ready(service, &ret))
>  		return ret;
> 
> -	if (!cfpkt_getlen(pkt) > CAIF_MAX_PAYLOAD_SIZE) {
> +	if (cfpkt_getlen(pkt) > CAIF_MAX_PAYLOAD_SIZE) {
>  		pr_err("CAIF: %s():Packet too large - size=%d\n",
>  			__func__, cfpkt_getlen(pkt));
>  		return -EOVERFLOW;

^ permalink raw reply

* Re: [RFC PATCH 1/2] mac80211: make max_network_latency notifier atomic safe
From: Johannes Berg @ 2010-06-09  9:38 UTC (permalink / raw)
  To: florian
  Cc: pm list, james.bottomley, markgross, mgross, John W. Linville,
	David S. Miller, Javier Cardona, Jouni Malinen, Rui Paulo,
	Kalle Valo, linux-wireless, linux-kernel, netdev
In-Reply-To: <1276074915-26879-1-git-send-email-florian@mickler.org>

On Wed, 2010-06-09 at 11:15 +0200, florian@mickler.org wrote:
> In order to have the pm_qos framework be callable from interrupt
> context, all listeners have to also be callable in that context.

That makes no sense at all. Why add work structs _everywhere_ in the
callees and make the API harder to use and easy to get wrong completely,
instead of just adding a single work struct that will be queued from the
caller and dealing with the locking complexity etc. just once.

johannes


^ permalink raw reply

* Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
From: Ingo Molnar @ 2010-06-09  9:32 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, Rafael J. Wysocki, Carl Worth, Eric Anholt,
	Venkatesh Pallipadi, Dave Airlie, Jesse Barnes, David H?rdeman,
	Mauro Carvalho Chehab, Eric Dumazet, Linux Kernel Mailing List,
	Maciej Rutecki, Andrew Morton, Kernel Testers List,
	Network Development, Linux ACPI, Linux PM List, Linux SCSI List,
	Linux Wireless List, DRI
In-Reply-To: <4C0F4872.7090909@fusionio.com>


* Jens Axboe <jaxboe@fusionio.com> wrote:

> On 2010-06-09 03:53, Linus Torvalds wrote:
> >> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16129
> >> Subject	: BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
> >> Submitter	: Jan Kreuzer <kontrollator@gmx.de>
> >> Date		: 2010-06-05 06:15 (4 days old)
> > 
> > This seems to have been introduced by
> > 
> > 	commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
> > 	Author: Ingo Molnar <mingo@elte.hu>
> > 	Date:   Sat Nov 8 17:05:38 2008 +0100
> > 
> > 	    sched: optimize sched_clock() a bit
> >     
> > 	    sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
> > 	    variant of __cycles_2_ns().
> >     
> > 	    Most of the time sched_clock() is called with irqs disabled already.
> > 	    The few places that call it with irqs enabled need to be updated.
> >     
> > 	    Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > 
> > and this seems to be one of those calling cases that need to be updated.
> > 
> > Ingo? The call trace is:
> > 
> > 	BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2-8/337
> > 	caller is native_sched_clock+0x3c/0x68
> > 	Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
> > 	Call Trace:
> > 	[<ffffffff812362c5>] debug_smp_processor_id+0xc9/0xe4
> > 	[<ffffffff8101059d>] native_sched_clock+0x3c/0x68
> > 	[<ffffffff8101043d>] sched_clock+0x9/0xd
> > 	[<ffffffff81212d7a>] blk_rq_init+0x97/0xa3
> > 	[<ffffffff81214d71>] get_request+0x1c4/0x2d0
> > 	[<ffffffff81214ea6>] get_request_wait+0x29/0x1a6
> > 	[<ffffffff81215537>] __make_request+0x338/0x45b
> > 	[<ffffffff812147c2>] generic_make_request+0x2bb/0x330
> > 	[<ffffffff81214909>] submit_bio+0xd2/0xef
> > 	[<ffffffff811413cb>] submit_bh+0xf4/0x116
> > 	[<ffffffff81144853>] block_write_full_page_endio+0x89/0x96
> > 	[<ffffffff81144875>] block_write_full_page+0x15/0x17
> > 	[<ffffffff8119b00a>] ext4_writepage+0x356/0x36b
> > 	[<ffffffff810e1f91>] __writepage+0x1a/0x39
> > 	[<ffffffff810e32a6>] write_cache_pages+0x20d/0x346
> > 	[<ffffffff810e3406>] generic_writepages+0x27/0x29
> > 	[<ffffffff811ca279>] journal_submit_data_buffers+0x110/0x17d
> > 	[<ffffffff811ca986>] jbd2_journal_commit_transaction+0x4cb/0x156d
> > 	[<ffffffff811d0cba>] kjournald2+0x147/0x37a
> > 
> > (from the bugzilla thing)
> 
> This should be fixed by commit 28f4197e which was merged on friday.

The scheduler commit adding local_clock() (for .36) is:

  c676329: sched_clock: Add local_clock() API and improve documentation

So once that is upstream the block IO statistics code can use that.

Thanks,

	Ingo

^ permalink raw reply

* RE: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
From: Xin, Xiaohui @ 2010-06-09  9:22 UTC (permalink / raw)
  To: Mitchell Erblich, Andi Kleen
  Cc: Stephen Hemminger, netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mst@redhat.com, mingo@elte.hu,
	davem@davemloft.net, herbert@gondor.apana.org.au,
	jdike@linux.intel.com
In-Reply-To: <40119F5E-A745-4C50-8053-D5A701266AD5@earthlink.net>

>-----Original Message-----
>From: Mitchell Erblich [mailto:erblichs@earthlink.net]
>Sent: Monday, June 07, 2010 4:17 PM
>To: Andi Kleen
>Cc: Stephen Hemminger; Xin, Xiaohui; netdev@vger.kernel.org; kvm@vger.kernel.org;
>linux-kernel@vger.kernel.org; mst@redhat.com; mingo@elte.hu; davem@davemloft.net;
>herbert@gondor.apana.org.au; jdike@linux.intel.com
>Subject: Re: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
>
>
>On Jun 7, 2010, at 12:51 AM, Andi Kleen wrote:
>
>> Stephen Hemminger <shemminger@vyatta.com> writes:
>>
>>> Still not sure this is a good idea for a couple of reasons:
>>>
>>> 1. We already have lots of special cases with skb's (frags and fraglist),
>>>   and skb's travel through a lot of different parts of the kernel.  So any
>>>   new change like this creates lots of exposed points for new bugs. Look
>>>   at cases like MD5 TCP and netfilter, and forwarding these SKB's to ipsec
>>>   and ppp and ...
>>>
>>> 2. SKB's can have infinite lifetime in the kernel. If these buffers come from
>>>   a fixed size pool in an external device, they can easily all get tied up
>>>   if you have a slow listener. What happens then?
>>
>> 3. If they come from an internal pool what happens when the kernel runs
>> low on memory? How is that pool balanced against other kernel
>> memory users?
>>
>> -Andi
>>
>> --
>> ak@linux.intel.com -- Speaking for myself only.
>
>In general,
>
>When an internal pool is created/used, their SHOULD be a reason.
>Maybe, to keep allocation latency to a min, OR ...
>
The internal pool here is a collection of user buffers submitted
by guest virtio-net driver. Guest put buffers here and driver get
buffers from it. If guest submit more buffers then driver needs,
we need somewhere to put the buffers, it's the internal pool here
to deal with. 

>Now IMO,
>
>internal pool objects should have a ref count and
>if that count is 0, then under memory pressure and/or num
>of objects are above a high water mark, then they are freed,
>
>OR if there is a last reference age field, then the object is to be
>cleaned if dirty, then freed,
>
>Else, the pool is allowed to grow if the number of objects in the
>pool is below a set max (max COULD equal Infinity).

Thanks for the thoughts.

Basically, the size of the internal pool is not decided by the pool itself,
To add/delete the objects in the pool is not a task of the pool itself too. 
It's decided by guest virtio-net driver and vhost-net driver both, and 
decided by the guest receive speed and submit speed.
The max size of the pool is limited by the virtqueue buffer numbers.

Thanks
Xiaohui

>
>
>
>Mitchell Erblich

^ permalink raw reply

* Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
From: Rafael J. Wysocki @ 2010-06-09  9:22 UTC (permalink / raw)
  To: sedat.dilek-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Linux Kernel Mailing List, Maciej Rutecki, Andrew Morton,
	Linus Torvalds, Kernel Testers List, Network Development,
	Linux ACPI, Linux PM List, Linux SCSI List, Linux Wireless List,
	DRI, dmonakhov-GEFAQzZX7r8dnm+yROfE0A
In-Reply-To: <AANLkTiksbL1qHg7Q0A-6TbX0uUrxra4jctUoIGVk5vnE-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Wednesday 09 June 2010, Sedat Dilek wrote:
> The patch from [1] is still missing.
> 
>    "cpufreq-call-nr_iowait_cpu-with-disabled-preemption.patch" from
> Dmitry Monakhoc
> 
> Tested-by: Sedat Dilek <sedat.dilek-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Tested-by Maciej Rutecki <maciej.rutecki-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> 
> I have already reported this issue on LKML [2] and cpufreq ML [3].
> 
> - Sedat -
> 
> [1] http://www.spinics.net/lists/cpufreq/msg01631.html
> [2] http://lkml.org/lkml/2010/5/31/77
> [3] http://www.spinics.net/lists/cpufreq/msg01637.html

Thanks, added.

Rafael

^ permalink raw reply

* Re: [Patch 2/2] mlx4: add dynamic LRO disable support
From: Cong Wang @ 2010-06-09  9:23 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, herbert.xu, nhorman, sgruszka, davem
In-Reply-To: <1275661552.2095.13.camel@achroite.uk.solarflarecom.com>

On 06/04/10 22:25, Ben Hutchings wrote:
> On Fri, 2010-06-04 at 09:56 +0800, Cong Wang wrote:
>> On 06/03/10 20:37, Ben Hutchings wrote:
>>> On Wed, 2010-06-02 at 23:39 -0400, Amerigo Wang wrote:
>>>> This patch adds dynamic LRO diable support for mlx4 net driver.
>>>> It also fixes a bug of mlx4, which checks NETIF_F_LRO flag in rx
>>>> path without rtnl lock.
>>> [...]
>>>
>>> Is that flag test actually unsafe - and if so, how is testing num_lro
>>> any better?  Perhaps access to net_device::features should be wrapped
>>> with ACCESS_ONCE() to ensure that reads and writes are atomic.
>>>
>>
>> At least, I don't find there is any race with 'num_lro', thus
>> no lock is needed.
>
> In both cases there is a race condition but it is harmless so long as
> the read and the write are atomic.  There is a general assumption in
> networking code that this is the case for int and long.  Personally I
> would prefer to see this made explicit using ACCESS_ONCE(), but I don't
> see any specific problem in mlx4 (not that I'm familiar with this driver
> either).

I read this email again.

I think you misunderstood the race condition here. Even read and write
are atomic here, the race still exists. One can just set NETIF_F_LRO
asynchronously right before mlx4 check this flag in mlx4_en_process_rx_cq()
which doesn't take rtnl_lock.

Also, I don't think ACCESS_ONCE() can make things atomic here.

Am I missing something?

Thanks.

^ permalink raw reply

* [RFC PATCH 1/2] mac80211: make max_network_latency notifier atomic safe
From: florian @ 2010-06-09  9:15 UTC (permalink / raw)
  To: pm list
  Cc: james.bottomley, markgross, mgross, Florian Mickler,
	John W. Linville, Johannes Berg, David S. Miller, Javier Cardona,
	Jouni Malinen, Rui Paulo, Kalle Valo, linux-wireless,
	linux-kernel, netdev

In order to have the pm_qos framework be callable from interrupt
context, all listeners have to also be callable in that context.

This patch schedules the update of the latency value via
ieee80211_max_network_latency() for execution on
the global workqueue (using schedule_work()).

As there was no synchronization/locking between the listener and the
caller of pm_qos_update_request before, there should be no new races
uncovered by this. Although the timing probably changes.

Signed-off-by: Florian Mickler <florian@mickler.org>
---

This needs some networking expertise to check the reasoning above and
judge the implementation.

 net/mac80211/ieee80211_i.h |    4 +++-
 net/mac80211/main.c        |    5 +++--
 net/mac80211/mlme.c        |   20 +++++++++++++++-----
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 1a9e2da..3d2155a 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -851,6 +851,7 @@ struct ieee80211_local {
 	struct work_struct dynamic_ps_disable_work;
 	struct timer_list dynamic_ps_timer;
 	struct notifier_block network_latency_notifier;
+	struct work_struct network_latency_notifier_work;
 
 	int user_power_level; /* in dBm */
 	int power_constr_level; /* in dBm */
@@ -994,8 +995,9 @@ ieee80211_rx_result ieee80211_sta_rx_mgmt(struct ieee80211_sub_if_data *sdata,
 void ieee80211_send_pspoll(struct ieee80211_local *local,
 			   struct ieee80211_sub_if_data *sdata);
 void ieee80211_recalc_ps(struct ieee80211_local *local, s32 latency);
-int ieee80211_max_network_latency(struct notifier_block *nb,
+int ieee80211_max_network_latency_notification(struct notifier_block *nb,
 				  unsigned long data, void *dummy);
+void ieee80211_max_network_latency(struct work_struct *w);
 void ieee80211_sta_process_chanswitch(struct ieee80211_sub_if_data *sdata,
 				      struct ieee80211_channel_sw_ie *sw_elem,
 				      struct ieee80211_bss *bss,
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 22a384d..5cded3a 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -607,9 +607,10 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)
 	rtnl_unlock();
 
 	ieee80211_led_init(local);
-
+	INIT_WORK(&local->network_latency_notifier_work,
+			ieee80211_max_network_latency);
 	local->network_latency_notifier.notifier_call =
-		ieee80211_max_network_latency;
+		ieee80211_max_network_latency_notification;
 	result = pm_qos_add_notifier(PM_QOS_NETWORK_LATENCY,
 				     &local->network_latency_notifier);
 
diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index 0839c4e..feb6134 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -1953,17 +1953,27 @@ void ieee80211_mlme_notify_scan_completed(struct ieee80211_local *local)
 	rcu_read_unlock();
 }
 
-int ieee80211_max_network_latency(struct notifier_block *nb,
-				  unsigned long data, void *dummy)
+void ieee80211_max_network_latency(struct work_struct *w)
 {
-	s32 latency_usec = (s32) data;
+	s32 latency_usec = (s32) atomic_long_read(&w->data);
 	struct ieee80211_local *local =
-		container_of(nb, struct ieee80211_local,
-			     network_latency_notifier);
+		container_of(w, struct ieee80211_local,
+			     network_latency_notifier_work);
 
 	mutex_lock(&local->iflist_mtx);
 	ieee80211_recalc_ps(local, latency_usec);
 	mutex_unlock(&local->iflist_mtx);
+}
+
+/* the notifier might be called from interrupt context */
+int ieee80211_max_network_latency_notification(struct notifier_block *nb,
+				  unsigned long data, void *dummy)
+{
+	struct ieee80211_local *local =
+		container_of(nb, struct ieee80211_local,
+				network_latency_notifier);
+	atomic_long_set(&local->network_latency_notifier_work.data, data);
+	schedule_work(&local->network_latency_notifier_work);
 
 	return 0;
 }
-- 
1.7.1

^ permalink raw reply related

* Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
From: Rafael J. Wysocki @ 2010-06-09  9:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Venkatesh Pallipadi, DRI, Mauro Carvalho Chehab, Eric Dumazet,
	Linux ACPI, Jens Axboe, Linux PM List, Linux Wireless List,
	Linux Kernel Mailing List, Jesse Barnes, Linux SCSI List,
	Carl Worth, Härdeman, Network Development, Ingo Molnar,
	Kernel Testers List, Andrew Morton, Maciej Rutecki
In-Reply-To: <alpine.LFD.2.00.1006081814240.4506@i5.linux-foundation.org>

On Wednesday 09 June 2010, Linus Torvalds wrote:
> 
> [ Added lots of cc's to direct specific people to look at the regressions 
>   that may or may not be theirs... ]
> 
> On Wed, 9 Jun 2010, Rafael J. Wysocki wrote:
> >
> >  * Quite a few of the already reported regressions may be related to the bug
> >    fixed by 386f40c86d6c8d5b717ef20620af1a750d0dacb4 (Revert "tty: fix a little
> >    bug in scrup, vt.c"), so reporters please retest with this commit applied.]
> 
> From a quick look, most of them look unrelated to that unfortunate bug. 
> It's hard to tell for sure, of course (memory corruption can do pretty 
> much anything), but I'm not seeing the 07200720.. pattern at least.
> 
> And some of them do seem to be bisected to likely culprits and/or have 
> patches that are claimed to have fixed them.
> 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16163
> > Subject	: [2.6.35-rc1 Regression] i915: Commit cfecde causes VGA to stay off
> > Submitter	: David John <davidjon@xenontk.org>
> > Date		: 2010-06-02 12:52 (7 days old)
> > Message-ID	: <4C065423.3000202@xenontk.org>
> > References	: http://marc.info/?l=linux-kernel&m=127548313828613&w=2
> 
> That has a "reverting the commit fixes it", and a confirmation from Nick 
> Bowler.
> 
> Eric, Carl: should I just revert that commit? Or do you have a fix?

This one is reported to have been fixed already.  Closed now.

...
> 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16104
> > Subject	: Radeon KMS does not start after merge of the new PM-Code
> > Submitter	: Jan Kreuzer <kontrollator@gmx.de>
> > Date		: 2010-06-02 07:47 (7 days old)
> 
> This one also has a patch in Bugzilla, I think Airlie is just waiting to 
> calm down his queue and already removed the dependency on the temperature 
> code. DaveA?

This should be fixed by commit f8ed8b4c5d30b5214f185997131b06e35f6f7113, so
closing now.

Rafael

------------------------------------------------------------------------------
ThinkGeek and WIRED's GeekDad team up for the Ultimate 
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
lucky parental unit.  See the prize list and enter to win: 
http://p.sf.net/sfu/thinkgeek-promo
--

^ permalink raw reply

* Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
From: Sedat Dilek @ 2010-06-09  9:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Maciej Rutecki, Andrew Morton,
	Linus Torvalds, Kernel Testers List, Network Development,
	Linux ACPI, Linux PM List, Linux SCSI List, Linux Wireless List,
	DRI, dmonakhov
In-Reply-To: <Rlf_Qjov7bG.A.a8F.I8rDMB@chimera>

The patch from [1] is still missing.

   "cpufreq-call-nr_iowait_cpu-with-disabled-preemption.patch" from
Dmitry Monakhoc

Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by Maciej Rutecki <maciej.rutecki@gmail.com>

I have already reported this issue on LKML [2] and cpufreq ML [3].

- Sedat -

[1] http://www.spinics.net/lists/cpufreq/msg01631.html
[2] http://lkml.org/lkml/2010/5/31/77
[3] http://www.spinics.net/lists/cpufreq/msg01637.html

On Wed, Jun 9, 2010 at 12:06 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> [NOTES:
>  * This by no means is a complete list, but we only put e-mail reports that
>   are at least 1 week old into the Bugzilla.
>  * Quite a few of the already reported regressions may be related to the bug
>   fixed by 386f40c86d6c8d5b717ef20620af1a750d0dacb4 (Revert "tty: fix a little
>   bug in scrup, vt.c"), so reporters please retest with this commit applied.]
>
> This message contains a list of some regressions from 2.6.34,
> for which there are no fixes in the mainline known to the tracking team.
> If any of them have been fixed already, please let us know.
>
> If you know of any other unresolved regressions from 2.6.34, please let us
> know either and we'll add them to the list.  Also, please let us know
> if any of the entries below are invalid.
>
> Each entry from the list will be sent additionally in an automatic reply
> to this message with CCs to the people involved in reporting and handling
> the issue.
>
>
> Listed regressions statistics:
>
>  Date          Total  Pending  Unresolved
>  ----------------------------------------
>  2010-06-09       15       13          10
>
>
> Unresolved regressions
> ----------------------
>
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16163
> Subject         : [2.6.35-rc1 Regression] i915: Commit cfecde causes VGA to stay off
> Submitter       : David John <davidjon@xenontk.org>
> Date            : 2010-06-02 12:52 (7 days old)
> Message-ID      : <4C065423.3000202@xenontk.org>
> References      : http://marc.info/?l=linux-kernel&m=127548313828613&w=2
>
>
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16161
> Subject         : [2.6.35-rc1 regression] sysfs: cannot create duplicate filename ... XVR-600 related?
> Submitter       : Mikael Pettersson <mikpe@it.uu.se>
> Date            : 2010-06-01 19:57 (8 days old)
> Message-ID      : <19461.26166.427857.612983@pilspetsen.it.uu.se>
> References      : http://marc.info/?l=linux-kernel&m=127542227511925&w=2
>
>
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16160
> Subject         : 2.6.35 Radeon KMS power management regression?
> Submitter       : Nigel Cunningham <ncunningham@crca.org.au>
> Date            : 2010-06-01 6:23 (8 days old)
> Message-ID      : <4C04A767.8000209@crca.org.au>
> References      : http://marc.info/?l=linux-kernel&m=127537343722290&w=2
>
>
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16145
> Subject         : Unable to boot after "ACPI: Don't let acpi_pad needlessly mark TSC unstable"
> Submitter       : Tom Gundersen <teg@jklm.no>
> Date            : 2010-06-07 13:11 (2 days old)
> Handled-By      : Venkatesh Pallipadi <venki@google.com>
>
>
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16129
> Subject         : BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
> Submitter       : Jan Kreuzer <kontrollator@gmx.de>
> Date            : 2010-06-05 06:15 (4 days old)
>
>
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16122
> Subject         : 2.6.35-rc1: WARNING at fs/fs-writeback.c:1142 __mark_inode_dirty+0x103/0x170
> Submitter       : Larry Finger <Larry.Finger@lwfinger.net>
> Date            : 2010-06-04 13:18 (5 days old)
>
>
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16120
> Subject         : Oops: 0000 [#1] SMP, unable to handle kernel NULL pointer dereference at (null)
> Submitter       : Alex Zhavnerchik <alex.vizor@gmail.com>
> Date            : 2010-06-04 09:25 (5 days old)
> Handled-By      : Eric Dumazet <eric.dumazet@gmail.com>
>
>
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16104
> Subject         : Radeon KMS does not start after merge of the new PM-Code
> Submitter       : Jan Kreuzer <kontrollator@gmx.de>
> Date            : 2010-06-02 07:47 (7 days old)
>
>
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16090
> Subject         : sysfs: cannot create duplicate filename
> Submitter       : Tobias <devnull@plzk.org>
> Date            : 2010-06-01 15:59 (8 days old)
> Handled-By      : Jesse Barnes <jbarnes@virtuousgeek.org>
>
>
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16037
> Subject         : NULL Pointer dereference in __ir_input_register/budget_ci_attach
> Submitter       : Sean Finney <seanius@debian.org>
> Date            : 2010-05-23 19:52 (17 days old)
>
>
> Regressions with patches
> ------------------------
>
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16131
> Subject         : kernel BUG at fs/btrfs/extent-tree.c:4363 (btrfs_free_tree_block)
> Submitter       : Chow Loong Jin <hyperair@ubuntu.com>
> Date            : 2010-06-05 18:53 (4 days old)
> Handled-By      : Yan Zheng <zheng.yan@oracle.com>
> Patch           : https://patchwork.kernel.org/patch/103235/
>
>
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16127
> Subject         : Boot freeze on HP Compaq nx6325 (RS482) with Radeon KMS
> Submitter       : Jure Repinc <jlp.bugs@gmail.com>
> Date            : 2010-06-04 21:14 (5 days old)
> Handled-By      : Dave Airlie <airlied@linux.ie>
> Patch           : https://bugzilla.kernel.org/attachment.cgi?id=26677
>
>
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16092
> Subject         : Caught 64-bit read from uninitialized memory in memtype_rb_augment_cb
> Submitter       : Christian Casteyde <casteyde.christian@free.fr>
> Date            : 2010-06-01 18:08 (8 days old)
> Handled-By      : Venki <venki@google.com>
> Patch           : https://bugzilla.kernel.org/show_bug.cgi?id=16092#c2
>
>
> For details, please visit the bug entries and follow the links given in
> references.
>
> As you can see, there is a Bugzilla entry for each of the listed regressions.
> There also is a Bugzilla entry used for tracking the regressions from 2.6.34,
> unresolved as well as resolved, at:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=16055
>
> Please let the tracking team know if there are any Bugzilla entries that
> should be added to the list in there.
>
> Thanks!
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
From: Rafael J. Wysocki @ 2010-06-09  9:00 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linus Torvalds, Carl Worth, Eric Anholt, Venkatesh Pallipadi,
	Jens Axboe, Dave Airlie, Jesse Barnes, Ingo Molnar,
	David Härdeman, Eric Dumazet, Linux Kernel Mailing List,
	Maciej Rutecki, Andrew Morton, Kernel Testers List,
	Network Development, Linux ACPI, Linux PM List, Linux SCSI List,
	Linux Wireless List, DRI
In-Reply-To: <4C0EFBC1.5090401-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Wednesday 09 June 2010, Mauro Carvalho Chehab wrote:
> Em 08-06-2010 22:53, Linus Torvalds escreveu:
> 
> > 
> >> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16037
> >> Subject	: NULL Pointer dereference in __ir_input_register/budget_ci_attach
> >> Submitter	: Sean Finney <seanius-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org>
> >> Date		: 2010-05-23 19:52 (17 days old)
> > 
> > Perhaps related to commit 13c24497086418010bf4f76378bcae241d7f59cd?
> > 
> > David Härdeman, Mauro Carvalho Chehab added to cc.
> 
> This patch probably solves the issue:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=84b14f181a36eea6591779156ef356f8d198bbfd
> 
> The patch were already applied upstream. I've already asked the reporter to test it, via BZ.

Confirmed fixed, so closing.

Thanks,
Rafael

^ permalink raw reply

* Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
From: Rafael J. Wysocki @ 2010-06-09  8:55 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, Carl Worth, Eric Anholt, Venkatesh Pallipadi,
	Dave Airlie, Jesse Barnes, Ingo Molnar, David Härdeman,
	Mauro Carvalho Chehab, Eric Dumazet, Linux Kernel Mailing List,
	Maciej Rutecki, Andrew Morton, Kernel Testers List,
	Network Development, Linux ACPI, Linux PM List, Linux SCSI List,
	Linux Wireless List, DRI
In-Reply-To: <4C0F4872.7090909@fusionio.com>

On Wednesday 09 June 2010, Jens Axboe wrote:
> On 2010-06-09 03:53, Linus Torvalds wrote:
> >> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16129
> >> Subject	: BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
> >> Submitter	: Jan Kreuzer <kontrollator@gmx.de>
> >> Date		: 2010-06-05 06:15 (4 days old)
> > 
> > This seems to have been introduced by
> > 
> > 	commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
> > 	Author: Ingo Molnar <mingo@elte.hu>
> > 	Date:   Sat Nov 8 17:05:38 2008 +0100
> > 
> > 	    sched: optimize sched_clock() a bit
> >     
> > 	    sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
> > 	    variant of __cycles_2_ns().
> >     
> > 	    Most of the time sched_clock() is called with irqs disabled already.
> > 	    The few places that call it with irqs enabled need to be updated.
> >     
> > 	    Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > 
> > and this seems to be one of those calling cases that need to be updated.
> > 
> > Ingo? The call trace is:
> > 
> > 	BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2-8/337
> > 	caller is native_sched_clock+0x3c/0x68
> > 	Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
> > 	Call Trace:
> > 	[<ffffffff812362c5>] debug_smp_processor_id+0xc9/0xe4
> > 	[<ffffffff8101059d>] native_sched_clock+0x3c/0x68
> > 	[<ffffffff8101043d>] sched_clock+0x9/0xd
> > 	[<ffffffff81212d7a>] blk_rq_init+0x97/0xa3
> > 	[<ffffffff81214d71>] get_request+0x1c4/0x2d0
> > 	[<ffffffff81214ea6>] get_request_wait+0x29/0x1a6
> > 	[<ffffffff81215537>] __make_request+0x338/0x45b
> > 	[<ffffffff812147c2>] generic_make_request+0x2bb/0x330
> > 	[<ffffffff81214909>] submit_bio+0xd2/0xef
> > 	[<ffffffff811413cb>] submit_bh+0xf4/0x116
> > 	[<ffffffff81144853>] block_write_full_page_endio+0x89/0x96
> > 	[<ffffffff81144875>] block_write_full_page+0x15/0x17
> > 	[<ffffffff8119b00a>] ext4_writepage+0x356/0x36b
> > 	[<ffffffff810e1f91>] __writepage+0x1a/0x39
> > 	[<ffffffff810e32a6>] write_cache_pages+0x20d/0x346
> > 	[<ffffffff810e3406>] generic_writepages+0x27/0x29
> > 	[<ffffffff811ca279>] journal_submit_data_buffers+0x110/0x17d
> > 	[<ffffffff811ca986>] jbd2_journal_commit_transaction+0x4cb/0x156d
> > 	[<ffffffff811d0cba>] kjournald2+0x147/0x37a
> > 
> > (from the bugzilla thing)
> 
> This should be fixed by commit 28f4197e which was merged on friday.

Thanks, closed.

Rafael

^ permalink raw reply

* RE: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
From: Xin, Xiaohui @ 2010-06-09  8:48 UTC (permalink / raw)
  To: Andi Kleen, Stephen Hemminger
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mst@redhat.com, mingo@elte.hu,
	davem@davemloft.net, herbert@gondor.apana.org.au,
	jdike@linux.intel.com
In-Reply-To: <87pr03gu1c.fsf@basil.nowhere.org>

>-----Original Message-----
>From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf Of Andi
>Kleen
>Sent: Monday, June 07, 2010 3:51 PM
>To: Stephen Hemminger
>Cc: Xin, Xiaohui; netdev@vger.kernel.org; kvm@vger.kernel.org;
>linux-kernel@vger.kernel.org; mst@redhat.com; mingo@elte.hu; davem@davemloft.net;
>herbert@gondor.apana.org.au; jdike@linux.intel.com
>Subject: Re: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
>
>Stephen Hemminger <shemminger@vyatta.com> writes:
>
>> Still not sure this is a good idea for a couple of reasons:
>>
>> 1. We already have lots of special cases with skb's (frags and fraglist),
>>    and skb's travel through a lot of different parts of the kernel.  So any
>>    new change like this creates lots of exposed points for new bugs. Look
>>    at cases like MD5 TCP and netfilter, and forwarding these SKB's to ipsec
>>    and ppp and ...
>>
>> 2. SKB's can have infinite lifetime in the kernel. If these buffers come from
>>    a fixed size pool in an external device, they can easily all get tied up
>>    if you have a slow listener. What happens then?
>
>3. If they come from an internal pool what happens when the kernel runs
>low on memory? How is that pool balanced against other kernel
>memory users?
>
The size of the pool is limited by the virtqueue capacity now.
If the virtqueue is really huge, then how to balance on memory is a problem.
I did not consider it clearly how to tune it dynamically currently...

>-Andi
>
>--
>ak@linux.intel.com -- Speaking for myself only.
>--
>To unsubscribe from this list: send the line "unsubscribe kvm" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH][RFC] Infrastructure for compact call location representation
From: Nick Piggin @ 2010-06-09  8:44 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David VomLehn, to, netdev
In-Reply-To: <20100608084456.4da00349@nehalam>

On Tue, Jun 08, 2010 at 08:44:56AM -0700, Stephen Hemminger wrote:
> On Mon, 7 Jun 2010 17:30:52 -0700
> David VomLehn <dvomlehn@cisco.com> wrote:
> > History
> > v2	Support small callsite IDs and split out out-of-band parameter
> > 	parsing.
> > V1	Initial release
> > 
> > Signed-off-by: David VomLehn <dvomlehn@cisco.com>
> 
> This is really Linux Kernel Mailing List material (not just netdev). And it will
> be a hard sell to get it accepted, because it is basically an alternative call
> tracing mechanism, and there are already several of these in use or under development
> (see perf and ftrace).

What about a generic extension or layer on top of stacktrace that
does caching and unique IDs for stack traces. This way you can get
callsites or _full_ stack traces if required, and it shouldn't require
any extra magic in the net functions.

You would need a hash for stack traces to check for an existing trace,
and an idr to assign ids to traces.


^ permalink raw reply

* RE: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
From: Xin, Xiaohui @ 2010-06-09  8:29 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mst@redhat.com, mingo@elte.hu,
	davem@davemloft.net, herbert@gondor.apana.org.au,
	jdike@linux.intel.com
In-Reply-To: <20100606161348.427822fb@nehalam>

>-----Original Message-----
>From: Stephen Hemminger [mailto:shemminger@vyatta.com]
>Sent: Monday, June 07, 2010 7:14 AM
>To: Xin, Xiaohui
>Cc: netdev@vger.kernel.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
>mst@redhat.com; mingo@elte.hu; davem@davemloft.net; herbert@gondor.apana.org.au;
>jdike@linux.intel.com
>Subject: Re: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
>
>Still not sure this is a good idea for a couple of reasons:
>
>1. We already have lots of special cases with skb's (frags and fraglist),
>   and skb's travel through a lot of different parts of the kernel.  So any
>   new change like this creates lots of exposed points for new bugs. Look
>   at cases like MD5 TCP and netfilter, and forwarding these SKB's to ipsec
>   and ppp and ...
>

Yes, I agree on your concern at some extent. But the skbs which use external pages in our cases have short travels which start from NIC driver and then forward to the guest immediately. It will not travel into host kernel stack or any filters which may avoid many problems you have mentioned here.

Another point is that we try to make the solution more generic to different NIC drivers, then many drivers may use the way without modifications.
  
>2. SKB's can have infinite lifetime in the kernel. If these buffers come from
>   a fixed size pool in an external device, they can easily all get tied up
>   if you have a slow listener. What happens then?

The pool is not fixed size, it's the size of usable buffers submitted by guest virtio-net driver. Guest virtio-net will check how much buffers are filled and do resubmit. What a slow listener mean? A slow NIC driver?

Thanks
Xiaohui

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox