Netdev List

* Re: [PATCH 01/12] netvm: Prevent a stream-specific deadlock
From: David Miller @ 2012-05-14 20:26 UTC
  To: mgorman
  Cc: akpm, linux-mm, netdev, linux-nfs, linux-kernel, Trond.Myklebust,
	neilb, hch, a.p.zijlstra, michaelc, emunson
In-Reply-To: <20120514105604.GB29102@suse.de>

From: Mel Gorman <mgorman@suse.de>
Date: Mon, 14 May 2012 11:56:04 +0100

> On Fri, May 11, 2012 at 01:10:34AM -0400, David Miller wrote:
>> From: Mel Gorman <mgorman@suse.de>
>> Date: Thu, 10 May 2012 14:54:14 +0100
>> 
>> > It could happen that all !SOCK_MEMALLOC sockets have buffered so
>> > much data that we're over the global rmem limit. This will prevent
>> > SOCK_MEMALLOC buffers from receiving data, which will prevent userspace
>> > from running, which is needed to reduce the buffered data.
>> > 
>> > Fix this by exempting the SOCK_MEMALLOC sockets from the rmem limit.
>> > 
>> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> > Signed-off-by: Mel Gorman <mgorman@suse.de>
>> 
>> This introduces an invariant which I am not so sure is enforced.
>> 
>> With this change it is absolutely required that once a socket
>> becomes SOCK_MEMALLOC it must never _ever_ lose that attribute.
>> 
> 
> This is effectively true. In the NFS case, the flag is cleared on
> swapoff after all the entries have been paged in. In the NBD case,
> SOCK_MEMALLOC is left set until the socket is destroyed. I'll update the
> changelog.

Bugs happen; you need to find a way to assert that nobody ever does
this, because if a bug is introduced which makes this happen, it will
otherwise be very difficult to debug.
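
A minimal userspace sketch of the invariant under discussion, assuming a simplified model in which data admitted only because of the SOCK_MEMALLOC exemption is counted separately per socket. The struct and helpers (`toy_sock`, `toy_rmem_admit`, `toy_rmem_release_exempt`, `toy_clear_memalloc`) and the 4096-byte limit are hypothetical and are not the kernel's API; `assert()` stands in for the kind of runtime check (for example a `WARN_ON()`) being requested. Clearing the flag is still permitted, as in the NFS swapoff case, but only once no exempt data remains queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified socket; not the kernel's struct sock. */
struct toy_sock {
	bool memalloc;   /* stands in for SOCK_MEMALLOC */
	size_t charged;  /* bytes charged against the global rmem limit */
	size_t exempt;   /* bytes admitted only thanks to the exemption */
};

static size_t global_rmem;                    /* total charged bytes */
static const size_t global_rmem_limit = 4096; /* assumed limit */

/* Returns true if 'len' bytes may be queued on 'sk'. */
static bool toy_rmem_admit(struct toy_sock *sk, size_t len)
{
	if (global_rmem + len <= global_rmem_limit) {
		global_rmem += len;
		sk->charged += len;
		return true;
	}
	if (sk->memalloc) {
		/* Over the limit, but SOCK_MEMALLOC sockets are exempt. */
		sk->exempt += len;
		return true;
	}
	return false;
}

/* Called once exempt data has been consumed by the reader. */
static void toy_rmem_release_exempt(struct toy_sock *sk, size_t len)
{
	assert(len <= sk->exempt);
	sk->exempt -= len;
}

/* A check in the spirit of the request above: never drop the flag
 * while data admitted only because of it is still queued. */
static void toy_clear_memalloc(struct toy_sock *sk)
{
	assert(sk->exempt == 0 && "SOCK_MEMALLOC cleared with exempt data queued");
	sk->memalloc = false;
}
```

Putting the check at the single place where the flag is cleared means a violating caller trips it immediately, rather than surfacing later as an accounting mismatch that is hard to trace back.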

* Re: [PATCH 1/4] netfilter: ipset: fix timeout value overflow bug
From: Pablo Neira Ayuso @ 2012-05-14 20:10 UTC
  To: Jozsef Kadlecsik
  Cc: Eric Dumazet, David Laight, netfilter-devel, davem, netdev
In-Reply-To: <alpine.DEB.2.00.1205141943240.28291@blackhole.kfki.hu>

[-- Attachment #1: Type: text/plain, Size: 1062 bytes --]

On Mon, May 14, 2012 at 07:45:07PM +0200, Jozsef Kadlecsik wrote:
> On Mon, 14 May 2012, Eric Dumazet wrote:
> 
> > On Mon, 2012-05-14 at 16:36 +0200, Pablo Neira Ayuso wrote:
> > > On Mon, May 14, 2012 at 03:19:49PM +0100, David Laight wrote:
> > > >  
> > > > > --- a/include/linux/netfilter/ipset/ip_set_timeout.h
> > > > > +++ b/include/linux/netfilter/ipset/ip_set_timeout.h
> > > > > @@ -30,6 +30,10 @@ ip_set_timeout_uget(struct nlattr *tb)
> > > > >  {
> > > > >  	unsigned int timeout = ip_set_get_h32(tb);
> > > > >  
> > > > > +	/* Normalize to fit into jiffies */
> > > > > +	if (timeout > UINT_MAX/1000)
> > > > > +		timeout = UINT_MAX/1000;
> > > > > +
> > > > 
> > > > Doesn't that rather assume that HZ is 1000 ?
> > > 
> > > Indeed. I overlooked that. Thanks David.
> > 
> > I dont think so.
> > 
> > 1000 here is really MSEC_PER_SEC
> > 
> > (All occurrences should be changed in this file)
> 
> Yes, exactly. Pablo, shall I produce a little patch or could you change 
> 1000 to MSEC_PER_SEC?

New patch attached.

I have rebased my tree again.
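
To illustrate why the constant is MSEC_PER_SEC rather than HZ: the clamp bounds the user-supplied timeout in seconds so that converting it to milliseconds cannot overflow, before any milliseconds-to-jiffies conversion takes place. A minimal sketch, assuming a 32-bit `unsigned int`; `clamp_timeout_sec()` is a hypothetical helper that mirrors only the check added by the patch, not the full `ip_set_timeout_uget()`.

```c
#include <assert.h>
#include <limits.h>

/* Same value as the kernel's MSEC_PER_SEC. */
#define MSEC_PER_SEC 1000U

/* Clamp a timeout in seconds so that timeout * MSEC_PER_SEC still
 * fits in an unsigned int (mirrors the check in the patch). */
static unsigned int clamp_timeout_sec(unsigned int timeout)
{
	if (timeout > UINT_MAX / MSEC_PER_SEC)
		timeout = UINT_MAX / MSEC_PER_SEC;
	return timeout;
}
```

For example, a timeout of 5000000 seconds (about 58 days) multiplied by 1000 wraps to 705032704 in 32-bit arithmetic, whereas the clamped value, 4294967 seconds, yields 4294967000 ms, which still fits.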

[-- Attachment #2: 0001-netfilter-ipset-fix-timeout-value-overflow-bug.patch --]
[-- Type: text/x-diff, Size: 2715 bytes --]

From 9fdc90b2d00ecf98797c147d58cfe324fc92c9ed Mon Sep 17 00:00:00 2001
From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Date: Mon, 7 May 2012 02:35:44 +0000
Subject: [PATCH] netfilter: ipset: fix timeout value overflow bug

Large timeout parameters could result in wrong timeout values due to
an overflow in the msec-to-jiffies conversion (reported by Andreas Herz).

[ This patch was mangled by Pablo Neira Ayuso since David Laight and
  Eric Dumazet noticed that we were using hardcoded 1000 instead of
  MSEC_PER_SEC to calculate the timeout ]

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netfilter/ipset/ip_set_timeout.h |    4 ++++
 net/netfilter/xt_set.c                         |   15 +++++++++++++--
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set_timeout.h b/include/linux/netfilter/ipset/ip_set_timeout.h
index 4792320..41d9cfa 100644
--- a/include/linux/netfilter/ipset/ip_set_timeout.h
+++ b/include/linux/netfilter/ipset/ip_set_timeout.h
@@ -30,6 +30,10 @@ ip_set_timeout_uget(struct nlattr *tb)
 {
 	unsigned int timeout = ip_set_get_h32(tb);
 
+	/* Normalize to fit into jiffies */
+	if (timeout > UINT_MAX/MSEC_PER_SEC)
+		timeout = UINT_MAX/MSEC_PER_SEC;
+
 	/* Userspace supplied TIMEOUT parameter: adjust crazy size */
 	return timeout == IPSET_NO_TIMEOUT ? IPSET_NO_TIMEOUT - 1 : timeout;
 }
diff --git a/net/netfilter/xt_set.c b/net/netfilter/xt_set.c
index 0ec8138..035960e 100644
--- a/net/netfilter/xt_set.c
+++ b/net/netfilter/xt_set.c
@@ -44,6 +44,14 @@ const struct ip_set_adt_opt n = {	\
 	.cmdflags = cfs,		\
 	.timeout = t,			\
 }
+#define ADT_MOPT(n, f, d, fs, cfs, t)	\
+struct ip_set_adt_opt n = {		\
+	.family	= f,			\
+	.dim = d,			\
+	.flags = fs,			\
+	.cmdflags = cfs,		\
+	.timeout = t,			\
+}
 
 /* Revision 0 interface: backward compatible with netfilter/iptables */
 
@@ -296,11 +304,14 @@ static unsigned int
 set_target_v2(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct xt_set_info_target_v2 *info = par->targinfo;
-	ADT_OPT(add_opt, par->family, info->add_set.dim,
-		info->add_set.flags, info->flags, info->timeout);
+	ADT_MOPT(add_opt, par->family, info->add_set.dim,
+		 info->add_set.flags, info->flags, info->timeout);
 	ADT_OPT(del_opt, par->family, info->del_set.dim,
 		info->del_set.flags, 0, UINT_MAX);
 
+	/* Normalize to fit into jiffies */
+	if (add_opt.timeout > UINT_MAX/MSEC_PER_SEC)
+		add_opt.timeout = UINT_MAX/MSEC_PER_SEC;
 	if (info->add_set.index != IPSET_INVALID_ID)
 		ip_set_add(info->add_set.index, skb, par, &add_opt);
 	if (info->del_set.index != IPSET_INVALID_ID)
-- 
1.7.10


* [PATCH net-next 7/8] net/mlx4_core: Add XRC domains and counters to resource tracker
From: Or Gerlitz @ 2012-05-14 20:04 UTC
  To: davem; +Cc: roland, netdev, Jack Morgenstein, Or Gerlitz
In-Reply-To: <1337025853-26685-1-git-send-email-ogerlitz@mellanox.com>

From: Jack Morgenstein <jackm@dev.mellanox.co.il>

Add missing resource tracking for XRC domains and complete the tracking for HCA 
network flow counters.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c          |   35 ++++-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |    4 +
 drivers/net/ethernet/mellanox/mlx4/pd.c            |   39 ++++-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |  202 +++++++++++++++++++-
 4 files changed, 275 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 2e94f76..984ace4 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1306,7 +1306,7 @@ static void mlx4_cleanup_counters_table(struct mlx4_dev *dev)
 	mlx4_bitmap_cleanup(&mlx4_priv(dev)->counters_bitmap);
 }
 
-int mlx4_counter_alloc(struct mlx4_dev *dev, u32 *idx)
+int __mlx4_counter_alloc(struct mlx4_dev *dev, u32 *idx)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
 
@@ -1319,13 +1319,44 @@ int mlx4_counter_alloc(struct mlx4_dev *dev, u32 *idx)
 
 	return 0;
 }
+
+int mlx4_counter_alloc(struct mlx4_dev *dev, u32 *idx)
+{
+	u64 out_param;
+	int err;
+
+	if (mlx4_is_mfunc(dev)) {
+		err = mlx4_cmd_imm(dev, 0, &out_param, RES_COUNTER,
+				   RES_OP_RESERVE, MLX4_CMD_ALLOC_RES,
+				   MLX4_CMD_TIME_CLASS_A, MLX4_CMD_WRAPPED);
+		if (!err)
+			*idx = get_param_l(&out_param);
+
+		return err;
+	}
+	return __mlx4_counter_alloc(dev, idx);
+}
 EXPORT_SYMBOL_GPL(mlx4_counter_alloc);
 
-void mlx4_counter_free(struct mlx4_dev *dev, u32 idx)
+void __mlx4_counter_free(struct mlx4_dev *dev, u32 idx)
 {
 	mlx4_bitmap_free(&mlx4_priv(dev)->counters_bitmap, idx);
 	return;
 }
+
+void mlx4_counter_free(struct mlx4_dev *dev, u32 idx)
+{
+	u64 in_param;
+
+	if (mlx4_is_mfunc(dev)) {
+		set_param_l(&in_param, idx);
+		mlx4_cmd(dev, in_param, RES_COUNTER, RES_OP_RESERVE,
+			 MLX4_CMD_FREE_RES, MLX4_CMD_TIME_CLASS_A,
+			 MLX4_CMD_WRAPPED);
+		return;
+	}
+	__mlx4_counter_free(dev, idx);
+}
 EXPORT_SYMBOL_GPL(mlx4_counter_free);
 
 static int mlx4_setup_hca(struct mlx4_dev *dev)
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index 8767fbf..86b6e5a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -876,6 +876,10 @@ void __mlx4_unregister_mac(struct mlx4_dev *dev, u8 port, u64 mac);
 int __mlx4_replace_mac(struct mlx4_dev *dev, u8 port, int qpn, u64 new_mac);
 int __mlx4_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt,
 		     int start_index, int npages, u64 *page_list);
+int __mlx4_counter_alloc(struct mlx4_dev *dev, u32 *idx);
+void __mlx4_counter_free(struct mlx4_dev *dev, u32 idx);
+int __mlx4_xrcd_alloc(struct mlx4_dev *dev, u32 *xrcdn);
+void __mlx4_xrcd_free(struct mlx4_dev *dev, u32 xrcdn);
 
 void mlx4_start_catas_poll(struct mlx4_dev *dev);
 void mlx4_stop_catas_poll(struct mlx4_dev *dev);
diff --git a/drivers/net/ethernet/mellanox/mlx4/pd.c b/drivers/net/ethernet/mellanox/mlx4/pd.c
index db4746d..1ac8863 100644
--- a/drivers/net/ethernet/mellanox/mlx4/pd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/pd.c
@@ -63,7 +63,7 @@ void mlx4_pd_free(struct mlx4_dev *dev, u32 pdn)
 }
 EXPORT_SYMBOL_GPL(mlx4_pd_free);
 
-int mlx4_xrcd_alloc(struct mlx4_dev *dev, u32 *xrcdn)
+int __mlx4_xrcd_alloc(struct mlx4_dev *dev, u32 *xrcdn)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
 
@@ -73,12 +73,47 @@ int mlx4_xrcd_alloc(struct mlx4_dev *dev, u32 *xrcdn)
 
 	return 0;
 }
+
+int mlx4_xrcd_alloc(struct mlx4_dev *dev, u32 *xrcdn)
+{
+	u64 out_param;
+	int err;
+
+	if (mlx4_is_mfunc(dev)) {
+		err = mlx4_cmd_imm(dev, 0, &out_param,
+				   RES_XRCD, RES_OP_RESERVE,
+				   MLX4_CMD_ALLOC_RES,
+				   MLX4_CMD_TIME_CLASS_A, MLX4_CMD_WRAPPED);
+		if (err)
+			return err;
+
+		*xrcdn = get_param_l(&out_param);
+		return 0;
+	}
+	return __mlx4_xrcd_alloc(dev, xrcdn);
+}
 EXPORT_SYMBOL_GPL(mlx4_xrcd_alloc);
 
-void mlx4_xrcd_free(struct mlx4_dev *dev, u32 xrcdn)
+void __mlx4_xrcd_free(struct mlx4_dev *dev, u32 xrcdn)
 {
 	mlx4_bitmap_free(&mlx4_priv(dev)->xrcd_bitmap, xrcdn);
 }
+
+void mlx4_xrcd_free(struct mlx4_dev *dev, u32 xrcdn)
+{
+	u64 in_param;
+	int err;
+
+	if (mlx4_is_mfunc(dev)) {
+		set_param_l(&in_param, xrcdn);
+		err = mlx4_cmd(dev, in_param, RES_XRCD,
+			       RES_OP_RESERVE, MLX4_CMD_FREE_RES,
+			       MLX4_CMD_TIME_CLASS_A, MLX4_CMD_WRAPPED);
+		if (err)
+			mlx4_warn(dev, "Failed to release xrcdn %d\n", xrcdn);
+	} else
+		__mlx4_xrcd_free(dev, xrcdn);
+}
 EXPORT_SYMBOL_GPL(mlx4_xrcd_free);
 
 int mlx4_init_pd_table(struct mlx4_dev *dev)
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index e17d759..626d1e3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -179,6 +179,16 @@ struct res_counter {
 	int			port;
 };
 
+enum res_xrcdn_states {
+	RES_XRCD_BUSY = RES_ANY_BUSY,
+	RES_XRCD_ALLOCATED,
+};
+
+struct res_xrcdn {
+	struct res_common	com;
+	int			port;
+};
+
 /* For Debug uses */
 static const char *ResourceType(enum mlx4_resource rt)
 {
@@ -191,6 +201,7 @@ static const char *ResourceType(enum mlx4_resource rt)
 	case RES_MAC: return  "RES_MAC";
 	case RES_EQ: return "RES_EQ";
 	case RES_COUNTER: return "RES_COUNTER";
+	case RES_XRCD: return "RES_XRCD";
 	default: return "Unknown resource type !!!";
 	};
 }
@@ -448,6 +459,20 @@ static struct res_common *alloc_counter_tr(int id)
 	return &ret->com;
 }
 
+static struct res_common *alloc_xrcdn_tr(int id)
+{
+	struct res_xrcdn *ret;
+
+	ret = kzalloc(sizeof *ret, GFP_KERNEL);
+	if (!ret)
+		return NULL;
+
+	ret->com.res_id = id;
+	ret->com.state = RES_XRCD_ALLOCATED;
+
+	return &ret->com;
+}
+
 static struct res_common *alloc_tr(int id, enum mlx4_resource type, int slave,
 				   int extra)
 {
@@ -478,7 +503,9 @@ static struct res_common *alloc_tr(int id, enum mlx4_resource type, int slave,
 	case RES_COUNTER:
 		ret = alloc_counter_tr(id);
 		break;
-
+	case RES_XRCD:
+		ret = alloc_xrcdn_tr(id);
+		break;
 	default:
 		return NULL;
 	}
@@ -601,6 +628,16 @@ static int remove_counter_ok(struct res_counter *res)
 	return 0;
 }
 
+static int remove_xrcdn_ok(struct res_xrcdn *res)
+{
+	if (res->com.state == RES_XRCD_BUSY)
+		return -EBUSY;
+	else if (res->com.state != RES_XRCD_ALLOCATED)
+		return -EPERM;
+
+	return 0;
+}
+
 static int remove_cq_ok(struct res_cq *res)
 {
 	if (res->com.state == RES_CQ_BUSY)
@@ -640,6 +677,8 @@ static int remove_ok(struct res_common *res, enum mlx4_resource type, int extra)
 		return remove_eq_ok((struct res_eq *)res);
 	case RES_COUNTER:
 		return remove_counter_ok((struct res_counter *)res);
+	case RES_XRCD:
+		return remove_xrcdn_ok((struct res_xrcdn *)res);
 	default:
 		return -EINVAL;
 	}
@@ -1246,6 +1285,50 @@ static int vlan_alloc_res(struct mlx4_dev *dev, int slave, int op, int cmd,
 	return 0;
 }
 
+static int counter_alloc_res(struct mlx4_dev *dev, int slave, int op, int cmd,
+			     u64 in_param, u64 *out_param)
+{
+	u32 index;
+	int err;
+
+	if (op != RES_OP_RESERVE)
+		return -EINVAL;
+
+	err = __mlx4_counter_alloc(dev, &index);
+	if (err)
+		return err;
+
+	err = add_res_range(dev, slave, index, 1, RES_COUNTER, 0);
+	if (err)
+		__mlx4_counter_free(dev, index);
+	else
+		set_param_l(out_param, index);
+
+	return err;
+}
+
+static int xrcdn_alloc_res(struct mlx4_dev *dev, int slave, int op, int cmd,
+			   u64 in_param, u64 *out_param)
+{
+	u32 xrcdn;
+	int err;
+
+	if (op != RES_OP_RESERVE)
+		return -EINVAL;
+
+	err = __mlx4_xrcd_alloc(dev, &xrcdn);
+	if (err)
+		return err;
+
+	err = add_res_range(dev, slave, xrcdn, 1, RES_XRCD, 0);
+	if (err)
+		__mlx4_xrcd_free(dev, xrcdn);
+	else
+		set_param_l(out_param, xrcdn);
+
+	return err;
+}
+
 int mlx4_ALLOC_RES_wrapper(struct mlx4_dev *dev, int slave,
 			   struct mlx4_vhcr *vhcr,
 			   struct mlx4_cmd_mailbox *inbox,
@@ -1291,6 +1374,16 @@ int mlx4_ALLOC_RES_wrapper(struct mlx4_dev *dev, int slave,
 				    vhcr->in_param, &vhcr->out_param);
 		break;
 
+	case RES_COUNTER:
+		err = counter_alloc_res(dev, slave, vhcr->op_modifier, alop,
+					vhcr->in_param, &vhcr->out_param);
+		break;
+
+	case RES_XRCD:
+		err = xrcdn_alloc_res(dev, slave, vhcr->op_modifier, alop,
+				      vhcr->in_param, &vhcr->out_param);
+		break;
+
 	default:
 		err = -EINVAL;
 		break;
@@ -1473,6 +1566,44 @@ static int vlan_free_res(struct mlx4_dev *dev, int slave, int op, int cmd,
 	return 0;
 }
 
+static int counter_free_res(struct mlx4_dev *dev, int slave, int op, int cmd,
+			    u64 in_param, u64 *out_param)
+{
+	int index;
+	int err;
+
+	if (op != RES_OP_RESERVE)
+		return -EINVAL;
+
+	index = get_param_l(&in_param);
+	err = rem_res_range(dev, slave, index, 1, RES_COUNTER, 0);
+	if (err)
+		return err;
+
+	__mlx4_counter_free(dev, index);
+
+	return err;
+}
+
+static int xrcdn_free_res(struct mlx4_dev *dev, int slave, int op, int cmd,
+			  u64 in_param, u64 *out_param)
+{
+	int xrcdn;
+	int err;
+
+	if (op != RES_OP_RESERVE)
+		return -EINVAL;
+
+	xrcdn = get_param_l(&in_param);
+	err = rem_res_range(dev, slave, xrcdn, 1, RES_XRCD, 0);
+	if (err)
+		return err;
+
+	__mlx4_xrcd_free(dev, xrcdn);
+
+	return err;
+}
+
 int mlx4_FREE_RES_wrapper(struct mlx4_dev *dev, int slave,
 			  struct mlx4_vhcr *vhcr,
 			  struct mlx4_cmd_mailbox *inbox,
@@ -1518,6 +1649,15 @@ int mlx4_FREE_RES_wrapper(struct mlx4_dev *dev, int slave,
 				   vhcr->in_param, &vhcr->out_param);
 		break;
 
+	case RES_COUNTER:
+		err = counter_free_res(dev, slave, vhcr->op_modifier, alop,
+				       vhcr->in_param, &vhcr->out_param);
+		break;
+
+	case RES_XRCD:
+		err = xrcdn_free_res(dev, slave, vhcr->op_modifier, alop,
+				     vhcr->in_param, &vhcr->out_param);
+
 	default:
 		break;
 	}
@@ -3032,6 +3172,64 @@ static void rem_slave_eqs(struct mlx4_dev *dev, int slave)
 	spin_unlock_irq(mlx4_tlock(dev));
 }
 
+static void rem_slave_counters(struct mlx4_dev *dev, int slave)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+	struct mlx4_resource_tracker *tracker = &priv->mfunc.master.res_tracker;
+	struct list_head *counter_list =
+		&tracker->slave_list[slave].res_list[RES_COUNTER];
+	struct res_counter *counter;
+	struct res_counter *tmp;
+	int err;
+	int index;
+
+	err = move_all_busy(dev, slave, RES_COUNTER);
+	if (err)
+		mlx4_warn(dev, "rem_slave_counters: Could not move all counters to "
+			  "busy for slave %d\n", slave);
+
+	spin_lock_irq(mlx4_tlock(dev));
+	list_for_each_entry_safe(counter, tmp, counter_list, com.list) {
+		if (counter->com.owner == slave) {
+			index = counter->com.res_id;
+			radix_tree_delete(&tracker->res_tree[RES_COUNTER], index);
+			list_del(&counter->com.list);
+			kfree(counter);
+			__mlx4_counter_free(dev, index);
+		}
+	}
+	spin_unlock_irq(mlx4_tlock(dev));
+}
+
+static void rem_slave_xrcdns(struct mlx4_dev *dev, int slave)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+	struct mlx4_resource_tracker *tracker = &priv->mfunc.master.res_tracker;
+	struct list_head *xrcdn_list =
+		&tracker->slave_list[slave].res_list[RES_XRCD];
+	struct res_xrcdn *xrcd;
+	struct res_xrcdn *tmp;
+	int err;
+	int xrcdn;
+
+	err = move_all_busy(dev, slave, RES_XRCD);
+	if (err)
+		mlx4_warn(dev, "rem_slave_xrcdns: Could not move all xrcdns to "
+			  "busy for slave %d\n", slave);
+
+	spin_lock_irq(mlx4_tlock(dev));
+	list_for_each_entry_safe(xrcd, tmp, xrcdn_list, com.list) {
+		if (xrcd->com.owner == slave) {
+			xrcdn = xrcd->com.res_id;
+			radix_tree_delete(&tracker->res_tree[RES_XRCD], xrcdn);
+			list_del(&xrcd->com.list);
+			kfree(xrcd);
+			__mlx4_xrcd_free(dev, xrcdn);
+		}
+	}
+	spin_unlock_irq(mlx4_tlock(dev));
+}
+
 void mlx4_delete_all_resources_for_slave(struct mlx4_dev *dev, int slave)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
@@ -3045,5 +3243,7 @@ void mlx4_delete_all_resources_for_slave(struct mlx4_dev *dev, int slave)
 	rem_slave_mrs(dev, slave);
 	rem_slave_eqs(dev, slave);
 	rem_slave_mtts(dev, slave);
+	rem_slave_counters(dev, slave);
+	rem_slave_xrcdns(dev, slave);
 	mutex_unlock(&priv->mfunc.master.res_tracker.slave_list[slave].mutex);
 }
-- 
1.7.1
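
The new `counter_alloc_res()` and `xrcdn_alloc_res()` handlers follow a reserve-then-track pattern: take an index from the bitmap, record it in the resource tracker for the slave, and give the index back if tracking fails; `rem_slave_counters()` later frees whatever a departing slave still owns. A minimal userspace sketch of that pattern, assuming a fixed-size array in place of both the bitmap and the tracker's radix tree and lists; all `toy_*` names are hypothetical, and `toy_tracker_fail` exists only to simulate `add_res_range()` failing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_NUM_COUNTERS 8
#define TOY_NO_OWNER (-1)

/* Hypothetical stand-ins: a bitmap of allocated counter indices and,
 * instead of the tracker's radix tree and lists, an owner per index. */
static bool toy_bitmap[TOY_NUM_COUNTERS];
static int toy_owner[TOY_NUM_COUNTERS];
static bool toy_tracker_fail; /* simulate add_res_range() failing */

static void toy_init(void)
{
	for (int i = 0; i < TOY_NUM_COUNTERS; i++) {
		toy_bitmap[i] = false;
		toy_owner[i] = TOY_NO_OWNER;
	}
	toy_tracker_fail = false;
}

/* Like __mlx4_counter_alloc(): take the first free index. */
static int toy_counter_alloc(unsigned int *idx)
{
	for (unsigned int i = 0; i < TOY_NUM_COUNTERS; i++) {
		if (!toy_bitmap[i]) {
			toy_bitmap[i] = true;
			*idx = i;
			return 0;
		}
	}
	return -ENOMEM;
}

static void toy_counter_free(unsigned int idx)
{
	toy_bitmap[idx] = false;
}

/* Record ownership; may fail, like add_res_range(). */
static int toy_track(int slave, unsigned int idx)
{
	if (toy_tracker_fail)
		return -ENOMEM;
	toy_owner[idx] = slave;
	return 0;
}

/* Reserve, then track; undo the reservation if tracking fails. */
static int toy_counter_alloc_res(int slave, unsigned int *out)
{
	unsigned int idx;
	int err = toy_counter_alloc(&idx);

	if (err)
		return err;
	err = toy_track(slave, idx);
	if (err)
		toy_counter_free(idx);	/* roll back the reservation */
	else
		*out = idx;
	return err;
}

/* Free everything a slave still owns, as rem_slave_counters() does. */
static void toy_rem_slave_counters(int slave)
{
	for (unsigned int i = 0; i < TOY_NUM_COUNTERS; i++) {
		if (toy_owner[i] == slave) {
			toy_owner[i] = TOY_NO_OWNER;
			toy_counter_free(i);
		}
	}
}
```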

* [PATCH net-next 6/8] net/mlx4_core: Fix potential kernel Oops in res tracker during Dom0 driver unload
From: Or Gerlitz @ 2012-05-14 20:04 UTC
  To: davem; +Cc: roland, netdev, Jack Morgenstein, Or Gerlitz
In-Reply-To: <1337025853-26685-1-git-send-email-ogerlitz@mellanox.com>

From: Jack Morgenstein <jackm@dev.mellanox.co.il>

Currently, the slave and master resources are deleted after the master
has freed all bitmaps. If any resources were not properly cleaned up
during the shutdown process, this results in an Oops.

Fix this by deleting only the slave resources during cleanup. Master
resources are cleaned up during the unload process and need not be
cleaned up separately.

Note that during cleanup, we need to split the resource-tracker freeing
functionality.

Before removing all the bitmaps, we free any leftover slave resources.
However, we can only remove the resource tracker linked list after
all bitmap frees, since some of the freeing functions (e.g.,
mlx4_cleanup_eq_table) use paravirtualized FW commands which expect
the resource tracker linked list to be present.

Found-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/cmd.c           |    2 +-
 drivers/net/ethernet/mellanox/mlx4/main.c          |    7 ++++++-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |    8 +++++++-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |   17 ++++++++++++-----
 4 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index 53b738b..1bcead1 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -1554,7 +1554,7 @@ int mlx4_multi_func_init(struct mlx4_dev *dev)
 	return 0;
 
 err_resource:
-	mlx4_free_resource_tracker(dev);
+	mlx4_free_resource_tracker(dev, RES_TR_FREE_ALL);
 err_thread:
 	flush_workqueue(priv->mfunc.master.comm_wq);
 	destroy_workqueue(priv->mfunc.master.comm_wq);
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 8eed1f2..2e94f76 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2069,6 +2069,10 @@ static void mlx4_remove_one(struct pci_dev *pdev)
 			mlx4_CLOSE_PORT(dev, p);
 		}
 
+		if (mlx4_is_master(dev))
+			mlx4_free_resource_tracker(dev,
+						   RES_TR_FREE_SLAVES_ONLY);
+
 		mlx4_cleanup_counters_table(dev);
 		mlx4_cleanup_mcg_table(dev);
 		mlx4_cleanup_qp_table(dev);
@@ -2081,7 +2085,8 @@ static void mlx4_remove_one(struct pci_dev *pdev)
 		mlx4_cleanup_pd_table(dev);
 
 		if (mlx4_is_master(dev))
-			mlx4_free_resource_tracker(dev);
+			mlx4_free_resource_tracker(dev,
+						   RES_TR_FREE_STRUCTS_ONLY);
 
 		iounmap(priv->kar);
 		mlx4_uar_free(dev, &priv->driver_uar);
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index cd56f1a..8767fbf 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -146,6 +146,11 @@ enum mlx4_alloc_mode {
 	RES_OP_MAP_ICM,
 };
 
+enum mlx4_res_tracker_free_type {
+	RES_TR_FREE_ALL,
+	RES_TR_FREE_SLAVES_ONLY,
+	RES_TR_FREE_STRUCTS_ONLY,
+};
 
 /*
  *Virtual HCR structures.
@@ -1027,7 +1032,8 @@ int mlx4_get_slave_from_resource_id(struct mlx4_dev *dev,
 void mlx4_delete_all_resources_for_slave(struct mlx4_dev *dev, int slave_id);
 int mlx4_init_resource_tracker(struct mlx4_dev *dev);
 
-void mlx4_free_resource_tracker(struct mlx4_dev *dev);
+void mlx4_free_resource_tracker(struct mlx4_dev *dev,
+				enum mlx4_res_tracker_free_type type);
 
 int mlx4_SET_PORT_wrapper(struct mlx4_dev *dev, int slave,
 			  struct mlx4_vhcr *vhcr,
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index 66df9e1..e17d759 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -224,16 +224,23 @@ int mlx4_init_resource_tracker(struct mlx4_dev *dev)
 	return 0 ;
 }
 
-void mlx4_free_resource_tracker(struct mlx4_dev *dev)
+void mlx4_free_resource_tracker(struct mlx4_dev *dev,
+				enum mlx4_res_tracker_free_type type)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	int i;
 
 	if (priv->mfunc.master.res_tracker.slave_list) {
-		for (i = 0 ; i < dev->num_slaves; i++)
-			mlx4_delete_all_resources_for_slave(dev, i);
-
-		kfree(priv->mfunc.master.res_tracker.slave_list);
+		if (type != RES_TR_FREE_STRUCTS_ONLY)
+			for (i = 0 ; i < dev->num_slaves; i++)
+				if (type == RES_TR_FREE_ALL ||
+				    dev->caps.function != i)
+					mlx4_delete_all_resources_for_slave(dev, i);
+
+		if (type != RES_TR_FREE_SLAVES_ONLY) {
+			kfree(priv->mfunc.master.res_tracker.slave_list);
+			priv->mfunc.master.res_tracker.slave_list = NULL;
+		}
 	}
 }
 
-- 
1.7.1
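
The split of `mlx4_free_resource_tracker()` into three modes can be modelled in a few lines. In `mlx4_remove_one()` the master first calls it with `RES_TR_FREE_SLAVES_ONLY` before the bitmaps are cleaned up, then with `RES_TR_FREE_STRUCTS_ONLY` afterwards. A minimal sketch, assuming a per-slave counter in place of `mlx4_delete_all_resources_for_slave()`; the `toy_*` names are hypothetical. Setting the list pointer to NULL after freeing it, as the patch does, makes a repeated call a harmless no-op.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MAX_SLAVES 8

/* Hypothetical mirror of enum mlx4_res_tracker_free_type. */
enum toy_free_type {
	TOY_FREE_ALL,
	TOY_FREE_SLAVES_ONLY,
	TOY_FREE_STRUCTS_ONLY,
};

struct toy_tracker {
	int *slave_list;             /* stands in for res_tracker.slave_list */
	int num_slaves;              /* must not exceed TOY_MAX_SLAVES */
	int master_func;             /* stands in for dev->caps.function */
	int deleted[TOY_MAX_SLAVES]; /* deletions performed per slave */
};

/* Stand-in for mlx4_delete_all_resources_for_slave(). */
static void toy_delete_slave(struct toy_tracker *t, int slave)
{
	t->deleted[slave]++;
}

static void toy_free_tracker(struct toy_tracker *t, enum toy_free_type type)
{
	if (!t->slave_list)
		return;

	if (type != TOY_FREE_STRUCTS_ONLY)
		for (int i = 0; i < t->num_slaves; i++)
			/* SLAVES_ONLY skips the master's own function. */
			if (type == TOY_FREE_ALL || i != t->master_func)
				toy_delete_slave(t, i);

	if (type != TOY_FREE_SLAVES_ONLY) {
		free(t->slave_list);
		t->slave_list = NULL; /* later calls become no-ops */
	}
}
```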

* [PATCH net-next 8/8] net/mlx4_core: Fixed error flow in rem_slave_eqs
From: Or Gerlitz @ 2012-05-14 20:04 UTC
  To: davem; +Cc: roland, netdev, Jack Morgenstein, Or Gerlitz
In-Reply-To: <1337025853-26685-1-git-send-email-ogerlitz@mellanox.com>

From: Jack Morgenstein <jackm@dev.mellanox.co.il>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |   13 ++++++-------
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index 626d1e3..2650f23 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -3152,14 +3152,13 @@ static void rem_slave_eqs(struct mlx4_dev *dev, int slave)
 							   MLX4_CMD_HW2SW_EQ,
 							   MLX4_CMD_TIME_CLASS_A,
 							   MLX4_CMD_NATIVE);
-					mlx4_dbg(dev, "rem_slave_eqs: failed"
-						 " to move slave %d eqs %d to"
-						 " SW ownership\n", slave, eqn);
+					if (err)
+						mlx4_dbg(dev, "rem_slave_eqs: failed"
+							 " to move slave %d eqs %d to"
+							 " SW ownership\n", slave, eqn);
 					mlx4_free_cmd_mailbox(dev, mailbox);
-					if (!err) {
-						atomic_dec(&eq->mtt->ref_count);
-						state = RES_EQ_RESERVED;
-					}
+					atomic_dec(&eq->mtt->ref_count);
+					state = RES_EQ_RESERVED;
 					break;
 
 				default:
-- 
1.7.1
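
This patch carries no changelog text, so here is one reading of the diff: before the change, the debug message was printed even when HW2SW_EQ succeeded, and on failure the EQ's MTT reference and state were left untouched; after it, the failure is logged and the local state is released unconditionally. A minimal sketch of that cleanup step, assuming a simplified state loop; the `toy_*` names, states, and the simulated failure flag are hypothetical and are not the resource_tracker.c code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical, simplified EQ teardown states (0 means done). */
enum toy_eq_state { TOY_EQ_DONE = 0, TOY_EQ_RESERVED, TOY_EQ_HW };

struct toy_eq {
	int mtt_refcount; /* stands in for eq->mtt->ref_count */
	int hw2sw_fails;  /* nonzero: simulate HW2SW_EQ failing */
};

/* Stand-in for the MLX4_CMD_HW2SW_EQ firmware command. */
static int toy_hw2sw(const struct toy_eq *eq)
{
	return eq->hw2sw_fails ? -EIO : 0;
}

/* Tear down one EQ in HW ownership; returns loop iterations used. */
static int toy_rem_eq(struct toy_eq *eq)
{
	enum toy_eq_state state = TOY_EQ_HW;
	int steps = 0;

	while (state != TOY_EQ_DONE) {
		steps++;
		switch (state) {
		case TOY_EQ_HW:
			/* Log only on failure... */
			if (toy_hw2sw(eq))
				fprintf(stderr, "failed to move eq to SW ownership\n");
			/* ...but release local state either way, so the
			 * cleanup always advances to the next state. */
			eq->mtt_refcount--;
			state = TOY_EQ_RESERVED;
			break;
		case TOY_EQ_RESERVED:
			/* Remaining software-side cleanup would go here. */
			state = TOY_EQ_DONE;
			break;
		default:
			state = TOY_EQ_DONE;
			break;
		}
	}
	return steps;
}
```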

* [PATCH net-next 0/8] batch of mlx4 driver fixes
From: Or Gerlitz @ 2012-05-14 20:04 UTC
  To: davem; +Cc: roland, netdev, Or Gerlitz, Jack Morgenstein

Hi Dave, 

This is a batch of relatively small fixes for the mlx4_core driver. Except for
one cleanup patch from me, all the patches are from Jack Morgenstein, who
leads our SRIOV development efforts, and they all relate to the SRIOV
functionality. Since Yevgeny is mostly out this week, and both of us run the
internal reviews for upstream patches, he delegated this submission to me;
I hope that is fine with you.

Or.

Jack Morgenstein (7):
  net/mlx4_core: Fix init_port mask state for slaves
  net/mlx4_core: Change SYNC_TPT to be native (not wrapped)
  net/mlx4_core: Remove unused *_str functions from the resource tracker
  net/mlx4_core: Do not reset module-parameter num_vfs when fail to enable sriov
  net/mlx4_core: Fix potential kernel Oops in res tracker during Dom0 driver unload
  net/mlx4_core: Add XRC domains and counters to resource tracker
  net/mlx4_core: Fixed error flow in rem_slave_eqs

Or Gerlitz (1):
  net/mlx4: Address build warnings on set but not used variables

 drivers/net/ethernet/mellanox/mlx4/cmd.c           |    7 +-
 drivers/net/ethernet/mellanox/mlx4/fw.c            |    3 +-
 drivers/net/ethernet/mellanox/mlx4/main.c          |   47 +++-
 drivers/net/ethernet/mellanox/mlx4/mcg.c           |    2 -
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |   12 +-
 drivers/net/ethernet/mellanox/mlx4/mr.c            |    5 +-
 drivers/net/ethernet/mellanox/mlx4/pd.c            |   39 +++-
 drivers/net/ethernet/mellanox/mlx4/port.c          |    3 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |  269 ++++++++++++++++----
 9 files changed, 316 insertions(+), 71 deletions(-)

Cc: Jack Morgenstein <jackm@dev.mellanox.co.il>

* [PATCH net-next 1/8] net/mlx4: Address build warnings on set but not used variables
From: Or Gerlitz @ 2012-05-14 20:04 UTC
  To: davem; +Cc: roland, netdev, Or Gerlitz
In-Reply-To: <1337025853-26685-1-git-send-email-ogerlitz@mellanox.com>

Handle the compiler warnings on variables which are set but not used
by removing the relevant variable or casting a return value which is
ignored on purpose to void.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/cmd.c           |    5 +----
 drivers/net/ethernet/mellanox/mlx4/mcg.c           |    2 --
 drivers/net/ethernet/mellanox/mlx4/mr.c            |    3 ---
 drivers/net/ethernet/mellanox/mlx4/port.c          |    3 +--
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |    7 +++----
 5 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index 773c70e..53b738b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -1254,7 +1254,6 @@ static void mlx4_master_do_cmd(struct mlx4_dev *dev, int slave, u8 cmd,
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	struct mlx4_slave_state *slave_state = priv->mfunc.master.slave_state;
 	u32 reply;
-	u32 slave_status = 0;
 	u8 is_going_down = 0;
 	int i;
 
@@ -1274,10 +1273,8 @@ static void mlx4_master_do_cmd(struct mlx4_dev *dev, int slave, u8 cmd,
 		}
 		/*check if we are in the middle of FLR process,
 		if so return "retry" status to the slave*/
-		if (MLX4_COMM_CMD_FLR == slave_state[slave].last_cmd) {
-			slave_status = MLX4_DELAY_RESET_SLAVE;
+		if (MLX4_COMM_CMD_FLR == slave_state[slave].last_cmd)
 			goto inform_slave_state;
-		}
 
 		/* write the version in the event field */
 		reply |= mlx4_comm_get_version();
diff --git a/drivers/net/ethernet/mellanox/mlx4/mcg.c b/drivers/net/ethernet/mellanox/mlx4/mcg.c
index 4799e82..f4a8f98 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mcg.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mcg.c
@@ -357,7 +357,6 @@ static int add_promisc_qp(struct mlx4_dev *dev, u8 port,
 	u32 prot;
 	int i;
 	bool found;
-	int last_index;
 	int err;
 	struct mlx4_priv *priv = mlx4_priv(dev);
 
@@ -419,7 +418,6 @@ static int add_promisc_qp(struct mlx4_dev *dev, u8 port,
 			if (err)
 				goto out_mailbox;
 		}
-		last_index = entry->index;
 	}
 
 	/* add the new qpn to list of promisc qps */
diff --git a/drivers/net/ethernet/mellanox/mlx4/mr.c b/drivers/net/ethernet/mellanox/mlx4/mr.c
index fe2ac84..cefa76f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mr.c
@@ -788,7 +788,6 @@ int mlx4_fmr_alloc(struct mlx4_dev *dev, u32 pd, u32 access, int max_pages,
 		   int max_maps, u8 page_shift, struct mlx4_fmr *fmr)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
-	u64 mtt_offset;
 	int err = -ENOMEM;
 
 	if (max_maps > dev->caps.max_fmr_maps)
@@ -811,8 +810,6 @@ int mlx4_fmr_alloc(struct mlx4_dev *dev, u32 pd, u32 access, int max_pages,
 	if (err)
 		return err;
 
-	mtt_offset = fmr->mr.mtt.offset * dev->caps.mtt_entry_sz;
-
 	fmr->mtts = mlx4_table_find(&priv->mr_table.mtt_table,
 				    fmr->mr.mtt.offset,
 				    &fmr->dma_handle);
diff --git a/drivers/net/ethernet/mellanox/mlx4/port.c b/drivers/net/ethernet/mellanox/mlx4/port.c
index 55b12e6..70ef4f1 100644
--- a/drivers/net/ethernet/mellanox/mlx4/port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/port.c
@@ -338,11 +338,10 @@ EXPORT_SYMBOL_GPL(__mlx4_unregister_mac);
 void mlx4_unregister_mac(struct mlx4_dev *dev, u8 port, u64 mac)
 {
 	u64 out_param;
-	int err;
 
 	if (mlx4_is_mfunc(dev)) {
 		set_param_l(&out_param, port);
-		err = mlx4_cmd_imm(dev, mac, &out_param, RES_MAC,
+		(void) mlx4_cmd_imm(dev, mac, &out_param, RES_MAC,
 				   RES_OP_RESERVE_AND_MAP, MLX4_CMD_FREE_RES,
 				   MLX4_CMD_TIME_CLASS_A, MLX4_CMD_WRAPPED);
 		return;
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index 8752e6e..d8970cc 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -2536,7 +2536,7 @@ int mlx4_QP_ATTACH_wrapper(struct mlx4_dev *dev, int slave,
 	struct mlx4_qp qp; /* dummy for calling attach/detach */
 	u8 *gid = inbox->buf;
 	enum mlx4_protocol prot = (vhcr->in_modifier >> 28) & 0x7;
-	int err, err1;
+	int err;
 	int qpn;
 	struct res_qp *rqp;
 	int attach = vhcr->op_modifier;
@@ -2571,7 +2571,7 @@ int mlx4_QP_ATTACH_wrapper(struct mlx4_dev *dev, int slave,
 
 ex_rem:
 	/* ignore error return below, already in error */
-	err1 = rem_mcg_res(dev, slave, rqp, gid, prot, type);
+	(void) rem_mcg_res(dev, slave, rqp, gid, prot, type);
 ex_put:
 	put_res(dev, slave, qpn, RES_QP);
 
@@ -2604,12 +2604,11 @@ static void detach_qp(struct mlx4_dev *dev, int slave, struct res_qp *rqp)
 {
 	struct res_gid *rgid;
 	struct res_gid *tmp;
-	int err;
 	struct mlx4_qp qp; /* dummy for calling attach/detach */
 
 	list_for_each_entry_safe(rgid, tmp, &rqp->mcg_list, list) {
 		qp.qpn = rqp->local_qpn;
-		err = mlx4_qp_detach_common(dev, &qp, rgid->gid, rgid->prot,
+		(void) mlx4_qp_detach_common(dev, &qp, rgid->gid, rgid->prot,
 					    rgid->steer);
 		list_del(&rgid->list);
 		kfree(rgid);
-- 
1.7.1

* [PATCH net-next 2/8] net/mlx4_core: Fix init_port mask state for slaves
From: Or Gerlitz @ 2012-05-14 20:04 UTC (permalink / raw)
  To: davem; +Cc: roland, netdev, Jack Morgenstein, Or Gerlitz
In-Reply-To: <1337025853-26685-1-git-send-email-ogerlitz@mellanox.com>

From: Jack Morgenstein <jackm@dev.mellanox.co.il>

In function mlx4_INIT_PORT_wrapper, the port state mask for the
slave is only set if we are invoking the INIT_PORT fw command.

However, the reference count for the (initialized) port is
incremented anyway.

This creates a problem when there are multiple slaves: the
CLOSE_PORT command will never be invoked. The reason is that
in the CLOSE_PORT wrapper, if the port-state mask is zero for
the slave (which it is), the wrapper returns without doing
anything, so the references taken by those slaves are never
released. The only slave which will not return immediately in
the CLOSE_PORT wrapper is the slave for which INIT_PORT was
invoked.

The fix is to not have the port-state mask setting depend
on the logic for calling the INIT_PORT fw command.
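A simplified userspace model of the bookkeeping described above may make the failure easier to see. It is a sketch, not the mlx4 code. The names `master_state`, `init_port()` and `close_port()` are hypothetical, the firmware commands are reduced to counters, and the close path is assumed to issue CLOSE_PORT only when the last reference is dropped. With the mask set unconditionally (the behaviour after this patch), two slaves sharing a port produce exactly one INIT_PORT and one CLOSE_PORT.

```c
#include <assert.h>

#define MAX_SLAVES 8
#define MAX_PORTS  3

/* Hypothetical, simplified model of priv->mfunc.master state. */
struct master_state {
	unsigned int init_port_mask[MAX_SLAVES]; /* per-slave bitmask of open ports */
	int init_port_ref[MAX_PORTS];            /* number of slaves using each port */
	int fw_init_calls;                       /* INIT_PORT firmware commands issued */
	int fw_close_calls;                      /* CLOSE_PORT firmware commands issued */
};

static void init_port(struct master_state *m, int slave, int port)
{
	if (m->init_port_mask[slave] & (1u << port))
		return;                 /* this slave already holds a reference */
	if (m->init_port_ref[port] == 0)
		m->fw_init_calls++;     /* first user: issue INIT_PORT */
	/* Fixed behaviour: record the port for every slave, not only for
	 * the slave that triggered the firmware command. */
	m->init_port_mask[slave] |= 1u << port;
	++m->init_port_ref[port];
}

static void close_port(struct master_state *m, int slave, int port)
{
	if (!(m->init_port_mask[slave] & (1u << port)))
		return;                 /* slave never opened the port: nothing to do */
	assert(m->init_port_ref[port] > 0);
	if (m->init_port_ref[port] == 1)
		m->fw_close_calls++;    /* last user: issue CLOSE_PORT */
	m->init_port_mask[slave] &= ~(1u << port);
	--m->init_port_ref[port];
}
```

If the mask update is moved back inside the `init_port_ref[port] == 0` branch, the second slave's close_port() returns early, its reference is never released, and fw_close_calls stays at zero.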

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/fw.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 2a02ba5..24429a9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -1164,9 +1164,8 @@ int mlx4_INIT_PORT_wrapper(struct mlx4_dev *dev, int slave,
 			       MLX4_CMD_TIME_CLASS_A, MLX4_CMD_NATIVE);
 		if (err)
 			return err;
-		priv->mfunc.master.slave_state[slave].init_port_mask |=
-			(1 << port);
 	}
+	priv->mfunc.master.slave_state[slave].init_port_mask |= (1 << port);
 	++priv->mfunc.master.init_port_ref[port];
 	return 0;
 }
-- 
1.7.1

* [PATCH net-next 3/8] net/mlx4_core: Change SYNC_TPT to be native (not wrapped)
From: Or Gerlitz @ 2012-05-14 20:04 UTC (permalink / raw)
  To: davem; +Cc: roland, netdev, Jack Morgenstein, Or Gerlitz
In-Reply-To: <1337025853-26685-1-git-send-email-ogerlitz@mellanox.com>

From: Jack Morgenstein <jackm@dev.mellanox.co.il>

Marking SYNC_TPT as "wrapped" was incorrect, since no wrapper function is defined for it.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/mr.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mr.c b/drivers/net/ethernet/mellanox/mlx4/mr.c
index cefa76f..af55b7c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mr.c
@@ -892,6 +892,6 @@ EXPORT_SYMBOL_GPL(mlx4_fmr_free);
 int mlx4_SYNC_TPT(struct mlx4_dev *dev)
 {
 	return mlx4_cmd(dev, 0, 0, 0, MLX4_CMD_SYNC_TPT, 1000,
-			MLX4_CMD_WRAPPED);
+			MLX4_CMD_NATIVE);
 }
 EXPORT_SYMBOL_GPL(mlx4_SYNC_TPT);
-- 
1.7.1

* [PATCH net-next 4/8] net/mlx4_core: Remove unused *_str functions from the resource tracker
From: Or Gerlitz @ 2012-05-14 20:04 UTC (permalink / raw)
  To: davem; +Cc: roland, netdev, Jack Morgenstein, Or Gerlitz
In-Reply-To: <1337025853-26685-1-git-send-email-ogerlitz@mellanox.com>

From: Jack Morgenstein <jackm@dev.mellanox.co.il>

Remove the unused *_str helper functions from resource_tracker.c.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |   30 --------------------
 1 files changed, 0 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index d8970cc..66df9e1 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -89,17 +89,6 @@ enum res_qp_states {
 	RES_QP_HW
 };
 
-static inline const char *qp_states_str(enum res_qp_states state)
-{
-	switch (state) {
-	case RES_QP_BUSY: return "RES_QP_BUSY";
-	case RES_QP_RESERVED: return "RES_QP_RESERVED";
-	case RES_QP_MAPPED: return "RES_QP_MAPPED";
-	case RES_QP_HW: return "RES_QP_HW";
-	default: return "Unknown";
-	}
-}
-
 struct res_qp {
 	struct res_common	com;
 	struct res_mtt	       *mtt;
@@ -173,16 +162,6 @@ enum res_srq_states {
 	RES_SRQ_HW,
 };
 
-static inline const char *srq_states_str(enum res_srq_states state)
-{
-	switch (state) {
-	case RES_SRQ_BUSY: return "RES_SRQ_BUSY";
-	case RES_SRQ_ALLOCATED: return "RES_SRQ_ALLOCATED";
-	case RES_SRQ_HW: return "RES_SRQ_HW";
-	default: return "Unknown";
-	}
-}
-
 struct res_srq {
 	struct res_common	com;
 	struct res_mtt	       *mtt;
@@ -195,15 +174,6 @@ enum res_counter_states {
 	RES_COUNTER_ALLOCATED,
 };
 
-static inline const char *counter_states_str(enum res_counter_states state)
-{
-	switch (state) {
-	case RES_COUNTER_BUSY: return "RES_COUNTER_BUSY";
-	case RES_COUNTER_ALLOCATED: return "RES_COUNTER_ALLOCATED";
-	default: return "Unknown";
-	}
-}
-
 struct res_counter {
 	struct res_common	com;
 	int			port;
-- 
1.7.1

* [PATCH net-next 5/8] net/mlx4_core: Do not reset module-parameter num_vfs when fail to enable sriov
From: Or Gerlitz @ 2012-05-14 20:04 UTC (permalink / raw)
  To: davem; +Cc: roland, netdev, Jack Morgenstein, Or Gerlitz
In-Reply-To: <1337025853-26685-1-git-send-email-ogerlitz@mellanox.com>

From: Jack Morgenstein <jackm@dev.mellanox.co.il>

Consider the following scenario: two HCAs, only one of which can run SRIOV.

If we reset the module parameter, all the VFs of the SRIOV HCA will be
claimed by the PPF host: the code relies on num_vfs being non-zero to
avoid this claiming, and num_vfs was reset when pci_enable_sriov failed
for the non-SRIOV HCA.

The solution is not to touch the num_vfs parameter.

Also, eliminate the unneeded check of num_vfs when disabling sriov
(the dev flag bit is sufficient).
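The interaction can be modelled in a few lines, under loudly stated assumptions. `num_vfs` stands in for the module parameter shared by every probed HCA. The non-SRIOV HCA is assumed to be probed first. "Claiming" the VFs is reduced to the hypothetical predicate `host_claims_vfs()`, which follows the description above that the code relies on num_vfs being non-zero. `probe_pf()` and `must_disable_sriov()` are likewise illustrative names, not mlx4 functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Stands in for the num_vfs module parameter, shared by every probed HCA. */
static int num_vfs = 4;

struct hca {
	bool can_sriov;     /* whether pci_enable_sriov() would succeed */
	bool sriov_enabled; /* models the MLX4_FLAG_SRIOV dev flag */
};

/* Hypothetical PF probe; reset_on_failure mimics the removed "num_vfs = 0;". */
static void probe_pf(struct hca *h, bool reset_on_failure)
{
	h->sriov_enabled = false;
	if (!num_vfs)
		return;
	if (!h->can_sriov) {
		/* "Failed to enable sriov, continuing without sriov enabled" */
		if (reset_on_failure)
			num_vfs = 0;
		return;
	}
	h->sriov_enabled = true;
}

/* Hypothetical: the host leaves the VFs alone only while num_vfs is non-zero. */
static bool host_claims_vfs(void)
{
	return num_vfs == 0;
}

/* Teardown needs only the per-device flag, not the global parameter. */
static bool must_disable_sriov(const struct hca *h)
{
	return h->sriov_enabled;
}
```

With the old reset, one failing HCA silently changes the behaviour of every HCA probed after it, because the parameter is global while the failure is per device.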

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 8bb05b4..8eed1f2 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1865,7 +1865,6 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 				mlx4_err(dev, "Failed to enable sriov,"
 					 "continuing without sriov enabled"
 					 " (err = %d).\n", err);
-				num_vfs = 0;
 				err = 0;
 			} else {
 				mlx4_warn(dev, "Running in master mode\n");
@@ -2022,7 +2021,7 @@ err_cmd:
 	mlx4_cmd_cleanup(dev);
 
 err_sriov:
-	if (num_vfs && (dev->flags & MLX4_FLAG_SRIOV))
+	if (dev->flags & MLX4_FLAG_SRIOV)
 		pci_disable_sriov(pdev);
 
 err_rel_own:
@@ -2099,7 +2098,7 @@ static void mlx4_remove_one(struct pci_dev *pdev)
 
 		if (dev->flags & MLX4_FLAG_MSI_X)
 			pci_disable_msix(pdev);
-		if (num_vfs && (dev->flags & MLX4_FLAG_SRIOV)) {
+		if (dev->flags & MLX4_FLAG_SRIOV) {
 			mlx4_warn(dev, "Disabling sriov\n");
 			pci_disable_sriov(pdev);
 		}
-- 
1.7.1

* pull request: wireless-next 2012-05-14
From: John W. Linville @ 2012-05-14 19:55 UTC (permalink / raw)
  To: davem; +Cc: linux-wireless, netdev

commit 341352d13dae752610342923c53ebe461624ee2c

Dave,

This is essentially the same batch of updates for 3.5 that I originally
offered ~1.5 weeks ago.  The difference is that I rebased the tree
both to drop the offending bluetooth-next pull and to base on top of
a more recent net-next in order to resolve the continuing NLA_PUT_*
issues affecting linux-next.  I also took the opportunity to resolve
a little merge damage resulting from a recent pull of the net tree
into net-next.

Other highlights of this pull request include some HT enhancements
for mac80211, an expansion of the ethtool support for cfg80211-
and mac80211-based drivers, and some more iwlwifi refactoring.

Please let me know if there are problems!

John

---

The following changes since commit 9bb862beb6e5839e92f709d33fda07678f062f20:

  Merge branch 'master' of git://1984.lsi.us.es/net-next (2012-05-08 14:40:21 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next.git master

Amitkumar Karwar (1):
      mwifiex: fix static checker warnings

Anisse Astier (2):
      rt2x00: debugfs support - allow a register to be empty
      rt2x00: Add debugfs access for rfcsr register

Ashok Nagarajan (4):
      mac80211: Advertise HT protection mode in IEs
      mac80211: Implement HT mixed protection mode
      mac80211: Allow nonHT/HT peering in mesh
      {nl,cfg,mac}80211: Allow user to see/configure HT protection mode

Ben Greear (4):
      cfg80211: Add framework to support ethtool stats.
      mac80211: Support getting sta_info stats via ethtool.
      mac80211: Framework to get wifi-driver stats via ethtool.
      mac80211: Add more ethtools stats: survey, rates, etc

Ben Hutchings (2):
      ipw2200: Fix order of device registration
      ipw2100: Fix order of device registration

Dan Carpenter (1):
      wireless: at76c50x: allocating too much data

Emmanuel Grumbach (3):
      iwlwifi: use IWL_* instead of dev_printk when possible
      iwlwifi: don't init trans->reg_lock from the op_mode
      cfg80211: fix BSS comparison

Franky Lin (4):
      brcmfmac: stop releasing sdio host in irq handler
      brcmfmac: check bus state for status
      brcmfmac: postpone interrupt register function
      brcmfmac: add out of band interrupt support

John W. Linville (1):
      iwlwifi: fix-up some merge damage from commit 0d6c4a2

Luis R. Rodriguez (1):
      libertas: include sched.h on firmware.c

Meenakshi Venkataraman (1):
      iwlwifi: use correct released ucode version

Rajkumar Manoharan (1):
      mac80211: fix rate control update on 2040 bss change

Stanislav Yakovlev (1):
      net/wireless: ipw2200: Fix WARN_ON occurring in wiphy_register called by ipw_pci_probe

Stanislaw Gruszka (1):
      iwlwifi: add option to disable 5GHz band

Thomas Pedersen (2):
      mac80211: insert mesh peer after init
      mac80211: don't transmit 40MHz frames to 20MHz peer

WarheadsSE (1):
      mwifiex: add support for SD8786 sdio

Wey-Yi Guy (11):
      iwlwifi: remove unused macros
      iwlwifi: add BT reduced tx power flag
      iwlwifi: add checking for the condition to reduce tx power
      iwlwifi: add reduced tx power threshold define
      iwlwifi: small define change
      iwlwifi: send reduce tx power info in command
      iwlwifi: change kill mask based on reduce power state
      iwlwifi: add loose coex lut
      iwlwifi: modify #ifdef to avoid sparse complain
      iwlwifi: remove the iwl_shared reference
      iwlwifi: use 6000G2B for 6030 device series

 drivers/net/wireless/at76c50x-usb.c                |    4 +-
 drivers/net/wireless/brcm80211/Kconfig             |    9 +
 drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c   |   97 ++++++++++-
 .../net/wireless/brcm80211/brcmfmac/bcmsdh_sdmmc.c |  105 +++++++++++-
 drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c |   39 +++--
 .../net/wireless/brcm80211/brcmfmac/sdio_host.h    |   22 ++-
 drivers/net/wireless/ipw2x00/ipw2100.c             |   24 ++--
 drivers/net/wireless/ipw2x00/ipw2200.c             |   44 ++---
 drivers/net/wireless/iwlwifi/iwl-agn-lib.c         |  153 ++++++++---------
 drivers/net/wireless/iwlwifi/iwl-agn-rx.c          |    3 +-
 drivers/net/wireless/iwlwifi/iwl-agn.c             |   38 ++--
 drivers/net/wireless/iwlwifi/iwl-agn.h             |    2 +-
 drivers/net/wireless/iwlwifi/iwl-commands.h        |   21 ++-
 drivers/net/wireless/iwlwifi/iwl-dev.h             |    1 +
 drivers/net/wireless/iwlwifi/iwl-drv.c             |   12 +-
 drivers/net/wireless/iwlwifi/iwl-modparams.h       |    8 +-
 drivers/net/wireless/iwlwifi/iwl-trans-pcie.c      |    1 +
 drivers/net/wireless/libertas/firmware.c           |    1 +
 drivers/net/wireless/mwifiex/Kconfig               |    4 +-
 drivers/net/wireless/mwifiex/fw.h                  |    3 +-
 drivers/net/wireless/mwifiex/sdio.c                |    7 +
 drivers/net/wireless/mwifiex/sdio.h                |    1 +
 drivers/net/wireless/rt2x00/rt2800.h               |    2 +
 drivers/net/wireless/rt2x00/rt2800lib.c            |    7 +
 drivers/net/wireless/rt2x00/rt2x00debug.c          |   82 +++++----
 drivers/net/wireless/rt2x00/rt2x00debug.h          |    1 +
 include/linux/nl80211.h                            |    3 +
 include/net/cfg80211.h                             |   18 ++
 include/net/mac80211.h                             |   17 ++
 net/mac80211/cfg.c                                 |  182 ++++++++++++++++++++
 net/mac80211/driver-ops.h                          |   37 ++++
 net/mac80211/driver-trace.h                        |   15 ++
 net/mac80211/ibss.c                                |    2 +-
 net/mac80211/ieee80211_i.h                         |    3 +-
 net/mac80211/mesh.c                                |   18 ++-
 net/mac80211/mesh_plink.c                          |   96 ++++++++++-
 net/mac80211/mlme.c                                |    2 +-
 net/mac80211/sta_info.h                            |    1 +
 net/mac80211/util.c                                |    9 +-
 net/wireless/ethtool.c                             |   29 +++
 net/wireless/mesh.c                                |    1 +
 net/wireless/nl80211.c                             |    7 +-
 net/wireless/scan.c                                |    6 +-
 43 files changed, 893 insertions(+), 244 deletions(-)
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

* [PATCH -next] net/codel: Add missing #include <linux/prefetch.h>
From: Geert Uytterhoeven @ 2012-05-14 19:47 UTC (permalink / raw)
  To: Eric Dumazet, Dave Taht, David S. Miller
  Cc: netdev, linux-kernel, linux-next, Geert Uytterhoeven

m68k allmodconfig:

net/sched/sch_codel.c: In function ‘dequeue’:
net/sched/sch_codel.c:70: error: implicit declaration of function ‘prefetch’
make[1]: *** [net/sched/sch_codel.o] Error 1

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
---
http://kisskb.ellerman.id.au/kisskb/buildresult/6320394/

 net/sched/sch_codel.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c
index b4a1a81..213ef60 100644
--- a/net/sched/sch_codel.c
+++ b/net/sched/sch_codel.c
@@ -46,6 +46,7 @@
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/skbuff.h>
+#include <linux/prefetch.h>
 #include <net/pkt_sched.h>
 #include <net/codel.h>
 
-- 
1.7.0.4

* Re[2]:  [PATCH] netfilter: xt_HMARK: endian bugs
From: Hans Schillstrom @ 2012-05-14 19:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Hans Schillstrom,
	Jan Engelhardt, kaber@trash.net, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	dan.carpenter@oracle.com




>---- Original Message ----
>From: Eric Dumazet <eric.dumazet@gmail.com>
>To: "Pablo Neira Ayuso" <pablo@netfilter.org>
>Cc: "Jozsef Kadlecsik" <kadlec@blackhole.kfki.hu>, "Hans Schillstrom" <hans.schillstrom@ericsson.com>, "Jan Engelhardt" <jengelh@inai.de>, "kaber@trash.net" <kaber@trash.net>, "jengelh@medozas.de" <jengelh@medozas.de>, "netfilter-devel@vger.kernel.org" <netfilter-devel@vger.kernel.org>, "netdev@vger.kernel.org" <netdev@vger.kernel.org>, "dan.carpenter@oracle.com" <dan.carpenter@oracle.com>, "hans@schillstrom.com" <hans@schillstrom.com>
>Sent: Mon, May 14, 2012, 9:14 PM
>Subject: Re: [PATCH] netfilter: xt_HMARK: endian bugs
>
>On Mon, 2012-05-14 at 21:02 +0200, Pablo Neira Ayuso wrote:
>
>> IIRC, Hans wants the calculated hash mark to be the same on every
>> system in a cluster composed of systems with different endianness,
>> so that the distribution is consistent regardless of endianness.
>
>Then jhash() must be audited to make sure its output is OK with this
>requirement.

I'm preparing tests on mips & ppc right now to verify that, as a first step.
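The concern can be illustrated with a small sketch, assuming the hash input is a 32-bit field taken from packet bytes. `toy_hash()` is a made-up bijective mixer, not jhash, and the two load functions simulate a native 32-bit load on a little-endian and on a big-endian host. Hashing the raw native value gives host-dependent results; converting to one fixed byte order first (which is what ntohl() yields on either host) gives the same hash input everywhere. Whether jhash itself is endian-neutral for its inputs is exactly what the audit mentioned above would have to confirm.

```c
#include <assert.h>
#include <stdint.h>

/* Native 32-bit load of packet bytes as a little-endian host sees it. */
static uint32_t load_on_le_host(const uint8_t b[4])
{
	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* The same load as a big-endian host sees it. */
static uint32_t load_on_be_host(const uint8_t b[4])
{
	return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 |
	       (uint32_t)b[2] << 8 | (uint32_t)b[3];
}

/* Byte swap: on a little-endian host this is what ntohl() does. */
static uint32_t swap32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Toy bijective mixer standing in for a real hash (NOT jhash). */
static uint32_t toy_hash(uint32_t x)
{
	x ^= x >> 16;
	x *= 0x45d9f3bu;
	x ^= x >> 16;
	return x;
}
```

On a big-endian host the normalization is a no-op, so both hosts end up hashing the value returned by load_on_be_host().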

 


* Re: [PATCH RFC] tun: experimental zero copy tx support
From: Michael S. Tsirkin @ 2012-05-14 19:31 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stephen Hemminger, David S. Miller, Joe Perches, Jason Wang,
	netdev, linux-kernel, Ian.Campbell, kvm
In-Reply-To: <1337023307.8512.617.camel@edumazet-glaptop>

On Mon, May 14, 2012 at 09:21:47PM +0200, Eric Dumazet wrote:
> On Mon, 2012-05-14 at 22:14 +0300, Michael S. Tsirkin wrote:
> 
> > They seem to be in net-next, or did I miss something?
> 
> Yes, you re-introduce in this patch some bugs already fixed in macvtap

Could explain why I see some problems in testing :)
Maybe common code should go into net/core?
I couldn't decide whether the increase in kernel
size is worth it.

-- 
MST

* [PATCH] net/bridge/netfilter: Fix the randconfig warning
From: Devendra Naga @ 2012-05-14 19:30 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Patrick McHardy, Stephen Hemminger,
	David S. Miller, netfilter-devel, netfilter, coreteam, bridge,
	netdev
  Cc: Devendra Naga

When running make randconfig, I got:

warning: (BRIDGE_NF_EBTABLES) selects NETFILTER_XTABLES which has unmet direct dependencies (NET && INET && NETFILTER)

Add the missing NET and INET dependencies to BRIDGE_NF_EBTABLES (it
already depends on NETFILTER), so that selecting NETFILTER_XTABLES
never violates its dependencies.

Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
---
 net/bridge/netfilter/Kconfig |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bridge/netfilter/Kconfig b/net/bridge/netfilter/Kconfig
index a9aff9c..34f6d9d 100644
--- a/net/bridge/netfilter/Kconfig
+++ b/net/bridge/netfilter/Kconfig
@@ -4,7 +4,7 @@
 
 menuconfig BRIDGE_NF_EBTABLES
 	tristate "Ethernet Bridge tables (ebtables) support"
-	depends on BRIDGE && NETFILTER
+	depends on NET && INET && BRIDGE && NETFILTER
 	select NETFILTER_XTABLES
 	help
 	  ebtables is a general, extensible frame/packet identification
-- 
1.7.9.5


* Re: [PATCH net-next 0/2] extend sch_mqprio to distribute traffic not only by ETS TC
From: Amir Vadai @ 2012-05-14 19:24 UTC (permalink / raw)
  To: John Fastabend
  Cc: David S. Miller, netdev, Oren Duer, Liran Liss, Jamal Hadi Salim,
	Diego Crupnicoff, Or Gerlitz
In-Reply-To: <4FAA0D2C.6040306@intel.com>

On 05/09/2012 09:22 AM, John Fastabend wrote:
> On 5/8/2012 6:56 AM, Amir Vadai wrote:
>> On 05/08/2012 03:54 AM, John Fastabend wrote:
>>> On 5/6/2012 12:05 AM, Amir Vadai wrote:
>>>> This series revives the discussion initiated in the thread "net:
>>>> support tx_ring per UP in HW based QoS mechanism" (see
>>>> http://marc.info/?t=133165957200004&r=1&w=2). The major issue to be addressed
>>>> is how the sk_prio <=> TC mapping should be done, for both tagged and untagged traffic.
>>>> Following is a staged description addressing the background, problem
>>>> description, current situation, suggestion for the change and implementation of
>>>> it.
>>>
>>> OK, but the mqprio qdisc is only concerned with mapping skb->priority to
>>> queue sets; I perhaps unfortunately called the queue sets tc's. Try not
>>> to get hung up on my perhaps limiting naming of variables and functions.
>> I understand that.
>
> I figured you did. Just wanted to state it again.
>
>>
>>>
>>> mqprio is used outside of 802.1Q as well; in these cases a traffic class
>>> is not usually even defined.
>>>
>>>>
>
> [...]
>
>>>> Implementation
>>>> --------------
>>>> Extended mqprio hw attribute:
>>>> * Bit 1: is queue offset/count owned by HW
>>>> * Bits 2-7: HW queueing type.
>>>>     * 0 - by ETS TC
>>>>     * 1 - by UP
>>>>
>>>> __skb_tx_hash() is now aware of the HW queuing type (pg_type): for pg_type
>>>> being ETS TC, traffic is distributed as it was before - tagged and untagged
>>>> packets are distributed by netdev_get_prio_tc_map. For pg_type being UP, tagged
>>>> and untagged packets are distributed by UP (taken from egress map for tagged
>>>> traffic, or netdev_get_prio_tc_map for untagged).
>>>
>>> I guess I don't see why we need this. If you keep the mqprio priority to
>>> queue set mapping at 1:1 and then modify the egress map accordingly,
>>> this should work, right?
>>>
>>> For example:
>>>
>>> If we want to map 8 802.1Q priority code points onto 4 traffic classes this
>>> should work,
>>>
>>> egress map: 0:0 1:0 2:1 3:1 4:2 5:2 6:3 7:3<-- vlan layer inserts correct tag
>>> mqprio up2tc: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7<-- mqprio with unfortunate 'tc' name maps priority to queues
>> But mqprio maps sk_priority to a queue set, which is different from mapping UP to a queue set.
>> The term UP, as we use it, is the 8021q PCP in tagged traffic, and a priority mapped by the host admin for untagged traffic.
>> What you wrote actually means that the application will set the priority without the host admin being able to control it.
>>> dcbnl up2tc: 0:0 1:0 2:1 3:1 4:2 5:2 6:3 7:3<-- dcbx pushes up2tc mapping correctly
>>
>> For example, let's take an application that calls setsockopt(SO_PRIORITY, 2):
>> according to egress map: 8021q PCP field in vlan tag should be set to 1 (=UP)
>> according to mqprio: a tx ring belonging to UP 2 will be selected.
>> according to dcbnl: traffic will have ETS attributes of TC 1
>>
>> Apart from some conceptual problems, it won't work:
>> the 8021q PCP field is set by HW according to the tx ring (UP=2), which
>> is different from the one that the user configured in the egress_map
>> (UP=1).
>
> Sorry, I set it up wrong above, but it _can_ be made to work as far as I
> can tell (for a single egress_map at least).
>
> egress map: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7<-- ignored
> mqprio    : 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7<-- map priority to up queue sets
> up2tc     : 0:0 1:0 2:1 3:1 4:2 5:2 6:3 7:3<-- dcbx negotiated up2tc map
>
> With your example application does setsockopt(SO_PRIORITY, 2):
> 	according to egress map : insert PCP 2
> 	according to mqprio	: tx ring belonging to UP2 is selected
> 	according to dcbnl	: traffic will have ETS attributes of TC1
>
> The point is either you use the skb->priority to PCP map and then a 1:1
> mqprio map or you use the mqprio map directly. Agree?
>
> One more example translating the case with these patches onto a case
> without these patches as I understand them.
>
> first with these patches mapping:
>
> up2tc     : 0:0 1:0 2:1 3:1 4:2 5:2 6:3 7:3<-- dcbnl up2tc map
> egress map: 0:0 1:0 2:1 3:1 4:2 5:2 6:3 7:3<-- sets pcp bits
> mqprio    : 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7<-- with PGROUP_UP set
>
> equivalent mapping without PGROUP_UP set
>
> up2tc     : 0:0 1:0 2:1 3:1 4:2 5:2 6:3 7:3<-- dcbnl negotiated up2tc map
> egress map: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0<-- default unset egress map
> mqprio    : 0:0 1:0 2:1 3:1 4:2 5:2 6:3 7:3<-- with PGROUP_UP set
>
> Application sets skb->priority to 2, in the first case egress map
> sets the PCP to 1 and mqprio maps it to queues for UP1 based on PCP bits.
>
> In case two, the egress map does nothing, but skb->priority is still 2, so
> mqprio maps these packets to the queues associated with UP1 and everything
> still works.
>
> I apologize if I'm missing something here but it seems correct to me. Spell
> it out for me if I am still wrong.
>
>>
>>>
>>> We may need to fixup userspace tools some but I think the kernel mechanisms
>>> are in place.
>>>
>>> BTW I did think about this while implementing the existing code and decided
>>> that rather than create more if/else branches the case you are describing
>>> could be handled by independently controlling the priority to queue set mappings
>>> in mqprio and the egress map.
>>>
>>> Feel free to let me know where I went wrong or why this doesn't work on
>>> your hardware. I agree though we may need to fixup lldpad and maybe even
>>> 'tc' some to get this to look correct in user space and get automagically
>>> setup correctly.
>>>
>>> Thanks,
>>> .John
>>
>> I think I understand your stance, that mqprio mapping should be generic
>> and should not be TC nor UP oriented.
>>
>
> Right. Also I want to avoid user space having to somehow "know" which
> mode to put the qdisc in. Checking if the hardware is a mellanox card
> does not scale.
>
>> The problem is that our HW needs the queues to be selected by UP. ETS
>> TC mapping is configured before traffic starts, and therefore ETS
>> attributes are enforced by HW according to the UP that the tx ring
>> belongs to. The same applies to the vlan tag in tagged traffic: the
>> 8021q PCP is placed by HW according to the tx ring.
>
> OK. Can you agree though in the case where we restrict the egress_map
> to be the same for all vlan's on a real_dev the existing mqprio map
> _can_ work?
>
>>
>> For backward compatibility, tagged traffic should be steered to a tx
>> ring by the UP taken from the egress map, and skb_tx_hash(), as it is
>> today, can't do that. Even if we implement ndo_select_queue(), we will
>> need to duplicate most of the code from skb_tx_hash(), because there is
>> more than one tx ring per UP, and we need skb_tx_hash() to be able to
>> select a ring from the range of rings belonging to a UP.
>
> agreed implementing this in select_queue() is no good.
>
>>
>> Also, there are some configurations that can't be done by mqprio and
>> egress map. For example when having two vlans with different egress
>> maps.
>
> This is a good example, thanks. This currently doesn't work for hardware
> that maps tx_rings to traffic classes either. So if we need/want to
> solve this case, then I guess we need something like your patches. Note
> that I think the patch you submitted should work regardless of whether
> you map the PCP bits to UP queues or to TC queues. We could use this
> in both cases, which helps with my user space concerns.
>
>>
>> And in the conceptual level, I think that kernel should not accept bad
>> configurations and rely on user space to protect it.
>
> I disagree; policy should be managed in user space. Also, I see no reason
> to create a strong coupling between egress maps and mqprio maps. I can
> imagine a case where they could be used independently. DCB is just one
> usage model for mqprio. Using a bit like you did seems fine though.
>
>>
>> - Amir
>
> Assuming we want the multiple different egress map case to work I guess
> we will have to do something like this. At least right now I can't think
> of anything better but I'll think on it tomorrow some more.
>
> The other thought would be to provide a qdisc hook before queue selection
> to attach 'tc filters' and provide a new action to hash a skb across
> queue sets.
>
> Thanks,
> John

Hi John,

After some internal discussions, we agreed to align with your
approach: leave mqprio as an abstract skb->priority <=> queue set
mapping, and ignore egress_map when mqprio is enabled.

It would be very nice if the term 'tc' in the kernel code were replaced
with 'queue set', since it is very misleading.

There may still be some small issues with skb_tx_hash for tagged
traffic, which I will work on tomorrow; hopefully I will send a new
patch set with the solution.

Thanks,
Amir
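John's two mapping configurations quoted above can be checked mechanically. The sketch below copies his table values (priorities 0-7) into arrays and compares which queue set each skb->priority lands in. The array and function names are hypothetical, and the model ignores VLAN tag insertion and the hashing across rings within a queue set. It also assumes a single egress map, which is exactly the restriction Amir's two-VLAN example breaks.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PRIO 8

/* First configuration ("with these patches", PGROUP_UP set). */
static const uint8_t egress_map_up[NUM_PRIO] = { 0, 0, 1, 1, 2, 2, 3, 3 }; /* skb->priority -> PCP */
static const uint8_t mqprio_up[NUM_PRIO]     = { 0, 1, 2, 3, 4, 5, 6, 7 }; /* PCP -> UP queue set */

/* Second configuration: plain mqprio, egress map left at its default. */
static const uint8_t mqprio_plain[NUM_PRIO]  = { 0, 0, 1, 1, 2, 2, 3, 3 }; /* skb->priority -> queue set */

/* DCBX-negotiated up2tc map, identical in both configurations. */
static const uint8_t up2tc[NUM_PRIO]         = { 0, 0, 1, 1, 2, 2, 3, 3 };

static unsigned int queue_set_with_up_mode(unsigned int prio)
{
	/* the egress map sets the PCP, then mqprio selects queues by PCP (UP) */
	return mqprio_up[egress_map_up[prio]];
}

static unsigned int queue_set_plain(unsigned int prio)
{
	/* the default egress map leaves skb->priority untouched, so mqprio maps it directly */
	return mqprio_plain[prio];
}
```

For every priority the two paths select the same queue set; priority 2, for example, lands in the UP1 queues, which up2tc places in TC0.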

* [PATCH] pch_gbe: fix transmit races
From: Eric Dumazet @ 2012-05-14 19:26 UTC (permalink / raw)
  To: Andy Cress, David Miller; +Cc: netdev
In-Reply-To: <40680C535D6FE6498883F1640FACD44DDF93F1@ka-exchange-1.kontronamerica.local>

From: Eric Dumazet <edumazet@google.com>

Andy reported pch_gbe triggered "NETDEV WATCHDOG" errors.

May 11 11:06:09 kontron kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x1ec/0x200() (Not tainted)
May 11 11:06:09 kontron kernel: Hardware name: N/A
May 11 11:06:09 kontron kernel: NETDEV WATCHDOG: eth0 (pch_gbe): transmit queue 0 timed out

It seems pch_gbe has a racy TX path (it races with the TX completion path).

Remove tx_queue_lock, since it serves no purpose; we must use tx_lock
instead.
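The locking rule this fix establishes can be sketched in userspace C. This is a hedged model, not the driver: a C11 atomic_flag spinlock stands in for tx_ring->tx_lock, and the names `tx_ring`, `xmit_one()` and `clean_tx()` only loosely mirror pch_gbe_xmit_frame()/pch_gbe_tx_queue() and pch_gbe_clean_tx(). Both the xmit path (free-space check plus next_to_use advance) and the completion path (next_to_clean update plus queue wake) run under the same lock, so neither can observe a half-updated ring.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal userspace stand-in for a spinlock (hypothetical, not the kernel API). */
struct sketch_lock {
	atomic_flag held;
};

static void lock_acquire(struct sketch_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin until the holder releases the flag */
}

static void lock_release(struct sketch_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct tx_ring {
	struct sketch_lock tx_lock; /* one lock for both xmit and completion */
	unsigned int count;
	unsigned int next_to_use;   /* advanced by the xmit path */
	unsigned int next_to_clean; /* advanced by the completion path */
	bool queue_stopped;         /* models netif_queue_stopped() */
};

static void ring_init(struct tx_ring *r, unsigned int count)
{
	assert(count > 1);
	atomic_flag_clear(&r->tx_lock.held);
	r->count = count;
	r->next_to_use = r->next_to_clean = 0;
	r->queue_stopped = false;
}

/* Free descriptors; one slot stays unused so a full ring differs from an empty one. */
static unsigned int ring_free(const struct tx_ring *r)
{
	return (r->next_to_clean + r->count - r->next_to_use - 1) % r->count;
}

/* xmit: the free-space check and the next_to_use update share one lock hold. */
static bool xmit_one(struct tx_ring *r)
{
	bool queued = false;

	lock_acquire(&r->tx_lock);
	if (ring_free(r) == 0) {
		r->queue_stopped = true;  /* like netif_stop_queue() before NETDEV_TX_BUSY */
	} else {
		r->next_to_use = (r->next_to_use + 1) % r->count;
		queued = true;
	}
	lock_release(&r->tx_lock);
	return queued;
}

/* completion: publish next_to_clean and wake the queue under the same lock. */
static unsigned int clean_tx(struct tx_ring *r, unsigned int completed)
{
	unsigned int cleaned = 0;

	lock_acquire(&r->tx_lock);
	while (cleaned < completed && r->next_to_clean != r->next_to_use) {
		r->next_to_clean = (r->next_to_clean + 1) % r->count;
		cleaned++;
	}
	if (cleaned && r->queue_stopped)
		r->queue_stopped = false; /* like netif_wake_queue() */
	lock_release(&r->tx_lock);
	return cleaned;
}
```

A single-threaded run only checks the ring arithmetic; the point of the shared lock is that a concurrent clean_tx() can no longer wake the queue or move next_to_clean between xmit_one()'s free-space check and its next_to_use update.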

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andy Cress <andy.cress@us.kontron.com>
Tested-by: Andy Cress <andy.cress@us.kontron.com>
---

patch rebased on net tree.

 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h      |    2 
 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c |   25 ++++------
 2 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
index dd14915..ba78174 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
@@ -584,7 +584,6 @@ struct pch_gbe_hw_stats {
 /**
  * struct pch_gbe_adapter - board specific private data structure
  * @stats_lock:	Spinlock structure for status
- * @tx_queue_lock:	Spinlock structure for transmit
  * @ethtool_lock:	Spinlock structure for ethtool
  * @irq_sem:		Semaphore for interrupt
  * @netdev:		Pointer of network device structure
@@ -609,7 +608,6 @@ struct pch_gbe_hw_stats {
 
 struct pch_gbe_adapter {
 	spinlock_t stats_lock;
-	spinlock_t tx_queue_lock;
 	spinlock_t ethtool_lock;
 	atomic_t irq_sem;
 	struct net_device *netdev;
diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index 8035e5f..1e38d50 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -640,14 +640,11 @@ static void pch_gbe_mac_set_pause_packet(struct pch_gbe_hw *hw)
  */
 static int pch_gbe_alloc_queues(struct pch_gbe_adapter *adapter)
 {
-	int size;
-
-	size = (int)sizeof(struct pch_gbe_tx_ring);
-	adapter->tx_ring = kzalloc(size, GFP_KERNEL);
+	adapter->tx_ring = kzalloc(sizeof(*adapter->tx_ring), GFP_KERNEL);
 	if (!adapter->tx_ring)
 		return -ENOMEM;
-	size = (int)sizeof(struct pch_gbe_rx_ring);
-	adapter->rx_ring = kzalloc(size, GFP_KERNEL);
+
+	adapter->rx_ring = kzalloc(sizeof(*adapter->rx_ring), GFP_KERNEL);
 	if (!adapter->rx_ring) {
 		kfree(adapter->tx_ring);
 		return -ENOMEM;
@@ -1162,7 +1159,6 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter,
 	struct sk_buff *tmp_skb;
 	unsigned int frame_ctrl;
 	unsigned int ring_num;
-	unsigned long flags;
 
 	/*-- Set frame control --*/
 	frame_ctrl = 0;
@@ -1211,14 +1207,14 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter,
 			}
 		}
 	}
-	spin_lock_irqsave(&tx_ring->tx_lock, flags);
+
 	ring_num = tx_ring->next_to_use;
 	if (unlikely((ring_num + 1) == tx_ring->count))
 		tx_ring->next_to_use = 0;
 	else
 		tx_ring->next_to_use = ring_num + 1;
 
-	spin_unlock_irqrestore(&tx_ring->tx_lock, flags);
+
 	buffer_info = &tx_ring->buffer_info[ring_num];
 	tmp_skb = buffer_info->skb;
 
@@ -1518,7 +1514,7 @@ pch_gbe_alloc_rx_buffers_pool(struct pch_gbe_adapter *adapter,
 						&rx_ring->rx_buff_pool_logic,
 						GFP_KERNEL);
 	if (!rx_ring->rx_buff_pool) {
-		pr_err("Unable to allocate memory for the receive poll buffer\n");
+		pr_err("Unable to allocate memory for the receive pool buffer\n");
 		return -ENOMEM;
 	}
 	memset(rx_ring->rx_buff_pool, 0, size);
@@ -1637,15 +1633,17 @@ pch_gbe_clean_tx(struct pch_gbe_adapter *adapter,
 	pr_debug("called pch_gbe_unmap_and_free_tx_resource() %d count\n",
 		 cleaned_count);
 	/* Recover from running out of Tx resources in xmit_frame */
+	spin_lock(&tx_ring->tx_lock);
 	if (unlikely(cleaned && (netif_queue_stopped(adapter->netdev)))) {
 		netif_wake_queue(adapter->netdev);
 		adapter->stats.tx_restart_count++;
 		pr_debug("Tx wake queue\n");
 	}
-	spin_lock(&adapter->tx_queue_lock);
+
 	tx_ring->next_to_clean = i;
-	spin_unlock(&adapter->tx_queue_lock);
+
 	pr_debug("next_to_clean : %d\n", tx_ring->next_to_clean);
+	spin_unlock(&tx_ring->tx_lock);
 	return cleaned;
 }
 
@@ -2037,7 +2035,6 @@ static int pch_gbe_sw_init(struct pch_gbe_adapter *adapter)
 		return -ENOMEM;
 	}
 	spin_lock_init(&adapter->hw.miim_lock);
-	spin_lock_init(&adapter->tx_queue_lock);
 	spin_lock_init(&adapter->stats_lock);
 	spin_lock_init(&adapter->ethtool_lock);
 	atomic_set(&adapter->irq_sem, 0);
@@ -2142,10 +2139,10 @@ static int pch_gbe_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 			 tx_ring->next_to_use, tx_ring->next_to_clean);
 		return NETDEV_TX_BUSY;
 	}
-	spin_unlock_irqrestore(&tx_ring->tx_lock, flags);
 
 	/* CRC,ITAG no support */
 	pch_gbe_tx_queue(adapter, tx_ring, skb);
+	spin_unlock_irqrestore(&tx_ring->tx_lock, flags);
 	return NETDEV_TX_OK;
 }
 

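As I read the diff above, the race being closed has two parts. First, pch_gbe_xmit_frame() checked for free descriptors under tx_ring->tx_lock but dropped the lock before pch_gbe_tx_queue() claimed a slot. Second, pch_gbe_clean_tx() woke the queue with no lock held and advanced next_to_clean under a different lock (adapter->tx_queue_lock). The patch moves all of this under tx_ring->tx_lock. The following is a hypothetical userspace model of that invariant, not driver code: a pthread mutex stands in for the spinlock, a bool stands in for netif_queue_stopped(), and the "keep one slot empty" full test is an assumption of the sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified TX ring; field names loosely follow pch_gbe. */
struct tx_ring {
	pthread_mutex_t tx_lock;    /* stands in for the tx_lock spinlock */
	unsigned int count;         /* number of descriptors in the ring */
	unsigned int next_to_use;   /* producer index, advanced by xmit */
	unsigned int next_to_clean; /* consumer index, advanced by clean */
	bool queue_stopped;         /* stands in for netif_queue_stopped() */
};

static void ring_init(struct tx_ring *r, unsigned int count)
{
	assert(count > 1);
	pthread_mutex_init(&r->tx_lock, NULL);
	r->count = count;
	r->next_to_use = 0;
	r->next_to_clean = 0;
	r->queue_stopped = false;
}

/* Index after i, wrapping to 0 at the end of the ring. */
static unsigned int ring_next(const struct tx_ring *r, unsigned int i)
{
	return (i + 1 == r->count) ? 0 : i + 1;
}

/* Producer: the "is there room?" test and the slot claim happen under a
 * single lock hold. Returns the claimed slot, or -1 if the ring is full. */
static int ring_xmit(struct tx_ring *r)
{
	int slot = -1;

	pthread_mutex_lock(&r->tx_lock);
	if (ring_next(r, r->next_to_use) == r->next_to_clean) {
		r->queue_stopped = true; /* full: one slot is always left empty */
	} else {
		slot = (int)r->next_to_use;
		r->next_to_use = ring_next(r, r->next_to_use);
	}
	pthread_mutex_unlock(&r->tx_lock);
	return slot;
}

/* Consumer: wake the queue and publish next_to_clean under the same lock,
 * so the producer cannot observe a wakeup paired with a stale index. */
static void ring_clean(struct tx_ring *r, unsigned int new_clean, bool cleaned)
{
	pthread_mutex_lock(&r->tx_lock);
	if (cleaned && r->queue_stopped)
		r->queue_stopped = false;
	r->next_to_clean = new_clean;
	pthread_mutex_unlock(&r->tx_lock);
}
```

A single lock shared by both paths is the simplest way to make the check-and-claim atomic; lock-free rings with explicit memory barriers are a common alternative in drivers, at the cost of subtler reasoning.
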
* Re: [PATCH RFC] tun: experimental zero copy tx support
From: Eric Dumazet @ 2012-05-14 19:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stephen Hemminger, David S. Miller, Joe Perches, Jason Wang,
	netdev, linux-kernel, Ian.Campbell, kvm
In-Reply-To: <20120514191456.GC17086@redhat.com>

On Mon, 2012-05-14 at 22:14 +0300, Michael S. Tsirkin wrote:

> They seem to be in net-next, or did I miss something?

Yes: this patch reintroduces some bugs that were already fixed in macvtap.

* Re: [PATCH RFC 5/6] net: orphan frags on receive
From: Michael S. Tsirkin @ 2012-05-14 19:18 UTC (permalink / raw)
  To: Ian Campbell; +Cc: David Miller, netdev@vger.kernel.org, eric.dumazet@gmail.com
In-Reply-To: <1336571692.25514.122.camel@zakaz.uk.xensource.com>

On Wed, May 09, 2012 at 02:54:52PM +0100, Ian Campbell wrote:
> On Mon, 2012-05-07 at 14:54 +0100, Michael S. Tsirkin wrote:
> > zero copy packets are normally sent to the outside
> > network, but bridging, tun etc might loop them
> > back to host networking stack. If this happens
> > destructors will never be called, so orphan
> > the frags immediately on receive.
> 
> I think this deceptively simply patch is actually the meat of the
> series.

Eric, does this patch look OK to you?
It's key to both my zerocopy work and Ian's.

-- 
MST
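
The patch description quoted above boils down to this: if a packet whose pages still belong to the sender can loop back into the host stack, the completion that tells the sender those pages are free may never fire. So on receive, the data should be copied into memory the stack owns and the completion signalled immediately. The sketch below models that idea in plain C; struct pkt, struct zc_info and pkt_orphan_frags() are hypothetical names, not the kernel's sk_buff API, and the real mechanism differs in detail.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical completion record: the sender learns its buffer is free
 * when callback() runs. */
struct zc_info {
	void (*callback)(struct zc_info *zc);
	int completions; /* counts callback invocations */
};

/* Hypothetical packet whose payload may still point into sender memory. */
struct pkt {
	const unsigned char *data; /* payload, possibly sender-owned */
	size_t len;
	unsigned char *owned;      /* private copy, if one was made */
	struct zc_info *zc;        /* non-NULL while payload is sender-owned */
};

static void zc_done(struct zc_info *zc)
{
	zc->completions++;
}

/* Copy sender-owned payload into private memory and signal completion
 * right away, so the sender does not wait on a destructor that a
 * looped-back packet might never run. Returns 0 on success, -1 on OOM. */
static int pkt_orphan_frags(struct pkt *p)
{
	assert(p != NULL);
	if (!p->zc)
		return 0; /* already owns its data: nothing to do */

	unsigned char *copy = malloc(p->len ? p->len : 1);
	if (!copy)
		return -1;
	memcpy(copy, p->data, p->len);
	p->owned = copy;
	p->data = copy;

	p->zc->callback(p->zc); /* sender may now reuse its buffer */
	p->zc = NULL;
	return 0;
}
```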

* Re: [PATCH RFC] tun: experimental zero copy tx support
From: Michael S. Tsirkin @ 2012-05-14 19:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stephen Hemminger, David S. Miller, Joe Perches, Jason Wang,
	netdev, linux-kernel, Ian.Campbell, kvm
In-Reply-To: <20120514191456.GC17086@redhat.com>

On Mon, May 14, 2012 at 10:14:56PM +0300, Michael S. Tsirkin wrote:
> On Mon, May 14, 2012 at 09:02:26PM +0200, Eric Dumazet wrote:
> > On Mon, 2012-05-14 at 20:04 +0300, Michael S. Tsirkin wrote:
> > > On Mon, May 14, 2012 at 09:54:46AM -0700, Stephen Hemminger wrote:
> > > > On Sun, 13 May 2012 18:52:06 +0300
> > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > 
> > > > > +		/* Userspace may produce vectors with count greater than
> > > > > +		 * MAX_SKB_FRAGS, so we need to linearize parts of the skb
> > > > > +		 * to let the rest of data to be fit in the frags.
> > > > > +		 */
> > > > Rather than complex partial code, just go through slow path for
> > > > requests with too many frags (or for really small requests).
> > > > Creating mixed skb's seems too easy to get wrong.
> > > 
> > > I don't object in principle but macvtap has same code
> > > so seems better to stay consistent.
> > > 
> > 
> > If I remember well, code in vtap was buggy and still is.
> > 
> > Jason Wang fixes are not yet in,
> 
> They seem to be in net-next, or did I miss something?
> 
> > so maybe wait a bit, so that we dont
> > add a pile of new bugs ?
> > 
> > 

In my experience, things progress more smoothly upstream than out
of tree.

Also, everything is guarded by a module parameter in vhost,
which is off by default, and the name
experimental_zcopytx hopefully makes it clear that there's
risk involved.

So the chance of hurting someone is, IMO, minimal.
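
Stephen Hemminger's alternative, quoted above, amounts to a routing decision made before any skb is built: requests with more segments than fit in the frags array, or very small requests, take the ordinary copy path instead of being partially linearized. A minimal sketch with the opt-in module parameter folded in; the segment limit and the small-packet cutoff are assumed values, and use_zerocopy() is a hypothetical helper, not tun or macvtap code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration only: the kernel's MAX_SKB_FRAGS depends
 * on PAGE_SIZE and kernel version, and the cutoff is a hypothetical choice. */
#define SKETCH_MAX_FRAGS 17
#define SKETCH_COPYBREAK 128

/* Decide whether a send request takes the zero-copy path.
 * zcopy_param models an opt-in, off-by-default module parameter
 * such as experimental_zcopytx. */
static bool use_zerocopy(bool zcopy_param, size_t nr_segs, size_t total_len)
{
	assert(total_len == 0 || nr_segs > 0); /* data implies >= 1 segment */

	if (!zcopy_param)
		return false; /* feature disabled: always copy */
	if (nr_segs > SKETCH_MAX_FRAGS)
		return false; /* too many segments: copy rather than linearize part */
	if (total_len < SKETCH_COPYBREAK)
		return false; /* tiny payloads are cheaper to copy */
	return true;
}
```

The appeal of this shape is that the zero-copy path only ever sees requests it can map one-to-one onto frags, so no mixed linear/frag skb is ever built.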

* Re: [PATCH RFC] tun: experimental zero copy tx support
From: Michael S. Tsirkin @ 2012-05-14 19:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stephen Hemminger, David S. Miller, Joe Perches, Jason Wang,
	netdev, linux-kernel, Ian.Campbell, kvm
In-Reply-To: <1337022146.8512.606.camel@edumazet-glaptop>

On Mon, May 14, 2012 at 09:02:26PM +0200, Eric Dumazet wrote:
> On Mon, 2012-05-14 at 20:04 +0300, Michael S. Tsirkin wrote:
> > On Mon, May 14, 2012 at 09:54:46AM -0700, Stephen Hemminger wrote:
> > > On Sun, 13 May 2012 18:52:06 +0300
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > 
> > > > +		/* Userspace may produce vectors with count greater than
> > > > +		 * MAX_SKB_FRAGS, so we need to linearize parts of the skb
> > > > +		 * to let the rest of data to be fit in the frags.
> > > > +		 */
> > > Rather than complex partial code, just go through slow path for
> > > requests with too many frags (or for really small requests).
> > > Creating mixed skb's seems too easy to get wrong.
> > 
> > I don't object in principle but macvtap has same code
> > so seems better to stay consistent.
> > 
> 
> If I remember well, code in vtap was buggy and still is.
> 
> Jason Wang fixes are not yet in,

They seem to be in net-next, or did I miss something?

> so maybe wait a bit, so that we dont
> add a pile of new bugs ?
> 
> 
> 

* Re: [PATCH] netfilter: xt_HMARK: endian bugs
From: Eric Dumazet @ 2012-05-14 19:13 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Jozsef Kadlecsik, Hans Schillstrom, Jan Engelhardt,
	kaber@trash.net, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	dan.carpenter@oracle.com, hans@schillstrom.com
In-Reply-To: <20120514190211.GC14897@1984>

On Mon, 2012-05-14 at 21:02 +0200, Pablo Neira Ayuso wrote:

> IIRC, Hans wants the calculated hash mark to be the same on every
> system in a cluster composed of systems with different endianness,
> so that the distribution is consistent regardless of endianness.

Then jhash() must be audited to make sure its output meets this
requirement.
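
One way to read the requirement: whatever the hash function, it must be fed the same numeric input on every host. Packet fields are stored in network byte order, so loading them with a plain host-order read yields different values on little- and big-endian machines, while an explicit big-endian decode does not. The sketch below shows the difference. mix32() is MurmurHash3's 32-bit finalizer used purely as a stand-in, not jhash(); whether jhash() itself satisfies the requirement is exactly what needs auditing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode a 32-bit field stored in network (big-endian) byte order.
 * The result is the same number on every host. */
static uint32_t load_be32(const unsigned char b[4])
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Raw host-order load, for contrast: its value depends on endianness. */
static uint32_t load_host32(const unsigned char b[4])
{
	uint32_t v;

	memcpy(&v, b, sizeof(v));
	return v;
}

/* Stand-in mixer (MurmurHash3 fmix32), NOT jhash. Pure arithmetic on
 * uint32_t values, so equal inputs give equal outputs on any host. */
static uint32_t mix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return h;
}

/* Endian-independent hash of an address pair in network byte order. */
static uint32_t hash_addr_pair(const unsigned char src[4],
			       const unsigned char dst[4])
{
	assert(src != NULL && dst != NULL);
	return mix32(load_be32(src) ^ mix32(load_be32(dst)));
}
```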

* Re: [PATCH] [RFC] netfilter: don't assume NFPROTO_* are like PF_*
From: Pablo Neira Ayuso @ 2012-05-14 19:06 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Alban Crequy, Patrick McHardy, Vincent Sanders,
	Javier Martinez Canillas, netfilter-devel, netdev
In-Reply-To: <alpine.LNX.2.01.1205141707490.11548@frira.vanv.qr>

On Mon, May 14, 2012 at 05:09:54PM +0200, Jan Engelhardt wrote:
> 
> On Monday 2012-05-14 15:58, Alban Crequy wrote:
> >--- a/include/linux/netfilter.h
> >+++ b/include/linux/netfilter.h
> >@@ -62,11 +62,11 @@ enum nf_inet_hooks {
> > 
> > enum {
> > 	NFPROTO_UNSPEC =  0,
> >-	NFPROTO_IPV4   =  2,
> >-	NFPROTO_ARP    =  3,
> >-	NFPROTO_BRIDGE =  7,
> >-	NFPROTO_IPV6   = 10,
> >-	NFPROTO_DECNET = 12,
> >+	NFPROTO_IPV4,
> >+	NFPROTO_ARP,
> >+	NFPROTO_BRIDGE,
> >+	NFPROTO_IPV6,
> >+	NFPROTO_DECNET,
> > 	NFPROTO_NUMPROTO,
> > };
> 
> This must not be changed under any circumstances. It is exported to
> and used by userspace. (Except perhaps for NFPROTO_DECNET, which
> refers to a fairly dead protocol; I think no userspace parts have
> ever used NFPROTO_DECNET.) I would consider it acceptable to change
> the value for NFPROTO_DECNET if Pablo agrees.

If there is even a remote possibility of breaking userspace code, I
won't take the patch, sorry.
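
Jan's objection can be made mechanical: because userspace binaries bake these numbers in, each exported value can be pinned with a compile-time assertion so that a renumbering like the one proposed fails the build. A hypothetical sketch (these checks are not in the kernel tree), using the values from the original side of the quoted hunk:

```c
#include <assert.h> /* static_assert (C11) */

/* Values from the original side of the include/linux/netfilter.h hunk. */
enum {
	NFPROTO_UNSPEC =  0,
	NFPROTO_IPV4   =  2,
	NFPROTO_ARP    =  3,
	NFPROTO_BRIDGE =  7,
	NFPROTO_IPV6   = 10,
	NFPROTO_DECNET = 12,
	NFPROTO_NUMPROTO, /* 13: one past the highest value */
};

/* ABI pins: renumbering any of these breaks existing userspace binaries. */
static_assert(NFPROTO_IPV4 == 2, "NFPROTO_IPV4 is userspace ABI");
static_assert(NFPROTO_ARP == 3, "NFPROTO_ARP is userspace ABI");
static_assert(NFPROTO_BRIDGE == 7, "NFPROTO_BRIDGE is userspace ABI");
static_assert(NFPROTO_IPV6 == 10, "NFPROTO_IPV6 is userspace ABI");
static_assert(NFPROTO_DECNET == 12, "NFPROTO_DECNET is userspace ABI");
static_assert(NFPROTO_NUMPROTO == 13, "NFPROTO_NUMPROTO follows NFPROTO_DECNET");
```

With the proposed hunk applied, implicit numbering would give NFPROTO_IPV4 = 1, NFPROTO_ARP = 2, NFPROTO_BRIDGE = 3, NFPROTO_IPV6 = 4 and NFPROTO_DECNET = 5, so a program built against the old header that passes 2 for IPv4 would be read as ARP. The first assertion would catch that at compile time.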

* Re: [PATCH v2 1/6] netfilter: sanity checks on NFPROTO_NUMPROTO
From: Pablo Neira Ayuso @ 2012-05-14 19:04 UTC (permalink / raw)
  To: Alban Crequy
  Cc: Patrick McHardy, Vincent Sanders, Javier Martinez Canillas,
	netfilter-devel, netdev
In-Reply-To: <20120514170410.6c2f1c5b@rainbow.cbg.collabora.co.uk>

On Mon, May 14, 2012 at 05:04:10PM +0100, Alban Crequy wrote:
> Le Mon, 14 May 2012 16:39:49 +0100,
> Alban Crequy <alban.crequy@collabora.co.uk> a écrit :
> 
> > Le Mon, 14 May 2012 16:42:35 +0200,
> > Pablo Neira Ayuso <pablo@netfilter.org> a écrit :
> > 
> > > On Mon, May 14, 2012 at 02:56:34PM +0100, Alban Crequy wrote:
> > > > With the NFPROTO_* constants introduced by commit 7e9c6e
> > > > ("netfilter: Introduce NFPROTO_* constants"), it is too easy to
> > > > confuse PF_* and NFPROTO_* constants in new protocols.
> > > > 
> > > > Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
> > > > Reviewed-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
> > > > Reviewed-by: Vincent Sanders <vincent.sanders@collabora.co.uk>
> > > > ---
> > > >  net/netfilter/core.c |    5 +++++
> > > >  1 files changed, 5 insertions(+), 0 deletions(-)
> > > > 
> > > > diff --git a/net/netfilter/core.c b/net/netfilter/core.c
> > > > index e1b7e05..4f16552 100644
> > > > --- a/net/netfilter/core.c
> > > > +++ b/net/netfilter/core.c
> > > > @@ -67,6 +67,11 @@ int nf_register_hook(struct nf_hook_ops *reg)
> > > >  	struct nf_hook_ops *elem;
> > > >  	int err;
> > > >  
> > > > +	if (reg->pf >= NFPROTO_NUMPROTO || reg->hooknum >= NF_MAX_HOOKS) {
> > > > +		BUG();
> > > > +		return 1;
> > > 
> > > nf_register_hook returns a negative value on error. -EINVAL can be
> > > fine.
> > 
> > Is it the patch you mean? Do you want me to do a series repost?
> 
> Please disregard the previous patch, this is the correct one.
> 
> 
> From: Alban Crequy <alban.crequy@collabora.co.uk>
> 
> netfilter: sanity checks on NFPROTO_NUMPROTO
> 
> With the NFPROTO_* constants introduced by commit 7e9c6e ("netfilter: Introduce
> NFPROTO_* constants"), it is too easy to confuse PF_* and NFPROTO_* constants
> in new protocols.
> 
> Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
> ---
>  net/netfilter/core.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/net/netfilter/core.c b/net/netfilter/core.c
> index e1b7e05..7422989 100644
> --- a/net/netfilter/core.c
> +++ b/net/netfilter/core.c
> @@ -67,6 +67,14 @@ int nf_register_hook(struct nf_hook_ops *reg)
>  	struct nf_hook_ops *elem;
>  	int err;
>  
> +	if (reg->pf >= NFPROTO_NUMPROTO || reg->hooknum >= NF_MAX_HOOKS) {
> +		WARN(reg->pf >= NFPROTO_NUMPROTO,
> +		     "netfilter: Invalid nfproto %d\n", reg->pf);
> +		WARN(reg->hooknum >= NF_MAX_HOOKS,
> +		     "netfilter: Invalid hooknum %d\n", reg->hooknum);

Then it is better to add two separate checks: one that triggers the
first warning and another that triggers the second.

I haven't seen code like this anywhere in netfilter, and I like
things to remain consistent.

> +		return -EINVAL;
> +	}
> +
>  	err = mutex_lock_interruptible(&nf_hook_mutex);
>  	if (err < 0)
>  		return err;
> -- 
> 1.7.2.5
> 
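
Pablo's suggestion, as I understand it, is one if statement per condition, each with its own warning, rather than a combined test followed by two WARN() calls. Since the kernel's WARN() evaluates to its condition, each check fits the common if (WARN(...)) return -EINVAL; pattern. The sketch below is a userspace model: SKETCH_WARN() stands in for WARN(), validate_hook() and struct hook_reg are hypothetical, NF_MAX_HOOKS is assumed to be 8, and NFPROTO_NUMPROTO = 13 follows from the NFPROTO enum quoted in the thread above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Assumed values for this sketch (see lead-in). */
#define SKETCH_NFPROTO_NUMPROTO 13
#define SKETCH_NF_MAX_HOOKS     8

/* Userspace stand-in for WARN(): print a message when cond is true and
 * evaluate to 1, otherwise evaluate to 0. */
#define SKETCH_WARN(cond, ...) \
	((cond) ? (fprintf(stderr, __VA_ARGS__), 1) : 0)

/* Hypothetical subset of struct nf_hook_ops. */
struct hook_reg {
	unsigned int pf;
	unsigned int hooknum;
};

/* One check per field, each with its own warning and early return. */
static int validate_hook(const struct hook_reg *reg)
{
	assert(reg != NULL);

	if (SKETCH_WARN(reg->pf >= SKETCH_NFPROTO_NUMPROTO,
			"netfilter: Invalid nfproto %u\n", reg->pf))
		return -EINVAL;

	if (SKETCH_WARN(reg->hooknum >= SKETCH_NF_MAX_HOOKS,
			"netfilter: Invalid hooknum %u\n", reg->hooknum))
		return -EINVAL;

	return 0;
}
```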
