Netdev List
 help / color / mirror / Atom feed
* [net-next 08/12] e1000e: update driver version
From: Jeff Kirsher @ 2011-06-08 22:56 UTC (permalink / raw)
  To: davem; +Cc: Bruce Allan, netdev, gospo, Jeff Kirsher
In-Reply-To: <1307573800-24868-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Bruce Allan <bruce.w.allan@intel.com>

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/e1000e/netdev.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 51cf715..3bf5249 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -54,9 +54,9 @@
 
 #include "e1000.h"
 
-#define DRV_EXTRAVERSION "-k2"
+#define DRV_EXTRAVERSION "-k"
 
-#define DRV_VERSION "1.3.10" DRV_EXTRAVERSION
+#define DRV_VERSION "1.3.16" DRV_EXTRAVERSION
 char e1000e_driver_name[] = "e1000e";
 const char e1000e_driver_version[] = DRV_VERSION;
 
-- 
1.7.5.2


^ permalink raw reply related

* [net-next 09/12] igbvf: update version number
From: Jeff Kirsher @ 2011-06-08 22:56 UTC (permalink / raw)
  To: davem; +Cc: Williams, Mitch A, netdev, gospo, Jeff Kirsher
In-Reply-To: <1307573800-24868-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: "Williams, Mitch A" <mitch.a.williams@intel.com>

Update the version number to match version conventions. Bump the major
version to indicate that new hardware support (i350) has been added.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/igbvf/netdev.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/igbvf/netdev.c b/drivers/net/igbvf/netdev.c
index 1c77fb3..64b47bf 100644
--- a/drivers/net/igbvf/netdev.c
+++ b/drivers/net/igbvf/netdev.c
@@ -45,7 +45,7 @@
 
 #include "igbvf.h"
 
-#define DRV_VERSION "1.0.8-k0"
+#define DRV_VERSION "2.0.0-k"
 char igbvf_driver_name[] = "igbvf";
 const char igbvf_driver_version[] = DRV_VERSION;
 static const char igbvf_driver_string[] =
-- 
1.7.5.2


^ permalink raw reply related

* [net-next 10/12] igb: Change version to remove number after -k in kernel versions.
From: Jeff Kirsher @ 2011-06-08 22:56 UTC (permalink / raw)
  To: davem; +Cc: Carolyn Wyborny, netdev, gospo, Jeff Kirsher
In-Reply-To: <1307573800-24868-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Carolyn Wyborny <carolyn.wyborny@intel.com>

This patch changes the way versioning is done for igb in the kernel by
removing the number after the "k."  It has been determined that just the
"k" is sufficient to identify a kernel version and the following number
was used in an inconsistent manner.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/igb/igb_main.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 18fccf9..c2e9670 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -54,9 +54,8 @@
 #define MAJ 3
 #define MIN 0
 #define BUILD 6
-#define KFIX 2
 #define DRV_VERSION __stringify(MAJ) "." __stringify(MIN) "." \
-__stringify(BUILD) "-k" __stringify(KFIX)
+__stringify(BUILD) "-k"
 char igb_driver_name[] = "igb";
 char igb_driver_version[] = DRV_VERSION;
 static const char igb_driver_string[] =
-- 
1.7.5.2


^ permalink raw reply related

* [net-next 11/12] rtnetlink: Compute and store minimum ifinfo dump size
From: Jeff Kirsher @ 2011-06-08 22:56 UTC (permalink / raw)
  To: davem; +Cc: Greg Rose, netdev, gospo, Jeff Kirsher
In-Reply-To: <1307573800-24868-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Greg Rose <gregory.v.rose@intel.com>

The message size allocated for rtnl ifinfo dumps was limited to
a single page.  This is not enough for additional interface info
available with devices that support SR-IOV and caused a bug in
which VF info would not be displayed if more than approximately
40 VFs were created per interface.

Implement a new function pointer for the rtnl_register service that will
calculate the amount of data required for the ifinfo dump and allocate
enough data to satisfy the request.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Evan Swanson <evan.swanson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 include/linux/netlink.h              |    6 ++-
 include/net/rtnetlink.h              |    7 +++-
 net/bridge/br_netlink.c              |   15 ++++++---
 net/core/fib_rules.c                 |    6 ++--
 net/core/neighbour.c                 |   11 +++---
 net/core/rtnetlink.c                 |   60 +++++++++++++++++++++++++++------
 net/dcb/dcbnl.c                      |    4 +-
 net/decnet/dn_dev.c                  |    6 ++--
 net/decnet/dn_fib.c                  |    4 +-
 net/decnet/dn_route.c                |    5 ++-
 net/ipv4/devinet.c                   |    6 ++--
 net/ipv4/fib_frontend.c              |    6 ++--
 net/ipv4/inet_diag.c                 |    2 +-
 net/ipv4/ipmr.c                      |    3 +-
 net/ipv4/route.c                     |    2 +-
 net/ipv6/addrconf.c                  |   16 ++++++---
 net/ipv6/addrlabel.c                 |    9 +++--
 net/ipv6/ip6_fib.c                   |    3 +-
 net/ipv6/ip6mr.c                     |    3 +-
 net/ipv6/route.c                     |    6 ++--
 net/netfilter/ipset/ip_set_core.c    |    2 +-
 net/netfilter/nf_conntrack_netlink.c |    4 +-
 net/netlink/af_netlink.c             |   17 ++++++---
 net/netlink/genetlink.c              |    2 +-
 net/phonet/pn_netlink.c              |   13 ++++---
 net/sched/act_api.c                  |    7 ++--
 net/sched/cls_api.c                  |    6 ++--
 net/sched/sch_api.c                  |   12 +++---
 net/xfrm/xfrm_user.c                 |    3 +-
 29 files changed, 157 insertions(+), 89 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index a9dd895..fdd0188 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -221,7 +221,8 @@ struct netlink_callback {
 	int			(*dump)(struct sk_buff * skb,
 					struct netlink_callback *cb);
 	int			(*done)(struct netlink_callback *cb);
-	int			family;
+	u16			family;
+	u16			min_dump_alloc;
 	long			args[6];
 };
 
@@ -259,7 +260,8 @@ __nlmsg_put(struct sk_buff *skb, u32 pid, u32 seq, int type, int len, int flags)
 extern int netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 			      const struct nlmsghdr *nlh,
 			      int (*dump)(struct sk_buff *skb, struct netlink_callback*),
-			      int (*done)(struct netlink_callback*));
+			      int (*done)(struct netlink_callback*),
+			      u16 min_dump_alloc);
 
 
 #define NL_NONROOT_RECV 0x1
diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 4093ca7..678f1ff 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -6,11 +6,14 @@
 
 typedef int (*rtnl_doit_func)(struct sk_buff *, struct nlmsghdr *, void *);
 typedef int (*rtnl_dumpit_func)(struct sk_buff *, struct netlink_callback *);
+typedef u16 (*rtnl_calcit_func)(struct sk_buff *);
 
 extern int	__rtnl_register(int protocol, int msgtype,
-				rtnl_doit_func, rtnl_dumpit_func);
+				rtnl_doit_func, rtnl_dumpit_func,
+				rtnl_calcit_func);
 extern void	rtnl_register(int protocol, int msgtype,
-			      rtnl_doit_func, rtnl_dumpit_func);
+			      rtnl_doit_func, rtnl_dumpit_func,
+			      rtnl_calcit_func);
 extern int	rtnl_unregister(int protocol, int msgtype);
 extern void	rtnl_unregister_all(int protocol);
 
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index ffb0dc4..6814083 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -218,19 +218,24 @@ int __init br_netlink_init(void)
 	if (err < 0)
 		goto err1;
 
-	err = __rtnl_register(PF_BRIDGE, RTM_GETLINK, NULL, br_dump_ifinfo);
+	err = __rtnl_register(PF_BRIDGE, RTM_GETLINK, NULL,
+			      br_dump_ifinfo, NULL);
 	if (err)
 		goto err2;
-	err = __rtnl_register(PF_BRIDGE, RTM_SETLINK, br_rtm_setlink, NULL);
+	err = __rtnl_register(PF_BRIDGE, RTM_SETLINK,
+			      br_rtm_setlink, NULL, NULL);
 	if (err)
 		goto err3;
-	err = __rtnl_register(PF_BRIDGE, RTM_NEWNEIGH, br_fdb_add, NULL);
+	err = __rtnl_register(PF_BRIDGE, RTM_NEWNEIGH,
+			      br_fdb_add, NULL, NULL);
 	if (err)
 		goto err3;
-	err = __rtnl_register(PF_BRIDGE, RTM_DELNEIGH, br_fdb_delete, NULL);
+	err = __rtnl_register(PF_BRIDGE, RTM_DELNEIGH,
+			      br_fdb_delete, NULL, NULL);
 	if (err)
 		goto err3;
-	err = __rtnl_register(PF_BRIDGE, RTM_GETNEIGH, NULL, br_fdb_dump);
+	err = __rtnl_register(PF_BRIDGE, RTM_GETNEIGH,
+			      NULL, br_fdb_dump, NULL);
 	if (err)
 		goto err3;
 
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 008dc70..e7ab0c0 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -740,9 +740,9 @@ static struct pernet_operations fib_rules_net_ops = {
 static int __init fib_rules_init(void)
 {
 	int err;
-	rtnl_register(PF_UNSPEC, RTM_NEWRULE, fib_nl_newrule, NULL);
-	rtnl_register(PF_UNSPEC, RTM_DELRULE, fib_nl_delrule, NULL);
-	rtnl_register(PF_UNSPEC, RTM_GETRULE, NULL, fib_nl_dumprule);
+	rtnl_register(PF_UNSPEC, RTM_NEWRULE, fib_nl_newrule, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_DELRULE, fib_nl_delrule, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_GETRULE, NULL, fib_nl_dumprule, NULL);
 
 	err = register_pernet_subsys(&fib_rules_net_ops);
 	if (err < 0)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 799f06e..ceb505b 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2909,12 +2909,13 @@ EXPORT_SYMBOL(neigh_sysctl_unregister);
 
 static int __init neigh_init(void)
 {
-	rtnl_register(PF_UNSPEC, RTM_NEWNEIGH, neigh_add, NULL);
-	rtnl_register(PF_UNSPEC, RTM_DELNEIGH, neigh_delete, NULL);
-	rtnl_register(PF_UNSPEC, RTM_GETNEIGH, NULL, neigh_dump_info);
+	rtnl_register(PF_UNSPEC, RTM_NEWNEIGH, neigh_add, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_DELNEIGH, neigh_delete, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_GETNEIGH, NULL, neigh_dump_info, NULL);
 
-	rtnl_register(PF_UNSPEC, RTM_GETNEIGHTBL, NULL, neightbl_dump_info);
-	rtnl_register(PF_UNSPEC, RTM_SETNEIGHTBL, neightbl_set, NULL);
+	rtnl_register(PF_UNSPEC, RTM_GETNEIGHTBL, NULL, neightbl_dump_info,
+		      NULL);
+	rtnl_register(PF_UNSPEC, RTM_SETNEIGHTBL, neightbl_set, NULL, NULL);
 
 	return 0;
 }
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index abd936d..a798fc6 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -56,9 +56,11 @@
 struct rtnl_link {
 	rtnl_doit_func		doit;
 	rtnl_dumpit_func	dumpit;
+	rtnl_calcit_func 	calcit;
 };
 
 static DEFINE_MUTEX(rtnl_mutex);
+static u16 min_ifinfo_dump_size;
 
 void rtnl_lock(void)
 {
@@ -144,12 +146,28 @@ static rtnl_dumpit_func rtnl_get_dumpit(int protocol, int msgindex)
 	return tab ? tab[msgindex].dumpit : NULL;
 }
 
+static rtnl_calcit_func rtnl_get_calcit(int protocol, int msgindex)
+{
+	struct rtnl_link *tab;
+
+	if (protocol <= RTNL_FAMILY_MAX)
+		tab = rtnl_msg_handlers[protocol];
+	else
+		tab = NULL;
+
+	if (tab == NULL || tab[msgindex].calcit == NULL)
+		tab = rtnl_msg_handlers[PF_UNSPEC];
+
+	return tab ? tab[msgindex].calcit : NULL;
+}
+
 /**
  * __rtnl_register - Register a rtnetlink message type
  * @protocol: Protocol family or PF_UNSPEC
  * @msgtype: rtnetlink message type
  * @doit: Function pointer called for each request message
  * @dumpit: Function pointer called for each dump request (NLM_F_DUMP) message
+ * @calcit: Function pointer to calc size of dump message
  *
  * Registers the specified function pointers (at least one of them has
  * to be non-NULL) to be called whenever a request message for the
@@ -162,7 +180,8 @@ static rtnl_dumpit_func rtnl_get_dumpit(int protocol, int msgindex)
  * Returns 0 on success or a negative error code.
  */
 int __rtnl_register(int protocol, int msgtype,
-		    rtnl_doit_func doit, rtnl_dumpit_func dumpit)
+		    rtnl_doit_func doit, rtnl_dumpit_func dumpit,
+		    rtnl_calcit_func calcit)
 {
 	struct rtnl_link *tab;
 	int msgindex;
@@ -185,6 +204,9 @@ int __rtnl_register(int protocol, int msgtype,
 	if (dumpit)
 		tab[msgindex].dumpit = dumpit;
 
+	if (calcit)
+		tab[msgindex].calcit = calcit;
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(__rtnl_register);
@@ -199,9 +221,10 @@ EXPORT_SYMBOL_GPL(__rtnl_register);
  * of memory implies no sense in continuing.
  */
 void rtnl_register(int protocol, int msgtype,
-		   rtnl_doit_func doit, rtnl_dumpit_func dumpit)
+		   rtnl_doit_func doit, rtnl_dumpit_func dumpit,
+		   rtnl_calcit_func calcit)
 {
-	if (__rtnl_register(protocol, msgtype, doit, dumpit) < 0)
+	if (__rtnl_register(protocol, msgtype, doit, dumpit, calcit) < 0)
 		panic("Unable to register rtnetlink message handler, "
 		      "protocol = %d, message type = %d\n",
 		      protocol, msgtype);
@@ -1818,6 +1841,11 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
 	return err;
 }
 
+static u16 rtnl_calcit(struct sk_buff *skb)
+{
+	return min_ifinfo_dump_size;
+}
+
 static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	int idx;
@@ -1847,11 +1875,14 @@ void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change)
 	struct net *net = dev_net(dev);
 	struct sk_buff *skb;
 	int err = -ENOBUFS;
+	size_t if_info_size;
 
-	skb = nlmsg_new(if_nlmsg_size(dev), GFP_KERNEL);
+	skb = nlmsg_new((if_info_size = if_nlmsg_size(dev)), GFP_KERNEL);
 	if (skb == NULL)
 		goto errout;
 
+	min_ifinfo_dump_size = max_t(u16, if_info_size, min_ifinfo_dump_size);
+
 	err = rtnl_fill_ifinfo(skb, dev, type, 0, 0, change, 0);
 	if (err < 0) {
 		/* -EMSGSIZE implies BUG in if_nlmsg_size() */
@@ -1902,14 +1933,20 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	if (kind == 2 && nlh->nlmsg_flags&NLM_F_DUMP) {
 		struct sock *rtnl;
 		rtnl_dumpit_func dumpit;
+		rtnl_calcit_func calcit;
+		u16 min_dump_alloc = 0;
 
 		dumpit = rtnl_get_dumpit(family, type);
 		if (dumpit == NULL)
 			return -EOPNOTSUPP;
+		calcit = rtnl_get_calcit(family, type);
+		if (calcit)
+			min_dump_alloc = calcit(skb);
 
 		__rtnl_unlock();
 		rtnl = net->rtnl;
-		err = netlink_dump_start(rtnl, skb, nlh, dumpit, NULL);
+		err = netlink_dump_start(rtnl, skb, nlh, dumpit,
+					 NULL, min_dump_alloc);
 		rtnl_lock();
 		return err;
 	}
@@ -2019,12 +2056,13 @@ void __init rtnetlink_init(void)
 	netlink_set_nonroot(NETLINK_ROUTE, NL_NONROOT_RECV);
 	register_netdevice_notifier(&rtnetlink_dev_notifier);
 
-	rtnl_register(PF_UNSPEC, RTM_GETLINK, rtnl_getlink, rtnl_dump_ifinfo);
-	rtnl_register(PF_UNSPEC, RTM_SETLINK, rtnl_setlink, NULL);
-	rtnl_register(PF_UNSPEC, RTM_NEWLINK, rtnl_newlink, NULL);
-	rtnl_register(PF_UNSPEC, RTM_DELLINK, rtnl_dellink, NULL);
+	rtnl_register(PF_UNSPEC, RTM_GETLINK, rtnl_getlink,
+		      rtnl_dump_ifinfo, rtnl_calcit);
+	rtnl_register(PF_UNSPEC, RTM_SETLINK, rtnl_setlink, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_NEWLINK, rtnl_newlink, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_DELLINK, rtnl_dellink, NULL, NULL);
 
-	rtnl_register(PF_UNSPEC, RTM_GETADDR, NULL, rtnl_dump_all);
-	rtnl_register(PF_UNSPEC, RTM_GETROUTE, NULL, rtnl_dump_all);
+	rtnl_register(PF_UNSPEC, RTM_GETADDR, NULL, rtnl_dump_all, NULL);
+	rtnl_register(PF_UNSPEC, RTM_GETROUTE, NULL, rtnl_dump_all, NULL);
 }
 
diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index 3609eac..ed1bb8c 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -1819,8 +1819,8 @@ static int __init dcbnl_init(void)
 {
 	INIT_LIST_HEAD(&dcb_app_list);
 
-	rtnl_register(PF_UNSPEC, RTM_GETDCB, dcb_doit, NULL);
-	rtnl_register(PF_UNSPEC, RTM_SETDCB, dcb_doit, NULL);
+	rtnl_register(PF_UNSPEC, RTM_GETDCB, dcb_doit, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_SETDCB, dcb_doit, NULL, NULL);
 
 	return 0;
 }
diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
index cf26ac7..3780fd6 100644
--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -1414,9 +1414,9 @@ void __init dn_dev_init(void)
 
 	dn_dev_devices_on();
 
-	rtnl_register(PF_DECnet, RTM_NEWADDR, dn_nl_newaddr, NULL);
-	rtnl_register(PF_DECnet, RTM_DELADDR, dn_nl_deladdr, NULL);
-	rtnl_register(PF_DECnet, RTM_GETADDR, NULL, dn_nl_dump_ifaddr);
+	rtnl_register(PF_DECnet, RTM_NEWADDR, dn_nl_newaddr, NULL, NULL);
+	rtnl_register(PF_DECnet, RTM_DELADDR, dn_nl_deladdr, NULL, NULL);
+	rtnl_register(PF_DECnet, RTM_GETADDR, NULL, dn_nl_dump_ifaddr, NULL);
 
 	proc_net_fops_create(&init_net, "decnet_dev", S_IRUGO, &dn_dev_seq_fops);
 
diff --git a/net/decnet/dn_fib.c b/net/decnet/dn_fib.c
index 1c74ed3..104324d 100644
--- a/net/decnet/dn_fib.c
+++ b/net/decnet/dn_fib.c
@@ -763,8 +763,8 @@ void __init dn_fib_init(void)
 
 	register_dnaddr_notifier(&dn_fib_dnaddr_notifier);
 
-	rtnl_register(PF_DECnet, RTM_NEWROUTE, dn_fib_rtm_newroute, NULL);
-	rtnl_register(PF_DECnet, RTM_DELROUTE, dn_fib_rtm_delroute, NULL);
+	rtnl_register(PF_DECnet, RTM_NEWROUTE, dn_fib_rtm_newroute, NULL, NULL);
+	rtnl_register(PF_DECnet, RTM_DELROUTE, dn_fib_rtm_delroute, NULL, NULL);
 }
 
 
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 74544bc..2949ca4 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1841,10 +1841,11 @@ void __init dn_route_init(void)
 	proc_net_fops_create(&init_net, "decnet_cache", S_IRUGO, &dn_rt_cache_seq_fops);
 
 #ifdef CONFIG_DECNET_ROUTER
-	rtnl_register(PF_DECnet, RTM_GETROUTE, dn_cache_getroute, dn_fib_dump);
+	rtnl_register(PF_DECnet, RTM_GETROUTE, dn_cache_getroute,
+		      dn_fib_dump, NULL);
 #else
 	rtnl_register(PF_DECnet, RTM_GETROUTE, dn_cache_getroute,
-		      dn_cache_dump);
+		      dn_cache_dump, NULL);
 #endif
 }
 
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 0d4a184..37b3c18 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1833,8 +1833,8 @@ void __init devinet_init(void)
 
 	rtnl_af_register(&inet_af_ops);
 
-	rtnl_register(PF_INET, RTM_NEWADDR, inet_rtm_newaddr, NULL);
-	rtnl_register(PF_INET, RTM_DELADDR, inet_rtm_deladdr, NULL);
-	rtnl_register(PF_INET, RTM_GETADDR, NULL, inet_dump_ifaddr);
+	rtnl_register(PF_INET, RTM_NEWADDR, inet_rtm_newaddr, NULL, NULL);
+	rtnl_register(PF_INET, RTM_DELADDR, inet_rtm_deladdr, NULL, NULL);
+	rtnl_register(PF_INET, RTM_GETADDR, NULL, inet_dump_ifaddr, NULL);
 }
 
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 2252471..92fc5f6 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1124,9 +1124,9 @@ static struct pernet_operations fib_net_ops = {
 
 void __init ip_fib_init(void)
 {
-	rtnl_register(PF_INET, RTM_NEWROUTE, inet_rtm_newroute, NULL);
-	rtnl_register(PF_INET, RTM_DELROUTE, inet_rtm_delroute, NULL);
-	rtnl_register(PF_INET, RTM_GETROUTE, NULL, inet_dump_fib);
+	rtnl_register(PF_INET, RTM_NEWROUTE, inet_rtm_newroute, NULL, NULL);
+	rtnl_register(PF_INET, RTM_DELROUTE, inet_rtm_delroute, NULL, NULL);
+	rtnl_register(PF_INET, RTM_GETROUTE, NULL, inet_dump_fib, NULL);
 
 	register_pernet_subsys(&fib_net_ops);
 	register_netdevice_notifier(&fib_netdev_notifier);
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 6ffe94c..5ff4765 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -871,7 +871,7 @@ static int inet_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 		}
 
 		return netlink_dump_start(idiagnl, skb, nlh,
-					  inet_diag_dump, NULL);
+					  inet_diag_dump, NULL, 0);
 	}
 
 	return inet_diag_get_exact(skb, nlh);
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 30a7763..aae2bd8 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2544,7 +2544,8 @@ int __init ip_mr_init(void)
 		goto add_proto_fail;
 	}
 #endif
-	rtnl_register(RTNL_FAMILY_IPMR, RTM_GETROUTE, NULL, ipmr_rtm_dumproute);
+	rtnl_register(RTNL_FAMILY_IPMR, RTM_GETROUTE,
+		      NULL, ipmr_rtm_dumproute, NULL);
 	return 0;
 
 #ifdef CONFIG_IP_PIMSM_V2
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 52b0b95..aa29c62 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -3295,7 +3295,7 @@ int __init ip_rt_init(void)
 	xfrm_init();
 	xfrm4_init(ip_rt_max_size);
 #endif
-	rtnl_register(PF_INET, RTM_GETROUTE, inet_rtm_getroute, NULL);
+	rtnl_register(PF_INET, RTM_GETROUTE, inet_rtm_getroute, NULL, NULL);
 
 #ifdef CONFIG_SYSCTL
 	register_pernet_subsys(&sysctl_route_ops);
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 498b927..954772b 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4692,16 +4692,20 @@ int __init addrconf_init(void)
 	if (err < 0)
 		goto errout_af;
 
-	err = __rtnl_register(PF_INET6, RTM_GETLINK, NULL, inet6_dump_ifinfo);
+	err = __rtnl_register(PF_INET6, RTM_GETLINK, NULL, inet6_dump_ifinfo,
+			      NULL);
 	if (err < 0)
 		goto errout;
 
 	/* Only the first call to __rtnl_register can fail */
-	__rtnl_register(PF_INET6, RTM_NEWADDR, inet6_rtm_newaddr, NULL);
-	__rtnl_register(PF_INET6, RTM_DELADDR, inet6_rtm_deladdr, NULL);
-	__rtnl_register(PF_INET6, RTM_GETADDR, inet6_rtm_getaddr, inet6_dump_ifaddr);
-	__rtnl_register(PF_INET6, RTM_GETMULTICAST, NULL, inet6_dump_ifmcaddr);
-	__rtnl_register(PF_INET6, RTM_GETANYCAST, NULL, inet6_dump_ifacaddr);
+	__rtnl_register(PF_INET6, RTM_NEWADDR, inet6_rtm_newaddr, NULL, NULL);
+	__rtnl_register(PF_INET6, RTM_DELADDR, inet6_rtm_deladdr, NULL, NULL);
+	__rtnl_register(PF_INET6, RTM_GETADDR, inet6_rtm_getaddr,
+			inet6_dump_ifaddr, NULL);
+	__rtnl_register(PF_INET6, RTM_GETMULTICAST, NULL,
+			inet6_dump_ifmcaddr, NULL);
+	__rtnl_register(PF_INET6, RTM_GETANYCAST, NULL,
+			inet6_dump_ifacaddr, NULL);
 
 	ipv6_addr_label_rtnl_register();
 
diff --git a/net/ipv6/addrlabel.c b/net/ipv6/addrlabel.c
index c8993e5..2d8ddba 100644
--- a/net/ipv6/addrlabel.c
+++ b/net/ipv6/addrlabel.c
@@ -592,8 +592,11 @@ out:
 
 void __init ipv6_addr_label_rtnl_register(void)
 {
-	__rtnl_register(PF_INET6, RTM_NEWADDRLABEL, ip6addrlbl_newdel, NULL);
-	__rtnl_register(PF_INET6, RTM_DELADDRLABEL, ip6addrlbl_newdel, NULL);
-	__rtnl_register(PF_INET6, RTM_GETADDRLABEL, ip6addrlbl_get, ip6addrlbl_dump);
+	__rtnl_register(PF_INET6, RTM_NEWADDRLABEL, ip6addrlbl_newdel,
+			NULL, NULL);
+	__rtnl_register(PF_INET6, RTM_DELADDRLABEL, ip6addrlbl_newdel,
+			NULL, NULL);
+	__rtnl_register(PF_INET6, RTM_GETADDRLABEL, ip6addrlbl_get,
+			ip6addrlbl_dump, NULL);
 }
 
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 4076a0b..3030bdf 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1586,7 +1586,8 @@ int __init fib6_init(void)
 	if (ret)
 		goto out_kmem_cache_create;
 
-	ret = __rtnl_register(PF_INET6, RTM_GETROUTE, NULL, inet6_dump_fib);
+	ret = __rtnl_register(PF_INET6, RTM_GETROUTE, NULL, inet6_dump_fib,
+			      NULL);
 	if (ret)
 		goto out_unregister_subsys;
 out:
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 82a8099..705c828 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1354,7 +1354,8 @@ int __init ip6_mr_init(void)
 		goto add_proto_fail;
 	}
 #endif
-	rtnl_register(RTNL_FAMILY_IP6MR, RTM_GETROUTE, NULL, ip6mr_rtm_dumproute);
+	rtnl_register(RTNL_FAMILY_IP6MR, RTM_GETROUTE, NULL,
+		      ip6mr_rtm_dumproute, NULL);
 	return 0;
 #ifdef CONFIG_IPV6_PIMSM_V2
 add_proto_fail:
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index de2b1de..216ff31 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2925,9 +2925,9 @@ int __init ip6_route_init(void)
 		goto xfrm6_init;
 
 	ret = -ENOBUFS;
-	if (__rtnl_register(PF_INET6, RTM_NEWROUTE, inet6_rtm_newroute, NULL) ||
-	    __rtnl_register(PF_INET6, RTM_DELROUTE, inet6_rtm_delroute, NULL) ||
-	    __rtnl_register(PF_INET6, RTM_GETROUTE, inet6_rtm_getroute, NULL))
+	if (__rtnl_register(PF_INET6, RTM_NEWROUTE, inet6_rtm_newroute, NULL, NULL) ||
+	    __rtnl_register(PF_INET6, RTM_DELROUTE, inet6_rtm_delroute, NULL, NULL) ||
+	    __rtnl_register(PF_INET6, RTM_GETROUTE, inet6_rtm_getroute, NULL, NULL))
 		goto fib6_rules_init;
 
 	ret = register_netdevice_notifier(&ip6_route_dev_notifier);
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 8041bef..333b0be 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1120,7 +1120,7 @@ ip_set_dump(struct sock *ctnl, struct sk_buff *skb,
 
 	return netlink_dump_start(ctnl, skb, nlh,
 				  ip_set_dump_start,
-				  ip_set_dump_done);
+				  ip_set_dump_done, 0);
 }
 
 /* Add, del and test */
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 482e90c..7dec88a 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -970,7 +970,7 @@ ctnetlink_get_conntrack(struct sock *ctnl, struct sk_buff *skb,
 
 	if (nlh->nlmsg_flags & NLM_F_DUMP)
 		return netlink_dump_start(ctnl, skb, nlh, ctnetlink_dump_table,
-					  ctnetlink_done);
+					  ctnetlink_done, 0);
 
 	err = ctnetlink_parse_zone(cda[CTA_ZONE], &zone);
 	if (err < 0)
@@ -1840,7 +1840,7 @@ ctnetlink_get_expect(struct sock *ctnl, struct sk_buff *skb,
 	if (nlh->nlmsg_flags & NLM_F_DUMP) {
 		return netlink_dump_start(ctnl, skb, nlh,
 					  ctnetlink_exp_dump_table,
-					  ctnetlink_exp_done);
+					  ctnetlink_exp_done, 0);
 	}
 
 	err = ctnetlink_parse_zone(cda[CTA_EXPECT_ZONE], &zone);
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 6ef64ad..0b92f7549 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1659,13 +1659,10 @@ static int netlink_dump(struct sock *sk)
 {
 	struct netlink_sock *nlk = nlk_sk(sk);
 	struct netlink_callback *cb;
-	struct sk_buff *skb;
+	struct sk_buff *skb = NULL;
 	struct nlmsghdr *nlh;
 	int len, err = -ENOBUFS;
-
-	skb = sock_rmalloc(sk, NLMSG_GOODSIZE, 0, GFP_KERNEL);
-	if (!skb)
-		goto errout;
+	int alloc_size;
 
 	mutex_lock(nlk->cb_mutex);
 
@@ -1675,6 +1672,12 @@ static int netlink_dump(struct sock *sk)
 		goto errout_skb;
 	}
 
+	alloc_size = max_t(int, cb->min_dump_alloc, NLMSG_GOODSIZE);
+
+	skb = sock_rmalloc(sk, alloc_size, 0, GFP_KERNEL);
+	if (!skb)
+		goto errout;
+
 	len = cb->dump(skb, cb);
 
 	if (len > 0) {
@@ -1721,7 +1724,8 @@ int netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 		       const struct nlmsghdr *nlh,
 		       int (*dump)(struct sk_buff *skb,
 				   struct netlink_callback *),
-		       int (*done)(struct netlink_callback *))
+		       int (*done)(struct netlink_callback *),
+		       u16 min_dump_alloc)
 {
 	struct netlink_callback *cb;
 	struct sock *sk;
@@ -1735,6 +1739,7 @@ int netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	cb->dump = dump;
 	cb->done = done;
 	cb->nlh = nlh;
+	cb->min_dump_alloc = min_dump_alloc;
 	atomic_inc(&skb->users);
 	cb->skb = skb;
 
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 1781d99..482fa57 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -525,7 +525,7 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 
 		genl_unlock();
 		err = netlink_dump_start(net->genl_sock, skb, nlh,
-					 ops->dumpit, ops->done);
+					 ops->dumpit, ops->done, 0);
 		genl_lock();
 		return err;
 	}
diff --git a/net/phonet/pn_netlink.c b/net/phonet/pn_netlink.c
index 438accb..d61f676 100644
--- a/net/phonet/pn_netlink.c
+++ b/net/phonet/pn_netlink.c
@@ -289,15 +289,16 @@ out:
 
 int __init phonet_netlink_register(void)
 {
-	int err = __rtnl_register(PF_PHONET, RTM_NEWADDR, addr_doit, NULL);
+	int err = __rtnl_register(PF_PHONET, RTM_NEWADDR, addr_doit,
+				  NULL, NULL);
 	if (err)
 		return err;
 
 	/* Further __rtnl_register() cannot fail */
-	__rtnl_register(PF_PHONET, RTM_DELADDR, addr_doit, NULL);
-	__rtnl_register(PF_PHONET, RTM_GETADDR, NULL, getaddr_dumpit);
-	__rtnl_register(PF_PHONET, RTM_NEWROUTE, route_doit, NULL);
-	__rtnl_register(PF_PHONET, RTM_DELROUTE, route_doit, NULL);
-	__rtnl_register(PF_PHONET, RTM_GETROUTE, NULL, route_dumpit);
+	__rtnl_register(PF_PHONET, RTM_DELADDR, addr_doit, NULL, NULL);
+	__rtnl_register(PF_PHONET, RTM_GETADDR, NULL, getaddr_dumpit, NULL);
+	__rtnl_register(PF_PHONET, RTM_NEWROUTE, route_doit, NULL, NULL);
+	__rtnl_register(PF_PHONET, RTM_DELROUTE, route_doit, NULL, NULL);
+	__rtnl_register(PF_PHONET, RTM_GETROUTE, NULL, route_dumpit, NULL);
 	return 0;
 }
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index a606025..2f64262 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -1115,9 +1115,10 @@ nlmsg_failure:
 
 static int __init tc_action_init(void)
 {
-	rtnl_register(PF_UNSPEC, RTM_NEWACTION, tc_ctl_action, NULL);
-	rtnl_register(PF_UNSPEC, RTM_DELACTION, tc_ctl_action, NULL);
-	rtnl_register(PF_UNSPEC, RTM_GETACTION, tc_ctl_action, tc_dump_action);
+	rtnl_register(PF_UNSPEC, RTM_NEWACTION, tc_ctl_action, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_DELACTION, tc_ctl_action, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_GETACTION, tc_ctl_action, tc_dump_action,
+		      NULL);
 
 	return 0;
 }
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index bb2c523..9563887 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -610,10 +610,10 @@ EXPORT_SYMBOL(tcf_exts_dump_stats);
 
 static int __init tc_filter_init(void)
 {
-	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_ctl_tfilter, NULL);
-	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_ctl_tfilter, NULL);
+	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_ctl_tfilter, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_ctl_tfilter, NULL, NULL);
 	rtnl_register(PF_UNSPEC, RTM_GETTFILTER, tc_ctl_tfilter,
-						 tc_dump_tfilter);
+		      tc_dump_tfilter, NULL);
 
 	return 0;
 }
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 6b86276..8182aef 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1792,12 +1792,12 @@ static int __init pktsched_init(void)
 	register_qdisc(&pfifo_head_drop_qdisc_ops);
 	register_qdisc(&mq_qdisc_ops);
 
-	rtnl_register(PF_UNSPEC, RTM_NEWQDISC, tc_modify_qdisc, NULL);
-	rtnl_register(PF_UNSPEC, RTM_DELQDISC, tc_get_qdisc, NULL);
-	rtnl_register(PF_UNSPEC, RTM_GETQDISC, tc_get_qdisc, tc_dump_qdisc);
-	rtnl_register(PF_UNSPEC, RTM_NEWTCLASS, tc_ctl_tclass, NULL);
-	rtnl_register(PF_UNSPEC, RTM_DELTCLASS, tc_ctl_tclass, NULL);
-	rtnl_register(PF_UNSPEC, RTM_GETTCLASS, tc_ctl_tclass, tc_dump_tclass);
+	rtnl_register(PF_UNSPEC, RTM_NEWQDISC, tc_modify_qdisc, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_DELQDISC, tc_get_qdisc, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_GETQDISC, tc_get_qdisc, tc_dump_qdisc, NULL);
+	rtnl_register(PF_UNSPEC, RTM_NEWTCLASS, tc_ctl_tclass, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_DELTCLASS, tc_ctl_tclass, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_GETTCLASS, tc_ctl_tclass, tc_dump_tclass, NULL);
 
 	return 0;
 }
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index c658cb3..0256b8a 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2299,7 +2299,8 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 		if (link->dump == NULL)
 			return -EINVAL;
 
-		return netlink_dump_start(net->xfrm.nlsk, skb, nlh, link->dump, link->done);
+		return netlink_dump_start(net->xfrm.nlsk, skb, nlh,
+					  link->dump, link->done, 0);
 	}
 
 	err = nlmsg_parse(nlh, xfrm_msg_min[type], attrs, XFRMA_MAX,
-- 
1.7.5.2


^ permalink raw reply related

* [net-next 12/12] ixgbevf: Update the driver string
From: Jeff Kirsher @ 2011-06-08 22:56 UTC (permalink / raw)
  To: davem; +Cc: Greg Rose, netdev, gospo, Jeff Kirsher
In-Reply-To: <1307573800-24868-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Greg Rose <gregory.v.rose@intel.com>

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Evan Swanson <evan.swanson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ixgbevf/ixgbevf_main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ixgbevf/ixgbevf_main.c b/drivers/net/ixgbevf/ixgbevf_main.c
index 28d3cb2..b2c5ecd 100644
--- a/drivers/net/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ixgbevf/ixgbevf_main.c
@@ -52,7 +52,7 @@ char ixgbevf_driver_name[] = "ixgbevf";
 static const char ixgbevf_driver_string[] =
 	"Intel(R) 10 Gigabit PCI Express Virtual Function Network Driver";
 
-#define DRV_VERSION "2.0.0-k2"
+#define DRV_VERSION "2.1.0-k"
 const char ixgbevf_driver_version[] = DRV_VERSION;
 static char ixgbevf_copyright[] =
 	"Copyright (c) 2009 - 2010 Intel Corporation.";
-- 
1.7.5.2


^ permalink raw reply related

* Re: [PATCH] v2 ethtool: remove support for ETHTOOL_GRXNTUPLE
From: Ben Hutchings @ 2011-06-08 23:03 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: davem, jeffrey.t.kirsher, netdev
In-Reply-To: <20110608223508.20551.45558.stgit@gitlad.jf.intel.com>

On Wed, 2011-06-08 at 15:35 -0700, Alexander Duyck wrote:
> This change is meant to remove all support for displaying an ntuple as
> strings via ETHTOOL_GRXNTUPLE.  The reason for this change is due to the
> fact that multiple issues have been found including:
>  - Multiple buffer overruns for strings being displayed.
>  - Incorrect filters displayed, cleared filters with ring of -2 are displayed
>  - Setting get_rx_ntuple displays no rules if defined.
>  - Endianess wrong on displayed values.
>  - Hard limit of 1024 filters makes display functionality extremely limited
> 
> The only driver that had supported this interface was ixgbe.  Since it no
> longer uses the interface and due to the issues mentioned above I am
> submitting this patch to remove it.
> 
> v2:
> Updated based on comments from Ben Hutchings
>  - Left ETH_SS_NTUPLE_FILTERS in code but commented on it being deprecated
>  - Removed ethtool_rx_ntuple_list and ethtool_rx_ntuple_flow_spec_container
>  - Left ETHTOOL_GRXNTUPLE but commented it as deprecated
> 
> Also cleaned up set_rx_ntuple since there is no flow spec container to
> maintain we can drop all the code for the alloc and free of it and just
> return ops->set_rx_ntuple().
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH] vlan: Fix the ingress VLAN_FLAG_REORDER_HDR check v2
From: Changli Gao @ 2011-06-08 23:08 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Eric W. Biederman, David Miller, shemminger, greearb,
	nicolas.2p.debian, netdev, kaber, fubar, eric.dumazet, andy,
	jesse
In-Reply-To: <20110608162841.GB4409@minipsycho.redhat.com>

On Thu, Jun 9, 2011 at 12:28 AM, Jiri Pirko <jpirko@redhat.com> wrote:
>
> Why we can't remove this right away?
>

I think the reason is this patch is a bug fix for the stable kernel,
so we should not introduce new features or remove old features.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* [PATCH net-next-2.6] inetpeer: remove unused list
From: Eric Dumazet @ 2011-06-08 23:35 UTC (permalink / raw)
  To: David Miller; +Cc: tim.c.chen, Andi Kleen, netdev
In-Reply-To: <1307514470.3102.25.camel@edumazet-laptop>

Andi Kleen and Tim Chen reported huge contention on inetpeer
unused_peers.lock, on memcached workload on a 40 core machine, with
disabled route cache.

It appears we constantly flip peers refcnt between 0 and 1 values, and
we must insert/remove peers from unused_peers.list, holding a contended
spinlock.

Remove this list completely and perform a garbage collection on-the-fly,
at lookup time, using the expired nodes we met during the tree
traversal.

This removes a lot of code, makes locking more standard, and obsoletes
two sysctls (inet_peer_gc_mintime and inet_peer_gc_maxtime). This also
removes two pointers in inet_peer structure.

There is still a false sharing effect because refcnt is in first cache
line of object [were the links and keys used by lookups are located], we
might move it at the end of inet_peer structure to let this first cache
line mostly read by cpus.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Tim Chen <tim.c.chen@linux.intel.com>
---
 Documentation/networking/ip-sysctl.txt |   10 
 include/net/inetpeer.h                 |    2 
 include/net/ip.h                       |    2 
 net/ipv4/inetpeer.c                    |  280 +++++------------------
 net/ipv4/sysctl_net_ipv4.c             |   14 -
 5 files changed, 74 insertions(+), 234 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index d3d653a..3dcb26c 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -106,16 +106,6 @@ inet_peer_maxttl - INTEGER
 	when the number of entries in the pool is very small).
 	Measured in seconds.
 
-inet_peer_gc_mintime - INTEGER
-	Minimum interval between garbage collection passes.  This interval is
-	in effect under high memory pressure on the pool.
-	Measured in seconds.
-
-inet_peer_gc_maxtime - INTEGER
-	Minimum interval between garbage collection passes.  This interval is
-	in effect under low (or absent) memory pressure on the pool.
-	Measured in seconds.
-
 TCP variables:
 
 somaxconn - INTEGER
diff --git a/include/net/inetpeer.h b/include/net/inetpeer.h
index 8a159cc..1f0966f 100644
--- a/include/net/inetpeer.h
+++ b/include/net/inetpeer.h
@@ -32,7 +32,6 @@ struct inet_peer {
 	struct inet_peer __rcu	*avl_left, *avl_right;
 	struct inetpeer_addr	daddr;
 	__u32			avl_height;
-	struct list_head	unused;
 	__u32			dtime;		/* the time of last use of not
 						 * referenced entries */
 	atomic_t		refcnt;
@@ -56,6 +55,7 @@ struct inet_peer {
 			struct inetpeer_addr_base	redirect_learned;
 		};
 		struct rcu_head         rcu;
+		struct inet_peer	*gc_next;
 	};
 };
 
diff --git a/include/net/ip.h b/include/net/ip.h
index 66dd491..e9ea7c7 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -228,8 +228,6 @@ extern struct ctl_path net_ipv4_ctl_path[];
 extern int inet_peer_threshold;
 extern int inet_peer_minttl;
 extern int inet_peer_maxttl;
-extern int inet_peer_gc_mintime;
-extern int inet_peer_gc_maxtime;
 
 /* From ip_output.c */
 extern int sysctl_ip_dynaddr;
diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index ce616d9..dafbf2c 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -54,15 +54,11 @@
  *  1.  Nodes may appear in the tree only with the pool lock held.
  *  2.  Nodes may disappear from the tree only with the pool lock held
  *      AND reference count being 0.
- *  3.  Nodes appears and disappears from unused node list only under
- *      "inet_peer_unused_lock".
- *  4.  Global variable peer_total is modified under the pool lock.
- *  5.  struct inet_peer fields modification:
+ *  3.  Global variable peer_total is modified under the pool lock.
+ *  4.  struct inet_peer fields modification:
  *		avl_left, avl_right, avl_parent, avl_height: pool lock
- *		unused: unused node list lock
  *		refcnt: atomically against modifications on other CPU;
  *		   usually under some other lock to prevent node disappearing
- *		dtime: unused node list lock
  *		daddr: unchangeable
  *		ip_id_count: atomic value (no lock needed)
  */
@@ -104,19 +100,6 @@ int inet_peer_threshold __read_mostly = 65536 + 128;	/* start to throw entries m
 					 * aggressively at this stage */
 int inet_peer_minttl __read_mostly = 120 * HZ;	/* TTL under high load: 120 sec */
 int inet_peer_maxttl __read_mostly = 10 * 60 * HZ;	/* usual time to live: 10 min */
-int inet_peer_gc_mintime __read_mostly = 10 * HZ;
-int inet_peer_gc_maxtime __read_mostly = 120 * HZ;
-
-static struct {
-	struct list_head	list;
-	spinlock_t		lock;
-} unused_peers = {
-	.list			= LIST_HEAD_INIT(unused_peers.list),
-	.lock			= __SPIN_LOCK_UNLOCKED(unused_peers.lock),
-};
-
-static void peer_check_expire(unsigned long dummy);
-static DEFINE_TIMER(peer_periodic_timer, peer_check_expire, 0, 0);
 
 
 /* Called from ip_output.c:ip_init  */
@@ -142,21 +125,6 @@ void __init inet_initpeers(void)
 			0, SLAB_HWCACHE_ALIGN | SLAB_PANIC,
 			NULL);
 
-	/* All the timers, started at system startup tend
-	   to synchronize. Perturb it a bit.
-	 */
-	peer_periodic_timer.expires = jiffies
-		+ net_random() % inet_peer_gc_maxtime
-		+ inet_peer_gc_maxtime;
-	add_timer(&peer_periodic_timer);
-}
-
-/* Called with or without local BH being disabled. */
-static void unlink_from_unused(struct inet_peer *p)
-{
-	spin_lock_bh(&unused_peers.lock);
-	list_del_init(&p->unused);
-	spin_unlock_bh(&unused_peers.lock);
 }
 
 static int addr_compare(const struct inetpeer_addr *a,
@@ -203,20 +171,6 @@ static int addr_compare(const struct inetpeer_addr *a,
 	u;							\
 })
 
-static bool atomic_add_unless_return(atomic_t *ptr, int a, int u, int *newv)
-{
-	int cur, old = atomic_read(ptr);
-
-	while (old != u) {
-		*newv = old + a;
-		cur = atomic_cmpxchg(ptr, old, *newv);
-		if (cur == old)
-			return true;
-		old = cur;
-	}
-	return false;
-}
-
 /*
  * Called with rcu_read_lock()
  * Because we hold no lock against a writer, its quite possible we fall
@@ -225,8 +179,7 @@ static bool atomic_add_unless_return(atomic_t *ptr, int a, int u, int *newv)
  * We exit from this function if number of links exceeds PEER_MAXDEPTH
  */
 static struct inet_peer *lookup_rcu(const struct inetpeer_addr *daddr,
-				    struct inet_peer_base *base,
-				    int *newrefcnt)
+				    struct inet_peer_base *base)
 {
 	struct inet_peer *u = rcu_dereference(base->root);
 	int count = 0;
@@ -235,11 +188,9 @@ static struct inet_peer *lookup_rcu(const struct inetpeer_addr *daddr,
 		int cmp = addr_compare(daddr, &u->daddr);
 		if (cmp == 0) {
 			/* Before taking a reference, check if this entry was
-			 * deleted, unlink_from_pool() sets refcnt=-1 to make
-			 * distinction between an unused entry (refcnt=0) and
-			 * a freed one.
+			 * deleted (refcnt=-1)
 			 */
-			if (!atomic_add_unless_return(&u->refcnt, 1, -1, newrefcnt))
+			if (!atomic_add_unless(&u->refcnt, 1, -1))
 				u = NULL;
 			return u;
 		}
@@ -366,137 +317,96 @@ static void inetpeer_free_rcu(struct rcu_head *head)
 	kmem_cache_free(peer_cachep, container_of(head, struct inet_peer, rcu));
 }
 
-/* May be called with local BH enabled. */
 static void unlink_from_pool(struct inet_peer *p, struct inet_peer_base *base,
 			     struct inet_peer __rcu **stack[PEER_MAXDEPTH])
 {
-	int do_free;
-
-	do_free = 0;
-
-	write_seqlock_bh(&base->lock);
-	/* Check the reference counter.  It was artificially incremented by 1
-	 * in cleanup() function to prevent sudden disappearing.  If we can
-	 * atomically (because of lockless readers) take this last reference,
-	 * it's safe to remove the node and free it later.
-	 * We use refcnt=-1 to alert lockless readers this entry is deleted.
-	 */
-	if (atomic_cmpxchg(&p->refcnt, 1, -1) == 1) {
-		struct inet_peer __rcu ***stackptr, ***delp;
-		if (lookup(&p->daddr, stack, base) != p)
-			BUG();
-		delp = stackptr - 1; /* *delp[0] == p */
-		if (p->avl_left == peer_avl_empty_rcu) {
-			*delp[0] = p->avl_right;
-			--stackptr;
-		} else {
-			/* look for a node to insert instead of p */
-			struct inet_peer *t;
-			t = lookup_rightempty(p, base);
-			BUG_ON(rcu_deref_locked(*stackptr[-1], base) != t);
-			**--stackptr = t->avl_left;
-			/* t is removed, t->daddr > x->daddr for any
-			 * x in p->avl_left subtree.
-			 * Put t in the old place of p. */
-			RCU_INIT_POINTER(*delp[0], t);
-			t->avl_left = p->avl_left;
-			t->avl_right = p->avl_right;
-			t->avl_height = p->avl_height;
-			BUG_ON(delp[1] != &p->avl_left);
-			delp[1] = &t->avl_left; /* was &p->avl_left */
-		}
-		peer_avl_rebalance(stack, stackptr, base);
-		base->total--;
-		do_free = 1;
+	struct inet_peer __rcu ***stackptr, ***delp;
+
+	if (lookup(&p->daddr, stack, base) != p)
+		BUG();
+	delp = stackptr - 1; /* *delp[0] == p */
+	if (p->avl_left == peer_avl_empty_rcu) {
+		*delp[0] = p->avl_right;
+		--stackptr;
+	} else {
+		/* look for a node to insert instead of p */
+		struct inet_peer *t;
+		t = lookup_rightempty(p, base);
+		BUG_ON(rcu_deref_locked(*stackptr[-1], base) != t);
+		**--stackptr = t->avl_left;
+		/* t is removed, t->daddr > x->daddr for any
+		 * x in p->avl_left subtree.
+		 * Put t in the old place of p. */
+		RCU_INIT_POINTER(*delp[0], t);
+		t->avl_left = p->avl_left;
+		t->avl_right = p->avl_right;
+		t->avl_height = p->avl_height;
+		BUG_ON(delp[1] != &p->avl_left);
+		delp[1] = &t->avl_left; /* was &p->avl_left */
 	}
-	write_sequnlock_bh(&base->lock);
-
-	if (do_free)
-		call_rcu(&p->rcu, inetpeer_free_rcu);
-	else
-		/* The node is used again.  Decrease the reference counter
-		 * back.  The loop "cleanup -> unlink_from_unused
-		 *   -> unlink_from_pool -> putpeer -> link_to_unused
-		 *   -> cleanup (for the same node)"
-		 * doesn't really exist because the entry will have a
-		 * recent deletion time and will not be cleaned again soon.
-		 */
-		inet_putpeer(p);
+	peer_avl_rebalance(stack, stackptr, base);
+	base->total--;
+	call_rcu(&p->rcu, inetpeer_free_rcu);
 }
 
 static struct inet_peer_base *family_to_base(int family)
 {
-	return (family == AF_INET ? &v4_peers : &v6_peers);
+	return family == AF_INET ? &v4_peers : &v6_peers;
 }
 
-static struct inet_peer_base *peer_to_base(struct inet_peer *p)
+/* perform garbage collect on all items stacked during a lookup */
+static int inet_peer_gc(struct inet_peer_base *base,
+			struct inet_peer __rcu **stack[PEER_MAXDEPTH],
+			struct inet_peer __rcu ***stackptr)
 {
-	return family_to_base(p->daddr.family);
-}
-
-/* May be called with local BH enabled. */
-static int cleanup_once(unsigned long ttl, struct inet_peer __rcu **stack[PEER_MAXDEPTH])
-{
-	struct inet_peer *p = NULL;
-
-	/* Remove the first entry from the list of unused nodes. */
-	spin_lock_bh(&unused_peers.lock);
-	if (!list_empty(&unused_peers.list)) {
-		__u32 delta;
+	struct inet_peer *p, *gchead = NULL;
+	__u32 delta, ttl;
+	int cnt = 0;
 
-		p = list_first_entry(&unused_peers.list, struct inet_peer, unused);
+	if (base->total >= inet_peer_threshold)
+		ttl = 0; /* be aggressive */
+	else
+		ttl = inet_peer_maxttl
+				- (inet_peer_maxttl - inet_peer_minttl) / HZ *
+					base->total / inet_peer_threshold * HZ;
+	stackptr--; /* last stack slot is peer_avl_empty */
+	while (stackptr > stack) {
+		stackptr--;
+		p = rcu_deref_locked(**stackptr, base);
 		delta = (__u32)jiffies - p->dtime;
-
-		if (delta < ttl) {
-			/* Do not prune fresh entries. */
-			spin_unlock_bh(&unused_peers.lock);
-			return -1;
+		if (atomic_read(&p->refcnt) == 0 && delta >= ttl &&
+		    atomic_cmpxchg(&p->refcnt, 0, -1) == 0) {
+			p->gc_next = gchead;
+			gchead = p;
 		}
-
-		list_del_init(&p->unused);
-
-		/* Grab an extra reference to prevent node disappearing
-		 * before unlink_from_pool() call. */
-		atomic_inc(&p->refcnt);
 	}
-	spin_unlock_bh(&unused_peers.lock);
-
-	if (p == NULL)
-		/* It means that the total number of USED entries has
-		 * grown over inet_peer_threshold.  It shouldn't really
-		 * happen because of entry limits in route cache. */
-		return -1;
-
-	unlink_from_pool(p, peer_to_base(p), stack);
-	return 0;
+	while ((p = gchead) != NULL) {
+		gchead = p->gc_next;
+		cnt++;
+		unlink_from_pool(p, base, stack);
+	}
+	return cnt;
 }
 
-/* Called with or without local BH being disabled. */
 struct inet_peer *inet_getpeer(struct inetpeer_addr *daddr, int create)
 {
 	struct inet_peer __rcu **stack[PEER_MAXDEPTH], ***stackptr;
 	struct inet_peer_base *base = family_to_base(daddr->family);
 	struct inet_peer *p;
 	unsigned int sequence;
-	int invalidated, newrefcnt = 0;
+	int invalidated, gccnt = 0;
 
-	/* Look up for the address quickly, lockless.
+	/* Attempt a lockless lookup first.
 	 * Because of a concurrent writer, we might not find an existing entry.
 	 */
 	rcu_read_lock();
 	sequence = read_seqbegin(&base->lock);
-	p = lookup_rcu(daddr, base, &newrefcnt);
+	p = lookup_rcu(daddr, base);
 	invalidated = read_seqretry(&base->lock, sequence);
 	rcu_read_unlock();
 
-	if (p) {
-found:		/* The existing node has been found.
-		 * Remove the entry from unused list if it was there.
-		 */
-		if (newrefcnt == 1)
-			unlink_from_unused(p);
+	if (p)
 		return p;
-	}
 
 	/* If no writer did a change during our lookup, we can return early. */
 	if (!create && !invalidated)
@@ -506,11 +416,17 @@ found:		/* The existing node has been found.
 	 * At least, nodes should be hot in our cache.
 	 */
 	write_seqlock_bh(&base->lock);
+relookup:
 	p = lookup(daddr, stack, base);
 	if (p != peer_avl_empty) {
-		newrefcnt = atomic_inc_return(&p->refcnt);
+		atomic_inc(&p->refcnt);
 		write_sequnlock_bh(&base->lock);
-		goto found;
+		return p;
+	}
+	if (!gccnt) {
+		gccnt = inet_peer_gc(base, stack, stackptr);
+		if (gccnt && create)
+			goto relookup;
 	}
 	p = create ? kmem_cache_alloc(peer_cachep, GFP_ATOMIC) : NULL;
 	if (p) {
@@ -525,7 +441,6 @@ found:		/* The existing node has been found.
 		p->pmtu_expires = 0;
 		p->pmtu_orig = 0;
 		memset(&p->redirect_learned, 0, sizeof(p->redirect_learned));
-		INIT_LIST_HEAD(&p->unused);
 
 
 		/* Link the node. */
@@ -534,63 +449,14 @@ found:		/* The existing node has been found.
 	}
 	write_sequnlock_bh(&base->lock);
 
-	if (base->total >= inet_peer_threshold)
-		/* Remove one less-recently-used entry. */
-		cleanup_once(0, stack);
-
 	return p;
 }
-
-static int compute_total(void)
-{
-	return v4_peers.total + v6_peers.total;
-}
 EXPORT_SYMBOL_GPL(inet_getpeer);
 
-/* Called with local BH disabled. */
-static void peer_check_expire(unsigned long dummy)
-{
-	unsigned long now = jiffies;
-	int ttl, total;
-	struct inet_peer __rcu **stack[PEER_MAXDEPTH];
-
-	total = compute_total();
-	if (total >= inet_peer_threshold)
-		ttl = inet_peer_minttl;
-	else
-		ttl = inet_peer_maxttl
-				- (inet_peer_maxttl - inet_peer_minttl) / HZ *
-					total / inet_peer_threshold * HZ;
-	while (!cleanup_once(ttl, stack)) {
-		if (jiffies != now)
-			break;
-	}
-
-	/* Trigger the timer after inet_peer_gc_mintime .. inet_peer_gc_maxtime
-	 * interval depending on the total number of entries (more entries,
-	 * less interval). */
-	total = compute_total();
-	if (total >= inet_peer_threshold)
-		peer_periodic_timer.expires = jiffies + inet_peer_gc_mintime;
-	else
-		peer_periodic_timer.expires = jiffies
-			+ inet_peer_gc_maxtime
-			- (inet_peer_gc_maxtime - inet_peer_gc_mintime) / HZ *
-				total / inet_peer_threshold * HZ;
-	add_timer(&peer_periodic_timer);
-}
-
 void inet_putpeer(struct inet_peer *p)
 {
-	local_bh_disable();
-
-	if (atomic_dec_and_lock(&p->refcnt, &unused_peers.lock)) {
-		list_add_tail(&p->unused, &unused_peers.list);
-		p->dtime = (__u32)jiffies;
-		spin_unlock(&unused_peers.lock);
-	}
-
-	local_bh_enable();
+	p->dtime = (__u32)jiffies;
+	atomic_dec(&p->refcnt);
 }
 EXPORT_SYMBOL_GPL(inet_putpeer);
 
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 57d0752..69fd720 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -398,20 +398,6 @@ static struct ctl_table ipv4_table[] = {
 		.proc_handler	= proc_dointvec_jiffies,
 	},
 	{
-		.procname	= "inet_peer_gc_mintime",
-		.data		= &inet_peer_gc_mintime,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec_jiffies,
-	},
-	{
-		.procname	= "inet_peer_gc_maxtime",
-		.data		= &inet_peer_gc_maxtime,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec_jiffies,
-	},
-	{
 		.procname	= "tcp_orphan_retries",
 		.data		= &sysctl_tcp_orphan_retries,
 		.maxlen		= sizeof(int),



^ permalink raw reply related

* Re: [PATCH] RFC2988bis + taking RTT sample from 3WHS for the passive open side
From: Eric Dumazet @ 2011-06-08 23:39 UTC (permalink / raw)
  To: Jerry Chu
  Cc: David Miller, Hagen Paul Pfeifer, tsunanet,
	netdev@vger.kernel.org
In-Reply-To: <BANLkTimo_fbezdX2evKt0pAg0N+DSJQeSR6NoHLus=Z40WLkHg@mail.gmail.com>

Le mercredi 08 juin 2011 à 15:26 -0700, Jerry Chu a écrit :
> Eric,
> 
> It just occurred to me now that initRTO is being reduced, both TCP_SYN_RETRIES
> and TCP_SYNACK_RETRIES should be bumped up a bit, to e.g., 7 (?) to meet the
> 3 minutes R2 requirement per RFC1122.
> 
> If you agree, I will submit another patch.
> 

Good catch, but no RFC lowered yet this 3 minutes requirement ?




^ permalink raw reply

* Re: [PATCH] RFC2988bis + taking RTT sample from 3WHS for the passive open side
From: David Miller @ 2011-06-08 23:44 UTC (permalink / raw)
  To: eric.dumazet; +Cc: hkchu, hagen, tsunanet, netdev
In-Reply-To: <1307576361.3980.26.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 09 Jun 2011 01:39:21 +0200

> Le mercredi 08 juin 2011 à 15:26 -0700, Jerry Chu a écrit :
>> It just occurred to me now that initRTO is being reduced, both TCP_SYN_RETRIES
>> and TCP_SYNACK_RETRIES should be bumped up a bit, to e.g., 7 (?) to meet the
>> 3 minutes R2 requirement per RFC1122.
>> 
>> If you agree, I will submit another patch.
> 
> Good catch, but no RFC lowered yet this 3 minutes requirement ?

I host my email on other planets, please do not break this.

^ permalink raw reply

* Re: [PATCH] v2 ethtool: remove support for ETHTOOL_GRXNTUPLE
From: David Miller @ 2011-06-08 23:45 UTC (permalink / raw)
  To: bhutchings; +Cc: alexander.h.duyck, jeffrey.t.kirsher, netdev
In-Reply-To: <1307574227.22348.501.camel@localhost>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Thu, 09 Jun 2011 00:03:47 +0100

> On Wed, 2011-06-08 at 15:35 -0700, Alexander Duyck wrote:
>> This change is meant to remove all support for displaying an ntuple as
>> strings via ETHTOOL_GRXNTUPLE.  The reason for this change is due to the
>> fact that multiple issues have been found including:
>>  - Multiple buffer overruns for strings being displayed.
>>  - Incorrect filters displayed, cleared filters with ring of -2 are displayed
>>  - Setting get_rx_ntuple displays no rules if defined.
>>  - Endianess wrong on displayed values.
>>  - Hard limit of 1024 filters makes display functionality extremely limited
>> 
>> The only driver that had supported this interface was ixgbe.  Since it no
>> longer uses the interface and due to the issues mentioned above I am
>> submitting this patch to remove it.
>> 
>> v2:
>> Updated based on comments from Ben Hutchings
>>  - Left ETH_SS_NTUPLE_FILTERS in code but commented on it being deprecated
>>  - Removed ethtool_rx_ntuple_list and ethtool_rx_ntuple_flow_spec_container
>>  - Left ETHTOOL_GRXNTUPLE but commented it as deprecated
>> 
>> Also cleaned up set_rx_ntuple since there is no flow spec container to
>> maintain we can drop all the code for the alloc and free of it and just
>> return ops->set_rx_ntuple().
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Acked-by: Ben Hutchings <bhutchings@solarflare.com>

I'll apply this to net-next-2.6, thanks!

^ permalink raw reply

* [PATCH] tun: do not put self in waitq if doing a nonblock read
From: Amos Kong @ 2011-06-08 23:46 UTC (permalink / raw)
  To: netdev; +Cc: jasowang, davem, kvm, mst

Perf shows a relatively high rate (about 8%) race in
spin_lock_irqsave() when doing netperf between external host and
guest. It's mainly becuase the lock contention between the
tun_do_read() and tun_xmit_skb(), so this patch do not put self into
waitqueue to reduce this kind of race. After this patch, it drops to
4%.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Amos Kong <akong@redhat.com>
---
 drivers/net/tun.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 74e9405..95dbff4 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -817,7 +817,8 @@ static ssize_t tun_do_read(struct tun_struct *tun,
 
 	tun_debug(KERN_INFO, tun, "tun_chr_read\n");
 
-	add_wait_queue(&tun->wq.wait, &wait);
+	if (unlikely(!noblock))
+		add_wait_queue(&tun->wq.wait, &wait);
 	while (len) {
 		current->state = TASK_INTERRUPTIBLE;
 
@@ -848,7 +849,8 @@ static ssize_t tun_do_read(struct tun_struct *tun,
 	}
 
 	current->state = TASK_RUNNING;
-	remove_wait_queue(&tun->wq.wait, &wait);
+	if (unlikely(!noblock))
+		remove_wait_queue(&tun->wq.wait, &wait);
 
 	return ret;
 }


^ permalink raw reply related

* Re: [PATCH] RFC2988bis + taking RTT sample from 3WHS for the passive open side
From: Rick Jones @ 2011-06-08 23:57 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, hkchu, hagen, tsunanet, netdev
In-Reply-To: <20110608.164433.1248144748353908346.davem@davemloft.net>

On Wed, 2011-06-08 at 16:44 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 09 Jun 2011 01:39:21 +0200
> 
> > Le mercredi 08 juin 2011 à 15:26 -0700, Jerry Chu a écrit :
> >> It just occurred to me now that initRTO is being reduced, both TCP_SYN_RETRIES
> >> and TCP_SYNACK_RETRIES should be bumped up a bit, to e.g., 7 (?) to meet the
> >> 3 minutes R2 requirement per RFC1122.
> >> 
> >> If you agree, I will submit another patch.
> > 
> > Good catch, but no RFC lowered yet this 3 minutes requirement ?
> 
> I host my email on other planets, please do not break this.

As the RTT to Mars is a minimum of 500 seconds and a maximum of 2500,
and a minimum of 276 seconds for Venus, your software must already be
successfully dealing with TCP connection timeouts less than the RTT to
the planets nearest to Earth, so there should be no need to maintain 3
minutes anyway :)

rick jones
http://c2.com/cgi/wiki?LightSpeedLag
http://ase.tufts.edu/cosmos/print_chapter.asp?id=1


^ permalink raw reply

* Re: [PATCH] RFC2988bis + taking RTT sample from 3WHS for the passive open side
From: David Miller @ 2011-06-08 23:59 UTC (permalink / raw)
  To: rick.jones2; +Cc: eric.dumazet, hkchu, hagen, tsunanet, netdev
In-Reply-To: <1307577470.8149.3096.camel@tardy>

From: Rick Jones <rick.jones2@hp.com>
Date: Wed, 08 Jun 2011 16:57:50 -0700

> On Wed, 2011-06-08 at 16:44 -0700, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Thu, 09 Jun 2011 01:39:21 +0200
>> 
>> > Le mercredi 08 juin 2011 à 15:26 -0700, Jerry Chu a écrit :
>> >> It just occurred to me now that initRTO is being reduced, both TCP_SYN_RETRIES
>> >> and TCP_SYNACK_RETRIES should be bumped up a bit, to e.g., 7 (?) to meet the
>> >> 3 minutes R2 requirement per RFC1122.
>> >> 
>> >> If you agree, I will submit another patch.
>> > 
>> > Good catch, but no RFC lowered yet this 3 minutes requirement ?
>> 
>> I host my email on other planets, please do not break this.
> 
> As the RTT to Mars is a minimum of 500 seconds and a maximum of 2500,
> and a minimum of 276 seconds for Venus, your software must already be
> successfully dealing with TCP connection timeouts less than the RTT to
> the planets nearest to Earth, so there should be no need to maintain 3
> minutes anyway :)

This just proves that the stickler for details can ruin any joke, sigh...
:-)

^ permalink raw reply

* Re: [PATCH 1/2] vlan: only create special VLAN 0 once
From: David Miller @ 2011-06-09  0:01 UTC (permalink / raw)
  To: jesse; +Cc: jbohac, kaber, netdev, pedro.netdev
In-Reply-To: <BANLkTikmNXK=i1HhtGtb1+81rGqGgBQ8mA@mail.gmail.com>

From: Jesse Gross <jesse@nicira.com>
Date: Tue, 7 Jun 2011 18:25:23 -0700

> No, it's not true.  All drivers store the registered vlan filters in
> some way so that they can restore them when the device is reset.  This
> is currently done in one of two ways: storing a bitmap or iterating
> over the devices currently registered in a group.
> 
> The vlan code is moving away from directly accessing groups and no new
> drivers do this.  In fact, once all drivers are converted over groups
> will not even be registered on devices.  This is because otherwise
> there is quite a bit of vlan code in each driver, which leads to
> inconsistent behavior and bugs.
> 
> Really, all a driver needs to know is whether it should add a given
> vlan to its table, not what the upper layers plan to do with it.  So
> when ndo_vlan_rx_add_vid() is called it should add it to its CAM table
> and store it if it is needed to restore behavior after a reset, just
> as is done with all other configuration state.

Thanks for clearing all of this up Jesse.

^ permalink raw reply

* Re: [PATCH net-next-2.6] inetpeer: remove unused list
From: David Miller @ 2011-06-09  0:05 UTC (permalink / raw)
  To: eric.dumazet; +Cc: tim.c.chen, andi, netdev
In-Reply-To: <1307576134.3980.23.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 09 Jun 2011 01:35:34 +0200

> Andi Kleen and Tim Chen reported huge contention on inetpeer
> unused_peers.lock, on memcached workload on a 40 core machine, with
> disabled route cache.
> 
> It appears we constantly flip peers refcnt between 0 and 1 values, and
> we must insert/remove peers from unused_peers.list, holding a contended
> spinlock.
> 
> Remove this list completely and perform a garbage collection on-the-fly,
> at lookup time, using the expired nodes we met during the tree
> traversal.
> 
> This removes a lot of code, makes locking more standard, and obsoletes
> two sysctls (inet_peer_gc_mintime and inet_peer_gc_maxtime). This also
> removes two pointers in inet_peer structure.
> 
> There is still a false sharing effect because refcnt is in first cache
> line of object [were the links and keys used by lookups are located], we
> might move it at the end of inet_peer structure to let this first cache
> line mostly read by cpus.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Didn't expect you to implement this so fast :-)

Applied, thanks!

^ permalink raw reply

* Re: [PATCH] RFC2988bis + taking RTT sample from 3WHS for the passive open side
From: David Miller @ 2011-06-09  0:06 UTC (permalink / raw)
  To: hkchu; +Cc: eric.dumazet, hagen, tsunanet, netdev
In-Reply-To: <1307567318-26760-1-git-send-email-hkchu@google.com>

From: "H.K. Jerry Chu" <hkchu@google.com>
Date: Wed,  8 Jun 2011 14:08:38 -0700

> From: Jerry Chu <hkchu@google.com>
> 
> This patch lowers the default initRTO from 3secs to 1sec per
> RFC2988bis. It falls back to 3secs if the SYN or SYN-ACK packet
> has been retransmitted, AND the TCP timestamp option is not on.
> 
> It also adds support to take RTT sample during 3WHS on the passive
> open side, just like its active open counterpart, and uses it, if
> valid, to seed the initRTO for the data transmission phase.
> 
> The patch also resets ssthresh to its initial default at the
> beginning of the data transmission phase, and reduces cwnd to 1 if
> there has been MORE THAN ONE retransmission during 3WHS per RFC5681.
> 
> Signed-off-by: H.K. Jerry Chu <hkchu@google.com>

Ok, I'll apply this to net-next-2.6

We can add any necessary follow-on tweaks.

Thanks!

^ permalink raw reply

* Re: [PATCH net-next] ipv6: generate link local address for GRE tunnel
From: David Miller @ 2011-06-09  0:06 UTC (permalink / raw)
  To: shemminger; +Cc: netdev
In-Reply-To: <20110608134430.112c2591@nehalam.ftrdhcpuser.net>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 8 Jun 2011 13:44:30 -0700

> Use same logic as SIT tunnel to handle link local address
> for GRE tunnel. OSPFv3 requires link-local address to function.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied, thanks Stephen.

^ permalink raw reply

* Re: [PATCH net-next] iph: use default get_stats
From: David Miller @ 2011-06-09  0:06 UTC (permalink / raw)
  To: shemminger; +Cc: netdev
In-Reply-To: <20110608110913.54becfba@nehalam.ftrdhcpuser.net>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 8 Jun 2011 11:09:13 -0700

> This driver keeps stats in net_device stats therefore it
> does not need to define it's own get_stats hook.
> 
> Also, use standard format for net_device_ops (without &).
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied.

^ permalink raw reply

* Re: [PATCH] bnx2i: fix bnx2i driver to test for physical device support of iscsi early
From: Neil Horman @ 2011-06-09  0:19 UTC (permalink / raw)
  To: Michael Chan; +Cc: netdev@vger.kernel.org, Mike Christie, David S. Miller
In-Reply-To: <1307563443.8532.42.camel@HP1>

On Wed, Jun 08, 2011 at 01:04:03PM -0700, Michael Chan wrote:
> 
> On Wed, 2011-06-08 at 12:29 -0700, Neil Horman wrote:
> > Recently reported error message indicating the following error:
> > bnx2 0003:01:00.1: eth1: Failed waiting for ULP up call to complete
> > 
> > The card in question is:
> > eth0: Broadcom NetXtreme II BCM5709 1000Base-SX (C0)
> > 
> > Which doesn't appear to support isci.  The undrlying cause of the error above is
> > the fact that bnx2i assumes that every bnx2 card supports iscsi, and doesn't
> > actually test for support until the iscsi virtual adapter is being brought up in
> > bnx2i_start (pointed to by cnic_start).  bnx2i_start tests for
> > cnic->max_iscsi_conn, and if that value is zero, attempts to unregister the
> > device from the cnic framework.  Unfortunately, cnic_unregister_device (pointed
> > to by cnic->unregister_device), waits for the ULP_F_CALL_PENDING to be cleared
> > before completing, and if that doesn't occur within a few tenths of a second, we
> > issue the above warning.  Since that flag gets set prior to the call to
> > bnx2i_start and cleared on its return, we're guaranteed to get this error for
> > any bnx2 adapter not supporting iscsi.
> > 
> > It seems the correct fix is to detect earlier if an adapter supports iscsi and
> > not even try to register the device with cnic if it doesn't.  This is already
> > what bnx2x does, so this patch clones the functionality of that driver for bnx2.
> 
> Thanks Neil.  We have a similar patch almost ready to be posted.  The
> main difference is that cp->max_iscsi_conn is read ahead of time during
> bnx2_init_one().  We cannot read registers in bnx2_cnic_probe() because
> it may be called when the device is already down.  I'll send out that
How do you figure?  bnx2_cnic_probe is only called from is_cnic_dev (which still
makes me shake my head a bit).  is_cnic_dev is only called from
cnic_netdev_event, which holds the rtnl_lock.  Since the event we trigger on is
called from NETDEV_REGISTER or NETDEV_UP, I don't see how we can wind up
suspending the device prior to caling bnx2_cnic_probe.  And it seems to me to be
a better solution to do the read in bnx2_cnic_probe than to distribute the
initalization of cnic_eth_dev throughout the bnx2 driver.

Neil

> patchset very soon.  Thanks again.
> 
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > CC: Michael Chan <mchan@broadcom.com>
> > CC: Mike Christie <michaelc@cs.wisc.edu>
> > CC: "David S. Miller" <davem@davemloft.net>
> > ---
> >  drivers/net/bnx2.c              |    2 +-
> >  drivers/net/cnic.c              |   15 +++------------
> >  drivers/scsi/bnx2i/bnx2i_init.c |   29 +++++++++++++++--------------
> >  3 files changed, 19 insertions(+), 27 deletions(-)
> > 
> > diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> > index 57d3293..927e3e6 100644
> > --- a/drivers/net/bnx2.c
> > +++ b/drivers/net/bnx2.c
> > @@ -423,7 +423,7 @@ struct cnic_eth_dev *bnx2_cnic_probe(struct net_device *dev)
> >  	cp->drv_ctl = bnx2_drv_ctl;
> >  	cp->drv_register_cnic = bnx2_register_cnic;
> >  	cp->drv_unregister_cnic = bnx2_unregister_cnic;
> > -
> > +	cp->max_iscsi_conn = bnx2_reg_rd_ind(bp, BNX2_FW_MAX_ISCSI_CONN);
> >  	return cp;
> >  }
> >  EXPORT_SYMBOL(bnx2_cnic_probe);
> > diff --git a/drivers/net/cnic.c b/drivers/net/cnic.c
> > index 11a92af..b6f6211 100644
> > --- a/drivers/net/cnic.c
> > +++ b/drivers/net/cnic.c
> > @@ -2420,13 +2420,11 @@ static int cnic_bnx2x_fcoe_destroy(struct cnic_dev *dev, struct kwqe *kwqe)
> >  
> >  static int cnic_bnx2x_fcoe_fw_destroy(struct cnic_dev *dev, struct kwqe *kwqe)
> >  {
> > -	struct fcoe_kwqe_destroy *req;
> >  	union l5cm_specific_data l5_data;
> >  	struct cnic_local *cp = dev->cnic_priv;
> >  	int ret;
> >  	u32 cid;
> >  
> > -	req = (struct fcoe_kwqe_destroy *) kwqe;
> >  	cid = BNX2X_HW_CID(cp, cp->fcoe_init_cid);
> >  
> >  	memset(&l5_data, 0, sizeof(l5_data));
> > @@ -4218,14 +4216,6 @@ static void cnic_enable_bnx2_int(struct cnic_dev *dev)
> >  		BNX2_PCICFG_INT_ACK_CMD_INDEX_VALID | cp->last_status_idx);
> >  }
> >  
> > -static void cnic_get_bnx2_iscsi_info(struct cnic_dev *dev)
> > -{
> > -	u32 max_conn;
> > -
> > -	max_conn = cnic_reg_rd_ind(dev, BNX2_FW_MAX_ISCSI_CONN);
> > -	dev->max_iscsi_conn = max_conn;
> > -}
> > -
> >  static void cnic_disable_bnx2_int_sync(struct cnic_dev *dev)
> >  {
> >  	struct cnic_local *cp = dev->cnic_priv;
> > @@ -4550,8 +4540,6 @@ static int cnic_start_bnx2_hw(struct cnic_dev *dev)
> >  		return err;
> >  	}
> >  
> > -	cnic_get_bnx2_iscsi_info(dev);
> > -
> >  	return 0;
> >  }
> >  
> > @@ -5230,6 +5218,9 @@ static struct cnic_dev *init_bnx2_cnic(struct net_device *dev)
> >  	cp->close_conn = cnic_close_bnx2_conn;
> >  	cp->next_idx = cnic_bnx2_next_idx;
> >  	cp->hw_idx = cnic_bnx2_hw_idx;
> > +
> > +	cdev->max_iscsi_conn = ethdev->max_iscsi_conn;
> > +
> >  	return cdev;
> >  
> >  cnic_err:
> > diff --git a/drivers/scsi/bnx2i/bnx2i_init.c b/drivers/scsi/bnx2i/bnx2i_init.c
> > index 1d24a28..263bc60 100644
> > --- a/drivers/scsi/bnx2i/bnx2i_init.c
> > +++ b/drivers/scsi/bnx2i/bnx2i_init.c
> > @@ -163,21 +163,14 @@ void bnx2i_start(void *handle)
> >  	struct bnx2i_hba *hba = handle;
> >  	int i = HZ;
> >  
> > -	if (!hba->cnic->max_iscsi_conn) {
> > -		printk(KERN_ALERT "bnx2i: dev %s does not support "
> > -			"iSCSI\n", hba->netdev->name);
> > +	/**
> > +	 * We should never register devices that don't support iscsi
> > +	 * (see bnx2i_init_one), so something is wrong if we try to
> > +	 * to start an iscsi adapter on hardware wtih 0 supported
> > +	 * iscsi connections
> > +	 */
> > +	BUG_ON(!hba->cnic->max_iscsi_conn);
> >  
> > -		if (test_bit(BNX2I_CNIC_REGISTERED, &hba->reg_with_cnic)) {
> > -			mutex_lock(&bnx2i_dev_lock);
> > -			list_del_init(&hba->link);
> > -			adapter_count--;
> > -			hba->cnic->unregister_device(hba->cnic, CNIC_ULP_ISCSI);
> > -			clear_bit(BNX2I_CNIC_REGISTERED, &hba->reg_with_cnic);
> > -			mutex_unlock(&bnx2i_dev_lock);
> > -			bnx2i_free_hba(hba);
> > -		}
> > -		return;
> > -	}
> >  	bnx2i_send_fw_iscsi_init_msg(hba);
> >  	while (!test_bit(ADAPTER_STATE_UP, &hba->adapter_state) && i--)
> >  		msleep(BNX2I_INIT_POLL_TIME);
> > @@ -281,6 +274,13 @@ static int bnx2i_init_one(struct bnx2i_hba *hba, struct cnic_dev *cnic)
> >  	int rc;
> >  
> >  	mutex_lock(&bnx2i_dev_lock);
> > +	if (!cnic->max_iscsi_conn) {
> > +		printk(KERN_ALERT "bnx2i: dev %s does not support "
> > +			"iSCSI\n", hba->netdev->name);
> > +		rc = -EOPNOTSUPP;
> > +		goto out;
> > +	}
> > +
> >  	hba->cnic = cnic;
> >  	rc = cnic->register_device(cnic, CNIC_ULP_ISCSI, hba);
> >  	if (!rc) {
> > @@ -298,6 +298,7 @@ static int bnx2i_init_one(struct bnx2i_hba *hba, struct cnic_dev *cnic)
> >  	else
> >  		printk(KERN_ERR "bnx2i dev reg, unknown error, %d\n", rc);
> >  
> > +out:
> >  	mutex_unlock(&bnx2i_dev_lock);
> >  
> >  	return rc;
> 
> 
> 

^ permalink raw reply

* Re: [PATCH] bnx2i: fix bnx2i driver to test for physical device support of iscsi early
From: Michael Chan @ 2011-06-09  0:46 UTC (permalink / raw)
  To: 'Neil Horman'
  Cc: 'netdev@vger.kernel.org', 'Mike Christie',
	'David S. Miller'
In-Reply-To: <20110609001948.GA8203@neilslaptop.think-freely.org>

Neil Horman wrote:

> How do you figure?  bnx2_cnic_probe is only called from is_cnic_dev
> (which still
> makes me shake my head a bit).  is_cnic_dev is only called from
> cnic_netdev_event, which holds the rtnl_lock.  Since the event we
> trigger on is
> called from NETDEV_REGISTER or NETDEV_UP, I don't see how we can wind
> up
> suspending the device prior to caling bnx2_cnic_probe.

Consider NETDEV_REGISTER -> NETDEV_UP -> NETDEV_DOWN

During NETDEV_DOWN, we shutdown the device, but the netdev is still
registered.

Then we load cnic.  The NETDEV events will be replayed when we call
netdev_register_notifier() and cnic will get the NETDEV_REGISTER event.
We'll then call bnx2_cnic_probe() but the device is down.



^ permalink raw reply

* [PATCH 2/9] xen: convert to 64 bit stats interface
From: Stephen Hemminger @ 2011-06-09  0:53 UTC (permalink / raw)
  To: David S. Miller, Jeremy Fitzhardinge; +Cc: netdev, xen-devel
In-Reply-To: <20110609005356.160260858@vyatta.com>

[-- Attachment #1: xen-stats64.patch --]
[-- Type: text/plain, Size: 2930 bytes --]

Convert xen driver to 64 bit statistics interface.
This driver was already counting packet per queue in a 64 bit value so not
a huge change.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/drivers/net/xen-netfront.c	2011-06-07 19:34:20.752647705 +0900
+++ b/drivers/net/xen-netfront.c	2011-06-07 20:02:11.028930158 +0900
@@ -122,7 +122,14 @@ struct netfront_info {
 	struct mmu_update rx_mmu[NET_RX_RING_SIZE];
 
 	/* Statistics */
-	unsigned long rx_gso_checksum_fixup;
+	u64 rx_packets;
+	u64 rx_bytes;
+	u64 rx_errors;
+	u64 rx_gso_checksum_fixup;
+
+	u64 tx_packets;
+	u64 tx_bytes;
+	u64 tx_dropped;
 };
 
 struct netfront_rx_info {
@@ -552,8 +559,8 @@ static int xennet_start_xmit(struct sk_b
 	if (notify)
 		notify_remote_via_irq(np->netdev->irq);
 
-	dev->stats.tx_bytes += skb->len;
-	dev->stats.tx_packets++;
+	np->tx_bytes += skb->len;
+	np->tx_packets++;
 
 	/* Note: It is not safe to access skb after xennet_tx_buf_gc()! */
 	xennet_tx_buf_gc(dev);
@@ -566,7 +573,7 @@ static int xennet_start_xmit(struct sk_b
 	return NETDEV_TX_OK;
 
  drop:
-	dev->stats.tx_dropped++;
+	np->tx_dropped++;
 	dev_kfree_skb(skb);
 	return NETDEV_TX_OK;
 }
@@ -847,6 +854,7 @@ out:
 static int handle_incoming_queue(struct net_device *dev,
 				 struct sk_buff_head *rxq)
 {
+	struct netfront_info *np = netdev_priv(dev);
 	int packets_dropped = 0;
 	struct sk_buff *skb;
 
@@ -867,12 +875,11 @@ static int handle_incoming_queue(struct
 		if (checksum_setup(dev, skb)) {
 			kfree_skb(skb);
 			packets_dropped++;
-			dev->stats.rx_errors++;
 			continue;
 		}
 
-		dev->stats.rx_packets++;
-		dev->stats.rx_bytes += skb->len;
+		np->rx_packets++;
+		np->rx_bytes += skb->len;
 
 		/* Pass it up. */
 		netif_receive_skb(skb);
@@ -919,7 +926,7 @@ static int xennet_poll(struct napi_struc
 err:
 			while ((skb = __skb_dequeue(&tmpq)))
 				__skb_queue_tail(&errq, skb);
-			dev->stats.rx_errors++;
+			np->rx_errors++;
 			i = np->rx.rsp_cons;
 			continue;
 		}
@@ -1034,6 +1041,22 @@ static int xennet_change_mtu(struct net_
 	return 0;
 }
 
+static struct rtnl_link_stats64 *xennet_get_stats64(struct net_device *dev,
+						    struct rtnl_link_stats64 *stats)
+{
+	struct netfront_info *np = netdev_priv(dev);
+
+	stats->rx_packets = np->rx_packets;
+	stats->rx_bytes   = np->rx_bytes;
+	stats->rx_errors  = np->rx_errors;
+
+	stats->tx_packets = np->tx_packets;
+	stats->tx_bytes   = np->tx_bytes;
+	stats->tx_bytes   = np->tx_dropped;
+
+	return stats;
+}
+
 static void xennet_release_tx_bufs(struct netfront_info *np)
 {
 	struct sk_buff *skb;
@@ -1182,6 +1205,7 @@ static const struct net_device_ops xenne
 	.ndo_stop            = xennet_close,
 	.ndo_start_xmit      = xennet_start_xmit,
 	.ndo_change_mtu	     = xennet_change_mtu,
+	.ndo_get_stats64     = xennet_get_stats64,
 	.ndo_set_mac_address = eth_mac_addr,
 	.ndo_validate_addr   = eth_validate_addr,
 	.ndo_fix_features    = xennet_fix_features,

^ permalink raw reply

* Re: [PATCH] bnx2i: fix bnx2i driver to test for physical device support of iscsi early
From: Neil Horman @ 2011-06-09  1:14 UTC (permalink / raw)
  To: Michael Chan
  Cc: 'netdev@vger.kernel.org', 'Mike Christie',
	'David S. Miller'
In-Reply-To: <C27F8246C663564A84BB7AB343977242667C64F9B5@IRVEXCHCCR01.corp.ad.broadcom.com>

On Wed, Jun 08, 2011 at 05:46:23PM -0700, Michael Chan wrote:
> Neil Horman wrote:
> 
> > How do you figure?  bnx2_cnic_probe is only called from is_cnic_dev
> > (which still
> > makes me shake my head a bit).  is_cnic_dev is only called from
> > cnic_netdev_event, which holds the rtnl_lock.  Since the event we
> > trigger on is
> > called from NETDEV_REGISTER or NETDEV_UP, I don't see how we can wind
> > up
> > suspending the device prior to caling bnx2_cnic_probe.
> 
> Consider NETDEV_REGISTER -> NETDEV_UP -> NETDEV_DOWN
> 
> During NETDEV_DOWN, we shutdown the device, but the netdev is still
> registered.
> 
> Then we load cnic.  The NETDEV events will be replayed when we call
> netdev_register_notifier() and cnic will get the NETDEV_REGISTER event.
> We'll then call bnx2_cnic_probe() but the device is down.
Ah,ok.  Please CC me on your patch.
Neil

^ permalink raw reply

* [PATCH 0/9] 64 bit statistics for VM and 10G drivers
From: Stephen Hemminger @ 2011-06-09  0:53 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

All 10G and virtual devices should be using the 64 bit statistics interface
because counters can wrap to fast. The standard monitoring program net-snmp
polls devices every 3 seconds (too fast), but even that can wrap too fast
to be detected.

Net-snmp needs to be updated to use netlink and handle 64 bit values.
It is still using /proc



^ permalink raw reply

* [PATCH 1/2] tun: reserves space for network in skb
From: Stephen Hemminger @ 2011-06-09  0:33 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev
In-Reply-To: <20110609003306.651532958@vyatta.com>

[-- Attachment #1: tun-tap-pad.patch --]
[-- Type: text/plain, Size: 1127 bytes --]

The tun driver allocates skb's to hold data from user and then passes
the data into the network stack as received data. Most network devices
allocate the receive skb with routines like dev_alloc_skb() that reserves
additional space for use by network protocol stack but tun does not.

Because of the lack of padding, when the packet is passed through bridge
netfilter a new skb has to be allocated.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/drivers/net/tun.c	2011-05-13 12:37:11.619318207 -0700
+++ b/drivers/net/tun.c	2011-05-13 16:43:25.643040523 -0700
@@ -584,7 +584,7 @@ static __inline__ ssize_t tun_get_user(s
 {
 	struct tun_pi pi = { 0, cpu_to_be16(ETH_P_IP) };
 	struct sk_buff *skb;
-	size_t len = count, align = 0;
+	size_t len = count, align = NET_SKB_PAD;
 	struct virtio_net_hdr gso = { 0 };
 	int offset = 0;
 
@@ -614,7 +614,7 @@ static __inline__ ssize_t tun_get_user(s
 	}
 
 	if ((tun->flags & TUN_TYPE_MASK) == TUN_TAP_DEV) {
-		align = NET_IP_ALIGN;
+		align += NET_IP_ALIGN;
 		if (unlikely(len < ETH_HLEN ||
 			     (gso.hdr_len && gso.hdr_len < ETH_HLEN)))
 			return -EINVAL;



^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox