Netdev List
* [PATCH 1/4] net_sched: gred: correct comment about qavg calculation in RIO mode
From: David Ward @ 2012-09-13 15:22 UTC
  To: netdev; +Cc: Bruce Osler, Cyril Chemparathy, Jamal Hadi Salim, David Ward

Signed-off-by: David Ward <david.ward@ll.mit.edu>
---
 net/sched/sch_gred.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index e901583..fca73cd 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -176,7 +176,7 @@ static int gred_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		skb->tc_index = (skb->tc_index & ~GRED_VQ_MASK) | dp;
 	}
 
-	/* sum up all the qaves of prios <= to ours to get the new qave */
+	/* sum up all the qaves of prios < ours to get the new qave */
 	if (!gred_wred_mode(t) && gred_rio_mode(t)) {
 		int i;
 
-- 
1.7.4.1


* [PATCH 3/4] net_sched: gred: fix qave reporting via netlink
From: David Ward @ 2012-09-13 15:22 UTC
  To: netdev; +Cc: Bruce Osler, Cyril Chemparathy, Jamal Hadi Salim, David Ward
In-Reply-To: <1347549755-19438-1-git-send-email-david.ward@ll.mit.edu>

q->vars.qavg is a Wlog scaled value, but q->backlog is not. In order
to pass q->vars.qavg as the backlog value, we need to un-scale it.
Additionally, the qave value returned via netlink should not be Wlog
scaled, so we need to un-scale the result of red_calc_qavg().

This caused artificially high values for "Average Queue" to be shown
by 'tc -s -d qdisc', but did not affect the actual operation of GRED.
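
The scaling issue can be illustrated with a small userspace sketch. This
is not the kernel code: ewma_update() and ewma_unscale() are hypothetical
names, and the update rule is assumed to follow the usual RED form in
which the stored average is kept multiplied by 2^Wlog while the backlog
is not.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical sketch of a Wlog-scaled moving average.
 * scaled_qavg holds the true average multiplied by 2^wlog;
 * backlog is an unscaled queue length.
 */
static unsigned long ewma_update(unsigned long scaled_qavg,
				 unsigned long backlog, unsigned int wlog)
{
	assert(wlog < sizeof(unsigned long) * CHAR_BIT);

	/*
	 * Compare the unscaled backlog with the unscaled average.
	 * Unsigned arithmetic wraps, so a "negative" difference still
	 * yields the correct sum.
	 */
	return scaled_qavg + (backlog - (scaled_qavg >> wlog));
}

/* Convert the stored value back to queue-length units for reporting. */
static unsigned long ewma_unscale(unsigned long scaled_qavg, unsigned int wlog)
{
	assert(wlog < sizeof(unsigned long) * CHAR_BIT);

	return scaled_qavg >> wlog;
}
```

With a steady backlog of 1000 and Wlog = 9, the stored value settles at
1000 << 9, which is what a dump would show if the shift were forgotten;
ewma_unscale() brings it back to 1000.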

Signed-off-by: David Ward <david.ward@ll.mit.edu>
---
 net/sched/sch_gred.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index e19d4eb..b2570b5 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -534,6 +534,7 @@ static int gred_dump(struct Qdisc *sch, struct sk_buff *skb)
 	for (i = 0; i < MAX_DPs; i++) {
 		struct gred_sched_data *q = table->tab[i];
 		struct tc_gred_qopt opt;
+		unsigned long qavg;
 
 		memset(&opt, 0, sizeof(opt));
 
@@ -565,7 +566,9 @@ static int gred_dump(struct Qdisc *sch, struct sk_buff *skb)
 		if (gred_wred_mode(table))
 			gred_load_wred_set(table, q);
 
-		opt.qave = red_calc_qavg(&q->parms, &q->vars, q->vars.qavg);
+		qavg = red_calc_qavg(&q->parms, &q->vars,
+				     q->vars.qavg >> q->parms.Wlog);
+		opt.qave = qavg >> q->parms.Wlog;
 
 append_opt:
 		if (nla_append(skb, sizeof(opt), &opt) < 0)
-- 
1.7.4.1


* Re: bnx2 cards intermittantly going offline
From: Sven Ulland @ 2012-09-13 15:45 UTC
  To: netdev; +Cc: Marc A. Donges, Michael Chan
In-Reply-To: <20120913135108.GC3650@abomination.net.united.domain>

On 09/13/2012 03:51 PM, Marc A. Donges wrote:
> After 55 days of operation the machine (A) suddenly was no longer
> reachable via network. Strangely, a second machine (B) that should
> take over the IP addresses (keepalived) did not take over. Only
> after shutting the switchport to which A is attached did B take
> over.

Hi. We've had the same symptom with our BCM5709S [14e4:163a] on
Debian. Like you, we were on stable's 2.6.32-41squeeze2. Google led us
to many similar issues [1,2,3]. They concluded with the fix being in
mainline commit c441b8d2 [4]: "bnx2: Fix lost MSI-X problem on 5709
NICs".

Broadcom: Can you publish a tool that decodes ethtool -d dumps to make
debugging easier, or do you deem it no longer necessary with the
register dump commits in 555069da?

Now, Debian's 2.6.32-41squeeze2 is based on longterm release 2.6.32.54
[5]. That version includes commit 0b7817ed [6], which is a backport of
the already mentioned mainline commit c441b8d2.

So we tried digging further and applying some seemingly relevant
commits [7,8] to our 2.6.32, but without any change in behaviour. Our
temporary fix was to run 'ethtool -t ethX' to reset the device every
time it locked up.

This dragged on with various builds, until we ended up on mainline
2.6.38 where we no longer saw any symptoms. I don't know in which
kernel version it was fixed, but we ended up on that one, sort of by
chance. Unfortunately, it had severe issues with kswapd memory
compaction causing CPU soft lockups [9], so we went straight to
squeeze-backports' 3.2.23-1~bpo60+2. We've been happy since then.

> We have five pairs of basically identical machines performing the
> same task (each pair for one site). The error has not occurred with
> any other one, but this site is the busiest:

We also saw the issue only at a site with generally higher load
compared to other sites.

I'd love to know exactly which commit fixed the issue, but it's fairly
tricky to reproduce the issue, and the bisect count is fairly high (it
need not be a specific fix for bnx2).

sven


[1]: bnx2 driver crashes under random circumstances
https://bugzilla.redhat.com/show_bug.cgi?id=520888

[2]: Access denied. Come on, Red Hat!
https://bugzilla.redhat.com/show_bug.cgi?id=511368

[3]: NIC doesn't register packets [rhel-5.5.z]
https://bugzilla.redhat.com/show_bug.cgi?id=587799

[4]: bnx2: Fix lost MSI-X problem on 5709 NICs.
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=object;h=c441b8d2cb2194b05550a558d6d95d8944e56a84

[5]: Debian Changelog linux-2.6 (2.6.32-45)
http://packages.debian.org/changelogs/pool/main/l/linux-2.6/linux-2.6_2.6.32-45/changelog#version2.6.32-41

[6]: bnx2: Fix lost MSI-X problem on 5709 NICs.
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commit;h=0b7817edda5e44e5fa769645bd1220f5e7b0beb5

[7]: bnx2: reset_task is crashing the kernel. Fixing it.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4529819c45161e4a119134f56ef504e69420bc98

[8]: bnx2: fixing a timout error due not refreshing TX timers correctly
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e6bf95ffa8d6f8f4b7ee33ea01490d95b0bbeb6e

[9]: [PATCH] remove compaction from kswapd
http://thread.gmane.org/gmane.linux.kernel.mm/58962
https://lkml.org/lkml/2011/3/25/664


* [PATCH net-next V4 0/2] Add rtnl_link_ops support to IPoIB
From: Or Gerlitz @ 2012-09-13 15:56 UTC
  To: davem, roland; +Cc: netdev, Or Gerlitz

This series adds rtnl_link_ops support to IPoIB, primarily addressing feedback
from Dave on a similar patch that was part of the eIPoIB submission.

Also included is the related iproute2 patch, for the sake of review and
testing. Example usages:

$ ip link add link ib0 name ib0.1 type ipoib 
$ ip link add link ib0 name ib0.8001 type ipoib pkey 0x8001

and the obvious

$ ip link delete dev ib0.8001
$ ip link delete dev ib0.1

Dave, as I reported on the V3 thread (http://marc.info/?l=linux-netdev&m=134744666921839&w=2),
with the latest net-next I see a lockdep warning when the module is unloaded. I'm not
sure whether it is real or a false positive. I decided to go ahead with the submission,
but I'm open to comments.

Roland, this patch has been pending for quite a while (a few months) without
any comment from you. If it makes things easier, I would like to merge it through
net-next. Does that make sense?

Or.
 
Changes from V3:
 - addressed feedback from Patrick McHardy to move the IFLA_IPOIB_yyy ipoib
   rtnl definitions into include/linux/if_link.h
 - changed IFLA_IPOIB_CHILD_PKEY to be named IFLA_IPOIB_PKEY which will cope
   with more IFLA_IPOIB_yyy entries to be added once the basic support is in

Changes from V2:
 - removed the notion of a user-defined index per child, since we can do well without it
 - to that end, make a distinction (internal to ipoib) between legacy children created
   through the old sysfs way and children created using rtnl link ops

Changes from V1: 
 - applied feedback from Dave Miller to avoid using sysfs
 - added rtnl_link_ops support in ipoib and use them to add/delete children

Or Gerlitz (1):
  IB/ipoib: Add rtnl_link_ops support


 Documentation/infiniband/ipoib.txt           |    3 +
 drivers/infiniband/ulp/ipoib/Makefile        |    3 +-
 drivers/infiniband/ulp/ipoib/ipoib.h         |   13 +++
 drivers/infiniband/ulp/ipoib/ipoib_main.c    |   25 +++++-
 drivers/infiniband/ulp/ipoib/ipoib_netlink.c |  114 ++++++++++++++++++++++++++
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c    |  102 +++++++++++++----------
 include/linux/if_link.h                      |   11 +++
 7 files changed, 220 insertions(+), 51 deletions(-)
 create mode 100644 drivers/infiniband/ulp/ipoib/ipoib_netlink.c


* [PATCH iproute2 2/2] iplink: added support for ipoib rtnl link ops
From: Or Gerlitz @ 2012-09-13 15:56 UTC
  To: davem, roland; +Cc: netdev, Or Gerlitz
In-Reply-To: <1347551797-2495-1-git-send-email-ogerlitz@mellanox.com>

Added basic support to create/delete IPoIB child devices,
where the user can optionally specify the IB PKEY (Partition Key)
to be used by the newly created device. If nothing is provided,
the child will use the same pkey as the parent.
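
The option handling described above can be sketched in plain C. This is
an illustration only, not the iproute2 code: parse_pkey() and
effective_pkey() are hypothetical helpers, and the sketch only checks
that the value fits in 16 bits (it does not enforce the 0x8001-0xffff
range from the usage text).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a P_Key given as decimal or 0x-prefixed hex.
 * Returns 0 and stores the value on success, -1 on malformed or
 * out-of-range input.
 */
static int parse_pkey(const char *arg, uint16_t *pkey)
{
	char *end;
	unsigned long val;

	assert(pkey != NULL);
	if (arg == NULL || *arg == '\0')
		return -1;

	errno = 0;
	val = strtoul(arg, &end, 0);	/* base 0: accepts "0x8001" and "32769" */
	if (errno != 0 || *end != '\0' || val > 0xffff)
		return -1;

	*pkey = (uint16_t)val;
	return 0;
}

/* With no P_Key requested (NULL), the child inherits the parent's. */
static uint16_t effective_pkey(const uint16_t *requested, uint16_t parent_pkey)
{
	return requested ? *requested : parent_pkey;
}
```

A caller would pass the result of parse_pkey() to effective_pkey() when
"pkey" was given on the command line, and NULL otherwise.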

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 include/linux/if_link.h |    9 ++++++
 ip/Makefile             |    2 +-
 ip/iplink.c             |    2 +-
 ip/iplink_ipoib.c       |   73 +++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 84 insertions(+), 2 deletions(-)
 create mode 100644 ip/iplink_ipoib.c

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 46f03db..7e64aa7 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -384,4 +384,13 @@ struct ifla_port_vsi {
 	__u8 pad[3];
 };
 
+/* IPoIB section */
+enum {
+	IFLA_IPOIB_UNSPEC,
+	IFLA_IPOIB_CHILD_PKEY,
+	__IFLA_IPOIB_MAX
+};
+
+#define IFLA_IPOIB_MAX (__IFLA_IPOIB_MAX - 1)
+
 #endif /* _LINUX_IF_LINK_H */
diff --git a/ip/Makefile b/ip/Makefile
index 6a518f8..ba2e2e6 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -3,7 +3,7 @@ IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o ipnetns.o \
     ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o iptuntap.o \
     ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o \
     iplink_vlan.o link_veth.o link_gre.o iplink_can.o \
-    iplink_macvlan.o iplink_macvtap.o ipl2tp.o link_vti.o
+    iplink_macvlan.o iplink_macvtap.o ipl2tp.o link_vti.o iplink_ipoib.o
 
 RTMONOBJ=rtmon.o
 
diff --git a/ip/iplink.c b/ip/iplink.c
index 4111871..7451aa0 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -83,7 +83,7 @@ void iplink_usage(void)
 
 	if (iplink_have_newlink()) {
 		fprintf(stderr, "\n");
-		fprintf(stderr, "TYPE := { vlan | veth | vcan | dummy | ifb | macvlan | can | bridge }\n");
+		fprintf(stderr, "TYPE := { vlan | veth | vcan | dummy | ifb | macvlan | can | bridge | ipoib }\n");
 	}
 	exit(-1);
 }
diff --git a/ip/iplink_ipoib.c b/ip/iplink_ipoib.c
new file mode 100644
index 0000000..7dea357
--- /dev/null
+++ b/ip/iplink_ipoib.c
@@ -0,0 +1,73 @@
+/*
+ * iplink_ipoib.c	IPoIB device support
+ *
+ *              This program is free software; you can redistribute it and/or
+ *              modify it under the terms of the GNU General Public License
+ *              as published by the Free Software Foundation; either version
+ *              2 of the License, or (at your option) any later version.
+ *
+ * Authors:     Or Gerlitz <ogerlitz@mellanox.com> 
+ *		copied iflink_vlan.c authored by Patrick McHardy <kaber@trash.net>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <linux/if_link.h>
+
+#include "rt_names.h"
+#include "utils.h"
+#include "ip_common.h"
+
+static void explain(void)
+{
+	fprintf(stderr,
+		"Usage: ... ipoib [pkey PKEY]\n"
+		"\n"
+		"PKEY  := 0x8001-0xffff\n"
+	);
+}
+
+static int ipoib_parse_opt(struct link_util *lu, int argc, char **argv,
+			  struct nlmsghdr *n)
+{
+	__u16 pkey;
+
+	while (argc > 0) {
+		if (matches(*argv, "pkey") == 0) {
+			NEXT_ARG();
+			if (get_u16(&pkey, *argv, 0))
+				invarg("pkey is invalid", *argv);
+			addattr_l(n, 1024, IFLA_IPOIB_CHILD_PKEY, &pkey, 2);
+		} else if (matches(*argv, "help") == 0) {
+			explain();
+			return -1;
+		} else {
+			fprintf(stderr, "ipoib: what is \"%s\"?\n", *argv);
+			explain();
+			return -1;
+		}
+		argc--, argv++;
+	}
+
+	return 0;
+}
+
+static void ipoib_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
+{
+	if (!tb)
+		return;
+
+	if (!tb[IFLA_IPOIB_CHILD_PKEY] ||
+	    RTA_PAYLOAD(tb[IFLA_IPOIB_CHILD_PKEY]) < sizeof(__u16))
+		return;
+
+	fprintf(f, "pkey  %.4x ", rta_getattr_u16(tb[IFLA_IPOIB_CHILD_PKEY]));
+}
+
+struct link_util ipoib_link_util = {
+	.id		= "ipoib",
+	.maxattr	= IFLA_IPOIB_MAX,
+	.parse_opt	= ipoib_parse_opt,
+	.print_opt	= ipoib_print_opt,
+};
-- 
1.7.1


* [PATCH net-next V4] IB/ipoib: Add rtnl_link_ops support
From: Or Gerlitz @ 2012-09-13 15:56 UTC
  To: davem, roland; +Cc: netdev, Or Gerlitz, Erez Shitrit
In-Reply-To: <1347551797-2495-1-git-send-email-ogerlitz@mellanox.com>

Add rtnl_link_ops to IPoIB, with the first usage being child device
create/delete through them. Child devices are now either legacy ones,
created/deleted through the ipoib sysfs entries, or RTNL ones.

Adding support for RTNL children involved refactoring ipoib_vlan_add,
which is now used by both the sysfs and the link_ops code.

Also, added ndo_uninit entry to support calling unregister_netdevice_queue
from the rtnl dellink entry. This required removal of calls to
ipoib_dev_cleanup from the driver in flows which use unregister_netdevice,
since the networking core will invoke ipoib_uninit which does exactly that.
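
The legacy/RTNL split affects the duplicate check on the sysfs path. A
simplified userspace model of that check is sketched below; struct child
and legacy_pkey_in_use() are hypothetical names, and an array stands in
for the driver's child list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum child_type { NON_CHILD = 0, LEGACY_CHILD = 1, RTNL_CHILD = 2 };

struct child {
	uint16_t pkey;
	enum child_type type;
};

/*
 * Hypothetical model of the sysfs-path duplicate check: a new legacy
 * child is rejected if its P_Key equals the parent's or that of an
 * existing legacy child. RTNL children are not considered.
 */
static bool legacy_pkey_in_use(uint16_t parent_pkey, const struct child *kids,
			       size_t n, uint16_t pkey)
{
	assert(kids != NULL || n == 0);

	if (parent_pkey == pkey)
		return true;

	for (size_t i = 0; i < n; i++)
		if (kids[i].pkey == pkey && kids[i].type == LEGACY_CHILD)
			return true;

	return false;
}
```

In this model a P_Key already used by an RTNL child does not block
creation of a legacy child with the same P_Key.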

Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 Documentation/infiniband/ipoib.txt           |    3 +
 drivers/infiniband/ulp/ipoib/Makefile        |    3 +-
 drivers/infiniband/ulp/ipoib/ipoib.h         |   13 +++
 drivers/infiniband/ulp/ipoib/ipoib_main.c    |   25 +++++-
 drivers/infiniband/ulp/ipoib/ipoib_netlink.c |  114 ++++++++++++++++++++++++++
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c    |  102 +++++++++++++----------
 include/linux/if_link.h                      |   11 +++
 7 files changed, 220 insertions(+), 51 deletions(-)
 create mode 100644 drivers/infiniband/ulp/ipoib/ipoib_netlink.c

diff --git a/Documentation/infiniband/ipoib.txt b/Documentation/infiniband/ipoib.txt
index 64eeb55..f2cfe26 100644
--- a/Documentation/infiniband/ipoib.txt
+++ b/Documentation/infiniband/ipoib.txt
@@ -24,6 +24,9 @@ Partitions and P_Keys
   The P_Key for any interface is given by the "pkey" file, and the
   main interface for a subinterface is in "parent."
 
+  Child interface create/delete can also be done using IPoIB's
+  rtnl_link_ops, where childs created using either way behave the same.
+
 Datagram vs Connected modes
 
   The IPoIB driver supports two modes of operation: datagram and
diff --git a/drivers/infiniband/ulp/ipoib/Makefile b/drivers/infiniband/ulp/ipoib/Makefile
index 3090100..e5430dd 100644
--- a/drivers/infiniband/ulp/ipoib/Makefile
+++ b/drivers/infiniband/ulp/ipoib/Makefile
@@ -5,7 +5,8 @@ ib_ipoib-y					:= ipoib_main.o \
 						   ipoib_multicast.o \
 						   ipoib_verbs.o \
 						   ipoib_vlan.o \
-						   ipoib_ethtool.o
+						   ipoib_ethtool.o \
+						   ipoib_netlink.o
 ib_ipoib-$(CONFIG_INFINIBAND_IPOIB_CM)		+= ipoib_cm.o
 ib_ipoib-$(CONFIG_INFINIBAND_IPOIB_DEBUG)	+= ipoib_fs.o
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index ca43901..381f51b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -104,6 +104,10 @@ enum {
 
 	MAX_SEND_CQE		  = 16,
 	IPOIB_CM_COPYBREAK	  = 256,
+
+	IPOIB_NON_CHILD		  = 0,
+	IPOIB_LEGACY_CHILD	  = 1,
+	IPOIB_RTNL_CHILD	  = 2,
 };
 
 #define	IPOIB_OP_RECV   (1ul << 31)
@@ -350,6 +354,7 @@ struct ipoib_dev_priv {
 	struct net_device *parent;
 	struct list_head child_intfs;
 	struct list_head list;
+	int    child_type;
 
 #ifdef CONFIG_INFINIBAND_IPOIB_CM
 	struct ipoib_cm_dev_priv cm;
@@ -509,6 +514,14 @@ void ipoib_event(struct ib_event_handler *handler,
 int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey);
 int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey);
 
+int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
+		     u16 pkey, int child_type);
+
+int  __init ipoib_netlink_init(void);
+void __exit ipoib_netlink_fini(void);
+
+void ipoib_setup(struct net_device *dev);
+
 void ipoib_pkey_poll(struct work_struct *work);
 int ipoib_pkey_dev_delay_open(struct net_device *dev);
 void ipoib_drain_cq(struct net_device *dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 3e2085a..b3e9709 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -173,6 +173,11 @@ static int ipoib_stop(struct net_device *dev)
 	return 0;
 }
 
+static void ipoib_uninit(struct net_device *dev)
+{
+	ipoib_dev_cleanup(dev);
+}
+
 static netdev_features_t ipoib_fix_features(struct net_device *dev, netdev_features_t features)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
@@ -1262,6 +1267,9 @@ out:
 void ipoib_dev_cleanup(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev), *cpriv, *tcpriv;
+	LIST_HEAD(head);
+
+	ASSERT_RTNL();
 
 	ipoib_delete_debug_files(dev);
 
@@ -1270,10 +1278,9 @@ void ipoib_dev_cleanup(struct net_device *dev)
 		/* Stop GC on child */
 		set_bit(IPOIB_STOP_NEIGH_GC, &cpriv->flags);
 		cancel_delayed_work(&cpriv->neigh_reap_task);
-		unregister_netdev(cpriv->dev);
-		ipoib_dev_cleanup(cpriv->dev);
-		free_netdev(cpriv->dev);
+		unregister_netdevice_queue(cpriv->dev, &head);
 	}
+	unregister_netdevice_many(&head);
 
 	ipoib_ib_dev_cleanup(dev);
 
@@ -1291,6 +1298,7 @@ static const struct header_ops ipoib_header_ops = {
 };
 
 static const struct net_device_ops ipoib_netdev_ops = {
+	.ndo_uninit		 = ipoib_uninit,
 	.ndo_open		 = ipoib_open,
 	.ndo_stop		 = ipoib_stop,
 	.ndo_change_mtu		 = ipoib_change_mtu,
@@ -1300,7 +1308,7 @@ static const struct net_device_ops ipoib_netdev_ops = {
 	.ndo_set_rx_mode	 = ipoib_set_mcast_list,
 };
 
-static void ipoib_setup(struct net_device *dev)
+void ipoib_setup(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 
@@ -1662,7 +1670,6 @@ static void ipoib_remove_one(struct ib_device *device)
 		flush_workqueue(ipoib_workqueue);
 
 		unregister_netdev(priv->dev);
-		ipoib_dev_cleanup(priv->dev);
 		free_netdev(priv->dev);
 	}
 
@@ -1714,8 +1721,15 @@ static int __init ipoib_init_module(void)
 	if (ret)
 		goto err_sa;
 
+	ret = ipoib_netlink_init();
+	if (ret)
+		goto err_client;
+
 	return 0;
 
+err_client:
+	ib_unregister_client(&ipoib_client);
+
 err_sa:
 	ib_sa_unregister_client(&ipoib_sa_client);
 	destroy_workqueue(ipoib_workqueue);
@@ -1728,6 +1742,7 @@ err_fs:
 
 static void __exit ipoib_cleanup_module(void)
 {
+	ipoib_netlink_fini();
 	ib_unregister_client(&ipoib_client);
 	ib_sa_unregister_client(&ipoib_sa_client);
 	ipoib_unregister_debugfs();
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
new file mode 100644
index 0000000..a7dc5ea
--- /dev/null
+++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
@@ -0,0 +1,114 @@
+/*
+ * Copyright (c) 2012 Mellanox Technologies. -  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/netdevice.h>
+#include <linux/module.h>
+#include <net/rtnetlink.h>
+#include "ipoib.h"
+
+static const struct nla_policy ipoib_policy[IFLA_IPOIB_MAX + 1] = {
+	[IFLA_IPOIB_PKEY]	= { .type = NLA_U16 },
+};
+
+static int ipoib_new_child_link(struct net *src_net, struct net_device *dev,
+			       struct nlattr *tb[], struct nlattr *data[])
+{
+	struct net_device *pdev;
+	struct ipoib_dev_priv *ppriv;
+	u16 child_pkey;
+	int err;
+
+	if (!tb[IFLA_LINK])
+		return -EINVAL;
+
+	pdev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK]));
+	if (!pdev)
+		return -ENODEV;
+
+	ppriv = netdev_priv(pdev);
+
+	if (test_bit(IPOIB_FLAG_SUBINTERFACE, &ppriv->flags)) {
+		ipoib_warn(ppriv, "child creation disallowed for child devices\n");
+		return -EINVAL;
+	}
+
+	if (!data || !data[IFLA_IPOIB_PKEY]) {
+		ipoib_dbg(ppriv, "no pkey specified, using parent pkey\n");
+		child_pkey  = ppriv->pkey;
+	} else
+		child_pkey  = nla_get_u16(data[IFLA_IPOIB_PKEY]);
+
+	err = __ipoib_vlan_add(ppriv, netdev_priv(dev), child_pkey, IPOIB_RTNL_CHILD);
+
+	return err;
+}
+
+static void ipoib_unregister_child_dev(struct net_device *dev, struct list_head *head)
+{
+	struct ipoib_dev_priv *priv, *ppriv;
+
+	priv = netdev_priv(dev);
+	ppriv = netdev_priv(priv->parent);
+
+	mutex_lock(&ppriv->vlan_mutex);
+	unregister_netdevice_queue(dev, head);
+	list_del(&priv->list);
+	mutex_unlock(&ppriv->vlan_mutex);
+}
+
+static size_t ipoib_get_size(const struct net_device *dev)
+{
+	return nla_total_size(2);	/* IFLA_IPOIB_PKEY */
+}
+
+static struct rtnl_link_ops ipoib_link_ops __read_mostly = {
+	.kind		= "ipoib",
+	.maxtype	= IFLA_IPOIB_MAX,
+	.policy		= ipoib_policy,
+	.priv_size	= sizeof(struct ipoib_dev_priv),
+	.setup		= ipoib_setup,
+	.newlink	= ipoib_new_child_link,
+	.dellink	= ipoib_unregister_child_dev,
+	.get_size	= ipoib_get_size,
+};
+
+int __init ipoib_netlink_init(void)
+{
+	return rtnl_link_register(&ipoib_link_ops);
+}
+
+void __exit ipoib_netlink_fini(void)
+{
+	rtnl_link_unregister(&ipoib_link_ops);
+}
+
+MODULE_ALIAS_RTNL_LINK("ipoib");
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index d7e9740..238bbf9 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -49,47 +49,11 @@ static ssize_t show_parent(struct device *d, struct device_attribute *attr,
 }
 static DEVICE_ATTR(parent, S_IRUGO, show_parent, NULL);
 
-int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
+int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
+		     u16 pkey, int type)
 {
-	struct ipoib_dev_priv *ppriv, *priv;
-	char intf_name[IFNAMSIZ];
 	int result;
 
-	if (!capable(CAP_NET_ADMIN))
-		return -EPERM;
-
-	ppriv = netdev_priv(pdev);
-
-	if (!rtnl_trylock())
-		return restart_syscall();
-	mutex_lock(&ppriv->vlan_mutex);
-
-	/*
-	 * First ensure this isn't a duplicate. We check the parent device and
-	 * then all of the child interfaces to make sure the Pkey doesn't match.
-	 */
-	if (ppriv->pkey == pkey) {
-		result = -ENOTUNIQ;
-		priv = NULL;
-		goto err;
-	}
-
-	list_for_each_entry(priv, &ppriv->child_intfs, list) {
-		if (priv->pkey == pkey) {
-			result = -ENOTUNIQ;
-			priv = NULL;
-			goto err;
-		}
-	}
-
-	snprintf(intf_name, sizeof intf_name, "%s.%04x",
-		 ppriv->dev->name, pkey);
-	priv = ipoib_intf_alloc(intf_name);
-	if (!priv) {
-		result = -ENOMEM;
-		goto err;
-	}
-
 	priv->max_ib_mtu = ppriv->max_ib_mtu;
 	/* MTU will be reset when mcast join happens */
 	priv->dev->mtu   = IPOIB_UD_MTU(priv->max_ib_mtu);
@@ -134,14 +98,13 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 	if (device_create_file(&priv->dev->dev, &dev_attr_parent))
 		goto sysfs_failed;
 
+	priv->child_type = type;
 	list_add_tail(&priv->list, &ppriv->child_intfs);
 
-	mutex_unlock(&ppriv->vlan_mutex);
-	rtnl_unlock();
-
 	return 0;
 
 sysfs_failed:
+	result = -ENOMEM;
 	ipoib_delete_debug_files(priv->dev);
 	unregister_netdevice(priv->dev);
 
@@ -149,11 +112,60 @@ register_failed:
 	ipoib_dev_cleanup(priv->dev);
 
 err:
+	return result;
+}
+
+int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
+{
+	struct ipoib_dev_priv *ppriv, *priv;
+	char intf_name[IFNAMSIZ];
+	struct ipoib_dev_priv *tpriv;
+	int result;
+
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	ppriv = netdev_priv(pdev);
+
+	snprintf(intf_name, sizeof intf_name, "%s.%04x",
+		 ppriv->dev->name, pkey);
+	priv = ipoib_intf_alloc(intf_name);
+	if (!priv)
+		return -ENOMEM;
+
+	if (!rtnl_trylock())
+		return restart_syscall();
+
+	mutex_lock(&ppriv->vlan_mutex);
+
+	/*
+	 * First ensure this isn't a duplicate. We check the parent device and
+	 * then all of the legacy child interfaces to make sure the Pkey
+	 * doesn't match.
+	 */
+	if (ppriv->pkey == pkey) {
+		result = -ENOTUNIQ;
+		goto out;
+	}
+
+	list_for_each_entry(tpriv, &ppriv->child_intfs, list) {
+		if (tpriv->pkey == pkey &&
+		    tpriv->child_type == IPOIB_LEGACY_CHILD) {
+			result = -ENOTUNIQ;
+			goto out;
+		}
+	}
+
+	result = __ipoib_vlan_add(ppriv, priv, pkey, IPOIB_LEGACY_CHILD);
+
+out:
 	mutex_unlock(&ppriv->vlan_mutex);
-	rtnl_unlock();
-	if (priv)
+
+	if (result)
 		free_netdev(priv->dev);
 
+	rtnl_unlock();
+
 	return result;
 }
 
@@ -171,9 +183,9 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
 		return restart_syscall();
 	mutex_lock(&ppriv->vlan_mutex);
 	list_for_each_entry_safe(priv, tpriv, &ppriv->child_intfs, list) {
-		if (priv->pkey == pkey) {
+		if (priv->pkey == pkey &&
+		    priv->child_type == IPOIB_LEGACY_CHILD) {
 			unregister_netdevice(priv->dev);
-			ipoib_dev_cleanup(priv->dev);
 			list_del(&priv->list);
 			dev = priv->dev;
 			break;
diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index ac173bd..24c0dd0 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -398,4 +398,15 @@ struct ifla_port_vsi {
 	__u8 pad[3];
 };
 
+
+/* IPoIB section */
+
+enum {
+	IFLA_IPOIB_UNSPEC,
+	IFLA_IPOIB_PKEY,
+	__IFLA_IPOIB_MAX
+};
+
+#define IFLA_IPOIB_MAX (__IFLA_IPOIB_MAX - 1)
+
 #endif /* _LINUX_IF_LINK_H */
-- 
1.7.1


* [PATCH 2/4] net_sched: gred: eliminate redundant DP prio comparisons
From: David Ward @ 2012-09-13 15:22 UTC
  To: netdev; +Cc: Bruce Osler, Cyril Chemparathy, Jamal Hadi Salim, David Ward
In-Reply-To: <1347549755-19438-1-git-send-email-david.ward@ll.mit.edu>

Each pair of DPs only needs to be compared once when searching for
a non-unique prio value.
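
The idea can be shown outside the kernel with a minimal sketch, assuming
the DP table is modelled as two parallel arrays. has_duplicate_prio() is
a hypothetical name, not a function from sch_gred.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a table of drop precedences (DPs):
 * present[i] says whether slot i is configured, prio[i] is its priority.
 * Returns true if two configured slots share a priority.
 */
static bool has_duplicate_prio(const int *prio, const bool *present, size_t n)
{
	assert(prio != NULL && present != NULL);

	for (size_t i = 0; i < n; i++) {
		if (!present[i])
			continue;
		/* Slots before i were already compared against slot i. */
		for (size_t j = i + 1; j < n; j++)
			if (present[j] && prio[j] == prio[i])
				return true;
	}
	return false;
}
```

Starting the inner loop at i + 1 also makes a "not the same slot" test
unnecessary, since a slot is never compared with itself.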

Signed-off-by: David Ward <david.ward@ll.mit.edu>
---
 net/sched/sch_gred.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index fca73cd..e19d4eb 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -102,9 +102,8 @@ static inline int gred_wred_mode_check(struct Qdisc *sch)
 		if (q == NULL)
 			continue;
 
-		for (n = 0; n < table->DPs; n++)
-			if (table->tab[n] && table->tab[n] != q &&
-			    table->tab[n]->prio == q->prio)
+		for (n = i + 1; n < table->DPs; n++)
+			if (table->tab[n] && table->tab[n]->prio == q->prio)
 				return 1;
 	}
 
-- 
1.7.4.1


* re: gre: Support GRE over IPv6
From: Dan Carpenter @ 2012-09-13 16:01 UTC
  To: xeb; +Cc: netdev

Hello Dmitry Kozlov,

The patch c12b395a4664: "gre: Support GRE over IPv6" from Aug 10, 
2012, leads to the following warning:
net/ipv6/ip6_gre.c:1299 ip6gre_header_parse()
	 error: memcpy() 'haddr' too small (8 vs 16)

net/ipv6/ip6_gre.c
  1296  static int ip6gre_header_parse(const struct sk_buff *skb, unsigned char *haddr)
  1297  {
  1298          const struct ipv6hdr *ipv6h = (const struct ipv6hdr *)skb_mac_header(skb);
  1299          memcpy(haddr, &ipv6h->saddr, sizeof(struct in6_addr));
                       ^^^^^
Smatch thinks this buffer is only 8 characters sometimes.

  1300          return sizeof(struct in6_addr);
  1301  }

One call tree where this would happen would be the
(struct sockaddr_ll *)sll->sll_addr[] in packet_rcv().

-> packet_rcv()
   -> dev_parse_header()
      -> ip6gre_header_parse()

I don't know the code well enough to say if this is a bug or not.  Could
you take a look?
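
For illustration only, the size mismatch can be reproduced in userspace
with a bounded copy. copy_hw_addr() is a hypothetical helper, not a
proposed fix: truncating an IPv6 address to 8 bytes may well be the
wrong behaviour for this driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LL_ADDR_LEN   8		/* size of sll_addr[] in struct sockaddr_ll */
#define IN6_ADDR_LEN 16		/* sizeof(struct in6_addr) */

/*
 * Hypothetical bounded variant of a header_parse-style copy: never
 * writes more than dst_len bytes, and returns how many were copied.
 */
static size_t copy_hw_addr(unsigned char *dst, size_t dst_len,
			   const unsigned char *src, size_t src_len)
{
	size_t n = src_len < dst_len ? src_len : dst_len;

	assert(dst != NULL && src != NULL);
	memcpy(dst, src, n);
	return n;
}
```

Copying IN6_ADDR_LEN bytes into an LL_ADDR_LEN buffer with a plain
memcpy() would write 8 bytes past the end, which is the case the warning
describes.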

regards,
dan carpenter


* Re: [PATCH net-next] drivers/net: Enable IOMMU pass through for be2net
From: Craig Hada @ 2012-09-13 17:06 UTC
  To: Hada, Craig M, netdev
  Cc: sathya.perla, subbu.seetharaman, ajit.khaparde, linux-kernel
In-Reply-To: <4988B24899FF4448BF3D99EFCD2A3562323AD5@G1W3785.americas.hpqcorp.net>

On 9/13/2012 9:27 AM, Hada, Craig M wrote:
> This patch sets the coherent DMA mask to 64-bit once the be2net driver has confirmed that the system is 64-bit DMA capable. The coherent DMA mask is examined by the Intel IOMMU driver to determine whether to allow pass through context mapping for all devices. With this patch, the be2net driver combined with be2net compatible hardware provides comparable performance to the case where vt-d is disabled. The main use case for this change is to decrease the time necessary to copy virtual machine memory during KVM live migration instantiations.
>
> This patch was tested on a system that enables the IOMMU in non-coherent mode. Two DMA remapper issues were encountered and both are in the Intel IOMMU driver with the following patches submitted upstream but not yet committed.
>
> Patch 1 - DMAR:[fault reason 02] Present bit in context entry is clear
> https://lkml.org/lkml/2012/6/15/20

My apologies for posting a truncated link for the above. The correct link is https://lkml.org/lkml/2012/6/15/204

-Craig


* [PATCH] Xen backend support for paged out grant targets.
From: Andres Lagar-Cavilla @ 2012-09-13 17:28 UTC
  To: xen-devel
  Cc: Konrad Rzeszutek Wilk, Ian Campbell, David Vrabel, David Miller,
	linux-kernel, netdev, Andres Lagar-Cavilla

Since Xen-4.2, hvm domains may have portions of their memory paged out. When a
foreign domain (such as dom0) attempts to map these frames, the map will
initially fail. The hypervisor returns a suitable errno, and kicks an
asynchronous page-in operation carried out by a helper. The foreign domain is
expected to retry the mapping operation until it eventually succeeds. The
foreign domain is not put to sleep because it could itself be the one running
the pager assist (typical scenario for dom0).

This patch adds support for this mechanism for backend drivers using grant
mapping and copying operations. Specifically, this covers the blkback and
gntdev drivers (which map foreign grants), and the netback driver (which copies
foreign grants).

* Add GNTST_eagain, already exposed by Xen, to the grant interface.
* Add a retry method for grants that fail with GNTST_eagain (i.e. because the
  target foreign frame is paged out).
* Insert hooks with appropriate macro decorators in the aforementioned drivers.

The retry loop is only invoked if the grant operation status is GNTST_eagain.
It guarantees to leave a new status code different from GNTST_eagain. Any other
status code results in identical code execution as before.

The retry loop performs 256 attempts with increasing time intervals through a
32 second period. It uses msleep to yield while waiting for the next retry.
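
As a rough check of those numbers, the worst case of the retry loop can be
replayed in plain userspace C. This is only a sketch: struct retry_budget and
worst_case_budget() are hypothetical names that exist nowhere in the patch,
and the model assumes the status stays GNTST_eagain on every attempt.

```c
#include <assert.h>

/* Same value as the MAX_DELAY define in this patch. */
#define MAX_DELAY 256

/* Hypothetical bookkeeping type, used only for this illustration. */
struct retry_budget {
	unsigned int retries;   /* grant operations issued by the retry loop */
	unsigned int sleep_ms;  /* total time that would be spent in msleep() */
};

/*
 * Worst case: every attempt comes back with GNTST_eagain.
 * Mirrors the do/while loop in the patch: issue the operation,
 * sleep 'delay' ms, increment delay, stop once delay reaches max_delay.
 */
static struct retry_budget worst_case_budget(unsigned int max_delay)
{
	struct retry_budget b = { 0, 0 };
	unsigned int delay = 1;

	assert(max_delay > 0);

	do {
		b.retries++;            /* one retried grant operation */
		b.sleep_ms += delay++;  /* stands in for msleep(delay++) */
	} while (delay < max_delay);

	return b;
}
```

With MAX_DELAY at 256 the model gives 255 retries and 1 + 2 + ... + 255 =
32640 ms of sleeping. Adding the initial batched hypercall yields the 256
attempts mentioned above. In this model the final 255 ms sleep is not followed
by another attempt.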

V2 after feedback from David Vrabel:
* Explicit MAX_DELAY instead of wrap-around delay into zero
* Abstract GNTST_eagain check into core grant table code for netback module.

V3 after feedback from Ian Campbell:
* Add placeholder in array of grant table error descriptions for unrelated
  error code we jump over.
* Eliminate single map and retry macro in favor of a generic batch flavor.
* Some renaming.
* Bury most implementation in grant_table.c, cleaner interface.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
---
 drivers/net/xen-netback/netback.c   |   11 ++------
 drivers/xen/grant-table.c           |   53 +++++++++++++++++++++++++++++++++++
 drivers/xen/xenbus/xenbus_client.c  |    6 ++--
 include/xen/grant_table.h           |   12 ++++++++
 include/xen/interface/grant_table.h |    5 +++-
 5 files changed, 74 insertions(+), 13 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 682633b..05593d8 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -635,9 +635,7 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk)
 		return;
 
 	BUG_ON(npo.copy_prod > ARRAY_SIZE(netbk->grant_copy_op));
-	ret = HYPERVISOR_grant_table_op(GNTTABOP_copy, &netbk->grant_copy_op,
-					npo.copy_prod);
-	BUG_ON(ret != 0);
+	gnttab_batch_copy(netbk->grant_copy_op, npo.copy_prod);
 
 	while ((skb = __skb_dequeue(&rxq)) != NULL) {
 		sco = (struct skb_cb_overlay *)skb->cb;
@@ -1460,18 +1458,15 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk)
 static void xen_netbk_tx_action(struct xen_netbk *netbk)
 {
 	unsigned nr_gops;
-	int ret;
 
 	nr_gops = xen_netbk_tx_build_gops(netbk);
 
 	if (nr_gops == 0)
 		return;
-	ret = HYPERVISOR_grant_table_op(GNTTABOP_copy,
-					netbk->tx_copy_ops, nr_gops);
-	BUG_ON(ret);
 
-	xen_netbk_tx_submit(netbk);
+	gnttab_batch_copy(netbk->tx_copy_ops, nr_gops);
 
+	xen_netbk_tx_submit(netbk);
 }
 
 static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx)
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index eea81cf..f5681c8 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -38,6 +38,7 @@
 #include <linux/vmalloc.h>
 #include <linux/uaccess.h>
 #include <linux/io.h>
+#include <linux/delay.h>
 #include <linux/hardirq.h>
 
 #include <xen/xen.h>
@@ -823,6 +824,52 @@ unsigned int gnttab_max_grant_frames(void)
 }
 EXPORT_SYMBOL_GPL(gnttab_max_grant_frames);
 
+/* Handling of paged out grant targets (GNTST_eagain) */
+#define MAX_DELAY 256
+static inline void
+gnttab_retry_eagain_gop(unsigned int cmd, void *gop, int16_t *status,
+						const char *func)
+{
+	unsigned delay = 1;
+
+	do {
+		BUG_ON(HYPERVISOR_grant_table_op(cmd, gop, 1));
+		if (*status == GNTST_eagain)
+			msleep(delay++);
+	} while ((*status == GNTST_eagain) && (delay < MAX_DELAY));
+
+	if (delay >= MAX_DELAY) {
+		printk(KERN_ERR "%s: %s eagain grant\n", func, current->comm);
+		*status = GNTST_bad_page;
+	}
+}
+
+void gnttab_batch_map(struct gnttab_map_grant_ref *batch, unsigned count)
+{
+	struct gnttab_map_grant_ref *op;
+
+	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, batch, count))
+		BUG();
+	for (op = batch; op < batch + count; op++)
+		if (op->status == GNTST_eagain)
+			gnttab_retry_eagain_gop(GNTTABOP_map_grant_ref, op,
+									&op->status, __func__);
+}
+EXPORT_SYMBOL_GPL(gnttab_batch_map);
+
+void gnttab_batch_copy(struct gnttab_copy *batch, unsigned count)
+{
+	struct gnttab_copy *op;
+
+	if (HYPERVISOR_grant_table_op(GNTTABOP_copy, batch, count))
+		BUG();
+	for (op = batch; op < batch + count; op++)
+		if (op->status == GNTST_eagain)
+			gnttab_retry_eagain_gop(GNTTABOP_copy, op, &op->status,
+									__func__);
+}
+EXPORT_SYMBOL_GPL(gnttab_batch_copy);
+
 int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 		    struct gnttab_map_grant_ref *kmap_ops,
 		    struct page **pages, unsigned int count)
@@ -836,6 +883,12 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 	if (ret)
 		return ret;
 
+	/* Retry eagain maps */
+	for (i = 0; i < count; i++)
+		if (map_ops[i].status == GNTST_eagain)
+			gnttab_retry_eagain_gop(GNTTABOP_map_grant_ref, map_ops + i,
+                                    &map_ops[i].status, __func__);
+
 	if (xen_feature(XENFEAT_auto_translated_physmap))
 		return ret;
 
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index b3e146e..bcf3ba4 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -490,8 +490,7 @@ static int xenbus_map_ring_valloc_pv(struct xenbus_device *dev,
 
 	op.host_addr = arbitrary_virt_to_machine(pte).maddr;
 
-	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
-		BUG();
+	gnttab_batch_map(&op, 1);
 
 	if (op.status != GNTST_okay) {
 		free_vm_area(area);
@@ -572,8 +571,7 @@ int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref,
 	gnttab_set_map_op(&op, (unsigned long)vaddr, GNTMAP_host_map, gnt_ref,
 			  dev->otherend_id);
 
-	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
-		BUG();
+	gnttab_batch_map(&op, 1);
 
 	if (op.status != GNTST_okay) {
 		xenbus_dev_fatal(dev, op.status,
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 11e27c3..da9386e 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -189,4 +189,16 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
 		      struct page **pages, unsigned int count, bool clear_pte);
 
+/* Perform a batch of grant map/copy operations. Retry every batch slot
+ * for which the hypervisor returns GNTST_eagain. This is typically due 
+ * to paged out target frames.
+ *
+ * Will retry for 1, 2, ... 255 ms, i.e. 256 times during 32 seconds.
+ *
+ * The status field of each and every batch slot is guaranteed
+ * to not be GNTST_eagain on return.
+ */
+void gnttab_batch_map(struct gnttab_map_grant_ref *batch, unsigned count);
+void gnttab_batch_copy(struct gnttab_copy *batch, unsigned count);
+
 #endif /* __ASM_GNTTAB_H__ */
diff --git a/include/xen/interface/grant_table.h b/include/xen/interface/grant_table.h
index 7da811b..2f912d9 100644
--- a/include/xen/interface/grant_table.h
+++ b/include/xen/interface/grant_table.h
@@ -520,6 +520,7 @@ DEFINE_GUEST_HANDLE_STRUCT(gnttab_get_version);
 #define GNTST_permission_denied (-8) /* Not enough privilege for operation.  */
 #define GNTST_bad_page         (-9) /* Specified page was invalid for op.    */
 #define GNTST_bad_copy_arg    (-10) /* copy arguments cross page boundary */
+#define GNTST_eagain          (-12) /* Retry.                                */
 
 #define GNTTABOP_error_msgs {                   \
     "okay",                                     \
@@ -532,7 +533,9 @@ DEFINE_GUEST_HANDLE_STRUCT(gnttab_get_version);
     "no spare translation slot in the I/O MMU", \
     "permission denied",                        \
     "bad page",                                 \
-    "copy arguments cross page boundary"        \
+    "copy arguments cross page boundary",       \
+    "_placeholder_",                            \
+    "retry"                                     \
 }
 
 #endif /* __XEN_PUBLIC_GRANT_TABLE_H__ */
-- 
1.7.9.5

^ permalink raw reply related

* Re: NULL deref in bnx2 / crashes ? ( was: netconsole leads to stalled CPU task )
From: Sylvain Munaut @ 2012-09-13 17:35 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Cong Wang, netdev
In-Reply-To: <1347455135.13103.949.camel@edumazet-glaptop>

Hi,

> Yes, but I have some worries about why it is needed.
>
> Isn't it covering up a bug elsewhere?

That may very well be.

Of the few test servers I have running the same kernel, I just found
the one with netconsole active to be "stuck".

Not frozen, but all user processes are hung and it's spitting
messages about processes and CPUs being "stuck". The trace is different
in each case, depending on what the process was actually doing at the
time it got stuck.

No message with the root cause was sent to the netconsole, and nothing
was written to the logs ...

Cheers,

    Sylvain

^ permalink raw reply

* Re: [PATCH 1/4] net_sched: gred: correct comment about qavg calculation in RIO mode
From: Jamal Hadi Salim @ 2012-09-13 17:59 UTC (permalink / raw)
  To: David Ward; +Cc: netdev, Bruce Osler, Cyril Chemparathy
In-Reply-To: <1347549755-19438-1-git-send-email-david.ward@ll.mit.edu>

On 12-09-13 11:22 AM, David Ward wrote:
> Signed-off-by: David Ward <david.ward@ll.mit.edu>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

^ permalink raw reply

* Re: [PATCH 2/4] net_sched: gred: eliminate redundant DP prio comparisons
From: Jamal Hadi Salim @ 2012-09-13 18:00 UTC (permalink / raw)
  To: David Ward; +Cc: netdev, Bruce Osler, Cyril Chemparathy
In-Reply-To: <1347549755-19438-2-git-send-email-david.ward@ll.mit.edu>

On 12-09-13 11:22 AM, David Ward wrote:
> Each pair of DPs only needs to be compared once when searching for
> a non-unique prio value.
>
> Signed-off-by: David Ward <david.ward@ll.mit.edu>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

^ permalink raw reply

* Re: [PATCH 3/4] net_sched: gred: fix qave reporting via netlink
From: Jamal Hadi Salim @ 2012-09-13 18:01 UTC (permalink / raw)
  To: David Ward; +Cc: netdev, Bruce Osler, Cyril Chemparathy
In-Reply-To: <1347549755-19438-3-git-send-email-david.ward@ll.mit.edu>

On 12-09-13 11:22 AM, David Ward wrote:
> q->vars.qavg is a Wlog scaled value, but q->backlog is not. In order
> to pass q->vars.qavg as the backlog value, we need to un-scale it.
> Additionally, the qave value returned via netlink should not be Wlog
> scaled, so we need to un-scale the result of red_calc_qavg().
>
> This caused artificially high values for "Average Queue" to be shown
> by 'tc -s -d qdisc', but did not affect the actual operation of GRED.
>
> Signed-off-by: David Ward <david.ward@ll.mit.edu>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
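
For readers following along, the scaling issue described in the quoted commit
message can be sketched in userspace C. This is a simplified illustration, not
the sch_gred.c code: struct red_sketch and both helpers are hypothetical, and
the sketch only assumes that the average is kept as a fixed-point value scaled
by 2^Wlog, updated with the usual shift-based EWMA.

```c
#include <assert.h>

/* Hypothetical miniature of the RED state; names are illustrative only. */
struct red_sketch {
	unsigned int  Wlog;  /* log2(1/W), W being the EWMA weight */
	unsigned long qavg;  /* average queue length, scaled by 2^Wlog */
};

/* EWMA update in scaled form: qavg += backlog - (qavg >> Wlog). */
static void red_sketch_update(struct red_sketch *r, unsigned long backlog)
{
	assert(r->Wlog < sizeof(r->qavg) * 8);
	r->qavg = r->qavg + backlog - (r->qavg >> r->Wlog);
}

/* Value suitable for reporting: undo the 2^Wlog scaling. */
static unsigned long red_sketch_report(const struct red_sketch *r)
{
	assert(r->Wlog < sizeof(r->qavg) * 8);
	return r->qavg >> r->Wlog;
}
```

With Wlog = 3 and a constant backlog of 800, the scaled qavg settles at 6400
while the un-scaled value is 800. Reporting the scaled value directly would
overstate the average by a factor of 2^Wlog, which matches the "artificially
high" symptom described above.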

^ permalink raw reply

* Re: [PATCH 4/4] net_sched: gred: actually perform idling in WRED mode
From: Jamal Hadi Salim @ 2012-09-13 18:08 UTC (permalink / raw)
  To: David Ward; +Cc: netdev, Bruce Osler, Cyril Chemparathy
In-Reply-To: <1347549755-19438-4-git-send-email-david.ward@ll.mit.edu>

On 12-09-13 11:22 AM, David Ward wrote:
> gred_dequeue() and gred_drop() do not seem to get called when the
> queue is empty, meaning that we never start idling while in WRED
> mode. And since qidlestart is not stored by gred_store_wred_set(),
> we would never stop idling while in WRED mode if we ever started.
> This messes up the average queue size calculation that influences
> packet marking/dropping behavior.
>
> Now, we start WRED mode idling as we are removing the last packet
> from the queue. Also we now actually stop WRED mode idling when we
> are enqueuing a packet.
>
> Cc: Bruce Osler <brosler@cisco.com>
> Signed-off-by: David Ward <david.ward@ll.mit.edu>

This one is not so obvious. I am assuming you vetted it via some tests.
In which case:
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply

* Re: [PATCH] Xen backend support for paged out grant targets.
From: Ian Campbell @ 2012-09-13 18:11 UTC (permalink / raw)
  To: Andres Lagar-Cavilla
  Cc: xen-devel@xen.lists.org, Konrad Rzeszutek Wilk, David Vrabel,
	David Miller, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <1347557306-25453-1-git-send-email-andres@lagarcavilla.org>

On Thu, 2012-09-13 at 18:28 +0100, Andres Lagar-Cavilla wrote:
> 
> * Add placeholder in array of grant table error descriptions for
> unrelated error code we jump over. 

Why not just define it? It's listed here:
http://xenbits.xen.org/docs/unstable/hypercall/include,public,grant_table.h.html#Enum_grant_status

I notice the specific wording of the error msg is different here too.
It'd probably be best to use the same wording as the Xen definition, so
all OSes end up with the same name for the same condition (else bug
reports will be confusing).

Ian.

^ permalink raw reply

* Re: [PATCH v4 0/8] cgroup: Assign subsystem IDs during compile time
From: Tejun Heo @ 2012-09-13 18:13 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: netdev, cgroups, Daniel Wagner, David S. Miller, Paul E. McKenney,
	Andrew Morton, Eric Dumazet, Gao feng, Glauber Costa, Herbert Xu,
	Jamal Hadi Salim, John Fastabend, Kamezawa Hiroyuki, Li Zefan,
	Neil Horman
In-Reply-To: <1347459128-32236-1-git-send-email-wagi@monom.org>

On Wed, Sep 12, 2012 at 04:12:00PM +0200, Daniel Wagner wrote:
> I've removed the useless test in patch #4 and updated the commit message
> on patch #7. 
> 
> While rewriting the commit message #7 I realized the pointer check was
> completely wrong. Instead of testing the return value of
> task_subsys_state(), I tested the pointer returned by container_of. For
> more details on this see the commit message.
> 
> Because of this I added Herbert and Paul to the Cc list. Please have
> a close look at my rambling on the RCU part in patch #7. Thanks a lot!
> 
> This series is against 
> 
>      git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-3.7

Applied to cgroup/for-3.7.

Thanks!

-- 
tejun

^ permalink raw reply

* Re: [PATCH 0/4] netfilter updates for 3.6-rc5
From: David Miller @ 2012-09-13 18:25 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <1347533648-3451-1-git-send-email-pablo@netfilter.org>

From: pablo@netfilter.org
Date: Thu, 13 Sep 2012 12:54:04 +0200

> The following patchset contains four updates for your net tree, they are:
> 
> * Fix crash on timewait sockets, since the TCP early demux was added,
>   in nfnetlink_log, from Eric Dumazet.
> 
> * Fix broken syslog log-level for xt_LOG and ebt_log since printk format was
>   converted from <.> to a 2 bytes pattern using ASCII SOH, from Joe Perches.
> 
> * Two security fixes for the TCP connection tracking targeting off-path attacks,
>   from Jozsef Kadlecsik. The problem was discovered by Jan Wrobel and it is
>   documented in: http://mixedbit.org/reflection_scan/reflection_scan.pdf.
> 
> You can pull these changes from:
> 
> git://1984.lsi.us.es/nf master

Pulled, thanks Pablo.

^ permalink raw reply

* Re: [PATCH 0/5] Netfilter updates for net-next
From: David Miller @ 2012-09-13 18:26 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <1347534092-3579-1-git-send-email-pablo@netfilter.org>

From: pablo@netfilter.org
Date: Thu, 13 Sep 2012 13:01:27 +0200

> The following patchset contains four Netfilter updates, mostly aimed
> at fixing issues introduced with IPv6 NAT, and one little IPVS update for net-next:
> 
> * Remove unneeded conditional free of skb in nfnetlink_queue, from
>   Wei Yongjun.
> 
> * One semantic patch from coccinelle detected the use of list_del +
>   INIT_LIST_HEAD, instead of list_del_init, again from Wei Yongjun.
> 
> * Fix out-of-bound memory access in the NAT address selection, from
>   Florian Westphal. This was introduced with the IPv6 NAT patches.
> 
> * Two fixes for crashes that were introduced in the recently merged
>   IPv6 NAT support, from myself.
> 
> You can pull these changes from:
> 
> git://1984.lsi.us.es/nf-next master

Also pulled, thanks a lot.

^ permalink raw reply

* Re: [PATCH] mISDN: Fix wrong usage of flush_work_sync while holding locks
From: David Miller @ 2012-09-13 18:59 UTC (permalink / raw)
  To: keil; +Cc: netdev, stable
In-Reply-To: <1347546980-15923-1-git-send-email-keil@b1-systems.de>

From: Karsten Keil <keil@b1-systems.de>
Date: Thu, 13 Sep 2012 16:36:20 +0200

> It is a bad idea to hold a spinlock and call flush_work_sync.
> Move the workqueue cleanup outside the spinlock and use cancel_work_sync;
> when closing the channel this seems to be the more correct function.
> Remove the never used and constant return value of mISDN_freebchannel.
> 
> Signed-off-by: Karsten Keil <keil@b1-systems.de>
> Cc: <stable@kernel.org>

Applied, thanks.

^ permalink raw reply

* Re: [net-next PATCH 0/3] bnx2x: Link flap avoidance added
From: David Miller @ 2012-09-13 19:01 UTC (permalink / raw)
  To: yuvalmin; +Cc: netdev, eilong, ariele
In-Reply-To: <1347540981-16198-1-git-send-email-yuvalmin@broadcom.com>

From: "Yuval Mintz" <yuvalmin@broadcom.com>
Date: Thu, 13 Sep 2012 15:56:18 +0300

> In various flows in the bnx2x driver, the link is toggled unnecessarily.
> In such flows, if the link is already up it would be pulled down, then
> raised again, even if no change in the link was requested by the
> user.
> 
> This patch series tries to eliminate this problem, or at least to greatly
> reduce the number of cases that would actually cause such a scenario to
> happen.
> 
> Please consider applying this patch series to 'net-next'.

All applied, thanks.

^ permalink raw reply

* Re: [PATCH] mISDN: Fix wrong usage of flush_work_sync while holding locks
From: David Miller @ 2012-09-13 19:06 UTC (permalink / raw)
  To: keil; +Cc: netdev, stable
In-Reply-To: <20120913.145937.235693485700301637.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Thu, 13 Sep 2012 14:59:37 -0400 (EDT)

> From: Karsten Keil <keil@b1-systems.de>
> Date: Thu, 13 Sep 2012 16:36:20 +0200
> 
>> It is a bad idea to hold a spinlock and call flush_work_sync.
>> Move the workqueue cleanup outside the spinlock and use cancel_work_sync;
>> when closing the channel this seems to be the more correct function.
>> Remove the never used and constant return value of mISDN_freebchannel.
>> 
>> Signed-off-by: Karsten Keil <keil@b1-systems.de>
>> Cc: <stable@kernel.org>
> 
> Applied, thanks.

BTW, about -stable:

1) Even if it were appropriate to submit this directly to -stable,
   stable@kernel.org is not the correct email address and you must
   have seen the bounce produced by trying to send email there.

   Rather, stable@vger.kernel.org is the correct address.

2) I queue up and submit networking bug fixes for -stable myself so
   you should not submit them directly but rather make a request that
   I add your patch to my networking -stable queue.

Thanks.

^ permalink raw reply

* Re: [PATCH 4/4] net_sched: gred: actually perform idling in WRED mode
From: Ward, David - 0663 - MITLL @ 2012-09-13 19:38 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: netdev@vger.kernel.org, Bruce Osler, Cyril Chemparathy
In-Reply-To: <50522106.40404@mojatatu.com>

[-- Attachment #1: Type: text/plain, Size: 1237 bytes --]

On 13/09/12 14:08, Jamal Hadi Salim wrote:
> On 12-09-13 11:22 AM, David Ward wrote:
>> gred_dequeue() and gred_drop() do not seem to get called when the
>> queue is empty, meaning that we never start idling while in WRED
>> mode. And since qidlestart is not stored by gred_store_wred_set(),
>> we would never stop idling while in WRED mode if we ever started.
>> This messes up the average queue size calculation that influences
>> packet marking/dropping behavior.
>>
>> Now, we start WRED mode idling as we are removing the last packet
>> from the queue. Also we now actually stop WRED mode idling when we
>> are enqueuing a packet.
>>
>> Cc: Bruce Osler <brosler@cisco.com>
>> Signed-off-by: David Ward <david.ward@ll.mit.edu>
> This one is not so obvious. I am assuming you vetted it via some tests.
> In which case:
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
>
> cheers,
> jamal
>

Before applying this patch, the average queue size (as seen with "tc -s 
qdisc") remained constant forever after I stopped sending any packets 
through the interface -- it didn't taper off as you would expect.  After 
the patch, the average queue size will now taper off if packets are not 
being sent.
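
The tapering behaviour can be sketched with the textbook RED idle rule, where
the average is multiplied by (1 - W) once for every packet time the queue has
sat empty. This is a simplified illustration under stated assumptions, not the
kernel code path (which, as far as I know, works from a precomputed table):
idle_decay() and its parameter m are hypothetical names, and W is assumed to
be 2^-Wlog.

```c
#include <assert.h>

/*
 * Textbook RED idle decay: after the queue has been empty for roughly
 * m packet transmission times, avg <- (1 - W)^m * avg.
 * With W = 2^-Wlog, a single step is avg -= avg >> Wlog.
 * Integer rounding means the value stops shrinking below 2^Wlog.
 */
static unsigned long idle_decay(unsigned long avg, unsigned int Wlog,
				unsigned int m)
{
	assert(Wlog < sizeof(avg) * 8);

	while (m--)
		avg -= avg >> Wlog;

	return avg;
}
```

If idling is never started, m is effectively always zero and the average stays
put, which is the "remained constant forever" symptom. Once idling is tracked,
each idle packet time shaves a fraction W off the average.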

David


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4571 bytes --]

^ permalink raw reply

* Re: [PATCH] Xen backend support for paged out grant targets.
From: Andres Lagar-Cavilla @ 2012-09-13 19:45 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Andres Lagar-Cavilla, xen-devel@xen.lists.org,
	Konrad Rzeszutek Wilk, David Vrabel, David Miller,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <1347559899.25803.64.camel@dagon.hellion.org.uk>


On Sep 13, 2012, at 2:11 PM, Ian Campbell wrote:

> On Thu, 2012-09-13 at 18:28 +0100, Andres Lagar-Cavilla wrote:
>> 
>> * Add placeholder in array of grant table error descriptions for
>> unrelated error code we jump over. 
> 
> Why not just define it? It's listed here:
> http://xenbits.xen.org/docs/unstable/hypercall/include,public,grant_table.h.html#Enum_grant_status
Well, a) we'd be defining something no one will be using (for the moment), and b) I would be signing off on something unrelated.
> 
> I notice the specific wording of the error msg is different here too.
That should not be, I'll re-spin the patch.

Thanks
Andres

> It'd probably be best to use the same wording as the Xen definition, so
> all OSes end up with the same name for the same condition (else bug
> reports will be confusing).
> 
> Ian.
> 

^ permalink raw reply

* Re: [PATCH] mISDN: Fix wrong usage of flush_work_sync while holding locks
From: Karsten Keil @ 2012-09-13 19:50 UTC (permalink / raw)
  To: David Miller; +Cc: keil, netdev
In-Reply-To: <20120913.150659.17331564270446673.davem@davemloft.net>

Am 13.09.2012 21:06, schrieb David Miller:
> From: David Miller <davem@davemloft.net>
> Date: Thu, 13 Sep 2012 14:59:37 -0400 (EDT)
> 
>> From: Karsten Keil <keil@b1-systems.de>
>> Date: Thu, 13 Sep 2012 16:36:20 +0200
>>
>>> It is a bad idea to hold a spinlock and call flush_work_sync.
>>> Move the workqueue cleanup outside the spinlock and use cancel_work_sync;
>>> when closing the channel this seems to be the more correct function.
>>> Remove the never used and constant return value of mISDN_freebchannel.
>>>
>>> Signed-off-by: Karsten Keil <keil@b1-systems.de>
>>> Cc: <stable@kernel.org>
>>
>> Applied, thanks.
> 
> BTW, about -stable:
> 
> 1) Even if it were appropriate to submit this directly to -stable,
>    stable@kernel.org is not the correct email address and you must
>    have seen the bounce produced by trying to send email there.
> 
>    Rather, stable@vger.kernel.org is the correct address.
> 
> 2) I queue up and submit networking bug fixes for -stable myself so
>    you should not submit them directly but rather make a request that
>    I add your patch to my networking -stable queue.
> 

OK, next time I will put something like "Should be considered for stable
too" in the comment and will not add the CC: stable@vger.kernel.org.

Thanks.

^ permalink raw reply

