Netdev List
 help / color / mirror / Atom feed
* Re: [PULL] virtio: virtio 1.0 support, misc patches
From: Michael S. Tsirkin @ 2014-12-11 22:01 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: sergei.shtylyov, kvm, netdev, linux-kernel, virtualization,
	Linus Torvalds, pbonzini, ben, David Miller, thuth
In-Reply-To: <20141212080705.50988d51@canb.auug.org.au>

On Fri, Dec 12, 2014 at 08:07:05AM +1100, Stephen Rothwell wrote:
> Hi Michael,
> 
> On Thu, 11 Dec 2014 14:02:48 +0200 "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >
> > The following changes since commit b2776bf7149bddd1f4161f14f79520f17fc1d71d:
> > 
> >   Linux 3.18 (2014-12-07 14:21:05 -0800)
> 
> hmmm ...
> 
> > are available in the git repository at:
> > 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
> > 
> > for you to fetch changes up to 803cd18f7b5e6c7ad6bee9571ae8f4450190ab58:
> > 
> >   virtio_ccw: finalize_features error handling (2014-12-09 16:32:41 +0200)
> > 
> > Note: some net drivers are affected by these patches.
> > David said he's fine with merging these patches through
> > my tree.
> > Rusty's on vacation, he acked using my tree for these, too.
> 
> So, either none of this has been in linux-next or you have just rebased
> it all on top of v3.18 ...

It was on linux next for a while, then I wanted to tweak the commit log
for some messages a bit.
So since I rewrote the history anyway, I went ahead and rebased it
on v3.18, after this rebase it's been in linux-next for several days.

> -- 
> Cheers,
> Stephen Rothwell                    sfr@canb.auug.org.au

^ permalink raw reply

* Re: [PATCH v7 0/4] arch: Add lightweight memory barriers for coherent memory access
From: Alexander Duyck @ 2014-12-11 22:04 UTC (permalink / raw)
  To: David Miller
  Cc: alexander.h.duyck, linux-arch, netdev, linux-kernel, arnd,
	mathieu.desnoyers, peterz, benh, heiko.carstens, mingo, mikey,
	linux, donald.c.skidmore, matthew.vick, geert, jeffrey.t.kirsher,
	romieu, paulmck, nic_swsd, will.deacon, michael, tony.luck,
	torvalds, oleg, schwidefsky, fweisbec
In-Reply-To: <20141211.153204.228153607927530694.davem@davemloft.net>

On 12/11/2014 12:32 PM, David Miller wrote:
> From: Alexander Duyck <alexander.duyck@gmail.com>
> Date: Wed, 10 Dec 2014 21:28:39 -0800
>
>> It occurs to me that I never got a sign off from any of the maintainers
>> on getting this pulled in.
>>
>> Since the merge window is open I was wondering which tree I should make
>> sure these patches apply to and who will be the one to pull them in?
>> Since I was modifying network drivers should I resubmit them for netdev,
>> or should I submit them for asm-generic or some other tree?
> I have no problem taking this via my tree, but I want to see agreement
> from other interested parties.

I'll resubmit them as a net-next submission after I verify that they
still apply cleanly and there aren't any build issues.

- Alex

^ permalink raw reply

* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
From: Jiri Pirko @ 2014-12-11 22:25 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen, linville,
	vyasevic, netdev, davem, shm, gospo
In-Reply-To: <5489E214.4080708@cumulusnetworks.com>

Thu, Dec 11, 2014 at 07:27:32PM CET, roopa@cumulusnetworks.com wrote:
>On 12/11/14, 10:07 AM, Jiri Pirko wrote:
>>Thu, Dec 11, 2014 at 06:59:15PM CET, roopa@cumulusnetworks.com wrote:
>>>On 12/11/14, 9:11 AM, Jiri Pirko wrote:
>>>>Thu, Dec 11, 2014 at 05:52:10PM CET, roopa@cumulusnetworks.com wrote:
>>>>>On 12/10/14, 1:37 AM, Jiri Pirko wrote:
>>>>>>Wed, Dec 10, 2014 at 10:05:18AM CET, roopa@cumulusnetworks.com wrote:
>>>>>>>From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>>>
>>>>>>>This patch adds two new api's netdev_switch_port_bridge_setlink
>>>>>>>and netdev_switch_port_bridge_dellink to offload bridge port attributes
>>>>>>>to switch asic
>>>>>>>
>>>>>>>(The names of the apis look odd with 'switch_port_bridge',
>>>>>>>but am more inclined to change the prefix of the api to something else.
>>>>>>>Will take any suggestions).
>>>>>>>
>>>>>>>The api's look at the NETIF_F_HW_NETFUNC_OFFLOAD feature flag to
>>>>>>>pass bridge port attributes to the port device.
>>>>>>>
>>>>>>>If the device has the NETIF_F_HW_NETFUNC_OFFLOAD, but does not support
>>>>>>>the bridge port attribute offload ndo, call bridge port attribute ndo's on
>>>>>>>the lowerdevs if supported. This is one way to pass bridge port attributes
>>>>>>>through stacked netdevs (example when bridge port is a bond and bond slaves
>>>>>>>are switch ports).
>>>>>>>
>>>>>>>Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>>>---
>>>>>>>include/net/switchdev.h   |    5 +++-
>>>>>>>net/switchdev/switchdev.c |   70 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>2 files changed, 74 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>>diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>>>>>index 8a6d164..22676b6 100644
>>>>>>>--- a/include/net/switchdev.h
>>>>>>>+++ b/include/net/switchdev.h
>>>>>>>@@ -17,7 +17,10 @@
>>>>>>>int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>>>				struct netdev_phys_item_id *psid);
>>>>>>>int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
>>>>>>>-
>>>>>>>+int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>>>+				struct nlmsghdr *nlh, u16 flags);
>>>>>>>+int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>>>+				struct nlmsghdr *nlh, u16 flags);
>>>>>>>#else
>>>>>>>
>>>>>>>static inline int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>>>diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>>>>>>index d162b21..62317e1 100644
>>>>>>>--- a/net/switchdev/switchdev.c
>>>>>>>+++ b/net/switchdev/switchdev.c
>>>>>>>@@ -50,3 +50,73 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
>>>>>>>	return ops->ndo_switch_port_stp_update(dev, state);
>>>>>>>}
>>>>>>>EXPORT_SYMBOL(netdev_switch_port_stp_update);
>>>>>>>+
>>>>>>>+/**
>>>>>>>+ *	netdev_switch_port_bridge_setlink - Notify switch device port of bridge
>>>>>>>+ *	port attributes
>>>>>>>+ *
>>>>>>>+ *	@dev: port device
>>>>>>>+ *	@nlh: netlink msg with bridge port attributes
>>>>>>>+ *
>>>>>>>+ *	Notify switch device port of bridge port attributes
>>>>>>>+ */
>>>>>>>+int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>>>+									  struct nlmsghdr *nlh, u16 flags)
>>>>>>>+{
>>>>>>>+	const struct net_device_ops *ops = dev->netdev_ops;
>>>>>>>+	struct net_device *lower_dev;
>>>>>>>+	struct list_head *iter;
>>>>>>>+	int ret = 0, err = 0;
>>>>>>>+
>>>>>>>+	if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>>>+		return err;
>>>>>>>+
>>>>>>>+	if (ops->ndo_bridge_setlink) {
>>>>>>>+	    WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>>>+	    return ops->ndo_bridge_setlink(dev, nlh, flags);
>>>>>>	You have to change ndo_bridge_setlink in netdevice.h first.
>>>>>>	Otherwise when only this patch is applied (during bisection)
>>>>>>	this won't compile.
>>>>>ack, will fix it and keep that in mind next time.
>>>>>>>+	}
>>>>>>>+
>>>>>>>+	netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>>>	I do not understand why to iterate over lower devices. At this
>>>>>>	stage we don't know a thing about this upper or its lowers. Let
>>>>>>	the uppers (/masters) to decide if this needs to be propagated
>>>>>>	or not.
>>>>>Jiri, In the stacked devices case, there is no way to propagate the bridge
>>>>>port attributes to switch device driver today (vlan and other bridge port
>>>>>attributes). Can you tell me if there is a way ?. no, ndo_vlan* ndo's are not
>>>>>useful here. Nor we should go and implement ndo_bridge_setlink* in all
>>>>>devices that can be bridge ports.
>>>>Hmm. I just think that is cleaner to implement ndo_bridge_setlink in
>>>>bonding for example and let it propagate the the call to slaves.
>>>No, that will require bridge attribute support in all drivers. And that is no
>>>good.
>>Not all drivers, just all masters which want to support this. Like bond,
>>team, macvlan etc. That would be the same as for
>>ndo_vlan_rx_add_vid/ndo_vlan_rx_kill_vid/ndo_change_mtu etc. I do not
>>see any problem in that. It is much much clearer over big hammer iterate
>>over lowers in my opinion.
>
>You cannot avoid the lowerdev iteration in any case.
>If you added it in the individual drivers: bond, macvlan and other drivers
>will all have to do the same thing.
>ie Call bridge setlink on lowerdevs.

I feel that the right way is to let masters propagate that themselves in
their code. That's it. I might be wrong of course.


>My patch avoids the need to modify these drivers. Besides it does this only
>when the OFFLOAD flag is set.


Yep, well in my reply to another patch of you series I expressed my
feeling that the flag should be really checked in particular switch
driver, not core. But I might be wrong there as well...


>
>It will not stop at adding the ndo_bridge_setlink to bond/macvlan etc. It
>will be all other ndo_ops we will need for switch asics.
>It will be l3 tomorrow, if the route is through a bond (But at that point, we
>may end up having to introduce switch device instead of going to the port.
>Lets see).
>
>Today this patch introduces an abstract way to get to the switch driver by
>getting to the slave switch port (And only when the OFFLOAD flag is set).
>
>
>>
>>
>>>>Let every "upper" to handle ndo_bridge_setlink their way. Sometimes it
>>>>might not make sense to propagate to "lowers".
>>>This does not really propagate to lowers. It is just trying to get to a
>>>switch port and from there to the switch driver.
>>>Example, bond driver does not need to care if its a bridge port. It will
>>>simply pass the call to its slave which
>>>might be a switch port.
>>>
>>>bond driver does not care if its a bridge port. But the switch driver cares,
>>>because it knows that the bond was created with switch ports.
>>>
>>>
>>>>>And this allows a switch driver to receive these callbacks if it has marked
>>>>>the switch port with an offload flag. Your way of using the switch port to
>>>>>get to the switch driver does not help in these cases.
>>>>I do not follow how this is related to this case (stacked layout).
>>>>
>>>>>The other option is to use the 'switch device (not port)' to get to the
>>>>>switch driver.
>>>>That would not help this case (stacked layout) I believe.
>>>>
>>>>
>>>>>This patch shows that you can still do this with the ndo ops.
>>>>>>>+		err = netdev_switch_port_bridge_setlink(lower_dev, nlh, flags);
>>>>>>>+		if (err)
>>>>>>>+			ret = err;
>>>>>>>+    }
>>>>>>  ^^^^^ Indent is off. This should be catched by scripts/checkpatch.pl.
>>>>>>
>>>>>>>+
>>>>>>>+	return ret;
>>>>>>>+}
>>>>>>>+EXPORT_SYMBOL(netdev_switch_port_bridge_setlink);
>>>>>>>+
>>>>>>>+/**
>>>>>>>+ *	netdev_switch_port_bridge_dellink - Notify switch device port of bridge
>>>>>>>+ *	attribute delete
>>>>>>>+ *
>>>>>>>+ *	@dev: port device
>>>>>>>+ *	@nlh: netlink msg with bridge port attributes
>>>>>>>+ *
>>>>>>>+ *	Notify switch device port of bridge port attribute delete
>>>>>>>+ */
>>>>>>>+int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>>>+									  struct nlmsghdr *nlh, u16 flags)
>>>>>>>+{
>>>>>>>+	const struct net_device_ops *ops = dev->netdev_ops;
>>>>>>>+	struct net_device *lower_dev;
>>>>>>>+	struct list_head *iter;
>>>>>>>+	int ret = 0, err = 0;
>>>>>>>+
>>>>>>>+	if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>>>+		return err;
>>>>>>>+
>>>>>>>+	if (ops->ndo_bridge_dellink) {
>>>>>>>+		WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>>>+		return ops->ndo_bridge_dellink(dev, nlh, flags);
>>>>>>>+	}
>>>>>>>+
>>>>>>>+	netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>>>>+		err = netdev_switch_port_bridge_dellink(lower_dev, nlh, flags);
>>>>>>>+		if (err)
>>>>>>>+			ret = err;
>>>>>>>+	}
>>>>>>>+
>>>>>>>+	return ret;
>>>>>>>+}
>>>>>>>+EXPORT_SYMBOL(netdev_switch_port_bridge_dellink);
>>>>>>>-- 
>>>>>>>1.7.10.4
>>>>>>>
>>>>--
>>>>To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>>the body of a message to majordomo@vger.kernel.org
>>>>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* [PATCH iproute2 v3] ip: Simplify executing ip cmd within network ns
From: Vadim Kochan @ 2014-12-11 22:32 UTC (permalink / raw)
  To: netdev; +Cc: Vadim Kochan

From: Vadim Kochan <vadim4j@gmail.com>

Added new '-netns' option to simplify executing following cmd:

    ip netns exec NETNS ip OPTIONS COMMAND OBJECT

    to

    ip -n[etns] NETNS OPTIONS COMMAND OBJECT

e.g.:

    ip -net vnet0 link add br0 type bridge
    ip -n vnet0 link

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
---
Changes v2 -> v3
    Switching to netns w/o execvp: suggested by Jiri Pirko
                                                Jiri Benc

So I moved switching netns to lib/utils.c, looks
not very good for me, but it allows easy reuse this
netns_switch func for other utils in the future (tc, bridge, ...).

May be it would be better to have separated lib/netns.c module and
include/netns.h header ?

Changes v1 -> v2
    use -n[etns] option name:      suggested by Nicolas Dichtel
    changed man ip.8 page

 include/utils.h |  27 +++++++++++++++
 ip/ip.c         |   4 +++
 ip/ipnetns.c    | 105 ++------------------------------------------------------
 lib/Makefile    |   4 +++
 lib/utils.c     |  88 +++++++++++++++++++++++++++++++++++++++++++++++
 man/man8/ip.8   |  23 ++++++++++++-
 6 files changed, 147 insertions(+), 104 deletions(-)

diff --git a/include/utils.h b/include/utils.h
index eef9c42..1fd012c 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -159,4 +159,31 @@ struct iplink_req;
 int iplink_parse(int argc, char **argv, struct iplink_req *req,
 		char **name, char **type, char **link, char **dev,
 		int *group, int *index);
+
+#define NETNS_RUN_DIR "/var/run/netns"
+#define NETNS_ETC_DIR "/etc/netns"
+
+#ifndef CLONE_NEWNET
+#define CLONE_NEWNET 0x40000000	/* New network namespace (lo, device, names sockets, etc) */
+#endif
+
+#ifndef MNT_DETACH
+#define MNT_DETACH	0x00000002	/* Just detach from the tree */
+#endif /* MNT_DETACH */
+
+/* sys/mount.h may be out too old to have these */
+#ifndef MS_REC
+#define MS_REC		16384
+#endif
+
+#ifndef MS_SLAVE
+#define MS_SLAVE	(1 << 19)
+#endif
+
+#ifndef MS_SHARED
+#define MS_SHARED	(1 << 20)
+#endif
+
+extern int netns_switch(char *netns);
+
 #endif /* __UTILS_H__ */
diff --git a/ip/ip.c b/ip/ip.c
index 5f759d5..5da10bf 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -262,6 +262,10 @@ int main(int argc, char **argv)
 			rcvbuf = size;
 		} else if (matches(opt, "-help") == 0) {
 			usage();
+		} else if (matches(opt, "-netns") == 0) {
+			NEXT_ARG();
+			if (netns_switch(argv[1]))
+				exit(-1);
 		} else {
 			fprintf(stderr, "Option \"%s\" is unknown, try \"ip -help\".\n", opt);
 			exit(-1);
diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index 1c8aa02..dd183e1 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -18,42 +18,6 @@
 #include "utils.h"
 #include "ip_common.h"
 
-#define NETNS_RUN_DIR "/var/run/netns"
-#define NETNS_ETC_DIR "/etc/netns"
-
-#ifndef CLONE_NEWNET
-#define CLONE_NEWNET 0x40000000	/* New network namespace (lo, device, names sockets, etc) */
-#endif
-
-#ifndef MNT_DETACH
-#define MNT_DETACH	0x00000002	/* Just detach from the tree */
-#endif /* MNT_DETACH */
-
-/* sys/mount.h may be out too old to have these */
-#ifndef MS_REC
-#define MS_REC		16384
-#endif
-
-#ifndef MS_SLAVE
-#define MS_SLAVE	(1 << 19)
-#endif
-
-#ifndef MS_SHARED
-#define MS_SHARED	(1 << 20)
-#endif
-
-#ifndef HAVE_SETNS
-static int setns(int fd, int nstype)
-{
-#ifdef __NR_setns
-	return syscall(__NR_setns, fd, nstype);
-#else
-	errno = ENOSYS;
-	return -1;
-#endif
-}
-#endif /* HAVE_SETNS */
-
 static int usage(void)
 {
 	fprintf(stderr, "Usage: ip netns list\n");
@@ -101,42 +65,12 @@ static int netns_list(int argc, char **argv)
 	return 0;
 }
 
-static void bind_etc(const char *name)
-{
-	char etc_netns_path[MAXPATHLEN];
-	char netns_name[MAXPATHLEN];
-	char etc_name[MAXPATHLEN];
-	struct dirent *entry;
-	DIR *dir;
-
-	snprintf(etc_netns_path, sizeof(etc_netns_path), "%s/%s", NETNS_ETC_DIR, name);
-	dir = opendir(etc_netns_path);
-	if (!dir)
-		return;
-
-	while ((entry = readdir(dir)) != NULL) {
-		if (strcmp(entry->d_name, ".") == 0)
-			continue;
-		if (strcmp(entry->d_name, "..") == 0)
-			continue;
-		snprintf(netns_name, sizeof(netns_name), "%s/%s", etc_netns_path, entry->d_name);
-		snprintf(etc_name, sizeof(etc_name), "/etc/%s", entry->d_name);
-		if (mount(netns_name, etc_name, "none", MS_BIND, NULL) < 0) {
-			fprintf(stderr, "Bind %s -> %s failed: %s\n",
-				netns_name, etc_name, strerror(errno));
-		}
-	}
-	closedir(dir);
-}
-
 static int netns_exec(int argc, char **argv)
 {
 	/* Setup the proper environment for apps that are not netns
 	 * aware, and execute a program in that environment.
 	 */
-	const char *name, *cmd;
-	char net_path[MAXPATHLEN];
-	int netns;
+	const char *cmd;
 
 	if (argc < 1) {
 		fprintf(stderr, "No netns name specified\n");
@@ -146,45 +80,10 @@ static int netns_exec(int argc, char **argv)
 		fprintf(stderr, "No command specified\n");
 		return -1;
 	}
-
-	name = argv[0];
 	cmd = argv[1];
-	snprintf(net_path, sizeof(net_path), "%s/%s", NETNS_RUN_DIR, name);
-	netns = open(net_path, O_RDONLY | O_CLOEXEC);
-	if (netns < 0) {
-		fprintf(stderr, "Cannot open network namespace \"%s\": %s\n",
-			name, strerror(errno));
-		return -1;
-	}
-
-	if (setns(netns, CLONE_NEWNET) < 0) {
-		fprintf(stderr, "setting the network namespace \"%s\" failed: %s\n",
-			name, strerror(errno));
-		return -1;
-	}
 
-	if (unshare(CLONE_NEWNS) < 0) {
-		fprintf(stderr, "unshare failed: %s\n", strerror(errno));
-		return -1;
-	}
-	/* Don't let any mounts propagate back to the parent */
-	if (mount("", "/", "none", MS_SLAVE | MS_REC, NULL)) {
-		fprintf(stderr, "\"mount --make-rslave /\" failed: %s\n",
-			strerror(errno));
+	if (netns_switch(argv[0]))
 		return -1;
-	}
-	/* Mount a version of /sys that describes the network namespace */
-	if (umount2("/sys", MNT_DETACH) < 0) {
-		fprintf(stderr, "umount of /sys failed: %s\n", strerror(errno));
-		return -1;
-	}
-	if (mount(name, "/sys", "sysfs", 0, NULL) < 0) {
-		fprintf(stderr, "mount of /sys failed: %s\n",strerror(errno));
-		return -1;
-	}
-
-	/* Setup bind mounts for config files in /etc */
-	bind_etc(name);
 
 	fflush(stdout);
 
diff --git a/lib/Makefile b/lib/Makefile
index a42b885..f7cbc8d 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -1,5 +1,9 @@
 include ../Config
 
+ifeq ($(IP_CONFIG_SETNS),y)
+	CFLAGS += -DHAVE_SETNS
+endif
+
 CFLAGS += -fPIC
 
 UTILOBJ=utils.o rt_names.o ll_types.o ll_proto.o ll_addr.o inet_proto.o
diff --git a/lib/utils.c b/lib/utils.c
index 987377b..ff14881 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -27,7 +27,11 @@
 #include <linux/param.h>
 #include <time.h>
 #include <sys/time.h>
+#include <sys/mount.h>
+#include <sys/types.h>
 #include <errno.h>
+#include <sched.h>
+#include <dirent.h>
 
 
 #include "utils.h"
@@ -856,3 +860,87 @@ int inet_get_addr(const char *src, __u32 *dst, struct in6_addr *dst6)
 	else
 		return inet_pton(AF_INET, src, dst);
 }
+
+#ifndef HAVE_SETNS
+static int setns(int fd, int nstype)
+{
+#ifdef __NR_setns
+	return syscall(__NR_setns, fd, nstype);
+#else
+	errno = ENOSYS;
+	return -1;
+#endif
+}
+#endif /* HAVE_SETNS */
+
+static void bind_etc(const char *name)
+{
+	char etc_netns_path[MAXPATHLEN];
+	char netns_name[MAXPATHLEN];
+	char etc_name[MAXPATHLEN];
+	struct dirent *entry;
+	DIR *dir;
+
+	snprintf(etc_netns_path, sizeof(etc_netns_path), "%s/%s", NETNS_ETC_DIR, name);
+	dir = opendir(etc_netns_path);
+	if (!dir)
+		return;
+
+	while ((entry = readdir(dir)) != NULL) {
+		if (strcmp(entry->d_name, ".") == 0)
+			continue;
+		if (strcmp(entry->d_name, "..") == 0)
+			continue;
+		snprintf(netns_name, sizeof(netns_name), "%s/%s", etc_netns_path, entry->d_name);
+		snprintf(etc_name, sizeof(etc_name), "/etc/%s", entry->d_name);
+		if (mount(netns_name, etc_name, "none", MS_BIND, NULL) < 0) {
+			fprintf(stderr, "Bind %s -> %s failed: %s\n",
+				netns_name, etc_name, strerror(errno));
+		}
+	}
+	closedir(dir);
+}
+
+int netns_switch(char *name)
+{
+	char net_path[MAXPATHLEN];
+	int netns;
+
+	snprintf(net_path, sizeof(net_path), "%s/%s", NETNS_RUN_DIR, name);
+	netns = open(net_path, O_RDONLY | O_CLOEXEC);
+	if (netns < 0) {
+		fprintf(stderr, "Cannot open network namespace \"%s\": %s\n",
+			name, strerror(errno));
+		return -1;
+	}
+
+	if (setns(netns, CLONE_NEWNET) < 0) {
+		fprintf(stderr, "setting the network namespace \"%s\" failed: %s\n",
+			name, strerror(errno));
+		return -1;
+	}
+
+	if (unshare(CLONE_NEWNS) < 0) {
+		fprintf(stderr, "unshare failed: %s\n", strerror(errno));
+		return -1;
+	}
+	/* Don't let any mounts propagate back to the parent */
+	if (mount("", "/", "none", MS_SLAVE | MS_REC, NULL)) {
+		fprintf(stderr, "\"mount --make-rslave /\" failed: %s\n",
+			strerror(errno));
+		return -1;
+	}
+	/* Mount a version of /sys that describes the network namespace */
+	if (umount2("/sys", MNT_DETACH) < 0) {
+		fprintf(stderr, "umount of /sys failed: %s\n", strerror(errno));
+		return -1;
+	}
+	if (mount(name, "/sys", "sysfs", 0, NULL) < 0) {
+		fprintf(stderr, "mount of /sys failed: %s\n",strerror(errno));
+		return -1;
+	}
+
+	/* Setup bind mounts for config files in /etc */
+	bind_etc(name);
+	return 0;
+}
diff --git a/man/man8/ip.8 b/man/man8/ip.8
index 2d42e98..0fb759d 100644
--- a/man/man8/ip.8
+++ b/man/man8/ip.8
@@ -31,7 +31,8 @@ ip \- show / manipulate routing, devices, policy routing and tunnels
 \fB\-r\fR[\fIesolve\fR] |
 \fB\-f\fR[\fIamily\fR] {
 .BR inet " | " inet6 " | " ipx " | " dnet " | " link " } | "
-\fB\-o\fR[\fIneline\fR] }
+\fB\-o\fR[\fIneline\fR] |
+\fB\-n\fR[\fIetns\fR] }
 
 
 .SH OPTIONS
@@ -134,6 +135,26 @@ the output.
 use the system's name resolver to print DNS names instead of
 host addresses.
 
+.TP
+.BR "\-n" , " \-net" , " \-netns " <NETNS>
+switches
+.B ip
+to the specified network namespace
+.IR NETNS .
+Actually it just simplifies executing of:
+
+.B ip netns exec
+.IR NETNS
+.B ip
+.RI "[ " OPTIONS " ] " OBJECT " { " COMMAND " | "
+.BR help " }"
+
+to
+
+.B ip
+.RI "-n[etns] " NETNS " [ " OPTIONS " ] " OBJECT " { " COMMAND " | "
+.BR help " }"
+
 .SH IP - COMMAND SYNTAX
 
 .SS
-- 
2.1.3

^ permalink raw reply related

* [net-next PATCH v7 resubmit 2/4] arch: Add lightweight memory barriers dma_rmb() and dma_wmb()
From: Alexander Duyck @ 2014-12-11 23:02 UTC (permalink / raw)
  To: netdev, linux-kernel, davem
  Cc: mathieu.desnoyers, peterz, benh, heiko.carstens, mingo,
	linux-arch, mikey, linux, donald.c.skidmore, matthew.vick, geert,
	jeffrey.t.kirsher, romieu, paulmck, nic_swsd, arnd, will.deacon,
	michael, tony.luck, torvalds, oleg, schwidefsky, fweisbec
In-Reply-To: <20141211225250.25464.30291.stgit@ahduyck-server>

There are a number of situations where the mandatory barriers rmb() and
wmb() are used to order memory/memory operations in the device drivers
and those barriers are much heavier than they actually need to be.  For
example in the case of PowerPC wmb() calls the heavy-weight sync
instruction when for coherent memory operations all that is really needed
is an lsync or eieio instruction.

This commit adds a coherent only version of the mandatory memory barriers
rmb() and wmb().  In most cases this should result in the barrier being the
same as the SMP barriers for the SMP case, however in some cases we use a
barrier that is somewhere in between rmb() and smp_rmb().  For example on
ARM the rmb barriers break down as follows:

  Barrier   Call     Explanation
  --------- -------- ----------------------------------
  rmb()     dsb()    Data synchronization barrier - system
  dma_rmb() dmb(osh) data memory barrier - outer sharable
  smp_rmb() dmb(ish) data memory barrier - inner sharable

These new barriers are not as safe as the standard rmb() and wmb().
Specifically they do not guarantee ordering between coherent and incoherent
memories.  The primary use case for these would be to enforce ordering of
reads and writes when accessing coherent memory that is shared between the
CPU and a device.

It may also be noted that there is no dma_mb().  Most architectures don't
provide a good mechanism for performing a coherent only full barrier without
resorting to the same mechanism used in mb().  As such there isn't much to
be gained in trying to define such a function.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 Documentation/memory-barriers.txt   |   42 +++++++++++++++++++++++++++++++++++
 arch/arm/include/asm/barrier.h      |    4 +++
 arch/arm64/include/asm/barrier.h    |    3 +++
 arch/ia64/include/asm/barrier.h     |    3 +++
 arch/metag/include/asm/barrier.h    |   14 ++++++------
 arch/mips/include/asm/barrier.h     |    9 ++++----
 arch/powerpc/include/asm/barrier.h  |   13 +++++++----
 arch/s390/include/asm/barrier.h     |    2 ++
 arch/sparc/include/asm/barrier_64.h |    3 +++
 arch/x86/include/asm/barrier.h      |   11 ++++++---
 arch/x86/um/asm/barrier.h           |   13 ++++++-----
 include/asm-generic/barrier.h       |    8 +++++++
 12 files changed, 99 insertions(+), 26 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 22a969c..e5d5732 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1615,6 +1615,48 @@ There are some more advanced barrier functions:
      operations" subsection for information on where to use these.
 
 
+ (*) dma_wmb();
+ (*) dma_rmb();
+
+     These are for use with consistent memory to guarantee the ordering
+     of writes or reads of shared memory accessible to both the CPU and a
+     DMA capable device.
+
+     For example, consider a device driver that shares memory with a device
+     and uses a descriptor status value to indicate if the descriptor belongs
+     to the device or the CPU, and a doorbell to notify it when new
+     descriptors are available:
+
+	if (desc->status != DEVICE_OWN) {
+		/* do not read data until we own descriptor */
+		dma_rmb();
+
+		/* read/modify data */
+		read_data = desc->data;
+		desc->data = write_data;
+
+		/* flush modifications before status update */
+		dma_wmb();
+
+		/* assign ownership */
+		desc->status = DEVICE_OWN;
+
+		/* force memory to sync before notifying device via MMIO */
+		wmb();
+
+		/* notify device of new descriptors */
+		writel(DESC_NOTIFY, doorbell);
+	}
+
+     The dma_rmb() allows us guarantee the device has released ownership
+     before we read the data from the descriptor, and he dma_wmb() allows
+     us to guarantee the data is written to the descriptor before the device
+     can see it now has ownership.  The wmb() is needed to guarantee that the
+     cache coherent memory writes have completed before attempting a write to
+     the cache incoherent MMIO region.
+
+     See Documentation/DMA-API.txt for more information on consistent memory.
+
 MMIO WRITE BARRIER
 ------------------
 
diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index c6a3e73..d2f81e6 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -43,10 +43,14 @@
 #define mb()		do { dsb(); outer_sync(); } while (0)
 #define rmb()		dsb()
 #define wmb()		do { dsb(st); outer_sync(); } while (0)
+#define dma_rmb()	dmb(osh)
+#define dma_wmb()	dmb(oshst)
 #else
 #define mb()		barrier()
 #define rmb()		barrier()
 #define wmb()		barrier()
+#define dma_rmb()	barrier()
+#define dma_wmb()	barrier()
 #endif
 
 #ifndef CONFIG_SMP
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 6389d60..a5abb00 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -32,6 +32,9 @@
 #define rmb()		dsb(ld)
 #define wmb()		dsb(st)
 
+#define dma_rmb()	dmb(oshld)
+#define dma_wmb()	dmb(oshst)
+
 #ifndef CONFIG_SMP
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index e8fffb0..f6769eb 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -39,6 +39,9 @@
 #define rmb()		mb()
 #define wmb()		mb()
 
+#define dma_rmb()	mb()
+#define dma_wmb()	mb()
+
 #ifdef CONFIG_SMP
 # define smp_mb()	mb()
 #else
diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
index 6d8b8c9..d703d8e 100644
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -4,8 +4,6 @@
 #include <asm/metag_mem.h>
 
 #define nop()		asm volatile ("NOP")
-#define mb()		wmb()
-#define rmb()		barrier()
 
 #ifdef CONFIG_METAG_META21
 
@@ -41,11 +39,13 @@ static inline void wr_fence(void)
 
 #endif /* !CONFIG_METAG_META21 */
 
-static inline void wmb(void)
-{
-	/* flush writes through the write combiner */
-	wr_fence();
-}
+/* flush writes through the write combiner */
+#define mb()		wr_fence()
+#define rmb()		barrier()
+#define wmb()		mb()
+
+#define dma_rmb()	rmb()
+#define dma_wmb()	wmb()
 
 #ifndef CONFIG_SMP
 #define fence()		do { } while (0)
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 3d69aa8..2b8bbbc 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -75,20 +75,21 @@
 
 #include <asm/wbflush.h>
 
-#define wmb()		fast_wmb()
-#define rmb()		fast_rmb()
 #define mb()		wbflush()
 #define iob()		wbflush()
 
 #else /* !CONFIG_CPU_HAS_WB */
 
-#define wmb()		fast_wmb()
-#define rmb()		fast_rmb()
 #define mb()		fast_mb()
 #define iob()		fast_iob()
 
 #endif /* !CONFIG_CPU_HAS_WB */
 
+#define wmb()		fast_wmb()
+#define rmb()		fast_rmb()
+#define dma_wmb()	fast_wmb()
+#define dma_rmb()	fast_rmb()
+
 #if defined(CONFIG_WEAK_ORDERING) && defined(CONFIG_SMP)
 # ifdef CONFIG_CPU_CAVIUM_OCTEON
 #  define smp_mb()	__sync()
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index cb6d66c..a3bf5be 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -36,8 +36,6 @@
 
 #define set_mb(var, value)	do { var = value; mb(); } while (0)
 
-#ifdef CONFIG_SMP
-
 #ifdef __SUBARCH_HAS_LWSYNC
 #    define SMPWMB      LWSYNC
 #else
@@ -45,12 +43,17 @@
 #endif
 
 #define __lwsync()	__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
+#define dma_rmb()	__lwsync()
+#define dma_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
+
+#ifdef CONFIG_SMP
+#define smp_lwsync()	__lwsync()
 
 #define smp_mb()	mb()
 #define smp_rmb()	__lwsync()
 #define smp_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
 #else
-#define __lwsync()	barrier()
+#define smp_lwsync()	barrier()
 
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
@@ -72,7 +75,7 @@
 #define smp_store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
-	__lwsync();							\
+	smp_lwsync();							\
 	ACCESS_ONCE(*p) = (v);						\
 } while (0)
 
@@ -80,7 +83,7 @@ do {									\
 ({									\
 	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
 	compiletime_assert_atomic_type(*p);				\
-	__lwsync();							\
+	smp_lwsync();							\
 	___p1;								\
 })
 
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index 33d191d..8d72471 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -24,6 +24,8 @@
 
 #define rmb()				mb()
 #define wmb()				mb()
+#define dma_rmb()			rmb()
+#define dma_wmb()			wmb()
 #define smp_mb()			mb()
 #define smp_rmb()			rmb()
 #define smp_wmb()			wmb()
diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index 6c974c0..7664894 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -37,6 +37,9 @@ do {	__asm__ __volatile__("ba,pt	%%xcc, 1f\n\t" \
 #define rmb()	__asm__ __volatile__("":::"memory")
 #define wmb()	__asm__ __volatile__("":::"memory")
 
+#define dma_rmb()	rmb()
+#define dma_wmb()	wmb()
+
 #define set_mb(__var, __value) \
 	do { __var = __value; membar_safe("#StoreLoad"); } while(0)
 
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 5238000..2ab1eb3 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -24,13 +24,16 @@
 #define wmb()	asm volatile("sfence" ::: "memory")
 #endif
 
-#ifdef CONFIG_SMP
-#define smp_mb()	mb()
 #ifdef CONFIG_X86_PPRO_FENCE
-# define smp_rmb()	rmb()
+#define dma_rmb()	rmb()
 #else
-# define smp_rmb()	barrier()
+#define dma_rmb()	barrier()
 #endif
+#define dma_wmb()	barrier()
+
+#ifdef CONFIG_SMP
+#define smp_mb()	mb()
+#define smp_rmb()	dma_rmb()
 #define smp_wmb()	barrier()
 #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
 #else /* !SMP */
diff --git a/arch/x86/um/asm/barrier.h b/arch/x86/um/asm/barrier.h
index d6511d9..2d7d9a1 100644
--- a/arch/x86/um/asm/barrier.h
+++ b/arch/x86/um/asm/barrier.h
@@ -29,17 +29,18 @@
 
 #endif /* CONFIG_X86_32 */
 
-#ifdef CONFIG_SMP
-
-#define smp_mb()	mb()
 #ifdef CONFIG_X86_PPRO_FENCE
-#define smp_rmb()	rmb()
+#define dma_rmb()	rmb()
 #else /* CONFIG_X86_PPRO_FENCE */
-#define smp_rmb()	barrier()
+#define dma_rmb()	barrier()
 #endif /* CONFIG_X86_PPRO_FENCE */
+#define dma_wmb()	barrier()
 
-#define smp_wmb()	barrier()
+#ifdef CONFIG_SMP
 
+#define smp_mb()	mb()
+#define smp_rmb()	dma_rmb()
+#define smp_wmb()	barrier()
 #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
 
 #else /* CONFIG_SMP */
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 1402fa8..f5c40b0 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -42,6 +42,14 @@
 #define wmb()	mb()
 #endif
 
+#ifndef dma_rmb
+#define dma_rmb()	rmb()
+#endif
+
+#ifndef dma_wmb
+#define dma_wmb()	wmb()
+#endif
+
 #ifndef read_barrier_depends
 #define read_barrier_depends()		do { } while (0)
 #endif

^ permalink raw reply related

* [net-next PATCH v7 resubmit 3/4] r8169: Use dma_rmb() and dma_wmb() for DescOwn checks
From: Alexander Duyck @ 2014-12-11 23:02 UTC (permalink / raw)
  To: netdev, linux-kernel, davem
  Cc: mathieu.desnoyers, peterz, benh, heiko.carstens, mingo,
	linux-arch, mikey, linux, donald.c.skidmore, matthew.vick, geert,
	jeffrey.t.kirsher, romieu, paulmck, nic_swsd, arnd, will.deacon,
	michael, tony.luck, torvalds, oleg, schwidefsky, fweisbec
In-Reply-To: <20141211225250.25464.30291.stgit@ahduyck-server>

The r8169 use a pair of wmb() calls when setting up the descriptor rings.
The first is to synchronize the descriptor data with the descriptor status,
and the second is to synchronize the descriptor status with the use of the
MMIO doorbell to notify the device that descriptors are ready.  This can
come at a heavy price on some systems, and is not really necessary on
systems such as x86 as a simple barrier() would suffice to order store/store
accesses.  As such we can replace the first memory barrier with
dma_wmb() to reduce the cost for these accesses.

In addition the r8169 uses a rmb() to prevent compiler optimization in the
cleanup paths, however by moving the barrier down a few lines and replacing
it with a dma_rmb() we should be able to use it to guarantee
descriptor accesses do not occur until the device has updated the DescOwn
bit from its end.

One last change I made is to move the update of cur_tx in the xmit path to
after the wmb.  This way we can guarantee the device and all CPUs should
see the DescOwn update before they see the cur_tx value update.

Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 drivers/net/ethernet/realtek/r8169.c |   29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 3dad7e8..088136b 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6605,6 +6605,9 @@ static inline void rtl8169_mark_to_asic(struct RxDesc *desc, u32 rx_buf_sz)
 {
 	u32 eor = le32_to_cpu(desc->opts1) & RingEnd;
 
+	/* Force memory writes to complete before releasing descriptor */
+	dma_wmb();
+
 	desc->opts1 = cpu_to_le32(DescOwn | eor | rx_buf_sz);
 }
 
@@ -6612,7 +6615,6 @@ static inline void rtl8169_map_to_asic(struct RxDesc *desc, dma_addr_t mapping,
 				       u32 rx_buf_sz)
 {
 	desc->addr = cpu_to_le64(mapping);
-	wmb();
 	rtl8169_mark_to_asic(desc, rx_buf_sz);
 }
 
@@ -7073,16 +7075,18 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	skb_tx_timestamp(skb);
 
-	wmb();
+	/* Force memory writes to complete before releasing descriptor */
+	dma_wmb();
 
 	/* Anti gcc 2.95.3 bugware (sic) */
 	status = opts[0] | len | (RingEnd * !((entry + 1) % NUM_TX_DESC));
 	txd->opts1 = cpu_to_le32(status);
 
-	tp->cur_tx += frags + 1;
-
+	/* Force all memory writes to complete before notifying device */
 	wmb();
 
+	tp->cur_tx += frags + 1;
+
 	RTL_W8(TxPoll, NPQ);
 
 	mmiowb();
@@ -7181,11 +7185,16 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp)
 		struct ring_info *tx_skb = tp->tx_skb + entry;
 		u32 status;
 
-		rmb();
 		status = le32_to_cpu(tp->TxDescArray[entry].opts1);
 		if (status & DescOwn)
 			break;
 
+		/* This barrier is needed to keep us from reading
+		 * any other fields out of the Tx descriptor until
+		 * we know the status of DescOwn
+		 */
+		dma_rmb();
+
 		rtl8169_unmap_tx_skb(&tp->pci_dev->dev, tx_skb,
 				     tp->TxDescArray + entry);
 		if (status & LastFrag) {
@@ -7280,11 +7289,16 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
 		struct RxDesc *desc = tp->RxDescArray + entry;
 		u32 status;
 
-		rmb();
 		status = le32_to_cpu(desc->opts1) & tp->opts1_mask;
-
 		if (status & DescOwn)
 			break;
+
+		/* This barrier is needed to keep us from reading
+		 * any other fields out of the Rx descriptor until
+		 * we know the status of DescOwn
+		 */
+		dma_rmb();
+
 		if (unlikely(status & RxRES)) {
 			netif_info(tp, rx_err, dev, "Rx ERROR. status = %08x\n",
 				   status);
@@ -7346,7 +7360,6 @@ process_pkt:
 		}
 release_descriptor:
 		desc->opts2 = 0;
-		wmb();
 		rtl8169_mark_to_asic(desc, rx_buf_sz);
 	}
 

^ permalink raw reply related

* [net-next PATCH v7 resubmit 4/4] fm10k/igb/ixgbe: Use dma_rmb on Rx descriptor reads
From: Alexander Duyck @ 2014-12-11 23:02 UTC (permalink / raw)
  To: netdev, linux-kernel, davem
  Cc: mathieu.desnoyers, peterz, benh, heiko.carstens, mingo,
	linux-arch, mikey, linux, donald.c.skidmore, matthew.vick, geert,
	jeffrey.t.kirsher, romieu, paulmck, nic_swsd, arnd, will.deacon,
	michael, tony.luck, torvalds, oleg, schwidefsky, fweisbec
In-Reply-To: <20141211225250.25464.30291.stgit@ahduyck-server>

This change makes it so that dma_rmb is used when reading the Rx
descriptor.  The advantage of dma_rmb is that it allows for a much
lower cost barrier on x86, powerpc, arm, and arm64 architectures than a
traditional memory barrier when dealing with reads that only have to
synchronize to coherent memory.

In addition I have updated the code so that it just checks to see if any
bits have been set instead of just the DD bit since the DD bit will always
be set as a part of a descriptor write-back so we just need to check for a
non-zero value being present at that memory location rather than just
checking for any specific bit.  This allows the code itself to appear much
cleaner and allows the compiler more room to optimize.

Cc: Matthew Vick <matthew.vick@intel.com>
Cc: Don Skidmore <donald.c.skidmore@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |    6 +++---
 drivers/net/ethernet/intel/igb/igb_main.c     |    6 +++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    9 ++++-----
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index ee1ecb1..eb088b1 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -615,14 +615,14 @@ static bool fm10k_clean_rx_irq(struct fm10k_q_vector *q_vector,
 
 		rx_desc = FM10K_RX_DESC(rx_ring, rx_ring->next_to_clean);
 
-		if (!fm10k_test_staterr(rx_desc, FM10K_RXD_STATUS_DD))
+		if (!rx_desc->d.staterr)
 			break;
 
 		/* This memory barrier is needed to keep us from reading
 		 * any other fields out of the rx_desc until we know the
-		 * RXD_STATUS_DD bit is set
+		 * descriptor has been written back
 		 */
-		rmb();
+		dma_rmb();
 
 		/* retrieve a buffer from the ring */
 		skb = fm10k_fetch_rx_buffer(rx_ring, rx_desc, skb);
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 485d2c6..44f1174 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6912,14 +6912,14 @@ static bool igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
 
 		rx_desc = IGB_RX_DESC(rx_ring, rx_ring->next_to_clean);
 
-		if (!igb_test_staterr(rx_desc, E1000_RXD_STAT_DD))
+		if (!rx_desc->wb.upper.status_error)
 			break;
 
 		/* This memory barrier is needed to keep us from reading
 		 * any other fields out of the rx_desc until we know the
-		 * RXD_STAT_DD bit is set
+		 * descriptor has been written back
 		 */
-		rmb();
+		dma_rmb();
 
 		/* retrieve a buffer from the ring */
 		skb = igb_fetch_rx_buffer(rx_ring, rx_desc, skb);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 798b055..2ed2c7d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2009,15 +2009,14 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 
 		rx_desc = IXGBE_RX_DESC(rx_ring, rx_ring->next_to_clean);
 
-		if (!ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_DD))
+		if (!rx_desc->wb.upper.status_error)
 			break;
 
-		/*
-		 * This memory barrier is needed to keep us from reading
+		/* This memory barrier is needed to keep us from reading
 		 * any other fields out of the rx_desc until we know the
-		 * RXD_STAT_DD bit is set
+		 * descriptor has been written back
 		 */
-		rmb();
+		dma_rmb();
 
 		/* retrieve a buffer from the ring */
 		skb = ixgbe_fetch_rx_buffer(rx_ring, rx_desc);

^ permalink raw reply related

* [net-next PATCH v7 resubmit 0/4] arch: Add lightweight memory barriers for coherent memory access
From: Alexander Duyck @ 2014-12-11 23:01 UTC (permalink / raw)
  To: netdev, linux-kernel, davem
  Cc: mathieu.desnoyers, peterz, benh, heiko.carstens, mingo,
	linux-arch, mikey, linux, donald.c.skidmore, matthew.vick, geert,
	jeffrey.t.kirsher, romieu, paulmck, nic_swsd, arnd, will.deacon,
	michael, tony.luck, torvalds, oleg, schwidefsky, fweisbec
In-Reply-To: <20141125203310.8240.27370.stgit@ahduyck-server>

These patches introduce two new primitives for synchronizing cache coherent
memory writes and reads.  These two new primitives are:

	dma_rmb()
	dma_wmb()

The first patch cleans up some unnecessary overhead related to the
definition of read_barrier_depends, smp_read_barrier_depends, and comments
related to the barrier.

The second patch adds the primitives for the applicable architectures and
asm-generic.

The third patch adds the barriers to r8169 which turns out to be a good
example of where the new barriers might be useful as they have full
rmb()/wmb() barriers ordering accesses to the descriptors and the DescOwn
bit.

The fourth patch adds support for coherent_rmb() to the Intel fm10k, igb,
and ixgbe drivers.  Testing with the ixgbe driver has shown a processing
time reduction of at least 7ns per 64B frame on a Core i7-4930K.

This patch series is essentially the v7 for:
v4-7:	Add lightweight memory barriers for coherent memory access
v3:	Add lightweight memory barriers fast_rmb() and fast_wmb()
v2:	Introduce load_acquire() and store_release()
v1:	Introduce read_acquire()

The key changes in this patch series versus the earlier patches are:
v7 resubmit:
	- Added Acked-by: Ben Herrenschmidt from v5 to dma_rmb/wmb patch
	- No code changes from previous set, still applies cleanly and builds.
v7:
	- Dropped test/debug patch that was accidentally slipped in
v6:
	- Replaced "memory based device I/O" with "consistent memory" in
	  docs
	- Added reference to DMA-API.txt to explain consistent memory
v5:
	- Renamed barriers dma_rmb and dma_wmb
	- Undid smp_wmb changes in x86 and PowerPC
	- Defined smp_rmb as __lwsync for SMP case on PowerPC
v4:
	- Renamed barriers coherent_rmb and coherent_wmb
	- Added smp_lwsync for use in smp_load_acquire/smp_store_release
v3:
	- Moved away from acquire()/store() and instead focused on barriers
	- Added cleanup of read_barrier_depends
	- Added change in r8169 to fix cur_tx/DescOwn ordering
	- Simplified changes to just replacing/moving barriers in r8169
	- Added update to documentation with code example
v2:
	- Renamed read_acquire() to be consistent with smp_load_acquire()
	- Changed barrier used to be consistent with smp_load_acquire()
	- Updated PowerPC code to use __lwsync based on IBM article
	- Added store_release() as this is a viable use case for drivers
	- Added r8169 patch which is able to fully use primitives
	- Added fm10k/igb/ixgbe patch which is able to test performance

---

Alexander Duyck (4):
      arch: Cleanup read_barrier_depends() and comments
      arch: Add lightweight memory barriers dma_rmb() and dma_wmb()
      r8169: Use dma_rmb() and dma_wmb() for DescOwn checks
      fm10k/igb/ixgbe: Use dma_rmb on Rx descriptor reads


 Documentation/memory-barriers.txt             |   42 +++++++++++++++
 arch/alpha/include/asm/barrier.h              |   51 ++++++++++++++++++
 arch/arm/include/asm/barrier.h                |    4 +
 arch/arm64/include/asm/barrier.h              |    3 +
 arch/blackfin/include/asm/barrier.h           |   51 ++++++++++++++++++
 arch/ia64/include/asm/barrier.h               |   25 ++++-----
 arch/metag/include/asm/barrier.h              |   19 ++++---
 arch/mips/include/asm/barrier.h               |   61 ++--------------------
 arch/powerpc/include/asm/barrier.h            |   19 ++++---
 arch/s390/include/asm/barrier.h               |    7 ++-
 arch/sparc/include/asm/barrier_64.h           |    7 ++-
 arch/x86/include/asm/barrier.h                |   70 ++++---------------------
 arch/x86/um/asm/barrier.h                     |   20 ++++---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |    6 +-
 drivers/net/ethernet/intel/igb/igb_main.c     |    6 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    9 +--
 drivers/net/ethernet/realtek/r8169.c          |   29 ++++++++--
 include/asm-generic/barrier.h                 |    8 +++
 18 files changed, 258 insertions(+), 179 deletions(-)

--

^ permalink raw reply

* [net-next PATCH v7 resubmit 1/4] arch: Cleanup read_barrier_depends() and comments
From: Alexander Duyck @ 2014-12-11 23:01 UTC (permalink / raw)
  To: netdev, linux-kernel, davem
  Cc: mathieu.desnoyers, peterz, benh, heiko.carstens, mingo,
	linux-arch, mikey, linux, donald.c.skidmore, matthew.vick, geert,
	jeffrey.t.kirsher, romieu, paulmck, nic_swsd, arnd, will.deacon,
	michael, tony.luck, torvalds, oleg, schwidefsky, fweisbec
In-Reply-To: <20141211225250.25464.30291.stgit@ahduyck-server>

This patch is meant to cleanup the handling of read_barrier_depends and
smp_read_barrier_depends.  In multiple spots in the kernel headers
read_barrier_depends is defined as "do {} while (0)", however we then go
into the SMP vs non-SMP sections and have the SMP version reference
read_barrier_depends, and the non-SMP define it as yet another empty
do/while.

With this commit I went through and cleaned out the duplicate definitions
and reduced the number of definitions down to 2 per header.  In addition I
moved the 50 line comments for the macro from the x86 and mips headers that
defined it as an empty do/while to those that were actually defining the
macro, alpha and blackfin.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 arch/alpha/include/asm/barrier.h    |   51 ++++++++++++++++++++++++++++++
 arch/blackfin/include/asm/barrier.h |   51 ++++++++++++++++++++++++++++++
 arch/ia64/include/asm/barrier.h     |   22 +++++--------
 arch/metag/include/asm/barrier.h    |    7 ++--
 arch/mips/include/asm/barrier.h     |   52 -------------------------------
 arch/powerpc/include/asm/barrier.h  |    6 ++--
 arch/s390/include/asm/barrier.h     |    5 ++-
 arch/sparc/include/asm/barrier_64.h |    4 +-
 arch/x86/include/asm/barrier.h      |   59 ++---------------------------------
 arch/x86/um/asm/barrier.h           |    7 ++--
 10 files changed, 129 insertions(+), 135 deletions(-)

diff --git a/arch/alpha/include/asm/barrier.h b/arch/alpha/include/asm/barrier.h
index 3832bdb..77516c8 100644
--- a/arch/alpha/include/asm/barrier.h
+++ b/arch/alpha/include/asm/barrier.h
@@ -7,6 +7,57 @@
 #define rmb()	__asm__ __volatile__("mb": : :"memory")
 #define wmb()	__asm__ __volatile__("wmb": : :"memory")
 
+/**
+ * read_barrier_depends - Flush all pending reads that subsequents reads
+ * depend on.
+ *
+ * No data-dependent reads from memory-like regions are ever reordered
+ * over this barrier.  All reads preceding this primitive are guaranteed
+ * to access memory (but not necessarily other CPUs' caches) before any
+ * reads following this primitive that depend on the data return by
+ * any of the preceding reads.  This primitive is much lighter weight than
+ * rmb() on most CPUs, and is never heavier weight than is
+ * rmb().
+ *
+ * These ordering constraints are respected by both the local CPU
+ * and the compiler.
+ *
+ * Ordering is not guaranteed by anything other than these primitives,
+ * not even by data dependencies.  See the documentation for
+ * memory_barrier() for examples and URLs to more information.
+ *
+ * For example, the following code would force ordering (the initial
+ * value of "a" is zero, "b" is one, and "p" is "&a"):
+ *
+ * <programlisting>
+ *	CPU 0				CPU 1
+ *
+ *	b = 2;
+ *	memory_barrier();
+ *	p = &b;				q = p;
+ *					read_barrier_depends();
+ *					d = *q;
+ * </programlisting>
+ *
+ * because the read of "*q" depends on the read of "p" and these
+ * two reads are separated by a read_barrier_depends().  However,
+ * the following code, with the same initial values for "a" and "b":
+ *
+ * <programlisting>
+ *	CPU 0				CPU 1
+ *
+ *	a = 2;
+ *	memory_barrier();
+ *	b = 3;				y = b;
+ *					read_barrier_depends();
+ *					x = a;
+ * </programlisting>
+ *
+ * does not enforce ordering, since there is no data dependency between
+ * the read of "a" and the read of "b".  Therefore, on some CPUs, such
+ * as Alpha, "y" could be set to 3 and "x" to 0.  Use rmb()
+ * in cases like this where there are no data dependencies.
+ */
 #define read_barrier_depends() __asm__ __volatile__("mb": : :"memory")
 
 #ifdef CONFIG_SMP
diff --git a/arch/blackfin/include/asm/barrier.h b/arch/blackfin/include/asm/barrier.h
index 4200068..dfb66fe 100644
--- a/arch/blackfin/include/asm/barrier.h
+++ b/arch/blackfin/include/asm/barrier.h
@@ -22,6 +22,57 @@
 # define mb()	do { barrier(); smp_check_barrier(); smp_mark_barrier(); } while (0)
 # define rmb()	do { barrier(); smp_check_barrier(); } while (0)
 # define wmb()	do { barrier(); smp_mark_barrier(); } while (0)
+/*
+ * read_barrier_depends - Flush all pending reads that subsequents reads
+ * depend on.
+ *
+ * No data-dependent reads from memory-like regions are ever reordered
+ * over this barrier.  All reads preceding this primitive are guaranteed
+ * to access memory (but not necessarily other CPUs' caches) before any
+ * reads following this primitive that depend on the data return by
+ * any of the preceding reads.  This primitive is much lighter weight than
+ * rmb() on most CPUs, and is never heavier weight than is
+ * rmb().
+ *
+ * These ordering constraints are respected by both the local CPU
+ * and the compiler.
+ *
+ * Ordering is not guaranteed by anything other than these primitives,
+ * not even by data dependencies.  See the documentation for
+ * memory_barrier() for examples and URLs to more information.
+ *
+ * For example, the following code would force ordering (the initial
+ * value of "a" is zero, "b" is one, and "p" is "&a"):
+ *
+ * <programlisting>
+ *	CPU 0				CPU 1
+ *
+ *	b = 2;
+ *	memory_barrier();
+ *	p = &b;				q = p;
+ *					read_barrier_depends();
+ *					d = *q;
+ * </programlisting>
+ *
+ * because the read of "*q" depends on the read of "p" and these
+ * two reads are separated by a read_barrier_depends().  However,
+ * the following code, with the same initial values for "a" and "b":
+ *
+ * <programlisting>
+ *	CPU 0				CPU 1
+ *
+ *	a = 2;
+ *	memory_barrier();
+ *	b = 3;				y = b;
+ *					read_barrier_depends();
+ *					x = a;
+ * </programlisting>
+ *
+ * does not enforce ordering, since there is no data dependency between
+ * the read of "a" and the read of "b".  Therefore, on some CPUs, such
+ * as Alpha, "y" could be set to 3 and "x" to 0.  Use rmb()
+ * in cases like this where there are no data dependencies.
+ */
 # define read_barrier_depends()	do { barrier(); smp_check_barrier(); } while (0)
 #endif
 
diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index a48957c..e8fffb0 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -35,26 +35,22 @@
  * it's (presumably) much slower than mf and (b) mf.a is supported for
  * sequential memory pages only.
  */
-#define mb()	ia64_mf()
-#define rmb()	mb()
-#define wmb()	mb()
-#define read_barrier_depends()	do { } while(0)
+#define mb()		ia64_mf()
+#define rmb()		mb()
+#define wmb()		mb()
 
 #ifdef CONFIG_SMP
 # define smp_mb()	mb()
-# define smp_rmb()	rmb()
-# define smp_wmb()	wmb()
-# define smp_read_barrier_depends()	read_barrier_depends()
-
 #else
-
 # define smp_mb()	barrier()
-# define smp_rmb()	barrier()
-# define smp_wmb()	barrier()
-# define smp_read_barrier_depends()	do { } while(0)
-
 #endif
 
+#define smp_rmb()	smp_mb()
+#define smp_wmb()	smp_mb()
+
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
+
 #define smp_mb__before_atomic()	barrier()
 #define smp_mb__after_atomic()	barrier()
 
diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
index c7591e8..6d8b8c9 100644
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -47,8 +47,6 @@ static inline void wmb(void)
 	wr_fence();
 }
 
-#define read_barrier_depends()  do { } while (0)
-
 #ifndef CONFIG_SMP
 #define fence()		do { } while (0)
 #define smp_mb()        barrier()
@@ -82,7 +80,10 @@ static inline void fence(void)
 #define smp_wmb()       barrier()
 #endif
 #endif
-#define smp_read_barrier_depends()     do { } while (0)
+
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
+
 #define set_mb(var, value) do { var = value; smp_mb(); } while (0)
 
 #define smp_store_release(p, v)						\
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index d0101dd..3d69aa8 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -10,58 +10,6 @@
 
 #include <asm/addrspace.h>
 
-/*
- * read_barrier_depends - Flush all pending reads that subsequents reads
- * depend on.
- *
- * No data-dependent reads from memory-like regions are ever reordered
- * over this barrier.  All reads preceding this primitive are guaranteed
- * to access memory (but not necessarily other CPUs' caches) before any
- * reads following this primitive that depend on the data return by
- * any of the preceding reads.  This primitive is much lighter weight than
- * rmb() on most CPUs, and is never heavier weight than is
- * rmb().
- *
- * These ordering constraints are respected by both the local CPU
- * and the compiler.
- *
- * Ordering is not guaranteed by anything other than these primitives,
- * not even by data dependencies.  See the documentation for
- * memory_barrier() for examples and URLs to more information.
- *
- * For example, the following code would force ordering (the initial
- * value of "a" is zero, "b" is one, and "p" is "&a"):
- *
- * <programlisting>
- *	CPU 0				CPU 1
- *
- *	b = 2;
- *	memory_barrier();
- *	p = &b;				q = p;
- *					read_barrier_depends();
- *					d = *q;
- * </programlisting>
- *
- * because the read of "*q" depends on the read of "p" and these
- * two reads are separated by a read_barrier_depends().  However,
- * the following code, with the same initial values for "a" and "b":
- *
- * <programlisting>
- *	CPU 0				CPU 1
- *
- *	a = 2;
- *	memory_barrier();
- *	b = 3;				y = b;
- *					read_barrier_depends();
- *					x = a;
- * </programlisting>
- *
- * does not enforce ordering, since there is no data dependency between
- * the read of "a" and the read of "b".  Therefore, on some CPUs, such
- * as Alpha, "y" could be set to 3 and "x" to 0.  Use rmb()
- * in cases like this where there are no data dependencies.
- */
-
 #define read_barrier_depends()		do { } while(0)
 #define smp_read_barrier_depends()	do { } while(0)
 
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index bab79a1..cb6d66c 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -33,7 +33,6 @@
 #define mb()   __asm__ __volatile__ ("sync" : : : "memory")
 #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
 #define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
-#define read_barrier_depends()  do { } while(0)
 
 #define set_mb(var, value)	do { var = value; mb(); } while (0)
 
@@ -50,16 +49,17 @@
 #define smp_mb()	mb()
 #define smp_rmb()	__lwsync()
 #define smp_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
-#define smp_read_barrier_depends()	read_barrier_depends()
 #else
 #define __lwsync()	barrier()
 
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	do { } while(0)
 #endif /* CONFIG_SMP */
 
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
+
 /*
  * This is a barrier which prevents following instructions from being
  * started until the value of the argument x is known.  For example, if
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index b5dce65..33d191d 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -24,11 +24,12 @@
 
 #define rmb()				mb()
 #define wmb()				mb()
-#define read_barrier_depends()		do { } while(0)
 #define smp_mb()			mb()
 #define smp_rmb()			rmb()
 #define smp_wmb()			wmb()
-#define smp_read_barrier_depends()	read_barrier_depends()
+
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
 
 #define smp_mb__before_atomic()		smp_mb()
 #define smp_mb__after_atomic()		smp_mb()
diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index 305dcc3..6c974c0 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -37,7 +37,6 @@ do {	__asm__ __volatile__("ba,pt	%%xcc, 1f\n\t" \
 #define rmb()	__asm__ __volatile__("":::"memory")
 #define wmb()	__asm__ __volatile__("":::"memory")
 
-#define read_barrier_depends()		do { } while(0)
 #define set_mb(__var, __value) \
 	do { __var = __value; membar_safe("#StoreLoad"); } while(0)
 
@@ -51,7 +50,8 @@ do {	__asm__ __volatile__("ba,pt	%%xcc, 1f\n\t" \
 #define smp_wmb()	__asm__ __volatile__("":::"memory")
 #endif
 
-#define smp_read_barrier_depends()	do { } while(0)
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
 
 #define smp_store_release(p, v)						\
 do {									\
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 0f4460b..5238000 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -24,60 +24,6 @@
 #define wmb()	asm volatile("sfence" ::: "memory")
 #endif
 
-/**
- * read_barrier_depends - Flush all pending reads that subsequents reads
- * depend on.
- *
- * No data-dependent reads from memory-like regions are ever reordered
- * over this barrier.  All reads preceding this primitive are guaranteed
- * to access memory (but not necessarily other CPUs' caches) before any
- * reads following this primitive that depend on the data return by
- * any of the preceding reads.  This primitive is much lighter weight than
- * rmb() on most CPUs, and is never heavier weight than is
- * rmb().
- *
- * These ordering constraints are respected by both the local CPU
- * and the compiler.
- *
- * Ordering is not guaranteed by anything other than these primitives,
- * not even by data dependencies.  See the documentation for
- * memory_barrier() for examples and URLs to more information.
- *
- * For example, the following code would force ordering (the initial
- * value of "a" is zero, "b" is one, and "p" is "&a"):
- *
- * <programlisting>
- *	CPU 0				CPU 1
- *
- *	b = 2;
- *	memory_barrier();
- *	p = &b;				q = p;
- *					read_barrier_depends();
- *					d = *q;
- * </programlisting>
- *
- * because the read of "*q" depends on the read of "p" and these
- * two reads are separated by a read_barrier_depends().  However,
- * the following code, with the same initial values for "a" and "b":
- *
- * <programlisting>
- *	CPU 0				CPU 1
- *
- *	a = 2;
- *	memory_barrier();
- *	b = 3;				y = b;
- *					read_barrier_depends();
- *					x = a;
- * </programlisting>
- *
- * does not enforce ordering, since there is no data dependency between
- * the read of "a" and the read of "b".  Therefore, on some CPUs, such
- * as Alpha, "y" could be set to 3 and "x" to 0.  Use rmb()
- * in cases like this where there are no data dependencies.
- **/
-
-#define read_barrier_depends()	do { } while (0)
-
 #ifdef CONFIG_SMP
 #define smp_mb()	mb()
 #ifdef CONFIG_X86_PPRO_FENCE
@@ -86,16 +32,17 @@
 # define smp_rmb()	barrier()
 #endif
 #define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	read_barrier_depends()
 #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
 #else /* !SMP */
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	do { } while (0)
 #define set_mb(var, value) do { var = value; barrier(); } while (0)
 #endif /* SMP */
 
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
+
 #if defined(CONFIG_X86_PPRO_FENCE)
 
 /*
diff --git a/arch/x86/um/asm/barrier.h b/arch/x86/um/asm/barrier.h
index cc04e67..d6511d9 100644
--- a/arch/x86/um/asm/barrier.h
+++ b/arch/x86/um/asm/barrier.h
@@ -29,8 +29,6 @@
 
 #endif /* CONFIG_X86_32 */
 
-#define read_barrier_depends()	do { } while (0)
-
 #ifdef CONFIG_SMP
 
 #define smp_mb()	mb()
@@ -42,7 +40,6 @@
 
 #define smp_wmb()	barrier()
 
-#define smp_read_barrier_depends()	read_barrier_depends()
 #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
 
 #else /* CONFIG_SMP */
@@ -50,11 +47,13 @@
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	do { } while (0)
 #define set_mb(var, value) do { var = value; barrier(); } while (0)
 
 #endif /* CONFIG_SMP */
 
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
+
 /*
  * Stop RDTSC speculation. This is needed when you need to use RDTSC
  * (or get_cycles or vread that possibly accesses the TSC) in a defined

^ permalink raw reply related

* Re: [PULL] virtio: virtio 1.0 support, misc patches
From: Stephen Rothwell @ 2014-12-11 23:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: sergei.shtylyov, kvm, netdev, linux-kernel, virtualization,
	Linus Torvalds, pbonzini, ben, David Miller, thuth
In-Reply-To: <20141211220157.GA22618@redhat.com>


[-- Attachment #1.1: Type: text/plain, Size: 837 bytes --]

Hi Michael,

On Fri, 12 Dec 2014 00:01:57 +0200 "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> It was on linux next for a while, then I wanted to tweak the commit log
> for some messages a bit.
> So since I rewrote the history anyway, I went ahead and rebased it
> on v3.18, after this rebase it's been in linux-next for several days.

Maybe you think you pushed it into a linux-next included branch (or
maybe you think I am fetching a branch that I am not), but when I look
at those commits in my tree (as I fetch Linus' tree into mine) none of
them precede any of the next-* tags ...

I am not doubting that these patches have been published and reviewed
and tested, all I am saying is that they have not been in linux-next
as those commits.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PULL] virtio: virtio 1.0 support, misc patches
From: Michael S. Tsirkin @ 2014-12-11 23:45 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: sergei.shtylyov, kvm, netdev, linux-kernel, virtualization,
	Linus Torvalds, pbonzini, ben, David Miller, thuth
In-Reply-To: <20141212102426.187891e9@canb.auug.org.au>

On Fri, Dec 12, 2014 at 10:24:26AM +1100, Stephen Rothwell wrote:
> Hi Michael,
> 
> On Fri, 12 Dec 2014 00:01:57 +0200 "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >
> > It was on linux next for a while, then I wanted to tweak the commit log
> > for some messages a bit.
> > So since I rewrote the history anyway, I went ahead and rebased it
> > on v3.18, after this rebase it's been in linux-next for several days.
> 
> Maybe you think you pushed it into a linux-next included branch (or
> maybe you think I am fetching a branch that I am not),

Oh.  I thought linux-next is pulling vhost-next, when in fact
it turns out it's pulling it's own linux-next branch.

So it looks like I didn't update that branch in a while - now I see
why it's a bit so quiet.

It's probably best to keep it at linux-next branch, as it is:
I've tweaked my push rules to update that branch now, and pushed
again.

Thanks a lot for letting me know!


> but when I look
> at those commits in my tree (as I fetch Linus' tree into mine) none of
> them precede any of the next-* tags ...
> 
> I am not doubting that these patches have been published and reviewed
> and tested, all I am saying is that they have not been in linux-next
> as those commits.

You are right of course.

> -- 
> Cheers,
> Stephen Rothwell                    sfr@canb.auug.org.au

^ permalink raw reply

* Re: [PATCH iproute2 v3] ip: Simplify executing ip cmd within network ns
From: Stephen Hemminger @ 2014-12-12  1:26 UTC (permalink / raw)
  To: Vadim Kochan; +Cc: netdev
In-Reply-To: <1418337171-2792-1-git-send-email-vadim4j@gmail.com>

On Fri, 12 Dec 2014 00:32:51 +0200
Vadim Kochan <vadim4j@gmail.com> wrote:

> +
> +#define NETNS_RUN_DIR "/var/run/netns"
> +#define NETNS_ETC_DIR "/etc/netns"
> +
> +#ifndef CLONE_NEWNET
> +#define CLONE_NEWNET 0x40000000	/* New network namespace (lo, device, names sockets, etc) */
> +#endif
> +
> +#ifndef MNT_DETACH
> +#define MNT_DETACH	0x00000002	/* Just detach from the tree */
> +#endif /* MNT_DETACH */
> +
> +/* sys/mount.h may be out too old to have these */
> +#ifndef MS_REC
> +#define MS_REC		16384
> +#endif
> +
> +#ifndef MS_SLAVE
> +#define MS_SLAVE	(1 << 19)
> +#endif
> +
> +#ifndef MS_SHARED
> +#define MS_SHARED	(1 << 20)
> +#endif
> +
> +extern int netns_switch(char *netns);
> +

Maybe it would be cleaner to introduce a new file netns.h
with this and the setnetns syscall wrapper

^ permalink raw reply

* Re: [PATCH net-next v9 0/3] add hisilicon hip04 ethernet driver
From: Ding Tianhong @ 2014-12-12  1:46 UTC (permalink / raw)
  To: David Miller
  Cc: zhangfei.gao, linux, arnd, f.fainelli, sergei.shtylyov,
	mark.rutland, David.Laight, eric.dumazet, xuwei5,
	linux-arm-kernel, netdev, devicetree
In-Reply-To: <20141211.120414.806666053427479402.davem@davemloft.net>

On 2014/12/12 1:04, David Miller wrote:
> 
> The net-next tree is closed, therefore it is not appropriate to
> submit net-next changes at this time.
> 
> .
> 
Ok, sure, thanks.

Ding

^ permalink raw reply

* [PATCH net] net: dsa: bcm_sf2: force link for all fixed PHY devices
From: Florian Fainelli @ 2014-12-12  2:12 UTC (permalink / raw)
  To: netdev; +Cc: davem, Florian Fainelli

For ports of the switch that we define as "fixed PHYs" such as MoCA, we
would have our Port 7 special handling that would allow us to assert the
link status indication.

For other ports, such as e.g: RGMII_1 connected to a cable modem, we
would rely on whatever the bootloader has left configured, which is a
bad assumption to make, we really need to force the link status
indication here.

Fixes: 246d7f773c13 ("net: dsa: add Broadcom SF2 switch driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/dsa/bcm_sf2.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 4f4c2a7888e5..feb29c4526f7 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -684,10 +684,9 @@ static void bcm_sf2_sw_fixed_link_update(struct dsa_switch *ds, int port,
 					 struct fixed_phy_status *status)
 {
 	struct bcm_sf2_priv *priv = ds_to_priv(ds);
-	u32 link, duplex, pause, speed;
+	u32 duplex, pause, speed;
 	u32 reg;
 
-	link = core_readl(priv, CORE_LNKSTS);
 	duplex = core_readl(priv, CORE_DUPSTS);
 	pause = core_readl(priv, CORE_PAUSESTS);
 	speed = core_readl(priv, CORE_SPDSTS);
@@ -701,22 +700,26 @@ static void bcm_sf2_sw_fixed_link_update(struct dsa_switch *ds, int port,
 	 * which means that we need to force the link at the port override
 	 * level to get the data to flow. We do use what the interrupt handler
 	 * did determine before.
+	 *
+	 * For the other ports, we just force the link status, since this is
+	 * a fixed PHY device.
 	 */
 	if (port == 7) {
 		status->link = priv->port_sts[port].link;
-		reg = core_readl(priv, CORE_STS_OVERRIDE_GMIIP_PORT(7));
-		reg |= SW_OVERRIDE;
-		if (status->link)
-			reg |= LINK_STS;
-		else
-			reg &= ~LINK_STS;
-		core_writel(priv, reg, CORE_STS_OVERRIDE_GMIIP_PORT(7));
 		status->duplex = 1;
 	} else {
-		status->link = !!(link & (1 << port));
+		status->link = 1;
 		status->duplex = !!(duplex & (1 << port));
 	}
 
+	reg = core_readl(priv, CORE_STS_OVERRIDE_GMIIP_PORT(port));
+	reg |= SW_OVERRIDE;
+	if (status->link)
+		reg |= LINK_STS;
+	else
+		reg &= ~LINK_STS;
+	core_writel(priv, reg, CORE_STS_OVERRIDE_GMIIP_PORT(port));
+
 	switch (speed) {
 	case SPDSTS_10:
 		status->speed = SPEED_10;
-- 
2.1.0

^ permalink raw reply related

* Re: [PATCH net 0/2] net: dsa: two small bug fixes
From: David Miller @ 2014-12-12  2:00 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, computersforpeace, andrey.volkov
In-Reply-To: <1418330956-17151-1-git-send-email-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu, 11 Dec 2014 12:49:14 -0800

> Here are two small fixes for the DSA slave interface creation code:
> 
> - first patch fixes a null pointer de-reference with an invalid PHY
>   device pointer while calling phy_connect_direct()
> 
> - second path propagates the dsa_slave_phy_setup() error code down to
>   its caller: dsa_slave_create

Series applied.

Please give me some feedback on the FIXED_PHY Kconfig issue I
asked you about earlier today.

Thanks.

^ permalink raw reply

* Re: [RFC PATCH net-next 00/11] net: remove disable_irq() from ->ndo_poll_controller
From: David Miller @ 2014-12-12  2:14 UTC (permalink / raw)
  To: sd; +Cc: netdev, peterz, tglx
In-Reply-To: <20141211214537.GA4159@kria>

From: Sabrina Dubroca <sd@queasysnail.net>
Date: Thu, 11 Dec 2014 22:45:37 +0100

> Introduce disable_irq_nosleep() that returns true if it successfully
> synchronized against all handlers (there was no threaded handler
> running), false if it left some threads running.  And in
> ->ndo_poll_controller, only call the interrupt handler if
> synchronization was successful.
> 
> Both users of the poll controllers retry their action (alloc/xmit an
> skb) several times, with calls to the device's poll controller between
> attempts.  And hopefully, if the first attempt fails, we will still
> manage to get through?

This looks more palatable to me.

^ permalink raw reply

* Re: [net-next PATCH v7 resubmit 0/4] arch: Add lightweight memory barriers for coherent memory access
From: David Miller @ 2014-12-12  2:16 UTC (permalink / raw)
  To: alexander.h.duyck
  Cc: netdev, linux-kernel, mathieu.desnoyers, peterz, benh,
	heiko.carstens, mingo, linux-arch, mikey, linux,
	donald.c.skidmore, matthew.vick, geert, jeffrey.t.kirsher, romieu,
	paulmck, nic_swsd, arnd, will.deacon, michael, tony.luck,
	torvalds, oleg, schwidefsky, fweisbec
In-Reply-To: <20141211225250.25464.30291.stgit@ahduyck-server>

From: Alexander Duyck <alexander.h.duyck@redhat.com>
Date: Thu, 11 Dec 2014 15:01:43 -0800

> These patches introduce two new primitives for synchronizing cache coherent
> memory writes and reads.

Series applied, thanks Alex.

^ permalink raw reply

* Re: [PATCH net] net: dsa: bcm_sf2: force link for all fixed PHY devices
From: David Miller @ 2014-12-12  2:17 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev
In-Reply-To: <1418350362-29183-1-git-send-email-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu, 11 Dec 2014 18:12:42 -0800

> For ports of the switch that we define as "fixed PHYs" such as MoCA, we
> would have our Port 7 special handling that would allow us to assert the
> link status indication.
> 
> For other ports, such as e.g: RGMII_1 connected to a cable modem, we
> would rely on whatever the bootloader has left configured, which is a
> bad assumption to make, we really need to force the link status
> indication here.
> 
> Fixes: 246d7f773c13 ("net: dsa: add Broadcom SF2 switch driver")
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [net PATCH] fib_trie: Fix trie balancing issue if new node pushes down existing node
From: David Miller @ 2014-12-12  2:32 UTC (permalink / raw)
  To: alexander.h.duyck; +Cc: netdev
In-Reply-To: <20141211054815.1357.36977.stgit@ahduyck-vm-fedora20>

From: Alexander Duyck <alexander.h.duyck@redhat.com>
Date: Wed, 10 Dec 2014 21:49:22 -0800

> This patch addresses an issue with the level compression of the fib_trie.
> Specifically in the case of adding a new leaf that triggers a new node to
> be added that takes the place of the old node.  The result is a trie where
> the 1 child tnode is on one side and one leaf is on the other which gives
> you a very deep trie.  Below is the script I used to generate a trie on
> dummy0 with a 10.X.X.X family of addresses.
 ...
> What this fix does is start the rebalance at the newly created tnode
> instead of at the parent tnode.  This way if there is a gap between the
> parent and the new node it doesn't prevent the new tnode from being
> coalesced with any pre-existing nodes that may have been pushed into one
> of the new nodes child branches.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>

One has to be mindful with this code that what it's doing now might
be intentional.  For example, it might be doing things this way
on purpose in order to minimize rebalancing during route flaps.

Barring anything like that, I think your change is fine.

^ permalink raw reply

* Re: [PATCH net-next v2] r8169:update rtl8168g pcie ephy parameter
From: David Miller @ 2014-12-12  2:39 UTC (permalink / raw)
  To: hau; +Cc: netdev, nic_swsd, linux-kernel
In-Reply-To: <1418218118-15018-1-git-send-email-hau@realtek.com>

From: Chunhao Lin <hau@realtek.com>
Date: Wed, 10 Dec 2014 21:28:38 +0800

> Add ephy parameter to rtl8168g.
> Also change the common function of rtl8168g from "rtl_hw_start_8168g_1" to
>  "rtl_hw_start_8168g". And function "rtl_hw_start_8168g_1" is used for
> setting rtl8168g hardware parameters.
> 
> Following is the explanation of what hardware parameter change for.
> rtl8168g may erroneous judge the PCIe signal quality and show the error bit
> on PCI configuration space when in PCIe low power mode.
> The following ephy parameters are for above issue.
> { 0x00, 0x0000,	0x0008 }
> { 0x0c, 0x37d0,	0x0820 }
> { 0x1e, 0x0000,	0x0001 }
> 
> rtl8168g may return to PCIe L0 from PCIe L0s low power mode too slow.
> The following ephy parameter is for above issue.
> { 0x19, 0x8000,	0x0000 }
> 
> Signed-off-by: Chunhao Lin <hau@realtek.com>

Applied, thanks.

^ permalink raw reply

* [PATCH net-next v10 3/7] cxgb4/cxgb4i: set the max. pdu length in firmware
From: Karen Xie @ 2014-12-12  3:13 UTC (permalink / raw)
  To: linux-scsi, netdev
  Cc: kxie, hariprasad, anish, hch, James.Bottomley, michaelc, davem

From: Karen Xie <kxie@chelsio.com>

Programs the firmware of the maximum outgoing iscsi pdu length per connection.

Signed-off-by: Karen Xie <kxie@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h |    1 
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c            |   69 ++++++++++++++++++-------
 2 files changed, 52 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
index beaf80a..89a75e3 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
@@ -560,6 +560,7 @@ enum fw_flowc_mnem {
 	FW_FLOWC_MNEM_RCVNXT,
 	FW_FLOWC_MNEM_SNDBUF,
 	FW_FLOWC_MNEM_MSS,
+	FW_FLOWC_MNEM_TXDATAPLEN_MAX,
 };
 
 struct fw_flowc_mnemval {
diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
index 197d7de..93ae720 100644
--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
+++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
@@ -75,6 +75,7 @@ typedef void (*cxgb4i_cplhandler_func)(struct cxgbi_device *, struct sk_buff *);
 static void *t4_uld_add(const struct cxgb4_lld_info *);
 static int t4_uld_rx_handler(void *, const __be64 *, const struct pkt_gl *);
 static int t4_uld_state_change(void *, enum cxgb4_state state);
+static inline int send_tx_flowc_wr(struct cxgbi_sock *);
 
 static const struct cxgb4_uld_info cxgb4i_uld_info = {
 	.name = DRV_MODULE_NAME,
@@ -392,6 +393,12 @@ static void send_abort_req(struct cxgbi_sock *csk)
 
 	if (unlikely(csk->state == CTP_ABORTING) || !skb || !csk->cdev)
 		return;
+
+	if (!cxgbi_sock_flag(csk, CTPF_TX_DATA_SENT)) {
+		send_tx_flowc_wr(csk);
+		cxgbi_sock_set_flag(csk, CTPF_TX_DATA_SENT);
+	}
+
 	cxgbi_sock_set_state(csk, CTP_ABORTING);
 	cxgbi_sock_set_flag(csk, CTPF_ABORT_RPL_PENDING);
 	cxgbi_sock_purge_write_queue(csk);
@@ -495,20 +502,40 @@ static inline unsigned int calc_tx_flits_ofld(const struct sk_buff *skb)
 	return flits + sgl_len(cnt);
 }
 
-static inline void send_tx_flowc_wr(struct cxgbi_sock *csk)
+#define FLOWC_WR_NPARAMS_MIN	9
+static inline int tx_flowc_wr_credits(int *nparamsp, int *flowclenp)
+{
+	int nparams, flowclen16, flowclen;
+
+	nparams = FLOWC_WR_NPARAMS_MIN;
+	flowclen = offsetof(struct fw_flowc_wr, mnemval[nparams]);
+	flowclen16 = DIV_ROUND_UP(flowclen, 16);
+	flowclen = flowclen16 * 16;
+	/*
+	 * Return the number of 16-byte credits used by the FlowC request.
+	 * Pass back the nparams and actual FlowC length if requested.
+	 */
+	if (nparamsp)
+		*nparamsp = nparams;
+	if (flowclenp)
+		*flowclenp = flowclen;
+
+	return flowclen16;
+}
+
+static inline int send_tx_flowc_wr(struct cxgbi_sock *csk)
 {
 	struct sk_buff *skb;
 	struct fw_flowc_wr *flowc;
-	int flowclen, i;
+	int nparams, flowclen16, flowclen;
 
-	flowclen = 80;
+	flowclen16 = tx_flowc_wr_credits(&nparams, &flowclen);
 	skb = alloc_wr(flowclen, 0, GFP_ATOMIC);
 	flowc = (struct fw_flowc_wr *)skb->head;
 	flowc->op_to_nparams =
-		htonl(FW_WR_OP_V(FW_FLOWC_WR) | FW_FLOWC_WR_NPARAMS_V(8));
+		htonl(FW_WR_OP_V(FW_FLOWC_WR) | FW_FLOWC_WR_NPARAMS_V(nparams));
 	flowc->flowid_len16 =
-		htonl(FW_WR_LEN16_V(DIV_ROUND_UP(72, 16)) |
-				FW_WR_FLOWID_V(csk->tid));
+		htonl(FW_WR_LEN16_V(flowclen16) | FW_WR_FLOWID_V(csk->tid));
 	flowc->mnemval[0].mnemonic = FW_FLOWC_MNEM_PFNVFN;
 	flowc->mnemval[0].val = htonl(csk->cdev->pfvf);
 	flowc->mnemval[1].mnemonic = FW_FLOWC_MNEM_CH;
@@ -527,11 +554,9 @@ static inline void send_tx_flowc_wr(struct cxgbi_sock *csk)
 	flowc->mnemval[7].val = htonl(csk->advmss);
 	flowc->mnemval[8].mnemonic = 0;
 	flowc->mnemval[8].val = 0;
-	for (i = 0; i < 9; i++) {
-		flowc->mnemval[i].r4[0] = 0;
-		flowc->mnemval[i].r4[1] = 0;
-		flowc->mnemval[i].r4[2] = 0;
-	}
+	flowc->mnemval[8].mnemonic = FW_FLOWC_MNEM_TXDATAPLEN_MAX;
+	flowc->mnemval[8].val = 16384;
+
 	set_queue(skb, CPL_PRIORITY_DATA, csk);
 
 	log_debug(1 << CXGBI_DBG_TOE | 1 << CXGBI_DBG_SOCK,
@@ -541,6 +566,8 @@ static inline void send_tx_flowc_wr(struct cxgbi_sock *csk)
 		csk->advmss);
 
 	cxgb4_ofld_send(csk->cdev->ports[csk->port_id], skb);
+
+	return flowclen16;
 }
 
 static inline void make_tx_data_wr(struct cxgbi_sock *csk, struct sk_buff *skb,
@@ -602,6 +629,7 @@ static int push_tx_frames(struct cxgbi_sock *csk, int req_completion)
 		int dlen = skb->len;
 		int len = skb->len;
 		unsigned int credits_needed;
+		int flowclen16 = 0;
 
 		skb_reset_transport_header(skb);
 		if (is_ofld_imm(skb))
@@ -616,6 +644,17 @@ static int push_tx_frames(struct cxgbi_sock *csk, int req_completion)
 					sizeof(struct fw_ofld_tx_data_wr),
 					16);
 
+		/*
+		 * Assumes the initial credits is large enough to support
+		 * fw_flowc_wr plus largest possible first payload
+		 */
+		if (!cxgbi_sock_flag(csk, CTPF_TX_DATA_SENT)) {
+			flowclen16 = send_tx_flowc_wr(csk);
+			csk->wr_cred -= flowclen16;
+			csk->wr_una_cred += flowclen16;
+			cxgbi_sock_set_flag(csk, CTPF_TX_DATA_SENT);
+		}
+
 		if (csk->wr_cred < credits_needed) {
 			log_debug(1 << CXGBI_DBG_PDU_TX,
 				"csk 0x%p, skb %u/%u, wr %d < %u.\n",
@@ -625,7 +664,7 @@ static int push_tx_frames(struct cxgbi_sock *csk, int req_completion)
 		}
 		__skb_unlink(skb, &csk->write_queue);
 		set_queue(skb, CPL_PRIORITY_DATA, csk);
-		skb->csum = credits_needed;
+		skb->csum = credits_needed + flowclen16;
 		csk->wr_cred -= credits_needed;
 		csk->wr_una_cred += credits_needed;
 		cxgbi_sock_enqueue_wr(csk, skb);
@@ -636,12 +675,6 @@ static int push_tx_frames(struct cxgbi_sock *csk, int req_completion)
 			csk->wr_cred, csk->wr_una_cred);
 
 		if (likely(cxgbi_skcb_test_flag(skb, SKCBF_TX_NEED_HDR))) {
-			if (!cxgbi_sock_flag(csk, CTPF_TX_DATA_SENT)) {
-				send_tx_flowc_wr(csk);
-				skb->csum += 5;
-				csk->wr_cred -= 5;
-				csk->wr_una_cred += 5;
-			}
 			len += cxgbi_ulp_extra_len(cxgbi_skcb_ulp_mode(skb));
 			make_tx_data_wr(csk, skb, dlen, len, credits_needed,
 					req_completion);

^ permalink raw reply related

* [PATCH net-next v10 4/7] cxgb4i: additional types of negative advice
From: Karen Xie @ 2014-12-12  3:13 UTC (permalink / raw)
  To: linux-scsi, netdev
  Cc: kxie, hariprasad, anish, hch, James.Bottomley, michaelc, davem

From: Karen Xie <kxie@chelsio.com>

Treat both CPL_ERR_KEEPALV_NEG_ADVICE and CPL_ERR_PERSIST_NEG_ADVICE as
negative advice.

Signed-off-by: Karen Xie <kxie@chelsio.com>
---
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
index 93ae720..8ca91fd 100644
--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
+++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
@@ -849,6 +849,13 @@ static void csk_act_open_retry_timer(unsigned long data)
 
 }
 
+static inline bool is_neg_adv(unsigned int status)
+{
+	return status == CPL_ERR_RTX_NEG_ADVICE ||
+		status == CPL_ERR_KEEPALV_NEG_ADVICE ||
+		status == CPL_ERR_PERSIST_NEG_ADVICE;
+}
+
 static void do_act_open_rpl(struct cxgbi_device *cdev, struct sk_buff *skb)
 {
 	struct cxgbi_sock *csk;
@@ -870,7 +877,7 @@ static void do_act_open_rpl(struct cxgbi_device *cdev, struct sk_buff *skb)
 		       "csk 0x%p,%u,0x%lx. ", (&csk->saddr), (&csk->daddr),
 		       atid, tid, status, csk, csk->state, csk->flags);
 
-	if (status == CPL_ERR_RTX_NEG_ADVICE)
+	if (is_neg_adv(status))
 		goto rel_skb;
 
 	module_put(THIS_MODULE);
@@ -976,8 +983,7 @@ static void do_abort_req_rss(struct cxgbi_device *cdev, struct sk_buff *skb)
 		       (&csk->saddr), (&csk->daddr),
 		       csk, csk->state, csk->flags, csk->tid, req->status);
 
-	if (req->status == CPL_ERR_RTX_NEG_ADVICE ||
-	    req->status == CPL_ERR_PERSIST_NEG_ADVICE)
+	if (is_neg_adv(req->status))
 		goto rel_skb;
 
 	cxgbi_sock_get(csk);

^ permalink raw reply related

* [PATCH net-next v10 7/7] libcxgbi: fix freeing skb prematurely
From: Karen Xie @ 2014-12-12  3:13 UTC (permalink / raw)
  To: linux-scsi, netdev
  Cc: kxie, hariprasad, anish, hch, James.Bottomley, michaelc, davem

From: Karen Xie <kxie@chelsio.com>

With debug turned on the debug print would access the skb after it is freed.
Fix it to free the skb after the debug print.

Signed-off-by: Karen Xie <kxie@chelsio.com>
---
 drivers/scsi/cxgbi/libcxgbi.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
index 7da59c3..eb58afc 100644
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -2294,10 +2294,12 @@ int cxgbi_conn_xmit_pdu(struct iscsi_task *task)
 		return err;
 	}
 
-	kfree_skb(skb);
 	log_debug(1 << CXGBI_DBG_ISCSI | 1 << CXGBI_DBG_PDU_TX,
 		"itt 0x%x, skb 0x%p, len %u/%u, xmit err %d.\n",
 		task->itt, skb, skb->len, skb->data_len, err);
+
+	kfree_skb(skb);
+
 	iscsi_conn_printk(KERN_ERR, task->conn, "xmit err %d.\n", err);
 	iscsi_conn_failure(task->conn, ISCSI_ERR_XMIT_FAILED);
 	return err;

^ permalink raw reply related

* [PATCH net-next v10 2/7] cxgb4i: fix credit check for tx_data_wr
From: Karen Xie @ 2014-12-12  3:13 UTC (permalink / raw)
  To: linux-scsi, netdev
  Cc: kxie, hariprasad, anish, hch, James.Bottomley, michaelc, davem

From: Karen Xie <kxie@chelsio.com>

make sure any tx credit related checking is done before adding the wr header.

Signed-off-by: Karen Xie <kxie@chelsio.com>
---
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
index 8abe8a3..197d7de 100644
--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
+++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
@@ -549,10 +549,11 @@ static inline void make_tx_data_wr(struct cxgbi_sock *csk, struct sk_buff *skb,
 	struct fw_ofld_tx_data_wr *req;
 	unsigned int submode = cxgbi_skcb_ulp_mode(skb) & 3;
 	unsigned int wr_ulp_mode = 0, val;
+	bool imm = is_ofld_imm(skb);
 
 	req = (struct fw_ofld_tx_data_wr *)__skb_push(skb, sizeof(*req));
 
-	if (is_ofld_imm(skb)) {
+	if (imm) {
 		req->op_to_immdlen = htonl(FW_WR_OP_V(FW_OFLD_TX_DATA_WR) |
 					FW_WR_COMPL_F |
 					FW_WR_IMMDLEN_V(dlen));

^ permalink raw reply related

* [PATCH net-next v10 5/7] cxgb4i: handle non-pdu-aligned rx data
From: Karen Xie @ 2014-12-12  3:13 UTC (permalink / raw)
  To: linux-scsi, netdev
  Cc: kxie, hariprasad, anish, hch, James.Bottomley, michaelc, davem

From: Karen Xie <kxie@chelsio.com>

Abort the connection upon receiving of cpl_rx_data, which means the pdu cannot
be recovered from the tcp stream. This generally is due to pdu header
corruption.

Signed-off-by: Karen Xie <kxie@chelsio.com>
---
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
index 8ca91fd..8b7e8f8 100644
--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
+++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
@@ -1037,6 +1037,27 @@ rel_skb:
 	__kfree_skb(skb);
 }
 
+static void do_rx_data(struct cxgbi_device *cdev, struct sk_buff *skb)
+{
+	struct cxgbi_sock *csk;
+	struct cpl_rx_data *cpl = (struct cpl_rx_data *)skb->data;
+	unsigned int tid = GET_TID(cpl);
+	struct cxgb4_lld_info *lldi = cxgbi_cdev_priv(cdev);
+	struct tid_info *t = lldi->tids;
+
+	csk = lookup_tid(t, tid);
+	if (!csk) {
+		pr_err("can't find connection for tid %u.\n", tid);
+	} else {
+		/* not expecting this, reset the connection. */
+		pr_err("csk 0x%p, tid %u, rcv cpl_rx_data.\n", csk, tid);
+		spin_lock_bh(&csk->lock);
+		send_abort_req(csk);
+		spin_unlock_bh(&csk->lock);
+	}
+	__kfree_skb(skb);
+}
+
 static void do_rx_iscsi_hdr(struct cxgbi_device *cdev, struct sk_buff *skb)
 {
 	struct cxgbi_sock *csk;
@@ -1456,6 +1477,7 @@ cxgb4i_cplhandler_func cxgb4i_cplhandlers[NUM_CPL_CMDS] = {
 	[CPL_SET_TCB_RPL] = do_set_tcb_rpl,
 	[CPL_RX_DATA_DDP] = do_rx_data_ddp,
 	[CPL_RX_ISCSI_DDP] = do_rx_data_ddp,
+	[CPL_RX_DATA] = do_rx_data,
 };
 
 int cxgb4i_ofld_init(struct cxgbi_device *cdev)

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox