Netdev List
* Re: [RFC PATCH net-next 2/3] virtio_net: Introduce one dummy function virtnet_filter_rfs()
From: Tom Herbert @ 2014-01-15 17:54 UTC
  To: Zhi Yong Wu; +Cc: Linux Netdev List, Eric Dumazet, David Miller, Zhi Yong Wu
In-Reply-To: <1389795654-28381-3-git-send-email-zwu.kernel@gmail.com>

Zhi, this is promising work! I can't wait to see how this impacts
network virtualization performance :-)

On Wed, Jan 15, 2014 at 6:20 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
> From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>
> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
> ---
>  drivers/net/virtio_net.c |   11 +++++++++++
>  1 files changed, 11 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 7b17240..046421c 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1295,6 +1295,14 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
>         return 0;
>  }
>
> +#ifdef CONFIG_RFS_ACCEL
> +static int virtnet_filter_rfs(struct net_device *net_dev,
> +               const struct sk_buff *skb, u16 rxq_index, u32 flow_id)
> +{
Does this need to be filled out with more stuff?

> +       return 0;
> +}
> +#endif /* CONFIG_RFS_ACCEL */
> +
>  static const struct net_device_ops virtnet_netdev = {
>         .ndo_open            = virtnet_open,
>         .ndo_stop            = virtnet_close,
> @@ -1309,6 +1317,9 @@ static const struct net_device_ops virtnet_netdev = {
>  #ifdef CONFIG_NET_POLL_CONTROLLER
>         .ndo_poll_controller = virtnet_netpoll,
>  #endif
> +#ifdef CONFIG_RFS_ACCEL
> +       .ndo_rx_flow_steer   = virtnet_filter_rfs,
> +#endif
>  };
>
>  static void virtnet_config_changed_work(struct work_struct *work)
> --
> 1.7.6.5
>


* Re: TI CPSW Ethernet Tx performance regression
From: Ben Hutchings @ 2014-01-15 17:54 UTC
  To: Mugunthan V N; +Cc: netdev
In-Reply-To: <1389790129-5721-1-git-send-email-mugunthanvnm@ti.com>

On Wed, 2014-01-15 at 18:18 +0530, Mugunthan V N wrote:
> Hi
> 
> I am seeing a performance regression with the CPSW driver on the AM335x EVM.
> AM335x EVM CPSW has 3.2 kernel support [1] and mainline support from 3.7.
> I am comparing the performance of 3.2 and 3.13-rc4. TCP receive performance
> of CPSW is the same in 3.2 and 3.13-rc4 (~180Mbps), but TCP transmit
> performance is poor compared to the 3.2 kernel: in 3.2 it is *256Mbps* and
> in 3.13-rc4 it is *70Mbps*.
> 
> The iperf version is *iperf version 2.0.5 (08 Jul 2010) pthreads* on both
> the PC and the EVM.
> 
> UDP transmit performance is also down compared to the 3.2 kernel: in 3.2 it
> is 196Mbps for a 200Mbps bandwidth and in 3.13-rc4 it is 92Mbps.
> 
> Can someone point me to where I can look to improve Tx performance? I also
> checked whether there is Tx descriptor overflow, and there is none. I have
> tried 3.11 and some older kernels; all give only ~75Mbps transmit
> performance.
> 
> [1] - http://arago-project.org/git/projects/?p=linux-am33x.git;a=summary

If you don't get any specific suggestions, you could try bisecting to
find out which specific commit(s) changed the performance.
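
A minimal sketch of what bisecting does, assuming the commits between a known-good kernel (such as 3.2) and a known-bad one are indexed 0..n-1 in history order and that the regression, once introduced, persists in every later commit. It is a binary search, so each build-and-iperf run halves the remaining range and roughly log2(n) runs are needed. The names `bisect_first_bad` and `simulated_is_bad` are hypothetical; in practice `git bisect` drives this loop and the "test" is building, booting and measuring Tx throughput.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test callback: returns true if the kernel built at
 * commit index `idx` shows the regression (e.g. iperf Tx far below
 * the 3.2 baseline). `ctx` carries whatever state the test needs. */
typedef bool (*regression_test_fn)(size_t idx, void *ctx);

/*
 * Return the index of the first bad commit, assuming good < bad,
 * commit `good` passes, commit `bad` fails, and the regression is
 * monotonic. If `steps` is non-NULL it receives the number of tests run.
 */
size_t bisect_first_bad(size_t good, size_t bad,
                        regression_test_fn is_bad, void *ctx,
                        unsigned *steps)
{
    unsigned n = 0;

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        n++;
        if (is_bad(mid, ctx))
            bad = mid;   /* regression is at or before mid */
        else
            good = mid;  /* regression is after mid */
    }
    if (steps)
        *steps = n;
    return bad;
}

/* Simulated test for trying the search out: ctx points at the index
 * of the (pretend) culprit commit. */
bool simulated_is_bad(size_t idx, void *ctx)
{
    return idx >= *(const size_t *)ctx;
}
```

With 100 commits in range the search needs at most 7 test runs, which is why bisecting is usually practical even across several kernel releases.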

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [Patch net-next] net_sched: act: fix a bug in tcf_register_action()
From: Cong Wang @ 2014-01-15 17:40 UTC
  To: Jamal Hadi Salim; +Cc: Linux Kernel Network Developers, David S. Miller
In-Reply-To: <52D68069.1090804@mojatatu.com>

On Wed, Jan 15, 2014 at 4:34 AM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 01/14/14 17:48, Cong Wang wrote:
>>
>> In tcf_register_action() we check ->type and ->kind to see if there
>> is an existing action registered, but ipt action registers two
>> actions with same type but different kinds. This should be a valid
>> case, otherwise only xt can be registered.
>>
>
>
> We can't allow conflicts by name or ID; we want to catch them.
> So just introduce TCA_ACT_XT instead (ID 7).

Oh, I thought it was intentional to use the same type for xt and ipt.

>
> [
> Note: iptables used to be a constant moving API target
> and this is supposed to be the latest "backward compat mode".
> New kernel/iproute ==> we want to love "xt" more than "ipt".
> In fact, we want to eventually kill "ipt",
> but this preference is hard to achieve, as you may have run into.
> I would be curious how you tested and ran into this.
> ].
>

Just load the module, and you would see an error message. :)
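
A simplified user-space model of the duplicate check discussed here: tcf_register_action() refuses a new action if either its ->type or its ->kind matches one already registered, so "ipt" and "xt" sharing one type makes the second registration fail, while giving xt its own ID lets both load. The struct and function names are hypothetical stand-ins, not the kernel's tc_action_ops; the numeric IDs in the usage are illustrative apart from 7, the value proposed for TCA_ACT_XT.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for an action's ops: only the two fields
 * the registration check compares. */
struct act_ops {
    int type;          /* numeric action ID (TCA_ACT_*) */
    const char *kind;  /* action name, e.g. "ipt" or "xt" */
};

#define MAX_ACTS 16

struct act_registry {
    const struct act_ops *ops[MAX_ACTS];
    size_t count;
};

/*
 * Register `a`, refusing any clash on either type or kind.
 * Returns 0 on success, -EEXIST on a conflict, -ENOSPC if full.
 */
int act_register(struct act_registry *r, const struct act_ops *a)
{
    for (size_t i = 0; i < r->count; i++) {
        if (r->ops[i]->type == a->type ||
            strcmp(r->ops[i]->kind, a->kind) == 0)
            return -EEXIST;
    }
    if (r->count == MAX_ACTS)
        return -ENOSPC;
    r->ops[r->count++] = a;
    return 0;
}
```

Registering { 6, "ipt" } and then { 6, "xt" } yields -EEXIST for the second call, whereas { 7, "xt" } succeeds, which matches the "catch conflicts by name or ID" policy.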


* [PATCH net-next v2] xen-netfront: add support for IPv6 offloads
From: Paul Durrant @ 2014-01-15 17:30 UTC
  To: netdev, xen-devel
  Cc: Paul Durrant, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	David Vrabel

This patch adds support for IPv6 checksum offload and GSO when those
features are available in the backend.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
---
v2:
- Use xenbus_write rather than xenbus_printf

 drivers/net/xen-netfront.c |   48 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index c41537b..d7bee8a 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -617,7 +617,9 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		tx->flags |= XEN_NETTXF_extra_info;
 
 		gso->u.gso.size = skb_shinfo(skb)->gso_size;
-		gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
+		gso->u.gso.type = (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) ?
+			XEN_NETIF_GSO_TYPE_TCPV6 :
+			XEN_NETIF_GSO_TYPE_TCPV4;
 		gso->u.gso.pad = 0;
 		gso->u.gso.features = 0;
 
@@ -809,15 +811,18 @@ static int xennet_set_skb_gso(struct sk_buff *skb,
 		return -EINVAL;
 	}
 
-	/* Currently only TCPv4 S.O. is supported. */
-	if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4) {
+	if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4 &&
+	    gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV6) {
 		if (net_ratelimit())
 			pr_warn("Bad GSO type %d\n", gso->u.gso.type);
 		return -EINVAL;
 	}
 
 	skb_shinfo(skb)->gso_size = gso->u.gso.size;
-	skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
+	skb_shinfo(skb)->gso_type =
+		(gso->u.gso.type == XEN_NETIF_GSO_TYPE_TCPV4) ?
+		SKB_GSO_TCPV4 :
+		SKB_GSO_TCPV6;
 
 	/* Header must be checked, and gso_segs computed. */
 	skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY;
@@ -1191,6 +1196,15 @@ static netdev_features_t xennet_fix_features(struct net_device *dev,
 			features &= ~NETIF_F_SG;
 	}
 
+	if (features & NETIF_F_IPV6_CSUM) {
+		if (xenbus_scanf(XBT_NIL, np->xbdev->otherend,
+				 "feature-ipv6-csum-offload", "%d", &val) < 0)
+			val = 0;
+
+		if (!val)
+			features &= ~NETIF_F_IPV6_CSUM;
+	}
+
 	if (features & NETIF_F_TSO) {
 		if (xenbus_scanf(XBT_NIL, np->xbdev->otherend,
 				 "feature-gso-tcpv4", "%d", &val) < 0)
@@ -1200,6 +1214,15 @@ static netdev_features_t xennet_fix_features(struct net_device *dev,
 			features &= ~NETIF_F_TSO;
 	}
 
+	if (features & NETIF_F_TSO6) {
+		if (xenbus_scanf(XBT_NIL, np->xbdev->otherend,
+				 "feature-gso-tcpv6", "%d", &val) < 0)
+			val = 0;
+
+		if (!val)
+			features &= ~NETIF_F_TSO6;
+	}
+
 	return features;
 }
 
@@ -1338,7 +1361,9 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 	netif_napi_add(netdev, &np->napi, xennet_poll, 64);
 	netdev->features        = NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
 				  NETIF_F_GSO_ROBUST;
-	netdev->hw_features	= NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO;
+	netdev->hw_features	= NETIF_F_SG |
+				  NETIF_F_IPV6_CSUM |
+				  NETIF_F_TSO | NETIF_F_TSO6;
 
 	/*
          * Assume that all hw features are available for now. This set
@@ -1716,6 +1741,19 @@ again:
 		goto abort_transaction;
 	}
 
+	err = xenbus_write(xbt, dev->nodename, "feature-gso-tcpv6", "1");
+	if (err) {
+		message = "writing feature-gso-tcpv6";
+		goto abort_transaction;
+	}
+
+	err = xenbus_write(xbt, dev->nodename, "feature-ipv6-csum-offload",
+			   "1");
+	if (err) {
+		message = "writing feature-ipv6-csum-offload";
+		goto abort_transaction;
+	}
+
 	err = xenbus_transaction_end(xbt, 0);
 	if (err) {
 		if (err == -EAGAIN)
-- 
1.7.10.4
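
A user-space sketch of the two GSO type translations this patch touches: on transmit the wire type is chosen from the skb's gso_type flags (anything without the TCPv6 bit is sent as TCPv4, as in xennet_start_xmit), and on receive only TCPv4 and TCPv6 are accepted and mapped back, with the header marked untrusted (SKB_GSO_DODGY). All constant names and values below are illustrative stand-ins, not the real SKB_GSO_* or XEN_NETIF_GSO_TYPE_* definitions.

```c
#include <errno.h>

/* Illustrative stand-ins for the skb gso_type flag bits. */
#define SKB_GSO_TCPV4_BIT  (1u << 0)
#define SKB_GSO_DODGY_BIT  (1u << 2)
#define SKB_GSO_TCPV6_BIT  (1u << 4)

/* Illustrative stand-ins for the Xen wire GSO types. */
enum xen_gso_type {
    XEN_GSO_NONE  = 0,
    XEN_GSO_TCPV4 = 1,
    XEN_GSO_TCPV6 = 2,
};

/* Transmit side: pick the wire GSO type from the skb's flags. */
enum xen_gso_type tx_gso_type(unsigned int skb_gso_type)
{
    return (skb_gso_type & SKB_GSO_TCPV6_BIT) ? XEN_GSO_TCPV6
                                               : XEN_GSO_TCPV4;
}

/* Receive side: validate the wire type and translate it back,
 * marking the header as needing verification. */
int rx_gso_type(int wire_type, unsigned int *skb_gso_type)
{
    if (wire_type != XEN_GSO_TCPV4 && wire_type != XEN_GSO_TCPV6)
        return -EINVAL;
    *skb_gso_type = (wire_type == XEN_GSO_TCPV4) ? SKB_GSO_TCPV4_BIT
                                                 : SKB_GSO_TCPV6_BIT;
    *skb_gso_type |= SKB_GSO_DODGY_BIT;
    return 0;
}
```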

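A sketch of the feature negotiation in xennet_fix_features(): each requested offload is kept only if the backend advertises the matching key with a non-zero value, and a missing key counts as "not supported" (the `val = 0` fallback). The key/value array stands in for the backend's xenstore directory that the driver reads with xenbus_scanf(); `struct kv`, `backend_flag`, `fix_features` and the F_* bit values are hypothetical, while the key strings are the ones used in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative feature bits; not the kernel's NETIF_F_* values. */
#define F_TSO        (1u << 0)
#define F_IPV6_CSUM  (1u << 1)
#define F_TSO6       (1u << 2)

/* Hypothetical stand-in for entries under the backend's xenstore node. */
struct kv { const char *key; int val; };

static int backend_flag(const struct kv *be, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(be[i].key, key) == 0)
            return be[i].val;
    return 0;  /* missing key: treat as unsupported */
}

/* Clear every requested feature the backend does not advertise. */
unsigned int fix_features(unsigned int features,
                          const struct kv *be, size_t n)
{
    static const struct { unsigned int bit; const char *key; } map[] = {
        { F_TSO,       "feature-gso-tcpv4" },
        { F_IPV6_CSUM, "feature-ipv6-csum-offload" },
        { F_TSO6,      "feature-gso-tcpv6" },
    };

    for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++)
        if ((features & map[i].bit) && !backend_flag(be, n, map[i].key))
            features &= ~map[i].bit;
    return features;
}
```

With this design an older backend that knows nothing about IPv6 simply lacks the new keys, so the frontend falls back to IPv4-only offloads without any version check.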

* [PATCH v2 2/2] Documentation: Document the cephroot functionality
From: mark.doffman @ 2014-01-15 17:26 UTC
  To: ceph-devel
  Cc: Rob Taylor, sage, netdev, linux-kernel, linux-nfs, Mark Doffman
In-Reply-To: <cover.1389806186.git.mark.doffman@codethink.co.uk>

From: Rob Taylor <rob.taylor@codethink.co.uk>

Document using CephFS as a root device: its purpose,
functionality and use.

Signed-off-by: Mark Doffman <mark.doffman@codethink.co.uk>
Signed-off-by: Rob Taylor <rob.taylor@codethink.co.uk>
Reviewed-by: Ian Molton <ian.molton@codethink.co.uk>
---
 Documentation/filesystems/{ => ceph}/ceph.txt |  0
 Documentation/filesystems/ceph/cephroot.txt   | 86 +++++++++++++++++++++++++++
 2 files changed, 86 insertions(+)
 rename Documentation/filesystems/{ => ceph}/ceph.txt (100%)
 create mode 100644 Documentation/filesystems/ceph/cephroot.txt

diff --git a/Documentation/filesystems/ceph.txt b/Documentation/filesystems/ceph/ceph.txt
similarity index 100%
rename from Documentation/filesystems/ceph.txt
rename to Documentation/filesystems/ceph/ceph.txt
diff --git a/Documentation/filesystems/ceph/cephroot.txt b/Documentation/filesystems/ceph/cephroot.txt
new file mode 100644
index 0000000..deda4f0
--- /dev/null
+++ b/Documentation/filesystems/ceph/cephroot.txt
@@ -0,0 +1,86 @@
+Mounting the root filesystem via Ceph (cephroot)
+===============================================
+
+Written 2013 by Rob Taylor <rob.taylor@codethink.co.uk>
+
+derived from nfsroot.txt:
+
+Written 1996 by Gero Kuhlmann <gero@gkminix.han.de>
+Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz>
+Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org>
+Updated 2006 by Horms <horms@verge.net.au>
+
+
+
+In order to use a diskless system, such as an X-terminal or printer server
+for example, it is necessary for the root filesystem to be present on a
+non-disk device. This may be an initramfs (see Documentation/filesystems/
+ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/initrd.txt), a
+filesystem mounted via NFS or a filesystem mounted via Ceph. The following
+text describes how to use Ceph for the root filesystem.
+
+For the rest of this text 'client' means the diskless system, and 'server'
+means the Ceph server.
+
+
+1.) Enabling cephroot capabilities
+    -----------------------------
+
+In order to use cephroot, CEPH_FS needs to be selected as
+built-in during configuration. Once this has been selected, the cephroot
+option (ROOT_CEPH) will become available, which should also be selected.
+
+In the networking options, kernel level IP autoconfiguration (IP_PNP) must
+also be selected, along with the types of autoconfiguration to support.
+Selecting all of DHCP, BOOTP and RARP is safe.
+
+
+2.) Kernel command line
+    -------------------
+
+When the kernel has been loaded by a boot loader, it needs to be told
+what root fs device to use and, in the case of cephroot, where to find
+both the server and the name of the directory on the server to mount as
+root. This can be set using the following kernel command line parameters:
+
+root=/dev/ceph
+
+This is necessary to enable the pseudo-Ceph-device. Note that it's not a
+real device but just a synonym to tell the kernel to use Ceph instead of
+a real device.
+
+If cephroot is not specified, a valid mount is expected to be found via
+DHCP option 17, Root Path [1].
+
+cephroot=<monaddrs>:/[<subdir>][,<ceph-opts>]
+
+  <monaddrs>    Monitor addresses separated by commas. Each takes the form
+		host[:port]. If the port is not specified, the Ceph default
+		of 6789 is assumed.
+
+  <subdir>	A subdirectory may be specified if only a subset of the file
+		system is to be mounted.
+
+  <ceph-opts>	Standard Ceph options. All options are separated by commas.
+		See Documentation/filesystems/ceph/ceph.txt for options and
+		their defaults.
+
+3.) References
+    ----------
+
+[1] http://tools.ietf.org/html/rfc2132
+
+4.) Credits
+    -------
+
+  cephroot was derived from nfsroot by Rob Taylor <rob.taylor@codethink.co.uk>
+  and Mark Doffman <mark.doffman@codethink.co.uk>
+
+  The nfsroot code in the kernel and the RARP support have been written
+  by Gero Kuhlmann <gero@gkminix.han.de>.
+
+  The rest of the IP layer autoconfiguration code has been written
+  by Martin Mares <mj@atrey.karlin.mff.cuni.cz>.
+
+  I would like to thank Jens-Uwe Mager <jum@anubis.han.de> for his help
+  in writing the initial version of nfsroot.
-- 
1.8.4
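
A simplified user-space sketch of splitting the documented `cephroot=<monaddrs>:/[<subdir>][,<ceph-opts>]` value into a mount device and an options string. Like fs/ceph/root.c, it treats the first ":/" as the start of the path, so host:port pairs and comma-separated monitor lists before it stay intact. The function name `split_cephroot` is hypothetical, and the sketch deliberately omits what root.c additionally handles: merging with DHCP option 17, the "default" keyword and "%s" nodename substitution. The addresses and options in the usage are illustrative.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Split "<monaddrs>:/[<subdir>][,<ceph-opts>]" into `dev`
 * ("<monaddrs>:/<subdir>") and `opts` ("<ceph-opts>", possibly empty).
 * Returns 0, -EINVAL if no ":/" is present, or -E2BIG on truncation.
 */
int split_cephroot(const char *arg, char *dev, size_t devlen,
                   char *opts, size_t optslen)
{
    const char *path = strstr(arg, ":/");
    const char *comma;
    size_t devpart;
    int n;

    if (path == NULL)
        return -EINVAL;

    /* Options start at the first comma after the path, if any. */
    comma = strchr(path, ',');
    devpart = comma ? (size_t)(comma - arg) : strlen(arg);

    n = snprintf(dev, devlen, "%.*s", (int)devpart, arg);
    if (n < 0 || (size_t)n >= devlen)
        return -E2BIG;

    n = snprintf(opts, optslen, "%s", comma ? comma + 1 : "");
    if (n < 0 || (size_t)n >= optslen)
        return -E2BIG;
    return 0;
}
```

For example, `10.0.0.1:6789,10.0.0.2:/rootfs,name=admin,noatime` splits into the device `10.0.0.1:6789,10.0.0.2:/rootfs` and the options `name=admin,noatime`.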


* [PATCH v2 1/2] init: Add a new root device option, the Ceph file system
From: mark.doffman @ 2014-01-15 17:26 UTC
  To: ceph-devel
  Cc: Mark Doffman, sage, netdev, linux-kernel, linux-nfs, rob.taylor
In-Reply-To: <cover.1389806186.git.mark.doffman@codethink.co.uk>

From: Mark Doffman <mark.doffman@codethink.co.uk>

Analogous to NFS, add a new root device option: the ability
to boot using the Ceph networked file system as the root fs.

This patch adds a new root device option '/dev/ceph' that
uses a ceph networked file system. File system parameters
are passed using a new kernel parameter: 'cephroot'.

The 'cephroot' parameters are very similar to 'nfsroot'.

Signed-off-by: Mark Doffman <mark.doffman@codethink.co.uk>
Reviewed-by: Ian Molton <ian.molton@codethink.co.uk>
---
 fs/ceph/Kconfig                |  10 +++
 fs/ceph/Makefile               |   1 +
 fs/ceph/root.c                 | 176 +++++++++++++++++++++++++++++++++++++++++
 include/linux/ceph/ceph_root.h |  10 +++
 include/linux/root_dev.h       |   1 +
 init/do_mounts.c               |  32 +++++++-
 net/ipv4/ipconfig.c            |  10 ++-
 7 files changed, 237 insertions(+), 3 deletions(-)
 create mode 100644 fs/ceph/root.c
 create mode 100644 include/linux/ceph/ceph_root.h

diff --git a/fs/ceph/Kconfig b/fs/ceph/Kconfig
index ac9a2ef..325e83d 100644
--- a/fs/ceph/Kconfig
+++ b/fs/ceph/Kconfig
@@ -25,3 +25,13 @@ config CEPH_FSCACHE
 	  caching support for Ceph clients using FS-Cache
 
 endif
+
+config ROOT_CEPH
+	bool "Root file system on Ceph FS"
+	depends on CEPH_FS=y && IP_PNP
+	help
+	  If you want your system to mount its root file system via CEPH,
+	  choose Y here.  For details, read
+	  <file:Documentation/filesystems/ceph/cephroot.txt>.
+
+	  If unsure say N.
diff --git a/fs/ceph/Makefile b/fs/ceph/Makefile
index 32e3010..af2dcbf 100644
--- a/fs/ceph/Makefile
+++ b/fs/ceph/Makefile
@@ -10,3 +10,4 @@ ceph-y := super.o inode.o dir.o file.o locks.o addr.o ioctl.o \
 	debugfs.o
 
 ceph-$(CONFIG_CEPH_FSCACHE) += cache.o
+ceph-$(CONFIG_ROOT_CEPH) += root.o
diff --git a/fs/ceph/root.c b/fs/ceph/root.c
new file mode 100644
index 0000000..1559c19
--- /dev/null
+++ b/fs/ceph/root.c
@@ -0,0 +1,176 @@
+/*
+ * Copyright (C) 2012 Codethink Ltd. <mark.doffman@codethink.co.uk>
+ *
+ * This file is released under the GPL v2
+ *
+ * Allow a CephFS filesystem to be mounted as root.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/utsname.h>
+#include <linux/root_dev.h>
+#include <linux/in.h>
+#include <net/ipconfig.h>
+#include <linux/ceph/ceph_root.h>
+
+#define MAXPATHLEN 1024
+
+/* Parameters passed from the kernel command line */
+static char ceph_command_line_params[256] __initdata;
+
+/* server:path string passed to mount */
+static char ceph_root_device[MAXPATHLEN + 1] __initdata;
+
+/* Name of directory to mount */
+static char ceph_export_path[MAXPATHLEN + 1] __initdata;
+
+/* Mount options */
+static char ceph_root_options[256] __initdata;
+
+/*
+ *  Parse CephFS server and directory information passed on the kernel
+ *  command line.
+ *
+ *  cephroot=[<server-ip>][,<server-ips>]:<root-dir>[,<cephfs-options>]
+ */
+static int __init ceph_root_setup(char *line)
+{
+	ROOT_DEV = Root_CEPH;
+
+	strlcpy(ceph_command_line_params, line,
+		sizeof(ceph_command_line_params));
+
+	return 1;
+}
+
+__setup("cephroot=", ceph_root_setup);
+
+/*
+ * ceph_root_append - Concatenates an options or address string
+ * adding a ',' delimiter if necessary.
+ *
+ * Returns 0 on success, -E2BIG if the resulting string is too long.
+ */
+static int __init ceph_root_append(char *incoming,
+				   char *dest,
+				   const size_t destlen)
+{
+	int res = 0;
+
+	if (incoming != NULL && *incoming != '\0') {
+		size_t len = strlen(dest);
+
+		if (len && dest[len - 1] != ',') {
+			if (strlcat(dest, ",", destlen) >= destlen)
+				res = -E2BIG;
+		}
+
+		if (strlcat(dest, incoming, destlen) >= destlen)
+			res = -E2BIG;
+
+	}
+	return res;
+}
+
+/*
+ * ceph_root_parse_params - Parse out root export path and mount options from
+ * passed-in string @incoming.
+ *
+ * Copy the path into @outpath.
+ *
+ * Returns 0 on success, -EINVAL if no path is found, or -E2BIG if the
+ * resulting options string or device string is too long.
+ */
+static int __init ceph_root_parse_params(char *incoming, char *outpath,
+					 const size_t outpathlen)
+{
+	int res = -EINVAL;
+	char *options;
+	char *path;
+
+	options = strstr(incoming, ":/");
+	if (options == NULL)
+		options = strstr(incoming, "default");
+
+	if (options != NULL) {
+		path = strsep(&options, ",");
+		if (*path != '\0' && strcmp(path, "default") != 0)
+			strlcpy(outpath, path, outpathlen);
+		res = ceph_root_append(options, ceph_root_options,
+				sizeof(ceph_root_options));
+
+		if (res == 0) {
+			*path = '\0';
+			res = ceph_root_append(incoming, ceph_root_device,
+					sizeof(ceph_root_device));
+		}
+	}
+
+	return res;
+}
+
+/*
+ * ceph_root_data - Return mount device and data for CEPHROOT mount.
+ *
+ * @root_device: OUT: Address of string containing CEPHROOT device.
+ * @root_data: OUT: Address of string containing CEPHROOT mount options.
+ *
+ * Returns: 0 and sets @root_device and @root_data if successful.
+ *          error code if unsuccessful.
+ */
+int __init ceph_root_data(char **root_device, char **root_data)
+{
+	char *tmp_root_path = NULL;
+	const size_t tmplen = sizeof(ceph_export_path);
+	int len;
+	int res = -E2BIG;
+
+	tmp_root_path = kzalloc(tmplen, GFP_KERNEL);
+	if (tmp_root_path == NULL)
+		return -ENOMEM;
+
+	if (root_server_path[0] != '\0') {
+		if (ceph_root_parse_params(root_server_path, tmp_root_path,
+					tmplen))
+			goto out;
+	}
+
+	if (ceph_command_line_params[0] != '\0') {
+		if (ceph_root_parse_params(ceph_command_line_params,
+					tmp_root_path, tmplen))
+			goto out;
+	}
+
+	/*
+	 * Set up ceph_root_device. This looks like: server:/path
+	 *
+	 * At this point, utsname()->nodename contains our local
+	 * IP address or hostname, set by ipconfig.  If "%s" exists
+	 * in tmp_root_path, substitute the nodename, then shovel the whole
+	 * mess into ceph_root_device.
+	 */
+	len = snprintf(ceph_export_path, sizeof(ceph_export_path),
+				   tmp_root_path, utsname()->nodename);
+	if (len >= (int)sizeof(ceph_export_path))
+		goto out;
+
+	len = strlcat(ceph_root_device, ceph_export_path,
+			sizeof(ceph_root_device));
+	if (len >= (int)sizeof(ceph_root_device))
+		goto out;
+
+	pr_debug("Root-CEPH: Root device: %s\n", ceph_root_device);
+	pr_debug("Root-CEPH: Root options: %s\n", ceph_root_options);
+	*root_device = ceph_root_device;
+	*root_data = ceph_root_options;
+
+	res = 0;
+
+out:
+	kfree(tmp_root_path);
+	return res;
+}
diff --git a/include/linux/ceph/ceph_root.h b/include/linux/ceph/ceph_root.h
new file mode 100644
index 0000000..e6bae63
--- /dev/null
+++ b/include/linux/ceph/ceph_root.h
@@ -0,0 +1,10 @@
+/*
+ * Copyright (C) 2012 Codethink Ltd. <mark.doffman@codethink.co.uk>
+ *
+ * This file is released under the GPL v2
+ *
+ * ceph_root.h
+ */
+
+/* linux/fs/ceph/root.c */
+extern int ceph_root_data(char **root_device, char **root_data); /*__init*/
diff --git a/include/linux/root_dev.h b/include/linux/root_dev.h
index ed241aa..af6b182 100644
--- a/include/linux/root_dev.h
+++ b/include/linux/root_dev.h
@@ -16,6 +16,7 @@ enum {
 	Root_SDA2 = MKDEV(SCSI_DISK0_MAJOR, 2),
 	Root_HDC1 = MKDEV(IDE1_MAJOR, 1),
 	Root_SR0 = MKDEV(SCSI_CDROM_MAJOR, 0),
+	Root_CEPH = MKDEV(UNNAMED_MAJOR, 254),
 };
 
 extern dev_t ROOT_DEV;
diff --git a/init/do_mounts.c b/init/do_mounts.c
index 8e5addc..d075020 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -33,6 +33,8 @@
 #include <linux/nfs_fs_sb.h>
 #include <linux/nfs_mount.h>
 
+#include <linux/ceph/ceph_root.h>
+
 #include "do_mounts.h"
 
 int __initdata rd_doload;	/* 1 = load RAM disk, 0 = don't load */
@@ -199,6 +201,7 @@ done:
  *	   a partition with a known unique id.
  *	8) <major>:<minor> major and minor number of the device separated by
  *	   a colon.
+ *	9) /dev/ceph represents Root_CEPH
  *
  *	If name doesn't have fall into the categories above, we return (0,0).
  *	block_class is used to check if something is a disk name. If the disk
@@ -245,7 +248,9 @@ dev_t name_to_dev_t(char *name)
 	res = Root_RAM0;
 	if (strcmp(name, "ram") == 0)
 		goto done;
-
+	res = Root_CEPH;
+	if (strcmp(name, "ceph") == 0)
+		goto done;
 	if (strlen(name) > 31)
 		goto fail;
 	strcpy(s, name);
@@ -473,6 +478,22 @@ static int __init mount_nfs_root(void)
 }
 #endif
 
+#ifdef CONFIG_ROOT_CEPH
+static int __init mount_ceph_root(void)
+{
+	char *root_dev, *root_data;
+
+	if (ceph_root_data(&root_dev, &root_data))
+		return 0;
+
+	if (do_mount_root(root_dev, "ceph",
+				root_mountflags, root_data))
+		return 0;
+
+	return 1;
+}
+#endif
+
 #if defined(CONFIG_BLK_DEV_RAM) || defined(CONFIG_BLK_DEV_FD)
 void __init change_floppy(char *fmt, ...)
 {
@@ -514,6 +535,15 @@ void __init mount_root(void)
 		ROOT_DEV = Root_FD0;
 	}
 #endif
+#ifdef CONFIG_ROOT_CEPH
+	if (ROOT_DEV == Root_CEPH) {
+		if (mount_ceph_root())
+			return;
+
+		printk(KERN_ERR "VFS: Unable to mount root fs via CephFS, trying floppy.\n");
+		ROOT_DEV = Root_FD0;
+	}
+#endif
 #ifdef CONFIG_BLK_DEV_FD
 	if (MAJOR(ROOT_DEV) == FLOPPY_MAJOR) {
 		/* rd_doload is 2 for a dual initrd/ramload setup */
diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index efa1138..765eea4 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -1435,10 +1435,10 @@ static int __init ip_auto_config(void)
 	 * missing values.
 	 */
 	if (ic_myaddr == NONE ||
-#ifdef CONFIG_ROOT_NFS
+#if defined(CONFIG_ROOT_NFS) || defined(CONFIG_ROOT_CEPH)
 	    (root_server_addr == NONE &&
 	     ic_servaddr == NONE &&
-	     ROOT_DEV == Root_NFS) ||
+	     (ROOT_DEV == Root_NFS || ROOT_DEV == Root_CEPH)) ||
 #endif
 	    ic_first_dev->next) {
 #ifdef IPCONFIG_DYNAMIC
@@ -1465,6 +1465,12 @@ static int __init ip_auto_config(void)
 				goto try_try_again;
 			}
 #endif
+#ifdef CONFIG_ROOT_CEPH
+			if (ROOT_DEV == Root_CEPH) {
+				pr_err("IP-Config: Retrying forever (CEPH root)...\n");
+				goto try_try_again;
+			}
+#endif
 
 			if (--retries) {
 				pr_err("IP-Config: Reopening network devices...\n");
-- 
1.8.4
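
A user-space sketch of the comma-joining done by ceph_root_append(), which is how monitor addresses and options from DHCP option 17 and the command line end up concatenated. The kernel's strlcat() returns the length of the string it tried to create, so a return value equal to or larger than the buffer size means the result was truncated. Portable C has no strlcat(), so the sketch carries a small local equivalent; `my_strlcat` and `append_list` are hypothetical helpers, not kernel functions.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Local strlcat() equivalent: appends src to dst (size bytes total),
 * keeps dst NUL-terminated, and returns strlen(dst) + strlen(src).
 * Assumes dst is already NUL-terminated within size. */
static size_t my_strlcat(char *dst, const char *src, size_t size)
{
    size_t dlen = strlen(dst);
    size_t slen = strlen(src);

    if (dlen + 1 < size) {
        size_t room = size - dlen - 1;
        size_t copy = slen < room ? slen : room;

        memcpy(dst + dlen, src, copy);
        dst[dlen + copy] = '\0';
    }
    return dlen + slen;
}

/* Append `item` to the comma-separated list in `dst`, adding a ','
 * only when dst is non-empty and does not already end in one.
 * Truncation is signalled when the would-be length is >= size. */
int append_list(char *dst, size_t size, const char *item)
{
    size_t len;

    if (item == NULL || *item == '\0')
        return 0;
    len = strlen(dst);
    if (len && dst[len - 1] != ',' && my_strlcat(dst, ",", size) >= size)
        return -E2BIG;
    if (my_strlcat(dst, item, size) >= size)
        return -E2BIG;
    return 0;
}
```

The boundary matters: appending "abcd" to an empty 4-byte buffer returns 4, which a `> size` test would accept even though only "abc" fit.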


* [PATCH v2 0/2] Add ceph root filesystem functionality and documentation.
From: mark.doffman @ 2014-01-15 17:26 UTC
  To: ceph-devel
  Cc: Mark Doffman, sage, netdev, linux-kernel, linux-nfs, rob.taylor
In-Reply-To: <1385000024-23463-1-git-send-email-mark.doffman@codethink.co.uk>

From: Mark Doffman <mark.doffman@codethink.co.uk>

Hi All,

The following is a second version of a patch series that adds the ability to use
a ceph distributed file system as the root device.

Changes from version 1

fs/ceph/root.c:

The parsing code that takes the DHCP option 17 and kernel command line
parameters has been extensively altered.

The parsing now accepts multiple monitor addresses and ipv6 addresses.

The monitors listed in DHCP option 17 are now concatenated with those
listed on the kernel command line.

The patch series applies to v3.13-rc8-7-g3539717

Thanks

Mark

Mark Doffman (1):
  init: Add a new root device option, the Ceph file system

Rob Taylor (1):
  Documentation: Document the cephroot functionality

 Documentation/filesystems/{ => ceph}/ceph.txt |   0
 Documentation/filesystems/ceph/cephroot.txt   |  86 +++++++++++++
 fs/ceph/Kconfig                               |  10 ++
 fs/ceph/Makefile                              |   1 +
 fs/ceph/root.c                                | 176 ++++++++++++++++++++++++++
 include/linux/ceph/ceph_root.h                |  10 ++
 include/linux/root_dev.h                      |   1 +
 init/do_mounts.c                              |  32 ++++-
 net/ipv4/ipconfig.c                           |  10 +-
 9 files changed, 323 insertions(+), 3 deletions(-)
 rename Documentation/filesystems/{ => ceph}/ceph.txt (100%)
 create mode 100644 Documentation/filesystems/ceph/cephroot.txt
 create mode 100644 fs/ceph/root.c
 create mode 100644 include/linux/ceph/ceph_root.h

-- 
1.8.4



* Re: [PATCH net-next v3 2/2] net:  Check skb->rxhash in gro_receive
From: Eric Dumazet @ 2014-01-15 17:25 UTC
  To: Tom Herbert; +Cc: davem, netdev, Jerry Chu
In-Reply-To: <alpine.DEB.2.02.1401150853400.14933@tomh.mtv.corp.google.com>

On Wed, 2014-01-15 at 08:58 -0800, Tom Herbert wrote:
> When initializing a gro_list for a packet, first check the rxhash of
> the incoming skb against that of the skbs in the list. This should be
> a very strong indicator of whether the flow is going to be matched,
> and potentially allows a lot of other checks to be short circuited.
> Use skb_get_hash_raw so that we don't force the hash to be calculated.
> 
> Tested by running 200 netperf TCP_STREAMs between two machines with
> GRO, HW rxhash, and 1G. Saw no performance degradation, slight reduction
> of time in dev_gro_receive.
> 
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
>  net/core/dev.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 20c834e..c063c7c 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3818,10 +3818,18 @@ static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb)
>  {
>  	struct sk_buff *p;
>  	unsigned int maclen = skb->dev->hard_header_len;
> +	u32 hash = skb_get_hash_raw(skb);
>  
>  	for (p = napi->gro_list; p; p = p->next) {
>  		unsigned long diffs;
>  
> +		NAPI_GRO_CB(p)->flush = 0;
> +
> +		if (hash != skb_get_hash_raw(p)) {
> +			NAPI_GRO_CB(p)->same_flow = 0;
> +			continue;
> +		}
> +
>  		diffs = (unsigned long)p->dev ^ (unsigned long)skb->dev;
>  		diffs |= p->vlan_tci ^ skb->vlan_tci;
>  		if (maclen == ETH_HLEN)
> @@ -3832,7 +3840,6 @@ static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb)
>  				       skb_gro_mac_header(skb),
>  				       maclen);
>  		NAPI_GRO_CB(p)->same_flow = !diffs;
> -		NAPI_GRO_CB(p)->flush = 0;
>  	}
>  }
>  

Acked-by: Eric Dumazet <edumazet@google.com>

Hmm, this looks like we should clear flush_id in the ipv6 handler,
otherwise we might reuse a flush_id set from a prior gro invocation in
ipv4 (skb can be reused in napi_reuse_skb())

Jerry, what do you think of following fix ?

diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 1e8683b135bb..598acd76ca4a 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -256,6 +256,7 @@ static struct sk_buff **ipv6_gro_receive(struct sk_buff **head,
                /* flush if Traffic Class fields are different */
                NAPI_GRO_CB(p)->flush |= !!(first_word & htonl(0x0FF00000));
                NAPI_GRO_CB(p)->flush |= flush;
+               NAPI_GRO_CB(p)->flush_id = 0;
        }
 
        NAPI_GRO_CB(skb)->flush |= flush;

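A user-space model of the patched gro_list_prepare() loop: every held packet has its flush flag reset, a hash mismatch marks it as a different flow without touching any headers, and only hash matches fall through to the full device/VLAN/MAC comparison. Because skb_get_hash_raw() only reads the stored hash, two packets that never had one computed (both 0) still reach the full comparison, so correctness does not depend on the hash being present. `struct held_pkt` and `gro_prepare` are hypothetical simplifications of sk_buff and NAPI_GRO_CB state.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a packet held on napi->gro_list. */
struct held_pkt {
    struct held_pkt *next;
    uint32_t hash;          /* stored rxhash, 0 if never computed */
    const void *dev;        /* receiving device */
    uint16_t vlan_tci;
    unsigned char mac[14];  /* Ethernet header */
    int same_flow;
    int flush;
};

/* Compare the incoming packet against each held packet, using the
 * hash as a cheap early reject before the header comparison. */
void gro_prepare(struct held_pkt *list, const struct held_pkt *skb)
{
    for (struct held_pkt *p = list; p; p = p->next) {
        p->flush = 0;
        if (p->hash != skb->hash) {
            p->same_flow = 0;
            continue;
        }
        p->same_flow = p->dev == skb->dev &&
                       p->vlan_tci == skb->vlan_tci &&
                       memcmp(p->mac, skb->mac, sizeof(p->mac)) == 0;
    }
}
```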

* Re: [PATCH 2/4] Documentation: Document the cephroot functionality
From: Mark Doffman @ 2014-01-15 17:22 UTC
  To: Sage Weil
  Cc: ceph-devel, Rob Taylor, netdev, linux-kernel, linux-nfs
In-Reply-To: <alpine.DEB.2.00.1312062154530.1560-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>

Hi Sage,

On 12/06/2013 11:57 PM, Sage Weil wrote:
> On Wed, 20 Nov 2013, mark.doffman@codethink.co.uk wrote:
>> From: Rob Taylor <rob.taylor@codethink.co.uk>
>>
>> Document using the cephfs as a root device, its purpose,
>> functionality and use.
>>
>> Signed-off-by: Mark Doffman <mark.doffman@codethink.co.uk>
>> Signed-off-by: Rob Taylor <rob.taylor@codethink.co.uk>
>> Reviewed-by: Ian Molton <ian.molton@codethink.co.uk>
>> ---
>>   Documentation/filesystems/{ => ceph}/ceph.txt |  0
>>   Documentation/filesystems/ceph/cephroot.txt   | 81 +++++++++++++++++++++++++++
>>   2 files changed, 81 insertions(+)
>>   rename Documentation/filesystems/{ => ceph}/ceph.txt (100%)
>>   create mode 100644 Documentation/filesystems/ceph/cephroot.txt
>>
>> diff --git a/Documentation/filesystems/ceph.txt b/Documentation/filesystems/ceph/ceph.txt
>> similarity index 100%
>> rename from Documentation/filesystems/ceph.txt
>> rename to Documentation/filesystems/ceph/ceph.txt
>> diff --git a/Documentation/filesystems/ceph/cephroot.txt b/Documentation/filesystems/ceph/cephroot.txt
>> new file mode 100644
>> index 0000000..ae0f5bb
>> --- /dev/null
>> +++ b/Documentation/filesystems/ceph/cephroot.txt
>> @@ -0,0 +1,81 @@
>> +Mounting the root filesystem via Ceph (cephroot)
>> +===============================================
>> +
>> +Written 2013 by Rob Taylor <rob.taylor@codethink.co.uk>
>> +
>> +derived from nfsroot.txt:
>> +
>> +Written 1996 by Gero Kuhlmann <gero@gkminix.han.de>
>> +Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz>
>> +Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org>
>> +Updated 2006 by Horms <horms@verge.net.au>
>> +
>> +
>> +
>> +In order to use a diskless system, such as an X-terminal or printer server
>> +for example, it is necessary for the root filesystem to be present on a
>> +non-disk device. This may be an initramfs (see Documentation/filesystems/
>> +ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/initrd.txt), a
>> +filesystem mounted via NFS or a filesystem mounted via Ceph. The following
>> +text describes how to use Ceph for the root filesystem.
>> +
>> +For the rest of this text 'client' means the diskless system, and 'server'
>> +means the Ceph server.
>> +
>> +
>> +1.) Enabling cephroot capabilities
>> +    -----------------------------
>> +
>> +In order to use cephroot, CEPH_FS needs to be selected as
>> +built-in during configuration. Once this has been selected, the cephroot
>> +option will become available, which should also be selected.
>> +
>> +In the networking options, kernel level autoconfiguration can be selected,
>> +along with the types of autoconfiguration to support. Selecting all of
>> +DHCP, BOOTP and RARP is safe.
>> +
>> +
>> +2.) Kernel command line
>> +    -------------------
>> +
>> +When the kernel has been loaded by a boot loader (see below) it needs to be
>> +told what root fs device to use. And in the case of cephroot, where to find
>> +both the server and the name of the directory on the server to mount as root.
>> +This can be established using the following kernel command line parameters:
>> +
>> +root=/dev/ceph
>> +
>> +This is necessary to enable the pseudo-Ceph-device. Note that it's not a
>> +real device but just a synonym to tell the kernel to use Ceph instead of
>> +a real device.
>> +
>> +cephroot=<monaddr>:/[<subdir>],<ceph-opts>
>> +
>> +  <monaddr>     Monitor address(es). Each takes the form host[:port]. If the port
>> +		is not specified, the Ceph default of 6789 is assumed.
>> +
>> +  <subdir>	A subdirectory may be specified if only a subset of the file
>> +		system is to be mounted.
>> +
>> +  <ceph-opts>	Standard Ceph options. All options are separated by commas.
>> +		See Documentation/filesystems/ceph/ceph.txt for options and
>> +		their defaults.
>
> Maybe there is an existing convention here, but: it seems like it would be
> simpler to do something like
>
>   cephroot=<ip[:<port>][,...]>:/[<subdir>]
>
> i.e., the existing syntax used by mount, that (among other things) can
> also include a port, or be a list of mon ips, so that the parsing code
> can be re-used.  Then,
>
>   cephopts=<ceph-opts>
>
> Hopefully this would avoid the parsing in root.c and make things behave
> more consistently with respect to how mount(8) is used?

This would make things more consistent with mount, and easier! The 
reason to keep it the way it is is for consistency with NFS and DHCP 
option 17.

NFS concatenates the options in DHCP root-path (option 17) with the ones 
placed on the kernel command line. We could separate out the device and 
path strings from the options, but they would still be merged together 
in the DHCP string. Some parsing would still be required to split the 
DHCP string and merge with command line options. I'd prefer to keep them 
together on the command line also, just to have things stay similar to NFS.
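Mark's point is that some splitting is unavoidable, because the DHCP root-path string carries the monitor address, path and options together. The following is a minimal sketch of that split and merge. It assumes the `<monaddr>:/[<subdir>],<ceph-opts>` layout from the documentation above, the 6789 default monitor port it names, and that command-line options are simply appended after the DHCP ones. The helper names are hypothetical and are not the code in root.c.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CEPH_DEFAULT_MON_PORT 6789  /* default named in cephroot.txt */

/* Hypothetical: split "<monaddr>:/[<subdir>][,<ceph-opts>]" into its parts.
 * The leading '/' of the subdirectory is dropped. Returns 0 on success,
 * -1 if the string is malformed or an output buffer is too small. */
static int split_cephroot(const char *spec,
                          char *mon, size_t mon_len,
                          char *subdir, size_t subdir_len,
                          char *opts, size_t opts_len)
{
    const char *sep, *path, *comma;
    size_t n;
    int written;

    assert(spec != NULL);
    sep = strstr(spec, ":/");   /* first ":/" ends the monitor part */
    if (sep == NULL)
        return -1;

    n = (size_t)(sep - spec);
    if (n == 0 || n >= mon_len)
        return -1;
    memcpy(mon, spec, n);
    mon[n] = '\0';

    path = sep + 2;
    comma = strchr(path, ',');  /* first comma after the path starts the options */
    n = comma ? (size_t)(comma - path) : strlen(path);
    if (n >= subdir_len)
        return -1;
    memcpy(subdir, path, n);
    subdir[n] = '\0';

    written = snprintf(opts, opts_len, "%s", comma ? comma + 1 : "");
    return (written < 0 || (size_t)written >= opts_len) ? -1 : 0;
}

/* Hypothetical: join DHCP-supplied and command-line option strings,
 * placing the command-line options last. */
static int merge_opts(const char *dhcp_opts, const char *cmdline_opts,
                      char *out, size_t out_len)
{
    const char *sep = (*dhcp_opts && *cmdline_opts) ? "," : "";
    int written = snprintf(out, out_len, "%s%s%s", dhcp_opts, sep, cmdline_opts);

    return (written < 0 || (size_t)written >= out_len) ? -1 : 0;
}

/* Port of one "host[:port]" monitor entry: the default if none is given,
 * or -1 if the port is not a valid number. */
static long mon_port(const char *entry)
{
    const char *colon = strrchr(entry, ':');
    char *end;
    long port;

    if (colon == NULL)
        return CEPH_DEFAULT_MON_PORT;
    port = strtol(colon + 1, &end, 10);
    return (*end == '\0' && port > 0 && port < 65536) ? port : -1;
}
```

Which source should win when both the DHCP string and the kernel command line set the same option is left open here. This sketch only fixes the order in which the two strings are concatenated.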

Thanks

Mark

>
> sage
>
>> +
>> +4.) References
>> +    ----------
>> +
>> +
>> +5.) Credits
>> +    -------
>> +
>> +  cephroot was derived from nfsroot by Rob Taylor <rob.taylor-4yDnlxn2s6sWdaTGBSpHTA@public.gmane.org>
>> +  and Mark Doffman <mark.doffman-4yDnlxn2s6sWdaTGBSpHTA@public.gmane.org>
>> +
>> +  The nfsroot code in the kernel and the RARP support have been written
>> +  by Gero Kuhlmann <gero-TuicA7gkpRym5h6znurzUg@public.gmane.org>.
>> +
>> +  The rest of the IP layer autoconfiguration code has been written
>> +  by Martin Mares <mj-jyMamyUUXNJG4ohzP4jBZS1Fcj925eT/@public.gmane.org>.
>> +
>> +  In order to write the initial version of nfsroot I would like to thank
>> +  Jens-Uwe Mager <jum-gG2S4stXkm6Shm5Tz/htGQ@public.gmane.org> for his help.
>> --
>> 1.8.4
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>

* Re: [PATCH net-next v3 1/2] net: Add skb_get_hash_raw
From: Eric Dumazet @ 2014-01-15 17:15 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev
In-Reply-To: <alpine.DEB.2.02.1401150853090.14881@tomh.mtv.corp.google.com>

On Wed, 2014-01-15 at 08:57 -0800, Tom Herbert wrote:
> Function to just return skb->rxhash without checking to see if it needs
> to be recomputed.
> 
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
>  include/linux/skbuff.h | 5 +++++
>  1 file changed, 5 insertions(+)

Acked-by: Eric Dumazet <edumazet@google.com>


* [PATCH net-next v2] xen-netback: Rework rx_work_todo
From: Zoltan Kiss @ 2014-01-15 17:11 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel,
	jonathan.davies
  Cc: Zoltan Kiss

The recent patch to fix receive side flow control (11b57f) solved the spinning
thread problem, but it caused another one. The receive side can stall if:
- [THREAD] xenvif_rx_action sets rx_queue_stopped to true
- [INTERRUPT] interrupt happens, and sets rx_event to true
- [THREAD] then xenvif_kthread sets rx_event to false
- [THREAD] rx_work_todo doesn't return true anymore

Also, if an interrupt is sent but there is still no room in the ring, it takes
quite a long time until xenvif_rx_action realizes it. This patch ditches those
two variables and reworks rx_work_todo. If the thread finds it can't fit more
skbs into the ring, it saves the last slot estimation into rx_last_skb_slots;
otherwise it's kept as 0. Then rx_work_todo will check if:
- there is something to send to the ring (like before)
- there is space for the topmost packet in the queue

I think that's a more natural and optimal thing to test than two bools which
are set somewhere else.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netback/common.h    |    6 +-----
 drivers/net/xen-netback/interface.c |    1 -
 drivers/net/xen-netback/netback.c   |   16 ++++++----------
 3 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 4c76bcb..ae413a2 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -143,11 +143,7 @@ struct xenvif {
 	char rx_irq_name[IFNAMSIZ+4]; /* DEVNAME-rx */
 	struct xen_netif_rx_back_ring rx;
 	struct sk_buff_head rx_queue;
-	bool rx_queue_stopped;
-	/* Set when the RX interrupt is triggered by the frontend.
-	 * The worker thread may need to wake the queue.
-	 */
-	bool rx_event;
+	RING_IDX rx_last_skb_slots;
 
 	/* This array is allocated seperately as it is large */
 	struct gnttab_copy *grant_copy_op;
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index b9de31e..7669d49 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -100,7 +100,6 @@ static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
 {
 	struct xenvif *vif = dev_id;
 
-	vif->rx_event = true;
 	xenvif_kick_thread(vif);
 
 	return IRQ_HANDLED;
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 2738563..bb241d0 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -477,7 +477,6 @@ static void xenvif_rx_action(struct xenvif *vif)
 	unsigned long offset;
 	struct skb_cb_overlay *sco;
 	bool need_to_notify = false;
-	bool ring_full = false;
 
 	struct netrx_pending_operations npo = {
 		.copy  = vif->grant_copy_op,
@@ -487,7 +486,7 @@ static void xenvif_rx_action(struct xenvif *vif)
 	skb_queue_head_init(&rxq);
 
 	while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
-		int max_slots_needed;
+		RING_IDX max_slots_needed;
 		int i;
 
 		/* We need a cheap worse case estimate for the number of
@@ -510,9 +509,10 @@ static void xenvif_rx_action(struct xenvif *vif)
 		if (!xenvif_rx_ring_slots_available(vif, max_slots_needed)) {
 			skb_queue_head(&vif->rx_queue, skb);
 			need_to_notify = true;
-			ring_full = true;
+			vif->rx_last_skb_slots = max_slots_needed;
 			break;
-		}
+		} else
+			vif->rx_last_skb_slots = 0;
 
 		sco = (struct skb_cb_overlay *)skb->cb;
 		sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
@@ -523,8 +523,6 @@ static void xenvif_rx_action(struct xenvif *vif)
 
 	BUG_ON(npo.meta_prod > ARRAY_SIZE(vif->meta));
 
-	vif->rx_queue_stopped = !npo.copy_prod && ring_full;
-
 	if (!npo.copy_prod)
 		goto done;
 
@@ -1727,8 +1725,8 @@ static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
 
 static inline int rx_work_todo(struct xenvif *vif)
 {
-	return (!skb_queue_empty(&vif->rx_queue) && !vif->rx_queue_stopped) ||
-		vif->rx_event;
+	return !skb_queue_empty(&vif->rx_queue) &&
+	       xenvif_rx_ring_slots_available(vif, vif->rx_last_skb_slots);
 }
 
 static inline int tx_work_todo(struct xenvif *vif)
@@ -1814,8 +1812,6 @@ int xenvif_kthread(void *data)
 		if (!skb_queue_empty(&vif->rx_queue))
 			xenvif_rx_action(vif);
 
-		vif->rx_event = false;
-
 		if (skb_queue_empty(&vif->rx_queue) &&
 		    netif_queue_stopped(vif->dev))
 			xenvif_start_queue(vif);

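The reworked rx_work_todo() in the xen-netback patch no longer depends on flags written by other contexts. It recomputes the answer from queue and ring state. The following is a minimal user-space model that uses a plain free-slot counter in place of the real shared-ring arithmetic inside xenvif_rx_ring_slots_available(). Apart from rx_last_skb_slots and RING_IDX, the names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int RING_IDX;   /* stand-in for Xen's ring index type */

/* Hypothetical model of the state the patched predicate looks at. */
struct vif_model {
    unsigned int queued_skbs;    /* packets waiting in rx_queue */
    RING_IDX free_slots;         /* slots the frontend has made available */
    RING_IDX rx_last_skb_slots;  /* 0, or the estimate for the skb that did not fit */
};

/* Simplified stand-in for xenvif_rx_ring_slots_available(). */
static bool ring_slots_available(const struct vif_model *vif, RING_IDX needed)
{
    return vif->free_slots >= needed;
}

/* Mirrors the patched rx_work_todo(): there is work only if something is
 * queued and the head-of-queue packet would now fit in the ring. */
static bool rx_work_todo(const struct vif_model *vif)
{
    return vif->queued_skbs != 0 &&
           ring_slots_available(vif, vif->rx_last_skb_slots);
}
```

Because rx_last_skb_slots is 0 whenever the last packet fit, the check reduces to "queue not empty" in the common case. It only becomes stricter after the ring has filled up.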

* RE: [Xen-devel] [PATCH net-next] xen-netfront: add support for IPv6 offloads
From: Paul Durrant @ 2014-01-15 17:08 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: David Vrabel, xen-devel@lists.xen.org, Boris Ostrovsky,
	netdev@vger.kernel.org
In-Reply-To: <52D6BF630200007800113EFB@nat28.tlf.novell.com>

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 15 January 2014 16:04
> To: Andrew Cooper; Paul Durrant
> Cc: David Vrabel; xen-devel@lists.xen.org; Boris Ostrovsky;
> netdev@vger.kernel.org
> Subject: Re: [Xen-devel] [PATCH net-next] xen-netfront: add support for
> IPv6 offloads
> 
> >>> On 15.01.14 at 16:54, Paul Durrant <Paul.Durrant@citrix.com> wrote:
> >> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> >> On 15/01/14 15:18, Paul Durrant wrote:
> >> > +	err = xenbus_printf(xbt, dev->nodename, "feature-gso-tcpv6",
> "%d", 1);
> >>
> >> "%d", 1 results in a constant string.  xenbus_write() would avoid a
> >> transitory memory allocation.
> >
> > This code is consistent with all the other xenbus_printf()s in the
> > neighbourhood and does it really matter?
> 
> I think we should always strive to have the simplest possible code
> that fulfills the purpose. And hence we shouldn't be setting further
> bad precedents. (In fact I have a patch queued to replace all the
> unnecessary xenbus_printf()s with xenbus_write()s on
> linux-2.6.18-xen.hg, and may look into porting this to the
> respective upstream components.)
> 

Ok. Personally I'd go for code consistency with this patch and then a full replacement... but I'll re-work it.

  Paul


* [PATCH net-next v3 2/2] net:  Check skb->rxhash in gro_receive
From: Tom Herbert @ 2014-01-15 16:58 UTC (permalink / raw)
  To: davem, netdev

When initializing a gro_list for a packet, first check the rxhash of
the incoming skb against that of the skbs in the list. This should be
a very strong indicator of whether the flow is going to be matched,
and potentially allows a lot of other checks to be short circuited.
Use skb_get_hash_raw so that we don't force the hash to be calculated.

Tested by running netperf 200 TCP_STREAMs between two machines with
GRO, HW rxhash, and 1G. Saw no performance degradation, slight reduction
of time in dev_gro_receive.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/core/dev.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 20c834e..c063c7c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3818,10 +3818,18 @@ static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb)
 {
 	struct sk_buff *p;
 	unsigned int maclen = skb->dev->hard_header_len;
+	u32 hash = skb_get_hash_raw(skb);
 
 	for (p = napi->gro_list; p; p = p->next) {
 		unsigned long diffs;
 
+		NAPI_GRO_CB(p)->flush = 0;
+
+		if (hash != skb_get_hash_raw(p)) {
+			NAPI_GRO_CB(p)->same_flow = 0;
+			continue;
+		}
+
 		diffs = (unsigned long)p->dev ^ (unsigned long)skb->dev;
 		diffs |= p->vlan_tci ^ skb->vlan_tci;
 		if (maclen == ETH_HLEN)
@@ -3832,7 +3840,6 @@ static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb)
 				       skb_gro_mac_header(skb),
 				       maclen);
 		NAPI_GRO_CB(p)->same_flow = !diffs;
-		NAPI_GRO_CB(p)->flush = 0;
 	}
 }
 
-- 
1.8.5.2

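The short circuit in the patched gro_list_prepare() can be modeled outside the kernel. The following sketch resets flush for every held packet, rejects on a hash mismatch, and only then does the full header comparison. It returns how many full comparisons ran. `struct pkt` and `prepare()` are hypothetical simplifications, not sk_buff or NAPI code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an skb held on napi->gro_list. */
struct pkt {
    uint32_t hash;            /* raw rxhash; 0 if the NIC supplied none */
    unsigned char hdr[14];    /* MAC header used for the full comparison */
    int same_flow;
    int flush;
    struct pkt *next;
};

/* Returns the number of full header comparisons performed. */
static unsigned int prepare(struct pkt *list, const struct pkt *skb)
{
    unsigned int full_compares = 0;

    for (struct pkt *p = list; p != NULL; p = p->next) {
        p->flush = 0;                 /* reset for every entry, as in the patch */

        if (p->hash != skb->hash) {   /* mismatch is taken to mean a different flow */
            p->same_flow = 0;
            continue;
        }

        /* Equal hashes (including 0 == 0 when no hash was computed) still
         * need the real comparison, since collisions are possible. */
        full_compares++;
        p->same_flow = memcmp(p->hdr, skb->hdr, sizeof(p->hdr)) == 0;
    }
    return full_compares;
}
```

The hash test is only a fast reject. It never marks two packets as the same flow on its own, so a collision costs one extra comparison rather than a wrong merge.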

* [PATCH net-next v3 1/2] net: Add skb_get_hash_raw
From: Tom Herbert @ 2014-01-15 16:57 UTC (permalink / raw)
  To: davem, netdev

Function to just return skb->rxhash without checking to see if it needs
to be recomputed.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/linux/skbuff.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 48b7605..1f689e6 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -771,6 +771,11 @@ static inline __u32 skb_get_hash(struct sk_buff *skb)
 	return skb->rxhash;
 }
 
+static inline __u32 skb_get_hash_raw(const struct sk_buff *skb)
+{
+	return skb->rxhash;
+}
+
 static inline void skb_clear_hash(struct sk_buff *skb)
 {
 	skb->rxhash = 0;
-- 
1.8.5.2

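The difference between skb_get_hash() and the new skb_get_hash_raw() is whether the accessor may trigger a computation. The following user-space sketch models both accessors. It uses a hypothetical buffer type and FNV-1a as a placeholder hash, not the kernel's flow dissector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer with a lazily computed flow hash. */
struct buf {
    const unsigned char *key;   /* bytes standing in for the flow tuple */
    size_t key_len;
    uint32_t rxhash;            /* 0 until computed */
    bool hash_valid;
    unsigned int computations;  /* how often the slow path ran */
};

/* FNV-1a, used here only as a placeholder for a real flow hash. */
static uint32_t fnv1a(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Modeled on skb_get_hash(): compute on demand, then return the cached value. */
static uint32_t buf_get_hash(struct buf *b)
{
    if (!b->hash_valid) {
        b->rxhash = fnv1a(b->key, b->key_len);
        b->hash_valid = true;
        b->computations++;
    }
    return b->rxhash;
}

/* Modeled on skb_get_hash_raw(): return whatever is stored, never compute. */
static uint32_t buf_get_hash_raw(const struct buf *b)
{
    return b->rxhash;
}
```

Taking a const pointer in the raw accessor, as the patch does, also documents that it cannot modify the buffer.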

* [PATCH net-next] vxge: make local functions static
From: Stephen Hemminger @ 2014-01-15 16:28 UTC (permalink / raw)
  To: Jon Mason, David Miller; +Cc: netdev

Make local functions static and remove the unused function
vxge_hw_vpath_vid_get.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

---
 drivers/net/ethernet/neterion/vxge/vxge-config.c  |    2 -
 drivers/net/ethernet/neterion/vxge/vxge-main.c    |    1 
 drivers/net/ethernet/neterion/vxge/vxge-main.h    |    1 
 drivers/net/ethernet/neterion/vxge/vxge-traffic.c |   37 +---------------------
 drivers/net/ethernet/neterion/vxge/vxge-traffic.h |    8 ----
 5 files changed, 4 insertions(+), 45 deletions(-)

--- a/drivers/net/ethernet/neterion/vxge/vxge-config.c	2014-01-15 08:10:24.862145270 -0800
+++ b/drivers/net/ethernet/neterion/vxge/vxge-config.c	2014-01-15 08:17:28.164022998 -0800
@@ -2148,7 +2148,7 @@ __vxge_hw_ring_mempool_item_alloc(struct
  * __vxge_hw_ring_replenish - Initial replenish of RxDs
  * This function replenishes the RxDs from reserve array to work array
  */
-enum vxge_hw_status
+static enum vxge_hw_status
 vxge_hw_ring_replenish(struct __vxge_hw_ring *ring)
 {
 	void *rxd;
--- a/drivers/net/ethernet/neterion/vxge/vxge-main.c	2014-01-15 08:10:24.862145270 -0800
+++ b/drivers/net/ethernet/neterion/vxge/vxge-main.c	2014-01-15 08:17:28.164022998 -0800
@@ -87,6 +87,7 @@ static unsigned int bw_percentage[VXGE_H
 module_param_array(bw_percentage, uint, NULL, 0);
 
 static struct vxge_drv_config *driver_config;
+static enum vxge_hw_status vxge_reset_all_vpaths(struct vxgedev *vdev);
 
 static inline int is_vxge_card_up(struct vxgedev *vdev)
 {
@@ -1971,7 +1972,7 @@ static enum vxge_hw_status vxge_rth_conf
 }
 
 /* reset vpaths */
-enum vxge_hw_status vxge_reset_all_vpaths(struct vxgedev *vdev)
+static enum vxge_hw_status vxge_reset_all_vpaths(struct vxgedev *vdev)
 {
 	enum vxge_hw_status status = VXGE_HW_OK;
 	struct vxge_vpath *vpath;
--- a/drivers/net/ethernet/neterion/vxge/vxge-main.h	2014-01-15 08:10:24.862145270 -0800
+++ b/drivers/net/ethernet/neterion/vxge/vxge-main.h	2014-01-15 08:17:28.164022998 -0800
@@ -427,7 +427,6 @@ void vxge_os_timer(struct timer_list *ti
 }
 
 void vxge_initialize_ethtool_ops(struct net_device *ndev);
-enum vxge_hw_status vxge_reset_all_vpaths(struct vxgedev *vdev);
 int vxge_fw_upgrade(struct vxgedev *vdev, char *fw_name, int override);
 
 /* #define VXGE_DEBUG_INIT: debug for initialization functions
--- a/drivers/net/ethernet/neterion/vxge/vxge-traffic.c	2014-01-15 08:10:24.862145270 -0800
+++ b/drivers/net/ethernet/neterion/vxge/vxge-traffic.c	2014-01-15 08:17:28.168022940 -0800
@@ -1956,8 +1956,7 @@ exit:
  * @vid: vlan id to be added for this vpath into the list
  *
  * Adds the given vlan id into the list for this  vpath.
- * see also: vxge_hw_vpath_vid_delete, vxge_hw_vpath_vid_get and
- * vxge_hw_vpath_vid_get_next
+ * see also: vxge_hw_vpath_vid_delete
  *
  */
 enum vxge_hw_status
@@ -1979,45 +1978,13 @@ exit:
 }
 
 /**
- * vxge_hw_vpath_vid_get - Get the first vid entry for this vpath
- *               from vlan id table.
- * @vp: Vpath handle.
- * @vid: Buffer to return vlan id
- *
- * Returns the first vlan id in the list for this vpath.
- * see also: vxge_hw_vpath_vid_get_next
- *
- */
-enum vxge_hw_status
-vxge_hw_vpath_vid_get(struct __vxge_hw_vpath_handle *vp, u64 *vid)
-{
-	u64 data;
-	enum vxge_hw_status status = VXGE_HW_OK;
-
-	if (vp == NULL) {
-		status = VXGE_HW_ERR_INVALID_HANDLE;
-		goto exit;
-	}
-
-	status = __vxge_hw_vpath_rts_table_get(vp,
-			VXGE_HW_RTS_ACCESS_STEER_CTRL_ACTION_LIST_FIRST_ENTRY,
-			VXGE_HW_RTS_ACCESS_STEER_CTRL_DATA_STRUCT_SEL_VID,
-			0, vid, &data);
-
-	*vid = VXGE_HW_RTS_ACCESS_STEER_DATA0_GET_VLAN_ID(*vid);
-exit:
-	return status;
-}
-
-/**
  * vxge_hw_vpath_vid_delete - Delete the vlan id entry for this vpath
  *               to vlan id table.
  * @vp: Vpath handle.
  * @vid: vlan id to be added for this vpath into the list
  *
  * Adds the given vlan id into the list for this  vpath.
- * see also: vxge_hw_vpath_vid_add, vxge_hw_vpath_vid_get and
- * vxge_hw_vpath_vid_get_next
+ * see also: vxge_hw_vpath_vid_add
  *
  */
 enum vxge_hw_status
--- a/drivers/net/ethernet/neterion/vxge/vxge-traffic.h	2014-01-15 08:10:24.862145270 -0800
+++ b/drivers/net/ethernet/neterion/vxge/vxge-traffic.h	2014-01-15 08:17:28.168022940 -0800
@@ -1918,9 +1918,6 @@ vxge_hw_ring_rxd_post_post(
 	struct __vxge_hw_ring *ring_handle,
 	void *rxdh);
 
-enum vxge_hw_status
-vxge_hw_ring_replenish(struct __vxge_hw_ring *ring_handle);
-
 void
 vxge_hw_ring_rxd_post_post_wmb(
 	struct __vxge_hw_ring *ring_handle,
@@ -2186,11 +2183,6 @@ vxge_hw_vpath_vid_add(
 	u64			vid);
 
 enum vxge_hw_status
-vxge_hw_vpath_vid_get(
-	struct __vxge_hw_vpath_handle *vpath_handle,
-	u64			*vid);
-
-enum vxge_hw_status
 vxge_hw_vpath_vid_delete(
 	struct __vxge_hw_vpath_handle *vpath_handle,
 	u64			vid);


* Re: throughput problems with realtek
From: Dmitry Kasatkin @ 2014-01-15 16:25 UTC (permalink / raw)
  To: Rick Jones; +Cc: nic_swsd, romieu, netdev, l.moiseichuk
In-Reply-To: <52D6B5F4.1060201@hp.com>

On Wed, Jan 15, 2014 at 6:23 PM, Rick Jones <rick.jones2@hp.com> wrote:
> On 01/15/2014 03:56 AM, Dmitry Kasatkin wrote:
>>
>> Hi,
>>
>> We have several devices with this adapter:
>>
>> Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411
>> PCI Express Gigabit Ethernet Controller (rev 06)
>> See the output of lspci -vvv below...
>>
>> And suddenly I noticed throughput issues...
>>
>> After a couple of minutes of running 'iperf -c server' the transmission speed
>> drops substantially...
>>
>> [  4]  0.0-10.0 sec  1.10 GBytes   948 Mbits/sec
>> [  5] local 106.122.1.113 port 5001 connected with 106.122.1.121 port 60508
>> [  5]  0.0-10.0 sec  1.10 GBytes   948 Mbits/sec
>> [  4] local 106.122.1.113 port 5001 connected with 106.122.1.121 port 60509
>> [  4]  0.0-10.0 sec  1.10 GBytes   949 Mbits/sec
>> [  5] local 106.122.1.113 port 5001 connected with 106.122.1.121 port 60510
>> [  5]  0.0-10.0 sec  1.10 GBytes   948 Mbits/sec
>> [  4] local 106.122.1.113 port 5001 connected with 106.122.1.121 port 60511
>> [  4]  0.0-10.0 sec   626 MBytes   525 Mbits/sec
>> [  5] local 106.122.1.113 port 5001 connected with 106.122.1.121 port 60512
>> [  5]  0.0-10.0 sec  84.4 MBytes  70.5 Mbits/sec
>> [  4] local 106.122.1.113 port 5001 connected with 106.122.1.121 port 60513
>> [  4]  0.0-10.0 sec  87.4 MBytes  73.0 Mbits/sec
>> [  5] local 106.122.1.113 port 5001 connected with 106.122.1.121 port 60514
>>
>>
>> But it seems that after a certain time of inactivity (low load) the speed
>> goes up again...
>>
>> It happens almost the same way on desktop machines and also on Samsung
>> Series 7 laptop NP770Z5E...
>>
>> Does anyone have any ideas about it?
>>
>
> The card flipping back and forth between 1000 and 100 Mbit/s operation
> perhaps?
>
> rick jones


I do not see any link speed changes... it stays the same...
The same problem is visible on completely different computers.
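One way to test Rick's hypothesis while the iperf runs continue is to sample the link speed the kernel reports. The following is a minimal sketch that assumes the Linux sysfs attribute `/sys/class/net/<iface>/speed`, which reports the speed in Mb/s. The interface name in the usage note and the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Read an integer link speed (Mb/s) from a sysfs-style file.
 * Returns -1 if the file cannot be read or parsed. */
static long read_speed_mbps(const char *path)
{
    FILE *f = fopen(path, "r");
    long speed = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &speed) != 1)
        speed = -1;
    fclose(f);
    return speed;
}

/* Take `samples` readings one second apart, print every change, and
 * return the number of changes seen. */
static int watch_speed(const char *path, int samples)
{
    long prev = read_speed_mbps(path);
    int changes = 0;

    for (int i = 1; i < samples; i++) {
        sleep(1);
        long cur = read_speed_mbps(path);
        if (cur != prev) {
            printf("speed changed: %ld -> %ld Mb/s\n", prev, cur);
            changes++;
            prev = cur;
        }
    }
    return changes;
}
```

For example, calling `watch_speed("/sys/class/net/eth0/speed", 600)` alongside a ten-minute iperf session would show whether the drop from about 948 Mbits/sec to about 70 Mbits/sec coincides with a renegotiation. One-second sampling can still miss very short flaps.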

-- 
Thanks,
Dmitry


* [PATCH net-next] bnad: code cleanup
From: Stephen Hemminger @ 2014-01-15 16:24 UTC (permalink / raw)
  To: Rasesh Mody, David Miller; +Cc: netdev

Use 'make namespacecheck' to find code that could be declared static.
After that, remove code that is not being used.

Compile tested only.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

---
 drivers/net/ethernet/brocade/bna/bfa_ioc.c |   27 +--------------------------
 drivers/net/ethernet/brocade/bna/bnad.c    |    2 +-
 2 files changed, 2 insertions(+), 27 deletions(-)

--- a/drivers/net/ethernet/brocade/bna/bnad.c	2014-01-14 09:46:15.261710097 -0800
+++ b/drivers/net/ethernet/brocade/bna/bnad.c	2014-01-14 09:47:43.800754593 -0800
@@ -2108,7 +2108,7 @@ bnad_rx_ctrl_init(struct bnad *bnad, u32
 }
 
 /* Called with mutex_lock(&bnad->conf_mutex) held */
-u32
+static u32
 bnad_reinit_rx(struct bnad *bnad)
 {
 	struct net_device *netdev = bnad->netdev;
--- a/drivers/net/ethernet/brocade/bna/bfa_ioc.c	2014-01-14 09:46:15.261710097 -0800
+++ b/drivers/net/ethernet/brocade/bna/bfa_ioc.c	2014-01-14 09:47:43.800754593 -0800
@@ -1147,25 +1147,6 @@ bfa_nw_ioc_sem_release(void __iomem *sem
 	writel(1, sem_reg);
 }
 
-/* Invalidate fwver signature */
-enum bfa_status
-bfa_nw_ioc_fwsig_invalidate(struct bfa_ioc *ioc)
-{
-	u32	pgnum, pgoff;
-	u32	loff = 0;
-	enum bfi_ioc_state ioc_fwstate;
-
-	ioc_fwstate = bfa_ioc_get_cur_ioc_fwstate(ioc);
-	if (!bfa_ioc_state_disabled(ioc_fwstate))
-		return BFA_STATUS_ADAPTER_ENABLED;
-
-	pgnum = bfa_ioc_smem_pgnum(ioc, loff);
-	pgoff = PSS_SMEM_PGOFF(loff);
-	writel(pgnum, ioc->ioc_regs.host_page_num_fn);
-	writel(BFI_IOC_FW_INV_SIGN, ioc->ioc_regs.smem_page_start + loff);
-	return BFA_STATUS_OK;
-}
-
 /* Clear fwver hdr */
 static void
 bfa_ioc_fwver_clear(struct bfa_ioc *ioc)
@@ -1780,15 +1761,9 @@ bfa_flash_raw_read(void __iomem *pci_bar
 	return BFA_STATUS_OK;
 }
 
-u32
-bfa_nw_ioc_flash_img_get_size(struct bfa_ioc *ioc)
-{
-	return BFI_FLASH_IMAGE_SZ/sizeof(u32);
-}
-
 #define BFA_FLASH_PART_FWIMG_ADDR	0x100000 /* fw image address */
 
-enum bfa_status
+static enum bfa_status
 bfa_nw_ioc_flash_img_get_chnk(struct bfa_ioc *ioc, u32 off,
 			      u32 *fwimg)
 {


* Re: throughput problems with realtek
From: Rick Jones @ 2014-01-15 16:23 UTC (permalink / raw)
  To: Dmitry Kasatkin, nic_swsd, romieu, netdev; +Cc: l.moiseichuk
In-Reply-To: <CACE9dm_3jw08_dfXRJRMQ=r4X1NZ1kHF6TZopFSNy3k+DCKgTA@mail.gmail.com>

On 01/15/2014 03:56 AM, Dmitry Kasatkin wrote:
> Hi,
>
> We have several devices with this adapter:
>
> Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411
> PCI Express Gigabit Ethernet Controller (rev 06)
> See the output of lspci -vvv below...
>
> And suddenly I noticed throughput issues...
>
> After a couple of minutes of running 'iperf -c server' the transmission speed
> drops substantially...
>
> [  4]  0.0-10.0 sec  1.10 GBytes   948 Mbits/sec
> [  5] local 106.122.1.113 port 5001 connected with 106.122.1.121 port 60508
> [  5]  0.0-10.0 sec  1.10 GBytes   948 Mbits/sec
> [  4] local 106.122.1.113 port 5001 connected with 106.122.1.121 port 60509
> [  4]  0.0-10.0 sec  1.10 GBytes   949 Mbits/sec
> [  5] local 106.122.1.113 port 5001 connected with 106.122.1.121 port 60510
> [  5]  0.0-10.0 sec  1.10 GBytes   948 Mbits/sec
> [  4] local 106.122.1.113 port 5001 connected with 106.122.1.121 port 60511
> [  4]  0.0-10.0 sec   626 MBytes   525 Mbits/sec
> [  5] local 106.122.1.113 port 5001 connected with 106.122.1.121 port 60512
> [  5]  0.0-10.0 sec  84.4 MBytes  70.5 Mbits/sec
> [  4] local 106.122.1.113 port 5001 connected with 106.122.1.121 port 60513
> [  4]  0.0-10.0 sec  87.4 MBytes  73.0 Mbits/sec
> [  5] local 106.122.1.113 port 5001 connected with 106.122.1.121 port 60514
>
>
> But it seems that after a certain time of inactivity (low load) the speed
> goes up again...
>
> It happens almost the same way on desktop machines and also on Samsung
> Series 7 laptop NP770Z5E...
>
> Does anyone have any ideas about it?
>

The card flipping back and forth between 1000 and 100 Mbit/s operation 
perhaps?

rick jones


* RE: i40e sym version file
From: Nelson, Shannon @ 2014-01-15 16:22 UTC (permalink / raw)
  To: Stephen Hemminger, Kirsher, Jeffrey T, David Miller
  Cc: netdev@vger.kernel.org, Brown, Aaron F
In-Reply-To: <20140115081645.484c124d@nehalam.linuxnetplumber.net>

Ooo, ick, that shouldn't be there.  Jeff is on sabbatical and Aaron is covering, I'll work with Aaron to find what happened and get it straightened out.

Thanks,
sln

________________________________________
From: Stephen Hemminger [stephen@networkplumber.org]
Sent: Wednesday, January 15, 2014 8:16 AM
To: Nelson, Shannon; Kirsher, Jeffrey T; David Miller
Cc: netdev@vger.kernel.org
Subject: i40e sym version file

The latest pull from Intel added a file to the net-next git repo which is not
supposed to be part of the build; it is a derived file:
 create mode 100644 drivers/net/ethernet/intel/i40e/Module.symvers


The bad commit was:
commit 9d8bf54723e9f21c502a410495840d8771f769ef
Author: Shannon Nelson <shannon.nelson@intel.com>
Date:   Tue Jan 14 00:49:50 2014 -0800

    i40e: associate VMDq queue with VM type

    Fix a bug where the queue was not associated with the right set-up
    within the hardware.  The fix is to use the right QTX_CTL VSI type
    when associating it to the VSI.

    Change-ID: I65ef6c5a8205601c640a6593e4b7e78d6ba45545
    Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
    Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
    Tested-by: Sibai Li <sibai.li@intel.com>
    Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>


* [PATCH-next v2] net/ipv4: don't use module_init in non-modular gre_offload
From: Paul Gortmaker @ 2014-01-15 16:19 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Paul Gortmaker, Eric Dumazet

Recent commit 438e38fadca2f6e57eeecc08326c8a95758594d4
("gre_offload: statically build GRE offloading support") added
new module_init/module_exit calls to the gre_offload.c file.

The file is obj-y and can't be anything other than built-in.
Currently it can never be built modular, so using module_init
as an alias for __initcall can be somewhat misleading.

Fix this up now, so that we can relocate module_init from
init.h into module.h in the future.  If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.  We also make the inclusion explicit.

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups.  As __initcall gets
mapped onto device_initcall, our use of device_initcall
directly in this change means that the runtime impact is
zero -- it will remain at level 6 in initcall ordering.

As for the module_exit, rather than replace it with __exitcall,
we simply remove it, since it appears only UML does anything
with those, and even for UML, there is no relevant cleanup
to be done here.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---

v2: dump gre_offload_exit entirely as suggested by Eric.

 net/ipv4/gre_offload.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index 29512e3e7e7c..f1d32280cb54 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -11,6 +11,7 @@
  */
 
 #include <linux/skbuff.h>
+#include <linux/init.h>
 #include <net/protocol.h>
 #include <net/gre.h>
 
@@ -283,11 +284,4 @@ static int __init gre_offload_init(void)
 {
 	return inet_add_offload(&gre_offload, IPPROTO_GRE);
 }
-
-static void __exit gre_offload_exit(void)
-{
-	inet_del_offload(&gre_offload, IPPROTO_GRE);
-}
-
-module_init(gre_offload_init);
-module_exit(gre_offload_exit);
+device_initcall(gre_offload_init);
-- 
1.8.5.2


* Re: [net-next v4 3/7] ixgbe: Use static inlines instead of macros
From: Rustad, Mark D @ 2014-01-15 16:16 UTC (permalink / raw)
  To: Joe Perches
  Cc: Brown, Aaron F, David Miller, Netdev, gospo@redhat.com,
	sassmann@redhat.com
In-Reply-To: <1389755839.14001.6.camel@joe-AO722>

On Jan 14, 2014, at 7:17 PM, Joe Perches <joe@perches.com> wrote:

> On Tue, 2014-01-14 at 18:53 -0800, Aaron Brown wrote:
>> From: Mark Rustad <mark.d.rustad@intel.com>
> 
> trivia:
> 
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h
> []
>> @@ -124,24 +124,40 @@ s32 ixgbe_reset_pipeline_82599(struct ixgbe_hw *hw);
> []
>> -#define IXGBE_WRITE_REG(a, reg, value) writel((value), ((a)->hw_addr + (reg)))
>> +static inline void ixgbe_write_reg(struct ixgbe_hw *hw, u32 reg, u32 value)
>> +{
>> +	writel(value, hw->hw_addr + reg);
>> +}
>> +#define IXGBE_WRITE_REG(a, reg, value) ixgbe_write_reg((a), (reg), (value))
> 
> There's no real value in adding parentheses to these macros.

I suppose that is true in this case. I have it so ingrained to always put parens around the macro parameter references, that I just automatically do it. Still, it makes it safer for any future changes, though the most likely next change here will be deletion anyway. :-)
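Joe's point and Mark's reply can both be checked with a small standalone example. Parentheses matter when a macro parameter is embedded in an expression. They make no difference when each parameter is passed straight through as its own function argument. The names below are hypothetical and only mimic the shape of IXGBE_WRITE_REG.

```c
#include <assert.h>
#include <stdint.h>

/* Expression macros: an unparenthesized parameter changes precedence. */
#define SCALE_BAD(x)   x * 4
#define SCALE_GOOD(x)  ((x) * 4)

/* Stand-in for register space; records writes in an array instead of MMIO. */
struct fake_hw {
    uint32_t regs[16];
};

static inline void fake_write_reg(struct fake_hw *hw, uint32_t reg, uint32_t value)
{
    hw->regs[reg] = value;
}

/* Forwarding macros: each argument lands in its own function-argument
 * slot, so both spellings expand to equivalent calls. */
#define FAKE_WRITE_REG(a, reg, value)        fake_write_reg(a, reg, value)
#define FAKE_WRITE_REG_PARENS(a, reg, value) fake_write_reg((a), (reg), (value))
```

With SCALE_BAD, `1 + 2` becomes `1 + 2 * 4`. With the forwarding macros, the argument boundaries already act as the parentheses.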

-- 
Mark Rustad, Networking Division, Intel Corporation


* i40e sym version file
From: Stephen Hemminger @ 2014-01-15 16:16 UTC (permalink / raw)
  To: Shannon Nelson, Jeff Kirsher, David Miller; +Cc: netdev

The latest pull from Intel added a file to the net-next git repo which is not
supposed to be part of the build; it is a derived file:
 create mode 100644 drivers/net/ethernet/intel/i40e/Module.symvers


The bad commit was:
commit 9d8bf54723e9f21c502a410495840d8771f769ef
Author: Shannon Nelson <shannon.nelson@intel.com>
Date:   Tue Jan 14 00:49:50 2014 -0800

    i40e: associate VMDq queue with VM type
    
    Fix a bug where the queue was not associated with the right set-up
    within the hardware.  The fix is to use the right QTX_CTL VSI type
    when associating it to the VSI.
    
    Change-ID: I65ef6c5a8205601c640a6593e4b7e78d6ba45545
    Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
    Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
    Tested-by: Sibai Li <sibai.li@intel.com>
    Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>


* [PATCH net-next] netfilter: remove double colon
From: Stephen Hemminger @ 2014-01-15 16:12 UTC (permalink / raw)
  To: Pablo Neira Ayuso, David S. Miller; +Cc: netdev, netfilter-devel

Remove a doubled semicolon; this is C, not shell script.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

--- a/net/ipv4/netfilter.c	2013-12-31 17:45:31.993942921 -0800
+++ b/net/ipv4/netfilter.c	2014-01-15 08:10:49.793785943 -0800
@@ -61,7 +61,7 @@ int ip_route_me_harder(struct sk_buff *s
 		skb_dst_set(skb, NULL);
 		dst = xfrm_lookup(net, dst, flowi4_to_flowi(&fl4), skb->sk, 0);
 		if (IS_ERR(dst))
-			return PTR_ERR(dst);;
+			return PTR_ERR(dst);
 		skb_dst_set(skb, dst);
 	}
 #endif


* Re: [PATCH net-next] xen-netback: Rework rx_work_todo
From: Wei Liu @ 2014-01-15 16:10 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: Wei Liu, ian.campbell, xen-devel, netdev, linux-kernel,
	jonathan.davies
In-Reply-To: <52D6A45C.1060705@citrix.com>

On Wed, Jan 15, 2014 at 03:08:12PM +0000, Zoltan Kiss wrote:
> On 15/01/14 14:59, Wei Liu wrote:
> >On Wed, Jan 15, 2014 at 02:52:41PM +0000, Zoltan Kiss wrote:
> >>On 15/01/14 14:45, Wei Liu wrote:
> >>>>>>The recent patch to fix receive side flow control (11b57f) solved the spinning
> >>>>>>>>>thread problem, but it caused another one. The receive side can stall if:
> >>>>>>>>>- xenvif_rx_action sets rx_queue_stopped to false
> >>>>>>>>>- interrupt happens, and sets rx_event to true
> >>>>>>>>>- then xenvif_kthread sets rx_event to false
> >>>>>>>>>
> >>>>>>>
> >>>>>>>If you mean "rx_work_todo" returns false.
> >>>>>>>
> >>>>>>>In this case
> >>>>>>>
> >>>>>>>(!skb_queue_empty(&vif->rx_queue) && !vif->rx_queue_stopped) || vif->rx_event;
> >>>>>>>
> >>>>>>>can still be true, can't it?
> >>>>>Sorry, I should have written rx_queue_stopped to true
> >>>>>
> >>>In this case, if rx_queue_stopped is true, then we're expecting frontend
> >>>to notify us, right?
> >>>
> >>>rx_queue_stopped is set to true if we cannot make any progress to queue
> >>>packet into the ring. In that situation we can expect frontend will send
> >>>notification to backend after it goes through the backlog in the ring.
> >>>That means rx_event is set to true, and rx_work_todo is true again. So
> >>>the ring is actually not stalled in this case as well. Did I miss
> >>>something?
> >>>
> >>
> >>Yes, we expect the guest to notify us, and it does, and we set
> >>rx_event to true (see second point), but then the thread set it to
> >>false (see third point). Talking with Paul, another solution could
> >>be to set rx_event false before calling xenvif_rx_action. But using
> >>rx_last_skb_slots makes it quicker for the thread to see if it
> >>doesn't have to do anything.
> >>
> >
> >OK, this is a better explanation. So actually there's no bug in the
> >original implementation and your patch is sort of an improvement.
> >
> >Could you send a new version of this patch with relevant information in
> >commit message? Talking to people offline is faster, but I would like to
> >have public discussion and relevant information archived in a searchable
> >form. Thanks.
> 
> No, there is a bug in the original implementation:
> - [THREAD] xenvif_rx_action sets rx_queue_stopped to true
> - [INTERRUPT] interrupt happens, and sets rx_event to true
> - [THREAD] then xenvif_kthread sets rx_event to false
> - [THREAD] rx_work_todo never returns true anymore
> 

I see what you mean. The interrupt is "lost", that's why it's stalled.

> I will update the explanation and send in the patch again.
> 

Thanks.

Wei.

> Zoli


* Re: [PATCH v2 net] bpf: do not use reciprocal divide
From: Eric Dumazet @ 2014-01-15 16:09 UTC
  To: Matt Evans
  Cc: Heiko Carstens, Martin Schwidefsky, Hannes Frederic Sowa, netdev,
	dborkman, darkjames-ws, Mircea Gherzan, Russell King
In-Reply-To: <927a6073e8f73b53027480e7609b4c53@ozlabs.org>

On Wed, 2014-01-15 at 15:10 +0000, Matt Evans wrote:
> Hi Eric,
...
> PPC looks fine; I had a look at the core/ARM parts which also look good.
> 
> I'd forgotten where the DIV0 checking occurred, so I also benefited from 
> your hint to Heiko. :)
> 

Thanks for reviewing !
