Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH v6 0/3] netdev/of/phy: MDIO bus multiplexer support.
From: David Daney @ 2012-05-03  1:16 UTC (permalink / raw)
  To: Ralf Baechle, Grant Likely, Rob Herring,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ, David S. Miller,
	netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-mips-6z/3iImG2C8G8FEW9MqTrA,
	afleming-Re5JQEeQqe8AvxtiuMwx3w, David Daney,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

From: David Daney <david.daney-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>

This code has been working well for about six months on a couple of
different configurations (boards), so I thought it would be a good
time to send it out again, and I hope get it on the path towards
merging.

v6: Correct Kconfig depends in 2/3 as noticed by David Miller.  Test
    against net-next.

v5: Correct Kconfig depends in 3/3 as noticed by David Miller.

v4: Correct some comment text and rename a couple of variables to
    better reflect their purpose.

v3: Update binding to use "mdio-mux-gpio" compatible property.
    Cleanups suggested by Grant Likely.  Now uses the driver probe
    deferral mechanism if GPIOs or parent bus not available.

v2: Update bindings to use "reg" and "mdio-parent-bus" instead of
    "cell-index" and "parent-bus"

v1:

We have several different boards with a multiplexer in the MDIO bus.
There is an MDIO bus controller connected to a switching device with
several child MDIO busses.

Everything is wired up using device tree bindings.

 1/3 - New of_mdio_find_bus() function used to help configuring the
       driver topology.

 2/3 - MDIO bus multiplexer framework.

 3/3 - A driver for a GPIO controlled multiplexer.

David Daney (3):
  netdev/of/phy: New function: of_mdio_find_bus().
  netdev/of/phy: Add MDIO bus multiplexer support.
  netdev/of/phy: Add MDIO bus multiplexer driven by GPIO lines.

 .../devicetree/bindings/net/mdio-mux-gpio.txt      |  127 +++++++++++++
 Documentation/devicetree/bindings/net/mdio-mux.txt |  136 ++++++++++++++
 drivers/net/phy/Kconfig                            |   19 ++
 drivers/net/phy/Makefile                           |    2 +
 drivers/net/phy/mdio-mux-gpio.c                    |  142 +++++++++++++++
 drivers/net/phy/mdio-mux.c                         |  192 ++++++++++++++++++++
 drivers/net/phy/mdio_bus.c                         |   32 ++++
 drivers/of/of_mdio.c                               |    2 +
 include/linux/mdio-mux.h                           |   21 ++
 include/linux/of_mdio.h                            |    2 +
 10 files changed, 675 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/mdio-mux-gpio.txt
 create mode 100644 Documentation/devicetree/bindings/net/mdio-mux.txt
 create mode 100644 drivers/net/phy/mdio-mux-gpio.c
 create mode 100644 drivers/net/phy/mdio-mux.c
 create mode 100644 include/linux/mdio-mux.h

-- 
1.7.2.3

^ permalink raw reply

* [PATCH v6 1/3] netdev/of/phy: New function: of_mdio_find_bus().
From: David Daney @ 2012-05-03  1:16 UTC (permalink / raw)
  To: Ralf Baechle, Grant Likely, Rob Herring, devicetree-discuss,
	David S. Miller, netdev
  Cc: linux-kernel, linux-mips, afleming, galak, David Daney,
	Grant Likely, David S. Miller
In-Reply-To: <1336007799-31016-1-git-send-email-ddaney.cavm@gmail.com>

From: David Daney <david.daney@cavium.com>

Add of_mdio_find_bus() which allows an mii_bus to be located given its
associated the device tree node.

This is needed by the follow-on patch to add a driver for MDIO bus
multiplexers.

The of_mdiobus_register() function is modified so that the device tree
node is recorded in the mii_bus.  Then we can find it again by
iterating over all mdio_bus_class devices.

Because the OF device tree has now become an integral part of the
kernel, this can live in mdio_bus.c (which contains the needed
mdio_bus_class structure) instead of of_mdio.c.

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/phy/mdio_bus.c |   32 ++++++++++++++++++++++++++++++++
 drivers/of/of_mdio.c       |    2 ++
 include/linux/of_mdio.h    |    2 ++
 3 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 8985cc6..83d5c9f 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -88,6 +88,38 @@ static struct class mdio_bus_class = {
 	.dev_release	= mdiobus_release,
 };
 
+#ifdef CONFIG_OF_MDIO
+/* Helper function for of_mdio_find_bus */
+static int of_mdio_bus_match(struct device *dev, void *mdio_bus_np)
+{
+	return dev->of_node == mdio_bus_np;
+}
+/**
+ * of_mdio_find_bus - Given an mii_bus node, find the mii_bus.
+ * @mdio_np: Pointer to the mii_bus.
+ *
+ * Returns a pointer to the mii_bus, or NULL if none found.
+ *
+ * Because the association of a device_node and mii_bus is made via
+ * of_mdiobus_register(), the mii_bus cannot be found before it is
+ * registered with of_mdiobus_register().
+ *
+ */
+struct mii_bus *of_mdio_find_bus(struct device_node *mdio_bus_np)
+{
+	struct device *d;
+
+	if (!mdio_bus_np)
+		return NULL;
+
+	d = class_find_device(&mdio_bus_class, NULL,  mdio_bus_np,
+			      of_mdio_bus_match);
+
+	return d ? to_mii_bus(d) : NULL;
+}
+EXPORT_SYMBOL(of_mdio_find_bus);
+#endif
+
 /**
  * mdiobus_register - bring up all the PHYs on a given bus and attach them to bus
  * @bus: target mii_bus
diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
index 483c0ad..2574abd 100644
--- a/drivers/of/of_mdio.c
+++ b/drivers/of/of_mdio.c
@@ -45,6 +45,8 @@ int of_mdiobus_register(struct mii_bus *mdio, struct device_node *np)
 		for (i=0; i<PHY_MAX_ADDR; i++)
 			mdio->irq[i] = PHY_POLL;
 
+	mdio->dev.of_node = np;
+
 	/* Register the MDIO bus */
 	rc = mdiobus_register(mdio);
 	if (rc)
diff --git a/include/linux/of_mdio.h b/include/linux/of_mdio.h
index 53b94e0..912c27a 100644
--- a/include/linux/of_mdio.h
+++ b/include/linux/of_mdio.h
@@ -22,4 +22,6 @@ extern struct phy_device *of_phy_connect_fixed_link(struct net_device *dev,
 					 void (*hndlr)(struct net_device *),
 					 phy_interface_t iface);
 
+extern struct mii_bus *of_mdio_find_bus(struct device_node *mdio_np);
+
 #endif /* __LINUX_OF_MDIO_H */
-- 
1.7.2.3

^ permalink raw reply related

* [PATCH v6 2/3] netdev/of/phy: Add MDIO bus multiplexer support.
From: David Daney @ 2012-05-03  1:16 UTC (permalink / raw)
  To: Ralf Baechle, Grant Likely, Rob Herring, devicetree-discuss,
	David S. Miller, netdev
  Cc: linux-kernel, linux-mips, afleming, galak, David Daney
In-Reply-To: <1336007799-31016-1-git-send-email-ddaney.cavm@gmail.com>

From: David Daney <david.daney@cavium.com>

This patch adds a somewhat generic framework for MDIO bus
multiplexers.  It is modeled on the I2C multiplexer.

The multiplexer is needed if there are multiple PHYs with the same
address connected to the same MDIO bus adepter, or if there is
insufficient electrical drive capability for all the connected PHY
devices.

Conceptually it could look something like this:

                   ------------------
                   | Control Signal |
                   --------+---------
                           |
 ---------------   --------+------
 | MDIO MASTER |---| Multiplexer |
 ---------------   --+-------+----
                     |       |
                     C       C
                     h       h
                     i       i
                     l       l
                     d       d
                     |       |
     ---------       A       B   ---------
     |       |       |       |   |       |
     | PHY@1 +-------+       +---+ PHY@1 |
     |       |       |       |   |       |
     ---------       |       |   ---------
     ---------       |       |   ---------
     |       |       |       |   |       |
     | PHY@2 +-------+       +---+ PHY@2 |
     |       |                   |       |
     ---------                   ---------

This framework configures the bus topology from device tree data.  The
mechanics of switching the multiplexer is left to device specific
drivers.

The follow-on patch contains a multiplexer driven by GPIO lines.

Signed-off-by: David Daney <david.daney@cavium.com>
---
 Documentation/devicetree/bindings/net/mdio-mux.txt |  136 ++++++++++++++
 drivers/net/phy/Kconfig                            |    9 +
 drivers/net/phy/Makefile                           |    1 +
 drivers/net/phy/mdio-mux.c                         |  192 ++++++++++++++++++++
 include/linux/mdio-mux.h                           |   21 ++
 5 files changed, 359 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/mdio-mux.txt
 create mode 100644 drivers/net/phy/mdio-mux.c
 create mode 100644 include/linux/mdio-mux.h

diff --git a/Documentation/devicetree/bindings/net/mdio-mux.txt b/Documentation/devicetree/bindings/net/mdio-mux.txt
new file mode 100644
index 0000000..f65606f
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/mdio-mux.txt
@@ -0,0 +1,136 @@
+Common MDIO bus multiplexer/switch properties.
+
+An MDIO bus multiplexer/switch will have several child busses that are
+numbered uniquely in a device dependent manner.  The nodes for an MDIO
+bus multiplexer/switch will have one child node for each child bus.
+
+Required properties:
+- mdio-parent-bus : phandle to the parent MDIO bus.
+- #address-cells = <1>;
+- #size-cells = <0>;
+
+Optional properties:
+- Other properties specific to the multiplexer/switch hardware.
+
+Required properties for child nodes:
+- #address-cells = <1>;
+- #size-cells = <0>;
+- reg : The sub-bus number.
+
+
+Example :
+
+	/* The parent MDIO bus. */
+	smi1: mdio@1180000001900 {
+		compatible = "cavium,octeon-3860-mdio";
+		#address-cells = <1>;
+		#size-cells = <0>;
+		reg = <0x11800 0x00001900 0x0 0x40>;
+	};
+
+	/*
+	   An NXP sn74cbtlv3253 dual 1-of-4 switch controlled by a
+	   pair of GPIO lines.  Child busses 2 and 3 populated with 4
+	   PHYs each.
+	 */
+	mdio-mux {
+		compatible = "mdio-mux-gpio";
+		gpios = <&gpio1 3 0>, <&gpio1 4 0>;
+		mdio-parent-bus = <&smi1>;
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		mdio@2 {
+			reg = <2>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+
+			phy11: ethernet-phy@1 {
+				reg = <1>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <10 8>; /* Pin 10, active low */
+			};
+			phy12: ethernet-phy@2 {
+				reg = <2>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <10 8>; /* Pin 10, active low */
+			};
+			phy13: ethernet-phy@3 {
+				reg = <3>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <10 8>; /* Pin 10, active low */
+			};
+			phy14: ethernet-phy@4 {
+				reg = <4>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <10 8>; /* Pin 10, active low */
+			};
+		};
+
+		mdio@3 {
+			reg = <3>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+
+			phy21: ethernet-phy@1 {
+				reg = <1>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <12 8>; /* Pin 12, active low */
+			};
+			phy22: ethernet-phy@2 {
+				reg = <2>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <12 8>; /* Pin 12, active low */
+			};
+			phy23: ethernet-phy@3 {
+				reg = <3>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <12 8>; /* Pin 12, active low */
+			};
+			phy24: ethernet-phy@4 {
+				reg = <4>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <12 8>; /* Pin 12, active low */
+			};
+		};
+	};
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 0e01f4e..99c0674a 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -135,6 +135,15 @@ config MDIO_OCTEON
 
 	  If in doubt, say Y.
 
+config MDIO_BUS_MUX
+	tristate
+	depends on OF_MDIO
+	help
+	  This module provides a driver framework for MDIO bus
+	  multiplexers which connect one of several child MDIO busses
+	  to a parent bus.  Switching between child busses is done by
+	  device specific drivers.
+
 endif # PHYLIB
 
 config MICREL_KS8995MA
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index b7438b1..a6b50e7 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -25,3 +25,4 @@ obj-$(CONFIG_MICREL_PHY)	+= micrel.o
 obj-$(CONFIG_MDIO_OCTEON)	+= mdio-octeon.o
 obj-$(CONFIG_MICREL_KS8995MA)	+= spi_ks8995.o
 obj-$(CONFIG_AMD_PHY)		+= amd.o
+obj-$(CONFIG_MDIO_BUS_MUX)	+= mdio-mux.o
diff --git a/drivers/net/phy/mdio-mux.c b/drivers/net/phy/mdio-mux.c
new file mode 100644
index 0000000..39ea067
--- /dev/null
+++ b/drivers/net/phy/mdio-mux.c
@@ -0,0 +1,192 @@
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2011, 2012 Cavium, Inc.
+ */
+
+#include <linux/platform_device.h>
+#include <linux/mdio-mux.h>
+#include <linux/of_mdio.h>
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/phy.h>
+
+#define DRV_VERSION "1.0"
+#define DRV_DESCRIPTION "MDIO bus multiplexer driver"
+
+struct mdio_mux_child_bus;
+
+struct mdio_mux_parent_bus {
+	struct mii_bus *mii_bus;
+	int current_child;
+	int parent_id;
+	void *switch_data;
+	int (*switch_fn)(int current_child, int desired_child, void *data);
+
+	/* List of our children linked through their next fields. */
+	struct mdio_mux_child_bus *children;
+};
+
+struct mdio_mux_child_bus {
+	struct mii_bus *mii_bus;
+	struct mdio_mux_parent_bus *parent;
+	struct mdio_mux_child_bus *next;
+	int bus_number;
+	int phy_irq[PHY_MAX_ADDR];
+};
+
+/*
+ * The parent bus' lock is used to order access to the switch_fn.
+ */
+static int mdio_mux_read(struct mii_bus *bus, int phy_id, int regnum)
+{
+	struct mdio_mux_child_bus *cb = bus->priv;
+	struct mdio_mux_parent_bus *pb = cb->parent;
+	int r;
+
+	mutex_lock(&pb->mii_bus->mdio_lock);
+	r = pb->switch_fn(pb->current_child, cb->bus_number, pb->switch_data);
+	if (r)
+		goto out;
+
+	pb->current_child = cb->bus_number;
+
+	r = pb->mii_bus->read(pb->mii_bus, phy_id, regnum);
+out:
+	mutex_unlock(&pb->mii_bus->mdio_lock);
+
+	return r;
+}
+
+/*
+ * The parent bus' lock is used to order access to the switch_fn.
+ */
+static int mdio_mux_write(struct mii_bus *bus, int phy_id,
+			  int regnum, u16 val)
+{
+	struct mdio_mux_child_bus *cb = bus->priv;
+	struct mdio_mux_parent_bus *pb = cb->parent;
+
+	int r;
+
+	mutex_lock(&pb->mii_bus->mdio_lock);
+	r = pb->switch_fn(pb->current_child, cb->bus_number, pb->switch_data);
+	if (r)
+		goto out;
+
+	pb->current_child = cb->bus_number;
+
+	r = pb->mii_bus->write(pb->mii_bus, phy_id, regnum, val);
+out:
+	mutex_unlock(&pb->mii_bus->mdio_lock);
+
+	return r;
+}
+
+static int parent_count;
+
+int mdio_mux_init(struct device *dev,
+		  int (*switch_fn)(int cur, int desired, void *data),
+		  void **mux_handle,
+		  void *data)
+{
+	struct device_node *parent_bus_node;
+	struct device_node *child_bus_node;
+	int r, ret_val;
+	struct mii_bus *parent_bus;
+	struct mdio_mux_parent_bus *pb;
+	struct mdio_mux_child_bus *cb;
+
+	if (!dev->of_node)
+		return -ENODEV;
+
+	parent_bus_node = of_parse_phandle(dev->of_node, "mdio-parent-bus", 0);
+
+	if (!parent_bus_node)
+		return -ENODEV;
+
+	parent_bus = of_mdio_find_bus(parent_bus_node);
+	if (parent_bus == NULL) {
+		ret_val = -EPROBE_DEFER;
+		goto err_parent_bus;
+	}
+
+	pb = devm_kzalloc(dev, sizeof(*pb), GFP_KERNEL);
+	if (pb == NULL) {
+		ret_val = -ENOMEM;
+		goto err_parent_bus;
+	}
+
+	pb->switch_data = data;
+	pb->switch_fn = switch_fn;
+	pb->current_child = -1;
+	pb->parent_id = parent_count++;
+	pb->mii_bus = parent_bus;
+
+	ret_val = -ENODEV;
+	for_each_child_of_node(dev->of_node, child_bus_node) {
+		u32 v;
+
+		r = of_property_read_u32(child_bus_node, "reg", &v);
+		if (r)
+			continue;
+
+		cb = devm_kzalloc(dev, sizeof(*cb), GFP_KERNEL);
+		if (cb == NULL) {
+			dev_err(dev,
+				"Error: Failed to allocate memory for child\n");
+			ret_val = -ENOMEM;
+			break;
+		}
+		cb->bus_number = v;
+		cb->parent = pb;
+		cb->mii_bus = mdiobus_alloc();
+		cb->mii_bus->priv = cb;
+
+		cb->mii_bus->irq = cb->phy_irq;
+		cb->mii_bus->name = "mdio_mux";
+		snprintf(cb->mii_bus->id, MII_BUS_ID_SIZE, "%x.%x",
+			 pb->parent_id, v);
+		cb->mii_bus->parent = dev;
+		cb->mii_bus->read = mdio_mux_read;
+		cb->mii_bus->write = mdio_mux_write;
+		r = of_mdiobus_register(cb->mii_bus, child_bus_node);
+		if (r) {
+			mdiobus_free(cb->mii_bus);
+			devm_kfree(dev, cb);
+		} else {
+			of_node_get(child_bus_node);
+			cb->next = pb->children;
+			pb->children = cb;
+		}
+	}
+	if (pb->children) {
+		*mux_handle = pb;
+		dev_info(dev, "Version " DRV_VERSION "\n");
+		return 0;
+	}
+err_parent_bus:
+	of_node_put(parent_bus_node);
+	return ret_val;
+}
+EXPORT_SYMBOL_GPL(mdio_mux_init);
+
+void mdio_mux_uninit(void *mux_handle)
+{
+	struct mdio_mux_parent_bus *pb = mux_handle;
+	struct mdio_mux_child_bus *cb = pb->children;
+
+	while (cb) {
+		mdiobus_unregister(cb->mii_bus);
+		mdiobus_free(cb->mii_bus);
+		cb = cb->next;
+	}
+}
+EXPORT_SYMBOL_GPL(mdio_mux_uninit);
+
+MODULE_DESCRIPTION(DRV_DESCRIPTION);
+MODULE_VERSION(DRV_VERSION);
+MODULE_AUTHOR("David Daney");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/mdio-mux.h b/include/linux/mdio-mux.h
new file mode 100644
index 0000000..a243dbb
--- /dev/null
+++ b/include/linux/mdio-mux.h
@@ -0,0 +1,21 @@
+/*
+ * MDIO bus multiplexer framwork.
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2011, 2012 Cavium, Inc.
+ */
+#ifndef __LINUX_MDIO_MUX_H
+#define __LINUX_MDIO_MUX_H
+#include <linux/device.h>
+
+int mdio_mux_init(struct device *dev,
+		  int (*switch_fn) (int cur, int desired, void *data),
+		  void **mux_handle,
+		  void *data);
+
+void mdio_mux_uninit(void *mux_handle);
+
+#endif /* __LINUX_MDIO_MUX_H */
-- 
1.7.2.3

^ permalink raw reply related

* [PATCH v6 3/3] netdev/of/phy: Add MDIO bus multiplexer driven by GPIO lines.
From: David Daney @ 2012-05-03  1:16 UTC (permalink / raw)
  To: Ralf Baechle, Grant Likely, Rob Herring,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ, David S. Miller,
	netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-mips-6z/3iImG2C8G8FEW9MqTrA,
	afleming-Re5JQEeQqe8AvxtiuMwx3w, David Daney,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1336007799-31016-1-git-send-email-ddaney.cavm-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

From: David Daney <david.daney-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>

The GPIO pins select which sub bus is connected to the master.

Initially tested with an sn74cbtlv3253 switch device wired into the
MDIO bus.

Signed-off-by: David Daney <david.daney-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
---
 .../devicetree/bindings/net/mdio-mux-gpio.txt      |  127 +++++++++++++++++
 drivers/net/phy/Kconfig                            |   10 ++
 drivers/net/phy/Makefile                           |    1 +
 drivers/net/phy/mdio-mux-gpio.c                    |  142 ++++++++++++++++++++
 4 files changed, 280 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/mdio-mux-gpio.txt
 create mode 100644 drivers/net/phy/mdio-mux-gpio.c

diff --git a/Documentation/devicetree/bindings/net/mdio-mux-gpio.txt b/Documentation/devicetree/bindings/net/mdio-mux-gpio.txt
new file mode 100644
index 0000000..7938411
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/mdio-mux-gpio.txt
@@ -0,0 +1,127 @@
+Properties for an MDIO bus multiplexer/switch controlled by GPIO pins.
+
+This is a special case of a MDIO bus multiplexer.  One or more GPIO
+lines are used to control which child bus is connected.
+
+Required properties in addition to the generic multiplexer properties:
+
+- compatible : mdio-mux-gpio.
+- gpios : GPIO specifiers for each GPIO line.  One or more must be specified.
+
+
+Example :
+
+	/* The parent MDIO bus. */
+	smi1: mdio@1180000001900 {
+		compatible = "cavium,octeon-3860-mdio";
+		#address-cells = <1>;
+		#size-cells = <0>;
+		reg = <0x11800 0x00001900 0x0 0x40>;
+	};
+
+	/*
+	   An NXP sn74cbtlv3253 dual 1-of-4 switch controlled by a
+	   pair of GPIO lines.  Child busses 2 and 3 populated with 4
+	   PHYs each.
+	 */
+	mdio-mux {
+		compatible = "mdio-mux-gpio";
+		gpios = <&gpio1 3 0>, <&gpio1 4 0>;
+		mdio-parent-bus = <&smi1>;
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		mdio@2 {
+			reg = <2>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+
+			phy11: ethernet-phy@1 {
+				reg = <1>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <10 8>; /* Pin 10, active low */
+			};
+			phy12: ethernet-phy@2 {
+				reg = <2>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <10 8>; /* Pin 10, active low */
+			};
+			phy13: ethernet-phy@3 {
+				reg = <3>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <10 8>; /* Pin 10, active low */
+			};
+			phy14: ethernet-phy@4 {
+				reg = <4>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <10 8>; /* Pin 10, active low */
+			};
+		};
+
+		mdio@3 {
+			reg = <3>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+
+			phy21: ethernet-phy@1 {
+				reg = <1>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <12 8>; /* Pin 12, active low */
+			};
+			phy22: ethernet-phy@2 {
+				reg = <2>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <12 8>; /* Pin 12, active low */
+			};
+			phy23: ethernet-phy@3 {
+				reg = <3>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <12 8>; /* Pin 12, active low */
+			};
+			phy24: ethernet-phy@4 {
+				reg = <4>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <12 8>; /* Pin 12, active low */
+			};
+		};
+	};
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 99c0674a..944cdfb 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -144,6 +144,16 @@ config MDIO_BUS_MUX
 	  to a parent bus.  Switching between child busses is done by
 	  device specific drivers.
 
+config MDIO_BUS_MUX_GPIO
+	tristate "Support for GPIO controlled MDIO bus multiplexers"
+	depends on OF_GPIO && OF_MDIO
+	select MDIO_BUS_MUX
+	help
+	  This module provides a driver for MDIO bus multiplexers that
+	  are controlled via GPIO lines.  The multiplexer connects one of
+	  several child MDIO busses to a parent bus.  Child bus
+	  selection is under the control of GPIO lines.
+
 endif # PHYLIB
 
 config MICREL_KS8995MA
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index a6b50e7..f51af68 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -26,3 +26,4 @@ obj-$(CONFIG_MDIO_OCTEON)	+= mdio-octeon.o
 obj-$(CONFIG_MICREL_KS8995MA)	+= spi_ks8995.o
 obj-$(CONFIG_AMD_PHY)		+= amd.o
 obj-$(CONFIG_MDIO_BUS_MUX)	+= mdio-mux.o
+obj-$(CONFIG_MDIO_BUS_MUX_GPIO)	+= mdio-mux-gpio.o
diff --git a/drivers/net/phy/mdio-mux-gpio.c b/drivers/net/phy/mdio-mux-gpio.c
new file mode 100644
index 0000000..e0cc4ef
--- /dev/null
+++ b/drivers/net/phy/mdio-mux-gpio.c
@@ -0,0 +1,142 @@
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2011, 2012 Cavium, Inc.
+ */
+
+#include <linux/platform_device.h>
+#include <linux/device.h>
+#include <linux/of_mdio.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/phy.h>
+#include <linux/mdio-mux.h>
+#include <linux/of_gpio.h>
+
+#define DRV_VERSION "1.0"
+#define DRV_DESCRIPTION "GPIO controlled MDIO bus multiplexer driver"
+
+#define MDIO_MUX_GPIO_MAX_BITS 8
+
+struct mdio_mux_gpio_state {
+	int gpio[MDIO_MUX_GPIO_MAX_BITS];
+	unsigned int num_gpios;
+	void *mux_handle;
+};
+
+static int mdio_mux_gpio_switch_fn(int current_child, int desired_child,
+				   void *data)
+{
+	int change;
+	unsigned int n;
+	struct mdio_mux_gpio_state *s = data;
+
+	if (current_child == desired_child)
+		return 0;
+
+	change = current_child == -1 ? -1 : current_child ^ desired_child;
+
+	for (n = 0; n < s->num_gpios; n++) {
+		if (change & 1)
+			gpio_set_value_cansleep(s->gpio[n],
+						(desired_child & 1) != 0);
+		change >>= 1;
+		desired_child >>= 1;
+	}
+
+	return 0;
+}
+
+static int __devinit mdio_mux_gpio_probe(struct platform_device *pdev)
+{
+	enum of_gpio_flags f;
+	struct mdio_mux_gpio_state *s;
+	unsigned int num_gpios;
+	unsigned int n;
+	int r;
+
+	if (!pdev->dev.of_node)
+		return -ENODEV;
+
+	num_gpios = of_gpio_count(pdev->dev.of_node);
+	if (num_gpios == 0 || num_gpios > MDIO_MUX_GPIO_MAX_BITS)
+		return -ENODEV;
+
+	s = devm_kzalloc(&pdev->dev, sizeof(*s), GFP_KERNEL);
+	if (!s)
+		return -ENOMEM;
+
+	s->num_gpios = num_gpios;
+
+	for (n = 0; n < num_gpios; ) {
+		int gpio = of_get_gpio_flags(pdev->dev.of_node, n, &f);
+		if (gpio < 0) {
+			r = (gpio == -ENODEV) ? -EPROBE_DEFER : gpio;
+			goto err;
+		}
+		s->gpio[n] = gpio;
+
+		n++;
+
+		r = gpio_request(gpio, "mdio_mux_gpio");
+		if (r)
+			goto err;
+
+		r = gpio_direction_output(gpio, 0);
+		if (r)
+			goto err;
+	}
+
+	r = mdio_mux_init(&pdev->dev,
+			  mdio_mux_gpio_switch_fn, &s->mux_handle, s);
+
+	if (r == 0) {
+		pdev->dev.platform_data = s;
+		return 0;
+	}
+err:
+	while (n) {
+		n--;
+		gpio_free(s->gpio[n]);
+	}
+	devm_kfree(&pdev->dev, s);
+	return r;
+}
+
+static int __devexit mdio_mux_gpio_remove(struct platform_device *pdev)
+{
+	struct mdio_mux_gpio_state *s = pdev->dev.platform_data;
+	mdio_mux_uninit(s->mux_handle);
+	return 0;
+}
+
+static struct of_device_id mdio_mux_gpio_match[] = {
+	{
+		.compatible = "mdio-mux-gpio",
+	},
+	{
+		/* Legacy compatible property. */
+		.compatible = "cavium,mdio-mux-sn74cbtlv3253",
+	},
+	{},
+};
+MODULE_DEVICE_TABLE(of, mdio_mux_gpio_match);
+
+static struct platform_driver mdio_mux_gpio_driver = {
+	.driver = {
+		.name		= "mdio-mux-gpio",
+		.owner		= THIS_MODULE,
+		.of_match_table = mdio_mux_gpio_match,
+	},
+	.probe		= mdio_mux_gpio_probe,
+	.remove		= __devexit_p(mdio_mux_gpio_remove),
+};
+
+module_platform_driver(mdio_mux_gpio_driver);
+
+MODULE_DESCRIPTION(DRV_DESCRIPTION);
+MODULE_VERSION(DRV_VERSION);
+MODULE_AUTHOR("David Daney");
+MODULE_LICENSE("GPL");
-- 
1.7.2.3

^ permalink raw reply related

* linux-next: manual merge of the wireless-next tree with the net-next tree
From: Stephen Rothwell @ 2012-05-03  1:31 UTC (permalink / raw)
  To: John W. Linville
  Cc: linux-next, linux-kernel, Ashok Nagarajan, David Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 5530 bytes --]

Hi John,

Today's linux-next merge of the wireless-next tree got a conflict in
net/wireless/nl80211.c between commit 9360ffd18597 ("wireless: Stop using
NLA_PUT*()") from the net-next tree and commit 0a9b3782ef40 ("{nl,cfg,mac}
80211: Allow user to see/configure HT protection mode") from the
wireless-next tree.

I fixed it up (see below) and can carry the fix as necessary.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc net/wireless/nl80211.c
index d5005c5,859bd66..0000000
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@@ -3348,50 -3292,51 +3348,52 @@@ static int nl80211_get_mesh_config(stru
  	pinfoattr = nla_nest_start(msg, NL80211_ATTR_MESH_CONFIG);
  	if (!pinfoattr)
  		goto nla_put_failure;
 -	NLA_PUT_U32(msg, NL80211_ATTR_IFINDEX, dev->ifindex);
 -	NLA_PUT_U16(msg, NL80211_MESHCONF_RETRY_TIMEOUT,
 -			cur_params.dot11MeshRetryTimeout);
 -	NLA_PUT_U16(msg, NL80211_MESHCONF_CONFIRM_TIMEOUT,
 -			cur_params.dot11MeshConfirmTimeout);
 -	NLA_PUT_U16(msg, NL80211_MESHCONF_HOLDING_TIMEOUT,
 -			cur_params.dot11MeshHoldingTimeout);
 -	NLA_PUT_U16(msg, NL80211_MESHCONF_MAX_PEER_LINKS,
 -			cur_params.dot11MeshMaxPeerLinks);
 -	NLA_PUT_U8(msg, NL80211_MESHCONF_MAX_RETRIES,
 -			cur_params.dot11MeshMaxRetries);
 -	NLA_PUT_U8(msg, NL80211_MESHCONF_TTL,
 -			cur_params.dot11MeshTTL);
 -	NLA_PUT_U8(msg, NL80211_MESHCONF_ELEMENT_TTL,
 -			cur_params.element_ttl);
 -	NLA_PUT_U8(msg, NL80211_MESHCONF_AUTO_OPEN_PLINKS,
 -			cur_params.auto_open_plinks);
 -	NLA_PUT_U32(msg, NL80211_MESHCONF_SYNC_OFFSET_MAX_NEIGHBOR,
 -			cur_params.dot11MeshNbrOffsetMaxNeighbor);
 -	NLA_PUT_U8(msg, NL80211_MESHCONF_HWMP_MAX_PREQ_RETRIES,
 -			cur_params.dot11MeshHWMPmaxPREQretries);
 -	NLA_PUT_U32(msg, NL80211_MESHCONF_PATH_REFRESH_TIME,
 -			cur_params.path_refresh_time);
 -	NLA_PUT_U16(msg, NL80211_MESHCONF_MIN_DISCOVERY_TIMEOUT,
 -			cur_params.min_discovery_timeout);
 -	NLA_PUT_U32(msg, NL80211_MESHCONF_HWMP_ACTIVE_PATH_TIMEOUT,
 -			cur_params.dot11MeshHWMPactivePathTimeout);
 -	NLA_PUT_U16(msg, NL80211_MESHCONF_HWMP_PREQ_MIN_INTERVAL,
 -			cur_params.dot11MeshHWMPpreqMinInterval);
 -	NLA_PUT_U16(msg, NL80211_MESHCONF_HWMP_PERR_MIN_INTERVAL,
 -			cur_params.dot11MeshHWMPperrMinInterval);
 -	NLA_PUT_U16(msg, NL80211_MESHCONF_HWMP_NET_DIAM_TRVS_TIME,
 -			cur_params.dot11MeshHWMPnetDiameterTraversalTime);
 -	NLA_PUT_U8(msg, NL80211_MESHCONF_HWMP_ROOTMODE,
 -			cur_params.dot11MeshHWMPRootMode);
 -	NLA_PUT_U16(msg, NL80211_MESHCONF_HWMP_RANN_INTERVAL,
 -			cur_params.dot11MeshHWMPRannInterval);
 -	NLA_PUT_U8(msg, NL80211_MESHCONF_GATE_ANNOUNCEMENTS,
 -			cur_params.dot11MeshGateAnnouncementProtocol);
 -	NLA_PUT_U8(msg, NL80211_MESHCONF_FORWARDING,
 -			cur_params.dot11MeshForwarding);
 -	NLA_PUT_U32(msg, NL80211_MESHCONF_RSSI_THRESHOLD,
 -			cur_params.rssi_threshold);
 -	NLA_PUT_U32(msg, NL80211_MESHCONF_HT_OPMODE,
 -			cur_params.ht_opmode);
 +	if (nla_put_u32(msg, NL80211_ATTR_IFINDEX, dev->ifindex) ||
 +	    nla_put_u16(msg, NL80211_MESHCONF_RETRY_TIMEOUT,
 +			cur_params.dot11MeshRetryTimeout) ||
 +	    nla_put_u16(msg, NL80211_MESHCONF_CONFIRM_TIMEOUT,
 +			cur_params.dot11MeshConfirmTimeout) ||
 +	    nla_put_u16(msg, NL80211_MESHCONF_HOLDING_TIMEOUT,
 +			cur_params.dot11MeshHoldingTimeout) ||
 +	    nla_put_u16(msg, NL80211_MESHCONF_MAX_PEER_LINKS,
 +			cur_params.dot11MeshMaxPeerLinks) ||
 +	    nla_put_u8(msg, NL80211_MESHCONF_MAX_RETRIES,
 +		       cur_params.dot11MeshMaxRetries) ||
 +	    nla_put_u8(msg, NL80211_MESHCONF_TTL,
 +		       cur_params.dot11MeshTTL) ||
 +	    nla_put_u8(msg, NL80211_MESHCONF_ELEMENT_TTL,
 +		       cur_params.element_ttl) ||
 +	    nla_put_u8(msg, NL80211_MESHCONF_AUTO_OPEN_PLINKS,
 +		       cur_params.auto_open_plinks) ||
 +	    nla_put_u32(msg, NL80211_MESHCONF_SYNC_OFFSET_MAX_NEIGHBOR,
 +			cur_params.dot11MeshNbrOffsetMaxNeighbor) ||
 +	    nla_put_u8(msg, NL80211_MESHCONF_HWMP_MAX_PREQ_RETRIES,
 +		       cur_params.dot11MeshHWMPmaxPREQretries) ||
 +	    nla_put_u32(msg, NL80211_MESHCONF_PATH_REFRESH_TIME,
 +			cur_params.path_refresh_time) ||
 +	    nla_put_u16(msg, NL80211_MESHCONF_MIN_DISCOVERY_TIMEOUT,
 +			cur_params.min_discovery_timeout) ||
 +	    nla_put_u32(msg, NL80211_MESHCONF_HWMP_ACTIVE_PATH_TIMEOUT,
 +			cur_params.dot11MeshHWMPactivePathTimeout) ||
 +	    nla_put_u16(msg, NL80211_MESHCONF_HWMP_PREQ_MIN_INTERVAL,
 +			cur_params.dot11MeshHWMPpreqMinInterval) ||
 +	    nla_put_u16(msg, NL80211_MESHCONF_HWMP_PERR_MIN_INTERVAL,
 +			cur_params.dot11MeshHWMPperrMinInterval) ||
 +	    nla_put_u16(msg, NL80211_MESHCONF_HWMP_NET_DIAM_TRVS_TIME,
 +			cur_params.dot11MeshHWMPnetDiameterTraversalTime) ||
 +	    nla_put_u8(msg, NL80211_MESHCONF_HWMP_ROOTMODE,
 +		       cur_params.dot11MeshHWMPRootMode) ||
 +	    nla_put_u16(msg, NL80211_MESHCONF_HWMP_RANN_INTERVAL,
 +			cur_params.dot11MeshHWMPRannInterval) ||
 +	    nla_put_u8(msg, NL80211_MESHCONF_GATE_ANNOUNCEMENTS,
 +		       cur_params.dot11MeshGateAnnouncementProtocol) ||
 +	    nla_put_u8(msg, NL80211_MESHCONF_FORWARDING,
 +		       cur_params.dot11MeshForwarding) ||
 +	    nla_put_u32(msg, NL80211_MESHCONF_RSSI_THRESHOLD,
- 			cur_params.rssi_threshold))
++			cur_params.rssi_threshold) ||
++	    nla_put_u32(msg, NL80211_MESHCONF_HT_OPMODE,
++			cur_params.ht_opmode))
 +		goto nla_put_failure;
  	nla_nest_end(msg, pinfoattr);
  	genlmsg_end(msg, hdr);
  	return genlmsg_reply(msg, info);

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: linux-next: manual merge of the wireless-next tree with the net-next tree
From: David Miller @ 2012-05-03  1:35 UTC (permalink / raw)
  To: sfr; +Cc: linville, linux-next, linux-kernel, ashok, netdev
In-Reply-To: <20120503113110.3eded43c5dc526a471c69dd2@canb.auug.org.au>

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Thu, 3 May 2012 11:31:10 +1000

> Today's linux-next merge of the wireless-next tree got a conflict in
> net/wireless/nl80211.c between commit 9360ffd18597 ("wireless: Stop using
> NLA_PUT*()") from the net-next tree and commit 0a9b3782ef40 ("{nl,cfg,mac}
> 80211: Allow user to see/configure HT protection mode") from the
> wireless-next tree.

John, the NLA_PUT macros were removed from the net-next tree more
than a month ago.

There is zero reason why this should still be happening, and this
issue is causing a large, unnecessary, burdon upon Stephen.

Please take care of this, thanks.

^ permalink raw reply

* Re: [PATCH net-next] net: take care of cloned skbs in tcp_try_coalesce()
From: Eric Dumazet @ 2012-05-03  1:52 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Alexander Duyck, David Miller, netdev, Neal Cardwell, Tom Herbert,
	Jeff Kirsher, Michael Chan, Matt Carlson, Herbert Xu,
	Ben Hutchings, Ilpo Järvinen, Maciej Żenczykowski
In-Reply-To: <4FA19F5B.7040407@intel.com>

On Wed, 2012-05-02 at 13:55 -0700, Alexander Duyck wrote:
> On 05/02/2012 11:15 AM, Eric Dumazet wrote:
> > On Wed, 2012-05-02 at 11:05 -0700, Alexander Duyck wrote:
> >
> >> You're correct about the fragstolen case, I actually was thinking of the
> >> first patch you sent, not this second one.
> >>
> >> However we still have a problem.  What we end up with now is a case of
> >> sharing in which the clone skb no longer knows that it is sharing the
> >> head with another skb.  The dataref will drop to 1 when we call
> >> __kfree_skb.  This means that any other function out there that tries to
> >> see if the skb is shared would return false.  This could lead to issues
> >> if there is anything out there that manipulates the data in head based
> >> on the false assumption that it is not cloned.  What we would probably
> >> need to do in this case is tweak the logic for skb_cloned.  If you are
> >> using a head_frag you should probably add a check that returns true if
> >> cloned is true and page_count is greater than 1.  We should be safe in
> >> the case of skb_header_cloned since we already dropped are dataref when
> >> we stole the page and freed the skb.
> > I really dont understand this concern.
> >
> > When skb is cloned, we copy in head_frag __skb_clone()
> >
> > So both skbs have the bit set, and dataref = 2.
> >
> > first skb is freed, dataref becomes 1 and nothing special  happen
> >
> > >From this point, skb->head is not 'shared' anymore (taken your own
> > words). And we are free to do whatever we want.
> >
> > second skb is freed, dataref becomes 0 and we call the right destructor.
> The problem is that the stack will not be able to detect sharing.  As
> long as page_count is greater than 2 and skb->cloned is set we should be
> telling any callers to skb_cloned that the head is cloned.  Otherwise we
> can run into issues elsewhere with well meaning code checking and not
> detecting sharing, and then mangling the header.
> 

page count is irrelevant, since if PAGE_SIZE=65536, you can have 32
fragments (of 2048 bytes) per page. Still we can call put_page() for
each individual frag that must be freed.

You forgot to give an example of path that would be failing. Since the
skb_cloned() check is still valid.

Head is cloned if : skb->cloned is set and dataref value is not 1

(minus the skb_header_release() tweaks done on output path for tcp)

Every time a 'caller' is going to modify/mangle its skb head, it must
first call pskb_expand_head() (or various helpers around it) to :

- allocate a new skb->head
- copy old content to new head
- release a reference on old head dataref
- if old dataref reaches 0, 'free' old head (might be a kfree() or
put_page())

> Also I am not sure if the big monolithic changes are really the best way
> to approach this.  It would be nice if we could fix this incrementally
> instead of trying to do it all at once since there are multiple issues
> that need to be addressed.
> 
> I will try to submit a few patches from my end later today.  I still
> need to look over all of the changes from the past couple of weeks that
> were based on the assumption that the IP stack completely owned the skb.

I did my best to provide small changes.

Plus TCP coalescing is done after IP processing.

Owning skb is a vague concept anyway. IP borrows skb but not owns them.
They already could be cloned skb.

^ permalink raw reply

* Re: [PATCH v4 1/3] tcp: early retransmit: tcp_enter_recovery()
From: Eric Dumazet @ 2012-05-03  1:56 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: Yuchung Cheng, davem, ilpo.jarvinen, nanditad, netdev
In-Reply-To: <CADVnQymbDLXSR=aRB8SSd8GxpsFB3j2DLCBxH-1t2oS0T+sxgA@mail.gmail.com>

On Wed, 2012-05-02 at 20:13 -0400, Neal Cardwell wrote:
> On Wed, May 2, 2012 at 7:30 PM, Yuchung Cheng <ycheng@google.com> wrote:
> > This a prepartion patch that refactors the code to enter recovery
> > into a new function tcp_enter_recovery(). It's needed to implement
> > the delayed fast retransmit in ER.
> >
> > Signed-off-by: Yuchung Cheng <ycheng@google.com>
> > ---
> > ChangeLog since v1:
> >  - swaped with part 1 and part2
> > ChangeLog since v2:
> >  - removed RFC in commit message
> > ChangeLog since v3:
> >  - nothing
> 
> Acked-by: Neal Cardwell <ncardwell@google.com>

Yuchung, you should add our Acked-by to new submissions if no change on
a particular part was done, or else final inclusion wont have them

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH net-next] net: implement tcp coalescing in tcp_queue_rcv()
From: Eric Dumazet @ 2012-05-03  2:14 UTC (permalink / raw)
  To: David Miller
  Cc: alexander.duyck, alexander.h.duyck, netdev, ncardwell, therbert,
	jeffrey.t.kirsher, mchan, mcarlson, herbert, bhutchings,
	ilpo.jarvinen, maze
In-Reply-To: <20120502.211157.2201073892985382154.davem@davemloft.net>

On Wed, 2012-05-02 at 21:11 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 02 May 2012 21:58:29 +0200
> 
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > Extend tcp coalescing implementing it from tcp_queue_rcv(), the main
> > receiver function when application is not blocked in recvmsg().
> > 
> > Function tcp_queue_rcv() is moved a bit to allow its call from
> > tcp_data_queue()
> > 
> > This gives good results especially if GRO could not kick, and if skb
> > head is a fragment.
> > 
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> 
> Applied.

Thanks David

My next step is to provide a common helper to NAPI drivers to replace 
netdev_alloc_skb_ip_align(struct net_device *dev, unsigned int len)

by :

struct sk_buff *napi_alloc_rxskb(struct napi_struct *napi, unsigned int len)

That will manage a cache of one page, splitted in fragments as needed.
(roughly the code we added in tg3 as POC)

Because converting drivers to build_skb() is a too slow process.

^ permalink raw reply

* Re: [PATCH net-next] net: implement tcp coalescing in tcp_queue_rcv()
From: David Miller @ 2012-05-03  2:21 UTC (permalink / raw)
  To: eric.dumazet
  Cc: alexander.duyck, alexander.h.duyck, netdev, ncardwell, therbert,
	jeffrey.t.kirsher, mchan, mcarlson, herbert, bhutchings,
	ilpo.jarvinen, maze
In-Reply-To: <1336011260.22133.754.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 03 May 2012 04:14:20 +0200

> My next step is to provide a common helper to NAPI drivers to replace 
> netdev_alloc_skb_ip_align(struct net_device *dev, unsigned int len)
> 
> by :
> 
> struct sk_buff *napi_alloc_rxskb(struct napi_struct *napi, unsigned int len)
> 
> That will manage a cache of one page, splitted in fragments as needed.
> (roughly the code we added in tg3 as POC)
> 
> Because converting drivers to build_skb() is a too slow process.

Sounds good.

^ permalink raw reply

* Re: [PATCH net-next] net: take care of cloned skbs in tcp_try_coalesce()
From: Alexander Duyck @ 2012-05-03  3:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexander Duyck, David Miller, netdev, Neal Cardwell, Tom Herbert,
	Jeff Kirsher, Michael Chan, Matt Carlson, Herbert Xu,
	Ben Hutchings, Ilpo Järvinen, Maciej Żenczykowski
In-Reply-To: <1336009974.22133.706.camel@edumazet-glaptop>

On 5/2/2012 6:52 PM, Eric Dumazet wrote:
> On Wed, 2012-05-02 at 13:55 -0700, Alexander Duyck wrote:
>> On 05/02/2012 11:15 AM, Eric Dumazet wrote:
>>> On Wed, 2012-05-02 at 11:05 -0700, Alexander Duyck wrote:
>>>> You're correct about the fragstolen case, I actually was thinking of the
>>>> first patch you sent, not this second one.
>>>>
>>>> However we still have a problem.  What we end up with now is a case of
>>>> sharing in which the clone skb no longer knows that it is sharing the
>>>> head with another skb.  The dataref will drop to 1 when we call
>>>> __kfree_skb.  This means that any other function out there that tries to
>>>> see if the skb is shared would return false.  This could lead to issues
>>>> if there is anything out there that manipulates the data in head based
>>>> on the false assumption that it is not cloned.  What we would probably
>>>> need to do in this case is tweak the logic for skb_cloned.  If you are
>>>> using a head_frag you should probably add a check that returns true if
>>>> cloned is true and page_count is greater than 1.  We should be safe in
>>>> the case of skb_header_cloned since we already dropped are dataref when
>>>> we stole the page and freed the skb.
>>> I really dont understand this concern.
>>>
>>> When skb is cloned, we copy in head_frag __skb_clone()
>>>
>>> So both skbs have the bit set, and dataref = 2.
>>>
>>> first skb is freed, dataref becomes 1 and nothing special  happen
>>>
>>> > From this point, skb->head is not 'shared' anymore (taken your own
>>> words). And we are free to do whatever we want.
>>>
>>> second skb is freed, dataref becomes 0 and we call the right destructor.
>> The problem is that the stack will not be able to detect sharing.  As
>> long as page_count is greater than 2 and skb->cloned is set we should be
>> telling any callers to skb_cloned that the head is cloned.  Otherwise we
>> can run into issues elsewhere with well meaning code checking and not
>> detecting sharing, and then mangling the header.
> page count is irrelevant, since if PAGE_SIZE=65536, you can have 32
> fragments (of 2048 bytes) per page. Still we can call put_page() for
> each individual frag that must be freed.
>
> You forgot to give an example of path that would be failing. Since the
> skb_cloned() check is still valid.
>
> Head is cloned if : skb->cloned is set and dataref value is not 1
>
> (minus the skb_header_release() tweaks done on output path for tcp)
>
> Every time a 'caller' is going to modify/mangle its skb head, it must
> first call pskb_expand_head() (or various helpers around it) to :
>
> - allocate a new skb->head
> - copy old content to new head
> - release a reference on old head dataref
> - if old dataref reaches 0, 'free' old head (might be a kfree() or
> put_page())
This is exactly my point.  The way your current code works the check in 
pskb_expand_head will not detect the header is cloned.

So for example lets say you have one of your skbs that goes through and 
is merged.

1.  You start with a cloned skb.  dataref = 2, head_frag page count = 1;
2.  You go through tcp_try_coalesce.  dataref = 2, head_frag page count = 2;
3.  You call __kfree_skb on the skb.  dataref = 1, head_frag page count = 2;
4   At this point if the holder of the clone decides to modify the skb-> 
head there is nothing to stop it from doing so.

Odds are you would have to have a pretty extreme corner case to really 
see any issues with this since I know most raw sockets won't mangle the 
data portion of the frame.  I just figure we could save ourselves a lot 
of trouble by just not coalescing the head_frag in the case that the skb 
is cloned.

>> Also I am not sure if the big monolithic changes are really the best way
>> to approach this.  It would be nice if we could fix this incrementally
>> instead of trying to do it all at once since there are multiple issues
>> that need to be addressed.
>>
>> I will try to submit a few patches from my end later today.  I still
>> need to look over all of the changes from the past couple of weeks that
>> were based on the assumption that the IP stack completely owned the skb.
> I did my best to provide small changes.
I just meant you didn't need to do things like add the kfree_skb_partial 
helper in the same patch.

> Plus TCP coalescing is done after IP processing.
>
> Owning skb is a vague concept anyway. IP borrows skb but not owns them.
> They already could be cloned skb.
I know it is a vague concept.  However it is one of those concepts where 
if we don't get it right we start to see random panics and call traces 
due to memory corruption.  That is why I prefer to avoid it if at all 
possible.  All I am really asking for is that we don't just use the 
head_frag as a page unless we know it is not cloned.  All those checks 
that just look for skb->head_frag could check for skb->head_frag && 
!skb_cloned(skb) and I would be a much happier person since then I know 
all owners of the clones will be using the same variable for reference 
count tracking.

I'm still working on those patches.  I thought I had sent them earlier 
but it looks like I had an email issue, and since then Dave pulled your 
patches so I am rebasing the patches on the updated tree and should have 
something in the next hour or so.

Thanks,

Alex

^ permalink raw reply

* Re: [PATCH net-next] net: take care of cloned skbs in tcp_try_coalesce()
From: Eric Dumazet @ 2012-05-03  3:14 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Alexander Duyck, David Miller, netdev, Neal Cardwell, Tom Herbert,
	Jeff Kirsher, Michael Chan, Matt Carlson, Herbert Xu,
	Ben Hutchings, Ilpo Järvinen, Maciej Żenczykowski
In-Reply-To: <4FA1F4C3.8000804@gmail.com>

On Wed, 2012-05-02 at 20:00 -0700, Alexander Duyck wrote:

> This is exactly my point.  The way your current code works the check in 
> pskb_expand_head will not detect the header is cloned.
> 
> So for example lets say you have one of your skbs that goes through and 
> is merged.
> 
> 1.  You start with a cloned skb.  dataref = 2, head_frag page count = 1;
> 2.  You go through tcp_try_coalesce.  dataref = 2, head_frag page count = 2;
> 3.  You call __kfree_skb on the skb.  dataref = 1, head_frag page count = 2;


If page count was incremented in 2., this is because we stole the head.

But we could not stole the head since dataref = 2

So try to find a real example, based on current state, not on the first
patch I sent, since we made progress since this time.

Thanks

^ permalink raw reply

* Re: [PATCH net-next] net: take care of cloned skbs in tcp_try_coalesce()
From: Alexander Duyck @ 2012-05-03  3:28 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexander Duyck, David Miller, netdev, Neal Cardwell, Tom Herbert,
	Jeff Kirsher, Michael Chan, Matt Carlson, Herbert Xu,
	Ben Hutchings, Ilpo Järvinen, Maciej Żenczykowski
In-Reply-To: <1336014869.22133.821.camel@edumazet-glaptop>

On 5/2/2012 8:14 PM, Eric Dumazet wrote:
> On Wed, 2012-05-02 at 20:00 -0700, Alexander Duyck wrote:
>> This is exactly my point.  The way your current code works the check in
>> pskb_expand_head will not detect the header is cloned.
>>
>> So for example lets say you have one of your skbs that goes through and
>> is merged.
>>
>> 1.  You start with a cloned skb.  dataref = 2, head_frag page count = 1;
>> 2.  You go through tcp_try_coalesce.  dataref = 2, head_frag page count = 2;

4592         if (from->head_frag) {
4593                 struct page *page;
4594                 unsigned int offset;
4595
4596                 if (skb_shinfo(to)->nr_frags + skb_shinfo(from)->nr_frags>= MAX_SKB_FRAGS)
4597                         return false;
4598                 page = virt_to_head_page(from->head);
4599                 offset = from->data - (unsigned char *)page_address(page);
4600                 skb_fill_page_desc(to, skb_shinfo(to)->nr_frags,
4601                                    page, offset, skb_headlen(from));
4602
4603                 if (skb_cloned(from))
4604                         get_page(page);
4605                 else
4606                         *fragstolen = true;
4607
4608                 delta = len; /* we dont know real truesize... */
4609                 goto copyfrags;
4610         }


>> 3.  You call __kfree_skb on the skb.  dataref = 1, head_frag page count = 2;
4614 static void kfree_skb_partial(struct sk_buff *skb, bool head_stolen)
4615 {
4616         if (head_stolen)
4617                 kmem_cache_free(skbuff_head_cache, skb);
4618         else
4619                 __kfree_skb(skb);
4620 }
4621

>
> If page count was incremented in 2., this is because we stole the head.
>
> But we could not stole the head since dataref = 2
I don't see how that is the case.  It looks pretty clear to me that if 
dataref ==2, as represented by skb_cloned(from) being true above, we are 
calling get_page which will bump the page count and we are still 
stealing the head and then calling __kfree_skb which will decrement dataref.
> So try to find a real example, based on current state, not on the first
> patch I sent, since we made progress since this time.
>
> Thanks

I included a couple of grabs from blobs on git.kernel.org for Dave's 
current tree.  I should have patches out in about 10 minutes and then we 
can discuss those.

Thanks,

Alex

^ permalink raw reply

* [PATCH 0/2] Cleanups for head_frag usage and tcp_try_coalese
From: Alexander Duyck @ 2012-05-03  3:38 UTC (permalink / raw)
  To: netdev; +Cc: davem

I wrote up these patches earlier today to address issues related to the
splitting of reference counting into two seperate counts in the case of
skbs containing a head frag.  In addition I have a secondary patch for
tcp_try_coalesce which is meant to be a general cleanup as I found it
difficult to read through the code as it was.

I have done some performance testing on the series with netperf and saw no
appreciable difference after these changes for the non-cloned case.
---

Alexander Duyck (2):
      tcp: cleanup tcp_try_coalesce
      net: Stop decapitating clones that have a head_frag

 net/core/skbuff.c    |    2 +
 net/ipv4/tcp_input.c |   85 +++++++++++++++++++++++++-------------------------
 2 files changed, 44 insertions(+), 43 deletions(-)

-- 

^ permalink raw reply

* [PATCH 1/2] net: Stop decapitating clones that have a head_frag
From: Alexander Duyck @ 2012-05-03  3:38 UTC (permalink / raw)
  To: netdev; +Cc: davem, Alexander Duyck, Eric Dumazet, Jeff Kirsher
In-Reply-To: <20120503033018.5482.89902.stgit@gitlad.jf.intel.com>

This change is meant ot prevent stealing the skb->head to use as a page in
the event that the skb->head was cloned.  This allows the other clones to
track each other via shinfo->dataref.

Without this we break down to two methods for tracking the reference count,
one being dataref, the other being the page count.  As a result it becomes
difficult to track how many references there are to skb->head.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 net/core/skbuff.c    |    2 +-
 net/ipv4/tcp_input.c |    9 ++-------
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 52ba2b5..8703754 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1699,7 +1699,7 @@ static bool __skb_splice_bits(struct sk_buff *skb, struct pipe_inode_info *pipe,
 			      struct splice_pipe_desc *spd, struct sock *sk)
 {
 	int seg;
-	bool head_is_linear = !skb->head_frag;
+	bool head_is_linear = !skb->head_frag || skb_cloned(skb);
 
 	/* map the linear part :
 	 * If skb->head_frag is set, this 'linear' part is backed
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2f696ef..c6f78e2 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4589,7 +4589,7 @@ copyfrags:
 		to->data_len += len;
 		goto merge;
 	}
-	if (from->head_frag) {
+	if (from->head_frag && !skb_cloned(from)) {
 		struct page *page;
 		unsigned int offset;
 
@@ -4599,12 +4599,7 @@ copyfrags:
 		offset = from->data - (unsigned char *)page_address(page);
 		skb_fill_page_desc(to, skb_shinfo(to)->nr_frags,
 				   page, offset, skb_headlen(from));
-
-		if (skb_cloned(from))
-			get_page(page);
-		else
-			*fragstolen = true;
-
+		*fragstolen = true;
 		delta = len; /* we dont know real truesize... */
 		goto copyfrags;
 	}

^ permalink raw reply related

* [PATCH 2/2] tcp: cleanup tcp_try_coalesce
From: Alexander Duyck @ 2012-05-03  3:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, Alexander Duyck, Eric Dumazet, Jeff Kirsher
In-Reply-To: <20120503033018.5482.89902.stgit@gitlad.jf.intel.com>

This change is mostly meant to help improve the readability of
tcp_try_coalesce.  I had a few issues since there were several points when
the code would test for a conditional, fail, then succeed on another
conditional take some action, and then follow a goto back into the previous
conditional.  I just tore all of that apart and made the whole thing one
linear flow with a single goto.

Also there were multiple ways of computing the delta, the one for head_frag
made the least amount of sense to me since we were only dropping the
sk_buff so I have updated the logic for the stolen head case so that delta
is only truesize - sizeof(skb_buff), and for the case where we are dropping
the head as well it is truesize - SKB_TRUESIZE(skb_end_pointer - head).
This way we can also account for the head_frag with headlen == 0.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 net/ipv4/tcp_input.c |   80 +++++++++++++++++++++++++++-----------------------
 1 files changed, 43 insertions(+), 37 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c6f78e2..23bc3ff 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4548,62 +4548,68 @@ static bool tcp_try_coalesce(struct sock *sk,
 	int i, delta, len = from->len;
 
 	*fragstolen = false;
+
 	if (tcp_hdr(from)->fin || skb_cloned(to))
 		return false;
+
 	if (len <= skb_tailroom(to)) {
 		BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len));
-merge:
-		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPRCVCOALESCE);
-		TCP_SKB_CB(to)->end_seq = TCP_SKB_CB(from)->end_seq;
-		TCP_SKB_CB(to)->ack_seq = TCP_SKB_CB(from)->ack_seq;
-		return true;
+		goto merge;
 	}
 
 	if (skb_has_frag_list(to) || skb_has_frag_list(from))
 		return false;
 
-	if (skb_headlen(from) == 0 &&
-	    (skb_shinfo(to)->nr_frags +
-	     skb_shinfo(from)->nr_frags <= MAX_SKB_FRAGS)) {
-		WARN_ON_ONCE(from->head_frag);
-		delta = from->truesize - ksize(from->head) -
-			SKB_DATA_ALIGN(sizeof(struct sk_buff));
-
-		WARN_ON_ONCE(delta < len);
-copyfrags:
-		memcpy(skb_shinfo(to)->frags + skb_shinfo(to)->nr_frags,
-		       skb_shinfo(from)->frags,
-		       skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));
-		skb_shinfo(to)->nr_frags += skb_shinfo(from)->nr_frags;
-
-		if (skb_cloned(from))
-			for (i = 0; i < skb_shinfo(from)->nr_frags; i++)
-				skb_frag_ref(from, i);
-		else
-			skb_shinfo(from)->nr_frags = 0;
-
-		to->truesize += delta;
-		atomic_add(delta, &sk->sk_rmem_alloc);
-		sk_mem_charge(sk, delta);
-		to->len += len;
-		to->data_len += len;
-		goto merge;
-	}
-	if (from->head_frag && !skb_cloned(from)) {
+	if (skb_headlen(from) != 0) {
 		struct page *page;
 		unsigned int offset;
 
-		if (skb_shinfo(to)->nr_frags + skb_shinfo(from)->nr_frags >= MAX_SKB_FRAGS)
+		if (skb_shinfo(to)->nr_frags +
+		    skb_shinfo(from)->nr_frags >= MAX_SKB_FRAGS)
+			return false;
+
+		if (!from->head_frag || skb_cloned(from))
 			return false;
+
+		delta = from->truesize - sizeof(struct sk_buff);
+
 		page = virt_to_head_page(from->head);
 		offset = from->data - (unsigned char *)page_address(page);
+
 		skb_fill_page_desc(to, skb_shinfo(to)->nr_frags,
 				   page, offset, skb_headlen(from));
 		*fragstolen = true;
-		delta = len; /* we dont know real truesize... */
-		goto copyfrags;
+	} else {
+		if (skb_shinfo(to)->nr_frags +
+		    skb_shinfo(from)->nr_frags > MAX_SKB_FRAGS)
+			return false;
+
+		delta = from->truesize -
+			SKB_TRUESIZE(skb_end_pointer(from) - from->head);
 	}
-	return false;
+
+	memcpy(skb_shinfo(to)->frags + skb_shinfo(to)->nr_frags,
+	       skb_shinfo(from)->frags,
+	       skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));
+	skb_shinfo(to)->nr_frags += skb_shinfo(from)->nr_frags;
+
+	if (!skb_cloned(from))
+		skb_shinfo(from)->nr_frags = 0;
+
+	for (i = 0; i < skb_shinfo(from)->nr_frags; i++)
+		skb_frag_ref(from, i);
+
+	to->truesize += delta;
+	atomic_add(delta, &sk->sk_rmem_alloc);
+	sk_mem_charge(sk, delta);
+	to->len += len;
+	to->data_len += len;
+
+merge:
+	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPRCVCOALESCE);
+	TCP_SKB_CB(to)->end_seq = TCP_SKB_CB(from)->end_seq;
+	TCP_SKB_CB(to)->ack_seq = TCP_SKB_CB(from)->ack_seq;
+	return true;
 }
 
 static void kfree_skb_partial(struct sk_buff *skb, bool head_stolen)

^ permalink raw reply related

* Re: [PATCH 1/2] net: Stop decapitating clones that have a head_frag
From: Eric Dumazet @ 2012-05-03  3:56 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, davem, Eric Dumazet, Jeff Kirsher
In-Reply-To: <20120503033856.5482.70122.stgit@gitlad.jf.intel.com>

On Wed, 2012-05-02 at 20:38 -0700, Alexander Duyck wrote:
> This change is meant ot prevent stealing the skb->head to use as a page in
> the event that the skb->head was cloned.  This allows the other clones to
> track each other via shinfo->dataref.
> 
> Without this we break down to two methods for tracking the reference count,
> one being dataref, the other being the page count.  As a result it becomes
> difficult to track how many references there are to skb->head.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
> 
>  net/core/skbuff.c    |    2 +-
>  net/ipv4/tcp_input.c |    9 ++-------
>  2 files changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 52ba2b5..8703754 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1699,7 +1699,7 @@ static bool __skb_splice_bits(struct sk_buff *skb, struct pipe_inode_info *pipe,
>  			      struct splice_pipe_desc *spd, struct sock *sk)
>  {
>  	int seg;
> -	bool head_is_linear = !skb->head_frag;
> +	bool head_is_linear = !skb->head_frag || skb_cloned(skb);
>  

Can you chose a better name than head_is_linear maybe ?

>  	/* map the linear part :
>  	 * If skb->head_frag is set, this 'linear' part is backed
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 2f696ef..c6f78e2 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4589,7 +4589,7 @@ copyfrags:
>  		to->data_len += len;
>  		goto merge;
>  	}
> -	if (from->head_frag) {
> +	if (from->head_frag && !skb_cloned(from)) {
>  		struct page *page;
>  		unsigned int offset;
>  
> @@ -4599,12 +4599,7 @@ copyfrags:
>  		offset = from->data - (unsigned char *)page_address(page);
>  		skb_fill_page_desc(to, skb_shinfo(to)->nr_frags,
>  				   page, offset, skb_headlen(from));
> -
> -		if (skb_cloned(from))
> -			get_page(page);
> -		else
> -			*fragstolen = true;
> -
> +		*fragstolen = true;
>  		delta = len; /* we dont know real truesize... */
>  		goto copyfrags;
>  	}

Thanks

^ permalink raw reply

* Re: [PATCH 2/2] tcp: cleanup tcp_try_coalesce
From: Eric Dumazet @ 2012-05-03  4:06 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, davem, Eric Dumazet, Jeff Kirsher
In-Reply-To: <20120503033901.5482.27183.stgit@gitlad.jf.intel.com>

On Wed, 2012-05-02 at 20:39 -0700, Alexander Duyck wrote:
> This change is mostly meant to help improve the readability of
> tcp_try_coalesce.  I had a few issues since there were several points when
> the code would test for a conditional, fail, then succeed on another
> conditional take some action, and then follow a goto back into the previous
> conditional.  I just tore all of that apart and made the whole thing one
> linear flow with a single goto.
> 
> Also there were multiple ways of computing the delta, the one for head_frag
> made the least amount of sense to me since we were only dropping the
> sk_buff so I have updated the logic for the stolen head case so that delta
> is only truesize - sizeof(skb_buff), and for the case where we are dropping
> the head as well it is truesize - SKB_TRUESIZE(skb_end_pointer - head).
> This way we can also account for the head_frag with headlen == 0.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
> 

Sorry I prefer you dont touch this code like this.

The truesize bits must stay as is, since it'll track drivers that lies
about truesize.

>  net/ipv4/tcp_input.c |   80 +++++++++++++++++++++++++++-----------------------
>  1 files changed, 43 insertions(+), 37 deletions(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index c6f78e2..23bc3ff 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4548,62 +4548,68 @@ static bool tcp_try_coalesce(struct sock *sk,
>  	int i, delta, len = from->len;
>  
>  	*fragstolen = false;
> +
>  	if (tcp_hdr(from)->fin || skb_cloned(to))
>  		return false;
> +
>  	if (len <= skb_tailroom(to)) {
>  		BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len));
> -merge:
> -		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPRCVCOALESCE);
> -		TCP_SKB_CB(to)->end_seq = TCP_SKB_CB(from)->end_seq;
> -		TCP_SKB_CB(to)->ack_seq = TCP_SKB_CB(from)->ack_seq;
> -		return true;
> +		goto merge;
>  	}
>  
>  	if (skb_has_frag_list(to) || skb_has_frag_list(from))
>  		return false;
>  
> -	if (skb_headlen(from) == 0 &&
> -	    (skb_shinfo(to)->nr_frags +
> -	     skb_shinfo(from)->nr_frags <= MAX_SKB_FRAGS)) {
> -		WARN_ON_ONCE(from->head_frag);
> -		delta = from->truesize - ksize(from->head) -
> -			SKB_DATA_ALIGN(sizeof(struct sk_buff));


This delta was done on purpose. We want the ksize()

> -
> -		WARN_ON_ONCE(delta < len);
> -copyfrags:
> -		memcpy(skb_shinfo(to)->frags + skb_shinfo(to)->nr_frags,
> -		       skb_shinfo(from)->frags,
> -		       skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));
> -		skb_shinfo(to)->nr_frags += skb_shinfo(from)->nr_frags;
> -
> -		if (skb_cloned(from))
> -			for (i = 0; i < skb_shinfo(from)->nr_frags; i++)
> -				skb_frag_ref(from, i);
> -		else
> -			skb_shinfo(from)->nr_frags = 0;
> -
> -		to->truesize += delta;
> -		atomic_add(delta, &sk->sk_rmem_alloc);
> -		sk_mem_charge(sk, delta);
> -		to->len += len;
> -		to->data_len += len;
> -		goto merge;
> -	}
> -	if (from->head_frag && !skb_cloned(from)) {
> +	if (skb_headlen(from) != 0) {
>  		struct page *page;
>  		unsigned int offset;
>  
> -		if (skb_shinfo(to)->nr_frags + skb_shinfo(from)->nr_frags >= MAX_SKB_FRAGS)
> +		if (skb_shinfo(to)->nr_frags +
> +		    skb_shinfo(from)->nr_frags >= MAX_SKB_FRAGS)
> +			return false;
> +
> +		if (!from->head_frag || skb_cloned(from))
>  			return false;
> +
> +		delta = from->truesize - sizeof(struct sk_buff);



> +
>  		page = virt_to_head_page(from->head);
>  		offset = from->data - (unsigned char *)page_address(page);
> +
>  		skb_fill_page_desc(to, skb_shinfo(to)->nr_frags,
>  				   page, offset, skb_headlen(from));
>  		*fragstolen = true;
> -		delta = len; /* we dont know real truesize... */
> -		goto copyfrags;
> +	} else {
> +		if (skb_shinfo(to)->nr_frags +
> +		    skb_shinfo(from)->nr_frags > MAX_SKB_FRAGS)
> +			return false;
> +
> +		delta = from->truesize -
> +			SKB_TRUESIZE(skb_end_pointer(from) - from->head);

No... SKB_TRUESIZE() doesnt account of power-of-two roundings in
kmalloc()

>  	}
> -	return false;
> +
> +	memcpy(skb_shinfo(to)->frags + skb_shinfo(to)->nr_frags,
> +	       skb_shinfo(from)->frags,
> +	       skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));
> +	skb_shinfo(to)->nr_frags += skb_shinfo(from)->nr_frags;
> +
> +	if (!skb_cloned(from))
> +		skb_shinfo(from)->nr_frags = 0;
> +

You break the code here... we had an else.

> +	for (i = 0; i < skb_shinfo(from)->nr_frags; i++)
> +		skb_frag_ref(from, i);
> +
> +	to->truesize += delta;
> +	atomic_add(delta, &sk->sk_rmem_alloc);
> +	sk_mem_charge(sk, delta);
> +	to->len += len;
> +	to->data_len += len;
> +
> +merge:
> +	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPRCVCOALESCE);
> +	TCP_SKB_CB(to)->end_seq = TCP_SKB_CB(from)->end_seq;
> +	TCP_SKB_CB(to)->ack_seq = TCP_SKB_CB(from)->ack_seq;
> +	return true;
>  }
>  
>  static void kfree_skb_partial(struct sk_buff *skb, bool head_stolen)
> 

Really this patch is too hard to review.

Thanks

^ permalink raw reply

* [v2 PATCH] net: Stop decapitating clones that have a head_frag
From: Alexander Duyck @ 2012-05-03  4:18 UTC (permalink / raw)
  To: netdev; +Cc: davem, Alexander Duyck, Eric Dumazet, Jeff Kirsher

This change is meant ot prevent stealing the skb->head to use as a page in
the event that the skb->head was cloned.  This allows the other clones to
track each other via shinfo->dataref.

Without this we break down to two methods for tracking the reference count,
one being dataref, the other being the page count.  As a result it becomes
difficult to track how many references there are to skb->head.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 net/core/skbuff.c    |    9 +++++----
 net/ipv4/tcp_input.c |    9 ++-------
 2 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 52ba2b5..9e8caa0 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1699,17 +1699,18 @@ static bool __skb_splice_bits(struct sk_buff *skb, struct pipe_inode_info *pipe,
 			      struct splice_pipe_desc *spd, struct sock *sk)
 {
 	int seg;
-	bool head_is_linear = !skb->head_frag;
+	bool head_is_locked = !skb->head_frag || skb_cloned(skb);
 
 	/* map the linear part :
-	 * If skb->head_frag is set, this 'linear' part is backed
-	 * by a fragment, and we can avoid a copy.
+	 * If skb->head_frag is set, this 'linear' part is backed by a
+	 * fragment, and if the head is not shared with any clones then
+	 * we can avoid a copy since we own the head portion of this page.
 	 */
 	if (__splice_segment(virt_to_page(skb->data),
 			     (unsigned long) skb->data & (PAGE_SIZE - 1),
 			     skb_headlen(skb),
 			     offset, len, skb, spd,
-			     head_is_linear,
+			     head_is_locked,
 			     sk, pipe))
 		return true;
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2f696ef..c6f78e2 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4589,7 +4589,7 @@ copyfrags:
 		to->data_len += len;
 		goto merge;
 	}
-	if (from->head_frag) {
+	if (from->head_frag && !skb_cloned(from)) {
 		struct page *page;
 		unsigned int offset;
 
@@ -4599,12 +4599,7 @@ copyfrags:
 		offset = from->data - (unsigned char *)page_address(page);
 		skb_fill_page_desc(to, skb_shinfo(to)->nr_frags,
 				   page, offset, skb_headlen(from));
-
-		if (skb_cloned(from))
-			get_page(page);
-		else
-			*fragstolen = true;
-
+		*fragstolen = true;
 		delta = len; /* we dont know real truesize... */
 		goto copyfrags;
 	}

^ permalink raw reply related

* Re: [v2 PATCH] net: Stop decapitating clones that have a head_frag
From: Eric Dumazet @ 2012-05-03  4:33 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, davem, Eric Dumazet, Jeff Kirsher
In-Reply-To: <20120503041842.8315.16138.stgit@gitlad.jf.intel.com>

On Wed, 2012-05-02 at 21:18 -0700, Alexander Duyck wrote:
> This change is meant ot prevent stealing the skb->head to use as a page in
> the event that the skb->head was cloned.  This allows the other clones to
> track each other via shinfo->dataref.
> 
> Without this we break down to two methods for tracking the reference count,
> one being dataref, the other being the page count.  As a result it becomes
> difficult to track how many references there are to skb->head.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
> 
>  net/core/skbuff.c    |    9 +++++----
>  net/ipv4/tcp_input.c |    9 ++-------
>  2 files changed, 7 insertions(+), 11 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 52ba2b5..9e8caa0 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1699,17 +1699,18 @@ static bool __skb_splice_bits(struct sk_buff *skb, struct pipe_inode_info *pipe,
>  			      struct splice_pipe_desc *spd, struct sock *sk)
>  {
>  	int seg;
> -	bool head_is_linear = !skb->head_frag;
> +	bool head_is_locked = !skb->head_frag || skb_cloned(skb);
>  

Thanks

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH net v2] cdc_ether: Ignore bogus union descriptor for RNDIS devices
From: Markus Kolb @ 2012-05-03  4:57 UTC (permalink / raw)
  To: David Miller, bjorn-yOkvZcmFvRU
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	shaola-r/HQZD9NxM9g9hUCZPvPmw, jrnieder-Re5JQEeQqe8AvxtiuMwx3w,
	oliver-GvhC2dPhHPQdnm+yROfE0A, 655387-61a8vm9lEZVf4u+23C9RwQ,
	stable-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20120502.211411.560705499667456177.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>



David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> schrieb:

>From: Bjørn Mork <bjorn-yOkvZcmFvRU@public.gmane.org>
>Date: Thu, 26 Apr 2012 14:35:10 +0200
>
>> The same comments as for v1 regarding testing applies.  This is build
>> tested only.  Should go through some functional testing before being
>> applied.
>
>Well?  Is anyone gonna test this?

I'll build it during next rainy day and will report its success after some usage ;-)

Markus
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 2/2] tcp: cleanup tcp_try_coalesce
From: Alexander Duyck @ 2012-05-03  4:58 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Alexander Duyck, netdev, davem, Eric Dumazet, Jeff Kirsher
In-Reply-To: <1336017985.12425.9.camel@edumazet-glaptop>

On 5/2/2012 9:06 PM, Eric Dumazet wrote:
> On Wed, 2012-05-02 at 20:39 -0700, Alexander Duyck wrote:
>> This change is mostly meant to help improve the readability of
>> tcp_try_coalesce.  I had a few issues since there were several points when
>> the code would test for a conditional, fail, then succeed on another
>> conditional take some action, and then follow a goto back into the previous
>> conditional.  I just tore all of that apart and made the whole thing one
>> linear flow with a single goto.
>>
>> Also there were multiple ways of computing the delta, the one for head_frag
>> made the least amount of sense to me since we were only dropping the
>> sk_buff so I have updated the logic for the stolen head case so that delta
>> is only truesize - sizeof(skb_buff), and for the case where we are dropping
>> the head as well it is truesize - SKB_TRUESIZE(skb_end_pointer - head).
>> This way we can also account for the head_frag with headlen == 0.
>>
>> Signed-off-by: Alexander Duyck<alexander.h.duyck@intel.com>
>> Cc: Eric Dumazet<edumazet@google.com>
>> Cc: Jeff Kirsher<jeffrey.t.kirsher@intel.com>
>> ---
>>
> Sorry I prefer you dont touch this code like this.
>
> The truesize bits must stay as is, since it'll track drivers that lies
> about truesize.

I can understand that.  But from what I can tell the only way they can 
lie about truesize is to actually change the value itself.  The code I 
modified was just tightening things up so we didn't do things like use 
the length to track the truesize like we were in the original code.  
 From what I can tell the new code should have been more accurate.

>>   net/ipv4/tcp_input.c |   80 +++++++++++++++++++++++++++-----------------------
>>   1 files changed, 43 insertions(+), 37 deletions(-)
>>
>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>> index c6f78e2..23bc3ff 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -4548,62 +4548,68 @@ static bool tcp_try_coalesce(struct sock *sk,
>>   	int i, delta, len = from->len;
>>
>>   	*fragstolen = false;
>> +
>>   	if (tcp_hdr(from)->fin || skb_cloned(to))
>>   		return false;
>> +
>>   	if (len<= skb_tailroom(to)) {
>>   		BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len));
>> -merge:
>> -		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPRCVCOALESCE);
>> -		TCP_SKB_CB(to)->end_seq = TCP_SKB_CB(from)->end_seq;
>> -		TCP_SKB_CB(to)->ack_seq = TCP_SKB_CB(from)->ack_seq;
>> -		return true;
>> +		goto merge;
>>   	}
>>
>>   	if (skb_has_frag_list(to) || skb_has_frag_list(from))
>>   		return false;
>>
>> -	if (skb_headlen(from) == 0&&
>> -	    (skb_shinfo(to)->nr_frags +
>> -	     skb_shinfo(from)->nr_frags<= MAX_SKB_FRAGS)) {
>> -		WARN_ON_ONCE(from->head_frag);
>> -		delta = from->truesize - ksize(from->head) -
>> -			SKB_DATA_ALIGN(sizeof(struct sk_buff));
>
> This delta was done on purpose. We want the ksize()
The question I have is how can you get into a case where the ksize is 
different from the end offset plus the aligned size of skb_shared_info?  
>From what I can tell it looks like the only place we can lie is if we 
use build_skb with the frag_size option, and in that case we are using a 
page, not kmalloc memory.  Otherwise in all other cases __alloc_skb or 
build_skb is using ksize(skb->head) - SKB_DATA_ALIGN(struct 
skb_shared_info) to set the end pointer, so reversing that should give 
us the same value as ksize(skb->head).

>> -
>> -		WARN_ON_ONCE(delta<  len);
>> -copyfrags:
>> -		memcpy(skb_shinfo(to)->frags + skb_shinfo(to)->nr_frags,
>> -		       skb_shinfo(from)->frags,
>> -		       skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));
>> -		skb_shinfo(to)->nr_frags += skb_shinfo(from)->nr_frags;
>> -
>> -		if (skb_cloned(from))
>> -			for (i = 0; i<  skb_shinfo(from)->nr_frags; i++)
>> -				skb_frag_ref(from, i);
>> -		else
>> -			skb_shinfo(from)->nr_frags = 0;
>> -
>> -		to->truesize += delta;
>> -		atomic_add(delta,&sk->sk_rmem_alloc);
>> -		sk_mem_charge(sk, delta);
>> -		to->len += len;
>> -		to->data_len += len;
>> -		goto merge;
>> -	}
>> -	if (from->head_frag&&  !skb_cloned(from)) {
>> +	if (skb_headlen(from) != 0) {
>>   		struct page *page;
>>   		unsigned int offset;
>>
>> -		if (skb_shinfo(to)->nr_frags + skb_shinfo(from)->nr_frags>= MAX_SKB_FRAGS)
>> +		if (skb_shinfo(to)->nr_frags +
>> +		    skb_shinfo(from)->nr_frags>= MAX_SKB_FRAGS)
>> +			return false;
>> +
>> +		if (!from->head_frag || skb_cloned(from))
>>   			return false;
>> +
>> +		delta = from->truesize - sizeof(struct sk_buff);
>>
It looks like you were thinking of something here since the quote lines 
were broken.  This was the piece I saw in the original code that caught 
my eye as probably being wrong.  We are letting packets using head_frag 
just report length as the truesize.  This is why I favored using the end 
offset.  In the case of us using the head_frag we should only be 
deducting SKB_DATA_ALIGN(sizeof(struct sk_buff)) (I just relized the 
line above is incorrect), and in the case of us dropping the head_frag 
as well we should be able to also deduct the size of the shared_info and 
everything up to the offset of end.

>> +
>>   		page = virt_to_head_page(from->head);
>>   		offset = from->data - (unsigned char *)page_address(page);
>> +
>>   		skb_fill_page_desc(to, skb_shinfo(to)->nr_frags,
>>   				   page, offset, skb_headlen(from));
>>   		*fragstolen = true;
>> -		delta = len; /* we dont know real truesize... */
>> -		goto copyfrags;
>> +	} else {
>> +		if (skb_shinfo(to)->nr_frags +
>> +		    skb_shinfo(from)->nr_frags>  MAX_SKB_FRAGS)
>> +			return false;
>> +
>> +		delta = from->truesize -
>> +			SKB_TRUESIZE(skb_end_pointer(from) - from->head);
> No... SKB_TRUESIZE() doesnt account of power-of-two roundings in
> kmalloc()
You used ksize in alloc_skb or build_skb to set the end pointer.  All I 
am doing by using the end pointer is getting the value you had already 
extracted.  The end pointer should be unmovable once it is set so I 
wouldn't think it would be an issue to use it to compute the truesize 
for the section including the sk_buff and skb->head.


>>   	}
>> -	return false;
>> +
>> +	memcpy(skb_shinfo(to)->frags + skb_shinfo(to)->nr_frags,
>> +	       skb_shinfo(from)->frags,
>> +	       skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));
>> +	skb_shinfo(to)->nr_frags += skb_shinfo(from)->nr_frags;
>> +
>> +	if (!skb_cloned(from))
>> +		skb_shinfo(from)->nr_frags = 0;
>> +
> You break the code here... we had an else.
Yes and no.  I just realized that I didn't need the else because the for 
loop below does nothing if nr_frags is 0.  I suspect gcc is probably 
smart enough to just skip the loop if the skb is cloned.  I should 
probably have added a one line comment explaining that above the loop.

>> +	for (i = 0; i<  skb_shinfo(from)->nr_frags; i++)
>> +		skb_frag_ref(from, i);
>> +
>> +	to->truesize += delta;
>> +	atomic_add(delta,&sk->sk_rmem_alloc);
>> +	sk_mem_charge(sk, delta);
>> +	to->len += len;
>> +	to->data_len += len;
>> +
>> +merge:
>> +	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPRCVCOALESCE);
>> +	TCP_SKB_CB(to)->end_seq = TCP_SKB_CB(from)->end_seq;
>> +	TCP_SKB_CB(to)->ack_seq = TCP_SKB_CB(from)->ack_seq;
>> +	return true;
>>   }
>>
>>   static void kfree_skb_partial(struct sk_buff *skb, bool head_stolen)
>>
> Really this patch is too hard to review.
Yeah, it is a bit difficult.  After Dave pulled your patches it ended up 
pushing most of the changes into this one.  I'll see if I can break this 
back up into smaller bits.  I just needed to flush the patches I had so 
I could get some feedback and stop talking in circles about the other patch.

Thanks,

Alex

^ permalink raw reply

* Re: [PATCH net v2] cdc_ether: Ignore bogus union descriptor for RNDIS devices
From: David Miller @ 2012-05-03  5:11 UTC (permalink / raw)
  To: linux-201011-TzK+PxyQ8t4hFhg+JK9F0w
  Cc: bjorn-yOkvZcmFvRU, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, shaola-r/HQZD9NxM9g9hUCZPvPmw,
	jrnieder-Re5JQEeQqe8AvxtiuMwx3w, oliver-GvhC2dPhHPQdnm+yROfE0A,
	655387-61a8vm9lEZVf4u+23C9RwQ, stable-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <01138af4-69c3-4adb-b7f6-1e5e6f471df2-2ueSQiBKiTY7tOexoI0I+QC/G2K4zDHf@public.gmane.org>

From: Markus Kolb <linux-201011-TzK+PxyQ8t4hFhg+JK9F0w@public.gmane.org>
Date: Thu, 03 May 2012 06:57:39 +0200

> I'll build it during next rainy day and will report its success
> after some usage ;-)

Thank you.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 2/2] tcp: cleanup tcp_try_coalesce
From: Eric Dumazet @ 2012-05-03  5:19 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Alexander Duyck, netdev, davem, Eric Dumazet, Jeff Kirsher
In-Reply-To: <4FA21087.1080801@gmail.com>

On Wed, 2012-05-02 at 21:58 -0700, Alexander Duyck wrote:
> The question I have is how can you get into a case where the ksize is 
> different from the end offset plus the aligned size of skb_shared_info?  
> From what I can tell it looks like the only place we can lie is if we 
> use build_skb with the frag_size option, and in that case we are using a 
> page, not kmalloc memory.  Otherwise in all other cases __alloc_skb or 
> build_skb is using ksize(skb->head) - SKB_DATA_ALIGN(struct 
> skb_shared_info) to set the end pointer, so reversing that should give 
> us the same value as ksize(skb->head).


Right after skb is allocated (build_skb() or other skb_alloc...
variants), truesize is correct by construction.

Then drivers add fragments and can make truesize smaller than reality.

And Intel drivers are known to abuse truesize.

My last patch against iwlwifi is still waiting to make its way into
official tree.

http://www.spinics.net/lists/netdev/msg192629.html

^ permalink raw reply

* Re: [PATCH 2/2] tcp: cleanup tcp_try_coalesce
From: David Miller @ 2012-05-03  5:25 UTC (permalink / raw)
  To: eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w
  Cc: alexander.duyck-Re5JQEeQqe8AvxtiuMwx3w,
	alexander.h.duyck-ral2JQCrhuEAvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA, edumazet-hpIqsD4AKlfQT0dZR+AlfA,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	linville-2XuSBdqkA4R54TAoqtyWWQ,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1336022373.12425.24.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date: Thu, 03 May 2012 07:19:33 +0200

> My last patch against iwlwifi is still waiting to make its way into
> official tree.
> 
> http://www.spinics.net/lists/netdev/msg192629.html

John, please rectify this situation.

The Intel Wireless folks said they would test it, but that was more
than a month ago.

It's not acceptable to let bug fixes rot for that long, I don't care
what their special internal testing procedure is.

If they give you further pushback, please just ignore them and apply
Eric's fix directly.

Thank you.

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox