Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: BQL crap and wireless
From: John W. Linville @ 2011-08-31 19:48 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Tom Herbert, linux-wireless, Andrew McGregor, Matt Smith,
	Kevin Hayes, Dave Taht, Derek Smithies, netdev, bloat
In-Reply-To: <CAB=NE6VuP=AvYiJsYn_Noc1u0Q=jvQPutHRANdSiLP2v48ogfg@mail.gmail.com>

On Fri, Aug 26, 2011 at 04:27:22PM -0700, Luis R. Rodriguez wrote:
> I've just read this thread:
> 
> http://marc.info/?t=131277868500001&r=1&w=2
> 
> Since its not linux-wireless I'll chime in here. It seems that you are
> trying to write an algorithm that will work for all networking and
> 802.11 devices. For networking is seems tough given driver
> architecture and structure and the hope that all drivers will report
> things in a fairly similar way. For 802.11 it was pointed out how we
> have varying bandwidths and depending on the technology used for
> connection (AP, 802.11s, IBSS) a different number of possible peers
> need to be considered. 802.11 faced similar algorithmic complexities
> with rate control and the way Andrew and Derek resolved this was to
> not assume you could solve this problem and simply test out the water
> by trial and error, that gave birth to the minstrel rate control
> algorithm which Felix later rewrote for mac80211 with 802.11n support
> [1]. Can the BQL algorithm make use of the same trial and error
> mechanism and simply try different values and and use EWMA [2] to pick
> the best size for the queue ?
> 
> [1] http://wireless.kernel.org/en/developers/Documentation/mac80211/RateControl/minstrel
> [2] http://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average
> 
>   Luis

Since Luis has stirred things up, I took the liberty of adding Tom's
BQL series to the debloat-testing tree:

	git://git.infradead.org/debloat-testing.git

I'm not sure how well BQL will interact with my eBDP-inspired hack,
debloat-testing commit d2566a176589775cb3a1de4fade972aa62d551ff
("mac80211: implement eBDP algorithm to fight bufferbloat").  So if
anyone wants to try enabling a wireless driver for BQL, you might
consider reverting that patch or at least testing without it.

<shameless plug>
And, of course, don't forget to come to the Bufferbloat and Networking
track at Linux Plumbers Conference next week in Santa Rosa, CA! :-)
</shameless plug>

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply

* [PATCH 0/3] netdev/of/phy: MDIO bus multiplexer support.
From: David Daney @ 2011-08-31 20:01 UTC (permalink / raw)
  To: linux-mips-6z/3iImG2C8G8FEW9MqTrA, ralf-6z/3iImG2C8G8FEW9MqTrA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	grant.likely-s3s/WqlpOiPyB63q8FvJNQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA

We have several different boards with a multiplexer in the MDIO bus.
There is an MDIO bus controller connected to a switching device with
several child MDIO busses.

Everything is wired up using device tree bindings.

 1/3 - New of_mdio_find_bus() function used to help configuring the
       driver topology.

 2/3 - MDIO bus multiplexer framework.

 3/3 - A driver for a GPIO controlled multiplexer.

I have an additional patch I am working on for an I2C controlled
multiplexer that I will follow up with once this reaches a mergable
state.

David Daney (3):
  netdev/of/phy: New function: of_mdio_find_bus().
  netdev/of/phy: Add MDIO bus multiplexer support.
  netdev/of/phy: Add MDIO bus multiplexer driven by GPIO lines.

 .../devicetree/bindings/net/mdio-mux-gpio.txt      |  127 ++++++++++++++
 Documentation/devicetree/bindings/net/mdio-mux.txt |  132 ++++++++++++++
 drivers/net/phy/Kconfig                            |   17 ++
 drivers/net/phy/Makefile                           |    2 +
 drivers/net/phy/mdio-mux-gpio.c                    |  143 +++++++++++++++
 drivers/net/phy/mdio-mux.c                         |  182 ++++++++++++++++++++
 drivers/net/phy/mdio_bus.c                         |    3 +-
 drivers/of/of_mdio.c                               |   26 +++
 include/linux/mdio-mux.h                           |   18 ++
 include/linux/of_mdio.h                            |    2 +
 include/linux/phy.h                                |    1 +
 11 files changed, 652 insertions(+), 1 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/mdio-mux-gpio.txt
 create mode 100644 Documentation/devicetree/bindings/net/mdio-mux.txt
 create mode 100644 drivers/net/phy/mdio-mux-gpio.c
 create mode 100644 drivers/net/phy/mdio-mux.c
 create mode 100644 include/linux/mdio-mux.h

-- 
1.7.2.3

^ permalink raw reply

* [PATCH 1/3] netdev/of/phy: New function: of_mdio_find_bus().
From: David Daney @ 2011-08-31 20:01 UTC (permalink / raw)
  To: linux-mips-6z/3iImG2C8G8FEW9MqTrA, ralf-6z/3iImG2C8G8FEW9MqTrA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	grant.likely-s3s/WqlpOiPyB63q8FvJNQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: David S. Miller
In-Reply-To: <1314820906-14004-1-git-send-email-david.daney-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>

Add of_mdio_find_bus() which allows an mii_bus to be located given its
associated the device tree node.

This is needed by the follow-on patch to add a driver for MDIO bus
multiplexers.

The of_mdiobus_register() function is modified so that the device tree
node is recorded in the mii_bus.  Then we can find it again by
iterating over all mdio_bus_class devices.

Signed-off-by: David Daney <david.daney-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
Cc: Grant Likely <grant.likely-s3s/WqlpOiPyB63q8FvJNQ@public.gmane.org>
Cc: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
---
 drivers/net/phy/mdio_bus.c |    3 ++-
 drivers/of/of_mdio.c       |   26 ++++++++++++++++++++++++++
 include/linux/of_mdio.h    |    2 ++
 include/linux/phy.h        |    1 +
 4 files changed, 31 insertions(+), 1 deletions(-)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 6c58da2..227a060 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -70,10 +70,11 @@ static void mdiobus_release(struct device *d)
 	kfree(bus);
 }
 
-static struct class mdio_bus_class = {
+struct class mdio_bus_class = {
 	.name		= "mdio_bus",
 	.dev_release	= mdiobus_release,
 };
+EXPORT_SYMBOL(mdio_bus_class);
 
 /**
  * mdiobus_register - bring up all the PHYs on a given bus and attach them to bus
diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
index d35e300..7c28e8c 100644
--- a/drivers/of/of_mdio.c
+++ b/drivers/of/of_mdio.c
@@ -45,6 +45,8 @@ int of_mdiobus_register(struct mii_bus *mdio, struct device_node *np)
 		for (i=0; i<PHY_MAX_ADDR; i++)
 			mdio->irq[i] = PHY_POLL;
 
+	mdio->dev.of_node = np;
+
 	/* Register the MDIO bus */
 	rc = mdiobus_register(mdio);
 	if (rc)
@@ -189,3 +191,27 @@ struct phy_device *of_phy_connect_fixed_link(struct net_device *dev,
 	return IS_ERR(phy) ? NULL : phy;
 }
 EXPORT_SYMBOL(of_phy_connect_fixed_link);
+
+/**
+ * of_mdio_find_bus - Given an mii_bus node, find the mii_bus.
+ * @mdio_np: Pointer to the mii_bus.
+ *
+ * Returns a pointer to the mii_bus, or NULL if none found.
+ *
+ * Because the association of a device_node and mii_bus is made via
+ * of_mdiobus_register(), the mii_bus cannot be found before it is
+ * registered with of_mdiobus_register().
+ *
+ */
+struct mii_bus *of_mdio_find_bus(struct device_node *mdio_np)
+{
+	struct device *d;
+
+	if (!mdio_np)
+		return NULL;
+
+	d = class_find_device(&mdio_bus_class, NULL,  mdio_np, of_phy_match);
+
+	return d ? to_mii_bus(d) : NULL;
+}
+EXPORT_SYMBOL(of_mdio_find_bus);
diff --git a/include/linux/of_mdio.h b/include/linux/of_mdio.h
index 53b94e0..912c27a 100644
--- a/include/linux/of_mdio.h
+++ b/include/linux/of_mdio.h
@@ -22,4 +22,6 @@ extern struct phy_device *of_phy_connect_fixed_link(struct net_device *dev,
 					 void (*hndlr)(struct net_device *),
 					 phy_interface_t iface);
 
+extern struct mii_bus *of_mdio_find_bus(struct device_node *mdio_np);
+
 #endif /* __LINUX_OF_MDIO_H */
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 54fc413..e4c3844 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -528,4 +528,5 @@ int __init mdio_bus_init(void);
 void mdio_bus_exit(void);
 
 extern struct bus_type mdio_bus_type;
+extern struct class mdio_bus_class;
 #endif /* __PHY_H */
-- 
1.7.2.3

^ permalink raw reply related

* [PATCH 2/3] netdev/of/phy: Add MDIO bus multiplexer support.
From: David Daney @ 2011-08-31 20:01 UTC (permalink / raw)
  To: linux-mips-6z/3iImG2C8G8FEW9MqTrA, ralf-6z/3iImG2C8G8FEW9MqTrA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	grant.likely-s3s/WqlpOiPyB63q8FvJNQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: David S. Miller
In-Reply-To: <1314820906-14004-1-git-send-email-david.daney-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>

This patch adds a somewhat generic framework for MDIO bus
multiplexers.  It is modeled on the I2C multiplexer.

The multiplexer is needed if there are multiple PHYs with the same
address connected to the same MDIO bus adepter, or if there is
insufficient electrical drive capability for all the connected PHY
devices.

Conceptually it could look something like this:

                   ------------------
                   | Control Signal |
                   --------+---------
                           |
 ---------------   --------+------
 | MDIO MASTER |---| Multiplexer |
 ---------------   --+-------+----
                     |       |
                     C       C
                     h       h
                     i       i
                     l       l
                     d       d
                     |       |
     ---------       A       B   ---------
     |       |       |       |   |       |
     | PHY@1 +-------+       +---+ PHY@1 |
     |       |       |       |   |       |
     ---------       |       |   ---------
     ---------       |       |   ---------
     |       |       |       |   |       |
     | PHY@2 +-------+       +---+ PHY@2 |
     |       |                   |       |
     ---------                   ---------

This framework configures the bus topology from device tree data.  The
mechanics of switching the multiplexer is left to device specific
drivers.

The follow-on patch contains a multiplexer driven by GPIO lines.

Signed-off-by: David Daney <david.daney-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
Cc: Grant Likely <grant.likely-s3s/WqlpOiPyB63q8FvJNQ@public.gmane.org>
Cc: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
---
 Documentation/devicetree/bindings/net/mdio-mux.txt |  132 ++++++++++++++
 drivers/net/phy/Kconfig                            |    8 +
 drivers/net/phy/Makefile                           |    1 +
 drivers/net/phy/mdio-mux.c                         |  182 ++++++++++++++++++++
 include/linux/mdio-mux.h                           |   18 ++
 5 files changed, 341 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/mdio-mux.txt
 create mode 100644 drivers/net/phy/mdio-mux.c
 create mode 100644 include/linux/mdio-mux.h

diff --git a/Documentation/devicetree/bindings/net/mdio-mux.txt b/Documentation/devicetree/bindings/net/mdio-mux.txt
new file mode 100644
index 0000000..a908312
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/mdio-mux.txt
@@ -0,0 +1,132 @@
+Common MDIO bus multiplexer/switch properties.
+
+An MDIO bus multiplexer/switch will have several child busses that are
+numbered uniquely in a device dependent manner.  The nodes for an MDIO
+bus multiplexer/switch will have one child node for each child bus.
+
+Required properties:
+- parent-bus : phandle to the parent MDIO bus.
+
+Optional properties:
+- Other properties specific to the multiplexer/switch hardware.
+
+Required properties for child nodes:
+- #address-cells = <1>;
+- #size-cells = <0>;
+- cell-index : The sub-bus number.
+
+
+Example :
+
+	/* The parent MDIO bus. */
+	smi1: mdio@1180000001900 {
+		compatible = "cavium,octeon-3860-mdio";
+		#address-cells = <1>;
+		#size-cells = <0>;
+		reg = <0x11800 0x00001900 0x0 0x40>;
+	};
+
+	/*
+	   An NXP sn74cbtlv3253 dual 1-of-4 switch controlled by a
+	   pair of GPIO lines.  Child busses 2 and 3 populated with 4
+	   PHYs each.
+	 */
+	mdio-mux {
+		compatible = "cavium,mdio-mux-sn74cbtlv3253", "cavium,mdio-mux";
+		gpios = <&gpio1 3 0>, <&gpio1 4 0>;
+		parent-bus = <&smi1>;
+
+		mdio@2 {
+			cell-index = <2>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+
+			phy11: ethernet-phy@1 {
+				reg = <1>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <10 8>; /* Pin 10, active low */
+			};
+			phy12: ethernet-phy@2 {
+				reg = <2>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <10 8>; /* Pin 10, active low */
+			};
+			phy13: ethernet-phy@3 {
+				reg = <3>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <10 8>; /* Pin 10, active low */
+			};
+			phy14: ethernet-phy@4 {
+				reg = <4>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <10 8>; /* Pin 10, active low */
+			};
+		};
+
+		mdio@3 {
+			cell-index = <3>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+
+			phy21: ethernet-phy@1 {
+				reg = <1>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <12 8>; /* Pin 12, active low */
+			};
+			phy22: ethernet-phy@2 {
+				reg = <2>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <12 8>; /* Pin 12, active low */
+			};
+			phy23: ethernet-phy@3 {
+				reg = <3>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <12 8>; /* Pin 12, active low */
+			};
+			phy24: ethernet-phy@4 {
+				reg = <4>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <12 8>; /* Pin 12, active low */
+			};
+		};
+	};
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index a702443..59848bc 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -119,6 +119,14 @@ config MDIO_GPIO
 	  To compile this driver as a module, choose M here: the module
 	  will be called mdio-gpio.
 
+config MDIO_BUS_MUX
+	tristate "Support for MDIO bus multiplexers"
+	help
+	  This module provides a driver framework for MDIO bus
+	  multiplexers which connect one of several child MDIO busses
+	  to a parent bus.  Switching between child busses is done by
+	  device specific drivers.
+
 config MDIO_OCTEON
 	tristate "Support for MDIO buses on Octeon SOCs"
 	depends on  CPU_CAVIUM_OCTEON
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 2333215..0c081d5 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -23,3 +23,4 @@ obj-$(CONFIG_DP83640_PHY)	+= dp83640.o
 obj-$(CONFIG_STE10XP)		+= ste10Xp.o
 obj-$(CONFIG_MICREL_PHY)	+= micrel.o
 obj-$(CONFIG_MDIO_OCTEON)	+= mdio-octeon.o
+obj-$(CONFIG_MDIO_BUS_MUX)	+= mdio-mux.o
diff --git a/drivers/net/phy/mdio-mux.c b/drivers/net/phy/mdio-mux.c
new file mode 100644
index 0000000..f9c5826
--- /dev/null
+++ b/drivers/net/phy/mdio-mux.c
@@ -0,0 +1,182 @@
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2011 Cavium Networks
+ */
+
+#include <linux/platform_device.h>
+#include <linux/of_mdio.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/gfp.h>
+#include <linux/phy.h>
+#include <linux/mdio-mux.h>
+
+#define DRV_VERSION "1.0"
+#define DRV_DESCRIPTION "MDIO bus multiplexer driver"
+
+struct mdio_mux_parent_bus {
+	struct mii_bus *mii_bus;
+	int current_child;
+	int parent_id;
+	void *switch_data;
+	int (*switch_fn)(int current_child, int desired_child, void *data);
+};
+
+struct mdio_mux_child_bus {
+	struct mii_bus *mii_bus;
+	struct mdio_mux_parent_bus *parent;
+	int bus_number;
+	int phy_irq[PHY_MAX_ADDR];
+};
+
+/*
+ * The parent bus' lock is used to order access to the switch_fn.
+ */
+static int mdio_mux_read(struct mii_bus *bus, int phy_id, int regnum)
+{
+	struct mdio_mux_child_bus *cb = bus->priv;
+	struct mdio_mux_parent_bus *pb = cb->parent;
+	int r;
+
+	mutex_lock(&pb->mii_bus->mdio_lock);
+	r = pb->switch_fn(pb->current_child, cb->bus_number, pb->switch_data);
+	if (r)
+		goto out;
+
+	pb->current_child = cb->bus_number;
+
+	r = pb->mii_bus->read(pb->mii_bus, phy_id, regnum);
+out:
+	mutex_unlock(&pb->mii_bus->mdio_lock);
+
+	return r;
+}
+
+/*
+ * The parent bus' lock is used to order access to the switch_fn.
+ */
+static int mdio_mux_write(struct mii_bus *bus, int phy_id,
+			  int regnum, u16 val)
+{
+	struct mdio_mux_child_bus *cb = bus->priv;
+	struct mdio_mux_parent_bus *pb = cb->parent;
+
+	int r;
+
+	mutex_lock(&pb->mii_bus->mdio_lock);
+	r = pb->switch_fn(pb->current_child, cb->bus_number, pb->switch_data);
+	if (r)
+		goto out;
+
+	pb->current_child = cb->bus_number;
+
+	r = pb->mii_bus->write(pb->mii_bus, phy_id, regnum, val);
+out:
+	mutex_unlock(&pb->mii_bus->mdio_lock);
+
+	return r;
+}
+
+static int parent_count;
+
+int mdio_mux_probe(struct platform_device *pdev,
+		   int (*switch_fn)(int cur, int desired, void *data),
+		   void *data)
+{
+	struct device_node *parent_bus_node;
+	struct device_node *child_bus_node;
+	int r, n, ret_val;
+	struct mii_bus *parent_bus;
+	struct mdio_mux_parent_bus *pb;
+	struct mdio_mux_child_bus *cb;
+
+	if (!pdev->dev.of_node)
+		return -ENODEV;
+
+	parent_bus_node = of_parse_phandle(pdev->dev.of_node, "parent-bus", 0);
+
+	if (!parent_bus_node)
+		return -ENODEV;
+
+	parent_bus = of_mdio_find_bus(parent_bus_node);
+
+	pb = devm_kzalloc(&pdev->dev, sizeof(*pb), GFP_KERNEL);
+	if (pb == NULL) {
+		ret_val = -ENOMEM;
+		goto err_parent_bus;
+	}
+
+	pb->switch_data = data;
+	pb->switch_fn = switch_fn;
+	pb->current_child = -1;
+	pb->parent_id = parent_count++;
+	pb->mii_bus = parent_bus;
+
+	n = 0;
+	for_each_child_of_node(pdev->dev.of_node, child_bus_node) {
+		u32 v;
+
+		r = of_property_read_u32(child_bus_node, "cell-index", &v);
+		if (r == 0) {
+			cb = devm_kzalloc(&pdev->dev, sizeof(*cb), GFP_KERNEL);
+			if (cb == NULL)
+				break;
+			cb->bus_number = v;
+			cb->parent = pb;
+			cb->mii_bus = mdiobus_alloc();
+			cb->mii_bus->priv = cb;
+
+			cb->mii_bus->irq = cb->phy_irq;
+			cb->mii_bus->name = "mdio_mux";
+			snprintf(cb->mii_bus->id, MII_BUS_ID_SIZE, "%x.%x",
+				 pb->parent_id, v);
+			cb->mii_bus->parent = &pdev->dev;
+			cb->mii_bus->read = mdio_mux_read;
+			cb->mii_bus->write = mdio_mux_write;
+			r = of_mdiobus_register(cb->mii_bus, child_bus_node);
+			if (r) {
+				of_node_put(child_bus_node);
+				devm_kfree(&pdev->dev, cb);
+			} else {
+				n++;
+			}
+
+		} else {
+			of_node_put(child_bus_node);
+		}
+	}
+	if (n) {
+		dev_info(&pdev->dev, "Version " DRV_VERSION "\n");
+		return 0;
+	}
+	ret_val = -ENOMEM;
+	devm_kfree(&pdev->dev, pb);
+err_parent_bus:
+	of_node_put(parent_bus_node);
+	return ret_val;
+}
+EXPORT_SYMBOL(mdio_mux_probe);
+
+static int __devexit mdio_mux_remove(struct platform_device *pdev)
+{
+	return 0;
+}
+
+static int __init mdio_mux_mod_init(void)
+{
+	return 0;
+}
+module_init(mdio_mux_mod_init);
+
+static void __exit mdio_mux_mod_exit(void)
+{
+}
+module_exit(mdio_mux_mod_exit);
+
+MODULE_DESCRIPTION(DRV_DESCRIPTION);
+MODULE_VERSION(DRV_VERSION);
+MODULE_AUTHOR("David Daney");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/mdio-mux.h b/include/linux/mdio-mux.h
new file mode 100644
index 0000000..522992a
--- /dev/null
+++ b/include/linux/mdio-mux.h
@@ -0,0 +1,18 @@
+/*
+ * MDIO bus multiplexer framwork.
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2011 Cavium Networks
+ */
+#ifndef __LINUX_MDIO_MUX_H
+#define __LINUX_MDIO_MUX_H
+#include <linux/platform_device.h>
+
+int mdio_mux_probe(struct platform_device *pdev,
+		   int (*switch_fn) (int cur, int desired, void *data),
+		   void *data);
+
+#endif /* __LINUX_MDIO_MUX_H */
-- 
1.7.2.3

^ permalink raw reply related

* [PATCH 3/3] netdev/of/phy: Add MDIO bus multiplexer driven by GPIO lines.
From: David Daney @ 2011-08-31 20:01 UTC (permalink / raw)
  To: linux-mips, ralf, devicetree-discuss, grant.likely, linux-kernel,
	netdev
  Cc: David Daney, David S. Miller
In-Reply-To: <1314820906-14004-1-git-send-email-david.daney@cavium.com>

The GPIO pins select which sub bus is connected to the master.

Initially tested with an sn74cbtlv3253 switch device wired into the
MDIO bus.

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: "David S. Miller" <davem@davemloft.net>
---
 .../devicetree/bindings/net/mdio-mux-gpio.txt      |  127 +++++++++++++++++
 drivers/net/phy/Kconfig                            |    9 ++
 drivers/net/phy/Makefile                           |    1 +
 drivers/net/phy/mdio-mux-gpio.c                    |  143 ++++++++++++++++++++
 4 files changed, 280 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/mdio-mux-gpio.txt
 create mode 100644 drivers/net/phy/mdio-mux-gpio.c

diff --git a/Documentation/devicetree/bindings/net/mdio-mux-gpio.txt b/Documentation/devicetree/bindings/net/mdio-mux-gpio.txt
new file mode 100644
index 0000000..23406a4
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/mdio-mux-gpio.txt
@@ -0,0 +1,127 @@
+Properties for an MDIO bus multiplexer/switch controlled by GPIO pins.
+
+This is a special case of a MDIO bus multiplexer.  One or more GPIO
+lines are used to control which child bus is connected.
+
+Required properties in addition to the generic multiplexer properties:
+
+- compatible : Should define the compatible device type for the
+               multiplexer.  Currently "cavium,mdio-mux-sn74cbtlv3253"
+               is defined, others are possible.
+- gpios : GPIO specifiers for each GPIO line.  One or more must be specified.
+
+
+Example :
+
+	/* The parent MDIO bus. */
+	smi1: mdio@1180000001900 {
+		compatible = "cavium,octeon-3860-mdio";
+		#address-cells = <1>;
+		#size-cells = <0>;
+		reg = <0x11800 0x00001900 0x0 0x40>;
+	};
+
+	/*
+	   An NXP sn74cbtlv3253 dual 1-of-4 switch controlled by a
+	   pair of GPIO lines.  Child busses 2 and 3 populated with 4
+	   PHYs each.
+	 */
+	mdio-mux {
+		compatible = "cavium,mdio-mux-sn74cbtlv3253", "cavium,mdio-mux";
+		gpios = <&gpio1 3 0>, <&gpio1 4 0>;
+		parent-bus = <&smi1>;
+
+		mdio@2 {
+			cell-index = <2>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+
+			phy11: ethernet-phy@1 {
+				reg = <1>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <10 8>; /* Pin 10, active low */
+			};
+			phy12: ethernet-phy@2 {
+				reg = <2>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <10 8>; /* Pin 10, active low */
+			};
+			phy13: ethernet-phy@3 {
+				reg = <3>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <10 8>; /* Pin 10, active low */
+			};
+			phy14: ethernet-phy@4 {
+				reg = <4>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <10 8>; /* Pin 10, active low */
+			};
+		};
+
+		mdio@3 {
+			cell-index = <3>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+
+			phy21: ethernet-phy@1 {
+				reg = <1>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <12 8>; /* Pin 12, active low */
+			};
+			phy22: ethernet-phy@2 {
+				reg = <2>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <12 8>; /* Pin 12, active low */
+			};
+			phy23: ethernet-phy@3 {
+				reg = <3>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <12 8>; /* Pin 12, active low */
+			};
+			phy24: ethernet-phy@4 {
+				reg = <4>;
+				compatible = "marvell,88e1149r";
+				marvell,reg-init = <3 0x10 0 0x5777>,
+					<3 0x11 0 0x00aa>,
+					<3 0x12 0 0x4105>,
+					<3 0x13 0 0x0a60>;
+				interrupt-parent = <&gpio>;
+				interrupts = <12 8>; /* Pin 12, active low */
+			};
+		};
+	};
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 59848bc..59b3b17 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -127,6 +127,15 @@ config MDIO_BUS_MUX
 	  to a parent bus.  Switching between child busses is done by
 	  device specific drivers.
 
+config MDIO_BUS_MUX_GPIO
+	tristate "Support for GPIO controlled MDIO bus multiplexers"
+	depends on MDIO_BUS_MUX && GENERIC_GPIO
+	help
+	  This module provides a driver for MDIO bus multiplexers that
+	  are controlled via GPIO lines.  The multiplexer connects one of
+	  several child MDIO busses to a parent bus.  Child bus
+	  selection is under the control of GPIO lines.
+
 config MDIO_OCTEON
 	tristate "Support for MDIO buses on Octeon SOCs"
 	depends on  CPU_CAVIUM_OCTEON
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 0c081d5..d1a1927 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -24,3 +24,4 @@ obj-$(CONFIG_STE10XP)		+= ste10Xp.o
 obj-$(CONFIG_MICREL_PHY)	+= micrel.o
 obj-$(CONFIG_MDIO_OCTEON)	+= mdio-octeon.o
 obj-$(CONFIG_MDIO_BUS_MUX)	+= mdio-mux.o
+obj-$(CONFIG_MDIO_BUS_MUX_GPIO)	+= mdio-mux-gpio.o
diff --git a/drivers/net/phy/mdio-mux-gpio.c b/drivers/net/phy/mdio-mux-gpio.c
new file mode 100644
index 0000000..3cdad35
--- /dev/null
+++ b/drivers/net/phy/mdio-mux-gpio.c
@@ -0,0 +1,143 @@
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2011 Cavium Networks
+ */
+
+#include <linux/platform_device.h>
+#include <linux/of_mdio.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/gfp.h>
+#include <linux/phy.h>
+#include <linux/mdio-mux.h>
+#include <linux/of_gpio.h>
+
+#define DRV_VERSION "1.0"
+#define DRV_DESCRIPTION "GPIO controlled MDIO bus multiplexer driver"
+
+#define MDIO_MUX_GPIO_MAX_BITS 8
+
+struct mdio_mux_gpio_state {
+	int gpio[MDIO_MUX_GPIO_MAX_BITS];
+	unsigned int num_gpios;
+};
+
+static int mdio_mux_gpio_switch_fn(int current_child, int desired_child,
+				   void *data)
+{
+	int change;
+	unsigned int n;
+	struct mdio_mux_gpio_state *s = data;
+
+	if (current_child == desired_child)
+		return 0;
+
+	change = current_child == -1 ? -1 : current_child ^ desired_child;
+
+	for (n = 0; n < s->num_gpios; n++) {
+		if (change & 1)
+			gpio_set_value_cansleep(s->gpio[n],
+						(desired_child & 1) != 0);
+		change >>= 1;
+		desired_child >>= 1;
+	}
+
+	return 0;
+}
+
+static int __devinit mdio_mux_gpio_probe(struct platform_device *pdev)
+{
+	enum of_gpio_flags f;
+	struct mdio_mux_gpio_state *s;
+	unsigned int num_gpios;
+	unsigned int n;
+	int r;
+
+	if (!pdev->dev.of_node)
+		return -ENODEV;
+
+	num_gpios = of_gpio_count(pdev->dev.of_node);
+	if (num_gpios == 0 || num_gpios > MDIO_MUX_GPIO_MAX_BITS)
+		return -ENODEV;
+
+	s = devm_kzalloc(&pdev->dev, sizeof(*s), GFP_KERNEL);
+	if (!s)
+		return -ENOMEM;
+
+	s->num_gpios = num_gpios;
+
+	for (n = 0; n < num_gpios; ) {
+		int gpio = of_get_named_gpio_flags(pdev->dev.of_node,
+						   "gpios", n, &f);
+		s->gpio[n] = gpio;
+		if (gpio < 0) {
+			r = -ENODEV;
+			goto err;
+		}
+
+		n++;
+
+		r = gpio_request(gpio, "mdio_mux_gpio");
+		if (r)
+			goto err;
+
+		r = gpio_direction_output(gpio, 0);
+		if (r)
+			goto err;
+	}
+
+	r = mdio_mux_probe(pdev, mdio_mux_gpio_switch_fn, s);
+
+	if (r == 0)
+		return 0;
+err:
+	while (n) {
+		n--;
+		gpio_free(s->gpio[n]);
+	}
+	devm_kfree(&pdev->dev, s);
+	return r;
+}
+
+static int __devexit mdio_mux_gpio_remove(struct platform_device *pdev)
+{
+	return 0;
+}
+
+static struct of_device_id mdio_mux_gpio_match[] = {
+	{
+		.compatible = "cavium,mdio-mux-sn74cbtlv3253",
+	},
+	{},
+};
+MODULE_DEVICE_TABLE(of, mdio_mux_match);
+
+static struct platform_driver mdio_mux_gpio_driver = {
+	.driver = {
+		.name		= "mdio-mux-gpio",
+		.owner		= THIS_MODULE,
+		.of_match_table = mdio_mux_gpio_match,
+	},
+	.probe		= mdio_mux_gpio_probe,
+	.remove		= __devexit_p(mdio_mux_gpio_remove),
+};
+
+static int __init mdio_mux_gpio_mod_init(void)
+{
+	return platform_driver_register(&mdio_mux_gpio_driver);
+}
+late_initcall(mdio_mux_gpio_mod_init);
+
+static void __exit mdio_mux_gpio_mod_exit(void)
+{
+	platform_driver_unregister(&mdio_mux_gpio_driver);
+}
+module_exit(mdio_mux_gpio_mod_exit);
+
+MODULE_DESCRIPTION(DRV_DESCRIPTION);
+MODULE_VERSION(DRV_VERSION);
+MODULE_AUTHOR("David Daney");
+MODULE_LICENSE("GPL");
-- 
1.7.2.3

^ permalink raw reply related

* Re: [patch net-next-2.6 1/2] net: allow to change carrier via sysfs
From: Nicolas de Pesloüan @ 2011-08-31 20:03 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Michał Mirosław, netdev, davem, eric.dumazet,
	bhutchings, shemminger
In-Reply-To: <20110831084511.GD2010@minipsycho.brq.redhat.com>

Le 31/08/2011 10:45, Jiri Pirko a écrit :

>>>> Do you expect drivers using implementation different than just calling
>>>> netif_carrier_on/off? Or is it supposed to also e.g. power down PHYs?
>>> Yes, generally it can be used also for en/disable phy, for testing
>>> purposes if hw and driver would support it.
>>
>> I'd like to see this working for GRE tunnel devices (for keepalive
>> daemon to be able to indicate to routing daemons whether tunnel is
>> really working) - implementation would be identical to dummy's case.
>> Should I prepare a patch or can I leave it to you?
>
> Ok, I can include it to this patchset (I'm going to repost first patch
> anyway)

Can't we assume that the dummy's case is the default behavior and register this default 
ndo_change_carrier callback for every device ?

Device drivers willing to do something different can install a different callback if appropriate.

This would avoid duplicating the following code in most drivers, that don't need something special.

static int dummy_change_carrier(struct net_device *dev, bool new_carrier)
{
	if (new_carrier)
		netif_carrier_on(dev);
	else
		netif_carrier_off(dev);
	return 0;
}

If someone is not confident with this default callback registered for all device, at least, we can 
put this code in a common place, so that a driver willing to use it doesn't need to have its own 
version of it.

	Nicolas.

^ permalink raw reply

* Re: [patch net-next-2.6 1/2] net: allow to change carrier via sysfs
From: Ben Hutchings @ 2011-08-31 20:12 UTC (permalink / raw)
  To: Nicolas de Pesloüan
  Cc: Jiri Pirko, Michał Mirosław, netdev, davem,
	eric.dumazet, shemminger
In-Reply-To: <4E5E939C.5000009@gmail.com>

On Wed, 2011-08-31 at 22:03 +0200, Nicolas de Pesloüan wrote:
> Le 31/08/2011 10:45, Jiri Pirko a écrit :
> 
> >>>> Do you expect drivers using implementation different than just calling
> >>>> netif_carrier_on/off? Or is it supposed to also e.g. power down PHYs?
> >>> Yes, generally it can be used also for en/disable phy, for testing
> >>> purposes if hw and driver would support it.
> >>
> >> I'd like to see this working for GRE tunnel devices (for keepalive
> >> daemon to be able to indicate to routing daemons whether tunnel is
> >> really working) - implementation would be identical to dummy's case.
> >> Should I prepare a patch or can I leave it to you?
> >
> > Ok, I can include it to this patchset (I'm going to repost first patch
> > anyway)
> 
> Can't we assume that the dummy's case is the default behavior and
> register this default 
> ndo_change_carrier callback for every device ?

You have got to be joking.  No device driver that has real link
monitoring should use this implementation.

[...]
> If someone is not confident with this default callback registered for
> all device, at least, we can 
> put this code in a common place, so that a driver willing to use it
> doesn't need to have its own 
> version of it.

I agree that this might be useful in some other software devices,
though.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [patch net-next-2.6 1/2] net: allow to change carrier via sysfs
From: Jiri Pirko @ 2011-08-31 20:26 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Nicolas de Pesloüan, Michał Mirosław, netdev,
	davem, eric.dumazet, shemminger
In-Reply-To: <1314821537.3274.24.camel@bwh-desktop>

Wed, Aug 31, 2011 at 10:12:17PM CEST, bhutchings@solarflare.com wrote:
>On Wed, 2011-08-31 at 22:03 +0200, Nicolas de Pesloüan wrote:
>> Le 31/08/2011 10:45, Jiri Pirko a écrit :
>> 
>> >>>> Do you expect drivers using implementation different than just calling
>> >>>> netif_carrier_on/off? Or is it supposed to also e.g. power down PHYs?
>> >>> Yes, generally it can be used also for en/disable phy, for testing
>> >>> purposes if hw and driver would support it.
>> >>
>> >> I'd like to see this working for GRE tunnel devices (for keepalive
>> >> daemon to be able to indicate to routing daemons whether tunnel is
>> >> really working) - implementation would be identical to dummy's case.
>> >> Should I prepare a patch or can I leave it to you?
>> >
>> > Ok, I can include it to this patchset (I'm going to repost first patch
>> > anyway)
>> 
>> Can't we assume that the dummy's case is the default behavior and
>> register this default 
>> ndo_change_carrier callback for every device ?
>
>You have got to be joking.  No device driver that has real link
>monitoring should use this implementation.
>
>[...]
>> If someone is not confident with this default callback registered for
>> all device, at least, we can 
>> put this code in a common place, so that a driver willing to use it
>> doesn't need to have its own 
>> version of it.
>
>I agree that this might be useful in some other software devices,
>though.

Maybe it would be better to just add priv_flag for this instead of
introducing netdev_op. Then do on/off is this flag is set. Not sure
though...

Jirka

>
>Ben.
>
>-- 
>Ben Hutchings, Staff Engineer, Solarflare
>Not speaking for my employer; that's the marketing department's job.
>They asked us to note that Solarflare product names are trademarked.
>

^ permalink raw reply

* Re: [patch net-next-2.6 1/2] net: allow to change carrier via sysfs
From: Nicolas de Pesloüan @ 2011-08-31 20:31 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Jiri Pirko, Michał Mirosław, netdev, davem,
	eric.dumazet, shemminger
In-Reply-To: <1314821537.3274.24.camel@bwh-desktop>

Le 31/08/2011 22:12, Ben Hutchings a écrit :
> On Wed, 2011-08-31 at 22:03 +0200, Nicolas de Pesloüan wrote:
>> Le 31/08/2011 10:45, Jiri Pirko a écrit :
>>
>>>>>> Do you expect drivers using implementation different than just calling
>>>>>> netif_carrier_on/off? Or is it supposed to also e.g. power down PHYs?
>>>>> Yes, generally it can be used also for en/disable phy, for testing
>>>>> purposes if hw and driver would support it.
>>>>
>>>> I'd like to see this working for GRE tunnel devices (for keepalive
>>>> daemon to be able to indicate to routing daemons whether tunnel is
>>>> really working) - implementation would be identical to dummy's case.
>>>> Should I prepare a patch or can I leave it to you?
>>>
>>> Ok, I can include it to this patchset (I'm going to repost first patch
>>> anyway)
>>
>> Can't we assume that the dummy's case is the default behavior and
>> register this default
>> ndo_change_carrier callback for every device ?
>
> You have got to be joking.  No device driver that has real link
> monitoring should use this implementation.

Well, why not? Arguably, this is probably not the feature one would use every day, but...

Testing a cluster reaction to a link down event would be easier if one doesn't need to unplug the 
cable for the test. I understand that one can turn off the switch port (physical or virtual), but 
echo 0 > /sys/class/net/eth0/carrier would be nice too.

Of course, I assume that netif_carrier_on and netif_carrier_off are only called on real link status 
change. So the value written into /sys/class/net/eth0/carrier would stay until the link revert it, 
due to a double status change (up-down-up or down-up-down). But I may miss totally this point.

	Nicolas.

^ permalink raw reply

* Re: [PATCH] bridge: mask forwarding of IEEE 802 local multicast groups
From: Nick Carter @ 2011-08-31 20:41 UTC (permalink / raw)
  To: Stephen Hemminger, David Lamparter, eswierk
  Cc: netdev, Michał Mirosław, davem
In-Reply-To: <20110815112501.3a3c01ad@nehalam.ftrdhcpuser.net>

On 15 August 2011 19:25, Stephen Hemminger
<shemminger@linux-foundation.org> wrote:
> On Mon, 15 Aug 2011 17:27:12 +0100
> Nick Carter <ncarter100@gmail.com> wrote:
>
>> On 28 July 2011 16:41, Stephen Hemminger
>> <shemminger@linux-foundation.org> wrote:
>> > On Wed, 27 Jul 2011 13:17:15 +0200
>> > David Lamparter <equinox@diac24.net> wrote:
>> >
>> >> On Fri, Jul 15, 2011 at 06:33:45PM +0200, David Lamparter wrote:
>> >> > On Fri, Jul 15, 2011 at 06:03:57PM +0200, David Lamparter wrote:
>> >> > > On Fri, Jul 15, 2011 at 04:44:50PM +0100, Nick Carter wrote:
>> >> > > > On 12 July 2011 12:36, David Lamparter <equinox@diac24.net> wrote:
>> >> > > > > On Mon, Jul 11, 2011 at 08:27:55AM -0700, Stephen Hemminger wrote:
>> >> > > > >> I am still undecided on this. Understand the need, but don't like idea
>> >> > > > >> of bridge behaving in non-conforming manner. Will see if IEEE 802 committee
>> >> > > > >> has any input.
>> >> > > > >
>> >> > > > > The patch doesn't make the bridge behave nonconformant. The default mask
>> >> > > > > is 0, which just keeps the old behaviour.
>> >> >
>> >> > P.S.: I'd like to once more stress this. In my opinion the patch should
>> >> > be merged because it provides desireable functionality at a small cost
>> >> > (one test, one knob) and __does not change any default behaviour__.
>> >>
>> >> Stephen, anything new on this?
>> >
>> > No.
>> > Don't like adding yet another hack user visible API which will have
>> > to be maintained for too long. But on the other hand I don't have
>> > a better solution at my finger tips. If better idea doesn't come
>> > along, then we can go with yours.
>> >
>> I have not noticed any other proposals and this thread has been open
>> for quite a while.  Have we waited long enough ? If so can this patch
>> be taken ?
>>
>
> I am testing an alternative. The problem with your proposal is that
> it relies on the multicast address. It turns out there are people using
> other addresses for the STP group address, so using that as a identifier
> is incorrect.
If the chosen STP group address is in the local multicast group range
this patch will handle it.

David Lamparter has reviewed this patch and asked for it to be merged.
 This patch has at least two real world uses.  Ed needs this patch to
forward LLDP frames and I need this patch to forward 802.1X frames.

This patch has been out for review for 9 weeks and it still looks like
the best solution.

Could this patch be merged please ?

Thanks,
Nick

^ permalink raw reply

* Re: [patch net-next-2.6 1/2] net: allow to change carrier via sysfs
From: Ben Hutchings @ 2011-08-31 20:44 UTC (permalink / raw)
  To: Nicolas de Pesloüan
  Cc: Jiri Pirko, Michał Mirosław, netdev, davem,
	eric.dumazet, shemminger
In-Reply-To: <4E5E9A30.1040105@gmail.com>

On Wed, 2011-08-31 at 22:31 +0200, Nicolas de Pesloüan wrote:
> Le 31/08/2011 22:12, Ben Hutchings a écrit :
> > On Wed, 2011-08-31 at 22:03 +0200, Nicolas de Pesloüan wrote:
> >> Le 31/08/2011 10:45, Jiri Pirko a écrit :
> >>
> >>>>>> Do you expect drivers using implementation different than just calling
> >>>>>> netif_carrier_on/off? Or is it supposed to also e.g. power down PHYs?
> >>>>> Yes, generally it can be used also for en/disable phy, for testing
> >>>>> purposes if hw and driver would support it.
> >>>>
> >>>> I'd like to see this working for GRE tunnel devices (for keepalive
> >>>> daemon to be able to indicate to routing daemons whether tunnel is
> >>>> really working) - implementation would be identical to dummy's case.
> >>>> Should I prepare a patch or can I leave it to you?
> >>>
> >>> Ok, I can include it to this patchset (I'm going to repost first patch
> >>> anyway)
> >>
> >> Can't we assume that the dummy's case is the default behavior and
> >> register this default
> >> ndo_change_carrier callback for every device ?
> >
> > You have got to be joking.  No device driver that has real link
> > monitoring should use this implementation.
> 
> Well, why not? Arguably, this is probably not the feature one would
> use every day, but...
> 
> Testing a cluster reaction to a link down event would be easier if one
> doesn't need to unplug the 
> cable for the test. I understand that one can turn off the switch port
> (physical or virtual), but 
> echo 0 > /sys/class/net/eth0/carrier would be nice too.
> 
> Of course, I assume that netif_carrier_on and netif_carrier_off are
> only called on real link status 
> change. So the value written into /sys/class/net/eth0/carrier would
> stay until the link revert it, 
> due to a double status change (up-down-up or down-up-down). But I may
> miss totally this point.

Think what happens if the user sets carrier on, when the link is
actually down.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH] net: Initialize entire flowi struct
From: Julian Anastasov @ 2011-08-31 20:51 UTC (permalink / raw)
  To: David Ward; +Cc: netdev
In-Reply-To: <1314806703-5275-1-git-send-email-david.ward@ll.mit.edu>


	Hello,

On Wed, 31 Aug 2011, David Ward wrote:

> The entire flowi struct needs to be initialized by afinfo->decode_session,
> because flow_hash_code operates over the entire struct and may otherwise
> return different hash values for what is intended to be the same key.

	Such change will cause problems for callers that
use flowi4 in stack. Examples:

ip_route_me_harder
icmp_route_lookup

	Not sure if adding size as parameter to flow_hash_code
is better approach. May be flow_cache_lookup needs to
determine size from family that can be used for flow_hash_code,
flow_key_compare and the memcpy(&fle->key, key, sizeof(*key))
after fle = kmem_cache_alloc(flow_cachep, GFP_ATOMIC). The
question is how to get size by family.

> Signed-off-by: David Ward <david.ward@ll.mit.edu>
> ---
>  net/ipv4/xfrm4_policy.c |    2 +-
>  net/ipv6/xfrm6_policy.c |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
> index fc5368a..afce24d 100644
> --- a/net/ipv4/xfrm4_policy.c
> +++ b/net/ipv4/xfrm4_policy.c
> @@ -114,7 +114,7 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int reverse)
>  	u8 *xprth = skb_network_header(skb) + iph->ihl * 4;
>  	struct flowi4 *fl4 = &fl->u.ip4;
>  
> -	memset(fl4, 0, sizeof(struct flowi4));
> +	memset(fl, 0, sizeof(struct flowi));
>  	fl4->flowi4_mark = skb->mark;
>  
>  	if (!ip_is_fragment(iph)) {
> diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
> index d879f7e..9088d38 100644
> --- a/net/ipv6/xfrm6_policy.c
> +++ b/net/ipv6/xfrm6_policy.c
> @@ -129,7 +129,7 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int reverse)
>  	const unsigned char *nh = skb_network_header(skb);
>  	u8 nexthdr = nh[IP6CB(skb)->nhoff];
>  
> -	memset(fl6, 0, sizeof(struct flowi6));
> +	memset(fl, 0, sizeof(struct flowi));
>  	fl6->flowi6_mark = skb->mark;
>  
>  	ipv6_addr_copy(&fl6->daddr, reverse ? &hdr->saddr : &hdr->daddr);
> -- 
> 1.7.4.1

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [PATCH] net: Initialize entire flowi struct
From: David Miller @ 2011-08-31 20:47 UTC (permalink / raw)
  To: ja; +Cc: david.ward, netdev
In-Reply-To: <alpine.LFD.2.00.1108312321400.1826@ja.ssi.bg>

From: Julian Anastasov <ja@ssi.bg>
Date: Wed, 31 Aug 2011 23:51:32 +0300 (EEST)

> 	Such change will cause problems for callers that
> use flowi4 in stack. Examples:
> 
> ip_route_me_harder
> icmp_route_lookup

Right.

^ permalink raw reply

* Re: [patch net-next-2.6 1/2] net: allow to change carrier via sysfs
From: Ben Greear @ 2011-08-31 20:48 UTC (permalink / raw)
  To: Nicolas de Pesloüan
  Cc: Ben Hutchings, Jiri Pirko, Michał Mirosław, netdev,
	davem, eric.dumazet, shemminger
In-Reply-To: <4E5E9A30.1040105@gmail.com>

On 08/31/2011 01:31 PM, Nicolas de Pesloüan wrote:
> Le 31/08/2011 22:12, Ben Hutchings a écrit :
>> On Wed, 2011-08-31 at 22:03 +0200, Nicolas de Pesloüan wrote:
>>> Le 31/08/2011 10:45, Jiri Pirko a écrit :
>>>
>>>>>>> Do you expect drivers using implementation different than just calling
>>>>>>> netif_carrier_on/off? Or is it supposed to also e.g. power down PHYs?
>>>>>> Yes, generally it can be used also for en/disable phy, for testing
>>>>>> purposes if hw and driver would support it.
>>>>>
>>>>> I'd like to see this working for GRE tunnel devices (for keepalive
>>>>> daemon to be able to indicate to routing daemons whether tunnel is
>>>>> really working) - implementation would be identical to dummy's case.
>>>>> Should I prepare a patch or can I leave it to you?
>>>>
>>>> Ok, I can include it to this patchset (I'm going to repost first patch
>>>> anyway)
>>>
>>> Can't we assume that the dummy's case is the default behavior and
>>> register this default
>>> ndo_change_carrier callback for every device ?
>>
>> You have got to be joking. No device driver that has real link
>> monitoring should use this implementation.
>
> Well, why not? Arguably, this is probably not the feature one would use every day, but...
>
> Testing a cluster reaction to a link down event would be easier if one doesn't need to unplug the cable for the test. I understand that one can turn off the
> switch port (physical or virtual), but echo 0 > /sys/class/net/eth0/carrier would be nice too.

There is special hardware out there that can do bypass, and often it also has a mode
that will programatically cut link by throwing some relays.  We use this for our
testing equipment...

If there is some way to twiddle standard-ish hardware to actually drop link, that
would be neat.  I'd think it should be an ethtool type of thing, however.

Actually dropping link, and letting that naturally propagate up the stack seems
more reasonable than lying about the status half way up the stack.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [PATCH 1/2] Define security_sk_getsecctx
From: Casey Schaufler @ 2011-08-31 20:49 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: rongqing.li, netdev, selinux, linux-security-module,
	Casey Schaufler
In-Reply-To: <1314816361.6850.51.camel@moss-pluto>

On 8/31/2011 11:46 AM, Stephen Smalley wrote:
> On Wed, 2011-08-31 at 08:43 -0700, Casey Schaufler wrote:
>> On 8/31/2011 1:36 AM, rongqing.li@windriver.com wrote:
>>> From: Roy.Li <rongqing.li@windriver.com>
>>>
>>> Define security_sk_getsecctx to return the security
>>> context of a sock.
>> So, what is the intended use of the information
>> coming from this hook? If I wanted to write the
>> Smack hook, which of the "contexts" would I want
>> to return? There are potentially three. If I know
>> what the caller is looking for, I can (hopefully)
>> select the correct information.
> The initial use case is for netstat -Z so that it can reliably show the
> security context of the socket rather than inferring it from the owning
> process, which can be inaccurate for security-aware applications.
>
> In your situation, when in != out, which would you rather see in netstat
> -Z output?  Alternatively, if you want them both, perhaps you could
> combine in and out into a single string that is returned, similar to
> what you proposed for handling multiple xattrs with inode_getsecctx()?
>

If we want to use secctx consistently within the kernel, and
I personally think that is a good idea, I would have to chose
the SMACK64IPIN (label checked on packet delivery) value. Putting
both the SMACK64IPOUT and SMACK64IPIN "contexts" into the secctx
would violate the architectural notion that a secctx is the
textual representation of a value used to make access control
decisions. It would mean that calling security_secctx_to_secid()
with the value returned by security_sk_getsecctx would be invalid.
Note that this is different from the mechanism I had suggested
for handling secctx in the multiple LSM case, as the composed
string would map to a single secid which would in turn map back
to that same composed string. If, on the other hand, what netstat -Z
is out to show is all of the LSM base information about the socket
a compound string might make sense, it just would not be a
secctx, it would be an informational string of some other flavor.

^ permalink raw reply

* Re: [PATCH] bridge: mask forwarding of IEEE 802 local multicast groups
From: Stephen Hemminger @ 2011-08-31 20:49 UTC (permalink / raw)
  To: Nick Carter
  Cc: David Lamparter, eswierk, netdev, Michał Mirosław,
	davem
In-Reply-To: <CAEJpZP0riOx=7umYgr3fV4JF31+8Jv8m6DRTkyyuiDnioff4Vw@mail.gmail.com>

On Wed, 31 Aug 2011 21:41:26 +0100
Nick Carter <ncarter100@gmail.com> wrote:

> On 15 August 2011 19:25, Stephen Hemminger
> <shemminger@linux-foundation.org> wrote:
> > On Mon, 15 Aug 2011 17:27:12 +0100
> > Nick Carter <ncarter100@gmail.com> wrote:
> >
> >> On 28 July 2011 16:41, Stephen Hemminger
> >> <shemminger@linux-foundation.org> wrote:
> >> > On Wed, 27 Jul 2011 13:17:15 +0200
> >> > David Lamparter <equinox@diac24.net> wrote:
> >> >
> >> >> On Fri, Jul 15, 2011 at 06:33:45PM +0200, David Lamparter wrote:
> >> >> > On Fri, Jul 15, 2011 at 06:03:57PM +0200, David Lamparter wrote:
> >> >> > > On Fri, Jul 15, 2011 at 04:44:50PM +0100, Nick Carter wrote:
> >> >> > > > On 12 July 2011 12:36, David Lamparter <equinox@diac24.net> wrote:
> >> >> > > > > On Mon, Jul 11, 2011 at 08:27:55AM -0700, Stephen Hemminger wrote:
> >> >> > > > >> I am still undecided on this. Understand the need, but don't like idea
> >> >> > > > >> of bridge behaving in non-conforming manner. Will see if IEEE 802 committee
> >> >> > > > >> has any input.
> >> >> > > > >
> >> >> > > > > The patch doesn't make the bridge behave nonconformant. The default mask
> >> >> > > > > is 0, which just keeps the old behaviour.
> >> >> >
> >> >> > P.S.: I'd like to once more stress this. In my opinion the patch should
> >> >> > be merged because it provides desireable functionality at a small cost
> >> >> > (one test, one knob) and __does not change any default behaviour__.
> >> >>
> >> >> Stephen, anything new on this?
> >> >
> >> > No.
> >> > Don't like adding yet another hack user visible API which will have
> >> > to be maintained for too long. But on the other hand I don't have
> >> > a better solution at my finger tips. If better idea doesn't come
> >> > along, then we can go with yours.
> >> >
> >> I have not noticed any other proposals and this thread has been open
> >> for quite a while.  Have we waited long enough ? If so can this patch
> >> be taken ?
> >>
> >
> > I am testing an alternative. The problem with your proposal is that
> > it relies on the multicast address. It turns out there are people using
> > other addresses for the STP group address, so using that as a identifier
> > is incorrect.
> If the chosen STP group address is in the local multicast group range
> this patch will handle it.
> 
> David Lamparter has reviewed this patch and asked for it to be merged.
>  This patch has at least two real world uses.  Ed needs this patch to
> forward LLDP frames and I need this patch to forward 802.1X frames.
> 
> This patch has been out for review for 9 weeks and it still looks like
> the best solution.

I prefer the netfilter solution because it is more general. We already have
a firewall solution why shouldn't this case be part of it?

^ permalink raw reply

* Re: BQL crap and wireless
From: Luis R. Rodriguez @ 2011-08-31 20:50 UTC (permalink / raw)
  To: Jim Gettys
  Cc: Andrew McGregor, Adrian Chadd, Tom Herbert, Dave Taht,
	linux-wireless, Matt Smith, Kevin Hayes, Derek Smithies,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <4E5E36EE.8080501-CC+yJ3UmIYqDUpFQwHEjaQ@public.gmane.org>

On Wed, Aug 31, 2011 at 6:28 AM, Jim Gettys <jg-CC+yJ3UmIYqDUpFQwHEjaQ@public.gmane.org> wrote:
> On 08/30/2011 05:47 PM, Andrew McGregor wrote:
>> On 31/08/2011, at 1:58 AM, Jim Gettys wrote:
>>
>>> On 08/29/2011 11:42 PM, Adrian Chadd wrote:
>>>> On 30 August 2011 11:34, Tom Herbert <therbert-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>>>>
>>>> C(P) is going to be quite variable - a full frame retransmit of a 4ms
>>>> long aggregate frame is SUM(exponential backoff, grab the air,
>>>> preamble, header, 4ms, etc. for each pass.)
>>>>
>>> It's not clear to me that doing heroic measures to compute the cost is
>>> going to be worthwhile due to the rate at which the costs can change on
>>> wireless; just getting into the rough ballpark may be enough. But
>>> buffering algorithms and AQM algorithms are going to need an estimate of
>>> the *time* it will take to transmit data, more than # of bytes or packets.
>> That's not heroic measures; mac80211 needs all the code to calculate these times anyway, it's just a matter of collecting together some things we already know and calling the right function.
>>
>>
>
> Fine; if it's easy, accurate is better (presuming the costs get
> recalculated when circumstances change). We also will need the amount of
> data being transmitted; it is the rate of transmission (the rate at
> which the buffers are draining) that we'll likely need.
>
> Here's what I've gleaned from reading "RED in a different light",  Van
> Jacobson's Mitre talk and several conversations with Kathleen Nichols
> and Van: AQM algorithms that can handle variable bandwidth environments
> will need to be able to know the rate at which buffers empty.  It's the
> direction they are going with their experiments on a "RED light" algorithm.
>
> The fundamental insight as to why classic RED can't work in the wireless
> environment is that the instantaneous queue length has little actual
> information in it.

Can you elaborate? Mind you I know squat about AQM but am curious
about what *is* required here.

> Classic RED is tuned using the queue length as its
> basic parameter.  Their belief as to algorithms that will work is that
> the need to keep track of the running queue length *minimum over time*;
> you want to keep the buffers small on a longer term basis (so they both
> can absorb transients, which is their reason for existence, while
> keeping the latency typically low).

I think I get this.

Now, I do care about the above details of what is needed but -- it
still puzzles me that no algorithm on addressing this has been
developed that simply does trial and error to fixate on the
appropriate queue length. Do we know the outcome of using a "bad queue
length", is it the latency issues? If so then can we not use latency
as a reactive measure to help us fine-tune the desired queue length.
This would be PHY agnostic. This is what I was luring to earlier when
I had mentioned using the approach Minstrel takes to solve the rate
control problem. I've lately been fixating on these sort of algorithms
approaches as solutions to complex problems as inspired by Tim
Harford's elaboration on the God complex on a TED Talk:

http://www.youtube.com/watch?v=K5wCfYujRdE

> The additional major challenge we
> face that core routers do not is the highly variable traffic of mixed
> mice and elephants.  What actually works only time will tell.

Thank you for elaborating on this, it helps to further dissect the
issues. When mixing them together it makes the task harder to address.
As for 802.11 we'd just need to worry about management frames. Dave
and Andrew had also pointed out issues with multicast and the low
rates used for them in comparison to unicast transmissions.

> So in an environment in which the rate of transmission is highly
> variable,

Not only that, remember we have different possible topologies
depending on the type of 802.11 service used, IBSS, the typical AP-STA
environment, Mesh, P2P (AP-STA really), and now with 802.11ad we get
PBSS. What counts here is we may have variable rate of transmission to
different possible peers, as such the queue length will depend on all
of the expected transmissions.

> such as wireless, or even possibly modern broadband with
> PowerBoost, classic RED or similar algorithms that do not take the
> buffer drain rate cannot possibly hack it properly.

Understood, just curious if anyone has tried a Minstrel approach.

Lets try to document all of this here:

http://wireless.kernel.org/en/developers/bufferbloat

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* GIT networking temporary freeze...
From: David Miller @ 2011-08-31 20:58 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netfilter-devel-u79uwXL29TY76Z2rM5mHXA

As you may or may not have noticed, master.kernel.org is down since
late yesterday with some filesystem problems.

It is being worked on actively, but meanwhile I'm not going to be
applying anything to my tree until it comes back up.

Hopefully it should be going again in the next day.

Just FYI...
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 2/2] Add a netlink attribute INET_DIAG_SECCTX
From: Paul Moore @ 2011-08-31 21:18 UTC (permalink / raw)
  To: rongqing.li; +Cc: netdev, selinux, linux-security-module
In-Reply-To: <1314779777-12669-3-git-send-email-rongqing.li@windriver.com>

On Wednesday, August 31, 2011 04:36:17 PM rongqing.li@windriver.com wrote:
> From: Roy.Li <rongqing.li@windriver.com>
> 
> Add a new netlink attribute INET_DIAG_SECCTX to dump the security
> context of TCP sockets.

You'll have to forgive me, I'm not familiar with the netlink code used by 
netstat and friends, but is there anyway to report back the security context 
of UDP sockets?  Or does the code below handle that already?

In general, AF_INET and AF_INET6 sockets, regardless of any upper level 
protocols, have security contexts associated with them and it would be nice to 
see them in netstat.

> diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
> index 389a2e6..1faf752 100644
> --- a/net/ipv4/inet_diag.c
> +++ b/net/ipv4/inet_diag.c
> @@ -34,6 +34,8 @@
> 
>  #include <linux/inet_diag.h>
> 
> +#define MAX_SECCTX_LEN 128

I'll echo Stephen's concerns that this is too small.  A MCS/MLS system with a 
moderate number of categories could bump into this limit without too much 
difficulty.

>  struct inet_diag_entry {
> @@ -108,6 +110,25 @@ static int inet_csk_diag_fill(struct sock *sk,
>  		       icsk->icsk_ca_ops->name);
>  	}
> 
> +	if (ext & (1 << (INET_DIAG_SECCTX - 1))) {
> +		u32 ctxlen = 0;
> +		void *secctx;
> +		int error;
> +
> +		error = security_sk_getsecctx(sk, &secctx, &ctxlen);
> +
> +		if (!error && ctxlen) {
> +			if (ctxlen < MAX_SECCTX_LEN) {
> +				strcpy(INET_DIAG_PUT(skb, INET_DIAG_SECCTX,
> +					ctxlen + 1), secctx);
> +			} else {
> +				strcpy(INET_DIAG_PUT(skb, INET_DIAG_SECCTX,
> +					2), "-");

Is the "-" string a special value already interpreted by the userspace tools?  
If not, you might consider using a string that would indicate an out-of-space 
condition occurred; at first glance I thought the "-" string indicated no 
context.

> +			}
> +			security_release_secctx(secctx, ctxlen);
> +		}
> +	}
> +
>  	r->idiag_family = sk->sk_family;
>  	r->idiag_state = sk->sk_state;
>  	r->idiag_timer = 0;
> @@ -246,7 +267,7 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff
> *skb, static int inet_diag_get_exact(struct sk_buff *in_skb,
>  			       const struct nlmsghdr *nlh)
>  {
> -	int err;
> +	int err, len;
>  	struct sock *sk;
>  	struct inet_diag_req *req = NLMSG_DATA(nlh);
>  	struct sk_buff *rep;
> @@ -293,10 +314,17 @@ static int inet_diag_get_exact(struct sk_buff *in_skb,
> goto out;
> 
>  	err = -ENOMEM;
> -	rep = alloc_skb(NLMSG_SPACE((sizeof(struct inet_diag_msg) +
> -				     sizeof(struct inet_diag_meminfo) +
> -				     handler->idiag_info_size + 64)),
> -			GFP_KERNEL);
> +	len = sizeof(struct inet_diag_msg) + 64;
> +
> +	len += (req->idiag_ext & (1 << (INET_DIAG_MEMINFO - 1))) ?
> +		sizeof(struct inet_diag_meminfo) : 0;
> +	len += (req->idiag_ext & (1 << (INET_DIAG_INFO - 1))) ?
> +		handler->idiag_info_size : 0;
> +	len += (req->idiag_ext & (1 << (INET_DIAG_SECCTX - 1))) ?
> +		MAX_SECCTX_LEN : 0;
> +
> +	rep = alloc_skb(NLMSG_SPACE(len), GFP_KERNEL);

How much of a problem would it be if you just allocated an entire page (or 4k 
in the case of huge pages) and used that?  Is memory usage a concern here?

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [patch net-next-2.6 1/2] net: allow to change carrier via sysfs
From: Ben Hutchings @ 2011-08-31 21:36 UTC (permalink / raw)
  To: Ben Greear
  Cc: Nicolas de Pesloüan, Jiri Pirko, Michał Mirosław,
	netdev, davem, eric.dumazet, shemminger
In-Reply-To: <4E5E9E16.6060502@candelatech.com>

On Wed, 2011-08-31 at 13:48 -0700, Ben Greear wrote:
> On 08/31/2011 01:31 PM, Nicolas de Pesloüan wrote:
> > Le 31/08/2011 22:12, Ben Hutchings a écrit :
> >> On Wed, 2011-08-31 at 22:03 +0200, Nicolas de Pesloüan wrote:
> >>> Le 31/08/2011 10:45, Jiri Pirko a écrit :
> >>>
> >>>>>>> Do you expect drivers using implementation different than just calling
> >>>>>>> netif_carrier_on/off? Or is it supposed to also e.g. power down PHYs?
> >>>>>> Yes, generally it can be used also for en/disable phy, for testing
> >>>>>> purposes if hw and driver would support it.
> >>>>>
> >>>>> I'd like to see this working for GRE tunnel devices (for keepalive
> >>>>> daemon to be able to indicate to routing daemons whether tunnel is
> >>>>> really working) - implementation would be identical to dummy's case.
> >>>>> Should I prepare a patch or can I leave it to you?
> >>>>
> >>>> Ok, I can include it to this patchset (I'm going to repost first patch
> >>>> anyway)
> >>>
> >>> Can't we assume that the dummy's case is the default behavior and
> >>> register this default
> >>> ndo_change_carrier callback for every device ?
> >>
> >> You have got to be joking. No device driver that has real link
> >> monitoring should use this implementation.
> >
> > Well, why not? Arguably, this is probably not the feature one would use every day, but...
> >
> > Testing a cluster reaction to a link down event would be easier if one doesn't need to unplug the cable for the test. I understand that one can turn off the
> > switch port (physical or virtual), but echo 0 > /sys/class/net/eth0/carrier would be nice too.
> 
> There is special hardware out there that can do bypass, and often it also has a mode
> that will programatically cut link by throwing some relays.  We use this for our
> testing equipment...
> 
> If there is some way to twiddle standard-ish hardware to actually drop link, that
> would be neat.  I'd think it should be an ethtool type of thing, however.

We need to be able to control this as part of our driver test suite (on
the peer, not the device under test).  There are various MDIO bits that
look like they should do this but unfortunately they don't have
consistent effects.  Besides that, many PHYs are not MDIO-manageable.
So this would have to be a device-specific operation, whether it's
exposed through ethtool or sysfs.

> Actually dropping link, and letting that naturally propagate up the stack seems
> more reasonable than lying about the status half way up the stack.

Right.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [patch net-next-2.6 1/2] net: allow to change carrier via sysfs
From: Stephen Hemminger @ 2011-08-31 21:40 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Ben Greear, Nicolas de Pesloüan, Jiri Pirko,
	Michał Mirosław, netdev, davem, eric.dumazet
In-Reply-To: <1314826605.3274.34.camel@bwh-desktop>

On Wed, 31 Aug 2011 22:36:45 +0100
Ben Hutchings <bhutchings@solarflare.com> wrote:

> On Wed, 2011-08-31 at 13:48 -0700, Ben Greear wrote:
> > On 08/31/2011 01:31 PM, Nicolas de Pesloüan wrote:
> > > Le 31/08/2011 22:12, Ben Hutchings a écrit :
> > >> On Wed, 2011-08-31 at 22:03 +0200, Nicolas de Pesloüan wrote:
> > >>> Le 31/08/2011 10:45, Jiri Pirko a écrit :
> > >>>
> > >>>>>>> Do you expect drivers using implementation different than just calling
> > >>>>>>> netif_carrier_on/off? Or is it supposed to also e.g. power down PHYs?
> > >>>>>> Yes, generally it can be used also for en/disable phy, for testing
> > >>>>>> purposes if hw and driver would support it.
> > >>>>>
> > >>>>> I'd like to see this working for GRE tunnel devices (for keepalive
> > >>>>> daemon to be able to indicate to routing daemons whether tunnel is
> > >>>>> really working) - implementation would be identical to dummy's case.
> > >>>>> Should I prepare a patch or can I leave it to you?
> > >>>>
> > >>>> Ok, I can include it to this patchset (I'm going to repost first patch
> > >>>> anyway)
> > >>>
> > >>> Can't we assume that the dummy's case is the default behavior and
> > >>> register this default
> > >>> ndo_change_carrier callback for every device ?
> > >>
> > >> You have got to be joking. No device driver that has real link
> > >> monitoring should use this implementation.
> > >
> > > Well, why not? Arguably, this is probably not the feature one would use every day, but...
> > >
> > > Testing a cluster reaction to a link down event would be easier if one doesn't need to unplug the cable for the test. I understand that one can turn off the
> > > switch port (physical or virtual), but echo 0 > /sys/class/net/eth0/carrier would be nice too.
> > 
> > There is special hardware out there that can do bypass, and often it also has a mode
> > that will programatically cut link by throwing some relays.  We use this for our
> > testing equipment...
> > 
> > If there is some way to twiddle standard-ish hardware to actually drop link, that
> > would be neat.  I'd think it should be an ethtool type of thing, however.
> 
> We need to be able to control this as part of our driver test suite (on
> the peer, not the device under test).  There are various MDIO bits that
> look like they should do this but unfortunately they don't have
> consistent effects.  Besides that, many PHYs are not MDIO-manageable.
> So this would have to be a device-specific operation, whether it's
> exposed through ethtool or sysfs.
> 
> > Actually dropping link, and letting that naturally propagate up the stack seems
> > more reasonable than lying about the status half way up the stack.
> 

For testing clustering, there are hooks in vmware and QEMU/KVM to allow
dropping carrier on the VM side.

^ permalink raw reply

* Re: [patch net-next-2.6 1/2] net: allow to change carrier via sysfs
From: Ben Greear @ 2011-08-31 21:49 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Nicolas de Pesloüan, Jiri Pirko, Michał Mirosław,
	netdev, davem, eric.dumazet, shemminger
In-Reply-To: <1314826605.3274.34.camel@bwh-desktop>

On 08/31/2011 02:36 PM, Ben Hutchings wrote:
> On Wed, 2011-08-31 at 13:48 -0700, Ben Greear wrote:
>> On 08/31/2011 01:31 PM, Nicolas de Pesloüan wrote:
>>> Le 31/08/2011 22:12, Ben Hutchings a écrit :
>>>> On Wed, 2011-08-31 at 22:03 +0200, Nicolas de Pesloüan wrote:
>>>>> Le 31/08/2011 10:45, Jiri Pirko a écrit :
>>>>>
>>>>>>>>> Do you expect drivers using implementation different than just calling
>>>>>>>>> netif_carrier_on/off? Or is it supposed to also e.g. power down PHYs?
>>>>>>>> Yes, generally it can be used also for en/disable phy, for testing
>>>>>>>> purposes if hw and driver would support it.
>>>>>>>
>>>>>>> I'd like to see this working for GRE tunnel devices (for keepalive
>>>>>>> daemon to be able to indicate to routing daemons whether tunnel is
>>>>>>> really working) - implementation would be identical to dummy's case.
>>>>>>> Should I prepare a patch or can I leave it to you?
>>>>>>
>>>>>> Ok, I can include it to this patchset (I'm going to repost first patch
>>>>>> anyway)
>>>>>
>>>>> Can't we assume that the dummy's case is the default behavior and
>>>>> register this default
>>>>> ndo_change_carrier callback for every device ?
>>>>
>>>> You have got to be joking. No device driver that has real link
>>>> monitoring should use this implementation.
>>>
>>> Well, why not? Arguably, this is probably not the feature one would use every day, but...
>>>
>>> Testing a cluster reaction to a link down event would be easier if one doesn't need to unplug the cable for the test. I understand that one can turn off the
>>> switch port (physical or virtual), but echo 0>  /sys/class/net/eth0/carrier would be nice too.
>>
>> There is special hardware out there that can do bypass, and often it also has a mode
>> that will programatically cut link by throwing some relays.  We use this for our
>> testing equipment...
>>
>> If there is some way to twiddle standard-ish hardware to actually drop link, that
>> would be neat.  I'd think it should be an ethtool type of thing, however.
>
> We need to be able to control this as part of our driver test suite (on
> the peer, not the device under test).  There are various MDIO bits that
> look like they should do this but unfortunately they don't have
> consistent effects.  Besides that, many PHYs are not MDIO-manageable.
> So this would have to be a device-specific operation, whether it's
> exposed through ethtool or sysfs.

Well, I have boxes that can act as the peer.  Nics are intel chipset 1G,
the bypass/link-drop feature is some separate hardware on the NIC, and
there is a hack of a driver & user-space tools that controls those bits.

But, if there is a way to fiddle normal NICs w/out the special bypass/disconnect
hardware, that would be welcome.  Even if it's different for each driver, and
not supported by all, ethtool can deal with that sort of thing.

Here's a link to the hardware we use...it comes with drivers, though we've hacked on
ours a bit.

http://silicom-usa.com/upload/Downloads/Product_102.pdf

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [PATCH] bridge: mask forwarding of IEEE 802 local multicast groups
From: Ed Swierk @ 2011-08-31 22:00 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Nick Carter, David Lamparter, netdev, Michał Mirosław,
	davem
In-Reply-To: <20110831134904.1a050924@nehalam.ftrdhcpuser.net>

On Wed, Aug 31, 2011 at 1:49 PM, Stephen Hemminger
<shemminger@linux-foundation.org> wrote:
> I prefer the netfilter solution because it is more general. We already have
> a firewall solution why shouldn't this case be part of it?

For my application, I just want the bridge to forward all link-local
frames as well as regular frames between a pair of interfaces. It
seems a little odd to use a mechanism like netfilter to tell the
bridge _not_ to drop frames that would otherwise be dropped.

But other than that minor issue, my strong preference is that either
your or Nick's patch be committed as soon as possible :-)

--Ed

^ permalink raw reply

* [PATCH] ethtool: Uncook tg3 regdump output
From: Matt Carlson @ 2011-09-01  2:02 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, mcarlson

tg3 devices have changed a lot since the bcm5700 days.  Register blocks
have been added, removed and redefined.  The existing tg3 register dump
code has not kept up.

This patch changes the tg3_dump_regs() function to be more simplistic.
Rather than attempting to locate where meaningful data is, it will
instead print the registers as 32-bit values, omitting all registers
that have a value of zero.  By performing the output this way, we hope
to future-proof the interface.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
---
 tg3.c |   30 +++++++-----------------------
 1 files changed, 7 insertions(+), 23 deletions(-)

diff --git a/tg3.c b/tg3.c
index 1ce07a8..4868504 100644
--- a/tg3.c
+++ b/tg3.c
@@ -26,32 +26,16 @@ tg3_dump_eeprom(struct ethtool_drvinfo *info, struct ethtool_eeprom *ee)
 int
 tg3_dump_regs(struct ethtool_drvinfo *info, struct ethtool_regs *regs)
 {
-	int i, j;
-	int reg_boundaries[] = { 0x015c, 0x0200, 0x0400, 0x0400, 0x08f0, 0x0c00,
-	       			 0x0ce0, 0x1000, 0x1004, 0x1400, 0x1480, 0x1800,
-				 0x1848, 0x1c00, 0x1c04, 0x2000, 0x225c, 0x2400,
-				 0x24c4, 0x2800, 0x2804, 0x2c00, 0x2c20, 0x3000,
-				 0x3014, 0x3400, 0x3408, 0x3800, 0x3808, 0x3c00,
-				 0x3d00, 0x4000, 0x4010, 0x4400, 0x4458, 0x4800,
-				 0x4808, 0x4c00, 0x4c08, 0x5000, 0x5280, 0x5400,
-				 0x5680, 0x5800, 0x5a10, 0x5c00, 0x5d20, 0x6000,
-				 0x600c, 0x6800, 0x6848, 0x7000, 0x7034, 0x7c00,
-				 0x7e40, 0x8000 };
+	int i;
+	u32 reg;
 
 	fprintf(stdout, "Offset\tValue\n");
 	fprintf(stdout, "------\t----------\n");
-	for (i = 0, j = 0; i < regs->len; ) {
-		u32 reg;
-
-		memcpy(&reg, &regs->data[i], 4);
-		fprintf(stdout, "0x%04x\t0x%08x\n", i, reg);
-
-		i += 4;
-		if (i == reg_boundaries[j]) {
-			i = reg_boundaries[j + 1];
-			j += 2;
-			fprintf(stdout, "\n");
-		}
+	for (i = 0; i < regs->len; i += sizeof(reg)) {
+		memcpy(&reg, &regs->data[i], sizeof(reg));
+		if (reg)
+			fprintf(stdout, "0x%04x\t0x%08x\n", i, reg);
+
 	}
 	fprintf(stdout, "\n");
 	return 0;
-- 
1.7.3.4

^ permalink raw reply related

* RFC - should network devices trim frames > soft mtu
From: Stephen Hemminger @ 2011-08-31 22:18 UTC (permalink / raw)
  To: David Miller, Michael Chan; +Cc: netdev

I noticed the following in the bnx2 driver.

static int
bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
{
...
		skb->protocol = eth_type_trans(skb, bp->dev);

		if ((len > (bp->dev->mtu + ETH_HLEN)) &&
			(ntohs(skb->protocol) != 0x8100)) {

			dev_kfree_skb(skb);
			goto next_rx;

		}

This means that for non-VLAN tagged frames, the device drops received
packets if the length is greater than the MTU.  I don't see that in
other devices. What is the correct method? IMHO the bnx2 driver is
wrong here and if the policy is desired it should be enforced at
the next level (netif_receive_skb).  Hardcoding a protocol value is
kind of a giveaway that something is fishy.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox