Netdev List
* [PATCH v2 3/3] asix: Add a new driver for the AX88172A
From: Christian Riesch @ 2012-07-12  8:18 UTC
  To: netdev
  Cc: Oliver Neukum, Eric Dumazet, Allan Chou, Mark Lord,
	Grant Grundler, Ben Hutchings, Joe Perches, Michael Riesch,
	Christian Riesch
In-Reply-To: <1342081106-8647-1-git-send-email-christian.riesch@omicron.at>

The Asix AX88172A is a USB 2.0 Ethernet interface that supports both an
internal PHY and an external PHY (connected via MII).

This patch adds a driver for the AX88172A that supports both modes,
using phylib to manage the PHY.

Signed-off-by: Christian Riesch <christian.riesch@omicron.at>
---
 drivers/net/usb/Makefile       |    2 +-
 drivers/net/usb/asix.h         |    5 +
 drivers/net/usb/asix_common.c  |   12 +-
 drivers/net/usb/asix_devices.c |    6 +
 drivers/net/usb/ax88172a.c     |  423 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 445 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/usb/ax88172a.c

diff --git a/drivers/net/usb/Makefile b/drivers/net/usb/Makefile
index a9490d9..bf06300 100644
--- a/drivers/net/usb/Makefile
+++ b/drivers/net/usb/Makefile
@@ -8,7 +8,7 @@ obj-$(CONFIG_USB_PEGASUS)	+= pegasus.o
 obj-$(CONFIG_USB_RTL8150)	+= rtl8150.o
 obj-$(CONFIG_USB_HSO)		+= hso.o
 obj-$(CONFIG_USB_NET_AX8817X)	+= asix.o
-asix-y := asix_devices.o asix_common.o
+asix-y := asix_devices.o asix_common.o ax88172a.o
 obj-$(CONFIG_USB_NET_CDCETHER)	+= cdc_ether.o
 obj-$(CONFIG_USB_NET_CDC_EEM)	+= cdc_eem.o
 obj-$(CONFIG_USB_NET_DM9601)	+= dm9601.o
diff --git a/drivers/net/usb/asix.h b/drivers/net/usb/asix.h
index 790af24..77d9e4c 100644
--- a/drivers/net/usb/asix.h
+++ b/drivers/net/usb/asix.h
@@ -74,6 +74,10 @@
 #define AX_CMD_SW_PHY_STATUS		0x21
 #define AX_CMD_SW_PHY_SELECT		0x22
 
+#define AX_PHY_SELECT_MASK		(BIT(3) | BIT(2))
+#define AX_PHY_SELECT_INTERNAL		0
+#define AX_PHY_SELECT_EXTERNAL		BIT(2)
+
 #define AX_MONITOR_MODE			0x01
 #define AX_MONITOR_LINK			0x02
 #define AX_MONITOR_MAGIC		0x04
@@ -181,6 +185,7 @@ struct sk_buff *asix_tx_fixup(struct usbnet *dev, struct sk_buff *skb,
 int asix_set_sw_mii(struct usbnet *dev);
 int asix_set_hw_mii(struct usbnet *dev);
 
+int asix_read_phy_addr(struct usbnet *dev, int internal);
 int asix_get_phy_addr(struct usbnet *dev);
 
 int asix_sw_reset(struct usbnet *dev, u8 flags);
diff --git a/drivers/net/usb/asix_common.c b/drivers/net/usb/asix_common.c
index 3c1429a..336f755 100644
--- a/drivers/net/usb/asix_common.c
+++ b/drivers/net/usb/asix_common.c
@@ -258,8 +258,9 @@ int asix_set_hw_mii(struct usbnet *dev)
 	return ret;
 }
 
-int asix_get_phy_addr(struct usbnet *dev)
+int asix_read_phy_addr(struct usbnet *dev, int internal)
 {
+	int offset = (internal ? 1 : 0);
 	u8 buf[2];
 	int ret = asix_read_cmd(dev, AX_CMD_READ_PHY_ID, 0, 0, 2, buf);
 
@@ -271,12 +272,19 @@ int asix_get_phy_addr(struct usbnet *dev)
 	}
 	netdev_dbg(dev->net, "asix_get_phy_addr() returning 0x%04x\n",
 		   *((__le16 *)buf));
-	ret = buf[1];
+	ret = buf[offset];
 
 out:
 	return ret;
 }
 
+int asix_get_phy_addr(struct usbnet *dev)
+{
+	/* return the address of the internal phy */
+	return asix_read_phy_addr(dev, 1);
+}
+
+
 int asix_sw_reset(struct usbnet *dev, u8 flags)
 {
 	int ret;
diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c
index 8c513f7..ed9403b 100644
--- a/drivers/net/usb/asix_devices.c
+++ b/drivers/net/usb/asix_devices.c
@@ -872,6 +872,8 @@ static const struct driver_info ax88178_info = {
 	.tx_fixup = asix_tx_fixup,
 };
 
+extern const struct driver_info ax88172a_info;
+
 static const struct usb_device_id	products [] = {
 {
 	// Linksys USB200M
@@ -997,6 +999,10 @@ static const struct usb_device_id	products [] = {
 	// Asus USB Ethernet Adapter
 	USB_DEVICE (0x0b95, 0x7e2b),
 	.driver_info = (unsigned long) &ax88772_info,
+}, {
+	/* ASIX 88172a demo board */
+	USB_DEVICE(0x0b95, 0x172a),
+	.driver_info = (unsigned long) &ax88172a_info,
 },
 	{ },		// END
 };
diff --git a/drivers/net/usb/ax88172a.c b/drivers/net/usb/ax88172a.c
new file mode 100644
index 0000000..c8895cf
--- /dev/null
+++ b/drivers/net/usb/ax88172a.c
@@ -0,0 +1,423 @@
+/*
+ * ASIX AX88172A based USB 2.0 Ethernet Devices
+ * Copyright (C) 2012 OMICRON electronics GmbH
+ *
+ * Supports external PHYs via phylib. Based on the driver for the
+ * AX88772. Original copyrights follow:
+ *
+ * Copyright (C) 2003-2006 David Hollis <dhollis@davehollis.com>
+ * Copyright (C) 2005 Phil Chang <pchang23@sbcglobal.net>
+ * Copyright (C) 2006 James Painter <jamie.painter@iname.com>
+ * Copyright (c) 2002-2003 TiVo Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include "asix.h"
+#include <linux/phy.h>
+
+struct ax88172a_private {
+	struct mii_bus *mdio;
+	struct phy_device *phydev;
+	char phy_name[20];
+	u16 phy_addr;
+	u16 oldmode;
+	int use_embdphy;
+};
+
+/* MDIO read and write wrappers for phylib */
+static int asix_mdio_bus_read(struct mii_bus *bus, int phy_id, int regnum)
+{
+	return asix_mdio_read(((struct usbnet *)bus->priv)->net, phy_id,
+			      regnum);
+}
+
+static int asix_mdio_bus_write(struct mii_bus *bus, int phy_id, int regnum,
+			       u16 val)
+{
+	asix_mdio_write(((struct usbnet *)bus->priv)->net, phy_id, regnum, val);
+	return 0;
+}
+
+static int ax88172a_ioctl(struct net_device *net, struct ifreq *rq, int cmd)
+{
+	if (!netif_running(net))
+		return -EINVAL;
+
+	if (!net->phydev)
+		return -ENODEV;
+
+	return phy_mii_ioctl(net->phydev, rq, cmd);
+}
+
+/* set MAC link settings according to information from phylib */
+static void ax88172a_adjust_link(struct net_device *netdev)
+{
+	struct phy_device *phydev = netdev->phydev;
+	struct usbnet *dev = netdev_priv(netdev);
+	struct ax88172a_private *priv =
+		(struct ax88172a_private *)dev->driver_priv;
+	u16 mode = 0;
+
+	if (phydev->link) {
+		mode = AX88772_MEDIUM_DEFAULT;
+
+		if (phydev->duplex == DUPLEX_HALF)
+			mode &= ~AX_MEDIUM_FD;
+
+		if (phydev->speed != SPEED_100)
+			mode &= ~AX_MEDIUM_PS;
+	}
+
+	if (mode != priv->oldmode) {
+		asix_write_medium_mode(dev, mode);
+		priv->oldmode = mode;
+		netdev_dbg(netdev, "%s: speed %u duplex %d, setting mode to 0x%04x\n",
+			   __func__, phydev->speed, phydev->duplex, mode);
+		phy_print_status(phydev);
+	}
+}
+
+static void ax88172a_status(struct usbnet *dev, struct urb *urb)
+{
+	netdev_dbg(dev->net, "%s called", __func__);
+}
+
+/* use phylib infrastructure */
+static int ax88172a_init_mdio(struct usbnet *dev)
+{
+	struct ax88172a_private *priv =
+		(struct ax88172a_private *)dev->driver_priv;
+	int ret, i;
+
+	priv->mdio = mdiobus_alloc();
+	if (!priv->mdio) {
+		netdev_err(dev->net, "Could not allocate MDIO bus");
+		return -ENOMEM;
+	}
+
+	priv->mdio->priv = (void *)dev;
+	priv->mdio->read = &asix_mdio_bus_read;
+	priv->mdio->write = &asix_mdio_bus_write;
+	priv->mdio->name = "Asix MDIO Bus";
+	/* mii bus name is usb-<usb bus number>:<usb device number> */
+	snprintf(priv->mdio->id, MII_BUS_ID_SIZE, "usb-%03d:%03d",
+		 dev->udev->bus->busnum, dev->udev->devnum);
+
+	priv->mdio->irq = kzalloc(sizeof(int) * PHY_MAX_ADDR, GFP_KERNEL);
+	if (!priv->mdio->irq) {
+		netdev_err(dev->net, "Could not allocate mdio->irq");
+		ret = -ENOMEM;
+		goto mfree;
+	}
+	for (i = 0; i < PHY_MAX_ADDR; i++)
+		priv->mdio->irq[i] = PHY_POLL;
+
+	ret = mdiobus_register(priv->mdio);
+	if (ret) {
+		netdev_err(dev->net, "Could not register MDIO bus");
+		goto ifree;
+	}
+
+	netdev_dbg(dev->net, "Registered mdio bus %s", priv->mdio->id);
+	return 0;
+
+ifree:
+	kfree(priv->mdio->irq);
+mfree:
+	mdiobus_free(priv->mdio);
+	return ret;
+}
+
+static void ax88172a_remove_mdio(struct usbnet *dev)
+{
+	struct ax88172a_private *priv =
+		(struct ax88172a_private *)dev->driver_priv;
+
+	mdiobus_unregister(priv->mdio);
+	kfree(priv->mdio->irq);
+	mdiobus_free(priv->mdio);
+	netdev_dbg(dev->net, "%s called", __func__);
+
+}
+
+static const struct net_device_ops ax88172a_netdev_ops = {
+	.ndo_open		= usbnet_open,
+	.ndo_stop		= usbnet_stop,
+	.ndo_start_xmit		= usbnet_start_xmit,
+	.ndo_tx_timeout		= usbnet_tx_timeout,
+	.ndo_change_mtu		= usbnet_change_mtu,
+	.ndo_set_mac_address	= asix_set_mac_address,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_do_ioctl		= ax88172a_ioctl,
+	.ndo_set_rx_mode        = asix_set_multicast,
+};
+
+int ax88172a_get_settings(struct net_device *net, struct ethtool_cmd *cmd)
+{
+	if (!net->phydev)
+		return -ENODEV;
+
+	return phy_ethtool_gset(net->phydev, cmd);
+}
+
+int ax88172a_set_settings(struct net_device *net, struct ethtool_cmd *cmd)
+{
+	if (!net->phydev)
+		return -ENODEV;
+
+	return phy_ethtool_sset(net->phydev, cmd);
+}
+
+int ax88172a_nway_reset(struct net_device *net)
+{
+	if (!net->phydev)
+		return -ENODEV;
+
+	return phy_start_aneg(net->phydev);
+}
+
+static const struct ethtool_ops ax88172a_ethtool_ops = {
+	.get_drvinfo		= asix_get_drvinfo,
+	.get_link		= usbnet_get_link,
+	.get_msglevel		= usbnet_get_msglevel,
+	.set_msglevel		= usbnet_set_msglevel,
+	.get_wol		= asix_get_wol,
+	.set_wol		= asix_set_wol,
+	.get_eeprom_len		= asix_get_eeprom_len,
+	.get_eeprom		= asix_get_eeprom,
+	.get_settings		= ax88172a_get_settings,
+	.set_settings		= ax88172a_set_settings,
+	.nway_reset		= ax88172a_nway_reset,
+};
+
+static int ax88172a_reset_phy(struct usbnet *dev, int embd_phy)
+{
+	int ret;
+
+	ret = asix_sw_reset(dev, AX_SWRESET_IPPD);
+	if (ret < 0)
+		goto err;
+
+	msleep(150);
+	ret = asix_sw_reset(dev, AX_SWRESET_CLEAR);
+	if (ret < 0)
+		goto err;
+
+	msleep(150);
+
+	ret = asix_sw_reset(dev, embd_phy ? AX_SWRESET_IPRL : AX_SWRESET_IPPD);
+	if (ret < 0)
+		goto err;
+
+	return 0;
+
+err:
+	return ret;
+}
+
+
+static int ax88172a_bind(struct usbnet *dev, struct usb_interface *intf)
+{
+	int ret;
+	struct asix_data *data = (struct asix_data *)&dev->data;
+	u8 buf[ETH_ALEN];
+	struct ax88172a_private *priv;
+
+	data->eeprom_len = AX88772_EEPROM_LEN;
+
+	usbnet_get_endpoints(dev, intf);
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv) {
+		netdev_err(dev->net, "Could not allocate memory for private data");
+		return -ENOMEM;
+	}
+	dev->driver_priv = priv;
+
+	/* Get the MAC address */
+	ret = asix_read_cmd(dev, AX_CMD_READ_NODE_ID, 0, 0, ETH_ALEN, buf);
+	if (ret < 0) {
+		netdev_err(dev->net, "Failed to read MAC address: %d", ret);
+		goto free;
+	}
+	memcpy(dev->net->dev_addr, buf, ETH_ALEN);
+
+	dev->net->netdev_ops = &ax88172a_netdev_ops;
+	dev->net->ethtool_ops = &ax88172a_ethtool_ops;
+
+	/* are we using the internal or the external phy? */
+	ret = asix_read_cmd(dev, AX_CMD_SW_PHY_STATUS, 0, 0, 1, buf);
+	if (ret < 0) {
+		netdev_err(dev->net, "Failed to read software interface selection register: %d",
+			   ret);
+		goto free;
+	}
+
+	netdev_dbg(dev->net, "AX_CMD_SW_PHY_STATUS = 0x%02x\n", buf[0]);
+	switch (buf[0] & AX_PHY_SELECT_MASK) {
+	case AX_PHY_SELECT_INTERNAL:
+		netdev_dbg(dev->net, "use internal phy\n");
+		priv->use_embdphy = 1;
+		break;
+	case AX_PHY_SELECT_EXTERNAL:
+		netdev_dbg(dev->net, "use external phy\n");
+		priv->use_embdphy = 0;
+		break;
+	default:
+		netdev_err(dev->net, "Interface mode not supported by driver\n");
+		goto free;
+	}
+
+	priv->phy_addr = asix_read_phy_addr(dev, priv->use_embdphy);
+	ax88172a_reset_phy(dev, priv->use_embdphy);
+
+	/* Asix framing packs multiple eth frames into a 2K usb bulk transfer */
+	if (dev->driver_info->flags & FLAG_FRAMING_AX) {
+		/* hard_mtu  is still the default - the device does not support
+		   jumbo eth frames */
+		dev->rx_urb_size = 2048;
+	}
+
+	/* init MDIO bus */
+	ret = ax88172a_init_mdio(dev);
+	if (ret)
+		goto free;
+
+	return 0;
+
+free:
+	kfree(priv);
+	return ret;
+}
+
+static int ax88172a_stop(struct usbnet *dev)
+{
+	struct ax88172a_private *priv =
+		(struct ax88172a_private *)dev->driver_priv;
+
+	netdev_dbg(dev->net, "stopping interface");
+
+	if (priv->phydev) {
+		netdev_dbg(dev->net, "disconnecting from phy %s",
+			   priv->phy_name);
+		phy_stop(priv->phydev);
+		phy_disconnect(priv->phydev);
+	}
+
+	return 0;
+}
+
+static void ax88172a_unbind(struct usbnet *dev, struct usb_interface *intf)
+{
+	struct ax88172a_private *priv =
+		(struct ax88172a_private *)dev->driver_priv;
+
+	ax88172a_remove_mdio(dev);
+	kfree(priv);
+}
+
+static int ax88172a_reset(struct usbnet *dev)
+{
+	struct asix_data *data = (struct asix_data *)&dev->data;
+	struct ax88172a_private *priv =
+		(struct ax88172a_private *)dev->driver_priv;
+	int ret;
+	u16 rx_ctl;
+	netdev_dbg(dev->net, "%s called", __func__);
+
+	ax88172a_reset_phy(dev, priv->use_embdphy);
+
+	msleep(150);
+	rx_ctl = asix_read_rx_ctl(dev);
+	netdev_dbg(dev->net, "RX_CTL is 0x%04x after software reset", rx_ctl);
+	ret = asix_write_rx_ctl(dev, 0x0000);
+	if (ret < 0)
+		goto out;
+
+	rx_ctl = asix_read_rx_ctl(dev);
+	netdev_dbg(dev->net, "RX_CTL is 0x%04x setting to 0x0000", rx_ctl);
+
+	msleep(150);
+
+	ret = asix_write_cmd(dev, AX_CMD_WRITE_IPG0,
+				AX88772_IPG0_DEFAULT | AX88772_IPG1_DEFAULT,
+				AX88772_IPG2_DEFAULT, 0, NULL);
+	if (ret < 0) {
+		netdev_err(dev->net, "Write IPG,IPG1,IPG2 failed: %d", ret);
+		goto out;
+	}
+
+	/* Rewrite MAC address */
+	memcpy(data->mac_addr, dev->net->dev_addr, ETH_ALEN);
+	ret = asix_write_cmd(dev, AX_CMD_WRITE_NODE_ID, 0, 0, ETH_ALEN,
+							data->mac_addr);
+	if (ret < 0)
+		goto out;
+
+	/* Set RX_CTL to default values with 2k buffer, and enable cactus */
+	ret = asix_write_rx_ctl(dev, AX_DEFAULT_RX_CTL);
+	if (ret < 0)
+		goto out;
+
+	rx_ctl = asix_read_rx_ctl(dev);
+	netdev_dbg(dev->net, "RX_CTL is 0x%04x after all initializations",
+		   rx_ctl);
+
+	rx_ctl = asix_read_medium_status(dev);
+	netdev_dbg(dev->net, "Medium Status is 0x%04x after all initializations",
+		   rx_ctl);
+
+	/* Connect to PHY */
+	snprintf(priv->phy_name, 20, PHY_ID_FMT,
+		 priv->mdio->id, priv->phy_addr);
+
+	priv->phydev = phy_connect(dev->net, priv->phy_name,
+				   &ax88172a_adjust_link,
+				   0, PHY_INTERFACE_MODE_MII);
+	if (IS_ERR(priv->phydev)) {
+		netdev_err(dev->net, "Could not connect to PHY device %s",
+			   priv->phy_name);
+		ret = PTR_ERR(priv->phydev);
+		goto out;
+	}
+
+	netdev_dbg(dev->net, "Connected to phy %s", priv->phy_name);
+
+	/* During power-up, the AX88172A sets the power down (BMCR_PDOWN)
+	 * bit of the PHY. Bring the PHY up again.
+	 */
+	genphy_resume(priv->phydev);
+	phy_start(priv->phydev);
+
+	return 0;
+
+out:
+	return ret;
+
+}
+
+const struct driver_info ax88172a_info = {
+	.description = "ASIX AX88172A USB 2.0 Ethernet",
+	.bind = ax88172a_bind,
+	.reset = ax88172a_reset,
+	.stop = ax88172a_stop,
+	.unbind = ax88172a_unbind,
+	.status = ax88172a_status,
+	.flags = FLAG_ETHER | FLAG_FRAMING_AX | FLAG_LINK_INTR |
+		 FLAG_MULTI_PACKET,
+	.rx_fixup = asix_rx_fixup,
+	.tx_fixup = asix_tx_fixup,
+};
-- 
1.7.0.4


* [PATCH v2 0/3] Add a driver for the ASIX AX88172A with phylib support
From: Christian Riesch @ 2012-07-12  8:18 UTC
  To: netdev
  Cc: Oliver Neukum, Eric Dumazet, Allan Chou, Mark Lord,
	Grant Grundler, Ben Hutchings, Joe Perches, Michael Riesch,
	Christian Riesch

Hi,

this is v2 of my patchset that adds a driver for the ASIX AX88172A USB 2.0
to 10/100M Fast Ethernet Controller.

Although this chip is already supported by the AX88772 code in
drivers/net/usb/asix.c, I am submitting a new driver because the existing
driver lacks an important feature: it only supports an Ethernet
connection through the PHY embedded in the AX88172A, although the chip
also provides an MII interface for connecting an external PHY.

The new driver supports both the internal and the external PHY using
the phylib.

The driver for the AX88172A is based on drivers/net/usb/asix.c
and the work of Michael Riesch <michael@riesch.at>.

The first and the second patch factor out common code which is shared
between the existing drivers and the new driver for the AX88172A. The
third patch adds support for the AX88172A.

The patchset applies on top of net-next.

Changes for v2:
- Rebased to current net-next.
- Dropped the first patch with ridiculous checkpatch fixes; the new version
  of the patchset only fixes the code that it touches.
- Changed the way the code is factored out in the patchset. This allows
  git to detect that the code was just moved, not changed.
- In v1 I accidentally duplicated the code for reading the phy address
  from the asix chip. Now the drivers use common code in asix_common.c.
- Moved phy_connect(), phy_start() to ax88172a_reset() (called when
  the network interface is started) and phy_stop() to ax88172a_stop().
  It seems to me that this is the way to do it since other drivers
  do the same.
- Changed the naming of the mdio bus, since the way it was done in v1
  could lead to bus names that are too long. Now the mdio bus is labeled
  usb-<usb bus number>:<usb device number>.
- Cleaned up debug and error messages; in v1, dbg() was used instead
  of netdev_err() for some error messages.
- Changed the order of the fields in struct ax88172a_private (comment from
  Grant Grundler).
- Used #defines instead of magic numbers in ax88172a_bind().

I have tested the patch with the ASIX AX88172A demo board (using the
internal PHY) and a custom board (AX88172A and National DP83640 PHY).

I am looking forward to your comments! :-)

Regards, Christian


Christian Riesch (3):
  asix: Rename asix.c to asix_devices.c
  asix: Factor out common code
  asix: Add a new driver for the AX88172A

 drivers/net/usb/Makefile       |    1 +
 drivers/net/usb/asix.c         | 1680 ----------------------------------------
 drivers/net/usb/asix.h         |  217 ++++++
 drivers/net/usb/asix_common.c  |  545 +++++++++++++
 drivers/net/usb/asix_devices.c | 1028 ++++++++++++++++++++++++
 drivers/net/usb/ax88172a.c     |  423 ++++++++++
 6 files changed, 2214 insertions(+), 1680 deletions(-)
 delete mode 100644 drivers/net/usb/asix.c
 create mode 100644 drivers/net/usb/asix.h
 create mode 100644 drivers/net/usb/asix_common.c
 create mode 100644 drivers/net/usb/asix_devices.c
 create mode 100644 drivers/net/usb/ax88172a.c


* [PATCH 16/16] net: Remove checks for dst_ops->redirect being NULL.
From: David Miller @ 2012-07-12  8:12 UTC
  To: netdev


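These checks are no longer necessary, since every dst_ops now provides
a redirect method (dummy methods were added where needed in patch 15/16).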

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/dccp/ipv4.c         |    2 +-
 net/dccp/ipv6.c         |    2 +-
 net/ipv4/tcp_ipv4.c     |    2 +-
 net/ipv4/xfrm4_policy.c |    3 +--
 net/ipv6/ip6_tunnel.c   |    6 ++----
 net/ipv6/tcp_ipv6.c     |    2 +-
 net/sctp/input.c        |    2 +-
 7 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 8f41a319..129ed8f 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -199,7 +199,7 @@ static void dccp_do_redirect(struct sk_buff *skb, struct sock *sk)
 {
 	struct dst_entry *dst = __sk_dst_check(sk, 0);
 
-	if (dst && dst->ops->redirect)
+	if (dst)
 		dst->ops->redirect(dst, skb);
 }
 
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index b4d7d28..090c080 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -133,7 +133,7 @@ static void dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	if (type == NDISC_REDIRECT) {
 		struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie);
 
-		if (dst && dst->ops->redirect)
+		if (dst)
 			dst->ops->redirect(dst, skb);
 	}
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 087a848..7a0062c 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -325,7 +325,7 @@ static void do_redirect(struct sk_buff *skb, struct sock *sk)
 {
 	struct dst_entry *dst = __sk_dst_check(sk, 0);
 
-	if (dst && dst->ops->redirect)
+	if (dst)
 		dst->ops->redirect(dst, skb);
 }
 
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 258ebd7..737131c 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -207,8 +207,7 @@ static void xfrm4_redirect(struct dst_entry *dst, struct sk_buff *skb)
 	struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
 	struct dst_entry *path = xdst->route;
 
-	if (path->ops->redirect)
-		path->ops->redirect(path, skb);
+	path->ops->redirect(path, skb);
 }
 
 static void xfrm4_dst_destroy(struct dst_entry *dst)
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 0b5b60e..61d1065 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -611,10 +611,8 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 
 		skb_dst(skb2)->ops->update_pmtu(skb_dst(skb2), rel_info);
 	}
-	if (rel_type == ICMP_REDIRECT) {
-		if (skb_dst(skb2)->ops->redirect)
-			skb_dst(skb2)->ops->redirect(skb_dst(skb2), skb2);
-	}
+	if (rel_type == ICMP_REDIRECT)
+		skb_dst(skb2)->ops->redirect(skb_dst(skb2), skb2);
 
 	icmp_send(skb2, rel_type, rel_code, htonl(rel_info));
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 7249e4b..3071f37 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -366,7 +366,7 @@ static void tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	if (type == NDISC_REDIRECT) {
 		struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie);
 
-		if (dst && dst->ops->redirect)
+		if (dst)
 			dst->ops->redirect(dst,skb);
 	}
 
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 5943b7d..f050d45 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -431,7 +431,7 @@ void sctp_icmp_redirect(struct sock *sk, struct sctp_transport *t,
 	if (!t)
 		return;
 	dst = sctp_transport_dst_check(t);
-	if (dst && dst->ops->redirect)
+	if (dst)
 		dst->ops->redirect(dst, skb);
 }
 
-- 
1.7.10.4


* [PATCH 15/16] net: Add dummy dst_ops->redirect method where needed.
From: David Miller @ 2012-07-12  8:12 UTC
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/bridge/br_netfilter.c |    5 +++++
 net/decnet/dn_route.c     |    6 ++++++
 net/ipv4/route.c          |    5 +++++
 net/ipv6/route.c          |    5 +++++
 4 files changed, 21 insertions(+)

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index b98d3d7..81f76c4 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -115,6 +115,10 @@ static void fake_update_pmtu(struct dst_entry *dst, u32 mtu)
 {
 }
 
+static void fake_redirect(struct dst_entry *dst, struct sk_buff *skb)
+{
+}
+
 static u32 *fake_cow_metrics(struct dst_entry *dst, unsigned long old)
 {
 	return NULL;
@@ -136,6 +140,7 @@ static struct dst_ops fake_dst_ops = {
 	.family =		AF_INET,
 	.protocol =		cpu_to_be16(ETH_P_IP),
 	.update_pmtu =		fake_update_pmtu,
+	.redirect =		fake_redirect,
 	.cow_metrics =		fake_cow_metrics,
 	.neigh_lookup =		fake_neigh_lookup,
 	.mtu =			fake_mtu,
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index b5594cc..e9c4e2e 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -118,6 +118,7 @@ static void dn_dst_ifdown(struct dst_entry *, struct net_device *dev, int how);
 static struct dst_entry *dn_dst_negative_advice(struct dst_entry *);
 static void dn_dst_link_failure(struct sk_buff *);
 static void dn_dst_update_pmtu(struct dst_entry *dst, u32 mtu);
+static void dn_dst_redirect(struct dst_entry *dst, struct sk_buff *skb);
 static struct neighbour *dn_dst_neigh_lookup(const struct dst_entry *dst,
 					     struct sk_buff *skb,
 					     const void *daddr);
@@ -145,6 +146,7 @@ static struct dst_ops dn_dst_ops = {
 	.negative_advice =	dn_dst_negative_advice,
 	.link_failure =		dn_dst_link_failure,
 	.update_pmtu =		dn_dst_update_pmtu,
+	.redirect =		dn_dst_redirect,
 	.neigh_lookup =		dn_dst_neigh_lookup,
 };
 
@@ -292,6 +294,10 @@ static void dn_dst_update_pmtu(struct dst_entry *dst, u32 mtu)
 	}
 }
 
+static void dn_dst_redirect(struct dst_entry *dst, struct sk_buff *skb)
+{
+}
+
 /*
  * When a route has been marked obsolete. (e.g. routing cache flush)
  */
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index e98207d..23bbe29 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2591,6 +2591,10 @@ static void ipv4_rt_blackhole_update_pmtu(struct dst_entry *dst, u32 mtu)
 {
 }
 
+static void ipv4_rt_blackhole_redirect(struct dst_entry *dst, struct sk_buff *skb)
+{
+}
+
 static u32 *ipv4_rt_blackhole_cow_metrics(struct dst_entry *dst,
 					  unsigned long old)
 {
@@ -2605,6 +2609,7 @@ static struct dst_ops ipv4_dst_blackhole_ops = {
 	.mtu			=	ipv4_blackhole_mtu,
 	.default_advmss		=	ipv4_default_advmss,
 	.update_pmtu		=	ipv4_rt_blackhole_update_pmtu,
+	.redirect		=	ipv4_rt_blackhole_redirect,
 	.cow_metrics		=	ipv4_rt_blackhole_cow_metrics,
 	.neigh_lookup		=	ipv4_neigh_lookup,
 };
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 7296af1..3151aab 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -191,6 +191,10 @@ static void ip6_rt_blackhole_update_pmtu(struct dst_entry *dst, u32 mtu)
 {
 }
 
+static void ip6_rt_blackhole_redirect(struct dst_entry *dst, struct sk_buff *skb)
+{
+}
+
 static u32 *ip6_rt_blackhole_cow_metrics(struct dst_entry *dst,
 					 unsigned long old)
 {
@@ -205,6 +209,7 @@ static struct dst_ops ip6_dst_blackhole_ops = {
 	.mtu			=	ip6_blackhole_mtu,
 	.default_advmss		=	ip6_default_advmss,
 	.update_pmtu		=	ip6_rt_blackhole_update_pmtu,
+	.redirect		=	ip6_rt_blackhole_redirect,
 	.cow_metrics		=	ip6_rt_blackhole_cow_metrics,
 	.neigh_lookup		=	ip6_neigh_lookup,
 };
-- 
1.7.10.4


* [PATCH 14/16] ipv6: Use icmpv6_notify() to propagate redirect, instead of rt6_redirect().
From: David Miller @ 2012-07-12  8:12 UTC
  To: netdev


And delete rt6_redirect(), since it is no longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ip6_route.h |    2 -
 include/net/ipv6.h      |    2 +
 net/ipv6/icmp.c         |    2 +-
 net/ipv6/ndisc.c        |    2 +-
 net/ipv6/route.c        |  107 -----------------------------------------------
 5 files changed, 4 insertions(+), 111 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 6939405..b6b6f7d 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -133,8 +133,6 @@ extern int			rt6_route_rcv(struct net_device *dev,
 					      u8 *opt, int len,
 					      const struct in6_addr *gwaddr);
 
-extern void			rt6_redirect(struct sk_buff *skb);
-
 extern void ip6_update_pmtu(struct sk_buff *skb, struct net *net, __be32 mtu,
 			    int oif, u32 mark);
 extern void ip6_sk_update_pmtu(struct sk_buff *skb, struct sock *sk,
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index d4261d4..f695f39 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -251,6 +251,8 @@ static inline void fl6_sock_release(struct ip6_flowlabel *fl)
 		atomic_dec(&fl->users);
 }
 
+extern void icmpv6_notify(struct sk_buff *skb, u8 type, u8 code, __be32 info);
+
 extern int 			ip6_ra_control(struct sock *sk, int sel);
 
 extern int			ipv6_parse_hopopts(struct sk_buff *skb);
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index a113f7d..24d69db 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -598,7 +598,7 @@ out:
 	icmpv6_xmit_unlock(sk);
 }
 
-static void icmpv6_notify(struct sk_buff *skb, u8 type, u8 code, __be32 info)
+void icmpv6_notify(struct sk_buff *skb, u8 type, u8 code, __be32 info)
 {
 	const struct inet6_protocol *ipprot;
 	int inner_offset;
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index b8d53e1..ff36194 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1350,7 +1350,7 @@ static void ndisc_redirect_rcv(struct sk_buff *skb)
 		return;
 	}
 
-	rt6_redirect(skb);
+	icmpv6_notify(skb, NDISC_REDIRECT, 0, 0);
 }
 
 void ndisc_send_redirect(struct sk_buff *skb, const struct in6_addr *target)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f52cf83..7296af1 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1633,92 +1633,6 @@ static int ip6_route_del(struct fib6_config *cfg)
 	return err;
 }
 
-/*
- *	Handle redirects
- */
-struct ip6rd_flowi {
-	struct flowi6 fl6;
-	struct in6_addr gateway;
-};
-
-static struct rt6_info *__ip6_route_redirect(struct net *net,
-					     struct fib6_table *table,
-					     struct flowi6 *fl6,
-					     int flags)
-{
-	struct ip6rd_flowi *rdfl = (struct ip6rd_flowi *)fl6;
-	struct rt6_info *rt;
-	struct fib6_node *fn;
-
-	/*
-	 * Get the "current" route for this destination and
-	 * check if the redirect has come from approriate router.
-	 *
-	 * RFC 2461 specifies that redirects should only be
-	 * accepted if they come from the nexthop to the target.
-	 * Due to the way the routes are chosen, this notion
-	 * is a bit fuzzy and one might need to check all possible
-	 * routes.
-	 */
-
-	read_lock_bh(&table->tb6_lock);
-	fn = fib6_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
-restart:
-	for (rt = fn->leaf; rt; rt = rt->dst.rt6_next) {
-		/*
-		 * Current route is on-link; redirect is always invalid.
-		 *
-		 * Seems, previous statement is not true. It could
-		 * be node, which looks for us as on-link (f.e. proxy ndisc)
-		 * But then router serving it might decide, that we should
-		 * know truth 8)8) --ANK (980726).
-		 */
-		if (rt6_check_expired(rt))
-			continue;
-		if (!(rt->rt6i_flags & RTF_GATEWAY))
-			continue;
-		if (fl6->flowi6_oif != rt->dst.dev->ifindex)
-			continue;
-		if (!ipv6_addr_equal(&rdfl->gateway, &rt->rt6i_gateway))
-			continue;
-		break;
-	}
-
-	if (!rt)
-		rt = net->ipv6.ip6_null_entry;
-	BACKTRACK(net, &fl6->saddr);
-out:
-	dst_hold(&rt->dst);
-
-	read_unlock_bh(&table->tb6_lock);
-
-	return rt;
-};
-
-static struct rt6_info *ip6_route_redirect(const struct in6_addr *dest,
-					   const struct in6_addr *src,
-					   const struct in6_addr *gateway,
-					   struct net_device *dev)
-{
-	int flags = RT6_LOOKUP_F_HAS_SADDR;
-	struct net *net = dev_net(dev);
-	struct ip6rd_flowi rdfl = {
-		.fl6 = {
-			.flowi6_oif = dev->ifindex,
-			.daddr = *dest,
-			.saddr = *src,
-		},
-	};
-
-	rdfl.gateway = *gateway;
-
-	if (rt6_need_strict(dest))
-		flags |= RT6_LOOKUP_F_IFACE;
-
-	return (struct rt6_info *)fib6_rule_lookup(net, &rdfl.fl6,
-						   flags, __ip6_route_redirect);
-}
-
 static void rt6_do_redirect(struct dst_entry *dst, struct sk_buff *skb)
 {
 	struct net *net = dev_net(skb->dev);
@@ -1848,27 +1762,6 @@ out:
 	neigh_release(neigh);
 }
 
-void rt6_redirect(struct sk_buff *skb)
-{
-	const struct in6_addr *target;
-	const struct in6_addr *dest;
-	const struct in6_addr *src;
-	const struct in6_addr *saddr;
-	struct icmp6hdr *icmph;
-	struct rt6_info *rt;
-
-	icmph = icmp6_hdr(skb);
-	target = (const struct in6_addr *) (icmph + 1);
-	dest = target + 1;
-
-	src = &ipv6_hdr(skb)->daddr;
-	saddr = &ipv6_hdr(skb)->saddr;
-
-	rt = ip6_route_redirect(dest, src, saddr, skb->dev);
-	rt6_do_redirect(&rt->dst, skb);
-	dst_release(&rt->dst);
-}
-
 /*
  *	Misc support functions
  */
-- 
1.7.10.4


* [PATCH 13/16] ipv6: Add redirect support to all protocol icmp error handlers.
From: David Miller @ 2012-07-12  8:12 UTC
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/sctp/sctp.h |    2 ++
 net/dccp/ipv6.c         |    7 +++++++
 net/ipv6/ah6.c          |   10 ++++++----
 net/ipv6/esp6.c         |   11 +++++++----
 net/ipv6/ip6_tunnel.c   |    7 +++++++
 net/ipv6/ipcomp6.c      |   11 +++++++----
 net/ipv6/raw.c          |    2 ++
 net/ipv6/sit.c          |    8 ++++++++
 net/ipv6/tcp_ipv6.c     |    7 +++++++
 net/ipv6/udp.c          |    2 ++
 net/ipv6/xfrm6_policy.c |    9 +++++++++
 net/sctp/input.c        |    4 ++--
 net/sctp/ipv6.c         |    3 +++
 13 files changed, 69 insertions(+), 14 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index a2ef814..1f2735d 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -162,6 +162,8 @@ struct sock *sctp_err_lookup(int family, struct sk_buff *,
 void sctp_err_finish(struct sock *, struct sctp_association *);
 void sctp_icmp_frag_needed(struct sock *, struct sctp_association *,
 			   struct sctp_transport *t, __u32 pmtu);
+void sctp_icmp_redirect(struct sock *, struct sctp_transport *,
+			struct sk_buff *);
 void sctp_icmp_proto_unreachable(struct sock *sk,
 				 struct sctp_association *asoc,
 				 struct sctp_transport *t);
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 02162cf..b4d7d28 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -130,6 +130,13 @@ static void dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 
 	np = inet6_sk(sk);
 
+	if (type == NDISC_REDIRECT) {
+		struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie);
+
+		if (dst && dst->ops->redirect)
+			dst->ops->redirect(dst, skb);
+	}
+
 	if (type == ICMPV6_PKT_TOOBIG) {
 		struct dst_entry *dst = NULL;
 
diff --git a/net/ipv6/ah6.c b/net/ipv6/ah6.c
index 49d4d26..7e61395 100644
--- a/net/ipv6/ah6.c
+++ b/net/ipv6/ah6.c
@@ -613,16 +613,18 @@ static void ah6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	struct xfrm_state *x;
 
 	if (type != ICMPV6_DEST_UNREACH &&
-	    type != ICMPV6_PKT_TOOBIG)
+	    type != ICMPV6_PKT_TOOBIG &&
+	    type != NDISC_REDIRECT)
 		return;
 
 	x = xfrm_state_lookup(net, skb->mark, (xfrm_address_t *)&iph->daddr, ah->spi, IPPROTO_AH, AF_INET6);
 	if (!x)
 		return;
 
-	NETDEBUG(KERN_DEBUG "pmtu discovery on SA AH/%08x/%pI6\n",
-		 ntohl(ah->spi), &iph->daddr);
-	ip6_update_pmtu(skb, net, info, 0, 0);
+	if (type == NDISC_REDIRECT)
+		ip6_redirect(skb, net, 0, 0);
+	else
+		ip6_update_pmtu(skb, net, info, 0, 0);
 	xfrm_state_put(x);
 }
 
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 89a615b..6dc7fd3 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -434,16 +434,19 @@ static void esp6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	struct xfrm_state *x;
 
 	if (type != ICMPV6_DEST_UNREACH &&
-	    type != ICMPV6_PKT_TOOBIG)
+	    type != ICMPV6_PKT_TOOBIG &&
+	    type != NDISC_REDIRECT)
 		return;
 
 	x = xfrm_state_lookup(net, skb->mark, (const xfrm_address_t *)&iph->daddr,
 			      esph->spi, IPPROTO_ESP, AF_INET6);
 	if (!x)
 		return;
-	pr_debug("pmtu discovery on SA ESP/%08x/%pI6\n",
-		 ntohl(esph->spi), &iph->daddr);
-	ip6_update_pmtu(skb, net, info, 0, 0);
+
+	if (type == NDISC_REDIRECT)
+		ip6_redirect(skb, net, 0, 0);
+	else
+		ip6_update_pmtu(skb, net, info, 0, 0);
 	xfrm_state_put(x);
 }
 
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 6af3fcf..0b5b60e 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -550,6 +550,9 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 		rel_type = ICMP_DEST_UNREACH;
 		rel_code = ICMP_FRAG_NEEDED;
 		break;
+	case NDISC_REDIRECT:
+		rel_type = ICMP_REDIRECT;
+		rel_code = ICMP_REDIR_HOST;
 	default:
 		return 0;
 	}
@@ -608,6 +611,10 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 
 		skb_dst(skb2)->ops->update_pmtu(skb_dst(skb2), rel_info);
 	}
+	if (rel_type == ICMP_REDIRECT) {
+		if (skb_dst(skb2)->ops->redirect)
+			skb_dst(skb2)->ops->redirect(skb_dst(skb2), skb2);
+	}
 
 	icmp_send(skb2, rel_type, rel_code, htonl(rel_info));
 
diff --git a/net/ipv6/ipcomp6.c b/net/ipv6/ipcomp6.c
index 9283238..7af5aee 100644
--- a/net/ipv6/ipcomp6.c
+++ b/net/ipv6/ipcomp6.c
@@ -64,7 +64,9 @@ static void ipcomp6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 		(struct ip_comp_hdr *)(skb->data + offset);
 	struct xfrm_state *x;
 
-	if (type != ICMPV6_DEST_UNREACH && type != ICMPV6_PKT_TOOBIG)
+	if (type != ICMPV6_DEST_UNREACH &&
+	    type != ICMPV6_PKT_TOOBIG &&
+	    type != NDISC_REDIRECT)
 		return;
 
 	spi = htonl(ntohs(ipcomph->cpi));
@@ -73,9 +75,10 @@ static void ipcomp6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	if (!x)
 		return;
 
-	pr_debug("pmtu discovery on SA IPCOMP/%08x/%pI6\n",
-		 spi, &iph->daddr);
-	ip6_update_pmtu(skb, net, info, 0, 0);
+	if (type == NDISC_REDIRECT)
+		ip6_redirect(skb, net, 0, 0);
+	else
+		ip6_update_pmtu(skb, net, info, 0, 0);
 	xfrm_state_put(x);
 }
 
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index b5c1dcb..ef0579d 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -332,6 +332,8 @@ static void rawv6_err(struct sock *sk, struct sk_buff *skb,
 		ip6_sk_update_pmtu(skb, sk, info);
 		harderr = (np->pmtudisc == IPV6_PMTUDISC_DO);
 	}
+	if (type == NDISC_REDIRECT)
+		ip6_sk_redirect(skb, sk);
 	if (np->recverr) {
 		u8 *payload = skb->data;
 		if (!inet->hdrincl)
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 49aea94..fbf1622 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -539,6 +539,8 @@ static int ipip6_err(struct sk_buff *skb, u32 info)
 		if (code != ICMP_EXC_TTL)
 			return 0;
 		break;
+	case ICMP_REDIRECT:
+		break;
 	}
 
 	err = -ENOENT;
@@ -557,6 +559,12 @@ static int ipip6_err(struct sk_buff *skb, u32 info)
 		err = 0;
 		goto out;
 	}
+	if (type == ICMP_REDIRECT) {
+		ipv4_redirect(skb, dev_net(skb->dev), t->dev->ifindex, 0,
+			      IPPROTO_IPV6, 0);
+		err = 0;
+		goto out;
+	}
 
 	if (t->parms.iph.daddr == 0)
 		goto out;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 70458a9..7249e4b 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -363,6 +363,13 @@ static void tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 
 	np = inet6_sk(sk);
 
+	if (type == NDISC_REDIRECT) {
+		struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie);
+
+		if (dst && dst->ops->redirect)
+			dst->ops->redirect(dst,skb);
+	}
+
 	if (type == ICMPV6_PKT_TOOBIG) {
 		struct dst_entry *dst;
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 1ecd102..99d0077 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -483,6 +483,8 @@ void __udp6_lib_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 
 	if (type == ICMPV6_PKT_TOOBIG)
 		ip6_sk_update_pmtu(skb, sk, info);
+	if (type == NDISC_REDIRECT)
+		ip6_sk_redirect(skb, sk);
 
 	np = inet6_sk(sk);
 
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index bb02038..f5a9cb8 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -215,6 +215,14 @@ static void xfrm6_update_pmtu(struct dst_entry *dst, u32 mtu)
 	path->ops->update_pmtu(path, mtu);
 }
 
+static void xfrm6_redirect(struct dst_entry *dst, struct sk_buff *skb)
+{
+	struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
+	struct dst_entry *path = xdst->route;
+
+	path->ops->redirect(path, skb);
+}
+
 static void xfrm6_dst_destroy(struct dst_entry *dst)
 {
 	struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
@@ -261,6 +269,7 @@ static struct dst_ops xfrm6_dst_ops = {
 	.protocol =		cpu_to_be16(ETH_P_IPV6),
 	.gc =			xfrm6_garbage_collect,
 	.update_pmtu =		xfrm6_update_pmtu,
+	.redirect =		xfrm6_redirect,
 	.cow_metrics =		dst_cow_metrics_generic,
 	.destroy =		xfrm6_dst_destroy,
 	.ifdown =		xfrm6_dst_ifdown,
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 9fb4247..5943b7d 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -423,8 +423,8 @@ void sctp_icmp_frag_needed(struct sock *sk, struct sctp_association *asoc,
 	sctp_retransmit(&asoc->outqueue, t, SCTP_RTXR_PMTUD);
 }
 
-static void sctp_icmp_redirect(struct sock *sk, struct sctp_transport *t,
-			       struct sk_buff *skb)
+void sctp_icmp_redirect(struct sock *sk, struct sctp_transport *t,
+			struct sk_buff *skb)
 {
 	struct dst_entry *dst;
 
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 91f4791..ed7139e 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -185,6 +185,9 @@ SCTP_STATIC void sctp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 			goto out_unlock;
 		}
 		break;
+	case NDISC_REDIRECT:
+		sctp_icmp_redirect(sk, transport, skb);
+		break;
 	default:
 		break;
 	}
-- 
1.7.10.4


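In patch 13/16, the transform handlers (ah6_err(), esp6_err() and ipcomp6_err()) now make a small decision on the ICMPv6 type. Redirects go to ip6_redirect(), Destination Unreachable and Packet Too Big go to ip6_update_pmtu(), and everything else returns early. The user-space sketch below models only that decision. The enum and function names are hypothetical, the xfrm_state_lookup() step is omitted, and the numeric type values are the ones assigned in RFC 4443 and RFC 4861.

```c
/* ICMPv6 type values: RFC 4443 sections 3.1 and 3.2, RFC 4861 section 4.5. */
#define ICMPV6_DEST_UNREACH	1
#define ICMPV6_PKT_TOOBIG	2
#define NDISC_REDIRECT		137

/* Hypothetical labels for what the patched handlers end up doing. */
enum xfrm6_err_action {
	XFRM6_ERR_IGNORE,	/* handler returns early */
	XFRM6_ERR_UPDATE_PMTU,	/* ip6_update_pmtu(skb, net, info, 0, 0) */
	XFRM6_ERR_REDIRECT,	/* ip6_redirect(skb, net, 0, 0) */
};

/* Same type filtering as ah6_err()/esp6_err()/ipcomp6_err() after the
 * patch, minus the xfrm_state_lookup() that precedes the helper call. */
enum xfrm6_err_action xfrm6_err_action(int type)
{
	if (type != ICMPV6_DEST_UNREACH &&
	    type != ICMPV6_PKT_TOOBIG &&
	    type != NDISC_REDIRECT)
		return XFRM6_ERR_IGNORE;

	if (type == NDISC_REDIRECT)
		return XFRM6_ERR_REDIRECT;
	return XFRM6_ERR_UPDATE_PMTU;
}
```

As in the patched handlers, Destination Unreachable still takes the PMTU path. Only NDISC_REDIRECT is routed differently.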
* [PATCH 12/16] ipv6: Add ip6_redirect() and ip6_sk_redirect() helper functions.
From: David Miller @ 2012-07-12  8:12 UTC
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ip6_route.h |    2 ++
 net/ipv6/route.c        |   27 +++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 5cedbd7..6939405 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -139,6 +139,8 @@ extern void ip6_update_pmtu(struct sk_buff *skb, struct net *net, __be32 mtu,
 			    int oif, u32 mark);
 extern void ip6_sk_update_pmtu(struct sk_buff *skb, struct sock *sk,
 			       __be32 mtu);
+extern void ip6_redirect(struct sk_buff *skb, struct net *net, int oif, u32 mark);
+extern void ip6_sk_redirect(struct sk_buff *skb, struct sock *sk);
 
 struct netlink_callback;
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 545b152..f52cf83 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1114,6 +1114,33 @@ void ip6_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, __be32 mtu)
 }
 EXPORT_SYMBOL_GPL(ip6_sk_update_pmtu);
 
+void ip6_redirect(struct sk_buff *skb, struct net *net, int oif, u32 mark)
+{
+	const struct ipv6hdr *iph = (struct ipv6hdr *) skb->data;
+	struct dst_entry *dst;
+	struct flowi6 fl6;
+
+	memset(&fl6, 0, sizeof(fl6));
+	fl6.flowi6_oif = oif;
+	fl6.flowi6_mark = mark;
+	fl6.flowi6_flags = 0;
+	fl6.daddr = iph->daddr;
+	fl6.saddr = iph->saddr;
+	fl6.flowlabel = (*(__be32 *) iph) & IPV6_FLOWINFO_MASK;
+
+	dst = ip6_route_output(net, NULL, &fl6);
+	if (!dst->error)
+		rt6_do_redirect(dst, skb);
+	dst_release(dst);
+}
+EXPORT_SYMBOL_GPL(ip6_redirect);
+
+void ip6_sk_redirect(struct sk_buff *skb, struct sock *sk)
+{
+	ip6_redirect(skb, sock_net(sk), sk->sk_bound_dev_if, sk->sk_mark);
+}
+EXPORT_SYMBOL_GPL(ip6_sk_redirect);
+
 static unsigned int ip6_default_advmss(const struct dst_entry *dst)
 {
 	struct net_device *dev = dst->dev;
-- 
1.7.10.4


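ip6_redirect() from patch 12/16 builds its route lookup key from the IPv6 header at skb->data. The output interface and mark come from the caller, and both addresses come from the header. The flow label field is the first header word masked with IPV6_FLOWINFO_MASK, which in the kernel is cpu_to_be32(0x0FFFFFFF). That mask keeps traffic class and flow label and drops the version nibble. The following is a hedged user-space approximation. struct flow6_key and build_redirect_flow() are hypothetical stand-ins for struct flowi6 and the patch's code, not kernel APIs.

```c
#include <arpa/inet.h>	/* htonl(), ntohl() */
#include <stdint.h>
#include <string.h>

/* Host-order value of the kernel's IPV6_FLOWINFO_MASK: traffic class plus
 * flow label, i.e. the first header word without the 4-bit version. */
#define FLOWINFO_MASK_HOST	0x0FFFFFFFu

/* Hypothetical stand-in for the struct flowi6 fields ip6_redirect() sets. */
struct flow6_key {
	int oif;
	uint32_t mark;
	uint8_t daddr[16];
	uint8_t saddr[16];
	uint32_t flowinfo;	/* network byte order, like fl6.flowlabel */
};

/* hdr points at a 40-byte IPv6 header. */
void build_redirect_flow(const uint8_t *hdr, int oif, uint32_t mark,
			 struct flow6_key *key)
{
	uint32_t word0;

	memset(key, 0, sizeof(*key));
	key->oif = oif;
	key->mark = mark;
	memcpy(key->saddr, hdr + 8, 16);	/* source address: bytes 8..23 */
	memcpy(key->daddr, hdr + 24, 16);	/* destination: bytes 24..39 */

	/* memcpy avoids the unaligned load a direct pointer cast could cause. */
	memcpy(&word0, hdr, sizeof(word0));
	key->flowinfo = word0 & htonl(FLOWINFO_MASK_HOST);
}
```

As in the kernel code, the masked value stays in network byte order. ntohl() is only needed when inspecting it.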
* [PATCH 11/16] ipv6: Pull main logic of rt6_redirect() into rt6_do_redirect().
From: David Miller @ 2012-07-12  8:11 UTC
  To: netdev


Hook it into dst_ops->redirect as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv6/route.c |   80 +++++++++++++++++++++++++++++++++---------------------
 1 file changed, 49 insertions(+), 31 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 73cf3f78..545b152 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -79,6 +79,7 @@ static int		ip6_pkt_discard(struct sk_buff *skb);
 static int		ip6_pkt_discard_out(struct sk_buff *skb);
 static void		ip6_link_failure(struct sk_buff *skb);
 static void		ip6_rt_update_pmtu(struct dst_entry *dst, u32 mtu);
+static void		rt6_do_redirect(struct dst_entry *dst, struct sk_buff *skb);
 
 #ifdef CONFIG_IPV6_ROUTE_INFO
 static struct rt6_info *rt6_add_route_info(struct net *net,
@@ -174,6 +175,7 @@ static struct dst_ops ip6_dst_ops_template = {
 	.negative_advice	=	ip6_negative_advice,
 	.link_failure		=	ip6_link_failure,
 	.update_pmtu		=	ip6_rt_update_pmtu,
+	.redirect		=	rt6_do_redirect,
 	.local_out		=	__ip6_local_out,
 	.neigh_lookup		=	ip6_neigh_lookup,
 };
@@ -1690,28 +1692,26 @@ static struct rt6_info *ip6_route_redirect(const struct in6_addr *dest,
 						   flags, __ip6_route_redirect);
 }
 
-void rt6_redirect(struct sk_buff *skb)
+static void rt6_do_redirect(struct dst_entry *dst, struct sk_buff *skb)
 {
 	struct net *net = dev_net(skb->dev);
 	struct netevent_redirect netevent;
 	struct rt6_info *rt, *nrt = NULL;
 	const struct in6_addr *target;
-	struct neighbour *old_neigh;
-	const struct in6_addr *dest;
-	const struct in6_addr *src;
-	const struct in6_addr *saddr;
 	struct ndisc_options ndopts;
+	const struct in6_addr *dest;
+	struct neighbour *old_neigh;
 	struct inet6_dev *in6_dev;
 	struct neighbour *neigh;
 	struct icmp6hdr *icmph;
-	int on_link, optlen;
-	u8 *lladdr = NULL;
+	int optlen, on_link;
+	u8 *lladdr;
 
 	optlen = skb->tail - skb->transport_header;
 	optlen -= sizeof(struct icmp6hdr) + 2 * sizeof(struct in6_addr);
 
 	if (optlen < 0) {
-		net_dbg_ratelimited("rt6_redirect: packet too short\n");
+		net_dbg_ratelimited("rt6_do_redirect: packet too short\n");
 		return;
 	}
 
@@ -1720,15 +1720,16 @@ void rt6_redirect(struct sk_buff *skb)
 	dest = target + 1;
 
 	if (ipv6_addr_is_multicast(dest)) {
-		net_dbg_ratelimited("rt6_redirect: destination address is multicast\n");
+		net_dbg_ratelimited("rt6_do_redirect: destination address is multicast\n");
 		return;
 	}
 
+	on_link = 0;
 	if (ipv6_addr_equal(dest, target)) {
 		on_link = 1;
 	} else if (ipv6_addr_type(target) !=
 		   (IPV6_ADDR_UNICAST|IPV6_ADDR_LINKLOCAL)) {
-		net_dbg_ratelimited("rt6_redirect: target address is not link-local unicast\n");
+		net_dbg_ratelimited("rt6_do_redirect: target address is not link-local unicast\n");
 		return;
 	}
 
@@ -1747,6 +1748,8 @@ void rt6_redirect(struct sk_buff *skb)
 		net_dbg_ratelimited("rt6_redirect: invalid ND options\n");
 		return;
 	}
+
+	lladdr = NULL;
 	if (ndopts.nd_opts_tgt_lladdr) {
 		lladdr = ndisc_opt_addr_data(ndopts.nd_opts_tgt_lladdr,
 					     skb->dev);
@@ -1756,19 +1759,26 @@ void rt6_redirect(struct sk_buff *skb)
 		}
 	}
 
-	neigh = __neigh_lookup(&nd_tbl, target, skb->dev, 1);
-	if (!neigh)
+	rt = (struct rt6_info *) dst;
+	if (rt == net->ipv6.ip6_null_entry) {
+		net_dbg_ratelimited("rt6_redirect: source isn't a valid nexthop for redirect target\n");
 		return;
+	}
 
-	src = &ipv6_hdr(skb)->daddr;
-	saddr = &ipv6_hdr(skb)->saddr;
+	/* Redirect received -> path was valid.
+	 * Look, redirects are sent only in response to data packets,
+	 * so that this nexthop apparently is reachable. --ANK
+	 */
+	dst_confirm(&rt->dst);
 
-	rt = ip6_route_redirect(dest, src, saddr, neigh->dev);
+	neigh = __neigh_lookup(&nd_tbl, target, skb->dev, 1);
+	if (!neigh)
+		return;
 
-	if (rt == net->ipv6.ip6_null_entry) {
-		net_dbg_ratelimited("rt6_redirect: source isn't a valid nexthop for redirect target\n");
+	/* Duplicate redirect: silently ignore. */
+	old_neigh = rt->n;
+	if (neigh == old_neigh)
 		goto out;
-	}
 
 	/*
 	 *	We have finally decided to accept it.
@@ -1781,18 +1791,6 @@ void rt6_redirect(struct sk_buff *skb)
 				     NEIGH_UPDATE_F_ISROUTER))
 		     );
 
-	/*
-	 * Redirect received -> path was valid.
-	 * Look, redirects are sent only in response to data packets,
-	 * so that this nexthop apparently is reachable. --ANK
-	 */
-	dst_confirm(&rt->dst);
-
-	/* Duplicate redirect: silently ignore. */
-	old_neigh = rt->n;
-	if (neigh == old_neigh)
-		goto out;
-
 	nrt = ip6_rt_copy(rt, dest);
 	if (!nrt)
 		goto out;
@@ -1815,12 +1813,32 @@ void rt6_redirect(struct sk_buff *skb)
 	call_netevent_notifiers(NETEVENT_REDIRECT, &netevent);
 
 	if (rt->rt6i_flags & RTF_CACHE) {
+		rt = (struct rt6_info *) dst_clone(&rt->dst);
 		ip6_del_rt(rt);
-		return;
 	}
 
 out:
 	neigh_release(neigh);
+}
+
+void rt6_redirect(struct sk_buff *skb)
+{
+	const struct in6_addr *target;
+	const struct in6_addr *dest;
+	const struct in6_addr *src;
+	const struct in6_addr *saddr;
+	struct icmp6hdr *icmph;
+	struct rt6_info *rt;
+
+	icmph = icmp6_hdr(skb);
+	target = (const struct in6_addr *) (icmph + 1);
+	dest = target + 1;
+
+	src = &ipv6_hdr(skb)->daddr;
+	saddr = &ipv6_hdr(skb)->saddr;
+
+	rt = ip6_route_redirect(dest, src, saddr, skb->dev);
+	rt6_do_redirect(&rt->dst, skb);
 	dst_release(&rt->dst);
 }
 
-- 
1.7.10.4


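Patch 11/16 installs rt6_do_redirect() as ip6_dst_ops_template.redirect. Any code holding a struct dst_entry can then hand a redirect to whatever route type it refers to. xfrm6_redirect(), added in patch 13/16, simply forwards to the underlying route. The sketch below models that dispatch in user space with hypothetical, heavily simplified types. It is meant to show the optional-callback pattern and the NULL check the protocol error handlers perform, not the kernel's real structures.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel types. */
struct packet;				/* opaque, plays the role of sk_buff */
struct dst_entry;

struct dst_ops {
	/* Optional hook: NULL means this route type ignores redirects. */
	void (*redirect)(struct dst_entry *dst, struct packet *pkt);
};

struct dst_entry {
	const struct dst_ops *ops;
	int redirects_seen;		/* instrumentation for the example */
};

/* Plays the role of rt6_do_redirect(): acts on the route itself. */
static void route_redirect(struct dst_entry *dst, struct packet *pkt)
{
	(void)pkt;
	dst->redirects_seen++;
}

/* Plays the role of struct xfrm_dst: a transform bundle on top of a route. */
struct xfrm_like_dst {
	struct dst_entry dst;		/* must stay first so the cast is valid */
	struct dst_entry *route;
};

/* Plays the role of xfrm6_redirect(): forward to the underlying route. */
static void xfrm_like_redirect(struct dst_entry *dst, struct packet *pkt)
{
	struct xfrm_like_dst *xdst = (struct xfrm_like_dst *)dst;

	xdst->route->ops->redirect(xdst->route, pkt);
}

const struct dst_ops route_ops = { .redirect = route_redirect };
const struct dst_ops xfrm_like_ops = { .redirect = xfrm_like_redirect };
const struct dst_ops no_redirect_ops = { .redirect = NULL };

/* The check each protocol error handler performs before calling the hook. */
void deliver_redirect(struct dst_entry *dst, struct packet *pkt)
{
	if (dst && dst->ops->redirect)
		dst->ops->redirect(dst, pkt);
}
```

A redirect delivered through the transform wrapper lands on the plain route it wraps. A route type without a hook, or a missing cached dst, is silently skipped.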
* [PATCH 10/16] ipv6: Move bulk of redirect handling into rt6_redirect().
From: David Miller @ 2012-07-12  8:11 UTC
  To: netdev


This sets things up so that we can have the protocol error handlers
call down into the ipv6 route code for redirects just as ipv4 already
does.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ip6_route.h |    7 +----
 net/ipv6/ndisc.c        |   72 +--------------------------------------------
 net/ipv6/route.c        |   75 +++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 72 insertions(+), 82 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 58cb3fc..5cedbd7 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -133,12 +133,7 @@ extern int			rt6_route_rcv(struct net_device *dev,
 					      u8 *opt, int len,
 					      const struct in6_addr *gwaddr);
 
-extern void			rt6_redirect(const struct in6_addr *dest,
-					     const struct in6_addr *src,
-					     const struct in6_addr *saddr,
-					     struct neighbour *neigh,
-					     u8 *lladdr,
-					     int on_link);
+extern void			rt6_redirect(struct sk_buff *skb);
 
 extern void ip6_update_pmtu(struct sk_buff *skb, struct net *net, __be32 mtu,
 			    int oif, u32 mark);
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index a3189ba..b8d53e1 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -143,8 +143,6 @@ struct neigh_table nd_tbl = {
 	.gc_thresh3 =	1024,
 };
 
-#define NDISC_OPT_SPACE(len) (((len)+2+7)&~7)
-
 static inline int ndisc_opt_addr_space(struct net_device *dev)
 {
 	return NDISC_OPT_SPACE(dev->addr_len + ndisc_addr_option_pad(dev->type));
@@ -1336,16 +1334,6 @@ out:
 
 static void ndisc_redirect_rcv(struct sk_buff *skb)
 {
-	struct inet6_dev *in6_dev;
-	struct icmp6hdr *icmph;
-	const struct in6_addr *dest;
-	const struct in6_addr *target;	/* new first hop to destination */
-	struct neighbour *neigh;
-	int on_link = 0;
-	struct ndisc_options ndopts;
-	int optlen;
-	u8 *lladdr = NULL;
-
 #ifdef CONFIG_IPV6_NDISC_NODETYPE
 	switch (skb->ndisc_nodetype) {
 	case NDISC_NODETYPE_HOST:
@@ -1362,65 +1350,7 @@ static void ndisc_redirect_rcv(struct sk_buff *skb)
 		return;
 	}
 
-	optlen = skb->tail - skb->transport_header;
-	optlen -= sizeof(struct icmp6hdr) + 2 * sizeof(struct in6_addr);
-
-	if (optlen < 0) {
-		ND_PRINTK(2, warn, "Redirect: packet too short\n");
-		return;
-	}
-
-	icmph = icmp6_hdr(skb);
-	target = (const struct in6_addr *) (icmph + 1);
-	dest = target + 1;
-
-	if (ipv6_addr_is_multicast(dest)) {
-		ND_PRINTK(2, warn,
-			  "Redirect: destination address is multicast\n");
-		return;
-	}
-
-	if (ipv6_addr_equal(dest, target)) {
-		on_link = 1;
-	} else if (ipv6_addr_type(target) !=
-		   (IPV6_ADDR_UNICAST|IPV6_ADDR_LINKLOCAL)) {
-		ND_PRINTK(2, warn,
-			  "Redirect: target address is not link-local unicast\n");
-		return;
-	}
-
-	in6_dev = __in6_dev_get(skb->dev);
-	if (!in6_dev)
-		return;
-	if (in6_dev->cnf.forwarding || !in6_dev->cnf.accept_redirects)
-		return;
-
-	/* RFC2461 8.1:
-	 *	The IP source address of the Redirect MUST be the same as the current
-	 *	first-hop router for the specified ICMP Destination Address.
-	 */
-
-	if (!ndisc_parse_options((u8*)(dest + 1), optlen, &ndopts)) {
-		ND_PRINTK(2, warn, "Redirect: invalid ND options\n");
-		return;
-	}
-	if (ndopts.nd_opts_tgt_lladdr) {
-		lladdr = ndisc_opt_addr_data(ndopts.nd_opts_tgt_lladdr,
-					     skb->dev);
-		if (!lladdr) {
-			ND_PRINTK(2, warn,
-				  "Redirect: invalid link-layer address length\n");
-			return;
-		}
-	}
-
-	neigh = __neigh_lookup(&nd_tbl, target, skb->dev, 1);
-	if (neigh) {
-		rt6_redirect(dest, &ipv6_hdr(skb)->daddr,
-			     &ipv6_hdr(skb)->saddr, neigh, lladdr,
-			     on_link);
-		neigh_release(neigh);
-	}
+	rt6_redirect(skb);
 }
 
 void ndisc_send_redirect(struct sk_buff *skb, const struct in6_addr *target)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 563f12c..73cf3f78 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1690,14 +1690,78 @@ static struct rt6_info *ip6_route_redirect(const struct in6_addr *dest,
 						   flags, __ip6_route_redirect);
 }
 
-void rt6_redirect(const struct in6_addr *dest, const struct in6_addr *src,
-		  const struct in6_addr *saddr,
-		  struct neighbour *neigh, u8 *lladdr, int on_link)
+void rt6_redirect(struct sk_buff *skb)
 {
-	struct rt6_info *rt, *nrt = NULL;
+	struct net *net = dev_net(skb->dev);
 	struct netevent_redirect netevent;
-	struct net *net = dev_net(neigh->dev);
+	struct rt6_info *rt, *nrt = NULL;
+	const struct in6_addr *target;
 	struct neighbour *old_neigh;
+	const struct in6_addr *dest;
+	const struct in6_addr *src;
+	const struct in6_addr *saddr;
+	struct ndisc_options ndopts;
+	struct inet6_dev *in6_dev;
+	struct neighbour *neigh;
+	struct icmp6hdr *icmph;
+	int on_link, optlen;
+	u8 *lladdr = NULL;
+
+	optlen = skb->tail - skb->transport_header;
+	optlen -= sizeof(struct icmp6hdr) + 2 * sizeof(struct in6_addr);
+
+	if (optlen < 0) {
+		net_dbg_ratelimited("rt6_redirect: packet too short\n");
+		return;
+	}
+
+	icmph = icmp6_hdr(skb);
+	target = (const struct in6_addr *) (icmph + 1);
+	dest = target + 1;
+
+	if (ipv6_addr_is_multicast(dest)) {
+		net_dbg_ratelimited("rt6_redirect: destination address is multicast\n");
+		return;
+	}
+
+	if (ipv6_addr_equal(dest, target)) {
+		on_link = 1;
+	} else if (ipv6_addr_type(target) !=
+		   (IPV6_ADDR_UNICAST|IPV6_ADDR_LINKLOCAL)) {
+		net_dbg_ratelimited("rt6_redirect: target address is not link-local unicast\n");
+		return;
+	}
+
+	in6_dev = __in6_dev_get(skb->dev);
+	if (!in6_dev)
+		return;
+	if (in6_dev->cnf.forwarding || !in6_dev->cnf.accept_redirects)
+		return;
+
+	/* RFC2461 8.1:
+	 *	The IP source address of the Redirect MUST be the same as the current
+	 *	first-hop router for the specified ICMP Destination Address.
+	 */
+
+	if (!ndisc_parse_options((u8*)(dest + 1), optlen, &ndopts)) {
+		net_dbg_ratelimited("rt6_redirect: invalid ND options\n");
+		return;
+	}
+	if (ndopts.nd_opts_tgt_lladdr) {
+		lladdr = ndisc_opt_addr_data(ndopts.nd_opts_tgt_lladdr,
+					     skb->dev);
+		if (!lladdr) {
+			net_dbg_ratelimited("rt6_redirect: invalid link-layer address length\n");
+			return;
+		}
+	}
+
+	neigh = __neigh_lookup(&nd_tbl, target, skb->dev, 1);
+	if (!neigh)
+		return;
+
+	src = &ipv6_hdr(skb)->daddr;
+	saddr = &ipv6_hdr(skb)->saddr;
 
 	rt = ip6_route_redirect(dest, src, saddr, neigh->dev);
 
@@ -1756,6 +1820,7 @@ void rt6_redirect(const struct in6_addr *dest, const struct in6_addr *src,
 	}
 
 out:
+	neigh_release(neigh);
 	dst_release(&rt->dst);
 }
 
-- 
1.7.10.4


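Patch 10/16 moves the Redirect sanity checks from ndisc_redirect_rcv() into rt6_redirect(). The fixed part of the message is the 8-byte ICMPv6 header, followed by the 16-byte target and destination addresses. ND options come after that. A user-space sketch of those checks follows. check_redirect() and its verdict names are hypothetical. The sketch leaves out the per-device forwarding/accept_redirects test, ND option parsing, and the neighbour and route updates.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ICMP6_HDR_LEN	8	/* type, code, checksum, reserved */
#define IN6_ADDR_LEN	16

enum redirect_verdict {
	REDIRECT_OK,
	REDIRECT_TOO_SHORT,
	REDIRECT_DEST_MULTICAST,
	REDIRECT_TARGET_NOT_LL_UNICAST,
};

static int is_multicast(const uint8_t *a)
{
	return a[0] == 0xff;				/* ff00::/8 */
}

static int is_link_local_unicast(const uint8_t *a)
{
	return a[0] == 0xfe && (a[1] & 0xc0) == 0x80;	/* fe80::/10 */
}

/* msg points at the ICMPv6 header of a Redirect message of len bytes.
 * On REDIRECT_OK, *on_link, *opt_off and *opt_len are filled in. */
enum redirect_verdict check_redirect(const uint8_t *msg, size_t len,
				     int *on_link, size_t *opt_off,
				     size_t *opt_len)
{
	const size_t fixed = ICMP6_HDR_LEN + 2 * IN6_ADDR_LEN;
	const uint8_t *target, *dest;

	if (len < fixed)
		return REDIRECT_TOO_SHORT;

	target = msg + ICMP6_HDR_LEN;		/* better first hop */
	dest = target + IN6_ADDR_LEN;		/* destination being redirected */

	if (is_multicast(dest))
		return REDIRECT_DEST_MULTICAST;

	/* target == dest means the destination itself is on-link; otherwise
	 * the target must be a link-local unicast router address. */
	*on_link = memcmp(dest, target, IN6_ADDR_LEN) == 0;
	if (!*on_link && !is_link_local_unicast(target))
		return REDIRECT_TARGET_NOT_LL_UNICAST;

	*opt_off = fixed;			/* ND options follow dest */
	*opt_len = len - fixed;
	return REDIRECT_OK;
}
```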
* [PATCH 09/16] ipv6: Export ndisc option parsing from ndisc.c
From: David Miller @ 2012-07-12  8:11 UTC
  To: netdev


This is going to be used internally by the rt6 redirect code.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ndisc.h |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv6/ndisc.c    |   47 ++---------------------------------------------
 2 files changed, 52 insertions(+), 45 deletions(-)

diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index c02b6ad..96a3b5c 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -47,6 +47,8 @@ enum {
 #include <linux/icmpv6.h>
 #include <linux/in6.h>
 #include <linux/types.h>
+#include <linux/if_arp.h>
+#include <linux/netdevice.h>
 
 #include <net/neighbour.h>
 
@@ -80,6 +82,54 @@ struct nd_opt_hdr {
 	__u8		nd_opt_len;
 } __packed;
 
+/* ND options */
+struct ndisc_options {
+	struct nd_opt_hdr *nd_opt_array[__ND_OPT_ARRAY_MAX];
+#ifdef CONFIG_IPV6_ROUTE_INFO
+	struct nd_opt_hdr *nd_opts_ri;
+	struct nd_opt_hdr *nd_opts_ri_end;
+#endif
+	struct nd_opt_hdr *nd_useropts;
+	struct nd_opt_hdr *nd_useropts_end;
+};
+
+#define nd_opts_src_lladdr	nd_opt_array[ND_OPT_SOURCE_LL_ADDR]
+#define nd_opts_tgt_lladdr	nd_opt_array[ND_OPT_TARGET_LL_ADDR]
+#define nd_opts_pi		nd_opt_array[ND_OPT_PREFIX_INFO]
+#define nd_opts_pi_end		nd_opt_array[__ND_OPT_PREFIX_INFO_END]
+#define nd_opts_rh		nd_opt_array[ND_OPT_REDIRECT_HDR]
+#define nd_opts_mtu		nd_opt_array[ND_OPT_MTU]
+
+#define NDISC_OPT_SPACE(len) (((len)+2+7)&~7)
+
+extern struct ndisc_options *ndisc_parse_options(u8 *opt, int opt_len,
+						 struct ndisc_options *ndopts);
+
+/*
+ * Return the padding between the option length and the start of the
+ * link addr.  Currently only IP-over-InfiniBand needs this, although
+ * if RFC 3831 IPv6-over-Fibre Channel is ever implemented it may
+ * also need a pad of 2.
+ */
+static int ndisc_addr_option_pad(unsigned short type)
+{
+	switch (type) {
+	case ARPHRD_INFINIBAND: return 2;
+	default:                return 0;
+	}
+}
+
+static inline u8 *ndisc_opt_addr_data(struct nd_opt_hdr *p,
+				      struct net_device *dev)
+{
+	u8 *lladdr = (u8 *)(p + 1);
+	int lladdrlen = p->nd_opt_len << 3;
+	int prepad = ndisc_addr_option_pad(dev->type);
+	if (lladdrlen != NDISC_OPT_SPACE(dev->addr_len + prepad))
+		return NULL;
+	return lladdr + prepad;
+}
+
 static inline u32 ndisc_hashfn(const void *pkey, const struct net_device *dev, __u32 *hash_rnd)
 {
 	const u32 *p32 = pkey;
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 0fddd57..a3189ba 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -143,40 +143,8 @@ struct neigh_table nd_tbl = {
 	.gc_thresh3 =	1024,
 };
 
-/* ND options */
-struct ndisc_options {
-	struct nd_opt_hdr *nd_opt_array[__ND_OPT_ARRAY_MAX];
-#ifdef CONFIG_IPV6_ROUTE_INFO
-	struct nd_opt_hdr *nd_opts_ri;
-	struct nd_opt_hdr *nd_opts_ri_end;
-#endif
-	struct nd_opt_hdr *nd_useropts;
-	struct nd_opt_hdr *nd_useropts_end;
-};
-
-#define nd_opts_src_lladdr	nd_opt_array[ND_OPT_SOURCE_LL_ADDR]
-#define nd_opts_tgt_lladdr	nd_opt_array[ND_OPT_TARGET_LL_ADDR]
-#define nd_opts_pi		nd_opt_array[ND_OPT_PREFIX_INFO]
-#define nd_opts_pi_end		nd_opt_array[__ND_OPT_PREFIX_INFO_END]
-#define nd_opts_rh		nd_opt_array[ND_OPT_REDIRECT_HDR]
-#define nd_opts_mtu		nd_opt_array[ND_OPT_MTU]
-
 #define NDISC_OPT_SPACE(len) (((len)+2+7)&~7)
 
-/*
- * Return the padding between the option length and the start of the
- * link addr.  Currently only IP-over-InfiniBand needs this, although
- * if RFC 3831 IPv6-over-Fibre Channel is ever implemented it may
- * also need a pad of 2.
- */
-static int ndisc_addr_option_pad(unsigned short type)
-{
-	switch (type) {
-	case ARPHRD_INFINIBAND: return 2;
-	default:                return 0;
-	}
-}
-
 static inline int ndisc_opt_addr_space(struct net_device *dev)
 {
 	return NDISC_OPT_SPACE(dev->addr_len + ndisc_addr_option_pad(dev->type));
@@ -233,8 +201,8 @@ static struct nd_opt_hdr *ndisc_next_useropt(struct nd_opt_hdr *cur,
 	return cur <= end && ndisc_is_useropt(cur) ? cur : NULL;
 }
 
-static struct ndisc_options *ndisc_parse_options(u8 *opt, int opt_len,
-						 struct ndisc_options *ndopts)
+struct ndisc_options *ndisc_parse_options(u8 *opt, int opt_len,
+					  struct ndisc_options *ndopts)
 {
 	struct nd_opt_hdr *nd_opt = (struct nd_opt_hdr *)opt;
 
@@ -297,17 +265,6 @@ static struct ndisc_options *ndisc_parse_options(u8 *opt, int opt_len,
 	return ndopts;
 }
 
-static inline u8 *ndisc_opt_addr_data(struct nd_opt_hdr *p,
-				      struct net_device *dev)
-{
-	u8 *lladdr = (u8 *)(p + 1);
-	int lladdrlen = p->nd_opt_len << 3;
-	int prepad = ndisc_addr_option_pad(dev->type);
-	if (lladdrlen != NDISC_OPT_SPACE(dev->addr_len + prepad))
-		return NULL;
-	return lladdr + prepad;
-}
-
 int ndisc_mc_map(const struct in6_addr *addr, char *buf, struct net_device *dev, int dir)
 {
 	switch (dev->type) {
-- 
1.7.10.4


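The helpers exported in patch 09/16 validate a link-layer address option by comparing its length field, counted in units of 8 octets, against NDISC_OPT_SPACE() of the device address length plus any per-device pad. The user-space copy below reproduces that arithmetic. opt_addr_data() and the *_VAL constants are hypothetical names; the ARPHRD numbers match <linux/if_arp.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Same arithmetic as the NDISC_OPT_SPACE() macro: 2 bytes of option
 * type/length plus the payload, rounded up to a multiple of 8 octets. */
#define NDISC_OPT_SPACE(len) (((len) + 2 + 7) & ~7)

/* Device types, numerically equal to ARPHRD_ETHER / ARPHRD_INFINIBAND. */
#define ARPHRD_ETHER_VAL	1
#define ARPHRD_INFINIBAND_VAL	32

/* Mirrors ndisc_addr_option_pad(): only IPoIB needs a 2-byte pad. */
static int addr_option_pad(unsigned short dev_type)
{
	return dev_type == ARPHRD_INFINIBAND_VAL ? 2 : 0;
}

/* Mirrors ndisc_opt_addr_data(): opt[0] is the option type and opt[1] its
 * length in units of 8 octets. Returns the start of the link-layer
 * address, or NULL when the length does not fit dev_addr_len. */
const uint8_t *opt_addr_data(const uint8_t *opt, unsigned short dev_type,
			     int dev_addr_len)
{
	int lladdrlen = opt[1] << 3;
	int prepad = addr_option_pad(dev_type);

	if (lladdrlen != NDISC_OPT_SPACE(dev_addr_len + prepad))
		return NULL;
	return opt + 2 + prepad;	/* skip type/length, then the pad */
}
```

For Ethernet (6-byte addresses, no pad), the expected option is a single 8-octet unit. For IP-over-InfiniBand (20-byte addresses, pad of 2), it is 24 bytes, and the address starts 4 bytes into the option.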
* [PATCH 08/16] ipv4: Kill ip_rt_redirect().
From: David Miller @ 2012-07-12  8:11 UTC
  To: netdev


No longer needed, as the protocol handlers now all properly
propagate the redirect back into the routing code.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/route.h |    1 -
 net/ipv4/icmp.c     |    1 -
 net/ipv4/route.c    |   44 --------------------------------------------
 3 files changed, 46 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 6ab93ee..ace3cb4 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -108,7 +108,6 @@ extern struct ip_rt_acct __percpu *ip_rt_acct;
 
 struct in_device;
 extern int		ip_rt_init(void);
-extern void		ip_rt_redirect(struct sk_buff *skb, __be32 new_gw);
 extern void		rt_cache_flush(struct net *net, int how);
 extern void		rt_cache_flush_batch(struct net *net);
 extern struct rtable *__ip_route_output_key(struct net *, struct flowi4 *flp);
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 70a7935..d01aeb4 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -763,7 +763,6 @@ static void icmp_redirect(struct sk_buff *skb)
 	if (!pskb_may_pull(skb, sizeof(struct iphdr)))
 		return;
 
-	ip_rt_redirect(skb, icmp_hdr(skb)->un.gateway);
 	icmp_socket_deliver(skb, icmp_hdr(skb)->un.gateway);
 }
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index aabece6..e98207d 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1345,50 +1345,6 @@ reject_redirect:
 	;
 }
 
-/* called in rcu_read_lock() section */
-void ip_rt_redirect(struct sk_buff *skb, __be32 new_gw)
-{
-	const struct iphdr *iph = (const struct iphdr *) skb->data;
-	__be32 daddr = iph->daddr;
-	__be32 saddr = iph->saddr;
-	struct net_device *dev = skb->dev;
-	int    ikeys[2] = { dev->ifindex, 0 };
-	__be32 skeys[2] = { saddr, 0 };
-	struct net *net;
-	int s, i;
-
-	net = dev_net(dev);
-	for (s = 0; s < 2; s++) {
-		for (i = 0; i < 2; i++) {
-			unsigned int hash;
-			struct rtable __rcu **rthp;
-			struct rtable *rt;
-
-			hash = rt_hash(daddr, skeys[s], ikeys[i], rt_genid(net));
-
-			rthp = &rt_hash_table[hash].chain;
-
-			while ((rt = rcu_dereference(*rthp)) != NULL) {
-				rthp = &rt->dst.rt_next;
-
-				if (rt->rt_key_dst != daddr ||
-				    rt->rt_key_src != skeys[s] ||
-				    rt->rt_oif != ikeys[i] ||
-				    rt_is_input_route(rt) ||
-				    rt_is_expired(rt) ||
-				    !net_eq(dev_net(rt->dst.dev), net) ||
-				    rt->dst.error ||
-				    rt->dst.dev != dev)
-					continue;
-
-				ip_do_redirect(&rt->dst, skb);
-			}
-		}
-	}
-	return;
-
-}
-
 static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst)
 {
 	struct rtable *rt = (struct rtable *)dst;
-- 
1.7.10.4


* [PATCH 07/16] ipv4: Add redirect support to all protocol icmp error handlers.
From: David Miller @ 2012-07-12  8:11 UTC
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/dccp/ipv4.c         |   11 +++++++++++
 net/ipv4/ah4.c          |   18 +++++++++++++-----
 net/ipv4/esp4.c         |   18 +++++++++++++-----
 net/ipv4/ip_gre.c       |    9 ++++++++-
 net/ipv4/ipcomp.c       |   18 +++++++++++++-----
 net/ipv4/ipip.c         |    9 +++++++++
 net/ipv4/ping.c         |    1 +
 net/ipv4/raw.c          |    2 ++
 net/ipv4/tcp_ipv4.c     |   11 +++++++++++
 net/ipv4/udp.c          |    3 +++
 net/ipv4/xfrm4_policy.c |   10 ++++++++++
 net/sctp/input.c        |   16 ++++++++++++++++
 12 files changed, 110 insertions(+), 16 deletions(-)

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 3eb76b5..8f41a319 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -195,6 +195,14 @@ static inline void dccp_do_pmtu_discovery(struct sock *sk,
 	} /* else let the usual retransmit timer handle it */
 }
 
+static void dccp_do_redirect(struct sk_buff *skb, struct sock *sk)
+{
+	struct dst_entry *dst = __sk_dst_check(sk, 0);
+
+	if (dst && dst->ops->redirect)
+		dst->ops->redirect(dst, skb);
+}
+
 /*
  * This routine is called by the ICMP module when it gets some sort of error
  * condition. If err < 0 then the socket should be closed and the error
@@ -259,6 +267,9 @@ static void dccp_v4_err(struct sk_buff *skb, u32 info)
 	}
 
 	switch (type) {
+	case ICMP_REDIRECT:
+		dccp_do_redirect(skb, sk);
+		goto out;
 	case ICMP_SOURCE_QUENCH:
 		/* Just silently ignore these. */
 		goto out;
diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index 916d5ec..a0d8392 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -398,17 +398,25 @@ static void ah4_err(struct sk_buff *skb, u32 info)
 	struct ip_auth_hdr *ah = (struct ip_auth_hdr *)(skb->data+(iph->ihl<<2));
 	struct xfrm_state *x;
 
-	if (icmp_hdr(skb)->type != ICMP_DEST_UNREACH ||
-	    icmp_hdr(skb)->code != ICMP_FRAG_NEEDED)
+	switch (icmp_hdr(skb)->type) {
+	case ICMP_DEST_UNREACH:
+		if (icmp_hdr(skb)->code != ICMP_FRAG_NEEDED)
+			return;
+	case ICMP_REDIRECT:
+		break;
+	default:
 		return;
+	}
 
 	x = xfrm_state_lookup(net, skb->mark, (const xfrm_address_t *)&iph->daddr,
 			      ah->spi, IPPROTO_AH, AF_INET);
 	if (!x)
 		return;
-	pr_debug("pmtu discovery on SA AH/%08x/%08x\n",
-		 ntohl(ah->spi), ntohl(iph->daddr));
-	ipv4_update_pmtu(skb, net, info, 0, 0, IPPROTO_AH, 0);
+
+	if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH)
+		ipv4_update_pmtu(skb, net, info, 0, 0, IPPROTO_AH, 0);
+	else
+		ipv4_redirect(skb, net, 0, 0, IPPROTO_AH, 0);
 	xfrm_state_put(x);
 }
 
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 7b95b49..b61e9de 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -484,17 +484,25 @@ static void esp4_err(struct sk_buff *skb, u32 info)
 	struct ip_esp_hdr *esph = (struct ip_esp_hdr *)(skb->data+(iph->ihl<<2));
 	struct xfrm_state *x;
 
-	if (icmp_hdr(skb)->type != ICMP_DEST_UNREACH ||
-	    icmp_hdr(skb)->code != ICMP_FRAG_NEEDED)
+	switch (icmp_hdr(skb)->type) {
+	case ICMP_DEST_UNREACH:
+		if (icmp_hdr(skb)->code != ICMP_FRAG_NEEDED)
+			return;
+	case ICMP_REDIRECT:
+		break;
+	default:
 		return;
+	}
 
 	x = xfrm_state_lookup(net, skb->mark, (const xfrm_address_t *)&iph->daddr,
 			      esph->spi, IPPROTO_ESP, AF_INET);
 	if (!x)
 		return;
-	NETDEBUG(KERN_DEBUG "pmtu discovery on SA ESP/%08x/%08x\n",
-		 ntohl(esph->spi), ntohl(iph->daddr));
-	ipv4_update_pmtu(skb, net, info, 0, 0, IPPROTO_ESP, 0);
+
+	if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH)
+		ipv4_update_pmtu(skb, net, info, 0, 0, IPPROTO_ESP, 0);
+	else
+		ipv4_redirect(skb, net, 0, 0, IPPROTO_ESP, 0);
 	xfrm_state_put(x);
 }
 
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 594cec3..0c31235 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -528,6 +528,9 @@ static void ipgre_err(struct sk_buff *skb, u32 info)
 		if (code != ICMP_EXC_TTL)
 			return;
 		break;
+
+	case ICMP_REDIRECT:
+		break;
 	}
 
 	rcu_read_lock();
@@ -543,7 +546,11 @@ static void ipgre_err(struct sk_buff *skb, u32 info)
 				 t->parms.link, 0, IPPROTO_GRE, 0);
 		goto out;
 	}
-
+	if (type == ICMP_REDIRECT) {
+		ipv4_redirect(skb, dev_net(skb->dev), t->parms.link, 0,
+			      IPPROTO_GRE, 0);
+		goto out;
+	}
 	if (t->parms.iph.daddr == 0 ||
 	    ipv4_is_multicast(t->parms.iph.daddr))
 		goto out;
diff --git a/net/ipv4/ipcomp.c b/net/ipv4/ipcomp.c
index b913754..d3ab47e 100644
--- a/net/ipv4/ipcomp.c
+++ b/net/ipv4/ipcomp.c
@@ -31,18 +31,26 @@ static void ipcomp4_err(struct sk_buff *skb, u32 info)
 	struct ip_comp_hdr *ipch = (struct ip_comp_hdr *)(skb->data+(iph->ihl<<2));
 	struct xfrm_state *x;
 
-	if (icmp_hdr(skb)->type != ICMP_DEST_UNREACH ||
-	    icmp_hdr(skb)->code != ICMP_FRAG_NEEDED)
+	switch (icmp_hdr(skb)->type) {
+	case ICMP_DEST_UNREACH:
+		if (icmp_hdr(skb)->code != ICMP_FRAG_NEEDED)
+			return;
+	case ICMP_REDIRECT:
+		break;
+	default:
 		return;
+	}
 
 	spi = htonl(ntohs(ipch->cpi));
 	x = xfrm_state_lookup(net, skb->mark, (const xfrm_address_t *)&iph->daddr,
 			      spi, IPPROTO_COMP, AF_INET);
 	if (!x)
 		return;
-	NETDEBUG(KERN_DEBUG "pmtu discovery on SA IPCOMP/%08x/%pI4\n",
-		 spi, &iph->daddr);
-	ipv4_update_pmtu(skb, net, info, 0, 0, IPPROTO_COMP, 0);
+
+	if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH)
+		ipv4_update_pmtu(skb, net, info, 0, 0, IPPROTO_COMP, 0);
+	else
+		ipv4_redirect(skb, net, 0, 0, IPPROTO_COMP, 0);
 	xfrm_state_put(x);
 }
 
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 715338a..c2d0e6d 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -360,6 +360,8 @@ static int ipip_err(struct sk_buff *skb, u32 info)
 		if (code != ICMP_EXC_TTL)
 			return 0;
 		break;
+	case ICMP_REDIRECT:
+		break;
 	}
 
 	err = -ENOENT;
@@ -376,6 +378,13 @@ static int ipip_err(struct sk_buff *skb, u32 info)
 		goto out;
 	}
 
+	if (type == ICMP_REDIRECT) {
+		ipv4_redirect(skb, dev_net(skb->dev), t->dev->ifindex, 0,
+			      IPPROTO_IPIP, 0);
+		err = 0;
+		goto out;
+	}
+
 	if (t->parms.iph.daddr == 0)
 		goto out;
 
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 340fcf2..6232d47 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -387,6 +387,7 @@ void ping_err(struct sk_buff *skb, u32 info)
 		break;
 	case ICMP_REDIRECT:
 		/* See ICMP_SOURCE_QUENCH */
+		ipv4_sk_redirect(skb, sk);
 		err = EREMOTEIO;
 		break;
 	}
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 659ddfb..ff0f071 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -218,6 +218,8 @@ static void raw_err(struct sock *sk, struct sk_buff *skb, u32 info)
 
 	if (type == ICMP_DEST_UNREACH && code == ICMP_FRAG_NEEDED)
 		ipv4_sk_update_pmtu(skb, sk, info);
+	else if (type == ICMP_REDIRECT)
+		ipv4_sk_redirect(skb, sk);
 
 	/* Report error on raw socket, if:
 	   1. User requested ip_recverr.
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 01545a3..087a848 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -321,6 +321,14 @@ static void do_pmtu_discovery(struct sock *sk, const struct iphdr *iph, u32 mtu)
 	} /* else let the usual retransmit timer handle it */
 }
 
+static void do_redirect(struct sk_buff *skb, struct sock *sk)
+{
+	struct dst_entry *dst = __sk_dst_check(sk, 0);
+
+	if (dst && dst->ops->redirect)
+		dst->ops->redirect(dst, skb);
+}
+
 /*
  * This routine is called by the ICMP module when it gets some
  * sort of error condition.  If err < 0 then the socket should
@@ -394,6 +402,9 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
 	}
 
 	switch (type) {
+	case ICMP_REDIRECT:
+		do_redirect(icmp_skb, sk);
+		goto out;
 	case ICMP_SOURCE_QUENCH:
 		/* Just silently ignore these. */
 		goto out;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ee37d47..b4c3582 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -630,6 +630,9 @@ void __udp4_lib_err(struct sk_buff *skb, u32 info, struct udp_table *udptable)
 			err = icmp_err_convert[code].errno;
 		}
 		break;
+	case ICMP_REDIRECT:
+		ipv4_sk_redirect(skb, sk);
+		break;
 	}
 
 	/*
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 87d3fcc..258ebd7 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -202,6 +202,15 @@ static void xfrm4_update_pmtu(struct dst_entry *dst, u32 mtu)
 	path->ops->update_pmtu(path, mtu);
 }
 
+static void xfrm4_redirect(struct dst_entry *dst, struct sk_buff *skb)
+{
+	struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
+	struct dst_entry *path = xdst->route;
+
+	if (path->ops->redirect)
+		path->ops->redirect(path, skb);
+}
+
 static void xfrm4_dst_destroy(struct dst_entry *dst)
 {
 	struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
@@ -225,6 +234,7 @@ static struct dst_ops xfrm4_dst_ops = {
 	.protocol =		cpu_to_be16(ETH_P_IP),
 	.gc =			xfrm4_garbage_collect,
 	.update_pmtu =		xfrm4_update_pmtu,
+	.redirect =		xfrm4_redirect,
 	.cow_metrics =		dst_cow_metrics_generic,
 	.destroy =		xfrm4_dst_destroy,
 	.ifdown =		xfrm4_dst_ifdown,
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 80564fe..9fb4247 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -423,6 +423,18 @@ void sctp_icmp_frag_needed(struct sock *sk, struct sctp_association *asoc,
 	sctp_retransmit(&asoc->outqueue, t, SCTP_RTXR_PMTUD);
 }
 
+static void sctp_icmp_redirect(struct sock *sk, struct sctp_transport *t,
+			       struct sk_buff *skb)
+{
+	struct dst_entry *dst;
+
+	if (!t)
+		return;
+	dst = sctp_transport_dst_check(t);
+	if (dst && dst->ops->redirect)
+		dst->ops->redirect(dst, skb);
+}
+
 /*
  * SCTP Implementer's Guide, 2.37 ICMP handling procedures
  *
@@ -628,6 +640,10 @@ void sctp_v4_err(struct sk_buff *skb, __u32 info)
 
 		err = EHOSTUNREACH;
 		break;
+	case ICMP_REDIRECT:
+		sctp_icmp_redirect(sk, transport, skb);
+		err = 0;
+		break;
 	default:
 		goto out_unlock;
 	}
-- 
1.7.10.4

* [PATCH 06/16] ipv4: Add ipv4_redirect() and ipv4_sk_redirect() helper functions.
From: David Miller @ 2012-07-12  8:11 UTC
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/route.h |    3 +++
 net/ipv4/route.c    |   28 ++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/include/net/route.h b/include/net/route.h
index b140278..6ab93ee 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -180,6 +180,9 @@ static inline int ip_route_input_noref(struct sk_buff *skb, __be32 dst, __be32 s
 extern void ipv4_update_pmtu(struct sk_buff *skb, struct net *net, u32 mtu,
 			     int oif, u32 mark, u8 protocol, int flow_flags);
 extern void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu);
+extern void ipv4_redirect(struct sk_buff *skb, struct net *net,
+			  int oif, u32 mark, u8 protocol, int flow_flags);
+extern void ipv4_sk_redirect(struct sk_buff *skb, struct sock *sk);
 extern void ip_rt_send_redirect(struct sk_buff *skb);
 
 extern unsigned int		inet_addr_type(struct net *net, __be32 addr);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f3d2565..aabece6 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1590,6 +1590,34 @@ void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu)
 }
 EXPORT_SYMBOL_GPL(ipv4_sk_update_pmtu);
 
+void ipv4_redirect(struct sk_buff *skb, struct net *net,
+		   int oif, u32 mark, u8 protocol, int flow_flags)
+{
+	const struct iphdr *iph = (const struct iphdr *)skb->data;
+	struct flowi4 fl4;
+	struct rtable *rt;
+
+	flowi4_init_output(&fl4, oif, mark, RT_TOS(iph->tos), RT_SCOPE_UNIVERSE,
+			   protocol, flow_flags, iph->daddr, iph->saddr, 0, 0);
+	rt = __ip_route_output_key(net, &fl4);
+	if (!IS_ERR(rt)) {
+		ip_do_redirect(&rt->dst, skb);
+		ip_rt_put(rt);
+	}
+}
+EXPORT_SYMBOL_GPL(ipv4_redirect);
+
+void ipv4_sk_redirect(struct sk_buff *skb, struct sock *sk)
+{
+	const struct inet_sock *inet = inet_sk(sk);
+
+	return ipv4_redirect(skb, sock_net(sk), sk->sk_bound_dev_if,
+			     sk->sk_mark,
+			     inet->hdrincl ? IPPROTO_RAW : sk->sk_protocol,
+			     inet_sk_flowi_flags(sk));
+}
+EXPORT_SYMBOL_GPL(ipv4_sk_redirect);
+
 static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
 {
 	struct rtable *rt = (struct rtable *) dst;
-- 
1.7.10.4

* [PATCH 05/16] ipv4: Generalize ip_do_redirect() and hook into new dst_ops->redirect.
From: David Miller @ 2012-07-12  8:11 UTC
  To: netdev


All of the redirect acceptance policy is now contained within ip_do_redirect().

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/dst_ops.h |    1 +
 net/ipv4/route.c      |   94 ++++++++++++++++++++++++++++---------------------
 2 files changed, 55 insertions(+), 40 deletions(-)

diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
index 4badc86..085931f 100644
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -25,6 +25,7 @@ struct dst_ops {
 	struct dst_entry *	(*negative_advice)(struct dst_entry *);
 	void			(*link_failure)(struct sk_buff *);
 	void			(*update_pmtu)(struct dst_entry *dst, u32 mtu);
+	void			(*redirect)(struct dst_entry *dst, struct sk_buff *skb);
 	int			(*local_out)(struct sk_buff *skb);
 	struct neighbour *	(*neigh_lookup)(const struct dst_entry *dst,
 						struct sk_buff *skb,
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f8921b4..f3d2565 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -149,6 +149,7 @@ static void		 ipv4_dst_destroy(struct dst_entry *dst);
 static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst);
 static void		 ipv4_link_failure(struct sk_buff *skb);
 static void		 ip_rt_update_pmtu(struct dst_entry *dst, u32 mtu);
+static void		 ip_do_redirect(struct dst_entry *dst, struct sk_buff *skb);
 static int rt_garbage_collect(struct dst_ops *ops);
 
 static void ipv4_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
@@ -179,6 +180,7 @@ static struct dst_ops ipv4_dst_ops = {
 	.negative_advice =	ipv4_negative_advice,
 	.link_failure =		ipv4_link_failure,
 	.update_pmtu =		ip_rt_update_pmtu,
+	.redirect =		ip_do_redirect,
 	.local_out =		__ip_local_out,
 	.neigh_lookup =		ipv4_neigh_lookup,
 };
@@ -1271,42 +1273,18 @@ static void rt_del(unsigned int hash, struct rtable *rt)
 	spin_unlock_bh(rt_hash_lock_addr(hash));
 }
 
-static void ip_do_redirect(struct rtable *rt, __be32 old_gw, __be32 new_gw)
-{
-	struct neighbour *n;
-
-	if (rt->rt_gateway != old_gw)
-		return;
-
-	n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
-	if (n) {
-		if (!(n->nud_state & NUD_VALID)) {
-			neigh_event_send(n, NULL);
-		} else {
-			rt->rt_gateway = new_gw;
-			rt->rt_flags |= RTCF_REDIRECTED;
-			call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
-		}
-		neigh_release(n);
-	}
-}
-
-/* called in rcu_read_lock() section */
-void ip_rt_redirect(struct sk_buff *skb, __be32 new_gw)
+static void ip_do_redirect(struct dst_entry *dst, struct sk_buff *skb)
 {
 	const struct iphdr *iph = (const struct iphdr *) skb->data;
+	__be32 new_gw = icmp_hdr(skb)->un.gateway;
 	__be32 old_gw = ip_hdr(skb)->saddr;
+	struct net_device *dev = skb->dev;
 	__be32 daddr = iph->daddr;
 	__be32 saddr = iph->saddr;
-	struct net_device *dev = skb->dev;
-	struct in_device *in_dev = __in_dev_get_rcu(dev);
-	int    ikeys[2] = { dev->ifindex, 0 };
-	__be32 skeys[2] = { saddr, 0 };
+	struct in_device *in_dev;
+	struct neighbour *n;
+	struct rtable *rt;
 	struct net *net;
-	int s, i;
-
-	if (!in_dev)
-		return;
 
 	switch (icmp_hdr(skb)->code & 7) {
 	case ICMP_REDIR_NET:
@@ -1319,6 +1297,14 @@ void ip_rt_redirect(struct sk_buff *skb, __be32 new_gw)
 		return;
 	}
 
+	rt = (struct rtable *) dst;
+	if (rt->rt_gateway != old_gw)
+		return;
+
+	in_dev = __in_dev_get_rcu(dev);
+	if (!in_dev)
+		return;
+
 	net = dev_net(dev);
 	if (new_gw == old_gw || !IN_DEV_RX_REDIRECTS(in_dev) ||
 	    ipv4_is_multicast(new_gw) || ipv4_is_lbcast(new_gw) ||
@@ -1335,6 +1321,43 @@ void ip_rt_redirect(struct sk_buff *skb, __be32 new_gw)
 			goto reject_redirect;
 	}
 
+	n = ipv4_neigh_lookup(dst, NULL, &new_gw);
+	if (n) {
+		if (!(n->nud_state & NUD_VALID)) {
+			neigh_event_send(n, NULL);
+		} else {
+			rt->rt_gateway = new_gw;
+			rt->rt_flags |= RTCF_REDIRECTED;
+			call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
+		}
+		neigh_release(n);
+	}
+	return;
+
+reject_redirect:
+#ifdef CONFIG_IP_ROUTE_VERBOSE
+	if (IN_DEV_LOG_MARTIANS(in_dev))
+		net_info_ratelimited("Redirect from %pI4 on %s about %pI4 ignored\n"
+				     "  Advised path = %pI4 -> %pI4\n",
+				     &old_gw, dev->name, &new_gw,
+				     &saddr, &daddr);
+#endif
+	;
+}
+
+/* called in rcu_read_lock() section */
+void ip_rt_redirect(struct sk_buff *skb, __be32 new_gw)
+{
+	const struct iphdr *iph = (const struct iphdr *) skb->data;
+	__be32 daddr = iph->daddr;
+	__be32 saddr = iph->saddr;
+	struct net_device *dev = skb->dev;
+	int    ikeys[2] = { dev->ifindex, 0 };
+	__be32 skeys[2] = { saddr, 0 };
+	struct net *net;
+	int s, i;
+
+	net = dev_net(dev);
 	for (s = 0; s < 2; s++) {
 		for (i = 0; i < 2; i++) {
 			unsigned int hash;
@@ -1358,21 +1381,12 @@ void ip_rt_redirect(struct sk_buff *skb, __be32 new_gw)
 				    rt->dst.dev != dev)
 					continue;
 
-				ip_do_redirect(rt, old_gw, new_gw);
+				ip_do_redirect(&rt->dst, skb);
 			}
 		}
 	}
 	return;
 
-reject_redirect:
-#ifdef CONFIG_IP_ROUTE_VERBOSE
-	if (IN_DEV_LOG_MARTIANS(in_dev))
-		net_info_ratelimited("Redirect from %pI4 on %s about %pI4 ignored\n"
-				     "  Advised path = %pI4 -> %pI4\n",
-				     &old_gw, dev->name, &new_gw,
-				     &saddr, &daddr);
-#endif
-	;
 }
 
 static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst)
-- 
1.7.10.4

* [PATCH 04/16] ipv4: Rearrange arguments to ip_rt_redirect()
From: David Miller @ 2012-07-12  8:11 UTC
  To: netdev


Pass in the SKB rather than just the IP addresses, so that policy
and other aspects can reside in ip_rt_redirect() rather than
icmp_redirect().

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/route.h |    3 +--
 net/ipv4/icmp.c     |   36 ++++++------------------------------
 net/ipv4/route.c    |   23 +++++++++++++++++++----
 3 files changed, 26 insertions(+), 36 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 5236236..b140278 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -108,8 +108,7 @@ extern struct ip_rt_acct __percpu *ip_rt_acct;
 
 struct in_device;
 extern int		ip_rt_init(void);
-extern void		ip_rt_redirect(__be32 old_gw, __be32 dst, __be32 new_gw,
-				       __be32 src, struct net_device *dev);
+extern void		ip_rt_redirect(struct sk_buff *skb, __be32 new_gw);
 extern void		rt_cache_flush(struct net *net, int how);
 extern void		rt_cache_flush_batch(struct net *net);
 extern struct rtable *__ip_route_output_key(struct net *, struct flowi4 *flp);
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 5885146..70a7935 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -755,40 +755,16 @@ out_err:
 
 static void icmp_redirect(struct sk_buff *skb)
 {
-	const struct iphdr *iph;
-
-	if (skb->len < sizeof(struct iphdr))
-		goto out_err;
+	if (skb->len < sizeof(struct iphdr)) {
+		ICMP_INC_STATS_BH(dev_net(skb->dev), ICMP_MIB_INERRORS);
+		return;
+	}
 
-	/*
-	 *	Get the copied header of the packet that caused the redirect
-	 */
 	if (!pskb_may_pull(skb, sizeof(struct iphdr)))
-		goto out;
-
-	iph = (const struct iphdr *)skb->data;
-
-	switch (icmp_hdr(skb)->code & 7) {
-	case ICMP_REDIR_NET:
-	case ICMP_REDIR_NETTOS:
-		/*
-		 * As per RFC recommendations now handle it as a host redirect.
-		 */
-	case ICMP_REDIR_HOST:
-	case ICMP_REDIR_HOSTTOS:
-		ip_rt_redirect(ip_hdr(skb)->saddr, iph->daddr,
-			       icmp_hdr(skb)->un.gateway,
-			       iph->saddr, skb->dev);
-		break;
-	}
+		return;
 
+	ip_rt_redirect(skb, icmp_hdr(skb)->un.gateway);
 	icmp_socket_deliver(skb, icmp_hdr(skb)->un.gateway);
-
-out:
-	return;
-out_err:
-	ICMP_INC_STATS_BH(dev_net(skb->dev), ICMP_MIB_INERRORS);
-	goto out;
 }
 
 /*
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index a4de87f..f8921b4 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1292,18 +1292,33 @@ static void ip_do_redirect(struct rtable *rt, __be32 old_gw, __be32 new_gw)
 }
 
 /* called in rcu_read_lock() section */
-void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
-		    __be32 saddr, struct net_device *dev)
+void ip_rt_redirect(struct sk_buff *skb, __be32 new_gw)
 {
-	int s, i;
+	const struct iphdr *iph = (const struct iphdr *) skb->data;
+	__be32 old_gw = ip_hdr(skb)->saddr;
+	__be32 daddr = iph->daddr;
+	__be32 saddr = iph->saddr;
+	struct net_device *dev = skb->dev;
 	struct in_device *in_dev = __in_dev_get_rcu(dev);
-	__be32 skeys[2] = { saddr, 0 };
 	int    ikeys[2] = { dev->ifindex, 0 };
+	__be32 skeys[2] = { saddr, 0 };
 	struct net *net;
+	int s, i;
 
 	if (!in_dev)
 		return;
 
+	switch (icmp_hdr(skb)->code & 7) {
+	case ICMP_REDIR_NET:
+	case ICMP_REDIR_NETTOS:
+	case ICMP_REDIR_HOST:
+	case ICMP_REDIR_HOSTTOS:
+		break;
+
+	default:
+		return;
+	}
+
 	net = dev_net(dev);
 	if (new_gw == old_gw || !IN_DEV_RX_REDIRECTS(in_dev) ||
 	    ipv4_is_multicast(new_gw) || ipv4_is_lbcast(new_gw) ||
-- 
1.7.10.4

* [PATCH 03/16] ipv4: Pull redirect instantiation out into a helper function.
From: David Miller @ 2012-07-12  8:10 UTC
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/route.c |   37 ++++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 95bfa1b..a4de87f 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1271,6 +1271,26 @@ static void rt_del(unsigned int hash, struct rtable *rt)
 	spin_unlock_bh(rt_hash_lock_addr(hash));
 }
 
+static void ip_do_redirect(struct rtable *rt, __be32 old_gw, __be32 new_gw)
+{
+	struct neighbour *n;
+
+	if (rt->rt_gateway != old_gw)
+		return;
+
+	n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
+	if (n) {
+		if (!(n->nud_state & NUD_VALID)) {
+			neigh_event_send(n, NULL);
+		} else {
+			rt->rt_gateway = new_gw;
+			rt->rt_flags |= RTCF_REDIRECTED;
+			call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
+		}
+		neigh_release(n);
+	}
+}
+
 /* called in rcu_read_lock() section */
 void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
 		    __be32 saddr, struct net_device *dev)
@@ -1311,8 +1331,6 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
 			rthp = &rt_hash_table[hash].chain;
 
 			while ((rt = rcu_dereference(*rthp)) != NULL) {
-				struct neighbour *n;
-
 				rthp = &rt->dst.rt_next;
 
 				if (rt->rt_key_dst != daddr ||
@@ -1322,21 +1340,10 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
 				    rt_is_expired(rt) ||
 				    !net_eq(dev_net(rt->dst.dev), net) ||
 				    rt->dst.error ||
-				    rt->dst.dev != dev ||
-				    rt->rt_gateway != old_gw)
+				    rt->dst.dev != dev)
 					continue;
 
-				n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
-				if (n) {
-					if (!(n->nud_state & NUD_VALID)) {
-						neigh_event_send(n, NULL);
-					} else {
-						rt->rt_gateway = new_gw;
-						rt->rt_flags |= RTCF_REDIRECTED;
-						call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
-					}
-					neigh_release(n);
-				}
+				ip_do_redirect(rt, old_gw, new_gw);
 			}
 		}
 	}
-- 
1.7.10.4

* [PATCH 02/16] ipv4: Deliver ICMP redirects to sockets too.
From: David Miller @ 2012-07-12  8:10 UTC
  To: netdev


And thus, we can remove the ping_err() hack.
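A minimal user-space sketch of what socket delivery via icmp_socket_deliver() (PATCH 01/16) amounts to once redirects also go through it: raw sockets see the error first, then the error handler registered for the quoted packet's IP protocol is called with info (for a redirect, the advised gateway). The protos[] table, err_handler_fn type, raw_deliveries counter and the demo UDP handler are hypothetical stand-ins for inet_protos[], net_protocol->err_handler and raw_icmp_error(); RCU locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-protocol error handler (cf. net_protocol->err_handler). */
typedef void (*err_handler_fn)(const uint8_t *inner_ip_hdr, uint32_t info);

struct proto_entry {
	err_handler_fn err_handler;
};

/* Handlers indexed by IP protocol number. */
static struct proto_entry *protos[256];

/* Stand-in for raw_icmp_error(): just counts deliveries to raw sockets. */
static int raw_deliveries;

/* Demo handler that remembers the last info value it was given. */
static uint32_t last_udp_info;
static void udp_err_demo(const uint8_t *hdr, uint32_t info)
{
	(void)hdr;
	last_udp_info = info;
}
static struct proto_entry udp_demo = { udp_err_demo };

/* inner_ip_hdr points at the IPv4 header quoted inside the ICMP message. */
static void deliver_icmp_error(const uint8_t *inner_ip_hdr, uint32_t info)
{
	assert(inner_ip_hdr != NULL);

	/* Byte 9 of an IPv4 header is the protocol field. */
	uint8_t protocol = inner_ip_hdr[9];
	const struct proto_entry *p = protos[protocol];

	raw_deliveries++;	/* raw sockets always see the error */

	if (p && p->err_handler)
		p->err_handler(inner_ip_hdr, info);
}
```

With this single path, ping no longer needs special casing: any protocol whose error handler understands ICMP_REDIRECT receives it.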

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/icmp.c |    8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 18e39d1..5885146 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -782,13 +782,7 @@ static void icmp_redirect(struct sk_buff *skb)
 		break;
 	}
 
-	/* Ping wants to see redirects.
-         * Let's pretend they are errors of sorts... */
-	if (iph->protocol == IPPROTO_ICMP &&
-	    iph->ihl >= 5 &&
-	    pskb_may_pull(skb, (iph->ihl<<2)+8)) {
-		ping_err(skb, icmp_hdr(skb)->un.gateway);
-	}
+	icmp_socket_deliver(skb, icmp_hdr(skb)->un.gateway);
 
 out:
 	return;
-- 
1.7.10.4

* [PATCH 01/16] ipv4: Pull icmp socket delivery out into a helper function.
From: David Miller @ 2012-07-12  8:10 UTC
  To: netdev



Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/icmp.c |   31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 4a04944..18e39d1 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -634,18 +634,31 @@ out:;
 EXPORT_SYMBOL(icmp_send);
 
 
+static void icmp_socket_deliver(struct sk_buff *skb, u32 info)
+{
+	const struct iphdr *iph = (const struct iphdr *) skb->data;
+	const struct net_protocol *ipprot;
+	int protocol = iph->protocol;
+
+	raw_icmp_error(skb, protocol, info);
+
+	rcu_read_lock();
+	ipprot = rcu_dereference(inet_protos[protocol]);
+	if (ipprot && ipprot->err_handler)
+		ipprot->err_handler(skb, info);
+	rcu_read_unlock();
+}
+
 /*
  *	Handle ICMP_DEST_UNREACH, ICMP_TIME_EXCEED, and ICMP_QUENCH.
  */
 
 static void icmp_unreach(struct sk_buff *skb)
 {
-	const struct net_protocol *ipprot;
 	const struct iphdr *iph;
 	struct icmphdr *icmph;
 	struct net *net;
 	u32 info = 0;
-	int protocol;
 
 	net = dev_net(skb_dst(skb)->dev);
 
@@ -726,19 +739,7 @@ static void icmp_unreach(struct sk_buff *skb)
 	if (!pskb_may_pull(skb, iph->ihl * 4 + 8))
 		goto out;
 
-	iph = (const struct iphdr *)skb->data;
-	protocol = iph->protocol;
-
-	/*
-	 *	Deliver ICMP message to raw sockets. Pretty useless feature?
-	 */
-	raw_icmp_error(skb, protocol, info);
-
-	rcu_read_lock();
-	ipprot = rcu_dereference(inet_protos[protocol]);
-	if (ipprot && ipprot->err_handler)
-		ipprot->err_handler(skb, info);
-	rcu_read_unlock();
+	icmp_socket_deliver(skb, info);
 
 out:
 	return;
-- 
1.7.10.4

* [PATCH 0/16] Handle redirects just like PMTU
From: David Miller @ 2012-07-12  8:10 UTC
  To: netdev


As described in my patch series from the other day, we need to
rearrange redirect handling so that the local initiators of packets
(sockets, tunnels, xfrms, etc.) that implement the protocols compute
the route and pass this down into the ipv4/ipv6 routing code.

These changes here do so by implementing a new dst_ops->redirect
method.

No more do we have this funny code that tries several different sets
of routing keys to try and figure out which route the redirect should
actually be applied to.

No more does TOS rewriting cause problems for us.
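A hedged user-space sketch of the dispatch pattern the series introduces: a protocol error handler takes the route cached on its socket, checks for the optional dst_ops->redirect hook and calls it, while a wrapper route type forwards the call to the route underneath (as xfrm4_redirect() does via xdst->route). struct dst, struct packet and the leaf/wrapper ops are simplified, hypothetical stand-ins for struct dst_entry, struct sk_buff and the real ops tables.

```c
#include <assert.h>
#include <stddef.h>

struct packet;			/* opaque stand-in for struct sk_buff */
struct dst;

/* Per-route-type operations, modelled on struct dst_ops. */
struct dst_ops {
	void (*redirect)(struct dst *dst, struct packet *pkt);
};

struct dst {
	const struct dst_ops *ops;
	struct dst *route;	/* underlying route, used by the wrapper type */
	int redirects_seen;	/* demo bookkeeping only */
};

/* Leaf route: records the redirect (the kernel would update rt_gateway). */
static void leaf_redirect(struct dst *dst, struct packet *pkt)
{
	(void)pkt;
	assert(dst != NULL);
	dst->redirects_seen++;
}

/* Wrapper route (cf. xfrm4_redirect): forward to the route underneath. */
static void wrapper_redirect(struct dst *dst, struct packet *pkt)
{
	struct dst *path = dst->route;

	if (path && path->ops->redirect)
		path->ops->redirect(path, pkt);
}

static const struct dst_ops leaf_ops = { .redirect = leaf_redirect };
static const struct dst_ops wrapper_ops = { .redirect = wrapper_redirect };
static const struct dst_ops no_redirect_ops = { .redirect = NULL };

/* What e.g. tcp/dccp do with their cached route on ICMP_REDIRECT. */
static void sock_handle_redirect(struct dst *cached, struct packet *pkt)
{
	if (cached && cached->ops->redirect)
		cached->ops->redirect(cached, pkt);
}
```

The point of the design is that the sender already knows which route it used, so no routing keys have to be guessed at when the redirect arrives.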

Signed-off-by: David S. Miller <davem@davemloft.net>

* Re: [RFC PATCH v2] tcp: TCP Small Queues
From: Eric Dumazet @ 2012-07-12  7:51 UTC
  To: David Miller; +Cc: nanditad, netdev, mattmathis, codel, ncardwell
In-Reply-To: <20120712.003700.49235222504944712.davem@davemloft.net>

On Thu, 2012-07-12 at 00:37 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 12 Jul 2012 09:34:19 +0200
> 
> > On Thu, 2012-07-12 at 01:49 +0200, Eric Dumazet wrote:
> > 
> >> The 10Gb receiver is a net-next kernel, but the 1Gb receiver is a 2.6.38
> >> ubuntu kernel. They probably have very different TCP behavior.
> > 
> > 
> > I tested TSQ on bnx2x and 10Gb links.
> > 
> > I get full rate even using 65536 bytes for
> > the /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
> 
> Great work Eric.

Thanks !

* [PATCH v4] net: cgroup: fix access the unallocated memory in netprio cgroup
From: Gao feng @ 2012-07-12  7:50 UTC
  To: nhorman; +Cc: eric.dumazet, linux-kernel, netdev, davem, Gao feng, Eric Dumazet

There are some out-of-bounds accesses in the netprio cgroup.

Currently, before accessing the dev->priomap.priomap array, we only
check whether dev->priomap exists. Because we don't want additional
bounds checks in the fast path, we must make sure that either
dev->priomap is NULL or the array size of dev->priomap.priomap is
equal to max_prioidx + 1.

So in the write_priomap logic, we should call extend_netdev_table when
dev->priomap is NULL or dev->priomap.priomap_len < max_len. In the
cgrp_create->update_netdev_tables logic, we should call
extend_netdev_table only when dev->priomap exists and
dev->priomap.priomap_len < max_len.

It is not necessary to call update_netdev_tables in write_priomap; we
only need to allocate the priomap of the net device that is being
changed through net_prio.ifpriomap.

This patch also adds a return value to update_netdev_tables and
extend_netdev_table, so that when allocating new_priomap fails,
write_priomap stops accessing the priomap and returns -ENOMEM to
userspace to tell the user what happened.
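A minimal user-space sketch of the invariant described above, assuming a simplified map (a length plus a flexible array) in place of struct netprio_map: the write path grows or creates the map before indexing it, the cgroup-create path only grows maps that already exist, and an allocation failure is reported as -ENOMEM without touching the map. struct prio_map, extend_table(), write_prio() and create_update() are hypothetical names; the kernel additionally holds RTNL and frees the old map with kfree_rcu().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for struct netprio_map. */
struct prio_map {
	uint32_t len;
	uint32_t prio[];
};

/* Grow *mapp to new_len entries, keeping old values (cf. extend_netdev_table). */
static int extend_table(struct prio_map **mapp, uint32_t new_len)
{
	struct prio_map *old = *mapp;
	struct prio_map *new_map;

	new_map = calloc(1, sizeof(*new_map) + sizeof(uint32_t) * new_len);
	if (!new_map)
		return -ENOMEM;
	new_map->len = new_len;
	if (old) {
		uint32_t n = old->len < new_len ? old->len : new_len;

		memcpy(new_map->prio, old->prio, sizeof(uint32_t) * n);
	}
	*mapp = new_map;
	free(old);		/* the kernel defers this with kfree_rcu() */
	return 0;
}

/* Write path: the map must exist and be large enough before indexing. */
static int write_prio(struct prio_map **mapp, uint32_t max_prioidx,
		      uint32_t prioidx, uint32_t priority)
{
	uint32_t max_len = max_prioidx + 1;
	int ret = 0;

	if (!*mapp || (*mapp)->len < max_len)
		ret = extend_table(mapp, max_len);
	if (ret < 0)
		return ret;	/* leave the map untouched on failure */

	assert(prioidx < (*mapp)->len);
	(*mapp)->prio[prioidx] = priority;
	return 0;
}

/* Cgroup-create path: only grow maps that already exist. */
static int create_update(struct prio_map **mapp, uint32_t max_prioidx)
{
	uint32_t max_len = max_prioidx + 1;

	if (*mapp && (*mapp)->len < max_len)
		return extend_table(mapp, max_len);
	return 0;
}
```

Leaving devices without a map at NULL on the create path keeps the fast path cheap: a NULL map means "never configured", so no lookup is needed at all.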

Changes from v3:
1. Add RTNL protection when reading max_prioidx in write_priomap.

2. Only call extend_netdev_table when map->priomap_len < max_len;
   this makes sure the array size of dev->map->priomap is always
   bigger than any prioidx.

3. Add a function write_update_netdev_table to make the code clearer.

Changes from v2:
1. Protect extend_netdev_table with RTNL.
2. When extend_netdev_table fails, call dev_put to drop the device's refcount.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Eric Dumazet <edumazet@google.com>
---
 net/core/netprio_cgroup.c |   71 ++++++++++++++++++++++++++++++++++-----------
 1 files changed, 54 insertions(+), 17 deletions(-)

diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index aa907ed..9b17d54 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -65,7 +65,7 @@ static void put_prioidx(u32 idx)
 	spin_unlock_irqrestore(&prioidx_map_lock, flags);
 }
 
-static void extend_netdev_table(struct net_device *dev, u32 new_len)
+static int extend_netdev_table(struct net_device *dev, u32 new_len)
 {
 	size_t new_size = sizeof(struct netprio_map) +
 			   ((sizeof(u32) * new_len));
@@ -77,7 +77,7 @@ static void extend_netdev_table(struct net_device *dev, u32 new_len)
 
 	if (!new_priomap) {
 		pr_warn("Unable to alloc new priomap!\n");
-		return;
+		return -ENOMEM;
 	}
 
 	for (i = 0;
@@ -90,46 +90,79 @@ static void extend_netdev_table(struct net_device *dev, u32 new_len)
 	rcu_assign_pointer(dev->priomap, new_priomap);
 	if (old_priomap)
 		kfree_rcu(old_priomap, rcu);
+	return 0;
 }
 
-static void update_netdev_tables(void)
+static int write_update_netdev_table(struct net_device *dev)
 {
+	int ret = 0;
+	u32 max_len;
+	struct netprio_map *map;
+
+	rtnl_lock();
+	max_len = atomic_read(&max_prioidx) + 1;
+	map = rtnl_dereference(dev->priomap);
+	if (!map || map->priomap_len < max_len)
+		ret = extend_netdev_table(dev, max_len);
+	rtnl_unlock();
+
+	return ret;
+}
+
+static int update_netdev_tables(void)
+{
+	int ret = 0;
 	struct net_device *dev;
-	u32 max_len = atomic_read(&max_prioidx) + 1;
+	u32 max_len;
 	struct netprio_map *map;
 
 	rtnl_lock();
+	max_len = atomic_read(&max_prioidx) + 1;
 	for_each_netdev(&init_net, dev) {
 		map = rtnl_dereference(dev->priomap);
-		if ((!map) ||
-		    (map->priomap_len < max_len))
-			extend_netdev_table(dev, max_len);
+		/*
+		 * don't allocate priomap if we didn't
+		 * change net_prio.ifpriomap (map == NULL),
+		 * this will speed up skb_update_prio.
+		 */
+		if (map && map->priomap_len < max_len) {
+			ret = extend_netdev_table(dev, max_len);
+			if (ret < 0)
+				break;
+		}
 	}
 	rtnl_unlock();
+	return ret;
 }
 
 static struct cgroup_subsys_state *cgrp_create(struct cgroup *cgrp)
 {
 	struct cgroup_netprio_state *cs;
-	int ret;
+	int ret = -EINVAL;
 
 	cs = kzalloc(sizeof(*cs), GFP_KERNEL);
 	if (!cs)
 		return ERR_PTR(-ENOMEM);
 
-	if (cgrp->parent && cgrp_netprio_state(cgrp->parent)->prioidx) {
-		kfree(cs);
-		return ERR_PTR(-EINVAL);
-	}
+	if (cgrp->parent && cgrp_netprio_state(cgrp->parent)->prioidx)
+		goto out;
 
 	ret = get_prioidx(&cs->prioidx);
-	if (ret != 0) {
+	if (ret < 0) {
 		pr_warn("No space in priority index array\n");
-		kfree(cs);
-		return ERR_PTR(ret);
+		goto out;
+	}
+
+	ret = update_netdev_tables();
+	if (ret < 0) {
+		put_prioidx(cs->prioidx);
+		goto out;
 	}
 
 	return &cs->css;
+out:
+	kfree(cs);
+	return ERR_PTR(ret);
 }
 
 static void cgrp_destroy(struct cgroup *cgrp)
@@ -221,13 +254,17 @@ static int write_priomap(struct cgroup *cgrp, struct cftype *cft,
 	if (!dev)
 		goto out_free_devname;
 
-	update_netdev_tables();
-	ret = 0;
+	ret = write_update_netdev_table(dev);
+	if (ret < 0)
+		goto out_put_dev;
+
 	rcu_read_lock();
 	map = rcu_dereference(dev->priomap);
 	if (map)
 		map->priomap[prioidx] = priority;
 	rcu_read_unlock();
+
+out_put_dev:
 	dev_put(dev);
 
 out_free_devname:
-- 
1.7.7.6

* Re: [RFC PATCH v2] tcp: TCP Small Queues
From: David Miller @ 2012-07-12  7:37 UTC
  To: eric.dumazet; +Cc: nanditad, netdev, mattmathis, codel, ncardwell
In-Reply-To: <1342078459.3265.8244.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 12 Jul 2012 09:34:19 +0200

> On Thu, 2012-07-12 at 01:49 +0200, Eric Dumazet wrote:
> 
>> The 10Gb receiver is a net-next kernel, but the 1Gb receiver is a 2.6.38
>> ubuntu kernel. They probably have very different TCP behavior.
> 
> 
> I tested TSQ on bnx2x and 10Gb links.
> 
> I get full rate even using 65536 bytes for
> the /proc/sys/net/ipv4/tcp_limit_output_bytes tunable

Great work Eric.

* Re: [RFC PATCH v2] tcp: TCP Small Queues
From: Eric Dumazet @ 2012-07-12  7:34 UTC
  To: Rick Jones; +Cc: nanditad, netdev, mattmathis, codel, ncardwell, David Miller
In-Reply-To: <1342050592.3265.8195.camel@edumazet-glaptop>

On Thu, 2012-07-12 at 01:49 +0200, Eric Dumazet wrote:

> The 10Gb receiver is a net-next kernel, but the 1Gb receiver is a 2.6.38
> ubuntu kernel. They probably have very different TCP behavior.


I tested TSQ on bnx2x and 10Gb links.

I get full rate even using 65536 bytes for
the /proc/sys/net/ipv4/tcp_limit_output_bytes tunable

OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.8.37 () port 0 AF_INET : histogram
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
1606536     2097152     16384  20.00   9411.12    10^6bits/s  2.40  S      4.27   S      0.502   0.892   usec/KB  
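For scale, a 65536-byte tcp_limit_output_bytes budget is only a few tens of microseconds of wire time at the 9411.12 Mbit/s reported above. The helper below is just that arithmetic (bytes x 8 / rate). `drain_time_us()` is a hypothetical name and is not part of the TSQ patch.

```c
/* Microseconds needed to send `bytes` at `mbit_per_s` (units of 10^6 bit/s). */
static double drain_time_us(double bytes, double mbit_per_s)
{
	/* bits / (10^6 bit/s) yields seconds * 10^6, i.e. microseconds */
	return bytes * 8.0 / mbit_per_s;
}
```

With the numbers above, drain_time_us(65536, 9411.12) is about 55.7 us, and at the nominal 10 Gbit/s line rate it is about 52.4 us.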

* Re: [PATCH 4/4] asix: Add a new driver for the AX88172A
From: Christian Riesch @ 2012-07-12  7:22 UTC (permalink / raw)
  To: Grant Grundler
  Cc: netdev, Oliver Neukum, Eric Dumazet, Allan Chou, Mark Lord,
	Ming Lei, Michael Riesch
In-Reply-To: <CABkLObon+CkLJagu2pX0OBGpNzSn9NfyM353h2BQ68viy2Bq6Q@mail.gmail.com>

Hi Grant,

On Mon, Jul 9, 2012 at 12:22 PM, Christian Riesch
<christian.riesch@omicron.at> wrote:
> Grant,
>
> On Fri, Jul 6, 2012 at 11:20 PM, Grant Grundler <grundler@chromium.org> wrote:
>> On Fri, Jul 6, 2012 at 4:33 AM, Christian Riesch
>> <christian.riesch@omicron.at> wrote:
>>> The Asix AX88172A is a USB 2.0 Ethernet interface that supports both an
>>> internal PHY as well as an external PHY (connected via MII).
>>>
>>> This patch adds a driver for the AX88172A and provides support for
>>> both modes and supports phylib.
>>
>> Christian,
>> In general this looks fine to me...but I wouldn't know about "bus
>> identifier life times" (Ben Hutchings comment).
>>
>> My nitpick is the declaration and use of use_embdphy. An alternative
>> coding _suggestion_ below.  I'm not substantially altering the
>> functionality.
>>
>> thanks,
>> grant
>
> [...]
>
>>> +
>>> +struct ax88172a_private {
>>> +       int use_embdphy;
>>
>> Can you move the "int" to the end of the struct?
>> It's cleaner to have fields "natively aligned", i.e. pointers should start
>> at 8-byte alignment when compiled for 64-bit.
>>
>>> +       struct mii_bus *mdio;
>>> +       struct phy_device *phydev;
>>> +       char phy_name[20];
>>> +       u16 phy_addr;
>>> +       u16 oldmode;
>>> +};
>>> +
>
> [...]
>
>>> +       /* are we using the internal or the external phy? */
>>> +       ret = asix_read_cmd(dev, AX_CMD_SW_PHY_STATUS, 0, 0, 1, buf);
>>> +       if (ret < 0) {
>>> +               dbg("Failed to read software interface selection register: %d",
>>> +                   ret);
>>> +               goto free;
>>> +       }
>>> +       dbg("AX_CMD_SW_PHY_STATUS = 0x%02x\n", buf[0]);
>>> +       switch ((buf[0] & 0x0c) >> 2) {
>>> +       case 0:
>>> +               dbg("use internal phy\n");
>>> +               priv->use_embdphy = 1;
>>> +               break;
>>> +       case 1:
>>> +               dbg("use external phy\n");
>>> +               priv->use_embdphy = 0;
>>> +               break;
>>> +       default:
>>> +               dbg("Interface mode not supported by driver\n");
>>> +               goto free;
>>> +       }
>>
>> This switch statement inverts the existing logic. Much simpler code would be:
>>     /* buf[0] & 0xc describes phy interface mode */
>>     if (buf[0] & 8) {
>>          dbg("Interface mode not supported by driver\n");
>>          goto free;
>>     }
>>     priv->use_extphy = (buf[0] & 4) >> 2;
>>
>
> Thank you for your comments! I'll change that in the next version!
> Regards, Christian
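Grant's alignment point can be checked with offsetof(). The sketch below declares the structure as posted and a reordered variant with `use_embdphy` moved to the end. The type names `ax88172a_private_posted` and `ax88172a_private_reordered` and the helper `posted_hole_bytes()` are hypothetical. `struct mii_bus` and `struct phy_device` stay incomplete because only pointers to them are stored, and `uint16_t` stands in for the kernel's `u16`.

```c
#include <stddef.h>
#include <stdint.h>

struct mii_bus;      /* opaque: only pointers are stored */
struct phy_device;

/* Layout as posted in the patch (renamed for comparison). */
struct ax88172a_private_posted {
	int use_embdphy;             /* 4 bytes; padding follows on LP64 */
	struct mii_bus *mdio;
	struct phy_device *phydev;
	char phy_name[20];
	uint16_t phy_addr;
	uint16_t oldmode;
};

/* Hypothetical reordering along the lines of the review comment. */
struct ax88172a_private_reordered {
	struct mii_bus *mdio;        /* pointers first, at offset 0 */
	struct phy_device *phydev;
	char phy_name[20];
	uint16_t phy_addr;
	uint16_t oldmode;
	int use_embdphy;             /* int moved to the end */
};

/* Padding bytes the compiler inserts between use_embdphy and mdio. */
static size_t posted_hole_bytes(void)
{
	return offsetof(struct ax88172a_private_posted, mdio) - sizeof(int);
}
```

On a typical x86_64 (LP64) build, posted_hole_bytes() returns 4 and both structures are 48 bytes. The reordering closes the interior hole, but tail padding absorbs the saving, so here the gain is layout clarity rather than size.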

After rethinking it, I decided to keep the switch structure but use
defines for the different modes and the bitmask. I think this will be
easier to understand. I will submit a new version of the patchset
later today, please have a look at it.
Thanks!
Christian
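The asix.h hunk of the v2 3/3 patch adds AX_PHY_SELECT_MASK, AX_PHY_SELECT_INTERNAL and AX_PHY_SELECT_EXTERNAL for this purpose. The sketch below shows how a switch over those defines might decode the AX_CMD_SW_PHY_STATUS byte. The three defines are copied from that hunk. `enum phy_mode` and `decode_phy_select()` are hypothetical, and BIT() is redefined locally so the snippet builds in userspace.

```c
#include <stdint.h>

#define BIT(n)                  (1U << (n))

/* Copied from the asix.h hunk of the v2 patch. */
#define AX_PHY_SELECT_MASK      (BIT(3) | BIT(2))
#define AX_PHY_SELECT_INTERNAL  0
#define AX_PHY_SELECT_EXTERNAL  BIT(2)

/* Hypothetical result type for this sketch. */
enum phy_mode {
	PHY_MODE_INTERNAL,
	PHY_MODE_EXTERNAL,
	PHY_MODE_UNSUPPORTED,
};

/* Decode bits 3:2 of the AX_CMD_SW_PHY_STATUS byte. */
static enum phy_mode decode_phy_select(uint8_t status)
{
	switch (status & AX_PHY_SELECT_MASK) {
	case AX_PHY_SELECT_INTERNAL:
		return PHY_MODE_INTERNAL;     /* driver sets use_embdphy = 1 */
	case AX_PHY_SELECT_EXTERNAL:
		return PHY_MODE_EXTERNAL;     /* driver sets use_embdphy = 0 */
	default:
		return PHY_MODE_UNSUPPORTED;  /* bit 3 set: mode not supported */
	}
}
```

The named mask keeps the bit layout in one place. The case labels then read as modes rather than as shifted numbers, which is the readability argument made in the reply.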

* Re: [RFC PATCH 1/2] net: Add new network device function to allow for MMIO batching
From: Eric Dumazet @ 2012-07-12  7:14 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, davem, jeffrey.t.kirsher, edumazet, bhutchings, therbert,
	alexander.duyck
In-Reply-To: <20120712002603.27846.23752.stgit@gitlad.jf.intel.com>

On Wed, 2012-07-11 at 17:26 -0700, Alexander Duyck wrote:
> This change adds capabilities to the driver for batching the MMIO writes
> involved in transmits.  Most of the logic is based on the code for
> the qdisc scheduling.
> 
> What I did is break the transmit path into two parts.  We already had the
> ndo_start_xmit function which has been there all along.  The part I added
> was ndo_complete_xmit which is meant to handle notifying the hardware that
> frames are ready for delivery.
> 
> To control all of this I added a net sysfs value for the Tx queues called
> dispatch_limit.  When 0 it indicates that all frames will notify hardware
> immediately.  When 1 or more the netdev_complete_xmit call will queue up to
> that number of packets, and when the value is exceeded it will notify the
> hardware and reset the pending frame dispatch count.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
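As described, dispatch_limit is a per-Tx-queue threshold. A value of 0 notifies the hardware on every frame, and a value of 1 or more lets frames accumulate and rings the doorbell once the count is exceeded. The minimal sketch below models that counter. It assumes the notification fires on the frame that pushes the pending count past the limit; the description does not say whether that happens at N or N + 1. `struct tx_ring`, `ring_doorbell()` and `tx_queue_frame()` are hypothetical, and the MMIO write is modeled as a counter increment.

```c
#include <stdbool.h>

/* Hypothetical per-queue state; not the patch's actual structures. */
struct tx_ring {
	unsigned int dispatch_limit;   /* 0 = notify on every frame */
	unsigned int pending;          /* frames posted since last notify */
	unsigned int doorbell_writes;  /* stands in for MMIO tail writes */
};

/* Model of the MMIO write telling hardware that descriptors are ready. */
static void ring_doorbell(struct tx_ring *r)
{
	r->doorbell_writes++;
	r->pending = 0;               /* reset the pending frame count */
}

/* Called after a frame's descriptors are posted; true if hardware notified. */
static bool tx_queue_frame(struct tx_ring *r)
{
	r->pending++;
	if (r->dispatch_limit == 0 || r->pending > r->dispatch_limit) {
		ring_doorbell(r);
		return true;
	}
	return false;
}
```

A real implementation would presumably also need a path that flushes frames still pending when no further frame arrives. This sketch leaves that out.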

The idea is good, but do we really need such a complex scheme?

Most of the transmits are done from __qdisc_run()

We could add logic in __qdisc_run()/qdisc_restart()

qdisc_run_end() would then have to call ndo_complete_xmit() to make
sure the MMIO is done.
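The suggestion above ties the notification to the end of a qdisc run rather than to a per-queue count. Everything dequeued in one __qdisc_run() pass is transmitted, and qdisc_run_end() then triggers ndo_complete_xmit() once. The userspace model below shows that control flow. `struct dev_model`, `start_xmit()`, `complete_xmit()` and `run_qdisc()` are hypothetical stand-ins, and the model ignores the kernel's locking, quota and requeue logic. Skipping the flush when nothing was sent is a choice made for this sketch.

```c
#include <stddef.h>

/* Hypothetical device model: counts frames posted and MMIO writes. */
struct dev_model {
	size_t frames_posted;
	size_t mmio_writes;
};

/* Stand-in for ndo_start_xmit(): post descriptors, no doorbell write. */
static void start_xmit(struct dev_model *d)
{
	d->frames_posted++;
}

/* Stand-in for ndo_complete_xmit(): one MMIO write covers the batch. */
static void complete_xmit(struct dev_model *d)
{
	d->mmio_writes++;
}

/* Send up to `backlog` frames, then notify hardware once at run end. */
static void run_qdisc(struct dev_model *d, size_t backlog)
{
	size_t sent = 0;

	while (sent < backlog) {   /* models the qdisc_restart() loop */
		start_xmit(d);
		sent++;
	}
	if (sent)                  /* models qdisc_run_end() flushing */
		complete_xmit(d);
}
```

In this model the batch size is whatever one run dequeues, so no separate limit is needed.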
