Netdev List
* Re: [PATCH net-next v2 0/8] net: extend ethtool link mode bitmaps to 48 bits
From: Ben Hutchings @ 2015-01-06 23:08 UTC (permalink / raw)
  To: David Miller
  Cc: amirv, ddecotig, f.fainelli, netdev, linux-kernel, linux-api,
	saeedm, decot, jasowang, mst, herbert, viro, yamato, xii, nhorman,
	xiyou.wangcong, fbl, teg, jiri, vyasevic, ebiederm,
	VenkatKumar.Duvvuru, _govind
In-Reply-To: <20150106.172918.70204012105519766.davem@davemloft.net>

On Tue, 2015-01-06 at 17:29 -0500, David Miller wrote:
> From: Amir Vadai <amirv@mellanox.com>
> Date: Tue, 6 Jan 2015 15:56:33 +0200
> 
> > Mellanox is about to release next month a driver for a new NIC, with 3
> > new speeds * few link modes for each + new link modes for 10G.
> > It seems that we will need to consume almost all the new bits.
> 
> This tells me that the approach to this problem needs to be rethought.
> 
> Maybe we just need to bite the bullet and make a new ETHTOOL_GSET_2
> and ETHTOOL_SSET_2 or whatever you want to name them.
> 
> Then we can define a completely new structure, with 64-bit bitmaps
> for link modes or whatever.  The ethtool_op callbacks work using
> this structure, and only the net/core/ethtool.c code knows about
> the older structure and translates to/from for ETHTOOL_{GSET,SSET}.

Agreed.

Ben.

-- 
Ben Hutchings
This sentence contradicts itself - no actually it doesn't.

^ permalink raw reply

* Re: [RFC] netlink: get rid of nl_table_lock
From: Thomas Graf @ 2015-01-06 23:00 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: davem, netdev, linux-kernel, herbert, paulmck, edumazet,
	john.r.fastabend, josh
In-Reply-To: <20150103110211.18b11f0f@urahara>

On 01/03/15 at 11:02am, Stephen Hemminger wrote:
> As a follow on to Thomas's patch I think this would complete the
> transition to RCU for netlink.
> Compile tested only.
> 
> 
> 
> This patch gets rid of the reader/writer nl_table_lock and replaces it
> with exclusively using RCU for reading, and a mutex for writing.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

I like it. One thing I noticed is that it leaves a mix of
native mutex unlocks and unlocks via netlink_table_ungrab().

The Open vSwitch upcall is a pretty good real world Netlink
benchmark. I'll run the tests I have to see if this has any
unexpected side effects.

>  void netlink_table_grab(void)
> -	__acquires(nl_table_lock)
>  {
> -	might_sleep();
> -
> -	write_lock_irq(&nl_table_lock);
> -
> -	if (atomic_read(&nl_table_users)) {
> -		DECLARE_WAITQUEUE(wait, current);
> -
> -		add_wait_queue_exclusive(&nl_table_wait, &wait);
> -		for (;;) {
> -			set_current_state(TASK_UNINTERRUPTIBLE);
> -			if (atomic_read(&nl_table_users) == 0)
> -				break;
> -			write_unlock_irq(&nl_table_lock);
> -			schedule();
> -			write_lock_irq(&nl_table_lock);
> -		}
> -
> -		__set_current_state(TASK_RUNNING);
> -		remove_wait_queue(&nl_table_wait, &wait);
> -	}
> +	mutex_lock(&nl_table_mutex);
>  }

I have left this untouched so far because I wasn't clear on what side
effects removing it would have, as it explicitly relaxes writers
right now.
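
For readers following the locking discussion, here is a minimal userspace
sketch of the pattern the patch description names: lockless readers and a
mutex that serializes writers. It is an analogy built on C11 atomics and
pthreads, not kernel RCU and not code from the patch. The names table,
current_table, table_read_count() and table_replace() are hypothetical, and
the grace period that real RCU provides before freeing the old table is
deliberately left to the caller.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a protocol table. */
struct table {
	int protocol_count;
};

/* Published table pointer; readers load it without taking any lock. */
static _Atomic(struct table *) current_table;

/* Serializes writers only, in the role of the proposed table mutex. */
static pthread_mutex_t table_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Reader side: a single acquire load, no lock, never blocks. */
int table_read_count(void)
{
	struct table *t = atomic_load_explicit(&current_table,
					       memory_order_acquire);

	return t ? t->protocol_count : 0;
}

/*
 * Writer side: publish a fully initialized table under the mutex.
 * Returns the previous table. The caller may only free it once no
 * reader can still hold a pointer to it (the grace period that RCU
 * would provide is not modeled here).
 */
struct table *table_replace(struct table *next)
{
	struct table *old;

	assert(next != NULL);

	pthread_mutex_lock(&table_mutex);
	old = atomic_load_explicit(&current_table, memory_order_relaxed);
	atomic_store_explicit(&current_table, next, memory_order_release);
	pthread_mutex_unlock(&table_mutex);

	return old;
}
```

In this sketch writers only contend with each other, while readers never
wait, which is the property the patch aims for.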

^ permalink raw reply

* Re: [PATCH net] ipv6: Prevent ipv6_find_hdr() from returning ENOENT for valid non-first fragments
From: Pablo Neira Ayuso @ 2015-01-06 22:47 UTC (permalink / raw)
  To: Rahul Sharma; +Cc: netdev, linux-kernel, hannes, netfilter-devel
In-Reply-To: <CAFB3abzYNYqtzd_n+Ym0Lo=DYbV0HPU=Tqw=stALNnu188hMKQ@mail.gmail.com>

On Wed, Jan 07, 2015 at 03:03:20AM +0530, Rahul Sharma wrote:
> ipv6_find_hdr() currently assumes that the next-header field in the
> fragment header of the non-first fragment is the "protocol number of
> the last header" (here last header excludes any extension header
> protocol numbers ) which is incorrect as per RFC2460. The next-header
> value is the first header of the fragmentable part of the original
> packet (which can be extension header as well).
> This can create reassembly problems. For example: Fragmented
> authenticated OSPFv3 packets (where AH header is inserted before the
> protocol header). For the second fragment, the next header value in
> the fragment header will be NEXTHDR_AUTH which is correct but
> ipv6_find_hdr will return ENOENT since AH is an extension header
> resulting in second fragment getting dropped. This check for the
> presence of non-extension header needs to be removed.
> 
> Signed-off-by: Rahul Sharma <rsharma@arista.com>
> ---
> --- linux-3.18.1/net/ipv6/exthdrs_core.c.orig   2015-01-06 10:25:36.411419863 -0800
> +++ linux-3.18.1/net/ipv6/exthdrs_core.c        2015-01-06 10:51:45.819364986 -0800
> @@ -171,10 +171,11 @@ EXPORT_SYMBOL_GPL(ipv6_find_tlv);
>   * If the first fragment doesn't contain the final protocol header or
>   * NEXTHDR_NONE it is considered invalid.
>   *
> - * Note that non-1st fragment is special case that "the protocol number
> - * of last header" is "next header" field in Fragment header. In this case,
> - * *offset is meaningless and fragment offset is stored in *fragoff if fragoff
> - * isn't NULL.
> + * Note that non-1st fragment is special case that "the protocol number of the
> + * first header of the fragmentable part of the original packet" is
> + * "next header" field in the Fragment header. In this case, *offset is
> + * meaningless and fragment offset is stored in *fragoff if fragoff isn't
> + * NULL.
>   *
>   * if flags is not NULL and it's a fragment, then the frag flag
>   * IP6_FH_F_FRAG will be set. If it's an AH header, the
> @@ -250,9 +251,7 @@ int ipv6_find_hdr(const struct sk_buff *
> 
>                         _frag_off = ntohs(*fp) & ~0x7;
>                         if (_frag_off) {
> -                               if (target < 0 &&
> -                                   ((!ipv6_ext_hdr(hp->nexthdr)) ||

This check assumes that the following headers cannot show up in the
fragmented part of the IPv6 packet:

 bool ipv6_ext_hdr(u8 nexthdr)
 {
         /*
          * find out if nexthdr is an extension header or a protocol
          */
         return   (nexthdr == NEXTHDR_HOP)       ||
                  (nexthdr == NEXTHDR_ROUTING)   ||
                  (nexthdr == NEXTHDR_FRAGMENT)  ||
                  (nexthdr == NEXTHDR_AUTH)      ||
                  (nexthdr == NEXTHDR_NONE)      ||
                  (nexthdr == NEXTHDR_DEST);
 }

> -                                    hp->nexthdr == NEXTHDR_NONE)) {
> +                               if (target < 0) {
>                                         if (fragoff)
>                                                 *fragoff = _frag_off;
>                                         return hp->nexthdr;
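
To make the behavioural difference concrete, here is a small userspace
sketch of the non-first-fragment branch before and after the proposed
change. It only models the condition quoted above: the helper names
is_ext_hdr(), nonfirst_frag_old() and nonfirst_frag_new() are hypothetical,
the -ENOENT fall-through is taken from the patch description, and the
NEXTHDR_* values are the IANA protocol numbers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* IANA protocol numbers for the headers discussed in the thread. */
#define NEXTHDR_HOP       0
#define NEXTHDR_TCP       6
#define NEXTHDR_ROUTING  43
#define NEXTHDR_FRAGMENT 44
#define NEXTHDR_AUTH     51
#define NEXTHDR_NONE     59
#define NEXTHDR_DEST     60

/* Same classification as the ipv6_ext_hdr() excerpt quoted above. */
bool is_ext_hdr(uint8_t nexthdr)
{
	return nexthdr == NEXTHDR_HOP || nexthdr == NEXTHDR_ROUTING ||
	       nexthdr == NEXTHDR_FRAGMENT || nexthdr == NEXTHDR_AUTH ||
	       nexthdr == NEXTHDR_NONE || nexthdr == NEXTHDR_DEST;
}

/* Old condition: only accept a non-extension header (or NEXTHDR_NONE). */
int nonfirst_frag_old(int target, uint8_t nexthdr)
{
	if (target < 0 && (!is_ext_hdr(nexthdr) || nexthdr == NEXTHDR_NONE))
		return nexthdr;
	return -ENOENT;
}

/* Proposed condition: accept whatever the fragment header points at. */
int nonfirst_frag_new(int target, uint8_t nexthdr)
{
	if (target < 0)
		return nexthdr;
	return -ENOENT;
}

/* The OSPFv3-with-AH case from the patch description. */
void demo_ah_fragment(void)
{
	/* Old code rejects the second fragment because AH is an ext header. */
	assert(nonfirst_frag_old(-1, NEXTHDR_AUTH) == -ENOENT);
	/* New code reports AH as the next header instead. */
	assert(nonfirst_frag_new(-1, NEXTHDR_AUTH) == NEXTHDR_AUTH);
}
```

Both variants still agree for a plain upper-layer protocol such as TCP; they
differ only when the fragmentable part starts with an extension header.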

^ permalink raw reply

* Re: [PATCH] qla3xxx: don't allow never end busy loop
From: David Miller @ 2015-01-06 22:41 UTC (permalink / raw)
  To: andy.shevchenko; +Cc: netdev, linux-driver
In-Reply-To: <1420579073-24637-1-git-send-email-andy.shevchenko@gmail.com>

From: Andy Shevchenko <andy.shevchenko@gmail.com>
Date: Tue,  6 Jan 2015 23:17:53 +0200

> The counter variable was never incremented, so the loop may get stuck
> under certain circumstances.
> 
> Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>

Applied, thanks.
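
For reference, the class of bug described in the commit message can be
illustrated with a generic bounded polling loop. This is a sketch, not the
qla3xxx code: wait_until_ready(), never_ready() and ready_after_countdown()
are hypothetical names, and the point is only that the retry counter has to
advance on every iteration for the bound to take effect.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status poll: returns true once the device is ready. */
typedef bool (*ready_fn)(void *ctx);

/*
 * Poll until ready() succeeds, giving up after max_tries failed polls
 * (at least one poll is always made). Incrementing tries in the loop
 * is what keeps a dead device from stalling the caller forever.
 */
int wait_until_ready(ready_fn ready, void *ctx, unsigned int max_tries)
{
	unsigned int tries = 0;

	assert(ready != NULL);

	while (!ready(ctx)) {
		if (++tries >= max_tries)
			return -ETIMEDOUT;
	}
	return 0;
}

/* Example predicate: a device that never becomes ready. */
bool never_ready(void *ctx)
{
	(void)ctx;
	return false;
}

/* Example predicate: ready once the int counter behind ctx reaches 0. */
bool ready_after_countdown(void *ctx)
{
	int *left = ctx;

	if (*left == 0)
		return true;
	(*left)--;
	return false;
}
```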

^ permalink raw reply

* [PATCH V3] net: eth: xgene: change APM X-Gene SoC platform ethernet to support ACPI
From: Feng Kan @ 2015-01-06 22:41 UTC (permalink / raw)
  To: patches, davem, netdev, linux-kernel; +Cc: Feng Kan

This adds support for the APM X-Gene ethernet driver to use ACPI tables to
derive the ethernet driver parameters.

Signed-off-by: Feng Kan <fkan@apm.com>
---
V3:
   - Fix compile error caught by allmodconfig
V2:
   - remove NO_MAC define
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c   |   94 ++++++++++++++++------
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c |   97 +++++++++++++++++-----
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h |    3 +
 3 files changed, 150 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
index 7ba83ff..869d97f 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
@@ -593,10 +593,12 @@ static int xgene_enet_reset(struct xgene_enet_pdata *pdata)
 	if (!xgene_ring_mgr_init(pdata))
 		return -ENODEV;
 
-	clk_prepare_enable(pdata->clk);
-	clk_disable_unprepare(pdata->clk);
-	clk_prepare_enable(pdata->clk);
-	xgene_enet_ecc_init(pdata);
+	if (!efi_enabled(EFI_BOOT)) {
+		clk_prepare_enable(pdata->clk);
+		clk_disable_unprepare(pdata->clk);
+		clk_prepare_enable(pdata->clk);
+		xgene_enet_ecc_init(pdata);
+	}
 	xgene_enet_config_ring_if_assoc(pdata);
 
 	/* Enable auto-incr for scanning */
@@ -663,15 +665,20 @@ static int xgene_enet_phy_connect(struct net_device *ndev)
 	struct phy_device *phy_dev;
 	struct device *dev = &pdata->pdev->dev;
 
-	phy_np = of_parse_phandle(dev->of_node, "phy-handle", 0);
-	if (!phy_np) {
-		netdev_dbg(ndev, "No phy-handle found\n");
-		return -ENODEV;
+	if (dev->of_node) {
+		phy_np = of_parse_phandle(dev->of_node, "phy-handle", 0);
+		if (!phy_np) {
+			netdev_dbg(ndev, "No phy-handle found in DT\n");
+			return -ENODEV;
+		}
+		pdata->phy_dev = of_phy_find_device(phy_np);
 	}
 
-	phy_dev = of_phy_connect(ndev, phy_np, &xgene_enet_adjust_link,
-				 0, pdata->phy_mode);
-	if (!phy_dev) {
+	phy_dev = pdata->phy_dev;
+
+	if (!phy_dev ||
+	    phy_connect_direct(ndev, phy_dev, &xgene_enet_adjust_link,
+			       pdata->phy_mode)) {
 		netdev_err(ndev, "Could not connect to PHY\n");
 		return  -ENODEV;
 	}
@@ -681,32 +688,71 @@ static int xgene_enet_phy_connect(struct net_device *ndev)
 			      ~SUPPORTED_100baseT_Half &
 			      ~SUPPORTED_1000baseT_Half;
 	phy_dev->advertising = phy_dev->supported;
-	pdata->phy_dev = phy_dev;
 
 	return 0;
 }
 
-int xgene_enet_mdio_config(struct xgene_enet_pdata *pdata)
+static int xgene_mdiobus_register(struct xgene_enet_pdata *pdata,
+				  struct mii_bus *mdio)
 {
-	struct net_device *ndev = pdata->ndev;
 	struct device *dev = &pdata->pdev->dev;
+	struct net_device *ndev = pdata->ndev;
+	struct phy_device *phy;
 	struct device_node *child_np;
 	struct device_node *mdio_np = NULL;
-	struct mii_bus *mdio_bus;
 	int ret;
+	u32 phy_id;
+
+	if (dev->of_node) {
+		for_each_child_of_node(dev->of_node, child_np) {
+			if (of_device_is_compatible(child_np,
+						    "apm,xgene-mdio")) {
+				mdio_np = child_np;
+				break;
+			}
+		}
 
-	for_each_child_of_node(dev->of_node, child_np) {
-		if (of_device_is_compatible(child_np, "apm,xgene-mdio")) {
-			mdio_np = child_np;
-			break;
+		if (!mdio_np) {
+			netdev_dbg(ndev, "No mdio node in the dts\n");
+			return -ENXIO;
 		}
-	}
 
-	if (!mdio_np) {
-		netdev_dbg(ndev, "No mdio node in the dts\n");
-		return -ENXIO;
+		return of_mdiobus_register(mdio, mdio_np);
 	}
 
+	/* Mask out all PHYs from auto probing. */
+	mdio->phy_mask = ~0;
+
+	/* Register the MDIO bus */
+	ret = mdiobus_register(mdio);
+	if (ret)
+		return ret;
+
+	ret = device_property_read_u32(dev, "phy-channel", &phy_id);
+	if (ret)
+		ret = device_property_read_u32(dev, "phy-addr", &phy_id);
+	if (ret)
+		return -EINVAL;
+
+	phy = get_phy_device(mdio, phy_id, true);
+	if (!phy || IS_ERR(phy))
+		return -EIO;
+
+	ret = phy_device_register(phy);
+	if (ret)
+		phy_device_free(phy);
+	else
+		pdata->phy_dev = phy;
+
+	return ret;
+}
+
+int xgene_enet_mdio_config(struct xgene_enet_pdata *pdata)
+{
+	struct net_device *ndev = pdata->ndev;
+	struct mii_bus *mdio_bus;
+	int ret;
+
 	mdio_bus = mdiobus_alloc();
 	if (!mdio_bus)
 		return -ENOMEM;
@@ -720,7 +766,7 @@ int xgene_enet_mdio_config(struct xgene_enet_pdata *pdata)
 	mdio_bus->priv = pdata;
 	mdio_bus->parent = &ndev->dev;
 
-	ret = of_mdiobus_register(mdio_bus, mdio_np);
+	ret = xgene_mdiobus_register(pdata, mdio_bus);
 	if (ret) {
 		netdev_err(ndev, "Failed to register MDIO bus\n");
 		mdiobus_free(mdio_bus);
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 83a5028..1e56bf3 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -24,6 +24,10 @@
 #include "xgene_enet_sgmac.h"
 #include "xgene_enet_xgmac.h"
 
+#define RES_ENET_CSR	0
+#define RES_RING_CSR	1
+#define RES_RING_CMD	2
+
 static void xgene_enet_init_bufpool(struct xgene_enet_desc_ring *buf_pool)
 {
 	struct xgene_enet_raw_desc16 *raw_desc;
@@ -746,6 +750,41 @@ static const struct net_device_ops xgene_ndev_ops = {
 	.ndo_set_mac_address = xgene_enet_set_mac_address,
 };
 
+static int xgene_get_mac_address(struct device *dev,
+				 unsigned char *addr)
+{
+	int ret;
+
+	ret = device_property_read_u8_array(dev, "local-mac-address", addr, 6);
+	if (ret)
+		ret = device_property_read_u8_array(dev, "mac-address",
+						    addr, 6);
+	if (ret)
+		return -ENODEV;
+
+	return ETH_ALEN;
+}
+
+static int xgene_get_phy_mode(struct device *dev)
+{
+	int i, ret;
+	char *modestr;
+
+	ret = device_property_read_string(dev, "phy-connection-type",
+					  (const char **)&modestr);
+	if (ret)
+		ret = device_property_read_string(dev, "phy-mode",
+						  (const char **)&modestr);
+	if (ret)
+		return -ENODEV;
+
+	for (i = 0; i < PHY_INTERFACE_MODE_MAX; i++) {
+		if (!strcasecmp(modestr, phy_modes(i)))
+			return i;
+	}
+	return -ENODEV;
+}
+
 static int xgene_enet_get_resources(struct xgene_enet_pdata *pdata)
 {
 	struct platform_device *pdev;
@@ -753,29 +792,42 @@ static int xgene_enet_get_resources(struct xgene_enet_pdata *pdata)
 	struct device *dev;
 	struct resource *res;
 	void __iomem *base_addr;
-	const char *mac;
 	int ret;
 
 	pdev = pdata->pdev;
 	dev = &pdev->dev;
 	ndev = pdata->ndev;
 
-	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "enet_csr");
-	pdata->base_addr = devm_ioremap_resource(dev, res);
+	res = platform_get_resource(pdev, IORESOURCE_MEM, RES_ENET_CSR);
+	if (!res) {
+		dev_err(dev, "Resource enet_csr not defined\n");
+		return -ENODEV;
+	}
+	pdata->base_addr = devm_ioremap(dev, res->start, resource_size(res));
 	if (IS_ERR(pdata->base_addr)) {
 		dev_err(dev, "Unable to retrieve ENET Port CSR region\n");
 		return PTR_ERR(pdata->base_addr);
 	}
 
-	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "ring_csr");
-	pdata->ring_csr_addr = devm_ioremap_resource(dev, res);
+	res = platform_get_resource(pdev, IORESOURCE_MEM, RES_RING_CSR);
+	if (!res) {
+		dev_err(dev, "Resource ring_csr not defined\n");
+		return -ENODEV;
+	}
+	pdata->ring_csr_addr = devm_ioremap(dev, res->start,
+							resource_size(res));
 	if (IS_ERR(pdata->ring_csr_addr)) {
 		dev_err(dev, "Unable to retrieve ENET Ring CSR region\n");
 		return PTR_ERR(pdata->ring_csr_addr);
 	}
 
-	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "ring_cmd");
-	pdata->ring_cmd_addr = devm_ioremap_resource(dev, res);
+	res = platform_get_resource(pdev, IORESOURCE_MEM, RES_RING_CMD);
+	if (!res) {
+		dev_err(dev, "Resource ring_cmd not defined\n");
+		return -ENODEV;
+	}
+	pdata->ring_cmd_addr = devm_ioremap(dev, res->start,
+							resource_size(res));
 	if (IS_ERR(pdata->ring_cmd_addr)) {
 		dev_err(dev, "Unable to retrieve ENET Ring command region\n");
 		return PTR_ERR(pdata->ring_cmd_addr);
@@ -789,14 +841,12 @@ static int xgene_enet_get_resources(struct xgene_enet_pdata *pdata)
 	}
 	pdata->rx_irq = ret;
 
-	mac = of_get_mac_address(dev->of_node);
-	if (mac)
-		memcpy(ndev->dev_addr, mac, ndev->addr_len);
-	else
+	if (xgene_get_mac_address(dev, ndev->dev_addr) != ETH_ALEN)
 		eth_hw_addr_random(ndev);
+
 	memcpy(ndev->perm_addr, ndev->dev_addr, ndev->addr_len);
 
-	pdata->phy_mode = of_get_phy_mode(pdev->dev.of_node);
+	pdata->phy_mode = xgene_get_phy_mode(dev);
 	if (pdata->phy_mode < 0) {
 		dev_err(dev, "Unable to get phy-connection-type\n");
 		return pdata->phy_mode;
@@ -809,11 +859,9 @@ static int xgene_enet_get_resources(struct xgene_enet_pdata *pdata)
 	}
 
 	pdata->clk = devm_clk_get(&pdev->dev, NULL);
-	ret = IS_ERR(pdata->clk);
 	if (IS_ERR(pdata->clk)) {
-		dev_err(&pdev->dev, "can't get clock\n");
-		ret = PTR_ERR(pdata->clk);
-		return ret;
+		/* Firmware may have set up the clock already. */
+		pdata->clk = NULL;
 	}
 
 	base_addr = pdata->base_addr;
@@ -924,7 +972,7 @@ static int xgene_enet_probe(struct platform_device *pdev)
 		goto err;
 	}
 
-	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
+	ret = dma_coerce_mask_and_coherent(dev, DMA_BIT_MASK(64));
 	if (ret) {
 		netdev_err(ndev, "No usable DMA configuration\n");
 		goto err;
@@ -972,17 +1020,26 @@ static int xgene_enet_remove(struct platform_device *pdev)
 	return 0;
 }
 
-static struct of_device_id xgene_enet_match[] = {
+#ifdef CONFIG_ACPI
+static const struct acpi_device_id xgene_enet_acpi_match[] = {
+	{ "APMC0D05", },
+	{ }
+};
+MODULE_DEVICE_TABLE(acpi, xgene_enet_acpi_match);
+#endif
+
+static struct of_device_id xgene_enet_of_match[] = {
 	{.compatible = "apm,xgene-enet",},
 	{},
 };
 
-MODULE_DEVICE_TABLE(of, xgene_enet_match);
+MODULE_DEVICE_TABLE(of, xgene_enet_of_match);
 
 static struct platform_driver xgene_enet_driver = {
 	.driver = {
 		   .name = "xgene-enet",
-		   .of_match_table = xgene_enet_match,
+		   .of_match_table = of_match_ptr(xgene_enet_of_match),
+		   .acpi_match_table = ACPI_PTR(xgene_enet_acpi_match),
 	},
 	.probe = xgene_enet_probe,
 	.remove = xgene_enet_remove,
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
index f9958fa..c2d465c 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
@@ -22,7 +22,10 @@
 #ifndef __XGENE_ENET_MAIN_H__
 #define __XGENE_ENET_MAIN_H__
 
+#include <linux/acpi.h>
 #include <linux/clk.h>
+#include <linux/efi.h>
+#include <linux/io.h>
 #include <linux/of_platform.h>
 #include <linux/of_net.h>
 #include <linux/of_mdio.h>
-- 
1.7.1
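
As an aside for readers, the string-to-mode lookup that xgene_get_phy_mode()
performs can be sketched in userspace as below. This is an illustration
under assumptions, not the driver code: mode_names is a small hypothetical
subset, its indices do not correspond to the kernel's phy_interface_t
values, and lookup_phy_mode() is a made-up name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <strings.h>	/* strcasecmp (POSIX) */

/* Illustrative subset of mode names; index is the returned mode. */
static const char *const mode_names[] = {
	"mii", "gmii", "sgmii", "rgmii", "xgmii",
};

#define MODE_COUNT (sizeof(mode_names) / sizeof(mode_names[0]))

/*
 * Map a firmware-provided mode string to its table index using a
 * case-insensitive comparison. Returns -ENODEV if nothing matches.
 */
int lookup_phy_mode(const char *modestr)
{
	size_t i;

	assert(modestr != NULL);

	for (i = 0; i < MODE_COUNT; i++) {
		if (!strcasecmp(modestr, mode_names[i]))
			return (int)i;
	}
	return -ENODEV;
}
```

The case-insensitive match means a firmware table that spells the mode as
"SGMII" resolves to the same entry as "sgmii".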

^ permalink raw reply related

* Re: [PATCH 1/1] update ip-sysctl.txt documentation
From: David Miller @ 2015-01-06 22:40 UTC (permalink / raw)
  To: ani; +Cc: corbet, edumazet, linux-doc, linux-kernel, P, netdev, fruggeri
In-Reply-To: <1420583625-32414-1-git-send-email-ani@arista.com>

From: Ani Sinha <ani@arista.com>
Date: Tue,  6 Jan 2015 14:33:45 -0800

> @@ -64,8 +64,10 @@ fwmark_reflect - BOOLEAN
>  	Default: 0
>  
>  route/max_size - INTEGER
> -	Maximum number of routes allowed in the kernel.  Increase
> -	this when using large numbers of interfaces and/or routes.
> +        Post linux kernel 3.6, this is deprecated for ipv4 as route cache is no
> +        longer used. For ipv6, this is used to limit the maximum number of ipv6
> +        routes allowed in the kernel.  Increase this when using large numbers of
> +        interfaces and/or routes.

Please do not change the TABs into sequences of space characters.

^ permalink raw reply

* Re: [PATCH 1/1] update ip-sysctl.txt documentation
From: Ani Sinha @ 2015-01-06 22:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <20150106.173110.2230465152118236848.davem@davemloft.net>

On Tue, Jan 6, 2015 at 2:31 PM, David Miller <davem@davemloft.net> wrote:
> From: Ani Sinha <ani@arista.com>
> Date: Tue, 6 Jan 2015 14:02:29 -0800
>
>> +netdev
>
> Please just make a fresh list posting, otherwise I have to spend
> a lot of time editing your email to get the commit message and
> other parts to come out right.
>

Sorry, I missed netdev. I resent the patch just now.

^ permalink raw reply

* [PATCH 1/1] update ip-sysctl.txt documentation
From: Ani Sinha @ 2015-01-06 22:33 UTC (permalink / raw)
  To: corbet, davem, edumazet, linux-doc, linux-kernel, ani, P, netdev,
	fruggeri

Update documentation to reflect the fact that
/proc/sys/net/ipv4/route/max_size is no longer used for
ipv4.

Signed-off-by: Ani Sinha <ani@arista.com>
---
 Documentation/networking/ip-sysctl.txt |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 9bffdfc..c8a7e37 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -64,8 +64,10 @@ fwmark_reflect - BOOLEAN
 	Default: 0
 
 route/max_size - INTEGER
-	Maximum number of routes allowed in the kernel.  Increase
-	this when using large numbers of interfaces and/or routes.
+        Post linux kernel 3.6, this is deprecated for ipv4 as route cache is no
+        longer used. For ipv6, this is used to limit the maximum number of ipv6
+        routes allowed in the kernel.  Increase this when using large numbers of
+        interfaces and/or routes.
 
 neigh/default/gc_thresh1 - INTEGER
 	Minimum number of entries to keep.  Garbage collector will not
-- 
1.7.4.4


^ permalink raw reply related

* Re: [PATCH 1/1] update ip-sysctl.txt documentation
From: David Miller @ 2015-01-06 22:31 UTC (permalink / raw)
  To: ani; +Cc: netdev
In-Reply-To: <CAOxq_8OGx9VgSaEimAbNZSWjihNqNBXoVg0m8EPRaNX5jLXZiw@mail.gmail.com>

From: Ani Sinha <ani@arista.com>
Date: Tue, 6 Jan 2015 14:02:29 -0800

> +netdev

Please just make a fresh list posting, otherwise I have to spend
a lot of time editing your email to get the commit message and
other parts to come out right.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next v2 0/8] net: extend ethtool link mode bitmaps to 48 bits
From: David Miller @ 2015-01-06 22:29 UTC (permalink / raw)
  To: amirv-VPRAkNaXOzVWk0Htik3J/w
  Cc: ddecotig-Re5JQEeQqe8AvxtiuMwx3w,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, saeedm-VPRAkNaXOzVWk0Htik3J/w,
	decot-Ypc/8FJVVoBWk0Htik3J/w, jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	mst-H+wXaHxf7aLQT0dZR+AlfA,
	herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, ben-/+tVBieCtBitmTQ+vhA3Yw,
	yamato-H+wXaHxf7aLQT0dZR+AlfA, xii-hpIqsD4AKlfQT0dZR+AlfA,
	nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	xiyou.wangcong-Re5JQEeQqe8AvxtiuMwx3w, fbl-H+wXaHxf7aLQT0dZR+AlfA,
	teg-B22kvLQNl6c, jiri-rHqAuBHg3fBzbRFIqnYvSA,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	VenkatKumar.Duvvuru-iH1Dq9VlAzfQT0dZR+AlfA, _govind-KK0ffGbhmjU
In-Reply-To: <54ABE991.3040107-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

From: Amir Vadai <amirv-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Date: Tue, 6 Jan 2015 15:56:33 +0200

> Mellanox is about to release next month a driver for a new NIC, with 3
> new speeds * few link modes for each + new link modes for 10G.
> It seems that we will need to consume almost all the new bits.

This tells me that the approach to this problem needs to be rethought.

Maybe we just need to bite the bullet and make a new ETHTOOL_GSET_2
and ETHTOOL_SSET_2 or whatever you want to name them.

Then we can define a completely new structure, with 64-bit bitmaps
for link modes or whatever.  The ethtool_op callbacks work using
this structure, and only the net/core/ethtool.c code knows about
the older structure and translates to/from for ETHTOOL_{GSET,SSET}.
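
A rough sketch of what that translation layer could look like, written as
plain userspace C. Everything here is hypothetical: the struct layouts, the
names link_settings64, link_settings32, legacy_to_settings() and
settings_to_legacy(), and the policy of truncating to the low 32 bits while
reporting the loss are illustrative assumptions, not an agreed design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical new-style settings: 64-bit link mode bitmaps. */
struct link_settings64 {
	uint64_t supported;
	uint64_t advertising;
};

/* Hypothetical legacy view with 32-bit bitmaps. */
struct link_settings32 {
	uint32_t supported;
	uint32_t advertising;
};

/* Legacy to new: widening is always lossless. */
void legacy_to_settings(const struct link_settings32 *old,
			struct link_settings64 *cur)
{
	assert(old != NULL && cur != NULL);

	cur->supported = old->supported;
	cur->advertising = old->advertising;
}

/*
 * New to legacy: keep the low 32 bits of each bitmap.
 * Returns 1 if any link mode above bit 31 had to be dropped,
 * 0 if the legacy view is complete.
 */
int settings_to_legacy(const struct link_settings64 *cur,
		       struct link_settings32 *old)
{
	assert(cur != NULL && old != NULL);

	old->supported = (uint32_t)cur->supported;
	old->advertising = (uint32_t)cur->advertising;

	return ((cur->supported | cur->advertising) >> 32) != 0;
}
```

With this split, drivers would only ever fill in the 64-bit structure, and
a single place in the core would decide how to present it to callers of
the older commands.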

^ permalink raw reply

* Re: [PATCH net-next] openvswitch: Do not use private netdev_vport fields
From: Pravin Shelar @ 2015-01-06 22:28 UTC (permalink / raw)
  To: David Miller; +Cc: Daniele Di Proietto, netdev
In-Reply-To: <CALnjE+ojw0FnonLyu8hhrWwcn_-_+EYtbFPWdCiJ-JC5h-L9eA@mail.gmail.com>

On Tue, Jan 6, 2015 at 2:15 PM, Pravin Shelar <pshelar@nicira.com> wrote:
> On Tue, Jan 6, 2015 at 2:02 PM, David Miller <davem@davemloft.net> wrote:
>> From: Pravin Shelar <pshelar@nicira.com>
>> Date: Tue, 6 Jan 2015 13:16:11 -0800
>>
>>> Function return type and function name should be on same line,
>>> otherwise looks good.
>>
>> I disagree, where is the code in the tree that needs this?
>
> Most of the function definitions that I have seen are written like this.
> I was pointing out a coding style issue.

As for the actual change, I think it is a cleanup. netdev_vport_index()
hides the implementation from datapath.c. I hope Daniele will explain
the need for the change.

^ permalink raw reply

* Re: [RFC PATCH] unlock rtnl mutex in ic_open_devs while waiting
From: David Miller @ 2015-01-06 22:21 UTC (permalink / raw)
  To: maarten.lankhorst; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel
In-Reply-To: <54AA9706.5020202@canonical.com>

From: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Date: Mon, 05 Jan 2015 14:52:06 +0100

> This fixes a deadlock with alx_link_check, which takes the rtnl_mutex in
> a work item to check the link.
> 
> I have no idea whether alx should be fixed or ipconfig.c,
> but this saves 120 seconds off my boot time. ;-)
> 
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>

I genuinely think that alx_link_check() needs to use a smaller hammer
to do its locking; there is no reason to use the RTNL mutex.

A driver private mutex will probably work just as well and not have
this problem.

^ permalink raw reply

* Re: [RFC] netlink: get rid of nl_table_lock
From: David Miller @ 2015-01-06 22:19 UTC (permalink / raw)
  To: stephen
  Cc: tgraf, netdev, linux-kernel, herbert, paulmck, edumazet,
	john.r.fastabend, josh
In-Reply-To: <20150103110211.18b11f0f@urahara>

From: Stephen Hemminger <stephen@networkplumber.org>
Date: Sat, 3 Jan 2015 11:02:11 -0800

> As a follow on to Thomas's patch I think this would complete the
> transition to RCU for netlink.
> Compile tested only.
> 
> This patch gets rid of the reader/writer nl_table_lock and replaces it
> with exclusively using RCU for reading, and a mutex for writing.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

FWIW, this approach looks fine to me.

Thomas can you review this?

^ permalink raw reply

* Re: [PATCH] net/fsl: Add mEMAC MDIO support to XGMAC MDIO
From: David Miller @ 2015-01-06 22:18 UTC (permalink / raw)
  To: shh.xie; +Cc: netdev, afleming, Shaohui.Xie
In-Reply-To: <1420364162-13109-1-git-send-email-shh.xie@gmail.com>

From: <shh.xie@gmail.com>
Date: Sun, 4 Jan 2015 17:36:02 +0800

> From: Andy Fleming <afleming@gmail.com>
> 
> The Freescale mEMAC supports operating at 10/100/1000/10G, and
> its associated MDIO controller is likewise capable of operating
> both Clause 22 and Clause 45 MDIO buses. It is nearly identical
> to the MDIO controller on the XGMAC, so we just modify that
> driver.
> 
> Portions of this driver developed by:
> 
> Sandeep Singh <sandeep@freescale.com>
> Roy Zang <tie-fei.zang@freescale.com>
> 
> Signed-off-by: Andy Fleming <afleming@gmail.com>
> Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] ethtool: Extend ethtool plugin module eeprom API to phylib
From: David Miller @ 2015-01-06 22:17 UTC (permalink / raw)
  To: eswierk; +Cc: netdev, f.fainelli, linux-kernel
In-Reply-To: <1420248476-110859-1-git-send-email-eswierk@skyportsystems.com>

From: Ed Swierk <eswierk@skyportsystems.com>
Date: Fri,  2 Jan 2015 17:27:56 -0800

> This patch extends the ethtool plugin module eeprom API to support cards
> whose phy support is delegated to a separate driver.
> 
> The handlers for ETHTOOL_GMODULEINFO and ETHTOOL_GMODULEEEPROM call the
> module_info and module_eeprom functions if the phy driver provides them;
> otherwise the handlers call the equivalent ethtool_ops functions provided
> by network drivers with built-in phy support.
> 
> Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>

Applied to net-next, thanks.

^ permalink raw reply

* Re: [PATCH net-next] openvswitch: Do not use private netdev_vport fields
From: Pravin Shelar @ 2015-01-06 22:15 UTC (permalink / raw)
  To: David Miller; +Cc: Daniele Di Proietto, netdev
In-Reply-To: <20150106.170206.2121115629154170856.davem@davemloft.net>

On Tue, Jan 6, 2015 at 2:02 PM, David Miller <davem@davemloft.net> wrote:
> From: Pravin Shelar <pshelar@nicira.com>
> Date: Tue, 6 Jan 2015 13:16:11 -0800
>
>> Function return type and function name should be on same line,
>> otherwise looks good.
>
> I disagree, where is the code in the tree that needs this?

Most of the function definitions that I have seen are written like this.
I was pointing out a coding style issue.

^ permalink raw reply

* Re: [PATCH net-next] tg3: move init/deinit from open/close to probe/remove
From: Prashant Sreedharan @ 2015-01-06 21:57 UTC (permalink / raw)
  To: Ivan Vecera; +Cc: netdev, mchan
In-Reply-To: <1420576122-23618-1-git-send-email-ivecera@redhat.com>

On Tue, 2015-01-06 at 21:28 +0100, Ivan Vecera wrote:
> Move init and deinit of PTP support from open/close functions
> to probe/remove funcs to avoid removing/re-adding of associated PTP
> device(s) during ifup/ifdown.
> 
> Signed-off-by: Ivan Vecera <ivecera@redhat.com>
> ---
>  drivers/net/ethernet/broadcom/tg3.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
> index 553dcd8..e86bee4 100644
> --- a/drivers/net/ethernet/broadcom/tg3.c
> +++ b/drivers/net/ethernet/broadcom/tg3.c
> @@ -11681,13 +11681,6 @@ static int tg3_open(struct net_device *dev)
>  		pci_set_power_state(tp->pdev, PCI_D3hot);
>  	}
>  
> -	if (tg3_flag(tp, PTP_CAPABLE)) {
> -		tp->ptp_clock = ptp_clock_register(&tp->ptp_info,
> -						   &tp->pdev->dev);
> -		if (IS_ERR(tp->ptp_clock))
> -			tp->ptp_clock = NULL;
> -	}
> -
>  	return err;
>  }
>  
> @@ -11701,8 +11694,6 @@ static int tg3_close(struct net_device *dev)
>  		return -EAGAIN;
>  	}
>  
> -	tg3_ptp_fini(tp);
> -
>  	tg3_stop(tp);
>  
>  	/* Clear stats across close / open calls */
> @@ -17880,6 +17871,13 @@ static int tg3_init_one(struct pci_dev *pdev,
>  		goto err_out_apeunmap;
>  	}
>  
> +	if (tg3_flag(tp, PTP_CAPABLE)) {
> +		tp->ptp_clock = ptp_clock_register(&tp->ptp_info,
> +						   &tp->pdev->dev);
> +		if (IS_ERR(tp->ptp_clock))
> +			tp->ptp_clock = NULL;
> +	}
> +
>  	netdev_info(dev, "Tigon3 [partno(%s) rev %04x] (%s) MAC address %pM\n",
>  		    tp->board_part_number,
>  		    tg3_chip_rev_id(tp),
> @@ -17955,6 +17953,8 @@ static void tg3_remove_one(struct pci_dev *pdev)
>  	if (dev) {
>  		struct tg3 *tp = netdev_priv(dev);
>  
> +		tg3_ptp_fini(tp);
> +
>  		release_firmware(tp->fw);
>  
>  		tg3_reset_task_cancel(tp);

tg3_ptp_init() needs to be called before ptp_clock_register() to
initialize the HW and populate the ptp_clock_info structure. Could you
please test after making this change? Thanks.

^ permalink raw reply

* Re: [PATCH] net: ethernet: cpsw: ignore VLAN ID 1
From: Felipe Balbi @ 2015-01-06 22:04 UTC (permalink / raw)
  To: David Miller; +Cc: balbi, netdev, linux-omap, stable, mugunthanvnm
In-Reply-To: <20150106.165911.604916635790072318.davem@davemloft.net>

Hi,

On Tue, Jan 06, 2015 at 04:59:11PM -0500, David Miller wrote:
> From: Felipe Balbi <balbi@ti.com>
> Date: Tue, 6 Jan 2015 14:31:19 -0600
> 
> > What you're saying here is that you prefer to drop a feature that works
> > for all other 1023 IDs because 1 ID is quirky. Sounds like overkill
> > to me.
> 
> The other option is to software fallback only for VLAN 1.

Now we're talking. Keep in mind, however, that this IP runs on mere
single-core Cortex-A8 and single-core Cortex-A9 devices, which already
have a hard time keeping up with the non-accelerated
checksum calculations. But fair enough, if that's the way to go, it is
the way to go.

cheers

-- 
balbi


^ permalink raw reply

* Fwd: [PATCH 1/1] update ip-sysctl.txt documentation
From: Ani Sinha @ 2015-01-06 22:02 UTC (permalink / raw)
  To: netdev@vger.kernel.org
In-Reply-To: <1420574648-24213-1-git-send-email-ani@arista.com>

+netdev


---------- Forwarded message ----------
From: Ani Sinha <ani@arista.com>
Date: Tue, Jan 6, 2015 at 12:04 PM
Subject: [PATCH 1/1] update ip-sysctl.txt documentation
To: corbet@lwn.net, davem@davemloft.net, edumazet@google.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
ani@arista.com, P@draigbrady.com


Update documentation to reflect the fact that
/proc/sys/net/ipv4/route/max_size is no longer used for
ipv4.

Signed-off-by: Ani Sinha <ani@arista.com>
---
 Documentation/networking/ip-sysctl.txt |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 9bffdfc..c8a7e37 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -64,8 +64,10 @@ fwmark_reflect - BOOLEAN
        Default: 0

 route/max_size - INTEGER
-       Maximum number of routes allowed in the kernel.  Increase
-       this when using large numbers of interfaces and/or routes.
+        Post linux kernel 3.6, this is deprecated for ipv4 as route cache is no
+        longer used. For ipv6, this is used to limit the maximum number of ipv6
+        routes allowed in the kernel.  Increase this when using large numbers of
+        interfaces and/or routes.

 neigh/default/gc_thresh1 - INTEGER
        Minimum number of entries to keep.  Garbage collector will not
--
1.7.4.4
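
For readers who want to check the value the patch documents, here is a
minimal sketch of reading an integer sysctl through procfs. The helper
name read_int_file() is hypothetical and not part of the patch; the
paths one would pass are /proc/sys/net/ipv4/route/max_size (named in the
commit message) and, assuming a kernel built with IPv6,
/proc/sys/net/ipv6/route/max_size.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a single integer from a procfs-style file.
 * Returns 0 and stores the value on success, -1 if the file cannot be
 * opened or does not start with an integer.
 */
int read_int_file(const char *path, long *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;

    /* sysctl integers are exposed as plain decimal text */
    ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);

    return ok ? 0 : -1;
}
```

Calling read_int_file("/proc/sys/net/ipv6/route/max_size", &v) on a Linux
host would then yield the current limit, if the file exists there.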

^ permalink raw reply related

* Re: [PATCH net-next] openvswitch: Do not use private netdev_vport fields
From: David Miller @ 2015-01-06 22:02 UTC (permalink / raw)
  To: pshelar; +Cc: daniele.di.proietto, netdev
In-Reply-To: <CALnjE+odj6GdO97n4NBLQhP4egMzuNHMjxVqMwC7fSC5C=hT-g@mail.gmail.com>

From: Pravin Shelar <pshelar@nicira.com>
Date: Tue, 6 Jan 2015 13:16:11 -0800

> Function return type and function name should be on same line,
> otherwise looks good.

I disagree, where is the code in the tree that needs this?

^ permalink raw reply

* Re: [PATCH net-next] openvswitch: Do not use private netdev_vport fields
From: David Miller @ 2015-01-06 22:01 UTC (permalink / raw)
  To: daniele.di.proietto; +Cc: netdev, pshelar
In-Reply-To: <1420577481-20238-1-git-send-email-daniele.di.proietto@gmail.com>

From: Daniele Di Proietto <daniele.di.proietto@gmail.com>
Date: Tue,  6 Jan 2015 21:51:21 +0100

> This commit introduces netdev_vport_index() to prevent datapath.c from directly accessing the 'dev' member of 'struct netdev_vport'.
> This fix is needed to allow possible alternative netdev_vport implementations.
> 
> Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com>

This doesn't make any sense to me, as the code currently stands your
change is not necessary at all.

If some need does arise, submit this patch along with the change
that creates the need.

^ permalink raw reply

* Re: TCP connection issues against Amazon S3
From: Yuchung Cheng @ 2015-01-06 22:00 UTC (permalink / raw)
  To: Erik Grinaker; +Cc: Eric Dumazet, linux-kernel@vger.kernel.org, netdev
In-Reply-To: <3F608393-E5F1-4647-81BF-C6C740100934@bengler.no>

On Tue, Jan 6, 2015 at 1:04 PM, Erik Grinaker <erik@bengler.no> wrote:
>
>> On 06 Jan 2015, at 20:26, Erik Grinaker <erik@bengler.no> wrote:
>>
>>>
>>> On 06 Jan 2015, at 20:13, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>
>>> On Tue, 2015-01-06 at 19:42 +0000, Erik Grinaker wrote:
>>>
>>>> The transfer on the functioning Netherlands server does indeed use SACKs, while the Norway servers do not.
>>>>
>>>> For what it’s worth, I have made stripped down pcaps for a single failing transfer as well as a single functioning transfer in the Netherlands:
>>>>
>>>> http://abstrakt.bengler.no/tcp-issues-s3-failure.pcap.bz2
>>>> http://abstrakt.bengler.no/tcp-issues-s3-success-netherlands.pcap.bz2
>>>>
>>>
>>> Although sender seems to be reluctant to retransmit, this 'failure' is
>>> caused by receiver closing the connection too soon.
>>>
>>> Are you sure you do not ask curl to setup a very small completion
>>> timer ?
>>
>> For testing, I am using Curl with a 30 second timeout. This may well be a bit short, but the point is that with the older kernel I could run thousands of requests without a single failure (generally the requests would finish within seconds), while with the newer kernel about 5% of requests will time out (the rest complete within seconds).
>>
>>> 12:41:00.738336 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 767221:768681, ack 154, win 127, length 1460
>>> 12:41:00.738346 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 736561, win 1877, length 0
>>> 12:41:05.227150 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 736561:738021, ack 154, win 127, length 1460
>>> 12:41:05.227250 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1882, length 0
>>> 12:41:05.278287 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 768681:770141, ack 154, win 127, length 1460
>>> 12:41:05.278354 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1888, length 0
>>> 12:41:05.278421 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 770141:771601, ack 154, win 127, length 1460
>>> 12:41:05.278429 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1894, length 0
>>> 12:41:14.257102 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 745321:746781, ack 154, win 127, length 1460
>>> 12:41:14.257154 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1900, length 0
>>> 12:41:14.308117 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 771601:773061, ack 154, win 127, length 1460
>>> 12:41:14.308227 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1905, length 0
>>> 12:41:14.308387 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 773061:774521, ack 154, win 127, length 1460
>>> 12:41:14.308397 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1911, length 0
>>>
>>> -> Here receiver sends a FIN, because application closed the socket (or died)
>>> 12:41:23.237156 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [F.], seq 154, ack 746781, win 1911, length 0
>>> 12:41:23.289805 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 746781:748241, ack 155, win 127, length 1460
>>> 12:41:23.289882 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [R], seq 505782802, win 0, length 0
>>>
>>> Anyway, getting decent speed without SACK is going to be hard.
>>
>> Yes, I am not sure why the sender (S3) disables SACK on my Norwegian servers (across ISPs), while it enables SACK on my server in the Netherlands. They run the same kernel and configuration. I will have to look into it more closely tomorrow.
>
> It turns out the Norway and Netherlands servers were resolving different loadbalancers. The ones I reached in Norway did not support SACKs, while the ones in the Netherlands did. Going directly to a SACK-enabled IP fixes the problem.
>
> This still doesn’t explain why it works with older kernels, but not newer ones. I’m thinking it’s
> probably some minor change, which gets amplified by the lack of SACKs
> on the loadbalancer. Anyway, I’ll bring it up with Amazon.

Can you post traces with the older kernels?

>
> Many thanks for your help, everyone.
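
The stalls in the quoted trace can be picked out mechanically. Below is a
minimal sketch, assuming tcpdump text lines in the format quoted above
with the ">>> " quote prefix already stripped. The names
parse_trace_line() and is_retransmit() are hypothetical helpers, not part
of any tool. A data segment is treated as a retransmission when it starts
below the highest sequence number already seen from the sender.

```c
#include <stdio.h>
#include <string.h>

struct trace_line {
    double t;        /* timestamp, seconds since midnight */
    long seq_start;  /* meaningful only when has_range is set */
    long seq_end;
    int has_range;   /* 1 if the line carried "seq A:B" */
};

/* Parse "HH:MM:SS.usec ... seq A:B, ..." ; returns 0 on success. */
int parse_trace_line(const char *line, struct trace_line *out)
{
    int h, m;
    double s;
    const char *p;

    if (sscanf(line, "%d:%d:%lf", &h, &m, &s) != 3)
        return -1;

    out->t = h * 3600.0 + m * 60.0 + s;
    out->seq_start = 0;
    out->seq_end = 0;
    out->has_range = 0;

    /* Pure ACKs have no seq field; FIN/RST lines carry a single number. */
    p = strstr(line, " seq ");
    if (p && sscanf(p, " seq %ld:%ld", &out->seq_start, &out->seq_end) == 2)
        out->has_range = 1;

    return 0;
}

/*
 * Returns 1 if the segment starts below the highest byte seen so far,
 * otherwise advances *highest_end and returns 0.
 */
int is_retransmit(const struct trace_line *l, long *highest_end)
{
    if (!l->has_range)
        return 0;
    if (l->seq_start < *highest_end)
        return 1;
    *highest_end = l->seq_end;
    return 0;
}
```

Fed the first and third lines of the quoted trace, this flags the segment
starting at 736561 as a retransmission, arriving about 4.5 seconds after
the segment starting at 767221.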

^ permalink raw reply

* Re: [PATCH] net: ethernet: cpsw: ignore VLAN ID 1
From: David Miller @ 2015-01-06 21:59 UTC (permalink / raw)
  To: balbi; +Cc: netdev, linux-omap, stable, mugunthanvnm
In-Reply-To: <20150106203119.GC32308@saruman>

From: Felipe Balbi <balbi@ti.com>
Date: Tue, 6 Jan 2015 14:31:19 -0600

> What you're saying here is that you prefer to drop a feature that works
> for all other 1023 IDs because 1 ID is quirky. Sounds like overkill
> to me.

The other option is to software fallback only for VLAN 1.

^ permalink raw reply

* Re: Does the ordering of the fib_table_dump or /proc/net/fib_trie matter?
From: David Miller @ 2015-01-06 21:58 UTC (permalink / raw)
  To: alexander.duyck; +Cc: stephen, netdev
In-Reply-To: <54AC45CE.80800@gmail.com>

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Tue, 06 Jan 2015 12:30:06 -0800

> The question I have is if that would screw up any user-space apps.  I
> know ip route can dump the list via "ip route show".  I'm just wondering
> if there would be any problem with default being the last entry instead
> of the first entry?

The ordering already changed once when we went from fib_hash to
fib_trie, nobody should depend upon the ordering.

^ permalink raw reply

* Re: Possible BUG in ipv6_find_hdr function for fragmented packets
From: Rahul Sharma @ 2015-01-06 21:43 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: netdev
In-Reply-To: <1420551094.32369.34.camel@stressinduktion.org>

Hi Hannes

On Tue, Jan 6, 2015 at 7:01 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hi Rahul,
>
> On Mi, 2014-12-31 at 12:33 +0530, Rahul Sharma wrote:
>> I have observed a problem when I added an AH header before protocol
>> header (OSPFv3) while implementing authentication support for OSPFv3.
>>
>> Problem: Fragmented packets which include authentication header don't
>> get reassembled in the kernel. This was because ipv6_find_hdr returns
>> ENOENT for the non-first fragment since AH is an extension header.
>>
>> Firstly, this comment  "Note that non-1st fragment is special case
>> that "the protocol number of last header" is "next header" field in
>> Fragment header" ('last header' doesn't include AH or other extension
>> headers) before ipv6_find_hdr looks incorrect as per the description
>> of the fragmentation process in RFC2460. The rfc clearly states that
>> next header value in the fragments will be the first header of the
>> Fragmentable part of the original packet which could be AH (51) as in
>> our case.
>>
>> This code looks like a problem:
>> if (_frag_off) {
>> 253                                 if (target < 0 &&
>> 254                                     ((!ipv6_ext_hdr(hp->nexthdr)) ||
>> 255                                      hp->nexthdr == NEXTHDR_NONE)) {
>> 256                                         if (fragoff)
>> 257                                                 *fragoff = _frag_off;
>> 258                                         return hp->nexthdr;
>> 259                                 }
>> 260                                 return -ENOENT;
>> 261                         }
>>
>> For non-first fragments, the 'next header' in the fragment header
>> would *always* be AUTH (or whatever extension header is the first
>> header in first fragment). But the above code will keep on returning
>> ENOENT for the non-first fragment in such cases.
>>
>> Solution: I suggest we should do away with this check
>> ((!ipv6_ext_hdr(hp->nexthdr)) ||hp->nexthdr == NEXTHDR_NONE))  and
>> simply return hp->nexthdr if the _frag_off is non zero. I tested it on
>> my machine and it works. Adding a special case for NEXTHDR_AUTH also
>> works for me.
>
> The packets do get dropped in netfilter code? Do you have any idea where
> specifically?
>
> Your suggestion seems correct to me, can you provide a patch to fix
> this?
>
> Thanks,
> Hannes
>
>

Yes, the packets get dropped in the netfilter code. ip6table_raw_hook
was returning NF_DROP for the second fragment.
This was because of xt_action_param structure's hotdrop flag being set
to true for this fragment when ip6t_do_table tries to call
ip6_packet_match which in turn calls ipv6_find_hdr which was returning
ENOENT.

I have also emailed the patch.
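
To make the difference concrete, here is a minimal userspace sketch of
the non-first-fragment branch quoted above. It is a model, not the kernel
code: is_ext_hdr() is a simplified stand-in for ipv6_ext_hdr(), the
function names nonfirst_frag_current() and nonfirst_frag_proposed() are
hypothetical, and the "proposed" variant is only one reading of the
suggestion (drop the extension-header test but keep the target < 0
condition). It is not necessarily what the submitted patch does.

```c
#include <errno.h>
#include <stdbool.h>

/* IPv6 next-header values (IANA protocol numbers). */
enum {
    NEXTHDR_HOP      = 0,
    NEXTHDR_ROUTING  = 43,
    NEXTHDR_FRAGMENT = 44,
    NEXTHDR_AUTH     = 51,
    NEXTHDR_NONE     = 59,
    NEXTHDR_DEST     = 60,
    PROTO_OSPF       = 89,
};

/* Simplified stand-in for the kernel's ipv6_ext_hdr(). */
bool is_ext_hdr(int nexthdr)
{
    return nexthdr == NEXTHDR_HOP || nexthdr == NEXTHDR_ROUTING ||
           nexthdr == NEXTHDR_FRAGMENT || nexthdr == NEXTHDR_AUTH ||
           nexthdr == NEXTHDR_NONE || nexthdr == NEXTHDR_DEST;
}

/* Models the quoted check for a fragment with a non-zero offset. */
int nonfirst_frag_current(int target, int frag_nexthdr)
{
    if (target < 0 &&
        (!is_ext_hdr(frag_nexthdr) || frag_nexthdr == NEXTHDR_NONE))
        return frag_nexthdr;
    return -ENOENT;
}

/* Assumed reading of the proposal: no extension-header test. */
int nonfirst_frag_proposed(int target, int frag_nexthdr)
{
    if (target < 0)
        return frag_nexthdr;
    return -ENOENT;
}
```

With target < 0, a non-first fragment whose Fragment header names OSPF
(89) is accepted by both variants, while one naming AH (51) gets -ENOENT
from the current check and 51 from the proposed one.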

Thanks

^ permalink raw reply

