Netdev List
 help / color / mirror / Atom feed
* Re: [RFC] netlink: get rid of nl_table_lock
From: David Miller @ 2015-01-06 22:19 UTC (permalink / raw)
  To: stephen
  Cc: tgraf, netdev, linux-kernel, herbert, paulmck, edumazet,
	john.r.fastabend, josh
In-Reply-To: <20150103110211.18b11f0f@urahara>

From: Stephen Hemminger <stephen@networkplumber.org>
Date: Sat, 3 Jan 2015 11:02:11 -0800

> As a follow on to Thomas's patch I think this would complete the
> transistion to RCU for netlink.
> Compile tested only.
> 
> This patch gets rid of the reader/writer nl_table_lock and replaces it
> with exclusively using RCU for reading, and a mutex for writing.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

FWIW, this approach looks fine to me.

Thomas can you review this?

^ permalink raw reply

* Re: [RFC PATCH] unlock rtnl mutex in ic_open_devs while waiting
From: David Miller @ 2015-01-06 22:21 UTC (permalink / raw)
  To: maarten.lankhorst; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel
In-Reply-To: <54AA9706.5020202@canonical.com>

From: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Date: Mon, 05 Jan 2015 14:52:06 +0100

> This fixes a deadlock with alx_link_check, which takes the rtnl_mutex in
> a work item to check the link.
> 
> I have no idea whether alx should be fixed or ipconfig.c,
> but this saves 120 seconds off my boot time. ;-)
> 
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>

I genuinely think that alx_link_check() needs to use a smaller hammer
to do it's locking, there is no reason to use the RTNL mutex.

A driver private mutex will probably work just as well and not have
this problem.

^ permalink raw reply

* Re: [PATCH net-next] openvswitch: Do not use private netdev_vport fields
From: Pravin Shelar @ 2015-01-06 22:28 UTC (permalink / raw)
  To: David Miller; +Cc: Daniele Di Proietto, netdev
In-Reply-To: <CALnjE+ojw0FnonLyu8hhrWwcn_-_+EYtbFPWdCiJ-JC5h-L9eA@mail.gmail.com>

On Tue, Jan 6, 2015 at 2:15 PM, Pravin Shelar <pshelar@nicira.com> wrote:
> On Tue, Jan 6, 2015 at 2:02 PM, David Miller <davem@davemloft.net> wrote:
>> From: Pravin Shelar <pshelar@nicira.com>
>> Date: Tue, 6 Jan 2015 13:16:11 -0800
>>
>>> Function return type and function name should be on same line,
>>> otherwise looks good.
>>
>> I disagree, where is the code in the tree that needs this?
>
> Most of function definitions that I have seen are defined like this. I
> was pointing out coding style issue.

About the actual change, I think it is a cleanup. netdev_vport_index()
hides the implementation from datapath.c. I hope Daniele will explain
need for the change.

^ permalink raw reply

* Re: [PATCH net-next v2 0/8] net: extend ethtool link mode bitmaps to 48 bits
From: David Miller @ 2015-01-06 22:29 UTC (permalink / raw)
  To: amirv-VPRAkNaXOzVWk0Htik3J/w
  Cc: ddecotig-Re5JQEeQqe8AvxtiuMwx3w,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, saeedm-VPRAkNaXOzVWk0Htik3J/w,
	decot-Ypc/8FJVVoBWk0Htik3J/w, jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	mst-H+wXaHxf7aLQT0dZR+AlfA,
	herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, ben-/+tVBieCtBitmTQ+vhA3Yw,
	yamato-H+wXaHxf7aLQT0dZR+AlfA, xii-hpIqsD4AKlfQT0dZR+AlfA,
	nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	xiyou.wangcong-Re5JQEeQqe8AvxtiuMwx3w, fbl-H+wXaHxf7aLQT0dZR+AlfA,
	teg-B22kvLQNl6c, jiri-rHqAuBHg3fBzbRFIqnYvSA,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	VenkatKumar.Duvvuru-iH1Dq9VlAzfQT0dZR+AlfA, _govind-KK0ffGbhmjU
In-Reply-To: <54ABE991.3040107-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

From: Amir Vadai <amirv-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Date: Tue, 6 Jan 2015 15:56:33 +0200

> Mellanox is about to release next month a driver for a new NIC, with 3
> new speeds * few link modes for each + new link modes for 10G.
> It seems that we will need to consume almost all the new bits.

This tells me that the approach to this problem needs to be rethought.

Maybe we just need to bite the bullet and make a new ETHTOOL_GSET_2
and ETHTOOL_SSET_2 or whatever you want to name them.

Then we can define a completely new structure, with 64-bit bitmaps
for link modes or whatever.  The ethtool_op callbacks work using
this structure, and only the net/core/ethtool.c code knows about
the older structure and translates to/from for ETHTOOL_{GSET,SSET}.

^ permalink raw reply

* Re: [PATCH 1/1] update ip-sysctl.txt documentation
From: David Miller @ 2015-01-06 22:31 UTC (permalink / raw)
  To: ani; +Cc: netdev
In-Reply-To: <CAOxq_8OGx9VgSaEimAbNZSWjihNqNBXoVg0m8EPRaNX5jLXZiw@mail.gmail.com>

From: Ani Sinha <ani@arista.com>
Date: Tue, 6 Jan 2015 14:02:29 -0800

> +netdev

Please just make a fresh list posting, otherwise I have to spend
a lot of time editing your email to get the commit message and
other parts to come out right.

Thanks.

^ permalink raw reply

* [PATCH 1/1] update ip-sysctl.txt documentation
From: Ani Sinha @ 2015-01-06 22:33 UTC (permalink / raw)
  To: corbet, davem, edumazet, linux-doc, linux-kernel, ani, P, netdev,
	fruggeri

Update documentation to reflect the fact that
/proc/sys/net/ipv4/route/max_size is no longer used for
ipv4.

Signed-off-by: Ani Sinha <ani@arista.com>
---
 Documentation/networking/ip-sysctl.txt |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 9bffdfc..c8a7e37 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -64,8 +64,10 @@ fwmark_reflect - BOOLEAN
 	Default: 0
 
 route/max_size - INTEGER
-	Maximum number of routes allowed in the kernel.  Increase
-	this when using large numbers of interfaces and/or routes.
+        Post linux kernel 3.6, this is depricated for ipv4 as route cache is no
+        longer used. For ipv6, this is used to limit the maximum number of ipv6
+        routes allowed in the kernel.  Increase this when using large numbers of
+        interfaces and/or routes.
 
 neigh/default/gc_thresh1 - INTEGER
 	Minimum number of entries to keep.  Garbage collector will not
-- 
1.7.4.4


^ permalink raw reply related

* Re: [PATCH 1/1] update ip-sysctl.txt documentation
From: Ani Sinha @ 2015-01-06 22:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <20150106.173110.2230465152118236848.davem@davemloft.net>

On Tue, Jan 6, 2015 at 2:31 PM, David Miller <davem@davemloft.net> wrote:
> From: Ani Sinha <ani@arista.com>
> Date: Tue, 6 Jan 2015 14:02:29 -0800
>
>> +netdev
>
> Please just make a fresh list posting, otherwise I have to spend
> a lot of time editing your email to get the commit message and
> other parts to come out right.
>

Sorry, I missed netdev. I resent the patch just now.

^ permalink raw reply

* Re: [PATCH 1/1] update ip-sysctl.txt documentation
From: David Miller @ 2015-01-06 22:40 UTC (permalink / raw)
  To: ani; +Cc: corbet, edumazet, linux-doc, linux-kernel, P, netdev, fruggeri
In-Reply-To: <1420583625-32414-1-git-send-email-ani@arista.com>

From: Ani Sinha <ani@arista.com>
Date: Tue,  6 Jan 2015 14:33:45 -0800

> @@ -64,8 +64,10 @@ fwmark_reflect - BOOLEAN
>  	Default: 0
>  
>  route/max_size - INTEGER
> -	Maximum number of routes allowed in the kernel.  Increase
> -	this when using large numbers of interfaces and/or routes.
> +        Post linux kernel 3.6, this is depricated for ipv4 as route cache is no
> +        longer used. For ipv6, this is used to limit the maximum number of ipv6
> +        routes allowed in the kernel.  Increase this when using large numbers of
> +        interfaces and/or routes.

Please do not change the TABs into sequenes of space characters.

^ permalink raw reply

* [PATCH V3] net: eth: xgene: change APM X-Gene SoC platform ethernet to support ACPI
From: Feng Kan @ 2015-01-06 22:41 UTC (permalink / raw)
  To: patches, davem, netdev, linux-kernel; +Cc: Feng Kan

This adds support for APM X-Gene ethernet driver to use ACPI table to derive
ethernet driver parameter.

Signed-off-by: Feng Kan <fkan@apm.com>
---
V3:
   - Fix compile error caught by allmodconfig
V2:
   - remove NO_MAC define
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c   |   94 ++++++++++++++++------
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c |   97 +++++++++++++++++-----
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h |    3 +
 3 files changed, 150 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
index 7ba83ff..869d97f 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
@@ -593,10 +593,12 @@ static int xgene_enet_reset(struct xgene_enet_pdata *pdata)
 	if (!xgene_ring_mgr_init(pdata))
 		return -ENODEV;
 
-	clk_prepare_enable(pdata->clk);
-	clk_disable_unprepare(pdata->clk);
-	clk_prepare_enable(pdata->clk);
-	xgene_enet_ecc_init(pdata);
+	if (!efi_enabled(EFI_BOOT)) {
+		clk_prepare_enable(pdata->clk);
+		clk_disable_unprepare(pdata->clk);
+		clk_prepare_enable(pdata->clk);
+		xgene_enet_ecc_init(pdata);
+	}
 	xgene_enet_config_ring_if_assoc(pdata);
 
 	/* Enable auto-incr for scanning */
@@ -663,15 +665,20 @@ static int xgene_enet_phy_connect(struct net_device *ndev)
 	struct phy_device *phy_dev;
 	struct device *dev = &pdata->pdev->dev;
 
-	phy_np = of_parse_phandle(dev->of_node, "phy-handle", 0);
-	if (!phy_np) {
-		netdev_dbg(ndev, "No phy-handle found\n");
-		return -ENODEV;
+	if (dev->of_node) {
+		phy_np = of_parse_phandle(dev->of_node, "phy-handle", 0);
+		if (!phy_np) {
+			netdev_dbg(ndev, "No phy-handle found in DT\n");
+			return -ENODEV;
+		}
+		pdata->phy_dev = of_phy_find_device(phy_np);
 	}
 
-	phy_dev = of_phy_connect(ndev, phy_np, &xgene_enet_adjust_link,
-				 0, pdata->phy_mode);
-	if (!phy_dev) {
+	phy_dev = pdata->phy_dev;
+
+	if (!phy_dev ||
+	    phy_connect_direct(ndev, phy_dev, &xgene_enet_adjust_link,
+			       pdata->phy_mode)) {
 		netdev_err(ndev, "Could not connect to PHY\n");
 		return  -ENODEV;
 	}
@@ -681,32 +688,71 @@ static int xgene_enet_phy_connect(struct net_device *ndev)
 			      ~SUPPORTED_100baseT_Half &
 			      ~SUPPORTED_1000baseT_Half;
 	phy_dev->advertising = phy_dev->supported;
-	pdata->phy_dev = phy_dev;
 
 	return 0;
 }
 
-int xgene_enet_mdio_config(struct xgene_enet_pdata *pdata)
+static int xgene_mdiobus_register(struct xgene_enet_pdata *pdata,
+				  struct mii_bus *mdio)
 {
-	struct net_device *ndev = pdata->ndev;
 	struct device *dev = &pdata->pdev->dev;
+	struct net_device *ndev = pdata->ndev;
+	struct phy_device *phy;
 	struct device_node *child_np;
 	struct device_node *mdio_np = NULL;
-	struct mii_bus *mdio_bus;
 	int ret;
+	u32 phy_id;
+
+	if (dev->of_node) {
+		for_each_child_of_node(dev->of_node, child_np) {
+			if (of_device_is_compatible(child_np,
+						    "apm,xgene-mdio")) {
+				mdio_np = child_np;
+				break;
+			}
+		}
 
-	for_each_child_of_node(dev->of_node, child_np) {
-		if (of_device_is_compatible(child_np, "apm,xgene-mdio")) {
-			mdio_np = child_np;
-			break;
+		if (!mdio_np) {
+			netdev_dbg(ndev, "No mdio node in the dts\n");
+			return -ENXIO;
 		}
-	}
 
-	if (!mdio_np) {
-		netdev_dbg(ndev, "No mdio node in the dts\n");
-		return -ENXIO;
+		return of_mdiobus_register(mdio, mdio_np);
 	}
 
+	/* Mask out all PHYs from auto probing. */
+	mdio->phy_mask = ~0;
+
+	/* Register the MDIO bus */
+	ret = mdiobus_register(mdio);
+	if (ret)
+		return ret;
+
+	ret = device_property_read_u32(dev, "phy-channel", &phy_id);
+	if (ret)
+		ret = device_property_read_u32(dev, "phy-addr", &phy_id);
+	if (ret)
+		return -EINVAL;
+
+	phy = get_phy_device(mdio, phy_id, true);
+	if (!phy || IS_ERR(phy))
+		return -EIO;
+
+	ret = phy_device_register(phy);
+	if (ret)
+		phy_device_free(phy);
+	else
+		pdata->phy_dev = phy;
+
+	return ret;
+}
+
+int xgene_enet_mdio_config(struct xgene_enet_pdata *pdata)
+{
+	struct net_device *ndev = pdata->ndev;
+	struct mii_bus *mdio_bus;
+	int ret;
+
 	mdio_bus = mdiobus_alloc();
 	if (!mdio_bus)
 		return -ENOMEM;
@@ -720,7 +766,7 @@ int xgene_enet_mdio_config(struct xgene_enet_pdata *pdata)
 	mdio_bus->priv = pdata;
 	mdio_bus->parent = &ndev->dev;
 
-	ret = of_mdiobus_register(mdio_bus, mdio_np);
+	ret = xgene_mdiobus_register(pdata, mdio_bus);
 	if (ret) {
 		netdev_err(ndev, "Failed to register MDIO bus\n");
 		mdiobus_free(mdio_bus);
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 83a5028..1e56bf3 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -24,6 +24,10 @@
 #include "xgene_enet_sgmac.h"
 #include "xgene_enet_xgmac.h"
 
+#define RES_ENET_CSR	0
+#define RES_RING_CSR	1
+#define RES_RING_CMD	2
+
 static void xgene_enet_init_bufpool(struct xgene_enet_desc_ring *buf_pool)
 {
 	struct xgene_enet_raw_desc16 *raw_desc;
@@ -746,6 +750,41 @@ static const struct net_device_ops xgene_ndev_ops = {
 	.ndo_set_mac_address = xgene_enet_set_mac_address,
 };
 
+static int xgene_get_mac_address(struct device *dev,
+				 unsigned char *addr)
+{
+	int ret;
+
+	ret = device_property_read_u8_array(dev, "local-mac-address", addr, 6);
+	if (ret)
+		ret = device_property_read_u8_array(dev, "mac-address",
+						    addr, 6);
+	if (ret)
+		return -ENODEV;
+
+	return ETH_ALEN;
+}
+
+static int xgene_get_phy_mode(struct device *dev)
+{
+	int i, ret;
+	char *modestr;
+
+	ret = device_property_read_string(dev, "phy-connection-type",
+					  (const char **)&modestr);
+	if (ret)
+		ret = device_property_read_string(dev, "phy-mode",
+						  (const char **)&modestr);
+	if (ret)
+		return -ENODEV;
+
+	for (i = 0; i < PHY_INTERFACE_MODE_MAX; i++) {
+		if (!strcasecmp(modestr, phy_modes(i)))
+			return i;
+	}
+	return -ENODEV;
+}
+
 static int xgene_enet_get_resources(struct xgene_enet_pdata *pdata)
 {
 	struct platform_device *pdev;
@@ -753,29 +792,42 @@ static int xgene_enet_get_resources(struct xgene_enet_pdata *pdata)
 	struct device *dev;
 	struct resource *res;
 	void __iomem *base_addr;
-	const char *mac;
 	int ret;
 
 	pdev = pdata->pdev;
 	dev = &pdev->dev;
 	ndev = pdata->ndev;
 
-	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "enet_csr");
-	pdata->base_addr = devm_ioremap_resource(dev, res);
+	res = platform_get_resource(pdev, IORESOURCE_MEM, RES_ENET_CSR);
+	if (!res) {
+		dev_err(dev, "Resource enet_csr not defined\n");
+		return -ENODEV;
+	}
+	pdata->base_addr = devm_ioremap(dev, res->start, resource_size(res));
 	if (IS_ERR(pdata->base_addr)) {
 		dev_err(dev, "Unable to retrieve ENET Port CSR region\n");
 		return PTR_ERR(pdata->base_addr);
 	}
 
-	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "ring_csr");
-	pdata->ring_csr_addr = devm_ioremap_resource(dev, res);
+	res = platform_get_resource(pdev, IORESOURCE_MEM, RES_RING_CSR);
+	if (!res) {
+		dev_err(dev, "Resource ring_csr not defined\n");
+		return -ENODEV;
+	}
+	pdata->ring_csr_addr = devm_ioremap(dev, res->start,
+							resource_size(res));
 	if (IS_ERR(pdata->ring_csr_addr)) {
 		dev_err(dev, "Unable to retrieve ENET Ring CSR region\n");
 		return PTR_ERR(pdata->ring_csr_addr);
 	}
 
-	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "ring_cmd");
-	pdata->ring_cmd_addr = devm_ioremap_resource(dev, res);
+	res = platform_get_resource(pdev, IORESOURCE_MEM, RES_RING_CMD);
+	if (!res) {
+		dev_err(dev, "Resource ring_cmd not defined\n");
+		return -ENODEV;
+	}
+	pdata->ring_cmd_addr = devm_ioremap(dev, res->start,
+							resource_size(res));
 	if (IS_ERR(pdata->ring_cmd_addr)) {
 		dev_err(dev, "Unable to retrieve ENET Ring command region\n");
 		return PTR_ERR(pdata->ring_cmd_addr);
@@ -789,14 +841,12 @@ static int xgene_enet_get_resources(struct xgene_enet_pdata *pdata)
 	}
 	pdata->rx_irq = ret;
 
-	mac = of_get_mac_address(dev->of_node);
-	if (mac)
-		memcpy(ndev->dev_addr, mac, ndev->addr_len);
-	else
+	if (xgene_get_mac_address(dev, ndev->dev_addr) != ETH_ALEN)
 		eth_hw_addr_random(ndev);
+
 	memcpy(ndev->perm_addr, ndev->dev_addr, ndev->addr_len);
 
-	pdata->phy_mode = of_get_phy_mode(pdev->dev.of_node);
+	pdata->phy_mode = xgene_get_phy_mode(dev);
 	if (pdata->phy_mode < 0) {
 		dev_err(dev, "Unable to get phy-connection-type\n");
 		return pdata->phy_mode;
@@ -809,11 +859,9 @@ static int xgene_enet_get_resources(struct xgene_enet_pdata *pdata)
 	}
 
 	pdata->clk = devm_clk_get(&pdev->dev, NULL);
-	ret = IS_ERR(pdata->clk);
 	if (IS_ERR(pdata->clk)) {
-		dev_err(&pdev->dev, "can't get clock\n");
-		ret = PTR_ERR(pdata->clk);
-		return ret;
+		/* Firmware may have set up the clock already. */
+		pdata->clk = NULL;
 	}
 
 	base_addr = pdata->base_addr;
@@ -924,7 +972,7 @@ static int xgene_enet_probe(struct platform_device *pdev)
 		goto err;
 	}
 
-	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
+	ret = dma_coerce_mask_and_coherent(dev, DMA_BIT_MASK(64));
 	if (ret) {
 		netdev_err(ndev, "No usable DMA configuration\n");
 		goto err;
@@ -972,17 +1020,26 @@ static int xgene_enet_remove(struct platform_device *pdev)
 	return 0;
 }
 
-static struct of_device_id xgene_enet_match[] = {
+#ifdef CONFIG_ACPI
+static const struct acpi_device_id xgene_enet_acpi_match[] = {
+	{ "APMC0D05", },
+	{ }
+};
+MODULE_DEVICE_TABLE(acpi, xgene_enet_acpi_match);
+#endif
+
+static struct of_device_id xgene_enet_of_match[] = {
 	{.compatible = "apm,xgene-enet",},
 	{},
 };
 
-MODULE_DEVICE_TABLE(of, xgene_enet_match);
+MODULE_DEVICE_TABLE(of, xgene_enet_of_match);
 
 static struct platform_driver xgene_enet_driver = {
 	.driver = {
 		   .name = "xgene-enet",
-		   .of_match_table = xgene_enet_match,
+		   .of_match_table = of_match_ptr(xgene_enet_of_match),
+		   .acpi_match_table = ACPI_PTR(xgene_enet_acpi_match),
 	},
 	.probe = xgene_enet_probe,
 	.remove = xgene_enet_remove,
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
index f9958fa..c2d465c 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
@@ -22,7 +22,10 @@
 #ifndef __XGENE_ENET_MAIN_H__
 #define __XGENE_ENET_MAIN_H__
 
+#include <linux/acpi.h>
 #include <linux/clk.h>
+#include <linux/efi.h>
+#include <linux/io.h>
 #include <linux/of_platform.h>
 #include <linux/of_net.h>
 #include <linux/of_mdio.h>
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH] qla3xxx: don't allow never end busy loop
From: David Miller @ 2015-01-06 22:41 UTC (permalink / raw)
  To: andy.shevchenko; +Cc: netdev, linux-driver
In-Reply-To: <1420579073-24637-1-git-send-email-andy.shevchenko@gmail.com>

From: Andy Shevchenko <andy.shevchenko@gmail.com>
Date: Tue,  6 Jan 2015 23:17:53 +0200

> The counter variable wasn't increased at all which may stuck under
> certain circumstances.
> 
> Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net] ipv6: Prevent ipv6_find_hdr() from returning ENOENT for valid non-first fragments
From: Pablo Neira Ayuso @ 2015-01-06 22:47 UTC (permalink / raw)
  To: Rahul Sharma; +Cc: netdev, linux-kernel, hannes, netfilter-devel
In-Reply-To: <CAFB3abzYNYqtzd_n+Ym0Lo=DYbV0HPU=Tqw=stALNnu188hMKQ@mail.gmail.com>

On Wed, Jan 07, 2015 at 03:03:20AM +0530, Rahul Sharma wrote:
> ipv6_find_hdr() currently assumes that the next-header field in the
> fragment header of the non-first fragment is the "protocol number of
> the last header" (here last header excludes any extension header
> protocol numbers ) which is incorrect as per RFC2460. The next-header
> value is the first header of the fragmentable part of the original
> packet (which can be extension header as well).
> This can create reassembly problems. For example: Fragmented
> authenticated OSPFv3 packets (where AH header is inserted before the
> protocol header). For the second fragment, the next header value in
> the fragment header will be NEXTHDR_AUTH which is correct but
> ipv6_find_hdr will return ENOENT since AH is an extension header
> resulting in second fragment getting dropped. This check for the
> presence of non-extension header needs to be removed.
> 
> Signed-off-by: Rahul Sharma <rsharma@arista.com>
> ---
> --- linux-3.18.1/net/ipv6/exthdrs_core.c.orig   2015-01-06
> 10:25:36.411419863 -0800
> +++ linux-3.18.1/net/ipv6/exthdrs_core.c        2015-01-06
> 10:51:45.819364986 -0800
> @@ -171,10 +171,11 @@ EXPORT_SYMBOL_GPL(ipv6_find_tlv);
>   * If the first fragment doesn't contain the final protocol header or
>   * NEXTHDR_NONE it is considered invalid.
>   *
> - * Note that non-1st fragment is special case that "the protocol number
> - * of last header" is "next header" field in Fragment header. In this case,
> - * *offset is meaningless and fragment offset is stored in *fragoff if fragoff
> - * isn't NULL.
> + * Note that non-1st fragment is special case that "the protocol number of the
> + * first header of the fragmentable part of the original packet" is
> + * "next header" field in the Fragment header. In this case, *offset is
> + * meaningless and fragment offset is stored in *fragoff if fragoff isn't
> + * NULL.
>   *
>   * if flags is not NULL and it's a fragment, then the frag flag
>   * IP6_FH_F_FRAG will be set. If it's an AH header, the
> @@ -250,9 +251,7 @@ int ipv6_find_hdr(const struct sk_buff *
> 
>                         _frag_off = ntohs(*fp) & ~0x7;
>                         if (_frag_off) {
> -                               if (target < 0 &&
> -                                   ((!ipv6_ext_hdr(hp->nexthdr)) ||

This check assumes that the following headers cannot show up in the
fragmented part of the IPv6 packet:

 12 bool ipv6_ext_hdr(u8 nexthdr)
 13 {
 14         /*
 15          * find out if nexthdr is an extension header or a protocol
 16          */
 17         return   (nexthdr == NEXTHDR_HOP)       ||
 18                  (nexthdr == NEXTHDR_ROUTING)   ||
 19                  (nexthdr == NEXTHDR_FRAGMENT)  ||
 20                  (nexthdr == NEXTHDR_AUTH)      ||
 21                  (nexthdr == NEXTHDR_NONE)      ||
 22                  (nexthdr == NEXTHDR_DEST);

> -                                    hp->nexthdr == NEXTHDR_NONE)) {
> +                               if (target < 0) {
>                                         if (fragoff)
>                                                 *fragoff = _frag_off;
>                                         return hp->nexthdr;
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC] netlink: get rid of nl_table_lock
From: Thomas Graf @ 2015-01-06 23:00 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: davem, netdev, linux-kernel, herbert, paulmck, edumazet,
	john.r.fastabend, josh
In-Reply-To: <20150103110211.18b11f0f@urahara>

On 01/03/15 at 11:02am, Stephen Hemminger wrote:
> As a follow on to Thomas's patch I think this would complete the
> transistion to RCU for netlink.
> Compile tested only.
> 
> 
> 
> This patch gets rid of the reader/writer nl_table_lock and replaces it
> with exclusively using RCU for reading, and a mutex for writing.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

I like it. One thing I noticed it that it leaves a mix of
native mutex unlocks and unlocks via netlink_table_ungrab().

The Open vSwitch upcall is a pretty good real world Netlink
benchmark. I'll run the tests I have to see if this has any
unexpected side effects.

>  void netlink_table_grab(void)
> -	__acquires(nl_table_lock)
>  {
> -	might_sleep();
> -
> -	write_lock_irq(&nl_table_lock);
> -
> -	if (atomic_read(&nl_table_users)) {
> -		DECLARE_WAITQUEUE(wait, current);
> -
> -		add_wait_queue_exclusive(&nl_table_wait, &wait);
> -		for (;;) {
> -			set_current_state(TASK_UNINTERRUPTIBLE);
> -			if (atomic_read(&nl_table_users) == 0)
> -				break;
> -			write_unlock_irq(&nl_table_lock);
> -			schedule();
> -			write_lock_irq(&nl_table_lock);
> -		}
> -
> -		__set_current_state(TASK_RUNNING);
> -		remove_wait_queue(&nl_table_wait, &wait);
> -	}
> +	mutex_lock(&nl_table_mutex);
>  }

I left this untouched so far as I wasn't clear on what side effect
it would have to remove this as it does explicitely relax writers
right now.

^ permalink raw reply

* Re: [PATCH net-next v2 0/8] net: extend ethtool link mode bitmaps to 48 bits
From: Ben Hutchings @ 2015-01-06 23:08 UTC (permalink / raw)
  To: David Miller
  Cc: amirv, ddecotig, f.fainelli, netdev, linux-kernel, linux-api,
	saeedm, decot, jasowang, mst, herbert, viro, yamato, xii, nhorman,
	xiyou.wangcong, fbl, teg, jiri, vyasevic, ebiederm,
	VenkatKumar.Duvvuru, _govind
In-Reply-To: <20150106.172918.70204012105519766.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 945 bytes --]

On Tue, 2015-01-06 at 17:29 -0500, David Miller wrote:
> From: Amir Vadai <amirv@mellanox.com>
> Date: Tue, 6 Jan 2015 15:56:33 +0200
> 
> > Mellanox is about to release next month a driver for a new NIC, with 3
> > new speeds * few link modes for each + new link modes for 10G.
> > It seems that we will need to consume almost all the new bits.
> 
> This tells me that the approach to this problem needs to be rethought.
> 
> Maybe we just need to bite the bullet and make a new ETHTOOL_GSET_2
> and ETHTOOL_SSET_2 or whatever you want to name them.
> 
> Then we can define a completely new structure, with 64-bit bitmaps
> for link modes or whatever.  The ethtool_op callbacks work using
> this structure, and only the net/core/ethtool.c code knows about
> the older structure and translates to/from for ETHTOOL_{GSET,SSET}.

Agreed.

Ben.

-- 
Ben Hutchings
This sentence contradicts itself - no actually it doesn't.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply

* Re: [PATCH V3] net: eth: xgene: change APM X-Gene SoC platform ethernet to support ACPI
From: David Miller @ 2015-01-06 23:23 UTC (permalink / raw)
  To: fkan; +Cc: patches, netdev, linux-kernel
In-Reply-To: <1420584093-19236-1-git-send-email-fkan@apm.com>

From: Feng Kan <fkan@apm.com>
Date: Tue,  6 Jan 2015 15:41:33 -0700

> This adds support for APM X-Gene ethernet driver to use ACPI table to derive
> ethernet driver parameter.
> 
> Signed-off-by: Feng Kan <fkan@apm.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] openvswitch: Do not use private netdev_vport fields
From: Daniele Di Proietto @ 2015-01-06 23:32 UTC (permalink / raw)
  To: Pravin Shelar, David Miller; +Cc: netdev
In-Reply-To: <CALnjE+ormbwkh-=gjPA4iNP+n_dWP6JQxER1Rz6X-c1h8_Y6mw@mail.gmail.com>

The motivation for the change is to make datapath.c independent from
the actual implementation of the vport. The problem came up when
experimenting with other vport implementations and this type of change
will help identifying layering violations.
You are perfectly right, however, that in this form the code does not
improve much: we should rather provide a vport_index() call, and
implement one in each of the vports.

Thoughts?

2015-01-06 23:28 GMT+01:00 Pravin Shelar <pshelar@nicira.com>:
> On Tue, Jan 6, 2015 at 2:15 PM, Pravin Shelar <pshelar@nicira.com> wrote:
>> On Tue, Jan 6, 2015 at 2:02 PM, David Miller <davem@davemloft.net> wrote:
>>> From: Pravin Shelar <pshelar@nicira.com>
>>> Date: Tue, 6 Jan 2015 13:16:11 -0800
>>>
>>>> Function return type and function name should be on same line,
>>>> otherwise looks good.
>>>
>>> I disagree, where is the code in the tree that needs this?
>>
>> Most of function definitions that I have seen are defined like this. I
>> was pointing out coding style issue.
>
> About the actual change, I think it is a cleanup. netdev_vport_index()
> hides the implementation from datapath.c. I hope Daniele will explain
> need for the change.

^ permalink raw reply

* Re: [PATCH] brcm80211: brcmsmac: dma: Remove some unused functions
From: Rickard Strandqvist @ 2015-01-06 23:33 UTC (permalink / raw)
  To: Arend van Spriel
  Cc: Kalle Valo, Larry Finger, Brett Rudley, Hante Meuleman,
	Fabian Frederick, linux-wireless@vger.kernel.org,
	brcm80211-dev-list, Network Development,
	Linux Kernel Mailing List, Julia Lawall
In-Reply-To: <54AA7042.50207@broadcom.com>

2015-01-05 12:06 GMT+01:00 Arend van Spriel <arend@broadcom.com>:
> On 01/05/15 11:49, Kalle Valo wrote:
>>
>> Rickard Strandqvist<rickard_strandqvist@spectrumdigital.se>  writes:
>>
>>> As I hope you can see I have made some changes regarding the
>>> subject-line. Thought it was an advantage to be able to see which file
>>> I actually removed something from. There seems to be a big focus on
>>> getting right on subject-line right in recent weeks.
>>>
>>> I wonder why there is a script that takes a file name, and respond
>>> with an appropriate subject line?
>
>
> Is there a script for this? Anyway, I would say driver name is enough.
> Enough about the subject line ;-) I would like to give some general remarks
> as you seem to touch a lot of kernel code. First off, I think it is good to
> remove unused stuff. However, I would like some more explanation on your
> methodology apart from "partially found by using a static code analysis
> program". So a cover-letter explaining that would have been nice (maybe
> still is). Things like Kconfig option can affect whether function are used
> or not so how did you cover that.
>
> Regards,
> Arend
>
>
>> I don't think you can really automate this as some drivers do this a bit
>> differently. You always need to manually check the commit log.
>>
>>> But ok, I change my script accordingly. Should I submit the patch again?
>>
>>
>> Yes, please resubmit.
>>
>

Hi Arend

Yes, a script that had been excellent, I think!
I have one as part of my git send-email script, until a week ago, it
was enough that I removed the "drivers/" and changed all "/" to ": "
I have now been expanded my sed pipe a lot (tell me if anyone is interested)
But now I've seen everything from uppercase and [DIR], etc.
So I can not understand how anyone should be able to get the right
name without a good help.

Sure i like to share how I use cppcheck, but is very hesitant to write
this with each patch mails I send though!

I run:
cppcheck --force --quiet --enable=all .

Or a specific file instead of .

This will include, among other things get a lot of error message such,
+4000 for the kernel.
(style) The function 'xxx' is never used

For these I made a script that searched through all the files after
the function name (cppcheck missed a few). And save the rest so I go
through them and possibly send patches.


Kind regards
Rickard Strandqvist

^ permalink raw reply

* [PATCH 1/1] update ip-sysctl.txt documentation
From: Ani Sinha @ 2015-01-06 23:36 UTC (permalink / raw)
  To: corbet, davem, edumazet, linux-doc, linux-kernel, ani, P, netdev,
	fruggeri

Update documentation to reflect the fact that
/proc/sys/net/ipv4/route/max_size is no longer used for ipv4.

Signed-off-by: Ani Sinha <ani@arista.com>
---
 Documentation/networking/ip-sysctl.txt |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 9bffdfc..2a28261 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -64,8 +64,10 @@ fwmark_reflect - BOOLEAN
 	Default: 0
 
 route/max_size - INTEGER
-	Maximum number of routes allowed in the kernel.  Increase
-	this when using large numbers of interfaces and/or routes.
+	Post linux kernel 3.6, this is deprecated for ipv4 as route cache is no
+	longer used. For ipv6, this is used to limit the maximum number of ipv6
+	routes allowed in the kernel.  Increase this when using large numbers of
+	interfaces and/or routes.
 
 neigh/default/gc_thresh1 - INTEGER
 	Minimum number of entries to keep.  Garbage collector will not
-- 
1.7.4.4


^ permalink raw reply related

* [Patch net-next] ipv6: fix redefinition of in6_pktinfo and ip6_mtuinfo
From: Cong Wang @ 2015-01-06 23:45 UTC (permalink / raw)
  To: netdev; +Cc: carlos, vlee, davem, Cong Wang

Both netinet/in.h and linux/ipv6.h define these two structs,
if we include both of them, we got:

	/usr/include/linux/ipv6.h:19:8: error: redefinition of ‘struct in6_pktinfo’
	 struct in6_pktinfo {
		^
	In file included from /usr/include/arpa/inet.h:22:0,
			 from txtimestamp.c:33:
	/usr/include/netinet/in.h:524:8: note: originally defined here
	 struct in6_pktinfo
		^
	In file included from txtimestamp.c:40:0:
	/usr/include/linux/ipv6.h:24:8: error: redefinition of ‘struct ip6_mtuinfo’
	 struct ip6_mtuinfo {
		^
	In file included from /usr/include/arpa/inet.h:22:0,
			 from txtimestamp.c:33:
	/usr/include/netinet/in.h:531:8: note: originally defined here
	 struct ip6_mtuinfo
		^
So similarly to what we did for in6_addr, we need to sync with
libc header on their definitions.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 include/uapi/linux/ipv6.h        | 5 ++++-
 include/uapi/linux/libc-compat.h | 6 ++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index e863d08..b9b1b7d 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -15,16 +15,19 @@
  *	*under construction*
  */
 
-
+#if __UAPI_DEF_IN6_PKTINFO
 struct in6_pktinfo {
 	struct in6_addr	ipi6_addr;
 	int		ipi6_ifindex;
 };
+#endif
 
+#if __UAPI_DEF_IP6_MTUINFO
 struct ip6_mtuinfo {
 	struct sockaddr_in6	ip6m_addr;
 	__u32			ip6m_mtu;
 };
+#endif
 
 struct in6_ifreq {
 	struct in6_addr	ifr6_addr;
diff --git a/include/uapi/linux/libc-compat.h b/include/uapi/linux/libc-compat.h
index e28807a..fa673e9 100644
--- a/include/uapi/linux/libc-compat.h
+++ b/include/uapi/linux/libc-compat.h
@@ -70,6 +70,8 @@
 #define __UAPI_DEF_IPV6_MREQ		0
 #define __UAPI_DEF_IPPROTO_V6		0
 #define __UAPI_DEF_IPV6_OPTIONS		0
+#define __UAPI_DEF_IN6_PKTINFO		0
+#define __UAPI_DEF_IP6_MTUINFO		0
 
 #else
 
@@ -84,6 +86,8 @@
 #define __UAPI_DEF_IPV6_MREQ		1
 #define __UAPI_DEF_IPPROTO_V6		1
 #define __UAPI_DEF_IPV6_OPTIONS		1
+#define __UAPI_DEF_IN6_PKTINFO		1
+#define __UAPI_DEF_IP6_MTUINFO		1
 
 #endif /* _NETINET_IN_H */
 
@@ -106,6 +110,8 @@
 #define __UAPI_DEF_IPV6_MREQ		1
 #define __UAPI_DEF_IPPROTO_V6		1
 #define __UAPI_DEF_IPV6_OPTIONS		1
+#define __UAPI_DEF_IN6_PKTINFO		1
+#define __UAPI_DEF_IP6_MTUINFO		1
 
 /* Definitions for xattr.h */
 #define __UAPI_DEF_XATTR		1
-- 
1.8.3.1

^ permalink raw reply related

* [Patch net-next] doc: fix the compile error of txtimestamp.c
From: Cong Wang @ 2015-01-06 23:45 UTC (permalink / raw)
  To: netdev; +Cc: carlos, vlee, davem, Cong Wang
In-Reply-To: <1420587932-8733-1-git-send-email-xiyou.wangcong@gmail.com>

Vinson reported:

  HOSTCC  Documentation/networking/timestamping/txtimestamp
Documentation/networking/timestamping/txtimestamp.c:64:8: error:
redefinition of ‘struct in6_pktinfo’
 struct in6_pktinfo {
        ^
In file included from /usr/include/arpa/inet.h:23:0,
                 from Documentation/networking/timestamping/txtimestamp.c:33:
/usr/include/netinet/in.h:456:8: note: originally defined here
 struct in6_pktinfo
        ^

After we sync with libc header, we don't need this ugly hack any more.

Reported-by: Vinson Lee <vlee@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 Documentation/networking/timestamping/txtimestamp.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/Documentation/networking/timestamping/txtimestamp.c b/Documentation/networking/timestamping/txtimestamp.c
index 876f71c..8778e68 100644
--- a/Documentation/networking/timestamping/txtimestamp.c
+++ b/Documentation/networking/timestamping/txtimestamp.c
@@ -59,14 +59,6 @@
 #include <time.h>
 #include <unistd.h>
 
-/* ugly hack to work around netinet/in.h and linux/ipv6.h conflicts */
-#ifndef in6_pktinfo
-struct in6_pktinfo {
-	struct in6_addr	ipi6_addr;
-	int		ipi6_ifindex;
-};
-#endif
-
 /* command line parameters */
 static int cfg_proto = SOCK_STREAM;
 static int cfg_ipproto = IPPROTO_TCP;
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH 1/1] update ip-sysctl.txt documentation
From: Stephen Hemminger @ 2015-01-06 23:48 UTC (permalink / raw)
  To: Ani Sinha; +Cc: netdev@vger.kernel.org
In-Reply-To: <CAOxq_8OGx9VgSaEimAbNZSWjihNqNBXoVg0m8EPRaNX5jLXZiw@mail.gmail.com>

On Tue, 6 Jan 2015 14:02:29 -0800
Ani Sinha <ani@arista.com> wrote:

>  route/max_size - INTEGER
> -       Maximum number of routes allowed in the kernel.  Increase
> -       this when using large numbers of interfaces and/or routes.
> +        Post linux kernel 3.6, this is depricated for ipv4 as route cache is no
> +        longer used. For ipv6, this is used to limit the maximum number of ipv6
> +        routes allowed in the kernel.  Increase this when using large
> numbers of
> +        interfaces and/or routes.

1. You used mailer with line wrap which broke the patch.

2. The spelling is not correct 'depricated'

^ permalink raw reply

* Re: [PATCH 1/1] update ip-sysctl.txt documentation
From: Ani Sinha @ 2015-01-06 23:50 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev@vger.kernel.org
In-Reply-To: <20150106154824.6f2c9a75@urahara>

On Tue, Jan 6, 2015 at 3:48 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Tue, 6 Jan 2015 14:02:29 -0800
> Ani Sinha <ani@arista.com> wrote:
>
>>  route/max_size - INTEGER
>> -       Maximum number of routes allowed in the kernel.  Increase
>> -       this when using large numbers of interfaces and/or routes.
>> +        Post linux kernel 3.6, this is depricated for ipv4 as route cache is no
>> +        longer used. For ipv6, this is used to limit the maximum number of ipv6
>> +        routes allowed in the kernel.  Increase this when using large
>> numbers of
>> +        interfaces and/or routes.
>
> 1. You used mailer with line wrap which broke the patch.

I used git send-email. Not sure what else to use.

>
> 2. The spelling is not correct 'depricated'

Fixed in the latest patch.

^ permalink raw reply

* [PATCH iproute2 -next] ip: route: add congestion control metric
From: Daniel Borkmann @ 2015-01-06 23:52 UTC (permalink / raw)
  To: stephen; +Cc: fw, netdev

This patch adds configuration and dumping of congestion control metric
for ip route, for example:

  ip route add <dst> dev foo congctl [lock] dctcp

Reference: http://thread.gmane.org/gmane.linux.network/344733
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 include/linux/rtnetlink.h |  2 ++
 ip/iproute.c              | 24 +++++++++++++++++++++---
 man/man8/ip-route.8.in    | 19 ++++++++++++++++++-
 3 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 9aa5c2f..ac4af97 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -389,6 +389,8 @@ enum {
 #define RTAX_INITRWND RTAX_INITRWND
 	RTAX_QUICKACK,
 #define RTAX_QUICKACK RTAX_QUICKACK
+	RTAX_CC_ALGO,
+#define RTAX_CC_ALGO RTAX_CC_ALGO
 	__RTAX_MAX
 };
 
diff --git a/ip/iproute.c b/ip/iproute.c
index 5a496a9..705d4b5 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -53,6 +53,7 @@ static const char *mx_names[RTAX_MAX+1] = {
 	[RTAX_RTO_MIN]	= "rto_min",
 	[RTAX_INITRWND]	= "initrwnd",
 	[RTAX_QUICKACK]	= "quickack",
+	[RTAX_CC_ALGO]	= "congctl",
 };
 static void usage(void) __attribute__((noreturn));
 
@@ -80,8 +81,7 @@ static void usage(void)
 	fprintf(stderr, "           [ window NUMBER] [ cwnd NUMBER ] [ initcwnd NUMBER ]\n");
 	fprintf(stderr, "           [ ssthresh NUMBER ] [ realms REALM ] [ src ADDRESS ]\n");
 	fprintf(stderr, "           [ rto_min TIME ] [ hoplimit NUMBER ] [ initrwnd NUMBER ]\n");
-	fprintf(stderr, "           [ features FEATURES ]\n");
-	fprintf(stderr, "           [ quickack BOOL ]\n");
+	fprintf(stderr, "           [ features FEATURES ] [ quickack BOOL ] [ congctl NAME ]\n");
 	fprintf(stderr, "TYPE := [ unicast | local | broadcast | multicast | throw |\n");
 	fprintf(stderr, "          unreachable | prohibit | blackhole | nat ]\n");
 	fprintf(stderr, "TABLE_ID := [ local | main | default | all | NUMBER ]\n");
@@ -545,10 +545,12 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 				fprintf(fp, " %s", mx_names[i]);
 			else
 				fprintf(fp, " metric %d", i);
+
 			if (mxlock & (1<<i))
 				fprintf(fp, " lock");
+			if (i != RTAX_CC_ALGO)
+				val = *(unsigned*)RTA_DATA(mxrta[i]);
 
-			val = *(unsigned*)RTA_DATA(mxrta[i]);
 			switch (i) {
 			case RTAX_FEATURES:
 				print_rtax_features(fp, val);
@@ -573,6 +575,10 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 					fprintf(fp, " %gs", val/1e3);
 				else
 					fprintf(fp, " %ums", val);
+				break;
+			case RTAX_CC_ALGO:
+				fprintf(fp, " %s", (char *)RTA_DATA(mxrta[i]));
+				break;
 			}
 		}
 	}
@@ -925,6 +931,18 @@ static int iproute_modify(int cmd, unsigned flags, int argc, char **argv)
 			if (quickack != 1 && quickack != 0)
 				invarg("\"quickack\" value should be 0 or 1\n", *argv);
 			rta_addattr32(mxrta, sizeof(mxbuf), RTAX_QUICKACK, quickack);
+		} else if (matches(*argv, "congctl") == 0) {
+			char cc[16];
+			NEXT_ARG();
+			memset(cc, 0, sizeof(cc));
+			if (strcmp(*argv, "lock") == 0) {
+				mxlock |= (1<<RTAX_CC_ALGO);
+				NEXT_ARG();
+			}
+			strncpy(cc, *argv, sizeof(cc) - 1);
+			if (strlen(cc) == 0)
+				invarg("\"conctl\" value must be an algorithm name\n", *argv);
+			rta_addattr_l(mxrta, sizeof(mxbuf), RTAX_CC_ALGO, cc, strlen(cc));
 		} else if (matches(*argv, "rttvar") == 0) {
 			unsigned win;
 			NEXT_ARG();
diff --git a/man/man8/ip-route.8.in b/man/man8/ip-route.8.in
index 89960c1..9d32e2d 100644
--- a/man/man8/ip-route.8.in
+++ b/man/man8/ip-route.8.in
@@ -116,7 +116,9 @@ replace " } "
 .B  features
 .IR FEATURES " ] [ "
 .B  quickack
-.IR BOOL " ]"
+.IR BOOL " ] [ "
+.B  congctl
+.IR NAME " ]"
 
 .ti -8
 .IR TYPE " := [ "
@@ -433,6 +435,21 @@ sysctl is set to 0.
 Enable or disable quick ack for connections to this destination.
 
 .TP
+.BI congctl " NAME " "(3.20+ only)"
+.TP
+.BI "congctl lock" " NAME " "(3.20+ only)"
+Sets a specific TCP congestion control algorithm only for a given destination.
+If not specified, Linux keeps the current global default TCP congestion control
+algorithm, or the one set from the application. If the modifier
+.B lock
+is not used, an application may nevertheless overwrite the suggested congestion
+control algorithm for that destination. If the modifier
+.B lock
+is used, then an application is not allowed to overwrite the specified congestion
+control algorithm for that destination, thus it will be enforced/guaranteed to
+use the proposed algorithm.
+
+.TP
 .BI advmss " NUMBER " "(2.3.15+ only)"
 the MSS ('Maximal Segment Size') to advertise to these
 destinations when establishing TCP connections.  If it is not given,
-- 
1.9.0

^ permalink raw reply related

* Re: [PATCH v8 34/50] vhost/net: virtio 1.0 byte swap
From: Alex Williamson @ 2015-01-06 23:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, David Miller, cornelia.huck, rusty, nab, pbonzini,
	thuth, dahi, kvm, virtualization, netdev
In-Reply-To: <1417449619-24896-35-git-send-email-mst@redhat.com>

On Mon, 2014-12-01 at 18:05 +0200, Michael S. Tsirkin wrote:
> I had to add an explicit tag to suppress compiler warning:
> gcc isn't smart enough to notice that
> len is always initialized since function is called with size > 0.

I'm getting a panic inside a guest when this change is applied on the
host.  I identified this patch via bisect and confirmed by reverting it
from v3.19-rc2.  Guest is centos6.  Thanks,

Alex

commit 8b38694a2dc8b18374310df50174f1e4376d6824
Author: Michael S. Tsirkin <mst@redhat.com>
Date:   Fri Oct 24 14:19:48 2014 +0300

    vhost/net: virtio 1.0 byte swap
    
    I had to add an explicit tag to suppress compiler warning:
    gcc isn't smart enough to notice that
    len is always initialized since function is called with size > 0.
    
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

XML chunk:

    <interface type='direct'>
      <mac address='52:54:00:64:f3:34'/>
      <source dev='iscsinet0' mode='bridge'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>

Panic log:

<1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
<1>IP: [<ffffffffa0079469>] virtnet_poll+0x4f9/0x910 [virtio_net]
<4>PGD 1aa2f4067 PUD 1aa2f5067 PMD 0 
<4>Oops: 0000 [#1] SMP 
<4>last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/virtio0/net/eth9/ifindex
<4>CPU 0 
<4>Modules linked in: 8021q garp stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput microcode snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc igbvf nvidia(P)(U) i2c_core tg3 ptp pps_core virtio_balloon virtio_net virtio_console ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
<4>
<4>Pid: 1374, comm: NetworkManager Tainted: P           ---------------    2.6.32-431.23.3.el6.centos.plus.x86_64 #1 QEMU Standard PC (i440FX + PIIX, 1996)
<4>RIP: 0010:[<ffffffffa0079469>]  [<ffffffffa0079469>] virtnet_poll+0x4f9/0x910 [virtio_net]
<4>RSP: 0018:ffff880028203e48  EFLAGS: 00010246
<4>RAX: ffff8801a3383d00 RBX: ffff8801a6aaf480 RCX: ffff8801aa20b6e0
<4>RDX: 00000000000000c0 RSI: ffff8801a3383c00 RDI: ffff8801a3383cc0
<4>RBP: ffff880028203ed8 R08: 000000000000009e R09: ffff8801aa1d800c
<4>R10: 0000000000000218 R11: 0000000000000000 R12: ffff8801aa20b6e0
<4>R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
<4>FS:  00007febf114d800(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>CR2: 0000000000000010 CR3: 00000001aa793000 CR4: 00000000000006f0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process NetworkManager (pid: 1374, threadinfo ffff8801a74ba000, task ffff8801a8d56040)
<4>Stack:
<4> ffff8801aa1d8000 000000000000009e ffff8801aa20b6e0 ffff8801aa20b718
<4><d> ffff8801aa20b780 ffff8801aa1d800c ffff8801a6aaf4b8 ffff8801aa20b020
<4><d> 0000000000000080 ffff8801aa20b708 0000000000000001 00001f5981a830c8
<4>Call Trace:
<4> <IRQ> 
<4> [<ffffffff8146ae33>] net_rx_action+0x103/0x2f0
<4> [<ffffffff8107a5f1>] __do_softirq+0xc1/0x1e0
<4> [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
<4> [<ffffffff8100c30c>] call_softirq+0x1c/0x30
<4> <EOI> 
<4> [<ffffffff8100fa75>] ? do_softirq+0x65/0xa0
<4> [<ffffffff8107b2ea>] local_bh_enable+0x9a/0xb0
<4> [<ffffffffa007813a>] virtnet_napi_enable+0x4a/0x60 [virtio_net]
<4> [<ffffffffa0078ebf>] virtnet_open+0x4f/0x60 [virtio_net]
<4> [<ffffffff81467691>] dev_open+0xa1/0x100
<4> [<ffffffff81466751>] dev_change_flags+0xa1/0x1d0
<4> [<ffffffff81474a59>] do_setlink+0x169/0x8b0
<4> [<ffffffff814770b6>] ? rtnl_fill_ifinfo+0x946/0xcb0
<4> [<ffffffff812a3d24>] ? nla_parse+0x34/0x110
<4> [<ffffffff8147659e>] rtnl_setlink+0xee/0x130
<4> [<ffffffff81475b67>] rtnetlink_rcv_msg+0x2d7/0x340
<4> [<ffffffff81231e14>] ? socket_has_perm+0x74/0x90
<4> [<ffffffff81475890>] ? rtnetlink_rcv_msg+0x0/0x340
<4> [<ffffffff814910a9>] netlink_rcv_skb+0xa9/0xd0
<4> [<ffffffff81475875>] rtnetlink_rcv+0x25/0x40
<4> [<ffffffff81490cdb>] netlink_unicast+0x2db/0x320
<4> [<ffffffff81491750>] netlink_sendmsg+0x2c0/0x3d0
<4> [<ffffffff814520c3>] sock_sendmsg+0x123/0x150
<4> [<ffffffff81453d73>] ? sock_recvmsg+0x133/0x160
<4> [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
<4> [<ffffffff81136941>] ? lru_cache_add_lru+0x21/0x40
<4> [<ffffffff8115522d>] ? page_add_new_anon_rmap+0x9d/0xf0
<4> [<ffffffff8114aeef>] ? handle_pte_fault+0x4af/0xb00
<4> [<ffffffff81451f14>] ? move_addr_to_kernel+0x64/0x70
<4> [<ffffffff814538b6>] __sys_sendmsg+0x406/0x420
<4> [<ffffffff8104a98c>] ? __do_page_fault+0x1ec/0x480
<4> [<ffffffff814523d9>] ? sys_sendto+0x139/0x190
<4> [<ffffffff8103ea6c>] ? kvm_clock_read+0x1c/0x20
<4> [<ffffffff81453ad9>] sys_sendmsg+0x49/0x90
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: 83 e0 00 00 00 00 10 00 00 48 03 93 d0 00 00 00 66 83 42 04 01 8b 93 cc 00 00 00 48 8b b3 d0 00 00 00 80 4c 16 10 20 44 2b 68 0c <4d> 8b 76 10 75 89 e9 d1 fd ff ff 0f 1f 40 00 a8 02 74 0d 0f b6 
<1>RIP  [<ffffffffa0079469>] virtnet_poll+0x4f9/0x910 [virtio_net]
<4> RSP <ffff880028203e48>
<4>CR2: 0000000000000010

^ permalink raw reply

* Re: Does the ordering of the fib_table_dump or /proc/net/fib_trie matter?
From: Alexander Duyck @ 2015-01-07  0:58 UTC (permalink / raw)
  To: David Miller; +Cc: stephen, netdev
In-Reply-To: <20150106.165822.1294578064447416624.davem@davemloft.net>

On 01/06/2015 01:58 PM, David Miller wrote:
> From: Alexander Duyck <alexander.duyck@gmail.com>
> Date: Tue, 06 Jan 2015 12:30:06 -0800
>
>> The question I have is if that would screw up any user-space apps.  I
>> know ip route can dump the list via "ip route show".  I'm just wondering
>> if there would be any problem with default being the last entry instead
>> of the first entry?
> The ordering already changed once when we went from fib_hash to
> fib_trie, nobody should depend upon the ordering.

Okay good to hear.  I kind of thought that was the case, but I wanted to
make sure before I went too far down this rabbit hole.

Thanks.

- Alex

^ permalink raw reply

* Re: [PATCH iproute2 -next] ip: route: add congestion control metric
From: Stephen Hemminger @ 2015-01-07  1:09 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: fw, netdev
In-Reply-To: <1420588357-17665-1-git-send-email-dborkman@redhat.com>

On Wed,  7 Jan 2015 00:52:37 +0100
Daniel Borkmann <dborkman@redhat.com> wrote:

> +		} else if (matches(*argv, "congctl") == 0) {
> +			char cc[16];
> +			NEXT_ARG();
> +			memset(cc, 0, sizeof(cc));
> +			if (strcmp(*argv, "lock") == 0) {
> +				mxlock |= (1<<RTAX_CC_ALGO);

Unneeded paren

> +				NEXT_ARG();
> +			}
> +			strncpy(cc, *argv, sizeof(cc) - 1);
> +			if (strlen(cc) == 0)
> +				invarg("\"conctl\" value must be an algorithm name\n", *argv

Silently truncating the string is not odd. Can't we just let kernel impose
length restrictions.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox