Netdev List


* [PATCH v4 5/5] net: stmmac: Register parent MDIO in case of fake mdio-mux
From: Corentin Labbe @ 2017-08-26  7:33 UTC
  To: robh+dt, mark.rutland, maxime.ripard, wens, linux,
	peppe.cavallaro, alexandre.torgue, andrew, f.fainelli
  Cc: icenowy, netdev, devicetree, linux-arm-kernel, linux-kernel,
	Corentin Labbe
In-Reply-To: <20170826073311.25612-1-clabbe.montjoie@gmail.com>

In case of a fake MDIO switch/mux (like Allwinner H3),
the registered MDIO node should be the parent of the PHY.
Otherwise of_phy_connect will fail.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index e1be5735365b..4d5f3cc82476 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -312,10 +312,12 @@ static int stmmac_dt_phy(struct plat_stmmacenet_data *plat,
 	static const struct of_device_id need_mdio_ids[] = {
 		{ .compatible = "snps,dwc-qos-ethernet-4.10" },
 		{ .compatible = "allwinner,sun8i-a83t-emac" },
-		{ .compatible = "allwinner,sun8i-h3-emac" },
 		{ .compatible = "allwinner,sun8i-v3s-emac" },
 		{ .compatible = "allwinner,sun50i-a64-emac" },
 	};
+	static const struct of_device_id register_parent_mdio_ids[] = {
+		{ .compatible = "allwinner,sun8i-h3-emac" },
+	};
 
 	/* If phy-handle property is passed from DT, use it as the PHY */
 	plat->phy_node = of_parse_phandle(np, "phy-handle", 0);
@@ -332,7 +334,14 @@ static int stmmac_dt_phy(struct plat_stmmacenet_data *plat,
 		mdio = false;
 	}
 
-	if (of_match_node(need_mdio_ids, np) && !of_phy_is_fixed_link(np)) {
+	/*
+	 * In case of a fake MDIO switch/mux (like Allwinner H3),
+	 * the registered MDIO node should be the parent of the PHY.
+	 * Otherwise of_phy_connect will fail.
+	 */
+	if (of_match_node(register_parent_mdio_ids, np) && !of_phy_is_fixed_link(np)) {
+		plat->mdio_node =  of_get_parent(plat->phy_node);
+	} else if (of_match_node(need_mdio_ids, np) && !of_phy_is_fixed_link(np)) {
 		plat->mdio_node = of_get_child_by_name(np, "mdio");
 	} else {
 		/**
-- 
2.13.5
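
For readers following the logic of this change, the node selection can be modelled outside the kernel. The sketch below is a toy illustration only: struct node, child_by_name(), pick_mdio_node() and enum mdio_policy are hypothetical names invented here, standing in for struct device_node, of_get_child_by_name() and the two compatible tables in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for the kernel's struct device_node. */
struct node {
	const char *name;
	struct node *parent;
	struct node *children[4];
	size_t nchildren;
};

/* Toy equivalent of of_get_child_by_name() for this tree only. */
struct node *child_by_name(struct node *np, const char *name)
{
	for (size_t i = 0; i < np->nchildren; i++)
		if (strcmp(np->children[i]->name, name) == 0)
			return np->children[i];
	return NULL;
}

/* Which table the MAC's compatible string matched (assumed encoding). */
enum mdio_policy { MDIO_NONE, MDIO_CHILD, MDIO_PHY_PARENT };

/* Pick the node that should be registered as the MDIO bus. */
struct node *pick_mdio_node(struct node *mac, struct node *phy,
			    enum mdio_policy policy)
{
	switch (policy) {
	case MDIO_PHY_PARENT:
		/* fake mux case: the bus is whatever node holds the PHY */
		return phy ? phy->parent : NULL;
	case MDIO_CHILD:
		/* usual case: the bus is the MAC's child named "mdio" */
		return child_by_name(mac, "mdio");
	default:
		return NULL;
	}
}
```

With an H3-style layout, the PHY sits under a mux child bus rather than under the emac's own "mdio" child, so the two policies return different nodes.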


* [PATCH v4 3/5] dt-bindings: net: dwmac-sun8i: update documentation about integrated PHY
From: Corentin Labbe @ 2017-08-26  7:33 UTC
  To: robh+dt, mark.rutland, maxime.ripard, wens, linux,
	peppe.cavallaro, alexandre.torgue, andrew, f.fainelli
  Cc: icenowy, netdev, devicetree, linux-arm-kernel, linux-kernel,
	Corentin Labbe
In-Reply-To: <20170826073311.25612-1-clabbe.montjoie@gmail.com>

This patch adds documentation about the MDIO switch used on sun8i-h3-emac
for the integrated PHY.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
---
 .../devicetree/bindings/net/dwmac-sun8i.txt        | 117 ++++++++++++++++++++-
 1 file changed, 112 insertions(+), 5 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
index 725f3b187886..5751f7afc5dd 100644
--- a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
+++ b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
@@ -39,7 +39,7 @@ Optional properties for the following compatibles:
 - allwinner,leds-active-low: EPHY LEDs are active low
 
 Required child node of emac:
-- mdio bus node: should be named mdio
+- mdio bus node: should be labelled mdio
 
 Required properties of the mdio node:
 - #address-cells: shall be 1
@@ -48,14 +48,25 @@ Required properties of the mdio node:
 The device node referenced by "phy" or "phy-handle" should be a child node
 of the mdio node. See phy.txt for the generic PHY bindings.
 
-Required properties of the phy node with the following compatibles:
+The following compatibles require an mdio-mux node:
+  - "allwinner,sun8i-h3-emac"
+Required properties for the mdio-mux node:
+  - compatible = "mdio-mux"
+  - two child mdio, one for the integrated mdio, one for the external mdio
+  - mdio-parent-bus: a phandle to the emac's MDIO node
+
+The following compatibles require a PHY node representing the integrated
+PHY, under the integrated MDIO bus node if an mdio-mux node is used:
   - "allwinner,sun8i-h3-emac",
   - "allwinner,sun8i-v3s-emac":
+
+Required properties of the integrated phy node:
 - clocks: a phandle to the reference clock for the EPHY
 - resets: a phandle to the reset control for the EPHY
+- phy-is-integrated
+- Should be a child of the integrated mdio
 
-Example:
-
+Example with integrated PHY:
 emac: ethernet@1c0b000 {
 	compatible = "allwinner,sun8i-h3-emac";
 	syscon = <&syscon>;
@@ -72,13 +83,109 @@ emac: ethernet@1c0b000 {
 	phy-handle = <&int_mii_phy>;
 	phy-mode = "mii";
 	allwinner,leds-active-low;
-	mdio: mdio {
+
+	mdio0: mdio {
+		#address-cells = <1>;
+		#size-cells = <0>;
+		compatible = "snps,dwmac-mdio";
+	};
+
+};
+eth-phy-mux {
+	compatible = "mdio-mux";
+	#address-cells = <1>;
+	#size-cells = <0>;
+	mdio-parent-bus = <&mdio0>;
+
+	int_mdio: mdio@1 {
+		#address-cells = <1>;
+		#size-cells = <0>;
+		int_mii_phy: ethernet-phy@1 {
+			reg = <1>;
+			clocks = <&ccu CLK_BUS_EPHY>;
+			resets = <&ccu RST_BUS_EPHY>;
+			phy-is-integrated;
+		};
+	};
+	ext_mdio: mdio@0 {
+		#address-cells = <1>;
+		#size-cells = <0>;
+	};
+};
+
+Example with external PHY:
+emac: ethernet@1c0b000 {
+	compatible = "allwinner,sun8i-h3-emac";
+	syscon = <&syscon>;
+	reg = <0x01c0b000 0x104>;
+	interrupts = <GIC_SPI 82 IRQ_TYPE_LEVEL_HIGH>;
+	interrupt-names = "macirq";
+	resets = <&ccu RST_BUS_EMAC>;
+	reset-names = "stmmaceth";
+	clocks = <&ccu CLK_BUS_EMAC>;
+	clock-names = "stmmaceth";
+	#address-cells = <1>;
+	#size-cells = <0>;
+
+	phy-handle = <&ext_rgmii_phy>;
+	phy-mode = "rgmii";
+	allwinner,leds-active-low;
+
+	mdio0: mdio {
+		#address-cells = <1>;
+		#size-cells = <0>;
+		compatible = "snps,dwmac-mdio";
+	};
+
+};
+eth-phy-mux {
+	compatible = "mdio-mux";
+	#address-cells = <1>;
+	#size-cells = <0>;
+	mdio-parent-bus = <&mdio0>;
+
+	int_mdio: mdio@1 {
 		#address-cells = <1>;
 		#size-cells = <0>;
 		int_mii_phy: ethernet-phy@1 {
 			reg = <1>;
 			clocks = <&ccu CLK_BUS_EPHY>;
 			resets = <&ccu RST_BUS_EPHY>;
+			phy-is-integrated;
+		};
+	};
+	ext_mdio: mdio@0 {
+		#address-cells = <1>;
+		#size-cells = <0>;
+		ext_rgmii_phy: ethernet-phy@1 {
+			reg = <1>;
+		};
+	};
+};
+
+Example with SoC without integrated PHY
+
+emac: ethernet@1c0b000 {
+	compatible = "allwinner,sun8i-a83t-emac";
+	syscon = <&syscon>;
+	reg = <0x01c0b000 0x104>;
+	interrupts = <GIC_SPI 82 IRQ_TYPE_LEVEL_HIGH>;
+	interrupt-names = "macirq";
+	resets = <&ccu RST_BUS_EMAC>;
+	reset-names = "stmmaceth";
+	clocks = <&ccu CLK_BUS_EMAC>;
+	clock-names = "stmmaceth";
+	#address-cells = <1>;
+	#size-cells = <0>;
+
+	phy-handle = <&ext_rgmii_phy>;
+	phy-mode = "rgmii";
+
+	mdio: mdio {
+		#address-cells = <1>;
+		#size-cells = <0>;
+		ext_rgmii_phy: ethernet-phy@1 {
+			reg = <1>;
 		};
 	};
 };
-- 
2.13.5


* [PATCH] netfilter: ipv4: nf_defrag: constify nf_hook_ops
From: Arvind Yadav @ 2017-08-26 10:41 UTC
  To: pablo, kadlec, fw, davem, kuznet, yoshfuji
  Cc: linux-kernel, coreteam, netfilter-devel, netdev

nf_hook_ops are not supposed to change at runtime. nf_register_net_hooks
and nf_unregister_net_hooks are working with const nf_hook_ops.
So mark the non-const nf_hook_ops structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
---
 net/ipv4/netfilter/nf_defrag_ipv4.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/netfilter/nf_defrag_ipv4.c b/net/ipv4/netfilter/nf_defrag_ipv4.c
index 346bf7c..37fe1616 100644
--- a/net/ipv4/netfilter/nf_defrag_ipv4.c
+++ b/net/ipv4/netfilter/nf_defrag_ipv4.c
@@ -90,7 +90,7 @@ static unsigned int ipv4_conntrack_defrag(void *priv,
 	return NF_ACCEPT;
 }
 
-static struct nf_hook_ops ipv4_defrag_ops[] = {
+static const struct nf_hook_ops ipv4_defrag_ops[] = {
 	{
 		.hook		= ipv4_conntrack_defrag,
 		.pf		= NFPROTO_IPV4,
-- 
2.7.4
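
As a rough illustration of why the const qualifier is safe here, the sketch below uses a hypothetical, much-reduced struct hook_ops and a made-up register_hooks() helper (neither is the netfilter API). The point it tries to show is only that a registration function taking a pointer to const never needs a writable table.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int (*hook_fn)(void *priv);

/* Hypothetical stand-in for struct nf_hook_ops, reduced to two fields. */
struct hook_ops {
	hook_fn hook;
	int priority;
};

unsigned int accept_all(void *priv)
{
	(void)priv;
	return 1;	/* arbitrary "accept" verdict for this sketch */
}

/* const lets the toolchain place the table in read-only data. */
const struct hook_ops demo_ops[] = {
	{ .hook = accept_all, .priority = -400 },
	{ .hook = accept_all, .priority = 300 },
};

/*
 * Made-up registration helper: it only stores pointers to the entries,
 * so a pointer-to-const parameter is sufficient.
 */
size_t register_hooks(const struct hook_ops *ops, size_t n,
		      const struct hook_ops **table, size_t cap)
{
	size_t i;

	for (i = 0; i < n && i < cap; i++)
		table[i] = &ops[i];
	return i;
}
```

Any attempt to write through `table[i]` would be rejected at compile time, which is the property the patch relies on.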



* [PATCH] netfilter: ipv6: nf_defrag: constify nf_hook_ops
From: Arvind Yadav @ 2017-08-26 10:42 UTC
  To: pablo, kadlec, fw, davem, kuznet, yoshfuji
  Cc: linux-kernel, coreteam, netfilter-devel, netdev

nf_hook_ops are not supposed to change at runtime. nf_register_net_hooks
and nf_unregister_net_hooks are working with const nf_hook_ops.
So mark the non-const nf_hook_ops structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
---
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
index ada60d1..b326da5 100644
--- a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
+++ b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
@@ -74,7 +74,7 @@ static unsigned int ipv6_defrag(void *priv,
 	return err == 0 ? NF_ACCEPT : NF_DROP;
 }
 
-static struct nf_hook_ops ipv6_defrag_ops[] = {
+static const struct nf_hook_ops ipv6_defrag_ops[] = {
 	{
 		.hook		= ipv6_defrag,
 		.pf		= NFPROTO_IPV6,
-- 
2.7.4


* Re: [PATCH] netfilter: ipv4: nf_defrag: constify nf_hook_ops
From: Florian Westphal @ 2017-08-26 10:45 UTC
  To: Arvind Yadav
  Cc: pablo, kadlec, fw, davem, kuznet, yoshfuji, linux-kernel,
	coreteam, netfilter-devel, netdev
In-Reply-To: <7859bbf3c48bf4dc16bc3e55067e01f72aceb71f.1503743850.git.arvind.yadav.cs@gmail.com>

Arvind Yadav <arvind.yadav.cs@gmail.com> wrote:
> nf_hook_ops are not supposed to change at runtime. nf_register_net_hooks
> and nf_unregister_net_hooks are working with const nf_hook_ops.
> So mark the non-const nf_hook_ops structs as const.

please update your nf-next tree, all nf_hook_ops are supposed
to be const already.


* [PATCH net] ipv6: set dst.obsolete when a cached route has expired
From: Xin Long @ 2017-08-26 12:10 UTC
  To: network dev; +Cc: davem, hannes

Currently ipv6's dst_ops->check() does not check whether a cached
route has expired, because it trusts dst_gc to clean the cached
route up once it has expired.

The problem is that dst_gc cleans the cached route up only when
its refcount is 1. Some other module (like xfrm) may keep holding
it, and that module releases it only when dst_ops->check() fails.

But without checking for the cached route expiration, .check()
may always return true. Meanwhile, as long as the cached route is
not released, dst_gc cannot delete it. As a result, this cached
route never expires.

This patch sets dst.obsolete to DST_OBSOLETE_KILL in .gc once the
route has expired, and checks obsolete != DST_OBSOLETE_FORCE_CHK
in .check.

Note that this will still be needed if the ipv6 dst_gc timer is
removed one day. dst.obsolete would then be set in .redirect and
.update_pmtu instead, and cached route expiration would be checked
when getting the route, just like the ipv4 route code does.

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv6/ip6_fib.c | 4 +++-
 net/ipv6/route.c   | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index a5ebf86..18567b8 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1792,8 +1792,10 @@ static int fib6_age(struct rt6_info *rt, void *arg)
 		}
 		gc_args->more++;
 	} else if (rt->rt6i_flags & RTF_CACHE) {
+		if (time_after_eq(now, rt->dst.lastuse + gc_args->timeout))
+			rt->dst.obsolete = DST_OBSOLETE_KILL;
 		if (atomic_read(&rt->dst.__refcnt) == 1 &&
-		    time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {
+		    rt->dst.obsolete == DST_OBSOLETE_KILL) {
 			RT6_TRACE("aging clone %p\n", rt);
 			return -1;
 		} else if (rt->rt6i_flags & RTF_GATEWAY) {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 48c8c92..7c634b6 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -440,7 +440,8 @@ static bool rt6_check_expired(const struct rt6_info *rt)
 		if (time_after(jiffies, rt->dst.expires))
 			return true;
 	} else if (rt->dst.from) {
-		return rt6_check_expired((struct rt6_info *) rt->dst.from);
+		return rt->dst.obsolete != DST_OBSOLETE_FORCE_CHK ||
+		       rt6_check_expired((struct rt6_info *)rt->dst.from);
 	}
 	return false;
 }
-- 
2.1.0
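
The interaction between the gc pass and the holder's validity check can be sketched as a small userspace model. Everything below is an assumption made for illustration: struct cached_route, gc_age(), route_check(), route_release() and the ROUTE_* values are invented names, and plain unsigned arithmetic stands in for the kernel's jiffies helpers.

```c
#include <assert.h>

/* Invented markers; the values are arbitrary and local to this sketch. */
enum { ROUTE_FORCE_CHK = -1, ROUTE_KILL = -2 };

struct cached_route {
	int refcnt;		/* 1 means only the routing table holds it */
	unsigned long lastuse;	/* timestamp of last use */
	int obsolete;
};

/*
 * gc pass: mark the route once it has timed out, even if somebody still
 * holds it. Returns 1 when the route can actually be removed now.
 */
int gc_age(struct cached_route *rt, unsigned long now, unsigned long timeout)
{
	if (now - rt->lastuse >= timeout)
		rt->obsolete = ROUTE_KILL;
	return rt->refcnt == 1 && rt->obsolete == ROUTE_KILL;
}

/* Holder's validity check: returns 1 while the route may still be used. */
int route_check(const struct cached_route *rt)
{
	return rt->obsolete == ROUTE_FORCE_CHK;
}

/* Holder drops its reference after a failed check. */
void route_release(struct cached_route *rt)
{
	rt->refcnt--;
}
```

In this model an expired route that is still held fails route_check(), the holder releases it, and the next gc pass can remove it. Without the marking step in gc_age() the check would keep succeeding and the route would stay forever.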


* Re: UDP sockets oddities
From: Eric Dumazet @ 2017-08-26 12:47 UTC
  To: David Miller; +Cc: f.fainelli, netdev, pabeni, willemb
In-Reply-To: <20170825.211905.920493778125075310.davem@davemloft.net>

On Fri, 2017-08-25 at 21:19 -0700, David Miller wrote:

> Agreed, but the ARP resolution queue really needs to scale its backlog
> to the physical technology it is attached to.
Yes, last time (in 2011) we increased the old limit of 3 packets :/

We probably should match sysctl_wmem_max so that a single socket
provider would hit its sk_sndbuf limit

Something like :

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 6b0bc0f715346a097a6df46e2ba2771359abcd23..7777dceb78107c0019fb39d5b69be1959005b78e 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -109,7 +109,8 @@ neigh/default/unres_qlen_bytes - INTEGER
 	queued for each	unresolved address by other network layers.
 	(added in linux 3.3)
 	Setting negative value is meaningless and will return error.
-	Default: 65536 Bytes(64KB)
+	Default: SK_WMEM_MAX, enough to store 256 packets of medium size
+		 (less than 256 bytes per packet)
 
 neigh/default/unres_qlen - INTEGER
 	The maximum number of packets which may be queued for each
diff --git a/include/net/sock.h b/include/net/sock.h
index 1c2912d433e81b10f3fdc87bcfcbb091570edc03..03a362568357acc7278a318423dd3873103f90ca 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2368,6 +2368,16 @@ bool sk_net_capable(const struct sock *sk, int cap);
 
 void sk_get_meminfo(const struct sock *sk, u32 *meminfo);
 
+/* Take into consideration the size of the struct sk_buff overhead in the
+ * determination of these values, since that is non-constant across
+ * platforms.  This makes socket queueing behavior and performance
+ * not depend upon such differences.
+ */
+#define _SK_MEM_PACKETS		256
+#define _SK_MEM_OVERHEAD	SKB_TRUESIZE(256)
+#define SK_WMEM_MAX		(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
+#define SK_RMEM_MAX		(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
+
 extern __u32 sysctl_wmem_max;
 extern __u32 sysctl_rmem_max;
 
diff --git a/net/core/sock.c b/net/core/sock.c
index dfdd14cac775e9bfcee0085ee32ffcd0ab28b67b..9b7b6bbb2a23e7652a1f34a305f29d49de00bc8c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -307,16 +307,6 @@ static struct lock_class_key af_wlock_keys[AF_MAX];
 static struct lock_class_key af_elock_keys[AF_MAX];
 static struct lock_class_key af_kern_callback_keys[AF_MAX];
 
-/* Take into consideration the size of the struct sk_buff overhead in the
- * determination of these values, since that is non-constant across
- * platforms.  This makes socket queueing behavior and performance
- * not depend upon such differences.
- */
-#define _SK_MEM_PACKETS		256
-#define _SK_MEM_OVERHEAD	SKB_TRUESIZE(256)
-#define SK_WMEM_MAX		(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
-#define SK_RMEM_MAX		(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
-
 /* Run time adjustable parameters. */
 __u32 sysctl_wmem_max __read_mostly = SK_WMEM_MAX;
 EXPORT_SYMBOL(sysctl_wmem_max);
diff --git a/net/decnet/dn_neigh.c b/net/decnet/dn_neigh.c
index 21dedf6fd0f76dec22b2b3685beb89cfefea7ded..22bf0b95d6edc3c27ef3a99d27cb70a1551e3e0e 100644
--- a/net/decnet/dn_neigh.c
+++ b/net/decnet/dn_neigh.c
@@ -94,7 +94,7 @@ struct neigh_table dn_neigh_table = {
 			[NEIGH_VAR_BASE_REACHABLE_TIME] = 30 * HZ,
 			[NEIGH_VAR_DELAY_PROBE_TIME] = 5 * HZ,
 			[NEIGH_VAR_GC_STALETIME] = 60 * HZ,
-			[NEIGH_VAR_QUEUE_LEN_BYTES] = 64*1024,
+			[NEIGH_VAR_QUEUE_LEN_BYTES] = SK_WMEM_MAX,
 			[NEIGH_VAR_PROXY_QLEN] = 0,
 			[NEIGH_VAR_ANYCAST_DELAY] = 0,
 			[NEIGH_VAR_PROXY_DELAY] = 0,
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 8b52179ddc6e54eabf6d3c2ed0132083228680bb..7c45b8896709815c5dde5972fd57cb5c3bcb2648 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -171,7 +171,7 @@ struct neigh_table arp_tbl = {
 			[NEIGH_VAR_BASE_REACHABLE_TIME] = 30 * HZ,
 			[NEIGH_VAR_DELAY_PROBE_TIME] = 5 * HZ,
 			[NEIGH_VAR_GC_STALETIME] = 60 * HZ,
-			[NEIGH_VAR_QUEUE_LEN_BYTES] = 64 * 1024,
+			[NEIGH_VAR_QUEUE_LEN_BYTES] = SK_WMEM_MAX,
 			[NEIGH_VAR_PROXY_QLEN] = 64,
 			[NEIGH_VAR_ANYCAST_DELAY] = 1 * HZ,
 			[NEIGH_VAR_PROXY_DELAY]	= (8 * HZ) / 10,
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 5e338eb89509b1df6ebd060f8bd19fcb4b86fe05..266a530414d7be4f1e7be922e465bbab46f7cbac 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -127,7 +127,7 @@ struct neigh_table nd_tbl = {
 			[NEIGH_VAR_BASE_REACHABLE_TIME] = ND_REACHABLE_TIME,
 			[NEIGH_VAR_DELAY_PROBE_TIME] = 5 * HZ,
 			[NEIGH_VAR_GC_STALETIME] = 60 * HZ,
-			[NEIGH_VAR_QUEUE_LEN_BYTES] = 64 * 1024,
+			[NEIGH_VAR_QUEUE_LEN_BYTES] = SK_WMEM_MAX,
 			[NEIGH_VAR_PROXY_QLEN] = 64,
 			[NEIGH_VAR_ANYCAST_DELAY] = 1 * HZ,
 			[NEIGH_VAR_PROXY_DELAY] = (8 * HZ) / 10,
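
To get a feel for the size of the proposed default, the arithmetic behind SK_WMEM_MAX can be reproduced in userspace. This is a sketch under stated assumptions: demo_truesize() and demo_wmem_max() are invented helpers, the 64-byte cache line is an assumption, and the structure sizes are passed in as parameters because sizeof(struct sk_buff) and sizeof(struct skb_shared_info) vary with architecture and kernel configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size used for rounding in this sketch. */
#define DEMO_CACHE_LINE 64u
#define DEMO_ALIGN(x) \
	(((x) + (DEMO_CACHE_LINE - 1)) & ~(size_t)(DEMO_CACHE_LINE - 1))

/*
 * Payload plus the two per-packet bookkeeping structures, each rounded
 * up to a cache line, in the spirit of SKB_TRUESIZE().
 */
size_t demo_truesize(size_t payload, size_t skb_size, size_t shinfo_size)
{
	return payload + DEMO_ALIGN(skb_size) + DEMO_ALIGN(shinfo_size);
}

/* 256 packets of 256 payload bytes each, as in the quoted macros. */
size_t demo_wmem_max(size_t skb_size, size_t shinfo_size)
{
	return demo_truesize(256, skb_size, shinfo_size) * 256;
}
```

For example, with illustrative sizes of 232 and 320 bytes for the two structures (assumed values, not taken from this thread), demo_wmem_max(232, 320) gives 212992 bytes, roughly three times the old 64KB queue limit.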


* Re: [PATCH 1/4] sgiseeq: switch to dma_alloc_attrs
From: Ralf Baechle @ 2017-08-26 13:07 UTC
  To: Christoph Hellwig
  Cc: netdev, David S. Miller, linux-mips, linux-parisc, linux-kernel
In-Reply-To: <20170826072125.9790-2-hch@lst.de>

On Sat, Aug 26, 2017 at 09:21:22AM +0200, Christoph Hellwig wrote:

Looks good,

Acked-by: Ralf Baechle <ralf@linux-mips.org>

  Ralf


* Re: [PATCH net-next v2 2/2] tcp_diag: report TCP MD5 signing keys and addresses
From: Eric Dumazet @ 2017-08-26 13:08 UTC
  To: Ivan Delalande; +Cc: David Miller, netdev
In-Reply-To: <20170826055348.jjzv3n5vao34mkgb@ycc.fr>

On Sat, 2017-08-26 at 07:53 +0200, Ivan Delalande wrote:

> 
> Sorry, I probably should have detailed my changes. I tried to address
> this by locking the whole socket in the caller, tcp_diag_get_aux, just
> outside of the rcu_read_lock. Would this work here, or do you see a
> better way?
> 

locking the socket is problematic.

It is already done in tcp_get_info() since linux-4.10, and unfortunately
it added unreasonable stalls when a socket is flooded with tiny SACK
messages (socket backlog is huge).

People are now making tcp_rmem and tcp_wmem much bigger to allow BBR
flows to reach line rate on very long distance communications.

We are working to make tcp_rack_mark_lost() not have O(N) behavior,
but it is not done yet.

I would stick to RCU, but add sanity checks, so that _if_ the list is
different on the second RCU list traversal, you make sure:

1) We do not try to put more data in the reserved space

2) We memset(ptr, 0, remaining) the remaining space if we found fewer
entries in the 2nd loop.
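
The two sanity checks can be sketched in plain C. This is a userspace model with invented names (struct key_entry, struct key_node, count_keys(), fill_keys()) and no actual RCU; it only shows how a second traversal can be bounded by the space reserved after the first one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct key_entry { unsigned char key[4]; };	/* hypothetical payload */

struct key_node {
	struct key_entry entry;
	const struct key_node *next;
};

/* First traversal: count entries so the caller can reserve space. */
size_t count_keys(const struct key_node *head)
{
	size_t n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}

/*
 * Second traversal: the list may have changed in between.
 * 1) never write more than `reserved` entries;
 * 2) zero whatever part of the reserved space stays unused.
 * Returns the number of entries actually copied.
 */
size_t fill_keys(struct key_entry *dst, size_t reserved,
		 const struct key_node *head)
{
	size_t n = 0;

	for (; head && n < reserved; head = head->next)
		dst[n++] = head->entry;
	if (n < reserved)
		memset(&dst[n], 0, (reserved - n) * sizeof(*dst));
	return n;
}
```

If the list grew between the two passes the extra entries are simply not reported, and if it shrank the tail of the buffer is zeroed instead of carrying stale bytes.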


* Re: [PATCH 3/4] i825xx: switch to dma_alloc_attrs
From: Ralf Baechle @ 2017-08-26 13:10 UTC
  To: Christoph Hellwig
  Cc: netdev, David S. Miller, linux-mips, linux-parisc, linux-kernel
In-Reply-To: <20170826072125.9790-4-hch@lst.de>

On Sat, Aug 26, 2017 at 09:21:24AM +0200, Christoph Hellwig wrote:

Adding Thomas Bogendoerfer <tsbogend@alpha.franken.de>, the author of
sni_82596.c to cc.

> This way we can always pass DMA_ATTR_NON_CONSISTENT, the SNI mips version
> will simply ignore the flag.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/net/ethernet/i825xx/lasi_82596.c | 6 ++----
>  drivers/net/ethernet/i825xx/lib82596.c   | 9 +++++----
>  drivers/net/ethernet/i825xx/sni_82596.c  | 6 ++----
>  3 files changed, 9 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/net/ethernet/i825xx/lasi_82596.c b/drivers/net/ethernet/i825xx/lasi_82596.c
> index d787fdd5db7b..d5b5021aa759 100644
> --- a/drivers/net/ethernet/i825xx/lasi_82596.c
> +++ b/drivers/net/ethernet/i825xx/lasi_82596.c
> @@ -96,8 +96,6 @@
>  
>  #define OPT_SWAP_PORT	0x0001	/* Need to wordswp on the MPU port */
>  
> -#define DMA_ALLOC                        dma_alloc_noncoherent
> -#define DMA_FREE                         dma_free_noncoherent
>  #define DMA_WBACK(ndev, addr, len) \
>  	do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, DMA_TO_DEVICE); } while (0)
>  
> @@ -200,8 +198,8 @@ static int lan_remove_chip(struct parisc_device *pdev)
>  	struct i596_private *lp = netdev_priv(dev);
>  
>  	unregister_netdev (dev);
> -	DMA_FREE(&pdev->dev, sizeof(struct i596_private),
> -		 (void *)lp->dma, lp->dma_addr);
> +	dma_free_attrs(&pdev->dev, sizeof(struct i596_private), lp->dma,
> +		       lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
>  	free_netdev (dev);
>  	return 0;
>  }
> diff --git a/drivers/net/ethernet/i825xx/lib82596.c b/drivers/net/ethernet/i825xx/lib82596.c
> index 8449c58f01fd..f00a1dc2128c 100644
> --- a/drivers/net/ethernet/i825xx/lib82596.c
> +++ b/drivers/net/ethernet/i825xx/lib82596.c
> @@ -1063,8 +1063,9 @@ static int i82596_probe(struct net_device *dev)
>  	if (!dev->base_addr || !dev->irq)
>  		return -ENODEV;
>  
> -	dma = (struct i596_dma *) DMA_ALLOC(dev->dev.parent,
> -		sizeof(struct i596_dma), &lp->dma_addr, GFP_KERNEL);
> +	dma = dma_alloc_attrs(dev->dev.parent, sizeof(struct i596_dma),
> +			      &lp->dma_addr, GFP_KERNEL,
> +			      DMA_ATTR_NON_CONSISTENT);
>  	if (!dma) {
>  		printk(KERN_ERR "%s: Couldn't get shared memory\n", __FILE__);
>  		return -ENOMEM;
> @@ -1085,8 +1086,8 @@ static int i82596_probe(struct net_device *dev)
>  
>  	i = register_netdev(dev);
>  	if (i) {
> -		DMA_FREE(dev->dev.parent, sizeof(struct i596_dma),
> -				    (void *)dma, lp->dma_addr);
> +		dma_free_attrs(dev->dev.parent, sizeof(struct i596_dma),
> +			       dma, lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
>  		return i;
>  	}
>  
> diff --git a/drivers/net/ethernet/i825xx/sni_82596.c b/drivers/net/ethernet/i825xx/sni_82596.c
> index 2af7f77345fb..b2c04a789744 100644
> --- a/drivers/net/ethernet/i825xx/sni_82596.c
> +++ b/drivers/net/ethernet/i825xx/sni_82596.c
> @@ -23,8 +23,6 @@
>  
>  static const char sni_82596_string[] = "snirm_82596";
>  
> -#define DMA_ALLOC                      dma_alloc_coherent
> -#define DMA_FREE                       dma_free_coherent
>  #define DMA_WBACK(priv, addr, len)     do { } while (0)
>  #define DMA_INV(priv, addr, len)       do { } while (0)
>  #define DMA_WBACK_INV(priv, addr, len) do { } while (0)
> @@ -152,8 +150,8 @@ static int sni_82596_driver_remove(struct platform_device *pdev)
>  	struct i596_private *lp = netdev_priv(dev);
>  
>  	unregister_netdev(dev);
> -	DMA_FREE(dev->dev.parent, sizeof(struct i596_private),
> -		 lp->dma, lp->dma_addr);
> +	dma_free_attrs(dev->dev.parent, sizeof(struct i596_private), lp->dma,
> +		       lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
>  	iounmap(lp->ca);
>  	iounmap(lp->mpu_port);
>  	free_netdev (dev);
> -- 
> 2.11.0


* [PATCH] sni_82596: Add Thomas' email address to driver.
From: Ralf Baechle @ 2017-08-26 13:15 UTC
  To: David S. Miller, Thomas Bogendoerfer, Christoph Hellwig
  Cc: netdev, linux-mips

---
Reviewing Christoph's DMA patch I noticed Thomas' email address was missing
from the entire driver file.

 drivers/net/ethernet/i825xx/sni_82596.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/i825xx/sni_82596.c b/drivers/net/ethernet/i825xx/sni_82596.c
index 2af7f77345fb..38e0e16c9bfb 100644
--- a/drivers/net/ethernet/i825xx/sni_82596.c
+++ b/drivers/net/ethernet/i825xx/sni_82596.c
@@ -39,7 +39,7 @@ static const char sni_82596_string[] = "snirm_82596";
 
 #include "lib82596.c"
 
-MODULE_AUTHOR("Thomas Bogendoerfer");
+MODULE_AUTHOR("Thomas Bogendoerfer <tsbogend@alpha.franken.de>");
 MODULE_DESCRIPTION("i82596 driver");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS("platform:snirm_82596");


* [PATCH net-next v2] sched: sfq: drop packets after root qdisc lock is released
From: gfree.wind @ 2017-08-26 14:58 UTC
  To: jhs, xiyou.wangcong, jiri, davem, edumazet, netdev; +Cc: Gao Feng

From: Gao Feng <gfree.wind@vip.163.com>

Commit 520ac30f4551 ("net_sched: drop packets after root qdisc lock
is released") made a big change to tc for performance. But some
paths in the SFQ enqueue operation were not converted:
1. Failing to find the SFQ hash slot;
2. When the queue is full.

Now use qdisc_drop instead of freeing the skb directly.

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
---
 v2: Add the to_free in the sfq_change, per Eric
 v1: initial version

 net/sched/sch_sfq.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 82469ef..1896a8c 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -292,7 +292,7 @@ static inline void slot_queue_add(struct sfq_slot *slot, struct sk_buff *skb)
 	slot->skblist_prev = skb;
 }
 
-static unsigned int sfq_drop(struct Qdisc *sch)
+static unsigned int sfq_drop(struct Qdisc *sch, struct sk_buff **to_free)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
 	sfq_index x, d = q->cur_depth;
@@ -310,9 +310,8 @@ static unsigned int sfq_drop(struct Qdisc *sch)
 		slot->backlog -= len;
 		sfq_dec(q, x);
 		sch->q.qlen--;
-		qdisc_qstats_drop(sch);
 		qdisc_qstats_backlog_dec(sch, skb);
-		kfree_skb(skb);
+		qdisc_drop(skb, sch, to_free);
 		return len;
 	}
 
@@ -360,7 +359,7 @@ static int sfq_headdrop(const struct sfq_sched_data *q)
 	if (hash == 0) {
 		if (ret & __NET_XMIT_BYPASS)
 			qdisc_qstats_drop(sch);
-		kfree_skb(skb);
+		__qdisc_drop(skb, to_free);
 		return ret;
 	}
 	hash--;
@@ -465,7 +464,7 @@ static int sfq_headdrop(const struct sfq_sched_data *q)
 		return NET_XMIT_SUCCESS;
 
 	qlen = slot->qlen;
-	dropped = sfq_drop(sch);
+	dropped = sfq_drop(sch, to_free);
 	/* Return Congestion Notification only if we dropped a packet
 	 * from this flow.
 	 */
@@ -628,6 +627,8 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt)
 	struct tc_sfq_qopt_v1 *ctl_v1 = NULL;
 	unsigned int qlen, dropped = 0;
 	struct red_parms *p = NULL;
+	struct sk_buff *to_free = NULL;
+	struct sk_buff *tail = NULL;
 
 	if (opt->nla_len < nla_attr_size(sizeof(*ctl)))
 		return -EINVAL;
@@ -674,8 +675,13 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt)
 	}
 
 	qlen = sch->q.qlen;
-	while (sch->q.qlen > q->limit)
-		dropped += sfq_drop(sch);
+	while (sch->q.qlen > q->limit) {
+		dropped += sfq_drop(sch, &to_free);
+		if (!tail)
+			tail = to_free;
+	}
+
+	rtnl_kfree_skbs(to_free, tail);
 	qdisc_tree_reduce_backlog(sch, qlen - sch->q.qlen, dropped);
 
 	del_timer(&q->perturb_timer);
-- 
1.9.1
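
The deferred-drop idea behind the to_free argument can be illustrated with a small userspace model. The names struct pkt, defer_drop() and free_deferred() are invented for this sketch; it assumes a singly linked list with head insertion and shows only the general pattern of queueing packets while a lock is held and freeing them afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, minimal stand-in for struct sk_buff. */
struct pkt {
	struct pkt *next;
	unsigned int len;
};

/*
 * Called while the (imaginary) qdisc lock is held: do not free here,
 * just push the packet on the head of the to_free list.
 */
void defer_drop(struct pkt *p, struct pkt **to_free)
{
	p->next = *to_free;
	*to_free = p;
}

/*
 * Called after the lock has been released: walk the list and free
 * everything. Returns the number of packets freed.
 */
size_t free_deferred(struct pkt *to_free)
{
	size_t n = 0;

	while (to_free) {
		struct pkt *next = to_free->next;

		free(to_free);
		to_free = next;
		n++;
	}
	return n;
}
```

Because insertion happens at the head, the first packet dropped ends up as the tail of the list, which matches how the patch records `tail` after the first call to sfq_drop() in sfq_change().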


* [PATCH net 0/4] xfrm_user info leaks
From: Mathias Krause @ 2017-08-26 15:08 UTC
  To: Steffen Klassert, David S. Miller, Herbert Xu; +Cc: netdev, Mathias Krause

Hi David, Steffen,

the following series fixes a few info leaks due to missing padding byte
initialization in the xfrm_user netlink interface.

Please apply!

Mathias Krause (4):
  xfrm_user: fix info leak in copy_user_offload()
  xfrm_user: fix info leak in xfrm_notify_sa()
  xfrm_user: fix info leak in build_expire()
  xfrm_user: fix info leak in build_aevent()

 net/xfrm/xfrm_user.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

-- 
1.7.10.4


* [PATCH net 1/4] xfrm_user: fix info leak in copy_user_offload()
From: Mathias Krause @ 2017-08-26 15:08 UTC
  To: Steffen Klassert, David S. Miller, Herbert Xu; +Cc: netdev, Mathias Krause
In-Reply-To: <1503760140-9095-1-git-send-email-minipli@googlemail.com>

The memory reserved to dump the xfrm offload state includes padding
bytes of struct xfrm_user_offload added by the compiler for alignment.
Add an explicit memset(0) before filling the buffer to avoid the heap
info leak.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
---
 net/xfrm/xfrm_user.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 2be4c6af008a..3259555ae7d7 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -796,7 +796,7 @@ static int copy_user_offload(struct xfrm_state_offload *xso, struct sk_buff *skb
 		return -EMSGSIZE;

 	xuo = nla_data(attr);
-
+	memset(xuo, 0, sizeof(*xuo));
 	xuo->ifindex = xso->dev->ifindex;
 	xuo->flags = xso->flags;

-- 
1.7.10.4
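
The class of bug fixed here is easy to reproduce in userspace. The sketch below uses a hypothetical struct demo_offload with the same two-field shape (an int followed by a single byte) and an invented fill_offload() helper. On typical ABIs the compiler adds padding after the byte, and only the memset() guarantees that padding is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct: on typical ABIs there is padding after flags. */
struct demo_offload {
	int ifindex;
	unsigned char flags;
};

/*
 * Fill a buffer that will later be copied out wholesale. Zero it
 * first so that padding bytes cannot carry stale memory contents.
 */
void fill_offload(struct demo_offload *xuo, int ifindex, unsigned char flags)
{
	memset(xuo, 0, sizeof(*xuo));
	xuo->ifindex = ifindex;
	xuo->flags = flags;
}
```

Poisoning the buffer with a known byte pattern before calling fill_offload() and then inspecting the bytes after `flags` is a quick way to check that nothing stale survives.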


* [PATCH net 2/4] xfrm_user: fix info leak in xfrm_notify_sa()
From: Mathias Krause @ 2017-08-26 15:08 UTC
  To: Steffen Klassert, David S. Miller, Herbert Xu; +Cc: netdev, Mathias Krause
In-Reply-To: <1503760140-9095-1-git-send-email-minipli@googlemail.com>

The memory reserved to dump the ID of the xfrm state includes a padding
byte in struct xfrm_usersa_id added by the compiler for alignment. To
prevent the heap info leak, memset(0) the whole struct before filling
it.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Fixes: 0603eac0d6b7 ("[IPSEC]: Add XFRMA_SA/XFRMA_POLICY for delete notification")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
---
 net/xfrm/xfrm_user.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 3259555ae7d7..c33516ef52f2 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2715,6 +2715,7 @@ static int xfrm_notify_sa(struct xfrm_state *x, const struct km_event *c)
 		struct nlattr *attr;

 		id = nlmsg_data(nlh);
+		memset(id, 0, sizeof(*id));
 		memcpy(&id->daddr, &x->id.daddr, sizeof(id->daddr));
 		id->spi = x->id.spi;
 		id->family = x->props.family;
-- 
1.7.10.4


* [PATCH net 3/4] xfrm_user: fix info leak in build_expire()
From: Mathias Krause @ 2017-08-26 15:08 UTC
  To: Steffen Klassert, David S. Miller, Herbert Xu; +Cc: netdev, Mathias Krause
In-Reply-To: <1503760140-9095-1-git-send-email-minipli@googlemail.com>

The memory reserved to dump the expired xfrm state includes padding
bytes in struct xfrm_user_expire added by the compiler for alignment. To
prevent the heap info leak, memset(0) the remainder of the struct.
Initializing the whole structure isn't needed as copy_to_user_state()
already takes care of clearing the padding bytes within the 'state'
member.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
---
 net/xfrm/xfrm_user.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index c33516ef52f2..2cbdc81610c6 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2578,6 +2578,8 @@ static int build_expire(struct sk_buff *skb, struct xfrm_state *x, const struct
 	ue = nlmsg_data(nlh);
 	copy_to_user_state(x, &ue->state);
 	ue->hard = (c->data.hard != 0) ? 1 : 0;
+	/* clear the padding bytes */
+	memset(&ue->hard + 1, 0, sizeof(*ue) - offsetofend(typeof(*ue), hard));

 	err = xfrm_mark_put(skb, &x->mark);
 	if (err)
-- 
1.7.10.4
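
Clearing only the tail of a structure, as done here, can be sketched in portable C. The names demo_offsetofend(), struct demo_state, struct demo_expire and fill_expire() are invented for this illustration, and the macro takes an explicit type name instead of relying on typeof. The memset() starts at the first byte after `hard`, which is the same address as `&ue->hard + 1` in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Offset of the first byte after `member` (local helper for this sketch). */
#define demo_offsetofend(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

struct demo_state { unsigned long long bytes; };	/* hypothetical */

/* Trailing padding follows `hard` on typical ABIs. */
struct demo_expire {
	struct demo_state state;
	unsigned char hard;
};

void fill_expire(struct demo_expire *ue, unsigned long long bytes, int hard)
{
	/* assume the leading member is fully initialized by its own helper */
	ue->state.bytes = bytes;
	ue->hard = hard ? 1 : 0;
	/* clear only what comes after the last member */
	memset((unsigned char *)ue + demo_offsetofend(struct demo_expire, hard),
	       0, sizeof(*ue) - demo_offsetofend(struct demo_expire, hard));
}
```

This avoids zeroing the large leading member twice, at the cost of depending on the member order. If a field were ever added after `hard`, the memset() would have to move with it.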


* [PATCH net 4/4] xfrm_user: fix info leak in build_aevent()
From: Mathias Krause @ 2017-08-26 15:09 UTC
  To: Steffen Klassert, David S. Miller, Herbert Xu
  Cc: netdev, Mathias Krause, Jamal Hadi Salim
In-Reply-To: <1503760140-9095-1-git-send-email-minipli@googlemail.com>

The memory reserved to dump the ID of the xfrm state includes a padding
byte in struct xfrm_usersa_id added by the compiler for alignment. To
prevent the heap info leak, memset(0) the sa_id before filling it.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Fixes: d51d081d6504 ("[IPSEC]: Sync series - user")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
---
 net/xfrm/xfrm_user.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 2cbdc81610c6..9391ced05259 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1869,6 +1869,7 @@ static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, const struct
 		return -EMSGSIZE;

 	id = nlmsg_data(nlh);
+	memset(&id->sa_id, 0, sizeof(id->sa_id));
 	memcpy(&id->sa_id.daddr, &x->id.daddr, sizeof(x->id.daddr));
 	id->sa_id.spi = x->id.spi;
 	id->sa_id.family = x->props.family;
-- 
1.7.10.4

^ permalink raw reply related
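
The "zero first, then fill" approach of this patch can be sketched in userspace as follows. struct demo_sa_id is a hypothetical struct whose members mirror the fields the patch touches (daddr, spi, family) plus a trailing 1-byte field; its members add up to 23 bytes, so common ABIs round it up to 24 and leave one padding byte at the end. The helper names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: 16 + 4 + 2 + 1 = 23 bytes of members. */
struct demo_sa_id {
	uint8_t daddr[16];
	uint32_t spi;
	uint16_t family;
	uint8_t proto;
};

/* Zero the whole struct first so padding never carries stale data. */
static void demo_fill_sa_id(struct demo_sa_id *id, const uint8_t daddr[16],
			    uint32_t spi, uint16_t family, uint8_t proto)
{
	memset(id, 0, sizeof(*id));
	memcpy(id->daddr, daddr, sizeof(id->daddr));
	id->spi = spi;
	id->family = family;
	id->proto = proto;
}

/* Count non-zero bytes located after the last member. */
static size_t demo_dirty_tail_bytes(const struct demo_sa_id *id)
{
	const unsigned char *p = (const unsigned char *)id;
	size_t i, dirty = 0;

	for (i = offsetof(struct demo_sa_id, proto) + 1; i < sizeof(*id); i++)
		if (p[i])
			dirty++;
	return dirty;
}
```

Zeroing the whole struct costs a few extra stores compared with clearing only the padding, but it stays correct if members are later added or reordered.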

* Re: [PATCH net 0/4] xfrm_user info leaks
From: Joe Perches @ 2017-08-26 15:58 UTC (permalink / raw)
  To: Mathias Krause, Steffen Klassert, David S. Miller, Herbert Xu; +Cc: netdev
In-Reply-To: <1503760140-9095-1-git-send-email-minipli@googlemail.com>

On Sat, 2017-08-26 at 17:08 +0200, Mathias Krause wrote:
> Hi David, Steffen,
> 
> the following series fixes a few info leaks due to missing padding byte
> initialization in the xfrm_user netlink interface.

Were these found by inspection or by some tool?
If by tool, perhaps there are other _to_user cases?

^ permalink raw reply

* Re: mlxsw and rtnl lock
From: Ido Schimmel @ 2017-08-26 17:04 UTC (permalink / raw)
  To: David Ahern; +Cc: Jiri Pirko, netdev@vger.kernel.org, mlxsw
In-Reply-To: <dccefbeb-b5c3-14ed-05d3-07d464989708@gmail.com>

On Fri, Aug 25, 2017 at 01:26:07PM -0700, David Ahern wrote:
> Jiri / Ido:
> 
> I was looking at the mlxsw driver and the places it holds the rtnl lock.
> There are a lot of them and from an admittedly short review it seems
> like the rtnl is protecting changes to mlxsw data structures as opposed
> to calling into the core networking stack. This is going to have huge
> impacts on scalability when both the kernel programming (user changes)
> and the hardware programming require the rtnl.

I'm aware of the problem and I intend to look into it. When we started
working on mlxsw about two years ago all the operations were serialized
by rtnl_lock, so when we had to add processing in a different context,
we ended up taking the same lock to protect against changes. But it can
impact scalability as you mentioned.

> With regards to the FIB notifier, why add the fib events to a work queue
> that is processed asynchronously if processing the work queue requires
> the rtnl lock? What is gained by deferring the work since a major side
> effect of the work queue is the loss of error propagation back to the
> user on a failure. That is, if the FIB add/replace/append fails in
> the h/w for any reason, offload is silently aborted (an entry in the
> kernel log is still a silent abort).

FIB events are received in an atomic context and therefore must be
deferred to a workqueue. The chain was initially blocking, but this had
to change in commit d3f706f68e2f ("ipv4: fib: Convert FIB notification
chain to be atomic") to support dumping of IPv4 routes under RCU. IPv6
events are always sent in an atomic context.

Regarding the silent abort, that's intentional. You can look at the same
code in v4.9 - when the chain was still blocking - and you'll see that
we didn't propagate the error even then. This was discussed in the past
and the conclusion was that the user doesn't expect the operation to fail. If
hardware resources are exceeded, we let the kernel take care of the
forwarding instead.

^ permalink raw reply

* Re: UDP sockets oddities
From: Florian Fainelli @ 2017-08-26 18:56 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: netdev, pabeni, willemb
In-Reply-To: <1503751671.11498.25.camel@edumazet-glaptop3.roam.corp.google.com>



On 08/26/2017 05:47 AM, Eric Dumazet wrote:
> On Fri, 2017-08-25 at 21:19 -0700, David Miller wrote:
> 
>> Agreed, but the ARP resolution queue really needs to scale its backlog
>> to the physical technology it is attached to.
> Yes, last time (in 2011) we increased the old limit of 3 packets :/
> 
> We probably should match sysctl_wmem_max so that a single socket
> provider would hit its sk_sndbuf limit

Before:
/proc/sys/net/ipv4/neigh/eth0/unres_qlen:34
/proc/sys/net/ipv4/neigh/eth0/unres_qlen_bytes:65536
/proc/sys/net/ipv4/neigh/gphy/unres_qlen:34
/proc/sys/net/ipv4/neigh/gphy/unres_qlen_bytes:65536

After:
/proc/sys/net/ipv4/neigh/eth0/unres_qlen:106
/proc/sys/net/ipv4/neigh/eth0/unres_qlen_bytes:229376
/proc/sys/net/ipv4/neigh/gphy/unres_qlen:106
/proc/sys/net/ipv4/neigh/gphy/unres_qlen_bytes:229376

and this does help a lot with the test case reported over an hour, only
2 packets lost:

# perf record -a -g -e skb:kfree_skb iperf -c 192.168.1.23 -b 900M -t 3600 -u
------------------------------------------------------------
Client connecting to 192.168.1.23, UDP port 5001
Sending 1470 byte datagrams, IPG target: 13.07 us (kalman adjust)
UDP buffer size:  224 KByte (default)
------------------------------------------------------------
[  4] local 192.168.1.66 port 48209 connected with 192.168.1.23 port 5001
write failed: Invalid argument
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-404.9 sec  4.51 GBytes  95.7 Mbits/sec
[  4] Sent 3294727 datagrams
[  4] Server Report:
[  4]  0.0-405.1 sec  4.51 GBytes  95.6 Mbits/sec  14.979 ms 2/3294728 (6.1e-05%)

Thanks Eric!

> 
> Something like :
> 
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index 6b0bc0f715346a097a6df46e2ba2771359abcd23..7777dceb78107c0019fb39d5b69be1959005b78e 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -109,7 +109,8 @@ neigh/default/unres_qlen_bytes - INTEGER
>  	queued for each	unresolved address by other network layers.
>  	(added in linux 3.3)
>  	Setting negative value is meaningless and will return error.
> -	Default: 65536 Bytes(64KB)
> +	Default: SK_WMEM_MAX, enough to store 256 packets of medium size
> +		 (less than 256 bytes per packet)
>  
>  neigh/default/unres_qlen - INTEGER
>  	The maximum number of packets which may be queued for each
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 1c2912d433e81b10f3fdc87bcfcbb091570edc03..03a362568357acc7278a318423dd3873103f90ca 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -2368,6 +2368,16 @@ bool sk_net_capable(const struct sock *sk, int cap);
>  
>  void sk_get_meminfo(const struct sock *sk, u32 *meminfo);
>  
> +/* Take into consideration the size of the struct sk_buff overhead in the
> + * determination of these values, since that is non-constant across
> + * platforms.  This makes socket queueing behavior and performance
> + * not depend upon such differences.
> + */
> +#define _SK_MEM_PACKETS		256
> +#define _SK_MEM_OVERHEAD	SKB_TRUESIZE(256)
> +#define SK_WMEM_MAX		(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
> +#define SK_RMEM_MAX		(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
> +
>  extern __u32 sysctl_wmem_max;
>  extern __u32 sysctl_rmem_max;
>  
> diff --git a/net/core/sock.c b/net/core/sock.c
> index dfdd14cac775e9bfcee0085ee32ffcd0ab28b67b..9b7b6bbb2a23e7652a1f34a305f29d49de00bc8c 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -307,16 +307,6 @@ static struct lock_class_key af_wlock_keys[AF_MAX];
>  static struct lock_class_key af_elock_keys[AF_MAX];
>  static struct lock_class_key af_kern_callback_keys[AF_MAX];
>  
> -/* Take into consideration the size of the struct sk_buff overhead in the
> - * determination of these values, since that is non-constant across
> - * platforms.  This makes socket queueing behavior and performance
> - * not depend upon such differences.
> - */
> -#define _SK_MEM_PACKETS		256
> -#define _SK_MEM_OVERHEAD	SKB_TRUESIZE(256)
> -#define SK_WMEM_MAX		(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
> -#define SK_RMEM_MAX		(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
> -
>  /* Run time adjustable parameters. */
>  __u32 sysctl_wmem_max __read_mostly = SK_WMEM_MAX;
>  EXPORT_SYMBOL(sysctl_wmem_max);
> diff --git a/net/decnet/dn_neigh.c b/net/decnet/dn_neigh.c
> index 21dedf6fd0f76dec22b2b3685beb89cfefea7ded..22bf0b95d6edc3c27ef3a99d27cb70a1551e3e0e 100644
> --- a/net/decnet/dn_neigh.c
> +++ b/net/decnet/dn_neigh.c
> @@ -94,7 +94,7 @@ struct neigh_table dn_neigh_table = {
>  			[NEIGH_VAR_BASE_REACHABLE_TIME] = 30 * HZ,
>  			[NEIGH_VAR_DELAY_PROBE_TIME] = 5 * HZ,
>  			[NEIGH_VAR_GC_STALETIME] = 60 * HZ,
> -			[NEIGH_VAR_QUEUE_LEN_BYTES] = 64*1024,
> +			[NEIGH_VAR_QUEUE_LEN_BYTES] = SK_WMEM_MAX,
>  			[NEIGH_VAR_PROXY_QLEN] = 0,
>  			[NEIGH_VAR_ANYCAST_DELAY] = 0,
>  			[NEIGH_VAR_PROXY_DELAY] = 0,
> diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
> index 8b52179ddc6e54eabf6d3c2ed0132083228680bb..7c45b8896709815c5dde5972fd57cb5c3bcb2648 100644
> --- a/net/ipv4/arp.c
> +++ b/net/ipv4/arp.c
> @@ -171,7 +171,7 @@ struct neigh_table arp_tbl = {
>  			[NEIGH_VAR_BASE_REACHABLE_TIME] = 30 * HZ,
>  			[NEIGH_VAR_DELAY_PROBE_TIME] = 5 * HZ,
>  			[NEIGH_VAR_GC_STALETIME] = 60 * HZ,
> -			[NEIGH_VAR_QUEUE_LEN_BYTES] = 64 * 1024,
> +			[NEIGH_VAR_QUEUE_LEN_BYTES] = SK_WMEM_MAX,
>  			[NEIGH_VAR_PROXY_QLEN] = 64,
>  			[NEIGH_VAR_ANYCAST_DELAY] = 1 * HZ,
>  			[NEIGH_VAR_PROXY_DELAY]	= (8 * HZ) / 10,
> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> index 5e338eb89509b1df6ebd060f8bd19fcb4b86fe05..266a530414d7be4f1e7be922e465bbab46f7cbac 100644
> --- a/net/ipv6/ndisc.c
> +++ b/net/ipv6/ndisc.c
> @@ -127,7 +127,7 @@ struct neigh_table nd_tbl = {
>  			[NEIGH_VAR_BASE_REACHABLE_TIME] = ND_REACHABLE_TIME,
>  			[NEIGH_VAR_DELAY_PROBE_TIME] = 5 * HZ,
>  			[NEIGH_VAR_GC_STALETIME] = 60 * HZ,
> -			[NEIGH_VAR_QUEUE_LEN_BYTES] = 64 * 1024,
> +			[NEIGH_VAR_QUEUE_LEN_BYTES] = SK_WMEM_MAX,
>  			[NEIGH_VAR_PROXY_QLEN] = 64,
>  			[NEIGH_VAR_ANYCAST_DELAY] = 1 * HZ,
>  			[NEIGH_VAR_PROXY_DELAY] = (8 * HZ) / 10,
> 
> 

-- 
Florian

^ permalink raw reply
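
The numbers in this message can be cross-checked against the SK_WMEM_MAX definition quoted in the patch (SKB_TRUESIZE(256) times 256 packets). This is a small arithmetic sketch; the DEMO_ and demo_ names are hypothetical, and the 640-byte per-packet overhead used below is not stated anywhere in the thread: it is only what the reported value of 229376 implies for this particular platform.

```c
#include <assert.h>

/* Mirrors _SK_MEM_PACKETS from the quoted patch. */
#define DEMO_SK_MEM_PACKETS 256u

/* Per-packet truesize implied by a given SK_WMEM_MAX value. */
static unsigned int demo_implied_truesize(unsigned int wmem_max)
{
	return wmem_max / DEMO_SK_MEM_PACKETS;
}

/* SK_WMEM_MAX for an assumed payload size and per-skb overhead. */
static unsigned int demo_wmem_max(unsigned int payload, unsigned int overhead)
{
	return (payload + overhead) * DEMO_SK_MEM_PACKETS;
}

/* Convert a byte count to whole KiB. */
static unsigned int demo_bytes_to_kib(unsigned int bytes)
{
	return bytes / 1024u;
}
```

With the reported unres_qlen_bytes of 229376, demo_implied_truesize() gives 896 bytes per packet, i.e. 256 bytes of payload plus 640 bytes of overhead, and demo_bytes_to_kib() gives 224, which matches the "UDP buffer size: 224 KByte (default)" line printed by iperf.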

* Re: [PATCH net 0/4] xfrm_user info leaks
From: Mathias Krause @ 2017-08-26 19:56 UTC (permalink / raw)
  To: Joe Perches; +Cc: Steffen Klassert, David S. Miller, Herbert Xu, netdev
In-Reply-To: <1503763138.12569.45.camel@perches.com>

On 26 August 2017 at 17:58, Joe Perches <joe@perches.com> wrote:
> On Sat, 2017-08-26 at 17:08 +0200, Mathias Krause wrote:
>> Hi David, Steffen,
>>
>> the following series fixes a few info leaks due to missing padding byte
>> initialization in the xfrm_user netlink interface.
>
> Were these found by inspection or by some tool?
> If by tool, perhaps there are other _to_user cases?

I found the one in the offload API by manual inspection, looked around
a little and found the others. No tool involved.

I already looked at the xfrm_user API back in 2012 and fixed quite a
few info leaks but missed the ones in the netlink multicast
notification code :/

Regards,
Mathias

^ permalink raw reply

* [PATCH] net: ethernet: broadcom: Remove null check before kfree
From: Himanshu Jha @ 2017-08-26 20:17 UTC (permalink / raw)
  To: davem; +Cc: jarod, netdev, linux-kernel, Himanshu Jha

kfree() on a NULL pointer is a no-op, so checking for NULL first is redundant.

Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
---
 drivers/net/ethernet/broadcom/sb1250-mac.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/sb1250-mac.c b/drivers/net/ethernet/broadcom/sb1250-mac.c
index 16a0f19..ecdef42 100644
--- a/drivers/net/ethernet/broadcom/sb1250-mac.c
+++ b/drivers/net/ethernet/broadcom/sb1250-mac.c
@@ -1367,15 +1367,11 @@ static int sbmac_initctx(struct sbmac_softc *s)
 
 static void sbdma_uninitctx(struct sbmacdma *d)
 {
-	if (d->sbdma_dscrtable_unaligned) {
-		kfree(d->sbdma_dscrtable_unaligned);
-		d->sbdma_dscrtable_unaligned = d->sbdma_dscrtable = NULL;
-	}
+	kfree(d->sbdma_dscrtable_unaligned);
+	d->sbdma_dscrtable_unaligned = d->sbdma_dscrtable = NULL;
 
-	if (d->sbdma_ctxtable) {
-		kfree(d->sbdma_ctxtable);
-		d->sbdma_ctxtable = NULL;
-	}
+	kfree(d->sbdma_ctxtable);
+	d->sbdma_ctxtable = NULL;
 }
 
 
-- 
2.7.4

^ permalink raw reply related
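
The same simplification can be illustrated in userspace, where the C standard likewise defines free(NULL) as a no-op. This is only an analogy for kfree(): struct demo_dma and demo_uninit() are hypothetical names, loosely modelled on the fields touched by the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's DMA context. */
struct demo_dma {
	void *table_unaligned;	/* pointer returned by the allocator */
	void *table;		/* aligned alias into table_unaligned */
	void *ctxtable;
};

/* Release both tables; safe on a partially initialised context. */
static void demo_uninit(struct demo_dma *d)
{
	/* free(NULL) does nothing, so no NULL checks are needed. */
	free(d->table_unaligned);
	d->table_unaligned = d->table = NULL;

	free(d->ctxtable);
	d->ctxtable = NULL;
}
```

Because every pointer is reset to NULL after being freed, calling demo_uninit() twice on the same context is harmless, which is the property the original NULL checks were guarding.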

* [PATCH RFC WIP 5/5] net: dsa: Don't include CPU port when adding MDB to a port
From: Andrew Lunn @ 2017-08-26 20:56 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, nikolay, roopa,
	bridge, jiri
In-Reply-To: <1503780970-10312-1-git-send-email-andrew@lunn.ch>

Now that the MDB are explicitly added to the CPU port when required,
don't add the CPU port when adding an MDB to a switch port.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 net/dsa/switch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index 97e2e9c8cf3f..c178e2b86a9a 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -130,7 +130,7 @@ static int dsa_switch_mdb_add(struct dsa_switch *ds,
 	if (ds->index == info->sw_index)
 		set_bit(info->port, group);
 	for (port = 0; port < ds->num_ports; port++)
-		if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
+		if (dsa_is_dsa_port(ds, port))
 			set_bit(port, group);
 
 	if (switchdev_trans_ph_prepare(trans)) {
-- 
2.14.1

^ permalink raw reply related
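
The effect of the one-line change above on the port group can be sketched with a plain bitmask. This is a simplified userspace model, not the DSA code: enum demo_port_type and demo_mdb_group() are hypothetical, and an unsigned long replaces the kernel bitmap that set_bit() operates on.

```c
#include <assert.h>

/* Hypothetical port roles; the kernel derives these per switch port. */
enum demo_port_type { DEMO_PORT_USER, DEMO_PORT_CPU, DEMO_PORT_DSA };

/*
 * Build the group of ports an MDB entry is programmed on.
 * target_port is the user port being added, or -1 when the target
 * lives on another switch in the tree.
 */
static unsigned long demo_mdb_group(const enum demo_port_type *types,
				    unsigned int num_ports, int target_port)
{
	unsigned long group = 0;
	unsigned int port;

	assert(num_ports <= 8 * sizeof(group));

	if (target_port >= 0)
		group |= 1UL << target_port;

	/* After the patch only inter-switch (DSA) links are added
	 * implicitly; the CPU port must be requested explicitly. */
	for (port = 0; port < num_ports; port++)
		if (types[port] == DEMO_PORT_DSA)
			group |= 1UL << port;

	return group;
}
```

For a four-port switch laid out as user, user, CPU, DSA, adding an entry on port 1 yields a group with bits 1 and 3 set, and the CPU port (bit 2) stays clear.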

* [PATCH RFC WIP 0/5] IGMP snooping for local traffic
From: Andrew Lunn @ 2017-08-26 20:56 UTC (permalink / raw)
  To: netdev
  Cc: Vivien Didelot, Florian Fainelli, nikolay, jiri, roopa, stephen,
	bridge, Andrew Lunn

This is a WIP patchset I would like comments on from bridge, switchdev
and hardware offload people.

The linux bridge supports IGMP snooping. It will listen to IGMP
reports on bridge ports and keep track of which groups have been
joined on an interface. It will then forward multicast based on this
group membership.

When the bridge adds or removes groups from an interface, it uses
switchdev to request the hardware add an mdb to a port, so the
hardware can perform the selective forwarding between ports.

What is not covered by the current bridge code is IGMP joins/leaves
from the host on the brX interface. No such monitoring is
performed. With a pure software bridge, it is not required. All
multicast frames are passed to the brX interface, and the network
stack filters them, as it does for any interface. However, when
hardware offload is involved, things change. We should program the
hardware to only send multicast packets to the host when the host has
an interest in them.

Thus we need to perform IGMP snooping on the brX interface, just like
any other interface of the bridge. However, currently the brX
interface is missing all the needed data structures to do this. There
is no net_bridge_port structure for the brX interface. This structure
is created when an interface is added to the bridge. But the brX
interface is not a member of the bridge. So this patchset makes the
brX interface a first class member of the bridge. When the brX
interface is opened, the interface is added to the bridge. A
net_bridge_port is allocated for it, and IGMP snooping is performed as
usual.

There are some complexities here. Some assumptions are broken, like
the master interface of a port interface is the bridge interface. The
brX interface cannot be its own master. The use of
netdev_master_upper_dev_get() within the bridge code has been changed
to reflect this. The bridge receive handler needs to not process
frames for the brX interface, etc.

The interface downward to the hardware is also an issue. The code
presented here is a hack and needs to change. But that is secondary
and can be solved once it is agreed how the bridge needs to change to
support this use case.

Comments welcome and wanted.

	Andrew

Andrew Lunn (5):
  net: rtnetlink: Handle bridge port without upper device
  net: bridge: Skip receive handler on brX interface
  net: bridge: Make the brX interface a member of the bridge
  net: dsa: HACK: Handle MDB add/remove for none-switch ports
  net: dsa: Don't include CPU port when adding MDB to a port

 include/linux/if_bridge.h |  1 +
 net/bridge/br_device.c    | 12 ++++++++++--
 net/bridge/br_if.c        | 37 ++++++++++++++++++++++++-------------
 net/bridge/br_input.c     |  4 ++++
 net/bridge/br_mdb.c       |  2 --
 net/bridge/br_multicast.c |  7 ++++---
 net/bridge/br_private.h   |  1 +
 net/core/rtnetlink.c      | 23 +++++++++++++++++++++--
 net/dsa/port.c            | 19 +++++++++++++++++--
 net/dsa/switch.c          |  2 +-
 10 files changed, 83 insertions(+), 25 deletions(-)

-- 
2.14.1

^ permalink raw reply
