Netdev List
* Re: [PATCH 3/3] riscv: dts: Add DT node for SiFive FU540 Ethernet controller driver
From: Sagar Kadam @ 2019-07-19 11:53 UTC
  To: Yash Shah
  Cc: davem, Rob Herring, Paul Walmsley, netdev, devicetree,
	Linux Kernel Mailing List, linux-riscv, Mark Rutland, Albert Ou,
	Palmer Dabbelt, nicolas.ferre, Sachin Ghadi, ynezz
In-Reply-To: <1563534631-15897-3-git-send-email-yash.shah@sifive.com>

The series looks good to me.

Reviewed-by: Sagar Kadam <sagar.kadam@sifive.com>


* [PATCH 3/3] riscv: dts: Add DT node for SiFive FU540 Ethernet controller driver
From: Yash Shah @ 2019-07-19 11:10 UTC
  To: davem, robh+dt, paul.walmsley, netdev, devicetree, linux-kernel,
	linux-riscv
  Cc: mark.rutland, palmer, aou, nicolas.ferre, ynezz, sachin.ghadi,
	Yash Shah
In-Reply-To: <1563534631-15897-1-git-send-email-yash.shah@sifive.com>

Add the DT node for the SiFive FU540-C000 GEMGXL Ethernet controller.

Signed-off-by: Yash Shah <yash.shah@sifive.com>
---
 arch/riscv/boot/dts/sifive/fu540-c000.dtsi          | 15 +++++++++++++++
 arch/riscv/boot/dts/sifive/hifive-unleashed-a00.dts |  9 +++++++++
 2 files changed, 24 insertions(+)

diff --git a/arch/riscv/boot/dts/sifive/fu540-c000.dtsi b/arch/riscv/boot/dts/sifive/fu540-c000.dtsi
index cc73522..588669f0 100644
--- a/arch/riscv/boot/dts/sifive/fu540-c000.dtsi
+++ b/arch/riscv/boot/dts/sifive/fu540-c000.dtsi
@@ -231,5 +231,20 @@
 			#size-cells = <0>;
 			status = "disabled";
 		};
+		eth0: ethernet@10090000 {
+			compatible = "sifive,fu540-c000-gem";
+			interrupt-parent = <&plic0>;
+			interrupts = <53>;
+			reg = <0x0 0x10090000 0x0 0x2000
+			       0x0 0x100a0000 0x0 0x1000>;
+			local-mac-address = [00 00 00 00 00 00];
+			clock-names = "pclk", "hclk";
+			clocks = <&prci PRCI_CLK_GEMGXLPLL>,
+				 <&prci PRCI_CLK_GEMGXLPLL>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+			status = "disabled";
+		};
+
 	};
 };
diff --git a/arch/riscv/boot/dts/sifive/hifive-unleashed-a00.dts b/arch/riscv/boot/dts/sifive/hifive-unleashed-a00.dts
index 0b55c53..85c17a7 100644
--- a/arch/riscv/boot/dts/sifive/hifive-unleashed-a00.dts
+++ b/arch/riscv/boot/dts/sifive/hifive-unleashed-a00.dts
@@ -76,3 +76,12 @@
 		disable-wp;
 	};
 };
+
+&eth0 {
+	status = "okay";
+	phy-mode = "gmii";
+	phy-handle = <&phy1>;
+	phy1: ethernet-phy@0 {
+		reg = <0>;
+	};
+};
-- 
1.9.1
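
The reg property in the node above packs two ranges into one cell array: the GEM register window at 0x10090000 and the GEMGXL management block at 0x100a0000, which the binding in patch 1/3 requires as a second range. A minimal C sketch of splitting such a cell array into (base, size) pairs follows. It assumes the parent bus uses #address-cells = <2> and #size-cells = <2> (inferred from the four-cells-per-entry layout, not shown in the hunk) and that the cells are already converted from the FDT's big-endian encoding to host order. struct reg_range and decode_reg() are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded "reg" entry (hypothetical type). */
struct reg_range {
	uint64_t base;
	uint64_t size;
};

/* Join up to two 32-bit cells, most significant first, into one value. */
static uint64_t cells_to_u64(const uint32_t *cells, int count)
{
	uint64_t v = 0;

	for (int i = 0; i < count; i++)
		v = (v << 32) | cells[i];
	return v;
}

/*
 * Split a flat "reg" cell array (already in host byte order) into ranges.
 * Returns the number of ranges written to out, or -1 if ncells is not a
 * whole number of entries. Entries beyond max_out are ignored.
 */
static int decode_reg(const uint32_t *cells, size_t ncells,
		      int addr_cells, int size_cells,
		      struct reg_range *out, size_t max_out)
{
	size_t entry, n = 0;

	assert(addr_cells >= 1 && addr_cells <= 2);
	assert(size_cells >= 0 && size_cells <= 2);
	entry = (size_t)(addr_cells + size_cells);
	if (ncells % entry != 0)
		return -1;
	for (size_t i = 0; i < ncells && n < max_out; i += entry, n++) {
		out[n].base = cells_to_u64(&cells[i], addr_cells);
		out[n].size = cells_to_u64(&cells[i + addr_cells], size_cells);
	}
	return (int)n;
}

/* The cells of the ethernet@10090000 "reg" property from the patch. */
static const uint32_t gem_reg[] = {
	0x0, 0x10090000, 0x0, 0x2000,	/* GEM register set */
	0x0, 0x100a0000, 0x0, 0x1000,	/* GEMGXL management block */
};
```

In the kernel a driver would normally receive these ranges already translated, for example through platform_get_resource(), rather than decoding the cells itself.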



* [PATCH 2/3] macb: Update compatibility string for SiFive FU540-C000
From: Yash Shah @ 2019-07-19 11:10 UTC
  To: davem, robh+dt, paul.walmsley, netdev, devicetree, linux-kernel,
	linux-riscv
  Cc: mark.rutland, palmer, aou, nicolas.ferre, ynezz, sachin.ghadi,
	Yash Shah
In-Reply-To: <1563534631-15897-1-git-send-email-yash.shah@sifive.com>

Update the compatible string for the SiFive FU540-C000 to match the new
string in the updated binding document.
Reference: https://lkml.org/lkml/2019/7/17/200

Signed-off-by: Yash Shah <yash.shah@sifive.com>
---
 drivers/net/ethernet/cadence/macb_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 15d0737..305371c 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -4112,7 +4112,7 @@ static int fu540_c000_init(struct platform_device *pdev)
 	{ .compatible = "cdns,emac", .data = &emac_config },
 	{ .compatible = "cdns,zynqmp-gem", .data = &zynqmp_config},
 	{ .compatible = "cdns,zynq-gem", .data = &zynq_config },
-	{ .compatible = "sifive,fu540-macb", .data = &fu540_c000_config },
+	{ .compatible = "sifive,fu540-c000-gem", .data = &fu540_c000_config },
 	{ /* sentinel */ }
 };
 MODULE_DEVICE_TABLE(of, macb_dt_ids);
-- 
1.9.1
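
Patch 2/3 only changes one string in macb_dt_ids[], but that string is what binds the driver to the node added in patch 3/3. A simplified C sketch of a sentinel-terminated compatible table with an exact-match lookup follows. struct compat_entry, the *_cfg placeholders and match_compatible() are hypothetical stand-ins for struct of_device_id and the kernel's OF matching, which is more involved (a node may list several compatible strings, for instance).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct of_device_id (hypothetical). */
struct compat_entry {
	const char *compatible;
	const void *data;
};

/* Placeholder per-SoC configs; the real driver points at its own structs. */
static const int emac_cfg, zynqmp_cfg, zynq_cfg, fu540_cfg;

static const struct compat_entry dt_ids[] = {
	{ "cdns,emac", &emac_cfg },
	{ "cdns,zynqmp-gem", &zynqmp_cfg },
	{ "cdns,zynq-gem", &zynq_cfg },
	{ "sifive,fu540-c000-gem", &fu540_cfg },
	{ NULL, NULL }	/* sentinel: terminates the table */
};

/* Return the data of the first exact match, or NULL if nothing matches. */
static const void *match_compatible(const struct compat_entry *tbl,
				    const char *compatible)
{
	assert(compatible != NULL);
	for (; tbl->compatible; tbl++)
		if (strcmp(tbl->compatible, compatible) == 0)
			return tbl->data;
	return NULL;
}
```

With the table updated, a device tree that still says "sifive,fu540-macb" no longer matches, which is presumably why the binding, driver and DT changes are sent as one series.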



* [PATCH 1/3] macb: bindings doc: update sifive fu540-c000 binding
From: Yash Shah @ 2019-07-19 11:10 UTC
  To: davem, robh+dt, paul.walmsley, netdev, devicetree, linux-kernel,
	linux-riscv
  Cc: mark.rutland, palmer, aou, nicolas.ferre, ynezz, sachin.ghadi,
	Yash Shah

As per the discussion with Nicolas Ferre, change the compatible string
to a more appropriate and specific one.
LINK: https://lkml.org/lkml/2019/7/17/200

Signed-off-by: Yash Shah <yash.shah@sifive.com>
---
 Documentation/devicetree/bindings/net/macb.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt
index 63c73fa..0b61a90 100644
--- a/Documentation/devicetree/bindings/net/macb.txt
+++ b/Documentation/devicetree/bindings/net/macb.txt
@@ -15,10 +15,10 @@ Required properties:
   Use "atmel,sama5d4-gem" for the GEM IP (10/100) available on Atmel sama5d4 SoCs.
   Use "cdns,zynq-gem" Xilinx Zynq-7xxx SoC.
   Use "cdns,zynqmp-gem" for Zynq Ultrascale+ MPSoC.
-  Use "sifive,fu540-macb" for SiFive FU540-C000 SoC.
+  Use "sifive,fu540-c000-gem" for SiFive FU540-C000 SoC.
   Or the generic form: "cdns,emac".
 - reg: Address and length of the register set for the device
-	For "sifive,fu540-macb", second range is required to specify the
+	For "sifive,fu540-c000-gem", second range is required to specify the
 	address and length of the registers for GEMGXL Management block.
 - interrupts: Should contain macb interrupt
 - phy-mode: See ethernet.txt file in the same directory.
-- 
1.9.1
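
The binding text makes the second reg range mandatory only for the FU540-C000 compatible. A small C sketch of that rule as a check follows. macb_min_reg_ranges() and macb_check_reg() are hypothetical helpers written for illustration, not part of the macb driver, and the example cell counts assume #address-cells = <2> and #size-cells = <2> as in the FU540 node.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimum number of "reg" ranges the binding asks for, per compatible. */
static size_t macb_min_reg_ranges(const char *compatible)
{
	/* FU540-C000: the second range is the GEMGXL management block. */
	if (strcmp(compatible, "sifive,fu540-c000-gem") == 0)
		return 2;
	return 1;
}

/*
 * Return 0 if a "reg" property of reg_cells cells holds enough whole
 * (address, size) entries for the given compatible, -1 otherwise.
 */
static int macb_check_reg(const char *compatible, size_t reg_cells,
			  int addr_cells, int size_cells)
{
	size_t entry;

	assert(addr_cells > 0 && size_cells >= 0);
	entry = (size_t)(addr_cells + size_cells);
	if (reg_cells % entry != 0)
		return -1;
	return reg_cells / entry >= macb_min_reg_ranges(compatible) ? 0 : -1;
}
```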



* [patch iproute2 rfc 2/2] ip: allow to use alternative names as handle
From: Jiri Pirko @ 2019-07-19 11:03 UTC
  To: netdev
  Cc: davem, jakub.kicinski, sthemmin, dsahern, dcbw, mkubecek, andrew,
	parav, saeedm, mlxsw
In-Reply-To: <20190719110029.29466-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 ip/iplink.c  |  5 +++--
 lib/ll_map.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/ip/iplink.c b/ip/iplink.c
index 45f975f1dce9..ad1e67761dd8 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -929,7 +929,7 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req, char **type)
 				NEXT_ARG();
 			if (dev != name)
 				duparg2("dev", *argv);
-			if (check_ifname(*argv))
+			if (check_altifname(*argv))
 				invarg("\"dev\" not a valid ifname", *argv);
 			dev = *argv;
 		}
@@ -1104,7 +1104,8 @@ int iplink_get(char *name, __u32 filt_mask)
 
 	if (name) {
 		addattr_l(&req.n, sizeof(req),
-			  IFLA_IFNAME, name, strlen(name) + 1);
+			  check_ifname(name) ? IFLA_IFNAME : IFLA_ALT_IFNAME,
+			  name, strlen(name) + 1);
 	}
 	addattr32(&req.n, sizeof(req), IFLA_EXT_MASK, filt_mask);
 
diff --git a/lib/ll_map.c b/lib/ll_map.c
index e0ed54bf77c9..04dfb0f2320b 100644
--- a/lib/ll_map.c
+++ b/lib/ll_map.c
@@ -70,7 +70,7 @@ static struct ll_cache *ll_get_by_name(const char *name)
 		struct ll_cache *im
 			= container_of(n, struct ll_cache, name_hash);
 
-		if (strncmp(im->name, name, IFNAMSIZ) == 0)
+		if (strcmp(im->name, name) == 0)
 			return im;
 	}
 
@@ -240,6 +240,43 @@ int ll_index_to_flags(unsigned idx)
 	return im ? im->flags : -1;
 }
 
+static int altnametoindex(const char *name)
+{
+	struct {
+		struct nlmsghdr		n;
+		struct ifinfomsg	ifm;
+		char			buf[1024];
+	} req = {
+		.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+		.n.nlmsg_flags = NLM_F_REQUEST,
+		.n.nlmsg_type = RTM_GETLINK,
+	};
+	struct rtnl_handle rth = {};
+	struct nlmsghdr *answer;
+	struct ifinfomsg *ifm;
+	int rc = 0;
+
+	if (rtnl_open(&rth, 0) < 0)
+		return 0;
+
+	addattr32(&req.n, sizeof(req), IFLA_EXT_MASK,
+		  RTEXT_FILTER_VF | RTEXT_FILTER_SKIP_STATS);
+	addattr_l(&req.n, sizeof(req), IFLA_ALT_IFNAME, name, strlen(name) + 1);
+
+	if (rtnl_talk_suppress_rtnl_errmsg(&rth, &req.n, &answer) < 0)
+		goto out;
+
+	ifm = NLMSG_DATA(answer);
+	rc = ifm->ifi_index;
+
+	free(answer);
+
+	rtnl_close(&rth);
+out:
+	return rc;
+}
+
+
 unsigned ll_name_to_index(const char *name)
 {
 	const struct ll_cache *im;
@@ -257,6 +294,8 @@ unsigned ll_name_to_index(const char *name)
 		idx = if_nametoindex(name);
 	if (idx == 0)
 		idx = ll_idx_a2n(name);
+	if (idx == 0)
+		idx = altnametoindex(name);
 	return idx;
 }
 
-- 
2.21.0
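
Two details in this patch are easy to miss. First, ll_get_by_name() switches from strncmp(..., IFNAMSIZ) to strcmp() because alternative names may be longer than IFNAMSIZ - 1 characters, and a bounded compare treats two long names that share their first 16 bytes as equal. Second, iplink_get() picks the netlink attribute from the name: a name that is a valid ifname presumably belongs in IFLA_IFNAME, a longer one in IFLA_ALT_IFNAME. The C sketch below illustrates both; the helper names and the enum are hypothetical. Note that check_ifname() returns 0 for a valid name, so the ternary in the iplink_get() hunk, as posted, appears to select IFLA_ALT_IFNAME for valid names, the reverse of what the sketch does.

```c
#include <assert.h>
#include <string.h>

#ifndef IFNAMSIZ
#define IFNAMSIZ 16	/* same value as in <linux/if.h> */
#endif

/* Old ll_get_by_name() comparison: only the first IFNAMSIZ bytes count. */
static int name_eq_bounded(const char *a, const char *b)
{
	return strncmp(a, b, IFNAMSIZ) == 0;
}

/* New comparison: the whole string counts, as long alt names require. */
static int name_eq_full(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}

/* Hypothetical stand-ins for IFLA_IFNAME and IFLA_ALT_IFNAME. */
enum name_attr { ATTR_IFNAME, ATTR_ALT_IFNAME };

/* Names that fit an IFNAMSIZ buffer (with the NUL) use the classic attribute. */
static enum name_attr pick_name_attr(const char *name)
{
	return strlen(name) < IFNAMSIZ ? ATTR_IFNAME : ATTR_ALT_IFNAME;
}
```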



* [patch iproute2 rfc 1/2] ip: add support for alternative name addition/deletion/list
From: Jiri Pirko @ 2019-07-19 11:03 UTC
  To: netdev
  Cc: davem, jakub.kicinski, sthemmin, dsahern, dcbw, mkubecek, andrew,
	parav, saeedm, mlxsw
In-Reply-To: <20190719110029.29466-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/uapi/linux/if_link.h   |  3 ++
 include/uapi/linux/rtnetlink.h |  7 +++
 include/utils.h                |  1 +
 ip/ipaddress.c                 | 14 ++++++
 ip/iplink.c                    | 81 ++++++++++++++++++++++++++++++++++
 lib/utils.c                    | 19 +++++---
 6 files changed, 120 insertions(+), 5 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index d36919fb4024..2386a3e94082 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -167,6 +167,9 @@ enum {
 	IFLA_NEW_IFINDEX,
 	IFLA_MIN_MTU,
 	IFLA_MAX_MTU,
+	IFLA_ALT_IFNAME_MOD, /* Alternative ifname to add/delete */
+	IFLA_ALT_IFNAME_LIST, /* nest */
+	IFLA_ALT_IFNAME, /* Alternative ifname */
 	__IFLA_MAX
 };
 
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 358e83ee134d..38a4f5b55e17 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -164,6 +164,13 @@ enum {
 	RTM_GETNEXTHOP,
 #define RTM_GETNEXTHOP	RTM_GETNEXTHOP
 
+	RTM_NEWALTIFNAME = 108,
+#define RTM_NEWALTIFNAME	RTM_NEWALTIFNAME
+	RTM_DELALTIFNAME,
+#define RTM_DELALTIFNAME	RTM_DELALTIFNAME
+	RTM_GETALTIFNAME,
+#define RTM_GETALTIFNAME	RTM_GETALTIFNAME
+
 	__RTM_MAX,
 #define RTM_MAX		(((__RTM_MAX + 3) & ~3) - 1)
 };
diff --git a/include/utils.h b/include/utils.h
index 794d36053634..2dd6443fc6ff 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -196,6 +196,7 @@ void duparg(const char *, const char *) __attribute__((noreturn));
 void duparg2(const char *, const char *) __attribute__((noreturn));
 int nodev(const char *dev);
 int check_ifname(const char *);
+int check_altifname(const char *);
 int get_ifname(char *, const char *);
 const char *get_ifname_rta(int ifindex, const struct rtattr *rta);
 bool matches(const char *prefix, const char *string);
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index bc8f5ba13c33..8060161dcf1a 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -1139,6 +1139,20 @@ int print_linkinfo(struct nlmsghdr *n, void *arg)
 		close_json_array(PRINT_JSON, NULL);
 	}
 
+	if (tb[IFLA_ALT_IFNAME_LIST]) {
+		struct rtattr *i, *alt_ifname_list = tb[IFLA_ALT_IFNAME_LIST];
+		int rem = RTA_PAYLOAD(alt_ifname_list);
+
+		open_json_array(PRINT_JSON, "altifnames");
+		for (i = RTA_DATA(alt_ifname_list); RTA_OK(i, rem);
+		     i = RTA_NEXT(i, rem)) {
+			print_nl();
+			print_string(PRINT_ANY, NULL,
+				     "    altname %s", rta_getattr_str(i));
+		}
+		close_json_array(PRINT_JSON, NULL);
+	}
+
 	print_string(PRINT_FP, NULL, "%s", "\n");
 	fflush(fp);
 	return 1;
diff --git a/ip/iplink.c b/ip/iplink.c
index 212a088535da..45f975f1dce9 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -1617,6 +1617,84 @@ static int iplink_afstats(int argc, char **argv)
 	return 0;
 }
 
+static int iplink_altname_mod(int argc, char **argv, struct iplink_req *req)
+{
+	char *name = NULL;
+	char *dev = NULL;
+
+	while (argc > 0) {
+		if (matches(*argv, "name") == 0) {
+			NEXT_ARG();
+			if (check_altifname(*argv))
+				invarg("not a valid altifname", *argv);
+			name = *argv;
+		} else if (matches(*argv, "help") == 0) {
+			usage();
+		} else {
+			if (strcmp(*argv, "dev") == 0)
+				NEXT_ARG();
+			if (dev)
+				duparg2("dev", *argv);
+			if (check_altifname(*argv))
+				invarg("\"dev\" not a valid altifname", *argv);
+			dev = *argv;
+		}
+		argv++; argc--;
+	}
+
+	if (!dev) {
+		fprintf(stderr, "Not enough of information: \"dev\" argument is required.\n");
+		exit(-1);
+	}
+
+	if (!name) {
+		if (req->n.nlmsg_type != RTM_NEWALTIFNAME) {
+			name = dev;
+		} else {
+			fprintf(stderr, "Not enough of information: \"name\" argument is required.\n");
+			exit(-1);
+		}
+	}
+
+	addattr_l(&req->n, sizeof(*req), IFLA_ALT_IFNAME,
+		  dev, strlen(dev) + 1);
+	addattr_l(&req->n, sizeof(*req), IFLA_ALT_IFNAME_MOD,
+		  name, strlen(name) + 1);
+
+	if (rtnl_talk(&rth, &req->n, NULL) < 0)
+		return -2;
+
+	return 0;
+}
+
+static int iplink_altname(int argc, char **argv)
+{
+	struct iplink_req req = {
+		.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+		.n.nlmsg_flags = NLM_F_REQUEST,
+		.i.ifi_family = preferred_family,
+	};
+
+	if (argc <= 0) {
+		usage();
+		exit(-1);
+	}
+
+	if (matches(*argv, "add") == 0) {
+		req.n.nlmsg_flags |= NLM_F_EXCL | NLM_F_CREATE | NLM_F_APPEND;
+		req.n.nlmsg_type = RTM_NEWALTIFNAME;
+	} else if (matches(*argv, "del") == 0) {
+		req.n.nlmsg_flags |= RTM_DELLINK;
+		req.n.nlmsg_type = RTM_DELALTIFNAME;
+	} else if (matches(*argv, "help") == 0) {
+		usage();
+	} else {
+		fprintf(stderr, "Operator required\n");
+		exit(-1);
+	}
+	return iplink_altname_mod(argc - 1, argv + 1, &req);
+}
+
 static void do_help(int argc, char **argv)
 {
 	struct link_util *lu = NULL;
@@ -1674,6 +1752,9 @@ int do_iplink(int argc, char **argv)
 		return 0;
 	}
 
+	if (matches(*argv, "altname") == 0)
+		return iplink_altname(argc-1, argv+1);
+
 	if (matches(*argv, "help") == 0) {
 		do_help(argc-1, argv+1);
 		return 0;
diff --git a/lib/utils.c b/lib/utils.c
index 9ea21fa16503..80e025e96a5a 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -824,14 +824,10 @@ int nodev(const char *dev)
 	return -1;
 }
 
-int check_ifname(const char *name)
+static int __check_ifname(const char *name)
 {
-	/* These checks mimic kernel checks in dev_valid_name */
 	if (*name == '\0')
 		return -1;
-	if (strlen(name) >= IFNAMSIZ)
-		return -1;
-
 	while (*name) {
 		if (*name == '/' || isspace(*name))
 			return -1;
@@ -840,6 +836,19 @@ int check_ifname(const char *name)
 	return 0;
 }
 
+int check_ifname(const char *name)
+{
+	/* These checks mimic kernel checks in dev_valid_name */
+	if (strlen(name) >= IFNAMSIZ)
+		return -1;
+	return __check_ifname(name);
+}
+
+int check_altifname(const char *name)
+{
+	return __check_ifname(name);
+}
+
 /* buf is assumed to be IFNAMSIZ */
 int get_ifname(char *buf, const char *name)
 {
-- 
2.21.0
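
The lib/utils.c change splits the old check_ifname() into shared character rules plus an IFNAMSIZ length limit that only primary names keep; check_altifname() drops the length limit and leaves bounding to the kernel, whose netlink policy in this series caps alternative names at ALTIFNAMSIZ - 1 (127) characters. A self-contained C sketch of the same split follows. The _sketch suffixes mark hypothetical names, and the cast to unsigned char before isspace() is an addition that avoids undefined behaviour for negative char values.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#ifndef IFNAMSIZ
#define IFNAMSIZ 16	/* same value as in <linux/if.h> */
#endif

/* Rules shared by both kinds of name: non-empty, no '/', no whitespace. */
static int check_name_chars(const char *name)
{
	assert(name != NULL);
	if (*name == '\0')
		return -1;
	for (; *name; name++)
		if (*name == '/' || isspace((unsigned char)*name))
			return -1;
	return 0;
}

/* Primary ifname: must also fit an IFNAMSIZ buffer including the NUL. */
static int check_ifname_sketch(const char *name)
{
	if (strlen(name) >= IFNAMSIZ)
		return -1;
	return check_name_chars(name);
}

/* Alternative name: no length limit here; the kernel policy enforces one. */
static int check_altifname_sketch(const char *name)
{
	return check_name_chars(name);
}
```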



* [patch net-next rfc 5/7] net: rtnetlink: unify the code in __rtnl_newlink get dev with the rest
From: Jiri Pirko @ 2019-07-19 11:00 UTC
  To: netdev
  Cc: davem, jakub.kicinski, sthemmin, dsahern, dcbw, mkubecek, andrew,
	parav, saeedm, mlxsw
In-Reply-To: <20190719110029.29466-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

The code flow in __rtnl_newlink() around tb[IFLA_IFNAME] processing
differs slightly from the other places. Change it to be unified with
the rest.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/core/rtnetlink.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index f11a2367037d..8994dc858ae0 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3067,12 +3067,10 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	ifm = nlmsg_data(nlh);
 	if (ifm->ifi_index > 0)
 		dev = __dev_get_by_index(net, ifm->ifi_index);
-	else {
-		if (ifname[0])
-			dev = __dev_get_by_name(net, ifname);
-		else
-			dev = NULL;
-	}
+	else if (tb[IFLA_IFNAME])
+		dev = __dev_get_by_name(net, ifname);
+	else
+		dev = NULL;
 
 	if (dev) {
 		master_dev = netdev_master_upper_dev_get(dev);
-- 
2.21.0



* [patch net-next rfc 7/7] net: rtnetlink: add possibility to use alternative names as message handle
From: Jiri Pirko @ 2019-07-19 11:00 UTC
  To: netdev
  Cc: davem, jakub.kicinski, sthemmin, dsahern, dcbw, mkubecek, andrew,
	parav, saeedm, mlxsw
In-Reply-To: <20190719110029.29466-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Extend the basic rtnetlink commands to accept an alternative interface
name as the device handle, in addition to ifindex and ifname.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/core/rtnetlink.c | 36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1fa30d514e3f..68ad12a7fc4d 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1793,6 +1793,8 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_MAX_MTU]		= { .type = NLA_U32 },
 	[IFLA_ALT_IFNAME_MOD]	= { .type = NLA_STRING,
 				    .len = ALTIFNAMSIZ - 1 },
+	[IFLA_ALT_IFNAME]	= { .type = NLA_STRING,
+				    .len = ALTIFNAMSIZ - 1 },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -2767,14 +2769,17 @@ static int do_setlink(const struct sk_buff *skb,
 
 static struct net_device *rtnl_dev_get(struct net *net,
 				       struct nlattr *ifname_attr,
+				       struct nlattr *altifname_attr,
 				       char *ifname)
 {
-	char buffer[IFNAMSIZ];
+	char buffer[ALTIFNAMSIZ];
 
 	if (!ifname) {
 		ifname = buffer;
 		if (ifname_attr)
 			nla_strlcpy(ifname, ifname_attr, IFNAMSIZ);
+		else if (altifname_attr)
+			nla_strlcpy(ifname, altifname_attr, ALTIFNAMSIZ);
 		else
 			return NULL;
 	}
@@ -2810,8 +2815,8 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	ifm = nlmsg_data(nlh);
 	if (ifm->ifi_index > 0)
 		dev = __dev_get_by_index(net, ifm->ifi_index);
-	else if (tb[IFLA_IFNAME])
-		dev = rtnl_dev_get(net, NULL, ifname);
+	else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME])
+		dev = rtnl_dev_get(net, NULL, tb[IFLA_ALT_IFNAME], ifname);
 	else
 		goto errout;
 
@@ -2908,8 +2913,9 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	ifm = nlmsg_data(nlh);
 	if (ifm->ifi_index > 0)
 		dev = __dev_get_by_index(tgt_net, ifm->ifi_index);
-	else if (tb[IFLA_IFNAME])
-		dev = rtnl_dev_get(net, tb[IFLA_IFNAME], NULL);
+	else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME])
+		dev = rtnl_dev_get(net, tb[IFLA_IFNAME],
+				   tb[IFLA_ALT_IFNAME], NULL);
 	else if (tb[IFLA_GROUP])
 		err = rtnl_group_dellink(tgt_net, nla_get_u32(tb[IFLA_GROUP]));
 	else
@@ -3080,8 +3086,8 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	ifm = nlmsg_data(nlh);
 	if (ifm->ifi_index > 0)
 		dev = __dev_get_by_index(net, ifm->ifi_index);
-	else if (tb[IFLA_IFNAME])
-		dev = rtnl_dev_get(net, NULL, ifname);
+	else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME])
+		dev = rtnl_dev_get(net, NULL, tb[IFLA_ALT_IFNAME], ifname);
 	else
 		dev = NULL;
 
@@ -3345,6 +3351,7 @@ static int rtnl_valid_getlink_req(struct sk_buff *skb,
 
 		switch (i) {
 		case IFLA_IFNAME:
+		case IFLA_ALT_IFNAME:
 		case IFLA_EXT_MASK:
 		case IFLA_TARGET_NETNSID:
 			break;
@@ -3392,8 +3399,9 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	ifm = nlmsg_data(nlh);
 	if (ifm->ifi_index > 0)
 		dev = __dev_get_by_index(tgt_net, ifm->ifi_index);
-	else if (tb[IFLA_IFNAME])
-		dev = rtnl_dev_get(tgt_net, tb[IFLA_IFNAME], NULL);
+	else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME])
+		dev = rtnl_dev_get(tgt_net, tb[IFLA_IFNAME],
+				   tb[IFLA_ALT_IFNAME], NULL);
 	else
 		goto out;
 
@@ -3444,8 +3452,9 @@ static int rtnl_newaltifname(struct sk_buff *skb, struct nlmsghdr *nlh,
 	ifm = nlmsg_data(nlh);
 	if (ifm->ifi_index > 0)
 		dev = __dev_get_by_index(net, ifm->ifi_index);
-	else if (tb[IFLA_IFNAME])
-		dev = rtnl_dev_get(net, tb[IFLA_IFNAME], NULL);
+	else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME])
+		dev = rtnl_dev_get(net, tb[IFLA_IFNAME],
+				   tb[IFLA_ALT_IFNAME], NULL);
 	else
 		return -EINVAL;
 
@@ -3491,8 +3500,9 @@ static int rtnl_delaltifname(struct sk_buff *skb, struct nlmsghdr *nlh,
 	ifm = nlmsg_data(nlh);
 	if (ifm->ifi_index > 0)
 		dev = __dev_get_by_index(net, ifm->ifi_index);
-	else if (tb[IFLA_IFNAME])
-		dev = rtnl_dev_get(net, tb[IFLA_IFNAME], NULL);
+	else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME])
+		dev = rtnl_dev_get(net, tb[IFLA_IFNAME],
+				   tb[IFLA_ALT_IFNAME], NULL);
 	else
 		return -EINVAL;
 
-- 
2.21.0
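
With this patch a device is resolved from, in order of preference, a positive ifi_index (checked by the callers), the IFLA_IFNAME attribute copied into an IFNAMSIZ-bounded buffer, or the IFLA_ALT_IFNAME attribute copied with the larger ALTIFNAMSIZ bound. Both are then passed to __dev_get_by_name(), which is expected to find alternative names as well, since patch 3/7 adds them to the same name hash. A toy user-space C model of that resolution order follows. struct toy_dev, toy_get_by_name() and toy_resolve() are hypothetical; a linear scan stands in for the kernel's name hash, and snprintf() stands in for the truncating copy done by nla_strlcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TOY_IFNAMSIZ	16
#define TOY_ALTIFNAMSIZ	128
#define TOY_MAX_ALT	4

/* Toy device: a primary name plus up to TOY_MAX_ALT alternative names. */
struct toy_dev {
	int ifindex;
	const char *name;
	const char *alt[TOY_MAX_ALT];	/* unused slots are NULL */
};

/* Like a lookup in the shared name hash: primary and alt names both match. */
static struct toy_dev *toy_get_by_name(struct toy_dev *devs, size_t n,
				       const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(devs[i].name, name) == 0)
			return &devs[i];
		for (size_t j = 0; j < TOY_MAX_ALT && devs[i].alt[j]; j++)
			if (strcmp(devs[i].alt[j], name) == 0)
				return &devs[i];
	}
	return NULL;
}

/* Resolution order modelled on the patched handlers and rtnl_dev_get(). */
static struct toy_dev *toy_resolve(struct toy_dev *devs, size_t n, int ifindex,
				   const char *ifname, const char *altname)
{
	char buf[TOY_ALTIFNAMSIZ];

	if (ifindex > 0) {
		for (size_t i = 0; i < n; i++)
			if (devs[i].ifindex == ifindex)
				return &devs[i];
		return NULL;
	}
	if (ifname)			/* IFLA_IFNAME wins if both are present */
		snprintf(buf, TOY_IFNAMSIZ, "%s", ifname);
	else if (altname)
		snprintf(buf, sizeof(buf), "%s", altname);
	else
		return NULL;
	return toy_get_by_name(devs, n, buf);
}
```

In the model, a long name sent through the ifname path is cut to 15 characters and stops matching; in the kernel the IFLA_IFNAME policy (.len = IFNAMSIZ - 1) should reject such a name before it gets that far, which is why the alternative-name path needs both its own attribute and the larger buffer.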



* [patch net-next rfc 6/7] net: rtnetlink: introduce helper to get net_device instance by ifname
From: Jiri Pirko @ 2019-07-19 11:00 UTC
  To: netdev
  Cc: davem, jakub.kicinski, sthemmin, dsahern, dcbw, mkubecek, andrew,
	parav, saeedm, mlxsw
In-Reply-To: <20190719110029.29466-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Introduce a helper function, rtnl_dev_get(), that returns the net_device
instance pointer for the passed ifname or ifname attribute.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/core/rtnetlink.c | 57 ++++++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 28 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 8994dc858ae0..1fa30d514e3f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2765,6 +2765,23 @@ static int do_setlink(const struct sk_buff *skb,
 	return err;
 }
 
+static struct net_device *rtnl_dev_get(struct net *net,
+				       struct nlattr *ifname_attr,
+				       char *ifname)
+{
+	char buffer[IFNAMSIZ];
+
+	if (!ifname) {
+		ifname = buffer;
+		if (ifname_attr)
+			nla_strlcpy(ifname, ifname_attr, IFNAMSIZ);
+		else
+			return NULL;
+	}
+
+	return __dev_get_by_name(net, ifname);
+}
+
 static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			struct netlink_ext_ack *extack)
 {
@@ -2794,7 +2811,7 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (ifm->ifi_index > 0)
 		dev = __dev_get_by_index(net, ifm->ifi_index);
 	else if (tb[IFLA_IFNAME])
-		dev = __dev_get_by_name(net, ifname);
+		dev = rtnl_dev_get(net, NULL, ifname);
 	else
 		goto errout;
 
@@ -2867,7 +2884,6 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	struct net *tgt_net = net;
 	struct net_device *dev = NULL;
 	struct ifinfomsg *ifm;
-	char ifname[IFNAMSIZ];
 	struct nlattr *tb[IFLA_MAX+1];
 	int err;
 	int netnsid = -1;
@@ -2881,9 +2897,6 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (err < 0)
 		return err;
 
-	if (tb[IFLA_IFNAME])
-		nla_strlcpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
-
 	if (tb[IFLA_TARGET_NETNSID]) {
 		netnsid = nla_get_s32(tb[IFLA_TARGET_NETNSID]);
 		tgt_net = rtnl_get_net_ns_capable(NETLINK_CB(skb).sk, netnsid);
@@ -2896,7 +2909,7 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (ifm->ifi_index > 0)
 		dev = __dev_get_by_index(tgt_net, ifm->ifi_index);
 	else if (tb[IFLA_IFNAME])
-		dev = __dev_get_by_name(tgt_net, ifname);
+		dev = rtnl_dev_get(net, tb[IFLA_IFNAME], NULL);
 	else if (tb[IFLA_GROUP])
 		err = rtnl_group_dellink(tgt_net, nla_get_u32(tb[IFLA_GROUP]));
 	else
@@ -3068,7 +3081,7 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (ifm->ifi_index > 0)
 		dev = __dev_get_by_index(net, ifm->ifi_index);
 	else if (tb[IFLA_IFNAME])
-		dev = __dev_get_by_name(net, ifname);
+		dev = rtnl_dev_get(net, NULL, ifname);
 	else
 		dev = NULL;
 
@@ -3350,7 +3363,6 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	struct net *net = sock_net(skb->sk);
 	struct net *tgt_net = net;
 	struct ifinfomsg *ifm;
-	char ifname[IFNAMSIZ];
 	struct nlattr *tb[IFLA_MAX+1];
 	struct net_device *dev = NULL;
 	struct sk_buff *nskb;
@@ -3373,9 +3385,6 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			return PTR_ERR(tgt_net);
 	}
 
-	if (tb[IFLA_IFNAME])
-		nla_strlcpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
-
 	if (tb[IFLA_EXT_MASK])
 		ext_filter_mask = nla_get_u32(tb[IFLA_EXT_MASK]);
 
@@ -3384,7 +3393,7 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (ifm->ifi_index > 0)
 		dev = __dev_get_by_index(tgt_net, ifm->ifi_index);
 	else if (tb[IFLA_IFNAME])
-		dev = __dev_get_by_name(tgt_net, ifname);
+		dev = rtnl_dev_get(tgt_net, tb[IFLA_IFNAME], NULL);
 	else
 		goto out;
 
@@ -3433,16 +3442,12 @@ static int rtnl_newaltifname(struct sk_buff *skb, struct nlmsghdr *nlh,
 		return err;
 
 	ifm = nlmsg_data(nlh);
-	if (ifm->ifi_index > 0) {
+	if (ifm->ifi_index > 0)
 		dev = __dev_get_by_index(net, ifm->ifi_index);
-	} else if (tb[IFLA_IFNAME]) {
-		char ifname[IFNAMSIZ];
-
-		nla_strlcpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
-		dev = __dev_get_by_name(net, ifname);
-	} else {
+	else if (tb[IFLA_IFNAME])
+		dev = rtnl_dev_get(net, tb[IFLA_IFNAME], NULL);
+	else
 		return -EINVAL;
-	}
 
 	if (!dev)
 		return -ENODEV;
@@ -3484,16 +3489,12 @@ static int rtnl_delaltifname(struct sk_buff *skb, struct nlmsghdr *nlh,
 		return err;
 
 	ifm = nlmsg_data(nlh);
-	if (ifm->ifi_index > 0) {
+	if (ifm->ifi_index > 0)
 		dev = __dev_get_by_index(net, ifm->ifi_index);
-	} else if (tb[IFLA_IFNAME]) {
-		char ifname[IFNAMSIZ];
-
-		nla_strlcpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
-		dev = __dev_get_by_name(net, ifname);
-	} else {
+	else if (tb[IFLA_IFNAME])
+		dev = rtnl_dev_get(net, tb[IFLA_IFNAME], NULL);
+	else
 		return -EINVAL;
-	}
 
 	if (!dev)
 		return -ENODEV;
-- 
2.21.0
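
The new helper relies on nla_strlcpy() to turn the attribute payload into a C string in a fixed IFNAMSIZ buffer. A user-space C sketch of that copy, modelled on the behaviour of nla_strlcpy() in lib/nlattr.c, follows: it drops one trailing NUL from the payload, copies at most dstsize - 1 bytes, zero-fills the rest of the buffer, and returns the source length so a caller can detect truncation. attr_strlcpy() is a hypothetical name, and it takes a raw pointer and length instead of a struct nlattr.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a string attribute payload (src, srclen) into dst of dstsize bytes.
 * Returns the payload length without its trailing NUL; a return value
 * >= dstsize means the copy was truncated.
 */
static size_t attr_strlcpy(char *dst, const char *src, size_t srclen,
			   size_t dstsize)
{
	assert(dst != NULL || dstsize == 0);
	if (srclen > 0 && src[srclen - 1] == '\0')
		srclen--;		/* payload may or may not carry the NUL */

	if (dstsize > 0) {
		size_t len = srclen >= dstsize ? dstsize - 1 : srclen;

		memset(dst, 0, dstsize);	/* also NUL-terminates */
		memcpy(dst, src, len);
	}
	return srclen;
}
```

This is the reason patch 7/7 enlarges the buffer in rtnl_dev_get() to ALTIFNAMSIZ: with the IFNAMSIZ buffer used here, a long alternative name would be silently cut to 15 characters.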



* [patch net-next rfc 3/7] net: rtnetlink: add commands to add and delete alternative ifnames
From: Jiri Pirko @ 2019-07-19 11:00 UTC
  To: netdev
  Cc: davem, jakub.kicinski, sthemmin, dsahern, dcbw, mkubecek, andrew,
	parav, saeedm, mlxsw
In-Reply-To: <20190719110029.29466-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Add two commands to add and delete alternative ifnames of a net device.
Each net device can have multiple alternative names.
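
The patch keeps every name, primary or alternative, in one per-namespace lookup structure, and chains a device's alternative names off the name node that holds dev->name so they can be flushed when the device is unregistered. A toy user-space C model of that bookkeeping follows. All names in it are hypothetical, a singly linked list stands in for the name hash, and the error codes mirror the ones used in net/core/dev.c in the diff (-EEXIST, -ENOENT, -ENOMEM). Unlike netdev_name_node_alt_destroy() as posted, the sketch also refuses, with -EINVAL, to remove the primary name or a name owned by another device; that check is an assumption of the sketch, not part of the RFC.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct toy_netdev;

/* One name of a device, primary or alternative. */
struct name_node {
	struct name_node *next_global;	/* stand-in for the per-netns name hash */
	struct name_node *next_alt;	/* per-device chain of alternative names */
	struct toy_netdev *dev;
	char name[128];			/* ALTIFNAMSIZ in this series */
};

struct toy_netdev {
	struct name_node primary;	/* primary name; also heads the alt chain */
};

static struct name_node *all_names;	/* every name in the toy namespace */

static struct name_node *name_lookup(const char *name)
{
	for (struct name_node *n = all_names; n; n = n->next_global)
		if (strcmp(n->name, name) == 0)
			return n;
	return NULL;
}

/* Fill in a node and index it; names are unique across the namespace. */
static int name_add(struct name_node *n, struct toy_netdev *dev,
		    const char *name)
{
	assert(strlen(name) < sizeof(n->name));
	if (name_lookup(name))
		return -EEXIST;
	strcpy(n->name, name);
	n->dev = dev;
	n->next_alt = NULL;
	n->next_global = all_names;
	all_names = n;
	return 0;
}

static int toy_register(struct toy_netdev *dev, const char *name)
{
	return name_add(&dev->primary, dev, name);
}

static int alt_create(struct toy_netdev *dev, const char *name)
{
	struct name_node *n = malloc(sizeof(*n));
	struct name_node **tail;
	int err;

	if (!n)
		return -ENOMEM;
	err = name_add(n, dev, name);
	if (err) {
		free(n);
		return err;
	}
	for (tail = &dev->primary.next_alt; *tail; tail = &(*tail)->next_alt)
		;			/* append, like list_add_tail() */
	*tail = n;
	return 0;
}

/* Unlink an alternative-name node from both chains and free it. */
static void alt_unlink_free(struct name_node *n)
{
	struct name_node **pp;

	for (pp = &all_names; *pp != n; pp = &(*pp)->next_global)
		;
	*pp = n->next_global;
	for (pp = &n->dev->primary.next_alt; *pp != n; pp = &(*pp)->next_alt)
		;
	*pp = n->next_alt;
	free(n);
}

static int alt_destroy(struct toy_netdev *dev, const char *name)
{
	struct name_node *n = name_lookup(name);

	if (!n)
		return -ENOENT;
	/* Refuse the primary name and names that belong to another device. */
	if (n == &dev->primary || n->dev != dev)
		return -EINVAL;
	alt_unlink_free(n);
	return 0;
}

/* Drop every alternative name, as unregistering the device would. */
static void alt_flush(struct toy_netdev *dev)
{
	while (dev->primary.next_alt)
		alt_unlink_free(dev->primary.next_alt);
}
```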

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/linux/netdevice.h      |   4 ++
 include/uapi/linux/if.h        |   1 +
 include/uapi/linux/if_link.h   |   1 +
 include/uapi/linux/rtnetlink.h |   7 +++
 net/core/dev.c                 |  58 ++++++++++++++++++-
 net/core/rtnetlink.c           | 102 +++++++++++++++++++++++++++++++++
 security/selinux/nlmsgtab.c    |   4 +-
 7 files changed, 175 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 74f99f127b0e..6922fdb483ca 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -920,10 +920,14 @@ struct tlsdev_ops;
 
 struct netdev_name_node {
 	struct hlist_node hlist;
+	struct list_head list;
 	struct net_device *dev;
 	char *name;
 };
 
+int netdev_name_node_alt_create(struct net_device *dev, char *name);
+int netdev_name_node_alt_destroy(struct net_device *dev, char *name);
+
 /*
  * This structure defines the management hooks for network devices.
  * The following hooks can be defined; unless noted otherwise, they are
diff --git a/include/uapi/linux/if.h b/include/uapi/linux/if.h
index 7fea0fd7d6f5..4bf33344aab1 100644
--- a/include/uapi/linux/if.h
+++ b/include/uapi/linux/if.h
@@ -33,6 +33,7 @@
 #define	IFNAMSIZ	16
 #endif /* __UAPI_DEF_IF_IFNAMSIZ */
 #define	IFALIASZ	256
+#define	ALTIFNAMSIZ	128
 #include <linux/hdlc/ioctl.h>
 
 /* For glibc compatibility. An empty enum does not compile. */
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 4a8c02cafa9a..92268946e04a 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -167,6 +167,7 @@ enum {
 	IFLA_NEW_IFINDEX,
 	IFLA_MIN_MTU,
 	IFLA_MAX_MTU,
+	IFLA_ALT_IFNAME_MOD, /* Alternative ifname to add/delete */
 	__IFLA_MAX
 };
 
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index ce2a623abb75..b36cfd83eb76 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -164,6 +164,13 @@ enum {
 	RTM_GETNEXTHOP,
 #define RTM_GETNEXTHOP	RTM_GETNEXTHOP
 
+	RTM_NEWALTIFNAME = 108,
+#define RTM_NEWALTIFNAME	RTM_NEWALTIFNAME
+	RTM_DELALTIFNAME,
+#define RTM_DELALTIFNAME	RTM_DELALTIFNAME
+	RTM_GETALTIFNAME,
+#define RTM_GETALTIFNAME	RTM_GETALTIFNAME
+
 	__RTM_MAX,
 #define RTM_MAX		(((__RTM_MAX + 3) & ~3) - 1)
 };
diff --git a/net/core/dev.c b/net/core/dev.c
index ad0d42fbdeee..2a3be2b279d3 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -244,7 +244,13 @@ static struct netdev_name_node *netdev_name_node_alloc(struct net_device *dev,
 static struct netdev_name_node *
 netdev_name_node_head_alloc(struct net_device *dev)
 {
-	return netdev_name_node_alloc(dev, dev->name);
+	struct netdev_name_node *name_node;
+
+	name_node = netdev_name_node_alloc(dev, dev->name);
+	if (!name_node)
+		return NULL;
+	INIT_LIST_HEAD(&name_node->list);
+	return name_node;
 }
 
 static void netdev_name_node_free(struct netdev_name_node *name_node)
@@ -288,6 +294,55 @@ static struct netdev_name_node *netdev_name_node_lookup_rcu(struct net *net,
 	return NULL;
 }
 
+int netdev_name_node_alt_create(struct net_device *dev, char *name)
+{
+	struct netdev_name_node *name_node;
+	struct net *net = dev_net(dev);
+
+	name_node = netdev_name_node_lookup(net, name);
+	if (name_node)
+		return -EEXIST;
+	name_node = netdev_name_node_alloc(dev, name);
+	if (!name_node)
+		return -ENOMEM;
+	netdev_name_node_add(net, name_node);
+	/* The node that holds dev->name acts as a head of per-device list. */
+	list_add_tail(&name_node->list, &dev->name_node->list);
+
+	return 0;
+}
+EXPORT_SYMBOL(netdev_name_node_alt_create);
+
+static void __netdev_name_node_alt_destroy(struct netdev_name_node *name_node)
+{
+	list_del(&name_node->list);
+	netdev_name_node_del(name_node);
+	kfree(name_node->name);
+	netdev_name_node_free(name_node);
+}
+
+int netdev_name_node_alt_destroy(struct net_device *dev, char *name)
+{
+	struct netdev_name_node *name_node;
+	struct net *net = dev_net(dev);
+
+	name_node = netdev_name_node_lookup(net, name);
+	if (!name_node)
+		return -ENOENT;
+	__netdev_name_node_alt_destroy(name_node);
+
+	return 0;
+}
+EXPORT_SYMBOL(netdev_name_node_alt_destroy);
+
+static void netdev_name_node_alt_flush(struct net_device *dev)
+{
+	struct netdev_name_node *name_node, *tmp;
+
+	list_for_each_entry_safe(name_node, tmp, &dev->name_node->list, list)
+		__netdev_name_node_alt_destroy(name_node);
+}
+
 /* Device list insertion */
 static void list_netdevice(struct net_device *dev)
 {
@@ -8258,6 +8313,7 @@ static void rollback_registered_many(struct list_head *head)
 		dev_uc_flush(dev);
 		dev_mc_flush(dev);
 
+		netdev_name_node_alt_flush(dev);
 		netdev_name_node_free(dev->name_node);
 
 		if (dev->netdev_ops->ndo_uninit)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1ee6460f8275..7a2010b16e10 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1750,6 +1750,8 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_CARRIER_DOWN_COUNT] = { .type = NLA_U32 },
 	[IFLA_MIN_MTU]		= { .type = NLA_U32 },
 	[IFLA_MAX_MTU]		= { .type = NLA_U32 },
+	[IFLA_ALT_IFNAME_MOD]	= { .type = NLA_STRING,
+				    .len = ALTIFNAMSIZ - 1 },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -3373,6 +3375,103 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	return err;
 }
 
+static int rtnl_newaltifname(struct sk_buff *skb, struct nlmsghdr *nlh,
+			     struct netlink_ext_ack *extack)
+{
+	struct net *net = sock_net(skb->sk);
+	struct nlattr *tb[IFLA_MAX + 1];
+	struct net_device *dev;
+	struct ifinfomsg *ifm;
+	char *new_alt_ifname;
+	int err;
+
+	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy, extack);
+	if (err)
+		return err;
+
+	err = rtnl_ensure_unique_netns(tb, extack, true);
+	if (err)
+		return err;
+
+	ifm = nlmsg_data(nlh);
+	if (ifm->ifi_index > 0) {
+		dev = __dev_get_by_index(net, ifm->ifi_index);
+	} else if (tb[IFLA_IFNAME]) {
+		char ifname[IFNAMSIZ];
+
+		nla_strlcpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
+		dev = __dev_get_by_name(net, ifname);
+	} else {
+		return -EINVAL;
+	}
+
+	if (!dev)
+		return -ENODEV;
+
+	if (!tb[IFLA_ALT_IFNAME_MOD])
+		return -EINVAL;
+
+	new_alt_ifname = nla_strdup(tb[IFLA_ALT_IFNAME_MOD], GFP_KERNEL);
+	if (!new_alt_ifname)
+		return -ENOMEM;
+
+	err = netdev_name_node_alt_create(dev, new_alt_ifname);
+	if (err)
+		goto out_free_new_alt_ifname;
+
+	return 0;
+
+out_free_new_alt_ifname:
+	kfree(new_alt_ifname);
+	return err;
+}
+
+static int rtnl_delaltifname(struct sk_buff *skb, struct nlmsghdr *nlh,
+			     struct netlink_ext_ack *extack)
+{
+	struct net *net = sock_net(skb->sk);
+	struct nlattr *tb[IFLA_MAX + 1];
+	struct net_device *dev;
+	struct ifinfomsg *ifm;
+	char *del_alt_ifname;
+	int err;
+
+	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy, extack);
+	if (err)
+		return err;
+
+	err = rtnl_ensure_unique_netns(tb, extack, true);
+	if (err)
+		return err;
+
+	ifm = nlmsg_data(nlh);
+	if (ifm->ifi_index > 0) {
+		dev = __dev_get_by_index(net, ifm->ifi_index);
+	} else if (tb[IFLA_IFNAME]) {
+		char ifname[IFNAMSIZ];
+
+		nla_strlcpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
+		dev = __dev_get_by_name(net, ifname);
+	} else {
+		return -EINVAL;
+	}
+
+	if (!dev)
+		return -ENODEV;
+
+	if (!tb[IFLA_ALT_IFNAME_MOD])
+		return -EINVAL;
+
+	del_alt_ifname = nla_strdup(tb[IFLA_ALT_IFNAME_MOD], GFP_KERNEL);
+	if (!del_alt_ifname)
+		return -ENOMEM;
+
+	err = netdev_name_node_alt_destroy(dev, del_alt_ifname);
+	kfree(del_alt_ifname);
+
+	return err;
+}
+
 static u16 rtnl_calcit(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
@@ -5331,6 +5430,9 @@ void __init rtnetlink_init(void)
 	rtnl_register(PF_UNSPEC, RTM_GETROUTE, NULL, rtnl_dump_all, 0);
 	rtnl_register(PF_UNSPEC, RTM_GETNETCONF, NULL, rtnl_dump_all, 0);
 
+	rtnl_register(PF_UNSPEC, RTM_NEWALTIFNAME, rtnl_newaltifname, NULL, 0);
+	rtnl_register(PF_UNSPEC, RTM_DELALTIFNAME, rtnl_delaltifname, NULL, 0);
+
 	rtnl_register(PF_BRIDGE, RTM_NEWNEIGH, rtnl_fdb_add, NULL, 0);
 	rtnl_register(PF_BRIDGE, RTM_DELNEIGH, rtnl_fdb_del, NULL, 0);
 	rtnl_register(PF_BRIDGE, RTM_GETNEIGH, rtnl_fdb_get, rtnl_fdb_dump, 0);
diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
index 58345ba0528e..a712b54c666c 100644
--- a/security/selinux/nlmsgtab.c
+++ b/security/selinux/nlmsgtab.c
@@ -83,6 +83,8 @@ static const struct nlmsg_perm nlmsg_route_perms[] =
 	{ RTM_NEWNEXTHOP,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_DELNEXTHOP,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_GETNEXTHOP,	NETLINK_ROUTE_SOCKET__NLMSG_READ  },
+	{ RTM_NEWALTIFNAME,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
+	{ RTM_DELALTIFNAME,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 };
 
 static const struct nlmsg_perm nlmsg_tcpdiag_perms[] =
@@ -166,7 +168,7 @@ int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm)
 		 * structures at the top of this file with the new mappings
 		 * before updating the BUILD_BUG_ON() macro!
 		 */
-		BUILD_BUG_ON(RTM_MAX != (RTM_NEWNEXTHOP + 3));
+		BUILD_BUG_ON(RTM_MAX != (RTM_NEWALTIFNAME + 3));
 		err = nlmsg_perm(nlmsg_type, perm, nlmsg_route_perms,
 				 sizeof(nlmsg_route_perms));
 		break;
-- 
2.21.0
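
A hedged userspace sketch of the request that rtnl_newaltifname() above expects. The message carries an ifinfomsg with ifi_index 0, so the kernel looks the device up by IFLA_IFNAME, followed by the IFLA_ALT_IFNAME_MOD string. RTM_NEWALTIFNAME = 108 comes from the rtnetlink.h hunk. The numeric value of IFLA_ALT_IFNAME_MOD is an assumption (the slot right after IFLA_MAX_MTU), and the helper names are hypothetical. The message is only built in a buffer here, not sent.

```c
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_RTM_NEWALTIFNAME    108	/* from the rtnetlink.h hunk above */
#define SKETCH_IFLA_ALT_IFNAME_MOD 52	/* assumption: slot after IFLA_MAX_MTU */

/* Append a NUL-terminated string attribute at the aligned tail of nlh. */
static int add_str_attr(struct nlmsghdr *nlh, size_t buf_len,
			unsigned short type, const char *str)
{
	size_t payload = strlen(str) + 1;
	size_t tail = NLMSG_ALIGN(nlh->nlmsg_len);
	struct rtattr *rta;

	if (tail + RTA_SPACE(payload) > buf_len)
		return -1;
	rta = (struct rtattr *)((char *)nlh + tail);
	rta->rta_type = type;
	rta->rta_len = RTA_LENGTH(payload);
	memcpy(RTA_DATA(rta), str, payload);
	nlh->nlmsg_len = tail + RTA_SPACE(payload);
	return 0;
}

/* Build an RTM_NEWALTIFNAME request adding altname to ifname.
 * Returns the message length, or -1 if buf is too small. */
int build_newaltifname(void *buf, size_t buf_len, const char *ifname,
		       const char *altname)
{
	struct nlmsghdr *nlh = buf;
	struct ifinfomsg *ifm;

	if (buf_len < NLMSG_SPACE(sizeof(*ifm)))
		return -1;
	memset(buf, 0, buf_len);	/* also zeroes attribute padding */
	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(*ifm));
	nlh->nlmsg_type = SKETCH_RTM_NEWALTIFNAME;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	ifm = NLMSG_DATA(nlh);
	ifm->ifi_index = 0;	/* 0: the handler falls back to IFLA_IFNAME */
	/* ifi_family stays 0 (AF_UNSPEC), matching the PF_UNSPEC registration */

	if (add_str_attr(nlh, buf_len, IFLA_IFNAME, ifname) ||
	    add_str_attr(nlh, buf_len, SKETCH_IFLA_ALT_IFNAME_MOD, altname))
		return -1;
	return (int)nlh->nlmsg_len;
}
```

Sending the buffer over a NETLINK_ROUTE socket is left out on purpose. The error reply would carry the handler's return value, e.g. -EEXIST when the requested name is already in use as a primary or alternative name.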



* [patch net-next rfc 4/7] net: rtnetlink: put alternative names to getlink message
From: Jiri Pirko @ 2019-07-19 11:00 UTC
  To: netdev
  Cc: davem, jakub.kicinski, sthemmin, dsahern, dcbw, mkubecek, andrew,
	parav, saeedm, mlxsw
In-Reply-To: <20190719110029.29466-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Extend the existing getlink info message with a list of alternative names.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/uapi/linux/if_link.h |  2 ++
 net/core/rtnetlink.c         | 41 ++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 92268946e04a..038361f9847b 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -168,6 +168,8 @@ enum {
 	IFLA_MIN_MTU,
 	IFLA_MAX_MTU,
 	IFLA_ALT_IFNAME_MOD, /* Alternative ifname to add/delete */
+	IFLA_ALT_IFNAME_LIST, /* nest */
+	IFLA_ALT_IFNAME, /* Alternative ifname */
 	__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 7a2010b16e10..f11a2367037d 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -980,6 +980,18 @@ static size_t rtnl_xdp_size(void)
 	return xdp_size;
 }
 
+static size_t rtnl_alt_ifname_list_size(const struct net_device *dev)
+{
+	struct netdev_name_node *name_node;
+	size_t size = nla_total_size(0);
+
+	if (list_empty(&dev->name_node->list))
+		return 0;
+	list_for_each_entry(name_node, &dev->name_node->list, list)
+		size += nla_total_size(ALTIFNAMSIZ);
+	return size;
+}
+
 static noinline size_t if_nlmsg_size(const struct net_device *dev,
 				     u32 ext_filter_mask)
 {
@@ -1027,6 +1039,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(4)  /* IFLA_CARRIER_DOWN_COUNT */
 	       + nla_total_size(4)  /* IFLA_MIN_MTU */
 	       + nla_total_size(4)  /* IFLA_MAX_MTU */
+	       + rtnl_alt_ifname_list_size(dev)
 	       + 0;
 }
 
@@ -1584,6 +1597,31 @@ static int rtnl_fill_link_af(struct sk_buff *skb,
 	return 0;
 }
 
+static int rtnl_fill_alt_ifnames(struct sk_buff *skb,
+				 const struct net_device *dev)
+{
+	struct netdev_name_node *name_node;
+	struct nlattr *alt_ifname_list;
+
+	if (list_empty(&dev->name_node->list))
+		return 0;
+
+	alt_ifname_list = nla_nest_start_noflag(skb, IFLA_ALT_IFNAME_LIST);
+	if (!alt_ifname_list)
+		return -EMSGSIZE;
+
+	list_for_each_entry(name_node, &dev->name_node->list, list)
+		if (nla_put_string(skb, IFLA_ALT_IFNAME, name_node->name))
+			goto nla_put_failure;
+
+	nla_nest_end(skb, alt_ifname_list);
+	return 0;
+
+nla_put_failure:
+	nla_nest_cancel(skb, alt_ifname_list);
+	return -EMSGSIZE;
+}
+
 static int rtnl_fill_ifinfo(struct sk_buff *skb,
 			    struct net_device *dev, struct net *src_net,
 			    int type, u32 pid, u32 seq, u32 change,
@@ -1697,6 +1735,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 		goto nla_put_failure_rcu;
 	rcu_read_unlock();
 
+	if (rtnl_fill_alt_ifnames(skb, dev))
+		goto nla_put_failure;
+
 	nlmsg_end(skb, nlh);
 	return 0;
 
-- 
2.21.0
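
The nest written by rtnl_fill_alt_ifnames() can be modelled in userspace as below: one IFLA_ALT_IFNAME_LIST attribute containing one IFLA_ALT_IFNAME string per alternative name, and no attribute at all when the list is empty. This is a sketch built on the rtattr helpers from <linux/rtnetlink.h>; the attribute numbers are hypothetical, since the RFC's enum values depend on the final if_link.h layout.

```c
#include <linux/rtnetlink.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute numbers: the RFC appends these after
 * IFLA_ALT_IFNAME_MOD, assumed here to be 52. */
#define SKETCH_IFLA_ALT_IFNAME_LIST 53
#define SKETCH_IFLA_ALT_IFNAME      54

/* Mirrors rtnl_fill_alt_ifnames(): no names -> nothing written,
 * otherwise one nest holding one NUL-terminated string per name.
 * Returns bytes written, 0 for an empty list, -1 if buf is too small. */
int put_alt_ifname_list(void *buf, size_t len, const char *const *names,
			size_t n)
{
	struct rtattr *nest = buf;
	size_t off = RTA_LENGTH(0);
	size_t i;

	if (n == 0)
		return 0;
	if (len < off)
		return -1;
	for (i = 0; i < n; i++) {
		size_t payload = strlen(names[i]) + 1;
		struct rtattr *rta = (struct rtattr *)((char *)buf + off);

		if (off + RTA_SPACE(payload) > len)
			return -1;	/* the kernel cancels the nest here */
		memset(rta, 0, RTA_SPACE(payload));
		rta->rta_type = SKETCH_IFLA_ALT_IFNAME;
		rta->rta_len = RTA_LENGTH(payload);
		memcpy(RTA_DATA(rta), names[i], payload);
		off += RTA_SPACE(payload);
	}
	nest->rta_type = SKETCH_IFLA_ALT_IFNAME_LIST;
	nest->rta_len = off;
	return (int)off;
}

/* Walks a received IFLA_ALT_IFNAME_LIST nest and stores pointers to up
 * to max names in out[]; returns how many were stored. */
size_t get_alt_ifnames(const struct rtattr *nest, const char **out, size_t max)
{
	int rem = RTA_PAYLOAD(nest);
	const struct rtattr *rta = RTA_DATA(nest);
	size_t n = 0;

	for (; RTA_OK(rta, rem) && n < max; rta = RTA_NEXT(rta, rem)) {
		const char *s = RTA_DATA(rta);
		int slen = RTA_PAYLOAD(rta);

		/* Skip other types and strings that are not NUL-terminated. */
		if (rta->rta_type != SKETCH_IFLA_ALT_IFNAME || slen < 1 ||
		    s[slen - 1] != '\0')
			continue;
		out[n++] = s;
	}
	return n;
}
```

For the two dummy0 names from the cover letter ("someothername" and "someotherveryveryveryverylongname") the encoded nest is 4 + 20 + 40 = 64 bytes. rtnl_alt_ifname_list_size() reserves a worst case instead: nla_total_size(0) + 2 * nla_total_size(ALTIFNAMSIZ), which is 4 + 2 * 132 = 268 bytes if ALTIFNAMSIZ is 128 (an assumption; the if.h hunk is not shown here).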



* [patch net-next rfc 0/7] net: introduce alternative names for network interfaces
From: Jiri Pirko @ 2019-07-19 11:00 UTC
  To: netdev
  Cc: davem, jakub.kicinski, sthemmin, dsahern, dcbw, mkubecek, andrew,
	parav, saeedm, mlxsw

From: Jiri Pirko <jiri@mellanox.com>

The IFNAMSIZ (16) limit on netdevice name length has been discussed
repeatedly in the past. Now that we have PF and VF representors with
port names like "pfXvfY", it has become quite common to hit this limit:
0123456789012345
enp131s0f1npf0vf6
enp131s0f1npf0vf22

Udev cannot rename these interfaces out of the box, and the user needs
to create custom rules to handle them.
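
The 16-byte IFNAMSIZ buffer includes the terminating NUL, so at most 15 visible characters fit, and both names above are longer (17 and 18 characters). A minimal sketch of only the length rule, with IFNAMSIZ redefined locally as an assumption so it builds outside the kernel tree; the kernel's dev_valid_name() additionally rejects characters such as '/' and ':'.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_IFNAMSIZ 16	/* IFNAMSIZ from <linux/if.h>, NUL included */

/* True if name fits a kernel ifname buffer (at most 15 characters). */
bool ifname_fits(const char *name)
{
	return strlen(name) < SKETCH_IFNAMSIZ;
}
```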

Also, udev has multiple naming schemes for netdevs. From the udev code:
 * Type of names:
 *   b<number>                             - BCMA bus core number
 *   c<bus_id>                             - bus id of a grouped CCW or CCW device,
 *                                           with all leading zeros stripped [s390]
 *   o<index>[n<phys_port_name>|d<dev_port>]
 *                                         - on-board device index number
 *   s<slot>[f<function>][n<phys_port_name>|d<dev_port>]
 *                                         - hotplug slot index number
 *   x<MAC>                                - MAC address
 *   [P<domain>]p<bus>s<slot>[f<function>][n<phys_port_name>|d<dev_port>]
 *                                         - PCI geographical location
 *   [P<domain>]p<bus>s<slot>[f<function>][u<port>][..][c<config>][i<interface>]
 *                                         - USB port number chain
 *   v<slot>                               - VIO slot number (IBM PowerVM)
 *   a<vendor><model>i<instance>           - Platform bus ACPI instance id
 *   i<addr>n<phys_port_name>              - Netdevsim bus address and port name

A single device can often be named by multiple patterns at the
same time (e.g. PCI address and MAC address).

This patchset introduces alternative names for network interfaces.
The main goals are to:
1) Overcome the IFNAMSIZ limitation
2) Allow a device to have multiple names at the same time (multiple udev patterns)
3) Allow alternative names to be used as handles for commands
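
For goal 1, patch 3/7 validates IFLA_ALT_IFNAME_MOD with an NLA_STRING policy whose .len is ALTIFNAMSIZ - 1. The sketch below mirrors how the netlink core applies such a policy: an empty attribute is rejected, a trailing NUL is not counted, and the rest must not exceed .len. ALTIFNAMSIZ = 128 is an assumption, since the include/uapi/linux/if.h hunk is not part of this excerpt, and the function name is hypothetical.

```c
#include <stddef.h>

#define SKETCH_ALTIFNAMSIZ 128	/* assumption: value not shown in this excerpt */

/* Returns 1 if an NLA_STRING payload of attrlen bytes passes a policy
 * with .len = maxlen (0 means "no length limit"), else 0. */
int sketch_nla_string_ok(const char *payload, size_t attrlen, size_t maxlen)
{
	if (attrlen < 1)
		return 0;		/* empty attribute: rejected */
	if (maxlen) {
		if (payload[attrlen - 1] == '\0')
			attrlen--;	/* trailing NUL is not counted */
		if (attrlen > maxlen)
			return 0;
	}
	return 1;
}
```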

See the following examples.

$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff

-> Add alternative names for dummy0:

$ ip link altname add dummy0 name someothername
$ ip link altname add dummy0 name someotherveryveryveryverylongname
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff
    altname someothername
    altname someotherveryveryveryverylongname
-> Add bridge brx, add its alternative name and use alternative names to
   do the enslavement.

$ ip link add name brx type bridge
$ ip link altname add brx name mypersonalsuperspecialbridge
$ ip link set someotherveryveryveryverylongname master mypersonalsuperspecialbridge
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop master brx state DOWN mode DEFAULT group default qlen 1000
    link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff
    altname someothername
    altname someotherveryveryveryverylongname
4: brx: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff
    altname mypersonalsuperspecialbridge

-> Add an IPv4 address to the bridge using its alternative name:

$ ip addr add 192.168.0.1/24 dev mypersonalsuperspecialbridge
$ ip addr show mypersonalsuperspecialbridge
4: brx: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff
    altname mypersonalsuperspecialbridge
    inet 192.168.0.1/24 scope global brx
       valid_lft forever preferred_lft forever

-> Delete one of dummy0's alternative names:

$ ip link altname del someotherveryveryveryverylongname
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop master brx state DOWN mode DEFAULT group default qlen 1000
    link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff
    altname someothername
4: brx: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff
    altname mypersonalsuperspecialbridge

TODO:
- notifications for alternative names add/removal
- sanitization of add/del cmds (similar to get link)
- test more use cases and write selftests
- extend support for other netlink ifaces (ovs for example)
- add sysfs symlink altname->basename?

Jiri Pirko (7):
  net: procfs: use index hashlist instead of name hashlist
  net: introduce name_node struct to be used in hashlist
  net: rtnetlink: add commands to add and delete alternative ifnames
  net: rtnetlink: put alternative names to getlink message
  net: rtnetlink: unify the code in __rtnl_newlink get dev with the rest
  net: rtnetlink: introduce helper to get net_device instance by ifname
  net: rtnetlink: add possibility to use alternative names as message
    handle

 include/linux/netdevice.h      |  14 ++-
 include/uapi/linux/if.h        |   1 +
 include/uapi/linux/if_link.h   |   3 +
 include/uapi/linux/rtnetlink.h |   7 ++
 net/core/dev.c                 | 152 ++++++++++++++++++++++----
 net/core/net-procfs.c          |   4 +-
 net/core/rtnetlink.c           | 192 +++++++++++++++++++++++++++++----
 security/selinux/nlmsgtab.c    |   4 +-
 8 files changed, 334 insertions(+), 43 deletions(-)

-- 
2.21.0



* [patch net-next rfc 1/7] net: procfs: use index hashlist instead of name hashlist
From: Jiri Pirko @ 2019-07-19 11:00 UTC
  To: netdev
  Cc: davem, jakub.kicinski, sthemmin, dsahern, dcbw, mkubecek, andrew,
	parav, saeedm, mlxsw
In-Reply-To: <20190719110029.29466-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

The name hashlist is going to be used for more than just dev->name, so
iterate over net_device instances using the index hashlist instead.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/core/net-procfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
index 36347933ec3a..6bbd06f7dc7d 100644
--- a/net/core/net-procfs.c
+++ b/net/core/net-procfs.c
@@ -20,8 +20,8 @@ static inline struct net_device *dev_from_same_bucket(struct seq_file *seq, loff
 	struct hlist_head *h;
 	unsigned int count = 0, offset = get_offset(*pos);
 
-	h = &net->dev_name_head[get_bucket(*pos)];
-	hlist_for_each_entry_rcu(dev, h, name_hlist) {
+	h = &net->dev_index_head[get_bucket(*pos)];
+	hlist_for_each_entry_rcu(dev, h, index_hlist) {
 		if (++count == offset)
 			return dev;
 	}
-- 
2.21.0
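
Patch 1/7 keeps the seq_file position encoding and only changes which hash table the bucket number indexes. A sketch of that encoding, assumed to mirror the get_bucket()/get_offset()/set_bucket_offset() macros in net/core/net-procfs.c with NETDEV_HASHBITS = 8: the high bits select the bucket and the low 23 bits count entries within it. The sketch_ prefixes are hypothetical.

```c
#include <stdint.h>

#define SKETCH_NETDEV_HASHBITS 8	/* assumed value from netdevice.h */
#define SKETCH_BUCKET_SPACE (32 - SKETCH_NETDEV_HASHBITS - 1)	/* 23 */

static inline uint32_t sketch_get_bucket(int64_t pos)
{
	return (uint32_t)(pos >> SKETCH_BUCKET_SPACE);
}

static inline uint32_t sketch_get_offset(int64_t pos)
{
	return (uint32_t)(pos & ((1 << SKETCH_BUCKET_SPACE) - 1));
}

static inline int64_t sketch_set_bucket_offset(uint32_t bucket, uint32_t off)
{
	return ((int64_t)bucket << SKETCH_BUCKET_SPACE) | off;
}
```

Because each device sits exactly once in dev_index_head, the offset counts devices. In the name table, a device with alternative names would be counted once per name.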



* [patch net-next rfc 2/7] net: introduce name_node struct to be used in hashlist
From: Jiri Pirko @ 2019-07-19 11:00 UTC
  To: netdev
  Cc: davem, jakub.kicinski, sthemmin, dsahern, dcbw, mkubecek, andrew,
	parav, saeedm, mlxsw
In-Reply-To: <20190719110029.29466-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/linux/netdevice.h | 10 +++-
 net/core/dev.c            | 96 +++++++++++++++++++++++++++++++--------
 2 files changed, 86 insertions(+), 20 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 88292953aa6f..74f99f127b0e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -918,6 +918,12 @@ struct dev_ifalias {
 struct devlink;
 struct tlsdev_ops;
 
+struct netdev_name_node {
+	struct hlist_node hlist;
+	struct net_device *dev;
+	char *name;
+};
+
 /*
  * This structure defines the management hooks for network devices.
  * The following hooks can be defined; unless noted otherwise, they are
@@ -1551,7 +1557,7 @@ enum netdev_priv_flags {
  *		(i.e. as seen by users in the "Space.c" file).  It is the name
  *		of the interface.
  *
- *	@name_hlist: 	Device name hash chain, please keep it close to name[]
+ *	@name_node:	Name hashlist node
  *	@ifalias:	SNMP alias
  *	@mem_end:	Shared memory end
  *	@mem_start:	Shared memory start
@@ -1761,7 +1767,7 @@ enum netdev_priv_flags {
 
 struct net_device {
 	char			name[IFNAMSIZ];
-	struct hlist_node	name_hlist;
+	struct netdev_name_node	*name_node;
 	struct dev_ifalias	__rcu *ifalias;
 	/*
 	 *	I/O specific fields
diff --git a/net/core/dev.c b/net/core/dev.c
index fc676b2610e3..ad0d42fbdeee 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -228,6 +228,66 @@ static inline void rps_unlock(struct softnet_data *sd)
 #endif
 }
 
+static struct netdev_name_node *netdev_name_node_alloc(struct net_device *dev,
+						       char *name)
+{
+	struct netdev_name_node *name_node;
+
+	name_node = kzalloc(sizeof(*name_node), GFP_KERNEL);
+	if (!name_node)
+		return NULL;
+	name_node->dev = dev;
+	name_node->name = name;
+	return name_node;
+}
+
+static struct netdev_name_node *
+netdev_name_node_head_alloc(struct net_device *dev)
+{
+	return netdev_name_node_alloc(dev, dev->name);
+}
+
+static void netdev_name_node_free(struct netdev_name_node *name_node)
+{
+	kfree(name_node);
+}
+
+static void netdev_name_node_add(struct net *net,
+				 struct netdev_name_node *name_node)
+{
+	hlist_add_head_rcu(&name_node->hlist,
+			   dev_name_hash(net, name_node->name));
+}
+
+static void netdev_name_node_del(struct netdev_name_node *name_node)
+{
+	hlist_del_rcu(&name_node->hlist);
+}
+
+static struct netdev_name_node *netdev_name_node_lookup(struct net *net,
+							const char *name)
+{
+	struct hlist_head *head = dev_name_hash(net, name);
+	struct netdev_name_node *name_node;
+
+	hlist_for_each_entry(name_node, head, hlist)
+		if (!strcmp(name_node->name, name))
+			return name_node;
+	return NULL;
+}
+
+static struct netdev_name_node *netdev_name_node_lookup_rcu(struct net *net,
+							    const char *name)
+{
+	struct hlist_head *head = dev_name_hash(net, name);
+	struct netdev_name_node *name_node;
+
+	hlist_for_each_entry_rcu(name_node, head, hlist)
+		if (!strcmp(name_node->name, name))
+			return name_node;
+	return NULL;
+}
+
 /* Device list insertion */
 static void list_netdevice(struct net_device *dev)
 {
@@ -237,7 +297,7 @@ static void list_netdevice(struct net_device *dev)
 
 	write_lock_bh(&dev_base_lock);
 	list_add_tail_rcu(&dev->dev_list, &net->dev_base_head);
-	hlist_add_head_rcu(&dev->name_hlist, dev_name_hash(net, dev->name));
+	netdev_name_node_add(net, dev->name_node);
 	hlist_add_head_rcu(&dev->index_hlist,
 			   dev_index_hash(net, dev->ifindex));
 	write_unlock_bh(&dev_base_lock);
@@ -255,7 +315,7 @@ static void unlist_netdevice(struct net_device *dev)
 	/* Unlink dev from the device chain */
 	write_lock_bh(&dev_base_lock);
 	list_del_rcu(&dev->dev_list);
-	hlist_del_rcu(&dev->name_hlist);
+	netdev_name_node_del(dev->name_node);
 	hlist_del_rcu(&dev->index_hlist);
 	write_unlock_bh(&dev_base_lock);
 
@@ -733,14 +793,10 @@ EXPORT_SYMBOL_GPL(dev_fill_metadata_dst);
 
 struct net_device *__dev_get_by_name(struct net *net, const char *name)
 {
-	struct net_device *dev;
-	struct hlist_head *head = dev_name_hash(net, name);
+	struct netdev_name_node *node_name;
 
-	hlist_for_each_entry(dev, head, name_hlist)
-		if (!strncmp(dev->name, name, IFNAMSIZ))
-			return dev;
-
-	return NULL;
+	node_name = netdev_name_node_lookup(net, name);
+	return node_name ? node_name->dev : NULL;
 }
 EXPORT_SYMBOL(__dev_get_by_name);
 
@@ -758,14 +814,10 @@ EXPORT_SYMBOL(__dev_get_by_name);
 
 struct net_device *dev_get_by_name_rcu(struct net *net, const char *name)
 {
-	struct net_device *dev;
-	struct hlist_head *head = dev_name_hash(net, name);
-
-	hlist_for_each_entry_rcu(dev, head, name_hlist)
-		if (!strncmp(dev->name, name, IFNAMSIZ))
-			return dev;
+	struct netdev_name_node *node_name;
 
-	return NULL;
+	node_name = netdev_name_node_lookup_rcu(net, name);
+	return node_name ? node_name->dev : NULL;
 }
 EXPORT_SYMBOL(dev_get_by_name_rcu);
 
@@ -1232,13 +1284,13 @@ int dev_change_name(struct net_device *dev, const char *newname)
 	netdev_adjacent_rename_links(dev, oldname);
 
 	write_lock_bh(&dev_base_lock);
-	hlist_del_rcu(&dev->name_hlist);
+	netdev_name_node_del(dev->name_node);
 	write_unlock_bh(&dev_base_lock);
 
 	synchronize_rcu();
 
 	write_lock_bh(&dev_base_lock);
-	hlist_add_head_rcu(&dev->name_hlist, dev_name_hash(net, dev->name));
+	netdev_name_node_add(net, dev->name_node);
 	write_unlock_bh(&dev_base_lock);
 
 	ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev);
@@ -8206,6 +8258,8 @@ static void rollback_registered_many(struct list_head *head)
 		dev_uc_flush(dev);
 		dev_mc_flush(dev);
 
+		netdev_name_node_free(dev->name_node);
+
 		if (dev->netdev_ops->ndo_uninit)
 			dev->netdev_ops->ndo_uninit(dev);
 
@@ -8648,6 +8702,10 @@ int register_netdevice(struct net_device *dev)
 	if (ret < 0)
 		goto out;
 
+	dev->name_node = netdev_name_node_head_alloc(dev);
+	if (!dev->name_node)
+		goto out;
+
 	/* Init, if this function is available */
 	if (dev->netdev_ops->ndo_init) {
 		ret = dev->netdev_ops->ndo_init(dev);
@@ -8767,6 +8825,8 @@ int register_netdevice(struct net_device *dev)
 	return ret;
 
 err_uninit:
+	if (dev->name_node)
+		netdev_name_node_free(dev->name_node);
 	if (dev->netdev_ops->ndo_uninit)
 		dev->netdev_ops->ndo_uninit(dev);
 	if (dev->priv_destructor)
-- 
2.21.0
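
To illustrate what the netdev_name_node indirection buys, here is a userspace model. It uses plain singly linked buckets instead of RCU hlists and djb2 instead of dev_name_hash(); all names are hypothetical. Every name gets its own node pointing back to the device, so a lookup by any name yields the same device. A duplicate name is refused, the way netdev_name_node_alt_create() in patch 3/7 returns -EEXIST. A walk over the whole table meets a device once per name, which is why patch 1/7 moves procfs iteration to the index hashlist.

```c
#include <stddef.h>
#include <string.h>

#define NAME_HASH_SIZE 16u	/* illustrative table size */

struct model_dev {
	int ifindex;
};

/* One node per name, primary or alternative, like struct netdev_name_node. */
struct model_name_node {
	struct model_name_node *next;	/* bucket chain (hlist in the kernel) */
	struct model_dev *dev;		/* device this name resolves to */
	const char *name;
};

static unsigned int model_name_hash(const char *s)
{
	unsigned int h = 5381;		/* djb2, not dev_name_hash() */

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h % NAME_HASH_SIZE;
}

struct model_dev *model_lookup(struct model_name_node **table, const char *name)
{
	struct model_name_node *n;

	for (n = table[model_name_hash(name)]; n; n = n->next)
		if (!strcmp(n->name, name))
			return n->dev;
	return NULL;
}

/* Returns 0, or -1 if the name is already taken (-EEXIST in the RFC). */
int model_add(struct model_name_node **table, struct model_name_node *node)
{
	unsigned int b = model_name_hash(node->name);

	if (model_lookup(table, node->name))
		return -1;
	node->next = table[b];
	table[b] = node;
	return 0;
}

/* How often a walk over every bucket meets dev: once per name. */
size_t model_visits(struct model_name_node **table, const struct model_dev *dev)
{
	struct model_name_node *n;
	size_t i, count = 0;

	for (i = 0; i < NAME_HASH_SIZE; i++)
		for (n = table[i]; n; n = n->next)
			if (n->dev == dev)
				count++;
	return count;
}
```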



* Re: [PATCH] be2net: fix adapter->big_page_size miscaculation
From: kbuild test robot @ 2019-07-19 10:32 UTC
  To: Qian Cai
  Cc: kbuild-all, davem, sathya.perla, ajit.khaparde,
	sriharsha.basavapatna, somnath.kotur, arnd, dhowells, hpa, netdev,
	linux-arch, linux-kernel, Qian Cai
In-Reply-To: <1562959401-19815-1-git-send-email-cai@lca.pw>

[-- Attachment #1: Type: text/plain, Size: 2951 bytes --]

Hi Qian,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.2 next-20190719]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Qian-Cai/be2net-fix-adapter-big_page_size-miscaculation/20190713-191644
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=ia64 

If you fix the issue, kindly add the following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/net/ethernet/emulex/benet/be_main.c: In function 'be_rx_cqs_create':
>> drivers/net/ethernet/emulex/benet/be_main.c:3138:33: error: implicit declaration of function '__get_order'; did you mean 'get_order'? [-Werror=implicit-function-declaration]
     adapter->big_page_size = (1 << __get_order(rx_frag_size)) * PAGE_SIZE;
                                    ^~~~~~~~~~~
                                    get_order
   cc1: some warnings being treated as errors

vim +3138 drivers/net/ethernet/emulex/benet/be_main.c

  3116	
  3117	static int be_rx_cqs_create(struct be_adapter *adapter)
  3118	{
  3119		struct be_queue_info *eq, *cq;
  3120		struct be_rx_obj *rxo;
  3121		int rc, i;
  3122	
  3123		adapter->num_rss_qs =
  3124				min(adapter->num_evt_qs, adapter->cfg_num_rx_irqs);
  3125	
  3126		/* We'll use RSS only if atleast 2 RSS rings are supported. */
  3127		if (adapter->num_rss_qs < 2)
  3128			adapter->num_rss_qs = 0;
  3129	
  3130		adapter->num_rx_qs = adapter->num_rss_qs + adapter->need_def_rxq;
  3131	
  3132		/* When the interface is not capable of RSS rings (and there is no
  3133		 * need to create a default RXQ) we'll still need one RXQ
  3134		 */
  3135		if (adapter->num_rx_qs == 0)
  3136			adapter->num_rx_qs = 1;
  3137	
> 3138		adapter->big_page_size = (1 << __get_order(rx_frag_size)) * PAGE_SIZE;
  3139		for_all_rx_queues(adapter, rxo, i) {
  3140			rxo->adapter = adapter;
  3141			cq = &rxo->cq;
  3142			rc = be_queue_alloc(adapter, cq, RX_CQ_LEN,
  3143					    sizeof(struct be_eth_rx_compl));
  3144			if (rc)
  3145				return rc;
  3146	
  3147			u64_stats_init(&rxo->stats.sync);
  3148			eq = &adapter->eq_obj[i % adapter->num_evt_qs].q;
  3149			rc = be_cmd_cq_create(adapter, cq, eq, false, 3);
  3150			if (rc)
  3151				return rc;
  3152		}
  3153	
  3154		dev_info(&adapter->pdev->dev,
  3155			 "created %d RX queue(s)\n", adapter->num_rx_qs);
  3156		return 0;
  3157	}
  3158	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 54343 bytes --]
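
For reference, the expression at line 3138 rounds rx_frag_size up to a power-of-two number of pages, and get_order(), which the compiler suggests, is the generic helper for that. Below is a userspace stand-in with the page size passed explicitly, because it differs per architecture (e.g. 4 KiB on x86, while ia64 defaults to 16 KiB). The function names are hypothetical and the sketch only makes the arithmetic concrete.

```c
#include <stddef.h>

/* Smallest order such that (page_size << order) >= size, i.e. what the
 * kernel's get_order() computes for PAGE_SIZE. */
unsigned int sketch_get_order(size_t size, size_t page_size)
{
	unsigned int order = 0;

	while ((page_size << order) < size)
		order++;
	return order;
}

/* Mirrors (1 << get_order(rx_frag_size)) * PAGE_SIZE from be_main.c. */
size_t sketch_big_page_size(size_t rx_frag_size, size_t page_size)
{
	return ((size_t)1 << sketch_get_order(rx_frag_size, page_size)) * page_size;
}
```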


* RE: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jose Abreu @ 2019-07-19 10:25 UTC
  To: Jon Hunter, Jose Abreu, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org
  Cc: Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-tegra
In-Reply-To: <4db855e4-1d59-d30b-154c-e7a2aa1c9047@nvidia.com>

[-- Attachment #1: Type: text/plain, Size: 1789 bytes --]

From: Jon Hunter <jonathanh@nvidia.com>
Date: Jul/19/2019, 09:49:10 (UTC+00:00)

> 
> On 19/07/2019 09:44, Jose Abreu wrote:
> > From: Jon Hunter <jonathanh@nvidia.com>
> > Date: Jul/19/2019, 09:37:49 (UTC+00:00)
> > 
> >>
> >> On 19/07/2019 08:51, Jose Abreu wrote:
> >>> From: Jon Hunter <jonathanh@nvidia.com>
> >>> Date: Jul/18/2019, 10:16:20 (UTC+00:00)
> >>>
> >>>> Have you tried using NFS on a board with this ethernet controller?
> >>>
> >>> I'm having some issues setting up the NFS server in order to replicate 
> >>> so this may take some time.
> >>
> >> If that's the case, we may wish to consider reverting this for now as it
> >> is preventing our board from booting. Appears to revert cleanly on top
> >> of mainline.
> >>
> >>> Are you able to add some debug in stmmac_init_rx_buffers() to see what's 
> >>> the buffer address ?
> >>
> >> If you have a debug patch you would like me to apply and test with I
> >> can. However, it is best you prepare the patch as maybe I will not dump
> >> the appropriate addresses.
> >>
> >> Cheers
> >> Jon
> >>
> >> -- 
> >> nvpublic
> > 
> > Send me full boot log please.
> 
> Please see: https://paste.debian.net/1092277/
> 
> Cheers
> Jon
> 
> -- 
> nvpublic

Thanks. Can you apply the attached patch and check whether the WARN is
triggered? It would also be good to know whether this is a boot-specific
crash or whether it doesn't work at all, i.e. without using NFS to mount
the rootfs, manually configure the interface and send/receive packets.

---
Thanks,
Jose Miguel Abreu

[-- Attachment #2: 0001-net-stmmac-Add-page-sanity-check.patch --]
[-- Type: application/octet-stream, Size: 1393 bytes --]

From d495620feccf24dc54218219c4c7f79c8696ecaa Mon Sep 17 00:00:00 2001
Message-Id: <d495620feccf24dc54218219c4c7f79c8696ecaa.1563531731.git.joabreu@synopsys.com>
From: Jose Abreu <joabreu@synopsys.com>
Date: Fri, 19 Jul 2019 12:21:44 +0200
Subject: [PATCH net] net: stmmac: Add page sanity check

Add a WARN_ON() when page is NULL.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>

---
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Jose Abreu <joabreu@synopsys.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: netdev@vger.kernel.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 5f1294ce0216..eac6920301e9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3350,6 +3350,8 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
 		entry = next_entry;
 		buf = &rx_q->buf_pool[entry];
 
+		WARN_ON(!buf->page);
+
 		if (priv->extend_desc)
 			p = (struct dma_desc *)(rx_q->dma_erx + entry);
 		else
-- 
2.7.4
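
The debug patch relies on WARN_ON() evaluating to its condition and reporting without stopping the RX loop. A rough userspace stand-in applied to a hypothetical RX buffer array is shown below. It uses a GNU C statement expression, as the kernel macro also does, but it only prints file:line, whereas the real WARN_ON() also dumps a stack trace and taints the kernel.

```c
#include <stdio.h>

/* Evaluates to !!cond and reports the call site when cond is true. */
#define SKETCH_WARN_ON(cond) ({					\
	int sketch_ret_ = !!(cond);				\
	if (sketch_ret_)					\
		fprintf(stderr, "WARNING at %s:%d: %s\n",	\
			__FILE__, __LINE__, #cond);		\
	sketch_ret_;						\
})

/* Hypothetical stand-in for one stmmac RX buffer slot. */
struct sketch_rx_buf {
	void *page;
};

/* Counts ring entries without a page, warning for each one. */
int sketch_count_missing_pages(const struct sketch_rx_buf *ring, int len)
{
	int i, missing = 0;

	for (i = 0; i < len; i++)
		if (SKETCH_WARN_ON(!ring[i].page))
			missing++;
	return missing;
}
```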



* Re: [PATCH] libertas_tf: Use correct channel range in lbtf_geo_init
From: kbuild test robot @ 2019-07-19  9:32 UTC
  To: YueHaibing
  Cc: kbuild-all, kvalo, davem, lkundrak, derosier, linux-kernel,
	netdev, linux-wireless, YueHaibing
In-Reply-To: <20190716132534.11256-1-yuehaibing@huawei.com>

[-- Attachment #1: Type: text/plain, Size: 1603 bytes --]

Hi YueHaibing,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.2 next-20190718]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/YueHaibing/libertas_tf-Use-correct-channel-range-in-lbtf_geo_init/20190718-011728
config: arm64-allmodconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=arm64 

If you fix the issue, kindly add the following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/net/wireless/marvell/libertas_tf/cmd.c: In function 'lbtf_geo_init':
>> drivers/net/wireless/marvell/libertas_tf/cmd.c:68:17: error: 'range' is a pointer; did you mean to use '->'?
     for (ch = range.start; ch < range.end; ch++)
                    ^
                    ->
   drivers/net/wireless/marvell/libertas_tf/cmd.c:68:35: error: 'range' is a pointer; did you mean to use '->'?
     for (ch = range.start; ch < range.end; ch++)
                                      ^
                                      ->

vim +68 drivers/net/wireless/marvell/libertas_tf/cmd.c

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 66191 bytes --]
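
The error says range is a pointer, so its members must be reached with '->'. A minimal sketch of the corrected loop, using a hypothetical channel-range struct with only the start/end fields that the error message mentions (the driver's real struct may carry more fields):

```c
/* Hypothetical stand-in for the driver's channel range entry. */
struct sketch_channel_range {
	int start;
	int end;	/* exclusive upper bound, as in the loop above */
};

int sketch_count_channels(const struct sketch_channel_range *range)
{
	int ch, n = 0;

	/* range is a pointer, so members are reached with '->', not '.' */
	for (ch = range->start; ch < range->end; ch++)
		n++;
	return n;
}
```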


* Re: [PATCH] qca8k: enable port flow control
From: Niklas Cassel @ 2019-07-19  9:31 UTC (permalink / raw)
  To: xiaofeis
  Cc: davem, vkoul, netdev, andrew, linux-arm-msm, bjorn.andersson,
	vivien.didelot, f.fainelli, xiazha
In-Reply-To: <1563504791-43398-1-git-send-email-xiaofeis@codeaurora.org>

On Fri, Jul 19, 2019 at 10:53:11AM +0800, xiaofeis wrote:
> Set phy device advertising to enable MAC flow control.
> 
> Change-Id: Ibf0f554b072fc73136ec9f7ffb90c20b25a4faae
> Signed-off-by: Xiaofei Shen <xiaofeis@codeaurora.org>
> ---
>  drivers/net/dsa/qca8k.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
> index d93be14..95ac081 100644
> --- a/drivers/net/dsa/qca8k.c
> +++ b/drivers/net/dsa/qca8k.c
> @@ -1,7 +1,7 @@
>  /*
>   * Copyright (C) 2009 Felix Fietkau <nbd@nbd.name>
>   * Copyright (C) 2011-2012 Gabor Juhos <juhosg@openwrt.org>
> - * Copyright (c) 2015, The Linux Foundation. All rights reserved.
> + * Copyright (c) 2015, 2019, The Linux Foundation. All rights reserved.
>   * Copyright (c) 2016 John Crispin <john@phrozen.org>
>   *
>   * This program is free software; you can redistribute it and/or modify
> @@ -800,6 +800,8 @@
>  	qca8k_port_set_status(priv, port, 1);
>  	priv->port_sts[port].enabled = 1;
>  
> +	phy->advertising |= (ADVERTISED_Pause | ADVERTISED_Asym_Pause);

Drop the unnecessary parentheses.

Question for DSA maintainers: shouldn't this be implemented in the
dsa_switch_ops phylink_validate callback, like it's done for other
dsa drivers?


Kind regards,
Niklas

> +
>  	return 0;
>  }
>  
> -- 
> 1.9.1
> 

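The request to drop the parentheses is purely stylistic: '|' binds more tightly than '|=', so both spellings compute the same mask. The small sketch below checks this. It uses local copies of the ethtool advertising bits (same values as in <linux/ethtool.h>) so that it builds without kernel headers. The two helper functions are hypothetical:

```c
#include <stdint.h>

/* Local copies of the ethtool advertising bits, mirroring <linux/ethtool.h>. */
#define ADVERTISED_Pause      (1 << 13)
#define ADVERTISED_Asym_Pause (1 << 14)

/* As written in the patch, with the redundant parentheses. */
static uint32_t advertise_pause_parens(uint32_t adv)
{
	adv |= (ADVERTISED_Pause | ADVERTISED_Asym_Pause);
	return adv;
}

/* Without them: '|' is evaluated first, then '|=' applies the result. */
static uint32_t advertise_pause_plain(uint32_t adv)
{
	adv |= ADVERTISED_Pause | ADVERTISED_Asym_Pause;
	return adv;
}
```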

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
From: Stefano Garzarella @ 2019-07-19  9:20 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, netdev, linux-kernel, Stefan Hajnoczi,
	David S. Miller, virtualization, kvm
In-Reply-To: <53da84b9-184f-1377-0582-ab7cf42ebdb6@redhat.com>

On Fri, Jul 19, 2019 at 04:51:00PM +0800, Jason Wang wrote:
> 
> On 2019/7/19 4:39 PM, Stefano Garzarella wrote:
> > On Fri, Jul 19, 2019 at 04:21:52PM +0800, Jason Wang wrote:
> > > On 2019/7/19 4:08 PM, Stefano Garzarella wrote:
> > > > On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
> > > > > On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
> > > > > > On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> > > > > > > > On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > > > > > > > > If the packets to be sent to the guest are bigger than the buffer
> > > > > > > > > > available, we can split them, using multiple buffers and fixing
> > > > > > > > > > the length in the packet header.
> > > > > > > > > > This is safe since virtio-vsock supports only stream sockets.
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > > > > > > > So how does it work right now? If an app
> > > > > > > > > does sendmsg with a 64K buffer and the other
> > > > > > > > > side publishes 4K buffers - does it just stall?
> > > > > > > > Before this series, the 64K (or bigger) user messages were split into 4K packets
> > > > > > > > (fixed in the code) and queued in an internal list for the TX worker.
> > > > > > > > 
> > > > > > > > After this series, we will queue packets of up to 64K, which will then be split in
> > > > > > > > the TX worker depending on the size of the buffers available in the
> > > > > > > > vring. (The idea was to allow EWMA or a configurable buffer size, but
> > > > > > > > we have postponed that for now.)
> > > > > > > Got it. Using workers for xmit is IMHO a bad idea btw.
> > > > > > > Why is it done like this?
> > > > > > Honestly, I don't know the exact reasons for this design, but I suppose
> > > > > > that the idea was to have only one worker that uses the vring, and
> > > > > > multiple user threads that enqueue packets in the list.
> > > > > > This can simplify the code and we can put the user threads to sleep if
> > > > > > we don't have "credit" available (this means that the receiver doesn't
> > > > > > have space to receive the packet).
> > > > > I think you mean the reverse: even without credits you can copy from
> > > > > user and queue up data, then process it without waking up the user
> > > > > thread.
> > > > I checked the code more carefully, but it doesn't seem to do that.
> > > > The .sendmsg callback of af_vsock checks whether the transport has space
> > > > (the virtio-vsock transport returns the available credit). If there is no
> > > > space, it puts the thread to sleep on the 'sk_sleep(sk)' wait_queue.
> > > > 
> > > > When the transport receives a credit update from the other
> > > > peer, it calls 'sk->sk_write_space(sk)', which wakes up the sleeping
> > > > thread, which then queues the new packet.
> > > > 
> > > > So, in the current implementation, the TX worker doesn't check the
> > > > available credit; it only sends the packets.
> > > > 
> > > > > Does it help though? It certainly adds work outside of
> > > > > user thread context which means it's not accounted for
> > > > > correctly.
> > > > I can try to xmit the packet directly in the user thread context, to see
> > > > the improvements.
> > > 
> > > It will then look more like what virtio-net (and other networking devices)
> > > do.
> > I'll try ASAP, the changes should not be too complicated... I hope :)
> > 
> > > 
> > > > > Maybe we want more VQs. Would help improve parallelism. The question
> > > > > would then become how to map sockets to VQs. With a simple hash
> > > > > it's easy to create collisions ...
> > > > Yes, more VQs can help but the map question is not simple to answer.
> > > > Maybe we can do a hash on the (cid, port) or do some kind of estimation
> > > > of queue utilization and try to balance.
> > > > Should the mapping be unique?
> > > 
> > > It sounds to me like you want some kind of fair queuing? We already have
> > > several qdiscs that do this.
> > Thanks for pointing it out!
> > 
> > > So if we use the kernel networking xmit path, all those issues could be
> > > addressed.
> > One more point in favor of AF_VSOCK + net-stack, but we have to evaluate possible
> > drawbacks of using the net-stack (e.g. more latency due to the complexity
> > of the net-stack?).
> 
> 
> Yes, we need to benchmark the performance. But as we've noticed, the current
> vsock implementation is not efficient, and for stream sockets the overhead
> should be minimal. The most important thing is to avoid reinventing things
> that already exist.

Got it. I completely agree with you, and I want to avoid reinventing things
(surely in a worse way).

But the idea (also suggested by Michael) of discovering how fast a new
protocol separate from the networking stack can go is quite attractive :)

Thanks,
Stefano
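The patch description quoted above says that a packet larger than the guest's buffer is split across several buffers, with the length in each packet header fixed up. That comes down to simple length bookkeeping. The minimal sketch below assumes the usable payload size of each published buffer is known up front, and it ignores the header bytes themselves. split_lengths() and its parameters are hypothetical, not vhost code:

```c
#include <stddef.h>
#include <stdint.h>

/* Split a len-byte payload across guest buffers whose usable sizes are in
 * caps[]. hdr_lens[i] receives the length to write into the header of the
 * i-th packet (hdr_lens may be NULL). Returns the number of buffers used,
 * or -1 if the published buffers cannot hold the whole payload. */
static int split_lengths(size_t len, const size_t *caps, size_t nbufs,
			 uint32_t *hdr_lens)
{
	size_t off = 0, i = 0;

	while (off < len) {
		size_t chunk;

		if (i == nbufs)
			return -1;	/* out of buffers: wait for more */
		chunk = len - off;
		if (chunk > caps[i])
			chunk = caps[i];	/* fill this buffer at most */
		if (hdr_lens)
			hdr_lens[i] = (uint32_t)chunk;	/* fixed-up header length */
		off += chunk;
		i++;
	}
	return (int)i;
}
```

Because virtio-vsock only carries stream sockets, the receiver can simply concatenate the chunks. That is why only the per-chunk lengths have to change.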


* [PATCH bpf v3] bpf: fix narrower loads on s390
From: Ilya Leoshkevich @ 2019-07-19  9:18 UTC (permalink / raw)
  To: bpf, netdev, ys114321, alexei.starovoitov
  Cc: gor, heiko.carstens, Ilya Leoshkevich

The very first check in test_pkt_md_access is failing on s390, which
happens because loading a part of a struct __sk_buff field produces
an incorrect result.

The preprocessed code of the check is:

{
	__u8 tmp = *((volatile __u8 *)&skb->len +
		((sizeof(skb->len) - sizeof(__u8)) / sizeof(__u8)));
	if (tmp != ((*(volatile __u32 *)&skb->len) & 0xFF)) return 2;
};

clang generates the following code for it:

      0:	71 21 00 03 00 00 00 00	r2 = *(u8 *)(r1 + 3)
      1:	61 31 00 00 00 00 00 00	r3 = *(u32 *)(r1 + 0)
      2:	57 30 00 00 00 00 00 ff	r3 &= 255
      3:	5d 23 00 1d 00 00 00 00	if r2 != r3 goto +29 <LBB0_10>

Finally, the verifier transforms it to:

  0: (61) r2 = *(u32 *)(r1 +104)
  1: (bc) w2 = w2
  2: (74) w2 >>= 24
  3: (bc) w2 = w2
  4: (54) w2 &= 255
  5: (bc) w2 = w2

The problem is that when the verifier emits the code to replace a partial
load of a struct __sk_buff field (*(u8 *)(r1 + 3)) with a full load of
the struct sk_buff field (*(u32 *)(r1 + 104)), an optional shift and a
bitwise AND, it assumes that the machine is little endian and
incorrectly decides to use a shift. On a big-endian machine such as
s390, byte 3 of a 4-byte field is already its least significant byte,
so no shift is needed.

Adjust the shift count calculation to account for endianness.

Fixes: 31fd85816dbe ("bpf: permits narrower load from bpf program context fields")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---

v1 -> v2: extract endianness-dependent code into a separate function
v2 -> v3: rename the function and move it to where it belongs

 include/linux/filter.h | 13 +++++++++++++
 kernel/bpf/verifier.c  |  4 ++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index ff65d22cf336..92c6e31fb008 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -24,6 +24,7 @@
 
 #include <net/sch_generic.h>
 
+#include <asm/byteorder.h>
 #include <uapi/linux/filter.h>
 #include <uapi/linux/bpf.h>
 
@@ -747,6 +748,18 @@ bpf_ctx_narrow_access_ok(u32 off, u32 size, u32 size_default)
 	return size <= size_default && (size & (size - 1)) == 0;
 }
 
+static inline u8
+bpf_ctx_narrow_load_shift(u32 off, u32 size, u32 size_default)
+{
+	u8 load_off = off & (size_default - 1);
+
+#ifdef __LITTLE_ENDIAN
+	return load_off * 8;
+#else
+	return (size_default - (load_off + size)) * 8;
+#endif
+}
+
 #define bpf_ctx_wide_access_ok(off, size, type, field)			\
 	(size == sizeof(__u64) &&					\
 	off >= offsetof(type, field) &&					\
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5900cbb966b1..c84d83f86141 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8616,8 +8616,8 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 		}
 
 		if (is_narrower_load && size < target_size) {
-			u8 shift = (off & (size_default - 1)) * 8;
-
+			u8 shift = bpf_ctx_narrow_load_shift(off, size,
+							     size_default);
 			if (ctx_field_size <= 4) {
 				if (shift)
 					insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH,
-- 
2.21.0

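The endianness fix in bpf_ctx_narrow_load_shift() can be exercised outside the kernel. The sketch below restates the patch's formula, but passes the byte order as a run-time flag instead of testing __LITTLE_ENDIAN. That substitution is an assumption, made so both cases can be checked on one host. It then verifies that the verifier-style rewrite (full load, shift, mask with 0xFF) reproduces a direct byte load on the current machine. All helper names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Same arithmetic as bpf_ctx_narrow_load_shift() in the patch, with the
 * byte order supplied at run time instead of by __LITTLE_ENDIAN. */
static uint8_t narrow_load_shift(uint32_t off, uint32_t size,
				 uint32_t size_default, int little_endian)
{
	uint8_t load_off = off & (size_default - 1);

	if (little_endian)
		return load_off * 8;
	return (size_default - (load_off + size)) * 8;
}

static int host_is_little_endian(void)
{
	uint16_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, 1);	/* lowest-addressed byte of probe */
	return first == 1;
}

/* For every byte offset of a 4-byte field, compare a direct u8 load with
 * the rewritten sequence. Returns 1 if they all agree on this host. */
static int check_byte_offsets(void)
{
	uint32_t field = 0x11223344;	/* stands in for skb->len */
	int le = host_is_little_endian();
	uint32_t off;

	for (off = 0; off < 4; off++) {
		uint8_t direct;

		memcpy(&direct, (const uint8_t *)&field + off, 1);
		if (direct != ((field >> narrow_load_shift(off, 1, 4, le)) & 0xFF))
			return 0;
	}
	return 1;
}
```

For the failing check from the commit message (off = 3, size = 1, size_default = 4), the formula gives a shift of 24 on little-endian hosts and 0 on big-endian hosts. The old code always produced 24.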


* [PATCH] ax88179_178a: Merge memcpy + le32_to_cpus to get_unaligned_le32
From: Chuhong Yuan @ 2019-07-19  9:07 UTC (permalink / raw)
  Cc: David S . Miller, linux-usb, netdev, linux-kernel, Chuhong Yuan

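Replace the combination of memcpy() and le32_to_cpus() with a single
call to get_unaligned_le32(), which reads the little-endian 32-bit value
directly from the possibly unaligned skb tail pointer.
This simplifies the code.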

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
 drivers/net/usb/ax88179_178a.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c
index 0bc457ba8574..72d165114b67 100644
--- a/drivers/net/usb/ax88179_178a.c
+++ b/drivers/net/usb/ax88179_178a.c
@@ -1366,8 +1366,7 @@ static int ax88179_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 		return 0;
 
 	skb_trim(skb, skb->len - 4);
-	memcpy(&rx_hdr, skb_tail_pointer(skb), 4);
-	le32_to_cpus(&rx_hdr);
+	rx_hdr = get_unaligned_le32(skb_tail_pointer(skb));
 
 	pkt_cnt = (u16)rx_hdr;
 	hdr_off = (u16)(rx_hdr >> 16);
-- 
2.20.1

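The replacement works because get_unaligned_le32() reads a little-endian 32-bit value from a possibly unaligned address in one step. Before the patch, this took two steps: memcpy() into rx_hdr, then an in-place le32_to_cpus(). Below is a userspace sketch of the same trailer decoding. le32_from_bytes() and parse_rx_hdr() are hypothetical helpers, not the kernel's implementation:

```c
#include <stdint.h>

/* Userspace stand-in for get_unaligned_le32(): build the value byte by
 * byte, so neither pointer alignment nor host byte order matters. */
static uint32_t le32_from_bytes(const void *p)
{
	const uint8_t *b = p;

	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Decode the 4-byte RX trailer as ax88179_rx_fixup() does after the patch:
 * low 16 bits = packet count, high 16 bits = header offset. */
static void parse_rx_hdr(const uint8_t *trailer, uint16_t *pkt_cnt,
			 uint16_t *hdr_off)
{
	uint32_t rx_hdr = le32_from_bytes(trailer);

	*pkt_cnt = (uint16_t)rx_hdr;
	*hdr_off = (uint16_t)(rx_hdr >> 16);
}
```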


* [PATCH bpf] selftests/bpf: fix sendmsg6_prog on s390
From: Ilya Leoshkevich @ 2019-07-19  9:06 UTC (permalink / raw)
  To: bpf, netdev; +Cc: gor, heiko.carstens, rdna, Ilya Leoshkevich

"sendmsg6: rewrite IP & port (C)" fails on s390, because the code in
sendmsg_v6_prog() assumes that (ctx->user_ip6[0] & 0xFFFF) refers to
leading IPv6 address digits, which is not the case on big-endian
machines.

Since checking bitwise operations doesn't seem to be the point of the
test, replace two short comparisons with a single int comparison.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 tools/testing/selftests/bpf/progs/sendmsg6_prog.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/sendmsg6_prog.c b/tools/testing/selftests/bpf/progs/sendmsg6_prog.c
index 5aeaa284fc47..a68062820410 100644
--- a/tools/testing/selftests/bpf/progs/sendmsg6_prog.c
+++ b/tools/testing/selftests/bpf/progs/sendmsg6_prog.c
@@ -41,8 +41,7 @@ int sendmsg_v6_prog(struct bpf_sock_addr *ctx)
 	}
 
 	/* Rewrite destination. */
-	if ((ctx->user_ip6[0] & 0xFFFF) == bpf_htons(0xFACE) &&
-	     ctx->user_ip6[0] >> 16 == bpf_htons(0xB00C)) {
+	if (ctx->user_ip6[0] == bpf_htonl(0xFACEB00C)) {
 		ctx->user_ip6[0] = bpf_htonl(DST_REWRITE_IP6_0);
 		ctx->user_ip6[1] = bpf_htonl(DST_REWRITE_IP6_1);
 		ctx->user_ip6[2] = bpf_htonl(DST_REWRITE_IP6_2);
-- 
2.21.0

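Emulating both byte orders shows why the two 16-bit comparisons only work on little-endian hosts. The sketch assumes the address prefix FA CE B0 0C sits in memory in network byte order, as it does in user_ip6[0]. load_u32(), emu_htons(), emu_htonl(), old_check() and new_check() are hypothetical helpers that mimic what a host of the given byte order would compute:

```c
#include <stdint.h>

/* First four bytes of the test address, in network byte order. */
static const uint8_t faceb00c[4] = { 0xFA, 0xCE, 0xB0, 0x0C };

/* Read 4 bytes as a u32 the way a host of the given byte order would. */
static uint32_t load_u32(const uint8_t b[4], int little_endian)
{
	if (little_endian)
		return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
		       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
	return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 |
	       (uint32_t)b[2] << 8 | (uint32_t)b[3];
}

/* htons()/htonl() as computed on a host of the given byte order. */
static uint16_t emu_htons(uint16_t v, int little_endian)
{
	return little_endian ? (uint16_t)(v >> 8 | v << 8) : v;
}

static uint32_t emu_htonl(uint32_t v, int little_endian)
{
	if (!little_endian)
		return v;
	return (v & 0xFFu) << 24 | (v & 0xFF00u) << 8 |
	       (v >> 8 & 0xFF00u) | v >> 24;
}

/* The original test: compare the two 16-bit halves of user_ip6[0]. */
static int old_check(uint32_t word, int little_endian)
{
	return (word & 0xFFFF) == emu_htons(0xFACE, little_endian) &&
	       word >> 16 == emu_htons(0xB00C, little_endian);
}

/* The patched test: a single 32-bit comparison. */
static int new_check(uint32_t word, int little_endian)
{
	return word == emu_htonl(0xFACEB00C, little_endian);
}
```

On the emulated big-endian host, old_check() fails because the low 16 bits of the loaded word are 0xB00C rather than 0xFACE. new_check() succeeds for both byte orders.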


* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
From: Jason Wang @ 2019-07-19  8:51 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, netdev, linux-kernel, Stefan Hajnoczi,
	David S. Miller, virtualization, kvm
In-Reply-To: <20190719083920.67qo2umpthz454be@steredhat>


On 2019/7/19 4:39 PM, Stefano Garzarella wrote:
> On Fri, Jul 19, 2019 at 04:21:52PM +0800, Jason Wang wrote:
>> On 2019/7/19 4:08 PM, Stefano Garzarella wrote:
>>> On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
>>>> On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
>>>>> On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>> On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
>>>>>>> On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>> On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
>>>>>>>>> If the packets to be sent to the guest are bigger than the buffer
>>>>>>>>> available, we can split them, using multiple buffers and fixing
>>>>>>>>> the length in the packet header.
>>>>>>>>> This is safe since virtio-vsock supports only stream sockets.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>>>>>>> So how does it work right now? If an app
>>>>>>>> does sendmsg with a 64K buffer and the other
>>>>>>>> side publishes 4K buffers - does it just stall?
>>>>>>> Before this series, the 64K (or bigger) user messages were split into 4K packets
>>>>>>> (fixed in the code) and queued in an internal list for the TX worker.
>>>>>>>
>>>>>>> After this series, we will queue packets of up to 64K, which will then be split in
>>>>>>> the TX worker depending on the size of the buffers available in the
>>>>>>> vring. (The idea was to allow EWMA or a configurable buffer size, but
>>>>>>> we have postponed that for now.)
>>>>>> Got it. Using workers for xmit is IMHO a bad idea btw.
>>>>>> Why is it done like this?
>>>>> Honestly, I don't know the exact reasons for this design, but I suppose
>>>>> that the idea was to have only one worker that uses the vring, and
>>>>> multiple user threads that enqueue packets in the list.
>>>>> This can simplify the code and we can put the user threads to sleep if
>>>>> we don't have "credit" available (this means that the receiver doesn't
>>>>> have space to receive the packet).
>>>> I think you mean the reverse: even without credits you can copy from
>>>> user and queue up data, then process it without waking up the user
>>>> thread.
>>> I checked the code more carefully, but it doesn't seem to do that.
>>> The .sendmsg callback of af_vsock checks whether the transport has space
>>> (the virtio-vsock transport returns the available credit). If there is no
>>> space, it puts the thread to sleep on the 'sk_sleep(sk)' wait_queue.
>>>
>>> When the transport receives a credit update from the other
>>> peer, it calls 'sk->sk_write_space(sk)', which wakes up the sleeping
>>> thread, which then queues the new packet.
>>>
>>> So, in the current implementation, the TX worker doesn't check the
>>> available credit; it only sends the packets.
>>>
>>>> Does it help though? It certainly adds work outside of
>>>> user thread context which means it's not accounted for
>>>> correctly.
>>> I can try to xmit the packet directly in the user thread context, to see
>>> the improvements.
>>
>> It will then look more like what virtio-net (and other networking devices)
>> do.
> I'll try ASAP, the changes should not be too complicated... I hope :)
>
>>
>>>> Maybe we want more VQs. Would help improve parallelism. The question
>>>> would then become how to map sockets to VQs. With a simple hash
>>>> it's easy to create collisions ...
>>> Yes, more VQs can help but the map question is not simple to answer.
>>> Maybe we can do a hash on the (cid, port) or do some kind of estimation
>>> of queue utilization and try to balance.
>>> Should the mapping be unique?
>>
>> It sounds to me like you want some kind of fair queuing? We already have
>> several qdiscs that do this.
> Thanks for pointing it out!
>
>> So if we use the kernel networking xmit path, all those issues could be
>> addressed.
> One more point in favor of AF_VSOCK + net-stack, but we have to evaluate possible
> drawbacks of using the net-stack (e.g. more latency due to the complexity
> of the net-stack?).


Yes, we need to benchmark the performance. But as we've noticed, the
current vsock implementation is not efficient, and for stream sockets
the overhead should be minimal. The most important thing is to avoid
reinventing things that already exist.

Thanks


>
> Thanks,
> Stefano

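The quoted thread describes af_vsock's sendmsg sleeping until the virtio-vsock transport reports available credit. As we understand virtio-vsock flow control, a sender may have at most peer_buf_alloc - (tx_cnt - peer_fwd_cnt) bytes outstanding. Below is a minimal sketch of that bookkeeping. struct vsock_credit, credit_available() and credit_take() are hypothetical names, not the kernel's:

```c
#include <stdint.h>

/* Sender-side view of the peer's receive credit (hypothetical struct). */
struct vsock_credit {
	uint32_t peer_buf_alloc;	/* receive buffer size the peer advertised */
	uint32_t peer_fwd_cnt;		/* bytes the peer reports it has consumed */
	uint32_t tx_cnt;		/* bytes sent so far */
};

static uint32_t credit_available(const struct vsock_credit *c)
{
	/* Bytes in flight; unsigned arithmetic stays correct when the
	 * 32-bit counters wrap around. */
	uint32_t in_flight = c->tx_cnt - c->peer_fwd_cnt;

	return in_flight >= c->peer_buf_alloc ? 0 : c->peer_buf_alloc - in_flight;
}

/* Reserve up to 'want' bytes and return how many were granted. */
static uint32_t credit_take(struct vsock_credit *c, uint32_t want)
{
	uint32_t avail = credit_available(c);
	uint32_t n = want < avail ? want : avail;

	c->tx_cnt += n;
	return n;
}
```

In the flow described in the thread, credit_take() returning 0 corresponds to the sender going to sleep until sk->sk_write_space() signals a credit update.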

* Re: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jon Hunter @ 2019-07-19  8:49 UTC (permalink / raw)
  To: Jose Abreu, linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org
  Cc: Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-tegra
In-Reply-To: <BN8PR12MB3266989D15E017A789E14282D3CB0@BN8PR12MB3266.namprd12.prod.outlook.com>


On 19/07/2019 09:44, Jose Abreu wrote:
> From: Jon Hunter <jonathanh@nvidia.com>
> Date: Jul/19/2019, 09:37:49 (UTC+00:00)
> 
>>
>> On 19/07/2019 08:51, Jose Abreu wrote:
>>> From: Jon Hunter <jonathanh@nvidia.com>
>>> Date: Jul/18/2019, 10:16:20 (UTC+00:00)
>>>
>>>> Have you tried using NFS on a board with this ethernet controller?
>>>
>>> I'm having some issues setting up the NFS server in order to replicate
>>> the issue, so this may take some time.
>>
>> If that's the case, we may wish to consider reverting this for now as it
>> is preventing our board from booting. It appears to revert cleanly on top
>> of mainline.
>>
>>> Are you able to add some debug output in stmmac_init_rx_buffers() to see
>>> what the buffer address is?
>>
>> If you have a debug patch you would like me to apply and test with, I
>> can do that. However, it is best that you prepare the patch, as I may not
>> dump the appropriate addresses.
>>
>> Cheers
>> Jon
>>
>> -- 
>> nvpublic
> 
> Please send me the full boot log.

Please see: https://paste.debian.net/1092277/

Cheers
Jon

-- 
nvpublic


* RE: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jose Abreu @ 2019-07-19  8:44 UTC (permalink / raw)
  To: Jon Hunter, Jose Abreu, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org
  Cc: Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-tegra
In-Reply-To: <bc9ab3c5-b1b9-26d4-7b73-01474328eafa@nvidia.com>

From: Jon Hunter <jonathanh@nvidia.com>
Date: Jul/19/2019, 09:37:49 (UTC+00:00)

> 
> On 19/07/2019 08:51, Jose Abreu wrote:
> > From: Jon Hunter <jonathanh@nvidia.com>
> > Date: Jul/18/2019, 10:16:20 (UTC+00:00)
> > 
> >> Have you tried using NFS on a board with this ethernet controller?
> > 
> > I'm having some issues setting up the NFS server in order to replicate
> > the issue, so this may take some time.
> 
> If that's the case, we may wish to consider reverting this for now as it
> is preventing our board from booting. It appears to revert cleanly on top
> of mainline.
> 
> > Are you able to add some debug output in stmmac_init_rx_buffers() to see
> > what the buffer address is?
> 
> If you have a debug patch you would like me to apply and test with, I
> can do that. However, it is best that you prepare the patch, as I may not
> dump the appropriate addresses.
> 
> Cheers
> Jon
> 
> -- 
> nvpublic

Please send me the full boot log.

---
Thanks,
Jose Miguel Abreu


