Netdev List

* [PATCH net-next 2/5] phy: micrel: add Microchip KSZ 9477 Switch PHY support
From: Woojung.Huh @ 2017-05-05 23:17 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot; +Cc: netdev, davem, UNGLinuxDriver

From: Woojung Huh <Woojung.Huh@microchip.com>

Add support for the Microchip 9477 PHY included in the KSZ9477 switch.

Signed-off-by: Woojung Huh <Woojung.Huh@microchip.com>
---
 drivers/net/phy/micrel.c   | 12 ++++++++++++
 include/linux/micrel_phy.h |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index 6a5fd18..65520990 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -20,6 +20,7 @@
  *			   ksz8081, ksz8091,
  *			   ksz8061,
  *		Switch : ksz8873, ksz886x
+ *			 ksz9477
  */
 
 #include <linux/kernel.h>
@@ -997,6 +998,17 @@ static struct phy_driver ksphy_driver[] = {
 	.read_status	= ksz8873mll_read_status,
 	.suspend	= genphy_suspend,
 	.resume		= genphy_resume,
+}, {
+	.phy_id		= PHY_ID_KSZ9477,
+	.phy_id_mask	= MICREL_PHY_ID_MASK,
+	.name		= "Microchip KSZ9477",
+	.features	= PHY_GBIT_FEATURES,
+	.flags		= PHY_HAS_MAGICANEG,
+	.config_init	= kszphy_config_init,
+	.config_aneg	= genphy_config_aneg,
+	.read_status	= genphy_read_status,
+	.suspend	= genphy_suspend,
+	.resume		= genphy_resume,
 } };
 
 module_phy_driver(ksphy_driver);
diff --git a/include/linux/micrel_phy.h b/include/linux/micrel_phy.h
index f541da6..472fa4d 100644
--- a/include/linux/micrel_phy.h
+++ b/include/linux/micrel_phy.h
@@ -37,6 +37,8 @@
 
 #define PHY_ID_KSZ8795		0x00221550
 
+#define	PHY_ID_KSZ9477		0x00221631
+
 /* struct phy_device dev_flags definitions */
 #define MICREL_PHY_50MHZ_CLK	0x00000001
 #define MICREL_PHY_FXEN		0x00000002
-- 
2.7.4


* [PATCH net-next 1/5] dsa: add support for Microchip KSZ tail tagging
From: Woojung.Huh @ 2017-05-05 23:17 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot; +Cc: netdev, davem, UNGLinuxDriver

From: Woojung Huh <Woojung.Huh@microchip.com>

Adding support for the Microchip KSZ switch family tail tagging.

Signed-off-by: Woojung Huh <Woojung.Huh@microchip.com>
---
 include/net/dsa.h  |  1 +
 net/dsa/Kconfig    |  3 ++
 net/dsa/Makefile   |  1 +
 net/dsa/dsa.c      |  3 ++
 net/dsa/dsa_priv.h |  3 ++
 net/dsa/tag_ksz.c  | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 109 insertions(+)
 create mode 100644 net/dsa/tag_ksz.c

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 8e24677..c92204a 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -34,6 +34,7 @@ enum dsa_tag_protocol {
 	DSA_TAG_PROTO_QCA,
 	DSA_TAG_PROTO_MTK,
 	DSA_TAG_PROTO_LAN9303,
+	DSA_TAG_PROTO_KSZ,
 	DSA_TAG_LAST,		/* MUST BE LAST */
 };
 
diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index 81a0868..ce31428 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -37,4 +37,7 @@ config NET_DSA_TAG_MTK
 config NET_DSA_TAG_LAN9303
 	bool
 
+config NET_DSA_TAG_KSZ
+	bool
+
 endif
diff --git a/net/dsa/Makefile b/net/dsa/Makefile
index 0b747d7..8becb26 100644
--- a/net/dsa/Makefile
+++ b/net/dsa/Makefile
@@ -10,3 +10,4 @@ dsa_core-$(CONFIG_NET_DSA_TAG_TRAILER) += tag_trailer.o
 dsa_core-$(CONFIG_NET_DSA_TAG_QCA) += tag_qca.o
 dsa_core-$(CONFIG_NET_DSA_TAG_MTK) += tag_mtk.o
 dsa_core-$(CONFIG_NET_DSA_TAG_LAN9303) += tag_lan9303.o
+dsa_core-$(CONFIG_NET_DSA_TAG_KSZ) += tag_ksz.o
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 26130ae..6340323 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -61,6 +61,9 @@ const struct dsa_device_ops *dsa_device_ops[DSA_TAG_LAST] = {
 #ifdef CONFIG_NET_DSA_TAG_LAN9303
 	[DSA_TAG_PROTO_LAN9303] = &lan9303_netdev_ops,
 #endif
+#ifdef CONFIG_NET_DSA_TAG_KSZ
+	[DSA_TAG_PROTO_KSZ] = &ksz_netdev_ops,
+#endif
 	[DSA_TAG_PROTO_NONE] = &none_ops,
 };
 
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index f4a88e4..70183ac 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -96,4 +96,7 @@ extern const struct dsa_device_ops mtk_netdev_ops;
 /* tag_lan9303.c */
 extern const struct dsa_device_ops lan9303_netdev_ops;
 
+/* tag_ksz.c */
+extern const struct dsa_device_ops ksz_netdev_ops;
+
 #endif
diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
new file mode 100644
index 0000000..270bfb9
--- /dev/null
+++ b/net/dsa/tag_ksz.c
@@ -0,0 +1,98 @@
+/*
+ * net/dsa/tag_ksz.c - Microchip KSZ Switch tag format handling
+ * Copyright (c) 2017 Microchip Technology
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/etherdevice.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <net/dsa.h>
+#include "dsa_priv.h"
+
+/* For Ingress (Host -> KSZ), 2 bytes are added before FCS.
+ * ---------------------------------------------------------------------------
+ * DA(6bytes)|SA(6bytes)|....|Data(nbytes)|tag0(1byte)|tag1(1byte)|FCS(4bytes)
+ * ---------------------------------------------------------------------------
+ * tag0 : Prioritization (not used now)
+ * tag1 : each bit represents port (eg, 0x01=port1, 0x02=port2, 0x10=port5)
+ *
+ * For Egress (KSZ -> Host), 1 byte is added before FCS.
+ * ---------------------------------------------------------------------------
+ * DA(6bytes)|SA(6bytes)|....|Data(nbytes)|tag0(1byte)|FCS(4bytes)
+ * ---------------------------------------------------------------------------
+ * tag0 : zero-based value represents port
+ *	  (eg, 0x00=port1, 0x02=port3, 0x06=port7)
+ */
+
+static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct sk_buff *nskb;
+	int padlen;
+	u8 *tag;
+
+	padlen = 0;
+	if (skb->len < 60)
+		padlen = 60 - skb->len;
+
+	nskb = alloc_skb(NET_IP_ALIGN + skb->len + padlen + 2, GFP_ATOMIC);
+	if (!nskb) {
+		kfree_skb(skb);
+		return NULL;
+	}
+	skb_reserve(nskb, NET_IP_ALIGN);
+
+	skb_reset_mac_header(nskb);
+	skb_set_network_header(nskb, skb_network_header(skb) - skb->head);
+	skb_set_transport_header(nskb, skb_transport_header(skb) - skb->head);
+	skb_copy_and_csum_dev(skb, skb_put(nskb, skb->len));
+	kfree_skb(skb);
+
+	if (padlen) {
+		u8 *pad = skb_put(nskb, padlen);
+
+		memset(pad, 0, padlen);
+	}
+
+	tag = skb_put(nskb, 2);
+	tag[0] = 0;
+	tag[1] = 1 << p->dp->index; /* destination port */
+
+	return nskb;
+}
+
+struct sk_buff *ksz_rcv(struct sk_buff *skb, struct net_device *dev,
+			struct packet_type *pt, struct net_device *orig_dev)
+{
+	struct dsa_switch_tree *dst = dev->dsa_ptr;
+	struct dsa_switch *ds;
+	u8 *tag;
+	int source_port;
+
+	ds = dst->cpu_switch;
+
+	if (skb_linearize(skb))
+		return NULL;
+
+	tag = skb_tail_pointer(skb) - 1;
+
+	source_port = tag[0] & 7;
+	if (source_port >= ds->num_ports || !ds->ports[source_port].netdev)
+		return NULL;
+
+	pskb_trim_rcsum(skb, skb->len - 1);
+
+	skb->dev = ds->ports[source_port].netdev;
+
+	return skb;
+}
+
+const struct dsa_device_ops ksz_netdev_ops = {
+	.xmit	= ksz_xmit,
+	.rcv	= ksz_rcv,
+};
-- 
2.7.4

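The tag layout documented in tag_ksz.c above can be tried out in plain
userspace C. What follows is only a sketch under stated assumptions, not
the driver code: it works on a flat byte buffer instead of an sk_buff,
and the helper names (ksz_append_ingress_tag, ksz_strip_egress_tag) and
the KSZ_* constants are made up for illustration. The 60-byte padding,
the two-byte ingress tag, and the "& 7" egress mask are taken from the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KSZ_MIN_FRAME_LEN	60	/* pad target used by ksz_xmit() */
#define KSZ_INGRESS_TAG_LEN	2	/* tag0 + tag1, host -> switch */
#define KSZ_EGRESS_TAG_LEN	1	/* tag0 only, switch -> host */

/* Host -> switch: zero-pad the frame to 60 bytes, then append
 * tag0 (priority, unused) and tag1 (one bit per destination port).
 * Returns the new frame length, or 0 if the buffer is too small.
 * Hypothetical helper: the real code builds a new sk_buff instead.
 */
static size_t ksz_append_ingress_tag(uint8_t *buf, size_t len, size_t cap,
				     unsigned int port)
{
	size_t padlen = len < KSZ_MIN_FRAME_LEN ? KSZ_MIN_FRAME_LEN - len : 0;

	assert(port < 8);	/* tag1 is a single byte bitmap */
	if (len + padlen + KSZ_INGRESS_TAG_LEN > cap)
		return 0;

	memset(buf + len, 0, padlen);
	len += padlen;
	buf[len++] = 0;				/* tag0: prioritization */
	buf[len++] = (uint8_t)(1u << port);	/* tag1: destination port bit */
	return len;
}

/* Switch -> host: the last byte before the FCS holds the zero-based
 * source port in its low three bits. Returns the port and shortens
 * *len by one byte, or returns -1 if the frame is empty.
 */
static int ksz_strip_egress_tag(const uint8_t *buf, size_t *len)
{
	int port;

	if (*len < KSZ_EGRESS_TAG_LEN)
		return -1;

	port = buf[*len - 1] & 7;
	*len -= KSZ_EGRESS_TAG_LEN;
	return port;
}
```

With these helpers, a 14-byte frame sent to port index 4 grows to 62
bytes and ends in 0x00 0x10, which matches the "0x10=port5" example in
the header comment.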

* Re: [PATCH/RFC net-next v2 4/4] net/sched: cls_flower: allow control of tree traversal on packet parse errors
From: Cong Wang @ 2017-05-05 22:44 UTC (permalink / raw)
  To: Simon Horman
  Cc: Jiri Pirko, Jamal Hadi Salim, Dinan Gunawardena,
	Linux Kernel Network Developers, oss-drivers, Benjamin LaHaise
In-Reply-To: <1493988426-22854-5-git-send-email-simon.horman@netronome.com>

On Fri, May 5, 2017 at 5:47 AM, Simon Horman <simon.horman@netronome.com> wrote:
>
>  # tc qdisc del dev eth0 ingress; tc qdisc add dev eth0 ingress
>  # tc filter add dev eth0 protocol ip parent ffff: flower \
>        indev eth0 ip_proto udp dst_port 80 truncated drop action continue
>
[...]
> @@ -188,7 +189,7 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
>          */
>         skb_key.basic.n_proto = skb->protocol;
>         if (!skb_flow_dissect(skb, &head->dissector, &skb_key, 0))
> -               return -1;
> +               return head->err_action;

This design looks odd. If you treat matching truncated packets as
a filter like any other normal filter, then you should rely on the
appended action to return the action code, not on the filter itself.


* [PATCH] ray_cs: Avoid reading past end of buffer
From: Kees Cook @ 2017-05-05 22:38 UTC (permalink / raw)
  To: netdev; +Cc: Kalle Valo, linux-wireless, linux-kernel, Daniel Micay

Using memcpy() from a buffer that is shorter than the length copied means
the destination buffer is being filled with arbitrary data from the kernel
rodata segment. In this case, the source was made longer, since it did not
match the destination structure size. Additionally removes a needless cast.

This was found with the future CONFIG_FORTIFY_SOURCE feature.

Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 drivers/net/wireless/ray_cs.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ray_cs.c b/drivers/net/wireless/ray_cs.c
index b94479441b0c..170cd504e8ff 100644
--- a/drivers/net/wireless/ray_cs.c
+++ b/drivers/net/wireless/ray_cs.c
@@ -247,7 +247,10 @@ static const UCHAR b4_default_startup_parms[] = {
 	0x04, 0x08,		/* Noise gain, limit offset */
 	0x28, 0x28,		/* det rssi, med busy offsets */
 	7,			/* det sync thresh */
-	0, 2, 2			/* test mode, min, max */
+	0, 2, 2,		/* test mode, min, max */
+	0,			/* rx/tx delay */
+	0, 0, 0, 0, 0, 0,	/* current BSS id */
+	0			/* hop set */
 };
 
 /*===========================================================================*/
@@ -597,7 +600,7 @@ static void init_startup_params(ray_dev_t *local)
 	 *    a_beacon_period = hops    a_beacon_period = KuS
 	 *//* 64ms = 010000 */
 	if (local->fw_ver == 0x55) {
-		memcpy((UCHAR *) &local->sparm.b4, b4_default_startup_parms,
+		memcpy(&local->sparm.b4, b4_default_startup_parms,
 		       sizeof(struct b4_startup_params));
 		/* Translate sane kus input values to old build 4/5 format */
 		/* i = hop time in uS truncated to 3 bytes */
-- 
2.7.4


-- 
Kees Cook
Pixel Security


* [PATCH] qlge: Avoid reading past end of buffer
From: Kees Cook @ 2017-05-05 22:34 UTC (permalink / raw)
  To: netdev
  Cc: Harish Patil, Manish Chopra, Dept-GELinuxNICDev, linux-kernel,
	Daniel Micay

Using memcpy() from a string that is shorter than the length copied means
the destination buffer is being filled with arbitrary data from the kernel
rodata segment. Instead, use strncpy() which will fill the trailing bytes
with zeros.

This was found with the future CONFIG_FORTIFY_SOURCE feature.

Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 drivers/net/ethernet/qlogic/qlge/qlge_dbg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_dbg.c b/drivers/net/ethernet/qlogic/qlge/qlge_dbg.c
index 829be21f97b2..28ea0af89aef 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_dbg.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_dbg.c
@@ -765,7 +765,7 @@ int ql_core_dump(struct ql_adapter *qdev, struct ql_mpi_coredump *mpi_coredump)
 		sizeof(struct mpi_coredump_global_header);
 	mpi_coredump->mpi_global_header.imageSize =
 		sizeof(struct ql_mpi_coredump);
-	memcpy(mpi_coredump->mpi_global_header.idString, "MPI Coredump",
+	strncpy(mpi_coredump->mpi_global_header.idString, "MPI Coredump",
 		sizeof(mpi_coredump->mpi_global_header.idString));
 
 	/* Get generic NIC reg dump */
@@ -1255,7 +1255,7 @@ static void ql_gen_reg_dump(struct ql_adapter *qdev,
 		sizeof(struct mpi_coredump_global_header);
 	mpi_coredump->mpi_global_header.imageSize =
 		sizeof(struct ql_reg_dump);
-	memcpy(mpi_coredump->mpi_global_header.idString, "MPI Coredump",
+	strncpy(mpi_coredump->mpi_global_header.idString, "MPI Coredump",
 		sizeof(mpi_coredump->mpi_global_header.idString));
 
 
-- 
2.7.4


-- 
Kees Cook
Pixel Security

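The difference the commit message relies on can be shown in a few lines
of standalone C. This is a minimal sketch, not the qlge code: the field
size ID_STRING_LEN and both helper names are assumptions chosen for
illustration. strncpy() stops reading the source at its terminating NUL
and zero-fills the rest of the destination, whereas memcpy() with the
same length would read past the end of the string literal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ID_STRING_LEN 16	/* hypothetical size of the fixed-width field */

/* Fill a fixed-size field from a shorter string. strncpy() never reads
 * beyond the source's NUL and pads the remainder with zero bytes. Note
 * that the field is left without a terminating NUL if src is
 * ID_STRING_LEN characters or longer.
 */
static void fill_id_string(char dst[ID_STRING_LEN], const char *src)
{
	assert(dst && src);
	strncpy(dst, src, ID_STRING_LEN);
}

/* Count how many zero bytes sit at the end of a field. */
static size_t trailing_zero_bytes(const char *field, size_t len)
{
	size_t n = 0;

	while (n < len && field[len - 1 - n] == '\0')
		n++;
	return n;
}
```

Filling a 16-byte field that was previously set to 0xAA with
"MPI Coredump" (12 characters) leaves the string followed by four zero
bytes, so no stale or out-of-bounds data ends up in the dump header.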

* [PATCH] bna: ethtool: Avoid reading past end of buffer
From: Kees Cook @ 2017-05-05 22:30 UTC (permalink / raw)
  To: netdev
  Cc: Rasesh Mody, Sudarsana Kalluru, linux-kernel, Dept-GELinuxNICDev,
	Daniel Micay

Using memcpy() from a string that is shorter than the length copied means
the destination buffer is being filled with arbitrary data from the kernel
rodata segment. Instead, use strncpy() which will fill the trailing bytes
with zeros.

This was found with the future CONFIG_FORTIFY_SOURCE feature.

Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 drivers/net/ethernet/brocade/bna/bnad_ethtool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
index 286593922139..31032de5843b 100644
--- a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
+++ b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
@@ -547,8 +547,8 @@ bnad_get_strings(struct net_device *netdev, u32 stringset, u8 *string)
 		for (i = 0; i < BNAD_ETHTOOL_STATS_NUM; i++) {
 			BUG_ON(!(strlen(bnad_net_stats_strings[i]) <
 				   ETH_GSTRING_LEN));
-			memcpy(string, bnad_net_stats_strings[i],
-			       ETH_GSTRING_LEN);
+			strncpy(string, bnad_net_stats_strings[i],
+				ETH_GSTRING_LEN);
 			string += ETH_GSTRING_LEN;
 		}
 		bmap = bna_tx_rid_mask(&bnad->bna);
-- 
2.7.4


-- 
Kees Cook
Pixel Security


* [PATCH] bna: Avoid reading past end of buffer
From: Kees Cook @ 2017-05-05 22:25 UTC (permalink / raw)
  To: netdev
  Cc: Rasesh Mody, Sudarsana Kalluru, Dept-GELinuxNICDev, linux-kernel,
	Daniel Micay

Using memcpy() from a string that is shorter than the length copied means
the destination buffer is being filled with arbitrary data from the kernel
rodata segment. Instead, use strncpy() which will fill the trailing bytes
with zeros.

This was found with the future CONFIG_FORTIFY_SOURCE feature.

Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 drivers/net/ethernet/brocade/bna/bfa_ioc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/brocade/bna/bfa_ioc.c b/drivers/net/ethernet/brocade/bna/bfa_ioc.c
index 0f6811860ad5..a36e38676640 100644
--- a/drivers/net/ethernet/brocade/bna/bfa_ioc.c
+++ b/drivers/net/ethernet/brocade/bna/bfa_ioc.c
@@ -2845,7 +2845,7 @@ bfa_ioc_get_adapter_optrom_ver(struct bfa_ioc *ioc, char *optrom_ver)
 static void
 bfa_ioc_get_adapter_manufacturer(struct bfa_ioc *ioc, char *manufacturer)
 {
-	memcpy(manufacturer, BFA_MFG_NAME, BFA_ADAPTER_MFG_NAME_LEN);
+	strncpy(manufacturer, BFA_MFG_NAME, BFA_ADAPTER_MFG_NAME_LEN);
 }
 
 static void
-- 
2.7.4


-- 
Kees Cook
Pixel Security


* [PATCH V4 net-next 2/2] bonding: Prevent duplicate userspace notification
From: Vladislav Yasevich @ 2017-05-05 20:52 UTC (permalink / raw)
  To: netdev; +Cc: roopa, dsa, jiri, Vladislav Yasevich
In-Reply-To: <1494017569-12869-1-git-send-email-vyasevic@redhat.com>

Whenever a user changes bonding options, a NETDEV_CHANGEINFODATA
notification is generated, which results in an rtnetlink message being
sent.  While running 'ip monitor', we can actually see 2 messages,
one a result of the event, and the other a result of the state change
that is generated by netdev_state_change().  However, this is not
always the case. If bonding changes were done via sysfs or ifenslave
(old ioctl interface), then only 1 message is seen.

This patch removes duplicate messages in the case of using netlink
to configure bonding.  It introduces a separate function that
triggers a netdev event and uses that function in the sysfs and ioctl
cases.

This was discovered while auditing all the different events and
continues the effort of cleaning up duplicated netlink messages.

CC: David Ahern <dsa@cumulusnetworks.com>
CC: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
 drivers/net/bonding/bond_main.c    |  3 ++-
 drivers/net/bonding/bond_options.c | 27 +++++++++++++++++++++++++--
 include/net/bond_options.h         |  2 ++
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 2be7880..ac2e425 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3481,7 +3481,8 @@ static int bond_do_ioctl(struct net_device *bond_dev, struct ifreq *ifr, int cmd
 	case BOND_CHANGE_ACTIVE_OLD:
 	case SIOCBONDCHANGEACTIVE:
 		bond_opt_initstr(&newval, slave_dev->name);
-		res = __bond_opt_set(bond, BOND_OPT_ACTIVE_SLAVE, &newval);
+		res = __bond_opt_set_notify(bond, BOND_OPT_ACTIVE_SLAVE,
+					    &newval);
 		break;
 	default:
 		res = -EOPNOTSUPP;
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 1bcbb89..8ca6833 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -673,7 +673,30 @@ int __bond_opt_set(struct bonding *bond,
 out:
 	if (ret)
 		bond_opt_error_interpret(bond, opt, ret, val);
-	else if (bond->dev->reg_state == NETREG_REGISTERED)
+
+	return ret;
+}
+/**
+ * __bond_opt_set_notify - set a bonding option
+ * @bond: target bond device
+ * @option: option to set
+ * @val: value to set it to
+ *
+ * This function is used to change the bond's option value and trigger
+ * a notification to user space. It can be used for both enabling/changing
+ * an option and for disabling it. RTNL lock must be obtained before calling
+ * this function.
+ */
+int __bond_opt_set_notify(struct bonding *bond,
+			  unsigned int option, struct bond_opt_value *val)
+{
+	int ret = -ENOENT;
+
+	ASSERT_RTNL();
+
+	ret = __bond_opt_set(bond, option, val);
+
+	if (!ret && (bond->dev->reg_state == NETREG_REGISTERED))
 		call_netdevice_notifiers(NETDEV_CHANGEINFODATA, bond->dev);
 
 	return ret;
@@ -696,7 +719,7 @@ int bond_opt_tryset_rtnl(struct bonding *bond, unsigned int option, char *buf)
 	if (!rtnl_trylock())
 		return restart_syscall();
 	bond_opt_initstr(&optval, buf);
-	ret = __bond_opt_set(bond, option, &optval);
+	ret = __bond_opt_set_notify(bond, option, &optval);
 	rtnl_unlock();
 
 	return ret;
diff --git a/include/net/bond_options.h b/include/net/bond_options.h
index 1797235..d79d28f 100644
--- a/include/net/bond_options.h
+++ b/include/net/bond_options.h
@@ -104,6 +104,8 @@ struct bond_option {
 
 int __bond_opt_set(struct bonding *bond, unsigned int option,
 		   struct bond_opt_value *val);
+int __bond_opt_set_notify(struct bonding *bond, unsigned int option,
+			  struct bond_opt_value *val);
 int bond_opt_tryset_rtnl(struct bonding *bond, unsigned int option, char *buf);
 
 const struct bond_opt_value *bond_opt_parse(const struct bond_option *opt,
-- 
2.7.4


* [PATCH V3 net-next 1/2] rtnl: Add support for netdev event to link messages
From: Vladislav Yasevich @ 2017-05-05 20:52 UTC (permalink / raw)
  To: netdev; +Cc: roopa, dsa, jiri, Vladislav Yasevich
In-Reply-To: <1494017569-12869-1-git-send-email-vyasevic@redhat.com>

When netdev events happen, a rtnetlink_event() handler will send
messages for every event in its white list.  These messages contain
current information about a particular device, but they do not include
the information about which event just happened.  So, it is impossible
to tell what just happened for these events.

This patch adds a new extension to the RTM_NEWLINK message called IFLA_EVENT
that carries an encoding of the event that triggered this
message.  This allows the message consumer to easily determine
if it needs to perform certain actions.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
 include/linux/rtnetlink.h    |  3 ++-
 include/uapi/linux/if_link.h | 11 ++++++++
 net/core/dev.c               |  2 +-
 net/core/rtnetlink.c         | 62 +++++++++++++++++++++++++++++++++++++-------
 4 files changed, 67 insertions(+), 11 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 57e5484..0459018 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -18,7 +18,8 @@ extern int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst,
 
 void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change, gfp_t flags);
 struct sk_buff *rtmsg_ifinfo_build_skb(int type, struct net_device *dev,
-				       unsigned change, gfp_t flags);
+				       unsigned change, unsigned long event,
+				       gfp_t flags);
 void rtmsg_ifinfo_send(struct sk_buff *skb, struct net_device *dev,
 		       gfp_t flags);
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 8e56ac7..0d4501e 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -157,6 +157,7 @@ enum {
 	IFLA_GSO_MAX_SIZE,
 	IFLA_PAD,
 	IFLA_XDP,
+	IFLA_EVENT,
 	__IFLA_MAX
 };
 
@@ -902,4 +903,14 @@ enum {
 
 #define IFLA_XDP_MAX (__IFLA_XDP_MAX - 1)
 
+enum {
+	IFLA_EVENT_UNSPEC,
+	IFLA_EVENT_REBOOT,
+	IFLA_EVENT_FEAT_CHANGE,
+	IFLA_EVENT_BONDING_FAILOVER,
+	IFLA_EVENT_NOTIFY_PEERS,
+	IFLA_EVENT_RESEND_IGMP,
+	IFLA_EVENT_CHANGE_INFO_DATA,
+};
+
 #endif /* _UAPI_LINUX_IF_LINK_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index d07aa5f..e111521 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6988,7 +6988,7 @@ static void rollback_registered_many(struct list_head *head)
 
 		if (!dev->rtnl_link_ops ||
 		    dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
-			skb = rtmsg_ifinfo_build_skb(RTM_DELLINK, dev, ~0U,
+			skb = rtmsg_ifinfo_build_skb(RTM_DELLINK, dev, ~0U, 0,
 						     GFP_KERNEL);
 
 		/*
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 6e67315..43d2223 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -942,6 +942,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_SWITCH_ID */
 	       + nla_total_size(IFNAMSIZ) /* IFLA_PHYS_PORT_NAME */
 	       + rtnl_xdp_size() /* IFLA_XDP */
+	       + nla_total_size(4)  /* IFLA_EVENT */
 	       + nla_total_size(1); /* IFLA_PROTO_DOWN */
 
 }
@@ -1286,9 +1287,40 @@ static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev)
 	return err;
 }
 
+static int rtnl_fill_link_event(struct sk_buff *skb, unsigned long event)
+{
+	u32 rtnl_event;
+
+	switch (event) {
+	case NETDEV_REBOOT:
+		rtnl_event = IFLA_EVENT_REBOOT;
+		break;
+	case NETDEV_FEAT_CHANGE:
+		rtnl_event = IFLA_EVENT_FEAT_CHANGE;
+		break;
+	case NETDEV_BONDING_FAILOVER:
+		rtnl_event = IFLA_EVENT_BONDING_FAILOVER;
+		break;
+	case NETDEV_NOTIFY_PEERS:
+		rtnl_event = IFLA_EVENT_NOTIFY_PEERS;
+		break;
+	case NETDEV_RESEND_IGMP:
+		rtnl_event = IFLA_EVENT_RESEND_IGMP;
+		break;
+	case NETDEV_CHANGEINFODATA:
+		rtnl_event = IFLA_EVENT_CHANGE_INFO_DATA;
+		break;
+	default:
+		return 0;
+	}
+
+	return nla_put_u32(skb, IFLA_EVENT, rtnl_event);
+}
+
 static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			    int type, u32 pid, u32 seq, u32 change,
-			    unsigned int flags, u32 ext_filter_mask)
+			    unsigned int flags, u32 ext_filter_mask,
+			    unsigned long event)
 {
 	struct ifinfomsg *ifm;
 	struct nlmsghdr *nlh;
@@ -1337,6 +1369,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 	    nla_put_u8(skb, IFLA_PROTO_DOWN, dev->proto_down))
 		goto nla_put_failure;
 
+	if (rtnl_fill_link_event(skb, event))
+		goto nla_put_failure;
+
 	if (rtnl_fill_link_ifmap(skb, dev))
 		goto nla_put_failure;
 
@@ -1471,6 +1506,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_LINK_NETNSID]	= { .type = NLA_S32 },
 	[IFLA_PROTO_DOWN]	= { .type = NLA_U8 },
 	[IFLA_XDP]		= { .type = NLA_NESTED },
+	[IFLA_EVENT]		= { .type = NLA_U32 },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -1630,7 +1666,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 					       NETLINK_CB(cb->skb).portid,
 					       cb->nlh->nlmsg_seq, 0,
 					       flags,
-					       ext_filter_mask);
+					       ext_filter_mask, 0);
 			/* If we ran out of room on the first message,
 			 * we're in trouble
 			 */
@@ -2733,7 +2769,7 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		return -ENOBUFS;
 
 	err = rtnl_fill_ifinfo(nskb, dev, RTM_NEWLINK, NETLINK_CB(skb).portid,
-			       nlh->nlmsg_seq, 0, 0, ext_filter_mask);
+			       nlh->nlmsg_seq, 0, 0, ext_filter_mask, 0);
 	if (err < 0) {
 		/* -EMSGSIZE implies BUG in if_nlmsg_size */
 		WARN_ON(err == -EMSGSIZE);
@@ -2805,7 +2841,8 @@ static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
 }
 
 struct sk_buff *rtmsg_ifinfo_build_skb(int type, struct net_device *dev,
-				       unsigned int change, gfp_t flags)
+				       unsigned int change,
+				       unsigned long event, gfp_t flags)
 {
 	struct net *net = dev_net(dev);
 	struct sk_buff *skb;
@@ -2816,7 +2853,7 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type, struct net_device *dev,
 	if (skb == NULL)
 		goto errout;
 
-	err = rtnl_fill_ifinfo(skb, dev, type, 0, 0, change, 0, 0);
+	err = rtnl_fill_ifinfo(skb, dev, type, 0, 0, change, 0, 0, event);
 	if (err < 0) {
 		/* -EMSGSIZE implies BUG in if_nlmsg_size() */
 		WARN_ON(err == -EMSGSIZE);
@@ -2837,18 +2874,25 @@ void rtmsg_ifinfo_send(struct sk_buff *skb, struct net_device *dev, gfp_t flags)
 	rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, flags);
 }
 
-void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
-		  gfp_t flags)
+static void rtmsg_ifinfo_event(int type, struct net_device *dev,
+			       unsigned int change, unsigned long event,
+			       gfp_t flags)
 {
 	struct sk_buff *skb;
 
 	if (dev->reg_state != NETREG_REGISTERED)
 		return;
 
-	skb = rtmsg_ifinfo_build_skb(type, dev, change, flags);
+	skb = rtmsg_ifinfo_build_skb(type, dev, change, event, flags);
 	if (skb)
 		rtmsg_ifinfo_send(skb, dev, flags);
 }
+
+void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
+		  gfp_t flags)
+{
+	rtmsg_ifinfo_event(type, dev, change, 0, flags);
+}
 EXPORT_SYMBOL(rtmsg_ifinfo);
 
 static int nlmsg_populate_fdb_fill(struct sk_buff *skb,
@@ -4152,7 +4196,7 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi
 	case NETDEV_NOTIFY_PEERS:
 	case NETDEV_RESEND_IGMP:
 	case NETDEV_CHANGEINFODATA:
-		rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL);
+		rtmsg_ifinfo_event(RTM_NEWLINK, dev, 0, event, GFP_KERNEL);
 		break;
 	default:
 		break;
-- 
2.7.4

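On the receiving side, a consumer would look for the new attribute
while walking the attributes of an RTM_NEWLINK message. The following
is a rough userspace sketch, assuming a Linux system with the standard
<linux/rtnetlink.h> macros (RTA_OK, RTA_NEXT, RTA_DATA, RTA_PAYLOAD).
The helper name rta_find_u32 is hypothetical, and the attribute type is
passed in as a parameter because the actual IFLA_EVENT number comes
from the patched <linux/if_link.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/rtnetlink.h>

/* Walk a run of rtattrs of total length len and copy out the first
 * u32 payload whose attribute type matches. Returns 0 on success and
 * -1 if no such attribute is present. Hypothetical helper; a real
 * consumer would pass IFLA_EVENT as the type.
 */
static int rta_find_u32(struct rtattr *rta, int len, unsigned short type,
			uint32_t *out)
{
	assert(rta && out);

	for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type == type &&
		    RTA_PAYLOAD(rta) >= (int)sizeof(*out)) {
			/* memcpy avoids assuming the payload is aligned */
			memcpy(out, RTA_DATA(rta), sizeof(*out));
			return 0;
		}
	}
	return -1;
}
```

A caller would compare the returned value against the IFLA_EVENT_*
constants from the patch, for example IFLA_EVENT_NOTIFY_PEERS, and
treat a missing attribute as "not generated by a netdev event".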

* [PATCH v4 net-next 0/2] rtnetlink: Updates to rtnetlink_event()
From: Vladislav Yasevich @ 2017-05-05 20:52 UTC (permalink / raw)
  To: netdev; +Cc: roopa, dsa, jiri, Vladislav Yasevich

This version 4 series came out of the conversation that started
as a result of my first attempt to add netdevice event info to netlink messages.

First is the patch to add the IFLA_EVENT attribute to the netlink message.  It
supports only currently white-listed events.
Like before, this is just an attribute that gets added to the rtnetlink
message only when the message was generated as a result of a netdev event.
In my case, this is necessary since I want to trap the NETDEV_NOTIFY_PEERS
event (also possibly the NETDEV_RESEND_IGMP event) and perform certain actions
in user space.  This is not possible today since the messages generated as
a result of netdev events do not usually contain any changed data.  They
are just notifications.  This patch exposes this notification type to
userspace.

Second, I remove duplicate messages that are a result of a change to bonding
options.  If netlink is used to configure bonding options, 2 messages
are generated, one as a result of the NETDEV_CHANGEINFODATA event triggered by
bonding code and one as a result of device state changes triggered by
netdev_state_change (called from do_setlink).

I will also update my patch to iproute that will show this data
through 'ip monitor'.

V4:
  * Removed the patch that removed NETDEV_CHANGENAME from event whitelist.
    It doesn't trigger duplicate messages since name changes can only be
    done while device is down and netdev_state_change() doesn't report
    changes while device is down.
  * Added a patch to clean-up duplicate messages on bonding option changes.

V3: Rebased.  Cleaned-up duplicate event.

V2: Added missed events (from David Ahern)

Vladislav Yasevich (2):
  rtnl: Add support for netdev event to link messages
  bonding: Prevent duplicate userspace notification

 drivers/net/bonding/bond_main.c    |  3 +-
 drivers/net/bonding/bond_options.c | 27 +++++++++++++++--
 include/linux/rtnetlink.h          |  3 +-
 include/net/bond_options.h         |  2 ++
 include/uapi/linux/if_link.h       | 11 +++++++
 net/core/dev.c                     |  2 +-
 net/core/rtnetlink.c               | 62 ++++++++++++++++++++++++++++++++------
 7 files changed, 96 insertions(+), 14 deletions(-)

-- 
2.7.4


* Re: [PATCH v2] vlan: Keep NETIF_F_HW_CSUM similar to other software devices
From: Alexander Duyck @ 2017-05-05 20:37 UTC (permalink / raw)
  To: Vladislav Yasevich; +Cc: Netdev, Michal Kubecek, avagin, Vladislav Yasevich
In-Reply-To: <1494015461-12192-1-git-send-email-vyasevic@redhat.com>

On Fri, May 5, 2017 at 1:17 PM, Vladislav Yasevich <vyasevich@gmail.com> wrote:
> Vlan devices, like all other software devices, enable
> NETIF_F_HW_CSUM feature.  However, unlike all the other
> software devices, vlans will switch to using IP|IPV6_CSUM
> features, if the underlying device uses them.  In these situations,
> checksum offload features on the vlan device can't be controlled
> via ethtool.
>
> This patch makes vlans keep HW_CSUM feature if the underlying
> device supports checksum offloading.  This makes vlan devices
> behave like other software devices, and restores control to the
> user.
>
> A side-effect is that some offload settings (typically UFO)
> may be enabled on the vlan device while being disabled on the HW.
> However, the GSO code will correctly process the packets. This
> actually results in slightly better raw throughput.
>
> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>

Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>

> ---
> V2:  posted the right patch.
>
>  net/8021q/vlan_dev.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> index 9ee5787..ff12cf3 100644
> --- a/net/8021q/vlan_dev.c
> +++ b/net/8021q/vlan_dev.c
> @@ -626,11 +626,18 @@ static netdev_features_t vlan_dev_fix_features(struct net_device *dev,
>  {
>         struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
>         netdev_features_t old_features = features;
> +       netdev_features_t lower_features;
>
> -       features = netdev_intersect_features(features, real_dev->vlan_features);
> -       features |= NETIF_F_RXCSUM;
> -       features = netdev_intersect_features(features, real_dev->features);
> +       lower_features = netdev_intersect_features((real_dev->vlan_features |
> +                                                   NETIF_F_RXCSUM),
> +                                                  real_dev->features);
>
> +       /* Add HW_CSUM setting to preserve user ability to control
> +        * checksum offload on the vlan device.
> +        */
> +       if (lower_features & (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))
> +               lower_features |= NETIF_F_HW_CSUM;
> +       features = netdev_intersect_features(features, lower_features);
>         features |= old_features & (NETIF_F_SOFT_FEATURES | NETIF_F_GSO_SOFTWARE);
>         features |= NETIF_F_LLTX;
>
> --
> 2.7.4
>


* bpf pointer alignment validation
From: David Miller @ 2017-05-05 20:20 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev

Alexei and Daniel, I just wanted to let you guys know that I'm working
on an alignment tracker in the BPF verifier.

After trying several approaches I think what is going to work is to
maintain state like this:

1) For non-pointer registers, we record what we can prove is the
   minimum alignment of the value held in the register.

   So for example:

	r5 <<= 2

   would result in a min_align value of '4'.

   These alignment values assist us when check_packet_ptr_add() has to
   transition a pointer register and allocate an ID to it.

2) Packet pointer registers have a base alignment (which is something
   relative to NET_IP_ALIGN).

   Then there is something called an auxiliary offset alignment.

   Any time we add some non-constant value to a pointer, we apply the
   value's min alignment to the pointer register's auxiliary offset
   alignment.

Then check_pkt_ptr_alignment has its logic adjusted such that it
takes all of this new information into account.

First, it makes the existing test:

        if ((NET_IP_ALIGN + reg->off + off) % size != 0) {

except that NET_IP_ALIGN is replaced with the packet pointer base
alignment (which we'll set in the context load helpers, thus putting
the NET_IP_ALIGN detail back into the networking code).

So that turns into something like:

        if ((reg->ptr_base_align + reg->off + off) % size != 0) {

Next, if an ID has been assigned, we have to also check the auxiliary
alignment:

	if (reg->id && (reg->aux_off_align % size) != 0) {

Otherwise, we can prove that the size access will work.

I think in order for this to work properly, we also have to stop
"forgetting" the reg->off value when we assign an ID to a pointer
register.  However, the reg->range we still have to always kill in
this situation.

Anyways, I'll play with this design and see what happens...  Feedback
is of course welcome.
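
For illustration only, the state tracking described above could be
modeled in plain userspace C roughly as follows.  This is a sketch, not
verifier code: struct reg_model, scalar_shl(), pkt_ptr_add_scalar() and
pkt_access_aligned() are hypothetical names, and taking the smaller of
two alignments when a second variable offset is added is an assumption
about how "apply" would work for power-of-two alignments.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the verifier's register state.  Only off, id,
 * range, ptr_base_align and aux_off_align appear in the message; the
 * struct itself and min_align as a field name are assumptions.
 */
struct reg_model {
	uint32_t id;             /* non-zero once a variable offset was added */
	int32_t  off;            /* accumulated constant offset */
	uint32_t range;          /* verified accessible bytes */
	uint32_t ptr_base_align; /* packet pointers: base alignment */
	uint32_t aux_off_align;  /* packet pointers: 0 means nothing added yet */
	uint32_t min_align;      /* scalars: provable minimum alignment */
};

/* r <<= shift: the provable alignment doubles per shifted bit
 * (saturating at 2^31 so the value cannot overflow).
 */
static void scalar_shl(struct reg_model *r, unsigned int shift)
{
	uint32_t a = r->min_align ? r->min_align : 1;

	while (shift-- && a < (1u << 31))
		a <<= 1;
	r->min_align = a;
}

/* ptr += scalar with a non-constant scalar: assign an ID, keep off,
 * kill range, and fold the scalar's alignment into aux_off_align.
 */
static void pkt_ptr_add_scalar(struct reg_model *ptr,
			       const struct reg_model *scalar,
			       uint32_t new_id)
{
	uint32_t a = scalar->min_align ? scalar->min_align : 1;

	ptr->id = new_id;
	ptr->range = 0;
	if (!ptr->aux_off_align || a < ptr->aux_off_align)
		ptr->aux_off_align = a;
}

/* The two tests from the message: constant part first, then the
 * auxiliary alignment if an ID has been assigned.
 */
static bool pkt_access_aligned(const struct reg_model *ptr, int off, int size)
{
	int64_t total = (int64_t)ptr->ptr_base_align + ptr->off + off;

	if (total % size != 0)
		return false;
	if (ptr->id && (ptr->aux_off_align % (uint32_t)size) != 0)
		return false;
	return true;
}
```

With a base alignment of 2 and a constant offset of 14, a 4-byte access
passes; after adding a scalar whose min_align is 4, a 4-byte access
still passes while an 8-byte access is rejected.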

^ permalink raw reply

* Re: [PATCH v5 04/20] dt-bindings: syscon: Add DT bindings documentation for Allwinner syscon
From: Rob Herring @ 2017-05-05 20:20 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: mark.rutland-5wv7dgnIgG8,
	maxime.ripard-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8, wens-jdAy2FN1RRM,
	linux-I+IVW8TIWO2tmTQ+vhA3Yw, catalin.marinas-5wv7dgnIgG8,
	will.deacon-5wv7dgnIgG8, peppe.cavallaro-qxv4g6HH51o,
	alexandre.torgue-qxv4g6HH51o, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <20170501124520.3769-5-clabbe.montjoie-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Mon, May 01, 2017 at 02:45:04PM +0200, Corentin Labbe wrote:
> This patch adds documentation for Device-Tree bindings for the
> syscon present in allwinner devices.
> 
> Signed-off-by: Corentin Labbe <clabbe.montjoie-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
>  .../devicetree/bindings/misc/allwinner,syscon.txt     | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/misc/allwinner,syscon.txt

Acked-by: Rob Herring <robh-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

^ permalink raw reply

* [PATCH v2] vlan: Keep NETIF_F_HW_CSUM similar to other software devices
From: Vladislav Yasevich @ 2017-05-05 20:17 UTC (permalink / raw)
  To: netdev; +Cc: mkubecek, alexander.duyck, avagin, Vladislav Yasevich

Vlan devices, like all other software devices, enable the
NETIF_F_HW_CSUM feature.  However, unlike all the other
software devices, vlans will switch to using IP|IPV6_CSUM
features if the underlying device uses them.  In these situations,
checksum offload features on the vlan device can't be controlled
via ethtool.

This patch makes vlans keep HW_CSUM feature if the underlying
device supports checksum offloading.  This makes vlan devices
behave like other software devices, and restores control to the
user.

A side-effect is that some offload settings (typically UFO)
may be enabled on the vlan device while being disabled on the HW.
However, the GSO code will correctly process the packets. This
actually results in slightly better raw throughput.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
V2:  posted the right patch.

 net/8021q/vlan_dev.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 9ee5787..ff12cf3 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -626,11 +626,18 @@ static netdev_features_t vlan_dev_fix_features(struct net_device *dev,
 {
 	struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
 	netdev_features_t old_features = features;
+	netdev_features_t lower_features;

-	features = netdev_intersect_features(features, real_dev->vlan_features);
-	features |= NETIF_F_RXCSUM;
-	features = netdev_intersect_features(features, real_dev->features);
+	lower_features = netdev_intersect_features((real_dev->vlan_features |
+						    NETIF_F_RXCSUM),
+						   real_dev->features);

+	/* Add HW_CSUM setting to preserve user ability to control
+	 * checksum offload on the vlan device.
+	 */
+	if (lower_features & (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))
+		lower_features |= NETIF_F_HW_CSUM;
+	features = netdev_intersect_features(features, lower_features);
 	features |= old_features & (NETIF_F_SOFT_FEATURES | NETIF_F_GSO_SOFTWARE);
 	features |= NETIF_F_LLTX;

-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH] vlan: Keep NETIF_F_HW_CSUM similar to other software devices
From: Vlad Yasevich @ 2017-05-05 20:12 UTC (permalink / raw)
  To: Alexander Duyck, Vladislav Yasevich; +Cc: Netdev, Michal Kubecek, avagin
In-Reply-To: <CAKgT0UdF_Mk4aEDKS2L_qNOTk_UhSpBP6dHeTN2Y_wz1vj-B2g@mail.gmail.com>

On 05/05/2017 04:01 PM, Alexander Duyck wrote:
> On Fri, May 5, 2017 at 12:20 PM, Vladislav Yasevich <vyasevich@gmail.com> wrote:
>> Vlan devices, like all other software devices, enable the
>> NETIF_F_HW_CSUM feature.  However, unlike all the other
>> software devices, vlans will switch to using IP|IPV6_CSUM
>> features if the underlying device uses them.  In these situations,
>> checksum offload features on the vlan device can't be controlled
>> via ethtool.
>>
>> This patch makes vlans keep HW_CSUM feature if the underlying
>> device supports checksum offloading.  This makes vlan devices
>> behave like other software devices, and restores control to the
>> user.
>>
>> A side-effect is that some offload settings (typically UFO)
>> may be enabled on the vlan device while being disabled on the HW.
>> However, the GSO code will correctly process the packets. This
>> actually results in slightly better raw throughput.
>>
>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>> ---
>>  net/8021q/vlan_dev.c | 10 ++++++++--
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
>> index 9ee5787..ffc8167 100644
>> --- a/net/8021q/vlan_dev.c
>> +++ b/net/8021q/vlan_dev.c
>> @@ -626,10 +626,16 @@ static netdev_features_t vlan_dev_fix_features(struct net_device *dev,
>>  {
>>         struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
>>         netdev_features_t old_features = features;
>> +       netdev_features_t real_dev_features = real_dev->features;
>>
>> -       features = netdev_intersect_features(features, real_dev->vlan_features);
>> +       features = netdev_intersect_features(features,
>> +                                            (real_dev->vlan_features |
>> +                                             NETIF_F_HW_CSUM));
> 
> You might want to change the ordering on all this.
> 
> You could start out with a value based on the intersection of
> real_dev->features and real_dev->vlan_features. Then you don't need to
> mess around with this extra piece where you are having OR in HW_CSUM.

You know,  I did that and that was the patch I meant to send... I had
3 different versions of this thing trying to find the best way...

Let me repost, since some of the rest of the changes go away.

-vlad

> That way you don't risk losing track of the state of the hardware
> checksum offload in terms of vlan_features as it should completely
> clear all of the checksums if none of them are supported in hardware.
> 
>>         features |= NETIF_F_RXCSUM;
> 
> This line would probably need to be changed to OR NETIF_F_RXCSUM with
> the real_dev->vlan_features when we perform the first intersect test.
> That way we are guaranteed to report RXCSUM if the underlying device
> supports it. It might just be worth while to force this into the
> vlan_features for all devices in register_netdevice() then we wouldn't
> need this line at all and it probably makes sense since it would allow
> us to save a few extra cycles/instructions by combining it with the
> line that was adding high dma support.
> 
>> -       features = netdev_intersect_features(features, real_dev->features);
>> +       if (real_dev_features & (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))
>> +               real_dev_features |= NETIF_F_HW_CSUM;
>> +
>> +       features = netdev_intersect_features(features, real_dev_features);
> 
> This part all looks good.
> 
> My only advice like I said would be to record the vlan_features
> intersection first based on the real_dev. That way you don't risk
> losing state data from real device if for some reason it doesn't
> support checksum offload with VLAN tagged frames.
> 
>>         features |= old_features & (NETIF_F_SOFT_FEATURES | NETIF_F_GSO_SOFTWARE);
>>         features |= NETIF_F_LLTX;
>> --
>> 2.7.4
>>

^ permalink raw reply

* Re: [PATCH] vlan: Keep NETIF_F_HW_CSUM similar to other software devices
From: Alexander Duyck @ 2017-05-05 20:01 UTC (permalink / raw)
  To: Vladislav Yasevich; +Cc: Netdev, Michal Kubecek, avagin, Vladislav Yasevich
In-Reply-To: <1494012038-5776-1-git-send-email-vyasevic@redhat.com>

On Fri, May 5, 2017 at 12:20 PM, Vladislav Yasevich <vyasevich@gmail.com> wrote:
> Vlan devices, like all other software devices, enable the
> NETIF_F_HW_CSUM feature.  However, unlike all the other
> software devices, vlans will switch to using IP|IPV6_CSUM
> features if the underlying device uses them.  In these situations,
> checksum offload features on the vlan device can't be controlled
> via ethtool.
>
> This patch makes vlans keep HW_CSUM feature if the underlying
> device supports checksum offloading.  This makes vlan devices
> behave like other software devices, and restores control to the
> user.
>
> A side-effect is that some offload settings (typically UFO)
> may be enabled on the vlan device while being disabled on the HW.
> However, the GSO code will correctly process the packets. This
> actually results in slightly better raw throughput.
>
> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> ---
>  net/8021q/vlan_dev.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> index 9ee5787..ffc8167 100644
> --- a/net/8021q/vlan_dev.c
> +++ b/net/8021q/vlan_dev.c
> @@ -626,10 +626,16 @@ static netdev_features_t vlan_dev_fix_features(struct net_device *dev,
>  {
>         struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
>         netdev_features_t old_features = features;
> +       netdev_features_t real_dev_features = real_dev->features;
>
> -       features = netdev_intersect_features(features, real_dev->vlan_features);
> +       features = netdev_intersect_features(features,
> +                                            (real_dev->vlan_features |
> +                                             NETIF_F_HW_CSUM));

You might want to change the ordering on all this.

You could start out with a value based on the intersection of
real_dev->features and real_dev->vlan_features. Then you don't need to
mess around with this extra piece where you are having OR in HW_CSUM.
That way you don't risk losing track of the state of the hardware
checksum offload in terms of vlan_features as it should completely
clear all of the checksums if none of them are supported in hardware.

>         features |= NETIF_F_RXCSUM;

This line would probably need to be changed to OR NETIF_F_RXCSUM with
the real_dev->vlan_features when we perform the first intersect test.
That way we are guaranteed to report RXCSUM if the underlying device
supports it. It might just be worth while to force this into the
vlan_features for all devices in register_netdevice() then we wouldn't
need this line at all and it probably makes sense since it would allow
us to save a few extra cycles/instructions by combining it with the
line that was adding high dma support.

> -       features = netdev_intersect_features(features, real_dev->features);
> +       if (real_dev_features & (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))
> +               real_dev_features |= NETIF_F_HW_CSUM;
> +
> +       features = netdev_intersect_features(features, real_dev_features);

This part all looks good.

My only advice like I said would be to record the vlan_features
intersection first based on the real_dev. That way you don't risk
losing state data from real device if for some reason it doesn't
support checksum offload with VLAN tagged frames.
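
To make the ordering argument concrete, here is a small userspace model
comparing the original computation with the "intersect the lower device
first" ordering.  It is only a sketch: the F_* bit values are arbitrary
stand-ins, intersect_features() reflects my understanding of how
netdev_intersect_features() treats HW_CSUM and should be read as an
assumption, and the soft-feature and LLTX handling is left out.

```c
#include <stdint.h>

typedef uint64_t features_t;

/* Arbitrary stand-in bits, not the kernel's NETIF_F_* values. */
#define F_IP_CSUM	(1ULL << 0)
#define F_HW_CSUM	(1ULL << 1)
#define F_IPV6_CSUM	(1ULL << 2)
#define F_RXCSUM	(1ULL << 3)

/* Assumed model of the kernel helper: if only one side has HW_CSUM,
 * that side is treated as also offering IP_CSUM and IPV6_CSUM before
 * the two masks are ANDed.
 */
static features_t intersect_features(features_t f1, features_t f2)
{
	if ((f1 ^ f2) & F_HW_CSUM) {
		if (f1 & F_HW_CSUM)
			f1 |= F_IP_CSUM | F_IPV6_CSUM;
		else
			f2 |= F_IP_CSUM | F_IPV6_CSUM;
	}
	return f1 & f2;
}

/* Ordering before the patch: the requested features are narrowed step
 * by step, so HW_CSUM degrades to IP|IPV6_CSUM.
 */
static features_t vlan_fix_old(features_t requested, features_t real,
			       features_t real_vlan)
{
	features_t f = intersect_features(requested, real_vlan);

	f |= F_RXCSUM;
	return intersect_features(f, real);
}

/* Suggested ordering: compute what the lower device offers for VLAN
 * traffic first, advertise HW_CSUM on top of any protocol-specific
 * checksum offload, then intersect with the request.
 */
static features_t vlan_fix_new(features_t requested, features_t real,
			       features_t real_vlan)
{
	features_t lower = intersect_features(real_vlan | F_RXCSUM, real);

	if (lower & (F_IP_CSUM | F_IPV6_CSUM))
		lower |= F_HW_CSUM;
	return intersect_features(requested, lower);
}
```

For a lower device with IP_CSUM|IPV6_CSUM|RXCSUM and a request of
HW_CSUM|RXCSUM, the old ordering yields IP_CSUM|IPV6_CSUM|RXCSUM while
the new one yields HW_CSUM|RXCSUM.  If the lower device has no transmit
checksum offload at all, the new ordering clears every checksum bit
except RXCSUM.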

>         features |= old_features & (NETIF_F_SOFT_FEATURES | NETIF_F_GSO_SOFTWARE);
>         features |= NETIF_F_LLTX;
> --
> 2.7.4
>

^ permalink raw reply

* [PATCH net] tcp: make congestion control optionally skip slow start after idle
From: Wei Wang @ 2017-05-05 19:53 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: Yuchung Cheng, Neal Cardwell, Eric Dumazet, Wei Wang

From: Wei Wang <weiwan@google.com>

Congestion control modules that want full control over congestion
control behavior do not want the cwnd modifications controlled by
the sysctl_tcp_slow_start_after_idle code path.
So skip those code paths for CC modules that use the cong_control()
API.
As an example, those cwnd effects are not desired for the BBR congestion
control algorithm.

Fixes: c0402760f565 ("tcp: new CC hook to set sending rate with rate_sample in any CA state")
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 include/net/tcp.h     | 4 +++-
 net/ipv4/tcp_output.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 270e5cc43c99..4e16486802fc 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1234,10 +1234,12 @@ void tcp_cwnd_restart(struct sock *sk, s32 delta);
 
 static inline void tcp_slow_start_after_idle_check(struct sock *sk)
 {
+	const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
 	struct tcp_sock *tp = tcp_sk(sk);
 	s32 delta;
 
-	if (!sysctl_tcp_slow_start_after_idle || tp->packets_out)
+	if (!sysctl_tcp_slow_start_after_idle || tp->packets_out ||
+	    ca_ops->cong_control)
 		return;
 	delta = tcp_time_stamp - tp->lsndtime;
 	if (delta > inet_csk(sk)->icsk_rto)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 60111a0fc201..4858e190f6ac 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1514,6 +1514,7 @@ static void tcp_cwnd_application_limited(struct sock *sk)
 
 static void tcp_cwnd_validate(struct sock *sk, bool is_cwnd_limited)
 {
+	const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
 	struct tcp_sock *tp = tcp_sk(sk);
 
 	/* Track the maximum number of outstanding packets in each
@@ -1536,7 +1537,8 @@ static void tcp_cwnd_validate(struct sock *sk, bool is_cwnd_limited)
 			tp->snd_cwnd_used = tp->packets_out;
 
 		if (sysctl_tcp_slow_start_after_idle &&
-		    (s32)(tcp_time_stamp - tp->snd_cwnd_stamp) >= inet_csk(sk)->icsk_rto)
+		    (s32)(tcp_time_stamp - tp->snd_cwnd_stamp) >= inet_csk(sk)->icsk_rto &&
+		    !ca_ops->cong_control)
 			tcp_cwnd_application_limited(sk);
 
 		/* The following conditions together indicate the starvation
-- 
2.13.0.rc1.294.g07d810a77f-goog
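
As a rough illustration of the first hunk, the decision made by
tcp_slow_start_after_idle_check() after this change can be modeled in
userspace as below.  This is a sketch under assumptions: struct
conn_model and should_restart_cwnd_after_idle() are hypothetical names,
and timestamps are plain 32-bit tick counters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, flattened view of the socket state the check reads. */
struct conn_model {
	bool     has_cong_control; /* CC module implements cong_control() */
	uint32_t packets_out;      /* packets currently in flight */
	uint32_t lsndtime;         /* timestamp of the last transmitted data */
	uint32_t rto;              /* retransmission timeout, same units */
};

/* Returns true when the cwnd restart after idle would be triggered.
 * A module that provides cong_control() opts out entirely, regardless
 * of the sysctl setting.
 */
static bool should_restart_cwnd_after_idle(const struct conn_model *c,
					   bool sysctl_slow_start_after_idle,
					   uint32_t now)
{
	int32_t delta;

	if (!sysctl_slow_start_after_idle || c->packets_out ||
	    c->has_cong_control)
		return false;

	/* Signed difference keeps the comparison valid across wraparound. */
	delta = (int32_t)(now - c->lsndtime);
	return delta > (int32_t)c->rto;
}
```

A connection idle for longer than its RTO triggers the restart with a
classic module, but not when has_cong_control is set.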

^ permalink raw reply related

* Re: [PATCH v4 binutils] Add BPF support to binutils...
From: David Miller @ 2017-05-05 19:43 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev, xdp-newbies
In-Reply-To: <20170501.235158.89801140704756675.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Mon, 01 May 2017 23:51:58 -0400 (EDT)

> From: Alexei Starovoitov <ast@fb.com>
> Date: Mon, 1 May 2017 20:49:21 -0700
> 
>> (gdb) x/10i bpf_prog1
>>    0x0 <bpf_prog1>:	ldimm64	r0, 590618314553
>>    0x10 <bpf_prog1+16>:	stdw	[r1+-8], r10
>>    0x18 <bpf_prog1+24>:	lddw	r10, [r1+-8]
>>    0x20 <bpf_prog1+32>:	add	r0, -1879113726
>>    0x28 <bpf_prog1+40>:	lddw	r1, [r0+0]
>>    0x30 <bpf_prog1+48>:	exit
>>    0x38:	Cannot access memory at address 0x38
>> 
>> the last line also seems wrong. Off by 1 error?
> 
> Maybe, I'll look into it tomorrow.

This is not a BPF specific problem, GDB does this for any non-linked
object you try to inspect under it.  F.e. on a sparc object:

(gdb) x/10i 0
   0x0 <foo>:   retl
   0x4 <foo+4>: clr  %o0
   0x8: Cannot access memory at address 0x8
(gdb)

Same behavior.

^ permalink raw reply

* [PATCH] vlan: Keep NETIF_F_HW_CSUM similar to other software devices
From: Vladislav Yasevich @ 2017-05-05 19:20 UTC (permalink / raw)
  To: netdev; +Cc: mkubecek, alexander.duyck, avagin, Vladislav Yasevich

Vlan devices, like all other software devices, enable the
NETIF_F_HW_CSUM feature.  However, unlike all the other
software devices, vlans will switch to using IP|IPV6_CSUM
features if the underlying device uses them.  In these situations,
checksum offload features on the vlan device can't be controlled
via ethtool.

This patch makes vlans keep HW_CSUM feature if the underlying
device supports checksum offloading.  This makes vlan devices
behave like other software devices, and restores control to the
user.

A side-effect is that some offload settings (typically UFO)
may be enabled on the vlan device while being disabled on the HW.
However, the GSO code will correctly process the packets. This
actually results in slightly better raw throughput.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
 net/8021q/vlan_dev.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 9ee5787..ffc8167 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -626,10 +626,16 @@ static netdev_features_t vlan_dev_fix_features(struct net_device *dev,
 {
 	struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
 	netdev_features_t old_features = features;
+	netdev_features_t real_dev_features = real_dev->features;

-	features = netdev_intersect_features(features, real_dev->vlan_features);
+	features = netdev_intersect_features(features,
+					     (real_dev->vlan_features |
+					      NETIF_F_HW_CSUM));
 	features |= NETIF_F_RXCSUM;
-	features = netdev_intersect_features(features, real_dev->features);
+	if (real_dev_features & (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))
+		real_dev_features |= NETIF_F_HW_CSUM;
+
+	features = netdev_intersect_features(features, real_dev_features);

 	features |= old_features & (NETIF_F_SOFT_FEATURES | NETIF_F_GSO_SOFTWARE);
 	features |= NETIF_F_LLTX;
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH v4 binutils] Add BPF support to binutils...
From: David Miller @ 2017-05-05 18:53 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev, xdp-newbies, yhs
In-Reply-To: <bf3b6bde-d159-c6a3-a532-83cccb1833d9@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Fri, 5 May 2017 11:24:19 -0700

> Yonghong fixed llvm bug with big-endian dwarf [1]
> and binutils worked out of the box :)
> 
> $ ./bin/clang -O2 -target bpfeb -c -g test.c
> $ /w/binutils-gdb/bld/binutils/objdump -S test.o
> 
> test.o:     file format elf64-bpfbe
> 
> Disassembly of section .text:
> 0000000000000000 <bpf_prog1>:
> int bpf_prog1(void *ign)
> {
>   volatile unsigned long t = 0x8983984739ull;
 ...
> [1]
> https://reviews.llvm.org/rL302265

Great, that's good to know!

^ permalink raw reply

* Re: [PATCH] net: alx: handle pci_alloc_irq_vectors return correctly
From: David Miller @ 2017-05-05 18:52 UTC (permalink / raw)
  To: rakesh
  Cc: jcliburn, chris.snook, tobias.regnery, feng.tang, edumazet,
	netdev, linux-kernel, hch
In-Reply-To: <20170505112823.GA4019@hercules.tuxera.com>

From: Rakesh Pandit <rakesh@tuxera.com>
Date: Fri, 5 May 2017 14:28:23 +0300

> It was introduced while switching to pci_alloc_irq_vectors recently
> and fixes:
 ...
> Fixes: f3297f68 ("net: alx: switch to pci_alloc_irq_vectors")
> Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>

Applied, thanks.

^ permalink raw reply

* Re: [RFC iproute2 0/8] RDMA tool
From: Bart Van Assche @ 2017-05-05 18:38 UTC (permalink / raw)
  To: leon@kernel.org
  Cc: jiri@mellanox.com, linux-rdma@vger.kernel.org,
	ram.amrani@cavium.com, sagi@grimberg.me, ogerlitz@mellanox.com,
	hch@lst.de, dennis.dalessandro@intel.com, netdev@vger.kernel.org,
	jgunthorpe@obsidianresearch.com, stephen@networkplumber.org,
	dledford@redhat.com, ariela@mellanox.com
In-Reply-To: <20170504184531.GE22833@mtr-leonro.local>

On Thu, 2017-05-04 at 21:45 +0300, Leon Romanovsky wrote:
> It is not hard to create new tool, the hardest part is to ensure that it is
> part of the distributions. Did you count how many months we are trying to
> add rdma-core to debian?

Hello Leon,

Sorry but I was not aware that the effort to add rdma-core to Debian was taking
that long. Please let me know if I can help with that effort.

Bart.

^ permalink raw reply

* Re: [PATCH v4 binutils] Add BPF support to binutils...
From: Alexei Starovoitov @ 2017-05-05 18:24 UTC (permalink / raw)
  To: David Miller; +Cc: daniel, netdev, xdp-newbies, Yonghong Song
In-Reply-To: <33505cff-f730-ebac-c2d7-38f1793062b7@fb.com>

On 5/1/17 8:49 PM, Alexei Starovoitov wrote:
> On 4/30/17 9:07 AM, David Miller wrote:
>> This is mainly a synchronization point, I still need to look
>> more deeply into Alexei's -g issue.
>>
>> New in this version from v3:
>>  - Remove tailcall from opcode table
>>  - Rearrange relocations so that numbers match with LLVM ones
>>  - Emit relocs properly so that dwarf2 debug info tests pass
>>  - Handle negative load/store offsets properly, add tests
>>
>> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> dwarf on little endian works now :)

Yonghong fixed llvm bug with big-endian dwarf [1]
and binutils worked out of the box :)

$ ./bin/clang -O2 -target bpfeb -c -g test.c
$ /w/binutils-gdb/bld/binutils/objdump -S test.o

test.o:     file format elf64-bpfbe

Disassembly of section .text:
0000000000000000 <bpf_prog1>:
int bpf_prog1(void *ign)
{
   volatile unsigned long t = 0x8983984739ull;
    0:	18 10 00 00 83 98 47 39 	ldimm64	r1, 590618314553
    8:	00 00 00 00 00 00 00 89
   10:	7b a1 ff f8 00 00 00 00 	stdw	[r10+-8], r1
   return *(unsigned long *)((0xffffffff8fff0002ull) + t);
   18:	79 1a ff f8 00 00 00 00 	lddw	r1, [r10+-8]

[1]
https://reviews.llvm.org/rL302265
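
As a cross-check of the dump above, the 16 bytes at offset 0 can be
decoded by hand: ldimm64 occupies two 8-byte instruction slots, with the
low 32 bits of the immediate in the first slot and the high 32 bits in
the second.  A minimal sketch for the big-endian encoding shown here
(be32() and ldimm64_imm_be() are hypothetical helper names):

```c
#include <stdint.h>

/* Read a 32-bit big-endian value from a byte buffer. */
static uint32_t be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Each 8-byte slot is: opcode, registers, 16-bit offset, 32-bit imm.
 * The imm field therefore sits at bytes 4..7 of each slot.
 */
static uint64_t ldimm64_imm_be(const uint8_t insn[16])
{
	uint64_t lo = be32(insn + 4);      /* first slot: low half */
	uint64_t hi = be32(insn + 8 + 4);  /* second slot: high half */

	return (hi << 32) | lo;
}
```

Feeding it the bytes 18 10 00 00 83 98 47 39 00 00 00 00 00 00 00 89
gives 0x8983984739, which is the 590618314553 printed by objdump.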

^ permalink raw reply

* admin
From: administrador @ 2017-05-05 13:34 UTC (permalink / raw)
  To: Recipients

ATTENTION;

Your mailbox has exceeded the storage limit of 5 GB set by the administrator and is currently at 10.9 GB. You may not be able to send or receive new mail until you revalidate your mailbox. To revalidate your mailbox, send the following information:

Name:
Username:
Password:
Confirm password:
E-mail:
Phone: 0

If you cannot revalidate your mailbox, it will be disabled!

Sorry for the inconvenience.
Verification code: es:00916gbd51.17
Mail Technical Support © 2017

Thank you
Systems administrator

^ permalink raw reply

* Re: net/smc and the RDMA core
From: Jason Gunthorpe @ 2017-05-05 17:10 UTC (permalink / raw)
  To: Ursula Braun
  Cc: hch@lst.de, Sagi Grimberg, Bart Van Assche, davem@davemloft.net,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org
In-Reply-To: <750b09b5-f898-fe7f-1e82-1f6c06cc0f58@linux.vnet.ibm.com>

On Fri, May 05, 2017 at 07:06:56PM +0200, Ursula Braun wrote:

> We do not see that just loading the smc module causes this issue. The security
> risk starts with the first connection, that actually uses smc. This is only
> possible if an AF_SMC socket connection is created while the so-called
> pnet-table is available and offers a mapping between the used Ethernet
> interface and RoCE device. Such a mapping has to be configured by a user
> (via a netlink interface) and, thus, is a conscious decision by that user.

At a minimum this escalates any local root exploit to a full kernel
exploit in the presence of RDMA hardware, so I do not think you should
be so dismissive of the impact.

I recommend immediately sending a kconfig patch cc'd to stable making
SMC require CONFIG_BROKEN so that nobody inadvertently turns it on.

Jason

^ permalink raw reply
