Netdev List


* [PATCH] ipv6: propagate genlmsg_reply return code
From: Li RongQing @ 2019-02-11 11:32 UTC (permalink / raw)
  To: netdev

genlmsg_reply() can fail, so propagate its return code to the caller.

Fixes: 915d7e5e593 ("ipv6: sr: add code base for control plane support of SR-IPv6")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 net/ipv6/seg6.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/ipv6/seg6.c b/net/ipv6/seg6.c
index 8d0ba757a46c..9b2f272ca164 100644
--- a/net/ipv6/seg6.c
+++ b/net/ipv6/seg6.c
@@ -221,9 +221,7 @@ static int seg6_genl_get_tunsrc(struct sk_buff *skb, struct genl_info *info)
 	rcu_read_unlock();
 
 	genlmsg_end(msg, hdr);
-	genlmsg_reply(msg, info);
-
-	return 0;
+	return genlmsg_reply(msg, info);
 
 nla_put_failure:
 	rcu_read_unlock();
-- 
2.16.2
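
A minimal userspace sketch of the pattern this patch applies, assuming a reply function that returns 0 on success or a negative errno value. The names send_reply(), handler_ignoring_status() and handler_propagating_status() are hypothetical stand-ins for illustration, not kernel API.

```c
#include <errno.h>

/* Hypothetical stand-in for a reply function: returns 0 on success or a
 * negative errno value on failure. */
static int send_reply(int simulate_failure)
{
	return simulate_failure ? -ENOBUFS : 0;
}

/* Before the change: the reply status is dropped and 0 is always returned. */
static int handler_ignoring_status(int simulate_failure)
{
	send_reply(simulate_failure);
	return 0;
}

/* After the change: the caller sees whatever the reply path returned. */
static int handler_propagating_status(int simulate_failure)
{
	return send_reply(simulate_failure);
}
```

With the first form a failed reply is indistinguishable from success; with the second the error reaches the caller unchanged.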



* [PATCH net-next] net/tls: Do not use async crypto for non-data records
From: Vakul Garg @ 2019-02-11 11:31 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: borisp@mellanox.com, aviadye@mellanox.com, davejwatson@fb.com,
	davem@davemloft.net, doronrk@fb.com, Vakul Garg

The addition of TLS 1.3 support broke the TLS 1.2 handshake when an async
crypto accelerator is used. This is because the record type of non-data
records is not propagated to the user application. Also, when async
decryption happens, decryption does not stop when two records of
different types get dequeued and submitted for decryption. To address
this, we decrypt TLS 1.2 non-data records synchronously. We check
whether the record we just processed has the same type as the previous
one before checking the async condition and jumping to dequeue the next
record.

Fixes: 130b392c6cd6b ("net: tls: Add tls 1.3 support")
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
---
 net/tls/tls_sw.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index fe8c287cbaa1..ae4784734547 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1645,10 +1645,10 @@ int tls_sw_recvmsg(struct sock *sk,
 
 	do {
 		bool retain_skb = false;
-		bool async = false;
 		bool zc = false;
 		int to_decrypt;
 		int chunk = 0;
+		bool async;
 
 		skb = tls_wait_data(sk, psock, flags, timeo, &err);
 		if (!skb) {
@@ -1674,18 +1674,21 @@ int tls_sw_recvmsg(struct sock *sk,
 		    tls_ctx->crypto_recv.info.version != TLS_1_3_VERSION)
 			zc = true;
 
+		/* Do not use async mode if record is non-data */
+		if (ctx->control == TLS_RECORD_TYPE_DATA)
+			async = ctx->async_capable;
+		else
+			async = false;
+
 		err = decrypt_skb_update(sk, skb, &msg->msg_iter,
-					 &chunk, &zc, ctx->async_capable);
+					 &chunk, &zc, async);
 		if (err < 0 && err != -EINPROGRESS) {
 			tls_err_abort(sk, EBADMSG);
 			goto recv_end;
 		}
 
-		if (err == -EINPROGRESS) {
-			async = true;
+		if (err == -EINPROGRESS)
 			num_async++;
-			goto pick_next_record;
-		}
 
 		if (!cmsg) {
 			int cerr;
@@ -1704,6 +1707,9 @@ int tls_sw_recvmsg(struct sock *sk,
 			goto recv_end;
 		}
 
+		if (async)
+			goto pick_next_record;
+
 		if (!zc) {
 			if (rxm->full_len > len) {
 				retain_skb = true;
-- 
2.13.6
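
The decision the patch adds before decryption can be sketched in isolation as below. This is an illustrative userspace sketch, not the kernel code: enum record_type and use_async_decrypt() are hypothetical names, and the two values are the TLS content types for handshake (22) and application data (23).

```c
#include <stdbool.h>

/* Hypothetical record types for illustration; the values are the TLS
 * content types for handshake and application data. */
enum record_type {
	RECORD_TYPE_HANDSHAKE = 22,
	RECORD_TYPE_DATA = 23,
};

/* Decide whether a record may be handed to the asynchronous decryptor.
 * Only data records qualify; every other record type is decrypted
 * synchronously, regardless of what the hardware could do. */
static bool use_async_decrypt(enum record_type type, bool async_capable)
{
	if (type == RECORD_TYPE_DATA)
		return async_capable;
	return false;
}
```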



* Re: [RFC] apparently bogus logics in unix_find_other() since 2002
From: Solar Designer @ 2019-02-11 11:21 UTC (permalink / raw)
  To: Al Viro; +Cc: netdev, David Miller
In-Reply-To: <20190210042414.GH2217@ZenIV.linux.org.uk>

On Sun, Feb 10, 2019 at 04:24:22AM +0000, Al Viro wrote:
> 	In "net/unix/af_unix.c: Set ATIME on socket inode" (back in
> 2002) we'd grown something rather odd in unix_find_other().  In the
> original patch it was
>                 u=unix_find_socket_byname(sunname, len, type, hash);
> -               if (!u)
> +               if (u) {
> +                       struct dentry *dentry;
> +                       dentry = u->protinfo.af_unix.dentry;
> +                       if (dentry)
> +                               UPDATE_ATIME(dentry->d_inode);
> +               } else
>                         goto fail;

It's this commit:

https://github.com/dmgerman/linux-bitkeeper/commit/80cbc5b9c7393c4456236543ca1e639ea0841c19

There are two hunks in that patch: one after "if (sunname->sun_path[0])"
and the other after "else".  I just did some more digging and found the
private discussion of the time, as well as a previous revision of the
patch (against 2.2.21, whereas the committed one was against 2.4.x of
the same era).  Even the earliest revision I found already has both
hunks.  I couldn't find any discussion as to why the second hunk was
possibly needed.  It is quite possible that I had added it in error.

The original problem this patch addressed was stmpclean deleting sockets
that were still actively used - specifically, PostgreSQL's.  I found
that I also tested the patch on /dev/log and X11 sockets.  However, I
can't find any indication of me ever testing with the first hunk only,
so it's quite possible I wrote both hunks at once and only tested both.

> These days the code is
> 
>                 u = unix_find_socket_byname(net, sunname, len, type, hash);
>                 if (u) {
>                         struct dentry *dentry;
>                         dentry = unix_sk(u)->path.dentry;
>                         if (dentry)
>                                 touch_atime(&unix_sk(u)->path);
>                 } else  
>                         goto fail;
> 
> but the logic is the same.  It's the abstract address case - we have
> '\0' in sunname->sun_path[0].  How in hell could that possibly have
> non-NULL ->path.dentry and what would it be?

This is probably in fact impossible.

I think it'd make sense to drop this logic, reverting to:

		if (!u)
			goto fail;

and then see if atime on an actively used socket in /tmp or on /dev/log
keeps getting updated (due to the first hunk of the above commit).

Alexander
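
For readers less familiar with the abstract address case discussed above, here is a small userspace sketch of how such an address is laid out. The helper names make_abstract_addr() and is_abstract_addr() are hypothetical; the only assumption taken from the thread is that an abstract address has '\0' in sun_path[0].

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill in an abstract AF_UNIX address: sun_path[0] is '\0' and the name
 * follows it. Returns the address length to pass to bind()/connect(), or 0
 * if the name is empty or does not fit. */
static socklen_t make_abstract_addr(struct sockaddr_un *addr, const char *name)
{
	size_t len = strlen(name);

	if (len == 0 || len > sizeof(addr->sun_path) - 1)
		return 0;

	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	memcpy(addr->sun_path + 1, name, len);	/* sun_path[0] stays '\0' */

	return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len);
}

/* An abstract address names no filesystem path. */
static bool is_abstract_addr(const struct sockaddr_un *addr)
{
	return addr->sun_family == AF_UNIX && addr->sun_path[0] == '\0';
}
```

Because such a name never refers to a filesystem object, there is no pathname whose atime could be updated, which is the point raised in the quoted question.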


* [PATCH][net-next] devlink: use direct return of genlmsg_reply
From: Li RongQing @ 2019-02-11 11:09 UTC (permalink / raw)
  To: netdev

Return the result of genlmsg_reply() directly; this removes a redundant
check.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 net/core/devlink.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index e6a015b8ac9b..76a9d287dbec 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4355,11 +4355,8 @@ static int devlink_fmsg_snd(struct devlink_fmsg *fmsg,
 		err = -EMSGSIZE;
 		goto nla_put_failure;
 	}
-	err = genlmsg_reply(skb, info);
-	if (err)
-		return err;
 
-	return 0;
+	return genlmsg_reply(skb, info);
 
 nla_put_failure:
 	nlmsg_free(skb);
-- 
2.16.2



* Re: [PATCH net-next 2/2] net: marvell: mvpp2: use mvpp2_is_xlg() helper elsewhere
From: Maxime Chevallier @ 2019-02-11 11:02 UTC (permalink / raw)
  To: Russell King
  Cc: Antoine Tenart, Baruch Siach, Sven Auhagen, David S. Miller,
	netdev
In-Reply-To: <E1gt8k7-0007iN-I2@rmk-PC.armlinux.org.uk>

Hello Russell,

On Mon, 11 Feb 2019 10:23:15 +0000
Russell King <rmk+kernel@armlinux.org.uk> wrote:

>There are several places which make the decision whether to access the
>XLGMAC vs GMAC that only check for PHY_INTERFACE_MODE_10GKR and not its
>XAUI variant.  Switch these to use the new helper so that we have
>consistency through the driver.
>
>Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>

Thanks,

Maxime


* Re: [PATCH net-next 1/2] net: marvell: mvpp2: add mvpp2_is_xlg() helper
From: Maxime Chevallier @ 2019-02-11 11:01 UTC (permalink / raw)
  To: Russell King
  Cc: Antoine Tenart, Baruch Siach, Sven Auhagen, David S. Miller,
	netdev
In-Reply-To: <E1gt8k2-0007iA-DL@rmk-PC.armlinux.org.uk>

Hello Russell,

On Mon, 11 Feb 2019 10:23:10 +0000
Russell King <rmk+kernel@armlinux.org.uk> wrote:

>Add a mvpp2_is_xlg() helper to identify whether the interface mode
>should be using the XLGMAC rather than the GMAC.
>
>Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>

Thanks,

Maxime


* Re: [PATCH for-next 4/4] devlink: add health command support
From: Jiri Pirko @ 2019-02-11 10:41 UTC (permalink / raw)
  To: Aya Levin
  Cc: David Ahern, netdev, David S. Miller, Jiri Pirko, Moshe Shemesh,
	Eran Ben Elisha, Tal Alon, Ariel Almog
In-Reply-To: <1549823329-10377-5-git-send-email-ayal@mellanox.com>

Sun, Feb 10, 2019 at 07:28:49PM CET, ayal@mellanox.com wrote:
>This patch adds support for the following commands:
>devlink health show      [DEV reporter REPORTER_NAME]
>devlink health recover    DEV reporter REPORTER_NAME
>devlink health diagnose   DEV reporter REPORTER_NAME
>devlink health dump show  DEV reporter REPORTER_NAME
>devlink health dump clear DEV reporter REPORTER_NAME
>devlink health set        DEV reporter REPORTER_NAME NAME VALUE
>
> * show: Devlink health show command displays status and configuration info on
>   specific reporter on a device or dump the info on all reporters on all
>   devices.
> * recover: Devlink health recover enables the user to initiate a
>   recovery on a reporter. This operation will increment the recoveries
>   counter displayed in the show command.
> * diagnose: Devlink health diagnose enables the user to retrieve diagnostics data
>   on a reporter on a device. The command's output is a free text defined
>   by the reporter.
> * dump show: Devlink health dump show displays the last saved dump. Devlink
>   health saves a single dump. If a dump is not already stored by
>   Devlink for this reporter, Devlink generates a new dump. The
>   dump can be generated automatically when a reporter reports an
>   error or manually at the user's request.
>   The dump output is defined by the reporter.
> * dump clear: Devlink health dump clear deletes the last saved dump file.
> * set: Devlink health set enables the user to configure:
>	1) grace_period [msec]: time interval between auto recoveries.
>	2) auto_recover [true/false]: whether devlink should execute
>	automatic recovery on error.
>
>Examples:
>$devlink health show pci/0000:00:09.0 reporter tx
>pci/0000:00:09.0:
>name tx
>  state healthy #err 0 #recover 1 last_dump_ts N/A
>    parameters:
>      grace period 600 auto_recover true
>$devlink health diagnose pci/0000:00:09.0 reporter tx
>SQs:
>  sqn: 4283 HW state: 1 stopped: false
>  sqn: 4288 HW state: 1 stopped: false
>  sqn: 4293 HW state: 1 stopped: false
>  sqn: 4298 HW state: 1 stopped: false
>  sqn: 4303 HW state: 1 stopped: false
>$devlink health dump show pci/0000:00:09.0 reporter tx
>TX dump data
>$devlink health dump clear pci/0000:00:09.0 reporter tx
>$devlink health set pci/0000:00:09.0 reporter tx grace_period 3500
>$devlink health set pci/0000:00:09.0 reporter tx auto_recover false
>
>Signed-off-by: Aya Levin <ayal@mellanox.com>
>Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
>---
> devlink/devlink.c            | 551 ++++++++++++++++++++++++++++++++++++++++++-
> include/uapi/linux/devlink.h |  23 ++
> man/man8/devlink-health.8    | 176 ++++++++++++++
> man/man8/devlink.8           |   7 +-
> 4 files changed, 755 insertions(+), 2 deletions(-)

755 lines is too much for one patch.
For easier review, please split this patch into a separate patchset,
preferably one patch per command.


* Re: [PATCH for-next 2/4]  devlink: fix print of uint64_t
From: Jiri Pirko @ 2019-02-11 10:32 UTC (permalink / raw)
  To: Aya Levin
  Cc: David Ahern, netdev, David S. Miller, Jiri Pirko, Moshe Shemesh,
	Eran Ben Elisha, Tal Alon, Ariel Almog
In-Reply-To: <1549823329-10377-3-git-send-email-ayal@mellanox.com>

Sun, Feb 10, 2019 at 07:28:47PM CET, ayal@mellanox.com wrote:
> This patch prints uint64_t with its corresponding format and avoids an
> implicit cast to uint32_t.

Drop the space at the beginning of the lines.

Otherwise this looks fine.
Acked-by: Jiri Pirko <jiri@mellanox.com>
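
A short sketch of printing a 64-bit value with its matching format, as the quoted description puts it. The helper name format_u64() is hypothetical; PRIu64 is the standard macro from <inttypes.h>.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit counter without narrowing it. PRIu64 expands to the
 * correct conversion specifier for uint64_t on the target platform. */
static int format_u64(char *buf, size_t len, uint64_t value)
{
	return snprintf(buf, len, "%" PRIu64, value);
}
```

A value such as 4294967296 (2^32) does not fit in 32 bits, so it is a convenient test input for spotting accidental narrowing.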


* Re: [PATCH for-next 1/4] devlink: refactor validation of finding required arguments
From: Jiri Pirko @ 2019-02-11 10:29 UTC (permalink / raw)
  To: Aya Levin
  Cc: David Ahern, netdev, David S. Miller, Jiri Pirko, Moshe Shemesh,
	Eran Ben Elisha, Tal Alon, Ariel Almog
In-Reply-To: <1549823329-10377-2-git-send-email-ayal@mellanox.com>

Sun, Feb 10, 2019 at 07:28:46PM CET, ayal@mellanox.com wrote:
>Introducing argument's metadata structure matching a bitmap flag per
>required argument and an error message if missing. Using this static
>array to refactor validation of finding required arguments in devlink
>command line and to ease further maintenance.
>
>Signed-off-by: Aya Levin <ayal@mellanox.com>
>Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
>---
> devlink/devlink.c | 155 +++++++++++++++++-------------------------------------
> 1 file changed, 47 insertions(+), 108 deletions(-)
>
>diff --git a/devlink/devlink.c b/devlink/devlink.c
>index d823512a4030..a05755385a49 100644
>--- a/devlink/devlink.c
>+++ b/devlink/devlink.c
>@@ -39,6 +39,7 @@
> #define PARAM_CMODE_RUNTIME_STR "runtime"
> #define PARAM_CMODE_DRIVERINIT_STR "driverinit"
> #define PARAM_CMODE_PERMANENT_STR "permanent"
>+#define DL_ARGS_REQUIRED_MAX_ERR_LEN 80
> 
> static int g_new_line_count;
> 
>@@ -950,6 +951,51 @@ static int param_cmode_get(const char *cmodestr,
> 	return 0;
> }
> 
>+struct dl_args_metadata {
>+	uint32_t o_flag;
>+	char err_msg[DL_ARGS_REQUIRED_MAX_ERR_LEN];
>+};
>+
>+static const struct dl_args_metadata dl_args_required[] = {
>+	{DL_OPT_PORT_TYPE,	      "Port type not set.\n"},
>+	{DL_OPT_PORT_COUNT,	      "Port split count option expected.\n"},
>+	{DL_OPT_SB_POOL,	      "Pool index option expected.\n"},
>+	{DL_OPT_SB_SIZE,	      "Pool size option expected.\n"},
>+	{DL_OPT_SB_TYPE,	      "Pool type option expected.\n"},
>+	{DL_OPT_SB_THTYPE,	      "Pool threshold type option expected.\n"},
>+	{DL_OPT_SB_TH,		      "Threshold option expected.\n"},
>+	{DL_OPT_SB_TC,		      "TC index option expected.\n"},
>+	{DL_OPT_ESWITCH_MODE,	      "E-Switch mode option expected.\n"},
>+	{DL_OPT_ESWITCH_INLINE_MODE,  "E-Switch inline-mode option expected.\n"},
>+	{DL_OPT_DPIPE_TABLE_NAME,     "Dpipe table name expected\n"},
>+	{DL_OPT_DPIPE_TABLE_COUNTERS, "Dpipe table counter state expected\n"},
>+	{DL_OPT_ESWITCH_ENCAP_MODE,   "E-Switch encapsulation option expected.\n"},
>+	{DL_OPT_PARAM_NAME,	      "Parameter name expected.\n"},
>+	{DL_OPT_PARAM_VALUE,	      "Value to set expected.\n"},
>+	{DL_OPT_PARAM_CMODE,	      "Configuration mode expected.\n"},
>+	{DL_OPT_REGION_SNAPSHOT_ID,   "Region snapshot id expected.\n"},
>+	{DL_OPT_REGION_ADDRESS,	      "Region address value expected.\n"},
>+	{DL_OPT_REGION_LENGTH,	      "Region length value expected.\n"},

Remove the "\n" from these strings and put it in the pr_err() call.


>+};
>+
>+static int validate_finding_required_dl_args(uint32_t o_required,

Please maintain the current naming scheme of functions. This should be
named something like:
dl_args_finding_required_validate()


>+					     uint32_t o_found)
>+{
>+	uint32_t dl_args_required_size;
>+	uint32_t o_flag;
>+	int i;
>+
>+	dl_args_required_size = ARRAY_SIZE(dl_args_required);
>+	for (i = 0; i < dl_args_required_size; i++) {
>+		o_flag = dl_args_required[i].o_flag;
>+		if ((o_required & o_flag) && !(o_found & o_flag)) {
>+			pr_err("%s", dl_args_required[i].err_msg);
>+			return -EINVAL;
>+		}
>+	}
>+	return 0;
>+}
>+

[...]
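
A standalone sketch of the table-driven validation, with the newline moved out of the message strings as suggested above. The OPT_* flags, struct args_metadata and args_finding_required_validate() are hypothetical names for illustration, and fprintf(stderr, ...) stands in for pr_err().

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical option flags, one bit per argument. */
#define OPT_PORT_TYPE	(1u << 0)
#define OPT_PORT_COUNT	(1u << 1)
#define OPT_SB_POOL	(1u << 2)

struct args_metadata {
	uint32_t o_flag;
	const char *err_msg;	/* no trailing newline; added when printing */
};

static const struct args_metadata args_required[] = {
	{ OPT_PORT_TYPE,  "Port type not set." },
	{ OPT_PORT_COUNT, "Port split count option expected." },
	{ OPT_SB_POOL,    "Pool index option expected." },
};

/* Report the first required option that was not found on the command line. */
static int args_finding_required_validate(uint32_t o_required, uint32_t o_found)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(args_required); i++) {
		uint32_t o_flag = args_required[i].o_flag;

		if ((o_required & o_flag) && !(o_found & o_flag)) {
			fprintf(stderr, "%s\n", args_required[i].err_msg);
			return -EINVAL;
		}
	}
	return 0;
}
```

Keeping the newline in the print call means the table holds only the message text, so the strings can be reused in other output paths.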


* [PATCH net-next 2/2] net: marvell: mvpp2: use mvpp2_is_xlg() helper elsewhere
From: Russell King @ 2019-02-11 10:23 UTC (permalink / raw)
  To: Antoine Tenart, Maxime Chevallier
  Cc: Baruch Siach, Sven Auhagen, David S. Miller, netdev

There are several places which make the decision whether to access the
XLGMAC vs GMAC that only check for PHY_INTERFACE_MODE_10GKR and not its
XAUI variant.  Switch these to use the new helper so that we have
consistency through the driver.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index 03c79618bfef..94c92a49f12f 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -1106,7 +1106,7 @@ static void mvpp22_gop_unmask_irq(struct mvpp2_port *port)
 	if (port->gop_id == 0) {
 		/* Enable the XLG/GIG irqs for this port */
 		val = readl(port->base + MVPP22_XLG_EXT_INT_MASK);
-		if (port->phy_interface == PHY_INTERFACE_MODE_10GKR)
+		if (mvpp2_is_xlg(port->phy_interface))
 			val |= MVPP22_XLG_EXT_INT_MASK_XLG;
 		else
 			val |= MVPP22_XLG_EXT_INT_MASK_GIG;
@@ -2471,8 +2471,7 @@ static irqreturn_t mvpp2_link_status_isr(int irq, void *dev_id)
 
 	mvpp22_gop_mask_irq(port);
 
-	if (port->gop_id == 0 &&
-	    port->phy_interface == PHY_INTERFACE_MODE_10GKR) {
+	if (port->gop_id == 0 && mvpp2_is_xlg(port->phy_interface)) {
 		val = readl(port->base + MVPP22_XLG_INT_STAT);
 		if (val & MVPP22_XLG_INT_STAT_LINK) {
 			event = true;
@@ -4680,7 +4679,7 @@ static void mvpp2_mac_config(struct net_device *dev, unsigned int mode,
 	bool change_interface = port->phy_interface != state->interface;
 
 	/* Check for invalid configuration */
-	if (state->interface == PHY_INTERFACE_MODE_10GKR && port->gop_id != 0) {
+	if (mvpp2_is_xlg(state->interface) && port->gop_id != 0) {
 		netdev_err(dev, "Invalid mode on %s\n", dev->name);
 		return;
 	}
@@ -4700,7 +4699,7 @@ static void mvpp2_mac_config(struct net_device *dev, unsigned int mode,
 	}
 
 	/* mac (re)configuration */
-	if (state->interface == PHY_INTERFACE_MODE_10GKR)
+	if (mvpp2_is_xlg(state->interface))
 		mvpp2_xlg_config(port, mode, state);
 	else if (phy_interface_mode_is_rgmii(state->interface) ||
 		 phy_interface_mode_is_8023z(state->interface) ||
@@ -4722,8 +4721,7 @@ static void mvpp2_mac_link_up(struct net_device *dev, unsigned int mode,
 	struct mvpp2_port *port = netdev_priv(dev);
 	u32 val;
 
-	if (!phylink_autoneg_inband(mode) &&
-	    interface != PHY_INTERFACE_MODE_10GKR) {
+	if (!phylink_autoneg_inband(mode) && !mvpp2_is_xlg(interface)) {
 		val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
 		val &= ~MVPP2_GMAC_FORCE_LINK_DOWN;
 		val |= MVPP2_GMAC_FORCE_LINK_PASS;
@@ -4743,8 +4741,7 @@ static void mvpp2_mac_link_down(struct net_device *dev, unsigned int mode,
 	struct mvpp2_port *port = netdev_priv(dev);
 	u32 val;
 
-	if (!phylink_autoneg_inband(mode) &&
-	    interface != PHY_INTERFACE_MODE_10GKR) {
+	if (!phylink_autoneg_inband(mode) && !mvpp2_is_xlg(interface)) {
 		val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
 		val &= ~MVPP2_GMAC_FORCE_LINK_PASS;
 		val |= MVPP2_GMAC_FORCE_LINK_DOWN;
-- 
2.7.4



* [PATCH net-next 1/2] net: marvell: mvpp2: add mvpp2_is_xlg() helper
From: Russell King @ 2019-02-11 10:23 UTC (permalink / raw)
  To: Antoine Tenart, Maxime Chevallier
  Cc: Baruch Siach, Sven Auhagen, David S. Miller, netdev

Add a mvpp2_is_xlg() helper to identify whether the interface mode
should be using the XLGMAC rather than the GMAC.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index fdd538c28f8a..03c79618bfef 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -965,6 +965,11 @@ mvpp2_shared_interrupt_mask_unmask(struct mvpp2_port *port, bool mask)
 }
 
 /* Port configuration routines */
+static bool mvpp2_is_xlg(phy_interface_t interface)
+{
+	return interface == PHY_INTERFACE_MODE_10GKR ||
+	       interface == PHY_INTERFACE_MODE_XAUI;
+}
 
 static void mvpp22_gop_init_rgmii(struct mvpp2_port *port)
 {
@@ -1196,9 +1201,7 @@ static void mvpp2_port_enable(struct mvpp2_port *port)
 	u32 val;
 
 	/* Only GOP port 0 has an XLG MAC */
-	if (port->gop_id == 0 &&
-	    (port->phy_interface == PHY_INTERFACE_MODE_XAUI ||
-	     port->phy_interface == PHY_INTERFACE_MODE_10GKR)) {
+	if (port->gop_id == 0 && mvpp2_is_xlg(port->phy_interface)) {
 		val = readl(port->base + MVPP22_XLG_CTRL0_REG);
 		val |= MVPP22_XLG_CTRL0_PORT_EN |
 		       MVPP22_XLG_CTRL0_MAC_RESET_DIS;
@@ -1217,9 +1220,7 @@ static void mvpp2_port_disable(struct mvpp2_port *port)
 	u32 val;
 
 	/* Only GOP port 0 has an XLG MAC */
-	if (port->gop_id == 0 &&
-	    (port->phy_interface == PHY_INTERFACE_MODE_XAUI ||
-	     port->phy_interface == PHY_INTERFACE_MODE_10GKR)) {
+	if (port->gop_id == 0 && mvpp2_is_xlg(port->phy_interface)) {
 		val = readl(port->base + MVPP22_XLG_CTRL0_REG);
 		val &= ~MVPP22_XLG_CTRL0_PORT_EN;
 		writel(val, port->base + MVPP22_XLG_CTRL0_REG);
@@ -3161,8 +3162,7 @@ static void mvpp22_mode_reconfigure(struct mvpp2_port *port)
 		ctrl3 = readl(port->base + MVPP22_XLG_CTRL3_REG);
 		ctrl3 &= ~MVPP22_XLG_CTRL3_MACMODESELECT_MASK;
 
-		if (port->phy_interface == PHY_INTERFACE_MODE_XAUI ||
-		    port->phy_interface == PHY_INTERFACE_MODE_10GKR)
+		if (mvpp2_is_xlg(port->phy_interface))
 			ctrl3 |= MVPP22_XLG_CTRL3_MACMODESELECT_10G;
 		else
 			ctrl3 |= MVPP22_XLG_CTRL3_MACMODESELECT_GMAC;
@@ -3170,9 +3170,7 @@ static void mvpp22_mode_reconfigure(struct mvpp2_port *port)
 		writel(ctrl3, port->base + MVPP22_XLG_CTRL3_REG);
 	}
 
-	if (port->gop_id == 0 &&
-	    (port->phy_interface == PHY_INTERFACE_MODE_XAUI ||
-	     port->phy_interface == PHY_INTERFACE_MODE_10GKR))
+	if (port->gop_id == 0 && mvpp2_is_xlg(port->phy_interface))
 		mvpp2_xlg_max_rx_size_set(port);
 	else
 		mvpp2_gmac_max_rx_size_set(port);
-- 
2.7.4
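
The effect of the helper can be shown outside the driver with a small sketch. enum iface_mode, is_xlg() and port_uses_xlg() are hypothetical names and a reduced set of modes, chosen only to mirror the shape of the checks in the patch.

```c
#include <stdbool.h>

/* Hypothetical subset of interface modes, for illustration only. */
enum iface_mode {
	IFACE_MODE_SGMII,
	IFACE_MODE_XAUI,
	IFACE_MODE_10GKR,
};

/* True when the mode is one of the two 10G modes named in the patch. */
static bool is_xlg(enum iface_mode mode)
{
	return mode == IFACE_MODE_10GKR || mode == IFACE_MODE_XAUI;
}

/* Following the comment in the patch, only GOP port 0 has an XLG MAC. */
static bool port_uses_xlg(int gop_id, enum iface_mode mode)
{
	return gop_id == 0 && is_xlg(mode);
}
```

Routing every call site through one predicate means a mode added later needs a change in one place only.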



* Re: [Xen-devel] [PATCH net-next] xen-netback: mark expected switch fall-through
From: Jan Beulich @ 2019-02-11  9:50 UTC (permalink / raw)
  To: Gustavo A.R.Silva
  Cc: Paul Durrant, Wei Liu, davem, xen-devel, linux-kernel, netdev
In-Reply-To: <20190208195838.GA11878@embeddedor>

>>> On 08.02.19 at 20:58, <gustavo@embeddedor.com> wrote:
> In preparation for enabling -Wimplicit-fallthrough, mark switch
> cases where we are expecting to fall through.
> 
> Warning level 3 was used: -Wimplicit-fallthrough=3
> 
> This patch is part of the ongoing efforts to enable
> -Wimplicit-fallthrough.
> 
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
> ---
>  drivers/net/xen-netback/xenbus.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/xen-netback/xenbus.c 
> b/drivers/net/xen-netback/xenbus.c
> index 2625740bdc4a..330ddb64930f 100644
> --- a/drivers/net/xen-netback/xenbus.c
> +++ b/drivers/net/xen-netback/xenbus.c
> @@ -655,7 +655,7 @@ static void frontend_changed(struct xenbus_device *dev,
>  		set_backend_state(be, XenbusStateClosed);
>  		if (xenbus_dev_is_online(dev))
>  			break;
> -		/* fall through if not online */
> +		/* fall through - if not online */
>  	case XenbusStateUnknown:

Considering the fall-through was already annotated, I don't think
title and description really justify the change. Is the compiler after
a particular wording here?

Jan
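
On the wording question: as far as I understand the GCC documentation for -Wimplicit-fallthrough=3, the comment is matched against a fixed set of patterns in which free text after "fall through" is accepted only when introduced by a hyphen. Treat that reading as an assumption and check it against the compiler manual. The sketch below uses the hyphenated form; enum state and handle_state() are hypothetical names.

```c
enum state { STATE_CLOSING, STATE_UNKNOWN, STATE_OTHER };

/* Count how many case bodies run for a given state. */
static int handle_state(enum state s, int online)
{
	int steps = 0;

	switch (s) {
	case STATE_CLOSING:
		steps++;
		if (online)
			break;
		/* fall through - if not online */
	case STATE_UNKNOWN:
		steps++;
		break;
	default:
		break;
	}
	return steps;
}
```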




* Re: [PATCH net-next v3] ipmr: ip6mr: Create new sockopt to clear mfc cache or vifs
From: Nicolas Dichtel @ 2019-02-11  9:38 UTC (permalink / raw)
  To: Callum Sinclair, davem, kuznet, yoshfuji, nikolay, netdev,
	linux-kernel
In-Reply-To: <20190211035412.29218-2-callum.sinclair@alliedtelesis.co.nz>

On 11/02/2019 at 04:54, Callum Sinclair wrote:
> v1 -> v2:
> Implemented additional flags for static entries
> v2 -> v3:
> Cleaned up flag logic so any combination of routes can be cleared.
> Fixed style errors
> Fixed incorrect flag values
nit: those lines are usually put after the '---', so that they don't appear in
the final commit log (they are useful for reviewers only).

> 
> Currently the only way to clear the forwarding cache is to delete the
> entries one by one using the MRT_DEL_MFC socket option or to destroy and
> recreate the socket.
> 
> Create a new socket option which with the use of optional flags can
> clear any combination of multicast entries (static or not static) and
> multicast vifs (static or not static).
> 
> Calling the new socket option MRT_FLUSH with the flags MRT_FLUSH_MFC and
> MRT_FLUSH_VIFS will clear all entries and vifs on the socket except for
> static entries.
> 
> Signed-off-by: Callum Sinclair <callum.sinclair@alliedtelesis.co.nz>
> ---
i.e., here

>  include/uapi/linux/mroute.h  |  9 ++++-
>  include/uapi/linux/mroute6.h |  9 ++++-
>  net/ipv4/ipmr.c              | 73 ++++++++++++++++++++-------------
>  net/ipv6/ip6mr.c             | 78 +++++++++++++++++++++++-------------
>  4 files changed, 112 insertions(+), 57 deletions(-)
> 
> diff --git a/include/uapi/linux/mroute.h b/include/uapi/linux/mroute.h
> index 5d37a9ccce63..11c8c1fc1124 100644
> --- a/include/uapi/linux/mroute.h
> +++ b/include/uapi/linux/mroute.h
> @@ -28,12 +28,19 @@
>  #define MRT_TABLE	(MRT_BASE+9)	/* Specify mroute table ID		*/
>  #define MRT_ADD_MFC_PROXY	(MRT_BASE+10)	/* Add a (*,*|G) mfc entry	*/
>  #define MRT_DEL_MFC_PROXY	(MRT_BASE+11)	/* Del a (*,*|G) mfc entry	*/
> -#define MRT_MAX		(MRT_BASE+11)
> +#define MRT_FLUSH	(MRT_BASE+12)	/* Flush all mfc entries and/or vifs	*/
> +#define MRT_MAX		(MRT_BASE+12)
>  
>  #define SIOCGETVIFCNT	SIOCPROTOPRIVATE	/* IP protocol privates */
>  #define SIOCGETSGCNT	(SIOCPROTOPRIVATE+1)
>  #define SIOCGETRPF	(SIOCPROTOPRIVATE+2)
>  
> +/* MRT_FLUSH optional flags */
> +#define MRT_FLUSH_MFC	1	/* Flush multicast entries */
> +#define MRT_FLUSH_MFC_STATIC	2	/* Flush static multicast entries */
> +#define MRT_FLUSH_VIFS	4	/* Flush multicast vifs */
> +#define MRT_FLUSH_VIFS_STATIC	8	/* Flush static multicast vifs */
> +
>  #define MAXVIFS		32
>  typedef unsigned long vifbitmap_t;	/* User mode code depends on this lot */
>  typedef unsigned short vifi_t;
> diff --git a/include/uapi/linux/mroute6.h b/include/uapi/linux/mroute6.h
> index 9999cc006390..ac84ef11b29c 100644
> --- a/include/uapi/linux/mroute6.h
> +++ b/include/uapi/linux/mroute6.h
> @@ -31,12 +31,19 @@
>  #define MRT6_TABLE	(MRT6_BASE+9)	/* Specify mroute table ID		*/
>  #define MRT6_ADD_MFC_PROXY	(MRT6_BASE+10)	/* Add a (*,*|G) mfc entry	*/
>  #define MRT6_DEL_MFC_PROXY	(MRT6_BASE+11)	/* Del a (*,*|G) mfc entry	*/
> -#define MRT6_MAX	(MRT6_BASE+11)
> +#define MRT6_FLUSH	(MRT6_BASE+12)	/* Flush all mfc entries and/or vifs	*/
> +#define MRT6_MAX	(MRT6_BASE+12)
>  
>  #define SIOCGETMIFCNT_IN6	SIOCPROTOPRIVATE	/* IP protocol privates */
>  #define SIOCGETSGCNT_IN6	(SIOCPROTOPRIVATE+1)
>  #define SIOCGETRPF	(SIOCPROTOPRIVATE+2)
>  
> +/* MRT6_FLUSH optional flags */
> +#define MRT6_FLUSH_MFC	1	/* Flush multicast entries */
> +#define MRT6_FLUSH_MFC_STATIC	2	/* Flush static multicast entries */
> +#define MRT6_FLUSH_VIFS	4	/* Flushing multicast vifs */
> +#define MRT6_FLUSH_VIFS_STATIC	8	/* Flush static multicast vifs */
> +
>  #define MAXMIFS		32
>  typedef unsigned long mifbitmap_t;	/* User mode code depends on this lot */
>  typedef unsigned short mifi_t;
> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> index e536970557dd..2c95ef8cf224 100644
> --- a/net/ipv4/ipmr.c
> +++ b/net/ipv4/ipmr.c
> @@ -110,7 +110,7 @@ static int ipmr_cache_report(struct mr_table *mrt,
>  static void mroute_netlink_event(struct mr_table *mrt, struct mfc_cache *mfc,
>  				 int cmd);
>  static void igmpmsg_netlink_event(struct mr_table *mrt, struct sk_buff *pkt);
> -static void mroute_clean_tables(struct mr_table *mrt, bool all);
> +static void mroute_clean_tables(struct mr_table *mrt, int flags);
>  static void ipmr_expire_process(struct timer_list *t);
>  
>  #ifdef CONFIG_IP_MROUTE_MULTIPLE_TABLES
> @@ -415,7 +415,8 @@ static struct mr_table *ipmr_new_table(struct net *net, u32 id)
>  static void ipmr_free_table(struct mr_table *mrt)
>  {
>  	del_timer_sync(&mrt->ipmr_expire_timer);
> -	mroute_clean_tables(mrt, true);
> +	mroute_clean_tables(mrt, MRT_FLUSH_VIFS | MRT_FLUSH_VIFS_STATIC |
> +						MRT_FLUSH_MFC | MRT_FLUSH_MFC_STATIC);
nit: MRT_FLUSH_MFC must be aligned with 'mrt'

>  	rhltable_destroy(&mrt->mfc_hash);
>  	kfree(mrt);
>  }
> @@ -1296,7 +1297,7 @@ static int ipmr_mfc_add(struct net *net, struct mr_table *mrt,
>  }
>  
>  /* Close the multicast socket, and clear the vif tables etc */
> -static void mroute_clean_tables(struct mr_table *mrt, bool all)
> +static void mroute_clean_tables(struct mr_table *mrt, int flags)
>  {
>  	struct net *net = read_pnet(&mrt->net);
>  	struct mr_mfc *c, *tmp;
> @@ -1305,35 +1306,42 @@ static void mroute_clean_tables(struct mr_table *mrt, bool all)
>  	int i;
>  
>  	/* Shut down all active vif entries */
> -	for (i = 0; i < mrt->maxvif; i++) {
> -		if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
> -			continue;
> -		vif_delete(mrt, i, 0, &list);
> +	if (flags & (MRT_FLUSH_VIFS | MRT_FLUSH_VIFS_STATIC)) {
> +		for (i = 0; i < mrt->maxvif; i++) {
> +			if (((mrt->vif_table[i].flags & VIFF_STATIC) &&
> +			     !(flags & MRT_FLUSH_VIFS_STATIC)) ||
> +			    (!(mrt->vif_table[i].flags & VIFF_STATIC) && !(flags & MRT_FLUSH)))
s/MRT_FLUSH/MRT_FLUSH_VIFS


Regards,
Nicolas
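
The per-vif skip condition, with the substitution suggested above applied, can be sketched as a standalone predicate. The flag values are the ones proposed in the quoted patch; vif_flush_skip() is a hypothetical helper, not part of the patch.

```c
#include <stdbool.h>

/* Flag values as proposed in the patch under review. */
#define MRT_FLUSH_MFC		1
#define MRT_FLUSH_MFC_STATIC	2
#define MRT_FLUSH_VIFS		4
#define MRT_FLUSH_VIFS_STATIC	8

/* Hypothetical helper: should a vif be left alone for this flush request?
 * Static vifs are flushed only with MRT_FLUSH_VIFS_STATIC, non-static
 * vifs only with MRT_FLUSH_VIFS. */
static bool vif_flush_skip(bool vif_is_static, int flags)
{
	if (vif_is_static)
		return !(flags & MRT_FLUSH_VIFS_STATIC);
	return !(flags & MRT_FLUSH_VIFS);
}
```

Splitting the condition on the static bit first keeps each branch down to a single flag test, which makes a wrong constant easier to spot.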


* Re: [Resend PATCH] mt76: change the return type of mt76_dma_attach()
From: Kalle Valo @ 2019-02-11  9:19 UTC (permalink / raw)
  To: Ryder Lee
  Cc: Lorenzo Bianconi, Felix Fietkau, Roy Luo, linux-wireless,
	linux-kernel, netdev, linux-mediatek
In-Reply-To: <228fdddb9ca96e8ce861e324eb9039722cf18f49.1549850911.git.ryder.lee@mediatek.com>

Ryder Lee <ryder.lee@mediatek.com> writes:

> There is no need to return 0 in mt76_dma_attach(), so switch it to void.
>
> Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>

When you send a new version mark it as v2:

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches#patch_version_missing

And add a changelog:

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches#changelog_missing

No need to resend because of this, but please do remember this in the
future.

-- 
Kalle Valo


* [PATCH v2 net-next] Revert "devlink: Add a generic wake_on_lan port parameter"
From: Vasundhara Volam @ 2019-02-11  9:16 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1549876577-195336-1-git-send-email-vasundhara-v.volam@broadcom.com>

This reverts commit b639583f9e36d044ac1b13090ae812266992cbac.

As per discussion with Jakub Kicinski and Michal Kubecek,
this will be better addressed by the soon-to-come ethtool netlink
API, with an additional indication that a given configuration request
is supposed to be persisted.

Also, remove the parameter support from bnxt_en driver.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Michal Kubecek <mkubecek@suse.cz>
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 19 +------------------
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h |  1 -
 include/net/devlink.h                             |  8 --------
 net/core/devlink.c                                |  5 -----
 4 files changed, 1 insertion(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index 2955e40..e1feb97 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -37,8 +37,6 @@ enum bnxt_dl_param_id {
 	 NVM_OFF_MSIX_VEC_PER_PF_MIN, BNXT_NVM_SHARED_CFG, 7},
 	{BNXT_DEVLINK_PARAM_ID_GRE_VER_CHECK, NVM_OFF_DIS_GRE_VER_CHECK,
 	 BNXT_NVM_SHARED_CFG, 1},
-
-	{DEVLINK_PARAM_GENERIC_ID_WOL, NVM_OFF_WOL, BNXT_NVM_PORT_CFG, 1},
 };
 
 static int bnxt_hwrm_nvm_req(struct bnxt *bp, u32 param_id, void *msg,
@@ -72,8 +70,7 @@ static int bnxt_hwrm_nvm_req(struct bnxt *bp, u32 param_id, void *msg,
 	bytesize = roundup(nvm_param.num_bits, BITS_PER_BYTE) / BITS_PER_BYTE;
 	switch (bytesize) {
 	case 1:
-		if (nvm_param.num_bits == 1 &&
-		    nvm_param.id != DEVLINK_PARAM_GENERIC_ID_WOL)
+		if (nvm_param.num_bits == 1)
 			buf = &val->vbool;
 		else
 			buf = &val->vu8;
@@ -167,17 +164,6 @@ static int bnxt_dl_msix_validate(struct devlink *dl, u32 id,
 	return 0;
 }
 
-static int bnxt_dl_wol_validate(struct devlink *dl, u32 id,
-				union devlink_param_value val,
-				struct netlink_ext_ack *extack)
-{
-	if (val.vu8 && val.vu8 != DEVLINK_PARAM_WAKE_MAGIC) {
-		NL_SET_ERR_MSG_MOD(extack, "WOL type is not supported");
-		return -EINVAL;
-	}
-	return 0;
-}
-
 static const struct devlink_param bnxt_dl_params[] = {
 	DEVLINK_PARAM_GENERIC(ENABLE_SRIOV,
 			      BIT(DEVLINK_PARAM_CMODE_PERMANENT),
@@ -203,9 +189,6 @@ static int bnxt_dl_wol_validate(struct devlink *dl, u32 id,
 };
 
 static const struct devlink_param bnxt_dl_port_params[] = {
-	DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
-			      bnxt_dl_nvm_param_get, bnxt_dl_nvm_param_set,
-			      bnxt_dl_wol_validate),
 };
 
 int bnxt_dl_register(struct bnxt *bp)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
index da065ca..5b6b2c7 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
@@ -35,7 +35,6 @@ static inline void bnxt_link_bp_to_dl(struct bnxt *bp, struct devlink *dl)
 
 #define NVM_OFF_MSIX_VEC_PER_PF_MAX	108
 #define NVM_OFF_MSIX_VEC_PER_PF_MIN	114
-#define NVM_OFF_WOL			152
 #define NVM_OFF_IGNORE_ARI		164
 #define NVM_OFF_DIS_GRE_VER_CHECK	171
 #define NVM_OFF_ENABLE_SRIOV		401
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 2b384a3..083a8b3 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -370,17 +370,12 @@ enum devlink_param_generic_id {
 	DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MAX,
 	DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN,
 	DEVLINK_PARAM_GENERIC_ID_FW_LOAD_POLICY,
-	DEVLINK_PARAM_GENERIC_ID_WOL,
 
 	/* add new param generic ids above here*/
 	__DEVLINK_PARAM_GENERIC_ID_MAX,
 	DEVLINK_PARAM_GENERIC_ID_MAX = __DEVLINK_PARAM_GENERIC_ID_MAX - 1,
 };
 
-enum devlink_param_wol_types {
-	DEVLINK_PARAM_WAKE_MAGIC = (1 << 0),
-};
-
 #define DEVLINK_PARAM_GENERIC_INT_ERR_RESET_NAME "internal_error_reset"
 #define DEVLINK_PARAM_GENERIC_INT_ERR_RESET_TYPE DEVLINK_PARAM_TYPE_BOOL
 
@@ -405,9 +400,6 @@ enum devlink_param_wol_types {
 #define DEVLINK_PARAM_GENERIC_FW_LOAD_POLICY_NAME "fw_load_policy"
 #define DEVLINK_PARAM_GENERIC_FW_LOAD_POLICY_TYPE DEVLINK_PARAM_TYPE_U8
 
-#define DEVLINK_PARAM_GENERIC_WOL_NAME "wake_on_lan"
-#define DEVLINK_PARAM_GENERIC_WOL_TYPE DEVLINK_PARAM_TYPE_U8
-
 #define DEVLINK_PARAM_GENERIC(_id, _cmodes, _get, _set, _validate)	\
 {									\
 	.id = DEVLINK_PARAM_GENERIC_ID_##_id,				\
diff --git a/net/core/devlink.c b/net/core/devlink.c
index e6a015b..00269a9 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2701,11 +2701,6 @@ static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info)
 		.name = DEVLINK_PARAM_GENERIC_FW_LOAD_POLICY_NAME,
 		.type = DEVLINK_PARAM_GENERIC_FW_LOAD_POLICY_TYPE,
 	},
-	{
-		.id = DEVLINK_PARAM_GENERIC_ID_WOL,
-		.name = DEVLINK_PARAM_GENERIC_WOL_NAME,
-		.type = DEVLINK_PARAM_GENERIC_WOL_TYPE,
-	},
 };
 
 static int devlink_param_generic_verify(const struct devlink_param *param)
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH v2 net-next] Revert wake_on_lan devlink parameter
From: Vasundhara Volam @ 2019-02-11  9:16 UTC (permalink / raw)
  To: davem; +Cc: netdev

As per discussion with Jakub Kicinski and Michal Kubecek,
this will be better addressed by the soon-to-come ethtool netlink
API with an additional indication that a given WoL configuration request
is supposed to be persisted.

Retain bnxt_en code for devlink port param table registration.
There will be follow up patches to add some devlink port params
for bnxt_en driver.

v1->v2:
Combine 2 patches into 1 patch to avoid build failures.

Vasundhara Volam (1):
  Revert "devlink: Add a generic wake_on_lan port parameter"

 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 19 +------------------
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h |  1 -
 include/net/devlink.h                             |  8 --------
 net/core/devlink.c                                |  5 -----
 4 files changed, 1 insertion(+), 32 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* [iproute PATCH] man: ip-link: Describe promisc mode
From: Phil Sutter @ 2019-02-11  9:17 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, William Flanagan
In-Reply-To: <20190211091418.GL26388@orbyte.nwl.cc>

Briefly explain what it is and where it's typically used.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 man/man8/ip-link.8.in | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 73d37c190fffa..5c327f01b6b45 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -1780,6 +1780,14 @@ flag on the device. Indicates that address can change when interface goes down (
 .B NOT
 used by the Linux).
 
+.TP
+.BR "promisc on " or " promisc off"
+change the
+.B PROMISC
+flag on the device. This requests receipt of all packets arriving at the NIC
+irrespective of their destination MAC address. It is typically used by traffic
+sniffers and also set by Linux bridges for their ports.
+
 .TP
 .BI name " NAME"
 change the name of the device. This operation is not
-- 
2.20.1


^ permalink raw reply related

* Re: How to set promiscuous mode
From: Phil Sutter @ 2019-02-11  9:14 UTC (permalink / raw)
  To: William Flanagan; +Cc: netdev
In-Reply-To: <f4c04e76-1e38-c742-5637-030cd74bc279@flanagan-consulting.com>

Hi,

On Sat, Feb 09, 2019 at 03:22:33PM -0500, William Flanagan wrote:
> Working with iproute2 for a task with Wireshark.  I don't see the 
> command in 'ip' to put an Ethernet port into promiscuous mode.  A reply 
> from the openSuse forum (below) tells me how.
> 
> I'm wondering if this should be in the 'MAN ip' page.

Please have a look at ip-link(8); its synopsis section at least lists the
'promisc' option for the 'ip link set' command. I'll follow up with a patch
adding a little description, though.

Cheers, Phil

^ permalink raw reply

* [PATCH net] sk_msg: Keep reference on socket file while psock lives
From: Jakub Sitnicki @ 2019-02-11  9:09 UTC (permalink / raw)
  To: netdev; +Cc: Daniel Borkmann, John Fastabend, Marek Majkowski

Backlog work for psock (sk_psock_backlog) might sleep while waiting for
memory to free up when sending packets. While it sleeps, the socket can
disappear from under our feet together with its wait queue because
userspace has closed it.

This breaks an assumption in sk_stream_wait_memory, which expects the
wait queue to still be there when it wakes up, resulting in a
use-after-free:

==================================================================
BUG: KASAN: use-after-free in remove_wait_queue+0x31/0x70
Write of size 8 at addr ffff888069a0c4e8 by task kworker/0:2/110

CPU: 0 PID: 110 Comm: kworker/0:2 Not tainted 5.0.0-rc2-00335-g28f9d1a3d4fe-dirty #14
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
Workqueue: events sk_psock_backlog
Call Trace:
 print_address_description+0x6e/0x2b0
 ? remove_wait_queue+0x31/0x70
 kasan_report+0xfd/0x177
 ? remove_wait_queue+0x31/0x70
 ? remove_wait_queue+0x31/0x70
 remove_wait_queue+0x31/0x70
 sk_stream_wait_memory+0x4dd/0x5f0
 ? sk_stream_wait_close+0x1b0/0x1b0
 ? wait_woken+0xc0/0xc0
 ? tcp_current_mss+0xc5/0x110
 tcp_sendmsg_locked+0x634/0x15d0
 ? tcp_set_state+0x2e0/0x2e0
 ? __kasan_slab_free+0x1d1/0x230
 ? kmem_cache_free+0x70/0x140
 ? sk_psock_backlog+0x40c/0x4b0
 ? process_one_work+0x40b/0x660
 ? worker_thread+0x82/0x680
 ? kthread+0x1b9/0x1e0
 ? ret_from_fork+0x1f/0x30
 ? check_preempt_curr+0xaf/0x130
 ? iov_iter_kvec+0x5f/0x70
 ? kernel_sendmsg_locked+0xa0/0xe0
 skb_send_sock_locked+0x273/0x3c0
 ? skb_splice_bits+0x180/0x180
 ? start_thread+0xe0/0xe0
 ? update_min_vruntime.constprop.27+0x88/0xc0
 sk_psock_backlog+0xb3/0x4b0
 ? strscpy+0xbf/0x1e0
 process_one_work+0x40b/0x660
 worker_thread+0x82/0x680
 ? process_one_work+0x660/0x660
 kthread+0x1b9/0x1e0
 ? __kthread_create_on_node+0x250/0x250
 ret_from_fork+0x1f/0x30

Allocated by task 109:
 sock_alloc_inode+0x54/0x120
 alloc_inode+0x28/0xb0
 new_inode_pseudo+0x7/0x60
 sock_alloc+0x21/0xe0
 __sys_accept4+0xc2/0x330
 __x64_sys_accept+0x3b/0x50
 do_syscall_64+0xb2/0x3e0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 7:
 kfree+0x7f/0x140
 rcu_process_callbacks+0xe0/0x100
 __do_softirq+0xe5/0x29a

The buggy address belongs to the object at ffff888069a0c4e0
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 8 bytes inside of
 64-byte region [ffff888069a0c4e0, ffff888069a0c520)
The buggy address belongs to the page:
page:ffffea0001a68300 count:1 mapcount:0 mapping:ffff88806d4018c0 index:0x0
flags: 0x4000000000000200(slab)
raw: 4000000000000200 dead000000000100 dead000000000200 ffff88806d4018c0
raw: 0000000000000000 00000000002a002a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888069a0c380: fb fb fb fb fc fc fc fc fb fb fb fb fb fb fb fb
 ffff888069a0c400: fc fc fc fc fb fb fb fb fb fb fb fb fc fc fc fc
>ffff888069a0c480: 00 00 00 00 00 00 00 00 fc fc fc fc fb fb fb fb
                                                          ^
 ffff888069a0c500: fb fb fb fb fc fc fc fc fb fb fb fb fb fb fb fb
 ffff888069a0c580: fc fc fc fc fb fb fb fb fb fb fb fb fc fc fc fc
==================================================================

Avoid it by keeping a reference to the socket file until the psock gets
destroyed.

While at it, rearrange the order of reference grabbing and
initialization to match the destructor in reverse.

Reported-by: Marek Majkowski <marek@cloudflare.com>
Link: https://lore.kernel.org/netdev/CAJPywTLwgXNEZ2dZVoa=udiZmtrWJ0q5SuBW64aYs0Y1khXX3A@mail.gmail.com
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/core/skmsg.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 8c826603bf36..a38442b8580b 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -493,8 +493,13 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node)
 	sk_psock_set_state(psock, SK_PSOCK_TX_ENABLED);
 	refcount_set(&psock->refcnt, 1);
 
-	rcu_assign_sk_user_data(sk, psock);
+	/* Hold on to socket wait queue. Backlog work waits on it for
+	 * memory when sending. We must cancel work before socket wait
+	 * queue can go away.
+	 */
+	get_file(sk->sk_socket->file);
 	sock_hold(sk);
+	rcu_assign_sk_user_data(sk, psock);
 
 	return psock;
 }
@@ -558,6 +563,7 @@ static void sk_psock_destroy_deferred(struct work_struct *gc)
 	if (psock->sk_redir)
 		sock_put(psock->sk_redir);
 	sock_put(psock->sk);
+	fput(psock->sk->sk_socket->file);
 	kfree(psock);
 }
 
-- 
2.17.2


^ permalink raw reply related
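
The ordering described in the sk_msg patch above can be sketched in
userspace. This is a minimal model, not the kernel code: struct obj,
struct sock_model, struct psock_model, psock_init() and psock_destroy()
are hypothetical names, the counters are plain ints rather than kernel
refcounts, and unpublishing the pointer inside the destructor is an
assumption made only to keep the sketch symmetric. It shows references
being taken before the psock is published, and dropped in reverse order
on destruction.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted object such as a socket file. */
struct obj {
	int refcnt;
};

static void obj_get(struct obj *o)
{
	o->refcnt++;
}

static void obj_put(struct obj *o)
{
	assert(o->refcnt > 0);
	o->refcnt--;
}

struct psock_model;

/* Hypothetical stand-in for a socket that owns a file and a psock slot. */
struct sock_model {
	struct obj file;		/* models sk->sk_socket->file */
	struct obj self;		/* models the sock refcount */
	struct psock_model *user_data;	/* models sk_user_data */
};

struct psock_model {
	struct sock_model *sk;
};

static struct psock_model *psock_init(struct sock_model *sk)
{
	struct psock_model *psock = calloc(1, sizeof(*psock));

	if (!psock)
		return NULL;
	psock->sk = sk;
	obj_get(&sk->file);	/* 1. pin the file (and its wait queue) */
	obj_get(&sk->self);	/* 2. pin the socket */
	sk->user_data = psock;	/* 3. publish only after both are pinned */
	return psock;
}

static void psock_destroy(struct psock_model *psock)
{
	struct sock_model *sk = psock->sk;

	sk->user_data = NULL;	/* undo 3 (assumption of this sketch) */
	obj_put(&sk->self);	/* undo 2 */
	obj_put(&sk->file);	/* undo 1 */
	free(psock);
}
```

Publishing last means that anyone who finds the psock through the
socket can rely on both references already being held.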

* Re: [PATCH][next] can: at91_can: mark expected switch fall-throughs
From: Nicolas.Ferre @ 2019-02-11  9:03 UTC (permalink / raw)
  To: gustavo, wg, mkl, davem, alexandre.belloni, Ludovic.Desroches
  Cc: linux-can, netdev, linux-arm-kernel, linux-kernel
In-Reply-To: <20190208184444.GA28484@embeddedor>

On 08/02/2019 at 19:44, Gustavo A. R. Silva wrote:
> In preparation to enabling -Wimplicit-fallthrough, mark switch
> cases where we are expecting to fall through.
> 
> Notice that, in this particular case, the /* fall through */
> comments are placed at the bottom of the case statement, which
> is what GCC is expecting to find.
> 
> Warning level 3 was used: -Wimplicit-fallthrough=3
> 
> This patch is part of the ongoing efforts to enabling
> -Wimplicit-fallthrough.
> 
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>

Looks good to me:
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>

> ---
>   drivers/net/can/at91_can.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/can/at91_can.c b/drivers/net/can/at91_can.c
> index d98c69045b17..1718c20f9c99 100644
> --- a/drivers/net/can/at91_can.c
> +++ b/drivers/net/can/at91_can.c
> @@ -902,7 +902,8 @@ static void at91_irq_err_state(struct net_device *dev,
>   				CAN_ERR_CRTL_TX_WARNING :
>   				CAN_ERR_CRTL_RX_WARNING;
>   		}
> -	case CAN_STATE_ERROR_WARNING:	/* fallthrough */
> +		/* fall through */
> +	case CAN_STATE_ERROR_WARNING:
>   		/*
>   		 * from: ERROR_ACTIVE, ERROR_WARNING
>   		 * to  : ERROR_PASSIVE, BUS_OFF
> @@ -951,7 +952,8 @@ static void at91_irq_err_state(struct net_device *dev,
>   		netdev_dbg(dev, "Error Active\n");
>   		cf->can_id |= CAN_ERR_PROT;
>   		cf->data[2] = CAN_ERR_PROT_ACTIVE;
> -	case CAN_STATE_ERROR_WARNING:	/* fallthrough */
> +		/* fall through */
> +	case CAN_STATE_ERROR_WARNING:
>   		reg_idr = AT91_IRQ_ERRA | AT91_IRQ_WARN | AT91_IRQ_BOFF;
>   		reg_ier = AT91_IRQ_ERRP;
>   		break;
> 


-- 
Nicolas Ferre

^ permalink raw reply

* [PATCH net-next v4 08/17] net: sched: introduce reference counting for tcf_proto
From: Vlad Buslov @ 2019-02-11  8:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov
In-Reply-To: <20190211085548.7190-1-vladbu@mellanox.com>

In order to remove the dependency on rtnl lock and allow concurrent tcf_proto
modification, extend tcf_proto with a reference counter. Implement helper
get/put functions for tcf_proto and use them to modify cls API to always
take a reference to tcf_proto while using it. Only release the reference to
the parent chain after releasing the last reference to tp.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  1 +
 net/sched/cls_api.c       | 53 ++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 85993d7efee6..4372c08fc4d9 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -322,6 +322,7 @@ struct tcf_proto {
 	void			*data;
 	const struct tcf_proto_ops	*ops;
 	struct tcf_chain	*chain;
+	refcount_t		refcnt;
 	struct rcu_head		rcu;
 };
 
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 3fce30ae9a9b..37c05b96898f 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -180,6 +180,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 	tp->protocol = protocol;
 	tp->prio = prio;
 	tp->chain = chain;
+	refcount_set(&tp->refcnt, 1);
 
 	err = tp->ops->init(tp);
 	if (err) {
@@ -193,14 +194,29 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 	return ERR_PTR(err);
 }
 
+static void tcf_proto_get(struct tcf_proto *tp)
+{
+	refcount_inc(&tp->refcnt);
+}
+
+static void tcf_chain_put(struct tcf_chain *chain);
+
 static void tcf_proto_destroy(struct tcf_proto *tp,
 			      struct netlink_ext_ack *extack)
 {
 	tp->ops->destroy(tp, extack);
+	tcf_chain_put(tp->chain);
 	module_put(tp->ops->owner);
 	kfree_rcu(tp, rcu);
 }
 
+static void tcf_proto_put(struct tcf_proto *tp,
+			  struct netlink_ext_ack *extack)
+{
+	if (refcount_dec_and_test(&tp->refcnt))
+		tcf_proto_destroy(tp, extack);
+}
+
 #define ASSERT_BLOCK_LOCKED(block)					\
 	lockdep_assert_held(&(block)->lock)
 
@@ -445,18 +461,18 @@ static void tcf_chain_put_explicitly_created(struct tcf_chain *chain)
 
 static void tcf_chain_flush(struct tcf_chain *chain)
 {
-	struct tcf_proto *tp;
+	struct tcf_proto *tp, *tp_next;
 
 	mutex_lock(&chain->filter_chain_lock);
 	tp = tcf_chain_dereference(chain->filter_chain, chain);
+	RCU_INIT_POINTER(chain->filter_chain, NULL);
 	tcf_chain0_head_change(chain, NULL);
 	mutex_unlock(&chain->filter_chain_lock);
 
 	while (tp) {
-		RCU_INIT_POINTER(chain->filter_chain, tp->next);
-		tcf_proto_destroy(tp, NULL);
-		tp = rtnl_dereference(chain->filter_chain);
-		tcf_chain_put(chain);
+		tp_next = rcu_dereference_protected(tp->next, 1);
+		tcf_proto_put(tp, NULL);
+		tp = tp_next;
 	}
 }
 
@@ -1500,9 +1516,9 @@ static void tcf_chain_tp_insert(struct tcf_chain *chain,
 {
 	if (*chain_info->pprev == chain->filter_chain)
 		tcf_chain0_head_change(chain, tp);
+	tcf_proto_get(tp);
 	RCU_INIT_POINTER(tp->next, tcf_chain_tp_prev(chain, chain_info));
 	rcu_assign_pointer(*chain_info->pprev, tp);
-	tcf_chain_hold(chain);
 }
 
 static void tcf_chain_tp_remove(struct tcf_chain *chain,
@@ -1514,7 +1530,6 @@ static void tcf_chain_tp_remove(struct tcf_chain *chain,
 	if (tp == chain->filter_chain)
 		tcf_chain0_head_change(chain, next);
 	RCU_INIT_POINTER(*chain_info->pprev, next);
-	tcf_chain_put(chain);
 }
 
 static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
@@ -1541,7 +1556,12 @@ static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
 		}
 	}
 	chain_info->pprev = pprev;
-	chain_info->next = tp ? tp->next : NULL;
+	if (tp) {
+		chain_info->next = tp->next;
+		tcf_proto_get(tp);
+	} else {
+		chain_info->next = NULL;
+	}
 	return tp;
 }
 
@@ -1699,6 +1719,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	prio = TC_H_MAJ(t->tcm_info);
 	prio_allocate = false;
 	parent = t->tcm_parent;
+	tp = NULL;
 	cl = 0;
 
 	if (prio == 0) {
@@ -1816,6 +1837,12 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 errout:
 	if (chain)
 		tcf_chain_put(chain);
+	if (chain) {
+		if (tp && !IS_ERR(tp))
+			tcf_proto_put(tp, NULL);
+		if (!tp_created)
+			tcf_chain_put(chain);
+	}
 	tcf_block_release(q, block);
 	if (err == -EAGAIN)
 		/* Replay the request. */
@@ -1946,8 +1973,11 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	}
 
 errout:
-	if (chain)
+	if (chain) {
+		if (tp && !IS_ERR(tp))
+			tcf_proto_put(tp, NULL);
 		tcf_chain_put(chain);
+	}
 	tcf_block_release(q, block);
 	return err;
 
@@ -2038,8 +2068,11 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	}
 
 errout:
-	if (chain)
+	if (chain) {
+		if (tp && !IS_ERR(tp))
+			tcf_proto_put(tp, NULL);
 		tcf_chain_put(chain);
+	}
 	tcf_block_release(q, block);
 	return err;
 }
-- 
2.13.6


^ permalink raw reply related
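
The get/put scheme from the tcf_proto patch above can be sketched in
userspace as follows. This is a simplified model under stated
assumptions, not the kernel implementation: struct chain_model, struct
proto_model, proto_create(), proto_get() and proto_put() are
hypothetical names, the counters are plain ints instead of refcount_t,
and the sketch lets the proto take its own chain reference at creation
time for simplicity. The point it illustrates is that the parent chain
reference is released only when the last proto reference goes away.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical models of tcf_chain and tcf_proto; not the kernel structs. */
struct chain_model {
	int refcnt;
};

struct proto_model {
	int refcnt;
	struct chain_model *chain;
};

static int protos_destroyed;	/* counter used only to observe the sketch */

static struct proto_model *proto_create(struct chain_model *chain)
{
	struct proto_model *tp = calloc(1, sizeof(*tp));

	if (!tp)
		return NULL;
	tp->chain = chain;
	chain->refcnt++;	/* the proto pins its parent chain */
	tp->refcnt = 1;		/* creator's reference */
	return tp;
}

static void proto_get(struct proto_model *tp)
{
	tp->refcnt++;
}

/* Drop one reference; on the last one, release the chain and free tp. */
static void proto_put(struct proto_model *tp)
{
	assert(tp->refcnt > 0);
	if (--tp->refcnt == 0) {
		tp->chain->refcnt--;
		protos_destroyed++;
		free(tp);
	}
}
```

A caller that looks up a proto takes a reference with proto_get() and
pairs it with proto_put() on every exit path, which mirrors the errout
handling in the patch.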

* [PATCH net-next v4 02/17] net: sched: protect chain->explicitly_created with block->lock
From: Vlad Buslov @ 2019-02-11  8:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov
In-Reply-To: <20190211085548.7190-1-vladbu@mellanox.com>

In order to remove dependency on rtnl lock, protect
tcf_chain->explicitly_created flag with block->lock. Consolidate code that
checks and resets 'explicitly_created' flag into __tcf_chain_put() to
execute it atomically with rest of code that puts chain reference.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
 net/sched/cls_api.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 806e7158a7e8..2ebf8e53038a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -370,13 +370,22 @@ EXPORT_SYMBOL(tcf_chain_get_by_act);
 
 static void tc_chain_tmplt_del(struct tcf_chain *chain);
 
-static void __tcf_chain_put(struct tcf_chain *chain, bool by_act)
+static void __tcf_chain_put(struct tcf_chain *chain, bool by_act,
+			    bool explicitly_created)
 {
 	struct tcf_block *block = chain->block;
 	bool is_last, free_block = false;
 	unsigned int refcnt;
 
 	mutex_lock(&block->lock);
+	if (explicitly_created) {
+		if (!chain->explicitly_created) {
+			mutex_unlock(&block->lock);
+			return;
+		}
+		chain->explicitly_created = false;
+	}
+
 	if (by_act)
 		chain->action_refcnt--;
 
@@ -402,19 +411,18 @@ static void __tcf_chain_put(struct tcf_chain *chain, bool by_act)
 
 static void tcf_chain_put(struct tcf_chain *chain)
 {
-	__tcf_chain_put(chain, false);
+	__tcf_chain_put(chain, false, false);
 }
 
 void tcf_chain_put_by_act(struct tcf_chain *chain)
 {
-	__tcf_chain_put(chain, true);
+	__tcf_chain_put(chain, true, false);
 }
 EXPORT_SYMBOL(tcf_chain_put_by_act);
 
 static void tcf_chain_put_explicitly_created(struct tcf_chain *chain)
 {
-	if (chain->explicitly_created)
-		tcf_chain_put(chain);
+	__tcf_chain_put(chain, false, true);
 }
 
 static void tcf_chain_flush(struct tcf_chain *chain)
@@ -2305,7 +2313,6 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 		 * to the chain previously taken during addition.
 		 */
 		tcf_chain_put_explicitly_created(chain);
-		chain->explicitly_created = false;
 		break;
 	case RTM_GETCHAIN:
 		err = tc_chain_notify(chain, skb, n->nlmsg_seq,
-- 
2.13.6


^ permalink raw reply related
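
Why the flag check moves into the put function in the patch above can
be shown with a small userspace sketch. It assumes a pthread mutex in
place of block->lock, and struct chain_model and chain_put() are
hypothetical names. Checking and clearing the flag in the same critical
section that drops the reference means two concurrent deleters cannot
both drop the "explicitly created" reference.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a chain whose flag and refcount share one lock. */
struct chain_model {
	pthread_mutex_t lock;		/* models block->lock */
	unsigned int refcnt;
	bool explicitly_created;
};

/*
 * Drop one reference. When by_explicit is true, the reference is dropped
 * only if the flag is still set, and the flag is cleared in the same
 * critical section. Returns true if a reference was actually dropped.
 */
static bool chain_put(struct chain_model *chain, bool by_explicit)
{
	bool dropped = false;

	pthread_mutex_lock(&chain->lock);
	if (by_explicit) {
		if (!chain->explicitly_created)
			goto out;
		chain->explicitly_created = false;
	}
	assert(chain->refcnt > 0);
	chain->refcnt--;
	dropped = true;
out:
	pthread_mutex_unlock(&chain->lock);
	return dropped;
}
```

With the check outside the lock, as in the code being replaced, two
callers could both observe the flag as set before either clears it.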

* [PATCH net-next v4 04/17] net: sched: protect block->chain0 with block->lock
From: Vlad Buslov @ 2019-02-11  8:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov
In-Reply-To: <20190211085548.7190-1-vladbu@mellanox.com>

In order to remove dependency on rtnl lock, use block->lock to protect
chain0 struct from concurrent modification. Rearrange code in chain0
callback add and del functions to only access chain0 when block->lock is
held.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index b5db0f79db14..869ae44d7631 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -244,8 +244,11 @@ static void tcf_chain0_head_change(struct tcf_chain *chain,
 
 	if (chain->index)
 		return;
+
+	mutex_lock(&block->lock);
 	list_for_each_entry(item, &block->chain0.filter_chain_list, list)
 		tcf_chain_head_change_item(item, tp_head);
+	mutex_unlock(&block->lock);
 }
 
 /* Returns true if block can be safely freed. */
@@ -756,8 +759,8 @@ tcf_chain0_head_change_cb_add(struct tcf_block *block,
 			      struct tcf_block_ext_info *ei,
 			      struct netlink_ext_ack *extack)
 {
-	struct tcf_chain *chain0 = block->chain0.chain;
 	struct tcf_filter_chain_list_item *item;
+	struct tcf_chain *chain0;
 
 	item = kmalloc(sizeof(*item), GFP_KERNEL);
 	if (!item) {
@@ -766,9 +769,14 @@ tcf_chain0_head_change_cb_add(struct tcf_block *block,
 	}
 	item->chain_head_change = ei->chain_head_change;
 	item->chain_head_change_priv = ei->chain_head_change_priv;
+
+	mutex_lock(&block->lock);
+	chain0 = block->chain0.chain;
 	if (chain0 && chain0->filter_chain)
 		tcf_chain_head_change_item(item, chain0->filter_chain);
 	list_add(&item->list, &block->chain0.filter_chain_list);
+	mutex_unlock(&block->lock);
+
 	return 0;
 }
 
@@ -776,20 +784,23 @@ static void
 tcf_chain0_head_change_cb_del(struct tcf_block *block,
 			      struct tcf_block_ext_info *ei)
 {
-	struct tcf_chain *chain0 = block->chain0.chain;
 	struct tcf_filter_chain_list_item *item;
 
+	mutex_lock(&block->lock);
 	list_for_each_entry(item, &block->chain0.filter_chain_list, list) {
 		if ((!ei->chain_head_change && !ei->chain_head_change_priv) ||
 		    (item->chain_head_change == ei->chain_head_change &&
 		     item->chain_head_change_priv == ei->chain_head_change_priv)) {
-			if (chain0)
+			if (block->chain0.chain)
 				tcf_chain_head_change_item(item, NULL);
 			list_del(&item->list);
+			mutex_unlock(&block->lock);
+
 			kfree(item);
 			return;
 		}
 	}
+	mutex_unlock(&block->lock);
 	WARN_ON(1);
 }
 
-- 
2.13.6


^ permalink raw reply related
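
The locking shape used by the callback add and del functions in the
patch above can be modelled in userspace. This is a sketch under
assumptions: a pthread mutex stands in for block->lock, a singly linked
list stands in for the kernel list, and struct cb_item, struct
block_model, cb_add() and cb_del() are hypothetical names. Allocation
happens before the lock is taken, the list is only touched while the
lock is held, and the removed item is freed after the lock is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical callback list item, identified by an integer id. */
struct cb_item {
	struct cb_item *next;
	int id;
};

struct block_model {
	pthread_mutex_t lock;	/* models block->lock */
	struct cb_item *items;	/* models chain0.filter_chain_list */
};

static bool cb_add(struct block_model *b, int id)
{
	struct cb_item *item = malloc(sizeof(*item)); /* outside the lock */

	if (!item)
		return false;
	item->id = id;

	pthread_mutex_lock(&b->lock);
	item->next = b->items;
	b->items = item;
	pthread_mutex_unlock(&b->lock);
	return true;
}

static bool cb_del(struct block_model *b, int id)
{
	struct cb_item **pp, *item;

	pthread_mutex_lock(&b->lock);
	for (pp = &b->items; (item = *pp) != NULL; pp = &item->next) {
		if (item->id == id) {
			*pp = item->next;	/* unlink under the lock */
			pthread_mutex_unlock(&b->lock);
			free(item);		/* free after unlocking */
			return true;
		}
	}
	pthread_mutex_unlock(&b->lock);
	return false;
}
```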

* [PATCH net-next v4 09/17] net: sched: traverse classifiers in chain with tcf_get_next_proto()
From: Vlad Buslov @ 2019-02-11  8:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov
In-Reply-To: <20190211085548.7190-1-vladbu@mellanox.com>

All users of chain->filter_chain rely on rtnl lock and assume that no new
classifier instances are added when traversing the list. Use
tcf_get_next_proto() to traverse the filter list without relying on the rtnl
mutex. This function iterates over classifiers by taking a reference to
the current iterator classifier only and doesn't assume external
synchronization of the filter list.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h |  2 ++
 net/sched/cls_api.c   | 70 +++++++++++++++++++++++++++++++++++++++++++--------
 net/sched/sch_api.c   |  4 +--
 3 files changed, 64 insertions(+), 12 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 38bee7dd21d1..e5dafa5ee1b2 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -46,6 +46,8 @@ struct tcf_chain *tcf_chain_get_by_act(struct tcf_block *block,
 void tcf_chain_put_by_act(struct tcf_chain *chain);
 struct tcf_chain *tcf_get_next_chain(struct tcf_block *block,
 				     struct tcf_chain *chain);
+struct tcf_proto *tcf_get_next_proto(struct tcf_chain *chain,
+				     struct tcf_proto *tp);
 void tcf_block_netif_keep_dst(struct tcf_block *block);
 int tcf_block_get(struct tcf_block **p_block,
 		  struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 37c05b96898f..dca8a3bee9c2 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -980,6 +980,45 @@ tcf_get_next_chain(struct tcf_block *block, struct tcf_chain *chain)
 }
 EXPORT_SYMBOL(tcf_get_next_chain);
 
+static struct tcf_proto *
+__tcf_get_next_proto(struct tcf_chain *chain, struct tcf_proto *tp)
+{
+	ASSERT_RTNL();
+	mutex_lock(&chain->filter_chain_lock);
+
+	if (!tp)
+		tp = tcf_chain_dereference(chain->filter_chain, chain);
+	else
+		tp = tcf_chain_dereference(tp->next, chain);
+
+	if (tp)
+		tcf_proto_get(tp);
+
+	mutex_unlock(&chain->filter_chain_lock);
+
+	return tp;
+}
+
+/* Function to be used by all clients that want to iterate over all tp's on
+ * chain. Users of this function must be tolerant to concurrent tp
+ * insertion/deletion or ensure that no concurrent chain modification is
+ * possible. Note that all netlink dump callbacks cannot guarantee to provide
+ * consistent dump because rtnl lock is released each time skb is filled with
+ * data and sent to user-space.
+ */
+
+struct tcf_proto *
+tcf_get_next_proto(struct tcf_chain *chain, struct tcf_proto *tp)
+{
+	struct tcf_proto *tp_next = __tcf_get_next_proto(chain, tp);
+
+	if (tp)
+		tcf_proto_put(tp, NULL);
+
+	return tp_next;
+}
+EXPORT_SYMBOL(tcf_get_next_proto);
+
 static void tcf_block_flush_all_chains(struct tcf_block *block)
 {
 	struct tcf_chain *chain;
@@ -1352,7 +1391,7 @@ tcf_block_playback_offloads(struct tcf_block *block, tc_setup_cb_t *cb,
 			    struct netlink_ext_ack *extack)
 {
 	struct tcf_chain *chain, *chain_prev;
-	struct tcf_proto *tp;
+	struct tcf_proto *tp, *tp_prev;
 	int err;
 
 	for (chain = __tcf_get_next_chain(block, NULL);
@@ -1360,8 +1399,10 @@ tcf_block_playback_offloads(struct tcf_block *block, tc_setup_cb_t *cb,
 	     chain_prev = chain,
 		     chain = __tcf_get_next_chain(block, chain),
 		     tcf_chain_put(chain_prev)) {
-		for (tp = rtnl_dereference(chain->filter_chain); tp;
-		     tp = rtnl_dereference(tp->next)) {
+		for (tp = __tcf_get_next_proto(chain, NULL); tp;
+		     tp_prev = tp,
+			     tp = __tcf_get_next_proto(chain, tp),
+			     tcf_proto_put(tp_prev, NULL)) {
 			if (tp->ops->reoffload) {
 				err = tp->ops->reoffload(tp, add, cb, cb_priv,
 							 extack);
@@ -1378,6 +1419,7 @@ tcf_block_playback_offloads(struct tcf_block *block, tc_setup_cb_t *cb,
 	return 0;
 
 err_playback_remove:
+	tcf_proto_put(tp, NULL);
 	tcf_chain_put(chain);
 	tcf_block_playback_offloads(block, cb, cb_priv, false, offload_in_use,
 				    extack);
@@ -1677,8 +1719,8 @@ static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb,
 {
 	struct tcf_proto *tp;
 
-	for (tp = rtnl_dereference(chain->filter_chain);
-	     tp; tp = rtnl_dereference(tp->next))
+	for (tp = tcf_get_next_proto(chain, NULL);
+	     tp; tp = tcf_get_next_proto(chain, tp))
 		tfilter_notify(net, oskb, n, tp, block,
 			       q, parent, NULL, event, false);
 }
@@ -2104,11 +2146,15 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
 	struct net *net = sock_net(skb->sk);
 	struct tcf_block *block = chain->block;
 	struct tcmsg *tcm = nlmsg_data(cb->nlh);
+	struct tcf_proto *tp, *tp_prev;
 	struct tcf_dump_args arg;
-	struct tcf_proto *tp;
 
-	for (tp = rtnl_dereference(chain->filter_chain);
-	     tp; tp = rtnl_dereference(tp->next), (*p_index)++) {
+	for (tp = __tcf_get_next_proto(chain, NULL);
+	     tp;
+	     tp_prev = tp,
+		     tp = __tcf_get_next_proto(chain, tp),
+		     tcf_proto_put(tp_prev, NULL),
+		     (*p_index)++) {
 		if (*p_index < index_start)
 			continue;
 		if (TC_H_MAJ(tcm->tcm_info) &&
@@ -2125,7 +2171,7 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
 					  NETLINK_CB(cb->skb).portid,
 					  cb->nlh->nlmsg_seq, NLM_F_MULTI,
 					  RTM_NEWTFILTER) <= 0)
-				return false;
+				goto errout;
 
 			cb->args[1] = 1;
 		}
@@ -2145,9 +2191,13 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
 		cb->args[2] = arg.w.cookie;
 		cb->args[1] = arg.w.count + 1;
 		if (arg.w.stop)
-			return false;
+			goto errout;
 	}
 	return true;
+
+errout:
+	tcf_proto_put(tp, NULL);
+	return false;
 }
 
 /* called with RTNL */
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 80058abc729f..9a530cad2759 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1914,8 +1914,8 @@ static void tc_bind_tclass(struct Qdisc *q, u32 portid, u32 clid,
 	     chain = tcf_get_next_chain(block, chain)) {
 		struct tcf_proto *tp;
 
-		for (tp = rtnl_dereference(chain->filter_chain);
-		     tp; tp = rtnl_dereference(tp->next)) {
+		for (tp = tcf_get_next_proto(chain, NULL);
+		     tp; tp = tcf_get_next_proto(chain, tp)) {
 			struct tcf_bind_args arg = {};
 
 			arg.w.fn = tcf_node_bind;
-- 
2.13.6


^ permalink raw reply related
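
The traversal scheme from the patch above, where only the current
element is pinned, can be sketched in userspace. This is a model under
assumptions, not the kernel code: a pthread mutex stands in for
chain->filter_chain_lock, the refcount is a plain int guarded by that
mutex, nothing is freed at zero, and struct node, struct list_model,
get_next() and sum_values() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical list node; refcnt is guarded by the list lock here. */
struct node {
	struct node *next;
	int refcnt;
	int value;
};

struct list_model {
	pthread_mutex_t lock;	/* models chain->filter_chain_lock */
	struct node *head;
};

static void node_put(struct list_model *l, struct node *n)
{
	pthread_mutex_lock(&l->lock);
	assert(n->refcnt > 0);
	n->refcnt--;		/* a real implementation would free at zero */
	pthread_mutex_unlock(&l->lock);
}

/*
 * Return the element after cur (or the head when cur is NULL) with a
 * reference held, then drop the caller's reference on cur.
 */
static struct node *get_next(struct list_model *l, struct node *cur)
{
	struct node *next;

	pthread_mutex_lock(&l->lock);
	next = cur ? cur->next : l->head;
	if (next)
		next->refcnt++;
	pthread_mutex_unlock(&l->lock);

	if (cur)
		node_put(l, cur);
	return next;
}

/* Sum all values; only the current element is pinned at any time. */
static int sum_values(struct list_model *l)
{
	struct node *n;
	int sum = 0;

	for (n = get_next(l, NULL); n; n = get_next(l, n))
		sum += n->value;
	return sum;
}
```

A loop that exits early has to drop the reference on the current
element itself, which is what the errout labels added by the patch do.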

* [PATCH net-next v4 17/17] net: sched: unlock rules update API
From: Vlad Buslov @ 2019-02-11  8:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov
In-Reply-To: <20190211085548.7190-1-vladbu@mellanox.com>

Register netlink protocol handlers for message types RTM_NEWTFILTER,
RTM_DELTFILTER, RTM_GETTFILTER as unlocked. Set the rtnl_held variable that
tracks rtnl mutex state to false by default.

Introduce the tcf_proto_is_unlocked() helper, which checks
tcf_proto_ops->flags to determine whether ops can be called without taking
rtnl lock. Manually look up Qdisc, class and block in rule update handlers.
Verify that both Qdisc ops and proto ops are unlocked before using any of
their callbacks, and obtain rtnl lock otherwise.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 131 +++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 114 insertions(+), 17 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 5f9373ee47ce..266fcb34fefe 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -163,6 +163,23 @@ static inline u32 tcf_auto_prio(struct tcf_proto *tp)
 	return TC_H_MAJ(first);
 }
 
+static bool tcf_proto_is_unlocked(const char *kind)
+{
+	const struct tcf_proto_ops *ops;
+	bool ret;
+
+	ops = tcf_proto_lookup_ops(kind, false, NULL);
+	/* On error return false to take rtnl lock. Proto lookup/create
+	 * functions will perform lookup again and properly handle errors.
+	 */
+	if (IS_ERR(ops))
+		return false;
+
+	ret = !!(ops->flags & TCF_PROTO_OPS_DOIT_UNLOCKED);
+	module_put(ops->owner);
+	return ret;
+}
+
 static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 					  u32 prio, struct tcf_chain *chain,
 					  bool rtnl_held,
@@ -1312,8 +1329,12 @@ static void tcf_block_release(struct Qdisc *q, struct tcf_block *block,
 	if (!IS_ERR_OR_NULL(block))
 		tcf_block_refcnt_put(block, rtnl_held);
 
-	if (q)
-		qdisc_put(q);
+	if (q) {
+		if (rtnl_held)
+			qdisc_put(q);
+		else
+			qdisc_put_unlocked(q);
+	}
 }
 
 struct tcf_block_owner_item {
@@ -1966,7 +1987,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	void *fh;
 	int err;
 	int tp_created;
-	bool rtnl_held = true;
+	bool rtnl_held = false;
 
 	if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
@@ -1985,6 +2006,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	parent = t->tcm_parent;
 	tp = NULL;
 	cl = 0;
+	block = NULL;
 
 	if (prio == 0) {
 		/* If no priority is provided by the user,
@@ -2001,8 +2023,27 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 
 	/* Find head of filter chain. */
 
-	block = tcf_block_find(net, &q, &parent, &cl,
-			       t->tcm_ifindex, t->tcm_block_index, extack);
+	err = __tcf_qdisc_find(net, &q, &parent, t->tcm_ifindex, false, extack);
+	if (err)
+		return err;
+
+	/* Take rtnl mutex if rtnl_held was set to true on previous iteration,
+	 * block is shared (no qdisc found), qdisc is not unlocked, classifier
+	 * type is not specified, classifier is not unlocked.
+	 */
+	if (rtnl_held ||
+	    (q && !(q->ops->cl_ops->flags & QDISC_CLASS_OPS_DOIT_UNLOCKED)) ||
+	    !tca[TCA_KIND] || !tcf_proto_is_unlocked(nla_data(tca[TCA_KIND]))) {
+		rtnl_held = true;
+		rtnl_lock();
+	}
+
+	err = __tcf_qdisc_cl_find(q, parent, &cl, t->tcm_ifindex, extack);
+	if (err)
+		goto errout;
+
+	block = __tcf_block_find(net, q, cl, t->tcm_ifindex, t->tcm_block_index,
+				 extack);
 	if (IS_ERR(block)) {
 		err = PTR_ERR(block);
 		goto errout;
@@ -2123,9 +2164,18 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			tcf_chain_put(chain);
 	}
 	tcf_block_release(q, block, rtnl_held);
-	if (err == -EAGAIN)
+
+	if (rtnl_held)
+		rtnl_unlock();
+
+	if (err == -EAGAIN) {
+		/* Take rtnl lock in case EAGAIN is caused by concurrent flush
+		 * of target chain.
+		 */
+		rtnl_held = true;
 		/* Replay the request. */
 		goto replay;
+	}
 	return err;
 
 errout_locked:
@@ -2146,12 +2196,12 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	struct Qdisc *q = NULL;
 	struct tcf_chain_info chain_info;
 	struct tcf_chain *chain = NULL;
-	struct tcf_block *block;
+	struct tcf_block *block = NULL;
 	struct tcf_proto *tp = NULL;
 	unsigned long cl = 0;
 	void *fh = NULL;
 	int err;
-	bool rtnl_held = true;
+	bool rtnl_held = false;
 
 	if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
@@ -2172,8 +2222,27 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 
 	/* Find head of filter chain. */
 
-	block = tcf_block_find(net, &q, &parent, &cl,
-			       t->tcm_ifindex, t->tcm_block_index, extack);
+	err = __tcf_qdisc_find(net, &q, &parent, t->tcm_ifindex, false, extack);
+	if (err)
+		return err;
+
+	/* Take rtnl mutex if flushing whole chain, block is shared (no qdisc
+	 * found), qdisc is not unlocked, classifier type is not specified,
+	 * classifier is not unlocked.
+	 */
+	if (!prio ||
+	    (q && !(q->ops->cl_ops->flags & QDISC_CLASS_OPS_DOIT_UNLOCKED)) ||
+	    !tca[TCA_KIND] || !tcf_proto_is_unlocked(nla_data(tca[TCA_KIND]))) {
+		rtnl_held = true;
+		rtnl_lock();
+	}
+
+	err = __tcf_qdisc_cl_find(q, parent, &cl, t->tcm_ifindex, extack);
+	if (err)
+		goto errout;
+
+	block = __tcf_block_find(net, q, cl, t->tcm_ifindex, t->tcm_block_index,
+				 extack);
 	if (IS_ERR(block)) {
 		err = PTR_ERR(block);
 		goto errout;
@@ -2255,6 +2324,10 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		tcf_chain_put(chain);
 	}
 	tcf_block_release(q, block, rtnl_held);
+
+	if (rtnl_held)
+		rtnl_unlock();
+
 	return err;
 
 errout_locked:
@@ -2275,12 +2348,12 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	struct Qdisc *q = NULL;
 	struct tcf_chain_info chain_info;
 	struct tcf_chain *chain = NULL;
-	struct tcf_block *block;
+	struct tcf_block *block = NULL;
 	struct tcf_proto *tp = NULL;
 	unsigned long cl = 0;
 	void *fh = NULL;
 	int err;
-	bool rtnl_held = true;
+	bool rtnl_held = false;
 
 	err = nlmsg_parse(n, sizeof(*t), tca, TCA_MAX, rtm_tca_policy, extack);
 	if (err < 0)
@@ -2298,8 +2371,26 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 
 	/* Find head of filter chain. */
 
-	block = tcf_block_find(net, &q, &parent, &cl,
-			       t->tcm_ifindex, t->tcm_block_index, extack);
+	err = __tcf_qdisc_find(net, &q, &parent, t->tcm_ifindex, false, extack);
+	if (err)
+		return err;
+
+	/* Take rtnl mutex if block is shared (no qdisc found), qdisc is not
+	 * unlocked, classifier type is not specified, classifier is not
+	 * unlocked.
+	 */
+	if ((q && !(q->ops->cl_ops->flags & QDISC_CLASS_OPS_DOIT_UNLOCKED)) ||
+	    !tca[TCA_KIND] || !tcf_proto_is_unlocked(nla_data(tca[TCA_KIND]))) {
+		rtnl_held = true;
+		rtnl_lock();
+	}
+
+	err = __tcf_qdisc_cl_find(q, parent, &cl, t->tcm_ifindex, extack);
+	if (err)
+		goto errout;
+
+	block = __tcf_block_find(net, q, cl, t->tcm_ifindex, t->tcm_block_index,
+				 extack);
 	if (IS_ERR(block)) {
 		err = PTR_ERR(block);
 		goto errout;
@@ -2352,6 +2443,10 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		tcf_chain_put(chain);
 	}
 	tcf_block_release(q, block, rtnl_held);
+
+	if (rtnl_held)
+		rtnl_unlock();
+
 	return err;
 }
 
@@ -3214,10 +3309,12 @@ static int __init tc_filter_init(void)
 	if (err)
 		goto err_rhash_setup_block_ht;
 
-	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL, 0);
-	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL, 0);
+	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL,
+		      RTNL_FLAG_DOIT_UNLOCKED);
+	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL,
+		      RTNL_FLAG_DOIT_UNLOCKED);
 	rtnl_register(PF_UNSPEC, RTM_GETTFILTER, tc_get_tfilter,
-		      tc_dump_tfilter, 0);
+		      tc_dump_tfilter, RTNL_FLAG_DOIT_UNLOCKED);
 	rtnl_register(PF_UNSPEC, RTM_NEWCHAIN, tc_ctl_chain, NULL, 0);
 	rtnl_register(PF_UNSPEC, RTM_DELCHAIN, tc_ctl_chain, NULL, 0);
 	rtnl_register(PF_UNSPEC, RTM_GETCHAIN, tc_ctl_chain,
-- 
2.13.6


