Netdev List


* [PATCH] Documentation: fix some freescale dpio-driver.rst warnings
From: Randy Dunlap @ 2019-02-11  6:32 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: Stuart Yoder, Laurentiu Tudor, Ioana Radulescu, Madalin Bucur,
	David Miller, linux-doc@vger.kernel.org

From: Randy Dunlap <rdunlap@infradead.org>

Fix markup warnings for one list by using correct list syntax.
Fix markup warnings for another list by using blank lines before the
list.

Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst:30: WARNING: Unexpected indentation.
Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst:143: WARNING: Unexpected indentation.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Stuart Yoder <stuyoder@gmail.com>
Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Cc: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Cc: netdev@vger.kernel.org
Cc: Madalin Bucur <madalin.bucur@nxp.com>
---
This still leaves 2 other warnings that I don't yet see how to fix.

 Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst |   14 +++++-----
 1 file changed, 7 insertions(+), 7 deletions(-)

--- lnx-50-rc6.orig/Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst
+++ lnx-50-rc6/Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst
@@ -27,11 +27,12 @@ Driver Overview
 
 The DPIO driver is bound to DPIO objects discovered on the fsl-mc bus and
 provides services that:
-  A) allow other drivers, such as the Ethernet driver, to enqueue and dequeue
+
+  A. allow other drivers, such as the Ethernet driver, to enqueue and dequeue
      frames for their respective objects
-  B) allow drivers to register callbacks for data availability notifications
+  B. allow drivers to register callbacks for data availability notifications
      when data becomes available on a queue or channel
-  C) allow drivers to manage hardware buffer pools
+  C. allow drivers to manage hardware buffer pools
 
 The Linux DPIO driver consists of 3 primary components--
    DPIO object driver-- fsl-mc driver that manages the DPIO object
@@ -140,11 +141,10 @@ QBman portal interface (qbman-portal.c)
 
    The qbman-portal component provides APIs to do the low level hardware
    bit twiddling for operations such as:
-      -initializing Qman software portals
-
-      -building and sending portal commands
 
-      -portal interrupt configuration and processing
+      - initializing Qman software portals
+      - building and sending portal commands
+      - portal interrupt configuration and processing
 
    The qbman-portal APIs are not public to other drivers, and are
    only used by dpio-service.




* Re: [PATCH bpf-next v4 0/2] libbpf: adding AF_XDP support
From: Jean-Mickael Guerin @ 2019-02-11  6:33 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: bjorn.topel, ast, daniel, netdev, jakub.kicinski, bjorn.topel,
	qi.z.zhang, brouer, xiaolong.ye
In-Reply-To: <1549631126-29067-1-git-send-email-magnus.karlsson@intel.com>

Hi Magnus,

> * In a future release, I am planning on adding a higher level data
>   plane interface too. This will be based around recvmsg and sendmsg
>   with the use of struct iovec for batching, without the user having
>   to know anything about the underlying four rings of an AF_XDP
>   socket. There will be one semantic difference though from the
>   standard recvmsg and that is that the kernel will fill in the iovecs
>   instead of the application. But the rest should be the same as the
>   libc versions so that application writers feel at home.

You might consider recvmmsg() and sendmmsg() (for bulks of multi-segment packets?)

Jean-Mickael


* [net] tipc: fix link session and re-establish issues
From: Tuong Lien @ 2019-02-11  6:29 UTC (permalink / raw)
  To: davem, jon.maloy, ying.xue, netdev; +Cc: tipc-discussion

When a link endpoint is re-created (e.g. after a node reboot or
interface reset), its link session number is chosen at random, and the
peer endpoint is synced with this new session number before the link is
re-established.

However, a shortcoming in this mechanism can leave the link unable to
re-establish, or cause it to fail shortly afterwards. It happens when
the peer endpoint is already in ESTABLISHING state, with 'peer_session'
and the 'in_session' flag set, and this link endpoint suddenly leaves.
When it comes back with a random session number, two situations are
possible:

1/ If the random session number is larger than (or equal to) the
previous one, the peer endpoint will be updated with this new session
upon receipt of a RESET_MSG from this endpoint, and the link can be
re-established as normal. Otherwise, all the RESET_MSGs from this
endpoint will be rejected by the peer. In turn, when this link endpoint
receives an ACTIVATE_MSG from the peer, it will move to ESTABLISHED and
start to send STATE_MSGs, but these messages will again be dropped by
the peer because of the wrong session number.
The peer link endpoint can still become ESTABLISHED after receiving a
traffic message from this endpoint (e.g. a BCAST_PROTOCOL or
NAME_DISTRIBUTOR), but since all the STATE_MSGs are invalid, the link
will be forced down sooner or later!

Even if the random session number is larger than the previous one, the
ACTIVATE_MSG from the peer may arrive first, so that this link endpoint
moves quickly to ESTABLISHED without having sent any RESET_MSG.
Consequently, the peer link will not be updated with the new session
number, and the same link failure scenario as above will happen.

2/ Alternatively, the peer link endpoint may have been reset for some
reason in the meantime, so that its link state went from ESTABLISHING
to RESET while still in session, i.e. the 'in_session' flag was not
reset.
Now, if the random session number from this endpoint is less than the
previous one, all the RESET_MSGs from this endpoint will be rejected by
the peer. In the other direction, when this link endpoint receives a
RESET_MSG from the peer, it moves to ESTABLISHING and starts to send
ACTIVATE_MSGs, but all these messages will be rejected by the peer too.
As a result, the link cannot be re-established but gets stuck with this
link endpoint in state ESTABLISHING and the peer in RESET!

Solution:
===========

This link endpoint should not go directly to ESTABLISHED when it gets
an ACTIVATE_MSG from the peer, since that message may belong to the old
session if the link was re-created. To ensure that the session is
correct before the link is re-established, the peer endpoint in
ESTABLISHING state will send back the last session number in its
ACTIVATE_MSG for verification at this endpoint. Then, if needed, a new
and more appropriate session number will be generated to force a
re-synch first.
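
The verification step described above can be sketched in plain C as
follows. This is only an illustration under stated assumptions: the
helper names (session_less, session_needs_resync) are hypothetical, and
the comparison models 16-bit wrap-around ordering in the spirit of the
less() call used in the patch, not its exact source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-around "less than" on 16-bit session numbers: a precedes b when
 * the forward distance from a to b is non-zero and below half the range.
 * Hypothetical stand-in for the less() helper used in the patch.
 */
static inline bool session_less(uint16_t a, uint16_t b)
{
	uint16_t dist = (uint16_t)(b - a);

	return dist != 0 && dist < 0x8000u;
}

/* Hypothetical helper: 'echoed' is the session number the peer reports
 * it holds for this endpoint, 'echo_valid' tells whether the peer sent
 * one at all. Returns true when the ACTIVATE_MSG must be ignored so that
 * a re-synch happens first; may bump *own past the stale value.
 */
static inline bool session_needs_resync(uint16_t *own, bool echo_valid,
					uint16_t echoed)
{
	if (!echo_valid || *own == echoed)
		return false;	/* nothing to verify, or sessions agree */
	if (session_less(*own, echoed))
		*own = (uint16_t)(echoed + 1);	/* move ahead of the old session */
	return true;
}
```

With own = 5 and an echoed value of 100, the sketch bumps own to 101 and
asks for a re-synch; with own = 200 it keeps 200 but still asks for a
re-synch, since the peer has not yet seen the new number.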

In addition, when a link in ESTABLISHING state is reset, its state will
move to RESET according to the link FSM, and the 'in_session' flag (and
the other data) will be reset as in a normal link reset. The link will
also be deleted if requested.

The solution is backward compatible.
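
As a rough illustration of how the echoed session number can travel in
an existing header word, here is a simplified model of the bit layout
used by the msg.h hunk below (16 session bits at position 0, one
validity bit at position 16 of word 1). The helper names are
hypothetical and the sketch works in host byte order only. One reading
of the compatibility claim, which is an inference and not stated in the
patch, is that a receiver skips the check whenever the validity bit is
clear.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEST_SESSION_POS	0
#define DEST_SESSION_MASK	0xffffu
#define DEST_VALID_POS		16
#define DEST_VALID_MASK		0x1u

/* Read a bit field from a header word (host byte order, simplified). */
static inline uint32_t word_bits(uint32_t word, unsigned int pos,
				 uint32_t mask)
{
	return (word >> pos) & mask;
}

/* Return 'word' with the given bit field replaced by 'val'. */
static inline uint32_t word_set_bits(uint32_t word, unsigned int pos,
				     uint32_t mask, uint32_t val)
{
	word &= ~(mask << pos);
	return word | ((val & mask) << pos);
}

/* Store the session number and mark it valid, leaving other bits alone. */
static inline uint32_t pack_dest_session(uint32_t word1, uint16_t session)
{
	word1 = word_set_bits(word1, DEST_VALID_POS, DEST_VALID_MASK, 1);
	return word_set_bits(word1, DEST_SESSION_POS, DEST_SESSION_MASK,
			     session);
}

static inline bool dest_session_valid(uint32_t word1)
{
	return word_bits(word1, DEST_VALID_POS, DEST_VALID_MASK) != 0;
}

static inline uint16_t dest_session(uint32_t word1)
{
	return (uint16_t)word_bits(word1, DEST_SESSION_POS, DEST_SESSION_MASK);
}
```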

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
---
 net/tipc/link.c | 15 +++++++++++++++
 net/tipc/msg.h  | 22 ++++++++++++++++++++++
 net/tipc/node.c | 11 ++++++-----
 3 files changed, 43 insertions(+), 5 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index ac306d17f8ad..631e21cd4256 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1425,6 +1425,10 @@ static void tipc_link_build_proto_msg(struct tipc_link *l, int mtyp, bool probe,
 		l->rcv_unacked = 0;
 	} else {
 		/* RESET_MSG or ACTIVATE_MSG */
+		if (mtyp == ACTIVATE_MSG) {
+			msg_set_dest_session_valid(hdr, 1);
+			msg_set_dest_session(hdr, l->peer_session);
+		}
 		msg_set_max_pkt(hdr, l->advertised_mtu);
 		strcpy(data, l->if_name);
 		msg_set_size(hdr, INT_H_SIZE + TIPC_MAX_IF_NAME);
@@ -1642,6 +1646,17 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
 			rc = tipc_link_fsm_evt(l, LINK_FAILURE_EVT);
 			break;
 		}
+
+		/* If this endpoint was re-created while peer was ESTABLISHING
+		 * it doesn't know current session number. Force re-synch.
+		 */
+		if (mtyp == ACTIVATE_MSG && msg_dest_session_valid(hdr) &&
+		    l->session != msg_dest_session(hdr)) {
+			if (less(l->session, msg_dest_session(hdr)))
+				l->session = msg_dest_session(hdr) + 1;
+			break;
+		}
+
 		/* ACTIVATE_MSG serves as PEER_RESET if link is already down */
 		if (mtyp == RESET_MSG || !link_is_up(l))
 			rc = tipc_link_fsm_evt(l, LINK_PEER_RESET_EVT);
diff --git a/net/tipc/msg.h b/net/tipc/msg.h
index a0924956bb61..d7e4b8b93f9d 100644
--- a/net/tipc/msg.h
+++ b/net/tipc/msg.h
@@ -360,6 +360,28 @@ static inline void msg_set_bcast_ack(struct tipc_msg *m, u16 n)
 	msg_set_bits(m, 1, 0, 0xffff, n);
 }

+/* Note: reusing bits in word 1 for ACTIVATE_MSG only, to re-synch
+ * link peer session number
+ */
+static inline bool msg_dest_session_valid(struct tipc_msg *m)
+{
+	return msg_bits(m, 1, 16, 0x1);
+}
+
+static inline void msg_set_dest_session_valid(struct tipc_msg *m, bool valid)
+{
+	msg_set_bits(m, 1, 16, 0x1, valid);
+}
+
+static inline u16 msg_dest_session(struct tipc_msg *m)
+{
+	return msg_bits(m, 1, 0, 0xffff);
+}
+
+static inline void msg_set_dest_session(struct tipc_msg *m, u16 n)
+{
+	msg_set_bits(m, 1, 0, 0xffff, n);
+}

 /*
  * Word 2
diff --git a/net/tipc/node.c b/net/tipc/node.c
index db2a6c3e0be9..2dc4919ab23c 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -830,15 +830,16 @@ static void tipc_node_link_down(struct tipc_node *n, int bearer_id, bool delete)
 	tipc_node_write_lock(n);
 	if (!tipc_link_is_establishing(l)) {
 		__tipc_node_link_down(n, &bearer_id, &xmitq, &maddr);
-		if (delete) {
-			kfree(l);
-			le->link = NULL;
-			n->link_cnt--;
-		}
 	} else {
 		/* Defuse pending tipc_node_link_up() */
+		tipc_link_reset(l);
 		tipc_link_fsm_evt(l, LINK_RESET_EVT);
 	}
+	if (delete) {
+		kfree(l);
+		le->link = NULL;
+		n->link_cnt--;
+	}
 	trace_tipc_node_link_down(n, true, "node link down or deleted!");
 	tipc_node_write_unlock(n);
 	if (delete)
-- 
2.13.7


* [RFC 0/3] devlink: add the ability to update device flash
From: Jakub Kicinski @ 2019-02-11  6:59 UTC (permalink / raw)
  To: davem, jiri; +Cc: netdev, oss-drivers, mkubecek, andrew, Jakub Kicinski

Hi!

This series is the second step to allow troubleshooting and recovering
devices in bad state without the use of netdevs as handles.  We can
already query FW versions over devlink, now we add the ability to update
the FW.  This will allow drivers to implement some form of "limp-mode"
where the device can't really be used for networking and hence has no
netdev, but we can interrogate it over devlink and fix the broken FW.

A small but nice advantage of devlink is that it only holds the devlink
instance lock during flashing, unlike ethtool which holds rtnl_lock().

Sending as RFC due to impending conflicts.

Jakub Kicinski (3):
  devlink: add flash update command
  ethtool: add compat for flash update
  nfp: devlink: allow flashing the device via devlink

 .../net/ethernet/netronome/nfp/nfp_devlink.c  | 47 +++++++++++++-
 include/net/devlink.h                         | 11 ++++
 include/uapi/linux/devlink.h                  |  6 ++
 net/core/devlink.c                            | 61 +++++++++++++++++++
 net/core/ethtool.c                            | 12 +++-
 5 files changed, 133 insertions(+), 4 deletions(-)

-- 
2.19.2


* [RFC 1/3] devlink: add flash update command
From: Jakub Kicinski @ 2019-02-11  6:59 UTC (permalink / raw)
  To: davem, jiri; +Cc: netdev, oss-drivers, mkubecek, andrew, Jakub Kicinski
In-Reply-To: <20190211065923.22670-1-jakub.kicinski@netronome.com>

Add devlink flash update command. Advanced NICs have firmware
stored in flash and often cryptographically secured. Updating
that flash is handled by management firmware. Ethtool has a
flash update command which has served us well; however, it has
two shortcomings:
 - it takes rtnl_lock unnecessarily - really flash update has
   nothing to do with networking, so using a networking device
   as a handle is suboptimal, which leads us to the second one:
 - it requires a functioning netdev - in case device enters an
   error state and can't spawn a netdev (e.g. communication
   with the device fails) there is no netdev to use as a handle
   for flashing.

Devlink already has the ability to report the firmware versions,
now with the ability to update the firmware/flash we will be
able to recover devices in bad state.

To enable easy interoperability with ethtool add the target
partition ID. We may or may not add a different method of
identification, but there is no such immediate need.
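
For readers who want the request handling in isolation, the checks made
by the new command handler can be modelled in userspace roughly as
below. This is a sketch, not kernel code: struct flash_ops,
struct flash_request and demo_flash_update are hypothetical stand-ins
for the devlink ops table, the parsed netlink attributes and a driver
callback.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's ops table. */
struct flash_ops {
	int (*flash_update)(const char *path, uint32_t target);
};

/* Hypothetical stand-in for parsed attributes; NULL means "absent". */
struct flash_request {
	const char *file_name;
	const uint32_t *target;
};

/* Same order of checks as the handler in the patch: missing callback,
 * then missing file name, then an optional target defaulting to 0.
 */
static inline int flash_update_dispatch(const struct flash_ops *ops,
					const struct flash_request *req)
{
	uint32_t target = 0;

	if (!ops->flash_update)
		return -EOPNOTSUPP;
	if (!req->file_name)
		return -EINVAL;
	if (req->target)
		target = *req->target;

	return ops->flash_update(req->file_name, target);
}

/* Demo callback (hypothetical): only records the target it was given. */
static uint32_t demo_last_target;

static inline int demo_flash_update(const char *path, uint32_t target)
{
	(void)path;
	demo_last_target = target;
	return 0;
}
```

A request without a target attribute reaches the callback with
target 0, which the accompanying iproute2 man page text describes as
"all regions".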

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/devlink.h        |  2 ++
 include/uapi/linux/devlink.h |  6 ++++++
 net/core/devlink.c           | 30 ++++++++++++++++++++++++++++++
 3 files changed, 38 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 07660fe4c0e3..55b3478b1291 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -529,6 +529,8 @@ struct devlink_ops {
 				      struct netlink_ext_ack *extack);
 	int (*info_get)(struct devlink *devlink, struct devlink_info_req *req,
 			struct netlink_ext_ack *extack);
+	int (*flash_update)(struct devlink *devlink, const char *path,
+			    u32 target, struct netlink_ext_ack *extack);
 };
 
 static inline void *devlink_priv(struct devlink *devlink)
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 72d9f7c89190..f4417283fd1b 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -103,6 +103,8 @@ enum devlink_command {
 	DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET,
 	DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR,
 
+	DEVLINK_CMD_FLASH_UPDATE,
+
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
 	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@@ -326,6 +328,10 @@ enum devlink_attr {
 	DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS,		/* u64 */
 	DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD,	/* u64 */
 	DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER,	/* u8 */
+
+	DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME,	/* string */
+	DEVLINK_ATTR_FLASH_UPDATE_TARGET_ID,	/* u32 */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 46c468a1f3dc..a4b5e194e33e 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2660,6 +2660,27 @@ static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info)
 	return devlink->ops->reload(devlink, info->extack);
 }
 
+static int devlink_nl_cmd_flash_update(struct sk_buff *skb,
+				       struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	const char *file_name;
+	u32 target = 0;
+
+	if (!devlink->ops->flash_update)
+		return -EOPNOTSUPP;
+
+	if (!info->attrs[DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME])
+		return -EINVAL;
+	file_name = nla_data(info->attrs[DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME]);
+
+	if (info->attrs[DEVLINK_ATTR_FLASH_UPDATE_TARGET_ID])
+		target = nla_get_u32(info->attrs[DEVLINK_ATTR_FLASH_UPDATE_TARGET_ID]);
+
+	return devlink->ops->flash_update(devlink, file_name, target,
+					  info->extack);
+}
+
 static const struct devlink_param devlink_param_generic[] = {
 	{
 		.id = DEVLINK_PARAM_GENERIC_ID_INT_ERR_RESET,
@@ -4876,6 +4897,8 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_HEALTH_REPORTER_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = { .type = NLA_U64 },
 	[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = { .type = NLA_U8 },
+	[DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME] = { .type = NLA_NUL_STRING },
+	[DEVLINK_ATTR_FLASH_UPDATE_TARGET_ID] = { .type = NLA_U32 },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
@@ -5164,6 +5187,13 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
 				  DEVLINK_NL_FLAG_NO_LOCK,
 	},
+	{
+		.cmd = DEVLINK_CMD_FLASH_UPDATE,
+		.doit = devlink_nl_cmd_flash_update,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
 };
 
 static struct genl_family devlink_nl_family __ro_after_init = {
-- 
2.19.2



* [RFC 2/3] ethtool: add compat for flash update
From: Jakub Kicinski @ 2019-02-11  6:59 UTC (permalink / raw)
  To: davem, jiri; +Cc: netdev, oss-drivers, mkubecek, andrew, Jakub Kicinski
In-Reply-To: <20190211065923.22670-1-jakub.kicinski@netronome.com>

If the driver does not support the ethtool flash update operation,
call into devlink.
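
The fallback order can be pictured with a small userspace model. All
names here (struct demo_netdev, demo_flash_device and the two demo
callbacks) are hypothetical; locking, which the real patch handles by
dropping rtnl_lock around the devlink call, is left out on purpose.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*flash_fn)(const char *file_name, uint32_t region);

/* Hypothetical device model with two optional flash entry points. */
struct demo_netdev {
	flash_fn ethtool_flash;	/* models the ethtool flash_device op */
	flash_fn devlink_flash;	/* models the devlink flash_update op */
};

/* Prefer the ethtool callback; fall back to devlink when it is absent;
 * report -EOPNOTSUPP when neither exists.
 */
static inline int demo_flash_device(const struct demo_netdev *dev,
				    const char *file_name, uint32_t region)
{
	if (dev->ethtool_flash)
		return dev->ethtool_flash(file_name, region);
	if (dev->devlink_flash)
		return dev->devlink_flash(file_name, region);
	return -EOPNOTSUPP;
}

/* Demo callbacks returning distinct values so the path taken is visible. */
static inline int demo_ethtool_cb(const char *file_name, uint32_t region)
{
	(void)file_name;
	(void)region;
	return 1;
}

static inline int demo_devlink_cb(const char *file_name, uint32_t region)
{
	(void)file_name;
	(void)region;
	return 2;
}
```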

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/devlink.h |  9 +++++++++
 net/core/devlink.c    | 31 +++++++++++++++++++++++++++++++
 net/core/ethtool.c    | 12 +++++++++---
 3 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 55b3478b1291..8fdadd0a43ce 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1202,11 +1202,20 @@ devlink_health_report(struct devlink_health_reporter *reporter,
 #if IS_REACHABLE(CONFIG_NET_DEVLINK)
 void devlink_compat_running_version(struct net_device *dev,
 				    char *buf, size_t len);
+int devlink_compat_flash_update(struct net_device *dev, const char *file_name,
+				u32 target);
 #else
 static inline void
 devlink_compat_running_version(struct net_device *dev, char *buf, size_t len)
 {
 }
+
+static inline int
+devlink_compat_flash_update(struct net_device *dev, const char *file_name,
+			    u32 target)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index a4b5e194e33e..fb1b0982281b 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -6435,6 +6435,37 @@ void devlink_compat_running_version(struct net_device *dev,
 	mutex_unlock(&devlink_mutex);
 }
 
+int devlink_compat_flash_update(struct net_device *dev, const char *file_name,
+				u32 target)
+{
+	struct devlink_port *devlink_port;
+	struct devlink *devlink;
+
+	mutex_lock(&devlink_mutex);
+	list_for_each_entry(devlink, &devlink_list, list) {
+		mutex_lock(&devlink->lock);
+		list_for_each_entry(devlink_port, &devlink->port_list, list) {
+			int ret = -EOPNOTSUPP;
+
+			if (devlink_port->type != DEVLINK_PORT_TYPE_ETH ||
+			    devlink_port->type_dev != dev)
+				continue;
+
+			mutex_unlock(&devlink_mutex);
+			if (devlink->ops->flash_update)
+				ret = devlink->ops->flash_update(devlink,
+								 file_name,
+								 target, NULL);
+			mutex_unlock(&devlink->lock);
+			return ret;
+		}
+		mutex_unlock(&devlink->lock);
+	}
+	mutex_unlock(&devlink_mutex);
+
+	return -EOPNOTSUPP;
+}
+
 static int __init devlink_module_init(void)
 {
 	return genl_register_family(&devlink_nl_family);
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index d2c47cdf25da..389782ccd4c5 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -2038,11 +2038,17 @@ static noinline_for_stack int ethtool_flash_device(struct net_device *dev,
 
 	if (copy_from_user(&efl, useraddr, sizeof(efl)))
 		return -EFAULT;
+	efl.data[ETHTOOL_FLASH_MAX_FILENAME - 1] = 0;
 
-	if (!dev->ethtool_ops->flash_device)
-		return -EOPNOTSUPP;
+	if (!dev->ethtool_ops->flash_device) {
+		int ret;
 
-	efl.data[ETHTOOL_FLASH_MAX_FILENAME - 1] = 0;
+		rtnl_unlock();
+		ret = devlink_compat_flash_update(dev, efl.data, efl.region);
+		rtnl_lock();
+
+		return ret;
+	}
 
 	return dev->ethtool_ops->flash_device(dev, &efl);
 }
-- 
2.19.2



* [RFC 3/3] nfp: devlink: allow flashing the device via devlink
From: Jakub Kicinski @ 2019-02-11  6:59 UTC (permalink / raw)
  To: davem, jiri; +Cc: netdev, oss-drivers, mkubecek, andrew, Jakub Kicinski
In-Reply-To: <20190211065923.22670-1-jakub.kicinski@netronome.com>

Devlink now allows updating device flash.  Implement this
callback.

Compared to ethtool update we no longer have to release
the networking locks - devlink doesn't take them.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../net/ethernet/netronome/nfp/nfp_devlink.c  |  9 ++++
 drivers/net/ethernet/netronome/nfp/nfp_main.c | 44 +++++++++++++++++++
 drivers/net/ethernet/netronome/nfp/nfp_main.h |  2 +
 .../ethernet/netronome/nfp/nfp_net_ethtool.c  | 39 ++--------------
 4 files changed, 59 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
index 080a301f379e..ee45a6f9030a 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
@@ -330,6 +330,14 @@ nfp_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
 	return err;
 }
 
+static int
+nfp_devlink_flash_update(struct devlink *devlink, const char *path,
+			 u32 target, struct netlink_ext_ack *extack)
+{
+	return nfp_flash_update_common(devlink_priv(devlink), path, target,
+				       extack);
+}
+
 const struct devlink_ops nfp_devlink_ops = {
 	.port_split		= nfp_devlink_port_split,
 	.port_unsplit		= nfp_devlink_port_unsplit,
@@ -338,6 +346,7 @@ const struct devlink_ops nfp_devlink_ops = {
 	.eswitch_mode_get	= nfp_devlink_eswitch_mode_get,
 	.eswitch_mode_set	= nfp_devlink_eswitch_mode_set,
 	.info_get		= nfp_devlink_info_get,
+	.flash_update		= nfp_devlink_flash_update,
 };
 
 int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index 6c10e8d119e4..3f55575c2929 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -300,6 +300,50 @@ static int nfp_pcie_sriov_configure(struct pci_dev *pdev, int num_vfs)
 		return nfp_pcie_sriov_enable(pdev, num_vfs);
 }
 
+int nfp_flash_update_common(struct nfp_pf *pf, const char *path,
+			    u32 target, struct netlink_ext_ack *extack)
+{
+	struct device *dev = &pf->pdev->dev;
+	const struct firmware *fw;
+	struct nfp_nsp *nsp;
+	int err;
+
+	if (target != ETHTOOL_FLASH_ALL_REGIONS)
+		return -EOPNOTSUPP;
+
+	nsp = nfp_nsp_open(pf->cpp);
+	if (IS_ERR(nsp)) {
+		err = PTR_ERR(nsp);
+		if (extack)
+			NL_SET_ERR_MSG_MOD(extack, "can't access NSP");
+		else
+			dev_err(dev, "Failed to access the NSP: %d\n", err);
+		return err;
+	}
+
+	err = request_firmware_direct(&fw, path, dev);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "unable to read flash file from disk");
+		goto exit_close_nsp;
+	}
+
+	dev_info(dev, "Please be patient while writing flash image: %s\n",
+		 path);
+
+	err = nfp_nsp_write_flash(nsp, fw);
+	if (err < 0)
+		goto exit_release_fw;
+	dev_info(dev, "Finished writing flash image\n");
+	err = 0;
+
+exit_release_fw:
+	release_firmware(fw);
+exit_close_nsp:
+	nfp_nsp_close(nsp);
+	return err;
+}
+
 static const struct firmware *
 nfp_net_fw_request(struct pci_dev *pdev, struct nfp_pf *pf, const char *name)
 {
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.h b/drivers/net/ethernet/netronome/nfp/nfp_main.h
index a3613a2e0aa5..6e4b509017c1 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.h
@@ -164,6 +164,8 @@ nfp_pf_map_rtsym(struct nfp_pf *pf, const char *name, const char *sym_fmt,
 		 unsigned int min_size, struct nfp_cpp_area **area);
 int nfp_mbox_cmd(struct nfp_pf *pf, u32 cmd, void *in_data, u64 in_length,
 		 void *out_data, u64 out_length);
+int nfp_flash_update_common(struct nfp_pf *pf, const char *path,
+			    u32 target, struct netlink_ext_ack *extack);
 
 enum nfp_dump_diag {
 	NFP_DUMP_NSP_DIAG = 0,
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index cb9c512abc76..244b60f406c2 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -1237,52 +1237,21 @@ static int nfp_net_set_channels(struct net_device *netdev,
 static int
 nfp_net_flash_device(struct net_device *netdev, struct ethtool_flash *flash)
 {
-	const struct firmware *fw;
 	struct nfp_app *app;
-	struct nfp_nsp *nsp;
-	struct device *dev;
-	int err;
-
-	if (flash->region != ETHTOOL_FLASH_ALL_REGIONS)
-		return -EOPNOTSUPP;
+	int ret;
 
 	app = nfp_app_from_netdev(netdev);
 	if (!app)
 		return -EOPNOTSUPP;
 
-	dev = &app->pdev->dev;
-
-	nsp = nfp_nsp_open(app->cpp);
-	if (IS_ERR(nsp)) {
-		err = PTR_ERR(nsp);
-		dev_err(dev, "Failed to access the NSP: %d\n", err);
-		return err;
-	}
-
-	err = request_firmware_direct(&fw, flash->data, dev);
-	if (err)
-		goto exit_close_nsp;
-
-	dev_info(dev, "Please be patient while writing flash image: %s\n",
-		 flash->data);
 	dev_hold(netdev);
 	rtnl_unlock();
-
-	err = nfp_nsp_write_flash(nsp, fw);
-	if (err < 0) {
-		dev_err(dev, "Flash write failed: %d\n", err);
-		goto exit_rtnl_lock;
-	}
-	dev_info(dev, "Finished writing flash image\n");
-
-exit_rtnl_lock:
+	ret = nfp_flash_update_common(app->pf, flash->data, flash->region,
+				      NULL);
 	rtnl_lock();
 	dev_put(netdev);
-	release_firmware(fw);
 
-exit_close_nsp:
-	nfp_nsp_close(nsp);
-	return err;
+	return ret;
 }
 
 static const struct ethtool_ops nfp_net_ethtool_ops = {
-- 
2.19.2



* [RFC iproute2] devlink: add support for updating device flash
From: Jakub Kicinski @ 2019-02-11  6:59 UTC (permalink / raw)
  To: davem, jiri; +Cc: netdev, oss-drivers, mkubecek, andrew, Jakub Kicinski
In-Reply-To: <20190211065923.22670-1-jakub.kicinski@netronome.com>

Add new command for updating flash of devices via devlink API.
Example:

$ cp flash-boot.bin /lib/firmware/
$ devlink dev flash pci/0000:05:00.0 file flash-boot.bin

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 devlink/devlink.c      | 54 ++++++++++++++++++++++++++++++++++++++++++
 man/man8/devlink-dev.8 | 29 +++++++++++++++++++++++
 2 files changed, 83 insertions(+)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index d823512a4030..dd5f153eddc6 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -199,6 +199,8 @@ static void ifname_map_free(struct ifname_map *ifname_map)
 #define DL_OPT_REGION_SNAPSHOT_ID	BIT(22)
 #define DL_OPT_REGION_ADDRESS		BIT(23)
 #define DL_OPT_REGION_LENGTH		BIT(24)
+#define DL_OPT_FLASH_FILE_NAME	BIT(25)
+#define DL_OPT_FLASH_TARGET_ID	BIT(26)
 
 struct dl_opts {
 	uint32_t present; /* flags of present items */
@@ -230,6 +232,8 @@ struct dl_opts {
 	uint32_t region_snapshot_id;
 	uint64_t region_address;
 	uint64_t region_length;
+	const char *flash_file_name;
+	uint32_t flash_target_id;
 };
 
 struct dl {
@@ -1185,6 +1189,20 @@ static int dl_argv_parse(struct dl *dl, uint32_t o_required,
 			if (err)
 				return err;
 			o_found |= DL_OPT_REGION_LENGTH;
+		} else if (dl_argv_match(dl, "file") &&
+			   (o_all & DL_OPT_FLASH_FILE_NAME)) {
+			dl_arg_inc(dl);
+			err = dl_argv_str(dl, &opts->flash_file_name);
+			if (err)
+				return err;
+			o_found |= DL_OPT_FLASH_FILE_NAME;
+		} else if (dl_argv_match(dl, "target") &&
+			   (o_all & DL_OPT_FLASH_TARGET_ID)) {
+			dl_arg_inc(dl);
+			err = dl_argv_uint32_t(dl, &opts->flash_target_id);
+			if (err)
+				return err;
+			o_found |= DL_OPT_FLASH_TARGET_ID;
 		} else {
 			pr_err("Unknown option \"%s\"\n", dl_argv(dl));
 			return -EINVAL;
@@ -1389,6 +1407,12 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
 	if (opts->present & DL_OPT_REGION_LENGTH)
 		mnl_attr_put_u64(nlh, DEVLINK_ATTR_REGION_CHUNK_LEN,
 				 opts->region_length);
+	if (opts->present & DL_OPT_FLASH_FILE_NAME)
+		mnl_attr_put_strz(nlh, DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME,
+				  opts->flash_file_name);
+	if (opts->present & DL_OPT_FLASH_TARGET_ID)
+		mnl_attr_put_u32(nlh, DEVLINK_ATTR_FLASH_UPDATE_TARGET_ID,
+				 opts->flash_target_id);
 }
 
 static int dl_argv_parse_put(struct nlmsghdr *nlh, struct dl *dl,
@@ -1451,6 +1475,7 @@ static void cmd_dev_help(void)
 	pr_err("       devlink dev param show [DEV name PARAMETER]\n");
 	pr_err("       devlink dev reload DEV\n");
 	pr_err("       devlink dev info [ DEV ]\n");
+	pr_err("       devlink dev flash DEV file PATH [ target ID ]\n");
 }
 
 static bool cmp_arr_last_handle(struct dl *dl, const char *bus_name,
@@ -2583,6 +2608,32 @@ static int cmd_dev_info(struct dl *dl)
 	return err;
 }
 
+static void cmd_dev_flash_help(void)
+{
+	pr_err("Usage: devlink dev flash DEV file PATH [ target ID ]\n");
+}
+
+static int cmd_dev_flash(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	if (dl_argv_match(dl, "help") || dl_no_arg(dl)) {
+		cmd_dev_flash_help();
+		return 0;
+	}
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_FLASH_UPDATE,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE | DL_OPT_FLASH_FILE_NAME,
+				DL_OPT_FLASH_TARGET_ID);
+	if (err)
+		return err;
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
 static int cmd_dev(struct dl *dl)
 {
 	if (dl_argv_match(dl, "help")) {
@@ -2604,6 +2655,9 @@ static int cmd_dev(struct dl *dl)
 	} else if (dl_argv_match(dl, "info")) {
 		dl_arg_inc(dl);
 		return cmd_dev_info(dl);
+	} else if (dl_argv_match(dl, "flash")) {
+		dl_arg_inc(dl);
+		return cmd_dev_flash(dl);
 	}
 	pr_err("Command \"%s\" not found\n", dl_argv(dl));
 	return -ENOENT;
diff --git a/man/man8/devlink-dev.8 b/man/man8/devlink-dev.8
index 47838371fecd..dda35fb09ee0 100644
--- a/man/man8/devlink-dev.8
+++ b/man/man8/devlink-dev.8
@@ -69,6 +69,16 @@ devlink-dev \- devlink device configuration
 .IR DEV
 .RI "]"
 
+.ti -8
+.BR "devlink dev flash"
+.IR DEV
+.BR file
+.IR PATH
+.RI "["
+.BR target
+.IR ID
+.RI "]"
+
 .SH "DESCRIPTION"
 .SS devlink dev show - display devlink device attributes
 
@@ -177,6 +187,25 @@ versions may differ after flash has been updated, but before reboot.
 - specifies the devlink device to show.
 If this argument is omitted all devices are listed.
 
+.SS devlink dev flash - write device's non-volatile memory.
+
+.PP
+.I "DEV"
+- specifies the devlink device to write to.
+
+.BR file
+.I PATH
+- Path to the file which will be written into device's flash. The path needs
+to be relative to one of the directories searched by the kernel firmware loader,
+such as /lib/firmware.
+
+.BR target
+.I ID
+- If device stores multiple firmware images in non-volatile memory, this
+parameter may be used to indicate which firmware image should be written.
+The default value of 0 means that all regions should be updated.
+Interpretation of other values is driver-dependent.
+
 .SH "EXAMPLES"
 .PP
 devlink dev show
-- 
2.19.2



* Re: [Patch net 2/3] net_sched: fix a memory leak in cls_tcindex
From: Cong Wang @ 2019-02-11  7:19 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, Linux Kernel Network Developers, Jamal Hadi Salim,
	Jiri Pirko
In-Reply-To: <201902111051.xRYMeLJl%fengguang.wu@intel.com>

On Sun, Feb 10, 2019 at 6:15 PM kbuild test robot <lkp@intel.com> wrote:
>
> Hi Cong,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on net/master]
>
> url:    https://github.com/0day-ci/linux/commits/Cong-Wang/net_sched-some-fixes-for-cls_tcindex/20190211-095057
> config: i386-randconfig-x002-201906 (attached as .config)
> compiler: gcc-8 (Debian 8.2.0-14) 8.2.0
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=i386
>
> All errors (new ones prefixed by >>):
>
>    net/sched/cls_tcindex.c: In function 'tcindex_alloc_perfect_hash':
> >> net/sched/cls_tcindex.c:301:22: error: 'struct tcf_exts' has no member named 'net'
>       cp->perfect[i].exts.net = net;

Yeah, looks like I missed the CONFIG_CLS_ACT=n case.

Let me think about how to fix it properly.

Thanks!

^ permalink raw reply

* Re: [PATCH net-next 0/2] Revert wake_on_lan devlink parameter
From: David Miller @ 2019-02-11  7:35 UTC (permalink / raw)
  To: vasundhara-v.volam; +Cc: michael.chan, jiri, netdev
In-Reply-To: <CAACQVJpyD2EW0d=sUAz6h1rk9A0iNQZUhH5aC9iAJrCZpsX60A@mail.gmail.com>

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Date: Mon, 11 Feb 2019 10:09:23 +0530

> On Sat, Feb 9, 2019 at 12:37 PM David Miller <davem@davemloft.net> wrote:
>>
>> From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
>> Date: Fri,  8 Feb 2019 14:43:08 +0530
>>
>> > As per discussion with Jakub Kicinski and Michal Kubecek,
>> > this will be better addressed by soon-to-come ethtool netlink
>> > API with additional indication that given WoL configuration request
>> > is supposed to be persisted.
>> >
>> > Retain bnxt_en code for devlink port param table registration.
>> > There will be follow up patches to add some devlink port params
>> > for bnxt_en driver.
>>
>> Please fix the kbuild robot reported build failure and repost.
> David, the second patch in this patchset has already taken care of all
> these failures.
> Could you please apply both patches together?

You cannot break bisection like this.

If they cannot be separated, resubmit the change as one patch.

^ permalink raw reply

* Re: [PATCH net-next v2 00/10] net: phy: Add support for 2.5GBASET PHYs
From: Maxime Chevallier @ 2019-02-11  7:46 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: davem, Andrew Lunn, netdev, linux-kernel, Florian Fainelli,
	Russell King, linux-arm-kernel, Antoine Tenart, thomas.petazzoni,
	gregory.clement, miquel.raynal, nadavh, stefanc, mw
In-Reply-To: <81c340ea-54b0-1abf-94af-b8dc4ee83e3a@gmail.com>

Hello Heiner,

>Hi Maxime,
>
>Andrew and I are working on Aquantia PHY support and he handed over
>to me a patch series which includes parts of the first version of your
>series. Having said that I'm especially interested in your patches
>5 and 6. Because your series is somewhat bigger and there are a few
>review comments, preparing the next round may take time.
>
>I'd propose that you extract generic patches being submission-ready
>and split the patch series into two. I think the following patches
>would be candidates for the first series: 2, 3, 5, 6
>(provided they have no dependency on the other patches)
>Based on that both of us can go on with our work.

Sure, I'll send that shortly. Thanks for the help,

Maxime

^ permalink raw reply

* Re: [PATCH bpf-next v4 0/2] libbpf: adding AF_XDP support
From: Magnus Karlsson @ 2019-02-11  7:52 UTC (permalink / raw)
  To: Jean-Mickael Guerin
  Cc: Magnus Karlsson, Björn Töpel, ast, Daniel Borkmann,
	Network Development, Jakub Kicinski, Björn Töpel,
	Zhang, Qi Z, Jesper Dangaard Brouer, xiaolong.ye
In-Reply-To: <CAP2kokD9aSryMBDjt+ip=1NFronXrMfY59b6DauTT59kWHLwcw@mail.gmail.com>

On Mon, Feb 11, 2019 at 7:34 AM Jean-Mickael Guerin <jmg@6wind.com> wrote:
>
> Hi Magnus,
>
> > * In a future release, I am planning on adding a higher level data
> >   plane interface too. This will be based around recvmsg and sendmsg
> >   with the use of struct iovec for batching, without the user having
> >   to know anything about the underlying four rings of an AF_XDP
> >   socket. There will be one semantic difference though from the
> >   standard recvmsg and that is that the kernel will fill in the iovecs
> >   instead of the application. But the rest should be the same as the
> >   libc versions so that application writers feel at home.
>
> You might consider recvmmsg() and sendmmsg() (bulk of multi segments packets?)

Exactly :-). Spot on.

/Magnus

> Jean-Mickael

^ permalink raw reply

* [PATCH net-next 1/1] flow_offload: Fix flow action infrastructure
From: Eli Britstein @ 2019-02-11  7:52 UTC (permalink / raw)
  To: netdev
  Cc: Roi Dayan, Pablo Neira Ayuso, Jiri Pirko, Saeed Mahameed,
	Eli Britstein

Implementation of macro "flow_action_for_each" introduced in
commit e3ab786b42535 ("flow_offload: add flow action infrastructure")
and used in commit 738678817573c ("drivers: net: use flow action
infrastructure") iterated the first item twice and did not reach the
last one. Fix it.

Fixes: e3ab786b42535 ("flow_offload: add flow action infrastructure")
Fixes: 738678817573c ("drivers: net: use flow action infrastructure")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
---
 include/net/flow_offload.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 23166caa0da5..a307ccb18015 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -171,7 +171,7 @@ static inline bool flow_action_has_entries(const struct flow_action *action)
 }
 
 #define flow_action_for_each(__i, __act, __actions)			\
-        for (__i = 0, __act = &(__actions)->entries[0]; __i < (__actions)->num_entries; __act = &(__actions)->entries[__i++])
+        for (__i = 0, __act = &(__actions)->entries[0]; __i < (__actions)->num_entries; __act = &(__actions)->entries[++__i])
 
 struct flow_rule {
 	struct flow_match	match;
-- 
2.14.5
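
For illustration only, the effect of the two increment forms can be
reproduced outside the kernel. The sketch below is plain userspace C;
demo_entry, demo_actions and the demo_for_each_*() macros are hypothetical
stand-ins written for this example, not the definitions from
include/net/flow_offload.h.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the flow_action types; not kernel code. */
struct demo_entry {
	int id;
};

struct demo_actions {
	unsigned int num_entries;
	struct demo_entry entries[3];
};

/* Post-increment: the update step indexes with the OLD value of i. */
#define demo_for_each_old(i, act, actions)				\
	for (i = 0, act = &(actions)->entries[0];			\
	     i < (actions)->num_entries;				\
	     act = &(actions)->entries[i++])

/* Pre-increment: the update step indexes with the NEW value of i. */
#define demo_for_each_new(i, act, actions)				\
	for (i = 0, act = &(actions)->entries[0];			\
	     i < (actions)->num_entries;				\
	     act = &(actions)->entries[++i])

/* Store the ids visited by the old macro in out[]; return the count. */
static unsigned int demo_walk_old(struct demo_actions *a, int *out)
{
	struct demo_entry *act;
	unsigned int i, n = 0;

	demo_for_each_old(i, act, a)
		out[n++] = act->id;
	return n;
}

/* Store the ids visited by the new macro in out[]; return the count. */
static unsigned int demo_walk_new(struct demo_actions *a, int *out)
{
	struct demo_entry *act;
	unsigned int i, n = 0;

	demo_for_each_new(i, act, a)
		out[n++] = act->id;
	return n;
}
```

With entries holding ids 10, 20 and 30, demo_walk_old() visits 10, 10, 20
while demo_walk_new() visits 10, 20, 30. In the pre-increment form the last
update step only computes the one-past-the-end address and never
dereferences it.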


^ permalink raw reply related

* Re: [RFC PATCH] perf, bpf: Retain kernel executable code in memory to aid Intel PT tracing
From: Adrian Hunter @ 2019-02-11  7:54 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Ingo Molnar, Peter Zijlstra, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Jiri Olsa, Song Liu, Daniel Borkmann,
	Alexei Starovoitov, linux-kernel, netdev
In-Reply-To: <20190208232924.bpdjuaqufndigtd4@ast-mbp>

On 9/02/19 1:29 AM, Alexei Starovoitov wrote:
> On Thu, Feb 07, 2019 at 01:19:01PM +0200, Adrian Hunter wrote:
>> Subject to memory pressure and other limits, retain executable code, such
>> as JIT-compiled bpf, in memory instead of freeing it immediately it is no
>> longer needed for execution.
>>
>> While perf is primarily aimed at statistical analysis, tools like Intel
>> PT can aim to provide a trace of exactly what happened. As such, corner
>> cases that can be overlooked statistically need to be addressed. For
>> example, there is a gap where JIT-compiled bpf can be freed from memory
>> before a tracer has a chance to read it out through the bpf syscall.
>> While that can be ignored statistically, it contributes to a death by
>> 1000 cuts for tracers attempting to assemble exactly what happened. This is
>> a bit gratuitous given that retaining the executable code is relatively
>> simple, and the amount of memory involved relatively small. The retained
>> executable code is then available in memory images such as /proc/kcore.
>>
>> This facility could perhaps be extended also to init sections.
>>
>> Note that this patch is compile tested only and, at present, is missing
>> the ability to retain symbols.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>  arch/x86/Kconfig.cpu       |   1 +
>>  include/linux/filter.h     |   4 +
>>  include/linux/xc_retain.h  |  49 ++++++++++
>>  init/Kconfig               |   6 ++
>>  kernel/Makefile            |   1 +
>>  kernel/bpf/core.c          |  44 ++++++++-
>>  kernel/xc_retain.c         | 183 +++++++++++++++++++++++++++++++++++++
>>  net/core/sysctl_net_core.c |  62 +++++++++++++
>>  8 files changed, 349 insertions(+), 1 deletion(-)
>>  create mode 100644 include/linux/xc_retain.h
>>  create mode 100644 kernel/xc_retain.c
>>
>> diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
>> index 6adce15268bd..21dcd064c272 100644
>> --- a/arch/x86/Kconfig.cpu
>> +++ b/arch/x86/Kconfig.cpu
>> @@ -389,6 +389,7 @@ menuconfig PROCESSOR_SELECT
>>  config CPU_SUP_INTEL
>>  	default y
>>  	bool "Support Intel processors" if PROCESSOR_SELECT
>> +	select XC_RETAIN if PERF_EVENTS && BPF_JIT
>>  	---help---
>>  	  This enables detection, tunings and quirks for Intel processors
>>  
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index d531d4250bff..40b9f601e18f 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -851,6 +851,10 @@ extern int bpf_jit_enable;
>>  extern int bpf_jit_harden;
>>  extern int bpf_jit_kallsyms;
>>  extern long bpf_jit_limit;
>> +extern unsigned int bpf_jit_retain_min;
>> +extern unsigned int bpf_jit_retain_max;
>> +
>> +void bpf_jit_retain_update_sz(void);
>>  
>>  typedef void (*bpf_jit_fill_hole_t)(void *area, unsigned int size);
>>  
>> diff --git a/include/linux/xc_retain.h b/include/linux/xc_retain.h
>> new file mode 100644
>> index 000000000000..e79dc138bab8
>> --- /dev/null
>> +++ b/include/linux/xc_retain.h
>> @@ -0,0 +1,49 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * Copyright (C) 2019 Intel Corporation.
>> + */
>> +#ifndef _LINUX_XC_RETAIN_H
>> +#define _LINUX_XC_RETAIN_H
>> +
>> +#include <linux/list.h>
>> +#include <linux/shrinker.h>
>> +#include <linux/spinlock.h>
>> +
>> +struct xc_retain_ops {
>> +	void (*free)(void *addr);
>> +};
>> +
>> +struct xc_retain {
>> +	struct list_head list;
>> +	struct list_head items;
>> +	const struct xc_retain_ops ops;
>> +	unsigned int min_pages;
>> +	unsigned int max_pages;
>> +	unsigned int current_pages;
>> +	unsigned int item_cnt;
>> +	spinlock_t lock;
>> +	struct shrinker shrinker;
>> +};
>> +
>> +#ifdef CONFIG_XC_RETAIN
>> +int xc_retain_register(struct xc_retain *xr);
>> +void xc_retain_binary(struct xc_retain *xr, void *addr, unsigned int pages);
>> +void xc_retain_set_min_pages(struct xc_retain *xr, unsigned int min_pages);
>> +void xc_retain_set_max_pages(struct xc_retain *xr, unsigned int max_pages);
>> +#else
>> +static inline int xc_retain_register(struct xc_retain *xr)
>> +{
>> +	return 0;
>> +}
>> +static inline void xc_retain_binary(struct xc_retain *xr, void *addr,
>> +				    unsigned int pages)
>> +{
>> +	xr->ops.free(addr);
>> +}
>> +static inline void xc_retain_set_max_pages(struct xc_retain *xr,
>> +					   unsigned int max_pages)
>> +{
>> +}
>> +#endif
>> +
>> +#endif
>> diff --git a/init/Kconfig b/init/Kconfig
>> index c9386a365eea..954c288cabdc 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -1550,6 +1550,12 @@ config EMBEDDED
>>  	  an embedded system so certain expert options are available
>>  	  for configuration.
>>  
>> +config XC_RETAIN
>> +	bool
>> +	help
>> +	  Retain kernel executable code (e.g. jitted BPF) in memory after it
>> +	  would normally be freed.
>> +
>>  config HAVE_PERF_EVENTS
>>  	bool
>>  	help
>> diff --git a/kernel/Makefile b/kernel/Makefile
>> index 6aa7543bcdb2..5df40e2a934e 100644
>> --- a/kernel/Makefile
>> +++ b/kernel/Makefile
>> @@ -98,6 +98,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>>  obj-$(CONFIG_BPF) += bpf/
>> +obj-$(CONFIG_XC_RETAIN) += xc_retain.o
>>  
>>  obj-$(CONFIG_PERF_EVENTS) += events/
>>  
>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>> index 19c49313c709..7fd235d235c2 100644
>> --- a/kernel/bpf/core.c
>> +++ b/kernel/bpf/core.c
>> @@ -34,6 +34,7 @@
>>  #include <linux/kallsyms.h>
>>  #include <linux/rcupdate.h>
>>  #include <linux/perf_event.h>
>> +#include <linux/xc_retain.h>
>>  
>>  #include <asm/unaligned.h>
>>  
>> @@ -480,6 +481,10 @@ int bpf_jit_enable   __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON);
>>  int bpf_jit_harden   __read_mostly;
>>  int bpf_jit_kallsyms __read_mostly;
>>  long bpf_jit_limit   __read_mostly;
>> +#define BPF_JIT_RETAIN_MIN 0
>> +#define BPF_JIT_RETAIN_MAX 16
>> +unsigned int bpf_jit_retain_min __read_mostly = BPF_JIT_RETAIN_MIN;
>> +unsigned int bpf_jit_retain_max __read_mostly = BPF_JIT_RETAIN_MAX;
>>  
>>  static __always_inline void
>>  bpf_get_prog_addr_region(const struct bpf_prog *prog,
>> @@ -795,6 +800,43 @@ void bpf_jit_binary_free(struct bpf_binary_header *hdr)
>>  	bpf_jit_uncharge_modmem(pages);
>>  }
>>  
>> +#ifdef CONFIG_XC_RETAIN
>> +static struct xc_retain bpf_jit_retain = {
>> +	.min_pages = BPF_JIT_RETAIN_MIN,
>> +	.max_pages = BPF_JIT_RETAIN_MAX,
>> +	.ops = {
>> +		.free = module_memfree,
>> +	},
>> +};
>> +
>> +void bpf_jit_retain_update_sz(void)
>> +{
>> +	xc_retain_set_min_pages(&bpf_jit_retain, bpf_jit_retain_min);
>> +	xc_retain_set_max_pages(&bpf_jit_retain, bpf_jit_retain_max);
>> +}
>> +
>> +static int __init bpf_jit_retain_init(void)
>> +{
>> +	return xc_retain_register(&bpf_jit_retain);
>> +}
>> +subsys_initcall(bpf_jit_retain_init);
>> +
>> +static void bpf_jit_binary_retain(struct bpf_prog *fp,
>> +				  struct bpf_binary_header *hdr)
>> +{
>> +	u32 pages = hdr->pages;
>> +
>> +	xc_retain_binary(&bpf_jit_retain, hdr, pages);
>> +	bpf_jit_uncharge_modmem(pages);
>> +}
>> +#else
>> +static void bpf_jit_binary_retain(struct bpf_prog *fp,
>> +				  struct bpf_binary_header *hdr)
>> +{
>> +	return bpf_jit_binary_free(hdr);
>> +}
>> +#endif
> 
> I'm strongly against this approach.

Thanks for commenting, but Peter wrote that he prefers this option:

	https://lkml.kernel.org/r/20190109101808.GG1900@hirez.programming.kicks-ass.net

> I understand that it's under CONFIG, but changing kernel
> into garbage collection nightmare even under config
> or sysctl is not an option.
> In many cases bpf progs are loaded/unloaded a lot.
> Consider CI test system that runs tests 24/7.
> bpf progs are loaded/unloaded in huge numbers.

Which is not really a real use-case.

For PT, the data recorded is generally too large (e.g. 100MB per cpu per second)
to record continuously.  But there is a "snapshot" mode that just captures
the PT data that is currently buffered.  In that case we want to have a
kernel image that reflects what was running for the last small period of
time.

> Such system will suffer non deterministic test and
> performance results due to shrinkers.

The default was 16 pages, after that everything gets freed immediately. 
That is assuming you don't turn it off for test purposes.  That would
not have an effect on your tests.

Also, why would copying it out of memory be less intrusive than keeping it
in memory?  Copying every bpf jit to disk for every load would be much more
intrusive.

> perf analysis with PT becomes inaccurate and main goal
> of retaining accurate instruction info is not achieved.

For the majority of real use-cases, yes it is.

> bpf_jit_retain_min/max tunables is not an option either.
> Please see how perf record is handling bpf prog/unload.
> What stops you from doing the same for PT?

That opens the door for the bpf program to be unloaded before it is copied
out.  All the corner cases where it is not possible to get accurate decoding
start to add up.
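
To make the bounded retention mentioned above concrete (a 16-page default,
with anything beyond the limit freed right away), here is a rough userspace
sketch. It is not the code from kernel/xc_retain.c, which is not quoted in
this message. All demo_* names are hypothetical, eviction of the oldest item
first is an assumption, and locking and shrinker integration are left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* One retained region; hypothetical layout for this sketch only. */
struct demo_item {
	struct demo_item *next;
	void *addr;
	unsigned int pages;
};

struct demo_retain {
	struct demo_item *head;		/* oldest retained item */
	struct demo_item *tail;		/* newest retained item */
	unsigned int max_pages;
	unsigned int current_pages;
	void (*free_fn)(void *addr);	/* really releases the memory */
};

/* Example free callback that only counts calls. */
static unsigned int demo_freed;
static void demo_free(void *addr)
{
	(void)addr;
	demo_freed++;
}

/* Drop the oldest retained item and release its memory. */
static void demo_evict_oldest(struct demo_retain *r)
{
	struct demo_item *it = r->head;

	r->head = it->next;
	if (!r->head)
		r->tail = NULL;
	r->current_pages -= it->pages;
	r->free_fn(it->addr);
	free(it);
}

/* Retain addr instead of freeing it, staying within max_pages. */
static void demo_retain_binary(struct demo_retain *r, void *addr,
			       unsigned int pages)
{
	struct demo_item *it;

	/* Too large to ever fit, or no memory for bookkeeping: free now. */
	if (pages > r->max_pages || !(it = malloc(sizeof(*it)))) {
		r->free_fn(addr);
		return;
	}
	/* Make room by evicting the oldest items (assumed policy). */
	while (r->current_pages + pages > r->max_pages)
		demo_evict_oldest(r);

	it->next = NULL;
	it->addr = addr;
	it->pages = pages;
	if (r->tail)
		r->tail->next = it;
	else
		r->head = it;
	r->tail = it;
	r->current_pages += pages;
}
```

With max_pages set to 16, retaining two 10-page regions in a row keeps only
the second one, and a 17-page region is freed immediately without touching
what is already retained.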

^ permalink raw reply

* Re: [RFC PATCH] perf, bpf: Retain kernel executable code in memory to aid Intel PT tracing
From: Alexei Starovoitov @ 2019-02-11  8:18 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ingo Molnar, Peter Zijlstra, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Jiri Olsa, Song Liu, Daniel Borkmann,
	Alexei Starovoitov, linux-kernel, netdev
In-Reply-To: <85ebb8e5-97a0-801a-8d5f-bc09a72047bb@intel.com>

On Mon, Feb 11, 2019 at 09:54:01AM +0200, Adrian Hunter wrote:
> 
> Which is not really a real use-case.
..
> > perf analysis with PT becomes inaccurate and main goal
> > of retaining accurate instruction info is not achieved.
> 
> For the majority of real use-cases, yes it is.

In our fleet not a single server is using Intel PT, yet you're
proposing to penalize all of them with shrinker-based JIT freeing?
There is no negotiation here.
NACK


^ permalink raw reply

* Re: [PATCH net-next 0/2] Revert wake_on_lan devlink parameter
From: Vasundhara Volam @ 2019-02-11  8:21 UTC (permalink / raw)
  To: David Miller; +Cc: michael.chan@broadcom.com, Jiri Pirko, Netdev
In-Reply-To: <20190210.233503.1063032485031175825.davem@davemloft.net>

On Mon, Feb 11, 2019 at 1:05 PM David Miller <davem@davemloft.net> wrote:
>
> From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
> Date: Mon, 11 Feb 2019 10:09:23 +0530
>
> > On Sat, Feb 9, 2019 at 12:37 PM David Miller <davem@davemloft.net> wrote:
> >>
> >> From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
> >> Date: Fri,  8 Feb 2019 14:43:08 +0530
> >>
> >> > As per discussion with Jakub Kicinski and Michal Kubecek,
> >> > this will be better addressed by soon-to-come ethtool netlink
> >> > API with additional indication that given WoL configuration request
> >> > is supposed to be persisted.
> >> >
> >> > Retain bnxt_en code for devlink port param table registration.
> >> > There will be follow up patches to add some devlink port params
> >> > for bnxt_en driver.
> >>
> >> Please fix the kbuild robot reported build failure and repost.
> > David, the second patch in this patchset has already taken care of all
> > these failures.
> > Could you please apply both patches together?
>
> You cannot break bisection like this.
>
> If they cannot be separated, resubmit the change as one patch.
Okay. I will resubmit as one patch. Thank you David.

^ permalink raw reply

* Re: [RFC PATCH] perf, bpf: Retain kernel executable code in memory to aid Intel PT tracing
From: Adrian Hunter @ 2019-02-11  8:24 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Ingo Molnar, Peter Zijlstra, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Jiri Olsa, Song Liu, Daniel Borkmann,
	Alexei Starovoitov, linux-kernel, netdev
In-Reply-To: <20190211081840.j3vhyp3cffftb6m2@ast-mbp>

On 11/02/19 10:18 AM, Alexei Starovoitov wrote:
> On Mon, Feb 11, 2019 at 09:54:01AM +0200, Adrian Hunter wrote:
>>
>> Which is not really a real use-case.
> ..
>>> perf analysis with PT becomes inaccurate and main goal
>>> of retaining accurate instruction info is not achieved.
>>
>> For the majority of real use-cases, yes it is.
> 
> In our fleet not a single server is using Intel PT, yet you're
> proposing to penalize all of them with shrinker-based JIT freeing?

I already responded to that.

> There is no negotiation here.

Apart from Peter and Ingo already having indicated a different approach is
preferred, why not? Shouldn't maintainers provide technical reasons?

^ permalink raw reply

* Re: [PATCH] mt76: change the retun type of mt76_dma_attach()
From: Sergei Shtylyov @ 2019-02-11  8:38 UTC (permalink / raw)
  To: Ryder Lee, Lorenzo Bianconi, Felix Fietkau, Kalle Valo
  Cc: Roy Luo, linux-wireless, linux-kernel, netdev, linux-mediatek
In-Reply-To: <228fdddb9ca96e8ce861e324eb9039722cf18f49.1549850911.git.ryder.lee@mediatek.com>

Hello!

On 11.02.2019 5:13, Ryder Lee wrote:

> There is no need to retun 0 in mt76_dma_attach(), so switch it to void.
                           ^ r missing :-)
> 
> Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
[...]

MBR, Sergei

^ permalink raw reply

* Re: [PATCH] mt76: change the retun type of mt76_dma_attach()
From: Ryder Lee @ 2019-02-11  8:48 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Lorenzo Bianconi, Felix Fietkau, Kalle Valo, netdev,
	linux-mediatek, linux-wireless, Roy Luo, linux-kernel
In-Reply-To: <f65c85c2-ef47-d141-d78e-db2a138a1daf@cogentembedded.com>

On Mon, 2019-02-11 at 11:38 +0300, Sergei Shtylyov wrote:
> Hello!
> 
> On 11.02.2019 5:13, Ryder Lee wrote:
> 
> > There is no need to retun 0 in mt76_dma_attach(), so switch it to void.
>                            ^ r missing :-)
> > 
> > Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
> [...]
> 
> MBR, Sergei
> 

I will resend a new one.

Thanks
Ryder



^ permalink raw reply

* [Resend PATCH] mt76: change the return type of mt76_dma_attach()
From: Ryder Lee @ 2019-02-11  8:48 UTC (permalink / raw)
  To: Lorenzo Bianconi, Felix Fietkau, Kalle Valo
  Cc: Roy Luo, linux-wireless, linux-kernel, netdev, linux-mediatek,
	Ryder Lee

There is no need to return 0 in mt76_dma_attach(), so switch it to void.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
---
 drivers/net/wireless/mediatek/mt76/dma.c | 3 +--
 drivers/net/wireless/mediatek/mt76/dma.h | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/mediatek/mt76/dma.c b/drivers/net/wireless/mediatek/mt76/dma.c
index e2ba263..d934d72 100644
--- a/drivers/net/wireless/mediatek/mt76/dma.c
+++ b/drivers/net/wireless/mediatek/mt76/dma.c
@@ -522,10 +522,9 @@ int mt76_dma_tx_queue_skb(struct mt76_dev *dev, struct mt76_queue *q,
 	.kick = mt76_dma_kick_queue,
 };
 
-int mt76_dma_attach(struct mt76_dev *dev)
+void mt76_dma_attach(struct mt76_dev *dev)
 {
 	dev->queue_ops = &mt76_dma_ops;
-	return 0;
 }
 EXPORT_SYMBOL_GPL(mt76_dma_attach);
 
diff --git a/drivers/net/wireless/mediatek/mt76/dma.h b/drivers/net/wireless/mediatek/mt76/dma.h
index 357cc35..e3292df 100644
--- a/drivers/net/wireless/mediatek/mt76/dma.h
+++ b/drivers/net/wireless/mediatek/mt76/dma.h
@@ -54,7 +54,7 @@ enum mt76_mcu_evt_type {
 	EVT_EVENT_DFS_DETECT_RSP,
 };
 
-int mt76_dma_attach(struct mt76_dev *dev);
+void mt76_dma_attach(struct mt76_dev *dev);
 void mt76_dma_cleanup(struct mt76_dev *dev);
 
 #endif
-- 
1.9.1


^ permalink raw reply related

* Re: [RFC, PATCH] net: page_pool: Don't use page->private to store dma_addr_t
From: Tariq Toukan @ 2019-02-11  8:53 UTC (permalink / raw)
  To: Ilias Apalodimas, Matthew Wilcox
  Cc: David Miller, brouer@redhat.com, toke@redhat.com,
	netdev@vger.kernel.org, mgorman@techsingularity.net,
	linux-mm@kvack.org
In-Reply-To: <20190207214237.GA10676@Iliass-MBP.lan>

On 2/7/2019 11:42 PM, Ilias Apalodimas wrote:
> Hi Matthew,
> 
> On Thu, Feb 07, 2019 at 01:34:00PM -0800, Matthew Wilcox wrote:
>> On Thu, Feb 07, 2019 at 01:25:19PM -0800, David Miller wrote:
>>> From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
>>> Date: Thu, 7 Feb 2019 17:20:34 +0200
>>>
>>>> Well updating struct page is the final goal, hence the comment. I am mostly
>>>> looking for opinions here since we are trying to store dma addresses which are
>>>> irrelevant to pages. Having dma_addr_t definitions in mm-related headers is a
>>>> bit controversial isn't it ? If we can add that, then yes the code would look
>>>> better
>>>
>>> I fundamentally disagree.
>>>
>>> One of the core operations performed on a page is mapping it so that a device
>>> can use it.
>>>
>>> Why have ancillary data structure support for this all over the place, rather
>>> than in the common spot which is the page?
>>>
>>> A page really is not just a 'mm' structure, it is a system structure.
>>
>> +1
>>
>> The fundamental point of computing is to do I/O.
> Ok, great that should sort it out then.
> I'll use your proposal and base the patch on that.
> 
> Thanks for taking the time with this
> 
> /Ilias
> 

Hi,

It's great to use the struct page to store its dma mapping, but I am 
worried about extensibility.
page_pool is evolving, and it would need several more per-page fields. 
One of them would be pageref_bias, a planned optimization to reduce the 
number of the costly atomic pageref operations (and replace existing 
code in several drivers).

I would replace this dma field with a pointer to an extensible struct, 
that would contain the dma mapping (and other stuff in the near future).
This pointer fits perfectly with the existing unsigned long private;
they can share the memory, on both 32-bit and 64-bit systems.

The only downside is one more pointer de-reference. This should be perf 
tested.
However, when introducing the page refcnt bias optimization into 
page_pool, I believe the perf gain would be guaranteed.
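
As a rough illustration of the layout I have in mind (a userspace sketch
only; demo_page, demo_pp_info and demo_dma_addr_t are hypothetical names,
and the real struct page is considerably more involved):

```c
#include <stdint.h>

typedef uint64_t demo_dma_addr_t;	/* stand-in for dma_addr_t */

/* Extensible per-page state owned by the pool (hypothetical). */
struct demo_pp_info {
	demo_dma_addr_t dma_addr;
	int pageref_bias;		/* the planned optimization */
};

/* Only the field of interest; the pointer overlays 'private'. */
struct demo_page {
	union {
		unsigned long private;
		struct demo_pp_info *pp_info;
	};
};

/* Same assumption as above: a pointer and an unsigned long are equal in size. */
_Static_assert(sizeof(struct demo_pp_info *) == sizeof(unsigned long),
	       "pointer must fit in the space used by 'private'");

/* Record the mapping in the side structure and hook it to the page. */
static void demo_page_attach(struct demo_page *page,
			     struct demo_pp_info *info,
			     demo_dma_addr_t addr)
{
	info->dma_addr = addr;
	info->pageref_bias = 0;
	page->pp_info = info;
}

/* The extra pointer dereference mentioned above happens here. */
static demo_dma_addr_t demo_page_dma(const struct demo_page *page)
{
	return page->pp_info->dma_addr;
}
```

The page itself does not grow, and new per-page fields only touch
demo_pp_info.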

Regards,
Tariq

^ permalink raw reply

* [PATCH net-next v4 13/17] net: sched: extend proto ops with 'put' callback
From: Vlad Buslov @ 2019-02-11  8:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov
In-Reply-To: <20190211085548.7190-1-vladbu@mellanox.com>

Add optional tp->ops->put() API to be implemented for filter reference
counting. This new function is called by cls API to release filter
reference for filters returned by tp->ops->change() or tp->ops->get()
functions. Implement tfilter_put() helper to call tp->ops->put() only for
classifiers that implement it.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  1 +
 net/sched/cls_api.c       | 12 +++++++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index e8cf36ed3e87..410dda80ca62 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -277,6 +277,7 @@ struct tcf_proto_ops {
 					   struct netlink_ext_ack *extack);
 
 	void*			(*get)(struct tcf_proto*, u32 handle);
+	void			(*put)(struct tcf_proto *tp, void *f);
 	int			(*change)(struct net *net, struct sk_buff *,
 					struct tcf_proto*, unsigned long,
 					u32 handle, struct nlattr **,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index a3e715d34efb..8fe38aa180cf 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1870,6 +1870,12 @@ static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb,
 			       q, parent, NULL, event, false);
 }
 
+static void tfilter_put(struct tcf_proto *tp, void *fh)
+{
+	if (tp->ops->put && fh)
+		tp->ops->put(tp, fh);
+}
+
 static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			  struct netlink_ext_ack *extack)
 {
@@ -2012,6 +2018,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			goto errout;
 		}
 	} else if (n->nlmsg_flags & NLM_F_EXCL) {
+		tfilter_put(tp, fh);
 		NL_SET_ERR_MSG(extack, "Filter already exists");
 		err = -EEXIST;
 		goto errout;
@@ -2026,9 +2033,11 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh,
 			      n->nlmsg_flags & NLM_F_CREATE ? TCA_ACT_NOREPLACE : TCA_ACT_REPLACE,
 			      extack);
-	if (err == 0)
+	if (err == 0) {
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
 			       RTM_NEWTFILTER, false);
+		tfilter_put(tp, fh);
+	}
 
 errout:
 	if (err && tp_created)
@@ -2259,6 +2268,7 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			NL_SET_ERR_MSG(extack, "Failed to send filter notify message");
 	}
 
+	tfilter_put(tp, fh);
 errout:
 	if (chain) {
 		if (tp && !IS_ERR(tp))
-- 
2.13.6
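
For readers unfamiliar with the pattern, the optional-callback contract can
be sketched in userspace C as follows. The demo_* types are hypothetical,
cut-down stand-ins; the real struct tcf_proto_ops has many more members and
different signatures, and real classifiers would use refcount_t rather than
a plain counter.

```c
#include <stddef.h>

struct demo_proto;

/* Hypothetical, cut-down ops table. */
struct demo_proto_ops {
	void *(*get)(struct demo_proto *tp, unsigned int handle);
	void (*put)(struct demo_proto *tp, void *f);	/* optional */
};

struct demo_filter {
	unsigned int handle;
	unsigned int refcnt;
};

struct demo_proto {
	const struct demo_proto_ops *ops;
	struct demo_filter *filter;	/* a single filter, for brevity */
};

/* get() returns the filter with an extra reference held. */
static void *demo_get(struct demo_proto *tp, unsigned int handle)
{
	struct demo_filter *f = tp->filter;

	if (!f || f->handle != handle)
		return NULL;
	f->refcnt++;
	return f;
}

/* put() drops the reference taken by get(). */
static void demo_put(struct demo_proto *tp, void *fh)
{
	struct demo_filter *f = fh;

	(void)tp;
	f->refcnt--;
}

/* A classifier that implements reference counting. */
static const struct demo_proto_ops demo_refcounted_ops = {
	.get = demo_get,
	.put = demo_put,
};

/* A classifier that does not implement put(). */
static const struct demo_proto_ops demo_no_put_ops = {
	.get = NULL,
	.put = NULL,
};

/* Mirrors the helper in the patch: call put() only when it exists. */
static void demo_filter_put(struct demo_proto *tp, void *fh)
{
	if (tp->ops->put && fh)
		tp->ops->put(tp, fh);
}
```

Callers can invoke demo_filter_put() unconditionally: it is a no-op for a
NULL handle and for classifiers that leave put() unset.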


^ permalink raw reply related

* [PATCH net-next v4 03/17] net: sched: refactor tc_ctl_chain() to use block->lock
From: Vlad Buslov @ 2019-02-11  8:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov
In-Reply-To: <20190211085548.7190-1-vladbu@mellanox.com>

In order to remove dependency on rtnl lock, modify chain API to use
block->lock to protect chain from concurrent modification. Rearrange
tc_ctl_chain() code to call tcf_chain_hold() while holding block->lock to
prevent concurrent chain removal.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 2ebf8e53038a..b5db0f79db14 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -2255,6 +2255,8 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 		err = -EINVAL;
 		goto errout_block;
 	}
+
+	mutex_lock(&block->lock);
 	chain = tcf_chain_lookup(block, chain_index);
 	if (n->nlmsg_type == RTM_NEWCHAIN) {
 		if (chain) {
@@ -2266,41 +2268,49 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 			} else {
 				NL_SET_ERR_MSG(extack, "Filter chain already exists");
 				err = -EEXIST;
-				goto errout_block;
+				goto errout_block_locked;
 			}
 		} else {
 			if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 				NL_SET_ERR_MSG(extack, "Need both RTM_NEWCHAIN and NLM_F_CREATE to create a new chain");
 				err = -ENOENT;
-				goto errout_block;
+				goto errout_block_locked;
 			}
 			chain = tcf_chain_create(block, chain_index);
 			if (!chain) {
 				NL_SET_ERR_MSG(extack, "Failed to create filter chain");
 				err = -ENOMEM;
-				goto errout_block;
+				goto errout_block_locked;
 			}
 		}
 	} else {
 		if (!chain || tcf_chain_held_by_acts_only(chain)) {
 			NL_SET_ERR_MSG(extack, "Cannot find specified filter chain");
 			err = -EINVAL;
-			goto errout_block;
+			goto errout_block_locked;
 		}
 		tcf_chain_hold(chain);
 	}
 
+	if (n->nlmsg_type == RTM_NEWCHAIN) {
+		/* Modifying chain requires holding parent block lock. In case
+		 * the chain was successfully added, take a reference to the
+		 * chain. This ensures that an empty chain does not disappear at
+		 * the end of this function.
+		 */
+		tcf_chain_hold(chain);
+		chain->explicitly_created = true;
+	}
+	mutex_unlock(&block->lock);
+
 	switch (n->nlmsg_type) {
 	case RTM_NEWCHAIN:
 		err = tc_chain_tmplt_add(chain, net, tca, extack);
-		if (err)
+		if (err) {
+			tcf_chain_put_explicitly_created(chain);
 			goto errout;
-		/* In case the chain was successfully added, take a reference
-		 * to the chain. This ensures that an empty chain
-		 * does not disappear at the end of this function.
-		 */
-		tcf_chain_hold(chain);
-		chain->explicitly_created = true;
+		}
+
 		tc_chain_notify(chain, NULL, 0, NLM_F_CREATE | NLM_F_EXCL,
 				RTM_NEWCHAIN, false);
 		break;
@@ -2334,6 +2344,10 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 		/* Replay the request. */
 		goto replay;
 	return err;
+
+errout_block_locked:
+	mutex_unlock(&block->lock);
+	goto errout_block;
 }
 
 /* called with RTNL */
-- 
2.13.6
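
The core idea, looking a chain up and taking a reference inside the same
critical section, can be sketched in userspace C with a pthread mutex. This
is an illustration under simplifying assumptions, not the cls_api.c code:
the demo_* names are hypothetical, and the chain list and refcount are
reduced to the bare minimum.

```c
#include <pthread.h>
#include <stddef.h>

struct demo_chain {
	unsigned int index;
	unsigned int refcnt;
	struct demo_chain *next;
};

struct demo_block {
	pthread_mutex_t lock;		/* protects chains and refcnt */
	struct demo_chain *chains;
};

/* Caller must hold block->lock. */
static struct demo_chain *demo_chain_lookup(struct demo_block *block,
					    unsigned int index)
{
	struct demo_chain *chain;

	for (chain = block->chains; chain; chain = chain->next)
		if (chain->index == index)
			return chain;
	return NULL;
}

/*
 * Look up a chain and take a reference before dropping the lock, so a
 * concurrent remover cannot free the chain between the two steps.
 */
static struct demo_chain *demo_chain_get(struct demo_block *block,
					 unsigned int index)
{
	struct demo_chain *chain;

	pthread_mutex_lock(&block->lock);
	chain = demo_chain_lookup(block, index);
	if (chain)
		chain->refcnt++;
	pthread_mutex_unlock(&block->lock);
	return chain;
}
```

Once demo_chain_get() returns, the caller owns a reference and can do
slower work, such as notifications, without holding the lock.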


^ permalink raw reply related

* [PATCH net-next v4 10/17] net: sched: refactor tp insert/delete for concurrent execution
From: Vlad Buslov @ 2019-02-11  8:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov
In-Reply-To: <20190211085548.7190-1-vladbu@mellanox.com>

Implement unique insertion function to atomically attach tcf_proto to chain
after verifying that no other tcf proto with specified priority exists.
Implement delete function that verifies that tp is actually empty before
deleting it. Use these functions to refactor cls API to account for
concurrent tp and rule update instead of relying on rtnl lock. Add new
'deleting' flag to tcf proto. Use it to restart search when iterating over
tp's on chain to prevent accessing potentially invalid tp->next pointer.

Extend tcf proto with spinlock that is intended to be used to protect its
data from concurrent modification instead of relying on rtnl mutex. Use it
to protect 'deleting' flag. Add lockdep macros to validate that lock is
held when accessing protected fields.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  18 +++++
 net/sched/cls_api.c       | 177 +++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 170 insertions(+), 25 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 4372c08fc4d9..083e566fc380 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -322,6 +322,11 @@ struct tcf_proto {
 	void			*data;
 	const struct tcf_proto_ops	*ops;
 	struct tcf_chain	*chain;
+	/* Lock protects tcf_proto shared state and can be used by unlocked
+	 * classifiers to protect their private data.
+	 */
+	spinlock_t		lock;
+	bool			deleting;
 	refcount_t		refcnt;
 	struct rcu_head		rcu;
 };
@@ -382,16 +387,29 @@ static inline bool lockdep_tcf_chain_is_locked(struct tcf_chain *chain)
 {
 	return lockdep_is_held(&chain->filter_chain_lock);
 }
+
+static inline bool lockdep_tcf_proto_is_locked(struct tcf_proto *tp)
+{
+	return lockdep_is_held(&tp->lock);
+}
 #else
 static inline bool lockdep_tcf_chain_is_locked(struct tcf_block *chain)
 {
 	return true;
 }
+
+static inline bool lockdep_tcf_proto_is_locked(struct tcf_proto *tp)
+{
+	return true;
+}
 #endif /* #ifdef CONFIG_PROVE_LOCKING */
 
 #define tcf_chain_dereference(p, chain)					\
 	rcu_dereference_protected(p, lockdep_tcf_chain_is_locked(chain))
 
+#define tcf_proto_dereference(p, tp)					\
+	rcu_dereference_protected(p, lockdep_tcf_proto_is_locked(tp))
+
 static inline void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
 {
 	if (*flags & TCA_CLS_FLAGS_IN_HW)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index dca8a3bee9c2..c6452e3bfc6a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -180,6 +180,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 	tp->protocol = protocol;
 	tp->prio = prio;
 	tp->chain = chain;
+	spin_lock_init(&tp->lock);
 	refcount_set(&tp->refcnt, 1);
 
 	err = tp->ops->init(tp);
@@ -217,6 +218,49 @@ static void tcf_proto_put(struct tcf_proto *tp,
 		tcf_proto_destroy(tp, extack);
 }
 
+static int walker_noop(struct tcf_proto *tp, void *d, struct tcf_walker *arg)
+{
+	return -1;
+}
+
+static bool tcf_proto_is_empty(struct tcf_proto *tp)
+{
+	struct tcf_walker walker = { .fn = walker_noop, };
+
+	if (tp->ops->walk) {
+		tp->ops->walk(tp, &walker);
+		return !walker.stop;
+	}
+	return true;
+}
+
+static bool tcf_proto_check_delete(struct tcf_proto *tp)
+{
+	spin_lock(&tp->lock);
+	if (tcf_proto_is_empty(tp))
+		tp->deleting = true;
+	spin_unlock(&tp->lock);
+	return tp->deleting;
+}
+
+static void tcf_proto_mark_delete(struct tcf_proto *tp)
+{
+	spin_lock(&tp->lock);
+	tp->deleting = true;
+	spin_unlock(&tp->lock);
+}
+
+static bool tcf_proto_is_deleting(struct tcf_proto *tp)
+{
+	bool deleting;
+
+	spin_lock(&tp->lock);
+	deleting = tp->deleting;
+	spin_unlock(&tp->lock);
+
+	return deleting;
+}
+
 #define ASSERT_BLOCK_LOCKED(block)					\
 	lockdep_assert_held(&(block)->lock)
 
@@ -983,13 +1027,27 @@ EXPORT_SYMBOL(tcf_get_next_chain);
 static struct tcf_proto *
 __tcf_get_next_proto(struct tcf_chain *chain, struct tcf_proto *tp)
 {
+	u32 prio = 0;
+
 	ASSERT_RTNL();
 	mutex_lock(&chain->filter_chain_lock);
 
-	if (!tp)
+	if (!tp) {
 		tp = tcf_chain_dereference(chain->filter_chain, chain);
-	else
+	} else if (tcf_proto_is_deleting(tp)) {
+		/* 'deleting' flag is set and chain->filter_chain_lock was
+		 * unlocked, which means next pointer could be invalid. Restart
+		 * search.
+		 */
+		prio = tp->prio + 1;
+		tp = tcf_chain_dereference(chain->filter_chain, chain);
+
+		for (; tp; tp = tcf_chain_dereference(tp->next, chain))
+			if (!tp->deleting && tp->prio >= prio)
+				break;
+	} else {
 		tp = tcf_chain_dereference(tp->next, chain);
+	}
 
 	if (tp)
 		tcf_proto_get(tp);
@@ -1569,6 +1627,7 @@ static void tcf_chain_tp_remove(struct tcf_chain *chain,
 {
 	struct tcf_proto *next = tcf_chain_dereference(chain_info->next, chain);
 
+	tcf_proto_mark_delete(tp);
 	if (tp == chain->filter_chain)
 		tcf_chain0_head_change(chain, next);
 	RCU_INIT_POINTER(*chain_info->pprev, next);
@@ -1577,6 +1636,79 @@ static void tcf_chain_tp_remove(struct tcf_chain *chain,
 static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
 					   struct tcf_chain_info *chain_info,
 					   u32 protocol, u32 prio,
+					   bool prio_allocate);
+
+/* Try to insert new proto.
+ * If proto with specified priority already exists, free new proto
+ * and return existing one.
+ */
+
+static struct tcf_proto *tcf_chain_tp_insert_unique(struct tcf_chain *chain,
+						    struct tcf_proto *tp_new,
+						    u32 protocol, u32 prio)
+{
+	struct tcf_chain_info chain_info;
+	struct tcf_proto *tp;
+
+	mutex_lock(&chain->filter_chain_lock);
+
+	tp = tcf_chain_tp_find(chain, &chain_info,
+			       protocol, prio, false);
+	if (!tp)
+		tcf_chain_tp_insert(chain, &chain_info, tp_new);
+	mutex_unlock(&chain->filter_chain_lock);
+
+	if (tp) {
+		tcf_proto_destroy(tp_new, NULL);
+		tp_new = tp;
+	}
+
+	return tp_new;
+}
+
+static void tcf_chain_tp_delete_empty(struct tcf_chain *chain,
+				      struct tcf_proto *tp,
+				      struct netlink_ext_ack *extack)
+{
+	struct tcf_chain_info chain_info;
+	struct tcf_proto *tp_iter;
+	struct tcf_proto **pprev;
+	struct tcf_proto *next;
+
+	mutex_lock(&chain->filter_chain_lock);
+
+	/* Atomically find and remove tp from chain. */
+	for (pprev = &chain->filter_chain;
+	     (tp_iter = tcf_chain_dereference(*pprev, chain));
+	     pprev = &tp_iter->next) {
+		if (tp_iter == tp) {
+			chain_info.pprev = pprev;
+			chain_info.next = tp_iter->next;
+			WARN_ON(tp_iter->deleting);
+			break;
+		}
+	}
+	/* Verify that tp still exists and no new filters were inserted
+	 * concurrently.
+	 * Mark tp for deletion if it is empty.
+	 */
+	if (!tp_iter || !tcf_proto_check_delete(tp)) {
+		mutex_unlock(&chain->filter_chain_lock);
+		return;
+	}
+
+	next = tcf_chain_dereference(chain_info.next, chain);
+	if (tp == chain->filter_chain)
+		tcf_chain0_head_change(chain, next);
+	RCU_INIT_POINTER(*chain_info.pprev, next);
+	mutex_unlock(&chain->filter_chain_lock);
+
+	tcf_proto_put(tp, extack);
+}
+
+static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
+					   struct tcf_chain_info *chain_info,
+					   u32 protocol, u32 prio,
 					   bool prio_allocate)
 {
 	struct tcf_proto **pprev;
@@ -1809,6 +1941,8 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	}
 
 	if (tp == NULL) {
+		struct tcf_proto *tp_new = NULL;
+
 		/* Proto-tcf does not exist, create new one */
 
 		if (tca[TCA_KIND] == NULL || !protocol) {
@@ -1828,25 +1962,25 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 							       &chain_info));
 
 		mutex_unlock(&chain->filter_chain_lock);
-		tp = tcf_proto_create(nla_data(tca[TCA_KIND]),
-				      protocol, prio, chain, extack);
-		if (IS_ERR(tp)) {
-			err = PTR_ERR(tp);
+		tp_new = tcf_proto_create(nla_data(tca[TCA_KIND]),
+					  protocol, prio, chain, extack);
+		if (IS_ERR(tp_new)) {
+			err = PTR_ERR(tp_new);
 			goto errout;
 		}
 
-		mutex_lock(&chain->filter_chain_lock);
-		tcf_chain_tp_insert(chain, &chain_info, tp);
-		mutex_unlock(&chain->filter_chain_lock);
 		tp_created = 1;
-	} else if (tca[TCA_KIND] && nla_strcmp(tca[TCA_KIND], tp->ops->kind)) {
-		NL_SET_ERR_MSG(extack, "Specified filter kind does not match existing one");
-		err = -EINVAL;
-		goto errout_locked;
+		tp = tcf_chain_tp_insert_unique(chain, tp_new, protocol, prio);
 	} else {
 		mutex_unlock(&chain->filter_chain_lock);
 	}
 
+	if (tca[TCA_KIND] && nla_strcmp(tca[TCA_KIND], tp->ops->kind)) {
+		NL_SET_ERR_MSG(extack, "Specified filter kind does not match existing one");
+		err = -EINVAL;
+		goto errout;
+	}
+
 	fh = tp->ops->get(tp, t->tcm_handle);
 
 	if (!fh) {
@@ -1873,12 +2007,10 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	if (err == 0)
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
 			       RTM_NEWTFILTER, false);
-	else if (tp_created)
-		tcf_proto_destroy(tp, NULL);
 
 errout:
-	if (chain)
-		tcf_chain_put(chain);
+	if (err && tp_created)
+		tcf_chain_tp_delete_empty(chain, tp, NULL);
 	if (chain) {
 		if (tp && !IS_ERR(tp))
 			tcf_proto_put(tp, NULL);
@@ -1984,9 +2116,9 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		tcf_chain_tp_remove(chain, &chain_info, tp);
 		mutex_unlock(&chain->filter_chain_lock);
 
+		tcf_proto_put(tp, NULL);
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
 			       RTM_DELTFILTER, false);
-		tcf_proto_destroy(tp, extack);
 		err = 0;
 		goto errout;
 	}
@@ -2005,13 +2137,8 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 					 extack);
 		if (err)
 			goto errout;
-		if (last) {
-			mutex_lock(&chain->filter_chain_lock);
-			tcf_chain_tp_remove(chain, &chain_info, tp);
-			mutex_unlock(&chain->filter_chain_lock);
-
-			tcf_proto_destroy(tp, extack);
-		}
+		if (last)
+			tcf_chain_tp_delete_empty(chain, tp, extack);
 	}
 
 errout:
-- 
2.13.6


* [PATCH net-next v4 16/17] net: sched: refactor tcf_block_find() into standalone functions
From: Vlad Buslov @ 2019-02-11  8:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov
In-Reply-To: <20190211085548.7190-1-vladbu@mellanox.com>

Refactor tcf_block_find() code into three standalone functions:
- __tcf_qdisc_find() to look up Qdisc and increment its reference counter.
- __tcf_qdisc_cl_find() to look up class.
- __tcf_block_find() to look up block and increment its reference counter.

This change is necessary to allow netlink tc rule update handlers to call
these functions directly in order to conditionally take the rtnl lock
according to Qdisc class ops flags before calling any of the class ops
functions.
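
The intended calling pattern can be sketched in userspace as follows. This
is an assumption-laden illustration, not code from this series: the
OPS_UNLOCKED flag, the structures and rule_update() are all hypothetical,
and a counter stands in for taking the rtnl lock. The point is only the
ordering: the Qdisc is looked up first, its class ops flags decide whether
the lock is needed, and class ops are called afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OPS_UNLOCKED 0x1	/* hypothetical flag name */

/* Hypothetical stand-ins for Qdisc_class_ops and Qdisc. */
struct cl_ops {
	unsigned int flags;
};

struct qdisc {
	const struct cl_ops *cl_ops;
	int refcnt;
};

static int lock_acquisitions;	/* stand-in for taking the rtnl lock */

/* Step 1: look up the qdisc and take a reference to it. */
static struct qdisc *qdisc_find(struct qdisc *table, size_t n, size_t idx)
{
	if (idx >= n)
		return NULL;
	table[idx].refcnt++;
	return &table[idx];
}

/* Returns 0 on success, -1 if the qdisc does not exist. '*took_lock'
 * reports whether the big lock was needed for this qdisc.
 */
static int rule_update(struct qdisc *table, size_t n, size_t idx,
		       bool *took_lock)
{
	struct qdisc *q = qdisc_find(table, n, idx);

	if (!q)
		return -1;

	/* Decide on locking before any class op is called. */
	*took_lock = !(q->cl_ops->flags & OPS_UNLOCKED);
	if (*took_lock)
		lock_acquisitions++;

	/* Steps 2 and 3 (class lookup, block lookup) would run here. */

	q->refcnt--;	/* release the reference taken in step 1 */
	return 0;
}
```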

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 241 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 149 insertions(+), 92 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index e8ed461e94af..5f9373ee47ce 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1101,6 +1101,142 @@ static void tcf_block_flush_all_chains(struct tcf_block *block, bool rtnl_held)
 	}
 }
 
+/* Lookup Qdisc and increments its reference counter.
+ * Set parent, if necessary.
+ */
+
+static int __tcf_qdisc_find(struct net *net, struct Qdisc **q,
+			    u32 *parent, int ifindex, bool rtnl_held,
+			    struct netlink_ext_ack *extack)
+{
+	const struct Qdisc_class_ops *cops;
+	struct net_device *dev;
+	int err = 0;
+
+	if (ifindex == TCM_IFINDEX_MAGIC_BLOCK)
+		return 0;
+
+	rcu_read_lock();
+
+	/* Find link */
+	dev = dev_get_by_index_rcu(net, ifindex);
+	if (!dev) {
+		rcu_read_unlock();
+		return -ENODEV;
+	}
+
+	/* Find qdisc */
+	if (!*parent) {
+		*q = dev->qdisc;
+		*parent = (*q)->handle;
+	} else {
+		*q = qdisc_lookup_rcu(dev, TC_H_MAJ(*parent));
+		if (!*q) {
+			NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
+			err = -EINVAL;
+			goto errout_rcu;
+		}
+	}
+
+	*q = qdisc_refcount_inc_nz(*q);
+	if (!*q) {
+		NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
+		err = -EINVAL;
+		goto errout_rcu;
+	}
+
+	/* Is it classful? */
+	cops = (*q)->ops->cl_ops;
+	if (!cops) {
+		NL_SET_ERR_MSG(extack, "Qdisc not classful");
+		err = -EINVAL;
+		goto errout_qdisc;
+	}
+
+	if (!cops->tcf_block) {
+		NL_SET_ERR_MSG(extack, "Class doesn't support blocks");
+		err = -EOPNOTSUPP;
+		goto errout_qdisc;
+	}
+
+errout_rcu:
+	/* At this point we know that qdisc is not noop_qdisc,
+	 * which means that qdisc holds a reference to net_device
+	 * and we hold a reference to qdisc, so it is safe to release
+	 * rcu read lock.
+	 */
+	rcu_read_unlock();
+	return err;
+
+errout_qdisc:
+	rcu_read_unlock();
+
+	if (rtnl_held)
+		qdisc_put(*q);
+	else
+		qdisc_put_unlocked(*q);
+	*q = NULL;
+
+	return err;
+}
+
+static int __tcf_qdisc_cl_find(struct Qdisc *q, u32 parent, unsigned long *cl,
+			       int ifindex, struct netlink_ext_ack *extack)
+{
+	if (ifindex == TCM_IFINDEX_MAGIC_BLOCK)
+		return 0;
+
+	/* Do we search for filter, attached to class? */
+	if (TC_H_MIN(parent)) {
+		const struct Qdisc_class_ops *cops = q->ops->cl_ops;
+
+		*cl = cops->find(q, parent);
+		if (*cl == 0) {
+			NL_SET_ERR_MSG(extack, "Specified class doesn't exist");
+			return -ENOENT;
+		}
+	}
+
+	return 0;
+}
+
+static struct tcf_block *__tcf_block_find(struct net *net, struct Qdisc *q,
+					  unsigned long cl, int ifindex,
+					  u32 block_index,
+					  struct netlink_ext_ack *extack)
+{
+	struct tcf_block *block;
+
+	if (ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
+		block = tcf_block_refcnt_get(net, block_index);
+		if (!block) {
+			NL_SET_ERR_MSG(extack, "Block of given index was not found");
+			return ERR_PTR(-EINVAL);
+		}
+	} else {
+		const struct Qdisc_class_ops *cops = q->ops->cl_ops;
+
+		block = cops->tcf_block(q, cl, extack);
+		if (!block)
+			return ERR_PTR(-EINVAL);
+
+		if (tcf_block_shared(block)) {
+			NL_SET_ERR_MSG(extack, "This filter block is shared. Please use the block index to manipulate the filters");
+			return ERR_PTR(-EOPNOTSUPP);
+		}
+
+		/* Always take reference to block in order to support execution
+		 * of rules update path of cls API without rtnl lock. Caller
+		 * must release block when it is finished using it. 'if' block
+		 * of this conditional obtain reference to block by calling
+		 * tcf_block_refcnt_get().
+		 */
+		refcount_inc(&block->refcnt);
+	}
+
+	return block;
+}
+
 static void __tcf_block_put(struct tcf_block *block, struct Qdisc *q,
 			    struct tcf_block_ext_info *ei, bool rtnl_held)
 {
@@ -1146,106 +1282,27 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
 	struct tcf_block *block;
 	int err = 0;
 
-	if (ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-		block = tcf_block_refcnt_get(net, block_index);
-		if (!block) {
-			NL_SET_ERR_MSG(extack, "Block of given index was not found");
-			return ERR_PTR(-EINVAL);
-		}
-	} else {
-		const struct Qdisc_class_ops *cops;
-		struct net_device *dev;
-
-		rcu_read_lock();
-
-		/* Find link */
-		dev = dev_get_by_index_rcu(net, ifindex);
-		if (!dev) {
-			rcu_read_unlock();
-			return ERR_PTR(-ENODEV);
-		}
-
-		/* Find qdisc */
-		if (!*parent) {
-			*q = dev->qdisc;
-			*parent = (*q)->handle;
-		} else {
-			*q = qdisc_lookup_rcu(dev, TC_H_MAJ(*parent));
-			if (!*q) {
-				NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
-				err = -EINVAL;
-				goto errout_rcu;
-			}
-		}
-
-		*q = qdisc_refcount_inc_nz(*q);
-		if (!*q) {
-			NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
-			err = -EINVAL;
-			goto errout_rcu;
-		}
-
-		/* Is it classful? */
-		cops = (*q)->ops->cl_ops;
-		if (!cops) {
-			NL_SET_ERR_MSG(extack, "Qdisc not classful");
-			err = -EINVAL;
-			goto errout_rcu;
-		}
-
-		if (!cops->tcf_block) {
-			NL_SET_ERR_MSG(extack, "Class doesn't support blocks");
-			err = -EOPNOTSUPP;
-			goto errout_rcu;
-		}
-
-		/* At this point we know that qdisc is not noop_qdisc,
-		 * which means that qdisc holds a reference to net_device
-		 * and we hold a reference to qdisc, so it is safe to release
-		 * rcu read lock.
-		 */
-		rcu_read_unlock();
+	ASSERT_RTNL();
 
-		/* Do we search for filter, attached to class? */
-		if (TC_H_MIN(*parent)) {
-			*cl = cops->find(*q, *parent);
-			if (*cl == 0) {
-				NL_SET_ERR_MSG(extack, "Specified class doesn't exist");
-				err = -ENOENT;
-				goto errout_qdisc;
-			}
-		}
+	err = __tcf_qdisc_find(net, q, parent, ifindex, true, extack);
+	if (err)
+		goto errout;
 
-		/* And the last stroke */
-		block = cops->tcf_block(*q, *cl, extack);
-		if (!block) {
-			err = -EINVAL;
-			goto errout_qdisc;
-		}
-		if (tcf_block_shared(block)) {
-			NL_SET_ERR_MSG(extack, "This filter block is shared. Please use the block index to manipulate the filters");
-			err = -EOPNOTSUPP;
-			goto errout_qdisc;
-		}
+	err = __tcf_qdisc_cl_find(*q, *parent, cl, ifindex, extack);
+	if (err)
+		goto errout_qdisc;
 
-		/* Always take reference to block in order to support execution
-		 * of rules update path of cls API without rtnl lock. Caller
-		 * must release block when it is finished using it. 'if' block
-		 * of this conditional obtain reference to block by calling
-		 * tcf_block_refcnt_get().
-		 */
-		refcount_inc(&block->refcnt);
-	}
+	block = __tcf_block_find(net, *q, *cl, ifindex, block_index, extack);
+	if (IS_ERR(block))
+		goto errout_qdisc;
 
 	return block;
 
-errout_rcu:
-	rcu_read_unlock();
 errout_qdisc:
-	if (*q) {
+	if (*q)
 		qdisc_put(*q);
-		*q = NULL;
-	}
+errout:
+	*q = NULL;
 	return ERR_PTR(err);
 }
 
-- 
2.13.6

