Netdev List
* Re: [PATCH] sctp: fix spelling mistake: "max_retans" -> "max_retrans"
From: Neil Horman @ 2018-05-09 11:16 UTC (permalink / raw)
  To: Colin King
  Cc: Vlad Yasevich, Marcelo Ricardo Leitner, David S . Miller,
	linux-sctp, netdev, kernel-janitors, linux-kernel
In-Reply-To: <20180508222428.24874-1-colin.king@canonical.com>

On Tue, May 08, 2018 at 11:24:28PM +0100, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> Trivial fix to spelling mistake in error string
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>  net/sctp/sm_make_chunk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index 4d7b3ccea078..4a4fd1971255 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -1131,7 +1131,7 @@ struct sctp_chunk *sctp_make_violation_max_retrans(
>  					const struct sctp_association *asoc,
>  					const struct sctp_chunk *chunk)
>  {
> -	static const char error[] = "Association exceeded its max_retans count";
> +	static const char error[] = "Association exceeded its max_retrans count";
>  	size_t payload_len = sizeof(error) + sizeof(struct sctp_errhdr);
>  	struct sctp_chunk *retval;
>  
> -- 
> 2.17.0
> 
> 
Acked-by: Neil Horman <nhorman@tuxdriver.com>


* [PATCH net-next RFC 0/3] net: Add support to configure SR-IOV VF queues.
From: Michael Chan @ 2018-05-09 11:21 UTC (permalink / raw)
  To: davem; +Cc: netdev

VF queue resources are always limited, and there is currently no
infrastructure that lets the host administrator add or reduce queue
resources for a particular VF.  This RFC series adds that infrastructure
and implements it in the bnxt_en driver.

The "ip link set" command will subsequently be patched to support the new
operation.

Michael Chan (3):
  net: Add support to configure SR-IOV VF minimum and maximum queues.
  bnxt_en: Store min/max tx/rx rings for individual VFs.
  bnxt_en: Implement .ndo_set_vf_queues().

 drivers/net/ethernet/broadcom/bnxt/bnxt.c       |  1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.h       |  5 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 90 +++++++++++++++++++++++--
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h |  2 +
 include/linux/if_link.h                         |  4 ++
 include/linux/netdevice.h                       |  6 ++
 include/uapi/linux/if_link.h                    |  9 +++
 net/core/rtnetlink.c                            | 28 +++++++-
 8 files changed, 138 insertions(+), 7 deletions(-)

-- 
1.8.3.1


* [PATCH net-next RFC 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
From: Michael Chan @ 2018-05-09 11:21 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1525864903-32619-1-git-send-email-michael.chan@broadcom.com>

VF queue resources are always limited, and there is currently no
infrastructure that lets the host administrator add or reduce queue
resources for a particular VF.  With an ever-increasing number of VFs
being supported, it is desirable to let the administrator configure queue
resources differently for each VF.  Some VFs may require more or fewer
queues because of different bandwidth requirements or a different number
of vCPUs in the VM.  This patch adds the infrastructure to do that by
adding the IFLA_VF_QUEUES netlink attribute and a new .ndo_set_vf_queues()
operation to net_device_ops.

Four parameters are exposed for each VF:

o min_tx_queues - Guaranteed or current tx queues assigned to the VF.

o max_tx_queues - Maximum but not necessarily guaranteed tx queues
  available to the VF.

o min_rx_queues - Guaranteed or current rx queues assigned to the VF.

o max_rx_queues - Maximum but not necessarily guaranteed rx queues
  available to the VF.

The "ip link set" command will subsequently be patched to support the new
operation to set the above parameters.

After the administrator changes the above parameters, the corresponding
VF will have a new range of channels that can be set using ethtool -L.
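
As a rough illustration of the dispatch this patch adds to do_setvfinfo(),
here is a minimal user-space sketch.  The names struct vf_queues,
struct fake_ops, dispatch_set_vf_queues() and check_ranges() are
hypothetical stand-ins, not kernel API.  Only the -EOPNOTSUPP fallback and
the argument order follow the diff below; the min <= max check in the stub
is an assumption about what a driver might do, not something this patch
requires.

```c
#include <errno.h>
#include <stdint.h>

/* User-space mirror of the struct ifla_vf_queues layout added by the patch. */
struct vf_queues {
	uint32_t vf;
	uint32_t min_tx_queues;
	uint32_t max_tx_queues;
	uint32_t min_rx_queues;
	uint32_t max_rx_queues;
};

/* Hypothetical stand-in for the one net_device_ops member of interest. */
struct fake_ops {
	int (*set_vf_queues)(int vf, int min_txq, int max_txq,
			     int min_rxq, int max_rxq);
};

/* Mirrors the new do_setvfinfo() branch: no handler means -EOPNOTSUPP. */
static int dispatch_set_vf_queues(const struct fake_ops *ops,
				  const struct vf_queues *q)
{
	if (!ops->set_vf_queues)
		return -EOPNOTSUPP;
	return ops->set_vf_queues(q->vf,
				  q->min_tx_queues, q->max_tx_queues,
				  q->min_rx_queues, q->max_rx_queues);
}

/* Assumed driver-side policy (not from the patch): reject min > max. */
static int check_ranges(int vf, int min_txq, int max_txq,
			int min_rxq, int max_rxq)
{
	(void)vf;
	if (min_txq > max_txq || min_rxq > max_rxq)
		return -EINVAL;
	return 0;
}
```

A driver that does not implement the operation is reported to user space
as -EOPNOTSUPP, the same convention the other per-VF setters in
do_setvfinfo() follow.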

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 include/linux/if_link.h      |  4 ++++
 include/linux/netdevice.h    |  6 ++++++
 include/uapi/linux/if_link.h |  9 +++++++++
 net/core/rtnetlink.c         | 28 +++++++++++++++++++++++++---
 4 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 622658d..8e81121 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -29,5 +29,9 @@ struct ifla_vf_info {
 	__u32 rss_query_en;
 	__u32 trusted;
 	__be16 vlan_proto;
+	__u32 min_tx_queues;
+	__u32 max_tx_queues;
+	__u32 min_rx_queues;
+	__u32 max_rx_queues;
 };
 #endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 03ed492..30a3caf 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1023,6 +1023,8 @@ struct dev_ifalias {
  *      with PF and querying it may introduce a theoretical security risk.
  * int (*ndo_set_vf_rss_query_en)(struct net_device *dev, int vf, bool setting);
  * int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
+ * int (*ndo_set_vf_queues)(struct net_device *dev, int vf, int min_txq,
+ *			    int max_txq, int min_rxq, int max_rxq);
  * int (*ndo_setup_tc)(struct net_device *dev, enum tc_setup_type type,
  *		       void *type_data);
  *	Called to setup any 'tc' scheduler, classifier or action on @dev.
@@ -1272,6 +1274,10 @@ struct net_device_ops {
 	int			(*ndo_set_vf_rss_query_en)(
 						   struct net_device *dev,
 						   int vf, bool setting);
+	int			(*ndo_set_vf_queues)(struct net_device *dev,
+						     int vf,
+						     int min_txq, int max_txq,
+						     int min_rxq, int max_rxq);
 	int			(*ndo_setup_tc)(struct net_device *dev,
 						enum tc_setup_type type,
 						void *type_data);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index b852664..fc56a47 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -658,6 +658,7 @@ enum {
 	IFLA_VF_IB_NODE_GUID,	/* VF Infiniband node GUID */
 	IFLA_VF_IB_PORT_GUID,	/* VF Infiniband port GUID */
 	IFLA_VF_VLAN_LIST,	/* nested list of vlans, option for QinQ */
+	IFLA_VF_QUEUES,		/* Min and Max TX/RX queues */
 	__IFLA_VF_MAX,
 };
 
@@ -748,6 +749,14 @@ struct ifla_vf_trust {
 	__u32 setting;
 };
 
+struct ifla_vf_queues {
+	__u32 vf;
+	__u32 min_tx_queues;
+	__u32 max_tx_queues;
+	__u32 min_rx_queues;
+	__u32 max_rx_queues;
+};
+
 /* VF ports management section
  *
  *	Nested layout of set/get msg is:
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 8080254..7cf3582 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -921,7 +921,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev,
 			 nla_total_size_64bit(sizeof(__u64)) +
 			 /* IFLA_VF_STATS_TX_DROPPED */
 			 nla_total_size_64bit(sizeof(__u64)) +
-			 nla_total_size(sizeof(struct ifla_vf_trust)));
+			 nla_total_size(sizeof(struct ifla_vf_trust)) +
+			 nla_total_size(sizeof(struct ifla_vf_queues)));
 		return size;
 	} else
 		return 0;
@@ -1181,6 +1182,7 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 	struct ifla_vf_vlan_info vf_vlan_info;
 	struct ifla_vf_spoofchk vf_spoofchk;
 	struct ifla_vf_tx_rate vf_tx_rate;
+	struct ifla_vf_queues vf_queues;
 	struct ifla_vf_stats vf_stats;
 	struct ifla_vf_trust vf_trust;
 	struct ifla_vf_vlan vf_vlan;
@@ -1217,7 +1219,8 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 		vf_spoofchk.vf =
 		vf_linkstate.vf =
 		vf_rss_query_en.vf =
-		vf_trust.vf = ivi.vf;
+		vf_trust.vf =
+		vf_queues.vf = ivi.vf;
 
 	memcpy(vf_mac.mac, ivi.mac, sizeof(ivi.mac));
 	vf_vlan.vlan = ivi.vlan;
@@ -1232,6 +1235,10 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 	vf_linkstate.link_state = ivi.linkstate;
 	vf_rss_query_en.setting = ivi.rss_query_en;
 	vf_trust.setting = ivi.trusted;
+	vf_queues.min_tx_queues = ivi.min_tx_queues;
+	vf_queues.max_tx_queues = ivi.max_tx_queues;
+	vf_queues.min_rx_queues = ivi.min_rx_queues;
+	vf_queues.max_rx_queues = ivi.max_rx_queues;
 	vf = nla_nest_start(skb, IFLA_VF_INFO);
 	if (!vf)
 		goto nla_put_vfinfo_failure;
@@ -1249,7 +1256,9 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 		    sizeof(vf_rss_query_en),
 		    &vf_rss_query_en) ||
 	    nla_put(skb, IFLA_VF_TRUST,
-		    sizeof(vf_trust), &vf_trust))
+		    sizeof(vf_trust), &vf_trust) ||
+	    nla_put(skb, IFLA_VF_QUEUES,
+		    sizeof(vf_queues), &vf_queues))
 		goto nla_put_vf_failure;
 	vfvlanlist = nla_nest_start(skb, IFLA_VF_VLAN_LIST);
 	if (!vfvlanlist)
@@ -1706,6 +1715,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	[IFLA_VF_TRUST]		= { .len = sizeof(struct ifla_vf_trust) },
 	[IFLA_VF_IB_NODE_GUID]	= { .len = sizeof(struct ifla_vf_guid) },
 	[IFLA_VF_IB_PORT_GUID]	= { .len = sizeof(struct ifla_vf_guid) },
+	[IFLA_VF_QUEUES]	= { .len = sizeof(struct ifla_vf_queues) },
 };
 
 static const struct nla_policy ifla_port_policy[IFLA_PORT_MAX+1] = {
@@ -2208,6 +2218,18 @@ static int do_setvfinfo(struct net_device *dev, struct nlattr **tb)
 		return handle_vf_guid(dev, ivt, IFLA_VF_IB_PORT_GUID);
 	}
 
+	if (tb[IFLA_VF_QUEUES]) {
+		struct ifla_vf_queues *ivq = nla_data(tb[IFLA_VF_QUEUES]);
+
+		err = -EOPNOTSUPP;
+		if (ops->ndo_set_vf_queues)
+			err = ops->ndo_set_vf_queues(dev, ivq->vf,
+					ivq->min_tx_queues, ivq->max_tx_queues,
+					ivq->min_rx_queues, ivq->max_rx_queues);
+		if (err < 0)
+			return err;
+	}
+
 	return err;
 }
 
-- 
1.8.3.1


* [PATCH net-next RFC 3/3] bnxt_en: Implement .ndo_set_vf_queues().
From: Michael Chan @ 2018-05-09 11:21 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1525864903-32619-1-git-send-email-michael.chan@broadcom.com>

Implement .ndo_set_vf_queues() in the PF driver to configure the queue
parameters for individual VFs.  This allows the host administrator to
increase or decrease the number of queues for individual VFs.
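
The availability check performed by bnxt_param_ok() in the diff below can
be tried out in user space.  This is a minimal sketch: param_ok() is a
hypothetical copy that uses stdint types in place of the kernel's u16.
Keeping or shrinking a value is always accepted; growing it is accepted
only if the increase fits within the rings still available.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * new_val: requested ring count
 * curr:    ring count currently assigned to the VF
 * avail:   rings the PF can still hand out
 */
static bool param_ok(int new_val, uint16_t curr, uint16_t avail)
{
	int delta;

	/* No growth requested, so no extra rings are needed. */
	if (new_val <= curr)
		return true;

	/* Growth is fine only if the increase fits in what is left. */
	delta = new_val - curr;
	return delta <= avail;
}
```

For example, with 8 rings assigned and 2 available, a request for 10 is
accepted and a request for 11 is rejected.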

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c       |  1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 67 +++++++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h |  2 +
 3 files changed, 70 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index dfa0839..2ce9779 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -8373,6 +8373,7 @@ static int bnxt_swdev_port_attr_get(struct net_device *dev,
 	.ndo_set_vf_link_state	= bnxt_set_vf_link_state,
 	.ndo_set_vf_spoofchk	= bnxt_set_vf_spoofchk,
 	.ndo_set_vf_trust	= bnxt_set_vf_trust,
+	.ndo_set_vf_queues	= bnxt_set_vf_queues,
 #endif
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= bnxt_poll_controller,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index 489e534..f0d938c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -138,6 +138,73 @@ int bnxt_set_vf_trust(struct net_device *dev, int vf_id, bool trusted)
 	return 0;
 }
 
+static bool bnxt_param_ok(int new, u16 curr, u16 avail)
+{
+	int delta;
+
+	if (new <= curr)
+		return true;
+
+	delta = new - curr;
+	if (delta <= avail)
+		return true;
+	return false;
+}
+
+int bnxt_set_vf_queues(struct net_device *dev, int vf_id, int min_txq,
+		       int max_txq, int min_rxq, int max_rxq)
+{
+	struct hwrm_func_vf_resource_cfg_input req = {0};
+	struct bnxt *bp = netdev_priv(dev);
+	u16 avail_tx_rings, avail_rx_rings;
+	struct bnxt_hw_resc *hw_resc;
+	struct bnxt_vf_info *vf;
+	int rc;
+
+	if (bnxt_vf_ndo_prep(bp, vf_id))
+		return -EINVAL;
+
+	if (!(bp->flags & BNXT_FLAG_NEW_RM))
+		return -EOPNOTSUPP;
+
+	vf = &bp->pf.vf[vf_id];
+	hw_resc = &bp->hw_resc;
+
+	avail_tx_rings = hw_resc->max_tx_rings - bp->tx_nr_rings;
+	if (bp->flags & BNXT_FLAG_AGG_RINGS)
+		avail_rx_rings = hw_resc->max_rx_rings - bp->rx_nr_rings * 2;
+	else
+		avail_rx_rings = hw_resc->max_rx_rings - bp->rx_nr_rings;
+	if (!bnxt_param_ok(min_txq, vf->min_tx_rings, avail_tx_rings))
+		return -ENOBUFS;
+	if (!bnxt_param_ok(min_rxq, vf->min_rx_rings, avail_rx_rings))
+		return -ENOBUFS;
+	if (!bnxt_param_ok(max_txq, vf->max_tx_rings, avail_tx_rings))
+		return -ENOBUFS;
+	if (!bnxt_param_ok(max_rxq, vf->max_rx_rings, avail_rx_rings))
+		return -ENOBUFS;
+
+	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FUNC_VF_RESOURCE_CFG, -1, -1);
+	memcpy(&req, &bp->vf_resc_cfg_input, sizeof(req));
+	req.min_tx_rings = cpu_to_le16(min_txq);
+	req.min_rx_rings = cpu_to_le16(min_rxq);
+	req.max_tx_rings = cpu_to_le16(max_txq);
+	req.max_rx_rings = cpu_to_le16(max_rxq);
+	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+	if (rc)
+		return -EIO;
+
+	hw_resc->max_tx_rings += vf->min_tx_rings;
+	hw_resc->max_rx_rings += vf->min_rx_rings;
+	vf->min_tx_rings = min_txq;
+	vf->max_tx_rings = max_txq;
+	vf->min_rx_rings = min_rxq;
+	vf->max_rx_rings = max_rxq;
+	hw_resc->max_tx_rings -= vf->min_tx_rings;
+	hw_resc->max_rx_rings -= vf->min_rx_rings;
+	return 0;
+}
+
 int bnxt_get_vf_config(struct net_device *dev, int vf_id,
 		       struct ifla_vf_info *ivi)
 {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
index e9b20cd..325b412 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
@@ -35,6 +35,8 @@
 int bnxt_set_vf_link_state(struct net_device *, int, int);
 int bnxt_set_vf_spoofchk(struct net_device *, int, bool);
 int bnxt_set_vf_trust(struct net_device *dev, int vf_id, bool trust);
+int bnxt_set_vf_queues(struct net_device *dev, int vf_id, int min_txq,
+		       int max_txq, int min_rxq, int max_rxq);
 int bnxt_sriov_configure(struct pci_dev *pdev, int num_vfs);
 void bnxt_sriov_disable(struct bnxt *);
 void bnxt_hwrm_exec_fwd_req(struct bnxt *);
-- 
1.8.3.1


* [PATCH net-next RFC 2/3] bnxt_en: Store min/max tx/rx rings for individual VFs.
From: Michael Chan @ 2018-05-09 11:21 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1525864903-32619-1-git-send-email-michael.chan@broadcom.com>

With new infrastructure to configure queues differently for each VF,
we need to store the current min/max rx/tx rings for each VF.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h       |  5 +++++
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 23 +++++++++++++++++++----
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 9b14eb6..2f5a23c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -837,6 +837,10 @@ struct bnxt_vf_info {
 	u32	func_flags; /* func cfg flags */
 	u32	min_tx_rate;
 	u32	max_tx_rate;
+	u16	min_tx_rings;
+	u16	max_tx_rings;
+	u16	min_rx_rings;
+	u16	max_rx_rings;
 	void	*hwrm_cmd_req_addr;
 	dma_addr_t	hwrm_cmd_req_dma_addr;
 };
@@ -1351,6 +1355,7 @@ struct bnxt {
 #ifdef CONFIG_BNXT_SRIOV
 	int			nr_vfs;
 	struct bnxt_vf_info	vf;
+	struct hwrm_func_vf_resource_cfg_input vf_resc_cfg_input;
 	wait_queue_head_t	sriov_cfg_wait;
 	bool			sriov_cfg;
 #define BNXT_SRIOV_CFG_WAIT_TMO	msecs_to_jiffies(10000)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index a649108..489e534 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -171,6 +171,10 @@ int bnxt_get_vf_config(struct net_device *dev, int vf_id,
 		ivi->linkstate = IFLA_VF_LINK_STATE_ENABLE;
 	else
 		ivi->linkstate = IFLA_VF_LINK_STATE_DISABLE;
+	ivi->min_tx_queues = vf->min_tx_rings;
+	ivi->max_tx_queues = vf->max_tx_rings;
+	ivi->min_rx_queues = vf->min_rx_rings;
+	ivi->max_rx_queues = vf->max_rx_rings;
 
 	return 0;
 }
@@ -498,6 +502,8 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs)
 
 	mutex_lock(&bp->hwrm_cmd_lock);
 	for (i = 0; i < num_vfs; i++) {
+		struct bnxt_vf_info *vf = &pf->vf[i];
+
 		req.vf_id = cpu_to_le16(pf->first_vf_id + i);
 		rc = _hwrm_send_message(bp, &req, sizeof(req),
 					HWRM_CMD_TIMEOUT);
@@ -506,7 +512,11 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs)
 			break;
 		}
 		pf->active_vfs = i + 1;
-		pf->vf[i].fw_fid = pf->first_vf_id + i;
+		vf->fw_fid = pf->first_vf_id + i;
+		vf->min_tx_rings = le16_to_cpu(req.min_tx_rings);
+		vf->max_tx_rings = vf_tx_rings;
+		vf->min_rx_rings = le16_to_cpu(req.min_rx_rings);
+		vf->max_rx_rings = vf_rx_rings;
 	}
 	mutex_unlock(&bp->hwrm_cmd_lock);
 	if (pf->active_vfs) {
@@ -521,6 +531,7 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs)
 		hw_resc->max_stat_ctxs -= le16_to_cpu(req.min_stat_ctx) * n;
 		hw_resc->max_vnics -= le16_to_cpu(req.min_vnics) * n;
 
+		memcpy(&bp->vf_resc_cfg_input, &req, sizeof(req));
 		rc = pf->active_vfs;
 	}
 	return rc;
@@ -585,6 +596,7 @@ static int bnxt_hwrm_func_cfg(struct bnxt *bp, int num_vfs)
 
 	mutex_lock(&bp->hwrm_cmd_lock);
 	for (i = 0; i < num_vfs; i++) {
+		struct bnxt_vf_info *vf = &pf->vf[i];
 		int vf_tx_rsvd = vf_tx_rings;
 
 		req.fid = cpu_to_le16(pf->first_vf_id + i);
@@ -593,12 +605,15 @@ static int bnxt_hwrm_func_cfg(struct bnxt *bp, int num_vfs)
 		if (rc)
 			break;
 		pf->active_vfs = i + 1;
-		pf->vf[i].fw_fid = le16_to_cpu(req.fid);
-		rc = __bnxt_hwrm_get_tx_rings(bp, pf->vf[i].fw_fid,
-					      &vf_tx_rsvd);
+		vf->fw_fid = le16_to_cpu(req.fid);
+		rc = __bnxt_hwrm_get_tx_rings(bp, vf->fw_fid, &vf_tx_rsvd);
 		if (rc)
 			break;
 		total_vf_tx_rings += vf_tx_rsvd;
+		vf->min_tx_rings = vf_tx_rsvd;
+		vf->max_tx_rings = vf_tx_rsvd;
+		vf->min_rx_rings = vf_rx_rings;
+		vf->max_rx_rings = vf_rx_rings;
 	}
 	mutex_unlock(&bp->hwrm_cmd_lock);
 	if (rc)
-- 
1.8.3.1


* Re: [PATCH net-next] net: dsa: fix added_by_user switchdev notification
From: Nikolay Aleksandrov @ 2018-05-09 11:32 UTC (permalink / raw)
  To: Vivien Didelot, netdev
  Cc: linux-kernel, kernel, Petr Machata, jiri, idosch, ivecera, davem,
	stephen, andrew, f.fainelli, bridge
In-Reply-To: <20180509030312.29548-1-vivien.didelot@savoirfairelinux.com>

On 09/05/18 06:03, Vivien Didelot wrote:
> Commit 161d82de1ff8 ("net: bridge: Notify about !added_by_user FDB
> entries") causes the below oops when bringing up a slave interface,
> because dsa_port_fdb_add is still scheduled, but with a NULL address.
> 
> To fix this, keep the dsa_slave_switchdev_event function agnostic of the
> notified info structure and handle the added_by_user flag in the
> specific dsa_slave_switchdev_event_work function.
> 
>      [   75.512263] Unable to handle kernel NULL pointer dereference at virtual address 00000000
>      [   75.519063] pgd = (ptrval)
>      [   75.520545] [00000000] *pgd=00000000
>      [   75.522839] Internal error: Oops: 17 [#1] ARM
>      [   75.525898] Modules linked in:
>      [   75.527673] CPU: 0 PID: 9 Comm: kworker/u2:1 Not tainted 4.17.0-rc2 #78
>      [   75.532988] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
>      [   75.538153] Workqueue: dsa_ordered dsa_slave_switchdev_event_work
>      [   75.542970] PC is at mv88e6xxx_port_db_load_purge+0x60/0x1b0
>      [   75.547341] LR is at mdiobus_read_nested+0x6c/0x78
>      [   75.550833] pc : [<804cd5c0>]    lr : [<804bba84>]    psr: 60070013
>      [   75.555796] sp : 9f54bd78  ip : 9f54bd87  fp : 9f54bddc
>      [   75.559719] r10: 00000000  r9 : 0000000e  r8 : 9f6a6010
>      [   75.563643] r7 : 00000000  r6 : 81203048  r5 : 9f6a6010  r4 : 9f6a601c
>      [   75.568867] r3 : 00000000  r2 : 00000000  r1 : 0000000d  r0 : 00000000
>      [   75.574094] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
>      [   75.579933] Control: 10c53c7d  Table: 9de20059  DAC: 00000051
>      [   75.584384] Process kworker/u2:1 (pid: 9, stack limit = 0x(ptrval))
>      [   75.589349] Stack: (0x9f54bd78 to 0x9f54c000)
>      [   75.592406] bd60:                                                       00000000 00000000
>      [   75.599295] bd80: 00000391 9f299d10 9f299d68 8014317c 9f7f0000 8120af00 00006dc2 00000000
>      [   75.606186] bda0: 8120af00 00000000 9f54bdec 1c9f5d92 8014317c 9f6a601c 9f6a6010 00000000
>      [   75.613076] bdc0: 00000000 00000000 9dd1141c 8125a0b4 9f54be0c 9f54bde0 804cd8a8 804cd56c
>      [   75.619966] bde0: 0000000e 80143680 00000001 9dce9c1c 81203048 9dce9c10 00000003 00000000
>      [   75.626858] be00: 9f54be5c 9f54be10 806abcac 804cd864 9f54be54 80143664 8014317c 80143054
>      [   75.633748] be20: ffcaa81d 00000000 812030b0 1c9f5d92 00000000 81203048 9f54beb4 00000003
>      [   75.640639] be40: ffffffff 00000000 9dd1141c 8125a0b4 9f54be84 9f54be60 80138e98 806abb18
>      [   75.647529] be60: 81203048 9ddc4000 9dce9c54 9f72a300 00000000 00000000 9f54be9c 9f54be88
>      [   75.654420] be80: 801390bc 80138e50 00000000 9dce9c54 9f54beac 9f54bea0 806a9524 801390a0
>      [   75.661310] bea0: 9f54bedc 9f54beb0 806a9c7c 806a950c 9f54becc 00000000 00000000 00000000
>      [   75.668201] bec0: 9f540000 1c9f5d92 805fe604 9ddffc00 9f54befc 9f54bee0 806ab228 806a9c38
>      [   75.675092] bee0: 806ab178 9ddffc00 9f4c1900 9f40d200 9f54bf34 9f54bf00 80131e30 806ab184
>      [   75.681983] bf00: 9f40d214 9f54a038 9f40d200 9f40d200 9f4c1918 812119a0 9f40d214 9f54a038
>      [   75.688873] bf20: 9f40d200 9f4c1900 9f54bf7c 9f54bf38 80132124 80131d1c 9f5f2dd8 00000000
>      [   75.695764] bf40: 812119a0 9f54a038 812119a0 81259c5b 9f5f2dd8 9f5f2dc0 9f53dbc0 00000000
>      [   75.702655] bf60: 9f4c1900 801320b4 9f5f2dd8 9f4f7e88 9f54bfac 9f54bf80 80137ad0 801320c0
>      [   75.709544] bf80: 9f54a000 9f53dbc0 801379a0 00000000 00000000 00000000 00000000 00000000
>      [   75.716434] bfa0: 00000000 9f54bfb0 801010e8 801379ac 00000000 00000000 00000000 00000000
>      [   75.723324] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>      [   75.730206] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
>      [   75.737083] Backtrace:
>      [   75.738252] [<804cd560>] (mv88e6xxx_port_db_load_purge) from [<804cd8a8>] (mv88e6xxx_port_fdb_add+0x50/0x68)
>      [   75.746795]  r10:8125a0b4 r9:9dd1141c r8:00000000 r7:00000000 r6:00000000 r5:9f6a6010
>      [   75.753323]  r4:9f6a601c
>      [   75.754570] [<804cd858>] (mv88e6xxx_port_fdb_add) from [<806abcac>] (dsa_switch_event+0x1a0/0x660)
>      [   75.762238]  r8:00000000 r7:00000003 r6:9dce9c10 r5:81203048 r4:9dce9c1c
>      [   75.767655] [<806abb0c>] (dsa_switch_event) from [<80138e98>] (notifier_call_chain+0x54/0x94)
>      [   75.774893]  r10:8125a0b4 r9:9dd1141c r8:00000000 r7:ffffffff r6:00000003 r5:9f54beb4
>      [   75.781423]  r4:81203048
>      [   75.782672] [<80138e44>] (notifier_call_chain) from [<801390bc>] (raw_notifier_call_chain+0x28/0x30)
>      [   75.790514]  r9:00000000 r8:00000000 r7:9f72a300 r6:9dce9c54 r5:9ddc4000 r4:81203048
>      [   75.796982] [<80139094>] (raw_notifier_call_chain) from [<806a9524>] (dsa_port_notify+0x24/0x38)
>      [   75.804483] [<806a9500>] (dsa_port_notify) from [<806a9c7c>] (dsa_port_fdb_add+0x50/0x6c)
>      [   75.811371] [<806a9c2c>] (dsa_port_fdb_add) from [<806ab228>] (dsa_slave_switchdev_event_work+0xb0/0x10c)
>      [   75.819635]  r4:9ddffc00
>      [   75.820885] [<806ab178>] (dsa_slave_switchdev_event_work) from [<80131e30>] (process_one_work+0x120/0x3a4)
>      [   75.829241]  r6:9f40d200 r5:9f4c1900 r4:9ddffc00 r3:806ab178
>      [   75.833612] [<80131d10>] (process_one_work) from [<80132124>] (worker_thread+0x70/0x574)
>      [   75.840415]  r10:9f4c1900 r9:9f40d200 r8:9f54a038 r7:9f40d214 r6:812119a0 r5:9f4c1918
>      [   75.846945]  r4:9f40d200
>      [   75.848191] [<801320b4>] (worker_thread) from [<80137ad0>] (kthread+0x130/0x160)
>      [   75.854300]  r10:9f4f7e88 r9:9f5f2dd8 r8:801320b4 r7:9f4c1900 r6:00000000 r5:9f53dbc0
>      [   75.860830]  r4:9f5f2dc0
>      [   75.862076] [<801379a0>] (kthread) from [<801010e8>] (ret_from_fork+0x14/0x2c)
>      [   75.867999] Exception stack(0x9f54bfb0 to 0x9f54bff8)
>      [   75.871753] bfa0:                                     00000000 00000000 00000000 00000000
>      [   75.878640] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>      [   75.885519] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
>      [   75.890844]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:801379a0
>      [   75.897377]  r4:9f53dbc0 r3:9f54a000
>      [   75.899663] Code: e3a02000 e3a03000 e14b26f4 e24bc055 (e5973000)
>      [   75.904575] ---[ end trace fbca818a124dbf0d ]---
> 
> Fixes: 816a3bed9549 ("switchdev: Add fdb.added_by_user to switchdev notifications")
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
> ---
> @petr I expect the same issue with Rocker, but I haven't tested it.
> 
>   net/dsa/slave.c | 12 +++++++-----
>   1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index c287f1ef964c..746ab428a17a 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -1395,6 +1395,9 @@ static void dsa_slave_switchdev_event_work(struct work_struct *work)
>   	switch (switchdev_work->event) {
>   	case SWITCHDEV_FDB_ADD_TO_DEVICE:
>   		fdb_info = &switchdev_work->fdb_info;
> +		if (!fdb_info->added_by_user)
> +			break;
> +
>   		err = dsa_port_fdb_add(dp, fdb_info->addr, fdb_info->vid);
>   		if (err) {
>   			netdev_dbg(dev, "fdb add failed err=%d\n", err);
> @@ -1406,6 +1409,9 @@ static void dsa_slave_switchdev_event_work(struct work_struct *work)
>   
>   	case SWITCHDEV_FDB_DEL_TO_DEVICE:
>   		fdb_info = &switchdev_work->fdb_info;
> +		if (!fdb_info->added_by_user)
> +			break;
> +
>   		err = dsa_port_fdb_del(dp, fdb_info->addr, fdb_info->vid);
>   		if (err) {
>   			netdev_dbg(dev, "fdb del failed err=%d\n", err);
> @@ -1441,7 +1447,6 @@ static int dsa_slave_switchdev_event(struct notifier_block *unused,
>   				     unsigned long event, void *ptr)
>   {
>   	struct net_device *dev = switchdev_notifier_info_to_dev(ptr);
> -	struct switchdev_notifier_fdb_info *fdb_info = ptr;
>   	struct dsa_switchdev_event_work *switchdev_work;
>   
>   	if (!dsa_slave_dev_check(dev))
> @@ -1459,10 +1464,7 @@ static int dsa_slave_switchdev_event(struct notifier_block *unused,
>   	switch (event) {
>   	case SWITCHDEV_FDB_ADD_TO_DEVICE: /* fall through */
>   	case SWITCHDEV_FDB_DEL_TO_DEVICE:
> -		if (!fdb_info->added_by_user)
> -			break;
> -		if (dsa_slave_switchdev_fdb_work_init(switchdev_work,
> -						      fdb_info))
> +		if (dsa_slave_switchdev_fdb_work_init(switchdev_work, ptr))
>   			goto err_fdb_work_init;
>   		dev_hold(dev);
>   		break;
> 

Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>


* [PATCH ethtool] ethtool: fix stack clash in do_get_phy_tunable and do_set_phy_tunable
From: Michal Kubecek @ 2018-05-09 12:01 UTC (permalink / raw)
  To: John W. Linville; +Cc: netdev, Raju Lakkaraju, Allan W. Nielsen

Users reported that a stack clash is detected when using --get-phy-tunable
on ppc64le. The problem is caused by the local variable ds of type struct
ethtool_tunable, whose last member is "void *data[0]". Accessing data[0]
(as do_get_phy_tunable() does) or appending the requested value at the end
(which is what the kernel ioctl does) writes past the space allocated for
the variable.

Make ds part of an anonymous structure to make sure there is enough space
for the tunable value, and drop the (pointless) access to ds.data[0]. The
same problem also exists in do_set_phy_tunable().
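
The wrapping technique can be illustrated outside ethtool.  In this
minimal sketch, struct tunable_hdr is a hypothetical stand-in for struct
ethtool_tunable, and fake_get_tunable() is a hypothetical stand-in for the
ioctl, which stores the value directly after the header.  Like the patch,
it relies on GNU C accepting a zero-length trailing array inside an
enclosing structure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct ethtool_tunable. */
struct tunable_hdr {
	uint32_t cmd;
	uint32_t id;
	uint32_t type_id;
	uint32_t len;
	void *data[0];	/* zero-length trailing array (GNU C extension) */
};

/* Hypothetical stand-in for the ioctl: writes len bytes after the header. */
static int fake_get_tunable(struct tunable_hdr *hdr, uint8_t value)
{
	if (hdr->len != 1)
		return -1;
	memcpy((unsigned char *)hdr + offsetof(struct tunable_hdr, data),
	       &value, hdr->len);
	return 0;
}

static int read_downshift(uint8_t kernel_value)
{
	/* The extra member reserves the byte that the callee writes. */
	struct {
		struct tunable_hdr ds;
		uint8_t count;
	} cont;

	memset(&cont, 0, sizeof(cont));
	cont.ds.len = 1;
	if (fake_get_tunable(&cont.ds, kernel_value) < 0)
		return -1;
	/* The value lands in the reserved byte right after the header. */
	return *((uint8_t *)&cont.ds.data[0]);
}
```

Without the enclosing structure, the byte written after the header would
fall outside the storage of the local variable, which is the overflow
described above.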

Fixes: b0fe96dec90f ("Ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE and PHY downshift")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
---
 ethtool.c | 39 +++++++++++++++++++++------------------
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index 3289e0f6e8ec..2e873848eb4e 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -4740,20 +4740,22 @@ static int do_get_phy_tunable(struct cmd_context *ctx)
 	}
 
 	if (downshift_changed) {
-		struct ethtool_tunable ds;
+		struct {
+			struct ethtool_tunable ds;
+			u8 __count;
+		} cont;
 		u8 count = 0;
 
-		ds.cmd = ETHTOOL_PHY_GTUNABLE;
-		ds.id = ETHTOOL_PHY_DOWNSHIFT;
-		ds.type_id = ETHTOOL_TUNABLE_U8;
-		ds.len = 1;
-		ds.data[0] = &count;
-		err = send_ioctl(ctx, &ds);
+		cont.ds.cmd = ETHTOOL_PHY_GTUNABLE;
+		cont.ds.id = ETHTOOL_PHY_DOWNSHIFT;
+		cont.ds.type_id = ETHTOOL_TUNABLE_U8;
+		cont.ds.len = 1;
+		err = send_ioctl(ctx, &cont.ds);
 		if (err < 0) {
 			perror("Cannot Get PHY downshift count");
 			return 87;
 		}
-		count = *((u8 *)&ds.data[0]);
+		count = *((u8 *)&cont.ds.data[0]);
 		if (count)
 			fprintf(stdout, "Downshift count: %d\n", count);
 		else
@@ -4931,16 +4933,17 @@ static int do_set_phy_tunable(struct cmd_context *ctx)
 
 	/* Do it */
 	if (ds_changed) {
-		struct ethtool_tunable ds;
-		u8 count;
-
-		ds.cmd = ETHTOOL_PHY_STUNABLE;
-		ds.id = ETHTOOL_PHY_DOWNSHIFT;
-		ds.type_id = ETHTOOL_TUNABLE_U8;
-		ds.len = 1;
-		ds.data[0] = &count;
-		*((u8 *)&ds.data[0]) = ds_cnt;
-		err = send_ioctl(ctx, &ds);
+		struct {
+			struct ethtool_tunable ds;
+			u8 __count;
+		} cont;
+
+		cont.ds.cmd = ETHTOOL_PHY_STUNABLE;
+		cont.ds.id = ETHTOOL_PHY_DOWNSHIFT;
+		cont.ds.type_id = ETHTOOL_TUNABLE_U8;
+		cont.ds.len = 1;
+		*((u8 *)&cont.ds.data[0]) = ds_cnt;
+		err = send_ioctl(ctx, &cont.ds);
 		if (err < 0) {
 			perror("Cannot Set PHY downshift count");
 			err = 87;
-- 
2.16.3


* Re: [PATCH 09/18] net: mac80211.h: fix a bad comment line
From: Mauro Carvalho Chehab @ 2018-05-09 12:04 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Kalle Valo, Linux Doc Mailing List, Mauro Carvalho Chehab,
	linux-kernel, Jonathan Corbet, David S. Miller, linux-wireless,
	netdev
In-Reply-To: <1525696706.6049.7.camel@sipsolutions.net>

Em Mon, 07 May 2018 14:38:26 +0200
Johannes Berg <johannes@sipsolutions.net> escreveu:

> On Mon, 2018-05-07 at 15:37 +0300, Kalle Valo wrote:
> > Mauro Carvalho Chehab <mchehab+samsung@kernel.org> writes:
> >   
> > > Sphinx produces a lot of errors like this:
> > > 	./include/net/mac80211.h:2083: warning: bad line:  >
> > > 
> > > Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>  
> > 
> > Randy already submitted a similar patch:
> > 
> > https://patchwork.kernel.org/patch/10367275/
> > 
> > But it seems Johannes has not applied that yet.  
> 
> Yeah, I've been super busy preparing for the plugfest.
> 
> I'll make a pass over all the patches as soon as I can, hopefully today
> or tomorrow.

Thanks. I'll drop it from my patchset, assuming that you'll
be applying Randy's version or mine via your tree.

Thanks,
Mauro


* Re: [PATCH 09/18] net: mac80211.h: fix a bad comment line
From: Johannes Berg @ 2018-05-09 12:04 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Kalle Valo, Linux Doc Mailing List, Mauro Carvalho Chehab,
	linux-kernel, Jonathan Corbet, David S. Miller, linux-wireless,
	netdev
In-Reply-To: <20180509090400.0156c0c8@vento.lan>

On Wed, 2018-05-09 at 09:04 -0300, Mauro Carvalho Chehab wrote:
> Em Mon, 07 May 2018 14:38:26 +0200
> Johannes Berg <johannes@sipsolutions.net> escreveu:
> 
> > On Mon, 2018-05-07 at 15:37 +0300, Kalle Valo wrote:
> > > Mauro Carvalho Chehab <mchehab+samsung@kernel.org> writes:
> > >   
> > > > Sphinx produces a lot of errors like this:
> > > > 	./include/net/mac80211.h:2083: warning: bad line:  >
> > > > 
> > > > Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>  
> > > 
> > > Randy already submitted a similar patch:
> > > 
> > > https://patchwork.kernel.org/patch/10367275/
> > > 
> > > But it seems Johannes has not applied that yet.  
> > 
> > Yeah, I've been super busy preparing for the plugfest.
> > 
> > I'll make a pass over all the patches as soon as I can, hopefully today
> > or tomorrow.
> 
> Thanks. I'll drop it from my patchset, assuming that you'll
> be applying Randy's version or mine via your tree.

Right, I did, just need to send a pull request.

johannes

^ permalink raw reply

* [PATCH] net: phy: DP83TC811: Introduce support for the DP83TC811 phy
From: Dan Murphy @ 2018-05-09 12:05 UTC (permalink / raw)
  To: andrew, f.fainelli; +Cc: netdev, linux-kernel, Dan Murphy

Add support for the DP83TC811 phy.

The DP83TC811 supports both RGMII and SGMII interfaces.
There are two part numbers: the DP83TC811R does not reliably
support the SGMII interface, but the DP83TC811S does.

The two parts cannot be told apart from the hardware or the
register set, so the required phy mode is indicated via the DT.
Alternatively, the part can be strapped to a certain interface.

Data sheets can be found here:
http://www.ti.com/product/DP83TC811S-Q1/description
http://www.ti.com/product/DP83TC811R-Q1/description

Signed-off-by: Dan Murphy <dmurphy@ti.com>
---
 drivers/net/phy/Kconfig     |   5 +
 drivers/net/phy/Makefile    |   1 +
 drivers/net/phy/dp83tc811.c | 350 ++++++++++++++++++++++++++++++++++++
 3 files changed, 356 insertions(+)
 create mode 100644 drivers/net/phy/dp83tc811.c

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index bdfbabb86ee0..810140a9e114 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -285,6 +285,11 @@ config DP83822_PHY
 	---help---
 	  Supports the DP83822 PHY.
 
+config DP83TC811_PHY
+	tristate "Texas Instruments DP83TC811 PHY"
+	---help---
+	  Supports the DP83TC811 PHY.
+
 config DP83848_PHY
 	tristate "Texas Instruments DP83848 PHY"
 	---help---
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 01acbcb2c798..00445b61a9a8 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -57,6 +57,7 @@ obj-$(CONFIG_CORTINA_PHY)	+= cortina.o
 obj-$(CONFIG_DAVICOM_PHY)	+= davicom.o
 obj-$(CONFIG_DP83640_PHY)	+= dp83640.o
 obj-$(CONFIG_DP83822_PHY)	+= dp83822.o
+obj-$(CONFIG_DP83TC811_PHY)	+= dp83tc811.o
 obj-$(CONFIG_DP83848_PHY)	+= dp83848.o
 obj-$(CONFIG_DP83867_PHY)	+= dp83867.o
 obj-$(CONFIG_FIXED_PHY)		+= fixed_phy.o
diff --git a/drivers/net/phy/dp83tc811.c b/drivers/net/phy/dp83tc811.c
new file mode 100644
index 000000000000..01cb0e246449
--- /dev/null
+++ b/drivers/net/phy/dp83tc811.c
@@ -0,0 +1,350 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for the Texas Instruments DP83TC811 PHY
+ *
+ * Copyright (C) 2018 Texas Instruments Incorporated - http://www.ti.com/
+ *
+ */
+
+#include <linux/ethtool.h>
+#include <linux/etherdevice.h>
+#include <linux/kernel.h>
+#include <linux/mii.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/phy.h>
+#include <linux/netdevice.h>
+
+#define DP83TC811_PHY_ID	0x2000a253
+#define DP83811_DEVADDR		0x1f
+
+#define MII_DP83811_SGMII_CTRL	0x09
+#define MII_DP83811_INT_STAT1	0x12
+#define MII_DP83811_INT_STAT2	0x13
+#define MII_DP83811_RESET_CTRL	0x1f
+
+#define DP83811_HW_RESET	BIT(15)
+#define DP83811_SW_RESET	BIT(14)
+
+/* INT_STAT1 bits */
+#define DP83811_RX_ERR_HF_INT_EN	BIT(0)
+#define DP83811_MS_TRAINING_INT_EN	BIT(1)
+#define DP83811_ANEG_COMPLETE_INT_EN	BIT(2)
+#define DP83811_ESD_EVENT_INT_EN	BIT(3)
+#define DP83811_WOL_INT_EN		BIT(4)
+#define DP83811_LINK_STAT_INT_EN	BIT(5)
+#define DP83811_ENERGY_DET_INT_EN	BIT(6)
+#define DP83811_LINK_QUAL_INT_EN	BIT(7)
+
+/* INT_STAT2 bits */
+#define DP83811_JABBER_DET_INT_EN	BIT(0)
+#define DP83811_POLARITY_INT_EN		BIT(1)
+#define DP83811_SLEEP_MODE_INT_EN	BIT(2)
+#define DP83811_OVERTEMP_INT_EN		BIT(3)
+#define DP83811_OVERVOLTAGE_INT_EN	BIT(6)
+#define DP83811_UNDERVOLTAGE_INT_EN	BIT(7)
+
+#define MII_DP83811_RXSOP1	0x04a5
+#define MII_DP83811_RXSOP2	0x04a6
+#define MII_DP83811_RXSOP3	0x04a7
+
+/* WoL Registers */
+#define MII_DP83811_WOL_CFG	0x04a0
+#define MII_DP83811_WOL_STAT	0x04a1
+#define MII_DP83811_WOL_DA1	0x04a2
+#define MII_DP83811_WOL_DA2	0x04a3
+#define MII_DP83811_WOL_DA3	0x04a4
+
+/* WoL bits */
+#define DP83811_WOL_MAGIC_EN	BIT(0)
+#define DP83811_WOL_SECURE_ON	BIT(5)
+#define DP83811_WOL_EN		BIT(7)
+#define DP83811_WOL_INDICATION_SEL BIT(8)
+#define DP83811_WOL_CLR_INDICATION BIT(11)
+
+/* SGMII CTRL bits */
+#define DP83811_TDR_AUTO		BIT(8)
+#define DP83811_SGMII_EN		BIT(12)
+#define DP83811_SGMII_AUTO_NEG_EN	BIT(13)
+#define DP83811_SGMII_TX_ERR_DIS	BIT(14)
+#define DP83811_SGMII_SOFT_RESET	BIT(15)
+
+static int dp83811_ack_interrupt(struct phy_device *phydev)
+{
+	int err;
+
+	err = phy_read(phydev, MII_DP83811_INT_STAT1);
+	if (err < 0)
+		return err;
+
+	err = phy_read(phydev, MII_DP83811_INT_STAT2);
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+
+static int dp83811_set_wol(struct phy_device *phydev,
+			   struct ethtool_wolinfo *wol)
+{
+	struct net_device *ndev = phydev->attached_dev;
+	u16 value;
+	const u8 *mac;
+
+	if (wol->wolopts & (WAKE_MAGIC | WAKE_MAGICSECURE)) {
+		mac = (const u8 *)ndev->dev_addr;
+
+		if (!is_valid_ether_addr(mac))
+			return -EINVAL;
+
+		/* MAC addresses start with byte 5, but stored in mac[0].
+		 * 811 PHYs store bytes 4|5, 2|3, 0|1
+		 */
+		phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_DA1,
+			      (mac[1] << 8) | mac[0]);
+		phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_DA2,
+			      (mac[3] << 8) | mac[2]);
+		phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_DA3,
+			      (mac[5] << 8) | mac[4]);
+
+		value = phy_read_mmd(phydev, DP83811_DEVADDR,
+				     MII_DP83811_WOL_CFG);
+		if (wol->wolopts & WAKE_MAGIC)
+			value |= DP83811_WOL_MAGIC_EN;
+		else
+			value &= ~DP83811_WOL_MAGIC_EN;
+
+		if (wol->wolopts & WAKE_MAGICSECURE) {
+			phy_write_mmd(phydev, DP83811_DEVADDR,
+				      MII_DP83811_RXSOP1,
+				      (wol->sopass[1] << 8) | wol->sopass[0]);
+			phy_write_mmd(phydev, DP83811_DEVADDR,
+				      MII_DP83811_RXSOP2,
+				      (wol->sopass[3] << 8) | wol->sopass[2]);
+			phy_write_mmd(phydev, DP83811_DEVADDR,
+				      MII_DP83811_RXSOP3,
+				      (wol->sopass[5] << 8) | wol->sopass[4]);
+			value |= DP83811_WOL_SECURE_ON;
+		} else {
+			value &= ~DP83811_WOL_SECURE_ON;
+		}
+
+		value |= (DP83811_WOL_EN | DP83811_WOL_INDICATION_SEL |
+			  DP83811_WOL_CLR_INDICATION);
+		phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG,
+			      value);
+	} else {
+		value = phy_read_mmd(phydev, DP83811_DEVADDR,
+				     MII_DP83811_WOL_CFG);
+		value &= ~DP83811_WOL_EN;
+		phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG,
+			      value);
+	}
+
+	return 0;
+}
+
+static void dp83811_get_wol(struct phy_device *phydev,
+			    struct ethtool_wolinfo *wol)
+{
+	int value;
+	u16 sopass_val;
+
+	wol->supported = (WAKE_MAGIC | WAKE_MAGICSECURE);
+	wol->wolopts = 0;
+
+	value = phy_read_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG);
+
+	if (value & DP83811_WOL_MAGIC_EN)
+		wol->wolopts |= WAKE_MAGIC;
+
+	if (value & DP83811_WOL_SECURE_ON) {
+		sopass_val = phy_read_mmd(phydev, DP83811_DEVADDR,
+					  MII_DP83811_RXSOP1);
+		wol->sopass[0] = (sopass_val & 0xff);
+		wol->sopass[1] = (sopass_val >> 8);
+
+		sopass_val = phy_read_mmd(phydev, DP83811_DEVADDR,
+					  MII_DP83811_RXSOP2);
+		wol->sopass[2] = (sopass_val & 0xff);
+		wol->sopass[3] = (sopass_val >> 8);
+
+		sopass_val = phy_read_mmd(phydev, DP83811_DEVADDR,
+					  MII_DP83811_RXSOP3);
+		wol->sopass[4] = (sopass_val & 0xff);
+		wol->sopass[5] = (sopass_val >> 8);
+
+		wol->wolopts |= WAKE_MAGICSECURE;
+	}
+
+	/* WoL is not enabled so set wolopts to 0 */
+	if (!(value & DP83811_WOL_EN))
+		wol->wolopts = 0;
+}
+
+static int dp83811_config_intr(struct phy_device *phydev)
+{
+	int misr_status;
+	int err;
+
+	if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
+		misr_status = phy_read(phydev, MII_DP83811_INT_STAT1);
+		if (misr_status < 0)
+			return misr_status;
+
+		misr_status |= (DP83811_RX_ERR_HF_INT_EN |
+				DP83811_MS_TRAINING_INT_EN |
+				DP83811_ANEG_COMPLETE_INT_EN |
+				DP83811_ESD_EVENT_INT_EN |
+				DP83811_WOL_INT_EN |
+				DP83811_LINK_STAT_INT_EN |
+				DP83811_ENERGY_DET_INT_EN |
+				DP83811_LINK_QUAL_INT_EN);
+
+		err = phy_write(phydev, MII_DP83811_INT_STAT1, misr_status);
+		if (err < 0)
+			return err;
+
+		misr_status = phy_read(phydev, MII_DP83811_INT_STAT2);
+		if (misr_status < 0)
+			return misr_status;
+
+		misr_status |= (DP83811_JABBER_DET_INT_EN |
+				DP83811_POLARITY_INT_EN |
+				DP83811_SLEEP_MODE_INT_EN |
+				DP83811_OVERTEMP_INT_EN |
+				DP83811_OVERVOLTAGE_INT_EN |
+				DP83811_UNDERVOLTAGE_INT_EN);
+
+		err = phy_write(phydev, MII_DP83811_INT_STAT2, misr_status);
+
+	} else {
+		err = phy_write(phydev, MII_DP83811_INT_STAT1, 0);
+		if (err < 0)
+			return err;
+
+		err = phy_write(phydev, MII_DP83811_INT_STAT2, 0);
+	}
+
+	return err;
+}
+
+static int dp83811_config_aneg(struct phy_device *phydev)
+{
+	int err;
+	int value;
+
+	value = phy_read(phydev, MII_DP83811_SGMII_CTRL);
+	if (phydev->autoneg == AUTONEG_ENABLE) {
+		err = phy_write(phydev, MII_DP83811_SGMII_CTRL,
+				(DP83811_SGMII_AUTO_NEG_EN | value));
+		if (err < 0)
+			return err;
+	} else {
+		err = phy_write(phydev, MII_DP83811_SGMII_CTRL,
+				(~DP83811_SGMII_AUTO_NEG_EN & value));
+		if (err < 0)
+			return err;
+	}
+
+	return genphy_config_aneg(phydev);
+}
+
+static int dp83811_config_init(struct phy_device *phydev)
+{
+	int err;
+	int value;
+
+	err = genphy_config_init(phydev);
+	if (err < 0)
+		return err;
+
+	if (phydev->interface == PHY_INTERFACE_MODE_SGMII) {
+		value = phy_read(phydev, MII_DP83811_SGMII_CTRL);
+		if (!(value & DP83811_SGMII_EN)) {
+			err = phy_write(phydev, MII_DP83811_SGMII_CTRL,
+					(DP83811_SGMII_EN | value));
+			if (err < 0)
+				return err;
+		} else {
+			err = phy_write(phydev, MII_DP83811_SGMII_CTRL,
+				(~DP83811_SGMII_EN & value));
+			if (err < 0)
+				return err;
+		}
+	}
+
+	value = DP83811_WOL_MAGIC_EN | DP83811_WOL_SECURE_ON | DP83811_WOL_EN;
+
+	return phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG,
+	      value);
+}
+
+static int dp83811_phy_reset(struct phy_device *phydev)
+{
+	int err;
+
+	err = phy_write(phydev, MII_DP83811_RESET_CTRL, DP83811_HW_RESET);
+	if (err < 0)
+		return err;
+
+	dp83811_config_init(phydev);
+
+	return 0;
+}
+
+static int dp83811_suspend(struct phy_device *phydev)
+{
+	int value;
+
+	value = phy_read_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG);
+
+	if (!(value & DP83811_WOL_EN))
+		genphy_suspend(phydev);
+
+	return 0;
+}
+
+static int dp83811_resume(struct phy_device *phydev)
+{
+	int value;
+
+	genphy_resume(phydev);
+
+	value = phy_read_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG);
+
+	phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG, value |
+		      DP83811_WOL_CLR_INDICATION);
+
+	return 0;
+}
+
+static struct phy_driver dp83811_driver[] = {
+	{
+		.phy_id = DP83TC811_PHY_ID,
+		.phy_id_mask = 0xfffffff0,
+		.name = "TI DP83TC811",
+		.features = PHY_BASIC_FEATURES,
+		.flags = PHY_HAS_INTERRUPT,
+		.config_init = genphy_config_init,
+		.config_aneg = dp83811_config_aneg,
+		.soft_reset = dp83811_phy_reset,
+		.get_wol = dp83811_get_wol,
+		.set_wol = dp83811_set_wol,
+		.ack_interrupt = dp83811_ack_interrupt,
+		.config_intr = dp83811_config_intr,
+		.suspend = dp83811_suspend,
+		.resume = dp83811_resume,
+	 },
+};
+module_phy_driver(dp83811_driver);
+
+static struct mdio_device_id __maybe_unused dp83811_tbl[] = {
+	{ DP83TC811_PHY_ID, 0xfffffff0 },
+	{ },
+};
+MODULE_DEVICE_TABLE(mdio, dp83811_tbl);
+
+MODULE_DESCRIPTION("Texas Instruments DP83TC811 PHY driver");
+MODULE_AUTHOR("Dan Murphy <dmurphy@ti.com>");
+MODULE_LICENSE("GPL");
-- 
2.17.0.582.gccdcbd54c
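
For readers following the WoL register layout in dp83811_set_wol() and
dp83811_get_wol() above, here is a minimal user-space sketch of the byte
packing. The helper names wol_pack_mac() and wol_unpack() are hypothetical
and not part of the patch; the sketch only assumes what the patch shows,
namely that each 16-bit register takes two consecutive bytes with the
lower-indexed byte in the low half.

```c
#include <stdint.h>

/* Hypothetical helper, not in the patch: pack a 6-byte MAC address into
 * the three 16-bit values the patch writes to WOL_DA1..WOL_DA3.
 * regs[i] holds mac[2*i] in bits 7:0 and mac[2*i + 1] in bits 15:8.
 */
static void wol_pack_mac(const uint8_t mac[6], uint16_t regs[3])
{
	for (int i = 0; i < 3; i++)
		regs[i] = (uint16_t)((mac[2 * i + 1] << 8) | mac[2 * i]);
}

/* Hypothetical inverse, mirroring how dp83811_get_wol() splits each
 * RXSOP register value back into two bytes.
 */
static void wol_unpack(const uint16_t regs[3], uint8_t out[6])
{
	for (int i = 0; i < 3; i++) {
		out[2 * i] = regs[i] & 0xff;
		out[2 * i + 1] = regs[i] >> 8;
	}
}
```

With a MAC of 00:11:22:33:44:55 this yields 0x1100, 0x3322 and 0x5544 for
the three registers, and unpacking returns the original six bytes.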

^ permalink raw reply related

* Re: [PATCH ghak81 RFC V1 4/5] audit: use inline function to set audit context
From: Richard Guy Briggs @ 2018-05-09 12:09 UTC (permalink / raw)
  To: Tobin C. Harding
  Cc: Linux-Audit Mailing List, LKML,
	Linux NetDev Upstream Mailing List, Netfilter Devel List,
	Linux Security Module list, Integrity Measurement Architecture,
	SElinux list, Eric Paris, Paul Moore, Steve Grubb, Ingo Molnar,
	David Howells
In-Reply-To: <20180509020700.GE7517@eros>

On 2018-05-09 12:07, Tobin C. Harding wrote:
> On Fri, May 04, 2018 at 04:54:37PM -0400, Richard Guy Briggs wrote:
> > Recognizing that the audit context is an internal audit value, use an
> > access function to set the audit context pointer for the task
> > rather than reaching directly into the task struct to set it.
> > 
> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > ---
> >  include/linux/audit.h | 8 ++++++++
> >  kernel/auditsc.c      | 6 +++---
> >  kernel/fork.c         | 2 +-
> >  3 files changed, 12 insertions(+), 4 deletions(-)
> > 
> > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > index 93e4c61..dba0d45 100644
> > --- a/include/linux/audit.h
> > +++ b/include/linux/audit.h
> > @@ -235,6 +235,10 @@ extern void __audit_inode_child(struct inode *parent,
> >  extern void __audit_seccomp(unsigned long syscall, long signr, int code);
> >  extern void __audit_ptrace(struct task_struct *t);
> >  
> > +static inline void audit_set_context(struct task_struct *task, struct audit_context *ctx)
> > +{
> > +	task->audit_context = ctx;
> > +}
> >  static inline struct audit_context *audit_context(struct task_struct *task)
> >  {
> >  	return task->audit_context;
> > @@ -472,6 +476,10 @@ static inline bool audit_dummy_context(void)
> >  {
> >  	return true;
> >  }
> > +static inline void audit_set_context(struct task_struct *task, struct audit_context *ctx)
> > +{
> > +	task->audit_context = ctx;
> > +}
> 
> If audit_context is an internal audit value why do we set it when
> CONFIG_AUDITSYSCALL is not set?

Agreed, that is unnecessary, but harmless since it won't be called, or
will be called with a value of NULL.  That has been fixed in my dynamic
allocation patchset since not even the audit_task_info struct is
available to assign the value.  It is now an empty function like the
rest.

> Tobin.

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* Re: [PATCH v15 net-next 00/12] Chelsio Inline TLS
From: Atul Gupta @ 2018-05-09 12:15 UTC (permalink / raw)
  To: Boris Pismenny, David Miller
  Cc: herbert, davejwatson, sd, sbrivio, linux-crypto, netdev, werner,
	leedom, swise, indranil, ganeshgr
In-Reply-To: <ee9bfb6a-887b-a0fb-8880-dc709a217b50@mellanox.com>



On 4/1/2018 6:27 PM, Boris Pismenny wrote:
> Hi,
>
> On 4/1/2018 6:37 AM, David Miller wrote:
>> From: Atul Gupta <atul.gupta@chelsio.com>
>> Date: Sat, 31 Mar 2018 21:41:51 +0530
>>
>>> Series for Chelsio Inline TLS driver (chtls)
>>
>> Series applied, thank you.
>>
>
> Sorry for being late to the party, could you please help answer a few questions to help me understand better.
I am going over these points and will address some of them in follow-up patches:
>
> 1. What happens if someone attempts to set a TCP socket option for a socket whose TCP stack resides in the TCP offload engine(TOE)? Do you emulate all Linux socket options? What about IP socket options?
HW-offloaded options are handled, while the rest shall be redirected to the host.
>
> If I follow the code correctly, then the original TCP/IP setsockopt is called. But, it doesn't change any of the parameters of the TCP/IP offload engine in hardware.
>
> 2. I can't find where is the TLS record sequence number pushed to hardware. Is that on purpose?
The sequence number is pushed to HW in cpl_tx_tls_sfo->scmd1.
>
> FYI, ignoring this parameter might cause a record sequence number reuse which breaks the integrity of the AES-GCM TLS ciphersuite.
>
> 3. How does a TOE handle Tx only or Rx only?
The driver does not differentiate or isolate the Tx and Rx paths for inline processing.
>
> 4. What happens when there is a routing change that redirects traffic to a different netdev? Is there a software fallback?
The case we have in mind is handling a next-hop change; is there any other case?
>
> 5. The TLS socket option is set in the middle of a TCP connection. What happens to the existing TCP connection and the data that is currently queued in the TCP write queue?
I believe this behaves the same as SW. If by TLS options you mean re-keying, then outstanding data on Tx is flushed before the new key takes effect. For Rx, the user should be careful, or it will result in a MAC error.

Thanks
Atul
>
> Thanks,
> Boris.
>

^ permalink raw reply

* [PATCH] netfilter: nf_tables: fix memory leak on error exit return
From: Colin King @ 2018-05-09 12:22 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
	David S . Miller, netfilter-devel, coreteam, netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Currently the -EBUSY error return path is not free'ing resources
allocated earlier, leaving a memory leak. Fix this by exiting via the
error exit label err5 that performs the necessary resource clean
up.

Detected by CoverityScan, CID#1432975 ("Resource leak")

Fixes: 9744a6fcefcb ("netfilter: nf_tables: check if same extensions are set when adding elements")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 net/netfilter/nf_tables_api.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 6422eba367cf..a3d77aa0f752 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -4098,8 +4098,10 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 			if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) ^
 			    nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) ||
 			    nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) ^
-			    nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF))
-				return -EBUSY;
+			    nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF)) {
+				err = -EBUSY;
+				goto err5;
+			}
 			if ((nft_set_ext_exists(ext, NFT_SET_EXT_DATA) &&
 			     nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) &&
 			     memcmp(nft_set_ext_data(ext),
-- 
2.17.0
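
As an aside for readers, the error-path pattern this fix relies on can be
sketched in user space as below. Everything here is hypothetical
illustration: add_elem(), demo_alloc(), demo_free() and the live_allocs
counter are stand-ins, not nf_tables code. The point is only that an early
"return -EBUSY" after an allocation leaks, while setting err and jumping to
a cleanup label does not.

```c
#include <errno.h>
#include <stdlib.h>

static int live_allocs;	/* outstanding demo allocations */

static void *demo_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

/* Hypothetical stand-in for a function that allocates first and
 * validates afterwards, as in the patched code path.
 */
static int add_elem(int extensions_mismatch)
{
	void *elem;
	int err;

	elem = demo_alloc(64);
	if (!elem)
		return -ENOMEM;

	if (extensions_mismatch) {
		err = -EBUSY;
		goto err_free;	/* not "return -EBUSY": that would leak elem */
	}

	/* On success the real code hands the element to the set;
	 * the demo simply releases it.
	 */
	demo_free(elem);
	return 0;

err_free:
	demo_free(elem);
	return err;
}
```

Calling add_elem(1) returns -EBUSY and leaves live_allocs at zero, which is
the property the patch restores for the real code.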

^ permalink raw reply related

* Re: [PATCH] netfilter: nf_tables: fix memory leak on error exit return
From: Pablo Neira Ayuso @ 2018-05-09 12:32 UTC (permalink / raw)
  To: Colin King
  Cc: Jozsef Kadlecsik, Florian Westphal, David S . Miller,
	netfilter-devel, coreteam, netdev, kernel-janitors, linux-kernel
In-Reply-To: <20180509122256.16859-1-colin.king@canonical.com>

On Wed, May 09, 2018 at 01:22:56PM +0100, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> Currently the -EBUSY error return path is not free'ing resources
> allocated earlier, leaving a memory leak. Fix this by exiting via the
> error exit label err5 that performs the necessary resource clean
> up.
> 
> Detected by CoverityScan, CID#1432975 ("Resource leak")

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] selinux: add AF_UNSPEC and INADDR_ANY checks to selinux_socket_bind()
From: Stephen Smalley @ 2018-05-09 12:37 UTC (permalink / raw)
  To: Paul Moore
  Cc: Alexey Kodanev, netdev,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	selinux-+05T5uksL2qpZYMLLGbcSA
In-Reply-To: <CAHC9VhT1+-ch1Ncv5YCNgu7tPnUj1Qx8S=a=q=Fn=Dwx4SnTKg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 05/08/2018 08:25 PM, Paul Moore wrote:
> On Tue, May 8, 2018 at 2:40 PM, Stephen Smalley <sds-+05T5uksL2qpZYMLLGbcSA@public.gmane.org> wrote:
>> On 05/08/2018 01:05 PM, Paul Moore wrote:
>>> On Tue, May 8, 2018 at 10:05 AM, Alexey Kodanev
>>> <alexey.kodanev-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>>>> Commit d452930fd3b9 ("selinux: Add SCTP support") breaks compatibility
>>>> with the old programs that can pass sockaddr_in with AF_UNSPEC and
>>>> INADDR_ANY to bind(). As a result, bind() returns EAFNOSUPPORT error.
>>>> It was found with LTP/asapi_01 test.
>>>>
>>>> Similar to commit 29c486df6a20 ("net: ipv4: relax AF_INET check in
>>>> bind()"), which relaxed AF_INET check for compatibility, add AF_UNSPEC
>>>> case to AF_INET and make sure that the address is INADDR_ANY.
>>>>
>>>> Also, in the end of selinux_socket_bind(), instead of adding AF_UNSPEC
>>>> to 'address->sa_family == AF_INET', verify AF_INET6 first.
>>>>
>>>> Fixes: d452930fd3b9 ("selinux: Add SCTP support")
>>>> Signed-off-by: Alexey Kodanev <alexey.kodanev-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>>> ---
>>>>  security/selinux/hooks.c | 12 +++++++++---
>>>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>>
>>> Thanks for finding and reporting this regression.
>>>
>>> I think I would prefer to avoid having to duplicate the
>>> AF_UNSPEC/INADDR_ANY checking logic in the SELinux hook, even though
>>> it is a small bit of code and unlikely to change.  I'm wondering if it
>>> would be better to check both the socket and sockaddr address family
>>> in the main if conditional inside selinux_socket_bind(), what do you
>>> think?  Another option would be to go back to just checking the
>>> sockaddr address family; we moved away from that for a reason which
>>> escapes me at the moment (code cleanliness?), but perhaps that was a
>>> mistake.
>>
>> We've always used the sk->sk_family there; it was only the recent code from Richard that started
>> using the socket address family.
> 
> Yes I know, I thought I was the one that suggested it at some point
> (I'll take the blame) ... although now that I'm looking at the git
> log, maybe I'm confusing it with something else.
> 
>>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>>> index 4cafe6a19167..a3789b167667 100644
>>> --- a/security/selinux/hooks.c
>>> +++ b/security/selinux/hooks.c
>>> @@ -4577,6 +4577,7 @@ static int selinux_socket_bind(struct socket *sock, struc>
>>> {
>>>        struct sock *sk = sock->sk;
>>>        u16 family;
>>> +       u16 family_sa;
>>>        int err;
>>>
>>>        err = sock_has_perm(sk, SOCKET__BIND);
>>> @@ -4585,7 +4586,9 @@ static int selinux_socket_bind(struct socket *sock, struc>
>>>
>>>        /* If PF_INET or PF_INET6, check name_bind permission for the port. */
>>>        family = sk->sk_family;
>>> -       if (family == PF_INET || family == PF_INET6) {
>>> +       family_sa = address->sa_family;
>>> +       if ((family == PF_INET || family == PF_INET6) &&
>>> +           (family_sa == PF_INET || family_sa == PF_INET6)) {
>>
>> Wouldn't this allow bypassing the name_bind permission check by passing in AF_UNSPEC?
> 
> I believe the name_bind permission check is skipped for AF_UNSPEC
> already, isn't it?  The only way the name_bind check would be
> triggered is if the source port, snum, was non-zero and I didn't think
> that was really legal for AF_UNSPEC/INADDR_ANY, is it?

1) What in inet_bind() prevents that from occurring?
2) Regardless, what about the node_bind check?

> 
>>>                char *addrp;
>>>                struct sk_security_struct *sksec = sk->sk_security;
>>>                struct common_audit_data ad;
>>> @@ -4601,7 +4604,7 @@ static int selinux_socket_bind(struct socket *sock, struc>
>>>                 * need to check address->sa_family as it is possible to have
>>>                 * sk->sk_family = PF_INET6 with addr->sa_family = AF_INET.
>>>                 */
>>> -               switch (address->sa_family) {
>>> +               switch (family_sa) {
>>>                case AF_INET:
>>>                        if (addrlen < sizeof(struct sockaddr_in))
>>>                                return -EINVAL;
>>>
>>>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>>>> index 4cafe6a..649a3be 100644
>>>> --- a/security/selinux/hooks.c
>>>> +++ b/security/selinux/hooks.c
>>>> @@ -4602,10 +4602,16 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
>>>>                  * sk->sk_family = PF_INET6 with addr->sa_family = AF_INET.
>>>>                  */
>>>>                 switch (address->sa_family) {
>>>> +               case AF_UNSPEC:
>>>>                 case AF_INET:
>>>>                         if (addrlen < sizeof(struct sockaddr_in))
>>>>                                 return -EINVAL;
>>>>                         addr4 = (struct sockaddr_in *)address;
>>>> +
>>>> +                       if (address->sa_family == AF_UNSPEC &&
>>>> +                           addr4->sin_addr.s_addr != htonl(INADDR_ANY))
>>>> +                               return -EAFNOSUPPORT;
>>>> +
>>>>                         snum = ntohs(addr4->sin_port);
>>>>                         addrp = (char *)&addr4->sin_addr.s_addr;
>>>>                         break;
>>>> @@ -4681,10 +4687,10 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
>>>>                 ad.u.net->sport = htons(snum);
>>>>                 ad.u.net->family = family;
>>>>
>>>> -               if (address->sa_family == AF_INET)
>>>> -                       ad.u.net->v4info.saddr = addr4->sin_addr.s_addr;
>>>> -               else
>>>> +               if (address->sa_family == AF_INET6)
>>>>                         ad.u.net->v6info.saddr = addr6->sin6_addr;
>>>> +               else
>>>> +                       ad.u.net->v4info.saddr = addr4->sin_addr.s_addr;
>>>>
>>>>                 err = avc_has_perm(&selinux_state,
>>>>                                    sksec->sid, sid,
>>>> --
>>>> 1.8.3.1
>>>>
>>>
>>
> 
> 
> 
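
For reference, the AF_UNSPEC handling from Alexey's quoted patch can be
sketched in user space as follows. This is only an illustration under
stated assumptions: check_bind_family() is a hypothetical helper, it covers
just the IPv4-style branch (the real hook also handles AF_INET6), and it
performs none of the name_bind/node_bind permission checks discussed above.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: accept AF_INET, and accept AF_UNSPEC only when
 * the address is INADDR_ANY, as in the quoted switch statement.
 * Returns 0 if acceptable, a negative errno value otherwise.
 */
static int check_bind_family(const struct sockaddr *address, size_t addrlen)
{
	const struct sockaddr_in *addr4;

	switch (address->sa_family) {
	case AF_UNSPEC:
	case AF_INET:
		if (addrlen < sizeof(struct sockaddr_in))
			return -EINVAL;
		addr4 = (const struct sockaddr_in *)address;

		if (address->sa_family == AF_UNSPEC &&
		    addr4->sin_addr.s_addr != htonl(INADDR_ANY))
			return -EAFNOSUPPORT;
		return 0;
	default:
		/* AF_INET6 and others are outside this sketch */
		return -EAFNOSUPPORT;
	}
}
```

An AF_UNSPEC sockaddr_in with INADDR_ANY passes, the same structure with a
loopback address is rejected with -EAFNOSUPPORT, and a too-short addrlen
gives -EINVAL.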

^ permalink raw reply

* [PATCH] can: hi311x: Acquire SPI lock on ->do_get_berr_counter
From: Lukas Wunner @ 2018-05-09 12:38 UTC (permalink / raw)
  To: Marc Kleine-Budde, Wolfgang Grandegger, linux-can, netdev
  Cc: Mathias Duckeck, Akshay Bhat, Casey Fitzpatrick, Stef Walter,
	Karel Zak

hi3110_get_berr_counter() may run concurrently to the rest of the driver
but neglects to acquire the lock protecting access to the SPI device.
As a result, it and the rest of the driver may clobber each other's tx
and rx buffers.

We became aware of this issue because transmission of packets with
"cangen -g 0 -i -x" frequently hung.  It turns out that agetty executes
->do_get_berr_counter every few seconds via the following call stack:

    CPU: 2 PID: 1605 Comm: agetty
    [<7f3f7500>] (hi3110_get_berr_counter [hi311x])
    [<7f130204>] (can_fill_info [can_dev])
    [<80693bc0>] (rtnl_fill_ifinfo)
    [<806949ec>] (rtnl_dump_ifinfo)
    [<806b4834>] (netlink_dump)
    [<806b4bc8>] (netlink_recvmsg)
    [<8065f180>] (sock_recvmsg)
    [<80660f90>] (___sys_recvmsg)
    [<80661e7c>] (__sys_recvmsg)
    [<80661ec0>] (SyS_recvmsg)
    [<80108b20>] (ret_fast_syscall+0x0/0x1c)

agetty listens to netlink messages in order to update the login prompt
when IP addresses change (if /etc/issue contains \4 or \6 escape codes):
https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/commit/?id=e36deb6424e8

It's a useful feature, though it seems questionable that it causes CAN
bit error statistics to be queried.

Be that as it may, if hi3110_get_berr_counter() is invoked while a frame
is sent by hi3110_hw_tx(), bogus SPI transfers like the following may
occur:

    => 12 00             (hi3110_get_berr_counter() wanted to transmit
                          EC 00 to query the transmit error counter,
                          but the first byte was overwritten by
                          hi3110_hw_tx_frame())

    => EA 00 3E 80 01 FB (hi3110_hw_tx_frame() wanted to transmit a
                          frame, but the first byte was overwritten by
                          hi3110_get_berr_counter() because it wanted
                          to query the receive error counter)

This sequence hangs the transmission because the driver believes it has
sent a frame and waits for the interrupt signaling completion, but in
reality the chip has never sent away the frame since the commands it
received were malformed.

Fix by acquiring the SPI lock in hi3110_get_berr_counter().

I've scrutinized the entire driver for further unlocked SPI accesses but
found no others.

Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Akshay Bhat <akshay.bhat@timesys.com>
Cc: Casey Fitzpatrick <casey.fitzpatrick@timesys.com>
Cc: Stef Walter <stefw@redhat.com>
Cc: Karel Zak <kzak@redhat.com>
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/net/can/spi/hi311x.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/can/spi/hi311x.c b/drivers/net/can/spi/hi311x.c
index 5590c559a8ca..c2cf254e4e95 100644
--- a/drivers/net/can/spi/hi311x.c
+++ b/drivers/net/can/spi/hi311x.c
@@ -427,8 +427,10 @@ static int hi3110_get_berr_counter(const struct net_device *net,
 	struct hi3110_priv *priv = netdev_priv(net);
 	struct spi_device *spi = priv->spi;
 
+	mutex_lock(&priv->hi3110_lock);
 	bec->txerr = hi3110_read(spi, HI3110_READ_TEC);
 	bec->rxerr = hi3110_read(spi, HI3110_READ_REC);
+	mutex_unlock(&priv->hi3110_lock);
 
 	return 0;
 }
-- 
2.17.0
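
To make the race described above concrete, here is a single-threaded
user-space sketch that replays the first bogus transfer ("12 00") as an
explicit interleaving. It is a simplified model under assumptions: the
struct, the function names and the integer "locked" flag are hypothetical,
and a contended kernel mutex would sleep rather than return an error. The
command bytes 0xEC and 0x12 are taken from the dumps in the commit message.

```c
#include <stdint.h>

#define CMD_WRITE_FIFO	0x12	/* first byte of a frame transfer, per the dump */
#define CMD_READ_TEC	0xEC	/* transmit error counter query, per the dump */

struct demo_priv {
	uint8_t tx_buf[8];	/* shared SPI transmit buffer */
	int locked;		/* stand-in for priv->hi3110_lock */
};

/* Error counter path: fill the shared buffer with the query command.
 * take_lock = 0 models the code before the fix, 1 the code after it.
 */
static int berr_prepare(struct demo_priv *p, int take_lock)
{
	if (take_lock) {
		if (p->locked)
			return -1;	/* the kernel mutex would sleep here */
		p->locked = 1;
	}
	p->tx_buf[0] = CMD_READ_TEC;
	p->tx_buf[1] = 0x00;
	return 0;
}

/* Frame TX path: takes the lock, then overwrites the command byte. */
static int tx_prepare(struct demo_priv *p)
{
	if (p->locked)
		return -1;	/* blocked until the other path unlocks */
	p->locked = 1;
	p->tx_buf[0] = CMD_WRITE_FIFO;
	return 0;
}
```

Without the lock, berr_prepare() followed by tx_prepare() leaves "12 00" in
the buffer that the error counter path is about to send. With the lock
taken, tx_prepare() cannot proceed and the buffer still holds "EC 00".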

^ permalink raw reply related

* [PATCH] can: hi311x: Work around TX complete interrupt erratum
From: Lukas Wunner @ 2018-05-09 12:43 UTC (permalink / raw)
  To: Marc Kleine-Budde, Wolfgang Grandegger, linux-can, netdev
  Cc: Mathias Duckeck, Akshay Bhat, Casey Fitzpatrick

When sending packets as fast as possible using "cangen -g 0 -i -x", the
HI-3110 occasionally latches the interrupt pin high on completion of a
packet, but doesn't set the TXCPLT bit in the INTF register.  The INTF
register contains 0x00 as if no interrupt has occurred.  Even waiting
for a few milliseconds after the interrupt doesn't help.

Work around this apparent erratum by instead checking the TXMTY bit in
the STATF register ("TX FIFO empty").  We know that we've queued up a
packet for transmission if priv->tx_len is nonzero.  If the TX FIFO is
empty, transmission of that packet must have completed.

Note that this is congruent with our handling of received packets, which
likewise gleans from the STATF register whether a packet is waiting in
the RX FIFO, instead of looking at the INTF register.

Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Akshay Bhat <akshay.bhat@timesys.com>
Cc: Casey Fitzpatrick <casey.fitzpatrick@timesys.com>
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/net/can/spi/hi311x.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/spi/hi311x.c b/drivers/net/can/spi/hi311x.c
index c2cf254e4e95..53e320c92a8b 100644
--- a/drivers/net/can/spi/hi311x.c
+++ b/drivers/net/can/spi/hi311x.c
@@ -91,6 +91,7 @@
 #define HI3110_STAT_BUSOFF BIT(2)
 #define HI3110_STAT_ERRP BIT(3)
 #define HI3110_STAT_ERRW BIT(4)
+#define HI3110_STAT_TXMTY BIT(7)
 
 #define HI3110_BTR0_SJW_SHIFT 6
 #define HI3110_BTR0_BRP_SHIFT 0
@@ -737,10 +738,7 @@ static irqreturn_t hi3110_can_ist(int irq, void *dev_id)
 			}
 		}
 
-		if (intf == 0)
-			break;
-
-		if (intf & HI3110_INT_TXCPLT) {
+		if (priv->tx_len && statf & HI3110_STAT_TXMTY) {
 			net->stats.tx_packets++;
 			net->stats.tx_bytes += priv->tx_len - 1;
 			can_led_event(net, CAN_LED_EVENT_TX);
@@ -750,6 +748,9 @@ static irqreturn_t hi3110_can_ist(int irq, void *dev_id)
 			}
 			netif_wake_queue(net);
 		}
+
+		if (intf == 0)
+			break;
 	}
 	mutex_unlock(&priv->hi3110_lock);
 	return IRQ_HANDLED;
-- 
2.17.0
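
The completion test introduced by this patch boils down to a small
predicate, sketched below for illustration. tx_complete() and STAT_TXMTY
are hypothetical names for this sketch; the bit position is BIT(7) as
defined in the patch, and tx_len stands in for priv->tx_len.

```c
#include <stdbool.h>
#include <stdint.h>

#define STAT_TXMTY	(1u << 7)	/* "TX FIFO empty", BIT(7) as in the patch */

/* A frame is considered sent when one was queued (tx_len nonzero)
 * and the status register reports the TX FIFO as empty.
 */
static bool tx_complete(unsigned int tx_len, uint8_t statf)
{
	return tx_len && (statf & STAT_TXMTY);
}
```

Note that an empty FIFO alone is not enough: with tx_len == 0 nothing was
queued, so an idle chip is not mistaken for a completed transmission.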

^ permalink raw reply related

* Re: [PATCH rdma-next] MAINTAINERS: Remove bouncing @mellanox.com addresses
From: Doug Ledford @ 2018-05-09 13:08 UTC (permalink / raw)
  To: Leon Romanovsky, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, netdev
In-Reply-To: <20180503183746.7629-1-leon@kernel.org>

On Thu, 2018-05-03 at 21:37 +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
> 
> Delete non-existent @mellanox.com addresses from MAINTAINERS file.
> 
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

Thanks, applied to for-rc.

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH net] cxgb4: zero the HMA memory
From: Ganesh Goudar @ 2018-05-09 13:10 UTC (permalink / raw)
  To: netdev, davem; +Cc: nirranjan, indranil, venkatesh, Ganesh Goudar

The firmware expects HMA memory to be zeroed, so use __GFP_ZERO
for the HMA memory allocation.

Fixes: 8b4e6b3ca2ed ("cxgb4: Add HMA support")
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 24d2865..c3ae575 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -3433,8 +3433,8 @@ static int adap_config_hma(struct adapter *adapter)
 	sgl = adapter->hma.sgt->sgl;
 	node = dev_to_node(adapter->pdev_dev);
 	for_each_sg(sgl, iter, sgt->orig_nents, i) {
-		newpage = alloc_pages_node(node, __GFP_NOWARN | GFP_KERNEL,
-					   page_order);
+		newpage = alloc_pages_node(node, __GFP_NOWARN | GFP_KERNEL |
+					   __GFP_ZERO, page_order);
 		if (!newpage) {
 			dev_err(adapter->pdev_dev,
 				"Not enough memory for HMA page allocation\n");
-- 
2.1.0

^ permalink raw reply related

* [bpf-next PATCH 0/4] xdp: introduce bulking for ndo_xdp_xmit API
From: Jesper Dangaard Brouer @ 2018-05-09 13:12 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov,
	Jesper Dangaard Brouer
  Cc: Christoph Hellwig, Björn Töpel, Magnus Karlsson

This patchset changes the ndo_xdp_xmit API to take a bulk of xdp frames.

When the kernel is compiled with CONFIG_RETPOLINE, every indirect
function pointer (branch) call hurts performance. For XDP this has a
huge negative performance impact.

This patchset reduces the number of (indirect) calls to ndo_xdp_xmit, and
also prepares for further optimizations.  The DMA API's use of indirect
function pointer calls is the primary source of the regression.  It is
left for a followup patchset to use bulking calls towards the DMA API
(via the scatter-gather calls).

The other advantage of this API change is that drivers can more easily
amortize the cost of any sync/locking scheme over the bulk of
packets.  The assumption of the current API is that the driver
implementing the NDO will also allocate a dedicated XDP TX queue for
every CPU in the system, which is not always possible or practical to
configure. E.g. ixgbe cannot load an XDP program on a machine with
more than 96 CPUs, due to limited hardware TX queues.  E.g. virtio_net
is hard to configure as it requires manually increasing the
queues. E.g. the tun driver chooses to use a per XDP frame producer
lock, picking a queue via smp_processor_id modulo the available queues.
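
The locking argument above can be illustrated with a small user-space
model.  This is only a sketch under stated assumptions, not driver code:
struct toy_txq, toy_xmit_one() and toy_xmit_bulk() are hypothetical names,
the "lock" is just a counter, and BULK_SIZE mirrors the DEV_MAP_BULK_SIZE
value used in patch 2/4.

```c
#include <assert.h>
#include <stddef.h>

#define BULK_SIZE 16 /* assumption: mirrors DEV_MAP_BULK_SIZE from patch 2/4 */

/* Toy TX queue: counts lock acquisitions instead of taking a real lock. */
struct toy_txq {
	unsigned long lock_taken;
	unsigned long frames_sent;
};

/* Per-frame style: every frame pays for one lock/unlock round trip. */
static int toy_xmit_one(struct toy_txq *q, void *frame)
{
	(void)frame;
	q->lock_taken++;	/* models lock + unlock around one frame */
	q->frames_sent++;
	return 0;
}

/* Bulk style: one lock/unlock round trip covers all n frames. */
static int toy_xmit_bulk(struct toy_txq *q, int n, void **frames)
{
	int i;

	assert(n >= 0 && n <= BULK_SIZE);
	q->lock_taken++;	/* models lock + unlock around the whole bulk */
	for (i = 0; i < n; i++) {
		(void)frames[i];
		q->frames_sent++;
	}
	return n;		/* all frames accepted in this toy model */
}
```

Sending 16 frames through toy_xmit_one() bumps lock_taken 16 times, while
one toy_xmit_bulk() call with n = 16 bumps it once.  The same reasoning
applies to the indirect call itself.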

---

Jesper Dangaard Brouer (4):
      bpf: devmap introduce dev_map_enqueue
      bpf: devmap prepare xdp frames for bulking
      xdp: add tracepoint for devmap like cpumap have
      xdp: change ndo_xdp_xmit API to support bulking


 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   26 ++++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |    2 
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   21 +++-
 drivers/net/tun.c                             |   37 ++++---
 drivers/net/virtio_net.c                      |   66 +++++++++---
 include/linux/bpf.h                           |   15 ++-
 include/linux/netdevice.h                     |   14 ++-
 include/net/xdp.h                             |    1 
 include/trace/events/xdp.h                    |   50 +++++++++
 kernel/bpf/devmap.c                           |  134 ++++++++++++++++++++++++-
 net/core/filter.c                             |   19 +---
 net/core/xdp.c                                |   14 ++-
 samples/bpf/xdp_monitor_kern.c                |   49 +++++++++
 samples/bpf/xdp_monitor_user.c                |   69 +++++++++++++
 14 files changed, 436 insertions(+), 81 deletions(-)

^ permalink raw reply

* [bpf-next PATCH 1/4] bpf: devmap introduce dev_map_enqueue
From: Jesper Dangaard Brouer @ 2018-05-09 13:12 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov,
	Jesper Dangaard Brouer
  Cc: Christoph Hellwig, Björn Töpel, Magnus Karlsson
In-Reply-To: <152587152136.20423.14493673928480468024.stgit@firesoul>

Functionality is the same, but the ndo_xdp_xmit call is now
simply invoked from inside the devmap.c code.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/linux/bpf.h        |   13 ++++++++++---
 include/trace/events/xdp.h |    9 ++++++++-
 kernel/bpf/devmap.c        |   37 +++++++++++++++++++++++++++++++------
 net/core/filter.c          |   15 ++-------------
 4 files changed, 51 insertions(+), 23 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 321969da67b7..6457343f6dfa 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -485,14 +485,15 @@ int bpf_check(struct bpf_prog **fp, union bpf_attr *attr);
 void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth);
 
 /* Map specifics */
-struct net_device  *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
+struct xdp_buff;
+struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
 void __dev_map_insert_ctx(struct bpf_map *map, u32 index);
 void __dev_map_flush(struct bpf_map *map);
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp);
 
 struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key);
 void __cpu_map_insert_ctx(struct bpf_map *map, u32 index);
 void __cpu_map_flush(struct bpf_map *map);
-struct xdp_buff;
 int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
 		    struct net_device *dev_rx);
 
@@ -571,6 +572,13 @@ static inline void __dev_map_flush(struct bpf_map *map)
 {
 }
 
+struct xdp_buff;
+static inline
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
+{
+	return 0;
+}
+
 static inline
 struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key)
 {
@@ -585,7 +593,6 @@ static inline void __cpu_map_flush(struct bpf_map *map)
 {
 }
 
-struct xdp_buff;
 static inline int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu,
 				  struct xdp_buff *xdp,
 				  struct net_device *dev_rx)
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index 8989a92c571a..96104610d40e 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -138,11 +138,18 @@ DEFINE_EVENT_PRINT(xdp_redirect_template, xdp_redirect_map_err,
 		  __entry->map_id, __entry->map_index)
 );
 
+#ifndef __DEVMAP_OBJ_TYPE
+#define __DEVMAP_OBJ_TYPE
+struct _bpf_dtab_netdev {
+	struct net_device *dev;
+};
+#endif /* __DEVMAP_OBJ_TYPE */
+
 #define devmap_ifindex(fwd, map)				\
 	(!fwd ? 0 :						\
 	 (!map ? 0 :						\
 	  ((map->map_type == BPF_MAP_TYPE_DEVMAP) ?		\
-	   ((struct net_device *)fwd)->ifindex : 0)))
+	   ((struct _bpf_dtab_netdev *)fwd)->dev->ifindex : 0)))
 
 #define _trace_xdp_redirect_map(dev, xdp, fwd, map, idx)		\
 	 trace_xdp_redirect_map(dev, xdp, devmap_ifindex(fwd, map),	\
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 565f9ece9115..808808bf2bf2 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -48,18 +48,21 @@
  * calls will fail at this point.
  */
 #include <linux/bpf.h>
+#include <net/xdp.h>
 #include <linux/filter.h>
 
 #define DEV_CREATE_FLAG_MASK \
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
 
+/* objects in the map */
 struct bpf_dtab_netdev {
-	struct net_device *dev;
+	struct net_device *dev; /* must be first member, due to tracepoint */
 	struct bpf_dtab *dtab;
 	unsigned int bit;
 	struct rcu_head rcu;
 };
 
+/* bpf map container */
 struct bpf_dtab {
 	struct bpf_map map;
 	struct bpf_dtab_netdev **netdev_map;
@@ -240,21 +243,43 @@ void __dev_map_flush(struct bpf_map *map)
  * update happens in parallel here a dev_put wont happen until after reading the
  * ifindex.
  */
-struct net_device  *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
+struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
-	struct bpf_dtab_netdev *dev;
+	struct bpf_dtab_netdev *obj;
 
 	if (key >= map->max_entries)
 		return NULL;
 
-	dev = READ_ONCE(dtab->netdev_map[key]);
-	return dev ? dev->dev : NULL;
+	obj = READ_ONCE(dtab->netdev_map[key]);
+	return obj;
+}
+
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
+{
+	struct net_device *dev = dst->dev;
+	struct xdp_frame *xdpf;
+	int err;
+
+	if (!dev->netdev_ops->ndo_xdp_xmit)
+		return -EOPNOTSUPP;
+
+	xdpf = convert_to_xdp_frame(xdp);
+	if (unlikely(!xdpf))
+		return -EOVERFLOW;
+
+	/* TODO: implement a bulking/enqueue step later */
+	err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
+	if (err)
+		return err;
+
+	return 0;
 }
 
 static void *dev_map_lookup_elem(struct bpf_map *map, void *key)
 {
-	struct net_device *dev = __dev_map_lookup_elem(map, *(u32 *)key);
+	struct bpf_dtab_netdev *obj = __dev_map_lookup_elem(map, *(u32 *)key);
+	struct net_device *dev = obj ? obj->dev : NULL;
 
 	return dev ? &dev->ifindex : NULL;
 }
diff --git a/net/core/filter.c b/net/core/filter.c
index 6877426c23a6..258d243099e1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3020,20 +3020,9 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
 
 	switch (map->map_type) {
 	case BPF_MAP_TYPE_DEVMAP: {
-		struct net_device *dev = fwd;
-		struct xdp_frame *xdpf;
+		struct bpf_dtab_netdev *dst = fwd;
 
-		if (!dev->netdev_ops->ndo_xdp_xmit)
-			return -EOPNOTSUPP;
-
-		xdpf = convert_to_xdp_frame(xdp);
-		if (unlikely(!xdpf))
-			return -EOVERFLOW;
-
-		/* TODO: move to inside map code instead, for bulk support
-		 * err = dev_map_enqueue(dev, xdp);
-		 */
-		err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
+		err = dev_map_enqueue(dst, xdp);
 		if (err)
 			return err;
 		__dev_map_insert_ctx(map, index);

^ permalink raw reply related

* [bpf-next PATCH 2/4] bpf: devmap prepare xdp frames for bulking
From: Jesper Dangaard Brouer @ 2018-05-09 13:13 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov,
	Jesper Dangaard Brouer
  Cc: Christoph Hellwig, Björn Töpel, Magnus Karlsson
In-Reply-To: <152587152136.20423.14493673928480468024.stgit@firesoul>

Like cpumap, create a queue for xdp frames that will be bulked.  For now,
this patch simply invokes ndo_xdp_xmit for each frame.  This happens
either when the map flush operation is invoked, or when the limit
DEV_MAP_BULK_SIZE is reached.
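
The enqueue/flush behaviour described above can be modelled in user space
roughly as follows.  This is a simplified sketch, not the kernel code:
the toy_* names are hypothetical, the per-frame transmit hook is replaced
by a counter, and the per-CPU and RCU aspects are left out.  Only the
DEV_MAP_BULK_SIZE value is taken from the patch.

```c
#include <assert.h>

#define DEV_MAP_BULK_SIZE 16 /* value taken from the patch */

struct toy_frame {
	int id;
};

/* Simplified stand-in for the bulk queue; the counters are illustrative. */
struct toy_bulk_queue {
	struct toy_frame *q[DEV_MAP_BULK_SIZE];
	unsigned int count;
	unsigned int xmit_calls;	/* per-frame transmit invocations */
	unsigned int flushes;		/* non-empty flushes performed */
};

/* Drain the queue, invoking the (modelled) transmit hook once per frame. */
static void toy_xmit_all(struct toy_bulk_queue *bq)
{
	unsigned int i;

	if (!bq->count)
		return;

	for (i = 0; i < bq->count; i++) {
		(void)bq->q[i];
		bq->xmit_calls++;	/* stands in for the ndo_xdp_xmit call */
	}
	bq->count = 0;
	bq->flushes++;
}

/* Queue one frame; a full queue is drained first to make room. */
static void toy_enqueue(struct toy_bulk_queue *bq, struct toy_frame *f)
{
	if (bq->count == DEV_MAP_BULK_SIZE)
		toy_xmit_all(bq);

	assert(bq->count < DEV_MAP_BULK_SIZE);
	bq->q[bq->count++] = f;
}
```

In this model, enqueueing 20 frames triggers one flush when the 17th frame
arrives and leaves 4 frames queued until the explicit map flush calls
toy_xmit_all().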

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 kernel/bpf/devmap.c |   77 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 73 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 808808bf2bf2..cab72c100bb5 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -54,11 +54,18 @@
 #define DEV_CREATE_FLAG_MASK \
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
 
+#define DEV_MAP_BULK_SIZE 16
+struct xdp_bulk_queue {
+	struct xdp_frame *q[DEV_MAP_BULK_SIZE];
+	unsigned int count;
+};
+
 /* objects in the map */
 struct bpf_dtab_netdev {
 	struct net_device *dev; /* must be first member, due to tracepoint */
 	struct bpf_dtab *dtab;
 	unsigned int bit;
+	struct xdp_bulk_queue __percpu *bulkq;
 	struct rcu_head rcu;
 };
 
@@ -209,6 +216,38 @@ void __dev_map_insert_ctx(struct bpf_map *map, u32 bit)
 	__set_bit(bit, bitmap);
 }
 
+static int bq_xmit_all(struct bpf_dtab_netdev *obj,
+			 struct xdp_bulk_queue *bq)
+{
+	unsigned int processed = 0, drops = 0;
+	struct net_device *dev = obj->dev;
+	int i;
+
+	if (unlikely(!bq->count))
+		return 0;
+
+	for (i = 0; i < bq->count; i++) {
+		struct xdp_frame *xdpf = bq->q[i];
+
+		prefetch(xdpf);
+	}
+
+	for (i = 0; i < bq->count; i++) {
+		struct xdp_frame *xdpf = bq->q[i];
+		int err;
+
+		err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
+		if (err) {
+			drops++;
+			xdp_return_frame(xdpf);
+		}
+		processed++;
+	}
+	bq->count = 0;
+
+	return 0;
+}
+
 /* __dev_map_flush is called from xdp_do_flush_map() which _must_ be signaled
  * from the driver before returning from its napi->poll() routine. The poll()
  * routine is called either from busy_poll context or net_rx_action signaled
@@ -224,6 +263,7 @@ void __dev_map_flush(struct bpf_map *map)
 
 	for_each_set_bit(bit, bitmap, map->max_entries) {
 		struct bpf_dtab_netdev *dev = READ_ONCE(dtab->netdev_map[bit]);
+		struct xdp_bulk_queue *bq;
 		struct net_device *netdev;
 
 		/* This is possible if the dev entry is removed by user space
@@ -233,6 +273,9 @@ void __dev_map_flush(struct bpf_map *map)
 			continue;
 
 		__clear_bit(bit, bitmap);
+
+		bq = this_cpu_ptr(dev->bulkq);
+		bq_xmit_all(dev, bq);
 		netdev = dev->dev;
 		if (likely(netdev->netdev_ops->ndo_xdp_flush))
 			netdev->netdev_ops->ndo_xdp_flush(netdev);
@@ -255,6 +298,20 @@ struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
 	return obj;
 }
 
+/* Runs under RCU-read-side, plus in softirq under NAPI protection.
+ * Thus, safe percpu variable access.
+ */
+static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf)
+{
+	struct xdp_bulk_queue *bq = this_cpu_ptr(obj->bulkq);
+
+	if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
+		bq_xmit_all(obj, bq);
+
+	bq->q[bq->count++] = xdpf;
+	return 0;
+}
+
 int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
 {
 	struct net_device *dev = dst->dev;
@@ -268,8 +325,7 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
 
-	/* TODO: implement a bulking/enqueue step later */
-	err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
+	err = bq_enqueue(dst, xdpf);
 	if (err)
 		return err;
 
@@ -288,13 +344,18 @@ static void dev_map_flush_old(struct bpf_dtab_netdev *dev)
 {
 	if (dev->dev->netdev_ops->ndo_xdp_flush) {
 		struct net_device *fl = dev->dev;
+		struct xdp_bulk_queue *bq;
 		unsigned long *bitmap;
+
 		int cpu;
 
 		for_each_online_cpu(cpu) {
 			bitmap = per_cpu_ptr(dev->dtab->flush_needed, cpu);
 			__clear_bit(dev->bit, bitmap);
 
+			bq = per_cpu_ptr(dev->bulkq, cpu);
+			bq_xmit_all(dev, bq);
+
 			fl->netdev_ops->ndo_xdp_flush(dev->dev);
 		}
 	}
@@ -306,6 +367,7 @@ static void __dev_map_entry_free(struct rcu_head *rcu)
 
 	dev = container_of(rcu, struct bpf_dtab_netdev, rcu);
 	dev_map_flush_old(dev);
+	free_percpu(dev->bulkq);
 	dev_put(dev->dev);
 	kfree(dev);
 }
@@ -338,6 +400,7 @@ static int dev_map_update_elem(struct bpf_map *map, void *key, void *value,
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
 	struct net *net = current->nsproxy->net_ns;
+	gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN;
 	struct bpf_dtab_netdev *dev, *old_dev;
 	u32 i = *(u32 *)key;
 	u32 ifindex = *(u32 *)value;
@@ -352,11 +415,17 @@ static int dev_map_update_elem(struct bpf_map *map, void *key, void *value,
 	if (!ifindex) {
 		dev = NULL;
 	} else {
-		dev = kmalloc_node(sizeof(*dev), GFP_ATOMIC | __GFP_NOWARN,
-				   map->numa_node);
+		dev = kmalloc_node(sizeof(*dev), gfp, map->numa_node);
 		if (!dev)
 			return -ENOMEM;
 
+		dev->bulkq = __alloc_percpu_gfp(sizeof(*dev->bulkq),
+						sizeof(void *), gfp);
+		if (!dev->bulkq) {
+			kfree(dev);
+			return -ENOMEM;
+		}
+
 		dev->dev = dev_get_by_index(net, ifindex);
 		if (!dev->dev) {
 			kfree(dev);

^ permalink raw reply related

* [bpf-next PATCH 3/4] xdp: add tracepoint for devmap like cpumap have
From: Jesper Dangaard Brouer @ 2018-05-09 13:13 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov,
	Jesper Dangaard Brouer
  Cc: Christoph Hellwig, Björn Töpel, Magnus Karlsson
In-Reply-To: <152587152136.20423.14493673928480468024.stgit@firesoul>

Notice how this allows us to get XDP statistics without affecting XDP
performance, as the tracepoint is no longer activated on a per-packet
basis.
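
Because the tracepoint fires once per bulk, a consumer can derive the
average bulk size from the accumulated counters.  The following is a
minimal user-space sketch of that arithmetic; struct toy_rec and the
toy_* helpers are hypothetical, loosely modelled on the datarec used by
the sample, and it assumes that 'sent' counts only frames that were not
dropped.

```c
#include <assert.h>

/* Hypothetical per-CPU record, loosely modelled on the sample's datarec. */
struct toy_rec {
	unsigned long long processed;	/* sum of 'sent' over all events */
	unsigned long long dropped;	/* sum of 'drops' over all events */
	unsigned long long events;	/* number of tracepoint hits (bulks) */
};

/* Called once per bulk transmit, not once per packet. */
static void toy_record_bulk(struct toy_rec *rec, int sent, int drops)
{
	assert(sent >= 0 && drops >= 0);
	rec->processed += (unsigned long long)sent;
	rec->dropped += (unsigned long long)drops;
	rec->events += 1;
}

/* Average number of frames per bulk; 0.0 when nothing was recorded. */
static double toy_avg_bulk(const struct toy_rec *rec)
{
	if (!rec->events)
		return 0.0;
	return (double)(rec->processed + rec->dropped) / (double)rec->events;
}
```

The cost of the bookkeeping is paid once per bulk, so a bulk of 16 frames
costs one record update rather than 16.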

The xdp_monitor sample/tool is updated to use this new tracepoint.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/linux/bpf.h            |    6 ++++-
 include/trace/events/xdp.h     |   39 +++++++++++++++++++++++++++++++++++
 kernel/bpf/devmap.c            |   25 ++++++++++++++++++-----
 net/core/filter.c              |    2 +-
 samples/bpf/xdp_monitor_kern.c |   39 +++++++++++++++++++++++++++++++++++
 samples/bpf/xdp_monitor_user.c |   44 +++++++++++++++++++++++++++++++++++++++-
 6 files changed, 146 insertions(+), 9 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6457343f6dfa..066b5c67c71f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -489,7 +489,8 @@ struct xdp_buff;
 struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
 void __dev_map_insert_ctx(struct bpf_map *map, u32 index);
 void __dev_map_flush(struct bpf_map *map);
-int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp);
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+		    struct net_device *dev_rx);
 
 struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key);
 void __cpu_map_insert_ctx(struct bpf_map *map, u32 index);
@@ -574,7 +575,8 @@ static inline void __dev_map_flush(struct bpf_map *map)
 
 struct xdp_buff;
 static inline
-int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+		    struct net_device *dev_rx)
 {
 	return 0;
 }
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index 96104610d40e..2e9ef0650144 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -229,6 +229,45 @@ TRACE_EVENT(xdp_cpumap_enqueue,
 		  __entry->to_cpu)
 );
 
+TRACE_EVENT(xdp_devmap_xmit,
+
+	TP_PROTO(const struct bpf_map *map, u32 map_index,
+		 int sent, int drops,
+		 const struct net_device *from_dev,
+		 const struct net_device *to_dev),
+
+	TP_ARGS(map, map_index, sent, drops, from_dev, to_dev),
+
+	TP_STRUCT__entry(
+		__field(int, map_id)
+		__field(u32, act)
+		__field(u32, map_index)
+		__field(int, drops)
+		__field(int, sent)
+		__field(int, from_ifindex)
+		__field(int, to_ifindex)
+	),
+
+	TP_fast_assign(
+		__entry->map_id		= map->id;
+		__entry->act		= XDP_REDIRECT;
+		__entry->map_index	= map_index;
+		__entry->drops		= drops;
+		__entry->sent		= sent;
+		__entry->from_ifindex	= from_dev->ifindex;
+		__entry->to_ifindex	= to_dev->ifindex;
+	),
+
+	TP_printk("ndo_xdp_xmit"
+		  " map_id=%d map_index=%d action=%s"
+		  " sent=%d drops=%d"
+		  " from_ifindex=%d to_ifindex=%d",
+		  __entry->map_id, __entry->map_index,
+		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
+		  __entry->sent, __entry->drops,
+		  __entry->from_ifindex, __entry->to_ifindex)
+);
+
 #endif /* _TRACE_XDP_H */
 
 #include <trace/define_trace.h>
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index cab72c100bb5..6f84100723b0 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -50,6 +50,7 @@
 #include <linux/bpf.h>
 #include <net/xdp.h>
 #include <linux/filter.h>
+#include <trace/events/xdp.h>
 
 #define DEV_CREATE_FLAG_MASK \
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
@@ -57,6 +58,7 @@
 #define DEV_MAP_BULK_SIZE 16
 struct xdp_bulk_queue {
 	struct xdp_frame *q[DEV_MAP_BULK_SIZE];
+	struct net_device *dev_rx;
 	unsigned int count;
 };
 
@@ -219,8 +221,8 @@ void __dev_map_insert_ctx(struct bpf_map *map, u32 bit)
 static int bq_xmit_all(struct bpf_dtab_netdev *obj,
 			 struct xdp_bulk_queue *bq)
 {
-	unsigned int processed = 0, drops = 0;
 	struct net_device *dev = obj->dev;
+	int sent = 0, drops = 0;
 	int i;
 
 	if (unlikely(!bq->count))
@@ -241,10 +243,13 @@ static int bq_xmit_all(struct bpf_dtab_netdev *obj,
 			drops++;
 			xdp_return_frame(xdpf);
 		}
-		processed++;
+		sent++;
 	}
 	bq->count = 0;
 
+	trace_xdp_devmap_xmit(&obj->dtab->map, obj->bit,
+			      sent, drops, bq->dev_rx, dev);
+	bq->dev_rx = NULL;
 	return 0;
 }
 
@@ -301,18 +306,28 @@ struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
 /* Runs under RCU-read-side, plus in softirq under NAPI protection.
  * Thus, safe percpu variable access.
  */
-static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf)
+static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf,
+		      struct net_device *dev_rx)
+
 {
 	struct xdp_bulk_queue *bq = this_cpu_ptr(obj->bulkq);
 
 	if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
 		bq_xmit_all(obj, bq);
 
+	/* Ingress dev_rx will be the same for all xdp_frame's in
+	 * bulk_queue, because bq stored per-CPU and must be flushed
+	 * from net_device drivers NAPI func end.
+	 */
+	if (!bq->dev_rx)
+		bq->dev_rx = dev_rx;
+
 	bq->q[bq->count++] = xdpf;
 	return 0;
 }
 
-int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+		    struct net_device *dev_rx)
 {
 	struct net_device *dev = dst->dev;
 	struct xdp_frame *xdpf;
@@ -325,7 +340,7 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
 
-	err = bq_enqueue(dst, xdpf);
+	err = bq_enqueue(dst, xdpf, dev_rx);
 	if (err)
 		return err;
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 258d243099e1..8b7924368dc1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3022,7 +3022,7 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
 	case BPF_MAP_TYPE_DEVMAP: {
 		struct bpf_dtab_netdev *dst = fwd;
 
-		err = dev_map_enqueue(dst, xdp);
+		err = dev_map_enqueue(dst, xdp, dev_rx);
 		if (err)
 			return err;
 		__dev_map_insert_ctx(map, index);
diff --git a/samples/bpf/xdp_monitor_kern.c b/samples/bpf/xdp_monitor_kern.c
index 211db8ded0de..2854aa0665ea 100644
--- a/samples/bpf/xdp_monitor_kern.c
+++ b/samples/bpf/xdp_monitor_kern.c
@@ -208,3 +208,42 @@ int trace_xdp_cpumap_kthread(struct cpumap_kthread_ctx *ctx)
 
 	return 0;
 }
+
+struct bpf_map_def SEC("maps") devmap_xmit_cnt = {
+	.type		= BPF_MAP_TYPE_PERCPU_ARRAY,
+	.key_size	= sizeof(u32),
+	.value_size	= sizeof(struct datarec),
+	.max_entries	= 1,
+};
+
+/* Tracepoint: /sys/kernel/debug/tracing/events/xdp/xdp_devmap_xmit/format
+ * Code in:         kernel/include/trace/events/xdp.h
+ */
+struct devmap_xmit_ctx {
+	u64 __pad;		// First 8 bytes are not accessible by bpf code
+	int map_id;		//	offset:8;  size:4; signed:1;
+	u32 act;		//	offset:12; size:4; signed:0;
+	u32 map_index;		//	offset:16; size:4; signed:0;
+	int drops;		//	offset:20; size:4; signed:1;
+	int sent;		//	offset:24; size:4; signed:1;
+	int from_ifindex;	//	offset:28; size:4; signed:1;
+	int to_ifindex;		//	offset:32; size:4; signed:1;
+};
+
+SEC("tracepoint/xdp/xdp_devmap_xmit")
+int trace_xdp_devmap_xmit(struct devmap_xmit_ctx *ctx)
+{
+	struct datarec *rec;
+	u32 key = 0;
+
+	rec = bpf_map_lookup_elem(&devmap_xmit_cnt, &key);
+	if (!rec)
+		return 0;
+	rec->processed += ctx->sent;
+	rec->dropped   += ctx->drops;
+
+	/* Record bulk events, then userspace can calc average bulk size */
+	rec->info += 1;
+
+	return 1;
+}
diff --git a/samples/bpf/xdp_monitor_user.c b/samples/bpf/xdp_monitor_user.c
index 894bc64c2cac..4aaf1ab1927d 100644
--- a/samples/bpf/xdp_monitor_user.c
+++ b/samples/bpf/xdp_monitor_user.c
@@ -141,6 +141,7 @@ struct stats_record {
 	struct record_u64 xdp_exception[XDP_ACTION_MAX];
 	struct record xdp_cpumap_kthread;
 	struct record xdp_cpumap_enqueue[MAX_CPUS];
+	struct record xdp_devmap_xmit;
 };
 
 static bool map_collect_record(int fd, __u32 key, struct record *rec)
@@ -397,7 +398,7 @@ static void stats_print(struct stats_record *stats_rec,
 			info = calc_info(r, p, t);
 			if (info > 0)
 				i_str = "sched";
-			if (pps > 0)
+			if (pps > 0 || drop > 0)
 				printf(fmt1, "cpumap-kthread",
 				       i, pps, drop, info, i_str);
 		}
@@ -409,6 +410,42 @@ static void stats_print(struct stats_record *stats_rec,
 		printf(fmt2, "cpumap-kthread", "total", pps, drop, info, i_str);
 	}
 
+	/* devmap ndo_xdp_xmit stats */
+	{
+		char *fmt1 = "%-15s %-7d %'-12.0f %'-12.0f %'-10.2f %s\n";
+		char *fmt2 = "%-15s %-7s %'-12.0f %'-12.0f %'-10.2f %s\n";
+		struct record *rec, *prev;
+		double drop, info;
+		char *i_str = "";
+
+		rec  =  &stats_rec->xdp_devmap_xmit;
+		prev = &stats_prev->xdp_devmap_xmit;
+		t = calc_period(rec, prev);
+		for (i = 0; i < nr_cpus; i++) {
+			struct datarec *r = &rec->cpu[i];
+			struct datarec *p = &prev->cpu[i];
+
+			pps  = calc_pps(r, p, t);
+			drop = calc_drop(r, p, t);
+			info = calc_info(r, p, t);
+			if (info > 0) {
+				i_str = "bulk-average";
+				info = (pps+drop) / info; /* calc avg bulk */
+			}
+			if (pps > 0 || drop > 0)
+				printf(fmt1, "devmap-xmit",
+				       i, pps, drop, info, i_str);
+		}
+		pps = calc_pps(&rec->total, &prev->total, t);
+		drop = calc_drop(&rec->total, &prev->total, t);
+		info = calc_info(&rec->total, &prev->total, t);
+		if (info > 0) {
+			i_str = "bulk-average";
+			info = (pps+drop) / info; /* calc avg bulk */
+		}
+		printf(fmt2, "devmap-xmit", "total", pps, drop, info, i_str);
+	}
+
 	printf("\n");
 }
 
@@ -437,6 +474,9 @@ static bool stats_collect(struct stats_record *rec)
 	fd = map_data[3].fd; /* map3: cpumap_kthread_cnt */
 	map_collect_record(fd, 0, &rec->xdp_cpumap_kthread);
 
+	fd = map_data[4].fd; /* map4: devmap_xmit_cnt */
+	map_collect_record(fd, 0, &rec->xdp_devmap_xmit);
+
 	return true;
 }
 
@@ -480,6 +520,7 @@ static struct stats_record *alloc_stats_record(void)
 
 	rec_sz = sizeof(struct datarec);
 	rec->xdp_cpumap_kthread.cpu = alloc_rec_per_cpu(rec_sz);
+	rec->xdp_devmap_xmit.cpu    = alloc_rec_per_cpu(rec_sz);
 
 	for (i = 0; i < MAX_CPUS; i++)
 		rec->xdp_cpumap_enqueue[i].cpu = alloc_rec_per_cpu(rec_sz);
@@ -498,6 +539,7 @@ static void free_stats_record(struct stats_record *r)
 		free(r->xdp_exception[i].cpu);
 
 	free(r->xdp_cpumap_kthread.cpu);
+	free(r->xdp_devmap_xmit.cpu);
 
 	for (i = 0; i < MAX_CPUS; i++)
 		free(r->xdp_cpumap_enqueue[i].cpu);

^ permalink raw reply related

* [bpf-next PATCH 4/4] xdp: change ndo_xdp_xmit API to support bulking
From: Jesper Dangaard Brouer @ 2018-05-09 13:13 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov,
	Jesper Dangaard Brouer
  Cc: Christoph Hellwig, Björn Töpel, Magnus Karlsson
In-Reply-To: <152587152136.20423.14493673928480468024.stgit@firesoul>

This patch changes the API for ndo_xdp_xmit to support bulking
xdp_frames.

When the kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge
slowdown.  Most of the slowdown is caused by DMA API indirect function
calls, but also by the net_device->ndo_xdp_xmit() call.

Benchmarking the patch with CONFIG_RETPOLINE, using xdp_redirect_map
with a single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed
improved performance:
 for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps
 for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps

With frames available as a bulk inside the driver's ndo_xdp_xmit call,
further optimizations are possible, like bulk DMA-mapping for TX.

Testing without CONFIG_RETPOLINE shows the same performance for
physical NIC drivers.

The virtual NIC driver tun sees a huge performance boost, as it can
avoid doing per frame producer locking, but instead amortize the
locking cost over the bulk.
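
The calling convention of the bulk API, as documented in the i40e
kernel-doc comment in this patch, can be sketched in user space as
follows.  This is an illustrative model only: all toy_* names are
hypothetical, ring capacity stands in for any per-frame failure, and
toy_return_frame() stands in for the XDP return API.

```c
#include <assert.h>
#include <errno.h>

struct toy_frame {
	int freed;	/* set once the frame has been returned */
};

struct toy_dev {
	int up;		/* 0 models a device that is down */
	int ring_space;	/* free TX slots in the modelled ring */
};

/* Stand-in for returning a frame to its allocator. */
static void toy_return_frame(struct toy_frame *f)
{
	f->freed = 1;
}

/* Driver side: returns the number of frames sent.  Frames that do not
 * fit are freed here.  On a negative errno nothing was consumed.
 */
static int toy_xdp_xmit(struct toy_dev *dev, int n, struct toy_frame **frames)
{
	int drops = 0;
	int i;

	if (!dev->up)
		return -ENETDOWN;

	for (i = 0; i < n; i++) {
		if (dev->ring_space > 0) {
			dev->ring_space--;
		} else {
			toy_return_frame(frames[i]);
			drops++;
		}
	}
	return n - drops;
}

/* Caller side: on error the caller frees every frame itself.
 * Returns the number of frames that were dropped.
 */
static int toy_flush(struct toy_dev *dev, int n, struct toy_frame **frames)
{
	int sent = toy_xdp_xmit(dev, n, frames);
	int i;

	if (sent < 0) {
		for (i = 0; i < n; i++)
			toy_return_frame(frames[i]);
		return n;
	}
	return n - sent;
}
```

The point of the split is ownership: a non-negative return means the
driver has already dealt with every frame, while a negative return
leaves all frames with the caller.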

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   26 +++++++---
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |    2 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   21 ++++++--
 drivers/net/tun.c                             |   37 +++++++++-----
 drivers/net/virtio_net.c                      |   66 +++++++++++++++++++------
 include/linux/netdevice.h                     |   14 +++--
 include/net/xdp.h                             |    1 
 include/trace/events/xdp.h                    |   10 ++--
 kernel/bpf/devmap.c                           |   33 ++++++++-----
 net/core/filter.c                             |    4 +-
 net/core/xdp.c                                |   14 ++++-
 samples/bpf/xdp_monitor_kern.c                |   10 ++++
 samples/bpf/xdp_monitor_user.c                |   35 +++++++++++--
 13 files changed, 197 insertions(+), 76 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 87fb27ab9c24..27e69fdd0a11 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -3686,14 +3686,19 @@ netdev_tx_t i40e_lan_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
  * @dev: netdev
  * @xdp: XDP buffer
  *
- * Returns Zero if sent, else an error code
+ * Returns number of frames successfully sent. Frames that fail are
+ * free'ed via XDP return API.
+ *
+ * For error cases, a negative errno code is returned and no-frames
+ * are transmitted (caller must handle freeing frames).
  **/
-int i40e_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf)
+int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames)
 {
 	struct i40e_netdev_priv *np = netdev_priv(dev);
 	unsigned int queue_index = smp_processor_id();
 	struct i40e_vsi *vsi = np->vsi;
-	int err;
+	int drops = 0;
+	int i;
 
 	if (test_bit(__I40E_VSI_DOWN, vsi->state))
 		return -ENETDOWN;
@@ -3701,11 +3706,18 @@ int i40e_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf)
 	if (!i40e_enabled_xdp_vsi(vsi) || queue_index >= vsi->num_queue_pairs)
 		return -ENXIO;
 
-	err = i40e_xmit_xdp_ring(xdpf, vsi->xdp_rings[queue_index]);
-	if (err != I40E_XDP_TX)
-		return -ENOSPC;
+	for (i = 0; i < n; i++) {
+		struct xdp_frame *xdpf = frames[i];
+		int err;
 
-	return 0;
+		err = i40e_xmit_xdp_ring(xdpf, vsi->xdp_rings[queue_index]);
+		if (err != I40E_XDP_TX) {
+			xdp_return_frame_rx_napi(xdpf);
+			drops++;
+		}
+	}
+
+	return n - drops;
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index 4bf318b8be85..da6697c0407f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -511,7 +511,7 @@ u32 i40e_get_tx_pending(struct i40e_ring *ring, bool in_sw);
 void i40e_detect_recover_hung(struct i40e_vsi *vsi);
 int __i40e_maybe_stop_tx(struct i40e_ring *tx_ring, int size);
 bool __i40e_chk_linearize(struct sk_buff *skb);
-int i40e_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf);
+int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames);
 void i40e_xdp_flush(struct net_device *dev);
 
 /**
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index b6e5cea84949..e64f71bc04c2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -10042,11 +10042,13 @@ static int ixgbe_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 	}
 }
 
-static int ixgbe_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf)
+static int ixgbe_xdp_xmit(struct net_device *dev, int n,
+			  struct xdp_frame **frames)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(dev);
 	struct ixgbe_ring *ring;
-	int err;
+	int drops = 0;
+	int i;
 
 	if (unlikely(test_bit(__IXGBE_DOWN, &adapter->state)))
 		return -ENETDOWN;
@@ -10058,11 +10060,18 @@ static int ixgbe_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf)
 	if (unlikely(!ring))
 		return -ENXIO;
 
-	err = ixgbe_xmit_xdp_ring(adapter, xdpf);
-	if (err != IXGBE_XDP_TX)
-		return -ENOSPC;
+	for (i = 0; i < n; i++) {
+		struct xdp_frame *xdpf = frames[i];
+		int err;
 
-	return 0;
+		err = ixgbe_xmit_xdp_ring(adapter, xdpf);
+		if (err != IXGBE_XDP_TX) {
+			xdp_return_frame_rx_napi(xdpf);
+			drops++;
+		}
+	}
+
+	return n - drops;
 }
 
 static void ixgbe_xdp_flush(struct net_device *dev)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index d3c04ab9752a..4fe0c75c5e0b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -70,6 +70,7 @@
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
 #include <net/sock.h>
+#include <net/xdp.h>
 #include <linux/seq_file.h>
 #include <linux/uio.h>
 #include <linux/skb_array.h>
@@ -1290,34 +1291,44 @@ static const struct net_device_ops tun_netdev_ops = {
 	.ndo_get_stats64	= tun_net_get_stats64,
 };
 
-static int tun_xdp_xmit(struct net_device *dev, struct xdp_frame *frame)
+static int tun_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames)
 {
 	struct tun_struct *tun = netdev_priv(dev);
 	struct tun_file *tfile;
 	u32 numqueues;
-	int ret = 0;
+	int drops = 0;
+	int cnt = n;
+	int i;
 
 	rcu_read_lock();
 
 	numqueues = READ_ONCE(tun->numqueues);
 	if (!numqueues) {
-		ret = -ENOSPC;
-		goto out;
+		rcu_read_unlock();
+		return -ENXIO; /* Caller will free/return all frames */
 	}
 
 	tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
 					    numqueues]);
-	/* Encode the XDP flag into lowest bit for consumer to differ
-	 * XDP buffer from sk_buff.
-	 */
-	if (ptr_ring_produce(&tfile->tx_ring, tun_xdp_to_ptr(frame))) {
-		this_cpu_inc(tun->pcpu_stats->tx_dropped);
-		ret = -ENOSPC;
+
+	spin_lock(&tfile->tx_ring.producer_lock);
+	for (i = 0; i < n; i++) {
+		struct xdp_frame *xdp = frames[i];
+		/* Encode the XDP flag into lowest bit for consumer to differ
+		 * XDP buffer from sk_buff.
+		 */
+		void *frame = tun_xdp_to_ptr(xdp);
+
+		if (__ptr_ring_produce(&tfile->tx_ring, frame)) {
+			this_cpu_inc(tun->pcpu_stats->tx_dropped);
+			xdp_return_frame_rx_napi(xdp);
+			drops++;
+		}
 	}
+	spin_unlock(&tfile->tx_ring.producer_lock);
 
-out:
 	rcu_read_unlock();
-	return ret;
+	return cnt - drops;
 }
 
 static int tun_xdp_tx(struct net_device *dev, struct xdp_buff *xdp)
@@ -1327,7 +1338,7 @@ static int tun_xdp_tx(struct net_device *dev, struct xdp_buff *xdp)
 	if (unlikely(!frame))
 		return -EOVERFLOW;
 
-	return tun_xdp_xmit(dev, frame);
+	return tun_xdp_xmit(dev, 1, &frame);
 }
 
 static void tun_xdp_flush(struct net_device *dev)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f34794a76c4d..39a0783d1cde 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -419,23 +419,13 @@ static void virtnet_xdp_flush(struct net_device *dev)
 	virtqueue_kick(sq->vq);
 }
 
-static int __virtnet_xdp_xmit(struct virtnet_info *vi,
-			       struct xdp_frame *xdpf)
+static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
+				   struct send_queue *sq,
+				   struct xdp_frame *xdpf)
 {
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
-	struct xdp_frame *xdpf_sent;
-	struct send_queue *sq;
-	unsigned int len;
-	unsigned int qp;
 	int err;
 
-	qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
-	sq = &vi->sq[qp];
-
-	/* Free up any pending old buffers before queueing new ones. */
-	while ((xdpf_sent = virtqueue_get_buf(sq->vq, &len)) != NULL)
-		xdp_return_frame(xdpf_sent);
-
 	/* virtqueue want to use data area in-front of packet */
 	if (unlikely(xdpf->metasize > 0))
 		return -EOPNOTSUPP;
@@ -459,11 +449,40 @@ static int __virtnet_xdp_xmit(struct virtnet_info *vi,
 	return 0;
 }
 
-static int virtnet_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf)
+static int __virtnet_xdp_tx_xmit(struct virtnet_info *vi,
+				   struct xdp_frame *xdpf)
+{
+	struct xdp_frame *xdpf_sent;
+	struct send_queue *sq;
+	unsigned int len;
+	unsigned int qp;
+
+	qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
+	sq = &vi->sq[qp];
+
+	/* Free up any pending old buffers before queueing new ones. */
+	while ((xdpf_sent = virtqueue_get_buf(sq->vq, &len)) != NULL)
+		xdp_return_frame(xdpf_sent);
+
+	return __virtnet_xdp_xmit_one(vi, sq, xdpf);
+}
+
+static int virtnet_xdp_xmit(struct net_device *dev,
+			    int n, struct xdp_frame **frames)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct receive_queue *rq = vi->rq;
+	struct xdp_frame *xdpf_sent;
 	struct bpf_prog *xdp_prog;
+	struct send_queue *sq;
+	unsigned int len;
+	unsigned int qp;
+	int drops = 0;
+	int err;
+	int i;
+
+	qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
+	sq = &vi->sq[qp];
 
 	/* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this
 	 * indicate XDP resources have been successfully allocated.
@@ -472,7 +491,20 @@ static int virtnet_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf)
 	if (!xdp_prog)
 		return -ENXIO;
 
-	return __virtnet_xdp_xmit(vi, xdpf);
+	/* Free up any pending old buffers before queueing new ones. */
+	while ((xdpf_sent = virtqueue_get_buf(sq->vq, &len)) != NULL)
+		xdp_return_frame(xdpf_sent);
+
+	for (i = 0; i < n; i++) {
+		struct xdp_frame *xdpf = frames[i];
+
+		err = __virtnet_xdp_xmit_one(vi, sq, xdpf);
+		if (err) {
+			xdp_return_frame_rx_napi(xdpf);
+			drops++;
+		}
+	}
+	return n - drops;
 }
 
 static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
@@ -616,7 +648,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 			xdpf = convert_to_xdp_frame(&xdp);
 			if (unlikely(!xdpf))
 				goto err_xdp;
-			err = __virtnet_xdp_xmit(vi, xdpf);
+			err = __virtnet_xdp_tx_xmit(vi, xdpf);
 			if (unlikely(err)) {
 				trace_xdp_exception(vi->dev, xdp_prog, act);
 				goto err_xdp;
@@ -779,7 +811,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			xdpf = convert_to_xdp_frame(&xdp);
 			if (unlikely(!xdpf))
 				goto err_xdp;
-			err = __virtnet_xdp_xmit(vi, xdpf);
+			err = __virtnet_xdp_tx_xmit(vi, xdpf);
 			if (unlikely(err)) {
 				trace_xdp_exception(vi->dev, xdp_prog, act);
 				if (unlikely(xdp_page != page))
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a30435118530..1c92f3b63c70 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1165,9 +1165,13 @@ struct dev_ifalias {
  *	This function is used to set or query state related to XDP on the
  *	netdevice and manage BPF offload. See definition of
  *	enum bpf_netdev_command for details.
- * int (*ndo_xdp_xmit)(struct net_device *dev, struct xdp_frame *xdp);
- *	This function is used to submit a XDP packet for transmit on a
- *	netdevice.
+ * int (*ndo_xdp_xmit)(struct net_device *dev, int n, struct xdp_frame **xdp);
+ *	This function is used to submit @n XDP packets for transmit on a
+ *	netdevice. Returns the number of frames successfully transmitted;
+ *	frames that got dropped are freed/returned via xdp_return_frame().
+ *	A negative return value signals a general error invoking the ndo,
+ *	meaning no frames were xmit'ed and the core caller will free them.
+ *	TODO: Consider adding a flag to allow sending a flush operation.
  * void (*ndo_xdp_flush)(struct net_device *dev);
  *	This function is used to inform the driver to flush a particular
  *	xdp tx queue. Must be called on same CPU as xdp_xmit.
@@ -1355,8 +1359,8 @@ struct net_device_ops {
 						       int needed_headroom);
 	int			(*ndo_bpf)(struct net_device *dev,
 					   struct netdev_bpf *bpf);
-	int			(*ndo_xdp_xmit)(struct net_device *dev,
-						struct xdp_frame *xdp);
+	int			(*ndo_xdp_xmit)(struct net_device *dev, int n,
+						struct xdp_frame **xdp);
 	void			(*ndo_xdp_flush)(struct net_device *dev);
 };
 
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 0b689cf561c7..7ad779237ae8 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -104,6 +104,7 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
 }
 
 void xdp_return_frame(struct xdp_frame *xdpf);
+void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
 void xdp_return_buff(struct xdp_buff *xdp);
 
 int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index 2e9ef0650144..1ecf4c67fcf7 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -234,9 +234,9 @@ TRACE_EVENT(xdp_devmap_xmit,
 	TP_PROTO(const struct bpf_map *map, u32 map_index,
 		 int sent, int drops,
 		 const struct net_device *from_dev,
-		 const struct net_device *to_dev),
+		 const struct net_device *to_dev, int err),
 
-	TP_ARGS(map, map_index, sent, drops, from_dev, to_dev),
+	TP_ARGS(map, map_index, sent, drops, from_dev, to_dev, err),
 
 	TP_STRUCT__entry(
 		__field(int, map_id)
@@ -246,6 +246,7 @@ TRACE_EVENT(xdp_devmap_xmit,
 		__field(int, sent)
 		__field(int, from_ifindex)
 		__field(int, to_ifindex)
+		__field(int, err)
 	),
 
 	TP_fast_assign(
@@ -256,16 +257,17 @@ TRACE_EVENT(xdp_devmap_xmit,
 		__entry->sent		= sent;
 		__entry->from_ifindex	= from_dev->ifindex;
 		__entry->to_ifindex	= to_dev->ifindex;
+		__entry->err		= err;
 	),
 
 	TP_printk("ndo_xdp_xmit"
 		  " map_id=%d map_index=%d action=%s"
 		  " sent=%d drops=%d"
-		  " from_ifindex=%d to_ifindex=%d",
+		  " from_ifindex=%d to_ifindex=%d err=%d",
 		  __entry->map_id, __entry->map_index,
 		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
 		  __entry->sent, __entry->drops,
-		  __entry->from_ifindex, __entry->to_ifindex)
+		  __entry->from_ifindex, __entry->to_ifindex, __entry->err)
 );
 
 #endif /* _TRACE_XDP_H */
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 6f84100723b0..4dd8f0e3a8d9 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -222,7 +222,7 @@ static int bq_xmit_all(struct bpf_dtab_netdev *obj,
 			 struct xdp_bulk_queue *bq)
 {
 	struct net_device *dev = obj->dev;
-	int sent = 0, drops = 0;
+	int sent = 0, drops = 0, err = 0;
 	int i;
 
 	if (unlikely(!bq->count))
@@ -234,23 +234,32 @@ static int bq_xmit_all(struct bpf_dtab_netdev *obj,
 		prefetch(xdpf);
 	}
 
-	for (i = 0; i < bq->count; i++) {
-		struct xdp_frame *xdpf = bq->q[i];
-		int err;
-
-		err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
-		if (err) {
-			drops++;
-			xdp_return_frame(xdpf);
-		}
-		sent++;
+	sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q);
+	if (sent < 0) {
+		err = sent;
+		sent = 0;
+		goto error;
 	}
+	drops = bq->count - sent;
+out:
 	bq->count = 0;
 
 	trace_xdp_devmap_xmit(&obj->dtab->map, obj->bit,
-			      sent, drops, bq->dev_rx, dev);
+			      sent, drops, bq->dev_rx, dev, err);
 	bq->dev_rx = NULL;
 	return 0;
+error:
+	/* If ndo_xdp_xmit fails with an errno, no frames have been
+	 * xmit'ed and it's our responsibility to free them all.
+	 */
+	for (i = 0; i < bq->count; i++) {
+		struct xdp_frame *xdpf = bq->q[i];
+
+		/* RX path under NAPI protection, can return frames faster */
+		xdp_return_frame_rx_napi(xdpf);
+		drops++;
+	}
+	goto out;
 }
 
 /* __dev_map_flush is called from xdp_do_flush_map() which _must_ be signaled
diff --git a/net/core/filter.c b/net/core/filter.c
index 8b7924368dc1..547174d37f66 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3004,8 +3004,8 @@ static int __bpf_tx_xdp(struct net_device *dev,
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
 
-	err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
-	if (err)
+	err = dev->netdev_ops->ndo_xdp_xmit(dev, 1, &xdpf);
+	if (err <= 0)
 		return err;
 	dev->netdev_ops->ndo_xdp_flush(dev);
 	return 0;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index bf6758f74339..237d374e6ef7 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -308,7 +308,7 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
 }
 EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
 
-static void xdp_return(void *data, struct xdp_mem_info *mem)
+static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct)
 {
 	struct xdp_mem_allocator *xa;
 	struct page *page;
@@ -320,7 +320,7 @@ static void xdp_return(void *data, struct xdp_mem_info *mem)
 		xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
 		page = virt_to_head_page(data);
 		if (xa)
-			page_pool_put_page(xa->page_pool, page);
+			__page_pool_put_page(xa->page_pool, page, napi_direct);
 		else
 			put_page(page);
 		rcu_read_unlock();
@@ -340,12 +340,18 @@ static void xdp_return(void *data, struct xdp_mem_info *mem)
 
 void xdp_return_frame(struct xdp_frame *xdpf)
 {
-	xdp_return(xdpf->data, &xdpf->mem);
+	__xdp_return(xdpf->data, &xdpf->mem, false);
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame);
 
+void xdp_return_frame_rx_napi(struct xdp_frame *xdpf)
+{
+	__xdp_return(xdpf->data, &xdpf->mem, true);
+}
+EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
+
 void xdp_return_buff(struct xdp_buff *xdp)
 {
-	xdp_return(xdp->data, &xdp->rxq->mem);
+	__xdp_return(xdp->data, &xdp->rxq->mem, true);
 }
 EXPORT_SYMBOL_GPL(xdp_return_buff);
diff --git a/samples/bpf/xdp_monitor_kern.c b/samples/bpf/xdp_monitor_kern.c
index 2854aa0665ea..ad10fe700d7d 100644
--- a/samples/bpf/xdp_monitor_kern.c
+++ b/samples/bpf/xdp_monitor_kern.c
@@ -125,6 +125,7 @@ struct datarec {
 	u64 processed;
 	u64 dropped;
 	u64 info;
+	u64 err;
 };
 #define MAX_CPUS 64
 
@@ -228,6 +229,7 @@ struct devmap_xmit_ctx {
 	int sent;		//	offset:24; size:4; signed:1;
 	int from_ifindex;	//	offset:28; size:4; signed:1;
 	int to_ifindex;		//	offset:32; size:4; signed:1;
+	int err;		//	offset:36; size:4; signed:1;
 };
 
 SEC("tracepoint/xdp/xdp_devmap_xmit")
@@ -245,5 +247,13 @@ int trace_xdp_devmap_xmit(struct devmap_xmit_ctx *ctx)
 	/* Record bulk events, then userspace can calc average bulk size */
 	rec->info += 1;
 
+	/* Record error cases, where no frames were sent */
+	if (ctx->err)
+		rec->err++;
+
+	/* Catch API error where drv ndo_xdp_xmit reports sent > count */
+	if (ctx->drops < 0)
+		rec->err++;
+
 	return 1;
 }
diff --git a/samples/bpf/xdp_monitor_user.c b/samples/bpf/xdp_monitor_user.c
index 4aaf1ab1927d..0ec7967dd32a 100644
--- a/samples/bpf/xdp_monitor_user.c
+++ b/samples/bpf/xdp_monitor_user.c
@@ -117,6 +117,7 @@ struct datarec {
 	__u64 processed;
 	__u64 dropped;
 	__u64 info;
+	__u64 err;
 };
 #define MAX_CPUS 64
 
@@ -152,6 +153,7 @@ static bool map_collect_record(int fd, __u32 key, struct record *rec)
 	__u64 sum_processed = 0;
 	__u64 sum_dropped = 0;
 	__u64 sum_info = 0;
+	__u64 sum_err = 0;
 	int i;
 
 	if ((bpf_map_lookup_elem(fd, &key, values)) != 0) {
@@ -170,10 +172,13 @@ static bool map_collect_record(int fd, __u32 key, struct record *rec)
 		sum_dropped        += values[i].dropped;
 		rec->cpu[i].info = values[i].info;
 		sum_info        += values[i].info;
+		rec->cpu[i].err = values[i].err;
+		sum_err        += values[i].err;
 	}
 	rec->total.processed = sum_processed;
 	rec->total.dropped   = sum_dropped;
 	rec->total.info      = sum_info;
+	rec->total.err       = sum_err;
 	return true;
 }
 
@@ -274,6 +279,18 @@ static double calc_info(struct datarec *r, struct datarec *p, double period)
 	return pps;
 }
 
+static double calc_err(struct datarec *r, struct datarec *p, double period)
+{
+	__u64 packets = 0;
+	double pps = 0;
+
+	if (period > 0) {
+		packets = r->err - p->err;
+		pps = packets / period;
+	}
+	return pps;
+}
+
 static void stats_print(struct stats_record *stats_rec,
 			struct stats_record *stats_prev,
 			bool err_only)
@@ -412,11 +429,12 @@ static void stats_print(struct stats_record *stats_rec,
 
 	/* devmap ndo_xdp_xmit stats */
 	{
-		char *fmt1 = "%-15s %-7d %'-12.0f %'-12.0f %'-10.2f %s\n";
-		char *fmt2 = "%-15s %-7s %'-12.0f %'-12.0f %'-10.2f %s\n";
+		char *fmt1 = "%-15s %-7d %'-12.0f %'-12.0f %'-10.2f %s %s\n";
+		char *fmt2 = "%-15s %-7s %'-12.0f %'-12.0f %'-10.2f %s %s\n";
 		struct record *rec, *prev;
-		double drop, info;
+		double drop, info, err;
 		char *i_str = "";
+		char *err_str = "";
 
 		rec  =  &stats_rec->xdp_devmap_xmit;
 		prev = &stats_prev->xdp_devmap_xmit;
@@ -428,22 +446,29 @@ static void stats_print(struct stats_record *stats_rec,
 			pps  = calc_pps(r, p, t);
 			drop = calc_drop(r, p, t);
 			info = calc_info(r, p, t);
+			err  = calc_err(r, p, t);
 			if (info > 0) {
 				i_str = "bulk-average";
 				info = (pps+drop) / info; /* calc avg bulk */
 			}
+			if (err > 0)
+				err_str = "drv-err";
 			if (pps > 0 || drop > 0)
 				printf(fmt1, "devmap-xmit",
-				       i, pps, drop, info, i_str);
+				       i, pps, drop, info, i_str, err_str);
 		}
 		pps = calc_pps(&rec->total, &prev->total, t);
 		drop = calc_drop(&rec->total, &prev->total, t);
 		info = calc_info(&rec->total, &prev->total, t);
+		err  = calc_err(&rec->total, &prev->total, t);
 		if (info > 0) {
 			i_str = "bulk-average";
 			info = (pps+drop) / info; /* calc avg bulk */
 		}
-		printf(fmt2, "devmap-xmit", "total", pps, drop, info, i_str);
+		if (err > 0)
+			err_str = "drv-err";
+		printf(fmt2, "devmap-xmit", "total", pps, drop,
+		       info, i_str, err_str);
 	}
 
 	printf("\n");
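
The return-value contract that this patch gives ndo_xdp_xmit can be
sketched in user space roughly as follows. This is only an illustrative
model, not kernel code: struct mock_dev, struct xmit_result,
mock_return_frame(), mock_xdp_xmit() and mock_bulk_xmit() are all
hypothetical names, and struct xdp_frame is reduced to a single flag.
The driver side follows the shape of the ixgbe loop above, and the
caller side follows the shape of bq_xmit_all().

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel type; not the real definition. */
struct xdp_frame {
	int returned;	/* set once the frame has been freed/returned */
};

/* Hypothetical device model, assumed for illustration only. */
struct mock_dev {
	int up;		/* 0 models a general error such as -ENETDOWN */
	int ring_free;	/* free TX slots left in the mock ring */
};

/* What the caller learns from one bulk transmit (hypothetical). */
struct xmit_result {
	int sent;
	int drops;
	int err;
};

/* Stand-in for xdp_return_frame_rx_napi(): only marks the frame. */
static void mock_return_frame(struct xdp_frame *xdpf)
{
	xdpf->returned = 1;
}

/* Driver side: returns frames sent, or a negative errno. */
static int mock_xdp_xmit(struct mock_dev *dev, int n,
			 struct xdp_frame **frames)
{
	int drops = 0;
	int i;

	if (!dev->up)
		return -ENETDOWN;	/* nothing consumed, caller frees all */

	for (i = 0; i < n; i++) {
		if (dev->ring_free > 0) {
			dev->ring_free--;	/* frame queued for TX */
		} else {
			/* The driver itself returns the frames it drops. */
			mock_return_frame(frames[i]);
			drops++;
		}
	}
	return n - drops;
}

/* Caller side: on a negative return, the caller frees every frame. */
static struct xmit_result mock_bulk_xmit(struct mock_dev *dev, int n,
					 struct xdp_frame **frames)
{
	struct xmit_result res = { 0, 0, 0 };
	int sent = mock_xdp_xmit(dev, n, frames);
	int i;

	if (sent < 0) {
		res.err = sent;
		for (i = 0; i < n; i++) {
			mock_return_frame(frames[i]);
			res.drops++;
		}
		return res;
	}
	res.sent = sent;
	res.drops = n - sent;
	return res;
}
```

In this model, passing four frames to a device that is up and has three
free ring slots gives sent=3, drops=1, err=0, and only the fourth frame
is marked as returned. With the device down, the caller sees
err=-ENETDOWN, sent=0, drops=4, and has returned all four frames itself.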

^ permalink raw reply related

* [PATCH v2 01/11] docs: can.rst: fix a footnote reference
From: Mauro Carvalho Chehab @ 2018-05-09 13:18 UTC (permalink / raw)
  To: Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, Mauro Carvalho Chehab, linux-kernel,
	Jonathan Corbet, Oliver Hartkopp, Marc Kleine-Budde,
	David S. Miller, linux-can, netdev
In-Reply-To: <cover.1525870886.git.mchehab+samsung@kernel.org>

As stated at:
	http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#footnotes

A footnote should contain either a number, a reference or
an auto number, e.g.:
        [1], [#f1] or [#].

While using [*] accidentally works for html, it fails for other
document outputs. In particular, it causes an error with LaTeX
output, causing all books after networking to not be built.

So, replace it with valid syntax.

Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/networking/can.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/can.rst b/Documentation/networking/can.rst
index d23c51abf8c6..2fd0b51a8c52 100644
--- a/Documentation/networking/can.rst
+++ b/Documentation/networking/can.rst
@@ -164,7 +164,7 @@ The Linux network devices (by default) just can handle the
 transmission and reception of media dependent frames. Due to the
 arbitration on the CAN bus the transmission of a low prio CAN-ID
 may be delayed by the reception of a high prio CAN frame. To
-reflect the correct [*]_ traffic on the node the loopback of the sent
+reflect the correct [#f1]_ traffic on the node the loopback of the sent
 data has to be performed right after a successful transmission. If
 the CAN network interface is not capable of performing the loopback for
 some reason the SocketCAN core can do this task as a fallback solution.
@@ -175,7 +175,7 @@ networking behaviour for CAN applications. Due to some requests from
 the RT-SocketCAN group the loopback optionally may be disabled for each
 separate socket. See sockopts from the CAN RAW sockets in :ref:`socketcan-raw-sockets`.
 
-.. [*] you really like to have this when you're running analyser
+.. [#f1] you really like to have this when you're running analyser
        tools like 'candump' or 'cansniffer' on the (same) node.
 
 
-- 
2.17.0

^ permalink raw reply related
