Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v5 2/3] media: rc: introduce BPF_PROG_LIRC_MODE2
From: Sean Young @ 2018-06-05 10:16 UTC (permalink / raw)
  To: Matthias Reichl, linux-media, linux-kernel, Alexei Starovoitov,
	Mauro Carvalho Chehab, Daniel Borkmann, netdev, Devin Heitmueller,
	Y Song, Quentin Monnet
In-Reply-To: <20180604174730.sctfoklq7klswebp@camel2.lan>

On Mon, Jun 04, 2018 at 07:47:30PM +0200, Matthias Reichl wrote:
> Hi Sean,
> 
> I finally found the time to test your patch series and noticed
> 2 issues - comments are inline
> 
> On Sun, May 27, 2018 at 12:24:09PM +0100, Sean Young wrote:
> > diff --git a/drivers/media/rc/Kconfig b/drivers/media/rc/Kconfig
> > index eb2c3b6eca7f..d5b35a6ba899 100644
> > --- a/drivers/media/rc/Kconfig
> > +++ b/drivers/media/rc/Kconfig
> > @@ -25,6 +25,19 @@ config LIRC
> >  	   passes raw IR to and from userspace, which is needed for
> >  	   IR transmitting (aka "blasting") and for the lirc daemon.
> >  
> > +config BPF_LIRC_MODE2
> > +	bool "Support for eBPF programs attached to lirc devices"
> > +	depends on BPF_SYSCALL
> > +	depends on RC_CORE=y
> 
> Requiring rc-core to be built into the kernel could become
> problematic in the future for people using media_build.
> 
> Currently the whole media tree (including rc-core) can be built
> as modules so DVB and IR drivers can be replaced by newer versions.
> But with rc-core in the kernel things could easily break if internal
> data structures are changed.
> 
> Maybe we should add a small layer with a stable API/ABI between
> bpf-lirc and rc-core to decouple them? Or would it be possible
> to build rc-core with bpf support as a module?

Unfortunately bpf cannot be built as a module.

> > +	depends on LIRC
> > +	help
> > +	   Allow attaching eBPF programs to a lirc device using the bpf(2)
> > +	   syscall command BPF_PROG_ATTACH. This is supported for raw IR
> > +	   receivers.
> > +
> > +	   These eBPF programs can be used to decode IR into scancodes, for
> > +	   IR protocols not supported by the kernel decoders.
> > +
> >  menuconfig RC_DECODERS
> >  	bool "Remote controller decoders"
> >  	depends on RC_CORE
> > [...]
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 388d4feda348..3c104113d040 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -11,6 +11,7 @@
> >   */
> >  #include <linux/bpf.h>
> >  #include <linux/bpf_trace.h>
> > +#include <linux/bpf_lirc.h>
> >  #include <linux/btf.h>
> >  #include <linux/syscalls.h>
> >  #include <linux/slab.h>
> > @@ -1578,6 +1579,8 @@ static int bpf_prog_attach(const union bpf_attr *attr)
> >  	case BPF_SK_SKB_STREAM_PARSER:
> >  	case BPF_SK_SKB_STREAM_VERDICT:
> >  		return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, true);
> > +	case BPF_LIRC_MODE2:
> > +		return lirc_prog_attach(attr);
> >  	default:
> >  		return -EINVAL;
> >  	}
> > @@ -1648,6 +1651,8 @@ static int bpf_prog_detach(const union bpf_attr *attr)
> >  	case BPF_SK_SKB_STREAM_PARSER:
> >  	case BPF_SK_SKB_STREAM_VERDICT:
> >  		return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, false);
> > +	case BPF_LIRC_MODE2:
> > +		return lirc_prog_detach(attr);
> >  	default:
> >  		return -EINVAL;
> >  	}
> > @@ -1695,6 +1700,8 @@ static int bpf_prog_query(const union bpf_attr *attr,
> >  	case BPF_CGROUP_SOCK_OPS:
> >  	case BPF_CGROUP_DEVICE:
> >  		break;
> > +	case BPF_LIRC_MODE2:
> > +		return lirc_prog_query(attr, uattr);
> 
> When testing this patch series I was wondering why I always got
> -EINVAL when trying to query the registered programs.
> 
> Closer inspection revealed that bpf_prog_attach/detach/query and
> calls to them in the bpf syscall are in "#ifdef CONFIG_CGROUP_BPF"
> blocks - and as I built the kernel without CONFIG_CGROUP_BPF
> BPF_PROG_ATTACH/DETACH/QUERY weren't handled in the syscall switch
> and I got -EINVAL from the bpf syscall function.
> 
> I haven't checked in detail yet, but it looks to me like
> bpf_prog_attach/detach/query could always be built (or when
> either cgroup bpf or lirc bpf are enabled) and the #ifdefs moved
> inside the switch(). So lirc bpf could be used without cgroup bpf.
> Or am I missing something?

You are right, this features depends on CONFIG_CGROUP_BPF right now. This
also affects the BPF_SK_MSG_VERDICT, BPF_SK_SKB_STREAM_VERDICT and
BPF_SK_SKB_STREAM_PARSER type bpf attachments, and as far as I know
these shouldn't depend on CONFIG_CGROUP_BPF either.


Sean

^ permalink raw reply

* [PATCH net-next] qed*: Utilize FW 8.37.2.0
From: Michal Kalderon @ 2018-06-05 10:11 UTC (permalink / raw)
  To: michal.kalderon, davem
  Cc: netdev, linux-rdma, linux-scsi, Michal Kalderon, Ariel Elior,
	Manish Rangankar

This FW contains several fixes and features.

RDMA
- Several modifications and fixes for Memory Windows
- drop vlan and tcp timestamp from mss calculation in driver for
  this FW
- Fix SQ completion flow when local ack timeout is infinite
- Modifications in t10dif support

ETH
- Fix aRFS for tunneled traffic without inner IP.
- Fix chip configuration which may fail under heavy traffic conditions.
- Support receiving any-VNI in VXLAN and GENEVE RX classification.

iSCSI / FcoE
- Fix iSCSI recovery flow
- Drop vlan and tcp timestamp from mss calc for fw 8.37.2.0

Misc
- Several registers (split registers) won't read correctly with
  ethtool -d

Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
---
 drivers/infiniband/hw/qedr/qedr_hsi_rdma.h         | 139 +++---
 drivers/infiniband/hw/qedr/verbs.c                 |   4 +-
 drivers/net/ethernet/qlogic/qed/qed_debug.c        | 492 ++++++++++++---------
 drivers/net/ethernet/qlogic/qed/qed_dev.c          |   2 +-
 drivers/net/ethernet/qlogic/qed/qed_hsi.h          | 462 ++++++++++++-------
 drivers/net/ethernet/qlogic/qed/qed_hw.c           |  20 +
 drivers/net/ethernet/qlogic/qed/qed_hw.h           |  12 +
 .../net/ethernet/qlogic/qed/qed_init_fw_funcs.c    |  50 ++-
 drivers/net/ethernet/qlogic/qed/qed_iwarp.c        |  13 +-
 drivers/net/ethernet/qlogic/qed/qed_l2.c           |   3 +
 drivers/net/ethernet/qlogic/qed/qed_l2.h           |   1 +
 drivers/net/ethernet/qlogic/qed/qed_rdma.c         |   8 +-
 drivers/net/ethernet/qlogic/qed/qed_reg_addr.h     |   3 +-
 drivers/net/ethernet/qlogic/qed/qed_roce.c         |  31 +-
 drivers/scsi/qedi/qedi_iscsi.c                     |   6 -
 include/linux/qed/common_hsi.h                     |   4 +-
 include/linux/qed/iscsi_common.h                   |   8 +-
 include/linux/qed/qed_rdma_if.h                    |   4 +-
 include/linux/qed/roce_common.h                    |   1 +
 19 files changed, 727 insertions(+), 536 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/qedr_hsi_rdma.h b/drivers/infiniband/hw/qedr/qedr_hsi_rdma.h
index b816c80..7e1f702 100644
--- a/drivers/infiniband/hw/qedr/qedr_hsi_rdma.h
+++ b/drivers/infiniband/hw/qedr/qedr_hsi_rdma.h
@@ -116,6 +116,7 @@ enum rdma_cqe_requester_status_enum {
 	RDMA_CQE_REQ_STS_TRANSPORT_RETRY_CNT_ERR,
 	RDMA_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR,
 	RDMA_CQE_REQ_STS_XRC_VOILATION_ERR,
+	RDMA_CQE_REQ_STS_SIG_ERR,
 	MAX_RDMA_CQE_REQUESTER_STATUS_ENUM
 };
 
@@ -152,12 +153,12 @@ struct rdma_rq_sge {
 	struct regpair addr;
 	__le32 length;
 	__le32 flags;
-#define RDMA_RQ_SGE_L_KEY_MASK      0x3FFFFFF
-#define RDMA_RQ_SGE_L_KEY_SHIFT     0
+#define RDMA_RQ_SGE_L_KEY_LO_MASK   0x3FFFFFF
+#define RDMA_RQ_SGE_L_KEY_LO_SHIFT  0
 #define RDMA_RQ_SGE_NUM_SGES_MASK   0x7
 #define RDMA_RQ_SGE_NUM_SGES_SHIFT  26
-#define RDMA_RQ_SGE_RESERVED0_MASK  0x7
-#define RDMA_RQ_SGE_RESERVED0_SHIFT 29
+#define RDMA_RQ_SGE_L_KEY_HI_MASK   0x7
+#define RDMA_RQ_SGE_L_KEY_HI_SHIFT  29
 };
 
 struct rdma_srq_sge {
@@ -241,18 +242,39 @@ enum rdma_dif_io_direction_flg {
 	MAX_RDMA_DIF_IO_DIRECTION_FLG
 };
 
-/* RDMA DIF Runt Result Structure */
-struct rdma_dif_runt_result {
-	__le16 guard_tag;
-	__le16 reserved[3];
+struct rdma_dif_params {
+	__le32 base_ref_tag;
+	__le16 app_tag;
+	__le16 app_tag_mask;
+	__le16 runt_crc_value;
+	__le16 flags;
+#define RDMA_DIF_PARAMS_IO_DIRECTION_FLG_MASK    0x1
+#define RDMA_DIF_PARAMS_IO_DIRECTION_FLG_SHIFT   0
+#define RDMA_DIF_PARAMS_BLOCK_SIZE_MASK          0x1
+#define RDMA_DIF_PARAMS_BLOCK_SIZE_SHIFT         1
+#define RDMA_DIF_PARAMS_RUNT_VALID_FLG_MASK      0x1
+#define RDMA_DIF_PARAMS_RUNT_VALID_FLG_SHIFT     2
+#define RDMA_DIF_PARAMS_VALIDATE_CRC_GUARD_MASK  0x1
+#define RDMA_DIF_PARAMS_VALIDATE_CRC_GUARD_SHIFT 3
+#define RDMA_DIF_PARAMS_VALIDATE_REF_TAG_MASK    0x1
+#define RDMA_DIF_PARAMS_VALIDATE_REF_TAG_SHIFT   4
+#define RDMA_DIF_PARAMS_VALIDATE_APP_TAG_MASK    0x1
+#define RDMA_DIF_PARAMS_VALIDATE_APP_TAG_SHIFT   5
+#define RDMA_DIF_PARAMS_CRC_SEED_MASK            0x1
+#define RDMA_DIF_PARAMS_CRC_SEED_SHIFT           6
+#define RDMA_DIF_PARAMS_RX_REF_TAG_CONST_MASK    0x1
+#define RDMA_DIF_PARAMS_RX_REF_TAG_CONST_SHIFT   7
+#define RDMA_DIF_PARAMS_BLOCK_GUARD_TYPE_MASK    0x1
+#define RDMA_DIF_PARAMS_BLOCK_GUARD_TYPE_SHIFT   8
+#define RDMA_DIF_PARAMS_APP_ESCAPE_MASK          0x1
+#define RDMA_DIF_PARAMS_APP_ESCAPE_SHIFT         9
+#define RDMA_DIF_PARAMS_REF_ESCAPE_MASK          0x1
+#define RDMA_DIF_PARAMS_REF_ESCAPE_SHIFT         10
+#define RDMA_DIF_PARAMS_RESERVED4_MASK           0x1F
+#define RDMA_DIF_PARAMS_RESERVED4_SHIFT          11
+	__le32 reserved5;
 };
 
-/* Memory window type enumeration */
-enum rdma_mw_type {
-	RDMA_MW_TYPE_1,
-	RDMA_MW_TYPE_2A,
-	MAX_RDMA_MW_TYPE
-};
 
 struct rdma_sq_atomic_wqe {
 	__le32 reserved1;
@@ -334,17 +356,17 @@ struct rdma_sq_bind_wqe {
 #define RDMA_SQ_BIND_WQE_SE_FLG_SHIFT        3
 #define RDMA_SQ_BIND_WQE_INLINE_FLG_MASK     0x1
 #define RDMA_SQ_BIND_WQE_INLINE_FLG_SHIFT    4
-#define RDMA_SQ_BIND_WQE_RESERVED0_MASK      0x7
-#define RDMA_SQ_BIND_WQE_RESERVED0_SHIFT     5
+#define RDMA_SQ_BIND_WQE_DIF_ON_HOST_FLG_MASK  0x1
+#define RDMA_SQ_BIND_WQE_DIF_ON_HOST_FLG_SHIFT 5
+#define RDMA_SQ_BIND_WQE_RESERVED0_MASK      0x3
+#define RDMA_SQ_BIND_WQE_RESERVED0_SHIFT     6
 	u8 wqe_size;
 	u8 prev_wqe_size;
 	u8 bind_ctrl;
 #define RDMA_SQ_BIND_WQE_ZERO_BASED_MASK     0x1
 #define RDMA_SQ_BIND_WQE_ZERO_BASED_SHIFT    0
-#define RDMA_SQ_BIND_WQE_MW_TYPE_MASK        0x1
-#define RDMA_SQ_BIND_WQE_MW_TYPE_SHIFT       1
-#define RDMA_SQ_BIND_WQE_RESERVED1_MASK      0x3F
-#define RDMA_SQ_BIND_WQE_RESERVED1_SHIFT     2
+#define RDMA_SQ_BIND_WQE_RESERVED1_MASK        0x7F
+#define RDMA_SQ_BIND_WQE_RESERVED1_SHIFT       1
 	u8 access_ctrl;
 #define RDMA_SQ_BIND_WQE_REMOTE_READ_MASK    0x1
 #define RDMA_SQ_BIND_WQE_REMOTE_READ_SHIFT   0
@@ -363,6 +385,7 @@ struct rdma_sq_bind_wqe {
 	__le32 length_lo;
 	__le32 parent_l_key;
 	__le32 reserved4;
+	struct rdma_dif_params dif_params;
 };
 
 /* First element (16 bytes) of bind wqe */
@@ -392,10 +415,8 @@ struct rdma_sq_bind_wqe_2nd {
 	u8 bind_ctrl;
 #define RDMA_SQ_BIND_WQE_2ND_ZERO_BASED_MASK     0x1
 #define RDMA_SQ_BIND_WQE_2ND_ZERO_BASED_SHIFT    0
-#define RDMA_SQ_BIND_WQE_2ND_MW_TYPE_MASK        0x1
-#define RDMA_SQ_BIND_WQE_2ND_MW_TYPE_SHIFT       1
-#define RDMA_SQ_BIND_WQE_2ND_RESERVED1_MASK      0x3F
-#define RDMA_SQ_BIND_WQE_2ND_RESERVED1_SHIFT     2
+#define RDMA_SQ_BIND_WQE_2ND_RESERVED1_MASK      0x7F
+#define RDMA_SQ_BIND_WQE_2ND_RESERVED1_SHIFT     1
 	u8 access_ctrl;
 #define RDMA_SQ_BIND_WQE_2ND_REMOTE_READ_MASK    0x1
 #define RDMA_SQ_BIND_WQE_2ND_REMOTE_READ_SHIFT   0
@@ -416,6 +437,11 @@ struct rdma_sq_bind_wqe_2nd {
 	__le32 reserved4;
 };
 
+/* Third element (16 bytes) of bind wqe */
+struct rdma_sq_bind_wqe_3rd {
+	struct rdma_dif_params dif_params;
+};
+
 /* Structure with only the SQ WQE common
  * fields. Size is of one SQ element (16B)
  */
@@ -486,30 +512,6 @@ struct rdma_sq_fmr_wqe {
 	u8 length_hi;
 	__le32 length_lo;
 	struct regpair pbl_addr;
-	__le32 dif_base_ref_tag;
-	__le16 dif_app_tag;
-	__le16 dif_app_tag_mask;
-	__le16 dif_runt_crc_value;
-	__le16 dif_flags;
-#define RDMA_SQ_FMR_WQE_DIF_IO_DIRECTION_FLG_MASK	0x1
-#define RDMA_SQ_FMR_WQE_DIF_IO_DIRECTION_FLG_SHIFT	0
-#define RDMA_SQ_FMR_WQE_DIF_BLOCK_SIZE_MASK		0x1
-#define RDMA_SQ_FMR_WQE_DIF_BLOCK_SIZE_SHIFT		1
-#define RDMA_SQ_FMR_WQE_DIF_RUNT_VALID_FLG_MASK		0x1
-#define RDMA_SQ_FMR_WQE_DIF_RUNT_VALID_FLG_SHIFT	2
-#define RDMA_SQ_FMR_WQE_DIF_VALIDATE_CRC_GUARD_MASK	0x1
-#define RDMA_SQ_FMR_WQE_DIF_VALIDATE_CRC_GUARD_SHIFT	3
-#define RDMA_SQ_FMR_WQE_DIF_VALIDATE_REF_TAG_MASK	0x1
-#define RDMA_SQ_FMR_WQE_DIF_VALIDATE_REF_TAG_SHIFT	4
-#define RDMA_SQ_FMR_WQE_DIF_VALIDATE_APP_TAG_MASK	0x1
-#define RDMA_SQ_FMR_WQE_DIF_VALIDATE_APP_TAG_SHIFT	5
-#define RDMA_SQ_FMR_WQE_DIF_CRC_SEED_MASK		0x1
-#define RDMA_SQ_FMR_WQE_DIF_CRC_SEED_SHIFT		6
-#define RDMA_SQ_FMR_WQE_DIF_RX_REF_TAG_CONST_MASK	0x1
-#define RDMA_SQ_FMR_WQE_DIF_RX_REF_TAG_CONST_SHIFT	7
-#define RDMA_SQ_FMR_WQE_RESERVED4_MASK			0xFF
-#define RDMA_SQ_FMR_WQE_RESERVED4_SHIFT			8
-	__le32 reserved5;
 };
 
 /* First element (16 bytes) of fmr wqe */
@@ -566,33 +568,6 @@ struct rdma_sq_fmr_wqe_2nd {
 	struct regpair pbl_addr;
 };
 
-/* Third element (16 bytes) of fmr wqe */
-struct rdma_sq_fmr_wqe_3rd {
-	__le32 dif_base_ref_tag;
-	__le16 dif_app_tag;
-	__le16 dif_app_tag_mask;
-	__le16 dif_runt_crc_value;
-	__le16 dif_flags;
-#define RDMA_SQ_FMR_WQE_3RD_DIF_IO_DIRECTION_FLG_MASK		0x1
-#define RDMA_SQ_FMR_WQE_3RD_DIF_IO_DIRECTION_FLG_SHIFT		0
-#define RDMA_SQ_FMR_WQE_3RD_DIF_BLOCK_SIZE_MASK			0x1
-#define RDMA_SQ_FMR_WQE_3RD_DIF_BLOCK_SIZE_SHIFT		1
-#define RDMA_SQ_FMR_WQE_3RD_DIF_RUNT_VALID_FLG_MASK		0x1
-#define RDMA_SQ_FMR_WQE_3RD_DIF_RUNT_VALID_FLG_SHIFT		2
-#define RDMA_SQ_FMR_WQE_3RD_DIF_VALIDATE_CRC_GUARD_MASK		0x1
-#define RDMA_SQ_FMR_WQE_3RD_DIF_VALIDATE_CRC_GUARD_SHIFT	3
-#define RDMA_SQ_FMR_WQE_3RD_DIF_VALIDATE_REF_TAG_MASK		0x1
-#define RDMA_SQ_FMR_WQE_3RD_DIF_VALIDATE_REF_TAG_SHIFT		4
-#define RDMA_SQ_FMR_WQE_3RD_DIF_VALIDATE_APP_TAG_MASK		0x1
-#define RDMA_SQ_FMR_WQE_3RD_DIF_VALIDATE_APP_TAG_SHIFT		5
-#define RDMA_SQ_FMR_WQE_3RD_DIF_CRC_SEED_MASK			0x1
-#define RDMA_SQ_FMR_WQE_3RD_DIF_CRC_SEED_SHIFT			6
-#define RDMA_SQ_FMR_WQE_3RD_DIF_RX_REF_TAG_CONST_MASK		0x1
-#define RDMA_SQ_FMR_WQE_3RD_DIF_RX_REF_TAG_CONST_SHIFT		7
-#define RDMA_SQ_FMR_WQE_3RD_RESERVED4_MASK			0xFF
-#define RDMA_SQ_FMR_WQE_RESERVED4_SHIFT				8
-	__le32 reserved5;
-};
 
 struct rdma_sq_local_inv_wqe {
 	struct regpair reserved;
@@ -637,8 +612,8 @@ struct rdma_sq_rdma_wqe {
 #define RDMA_SQ_RDMA_WQE_DIF_ON_HOST_FLG_SHIFT	5
 #define RDMA_SQ_RDMA_WQE_READ_INV_FLG_MASK	0x1
 #define RDMA_SQ_RDMA_WQE_READ_INV_FLG_SHIFT	6
-#define RDMA_SQ_RDMA_WQE_RESERVED0_MASK		0x1
-#define RDMA_SQ_RDMA_WQE_RESERVED0_SHIFT	7
+#define RDMA_SQ_RDMA_WQE_RESERVED1_MASK        0x1
+#define RDMA_SQ_RDMA_WQE_RESERVED1_SHIFT       7
 	u8 wqe_size;
 	u8 prev_wqe_size;
 	struct regpair remote_va;
@@ -646,13 +621,9 @@ struct rdma_sq_rdma_wqe {
 	u8 dif_flags;
 #define RDMA_SQ_RDMA_WQE_DIF_BLOCK_SIZE_MASK            0x1
 #define RDMA_SQ_RDMA_WQE_DIF_BLOCK_SIZE_SHIFT           0
-#define RDMA_SQ_RDMA_WQE_DIF_FIRST_RDMA_IN_IO_FLG_MASK  0x1
-#define RDMA_SQ_RDMA_WQE_DIF_FIRST_RDMA_IN_IO_FLG_SHIFT 1
-#define RDMA_SQ_RDMA_WQE_DIF_LAST_RDMA_IN_IO_FLG_MASK   0x1
-#define RDMA_SQ_RDMA_WQE_DIF_LAST_RDMA_IN_IO_FLG_SHIFT  2
-#define RDMA_SQ_RDMA_WQE_RESERVED1_MASK                 0x1F
-#define RDMA_SQ_RDMA_WQE_RESERVED1_SHIFT                3
-	u8 reserved2[3];
+#define RDMA_SQ_RDMA_WQE_RESERVED2_MASK        0x7F
+#define RDMA_SQ_RDMA_WQE_RESERVED2_SHIFT       1
+	u8 reserved3[3];
 };
 
 /* First element (16 bytes) of rdma wqe */
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 3f9afc0..e2caabb 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -3276,7 +3276,7 @@ int qedr_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr,
 				SET_FIELD(flags, RDMA_RQ_SGE_NUM_SGES,
 					  wr->num_sge);
 
-			SET_FIELD(flags, RDMA_RQ_SGE_L_KEY,
+			SET_FIELD(flags, RDMA_RQ_SGE_L_KEY_LO,
 				  wr->sg_list[i].lkey);
 
 			RQ_SGE_SET(rqe, wr->sg_list[i].addr,
@@ -3295,7 +3295,7 @@ int qedr_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr,
 			/* First one must include the number
 			 * of SGE in the list
 			 */
-			SET_FIELD(flags, RDMA_RQ_SGE_L_KEY, 0);
+			SET_FIELD(flags, RDMA_RQ_SGE_L_KEY_LO, 0);
 			SET_FIELD(flags, RDMA_RQ_SGE_NUM_SGES, 1);
 
 			RQ_SGE_SET(rqe, 0, 0, flags);
diff --git a/drivers/net/ethernet/qlogic/qed/qed_debug.c b/drivers/net/ethernet/qlogic/qed/qed_debug.c
index 39124b5..b9ec460 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_debug.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_debug.c
@@ -183,16 +183,9 @@ enum platform_ids {
 	MAX_PLATFORM_IDS
 };
 
-struct chip_platform_defs {
-	u8 num_ports;
-	u8 num_pfs;
-	u8 num_vfs;
-};
-
 /* Chip constant definitions */
 struct chip_defs {
 	const char *name;
-	struct chip_platform_defs per_platform[MAX_PLATFORM_IDS];
 };
 
 /* Platform constant definitions */
@@ -317,6 +310,11 @@ struct phy_defs {
 	u32 tbus_data_hi_addr;
 };
 
+/* Split type definitions */
+struct split_type_defs {
+	const char *name;
+};
+
 /******************************** Constants **********************************/
 
 #define MAX_LCIDS			320
@@ -469,21 +467,9 @@ static struct dbg_array s_dbg_arrays[MAX_BIN_DBG_BUFFER_TYPE] = { {NULL} };
 
 /* Chip constant definitions array */
 static struct chip_defs s_chip_defs[MAX_CHIP_IDS] = {
-	{ "bb",
-	  {{MAX_NUM_PORTS_BB, MAX_NUM_PFS_BB, MAX_NUM_VFS_BB},
-	   {0, 0, 0},
-	   {0, 0, 0},
-	   {0, 0, 0} } },
-	{ "ah",
-	  {{MAX_NUM_PORTS_K2, MAX_NUM_PFS_K2, MAX_NUM_VFS_K2},
-	   {0, 0, 0},
-	   {0, 0, 0},
-	   {0, 0, 0} } },
-	{ "reserved",
-	   {{0, 0, 0},
-	   {0, 0, 0},
-	   {0, 0, 0},
-	   {0, 0, 0} } }
+	{"bb"},
+	{"ah"},
+	{"reserved"},
 };
 
 /* Storm constant definitions array */
@@ -1588,7 +1574,7 @@ static struct grc_param_defs s_grc_param_defs[] = {
 	{{0, 0, 0}, 0, 1, false, false, 0, 1},
 
 	/* DBG_GRC_PARAM_DUMP_BMB */
-	{{0, 0, 0}, 0, 1, false, false, 0, 1},
+	{{0, 0, 0}, 0, 1, false, false, 0, 0},
 
 	/* DBG_GRC_PARAM_DUMP_NIG */
 	{{1, 1, 1}, 0, 1, false, false, 0, 1},
@@ -1745,6 +1731,23 @@ static struct phy_defs s_phy_defs[] = {
 	 PHY_PCIE_IP_REG_AHB_CMU_CSR_0_X131_K2_E5},
 };
 
+static struct split_type_defs s_split_type_defs[] = {
+	/* SPLIT_TYPE_NONE */
+	{"eng"},
+
+	/* SPLIT_TYPE_PORT */
+	{"port"},
+
+	/* SPLIT_TYPE_PF */
+	{"pf"},
+
+	/* SPLIT_TYPE_PORT_PF */
+	{"port"},
+
+	/* SPLIT_TYPE_VF */
+	{"vf"}
+};
+
 /**************************** Private Functions ******************************/
 
 /* Reads and returns a single dword from the specified unaligned buffer */
@@ -1781,28 +1784,68 @@ static enum dbg_status qed_dbg_dev_init(struct qed_hwfn *p_hwfn,
 					struct qed_ptt *p_ptt)
 {
 	struct dbg_tools_data *dev_data = &p_hwfn->dbg_info;
+	u8 num_pfs = 0, max_pfs_per_port = 0;
 
 	if (dev_data->initialized)
 		return DBG_STATUS_OK;
 
+	/* Set chip */
 	if (QED_IS_K2(p_hwfn->cdev)) {
 		dev_data->chip_id = CHIP_K2;
 		dev_data->mode_enable[MODE_K2] = 1;
+		dev_data->num_vfs = MAX_NUM_VFS_K2;
+		num_pfs = MAX_NUM_PFS_K2;
+		max_pfs_per_port = MAX_NUM_PFS_K2 / 2;
 	} else if (QED_IS_BB_B0(p_hwfn->cdev)) {
 		dev_data->chip_id = CHIP_BB;
 		dev_data->mode_enable[MODE_BB] = 1;
+		dev_data->num_vfs = MAX_NUM_VFS_BB;
+		num_pfs = MAX_NUM_PFS_BB;
+		max_pfs_per_port = MAX_NUM_PFS_BB;
 	} else {
 		return DBG_STATUS_UNKNOWN_CHIP;
 	}
 
+	/* Set platofrm */
 	dev_data->platform_id = PLATFORM_ASIC;
 	dev_data->mode_enable[MODE_ASIC] = 1;
 
+	/* Set port mode */
+	switch (qed_rd(p_hwfn, p_ptt, MISC_REG_PORT_MODE)) {
+	case 0:
+		dev_data->mode_enable[MODE_PORTS_PER_ENG_1] = 1;
+		break;
+	case 1:
+		dev_data->mode_enable[MODE_PORTS_PER_ENG_2] = 1;
+		break;
+	case 2:
+		dev_data->mode_enable[MODE_PORTS_PER_ENG_4] = 1;
+		break;
+	}
+
+	/* Set 100G mode */
+	if (dev_data->chip_id == CHIP_BB &&
+	    qed_rd(p_hwfn, p_ptt, CNIG_REG_NW_PORT_MODE_BB) == 2)
+		dev_data->mode_enable[MODE_100G] = 1;
+
+	/* Set number of ports */
+	if (dev_data->mode_enable[MODE_PORTS_PER_ENG_1] ||
+	    dev_data->mode_enable[MODE_100G])
+		dev_data->num_ports = 1;
+	else if (dev_data->mode_enable[MODE_PORTS_PER_ENG_2])
+		dev_data->num_ports = 2;
+	else if (dev_data->mode_enable[MODE_PORTS_PER_ENG_4])
+		dev_data->num_ports = 4;
+
+	/* Set number of PFs per port */
+	dev_data->num_pfs_per_port = min_t(u32,
+					   num_pfs / dev_data->num_ports,
+					   max_pfs_per_port);
+
 	/* Initializes the GRC parameters */
 	qed_dbg_grc_init_params(p_hwfn);
 
 	dev_data->use_dmae = true;
-	dev_data->num_regs_read = 0;
 	dev_data->initialized = 1;
 
 	return DBG_STATUS_OK;
@@ -1821,9 +1864,9 @@ static struct dbg_bus_block *get_dbg_bus_block_desc(struct qed_hwfn *p_hwfn,
 /* Reads the FW info structure for the specified Storm from the chip,
  * and writes it to the specified fw_info pointer.
  */
-static void qed_read_fw_info(struct qed_hwfn *p_hwfn,
-			     struct qed_ptt *p_ptt,
-			     u8 storm_id, struct fw_info *fw_info)
+static void qed_read_storm_fw_info(struct qed_hwfn *p_hwfn,
+				   struct qed_ptt *p_ptt,
+				   u8 storm_id, struct fw_info *fw_info)
 {
 	struct storm_defs *storm = &s_storm_defs[storm_id];
 	struct fw_info_location fw_info_location;
@@ -1945,45 +1988,29 @@ static u32 qed_dump_fw_ver_param(struct qed_hwfn *p_hwfn,
 				 struct qed_ptt *p_ptt,
 				 u32 *dump_buf, bool dump)
 {
-	struct dbg_tools_data *dev_data = &p_hwfn->dbg_info;
 	char fw_ver_str[16] = EMPTY_FW_VERSION_STR;
 	char fw_img_str[16] = EMPTY_FW_IMAGE_STR;
 	struct fw_info fw_info = { {0}, {0} };
 	u32 offset = 0;
 
 	if (dump && !qed_grc_get_param(p_hwfn, DBG_GRC_PARAM_NO_FW_VER)) {
-		/* Read FW image/version from PRAM in a non-reset SEMI */
-		bool found = false;
-		u8 storm_id;
-
-		for (storm_id = 0; storm_id < MAX_DBG_STORMS && !found;
-		     storm_id++) {
-			struct storm_defs *storm = &s_storm_defs[storm_id];
-
-			/* Read FW version/image */
-			if (dev_data->block_in_reset[storm->block_id])
-				continue;
-
-			/* Read FW info for the current Storm */
-			qed_read_fw_info(p_hwfn, p_ptt, storm_id, &fw_info);
-
-			/* Create FW version/image strings */
-			if (snprintf(fw_ver_str, sizeof(fw_ver_str),
-				     "%d_%d_%d_%d", fw_info.ver.num.major,
-				     fw_info.ver.num.minor, fw_info.ver.num.rev,
-				     fw_info.ver.num.eng) < 0)
-				DP_NOTICE(p_hwfn,
-					  "Unexpected debug error: invalid FW version string\n");
-			switch (fw_info.ver.image_id) {
-			case FW_IMG_MAIN:
-				strcpy(fw_img_str, "main");
-				break;
-			default:
-				strcpy(fw_img_str, "unknown");
-				break;
-			}
-
-			found = true;
+		/* Read FW info from chip */
+		qed_read_fw_info(p_hwfn, p_ptt, &fw_info);
+
+		/* Create FW version/image strings */
+		if (snprintf(fw_ver_str, sizeof(fw_ver_str),
+			     "%d_%d_%d_%d", fw_info.ver.num.major,
+			     fw_info.ver.num.minor, fw_info.ver.num.rev,
+			     fw_info.ver.num.eng) < 0)
+			DP_NOTICE(p_hwfn,
+				  "Unexpected debug error: invalid FW version string\n");
+		switch (fw_info.ver.image_id) {
+		case FW_IMG_MAIN:
+			strcpy(fw_img_str, "main");
+			break;
+		default:
+			strcpy(fw_img_str, "unknown");
+			break;
 		}
 	}
 
@@ -2412,20 +2439,21 @@ static void qed_grc_clear_all_prty(struct qed_hwfn *p_hwfn,
 
 /* Dumps GRC registers section header. Returns the dumped size in dwords.
  * The following parameters are dumped:
- * - count:	 no. of dumped entries
- * - split:	 split type
- * - id:	 split ID (dumped only if split_id >= 0)
+ * - count: no. of dumped entries
+ * - split_type: split type
+ * - split_id: split ID (dumped only if split_id != SPLIT_TYPE_NONE)
  * - param_name: user parameter value (dumped only if param_name != NULL
  *		 and param_val != NULL).
  */
 static u32 qed_grc_dump_regs_hdr(u32 *dump_buf,
 				 bool dump,
 				 u32 num_reg_entries,
-				 const char *split_type,
-				 int split_id,
+				 enum init_split_types split_type,
+				 u8 split_id,
 				 const char *param_name, const char *param_val)
 {
-	u8 num_params = 2 + (split_id >= 0 ? 1 : 0) + (param_name ? 1 : 0);
+	u8 num_params = 2 +
+	    (split_type != SPLIT_TYPE_NONE ? 1 : 0) + (param_name ? 1 : 0);
 	u32 offset = 0;
 
 	offset += qed_dump_section_hdr(dump_buf + offset,
@@ -2433,8 +2461,9 @@ static u32 qed_grc_dump_regs_hdr(u32 *dump_buf,
 	offset += qed_dump_num_param(dump_buf + offset,
 				     dump, "count", num_reg_entries);
 	offset += qed_dump_str_param(dump_buf + offset,
-				     dump, "split", split_type);
-	if (split_id >= 0)
+				     dump, "split",
+				     s_split_type_defs[split_type].name);
+	if (split_type != SPLIT_TYPE_NONE)
 		offset += qed_dump_num_param(dump_buf + offset,
 					     dump, "id", split_id);
 	if (param_name && param_val)
@@ -2463,9 +2492,12 @@ void qed_read_regs(struct qed_hwfn *p_hwfn,
 static u32 qed_grc_dump_addr_range(struct qed_hwfn *p_hwfn,
 				   struct qed_ptt *p_ptt,
 				   u32 *dump_buf,
-				   bool dump, u32 addr, u32 len, bool wide_bus)
+				   bool dump, u32 addr, u32 len, bool wide_bus,
+				   enum init_split_types split_type,
+				   u8 split_id)
 {
 	struct dbg_tools_data *dev_data = &p_hwfn->dbg_info;
+	u8 port_id = 0, pf_id = 0, vf_id = 0, fid = 0;
 
 	if (!dump)
 		return len;
@@ -2481,8 +2513,27 @@ static u32 qed_grc_dump_addr_range(struct qed_hwfn *p_hwfn,
 		dev_data->num_regs_read = 0;
 	}
 
+	switch (split_type) {
+	case SPLIT_TYPE_PORT:
+		port_id = split_id;
+		break;
+	case SPLIT_TYPE_PF:
+		pf_id = split_id;
+		break;
+	case SPLIT_TYPE_PORT_PF:
+		port_id = split_id / dev_data->num_pfs_per_port;
+		pf_id = port_id + dev_data->num_ports *
+		    (split_id % dev_data->num_pfs_per_port);
+		break;
+	case SPLIT_TYPE_VF:
+		vf_id = split_id;
+		break;
+	default:
+		break;
+	}
+
 	/* Try reading using DMAE */
-	if (dev_data->use_dmae &&
+	if (dev_data->use_dmae && split_type == SPLIT_TYPE_NONE &&
 	    (len >= s_platform_defs[dev_data->platform_id].dmae_thresh ||
 	     wide_bus)) {
 		if (!qed_dmae_grc2host(p_hwfn, p_ptt, DWORDS_TO_BYTES(addr),
@@ -2494,7 +2545,37 @@ static u32 qed_grc_dump_addr_range(struct qed_hwfn *p_hwfn,
 			   "Failed reading from chip using DMAE, using GRC instead\n");
 	}
 
-	/* Read registers */
+	/* If not read using DMAE, read using GRC */
+
+	/* Set pretend */
+	if (split_type != dev_data->pretend.split_type || split_id !=
+	    dev_data->pretend.split_id) {
+		switch (split_type) {
+		case SPLIT_TYPE_PORT:
+			qed_port_pretend(p_hwfn, p_ptt, port_id);
+			break;
+		case SPLIT_TYPE_PF:
+			fid = pf_id << PXP_PRETEND_CONCRETE_FID_PFID_SHIFT;
+			qed_fid_pretend(p_hwfn, p_ptt, fid);
+			break;
+		case SPLIT_TYPE_PORT_PF:
+			fid = pf_id << PXP_PRETEND_CONCRETE_FID_PFID_SHIFT;
+			qed_port_fid_pretend(p_hwfn, p_ptt, port_id, fid);
+			break;
+		case SPLIT_TYPE_VF:
+			fid = BIT(PXP_PRETEND_CONCRETE_FID_VFVALID_SHIFT) |
+			      (vf_id << PXP_PRETEND_CONCRETE_FID_VFID_SHIFT);
+			qed_fid_pretend(p_hwfn, p_ptt, fid);
+			break;
+		default:
+			break;
+		}
+
+		dev_data->pretend.split_type = (u8)split_type;
+		dev_data->pretend.split_id = split_id;
+	}
+
+	/* Read registers using GRC */
 	qed_read_regs(p_hwfn, p_ptt, dump_buf, addr, len);
 
 	return len;
@@ -2518,7 +2599,8 @@ static u32 qed_grc_dump_reg_entry_hdr(u32 *dump_buf,
 static u32 qed_grc_dump_reg_entry(struct qed_hwfn *p_hwfn,
 				  struct qed_ptt *p_ptt,
 				  u32 *dump_buf,
-				  bool dump, u32 addr, u32 len, bool wide_bus)
+				  bool dump, u32 addr, u32 len, bool wide_bus,
+				  enum init_split_types split_type, u8 split_id)
 {
 	u32 offset = 0;
 
@@ -2526,7 +2608,8 @@ static u32 qed_grc_dump_reg_entry(struct qed_hwfn *p_hwfn,
 	offset += qed_grc_dump_addr_range(p_hwfn,
 					  p_ptt,
 					  dump_buf + offset,
-					  dump, addr, len, wide_bus);
+					  dump, addr, len, wide_bus,
+					  split_type, split_id);
 
 	return offset;
 }
@@ -2559,7 +2642,8 @@ static u32 qed_grc_dump_reg_entry_skip(struct qed_hwfn *p_hwfn,
 		offset += qed_grc_dump_addr_range(p_hwfn,
 						  p_ptt,
 						  dump_buf + offset,
-						  dump, addr, curr_len, false);
+						  dump,  addr, curr_len, false,
+						  SPLIT_TYPE_NONE, 0);
 		reg_offset += curr_len;
 		addr += curr_len;
 
@@ -2581,6 +2665,8 @@ static u32 qed_grc_dump_regs_entries(struct qed_hwfn *p_hwfn,
 				     struct dbg_array input_regs_arr,
 				     u32 *dump_buf,
 				     bool dump,
+				     enum init_split_types split_type,
+				     u8 split_id,
 				     bool block_enable[MAX_BLOCK_ID],
 				     u32 *num_dumped_reg_entries)
 {
@@ -2628,7 +2714,8 @@ static u32 qed_grc_dump_regs_entries(struct qed_hwfn *p_hwfn,
 							 dump,
 							 addr,
 							 len,
-							 wide_bus);
+							 wide_bus,
+							 split_type, split_id);
 			(*num_dumped_reg_entries)++;
 		}
 	}
@@ -2643,19 +2730,28 @@ static u32 qed_grc_dump_split_data(struct qed_hwfn *p_hwfn,
 				   u32 *dump_buf,
 				   bool dump,
 				   bool block_enable[MAX_BLOCK_ID],
-				   const char *split_type_name,
-				   u32 split_id,
+				   enum init_split_types split_type,
+				   u8 split_id,
 				   const char *param_name,
 				   const char *param_val)
 {
+	struct dbg_tools_data *dev_data = &p_hwfn->dbg_info;
+	enum init_split_types hdr_split_type = split_type;
 	u32 num_dumped_reg_entries, offset;
+	u8 hdr_split_id = split_id;
+
+	/* In PORT_PF split type, print a port split header */
+	if (split_type == SPLIT_TYPE_PORT_PF) {
+		hdr_split_type = SPLIT_TYPE_PORT;
+		hdr_split_id = split_id / dev_data->num_pfs_per_port;
+	}
 
 	/* Calculate register dump header size (and skip it for now) */
 	offset = qed_grc_dump_regs_hdr(dump_buf,
 				       false,
 				       0,
-				       split_type_name,
-				       split_id, param_name, param_val);
+				       hdr_split_type,
+				       hdr_split_id, param_name, param_val);
 
 	/* Dump registers */
 	offset += qed_grc_dump_regs_entries(p_hwfn,
@@ -2663,6 +2759,8 @@ static u32 qed_grc_dump_split_data(struct qed_hwfn *p_hwfn,
 					    input_regs_arr,
 					    dump_buf + offset,
 					    dump,
+					    split_type,
+					    split_id,
 					    block_enable,
 					    &num_dumped_reg_entries);
 
@@ -2671,8 +2769,8 @@ static u32 qed_grc_dump_split_data(struct qed_hwfn *p_hwfn,
 		qed_grc_dump_regs_hdr(dump_buf,
 				      dump,
 				      num_dumped_reg_entries,
-				      split_type_name,
-				      split_id, param_name, param_val);
+				      hdr_split_type,
+				      hdr_split_id, param_name, param_val);
 
 	return num_dumped_reg_entries > 0 ? offset : 0;
 }
@@ -2688,26 +2786,21 @@ static u32 qed_grc_dump_registers(struct qed_hwfn *p_hwfn,
 				  const char *param_name, const char *param_val)
 {
 	struct dbg_tools_data *dev_data = &p_hwfn->dbg_info;
-	struct chip_platform_defs *chip_platform;
 	u32 offset = 0, input_offset = 0;
-	struct chip_defs *chip;
-	u8 port_id, pf_id, vf_id;
 	u16 fid;
-
-	chip = &s_chip_defs[dev_data->chip_id];
-	chip_platform = &chip->per_platform[dev_data->platform_id];
-
 	while (input_offset <
 	       s_dbg_arrays[BIN_BUF_DBG_DUMP_REG].size_in_dwords) {
 		const struct dbg_dump_split_hdr *split_hdr;
 		struct dbg_array curr_input_regs_arr;
+		enum init_split_types split_type;
+		u16 split_count = 0;
 		u32 split_data_size;
-		u8 split_type_id;
+		u8 split_id;
 
 		split_hdr =
 			(const struct dbg_dump_split_hdr *)
 			&s_dbg_arrays[BIN_BUF_DBG_DUMP_REG].ptr[input_offset++];
-		split_type_id =
+		split_type =
 			GET_FIELD(split_hdr->hdr,
 				  DBG_DUMP_SPLIT_HDR_SPLIT_TYPE_ID);
 		split_data_size =
@@ -2717,99 +2810,44 @@ static u32 qed_grc_dump_registers(struct qed_hwfn *p_hwfn,
 			&s_dbg_arrays[BIN_BUF_DBG_DUMP_REG].ptr[input_offset];
 		curr_input_regs_arr.size_in_dwords = split_data_size;
 
-		switch (split_type_id) {
+		switch (split_type) {
 		case SPLIT_TYPE_NONE:
-			offset += qed_grc_dump_split_data(p_hwfn,
-							  p_ptt,
-							  curr_input_regs_arr,
-							  dump_buf + offset,
-							  dump,
-							  block_enable,
-							  "eng",
-							  (u32)(-1),
-							  param_name,
-							  param_val);
+			split_count = 1;
 			break;
-
 		case SPLIT_TYPE_PORT:
-			for (port_id = 0; port_id < chip_platform->num_ports;
-			     port_id++) {
-				if (dump)
-					qed_port_pretend(p_hwfn, p_ptt,
-							 port_id);
-				offset +=
-				    qed_grc_dump_split_data(p_hwfn, p_ptt,
-							    curr_input_regs_arr,
-							    dump_buf + offset,
-							    dump, block_enable,
-							    "port", port_id,
-							    param_name,
-							    param_val);
-			}
+			split_count = dev_data->num_ports;
 			break;
-
 		case SPLIT_TYPE_PF:
 		case SPLIT_TYPE_PORT_PF:
-			for (pf_id = 0; pf_id < chip_platform->num_pfs;
-			     pf_id++) {
-				u8 pfid_shift =
-					PXP_PRETEND_CONCRETE_FID_PFID_SHIFT;
-
-				if (dump) {
-					fid = pf_id << pfid_shift;
-					qed_fid_pretend(p_hwfn, p_ptt, fid);
-				}
-
-				offset +=
-				    qed_grc_dump_split_data(p_hwfn,
-							    p_ptt,
-							    curr_input_regs_arr,
-							    dump_buf + offset,
-							    dump,
-							    block_enable,
-							    "pf",
-							    pf_id,
-							    param_name,
-							    param_val);
-			}
+			split_count = dev_data->num_ports *
+			    dev_data->num_pfs_per_port;
 			break;
-
 		case SPLIT_TYPE_VF:
-			for (vf_id = 0; vf_id < chip_platform->num_vfs;
-			     vf_id++) {
-				u8 vfvalid_shift =
-					PXP_PRETEND_CONCRETE_FID_VFVALID_SHIFT;
-				u8 vfid_shift =
-					PXP_PRETEND_CONCRETE_FID_VFID_SHIFT;
-
-				if (dump) {
-					fid = BIT(vfvalid_shift) |
-					      (vf_id << vfid_shift);
-					qed_fid_pretend(p_hwfn, p_ptt, fid);
-				}
-
-				offset +=
-				    qed_grc_dump_split_data(p_hwfn, p_ptt,
-							    curr_input_regs_arr,
-							    dump_buf + offset,
-							    dump, block_enable,
-							    "vf", vf_id,
-							    param_name,
-							    param_val);
-			}
+			split_count = dev_data->num_vfs;
 			break;
-
 		default:
-			break;
+			return 0;
 		}
 
+		for (split_id = 0; split_id < split_count; split_id++)
+			offset += qed_grc_dump_split_data(p_hwfn, p_ptt,
+							  curr_input_regs_arr,
+							  dump_buf + offset,
+							  dump, block_enable,
+							  split_type,
+							  split_id,
+							  param_name,
+							  param_val);
+
 		input_offset += split_data_size;
 	}
 
-	/* Pretend to original PF */
+	/* Cancel pretends (pretend to original PF) */
 	if (dump) {
 		fid = p_hwfn->rel_pf_id << PXP_PRETEND_CONCRETE_FID_PFID_SHIFT;
 		qed_fid_pretend(p_hwfn, p_ptt, fid);
+		dev_data->pretend.split_type = SPLIT_TYPE_NONE;
+		dev_data->pretend.split_id = 0;
 	}
 
 	return offset;
@@ -2825,7 +2863,8 @@ static u32 qed_grc_dump_reset_regs(struct qed_hwfn *p_hwfn,
 
 	/* Calculate header size */
 	offset += qed_grc_dump_regs_hdr(dump_buf,
-					false, 0, "eng", -1, NULL, NULL);
+					false, 0,
+					SPLIT_TYPE_NONE, 0, NULL, NULL);
 
 	/* Write reset registers */
 	for (i = 0; i < MAX_DBG_RESET_REGS; i++) {
@@ -2838,14 +2877,15 @@ static u32 qed_grc_dump_reset_regs(struct qed_hwfn *p_hwfn,
 						 dump,
 						 BYTES_TO_DWORDS
 						 (s_reset_regs_defs[i].addr), 1,
-						 false);
+						 false, SPLIT_TYPE_NONE, 0);
 		num_regs++;
 	}
 
 	/* Write header */
 	if (dump)
 		qed_grc_dump_regs_hdr(dump_buf,
-				      true, num_regs, "eng", -1, NULL, NULL);
+				      true, num_regs, SPLIT_TYPE_NONE,
+				      0, NULL, NULL);
 
 	return offset;
 }
@@ -2864,7 +2904,8 @@ static u32 qed_grc_dump_modified_regs(struct qed_hwfn *p_hwfn,
 
 	/* Calculate header size */
 	offset += qed_grc_dump_regs_hdr(dump_buf,
-					false, 0, "eng", -1, NULL, NULL);
+					false, 0, SPLIT_TYPE_NONE,
+					0, NULL, NULL);
 
 	/* Write parity registers */
 	for (block_id = 0; block_id < MAX_BLOCK_ID; block_id++) {
@@ -2899,7 +2940,8 @@ static u32 qed_grc_dump_modified_regs(struct qed_hwfn *p_hwfn,
 							 dump_buf + offset,
 							 dump,
 							 addr,
-							 1, false);
+							 1, false,
+							 SPLIT_TYPE_NONE, 0);
 			addr = GET_FIELD(reg_data->data,
 					 DBG_ATTN_REG_STS_ADDRESS);
 			offset += qed_grc_dump_reg_entry(p_hwfn,
@@ -2907,7 +2949,8 @@ static u32 qed_grc_dump_modified_regs(struct qed_hwfn *p_hwfn,
 							 dump_buf + offset,
 							 dump,
 							 addr,
-							 1, false);
+							 1, false,
+							 SPLIT_TYPE_NONE, 0);
 			num_reg_entries += 2;
 		}
 	}
@@ -2929,7 +2972,7 @@ static u32 qed_grc_dump_modified_regs(struct qed_hwfn *p_hwfn,
 						 dump,
 						 addr,
 						 1,
-						 false);
+						 false, SPLIT_TYPE_NONE, 0);
 		num_reg_entries++;
 	}
 
@@ -2937,7 +2980,8 @@ static u32 qed_grc_dump_modified_regs(struct qed_hwfn *p_hwfn,
 	if (dump)
 		qed_grc_dump_regs_hdr(dump_buf,
 				      true,
-				      num_reg_entries, "eng", -1, NULL, NULL);
+				      num_reg_entries, SPLIT_TYPE_NONE,
+				      0, NULL, NULL);
 
 	return offset;
 }
@@ -2950,7 +2994,8 @@ static u32 qed_grc_dump_special_regs(struct qed_hwfn *p_hwfn,
 	u32 offset = 0, addr;
 
 	offset += qed_grc_dump_regs_hdr(dump_buf,
-					dump, 2, "eng", -1, NULL, NULL);
+					dump, 2, SPLIT_TYPE_NONE, 0,
+					NULL, NULL);
 
 	/* Dump R/TDIF_REG_DEBUG_ERROR_INFO_SIZE (every 8'th register should be
 	 * skipped).
@@ -3096,7 +3141,8 @@ static u32 qed_grc_dump_mem(struct qed_hwfn *p_hwfn,
 	offset += qed_grc_dump_addr_range(p_hwfn,
 					  p_ptt,
 					  dump_buf + offset,
-					  dump, addr, len, wide_bus);
+					  dump, addr, len, wide_bus,
+					  SPLIT_TYPE_NONE, 0);
 
 	return offset;
 }
@@ -3235,12 +3281,12 @@ static u32 qed_grc_dump_memories(struct qed_hwfn *p_hwfn,
 	       s_dbg_arrays[BIN_BUF_DBG_DUMP_MEM].size_in_dwords) {
 		const struct dbg_dump_split_hdr *split_hdr;
 		struct dbg_array curr_input_mems_arr;
+		enum init_split_types split_type;
 		u32 split_data_size;
-		u8 split_type_id;
 
 		split_hdr = (const struct dbg_dump_split_hdr *)
 			&s_dbg_arrays[BIN_BUF_DBG_DUMP_MEM].ptr[input_offset++];
-		split_type_id =
+		split_type =
 			GET_FIELD(split_hdr->hdr,
 				  DBG_DUMP_SPLIT_HDR_SPLIT_TYPE_ID);
 		split_data_size =
@@ -3250,20 +3296,15 @@ static u32 qed_grc_dump_memories(struct qed_hwfn *p_hwfn,
 			&s_dbg_arrays[BIN_BUF_DBG_DUMP_MEM].ptr[input_offset];
 		curr_input_mems_arr.size_in_dwords = split_data_size;
 
-		switch (split_type_id) {
-		case SPLIT_TYPE_NONE:
+		if (split_type == SPLIT_TYPE_NONE)
 			offset += qed_grc_dump_mem_entries(p_hwfn,
 							   p_ptt,
 							   curr_input_mems_arr,
 							   dump_buf + offset,
 							   dump);
-			break;
-
-		default:
+		else
 			DP_NOTICE(p_hwfn,
 				  "Dumping split memories is currently not supported\n");
-			break;
-		}
 
 		input_offset += split_data_size;
 	}
@@ -3623,7 +3664,8 @@ static u32 qed_grc_dump_rss(struct qed_hwfn *p_hwfn,
 							  dump,
 							  addr,
 							  num_dwords_to_read,
-							  false);
+							  false,
+							  SPLIT_TYPE_NONE, 0);
 			total_dwords -= num_dwords_to_read;
 			rss_addr++;
 		}
@@ -3682,7 +3724,7 @@ static u32 qed_grc_dump_big_ram(struct qed_hwfn *p_hwfn,
 						  dump,
 						  addr,
 						  len,
-						  false);
+						  false, SPLIT_TYPE_NONE, 0);
 	}
 
 	return offset;
@@ -3731,7 +3773,8 @@ static u32 qed_grc_dump_mcp(struct qed_hwfn *p_hwfn,
 
 	/* Dump required non-MCP registers */
 	offset += qed_grc_dump_regs_hdr(dump_buf + offset,
-					dump, 1, "eng", -1, "block", "MCP");
+					dump, 1, SPLIT_TYPE_NONE, 0,
+					"block", "MCP");
 	addr = BYTES_TO_DWORDS(MISC_REG_SHARED_MEM_ADDR);
 	offset += qed_grc_dump_reg_entry(p_hwfn,
 					 p_ptt,
@@ -3739,7 +3782,7 @@ static u32 qed_grc_dump_mcp(struct qed_hwfn *p_hwfn,
 					 dump,
 					 addr,
 					 1,
-					 false);
+					 false, SPLIT_TYPE_NONE, 0);
 
 	/* Release MCP */
 	if (halted && qed_mcp_resume(p_hwfn, p_ptt))
@@ -3923,7 +3966,8 @@ static u32 qed_grc_dump_static_debug(struct qed_hwfn *p_hwfn,
 							  dump,
 							  addr,
 							  len,
-							  true);
+							  true, SPLIT_TYPE_NONE,
+							  0);
 		}
 
 		/* Disable block's client and debug output */
@@ -3949,28 +3993,15 @@ static enum dbg_status qed_grc_dump(struct qed_hwfn *p_hwfn,
 {
 	struct dbg_tools_data *dev_data = &p_hwfn->dbg_info;
 	bool parities_masked = false;
-	u8 i, port_mode = 0;
 	u32 offset = 0;
+	u8 i;
 
 	*num_dumped_dwords = 0;
+	dev_data->num_regs_read = 0;
 
-	if (dump) {
-		/* Find port mode */
-		switch (qed_rd(p_hwfn, p_ptt, MISC_REG_PORT_MODE)) {
-		case 0:
-			port_mode = 1;
-			break;
-		case 1:
-			port_mode = 2;
-			break;
-		case 2:
-			port_mode = 4;
-			break;
-		}
-
-		/* Update reset state */
+	/* Update reset state */
+	if (dump)
 		qed_update_blocks_reset_state(p_hwfn, p_ptt);
-	}
 
 	/* Dump global params */
 	offset += qed_dump_common_global_params(p_hwfn,
@@ -3989,7 +4020,7 @@ static enum dbg_status qed_grc_dump(struct qed_hwfn *p_hwfn,
 				     qed_grc_get_param(p_hwfn,
 						DBG_GRC_PARAM_NUM_LTIDS));
 	offset += qed_dump_num_param(dump_buf + offset,
-				     dump, "num-ports", port_mode);
+				     dump, "num-ports", dev_data->num_ports);
 
 	/* Dump reset registers (dumped before taking blocks out of reset ) */
 	if (qed_grc_is_included(p_hwfn, DBG_GRC_PARAM_DUMP_REGS))
@@ -4093,10 +4124,10 @@ static enum dbg_status qed_grc_dump(struct qed_hwfn *p_hwfn,
 		offset += qed_grc_dump_phy(p_hwfn,
 					   p_ptt, dump_buf + offset, dump);
 
-	/* Dump static debug data  */
+	/* Dump static debug data (only if not during debug bus recording) */
 	if (qed_grc_is_included(p_hwfn,
 				DBG_GRC_PARAM_DUMP_STATIC) &&
-	    dev_data->bus.state == DBG_BUS_STATE_IDLE)
+	    (!dump || dev_data->bus.state == DBG_BUS_STATE_IDLE))
 		offset += qed_grc_dump_static_debug(p_hwfn,
 						    p_ptt,
 						    dump_buf + offset, dump);
@@ -4250,7 +4281,8 @@ static u32 qed_idle_chk_dump_failure(struct qed_hwfn *p_hwfn,
 							  dump_buf + offset,
 							  dump,
 							  addr,
-							  reg->size, wide_bus);
+							  reg->size, wide_bus,
+							  SPLIT_TYPE_NONE, 0);
 		}
 	}
 
@@ -4373,7 +4405,8 @@ qed_idle_chk_dump_rule_entries(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt,
 							    next_reg_offset,
 							    dump, addr,
 							    reg->entry_size,
-							    wide_bus);
+							    wide_bus,
+							    SPLIT_TYPE_NONE, 0);
 			}
 
 			/* Call rule condition function.
@@ -4723,7 +4756,8 @@ static enum dbg_status qed_mcp_trace_dump(struct qed_hwfn *p_hwfn,
 					  dump_buf + offset,
 					  dump,
 					  BYTES_TO_DWORDS(trace_data_grc_addr),
-					  trace_data_size_dwords, false);
+					  trace_data_size_dwords, false,
+					  SPLIT_TYPE_NONE, 0);
 
 	/* Resume MCP (only if halt succeeded) */
 	if (halted && qed_mcp_resume(p_hwfn, p_ptt))
@@ -4829,7 +4863,8 @@ static enum dbg_status qed_reg_fifo_dump(struct qed_hwfn *p_hwfn,
 						  true,
 						  addr,
 						  len,
-						  true);
+						  true, SPLIT_TYPE_NONE,
+						  0);
 		fifo_has_data = qed_rd(p_hwfn, p_ptt,
 				       GRC_REG_TRACE_FIFO_VALID_DATA) > 0;
 	}
@@ -4898,7 +4933,8 @@ static enum dbg_status qed_igu_fifo_dump(struct qed_hwfn *p_hwfn,
 						  true,
 						  addr,
 						  len,
-						  true);
+						  true, SPLIT_TYPE_NONE,
+						  0);
 		fifo_has_data = qed_rd(p_hwfn, p_ptt,
 				       IGU_REG_ERROR_HANDLING_DATA_VALID) > 0;
 	}
@@ -4956,7 +4992,7 @@ static enum dbg_status qed_protection_override_dump(struct qed_hwfn *p_hwfn,
 					  true,
 					  addr,
 					  override_window_dwords,
-					  true);
+					  true, SPLIT_TYPE_NONE, 0);
 	qed_dump_num_param(dump_buf + size_param_offset, dump, "size",
 			   override_window_dwords);
 out:
@@ -4998,7 +5034,7 @@ static u32 qed_fw_asserts_dump(struct qed_hwfn *p_hwfn,
 			continue;
 
 		/* Read FW info for the current Storm */
-		qed_read_fw_info(p_hwfn, p_ptt, storm_id, &fw_info);
+		qed_read_storm_fw_info(p_hwfn, p_ptt, storm_id, &fw_info);
 
 		asserts = &fw_info.fw_asserts_section;
 
@@ -5036,7 +5072,7 @@ static u32 qed_fw_asserts_dump(struct qed_hwfn *p_hwfn,
 					    dump_buf + offset,
 					    dump, addr,
 					    asserts->list_element_dword_size,
-					    false);
+						  false, SPLIT_TYPE_NONE, 0);
 	}
 
 	/* Dump last section */
@@ -5063,6 +5099,28 @@ enum dbg_status qed_dbg_set_bin_ptr(const u8 * const bin_ptr)
 	return DBG_STATUS_OK;
 }
 
+bool qed_read_fw_info(struct qed_hwfn *p_hwfn,
+		      struct qed_ptt *p_ptt, struct fw_info *fw_info)
+{
+	struct dbg_tools_data *dev_data = &p_hwfn->dbg_info;
+	u8 storm_id;
+
+	for (storm_id = 0; storm_id < MAX_DBG_STORMS; storm_id++) {
+		struct storm_defs *storm = &s_storm_defs[storm_id];
+
+		/* Skip Storm if it's in reset */
+		if (dev_data->block_in_reset[storm->block_id])
+			continue;
+
+		/* Read FW info for the current Storm */
+		qed_read_storm_fw_info(p_hwfn, p_ptt, storm_id, fw_info);
+
+		return true;
+	}
+
+	return false;
+}
+
 /* Assign default GRC param values */
 void qed_dbg_grc_set_params_default(struct qed_hwfn *p_hwfn)
 {
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index fde20fd..b285edc 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -2792,7 +2792,7 @@ static void qed_hw_info_port_num_bb(struct qed_hwfn *p_hwfn,
 {
 	u32 port_mode;
 
-	port_mode = qed_rd(p_hwfn, p_ptt, CNIG_REG_NW_PORT_MODE_BB_B0);
+	port_mode = qed_rd(p_hwfn, p_ptt, CNIG_REG_NW_PORT_MODE_BB);
 
 	if (port_mode < 3) {
 		p_hwfn->cdev->num_ports_in_engine = 1;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
index b9704be..bee10c1 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
@@ -1095,14 +1095,16 @@ enum personality_type {
 struct pf_start_tunnel_config {
 	u8 set_vxlan_udp_port_flg;
 	u8 set_geneve_udp_port_flg;
+	u8 set_no_inner_l2_vxlan_udp_port_flg;
 	u8 tunnel_clss_vxlan;
 	u8 tunnel_clss_l2geneve;
 	u8 tunnel_clss_ipgeneve;
 	u8 tunnel_clss_l2gre;
 	u8 tunnel_clss_ipgre;
-	u8 reserved;
 	__le16 vxlan_udp_port;
 	__le16 geneve_udp_port;
+	__le16 no_inner_l2_vxlan_udp_port;
+	__le16 reserved[3];
 };
 
 /* Ramrod data for PF start ramrod */
@@ -1145,14 +1147,17 @@ struct pf_update_tunnel_config {
 	u8 update_rx_def_non_ucast_clss;
 	u8 set_vxlan_udp_port_flg;
 	u8 set_geneve_udp_port_flg;
+	u8 set_no_inner_l2_vxlan_udp_port_flg;
 	u8 tunnel_clss_vxlan;
 	u8 tunnel_clss_l2geneve;
 	u8 tunnel_clss_ipgeneve;
 	u8 tunnel_clss_l2gre;
 	u8 tunnel_clss_ipgre;
+	u8 reserved;
 	__le16 vxlan_udp_port;
 	__le16 geneve_udp_port;
-	__le16 reserved;
+	__le16 no_inner_l2_vxlan_udp_port;
+	__le16 reserved1[3];
 };
 
 /* Data for port update ramrod */
@@ -2535,7 +2540,14 @@ struct idle_chk_data {
 	u16 reserved2;
 };
 
-/* Debug Tools data (per HW function) */
+struct pretend_params {
+	u8 split_type;
+	u8 reserved;
+	u16 split_id;
+};
+
+/* Debug Tools data (per HW function)
+ */
 struct dbg_tools_data {
 	struct dbg_grc_data grc;
 	struct dbg_bus_data bus;
@@ -2544,8 +2556,13 @@ struct dbg_tools_data {
 	u8 block_in_reset[88];
 	u8 chip_id;
 	u8 platform_id;
+	u8 num_ports;
+	u8 num_pfs_per_port;
+	u8 num_vfs;
 	u8 initialized;
 	u8 use_dmae;
+	u8 reserved;
+	struct pretend_params pretend;
 	u32 num_regs_read;
 };
 
@@ -2975,6 +2992,24 @@ void qed_read_regs(struct qed_hwfn *p_hwfn,
 		   struct qed_ptt *p_ptt, u32 *buf, u32 addr, u32 len);
 
 /**
+ * @brief qed_read_fw_info - Reads FW info from the chip.
+ *
+ * The FW info contains FW-related information, such as the FW version,
+ * FW image (main/L2B/kuku), FW timestamp, etc.
+ * The FW info is read from the internal RAM of the first Storm that is not in
+ * reset.
+ *
+ * @param p_hwfn -	    HW device data
+ * @param p_ptt -	    Ptt window used for writing the registers.
+ * @param fw_info -	Out: a pointer to write the FW info into.
+ *
+ * @return true if the FW info was read successfully from one of the Storms,
+ * or false if all Storms are in reset.
+ */
+bool qed_read_fw_info(struct qed_hwfn *p_hwfn,
+		      struct qed_ptt *p_ptt, struct fw_info *fw_info);
+
+/**
  * @brief qed_dbg_grc_set_params_default - Reverts all GRC parameters to their
  *	default value.
  *
@@ -4110,6 +4145,21 @@ void qed_memset_session_ctx(void *p_ctx_mem, u32 ctx_size, u8 ctx_type);
  */
 void qed_memset_task_ctx(void *p_ctx_mem, u32 ctx_size, u8 ctx_type);
 
+#define NUM_STORMS 6
+
+/**
+ * @brief qed_set_rdma_error_level - Sets the RDMA assert level.
+ *                                   If the severity of the error will be
+ *                                   above the level, the FW will assert.
+ * @param p_hwfn - HW device data
+ * @param p_ptt - ptt window used for writing the registers
+ * @param assert_level - An array of assert levels for each storm.
+ *
+ */
+void qed_set_rdma_error_level(struct qed_hwfn *p_hwfn,
+			      struct qed_ptt *p_ptt,
+			      u8 assert_level[NUM_STORMS]);
+
 /* Ystorm flow control mode. Use enum fw_flow_ctrl_mode */
 #define YSTORM_FLOW_CONTROL_MODE_OFFSET			(IRO[0].base)
 #define YSTORM_FLOW_CONTROL_MODE_SIZE			(IRO[0].size)
@@ -4340,27 +4390,67 @@ void qed_memset_task_ctx(void *p_ctx_mem, u32 ctx_size, u8 ctx_type);
 	(IRO[46].base + ((rdma_stat_counter_id) * IRO[46].m1))
 #define TSTORM_RDMA_QUEUE_STAT_SIZE			(IRO[46].size)
 
+/* Xstorm error level for assert */
+#define XSTORM_RDMA_ASSERT_LEVEL_OFFSET(pf_id) \
+	(IRO[47].base +	((pf_id) * IRO[47].m1))
+#define XSTORM_RDMA_ASSERT_LEVEL_SIZE			(IRO[47].size)
+
+/* Ystorm error level for assert */
+#define YSTORM_RDMA_ASSERT_LEVEL_OFFSET(pf_id) \
+	(IRO[48].base + ((pf_id) * IRO[48].m1))
+#define YSTORM_RDMA_ASSERT_LEVEL_SIZE			(IRO[48].size)
+
+/* Pstorm error level for assert */
+#define PSTORM_RDMA_ASSERT_LEVEL_OFFSET(pf_id) \
+	(IRO[49].base +	((pf_id) * IRO[49].m1))
+#define PSTORM_RDMA_ASSERT_LEVEL_SIZE			(IRO[49].size)
+
+/* Tstorm error level for assert */
+#define TSTORM_RDMA_ASSERT_LEVEL_OFFSET(pf_id) \
+	(IRO[50].base +	((pf_id) * IRO[50].m1))
+#define TSTORM_RDMA_ASSERT_LEVEL_SIZE			(IRO[50].size)
+
+/* Mstorm error level for assert */
+#define MSTORM_RDMA_ASSERT_LEVEL_OFFSET(pf_id) \
+	(IRO[51].base + ((pf_id) * IRO[51].m1))
+#define MSTORM_RDMA_ASSERT_LEVEL_SIZE			(IRO[51].size)
+
+/* Ustorm error level for assert */
+#define USTORM_RDMA_ASSERT_LEVEL_OFFSET(pf_id) \
+	(IRO[52].base + ((pf_id) * IRO[52].m1))
+#define USTORM_RDMA_ASSERT_LEVEL_SIZE			(IRO[52].size)
+
 /* Xstorm iWARP rxmit stats */
 #define XSTORM_IWARP_RXMIT_STATS_OFFSET(pf_id) \
-	(IRO[47].base + ((pf_id) * IRO[47].m1))
-#define XSTORM_IWARP_RXMIT_STATS_SIZE			(IRO[47].size)
+	(IRO[53].base +	((pf_id) * IRO[53].m1))
+#define XSTORM_IWARP_RXMIT_STATS_SIZE			(IRO[53].size)
 
 /* Tstorm RoCE Event Statistics */
 #define TSTORM_ROCE_EVENTS_STAT_OFFSET(roce_pf_id) \
-	(IRO[48].base + ((roce_pf_id) * IRO[48].m1))
-#define TSTORM_ROCE_EVENTS_STAT_SIZE			(IRO[48].size)
+	(IRO[54].base + ((roce_pf_id) * IRO[54].m1))
+#define TSTORM_ROCE_EVENTS_STAT_SIZE			(IRO[54].size)
 
 /* DCQCN Received Statistics */
 #define YSTORM_ROCE_DCQCN_RECEIVED_STATS_OFFSET(roce_pf_id) \
-	(IRO[49].base + ((roce_pf_id) * IRO[49].m1))
-#define YSTORM_ROCE_DCQCN_RECEIVED_STATS_SIZE		(IRO[49].size)
+	(IRO[55].base + ((roce_pf_id) * IRO[55].m1))
+#define YSTORM_ROCE_DCQCN_RECEIVED_STATS_SIZE		(IRO[55].size)
+
+/* RoCE Error Statistics */
+#define YSTORM_ROCE_ERROR_STATS_OFFSET(roce_pf_id) \
+	(IRO[56].base + ((roce_pf_id) * IRO[56].m1))
+#define YSTORM_ROCE_ERROR_STATS_SIZE			(IRO[56].size)
 
 /* DCQCN Sent Statistics */
 #define PSTORM_ROCE_DCQCN_SENT_STATS_OFFSET(roce_pf_id) \
-	(IRO[50].base + ((roce_pf_id) * IRO[50].m1))
-#define PSTORM_ROCE_DCQCN_SENT_STATS_SIZE		(IRO[50].size)
+	(IRO[57].base + ((roce_pf_id) * IRO[57].m1))
+#define PSTORM_ROCE_DCQCN_SENT_STATS_SIZE		(IRO[57].size)
 
-static const struct iro iro_arr[51] = {
+/* RoCE CQEs Statistics */
+#define USTORM_ROCE_CQE_STATS_OFFSET(roce_pf_id) \
+	(IRO[58].base + ((roce_pf_id) * IRO[58].m1))
+#define USTORM_ROCE_CQE_STATS_SIZE			(IRO[58].size)
+
+static const struct iro iro_arr[59] = {
 	{0x0, 0x0, 0x0, 0x0, 0x8},
 	{0x4cb8, 0x88, 0x0, 0x0, 0x88},
 	{0x6530, 0x20, 0x0, 0x0, 0x20},
@@ -4408,10 +4498,18 @@ static const struct iro iro_arr[51] = {
 	{0x10768, 0x20, 0x0, 0x0, 0x20},
 	{0x2d48, 0x80, 0x0, 0x0, 0x10},
 	{0x5048, 0x10, 0x0, 0x0, 0x10},
+	{0xc748, 0x8, 0x0, 0x0, 0x1},
+	{0xa128, 0x8, 0x0, 0x0, 0x1},
+	{0x10f00, 0x8, 0x0, 0x0, 0x1},
+	{0xf030, 0x8, 0x0, 0x0, 0x1},
+	{0x13028, 0x8, 0x0, 0x0, 0x1},
+	{0x12c58, 0x8, 0x0, 0x0, 0x1},
 	{0xc9b8, 0x30, 0x0, 0x0, 0x10},
-	{0xed90, 0x10, 0x0, 0x0, 0x10},
-	{0xa3a0, 0x10, 0x0, 0x0, 0x10},
+	{0xed90, 0x28, 0x0, 0x0, 0x28},
+	{0xa520, 0x18, 0x0, 0x0, 0x18},
+	{0xa6a0, 0x8, 0x0, 0x0, 0x8},
 	{0x13108, 0x8, 0x0, 0x0, 0x8},
+	{0x13c50, 0x18, 0x0, 0x0, 0x18},
 };
 
 /* Runtime array offsets */
@@ -4797,147 +4895,147 @@ static const struct iro iro_arr[51] = {
 #define NIG_REG_LLH_FUNC_FILTER_HDR_SEL_RT_OFFSET		39769
 #define NIG_REG_LLH_FUNC_FILTER_HDR_SEL_RT_SIZE			16
 #define NIG_REG_TX_EDPM_CTRL_RT_OFFSET				39785
-#define NIG_REG_ROCE_DUPLICATE_TO_HOST_RT_OFFSET		39786
-#define NIG_REG_PPF_TO_ENGINE_SEL_RT_OFFSET			39787
-#define NIG_REG_PPF_TO_ENGINE_SEL_RT_SIZE			8
-#define NIG_REG_LLH_PF_CLS_FUNC_FILTER_VALUE_RT_OFFSET		39795
-#define NIG_REG_LLH_PF_CLS_FUNC_FILTER_VALUE_RT_SIZE		1024
-#define NIG_REG_LLH_PF_CLS_FUNC_FILTER_EN_RT_OFFSET		40819
-#define NIG_REG_LLH_PF_CLS_FUNC_FILTER_EN_RT_SIZE		512
-#define NIG_REG_LLH_PF_CLS_FUNC_FILTER_MODE_RT_OFFSET		41331
-#define NIG_REG_LLH_PF_CLS_FUNC_FILTER_MODE_RT_SIZE		512
-#define NIG_REG_LLH_PF_CLS_FUNC_FILTER_PROTOCOL_TYPE_RT_OFFSET	41843
-#define NIG_REG_LLH_PF_CLS_FUNC_FILTER_PROTOCOL_TYPE_RT_SIZE	512
-#define NIG_REG_LLH_PF_CLS_FUNC_FILTER_HDR_SEL_RT_OFFSET	42355
-#define NIG_REG_LLH_PF_CLS_FUNC_FILTER_HDR_SEL_RT_SIZE		512
-#define NIG_REG_LLH_PF_CLS_FILTERS_MAP_RT_OFFSET		42867
-#define NIG_REG_LLH_PF_CLS_FILTERS_MAP_RT_SIZE			32
-#define CDU_REG_CID_ADDR_PARAMS_RT_OFFSET			42899
-#define CDU_REG_SEGMENT0_PARAMS_RT_OFFSET			42900
-#define CDU_REG_SEGMENT1_PARAMS_RT_OFFSET			42901
-#define CDU_REG_PF_SEG0_TYPE_OFFSET_RT_OFFSET			42902
-#define CDU_REG_PF_SEG1_TYPE_OFFSET_RT_OFFSET			42903
-#define CDU_REG_PF_SEG2_TYPE_OFFSET_RT_OFFSET			42904
-#define CDU_REG_PF_SEG3_TYPE_OFFSET_RT_OFFSET			42905
-#define CDU_REG_PF_FL_SEG0_TYPE_OFFSET_RT_OFFSET		42906
-#define CDU_REG_PF_FL_SEG1_TYPE_OFFSET_RT_OFFSET		42907
-#define CDU_REG_PF_FL_SEG2_TYPE_OFFSET_RT_OFFSET		42908
-#define CDU_REG_PF_FL_SEG3_TYPE_OFFSET_RT_OFFSET		42909
-#define CDU_REG_VF_SEG_TYPE_OFFSET_RT_OFFSET			42910
-#define CDU_REG_VF_FL_SEG_TYPE_OFFSET_RT_OFFSET			42911
-#define PBF_REG_TAG_ETHERTYPE_0_RT_OFFSET			42912
-#define PBF_REG_BTB_SHARED_AREA_SIZE_RT_OFFSET			42913
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ0_RT_OFFSET		42914
-#define PBF_REG_BTB_GUARANTEED_VOQ0_RT_OFFSET			42915
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ0_RT_OFFSET		42916
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ1_RT_OFFSET		42917
-#define PBF_REG_BTB_GUARANTEED_VOQ1_RT_OFFSET			42918
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ1_RT_OFFSET		42919
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ2_RT_OFFSET		42920
-#define PBF_REG_BTB_GUARANTEED_VOQ2_RT_OFFSET			42921
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ2_RT_OFFSET		42922
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ3_RT_OFFSET		42923
-#define PBF_REG_BTB_GUARANTEED_VOQ3_RT_OFFSET			42924
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ3_RT_OFFSET		42925
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ4_RT_OFFSET		42926
-#define PBF_REG_BTB_GUARANTEED_VOQ4_RT_OFFSET			42927
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ4_RT_OFFSET		42928
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ5_RT_OFFSET		42929
-#define PBF_REG_BTB_GUARANTEED_VOQ5_RT_OFFSET			42930
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ5_RT_OFFSET		42931
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ6_RT_OFFSET		42932
-#define PBF_REG_BTB_GUARANTEED_VOQ6_RT_OFFSET			42933
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ6_RT_OFFSET		42934
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ7_RT_OFFSET		42935
-#define PBF_REG_BTB_GUARANTEED_VOQ7_RT_OFFSET			42936
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ7_RT_OFFSET		42937
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ8_RT_OFFSET		42938
-#define PBF_REG_BTB_GUARANTEED_VOQ8_RT_OFFSET			42939
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ8_RT_OFFSET		42940
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ9_RT_OFFSET		42941
-#define PBF_REG_BTB_GUARANTEED_VOQ9_RT_OFFSET			42942
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ9_RT_OFFSET		42943
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ10_RT_OFFSET		42944
-#define PBF_REG_BTB_GUARANTEED_VOQ10_RT_OFFSET			42945
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ10_RT_OFFSET		42946
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ11_RT_OFFSET		42947
-#define PBF_REG_BTB_GUARANTEED_VOQ11_RT_OFFSET			42948
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ11_RT_OFFSET		42949
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ12_RT_OFFSET		42950
-#define PBF_REG_BTB_GUARANTEED_VOQ12_RT_OFFSET			42951
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ12_RT_OFFSET		42952
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ13_RT_OFFSET		42953
-#define PBF_REG_BTB_GUARANTEED_VOQ13_RT_OFFSET			42954
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ13_RT_OFFSET		42955
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ14_RT_OFFSET		42956
-#define PBF_REG_BTB_GUARANTEED_VOQ14_RT_OFFSET			42957
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ14_RT_OFFSET		42958
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ15_RT_OFFSET		42959
-#define PBF_REG_BTB_GUARANTEED_VOQ15_RT_OFFSET			42960
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ15_RT_OFFSET		42961
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ16_RT_OFFSET		42962
-#define PBF_REG_BTB_GUARANTEED_VOQ16_RT_OFFSET			42963
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ16_RT_OFFSET		42964
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ17_RT_OFFSET		42965
-#define PBF_REG_BTB_GUARANTEED_VOQ17_RT_OFFSET			42966
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ17_RT_OFFSET		42967
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ18_RT_OFFSET		42968
-#define PBF_REG_BTB_GUARANTEED_VOQ18_RT_OFFSET			42969
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ18_RT_OFFSET		42970
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ19_RT_OFFSET		42971
-#define PBF_REG_BTB_GUARANTEED_VOQ19_RT_OFFSET			42972
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ19_RT_OFFSET		42973
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ20_RT_OFFSET		42974
-#define PBF_REG_BTB_GUARANTEED_VOQ20_RT_OFFSET			42975
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ20_RT_OFFSET		42976
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ21_RT_OFFSET		42977
-#define PBF_REG_BTB_GUARANTEED_VOQ21_RT_OFFSET			42978
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ21_RT_OFFSET		42979
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ22_RT_OFFSET		42980
-#define PBF_REG_BTB_GUARANTEED_VOQ22_RT_OFFSET			42981
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ22_RT_OFFSET		42982
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ23_RT_OFFSET		42983
-#define PBF_REG_BTB_GUARANTEED_VOQ23_RT_OFFSET			42984
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ23_RT_OFFSET		42985
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ24_RT_OFFSET		42986
-#define PBF_REG_BTB_GUARANTEED_VOQ24_RT_OFFSET			42987
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ24_RT_OFFSET		42988
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ25_RT_OFFSET		42989
-#define PBF_REG_BTB_GUARANTEED_VOQ25_RT_OFFSET			42990
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ25_RT_OFFSET		42991
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ26_RT_OFFSET		42992
-#define PBF_REG_BTB_GUARANTEED_VOQ26_RT_OFFSET			42993
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ26_RT_OFFSET		42994
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ27_RT_OFFSET		42995
-#define PBF_REG_BTB_GUARANTEED_VOQ27_RT_OFFSET			42996
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ27_RT_OFFSET		42997
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ28_RT_OFFSET		42998
-#define PBF_REG_BTB_GUARANTEED_VOQ28_RT_OFFSET			42999
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ28_RT_OFFSET		43000
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ29_RT_OFFSET		43001
-#define PBF_REG_BTB_GUARANTEED_VOQ29_RT_OFFSET			43002
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ29_RT_OFFSET		43003
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ30_RT_OFFSET		43004
-#define PBF_REG_BTB_GUARANTEED_VOQ30_RT_OFFSET			43005
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ30_RT_OFFSET		43006
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ31_RT_OFFSET		43007
-#define PBF_REG_BTB_GUARANTEED_VOQ31_RT_OFFSET			43008
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ31_RT_OFFSET		43009
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ32_RT_OFFSET		43010
-#define PBF_REG_BTB_GUARANTEED_VOQ32_RT_OFFSET			43011
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ32_RT_OFFSET		43012
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ33_RT_OFFSET		43013
-#define PBF_REG_BTB_GUARANTEED_VOQ33_RT_OFFSET			43014
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ33_RT_OFFSET		43015
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ34_RT_OFFSET		43016
-#define PBF_REG_BTB_GUARANTEED_VOQ34_RT_OFFSET			43017
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ34_RT_OFFSET		43018
-#define PBF_REG_YCMD_QS_NUM_LINES_VOQ35_RT_OFFSET		43019
-#define PBF_REG_BTB_GUARANTEED_VOQ35_RT_OFFSET			43020
-#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ35_RT_OFFSET		43021
-#define XCM_REG_CON_PHY_Q3_RT_OFFSET				43022
-
-#define RUNTIME_ARRAY_SIZE	43023
+#define NIG_REG_PPF_TO_ENGINE_SEL_RT_OFFSET                             39786
+#define NIG_REG_PPF_TO_ENGINE_SEL_RT_SIZE                               8
+#define NIG_REG_LLH_PF_CLS_FUNC_FILTER_VALUE_RT_OFFSET                  39794
+#define NIG_REG_LLH_PF_CLS_FUNC_FILTER_VALUE_RT_SIZE                    1024
+#define NIG_REG_LLH_PF_CLS_FUNC_FILTER_EN_RT_OFFSET                     40818
+#define NIG_REG_LLH_PF_CLS_FUNC_FILTER_EN_RT_SIZE                       512
+#define NIG_REG_LLH_PF_CLS_FUNC_FILTER_MODE_RT_OFFSET                   41330
+#define NIG_REG_LLH_PF_CLS_FUNC_FILTER_MODE_RT_SIZE                     512
+#define NIG_REG_LLH_PF_CLS_FUNC_FILTER_PROTOCOL_TYPE_RT_OFFSET          41842
+#define NIG_REG_LLH_PF_CLS_FUNC_FILTER_PROTOCOL_TYPE_RT_SIZE            512
+#define NIG_REG_LLH_PF_CLS_FUNC_FILTER_HDR_SEL_RT_OFFSET                42354
+#define NIG_REG_LLH_PF_CLS_FUNC_FILTER_HDR_SEL_RT_SIZE                  512
+#define NIG_REG_LLH_PF_CLS_FILTERS_MAP_RT_OFFSET                        42866
+#define NIG_REG_LLH_PF_CLS_FILTERS_MAP_RT_SIZE                          32
+#define CDU_REG_CID_ADDR_PARAMS_RT_OFFSET                               42898
+#define CDU_REG_SEGMENT0_PARAMS_RT_OFFSET                               42899
+#define CDU_REG_SEGMENT1_PARAMS_RT_OFFSET                               42900
+#define CDU_REG_PF_SEG0_TYPE_OFFSET_RT_OFFSET                           42901
+#define CDU_REG_PF_SEG1_TYPE_OFFSET_RT_OFFSET                           42902
+#define CDU_REG_PF_SEG2_TYPE_OFFSET_RT_OFFSET                           42903
+#define CDU_REG_PF_SEG3_TYPE_OFFSET_RT_OFFSET                           42904
+#define CDU_REG_PF_FL_SEG0_TYPE_OFFSET_RT_OFFSET                        42905
+#define CDU_REG_PF_FL_SEG1_TYPE_OFFSET_RT_OFFSET                        42906
+#define CDU_REG_PF_FL_SEG2_TYPE_OFFSET_RT_OFFSET                        42907
+#define CDU_REG_PF_FL_SEG3_TYPE_OFFSET_RT_OFFSET                        42908
+#define CDU_REG_VF_SEG_TYPE_OFFSET_RT_OFFSET                            42909
+#define CDU_REG_VF_FL_SEG_TYPE_OFFSET_RT_OFFSET                         42910
+#define PBF_REG_TAG_ETHERTYPE_0_RT_OFFSET                               42911
+#define PBF_REG_BTB_SHARED_AREA_SIZE_RT_OFFSET                          42912
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ0_RT_OFFSET                        42913
+#define PBF_REG_BTB_GUARANTEED_VOQ0_RT_OFFSET                           42914
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ0_RT_OFFSET                    42915
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ1_RT_OFFSET                        42916
+#define PBF_REG_BTB_GUARANTEED_VOQ1_RT_OFFSET                           42917
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ1_RT_OFFSET                    42918
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ2_RT_OFFSET                        42919
+#define PBF_REG_BTB_GUARANTEED_VOQ2_RT_OFFSET                           42920
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ2_RT_OFFSET                    42921
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ3_RT_OFFSET                        42922
+#define PBF_REG_BTB_GUARANTEED_VOQ3_RT_OFFSET                           42923
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ3_RT_OFFSET                    42924
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ4_RT_OFFSET                        42925
+#define PBF_REG_BTB_GUARANTEED_VOQ4_RT_OFFSET                           42926
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ4_RT_OFFSET                    42927
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ5_RT_OFFSET                        42928
+#define PBF_REG_BTB_GUARANTEED_VOQ5_RT_OFFSET                           42929
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ5_RT_OFFSET                    42930
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ6_RT_OFFSET                        42931
+#define PBF_REG_BTB_GUARANTEED_VOQ6_RT_OFFSET                           42932
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ6_RT_OFFSET                    42933
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ7_RT_OFFSET                        42934
+#define PBF_REG_BTB_GUARANTEED_VOQ7_RT_OFFSET                           42935
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ7_RT_OFFSET                    42936
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ8_RT_OFFSET                        42937
+#define PBF_REG_BTB_GUARANTEED_VOQ8_RT_OFFSET                           42938
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ8_RT_OFFSET                    42939
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ9_RT_OFFSET                        42940
+#define PBF_REG_BTB_GUARANTEED_VOQ9_RT_OFFSET                           42941
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ9_RT_OFFSET                    42942
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ10_RT_OFFSET                       42943
+#define PBF_REG_BTB_GUARANTEED_VOQ10_RT_OFFSET                          42944
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ10_RT_OFFSET                   42945
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ11_RT_OFFSET                       42946
+#define PBF_REG_BTB_GUARANTEED_VOQ11_RT_OFFSET                          42947
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ11_RT_OFFSET                   42948
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ12_RT_OFFSET                       42949
+#define PBF_REG_BTB_GUARANTEED_VOQ12_RT_OFFSET                          42950
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ12_RT_OFFSET                   42951
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ13_RT_OFFSET                       42952
+#define PBF_REG_BTB_GUARANTEED_VOQ13_RT_OFFSET                          42953
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ13_RT_OFFSET                   42954
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ14_RT_OFFSET                       42955
+#define PBF_REG_BTB_GUARANTEED_VOQ14_RT_OFFSET                          42956
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ14_RT_OFFSET                   42957
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ15_RT_OFFSET                       42958
+#define PBF_REG_BTB_GUARANTEED_VOQ15_RT_OFFSET                          42959
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ15_RT_OFFSET                   42960
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ16_RT_OFFSET                       42961
+#define PBF_REG_BTB_GUARANTEED_VOQ16_RT_OFFSET                          42962
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ16_RT_OFFSET                   42963
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ17_RT_OFFSET                       42964
+#define PBF_REG_BTB_GUARANTEED_VOQ17_RT_OFFSET                          42965
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ17_RT_OFFSET                   42966
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ18_RT_OFFSET                       42967
+#define PBF_REG_BTB_GUARANTEED_VOQ18_RT_OFFSET                          42968
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ18_RT_OFFSET                   42969
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ19_RT_OFFSET                       42970
+#define PBF_REG_BTB_GUARANTEED_VOQ19_RT_OFFSET                          42971
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ19_RT_OFFSET                   42972
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ20_RT_OFFSET                       42973
+#define PBF_REG_BTB_GUARANTEED_VOQ20_RT_OFFSET                          42974
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ20_RT_OFFSET                   42975
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ21_RT_OFFSET                       42976
+#define PBF_REG_BTB_GUARANTEED_VOQ21_RT_OFFSET                          42977
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ21_RT_OFFSET                   42978
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ22_RT_OFFSET                       42979
+#define PBF_REG_BTB_GUARANTEED_VOQ22_RT_OFFSET                          42980
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ22_RT_OFFSET                   42981
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ23_RT_OFFSET                       42982
+#define PBF_REG_BTB_GUARANTEED_VOQ23_RT_OFFSET                          42983
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ23_RT_OFFSET                   42984
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ24_RT_OFFSET                       42985
+#define PBF_REG_BTB_GUARANTEED_VOQ24_RT_OFFSET                          42986
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ24_RT_OFFSET                   42987
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ25_RT_OFFSET                       42988
+#define PBF_REG_BTB_GUARANTEED_VOQ25_RT_OFFSET                          42989
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ25_RT_OFFSET                   42990
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ26_RT_OFFSET                       42991
+#define PBF_REG_BTB_GUARANTEED_VOQ26_RT_OFFSET                          42992
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ26_RT_OFFSET                   42993
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ27_RT_OFFSET                       42994
+#define PBF_REG_BTB_GUARANTEED_VOQ27_RT_OFFSET                          42995
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ27_RT_OFFSET                   42996
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ28_RT_OFFSET                       42997
+#define PBF_REG_BTB_GUARANTEED_VOQ28_RT_OFFSET                          42998
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ28_RT_OFFSET                   42999
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ29_RT_OFFSET                       43000
+#define PBF_REG_BTB_GUARANTEED_VOQ29_RT_OFFSET                          43001
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ29_RT_OFFSET                   43002
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ30_RT_OFFSET                       43003
+#define PBF_REG_BTB_GUARANTEED_VOQ30_RT_OFFSET                          43004
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ30_RT_OFFSET                   43005
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ31_RT_OFFSET                       43006
+#define PBF_REG_BTB_GUARANTEED_VOQ31_RT_OFFSET                          43007
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ31_RT_OFFSET                   43008
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ32_RT_OFFSET                       43009
+#define PBF_REG_BTB_GUARANTEED_VOQ32_RT_OFFSET                          43010
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ32_RT_OFFSET                   43011
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ33_RT_OFFSET                       43012
+#define PBF_REG_BTB_GUARANTEED_VOQ33_RT_OFFSET                          43013
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ33_RT_OFFSET                   43014
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ34_RT_OFFSET                       43015
+#define PBF_REG_BTB_GUARANTEED_VOQ34_RT_OFFSET                          43016
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ34_RT_OFFSET                   43017
+#define PBF_REG_YCMD_QS_NUM_LINES_VOQ35_RT_OFFSET                       43018
+#define PBF_REG_BTB_GUARANTEED_VOQ35_RT_OFFSET                          43019
+#define PBF_REG_BTB_SHARED_AREA_SETUP_VOQ35_RT_OFFSET                   43020
+#define XCM_REG_CON_PHY_Q3_RT_OFFSET                                    43021
+
+#define RUNTIME_ARRAY_SIZE 43022
+
 
 /* Init Callbacks */
 #define DMAE_READY_CB	0
@@ -5694,8 +5792,10 @@ struct eth_vport_rx_mode {
 #define ETH_VPORT_RX_MODE_MCAST_ACCEPT_ALL_SHIFT	4
 #define ETH_VPORT_RX_MODE_BCAST_ACCEPT_ALL_MASK		0x1
 #define ETH_VPORT_RX_MODE_BCAST_ACCEPT_ALL_SHIFT	5
-#define ETH_VPORT_RX_MODE_RESERVED1_MASK		0x3FF
-#define ETH_VPORT_RX_MODE_RESERVED1_SHIFT		6
+#define ETH_VPORT_RX_MODE_ACCEPT_ANY_VNI_MASK		0x1
+#define ETH_VPORT_RX_MODE_ACCEPT_ANY_VNI_SHIFT		6
+#define ETH_VPORT_RX_MODE_RESERVED1_MASK		0x1FF
+#define ETH_VPORT_RX_MODE_RESERVED1_SHIFT		7
 };
 
 /* Command for setting tpa parameters */
@@ -6756,7 +6856,7 @@ struct e4_ystorm_rdma_task_ag_ctx {
 #define E4_YSTORM_RDMA_TASK_AG_CTX_RULE6EN_MASK		0x1
 #define E4_YSTORM_RDMA_TASK_AG_CTX_RULE6EN_SHIFT	7
 	u8 key;
-	__le32 mw_cnt;
+	__le32 mw_cnt_or_qp_id;
 	u8 ref_cnt_seq;
 	u8 ctx_upd_seq;
 	__le16 dif_flags;
@@ -6812,7 +6912,7 @@ struct e4_mstorm_rdma_task_ag_ctx {
 #define E4_MSTORM_RDMA_TASK_AG_CTX_RULE6EN_MASK		0x1
 #define E4_MSTORM_RDMA_TASK_AG_CTX_RULE6EN_SHIFT	7
 	u8 key;
-	__le32 mw_cnt;
+	__le32 mw_cnt_or_qp_id;
 	u8 ref_cnt_seq;
 	u8 ctx_upd_seq;
 	__le16 dif_flags;
@@ -7075,8 +7175,7 @@ struct rdma_register_tid_ramrod_data {
 	struct regpair va;
 	struct regpair pbl_base;
 	struct regpair dif_error_addr;
-	struct regpair dif_runt_addr;
-	__le32 reserved4[2];
+	__le32 reserved4[4];
 };
 
 /* rdma resize cq output params */
@@ -7144,8 +7243,7 @@ struct rdma_srq_modify_ramrod_data {
 enum rdma_tid_type {
 	RDMA_TID_REGISTERED_MR,
 	RDMA_TID_FMR,
-	RDMA_TID_MW_TYPE1,
-	RDMA_TID_MW_TYPE2A,
+	RDMA_TID_MW,
 	MAX_RDMA_TID_TYPE
 };
 
@@ -7681,6 +7779,16 @@ struct e4_roce_conn_context {
 	struct ustorm_roce_conn_st_ctx ustorm_st_context;
 };
 
+/* roce cqes statistics */
+struct roce_cqe_stats {
+	__le32 req_cqe_error;
+	__le32 req_remote_access_errors;
+	__le32 req_remote_invalid_request;
+	__le32 resp_cqe_error;
+	__le32 resp_local_length_error;
+	__le32 reserved;
+};
+
 /* roce create qp requester ramrod data */
 struct roce_create_qp_req_ramrod_data {
 	__le16 flags;
@@ -7798,8 +7906,8 @@ struct roce_dcqcn_sent_stats {
 
 /* RoCE destroy qp requester output params */
 struct roce_destroy_qp_req_output_params {
-	__le32 num_bound_mw;
 	__le32 cq_prod;
+	__le32 reserved;
 };
 
 /* RoCE destroy qp requester ramrod data */
@@ -7809,8 +7917,8 @@ struct roce_destroy_qp_req_ramrod_data {
 
 /* RoCE destroy qp responder output params */
 struct roce_destroy_qp_resp_output_params {
-	__le32 num_invalidated_mw;
 	__le32 cq_prod;
+	__le32 reserved;
 };
 
 /* RoCE destroy qp responder ramrod data */
@@ -7818,16 +7926,27 @@ struct roce_destroy_qp_resp_ramrod_data {
 	struct regpair output_params_addr;
 };
 
+/* roce error statistics */
+struct roce_error_stats {
+	__le32 resp_remote_access_errors;
+	__le32 reserved;
+};
+
 /* roce special events statistics */
 struct roce_events_stats {
-	__le16 silent_drops;
-	__le16 rnr_naks_sent;
+	__le32 silent_drops;
+	__le32 rnr_naks_sent;
 	__le32 retransmit_count;
 	__le32 icrc_error_count;
-	__le32 reserved;
+	__le32 implied_nak_seq_err;
+	__le32 duplicate_request;
+	__le32 local_ack_timeout_err;
+	__le32 out_of_sequence;
+	__le32 packet_seq_err;
+	__le32 rnr_nak_retry_err;
 };
 
-/* ROCE slow path EQ cmd IDs */
+/* roce slow path EQ cmd IDs */
 enum roce_event_opcode {
 	ROCE_EVENT_CREATE_QP = 11,
 	ROCE_EVENT_MODIFY_QP,
@@ -7845,6 +7964,9 @@ struct roce_init_func_params {
 	u8 cnp_dscp;
 	u8 reserved;
 	__le32 cnp_send_timeout;
+	__le16 rl_offset;
+	u8 rl_count_log;
+	u8 reserved1[5];
 };
 
 /* roce func init ramrod data */
@@ -8532,7 +8654,7 @@ struct e4_tstorm_roce_resp_conn_ag_ctx {
 	__le16 rq_prod;
 	__le16 conn_dpi;
 	__le16 irq_cons;
-	__le32 num_invlidated_mw;
+	__le32 reg9;
 	__le32 reg10;
 };
 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_hw.c b/drivers/net/ethernet/qlogic/qed/qed_hw.c
index fca2dbd..70504dc 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_hw.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_hw.c
@@ -360,6 +360,26 @@ void qed_port_unpretend(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
 	       *(u32 *)&p_ptt->pxp.pretend);
 }
 
+void qed_port_fid_pretend(struct qed_hwfn *p_hwfn,
+			  struct qed_ptt *p_ptt, u8 port_id, u16 fid)
+{
+	u16 control = 0;
+
+	SET_FIELD(control, PXP_PRETEND_CMD_PORT, port_id);
+	SET_FIELD(control, PXP_PRETEND_CMD_USE_PORT, 1);
+	SET_FIELD(control, PXP_PRETEND_CMD_PRETEND_PORT, 1);
+	SET_FIELD(control, PXP_PRETEND_CMD_IS_CONCRETE, 1);
+	SET_FIELD(control, PXP_PRETEND_CMD_PRETEND_FUNCTION, 1);
+	if (!GET_FIELD(fid, PXP_CONCRETE_FID_VFVALID))
+		fid = GET_FIELD(fid, PXP_CONCRETE_FID_PFID);
+	p_ptt->pxp.pretend.control = cpu_to_le16(control);
+	p_ptt->pxp.pretend.fid.concrete_fid.fid = cpu_to_le16(fid);
+	REG_WR(p_hwfn,
+	       qed_ptt_config_addr(p_ptt) +
+	       offsetof(struct pxp_ptt_entry, pretend),
+	       *(u32 *)&p_ptt->pxp.pretend);
+}
+
 u32 qed_vfid_to_concrete(struct qed_hwfn *p_hwfn, u8 vfid)
 {
 	u32 concrete_fid = 0;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_hw.h b/drivers/net/ethernet/qlogic/qed/qed_hw.h
index 8db2839..505e94d 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_hw.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_hw.h
@@ -245,6 +245,18 @@ void qed_port_unpretend(struct qed_hwfn *p_hwfn,
 			struct qed_ptt *p_ptt);
 
 /**
+ * @brief qed_port_fid_pretend - pretend to another port and another function
+ *        when accessing the ptt window
+ *
+ * @param p_hwfn
+ * @param p_ptt
+ * @param port_id - the port to pretend to
+ * @param fid - fid field of pxp_pretend structure. Can contain either pf / vf.
+ */
+void qed_port_fid_pretend(struct qed_hwfn *p_hwfn,
+			  struct qed_ptt *p_ptt, u8 port_id, u16 fid);
+
+/**
  * @brief qed_vfid_to_concrete - build a concrete FID for a
  *        given VF ID
  *
diff --git a/drivers/net/ethernet/qlogic/qed/qed_init_fw_funcs.c b/drivers/net/ethernet/qlogic/qed/qed_init_fw_funcs.c
index 1365da7..d845bad 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_init_fw_funcs.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_init_fw_funcs.c
@@ -1245,7 +1245,7 @@ void qed_gft_config(struct qed_hwfn *p_hwfn,
 		    bool udp,
 		    bool ipv4, bool ipv6, enum gft_profile_type profile_type)
 {
-	u32 reg_val, cam_line, ram_line_lo, ram_line_hi;
+	u32 reg_val, cam_line, ram_line_lo, ram_line_hi, search_non_ip_as_gft;
 
 	if (!ipv6 && !ipv4)
 		DP_NOTICE(p_hwfn,
@@ -1314,6 +1314,9 @@ void qed_gft_config(struct qed_hwfn *p_hwfn,
 	ram_line_lo = 0;
 	ram_line_hi = 0;
 
+	/* Search no IP as GFT */
+	search_non_ip_as_gft = 0;
+
 	/* Tunnel type */
 	SET_FIELD(ram_line_lo, GFT_RAM_LINE_TUNNEL_DST_PORT, 1);
 	SET_FIELD(ram_line_lo, GFT_RAM_LINE_TUNNEL_OVER_IP_PROTOCOL, 1);
@@ -1337,9 +1340,14 @@ void qed_gft_config(struct qed_hwfn *p_hwfn,
 		SET_FIELD(ram_line_lo, GFT_RAM_LINE_ETHERTYPE, 1);
 	} else if (profile_type == GFT_PROFILE_TYPE_TUNNEL_TYPE) {
 		SET_FIELD(ram_line_lo, GFT_RAM_LINE_TUNNEL_ETHERTYPE, 1);
+
+		/* Allow tunneled traffic without inner IP */
+		search_non_ip_as_gft = 1;
 	}
 
 	qed_wr(p_hwfn,
+	       p_ptt, PRS_REG_SEARCH_NON_IP_AS_GFT, search_non_ip_as_gft);
+	qed_wr(p_hwfn,
 	       p_ptt,
 	       PRS_REG_GFT_PROFILE_MASK_RAM + RAM_LINE_SIZE * pf_id,
 	       ram_line_lo);
@@ -1509,3 +1517,43 @@ void qed_enable_context_validation(struct qed_hwfn *p_hwfn,
 	ctx_validation = CDU_VALIDATION_DEFAULT_CFG << 8;
 	qed_wr(p_hwfn, p_ptt, CDU_REG_TCFC_CTX_VALID0, ctx_validation);
 }
+
+static u32 qed_get_rdma_assert_ram_addr(struct qed_hwfn *p_hwfn, u8 storm_id)
+{
+	switch (storm_id) {
+	case 0:
+		return TSEM_REG_FAST_MEMORY + SEM_FAST_REG_INT_RAM +
+		    TSTORM_RDMA_ASSERT_LEVEL_OFFSET(p_hwfn->rel_pf_id);
+	case 1:
+		return MSEM_REG_FAST_MEMORY + SEM_FAST_REG_INT_RAM +
+		    MSTORM_RDMA_ASSERT_LEVEL_OFFSET(p_hwfn->rel_pf_id);
+	case 2:
+		return USEM_REG_FAST_MEMORY + SEM_FAST_REG_INT_RAM +
+		    USTORM_RDMA_ASSERT_LEVEL_OFFSET(p_hwfn->rel_pf_id);
+	case 3:
+		return XSEM_REG_FAST_MEMORY + SEM_FAST_REG_INT_RAM +
+		    XSTORM_RDMA_ASSERT_LEVEL_OFFSET(p_hwfn->rel_pf_id);
+	case 4:
+		return YSEM_REG_FAST_MEMORY + SEM_FAST_REG_INT_RAM +
+		    YSTORM_RDMA_ASSERT_LEVEL_OFFSET(p_hwfn->rel_pf_id);
+	case 5:
+		return PSEM_REG_FAST_MEMORY + SEM_FAST_REG_INT_RAM +
+		    PSTORM_RDMA_ASSERT_LEVEL_OFFSET(p_hwfn->rel_pf_id);
+
+	default:
+		return 0;
+	}
+}
+
+void qed_set_rdma_error_level(struct qed_hwfn *p_hwfn,
+			      struct qed_ptt *p_ptt,
+			      u8 assert_level[NUM_STORMS])
+{
+	u8 storm_id;
+
+	for (storm_id = 0; storm_id < NUM_STORMS; storm_id++) {
+		u32 ram_addr = qed_get_rdma_assert_ram_addr(p_hwfn, storm_id);
+
+		qed_wr(p_hwfn, p_ptt, ram_addr, assert_level[storm_id]);
+	}
+}
diff --git a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
index 474e6cf..90a2b53 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
@@ -1159,7 +1159,6 @@ int qed_iwarp_connect(void *rdma_cxt,
 	struct qed_iwarp_info *iwarp_info;
 	struct qed_iwarp_ep *ep;
 	u8 mpa_data_size = 0;
-	u8 ts_hdr_size = 0;
 	u32 cid;
 	int rc;
 
@@ -1218,10 +1217,7 @@ int qed_iwarp_connect(void *rdma_cxt,
 	       iparams->cm_info.private_data,
 	       iparams->cm_info.private_data_len);
 
-	if (p_hwfn->p_rdma_info->iwarp.tcp_flags & QED_IWARP_TS_EN)
-		ts_hdr_size = TIMESTAMP_HEADER_SIZE;
-
-	ep->mss = iparams->mss - ts_hdr_size;
+	ep->mss = iparams->mss;
 	ep->mss = min_t(u16, QED_IWARP_MAX_FW_MSS, ep->mss);
 
 	ep->event_cb = iparams->event_cb;
@@ -2337,7 +2333,6 @@ qed_iwarp_ll2_comp_syn_pkt(void *cxt, struct qed_ll2_comp_rx_data *data)
 	u8 local_mac_addr[ETH_ALEN];
 	struct qed_iwarp_ep *ep;
 	int tcp_start_offset;
-	u8 ts_hdr_size = 0;
 	u8 ll2_syn_handle;
 	int payload_len;
 	u32 hdr_size;
@@ -2415,11 +2410,7 @@ qed_iwarp_ll2_comp_syn_pkt(void *cxt, struct qed_ll2_comp_rx_data *data)
 
 	memcpy(&ep->cm_info, &cm_info, sizeof(ep->cm_info));
 
-	if (p_hwfn->p_rdma_info->iwarp.tcp_flags & QED_IWARP_TS_EN)
-		ts_hdr_size = TIMESTAMP_HEADER_SIZE;
-
-	hdr_size = ((cm_info.ip_version == QED_TCP_IPV4) ? 40 : 60) +
-		   ts_hdr_size;
+	hdr_size = ((cm_info.ip_version == QED_TCP_IPV4) ? 40 : 60);
 	ep->mss = p_hwfn->p_rdma_info->iwarp.max_mtu - hdr_size;
 	ep->mss = min_t(u16, QED_IWARP_MAX_FW_MSS, ep->mss);
 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c b/drivers/net/ethernet/qlogic/qed/qed_l2.c
index eed4725..1f6ac848 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_l2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c
@@ -586,6 +586,9 @@ qed_sp_update_accept_mode(struct qed_hwfn *p_hwfn,
 		SET_FIELD(state, ETH_VPORT_RX_MODE_BCAST_ACCEPT_ALL,
 			  !!(accept_filter & QED_ACCEPT_BCAST));
 
+		SET_FIELD(state, ETH_VPORT_RX_MODE_ACCEPT_ANY_VNI,
+			  !!(accept_filter & QED_ACCEPT_ANY_VNI));
+
 		p_ramrod->rx_mode.state = cpu_to_le16(state);
 		DP_VERBOSE(p_hwfn, QED_MSG_SP,
 			   "p_ramrod->rx_mode.state = 0x%x\n", state);
diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.h b/drivers/net/ethernet/qlogic/qed/qed_l2.h
index c4030e9..806a8da 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_l2.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_l2.h
@@ -183,6 +183,7 @@ struct qed_filter_accept_flags {
 #define QED_ACCEPT_MCAST_MATCHED        0x08
 #define QED_ACCEPT_MCAST_UNMATCHED      0x10
 #define QED_ACCEPT_BCAST                0x20
+#define QED_ACCEPT_ANY_VNI              0x40
 };
 
 struct qed_arfs_config_params {
diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
index b870510..101d677 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
@@ -1508,11 +1508,8 @@ qed_rdma_register_tid(void *rdma_cxt,
 	case QED_RDMA_TID_FMR:
 		tid_type = RDMA_TID_FMR;
 		break;
-	case QED_RDMA_TID_MW_TYPE1:
-		tid_type = RDMA_TID_MW_TYPE1;
-		break;
-	case QED_RDMA_TID_MW_TYPE2A:
-		tid_type = RDMA_TID_MW_TYPE2A;
+	case QED_RDMA_TID_MW:
+		tid_type = RDMA_TID_MW;
 		break;
 	default:
 		rc = -EINVAL;
@@ -1544,7 +1541,6 @@ qed_rdma_register_tid(void *rdma_cxt,
 			  RDMA_REGISTER_TID_RAMROD_DATA_DIF_ON_HOST_FLG, 1);
 		DMA_REGPAIR_LE(p_ramrod->dif_error_addr,
 			       params->dif_error_addr);
-		DMA_REGPAIR_LE(p_ramrod->dif_runt_addr, params->dif_runt_addr);
 	}
 
 	rc = qed_spq_post(p_hwfn, p_ent, &fw_return_code);
diff --git a/drivers/net/ethernet/qlogic/qed/qed_reg_addr.h b/drivers/net/ethernet/qlogic/qed/qed_reg_addr.h
index f712205..d8ad2dc 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_reg_addr.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_reg_addr.h
@@ -178,7 +178,7 @@
 	0x008c80UL
 #define  MCP_REG_SCRATCH	\
 	0xe20000UL
-#define  CNIG_REG_NW_PORT_MODE_BB_B0 \
+#define  CNIG_REG_NW_PORT_MODE_BB \
 	0x218200UL
 #define  MISCS_REG_CHIP_NUM \
 	0x00976cUL
@@ -1621,6 +1621,7 @@
 #define NIG_REG_TX_EDPM_CTRL_TX_EDPM_TC_EN_SHIFT 1
 
 #define PRS_REG_SEARCH_GFT 0x1f11bcUL
+#define PRS_REG_SEARCH_NON_IP_AS_GFT 0x1f11c0UL
 #define PRS_REG_CM_HDR_GFT 0x1f11c8UL
 #define PRS_REG_GFT_CAM 0x1f1100UL
 #define PRS_REG_GFT_PROFILE_MASK_RAM 0x1f1000UL
diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.c b/drivers/net/ethernet/qlogic/qed/qed_roce.c
index ee57fcd..b5ce158 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_roce.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_roce.c
@@ -681,7 +681,6 @@ static int qed_roce_sp_modify_requester(struct qed_hwfn *p_hwfn,
 
 static int qed_roce_sp_destroy_qp_responder(struct qed_hwfn *p_hwfn,
 					    struct qed_rdma_qp *qp,
-					    u32 *num_invalidated_mw,
 					    u32 *cq_prod)
 {
 	struct roce_destroy_qp_resp_output_params *p_ramrod_res;
@@ -692,8 +691,6 @@ static int qed_roce_sp_destroy_qp_responder(struct qed_hwfn *p_hwfn,
 	int rc;
 
 	DP_VERBOSE(p_hwfn, QED_MSG_RDMA, "icid = %08x\n", qp->icid);
-
-	*num_invalidated_mw = 0;
 	*cq_prod = qp->cq_prod;
 
 	if (!qp->resp_offloaded) {
@@ -742,7 +739,6 @@ static int qed_roce_sp_destroy_qp_responder(struct qed_hwfn *p_hwfn,
 	if (rc)
 		goto err;
 
-	*num_invalidated_mw = le32_to_cpu(p_ramrod_res->num_invalidated_mw);
 	*cq_prod = le32_to_cpu(p_ramrod_res->cq_prod);
 	qp->cq_prod = *cq_prod;
 
@@ -764,8 +760,7 @@ static int qed_roce_sp_destroy_qp_responder(struct qed_hwfn *p_hwfn,
 }
 
 static int qed_roce_sp_destroy_qp_requester(struct qed_hwfn *p_hwfn,
-					    struct qed_rdma_qp *qp,
-					    u32 *num_bound_mw)
+					    struct qed_rdma_qp *qp)
 {
 	struct roce_destroy_qp_req_output_params *p_ramrod_res;
 	struct roce_destroy_qp_req_ramrod_data *p_ramrod;
@@ -807,7 +802,6 @@ static int qed_roce_sp_destroy_qp_requester(struct qed_hwfn *p_hwfn,
 	if (rc)
 		goto err;
 
-	*num_bound_mw = le32_to_cpu(p_ramrod_res->num_bound_mw);
 
 	/* Free ORQ - only if ramrod succeeded, in case FW is still using it */
 	dma_free_coherent(&p_hwfn->cdev->pdev->dev,
@@ -968,8 +962,6 @@ int qed_roce_query_qp(struct qed_hwfn *p_hwfn,
 
 int qed_roce_destroy_qp(struct qed_hwfn *p_hwfn, struct qed_rdma_qp *qp)
 {
-	u32 num_invalidated_mw = 0;
-	u32 num_bound_mw = 0;
 	u32 cq_prod;
 	int rc;
 
@@ -984,22 +976,14 @@ int qed_roce_destroy_qp(struct qed_hwfn *p_hwfn, struct qed_rdma_qp *qp)
 
 	if (qp->cur_state != QED_ROCE_QP_STATE_RESET) {
 		rc = qed_roce_sp_destroy_qp_responder(p_hwfn, qp,
-						      &num_invalidated_mw,
 						      &cq_prod);
 		if (rc)
 			return rc;
 
 		/* Send destroy requester ramrod */
-		rc = qed_roce_sp_destroy_qp_requester(p_hwfn, qp,
-						      &num_bound_mw);
+		rc = qed_roce_sp_destroy_qp_requester(p_hwfn, qp);
 		if (rc)
 			return rc;
-
-		if (num_invalidated_mw != num_bound_mw) {
-			DP_NOTICE(p_hwfn,
-				  "number of invalidate memory windows is different from bounded ones\n");
-			return -EINVAL;
-		}
 	}
 
 	return 0;
@@ -1010,7 +994,6 @@ int qed_roce_modify_qp(struct qed_hwfn *p_hwfn,
 		       enum qed_roce_qp_state prev_state,
 		       struct qed_rdma_modify_qp_in_params *params)
 {
-	u32 num_invalidated_mw = 0, num_bound_mw = 0;
 	int rc = 0;
 
 	/* Perform additional operations according to the current state and the
@@ -1090,7 +1073,6 @@ int qed_roce_modify_qp(struct qed_hwfn *p_hwfn,
 		/* Send destroy responder ramrod */
 		rc = qed_roce_sp_destroy_qp_responder(p_hwfn,
 						      qp,
-						      &num_invalidated_mw,
 						      &cq_prod);
 
 		if (rc)
@@ -1098,14 +1080,7 @@ int qed_roce_modify_qp(struct qed_hwfn *p_hwfn,
 
 		qp->cq_prod = cq_prod;
 
-		rc = qed_roce_sp_destroy_qp_requester(p_hwfn, qp,
-						      &num_bound_mw);
-
-		if (num_invalidated_mw != num_bound_mw) {
-			DP_NOTICE(p_hwfn,
-				  "number of invalidate memory windows is different from bounded ones\n");
-			return -EINVAL;
-		}
+		rc = qed_roce_sp_destroy_qp_requester(p_hwfn, qp);
 	} else {
 		DP_VERBOSE(p_hwfn, QED_MSG_RDMA, "0\n");
 	}
diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index ff21aa1..2f0a4f2 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -471,14 +471,8 @@ static u16 qedi_calc_mss(u16 pmtu, u8 is_ipv6, u8 tcp_ts_en, u8 vlan_en)
 	else
 		hdrs += IPV4_HDR_LEN;
 
-	if (vlan_en)
-		hdrs += VLAN_LEN;
-
 	mss = pmtu - hdrs;
 
-	if (tcp_ts_en)
-		mss -= TCP_OPTION_LEN;
-
 	if (!mss)
 		mss = DEF_MSS;
 
diff --git a/include/linux/qed/common_hsi.h b/include/linux/qed/common_hsi.h
index 13c8ab1..0081fa6 100644
--- a/include/linux/qed/common_hsi.h
+++ b/include/linux/qed/common_hsi.h
@@ -109,8 +109,8 @@
 #define MAX_NUM_LL2_TX_STATS_COUNTERS	48
 
 #define FW_MAJOR_VERSION	8
-#define FW_MINOR_VERSION	33
-#define FW_REVISION_VERSION     11
+#define FW_MINOR_VERSION        37
+#define FW_REVISION_VERSION     2
 #define FW_ENGINEERING_VERSION	0
 
 /***********************/
diff --git a/include/linux/qed/iscsi_common.h b/include/linux/qed/iscsi_common.h
index 938df61..b34c573 100644
--- a/include/linux/qed/iscsi_common.h
+++ b/include/linux/qed/iscsi_common.h
@@ -799,8 +799,8 @@ struct e4_mstorm_iscsi_task_ag_ctx {
 #define E4_MSTORM_ISCSI_TASK_AG_CTX_CONNECTION_TYPE_SHIFT	0
 #define E4_MSTORM_ISCSI_TASK_AG_CTX_EXIST_IN_QM0_MASK		0x1
 #define E4_MSTORM_ISCSI_TASK_AG_CTX_EXIST_IN_QM0_SHIFT		4
-#define E4_MSTORM_ISCSI_TASK_AG_CTX_BIT1_MASK			0x1
-#define E4_MSTORM_ISCSI_TASK_AG_CTX_BIT1_SHIFT			5
+#define E4_MSTORM_ISCSI_TASK_AG_CTX_CONN_CLEAR_SQ_FLAG_MASK	0x1
+#define E4_MSTORM_ISCSI_TASK_AG_CTX_CONN_CLEAR_SQ_FLAG_SHIFT	5
 #define E4_MSTORM_ISCSI_TASK_AG_CTX_VALID_MASK			0x1
 #define E4_MSTORM_ISCSI_TASK_AG_CTX_VALID_SHIFT			6
 #define E4_MSTORM_ISCSI_TASK_AG_CTX_TASK_CLEANUP_FLAG_MASK	0x1
@@ -849,8 +849,8 @@ struct e4_ustorm_iscsi_task_ag_ctx {
 #define E4_USTORM_ISCSI_TASK_AG_CTX_CONNECTION_TYPE_SHIFT	0
 #define E4_USTORM_ISCSI_TASK_AG_CTX_EXIST_IN_QM0_MASK		0x1
 #define E4_USTORM_ISCSI_TASK_AG_CTX_EXIST_IN_QM0_SHIFT		4
-#define E4_USTORM_ISCSI_TASK_AG_CTX_BIT1_MASK			0x1
-#define E4_USTORM_ISCSI_TASK_AG_CTX_BIT1_SHIFT			5
+#define E4_USTORM_ISCSI_TASK_AG_CTX_CONN_CLEAR_SQ_FLAG_MASK     0x1
+#define E4_USTORM_ISCSI_TASK_AG_CTX_CONN_CLEAR_SQ_FLAG_SHIFT    5
 #define E4_USTORM_ISCSI_TASK_AG_CTX_HQ_SCANNED_CF_MASK		0x3
 #define E4_USTORM_ISCSI_TASK_AG_CTX_HQ_SCANNED_CF_SHIFT		6
 	u8 flags1;
diff --git a/include/linux/qed/qed_rdma_if.h b/include/linux/qed/qed_rdma_if.h
index e05e320..df4d13f 100644
--- a/include/linux/qed/qed_rdma_if.h
+++ b/include/linux/qed/qed_rdma_if.h
@@ -65,8 +65,7 @@ enum qed_roce_qp_state {
 enum qed_rdma_tid_type {
 	QED_RDMA_TID_REGISTERED_MR,
 	QED_RDMA_TID_FMR,
-	QED_RDMA_TID_MW_TYPE1,
-	QED_RDMA_TID_MW_TYPE2A
+	QED_RDMA_TID_MW
 };
 
 struct qed_rdma_events {
@@ -280,7 +279,6 @@ struct qed_rdma_register_tid_in_params {
 
 	bool dif_enabled;
 	u64 dif_error_addr;
-	u64 dif_runt_addr;
 };
 
 struct qed_rdma_create_cq_in_params {
diff --git a/include/linux/qed/roce_common.h b/include/linux/qed/roce_common.h
index 193bcef3..473fba7 100644
--- a/include/linux/qed/roce_common.h
+++ b/include/linux/qed/roce_common.h
@@ -43,6 +43,7 @@
 #define ROCE_MAX_QPS			(32 * 1024)
 #define ROCE_DCQCN_NP_MAX_QPS		(64)
 #define ROCE_DCQCN_RP_MAX_QPS		(64)
+#define ROCE_LKEY_MW_DIF_EN_BIT		(28)
 
 /* Affiliated asynchronous events / errors enumeration */
 enum roce_async_events_type {
-- 
2.9.5

^ permalink raw reply related

* [PATCH] net-tcp: remove useless tw_timeout field
From: Maciej Żenczykowski @ 2018-06-05 10:07 UTC (permalink / raw)
  To: Maciej Żenczykowski, David S . Miller, Eric Dumazet; +Cc: netdev

From: Maciej Żenczykowski <maze@google.com>

Tested: 'git grep tw_timeout' comes up empty and it builds :-)

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>
---
 include/net/inet_timewait_sock.h | 1 -
 net/dccp/minisocks.c             | 1 -
 net/ipv4/tcp_minisocks.c         | 1 -
 3 files changed, 3 deletions(-)

diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index 659d8ed5a3bc..78775038f011 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -61,7 +61,6 @@ struct inet_timewait_sock {
 #define tw_cookie		__tw_common.skc_cookie
 #define tw_dr			__tw_common.skc_tw_dr
 
-	int			tw_timeout;
 	__u32			tw_mark;
 	volatile unsigned char	tw_substate;
 	unsigned char		tw_rcv_wscale;
diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c
index 37ccbe62eb1a..ba6fc3c1186b 100644
--- a/net/dccp/minisocks.c
+++ b/net/dccp/minisocks.c
@@ -53,7 +53,6 @@ void dccp_time_wait(struct sock *sk, int state, int timeo)
 		if (timeo < rto)
 			timeo = rto;
 
-		tw->tw_timeout = DCCP_TIMEWAIT_LEN;
 		if (state == DCCP_TIME_WAIT)
 			timeo = DCCP_TIMEWAIT_LEN;
 
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index f867658b4b30..1dda1341a223 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -307,7 +307,6 @@ void tcp_time_wait(struct sock *sk, int state, int timeo)
 		if (timeo < rto)
 			timeo = rto;
 
-		tw->tw_timeout = TCP_TIMEWAIT_LEN;
 		if (state == TCP_TIME_WAIT)
 			timeo = TCP_TIMEWAIT_LEN;
 
-- 
2.17.1.1185.g55be947832-goog

^ permalink raw reply related

* [PATCH net-next] net: fddi: fix a possible null-ptr-deref
From: YueHaibing @ 2018-06-05 10:06 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-kernel, YueHaibing

bp->SharedMemAddr is set to NULL while bp->SharedMemSize less than 0,
then memset will trigger null-ptr-deref.

Fix it use pci_zalloc_consistent when bp->SharedMemSize is greater than 0.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/fddi/skfp/skfddi.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/fddi/skfp/skfddi.c b/drivers/net/fddi/skfp/skfddi.c
index 2414f1d..90164e0 100644
--- a/drivers/net/fddi/skfp/skfddi.c
+++ b/drivers/net/fddi/skfp/skfddi.c
@@ -407,23 +407,21 @@ static  int skfp_driver_init(struct net_device *dev)
 	if (bp->SharedMemSize > 0) {
 		bp->SharedMemSize += 16;	// for descriptor alignment
 
-		bp->SharedMemAddr = pci_alloc_consistent(&bp->pdev,
-							 bp->SharedMemSize,
-							 &bp->SharedMemDMA);
+		bp->SharedMemAddr = pci_zalloc_consistent(&bp->pdev,
+							  bp->SharedMemSize,
+							  &bp->SharedMemDMA);
 		if (!bp->SharedMemAddr) {
 			printk("could not allocate mem for ");
 			printk("hardware module: %ld byte\n",
 			       bp->SharedMemSize);
 			goto fail;
 		}
-		bp->SharedMemHeap = 0;	// Nothing used yet.
 
 	} else {
 		bp->SharedMemAddr = NULL;
-		bp->SharedMemHeap = 0;
 	}			// SharedMemSize > 0
 
-	memset(bp->SharedMemAddr, 0, bp->SharedMemSize);
+	bp->SharedMemHeap = 0;
 
 	card_stop(smc);		// Reset adapter.
 
-- 
2.7.0

^ permalink raw reply related

* Re: Qualcomm rmnet driver and qmi_wwan
From: Daniele Palmas @ 2018-06-05  9:38 UTC (permalink / raw)
  To: Subash Abhinov Kasiviswanathan; +Cc: netdev
In-Reply-To: <2a96c941b6699827ea6ac1ce3603fae5@codeaurora.org>

Hi,

2018-02-21 20:47 GMT+01:00 Subash Abhinov Kasiviswanathan
<subashab@codeaurora.org>:
> On 2018-02-21 04:38, Daniele Palmas wrote:
>>
>> Hello,
>>
>> in rmnet kernel documentation I read:
>>
>> "This driver can be used to register onto any physical network device in
>> IP mode. Physical transports include USB, HSIC, PCIe and IP accelerator."
>>
>> Does this mean that it can be used in association with the qmi_wwan
>> driver?
>>
>> If yes, can someone give me an hint on the steps to follow?
>>
>> If not, does anyone know if it is possible to modify qmi_wwan in order
>> to take advantage of the features provided by the rmnet driver?
>>
>> In this case hint on the changes for modifying qmi_wwan are welcome.
>>
>> Thanks in advance,
>> Daniele
>
>
> Hi
>
> I havent used qmi_wwan so the following comment is based on code inspection.
> qmimux_register_device() is creating qmimux devices with usb net device as
> real_dev. The Multiplexing and aggregation header (qmimux_hdr) is stripped
> off
> in qmimux_rx_fixup() and the packet is passed on to stack.
>
> You could instead create rmnet devices with the usb netdevice as real dev.
> The packets from the usb net driver can be queued to network stack directly
> as rmnet driver will setup a RX handler. rmnet driver will process the
> packets
> further and then queue to network stack.
>

in kernel documentation I read that rmnet user space configuration is
done through librmnetctl available at

https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/dataservices/tree/rmnetctl

However it seems to me that this is a bit outdated (e.g. it does not
properly build since it is looking for kernel header
linux/rmnet_data.h that, as far as I understand, is no more present).

Is there available a more recent version of the tool?

Thanks,
Daniele

> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project

^ permalink raw reply

* Re: INFO: task hung in ip6gre_exit_batch_net
From: Dmitry Vyukov @ 2018-06-05  9:36 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: syzbot, Christian Brauner, David Miller, David Ahern,
	Florian Westphal, Jiri Benc, LKML, Xin Long, mschiffer, netdev,
	syzkaller-bugs, Vladislav Yasevich
In-Reply-To: <cf729ce5-7948-d475-1b20-f31fb313df88@virtuozzo.com>

On Tue, Jun 5, 2018 at 11:03 AM, Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> Hi, Dmirty!
>
> On 04.06.2018 18:22, Dmitry Vyukov wrote:
>> On Mon, Jun 4, 2018 at 5:03 PM, syzbot
>> <syzbot+bf78a74f82c1cf19069e@syzkaller.appspotmail.com> wrote:
>>> Hello,
>>>
>>> syzbot found the following crash on:
>>>
>>> HEAD commit:    bc2dbc5420e8 Merge branch 'akpm' (patches from Andrew)
>>> git tree:       upstream
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=164e42b7800000
>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=982e2df1b9e60b02
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=bf78a74f82c1cf19069e
>>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>>>
>>> Unfortunately, I don't have any reproducer for this crash yet.
>>>
>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>> Reported-by: syzbot+bf78a74f82c1cf19069e@syzkaller.appspotmail.com
>>
>> Another hang on rtnl lock:
>>
>> #syz dup: INFO: task hung in netdev_run_todo
>>
>> May be related to "unregister_netdevice: waiting for DEV to become free":
>> https://syzkaller.appspot.com/bug?id=1a97a5bd119fd97995f752819fd87840ab9479a9

netdev_wait_allrefs does not hold rtnl lock during waiting, so it must
be something different.


>> Any other explanations for massive hangs on rtnl lock for minutes?
>
> To exclude the situation, when a task exists with rtnl_mutex held:
>
> would the pr_warn() from print_held_locks_bug() be included in the console output
> if they appear?

Yes, everything containing "WARNING:" is detected as bug.

^ permalink raw reply

* Re: [PATCH bpf-next v2 1/2] trace_helpers.c: Add helpers to poll multiple perf FDs for events
From: Daniel Borkmann @ 2018-06-05  9:27 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Jakub Kicinski
  Cc: netdev, Alexei Starovoitov
In-Reply-To: <87k1rdfv0m.fsf@toke.dk>

On 06/05/2018 11:22 AM, Toke Høiland-Jørgensen wrote:
> Daniel Borkmann <daniel@iogearbox.net> writes:
>> On 06/05/2018 12:26 AM, Jakub Kicinski wrote:
>>> On Mon, 04 Jun 2018 18:33:56 +0200, Toke Høiland-Jørgensen wrote:
>>>> This adds two new helper functions to trace_helpers that supports polling
>>>> multiple perf file descriptors for events. These are used to the XDP
>>>> perf_event_output example, which needs to work with one perf fd per CPU.
>>>>
>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>>
>>> Did you take a look at tools/bpf/bpftool/map_perf_ring.c ?
>>>
>>> I think the ability to poll multiple FDs could be generally useful and
>>> therefore better add it to libbpf.c than
>>> tools/testing/selftests/bpf/trace_helpers.c?  I'm not 100% sure myself...
>>
>> I think for it to land in libbpf this code needs to be more generalized
>> as it is right now and allowing for more flexibility like pinning RB
>> processing threads to CPUs, poll handling, etc.
> 
> Hmm, so how about we leave it in trace_helpers.c for now? I don't
> necessarily mind working on an addition to libbpf at some point, but
> don't have time to do so short term...

That's okay with me as well.

^ permalink raw reply

* Re: [PATCH bpf-next v2 1/2] trace_helpers.c: Add helpers to poll multiple perf FDs for events
From: Toke Høiland-Jørgensen @ 2018-06-05  9:22 UTC (permalink / raw)
  To: Daniel Borkmann, Jakub Kicinski; +Cc: netdev, Alexei Starovoitov
In-Reply-To: <2626942d-664b-f569-1926-d5a8ac5314b9@iogearbox.net>

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 06/05/2018 12:26 AM, Jakub Kicinski wrote:
>> On Mon, 04 Jun 2018 18:33:56 +0200, Toke Høiland-Jørgensen wrote:
>>> This adds two new helper functions to trace_helpers that supports polling
>>> multiple perf file descriptors for events. These are used to the XDP
>>> perf_event_output example, which needs to work with one perf fd per CPU.
>>>
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>> 
>> Did you take a look at tools/bpf/bpftool/map_perf_ring.c ?
>> 
>> I think the ability to poll multiple FDs could be generally useful and
>> therefore better add it to libbpf.c than
>> tools/testing/selftests/bpf/trace_helpers.c?  I'm not 100% sure myself...
>
> I think for it to land in libbpf this code needs to be more generalized
> as it is right now and allowing for more flexibility like pinning RB
> processing threads to CPUs, poll handling, etc.

Hmm, so how about we leave it in trace_helpers.c for now? I don't
necessarily mind working on an addition to libbpf at some point, but
don't have time to do so short term...

^ permalink raw reply

* Re: [PATCH bpf-next v2 2/2] samples/bpf: Add xdp_sample_pkts example
From: Toke Høiland-Jørgensen @ 2018-06-05  9:20 UTC (permalink / raw)
  To: Daniel Borkmann, Jakub Kicinski; +Cc: netdev
In-Reply-To: <1ef5fdfd-3225-89e2-c6b5-d3fe2573d655@iogearbox.net>

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 06/05/2018 12:32 AM, Jakub Kicinski wrote:
>> On Mon, 04 Jun 2018 18:33:56 +0200, Toke Høiland-Jørgensen wrote:
>>> +	if (load_bpf_file(filename)) {
>> 
>> Would you mind using libbpf instead of bpf_load.o?  I converted some
>> samples in be5bca44aa6b ("samples: bpf: convert some XDP samples from
>> bpf_load to libbpf"), it's pretty straight forward.  Maybe we can kill
>> bpf_load.o one day :)
>
> Agreed, we should only be using libbpf going forward.

Right, I'll rework to load using libbpf instead.

^ permalink raw reply

* Re: [PATCH RFC ipsec-next 0/3] Virtual xfrm interfaces
From: Steffen Klassert @ 2018-06-05  9:07 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, eyal.birger, antony, benedictwong, lorenzo,
	shannon.nelson
In-Reply-To: <20180604.085822.1374283424905473397.davem@davemloft.net>

On Mon, Jun 04, 2018 at 08:58:22AM -0400, David Miller wrote:
> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Mon, 4 Jun 2018 08:09:07 +0200
> 
> > This patchset introduces new virtual xfrm interfaces.
> > The design of virtual xfrm interfaces interfaces was
> > discussed at the Linux IPsec workshop 2018. This patchset
> > implements these interfaces as the IPsec userspace and
> > kernel developers agreed. The purpose of these interfaces
> > is to overcome the design limitations that the existing
> > VTI devices have.
> > 
> > We had two presentations about xfrm interfaces at
> > the workshop. Slides with further informations
> > can be found at the workshop homepage:
> > 
> > https://workshop.linux-ipsec.org/2018/
> 
> First off, you will have to describe in detail what the VTI
> limitations are and how these new devices overcome them in this commit
> message.

Ok, no problem. Sorry for not doing this in the initial mail.

The main limitations that we see with the current VTI are the following:

- VTI interfaces are L3 tunnels with configurable endpoints.
  For xfrm, the tunnel endpoint are already determined by the SA.
  So the VTI tunnel endpoints must be either the same as on the
  SA or wildcards. In case VTI tunnel endpoints are same as on
  the SA, we get a one to one correlation between the SA and
  the tunnel. So each SA needs its own tunnel interface.

  On the other hand, we can have only one VTI tunnel with
  wildcard src/dst tunnel endpoints in the system because the
  lookup is based on the tunnel endpoints. The existing tunnel
  lookup won't work with multiple tunnels with wildcard
  tunnel endpoints. Some usecases require more than on
  VTI tunnel of this type, for example if somebody has multiple
  namespaces and every namespace requires such a VTI.

- VTI needs separate interfaces for IPv4 and IPv6 tunnels.
  So when routing to a VTI, we have to know to which address
  family this traffic class is going to be encapsulated.
  This is a lmitation because it makes routing more complex
  and it is not always possible to know what happens behind the
  VTI, e.g. when the VTI is move to some namespace.

- VTI works just with tunnel mode SAs. We need generic interfaces
  that ensures transfomation, regardless of the xfrm mode and
  the encapsulated address family.

- VTI is configured with a combination GRE keys and xfrm marks.
  With this we have to deal with some extra cases in the generic
  tunnel lookup because the GRE keys on the VTI are actually
  not GRE keys, the GRE keys were just reused for something else.
  All extensions to the VTI interfaces would require to add
  even more complexity to the generic tunnel lookup.

So to overcome this, we started with the following design goal:

- It should be possible to tunnel IPv4 and IPv6 through the same
  interface.

- No limitation on xfrm mode (tunnel, transport and beet).

- Should be a generic virtual interface that ensures IPsec
  transformation, no need to know what happens behind the
  interface.

- Interfaces should be configured with a new key that must match a
  new policy/SA lookup key.

- The lookup logic should stay in the xfrm codebase, no need to
  change or extend generic routing and tunnel lookups.

- Should be possible to use IPsec hardware offloads of the underlying
  interface.

> 
> You can't just say "we discussed this over there, go take a look".
> 
> The place people "take a look" is your text here.
> 
> Second, since you didn't explain things, I have to ask.  Why is a new
> special ID even necessary?  It makes the flowi bigger, and adds all of
> this new logic.
> 
> All netdevs have an ifindex and you should be able to find a way to
> use the ifindex of these new devices in the key somehow.

The ifindex is already part of the xfrm selector and used as a key
to enforce the output device after the transformation in the
policy/state lookup. So we need some other identifier as the
ifindex is already in use for xfrm lookups.

We tried to keep the lookup logic in xfrm as much as possible,
the only thing that we currently need is a field in the flowi
for this to pass the information down to the lookups.

^ permalink raw reply

* Re: INFO: task hung in ip6gre_exit_batch_net
From: Kirill Tkhai @ 2018-06-05  9:03 UTC (permalink / raw)
  To: Dmitry Vyukov, syzbot
  Cc: Christian Brauner, David Miller, David Ahern, Florian Westphal,
	Jiri Benc, LKML, Xin Long, mschiffer, netdev, syzkaller-bugs,
	Vladislav Yasevich
In-Reply-To: <CACT4Y+ZmxgKvgD2RWD1=KEZQqY=vwdhrAZfqMT-gGuj8k8eKKg@mail.gmail.com>

Hi, Dmirty!

On 04.06.2018 18:22, Dmitry Vyukov wrote:
> On Mon, Jun 4, 2018 at 5:03 PM, syzbot
> <syzbot+bf78a74f82c1cf19069e@syzkaller.appspotmail.com> wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:    bc2dbc5420e8 Merge branch 'akpm' (patches from Andrew)
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=164e42b7800000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=982e2df1b9e60b02
>> dashboard link: https://syzkaller.appspot.com/bug?extid=bf78a74f82c1cf19069e
>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+bf78a74f82c1cf19069e@syzkaller.appspotmail.com
> 
> Another hang on rtnl lock:
> 
> #syz dup: INFO: task hung in netdev_run_todo
> 
> May be related to "unregister_netdevice: waiting for DEV to become free":
> https://syzkaller.appspot.com/bug?id=1a97a5bd119fd97995f752819fd87840ab9479a9
> 
> Any other explanations for massive hangs on rtnl lock for minutes?

To exclude the situation, when a task exists with rtnl_mutex held:

would the pr_warn() from print_held_locks_bug() be included in the console output
if they appear?

Kirill

^ permalink raw reply

* Freeze when using ipheth+IPsec+IPv6
From: Yves-Alexis Perez @ 2018-06-05  8:54 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, Hans Liljestrand, David Windsor,
	Kees Cook, Reshetova, Elena, Kirill Tkhai, Al Viro, Cong Wang,
	Mateusz Jurczyk, Denys Vlasenko, David Herrmann, netdev,
	Alexander Kappner, Johannes Berg, Gustavo A. R. Silva,
	Arvind Yadav, Steffen Klassert, Herbert Xu


[-- Attachment #1.1: Type: text/plain, Size: 996 bytes --]

Hi,

since some kernels releases (I didn't test thorougly but at least 4.16
and 4.17) I have regular freezes in certain situations on my laptop.

It seems to happen when I:

- tether using my iPhone (involving ipheth)
- mount an IPsec tunnel over IPv4
- run evolution to fetch my mail (IMAP traffic over IPv6 inside the IPv4
  IPsec tunnel)

When I do that, the interface seems to freeze. Last time the mouse was
still moving so the kernel didn't completely crash, but the UI was
completely irresponsive. I managed to get the attached log from
/sys/fs/pstore with refcount_t stuff pointing to an underflow.

Since this doesn't happen if all three conditions aren't met, I've added
a lot of people to the CC: list, sorry for the noise.

I can try to add debugging here and there if needed, but any pointer to
where would be helpful. I'm especially puzzled about the presence of
net/unix/af_unix.c:491 unix_sock_destructor+0x97/0xc0 in the log.

Regards,
-- 
Yves-Alexis

[-- Attachment #1.2: dmesg-crash.txt --]
[-- Type: text/plain, Size: 14641 bytes --]

Oops#1 Part8
<4>[ 2189.388649] ------------[ cut here ]------------
<4>[ 2189.388652] refcount_t: underflow; use-after-free.
<4>[ 2189.388691] WARNING: CPU: 3 PID: 30 at /home/corsac/projets/linux/linux/lib/refcount.c:187 refcount_sub_and_test+0x3e/0x50
<4>[ 2189.388692] Modules linked in: esp4 xfrm6_mode_tunnel xfrm4_mode_tunnel ipheth bnep rtsx_pci_sdmmc iwlmvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi iwlwifi snd_hda_intel rtsx_pci snd_hda_codec snd_hwdep snd_hda_core snd_pcm thinkpad_acpi efivarfs input_leds
<4>[ 2189.388713] CPU: 3 PID: 30 Comm: ksoftirqd/3 Tainted: G                T 4.17.0 #22
<4>[ 2189.388714] Hardware name: LENOVO 20CMCTO1WW/20CMCTO1WW, BIOS N10ET48W (1.27 ) 09/12/2017
<4>[ 2189.388718] RIP: 0010:refcount_sub_and_test+0x3e/0x50
<4>[ 2189.388720] RSP: 0000:ffff93e640dabdc0 EFLAGS: 00010282
<4>[ 2189.388722] RAX: 0000000000000000 RBX: ffff8d00bac82000 RCX: 0000000000000006
<4>[ 2189.388723] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff8d014dd95610
<4>[ 2189.388724] RBP: ffff8d00bac82144 R08: 00000000000003b3 R09: 0000000000000004
<4>[ 2189.388725] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8d014dda1100
<4>[ 2189.388726] R13: 0000000000000000 R14: 909c6c76983bd4d2 R15: 13146644b16dc153
<4>[ 2189.388728] FS:  0000000000000000(0000) GS:ffff8d014dd80000(0000) knlGS:0000000000000000
<4>[ 2189.388729] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 2189.388731] CR2: 0000755942c8c000 CR3: 000000006620a004 CR4: 00000000003606e0
<4>[ 2189.388732] Call Trace:
<4>[ 2189.388738]  sock_wfree+0x40/0x60
<4>[ 2189.388743]  unix_destruct_scm+0x7b/0xa0
<4>[ 2189.388747]  skb_release_head_state+0x59/0x90
<4>[ 2189.388751]  skb_release_all+0x9/0x20
<4>[ 2189.388753]  __kfree_skb_defer+0x19/0x50
Oops#1 Part7
<4>[ 2189.388757]  net_tx_action+0xf0/0x2d0
<4>[ 2189.388760]  __do_softirq+0xdb/0x220
<4>[ 2189.388766]  ? sort_range+0x20/0x20
<4>[ 2189.388768]  run_ksoftirqd+0x1f/0x30
<4>[ 2189.388771]  smpboot_thread_fn+0x11f/0x1e0
<4>[ 2189.388775]  kthread+0x109/0x120
<4>[ 2189.388779]  ? kthread_create_worker_on_cpu+0x70/0x70
<4>[ 2189.388783]  ret_from_fork+0x35/0x40
<4>[ 2189.388785] Code: c9 75 0c f0 0f b1 16 75 27 85 d2 0f 94 c0 c3 80 3d ab 82 f4 00 00 75 15 48 c7 c7 b8 a5 fd bd c6 05 9b 82 f4 00 01 e8 22 13 d4 ff <0f> 0b 31 c0 c3 83 f8 ff 75 bf eb f6 66 0f 1f 44 00 00 48 89 fe 
<4>[ 2189.388825] ---[ end trace b06d93f176d25117 ]---
<4>[ 2189.388848] WARNING: CPU: 3 PID: 30 at /home/corsac/projets/linux/linux/net/unix/af_unix.c:491 unix_sock_destructor+0x97/0xc0
<4>[ 2189.388849] Modules linked in: esp4 xfrm6_mode_tunnel xfrm4_mode_tunnel ipheth bnep rtsx_pci_sdmmc iwlmvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi iwlwifi snd_hda_intel rtsx_pci snd_hda_codec snd_hwdep snd_hda_core snd_pcm thinkpad_acpi efivarfs input_leds
<4>[ 2189.388866] CPU: 3 PID: 30 Comm: ksoftirqd/3 Tainted: G        W       T 4.17.0 #22
<4>[ 2189.388867] Hardware name: LENOVO 20CMCTO1WW/20CMCTO1WW, BIOS N10ET48W (1.27 ) 09/12/2017
<4>[ 2189.388869] RIP: 0010:unix_sock_destructor+0x97/0xc0
<4>[ 2189.388871] RSP: 0000:ffff93e640dabda8 EFLAGS: 00010286
<4>[ 2189.388872] RAX: 0000000000000000 RBX: ffff8d00bac82000 RCX: 0000000000000000
<4>[ 2189.388873] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 0000000000000282
<4>[ 2189.388875] RBP: ffff8d00bac82000 R08: ffff8d0084796430 R09: ffff93e640dabd50
<4>[ 2189.388876] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8d014dda1100
<4>[ 2189.388877] R13: 0000000000000000 R14: 909c6c76983bd4d2 R15: 13146644b16dc153
Oops#1 Part6
<4>[ 2189.388879] FS:  0000000000000000(0000) GS:ffff8d014dd80000(0000) knlGS:0000000000000000
<4>[ 2189.388880] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 2189.388881] CR2: 0000755942c8c000 CR3: 000000006620a004 CR4: 00000000003606e0
<4>[ 2189.388882] Call Trace:
<4>[ 2189.388886]  __sk_destruct+0x1f/0x140
<4>[ 2189.388888]  unix_destruct_scm+0x7b/0xa0
<4>[ 2189.388891]  skb_release_head_state+0x59/0x90
<4>[ 2189.388894]  skb_release_all+0x9/0x20
<4>[ 2189.388894] ------------[ cut here ]------------
<4>[ 2189.388896] refcount_t: addition on 0; use-after-free.
<4>[ 2189.388898]  __kfree_skb_defer+0x19/0x50
<4>[ 2189.388901]  net_tx_action+0xf0/0x2d0
<4>[ 2189.388906] WARNING: CPU: 0 PID: 1094 at /home/corsac/projets/linux/linux/lib/refcount.c:102 refcount_add+0x26/0x30
<4>[ 2189.388907] Modules linked in: esp4
<4>[ 2189.388910]  __do_softirq+0xdb/0x220
<4>[ 2189.388911]  xfrm6_mode_tunnel
<4>[ 2189.388915]  ? sort_range+0x20/0x20
<4>[ 2189.388915]  xfrm4_mode_tunnel ipheth
<4>[ 2189.388918]  run_ksoftirqd+0x1f/0x30
<4>[ 2189.388918]  bnep
<4>[ 2189.388920]  smpboot_thread_fn+0x11f/0x1e0
<4>[ 2189.388921]  rtsx_pci_sdmmc
<4>[ 2189.388924]  kthread+0x109/0x120
<4>[ 2189.388925]  iwlmvm
<4>[ 2189.388927]  ? kthread_create_worker_on_cpu+0x70/0x70
<4>[ 2189.388928]  snd_hda_codec_realtek
<4>[ 2189.388932]  ret_from_fork+0x35/0x40
<4>[ 2189.388932]  snd_hda_codec_generic snd_hda_codec_hdmi
<4>[ 2189.388933] Code: 
<4>[ 2189.388935]  iwlwifi
<4>[ 2189.388936] e8 ff 
<4>[ 2189.388937]  snd_hda_intel rtsx_pci
<4>[ 2189.388938] f0 f2 ff 5b be 
<4>[ 2189.388941]  snd_hda_codec snd_hwdep
<4>[ 2189.388942] 00 02 
<4>[ 2189.388944]  snd_hda_core
<4>[ 2189.388944] 00 00 
<4>[ 2189.388945]  snd_pcm thinkpad_acpi
Oops#1 Part5
<4>[ 2189.388947] 48 
<4>[ 2189.388947]  efivarfs input_leds
<4>[ 2189.388949] c7 
<4>[ 2189.388951] c7 
<4>[ 2189.388952] CPU: 0 PID: 1094 Comm: Xorg Tainted: G        W       T 4.17.0 #22
<4>[ 2189.388953] Hardware name: LENOVO 20CMCTO1WW/20CMCTO1WW, BIOS N10ET48W (1.27 ) 09/12/2017
<4>[ 2189.388954] 5c 
<4>[ 2189.388957] RIP: 0010:refcount_add+0x26/0x30
<4>[ 2189.388958] c3 
<4>[ 2189.388959] RSP: 0018:ffff93e64167bbe0 EFLAGS: 00010286
<4>[ 2189.388960] 7b 
<4>[ 2189.388961] RAX: 0000000000000000 RBX: ffff93e64167bc18 RCX: 0000000000000000
<4>[ 2189.388963] RDX: ffff8d014dc1c540 RSI: ffff8d014dc15618 RDI: ffff8d014dc15618
<4>[ 2189.388963] bd 
<4>[ 2189.388965] RBP: ffffffffbd0d1570 R08: 00000000000003e6 R09: 0000000000000004
<4>[ 2189.388966] R10: ffff8d01449ee000 R11: 0000000000000001 R12: ffff93e64167bc94
<4>[ 2189.388967] 5d e9 
<4>[ 2189.388969] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8d00bac82000
<4>[ 2189.388970] FS:  00007559523e76c0(0000) GS:ffff8d014dc00000(0000) knlGS:0000000000000000
<4>[ 2189.388971] 2c 91 
<4>[ 2189.388973] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 2189.388975] CR2: 00007e917c21a000 CR3: 00000002431c8002 CR4: 00000000003606f0
<4>[ 2189.388976] 8d 
<4>[ 2189.388977] Call Trace:
<4>[ 2189.388977] ff 48 89 ef 
<4>[ 2189.388983]  sock_alloc_send_pskb+0x20c/0x230
<4>[ 2189.388984] e8 24 73 
<4>[ 2189.388987]  unix_stream_sendmsg+0x2bf/0x3d0
<4>[ 2189.388988] 9f ff eb be 
<4>[ 2189.388993]  sock_sendmsg+0x31/0x40
<4>[ 2189.388994] 0f 
<4>[ 2189.388996]  sock_write_iter+0x88/0xf0
<4>[ 2189.388997] 0b 48 83 
<4>[ 2189.389002]  do_iter_readv_writev+0x147/0x1a0
<4>[ 2189.389005] 7b 
<4>[ 2189.389007]  do_iter_write+0x81/0x1a0
<4>[ 2189.389008] 70 
<4>[ 2189.389010]  vfs_writev+0xd1/0x160
<4>[ 2189.389011] 00 74 
Oops#1 Part4
<4>[ 2189.389013]  ? __sys_recvmsg+0x71/0xb0
<4>[ 2189.389016]  ? __fget+0x6f/0xb0
<4>[ 2189.389016] 8b <0f> 0b 
<4>[ 2189.389019]  ? do_writev+0x5c/0xf0
<4>[ 2189.389021]  do_writev+0x5c/0xf0
<4>[ 2189.389022] 48 83 bb 
<4>[ 2189.389025]  do_syscall_64+0x72/0x1c0
<4>[ 2189.389026] 60 
<4>[ 2189.389029]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4>[ 2189.389029] 02 00 
<4>[ 2189.389031] RIP: 0033:0x75594f801017
<4>[ 2189.389033] RSP: 002b:00007ffdf628f130 EFLAGS: 00000293
<4>[ 2189.389033] 00 
<4>[ 2189.389034]  ORIG_RAX: 0000000000000014
<4>[ 2189.389035] 00 
<4>[ 2189.389037] RAX: ffffffffffffffda RBX: 000000000000004b RCX: 000075594f801017
<4>[ 2189.389037] 74 
<4>[ 2189.389039] RDX: 0000000000000001 RSI: 00007ffdf628f430 RDI: 000000000000004b
<4>[ 2189.389039] 89 
<4>[ 2189.389041] RBP: 00007ffdf628f430 R08: 0000000000000000 R09: 0000616b576c78e0
<4>[ 2189.389041] 0f 
<4>[ 2189.389043] R10: 0000000000000001 R11: 0000000000000293 R12: 0000000000000001
<4>[ 2189.389043] 0b 
<4>[ 2189.389045] R13: 00007ffdf628f430 R14: 0000000000000020 R15: 0000616b57691c80
<4>[ 2189.389046] eb 85 
<4>[ 2189.389047] Code: 
<4>[ 2189.389048] 48 
<4>[ 2189.389049] 00 00 
<4>[ 2189.389050] 89 
<4>[ 2189.389051] 00 
<4>[ 2189.389052] de 
<4>[ 2189.389053] 00 
<4>[ 2189.389054] 5b 
<4>[ 2189.389055] 00 
<4>[ 2189.389055] 48 
<4>[ 2189.389057] e8 8b 
<4>[ 2189.389058] ---[ end trace b06d93f176d25118 ]---
<4>[ 2189.389059] ff ff ff 84 c0 74 01 c3 80 3d 76 83 f4 00 00 75 f6 48 c7 c7 58 a5 fd bd c6 05 66 83 f4 00 01 e8 ea 13 d4 ff <0f> 0b c3 0f 1f 80 00 00 00 00 8b 
<4>[ 2189.389090] WARNING: CPU: 3 PID: 30 at /home/corsac/projets/linux/linux/net/unix/af_unix.c:492 unix_sock_destructor+0xa3/0xc0
<4>[ 2189.389090] Modules linked in:
<4>[ 2189.389091] 07 
<4>[ 2189.389092]  esp4
<4>[ 2189.389093] 8d 50 
Oops#1 Part3
<4>[ 2189.389094]  xfrm6_mode_tunnel
<4>[ 2189.389095] 01 85 
<4>[ 2189.389097]  xfrm4_mode_tunnel
<4>[ 2189.389097] c0 74 
<4>[ 2189.389099]  ipheth
<4>[ 2189.389100] 35 85 
<4>[ 2189.389101]  bnep rtsx_pci_sdmmc
<4>[ 2189.389103] d2 
<4>[ 2189.389104]  iwlmvm
<4>[ 2189.389106] ---[ end trace b06d93f176d25119 ]---
<4>[ 2189.389106]  snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi iwlwifi snd_hda_intel rtsx_pci snd_hda_codec snd_hwdep snd_hda_core snd_pcm thinkpad_acpi efivarfs input_leds
<4>[ 2189.389116] CPU: 3 PID: 30 Comm: ksoftirqd/3 Tainted: G        W       T 4.17.0 #22
<4>[ 2189.389117] Hardware name: LENOVO 20CMCTO1WW/20CMCTO1WW, BIOS N10ET48W (1.27 ) 09/12/2017
<4>[ 2189.389119] RIP: 0010:unix_sock_destructor+0xa3/0xc0
<4>[ 2189.389121] RSP: 0000:ffff93e640dabda8 EFLAGS: 00010286
<4>[ 2189.389123] RAX: 0000000000000000 RBX: ffff8d00bac82000 RCX: 0000000000000000
<4>[ 2189.389124] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 0000000000000282
<4>[ 2189.389126] RBP: ffff8d00bac82000 R08: ffff8d0084796430 R09: ffff93e640dabd50
<4>[ 2189.389127] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8d014dda1100
<4>[ 2189.389128] R13: 0000000000000000 R14: 909c6c76983bd4d2 R15: 13146644b16dc153
<4>[ 2189.389130] FS:  0000000000000000(0000) GS:ffff8d014dd80000(0000) knlGS:0000000000000000
<4>[ 2189.389131] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 2189.389133] CR2: 0000755942c8c000 CR3: 000000006620a004 CR4: 00000000003606e0
<4>[ 2189.389134] Call Trace:
<4>[ 2189.389138]  __sk_destruct+0x1f/0x140
<4>[ 2189.389141]  unix_destruct_scm+0x7b/0xa0
<4>[ 2189.389145]  skb_release_head_state+0x59/0x90
<4>[ 2189.389148]  skb_release_all+0x9/0x20
<4>[ 2189.389150]  __kfree_skb_defer+0x19/0x50
<4>[ 2189.389153]  net_tx_action+0xf0/0x2d0
Oops#1 Part2
<4>[ 2189.389155]  __do_softirq+0xdb/0x220
<4>[ 2189.389159]  ? sort_range+0x20/0x20
<4>[ 2189.389163]  run_ksoftirqd+0x1f/0x30
<4>[ 2189.389165]  smpboot_thread_fn+0x11f/0x1e0
<4>[ 2189.389168]  kthread+0x109/0x120
<4>[ 2189.389171]  ? kthread_create_worker_on_cpu+0x70/0x70
<4>[ 2189.389174]  ret_from_fork+0x35/0x40
<4>[ 2189.389176] Code: c7 c7 5c c3 7b bd 5d e9 2c 91 8d ff 48 89 ef e8 24 73 9f ff eb be 0f 0b 48 83 7b 70 00 74 8b 0f 0b 48 83 bb 60 02 00 00 00 74 89 <0f> 0b eb 85 48 89 de 5b 48 c7 c7 60 78 05 be 5d e9 31 74 92 ff 
<4>[ 2189.389222] ---[ end trace b06d93f176d2511a ]---
<6>[ 2189.389224] unix: Attempt to release alive unix socket: 000000003dc67149
<1>[ 2190.401779] BUG: unable to handle kernel paging request at ffffffffbd3e3680
<6>[ 2190.401785] PGD 6620c067 P4D 6620c067 PUD 6620d063 PMD 652000e1 
<4>[ 2190.401790] Oops: 0003 [#1] PREEMPT SMP PTI
<4>[ 2190.401792] Modules linked in: esp4 xfrm6_mode_tunnel xfrm4_mode_tunnel ipheth bnep rtsx_pci_sdmmc iwlmvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi iwlwifi snd_hda_intel rtsx_pci snd_hda_codec snd_hwdep snd_hda_core snd_pcm thinkpad_acpi efivarfs input_leds
<4>[ 2190.401806] CPU: 2 PID: 4119 Comm: evolution Tainted: G        W       T 4.17.0 #22
<4>[ 2190.401808] Hardware name: LENOVO 20CMCTO1WW/20CMCTO1WW, BIOS N10ET48W (1.27 ) 09/12/2017
<4>[ 2190.401813] RIP: 0010:queued_spin_lock_slowpath+0xe4/0x1a0
<4>[ 2190.401815] RSP: 0018:ffff93e642373c60 EFLAGS: 00010282
<4>[ 2190.401817] RAX: ffffffffbd3e3680 RBX: 0000000000000008 RCX: ffff8d014dd20880
<4>[ 2190.401818] RDX: 0000000000001084 RSI: 0000000042161000 RDI: ffff8d00bac82340
<4>[ 2190.401820] RBP: ffff8d00bac82340 R08: 00000000000c0000 R09: ffff8d0142385000
<4>[ 2190.401821] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d0142385000
Oops#1 Part1
<4>[ 2190.401823] R13: 0000000000000000 R14: ffff8d00bac82000 R15: ffff8d0144916800
<4>[ 2190.401825] FS:  0000746978d9bf80(0000) GS:ffff8d014dd00000(0000) knlGS:0000000000000000
<4>[ 2190.401827] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 2190.401829] CR2: ffffffffbd3e3680 CR3: 00000001844d6005 CR4: 00000000003606e0
<4>[ 2190.401830] Call Trace:
<4>[ 2190.401835]  unix_stream_sendmsg+0x1df/0x3d0
<4>[ 2190.401839]  sock_sendmsg+0x31/0x40
<4>[ 2190.401842]  sock_write_iter+0x88/0xf0
<4>[ 2190.401846]  do_iter_readv_writev+0x147/0x1a0
<4>[ 2190.401848]  do_iter_write+0x81/0x1a0
<4>[ 2190.401851]  vfs_writev+0xd1/0x160
<4>[ 2190.401854]  ? __fget+0x6f/0xb0
<4>[ 2190.401857]  ? do_writev+0x5c/0xf0
<4>[ 2190.401859]  do_writev+0x5c/0xf0
<4>[ 2190.401863]  do_syscall_64+0x72/0x1c0
<4>[ 2190.401866]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4>[ 2190.401868] RIP: 0033:0x746973a7c017
<4>[ 2190.401870] RSP: 002b:00007ffd99341b60 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
<4>[ 2190.401872] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000746973a7c017
<4>[ 2190.401873] RDX: 0000000000000003 RSI: 00007ffd99341d10 RDI: 0000000000000004
<4>[ 2190.401875] RBP: 00007ffd99341d10 R08: 0000000000000000 R09: 0000000000000000
<4>[ 2190.401876] R10: 00000000000000b0 R11: 0000000000000293 R12: 0000000000000003
<4>[ 2190.401878] R13: 00007ffd99341c94 R14: 000060ed17d42a98 R15: 00007ffd99341c98
<4>[ 2190.401880] Code: 89 c2 c1 e2 10 85 d2 0f 84 cc 00 00 00 c1 ea 12 83 e0 03 83 ea 01 48 c1 e0 04 48 63 d2 48 05 80 08 02 00 48 03 04 d5 80 54 06 be <48> 89 08 8b 41 08 85 c0 75 09 f3 90 8b 41 08 85 c0 74 f7 4c 8b 
<1>[ 2190.401909] RIP: queued_spin_lock_slowpath+0xe4/0x1a0 RSP: ffff93e642373c60
<4>[ 2190.401910] CR2: ffffffffbd3e3680
<4>[ 2190.401912] ---[ end trace b06d93f176d2511b ]---

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: AF_XDP. Was: [net-next 00/12][pull request] Intel Wired LAN Driver Updates 2018-06-04
From: Björn Töpel @ 2018-06-05  8:44 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Alexei Starovoitov, David Miller, Björn Töpel,
	Karlsson, Magnus, ast, Daniel Borkmann, Or Gerlitz, Jeff Kirsher,
	Netdev
In-Reply-To: <CAKgT0UeoB19-KKkXMLfprMHFA0B+_KRiBrCCFL7T=oEHjWKsTA@mail.gmail.com>

Den tis 5 juni 2018 kl 03:46 skrev Alexander Duyck <alexander.duyck@gmail.com>:
>
> On Mon, Jun 4, 2018 at 4:32 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Mon, Jun 04, 2018 at 03:02:31PM -0700, Alexander Duyck wrote:
> >> On Mon, Jun 4, 2018 at 2:27 PM, David Miller <davem@davemloft.net> wrote:
> >> > From: Or Gerlitz <gerlitz.or@gmail.com>
> >> > Date: Tue, 5 Jun 2018 00:11:35 +0300
> >> >
> >> >> Just to make sure, is the AF_XDP ZC (Zero Copy) UAPI going to be
> >> >> merged for this window -- AFAIU from [1], it's still under
> >> >> examination/development/research for non Intel HWs, am I correct or
> >> >> this is going to get in now?
> >> >
> >> > All of the pending AF_XDP changes will be merged this merge window.
> >> >
> >> > I think Intel folks need to review things as fast as possible because
> >> > I pretty much refuse to revert the series or disable it in Kconfig at
> >> > this point.
> >> >
> >> > Thank you.
> >>
> >> My understanding of things is that the current AF_XDP patches were
> >> going to be updated to have more of a model agnostic API such that
> >> they would work for either the "typewriter" mode or the descriptor
> >> ring based approach. The current plan was to have the zero copy
> >> patches be a follow-on after the vendor agnostic API bits in the
> >> descriptors and such had been sorted out. I believe you guys have the
> >> descriptor fixes already right?
> >>
> >> In my opinion the i40e code isn't mature enough yet to really go into
> >> anything other than maybe net-next in a couple weeks. We are going to
> >> need a while to get adequate testing in order to flush out all the
> >> bugs and performance regressions we are likely to see coming out of
> >> this change.
> >
> > I think the work everyone did in this release cycle increased my confidence
> > that the way descriptors are defined and the rest of uapi are stable enough
> > and i40e zero copy bits can land in the next release without uapi changes.
> > In that sense even if we merge i40e parts now, the other nic vendors
> > will be in the same situation and may find things that they would like
> > to improve in uapi.
> > So I propose we merge the first 7 patches of the last series now and
> > let 3 remaining i40e patches go via intel trees for the next release.
> > In the mean time other NIC vendors should start actively working
> > on AF_XDP support as well.
> > If somehow uapi would need tweaks, we can still do minor adjustments
> > since 4.18 won't be released for ~10 weeks.
> >
>
> That works for me. Actually I think patch 11 can probably be included
> as well since that is just sample code and could probably be used by
> whatever drivers end up implementing this.
>

The approach suggested by Alexei and Alex sounds good to us. Alex's
review items are very much valid, and require more time to address.
Therefore addressing i40e in the next merge windows sounds like a
great idea.

As Alex suggests, including patch 11 together with the first seven makes sense.

Thanks for spending time on our work. Very much appreciated!


Björn

> Thanks.
>
> - Alex

^ permalink raw reply

* [net-next PATCH v3 5/5] Documentation: Add explanation for XPS using Rx-queue(s) map
From: Amritha Nambiar @ 2018-06-05  8:38 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, edumazet, hannes, tom
In-Reply-To: <152818727065.20862.10108275498797168689.stgit@anamdev.jf.intel.com>

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 Documentation/ABI/testing/sysfs-class-net-queues |   11 ++++
 Documentation/networking/scaling.txt             |   58 ++++++++++++++++++----
 2 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-class-net-queues b/Documentation/ABI/testing/sysfs-class-net-queues
index 0c0df91..1b4cc21 100644
--- a/Documentation/ABI/testing/sysfs-class-net-queues
+++ b/Documentation/ABI/testing/sysfs-class-net-queues
@@ -42,6 +42,17 @@ Description:
 		network device transmit queue. Possible vaules depend on the
 		number of available CPU(s) in the system.
 
+What:		/sys/class/<iface>/queues/tx-<queue>/xps_rxqs
+Date:		June 2018
+KernelVersion:	4.17.0
+Contact:	netdev@vger.kernel.org
+Description:
+		Mask of the receive queue(s) currently enabled to participate
+		into the Transmit Packet Steering packet processing flow for this
+		network device transmit queue. Possible values depend on the
+		number of available receive queue(s) in the network device.
+		Default is disabled.
+
 What:		/sys/class/<iface>/queues/tx-<queue>/byte_queue_limits/hold_time
 Date:		November 2011
 KernelVersion:	3.3
diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt
index f55639d..834147c 100644
--- a/Documentation/networking/scaling.txt
+++ b/Documentation/networking/scaling.txt
@@ -366,8 +366,13 @@ XPS: Transmit Packet Steering
 
 Transmit Packet Steering is a mechanism for intelligently selecting
 which transmit queue to use when transmitting a packet on a multi-queue
-device. To accomplish this, a mapping from CPU to hardware queue(s) is
-recorded. The goal of this mapping is usually to assign queues
+device. This can be accomplished by recording two kinds of maps, either
+a mapping of CPU to hardware queue(s) or a mapping of receive queue(s)
+to hardware transmit queue(s).
+
+1. XPS using CPUs map
+
+The goal of this mapping is usually to assign queues
 exclusively to a subset of CPUs, where the transmit completions for
 these queues are processed on a CPU within this set. This choice
 provides two benefits. First, contention on the device queue lock is
@@ -377,12 +382,36 @@ transmit queue). Secondly, cache miss rate on transmit completion is
 reduced, in particular for data cache lines that hold the sk_buff
 structures.
 
-XPS is configured per transmit queue by setting a bitmap of CPUs that
-may use that queue to transmit. The reverse mapping, from CPUs to
-transmit queues, is computed and maintained for each network device.
-When transmitting the first packet in a flow, the function
-get_xps_queue() is called to select a queue. This function uses the ID
-of the running CPU as a key into the CPU-to-queue lookup table. If the
+2. XPS using receive queues map
+
+This mapping is used to pick transmit queue based on the receive
+queue(s) map configuration set by the administrator. A set of receive
+queues can be mapped to a set of transmit queues (many:many), although
+the common use case is a 1:1 mapping. This will enable sending packets
+on the same queue pair for transmit and receive. This is useful for
+busy polling multi-threaded workloads where there are challenges in
+associating a given CPU to a given application thread. The application
+threads are not pinned to CPUs and each thread handles packets
+received on a single queue. The receive queue number is cached in the
+socket for the connection and there is no need for adding flow entries
+as in the case of aRFS or flow director. In this model, sending the
+packets on the same transmit queue corresponding to the queue-pair
+associated with the receive queue has benefits in keeping the CPU overhead
+low. Transmit completion work is locked into the same queue pair that
+a given application is polling on. This avoids the overhead of triggering
+an interrupt on another CPU. When the application cleans up the packets
+during the busy poll, transmit completion may be processed along with it
+in the same thread context and so result in reduced latency.
+
+XPS is configured per transmit queue by setting a bitmap of
+CPUs/receive-queues that may use that queue to transmit. The reverse
+mapping, from CPUs to transmit queues or from receive-queues to transmit
+queues, is computed and maintained for each network device. When
+transmitting the first packet in a flow, the function get_xps_queue() is
+called to select a queue. This function uses the ID of the receive queue
+for the socket connection for a match in the receive queue-to-transmit queue
+lookup table. Alternatively, this function can also use the ID of the
+running CPU as a key into the CPU-to-queue lookup table. If the
 ID matches a single queue, that is used for transmission. If multiple
 queues match, one is selected by using the flow hash to compute an index
 into the set.
@@ -404,11 +433,15 @@ acknowledged.
 
 XPS is only available if the kconfig symbol CONFIG_XPS is enabled (on by
 default for SMP). The functionality remains disabled until explicitly
-configured. To enable XPS, the bitmap of CPUs that may use a transmit
-queue is configured using the sysfs file entry:
+configured. To enable XPS, the bitmap of CPUs/receive-queues that may
+use a transmit queue is configured using the sysfs file entry:
 
+For selection based on CPUs map:
 /sys/class/net/<dev>/queues/tx-<n>/xps_cpus
 
+For selection based on receive-queues map:
+/sys/class/net/<dev>/queues/tx-<n>/xps_rxqs
+
 == Suggested Configuration
 
 For a network device with a single transmission queue, XPS configuration
@@ -421,6 +454,11 @@ best CPUs to share a given queue are probably those that share the cache
 with the CPU that processes transmit completions for that queue
 (transmit interrupts).
 
+For transmit queue selection based on receive queue(s), XPS has to be
+explicitly configured mapping receive-queue(s) to transmit queue(s). If the
+user configuration for receive-queue map does not apply, then the transmit
+queue is selected based on the CPUs map.
+
 Per TX Queue rate limitation:
 =============================
 

^ permalink raw reply related

* [net-next PATCH v3 4/5] net-sysfs: Add interface for Rx queue(s) map per Tx queue
From: Amritha Nambiar @ 2018-06-05  8:38 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, edumazet, hannes, tom
In-Reply-To: <152818727065.20862.10108275498797168689.stgit@anamdev.jf.intel.com>

Extend transmit queue sysfs attribute to configure Rx queue(s) map
per Tx queue. By default no receive queues are configured for the
Tx queue.

- /sys/class/net/eth0/queues/tx-*/xps_rxqs

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 net/core/net-sysfs.c |   81 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index b39987c..2ed4317 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1283,6 +1283,86 @@ static ssize_t xps_cpus_store(struct netdev_queue *queue,
 
 static struct netdev_queue_attribute xps_cpus_attribute __ro_after_init
 	= __ATTR_RW(xps_cpus);
+
+static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
+{
+	struct net_device *dev = queue->dev;
+	struct xps_dev_maps *dev_maps;
+	unsigned long *mask, index;
+	int j, len, num_tc = 1, tc = 0;
+
+	mask = kcalloc(BITS_TO_LONGS(dev->num_rx_queues), sizeof(long),
+		       GFP_KERNEL);
+	if (!mask)
+		return -ENOMEM;
+
+	index = get_netdev_queue_index(queue);
+
+	if (dev->num_tc) {
+		num_tc = dev->num_tc;
+		tc = netdev_txq_to_tc(dev, index);
+		if (tc < 0)
+			return -EINVAL;
+	}
+
+	rcu_read_lock();
+	dev_maps = rcu_dereference(dev->xps_rxqs_map);
+	if (dev_maps) {
+		for (j = -1; j = attrmask_next(j, NULL, dev->num_rx_queues),
+		     j < dev->num_rx_queues;) {
+			int i, tci = j * num_tc + tc;
+			struct xps_map *map;
+
+			map = rcu_dereference(dev_maps->attr_map[tci]);
+			if (!map)
+				continue;
+
+			for (i = map->len; i--;) {
+				if (map->queues[i] == index) {
+					set_bit(j, mask);
+					break;
+				}
+			}
+		}
+	}
+
+	len = bitmap_print_to_pagebuf(false, buf, mask, dev->num_rx_queues);
+	rcu_read_unlock();
+	kfree(mask);
+
+	return len < PAGE_SIZE ? len : -EINVAL;
+}
+
+static ssize_t xps_rxqs_store(struct netdev_queue *queue, const char *buf,
+			      size_t len)
+{
+	struct net_device *dev = queue->dev;
+	unsigned long *mask, index;
+	int err;
+
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	mask = kcalloc(BITS_TO_LONGS(dev->num_rx_queues), sizeof(long),
+		       GFP_KERNEL);
+	if (!mask)
+		return -ENOMEM;
+
+	index = get_netdev_queue_index(queue);
+
+	err = bitmap_parse(buf, len, mask, dev->num_rx_queues);
+	if (err) {
+		kfree(mask);
+		return err;
+	}
+
+	err = __netif_set_xps_queue(dev, mask, index, XPS_MAP_RXQS);
+	kfree(mask);
+	return err ? : len;
+}
+
+static struct netdev_queue_attribute xps_rxqs_attribute __ro_after_init
+	= __ATTR_RW(xps_rxqs);
 #endif /* CONFIG_XPS */
 
 static struct attribute *netdev_queue_default_attrs[] __ro_after_init = {
@@ -1290,6 +1370,7 @@ static struct attribute *netdev_queue_default_attrs[] __ro_after_init = {
 	&queue_traffic_class.attr,
 #ifdef CONFIG_XPS
 	&xps_cpus_attribute.attr,
+	&xps_rxqs_attribute.attr,
 	&queue_tx_maxrate.attr,
 #endif
 	NULL

^ permalink raw reply related

* [net-next PATCH v3 3/5] net: Enable Tx queue selection based on Rx queues
From: Amritha Nambiar @ 2018-06-05  8:38 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, edumazet, hannes, tom
In-Reply-To: <152818727065.20862.10108275498797168689.stgit@anamdev.jf.intel.com>

This patch adds support to pick Tx queue based on the Rx queue(s) map
configuration set by the admin through the sysfs attribute
for each Tx queue. If the user configuration for receive queue(s) map
does not apply, then the Tx queue selection falls back to CPU(s) map
based selection and finally to hashing.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 include/net/busy_poll.h |    3 ++
 include/net/sock.h      |   14 +++++++++++
 net/core/dev.c          |   60 ++++++++++++++++++++++++++++++++---------------
 net/core/sock.c         |    4 +++
 net/ipv4/tcp_input.c    |    3 ++
 5 files changed, 65 insertions(+), 19 deletions(-)

diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 71c72a9..fc4fb68 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -136,6 +136,9 @@ static inline void sk_mark_napi_id(struct sock *sk, const struct sk_buff *skb)
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	sk->sk_napi_id = skb->napi_id;
 #endif
+#ifdef CONFIG_XPS
+	sk->sk_rx_queue_mapping = skb_get_rx_queue(skb);
+#endif
 }
 
 /* variant used for unconnected sockets */
diff --git a/include/net/sock.h b/include/net/sock.h
index 4f7c584..12313653 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -139,6 +139,7 @@ typedef __u64 __bitwise __addrpair;
  *	@skc_node: main hash linkage for various protocol lookup tables
  *	@skc_nulls_node: main hash linkage for TCP/UDP/UDP-Lite protocol
  *	@skc_tx_queue_mapping: tx queue number for this connection
+ *	@skc_rx_queue_mapping: rx queue number for this connection
  *	@skc_flags: place holder for sk_flags
  *		%SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE,
  *		%SO_OOBINLINE settings, %SO_TIMESTAMPING settings
@@ -215,6 +216,9 @@ struct sock_common {
 		struct hlist_nulls_node skc_nulls_node;
 	};
 	int			skc_tx_queue_mapping;
+#ifdef CONFIG_XPS
+	int			skc_rx_queue_mapping;
+#endif
 	union {
 		int		skc_incoming_cpu;
 		u32		skc_rcv_wnd;
@@ -326,6 +330,9 @@ struct sock {
 #define sk_nulls_node		__sk_common.skc_nulls_node
 #define sk_refcnt		__sk_common.skc_refcnt
 #define sk_tx_queue_mapping	__sk_common.skc_tx_queue_mapping
+#ifdef CONFIG_XPS
+#define sk_rx_queue_mapping	__sk_common.skc_rx_queue_mapping
+#endif
 
 #define sk_dontcopy_begin	__sk_common.skc_dontcopy_begin
 #define sk_dontcopy_end		__sk_common.skc_dontcopy_end
@@ -1696,6 +1703,13 @@ static inline int sk_tx_queue_get(const struct sock *sk)
 	return sk ? sk->sk_tx_queue_mapping : -1;
 }
 
+static inline void sk_mark_rx_queue(struct sock *sk, struct sk_buff *skb)
+{
+#ifdef CONFIG_XPS
+	sk->sk_rx_queue_mapping = skb_get_rx_queue(skb);
+#endif
+}
+
 static inline void sk_set_socket(struct sock *sk, struct socket *sock)
 {
 	sk_tx_queue_clear(sk);
diff --git a/net/core/dev.c b/net/core/dev.c
index bba755f..1880e6c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3479,36 +3479,58 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 }
 #endif /* CONFIG_NET_EGRESS */
 
-static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
+#ifdef CONFIG_XPS
+static int __get_xps_queue_idx(struct net_device *dev, struct sk_buff *skb,
+			       struct xps_dev_maps *dev_maps, unsigned int tci)
+{
+	struct xps_map *map;
+	int queue_index = -1;
+
+	if (dev->num_tc) {
+		tci *= dev->num_tc;
+		tci += netdev_get_prio_tc_map(dev, skb->priority);
+	}
+
+	map = rcu_dereference(dev_maps->attr_map[tci]);
+	if (map) {
+		if (map->len == 1)
+			queue_index = map->queues[0];
+		else
+			queue_index = map->queues[reciprocal_scale(
+						skb_get_hash(skb), map->len)];
+		if (unlikely(queue_index >= dev->real_num_tx_queues))
+			queue_index = -1;
+	}
+	return queue_index;
+}
+#endif
+
+static int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 {
 #ifdef CONFIG_XPS
 	struct xps_dev_maps *dev_maps;
-	struct xps_map *map;
+	struct sock *sk = skb->sk;
 	int queue_index = -1;
+	unsigned int tci = 0;
 
 	if (!static_key_false(&xps_needed))
 		return -1;
 
+	if (sk && sk->sk_rx_queue_mapping <= dev->num_rx_queues)
+		tci = sk->sk_rx_queue_mapping;
+
 	rcu_read_lock();
-	dev_maps = rcu_dereference(dev->xps_cpus_map);
-	if (dev_maps) {
-		unsigned int tci = skb->sender_cpu - 1;
+	dev_maps = rcu_dereference(dev->xps_rxqs_map);
+	if (dev_maps)
+		queue_index = __get_xps_queue_idx(dev, skb, dev_maps, tci);
 
-		if (dev->num_tc) {
-			tci *= dev->num_tc;
-			tci += netdev_get_prio_tc_map(dev, skb->priority);
-		}
 
-		map = rcu_dereference(dev_maps->attr_map[tci]);
-		if (map) {
-			if (map->len == 1)
-				queue_index = map->queues[0];
-			else
-				queue_index = map->queues[reciprocal_scale(skb_get_hash(skb),
-									   map->len)];
-			if (unlikely(queue_index >= dev->real_num_tx_queues))
-				queue_index = -1;
-		}
+	if (queue_index < 0) {
+		tci = skb->sender_cpu - 1;
+		dev_maps = rcu_dereference(dev->xps_cpus_map);
+		if (dev_maps)
+			queue_index = __get_xps_queue_idx(dev, skb, dev_maps,
+							  tci);
 	}
 	rcu_read_unlock();
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 435a0ba..3c10d31 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2824,6 +2824,10 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 	sk->sk_pacing_rate = ~0U;
 	sk->sk_pacing_shift = 10;
 	sk->sk_incoming_cpu = -1;
+
+#ifdef CONFIG_XPS
+	sk->sk_rx_queue_mapping = -1;
+#endif
 	/*
 	 * Before updating sk_refcnt, we must commit prior changes to memory
 	 * (Documentation/RCU/rculist_nulls.txt for details)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d5ffb57..cc69f75 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -78,6 +78,7 @@
 #include <linux/errqueue.h>
 #include <trace/events/tcp.h>
 #include <linux/static_key.h>
+#include <net/busy_poll.h>
 
 int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
 
@@ -5574,6 +5575,7 @@ void tcp_finish_connect(struct sock *sk, struct sk_buff *skb)
 	if (skb) {
 		icsk->icsk_af_ops->sk_rx_dst_set(sk, skb);
 		security_inet_conn_established(sk, skb);
+		sk_mark_napi_id(sk, skb);
 	}
 
 	tcp_init_transfer(sk, BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB);
@@ -6402,6 +6404,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 	tcp_rsk(req)->snt_isn = isn;
 	tcp_rsk(req)->txhash = net_tx_rndhash();
 	tcp_openreq_init_rwin(req, sk, dst);
+	sk_mark_rx_queue(req_to_sk(req), skb);
 	if (!want_cookie) {
 		tcp_reqsk_record_syn(sk, req, skb);
 		fastopen_sk = tcp_try_fastopen(sk, skb, req, &foc, dst);

^ permalink raw reply related

* [net-next PATCH v3 2/5] net: Use static_key for XPS maps
From: Amritha Nambiar @ 2018-06-05  8:37 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, edumazet, hannes, tom
In-Reply-To: <152818727065.20862.10108275498797168689.stgit@anamdev.jf.intel.com>

Use static_key for XPS maps to reduce the cost of extra map checks,
similar to how it is used for RPS and RFS.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 net/core/dev.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 156acbe..bba755f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2081,6 +2081,8 @@ int netdev_txq_to_tc(struct net_device *dev, unsigned int txq)
 EXPORT_SYMBOL(netdev_txq_to_tc);
 
 #ifdef CONFIG_XPS
+struct static_key xps_needed __read_mostly;
+EXPORT_SYMBOL(xps_needed);
 static DEFINE_MUTEX(xps_map_mutex);
 #define xmap_dereference(P)		\
 	rcu_dereference_protected((P), lockdep_is_held(&xps_map_mutex))
@@ -2189,6 +2191,7 @@ static void netif_reset_xps_queues(struct net_device *dev, u16 offset,
 out_no_maps:
 		type++;
 	}
+	static_key_slow_dec(&xps_needed);
 	mutex_unlock(&xps_map_mutex);
 }
 
@@ -2309,6 +2312,8 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 	if (!new_dev_maps)
 		goto out_no_new_maps;
 
+	static_key_slow_inc(&xps_needed);
+
 	for (j = -1; j = attrmask_next(j, possible_mask, nr_ids),
 	     j < nr_ids;) {
 		/* copy maps belonging to foreign traffic classes */
@@ -3481,6 +3486,9 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 	struct xps_map *map;
 	int queue_index = -1;
 
+	if (!static_key_false(&xps_needed))
+		return -1;
+
 	rcu_read_lock();
 	dev_maps = rcu_dereference(dev->xps_cpus_map);
 	if (dev_maps) {

^ permalink raw reply related

* [net-next PATCH v3 1/5] net: Refactor XPS for CPUs and Rx queues
From: Amritha Nambiar @ 2018-06-05  8:37 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, edumazet, hannes, tom
In-Reply-To: <152818727065.20862.10108275498797168689.stgit@anamdev.jf.intel.com>

Refactor XPS code to support Tx queue selection based on
CPU(s) map or Rx queue(s) map.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 include/linux/cpumask.h   |   11 ++
 include/linux/netdevice.h |   73 ++++++++++++++
 net/core/dev.c            |  228 ++++++++++++++++++++++++++++++---------------
 net/core/net-sysfs.c      |    4 -
 4 files changed, 234 insertions(+), 82 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index bf53d89..57f20a0 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -115,12 +115,17 @@ extern struct cpumask __cpu_active_mask;
 #define cpu_active(cpu)		((cpu) == 0)
 #endif
 
-/* verify cpu argument to cpumask_* operators */
-static inline unsigned int cpumask_check(unsigned int cpu)
+static inline void cpu_max_bits_warn(unsigned int cpu, unsigned int bits)
 {
 #ifdef CONFIG_DEBUG_PER_CPU_MAPS
-	WARN_ON_ONCE(cpu >= nr_cpumask_bits);
+	WARN_ON_ONCE(cpu >= bits);
 #endif /* CONFIG_DEBUG_PER_CPU_MAPS */
+}
+
+/* verify cpu argument to cpumask_* operators */
+static inline unsigned int cpumask_check(unsigned int cpu)
+{
+	cpu_max_bits_warn(cpu, nr_cpumask_bits);
 	return cpu;
 }
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6b863ed..354c866 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -730,10 +730,21 @@ struct xps_map {
  */
 struct xps_dev_maps {
 	struct rcu_head rcu;
-	struct xps_map __rcu *cpu_map[0];
+	struct xps_map __rcu *attr_map[0];
 };
-#define XPS_DEV_MAPS_SIZE(_tcs) (sizeof(struct xps_dev_maps) +		\
+
+#define XPS_CPU_DEV_MAPS_SIZE(_tcs) (sizeof(struct xps_dev_maps) +	\
 	(nr_cpu_ids * (_tcs) * sizeof(struct xps_map *)))
+
+#define XPS_RXQ_DEV_MAPS_SIZE(_tcs, _rxqs) (sizeof(struct xps_dev_maps) +\
+	(_rxqs * (_tcs) * sizeof(struct xps_map *)))
+
+enum xps_map_type {
+	XPS_MAP_RXQS,
+	XPS_MAP_CPUS,
+	__XPS_MAP_MAX
+};
+
 #endif /* CONFIG_XPS */
 
 #define TC_MAX_QUEUE	16
@@ -1902,7 +1913,8 @@ struct net_device {
 	int			watchdog_timeo;
 
 #ifdef CONFIG_XPS
-	struct xps_dev_maps __rcu *xps_maps;
+	struct xps_dev_maps __rcu *xps_cpus_map;
+	struct xps_dev_maps __rcu *xps_rxqs_map;
 #endif
 #ifdef CONFIG_NET_CLS_ACT
 	struct mini_Qdisc __rcu	*miniq_egress;
@@ -3251,6 +3263,61 @@ static inline void netif_wake_subqueue(struct net_device *dev, u16 queue_index)
 #ifdef CONFIG_XPS
 int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 			u16 index);
+int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
+			  u16 index, enum xps_map_type type);
+
+static inline bool attr_test_mask(unsigned long j, const unsigned long *mask,
+				  unsigned int nr_bits)
+{
+	cpu_max_bits_warn(j, nr_bits);
+	return test_bit(j, mask);
+}
+
+static inline bool attr_test_online(unsigned long j,
+				    const unsigned long *online_mask,
+				    unsigned int nr_bits)
+{
+	cpu_max_bits_warn(j, nr_bits);
+
+	if (online_mask)
+		return test_bit(j, online_mask);
+
+	if (j >= 0 && j < nr_bits)
+		return true;
+
+	return false;
+}
+
+static inline unsigned int attrmask_next(int n, const unsigned long *srcp,
+					 unsigned int nr_bits)
+{
+	/* -1 is a legal arg here. */
+	if (n != -1)
+		cpu_max_bits_warn(n, nr_bits);
+
+	if (srcp)
+		return find_next_bit(srcp, nr_bits, n + 1);
+
+	return n + 1;
+}
+
+static inline int attrmask_next_and(int n, const unsigned long *src1p,
+				    const unsigned long *src2p,
+				    unsigned int nr_bits)
+{
+	/* -1 is a legal arg here. */
+	if (n != -1)
+		cpu_max_bits_warn(n, nr_bits);
+
+	if (src1p && src2p)
+		return find_next_and_bit(src1p, src2p, nr_bits, n + 1);
+	else if (src1p)
+		return find_next_bit(src1p, nr_bits, n + 1);
+	else if (src2p)
+		return find_next_bit(src2p, nr_bits, n + 1);
+
+	return n + 1;
+}
 #else
 static inline int netif_set_xps_queue(struct net_device *dev,
 				      const struct cpumask *mask,
diff --git a/net/core/dev.c b/net/core/dev.c
index 1844d9b..156acbe 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2092,7 +2092,7 @@ static bool remove_xps_queue(struct xps_dev_maps *dev_maps,
 	int pos;
 
 	if (dev_maps)
-		map = xmap_dereference(dev_maps->cpu_map[tci]);
+		map = xmap_dereference(dev_maps->attr_map[tci]);
 	if (!map)
 		return false;
 
@@ -2105,7 +2105,7 @@ static bool remove_xps_queue(struct xps_dev_maps *dev_maps,
 			break;
 		}
 
-		RCU_INIT_POINTER(dev_maps->cpu_map[tci], NULL);
+		RCU_INIT_POINTER(dev_maps->attr_map[tci], NULL);
 		kfree_rcu(map, rcu);
 		return false;
 	}
@@ -2138,30 +2138,57 @@ static bool remove_xps_queue_cpu(struct net_device *dev,
 static void netif_reset_xps_queues(struct net_device *dev, u16 offset,
 				   u16 count)
 {
+	const unsigned long *possible_mask = NULL;
+	enum xps_map_type type = XPS_MAP_RXQS;
 	struct xps_dev_maps *dev_maps;
-	int cpu, i;
 	bool active = false;
+	unsigned int nr_ids;
+	int i, j;
 
 	mutex_lock(&xps_map_mutex);
-	dev_maps = xmap_dereference(dev->xps_maps);
-
-	if (!dev_maps)
-		goto out_no_maps;
-
-	for_each_possible_cpu(cpu)
-		active |= remove_xps_queue_cpu(dev, dev_maps, cpu,
-					       offset, count);
-
-	if (!active) {
-		RCU_INIT_POINTER(dev->xps_maps, NULL);
-		kfree_rcu(dev_maps, rcu);
-	}
+	while (type < __XPS_MAP_MAX) {
+		switch (type) {
+		case XPS_MAP_RXQS:
+			dev_maps = xmap_dereference(dev->xps_rxqs_map);
+			if (!dev_maps)
+				goto out_no_maps;
+			nr_ids = dev->num_rx_queues;
+			break;
+		case XPS_MAP_CPUS:
+			dev_maps = xmap_dereference(dev->xps_cpus_map);
+			if (!dev_maps)
+				goto out_no_maps;
+
+			if (num_possible_cpus() > 1)
+				possible_mask = cpumask_bits(cpu_possible_mask);
+			nr_ids = nr_cpu_ids;
+			break;
+		default:
+			goto out_no_maps;
+		}
 
-	for (i = offset + (count - 1); count--; i--)
-		netdev_queue_numa_node_write(netdev_get_tx_queue(dev, i),
-					     NUMA_NO_NODE);
+		for (j = -1; j = attrmask_next(j, possible_mask, nr_ids),
+		     j < nr_ids;)
+			active |= remove_xps_queue_cpu(dev, dev_maps, j, offset,
+						       count);
+		if (!active) {
+			if (type == XPS_MAP_RXQS)
+				RCU_INIT_POINTER(dev->xps_rxqs_map, NULL);
+
+			if (type == XPS_MAP_CPUS) {
+				RCU_INIT_POINTER(dev->xps_cpus_map, NULL);
+
+				for (i = offset + (count - 1); count--; i--)
+					netdev_queue_numa_node_write(
+						netdev_get_tx_queue(dev, i),
+								NUMA_NO_NODE);
+			}
+			kfree_rcu(dev_maps, rcu);
+		}
 
 out_no_maps:
+		type++;
+	}
 	mutex_unlock(&xps_map_mutex);
 }
 
@@ -2170,8 +2197,8 @@ static void netif_reset_xps_queues_gt(struct net_device *dev, u16 index)
 	netif_reset_xps_queues(dev, index, dev->num_tx_queues - index);
 }
 
-static struct xps_map *expand_xps_map(struct xps_map *map,
-				      int cpu, u16 index)
+static struct xps_map *expand_xps_map(struct xps_map *map, int attr_index,
+				      u16 index, enum xps_map_type type)
 {
 	struct xps_map *new_map;
 	int alloc_len = XPS_MIN_MAP_ALLOC;
@@ -2183,7 +2210,7 @@ static struct xps_map *expand_xps_map(struct xps_map *map,
 		return map;
 	}
 
-	/* Need to add queue to this CPU's existing map */
+	/* Need to add tx-queue to this CPU's/rx-queue's existing map */
 	if (map) {
 		if (pos < map->alloc_len)
 			return map;
@@ -2191,9 +2218,20 @@ static struct xps_map *expand_xps_map(struct xps_map *map,
 		alloc_len = map->alloc_len * 2;
 	}
 
-	/* Need to allocate new map to store queue on this CPU's map */
-	new_map = kzalloc_node(XPS_MAP_SIZE(alloc_len), GFP_KERNEL,
-			       cpu_to_node(cpu));
+	/* Need to allocate new map to store tx-queue on this CPU's/rx-queue's
+	 *  map
+	 */
+	switch (type) {
+	case XPS_MAP_RXQS:
+		new_map = kzalloc(XPS_MAP_SIZE(alloc_len), GFP_KERNEL);
+		break;
+	case XPS_MAP_CPUS:
+		new_map = kzalloc_node(XPS_MAP_SIZE(alloc_len), GFP_KERNEL,
+				       cpu_to_node(attr_index));
+		break;
+	default:
+		return NULL;
+	}
 	if (!new_map)
 		return NULL;
 
@@ -2205,14 +2243,16 @@ static struct xps_map *expand_xps_map(struct xps_map *map,
 	return new_map;
 }
 
-int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
-			u16 index)
+int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
+			  u16 index, enum xps_map_type type)
 {
+	const unsigned long *online_mask = NULL, *possible_mask = NULL;
 	struct xps_dev_maps *dev_maps, *new_dev_maps = NULL;
-	int i, cpu, tci, numa_node_id = -2;
+	int i, j, tci, numa_node_id = -2;
 	int maps_sz, num_tc = 1, tc = 0;
 	struct xps_map *map, *new_map;
 	bool active = false;
+	unsigned int nr_ids;
 
 	if (dev->num_tc) {
 		num_tc = dev->num_tc;
@@ -2221,16 +2261,33 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 			return -EINVAL;
 	}
 
-	maps_sz = XPS_DEV_MAPS_SIZE(num_tc);
-	if (maps_sz < L1_CACHE_BYTES)
-		maps_sz = L1_CACHE_BYTES;
-
 	mutex_lock(&xps_map_mutex);
+	switch (type) {
+	case XPS_MAP_RXQS:
+		maps_sz = XPS_RXQ_DEV_MAPS_SIZE(num_tc, dev->num_rx_queues);
+		dev_maps = xmap_dereference(dev->xps_rxqs_map);
+		nr_ids = dev->num_rx_queues;
+		break;
+	case XPS_MAP_CPUS:
+		maps_sz = XPS_CPU_DEV_MAPS_SIZE(num_tc);
+		if (num_possible_cpus() > 1) {
+			online_mask = cpumask_bits(cpu_online_mask);
+			possible_mask = cpumask_bits(cpu_possible_mask);
+		}
+		dev_maps = xmap_dereference(dev->xps_cpus_map);
+		nr_ids = nr_cpu_ids;
+		break;
+	default:
+		mutex_unlock(&xps_map_mutex);
+		return -EINVAL;
+	}
 
-	dev_maps = xmap_dereference(dev->xps_maps);
+	if (maps_sz < L1_CACHE_BYTES)
+		maps_sz = L1_CACHE_BYTES;
 
 	/* allocate memory for queue storage */
-	for_each_cpu_and(cpu, cpu_online_mask, mask) {
+	for (j = -1; j = attrmask_next_and(j, online_mask, mask, nr_ids),
+	     j < nr_ids;) {
 		if (!new_dev_maps)
 			new_dev_maps = kzalloc(maps_sz, GFP_KERNEL);
 		if (!new_dev_maps) {
@@ -2238,73 +2295,81 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 			return -ENOMEM;
 		}
 
-		tci = cpu * num_tc + tc;
-		map = dev_maps ? xmap_dereference(dev_maps->cpu_map[tci]) :
+		tci = j * num_tc + tc;
+		map = dev_maps ? xmap_dereference(dev_maps->attr_map[tci]) :
 				 NULL;
 
-		map = expand_xps_map(map, cpu, index);
+		map = expand_xps_map(map, j, index, type);
 		if (!map)
 			goto error;
 
-		RCU_INIT_POINTER(new_dev_maps->cpu_map[tci], map);
+		RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
 	}
 
 	if (!new_dev_maps)
 		goto out_no_new_maps;
 
-	for_each_possible_cpu(cpu) {
+	for (j = -1; j = attrmask_next(j, possible_mask, nr_ids),
+	     j < nr_ids;) {
 		/* copy maps belonging to foreign traffic classes */
-		for (i = tc, tci = cpu * num_tc; dev_maps && i--; tci++) {
+		for (i = tc, tci = j * num_tc; dev_maps && i--; tci++) {
 			/* fill in the new device map from the old device map */
-			map = xmap_dereference(dev_maps->cpu_map[tci]);
-			RCU_INIT_POINTER(new_dev_maps->cpu_map[tci], map);
+			map = xmap_dereference(dev_maps->attr_map[tci]);
+			RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
 		}
 
 		/* We need to explicitly update tci as prevous loop
 		 * could break out early if dev_maps is NULL.
 		 */
-		tci = cpu * num_tc + tc;
+		tci = j * num_tc + tc;
 
-		if (cpumask_test_cpu(cpu, mask) && cpu_online(cpu)) {
-			/* add queue to CPU maps */
+		if (attr_test_mask(j, mask, nr_ids) &&
+		    attr_test_online(j, online_mask, nr_ids)) {
+			/* add tx-queue to CPU/rx-queue maps */
 			int pos = 0;
 
-			map = xmap_dereference(new_dev_maps->cpu_map[tci]);
+			map = xmap_dereference(new_dev_maps->attr_map[tci]);
 			while ((pos < map->len) && (map->queues[pos] != index))
 				pos++;
 
 			if (pos == map->len)
 				map->queues[map->len++] = index;
 #ifdef CONFIG_NUMA
-			if (numa_node_id == -2)
-				numa_node_id = cpu_to_node(cpu);
-			else if (numa_node_id != cpu_to_node(cpu))
-				numa_node_id = -1;
+			if (type == XPS_MAP_CPUS) {
+				if (numa_node_id == -2)
+					numa_node_id = cpu_to_node(j);
+				else if (numa_node_id != cpu_to_node(j))
+					numa_node_id = -1;
+			}
 #endif
 		} else if (dev_maps) {
 			/* fill in the new device map from the old device map */
-			map = xmap_dereference(dev_maps->cpu_map[tci]);
-			RCU_INIT_POINTER(new_dev_maps->cpu_map[tci], map);
+			map = xmap_dereference(dev_maps->attr_map[tci]);
+			RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
 		}
 
 		/* copy maps belonging to foreign traffic classes */
 		for (i = num_tc - tc, tci++; dev_maps && --i; tci++) {
 			/* fill in the new device map from the old device map */
-			map = xmap_dereference(dev_maps->cpu_map[tci]);
-			RCU_INIT_POINTER(new_dev_maps->cpu_map[tci], map);
+			map = xmap_dereference(dev_maps->attr_map[tci]);
+			RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
 		}
 	}
 
-	rcu_assign_pointer(dev->xps_maps, new_dev_maps);
+	if (type == XPS_MAP_RXQS)
+		rcu_assign_pointer(dev->xps_rxqs_map, new_dev_maps);
+	else if (type == XPS_MAP_CPUS)
+		rcu_assign_pointer(dev->xps_cpus_map, new_dev_maps);
 
 	/* Cleanup old maps */
 	if (!dev_maps)
 		goto out_no_old_maps;
 
-	for_each_possible_cpu(cpu) {
-		for (i = num_tc, tci = cpu * num_tc; i--; tci++) {
-			new_map = xmap_dereference(new_dev_maps->cpu_map[tci]);
-			map = xmap_dereference(dev_maps->cpu_map[tci]);
+	for (j = -1; j = attrmask_next(j, possible_mask, nr_ids),
+	     j < nr_ids;) {
+		for (i = num_tc, tci = j * num_tc; i--; tci++) {
+			new_map = xmap_dereference(new_dev_maps->attr_map[tci]);
+			map = xmap_dereference(dev_maps->attr_map[tci]);
 			if (map && map != new_map)
 				kfree_rcu(map, rcu);
 		}
@@ -2317,19 +2382,23 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 	active = true;
 
 out_no_new_maps:
-	/* update Tx queue numa node */
-	netdev_queue_numa_node_write(netdev_get_tx_queue(dev, index),
-				     (numa_node_id >= 0) ? numa_node_id :
-				     NUMA_NO_NODE);
+	if (type == XPS_MAP_CPUS) {
+		/* update Tx queue numa node */
+		netdev_queue_numa_node_write(netdev_get_tx_queue(dev, index),
+					     (numa_node_id >= 0) ?
+					     numa_node_id : NUMA_NO_NODE);
+	}
 
 	if (!dev_maps)
 		goto out_no_maps;
 
-	/* removes queue from unused CPUs */
-	for_each_possible_cpu(cpu) {
-		for (i = tc, tci = cpu * num_tc; i--; tci++)
+	/* removes tx-queue from unused CPUs/rx-queues */
+	for (j = -1; j = attrmask_next(j, possible_mask, nr_ids),
+	     j < nr_ids;) {
+		for (i = tc, tci = j * num_tc; i--; tci++)
 			active |= remove_xps_queue(dev_maps, tci, index);
-		if (!cpumask_test_cpu(cpu, mask) || !cpu_online(cpu))
+		if (!attr_test_mask(j, mask, nr_ids) ||
+		    !attr_test_online(j, online_mask, nr_ids))
 			active |= remove_xps_queue(dev_maps, tci, index);
 		for (i = num_tc - tc, tci++; --i; tci++)
 			active |= remove_xps_queue(dev_maps, tci, index);
@@ -2337,7 +2406,10 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 
 	/* free map if not active */
 	if (!active) {
-		RCU_INIT_POINTER(dev->xps_maps, NULL);
+		if (type == XPS_MAP_RXQS)
+			RCU_INIT_POINTER(dev->xps_rxqs_map, NULL);
+		else if (type == XPS_MAP_CPUS)
+			RCU_INIT_POINTER(dev->xps_cpus_map, NULL);
 		kfree_rcu(dev_maps, rcu);
 	}
 
@@ -2347,11 +2419,12 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 	return 0;
 error:
 	/* remove any maps that we added */
-	for_each_possible_cpu(cpu) {
-		for (i = num_tc, tci = cpu * num_tc; i--; tci++) {
-			new_map = xmap_dereference(new_dev_maps->cpu_map[tci]);
+	for (j = -1; j = attrmask_next(j, possible_mask, nr_ids),
+	     j < nr_ids;) {
+		for (i = num_tc, tci = j * num_tc; i--; tci++) {
+			new_map = xmap_dereference(new_dev_maps->attr_map[tci]);
 			map = dev_maps ?
-			      xmap_dereference(dev_maps->cpu_map[tci]) :
+			      xmap_dereference(dev_maps->attr_map[tci]) :
 			      NULL;
 			if (new_map && new_map != map)
 				kfree(new_map);
@@ -2363,6 +2436,13 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 	kfree(new_dev_maps);
 	return -ENOMEM;
 }
+
+int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
+			u16 index)
+{
+	return __netif_set_xps_queue(dev, cpumask_bits(mask), index,
+				     XPS_MAP_CPUS);
+}
 EXPORT_SYMBOL(netif_set_xps_queue);
 
 #endif
@@ -3402,7 +3482,7 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 	int queue_index = -1;
 
 	rcu_read_lock();
-	dev_maps = rcu_dereference(dev->xps_maps);
+	dev_maps = rcu_dereference(dev->xps_cpus_map);
 	if (dev_maps) {
 		unsigned int tci = skb->sender_cpu - 1;
 
@@ -3411,7 +3491,7 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 			tci += netdev_get_prio_tc_map(dev, skb->priority);
 		}
 
-		map = rcu_dereference(dev_maps->cpu_map[tci]);
+		map = rcu_dereference(dev_maps->attr_map[tci]);
 		if (map) {
 			if (map->len == 1)
 				queue_index = map->queues[0];
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index bb7e80f..b39987c 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1227,13 +1227,13 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 		return -ENOMEM;
 
 	rcu_read_lock();
-	dev_maps = rcu_dereference(dev->xps_maps);
+	dev_maps = rcu_dereference(dev->xps_cpus_map);
 	if (dev_maps) {
 		for_each_possible_cpu(cpu) {
 			int i, tci = cpu * num_tc + tc;
 			struct xps_map *map;
 
-			map = rcu_dereference(dev_maps->cpu_map[tci]);
+			map = rcu_dereference(dev_maps->attr_map[tci]);
 			if (!map)
 				continue;
 

^ permalink raw reply related

* [net-next PATCH v3 0/5] Symmetric queue selection using XPS for Rx queues
From: Amritha Nambiar @ 2018-06-05  8:37 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, edumazet, hannes, tom

This patch series implements support for Tx queue selection based on
Rx queue(s) map. This is done by configuring Rx queue(s) map per Tx-queue
using sysfs attribute. If the user configuration for Rx queues does
not apply, then the Tx queue selection falls back to XPS using CPUs and
finally to hashing.

XPS is refactored to support Tx queue selection based on either the
CPUs map or the Rx-queues map. The config option CONFIG_XPS needs to be
enabled. By default no receive queues are configured for the Tx queue.

- /sys/class/net/<dev>/queues/tx-*/xps_rxqs

A set of receive queues can be mapped to a set of transmit queues (many:many),
although the common use case is a 1:1 mapping. This will enable sending
packets on the same Tx-Rx queue pair as this is useful for busy polling
multi-threaded workloads where it is not possible to pin the threads to
a CPU. This is a rework of Sridhar's patch for symmetric queueing via
socket option:
https://www.spinics.net/lists/netdev/msg453106.html

Testing Hints:
Kernel:  Linux 4.17.0-rc7+
Interface: 
driver: ixgbe
version: 5.1.0-k
firmware-version: 0x00015e0b

Configuration:
ethtool -L $iface combined 16
ethtool -C $iface rx-usecs 1000
sysctl net.core.busy_poll=1000
ATR disabled:
ethtool -K $iface ntuple on

Workload: 
Modified memcached that changes the thread selection policy to be based
on the incoming rx-queue of a connection using SO_INCOMING_NAPI_ID socket
option. The default is round-robin.

Default: No rxqs_map configured
Symmetric queues: Enable rxqs_map for all queues 1:1 mapped to Tx queue

System:
Architecture:          x86_64
CPU(s):                72
Model name:            Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz

16 threads  400K requests/sec
=============================
-------------------------------------------------------------------------------
                                Default                 Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                 4/51/2215               2/30/5163
(usec)


intr/sec                        26655                   18606

contextswitch/sec               5145                    4044

insn per cycle                  0.43                    0.72

cache-misses                    6.919                   4.310
(% of all cache refs)

L1-dcache-load-                 4.49                    3.29
-misses
(% of all L1-dcache hits)

LLC-load-misses                 13.26                   8.96
(% of all LL-cache hits)

-------------------------------------------------------------------------------

32 threads  400K requests/sec
=============================
-------------------------------------------------------------------------------
                                Default                 Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                 10/112/5562             9/46/4637
(usec)


intr/sec                        30456                   27666

contextswitch/sec               7552                    5133

insn per cycle                  0.41                    0.49

cache-misses                    9.357                   2.769
(% of all cache refs)

L1-dcache-load-                 4.09                    3.98
-misses
(% of all L1-dcache hits)

LLC-load-misses                 12.96                   3.96
(% of all LL-cache hits)

-------------------------------------------------------------------------------

16 threads  800K requests/sec
=============================
-------------------------------------------------------------------------------
                                Default                 Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                  5/151/4989             9/69/2611
(usec)


intr/sec                        35686                   22907

contextswitch/sec               25522                   12281

insn per cycle                  0.67                    0.74

cache-misses                    8.652                   6.38
(% of all cache refs)

L1-dcache-load-                 3.19                    2.86
-misses
(% of all L1-dcache hits)

LLC-load-misses                 16.53                   11.99
(% of all LL-cache hits)

-------------------------------------------------------------------------------
32 threads  800K requests/sec
=============================
-------------------------------------------------------------------------------
                                Default                 Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                  6/163/6152             8/88/4209
(usec)


intr/sec                        47079                   26548

contextswitch/sec               42190                   39168

insn per cycle                  0.45                    0.54

cache-misses                    8.798                   4.668
(% of all cache refs)

L1-dcache-load-                 6.55                    6.29
-misses
(% of all L1-dcache hits)

LLC-load-misses                 13.91                   10.44
(% of all LL-cache hits)

-------------------------------------------------------------------------------

v3:
- Removed xps_maps array and used separate pointers for rxqs_map and cpus_map
- Used static_key for XPS maps
- Removed skc_rx_ifindex from sock_common
- Record rx_queue_mapping as part of sk_mark_napi_id, sk_mark_rx_queue is
  retained to be used on request sock which has only struct sock_common and not
  a full sock.
- Added documentation in ABI/testing/sysfs-class-net-queues

---

Amritha Nambiar (5):
      net: Refactor XPS for CPUs and Rx queues
      net: Use static_key for XPS maps
      net: Enable Tx queue selection based on Rx queues
      net-sysfs: Add interface for Rx queue(s) map per Tx queue
      Documentation: Add explanation for XPS using Rx-queue(s) map


 Documentation/ABI/testing/sysfs-class-net-queues |   11 +
 Documentation/networking/scaling.txt             |   58 ++++
 include/linux/cpumask.h                          |   11 +
 include/linux/netdevice.h                        |   73 +++++-
 include/net/busy_poll.h                          |    3 
 include/net/sock.h                               |   14 +
 net/core/dev.c                                   |  292 +++++++++++++++-------
 net/core/net-sysfs.c                             |   85 ++++++
 net/core/sock.c                                  |    4 
 net/ipv4/tcp_input.c                             |    3 
 10 files changed, 445 insertions(+), 109 deletions(-)

^ permalink raw reply

* Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks
From: Michal Hocko @ 2018-06-05  8:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, qing.huang, tariqt, haakon.bugge, yanjun.zhu,
	netdev, linux-rdma, linux-kernel, gi-oh.kim
In-Reply-To: <1908601f-2eda-d739-9c4d-430a002b1a05@gmail.com>

On Mon 04-06-18 06:22:26, Eric Dumazet wrote:
> 
> 
> On 06/04/2018 06:11 AM, Michal Hocko wrote:
> > On Thu 31-05-18 11:10:22, Michal Hocko wrote:
> 
> > Just in case you are interested
> > ---
> > From 5010543ed6f73e4c00367801486dca8d5c63b2ce Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Mon, 4 Jun 2018 15:07:37 +0200
> > Subject: [PATCH] net: cleanup gfp mask in alloc_skb_with_frags
> > 
> > alloc_skb_with_frags uses __GFP_NORETRY for non-sleeping allocations
> > which is just a noop and a little bit confusing.
> > 
> > __GFP_NORETRY was added by ed98df3361f0 ("net: use __GFP_NORETRY for
> > high order allocations") to prevent from the OOM killer. Yet this was
> > not enough because fb05e7a89f50 ("net: don't wait for order-3 page
> > allocation") didn't want an excessive reclaim for non-costly orders
> > so it made it completely NOWAIT while it preserved __GFP_NORETRY in
> > place which is now redundant.
> > 
> > Drop the pointless __GFP_NORETRY because this function is used as
> > copy&paste source for other places.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> 
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks! What is the proper process now? Should I resend or somebody can
pick it up from this thread?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: WARNING in bpf_int_jit_compile
From: Dmitry Vyukov @ 2018-06-05  8:23 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: syzbot, Alexei Starovoitov, David Miller, H. Peter Anvin,
	Alexey Kuznetsov, LKML, Ingo Molnar, netdev, syzkaller-bugs,
	Thomas Gleixner, the arch/x86 maintainers, Hideaki YOSHIFUJI
In-Reply-To: <cc2c87d5-0543-c340-9700-23f8676315f0@iogearbox.net>

On Mon, May 28, 2018 at 1:17 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 05/26/2018 11:29 AM, syzbot wrote:
>> syzbot has found a reproducer for the following crash on:
>>
>> HEAD commit:    62d18ecfa641 Merge tag 'arm64-fixes' of git://git.kernel.o..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=14c6bf57800000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=982e2df1b9e60b02
>> dashboard link: https://syzkaller.appspot.com/bug?extid=9e762b52dd17e616a7a5
>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=130e42b7800000
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+9e762b52dd17e616a7a5@syzkaller.appspotmail.com
>>
>> RAX: ffffffffffffffda RBX: 0000000002542914 RCX: 0000000000455a09
>> RDX: 0000000000000048 RSI: 0000000020000240 RDI: 0000000000000005
>> RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
>> R13: 0000000000000046 R14: 00000000006f4730 R15: 0000000000000023
>> WARNING: CPU: 0 PID: 4752 at include/linux/filter.h:667 bpf_jit_binary_lock_ro include/linux/filter.h:667 [inline]
>> WARNING: CPU: 0 PID: 4752 at include/linux/filter.h:667 bpf_int_jit_compile+0xbf7/0xef7 arch/x86/net/bpf_jit_comp.c:1271
>> Kernel panic - not syncing: panic_on_warn set ...
>>
>> CPU: 0 PID: 4752 Comm: syz-executor0 Not tainted 4.17.0-rc6+ #67
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:77 [inline]
>>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>>  panic+0x22f/0x4de kernel/panic.c:184
>>  __warn.cold.8+0x163/0x1b3 kernel/panic.c:536
>>  report_bug+0x252/0x2d0 lib/bug.c:186
>>  fixup_bug arch/x86/kernel/traps.c:178 [inline]
>>  do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296
>>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>>  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
>> RIP: 0010:bpf_jit_binary_lock_ro include/linux/filter.h:667 [inline]
>
> Been looking into this last Friday already. What seems to happen here is that
> there's fault injection from inside set_memory_ro(), meaning it will eventually
> return an error there, and we throw a WARN_ON_ONCE() to bark that making the
> memory read-only didn't work out. I'd be in preference to notify the user on
> such issue rather than keeping completely silent about it so that there's
> awareness that read-only protections are not in place / guaranteed.


Hi Daniel,

There is no problem with notifying user, but there is a problem with
making this notification indistinguishable from kernel bugs. Also the
stack trace is pointless for end user, and they may not even
understand what happened and what they were notified about (not
everybody is a kernel developer!). A pr_err/pr_err_once with a
readable error message would be a better fit here.


>> RIP: 0010:bpf_int_jit_compile+0xbf7/0xef7 arch/x86/net/bpf_jit_comp.c:1271
>> RSP: 0018:ffff8801d85ff920 EFLAGS: 00010293
>> RAX: ffff8801d78c40c0 RBX: 0000000000000046 RCX: ffffffff81445d89
>> RDX: 0000000000000000 RSI: ffffffff81445d97 RDI: 0000000000000005
>> RBP: ffff8801d85ffa40 R08: ffff8801d78c40c0 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9000194e002
>> R13: ffff8801d85ffa18 R14: 00000000fffffff4 R15: 0000000000000003
>>  bpf_prog_select_runtime+0x131/0x640 kernel/bpf/core.c:1541
>>  bpf_prog_load+0x16c2/0x2070 kernel/bpf/syscall.c:1333
>>  __do_sys_bpf kernel/bpf/syscall.c:2073 [inline]
>>  __se_sys_bpf kernel/bpf/syscall.c:2035 [inline]
>>  __x64_sys_bpf+0x389/0x4c0 kernel/bpf/syscall.c:2035
>>  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> RIP: 0033:0x455a09
>> RSP: 002b:00007ffec3da2868 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
>> RAX: ffffffffffffffda RBX: 0000000002542914 RCX: 0000000000455a09
>> RDX: 0000000000000048 RSI: 0000000020000240 RDI: 0000000000000005
>> RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
>> R13: 0000000000000046 R14: 00000000006f4730 R15: 0000000000000023
>> Dumping ftrace buffer:
>>    (ftrace buffer empty)
>> Kernel Offset: disabled
>> Rebooting in 86400 seconds..
>>
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/cc2c87d5-0543-c340-9700-23f8676315f0%40iogearbox.net.
> For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply

* Re: [PATCH net-next 3/3] mlxsw: Add extack messages for port_{un,}split   failures?
From: Jiri Pirko @ 2018-06-05  8:18 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: dsahern, netdev, idosch, jiri, jakub.kicinski, David Ahern
In-Reply-To: <20180605080528.GA2034@splinter.mtl.com>

Tue, Jun 05, 2018 at 10:05:28AM CEST, idosch@idosch.org wrote:
>On Tue, Jun 05, 2018 at 09:52:30AM +0200, Jiri Pirko wrote:
>> Tue, Jun 05, 2018 at 12:15:03AM CEST, dsahern@kernel.org wrote:
>> > 	if (!mlxsw_sp_port->split) {
>> > 		netdev_err(mlxsw_sp_port->dev, "Port wasn't split\n");
>> >+		NL_SET_ERR_MSG_MOD(extack, "Port was not split");
>> 
>> I wonder if we need the dmesg for these as well. Plus it is not the same
>> (wasn't/was not) which is maybe confusing. Any objection against the
>> original dmesg messages removal?
>
>We had this discussion about three months ago and decided to keep the
>existing messages:
>https://marc.info/?l=linux-netdev&m=151982813309466&w=2

I forgot. Thanks for reminding me. So could we at least have the
messages 100% same? Thanks.

^ permalink raw reply

* Re: [PATCH net-next 3/3] mlxsw: Add extack messages for port_{un,}split   failures?
From: Ido Schimmel @ 2018-06-05  8:05 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: dsahern, netdev, idosch, jiri, jakub.kicinski, David Ahern
In-Reply-To: <20180605075230.GC2164@nanopsycho>

On Tue, Jun 05, 2018 at 09:52:30AM +0200, Jiri Pirko wrote:
> Tue, Jun 05, 2018 at 12:15:03AM CEST, dsahern@kernel.org wrote:
> > 	if (!mlxsw_sp_port->split) {
> > 		netdev_err(mlxsw_sp_port->dev, "Port wasn't split\n");
> >+		NL_SET_ERR_MSG_MOD(extack, "Port was not split");
> 
> I wonder if we need the dmesg for these as well. Plus it is not the same
> (wasn't/was not) which is maybe confusing. Any objection against the
> original dmesg messages removal?

We had this discussion about three months ago and decided to keep the
existing messages:
https://marc.info/?l=linux-netdev&m=151982813309466&w=2

^ permalink raw reply

* [PATCH net] net: sched: cls: Fix offloading when ingress dev is vxlan
From: Paul Blakey @ 2018-06-05  8:04 UTC (permalink / raw)
  To: Jiri Pirko, Cong Wang, Jamal Hadi Salim, David Miller, netdev
  Cc: Yevgeny Kliteynik, Roi Dayan, Shahar Klein, Mark Bloch,
	Or Gerlitz, Paul Blakey

When using a vxlan device as the ingress dev, we count it as a
"no offload dev", so when such a rule comes and err stop is true,
we fail early and don't try the egdev route which can offload it
through the egress device.

Fix that by not calling the block offload if one of the devices
attached to it is not offload capable, but make sure egress on such case
is capable instead.

Fixes: caa7260156eb ("net: sched: keep track of offloaded filters [..]")
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
---
 net/sched/cls_api.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index a57e112..2cd579f 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -734,10 +734,6 @@ static int tcf_block_cb_call(struct tcf_block *block, enum tc_setup_type type,
 	int ok_count = 0;
 	int err;
 
-	/* Make sure all netdevs sharing this block are offload-capable. */
-	if (block->nooffloaddevcnt && err_stop)
-		return -EOPNOTSUPP;
-
 	list_for_each_entry(block_cb, &block->cb_list, list) {
 		err = block_cb->cb(type, type_data, block_cb->cb_priv);
 		if (err) {
@@ -1580,21 +1576,31 @@ static int tc_exts_setup_cb_egdev_call(struct tcf_exts *exts,
 int tc_setup_cb_call(struct tcf_block *block, struct tcf_exts *exts,
 		     enum tc_setup_type type, void *type_data, bool err_stop)
 {
-	int ok_count;
+	int ok_count = 0;
 	int ret;
 
-	ret = tcf_block_cb_call(block, type, type_data, err_stop);
-	if (ret < 0)
-		return ret;
-	ok_count = ret;
+	if (!block->nooffloaddevcnt) {
+		ret = tcf_block_cb_call(block, type, type_data, err_stop);
+		if (ret < 0)
+			return ret;
+		ok_count = ret;
+	}
 
 	if (!exts || ok_count)
-		return ok_count;
+		goto skip_egress;
+
 	ret = tc_exts_setup_cb_egdev_call(exts, type, type_data, err_stop);
 	if (ret < 0)
 		return ret;
 	ok_count += ret;
 
+skip_egress:
+	/* if one of the netdevs sharing this block are not offload-capable
+	 * make sure we succeeded in egress instead.
+	 */
+	if (block->nooffloaddevcnt && !ok_count && err_stop)
+		return -EOPNOTSUPP;
+
 	return ok_count;
 }
 EXPORT_SYMBOL(tc_setup_cb_call);
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net-next 3/3] mlxsw: Add extack messages for port_{un,}split   failures^[
From: Jiri Pirko @ 2018-06-05  7:52 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, idosch, jiri, jakub.kicinski, David Ahern
In-Reply-To: <20180604221503.20329-4-dsahern@kernel.org>

Tue, Jun 05, 2018 at 12:15:03AM CEST, dsahern@kernel.org wrote:
>From: David Ahern <dsahern@gmail.com>
>
>Return messages in extack for port split/unsplit errors. e.g.,
>    $ devlink port split swp1s1 count 4
>    Error: mlxsw_spectrum: Port cannot be split further.
>    devlink answers: Invalid argument
>
>    $ devlink port unsplit swp4
>    Error: mlxsw_spectrum: Port was not split.
>    devlink answers: Invalid argument
>
>Signed-off-by: David Ahern <dsahern@gmail.com>
>---
> drivers/net/ethernet/mellanox/mlxsw/core.c     | 14 ++++++++++----
> drivers/net/ethernet/mellanox/mlxsw/core.h     |  5 +++--
> drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 13 +++++++++++--
> 3 files changed, 24 insertions(+), 8 deletions(-)
>
>diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
>index 7ed38d80bc08..f9c724752a32 100644
>--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
>+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
>@@ -775,11 +775,14 @@ static int mlxsw_devlink_port_split(struct devlink *devlink,
> {
> 	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
> 
>-	if (port_index >= mlxsw_core->max_ports)
>+	if (port_index >= mlxsw_core->max_ports) {
>+		NL_SET_ERR_MSG_MOD(extack, "Port index exceeds maximum number of ports");
> 		return -EINVAL;
>+	}
> 	if (!mlxsw_core->driver->port_split)
> 		return -EOPNOTSUPP;
>-	return mlxsw_core->driver->port_split(mlxsw_core, port_index, count);
>+	return mlxsw_core->driver->port_split(mlxsw_core, port_index, count,
>+					      extack);
> }
> 
> static int mlxsw_devlink_port_unsplit(struct devlink *devlink,
>@@ -788,11 +791,14 @@ static int mlxsw_devlink_port_unsplit(struct devlink *devlink,
> {
> 	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
> 
>-	if (port_index >= mlxsw_core->max_ports)
>+	if (port_index >= mlxsw_core->max_ports) {
>+		NL_SET_ERR_MSG_MOD(extack, "Port index exceeds maximum number of ports");
> 		return -EINVAL;
>+	}
> 	if (!mlxsw_core->driver->port_unsplit)
> 		return -EOPNOTSUPP;
>-	return mlxsw_core->driver->port_unsplit(mlxsw_core, port_index);
>+	return mlxsw_core->driver->port_unsplit(mlxsw_core, port_index,
>+						extack);
> }
> 
> static int
>diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
>index 4a8d4c7f89d9..552cfa29c2f7 100644
>--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
>+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
>@@ -274,8 +274,9 @@ struct mlxsw_driver {
> 	int (*port_type_set)(struct mlxsw_core *mlxsw_core, u8 local_port,
> 			     enum devlink_port_type new_type);
> 	int (*port_split)(struct mlxsw_core *mlxsw_core, u8 local_port,
>-			  unsigned int count);
>-	int (*port_unsplit)(struct mlxsw_core *mlxsw_core, u8 local_port);
>+			  unsigned int count, struct netlink_ext_ack *extack);
>+	int (*port_unsplit)(struct mlxsw_core *mlxsw_core, u8 local_port,
>+			    struct netlink_ext_ack *extack);
> 	int (*sb_pool_get)(struct mlxsw_core *mlxsw_core,
> 			   unsigned int sb_index, u16 pool_index,
> 			   struct devlink_sb_pool_info *pool_info);
>diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
>index fc39f22e5c70..1b6d930e452d 100644
>--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
>+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
>@@ -3092,7 +3092,8 @@ static void mlxsw_sp_port_unsplit_create(struct mlxsw_sp *mlxsw_sp,
> }
> 
> static int mlxsw_sp_port_split(struct mlxsw_core *mlxsw_core, u8 local_port,
>-			       unsigned int count)
>+			       unsigned int count,
>+			       struct netlink_ext_ack *extack)
> {
> 	struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
> 	struct mlxsw_sp_port *mlxsw_sp_port;
>@@ -3104,6 +3105,7 @@ static int mlxsw_sp_port_split(struct mlxsw_core *mlxsw_core, u8 local_port,
> 	if (!mlxsw_sp_port) {
> 		dev_err(mlxsw_sp->bus_info->dev, "Port number \"%d\" does not exist\n",
> 			local_port);
>+		NL_SET_ERR_MSG_MOD(extack, "Port number does not exist");
> 		return -EINVAL;
> 	}
> 
>@@ -3112,11 +3114,13 @@ static int mlxsw_sp_port_split(struct mlxsw_core *mlxsw_core, u8 local_port,
> 
> 	if (count != 2 && count != 4) {
> 		netdev_err(mlxsw_sp_port->dev, "Port can only be split into 2 or 4 ports\n");
>+		NL_SET_ERR_MSG_MOD(extack, "Port can only be split into 2 or 4 ports");
> 		return -EINVAL;
> 	}
> 
> 	if (cur_width != MLXSW_PORT_MODULE_MAX_WIDTH) {
> 		netdev_err(mlxsw_sp_port->dev, "Port cannot be split further\n");
>+		NL_SET_ERR_MSG_MOD(extack, "Port cannot be split further");
> 		return -EINVAL;
> 	}
> 
>@@ -3125,6 +3129,7 @@ static int mlxsw_sp_port_split(struct mlxsw_core *mlxsw_core, u8 local_port,
> 		base_port = local_port;
> 		if (mlxsw_sp->ports[base_port + 1]) {
> 			netdev_err(mlxsw_sp_port->dev, "Invalid split configuration\n");
>+			NL_SET_ERR_MSG_MOD(extack, "Invalid split configuration");
> 			return -EINVAL;
> 		}
> 	} else {
>@@ -3132,6 +3137,7 @@ static int mlxsw_sp_port_split(struct mlxsw_core *mlxsw_core, u8 local_port,
> 		if (mlxsw_sp->ports[base_port + 1] ||
> 		    mlxsw_sp->ports[base_port + 3]) {
> 			netdev_err(mlxsw_sp_port->dev, "Invalid split configuration\n");
>+			NL_SET_ERR_MSG_MOD(extack, "Invalid split configuration");
> 			return -EINVAL;
> 		}
> 	}
>@@ -3153,7 +3159,8 @@ static int mlxsw_sp_port_split(struct mlxsw_core *mlxsw_core, u8 local_port,
> 	return err;
> }
> 
>-static int mlxsw_sp_port_unsplit(struct mlxsw_core *mlxsw_core, u8 local_port)
>+static int mlxsw_sp_port_unsplit(struct mlxsw_core *mlxsw_core, u8 local_port,
>+				 struct netlink_ext_ack *extack)
> {
> 	struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
> 	struct mlxsw_sp_port *mlxsw_sp_port;
>@@ -3165,11 +3172,13 @@ static int mlxsw_sp_port_unsplit(struct mlxsw_core *mlxsw_core, u8 local_port)
> 	if (!mlxsw_sp_port) {
> 		dev_err(mlxsw_sp->bus_info->dev, "Port number \"%d\" does not exist\n",
> 			local_port);
>+		NL_SET_ERR_MSG_MOD(extack, "Port number does not exist");
> 		return -EINVAL;
> 	}
> 
> 	if (!mlxsw_sp_port->split) {
> 		netdev_err(mlxsw_sp_port->dev, "Port wasn't split\n");
>+		NL_SET_ERR_MSG_MOD(extack, "Port was not split");

I wonder if we need the dmesg for these as well. Plus it is not the same
(wasn't/was not) which is maybe confusing. Any objection against the
original dmesg messages removal?


> 		return -EINVAL;
> 	}
> 
>-- 
>2.11.0
>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox