Netdev List
* [PATCH 11/20] octeontx2-af: Add support for stripping STAG/CTAG
From: sunil.kovvuri @ 2018-11-08 18:35 UTC
  To: netdev, davem; +Cc: arnd, linux-soc, Tomasz Duszynski, Sunil Goutham
In-Reply-To: <1541702161-30673-1-git-send-email-sunil.kovvuri@gmail.com>

From: Tomasz Duszynski <tduszynski@marvell.com>

This works by shadowing the existing UCAST MCAM entry with a new
one that additionally matches either NPC_LT_LB_CTAG or
NPC_LT_LB_STAG. For this to fully work, a properly configured
NIX_VTAG_CFG message has to be sent afterwards, i.e. with strip
and capture enabled and type set to 0.
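
The shadow match described above can be sketched as a standalone
ternary compare. This is only an illustration, not the driver code:
the ltype values (CTAG = 2, STAG = 3) are assumptions, the LB ltype
nibble is taken to sit at key bits [23:20], and all ex_* names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed ltype values, for illustration only; the real ones live in npc.h. */
#define EX_LT_LB_CTAG	0x2ull
#define EX_LT_LB_STAG	0x3ull
#define EX_LB_SHIFT	20	/* LB ltype nibble at key bits [23:20] */

struct ex_match {
	uint64_t kw;		/* expected key bits */
	uint64_t kw_mask;	/* 1 = compare this bit, 0 = don't care */
};

/* Build a search key from a channel number and an LB ltype. */
static uint64_t ex_make_key(uint64_t chan, uint64_t lb_ltype)
{
	assert(lb_ltype <= 0xF);
	return (lb_ltype << EX_LB_SHIFT) | (chan & 0xFFF);
}

/* Extend a match so that it additionally requires LB ltype CTAG or STAG. */
static struct ex_match ex_add_vlan_match(struct ex_match m)
{
	m.kw      |= (EX_LT_LB_STAG | EX_LT_LB_CTAG) << EX_LB_SHIFT;
	m.kw_mask |= (EX_LT_LB_STAG & EX_LT_LB_CTAG) << EX_LB_SHIFT;
	return m;
}

/* Ternary compare: only the masked bits have to agree. */
static bool ex_key_hits(struct ex_match m, uint64_t key)
{
	return ((key ^ m.kw) & m.kw_mask) == 0;
}
```

With these assumed values the mask keeps only bit 1 of the LB nibble,
so both CTAG and STAG keys hit while an untagged key (ltype 0) misses.
Any other ltype with that bit set would hit as well.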

On receiving a tagged packet, NIX will remove the outer VLAN and
capture the TCI in NIX_RX_PARSE_S.

Also simplified the RX Vtag configuration flow. With this change the
STRIP and CAPTURE VTAG actions can be set separately. The following
combinations are possible: STRIP, STRIP and CAPTURE, CAPTURE, or
nothing (0 disables the respective action).
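
As a rough illustration of the simplified flow, the register value
can be composed as below. This is a sketch that mirrors the logic of
nix_rx_vtag_cfg() in this patch; the ex_* names are hypothetical and
the value 1 for the 8 octet size is an assumption based on the comment
added to mbox.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_VTAGSIZE_T4	0u	/* 4 octet VTAG */
#define EX_VTAGSIZE_T8	1u	/* 8 octet VTAG (assumed encoding) */

/* Returns 0 and fills *regval, or -1 for out-of-range input. */
static int ex_rx_vtag_regval(uint8_t vtag_type, uint8_t vtag_size,
			     bool strip, bool capture, uint64_t *regval)
{
	assert(regval);

	if (vtag_type > 7 || vtag_size > EX_VTAGSIZE_T8)
		return -1;

	/* Start from the size; both action bits default to 0 (disabled). */
	*regval = vtag_size;
	if (capture)
		*regval |= 1ull << 5;
	if (strip)
		*regval |= 1ull << 4;
	return 0;
}
```

Because the value is built from scratch instead of read-modify-write,
each request fully determines the strip and capture bits.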

Signed-off-by: Tomasz Duszynski <tduszynski@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
---
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h   |  6 +-
 drivers/net/ethernet/marvell/octeontx2/af/npc.h    | 30 ++++++++
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h    |  8 +++
 .../net/ethernet/marvell/octeontx2/af/rvu_nix.c    | 83 +++++++++++++++++-----
 .../net/ethernet/marvell/octeontx2/af/rvu_npc.c    | 46 +++++++++++-
 5 files changed, 152 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index 737dbc9..f2bf77d 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -181,7 +181,8 @@ M(NIX_SET_MAC_ADDR,	0x800a, nix_set_mac_addr, msg_rsp)		\
 M(NIX_SET_RX_MODE,	0x800b, nix_rx_mode, msg_rsp)			\
 M(NIX_SET_HW_FRS,	0x800c, nix_frs_cfg, msg_rsp)			\
 M(NIX_LF_START_RX,	0x800d, msg_req, msg_rsp)			\
-M(NIX_LF_STOP_RX,	0x800e, msg_req, msg_rsp)
+M(NIX_LF_STOP_RX,	0x800e, msg_req, msg_rsp)			\
+M(NIX_RXVLAN_ALLOC,	0x8012, msg_req, msg_rsp)
 
 /* Messages initiated by AF (range 0xC00 - 0xDFF) */
 #define MBOX_UP_CGX_MESSAGES						\
@@ -499,6 +500,7 @@ struct nix_txschq_config {
 
 struct nix_vtag_config {
 	struct mbox_msghdr hdr;
+	/* '0' for 4 octet VTAG, '1' for 8 octet VTAG */
 	u8 vtag_size;
 	/* cfg_type is '0' for tx vlan cfg
 	 * cfg_type is '1' for rx vlan cfg
@@ -519,7 +521,7 @@ struct nix_vtag_config {
 
 		/* valid when cfg_type is '1' */
 		struct {
-			/* rx vtag type index */
+			/* rx vtag type index, valid values are in 0..7 range */
 			u8 vtag_type;
 			/* rx vtag strip */
 			u8 strip_vtag :1;
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/npc.h b/drivers/net/ethernet/marvell/octeontx2/af/npc.h
index f98b011..3f7e5e6 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/npc.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/npc.h
@@ -259,4 +259,34 @@ struct nix_rx_action {
 #endif
 };
 
+struct nix_rx_vtag_action {
+#if defined(__BIG_ENDIAN_BITFIELD)
+	u64     rsvd_63_48      :16;
+	u64     vtag1_valid     :1;
+	u64     vtag1_type      :3;
+	u64     rsvd_43         :1;
+	u64     vtag1_lid       :3;
+	u64     vtag1_relptr    :8;
+	u64     rsvd_31_16      :16;
+	u64     vtag0_valid     :1;
+	u64     vtag0_type      :3;
+	u64     rsvd_11         :1;
+	u64     vtag0_lid       :3;
+	u64     vtag0_relptr    :8;
+#else
+	u64     vtag0_relptr    :8;
+	u64     vtag0_lid       :3;
+	u64     rsvd_11         :1;
+	u64     vtag0_type      :3;
+	u64     vtag0_valid     :1;
+	u64     rsvd_31_16      :16;
+	u64     vtag1_relptr    :8;
+	u64     vtag1_lid       :3;
+	u64     rsvd_43         :1;
+	u64     vtag1_type      :3;
+	u64     vtag1_valid     :1;
+	u64     rsvd_63_48      :16;
+#endif
+};
+
 #endif /* NPC_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 12fbdba..e213bf4 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -142,6 +142,11 @@ struct rvu_pfvf {
 	/* Broadcast pkt replication info */
 	u16			bcast_mce_idx;
 	struct nix_mce_list	bcast_mce_list;
+
+	/* VLAN offload */
+	struct mcam_entry entry;
+	int rxvlan_index;
+	bool rxvlan;
 };
 
 struct nix_txsch {
@@ -356,6 +361,8 @@ int rvu_mbox_handler_NIX_STATS_RST(struct rvu *rvu, struct msg_req *req,
 int rvu_mbox_handler_NIX_VTAG_CFG(struct rvu *rvu,
 				  struct nix_vtag_config *req,
 				  struct msg_rsp *rsp);
+int rvu_mbox_handler_NIX_RXVLAN_ALLOC(struct rvu *rvu, struct msg_req *req,
+				      struct msg_rsp *rsp);
 int rvu_mbox_handler_NIX_RSS_FLOWKEY_CFG(struct rvu *rvu,
 					 struct nix_rss_flowkey_cfg *req,
 					 struct msg_rsp *rsp);
@@ -384,6 +391,7 @@ void rvu_npc_disable_promisc_entry(struct rvu *rvu, u16 pcifunc, int nixlf);
 void rvu_npc_enable_promisc_entry(struct rvu *rvu, u16 pcifunc, int nixlf);
 void rvu_npc_install_bcast_match_entry(struct rvu *rvu, u16 pcifunc,
 				       int nixlf, u64 chan);
+int rvu_npc_update_rxvlan(struct rvu *rvu, u16 pcifunc, int nixlf);
 void rvu_npc_disable_mcam_entries(struct rvu *rvu, u16 pcifunc, int nixlf);
 void rvu_npc_disable_default_entries(struct rvu *rvu, u16 pcifunc, int nixlf);
 void rvu_npc_enable_default_entries(struct rvu *rvu, u16 pcifunc, int nixlf);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
index 5853af4..70a2997 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
@@ -192,6 +192,7 @@ static void nix_interface_deinit(struct rvu *rvu, u16 pcifunc, u8 nixlf)
 
 	pfvf->maxlen = 0;
 	pfvf->minlen = 0;
+	pfvf->rxvlan = false;
 
 	/* Remove this PF_FUNC from bcast pkt replication list */
 	err = nix_update_bcast_mce_list(rvu, pcifunc, false);
@@ -1209,28 +1210,15 @@ int rvu_mbox_handler_NIX_TXSCHQ_CFG(struct rvu *rvu,
 static int nix_rx_vtag_cfg(struct rvu *rvu, int nixlf, int blkaddr,
 			   struct nix_vtag_config *req)
 {
-	u64 regval = 0;
+	u64 regval = req->vtag_size;
 
-#define NIX_VTAGTYPE_MAX 0x8ull
-#define NIX_VTAGSIZE_MASK 0x7ull
-#define NIX_VTAGSTRIP_CAP_MASK 0x30ull
-
-	if (req->rx.vtag_type >= NIX_VTAGTYPE_MAX ||
-	    req->vtag_size > VTAGSIZE_T8)
+	if (req->rx.vtag_type > 7 || req->vtag_size > VTAGSIZE_T8)
 		return -EINVAL;
 
-	regval = rvu_read64(rvu, blkaddr,
-			    NIX_AF_LFX_RX_VTAG_TYPEX(nixlf, req->rx.vtag_type));
-
-	if (req->rx.strip_vtag && req->rx.capture_vtag)
-		regval |= BIT_ULL(4) | BIT_ULL(5);
-	else if (req->rx.strip_vtag)
+	if (req->rx.capture_vtag)
+		regval |= BIT_ULL(5);
+	if (req->rx.strip_vtag)
 		regval |= BIT_ULL(4);
-	else
-		regval &= ~(BIT_ULL(4) | BIT_ULL(5));
-
-	regval &= ~NIX_VTAGSIZE_MASK;
-	regval |= req->vtag_size & NIX_VTAGSIZE_MASK;
 
 	rvu_write64(rvu, blkaddr,
 		    NIX_AF_LFX_RX_VTAG_TYPEX(nixlf, req->rx.vtag_type), regval);
@@ -1770,6 +1758,9 @@ int rvu_mbox_handler_NIX_SET_MAC_ADDR(struct rvu *rvu,
 
 	rvu_npc_install_ucast_entry(rvu, pcifunc, nixlf,
 				    pfvf->rx_chan_base, req->mac_addr);
+
+	rvu_npc_update_rxvlan(rvu, pcifunc, nixlf);
+
 	return 0;
 }
 
@@ -1803,6 +1794,9 @@ int rvu_mbox_handler_NIX_SET_RX_MODE(struct rvu *rvu, struct nix_rx_mode *req,
 	else
 		rvu_npc_install_promisc_entry(rvu, pcifunc, nixlf,
 					      pfvf->rx_chan_base, allmulti);
+
+	rvu_npc_update_rxvlan(rvu, pcifunc, nixlf);
+
 	return 0;
 }
 
@@ -1941,6 +1935,59 @@ int rvu_mbox_handler_NIX_SET_HW_FRS(struct rvu *rvu, struct nix_frs_cfg *req,
 	return 0;
 }
 
+int rvu_mbox_handler_NIX_RXVLAN_ALLOC(struct rvu *rvu, struct msg_req *req,
+				      struct msg_rsp *rsp)
+{
+	struct npc_mcam_alloc_entry_req alloc_req = { };
+	struct npc_mcam_alloc_entry_rsp alloc_rsp = { };
+	struct npc_mcam_free_entry_req free_req = { };
+	u16 pcifunc = req->hdr.pcifunc;
+	int blkaddr, nixlf, err;
+	struct rvu_pfvf *pfvf;
+
+	pfvf = rvu_get_pfvf(rvu, pcifunc);
+	if (pfvf->rxvlan)
+		return 0;
+
+	/* alloc new mcam entry */
+	alloc_req.hdr.pcifunc = pcifunc;
+	alloc_req.count = 1;
+
+	err = rvu_mbox_handler_NPC_MCAM_ALLOC_ENTRY(rvu, &alloc_req,
+						    &alloc_rsp);
+	if (err)
+		return err;
+
+	/* update entry to enable rxvlan offload */
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, pcifunc);
+	if (blkaddr < 0) {
+		err = NIX_AF_ERR_AF_LF_INVALID;
+		goto free_entry;
+	}
+
+	nixlf = rvu_get_lf(rvu, &rvu->hw->block[blkaddr], pcifunc, 0);
+	if (nixlf < 0) {
+		err = NIX_AF_ERR_AF_LF_INVALID;
+		goto free_entry;
+	}
+
+	pfvf->rxvlan_index = alloc_rsp.entry_list[0];
+	/* all it means is that rxvlan_index is valid */
+	pfvf->rxvlan = true;
+
+	err = rvu_npc_update_rxvlan(rvu, pcifunc, nixlf);
+	if (err)
+		goto free_entry;
+
+	return 0;
+free_entry:
+	free_req.hdr.pcifunc = pcifunc;
+	free_req.entry = alloc_rsp.entry_list[0];
+	rvu_mbox_handler_NPC_MCAM_FREE_ENTRY(rvu, &free_req, rsp);
+	pfvf->rxvlan = false;
+	return err;
+}
+
 static void nix_link_config(struct rvu *rvu, int blkaddr)
 {
 	struct rvu_hwinfo *hw = rvu->hw;
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
index 100ce29..5dbb5cd 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
@@ -306,7 +306,9 @@ static u64 npc_get_mcam_action(struct rvu *rvu, struct npc_mcam *mcam,
 void rvu_npc_install_ucast_entry(struct rvu *rvu, u16 pcifunc,
 				 int nixlf, u64 chan, u8 *mac_addr)
 {
+	struct rvu_pfvf *pfvf = rvu_get_pfvf(rvu, pcifunc);
 	struct npc_mcam *mcam = &rvu->hw->mcam;
+	struct nix_rx_vtag_action vtag_action;
 	struct mcam_entry entry = { {0} };
 	struct nix_rx_action action;
 	int blkaddr, index, kwi;
@@ -345,6 +347,20 @@ void rvu_npc_install_ucast_entry(struct rvu *rvu, u16 pcifunc,
 	entry.action = *(u64 *)&action;
 	npc_config_mcam_entry(rvu, mcam, blkaddr, index,
 			      NIX_INTF_RX, &entry, true);
+
+	/* add VLAN matching, setup action and save entry back for later */
+	entry.kw[0] |= (NPC_LT_LB_STAG | NPC_LT_LB_CTAG) << 20;
+	entry.kw_mask[0] |= (NPC_LT_LB_STAG & NPC_LT_LB_CTAG) << 20;
+
+	*(u64 *)&vtag_action = 0;
+	vtag_action.vtag0_valid = 1;
+	/* must match type set in NIX_VTAG_CFG */
+	vtag_action.vtag0_type = 0;
+	vtag_action.vtag0_lid = NPC_LID_LA;
+	vtag_action.vtag0_relptr = 12;
+	entry.vtag_action = *(u64 *)&vtag_action;
+
+	memcpy(&pfvf->entry, &entry, sizeof(entry));
 }
 
 void rvu_npc_install_promisc_entry(struct rvu *rvu, u16 pcifunc,
@@ -352,7 +368,7 @@ void rvu_npc_install_promisc_entry(struct rvu *rvu, u16 pcifunc,
 {
 	struct npc_mcam *mcam = &rvu->hw->mcam;
 	struct mcam_entry entry = { {0} };
-	struct nix_rx_action action;
+	struct nix_rx_action action = { };
 	int blkaddr, index, kwi;
 
 	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
@@ -521,6 +537,8 @@ void rvu_npc_update_flowkey_alg_idx(struct rvu *rvu, u16 pcifunc, int nixlf,
 
 	rvu_write64(rvu, blkaddr,
 		    NPC_AF_MCAMEX_BANKX_ACTION(index, bank), *(u64 *)&action);
+
+	rvu_npc_update_rxvlan(rvu, pcifunc, nixlf);
 }
 
 static void npc_enadis_default_entries(struct rvu *rvu, u16 pcifunc,
@@ -560,6 +578,8 @@ static void npc_enadis_default_entries(struct rvu *rvu, u16 pcifunc,
 		rvu_npc_enable_promisc_entry(rvu, pcifunc, nixlf);
 	else
 		rvu_npc_disable_promisc_entry(rvu, pcifunc, nixlf);
+
+	rvu_npc_update_rxvlan(rvu, pcifunc, nixlf);
 }
 
 void rvu_npc_disable_default_entries(struct rvu *rvu, u16 pcifunc, int nixlf)
@@ -2018,3 +2038,27 @@ int rvu_mbox_handler_NPC_GET_KEX_CFG(struct rvu *rvu, struct msg_req *req,
 	}
 	return 0;
 }
+
+int rvu_npc_update_rxvlan(struct rvu *rvu, u16 pcifunc, int nixlf)
+{
+	struct rvu_pfvf *pfvf = rvu_get_pfvf(rvu, pcifunc);
+	struct npc_mcam *mcam = &rvu->hw->mcam;
+	int blkaddr, index;
+	bool enable;
+
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
+	if (blkaddr < 0)
+		return NIX_AF_ERR_AF_LF_INVALID;
+
+	if (!pfvf->rxvlan)
+		return 0;
+
+	index = npc_get_nixlf_mcam_index(mcam, pcifunc, nixlf,
+					 NIXLF_UCAST_ENTRY);
+	pfvf->entry.action = npc_get_mcam_action(rvu, mcam, blkaddr, index);
+	enable = is_mcam_entry_enabled(rvu, mcam, blkaddr, index);
+	npc_config_mcam_entry(rvu, mcam, blkaddr, pfvf->rxvlan_index,
+			      NIX_INTF_RX, &pfvf->entry, enable);
+
+	return 0;
+}
-- 
2.7.4


* [PATCH 10/20] octeontx2-af: Support to enable/disable default MCAM entries
From: sunil.kovvuri @ 2018-11-08 18:35 UTC
  To: netdev, davem; +Cc: arnd, linux-soc, Sunil Goutham
In-Reply-To: <1541702161-30673-1-git-send-email-sunil.kovvuri@gmail.com>

From: Sunil Goutham <sgoutham@marvell.com>

A PF/VF with a NIXLF attached has default/reserved MCAM entries
for receiving Ucast/Bcast/Promisc traffic. Ideally traffic should be
forwarded to the NIXLF only after its contexts are initialized. This
patch keeps these default entries disabled and adds mbox messages
for a PF/VF to enable them once NPA/NIXLF initialization is done.
Likewise, while a PF/VF is being torn down, it can send the disable
mailbox message to stop receiving traffic.
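
The enable/disable behaviour can be modelled roughly as below. This
is a toy model of npc_enadis_default_entries() from this patch, not
the real implementation: the ex_* names are hypothetical and the
0x3FF function mask is an assumption about how pcifunc encodes the
VF number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_FUNC_MASK	0x3FFu	/* assumed: non-zero function bits mean VF */

struct ex_default_entries {
	bool ucast;
	bool bcast;
	bool promisc;
};

static bool ex_is_vf(uint16_t pcifunc)
{
	return (pcifunc & EX_FUNC_MASK) != 0;
}

/* Toggle the default entries the way START_RX / STOP_RX would. */
static void ex_enadis_default_entries(struct ex_default_entries *e,
				      uint16_t pcifunc,
				      bool bcast_is_replicated, bool enable)
{
	assert(e);

	/* Every PF/VF owns a ucast entry. */
	e->ucast = enable;

	/* Only a PF has bcast and promisc entries to toggle. */
	if (ex_is_vf(pcifunc))
		return;

	/* A replicating bcast entry is shared, so it is left alone. */
	if (!bcast_is_replicated)
		e->bcast = enable;
	e->promisc = enable;
}
```

In this model a VF only ever toggles its ucast entry, and a bcast
entry whose action is packet replication keeps its current state.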

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
---
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h   |  4 +-
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h    |  7 ++
 .../net/ethernet/marvell/octeontx2/af/rvu_nix.c    | 48 ++++++++++++
 .../net/ethernet/marvell/octeontx2/af/rvu_npc.c    | 91 +++++++++++++++-------
 4 files changed, 122 insertions(+), 28 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index 9941f0a..737dbc9 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -179,7 +179,9 @@ M(NIX_VTAG_CFG,	0x8008, nix_vtag_config, msg_rsp)		\
 M(NIX_RSS_FLOWKEY_CFG,  0x8009, nix_rss_flowkey_cfg, msg_rsp)		\
 M(NIX_SET_MAC_ADDR,	0x800a, nix_set_mac_addr, msg_rsp)		\
 M(NIX_SET_RX_MODE,	0x800b, nix_rx_mode, msg_rsp)			\
-M(NIX_SET_HW_FRS,	0x800c, nix_frs_cfg, msg_rsp)
+M(NIX_SET_HW_FRS,	0x800c, nix_frs_cfg, msg_rsp)			\
+M(NIX_LF_START_RX,	0x800d, msg_req, msg_rsp)			\
+M(NIX_LF_STOP_RX,	0x800e, msg_req, msg_rsp)
 
 /* Messages initiated by AF (range 0xC00 - 0xDFF) */
 #define MBOX_UP_CGX_MESSAGES						\
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 074d792..12fbdba 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -366,6 +366,10 @@ int rvu_mbox_handler_NIX_SET_RX_MODE(struct rvu *rvu, struct nix_rx_mode *req,
 				     struct msg_rsp *rsp);
 int rvu_mbox_handler_NIX_SET_HW_FRS(struct rvu *rvu, struct nix_frs_cfg *req,
 				    struct msg_rsp *rsp);
+int rvu_mbox_handler_NIX_LF_START_RX(struct rvu *rvu, struct msg_req *req,
+				     struct msg_rsp *rsp);
+int rvu_mbox_handler_NIX_LF_STOP_RX(struct rvu *rvu, struct msg_req *req,
+				    struct msg_rsp *rsp);
 
 /* NPC APIs */
 int rvu_npc_init(struct rvu *rvu);
@@ -377,9 +381,12 @@ void rvu_npc_install_ucast_entry(struct rvu *rvu, u16 pcifunc,
 void rvu_npc_install_promisc_entry(struct rvu *rvu, u16 pcifunc,
 				   int nixlf, u64 chan, bool allmulti);
 void rvu_npc_disable_promisc_entry(struct rvu *rvu, u16 pcifunc, int nixlf);
+void rvu_npc_enable_promisc_entry(struct rvu *rvu, u16 pcifunc, int nixlf);
 void rvu_npc_install_bcast_match_entry(struct rvu *rvu, u16 pcifunc,
 				       int nixlf, u64 chan);
 void rvu_npc_disable_mcam_entries(struct rvu *rvu, u16 pcifunc, int nixlf);
+void rvu_npc_disable_default_entries(struct rvu *rvu, u16 pcifunc, int nixlf);
+void rvu_npc_enable_default_entries(struct rvu *rvu, u16 pcifunc, int nixlf);
 void rvu_npc_update_flowkey_alg_idx(struct rvu *rvu, u16 pcifunc, int nixlf,
 				    int group, int alg_idx, int mcam_index);
 int rvu_mbox_handler_NPC_MCAM_ALLOC_ENTRY(struct rvu *rvu,
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
index 9de9aaf..5853af4 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
@@ -821,6 +821,9 @@ int rvu_mbox_handler_NIX_LF_ALLOC(struct rvu *rvu,
 	if (err)
 		goto free_mem;
 
+	/* Disable NPC entries as NIXLF's contexts are not initialized yet */
+	rvu_npc_disable_default_entries(rvu, pcifunc, nixlf);
+
 	goto exit;
 
 free_mem:
@@ -2176,3 +2179,48 @@ void rvu_nix_freemem(struct rvu *rvu)
 		mutex_destroy(&mcast->mce_lock);
 	}
 }
+
+static int nix_get_nixlf(struct rvu *rvu, u16 pcifunc, int *nixlf)
+{
+	struct rvu_pfvf *pfvf = rvu_get_pfvf(rvu, pcifunc);
+	struct rvu_hwinfo *hw = rvu->hw;
+	int blkaddr;
+
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, pcifunc);
+	if (!pfvf->nixlf || blkaddr < 0)
+		return NIX_AF_ERR_AF_LF_INVALID;
+
+	*nixlf = rvu_get_lf(rvu, &hw->block[blkaddr], pcifunc, 0);
+	if (*nixlf < 0)
+		return NIX_AF_ERR_AF_LF_INVALID;
+
+	return 0;
+}
+
+int rvu_mbox_handler_NIX_LF_START_RX(struct rvu *rvu, struct msg_req *req,
+				     struct msg_rsp *rsp)
+{
+	u16 pcifunc = req->hdr.pcifunc;
+	int nixlf, err;
+
+	err = nix_get_nixlf(rvu, pcifunc, &nixlf);
+	if (err)
+		return err;
+
+	rvu_npc_enable_default_entries(rvu, pcifunc, nixlf);
+	return 0;
+}
+
+int rvu_mbox_handler_NIX_LF_STOP_RX(struct rvu *rvu, struct msg_req *req,
+				    struct msg_rsp *rsp)
+{
+	u16 pcifunc = req->hdr.pcifunc;
+	int nixlf, err;
+
+	err = nix_get_nixlf(rvu, pcifunc, &nixlf);
+	if (err)
+		return err;
+
+	rvu_npc_disable_default_entries(rvu, pcifunc, nixlf);
+	return 0;
+}
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
index 814166e..100ce29 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
@@ -384,7 +384,8 @@ void rvu_npc_install_promisc_entry(struct rvu *rvu, u16 pcifunc,
 			      NIX_INTF_RX, &entry, true);
 }
 
-void rvu_npc_disable_promisc_entry(struct rvu *rvu, u16 pcifunc, int nixlf)
+static void npc_enadis_promisc_entry(struct rvu *rvu, u16 pcifunc,
+				     int nixlf, bool enable)
 {
 	struct npc_mcam *mcam = &rvu->hw->mcam;
 	int blkaddr, index;
@@ -399,7 +400,17 @@ void rvu_npc_disable_promisc_entry(struct rvu *rvu, u16 pcifunc, int nixlf)
 
 	index = npc_get_nixlf_mcam_index(mcam, pcifunc,
 					 nixlf, NIXLF_PROMISC_ENTRY);
-	npc_enable_mcam_entry(rvu, mcam, blkaddr, index, false);
+	npc_enable_mcam_entry(rvu, mcam, blkaddr, index, enable);
+}
+
+void rvu_npc_disable_promisc_entry(struct rvu *rvu, u16 pcifunc, int nixlf)
+{
+	npc_enadis_promisc_entry(rvu, pcifunc, nixlf, false);
+}
+
+void rvu_npc_enable_promisc_entry(struct rvu *rvu, u16 pcifunc, int nixlf)
+{
+	npc_enadis_promisc_entry(rvu, pcifunc, nixlf, true);
 }
 
 void rvu_npc_install_bcast_match_entry(struct rvu *rvu, u16 pcifunc,
@@ -512,11 +523,59 @@ void rvu_npc_update_flowkey_alg_idx(struct rvu *rvu, u16 pcifunc, int nixlf,
 		    NPC_AF_MCAMEX_BANKX_ACTION(index, bank), *(u64 *)&action);
 }
 
-void rvu_npc_disable_mcam_entries(struct rvu *rvu, u16 pcifunc, int nixlf)
+static void npc_enadis_default_entries(struct rvu *rvu, u16 pcifunc,
+				       int nixlf, bool enable)
 {
 	struct npc_mcam *mcam = &rvu->hw->mcam;
 	struct nix_rx_action action;
-	int blkaddr, index, bank;
+	int index, bank, blkaddr;
+
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
+	if (blkaddr < 0)
+		return;
+
+	/* Ucast MCAM match entry of this PF/VF */
+	index = npc_get_nixlf_mcam_index(mcam, pcifunc,
+					 nixlf, NIXLF_UCAST_ENTRY);
+	npc_enable_mcam_entry(rvu, mcam, blkaddr, index, enable);
+
+	/* For PF, ena/dis promisc and bcast MCAM match entries */
+	if (pcifunc & RVU_PFVF_FUNC_MASK)
+		return;
+
+	/* For bcast, enable/disable only if its action is not
+	 * packet replication. In case the action is replication,
+	 * this PF's nixlf is removed from the bcast replication
+	 * list.
+	 */
+	index = npc_get_nixlf_mcam_index(mcam, pcifunc,
+					 nixlf, NIXLF_BCAST_ENTRY);
+	bank = npc_get_bank(mcam, index);
+	*(u64 *)&action = rvu_read64(rvu, blkaddr,
+	     NPC_AF_MCAMEX_BANKX_ACTION(index & (mcam->banksize - 1), bank));
+	if (action.op != NIX_RX_ACTIONOP_MCAST)
+		npc_enable_mcam_entry(rvu, mcam,
+				      blkaddr, index, enable);
+	if (enable)
+		rvu_npc_enable_promisc_entry(rvu, pcifunc, nixlf);
+	else
+		rvu_npc_disable_promisc_entry(rvu, pcifunc, nixlf);
+}
+
+void rvu_npc_disable_default_entries(struct rvu *rvu, u16 pcifunc, int nixlf)
+{
+	npc_enadis_default_entries(rvu, pcifunc, nixlf, false);
+}
+
+void rvu_npc_enable_default_entries(struct rvu *rvu, u16 pcifunc, int nixlf)
+{
+	npc_enadis_default_entries(rvu, pcifunc, nixlf, true);
+}
+
+void rvu_npc_disable_mcam_entries(struct rvu *rvu, u16 pcifunc, int nixlf)
+{
+	struct npc_mcam *mcam = &rvu->hw->mcam;
+	int blkaddr;
 
 	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
 	if (blkaddr < 0)
@@ -532,29 +591,7 @@ void rvu_npc_disable_mcam_entries(struct rvu *rvu, u16 pcifunc, int nixlf)
 
 	mutex_unlock(&mcam->lock);
 
-	/* Disable ucast MCAM match entry of this PF/VF */
-	index = npc_get_nixlf_mcam_index(mcam, pcifunc,
-					 nixlf, NIXLF_UCAST_ENTRY);
-	npc_enable_mcam_entry(rvu, mcam, blkaddr, index, false);
-
-	/* For PF, disable promisc and bcast MCAM match entries */
-	if (!(pcifunc & RVU_PFVF_FUNC_MASK)) {
-		index = npc_get_nixlf_mcam_index(mcam, pcifunc,
-						 nixlf, NIXLF_BCAST_ENTRY);
-		/* For bcast, disable only if it's action is not
-		 * packet replication, incase if action is replication
-		 * then this PF's nixlf is removed from bcast replication
-		 * list.
-		 */
-		bank = npc_get_bank(mcam, index);
-		index &= (mcam->banksize - 1);
-		*(u64 *)&action = rvu_read64(rvu, blkaddr,
-				     NPC_AF_MCAMEX_BANKX_ACTION(index, bank));
-		if (action.op != NIX_RX_ACTIONOP_MCAST)
-			npc_enable_mcam_entry(rvu, mcam, blkaddr, index, false);
-
-		rvu_npc_disable_promisc_entry(rvu, pcifunc, nixlf);
-	}
+	rvu_npc_disable_default_entries(rvu, pcifunc, nixlf);
 }
 
 #define SET_KEX_LD(intf, lid, ltype, ld, cfg)	\
-- 
2.7.4


* [PATCH 09/20] octeontx2-af: Add MKEX default profile
From: sunil.kovvuri @ 2018-11-08 18:35 UTC
  To: netdev, davem
  Cc: arnd, linux-soc, Santosh Shukla, Yuri Tolstov, Sunil Goutham
In-Reply-To: <1541702161-30673-1-git-send-email-sunil.kovvuri@gmail.com>

From: Santosh Shukla <sshukla@marvell.com>

Added a basic default MKEX profile. This profile tells the
hardware what data to extract from a packet and where to
place it (bit offset) in the final KEY generated for the
parsed packet. MCAM entries have to be programmed for matching
based on the bit placement of the packet data.
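
To illustrate the bit placement, the sketch below computes where an
enabled parse nibble lands in the key for the RX nibble-enable value
0x49247 used in this patch. It assumes that enabled nibbles are packed
back to back starting at bit 0, which is what the reduced layout
comment in rvu_npc.c suggests. The ex_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EX_RX_NIBBLE_ENA	0x49247ull	/* value written for NIX_INTF_RX */

/* Key bit offset of an enabled nibble, or -1 if the nibble is disabled. */
static int ex_nibble_key_offset(uint64_t nibble_ena, unsigned int nibble)
{
	unsigned int below = 0, i;

	assert(nibble < 64);

	if (!(nibble_ena & (1ull << nibble)))
		return -1;

	/* Each enabled nibble below this one occupies 4 key bits. */
	for (i = 0; i < nibble; i++)
		if (nibble_ena & (1ull << i))
			below++;

	return (int)(below * 4);
}
```

Under that assumption, enable bits 0..2 give the 12-bit channel at
key bits [11:0], bit 6 lands at [15:12], and bits 9, 12, 15 and 18
land at [19:16], [23:20], [27:24] and [31:28], matching the layout
listed for CHAN, L2/3{B/M}, LA, LB, LC and LD.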

Also added a msg for a PF/VF to retrieve this MKEX profile, which
it can in turn process to determine how an MCAM entry has to be
populated.
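
A PF/VF could decode one retrieved LD config word along these lines.
This is a hedged sketch rather than driver code: the field positions
follow the KEX_LD_CFG() macro in this patch, the 4-bit width of the
byte count and the byte granularity of the key offset are assumptions
drawn from the comments in npc_config_ldata_extract(), and the ex_*
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same field layout as KEX_LD_CFG() in the patch, with explicit widths. */
#define EX_KEX_LD_CFG(bytesm1, hdr_ofs, ena, flags_ena, key_ofs)	\
	(((uint64_t)(bytesm1) << 16) | ((uint64_t)(hdr_ofs) << 8) |	\
	 ((uint64_t)(ena) << 7) | ((uint64_t)(flags_ena) << 6) |	\
	 ((uint64_t)(key_ofs) & 0x3F))

struct ex_ld_field {
	unsigned int nbytes;	/* number of bytes extracted */
	unsigned int hdr_ofs;	/* byte offset within the layer header */
	unsigned int kw;	/* 64-bit key word holding the first byte */
	unsigned int lsb;	/* bit position of the first byte in that word */
	int enabled;
};

static struct ex_ld_field ex_decode_ld(uint64_t cfg)
{
	struct ex_ld_field f;
	unsigned int key_ofs = (unsigned int)(cfg & 0x3F);

	/* The sketch only understands bits [19:0]. */
	assert((cfg >> 20) == 0);

	f.nbytes  = (unsigned int)((cfg >> 16) & 0xF) + 1;
	f.hdr_ofs = (unsigned int)((cfg >> 8) & 0xFF);
	f.enabled = (int)((cfg >> 7) & 0x1);
	f.kw      = key_ofs / 8;
	f.lsb     = (key_ofs % 8) * 8;
	return f;
}
```

For the Ethertype entry, KEX_LD_CFG(0x01, 0xc, 0x1, 0x0, 0x4), this
yields 2 bytes taken from header offset 12 and placed in KW0 starting
at bit 32, in line with the "KW0[47:32]" comment in the diff.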

Signed-off-by: Santosh Shukla <sshukla@marvell.com>
Signed-off-by: Yuri Tolstov <ytolstov@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
---
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h   |  18 +++
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h    |   2 +
 .../net/ethernet/marvell/octeontx2/af/rvu_npc.c    | 155 +++++++++++++++++----
 3 files changed, 150 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index 2be2f71..9941f0a 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -165,6 +165,7 @@ M(NPC_MCAM_COUNTER_STATS, 0x600a, npc_mcam_oper_counter_req,		\
 				  npc_mcam_oper_counter_rsp)		\
 M(NPC_MCAM_ALLOC_AND_WRITE_ENTRY, 0x600b, npc_mcam_alloc_and_write_entry_req,\
 					  npc_mcam_alloc_and_write_entry_rsp)\
+M(NPC_GET_KEX_CFG,	  0x600c, msg_req, npc_get_kex_cfg_rsp)		\
 /* NIX mbox IDs (range 0x8000 - 0xFFFF) */				\
 M(NIX_LF_ALLOC,		0x8000, nix_lf_alloc_req, nix_lf_alloc_rsp)	\
 M(NIX_LF_FREE,		0x8001, msg_req, msg_rsp)			\
@@ -684,4 +685,21 @@ struct npc_mcam_alloc_and_write_entry_rsp {
 	u16 cntr;
 };
 
+struct npc_get_kex_cfg_rsp {
+	struct mbox_msghdr hdr;
+	u64 rx_keyx_cfg;   /* NPC_AF_INTF(0)_KEX_CFG */
+	u64 tx_keyx_cfg;   /* NPC_AF_INTF(1)_KEX_CFG */
+#define NPC_MAX_INTF	2
+#define NPC_MAX_LID	8
+#define NPC_MAX_LT	16
+#define NPC_MAX_LD	2
+#define NPC_MAX_LFL	16
+	/* NPC_AF_KEX_LDATA(0..1)_FLAGS_CFG */
+	u64 kex_ld_flags[NPC_MAX_LD];
+	/* NPC_AF_INTF(0..1)_LID(0..7)_LT(0..15)_LD(0..1)_CFG */
+	u64 intf_lid_lt_ld[NPC_MAX_INTF][NPC_MAX_LID][NPC_MAX_LT][NPC_MAX_LD];
+	/* NPC_AF_INTF(0..1)_LDATA(0..1)_FLAGS(0..15)_CFG */
+	u64 intf_ld_flags[NPC_MAX_INTF][NPC_MAX_LD][NPC_MAX_LFL];
+};
+
 #endif /* MBOX_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 75b6f6b..074d792 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -415,4 +415,6 @@ int rvu_mbox_handler_NPC_MCAM_COUNTER_STATS(struct rvu *rvu,
 int rvu_mbox_handler_NPC_MCAM_ALLOC_AND_WRITE_ENTRY(struct rvu *rvu,
 			  struct npc_mcam_alloc_and_write_entry_req *req,
 			  struct npc_mcam_alloc_and_write_entry_rsp *rsp);
+int rvu_mbox_handler_NPC_GET_KEX_CFG(struct rvu *rvu, struct msg_req *req,
+				     struct npc_get_kex_cfg_rsp *rsp);
 #endif /* RVU_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
index 289de15..814166e 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
@@ -427,9 +427,28 @@ void rvu_npc_install_bcast_match_entry(struct rvu *rvu, u16 pcifunc,
 	index = npc_get_nixlf_mcam_index(mcam, pcifunc,
 					 nixlf, NIXLF_BCAST_ENTRY);
 
-	/* Check for L2B bit and LMAC channel */
-	entry.kw[0] = BIT_ULL(25) | chan;
-	entry.kw_mask[0] = BIT_ULL(25) | 0xFFFULL;
+	/* Check for L2B bit and LMAC channel
+	 * NOTE: The MKEX default profile (a reduced version intended to
+	 * accommodate more capability while ignoring a few bits) is a
+	 * stop-gap approach.
+	 * Per HRM NPC_PARSE_KEX_S, L2B sits at BIT_POS[25]; here it is
+	 * moved to BIT_POS[13] by ignoring ERRCODE and ERRLEV, as we would
+	 * otherwise lose out on capability features needed for CoS (from
+	 * ODP PoV), e.g. VLAN, DSCP.
+	 *
+	 * Reduced layout of MKEX default profile -
+	 * Includes the following (i.e. CHAN, L2/3{B/M}, LA, LB, LC, LD):
+	 *
+	 * BIT_POS[31:28] : LD
+	 * BIT_POS[27:24] : LC
+	 * BIT_POS[23:20] : LB
+	 * BIT_POS[19:16] : LA
+	 * BIT_POS[15:12] : L3B, L3M, L2B, L2M
+	 * BIT_POS[11:00] : CHAN
+	 *
+	 */
+	entry.kw[0] = BIT_ULL(13) | chan;
+	entry.kw_mask[0] = ~entry.kw[0] & (BIT_ULL(13) | 0xFFFULL);
 
 	*(u64 *)&action = 0x00;
 #ifdef MCAST_MCE
@@ -538,14 +557,18 @@ void rvu_npc_disable_mcam_entries(struct rvu *rvu, u16 pcifunc, int nixlf)
 	}
 }
 
-#define LDATA_EXTRACT_CONFIG(intf, lid, ltype, ld, cfg) \
+#define SET_KEX_LD(intf, lid, ltype, ld, cfg)	\
 	rvu_write64(rvu, blkaddr,			\
 		NPC_AF_INTFX_LIDX_LTX_LDX_CFG(intf, lid, ltype, ld), cfg)
 
-#define LDATA_FLAGS_CONFIG(intf, ld, flags, cfg)	\
+#define SET_KEX_LDFLAGS(intf, ld, flags, cfg)	\
 	rvu_write64(rvu, blkaddr,			\
 		NPC_AF_INTFX_LDATAX_FLAGSX_CFG(intf, ld, flags), cfg)
 
+#define KEX_LD_CFG(bytesm1, hdr_ofs, ena, flags_ena, key_ofs)		\
+			(((bytesm1) << 16) | ((hdr_ofs) << 8) | ((ena) << 7) | \
+			 ((flags_ena) << 6) | ((key_ofs) & 0x3F))
+
 static void npc_config_ldata_extract(struct rvu *rvu, int blkaddr)
 {
 	struct npc_mcam *mcam = &rvu->hw->mcam;
@@ -561,28 +584,66 @@ static void npc_config_ldata_extract(struct rvu *rvu, int blkaddr)
 	 */
 	for (lid = 0; lid < lid_count; lid++) {
 		for (ltype = 0; ltype < 16; ltype++) {
-			LDATA_EXTRACT_CONFIG(NIX_INTF_RX, lid, ltype, 0, 0ULL);
-			LDATA_EXTRACT_CONFIG(NIX_INTF_RX, lid, ltype, 1, 0ULL);
-			LDATA_EXTRACT_CONFIG(NIX_INTF_TX, lid, ltype, 0, 0ULL);
-			LDATA_EXTRACT_CONFIG(NIX_INTF_TX, lid, ltype, 1, 0ULL);
-
-			LDATA_FLAGS_CONFIG(NIX_INTF_RX, 0, ltype, 0ULL);
-			LDATA_FLAGS_CONFIG(NIX_INTF_RX, 1, ltype, 0ULL);
-			LDATA_FLAGS_CONFIG(NIX_INTF_TX, 0, ltype, 0ULL);
-			LDATA_FLAGS_CONFIG(NIX_INTF_TX, 1, ltype, 0ULL);
+			SET_KEX_LD(NIX_INTF_RX, lid, ltype, 0, 0ULL);
+			SET_KEX_LD(NIX_INTF_RX, lid, ltype, 1, 0ULL);
+			SET_KEX_LD(NIX_INTF_TX, lid, ltype, 0, 0ULL);
+			SET_KEX_LD(NIX_INTF_TX, lid, ltype, 1, 0ULL);
+
+			SET_KEX_LDFLAGS(NIX_INTF_RX, 0, ltype, 0ULL);
+			SET_KEX_LDFLAGS(NIX_INTF_RX, 1, ltype, 0ULL);
+			SET_KEX_LDFLAGS(NIX_INTF_TX, 0, ltype, 0ULL);
+			SET_KEX_LDFLAGS(NIX_INTF_TX, 1, ltype, 0ULL);
 		}
 	}
 
-	/* If we plan to extract Outer IPv4 tuple for TCP/UDP pkts
-	 * then 112bit key is not sufficient
-	 */
 	if (mcam->keysize != NPC_MCAM_KEY_X2)
 		return;
 
-	/* Start placing extracted data/flags from 64bit onwards, for now */
-	/* Extract DMAC from the packet */
-	cfg = (0x05 << 16) | BIT_ULL(7) | NPC_PARSE_RESULT_DMAC_OFFSET;
-	LDATA_EXTRACT_CONFIG(NIX_INTF_RX, NPC_LID_LA, NPC_LT_LA_ETHER, 0, cfg);
+	/* Default MCAM KEX profile */
+	/* Layer A: Ethernet: */
+
+	/* DMAC: 6 bytes, KW1[47:0] */
+	cfg = KEX_LD_CFG(0x05, 0x0, 0x1, 0x0, NPC_PARSE_RESULT_DMAC_OFFSET);
+	SET_KEX_LD(NIX_INTF_RX, NPC_LID_LA, NPC_LT_LA_ETHER, 0, cfg);
+
+	/* Ethertype: 2 bytes, KW0[47:32] */
+	cfg = KEX_LD_CFG(0x01, 0xc, 0x1, 0x0, 0x4);
+	SET_KEX_LD(NIX_INTF_RX, NPC_LID_LA, NPC_LT_LA_ETHER, 1, cfg);
+
+	/* Layer B: Single VLAN (CTAG) */
+	/* CTAG VLAN[2..3] + Ethertype, 4 bytes, KW0[63:32] */
+	cfg = KEX_LD_CFG(0x03, 0x0, 0x1, 0x0, 0x4);
+	SET_KEX_LD(NIX_INTF_RX, NPC_LID_LB, NPC_LT_LB_CTAG, 0, cfg);
+
+	/* Layer B: Stacked VLAN (STAG|QinQ) */
+	/* CTAG VLAN[2..3] + Ethertype, 4 bytes, KW0[63:32] */
+	cfg = KEX_LD_CFG(0x03, 0x4, 0x1, 0x0, 0x4);
+	SET_KEX_LD(NIX_INTF_RX, NPC_LID_LB, NPC_LT_LB_STAG, 0, cfg);
+	SET_KEX_LD(NIX_INTF_RX, NPC_LID_LB, NPC_LT_LB_QINQ, 0, cfg);
+
+	/* Layer C: IPv4 */
+	/* SIP+DIP: 8 bytes, KW2[63:0] */
+	cfg = KEX_LD_CFG(0x07, 0xc, 0x1, 0x0, 0x10);
+	SET_KEX_LD(NIX_INTF_RX, NPC_LID_LC, NPC_LT_LC_IP, 0, cfg);
+	/* TOS: 1 byte, KW1[63:56] */
+	cfg = KEX_LD_CFG(0x0, 0x1, 0x1, 0x0, 0xf);
+	SET_KEX_LD(NIX_INTF_RX, NPC_LID_LC, NPC_LT_LC_IP, 1, cfg);
+
+	/* Layer D:UDP */
+	/* SPORT: 2 bytes, KW3[15:0] */
+	cfg = KEX_LD_CFG(0x1, 0x0, 0x1, 0x0, 0x18);
+	SET_KEX_LD(NIX_INTF_RX, NPC_LID_LD, NPC_LT_LD_UDP, 0, cfg);
+	/* DPORT: 2 bytes, KW3[31:16] */
+	cfg = KEX_LD_CFG(0x1, 0x2, 0x1, 0x0, 0x1a);
+	SET_KEX_LD(NIX_INTF_RX, NPC_LID_LD, NPC_LT_LD_UDP, 1, cfg);
+
+	/* Layer D:TCP */
+	/* SPORT: 2 bytes, KW3[15:0] */
+	cfg = KEX_LD_CFG(0x1, 0x0, 0x1, 0x0, 0x18);
+	SET_KEX_LD(NIX_INTF_RX, NPC_LID_LD, NPC_LT_LD_TCP, 0, cfg);
+	/* DPORT: 2 bytes, KW3[31:16] */
+	cfg = KEX_LD_CFG(0x1, 0x2, 0x1, 0x0, 0x1a);
+	SET_KEX_LD(NIX_INTF_RX, NPC_LID_LD, NPC_LT_LD_TCP, 1, cfg);
 }
 
 static void npc_config_kpuaction(struct rvu *rvu, int blkaddr,
@@ -898,13 +959,12 @@ int rvu_npc_init(struct rvu *rvu)
 		    BIT_ULL(6) | BIT_ULL(2));
 
 	/* Set RX and TX side MCAM search key size.
-	 * Also enable parse key extract nibbles suchthat except
-	 * layer E to H, rest of the key is included for MCAM search.
+	 * LA..LD (ltype only) + Channel
 	 */
 	rvu_write64(rvu, blkaddr, NPC_AF_INTFX_KEX_CFG(NIX_INTF_RX),
-		    ((keyz & 0x3) << 32) | ((1ULL << 20) - 1));
+			((keyz & 0x3) << 32) | 0x49247);
 	rvu_write64(rvu, blkaddr, NPC_AF_INTFX_KEX_CFG(NIX_INTF_TX),
-		    ((keyz & 0x3) << 32) | ((1ULL << 20) - 1));
+			((keyz & 0x3) << 32) | ((1ULL << 19) - 1));
 
 	err = npc_mcam_rsrcs_init(rvu, blkaddr);
 	if (err)
@@ -1876,3 +1936,48 @@ int rvu_mbox_handler_NPC_MCAM_ALLOC_AND_WRITE_ENTRY(struct rvu *rvu,
 
 	return 0;
 }
+
+#define GET_KEX_CFG(intf) \
+	rvu_read64(rvu, BLKADDR_NPC, NPC_AF_INTFX_KEX_CFG(intf))
+
+#define GET_KEX_FLAGS(ld) \
+	rvu_read64(rvu, BLKADDR_NPC, NPC_AF_KEX_LDATAX_FLAGS_CFG(ld))
+
+#define GET_KEX_LD(intf, lid, lt, ld)	\
+	rvu_read64(rvu, BLKADDR_NPC,	\
+		NPC_AF_INTFX_LIDX_LTX_LDX_CFG(intf, lid, lt, ld))
+
+#define GET_KEX_LDFLAGS(intf, ld, fl)	\
+	rvu_read64(rvu, BLKADDR_NPC,	\
+		NPC_AF_INTFX_LDATAX_FLAGSX_CFG(intf, ld, fl))
+
+int rvu_mbox_handler_NPC_GET_KEX_CFG(struct rvu *rvu, struct msg_req *req,
+				     struct npc_get_kex_cfg_rsp *rsp)
+{
+	int lid, lt, ld, fl;
+
+	rsp->rx_keyx_cfg = GET_KEX_CFG(NIX_INTF_RX);
+	rsp->tx_keyx_cfg = GET_KEX_CFG(NIX_INTF_TX);
+	for (lid = 0; lid < NPC_MAX_LID; lid++) {
+		for (lt = 0; lt < NPC_MAX_LT; lt++) {
+			for (ld = 0; ld < NPC_MAX_LD; ld++) {
+				rsp->intf_lid_lt_ld[NIX_INTF_RX][lid][lt][ld] =
+					GET_KEX_LD(NIX_INTF_RX, lid, lt, ld);
+				rsp->intf_lid_lt_ld[NIX_INTF_TX][lid][lt][ld] =
+					GET_KEX_LD(NIX_INTF_TX, lid, lt, ld);
+			}
+		}
+	}
+	for (ld = 0; ld < NPC_MAX_LD; ld++)
+		rsp->kex_ld_flags[ld] = GET_KEX_FLAGS(ld);
+
+	for (ld = 0; ld < NPC_MAX_LD; ld++) {
+		for (fl = 0; fl < NPC_MAX_LFL; fl++) {
+			rsp->intf_ld_flags[NIX_INTF_RX][ld][fl] =
+					GET_KEX_LDFLAGS(NIX_INTF_RX, ld, fl);
+			rsp->intf_ld_flags[NIX_INTF_TX][ld][fl] =
+					GET_KEX_LDFLAGS(NIX_INTF_TX, ld, fl);
+		}
+	}
+	return 0;
+}
-- 
2.7.4

* [PATCH 08/20] octeontx2-af: Alloc and config NPC MCAM entry at a time
From: sunil.kovvuri @ 2018-11-08 18:35 UTC (permalink / raw)
  To: netdev, davem; +Cc: arnd, linux-soc, Sunil Goutham
In-Reply-To: <1541702161-30673-1-git-send-email-sunil.kovvuri@gmail.com>

From: Sunil Goutham <sgoutham@marvell.com>

A new mailbox message is added to support allocating an MCAM entry
along with a counter and configuring it in one go. This reduces
the amount of mailbox communication involved in installing a new
MCAM rule.
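
The combined flow can be sketched in isolation as follows. This is only a
minimal illustration under stated assumptions, not the driver code: the
sketch_* names, the four-slot pools and the -1 error value are hypothetical
stand-ins, and the point where the real handler programs the MCAM entry is
reduced to a comment. It shows the one property that matters to the caller:
if the counter cannot be allocated, the entry allocated a moment earlier is
released again, so a failed request leaves nothing behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_POOL_SIZE 4       /* hypothetical pool size */
#define SKETCH_INVALID   0xFFFF  /* hypothetical "no counter" marker */

/* Hypothetical stand-in for an MCAM entry pool or a counter pool */
struct sketch_pool {
	bool used[SKETCH_POOL_SIZE];
};

/* Return the first free slot and mark it used, or -1 if the pool is full */
int sketch_pool_alloc(struct sketch_pool *p)
{
	int i;

	for (i = 0; i < SKETCH_POOL_SIZE; i++) {
		if (!p->used[i]) {
			p->used[i] = true;
			return i;
		}
	}
	return -1;
}

void sketch_pool_free(struct sketch_pool *p, int id)
{
	assert(id >= 0 && id < SKETCH_POOL_SIZE);
	p->used[id] = false;
}

/* Allocate an entry and, if requested, a counter in one call.
 * Returns 0 on success and -1 on failure; on failure nothing
 * remains allocated.
 */
int sketch_alloc_and_write(struct sketch_pool *entries,
			   struct sketch_pool *cntrs, bool alloc_cntr,
			   uint16_t *entry_out, uint16_t *cntr_out)
{
	int entry, cntr = -1;

	entry = sketch_pool_alloc(entries);
	if (entry < 0)
		return -1;

	if (alloc_cntr) {
		cntr = sketch_pool_alloc(cntrs);
		if (cntr < 0) {
			/* Roll back the entry allocated above */
			sketch_pool_free(entries, entry);
			return -1;
		}
	}

	/* The real handler would write the match key and action here
	 * and map the counter to the entry.
	 */
	*entry_out = (uint16_t)entry;
	*cntr_out = alloc_cntr ? (uint16_t)cntr : SKETCH_INVALID;
	return 0;
}
```

A requester that does not ask for a counter gets the hypothetical
SKETCH_INVALID value back in the counter field, mirroring how the response
in this patch carries both an entry and a counter index.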

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
---
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h   | 18 ++++++
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h    |  3 +
 .../net/ethernet/marvell/octeontx2/af/rvu_npc.c    | 72 ++++++++++++++++++++++
 3 files changed, 93 insertions(+)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index a851a0b..2be2f71 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -163,6 +163,8 @@ M(NPC_MCAM_UNMAP_COUNTER, 0x6008, npc_mcam_unmap_counter_req, msg_rsp)	\
 M(NPC_MCAM_CLEAR_COUNTER, 0x6009, npc_mcam_oper_counter_req, msg_rsp)	\
 M(NPC_MCAM_COUNTER_STATS, 0x600a, npc_mcam_oper_counter_req,		\
 				  npc_mcam_oper_counter_rsp)		\
+M(NPC_MCAM_ALLOC_AND_WRITE_ENTRY, 0x600b, npc_mcam_alloc_and_write_entry_req,\
+					  npc_mcam_alloc_and_write_entry_rsp)\
 /* NIX mbox IDs (range 0x8000 - 0xFFFF) */				\
 M(NIX_LF_ALLOC,		0x8000, nix_lf_alloc_req, nix_lf_alloc_rsp)	\
 M(NIX_LF_FREE,		0x8001, msg_req, msg_rsp)			\
@@ -666,4 +668,20 @@ struct npc_mcam_unmap_counter_req {
 	u8  all;   /* Unmap all entries using this counter ? */
 };
 
+struct npc_mcam_alloc_and_write_entry_req {
+	struct mbox_msghdr hdr;
+	struct mcam_entry entry_data;
+	u16 ref_entry;
+	u8  priority;    /* Lower or higher w.r.t ref_entry */
+	u8  intf;	 /* Rx or Tx interface */
+	u8  enable_entry;/* Enable this MCAM entry ? */
+	u8  alloc_cntr;  /* Allocate counter and map ? */
+};
+
+struct npc_mcam_alloc_and_write_entry_rsp {
+	struct mbox_msghdr hdr;
+	u16 entry;
+	u16 cntr;
+};
+
 #endif /* MBOX_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 203f441..75b6f6b 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -412,4 +412,7 @@ int rvu_mbox_handler_NPC_MCAM_UNMAP_COUNTER(struct rvu *rvu,
 int rvu_mbox_handler_NPC_MCAM_COUNTER_STATS(struct rvu *rvu,
 			struct npc_mcam_oper_counter_req *req,
 			struct npc_mcam_oper_counter_rsp *rsp);
+int rvu_mbox_handler_NPC_MCAM_ALLOC_AND_WRITE_ENTRY(struct rvu *rvu,
+			  struct npc_mcam_alloc_and_write_entry_req *req,
+			  struct npc_mcam_alloc_and_write_entry_rsp *rsp);
 #endif /* RVU_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
index 79df1d4..289de15 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
@@ -1804,3 +1804,75 @@ int rvu_mbox_handler_NPC_MCAM_COUNTER_STATS(struct rvu *rvu,
 
 	return 0;
 }
+
+int rvu_mbox_handler_NPC_MCAM_ALLOC_AND_WRITE_ENTRY(struct rvu *rvu,
+			  struct npc_mcam_alloc_and_write_entry_req *req,
+			  struct npc_mcam_alloc_and_write_entry_rsp *rsp)
+{
+	struct npc_mcam_alloc_counter_req cntr_req;
+	struct npc_mcam_alloc_counter_rsp cntr_rsp;
+	struct npc_mcam_alloc_entry_req entry_req;
+	struct npc_mcam_alloc_entry_rsp entry_rsp;
+	struct npc_mcam *mcam = &rvu->hw->mcam;
+	u16 entry = NPC_MCAM_ENTRY_INVALID;
+	u16 cntr = NPC_MCAM_ENTRY_INVALID;
+	int blkaddr, rc;
+
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
+	if (blkaddr < 0)
+		return NPC_MCAM_INVALID_REQ;
+
+	if (req->intf != NIX_INTF_RX && req->intf != NIX_INTF_TX)
+		return NPC_MCAM_INVALID_REQ;
+
+	/* Try to allocate a MCAM entry */
+	entry_req.hdr.pcifunc = req->hdr.pcifunc;
+	entry_req.contig = true;
+	entry_req.priority = req->priority;
+	entry_req.ref_entry = req->ref_entry;
+	entry_req.count = 1;
+
+	rc = rvu_mbox_handler_NPC_MCAM_ALLOC_ENTRY(rvu,
+						   &entry_req, &entry_rsp);
+	if (rc)
+		return rc;
+
+	if (!entry_rsp.count)
+		return NPC_MCAM_ALLOC_FAILED;
+
+	entry = entry_rsp.entry;
+
+	if (!req->alloc_cntr)
+		goto write_entry;
+
+	/* Now allocate counter */
+	cntr_req.hdr.pcifunc = req->hdr.pcifunc;
+	cntr_req.contig = true;
+	cntr_req.count = 1;
+
+	rc = rvu_mbox_handler_NPC_MCAM_ALLOC_COUNTER(rvu, &cntr_req, &cntr_rsp);
+	if (rc) {
+		/* Free allocated MCAM entry */
+		mutex_lock(&mcam->lock);
+		mcam->entry2pfvf_map[entry] = 0;
+		npc_mcam_clear_bit(mcam, entry);
+		mutex_unlock(&mcam->lock);
+		return rc;
+	}
+
+	cntr = cntr_rsp.cntr;
+
+write_entry:
+	mutex_lock(&mcam->lock);
+	npc_config_mcam_entry(rvu, mcam, blkaddr, entry, req->intf,
+			      &req->entry_data, req->enable_entry);
+
+	if (req->alloc_cntr)
+		npc_map_mcam_entry_and_cntr(rvu, mcam, blkaddr, entry, cntr);
+	mutex_unlock(&mcam->lock);
+
+	rsp->entry = entry;
+	rsp->cntr = cntr;
+
+	return 0;
+}
-- 
2.7.4

* [PATCH 07/20] octeontx2-af: Map or unmap NPC MCAM entry and counter
From: sunil.kovvuri @ 2018-11-08 18:35 UTC (permalink / raw)
  To: netdev, davem; +Cc: arnd, linux-soc, Sunil Goutham
In-Reply-To: <1541702161-30673-1-git-send-email-sunil.kovvuri@gmail.com>

From: Sunil Goutham <sgoutham@marvell.com>

Allocate memory to save the MCAM 'entry to counter' mapping. Since
multiple entries can map to the same counter, also track each
counter's reference count.

The 'entry to counter' mapping is set up when an entry is being
installed and the mbox msg sender has requested a counter to be
configured as well. The mapping is removed when an entry or counter
is being freed, or when an explicit mbox msg is received to unmap them.
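
The bookkeeping described above can be sketched on its own as follows. This
is a simplified model under stated assumptions, not the driver code: the
sketch_* names and the table sizes are hypothetical, SKETCH_INVALID stands
in for NPC_MCAM_INVALID_MAP, and the hardware register writes that enable
or disable stats are left out. It shows the two tables involved (a per-entry
counter index and a per-counter reference count) and an "unmap all" scan
that stops as soon as the reference count drops to zero.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ENTRIES   8       /* hypothetical number of MCAM entries */
#define SKETCH_COUNTERS  4       /* hypothetical number of counters */
#define SKETCH_INVALID   0xFFFF  /* stands in for NPC_MCAM_INVALID_MAP */

struct sketch_mcam {
	uint16_t entry2cntr[SKETCH_ENTRIES];   /* counter used by each entry */
	uint16_t cntr_refcnt[SKETCH_COUNTERS]; /* entries using each counter */
};

void sketch_init(struct sketch_mcam *m)
{
	int i;

	for (i = 0; i < SKETCH_ENTRIES; i++)
		m->entry2cntr[i] = SKETCH_INVALID;
	for (i = 0; i < SKETCH_COUNTERS; i++)
		m->cntr_refcnt[i] = 0;
}

/* Set mapping and increment the counter's refcnt */
void sketch_map(struct sketch_mcam *m, uint16_t entry, uint16_t cntr)
{
	assert(entry < SKETCH_ENTRIES && cntr < SKETCH_COUNTERS);
	m->entry2cntr[entry] = cntr;
	m->cntr_refcnt[cntr]++;
}

/* Remove mapping, if any, and decrement the counter's refcnt */
void sketch_unmap(struct sketch_mcam *m, uint16_t entry)
{
	uint16_t cntr;

	assert(entry < SKETCH_ENTRIES);
	cntr = m->entry2cntr[entry];
	if (cntr == SKETCH_INVALID)
		return;
	m->entry2cntr[entry] = SKETCH_INVALID;
	m->cntr_refcnt[cntr]--;
}

/* Unmap every entry that uses 'cntr'; returns how many were unmapped.
 * The scan moves to the next entry on every iteration and ends early
 * once no entry references the counter any more.
 */
int sketch_unmap_all(struct sketch_mcam *m, uint16_t cntr)
{
	int entry, unmapped = 0;

	assert(cntr < SKETCH_COUNTERS);
	for (entry = 0; entry < SKETCH_ENTRIES && m->cntr_refcnt[cntr];
	     entry++) {
		if (m->entry2cntr[entry] != cntr)
			continue;
		sketch_unmap(m, (uint16_t)entry);
		unmapped++;
	}
	return unmapped;
}
```

Keeping a reference count per counter lets the scan terminate without
walking the whole entry table when only a few entries share the counter.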

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
---
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h   |   8 +
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h    |   4 +
 .../net/ethernet/marvell/octeontx2/af/rvu_npc.c    | 183 ++++++++++++++++++++-
 3 files changed, 191 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index 9fa491e..a851a0b 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -159,6 +159,7 @@ M(NPC_MCAM_SHIFT_ENTRY, 0x6005, npc_mcam_shift_entry_req,		\
 M(NPC_MCAM_ALLOC_COUNTER, 0x6006, npc_mcam_alloc_counter_req,		\
 				  npc_mcam_alloc_counter_rsp)		\
 M(NPC_MCAM_FREE_COUNTER,  0x6007, npc_mcam_oper_counter_req, msg_rsp)	\
+M(NPC_MCAM_UNMAP_COUNTER, 0x6008, npc_mcam_unmap_counter_req, msg_rsp)	\
 M(NPC_MCAM_CLEAR_COUNTER, 0x6009, npc_mcam_oper_counter_req, msg_rsp)	\
 M(NPC_MCAM_COUNTER_STATS, 0x600a, npc_mcam_oper_counter_req,		\
 				  npc_mcam_oper_counter_rsp)		\
@@ -658,4 +659,11 @@ struct npc_mcam_oper_counter_rsp {
 	u64 stat;  /* valid only while fetching counter's stats */
 };
 
+struct npc_mcam_unmap_counter_req {
+	struct mbox_msghdr hdr;
+	u16 cntr;
+	u16 entry; /* Entry and counter to be unmapped */
+	u8  all;   /* Unmap all entries using this counter ? */
+};
+
 #endif /* MBOX_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 4e10fe9..203f441 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -81,7 +81,9 @@ struct npc_mcam {
 	u16	bmap_entries;	/* Number of unreserved MCAM entries */
 	u16	bmap_fcnt;	/* MCAM entries free count */
 	u16	*entry2pfvf_map;
+	u16	*entry2cntr_map;
 	u16	*cntr2pfvf_map;
+	u16	*cntr_refcnt;
 	u8	keysize;	/* MCAM keysize 112/224/448 bits */
 	u8	banks;		/* Number of MCAM banks */
 	u8	banks_per_entry;/* Number of keywords in key */
@@ -405,6 +407,8 @@ int rvu_mbox_handler_NPC_MCAM_FREE_COUNTER(struct rvu *rvu,
 		   struct npc_mcam_oper_counter_req *req, struct msg_rsp *rsp);
 int rvu_mbox_handler_NPC_MCAM_CLEAR_COUNTER(struct rvu *rvu,
 		struct npc_mcam_oper_counter_req *req, struct msg_rsp *rsp);
+int rvu_mbox_handler_NPC_MCAM_UNMAP_COUNTER(struct rvu *rvu,
+		struct npc_mcam_unmap_counter_req *req, struct msg_rsp *rsp);
 int rvu_mbox_handler_NPC_MCAM_COUNTER_STATS(struct rvu *rvu,
 			struct npc_mcam_oper_counter_req *req,
 			struct npc_mcam_oper_counter_rsp *rsp);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
index abc62f2..79df1d4 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
@@ -28,6 +28,8 @@
 
 static void npc_mcam_free_all_entries(struct rvu *rvu, struct npc_mcam *mcam,
 				      int blkaddr, u16 pcifunc);
+static void npc_mcam_free_all_counters(struct rvu *rvu, struct npc_mcam *mcam,
+				       u16 pcifunc);
 
 void rvu_npc_set_pkind(struct rvu *rvu, int pkind, struct rvu_pfvf *pfvf)
 {
@@ -506,6 +508,9 @@ void rvu_npc_disable_mcam_entries(struct rvu *rvu, u16 pcifunc, int nixlf)
 	/* Disable and free all MCAM entries mapped to this 'pcifunc' */
 	npc_mcam_free_all_entries(rvu, mcam, blkaddr, pcifunc);
 
+	/* Free all MCAM counters mapped to this 'pcifunc' */
+	npc_mcam_free_all_counters(rvu, mcam, pcifunc);
+
 	mutex_unlock(&mcam->lock);
 
 	/* Disable ucast MCAM match entry of this PF/VF */
@@ -819,6 +824,19 @@ static int npc_mcam_rsrcs_init(struct rvu *rvu, int blkaddr)
 	if (!mcam->cntr2pfvf_map)
 		goto free_mem;
 
+	/* Alloc memory for MCAM entry to counter mapping and for tracking
+	 * counter's reference count.
+	 */
+	mcam->entry2cntr_map = devm_kcalloc(rvu->dev, mcam->bmap_entries,
+					    sizeof(u16), GFP_KERNEL);
+	if (!mcam->entry2cntr_map)
+		goto free_mem;
+
+	mcam->cntr_refcnt = devm_kcalloc(rvu->dev, mcam->counters.max,
+					 sizeof(u16), GFP_KERNEL);
+	if (!mcam->cntr_refcnt)
+		goto free_mem;
+
 	mutex_init(&mcam->lock);
 
 	return 0;
@@ -948,6 +966,36 @@ static int npc_mcam_verify_counter(struct npc_mcam *mcam,
 	return 0;
 }
 
+static void npc_map_mcam_entry_and_cntr(struct rvu *rvu, struct npc_mcam *mcam,
+					int blkaddr, u16 entry, u16 cntr)
+{
+	u16 index = entry & (mcam->banksize - 1);
+	u16 bank = npc_get_bank(mcam, entry);
+
+	/* Set mapping and increment counter's refcnt */
+	mcam->entry2cntr_map[entry] = cntr;
+	mcam->cntr_refcnt[cntr]++;
+	/* Enable stats */
+	rvu_write64(rvu, blkaddr,
+		    NPC_AF_MCAMEX_BANKX_STAT_ACT(index, bank),
+		    BIT_ULL(9) | cntr);
+}
+
+static void npc_unmap_mcam_entry_and_cntr(struct rvu *rvu,
+					  struct npc_mcam *mcam,
+					  int blkaddr, u16 entry, u16 cntr)
+{
+	u16 index = entry & (mcam->banksize - 1);
+	u16 bank = npc_get_bank(mcam, entry);
+
+	/* Remove mapping and reduce counter's refcnt */
+	mcam->entry2cntr_map[entry] = NPC_MCAM_INVALID_MAP;
+	mcam->cntr_refcnt[cntr]--;
+	/* Disable stats */
+	rvu_write64(rvu, blkaddr,
+		    NPC_AF_MCAMEX_BANKX_STAT_ACT(index, bank), 0x00);
+}
+
 /* Sets MCAM entry in bitmap as used. Update
  * reverse bitmap too. Should be called with
  * 'mcam->lock' held.
@@ -983,7 +1031,7 @@ static void npc_mcam_clear_bit(struct npc_mcam *mcam, u16 index)
 static void npc_mcam_free_all_entries(struct rvu *rvu, struct npc_mcam *mcam,
 				      int blkaddr, u16 pcifunc)
 {
-	u16 index;
+	u16 index, cntr;
 
 	/* Scan all MCAM entries and free the ones mapped to 'pcifunc' */
 	for (index = 0; index < mcam->bmap_entries; index++) {
@@ -993,6 +1041,33 @@ static void npc_mcam_free_all_entries(struct rvu *rvu, struct npc_mcam *mcam,
 			npc_mcam_clear_bit(mcam, index);
 			/* Disable the entry */
 			npc_enable_mcam_entry(rvu, mcam, blkaddr, index, false);
+
+			/* Update entry2counter mapping */
+			cntr = mcam->entry2cntr_map[index];
+			if (cntr != NPC_MCAM_INVALID_MAP)
+				npc_unmap_mcam_entry_and_cntr(rvu, mcam,
+							      blkaddr, index,
+							      cntr);
+		}
+	}
+}
+
+static void npc_mcam_free_all_counters(struct rvu *rvu, struct npc_mcam *mcam,
+				       u16 pcifunc)
+{
+	u16 cntr;
+
+	/* Scan all MCAM counters and free the ones mapped to 'pcifunc' */
+	for (cntr = 0; cntr < mcam->counters.max; cntr++) {
+		if (mcam->cntr2pfvf_map[cntr] == pcifunc) {
+			mcam->cntr2pfvf_map[cntr] = NPC_MCAM_INVALID_MAP;
+			mcam->cntr_refcnt[cntr] = 0;
+			rvu_free_rsrc(&mcam->counters, cntr);
+			/* This API is expected to be called after freeing
+			 * MCAM entries, which in turn will remove
+			 * 'entry to counter' mapping.
+			 * No need to do it again.
+			 */
 		}
 	}
 }
@@ -1282,6 +1357,7 @@ static int npc_mcam_alloc_entries(struct npc_mcam *mcam, u16 pcifunc,
 			(rsp->entry + entry) : rsp->entry_list[entry];
 		npc_mcam_set_bit(mcam, index);
 		mcam->entry2pfvf_map[index] = pcifunc;
+		mcam->entry2cntr_map[index] = NPC_MCAM_INVALID_MAP;
 	}
 
 	/* Update available free count in mbox response */
@@ -1338,6 +1414,7 @@ int rvu_mbox_handler_NPC_MCAM_FREE_ENTRY(struct rvu *rvu,
 	struct npc_mcam *mcam = &rvu->hw->mcam;
 	u16 pcifunc = req->hdr.pcifunc;
 	int blkaddr, rc = 0;
+	u16 cntr;
 
 	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
 	if (blkaddr < 0)
@@ -1360,6 +1437,12 @@ int rvu_mbox_handler_NPC_MCAM_FREE_ENTRY(struct rvu *rvu,
 	npc_mcam_clear_bit(mcam, req->entry);
 	npc_enable_mcam_entry(rvu, mcam, blkaddr, req->entry, false);
 
+	/* Update entry2counter mapping */
+	cntr = mcam->entry2cntr_map[req->entry];
+	if (cntr != NPC_MCAM_INVALID_MAP)
+		npc_unmap_mcam_entry_and_cntr(rvu, mcam, blkaddr,
+					      req->entry, cntr);
+
 	goto exit;
 
 free_all:
@@ -1387,6 +1470,12 @@ int rvu_mbox_handler_NPC_MCAM_WRITE_ENTRY(struct rvu *rvu,
 	if (rc)
 		goto exit;
 
+	if (req->set_cntr &&
+	    npc_mcam_verify_counter(mcam, pcifunc, req->cntr)) {
+		rc = NPC_MCAM_INVALID_REQ;
+		goto exit;
+	}
+
 	if (req->intf != NIX_INTF_RX && req->intf != NIX_INTF_TX) {
 		rc = NPC_MCAM_INVALID_REQ;
 		goto exit;
@@ -1395,6 +1484,10 @@ int rvu_mbox_handler_NPC_MCAM_WRITE_ENTRY(struct rvu *rvu,
 	npc_config_mcam_entry(rvu, mcam, blkaddr, req->entry, req->intf,
 			      &req->entry_data, req->enable_entry);
 
+	if (req->set_cntr)
+		npc_map_mcam_entry_and_cntr(rvu, mcam, blkaddr,
+					    req->entry, req->cntr);
+
 	rc = 0;
 exit:
 	mutex_unlock(&mcam->lock);
@@ -1454,8 +1547,8 @@ int rvu_mbox_handler_NPC_MCAM_SHIFT_ENTRY(struct rvu *rvu,
 	struct npc_mcam *mcam = &rvu->hw->mcam;
 	u16 pcifunc = req->hdr.pcifunc;
 	u16 old_entry, new_entry;
+	u16 index, cntr;
 	int blkaddr, rc;
-	u16 index;
 
 	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
 	if (blkaddr < 0)
@@ -1480,12 +1573,27 @@ int rvu_mbox_handler_NPC_MCAM_SHIFT_ENTRY(struct rvu *rvu,
 		if (rc)
 			break;
 
+		/* new_entry should not have a counter mapped */
+		if (mcam->entry2cntr_map[new_entry] != NPC_MCAM_INVALID_MAP) {
+			rc = NPC_MCAM_PERM_DENIED;
+			break;
+		}
+
 		/* Disable the new_entry */
 		npc_enable_mcam_entry(rvu, mcam, blkaddr, new_entry, false);
 
 		/* Copy rule from old entry to new entry */
 		npc_copy_mcam_entry(rvu, mcam, blkaddr, old_entry, new_entry);
 
+		/* Copy counter mapping, if any */
+		cntr = mcam->entry2cntr_map[old_entry];
+		if (cntr != NPC_MCAM_INVALID_MAP) {
+			npc_unmap_mcam_entry_and_cntr(rvu, mcam, blkaddr,
+						      old_entry, cntr);
+			npc_map_mcam_entry_and_cntr(rvu, mcam, blkaddr,
+						    new_entry, cntr);
+		}
+
 		/* Enable new_entry and disable old_entry */
 		npc_enable_mcam_entry(rvu, mcam, blkaddr, new_entry, true);
 		npc_enable_mcam_entry(rvu, mcam, blkaddr, old_entry, false);
@@ -1569,7 +1677,12 @@ int rvu_mbox_handler_NPC_MCAM_FREE_COUNTER(struct rvu *rvu,
 		struct npc_mcam_oper_counter_req *req, struct msg_rsp *rsp)
 {
 	struct npc_mcam *mcam = &rvu->hw->mcam;
-	int err;
+	u16 index, entry = 0;
+	int blkaddr, err;
+
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
+	if (blkaddr < 0)
+		return NPC_MCAM_INVALID_REQ;
 
 	mutex_lock(&mcam->lock);
 	err = npc_mcam_verify_counter(mcam, req->hdr.pcifunc, req->cntr);
@@ -1581,11 +1694,73 @@ int rvu_mbox_handler_NPC_MCAM_FREE_COUNTER(struct rvu *rvu,
 	/* Mark counter as free/unused */
 	mcam->cntr2pfvf_map[req->cntr] = NPC_MCAM_INVALID_MAP;
 	rvu_free_rsrc(&mcam->counters, req->cntr);
-	mutex_unlock(&mcam->lock);
 
+	/* Disable all MCAM entry's stats which are using this counter */
+	while (entry < mcam->bmap_entries) {
+		if (!mcam->cntr_refcnt[req->cntr])
+			break;
+
+		index = find_next_bit(mcam->bmap, mcam->bmap_entries, entry);
+		if (index >= mcam->bmap_entries)
+			break;
+		entry = index + 1;
+		if (mcam->entry2cntr_map[index] != req->cntr)
+			continue;
+
+		npc_unmap_mcam_entry_and_cntr(rvu, mcam, blkaddr,
+					      index, req->cntr);
+	}
+
+	mutex_unlock(&mcam->lock);
 	return 0;
 }
 
+int rvu_mbox_handler_NPC_MCAM_UNMAP_COUNTER(struct rvu *rvu,
+		struct npc_mcam_unmap_counter_req *req, struct msg_rsp *rsp)
+{
+	struct npc_mcam *mcam = &rvu->hw->mcam;
+	u16 index, entry = 0;
+	int blkaddr, rc;
+
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
+	if (blkaddr < 0)
+		return NPC_MCAM_INVALID_REQ;
+
+	mutex_lock(&mcam->lock);
+	rc = npc_mcam_verify_counter(mcam, req->hdr.pcifunc, req->cntr);
+	if (rc)
+		goto exit;
+
+	/* Unmap the MCAM entry and counter */
+	if (!req->all) {
+		rc = npc_mcam_verify_entry(mcam, req->hdr.pcifunc, req->entry);
+		if (rc)
+			goto exit;
+		npc_unmap_mcam_entry_and_cntr(rvu, mcam, blkaddr,
+					      req->entry, req->cntr);
+		goto exit;
+	}
+
+	/* Disable all MCAM entry's stats which are using this counter */
+	while (entry < mcam->bmap_entries) {
+		if (!mcam->cntr_refcnt[req->cntr])
+			break;
+
+		index = find_next_bit(mcam->bmap, mcam->bmap_entries, entry);
+		if (index >= mcam->bmap_entries)
+			break;
+		entry = index + 1;
+		if (mcam->entry2cntr_map[index] != req->cntr)
+			continue;
+
+		npc_unmap_mcam_entry_and_cntr(rvu, mcam, blkaddr,
+					      index, req->cntr);
+	}
+exit:
+	mutex_unlock(&mcam->lock);
+	return rc;
+}
+
 int rvu_mbox_handler_NPC_MCAM_CLEAR_COUNTER(struct rvu *rvu,
 		struct npc_mcam_oper_counter_req *req, struct msg_rsp *rsp)
 {
-- 
2.7.4

* [PATCH 06/20] octeontx2-af: Support for NPC MCAM counters
From: sunil.kovvuri @ 2018-11-08 18:35 UTC (permalink / raw)
  To: netdev, davem; +Cc: arnd, linux-soc, Sunil Goutham
In-Reply-To: <1541702161-30673-1-git-send-email-sunil.kovvuri@gmail.com>

From: Sunil Goutham <sgoutham@marvell.com>

NPC HW has counters which can be mapped to MCAM
entries to gather entry match statistics. This
patch adds support to allocate, free, clear and retrieve
stats of NPC MCAM counters. New mailbox messages have
been added for this. Similar to MCAM entries, both
contiguous and non-contiguous counter allocation is
supported.
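
Contiguous allocation with a "best effort" fallback can be sketched as
follows. This is an illustration under stated assumptions only: the
sketch_* helpers and the bool array are hypothetical and do not reproduce
the bitmap helpers used by the driver. The idea shown is that a request for
N contiguous counters returns the first free run of N, and if no such run
exists it falls back to the longest free run found, reporting how many
counters were actually granted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Find the first run of 'want' free slots. If none exists, report the
 * longest free run instead. Returns the start index of the chosen run
 * and stores its length in '*got' (0 if nothing is free).
 */
uint16_t sketch_find_free_run(const bool *used, uint16_t max,
			      uint16_t want, uint16_t *got)
{
	uint16_t best_start = 0, best_len = 0;
	uint16_t i = 0;

	while (i < max) {
		uint16_t start, len = 0;

		if (used[i]) {
			i++;
			continue;
		}

		/* Measure this free run, but no further than 'want' */
		start = i;
		while (i < max && !used[i] && len < want) {
			i++;
			len++;
		}

		if (len > best_len) {
			best_start = start;
			best_len = len;
		}
		if (len == want)
			break;
	}

	*got = best_len;
	return best_start;
}

/* Mark the chosen run as used; returns the number of slots granted */
uint16_t sketch_alloc_contig(bool *used, uint16_t max, uint16_t want,
			     uint16_t *start)
{
	uint16_t got, i;

	*start = sketch_find_free_run(used, max, want, &got);
	for (i = *start; i < *start + got; i++)
		used[i] = true;
	return got;
}
```

The caller compares the granted count with the requested count to find out
whether the request was fully satisfied.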

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
---
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h   |  32 +++++
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h    |  10 ++
 .../net/ethernet/marvell/octeontx2/af/rvu_npc.c    | 144 +++++++++++++++++++++
 3 files changed, 186 insertions(+)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index 76b666b..9fa491e 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -156,6 +156,12 @@ M(NPC_MCAM_ENA_ENTRY,   0x6003, npc_mcam_ena_dis_entry_req, msg_rsp)	\
 M(NPC_MCAM_DIS_ENTRY,   0x6004, npc_mcam_ena_dis_entry_req, msg_rsp)	\
 M(NPC_MCAM_SHIFT_ENTRY, 0x6005, npc_mcam_shift_entry_req,		\
 				npc_mcam_shift_entry_rsp)		\
+M(NPC_MCAM_ALLOC_COUNTER, 0x6006, npc_mcam_alloc_counter_req,		\
+				  npc_mcam_alloc_counter_rsp)		\
+M(NPC_MCAM_FREE_COUNTER,  0x6007, npc_mcam_oper_counter_req, msg_rsp)	\
+M(NPC_MCAM_CLEAR_COUNTER, 0x6009, npc_mcam_oper_counter_req, msg_rsp)	\
+M(NPC_MCAM_COUNTER_STATS, 0x600a, npc_mcam_oper_counter_req,		\
+				  npc_mcam_oper_counter_rsp)		\
 /* NIX mbox IDs (range 0x8000 - 0xFFFF) */				\
 M(NIX_LF_ALLOC,		0x8000, nix_lf_alloc_req, nix_lf_alloc_rsp)	\
 M(NIX_LF_FREE,		0x8001, msg_req, msg_rsp)			\
@@ -626,4 +632,30 @@ struct npc_mcam_shift_entry_rsp {
 	u16 failed_entry_idx; /* Index in 'curr_entry', not entry itself */
 };
 
+struct npc_mcam_alloc_counter_req {
+	struct mbox_msghdr hdr;
+	u8  contig;	/* Contiguous counters ? */
+#define NPC_MAX_NONCONTIG_COUNTERS       64
+	u16 count;	/* Number of counters requested */
+};
+
+struct npc_mcam_alloc_counter_rsp {
+	struct mbox_msghdr hdr;
+	u16 cntr;   /* Counter allocated or start index if contiguous.
+		     * Invalid in case of non-contiguous.
+		     */
+	u16 count;  /* Number of counters allocated */
+	u16 cntr_list[NPC_MAX_NONCONTIG_COUNTERS];
+};
+
+struct npc_mcam_oper_counter_req {
+	struct mbox_msghdr hdr;
+	u16 cntr;   /* Free a counter or clear/fetch its stats */
+};
+
+struct npc_mcam_oper_counter_rsp {
+	struct mbox_msghdr hdr;
+	u64 stat;  /* valid only while fetching counter's stats */
+};
+
 #endif /* MBOX_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index b649c9c..4e10fe9 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -398,4 +398,14 @@ int rvu_mbox_handler_NPC_MCAM_DIS_ENTRY(struct rvu *rvu,
 int rvu_mbox_handler_NPC_MCAM_SHIFT_ENTRY(struct rvu *rvu,
 					  struct npc_mcam_shift_entry_req *req,
 					  struct npc_mcam_shift_entry_rsp *rsp);
+int rvu_mbox_handler_NPC_MCAM_ALLOC_COUNTER(struct rvu *rvu,
+				struct npc_mcam_alloc_counter_req *req,
+				struct npc_mcam_alloc_counter_rsp *rsp);
+int rvu_mbox_handler_NPC_MCAM_FREE_COUNTER(struct rvu *rvu,
+		   struct npc_mcam_oper_counter_req *req, struct msg_rsp *rsp);
+int rvu_mbox_handler_NPC_MCAM_CLEAR_COUNTER(struct rvu *rvu,
+		struct npc_mcam_oper_counter_req *req, struct msg_rsp *rsp);
+int rvu_mbox_handler_NPC_MCAM_COUNTER_STATS(struct rvu *rvu,
+			struct npc_mcam_oper_counter_req *req,
+			struct npc_mcam_oper_counter_rsp *rsp);
 #endif /* RVU_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
index af0d8a1..abc62f2 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
@@ -933,6 +933,21 @@ static int npc_mcam_verify_entry(struct npc_mcam *mcam,
 	return 0;
 }
 
+static int npc_mcam_verify_counter(struct npc_mcam *mcam,
+				   u16 pcifunc, int cntr)
+{
+	/* Verify if counter is valid and if it is indeed
+	 * allocated to the requesting PFFUNC.
+	 */
+	if (cntr >= mcam->counters.max)
+		return NPC_MCAM_INVALID_REQ;
+
+	if (pcifunc != mcam->cntr2pfvf_map[cntr])
+		return NPC_MCAM_PERM_DENIED;
+
+	return 0;
+}
+
 /* Sets MCAM entry in bitmap as used. Update
  * reverse bitmap too. Should be called with
  * 'mcam->lock' held.
@@ -1485,3 +1500,132 @@ int rvu_mbox_handler_NPC_MCAM_SHIFT_ENTRY(struct rvu *rvu,
 	mutex_unlock(&mcam->lock);
 	return rc;
 }
+
+int rvu_mbox_handler_NPC_MCAM_ALLOC_COUNTER(struct rvu *rvu,
+			struct npc_mcam_alloc_counter_req *req,
+			struct npc_mcam_alloc_counter_rsp *rsp)
+{
+	struct npc_mcam *mcam = &rvu->hw->mcam;
+	u16 pcifunc = req->hdr.pcifunc;
+	u16 max_contig, cntr;
+	int blkaddr, index;
+
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
+	if (blkaddr < 0)
+		return NPC_MCAM_INVALID_REQ;
+
+	/* If the request is from a PFFUNC with no NIXLF attached, ignore */
+	if (!is_nixlf_attached(rvu, pcifunc))
+		return NPC_MCAM_INVALID_REQ;
+
+	/* Since list of allocated counter IDs needs to be sent to requester,
+	 * max number of non-contiguous counters per mbox msg is limited.
+	 */
+	if (!req->contig && req->count > NPC_MAX_NONCONTIG_COUNTERS)
+		return NPC_MCAM_INVALID_REQ;
+
+	mutex_lock(&mcam->lock);
+
+	/* Check if unused counters are available or not */
+	if (!rvu_rsrc_free_count(&mcam->counters)) {
+		mutex_unlock(&mcam->lock);
+		return NPC_MCAM_ALLOC_FAILED;
+	}
+
+	rsp->count = 0;
+
+	if (req->contig) {
+		/* Allocate requested number of contiguous counters, if
+		 * unsuccessful find max contiguous entries available.
+		 */
+		index = npc_mcam_find_zero_area(mcam->counters.bmap,
+						mcam->counters.max, 0,
+						req->count, &max_contig);
+		rsp->count = max_contig;
+		rsp->cntr = index;
+		for (cntr = index; cntr < (index + max_contig); cntr++) {
+			__set_bit(cntr, mcam->counters.bmap);
+			mcam->cntr2pfvf_map[cntr] = pcifunc;
+		}
+	} else {
+		/* Allocate requested number of non-contiguous counters,
+		 * if unsuccessful allocate as many as possible.
+		 */
+		for (cntr = 0; cntr < req->count; cntr++) {
+			index = rvu_alloc_rsrc(&mcam->counters);
+			if (index < 0)
+				break;
+			rsp->cntr_list[cntr] = index;
+			rsp->count++;
+			mcam->cntr2pfvf_map[index] = pcifunc;
+		}
+	}
+
+	mutex_unlock(&mcam->lock);
+	return 0;
+}
+
+int rvu_mbox_handler_NPC_MCAM_FREE_COUNTER(struct rvu *rvu,
+		struct npc_mcam_oper_counter_req *req, struct msg_rsp *rsp)
+{
+	struct npc_mcam *mcam = &rvu->hw->mcam;
+	int err;
+
+	mutex_lock(&mcam->lock);
+	err = npc_mcam_verify_counter(mcam, req->hdr.pcifunc, req->cntr);
+	if (err) {
+		mutex_unlock(&mcam->lock);
+		return err;
+	}
+
+	/* Mark counter as free/unused */
+	mcam->cntr2pfvf_map[req->cntr] = NPC_MCAM_INVALID_MAP;
+	rvu_free_rsrc(&mcam->counters, req->cntr);
+	mutex_unlock(&mcam->lock);
+
+	return 0;
+}
+
+int rvu_mbox_handler_NPC_MCAM_CLEAR_COUNTER(struct rvu *rvu,
+		struct npc_mcam_oper_counter_req *req, struct msg_rsp *rsp)
+{
+	struct npc_mcam *mcam = &rvu->hw->mcam;
+	int blkaddr, err;
+
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
+	if (blkaddr < 0)
+		return NPC_MCAM_INVALID_REQ;
+
+	mutex_lock(&mcam->lock);
+	err = npc_mcam_verify_counter(mcam, req->hdr.pcifunc, req->cntr);
+	mutex_unlock(&mcam->lock);
+	if (err)
+		return err;
+
+	rvu_write64(rvu, blkaddr, NPC_AF_MATCH_STATX(req->cntr), 0x00);
+
+	return 0;
+}
+
+int rvu_mbox_handler_NPC_MCAM_COUNTER_STATS(struct rvu *rvu,
+			struct npc_mcam_oper_counter_req *req,
+			struct npc_mcam_oper_counter_rsp *rsp)
+{
+	struct npc_mcam *mcam = &rvu->hw->mcam;
+	int blkaddr, err;
+
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
+	if (blkaddr < 0)
+		return NPC_MCAM_INVALID_REQ;
+
+	mutex_lock(&mcam->lock);
+	err = npc_mcam_verify_counter(mcam, req->hdr.pcifunc, req->cntr);
+	mutex_unlock(&mcam->lock);
+	if (err)
+		return err;
+
+	rsp->stat = rvu_read64(rvu, blkaddr, NPC_AF_MATCH_STATX(req->cntr));
+	rsp->stat &= BIT_ULL(48) - 1;
+
+	return 0;
+}
-- 
2.7.4

* [PATCH 05/20] octeontx2-af: MCAM entry installation support
From: sunil.kovvuri @ 2018-11-08 18:35 UTC (permalink / raw)
  To: netdev, davem; +Cc: arnd, linux-soc, Sunil Goutham
In-Reply-To: <1541702161-30673-1-git-send-email-sunil.kovvuri@gmail.com>

From: Sunil Goutham <sgoutham@marvell.com>

Add support for an RVU PF/VF to enable, disable, configure
and shuffle MCAM entries via mbox commands. This patch adds
mailbox message formats and handling of these commands.

As of now, other than the MCAM entry index, info such as the
channel number in the MCAM config data sent by a PF/VF is
not validated.

Also, a max of 64 MCAM entries can be shuffled with a single
mbox command.
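
How a batched shuffle request might be processed can be sketched as
follows. This is a simplified model under stated assumptions, not the
handler from this patch: the sketch_* names, the 16-entry table, the
ownership array and the -1 error value are all hypothetical. It shows a
request carrying up to 64 (source, destination) pairs, processed in order,
where the first pair that fails validation stops the batch and its position
in the request is reported back.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_SHIFTS 64  /* mirrors the 64-entry limit per request */
#define SKETCH_ENTRIES    16  /* hypothetical number of MCAM entries */

/* Move rules from src[i] to dst[i] for i in [0, count).
 * 'owner[e]' is the id of the function that owns entry e; both entries
 * of a pair must belong to 'requester'. Returns 0 on success. On a
 * validation failure returns -1 and stores the index of the failing
 * pair (an index into src/dst, not an entry number) in '*failed_idx'.
 * Pairs processed before the failure stay moved.
 */
int sketch_shift_entries(uint16_t *rule, const uint16_t *owner,
			 uint16_t requester, const uint16_t *src,
			 const uint16_t *dst, uint16_t count,
			 uint16_t *failed_idx)
{
	uint16_t i;

	if (count > SKETCH_MAX_SHIFTS)
		return -1;

	for (i = 0; i < count; i++) {
		if (src[i] >= SKETCH_ENTRIES || dst[i] >= SKETCH_ENTRIES ||
		    owner[src[i]] != requester ||
		    owner[dst[i]] != requester) {
			*failed_idx = i;
			return -1;
		}
		/* Copy the rule to its new slot and clear the old one */
		rule[dst[i]] = rule[src[i]];
		rule[src[i]] = 0;
	}
	return 0;
}
```

Reporting the position within the request, rather than the entry number,
lets the sender tell exactly which pairs were already applied.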

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
---
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h   |  42 +++++
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h    |  12 ++
 .../net/ethernet/marvell/octeontx2/af/rvu_npc.c    | 179 ++++++++++++++++++++-
 3 files changed, 225 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index 9eefbf7..76b666b 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -151,6 +151,11 @@ M(NPA_HWCTX_DISABLE,	0x403, hwctx_disable_req, msg_rsp)		\
 M(NPC_MCAM_ALLOC_ENTRY,	0x6000, npc_mcam_alloc_entry_req,		\
 				npc_mcam_alloc_entry_rsp)		\
 M(NPC_MCAM_FREE_ENTRY,	0x6001, npc_mcam_free_entry_req, msg_rsp)	\
+M(NPC_MCAM_WRITE_ENTRY,	0x6002, npc_mcam_write_entry_req, msg_rsp)	\
+M(NPC_MCAM_ENA_ENTRY,   0x6003, npc_mcam_ena_dis_entry_req, msg_rsp)	\
+M(NPC_MCAM_DIS_ENTRY,   0x6004, npc_mcam_ena_dis_entry_req, msg_rsp)	\
+M(NPC_MCAM_SHIFT_ENTRY, 0x6005, npc_mcam_shift_entry_req,		\
+				npc_mcam_shift_entry_rsp)		\
 /* NIX mbox IDs (range 0x8000 - 0xFFFF) */				\
 M(NIX_LF_ALLOC,		0x8000, nix_lf_alloc_req, nix_lf_alloc_rsp)	\
 M(NIX_LF_FREE,		0x8001, msg_req, msg_rsp)			\
@@ -584,4 +589,41 @@ struct npc_mcam_free_entry_req {
 	u8  all;   /* If all entries allocated to this PFVF to be freed */
 };
 
+struct mcam_entry {
+#define NPC_MAX_KWS_IN_KEY	7 /* Number of keywords in max keywidth */
+	u64	kw[NPC_MAX_KWS_IN_KEY];
+	u64	kw_mask[NPC_MAX_KWS_IN_KEY];
+	u64	action;
+	u64	vtag_action;
+};
+
+struct npc_mcam_write_entry_req {
+	struct mbox_msghdr hdr;
+	struct mcam_entry entry_data;
+	u16 entry;	 /* MCAM entry to write this match key */
+	u16 cntr;	 /* Counter for this MCAM entry */
+	u8  intf;	 /* Rx or Tx interface */
+	u8  enable_entry;/* Enable this MCAM entry ? */
+	u8  set_cntr;    /* Set counter for this entry ? */
+};
+
+/* Enable/Disable a given entry */
+struct npc_mcam_ena_dis_entry_req {
+	struct mbox_msghdr hdr;
+	u16 entry;
+};
+
+struct npc_mcam_shift_entry_req {
+	struct mbox_msghdr hdr;
+#define NPC_MCAM_MAX_SHIFTS	64
+	u16 curr_entry[NPC_MCAM_MAX_SHIFTS];
+	u16 new_entry[NPC_MCAM_MAX_SHIFTS];
+	u16 shift_count; /* Number of entries to shift */
+};
+
+struct npc_mcam_shift_entry_rsp {
+	struct mbox_msghdr hdr;
+	u16 failed_entry_idx; /* Index in 'curr_entry', not entry itself */
+};
+
 #endif /* MBOX_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 01a2fcd..b649c9c 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -386,4 +386,16 @@ int rvu_mbox_handler_NPC_MCAM_ALLOC_ENTRY(struct rvu *rvu,
 int rvu_mbox_handler_NPC_MCAM_FREE_ENTRY(struct rvu *rvu,
 					 struct npc_mcam_free_entry_req *req,
 					 struct msg_rsp *rsp);
+int rvu_mbox_handler_NPC_MCAM_WRITE_ENTRY(struct rvu *rvu,
+					  struct npc_mcam_write_entry_req *req,
+					  struct msg_rsp *rsp);
+int rvu_mbox_handler_NPC_MCAM_ENA_ENTRY(struct rvu *rvu,
+					struct npc_mcam_ena_dis_entry_req *req,
+					struct msg_rsp *rsp);
+int rvu_mbox_handler_NPC_MCAM_DIS_ENTRY(struct rvu *rvu,
+					struct npc_mcam_ena_dis_entry_req *req,
+					struct msg_rsp *rsp);
+int rvu_mbox_handler_NPC_MCAM_SHIFT_ENTRY(struct rvu *rvu,
+					  struct npc_mcam_shift_entry_req *req,
+					  struct npc_mcam_shift_entry_rsp *rsp);
 #endif /* RVU_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
index 82c1c10..af0d8a1 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
@@ -29,14 +29,6 @@
 static void npc_mcam_free_all_entries(struct rvu *rvu, struct npc_mcam *mcam,
 				      int blkaddr, u16 pcifunc);
 
-struct mcam_entry {
-#define NPC_MAX_KWS_IN_KEY	7 /* Number of keywords in max keywidth */
-	u64	kw[NPC_MAX_KWS_IN_KEY];
-	u64	kw_mask[NPC_MAX_KWS_IN_KEY];
-	u64	action;
-	u64	vtag_action;
-};
-
 void rvu_npc_set_pkind(struct rvu *rvu, int pkind, struct rvu_pfvf *pfvf)
 {
 	int blkaddr;
@@ -259,6 +251,46 @@ static void npc_config_mcam_entry(struct rvu *rvu, struct npc_mcam *mcam,
 		npc_enable_mcam_entry(rvu, mcam, blkaddr, actindex, false);
 }
 
+static void npc_copy_mcam_entry(struct rvu *rvu, struct npc_mcam *mcam,
+				int blkaddr, u16 src, u16 dest)
+{
+	int dbank = npc_get_bank(mcam, dest);
+	int sbank = npc_get_bank(mcam, src);
+	u64 cfg, sreg, dreg;
+	int bank, i;
+
+	src &= (mcam->banksize - 1);
+	dest &= (mcam->banksize - 1);
+
+	/* Copy INTF's, W0's, W1's CAM0 and CAM1 configuration */
+	for (bank = 0; bank < mcam->banks_per_entry; bank++) {
+		sreg = NPC_AF_MCAMEX_BANKX_CAMX_INTF(src, sbank + bank, 0);
+		dreg = NPC_AF_MCAMEX_BANKX_CAMX_INTF(dest, dbank + bank, 0);
+		for (i = 0; i < 6; i++) {
+			cfg = rvu_read64(rvu, blkaddr, sreg + (i * 8));
+			rvu_write64(rvu, blkaddr, dreg + (i * 8), cfg);
+		}
+	}
+
+	/* Copy action */
+	cfg = rvu_read64(rvu, blkaddr,
+			 NPC_AF_MCAMEX_BANKX_ACTION(src, sbank));
+	rvu_write64(rvu, blkaddr,
+		    NPC_AF_MCAMEX_BANKX_ACTION(dest, dbank), cfg);
+
+	/* Copy TAG action */
+	cfg = rvu_read64(rvu, blkaddr,
+			 NPC_AF_MCAMEX_BANKX_TAG_ACT(src, sbank));
+	rvu_write64(rvu, blkaddr,
+		    NPC_AF_MCAMEX_BANKX_TAG_ACT(dest, dbank), cfg);
+
+	/* Enable or disable */
+	cfg = rvu_read64(rvu, blkaddr,
+			 NPC_AF_MCAMEX_BANKX_CFG(src, sbank));
+	rvu_write64(rvu, blkaddr,
+		    NPC_AF_MCAMEX_BANKX_CFG(dest, dbank), cfg);
+}
+
 static u64 npc_get_mcam_action(struct rvu *rvu, struct npc_mcam *mcam,
 			       int blkaddr, int index)
 {
@@ -1322,3 +1354,134 @@ int rvu_mbox_handler_NPC_MCAM_FREE_ENTRY(struct rvu *rvu,
 	mutex_unlock(&mcam->lock);
 	return rc;
 }
+
+int rvu_mbox_handler_NPC_MCAM_WRITE_ENTRY(struct rvu *rvu,
+					  struct npc_mcam_write_entry_req *req,
+					  struct msg_rsp *rsp)
+{
+	struct npc_mcam *mcam = &rvu->hw->mcam;
+	u16 pcifunc = req->hdr.pcifunc;
+	int blkaddr, rc;
+
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
+	if (blkaddr < 0)
+		return NPC_MCAM_INVALID_REQ;
+
+	mutex_lock(&mcam->lock);
+	rc = npc_mcam_verify_entry(mcam, pcifunc, req->entry);
+	if (rc)
+		goto exit;
+
+	if (req->intf != NIX_INTF_RX && req->intf != NIX_INTF_TX) {
+		rc = NPC_MCAM_INVALID_REQ;
+		goto exit;
+	}
+
+	npc_config_mcam_entry(rvu, mcam, blkaddr, req->entry, req->intf,
+			      &req->entry_data, req->enable_entry);
+
+	rc = 0;
+exit:
+	mutex_unlock(&mcam->lock);
+	return rc;
+}
+
+int rvu_mbox_handler_NPC_MCAM_ENA_ENTRY(struct rvu *rvu,
+					struct npc_mcam_ena_dis_entry_req *req,
+					struct msg_rsp *rsp)
+{
+	struct npc_mcam *mcam = &rvu->hw->mcam;
+	u16 pcifunc = req->hdr.pcifunc;
+	int blkaddr, rc;
+
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
+	if (blkaddr < 0)
+		return NPC_MCAM_INVALID_REQ;
+
+	mutex_lock(&mcam->lock);
+	rc = npc_mcam_verify_entry(mcam, pcifunc, req->entry);
+	mutex_unlock(&mcam->lock);
+	if (rc)
+		return rc;
+
+	npc_enable_mcam_entry(rvu, mcam, blkaddr, req->entry, true);
+
+	return 0;
+}
+
+int rvu_mbox_handler_NPC_MCAM_DIS_ENTRY(struct rvu *rvu,
+					struct npc_mcam_ena_dis_entry_req *req,
+					struct msg_rsp *rsp)
+{
+	struct npc_mcam *mcam = &rvu->hw->mcam;
+	u16 pcifunc = req->hdr.pcifunc;
+	int blkaddr, rc;
+
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
+	if (blkaddr < 0)
+		return NPC_MCAM_INVALID_REQ;
+
+	mutex_lock(&mcam->lock);
+	rc = npc_mcam_verify_entry(mcam, pcifunc, req->entry);
+	mutex_unlock(&mcam->lock);
+	if (rc)
+		return rc;
+
+	npc_enable_mcam_entry(rvu, mcam, blkaddr, req->entry, false);
+
+	return 0;
+}
+
+int rvu_mbox_handler_NPC_MCAM_SHIFT_ENTRY(struct rvu *rvu,
+					  struct npc_mcam_shift_entry_req *req,
+					  struct npc_mcam_shift_entry_rsp *rsp)
+{
+	struct npc_mcam *mcam = &rvu->hw->mcam;
+	u16 pcifunc = req->hdr.pcifunc;
+	u16 old_entry, new_entry;
+	int blkaddr, rc = 0;
+	u16 index;
+
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
+	if (blkaddr < 0)
+		return NPC_MCAM_INVALID_REQ;
+
+	if (req->shift_count > NPC_MCAM_MAX_SHIFTS)
+		return NPC_MCAM_INVALID_REQ;
+
+	mutex_lock(&mcam->lock);
+	for (index = 0; index < req->shift_count; index++) {
+		old_entry = req->curr_entry[index];
+		new_entry = req->new_entry[index];
+
+		/* Check that both old and new entries are valid
+		 * and belong to this PFFUNC.
+		 */
+		rc = npc_mcam_verify_entry(mcam, pcifunc, old_entry);
+		if (rc)
+			break;
+
+		rc = npc_mcam_verify_entry(mcam, pcifunc, new_entry);
+		if (rc)
+			break;
+
+		/* Disable the new_entry */
+		npc_enable_mcam_entry(rvu, mcam, blkaddr, new_entry, false);
+
+		/* Copy rule from old entry to new entry */
+		npc_copy_mcam_entry(rvu, mcam, blkaddr, old_entry, new_entry);
+
+		/* Enable new_entry and disable old_entry */
+		npc_enable_mcam_entry(rvu, mcam, blkaddr, new_entry, true);
+		npc_enable_mcam_entry(rvu, mcam, blkaddr, old_entry, false);
+	}
+
+	/* If shift has failed then report the failed index */
+	if (index != req->shift_count) {
+		rc = NPC_MCAM_PERM_DENIED;
+		rsp->failed_entry_idx = index;
+	}
+
+	mutex_unlock(&mcam->lock);
+	return rc;
+}
-- 
2.7.4
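
Note on the shift request: the handler above walks the curr_entry/new_entry pairs in order and stops at the first pair that fails ownership verification, reporting that position in failed_entry_idx. Pairs before the failing one have already been moved. A minimal userspace sketch of that behaviour follows, assuming a simplified model in which a rule is a single u64 and ownership is a flat array. All sketch_* names are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ENTRIES		8
#define SKETCH_PERM_DENIED	(-704)	/* mirrors NPC_MCAM_PERM_DENIED */

/* Hypothetical stand-in for the MCAM state touched by a shift */
struct sketch_mcam {
	uint16_t owner[SKETCH_ENTRIES];   /* entry -> owning pcifunc */
	uint64_t rule[SKETCH_ENTRIES];    /* simplified rule contents */
	uint8_t  enabled[SKETCH_ENTRIES]; /* entry enable state */
};

static int sketch_owned(const struct sketch_mcam *m, uint16_t pcifunc,
			uint16_t entry)
{
	return entry < SKETCH_ENTRIES && m->owner[entry] == pcifunc;
}

/* Move rules pair by pair. On the first pair that is not owned by
 * 'pcifunc', store its position in '*failed_idx' and stop. Earlier
 * pairs stay moved.
 */
static int sketch_shift(struct sketch_mcam *m, uint16_t pcifunc,
			const uint16_t *curr, const uint16_t *dst,
			uint16_t count, uint16_t *failed_idx)
{
	uint16_t i;

	assert(count <= SKETCH_ENTRIES);

	for (i = 0; i < count; i++) {
		if (!sketch_owned(m, pcifunc, curr[i]) ||
		    !sketch_owned(m, pcifunc, dst[i])) {
			*failed_idx = i;
			return SKETCH_PERM_DENIED;
		}

		m->enabled[dst[i]] = 0;            /* quiesce destination */
		m->rule[dst[i]] = m->rule[curr[i]]; /* copy the rule */
		m->enabled[dst[i]] = 1;            /* new location live */
		m->enabled[curr[i]] = 0;           /* retire old location */
	}

	return 0;
}
```

A requester that gets an error back can therefore treat indices below failed_entry_idx as already shifted and retry or roll back only the remainder.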


* [PATCH 04/20] octeontx2-af: NPC MCAM entry alloc/free support
From: sunil.kovvuri @ 2018-11-08 18:35 UTC (permalink / raw)
  To: netdev, davem; +Cc: arnd, linux-soc, Sunil Goutham
In-Reply-To: <1541702161-30673-1-git-send-email-sunil.kovvuri@gmail.com>

From: Sunil Goutham <sgoutham@marvell.com>

This patch adds NPC MCAM entry management and support for
allocating and freeing entries via mailbox. Both contiguous and
non-contiguous allocations are supported. In case of a contiguous
request that cannot be met in full, the largest available contiguous
range of entries is allocated instead.

High or low priority index allocation w.r.t a reference MCAM index
is also supported.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
---
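Note on the priority zones: npc_mcam_rsrcs_init() in this patch reserves 1/8th of the unreserved entries at each end of the index space, rounding the zone size down to a multiple of BITS_PER_LONG once it exceeds one long. The arithmetic can be sketched in userspace as below. This is an illustration only: sketch_split_zones and struct sketch_zones are hypothetical names, and a 64-bit long is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 64-bit longs */
#define SKETCH_BITS_PER_LONG	64

struct sketch_zones {
	uint16_t hprio_end;   /* hprio zone is [0, hprio_end) */
	uint16_t lprio_start; /* lprio zone is [lprio_start, bmap_entries) */
	uint16_t zone_count;  /* number of entries in each reserved zone */
};

static struct sketch_zones sketch_split_zones(uint16_t bmap_entries)
{
	struct sketch_zones z;

	/* 1/8th of the entries for each of the two reserved zones */
	z.zone_count = bmap_entries / 8;

	/* Same effect as round_down(count, BITS_PER_LONG) */
	if (z.zone_count > SKETCH_BITS_PER_LONG)
		z.zone_count -= z.zone_count % SKETCH_BITS_PER_LONG;

	z.lprio_start = bmap_entries - z.zone_count;
	z.hprio_end = z.zone_count;

	/* The two reserved zones never overlap */
	assert(z.hprio_end <= z.lprio_start);
	return z;
}
```

For example, 1000 unreserved entries give a zone size of 64 (125 rounded down), so the high priority zone is [0, 64), the low priority zone is [936, 1000) and everything in between is the middle zone.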
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h   |  48 ++
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h    |  19 +-
 .../net/ethernet/marvell/octeontx2/af/rvu_nix.c    |  11 +
 .../net/ethernet/marvell/octeontx2/af/rvu_npc.c    | 514 ++++++++++++++++++++-
 4 files changed, 587 insertions(+), 5 deletions(-)
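
Note on contiguous allocation: when no free run of the requested length exists, the request is not failed outright; the longest free run found in the search range is returned instead. The sketch below shows that fallback in plain C. It deliberately swaps the kernel bitmap helpers (find_next_zero_bit/find_next_bit) for a simple bool array, so it illustrates the idea rather than the driver code. sketch_find_zero_area is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Look for 'nr' consecutive free slots in used[start..size).
 * Returns the start of the first such run, or the start of the
 * longest shorter run if none is long enough. The length found
 * (capped at 'nr') is stored in '*max_area'.
 */
static uint16_t sketch_find_zero_area(const bool *used, uint16_t size,
				      uint16_t start, uint16_t nr,
				      uint16_t *max_area)
{
	uint16_t best_start = 0;
	uint16_t i = start;

	*max_area = 0;

	while (i < size) {
		uint16_t run_start, run_len = 0;

		/* Skip over used slots */
		while (i < size && used[i])
			i++;
		if (i >= size)
			break;

		/* Measure the free run, but no further than 'nr' */
		run_start = i;
		while (i < size && !used[i] && run_len < nr) {
			i++;
			run_len++;
		}

		if (run_len > *max_area) {
			*max_area = run_len;
			best_start = run_start;
		}

		/* Full-length run found, no need to look further */
		if (run_len == nr)
			break;
	}

	return best_start;
}
```

The caller compares '*max_area' with the requested count to tell a full allocation from a partial one, much as the mailbox response reports the allocated count alongside the start index.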

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index 89db883..9eefbf7 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -148,6 +148,9 @@ M(NPA_HWCTX_DISABLE,	0x403, hwctx_disable_req, msg_rsp)		\
 /* TIM mbox IDs (range 0x800 - 0x9FF) */				\
 /* CPT mbox IDs (range 0xA00 - 0xBFF) */				\
 /* NPC mbox IDs (range 0x6000 - 0x7FFF) */				\
+M(NPC_MCAM_ALLOC_ENTRY,	0x6000, npc_mcam_alloc_entry_req,		\
+				npc_mcam_alloc_entry_rsp)		\
+M(NPC_MCAM_FREE_ENTRY,	0x6001, npc_mcam_free_entry_req, msg_rsp)	\
 /* NIX mbox IDs (range 0x8000 - 0xFFFF) */				\
 M(NIX_LF_ALLOC,		0x8000, nix_lf_alloc_req, nix_lf_alloc_rsp)	\
 M(NIX_LF_FREE,		0x8001, msg_req, msg_rsp)			\
@@ -348,6 +351,8 @@ struct hwctx_disable_req {
 	u8 ctype;
 };
 
+/* NIX mbox message formats */
+
 /* NIX mailbox error codes
  * Range 401 - 500.
  */
@@ -536,4 +541,47 @@ struct nix_frs_cfg {
 	u16	minlen;
 };
 
+/* NPC mbox message structs */
+
+#define NPC_MCAM_ENTRY_INVALID	0xFFFF
+#define NPC_MCAM_INVALID_MAP	0xFFFF
+
+/* NPC mailbox error codes
+ * Range 701 - 800.
+ */
+enum npc_af_status {
+	NPC_MCAM_INVALID_REQ	= -701,
+	NPC_MCAM_ALLOC_DENIED	= -702,
+	NPC_MCAM_ALLOC_FAILED	= -703,
+	NPC_MCAM_PERM_DENIED	= -704,
+};
+
+struct npc_mcam_alloc_entry_req {
+	struct mbox_msghdr hdr;
+#define NPC_MAX_NONCONTIG_ENTRIES	256
+	u8  contig;   /* Contiguous entries ? */
+#define NPC_MCAM_ANY_PRIO		0
+#define NPC_MCAM_LOWER_PRIO		1
+#define NPC_MCAM_HIGHER_PRIO		2
+	u8  priority; /* Lower or higher w.r.t ref_entry */
+	u16 ref_entry;
+	u16 count;    /* Number of entries requested */
+};
+
+struct npc_mcam_alloc_entry_rsp {
+	struct mbox_msghdr hdr;
+	u16 entry; /* Entry allocated or start index if contiguous.
+		    * Invalid in case of non-contiguous.
+		    */
+	u16 count; /* Number of entries allocated */
+	u16 free_count; /* Number of entries available */
+	u16 entry_list[NPC_MAX_NONCONTIG_ENTRIES];
+};
+
+struct npc_mcam_free_entry_req {
+	struct mbox_msghdr hdr;
+	u16 entry; /* Entry index to be freed */
+	u8  all;   /* Free all entries allocated to this PFVF ? */
+};
+
 #endif /* MBOX_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 12268de..01a2fcd 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -74,15 +74,25 @@ struct nix_mce_list {
 };
 
 struct npc_mcam {
+	struct rsrc_bmap counters;
 	struct mutex	lock;	/* MCAM entries and counters update lock */
+	unsigned long	*bmap;		/* bitmap, 0 => bmap_entries */
+	unsigned long	*bmap_reverse;	/* Reverse bitmap, bmap_entries => 0 */
+	u16	bmap_entries;	/* Number of unreserved MCAM entries */
+	u16	bmap_fcnt;	/* MCAM entries free count */
+	u16	*entry2pfvf_map;
+	u16	*cntr2pfvf_map;
 	u8	keysize;	/* MCAM keysize 112/224/448 bits */
 	u8	banks;		/* Number of MCAM banks */
 	u8	banks_per_entry;/* Number of keywords in key */
 	u16	banksize;	/* Number of MCAM entries in each bank */
 	u16	total_entries;	/* Total number of MCAM entries */
-	u16     entries;	/* Total minus reserved for NIX LFs */
 	u16	nixlf_offset;	/* Offset of nixlf rsvd uncast entries */
 	u16	pf_offset;	/* Offset of PF's rsvd bcast, promisc entries */
+	u16	lprio_count;
+	u16	lprio_start;
+	u16	hprio_count;
+	u16	hprio_end;
 };
 
 /* Structure for per RVU func info ie PF/VF */
@@ -315,6 +325,7 @@ int rvu_mbox_handler_NPA_LF_FREE(struct rvu *rvu, struct msg_req *req,
 				 struct msg_rsp *rsp);
 
 /* NIX APIs */
+bool is_nixlf_attached(struct rvu *rvu, u16 pcifunc);
 int rvu_nix_init(struct rvu *rvu);
 void rvu_nix_freemem(struct rvu *rvu);
 int rvu_get_nixlf_count(struct rvu *rvu);
@@ -369,4 +380,10 @@ void rvu_npc_install_bcast_match_entry(struct rvu *rvu, u16 pcifunc,
 void rvu_npc_disable_mcam_entries(struct rvu *rvu, u16 pcifunc, int nixlf);
 void rvu_npc_update_flowkey_alg_idx(struct rvu *rvu, u16 pcifunc, int nixlf,
 				    int group, int alg_idx, int mcam_index);
+int rvu_mbox_handler_NPC_MCAM_ALLOC_ENTRY(struct rvu *rvu,
+					  struct npc_mcam_alloc_entry_req *req,
+					  struct npc_mcam_alloc_entry_rsp *rsp);
+int rvu_mbox_handler_NPC_MCAM_FREE_ENTRY(struct rvu *rvu,
+					 struct npc_mcam_free_entry_req *req,
+					 struct msg_rsp *rsp);
 #endif /* RVU_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
index 4d8ae2e..9de9aaf 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
@@ -55,6 +55,17 @@ struct mce {
 	u16			pcifunc;
 };
 
+bool is_nixlf_attached(struct rvu *rvu, u16 pcifunc)
+{
+	struct rvu_pfvf *pfvf = rvu_get_pfvf(rvu, pcifunc);
+	int blkaddr;
+
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, pcifunc);
+	if (!pfvf->nixlf || blkaddr < 0)
+		return false;
+	return true;
+}
+
 int rvu_get_nixlf_count(struct rvu *rvu)
 {
 	struct rvu_block *block;
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
index 3a96dfd..82c1c10 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
@@ -26,6 +26,9 @@
 
 #define NPC_PARSE_RESULT_DMAC_OFFSET	8
 
+static void npc_mcam_free_all_entries(struct rvu *rvu, struct npc_mcam *mcam,
+				      int blkaddr, u16 pcifunc);
+
 struct mcam_entry {
 #define NPC_MAX_KWS_IN_KEY	7 /* Number of keywords in max keywidth */
 	u64	kw[NPC_MAX_KWS_IN_KEY];
@@ -466,6 +469,13 @@ void rvu_npc_disable_mcam_entries(struct rvu *rvu, u16 pcifunc, int nixlf)
 	if (blkaddr < 0)
 		return;
 
+	mutex_lock(&mcam->lock);
+
+	/* Disable and free all MCAM entries mapped to this 'pcifunc' */
+	npc_mcam_free_all_entries(rvu, mcam, blkaddr, pcifunc);
+
+	mutex_unlock(&mcam->lock);
+
 	/* Disable ucast MCAM match entry of this PF/VF */
 	index = npc_get_nixlf_mcam_index(mcam, pcifunc,
 					 nixlf, NIXLF_UCAST_ENTRY);
@@ -690,13 +700,14 @@ static int npc_mcam_rsrcs_init(struct rvu *rvu, int blkaddr)
 {
 	int nixlf_count = rvu_get_nixlf_count(rvu);
 	struct npc_mcam *mcam = &rvu->hw->mcam;
-	int rsvd;
+	int rsvd, err;
 	u64 cfg;
 
 	/* Get HW limits */
 	cfg = rvu_read64(rvu, blkaddr, NPC_AF_CONST);
 	mcam->banks = (cfg >> 44) & 0xF;
 	mcam->banksize = (cfg >> 28) & 0xFFFF;
+	mcam->counters.max = (cfg >> 48) & 0xFFFF;
 
 	/* Actual number of MCAM entries vary by entry size */
 	cfg = (rvu_read64(rvu, blkaddr,
@@ -728,20 +739,69 @@ static int npc_mcam_rsrcs_init(struct rvu *rvu, int blkaddr)
 		return -ENOMEM;
 	}
 
-	mcam->entries = mcam->total_entries - rsvd;
-	mcam->nixlf_offset = mcam->entries;
+	mcam->bmap_entries = mcam->total_entries - rsvd;
+	mcam->nixlf_offset = mcam->bmap_entries;
 	mcam->pf_offset = mcam->nixlf_offset + nixlf_count;
 
+	/* Allocate bitmaps for managing MCAM entries */
+	mcam->bmap = devm_kcalloc(rvu->dev, BITS_TO_LONGS(mcam->bmap_entries),
+				  sizeof(long), GFP_KERNEL);
+	if (!mcam->bmap)
+		return -ENOMEM;
+
+	mcam->bmap_reverse = devm_kcalloc(rvu->dev,
+					  BITS_TO_LONGS(mcam->bmap_entries),
+					  sizeof(long), GFP_KERNEL);
+	if (!mcam->bmap_reverse)
+		return -ENOMEM;
+
+	mcam->bmap_fcnt = mcam->bmap_entries;
+
+	/* Alloc memory for saving entry to RVU PFFUNC allocation mapping */
+	mcam->entry2pfvf_map = devm_kcalloc(rvu->dev, mcam->bmap_entries,
+					    sizeof(u16), GFP_KERNEL);
+	if (!mcam->entry2pfvf_map)
+		return -ENOMEM;
+
+	/* Reserve 1/8th of MCAM entries at the bottom for low priority
+	 * allocations and another 1/8th at the top for high priority
+	 * allocations.
+	 */
+	mcam->lprio_count = mcam->bmap_entries / 8;
+	if (mcam->lprio_count > BITS_PER_LONG)
+		mcam->lprio_count = round_down(mcam->lprio_count,
+					       BITS_PER_LONG);
+	mcam->lprio_start = mcam->bmap_entries - mcam->lprio_count;
+	mcam->hprio_count = mcam->lprio_count;
+	mcam->hprio_end = mcam->hprio_count;
+
+	/* Allocate bitmap for managing MCAM counters and memory
+	 * for saving counter to RVU PFFUNC allocation mapping.
+	 */
+	err = rvu_alloc_bitmap(&mcam->counters);
+	if (err)
+		return err;
+
+	mcam->cntr2pfvf_map = devm_kcalloc(rvu->dev, mcam->counters.max,
+					   sizeof(u16), GFP_KERNEL);
+	if (!mcam->cntr2pfvf_map)
+		goto free_mem;
+
 	mutex_init(&mcam->lock);
 
 	return 0;
+
+free_mem:
+	kfree(mcam->counters.bmap);
+	return -ENOMEM;
 }
 
 int rvu_npc_init(struct rvu *rvu)
 {
 	struct npc_pkind *pkind = &rvu->hw->pkind;
 	u64 keyz = NPC_MCAM_KEY_X2;
-	int blkaddr, err;
+	int blkaddr, entry, bank, err;
+	u64 cfg;
 
 	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
 	if (blkaddr < 0) {
@@ -749,6 +809,14 @@ int rvu_npc_init(struct rvu *rvu)
 		return -ENODEV;
 	}
 
+	/* First disable all MCAM entries, to stop traffic towards NIXLFs */
+	cfg = rvu_read64(rvu, blkaddr, NPC_AF_CONST);
+	for (bank = 0; bank < ((cfg >> 44) & 0xF); bank++) {
+		for (entry = 0; entry < ((cfg >> 28) & 0xFFFF); entry++)
+			rvu_write64(rvu, blkaddr,
+				    NPC_AF_MCAMEX_BANKX_CFG(entry, bank), 0);
+	}
+
 	/* Allocate resource bimap for pkind*/
 	pkind->rsrc.max = (rvu_read64(rvu, blkaddr,
 				      NPC_AF_CONST1) >> 12) & 0xFF;
@@ -814,5 +882,443 @@ void rvu_npc_freemem(struct rvu *rvu)
 	struct npc_mcam *mcam = &rvu->hw->mcam;
 
 	kfree(pkind->rsrc.bmap);
+	kfree(mcam->counters.bmap);
 	mutex_destroy(&mcam->lock);
 }
+
+static int npc_mcam_verify_entry(struct npc_mcam *mcam,
+				 u16 pcifunc, int entry)
+{
+	/* Verify if entry is valid and if it is indeed
+	 * allocated to the requesting PFFUNC.
+	 */
+	if (entry >= mcam->bmap_entries)
+		return NPC_MCAM_INVALID_REQ;
+
+	if (pcifunc != mcam->entry2pfvf_map[entry])
+		return NPC_MCAM_PERM_DENIED;
+
+	return 0;
+}
+
+/* Sets MCAM entry in bitmap as used. Update
+ * reverse bitmap too. Should be called with
+ * 'mcam->lock' held.
+ */
+static void npc_mcam_set_bit(struct npc_mcam *mcam, u16 index)
+{
+	u16 entry, rentry;
+
+	entry = index;
+	rentry = mcam->bmap_entries - index - 1;
+
+	__set_bit(entry, mcam->bmap);
+	__set_bit(rentry, mcam->bmap_reverse);
+	mcam->bmap_fcnt--;
+}
+
+/* Sets MCAM entry in bitmap as free. Update
+ * reverse bitmap too. Should be called with
+ * 'mcam->lock' held.
+ */
+static void npc_mcam_clear_bit(struct npc_mcam *mcam, u16 index)
+{
+	u16 entry, rentry;
+
+	entry = index;
+	rentry = mcam->bmap_entries - index - 1;
+
+	__clear_bit(entry, mcam->bmap);
+	__clear_bit(rentry, mcam->bmap_reverse);
+	mcam->bmap_fcnt++;
+}
+
+static void npc_mcam_free_all_entries(struct rvu *rvu, struct npc_mcam *mcam,
+				      int blkaddr, u16 pcifunc)
+{
+	u16 index;
+
+	/* Scan all MCAM entries and free the ones mapped to 'pcifunc' */
+	for (index = 0; index < mcam->bmap_entries; index++) {
+		if (mcam->entry2pfvf_map[index] == pcifunc) {
+			mcam->entry2pfvf_map[index] = NPC_MCAM_INVALID_MAP;
+			/* Free the entry in bitmap */
+			npc_mcam_clear_bit(mcam, index);
+			/* Disable the entry */
+			npc_enable_mcam_entry(rvu, mcam, blkaddr, index, false);
+		}
+	}
+}
+
+/* Find area of contiguous free entries of size 'nr'.
+ * If not found return max contiguous free entries available.
+ */
+static u16 npc_mcam_find_zero_area(unsigned long *map, u16 size, u16 start,
+				   u16 nr, u16 *max_area)
+{
+	u16 max_area_start = 0;
+	u16 index, next, end;
+
+	*max_area = 0;
+
+again:
+	index = find_next_zero_bit(map, size, start);
+	if (index >= size)
+		return max_area_start;
+
+	end = ((index + nr) >= size) ? size : index + nr;
+	next = find_next_bit(map, end, index);
+	if (*max_area < (next - index)) {
+		*max_area = next - index;
+		max_area_start = index;
+	}
+
+	if (next < end) {
+		start = next + 1;
+		goto again;
+	}
+
+	return max_area_start;
+}
+
+/* Find number of free MCAM entries available
+ * within range i.e in between 'start' and 'end'.
+ */
+static u16 npc_mcam_get_free_count(unsigned long *map, u16 start, u16 end)
+{
+	u16 index, next;
+	u16 fcnt = 0;
+
+again:
+	if (start >= end)
+		return fcnt;
+
+	index = find_next_zero_bit(map, end, start);
+	if (index >= end)
+		return fcnt;
+
+	next = find_next_bit(map, end, index);
+	if (next <= end) {
+		fcnt += next - index;
+		start = next + 1;
+		goto again;
+	}
+
+	fcnt += end - index;
+	return fcnt;
+}
+
+static void
+npc_get_mcam_search_range_priority(struct npc_mcam *mcam,
+				   struct npc_mcam_alloc_entry_req *req,
+				   u16 *start, u16 *end, bool *reverse)
+{
+	u16 fcnt;
+
+	if (req->priority == NPC_MCAM_HIGHER_PRIO)
+		goto hprio;
+
+	/* For a low priority entry allocation
+	 * - If reference entry is not in hprio zone then
+	 *      search range: ref_entry to end.
+	 * - If reference entry is in hprio zone and if
+	 *   request can be accommodated in non-hprio zone then
+	 *      search range: 'start of middle zone' to 'end'
+	 * - else search in reverse, so that fewer hprio
+	 *   zone entries are allocated.
+	 */
+
+	*reverse = false;
+	*start = req->ref_entry + 1;
+	*end = mcam->bmap_entries;
+
+	if (req->ref_entry >= mcam->hprio_end)
+		return;
+
+	fcnt = npc_mcam_get_free_count(mcam->bmap,
+				       mcam->hprio_end, mcam->bmap_entries);
+	if (fcnt > req->count)
+		*start = mcam->hprio_end;
+	else
+		*reverse = true;
+	return;
+
+hprio:
+	/* For a high priority entry allocation, search is always
+	 * in reverse to preserve hprio zone entries.
+	 * - If reference entry is not in lprio zone then
+	 *      search range: 0 to ref_entry.
+	 * - If reference entry is in lprio zone and if
+	 *   request can be accommodated in middle zone then
+	 *      search range: 'hprio_end' to 'lprio_start'
+	 */
+
+	*reverse = true;
+	*start = 0;
+	*end = req->ref_entry;
+
+	if (req->ref_entry <= mcam->lprio_start)
+		return;
+
+	fcnt = npc_mcam_get_free_count(mcam->bmap,
+				       mcam->hprio_end, mcam->lprio_start);
+	if (fcnt < req->count)
+		return;
+	*start = mcam->hprio_end;
+	*end = mcam->lprio_start;
+}
+
+static int npc_mcam_alloc_entries(struct npc_mcam *mcam, u16 pcifunc,
+				  struct npc_mcam_alloc_entry_req *req,
+				  struct npc_mcam_alloc_entry_rsp *rsp)
+{
+	u16 entry_list[NPC_MAX_NONCONTIG_ENTRIES];
+	u16 fcnt, hp_fcnt, lp_fcnt;
+	u16 start, end, index;
+	int entry, next_start;
+	bool reverse = false;
+	unsigned long *bmap;
+	u16 max_contig;
+
+	mutex_lock(&mcam->lock);
+
+	/* Check if there are any free entries */
+	if (!mcam->bmap_fcnt) {
+		mutex_unlock(&mcam->lock);
+		return NPC_MCAM_ALLOC_FAILED;
+	}
+
+	/* MCAM entries are divided into high priority, middle and
+	 * low priority zones. The idea is to avoid allocating the
+	 * topmost and lowermost entries as far as possible, to increase
+	 * the probability of honouring priority allocation requests.
+	 *
+	 * Two bitmaps are used for mcam entry management,
+	 * mcam->bmap for forward search i.e '0 to mcam->bmap_entries'.
+	 * mcam->bmap_reverse for reverse search i.e 'mcam->bmap_entries to 0'.
+	 *
+	 * Reverse bitmap is used to allocate entries
+	 * - when a higher priority entry is requested
+	 * - when few free entries are available.
+	 * Lower priority ones out of available free entries are always
+	 * chosen when 'high vs low' question arises.
+	 */
+
+	/* Get the search range for priority allocation request */
+	if (req->priority) {
+		npc_get_mcam_search_range_priority(mcam, req,
+						   &start, &end, &reverse);
+		goto alloc;
+	}
+
+	/* Find out the search range for non-priority allocation request
+	 *
+	 * Get MCAM free entry count in middle zone.
+	 */
+	lp_fcnt = npc_mcam_get_free_count(mcam->bmap,
+					  mcam->lprio_start,
+					  mcam->bmap_entries);
+	hp_fcnt = npc_mcam_get_free_count(mcam->bmap, 0, mcam->hprio_end);
+	fcnt = mcam->bmap_fcnt - lp_fcnt - hp_fcnt;
+
+	/* Check if request can be accommodated in the middle zone */
+	if (fcnt > req->count) {
+		start = mcam->hprio_end;
+		end = mcam->lprio_start;
+	} else if ((fcnt + (hp_fcnt / 2) + (lp_fcnt / 2)) > req->count) {
+		/* Expand search zone from half of hprio zone to
+		 * half of lprio zone.
+		 */
+		start = mcam->hprio_end / 2;
+		end = mcam->bmap_entries - (mcam->lprio_count / 2);
+		reverse = true;
+	} else {
+		/* Not enough free entries, search all entries in reverse,
+		 * so that low priority ones will get used up.
+		 */
+		reverse = true;
+		start = 0;
+		end = mcam->bmap_entries;
+	}
+
+alloc:
+	if (reverse) {
+		bmap = mcam->bmap_reverse;
+		start = mcam->bmap_entries - start;
+		end = mcam->bmap_entries - end;
+		index = start;
+		start = end;
+		end = index;
+	} else {
+		bmap = mcam->bmap;
+	}
+
+	if (req->contig) {
+		/* Allocate requested number of contiguous entries, if
+		 * unsuccessful find max contiguous entries available.
+		 */
+		index = npc_mcam_find_zero_area(bmap, end, start,
+						req->count, &max_contig);
+		rsp->count = max_contig;
+		if (reverse)
+			rsp->entry = mcam->bmap_entries - index - max_contig;
+		else
+			rsp->entry = index;
+	} else {
+		/* Allocate requested number of non-contiguous entries,
+		 * if unsuccessful allocate as many as possible.
+		 */
+		rsp->count = 0;
+		next_start = start;
+		for (entry = 0; entry < req->count; entry++) {
+			index = find_next_zero_bit(bmap, end, next_start);
+			if (index >= end)
+				break;
+
+			next_start = start + (index - start) + 1;
+
+			/* Save the entry's index */
+			if (reverse)
+				index = mcam->bmap_entries - index - 1;
+			entry_list[entry] = index;
+			rsp->count++;
+		}
+	}
+
+	/* If allocating the requested number of entries is unsuccessful,
+	 * expand the search range to full bitmap length and retry.
+	 */
+	if (!req->priority && (rsp->count < req->count) &&
+	    ((end - start) != mcam->bmap_entries)) {
+		reverse = true;
+		start = 0;
+		end = mcam->bmap_entries;
+		goto alloc;
+	}
+
+	/* For priority entry allocation requests, if allocation
+	 * fails then expand search to max possible range and retry.
+	 */
+	if (req->priority && rsp->count < req->count) {
+		if (req->priority == NPC_MCAM_LOWER_PRIO &&
+		    (start != (req->ref_entry + 1))) {
+			start = req->ref_entry + 1;
+			end = mcam->bmap_entries;
+			reverse = false;
+			goto alloc;
+		} else if ((req->priority == NPC_MCAM_HIGHER_PRIO) &&
+			   ((end - start) != req->ref_entry)) {
+			start = 0;
+			end = req->ref_entry;
+			reverse = true;
+			goto alloc;
+		}
+	}
+
+	/* Copy MCAM entry indices into mbox response entry_list.
+	 * Requester always expects indices in ascending order, so
+	 * reverse the list if reverse bitmap is used for allocation.
+	 */
+	if (!req->contig && rsp->count) {
+		index = 0;
+		for (entry = rsp->count - 1; entry >= 0; entry--) {
+			if (reverse)
+				rsp->entry_list[index++] = entry_list[entry];
+			else
+				rsp->entry_list[entry] = entry_list[entry];
+		}
+	}
+
+	/* Mark the allocated entries as used and set nixlf mapping */
+	for (entry = 0; entry < rsp->count; entry++) {
+		index = req->contig ?
+			(rsp->entry + entry) : rsp->entry_list[entry];
+		npc_mcam_set_bit(mcam, index);
+		mcam->entry2pfvf_map[index] = pcifunc;
+	}
+
+	/* Update available free count in mbox response */
+	rsp->free_count = mcam->bmap_fcnt;
+
+	mutex_unlock(&mcam->lock);
+	return 0;
+}
+
+int rvu_mbox_handler_NPC_MCAM_ALLOC_ENTRY(struct rvu *rvu,
+					  struct npc_mcam_alloc_entry_req *req,
+					  struct npc_mcam_alloc_entry_rsp *rsp)
+{
+	struct npc_mcam *mcam = &rvu->hw->mcam;
+	u16 pcifunc = req->hdr.pcifunc;
+	int blkaddr;
+
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
+	if (blkaddr < 0)
+		return NPC_MCAM_INVALID_REQ;
+
+	rsp->entry = NPC_MCAM_ENTRY_INVALID;
+	rsp->free_count = 0;
+
+	/* Check if ref_entry is within range */
+	if (req->priority && req->ref_entry >= mcam->bmap_entries)
+		return NPC_MCAM_INVALID_REQ;
+
+	/* ref_entry can't be '0' if requested priority is high.
+	 * Can't be last entry if requested priority is low.
+	 */
+	if ((!req->ref_entry && req->priority == NPC_MCAM_HIGHER_PRIO) ||
+	    ((req->ref_entry == (mcam->bmap_entries - 1)) &&
+	     req->priority == NPC_MCAM_LOWER_PRIO))
+		return NPC_MCAM_INVALID_REQ;
+
+	/* Since list of allocated indices needs to be sent to requester,
+	 * max number of non-contiguous entries per mbox msg is limited.
+	 */
+	if (!req->contig && req->count > NPC_MAX_NONCONTIG_ENTRIES)
+		return NPC_MCAM_INVALID_REQ;
+
+	/* Alloc request from PFFUNC with no NIXLF attached should be denied */
+	if (!is_nixlf_attached(rvu, pcifunc))
+		return NPC_MCAM_ALLOC_DENIED;
+
+	return npc_mcam_alloc_entries(mcam, pcifunc, req, rsp);
+}
+
+int rvu_mbox_handler_NPC_MCAM_FREE_ENTRY(struct rvu *rvu,
+					 struct npc_mcam_free_entry_req *req,
+					 struct msg_rsp *rsp)
+{
+	struct npc_mcam *mcam = &rvu->hw->mcam;
+	u16 pcifunc = req->hdr.pcifunc;
+	int blkaddr, rc = 0;
+
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
+	if (blkaddr < 0)
+		return NPC_MCAM_INVALID_REQ;
+
+	/* Free request from PFFUNC with no NIXLF attached, ignore */
+	if (!is_nixlf_attached(rvu, pcifunc))
+		return NPC_MCAM_INVALID_REQ;
+
+	mutex_lock(&mcam->lock);
+
+	if (req->all)
+		goto free_all;
+
+	rc = npc_mcam_verify_entry(mcam, pcifunc, req->entry);
+	if (rc)
+		goto exit;
+
+	mcam->entry2pfvf_map[req->entry] = 0;
+	npc_mcam_clear_bit(mcam, req->entry);
+	npc_enable_mcam_entry(rvu, mcam, blkaddr, req->entry, false);
+
+	goto exit;
+
+free_all:
+	/* Free up all entries allocated to requesting PFFUNC */
+	npc_mcam_free_all_entries(rvu, mcam, blkaddr, pcifunc);
+exit:
+	mutex_unlock(&mcam->lock);
+	return rc;
+}
-- 
2.7.4
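
Note on the two bitmaps: every entry is tracked twice, once at its own index in the forward bitmap and once at the mirrored index (bmap_entries - index - 1) in the reverse bitmap, so that a "search from the top" can reuse an ordinary ascending bit scan. A minimal userspace sketch of that bookkeeping follows, assuming a fixed 128-entry table. The sketch_* names are hypothetical and the kernel __set_bit/__clear_bit helpers are replaced by open-coded shifts.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ENTRIES	128
#define SKETCH_BITS	(sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_LONGS	((SKETCH_ENTRIES + SKETCH_BITS - 1) / SKETCH_BITS)

struct sketch_mcam {
	unsigned long bmap[SKETCH_LONGS];         /* scan 0 => entries */
	unsigned long bmap_reverse[SKETCH_LONGS]; /* scan entries => 0 */
	uint16_t bmap_entries;
	uint16_t bmap_fcnt;                       /* free entry count */
};

static void sketch_init(struct sketch_mcam *m)
{
	memset(m, 0, sizeof(*m));
	m->bmap_entries = SKETCH_ENTRIES;
	m->bmap_fcnt = SKETCH_ENTRIES;
}

static int sketch_test_bit(const unsigned long *map, unsigned int nr)
{
	return (map[nr / SKETCH_BITS] >> (nr % SKETCH_BITS)) & 1UL;
}

/* Mark 'index' as used in both views */
static void sketch_set(struct sketch_mcam *m, uint16_t index)
{
	uint16_t rentry;

	assert(index < m->bmap_entries);
	rentry = m->bmap_entries - index - 1;

	m->bmap[index / SKETCH_BITS] |= 1UL << (index % SKETCH_BITS);
	m->bmap_reverse[rentry / SKETCH_BITS] |= 1UL << (rentry % SKETCH_BITS);
	m->bmap_fcnt--;
}

/* Mark 'index' as free in both views */
static void sketch_clear(struct sketch_mcam *m, uint16_t index)
{
	uint16_t rentry;

	assert(index < m->bmap_entries);
	rentry = m->bmap_entries - index - 1;

	m->bmap[index / SKETCH_BITS] &= ~(1UL << (index % SKETCH_BITS));
	m->bmap_reverse[rentry / SKETCH_BITS] &= ~(1UL << (rentry % SKETCH_BITS));
	m->bmap_fcnt++;
}
```

An index found by scanning the reverse bitmap is converted back with the same mirror formula, which is what the allocation path does before storing an entry in the response.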


* [PATCH 03/20] octeontx2-af: Relax resource lock into mutex
From: sunil.kovvuri @ 2018-11-08 18:35 UTC (permalink / raw)
  To: netdev, davem; +Cc: arnd, linux-soc, Stanislaw Kardach, Sunil Goutham
In-Reply-To: <1541702161-30673-1-git-send-email-sunil.kovvuri@gmail.com>

From: Stanislaw Kardach <skardach@marvell.com>

The resource locks do not need to be spinlocks, as they are not
used in any interrupt handling routines (only in bottom halves).
Therefore relax them into a mutex so that later on we may use them
in routines that might sleep.

Signed-off-by: Stanislaw Kardach <skardach@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
---
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c    | 18 ++++++++-------
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h    |  6 ++---
 .../net/ethernet/marvell/octeontx2/af/rvu_nix.c    | 27 +++++++++++-----------
 .../net/ethernet/marvell/octeontx2/af/rvu_npc.c    |  4 +++-
 4 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index dc28fa2..c7f00895 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -153,17 +153,17 @@ int rvu_get_lf(struct rvu *rvu, struct rvu_block *block, u16 pcifunc, u16 slot)
 	u16 match = 0;
 	int lf;
 
-	spin_lock(&rvu->rsrc_lock);
+	mutex_lock(&rvu->rsrc_lock);
 	for (lf = 0; lf < block->lf.max; lf++) {
 		if (block->fn_map[lf] == pcifunc) {
 			if (slot == match) {
-				spin_unlock(&rvu->rsrc_lock);
+				mutex_unlock(&rvu->rsrc_lock);
 				return lf;
 			}
 			match++;
 		}
 	}
-	spin_unlock(&rvu->rsrc_lock);
+	mutex_unlock(&rvu->rsrc_lock);
 	return -ENODEV;
 }
 
@@ -597,6 +597,8 @@ static void rvu_free_hw_resources(struct rvu *rvu)
 	dma_unmap_resource(rvu->dev, rvu->msix_base_iova,
 			   max_msix * PCI_MSIX_ENTRY_SIZE,
 			   DMA_BIDIRECTIONAL, 0);
+
+	mutex_destroy(&rvu->rsrc_lock);
 }
 
 static int rvu_setup_hw_resources(struct rvu *rvu)
@@ -752,7 +754,7 @@ static int rvu_setup_hw_resources(struct rvu *rvu)
 	if (!rvu->hwvf)
 		return -ENOMEM;
 
-	spin_lock_init(&rvu->rsrc_lock);
+	mutex_init(&rvu->rsrc_lock);
 
 	err = rvu_setup_msix_resources(rvu);
 	if (err)
@@ -926,7 +928,7 @@ static int rvu_detach_rsrcs(struct rvu *rvu, struct rsrc_detach *detach,
 	struct rvu_block *block;
 	int blkid;
 
-	spin_lock(&rvu->rsrc_lock);
+	mutex_lock(&rvu->rsrc_lock);
 
 	/* Check for partial resource detach */
 	if (detach && detach->partial)
@@ -956,7 +958,7 @@ static int rvu_detach_rsrcs(struct rvu *rvu, struct rsrc_detach *detach,
 		rvu_detach_block(rvu, pcifunc, block->type);
 	}
 
-	spin_unlock(&rvu->rsrc_lock);
+	mutex_unlock(&rvu->rsrc_lock);
 	return 0;
 }
 
@@ -1119,7 +1121,7 @@ static int rvu_mbox_handler_ATTACH_RESOURCES(struct rvu *rvu,
 	if (!attach->modify)
 		rvu_detach_rsrcs(rvu, NULL, pcifunc);
 
-	spin_lock(&rvu->rsrc_lock);
+	mutex_lock(&rvu->rsrc_lock);
 
 	/* Check if the request can be accommodated */
 	err = rvu_check_rsrc_availability(rvu, attach, pcifunc);
@@ -1163,7 +1165,7 @@ static int rvu_mbox_handler_ATTACH_RESOURCES(struct rvu *rvu,
 	}
 
 exit:
-	spin_unlock(&rvu->rsrc_lock);
+	mutex_unlock(&rvu->rsrc_lock);
 	return err;
 }
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 4f4829e..12268de 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -64,7 +64,7 @@ struct nix_mcast {
 	struct qmem	*mcast_buf;
 	int		replay_pkind;
 	int		next_free_mce;
-	spinlock_t	mce_lock; /* Serialize MCE updates */
+	struct mutex	mce_lock; /* Serialize MCE updates */
 };
 
 struct nix_mce_list {
@@ -74,7 +74,7 @@ struct nix_mce_list {
 };
 
 struct npc_mcam {
-	spinlock_t	lock;	/* MCAM entries and counters update lock */
+	struct mutex	lock;	/* MCAM entries and counters update lock */
 	u8	keysize;	/* MCAM keysize 112/224/448 bits */
 	u8	banks;		/* Number of MCAM banks */
 	u8	banks_per_entry;/* Number of keywords in key */
@@ -174,7 +174,7 @@ struct rvu {
 	struct rvu_hwinfo       *hw;
 	struct rvu_pfvf		*pf;
 	struct rvu_pfvf		*hwvf;
-	spinlock_t		rsrc_lock; /* Serialize resource alloc/free */
+	struct mutex            rsrc_lock; /* Serialize resource alloc/free */
 
 	/* Mbox */
 	struct otx2_mbox	mbox;
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
index 0b37a88..4d8ae2e 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
@@ -109,12 +109,12 @@ static bool is_valid_txschq(struct rvu *rvu, int blkaddr,
 	if (schq >= txsch->schq.max)
 		return false;
 
-	spin_lock(&rvu->rsrc_lock);
+	mutex_lock(&rvu->rsrc_lock);
 	if (txsch->pfvf_map[schq] != pcifunc) {
-		spin_unlock(&rvu->rsrc_lock);
+		mutex_unlock(&rvu->rsrc_lock);
 		return false;
 	}
-	spin_unlock(&rvu->rsrc_lock);
+	mutex_unlock(&rvu->rsrc_lock);
 	return true;
 }
 
@@ -953,7 +953,7 @@ int rvu_mbox_handler_NIX_TXSCH_ALLOC(struct rvu *rvu,
 	if (!nix_hw)
 		return -EINVAL;
 
-	spin_lock(&rvu->rsrc_lock);
+	mutex_lock(&rvu->rsrc_lock);
 	for (lvl = 0; lvl < NIX_TXSCH_LVL_CNT; lvl++) {
 		txsch = &nix_hw->txsch[lvl];
 		req_schq = req->schq_contig[lvl] + req->schq[lvl];
@@ -1009,7 +1009,7 @@ int rvu_mbox_handler_NIX_TXSCH_ALLOC(struct rvu *rvu,
 err:
 	rc = NIX_AF_ERR_TLX_ALLOC_FAIL;
 exit:
-	spin_unlock(&rvu->rsrc_lock);
+	mutex_unlock(&rvu->rsrc_lock);
 	return rc;
 }
 
@@ -1034,7 +1034,7 @@ static int nix_txschq_free(struct rvu *rvu, u16 pcifunc)
 		return NIX_AF_ERR_AF_LF_INVALID;
 
 	/* Disable TL2/3 queue links before SMQ flush*/
-	spin_lock(&rvu->rsrc_lock);
+	mutex_lock(&rvu->rsrc_lock);
 	for (lvl = NIX_TXSCH_LVL_TL4; lvl < NIX_TXSCH_LVL_CNT; lvl++) {
 		if (lvl != NIX_TXSCH_LVL_TL2 && lvl != NIX_TXSCH_LVL_TL4)
 			continue;
@@ -1076,7 +1076,7 @@ static int nix_txschq_free(struct rvu *rvu, u16 pcifunc)
 			txsch->pfvf_map[schq] = 0;
 		}
 	}
-	spin_unlock(&rvu->rsrc_lock);
+	mutex_unlock(&rvu->rsrc_lock);
 
 	/* Sync cached info for this LF in NDC-TX to LLC/DRAM */
 	rvu_write64(rvu, blkaddr, NIX_AF_NDC_TX_SYNC, BIT_ULL(12) | nixlf);
@@ -1308,7 +1308,7 @@ static int nix_update_mce_list(struct nix_mce_list *mce_list,
 		return 0;
 
 	/* Add a new one to the list, at the tail */
-	mce = kzalloc(sizeof(*mce), GFP_ATOMIC);
+	mce = kzalloc(sizeof(*mce), GFP_KERNEL);
 	if (!mce)
 		return -ENOMEM;
 	mce->idx = idx;
@@ -1354,7 +1354,7 @@ static int nix_update_bcast_mce_list(struct rvu *rvu, u16 pcifunc, bool add)
 		return -EINVAL;
 	}
 
-	spin_lock(&mcast->mce_lock);
+	mutex_lock(&mcast->mce_lock);
 
 	err = nix_update_mce_list(mce_list, pcifunc, idx, add);
 	if (err)
@@ -1384,7 +1384,7 @@ static int nix_update_bcast_mce_list(struct rvu *rvu, u16 pcifunc, bool add)
 	}
 
 end:
-	spin_unlock(&mcast->mce_lock);
+	mutex_unlock(&mcast->mce_lock);
 	return err;
 }
 
@@ -1469,7 +1469,7 @@ static int nix_setup_mcast(struct rvu *rvu, struct nix_hw *nix_hw, int blkaddr)
 		    BIT_ULL(63) | (mcast->replay_pkind << 24) |
 		    BIT_ULL(20) | MC_BUF_CNT);
 
-	spin_lock_init(&mcast->mce_lock);
+	mutex_init(&mcast->mce_lock);
 
 	return nix_setup_bcast_tables(rvu, nix_hw);
 }
@@ -1869,7 +1869,7 @@ int rvu_mbox_handler_NIX_SET_HW_FRS(struct rvu *rvu, struct nix_frs_cfg *req,
 
 	/* Update min/maxlen in each of the SMQ attached to this PF/VF */
 	txsch = &nix_hw->txsch[NIX_TXSCH_LVL_SMQ];
-	spin_lock(&rvu->rsrc_lock);
+	mutex_lock(&rvu->rsrc_lock);
 	for (schq = 0; schq < txsch->schq.max; schq++) {
 		if (txsch->pfvf_map[schq] != pcifunc)
 			continue;
@@ -1879,7 +1879,7 @@ int rvu_mbox_handler_NIX_SET_HW_FRS(struct rvu *rvu, struct nix_frs_cfg *req,
 			cfg = (cfg & ~0x7FULL) | ((u64)req->minlen & 0x7F);
 		rvu_write64(rvu, blkaddr, NIX_AF_SMQX_CFG(schq), cfg);
 	}
-	spin_unlock(&rvu->rsrc_lock);
+	mutex_unlock(&rvu->rsrc_lock);
 
 rx_frscfg:
 	/* Check if config is for SDP link */
@@ -2162,5 +2162,6 @@ void rvu_nix_freemem(struct rvu *rvu)
 		mcast = &nix_hw->mcast;
 		qmem_free(rvu->dev, mcast->mce_ctx);
 		qmem_free(rvu->dev, mcast->mcast_buf);
+		mutex_destroy(&mcast->mce_lock);
 	}
 }
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
index 23ff47f..3a96dfd 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
@@ -732,7 +732,7 @@ static int npc_mcam_rsrcs_init(struct rvu *rvu, int blkaddr)
 	mcam->nixlf_offset = mcam->entries;
 	mcam->pf_offset = mcam->nixlf_offset + nixlf_count;
 
-	spin_lock_init(&mcam->lock);
+	mutex_init(&mcam->lock);
 
 	return 0;
 }
@@ -811,6 +811,8 @@ int rvu_npc_init(struct rvu *rvu)
 void rvu_npc_freemem(struct rvu *rvu)
 {
 	struct npc_pkind *pkind = &rvu->hw->pkind;
+	struct npc_mcam *mcam = &rvu->hw->mcam;
 
 	kfree(pkind->rsrc.bmap);
+	mutex_destroy(&mcam->lock);
 }
-- 
2.7.4

* [PATCH 02/20] octeontx2-af: Support to get NIX HW constants from AF
From: sunil.kovvuri @ 2018-11-08 18:35 UTC (permalink / raw)
  To: netdev, davem; +Cc: arnd, linux-soc, Kiran Kumar, Sunil Goutham
In-Reply-To: <1541702161-30673-1-git-send-email-sunil.kovvuri@gmail.com>

From: Kiran Kumar <kirankumark@marvell.com>

This patch reads HW limits, such as the number of Rx/Tx stats and
the number of queue IRQs supported per NIX LF, from AF registers
and syncs them to the PF/VF.
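
As a rough illustration of the field extraction, here is a userspace
sketch that decodes raw NIX_AF_CONST1/NIX_AF_CONST2 values with the same
shifts and masks the diff below uses. The struct nix_hw_consts and the
helper nix_decode_consts() are hypothetical names for this example only;
the driver itself fills struct nix_lf_alloc_rsp directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container mirroring the four fields added to the response. */
struct nix_hw_consts {
	uint8_t  lf_rx_stats;	/* NIX_AF_CONST1::LF_RX_STATS */
	uint8_t  lf_tx_stats;	/* NIX_AF_CONST1::LF_TX_STATS */
	uint16_t cints;		/* NIX_AF_CONST2::CINTS */
	uint16_t qints;		/* NIX_AF_CONST2::QINTS */
};

/* Decode raw register values using the shifts and masks from the patch. */
static void nix_decode_consts(uint64_t const1, uint64_t const2,
			      struct nix_hw_consts *out)
{
	assert(out);

	out->lf_rx_stats = (const1 >> 32) & 0xFF;
	out->lf_tx_stats = (const1 >> 24) & 0xFF;
	out->qints = (const2 >> 12) & 0xFFF;
	out->cints = (const2 >> 24) & 0xFFF;
}
```

The register values passed in are whatever rvu_read64() returns; nothing
here touches hardware.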

Signed-off-by: Kiran Kumar <kirankumark@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
---
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h    | 4 ++++
 drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c | 8 ++++++++
 2 files changed, 12 insertions(+)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index a4e0fb5..89db883 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -393,6 +393,10 @@ struct nix_lf_alloc_rsp {
 	u8	lso_tsov4_idx;
 	u8	lso_tsov6_idx;
 	u8      mac_addr[ETH_ALEN];
+	u8	lf_rx_stats; /* NIX_AF_CONST1::LF_RX_STATS */
+	u8	lf_tx_stats; /* NIX_AF_CONST1::LF_TX_STATS */
+	u16	cints; /* NIX_AF_CONST2::CINTS */
+	u16	qints; /* NIX_AF_CONST2::QINTS */
 };
 
 /* NIX AQ enqueue msg */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
index b8d8bb9..0b37a88 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
@@ -829,6 +829,14 @@ int rvu_mbox_handler_NIX_LF_ALLOC(struct rvu *rvu,
 	rsp->tx_chan_cnt = pfvf->tx_chan_cnt;
 	rsp->lso_tsov4_idx = NIX_LSO_FORMAT_IDX_TSOV4;
 	rsp->lso_tsov6_idx = NIX_LSO_FORMAT_IDX_TSOV6;
+	/* Get HW supported stat count */
+	cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST1);
+	rsp->lf_rx_stats = ((cfg >> 32) & 0xFF);
+	rsp->lf_tx_stats = ((cfg >> 24) & 0xFF);
+	/* Get count of CQ IRQs and error IRQs supported per LF */
+	cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST2);
+	rsp->qints = ((cfg >> 12) & 0xFFF);
+	rsp->cints = ((cfg >> 24) & 0xFFF);
 	return rc;
 }
 
-- 
2.7.4

* [PATCH 01/20] octeontx2-af: Support to modify min/max allowed packet lengths
From: sunil.kovvuri @ 2018-11-08 18:35 UTC (permalink / raw)
  To: netdev, davem; +Cc: arnd, linux-soc, Sunil Goutham
In-Reply-To: <1541702161-30673-1-git-send-email-sunil.kovvuri@gmail.com>

From: Sunil Goutham <sgoutham@marvell.com>

This patch adds support for RVU PF/VFs to modify the min/max
packet lengths allowed by HW. For VFs on PF0, settings will
be automatically applied on the LBK link. The RX link's min/maxlen
is configured to the min/max of the PF and all its VFs. On the TX
side, if requested, all SMQs attached to the requesting NIXLF will
be updated with the new min/max lengths.
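
The RX link aggregation described above can be sketched in userspace
roughly as follows. This is a simplified model, not the driver code:
struct frs_limits and link_frs() are hypothetical names, and the sketch
ignores the update_minlen flag that nix_find_link_frs() honours in the
diff below. It only shows the rule itself: the link gets the largest
maxlen and the smallest non-zero minlen of the PF and its VFs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the per-function limits. */
struct frs_limits {
	uint16_t maxlen;
	uint16_t minlen;	/* 0 means "not configured" */
};

/*
 * Fold the limits of a PF and its VFs into one (maxlen, minlen) pair
 * for the shared RX link.
 */
static void link_frs(const struct frs_limits *funcs, size_t n,
		     uint16_t *maxlen, uint16_t *minlen)
{
	size_t i;

	assert(funcs && maxlen && minlen);

	*maxlen = 0;
	*minlen = 0;
	for (i = 0; i < n; i++) {
		if (funcs[i].maxlen > *maxlen)
			*maxlen = funcs[i].maxlen;
		/* Skip unconfigured entries, keep the smallest real value */
		if (funcs[i].minlen &&
		    (*minlen == 0 || funcs[i].minlen < *minlen))
			*minlen = funcs[i].minlen;
	}
}
```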

Also updates transmit credits for Tx links based on new maxlen.
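
For reference, the credit arithmetic can be reproduced in userspace as
below. This is a sketch under assumptions: tx_link_credits() and
tx_credit_set() are hypothetical helper names, and the extra masking of
the credit value is a precaution added here rather than something the
patch does. The constants (64K FIFO shared by the LMACs of a CGX, credit
units of 16 bytes, a 20-bit field at bit 12) are taken from the diff
below.

```c
#include <assert.h>
#include <stdint.h>

#define CGX_FIFO_LEN	65536	/* 64K, shared by all LMACs of one CGX */

/* Credits left after reserving room for one max-sized packet. */
static uint64_t tx_link_credits(unsigned int lmac_cnt, uint16_t maxlen)
{
	uint64_t lmac_fifo_len;

	assert(lmac_cnt > 0);
	lmac_fifo_len = CGX_FIFO_LEN / lmac_cnt;
	assert(lmac_fifo_len >= maxlen);

	return (lmac_fifo_len - maxlen) / 16;
}

/* Replace the 20-bit credit field at bit 12, keep the other bits. */
static uint64_t tx_credit_set(uint64_t cfg, uint64_t credits)
{
	cfg &= ~(0xFFFFFULL << 12);
	cfg |= (credits & 0xFFFFFULL) << 12;
	return cfg;
}
```

With four LMACs and maxlen set to NIC_HW_MAX_FRS (9212), each LMAC gets a
16384 byte FIFO share and (16384 - 9212) / 16 = 448 credits.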

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
---
 drivers/net/ethernet/marvell/octeontx2/af/cgx.h    |   1 +
 drivers/net/ethernet/marvell/octeontx2/af/common.h |   5 +
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h   |  12 +-
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h    |   4 +
 .../net/ethernet/marvell/octeontx2/af/rvu_nix.c    | 199 +++++++++++++++++++++
 5 files changed, 220 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cgx.h b/drivers/net/ethernet/marvell/octeontx2/af/cgx.h
index 0a66d27..3bd38ed 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cgx.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cgx.h
@@ -22,6 +22,7 @@
 
 #define MAX_CGX				3
 #define MAX_LMAC_PER_CGX		4
+#define CGX_FIFO_LEN			65536 /* 64K for both Rx & Tx */
 #define CGX_OFFSET(x)			((x) * MAX_LMAC_PER_CGX)
 
 /* Registers */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/common.h b/drivers/net/ethernet/marvell/octeontx2/af/common.h
index d39ada4..a8c89df 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/common.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/common.h
@@ -143,6 +143,11 @@ enum nix_scheduler {
 	NIX_TXSCH_LVL_CNT = 0x5,
 };
 
+/* Min/Max packet sizes, excluding FCS */
+#define	NIC_HW_MIN_FRS			40
+#define	NIC_HW_MAX_FRS			9212
+#define	SDP_HW_MAX_FRS			65535
+
 /* NIX RX action operation*/
 #define NIX_RX_ACTIONOP_DROP		(0x0ull)
 #define NIX_RX_ACTIONOP_UCAST		(0x1ull)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index a15a59c..a4e0fb5 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -160,7 +160,8 @@ M(NIX_STATS_RST,	0x8007, msg_req, msg_rsp)			\
 M(NIX_VTAG_CFG,	0x8008, nix_vtag_config, msg_rsp)		\
 M(NIX_RSS_FLOWKEY_CFG,  0x8009, nix_rss_flowkey_cfg, msg_rsp)		\
 M(NIX_SET_MAC_ADDR,	0x800a, nix_set_mac_addr, msg_rsp)		\
-M(NIX_SET_RX_MODE,	0x800b, nix_rx_mode, msg_rsp)
+M(NIX_SET_RX_MODE,	0x800b, nix_rx_mode, msg_rsp)			\
+M(NIX_SET_HW_FRS,	0x800c, nix_frs_cfg, msg_rsp)
 
 /* Messages initiated by AF (range 0xC00 - 0xDFF) */
 #define MBOX_UP_CGX_MESSAGES						\
@@ -522,4 +523,13 @@ struct nix_rx_mode {
 	u16	mode;
 };
 
+struct nix_frs_cfg {
+	struct mbox_msghdr hdr;
+	u8	update_smq;    /* Update SMQ's min/max lens */
+	u8	update_minlen; /* Set minlen also */
+	u8	sdp_link;      /* Set SDP RX link */
+	u16	maxlen;
+	u16	minlen;
+};
+
 #endif /* MBOX_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 2c0580c..4f4829e 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -122,6 +122,8 @@ struct rvu_pfvf {
 	u16		tx_chan_base;
 	u8              rx_chan_cnt; /* total number of RX channels */
 	u8              tx_chan_cnt; /* total number of TX channels */
+	u16		maxlen;
+	u16		minlen;
 
 	u8		mac_addr[ETH_ALEN]; /* MAC address of this PF/VF */
 
@@ -349,6 +351,8 @@ int rvu_mbox_handler_NIX_SET_MAC_ADDR(struct rvu *rvu,
 				      struct msg_rsp *rsp);
 int rvu_mbox_handler_NIX_SET_RX_MODE(struct rvu *rvu, struct nix_rx_mode *req,
 				     struct msg_rsp *rsp);
+int rvu_mbox_handler_NIX_SET_HW_FRS(struct rvu *rvu, struct nix_frs_cfg *req,
+				    struct msg_rsp *rsp);
 
 /* NPC APIs */
 int rvu_npc_init(struct rvu *rvu);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
index a5ab7ef..b8d8bb9 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
@@ -168,14 +168,20 @@ static int nix_interface_init(struct rvu *rvu, u16 pcifunc, int type, int nixlf)
 
 	rvu_npc_install_bcast_match_entry(rvu, pcifunc,
 					  nixlf, pfvf->rx_chan_base);
+	pfvf->maxlen = NIC_HW_MIN_FRS;
+	pfvf->minlen = NIC_HW_MIN_FRS;
 
 	return 0;
 }
 
 static void nix_interface_deinit(struct rvu *rvu, u16 pcifunc, u8 nixlf)
 {
+	struct rvu_pfvf *pfvf = rvu_get_pfvf(rvu, pcifunc);
 	int err;
 
+	pfvf->maxlen = 0;
+	pfvf->minlen = 0;
+
 	/* Remove this PF_FUNC from bcast pkt replication list */
 	err = nix_update_bcast_mce_list(rvu, pcifunc, false);
 	if (err) {
@@ -1778,6 +1784,196 @@ int rvu_mbox_handler_NIX_SET_RX_MODE(struct rvu *rvu, struct nix_rx_mode *req,
 	return 0;
 }
 
+static void nix_find_link_frs(struct rvu *rvu,
+			      struct nix_frs_cfg *req, u16 pcifunc)
+{
+	int pf = rvu_get_pf(pcifunc);
+	struct rvu_pfvf *pfvf;
+	int maxlen, minlen;
+	int numvfs, hwvf;
+	int vf;
+
+	/* Update with requester's min/max lengths */
+	pfvf = rvu_get_pfvf(rvu, pcifunc);
+	pfvf->maxlen = req->maxlen;
+	if (req->update_minlen)
+		pfvf->minlen = req->minlen;
+
+	maxlen = req->maxlen;
+	minlen = req->update_minlen ? req->minlen : 0;
+
+	/* Get this PF's numVFs and starting hwvf */
+	rvu_get_pf_numvfs(rvu, pf, &numvfs, &hwvf);
+
+	/* For each VF, compare requested max/minlen */
+	for (vf = 0; vf < numvfs; vf++) {
+		pfvf =  &rvu->hwvf[hwvf + vf];
+		if (pfvf->maxlen > maxlen)
+			maxlen = pfvf->maxlen;
+		if (req->update_minlen &&
+		    pfvf->minlen && pfvf->minlen < minlen)
+			minlen = pfvf->minlen;
+	}
+
+	/* Compare requested max/minlen with PF's max/minlen */
+	pfvf = &rvu->pf[pf];
+	if (pfvf->maxlen > maxlen)
+		maxlen = pfvf->maxlen;
+	if (req->update_minlen &&
+	    pfvf->minlen && pfvf->minlen < minlen)
+		minlen = pfvf->minlen;
+
+	/* Update the request with max/min PF's and it's VF's max/min */
+	req->maxlen = maxlen;
+	if (req->update_minlen)
+		req->minlen = minlen;
+}
+
+int rvu_mbox_handler_NIX_SET_HW_FRS(struct rvu *rvu, struct nix_frs_cfg *req,
+				    struct msg_rsp *rsp)
+{
+	struct rvu_hwinfo *hw = rvu->hw;
+	u16 pcifunc = req->hdr.pcifunc;
+	int pf = rvu_get_pf(pcifunc);
+	int blkaddr, schq, link = -1;
+	struct nix_txsch *txsch;
+	u64 cfg, lmac_fifo_len;
+	struct nix_hw *nix_hw;
+	u8 cgx = 0, lmac = 0;
+
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, pcifunc);
+	if (blkaddr < 0)
+		return NIX_AF_ERR_AF_LF_INVALID;
+
+	nix_hw = get_nix_hw(rvu->hw, blkaddr);
+	if (!nix_hw)
+		return -EINVAL;
+
+	if (!req->sdp_link && req->maxlen > NIC_HW_MAX_FRS)
+		return NIX_AF_ERR_FRS_INVALID;
+
+	if (req->update_minlen && req->minlen < NIC_HW_MIN_FRS)
+		return NIX_AF_ERR_FRS_INVALID;
+
+	/* Check if requester wants to update SMQ's */
+	if (!req->update_smq)
+		goto rx_frscfg;
+
+	/* Update min/maxlen in each of the SMQ attached to this PF/VF */
+	txsch = &nix_hw->txsch[NIX_TXSCH_LVL_SMQ];
+	spin_lock(&rvu->rsrc_lock);
+	for (schq = 0; schq < txsch->schq.max; schq++) {
+		if (txsch->pfvf_map[schq] != pcifunc)
+			continue;
+		cfg = rvu_read64(rvu, blkaddr, NIX_AF_SMQX_CFG(schq));
+		cfg = (cfg & ~(0xFFFFULL << 8)) | ((u64)req->maxlen << 8);
+		if (req->update_minlen)
+			cfg = (cfg & ~0x7FULL) | ((u64)req->minlen & 0x7F);
+		rvu_write64(rvu, blkaddr, NIX_AF_SMQX_CFG(schq), cfg);
+	}
+	spin_unlock(&rvu->rsrc_lock);
+
+rx_frscfg:
+	/* Check if config is for SDP link */
+	if (req->sdp_link) {
+		if (!hw->sdp_links)
+			return NIX_AF_ERR_RX_LINK_INVALID;
+		link = hw->cgx_links + hw->lbk_links;
+		goto linkcfg;
+	}
+
+	/* Check if the request is from CGX mapped RVU PF */
+	if (is_pf_cgxmapped(rvu, pf)) {
+		/* Get CGX and LMAC to which this PF is mapped and find link */
+		rvu_get_cgx_lmac_id(rvu->pf2cgxlmac_map[pf], &cgx, &lmac);
+		link = (cgx * hw->lmac_per_cgx) + lmac;
+	} else if (pf == 0) {
+		/* For VFs of PF0 ingress is LBK port, so config LBK link */
+		link = hw->cgx_links;
+	}
+
+	if (link < 0)
+		return NIX_AF_ERR_RX_LINK_INVALID;
+
+	nix_find_link_frs(rvu, req, pcifunc);
+
+linkcfg:
+	cfg = rvu_read64(rvu, blkaddr, NIX_AF_RX_LINKX_CFG(link));
+	cfg = (cfg & ~(0xFFFFULL << 16)) | ((u64)req->maxlen << 16);
+	if (req->update_minlen)
+		cfg = (cfg & ~0xFFFFULL) | req->minlen;
+	rvu_write64(rvu, blkaddr, NIX_AF_RX_LINKX_CFG(link), cfg);
+
+	if (req->sdp_link || pf == 0)
+		return 0;
+
+	/* Update transmit credits for CGX links */
+	lmac_fifo_len =
+		CGX_FIFO_LEN / cgx_get_lmac_cnt(rvu_cgx_pdata(cgx, rvu));
+	cfg = rvu_read64(rvu, blkaddr, NIX_AF_TX_LINKX_NORM_CREDIT(link));
+	cfg &= ~(0xFFFFFULL << 12);
+	cfg |=  ((lmac_fifo_len - req->maxlen) / 16) << 12;
+	rvu_write64(rvu, blkaddr, NIX_AF_TX_LINKX_NORM_CREDIT(link), cfg);
+	rvu_write64(rvu, blkaddr, NIX_AF_TX_LINKX_EXPR_CREDIT(link), cfg);
+
+	return 0;
+}
+
+static void nix_link_config(struct rvu *rvu, int blkaddr)
+{
+	struct rvu_hwinfo *hw = rvu->hw;
+	int cgx, lmac_cnt, slink, link;
+	u64 tx_credits;
+
+	/* Set default min/max packet lengths allowed on NIX Rx links.
+	 *
+	 * With HW reset minlen value of 60byte, HW will treat ARP pkts
+	 * as undersize and report them to SW as error pkts, hence
+	 * setting it to 40 bytes.
+	 */
+	for (link = 0; link < (hw->cgx_links + hw->lbk_links); link++) {
+		rvu_write64(rvu, blkaddr, NIX_AF_RX_LINKX_CFG(link),
+			    NIC_HW_MAX_FRS << 16 | NIC_HW_MIN_FRS);
+	}
+
+	if (hw->sdp_links) {
+		link = hw->cgx_links + hw->lbk_links;
+		rvu_write64(rvu, blkaddr, NIX_AF_RX_LINKX_CFG(link),
+			    SDP_HW_MAX_FRS << 16 | NIC_HW_MIN_FRS);
+	}
+
+	/* Set credits for Tx links assuming max packet length allowed.
+	 * This will be reconfigured based on MTU set for PF/VF.
+	 */
+	for (cgx = 0; cgx < hw->cgx; cgx++) {
+		lmac_cnt = cgx_get_lmac_cnt(rvu_cgx_pdata(cgx, rvu));
+		tx_credits = ((CGX_FIFO_LEN / lmac_cnt) - NIC_HW_MAX_FRS) / 16;
+		/* Enable credits and set credit pkt count to max allowed */
+		tx_credits =  (tx_credits << 12) | (0x1FF << 2) | BIT_ULL(1);
+		slink = cgx * hw->lmac_per_cgx;
+		for (link = slink; link < (slink + lmac_cnt); link++) {
+			rvu_write64(rvu, blkaddr,
+				    NIX_AF_TX_LINKX_NORM_CREDIT(link),
+				    tx_credits);
+			rvu_write64(rvu, blkaddr,
+				    NIX_AF_TX_LINKX_EXPR_CREDIT(link),
+				    tx_credits);
+		}
+	}
+
+	/* Set Tx credits for LBK link */
+	slink = hw->cgx_links;
+	for (link = slink; link < (slink + hw->lbk_links); link++) {
+		tx_credits = 1000; /* 10 * max LBK datarate = 10 * 100Gbps */
+		/* Enable credits and set credit pkt count to max allowed */
+		tx_credits =  (tx_credits << 12) | (0x1FF << 2) | BIT_ULL(1);
+		rvu_write64(rvu, blkaddr,
+			    NIX_AF_TX_LINKX_NORM_CREDIT(link), tx_credits);
+		rvu_write64(rvu, blkaddr,
+			    NIX_AF_TX_LINKX_EXPR_CREDIT(link), tx_credits);
+	}
+}
+
 static int nix_calibrate_x2p(struct rvu *rvu, int blkaddr)
 {
 	int idx, err;
@@ -1922,6 +2118,9 @@ int rvu_nix_init(struct rvu *rvu)
 			    (NPC_LID_LC << 8) | (NPC_LT_LC_IP << 4) | 0x0F);
 
 		nix_rx_flowkey_alg_cfg(rvu, blkaddr);
+
+		/* Initialize CGX/LBK/SDP link credits, min/max pkt lengths */
+		nix_link_config(rvu, blkaddr);
 	}
 	return 0;
 }
-- 
2.7.4

* [PATCH 00/20] octeontx2-af: NPC MCAM support and FLR handling
From: sunil.kovvuri @ 2018-11-08 18:35 UTC (permalink / raw)
  To: netdev, davem; +Cc: arnd, linux-soc, Sunil Goutham

From: Sunil Goutham <sgoutham@marvell.com>

This patchset continues the three earlier patch series that add
a new driver for the Resource Virtualization Unit (RVU) admin
function of Marvell's OcteonTX2 SoC.

1. octeontx2-af: Add RVU Admin Function driver
   https://www.spinics.net/lists/netdev/msg528272.html
2. octeontx2-af: NPA and NIX blocks initialization
   https://www.spinics.net/lists/netdev/msg529163.html
3. octeontx2-af: NPC parser and NIX blocks initialization
   https://www.spinics.net/lists/netdev/msg530252.html

This patch series adds support for the following:
RVU generic:
- Function Level Reset irq handler
  When FLR is triggered for a PF, the AF receives an interrupt.
  This patchset adds logic to clean up the NPA, NIX
  and NPC block resources used by that PF.

- Mailbox communication between AF and its VFs.
  Unlike VFs of PF1-PFn, AF, which is PF0, can communicate
  with its VFs directly. Added support for the same.

- AF's VFs IO configuration
  These VFs are mapped to use internal HW loopback channels
  instead of CGX LMACs. Each pair of VFs works as the two ends
  of a hardwired interface: VF0's Tx is VF1's Rx and vice versa.
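
The VF pairing in the last item can be pictured with a small userspace
sketch. It assumes the pairs are formed from adjacent VF indices (0/1,
2/3, ...), which matches the VF0/VF1 example in the text; the helper name
lbk_peer_vf() is hypothetical and not part of the driver.

```c
#include <assert.h>

/*
 * Hypothetical helper: index of the VF at the other end of a loopback
 * pair. Even VFs talk to the next VF, odd VFs to the previous one.
 */
static unsigned int lbk_peer_vf(unsigned int vf, unsigned int numvfs)
{
	/* Pairing only makes sense for an even number of VFs */
	assert(numvfs % 2 == 0);
	assert(vf < numvfs);

	return (vf & 1) ? vf - 1 : vf + 1;
}
```

Applying the helper twice returns the original VF, which is what makes
each pair behave like the two ends of one wire.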

NPC block:
- MCAM entry management
  Alloc/free of contiguous/non-contiguous and lower/higher
  priority MCAM entries, plus programming support.
- MCAM counters management and map/unmap with MCAM entries
- Default KEY extract profile
- HW errata workarounds
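
To make "contiguous" allocation concrete, here is a minimal first-fit
sketch over a boolean in-use map. It is an illustration only: the name
alloc_contig() is hypothetical, the sketch ignores the lower/higher
priority placement mentioned above, and it is not how the driver's MCAM
allocator is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Find the first run of 'count' free entries, mark them used and
 * return the index of the first one, or -1 if no such run exists.
 */
static int alloc_contig(bool *used, size_t total, size_t count)
{
	size_t i, run = 0;

	assert(used && count > 0);

	for (i = 0; i < total; i++) {
		run = used[i] ? 0 : run + 1;
		if (run == count) {
			size_t start = i + 1 - count;

			for (size_t j = start; j <= i; j++)
				used[j] = true;
			return (int)start;
		}
	}
	return -1;
}
```

A non-contiguous request would instead take any free entries, so it can
succeed where a contiguous one of the same size fails.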

NIX block:
- Minimum and maximum allowed packet length config
- HW errata workarounds

A few more changes, such as switching from spinlocks to mutexes,
are also part of this patchset.

Geetha sowjanya (2):
  octeontx2-af: Add FLR interrupt handler
  octeontx2-af: Teardown NPA, NIX LF upon receiving FLR

Kiran Kumar (1):
  octeontx2-af: Support to get NIX HW constants from AF

Linu Cherian (1):
  octeontx2-af: Add interrupt handlers for Master Enable event

Santosh Shukla (1):
  octeontx2-af: Add MKEX default profile

Stanislaw Kardach (1):
  octeontx2-af: Relax resource lock into mutex

Sunil Goutham (10):
  octeontx2-af: Support to modify min/max allowed packet lengths
  octeontx2-af: NPC MCAM entry alloc/free support
  octeontx2-af: MCAM entry installation support
  octeontx2-af: Support for NPC MCAM counters
  octeontx2-af: Map or unmap NPC MCAM entry and counter
  octeontx2-af: Alloc and config NPC MCAM entry at a time
  octeontx2-af: Support to enable/disable default MCAM entries
  octeontx2-af: Verify NPA/SSO/NIX PF_FUNC mapping
  octeontx2-af: Add FLR handling support for AF's VFs
  octeontx2-af: Workarounds for HW errata

Tomasz Duszynski (4):
  octeontx2-af: Add support for stripping STAG/CTAG
  octeontx2-af: Mbox communication support btw AF and it's VFs
  octeontx2-af: Enable sriov on AF to create VFs
  octeontx2-af: Configure AF VFs to talk over LBK channels

 drivers/net/ethernet/marvell/octeontx2/af/cgx.h    |    1 +
 drivers/net/ethernet/marvell/octeontx2/af/common.h |    7 +
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h   |  200 ++-
 drivers/net/ethernet/marvell/octeontx2/af/npc.h    |   30 +
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c    |  937 +++++++++++--
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h    |  118 +-
 .../net/ethernet/marvell/octeontx2/af/rvu_cgx.c    |    6 +-
 .../net/ethernet/marvell/octeontx2/af/rvu_nix.c    |  495 ++++++-
 .../net/ethernet/marvell/octeontx2/af/rvu_npa.c    |   17 +
 .../net/ethernet/marvell/octeontx2/af/rvu_npc.c    | 1386 +++++++++++++++++++-
 10 files changed, 2975 insertions(+), 222 deletions(-)

-- 
2.7.4

* Re: Latest net-next kernel 4.19.0+
From: Cong Wang @ 2018-11-08 18:35 UTC (permalink / raw)
  To: Paweł Staszewski
  Cc: Saeed Mahameed, Eric Dumazet, Linux Kernel Network Developers,
	Dimitris Michailidis
In-Reply-To: <f4759eda-8a47-3114-f944-69b9ff6c2e87@itcare.pl>

On Thu, Nov 1, 2018 at 3:59 PM Paweł Staszewski <pstaszewski@itcare.pl> wrote:
>
>
>
> On 31.10.2018 at 22:17, Cong Wang wrote:
> > On Wed, Oct 31, 2018 at 2:05 PM Saeed Mahameed <saeedm@mellanox.com> wrote:
> >> Cong, How often does this happen ? can you some how verify if the
> >> problematic packet has extra end padding after the ip payload ?
> > For us, we need 10+ hours to get one warning. This is also
> > why we never capture the packet that causes this warning.
> >
> >
> >> It would be cool if we had a feature in kernel to store such SKB in
> >> memory when such issue occurs, and let the user dump it later (via
> >> tcpdump) and send the dump to the vendor for debug so we could just
> >> replay and see what happens.
> >>
> > Yeah, the warning kinda sucks, it tells almost nothing, the SKB
> > should be dumped up on this warning.
> >
>
> So another vlan and same hw csum - this time this vlan have less traffic
> so i catch traffic with tcpdump
> Nov  1 23:46:22 kernel: vlan2805: hw csum failure
> but the problem is there is about 1986 frames in that second
> Will tcpdump output helps ?

Looks like you don't have any IP fragments.

Did you try Eric's debugging patch? Does it make a difference?

Also, if doable, can you try to remove vlan from your setup to see if
the warning will be gone?

Thanks!

* Re: [PATCH net-next 1/2] dpaa2-eth: defer probe on object allocate
From: Andrew Lunn @ 2018-11-08 18:25 UTC (permalink / raw)
  To: Ioana Ciornei
  Cc: netdev@vger.kernel.org, davem@davemloft.net,
	Ioana Ciocoi Radulescu
In-Reply-To: <1541683054-22273-2-git-send-email-ioana.ciornei@nxp.com>

On Thu, Nov 08, 2018 at 01:17:47PM +0000, Ioana Ciornei wrote:
> The fsl_mc_object_allocate function can fail because not all allocatable
> objects are probed by the fsl_mc_allocator at the call time. Defer the
> dpaa2-eth probe when this happens.
> 
> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
> ---
>  drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 30 +++++++++++++++++-------
>  1 file changed, 21 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
> index 88f7acc..71f5cd4 100644
> --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
> +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
> @@ -1434,8 +1434,11 @@ static struct fsl_mc_device *setup_dpcon(struct dpaa2_eth_priv *priv)
>  	err = fsl_mc_object_allocate(to_fsl_mc_device(dev),
>  				     FSL_MC_POOL_DPCON, &dpcon);
>  	if (err) {
> -		dev_info(dev, "Not enough DPCONs, will go on as-is\n");
> -		return NULL;
> +		if (err == -ENXIO)
> +			err = -EPROBE_DEFER;
> +		else
> +			dev_info(dev, "Not enough DPCONs, will go on as-is\n");
> +		return ERR_PTR(err);
>  	}
>  
>  	err = dpcon_open(priv->mc_io, 0, dpcon->obj_desc.id, &dpcon->mc_handle);
> @@ -1493,8 +1496,10 @@ static void free_dpcon(struct dpaa2_eth_priv *priv,
>  		return NULL;
>  
>  	channel->dpcon = setup_dpcon(priv);
> -	if (!channel->dpcon)
> +	if (IS_ERR_OR_NULL(channel->dpcon)) {
> +		err = PTR_ERR(channel->dpcon);
>  		goto err_setup;
> +	}

Hi Ioana

You need to be careful with IS_ERR_OR_NULL(). If it is a NULL,
PTR_ERR() is going to return 0. You then jump to the error cleanup
code, but return 0, meaning everything is O.K.

      Andrew
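
The pitfall can be reproduced in userspace with simplified stand-ins for
the <linux/err.h> helpers. This is a sketch, not the kernel
implementation: the macros are re-created here as plain functions,
setup_buggy() and setup_fixed() are hypothetical names, and returning
-ENOMEM for the NULL case is just one possible choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO	4095

/* Simplified userspace stand-ins for the kernel's error-pointer helpers */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Mirrors the pattern in the patch: a NULL object yields "error" 0 */
static long setup_buggy(void *obj)
{
	if (IS_ERR_OR_NULL(obj))
		return PTR_ERR(obj);
	return 0;
}

/* One possible fix: give the NULL case an explicit error code */
static long setup_fixed(void *obj)
{
	if (IS_ERR_OR_NULL(obj))
		return obj ? PTR_ERR(obj) : -ENOMEM;
	return 0;
}
```

With a NULL object, setup_buggy() takes the error path yet returns 0,
so the caller sees success even though the cleanup code has run.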

* Re: [PATCH net-next v2 3/5] virtio_ring: add packed ring support
From: Michael S. Tsirkin @ 2018-11-09  4:00 UTC (permalink / raw)
  To: Jason Wang
  Cc: Tiwei Bie, virtualization, linux-kernel, netdev, virtio-dev, wexu,
	jfreimann
In-Reply-To: <67bd6a88-00f2-ed13-ad13-bdfe92ceeffc@redhat.com>

On Fri, Nov 09, 2018 at 10:30:50AM +0800, Jason Wang wrote:
> 
> On 2018/11/8 11:56 PM, Michael S. Tsirkin wrote:
> > On Thu, Nov 08, 2018 at 07:51:48PM +0800, Tiwei Bie wrote:
> > > On Thu, Nov 08, 2018 at 04:18:25PM +0800, Jason Wang wrote:
> > > > On 2018/11/8 9:38 AM, Tiwei Bie wrote:
> > > > > > > +
> > > > > > > +	if (vq->vq.num_free < descs_used) {
> > > > > > > +		pr_debug("Can't add buf len %i - avail = %i\n",
> > > > > > > +			 descs_used, vq->vq.num_free);
> > > > > > > +		/* FIXME: for historical reasons, we force a notify here if
> > > > > > > +		 * there are outgoing parts to the buffer.  Presumably the
> > > > > > > +		 * host should service the ring ASAP. */
> > > > > > I don't think we have a reason to do this for packed ring.
> > > > > > No historical baggage there, right?
> > > > > Based on the original commit log, it seems that the notify here
> > > > > is just an "optimization". But I don't quite understand what does
> > > > > the "the heuristics which KVM uses" refer to. If it's safe to drop
> > > > > this in packed ring, I'd like to do it.
> > > > 
> > > > According to the commit log, it seems like a workaround of lguest networking
> > > > backend.
> > > Do you know why removing this notify in Tx will break "the
> > > heuristics which KVM uses"? Or what does "the heuristics
> > > which KVM uses" refer to?
> > Yes. QEMU has a mode where it disables notifications and processes TX
> > ring periodically from a timer.  It's off by default but used to be on
> > by default a long time ago. If ring becomes full this causes traffic
> > stalls.
> 
> 
> Do you mean tx-timer? If yes, we can still enable it for packed ring

Yes we can but I doubt anyone does.

> and the
> timer will finally fired and we can go.

on tx ring full we probably don't want to wait for timer.
But I think we can just prevent qemu from using tx timer
with virtio 1.
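
For readers following along, the ring-full path under discussion can be
modelled in userspace roughly like this. It is a toy sketch: struct
toy_vq, toy_add_buf() and the legacy_kick flag are made up for
illustration and are not virtio API. The flag stands for "keep the
historical forced notify", which is what the split ring does today and
what is proposed to be dropped for packed ring.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model of a virtqueue: free descriptors and forced notifications */
struct toy_vq {
	unsigned int num_free;
	unsigned int kicks;
};

/*
 * Add a buffer with 'out' device-readable and 'in' device-writable
 * parts. On ring full, notify only if the legacy behaviour is kept
 * and the buffer has outgoing parts.
 */
static int toy_add_buf(struct toy_vq *vq, unsigned int out, unsigned int in,
		       bool legacy_kick)
{
	assert(vq);

	if (vq->num_free < out + in) {
		if (legacy_kick && out)
			vq->kicks++;
		return -ENOSPC;
	}

	vq->num_free -= out + in;
	return 0;
}
```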

> 
> > As a work-around Rusty put in this hack to kick on ring full
> > even with notifications disabled.
> 
> 
> From the commit log it looks more like a performance workaround instead of a
> bug fix.

it's a quality of implementation issue, yes.

> 
> > It's easy enough to make sure QEMU
> > does not combine devices with packed ring support with the timer hack.
> > And I am guessing it's safe enough to also block that option completely
> > e.g. when virtio 1.0 is enabled.
> 
> 
> I agree.
> 
> Thanks
> 
> 
> > > > I agree to drop it, we should not have such burden.
> > > > 
> > > > But we should notice that, with this removed, the compare between packed vs
> > > > split is kind of unfair. Consider the removal of lguest support recently,
> > > > maybe we can drop this for split ring as well?
> > > > 
> > > > Thanks
> > > > 
> > > > 
> > > > > commit 44653eae1407f79dff6f52fcf594ae84cb165ec4
> > > > > Author: Rusty Russell<rusty@rustcorp.com.au>
> > > > > Date:   Fri Jul 25 12:06:04 2008 -0500
> > > > > 
> > > > >       virtio: don't always force a notification when ring is full
> > > > >       We force notification when the ring is full, even if the host has
> > > > >       indicated it doesn't want to know.  This seemed like a good idea at
> > > > >       the time: if we fill the transmit ring, we should tell the host
> > > > >       immediately.
> > > > >       Unfortunately this logic also applies to the receiving ring, which is
> > > > >       refilled constantly.  We should introduce real notification thesholds
> > > > >       to replace this logic.  Meanwhile, removing the logic altogether breaks
> > > > >       the heuristics which KVM uses, so we use a hack: only notify if there are
> > > > >       outgoing parts of the new buffer.
> > > > >       Here are the number of exits with lguest's crappy network implementation:
> > > > >       Before:
> > > > >               network xmit 7859051 recv 236420
> > > > >       After:
> > > > >               network xmit 7858610 recv 118136
> > > > >       Signed-off-by: Rusty Russell<rusty@rustcorp.com.au>
> > > > > 
> > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > index 72bf8bc09014..21d9a62767af 100644
> > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > @@ -87,8 +87,11 @@ static int vring_add_buf(struct virtqueue *_vq,
> > > > >    	if (vq->num_free < out + in) {
> > > > >    		pr_debug("Can't add buf len %i - avail = %i\n",
> > > > >    			 out + in, vq->num_free);
> > > > > -		/* We notify*even if*  VRING_USED_F_NO_NOTIFY is set here. */
> > > > > -		vq->notify(&vq->vq);
> > > > > +		/* FIXME: for historical reasons, we force a notify here if
> > > > > +		 * there are outgoing parts to the buffer.  Presumably the
> > > > > +		 * host should service the ring ASAP. */
> > > > > +		if (out)
> > > > > +			vq->notify(&vq->vq);
> > > > >    		END_USE(vq);
> > > > >    		return -ENOSPC;
> > > > >    	}
> > > > > 
> > > > > 

* Re: [PATCH net-next v2 3/5] virtio_ring: add packed ring support
From: Michael S. Tsirkin @ 2018-11-09  3:58 UTC (permalink / raw)
  To: Jason Wang
  Cc: Tiwei Bie, virtualization, linux-kernel, netdev, virtio-dev, wexu,
	jfreimann
In-Reply-To: <21d6dbd9-8f78-6939-0e80-27b470aeb00a@redhat.com>

On Fri, Nov 09, 2018 at 10:25:28AM +0800, Jason Wang wrote:
> 
> On 2018/11/8 10:14 PM, Michael S. Tsirkin wrote:
> > On Thu, Nov 08, 2018 at 04:18:25PM +0800, Jason Wang wrote:
> > > On 2018/11/8 9:38 AM, Tiwei Bie wrote:
> > > > > > +
> > > > > > +	if (vq->vq.num_free < descs_used) {
> > > > > > +		pr_debug("Can't add buf len %i - avail = %i\n",
> > > > > > +			 descs_used, vq->vq.num_free);
> > > > > > +		/* FIXME: for historical reasons, we force a notify here if
> > > > > > +		 * there are outgoing parts to the buffer.  Presumably the
> > > > > > +		 * host should service the ring ASAP. */
> > > > > I don't think we have a reason to do this for packed ring.
> > > > > No historical baggage there, right?
> > > > Based on the original commit log, it seems that the notify here
> > > > is just an "optimization". But I don't quite understand what
> > > > "the heuristics which KVM uses" refers to. If it's safe to drop
> > > > this in packed ring, I'd like to do it.
> > > 
> > > According to the commit log, it seems to be a workaround for the lguest
> > > networking backend. I agree to drop it; we should not carry such a burden.
> > > 
> > > But note that, with this removed, the comparison between packed and
> > > split becomes kind of unfair.
> > I don't think this ever triggers to be frank. When would it?
> 
> 
> I think it can happen e.g. in the path of XDP transmission in
> __virtnet_xdp_xmit_one():
> 
> 
>         err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdpf, GFP_ATOMIC);
>         if (unlikely(err))
>                 return -ENOSPC; /* Caller handle free/refcnt */
> 

I see. We used to do it for regular xmit but stopped
doing it. Is it fine for xdp then?

> > 
> > > Considering the recent removal of lguest support,
> > > maybe we can drop this for split ring as well?
> > > 
> > > Thanks
> > If it's helpful, then for sure we can drop it for virtio 1.
> > Can you see any perf differences at all? With which device?
> 
> 
> I haven't tested, but consider the case of XDP_TX in guest plus vhost_net in
> host. Since vhost_net is half duplex, it's pretty easy to trigger this
> condition.
> 
> Thanks

Sounds reasonable. Worth testing before we change things though.
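
For anyone following along, the split-ring behaviour under discussion can be
modelled in userspace roughly as follows. This is only a sketch of the check
from the 2008 commit quoted below, not the kernel code; toy_vq, toy_add_buf
and the kicks counter are names made up for illustration.

```c
#include <errno.h>

/* Toy stand-in for the split virtqueue state; not the kernel struct. */
struct toy_vq {
	unsigned int num_free;	/* free descriptors left in the ring */
	unsigned int kicks;	/* how many times the host was notified */
};

/*
 * Try to add a buffer made of 'out' driver-to-device and 'in'
 * device-to-driver descriptors. When the ring is full, the host is
 * kicked only if the buffer has outgoing parts, mirroring the
 * "if (out)" check in the quoted commit.
 */
int toy_add_buf(struct toy_vq *vq, unsigned int out, unsigned int in)
{
	if (vq->num_free < out + in) {
		if (out)
			vq->kicks++;	/* forced notify on a full ring */
		return -ENOSPC;
	}
	vq->num_free -= out + in;
	return 0;
}
```

With one free descriptor, a two-descriptor transmit fails with -ENOSPC and
bumps kicks, while a receive-only refill fails without a kick. Dropping the
forced notify for packed ring would amount to removing the "if (out)" branch.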

> 
> > 
> > > > commit 44653eae1407f79dff6f52fcf594ae84cb165ec4
> > > > Author: Rusty Russell <rusty@rustcorp.com.au>
> > > > Date:   Fri Jul 25 12:06:04 2008 -0500
> > > > 
> > > >       virtio: don't always force a notification when ring is full
> > > >       We force notification when the ring is full, even if the host has
> > > >       indicated it doesn't want to know.  This seemed like a good idea at
> > > >       the time: if we fill the transmit ring, we should tell the host
> > > >       immediately.
> > > >       Unfortunately this logic also applies to the receiving ring, which is
> > > >       refilled constantly.  We should introduce real notification thesholds
> > > >       to replace this logic.  Meanwhile, removing the logic altogether breaks
> > > >       the heuristics which KVM uses, so we use a hack: only notify if there are
> > > >       outgoing parts of the new buffer.
> > > >       Here are the number of exits with lguest's crappy network implementation:
> > > >       Before:
> > > >               network xmit 7859051 recv 236420
> > > >       After:
> > > >               network xmit 7858610 recv 118136
> > > >       Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > > > 
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 72bf8bc09014..21d9a62767af 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -87,8 +87,11 @@ static int vring_add_buf(struct virtqueue *_vq,
> > > >    	if (vq->num_free < out + in) {
> > > >    		pr_debug("Can't add buf len %i - avail = %i\n",
> > > >    			 out + in, vq->num_free);
> > > > -		/* We notify *even if* VRING_USED_F_NO_NOTIFY is set here. */
> > > > -		vq->notify(&vq->vq);
> > > > +		/* FIXME: for historical reasons, we force a notify here if
> > > > +		 * there are outgoing parts to the buffer.  Presumably the
> > > > +		 * host should service the ring ASAP. */
> > > > +		if (out)
> > > > +			vq->notify(&vq->vq);
> > > >    		END_USE(vq);
> > > >    		return -ENOSPC;
> > > >    	}
> > > > 
> > > > 

* Re: [PATCH v3 bpf-next 4/4] bpftool: support loading flow dissector
From: Quentin Monnet @ 2018-11-08 18:21 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Stanislav Fomichev, netdev, linux-kselftest, ast, daniel, shuah,
	jakub.kicinski, guro, jiong.wang, bhole_prashant_q7,
	john.fastabend, jbenc, treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181108180153.tbssxcgkkq5xcdxc@mini-arch>


2018-11-08 10:01 UTC-0800 ~ Stanislav Fomichev <sdf@fomichev.me>
> On 11/08, Quentin Monnet wrote:
>> Hi Stanislav, thanks for the changes! More comments below.
> Thank you for another round of review!
> 
>> 2018-11-07 21:39 UTC-0800 ~ Stanislav Fomichev <sdf@google.com>
>>> This commit adds support for loading/attaching/detaching flow
>>> dissector program. The structure of the flow dissector program is
>>> assumed to be the same as in the selftests:
>>>
>>> * flow_dissector section with the main entry point
>>> * a bunch of tail call progs
>>> * a jmp_table map that is populated with the tail call progs
>>>
>>> When `bpftool load` is called with a flow_dissector prog (i.e. when the
>>> first section is flow_dissector or the 'type flow_dissector' argument is
>>> passed), we load and pin all the programs/maps. The user is responsible
>>> for constructing the jump table for the tail calls.
>>>
>>> The last argument of `bpftool attach` is made optional for this use
>>> case.
>>>
>>> Example:
>>> bpftool prog load tools/testing/selftests/bpf/bpf_flow.o \
>>> 	/sys/fs/bpf/flow type flow_dissector
>>>
>>> bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
>>>          key 0 0 0 0 \
>>>          value pinned /sys/fs/bpf/flow/IP
>>>
>>> bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
>>>          key 1 0 0 0 \
>>>          value pinned /sys/fs/bpf/flow/IPV6
>>>
>>> bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
>>>          key 2 0 0 0 \
>>>          value pinned /sys/fs/bpf/flow/IPV6OP
>>>
>>> bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
>>>          key 3 0 0 0 \
>>>          value pinned /sys/fs/bpf/flow/IPV6FR
>>>
>>> bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
>>>          key 4 0 0 0 \
>>>          value pinned /sys/fs/bpf/flow/MPLS
>>>
>>> bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
>>>          key 5 0 0 0 \
>>>          value pinned /sys/fs/bpf/flow/VLAN
>>>
>>> bpftool prog attach pinned /sys/fs/bpf/flow/flow_dissector flow_dissector
>>>
>>> Tested by using the above lines to load the prog in
>>> the test_flow_dissector.sh selftest.
>>>
>>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
>>> ---
>>>   .../bpftool/Documentation/bpftool-prog.rst    |  36 ++++--
>>>   tools/bpf/bpftool/bash-completion/bpftool     |   6 +-
>>>   tools/bpf/bpftool/common.c                    |  30 ++---
>>>   tools/bpf/bpftool/main.h                      |   1 +
>>>   tools/bpf/bpftool/prog.c                      | 112 +++++++++++++-----
>>>   5 files changed, 126 insertions(+), 59 deletions(-)
>>>
>>> diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
>>> index ac4e904b10fb..0374634c3087 100644
>>> --- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
>>> +++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
>>> @@ -15,7 +15,8 @@ SYNOPSIS
>>>   	*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-f** | **--bpffs** } }
>>>   	*COMMANDS* :=
>>> -	{ **show** | **list** | **dump xlated** | **dump jited** | **pin** | **load** | **help** }
>>> +	{ **show** | **list** | **dump xlated** | **dump jited** | **pin** | **load**
>>> +	| **loadall** | **help** }
>>>   MAP COMMANDS
>>>   =============
>>> @@ -24,9 +25,9 @@ MAP COMMANDS
>>>   |	**bpftool** **prog dump xlated** *PROG* [{**file** *FILE* | **opcodes** | **visual**}]
>>>   |	**bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | **opcodes**}]
>>>   |	**bpftool** **prog pin** *PROG* *FILE*
>>> -|	**bpftool** **prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
>>> -|       **bpftool** **prog attach** *PROG* *ATTACH_TYPE* *MAP*
>>> -|       **bpftool** **prog detach** *PROG* *ATTACH_TYPE* *MAP*
>>> +|	**bpftool** **prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
>>> +|       **bpftool** **prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
>>> +|       **bpftool** **prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
>>>   |	**bpftool** **prog help**
>>>   |
>>>   |	*MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
>>> @@ -39,7 +40,9 @@ MAP COMMANDS
>>>   |		**cgroup/bind4** | **cgroup/bind6** | **cgroup/post_bind4** | **cgroup/post_bind6** |
>>>   |		**cgroup/connect4** | **cgroup/connect6** | **cgroup/sendmsg4** | **cgroup/sendmsg6**
>>>   |	}
>>> -|       *ATTACH_TYPE* := { **msg_verdict** | **skb_verdict** | **skb_parse** }
>>> +|       *ATTACH_TYPE* := {
>>> +|		**msg_verdict** | **skb_verdict** | **skb_parse** | **flow_dissector**
>>> +|	}
>>>   DESCRIPTION
>>> @@ -79,8 +82,11 @@ DESCRIPTION
>>>   		  contain a dot character ('.'), which is reserved for future
>>>   		  extensions of *bpffs*.
>>> -	**bpftool prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
>>> +	**bpftool prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
>>>   		  Load bpf program from binary *OBJ* and pin as *FILE*.
>>> +		  **bpftool prog load** will pin only the first bpf program
>>> +		  from the *OBJ*, **bpftool prog loadall** will pin all maps
>>> +		  and programs from the *OBJ*.
>>
>> This could be improved regarding maps: with "bpftool prog load" I think we
>> also load and pin all maps, but your description implies this is only the
>> case with "loadall"
> I don't think we pin any maps with `bpftool prog load`, we certainly load
> them, but we don't pin any afaict. Can you point me to the code where we
> pin the maps?
> 

My bad. I read "pin" but thought "load". It does not pin them indeed,
sorry about that.

>>>   		  **type** is optional, if not specified program type will be
>>>   		  inferred from section names.
>>>   		  By default bpftool will create new maps as declared in the ELF
>>> @@ -97,13 +103,17 @@ DESCRIPTION
>>>   		  contain a dot character ('.'), which is reserved for future
>>>   		  extensions of *bpffs*.
>>> -        **bpftool prog attach** *PROG* *ATTACH_TYPE* *MAP*
>>> -                  Attach bpf program *PROG* (with type specified by *ATTACH_TYPE*)
>>> -                  to the map *MAP*.
>>> -
>>> -        **bpftool prog detach** *PROG* *ATTACH_TYPE* *MAP*
>>> -                  Detach bpf program *PROG* (with type specified by *ATTACH_TYPE*)
>>> -                  from the map *MAP*.
>>> +        **bpftool prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
>>> +                  Attach bpf program *PROG* (with type specified by
>>> +                  *ATTACH_TYPE*). Most *ATTACH_TYPEs* require a *MAP*
>>> +                  parameter, with the exception of *flow_dissector* which is
>>> +                  attached to current networking name space.
>>> +
>>> +        **bpftool prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
>>> +                  Detach bpf program *PROG* (with type specified by
>>> +                  *ATTACH_TYPE*). Most *ATTACH_TYPEs* require a *MAP*
>>> +                  parameter, with the exception of *flow_dissector* which is
>>> +                  detached from the current networking name space.
>>
>> While at it could you please fix those two paragraphs to use tabs for
>> indentation, as the rest of the doc? Thanks!
> Time to teach my vim to use tabs in .rst files. Sorry about that.

Those paragraphs were using spaces already, so you didn't introduce that
:). But all others use tabs so it's a good occasion to fix it.

>>>   	**bpftool prog help**
>>>   		  Print short help message.
>>> diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
>>> index 3f78e6404589..ad0fc919f7ec 100644
>>> --- a/tools/bpf/bpftool/bash-completion/bpftool
>>> +++ b/tools/bpf/bpftool/bash-completion/bpftool
>>> @@ -243,7 +243,7 @@ _bpftool()
>>>       # Completion depends on object and command in use
>>>       case $object in
>>>           prog)
>>> -            if [[ $command != "load" ]]; then
>>> +            if [[ $command != "load" && $command != "loadall" ]]; then
>>>                   case $prev in
>>>                       id)
>>>                           _bpftool_get_prog_ids
>>> @@ -299,7 +299,7 @@ _bpftool()
>>>                       fi
>>>                       if [[ ${#words[@]} == 6 ]]; then
>>> -                        COMPREPLY=( $( compgen -W "msg_verdict skb_verdict skb_parse" -- "$cur" ) )
>>> +                        COMPREPLY=( $( compgen -W "msg_verdict skb_verdict skb_parse flow_dissector" -- "$cur" ) )
>>>                           return 0
>>>                       fi
>>> @@ -309,7 +309,7 @@ _bpftool()
>>>                       fi
>>>                       return 0
>>>                       ;;
>>> -                load)
>>> +                load|loadall)
>>>                       local obj
>>>                       if [[ ${#words[@]} -lt 6 ]]; then
>>
>> You also want to update completion for the program types, at line 341 or so.
>> Feel free to split that list on several lines, by the way :).
> Will do, thanks!
> 
>>> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
>>> index 25af85304ebe..f671a921dec5 100644
>>> --- a/tools/bpf/bpftool/common.c
>>> +++ b/tools/bpf/bpftool/common.c
>>> @@ -169,34 +169,24 @@ int open_obj_pinned_any(char *path, enum bpf_obj_type exp_type)
>>>   	return fd;
>>>   }
>>> -int do_pin_fd(int fd, const char *name)
>>> +int mount_bpffs_for_pin(const char *name)
>>>   {
>>>   	char err_str[ERR_MAX_LEN];
>>>   	char *file;
>>>   	char *dir;
>>>   	int err = 0;
>>> -	err = bpf_obj_pin(fd, name);
>>> -	if (!err)
>>> -		goto out;
>>> -
>>>   	file = malloc(strlen(name) + 1);
>>>   	strcpy(file, name);
>>>   	dir = dirname(file);
>>> -	if (errno != EPERM || is_bpffs(dir)) {
>>> -		p_err("can't pin the object (%s): %s", name, strerror(errno));
>>> +	if (is_bpffs(dir)) {
>>> +		/* nothing to do if already mounted */
>>>   		goto out_free;
>>>   	}
>>
>> Nitpick: unnecessary brackets.
> Ack.
> 
>>> -	/* Attempt to mount bpffs, then retry pinning. */
>>>   	err = mnt_bpffs(dir, err_str, ERR_MAX_LEN);
>>> -	if (!err) {
>>> -		err = bpf_obj_pin(fd, name);
>>> -		if (err)
>>> -			p_err("can't pin the object (%s): %s", name,
>>> -			      strerror(errno));
>>> -	} else {
>>> +	if (err) {
>>>   		err_str[ERR_MAX_LEN - 1] = '\0';
>>>   		p_err("can't mount BPF file system to pin the object (%s): %s",
>>>   		      name, err_str);
>>> @@ -204,10 +194,20 @@ int do_pin_fd(int fd, const char *name)
>>>   out_free:
>>>   	free(file);
>>> -out:
>>>   	return err;
>>>   }
>>> +int do_pin_fd(int fd, const char *name)
>>> +{
>>> +	int err;
>>> +
>>> +	err = mount_bpffs_for_pin(name);
>>> +	if (err)
>>> +		return err;
>>> +
>>> +	return bpf_obj_pin(fd, name);
>>> +}
>>> +
>>>   int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32))
>>>   {
>>>   	unsigned int id;
>>> diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
>>> index 28322ace2856..1383824c9baf 100644
>>> --- a/tools/bpf/bpftool/main.h
>>> +++ b/tools/bpf/bpftool/main.h
>>> @@ -129,6 +129,7 @@ const char *get_fd_type_name(enum bpf_obj_type type);
>>>   char *get_fdinfo(int fd, const char *key);
>>>   int open_obj_pinned(char *path);
>>>   int open_obj_pinned_any(char *path, enum bpf_obj_type exp_type);
>>> +int mount_bpffs_for_pin(const char *name);
>>>   int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32));
>>>   int do_pin_fd(int fd, const char *name);
>>> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
>>> index 5302ee282409..a4346dd673b1 100644
>>> --- a/tools/bpf/bpftool/prog.c
>>> +++ b/tools/bpf/bpftool/prog.c
>>> @@ -81,6 +81,7 @@ static const char * const attach_type_strings[] = {
>>>   	[BPF_SK_SKB_STREAM_PARSER] = "stream_parser",
>>>   	[BPF_SK_SKB_STREAM_VERDICT] = "stream_verdict",
>>>   	[BPF_SK_MSG_VERDICT] = "msg_verdict",
>>> +	[BPF_FLOW_DISSECTOR] = "flow_dissector",
>>>   	[__MAX_BPF_ATTACH_TYPE] = NULL,
>>>   };
>>> @@ -724,10 +725,11 @@ int map_replace_compar(const void *p1, const void *p2)
>>>   static int do_attach(int argc, char **argv)
>>>   {
>>>   	enum bpf_attach_type attach_type;
>>> -	int err, mapfd, progfd;
>>> +	int err, progfd;
>>> +	int mapfd = 0;
>>> -	if (!REQ_ARGS(5)) {
>>> -		p_err("too few parameters for map attach");
>>> +	if (!REQ_ARGS(3)) {
>>> +		p_err("too few parameters for attach");
>>>   		return -EINVAL;
>>>   	}
>>> @@ -740,11 +742,17 @@ static int do_attach(int argc, char **argv)
>>>   		p_err("invalid attach type");
>>>   		return -EINVAL;
>>>   	}
>>> -	NEXT_ARG();
>>> +	if (attach_type != BPF_FLOW_DISSECTOR) {
>>> +		NEXT_ARG();
>>> +		if (!REQ_ARGS(2)) {
>>> +			p_err("too few parameters for map attach");
>>> +			return -EINVAL;
>>> +		}
>>> -	mapfd = map_parse_fd(&argc, &argv);
>>> -	if (mapfd < 0)
>>> -		return mapfd;
>>> +		mapfd = map_parse_fd(&argc, &argv);
>>> +		if (mapfd < 0)
>>> +			return mapfd;
>>> +	}
>>>   	err = bpf_prog_attach(progfd, mapfd, attach_type, 0);
>>>   	if (err) {
>>> @@ -760,10 +768,11 @@ static int do_attach(int argc, char **argv)
>>>   static int do_detach(int argc, char **argv)
>>>   {
>>>   	enum bpf_attach_type attach_type;
>>> -	int err, mapfd, progfd;
>>> +	int err, progfd;
>>> +	int mapfd = 0;
>>> -	if (!REQ_ARGS(5)) {
>>> -		p_err("too few parameters for map detach");
>>> +	if (!REQ_ARGS(3)) {
>>> +		p_err("too few parameters for detach");
>>>   		return -EINVAL;
>>>   	}
>>> @@ -776,11 +785,17 @@ static int do_detach(int argc, char **argv)
>>>   		p_err("invalid attach type");
>>>   		return -EINVAL;
>>>   	}
>>> -	NEXT_ARG();
>>> +	if (attach_type != BPF_FLOW_DISSECTOR) {
>>> +		NEXT_ARG();
>>> +		if (!REQ_ARGS(2)) {
>>> +			p_err("too few parameters for map detach");
>>> +			return -EINVAL;
>>> +		}
>>
>> Would that make sense to factor argument checks or parsing for do_attach()
>> and do_detach() to some extent? In order to reduce the number of
>> attach-type-based exceptions to add in the code if we have other attach
>> types that do not take maps in the future.
> I can move all argument parsing into a new function and use it from both
> do_attach and do_detach.

Sounds good to me, thanks!
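
As a rough illustration, a shared helper could look something like the
standalone sketch below. Everything prefixed with toy_ is made up for this
example, and programs and maps are reduced to plain integer ids here; the
real code would keep going through prog_parse_fd() and map_parse_fd().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the attach types handled by bpftool. */
enum toy_attach_type {
	TOY_MSG_VERDICT,
	TOY_SKB_VERDICT,
	TOY_SKB_PARSE,
	TOY_FLOW_DISSECTOR,
	TOY_ATTACH_INVALID,
};

struct toy_attach_args {
	int prog_id;
	enum toy_attach_type type;
	int map_id;		/* 0 when the attach type takes no map */
};

enum toy_attach_type toy_parse_attach_type(const char *str)
{
	static const char * const names[] = {
		[TOY_MSG_VERDICT]	= "msg_verdict",
		[TOY_SKB_VERDICT]	= "skb_verdict",
		[TOY_SKB_PARSE]		= "skb_parse",
		[TOY_FLOW_DISSECTOR]	= "flow_dissector",
	};
	int i;

	for (i = 0; i < TOY_ATTACH_INVALID; i++)
		if (!strcmp(str, names[i]))
			return (enum toy_attach_type)i;
	return TOY_ATTACH_INVALID;
}

/* Single place to list attach types that do not take a MAP argument. */
bool toy_attach_type_needs_map(enum toy_attach_type type)
{
	return type != TOY_FLOW_DISSECTOR;
}

/* Parse "PROG_ID ATTACH_TYPE [MAP_ID]"; shared by attach and detach. */
int toy_parse_attach_detach_args(int argc, char **argv,
				 struct toy_attach_args *args)
{
	if (argc < 2)
		return -EINVAL;

	args->prog_id = atoi(argv[0]);
	args->type = toy_parse_attach_type(argv[1]);
	if (args->type == TOY_ATTACH_INVALID)
		return -EINVAL;

	args->map_id = 0;
	if (toy_attach_type_needs_map(args->type)) {
		if (argc < 3)
			return -EINVAL;
		args->map_id = atoi(argv[2]);
	}
	return 0;
}
```

With this shape, a future map-less attach type only needs a change in
toy_attach_type_needs_map(), and do_attach()/do_detach() would differ only
in the final bpf_prog_attach()/bpf_prog_detach2() call.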

>>> -	mapfd = map_parse_fd(&argc, &argv);
>>> -	if (mapfd < 0)
>>> -		return mapfd;
>>> +		mapfd = map_parse_fd(&argc, &argv);
>>> +		if (mapfd < 0)
>>> +			return mapfd;
>>> +	}
>>>   	err = bpf_prog_detach2(progfd, mapfd, attach_type);
>>>   	if (err) {
>>> @@ -792,15 +807,16 @@ static int do_detach(int argc, char **argv)
>>>   		jsonw_null(json_wtr);
>>>   	return 0;
>>>   }
>>> -static int do_load(int argc, char **argv)
>>> +
>>> +static int load_with_options(int argc, char **argv, bool first_prog_only)
>>>   {
>>>   	enum bpf_attach_type expected_attach_type;
>>>   	struct bpf_object_open_attr attr = {
>>>   		.prog_type	= BPF_PROG_TYPE_UNSPEC,
>>>   	};
>>>   	struct map_replace *map_replace = NULL;
>>> +	struct bpf_program *prog = NULL, *pos;
>>>   	unsigned int old_map_fds = 0;
>>> -	struct bpf_program *prog;
>>>   	struct bpf_object *obj;
>>>   	struct bpf_map *map;
>>>   	const char *pinfile;
>>> @@ -918,14 +934,20 @@ static int do_load(int argc, char **argv)
>>>   		goto err_free_reuse_maps;
>>>   	}
>>> -	prog = bpf_program__next(NULL, obj);
>>> -	if (!prog) {
>>> -		p_err("object file doesn't contain any bpf program");
>>> -		goto err_close_obj;
>>> +	if (first_prog_only) {
>>> +		prog = bpf_program__next(NULL, obj);
>>> +		if (!prog) {
>>> +			p_err("object file doesn't contain any bpf program");
>>> +			goto err_close_obj;
>>> +		}
>>>   	}
>>> -	bpf_program__set_ifindex(prog, ifindex);
>>>   	if (attr.prog_type == BPF_PROG_TYPE_UNSPEC) {
>>> +		if (!prog) {
>>> +			p_err("can not guess program type when loading all programs\n");
>>> +			goto err_close_obj;
>>> +		}
>>> +
>>>   		const char *sec_name = bpf_program__title(prog, false);
>>>   		err = libbpf_prog_type_by_name(sec_name, &attr.prog_type,
>>> @@ -936,8 +958,13 @@ static int do_load(int argc, char **argv)
>>>   			goto err_close_obj;
>>>   		}
>>>   	}
>>> -	bpf_program__set_type(prog, attr.prog_type);
>>> -	bpf_program__set_expected_attach_type(prog, expected_attach_type);
>>> +
>>> +	bpf_object__for_each_program(pos, obj) {
>>> +		bpf_program__set_ifindex(pos, ifindex);
>>> +		bpf_program__set_type(pos, attr.prog_type);
>>> +		bpf_program__set_expected_attach_type(pos,
>>> +						      expected_attach_type);
>>> +	}
>>
>> I still believe you can have programs of different types here, and be able
>> to load them. I tried it and managed to have it working fine. If no type is
>> provided from command line we can retrieve types for each program from its
>> section name. If a type is provided on the command line, we can do the same,
>> but I am not sure we should do it, or impose that type for all programs
>> instead.
> I can move auto-detection into this new bpf_object__for_each_program
> loop. So if no type is specified, try to infer the type from each prog
> section name, otherwise, use the provided one for all progs. Do we want
> something like that?

This is what I have in mind. But others may disagree.
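
Roughly, the selection logic would be something like the standalone sketch
below. It does not use libbpf at all: the toy_ types and the two-entry
section table are assumptions made for the example, standing in for
libbpf_prog_type_by_name() and the bpf_program__set_type() calls.

```c
#include <stddef.h>
#include <string.h>

enum toy_prog_type {
	TOY_PROG_TYPE_UNSPEC,
	TOY_PROG_TYPE_SCHED_CLS,
	TOY_PROG_TYPE_XDP,
};

struct toy_prog {
	const char *sec_name;		/* ELF section the program lives in */
	enum toy_prog_type type;	/* filled in by toy_set_types() */
};

/* Tiny made-up lookup table; libbpf knows many more section names. */
enum toy_prog_type toy_type_by_sec_name(const char *sec_name)
{
	if (!strcmp(sec_name, "classifier"))
		return TOY_PROG_TYPE_SCHED_CLS;
	if (!strcmp(sec_name, "xdp"))
		return TOY_PROG_TYPE_XDP;
	return TOY_PROG_TYPE_UNSPEC;
}

/*
 * If 'forced' is set (type given on the command line), use it for all
 * programs. Otherwise guess each program's type from its own section
 * name. Returns -1 as soon as a type cannot be guessed.
 */
int toy_set_types(struct toy_prog *progs, size_t n, enum toy_prog_type forced)
{
	size_t i;

	for (i = 0; i < n; i++) {
		progs[i].type = forced != TOY_PROG_TYPE_UNSPEC ?
				forced : toy_type_by_sec_name(progs[i].sec_name);
		if (progs[i].type == TOY_PROG_TYPE_UNSPEC)
			return -1;
	}
	return 0;
}
```

An object with a "classifier" and an "xdp" section would then get two
different types when no type is passed, and a single type when one is forced.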

> Btw, do you have some existing real life example of where it's needed so
> I can test this new implementation? (maybe something under samples/ ?)

I thought about an ELF file containing both an XDP and a TC classifier
program for example. XDP can mark packets for TC, then TC processes them
with all the facilities we have for skbs. It does not _have_ to be in
the same ELF file, but could be.

I haven't searched samples/bpf/ in depth, but a grep on SEC shows a
couple of files with several types (kprobe/kretprobe, classifier/xdp).
samples/bpf/xdp2skb_meta_kern.c looks like a good candidate. Or actually
for testing purposes, I simply used the following:

	#define SEC(NAME) __attribute__((section(NAME), used))

	int _version SEC("version") = 1;

	SEC("classifier")
	int func()
	{
		return 1;
	}

	SEC("xdp")
	int funcbar()
	{
		return 0;
	}

>>>   	qsort(map_replace, old_map_fds, sizeof(*map_replace),
>>>   	      map_replace_compar);
>>> @@ -1001,9 +1028,25 @@ static int do_load(int argc, char **argv)
>>>   		goto err_close_obj;
>>>   	}
>>> -	if (do_pin_fd(bpf_program__fd(prog), pinfile))
>>> +	err = mount_bpffs_for_pin(pinfile);
>>> +	if (err)
>>>   		goto err_close_obj;
>>> +	if (prog) {
>>
>> Nit: Maybe "if (first_prog_only) {" instead? If I understand correctly, at
>> this stage it should be equivalent, but in my opinion it would make it
>> easier to understand why we have two cases here.
> Sure, I can do that if you think that's more readable, I don't have a
> preference.

Thanks!
Quentin


* Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO
From: Alexei Starovoitov @ 2018-11-08 18:21 UTC (permalink / raw)
  To: Edward Cree
  Cc: Martin Lau, Yonghong Song, Alexei Starovoitov,
	daniel@iogearbox.net, netdev@vger.kernel.org, Kernel Team
In-Reply-To: <14c5120a-b67a-a514-5e9d-2895a4a841df@solarflare.com>

On Thu, Nov 08, 2018 at 05:58:56PM +0000, Edward Cree wrote:
> 
> > Happy to jump on the call to explain it again.
> > 10:30am pacific time works for me tomorrow.
> That works for me (that's in ~30 minutes from now if I've converted
>  correctly.)  Please email me offlist with the phone number to call.

no offlist. public link for anyone to join:
https://bluejeans.com/867080076/

I have hard cutoff at 11am though.

* Re: [PATCHv2] net: stmmac: Fix RX packet size > 8191
From: David Miller @ 2018-11-09  3:47 UTC (permalink / raw)
  To: thor.thayer
  Cc: peppe.cavallaro, alexandre.torgue, joabreu, netdev, linux-kernel
In-Reply-To: <1541698935-9752-1-git-send-email-thor.thayer@linux.intel.com>

From: thor.thayer@linux.intel.com
Date: Thu,  8 Nov 2018 11:42:14 -0600

> From: Thor Thayer <thor.thayer@linux.intel.com>
> 
> Ping problems with packets > 8191 as shown:
> 
> PING 192.168.1.99 (192.168.1.99) 8150(8178) bytes of data.
> 8158 bytes from 192.168.1.99: icmp_seq=1 ttl=64 time=0.669 ms
> wrong data byte 8144 should be 0xd0 but was 0x0
> 16    10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
>       20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
> %< ---------------snip--------------------------------------
> 8112  b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf
>       c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf
> 8144  0 0 0 0 d0 d1
>       ^^^^^^^
> Notice the 4 bytes of 0 before the expected byte of d0.
> 
> Databook notes that the RX buffer must be a multiple of 4/8/16
> bytes [1].
> 
> Update the DMA Buffer size define to 8188 instead of 8192. Remove
> the -1 from the RX buffer size allocations and use the new
> DMA Buffer size directly.
> 
> [1] Synopsys DesignWare Cores Ethernet MAC Universal v3.70a
>     [section 8.4.2 - Table 8-24]
> 
> Tested on SoCFPGA Stratix10 with ping sweep from 100 to 8300 byte packets.
> 
> Fixes: 286a83721720 ("stmmac: add CHAINED descriptor mode support (V4)")
> Suggested-by: Jose Abreu <jose.abreu@synopsys.com>
> Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>

Applied.

* Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
From: Song Liu @ 2018-11-08 18:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Netdev, lkml, Kernel Team, ast@kernel.org, daniel@iogearbox.net,
	acme@kernel.org
In-Reply-To: <20181108150028.GU9761@hirez.programming.kicks-ass.net>

Hi Peter,

> On Nov 8, 2018, at 7:00 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Wed, Nov 07, 2018 at 06:25:04PM +0000, Song Liu wrote:
>> 
>> 
>>> On Nov 7, 2018, at 12:40 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> 
>>> On Tue, Nov 06, 2018 at 12:52:42PM -0800, Song Liu wrote:
>>>> For better performance analysis of BPF programs, this patch introduces
>>>> PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
>>>> load/unload information to user space.
>>>> 
>>>>       /*
>>>>        * Record different types of bpf events:
>>>>        *   enum perf_bpf_event_type {
>>>>        *      PERF_BPF_EVENT_UNKNOWN          = 0,
>>>>        *      PERF_BPF_EVENT_PROG_LOAD        = 1,
>>>>        *      PERF_BPF_EVENT_PROG_UNLOAD      = 2,
>>>>        *   };
>>>>        *
>>>>        * struct {
>>>>        *      struct perf_event_header header;
>>>>        *      u16 type;
>>>>        *      u16 flags;
>>>>        *      u32 id;  // prog_id or map_id
>>>>        * };
>>>>        */
>>>>       PERF_RECORD_BPF_EVENT                   = 17,
>>>> 
>>>> PERF_RECORD_BPF_EVENT contains minimal information about the BPF program.
>>>> Perf utility (or other user space tools) should listen to this event and
>>>> fetch more details about the event via BPF syscalls
>>>> (BPF_PROG_GET_FD_BY_ID, BPF_OBJ_GET_INFO_BY_FD, etc.).
>>> 
>>> Why !? You're failing to explain why it cannot provide the full
>>> information there.
>> 
>> Aha, I missed this part. I will add the following to next version. Please
>> let me know if anything is not clear.
> 
>> 
>> This design decision was made for the following reasons. First, BPF
>> programs could be loaded-and-jited and/or unloaded before/during/after
>> a perf-record run. Once a BPF program is unloaded, it is impossible to
>> recover details of the program. It is impossible to provide the
>> information through a simple key (like the build ID). Second, BPF prog
>> annotation is under rapid development. More information will be
>> added to bpf_prog_info in the next few releases. Including all the
>> information of a BPF program in the perf ring buffer requires frequent
>> changes to the perf ABI, and thus makes it very difficult to manage
>> compatibility of the perf utility.
> 
> So I don't agree with that reasoning. If you want symbol information
> you'll just have to commit to some form of ABI. That bpf_prog_info is an
> ABI too.

At the beginning of the perf-record run, perf needs to query the bpf_prog_info
of already loaded BPF programs. Therefore, we need to commit to the
bpf_prog_info ABI. If we also include full information about the BPF program
in the perf ring buffer, we will commit to TWO ABIs.

Also, perf-record writes the event to the perf.data file, so the data needs
to be serialized. This is implemented in patch 4/5. To include the data in
the ring buffer, we would need another piece of code in the kernel to do the
same serialization work.

On the other hand, processing BPF load/unload events synchronously should
not introduce too much overhead for meaningful use cases. If many BPF progs
are being loaded/unloaded within a short period of time, it is not the steady
state that profiling work cares about.
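
To make the user space side concrete, here is a minimal sketch of decoding
one such record from a raw buffer. The record layout follows the comment
quoted above; the toy_ names and the return convention are assumptions made
for the example, and the actual lookup via BPF_PROG_GET_FD_BY_ID and
BPF_OBJ_GET_INFO_BY_FD is left as a comment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as struct perf_event_header in the perf uapi. */
struct toy_perf_event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

enum toy_perf_bpf_event_type {
	TOY_PERF_BPF_EVENT_UNKNOWN	= 0,
	TOY_PERF_BPF_EVENT_PROG_LOAD	= 1,
	TOY_PERF_BPF_EVENT_PROG_UNLOAD	= 2,
};

#define TOY_PERF_RECORD_BPF_EVENT	17

/* Record body as proposed in this patch. */
struct toy_bpf_event {
	struct toy_perf_event_header header;
	uint16_t type;
	uint16_t flags;
	uint32_t id;	/* prog_id or map_id */
};

/*
 * Copy one record out of a raw buffer. Returns 1 and fills *prog_id if
 * this is a program load that user space should look up right away,
 * 0 for any other well-formed BPF event, -1 if it is not a BPF event.
 */
int toy_handle_record(const void *buf, size_t len, uint32_t *prog_id)
{
	struct toy_bpf_event ev;

	if (len < sizeof(ev))
		return -1;
	memcpy(&ev, buf, sizeof(ev));
	if (ev.header.type != TOY_PERF_RECORD_BPF_EVENT ||
	    ev.header.size < sizeof(ev))
		return -1;
	if (ev.type != TOY_PERF_BPF_EVENT_PROG_LOAD)
		return 0;
	/* Here perf would fetch bpf_prog_info via the BPF syscalls. */
	*prog_id = ev.id;
	return 1;
}
```

The point of the sketch is that the record itself stays tiny and fixed, so
additions to bpf_prog_info do not require touching the perf record format.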

Would these resolve your concerns? 

Thanks,
Song

* Re: a propose of snmp counter document
From: Cong Wang @ 2018-11-08 18:05 UTC (permalink / raw)
  To: yupeng0921; +Cc: Linux Kernel Network Developers
In-Reply-To: <CAG3TDc3Za0hMk6r=iNq-rkCw9U-w1OL7bn2X=Sc4HP5FCWLcwA@mail.gmail.com>

On Thu, Nov 8, 2018 at 12:10 AM peng yu <yupeng0921@gmail.com> wrote:
>
> I'm planning to write a document which explains the meaning of the
> kernel snmp counters, and combine the explanations with some tests,
> because I found lots of the 'TcpExt' and 'IpExt' counters are not
> explained in any document. Here is a draft:
> https://github.com/yupeng0921/iproute2_learning/blob/master/nstat.md
> It is still ongoing. I think it might be useful. Besides putting it in my
> git repo, does anyone have a suggestion for a place where I
> could contribute this document?

Good work! It has been in my todo list for a long time.

I believe we have enough room in Documentation/networking/
for it. You can follow the normal patch submission process to
contribute to upstream:

https://www.kernel.org/doc/html/v4.17/process/submitting-patches.html


Thanks!

* Re: Kernel 4.19 network performance - forwarding/routing normal users traffic
From: David Ahern @ 2018-11-08 18:05 UTC (permalink / raw)
  To: Paweł Staszewski, Jesper Dangaard Brouer; +Cc: netdev, Yoel Caspersen
In-Reply-To: <6d52b197-c303-eeac-2992-cedfd78115c0@itcare.pl>

On 11/8/18 10:30 AM, Paweł Staszewski wrote:
> Wondering about this:
> swapper     0 [045] 68494.770287: fib:fib_table_lookup: table 254 oif 0
> iif 6 proto 1 192.168.22.237/0 -> 172.16.0.2/0 tos 0 scope 0 flags 0 ==>
> dev vlan1740 gw 0.0.0.0 src 172.16.0.1 err 0
>             7fff818c13b5 fib_table_lookup ([kernel.kallsyms])
> 
> oif 0 ?
> 
> Is that correct here ?

ingress path so iif is set to the vlan device and oif is 0.

egress lookups (e.g., locally generated traffic) have oif non-0.

* Re: [PATCH net-next] net: qca_spi: Add available buffer space verification
From: David Miller @ 2018-11-09  3:41 UTC (permalink / raw)
  To: stefan.wahren; +Cc: michael.heimpold, netdev, linux-kernel
In-Reply-To: <1541684301-15824-1-git-send-email-stefan.wahren@i2se.com>

From: Stefan Wahren <stefan.wahren@i2se.com>
Date: Thu,  8 Nov 2018 14:38:21 +0100

> Interference on the SPI line could distort the response of
> available buffer space. So at least we should check that the
> response doesn't exceed the maximum available buffer space.
> In case of error, increment a new error counter and retry later.
> This behavior avoids buffer errors in the QCA7000, which
> result in an unnecessary chip reset including packet loss.
> 
> Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>

Applied.
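
For readers of the archive, the check described in the commit message
amounts to something like the sketch below. It is not the driver code: the
struct, its field names and the choice of -EAGAIN are assumptions made for
illustration only.

```c
#include <errno.h>
#include <stdint.h>

/* Toy driver state; the field names are invented for this sketch. */
struct toy_qca {
	uint16_t hw_buf_len;		/* largest value the chip can report */
	unsigned int buf_avail_err;	/* implausible responses seen so far */
};

/*
 * Validate a "bytes available" response read over SPI. A value larger
 * than the hardware buffer can only come from a corrupted transfer, so
 * count it and ask the caller to retry later instead of trusting it.
 */
int toy_check_buf_avail(struct toy_qca *qca, uint16_t reported,
			uint16_t *available)
{
	if (reported > qca->hw_buf_len) {
		qca->buf_avail_err++;
		return -EAGAIN;
	}
	*available = reported;
	return 0;
}
```

A distorted response such as 0xffff is rejected and counted, while the
previously accepted value is left untouched for the next attempt.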

* Re: [PATCH v3 bpf-next 4/4] bpftool: support loading flow dissector
From: Stanislav Fomichev @ 2018-11-08 18:01 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: Stanislav Fomichev, netdev, linux-kselftest, ast, daniel, shuah,
	jakub.kicinski, guro, jiong.wang, bhole_prashant_q7,
	john.fastabend, jbenc, treeze.taeung, yhs, osk, sandipan
In-Reply-To: <8c35340e-3ed7-70cd-3123-7cd0fb8824a7@netronome.com>

On 11/08, Quentin Monnet wrote:
> Hi Stanislav, thanks for the changes! More comments below.
Thank you for another round of review!

> 2018-11-07 21:39 UTC-0800 ~ Stanislav Fomichev <sdf@google.com>
> > This commit adds support for loading/attaching/detaching flow
> > dissector program. The structure of the flow dissector program is
> > assumed to be the same as in the selftests:
> > 
> > * flow_dissector section with the main entry point
> > * a bunch of tail call progs
> > * a jmp_table map that is populated with the tail call progs
> > 
> > When `bpftool load` is called with a flow_dissector prog (i.e. when the
> > first section is flow_dissector or the 'type flow_dissector' argument is
> > passed), we load and pin all the programs/maps. The user is responsible for
> > constructing the jump table for the tail calls.
> > 
> > The last argument of `bpftool attach` is made optional for this use
> > case.
> > 
> > Example:
> > bpftool prog load tools/testing/selftests/bpf/bpf_flow.o \
> > 	/sys/fs/bpf/flow type flow_dissector
> > 
> > bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
> >          key 0 0 0 0 \
> >          value pinned /sys/fs/bpf/flow/IP
> > 
> > bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
> >          key 1 0 0 0 \
> >          value pinned /sys/fs/bpf/flow/IPV6
> > 
> > bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
> >          key 2 0 0 0 \
> >          value pinned /sys/fs/bpf/flow/IPV6OP
> > 
> > bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
> >          key 3 0 0 0 \
> >          value pinned /sys/fs/bpf/flow/IPV6FR
> > 
> > bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
> >          key 4 0 0 0 \
> >          value pinned /sys/fs/bpf/flow/MPLS
> > 
> > bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
> >          key 5 0 0 0 \
> >          value pinned /sys/fs/bpf/flow/VLAN
> > 
> > bpftool prog attach pinned /sys/fs/bpf/flow/flow_dissector flow_dissector
> > 
> > Tested by using the above lines to load the prog in
> > the test_flow_dissector.sh selftest.
> > 
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >   .../bpftool/Documentation/bpftool-prog.rst    |  36 ++++--
> >   tools/bpf/bpftool/bash-completion/bpftool     |   6 +-
> >   tools/bpf/bpftool/common.c                    |  30 ++---
> >   tools/bpf/bpftool/main.h                      |   1 +
> >   tools/bpf/bpftool/prog.c                      | 112 +++++++++++++-----
> >   5 files changed, 126 insertions(+), 59 deletions(-)
> > 
> > diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> > index ac4e904b10fb..0374634c3087 100644
> > --- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> > +++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> > @@ -15,7 +15,8 @@ SYNOPSIS
> >   	*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-f** | **--bpffs** } }
> >   	*COMMANDS* :=
> > -	{ **show** | **list** | **dump xlated** | **dump jited** | **pin** | **load** | **help** }
> > +	{ **show** | **list** | **dump xlated** | **dump jited** | **pin** | **load**
> > +	| **loadall** | **help** }
> >   MAP COMMANDS
> >   =============
> > @@ -24,9 +25,9 @@ MAP COMMANDS
> >   |	**bpftool** **prog dump xlated** *PROG* [{**file** *FILE* | **opcodes** | **visual**}]
> >   |	**bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | **opcodes**}]
> >   |	**bpftool** **prog pin** *PROG* *FILE*
> > -|	**bpftool** **prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
> > -|       **bpftool** **prog attach** *PROG* *ATTACH_TYPE* *MAP*
> > -|       **bpftool** **prog detach** *PROG* *ATTACH_TYPE* *MAP*
> > +|	**bpftool** **prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
> > +|       **bpftool** **prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
> > +|       **bpftool** **prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
> >   |	**bpftool** **prog help**
> >   |
> >   |	*MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
> > @@ -39,7 +40,9 @@ MAP COMMANDS
> >   |		**cgroup/bind4** | **cgroup/bind6** | **cgroup/post_bind4** | **cgroup/post_bind6** |
> >   |		**cgroup/connect4** | **cgroup/connect6** | **cgroup/sendmsg4** | **cgroup/sendmsg6**
> >   |	}
> > -|       *ATTACH_TYPE* := { **msg_verdict** | **skb_verdict** | **skb_parse** }
> > +|       *ATTACH_TYPE* := {
> > +|		**msg_verdict** | **skb_verdict** | **skb_parse** | **flow_dissector**
> > +|	}
> >   DESCRIPTION
> > @@ -79,8 +82,11 @@ DESCRIPTION
> >   		  contain a dot character ('.'), which is reserved for future
> >   		  extensions of *bpffs*.
> > -	**bpftool prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
> > +	**bpftool prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
> >   		  Load bpf program from binary *OBJ* and pin as *FILE*.
> > +		  **bpftool prog load** will pin only the first bpf program
> > +		  from the *OBJ*, **bpftool prog loadall** will pin all maps
> > +		  and programs from the *OBJ*.
> 
> This could be improved regarding maps: with "bpftool prog load" I think we
> also load and pin all maps, but your description implies this is only the
> case with "loadall"
I don't think we pin any maps with `bpftool prog load`. We certainly load
them, but we don't pin any, afaict. Can you point me to the code where we
pin the maps?

> >   		  **type** is optional, if not specified program type will be
> >   		  inferred from section names.
> >   		  By default bpftool will create new maps as declared in the ELF
> > @@ -97,13 +103,17 @@ DESCRIPTION
> >   		  contain a dot character ('.'), which is reserved for future
> >   		  extensions of *bpffs*.
> > -        **bpftool prog attach** *PROG* *ATTACH_TYPE* *MAP*
> > -                  Attach bpf program *PROG* (with type specified by *ATTACH_TYPE*)
> > -                  to the map *MAP*.
> > -
> > -        **bpftool prog detach** *PROG* *ATTACH_TYPE* *MAP*
> > -                  Detach bpf program *PROG* (with type specified by *ATTACH_TYPE*)
> > -                  from the map *MAP*.
> > +        **bpftool prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
> > +                  Attach bpf program *PROG* (with type specified by
> > +                  *ATTACH_TYPE*). Most *ATTACH_TYPEs* require a *MAP*
> > +                  parameter, with the exception of *flow_dissector* which is
> > +                  attached to current networking name space.
> > +
> > +        **bpftool prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
> > +                  Detach bpf program *PROG* (with type specified by
> > +                  *ATTACH_TYPE*). Most *ATTACH_TYPEs* require a *MAP*
> > +                  parameter, with the exception of *flow_dissector* which is
> > +                  detached from the current networking name space.
> 
> While at it could you please fix those two paragraphs to use tabs for
> indentation, as the rest of the doc? Thanks!
Time to teach my vim to use tabs in .rst files. Sorry about that.

> >   	**bpftool prog help**
> >   		  Print short help message.
> > diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
> > index 3f78e6404589..ad0fc919f7ec 100644
> > --- a/tools/bpf/bpftool/bash-completion/bpftool
> > +++ b/tools/bpf/bpftool/bash-completion/bpftool
> > @@ -243,7 +243,7 @@ _bpftool()
> >       # Completion depends on object and command in use
> >       case $object in
> >           prog)
> > -            if [[ $command != "load" ]]; then
> > +            if [[ $command != "load" && $command != "loadall" ]]; then
> >                   case $prev in
> >                       id)
> >                           _bpftool_get_prog_ids
> > @@ -299,7 +299,7 @@ _bpftool()
> >                       fi
> >                       if [[ ${#words[@]} == 6 ]]; then
> > -                        COMPREPLY=( $( compgen -W "msg_verdict skb_verdict skb_parse" -- "$cur" ) )
> > +                        COMPREPLY=( $( compgen -W "msg_verdict skb_verdict skb_parse flow_dissector" -- "$cur" ) )
> >                           return 0
> >                       fi
> > @@ -309,7 +309,7 @@ _bpftool()
> >                       fi
> >                       return 0
> >                       ;;
> > -                load)
> > +                load|loadall)
> >                       local obj
> >                       if [[ ${#words[@]} -lt 6 ]]; then
> 
> You also want to update completion for the program types, at line 341 or so.
> Feel free to split that list on several lines, by the way :).
Will do, thanks!

> > diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> > index 25af85304ebe..f671a921dec5 100644
> > --- a/tools/bpf/bpftool/common.c
> > +++ b/tools/bpf/bpftool/common.c
> > @@ -169,34 +169,24 @@ int open_obj_pinned_any(char *path, enum bpf_obj_type exp_type)
> >   	return fd;
> >   }
> > -int do_pin_fd(int fd, const char *name)
> > +int mount_bpffs_for_pin(const char *name)
> >   {
> >   	char err_str[ERR_MAX_LEN];
> >   	char *file;
> >   	char *dir;
> >   	int err = 0;
> > -	err = bpf_obj_pin(fd, name);
> > -	if (!err)
> > -		goto out;
> > -
> >   	file = malloc(strlen(name) + 1);
> >   	strcpy(file, name);
> >   	dir = dirname(file);
> > -	if (errno != EPERM || is_bpffs(dir)) {
> > -		p_err("can't pin the object (%s): %s", name, strerror(errno));
> > +	if (is_bpffs(dir)) {
> > +		/* nothing to do if already mounted */
> >   		goto out_free;
> >   	}
> 
> Nitpick: unnecessary brackets.
Ack.

> > -	/* Attempt to mount bpffs, then retry pinning. */
> >   	err = mnt_bpffs(dir, err_str, ERR_MAX_LEN);
> > -	if (!err) {
> > -		err = bpf_obj_pin(fd, name);
> > -		if (err)
> > -			p_err("can't pin the object (%s): %s", name,
> > -			      strerror(errno));
> > -	} else {
> > +	if (err) {
> >   		err_str[ERR_MAX_LEN - 1] = '\0';
> >   		p_err("can't mount BPF file system to pin the object (%s): %s",
> >   		      name, err_str);
> > @@ -204,10 +194,20 @@ int do_pin_fd(int fd, const char *name)
> >   out_free:
> >   	free(file);
> > -out:
> >   	return err;
> >   }
> > +int do_pin_fd(int fd, const char *name)
> > +{
> > +	int err;
> > +
> > +	err = mount_bpffs_for_pin(name);
> > +	if (err)
> > +		return err;
> > +
> > +	return bpf_obj_pin(fd, name);
> > +}
> > +
> >   int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32))
> >   {
> >   	unsigned int id;
> > diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
> > index 28322ace2856..1383824c9baf 100644
> > --- a/tools/bpf/bpftool/main.h
> > +++ b/tools/bpf/bpftool/main.h
> > @@ -129,6 +129,7 @@ const char *get_fd_type_name(enum bpf_obj_type type);
> >   char *get_fdinfo(int fd, const char *key);
> >   int open_obj_pinned(char *path);
> >   int open_obj_pinned_any(char *path, enum bpf_obj_type exp_type);
> > +int mount_bpffs_for_pin(const char *name);
> >   int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32));
> >   int do_pin_fd(int fd, const char *name);
> > diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> > index 5302ee282409..a4346dd673b1 100644
> > --- a/tools/bpf/bpftool/prog.c
> > +++ b/tools/bpf/bpftool/prog.c
> > @@ -81,6 +81,7 @@ static const char * const attach_type_strings[] = {
> >   	[BPF_SK_SKB_STREAM_PARSER] = "stream_parser",
> >   	[BPF_SK_SKB_STREAM_VERDICT] = "stream_verdict",
> >   	[BPF_SK_MSG_VERDICT] = "msg_verdict",
> > +	[BPF_FLOW_DISSECTOR] = "flow_dissector",
> >   	[__MAX_BPF_ATTACH_TYPE] = NULL,
> >   };
> > @@ -724,10 +725,11 @@ int map_replace_compar(const void *p1, const void *p2)
> >   static int do_attach(int argc, char **argv)
> >   {
> >   	enum bpf_attach_type attach_type;
> > -	int err, mapfd, progfd;
> > +	int err, progfd;
> > +	int mapfd = 0;
> > -	if (!REQ_ARGS(5)) {
> > -		p_err("too few parameters for map attach");
> > +	if (!REQ_ARGS(3)) {
> > +		p_err("too few parameters for attach");
> >   		return -EINVAL;
> >   	}
> > @@ -740,11 +742,17 @@ static int do_attach(int argc, char **argv)
> >   		p_err("invalid attach type");
> >   		return -EINVAL;
> >   	}
> > -	NEXT_ARG();
> > +	if (attach_type != BPF_FLOW_DISSECTOR) {
> > +		NEXT_ARG();
> > +		if (!REQ_ARGS(2)) {
> > +			p_err("too few parameters for map attach");
> > +			return -EINVAL;
> > +		}
> > -	mapfd = map_parse_fd(&argc, &argv);
> > -	if (mapfd < 0)
> > -		return mapfd;
> > +		mapfd = map_parse_fd(&argc, &argv);
> > +		if (mapfd < 0)
> > +			return mapfd;
> > +	}
> >   	err = bpf_prog_attach(progfd, mapfd, attach_type, 0);
> >   	if (err) {
> > @@ -760,10 +768,11 @@ static int do_attach(int argc, char **argv)
> >   static int do_detach(int argc, char **argv)
> >   {
> >   	enum bpf_attach_type attach_type;
> > -	int err, mapfd, progfd;
> > +	int err, progfd;
> > +	int mapfd = 0;
> > -	if (!REQ_ARGS(5)) {
> > -		p_err("too few parameters for map detach");
> > +	if (!REQ_ARGS(3)) {
> > +		p_err("too few parameters for detach");
> >   		return -EINVAL;
> >   	}
> > @@ -776,11 +785,17 @@ static int do_detach(int argc, char **argv)
> >   		p_err("invalid attach type");
> >   		return -EINVAL;
> >   	}
> > -	NEXT_ARG();
> > +	if (attach_type != BPF_FLOW_DISSECTOR) {
> > +		NEXT_ARG();
> > +		if (!REQ_ARGS(2)) {
> > +			p_err("too few parameters for map detach");
> > +			return -EINVAL;
> > +		}
> 
> Would it make sense to factor out the argument checks or parsing for
> do_attach() and do_detach() to some extent? That would reduce the number
> of attach-type-based exceptions to add in the code if other attach
> types that do not take maps appear in the future.
I can move all argument parsing into a new function and use it from both
do_attach and do_detach.
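
A rough, standalone sketch of what such a shared parser could look like, assuming the argument layout "PROG ATTACH_TYPE [MAP]" where PROG and MAP each take two words (e.g. "pinned PATH" or "id N"). All names here (example_parse_attach_args, struct example_attach_args, the EX_* enum) are made up for illustration and it stores strings instead of opening fds, so it is not the bpftool code:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the attach types discussed in the thread. */
enum example_attach_type {
	EX_MSG_VERDICT,
	EX_SKB_VERDICT,
	EX_SKB_PARSE,
	EX_FLOW_DISSECTOR,
	EX_ATTACH_MAX,
};

static const char *const example_attach_names[EX_ATTACH_MAX] = {
	[EX_MSG_VERDICT]    = "msg_verdict",
	[EX_SKB_VERDICT]    = "skb_verdict",
	[EX_SKB_PARSE]      = "skb_parse",
	[EX_FLOW_DISSECTOR] = "flow_dissector",
};

/* Parsed "PROG ATTACH_TYPE [MAP]"; strings are borrowed from argv. */
struct example_attach_args {
	const char *prog_kw, *prog_val;
	enum example_attach_type type;
	const char *map_kw, *map_val;	/* NULL when no MAP is needed */
};

/* The single place that knows which attach types take no MAP. */
static bool example_type_needs_map(enum example_attach_type type)
{
	return type != EX_FLOW_DISSECTOR;
}

/* Shared by attach and detach; returns 0 or -EINVAL. */
int example_parse_attach_args(int argc, char **argv,
			      struct example_attach_args *out)
{
	int i;

	assert(out != NULL);
	if (argc < 3)
		return -EINVAL;

	out->prog_kw = argv[0];
	out->prog_val = argv[1];
	out->map_kw = NULL;
	out->map_val = NULL;

	for (i = 0; i < EX_ATTACH_MAX; i++)
		if (strcmp(argv[2], example_attach_names[i]) == 0)
			break;
	if (i == EX_ATTACH_MAX)
		return -EINVAL;
	out->type = (enum example_attach_type)i;

	if (!example_type_needs_map(out->type))
		return 0;

	/* Map-based attach types need two more words for MAP. */
	if (argc < 5)
		return -EINVAL;
	out->map_kw = argv[3];
	out->map_val = argv[4];
	return 0;
}
```

With this shape, a future attach type that takes no map only needs a change in example_type_needs_map(), and neither caller grows another special case.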

> > -	mapfd = map_parse_fd(&argc, &argv);
> > -	if (mapfd < 0)
> > -		return mapfd;
> > +		mapfd = map_parse_fd(&argc, &argv);
> > +		if (mapfd < 0)
> > +			return mapfd;
> > +	}
> >   	err = bpf_prog_detach2(progfd, mapfd, attach_type);
> >   	if (err) {
> > @@ -792,15 +807,16 @@ static int do_detach(int argc, char **argv)
> >   		jsonw_null(json_wtr);
> >   	return 0;
> >   }
> > -static int do_load(int argc, char **argv)
> > +
> > +static int load_with_options(int argc, char **argv, bool first_prog_only)
> >   {
> >   	enum bpf_attach_type expected_attach_type;
> >   	struct bpf_object_open_attr attr = {
> >   		.prog_type	= BPF_PROG_TYPE_UNSPEC,
> >   	};
> >   	struct map_replace *map_replace = NULL;
> > +	struct bpf_program *prog = NULL, *pos;
> >   	unsigned int old_map_fds = 0;
> > -	struct bpf_program *prog;
> >   	struct bpf_object *obj;
> >   	struct bpf_map *map;
> >   	const char *pinfile;
> > @@ -918,14 +934,20 @@ static int do_load(int argc, char **argv)
> >   		goto err_free_reuse_maps;
> >   	}
> > -	prog = bpf_program__next(NULL, obj);
> > -	if (!prog) {
> > -		p_err("object file doesn't contain any bpf program");
> > -		goto err_close_obj;
> > +	if (first_prog_only) {
> > +		prog = bpf_program__next(NULL, obj);
> > +		if (!prog) {
> > +			p_err("object file doesn't contain any bpf program");
> > +			goto err_close_obj;
> > +		}
> >   	}
> > -	bpf_program__set_ifindex(prog, ifindex);
> >   	if (attr.prog_type == BPF_PROG_TYPE_UNSPEC) {
> > +		if (!prog) {
> > +			p_err("can not guess program type when loading all programs\n");
> > +			goto err_close_obj;
> > +		}
> > +
> >   		const char *sec_name = bpf_program__title(prog, false);
> >   		err = libbpf_prog_type_by_name(sec_name, &attr.prog_type,
> > @@ -936,8 +958,13 @@ static int do_load(int argc, char **argv)
> >   			goto err_close_obj;
> >   		}
> >   	}
> > -	bpf_program__set_type(prog, attr.prog_type);
> > -	bpf_program__set_expected_attach_type(prog, expected_attach_type);
> > +
> > +	bpf_object__for_each_program(pos, obj) {
> > +		bpf_program__set_ifindex(pos, ifindex);
> > +		bpf_program__set_type(pos, attr.prog_type);
> > +		bpf_program__set_expected_attach_type(pos,
> > +						      expected_attach_type);
> > +	}
> 
> I still believe you can have programs of different types here, and be able
> to load them. I tried it and managed to have it working fine. If no type is
> provided from command line we can retrieve types for each program from its
> section name. If a type is provided on the command line, we can do the same,
> but I am not sure we should do it, or impose that type for all programs
> instead.
I can move auto-detection into this new bpf_object__for_each_program
loop. So if no type is specified, try to infer the type from each prog
section name, otherwise, use the provided one for all progs. Do we want
something like that?
Btw, do you have some existing real life example of where it's needed so
I can test this new implementation? (maybe something under samples/ ?)
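
To make the proposal concrete, here is a small standalone sketch of the per-program loop: if a type was forced on the command line it is applied to every program, otherwise each program's type is inferred from its section name by prefix match. It deliberately does not use libbpf; struct example_prog, the EX_TYPE_* values and the section table are hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical program types; EX_TYPE_UNSPEC means "not specified". */
enum example_prog_type {
	EX_TYPE_UNSPEC = 0,
	EX_TYPE_XDP,
	EX_TYPE_SCHED_CLS,
	EX_TYPE_FLOW_DISSECTOR,
};

/* Hypothetical stand-in for one program in the object file. */
struct example_prog {
	const char *section;
	enum example_prog_type type;
};

/* Illustrative section-name prefixes, not the full libbpf table. */
static const struct {
	const char *prefix;
	enum example_prog_type type;
} example_section_map[] = {
	{ "xdp",            EX_TYPE_XDP },
	{ "classifier",     EX_TYPE_SCHED_CLS },
	{ "flow_dissector", EX_TYPE_FLOW_DISSECTOR },
};

static enum example_prog_type example_type_by_section(const char *section)
{
	size_t i, n = sizeof(example_section_map) / sizeof(example_section_map[0]);

	for (i = 0; i < n; i++) {
		const char *p = example_section_map[i].prefix;

		if (strncmp(section, p, strlen(p)) == 0)
			return example_section_map[i].type;
	}
	return EX_TYPE_UNSPEC;
}

/*
 * Assign a type to every program. A forced type wins for all
 * programs; otherwise each one is inferred from its section name.
 * Returns -EINVAL as soon as a type cannot be determined.
 */
int example_assign_types(struct example_prog *progs, size_t n,
			 enum example_prog_type forced)
{
	size_t i;

	assert(progs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		enum example_prog_type t = forced;

		if (t == EX_TYPE_UNSPEC)
			t = example_type_by_section(progs[i].section);
		if (t == EX_TYPE_UNSPEC)
			return -EINVAL;
		progs[i].type = t;
	}
	return 0;
}
```

Whether a command-line type should override every program or only act as a fallback is exactly the open question in this thread; the sketch picks "override" only to show one of the two options.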

> >   	qsort(map_replace, old_map_fds, sizeof(*map_replace),
> >   	      map_replace_compar);
> > @@ -1001,9 +1028,25 @@ static int do_load(int argc, char **argv)
> >   		goto err_close_obj;
> >   	}
> > -	if (do_pin_fd(bpf_program__fd(prog), pinfile))
> > +	err = mount_bpffs_for_pin(pinfile);
> > +	if (err)
> >   		goto err_close_obj;
> > +	if (prog) {
> 
> Nit: Maybe "if (first_prog_only) {" instead? If I understand correctly, at
> this stage it should be equivalent, but in my opinion it would make it
> easier to understand why we have two cases here.
Sure, I can do that if you think that's more readable, I don't have a
preference.

> > +		err = bpf_obj_pin(bpf_program__fd(prog), pinfile);
> > +		if (err) {
> > +			p_err("failed to pin program %s",
> > +			      bpf_program__title(prog, false));
> > +			goto err_close_obj;
> > +		}
> > +	} else {
> > +		err = bpf_object__pin(obj, pinfile);
> > +		if (err) {
> > +			p_err("failed to pin all programs");
> > +			goto err_close_obj;
> > +		}
> > +	}
> > +
> >   	if (json_output)
> >   		jsonw_null(json_wtr);
> > @@ -1023,6 +1066,16 @@ static int do_load(int argc, char **argv)
> >   	return -1;
> >   }
> > +static int do_load(int argc, char **argv)
> > +{
> > +	return load_with_options(argc, argv, true);
> > +}
> > +
> > +static int do_loadall(int argc, char **argv)
> > +{
> > +	return load_with_options(argc, argv, false);
> > +}
> > +
> >   static int do_help(int argc, char **argv)
> >   {
> >   	if (json_output) {
> > @@ -1035,10 +1088,11 @@ static int do_help(int argc, char **argv)
> >   		"       %s %s dump xlated PROG [{ file FILE | opcodes | visual }]\n"
> >   		"       %s %s dump jited  PROG [{ file FILE | opcodes }]\n"
> >   		"       %s %s pin   PROG FILE\n"
> > -		"       %s %s load  OBJ  FILE [type TYPE] [dev NAME] \\\n"
> > +		"       %s %s { load | loadall } OBJ  FILE \\\n"
> > +		"                         [type TYPE] [dev NAME] \\\n"
> >   		"                         [map { idx IDX | name NAME } MAP]\n"
> > -		"       %s %s attach PROG ATTACH_TYPE MAP\n"
> > -		"       %s %s detach PROG ATTACH_TYPE MAP\n"
> > +		"       %s %s attach PROG ATTACH_TYPE [MAP]\n"
> > +		"       %s %s detach PROG ATTACH_TYPE [MAP]\n"
> >   		"       %s %s help\n"
> >   		"\n"
> >   		"       " HELP_SPEC_MAP "\n"
> > @@ -1050,7 +1104,8 @@ static int do_help(int argc, char **argv)
> >   		"                 cgroup/bind4 | cgroup/bind6 | cgroup/post_bind4 |\n"
> >   		"                 cgroup/post_bind6 | cgroup/connect4 | cgroup/connect6 |\n"
> >   		"                 cgroup/sendmsg4 | cgroup/sendmsg6 }\n"
> > -		"       ATTACH_TYPE := { msg_verdict | skb_verdict | skb_parse }\n"
> > +		"       ATTACH_TYPE := { msg_verdict | skb_verdict | skb_parse |\n"
> > +		"                        flow_dissector }\n"
> >   		"       " HELP_SPEC_OPTIONS "\n"
> >   		"",
> >   		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
> > @@ -1067,6 +1122,7 @@ static const struct cmd cmds[] = {
> >   	{ "dump",	do_dump },
> >   	{ "pin",	do_pin },
> >   	{ "load",	do_load },
> > +	{ "loadall",	do_loadall },
> >   	{ "attach",	do_attach },
> >   	{ "detach",	do_detach },
> >   	{ 0 }
> > 
> 

* Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO
From: Edward Cree @ 2018-11-08 17:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin Lau, Yonghong Song, Alexei Starovoitov,
	daniel@iogearbox.net, netdev@vger.kernel.org, Kernel Team
In-Reply-To: <20181107214922.xjqcacj5rc5hmepw@ast-mbp.dhcp.thefacebook.com>

On 07/11/18 21:49, Alexei Starovoitov wrote:
> On Wed, Nov 07, 2018 at 07:29:31PM +0000, Edward Cree wrote:
>> Whereas I don't, and I don't feel like my core criticisms have
>>  been addressed _at all_.  The only answer I get to "BTF should
>>  store type and instance information in separate records" is
>>  "it's a debuginfo",
> ...
>>  I am just trying to organise
>>  BTF to consist of separate _parts_ for types and instances,
>>  rather than forcing both into the same Procrustean bed.
> BTF does not have and should not have instances.
> BTF is debug info only.
> This is not negotiable.
I'm not saying the instances go in BTF, I'm saying that debug info
 *about* instances goes in BTF (it already does, as you keep saying
 BTF is "not just pure types"), and that that ought to be
 distinguished within the format from debug info about types.

> So I'm looking forward to your ideas how to describe BTF in .s
> Note such .s must have freedom to describe 'int bar(struct __sk_buff *a1, char a2)'
> as debug info while having '.globl foo; foo:' as symbol name.
I've pushed out a branch with what I have; see
 https://github.com/solarflarecom/ebpf_asm/tree/btfdoc
 (with some examples in dropper.s and documentation in the README).
In particular note that right now the BTF section is entirely
 decoupled from the .text, so indeed there is nothing right now
 tying function names to symbol names.  I do not yet have anything
 generating FuncInfo (or LineInfo) tables, but when I do that will
 remain decoupled.

> Your other 'criticism' was about libbpf's bpf_map_find_btf_info()
> and ____btf_map_* hack. Yes. It is a hack and I'm open to change it
> if there are better suggestions. It's a convention between
> libbpf and program writers that produce elf. It's not a kernel abi.
> Nothing to do with BTF and this instance vs debug info discussion.
It's everything to do with it: it's defining a type with a magic name
 (____btf_map_foo) when what we really want to do is declare an
 instance (the map 'foo').  And it may not be a kernel ABI, but it's
 a part of the file format you're defining (whether that's just a
 'convention' or something more), and if you want the BTF ecosystem
 to be more than just an llvm monoculture then the format needs to be
 properly specified so that others can work with it.

> Happy to jump on the call to explain it again.
> 10:30am pacific time works for me tomorrow.
That works for me (that's in ~30 minutes from now if I've converted
 correctly.)  Please email me offlist with the phone number to call.

-Ed

