* [PATCH net-next v2 1/9] eth: bnxt: use napi_consume_skb()
2025-02-28 1:25 [PATCH net-next v2 0/9] eth: bnxt: maintain basic pkt/byte counters in SW Jakub Kicinski
@ 2025-02-28 1:25 ` Jakub Kicinski
2025-02-28 1:25 ` [PATCH net-next v2 2/9] eth: bnxt: don't run xdp programs on fallback traffic Jakub Kicinski
` (8 subsequent siblings)
9 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2025-02-28 1:25 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, michael.chan,
pavan.chebbi, przemyslaw.kitszel, Jakub Kicinski
Use napi_consume_skb() to improve skb recycling.
__bnxt_tx_int() already has the real NAPI passed in budget.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 15c57a06ecaf..f6a26f6f85bb 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -855,7 +855,7 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
next_tx_int:
cons = NEXT_TX(cons);
- dev_consume_skb_any(skb);
+ napi_consume_skb(skb, budget);
}
WRITE_ONCE(txr->tx_cons, cons);
--
2.48.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH net-next v2 2/9] eth: bnxt: don't run xdp programs on fallback traffic
2025-02-28 1:25 [PATCH net-next v2 0/9] eth: bnxt: maintain basic pkt/byte counters in SW Jakub Kicinski
2025-02-28 1:25 ` [PATCH net-next v2 1/9] eth: bnxt: use napi_consume_skb() Jakub Kicinski
@ 2025-02-28 1:25 ` Jakub Kicinski
2025-02-28 1:25 ` [PATCH net-next v2 3/9] eth: bnxt: rename ring_err_stats -> ring_drv_stats Jakub Kicinski
` (7 subsequent siblings)
9 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2025-02-28 1:25 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, michael.chan,
pavan.chebbi, przemyslaw.kitszel, Jakub Kicinski
The XDP program attached to the PF should not be executed
on the fallback traffic. Pass the desired dev to bnxt_rx_xdp()
and abort if the packet is for a representor. bnxt_rx_xdp()
has a lot of arguments already, so presumably adding one
more is okay.
Compile tested only.
Well behaved drivers (nfp) do not execute XDP on fallback
traffic, but perhaps this is a matter of opinion rather than
a hard rule, therefore I'm not considering this a fix.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2:
- pass dev to bnxt_rx_xdp(), and skip just the BPF execution,
to avoid unintentionally skipping the Tx ring handling
v1: https://lore.kernel.org/20250226211003.2790916-3-kuba@kernel.org
---
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h | 3 ++-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 11 +++++++----
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 8 ++++++--
3 files changed, 15 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
index 0122782400b8..752b6cf0022c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
@@ -17,7 +17,8 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
dma_addr_t mapping, u32 len,
struct xdp_buff *xdp);
void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int budget);
-bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
+bool bnxt_rx_xdp(struct bnxt *bp, struct net_device *dev,
+ struct bnxt_rx_ring_info *rxr, u16 cons,
struct xdp_buff *xdp, struct page *page, u8 **data_ptr,
unsigned int *len, u8 *event);
int bnxt_xdp(struct net_device *dev, struct netdev_bpf *xdp);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index f6a26f6f85bb..94bc9121d3f9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2036,7 +2036,7 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
{
struct bnxt_napi *bnapi = cpr->bnapi;
struct bnxt_rx_ring_info *rxr = bnapi->rx_ring;
- struct net_device *dev = bp->dev;
+ struct net_device *dev;
struct rx_cmp *rxcmp;
struct rx_cmp_ext *rxcmp1;
u32 tmp_raw_cons = *raw_cons;
@@ -2159,6 +2159,10 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
len = flags >> RX_CMP_LEN_SHIFT;
dma_addr = rx_buf->mapping;
+ dev = bp->dev;
+ if (cmp_type == CMP_TYPE_RX_L2_CMP)
+ dev = bnxt_get_pkt_dev(bp, RX_CMP_CFA_CODE(rxcmp1));
+
if (bnxt_xdp_attached(bp, rxr)) {
bnxt_xdp_buff_init(bp, rxr, cons, data_ptr, len, &xdp);
if (agg_bufs) {
@@ -2172,7 +2176,8 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
}
if (xdp_active) {
- if (bnxt_rx_xdp(bp, rxr, cons, &xdp, data, &data_ptr, &len, event)) {
+ if (bnxt_rx_xdp(bp, dev, rxr, cons, &xdp, data, &data_ptr, &len,
+ event)) {
rc = 1;
goto next_rx;
}
@@ -2239,8 +2244,6 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
skb_set_hash(skb, le32_to_cpu(rxcmp->rx_cmp_rss_hash), type);
}
- if (cmp_type == CMP_TYPE_RX_L2_CMP)
- dev = bnxt_get_pkt_dev(bp, RX_CMP_CFA_CODE(rxcmp1));
skb->protocol = eth_type_trans(skb, dev);
if (skb->dev->features & BNXT_HW_FEATURE_VLAN_ALL_RX) {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index e6c64e4bd66c..aba49ddb0e66 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -221,7 +221,8 @@ void bnxt_xdp_buff_frags_free(struct bnxt_rx_ring_info *rxr,
* true - packet consumed by XDP and new buffer is allocated.
* false - packet should be passed to the stack.
*/
-bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
+bool bnxt_rx_xdp(struct bnxt *bp, struct net_device *dev,
+ struct bnxt_rx_ring_info *rxr, u16 cons,
struct xdp_buff *xdp, struct page *page, u8 **data_ptr,
unsigned int *len, u8 *event)
{
@@ -246,7 +247,10 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
/* BNXT_RX_PAGE_MODE(bp) when XDP enabled */
orig_data = xdp->data;
- act = bpf_prog_run_xdp(xdp_prog, xdp);
+ if (bp->dev == dev)
+ act = bpf_prog_run_xdp(xdp_prog, xdp);
+ else /* packet is for a VF representor */
+ act = XDP_PASS;
tx_avail = bnxt_tx_avail(bp, txr);
/* If the tx ring is not full, we must not update the rx producer yet
--
2.48.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH net-next v2 3/9] eth: bnxt: rename ring_err_stats -> ring_drv_stats
2025-02-28 1:25 [PATCH net-next v2 0/9] eth: bnxt: maintain basic pkt/byte counters in SW Jakub Kicinski
2025-02-28 1:25 ` [PATCH net-next v2 1/9] eth: bnxt: use napi_consume_skb() Jakub Kicinski
2025-02-28 1:25 ` [PATCH net-next v2 2/9] eth: bnxt: don't run xdp programs on fallback traffic Jakub Kicinski
@ 2025-02-28 1:25 ` Jakub Kicinski
2025-02-28 1:25 ` [PATCH net-next v2 4/9] eth: bnxt: snapshot driver stats Jakub Kicinski
` (6 subsequent siblings)
9 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2025-02-28 1:25 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, michael.chan,
pavan.chebbi, przemyslaw.kitszel, Jakub Kicinski
We will soon store non-error stats to the ring struct.
Rename them to "drv" stats, as these are all maintained
by the driver (even if partially based on info from descriptors).
Pure rename using sed.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 8 ++++----
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 14 ++++++-------
.../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 20 +++++++++----------
3 files changed, 21 insertions(+), 21 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index e85b5ce94f58..34f23ddd4d71 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1136,7 +1136,7 @@ struct bnxt_sw_stats {
struct bnxt_cmn_sw_stats cmn;
};
-struct bnxt_total_ring_err_stats {
+struct bnxt_total_ring_drv_stats {
u64 rx_total_l4_csum_errors;
u64 rx_total_resets;
u64 rx_total_buf_errors;
@@ -2538,7 +2538,7 @@ struct bnxt {
u8 pri2cos_idx[8];
u8 pri2cos_valid;
- struct bnxt_total_ring_err_stats ring_err_stats_prev;
+ struct bnxt_total_ring_drv_stats ring_drv_stats_prev;
u16 hwrm_max_req_len;
u16 hwrm_max_ext_req_len;
@@ -2936,8 +2936,8 @@ int bnxt_half_open_nic(struct bnxt *bp);
void bnxt_half_close_nic(struct bnxt *bp);
void bnxt_reenable_sriov(struct bnxt *bp);
void bnxt_close_nic(struct bnxt *, bool, bool);
-void bnxt_get_ring_err_stats(struct bnxt *bp,
- struct bnxt_total_ring_err_stats *stats);
+void bnxt_get_ring_drv_stats(struct bnxt *bp,
+ struct bnxt_total_ring_drv_stats *stats);
bool bnxt_rfs_capable(struct bnxt *bp, bool new_rss_ctx);
int bnxt_dbg_hwrm_rd_reg(struct bnxt *bp, u32 reg_off, u16 num_words,
u32 *reg_buf);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 94bc9121d3f9..4b85f224c344 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -12941,7 +12941,7 @@ static void __bnxt_close_nic(struct bnxt *bp, bool irq_re_init,
/* Save ring stats before shutdown */
if (bp->bnapi && irq_re_init) {
bnxt_get_ring_stats(bp, &bp->net_stats_prev);
- bnxt_get_ring_err_stats(bp, &bp->ring_err_stats_prev);
+ bnxt_get_ring_drv_stats(bp, &bp->ring_drv_stats_prev);
}
if (irq_re_init) {
bnxt_free_irq(bp);
@@ -13191,8 +13191,8 @@ bnxt_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
clear_bit(BNXT_STATE_READ_STATS, &bp->state);
}
-static void bnxt_get_one_ring_err_stats(struct bnxt *bp,
- struct bnxt_total_ring_err_stats *stats,
+static void bnxt_get_one_ring_drv_stats(struct bnxt *bp,
+ struct bnxt_total_ring_drv_stats *stats,
struct bnxt_cp_ring_info *cpr)
{
struct bnxt_sw_stats *sw_stats = cpr->sw_stats;
@@ -13211,13 +13211,13 @@ static void bnxt_get_one_ring_err_stats(struct bnxt *bp,
stats->total_missed_irqs += sw_stats->cmn.missed_irqs;
}
-void bnxt_get_ring_err_stats(struct bnxt *bp,
- struct bnxt_total_ring_err_stats *stats)
+void bnxt_get_ring_drv_stats(struct bnxt *bp,
+ struct bnxt_total_ring_drv_stats *stats)
{
int i;
for (i = 0; i < bp->cp_nr_rings; i++)
- bnxt_get_one_ring_err_stats(bp, stats, &bp->bnapi[i]->cp_ring);
+ bnxt_get_one_ring_drv_stats(bp, stats, &bp->bnapi[i]->cp_ring);
}
static bool bnxt_mc_list_updated(struct bnxt *bp, u32 *rx_mask)
@@ -15643,7 +15643,7 @@ static void bnxt_get_base_stats(struct net_device *dev,
rx->packets = bp->net_stats_prev.rx_packets;
rx->bytes = bp->net_stats_prev.rx_bytes;
- rx->alloc_fail = bp->ring_err_stats_prev.rx_total_oom_discards;
+ rx->alloc_fail = bp->ring_drv_stats_prev.rx_total_oom_discards;
tx->packets = bp->net_stats_prev.tx_packets;
tx->bytes = bp->net_stats_prev.tx_bytes;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 9c5820839514..023a0c2d52fd 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -340,7 +340,7 @@ enum {
RX_NETPOLL_DISCARDS,
};
-static const char *const bnxt_ring_err_stats_arr[] = {
+static const char *const bnxt_ring_drv_stats_arr[] = {
"rx_total_l4_csum_errors",
"rx_total_resets",
"rx_total_buf_errors",
@@ -500,7 +500,7 @@ static const struct {
BNXT_TX_STATS_PRI_ENTRIES(tx_packets),
};
-#define BNXT_NUM_RING_ERR_STATS ARRAY_SIZE(bnxt_ring_err_stats_arr)
+#define BNXT_NUM_RING_DRV_STATS ARRAY_SIZE(bnxt_ring_drv_stats_arr)
#define BNXT_NUM_PORT_STATS ARRAY_SIZE(bnxt_port_stats_arr)
#define BNXT_NUM_STATS_PRI \
(ARRAY_SIZE(bnxt_rx_bytes_pri_arr) + \
@@ -539,7 +539,7 @@ static int bnxt_get_num_stats(struct bnxt *bp)
int num_stats = bnxt_get_num_ring_stats(bp);
int len;
- num_stats += BNXT_NUM_RING_ERR_STATS;
+ num_stats += BNXT_NUM_RING_DRV_STATS;
if (bp->flags & BNXT_FLAG_PORT_STATS)
num_stats += BNXT_NUM_PORT_STATS;
@@ -594,7 +594,7 @@ static bool is_tx_ring(struct bnxt *bp, int ring_num)
static void bnxt_get_ethtool_stats(struct net_device *dev,
struct ethtool_stats *stats, u64 *buf)
{
- struct bnxt_total_ring_err_stats ring_err_stats = {0};
+ struct bnxt_total_ring_drv_stats ring_drv_stats = {0};
struct bnxt *bp = netdev_priv(dev);
u64 *curr, *prev;
u32 tpa_stats;
@@ -643,12 +643,12 @@ static void bnxt_get_ethtool_stats(struct net_device *dev,
buf[j] = sw[k];
}
- bnxt_get_ring_err_stats(bp, &ring_err_stats);
+ bnxt_get_ring_drv_stats(bp, &ring_drv_stats);
skip_ring_stats:
- curr = &ring_err_stats.rx_total_l4_csum_errors;
- prev = &bp->ring_err_stats_prev.rx_total_l4_csum_errors;
- for (i = 0; i < BNXT_NUM_RING_ERR_STATS; i++, j++, curr++, prev++)
+ curr = &ring_drv_stats.rx_total_l4_csum_errors;
+ prev = &bp->ring_drv_stats_prev.rx_total_l4_csum_errors;
+ for (i = 0; i < BNXT_NUM_RING_DRV_STATS; i++, j++, curr++, prev++)
buf[j] = *curr + *prev;
if (bp->flags & BNXT_FLAG_PORT_STATS) {
@@ -752,8 +752,8 @@ static void bnxt_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
ethtool_sprintf(&buf, "[%d]: %s", i, str);
}
}
- for (i = 0; i < BNXT_NUM_RING_ERR_STATS; i++)
- ethtool_puts(&buf, bnxt_ring_err_stats_arr[i]);
+ for (i = 0; i < BNXT_NUM_RING_DRV_STATS; i++)
+ ethtool_puts(&buf, bnxt_ring_drv_stats_arr[i]);
if (bp->flags & BNXT_FLAG_PORT_STATS)
for (i = 0; i < BNXT_NUM_PORT_STATS; i++) {
--
2.48.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH net-next v2 4/9] eth: bnxt: snapshot driver stats
2025-02-28 1:25 [PATCH net-next v2 0/9] eth: bnxt: maintain basic pkt/byte counters in SW Jakub Kicinski
` (2 preceding siblings ...)
2025-02-28 1:25 ` [PATCH net-next v2 3/9] eth: bnxt: rename ring_err_stats -> ring_drv_stats Jakub Kicinski
@ 2025-02-28 1:25 ` Jakub Kicinski
2025-02-28 1:25 ` [PATCH net-next v2 5/9] eth: bnxt: don't use ifdef to check for CONFIG_INET in GRO Jakub Kicinski
` (5 subsequent siblings)
9 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2025-02-28 1:25 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, michael.chan,
pavan.chebbi, przemyslaw.kitszel, Jakub Kicinski
Subsequent commits will add datapath stats which need u64_stats
protection. Make current readers work on a snapshot, so it's
easier to extend this code without much duplication.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 31 +++++++++++++++--------
1 file changed, 21 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 4b85f224c344..854e7ec5390b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -13093,6 +13093,12 @@ static int bnxt_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
return -EOPNOTSUPP;
}
+static void bnxt_drv_stat_snapshot(const struct bnxt_sw_stats *sw_stats,
+ struct bnxt_sw_stats *snapshot)
+{
+ memcpy(snapshot, sw_stats, sizeof(*snapshot));
+}
+
static void bnxt_get_ring_stats(struct bnxt *bp,
struct rtnl_link_stats64 *stats)
{
@@ -13101,8 +13107,11 @@ static void bnxt_get_ring_stats(struct bnxt *bp,
for (i = 0; i < bp->cp_nr_rings; i++) {
struct bnxt_napi *bnapi = bp->bnapi[i];
struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
+ struct bnxt_sw_stats sw_stats;
u64 *sw = cpr->stats.sw_stats;
+ bnxt_drv_stat_snapshot(cpr->sw_stats, &sw_stats);
+
stats->rx_packets += BNXT_GET_RING_STATS64(sw, rx_ucast_pkts);
stats->rx_packets += BNXT_GET_RING_STATS64(sw, rx_mcast_pkts);
stats->rx_packets += BNXT_GET_RING_STATS64(sw, rx_bcast_pkts);
@@ -13127,8 +13136,8 @@ static void bnxt_get_ring_stats(struct bnxt *bp,
stats->tx_dropped += BNXT_GET_RING_STATS64(sw, tx_error_pkts);
stats->rx_dropped +=
- cpr->sw_stats->rx.rx_netpoll_discards +
- cpr->sw_stats->rx.rx_oom_discards;
+ sw_stats.rx.rx_netpoll_discards +
+ sw_stats.rx.rx_oom_discards;
}
}
@@ -13195,20 +13204,22 @@ static void bnxt_get_one_ring_drv_stats(struct bnxt *bp,
struct bnxt_total_ring_drv_stats *stats,
struct bnxt_cp_ring_info *cpr)
{
- struct bnxt_sw_stats *sw_stats = cpr->sw_stats;
u64 *hw_stats = cpr->stats.sw_stats;
+ struct bnxt_sw_stats sw_stats;
- stats->rx_total_l4_csum_errors += sw_stats->rx.rx_l4_csum_errors;
- stats->rx_total_resets += sw_stats->rx.rx_resets;
- stats->rx_total_buf_errors += sw_stats->rx.rx_buf_errors;
- stats->rx_total_oom_discards += sw_stats->rx.rx_oom_discards;
- stats->rx_total_netpoll_discards += sw_stats->rx.rx_netpoll_discards;
+ bnxt_drv_stat_snapshot(cpr->sw_stats, &sw_stats);
+
+ stats->rx_total_l4_csum_errors += sw_stats.rx.rx_l4_csum_errors;
+ stats->rx_total_resets += sw_stats.rx.rx_resets;
+ stats->rx_total_buf_errors += sw_stats.rx.rx_buf_errors;
+ stats->rx_total_oom_discards += sw_stats.rx.rx_oom_discards;
+ stats->rx_total_netpoll_discards += sw_stats.rx.rx_netpoll_discards;
stats->rx_total_ring_discards +=
BNXT_GET_RING_STATS64(hw_stats, rx_discard_pkts);
- stats->tx_total_resets += sw_stats->tx.tx_resets;
+ stats->tx_total_resets += sw_stats.tx.tx_resets;
stats->tx_total_ring_discards +=
BNXT_GET_RING_STATS64(hw_stats, tx_discard_pkts);
- stats->total_missed_irqs += sw_stats->cmn.missed_irqs;
+ stats->total_missed_irqs += sw_stats.cmn.missed_irqs;
}
void bnxt_get_ring_drv_stats(struct bnxt *bp,
--
2.48.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH net-next v2 5/9] eth: bnxt: don't use ifdef to check for CONFIG_INET in GRO
2025-02-28 1:25 [PATCH net-next v2 0/9] eth: bnxt: maintain basic pkt/byte counters in SW Jakub Kicinski
` (3 preceding siblings ...)
2025-02-28 1:25 ` [PATCH net-next v2 4/9] eth: bnxt: snapshot driver stats Jakub Kicinski
@ 2025-02-28 1:25 ` Jakub Kicinski
2025-03-03 6:47 ` Michael Chan
2025-02-28 1:25 ` [PATCH net-next v2 6/9] eth: bnxt: consolidate the GRO-but-not-really paths in bnxt_gro_skb() Jakub Kicinski
` (4 subsequent siblings)
9 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2025-02-28 1:25 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, michael.chan,
pavan.chebbi, przemyslaw.kitszel, Jakub Kicinski
Use IS_ENABLED(CONFIG_INET) to make the code easier to refactor.
Now all packets which did not go thru GRO will exit in the same
branch.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 854e7ec5390b..d8a24a8bcfe8 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1740,12 +1740,11 @@ static inline struct sk_buff *bnxt_gro_skb(struct bnxt *bp,
struct rx_tpa_end_cmp_ext *tpa_end1,
struct sk_buff *skb)
{
-#ifdef CONFIG_INET
int payload_off;
u16 segs;
segs = TPA_END_TPA_SEGS(tpa_end);
- if (segs == 1)
+ if (segs == 1 || !IS_ENABLED(CONFIG_INET))
return skb;
NAPI_GRO_CB(skb)->count = segs;
@@ -1759,7 +1758,6 @@ static inline struct sk_buff *bnxt_gro_skb(struct bnxt *bp,
skb = bp->gro_func(tpa_info, payload_off, TPA_END_GRO_TS(tpa_end), skb);
if (likely(skb))
tcp_gro_complete(skb);
-#endif
return skb;
}
--
2.48.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH net-next v2 5/9] eth: bnxt: don't use ifdef to check for CONFIG_INET in GRO
2025-02-28 1:25 ` [PATCH net-next v2 5/9] eth: bnxt: don't use ifdef to check for CONFIG_INET in GRO Jakub Kicinski
@ 2025-03-03 6:47 ` Michael Chan
0 siblings, 0 replies; 19+ messages in thread
From: Michael Chan @ 2025-03-03 6:47 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
pavan.chebbi, przemyslaw.kitszel
[-- Attachment #1: Type: text/plain, Size: 510 bytes --]
On Thu, Feb 27, 2025 at 5:25 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Use IS_ENABLED(CONFIG_INET) to make the code easier to refactor.
> Now all packets which did not go thru GRO will exit in the same
> branch.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the old days, tcp_gro_complete() was undefined without CONFIG_INET
and that's why we needed the #ifdef. Now there is a stub for
tcp_gro_complete() so this will work.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH net-next v2 6/9] eth: bnxt: consolidate the GRO-but-not-really paths in bnxt_gro_skb()
2025-02-28 1:25 [PATCH net-next v2 0/9] eth: bnxt: maintain basic pkt/byte counters in SW Jakub Kicinski
` (4 preceding siblings ...)
2025-02-28 1:25 ` [PATCH net-next v2 5/9] eth: bnxt: don't use ifdef to check for CONFIG_INET in GRO Jakub Kicinski
@ 2025-02-28 1:25 ` Jakub Kicinski
2025-02-28 1:25 ` [PATCH net-next v2 7/9] eth: bnxt: maintain rx pkt/byte stats in SW Jakub Kicinski
` (3 subsequent siblings)
9 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2025-02-28 1:25 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, michael.chan,
pavan.chebbi, przemyslaw.kitszel, Jakub Kicinski
bnxt_tpa_end() skips calling bnxt_gro_skb() if it determines that GRO
should not be performed. For ease of packet counting pass the gro bool
into bnxt_gro_skb(), this way we have a single branch thru which all
non-GRO packets coming out of bnxt_tpa_end() should pass.
bnxt_gro_skb() is a static inline with a single caller, it will
be inlined so there is no concern about adding an extra call.
seg count will now be extracted every time, but tpa_end is touched
by bnxt_tpa_end(), the field extraction will make no practical
difference.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index d8a24a8bcfe8..d4052bfcc4ff 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1735,6 +1735,7 @@ static struct sk_buff *bnxt_gro_func_5730x(struct bnxt_tpa_info *tpa_info,
}
static inline struct sk_buff *bnxt_gro_skb(struct bnxt *bp,
+ bool gro,
struct bnxt_tpa_info *tpa_info,
struct rx_tpa_end_cmp *tpa_end,
struct rx_tpa_end_cmp_ext *tpa_end1,
@@ -1744,7 +1745,7 @@ static inline struct sk_buff *bnxt_gro_skb(struct bnxt *bp,
u16 segs;
segs = TPA_END_TPA_SEGS(tpa_end);
- if (segs == 1 || !IS_ENABLED(CONFIG_INET))
+ if (!gro || segs == 1 || !IS_ENABLED(CONFIG_INET))
return skb;
NAPI_GRO_CB(skb)->count = segs;
@@ -1917,10 +1918,7 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
(tpa_info->flags2 & RX_CMP_FLAGS2_T_L4_CS_CALC) >> 3;
}
- if (gro)
- skb = bnxt_gro_skb(bp, tpa_info, tpa_end, tpa_end1, skb);
-
- return skb;
+ return bnxt_gro_skb(bp, gro, tpa_info, tpa_end, tpa_end1, skb);
}
static void bnxt_tpa_agg(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
--
2.48.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH net-next v2 7/9] eth: bnxt: maintain rx pkt/byte stats in SW
2025-02-28 1:25 [PATCH net-next v2 0/9] eth: bnxt: maintain basic pkt/byte counters in SW Jakub Kicinski
` (5 preceding siblings ...)
2025-02-28 1:25 ` [PATCH net-next v2 6/9] eth: bnxt: consolidate the GRO-but-not-really paths in bnxt_gro_skb() Jakub Kicinski
@ 2025-02-28 1:25 ` Jakub Kicinski
2025-03-03 7:06 ` Michael Chan
2025-02-28 1:25 ` [PATCH net-next v2 8/9] eth: bnxt: maintain tx " Jakub Kicinski
` (2 subsequent siblings)
9 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2025-02-28 1:25 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, michael.chan,
pavan.chebbi, przemyslaw.kitszel, Jakub Kicinski
Some workloads want to be able to track bandwidth utilization on
the scale of 10s of msecs. Updating all HW stats at this rate is
both hard and wasteful of PCIe bandwidth.
Maintain basic Rx pkt/byte counters in software. ethtool -S will still
show the HW stats, but qstats and rtnl stats will show SW statistics.
We need to take care of HW-GRO, XDP and VF representors. Per netdev
qstat definition Rx stats should reflect packets passed to XDP (if
active, otherwise to the stack). XDP and GRO do not interoperate
in bnxt, so we need to count the packets in a few places.
Add a helper and call it where needed.
Do not count VF representor traffic as traffic for the main netdev.
The stats are added towards the end of the struct since ethtool
code expects existing members to be first.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2:
- adjust to skipping XDP inside bnxt_rx_pkt()
v1: https://lore.kernel.org/20250226211003.2790916-8-kuba@kernel.org
---
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 25 +++++++
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 68 +++++++++++--------
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 14 +++-
3 files changed, 77 insertions(+), 30 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 34f23ddd4d71..376141a41ecf 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1118,8 +1118,12 @@ struct bnxt_rx_sw_stats {
u64 rx_l4_csum_errors;
u64 rx_resets;
u64 rx_buf_errors;
+ /* non-ethtool stats follow */
u64 rx_oom_discards;
u64 rx_netpoll_discards;
+ u64 rx_packets;
+ u64 rx_bytes;
+ struct u64_stats_sync syncp;
};
struct bnxt_tx_sw_stats {
@@ -1146,6 +1150,9 @@ struct bnxt_total_ring_drv_stats {
u64 tx_total_resets;
u64 tx_total_ring_discards;
u64 total_missed_irqs;
+ /* non-ethtool stats follow */
+ u64 rx_total_packets;
+ u64 rx_total_bytes;
};
struct bnxt_stats_mem {
@@ -2797,6 +2804,24 @@ static inline u32 bnxt_tx_avail(struct bnxt *bp,
return bp->tx_ring_size - (used & bp->tx_ring_mask);
}
+static inline void bnxt_rx_pkt_cnt(struct bnxt *bp, struct bnxt_napi *bnapi,
+ struct net_device *dev,
+ int segs, int len, int payload_off)
+{
+ struct bnxt_sw_stats *sw_stats = bnapi->cp_ring.sw_stats;
+
+ /* Packet is for a representor */
+ if (bp->dev != dev)
+ return;
+
+ u64_stats_update_begin(&sw_stats->rx.syncp);
+ sw_stats->rx.rx_packets += segs;
+ sw_stats->rx.rx_bytes += len;
+ if (segs > 1)
+ sw_stats->rx.rx_bytes += (segs - 1) * payload_off;
+ u64_stats_update_end(&sw_stats->rx.syncp);
+}
+
static inline void bnxt_writeq(struct bnxt *bp, u64 val,
volatile void __iomem *addr)
{
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index d4052bfcc4ff..3840359250b2 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1735,6 +1735,7 @@ static struct sk_buff *bnxt_gro_func_5730x(struct bnxt_tpa_info *tpa_info,
}
static inline struct sk_buff *bnxt_gro_skb(struct bnxt *bp,
+ struct bnxt_napi *bnapi,
bool gro,
struct bnxt_tpa_info *tpa_info,
struct rx_tpa_end_cmp *tpa_end,
@@ -1742,11 +1743,15 @@ static inline struct sk_buff *bnxt_gro_skb(struct bnxt *bp,
struct sk_buff *skb)
{
int payload_off;
+ int full_len;
u16 segs;
+ full_len = skb->len - skb_mac_offset(skb);
segs = TPA_END_TPA_SEGS(tpa_end);
- if (!gro || segs == 1 || !IS_ENABLED(CONFIG_INET))
+ if (!gro || segs == 1 || !IS_ENABLED(CONFIG_INET)) {
+ bnxt_rx_pkt_cnt(bp, bnapi, skb->dev, 1, full_len, 0);
return skb;
+ }
NAPI_GRO_CB(skb)->count = segs;
skb_shinfo(skb)->gso_size =
@@ -1757,8 +1762,11 @@ static inline struct sk_buff *bnxt_gro_skb(struct bnxt *bp,
else
payload_off = TPA_END_PAYLOAD_OFF(tpa_end);
skb = bp->gro_func(tpa_info, payload_off, TPA_END_GRO_TS(tpa_end), skb);
- if (likely(skb))
+ if (likely(skb)) {
tcp_gro_complete(skb);
+ bnxt_rx_pkt_cnt(bp, bnapi, skb->dev,
+ segs, full_len, payload_off);
+ }
return skb;
}
@@ -1918,7 +1926,7 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
(tpa_info->flags2 & RX_CMP_FLAGS2_T_L4_CS_CALC) >> 3;
}
- return bnxt_gro_skb(bp, gro, tpa_info, tpa_end, tpa_end1, skb);
+ return bnxt_gro_skb(bp, bnapi, gro, tpa_info, tpa_end, tpa_end1, skb);
}
static void bnxt_tpa_agg(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
@@ -2275,6 +2283,10 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
}
}
}
+
+ if (!xdp_active) /* XDP packets counted in bnxt_rx_xdp() */
+ bnxt_rx_pkt_cnt(bp, bnapi, dev,
+ 1, skb->len - skb_mac_offset(skb), 0);
bnxt_deliver_skb(bp, bnapi, skb);
rc = 1;
@@ -5115,6 +5127,8 @@ static int bnxt_alloc_stats(struct bnxt *bp)
if (!cpr->sw_stats)
return -ENOMEM;
+ u64_stats_init(&cpr->sw_stats->rx.syncp);
+
cpr->stats.len = size;
rc = bnxt_alloc_stats_mem(bp, &cpr->stats, !i);
if (rc)
@@ -13092,7 +13106,14 @@ static int bnxt_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
static void bnxt_drv_stat_snapshot(const struct bnxt_sw_stats *sw_stats,
struct bnxt_sw_stats *snapshot)
{
- memcpy(snapshot, sw_stats, sizeof(*snapshot));
+ unsigned int seq_rx;
+
+ do {
+ seq_rx = u64_stats_fetch_begin(&sw_stats->rx.syncp);
+
+ memcpy(snapshot, sw_stats, sizeof(*snapshot));
+
+ } while (u64_stats_fetch_retry(&sw_stats->rx.syncp, seq_rx));
}
static void bnxt_get_ring_stats(struct bnxt *bp,
@@ -13108,18 +13129,13 @@ static void bnxt_get_ring_stats(struct bnxt *bp,
bnxt_drv_stat_snapshot(cpr->sw_stats, &sw_stats);
- stats->rx_packets += BNXT_GET_RING_STATS64(sw, rx_ucast_pkts);
- stats->rx_packets += BNXT_GET_RING_STATS64(sw, rx_mcast_pkts);
- stats->rx_packets += BNXT_GET_RING_STATS64(sw, rx_bcast_pkts);
+ stats->rx_packets += sw_stats.rx.rx_packets;
+ stats->rx_bytes += sw_stats.rx.rx_bytes;
stats->tx_packets += BNXT_GET_RING_STATS64(sw, tx_ucast_pkts);
stats->tx_packets += BNXT_GET_RING_STATS64(sw, tx_mcast_pkts);
stats->tx_packets += BNXT_GET_RING_STATS64(sw, tx_bcast_pkts);
- stats->rx_bytes += BNXT_GET_RING_STATS64(sw, rx_ucast_bytes);
- stats->rx_bytes += BNXT_GET_RING_STATS64(sw, rx_mcast_bytes);
- stats->rx_bytes += BNXT_GET_RING_STATS64(sw, rx_bcast_bytes);
-
stats->tx_bytes += BNXT_GET_RING_STATS64(sw, tx_ucast_bytes);
stats->tx_bytes += BNXT_GET_RING_STATS64(sw, tx_mcast_bytes);
stats->tx_bytes += BNXT_GET_RING_STATS64(sw, tx_bcast_bytes);
@@ -13210,6 +13226,8 @@ static void bnxt_get_one_ring_drv_stats(struct bnxt *bp,
stats->rx_total_buf_errors += sw_stats.rx.rx_buf_errors;
stats->rx_total_oom_discards += sw_stats.rx.rx_oom_discards;
stats->rx_total_netpoll_discards += sw_stats.rx.rx_netpoll_discards;
+ stats->rx_total_packets += sw_stats.rx.rx_packets;
+ stats->rx_total_bytes += sw_stats.rx.rx_bytes;
stats->rx_total_ring_discards +=
BNXT_GET_RING_STATS64(hw_stats, rx_discard_pkts);
stats->tx_total_resets += sw_stats.tx.tx_resets;
@@ -15602,23 +15620,17 @@ static void bnxt_get_queue_stats_rx(struct net_device *dev, int i,
struct netdev_queue_stats_rx *stats)
{
struct bnxt *bp = netdev_priv(dev);
- struct bnxt_cp_ring_info *cpr;
- u64 *sw;
+ struct bnxt_sw_stats *sw_stats;
+ unsigned int seq;
- cpr = &bp->bnapi[i]->cp_ring;
- sw = cpr->stats.sw_stats;
+ sw_stats = bp->bnapi[i]->cp_ring.sw_stats;
- stats->packets = 0;
- stats->packets += BNXT_GET_RING_STATS64(sw, rx_ucast_pkts);
- stats->packets += BNXT_GET_RING_STATS64(sw, rx_mcast_pkts);
- stats->packets += BNXT_GET_RING_STATS64(sw, rx_bcast_pkts);
-
- stats->bytes = 0;
- stats->bytes += BNXT_GET_RING_STATS64(sw, rx_ucast_bytes);
- stats->bytes += BNXT_GET_RING_STATS64(sw, rx_mcast_bytes);
- stats->bytes += BNXT_GET_RING_STATS64(sw, rx_bcast_bytes);
-
- stats->alloc_fail = cpr->sw_stats->rx.rx_oom_discards;
+ do {
+ seq = u64_stats_fetch_begin(&sw_stats->rx.syncp);
+ stats->packets = sw_stats->rx.rx_packets;
+ stats->bytes = sw_stats->rx.rx_bytes;
+ stats->alloc_fail = sw_stats->rx.rx_oom_discards;
+ } while (u64_stats_fetch_retry(&sw_stats->rx.syncp, seq));
}
static void bnxt_get_queue_stats_tx(struct net_device *dev, int i,
@@ -15648,8 +15660,8 @@ static void bnxt_get_base_stats(struct net_device *dev,
{
struct bnxt *bp = netdev_priv(dev);
- rx->packets = bp->net_stats_prev.rx_packets;
- rx->bytes = bp->net_stats_prev.rx_bytes;
+ rx->packets = bp->ring_drv_stats_prev.rx_total_packets;
+ rx->bytes = bp->ring_drv_stats_prev.rx_total_bytes;
rx->alloc_fail = bp->ring_drv_stats_prev.rx_total_oom_discards;
tx->packets = bp->net_stats_prev.tx_packets;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index aba49ddb0e66..16d3698cf0e9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -247,10 +247,20 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct net_device *dev,
/* BNXT_RX_PAGE_MODE(bp) when XDP enabled */
orig_data = xdp->data;
- if (bp->dev == dev)
+ if (bp->dev == dev) {
+ struct skb_shared_info *sinfo;
+ int stat_len = *len;
+
+ if (unlikely(xdp_buff_has_frags(xdp))) {
+ sinfo = xdp_get_shared_info_from_buff(xdp);
+ stat_len += sinfo->xdp_frags_size;
+ }
+
+ bnxt_rx_pkt_cnt(bp, rxr->bnapi, dev, 1, stat_len, 0);
act = bpf_prog_run_xdp(xdp_prog, xdp);
- else /* packet is for a VF representor */
+ } else { /* packet is for a VF representor */
act = XDP_PASS;
+ }
tx_avail = bnxt_tx_avail(bp, txr);
/* If the tx ring is not full, we must not update the rx producer yet
--
2.48.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH net-next v2 7/9] eth: bnxt: maintain rx pkt/byte stats in SW
2025-02-28 1:25 ` [PATCH net-next v2 7/9] eth: bnxt: maintain rx pkt/byte stats in SW Jakub Kicinski
@ 2025-03-03 7:06 ` Michael Chan
2025-03-03 22:32 ` Jakub Kicinski
0 siblings, 1 reply; 19+ messages in thread
From: Michael Chan @ 2025-03-03 7:06 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
pavan.chebbi, przemyslaw.kitszel
[-- Attachment #1: Type: text/plain, Size: 1107 bytes --]
On Thu, Feb 27, 2025 at 5:25 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Some workloads want to be able to track bandwidth utilization on
> the scale of 10s of msecs. Updating all HW stats at this rate is
> both hard and wasteful of PCIe bandwidth.
>
> Maintain basic Rx pkt/byte counters in software. ethtool -S will still
> show the HW stats, but qstats and rtnl stats will show SW statistics.
>
> We need to take care of HW-GRO, XDP and VF representors. Per netdev
> qstat definition Rx stats should reflect packets passed to XDP (if
> active, otherwise to the stack). XDP and GRO do not interoperate
> in bnxt, so we need to count the packets in a few places.
> Add a helper and call it where needed.
>
> Do not count VF representor traffic as traffic for the main netdev.
>
> The stats are added towards the end of the struct since ethtool
> code expects existing members to be first.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The byte counting looks correct except for VLAN packets with the VLAN
tag stripped. Do we want to count the 4 bytes for the VLAN tag?
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next v2 7/9] eth: bnxt: maintain rx pkt/byte stats in SW
2025-03-03 7:06 ` Michael Chan
@ 2025-03-03 22:32 ` Jakub Kicinski
2025-03-03 22:38 ` Michael Chan
0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2025-03-03 22:32 UTC (permalink / raw)
To: Michael Chan
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
pavan.chebbi, przemyslaw.kitszel
On Sun, 2 Mar 2025 23:06:29 -0800 Michael Chan wrote:
> On Thu, Feb 27, 2025 at 5:25 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > Some workloads want to be able to track bandwidth utilization on
> > the scale of 10s of msecs. Updating all HW stats at this rate is
> > both hard and wasteful of PCIe bandwidth.
> >
> > Maintain basic Rx pkt/byte counters in software. ethtool -S will still
> > show the HW stats, but qstats and rtnl stats will show SW statistics.
> >
> > We need to take care of HW-GRO, XDP and VF representors. Per netdev
> > qstat definition Rx stats should reflect packets passed to XDP (if
> > active, otherwise to the stack). XDP and GRO do not interoperate
> > in bnxt, so we need to count the packets in a few places.
> > Add a helper and call it where needed.
> >
> > Do not count VF representor traffic as traffic for the main netdev.
> >
> > The stats are added towards the end of the struct since ethtool
> > code expects existing members to be first.
> >
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>
> The byte counting looks correct except for VLAN packets with the VLAN
> tag stripped. Do we want to count the 4 bytes for the VLAN tag?
Not 100% clear to me. I looked at nfp and a couple of Intel drivers
and if I read the code right they don't count VLAN tags if stripped.
I suspect that's what most drivers which count in SW will do, simply
because it's easier rather than intentional choice by their authors.
Happy to change if you have a preference, but to spell it out my
understanding is that not counting stripped tags was more comment
and that's what I intended to implement here.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next v2 7/9] eth: bnxt: maintain rx pkt/byte stats in SW
2025-03-03 22:32 ` Jakub Kicinski
@ 2025-03-03 22:38 ` Michael Chan
2025-03-04 22:33 ` Michael Chan
0 siblings, 1 reply; 19+ messages in thread
From: Michael Chan @ 2025-03-03 22:38 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
pavan.chebbi, przemyslaw.kitszel
[-- Attachment #1: Type: text/plain, Size: 748 bytes --]
On Mon, Mar 3, 2025 at 2:32 PM Jakub Kicinski <kuba@kernel.org> wrote:
> Not 100% clear to me. I looked at nfp and a couple of Intel drivers
> and if I read the code right they don't count VLAN tags if stripped.
> I suspect that's what most drivers which count in SW will do, simply
> because it's easier rather than intentional choice by their authors.
>
> Happy to change if you have a preference, but to spell it out my
> understanding is that not counting stripped tags was more comment
> and that's what I intended to implement here.
I believe our hardware byte counters will count the VLAN tag, but let
me double check on that to be certain. I think for consistency, the
software counters should match the hardware counters.
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next v2 7/9] eth: bnxt: maintain rx pkt/byte stats in SW
2025-03-03 22:38 ` Michael Chan
@ 2025-03-04 22:33 ` Michael Chan
0 siblings, 0 replies; 19+ messages in thread
From: Michael Chan @ 2025-03-04 22:33 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
pavan.chebbi, przemyslaw.kitszel
[-- Attachment #1: Type: text/plain, Size: 1242 bytes --]
On Mon, Mar 3, 2025 at 2:38 PM Michael Chan <michael.chan@broadcom.com> wrote:
>
> On Mon, Mar 3, 2025 at 2:32 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > Not 100% clear to me. I looked at nfp and a couple of Intel drivers
> > and if I read the code right they don't count VLAN tags if stripped.
> > I suspect that's what most drivers which count in SW will do, simply
> > because it's easier rather than intentional choice by their authors.
> >
> > Happy to change if you have a preference, but to spell it out my
> > understanding is that not counting stripped tags was more comment
> > and that's what I intended to implement here.
>
> I believe our hardware byte counters will count the VLAN tag, but let
> me double check on that to be certain. I think for consistency, the
> software counters should match the hardware counters.
I've confirmed that our HW will count packet bytes on the network
side. This means that the VLAN tag will be counted even if it gets
stripped on RX or inserted on TX. This also means that padding bytes
inserted by the HW will be counted as well. The driver only pads in
software TX packets less than 52 bytes.
I think it's better for SW stats to match HW stats. Thanks.
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4196 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH net-next v2 8/9] eth: bnxt: maintain tx pkt/byte stats in SW
2025-02-28 1:25 [PATCH net-next v2 0/9] eth: bnxt: maintain basic pkt/byte counters in SW Jakub Kicinski
` (6 preceding siblings ...)
2025-02-28 1:25 ` [PATCH net-next v2 7/9] eth: bnxt: maintain rx pkt/byte stats in SW Jakub Kicinski
@ 2025-02-28 1:25 ` Jakub Kicinski
2025-03-03 7:08 ` Michael Chan
2025-02-28 1:25 ` [PATCH net-next v2 9/9] eth: bnxt: count xdp xmit packets Jakub Kicinski
2025-02-28 11:51 ` [PATCH net-next v2 0/9] eth: bnxt: maintain basic pkt/byte counters in SW Przemek Kitszel
9 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2025-02-28 1:25 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, michael.chan,
pavan.chebbi, przemyslaw.kitszel, Jakub Kicinski
Some workloads want to be able to track bandwidth utilization on
the scale of 10s of msecs. Updating all HW stats at this rate is
both hard and wasteful of PCIe bandwidth.
Maintain basic Tx pkt/byte counters in software. ethtool -S will still
show the HW stats, but qstats and rtnl stats will show SW statistics.
We need to take care of TSO and VF representors, record relevant
state in tx_buf to avoid touching potentially cold skb. Use existing
holes in the struct (no size change). Note that according to TX_BD_HSIZE
max header size is 255, so u8 should be enough.
It's not obvious whether VF representor traffic should be counted,
on one hand it doesn't belong to the local interface. On the other
it does go thru normal queuing so Qdisc will show it. I opted to
_not_ count it.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 11 +++-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 70 ++++++++++++++---------
2 files changed, 54 insertions(+), 27 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 376141a41ecf..d5f617fd5beb 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -877,11 +877,14 @@ struct bnxt_sw_tx_bd {
struct sk_buff *skb;
struct xdp_frame *xdpf;
};
+ struct page *page;
DEFINE_DMA_UNMAP_ADDR(mapping);
DEFINE_DMA_UNMAP_LEN(len);
- struct page *page;
+ u16 extra_segs;
+ u8 hdr_size;
u8 is_ts_pkt;
u8 is_push;
+ u8 is_vfr;
u8 action;
unsigned short nr_frags;
union {
@@ -1128,6 +1131,10 @@ struct bnxt_rx_sw_stats {
struct bnxt_tx_sw_stats {
u64 tx_resets;
+ /* non-ethtool stats follow */
+ u64 tx_packets;
+ u64 tx_bytes;
+ struct u64_stats_sync syncp;
};
struct bnxt_cmn_sw_stats {
@@ -1153,6 +1160,8 @@ struct bnxt_total_ring_drv_stats {
/* non-ethtool stats follow */
u64 rx_total_packets;
u64 rx_total_bytes;
+ u64 tx_total_packets;
+ u64 tx_total_bytes;
};
struct bnxt_stats_mem {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 3840359250b2..f6308e4e8360 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -514,6 +514,7 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
vlan_tag_flags = 0;
cfa_action = bnxt_xmit_get_cfa_action(skb);
+ tx_buf->is_vfr = !!cfa_action;
if (skb_vlan_tag_present(skb)) {
vlan_tag_flags = TX_BD_CFA_META_KEY_VLAN |
skb_vlan_tag_get(skb);
@@ -675,6 +676,9 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
length = skb_shinfo(skb)->gso_size;
txbd1->tx_bd_mss = cpu_to_le32(length);
length += hdr_len;
+
+ tx_buf->hdr_size = hdr_len;
+ tx_buf->extra_segs = skb_shinfo(skb)->gso_segs - 1;
} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
txbd1->tx_bd_hsize_lflags |=
cpu_to_le32(TX_BD_FLAGS_TCP_UDP_CHKSUM);
@@ -784,6 +788,9 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
if (txr->kick_pending)
bnxt_txr_db_kick(bp, txr, txr->tx_prod);
txr->tx_buf_ring[txr->tx_prod].skb = NULL;
+ txr->tx_buf_ring[txr->tx_prod].extra_segs = 0;
+ txr->tx_buf_ring[txr->tx_prod].hdr_size = 0;
+ txr->tx_buf_ring[txr->tx_prod].is_vfr = 0;
dev_core_stats_tx_dropped_inc(dev);
return NETDEV_TX_OK;
}
@@ -793,11 +800,12 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
int budget)
{
struct netdev_queue *txq = netdev_get_tx_queue(bp->dev, txr->txq_index);
+ struct bnxt_sw_stats *sw_stats = txr->bnapi->cp_ring.sw_stats;
struct pci_dev *pdev = bp->pdev;
+ int adj_bytes = 0, tx_bytes = 0;
+ int adj_pkts = 0, tx_pkts = 0;
u16 hw_cons = txr->tx_hw_cons;
- unsigned int tx_bytes = 0;
u16 cons = txr->tx_cons;
- int tx_pkts = 0;
bool rc = false;
while (RING_TX(bp, cons) != hw_cons) {
@@ -823,8 +831,18 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
cons = NEXT_TX(cons);
tx_pkts++;
tx_bytes += skb->len;
+ if (!tx_buf->is_vfr) {
+ adj_pkts += tx_buf->extra_segs;
+ adj_bytes += tx_buf->extra_segs * tx_buf->hdr_size;
+ } else {
+ adj_pkts--;
+ adj_bytes -= skb->len;
+ }
tx_buf->skb = NULL;
+ tx_buf->extra_segs = 0;
+ tx_buf->hdr_size = 0;
tx_buf->is_ts_pkt = 0;
+ tx_buf->is_vfr = 0;
if (tx_buf->is_push) {
tx_buf->is_push = 0;
@@ -860,6 +878,11 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
WRITE_ONCE(txr->tx_cons, cons);
+ u64_stats_update_begin(&sw_stats->tx.syncp);
+ sw_stats->tx.tx_packets += tx_pkts + adj_pkts;
+ sw_stats->tx.tx_bytes += tx_bytes + adj_bytes;
+ u64_stats_update_end(&sw_stats->tx.syncp);
+
__netif_txq_completed_wake(txq, tx_pkts, tx_bytes,
bnxt_tx_avail(bp, txr), bp->tx_wake_thresh,
READ_ONCE(txr->dev_state) == BNXT_DEV_STATE_CLOSING);
@@ -5128,6 +5151,7 @@ static int bnxt_alloc_stats(struct bnxt *bp)
return -ENOMEM;
u64_stats_init(&cpr->sw_stats->rx.syncp);
+ u64_stats_init(&cpr->sw_stats->tx.syncp);
cpr->stats.len = size;
rc = bnxt_alloc_stats_mem(bp, &cpr->stats, !i);
@@ -13106,14 +13130,16 @@ static int bnxt_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
static void bnxt_drv_stat_snapshot(const struct bnxt_sw_stats *sw_stats,
struct bnxt_sw_stats *snapshot)
{
- unsigned int seq_rx;
+ unsigned int seq_rx, seq_tx;
do {
seq_rx = u64_stats_fetch_begin(&sw_stats->rx.syncp);
+ seq_tx = u64_stats_fetch_begin(&sw_stats->tx.syncp);
memcpy(snapshot, sw_stats, sizeof(*snapshot));
- } while (u64_stats_fetch_retry(&sw_stats->rx.syncp, seq_rx));
+ } while (u64_stats_fetch_retry(&sw_stats->rx.syncp, seq_rx) ||
+ u64_stats_fetch_retry(&sw_stats->tx.syncp, seq_tx));
}
static void bnxt_get_ring_stats(struct bnxt *bp,
@@ -13132,13 +13158,8 @@ static void bnxt_get_ring_stats(struct bnxt *bp,
stats->rx_packets += sw_stats.rx.rx_packets;
stats->rx_bytes += sw_stats.rx.rx_bytes;
- stats->tx_packets += BNXT_GET_RING_STATS64(sw, tx_ucast_pkts);
- stats->tx_packets += BNXT_GET_RING_STATS64(sw, tx_mcast_pkts);
- stats->tx_packets += BNXT_GET_RING_STATS64(sw, tx_bcast_pkts);
-
- stats->tx_bytes += BNXT_GET_RING_STATS64(sw, tx_ucast_bytes);
- stats->tx_bytes += BNXT_GET_RING_STATS64(sw, tx_mcast_bytes);
- stats->tx_bytes += BNXT_GET_RING_STATS64(sw, tx_bcast_bytes);
+ stats->tx_packets += sw_stats.tx.tx_packets;
+ stats->tx_bytes += sw_stats.tx.tx_bytes;
stats->rx_missed_errors +=
BNXT_GET_RING_STATS64(sw, rx_discard_pkts);
@@ -13230,6 +13251,8 @@ static void bnxt_get_one_ring_drv_stats(struct bnxt *bp,
stats->rx_total_bytes += sw_stats.rx.rx_bytes;
stats->rx_total_ring_discards +=
BNXT_GET_RING_STATS64(hw_stats, rx_discard_pkts);
+ stats->tx_total_packets += sw_stats.tx.tx_packets;
+ stats->tx_total_bytes += sw_stats.tx.tx_bytes;
stats->tx_total_resets += sw_stats.tx.tx_resets;
stats->tx_total_ring_discards +=
BNXT_GET_RING_STATS64(hw_stats, tx_discard_pkts);
@@ -15637,21 +15660,16 @@ static void bnxt_get_queue_stats_tx(struct net_device *dev, int i,
struct netdev_queue_stats_tx *stats)
{
struct bnxt *bp = netdev_priv(dev);
- struct bnxt_napi *bnapi;
- u64 *sw;
+ struct bnxt_sw_stats *sw_stats;
+ unsigned int seq;
- bnapi = bp->tx_ring[bp->tx_ring_map[i]].bnapi;
- sw = bnapi->cp_ring.stats.sw_stats;
+ sw_stats = bp->tx_ring[bp->tx_ring_map[i]].bnapi->cp_ring.sw_stats;
- stats->packets = 0;
- stats->packets += BNXT_GET_RING_STATS64(sw, tx_ucast_pkts);
- stats->packets += BNXT_GET_RING_STATS64(sw, tx_mcast_pkts);
- stats->packets += BNXT_GET_RING_STATS64(sw, tx_bcast_pkts);
-
- stats->bytes = 0;
- stats->bytes += BNXT_GET_RING_STATS64(sw, tx_ucast_bytes);
- stats->bytes += BNXT_GET_RING_STATS64(sw, tx_mcast_bytes);
- stats->bytes += BNXT_GET_RING_STATS64(sw, tx_bcast_bytes);
+ do {
+ seq = u64_stats_fetch_begin(&sw_stats->tx.syncp);
+ stats->packets = sw_stats->tx.tx_packets;
+ stats->bytes = sw_stats->tx.tx_bytes;
+ } while (u64_stats_fetch_retry(&sw_stats->tx.syncp, seq));
}
static void bnxt_get_base_stats(struct net_device *dev,
@@ -15664,8 +15682,8 @@ static void bnxt_get_base_stats(struct net_device *dev,
rx->bytes = bp->ring_drv_stats_prev.rx_total_bytes;
rx->alloc_fail = bp->ring_drv_stats_prev.rx_total_oom_discards;
- tx->packets = bp->net_stats_prev.tx_packets;
- tx->bytes = bp->net_stats_prev.tx_bytes;
+ tx->packets = bp->ring_drv_stats_prev.tx_total_packets;
+ tx->bytes = bp->ring_drv_stats_prev.tx_total_bytes;
}
static const struct netdev_stat_ops bnxt_stat_ops = {
--
2.48.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH net-next v2 8/9] eth: bnxt: maintain tx pkt/byte stats in SW
2025-02-28 1:25 ` [PATCH net-next v2 8/9] eth: bnxt: maintain tx " Jakub Kicinski
@ 2025-03-03 7:08 ` Michael Chan
0 siblings, 0 replies; 19+ messages in thread
From: Michael Chan @ 2025-03-03 7:08 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
pavan.chebbi, przemyslaw.kitszel
[-- Attachment #1: Type: text/plain, Size: 992 bytes --]
On Thu, Feb 27, 2025 at 5:25 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Some workloads want to be able to track bandwidth utilization on
> the scale of 10s of msecs. Updating all HW stats at this rate is
> both hard and wasteful of PCIe bandwidth.
>
> Maintain basic Tx pkt/byte counters in software. ethtool -S will still
> show the HW stats, but qstats and rtnl stats will show SW statistics.
>
> We need to take care of TSO and VF representors, record relevant
> state in tx_buf to avoid touching potentially cold skb. Use existing
> holes in the struct (no size change). Note that according to TX_BD_HSIZE
> max header size is 255, so u8 should be enough.
>
> It's not obvious whether VF representor traffic should be counted,
> on one hand it doesn't belong to the local interface. On the other
> it does go thru normal queuing so Qdisc will show it. I opted to
> _not_ count it.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Same VLAN comment for TX.
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH net-next v2 9/9] eth: bnxt: count xdp xmit packets
2025-02-28 1:25 [PATCH net-next v2 0/9] eth: bnxt: maintain basic pkt/byte counters in SW Jakub Kicinski
` (7 preceding siblings ...)
2025-02-28 1:25 ` [PATCH net-next v2 8/9] eth: bnxt: maintain tx " Jakub Kicinski
@ 2025-02-28 1:25 ` Jakub Kicinski
2025-03-04 22:48 ` Michael Chan
2025-02-28 11:51 ` [PATCH net-next v2 0/9] eth: bnxt: maintain basic pkt/byte counters in SW Przemek Kitszel
9 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2025-02-28 1:25 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, michael.chan,
pavan.chebbi, przemyslaw.kitszel, Jakub Kicinski
Count XDP_TX and XDP_REDIRECT packets. Since the Tx rings are separate
we count the packets sent to the base stats, not per-queues stats.
The XDP stats are protected by the Rx syncp since they are in NAPI
context. Feels slightly less ugly than having a Tx stats in Rx struct.
But neither is ideal.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2:
- move tx_buf init sooner so that shinfo handling can access it
v1: https://lore.kernel.org/20250226211003.2790916-10-kuba@kernel.org
---
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 7 +++++-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 23 +++++++++++++++--
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 25 +++++++++++++++----
3 files changed, 47 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index d5f617fd5beb..415dda512329 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -880,7 +880,10 @@ struct bnxt_sw_tx_bd {
struct page *page;
DEFINE_DMA_UNMAP_ADDR(mapping);
DEFINE_DMA_UNMAP_LEN(len);
- u16 extra_segs;
+ union {
+ u16 extra_segs;
+ u16 xdp_len;
+ };
u8 hdr_size;
u8 is_ts_pkt;
u8 is_push;
@@ -1134,6 +1137,8 @@ struct bnxt_tx_sw_stats {
/* non-ethtool stats follow */
u64 tx_packets;
u64 tx_bytes;
+ u64 xdp_packets; /* under rx syncp */
+ u64 xdp_bytes; /* under rx syncp */
struct u64_stats_sync syncp;
};
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index f6308e4e8360..7dbb940c0591 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -13160,6 +13160,8 @@ static void bnxt_get_ring_stats(struct bnxt *bp,
stats->tx_packets += sw_stats.tx.tx_packets;
stats->tx_bytes += sw_stats.tx.tx_bytes;
+ stats->tx_packets += sw_stats.tx.xdp_packets;
+ stats->tx_bytes += sw_stats.tx.xdp_bytes;
stats->rx_missed_errors +=
BNXT_GET_RING_STATS64(sw, rx_discard_pkts);
@@ -13251,8 +13253,9 @@ static void bnxt_get_one_ring_drv_stats(struct bnxt *bp,
stats->rx_total_bytes += sw_stats.rx.rx_bytes;
stats->rx_total_ring_discards +=
BNXT_GET_RING_STATS64(hw_stats, rx_discard_pkts);
- stats->tx_total_packets += sw_stats.tx.tx_packets;
- stats->tx_total_bytes += sw_stats.tx.tx_bytes;
+ stats->tx_total_packets +=
+ sw_stats.tx.tx_packets + sw_stats.tx.xdp_packets;
+ stats->tx_total_bytes += sw_stats.tx.tx_bytes + sw_stats.tx.xdp_bytes;
stats->tx_total_resets += sw_stats.tx.tx_resets;
stats->tx_total_ring_discards +=
BNXT_GET_RING_STATS64(hw_stats, tx_discard_pkts);
@@ -15677,6 +15680,7 @@ static void bnxt_get_base_stats(struct net_device *dev,
struct netdev_queue_stats_tx *tx)
{
struct bnxt *bp = netdev_priv(dev);
+ int i;
rx->packets = bp->ring_drv_stats_prev.rx_total_packets;
rx->bytes = bp->ring_drv_stats_prev.rx_total_bytes;
@@ -15684,6 +15688,21 @@ static void bnxt_get_base_stats(struct net_device *dev,
tx->packets = bp->ring_drv_stats_prev.tx_total_packets;
tx->bytes = bp->ring_drv_stats_prev.tx_total_bytes;
+
+ for (i = 0; i < bp->cp_nr_rings; i++) {
+ struct bnxt_sw_stats *sw_stats = bp->bnapi[i]->cp_ring.sw_stats;
+ unsigned int seq;
+ u64 pkts, bytes;
+
+ do {
+ seq = u64_stats_fetch_begin(&sw_stats->rx.syncp);
+ pkts = sw_stats->tx.xdp_packets;
+ bytes = sw_stats->tx.xdp_bytes;
+ } while (u64_stats_fetch_retry(&sw_stats->rx.syncp, seq));
+
+ tx->packets += pkts;
+ tx->bytes += bytes;
+ }
}
static const struct netdev_stat_ops bnxt_stat_ops = {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index 16d3698cf0e9..644e4a7818a2 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -35,14 +35,17 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
u16 prod;
int i;
- if (xdp && xdp_buff_has_frags(xdp)) {
- sinfo = xdp_get_shared_info_from_buff(xdp);
- num_frags = sinfo->nr_frags;
- }
-
/* fill up the first buffer */
prod = txr->tx_prod;
tx_buf = &txr->tx_buf_ring[RING_TX(bp, prod)];
+ tx_buf->xdp_len = len;
+
+ if (xdp && xdp_buff_has_frags(xdp)) {
+ sinfo = xdp_get_shared_info_from_buff(xdp);
+ tx_buf->xdp_len += sinfo->xdp_frags_size;
+ num_frags = sinfo->nr_frags;
+ }
+
tx_buf->nr_frags = num_frags;
if (xdp)
tx_buf->page = virt_to_head_page(xdp->data);
@@ -120,9 +123,11 @@ static void __bnxt_xmit_xdp_redirect(struct bnxt *bp,
void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int budget)
{
+ struct bnxt_sw_stats *sw_stats = bnapi->cp_ring.sw_stats;
struct bnxt_tx_ring_info *txr = bnapi->tx_ring[0];
struct bnxt_rx_ring_info *rxr = bnapi->rx_ring;
u16 tx_hw_cons = txr->tx_hw_cons;
+ unsigned int pkts = 0, bytes = 0;
bool rx_doorbell_needed = false;
struct bnxt_sw_tx_bd *tx_buf;
u16 tx_cons = txr->tx_cons;
@@ -135,6 +140,10 @@ void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int budget)
while (RING_TX(bp, tx_cons) != tx_hw_cons) {
tx_buf = &txr->tx_buf_ring[RING_TX(bp, tx_cons)];
+ pkts++;
+ bytes += tx_buf->xdp_len;
+ tx_buf->xdp_len = 0;
+
if (tx_buf->action == XDP_REDIRECT) {
struct pci_dev *pdev = bp->pdev;
@@ -163,6 +172,12 @@ void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int budget)
tx_cons = NEXT_TX(tx_cons);
}
+ /* Note: Rx sync here, because Rx == NAPI context */
+ u64_stats_update_begin(&sw_stats->rx.syncp);
+ sw_stats->tx.xdp_packets += pkts;
+ sw_stats->tx.xdp_bytes += bytes;
+ u64_stats_update_end(&sw_stats->rx.syncp);
+
bnapi->events &= ~BNXT_TX_CMP_EVENT;
WRITE_ONCE(txr->tx_cons, tx_cons);
if (rx_doorbell_needed) {
--
2.48.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH net-next v2 9/9] eth: bnxt: count xdp xmit packets
2025-02-28 1:25 ` [PATCH net-next v2 9/9] eth: bnxt: count xdp xmit packets Jakub Kicinski
@ 2025-03-04 22:48 ` Michael Chan
2025-03-05 0:51 ` Jakub Kicinski
0 siblings, 1 reply; 19+ messages in thread
From: Michael Chan @ 2025-03-04 22:48 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
pavan.chebbi, przemyslaw.kitszel
[-- Attachment #1: Type: text/plain, Size: 558 bytes --]
On Thu, Feb 27, 2025 at 5:25 PM Jakub Kicinski <kuba@kernel.org> wrote:
> @@ -1134,6 +1137,8 @@ struct bnxt_tx_sw_stats {
> /* non-ethtool stats follow */
> u64 tx_packets;
> u64 tx_bytes;
> + u64 xdp_packets; /* under rx syncp */
> + u64 xdp_bytes; /* under rx syncp */
Why do we need different TX counters for XDP? A TX ring is either for
XDP or for regular TX. It cannot be for both so why do we need
separate counters?
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4196 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH net-next v2 9/9] eth: bnxt: count xdp xmit packets
2025-03-04 22:48 ` Michael Chan
@ 2025-03-05 0:51 ` Jakub Kicinski
0 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2025-03-05 0:51 UTC (permalink / raw)
To: Michael Chan
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
pavan.chebbi, przemyslaw.kitszel
On Tue, 4 Mar 2025 14:48:05 -0800 Michael Chan wrote:
> > @@ -1134,6 +1137,8 @@ struct bnxt_tx_sw_stats {
> > /* non-ethtool stats follow */
> > u64 tx_packets;
> > u64 tx_bytes;
> > + u64 xdp_packets; /* under rx syncp */
> > + u64 xdp_bytes; /* under rx syncp */
>
> Why do we need different TX counters for XDP? A TX ring is either for
> XDP or for regular TX. It cannot be for both so why do we need
> separate counters?
No strong reason, felt cleaner given xdp is under a different lock.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next v2 0/9] eth: bnxt: maintain basic pkt/byte counters in SW
2025-02-28 1:25 [PATCH net-next v2 0/9] eth: bnxt: maintain basic pkt/byte counters in SW Jakub Kicinski
` (8 preceding siblings ...)
2025-02-28 1:25 ` [PATCH net-next v2 9/9] eth: bnxt: count xdp xmit packets Jakub Kicinski
@ 2025-02-28 11:51 ` Przemek Kitszel
9 siblings, 0 replies; 19+ messages in thread
From: Przemek Kitszel @ 2025-02-28 11:51 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, michael.chan,
pavan.chebbi, davem
On 2/28/25 02:25, Jakub Kicinski wrote:
> Some workloads want to be able to track bandwidth utilization on
> the scale of 10s of msecs. bnxt uses HW stats and async stats
> updates, with update frequency controlled via ethtool -C.
> Updating all HW stats more often than 100 msec is both hard for
> the device and consumes PCIe bandwidth. Switch to maintaining
> basic Rx / Tx packet and byte counters in SW.
>
> Tested with drivers/net/stats.py:
> # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0
>
> Manually tested by comparing the ethtool -S stats (which continues
> to show HW stats) with qstats, and total interface stats.
> With and without HW-GRO, and with XDP on / off.
> Stopping and starting the interface also doesn't corrupt the values.
>
> v2:
> - fix skipping XDP vs the XDP Tx ring handling (Michael)
> - rename the defines as well as the structs (Przemek)
> - fix counding frag'ed packets in XDP Tx
> v1: https://lore.kernel.org/20250226211003.2790916-1-kuba@kernel.org
>
> Jakub Kicinski (9):
> eth: bnxt: use napi_consume_skb()
> eth: bnxt: don't run xdp programs on fallback traffic
> eth: bnxt: rename ring_err_stats -> ring_drv_stats
> eth: bnxt: snapshot driver stats
> eth: bnxt: don't use ifdef to check for CONFIG_INET in GRO
> eth: bnxt: consolidate the GRO-but-not-really paths in bnxt_gro_skb()
> eth: bnxt: maintain rx pkt/byte stats in SW
> eth: bnxt: maintain tx pkt/byte stats in SW
> eth: bnxt: count xdp xmit packets
>
> drivers/net/ethernet/broadcom/bnxt/bnxt.h | 49 +++-
> drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h | 3 +-
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 213 +++++++++++-------
> .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 20 +-
> drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 43 +++-
> 5 files changed, 228 insertions(+), 100 deletions(-)
>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
^ permalink raw reply [flat|nested] 19+ messages in thread