* Re: [PATCH net-next v5] octeontx2-pf: Retain ethtool stats across interface down/up.
2026-05-20 9:34 [PATCH net-next v5] octeontx2-pf: Retain ethtool stats across interface down/up Anshumali Gaur
@ 2026-05-25 18:51 ` Jakub Kicinski
0 siblings, 0 replies; 2+ messages in thread
From: Jakub Kicinski @ 2026-05-25 18:51 UTC (permalink / raw)
To: agaur
Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni, gakula,
hkelam, sbhatta, sgoutham, bbhushan2, netdev, linux-kernel
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
octeontx2-pf: Retain ethtool stats across interface down/up.
This patch saves accumulated device statistics into a new
otx2_hw.old_stats structure on interface stop and adds them to the
hardware counters returned by otx2_get_stats64() so that
rtnl_link_stats64 values persist across down/up cycles.
> octeontx2-pf: Retain ethtool stats across interface down/up.
[Medium]
The subject and changelog talk about "ethtool stats", but the patch only
touches the rtnl_link_stats64 path via otx2_get_stats64() (the
ndo_get_stats64 callback). The actual ethtool -S handlers, namely
otx2_get_ethtool_stats() and otx2vf_get_ethtool_stats() in
otx2_ethtool.c, still index pfvf->hw.dev_stats directly:
*(data++) = ((u64 *)&pfvf->hw.dev_stats)[otx2_dev_stats[stat].index];
with no contribution from the new old_stats field. Will ethtool -S
counters continue to drop to zero on interface down despite the commit
subject, or is the intent to update those handlers too? Could the
subject and changelog be reworded to refer to interface
(rtnl_link_stats64) statistics, or the ethtool -S handlers updated to
also account for old_stats?
> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> index 971fcab1c248..fce46816446f 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> @@ -138,20 +138,22 @@ void otx2_get_dev_stats(struct otx2_nic *pfvf)
> void otx2_get_stats64(struct net_device *netdev,
> struct rtnl_link_stats64 *stats)
> {
> + struct otx2_dev_stats *dev_stats, *old_stats;
> struct otx2_nic *pfvf = netdev_priv(netdev);
> - struct otx2_dev_stats *dev_stats;
>
> otx2_get_dev_stats(pfvf);
>
> dev_stats = &pfvf->hw.dev_stats;
> - stats->rx_bytes = dev_stats->rx_bytes;
> - stats->rx_packets = dev_stats->rx_frames;
> - stats->rx_dropped = dev_stats->rx_drops;
> - stats->multicast = dev_stats->rx_mcast_frames;
> -
> - stats->tx_bytes = dev_stats->tx_bytes;
> - stats->tx_packets = dev_stats->tx_frames;
> - stats->tx_dropped = dev_stats->tx_drops;
> + old_stats = &pfvf->hw.old_stats;
> +
> + stats->rx_bytes = old_stats->rx_bytes + dev_stats->rx_bytes;
> + stats->rx_packets = old_stats->rx_frames + dev_stats->rx_frames;
> + stats->rx_dropped = old_stats->rx_drops + dev_stats->rx_drops;
> + stats->multicast = old_stats->rx_mcast_frames + dev_stats->rx_mcast_frames;
> +
> + stats->tx_bytes = old_stats->tx_bytes + dev_stats->tx_bytes;
> + stats->tx_packets = old_stats->tx_frames + dev_stats->tx_frames;
> + stats->tx_dropped = old_stats->tx_drops + dev_stats->tx_drops;
[Medium]
otx2_get_stats64() is the ndo_get_stats64 callback for both PF and VF
and is callable from contexts that do not hold RTNL (for example
/proc/net/dev readers). It now reads pfvf->hw.old_stats with no
locking, while otx2_stop() on another CPU performs a series of
non-atomic read-modify-write operations on the same fields:
old_stats->rx_bytes += dev->rx_bytes;
old_stats->rx_drops += dev->rx_drops;
...
Can a concurrent reader observe a torn 64-bit value on 32-bit
architectures, or an inconsistent snapshot across the thirteen fields
on 64-bit (e.g. old_stats->rx_bytes already accumulated while
old_stats->rx_frames is not)?
Additionally, both paths write pfvf->hw.dev_stats: otx2_stop() calls
otx2_get_dev_stats(pf) just before the accumulation, and any concurrent
otx2_get_stats64() also calls otx2_get_dev_stats(pfvf) and overwrites
the same fields. Could the otx2_stop() accumulation end up adding
values that no longer match the snapshot it just took, if a concurrent
reader rewrote dev_stats in between?
Would a u64_stats_sync seqlock or a dedicated stats lock around both
the writer and reader paths be more appropriate here?
> }
> EXPORT_SYMBOL(otx2_get_stats64);
>
> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
[ ... ]
> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> index ee623476e5ff..3b4e32d899b7 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> @@ -2154,6 +2154,7 @@ EXPORT_SYMBOL(otx2_open);
> int otx2_stop(struct net_device *netdev)
> {
> struct otx2_nic *pf = netdev_priv(netdev);
> + struct otx2_dev_stats *old_stats, *dev;
> struct otx2_cq_poll *cq_poll = NULL;
> struct otx2_qset *qset = &pf->qset;
> int qidx, vec, wrk;
> @@ -2200,6 +2201,26 @@ int otx2_stop(struct net_device *netdev)
> cancel_delayed_work_sync(&pf->refill_wrk[wrk].pool_refill_work);
> devm_kfree(pf->dev, pf->refill_wrk);
>
> + /* Read final HW counters and accumulate */
> + dev = &pf->hw.dev_stats;
> + old_stats = &pf->hw.old_stats;
> + otx2_get_dev_stats(pf);
> +
> + /* Accumulate old stats */
> + old_stats->rx_bytes += dev->rx_bytes;
> + old_stats->rx_drops += dev->rx_drops;
> + old_stats->rx_bcast_frames += dev->rx_bcast_frames;
> + old_stats->rx_mcast_frames += dev->rx_mcast_frames;
> + old_stats->rx_ucast_frames += dev->rx_ucast_frames;
> + old_stats->rx_frames += dev->rx_frames;
> +
> + old_stats->tx_bytes += dev->tx_bytes;
> + old_stats->tx_drops += dev->tx_drops;
[High]
Can tx_drops be over-counted across down/up cycles? Looking at
otx2_get_dev_stats():
dev_stats->tx_drops = OTX2_GET_TX_STATS(TX_DROP) +
(unsigned long)atomic_long_read(&dev_stats->tx_discards);
The tx_discards atomic_long_t is incremented on the xmit path and is
never reset across down/up cycles (no clearing in otx2_open() or
otx2_stop(), and only ever atomic_long_inc()).
The accumulation here saves the composite dev->tx_drops
(HW TX_DROP + tx_discards) into old_stats->tx_drops. After the next
bring-up the HW TX_DROP register starts again at 0 but tx_discards
still contains the prior session's count, so otx2_get_dev_stats()
folds the same atomic value in again.
The next otx2_get_stats64() then returns
old_stats->tx_drops + dev_stats->tx_drops, where dev_stats->tx_drops
still includes the same tx_discards. After N down/up cycles, every
prior-session tx_discards increment is counted N+1 times.
For example, with X prior tx_discards and HW drops D1 in cycle 1, then
HW drops D2 and tx_discards growing to X+Y in cycle 2, the reported
tx_dropped becomes (D1+X) + (D2+X+Y) = D1+D2+2X+Y instead of
D1+D2+X+Y.
Would it be better to accumulate only the HW component (snapshot
OTX2_GET_TX_STATS(TX_DROP) separately and let the atomic continue to
be added at read time), zero the atomic at stop, or compute deltas
instead of full snapshots?
> + old_stats->tx_bcast_frames += dev->tx_bcast_frames;
> + old_stats->tx_mcast_frames += dev->tx_mcast_frames;
> + old_stats->tx_ucast_frames += dev->tx_ucast_frames;
> + old_stats->tx_frames += dev->tx_frames;
> +
> otx2_free_hw_resources(pf);
> otx2_free_cints(pf, pf->hw.cint_cnt);
> otx2_disable_napi(pf);
--
pw-bot: cr
^ permalink raw reply [flat|nested] 2+ messages in thread