* [PATCH v3 net-next] net/mlx5e: Report rx_discards_phy via rx_fifo_errors
@ 2024-12-06 9:03 Yafang Shao
2024-12-08 1:38 ` Jakub Kicinski
0 siblings, 1 reply; 3+ messages in thread
From: Yafang Shao @ 2024-12-06 9:03 UTC (permalink / raw)
To: saeedm, tariqt, leon, gal, kuba
Cc: netdev, linux-rdma, Yafang Shao, Tariq Toukan
We observed a high number of rx_discards_phy events on some servers when
running `ethtool -S`. However, this important counter is not currently
reflected in the /proc/net/dev statistics file, making it challenging to
monitor effectively.

Since rx_fifo_errors represents receive FIFO errors on this network
device, it makes sense to include rx_discards_phy in this counter to
enhance monitoring visibility. This change will help administrators track
these events more effectively through standard interfaces.

I have also checked the mlx5 ethtool counters manual [0]; it seems
that rx_discards_phy and rx_fifo_errors have the same meaning:

  rx_discards_phy: The number of received packets dropped due to lack of
                   buffers on a physical port. If this counter is
                   increasing, it implies that the adapter is congested and
                   cannot absorb the traffic coming from the network.
                   ConnectX-3 naming : rx_fifo_errors
Link: https://enterprise-support.nvidia.com/s/article/understanding-mlx5-ethtool-counters [0]
Suggested-by: Tariq Toukan <ttoukan.linux@gmail.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Tariq Toukan <ttoukan.linux@gmail.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Gal Pressman <gal@nvidia.com>
Cc: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 +
1 file changed, 1 insertion(+)
Changes:
v2->v3:
- Drop the changes on the Doc
v1->v2: https://lore.kernel.org/netdev/20241114021711.5691-1-laoar.shao@gmail.com/
- Use rx_fifo_errors instead (Tariq)
- Update the if_link.h accordingly
v1: https://lore.kernel.org/netdev/20241106064015.4118-1-laoar.shao@gmail.com/
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index e601324a690a..15b1a3e6e641 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3916,6 +3916,7 @@ mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats)
 	}
 	stats->rx_missed_errors = priv->stats.qcnt.rx_out_of_buffer;
+	stats->rx_fifo_errors = PPORT_2863_GET(pstats, if_in_discards);
 	stats->rx_length_errors =
 		PPORT_802_3_GET(pstats, a_in_range_length_errors) +
--
2.43.5
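
Editorial sketch (not part of the patch): the commit message is about making these drops visible in /proc/net/dev, so the snippet below shows one way a monitoring script might read the receive "drop" and "fifo" columns from that file. To my understanding, "fifo" reflects rx_fifo_errors and "drop" reflects rx_dropped plus rx_missed_errors, but treat that mapping as an assumption to check against your kernel. The helper name `parse_proc_net_dev` and the `SAMPLE` text are hypothetical and exist only for illustration.

```python
# Receive-side column names, in the order they appear in /proc/net/dev.
RX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo",
             "frame", "compressed", "multicast")

# Hypothetical sample in the /proc/net/dev layout (values are made up).
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0: 5000000    4000    0   12    7     0          0        25  3000000    3500    0    0    0     0       0          0
"""


def parse_proc_net_dev(text):
    """Return {interface: {rx_field: int}} for the receive half of each row."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no "name:" prefix
        name, _, rest = line.partition(":")
        values = rest.split()
        if len(values) < len(RX_FIELDS):
            continue  # malformed or truncated row
        rx_values = [int(v) for v in values[:len(RX_FIELDS)]]
        stats[name.strip()] = dict(zip(RX_FIELDS, rx_values))
    return stats


if __name__ == "__main__":
    for iface, rx in parse_proc_net_dev(SAMPLE).items():
        print(iface, "drop:", rx["drop"], "fifo:", rx["fifo"])
```

On a live system the same function could be fed the contents of /proc/net/dev instead of `SAMPLE`.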
* Re: [PATCH v3 net-next] net/mlx5e: Report rx_discards_phy via rx_fifo_errors
2024-12-06 9:03 [PATCH v3 net-next] net/mlx5e: Report rx_discards_phy via rx_fifo_errors Yafang Shao
@ 2024-12-08 1:38 ` Jakub Kicinski
2024-12-08 6:01 ` Yafang Shao
0 siblings, 1 reply; 3+ messages in thread
From: Jakub Kicinski @ 2024-12-08 1:38 UTC (permalink / raw)
To: Yafang Shao; +Cc: saeedm, tariqt, leon, gal, netdev, linux-rdma, Tariq Toukan
On Fri, 6 Dec 2024 17:03:28 +0800 Yafang Shao wrote:
> We observed a high number of rx_discards_phy events on some servers when
> running `ethtool -S`. However, this important counter is not currently
> reflected in the /proc/net/dev statistics file, making it challenging to
> monitor effectively.
>
> Since rx_fifo_errors represents receive FIFO errors on this network
> device, it makes sense to include rx_discards_phy in this counter to
> enhance monitoring visibility. This change will help administrators track
> these events more effectively through standard interfaces.
It's not a standard if there is no definition applicable across vendors.
Count it as generic rx_dropped. If you disagree with me please carry
this tag on future versions:
Nacked-by: Jakub Kicinski <kuba@kernel.org>
--
pw-bot: cr
* Re: [PATCH v3 net-next] net/mlx5e: Report rx_discards_phy via rx_fifo_errors
2024-12-08 1:38 ` Jakub Kicinski
@ 2024-12-08 6:01 ` Yafang Shao
0 siblings, 0 replies; 3+ messages in thread
From: Yafang Shao @ 2024-12-08 6:01 UTC (permalink / raw)
To: Jakub Kicinski
Cc: saeedm, tariqt, leon, gal, netdev, linux-rdma, Tariq Toukan
On Sun, Dec 8, 2024 at 9:38 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 6 Dec 2024 17:03:28 +0800 Yafang Shao wrote:
> > We observed a high number of rx_discards_phy events on some servers when
> > running `ethtool -S`. However, this important counter is not currently
> > reflected in the /proc/net/dev statistics file, making it challenging to
> > monitor effectively.
> >
> > Since rx_fifo_errors represents receive FIFO errors on this network
> > deivice, it makes sense to include rx_discards_phy in this counter to
> > enhance monitoring visibility. This change will help administrators track
> > these events more effectively through standard interfaces.
>
> It's not a standard if there is no definition applicable across vendors.
> Count it as generic rx_dropped.
Thank you for your suggestion. I'm okay with counting it as generic
rx_dropped as long as we have a metric to track it.
I will send a new version.
--
Regards
Yafang
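
Editorial sketch (not from the thread): since the discussion ends with the drops being counted under the generic rx_dropped, a monitoring script could read that counter from the per-interface sysfs statistics directory. The function name `read_rx_counters` and its parameters are hypothetical; the `sysfs_root` argument exists only so the function can be pointed at a test directory, and which counters a given driver actually populates is an assumption to verify on the target system.

```python
from pathlib import Path


def read_rx_counters(iface,
                     names=("rx_dropped", "rx_fifo_errors"),
                     sysfs_root="/sys/class/net"):
    """Read selected counters from <sysfs_root>/<iface>/statistics/.

    Each counter is a small text file holding one decimal integer.
    """
    stats_dir = Path(sysfs_root) / iface / "statistics"
    return {name: int((stats_dir / name).read_text().strip())
            for name in names}
```

For example, `read_rx_counters("eth0")` would return a dict with the two counters for an interface named eth0, assuming that interface exists; sampling it periodically and comparing successive values gives a drop rate.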