* [net-next v5 0/2] mlx5: Add netdev-genl queue stats
@ 2024-06-12 20:08 Joe Damato
From: Joe Damato @ 2024-06-12 20:08 UTC
To: netdev, linux-kernel
Cc: nalramli, Joe Damato, Carolina Jubran, David S. Miller,
Dragos Tatulea, Eric Dumazet, Gal Pressman, Jakub Kicinski,
Leon Romanovsky, open list:MELLANOX MLX5 core VPI driver,
Naveen Mamindlapalli, Paolo Abeni, Richard Cochran,
Saeed Mahameed, Tariq Toukan
Greetings:
Welcome to v5.
Switched from RFC to just a v5, because I think this is pretty close.
Minor changes from v4 summarized below in the changelog.
Note that my NIC does not seem to support PTP and I couldn't get the
mlnx-tools mlnx_qos script to work, so I was only able to test the
following cases:
- device up at boot
- adjusting queue counts
- device down (e.g. ip link set dev eth4 down)
Please see the commit message of patch 2/2 for more details on output
and test cases.
rfcv4 thread:
https://lore.kernel.org/linux-kernel/20240604004629.299699-1-jdamato@fastly.com/T/
Thanks,
Joe
rfcv4 -> v5:
- Patch 1/2: change variable name 'mlx5e_qid' to 'txq_ix'.
- Patch 2/2:
  - remove the PTP logic from mlx5e_get_queue_stats_rx; PTP RX stats
    are always reported in base.
  - report PTP TX in mlx5e_get_base_stats only if:
    - PTP has ever been opened, and
    - either PTP is NULL (closed) or the MLX5E_PTP_STATE_TX bit in its
      state is not set.
    Otherwise, PTP TX is reported when the txq_ix is passed into
    mlx5e_get_queue_stats_tx. (The condition is condensed just below.)
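
For reference, the gating condition above, condensed from the
mlx5e_get_base_stats hunk in patch 2/2 below:

	ptp_channel = priv->channels.ptp;
	if (priv->tx_ptp_opened &&
	    (!ptp_channel || !test_bit(MLX5E_PTP_STATE_TX, ptp_channel->state))) {
		/* PTP TX queues currently have no txq index, so their
		 * sq stats (priv->ptp_stats.sq[tc]) are folded into base.
		 */
	}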
rfcv3 -> rfcv4:
- Patch 1/2 now creates a mapping (priv->txq2sq_stats) which maps txq
  indices to sq_stats structures so stats can be accessed directly.
  This mapping is kept up to date alongside txq2sq.
- Patch 2/2:
  - All mutex_lock/unlock on state_lock has been dropped.
  - mlx5e_get_queue_stats_rx now uses ASSERT_RTNL() and has a special
    case for PTP. If PTP was ever opened, is currently open, and the
    channel index matches, PTP RX stats are output.
  - mlx5e_get_queue_stats_tx rewritten to use priv->txq2sq_stats. No
    corner cases are needed here because any txq idx (passed in as i)
    will have an up-to-date mapping in priv->txq2sq_stats.
  - mlx5e_get_base_stats:
    - in the RX case:
      - iterates over [params.num_channels, stats_nch) collecting
        stats.
      - if ptp was ever opened but is currently closed, add the PTP
        stats.
    - in the TX case:
      - handle 2 cases:
        - the channel is available, so sum only the unavailable TCs
          [mlx5e_get_dcb_num_tc, max_opened_tc).
        - the channel is unavailable, so sum all TCs
          [0, max_opened_tc).
        (See the condensed excerpt after this changelog block.)
      - if ptp was ever opened but is currently closed, add the PTP
        sq stats.
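
The TX summation above is easiest to see in code; a condensed excerpt
of the mlx5e_get_base_stats loop from patch 2/2 below:

	for (i = 0; i < priv->stats_nch; i++) {
		struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];

		/* available channels: start past the TCs already reported
		 * by mlx5e_get_queue_stats_tx; unavailable channels: sum
		 * every TC that was ever opened.
		 */
		if (i < priv->channels.params.num_channels)
			tc = mlx5e_get_dcb_num_tc(&priv->channels.params);
		else
			tc = 0;

		for (; tc < priv->max_opened_tc; tc++) {
			tx->packets += channel_stats->sq[tc].packets;
			tx->bytes += channel_stats->sq[tc].bytes;
		}
	}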
v2 -> rfcv3:
- Added patch 1/2 which creates some helpers for computing the txq_ix
  and ch_ix/tc_ix.
- Patch 2/2 modified in several ways:
  - Fixed variable declarations in mlx5e_get_queue_stats_rx to be at
    the start of the function.
  - mlx5e_get_queue_stats_tx rewritten to access sq stats directly by
    using the helpers added in the previous patch.
  - mlx5e_get_base_stats modified in several ways:
    - Took the state_lock when accessing priv->channels.
    - For the base RX stats, code was simplified to call
      mlx5e_get_queue_stats_rx instead of repeating the same code.
    - For the base TX stats, I attempted to implement what I think
      Tariq suggested in the previous thread:
      - for available channels, only unavailable TC stats are summed
      - for unavailable channels, all stats for TCs up to
        max_opened_tc are summed.
v1 -> v2:
- Essentially a full rewrite after comments from Jakub, Tariq, and
Zhu.
Joe Damato (2):
net/mlx5e: Add txq to sq stats mapping
net/mlx5e: Add per queue netdev-genl stats
drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 +
.../net/ethernet/mellanox/mlx5/core/en/qos.c | 13 +-
.../net/ethernet/mellanox/mlx5/core/en_main.c | 143 +++++++++++++++++-
3 files changed, 155 insertions(+), 3 deletions(-)
--
2.25.1

* [net-next v5 1/2] net/mlx5e: Add txq to sq stats mapping
From: Joe Damato @ 2024-06-12 20:08 UTC
To: netdev, linux-kernel
Cc: nalramli, Joe Damato, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
    David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
    Gal Pressman, Carolina Jubran, Naveen Mamindlapalli,
    open list:MELLANOX MLX5 core VPI driver

mlx5 currently maps txqs to an sq via priv->txq2sq. It is useful to map
txqs to sq_stats, as well, for direct access to stats.

Add priv->txq2sq_stats and insert mappings. The mappings will be used
next to tabulate stats information.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |  2 ++
 drivers/net/ethernet/mellanox/mlx5/core/en/qos.c  | 13 +++++++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 11 ++++++++++-
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index bec784d25d7b..6a343a8f162f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -867,6 +867,8 @@ struct mlx5e_priv {
 	/* priv data path fields - start */
 	struct mlx5e_selq selq;
 	struct mlx5e_txqsq **txq2sq;
+	struct mlx5e_sq_stats **txq2sq_stats;
+
 #ifdef CONFIG_MLX5_CORE_EN_DCB
 	struct mlx5e_dcbx_dp dcbx_dp;
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
index 6743806b8480..f0744a45db92 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
@@ -170,6 +170,7 @@ int mlx5e_activate_qos_sq(void *data, u16 node_qid, u32 hw_id)
 	mlx5e_tx_disable_queue(netdev_get_tx_queue(priv->netdev, qid));

 	priv->txq2sq[qid] = sq;
+	priv->txq2sq_stats[qid] = sq->stats;

 	/* Make the change to txq2sq visible before the queue is started.
 	 * As mlx5e_xmit runs under a spinlock, there is an implicit ACQUIRE,
@@ -186,6 +187,7 @@ int mlx5e_activate_qos_sq(void *data, u16 node_qid, u32 hw_id)
 void mlx5e_deactivate_qos_sq(struct mlx5e_priv *priv, u16 qid)
 {
 	struct mlx5e_txqsq *sq;
+	u16 txq_ix;

 	sq = mlx5e_get_qos_sq(priv, qid);
 	if (!sq) /* Handle the case when the SQ failed to open. */
@@ -194,7 +196,10 @@ void mlx5e_deactivate_qos_sq(struct mlx5e_priv *priv, u16 qid)
 	qos_dbg(sq->mdev, "Deactivate QoS SQ qid %u\n", qid);
 	mlx5e_deactivate_txqsq(sq);

-	priv->txq2sq[mlx5e_qid_from_qos(&priv->channels, qid)] = NULL;
+	txq_ix = mlx5e_qid_from_qos(&priv->channels, qid);
+
+	priv->txq2sq[txq_ix] = NULL;
+	priv->txq2sq_stats[txq_ix] = NULL;

 	/* Make the change to txq2sq visible before the queue is started again.
 	 * As mlx5e_xmit runs under a spinlock, there is an implicit ACQUIRE,
@@ -325,6 +330,7 @@ void mlx5e_qos_deactivate_queues(struct mlx5e_channel *c)
 {
 	struct mlx5e_params *params = &c->priv->channels.params;
 	struct mlx5e_txqsq __rcu **qos_sqs;
+	u16 txq_ix;
 	int i;

 	qos_sqs = mlx5e_state_dereference(c->priv, c->qos_sqs);
@@ -342,8 +348,11 @@ void mlx5e_qos_deactivate_queues(struct mlx5e_channel *c)
 		qos_dbg(c->mdev, "Deactivate QoS SQ qid %u\n", qid);
 		mlx5e_deactivate_txqsq(sq);

+		txq_ix = mlx5e_qid_from_qos(&c->priv->channels, qid);
+
 		/* The queue is disabled, no synchronization with datapath is needed. */
-		c->priv->txq2sq[mlx5e_qid_from_qos(&c->priv->channels, qid)] = NULL;
+		c->priv->txq2sq[txq_ix] = NULL;
+		c->priv->txq2sq_stats[txq_ix] = NULL;
 	}
 }

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 44a64d062e42..c548e2fdc58f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3125,6 +3125,7 @@ static void mlx5e_build_txq_maps(struct mlx5e_priv *priv)
 			struct mlx5e_txqsq *sq = &c->sq[tc];

 			priv->txq2sq[sq->txq_ix] = sq;
+			priv->txq2sq_stats[sq->txq_ix] = sq->stats;
 		}
 	}

@@ -3139,6 +3140,7 @@ static void mlx5e_build_txq_maps(struct mlx5e_priv *priv)
 		struct mlx5e_txqsq *sq = &c->ptpsq[tc].txqsq;

 		priv->txq2sq[sq->txq_ix] = sq;
+		priv->txq2sq_stats[sq->txq_ix] = sq->stats;
 	}

 out:
@@ -5849,9 +5851,13 @@ int mlx5e_priv_init(struct mlx5e_priv *priv,
 	if (!priv->txq2sq)
 		goto err_destroy_workqueue;

+	priv->txq2sq_stats = kcalloc_node(num_txqs, sizeof(*priv->txq2sq_stats), GFP_KERNEL, node);
+	if (!priv->txq2sq_stats)
+		goto err_free_txq2sq;
+
 	priv->tx_rates = kcalloc_node(num_txqs, sizeof(*priv->tx_rates), GFP_KERNEL, node);
 	if (!priv->tx_rates)
-		goto err_free_txq2sq;
+		goto err_free_txq2sq_stats;

 	priv->channel_stats = kcalloc_node(nch, sizeof(*priv->channel_stats), GFP_KERNEL, node);
@@ -5862,6 +5868,8 @@ int mlx5e_priv_init(struct mlx5e_priv *priv,

 err_free_tx_rates:
 	kfree(priv->tx_rates);
+err_free_txq2sq_stats:
+	kfree(priv->txq2sq_stats);
 err_free_txq2sq:
 	kfree(priv->txq2sq);
 err_destroy_workqueue:
@@ -5885,6 +5893,7 @@ void mlx5e_priv_cleanup(struct mlx5e_priv *priv)
 		kvfree(priv->channel_stats[i]);
 	kfree(priv->channel_stats);
 	kfree(priv->tx_rates);
+	kfree(priv->txq2sq_stats);
 	kfree(priv->txq2sq);
 	destroy_workqueue(priv->wq);
 	mlx5e_selq_cleanup(&priv->selq);
--
2.25.1
* Re: [net-next v5 1/2] net/mlx5e: Add txq to sq stats mapping
From: Tariq Toukan @ 2024-06-17 5:15 UTC
To: Joe Damato, netdev, linux-kernel
Cc: nalramli, Saeed Mahameed, Leon Romanovsky, David S. Miller,
    Eric Dumazet, Jakub Kicinski, Paolo Abeni, Gal Pressman,
    Carolina Jubran, Naveen Mamindlapalli,
    open list:MELLANOX MLX5 core VPI driver

On 12/06/2024 23:08, Joe Damato wrote:
> mlx5 currently maps txqs to an sq via priv->txq2sq. It is useful to map
> txqs to sq_stats, as well, for direct access to stats.
>
> Add priv->txq2sq_stats and insert mappings. The mappings will be used
> next to tabulate stats information.
>
> Signed-off-by: Joe Damato <jdamato@fastly.com>

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
* [net-next v5 2/2] net/mlx5e: Add per queue netdev-genl stats
From: Joe Damato @ 2024-06-12 20:08 UTC
To: netdev, linux-kernel
Cc: nalramli, Joe Damato, Saeed Mahameed, Tariq Toukan, Leon Romanovsky,
    David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
    Richard Cochran, open list:MELLANOX MLX5 core VPI driver

./cli.py --spec netlink/specs/netdev.yaml \
         --dump qstats-get --json '{"scope": "queue"}'

...snip

 {'ifindex': 7,
  'queue-id': 62,
  'queue-type': 'rx',
  'rx-alloc-fail': 0,
  'rx-bytes': 105965251,
  'rx-packets': 179790},
 {'ifindex': 7,
  'queue-id': 0,
  'queue-type': 'tx',
  'tx-bytes': 9402665,
  'tx-packets': 17551},

...snip

Also tested with the script tools/testing/selftests/drivers/net/stats.py
in several scenarios to ensure stats tallying was correct:

- on boot (default queue counts)
- adjusting queue count up or down (ethtool -L eth0 combined ...)

The tools/testing/selftests/drivers/net/stats.py brings the device up,
so to test with the device down, I did the following:

$ ip link show eth4
7: eth4: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN [..snip..]
[..snip..]

$ cat /proc/net/dev | grep eth4
eth4: 235710489 434811 [..snip rx..] 2878744 21227 [..snip tx..]

$ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
           --dump qstats-get --json '{"ifindex": 7}'
[{'ifindex': 7,
  'rx-alloc-fail': 0,
  'rx-bytes': 235710489,
  'rx-packets': 434811,
  'tx-bytes': 2878744,
  'tx-packets': 21227}]

Note that the values in /proc/net/dev match the output of cli for the
same device, even while the device is down.

Note that while the device is down, per queue stats output nothing
(because the device is down, there are no queues):

$ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
           --dump qstats-get --json '{"scope": "queue", "ifindex": 7}'
[]

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 132 ++++++++++++++++++
 1 file changed, 132 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index c548e2fdc58f..d3f38b4b18eb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -39,6 +39,7 @@
 #include <linux/debugfs.h>
 #include <linux/if_bridge.h>
 #include <linux/filter.h>
+#include <net/netdev_queues.h>
 #include <net/page_pool/types.h>
 #include <net/pkt_sched.h>
 #include <net/xdp_sock_drv.h>
@@ -5299,6 +5300,136 @@ static bool mlx5e_tunnel_any_tx_proto_supported(struct mlx5_core_dev *mdev)
 	return (mlx5_vxlan_allowed(mdev->vxlan) || mlx5_geneve_tx_allowed(mdev));
 }

+static void mlx5e_get_queue_stats_rx(struct net_device *dev, int i,
+				     struct netdev_queue_stats_rx *stats)
+{
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct mlx5e_channel_stats *channel_stats;
+	struct mlx5e_rq_stats *xskrq_stats;
+	struct mlx5e_rq_stats *rq_stats;
+
+	ASSERT_RTNL();
+	if (mlx5e_is_uplink_rep(priv))
+		return;
+
+	channel_stats = priv->channel_stats[i];
+	xskrq_stats = &channel_stats->xskrq;
+	rq_stats = &channel_stats->rq;
+
+	stats->packets = rq_stats->packets + xskrq_stats->packets;
+	stats->bytes = rq_stats->bytes + xskrq_stats->bytes;
+	stats->alloc_fail = rq_stats->buff_alloc_err +
+			    xskrq_stats->buff_alloc_err;
+}
+
+static void mlx5e_get_queue_stats_tx(struct net_device *dev, int i,
+				     struct netdev_queue_stats_tx *stats)
+{
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct mlx5e_sq_stats *sq_stats;
+
+	ASSERT_RTNL();
+	/* no special case needed for ptp htb etc since txq2sq_stats is kept up
+	 * to date for active sq_stats, otherwise get_base_stats takes care of
+	 * inactive sqs.
+	 */
+	sq_stats = priv->txq2sq_stats[i];
+	stats->packets = sq_stats->packets;
+	stats->bytes = sq_stats->bytes;
+}
+
+static void mlx5e_get_base_stats(struct net_device *dev,
+				 struct netdev_queue_stats_rx *rx,
+				 struct netdev_queue_stats_tx *tx)
+{
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct mlx5e_ptp *ptp_channel;
+	int i, tc;
+
+	ASSERT_RTNL();
+	if (!mlx5e_is_uplink_rep(priv)) {
+		rx->packets = 0;
+		rx->bytes = 0;
+		rx->alloc_fail = 0;
+
+		for (i = priv->channels.params.num_channels; i < priv->stats_nch; i++) {
+			struct netdev_queue_stats_rx rx_i = {0};
+
+			mlx5e_get_queue_stats_rx(dev, i, &rx_i);
+
+			rx->packets += rx_i.packets;
+			rx->bytes += rx_i.bytes;
+			rx->alloc_fail += rx_i.alloc_fail;
+		}
+
+		/* always report PTP RX stats from base as there is no
+		 * corresponding channel to report them under in
+		 * mlx5e_get_queue_stats_rx.
+		 */
+		if (priv->rx_ptp_opened) {
+			struct mlx5e_rq_stats *rq_stats = &priv->ptp_stats.rq;
+
+			rx->packets += rq_stats->packets;
+			rx->bytes += rq_stats->bytes;
+		}
+	}
+
+	tx->packets = 0;
+	tx->bytes = 0;
+
+	for (i = 0; i < priv->stats_nch; i++) {
+		struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
+
+		/* handle two cases:
+		 *
+		 * 1. channels which are active. In this case,
+		 * report only deactivated TCs on these channels.
+		 *
+		 * 2. channels which were deactivated
+		 * (i > priv->channels.params.num_channels)
+		 * must have all of their TCs [0 .. priv->max_opened_tc)
+		 * examined because deactivated channels will not be in the
+		 * range of [0..real_num_tx_queues) and will not have their
+		 * stats reported by mlx5e_get_queue_stats_tx.
+		 */
+		if (i < priv->channels.params.num_channels)
+			tc = mlx5e_get_dcb_num_tc(&priv->channels.params);
+		else
+			tc = 0;

+		for (; tc < priv->max_opened_tc; tc++) {
+			struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[tc];
+
+			tx->packets += sq_stats->packets;
+			tx->bytes += sq_stats->bytes;
+		}
+	}
+
+	/* if PTP TX was opened at some point and has since either:
+	 * - been shutdown and set to NULL, or
+	 * - simply disabled (bit unset)
+	 *
+	 * report stats directly from the ptp_stats structures as these queues
+	 * are now unavailable and there is no txq index to retrieve these
+	 * stats via calls to mlx5e_get_queue_stats_tx.
+	 */
+	ptp_channel = priv->channels.ptp;
+	if (priv->tx_ptp_opened && (!ptp_channel || !test_bit(MLX5E_PTP_STATE_TX, ptp_channel->state))) {
+		for (tc = 0; tc < priv->max_opened_tc; tc++) {
+			struct mlx5e_sq_stats *sq_stats = &priv->ptp_stats.sq[tc];
+
+			tx->packets += sq_stats->packets;
+			tx->bytes += sq_stats->bytes;
+		}
+	}
+}
+
+static const struct netdev_stat_ops mlx5e_stat_ops = {
+	.get_queue_stats_rx = mlx5e_get_queue_stats_rx,
+	.get_queue_stats_tx = mlx5e_get_queue_stats_tx,
+	.get_base_stats = mlx5e_get_base_stats,
+};
+
 static void mlx5e_build_nic_netdev(struct net_device *netdev)
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
@@ -5316,6 +5447,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)

 	netdev->watchdog_timeo = 15 * HZ;

+	netdev->stat_ops = &mlx5e_stat_ops;
 	netdev->ethtool_ops = &mlx5e_ethtool_ops;

 	netdev->vlan_features |= NETIF_F_SG;
--
2.25.1
* Re: [net-next v5 2/2] net/mlx5e: Add per queue netdev-genl stats
From: Tariq Toukan @ 2024-06-13 20:25 UTC
To: Joe Damato, netdev, linux-kernel
Cc: nalramli, Saeed Mahameed, Leon Romanovsky, David S. Miller,
    Eric Dumazet, Jakub Kicinski, Paolo Abeni, Richard Cochran,
    open list:MELLANOX MLX5 core VPI driver, Tariq Toukan

On 12/06/2024 23:08, Joe Damato wrote:
> ./cli.py --spec netlink/specs/netdev.yaml \
>          --dump qstats-get --json '{"scope": "queue"}'
>
> [...]
>
> Note that while the device is down, per queue stats output nothing
> (because the device is down, there are no queues):

Yeah, the query doesn't reach the device driver...

> $ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
>            --dump qstats-get --json '{"scope": "queue", "ifindex": 7}'
> []
>
> Signed-off-by: Joe Damato <jdamato@fastly.com>
> [...]
> +static void mlx5e_get_base_stats(struct net_device *dev,
> +				 struct netdev_queue_stats_rx *rx,
> +				 struct netdev_queue_stats_tx *tx)
> +{
> +	struct mlx5e_priv *priv = netdev_priv(dev);
> +	struct mlx5e_ptp *ptp_channel;
> +	int i, tc;
> +
> +	ASSERT_RTNL();
> +	if (!mlx5e_is_uplink_rep(priv)) {
> +		rx->packets = 0;
> +		rx->bytes = 0;
> +		rx->alloc_fail = 0;
> +
> +		for (i = priv->channels.params.num_channels; i < priv->stats_nch; i++) {

IIUC, per the current kernel implementation, the lower parts won't be
completed in a loop over [0..real_num_rx_queues-1], as that loop is
conditional, happening only if the queues are active.

I would like the kernel to drop that condition, and stop forcing the
device driver to conditionally include this part in the base.

Otherwise, the lower parts need to be added here.

> [...]
> +		if (i < priv->channels.params.num_channels)
> +			tc = mlx5e_get_dcb_num_tc(&priv->channels.params);
> +		else
> +			tc = 0;
> +
> +		for (; tc < priv->max_opened_tc; tc++) {
> +			struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[tc];
> +
> +			tx->packets += sq_stats->packets;
> +			tx->bytes += sq_stats->bytes;
> +		}

Again, what about the lower part in case queues are not active?

> [...]
* Re: [net-next v5 2/2] net/mlx5e: Add per queue netdev-genl stats
From: Jakub Kicinski @ 2024-06-13 21:58 UTC
To: Tariq Toukan
Cc: Joe Damato, netdev, linux-kernel, nalramli, Saeed Mahameed,
    Leon Romanovsky, David S. Miller, Eric Dumazet, Paolo Abeni,
    Richard Cochran, open list:MELLANOX MLX5 core VPI driver, Tariq Toukan

On Thu, 13 Jun 2024 23:25:12 +0300 Tariq Toukan wrote:
> > +		for (i = priv->channels.params.num_channels; i < priv->stats_nch; i++) {
>
> IIUC, per the current kernel implementation, the lower parts won't be
> completed in a loop over [0..real_num_rx_queues-1], as that loop is
> conditional, happening only if the queues are active.

Could you rephrase this? Is priv->channels.params.num_channels
non-zero also when device is closed? I'm just guessing from
the code, TBH, I can't parse your reply :(

> I would like the kernel to drop that condition, and stop forcing the
> device driver to conditionally include this part in the base.
>
> Otherwise, the lower parts need to be added here.

You'd need a stronger (and clearly explained) argument to change
the core. If you're saying that the kernel should be able to dump
queue stats for disabled queues - that seems rather questionable.
* Re: [net-next v5 2/2] net/mlx5e: Add per queue netdev-genl stats
From: Joe Damato @ 2024-06-13 22:12 UTC
To: Jakub Kicinski
Cc: Tariq Toukan, netdev, linux-kernel, nalramli, Saeed Mahameed,
    Leon Romanovsky, David S. Miller, Eric Dumazet, Paolo Abeni,
    Richard Cochran, open list:MELLANOX MLX5 core VPI driver, Tariq Toukan

On Thu, Jun 13, 2024 at 02:58:17PM -0700, Jakub Kicinski wrote:
> On Thu, 13 Jun 2024 23:25:12 +0300 Tariq Toukan wrote:
> > > +	for (i = priv->channels.params.num_channels; i < priv->stats_nch; i++) {
> >
> > IIUC, per the current kernel implementation, the lower parts won't be
> > completed in a loop over [0..real_num_rx_queues-1], as that loop is
> > conditional, happening only if the queues are active.
>
> Could you rephrase this? Is priv->channels.params.num_channels
> non-zero also when device is closed? I'm just guessing from
> the code, TBH, I can't parse your reply :(

I don't mean to speak for Tariq (so my apologies Tariq), but I
suspect it may not be clear in which cases IFF_UP is checked and in
which cases get_base is called.

I tried to clear it up in my longer response with examples from
code.

If you have a moment could you take a look and let me know if I've
gotten it wrong in my explanation/walk through?
* Re: [net-next v5 2/2] net/mlx5e: Add per queue netdev-genl stats
From: Jakub Kicinski @ 2024-06-13 22:36 UTC
To: Joe Damato
Cc: Tariq Toukan, netdev, linux-kernel, nalramli, Saeed Mahameed,
    Leon Romanovsky, David S. Miller, Eric Dumazet, Paolo Abeni,
    Richard Cochran, open list:MELLANOX MLX5 core VPI driver, Tariq Toukan

On Thu, 13 Jun 2024 15:12:00 -0700 Joe Damato wrote:
> If you have a moment could you take a look and let me know if I've
> gotten it wrong in my explanation/walk through?

No lies detected :)
* Re: [net-next v5 2/2] net/mlx5e: Add per queue netdev-genl stats
From: Tariq Toukan @ 2024-06-17 8:30 UTC
To: Joe Damato, Jakub Kicinski, netdev, linux-kernel, nalramli,
    Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
    Paolo Abeni, Richard Cochran,
    open list:MELLANOX MLX5 core VPI driver, Tariq Toukan

On 14/06/2024 1:12, Joe Damato wrote:
> On Thu, Jun 13, 2024 at 02:58:17PM -0700, Jakub Kicinski wrote:
> > On Thu, 13 Jun 2024 23:25:12 +0300 Tariq Toukan wrote:
> > > > +	for (i = priv->channels.params.num_channels; i < priv->stats_nch; i++) {
> > >
> > > IIUC, per the current kernel implementation, the lower parts won't be
> > > completed in a loop over [0..real_num_rx_queues-1], as that loop is
> > > conditional, happening only if the queues are active.
> >
> > Could you rephrase this? Is priv->channels.params.num_channels
> > non-zero also when device is closed? I'm just guessing from
> > the code, TBH, I can't parse your reply :(
>
> I don't mean to speak for Tariq (so my apologies Tariq), but I
> suspect it may not be clear in which cases IFF_UP is checked and in
> which cases get_base is called.

Exactly.

> I tried to clear it up in my longer response with examples from
> code.
>
> If you have a moment could you take a look and let me know if I've
> gotten it wrong in my explanation/walk through?

Thanks for the detailed explanation.
* Re: [net-next v5 2/2] net/mlx5e: Add per queue netdev-genl stats
From: Joe Damato @ 2024-06-13 22:06 UTC
To: Tariq Toukan
Cc: netdev, linux-kernel, nalramli, Saeed Mahameed, Leon Romanovsky,
    David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
    Richard Cochran, open list:MELLANOX MLX5 core VPI driver, Tariq Toukan

On Thu, Jun 13, 2024 at 11:25:12PM +0300, Tariq Toukan wrote:
> On 12/06/2024 23:08, Joe Damato wrote:
> > [...]
> > Note that while the device is down, per queue stats output nothing
> > (because the device is down, there are no queues):
>
> Yeah, the query doesn't reach the device driver...

Yes. Are you suggesting that I update the commit message? I can do
so, if you think that is needed?

> > [...]
> > +	for (i = priv->channels.params.num_channels; i < priv->stats_nch; i++) {
>
> IIUC, per the current kernel implementation, the lower parts won't be
> completed in a loop over [0..real_num_rx_queues-1], as that loop is
> conditional, happening only if the queues are active.

Sorry, but I'm probably missing something -- you said:

> as that loop is conditional, happening only if the queues are active.

I don't think so? Please continue reading for an example with code.

Let me clarify one thing, please? When you say "the lower parts"
here you mean [0...priv->channels.params.num_channels], is that
right?

If yes, I don't understand why the code in this v5 is wrong. It looks
correct to me, if my understanding of "lower parts" is right.

Here's an example:

1. Machine boots with 32 queues by default.
2. User runs ethtool -L eth0 combined 4

From mlx5/core/en_ethtool.c, mlx5e_ethtool_set_channels:

  new_params = *cur_params;
  new_params.num_channels = count;

So, priv->channels.params.num_channels = 4, [0...4) are the active
queues.

The above loop in mlx5e_get_base_stats sums [4...32), which were
previously active but have since been deactivated by a call to
ethtool:

  for (i = priv->channels.params.num_channels; i < priv->stats_nch; i++)

The (snipped) code for netdev-genl, net/core/netdev-genl.c
netdev_nl_stats_by_netdev (which does NOT check IFF_UP) does this:

  /* ... */
  ops->get_base_stats(netdev, &rx_sum, &tx_sum);

  /* ... */
  for (i = 0; i < netdev->real_num_rx_queues; i++) {
  	memset(&rx, 0xff, sizeof(rx));
  	if (ops->get_queue_stats_rx)
  		ops->get_queue_stats_rx(netdev, i, &rx);
  	netdev_nl_stats_add(&rx_sum, &rx, sizeof(rx));
  }

  /* ... */
  ... same for netdev->real_num_tx_queues

The above code gets the base stats (which in my example is [4..32))
and then gets the stats for the active RX (and if you continue
reading, TX) based on real_num_rx_queues and real_num_tx_queues (which
would be [0..4)).

This is why in the commit message, my example:

  $ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
             --dump qstats-get --json '{"ifindex": 7}'

The numbers match /proc/net/dev even when the device is down because
all queues active and deactivated are summed properly.

Do you agree with me so far?

The other case is the per-queue case, which is expressed like this
(note the different "scope"):

  ./cli.py --spec netlink/specs/netdev.yaml \
           --dump qstats-get --json '{"scope": "queue"}'

In this case the user is querying stats on a per queue basis, not
overall across the device.

In this case:

1. If the device is marked as !IFF_UP (down), an empty set is returned.
2. Otherwise, as seen in netdev_nl_stats_by_queue (from
   net/core/netdev-genl.c):

  while (ops->get_queue_stats_rx && i < netdev->real_num_rx_queues) {
  	err = netdev_nl_stats_queue(netdev, rsp, NETDEV_QUEUE_TYPE_RX, i, info);
  	/* ... */

  /* the same for real_num_tx_queues */

And so the individual stats for the active queues are returned (as
shown in the commit message example).

If you disagree, can you please provide a detailed example so that I
can understand where I am going wrong?

> I would like the kernel to drop that condition, and stop forcing the
> device driver to conditionally include this part in the base.

I personally don't think the condition should be dropped, but this is
a question for the implementor, who I believe is Jakub.

CC: Jakub on Tariq's request/question above.

> Otherwise, the lower parts need to be added here.

My understanding is that get_base is only called for the summary stats
for the entire device, not the per-queue stats, so I don't think the
"lower parts" (which I take to mean
[0...priv->channels.params.num_channels)) need to be added here.

The per-queue stats are only called for a specific queue number that
is valid and will be returned by the other functions, not base.

Of course, I could be wrong and would appreciate insight from Jakub on
this, if possible.

> > [...]
> > +		for (; tc < priv->max_opened_tc; tc++) {
> > +			struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[tc];
> > +
> > +			tx->packets += sq_stats->packets;
> > +			tx->bytes += sq_stats->bytes;
> > +		}
>
> Again, what about the lower part in case queues are not active?

I am not trying to be difficult here; I appreciate your time and
energy, but I think there is still some misunderstanding here.

Probably on my side ;)

But, if the queues are not active then any queue above
priv->channels.params.num_channels (the non active queues) will have
all TCs summed.
* Re: [net-next v5 2/2] net/mlx5e: Add per queue netdev-genl stats
From: Tariq Toukan @ 2024-06-17 5:17 UTC
To: Joe Damato, Tariq Toukan, netdev, linux-kernel, nalramli,
    Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
    Jakub Kicinski, Paolo Abeni, Richard Cochran,
    open list:MELLANOX MLX5 core VPI driver

On 14/06/2024 1:06, Joe Damato wrote:
> On Thu, Jun 13, 2024 at 11:25:12PM +0300, Tariq Toukan wrote:
> > [...]
> > Yeah, the query doesn't reach the device driver...
>
> Yes. Are you suggesting that I update the commit message? I can do
> so, if you think that is needed?

No, that's fine.

> [...]
> Let me clarify one thing, please? When you say "the lower parts"
> here you mean [0...priv->channels.params.num_channels], is that
> right?

Right.

> [...]
> The (snipped) code for netdev-genl, net/core/netdev-genl.c
> netdev_nl_stats_by_netdev (which does NOT check IFF_UP) does this:

It does NOT check IFF_UP.. Now things make sense.

> [...]
> Do you agree with me so far?

Yep.

> [...]
> In this case:
>
> 1. If the device is marked as !IFF_UP (down), an empty set is returned.

Ok, so here is the main difference in comparison to the previous case.
The kernel does the check and doesn't propagate the query to the
device driver.

I think the kernel can trust the device driver, but ok, that's a
different discussion.

Thanks for the detailed walkthrough.

> [...]
> My understanding is that get_base is only called for the summary stats
> for the entire device, not the per-queue stats, so I don't think the
> "lower parts" (which I take to mean
> [0...priv->channels.params.num_channels)) need to be added here.

Right.

> [...]

Thanks for your contribution.

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
* Re: [net-next v5 0/2] mlx5: Add netdev-genl queue stats
From: patchwork-bot+netdevbpf @ 2024-06-17 10:40 UTC
To: Joe Damato
Cc: netdev, linux-kernel, nalramli, cjubran, davem, dtatulea, edumazet,
    gal, kuba, leon, linux-rdma, naveenm, pabeni, richardcochran,
    saeedm, tariqt

Hello:

This series was applied to netdev/net-next.git (main)
by David S. Miller <davem@davemloft.net>:

On Wed, 12 Jun 2024 20:08:55 +0000 you wrote:
> Greetings:
>
> Welcome to v5.
>
> Switched from RFC to just a v5, because I think this is pretty close.
> Minor changes from v4 summarized below in the changelog.
>
> [...]

Here is the summary with links:
  - [net-next,v5,1/2] net/mlx5e: Add txq to sq stats mapping
    https://git.kernel.org/netdev/net-next/c/0a3e5c1b670f
  - [net-next,v5,2/2] net/mlx5e: Add per queue netdev-genl stats
    https://git.kernel.org/netdev/net-next/c/7b66ae536a78

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html