* Re: [PATCH v2 net-next 21/26] ice: add XDP and XSK generic per-channel statistics
       [not found] ` <20211123163955.154512-22-alexandr.lobakin@intel.com>
@ 2021-11-24  0:52   ` Daniel Borkmann
  2021-11-25 11:56     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Borkmann @ 2021-11-24  0:52 UTC (permalink / raw)
  To: Alexander Lobakin, David S. Miller, Jakub Kicinski
  Cc: Song Liu, Sergey Ryazanov, Michael S. Tsirkin, Vladimir Oltean,
	Alexei Starovoitov, Andrei Vagin, Tony Nguyen, Thomas Petazzoni,
	Ioana Ciornei, Noam Dagan, Leon Romanovsky, Jonathan Corbet,
	linux-rdma, linux-doc, John Fastabend, Russell King,
	Arthur Kiyanovski, Cong Wang, Martin Habets, Lorenzo Bianconi,
	Maciej Fijalkowski, Jesper Dangaard Brouer, Johannes Berg,
	KP Singh, Andrii Nakryiko, Claudiu Manoil, Yonghong Song,
	Shay Agroskin, Marcin Wojtas, David Arinzon, David Ahern,
	Toke Høiland-Jørgensen, virtualization, linux-kernel,
	Edward Cree, Yajun Deng, netdev, Saeed Bishara,
	Michal Swiatkowski, bpf, Saeed Mahameed, Martin KaFai Lau

Hi Alexander,

On 11/23/21 5:39 PM, Alexander Lobakin wrote:
[...]

Just commenting on ice here as one example (similar applies to other drivers):

> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> index 1dd7e84f41f8..7dc287bc3a1a 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> @@ -258,6 +258,8 @@ static void ice_clean_xdp_irq(struct ice_tx_ring *xdp_ring)
>   		xdp_ring->next_dd = ICE_TX_THRESH - 1;
>   	xdp_ring->next_to_clean = ntc;
>   	ice_update_tx_ring_stats(xdp_ring, total_pkts, total_bytes);
> +	xdp_update_tx_drv_stats(&xdp_ring->xdp_stats->xdp_tx, total_pkts,
> +				total_bytes);
>   }
> 
>   /**
> @@ -277,6 +279,7 @@ int ice_xmit_xdp_ring(void *data, u16 size, struct ice_tx_ring *xdp_ring)
>   		ice_clean_xdp_irq(xdp_ring);
> 
>   	if (!unlikely(ICE_DESC_UNUSED(xdp_ring))) {
> +		xdp_update_tx_drv_full(&xdp_ring->xdp_stats->xdp_tx);
>   		xdp_ring->tx_stats.tx_busy++;
>   		return ICE_XDP_CONSUMED;
>   	}
> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
> index ff55cb415b11..62ef47a38d93 100644
> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
> @@ -454,42 +454,58 @@ ice_construct_skb_zc(struct ice_rx_ring *rx_ring, struct xdp_buff **xdp_arr)
>    * @xdp: xdp_buff used as input to the XDP program
>    * @xdp_prog: XDP program to run
>    * @xdp_ring: ring to be used for XDP_TX action
> + * @lrstats: onstack Rx XDP stats
>    *
>    * Returns any of ICE_XDP_{PASS, CONSUMED, TX, REDIR}
>    */
>   static int
>   ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
> -	       struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring)
> +	       struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring,
> +	       struct xdp_rx_drv_stats_local *lrstats)
>   {
>   	int err, result = ICE_XDP_PASS;
>   	u32 act;
> 
> +	lrstats->bytes += xdp->data_end - xdp->data;
> +	lrstats->packets++;
> +
>   	act = bpf_prog_run_xdp(xdp_prog, xdp);
> 
>   	if (likely(act == XDP_REDIRECT)) {
>   		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
> -		if (err)
> +		if (err) {
> +			lrstats->redirect_errors++;
>   			goto out_failure;
> +		}
> +		lrstats->redirect++;
>   		return ICE_XDP_REDIR;
>   	}
> 
>   	switch (act) {
>   	case XDP_PASS:
> +		lrstats->pass++;
>   		break;
>   	case XDP_TX:
>   		result = ice_xmit_xdp_buff(xdp, xdp_ring);
> -		if (result == ICE_XDP_CONSUMED)
> +		if (result == ICE_XDP_CONSUMED) {
> +			lrstats->tx_errors++;
>   			goto out_failure;
> +		}
> +		lrstats->tx++;
>   		break;
>   	default:
>   		bpf_warn_invalid_xdp_action(act);
> -		fallthrough;
> +		lrstats->invalid++;
> +		goto out_failure;
>   	case XDP_ABORTED:
> +		lrstats->aborted++;
>   out_failure:
>   		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
> -		fallthrough;
> +		result = ICE_XDP_CONSUMED;
> +		break;
>   	case XDP_DROP:
>   		result = ICE_XDP_CONSUMED;
> +		lrstats->drop++;
>   		break;
>   	}

Imho, the overall approach is way too bloated. I can see the packets/bytes counters, but now we
have 3 counter updates with return codes included, and then the additional sync of the
on-stack counters into the ring counters via xdp_update_rx_drv_stats(). So we now need
ice_update_rx_ring_stats() as well as xdp_update_rx_drv_stats(), which syncs 10 different
stat counters via u64_stats_add() into the per-ring ones. :/
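
For illustration, assuming that helper is essentially one u64_stats_add() per field and the
ring-side struct mirrors the on-stack one (a sketch of what such a sync boils down to, not
the exact code from the series):

   /* flush the on-stack counters gathered during one NAPI poll into the per-ring ones */
   static void xdp_update_rx_drv_stats(struct xdp_rx_drv_stats *rstats,
                                       const struct xdp_rx_drv_stats_local *l)
   {
       u64_stats_update_begin(&rstats->syncp);
       u64_stats_add(&rstats->packets, l->packets);
       u64_stats_add(&rstats->bytes, l->bytes);
       u64_stats_add(&rstats->pass, l->pass);
       u64_stats_add(&rstats->drop, l->drop);
       u64_stats_add(&rstats->redirect, l->redirect);
       u64_stats_add(&rstats->redirect_errors, l->redirect_errors);
       u64_stats_add(&rstats->tx, l->tx);
       u64_stats_add(&rstats->tx_errors, l->tx_errors);
       u64_stats_add(&rstats->invalid, l->invalid);
       u64_stats_add(&rstats->aborted, l->aborted);
       u64_stats_update_end(&rstats->syncp);
   }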

I'm just taking our XDP L4LB in Cilium as an example: there we already count errors and
export them via per-cpu map that eventually lead to XDP_DROP cases including the /reason/
which caused the XDP_DROP (e.g. Prometheus can then scrape these insights from all the
nodes in the cluster). Given the different action codes are very often application specific,
there's not much debugging that you can do when /only/ looking at `ip link xdpstats` to
gather insight on *why* some of these actions were triggered (e.g. fib lookup failure, etc).
If really of interest, then maybe libxdp could have such per-action counters as opt-in in
its call chain..
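
Roughly along these lines on the BPF side (a minimal sketch with made-up map and reason
names, not our actual L4LB code):

   #include <linux/bpf.h>
   #include <bpf/bpf_helpers.h>

   /* application-specific drop reasons, one counter slot per reason */
   enum { REASON_POLICY, REASON_PARSE_ERR, REASON_MAX };

   struct {
       __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
       __uint(max_entries, REASON_MAX);
       __type(key, __u32);
       __type(value, __u64);
   } drop_reasons SEC(".maps");

   static __always_inline int drop(__u32 reason)
   {
       /* bump the per-cpu counter for this reason, then drop the packet;
        * the map is scraped from userspace (e.g. exported to Prometheus)
        */
       __u64 *cnt = bpf_map_lookup_elem(&drop_reasons, &reason);

       if (cnt)
           (*cnt)++;
       return XDP_DROP;
   }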

In the case of ice_run_xdp() today, we already bump total_rx_bytes/total_rx_pkts under
XDP and call ice_update_rx_ring_stats(). I do see the case for XDP_TX and XDP_REDIRECT
where we run into driver-specific errors that are /outside of the reach/ of the BPF prog.
For example, we've been running into errors from XDP_TX in ice_xmit_xdp_ring() in the
past during testing, and were able to pinpoint the location because xdp_ring->tx_stats.tx_busy
was increasing. These things are useful and would make sense to standardize for the XDP context.

But then it also seems like above in ice_xmit_xdp_ring() we now need to bump counters
twice just for the sake of ethtool vs xdp counters, which sucks a bit; it would be nice to only
have to do it once:

 >   	if (!unlikely(ICE_DESC_UNUSED(xdp_ring))) {
 > +		xdp_update_tx_drv_full(&xdp_ring->xdp_stats->xdp_tx);
 >   		xdp_ring->tx_stats.tx_busy++;
 >   		return ICE_XDP_CONSUMED;
 >   	}

Anyway, just to reiterate: for troubleshooting I do care about anomalous events that
led to drops in the driver, e.g. due to no space in the ring or DMA errors (XDP_TX), or more
detailed insights into xdp_do_redirect() when errors occur (XDP_REDIRECT), and much less
about the action code, given the prog already has the full error context here.

One more comment/question on the last doc update patch (I presume you only have dummy
numbers in there from testing?):

+For some interfaces, standard XDP statistics are available.
+They can be accessed in the same ways, e.g. via `ip`::
+
+  $ ip link xdpstats dev enp178s0
+  16: enp178s0:
+  xdp-channel0-rx_xdp_packets: 0
+  xdp-channel0-rx_xdp_bytes: 1
+  xdp-channel0-rx_xdp_errors: 2

What are the semantics of xdp_errors? A sum of xdp_redirect_errors, xdp_tx_errors and
xdp_xmit_errors? Or is it driver-specific?

+  xdp-channel0-rx_xdp_aborted: 3
+  xdp-channel0-rx_xdp_drop: 4
+  xdp-channel0-rx_xdp_invalid: 5
+  xdp-channel0-rx_xdp_pass: 6
[...]

+  xdp-channel0-rx_xdp_redirect: 7
+  xdp-channel0-rx_xdp_redirect_errors: 8
+  xdp-channel0-rx_xdp_tx: 9
+  xdp-channel0-rx_xdp_tx_errors: 10
+  xdp-channel0-tx_xdp_xmit_packets: 11
+  xdp-channel0-tx_xdp_xmit_bytes: 12
+  xdp-channel0-tx_xdp_xmit_errors: 13
+  xdp-channel0-tx_xdp_xmit_full: 14

From a user PoV, to avoid confusion, it should maybe be made clearer that the latter
refer to XSK.

> @@ -507,6 +523,7 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
>   {
>   	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
>   	u16 cleaned_count = ICE_DESC_UNUSED(rx_ring);
> +	struct xdp_rx_drv_stats_local lrstats = { };
>   	struct ice_tx_ring *xdp_ring;
>   	unsigned int xdp_xmit = 0;
>   	struct bpf_prog *xdp_prog;
> @@ -548,7 +565,8 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
>   		xsk_buff_set_size(*xdp, size);
>   		xsk_buff_dma_sync_for_cpu(*xdp, rx_ring->xsk_pool);
> 
> -		xdp_res = ice_run_xdp_zc(rx_ring, *xdp, xdp_prog, xdp_ring);
> +		xdp_res = ice_run_xdp_zc(rx_ring, *xdp, xdp_prog, xdp_ring,
> +					 &lrstats);
>   		if (xdp_res) {
>   			if (xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR))
>   				xdp_xmit |= xdp_res;
> @@ -598,6 +616,7 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
> 
>   	ice_finalize_xdp_rx(xdp_ring, xdp_xmit);
>   	ice_update_rx_ring_stats(rx_ring, total_rx_packets, total_rx_bytes);
> +	xdp_update_rx_drv_stats(&rx_ring->xdp_stats->xsk_rx, &lrstats);
> 
>   	if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) {
>   		if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
> @@ -629,6 +648,7 @@ static bool ice_xmit_zc(struct ice_tx_ring *xdp_ring, int budget)
>   		struct ice_tx_buf *tx_buf;
> 
>   		if (unlikely(!ICE_DESC_UNUSED(xdp_ring))) {
> +			xdp_update_tx_drv_full(&xdp_ring->xdp_stats->xsk_tx);
>   			xdp_ring->tx_stats.tx_busy++;
>   			work_done = false;
>   			break;
> @@ -686,11 +706,11 @@ ice_clean_xdp_tx_buf(struct ice_tx_ring *xdp_ring, struct ice_tx_buf *tx_buf)
>    */
>   bool ice_clean_tx_irq_zc(struct ice_tx_ring *xdp_ring, int budget)
>   {
> -	int total_packets = 0, total_bytes = 0;
>   	s16 ntc = xdp_ring->next_to_clean;
> +	u32 xdp_frames = 0, xdp_bytes = 0;
> +	u32 xsk_frames = 0, xsk_bytes = 0;
>   	struct ice_tx_desc *tx_desc;
>   	struct ice_tx_buf *tx_buf;
> -	u32 xsk_frames = 0;
>   	bool xmit_done;
> 
>   	tx_desc = ICE_TX_DESC(xdp_ring, ntc);

Thanks,
Daniel

* Re: [PATCH v2 net-next 08/26] mvpp2: provide .ndo_get_xdp_stats() callback
       [not found] ` <20211123163955.154512-9-alexandr.lobakin@intel.com>
@ 2021-11-24 11:33   ` Russell King (Oracle)
  2021-11-24 11:36   ` Russell King (Oracle)
  1 sibling, 0 replies; 17+ messages in thread
From: Russell King (Oracle) @ 2021-11-24 11:33 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Song Liu, Sergey Ryazanov, Michael S. Tsirkin, Yonghong Song,
	Alexei Starovoitov, Andrii Nakryiko, Andrei Vagin, Tony Nguyen,
	Thomas Petazzoni, Ioana Ciornei, Arthur Kiyanovski,
	Daniel Borkmann, Jonathan Corbet, linux-rdma, linux-doc,
	John Fastabend, Noam Dagan, Vladimir Oltean, Cong Wang,
	Jakub Kicinski, Lorenzo Bianconi, Maciej Fijalkowski,
	Jesper Dangaard Brouer, Johannes Berg, KP Singh, Claudiu Manoil,
	Martin Habets, Shay Agroskin, Marcin Wojtas, Leon Romanovsky,
	David Arinzon, David Ahern, Toke Høiland-Jørgensen,
	virtualization, linux-kernel, Martin KaFai Lau, Edward Cree,
	Yajun Deng, netdev, Saeed Bishara, Michal Swiatkowski, bpf,
	Saeed Mahameed, David S. Miller

On Tue, Nov 23, 2021 at 05:39:37PM +0100, Alexander Lobakin wrote:
> Same as mvneta, mvpp2 stores 7 XDP counters in per-cpu containers.
> Expose them via generic XDP stats infra.
> 
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Thanks!

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

* Re: [PATCH v2 net-next 08/26] mvpp2: provide .ndo_get_xdp_stats() callback
       [not found] ` <20211123163955.154512-9-alexandr.lobakin@intel.com>
  2021-11-24 11:33   ` [PATCH v2 net-next 08/26] mvpp2: provide .ndo_get_xdp_stats() callback Russell King (Oracle)
@ 2021-11-24 11:36   ` Russell King (Oracle)
  1 sibling, 0 replies; 17+ messages in thread
From: Russell King (Oracle) @ 2021-11-24 11:36 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Song Liu, Sergey Ryazanov, Michael S. Tsirkin, Yonghong Song,
	Alexei Starovoitov, Andrii Nakryiko, Andrei Vagin, Tony Nguyen,
	Thomas Petazzoni, Ioana Ciornei, Arthur Kiyanovski,
	Daniel Borkmann, Jonathan Corbet, linux-rdma, linux-doc,
	John Fastabend, Noam Dagan, Vladimir Oltean, Cong Wang,
	Jakub Kicinski, Lorenzo Bianconi, Maciej Fijalkowski,
	Jesper Dangaard Brouer, Johannes Berg, KP Singh, Claudiu Manoil,
	Martin Habets, Shay Agroskin, Marcin Wojtas, Leon Romanovsky,
	David Arinzon, David Ahern, Toke Høiland-Jørgensen,
	virtualization, linux-kernel, Martin KaFai Lau, Edward Cree,
	Yajun Deng, netdev, Saeed Bishara, Michal Swiatkowski, bpf,
	Saeed Mahameed, David S. Miller

On Tue, Nov 23, 2021 at 05:39:37PM +0100, Alexander Lobakin wrote:
> Same as mvneta, mvpp2 stores 7 XDP counters in per-cpu containers.
> Expose them via generic XDP stats infra.
> 
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> ---
>  .../net/ethernet/marvell/mvpp2/mvpp2_main.c   | 51 +++++++++++++++++++
>  1 file changed, 51 insertions(+)
> 
> diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> index 97bd2ee8a010..58203cde3b60 100644
> --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> @@ -5131,6 +5131,56 @@ mvpp2_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
>  	stats->tx_dropped	= dev->stats.tx_dropped;
>  }
> 
> +static int mvpp2_get_xdp_stats_ndo(const struct net_device *dev, u32 attr_id,
> +				   void *attr_data)
> +{
> +	const struct mvpp2_port *port = netdev_priv(dev);
> +	struct ifla_xdp_stats *xdp_stats = attr_data;
> +	u32 cpu, start;
> +
> +	switch (attr_id) {
> +	case IFLA_XDP_XSTATS_TYPE_XDP:
> +		break;
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +
> +	for_each_possible_cpu(cpu) {
> +		const struct mvpp2_pcpu_stats *ps;
> +		u64 xdp_xmit_err;
> +		u64 xdp_redirect;
> +		u64 xdp_tx_err;
> +		u64 xdp_pass;
> +		u64 xdp_drop;
> +		u64 xdp_xmit;
> +		u64 xdp_tx;
> +
> +		ps = per_cpu_ptr(port->stats, cpu);
> +
> +		do {
> +			start = u64_stats_fetch_begin_irq(&ps->syncp);
> +
> +			xdp_redirect = ps->xdp_redirect;
> +			xdp_pass = ps->xdp_pass;
> +			xdp_drop = ps->xdp_drop;
> +			xdp_xmit = ps->xdp_xmit;
> +			xdp_xmit_err = ps->xdp_xmit_err;
> +			xdp_tx = ps->xdp_tx;
> +			xdp_tx_err = ps->xdp_tx_err;
> +		} while (u64_stats_fetch_retry_irq(&ps->syncp, start));
> +
> +		xdp_stats->redirect += xdp_redirect;
> +		xdp_stats->pass += xdp_pass;
> +		xdp_stats->drop += xdp_drop;
> +		xdp_stats->xmit_packets += xdp_xmit;
> +		xdp_stats->xmit_errors += xdp_xmit_err;
> +		xdp_stats->tx += xdp_tx;
> +		xdp_stats->tx_errors  += xdp_tx_err;
> +	}

Actually, the only concern I have here is the duplication between this
function and mvpp2_get_xdp_stats(). It looks to me like these two
functions could share a lot of their code. Please submit a patch to
make that happen. Thanks.
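
Something along these lines, i.e. one helper that walks the per-cpu stats once and that
both mvpp2_get_xdp_stats() and this ndo then convert into their own output format (just a
sketch based on the loop above; the helper and struct names are made up):

   struct mvpp2_xdp_stats_sum {
       u64 redirect, pass, drop, xmit, xmit_err, tx, tx_err;
   };

   /* caller passes a zeroed sum; snapshot per cpu under the seqcount, add outside */
   static void mvpp2_xdp_stats_sum(const struct mvpp2_port *port,
                                   struct mvpp2_xdp_stats_sum *sum)
   {
       u32 cpu, start;

       for_each_possible_cpu(cpu) {
           const struct mvpp2_pcpu_stats *ps = per_cpu_ptr(port->stats, cpu);
           struct mvpp2_xdp_stats_sum cur;

           do {
               start = u64_stats_fetch_begin_irq(&ps->syncp);

               cur.redirect = ps->xdp_redirect;
               cur.pass = ps->xdp_pass;
               cur.drop = ps->xdp_drop;
               cur.xmit = ps->xdp_xmit;
               cur.xmit_err = ps->xdp_xmit_err;
               cur.tx = ps->xdp_tx;
               cur.tx_err = ps->xdp_tx_err;
           } while (u64_stats_fetch_retry_irq(&ps->syncp, start));

           sum->redirect += cur.redirect;
           sum->pass += cur.pass;
           sum->drop += cur.drop;
           sum->xmit += cur.xmit;
           sum->xmit_err += cur.xmit_err;
           sum->tx += cur.tx;
           sum->tx_err += cur.tx_err;
       }
   }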

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

* Re: [PATCH v2 net-next 07/26] mvneta: add .ndo_get_xdp_stats() callback
       [not found] ` <20211123163955.154512-8-alexandr.lobakin@intel.com>
@ 2021-11-24 11:39   ` Russell King (Oracle)
  0 siblings, 0 replies; 17+ messages in thread
From: Russell King (Oracle) @ 2021-11-24 11:39 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Song Liu, Sergey Ryazanov, Michael S. Tsirkin, Yonghong Song,
	Alexei Starovoitov, Andrii Nakryiko, Andrei Vagin, Tony Nguyen,
	Thomas Petazzoni, Ioana Ciornei, Arthur Kiyanovski,
	Daniel Borkmann, Jonathan Corbet, linux-rdma, linux-doc,
	John Fastabend, Noam Dagan, Vladimir Oltean, Cong Wang,
	Jakub Kicinski, Lorenzo Bianconi, Maciej Fijalkowski,
	Jesper Dangaard Brouer, Johannes Berg, KP Singh, Claudiu Manoil,
	Martin Habets, Shay Agroskin, Marcin Wojtas, Leon Romanovsky,
	David Arinzon, David Ahern, Toke Høiland-Jørgensen,
	virtualization, linux-kernel, Martin KaFai Lau, Edward Cree,
	Yajun Deng, netdev, Saeed Bishara, Michal Swiatkowski, bpf,
	Saeed Mahameed, David S. Miller

On Tue, Nov 23, 2021 at 05:39:36PM +0100, Alexander Lobakin wrote:
> +	for_each_possible_cpu(cpu) {
> +		const struct mvneta_pcpu_stats *stats;
> +		const struct mvneta_stats *ps;
> +		u64 xdp_xmit_err;
> +		u64 xdp_redirect;
> +		u64 xdp_tx_err;
> +		u64 xdp_pass;
> +		u64 xdp_drop;
> +		u64 xdp_xmit;
> +		u64 xdp_tx;
> +		u32 start;
> +
> +		stats = per_cpu_ptr(pp->stats, cpu);
> +		ps = &stats->es.ps;
> +
> +		do {
> +			start = u64_stats_fetch_begin_irq(&stats->syncp);
> +
> +			xdp_drop = ps->xdp_drop;
> +			xdp_pass = ps->xdp_pass;
> +			xdp_redirect = ps->xdp_redirect;
> +			xdp_tx = ps->xdp_tx;
> +			xdp_tx_err = ps->xdp_tx_err;
> +			xdp_xmit = ps->xdp_xmit;
> +			xdp_xmit_err = ps->xdp_xmit_err;
> +		} while (u64_stats_fetch_retry_irq(&stats->syncp, start));
> +
> +		xdp_stats->drop += xdp_drop;
> +		xdp_stats->pass += xdp_pass;
> +		xdp_stats->redirect += xdp_redirect;
> +		xdp_stats->tx += xdp_tx;
> +		xdp_stats->tx_errors += xdp_tx_err;
> +		xdp_stats->xmit_packets += xdp_xmit;
> +		xdp_stats->xmit_errors += xdp_xmit_err;

Same comment as for mvpp2 - this could share a lot of code with
mvneta_ethtool_update_pcpu_stats() (although it means we end up
calculating a little more for the alloc error and refill error
that this API doesn't need), but I think sharing that code would be
a good idea.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

* Re: [PATCH v2 net-next 21/26] ice: add XDP and XSK generic per-channel statistics
  2021-11-24  0:52   ` [PATCH v2 net-next 21/26] ice: add XDP and XSK generic per-channel statistics Daniel Borkmann
@ 2021-11-25 11:56     ` Toke Høiland-Jørgensen
       [not found]       ` <20211125170708.127323-1-alexandr.lobakin@intel.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-11-25 11:56 UTC (permalink / raw)
  To: Daniel Borkmann, Alexander Lobakin, David S. Miller,
	Jakub Kicinski
  Cc: Song Liu, Sergey Ryazanov, Michael S. Tsirkin, Vladimir Oltean,
	Alexei Starovoitov, Andrei Vagin, Tony Nguyen, Thomas Petazzoni,
	Ioana Ciornei, Noam Dagan, Leon Romanovsky, Jonathan Corbet,
	linux-rdma, linux-doc, John Fastabend, Russell King,
	Arthur Kiyanovski, Cong Wang, Martin Habets, Lorenzo Bianconi,
	Maciej Fijalkowski, Jesper Dangaard Brouer, Johannes Berg,
	KP Singh, Andrii Nakryiko, Claudiu Manoil, Yonghong Song,
	Shay Agroskin, Marcin Wojtas, David Arinzon, David Ahern,
	virtualization, linux-kernel, Edward Cree, Yajun Deng, netdev,
	Saeed Bishara, Michal Swiatkowski, bpf, Saeed Mahameed,
	Martin KaFai Lau

Daniel Borkmann <daniel@iogearbox.net> writes:

> Hi Alexander,
>
> On 11/23/21 5:39 PM, Alexander Lobakin wrote:
> [...]
>
> Just commenting on ice here as one example (similar applies to other drivers):
>
>> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
>> index 1dd7e84f41f8..7dc287bc3a1a 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
>> @@ -258,6 +258,8 @@ static void ice_clean_xdp_irq(struct ice_tx_ring *xdp_ring)
>>   		xdp_ring->next_dd = ICE_TX_THRESH - 1;
>>   	xdp_ring->next_to_clean = ntc;
>>   	ice_update_tx_ring_stats(xdp_ring, total_pkts, total_bytes);
>> +	xdp_update_tx_drv_stats(&xdp_ring->xdp_stats->xdp_tx, total_pkts,
>> +				total_bytes);
>>   }
>> 
>>   /**
>> @@ -277,6 +279,7 @@ int ice_xmit_xdp_ring(void *data, u16 size, struct ice_tx_ring *xdp_ring)
>>   		ice_clean_xdp_irq(xdp_ring);
>> 
>>   	if (!unlikely(ICE_DESC_UNUSED(xdp_ring))) {
>> +		xdp_update_tx_drv_full(&xdp_ring->xdp_stats->xdp_tx);
>>   		xdp_ring->tx_stats.tx_busy++;
>>   		return ICE_XDP_CONSUMED;
>>   	}
>> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
>> index ff55cb415b11..62ef47a38d93 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
>> @@ -454,42 +454,58 @@ ice_construct_skb_zc(struct ice_rx_ring *rx_ring, struct xdp_buff **xdp_arr)
>>    * @xdp: xdp_buff used as input to the XDP program
>>    * @xdp_prog: XDP program to run
>>    * @xdp_ring: ring to be used for XDP_TX action
>> + * @lrstats: onstack Rx XDP stats
>>    *
>>    * Returns any of ICE_XDP_{PASS, CONSUMED, TX, REDIR}
>>    */
>>   static int
>>   ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
>> -	       struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring)
>> +	       struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring,
>> +	       struct xdp_rx_drv_stats_local *lrstats)
>>   {
>>   	int err, result = ICE_XDP_PASS;
>>   	u32 act;
>> 
>> +	lrstats->bytes += xdp->data_end - xdp->data;
>> +	lrstats->packets++;
>> +
>>   	act = bpf_prog_run_xdp(xdp_prog, xdp);
>> 
>>   	if (likely(act == XDP_REDIRECT)) {
>>   		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
>> -		if (err)
>> +		if (err) {
>> +			lrstats->redirect_errors++;
>>   			goto out_failure;
>> +		}
>> +		lrstats->redirect++;
>>   		return ICE_XDP_REDIR;
>>   	}
>> 
>>   	switch (act) {
>>   	case XDP_PASS:
>> +		lrstats->pass++;
>>   		break;
>>   	case XDP_TX:
>>   		result = ice_xmit_xdp_buff(xdp, xdp_ring);
>> -		if (result == ICE_XDP_CONSUMED)
>> +		if (result == ICE_XDP_CONSUMED) {
>> +			lrstats->tx_errors++;
>>   			goto out_failure;
>> +		}
>> +		lrstats->tx++;
>>   		break;
>>   	default:
>>   		bpf_warn_invalid_xdp_action(act);
>> -		fallthrough;
>> +		lrstats->invalid++;
>> +		goto out_failure;
>>   	case XDP_ABORTED:
>> +		lrstats->aborted++;
>>   out_failure:
>>   		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
>> -		fallthrough;
>> +		result = ICE_XDP_CONSUMED;
>> +		break;
>>   	case XDP_DROP:
>>   		result = ICE_XDP_CONSUMED;
>> +		lrstats->drop++;
>>   		break;
>>   	}
>
> Imho, the overall approach is way too bloated. I can see the
> packets/bytes but now we have 3 counter updates with return codes
> included and then the additional sync of the on-stack counters into
> the ring counters via xdp_update_rx_drv_stats(). So we now need
> ice_update_rx_ring_stats() as well as xdp_update_rx_drv_stats() which
> syncs 10 different stat counters via u64_stats_add() into the per ring
> ones. :/
>
> I'm just taking our XDP L4LB in Cilium as an example: there we already
> count errors and export them via per-cpu map that eventually lead to
> XDP_DROP cases including the /reason/ which caused the XDP_DROP (e.g.
> Prometheus can then scrape these insights from all the nodes in the
> cluster). Given the different action codes are very often application
> specific, there's not much debugging that you can do when /only/
> looking at `ip link xdpstats` to gather insight on *why* some of these
> actions were triggered (e.g. fib lookup failure, etc). If really of
> interest, then maybe libxdp could have such per-action counters as
> opt-in in its call chain..

To me, standardising these counters is less about helping people debug
their XDP programs (as you say, you can put your own telemetry into
those), and more about making XDP less "mystical" to the system
administrator (who may not be the same person who wrote the XDP
programs). So at the very least, they need to indicate "where are the
packets going", which means at least counters for DROP, REDIRECT and TX
(+ errors for tx/redirect) in addition to the "processed by XDP" initial
counter. Which in the above means 'pass', 'invalid' and 'aborted' could
be dropped, I guess; but I don't mind terribly keeping them either given
that there's no measurable performance impact.

> But then it also seems like above in ice_xmit_xdp_ring() we now need
> to bump counters twice just for sake of ethtool vs xdp counters which
> sucks a bit, would be nice to only having to do it once:

This I agree with, and while I can see the layering argument for putting
them into 'ip' and rtnetlink instead of ethtool, I also worry that these
counters will simply be lost in obscurity, so I do wonder if it wouldn't
be better to accept the "layering violation" and keep them all in the
'ethtool -S' output?

[...]

> +  xdp-channel0-rx_xdp_redirect: 7
> +  xdp-channel0-rx_xdp_redirect_errors: 8
> +  xdp-channel0-rx_xdp_tx: 9
> +  xdp-channel0-rx_xdp_tx_errors: 10
> +  xdp-channel0-tx_xdp_xmit_packets: 11
> +  xdp-channel0-tx_xdp_xmit_bytes: 12
> +  xdp-channel0-tx_xdp_xmit_errors: 13
> +  xdp-channel0-tx_xdp_xmit_full: 14
>
>  From a user PoV to avoid confusion, maybe should be made more clear that the latter refers
> to xsk.

+1, these should probably be xdp-channel0-tx_xsk_* or something like
that...

-Toke


* Re: [PATCH v2 net-next 21/26] ice: add XDP and XSK generic per-channel statistics
       [not found]           ` <20211125204007.133064-1-alexandr.lobakin@intel.com>
@ 2021-11-26 12:30             ` Toke Høiland-Jørgensen
       [not found]               ` <20211126100611.514df099@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-11-26 12:30 UTC (permalink / raw)
  To: Alexander Lobakin, Jakub Kicinski
  Cc: Song Liu, Sergey Ryazanov, Michael S. Tsirkin, Vladimir Oltean,
	Alexei Starovoitov, Russell King, Andrei Vagin, Tony Nguyen,
	Thomas Petazzoni, Ioana Ciornei, Arthur Kiyanovski,
	Leon Romanovsky, Jonathan Corbet, linux-rdma, linux-doc,
	John Fastabend, Noam Dagan, Cong Wang, Martin Habets,
	Lorenzo Bianconi, Maciej Fijalkowski, Jesper Dangaard Brouer,
	Johannes Berg, KP Singh, Andrii Nakryiko, Claudiu Manoil,
	Alexander Lobakin, Yonghong Song, Shay Agroskin, Marcin Wojtas,
	Daniel Borkmann, David Arinzon, David Ahern, virtualization,
	linux-kernel, Martin KaFai Lau, Edward Cree, Yajun Deng, netdev,
	Saeed Bishara, Michal Swiatkowski, bpf, Saeed Mahameed,
	David S. Miller

Alexander Lobakin <alexandr.lobakin@intel.com> writes:

> From: Jakub Kicinski <kuba@kernel.org>
> Date: Thu, 25 Nov 2021 09:44:40 -0800
>
>> On Thu, 25 Nov 2021 18:07:08 +0100 Alexander Lobakin wrote:
>> > > This I agree with, and while I can see the layering argument for putting
>> > > them into 'ip' and rtnetlink instead of ethtool, I also worry that these
>> > > counters will simply be lost in obscurity, so I do wonder if it wouldn't
>> > > be better to accept the "layering violation" and keeping them all in the
>> > > 'ethtool -S' output?  
>> > 
>> > I don't think we should harm the code and the logics in favor of
>> > 'some of the users can face something'. We don't control anything
>> > related to XDP using Ethtool at all, but there is some XDP-related
>> > stuff inside iproute2 code, so for me it's even more intuitive to
>> > have them there.
>> > Jakub, may be you'd like to add something at this point?
>> 
>> TBH I wasn't following this thread too closely since I saw Daniel
>> nacked it already. I do prefer rtnl xstats, I'd just report them 
>> in -s if they are non-zero. But doesn't sound like we have an agreement
>> whether they should exist or not.
>
> Right, just -s is fine, if we drop the per-channel approach.

I agree that adding them to -s is fine (and that resolves my "no one
will find them" complaint as well). If it crowds the output, we could also
default to only outputting a subset, and have the more detailed
statistics hidden behind a verbose switch (or even just in the JSON
output)?

>> Can we think of an approach which would make cloudflare and cilium
>> happy? Feels like we're trying to make the slightly hypothetical 
>> admin happy while ignoring objections of very real users.
>
> The initial idea was to only uniform the drivers. But in general
> you are right, 10 drivers having something doesn't mean it's
> something good.

I don't think it's accurate to call the admin use case "hypothetical".
We're expending a significant effort explaining to people that XDP can
"eat" your packets, and not having any standard statistics makes this
way harder. We should absolutely cater to our "early adopters", but if
we want XDP to see wider adoption, making it "less weird" is critical!

> Maciej, I think you were talking about Cilium asking for those stats
> in Intel drivers? Could you maybe provide their exact usecases/needs
> so I'll orient myself? I certainly remember about XSK Tx packets and
> bytes.
> And speaking of XSK Tx, we have per-socket stats, isn't that enough?

IMO, as long as the packets are accounted for in the regular XDP stats,
having a whole separate set of stats only for XSK is less important.
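
(For reference, per-socket XSK counters are already readable via the XDP_STATISTICS
getsockopt - a minimal sketch, assuming an already set-up AF_XDP socket fd:)

   #include <stdio.h>
   #include <sys/socket.h>
   #include <linux/if_xdp.h>

   #ifndef SOL_XDP
   #define SOL_XDP 283
   #endif

   static void dump_xsk_stats(int xsk_fd)
   {
       struct xdp_statistics stats;
       socklen_t optlen = sizeof(stats);

       /* per-socket counters maintained by the kernel for this XSK */
       if (!getsockopt(xsk_fd, SOL_XDP, XDP_STATISTICS, &stats, &optlen)) {
           printf("rx_dropped:       %llu\n", (unsigned long long)stats.rx_dropped);
           printf("rx_invalid_descs: %llu\n", (unsigned long long)stats.rx_invalid_descs);
           printf("tx_invalid_descs: %llu\n", (unsigned long long)stats.tx_invalid_descs);
       }
   }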

>> Please leave the per-channel stats out. They make a precedent for
>> channel stats which should be an attribute of a channel. Working for 
>> a large XDP user for a couple of years now I can tell you from my own
>> experience I've not once found them useful. In fact per-queue stats are
>> a major PITA as they crowd the output.
>
> Oh okay. My very first iterations were without this, but then I
> found most of the drivers expose their XDP stats per-channel. Since
> I didn't plan to degrade the functionality, they went that way.

I personally find the per-channel stats quite useful. One of the primary
reasons for not achieving full performance with XDP is broken
configuration of packet steering to CPUs, and having per-channel stats
is a nice way of seeing this. I can see the point about them being way
too verbose in the default output, though, and I do generally filter the
output as well when viewing them. But see my point above about only
printing a subset of the stats by default; per-channel stats could be
JSON-only, for instance?

-Toke


* Re: [PATCH v2 net-next 21/26] ice: add XDP and XSK generic per-channel statistics
       [not found]               ` <20211126100611.514df099@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
@ 2021-11-26 18:47                 ` Toke Høiland-Jørgensen
       [not found]                   ` <20211126111431.4a2ed007@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
  2021-11-26 22:27                 ` Daniel Borkmann
  1 sibling, 1 reply; 17+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-11-26 18:47 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Song Liu, Sergey Ryazanov, Michael S. Tsirkin, Vladimir Oltean,
	Alexei Starovoitov, Russell King, Andrei Vagin, Tony Nguyen,
	Thomas Petazzoni, Ioana Ciornei, Arthur Kiyanovski,
	Leon Romanovsky, Jonathan Corbet, linux-rdma, linux-doc,
	John Fastabend, Noam Dagan, Cong Wang, Martin Habets,
	Lorenzo Bianconi, Maciej Fijalkowski, Jesper Dangaard Brouer,
	Johannes Berg, KP Singh, Andrii Nakryiko, Claudiu Manoil,
	Alexander Lobakin, Yonghong Song, Shay Agroskin, Marcin Wojtas,
	Daniel Borkmann, David Arinzon, David Ahern, virtualization,
	linux-kernel, Martin KaFai Lau, Edward Cree, Yajun Deng, netdev,
	Saeed Bishara, Michal Swiatkowski, bpf, Saeed Mahameed,
	David S. Miller

Jakub Kicinski <kuba@kernel.org> writes:

> On Fri, 26 Nov 2021 13:30:16 +0100 Toke Høiland-Jørgensen wrote:
>> >> TBH I wasn't following this thread too closely since I saw Daniel
>> >> nacked it already. I do prefer rtnl xstats, I'd just report them 
>> >> in -s if they are non-zero. But doesn't sound like we have an agreement
>> >> whether they should exist or not.  
>> >
>> > Right, just -s is fine, if we drop the per-channel approach.  
>> 
>> I agree that adding them to -s is fine (and that resolves my "no one
>> will find them" complain as well). If it crowds the output we could also
>> default to only output'ing a subset, and have the more detailed
>> statistics hidden behind a verbose switch (or even just in the JSON
>> output)?
>> 
>> >> Can we think of an approach which would make cloudflare and cilium
>> >> happy? Feels like we're trying to make the slightly hypothetical 
>> >> admin happy while ignoring objections of very real users.  
>> >
>> > The initial idea was to only uniform the drivers. But in general
>> > you are right, 10 drivers having something doesn't mean it's
>> > something good.  
>> 
>> I don't think it's accurate to call the admin use case "hypothetical".
>> We're expending a significant effort explaining to people that XDP can
>> "eat" your packets, and not having any standard statistics makes this
>> way harder. We should absolutely cater to our "early adopters", but if
>> we want XDP to see wider adoption, making it "less weird" is critical!
>
> Fair. In all honesty I said that hoping to push for a more flexible
> approach hidden entirely in BPF, and not involving driver changes.
> Assuming the XDP program has more fine grained stats we should be able
> to extract those instead of double-counting. Hence my vague "let's work
> with apps" comment.
>
> For example to a person familiar with the workload it'd be useful to
> know if program returned XDP_DROP because of configured policy or
> failure to parse a packet. I don't think that sort distinction is
> achievable at the level of standard stats.
>
> The information required by the admin is higher level. As you say the
> primary concern there is "how many packets did XDP eat".

Right, sure, I am also totally fine with having only a somewhat
restricted subset of stats available at the interface level and making
everything else BPF-based. I'm hoping we can converge on a common
understanding of what this "minimal set" should be :)

> Speaking of which, one thing that badly needs clarification is our
> expectation around XDP packets getting counted towards the interface
> stats.

Agreed. My immediate thought is that "XDP packets are interface packets"
but that is certainly not what we do today, so not sure if changing it
at this point would break things?

>> > Maciej, I think you were talking about Cilium asking for those stats
>> > in Intel drivers? Could you maybe provide their exact usecases/needs
>> > so I'll orient myself? I certainly remember about XSK Tx packets and
>> > bytes.
>> > And speaking of XSK Tx, we have per-socket stats, isn't that enough?  
>> 
>> IMO, as long as the packets are accounted for in the regular XDP stats,
>> having a whole separate set of stats only for XSK is less important.
>> 
>> >> Please leave the per-channel stats out. They make a precedent for
>> >> channel stats which should be an attribute of a channel. Working for 
>> >> a large XDP user for a couple of years now I can tell you from my own
>> >> experience I've not once found them useful. In fact per-queue stats are
>> >> a major PITA as they crowd the output.  
>> >
>> > Oh okay. My very first iterations were without this, but then I
>> > found most of the drivers expose their XDP stats per-channel. Since
>> > I didn't plan to degrade the functionality, they went that way.  
>> 
>> I personally find the per-channel stats quite useful. One of the primary
>> reasons for not achieving full performance with XDP is broken
>> configuration of packet steering to CPUs, and having per-channel stats
>> is a nice way of seeing this.
>
> Right, that's about the only thing I use it for as well. "Is the load
> evenly distributed?"  But that's not XDP specific and not worth
> standardizing for, yet, IMO, because..
>
>> I can see the point about them being way too verbose in the default
>> output, though, and I do generally filter the output as well when
>> viewing them. But see my point above about only printing a subset of
>> the stats by default; per-channel stats could be JSON-only, for
>> instance?
>
> we don't even know what constitutes a channel today. And that will
> become increasingly problematic as importance of application specific
> queues increases (zctap etc). IMO until the ontological gaps around
> queues are filled we should leave per-queue stats in ethtool -S.

Hmm, right, I see. I suppose that as long as the XDP packets show up in
one of the interface counters in ethtool -S, it's possible to answer the
load distribution issue, and any further debugging (say, XDP drops on a
certain queue due to CPU-based queue indexing on TX) can be delegated to
BPF-based tools...

-Toke


* Re: [PATCH v2 net-next 21/26] ice: add XDP and XSK generic per-channel statistics
       [not found]               ` <20211126100611.514df099@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
  2021-11-26 18:47                 ` Toke Høiland-Jørgensen
@ 2021-11-26 22:27                 ` Daniel Borkmann
  2021-11-29 11:51                   ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 17+ messages in thread
From: Daniel Borkmann @ 2021-11-26 22:27 UTC (permalink / raw)
  To: Jakub Kicinski, Toke Høiland-Jørgensen
  Cc: Song Liu, Sergey Ryazanov, Michael S. Tsirkin, Vladimir Oltean,
	Alexei Starovoitov, Russell King, Andrei Vagin, Tony Nguyen,
	Thomas Petazzoni, Ioana Ciornei, Arthur Kiyanovski,
	Leon Romanovsky, Jonathan Corbet, linux-rdma, linux-doc,
	John Fastabend, Noam Dagan, Cong Wang, Martin Habets,
	Lorenzo Bianconi, Maciej Fijalkowski, Jesper Dangaard Brouer,
	Johannes Berg, KP Singh, Andrii Nakryiko, Claudiu Manoil,
	Alexander Lobakin, Yonghong Song, Shay Agroskin, Marcin Wojtas,
	David Arinzon, David Ahern, virtualization, linux-kernel,
	Martin KaFai Lau, Edward Cree, Yajun Deng, netdev, Saeed Bishara,
	Michal Swiatkowski, bpf, Saeed Mahameed, David S. Miller

On 11/26/21 7:06 PM, Jakub Kicinski wrote:
> On Fri, 26 Nov 2021 13:30:16 +0100 Toke Høiland-Jørgensen wrote:
>>>> TBH I wasn't following this thread too closely since I saw Daniel
>>>> nacked it already. I do prefer rtnl xstats, I'd just report them
>>>> in -s if they are non-zero. But doesn't sound like we have an agreement
>>>> whether they should exist or not.
>>>
>>> Right, just -s is fine, if we drop the per-channel approach.
>>
>> I agree that adding them to -s is fine (and that resolves my "no one
>> will find them" complain as well). If it crowds the output we could also
>> default to only output'ing a subset, and have the more detailed
>> statistics hidden behind a verbose switch (or even just in the JSON
>> output)?
>>
>>>> Can we think of an approach which would make cloudflare and cilium
>>>> happy? Feels like we're trying to make the slightly hypothetical
>>>> admin happy while ignoring objections of very real users.
>>>
>>> The initial idea was to only uniform the drivers. But in general
>>> you are right, 10 drivers having something doesn't mean it's
>>> something good.
>>
>> I don't think it's accurate to call the admin use case "hypothetical".
>> We're expending a significant effort explaining to people that XDP can
>> "eat" your packets, and not having any standard statistics makes this
>> way harder. We should absolutely cater to our "early adopters", but if
>> we want XDP to see wider adoption, making it "less weird" is critical!
> 
> Fair. In all honesty I said that hoping to push for a more flexible
> approach hidden entirely in BPF, and not involving driver changes.
> Assuming the XDP program has more fine grained stats we should be able
> to extract those instead of double-counting. Hence my vague "let's work
> with apps" comment.
> 
> For example to a person familiar with the workload it'd be useful to
> know if program returned XDP_DROP because of configured policy or
> failure to parse a packet. I don't think that sort distinction is
> achievable at the level of standard stats.

Agree on the additional context. How often have you looked at tc clsact
/dropped/ stats specifically when you debug a more complex BPF program
there?

   # tc -s qdisc show clsact dev foo
   qdisc clsact ffff: parent ffff:fff1
    Sent 6800 bytes 120 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

Similarly, XDP_PASS counters may be of limited use as well for the same reason
(and I think we might not even have a tc counter equivalent for it).

> The information required by the admin is higher level. As you say the
> primary concern there is "how many packets did XDP eat".

Agree. That said, for XDP_DROP I would see one use case where you compare
different drivers or bond vs no bond, as we did in the past in [0] when
testing against a packet generator (although I don't see the bond driver covered
in this series yet, where it would aggregate the XDP stats from all bond slave
devs).

On a higher-level wrt "how many packets did XDP eat", it would make sense
to have the stats for successful XDP_{TX,REDIRECT} given these are out
of reach from a BPF prog PoV - we can only count there how many times we
returned with XDP_TX but not whether the pkt /successfully made it/.

In terms of error cases, could we just standardize all drivers on the behavior
of e.g. mlx5e_xdp_handle(), meaning, a failure from XDP_{TX,REDIRECT} will
hit the trace_xdp_exception() and then fall through to bump a drop counter
(the same one we bump for XDP_DROP)? So the drop counter will account for
program drops but also driver-related drops.
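
I.e. roughly this shape in a driver's XDP handler (a sketch of the pattern only, not
mlx5e's exact code; the xmit helper and stats names are placeholders):

   switch (act) {
   case XDP_REDIRECT:
       if (xdp_do_redirect(netdev, xdp, prog))
           goto xdp_abort;
       stats->redirect++;              /* successful redirects only */
       break;
   case XDP_TX:
       if (!drv_xmit_xdp_frame(ring, xdp))     /* placeholder for the driver's xmit */
           goto xdp_abort;
       stats->tx++;                    /* successful XDP_TX only */
       break;
   case XDP_PASS:
       return true;                    /* hand the frame to the stack */
   default:
       bpf_warn_invalid_xdp_action(act);
       fallthrough;
   case XDP_ABORTED:
   xdp_abort:
       trace_xdp_exception(netdev, prog, act);
       fallthrough;
   case XDP_DROP:
       stats->drop++;                  /* program drops and driver-side failures */
       break;
   }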

At some later point the trace_xdp_exception() could be extended with an error
code that the driver would propagate (given some of them look quite similar
across drivers, fwiw), and then whoever wants to do further processing with
them can do so via bpftrace or other tooling.

So overall wrt this series: from the lrstats we'd be /dropping/ the pass,
tx_errors, redirect_errors, invalid, aborted counters. And we'd be /keeping/
bytes & packets counters that XDP sees, (driver-)successful tx & redirect
counters as well as drop counter. Also, XDP bytes & packets counters should
not be counted twice wrt ethtool stats.

   [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9e2ee5c7e7c35d195e2aa0692a7241d47a433d1e

Thanks,
Daniel

* Re: [PATCH v2 net-next 21/26] ice: add XDP and XSK generic per-channel statistics
       [not found]                   ` <20211126111431.4a2ed007@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
@ 2021-11-28 17:54                     ` Ido Schimmel
  0 siblings, 0 replies; 17+ messages in thread
From: Ido Schimmel @ 2021-11-28 17:54 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Song Liu, Sergey Ryazanov, Michael S. Tsirkin, Vladimir Oltean,
	Alexei Starovoitov, Russell King, Andrei Vagin, Tony Nguyen,
	Thomas Petazzoni, Ioana Ciornei, Arthur Kiyanovski,
	Leon Romanovsky, Jonathan Corbet, linux-rdma, linux-doc,
	John Fastabend, Noam Dagan, nikolay, Cong Wang, Martin Habets,
	Lorenzo Bianconi, Maciej Fijalkowski, Jesper Dangaard Brouer,
	Johannes Berg, KP Singh, Andrii Nakryiko, Claudiu Manoil,
	Alexander Lobakin, Yonghong Song, Shay Agroskin, Marcin Wojtas,
	petrm, Daniel Borkmann, David Arinzon, David Ahern,
	Toke Høiland-Jørgensen, virtualization, linux-kernel,
	Martin KaFai Lau, Edward Cree, Yajun Deng, netdev, Saeed Bishara,
	Michal Swiatkowski, bpf, Saeed Mahameed, David S. Miller

+Petr, Nik

On Fri, Nov 26, 2021 at 11:14:31AM -0800, Jakub Kicinski wrote:
> On Fri, 26 Nov 2021 19:47:17 +0100 Toke Høiland-Jørgensen wrote:
> > > Fair. In all honesty I said that hoping to push for a more flexible
> > > approach hidden entirely in BPF, and not involving driver changes.
> > > Assuming the XDP program has more fine grained stats we should be able
> > > to extract those instead of double-counting. Hence my vague "let's work
> > > with apps" comment.
> > >
> > > For example to a person familiar with the workload it'd be useful to
> > > know if program returned XDP_DROP because of configured policy or
> > > failure to parse a packet. I don't think that sort distinction is
> > > achievable at the level of standard stats.
> > >
> > > The information required by the admin is higher level. As you say the
> > > primary concern there is "how many packets did XDP eat".  
> > 
> > Right, sure, I am also totally fine with having only a somewhat
> > restricted subset of stats available at the interface level and make
> > everything else be BPF-based. I'm hoping we can converge of a common
> > understanding of what this "minimal set" should be :)
> > 
> > > Speaking of which, one thing that badly needs clarification is our
> > > expectation around XDP packets getting counted towards the interface
> > > stats.  
> > 
> > Agreed. My immediate thought is that "XDP packets are interface packets"
> > but that is certainly not what we do today, so not sure if changing it
> > at this point would break things?
> 
> I'd vote for taking the risk and trying to align all the drivers.

I agree. I think IFLA_STATS64 in RTM_NEWLINK should contain statistics
of all the packets seen by the netdev. The breakdown into software /
hardware / XDP should be reported via RTM_NEWSTATS.

Currently, for soft devices such as VLANs, bridges and GRE, user space
only sees statistics of packets forwarded by software, which is quite
useless when forwarding is offloaded from the kernel to hardware.

Petr is working on exposing hardware statistics for such devices via
rtnetlink. Unlike XDP (?), we need to be able to let user space enable /
disable hardware statistics as we have a limited number of hardware
counters and they can also reduce the bandwidth when enabled. We are
thinking of adding a new RTM_SETSTATS for that:

# ip stats set dev swp1 hw_stats on

For query, something like (under discussion):

# ip stats show dev swp1 // all groups
# ip stats show dev swp1 group link
# ip stats show dev swp1 group offload // all sub-groups
# ip stats show dev swp1 group offload sub-group cpu
# ip stats show dev swp1 group offload sub-group hw

Like other iproute2 commands, these follow the nesting of the
RTM_{NEW,GET}STATS uAPI.
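
For completeness, the query side of that uAPI is a single RTM_GETSTATS request with a
filter mask selecting the groups of interest - a minimal sketch over raw rtnetlink,
requesting only the long-standing IFLA_STATS_LINK_64 group (an XDP group would be
requested the same way via its own filter bit):

   #include <sys/socket.h>
   #include <linux/netlink.h>
   #include <linux/rtnetlink.h>
   #include <linux/if_link.h>

   /* nl_fd is an AF_NETLINK/NETLINK_ROUTE socket; the kernel answers with RTM_NEWSTATS */
   static int request_link_stats(int nl_fd, unsigned int ifindex)
   {
       struct {
           struct nlmsghdr nlh;
           struct if_stats_msg ifsm;
       } req = {
           .nlh.nlmsg_len    = NLMSG_LENGTH(sizeof(struct if_stats_msg)),
           .nlh.nlmsg_type   = RTM_GETSTATS,
           .nlh.nlmsg_flags  = NLM_F_REQUEST,
           .ifsm.ifindex     = ifindex,
           .ifsm.filter_mask = IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64),
       };

       return send(nl_fd, &req, req.nlh.nlmsg_len, 0);
   }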

Looking at patch #1 [1], I think that whatever you decide to expose for
XDP can be queried via:

# ip stats show dev swp1 group xdp
# ip stats show dev swp1 group xdp sub-group regular
# ip stats show dev swp1 group xdp sub-group xsk

Regardless, the following command should show statistics of all the
packets seen by the netdev:

# ip -s link show dev swp1

There is a PR [2] for node_exporter to use rtnetlink to fetch netdev
statistics instead of the old proc interface. It should be possible to
extend it to use RTM_*STATS for more fine-grained statistics.

[1] https://lore.kernel.org/netdev/20211123163955.154512-2-alexandr.lobakin@intel.com/
[2] https://github.com/prometheus/node_exporter/pull/2074

* Re: [PATCH v2 net-next 00/26] net: introduce and use generic XDP stats
       [not found] <20211123163955.154512-1-alexandr.lobakin@intel.com>
                   ` (2 preceding siblings ...)
       [not found] ` <20211123163955.154512-8-alexandr.lobakin@intel.com>
@ 2021-11-28 22:23 ` David Ahern
       [not found] ` <20211123163955.154512-2-alexandr.lobakin@intel.com>
       [not found] ` <20211130155612.594688-1-alexandr.lobakin@intel.com>
  5 siblings, 0 replies; 17+ messages in thread
From: David Ahern @ 2021-11-28 22:23 UTC (permalink / raw)
  To: Alexander Lobakin, David S. Miller, Jakub Kicinski
  Cc: Song Liu, Sergey Ryazanov, Michael S. Tsirkin, Vladimir Oltean,
	Alexei Starovoitov, Andrei Vagin, Tony Nguyen, Thomas Petazzoni,
	Ioana Ciornei, Noam Dagan, Daniel Borkmann, Jonathan Corbet,
	linux-rdma, linux-doc, John Fastabend, Russell King,
	Arthur Kiyanovski, Cong Wang, Martin Habets, Lorenzo Bianconi,
	Maciej Fijalkowski, Jesper Dangaard Brouer, Johannes Berg,
	KP Singh, Andrii Nakryiko, Claudiu Manoil, Yonghong Song,
	Shay Agroskin, Marcin Wojtas, Leon Romanovsky, David Arinzon,
	David Ahern, Toke Høiland-Jørgensen, virtualization,
	linux-kernel, Edward Cree, Yajun Deng, netdev, Saeed Bishara,
	Michal Swiatkowski, bpf, Saeed Mahameed, Martin KaFai Lau

On 11/23/21 9:39 AM, Alexander Lobakin wrote:
> This is an almost complete rework of [0].
> 
> This series introduces generic XDP statistics infra based on rtnl
> xstats (Ethtool standard stats previously), and wires up the drivers
> which collect appropriate statistics to this new interface. Finally,
> it introduces XDP/XSK statistics to all XDP-capable Intel drivers.
> 
> Those counters are:
> * packets: number of frames passed to bpf_prog_run_xdp().
> * bytes: number of bytes went through bpf_prog_run_xdp().
> * errors: number of general XDP errors, if driver has one unified
>   counter.
> * aborted: number of XDP_ABORTED returns.
> * drop: number of XDP_DROP returns.
> * invalid: number of returns of unallowed values (i.e. not XDP_*).
> * pass: number of XDP_PASS returns.
> * redirect: number of successfully performed XDP_REDIRECT requests.
> * redirect_errors: number of failed XDP_REDIRECT requests.
> * tx: number of successfully performed XDP_TX requests.
> * tx_errors: number of failed XDP_TX requests.
> * xmit_packets: number of successfully transmitted XDP/XSK frames.
> * xmit_bytes: number of bytes successfully transmitted as XDP/XSK frames.
> * xmit_errors: number of XDP/XSK frames that failed to transmit.
> * xmit_full: number of times the XDP/XSK queue was full at the moment of
>   transmission.
> 
> To provide them, developers need to implement .ndo_get_xdp_stats()
> and, if they want to expose stats on a per-channel basis,
> .ndo_get_xdp_stats_nch(). include/net/xdp.h contains some helper

Why the tie to a channel? There are Rx queues and Tx queues and no
requirement to link them into a channel. It would be better (more
flexible) to allow them to be independent. Rather than ask the driver
"how many channels", ask 'how many Rx queues' and 'how many Tx queues'
for which xdp stats are reported.

From there, allow queue numbers or queue IDs to be non-consecutive and
add a queue ID or number as an attribute, e.g.,

[XDP stats]
	[ Rx queue N]
		counters


	[ Tx queue N]
		counters

This would allow a follow-on patch set to do something like "Give me XDP
stats for Rx queue N" instead of doing a full dump.
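
On the kernel fill side that would look roughly like the sketch below; note the nest and
queue-id attribute names are hypothetical, only the counter attributes exist in patch #1:

   /* emit one [ Rx queue N ] nest; attribute names here are illustrative */
   static int rtnl_fill_xdp_rxq_stats(struct sk_buff *skb, u32 queue_id,
                                      const struct ifla_xdp_stats *xs)
   {
       struct nlattr *q;

       q = nla_nest_start(skb, IFLA_XDP_XSTATS_RXQ);       /* hypothetical */
       if (!q)
           return -EMSGSIZE;

       if (nla_put_u32(skb, IFLA_XDP_XSTATS_QUEUE_ID, queue_id) ||  /* hypothetical */
           nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_PACKETS, xs->packets,
                             IFLA_XDP_XSTATS_UNSPEC) ||
           nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_DROP, xs->drop,
                             IFLA_XDP_XSTATS_UNSPEC)) {
               nla_nest_cancel(skb, q);
               return -EMSGSIZE;
       }
       /* ... remaining counters ... */

       nla_nest_end(skb, q);
       return 0;
   }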

* Re: [PATCH v2 net-next 21/26] ice: add XDP and XSK generic per-channel statistics
  2021-11-26 22:27                 ` Daniel Borkmann
@ 2021-11-29 11:51                   ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 17+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-11-29 11:51 UTC (permalink / raw)
  To: Daniel Borkmann, Jakub Kicinski
  Cc: Song Liu, Sergey Ryazanov, Michael S. Tsirkin, Vladimir Oltean,
	Alexei Starovoitov, Russell King, Andrei Vagin, Tony Nguyen,
	Thomas Petazzoni, Ioana Ciornei, Arthur Kiyanovski,
	Leon Romanovsky, Jonathan Corbet, linux-rdma, linux-doc,
	John Fastabend, Noam Dagan, Cong Wang, Martin Habets,
	Lorenzo Bianconi, Maciej Fijalkowski, Jesper Dangaard Brouer,
	Johannes Berg, KP Singh, Andrii Nakryiko, Claudiu Manoil,
	Alexander Lobakin, Yonghong Song, Shay Agroskin, Marcin Wojtas,
	David Arinzon, David Ahern, virtualization, linux-kernel,
	Martin KaFai Lau, Edward Cree, Yajun Deng, netdev, Saeed Bishara,
	Michal Swiatkowski, bpf, Saeed Mahameed, David S. Miller

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 11/26/21 7:06 PM, Jakub Kicinski wrote:
>> On Fri, 26 Nov 2021 13:30:16 +0100 Toke Høiland-Jørgensen wrote:
>>>>> TBH I wasn't following this thread too closely since I saw Daniel
>>>>> nacked it already. I do prefer rtnl xstats, I'd just report them
>>>>> in -s if they are non-zero. But doesn't sound like we have an agreement
>>>>> whether they should exist or not.
>>>>
>>>> Right, just -s is fine, if we drop the per-channel approach.
>>>
>>> I agree that adding them to -s is fine (and that resolves my "no one
>>> will find them" complain as well). If it crowds the output we could also
>>> default to only output'ing a subset, and have the more detailed
>>> statistics hidden behind a verbose switch (or even just in the JSON
>>> output)?
>>>
>>>>> Can we think of an approach which would make cloudflare and cilium
>>>>> happy? Feels like we're trying to make the slightly hypothetical
>>>>> admin happy while ignoring objections of very real users.
>>>>
>>>> The initial idea was to only uniform the drivers. But in general
>>>> you are right, 10 drivers having something doesn't mean it's
>>>> something good.
>>>
>>> I don't think it's accurate to call the admin use case "hypothetical".
>>> We're expending a significant effort explaining to people that XDP can
>>> "eat" your packets, and not having any standard statistics makes this
>>> way harder. We should absolutely cater to our "early adopters", but if
>>> we want XDP to see wider adoption, making it "less weird" is critical!
>> 
>> Fair. In all honesty I said that hoping to push for a more flexible
>> approach hidden entirely in BPF, and not involving driver changes.
>> Assuming the XDP program has more fine grained stats we should be able
>> to extract those instead of double-counting. Hence my vague "let's work
>> with apps" comment.
>> 
>> For example to a person familiar with the workload it'd be useful to
>> know if program returned XDP_DROP because of configured policy or
>> failure to parse a packet. I don't think that sort distinction is
>> achievable at the level of standard stats.
>
> Agree on the additional context. How often have you looked at tc clsact
> /dropped/ stats specifically when you debug a more complex BPF program
> there?
>
>    # tc -s qdisc show clsact dev foo
>    qdisc clsact ffff: parent ffff:fff1
>     Sent 6800 bytes 120 pkt (dropped 0, overlimits 0 requeues 0)
>     backlog 0b 0p requeues 0
>
> Similarly, XDP_PASS counters may be of limited use as well for same reason
> (and I think we might not even have a tc counter equivalent for it).
>
>> The information required by the admin is higher level. As you say the
>> primary concern there is "how many packets did XDP eat".
>
> Agree. Above said, for XDP_DROP I would see one use case where you compare
> different drivers or bond vs no bond as we did in the past in [0] when
> testing against a packet generator (although I don't see bond driver covered
> in this series here yet where it aggregates the XDP stats from all bond slave
> devs).
>
> On a higher-level wrt "how many packets did XDP eat", it would make sense
> to have the stats for successful XDP_{TX,REDIRECT} given these are out
> of reach from a BPF prog PoV - we can only count there how many times we
> returned with XDP_TX but not whether the pkt /successfully made it/.
>
> In terms of error cases, could we just standardize all drivers on the behavior
> of e.g. mlx5e_xdp_handle(), meaning, a failure from XDP_{TX,REDIRECT} will
> hit the trace_xdp_exception() and then fallthrough to bump a drop counter
> (same as we bump in XDP_DROP then). So the drop counter will account for
> program drops but also driver-related drops.
>
> At some later point the trace_xdp_exception() could be extended with an error
> code that the driver would propagate (given some of them look quite similar
> across drivers, fwiw), and then whoever wants to do further processing with
> them can do so via bpftrace or other tooling.
>
> So overall wrt this series: from the lrstats we'd be /dropping/ the pass,
> tx_errors, redirect_errors, invalid, aborted counters. And we'd be /keeping/
> bytes & packets counters that XDP sees, (driver-)successful tx & redirect
> counters as well as drop counter. Also, XDP bytes & packets counters should
> not be counted twice wrt ethtool stats.

This sounds reasonable to me, and I also like the error-code-to-tracepoint
idea :)

-Toke
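
For reference, a minimal sketch of the mlx5e-style fall-through scheme
discussed above. bpf_prog_run_xdp(), xdp_do_redirect(), trace_xdp_exception(),
bpf_warn_invalid_xdp_action() and the XDP_* verdicts are real kernel symbols;
the example_* names and the stats fields are purely illustrative:

/* Everything except the kernel symbols named above is hypothetical. */
static u32 example_run_xdp(struct example_rq *rq, struct bpf_prog *prog,
                           struct xdp_buff *xdp)
{
        u32 act = bpf_prog_run_xdp(prog, xdp);

        switch (act) {
        case XDP_PASS:
                return XDP_PASS;
        case XDP_TX:
                if (unlikely(example_xmit_xdp(rq, xdp)))
                        goto xdp_err;   /* driver-side TX failure */
                rq->stats.xdp_tx++;
                return XDP_TX;
        case XDP_REDIRECT:
                if (unlikely(xdp_do_redirect(rq->netdev, xdp, prog)))
                        goto xdp_err;   /* driver/core redirect failure */
                rq->stats.xdp_redirect++;
                return XDP_REDIRECT;
        default:
                bpf_warn_invalid_xdp_action(act);
                fallthrough;
        case XDP_ABORTED:
xdp_err:
                trace_xdp_exception(rq->netdev, prog, act);
                fallthrough;
        case XDP_DROP:
                /* program drops and driver-side failures share one counter */
                rq->stats.xdp_drop++;
                return XDP_DROP;
        }
}

The failure paths stay observable through the tracepoint, while the
netlink-visible drop counter only answers "how many packets did XDP eat".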


* Re: [PATCH v2 net-next 01/26] rtnetlink: introduce generic XDP statistics
       [not found] ` <20211123163955.154512-2-alexandr.lobakin@intel.com>
@ 2021-11-30  2:36   ` David Ahern
  0 siblings, 0 replies; 17+ messages in thread
From: David Ahern @ 2021-11-30  2:36 UTC (permalink / raw)
  To: Alexander Lobakin, David S. Miller, Jakub Kicinski
  Cc: Song Liu, Sergey Ryazanov, Michael S. Tsirkin, Vladimir Oltean,
	Alexei Starovoitov, Andrei Vagin, Tony Nguyen, Thomas Petazzoni,
	Ioana Ciornei, Noam Dagan, Daniel Borkmann, Jonathan Corbet,
	linux-rdma, linux-doc, John Fastabend, Russell King,
	Arthur Kiyanovski, Cong Wang, Martin Habets, Lorenzo Bianconi,
	Maciej Fijalkowski, Jesper Dangaard Brouer, Johannes Berg,
	KP Singh, Andrii Nakryiko, Claudiu Manoil, Yonghong Song,
	Shay Agroskin, Marcin Wojtas, Leon Romanovsky, David Arinzon,
	David Ahern, Toke Høiland-Jørgensen, virtualization,
	linux-kernel, Edward Cree, Yajun Deng, netdev, Saeed Bishara,
	Michal Swiatkowski, bpf, Saeed Mahameed, Martin KaFai Lau

On 11/23/21 9:39 AM, Alexander Lobakin wrote:
> +static bool rtnl_get_xdp_stats_xdpxsk(struct sk_buff *skb, u32 ch,
> +				      const void *attr_data)
> +{
> +	const struct ifla_xdp_stats *xstats = attr_data;
> +
> +	xstats += ch;
> +
> +	if (nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_PACKETS, xstats->packets,
> +			      IFLA_XDP_XSTATS_UNSPEC) ||
> +	    nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_BYTES, xstats->bytes,
> +			      IFLA_XDP_XSTATS_UNSPEC) ||
> +	    nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_ERRORS, xstats->errors,
> +			      IFLA_XDP_XSTATS_UNSPEC) ||
> +	    nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_ABORTED, xstats->aborted,
> +			      IFLA_XDP_XSTATS_UNSPEC) ||
> +	    nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_DROP, xstats->drop,
> +			      IFLA_XDP_XSTATS_UNSPEC) ||
> +	    nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_INVALID, xstats->invalid,
> +			      IFLA_XDP_XSTATS_UNSPEC) ||
> +	    nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_PASS, xstats->pass,
> +			      IFLA_XDP_XSTATS_UNSPEC) ||
> +	    nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_REDIRECT, xstats->redirect,
> +			      IFLA_XDP_XSTATS_UNSPEC) ||
> +	    nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_REDIRECT_ERRORS,
> +			      xstats->redirect_errors,
> +			      IFLA_XDP_XSTATS_UNSPEC) ||
> +	    nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_TX, xstats->tx,
> +			      IFLA_XDP_XSTATS_UNSPEC) ||
> +	    nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_TX_ERRORS,
> +			      xstats->tx_errors, IFLA_XDP_XSTATS_UNSPEC) ||
> +	    nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_XMIT_PACKETS,
> +			      xstats->xmit_packets, IFLA_XDP_XSTATS_UNSPEC) ||
> +	    nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_XMIT_BYTES,
> +			      xstats->xmit_bytes, IFLA_XDP_XSTATS_UNSPEC) ||
> +	    nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_XMIT_ERRORS,
> +			      xstats->xmit_errors, IFLA_XDP_XSTATS_UNSPEC) ||
> +	    nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_XMIT_FULL,
> +			      xstats->xmit_full, IFLA_XDP_XSTATS_UNSPEC))
> +		return false;
> +
> +	return true;
> +}
> +

Another thought on this patch: with individual attributes you could save
some overhead by not sending 0 counters to userspace. e.g., define a
helper that does:

/* Like nla_put_u64_64bit(), but skips zero-valued counters entirely. */
static inline int nla_put_u64_if_set(struct sk_buff *skb, int attrtype,
                                     u64 value, int padattr)
{
	if (value)
		return nla_put_u64_64bit(skb, attrtype, value, padattr);

	return 0;
}
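
For illustration, the dump helper from the quoted patch could then shrink to
something like this (a sketch; only the first few attributes shown, the rest
follow the same pattern):

	if (nla_put_u64_if_set(skb, IFLA_XDP_XSTATS_PACKETS, xstats->packets,
			       IFLA_XDP_XSTATS_UNSPEC) ||
	    nla_put_u64_if_set(skb, IFLA_XDP_XSTATS_BYTES, xstats->bytes,
			       IFLA_XDP_XSTATS_UNSPEC) ||
	    nla_put_u64_if_set(skb, IFLA_XDP_XSTATS_ERRORS, xstats->errors,
			       IFLA_XDP_XSTATS_UNSPEC))
		return false;
	/* ...and so on for the remaining counters */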

* Re: [PATCH v2 net-next 00/26] net: introduce and use generic XDP stats
       [not found] ` <20211130155612.594688-1-alexandr.lobakin@intel.com>
@ 2021-11-30 16:17   ` Toke Høiland-Jørgensen
       [not found]     ` <20211130090716.4a557036@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
       [not found]   ` <20211130081207.228f42ba@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
  2021-11-30 17:45   ` David Ahern
  2 siblings, 1 reply; 17+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-11-30 16:17 UTC (permalink / raw)
  To: Alexander Lobakin, David S. Miller, Jakub Kicinski
  Cc: Song Liu, Sergey Ryazanov, Michael S. Tsirkin, Vladimir Oltean,
	Alexei Starovoitov, Russell King, Andrei Vagin, Tony Nguyen,
	Thomas Petazzoni, Ioana Ciornei, Noam Dagan, Daniel Borkmann,
	Jonathan Corbet, linux-rdma, linux-doc, John Fastabend,
	Arthur Kiyanovski, Cong Wang, Martin Habets, Lorenzo Bianconi,
	Maciej Fijalkowski, Jesper Dangaard Brouer, Johannes Berg,
	KP Singh, Andrii Nakryiko, Claudiu Manoil, Alexander Lobakin,
	Yonghong Song, Shay Agroskin, Marcin Wojtas, Leon Romanovsky,
	David Arinzon, David Ahern, virtualization, linux-kernel,
	Edward Cree, Yajun Deng, netdev, Saeed Bishara,
	Michal Swiatkowski, bpf, Saeed Mahameed, Martin KaFai Lau

Alexander Lobakin <alexandr.lobakin@intel.com> writes:

> From: Alexander Lobakin <alexandr.lobakin@intel.com>
> Date: Tue, 23 Nov 2021 17:39:29 +0100
>
> Ok, open questions:
>
> 1. Channels vs queues vs global.
>
> Jakub: no per-channel.
> David (Ahern): it's worth it to separate as Rx/Tx.
> Toke is fine with globals at the end I think?

Well, I don't like throwing data away, so in that sense I do like
per-queue stats, but it's not a very strong preference (i.e., I can live
with either)...

> My point was that for most of the systems we have 1:1 Rx:Tx
> (usually num_online_cpus()), so asking drivers separately for
> the number of RQs and then SQs would end up asking for the same
> number twice.
> But the main reason TBH was that most of the drivers store stats
> on a per-channel basis and I didn't want them to regress in
> functionality. I'm fine with reporting only netdev-wide if
> everyone else is.
>
> In case we keep per-channel: report per-channel stats only on request,
> and cumulative globals by default so as not to flood the output?

... however if we do go with per-channel stats I do agree that they
shouldn't be in the default output. I guess netlink could still split
them out and iproute2 could just sum them before display?
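
A userspace-side sketch of that summing, reusing the per-channel
struct ifla_xdp_stats layout from the quoted rtnetlink patch (only a few
fields shown; the function name and the header location are assumptions):

#include <string.h>
#include <linux/if_link.h>      /* struct ifla_xdp_stats, per the proposed uAPI */

/* Sum per-channel counters into one netdev-wide view before display. */
static void xdp_stats_sum(const struct ifla_xdp_stats *per_ch, int n_ch,
                          struct ifla_xdp_stats *total)
{
        int i;

        memset(total, 0, sizeof(*total));
        for (i = 0; i < n_ch; i++) {
                total->packets  += per_ch[i].packets;
                total->bytes    += per_ch[i].bytes;
                total->errors   += per_ch[i].errors;
                total->drop     += per_ch[i].drop;
                total->redirect += per_ch[i].redirect;
                /* ...and so on for the remaining counters */
        }
}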

> 2. Count all errors as "drops" vs separately.
>
> Daniel: account everything as drops, plus errors should be
> reported as exceptions to the tracing subsystem.
> Jesper: we shouldn't mix drops and errors.
>
> My point: we shouldn't, and that's why there are patches for 2 drivers
> to give errors a separate counter.
> I provided an option either to report all errors together ('errors'
> in the stats structure) or to provide individual counters for each of
> them (correspondingly named counters), but I personally prefer detailed
> errors. However, they might "go detailed" under trace_xdp_exception() only,
> that sounds fine (OTOH in RTNL stats we have both "general" errors and
> detailed error counters).

I agree it would be nice to have a separate error counter, but a single
counter is enough when combined with the tracepoints.

> 3. XDP and XSK ctrs separately or not.
>
> My PoV is that those are two quite different worlds.
> However, stats for actions on XSK make little sense, since
> 99% of the time we have an xskmap redirect. So I think it'd be fine to just
> expand the stats structure with xsk_{rx,tx}_{packets,bytes} and count
> the rest (actions, errors) together with XDP.

A whole set of separate counters for XSK is certainly overkill. No
strong preference as to whether they need a separate counter at all...

> Rest:
>  - don't create a separate `ip` command and report under `-s`;
>  - save some RTNL skb space by skipping zeroed counters.
>
> Also, regarding the fact that I count everything on the stack and then add
> to the storage once per polling cycle -- most drivers don't do that and
> just increment the values in the storage directly, but that can be
> less performant for frequently updated stats (or maybe it's just my
> embedded past).
> Re u64 vs u64_stats_t -- the latter is more universal and
> architecture-friendly; the former is used directly in most of the
> drivers primarily because those drivers and the corresponding HW
> run on 64-bit systems in the vast majority of cases, and
> Ethtool stats themselves are not critical enough to need tear
> protection. Anyway, local64_t is cheap on ARM64/x86_64 I guess?

I'm generally a fan of correctness first, so since you're touching all
the drivers anyway, I'd say go for u64_stats_t :)

-Toke
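
For completeness, a minimal sketch of what the u64_stats_t variant looks like
on both sides. The u64_stats_* API is the real one from
<linux/u64_stats_sync.h> (the syncp also needs u64_stats_init() at setup
time); the example_* struct and function names are made up:

#include <linux/u64_stats_sync.h>

struct example_xdp_rx_stats {
        u64_stats_t             packets;
        u64_stats_t             bytes;
        struct u64_stats_sync   syncp;
};

/* Writer: called once per NAPI poll with the on-stack totals. */
static void example_update_rx_stats(struct example_xdp_rx_stats *s,
                                    u32 packets, u32 bytes)
{
        u64_stats_update_begin(&s->syncp);
        u64_stats_add(&s->packets, packets);
        u64_stats_add(&s->bytes, bytes);
        u64_stats_update_end(&s->syncp);
}

/* Reader: retries if it raced with a writer (only matters on 32-bit). */
static void example_read_rx_stats(const struct example_xdp_rx_stats *s,
                                  u64 *packets, u64 *bytes)
{
        unsigned int start;

        do {
                start = u64_stats_fetch_begin(&s->syncp);
                *packets = u64_stats_read(&s->packets);
                *bytes = u64_stats_read(&s->bytes);
        } while (u64_stats_fetch_retry(&s->syncp, start));
}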


* Re: [PATCH v2 net-next 00/26] net: introduce and use generic XDP stats
       [not found]       ` <20211130090449.58a8327d@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
@ 2021-11-30 17:38         ` David Ahern
  2021-12-01 15:21           ` Jamal Hadi Salim
  0 siblings, 1 reply; 17+ messages in thread
From: David Ahern @ 2021-11-30 17:38 UTC (permalink / raw)
  To: Jakub Kicinski, Alexander Lobakin
  Cc: Song Liu, Sergey Ryazanov, Michael S. Tsirkin, Vladimir Oltean,
	Alexei Starovoitov, Russell King, Andrei Vagin, Tony Nguyen,
	Thomas Petazzoni, Ioana Ciornei, Arthur Kiyanovski,
	Daniel Borkmann, Jonathan Corbet, linux-rdma, linux-doc,
	John Fastabend, Noam Dagan, Cong Wang, Martin Habets,
	Lorenzo Bianconi, Maciej Fijalkowski, Jesper Dangaard Brouer,
	Johannes Berg, KP Singh, Andrii Nakryiko, Claudiu Manoil,
	Yonghong Song, Shay Agroskin, Marcin Wojtas, Leon Romanovsky,
	David Arinzon, David Ahern, Toke Høiland-Jørgensen,
	virtualization, linux-kernel, Martin KaFai Lau, Edward Cree,
	Yajun Deng, netdev, Saeed Bishara, Michal Swiatkowski, bpf,
	Saeed Mahameed, David S. Miller

On 11/30/21 10:04 AM, Jakub Kicinski wrote:
> On Tue, 30 Nov 2021 17:34:54 +0100 Alexander Lobakin wrote:
>>> Another thought on this patch: with individual attributes you could save
>>> some overhead by not sending 0 counters to userspace. e.g., define a
>>> helper that does:  
>>
>> I know about ETHTOOL_STAT_NOT_SET, but RTNL xstats doesn't use this,
>> does it?
> 
> Not sure if you're asking me or Dave but no, to my knowledge RTNL does
> not use such semantics today. But the reason is mostly because there
> weren't many driver stats added there. Knowing whether an error counter is
> not supported, or supported and simply 0, is important for monitoring. Even
> if XDP stats don't have a counter that may be unsupported today, it's
> not a good precedent to set IMO.
> 

Today, stats are sent as a struct, so skipping stats whose value is 0 is
not an option. When using individual attributes for the counters, this
becomes an option. Given there is no value in sending '0', why do it?

Is your pushback that there should be a uapi to opt in to this behavior?

* Re: [PATCH v2 net-next 00/26] net: introduce and use generic XDP stats
       [not found] ` <20211130155612.594688-1-alexandr.lobakin@intel.com>
  2021-11-30 16:17   ` [PATCH v2 net-next 00/26] net: introduce and use generic XDP stats Toke Høiland-Jørgensen
       [not found]   ` <20211130081207.228f42ba@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
@ 2021-11-30 17:45   ` David Ahern
  2 siblings, 0 replies; 17+ messages in thread
From: David Ahern @ 2021-11-30 17:45 UTC (permalink / raw)
  To: Alexander Lobakin, David S. Miller, Jakub Kicinski
  Cc: Song Liu, Sergey Ryazanov, Michael S. Tsirkin, Vladimir Oltean,
	Alexei Starovoitov, Andrei Vagin, Tony Nguyen, Thomas Petazzoni,
	Ioana Ciornei, Noam Dagan, Daniel Borkmann, Jonathan Corbet,
	linux-rdma, linux-doc, John Fastabend, Russell King,
	Arthur Kiyanovski, Cong Wang, Martin Habets, Lorenzo Bianconi,
	Maciej Fijalkowski, Jesper Dangaard Brouer, Johannes Berg,
	KP Singh, Andrii Nakryiko, Claudiu Manoil, Yonghong Song,
	Shay Agroskin, Marcin Wojtas, Leon Romanovsky, David Arinzon,
	David Ahern, Toke Høiland-Jørgensen, virtualization,
	linux-kernel, Edward Cree, Yajun Deng, netdev, Saeed Bishara,
	Michal Swiatkowski, bpf, Saeed Mahameed, Martin KaFai Lau

On 11/30/21 8:56 AM, Alexander Lobakin wrote:
> Rest:
>  - don't create a separate `ip` command and report under `-s`;

Reporting XDP stats under 'ip -s' is not going to be scalable from a
readability perspective.

ifstat (misc/ifstat.c) has support for extended stats, which is where you
are adding these.

* Re: [PATCH v2 net-next 00/26] net: introduce and use generic XDP stats
       [not found]     ` <20211130090716.4a557036@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
@ 2021-11-30 17:56       ` David Ahern
  0 siblings, 0 replies; 17+ messages in thread
From: David Ahern @ 2021-11-30 17:56 UTC (permalink / raw)
  To: Jakub Kicinski, Toke Høiland-Jørgensen
  Cc: Song Liu, Sergey Ryazanov, Michael S. Tsirkin, Vladimir Oltean,
	Alexei Starovoitov, Russell King, Andrei Vagin, Tony Nguyen,
	Thomas Petazzoni, Ioana Ciornei, Arthur Kiyanovski,
	Daniel Borkmann, Jonathan Corbet, linux-rdma, linux-doc,
	John Fastabend, Noam Dagan, Cong Wang, Martin Habets,
	Lorenzo Bianconi, Maciej Fijalkowski, Jesper Dangaard Brouer,
	Johannes Berg, KP Singh, Andrii Nakryiko, Claudiu Manoil,
	Alexander Lobakin, Yonghong Song, Shay Agroskin, Marcin Wojtas,
	Leon Romanovsky, David Arinzon, David Ahern, virtualization,
	linux-kernel, Martin KaFai Lau, Edward Cree, Yajun Deng, netdev,
	Saeed Bishara, Michal Swiatkowski, bpf, Saeed Mahameed,
	David S. Miller

On 11/30/21 10:07 AM, Jakub Kicinski wrote:
> On Tue, 30 Nov 2021 17:17:24 +0100 Toke Høiland-Jørgensen wrote:
>>> 1. Channels vs queues vs global.
>>>
>>> Jakub: no per-channel.
>>> David (Ahern): it's worth it to separate as Rx/Tx.
>>> Toke is fine with globals at the end I think?  
>>
>> Well, I don't like throwing data away, so in that sense I do like
>> per-queue stats, but it's not a very strong preference (i.e., I can live
>> with either)...
> 
> We don't even have a clear definition of a queue in Linux.
> 

The summary above says "Jakub: no per-channel", and then you have this
comment about a clear definition of a queue. What is your preference
here, Jakub? I think I have gotten lost in all of the comments.

My request was just to not lump Rx and Tx together under a 'channel'
definition as a new API. Proposals like zctap and 'queues as a first-class
citizen' are examples of intentions / desires to move towards Rx
and Tx queues beyond what exists today.

* Re: [PATCH v2 net-next 00/26] net: introduce and use generic XDP stats
  2021-11-30 17:38         ` David Ahern
@ 2021-12-01 15:21           ` Jamal Hadi Salim
  0 siblings, 0 replies; 17+ messages in thread
From: Jamal Hadi Salim @ 2021-12-01 15:21 UTC (permalink / raw)
  To: David Ahern, Jakub Kicinski, Alexander Lobakin
  Cc: Song Liu, Sergey Ryazanov, Michael S. Tsirkin, Vladimir Oltean,
	Alexei Starovoitov, Russell King, Andrei Vagin, Tony Nguyen,
	Thomas Petazzoni, Ioana Ciornei, Arthur Kiyanovski,
	Daniel Borkmann, Jonathan Corbet, linux-rdma, linux-doc,
	John Fastabend, Noam Dagan, Cong Wang, Martin Habets,
	Lorenzo Bianconi, Maciej Fijalkowski, Jesper Dangaard Brouer,
	Johannes Berg, KP Singh, Andrii Nakryiko, Claudiu Manoil,
	Yonghong Song, Shay Agroskin, Marcin Wojtas, Leon Romanovsky,
	David Arinzon, David Ahern, Toke Høiland-Jørgensen,
	virtualization, linux-kernel, Martin KaFai Lau, Edward Cree,
	Yajun Deng, netdev, Saeed Bishara, Michal Swiatkowski, bpf,
	Saeed Mahameed, David S. Miller

On 2021-11-30 12:38, David Ahern wrote:
> Today, stats are sent as a struct so skipping stats whose value is 0 is
> not an option. When using individual attributes for the counters this
> becomes an option. Given there is no value in sending '0' why do it?
> 
> Is your pushback that there should be a uapi to opt-in to this behavior?

A filter in the netlink request should help pick what is user-preferred.
You can default to not sending zeros.

cheers,
jamal
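
To make that concrete, one possible shape, sketched with an explicitly
hypothetical opt-in flag (the series currently defines no such request
attribute; only nla_put_u64_64bit() is a real helper here):

/* 'include_zeroes' would come from a (hypothetical) flag attribute parsed
 * out of the RTM_GETSTATS request; the default stays "skip zeros".
 */
static int nla_put_u64_filtered(struct sk_buff *skb, int attrtype, u64 value,
                                int padattr, bool include_zeroes)
{
        if (value || include_zeroes)
                return nla_put_u64_64bit(skb, attrtype, value, padattr);

        return 0;
}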

end of thread (newest: 2021-12-01 15:21 UTC)

Thread overview: 17+ messages
     [not found] <20211123163955.154512-1-alexandr.lobakin@intel.com>
     [not found] ` <20211123163955.154512-22-alexandr.lobakin@intel.com>
2021-11-24  0:52   ` [PATCH v2 net-next 21/26] ice: add XDP and XSK generic per-channel statistics Daniel Borkmann
2021-11-25 11:56     ` Toke Høiland-Jørgensen
     [not found]       ` <20211125170708.127323-1-alexandr.lobakin@intel.com>
     [not found]         ` <20211125094440.6c402d63@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
     [not found]           ` <20211125204007.133064-1-alexandr.lobakin@intel.com>
2021-11-26 12:30             ` Toke Høiland-Jørgensen
     [not found]               ` <20211126100611.514df099@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
2021-11-26 18:47                 ` Toke Høiland-Jørgensen
     [not found]                   ` <20211126111431.4a2ed007@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
2021-11-28 17:54                     ` Ido Schimmel
2021-11-26 22:27                 ` Daniel Borkmann
2021-11-29 11:51                   ` Toke Høiland-Jørgensen
     [not found] ` <20211123163955.154512-9-alexandr.lobakin@intel.com>
2021-11-24 11:33   ` [PATCH v2 net-next 08/26] mvpp2: provide .ndo_get_xdp_stats() callback Russell King (Oracle)
2021-11-24 11:36   ` Russell King (Oracle)
     [not found] ` <20211123163955.154512-8-alexandr.lobakin@intel.com>
2021-11-24 11:39   ` [PATCH v2 net-next 07/26] mvneta: add " Russell King (Oracle)
2021-11-28 22:23 ` [PATCH v2 net-next 00/26] net: introduce and use generic XDP stats David Ahern
     [not found] ` <20211123163955.154512-2-alexandr.lobakin@intel.com>
2021-11-30  2:36   ` [PATCH v2 net-next 01/26] rtnetlink: introduce generic XDP statistics David Ahern
     [not found] ` <20211130155612.594688-1-alexandr.lobakin@intel.com>
2021-11-30 16:17   ` [PATCH v2 net-next 00/26] net: introduce and use generic XDP stats Toke Høiland-Jørgensen
     [not found]     ` <20211130090716.4a557036@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
2021-11-30 17:56       ` David Ahern
     [not found]   ` <20211130081207.228f42ba@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
     [not found]     ` <20211130163454.595897-1-alexandr.lobakin@intel.com>
     [not found]       ` <20211130090449.58a8327d@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
2021-11-30 17:38         ` David Ahern
2021-12-01 15:21           ` Jamal Hadi Salim
2021-11-30 17:45   ` David Ahern
