From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Daniel Borkmann <daniel@iogearbox.net>,
Alexander Lobakin <alexandr.lobakin@intel.com>,
"David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>
Cc: Song Liu <songliubraving@fb.com>,
Sergey Ryazanov <ryazanov.s.a@gmail.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Vladimir Oltean <vladimir.oltean@nxp.com>,
Alexei Starovoitov <ast@kernel.org>,
Andrei Vagin <avagin@gmail.com>,
Tony Nguyen <anthony.l.nguyen@intel.com>,
Thomas Petazzoni <thomas.petazzoni@bootlin.com>,
Ioana Ciornei <ioana.ciornei@nxp.com>,
Noam Dagan <ndagan@amazon.com>, Leon Romanovsky <leon@kernel.org>,
Jonathan Corbet <corbet@lwn.net>,
linux-rdma@vger.kernel.org, linux-doc@vger.kernel.org,
John Fastabend <john.fastabend@gmail.com>,
Russell King <linux@armlinux.org.uk>,
Arthur Kiyanovski <akiyano@amazon.com>,
Cong Wang <cong.wang@bytedance.com>,
Martin Habets <habetsm.xilinx@gmail.com>,
Lorenzo Bianconi <lorenzo@kernel.org>,
Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Johannes Berg <johannes.berg@intel.com>,
KP Singh <kpsingh@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Claudiu Manoil <claudiu.manoil@nxp.com>,
Yonghong Song <yhs@fb.com>, Shay Agroskin <shayagr@amazon.com>,
Marcin Wojtas <mw@semihalf.com>,
David Arinzon <darinzon@amazon.com>,
David Ahern <dsahern@kernel.org>,
virtualization@lists.linux-foundation.org,
linux-kernel@vger.kernel.org,
Edward Cree <ecree.xilinx@gmail.com>,
Yajun Deng <yajun.deng@linux.dev>,
netdev@vger.kernel.org, Saeed Bishara <saeedb@amazon.com>,
Michal Swiatkowski <michal.swiatkowski@linux.intel.com>,
bpf@vger.kernel.org, Saeed Mahameed <saeedm@nvidia.com>,
Martin KaFai Lau <kafai@fb.com>
Subject: Re: [PATCH v2 net-next 21/26] ice: add XDP and XSK generic per-channel statistics
Date: Thu, 25 Nov 2021 12:56:01 +0100 [thread overview]
Message-ID: <87bl28bga6.fsf@toke.dk> (raw)
In-Reply-To: <77407c26-4e32-232c-58e0-2d601d781f84@iogearbox.net>
Daniel Borkmann <daniel@iogearbox.net> writes:
> Hi Alexander,
>
> On 11/23/21 5:39 PM, Alexander Lobakin wrote:
> [...]
>
> Just commenting on ice here as one example (similar applies to other drivers):
>
>> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
>> index 1dd7e84f41f8..7dc287bc3a1a 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
>> @@ -258,6 +258,8 @@ static void ice_clean_xdp_irq(struct ice_tx_ring *xdp_ring)
>> xdp_ring->next_dd = ICE_TX_THRESH - 1;
>> xdp_ring->next_to_clean = ntc;
>> ice_update_tx_ring_stats(xdp_ring, total_pkts, total_bytes);
>> + xdp_update_tx_drv_stats(&xdp_ring->xdp_stats->xdp_tx, total_pkts,
>> + total_bytes);
>> }
>>
>> /**
>> @@ -277,6 +279,7 @@ int ice_xmit_xdp_ring(void *data, u16 size, struct ice_tx_ring *xdp_ring)
>> ice_clean_xdp_irq(xdp_ring);
>>
>> if (!unlikely(ICE_DESC_UNUSED(xdp_ring))) {
>> + xdp_update_tx_drv_full(&xdp_ring->xdp_stats->xdp_tx);
>> xdp_ring->tx_stats.tx_busy++;
>> return ICE_XDP_CONSUMED;
>> }
>> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
>> index ff55cb415b11..62ef47a38d93 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
>> @@ -454,42 +454,58 @@ ice_construct_skb_zc(struct ice_rx_ring *rx_ring, struct xdp_buff **xdp_arr)
>> * @xdp: xdp_buff used as input to the XDP program
>> * @xdp_prog: XDP program to run
>> * @xdp_ring: ring to be used for XDP_TX action
>> + * @lrstats: onstack Rx XDP stats
>> *
>> * Returns any of ICE_XDP_{PASS, CONSUMED, TX, REDIR}
>> */
>> static int
>> ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
>> - struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring)
>> + struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring,
>> + struct xdp_rx_drv_stats_local *lrstats)
>> {
>> int err, result = ICE_XDP_PASS;
>> u32 act;
>>
>> + lrstats->bytes += xdp->data_end - xdp->data;
>> + lrstats->packets++;
>> +
>> act = bpf_prog_run_xdp(xdp_prog, xdp);
>>
>> if (likely(act == XDP_REDIRECT)) {
>> err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
>> - if (err)
>> + if (err) {
>> + lrstats->redirect_errors++;
>> goto out_failure;
>> + }
>> + lrstats->redirect++;
>> return ICE_XDP_REDIR;
>> }
>>
>> switch (act) {
>> case XDP_PASS:
>> + lrstats->pass++;
>> break;
>> case XDP_TX:
>> result = ice_xmit_xdp_buff(xdp, xdp_ring);
>> - if (result == ICE_XDP_CONSUMED)
>> + if (result == ICE_XDP_CONSUMED) {
>> + lrstats->tx_errors++;
>> goto out_failure;
>> + }
>> + lrstats->tx++;
>> break;
>> default:
>> bpf_warn_invalid_xdp_action(act);
>> - fallthrough;
>> + lrstats->invalid++;
>> + goto out_failure;
>> case XDP_ABORTED:
>> + lrstats->aborted++;
>> out_failure:
>> trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
>> - fallthrough;
>> + result = ICE_XDP_CONSUMED;
>> + break;
>> case XDP_DROP:
>> result = ICE_XDP_CONSUMED;
>> + lrstats->drop++;
>> break;
>> }
>
> Imho, the overall approach is way too bloated. I can see the
> packets/bytes but now we have 3 counter updates with return codes
> included and then the additional sync of the on-stack counters into
> the ring counters via xdp_update_rx_drv_stats(). So we now need
> ice_update_rx_ring_stats() as well as xdp_update_rx_drv_stats() which
> syncs 10 different stat counters via u64_stats_add() into the per ring
> ones. :/
>
> I'm just taking our XDP L4LB in Cilium as an example: there we already
> count errors and export them via per-cpu map that eventually lead to
> XDP_DROP cases including the /reason/ which caused the XDP_DROP (e.g.
> Prometheus can then scrape these insights from all the nodes in the
> cluster). Given the different action codes are very often application
> specific, there's not much debugging that you can do when /only/
> looking at `ip link xdpstats` to gather insight on *why* some of these
> actions were triggered (e.g. fib lookup failure, etc). If really of
> interest, then maybe libxdp could have such per-action counters as
> opt-in in its call chain..
To me, standardising these counters is less about helping people debug
their XDP programs (as you say, you can put your own telemetry into
those), and more about making XDP less "mystical" to the system
administrator (who may not be the same person who wrote the XDP
programs). So at the very least, they need to indicate "where are the
packets going", which means at least counters for DROP, REDIRECT and TX
(+ errors for tx/redirect) in addition to the "processed by XDP" initial
counter. Which in the above means 'pass', 'invalid' and 'aborted' could
be dropped, I guess; but I don't mind terribly keeping them either given
that there's no measurable performance impact.
> But then it also seems like above in ice_xmit_xdp_ring() we now need
> to bump counters twice just for sake of ethtool vs xdp counters which
> sucks a bit, would be nice to only having to do it once:
This I agree with, and while I can see the layering argument for putting
them into 'ip' and rtnetlink instead of ethtool, I also worry that these
counters will simply be lost in obscurity, so I do wonder if it wouldn't
be better to accept the "layering violation" and keeping them all in the
'ethtool -S' output?
[...]
> + xdp-channel0-rx_xdp_redirect: 7
> + xdp-channel0-rx_xdp_redirect_errors: 8
> + xdp-channel0-rx_xdp_tx: 9
> + xdp-channel0-rx_xdp_tx_errors: 10
> + xdp-channel0-tx_xdp_xmit_packets: 11
> + xdp-channel0-tx_xdp_xmit_bytes: 12
> + xdp-channel0-tx_xdp_xmit_errors: 13
> + xdp-channel0-tx_xdp_xmit_full: 14
>
> From a user PoV to avoid confusion, maybe should be made more clear that the latter refers
> to xsk.
+1, these should probably be xdp-channel0-tx_xsk_* or something like
that...
-Toke
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
next prev parent reply other threads:[~2021-11-25 11:56 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20211123163955.154512-1-alexandr.lobakin@intel.com>
[not found] ` <20211123163955.154512-22-alexandr.lobakin@intel.com>
2021-11-24 0:52 ` [PATCH v2 net-next 21/26] ice: add XDP and XSK generic per-channel statistics Daniel Borkmann
2021-11-25 11:56 ` Toke Høiland-Jørgensen [this message]
[not found] ` <20211125170708.127323-1-alexandr.lobakin@intel.com>
[not found] ` <20211125094440.6c402d63@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
[not found] ` <20211125204007.133064-1-alexandr.lobakin@intel.com>
2021-11-26 12:30 ` Toke Høiland-Jørgensen
[not found] ` <20211126100611.514df099@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
2021-11-26 18:47 ` Toke Høiland-Jørgensen
[not found] ` <20211126111431.4a2ed007@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
2021-11-28 17:54 ` Ido Schimmel
2021-11-26 22:27 ` Daniel Borkmann
2021-11-29 11:51 ` Toke Høiland-Jørgensen
[not found] ` <20211123163955.154512-9-alexandr.lobakin@intel.com>
2021-11-24 11:33 ` [PATCH v2 net-next 08/26] mvpp2: provide .ndo_get_xdp_stats() callback Russell King (Oracle)
2021-11-24 11:36 ` Russell King (Oracle)
[not found] ` <20211123163955.154512-8-alexandr.lobakin@intel.com>
2021-11-24 11:39 ` [PATCH v2 net-next 07/26] mvneta: add " Russell King (Oracle)
2021-11-28 22:23 ` [PATCH v2 net-next 00/26] net: introduce and use generic XDP stats David Ahern
[not found] ` <20211123163955.154512-2-alexandr.lobakin@intel.com>
2021-11-30 2:36 ` [PATCH v2 net-next 01/26] rtnetlink: introduce generic XDP statistics David Ahern
[not found] ` <20211130155612.594688-1-alexandr.lobakin@intel.com>
2021-11-30 16:17 ` [PATCH v2 net-next 00/26] net: introduce and use generic XDP stats Toke Høiland-Jørgensen
[not found] ` <20211130090716.4a557036@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
2021-11-30 17:56 ` David Ahern
[not found] ` <20211130081207.228f42ba@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
[not found] ` <20211130163454.595897-1-alexandr.lobakin@intel.com>
[not found] ` <20211130090449.58a8327d@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
2021-11-30 17:38 ` David Ahern
2021-12-01 15:21 ` Jamal Hadi Salim
2021-11-30 17:45 ` David Ahern
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87bl28bga6.fsf@toke.dk \
--to=toke@redhat.com \
--cc=akiyano@amazon.com \
--cc=alexandr.lobakin@intel.com \
--cc=andrii@kernel.org \
--cc=anthony.l.nguyen@intel.com \
--cc=ast@kernel.org \
--cc=avagin@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=claudiu.manoil@nxp.com \
--cc=cong.wang@bytedance.com \
--cc=corbet@lwn.net \
--cc=daniel@iogearbox.net \
--cc=darinzon@amazon.com \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=ecree.xilinx@gmail.com \
--cc=habetsm.xilinx@gmail.com \
--cc=hawk@kernel.org \
--cc=ioana.ciornei@nxp.com \
--cc=johannes.berg@intel.com \
--cc=john.fastabend@gmail.com \
--cc=kafai@fb.com \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=leon@kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=linux@armlinux.org.uk \
--cc=lorenzo@kernel.org \
--cc=maciej.fijalkowski@intel.com \
--cc=michal.swiatkowski@linux.intel.com \
--cc=mst@redhat.com \
--cc=mw@semihalf.com \
--cc=ndagan@amazon.com \
--cc=netdev@vger.kernel.org \
--cc=ryazanov.s.a@gmail.com \
--cc=saeedb@amazon.com \
--cc=saeedm@nvidia.com \
--cc=shayagr@amazon.com \
--cc=songliubraving@fb.com \
--cc=thomas.petazzoni@bootlin.com \
--cc=virtualization@lists.linux-foundation.org \
--cc=vladimir.oltean@nxp.com \
--cc=yajun.deng@linux.dev \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).