[PATCH bpf 2/6] veth: Use tstats per-CPU traffic counters

From: Daniel Borkmann <daniel@iogearbox.net>
To: martin.lau@kernel.org
Cc: kuba@kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org,
	Peilin Ye <peilin.ye@bytedance.com>,
	Daniel Borkmann <daniel@iogearbox.net>
Subject: [PATCH bpf 2/6] veth: Use tstats per-CPU traffic counters
Date: Fri,  3 Nov 2023 23:27:44 +0100
Message-ID: <20231103222748.12551-3-daniel@iogearbox.net>
In-Reply-To: <20231103222748.12551-1-daniel@iogearbox.net>

From: Peilin Ye <peilin.ye@bytedance.com>

Currently veth devices use the @lstats per-CPU traffic counters, which only
cover TX traffic.  veth_get_stats64() actually populates RX stats of a veth
device from its peer's TX counters, based on the assumption that a veth
device can _only_ receive packets from its peer.  That assumption no longer
holds.

For example, recent CNIs (like Cilium) can use the bpf_redirect_peer() BPF
helper to redirect traffic from NIC's TC ingress to veth's TC ingress (in
a different netns), skipping veth's peer device. Unfortunately, this kind
of traffic isn't currently accounted for in veth's RX stats.

In preparation for the fix, use @tstats (instead of @lstats) to maintain
both RX and TX counters for each veth device.  We'll use RX counters for
bpf_redirect_peer() traffic, and keep using TX counters for the usual
"peer-to-peer" traffic.  In veth_get_stats64(), calculate RX stats by
_adding_ the device's own RX count to its peer's TX count, in order to cover
both kinds of traffic.

veth_stats_rx() might need a name change (perhaps to "veth_stats_xdp()")
for less confusion, but let's leave it to another patch to keep the fix
minimal.

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 drivers/net/veth.c | 36 ++++++++++++++----------------------
 1 file changed, 14 insertions(+), 22 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 9980517ed8b0..df7a7c21a46d 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -373,7 +373,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 	skb_tx_timestamp(skb);
 	if (likely(veth_forward_skb(rcv, skb, rq, use_napi) == NET_RX_SUCCESS)) {
 		if (!use_napi)
-			dev_lstats_add(dev, length);
+			dev_sw_netstats_tx_add(dev, 1, length);
 		else
 			__veth_xdp_flush(rq);
 	} else {
@@ -387,14 +387,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 	return ret;
 }
 
-static u64 veth_stats_tx(struct net_device *dev, u64 *packets, u64 *bytes)
-{
-	struct veth_priv *priv = netdev_priv(dev);
-
-	dev_lstats_read(dev, packets, bytes);
-	return atomic64_read(&priv->dropped);
-}
-
 static void veth_stats_rx(struct veth_stats *result, struct net_device *dev)
 {
 	struct veth_priv *priv = netdev_priv(dev);
@@ -432,24 +424,24 @@ static void veth_get_stats64(struct net_device *dev,
 	struct veth_priv *priv = netdev_priv(dev);
 	struct net_device *peer;
 	struct veth_stats rx;
-	u64 packets, bytes;
 
-	tot->tx_dropped = veth_stats_tx(dev, &packets, &bytes);
-	tot->tx_bytes = bytes;
-	tot->tx_packets = packets;
+	tot->tx_dropped = atomic64_read(&priv->dropped);
+	dev_fetch_sw_netstats(tot, dev->tstats);
 
 	veth_stats_rx(&rx, dev);
 	tot->tx_dropped += rx.xdp_tx_err;
 	tot->rx_dropped = rx.rx_drops + rx.peer_tq_xdp_xmit_err;
-	tot->rx_bytes = rx.xdp_bytes;
-	tot->rx_packets = rx.xdp_packets;
+	tot->rx_bytes += rx.xdp_bytes;
+	tot->rx_packets += rx.xdp_packets;
 
 	rcu_read_lock();
 	peer = rcu_dereference(priv->peer);
 	if (peer) {
-		veth_stats_tx(peer, &packets, &bytes);
-		tot->rx_bytes += bytes;
-		tot->rx_packets += packets;
+		struct rtnl_link_stats64 tot_peer = {};
+
+		dev_fetch_sw_netstats(&tot_peer, peer->tstats);
+		tot->rx_bytes += tot_peer.tx_bytes;
+		tot->rx_packets += tot_peer.tx_packets;
 
 		veth_stats_rx(&rx, peer);
 		tot->tx_dropped += rx.peer_tq_xdp_xmit_err;
@@ -1508,13 +1500,13 @@ static int veth_dev_init(struct net_device *dev)
 {
 	int err;
 
-	dev->lstats = netdev_alloc_pcpu_stats(struct pcpu_lstats);
-	if (!dev->lstats)
+	dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
+	if (!dev->tstats)
 		return -ENOMEM;
 
 	err = veth_alloc_queues(dev);
 	if (err) {
-		free_percpu(dev->lstats);
+		free_percpu(dev->tstats);
 		return err;
 	}
 
@@ -1524,7 +1516,7 @@ static int veth_dev_init(struct net_device *dev)
 static void veth_dev_free(struct net_device *dev)
 {
 	veth_free_queues(dev);
-	free_percpu(dev->lstats);
+	free_percpu(dev->tstats);
 }
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
-- 
2.34.1