public inbox for linux-arm-kernel@lists.infradead.org
* [PATCH RFC net-next 0/4] improve hw flow offload byte accounting
@ 2026-04-09 13:07 Daniel Golle
  2026-04-09 13:07 ` [PATCH RFC net-next 1/4] net: flow_offload: let drivers report byte counter semantics Daniel Golle
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Daniel Golle @ 2026-04-09 13:07 UTC (permalink / raw)
  To: Felix Fietkau, John Crispin, Lorenzo Bianconi, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Matthias Brugger, AngeloGioacchino Del Regno, Simon Horman,
	Pablo Neira Ayuso, Florian Westphal, Phil Sutter, netdev,
	linux-kernel, linux-arm-kernel, linux-mediatek, netfilter-devel,
	coreteam

Hardware flow counters report raw byte counts whose semantics
vary by vendor -- some count ingress L2 frames, others egress
L2, others L3. The nf_flow_table framework currently passes
these bytes straight to conntrack without conversion, and
sub-interfaces (VLAN, PPPoE) that are bypassed by hw offload
never see any counter updates at all.

This series lets drivers declare what their counters represent,
so the framework can normalize to L3 for conntrack and
propagate per-layer stats to encap sub-interfaces.
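
As an illustrative sketch (plain userspace C, not kernel code; constants
and direction handling are simplified -- the actual series selects the
tuple matching the counter's direction), the normalization boils down to
subtracting per-packet L2 overhead:

```c
#include <stdint.h>

/* simplified stand-ins for the proposed byte_type enum */
enum byte_type { BYTES_L3, BYTES_INGRESS_L2, BYTES_EGRESS_L2 };

#define ETH_HLEN	14	/* Ethernet header */
#define VLAN_HLEN	4	/* one 802.1Q tag */

/* per-packet L2 overhead for a flow carrying num_vlans VLAN tags */
static uint64_t l2_overhead(int num_vlans)
{
	return ETH_HLEN + (uint64_t)num_vlans * VLAN_HLEN;
}

/* derive L3 bytes from whatever the hardware counted */
static uint64_t l3_bytes(enum byte_type type, uint64_t bytes,
			 uint64_t pkts, int num_vlans)
{
	if (type == BYTES_L3)
		return bytes;	/* already L3, no conversion needed */
	return bytes - pkts * l2_overhead(num_vlans);
}
```

For example, 10 single-tagged frames totalling 15180 L2 bytes reported
as INGRESS_L2 normalize to 15180 - 10 * 18 = 15000 L3 bytes.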

Questions:
 - The sub-interface stats update accesses vlan_dev_priv()
   directly -- should there be a generic netdev callback instead?
 - Are there hw offload drivers whose counters do not fit the
   ingress-L2 / egress-L2 / L3 model?

Daniel Golle (4):
  net: flow_offload: let drivers report byte counter semantics
  nf_flow_table: track sub-interface and bridge ifindex in flow tuple
  nf_flow_table: convert hw byte counts and update sub-interface stats
  net: ethernet: mtk_eth_soc: report INGRESS_L2 byte_type in flow stats

 .../net/ethernet/mediatek/mtk_ppe_offload.c   |   1 +
 include/net/flow_offload.h                    |   7 +
 include/net/netfilter/nf_flow_table.h         |   5 +
 net/netfilter/nf_flow_table_core.c            |   2 +
 net/netfilter/nf_flow_table_offload.c         | 174 +++++++++++++++++-
 net/netfilter/nf_flow_table_path.c            |   8 +
 6 files changed, 195 insertions(+), 2 deletions(-)

-- 
2.53.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH RFC net-next 1/4] net: flow_offload: let drivers report byte counter semantics
  2026-04-09 13:07 [PATCH RFC net-next 0/4] improve hw flow offload byte accounting Daniel Golle
@ 2026-04-09 13:07 ` Daniel Golle
  2026-04-09 13:07 ` [PATCH RFC net-next 2/4] nf_flow_table: track sub-interface and bridge ifindex in flow tuple Daniel Golle
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Daniel Golle @ 2026-04-09 13:07 UTC (permalink / raw)
  To: Felix Fietkau, John Crispin, Lorenzo Bianconi, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Matthias Brugger, AngeloGioacchino Del Regno, Simon Horman,
	Pablo Neira Ayuso, Florian Westphal, Phil Sutter, netdev,
	linux-kernel, linux-arm-kernel, linux-mediatek, netfilter-devel,
	coreteam

Hardware flow offload engines count bytes at different points --
some report ingress L2 frame bytes, others egress L2, others L3.
Add an enum so drivers can declare what their counters represent.
The framework can then convert to L3 as needed for conntrack.

The default is FLOW_STATS_BYTES_L3 (zero), preserving existing
behaviour for all current drivers.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
 include/net/flow_offload.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 70a02ee143080..7f5ef29b3abce 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -541,12 +541,19 @@ static inline bool flow_rule_match_has_control_flags(const struct flow_rule *rul
 	return flow_rule_has_control_flags(match.mask->flags, extack);
 }
 
+enum flow_stats_byte_type {
+	FLOW_STATS_BYTES_L3 = 0,	/* L3 (inner IP) bytes */
+	FLOW_STATS_BYTES_INGRESS_L2,	/* full ingress L2 frame bytes */
+	FLOW_STATS_BYTES_EGRESS_L2,	/* full egress L2 frame bytes */
+};
+
 struct flow_stats {
 	u64	pkts;
 	u64	bytes;
 	u64	drops;
 	u64	lastused;
 	enum flow_action_hw_stats used_hw_stats;
+	enum flow_stats_byte_type byte_type;
 	bool used_hw_stats_valid;
 };
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH RFC net-next 2/4] nf_flow_table: track sub-interface and bridge ifindex in flow tuple
  2026-04-09 13:07 [PATCH RFC net-next 0/4] improve hw flow offload byte accounting Daniel Golle
  2026-04-09 13:07 ` [PATCH RFC net-next 1/4] net: flow_offload: let drivers report byte counter semantics Daniel Golle
@ 2026-04-09 13:07 ` Daniel Golle
  2026-04-09 13:07 ` [PATCH RFC net-next 3/4] nf_flow_table: convert hw byte counts and update sub-interface stats Daniel Golle
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Daniel Golle @ 2026-04-09 13:07 UTC (permalink / raw)
  To: Felix Fietkau, John Crispin, Lorenzo Bianconi, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Matthias Brugger, AngeloGioacchino Del Regno, Simon Horman,
	Pablo Neira Ayuso, Florian Westphal, Phil Sutter, netdev,
	linux-kernel, linux-arm-kernel, linux-mediatek, netfilter-devel,
	coreteam

Store the net_device ifindex of each encap device, and of any
bridge device encountered during path discovery, so the flow
offload stats path can later update sub-interface (VLAN, PPPoE,
bridge) counters for hw-offloaded flows.

The indices are placed below __hash so they do not affect flow
tuple lookups.
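
For readers unfamiliar with the pattern: a zero-size struct member
acts as an end-of-key marker, since the hash only covers bytes up to
its offset. A compilable sketch (hypothetical, simplified field names,
not the actual flow_offload_tuple layout):

```c
#include <stddef.h>

struct tuple {
	unsigned int	src_ip;		/* key material, hashed */
	unsigned short	src_port;	/* key material, hashed */
	struct { }	__hash;		/* zero-size end-of-key marker */
	int		encap_ifidx;	/* stats-only, never hashed */
};

/* hashing covers only [0, offsetof(__hash)), so members placed
 * after the marker cannot change lookup results */
static const size_t key_len = offsetof(struct tuple, __hash);
```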

No functional change -- the indices are stored but not yet used.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
 include/net/netfilter/nf_flow_table.h | 5 +++++
 net/netfilter/nf_flow_table_core.c    | 2 ++
 net/netfilter/nf_flow_table_path.c    | 8 ++++++++
 3 files changed, 15 insertions(+)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index b09c11c048d51..ec1a18cfd9621 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -148,6 +148,9 @@ struct flow_offload_tuple {
 	/* All members above are keys for lookups, see flow_offload_hash(). */
 	struct { }			__hash;
 
+	int				encap_ifidx[NF_FLOW_TABLE_ENCAP_MAX];
+	int				bridge_ifidx;
+
 	u8				dir:2,
 					xmit_type:3,
 					encap_num:2,
@@ -221,11 +224,13 @@ struct nf_flow_route {
 			struct {
 				u16		id;
 				__be16		proto;
+				int		ifindex;
 			} encap[NF_FLOW_TABLE_ENCAP_MAX];
 			struct flow_offload_tunnel tun;
 			u8			num_encaps:2,
 						num_tuns:2,
 						ingress_vlans:2;
+			int			bridge_ifindex;
 		} in;
 		struct {
 			u32			ifindex;
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 2c4140e6f53c5..9bc8be177b392 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -115,6 +115,7 @@ static int flow_offload_fill_route(struct flow_offload *flow,
 	for (i = route->tuple[dir].in.num_encaps - 1; i >= 0; i--) {
 		flow_tuple->encap[j].id = route->tuple[dir].in.encap[i].id;
 		flow_tuple->encap[j].proto = route->tuple[dir].in.encap[i].proto;
+		flow_tuple->encap_ifidx[j] = route->tuple[dir].in.encap[i].ifindex;
 		if (route->tuple[dir].in.ingress_vlans & BIT(i))
 			flow_tuple->in_vlan_ingress |= BIT(j);
 		j++;
@@ -123,6 +124,7 @@ static int flow_offload_fill_route(struct flow_offload *flow,
 	flow_tuple->tun = route->tuple[dir].in.tun;
 	flow_tuple->encap_num = route->tuple[dir].in.num_encaps;
 	flow_tuple->tun_num = route->tuple[dir].in.num_tuns;
+	flow_tuple->bridge_ifidx = route->tuple[dir].in.bridge_ifindex;
 
 	switch (route->tuple[dir].xmit_type) {
 	case FLOW_OFFLOAD_XMIT_DIRECT:
diff --git a/net/netfilter/nf_flow_table_path.c b/net/netfilter/nf_flow_table_path.c
index 6bb9579dcc2ab..c5817cb96a9f6 100644
--- a/net/netfilter/nf_flow_table_path.c
+++ b/net/netfilter/nf_flow_table_path.c
@@ -79,8 +79,10 @@ struct nft_forward_info {
 	struct id {
 		__u16	id;
 		__be16	proto;
+		int	ifindex;
 	} encap[NF_FLOW_TABLE_ENCAP_MAX];
 	u8 num_encaps;
+	int bridge_ifindex;
 	struct flow_offload_tunnel tun;
 	u8 num_tuns;
 	u8 ingress_vlans;
@@ -136,12 +138,15 @@ static void nft_dev_path_info(const struct net_device_path_stack *stack,
 					path->encap.id;
 				info->encap[info->num_encaps].proto =
 					path->encap.proto;
+				info->encap[info->num_encaps].ifindex =
+					path->dev->ifindex;
 				info->num_encaps++;
 			}
 			if (path->type == DEV_PATH_PPPOE)
 				memcpy(info->h_dest, path->encap.h_dest, ETH_ALEN);
 			break;
 		case DEV_PATH_BRIDGE:
+			info->bridge_ifindex = path->dev->ifindex;
 			if (is_zero_ether_addr(info->h_source))
 				memcpy(info->h_source, path->dev->dev_addr, ETH_ALEN);
 
@@ -156,6 +161,7 @@ static void nft_dev_path_info(const struct net_device_path_stack *stack,
 				}
 				info->encap[info->num_encaps].id = path->bridge.vlan_id;
 				info->encap[info->num_encaps].proto = path->bridge.vlan_proto;
+				info->encap[info->num_encaps].ifindex = path->dev->ifindex;
 				info->num_encaps++;
 				break;
 			case DEV_PATH_BR_VLAN_UNTAG:
@@ -261,6 +267,7 @@ static void nft_dev_forward_path(const struct nft_pktinfo *pkt,
 	for (i = 0; i < info.num_encaps; i++) {
 		route->tuple[!dir].in.encap[i].id = info.encap[i].id;
 		route->tuple[!dir].in.encap[i].proto = info.encap[i].proto;
+		route->tuple[!dir].in.encap[i].ifindex = info.encap[i].ifindex;
 	}
 
 	if (info.num_tuns &&
@@ -273,6 +280,7 @@ static void nft_dev_forward_path(const struct nft_pktinfo *pkt,
 
 	route->tuple[!dir].in.num_encaps = info.num_encaps;
 	route->tuple[!dir].in.ingress_vlans = info.ingress_vlans;
+	route->tuple[!dir].in.bridge_ifindex = info.bridge_ifindex;
 
 	if (info.xmit_type == FLOW_OFFLOAD_XMIT_DIRECT) {
 		memcpy(route->tuple[dir].out.h_source, info.h_source, ETH_ALEN);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH RFC net-next 3/4] nf_flow_table: convert hw byte counts and update sub-interface stats
  2026-04-09 13:07 [PATCH RFC net-next 0/4] improve hw flow offload byte accounting Daniel Golle
  2026-04-09 13:07 ` [PATCH RFC net-next 1/4] net: flow_offload: let drivers report byte counter semantics Daniel Golle
  2026-04-09 13:07 ` [PATCH RFC net-next 2/4] nf_flow_table: track sub-interface and bridge ifindex in flow tuple Daniel Golle
@ 2026-04-09 13:07 ` Daniel Golle
  2026-04-09 13:07 ` [PATCH RFC net-next 4/4] net: ethernet: mtk_eth_soc: report INGRESS_L2 byte_type in flow stats Daniel Golle
  2026-04-09 13:52 ` [PATCH RFC net-next 0/4] improve hw flow offload byte accounting Pablo Neira Ayuso
  4 siblings, 0 replies; 7+ messages in thread
From: Daniel Golle @ 2026-04-09 13:07 UTC (permalink / raw)
  To: Felix Fietkau, John Crispin, Lorenzo Bianconi, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Matthias Brugger, AngeloGioacchino Del Regno, Simon Horman,
	Pablo Neira Ayuso, Florian Westphal, Phil Sutter, netdev,
	linux-kernel, linux-arm-kernel, linux-mediatek, netfilter-devel,
	coreteam

Hardware flow offload counters may report L2 frame bytes while
conntrack expects L3 (IP) bytes. When a driver sets byte_type
to INGRESS_L2 or EGRESS_L2, subtract the appropriate per-direction
encap and tunnel overhead to derive L3 byte counts for conntrack.

Additionally, propagate per-flow stats to bridge, VLAN and PPPoE
sub-interfaces that are bypassed by hardware offloading. Each
sub-interface gets the L3 byte count plus the overhead of any
inner encap layers below it, matching what the software path
would count. Both RX and TX directions are updated.
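
A sketch of the per-layer attribution described above (an illustrative
simplified helper, not the patch's code): walking the layers
innermost-out, each device is credited the L3 bytes plus the headers of
the layers still present beneath it:

```c
#include <stdint.h>

#define VLAN_HLEN	4
#define PPPOE_SES_HLEN	8

/* bytes credited to the device carrying encap layer `idx`, where
 * hlen[] holds per-layer header sizes ordered outermost first */
static uint64_t layer_bytes(uint64_t l3_bytes, uint64_t pkts,
			    const int *hlen, int num, int idx)
{
	uint64_t inner = 0;
	int i;

	/* headers of inner layers are still present at this device */
	for (i = num - 1; i > idx; i--)
		inner += hlen[i];

	return l3_bytes + pkts * inner;
}
```

For PPPoE over VLAN (hlen = {VLAN_HLEN, PPPOE_SES_HLEN}), the PPPoE
device is credited plain L3 bytes, while the VLAN device additionally
gets PPPOE_SES_HLEN per packet.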

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
 net/netfilter/nf_flow_table_offload.c | 174 +++++++++++++++++++++++++-
 1 file changed, 172 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index 002ec15d988bd..67452da487c94 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -5,6 +5,8 @@
 #include <linux/netfilter.h>
 #include <linux/rhashtable.h>
 #include <linux/netdevice.h>
+#include <linux/if_vlan.h>
+#include <linux/if_pppox.h>
 #include <linux/tc_act/tc_csum.h>
 #include <net/flow_offload.h>
 #include <net/ip_tunnels.h>
@@ -1008,10 +1010,135 @@ static void flow_offload_tuple_stats(struct flow_offload_work *offload,
 			      &offload->flowtable->flow_block.cb_list);
 }
 
+static int flow_offload_encap_hlen(const struct flow_offload_tuple *tuple,
+				   int idx)
+{
+	switch (tuple->encap[idx].proto) {
+	case htons(ETH_P_8021Q):
+	case htons(ETH_P_8021AD):
+		return VLAN_HLEN;
+	case htons(ETH_P_PPP_SES):
+		return PPPOE_SES_HLEN;
+	}
+	return 0;
+}
+
+static void flow_offload_encap_netstats(struct net_device *dev,
+					__be16 encap_proto,
+					bool rx, u64 pkts, u64 bytes)
+{
+	struct pcpu_sw_netstats *tstats;
+	struct vlan_pcpu_stats *vstats;
+
+	if (encap_proto == htons(ETH_P_8021Q) ||
+	    encap_proto == htons(ETH_P_8021AD)) {
+		vstats = this_cpu_ptr(vlan_dev_priv(dev)->vlan_pcpu_stats);
+		u64_stats_update_begin(&vstats->syncp);
+		if (rx) {
+			u64_stats_add(&vstats->rx_packets, pkts);
+			u64_stats_add(&vstats->rx_bytes, bytes);
+		} else {
+			u64_stats_add(&vstats->tx_packets, pkts);
+			u64_stats_add(&vstats->tx_bytes, bytes);
+		}
+		u64_stats_update_end(&vstats->syncp);
+	} else if (dev->tstats) {
+		tstats = this_cpu_ptr(dev->tstats);
+		u64_stats_update_begin(&tstats->syncp);
+		if (rx) {
+			u64_stats_add(&tstats->rx_packets, pkts);
+			u64_stats_add(&tstats->rx_bytes, bytes);
+		} else {
+			u64_stats_add(&tstats->tx_packets, pkts);
+			u64_stats_add(&tstats->tx_bytes, bytes);
+		}
+		u64_stats_update_end(&tstats->syncp);
+	}
+}
+
+/* Update sub-interface (VLAN, PPPoE) stats for hw-offloaded flows.
+ *
+ * The driver reports L3 (IP) bytes. Each sub-interface in the
+ * software path sees the frame with the headers of all layers
+ * BELOW it still present, so we add back inner-layer overhead.
+ *
+ * encap[] is ordered outermost to innermost, so walk from the
+ * innermost layer outward, accumulating overhead as we go.
+ */
+static void flow_offload_update_encap_stats(struct flow_offload *flow,
+					    struct flow_offload_tuple *tuple,
+					    bool rx, u64 pkts, u64 bytes)
+{
+	struct net_device *dev;
+	int inner_hlen = 0;
+	int i;
+
+	for (i = tuple->encap_num - 1; i >= 0; i--) {
+		if (tuple->in_vlan_ingress & BIT(i))
+			continue;
+
+		dev = dev_get_by_index_rcu(dev_net(flow->ct->ct_net),
+					   tuple->encap_ifidx[i]);
+		if (dev)
+			flow_offload_encap_netstats(dev,
+						    tuple->encap[i].proto, rx,
+						    pkts,
+						    bytes + inner_hlen * pkts);
+
+		inner_hlen += flow_offload_encap_hlen(tuple, i);
+	}
+
+	/* Bridge device sits outside all encap layers -- it sees
+	 * L3 bytes plus the full encap overhead.
+	 */
+	if (tuple->bridge_ifidx) {
+		dev = dev_get_by_index_rcu(dev_net(flow->ct->ct_net),
+					   tuple->bridge_ifidx);
+		if (dev && dev->tstats)
+			flow_offload_encap_netstats(dev, 0, rx, pkts,
+						    bytes + inner_hlen * pkts);
+	}
+}
+
+/* Compute per-direction input overhead from the encap and tunnel
+ * chains. Hardware flow counters report L2 frame bytes but
+ * conntrack expects L3 (inner IP) bytes -- matching what the
+ * software path sees after stripping all encap and tunnel headers.
+ */
+static int flow_offload_input_l2_overhead(struct flow_offload_tuple *tuple)
+{
+	int overhead = ETH_HLEN;
+	int i;
+
+	for (i = 0; i < tuple->encap_num; i++) {
+		if (tuple->in_vlan_ingress & BIT(i))
+			continue;
+
+		overhead += flow_offload_encap_hlen(tuple, i);
+	}
+
+	if (tuple->tun_num) {
+		switch (tuple->tun.l3_proto) {
+		case IPPROTO_IPIP:
+			overhead += sizeof(struct iphdr);
+			break;
+		case IPPROTO_IPV6:
+			overhead += sizeof(struct ipv6hdr);
+			break;
+		}
+	}
+
+	return overhead;
+}
+
 static void flow_offload_work_stats(struct flow_offload_work *offload)
 {
+	struct flow_offload_tuple *tuple;
 	struct flow_stats stats[FLOW_OFFLOAD_DIR_MAX] = {};
+	u64 l3_bytes[FLOW_OFFLOAD_DIR_MAX];
+	int l2_overhead;
 	u64 lastused;
+	int i;
 
 	flow_offload_tuple_stats(offload, FLOW_OFFLOAD_DIR_ORIGINAL, &stats[0]);
 	if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags))
@@ -1022,16 +1149,59 @@ static void flow_offload_work_stats(struct flow_offload_work *offload)
 	offload->flow->timeout = max_t(u64, offload->flow->timeout,
 				       lastused + flow_offload_get_timeout(offload->flow));
 
+	/* Convert hardware byte counts to L3 based on what the driver
+	 * reports.  Drivers that already report L3 (or do not set
+	 * byte_type) need no conversion.
+	 */
+	for (i = 0; i < FLOW_OFFLOAD_DIR_MAX; i++) {
+		l2_overhead = 0;
+
+		switch (stats[i].byte_type) {
+		case FLOW_STATS_BYTES_INGRESS_L2:
+			tuple = &offload->flow->tuplehash[i].tuple;
+			l2_overhead = flow_offload_input_l2_overhead(tuple);
+			break;
+		case FLOW_STATS_BYTES_EGRESS_L2:
+			tuple = &offload->flow->tuplehash[!i].tuple;
+			l2_overhead = flow_offload_input_l2_overhead(tuple);
+			break;
+		default:
+			break;
+		}
+		l3_bytes[i] = stats[i].bytes - stats[i].pkts * l2_overhead;
+	}
+
 	if (offload->flowtable->flags & NF_FLOWTABLE_COUNTER) {
 		if (stats[0].pkts)
 			nf_ct_acct_add(offload->flow->ct,
 				       FLOW_OFFLOAD_DIR_ORIGINAL,
-				       stats[0].pkts, stats[0].bytes);
+				       stats[0].pkts, l3_bytes[0]);
 		if (stats[1].pkts)
 			nf_ct_acct_add(offload->flow->ct,
 				       FLOW_OFFLOAD_DIR_REPLY,
-				       stats[1].pkts, stats[1].bytes);
+				       stats[1].pkts, l3_bytes[1]);
+	}
+
+	rcu_read_lock();
+	for (i = 0; i < FLOW_OFFLOAD_DIR_MAX; i++) {
+		tuple = &offload->flow->tuplehash[i].tuple;
+		if (!tuple->encap_num)
+			continue;
+
+		/* Input-side encap devices get RX stats */
+		if (stats[i].pkts)
+			flow_offload_update_encap_stats(offload->flow,
+							tuple, true,
+							stats[i].pkts,
+							l3_bytes[i]);
+		/* Same devices get TX stats from the other direction */
+		if (stats[!i].pkts)
+			flow_offload_update_encap_stats(offload->flow,
+							tuple, false,
+							stats[!i].pkts,
+							l3_bytes[!i]);
 	}
+	rcu_read_unlock();
 }
 
 static void flow_offload_work_handler(struct work_struct *work)
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH RFC net-next 4/4] net: ethernet: mtk_eth_soc: report INGRESS_L2 byte_type in flow stats
  2026-04-09 13:07 [PATCH RFC net-next 0/4] improve hw flow offload byte accounting Daniel Golle
                   ` (2 preceding siblings ...)
  2026-04-09 13:07 ` [PATCH RFC net-next 3/4] nf_flow_table: convert hw byte counts and update sub-interface stats Daniel Golle
@ 2026-04-09 13:07 ` Daniel Golle
  2026-04-09 13:52 ` [PATCH RFC net-next 0/4] improve hw flow offload byte accounting Pablo Neira Ayuso
  4 siblings, 0 replies; 7+ messages in thread
From: Daniel Golle @ 2026-04-09 13:07 UTC (permalink / raw)
  To: Felix Fietkau, John Crispin, Lorenzo Bianconi, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Matthias Brugger, AngeloGioacchino Del Regno, Simon Horman,
	Pablo Neira Ayuso, Florian Westphal, Phil Sutter, netdev,
	linux-kernel, linux-arm-kernel, linux-mediatek, netfilter-devel,
	coreteam

The MediaTek PPE MIB counters report ingress L2 frame bytes
including Ethernet, VLAN and PPPoE headers. Tell the flow offload
framework so it can derive correct L3 byte counts for conntrack
and update sub-interface counters.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
 drivers/net/ethernet/mediatek/mtk_ppe_offload.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mediatek/mtk_ppe_offload.c b/drivers/net/ethernet/mediatek/mtk_ppe_offload.c
index cc8c4ef8038f3..68cb03a193f3f 100644
--- a/drivers/net/ethernet/mediatek/mtk_ppe_offload.c
+++ b/drivers/net/ethernet/mediatek/mtk_ppe_offload.c
@@ -557,6 +557,7 @@ mtk_flow_offload_stats(struct mtk_eth *eth, struct flow_cls_offload *f)
 				  &diff)) {
 		f->stats.pkts += diff.packets;
 		f->stats.bytes += diff.bytes;
+		f->stats.byte_type = FLOW_STATS_BYTES_INGRESS_L2;
 	}
 
 	return 0;
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC net-next 0/4] improve hw flow offload byte accounting
  2026-04-09 13:07 [PATCH RFC net-next 0/4] improve hw flow offload byte accounting Daniel Golle
                   ` (3 preceding siblings ...)
  2026-04-09 13:07 ` [PATCH RFC net-next 4/4] net: ethernet: mtk_eth_soc: report INGRESS_L2 byte_type in flow stats Daniel Golle
@ 2026-04-09 13:52 ` Pablo Neira Ayuso
  2026-04-09 14:21   ` Daniel Golle
  4 siblings, 1 reply; 7+ messages in thread
From: Pablo Neira Ayuso @ 2026-04-09 13:52 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Felix Fietkau, John Crispin, Lorenzo Bianconi, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Matthias Brugger, AngeloGioacchino Del Regno, Simon Horman,
	Florian Westphal, Phil Sutter, netdev, linux-kernel,
	linux-arm-kernel, linux-mediatek, netfilter-devel, coreteam

On Thu, Apr 09, 2026 at 02:07:22PM +0100, Daniel Golle wrote:
> Hardware flow counters report raw byte counts whose semantics
> vary by vendor -- some count ingress L2 frames, others egress
> L2, others L3. The nf_flow_table framework currently passes
> these bytes straight to conntrack without conversion, and
> sub-interfaces (VLAN, PPPoE) that are bypassed by hw offload
> never see any counter updates at all.

I see, but isn't that part of the feature itself? Why pretend that
these interfaces are really seeing traffic when they don't? This
aspiration of making hardware offload fully transparent (when it is
not, not to mention the semantic changes in how packet handling is
done compared to the software plane) does not sound convincing to me.

On top of this, the issue also exists in the software plane: devices
that are bypassed do not get their counters bumped.

Maybe if this is really a requirement, then this should address the
issue for software too, but is it worth the effort to add
infrastructure for this purpose?

> This series lets drivers declare what their counters represent,
> so the framework can normalize to L3 for conntrack and
> propagate per-layer stats to encap sub-interfaces.
> 
> Questions:
>  - The sub-interface stats update accesses vlan_dev_priv()
>    directly -- should there be a generic netdev callback instead?
>  - Are there hw offload drivers whose counters do not fit the
>    ingress-L2 / egress-L2 / L3 model?
> 
> Daniel Golle (4):
>   net: flow_offload: let drivers report byte counter semantics
>   nf_flow_table: track sub-interface and bridge ifindex in flow tuple
>   nf_flow_table: convert hw byte counts and update sub-interface stats
>   net: ethernet: mtk_eth_soc: report INGRESS_L2 byte_type in flow stats
> 
>  .../net/ethernet/mediatek/mtk_ppe_offload.c   |   1 +
>  include/net/flow_offload.h                    |   7 +
>  include/net/netfilter/nf_flow_table.h         |   5 +
>  net/netfilter/nf_flow_table_core.c            |   2 +
>  net/netfilter/nf_flow_table_offload.c         | 174 +++++++++++++++++-
>  net/netfilter/nf_flow_table_path.c            |   8 +
>  6 files changed, 195 insertions(+), 2 deletions(-)
> 
> -- 
> 2.53.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC net-next 0/4] improve hw flow offload byte accounting
  2026-04-09 13:52 ` [PATCH RFC net-next 0/4] improve hw flow offload byte accounting Pablo Neira Ayuso
@ 2026-04-09 14:21   ` Daniel Golle
  0 siblings, 0 replies; 7+ messages in thread
From: Daniel Golle @ 2026-04-09 14:21 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Felix Fietkau, John Crispin, Lorenzo Bianconi, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Matthias Brugger, AngeloGioacchino Del Regno, Simon Horman,
	Florian Westphal, Phil Sutter, netdev, linux-kernel,
	linux-arm-kernel, linux-mediatek, netfilter-devel, coreteam

On Thu, Apr 09, 2026 at 03:52:41PM +0200, Pablo Neira Ayuso wrote:
> On Thu, Apr 09, 2026 at 02:07:22PM +0100, Daniel Golle wrote:
> > Hardware flow counters report raw byte counts whose semantics
> > vary by vendor -- some count ingress L2 frames, others egress
> > L2, others L3. The nf_flow_table framework currently passes
> > these bytes straight to conntrack without conversion, and
> > sub-interfaces (VLAN, PPPoE) that are bypassed by hw offload
> > never see any counter updates at all.
> 
> I see, but that is part of the feature itself? Why pretend that these
> interface are really seeing traffic while they don't. This aspiration
> of trying to do all hardware offload fully transparent (when it is not
> the case, not mentioning semantic changes in how packet handling is
> done compared to the software plane) does not sound convincing to me.

Please explain what you mean by offloading not being fully
transparent. If the MAC hardware offloads VLAN encap/decap, for
example, the counters are still maintained correctly (it just so
happens); it's only the flow-offloading case that results in a weird
overall picture: hardware interface counters keep increasing, encap
interface counters (802.1Q, PPPoE) don't. That makes it confusing and
hard to understand what's happening when looking only at the
interface counters (i.e. "what is all that traffic on my physical WAN
interface which isn't PPPoE? It can't all be the modem's management
interface, SNMP, ...").

> 
> On top of this, this issue also exists in the software plane: Devices
> that are bypasses do not get their counters bumped.
> 
> Maybe if this is really a requirement, then this should address the
> issue for software too, but is it worth the effort to add
> infrastructure for this purpose?

To me it would feel more correct to see counters increasing also
for offloaded traffic on software interfaces such as PPPoE or VLAN.

I honestly didn't think about the software fastpath, and yes, I think
it should be addressed there too.

> > This series lets drivers declare what their counters represent,
> > so the framework can normalize to L3 for conntrack and
> > propagate per-layer stats to encap sub-interfaces.

This part could also be seen as an independent fix, as conntrack
stats for the same traffic currently differ between software
offloading (pure L3 bytes) and hardware offloading (ingress L2 bytes
in the case of mtk_ppe).


^ permalink raw reply	[flat|nested] 7+ messages in thread
