public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH nf-next v2 0/2] Update (DSA) netdev stats with offloaded flows
@ 2026-03-24 20:40 Ahmed Zaki
  2026-03-24 20:40 ` [PATCH nf-next v2 1/2] netfilter: flowtable: update netdev stats with HW_OFFLOAD flows Ahmed Zaki
  2026-03-24 20:40 ` [PATCH nf-next v2 2/2] net: dsa: update net_device stats with HW offloaded flows stats Ahmed Zaki
  0 siblings, 2 replies; 5+ messages in thread
From: Ahmed Zaki @ 2026-03-24 20:40 UTC (permalink / raw)
  To: netfilter-devel, andrew, olteanv, pablo, fw, kuba, pabeni,
	edumazet
  Cc: coreteam, netdev

Some devices (notably DSA) delegate the nft flowtable HW offloaded flows
to a parent drivers. The delegating drivers cannot collect or report
the offloaded flows stats since they have no access to the underlying
hardware. This breaks SNMP-based network monitoring systems that rely on
netdev stats to report the network traffic.

Fix by moving the offloaded flow stat reporting to the nft flowtable
subsystem. The first patch adds a new stats field "fstats" to net_device
that is allocated and updated by the nft subsystem only if the new
flag "flow_offload_via_parent" is set by the driver. It also report these
stats back to the user in dev_get_stats().

Patch 2 sets the new flag "flow_offload_via_parent" for the DSA driver.
---
v2: - added the new "net_device->fstats" field since the existing
      tstats cannot be used since it would double-count on devices that
      already use tstats for hardware MIBs (P1).
    - fixed the outdev ifindex logic based. See new func
      "flow_offload_egress_ifidx()" in P1.

Ahmed Zaki (2):
  netfilter: flowtable: update netdev stats with HW_OFFLOAD flows
  net: dsa: update net_device stats with HW offloaded flows stats

 include/linux/netdevice.h             | 45 ++++++++++++
 net/core/dev.c                        |  8 +++
 net/dsa/user.c                        |  1 +
 net/netfilter/nf_flow_table_offload.c | 98 +++++++++++++++++++++++++--
 4 files changed, 146 insertions(+), 6 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH nf-next v2 1/2] netfilter: flowtable: update netdev stats with HW_OFFLOAD flows
  2026-03-24 20:40 [PATCH nf-next v2 0/2] Update (DSA) netdev stats with offloaded flows Ahmed Zaki
@ 2026-03-24 20:40 ` Ahmed Zaki
  2026-03-24 21:28   ` Pablo Neira Ayuso
  2026-03-24 20:40 ` [PATCH nf-next v2 2/2] net: dsa: update net_device stats with HW offloaded flows stats Ahmed Zaki
  1 sibling, 1 reply; 5+ messages in thread
From: Ahmed Zaki @ 2026-03-24 20:40 UTC (permalink / raw)
  To: netfilter-devel, andrew, olteanv, pablo, fw, kuba, pabeni,
	edumazet
  Cc: coreteam, netdev

Some drivers (notably DSA) delegate the nft flowtable HW_OFFLOAD flows
to a parent driver. While the parent driver is able to report the
offloaded traffic stats directly from the HW, the delegating driver
does not report the stats. This fails SNMP-based monitoring tools that
rely on netdev stats to report the network traffic.

Add a new struct pcpu_sw_netstats "fstats" to net_device that gets
allocated only if the new flag "flow_offload_via_parent" is set by the
driver. The new stats are lazily allocated by the nft flow offloading
code when the first flow is offloaded. The stats are updated periodically
in flow_offload_work_stats() and also once in flow_offload_work_del()
before the flow is deleted. For this, flow_offload_work_del() had to
be moved below flow_offload_tuple_stats().

Signed-off-by: Ahmed Zaki <anzaki@gmail.com>
---
 include/linux/netdevice.h             | 45 ++++++++++++
 net/core/dev.c                        |  8 +++
 net/netfilter/nf_flow_table_offload.c | 98 +++++++++++++++++++++++++--
 3 files changed, 145 insertions(+), 6 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 67e25f6d15a4..647758f78213 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1840,6 +1840,11 @@ enum netdev_reg_state {
  *	@stats:		Statistics struct, which was left as a legacy, use
  *			rtnl_link_stats64 instead
  *
+ *	@fstats:	HW offloaded flow statistics: RX/TX packets,
+ *			RX/TX bytes. Lazily allocated by the flow offload
+ *			path on the first offloaded flow for devices that
+ *			set @flow_offload_via_parent. Freed by free_netdev().
+ *
  *	@core_stats:	core networking counters,
  *			do not use this in drivers
  *	@carrier_up_count:	Number of times the carrier has been up
@@ -2048,6 +2053,12 @@ enum netdev_reg_state {
  *	@change_proto_down: device supports setting carrier via IFLA_PROTO_DOWN
  *	@netns_immutable: interface can't change network namespaces
  *	@fcoe_mtu:	device supports maximum FCoE MTU, 2158 bytes
+ *	@flow_offload_via_parent: device delegates nft flowtable hardware
+ *				  offload to a parent/conduit device (e.g. DSA
+ *				  user ports delegate to their conduit MAC).
+ *				  The parent's HW count the offloaded traffic
+ *				  but this device's sw netstats path does not.
+ *				  @fstats is allocated to fill that gap.
  *
  *	@net_notifier_list:	List of per-net netdev notifier block
  *				that follow this device when it is moved
@@ -2233,6 +2244,7 @@ struct net_device {
 
 	struct net_device_stats	stats; /* not used by modern drivers */
 
+	struct pcpu_sw_netstats __percpu *fstats;
 	struct net_device_core_stats __percpu *core_stats;
 
 	/* Stats to monitor link on/off, flapping */
@@ -2463,6 +2475,7 @@ struct net_device {
 	unsigned long		change_proto_down:1;
 	unsigned long		netns_immutable:1;
 	unsigned long		fcoe_mtu:1;
+	unsigned long		flow_offload_via_parent:1;
 
 	struct list_head	net_notifier_list;
 
@@ -2992,6 +3005,38 @@ struct pcpu_lstats {
 
 void dev_lstats_read(struct net_device *dev, u64 *packets, u64 *bytes);
 
+static inline void dev_fstats_rx_add(struct net_device *dev,
+				     unsigned int packets,
+				     unsigned int len)
+{
+	struct pcpu_sw_netstats *fstats;
+
+	if (!dev->fstats)
+		return;
+
+	fstats = this_cpu_ptr(dev->fstats);
+	u64_stats_update_begin(&fstats->syncp);
+	u64_stats_add(&fstats->rx_bytes, len);
+	u64_stats_add(&fstats->rx_packets, packets);
+	u64_stats_update_end(&fstats->syncp);
+}
+
+static inline void dev_fstats_tx_add(struct net_device *dev,
+				     unsigned int packets,
+				     unsigned int len)
+{
+	struct pcpu_sw_netstats *fstats;
+
+	if (!dev->fstats)
+		return;
+
+	fstats = this_cpu_ptr(dev->fstats);
+	u64_stats_update_begin(&fstats->syncp);
+	u64_stats_add(&fstats->tx_bytes, len);
+	u64_stats_add(&fstats->tx_packets, packets);
+	u64_stats_update_end(&fstats->syncp);
+}
+
 static inline void dev_sw_netstats_rx_add(struct net_device *dev, unsigned int len)
 {
 	struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);
diff --git a/net/core/dev.c b/net/core/dev.c
index f48dc299e4b2..07fb315ad42c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -11865,6 +11865,7 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 	const struct net_device_core_stats __percpu *p;
+	const struct pcpu_sw_netstats __percpu *fstats;
 
 	/*
 	 * IPv{4,6} and udp tunnels share common stat helpers and use
@@ -11893,6 +11894,11 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
 		netdev_stats_to_stats64(storage, &dev->stats);
 	}
 
+	/* This READ_ONCE() pairs with cmpxchg in flow_offload_fstats_ensure() */
+	fstats = READ_ONCE(dev->fstats);
+	if (fstats)
+		dev_fetch_sw_netstats(storage, fstats);
+
 	/* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
 	p = READ_ONCE(dev->core_stats);
 	if (p) {
@@ -12212,6 +12218,8 @@ void free_netdev(struct net_device *dev)
 	free_percpu(dev->pcpu_refcnt);
 	dev->pcpu_refcnt = NULL;
 #endif
+	free_percpu(dev->fstats);
+	dev->fstats = NULL;
 	free_percpu(dev->core_stats);
 	dev->core_stats = NULL;
 	free_percpu(dev->xdp_bulkq);
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index b2e4fb6fa011..fc1e67a79904 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -925,13 +925,80 @@ static void flow_offload_work_add(struct flow_offload_work *offload)
 	nf_flow_offload_destroy(flow_rule);
 }
 
-static void flow_offload_work_del(struct flow_offload_work *offload)
+static bool flow_offload_fstats_ensure(struct net_device *dev)
 {
-	clear_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status);
-	flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_ORIGINAL);
-	if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags))
-		flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_REPLY);
-	set_bit(NF_FLOW_HW_DEAD, &offload->flow->flags);
+	struct pcpu_sw_netstats __percpu *p;
+
+	if (!dev->flow_offload_via_parent)
+		return false;
+
+	/* Pairs with cmpxchg() below. */
+	if (likely(READ_ONCE(dev->fstats)))
+		return true;
+
+	p = __netdev_alloc_pcpu_stats(struct pcpu_sw_netstats, GFP_ATOMIC);
+	if (!p)
+		return false;
+
+	if (cmpxchg(&dev->fstats, NULL, p))
+		free_percpu(p);	/* lost the race, discard and use winner's */
+
+	return true;
+}
+
+static u32 flow_offload_egress_ifidx(const struct flow_offload_tuple *tuple)
+{
+	switch (tuple->xmit_type) {
+	case FLOW_OFFLOAD_XMIT_NEIGH:
+		return tuple->ifidx;
+	case FLOW_OFFLOAD_XMIT_DIRECT:
+		return tuple->out.ifidx;
+	default:
+		return 0;
+	}
+}
+
+static void flow_offload_netdev_update(struct flow_offload_work *offload,
+				       struct flow_stats *stats)
+{
+	const struct flow_offload_tuple *tuple;
+	struct net_device *indev, *outdev;
+	struct net *net;
+
+	rcu_read_lock();
+	net = read_pnet(&offload->flowtable->net);
+	if (stats[FLOW_OFFLOAD_DIR_ORIGINAL].pkts) {
+		tuple = &offload->flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple;
+		indev = dev_get_by_index_rcu(net, tuple->iifidx);
+		if (indev && flow_offload_fstats_ensure(indev))
+			dev_fstats_rx_add(indev,
+					  stats[FLOW_OFFLOAD_DIR_ORIGINAL].pkts,
+					  stats[FLOW_OFFLOAD_DIR_ORIGINAL].bytes);
+
+		outdev = dev_get_by_index_rcu(net,
+					      flow_offload_egress_ifidx(tuple));
+		if (outdev && flow_offload_fstats_ensure(outdev))
+			dev_fstats_tx_add(outdev,
+					  stats[FLOW_OFFLOAD_DIR_ORIGINAL].pkts,
+					  stats[FLOW_OFFLOAD_DIR_ORIGINAL].bytes);
+	}
+
+	if (stats[FLOW_OFFLOAD_DIR_REPLY].pkts) {
+		tuple = &offload->flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple;
+		indev = dev_get_by_index_rcu(net, tuple->iifidx);
+		if (indev && flow_offload_fstats_ensure(indev))
+			dev_fstats_rx_add(indev,
+					  stats[FLOW_OFFLOAD_DIR_REPLY].pkts,
+					  stats[FLOW_OFFLOAD_DIR_REPLY].bytes);
+
+		outdev = dev_get_by_index_rcu(net,
+					      flow_offload_egress_ifidx(tuple));
+		if (outdev && flow_offload_fstats_ensure(outdev))
+			dev_fstats_tx_add(outdev,
+					  stats[FLOW_OFFLOAD_DIR_REPLY].pkts,
+					  stats[FLOW_OFFLOAD_DIR_REPLY].bytes);
+	}
+	rcu_read_unlock();
 }
 
 static void flow_offload_tuple_stats(struct flow_offload_work *offload,
@@ -968,6 +1035,25 @@ static void flow_offload_work_stats(struct flow_offload_work *offload)
 				       FLOW_OFFLOAD_DIR_REPLY,
 				       stats[1].pkts, stats[1].bytes);
 	}
+
+	flow_offload_netdev_update(offload, stats);
+}
+
+static void flow_offload_work_del(struct flow_offload_work *offload)
+{
+	struct flow_stats stats[FLOW_OFFLOAD_DIR_MAX] = {};
+
+	flow_offload_tuple_stats(offload, FLOW_OFFLOAD_DIR_ORIGINAL, &stats[0]);
+	if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags))
+		flow_offload_tuple_stats(offload, FLOW_OFFLOAD_DIR_REPLY,
+					 &stats[1]);
+	flow_offload_netdev_update(offload, stats);
+
+	clear_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status);
+	flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_ORIGINAL);
+	if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags))
+		flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_REPLY);
+	set_bit(NF_FLOW_HW_DEAD, &offload->flow->flags);
 }
 
 static void flow_offload_work_handler(struct work_struct *work)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH nf-next v2 2/2] net: dsa: update net_device stats with HW offloaded flows stats
  2026-03-24 20:40 [PATCH nf-next v2 0/2] Update (DSA) netdev stats with offloaded flows Ahmed Zaki
  2026-03-24 20:40 ` [PATCH nf-next v2 1/2] netfilter: flowtable: update netdev stats with HW_OFFLOAD flows Ahmed Zaki
@ 2026-03-24 20:40 ` Ahmed Zaki
  1 sibling, 0 replies; 5+ messages in thread
From: Ahmed Zaki @ 2026-03-24 20:40 UTC (permalink / raw)
  To: netfilter-devel, andrew, olteanv, pablo, fw, kuba, pabeni,
	edumazet
  Cc: coreteam, netdev

Set the new net_device flag flow_offload_via_parent to update the
user net_device TX/RX stats with the HW offloaded traffic.

Signed-off-by: Ahmed Zaki <anzaki@gmail.com>
---
 net/dsa/user.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/dsa/user.c b/net/dsa/user.c
index c4bd6fe90b45..331bc581bce7 100644
--- a/net/dsa/user.c
+++ b/net/dsa/user.c
@@ -2807,6 +2807,7 @@ int dsa_user_create(struct dsa_port *port)
 
 	p = netdev_priv(user_dev);
 	user_dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS;
+	user_dev->flow_offload_via_parent = true;
 
 	ret = gro_cells_init(&p->gcells, user_dev);
 	if (ret)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH nf-next v2 1/2] netfilter: flowtable: update netdev stats with HW_OFFLOAD flows
  2026-03-24 20:40 ` [PATCH nf-next v2 1/2] netfilter: flowtable: update netdev stats with HW_OFFLOAD flows Ahmed Zaki
@ 2026-03-24 21:28   ` Pablo Neira Ayuso
  2026-03-24 23:27     ` Ahmed Zaki
  0 siblings, 1 reply; 5+ messages in thread
From: Pablo Neira Ayuso @ 2026-03-24 21:28 UTC (permalink / raw)
  To: Ahmed Zaki
  Cc: netfilter-devel, andrew, olteanv, fw, kuba, pabeni, edumazet,
	coreteam, netdev

On Tue, Mar 24, 2026 at 02:40:15PM -0600, Ahmed Zaki wrote:
> Some drivers (notably DSA) delegate the nft flowtable HW_OFFLOAD flows
> to a parent driver. While the parent driver is able to report the
> offloaded traffic stats directly from the HW, the delegating driver
> does not report the stats. This fails SNMP-based monitoring tools that
> rely on netdev stats to report the network traffic.
> 
> Add a new struct pcpu_sw_netstats "fstats" to net_device that gets
> allocated only if the new flag "flow_offload_via_parent" is set by the
> driver. The new stats are lazily allocated by the nft flow offloading
> code when the first flow is offloaded. The stats are updated periodically
> in flow_offload_work_stats() and also once in flow_offload_work_del()
> before the flow is deleted. For this, flow_offload_work_del() had to
> be moved below flow_offload_tuple_stats().

Hm, I think v1 was a simpler approach... except that you innecesarily
modified a lot of callsites as Jakub pointed out. I don't see why you
need this new callback for netdev_ops.

> Signed-off-by: Ahmed Zaki <anzaki@gmail.com>
> ---
>  include/linux/netdevice.h             | 45 ++++++++++++
>  net/core/dev.c                        |  8 +++
>  net/netfilter/nf_flow_table_offload.c | 98 +++++++++++++++++++++++++--
>  3 files changed, 145 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 67e25f6d15a4..647758f78213 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1840,6 +1840,11 @@ enum netdev_reg_state {
>   *	@stats:		Statistics struct, which was left as a legacy, use
>   *			rtnl_link_stats64 instead
>   *
> + *	@fstats:	HW offloaded flow statistics: RX/TX packets,
> + *			RX/TX bytes. Lazily allocated by the flow offload
> + *			path on the first offloaded flow for devices that
> + *			set @flow_offload_via_parent. Freed by free_netdev().
> + *
>   *	@core_stats:	core networking counters,
>   *			do not use this in drivers
>   *	@carrier_up_count:	Number of times the carrier has been up
> @@ -2048,6 +2053,12 @@ enum netdev_reg_state {
>   *	@change_proto_down: device supports setting carrier via IFLA_PROTO_DOWN
>   *	@netns_immutable: interface can't change network namespaces
>   *	@fcoe_mtu:	device supports maximum FCoE MTU, 2158 bytes
> + *	@flow_offload_via_parent: device delegates nft flowtable hardware
> + *				  offload to a parent/conduit device (e.g. DSA
> + *				  user ports delegate to their conduit MAC).
> + *				  The parent's HW count the offloaded traffic
> + *				  but this device's sw netstats path does not.
> + *				  @fstats is allocated to fill that gap.
>   *
>   *	@net_notifier_list:	List of per-net netdev notifier block
>   *				that follow this device when it is moved
> @@ -2233,6 +2244,7 @@ struct net_device {
>  
>  	struct net_device_stats	stats; /* not used by modern drivers */
>  
> +	struct pcpu_sw_netstats __percpu *fstats;
>  	struct net_device_core_stats __percpu *core_stats;
>  
>  	/* Stats to monitor link on/off, flapping */
> @@ -2463,6 +2475,7 @@ struct net_device {
>  	unsigned long		change_proto_down:1;
>  	unsigned long		netns_immutable:1;
>  	unsigned long		fcoe_mtu:1;
> +	unsigned long		flow_offload_via_parent:1;
>  
>  	struct list_head	net_notifier_list;
>  
> @@ -2992,6 +3005,38 @@ struct pcpu_lstats {
>  
>  void dev_lstats_read(struct net_device *dev, u64 *packets, u64 *bytes);
>  
> +static inline void dev_fstats_rx_add(struct net_device *dev,
> +				     unsigned int packets,
> +				     unsigned int len)
> +{
> +	struct pcpu_sw_netstats *fstats;
> +
> +	if (!dev->fstats)
> +		return;
> +
> +	fstats = this_cpu_ptr(dev->fstats);
> +	u64_stats_update_begin(&fstats->syncp);
> +	u64_stats_add(&fstats->rx_bytes, len);
> +	u64_stats_add(&fstats->rx_packets, packets);
> +	u64_stats_update_end(&fstats->syncp);
> +}
> +
> +static inline void dev_fstats_tx_add(struct net_device *dev,
> +				     unsigned int packets,
> +				     unsigned int len)
> +{
> +	struct pcpu_sw_netstats *fstats;
> +
> +	if (!dev->fstats)
> +		return;
> +
> +	fstats = this_cpu_ptr(dev->fstats);
> +	u64_stats_update_begin(&fstats->syncp);
> +	u64_stats_add(&fstats->tx_bytes, len);
> +	u64_stats_add(&fstats->tx_packets, packets);
> +	u64_stats_update_end(&fstats->syncp);
> +}
> +
>  static inline void dev_sw_netstats_rx_add(struct net_device *dev, unsigned int len)
>  {
>  	struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index f48dc299e4b2..07fb315ad42c 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -11865,6 +11865,7 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
>  {
>  	const struct net_device_ops *ops = dev->netdev_ops;
>  	const struct net_device_core_stats __percpu *p;
> +	const struct pcpu_sw_netstats __percpu *fstats;
>  
>  	/*
>  	 * IPv{4,6} and udp tunnels share common stat helpers and use
> @@ -11893,6 +11894,11 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
>  		netdev_stats_to_stats64(storage, &dev->stats);
>  	}
>  
> +	/* This READ_ONCE() pairs with cmpxchg in flow_offload_fstats_ensure() */
> +	fstats = READ_ONCE(dev->fstats);
> +	if (fstats)
> +		dev_fetch_sw_netstats(storage, fstats);
> +
>  	/* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
>  	p = READ_ONCE(dev->core_stats);
>  	if (p) {
> @@ -12212,6 +12218,8 @@ void free_netdev(struct net_device *dev)
>  	free_percpu(dev->pcpu_refcnt);
>  	dev->pcpu_refcnt = NULL;
>  #endif
> +	free_percpu(dev->fstats);
> +	dev->fstats = NULL;
>  	free_percpu(dev->core_stats);
>  	dev->core_stats = NULL;
>  	free_percpu(dev->xdp_bulkq);
> diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
> index b2e4fb6fa011..fc1e67a79904 100644
> --- a/net/netfilter/nf_flow_table_offload.c
> +++ b/net/netfilter/nf_flow_table_offload.c
> @@ -925,13 +925,80 @@ static void flow_offload_work_add(struct flow_offload_work *offload)
>  	nf_flow_offload_destroy(flow_rule);
>  }
>  
> -static void flow_offload_work_del(struct flow_offload_work *offload)
> +static bool flow_offload_fstats_ensure(struct net_device *dev)
>  {
> -	clear_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status);
> -	flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_ORIGINAL);
> -	if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags))
> -		flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_REPLY);
> -	set_bit(NF_FLOW_HW_DEAD, &offload->flow->flags);
> +	struct pcpu_sw_netstats __percpu *p;
> +
> +	if (!dev->flow_offload_via_parent)
> +		return false;
> +
> +	/* Pairs with cmpxchg() below. */
> +	if (likely(READ_ONCE(dev->fstats)))
> +		return true;
> +
> +	p = __netdev_alloc_pcpu_stats(struct pcpu_sw_netstats, GFP_ATOMIC);
> +	if (!p)
> +		return false;
> +
> +	if (cmpxchg(&dev->fstats, NULL, p))
> +		free_percpu(p);	/* lost the race, discard and use winner's */
> +
> +	return true;
> +}
> +
> +static u32 flow_offload_egress_ifidx(const struct flow_offload_tuple *tuple)
> +{
> +	switch (tuple->xmit_type) {
> +	case FLOW_OFFLOAD_XMIT_NEIGH:
> +		return tuple->ifidx;
> +	case FLOW_OFFLOAD_XMIT_DIRECT:
> +		return tuple->out.ifidx;
> +	default:
> +		return 0;
> +	}
> +}
> +
> +static void flow_offload_netdev_update(struct flow_offload_work *offload,
> +				       struct flow_stats *stats)
> +{
> +	const struct flow_offload_tuple *tuple;
> +	struct net_device *indev, *outdev;
> +	struct net *net;
> +
> +	rcu_read_lock();
> +	net = read_pnet(&offload->flowtable->net);
> +	if (stats[FLOW_OFFLOAD_DIR_ORIGINAL].pkts) {
> +		tuple = &offload->flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple;
> +		indev = dev_get_by_index_rcu(net, tuple->iifidx);
> +		if (indev && flow_offload_fstats_ensure(indev))
> +			dev_fstats_rx_add(indev,
> +					  stats[FLOW_OFFLOAD_DIR_ORIGINAL].pkts,
> +					  stats[FLOW_OFFLOAD_DIR_ORIGINAL].bytes);
> +
> +		outdev = dev_get_by_index_rcu(net,
> +					      flow_offload_egress_ifidx(tuple));
> +		if (outdev && flow_offload_fstats_ensure(outdev))
> +			dev_fstats_tx_add(outdev,
> +					  stats[FLOW_OFFLOAD_DIR_ORIGINAL].pkts,
> +					  stats[FLOW_OFFLOAD_DIR_ORIGINAL].bytes);
> +	}
> +
> +	if (stats[FLOW_OFFLOAD_DIR_REPLY].pkts) {
> +		tuple = &offload->flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple;
> +		indev = dev_get_by_index_rcu(net, tuple->iifidx);
> +		if (indev && flow_offload_fstats_ensure(indev))
> +			dev_fstats_rx_add(indev,
> +					  stats[FLOW_OFFLOAD_DIR_REPLY].pkts,
> +					  stats[FLOW_OFFLOAD_DIR_REPLY].bytes);
> +
> +		outdev = dev_get_by_index_rcu(net,
> +					      flow_offload_egress_ifidx(tuple));
> +		if (outdev && flow_offload_fstats_ensure(outdev))
> +			dev_fstats_tx_add(outdev,
> +					  stats[FLOW_OFFLOAD_DIR_REPLY].pkts,
> +					  stats[FLOW_OFFLOAD_DIR_REPLY].bytes);
> +	}
> +	rcu_read_unlock();
>  }
>  
>  static void flow_offload_tuple_stats(struct flow_offload_work *offload,
> @@ -968,6 +1035,25 @@ static void flow_offload_work_stats(struct flow_offload_work *offload)
>  				       FLOW_OFFLOAD_DIR_REPLY,
>  				       stats[1].pkts, stats[1].bytes);
>  	}
> +
> +	flow_offload_netdev_update(offload, stats);
> +}
> +
> +static void flow_offload_work_del(struct flow_offload_work *offload)
> +{
> +	struct flow_stats stats[FLOW_OFFLOAD_DIR_MAX] = {};
> +
> +	flow_offload_tuple_stats(offload, FLOW_OFFLOAD_DIR_ORIGINAL, &stats[0]);
> +	if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags))
> +		flow_offload_tuple_stats(offload, FLOW_OFFLOAD_DIR_REPLY,
> +					 &stats[1]);
> +	flow_offload_netdev_update(offload, stats);
> +
> +	clear_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status);
> +	flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_ORIGINAL);
> +	if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags))
> +		flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_REPLY);
> +	set_bit(NF_FLOW_HW_DEAD, &offload->flow->flags);
>  }
>  
>  static void flow_offload_work_handler(struct work_struct *work)
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH nf-next v2 1/2] netfilter: flowtable: update netdev stats with HW_OFFLOAD flows
  2026-03-24 21:28   ` Pablo Neira Ayuso
@ 2026-03-24 23:27     ` Ahmed Zaki
  0 siblings, 0 replies; 5+ messages in thread
From: Ahmed Zaki @ 2026-03-24 23:27 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, andrew, olteanv, fw, kuba, pabeni, edumazet,
	coreteam, netdev

On Tue, Mar 24, 2026 at 3:28 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>
> On Tue, Mar 24, 2026 at 02:40:15PM -0600, Ahmed Zaki wrote:
> > Some drivers (notably DSA) delegate the nft flowtable HW_OFFLOAD flows
> > to a parent driver. While the parent driver is able to report the
> > offloaded traffic stats directly from the HW, the delegating driver
> > does not report the stats. This fails SNMP-based monitoring tools that
> > rely on netdev stats to report the network traffic.
> >
> > Add a new struct pcpu_sw_netstats "fstats" to net_device that gets
> > allocated only if the new flag "flow_offload_via_parent" is set by the
> > driver. The new stats are lazily allocated by the nft flow offloading
> > code when the first flow is offloaded. The stats are updated periodically
> > in flow_offload_work_stats() and also once in flow_offload_work_del()
> > before the flow is deleted. For this, flow_offload_work_del() had to
> > be moved below flow_offload_tuple_stats().
>
> Hm, I think v1 was a simpler approach... except that you innecesarily
> modified a lot of callsites as Jakub pointed out. I don't see why you
> need this new callback for netdev_ops.
>

No new netdev_op, but I added the new stats field (fstats) since having a
separate field and flag guarantees that no other drivers gets broken and
no other drivers (that already count offloaded stats) need to be touched.

Looking at some drivers, tstats seems to be sometimes used by H/W drivers
and sometimes not, which just complicates the code semantics if we also use it
for "H/W flow offloaded" stats. Some of these drivers already read H/W
MIB registers.
So I think there is no way around adding a new flag to not break something.

But more importantly, I am not sure if there is no driver that sets
NETDEV_PCPU_STAT_TSTATS
**and** report its own offloaded flow stats. If we re-use tsats
for these, we will double count the offloaded traffic unless we set some
flag so that nft does not double-count.

Also, any driver that already counts offloaded flow traffic and uses
NETDEV_PCPU_STAT_DSTATS will break if we use tstas (union with dstats).

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-03-24 23:28 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-24 20:40 [PATCH nf-next v2 0/2] Update (DSA) netdev stats with offloaded flows Ahmed Zaki
2026-03-24 20:40 ` [PATCH nf-next v2 1/2] netfilter: flowtable: update netdev stats with HW_OFFLOAD flows Ahmed Zaki
2026-03-24 21:28   ` Pablo Neira Ayuso
2026-03-24 23:27     ` Ahmed Zaki
2026-03-24 20:40 ` [PATCH nf-next v2 2/2] net: dsa: update net_device stats with HW offloaded flows stats Ahmed Zaki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox