Netdev List
* Re: [PATCH bpf 1/2] bpf: Fix partial copy of non-linear skb test_run output
From: Paul Chaignon @ 2026-06-15 13:39 UTC (permalink / raw)
  To: Sun Jian
  Cc: bpf, netdev, linux-kselftest, linux-kernel, ast, daniel, andrii,
	martin.lau, eddyz87, memxor, song, yonghong.song, jolsa, shuah
In-Reply-To: <20260615073856.152479-2-sun.jian.kdev@gmail.com>

On Mon, Jun 15, 2026 at 03:38:55PM +0800, Sun Jian wrote:
> For non-linear skbs, bpf_test_finish() derives the linear head copy
> length from copy_size - frag_size. This only matches the skb head length
> when copy_size is the full packet size.
> 
> When userspace provides a short data_out buffer, copy_size is clamped to
> that buffer size. If copy_size is smaller than frag_size, the computed
> length becomes negative and bpf_test_finish() returns -ENOSPC before
> copying the packet prefix or updating data_size_out.

Thanks for fixing this!

> 
> Compute the linear head length from the skb layout instead, and clamp the
> head copy length to copy_size. This preserves the expected partial-copy
> semantics: return -ENOSPC, copy the packet prefix that fits in data_out,
> and report the full packet length through data_size_out.
> 
> Fixes: 838baa351cee ("bpf: Craft non-linear skbs in BPF_PROG_TEST_RUN")

Wouldn't this bug actually go back to 7855e0db150ad ("bpf: test_run:
add xdp_shared_info pointer in bpf_test_finish signature") and also
affect the XDP bpf_prog_test_run_xdp()? If so, could you also add a
selftest that covers it for XDP?

> Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
> ---
>  net/bpf/test_run.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 2bc04feadfab..976e8fa31bc9 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -453,19 +453,16 @@ static int bpf_test_finish(const union bpf_attr *kattr,
>  	}
>  
>  	if (data_out) {
> -		int len = sinfo ? copy_size - frag_size : copy_size;
> -
> -		if (len < 0) {
> -			err = -ENOSPC;
> -			goto out;
> -		}
> +		u32 head_len = size - frag_size;
> +		u32 len = min(copy_size, head_len);
>  
>  		if (copy_to_user(data_out, data, len))
>  			goto out;
>  
>  		if (sinfo) {
> -			int i, offset = len;
> +			u32 offset = len;
>  			u32 data_len;
> +			int i;
>  
>  			for (i = 0; i < sinfo->nr_frags; i++) {
>  				skb_frag_t *frag = &sinfo->frags[i];
> -- 
> 2.43.0
> 
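
To make the arithmetic concrete, here is a minimal userspace sketch of the
two head-length computations for the non-linear case. It is a model only,
not the kernel code: head_len_old() and head_len_new() are hypothetical
helper names, while size, copy_size and frag_size mirror the variables in
the quoted hunk.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Old computation (non-linear case): head length derived from the
 * clamped copy_size. Goes negative when copy_size < frag_size.
 */
static int head_len_old(uint32_t copy_size, uint32_t frag_size)
{
	return (int)copy_size - (int)frag_size;
}

/*
 * New computation: take the linear head length from the packet layout
 * (size - frag_size), then clamp it to what fits in the user buffer.
 */
static uint32_t head_len_new(uint32_t size, uint32_t copy_size,
			     uint32_t frag_size)
{
	uint32_t head_len;

	/* Assumption of this model: frags never exceed the packet size. */
	assert(frag_size <= size);

	head_len = size - frag_size;
	return copy_size < head_len ? copy_size : head_len;
}
```

With a 100-byte packet, 40 bytes of frags and a 16-byte data_out buffer,
the old expression yields -24 and takes the early -ENOSPC exit, while the
new one yields 16, so the 16-byte prefix can still be copied.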


* [PATCH nf-next v2 1/6] netfilter: nf_nat_ftp: replace u_int16_t with u16
From: Carlos Grillet @ 2026-06-15 13:38 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20260615133835.51273-1-carlos@carlosgrillet.me>

Use preferred kernel integer type u16 instead of the POSIX u_int16_t
variant.

No functional change.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 net/netfilter/nf_nat_ftp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_nat_ftp.c b/net/netfilter/nf_nat_ftp.c
index c92a436d9c48..ab714629e2b1 100644
--- a/net/netfilter/nf_nat_ftp.c
+++ b/net/netfilter/nf_nat_ftp.c
@@ -69,7 +69,7 @@ static unsigned int nf_nat_ftp(struct sk_buff *skb,
 			       struct nf_conntrack_expect *exp)
 {
 	union nf_inet_addr newaddr;
-	u_int16_t port;
+	u16 port;
 	int dir = CTINFO2DIR(ctinfo);
 	struct nf_conn *ct = exp->master;
 	char buffer[sizeof("|1||65535|") + INET6_ADDRSTRLEN];
-- 
2.54.0



* [PATCH nf-next v2 0/6] netfilter: replace u_int*_t with kernel int types
From: Carlos Grillet @ 2026-06-15 13:38 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, linux-kernel, netdev

Hi all! This is my first patch series of many, I hope :)
I'd like to start contributing by helping out with janitor work,
standardizing code and cleaning up.

This patch series replaces POSIX u_int8_t/u_int16_t with the preferred
kernel types u8/u16 across several netfilter files.

u_int*_t appears in many other files, 48 more to be precise, but I wanted
to keep this series small, unless advised otherwise.

No functional changes.

Changes in v2:
- addresses sashiko comments https://sashiko.dev/#/patchset/32368
  - nf_sockopt: update function prototypes and struct definitions
  - nf_log: update the corresponding function declarations and the 
    nf_logfn typedef
- link to v1: https://lore.kernel.org/all/20260612125146.75672-1-carlos@carlosgrillet.me

Carlos Grillet (6):
  netfilter: nf_nat_ftp: replace u_int16_t with u16
  netfilter: nf_nat_irc: replace u_int16_t with u16
  netfilter: nf_sockopt: replace u_int8_t with u8
  netfilter: xt_DSCP: replace u_int8_t with u8
  netfilter: xt_TCPOPTSTRIP: replace u_int8_t and u_int16_t with u8 and u16
  netfilter: nf_log: replace u_int8_t with u8

 include/linux/netfilter.h      |  6 +++---
 include/net/netfilter/nf_log.h | 16 ++++++++--------
 net/netfilter/nf_log.c         | 14 +++++++-------
 net/netfilter/nf_nat_ftp.c     |  2 +-
 net/netfilter/nf_nat_irc.c     |  2 +-
 net/netfilter/nf_sockopt.c     |  8 ++++----
 net/netfilter/xt_DSCP.c        |  8 ++++----
 net/netfilter/xt_TCPOPTSTRIP.c |  8 ++++----
 8 files changed, 32 insertions(+), 32 deletions(-)

-- 
2.54.0



* [PATCH nf-next v2 3/6] netfilter: nf_sockopt: replace u_int8_t with u8
From: Carlos Grillet @ 2026-06-15 13:38 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, linux-kernel, netdev
In-Reply-To: <20260615133835.51273-1-carlos@carlosgrillet.me>

Replace POSIX u_int8_t with preferred kernel type u8, update prototype
and struct definition.

No functional changes.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 include/linux/netfilter.h  | 6 +++---
 net/netfilter/nf_sockopt.c | 8 ++++----
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index efbbfa770d66..91b68bdba3f5 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -181,7 +181,7 @@ static inline void nf_hook_state_init(struct nf_hook_state *p,
 struct nf_sockopt_ops {
 	struct list_head list;
 
-	u_int8_t pf;
+	u8 pf;
 
 	/* Non-inclusive ranges: use 0/0/NULL to never get called. */
 	int set_optmin;
@@ -357,9 +357,9 @@ NF_HOOK_LIST(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
 }
 
 /* Call setsockopt() */
-int nf_setsockopt(struct sock *sk, u_int8_t pf, int optval, sockptr_t opt,
+int nf_setsockopt(struct sock *sk, u8 pf, int optval, sockptr_t opt,
 		  unsigned int len);
-int nf_getsockopt(struct sock *sk, u_int8_t pf, int optval, char __user *opt,
+int nf_getsockopt(struct sock *sk, u8 pf, int optval, char __user *opt,
 		  int *len);
 
 struct flowi;
diff --git a/net/netfilter/nf_sockopt.c b/net/netfilter/nf_sockopt.c
index 34afcd03b6f6..19a1d028158c 100644
--- a/net/netfilter/nf_sockopt.c
+++ b/net/netfilter/nf_sockopt.c
@@ -59,8 +59,8 @@ void nf_unregister_sockopt(struct nf_sockopt_ops *reg)
 }
 EXPORT_SYMBOL(nf_unregister_sockopt);
 
-static struct nf_sockopt_ops *nf_sockopt_find(struct sock *sk, u_int8_t pf,
-		int val, int get)
+static struct nf_sockopt_ops *nf_sockopt_find(struct sock *sk, u8 pf,
+					      int val, int get)
 {
 	struct nf_sockopt_ops *ops;
 
@@ -89,7 +89,7 @@ static struct nf_sockopt_ops *nf_sockopt_find(struct sock *sk, u_int8_t pf,
 	return ops;
 }
 
-int nf_setsockopt(struct sock *sk, u_int8_t pf, int val, sockptr_t opt,
+int nf_setsockopt(struct sock *sk, u8 pf, int val, sockptr_t opt,
 		  unsigned int len)
 {
 	struct nf_sockopt_ops *ops;
@@ -104,7 +104,7 @@ int nf_setsockopt(struct sock *sk, u_int8_t pf, int val, sockptr_t opt,
 }
 EXPORT_SYMBOL(nf_setsockopt);
 
-int nf_getsockopt(struct sock *sk, u_int8_t pf, int val, char __user *opt,
+int nf_getsockopt(struct sock *sk, u8 pf, int val, char __user *opt,
 		  int *len)
 {
 	struct nf_sockopt_ops *ops;
-- 
2.54.0



* [PATCH nf-next v2 2/6] netfilter: nf_nat_irc: replace u_int16_t with u16
From: Carlos Grillet @ 2026-06-15 13:38 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20260615133835.51273-1-carlos@carlosgrillet.me>

Replace POSIX u_int16_t with preferred kernel type u16.

No functional changes.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 net/netfilter/nf_nat_irc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_nat_irc.c b/net/netfilter/nf_nat_irc.c
index 19c4fcc60c50..14b79cb0171b 100644
--- a/net/netfilter/nf_nat_irc.c
+++ b/net/netfilter/nf_nat_irc.c
@@ -39,7 +39,7 @@ static unsigned int help(struct sk_buff *skb,
 	char buffer[sizeof("4294967296 65635")];
 	struct nf_conn *ct = exp->master;
 	union nf_inet_addr newaddr;
-	u_int16_t port;
+	u16 port;
 
 	/* Reply comes from server. */
 	newaddr = ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3;
-- 
2.54.0



* [PATCH nf-next v2 5/6] netfilter: xt_TCPOPTSTRIP: replace u_int8_t and u_int16_t with u8 and u16
From: Carlos Grillet @ 2026-06-15 13:38 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20260615133835.51273-1-carlos@carlosgrillet.me>

Replace POSIX u_int8_t/u_int16_t with preferred kernel types u8/u16.

No functional changes.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 net/netfilter/xt_TCPOPTSTRIP.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/xt_TCPOPTSTRIP.c b/net/netfilter/xt_TCPOPTSTRIP.c
index 93f064306901..265d21697847 100644
--- a/net/netfilter/xt_TCPOPTSTRIP.c
+++ b/net/netfilter/xt_TCPOPTSTRIP.c
@@ -16,7 +16,7 @@
 #include <linux/netfilter/x_tables.h>
 #include <linux/netfilter/xt_TCPOPTSTRIP.h>
 
-static inline unsigned int optlen(const u_int8_t *opt, unsigned int offset)
+static inline unsigned int optlen(const u8 *opt, unsigned int offset)
 {
 	/* Beware zero-length options: make finite progress */
 	if (opt[offset] <= TCPOPT_NOP || opt[offset+1] == 0)
@@ -33,8 +33,8 @@ tcpoptstrip_mangle_packet(struct sk_buff *skb,
 	const struct xt_tcpoptstrip_target_info *info = par->targinfo;
 	struct tcphdr *tcph, _th;
 	unsigned int optl, i, j;
-	u_int16_t n, o;
-	u_int8_t *opt;
+	u16 n, o;
+	u8 *opt;
 	int tcp_hdrlen;
 
 	/* This is a fragment, no TCP header is available */
@@ -97,7 +97,7 @@ tcpoptstrip_tg6(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	struct ipv6hdr *ipv6h = ipv6_hdr(skb);
 	int tcphoff;
-	u_int8_t nexthdr;
+	u8 nexthdr;
 	__be16 frag_off;
 
 	nexthdr = ipv6h->nexthdr;
-- 
2.54.0



* [PATCH nf-next v2 4/6] netfilter: xt_DSCP: replace u_int8_t with u8
From: Carlos Grillet @ 2026-06-15 13:38 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20260615133835.51273-1-carlos@carlosgrillet.me>

Replace POSIX u_int8_t with preferred kernel type u8.

No functional changes.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 net/netfilter/xt_DSCP.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/xt_DSCP.c b/net/netfilter/xt_DSCP.c
index cfa44515ab72..76231e1dc5b5 100644
--- a/net/netfilter/xt_DSCP.c
+++ b/net/netfilter/xt_DSCP.c
@@ -30,7 +30,7 @@ static unsigned int
 dscp_tg(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct xt_DSCP_info *dinfo = par->targinfo;
-	u_int8_t dscp = ipv4_get_dsfield(ip_hdr(skb)) >> XT_DSCP_SHIFT;
+	u8 dscp = ipv4_get_dsfield(ip_hdr(skb)) >> XT_DSCP_SHIFT;
 
 	if (dscp != dinfo->dscp) {
 		if (skb_ensure_writable(skb, sizeof(struct iphdr)))
@@ -47,7 +47,7 @@ static unsigned int
 dscp_tg6(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct xt_DSCP_info *dinfo = par->targinfo;
-	u_int8_t dscp = ipv6_get_dsfield(ipv6_hdr(skb)) >> XT_DSCP_SHIFT;
+	u8 dscp = ipv6_get_dsfield(ipv6_hdr(skb)) >> XT_DSCP_SHIFT;
 
 	if (dscp != dinfo->dscp) {
 		if (skb_ensure_writable(skb, sizeof(struct ipv6hdr)))
@@ -73,7 +73,7 @@ tos_tg(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct xt_tos_target_info *info = par->targinfo;
 	struct iphdr *iph = ip_hdr(skb);
-	u_int8_t orig, nv;
+	u8 orig, nv;
 
 	orig = ipv4_get_dsfield(iph);
 	nv   = (orig & ~info->tos_mask) ^ info->tos_value;
@@ -93,7 +93,7 @@ tos_tg6(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct xt_tos_target_info *info = par->targinfo;
 	struct ipv6hdr *iph = ipv6_hdr(skb);
-	u_int8_t orig, nv;
+	u8 orig, nv;
 
 	orig = ipv6_get_dsfield(iph);
 	nv   = (orig & ~info->tos_mask) ^ info->tos_value;
-- 
2.54.0



* [PATCH nf-next v2 6/6] netfilter: nf_log: replace u_int8_t with u8
From: Carlos Grillet @ 2026-06-15 13:38 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20260615133835.51273-1-carlos@carlosgrillet.me>

Replace POSIX u_int8_t with preferred kernel type u8, and update the
typedef and declarations in include/net/netfilter/nf_log.h.

No functional changes.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 include/net/netfilter/nf_log.h | 16 ++++++++--------
 net/netfilter/nf_log.c         | 14 +++++++-------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/include/net/netfilter/nf_log.h b/include/net/netfilter/nf_log.h
index 00506792a06d..cff636f29f45 100644
--- a/include/net/netfilter/nf_log.h
+++ b/include/net/netfilter/nf_log.h
@@ -37,7 +37,7 @@ struct nf_loginfo {
 };
 
 typedef void nf_logfn(struct net *net,
-		      u_int8_t pf,
+		      u8 pf,
 		      unsigned int hooknum,
 		      const struct sk_buff *skb,
 		      const struct net_device *in,
@@ -56,18 +56,18 @@ struct nf_logger {
 extern int sysctl_nf_log_all_netns;
 
 /* Function to register/unregister log function. */
-int nf_log_register(u_int8_t pf, struct nf_logger *logger);
+int nf_log_register(u8 pf, struct nf_logger *logger);
 void nf_log_unregister(struct nf_logger *logger);
 
 /* Check if any logger is registered for a given protocol family. */
-bool nf_log_is_registered(u_int8_t pf);
+bool nf_log_is_registered(u8 pf);
 
-int nf_log_set(struct net *net, u_int8_t pf, const struct nf_logger *logger);
+int nf_log_set(struct net *net, u8 pf, const struct nf_logger *logger);
 void nf_log_unset(struct net *net, const struct nf_logger *logger);
 
-int nf_log_bind_pf(struct net *net, u_int8_t pf,
+int nf_log_bind_pf(struct net *net, u8 pf,
 		   const struct nf_logger *logger);
-void nf_log_unbind_pf(struct net *net, u_int8_t pf);
+void nf_log_unbind_pf(struct net *net, u8 pf);
 
 int nf_logger_find_get(int pf, enum nf_log_type type);
 void nf_logger_put(int pf, enum nf_log_type type);
@@ -78,7 +78,7 @@ void nf_logger_put(int pf, enum nf_log_type type);
 /* Calls the registered backend logging function */
 __printf(8, 9)
 void nf_log_packet(struct net *net,
-		   u_int8_t pf,
+		   u8 pf,
 		   unsigned int hooknum,
 		   const struct sk_buff *skb,
 		   const struct net_device *in,
@@ -88,7 +88,7 @@ void nf_log_packet(struct net *net,
 
 __printf(8, 9)
 void nf_log_trace(struct net *net,
-		  u_int8_t pf,
+		  u8 pf,
 		  unsigned int hooknum,
 		  const struct sk_buff *skb,
 		  const struct net_device *in,
diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c
index f4d80654dfe6..978e082a91b5 100644
--- a/net/netfilter/nf_log.c
+++ b/net/netfilter/nf_log.c
@@ -42,7 +42,7 @@ static struct nf_logger *__find_logger(int pf, const char *str_logger)
 	return NULL;
 }
 
-int nf_log_set(struct net *net, u_int8_t pf, const struct nf_logger *logger)
+int nf_log_set(struct net *net, u8 pf, const struct nf_logger *logger)
 {
 	const struct nf_logger *log;
 
@@ -76,7 +76,7 @@ void nf_log_unset(struct net *net, const struct nf_logger *logger)
 EXPORT_SYMBOL(nf_log_unset);
 
 /* return EEXIST if the same logger is registered, 0 on success. */
-int nf_log_register(u_int8_t pf, struct nf_logger *logger)
+int nf_log_register(u8 pf, struct nf_logger *logger)
 {
 	int i;
 	int ret = 0;
@@ -133,7 +133,7 @@ EXPORT_SYMBOL(nf_log_unregister);
  *
  * Returns: true if at least one logger is active for @pf, false otherwise.
  */
-bool nf_log_is_registered(u_int8_t pf)
+bool nf_log_is_registered(u8 pf)
 {
 	int i;
 
@@ -151,7 +151,7 @@ bool nf_log_is_registered(u_int8_t pf)
 }
 EXPORT_SYMBOL(nf_log_is_registered);
 
-int nf_log_bind_pf(struct net *net, u_int8_t pf,
+int nf_log_bind_pf(struct net *net, u8 pf,
 		   const struct nf_logger *logger)
 {
 	if (pf >= ARRAY_SIZE(net->nf.nf_loggers))
@@ -167,7 +167,7 @@ int nf_log_bind_pf(struct net *net, u_int8_t pf,
 }
 EXPORT_SYMBOL(nf_log_bind_pf);
 
-void nf_log_unbind_pf(struct net *net, u_int8_t pf)
+void nf_log_unbind_pf(struct net *net, u8 pf)
 {
 	if (pf >= ARRAY_SIZE(net->nf.nf_loggers))
 		return;
@@ -235,7 +235,7 @@ void nf_logger_put(int pf, enum nf_log_type type)
 EXPORT_SYMBOL_GPL(nf_logger_put);
 
 void nf_log_packet(struct net *net,
-		   u_int8_t pf,
+		   u8 pf,
 		   unsigned int hooknum,
 		   const struct sk_buff *skb,
 		   const struct net_device *in,
@@ -264,7 +264,7 @@ void nf_log_packet(struct net *net,
 EXPORT_SYMBOL(nf_log_packet);
 
 void nf_log_trace(struct net *net,
-		  u_int8_t pf,
+		  u8 pf,
 		  unsigned int hooknum,
 		  const struct sk_buff *skb,
 		  const struct net_device *in,
-- 
2.54.0



* Re: [PATCH net-next v4 2/4] net: openvswitch: add per-flow_table lockdep checks
From: Eelco Chaudron @ 2026-06-15 13:55 UTC (permalink / raw)
  To: Adrian Moreno
  Cc: netdev, aconole, pabeni, Ilya Maximets, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Simon Horman, open list:OPENVSWITCH,
	open list
In-Reply-To: <20260611045817.1302665-3-amorenoz@redhat.com>

On 11 Jun 2026, at 6:58, Adrian Moreno wrote:

> A future patch will introduce a per-flow_table mutex that will protect
> flow operations independently. In preparation for that, this patch
> introduces a flow_table lockdep macro, and modifies some function
> signatures to allow lockdep assertions to run.
>
> For now, the actual lockdep check logic is a no-op, but adding the
> infrastructure helps reduce the size of the upcoming patch.
>
> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>

Hi Adrian,

See some comments below, and maybe address the Sashiko comment in the
commit message.

Cheers,

Eelco

[...]

> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index e78c28dd5d9d..72ad3ed12675 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -840,15 +840,16 @@ static size_t ovs_flow_cmd_msg_size(const struct sw_flow_actions *acts,
>  		+ nla_total_size_64bit(8); /* OVS_FLOW_ATTR_USED */
>  }
>
> -/* Called with ovs_mutex or RCU read lock. */
> +/* Called with table->lock or RCU read lock. */
>  static int ovs_flow_cmd_fill_stats(const struct sw_flow *flow,
> +				   const struct flow_table *table,

Should 'table' come before 'flow' to be consistent with
ovs_flow_tbl_insert() and ovs_flow_tbl_remove()? This applies
to all functions in this patch adding the 'table' parameter.

>  				   struct sk_buff *skb)
>  {
>  	struct ovs_flow_stats stats;
>  	__be16 tcp_flags;
>  	unsigned long used;

[...]

> diff --git a/net/openvswitch/flow_table.h b/net/openvswitch/flow_table.h
> index 6211bcc72655..3e5e9845c28a 100644
> --- a/net/openvswitch/flow_table.h
> +++ b/net/openvswitch/flow_table.h
> @@ -72,6 +72,22 @@ struct flow_table {
>
>  extern struct kmem_cache *flow_stats_cache;
>
> +static inline int lockdep_ovs_tbl_is_held(const struct flow_table *table
> +					  __always_unused)
> +{
> +	return 1;
> +}
> +
> +#define ASSERT_OVS_TBL(tbl)   WARN_ON(!lockdep_ovs_tbl_is_held(tbl))
> +
> +/* Lock-protected update-allowed dereferences.*/
> +#define ovs_tbl_dereference(p, tbl)	\
> +	rcu_dereference_protected(p, lockdep_ovs_tbl_is_held(tbl))
> +
> +/* Read dereferences can be protected by either RCU, table lock. */

nit: This comment reads odd, maybe:

/* Read dereferences can be protected by either RCU or table lock. */

> +#define rcu_dereference_ovs_tbl(p, tbl) \
> +	rcu_dereference_check(p, lockdep_ovs_tbl_is_held(tbl))
> +
>  int ovs_flow_init(void);
>  void ovs_flow_exit(void);



* Re: [PATCH net-next v4 4/4] net: openvswitch: avoid double-rcu wait period
From: Eelco Chaudron @ 2026-06-15 13:56 UTC (permalink / raw)
  To: Adrian Moreno
  Cc: netdev, aconole, pabeni, Ilya Maximets, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Simon Horman, open list:OPENVSWITCH,
	open list
In-Reply-To: <20260611045817.1302665-5-amorenoz@redhat.com>

On 11 Jun 2026, at 6:58, Adrian Moreno wrote:

> Avoid waiting for two rcu periods by scheduling the deletion of the
> flow_table and all of its referenced structs at the same time.

Hi Adrian,

One small nit below; the code itself looks fine to me.

Cheers,

Eelco

> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
> ---
>  net/openvswitch/flow_table.c | 27 +++++++++++++++------------
>  1 file changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> index 3934873a44c3..35232e1af8aa 100644
> --- a/net/openvswitch/flow_table.c
> +++ b/net/openvswitch/flow_table.c
> @@ -527,30 +527,33 @@ static void table_instance_destroy(struct table_instance *ti,
>  	call_rcu(&ufid_ti->rcu, flow_tbl_destroy_rcu_cb);
>  }
>
> -/* No need for locking this function is called from RCU callback. */
>  static void ovs_flow_tbl_destroy_rcu(struct rcu_head *rcu)
>  {
>  	struct flow_table *table = container_of(rcu, struct flow_table, rcu);
>
> -	struct table_instance *ti = rcu_dereference_raw(table->ti);
> -	struct table_instance *ufid_ti = rcu_dereference_raw(table->ufid_ti);
> -	struct mask_cache *mc = rcu_dereference_raw(table->mask_cache);
> -	struct mask_array *ma = rcu_dereference_raw(table->mask_array);
> -
> -	call_rcu(&mc->rcu, mask_cache_rcu_cb);
> -	call_rcu(&ma->rcu, mask_array_rcu_cb);
> -	table_instance_destroy(ti, ufid_ti);
>  	mutex_destroy(&table->lock);
>  	kfree(table);
>  }
>
>  void ovs_flow_tbl_put(struct flow_table *table)
>  {
> +	struct table_instance *ufid_ti;
> +	struct table_instance *ti;
> +	struct mask_cache *mc;
> +	struct mask_array *ma;
> +
>  	if (refcount_dec_and_test(&table->refcnt)) {
>  		mutex_lock(&table->lock);
> -		table_instance_flow_flush(table,
> -					  ovs_tbl_dereference(table->ti, table),
> -					  ovs_tbl_dereference(table->ufid_ti, table));
> +		ufid_ti = ovs_tbl_dereference(table->ufid_ti, table);
> +		ti = ovs_tbl_dereference(table->ti, table);
> +		table_instance_flow_flush(table, ti, ufid_ti);
> +		table_instance_destroy(ti, ufid_ti);
> +
> +		mc = ovs_tbl_dereference(table->mask_cache, table);
> +		ma = ovs_tbl_dereference(table->mask_array, table);
> +		call_rcu(&mc->rcu, mask_cache_rcu_cb);
> +		call_rcu(&ma->rcu, mask_array_rcu_cb);

nit: Would it be cleaner to extract the destruction logic in
ovs_flow_tbl_put() into a separate static function, e.g.
ovs_flow_tbl_destroy(), to separate the refcount handling
from the actual cleanup?

>  		mutex_unlock(&table->lock);
>  		call_rcu(&table->rcu, ovs_flow_tbl_destroy_rcu);
>  	}
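
The split suggested in the nit above could look roughly like the following
userspace sketch. It is an illustration under stated assumptions, not the
OVS code: struct table, table_get(), table_put(), table_destroy() and the
tables_destroyed counter are all hypothetical, C11 atomics stand in for the
kernel refcount API, and the cleanup runs immediately instead of being
deferred through call_rcu().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct table {
	atomic_int refcnt;
	int *flows;		/* stand-in for the table's resources */
};

static int tables_destroyed;	/* demo-only counter */

static struct table *table_alloc(void)
{
	struct table *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	atomic_init(&t->refcnt, 1);	/* the creator holds one reference */
	t->flows = calloc(4, sizeof(*t->flows));
	return t;
}

/* All cleanup lives here, separate from the reference counting. */
static void table_destroy(struct table *t)
{
	free(t->flows);
	free(t);
	tables_destroyed++;
}

static void table_get(struct table *t)
{
	assert(t != NULL);
	atomic_fetch_add(&t->refcnt, 1);
}

/* Only decides whether this was the last reference. */
static void table_put(struct table *t)
{
	/* atomic_fetch_sub() returns the value before the decrement. */
	if (atomic_fetch_sub(&t->refcnt, 1) == 1)
		table_destroy(t);
}
```

Keeping table_put() down to the decrement-and-test makes the teardown
order readable in one place, which is the point of the nit.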



* Re: [PATCH net-next v4 3/4] net: openvswitch: decouple flow_table from ovs_mutex
From: Eelco Chaudron @ 2026-06-15 13:55 UTC (permalink / raw)
  To: Adrian Moreno
  Cc: netdev, aconole, pabeni, Ilya Maximets, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Simon Horman, open list:OPENVSWITCH,
	open list
In-Reply-To: <20260611045817.1302665-4-amorenoz@redhat.com>

On 11 Jun 2026, at 6:58, Adrian Moreno wrote:

> In order to protect flow operations from RTNL contention, this patch
> decouples flow_table modifications from ovs_mutex by means of the
> following:
>
> 1 - Create a new mutex inside the flow_table that protects it from
> concurrent modifications.
> Putting the mutex inside flow_table makes it easier to consume for
> functions inside flow_table.c that do not currently take pointers to the
> datapath.
> Some function signatures need to be changed to accept flow_table so that
> lockdep checks can be performed.
>
> 2 - Create a reference count to temporarily extend rcu protection from
> the datapath to the flow_table.
> One reference is held by the datapath, the other is temporarily
> increased during flow modifications.
>
> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>

Hi Adrian,

Thanks for this patch. I did spend quite some time analyzing
the implications of decoupling the table from ovs_mutex, and
I think all the concerns I had are covered. I did not do any
extensive traffic-based testing, but if time allows I will do
so with your next revision.

Find some comments below.

Cheers,

Eelco

[...]


> @@ -1678,8 +1722,12 @@ static int ovs_dp_cmd_fill_info(struct datapath *dp, struct sk_buff *skb,
>  	if (nla_put_u32(skb, OVS_DP_ATTR_USER_FEATURES, dp->user_features))
>  		goto nla_put_failure;
>
> -	if (nla_put_u32(skb, OVS_DP_ATTR_MASKS_CACHE_SIZE,
> -			ovs_flow_tbl_masks_cache_size(table)))
> +	rcu_read_lock();
> +	table = rcu_dereference(dp->table);
> +	err = table ? nla_put_u32(skb, OVS_DP_ATTR_MASKS_CACHE_SIZE,
> +				  ovs_flow_tbl_masks_cache_size(table)) : 0;
> +	rcu_read_unlock();
> +	if (err)
>  		goto nla_put_failure;
>
>  	if (dp->user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU && pids) {
> @@ -1817,7 +1865,9 @@ static int ovs_dp_change(struct datapath *dp, struct nlattr *a[])
>  			return -ENODEV;
>
>  		cache_size = nla_get_u32(a[OVS_DP_ATTR_MASKS_CACHE_SIZE]);
> +		mutex_lock(&table->lock);
>  		err = ovs_flow_tbl_masks_cache_resize(table, cache_size);
> +		mutex_unlock(&table->lock);

The locking scheme in flow_table.h does not document that
ovs_mutex-held writers may skip the refcount steps. Should
this either be documented as an exception, or should the
full 7-step protocol be followed here for consistency?

>  		if (err)
>  			return err;
>  	}

[...]

> @@ -2656,9 +2701,12 @@ static void ovs_dp_masks_rebalance(struct work_struct *work)
>  	ovs_lock();
>  	list_for_each_entry(dp, &ovs_net->dps, list_node) {
>  		table = ovsl_dereference(dp->table);
> -		if (!table)
> +		if (!table || !ovs_flow_tbl_get(table))
>  			continue;
> +		mutex_lock(&table->lock);
>  		ovs_flow_masks_rebalance(table);
> +		mutex_unlock(&table->lock);

Same question as above, but here the RCU steps are skipped
instead of the refcount steps.

> +		ovs_flow_tbl_put(table);
>  	}
>  	ovs_unlock();
>
>

[...]

> @@ -513,7 +528,7 @@ static void table_instance_destroy(struct table_instance *ti,
>  }
>
>  /* No need for locking this function is called from RCU callback. */
> -void ovs_flow_tbl_destroy_rcu(struct rcu_head *rcu)
> +static void ovs_flow_tbl_destroy_rcu(struct rcu_head *rcu)
>  {
>  	struct flow_table *table = container_of(rcu, struct flow_table, rcu);
>
> @@ -525,9 +540,22 @@ void ovs_flow_tbl_destroy_rcu(struct rcu_head *rcu)
>  	call_rcu(&mc->rcu, mask_cache_rcu_cb);
>  	call_rcu(&ma->rcu, mask_array_rcu_cb);
>  	table_instance_destroy(ti, ufid_ti);
> +	mutex_destroy(&table->lock);
>  	kfree(table);
>  }
>
> +void ovs_flow_tbl_put(struct flow_table *table)
> +{
> +	if (refcount_dec_and_test(&table->refcnt)) {

Do we need a comment explaining why the 7-step protocol is not
followed here? Something like:

    /* Last reference dropped, no concurrent writers possible.
     * Lock is only needed for lockdep assertions below.
     */

> +		mutex_lock(&table->lock);
> +		table_instance_flow_flush(table,
> +					  ovs_tbl_dereference(table->ti, table),
> +					  ovs_tbl_dereference(table->ufid_ti, table));
> +		mutex_unlock(&table->lock);
> +		call_rcu(&table->rcu, ovs_flow_tbl_destroy_rcu);
> +	}
> +}
> +
>  struct sw_flow *ovs_flow_tbl_dump_next(struct table_instance *ti,
>  				       u32 *bucket, u32 *last)
>  {

[...]



* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Sebastian Andrzej Siewior @ 2026-06-15 13:56 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Breno Leitao,
	Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel,
	stable, Frederic Weisbecker
In-Reply-To: <20260611191114.5bc43a59@kernel.org>

On 2026-06-11 19:11:14 [-0700], Jakub Kicinski wrote:
> Please trim the pages of slop in the commit message and the comments.
> 
> On Wed, 10 Jun 2026 11:36:21 -0700 Vlad Poenaru wrote:
> > @@ -194,11 +194,56 @@ void netpoll_poll_dev(struct net_device *dev)
> > +	local_bh_disable();
> > + 	poll_napi(dev);
> > +	_local_bh_enable();
> 
> tglx, Sebastian, are you okay with using _local_bh_enable() to trick
> softirq into not waking ksoftirqd? The problematic path is:

I planned to get to this today but I won't make it. I'll try to get to
this as soon as I can…

Sebastian


* Re: [PATCH stable 6.6.y v3 0/4] bpf: linked scalar precision fixes
From: Sasha Levin @ 2026-06-15 14:02 UTC (permalink / raw)
  To: bpf
  Cc: Sasha Levin, netdev, linux-kernel, ast, daniel, john.fastabend,
	andrii, martin.lau, song, yonghong.song, kpsingh, haoluo, jolsa,
	menglong8.dong, eddyz87, shung-hsi.yu, stable, mykolal, tamird,
	Zhenzhong Wu
In-Reply-To: <cover.1781194510.git.jt26wzz@gmail.com>

On Mon, Jun 15, 2026 at 00:58:37AM +0800, Zhenzhong Wu wrote:
> This v3 targets 6.6.y and changes the backport strategy based on review
> feedback on v2.

Queued all four for 6.6.y, thanks.

--
Thanks,
Sasha


* [PATCH net-next] gre: fix ERSPAN o_flags race/corruption in xmit and fill_info
From: Eric Dumazet @ 2026-06-15 14:03 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Ido Schimmel, David Ahern, netdev, eric.dumazet,
	Eric Dumazet

For IPv4 ERSPAN:
In erspan_xmit(), the driver clears IP_TUNNEL_SEQ_BIT (for version 0)
and IP_TUNNEL_KEY_BIT directly in the shared tunnel->parms.o_flags
structure. Since transmit paths can run locklessly and concurrently,
this leads to a data race.

Furthermore, modifying tunnel->parms.o_flags permanently alters the
tunnel configuration. To work around this, erspan_fill_info() (which
reports config to userspace) was setting IP_TUNNEL_KEY_BIT back. If
erspan_fill_info (running under RTNL) and erspan_xmit (running locklessly)
race, erspan_xmit might see IP_TUNNEL_KEY_BIT set when it shouldn't,
leading to GRE header corruption (injecting a key field into the ERSPAN
GRE header).

Fix this by:
1) Passing flags as an argument to __gre_xmit().
2) Using local flags in erspan_xmit() and passing them to __gre_xmit().
3) Removing the racy modification of t->parms.o_flags in erspan_fill_info().
4) Forcing IP_TUNNEL_KEY_BIT in the reported flags for ERSPAN locally
   in ipgre_fill_info().
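
The IPv4 part of the fix boils down to "copy the shared flags, then edit
the copy". A minimal userspace sketch of that pattern follows. It is a
model, not the driver code: struct tunnel_cfg, xmit_flags() and the bit
numbers are hypothetical, and a single unsigned long stands in for the
kernel's tunnel flags bitmap.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bit numbers, chosen only for this model. */
enum { TNL_KEY_BIT = 2, TNL_SEQ_BIT = 3 };

struct tunnel_cfg {
	unsigned long o_flags[1];	/* shared, long-lived configuration */
	int erspan_ver;
};

/*
 * Build the per-packet flags in a caller-provided local buffer.
 * The const qualifier documents that the shared config is read-only here.
 */
static void xmit_flags(const struct tunnel_cfg *t, unsigned long *flags)
{
	assert(t != NULL && flags != NULL);

	memcpy(flags, t->o_flags, sizeof(t->o_flags));

	/* Version 0 carries no sequence number. */
	if (t->erspan_ver == 0)
		flags[0] &= ~(1UL << TNL_SEQ_BIT);

	/* The key is never emitted in the ERSPAN GRE header. */
	flags[0] &= ~(1UL << TNL_KEY_BIT);
}
```

Because only the local copy is modified, concurrent transmitters and a
reader of the configuration all see the same unchanged o_flags.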

For IPv6 ERSPAN:
ip6erspan_tunnel_xmit() was locklessly clearing IP_TUNNEL_KEY_BIT in
t->parms.o_flags even though it does not use these flags for building
the GRE header (it uses local flags). This permanently corrupts the
configuration and races with ip6gre_fill_info() which reads it.

Remove the redundant and racy modification.
This should remove false sharing in a fast path.

Add const qualifiers in ipgre_fill_info(), erspan_fill_info()
and ip6gre_fill_info() to clarify that these methods are not
supposed to write any live parameters.

Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN")
Fixes: ee496694b9ee ("ip_gre: do not report erspan version on GRE interface")
Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
I found this issue while working on RTNL-less fill_info().
Sent to net-next since 7.1 was just released.

 net/ipv4/ip_gre.c  | 30 +++++++++++++++---------------
 net/ipv6/ip6_gre.c |  5 ++---
 2 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 208dd48012d963b9df0eddbdda73dd319930e48f..eab6d228d062b97b6f3f9d03418b84bac12b6983 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -475,12 +475,9 @@ static int gre_rcv(struct sk_buff *skb)
 
 static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
 		       const struct iphdr *tnl_params,
-		       __be16 proto)
+		       __be16 proto, const unsigned long *flags)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
-	IP_TUNNEL_DECLARE_FLAGS(flags);
-
-	ip_tunnel_flags_copy(flags, tunnel->parms.o_flags);
 
 	/* Push GRE header. */
 	gre_build_header(skb, tunnel->tun_hlen,
@@ -692,7 +689,7 @@ static netdev_tx_t ipgre_xmit(struct sk_buff *skb,
 					      tunnel->parms.o_flags)))
 		goto free_skb;
 
-	__gre_xmit(skb, dev, tnl_params, skb->protocol);
+	__gre_xmit(skb, dev, tnl_params, skb->protocol, tunnel->parms.o_flags);
 	return NETDEV_TX_OK;
 
 free_skb:
@@ -705,6 +702,7 @@ static netdev_tx_t erspan_xmit(struct sk_buff *skb,
 			       struct net_device *dev)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
+	IP_TUNNEL_DECLARE_FLAGS(flags);
 	bool truncate = false;
 	__be16 proto;
 
@@ -728,10 +726,12 @@ static netdev_tx_t erspan_xmit(struct sk_buff *skb,
 		truncate = true;
 	}
 
+	ip_tunnel_flags_copy(flags, tunnel->parms.o_flags);
+
 	/* Push ERSPAN header */
 	if (tunnel->erspan_ver == 0) {
 		proto = htons(ETH_P_ERSPAN);
-		__clear_bit(IP_TUNNEL_SEQ_BIT, tunnel->parms.o_flags);
+		__clear_bit(IP_TUNNEL_SEQ_BIT, flags);
 	} else if (tunnel->erspan_ver == 1) {
 		erspan_build_header(skb, ntohl(tunnel->parms.o_key),
 				    tunnel->index,
@@ -746,8 +746,8 @@ static netdev_tx_t erspan_xmit(struct sk_buff *skb,
 		goto free_skb;
 	}
 
-	__clear_bit(IP_TUNNEL_KEY_BIT, tunnel->parms.o_flags);
-	__gre_xmit(skb, dev, &tunnel->parms.iph, proto);
+	__clear_bit(IP_TUNNEL_KEY_BIT, flags);
+	__gre_xmit(skb, dev, &tunnel->parms.iph, proto, flags);
 	return NETDEV_TX_OK;
 
 free_skb:
@@ -776,7 +776,7 @@ static netdev_tx_t gre_tap_xmit(struct sk_buff *skb,
 	if (skb_cow_head(skb, dev->needed_headroom))
 		goto free_skb;
 
-	__gre_xmit(skb, dev, &tunnel->parms.iph, htons(ETH_P_TEB));
+	__gre_xmit(skb, dev, &tunnel->parms.iph, htons(ETH_P_TEB), tunnel->parms.o_flags);
 	return NETDEV_TX_OK;
 
 free_skb:
@@ -1554,12 +1554,15 @@ static size_t ipgre_get_size(const struct net_device *dev)
 
 static int ipgre_fill_info(struct sk_buff *skb, const struct net_device *dev)
 {
-	struct ip_tunnel *t = netdev_priv(dev);
-	struct ip_tunnel_parm_kern *p = &t->parms;
+	const struct ip_tunnel *t = netdev_priv(dev);
+	const struct ip_tunnel_parm_kern *p = &t->parms;
 	IP_TUNNEL_DECLARE_FLAGS(o_flags);
 
 	ip_tunnel_flags_copy(o_flags, p->o_flags);
 
+	if (t->erspan_ver != 0 && !t->collect_md)
+		__set_bit(IP_TUNNEL_KEY_BIT, o_flags);
+
 	if (nla_put_u32(skb, IFLA_GRE_LINK, p->link) ||
 	    nla_put_be16(skb, IFLA_GRE_IFLAGS,
 			 gre_tnl_flags_to_gre_flags(p->i_flags)) ||
@@ -1602,12 +1605,9 @@ static int ipgre_fill_info(struct sk_buff *skb, const struct net_device *dev)
 
 static int erspan_fill_info(struct sk_buff *skb, const struct net_device *dev)
 {
-	struct ip_tunnel *t = netdev_priv(dev);
+	const struct ip_tunnel *t = netdev_priv(dev);
 
 	if (t->erspan_ver <= 2) {
-		if (t->erspan_ver != 0 && !t->collect_md)
-			__set_bit(IP_TUNNEL_KEY_BIT, t->parms.o_flags);
-
 		if (nla_put_u8(skb, IFLA_GRE_ERSPAN_VER, t->erspan_ver))
 			goto nla_put_failure;
 
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 795be59946f7210bfae55d20500d18c83c01ede9..d0701351934ccfcfbac70bb2c2c2a7ccb9b6d779 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -964,7 +964,6 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
 	if (skb_cow_head(skb, dev->needed_headroom ?: t->hlen))
 		goto tx_err;
 
-	__clear_bit(IP_TUNNEL_KEY_BIT, t->parms.o_flags);
 	IPCB(skb)->flags = 0;
 
 	/* For collect_md mode, derive fl6 from the tunnel key,
@@ -2112,8 +2111,8 @@ static size_t ip6gre_get_size(const struct net_device *dev)
 
 static int ip6gre_fill_info(struct sk_buff *skb, const struct net_device *dev)
 {
-	struct ip6_tnl *t = netdev_priv(dev);
-	struct __ip6_tnl_parm *p = &t->parms;
+	const struct ip6_tnl *t = netdev_priv(dev);
+	const struct __ip6_tnl_parm *p = &t->parms;
 	IP_TUNNEL_DECLARE_FLAGS(o_flags);
 
 	ip_tunnel_flags_copy(o_flags, p->o_flags);
-- 
2.54.0.1136.gdb2ca164c4-goog


^ permalink raw reply related

* [PATCH][net-next] net/mlx5: Remove broken and unused mlx5_query_mtppse()
From: lirongqing @ 2026-06-15 14:04 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, gal, linux-rdma, linux-kernel
  Cc: Li RongQing

From: Li RongQing <lirongqing@baidu.com>

mlx5_query_mtppse() reads the Event Trigger Pin (MTPPSE) register but
reads the returned arm and mode values from the input buffer 'in'
instead of the output buffer 'out', so it always returns the values
that were written rather than the actual hardware state, making the
query useless.

The function has no in-tree callers. Remove it rather than fix it.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h |  1 -
 drivers/net/ethernet/mellanox/mlx5/core/port.c      | 19 -------------------
 2 files changed, 20 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 1507e88..a1001d5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -296,7 +296,6 @@ void mlx5_core_reps_aux_devs_remove(struct mlx5_core_dev *dev);
 void mlx5_fw_reporters_create(struct mlx5_core_dev *dev);
 int mlx5_query_mtpps(struct mlx5_core_dev *dev, u32 *mtpps, u32 mtpps_size);
 int mlx5_set_mtpps(struct mlx5_core_dev *mdev, u32 *mtpps, u32 mtpps_size);
-int mlx5_query_mtppse(struct mlx5_core_dev *mdev, u8 pin, u8 *arm, u8 *mode);
 int mlx5_set_mtppse(struct mlx5_core_dev *mdev, u8 pin, u8 arm, u8 mode);
 
 struct mlx5_dm *mlx5_dm_create(struct mlx5_core_dev *dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index ee8b976..ddbe9ca 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -908,25 +908,6 @@ int mlx5_set_mtpps(struct mlx5_core_dev *mdev, u32 *mtpps, u32 mtpps_size)
 				    sizeof(out), MLX5_REG_MTPPS, 0, 1);
 }
 
-int mlx5_query_mtppse(struct mlx5_core_dev *mdev, u8 pin, u8 *arm, u8 *mode)
-{
-	u32 out[MLX5_ST_SZ_DW(mtppse_reg)] = {0};
-	u32 in[MLX5_ST_SZ_DW(mtppse_reg)] = {0};
-	int err = 0;
-
-	MLX5_SET(mtppse_reg, in, pin, pin);
-
-	err = mlx5_core_access_reg(mdev, in, sizeof(in), out,
-				   sizeof(out), MLX5_REG_MTPPSE, 0, 0);
-	if (err)
-		return err;
-
-	*arm = MLX5_GET(mtppse_reg, in, event_arm);
-	*mode = MLX5_GET(mtppse_reg, in, event_generation_mode);
-
-	return err;
-}
-
 int mlx5_set_mtppse(struct mlx5_core_dev *mdev, u8 pin, u8 arm, u8 mode)
 {
 	u32 out[MLX5_ST_SZ_DW(mtppse_reg)] = {0};
-- 
2.9.4


^ permalink raw reply related

* [PATCH net] ice: eswitch: fix use-after-free of metadata_dst in repr release
From: Doruk Tan Ozturk @ 2026-06-15 14:05 UTC (permalink / raw)
  To: anthony.l.nguyen, przemyslaw.kitszel, andrew+netdev, davem,
	edumazet, kuba, pabeni
  Cc: piotr.raczynski, michal.swiatkowski, wojciech.drewek,
	intel-wired-lan, netdev, linux-kernel, Doruk Tan Ozturk, stable

ice_eswitch_release_repr() frees the port representor metadata_dst via
metadata_dst_free(), which directly kfree()s the object and ignores the
dst_entry refcount. The eswitch slow-path TX routine
ice_eswitch_port_start_xmit() takes a reference on this dst with
dst_hold() and attaches it to the skb via skb_dst_set(). If such an skb
is still in flight (e.g. queued in a qdisc) when the representor is torn
down, the metadata_dst is freed while the skb still points at it. When
the skb is later freed, dst_release() operates on already-freed memory.

Replace metadata_dst_free() with dst_release() so the metadata_dst is
freed only after the last reference is dropped. The dst subsystem frees
metadata_dst objects from dst_destroy() once the refcount reaches zero
(DST_METADATA is set by metadata_dst_alloc()).

Same class of bug and fix as commit c32b26aaa2f9 ("netfilter:
nft_tunnel: fix use-after-free on object destroy").

Fixes: fff292b47ac1 ("ice: add VF representors one by one")
Cc: stable@vger.kernel.org
Signed-off-by: Doruk Tan Ozturk <doruk@0sec.ai>
---
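The lifetime problem described above can be illustrated with a minimal
userspace sketch. This is not kernel code: the struct and the three
helpers are hypothetical, the counter is not atomic, and a "freed" flag
is assumed in place of a real kfree() so the model stays well defined.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted metadata_dst. */
struct dst_sketch {
	int refcnt;
	bool freed;
};

/* Models dst_hold(): the TX path takes a reference for the skb. */
static void sketch_hold(struct dst_sketch *d)
{
	d->refcnt++;
}

/* Models dst_release(): the object goes away with the last reference. */
static void sketch_release(struct dst_sketch *d)
{
	if (--d->refcnt == 0)
		d->freed = true;
}

/* Models metadata_dst_free(): frees regardless of the refcount. */
static void sketch_free_now(struct dst_sketch *d)
{
	d->freed = true;
}
```

With one reference owned by the representor and one by an in-flight skb,
sketch_free_now() marks the object freed while refcnt is still 2, which
is the use-after-free window. sketch_release() on teardown leaves the
object alive until the skb drops its own reference.
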
 drivers/net/ethernet/intel/ice/ice_eswitch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_eswitch.c b/drivers/net/ethernet/intel/ice/ice_eswitch.c
index 2e4f0969035f..41b30a7ca4a9 100644
--- a/drivers/net/ethernet/intel/ice/ice_eswitch.c
+++ b/drivers/net/ethernet/intel/ice/ice_eswitch.c
@@ -95,7 +95,7 @@ ice_eswitch_release_repr(struct ice_pf *pf, struct ice_repr *repr)
 		return;
 
 	ice_vsi_update_security(vsi, ice_vsi_ctx_set_antispoof);
-	metadata_dst_free(repr->dst);
+	dst_release(&repr->dst->dst);
 	repr->dst = NULL;
 	ice_fltr_add_mac_and_broadcast(vsi, repr->parent_mac,
 				       ICE_FWD_TO_VSI);
-- 
2.43.0


^ permalink raw reply related

* [PATCH net] net/mlx5e: macsec: fix use-after-free of metadata_dst on RX SC delete
From: Doruk Tan Ozturk @ 2026-06-15 14:05 UTC (permalink / raw)
  To: saeedm, leon, tariqt, mbloch, andrew+netdev, davem, edumazet,
	kuba, pabeni
  Cc: borisp, sd, raeds, ehakim, netdev, linux-rdma, linux-kernel,
	Doruk Tan Ozturk, stable

macsec_del_rxsc_ctx() frees the RX SC metadata_dst via
metadata_dst_free(), which directly kfree()s the object and ignores the
dst_entry refcount. The MACsec RX offload datapath
mlx5e_macsec_offload_handle_rx_skb() takes a reference on this dst with
dst_hold() and attaches it to the skb via skb_dst_set(). If such an skb
is still in flight when the RX SC is deleted, the metadata_dst is freed
while the skb still references it; the subsequent dst_release() on skb
free then operates on already-freed memory.

Replace metadata_dst_free() with dst_release() so the metadata_dst is
freed only after the last reference is dropped. The dst subsystem frees
metadata_dst objects from dst_destroy() once the refcount reaches zero
(DST_METADATA is set by metadata_dst_alloc()).

Same class of bug and fix as commit c32b26aaa2f9 ("netfilter:
nft_tunnel: fix use-after-free on object destroy").

Fixes: 9b9e23c4dc2b ("net/mlx5e: MACsec, fix memory leak when MACsec device is deleted")
Cc: stable@vger.kernel.org
Signed-off-by: Doruk Tan Ozturk <doruk@0sec.ai>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c
index 71b3a059c964..2a4e7ed76d31 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c
@@ -829,7 +829,7 @@ static void macsec_del_rxsc_ctx(struct mlx5e_macsec *macsec, struct mlx5e_macsec
 	 */
 	list_del_rcu(&rx_sc->rx_sc_list_element);
 	xa_erase(&macsec->sc_xarray, rx_sc->sc_xarray_element->fs_id);
-	metadata_dst_free(rx_sc->md_dst);
+	dst_release(&rx_sc->md_dst->dst);
 	kfree(rx_sc->sc_xarray_element);
 	kfree_rcu_mightsleep(rx_sc);
 }
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH net-next v7 05/12] net: phylink: support late PCS provider attach
From: Maxime Chevallier @ 2026-06-15 14:07 UTC (permalink / raw)
  To: Christian Marangi, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Simon Horman, Jonathan Corbet, Shuah Khan,
	Lorenzo Bianconi, Heiner Kallweit, Russell King, Saravana Kannan,
	Philipp Zabel, Nathan Chancellor, Nick Desaulniers, Bill Wendling,
	Justin Stitt, netdev, devicetree, linux-kernel, linux-doc,
	linux-arm-kernel, linux-mediatek, llvm
In-Reply-To: <20260615122950.22281-6-ansuelsmth@gmail.com>

Hi Christian,

On 6/15/26 14:29, Christian Marangi wrote:
> Add support for late PCS provider attachment to a phylink instance.
> This works by creating a global notifier for the PCS provider and
> making each phylink instance that makes use of fwnode subscribe to
> this notifier.
> 
> The PCS notifier will emit the event FWNODE_PCS_PROVIDER_ADD every time
> a new PCS provider is added.
> 
> phylink will then react to this event and will call the new function
> fwnode_phylink_pcs_get_from_fwnode() that will check if the PCS fwnode
> provided by the event is present in the pcs-handle property of the
> phylink instance.
> 
> If a related PCS is found, then such PCS is added to the phylink
> instance PCS list.
> 
> Then we link the PCS to the phylink instance and we refresh the supported
> interfaces of the phylink instance.
> 
> Finally we check if we are in a major_config_failed scenario and trigger
> an interface reconfiguration in the next phylink resolve.
> 
> In the example scenario where the link was previously torn down due to
> removal of PCS, the link will be established again as the PCS came back
> and is now available to phylink.
> 
> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> ---

[...]

> @@ -2151,6 +2204,10 @@ void phylink_destroy(struct phylink *pl)
>  	if (pl->link_gpio)
>  		gpiod_put(pl->link_gpio);
>  
> +	/* Unregister notifier for late PCS attach */
> +	if (pl->fwnode_pcs_nb.notifier_call)
> +		unregister_fwnode_pcs_notifier(&pl->fwnode_pcs_nb);

I wanted to try this out, but I get :

drivers/net/phy/phylink.c:2218:17: error: implicit declaration of function ‘unregister_fwnode_pcs_notifier’; did you mean ‘register_fwnode_pcs_notifier’? [-Werror=implicit-function-declaration]
 2218 |                 unregister_fwnode_pcs_notifier(&pl->fwnode_pcs_nb);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                 register_fwnode_pcs_notifier

I guess you either need to stub this, or there's a missing Kconfig
dependency somewhere

Maxime




^ permalink raw reply

* Re: [PATCH net-next v7 05/12] net: phylink: support late PCS provider attach
From: Christian Marangi @ 2026-06-15 14:10 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Simon Horman, Jonathan Corbet, Shuah Khan, Lorenzo Bianconi,
	Heiner Kallweit, Russell King, Saravana Kannan, Philipp Zabel,
	Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
	netdev, devicetree, linux-kernel, linux-doc, linux-arm-kernel,
	linux-mediatek, llvm
In-Reply-To: <867a39de-ccc2-4dcf-be24-ab2542d20ab6@bootlin.com>

On Mon, Jun 15, 2026 at 04:07:03PM +0200, Maxime Chevallier wrote:
> Hi Christian,
> 
> On 6/15/26 14:29, Christian Marangi wrote:
> > Add support for late PCS provider attachment to a phylink instance.
> > This works by creating a global notifier for the PCS provider and
> > making each phylink instance that makes use of fwnode subscribe to
> > this notifier.
> > 
> > The PCS notifier will emit the event FWNODE_PCS_PROVIDER_ADD every time
> > a new PCS provider is added.
> > 
> > phylink will then react to this event and will call the new function
> > fwnode_phylink_pcs_get_from_fwnode() that will check if the PCS fwnode
> > provided by the event is present in the pcs-handle property of the
> > phylink instance.
> > 
> > If a related PCS is found, then such PCS is added to the phylink
> > instance PCS list.
> > 
> > Then we link the PCS to the phylink instance and we refresh the supported
> > interfaces of the phylink instance.
> > 
> > Finally we check if we are in a major_config_failed scenario and trigger
> > an interface reconfiguration in the next phylink resolve.
> > 
> > In the example scenario where the link was previously torn down due to
> > removal of PCS, the link will be established again as the PCS came back
> > and is now available to phylink.
> > 
> > Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> > ---
> 
> [...]
> 
> > @@ -2151,6 +2204,10 @@ void phylink_destroy(struct phylink *pl)
> >  	if (pl->link_gpio)
> >  		gpiod_put(pl->link_gpio);
> >  
> > +	/* Unregister notifier for late PCS attach */
> > +	if (pl->fwnode_pcs_nb.notifier_call)
> > +		unregister_fwnode_pcs_notifier(&pl->fwnode_pcs_nb);
> 
> I wanted to try this out, but I get :
> 
> drivers/net/phy/phylink.c:2218:17: error: implicit declaration of function ‘unregister_fwnode_pcs_notifier’; did you mean ‘register_fwnode_pcs_notifier’? [-Werror=implicit-function-declaration]
>  2218 |                 unregister_fwnode_pcs_notifier(&pl->fwnode_pcs_nb);
>       |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>       |                 register_fwnode_pcs_notifier
> 
> I guess you either need to stub this, or there's a missing Kconfig
> dependency somewhere
>

Hi, yes, if you want to test, just enable CONFIG_FWNODE_PCS. I forgot to
add the static declaration for unregister_fwnode_pcs_notifier().
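
A minimal sketch of what such a stub could look like, written as a
userspace stand-in so it compiles on its own. The int return type, the
-EOPNOTSUPP value and the notifier_block layout are assumptions; only
the function names and the notifier_call check come from the series.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct notifier_block. */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long action, void *data);
};

#ifdef CONFIG_FWNODE_PCS
int register_fwnode_pcs_notifier(struct notifier_block *nb);
int unregister_fwnode_pcs_notifier(struct notifier_block *nb);
#else
/* Assumed stubs for builds without CONFIG_FWNODE_PCS. */
static inline int register_fwnode_pcs_notifier(struct notifier_block *nb)
{
	(void)nb;
	return -EOPNOTSUPP;
}

static inline int unregister_fwnode_pcs_notifier(struct notifier_block *nb)
{
	(void)nb;
	return -EOPNOTSUPP;
}
#endif

/* Dummy callback, only used to mark a notifier as registered. */
static int sketch_cb(struct notifier_block *nb, unsigned long action,
		     void *data)
{
	(void)nb; (void)action; (void)data;
	return 0;
}

/* Mirrors the guarded call in the quoted phylink_destroy() hunk. */
static int sketch_destroy(struct notifier_block *nb)
{
	if (nb->notifier_call)
		return unregister_fwnode_pcs_notifier(nb);
	return 0;
}
```

With both stubs in the header, phylink.c builds whether or not the
option is enabled, and no extra Kconfig dependency is needed.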

-- 
	Ansuel

^ permalink raw reply

* Re: [PATCH bpf 2/2] selftests/bpf: Cover partial copy of non-linear skb test_run output
From: Paul Chaignon @ 2026-06-15 14:13 UTC (permalink / raw)
  To: Sun Jian
  Cc: bpf, netdev, linux-kselftest, linux-kernel, ast, daniel, andrii,
	martin.lau, eddyz87, memxor, song, yonghong.song, jolsa, shuah
In-Reply-To: <20260615073856.152479-3-sun.jian.kdev@gmail.com>

On Mon, Jun 15, 2026 at 03:38:56PM +0800, Sun Jian wrote:
> Add a test case for BPF_PROG_TEST_RUN with a non-linear skb and a short
> data_out buffer.
> 
> The test verifies that test_run returns -ENOSPC, reports the full packet
> length through data_size_out, and copies the packet prefix into data_out.
> The test uses a 100-byte data_out buffer with a 64-byte linear head, so the
> expected output spans both the skb head and the first fragment.
> 
> Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
> ---
>  .../selftests/bpf/prog_tests/skb_load_bytes.c | 35 +++++++++++++++++++
>  1 file changed, 35 insertions(+)
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/skb_load_bytes.c b/tools/testing/selftests/bpf/prog_tests/skb_load_bytes.c
> index d7f83c0a40a5..134be0ea8ed7 100644
> --- a/tools/testing/selftests/bpf/prog_tests/skb_load_bytes.c
> +++ b/tools/testing/selftests/bpf/prog_tests/skb_load_bytes.c
> @@ -3,6 +3,39 @@
>  #include <network_helpers.h>
>  #include "skb_load_bytes.skel.h"
>  
> +#define NONLINEAR_PKT_LEN 9000
> +#define NONLINEAR_HEAD_LEN 64
> +#define SHORT_OUT_LEN 100
> +
> +static void test_nonlinear_data_out_partial(int prog_fd)
> +{
> +	LIBBPF_OPTS(bpf_test_run_opts, tattr);
> +	__u8 pkt[NONLINEAR_PKT_LEN];
> +	__u8 out[SHORT_OUT_LEN];
> +	struct __sk_buff skb = {};
> +	int err, i;
> +
> +	for (i = 0; i < sizeof(pkt); i++)
> +		pkt[i] = i & 0xff;
> +
> +	memset(out, 0xa5, sizeof(out));
> +
> +	skb.data_end = NONLINEAR_HEAD_LEN;
> +
> +	tattr.data_in = pkt;
> +	tattr.data_size_in = sizeof(pkt);
> +	tattr.data_out = out;
> +	tattr.data_size_out = sizeof(out);
> +	tattr.ctx_in = &skb;
> +	tattr.ctx_size_in = sizeof(skb);
> +
> +	err = bpf_prog_test_run_opts(prog_fd, &tattr);
> +
> +	ASSERT_EQ(err, -ENOSPC, "nonlinear_partial_err");
> +	ASSERT_EQ(tattr.data_size_out, sizeof(pkt), "nonlinear_partial_data_size_out");
> +	ASSERT_OK(memcmp(out, pkt, sizeof(out)), "nonlinear_partial_data_out");
> +}
> +
>  void test_skb_load_bytes(void)
>  {
>  	struct skb_load_bytes *skel;
> @@ -40,6 +73,8 @@ void test_skb_load_bytes(void)
>  	if (!ASSERT_EQ(test_result, 0, "offset 10"))
>  		goto out;
>  
> +	test_nonlinear_data_out_partial(prog_fd);
> +

Maybe prog_tests/prog_run_opts.c would be a better place to cover this?
test_skb_load_bytes() is meant to cover the bpf_skb_load_bytes helper.

>  out:
>  	skb_load_bytes__destroy(skel);
>  }
> -- 
> 2.43.0
> 

^ permalink raw reply

* Re: [PATCH net-next v7 02/12] net: phylink: introduce internal phylink PCS handling
From: Christian Marangi @ 2026-06-15 14:17 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Simon Horman, Jonathan Corbet, Shuah Khan, Lorenzo Bianconi,
	Heiner Kallweit, Russell King, Saravana Kannan, Philipp Zabel,
	Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
	netdev, devicetree, linux-kernel, linux-doc, linux-arm-kernel,
	linux-mediatek, llvm
In-Reply-To: <3bbacda3-4225-4536-a4b4-3aa31a47a3aa@bootlin.com>

On Mon, Jun 15, 2026 at 03:31:20PM +0200, Maxime Chevallier wrote:
> Hi Christian,
> 
> On 6/15/26 14:29, Christian Marangi wrote:
> > Introduce internal handling of PCS for phylink. This is an alternative
> > way to .mac_select_pcs that moves the selection logic of the PCS entirely
> > to phylink with the usage of the supported_interface value in the PCS
> > struct.
> > 
> > MAC should now provide a callback to fill the available PCS in
> > phylink_config in .fill_available_pcs and fill the .num_possible_pcs with
> > the number of elements in the array. MAC should also define a new bitmap,
> > pcs_interfaces, in phylink_config to define for what interface mode a
> > dedicated PCS is required.
> > 
> > On phylink_create(), an array of PCS pointer is allocated of size
> > .num_possible_pcs from phylink_config and .fill_available_pcs from
> > phylink_config is called passing as args the just allocated array and
> > the number of possible element in it.
> > 
> > MAC will fill this passed array with all the available PCS.
> > 
> > This array is then parsed and a linked list of PCS is created based on
> > the allocated PCS array filled by MAC via .fill_available_pcs().
> > 
> > Every PCS in phylink PCS list gets then linked to the phylink instance
> > by setting the phylink value in phylink_pcs struct to the phylink instance.
> > Also the supported_interface value in phylink struct is updated with
> > the new supported_interface from the provided PCS.
> > 
> > On phylink_destroy(), every PCS in phylink PCS list is unlinked from the
> > phylink instance by setting the phylink value in phylink_pcs struct to NULL
> > and removed from the PCS list.
> > 
> > phylink_validate_mac_and_pcs(), phylink_major_config() and
> > phylink_inband_caps() are updated to support this new implementation
> > with the PCS list stored in phylink.
> > 
> > They will make use of phylink_validate_pcs_interface() that will loop
> > for every PCS in the phylink PCS available list and find one that supports
> > the passed interface.
> > 
> > phylink_validate_pcs_interface() applies the same logic of .mac_select_pcs
> > where if a supported_interface value is not set for the PCS struct, then
> > it's assumed every interface is supported.
> > 
> > A MAC is required to implement either a .mac_select_pcs or make use of
> > the PCS list implementation. Implementing both will result in a fail
> > on phylink_create().
> > 
> > A MAC defining .num_possible_pcs in phylink_config MUST also define a
> > .fill_available_pcs or phylink_create() will fail with an negative error.
> > 
> > phylink value in phylink_pcs struct with this implementation is used to
> > track from PCS side when it's attached to a phylink instance. PCS driver
> > will make use of this information to correctly detach from a phylink
> > instance if needed.
> > 
> > phylink_pcs_change() is also changed to verify that the PCS that triggered
> > a link change is the one that is currently used by the phylink instance.
> > 
> > The .mac_select_pcs implementation is not changed but it's expected that
> > every MAC driver migrates to the new implementation to later deprecate
> > and remove .mac_select_pcs.
> > 
> > Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> > ---
> 
> [...]
> 
> > @@ -1872,10 +1993,28 @@ struct phylink *phylink_create(struct phylink_config *config,
> >  	mutex_init(&pl->phydev_mutex);
> >  	mutex_init(&pl->state_mutex);
> >  	INIT_WORK(&pl->resolve, phylink_resolve);
> > +	INIT_LIST_HEAD(&pl->pcs_list);
> > +
> > +	/* Fill the PCS list with available PCS from phylink config */
> > +	ret = phylink_fill_available_pcs(pl, config);
> > +	if (ret < 0) {
> > +		kfree(pl);
> > +		return ERR_PTR(ret);
> > +	}
> > +
> > +	/* Link available PCS to phylink */
> > +	list_for_each_entry(pcs, &pl->pcs_list, list)
> > +		pcs->phylink = pl;
> >  
> >  	phy_interface_copy(pl->supported_interfaces,
> >  			   config->supported_interfaces);
> >  
> > +	/* Update supported interfaces */
> > +	list_for_each_entry(pcs, &pl->pcs_list, list)
> > +		phy_interface_or(pl->supported_interfaces,
> > +				 pl->supported_interfaces,
> > +				 pcs->supported_interfaces);
> > +
> 
> I'm not entirely sure about that, we may need to restrict the supported_interfaces
> from the MAC.
> 
> As an example, take mvpp2. We have 2 PCSs, one for BaseX/SGMII, one for BaseR. But
> if we don't have a comphy (generic PHY) device, then we can't use all the
> combination of modes our PCSs can provide :
> 
> https://elixir.bootlin.com/linux/v7.1-rc7/source/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c#L7074
> 
> These aren't external PCS IPs, but from what I understand you'd like to
> handle these the same way as purely external PCSs, right ?
> 
> I'd say the MAC driver utltimately has the knowledge of all possible interfaces.
> 
> The way I see it, it's probably safer to let the MAC give a wide range of interfaces,
> and filter that down with what the PCSs can provide (i.e. turn that or into an and,
> while handling the case where the pcs supported interfaces is empty).
> 
> What do you think ?
>

The idea is that supported_interface is a mask of every possible interface
from MAC and PCS. Then it's phylink_validate_mac_and_pcs() that actually
uses that mask and validates it against both MAC and PCS.

This is why the OR was used instead of AND. The idea is to have the PCS as
an external standalone entry (even if it is internal to the MAC). So each
entry should have its own supported mask.

The previous patch and this one try to address the problem where phylink
is clueless about what is actually supported, precisely because the MAC
has been given too much freedom to model limitations internally.

I feel limitations should be handled by their dedicated functions,
.pcs_validate and .mac_get_caps.

Just my idea on this, if needed it's totally ok to simplify this and let
MAC entirely handle the mask. (but I feel the current idea of phylink code
was to have a generic mask in supported_interfaces and then verify MAC and
PCS in phylink_validate_mac_and_pcs())

But thinking about it more, following your mvpp2 case, with this new
PCS:

- You need a PCS for the .get_state.
- And such a PCS will have the supported interfaces set to 1000BaseX and
  2500BaseX (as that is what is actually supported in HW)

Either some magic is done in .pcs_validate to deny changing the interface
that was initially configured or this gets limited at the
supported_interface configured by the MAC.

I need to check if this might be problematic for the other driver where
this is being used on OpenWrt, but maybe changing the logic to an AND
might be sensible for this kind of case.

(for the other it shouldn't change anything)
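
To make the two options concrete, here is a minimal userspace sketch of
the OR and AND variants. It is not the phylink code: the SK_IF_* bits and
both helpers are hypothetical, a uint32_t is assumed in place of the
PHY interface bitmap, and the pcs_interfaces handling is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface bits standing in for PHY_INTERFACE_MODE_*. */
#define SK_IF_SGMII	(1u << 0)
#define SK_IF_1000BASEX	(1u << 1)
#define SK_IF_2500BASEX	(1u << 2)
#define SK_IF_10GBASER	(1u << 3)

/* Current approach: union of the MAC mask and every PCS mask. */
static uint32_t sketch_mask_or(uint32_t mac, const uint32_t *pcs, size_t n)
{
	uint32_t mask = mac;
	size_t i;

	for (i = 0; i < n; i++)
		mask |= pcs[i];
	return mask;
}

/*
 * Suggested alternative: the MAC mask filtered by what the PCSs offer.
 * An empty PCS mask is treated as "supports everything", so it places
 * no restriction on the MAC mask.
 */
static uint32_t sketch_mask_and(uint32_t mac, const uint32_t *pcs, size_t n)
{
	uint32_t pcs_union = 0;
	size_t i;

	if (n == 0)
		return mac;

	for (i = 0; i < n; i++) {
		if (pcs[i] == 0)
			return mac;
		pcs_union |= pcs[i];
	}
	return mac & pcs_union;
}
```

For a MAC limited to SGMII and 1000BaseX with two PCSs covering
SGMII/1000BaseX/2500BaseX and 10GBaseR, the OR variant advertises all
four modes while the AND variant keeps only the two the MAC allows.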

-- 
	Ansuel

^ permalink raw reply

* Re: [PATCH net-next v7 01/12] net: phylink: keep and use MAC supported_interfaces in phylink struct
From: Christian Marangi @ 2026-06-15 14:18 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Simon Horman, Jonathan Corbet, Shuah Khan, Lorenzo Bianconi,
	Heiner Kallweit, Russell King, Saravana Kannan, Philipp Zabel,
	Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
	netdev, devicetree, linux-kernel, linux-doc, linux-arm-kernel,
	linux-mediatek, llvm
In-Reply-To: <371a1df7-084c-4431-bd00-0045298e3212@bootlin.com>

On Mon, Jun 15, 2026 at 03:33:34PM +0200, Maxime Chevallier wrote:
> Hello Christian,
> 
> On 6/15/26 14:29, Christian Marangi wrote:
> > Add in phylink struct a copy of supported_interfaces from phylink_config
> > and make use of that instead of relying on phylink_config value.
> > 
> > This in preparation for support of PCS handling internally to phylink
> > where a PCS can be removed or added after the phylink is created and we
> > need both a reference of the supported_interfaces value from
> > phylink_config and an internal value that can be updated with the new
> > PCS info.
> > 
> > Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> > ---
> >  drivers/net/phy/phylink.c | 22 +++++++++++++++-------
> >  1 file changed, 15 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> > index 087ac63f9193..4d59c0dd78db 100644
> > --- a/drivers/net/phy/phylink.c
> > +++ b/drivers/net/phy/phylink.c
> > @@ -60,6 +60,11 @@ struct phylink {
> >  	/* The link configuration settings */
> >  	struct phylink_link_state link_config;
> >  
> > +	/* What interface are supported by the current link.
> > +	 * Can change on removal or addition of new PCS.
> > +	 */
> > +	DECLARE_PHY_INTERFACE_MASK(supported_interfaces);
> 
> Can you clarify a bit what you mean here ? Is that the combination of the
> interfaces the MAC supports AND the currently in-use PCS ?
> 

Combination of the interfaces the MAC supports and the currently attached
PCS (not the one currently in use).

It can change because a PCS can be attached later and
supported_interfaces is then updated accordingly.

-- 
	Ansuel

^ permalink raw reply

* Re: vhost: fix vhost_get_avail_idx for a non empty ring
From: Christian Borntraeger @ 2026-06-15 14:24 UTC (permalink / raw)
  To: mst
  Cc: eperezma, jasowang, kvm, linux-kernel, netdev, sgarzare, shuangyu,
	stefanha, virtualization, Christian Borntraeger
In-Reply-To: <559b04ae6ce52973c535dc47e461638b7f4c3d63.1772441455.git.mst@redhat.com>

Late feedback, but this patch massively improves our uperf latency/bandwidth
and CPU consumption on s390. Improvements are across the board: streaming
and transactional (100 byte/2000 byte). Nice fix.


Christian

^ permalink raw reply

* Re: [PATCH net-next v5 12/15] onsemi: s2500: Add driver support for TS2500 MAC-PHY
From: Julian Braha @ 2026-06-15 14:27 UTC (permalink / raw)
  To: Selvamani.Rajagopal, Andrew Lunn, Piergiorgio Beruto,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Andrew Lunn, Parthiban Veerasooran,
	Richard Cochran, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Simon Horman, Jonathan Corbet, Shuah Khan
  Cc: netdev, linux-kernel, devicetree, linux-doc, Jerry Ray
In-Reply-To: <20260614-s2500-mac-phy-support-v5-12-89874b72f725@onsemi.com>

Hi Selvamani,

On 6/14/26 18:00, Selvamani Rajagopal via B4 Relay wrote:

> diff --git a/drivers/net/ethernet/onsemi/Kconfig b/drivers/net/ethernet/onsemi/Kconfig
> new file mode 100644
> index 000000000000..8d72194151ea
> --- /dev/null
> +++ b/drivers/net/ethernet/onsemi/Kconfig
> @@ -0,0 +1,21 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# onsemi network device configuration
> +#
> +
> +config NET_VENDOR_ONSEMI
> +	bool "onsemi network devices"
> +	help
> +	  If you have a network card belonging to this class, say Y.
> +
> +	  Note that the answer to this question doesn't directly affect the
> +	  kernel: saying N will just cause the configurator to skip all
> +	  the questions about onsemi ethernet devices. If you say Y, you
> +	  will be asked for your specific card in the following questions.
> +
> +if NET_VENDOR_ONSEMI
> +
> +source "drivers/net/ethernet/onsemi/s2500/Kconfig"
> +
> +endif # NET_VENDOR_ONSEMI

When you put the 'if NET_VENDOR_ONSEMI' around the 'source', you're
making it a dependency on every config option in that sourced Kconfig.

> diff --git a/drivers/net/ethernet/onsemi/s2500/Kconfig b/drivers/net/ethernet/onsemi/s2500/Kconfig
> new file mode 100644
> index 000000000000..f2e8d5d1429d
> --- /dev/null
> +++ b/drivers/net/ethernet/onsemi/s2500/Kconfig
> @@ -0,0 +1,21 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# onsemi S2500 Driver Support
> +#
> +
> +if NET_VENDOR_ONSEMI
> +
> +config S2500_MACPHY
> +	tristate "S2500 support"
> +	depends on SPI
> +	select NCN26000_PHY
> +	select OA_TC6
> +	help
> +	  Support for the onsemi TS2500 MACPHY Ethernet chip.
> +	  It works under the framework that conform to OPEN Alliance
> +	  10BASE-T1x Serial Interface specification.
> +
> +	  To compile this driver as a module, choose M here. The module will be
> +	  called s2500.
> +
> +endif # NET_VENDOR_ONSEMI

Which means that when you add 'if NET_VENDOR_ONSEMI' again inside the
sourced Kconfig, it's a duplicate dependency.

I think putting the if-endif in either place is fine, but it's redundant
to do it twice. You could maybe consider using a comment for the second
instance instead.

- Julian Braha

* Re: [PATCH net-next v7 05/12] net: phylink: support late PCS provider attach
From: Maxime Chevallier @ 2026-06-15 14:29 UTC (permalink / raw)
  To: Christian Marangi
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Simon Horman, Jonathan Corbet, Shuah Khan, Lorenzo Bianconi,
	Heiner Kallweit, Russell King, Saravana Kannan, Philipp Zabel,
	Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
	netdev, devicetree, linux-kernel, linux-doc, linux-arm-kernel,
	linux-mediatek, llvm
In-Reply-To: <6a3007ce.73de60af.3a056d.d903@mx.google.com>



On 6/15/26 16:10, Christian Marangi wrote:
> On Mon, Jun 15, 2026 at 04:07:03PM +0200, Maxime Chevallier wrote:
>> Hi Christian,
>>
>> On 6/15/26 14:29, Christian Marangi wrote:
>>> Add support for late PCS provider attachment to a phylink instance.
>>> This works by creating a global notifier for the PCS provider and
>>> making each phylink instance that makes use of fwnode subscribe to
>>> this notifier.
>>>
>>> The PCS notifier will emit the event FWNODE_PCS_PROVIDER_ADD every time
>>> a new PCS provider is added.
>>>
>>> phylink will then react to this event and will call the new function
>>> fwnode_phylink_pcs_get_from_fwnode() that will check if the PCS fwnode
>>> provided by the event is present in the pcs-handle property of the
>>> phylink instance.
>>>
>>> If a related PCS is found, then such PCS is added to the phylink
>>> instance PCS list.
>>>
>>> Then we link the PCS to the phylink instance and we refresh the supported
>>> interfaces of the phylink instance.
>>>
>>> Finally we check if we are in a major_config_failed scenario and trigger
>>> an interface reconfiguration in the next phylink resolve.
>>>
>>> In the example scenario where the link was previously torn down due to
>>> removal of PCS, the link will be established again as the PCS came back
>>> and is now available to phylink.
>>>
>>> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
>>> ---
>>
>> [...]
>>
>>> @@ -2151,6 +2204,10 @@ void phylink_destroy(struct phylink *pl)
>>>  	if (pl->link_gpio)
>>>  		gpiod_put(pl->link_gpio);
>>>  
>>> +	/* Unregister notifier for late PCS attach */
>>> +	if (pl->fwnode_pcs_nb.notifier_call)
>>> +		unregister_fwnode_pcs_notifier(&pl->fwnode_pcs_nb);
>>
>> I wanted to try this out, but I get :
>>
>> drivers/net/phy/phylink.c:2218:17: error: implicit declaration of function ‘unregister_fwnode_pcs_notifier’; did you mean ‘register_fwnode_pcs_notifier’? [-Werror=implicit-function-declaration]
>>  2218 |                 unregister_fwnode_pcs_notifier(&pl->fwnode_pcs_nb);
>>       |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>       |                 register_fwnode_pcs_notifier
>>
>> I guess you either need to stub this, or there's a missing Kconfig
>> dependency somewhere
>>
> 
> Hi, yes, if you want to test, just enable CONFIG_FWNODE_PCS. I forgot to add
> the static declaration for unregister_fwnode_pcs_notifier.

I'll give it a go with this yeah, I have a few devices here I'd like to
try this on.
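
For reference, the usual way to keep callers building when an option is
disabled is a pair of static inline no-op stubs in the header. The snippet
below is only a rough sketch of that pattern, not the code from this series:
the prototypes (a struct notifier_block * argument, int return), the
simplified struct, and the example_* helpers are all assumptions made so the
snippet compiles on its own.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct notifier_block. */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long action, void *data);
};

#ifdef CONFIG_FWNODE_PCS
/* Real implementations would come from the PCS core (not shown here). */
int register_fwnode_pcs_notifier(struct notifier_block *nb);
int unregister_fwnode_pcs_notifier(struct notifier_block *nb);
#else
/* Assumed prototypes: no-op stubs so callers build with the option off. */
static inline int register_fwnode_pcs_notifier(struct notifier_block *nb)
{
	(void)nb;
	return 0;
}

static inline int unregister_fwnode_pcs_notifier(struct notifier_block *nb)
{
	(void)nb;
	return 0;
}
#endif

/* Hypothetical callback, only here so a notifier_block can be filled in. */
static int example_pcs_event(struct notifier_block *nb,
			     unsigned long action, void *data)
{
	(void)nb;
	(void)action;
	(void)data;
	return 0;
}

/* Hypothetical caller mirroring the quoted phylink_destroy() hunk. */
static int example_teardown(struct notifier_block *nb)
{
	if (nb->notifier_call)
		return unregister_fwnode_pcs_notifier(nb);
	return 0;
}
```

With the stubs in place the call site needs no #ifdef of its own, and the
unregister call simply does nothing when CONFIG_FWNODE_PCS is off.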

Can you CC me on the next rounds?

Maxime

> 

