Netdev List


* [PATCH v2 2/2] bonding: reuse neigh_setup from slave neigh_parms
From: Paritosh Potukuchi @ 2026-07-01  8:16 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, Paritosh Potukuchi, Jay Vosburgh, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
In-Reply-To: <20260701081602.3185086-1-paritosh.potukuchi@amd.com>

bond_neigh_init() currently relies on the slave device's
ndo_neigh_setup() callback to obtain a neigh_setup() handler.

When an initialized neigh_parms instance already exists for the
slave device, reuse the neigh_setup() callback stored in it instead
of invoking ndo_neigh_setup() again.

If no neigh_parms instance is found, or no neigh_setup() callback is
present, retain the existing ndo_neigh_setup() fallback path.

This avoids unnecessary ndo_neigh_setup() invocations while preserving
existing behaviour.

Signed-off-by: Paritosh Potukuchi <paritosh.potukuchi@amd.com>
---
 drivers/net/bonding/bond_main.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index e044fc733b8c..d2e4dae4e97c 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4719,7 +4719,7 @@ static int bond_neigh_init(struct neighbour *n)
 {
 	struct bonding *bond = netdev_priv(n->dev);
 	const struct net_device_ops *slave_ops;
-	struct neigh_parms parms;
+	struct neigh_parms parms, *p;
 	struct slave *slave;
 	int ret = 0;
 
@@ -4727,6 +4727,14 @@ static int bond_neigh_init(struct neighbour *n)
 	slave = bond_first_slave_rcu(bond);
 	if (!slave)
 		goto out;
+
+	p = neigh_parms_lookup_dev(n->tbl, slave->dev);
+
+	if (p && p->neigh_setup) {
+		ret = p->neigh_setup(n);
+		goto out;
+	}
+
 	slave_ops = slave->dev->netdev_ops;
 	if (!slave_ops->ndo_neigh_setup)
 		goto out;
-- 
2.43.0
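
The control flow described in the commit message above can be modeled
outside the kernel roughly as follows. This is only a userspace sketch,
not the bonding code: struct neigh, struct parms, struct dev and the
handlers are hypothetical stand-ins for struct neighbour, struct
neigh_parms and the slave net_device, and the "slave->parms" pointer
stands in for the neigh_parms_lookup_dev() call. The shape of the
fallback path is an assumption based on the commit message.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; not kernel types. */
struct neigh {
	int set_up_by;			/* records which handler ran */
};

struct parms {
	int (*neigh_setup)(struct neigh *n);	/* may be NULL */
};

struct dev {
	struct parms *parms;		/* existing instance, or NULL */
	int (*ndo_neigh_setup)(struct dev *d, struct parms *p);
	int ndo_calls;			/* how often the fallback ran */
};

/* Example handlers used to exercise both paths. */
static int setup_cached(struct neigh *n) { n->set_up_by = 1; return 0; }
static int setup_fresh(struct neigh *n) { n->set_up_by = 2; return 0; }

static int ndo_setup(struct dev *d, struct parms *p)
{
	d->ndo_calls++;
	p->neigh_setup = setup_fresh;
	return 0;
}

static int neigh_init(struct neigh *n, struct dev *slave)
{
	struct parms tmp = { NULL };
	struct parms *p = slave->parms;	/* models the parms lookup */
	int ret;

	/* Fast path: reuse the handler stored in the existing parms. */
	if (p && p->neigh_setup)
		return p->neigh_setup(n);

	/* Fallback: ask the device to fill in a temporary parms. */
	if (!slave->ndo_neigh_setup)
		return 0;

	ret = slave->ndo_neigh_setup(slave, &tmp);
	if (ret || !tmp.neigh_setup)
		return ret;

	return tmp.neigh_setup(n);
}
```

With an existing parms instance the device callback is never invoked;
without one, the behaviour is the same as before.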



* RE: [PATCH net] net: phy: motorcomm: read EEE abilities in yt8521_get_features()
From: Clark Wang @ 2026-07-01  8:16 UTC (permalink / raw)
  To: Breno Leitao, Clark Wang (OSS)
  Cc: Frank.Sae@motor-comm.com, andrew@lunn.ch, hkallweit1@gmail.com,
	linux@armlinux.org.uk, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, imx@lists.linux.dev
In-Reply-To: <akTLmyF0r70Jyn9m@gmail.com>

> > In phy_probe(), genphy_c45_read_eee_abilities() is only called when a
> > driver uses phydrv->features. Drivers that implement .get_features are
> > responsible for reading the EEE abilities themselves.
> >
> > yt8521_get_features() does not do this, so phydev->supported_eee stays
> > empty for YT8521/YT8531S and "ethtool --show-eee" reports "EEE status:
> > not supported", even though the PHY has the standard EEE capability
> > registers.
> >
> > Call genphy_c45_read_eee_abilities() at the end of
> > yt8521_get_features() to populate supported_eee.
> >
> > Fixes: 70479a40954c ("net: phy: Add driver for Motorcomm yt8521 gigabit ethernet phy")
> > Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>
> > ---
> >  drivers/net/phy/motorcomm.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/net/phy/motorcomm.c b/drivers/net/phy/motorcomm.c
> > index b49897500a59..46efa3406841 100644
> > --- a/drivers/net/phy/motorcomm.c
> > +++ b/drivers/net/phy/motorcomm.c
> > @@ -2439,6 +2439,9 @@ static int yt8521_get_features(struct phy_device *phydev)
> >  		/* add fiber's features to phydev->supported */
> >  		yt8521_prepare_fiber_features(phydev, phydev->supported);
> >  	}
> > +
> > +	genphy_c45_read_eee_abilities(phydev);
> 
> Don't you want to return error if genphy_c45_read_eee_abilities() fails?

EEE is an optional functionality, and the call in genphy_read_abilities() has the following comment. Therefore, I do not return its error here either.
"
	/* This is optional functionality. If not supported, we may get an error
	 * which should be ignored.
	 */
"

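The "optional functionality, ignore the error" pattern from the reply
above can be illustrated with a small userspace sketch. Everything here
is hypothetical (struct phy, read_eee_abilities(), get_features() and
the bit values are made up for the example); it only shows a mandatory
step whose result is returned and an optional step whose failure is
deliberately dropped.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a PHY; not struct phy_device. */
struct phy {
	bool has_eee_regs;
	unsigned int supported;		/* mandatory link modes */
	unsigned int supported_eee;	/* optional EEE modes */
};

/* Optional capability read: fails if the registers are absent. */
static int read_eee_abilities(struct phy *p)
{
	if (!p->has_eee_regs)
		return -EOPNOTSUPP;

	p->supported_eee = 0x6;		/* made-up capability bits */
	return 0;
}

static int get_features(struct phy *p)
{
	p->supported = 0x2f;		/* made-up mandatory bits */

	/*
	 * EEE is optional: a failure leaves supported_eee empty and
	 * must not make feature discovery as a whole fail.
	 */
	(void)read_eee_abilities(p);

	return 0;
}
```

A PHY without the EEE registers still reports its mandatory features
successfully; it just ends up with an empty supported_eee.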


* Re: [PATCH RESEND net-next] net: airoha: Make use of the helper function dev_err_probe()
From: Lorenzo Bianconi @ 2026-07-01  8:17 UTC (permalink / raw)
  To: zhulei; +Cc: andrew+netdev, davem, edumazet, kuba, pabeni, netdev
In-Reply-To: <23f58025.6d6d.19f1cb25385.Coremail.zhulei_szu@163.com>

> At 2026-06-30 18:38:38, "Lorenzo Bianconi" <lorenzo@kernel.org> wrote:
> >> From: Lei Zhu <zhulei@kylinos.cn>
> >> 
> >> Use dev_err_probe() to reduce code size and simplify the code.
> >> 
> >> Signed-off-by: Lei Zhu <zhulei@kylinos.cn>
> >> ---
> >> The last submission was sent while net-next was closed. Resending it.
> >> 
> >>  drivers/net/ethernet/airoha/airoha_eth.c | 21 +++++++++------------
> >>  1 file changed, 9 insertions(+), 12 deletions(-)
> >> 
> >> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> >> index 31cdb11cd78d..189f64e83a46 100644
> >> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> >> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> >> @@ -3071,10 +3071,9 @@ static int airoha_probe(struct platform_device *pdev)
> >>  	eth->dev = &pdev->dev;
> >>  
> >>  	err = dma_set_mask_and_coherent(eth->dev, DMA_BIT_MASK(32));
> >
> >I do not think dma_set_mask_and_coherent() can return -EPROBE_DEFER, so there
> >is no point adding dev_err_probe() here.
> >
> >Regards,
> >Lorenzo
> >
> Hi Lorenzo,
> 
> Thanks for your review.
> 
> Before making this patch, I referred to the comments of dev_err_probe:
> "even if @err is known to never be -EPROBE_DEFER, the benefit compared
> to a normal dev_err() is the standardized format of the error code."
> 
> In the probe function, I noticed devm_platform_ioremap_resource_byname
> already uses dev_err_probe, while other functions still use dev_err.
> Replace them with dev_err_probe for consistency, more compact error paths,
> and better readability of error codes.
> 
> Best regards
> Lei
> 

I do not have a strong opinion about it. From my point of view it is clearer
to use dev_err_probe() only where it actually does something, but it is up to
you.

Regards,
Lorenzo
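
For readers following the two positions in this thread, here is a
simplified userspace model of the helper being discussed. It is a
sketch, not the kernel implementation: err_probe() and messages_logged
are hypothetical names, EPROBE_DEFER is defined locally because it is a
kernel-internal errno value, and only the two cases mentioned in the
thread are modeled (a deferred probe stays quiet, any other error is
printed in one fixed format and passed through).

```c
#include <errno.h>
#include <stdio.h>

/* Kernel-internal errno value, defined here only for the sketch. */
#define EPROBE_DEFER 517

static int messages_logged;	/* lets callers observe the logging */

/*
 * Log unless the probe is merely deferred, then hand the error back
 * so the caller can write "return err_probe(...)".
 */
static int err_probe(const char *dev, int err, const char *msg)
{
	if (err != -EPROBE_DEFER) {
		fprintf(stderr, "%s: error %d: %s\n", dev, err, msg);
		messages_logged++;
	}

	return err;
}
```

When the error can never be -EPROBE_DEFER, as for the DMA mask call
above, the first branch is always taken, so the remaining benefit is the
uniform message format and the one-line return.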



* Re: [PATCH net] net: phy: motorcomm: read EEE abilities in yt8521_get_features()
From: Breno Leitao @ 2026-07-01  8:19 UTC (permalink / raw)
  To: Clark Wang
  Cc: Clark Wang (OSS), Frank.Sae@motor-comm.com, andrew@lunn.ch,
	hkallweit1@gmail.com, linux@armlinux.org.uk, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	imx@lists.linux.dev
In-Reply-To: <GV2PR04MB12213F4758648CD71F5D5B3B7F3F62@GV2PR04MB12213.eurprd04.prod.outlook.com>

On Wed, Jul 01, 2026 at 08:16:13AM +0000, Clark Wang wrote:
> > > In phy_probe(), genphy_c45_read_eee_abilities() is only called when a
> > > driver uses phydrv->features. Drivers that implement .get_features are
> > > responsible for reading the EEE abilities themselves.
> > >
> > > yt8521_get_features() does not do this, so phydev->supported_eee stays
> > > empty for YT8521/YT8531S and "ethtool --show-eee" reports "EEE status:
> > > not supported", even though the PHY has the standard EEE capability
> > > registers.
> > >
> > > Call genphy_c45_read_eee_abilities() at the end of
> > > yt8521_get_features() to populate supported_eee.
> > >
> > > > Fixes: 70479a40954c ("net: phy: Add driver for Motorcomm yt8521 gigabit ethernet phy")
> > > Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>

Reviewed-by: Breno Leitao <leitao@debian.org>

> > > +	genphy_c45_read_eee_abilities(phydev);
> > 
> > Don't you want to return error if genphy_c45_read_eee_abilities() fails?
> 
> EEE is an optional functionality, and the call in genphy_read_abilities() has the following comment. Therefore, I do not return its error here either.
> "
> 	/* This is optional functionality. If not supported, we may get an error
> 	 * which should be ignored.
> 	 */
> "

Ack. I've looked at the code, and no one is even checking the return
value anyway.


* [PATCH net-next 0/2] macvlan: RTNL-less macvlan_fill_info()
From: Eric Dumazet @ 2026-07-01  8:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
	Eric Dumazet

This series removes the RTNL lock dependency from macvlan_fill_info(),
allowing it to run under RCU read lock.

The first patch annotates data races on 'mode' and 'flags' fields which
are accessed locklessly in the RX/TX paths.

The second patch transitions macvlan_fill_info() to RCU, adding necessary
annotations for other fields and handling concurrent updates to the MAC
address list by computing the count dynamically.

Eric Dumazet (2):
  macvlan: annotate data-races around vlan->mode and vlan->flags
  macvlan: no longer rely on RTNL in macvlan_fill_info()

 drivers/net/macvlan.c | 109 ++++++++++++++++++++++++++----------------
 1 file changed, 69 insertions(+), 40 deletions(-)

-- 
2.55.0.rc0.799.gd6f94ed593-goog


* [PATCH net-next 1/2] macvlan: annotate data-races around vlan->mode and vlan->flags
From: Eric Dumazet @ 2026-07-01  8:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260701082214.2974946-1-edumazet@google.com>

Both fields can be changed in macvlan_changelink() while being read
locklessly.

Add READ_ONCE()/WRITE_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/macvlan.c | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index c40fa331836bb2395267914807542ae5094e1a3c..8b69cc9b70f98d7991110f5eda76d58d5d96fa81 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -277,7 +277,7 @@ static void macvlan_broadcast(struct sk_buff *skb,
 		return;
 
 	hash_for_each_rcu(port->vlan_hash, i, vlan, hlist) {
-		if (vlan->dev == src || !(vlan->mode & mode))
+		if (vlan->dev == src || !(READ_ONCE(vlan->mode) & mode))
 			continue;
 
 		hash = mc_hash(vlan, eth->h_dest);
@@ -306,7 +306,7 @@ static void macvlan_multicast_rx(const struct macvlan_port *port,
 				  MACVLAN_MODE_VEPA    |
 				  MACVLAN_MODE_PASSTHRU|
 				  MACVLAN_MODE_BRIDGE);
-	else if (src->mode == MACVLAN_MODE_VEPA)
+	else if (READ_ONCE(src->mode) == MACVLAN_MODE_VEPA)
 		/* flood to everyone except source */
 		macvlan_broadcast(skb, port, src->dev,
 				  MACVLAN_MODE_VEPA |
@@ -447,7 +447,7 @@ static bool macvlan_forward_source(struct sk_buff *skb,
 			if (!vlan)
 				continue;
 
-			if (vlan->flags & MACVLAN_FLAG_NODST)
+			if (READ_ONCE(vlan->flags) & MACVLAN_FLAG_NODST)
 				consume = true;
 			macvlan_forward_source_one(skb, vlan);
 		}
@@ -487,14 +487,18 @@ static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb)
 			return RX_HANDLER_CONSUMED;
 		}
 		src = macvlan_hash_lookup(port, eth->h_source);
-		if (src && src->mode != MACVLAN_MODE_VEPA &&
-		    src->mode != MACVLAN_MODE_BRIDGE) {
-			/* forward to original port. */
-			vlan = src;
-			ret = macvlan_broadcast_one(skb, vlan, eth, 0) ?:
-			      __netif_rx(skb);
-			handle_res = RX_HANDLER_CONSUMED;
-			goto out;
+		if (src) {
+			enum macvlan_mode mode = READ_ONCE(src->mode);
+
+			if (mode != MACVLAN_MODE_VEPA &&
+			    mode != MACVLAN_MODE_BRIDGE) {
+				/* forward to original port. */
+				vlan = src;
+				ret = macvlan_broadcast_one(skb, vlan, eth, 0) ?:
+				      __netif_rx(skb);
+				handle_res = RX_HANDLER_CONSUMED;
+				goto out;
+			}
 		}
 
 		hash = mc_hash(NULL, eth->h_dest);
@@ -515,7 +519,7 @@ static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb)
 					      struct macvlan_dev, list);
 	else
 		vlan = macvlan_hash_lookup(port, eth->h_dest);
-	if (!vlan || vlan->mode == MACVLAN_MODE_SOURCE)
+	if (!vlan || READ_ONCE(vlan->mode) == MACVLAN_MODE_SOURCE)
 		return RX_HANDLER_PASS;
 
 	dev = vlan->dev;
@@ -548,7 +552,7 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
 	const struct macvlan_port *port = vlan->port;
 	const struct macvlan_dev *dest;
 
-	if (vlan->mode == MACVLAN_MODE_BRIDGE) {
+	if (READ_ONCE(vlan->mode) == MACVLAN_MODE_BRIDGE) {
 		const struct ethhdr *eth = skb_eth_hdr(skb);
 
 		/* send to other bridge ports directly */
@@ -559,7 +563,7 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 
 		dest = macvlan_hash_lookup(port, eth->h_dest);
-		if (dest && dest->mode == MACVLAN_MODE_BRIDGE) {
+		if (dest && READ_ONCE(dest->mode) == MACVLAN_MODE_BRIDGE) {
 			/* send to lowerdev first for its network taps */
 			dev_forward_skb(vlan->lowerdev, skb);
 
@@ -777,7 +781,7 @@ static int macvlan_set_mac_address(struct net_device *dev, void *p)
 	if (ether_addr_equal(dev->dev_addr, addr->__data))
 		return 0;
 
-	if (vlan->mode == MACVLAN_MODE_PASSTHRU) {
+	if (READ_ONCE(vlan->mode) == MACVLAN_MODE_PASSTHRU) {
 		macvlan_set_addr_change(vlan->port);
 		return dev_set_mac_address(vlan->lowerdev, addr, NULL);
 	}
@@ -1645,7 +1649,7 @@ static int macvlan_changelink(struct net_device *dev,
 			if (err < 0)
 				return err;
 		}
-		vlan->flags = flags;
+		WRITE_ONCE(vlan->flags, flags);
 	}
 
 	if (data && data[IFLA_MACVLAN_BC_QUEUE_LEN]) {
@@ -1658,7 +1662,7 @@ static int macvlan_changelink(struct net_device *dev,
 			vlan, nla_get_s32(data[IFLA_MACVLAN_BC_CUTOFF]));
 
 	if (set_mode)
-		vlan->mode = mode;
+		WRITE_ONCE(vlan->mode, mode);
 	if (data && data[IFLA_MACVLAN_MACADDR_MODE]) {
 		if (vlan->mode != MACVLAN_MODE_SOURCE)
 			return -EINVAL;
-- 
2.55.0.rc0.799.gd6f94ed593-goog
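
The macvlan_handle_frame() hunk in the patch above reads the mode once
into a local before comparing it twice. A userspace sketch of that
"snapshot, then test" pattern is shown below. The READ_ONCE() and
WRITE_ONCE() macros here are simplified volatile-cast stand-ins written
for this example (they rely on the GCC/Clang __typeof__ extension and
are not the kernel definitions); enum mode, struct vlan and both
functions are hypothetical.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel macros (GCC/Clang only). */
#define READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

enum mode {
	MODE_PRIVATE	= 1,
	MODE_VEPA	= 2,
	MODE_BRIDGE	= 4,
	MODE_PASSTHRU	= 8,
};

struct vlan {
	enum mode mode;		/* may be changed by another thread */
};

/* Writer side: a single marked store. */
static void set_mode(struct vlan *v, enum mode m)
{
	WRITE_ONCE(v->mode, m);
}

/*
 * Reader side: take one snapshot, then test the local copy, so both
 * comparisons are guaranteed to see the same value.
 */
static int forwards_to_source(const struct vlan *src)
{
	enum mode mode;

	if (!src)
		return 0;

	mode = READ_ONCE(src->mode);

	return mode != MODE_VEPA && mode != MODE_BRIDGE;
}
```

Without the local copy, the two comparisons could observe different
values if the mode changes between them.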



* [PATCH net-next 2/2] macvlan: no longer rely on RTNL in macvlan_fill_info()
From: Eric Dumazet @ 2026-07-01  8:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260701082214.2974946-1-edumazet@google.com>

Add READ_ONCE()/WRITE_ONCE() annotations on vlan->mode, vlan->flags,
vlan->bc_queue_len_req and port->bc_cutoff.

Fill IFLA_MACVLAN_MACADDR_DATA nested attribute and compute
on the fly the precise number of elements we put in it,
to fill an accurate IFLA_MACVLAN_MACADDR_COUNT attribute
as some user space applications could depend on its value
and the attributes order.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/macvlan.c | 71 +++++++++++++++++++++++++++++--------------
 1 file changed, 48 insertions(+), 23 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 8b69cc9b70f98d7991110f5eda76d58d5d96fa81..9a4bc99dbf53b5f2cd6345f0af899f56fdb3a46b 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -171,7 +171,7 @@ static int macvlan_hash_add_source(struct macvlan_dev *vlan,
 	RCU_INIT_POINTER(entry->vlan, vlan);
 	h = &port->vlan_source_hash[macvlan_eth_hash(addr)];
 	hlist_add_head_rcu(&entry->hlist, h);
-	vlan->macaddr_count++;
+	WRITE_ONCE(vlan->macaddr_count, vlan->macaddr_count + 1);
 
 	return 0;
 }
@@ -402,7 +402,7 @@ static void macvlan_flush_sources(struct macvlan_port *port,
 		if (rcu_access_pointer(entry->vlan) == vlan)
 			macvlan_hash_del_source(entry);
 
-	vlan->macaddr_count = 0;
+	WRITE_ONCE(vlan->macaddr_count, 0);
 }
 
 static void macvlan_forward_source_one(struct sk_buff *skb,
@@ -874,7 +874,7 @@ static void update_port_bc_cutoff(struct macvlan_dev *vlan, int cutoff)
 	if (vlan->port->bc_cutoff == cutoff)
 		return;
 
-	vlan->port->bc_cutoff = cutoff;
+	WRITE_ONCE(vlan->port->bc_cutoff, cutoff);
 	macvlan_recompute_bc_filter(vlan);
 }
 
@@ -1427,7 +1427,7 @@ static int macvlan_changelink_sources(struct macvlan_dev *vlan, u32 mode,
 		entry = macvlan_hash_lookup_source(vlan, addr);
 		if (entry) {
 			macvlan_hash_del_source(entry);
-			vlan->macaddr_count--;
+			WRITE_ONCE(vlan->macaddr_count, vlan->macaddr_count - 1);
 		}
 	} else if (mode == MACVLAN_MACADDR_FLUSH) {
 		macvlan_flush_sources(vlan->port, vlan);
@@ -1653,7 +1653,8 @@ static int macvlan_changelink(struct net_device *dev,
 	}
 
 	if (data && data[IFLA_MACVLAN_BC_QUEUE_LEN]) {
-		vlan->bc_queue_len_req = nla_get_u32(data[IFLA_MACVLAN_BC_QUEUE_LEN]);
+		WRITE_ONCE(vlan->bc_queue_len_req,
+			   nla_get_u32(data[IFLA_MACVLAN_BC_QUEUE_LEN]));
 		update_port_bc_queue_len(vlan->port);
 	}
 
@@ -1676,10 +1677,12 @@ static int macvlan_changelink(struct net_device *dev,
 
 static size_t macvlan_get_size_mac(const struct macvlan_dev *vlan)
 {
-	if (vlan->macaddr_count == 0)
+	unsigned int macaddr_count = READ_ONCE(vlan->macaddr_count);
+
+	if (!macaddr_count)
 		return 0;
 	return nla_total_size(0) /* IFLA_MACVLAN_MACADDR_DATA */
-		+ vlan->macaddr_count * nla_total_size(sizeof(u8) * ETH_ALEN);
+		+ macaddr_count * nla_total_size(sizeof(u8) * ETH_ALEN);
 }
 
 static size_t macvlan_get_size(const struct net_device *dev)
@@ -1702,53 +1705,75 @@ static int macvlan_fill_info_macaddr(struct sk_buff *skb,
 				     const int i)
 {
 	struct hlist_head *h = &vlan->port->vlan_source_hash[i];
-	struct macvlan_source_entry *entry;
+	const struct macvlan_source_entry *entry;
+	int cnt = 0;
 
-	hlist_for_each_entry_rcu(entry, h, hlist, lockdep_rtnl_is_held()) {
+	hlist_for_each_entry_rcu(entry, h, hlist) {
 		if (rcu_access_pointer(entry->vlan) != vlan)
 			continue;
 		if (nla_put(skb, IFLA_MACVLAN_MACADDR, ETH_ALEN, entry->addr))
-			return 1;
+			return -EMSGSIZE;
+		cnt++;
 	}
-	return 0;
+	return cnt;
 }
 
 static int macvlan_fill_info(struct sk_buff *skb,
 				const struct net_device *dev)
 {
-	struct macvlan_dev *vlan = netdev_priv(dev);
+	const struct macvlan_dev *vlan = netdev_priv(dev);
 	struct macvlan_port *port = vlan->port;
-	int i;
-	struct nlattr *nest;
+	unsigned int macaddr_count = 0;
+	struct nlattr *nest, *attr;
+	int bc_cutoff, cnt, i;
 
-	if (nla_put_u32(skb, IFLA_MACVLAN_MODE, vlan->mode))
+	rcu_read_lock();
+	if (nla_put_u32(skb, IFLA_MACVLAN_MODE, READ_ONCE(vlan->mode)))
 		goto nla_put_failure;
-	if (nla_put_u16(skb, IFLA_MACVLAN_FLAGS, vlan->flags))
+
+	if (nla_put_u16(skb, IFLA_MACVLAN_FLAGS, READ_ONCE(vlan->flags)))
 		goto nla_put_failure;
-	if (nla_put_u32(skb, IFLA_MACVLAN_MACADDR_COUNT, vlan->macaddr_count))
+
+	attr = nla_reserve(skb, IFLA_MACVLAN_MACADDR_COUNT, sizeof(u32));
+	if (!attr)
 		goto nla_put_failure;
-	if (vlan->macaddr_count > 0) {
+
+	if (READ_ONCE(vlan->macaddr_count) > 0) {
 		nest = nla_nest_start_noflag(skb, IFLA_MACVLAN_MACADDR_DATA);
 		if (nest == NULL)
 			goto nla_put_failure;
 
 		for (i = 0; i < MACVLAN_HASH_SIZE; i++) {
-			if (macvlan_fill_info_macaddr(skb, vlan, i))
+			cnt = macvlan_fill_info_macaddr(skb, vlan, i);
+			if (cnt < 0)
 				goto nla_put_failure;
+			macaddr_count += cnt;
 		}
-		nla_nest_end(skb, nest);
+		if (!macaddr_count)
+			nla_nest_cancel(skb, nest);
+		else if (nla_nest_end_safe(skb, nest) < 0)
+			goto nla_put_failure;
 	}
-	if (nla_put_u32(skb, IFLA_MACVLAN_BC_QUEUE_LEN, vlan->bc_queue_len_req))
+	*(u32 *)nla_data(attr) = macaddr_count;
+
+	if (nla_put_u32(skb, IFLA_MACVLAN_BC_QUEUE_LEN,
+			READ_ONCE(vlan->bc_queue_len_req)))
 		goto nla_put_failure;
+
 	if (nla_put_u32(skb, IFLA_MACVLAN_BC_QUEUE_LEN_USED,
 			READ_ONCE(port->bc_queue_len_used)))
 		goto nla_put_failure;
-	if (port->bc_cutoff != 1 &&
-	    nla_put_s32(skb, IFLA_MACVLAN_BC_CUTOFF, port->bc_cutoff))
+
+	bc_cutoff = READ_ONCE(port->bc_cutoff);
+	if (bc_cutoff != 1 &&
+	    nla_put_s32(skb, IFLA_MACVLAN_BC_CUTOFF, bc_cutoff))
 		goto nla_put_failure;
+
+	rcu_read_unlock();
 	return 0;
 
 nla_put_failure:
+	rcu_read_unlock();
 	return -EMSGSIZE;
 }
 
-- 
2.55.0.rc0.799.gd6f94ed593-goog
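
The "reserve the count attribute first, fill it in after walking the
list" approach used in macvlan_fill_info() above can be sketched in
userspace with a plain byte buffer. This is an illustration only:
struct msg, msg_reserve() and fill_addrs() are hypothetical, there is
no netlink framing, and the 64-byte buffer size is arbitrary.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ADDR_LEN 6

/* Hypothetical fixed-size message buffer; not an sk_buff. */
struct msg {
	uint8_t buf[64];
	size_t len;
};

/* Reserve n zeroed bytes at the tail, or return NULL if full. */
static void *msg_reserve(struct msg *m, size_t n)
{
	void *p;

	if (n > sizeof(m->buf) - m->len)
		return NULL;

	p = m->buf + m->len;
	memset(p, 0, n);
	m->len += n;
	return p;
}

/* addrs points to n_addrs consecutive ADDR_LEN-byte addresses. */
static int fill_addrs(struct msg *m, const uint8_t *addrs, size_t n_addrs)
{
	uint32_t count = 0;
	void *slot, *dst;
	size_t i;

	/* The count comes first in the message, so reserve its slot. */
	slot = msg_reserve(m, sizeof(count));
	if (!slot)
		return -EMSGSIZE;

	for (i = 0; i < n_addrs; i++) {
		dst = msg_reserve(m, ADDR_LEN);
		if (!dst)
			return -EMSGSIZE;
		memcpy(dst, addrs + i * ADDR_LEN, ADDR_LEN);
		count++;
	}

	/* Backfill the slot with what was actually emitted. */
	memcpy(slot, &count, sizeof(count));
	return 0;
}
```

Because the count is derived from the entries that were written, it
stays consistent with the payload even if the list changes while it is
being walked, and the attribute order seen by the reader is unchanged.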



* Re: [PATCH] net: phylink: reject unsupported speed/duplex in ksettings_set() with PHY
From: Maxime Chevallier @ 2026-07-01  8:29 UTC (permalink / raw)
  To: muhammad.nazim.amirul.nazle.asmade, linux, andrew, hkallweit1
  Cc: davem, edumazet, kuba, pabeni, netdev, linux-kernel
In-Reply-To: <20260701031746.23448-1-muhammad.nazim.amirul.nazle.asmade@altera.com>

Hi,

On 7/1/26 05:17, muhammad.nazim.amirul.nazle.asmade@altera.com wrote:
> From: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>

The target tree tag is also still missing from the subject :)

> When using ethtool to change speed and duplex on a phylink-managed
> interface with a PHY attached, the requested speed/duplex combination
> is not validated against the MAC's supported capabilities before being
> passed down to the PHY layer.
> 
> commit df0acdc59b09 ("net: phylink: fix ksettings_set() ethtool call")
> and commit 03c44a21d033 ("net: phylink: actually fix ksettings_set()
> ethtool call") introduced masking of the PHY advertising modes against
> pl->supported, but did not add an explicit check that the requested
> speed/duplex itself is within the MAC's capability set.
> 
> The AUTONEG_DISABLE path in the non-PHY case already uses
> phy_caps_lookup() to validate speed/duplex against pl->supported.
> Extend the same validation to the pl->phydev path so that ethtool
> requests for unsupported speed/duplex combinations are rejected with
> -EINVAL before reaching the PHY layer.
> 
> Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
> ---
>  drivers/net/phy/phylink.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index 087ac63f9193..22f9bbd381bd 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -2989,6 +2989,10 @@ int phylink_ethtool_ksettings_set(struct phylink *pl,
>  	if (pl->phydev) {
>  		struct ethtool_link_ksettings phy_kset = *kset;
>  
> +		if (!phy_caps_lookup(kset->base.speed, kset->base.duplex,
> +				     pl->supported, true))
> +			return -EINVAL;
> +
>  		linkmode_and(phy_kset.link_modes.advertising,
>  			     phy_kset.link_modes.advertising,
>  			     pl->supported);

I can indeed reproduce that, with a 1000FD-only mac, running

  ethtool -s eth2 speed 1000 duplex half autoneg off

brings the link down, no error reported, and running

  ethtool -s eth2 speed 1000 duplex half autoneg on

also brings the link down, with

Advertised link modes:  Not reported

This is expected, but indeed no error is reported.

I think rejecting these settings makes sense. However, I'm wondering
whether this is a fix or not, as it will change user-visible behaviour.
I'd err on the side of caution and send this to net-next, but maybe
Andrew will have more insight :)

So at the very least, you'll have to resubmit targeting the correct tree.

Maxime
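
The reproduction above (a 1000FD-only MAC silently accepting 1000/half)
and the proposed rejection can be modeled with a short userspace sketch.
All names here are hypothetical (struct link_mode, mode_supported(),
set_link()); the real patch uses phy_caps_lookup() against
pl->supported, which this example does not reproduce.

```c
#include <errno.h>
#include <stddef.h>

enum duplex { HALF, FULL };

struct link_mode {
	int speed;		/* Mb/s */
	enum duplex duplex;
};

/* Return 1 if the speed/duplex pair is in the MAC's supported set. */
static int mode_supported(const struct link_mode *supp, size_t n,
			  int speed, enum duplex duplex)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (supp[i].speed == speed && supp[i].duplex == duplex)
			return 1;

	return 0;
}

/* Reject unsupported requests before they reach the lower layer. */
static int set_link(const struct link_mode *supp, size_t n,
		    int speed, enum duplex duplex,
		    struct link_mode *applied)
{
	if (!mode_supported(supp, n, speed, duplex))
		return -EINVAL;

	applied->speed = speed;
	applied->duplex = duplex;
	return 0;
}
```

With a supported set containing only 1000/full, a request for 1000/half
now fails with -EINVAL and leaves the applied settings untouched,
instead of bringing the link down without any error.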


* Re: [PATCH net-next] net: neigh: avoid calling neigh_forced_gc on every alloc when table is full
From: Vimal Agrawal @ 2026-07-01  8:30 UTC (permalink / raw)
  To: Kuniyuki Iwashima; +Cc: kuba, edumazet, netdev, vimal.agrawal
In-Reply-To: <CAAVpQUAN87UybUdiCy_A+UThmAN9haB2ie2-usW4qZoY31XEGA@mail.gmail.com>

Hi Kuniyuki,

I understand the recommendation to set gc_thresh3 = gc_thresh2 +
headroom. However, in field deployments the neighbour table size can
be unpredictable — traffic patterns, routing changes, and network
events can cause transient spikes that exceed even a well-configured
gc_thresh3.

We have reproduced this with 32k IXIA clients where we observed soft
lockups under sustained neighbour creation. This is what originally
motivated the investigation.

The current behaviour when gc_thresh3 is exceeded (calling
neigh_forced_gc() on every allocation with no rate limiting) is
unnecessarily severe. It causes lock contention proportional to the
table size on every allocation attempt, so even briefly exceeding
gc_thresh3 has a significant latency impact.

The patch does not change GC semantics or thresholds. It simply
prevents repeated full table scans within a short window, which is
harmless when GC is effective and protective when it is not. This
seems like a reasonable defensive improvement regardless of how
thresholds are configured.


Thanks,
Vimal
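
The rate limiting argued for above boils down to a wraparound-safe
"has enough time passed since the last run" check taken before any lock
is acquired. A userspace sketch follows; ticks_t, ticks_after(), struct
table and forced_gc() are hypothetical names, ticks_after() mirrors the
idea behind the kernel's time_after() rather than copying it, and
updating last_flush inside the function is an assumption made so the
example is self-contained.

```c
#include <stdbool.h>

typedef unsigned long ticks_t;	/* free-running counter, may wrap */

/* True if a is later than b, even across a counter wraparound. */
static bool ticks_after(ticks_t a, ticks_t b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical table state; not struct neigh_table. */
struct table {
	ticks_t last_flush;	/* time of the last GC run */
	unsigned long gc_runs;	/* number of full scans performed */
};

/*
 * Skip the scan entirely if the previous one ran less than
 * min_interval ticks ago. Returns true if a scan was performed.
 */
static bool forced_gc(struct table *t, ticks_t now, ticks_t min_interval)
{
	if (!ticks_after(now, t->last_flush + min_interval))
		return false;

	/* A real implementation would take the lock and scan here. */
	t->last_flush = now;
	t->gc_runs++;
	return true;
}
```

Callers that arrive inside the window return immediately, so a burst of
allocations above the threshold costs one scan per interval instead of
one scan per allocation.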

On Tue, Jun 30, 2026 at 10:06 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>
> On Tue, Jun 30, 2026 at 5:01 AM Vimal Agrawal <avimalin@gmail.com> wrote:
> >
> > Hi Kuniyuki,
> >
> > You are correct that in this specific test case GC does not help since
> > all entries are active/reachable. However, this is not the only
> > scenario where entries can exceed gc_thresh3.
> >
> > In a real workload, the table can exceed gc_thresh3 with a mix of
> > active and stale entries. In that case GC does help, but should not be
> > called on every allocation attempt — once per 50ms is sufficient for
> > GC to make progress without causing lock contention.
>
> My mental model is that gc_thresh3 is the hard limit while gc_thresh2
> is the soft limit, so if the total number of entries often exceeds gc_thresh3,
> it's clearly wrong.
>
> I think you need to set gc_thresh2 to a proper baseline (it sounds like
> your current gc_thresh3 is the one) and gc_thresh3 to gc_thresh2+X
> where X covers fluctuations.
>
>
> >
> > The rate limiting also protects against the case where GC cannot
> > reclaim anything. Without it, every allocation attempt above
> > gc_thresh3 triggers a full table scan holding tbl->lock, even when GC
> > has no work to do.
> >
> > Thanks,
> > Vimal
> >
> > On Mon, Jun 29, 2026 at 11:35 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
> > >
> > > On Mon, Jun 29, 2026 at 12:57 AM Vimal Agrawal <avimalin@gmail.com> wrote:
> > > >
> > > > Hi Kuniyuki,
> > > > Thank you for the feedback.
> > > > However, the rate limiting issue exists independently of the threshold
> > > > values. If entries genuinely exceed gc_thresh3 — regardless of what it
> > > > is set to — neigh_forced_gc() is called on every allocation attempt
> > > > with no rate limiting. In my workload, most entries are
> > > > active/reachable with refcnt > 1, so the GC walk traverses the entire
> > > > table without reclaiming anything.
> > >
> > > This suggests your gc_thresh2/3 do not fit your use case.
> > >
> > > If GC does not help, there is no point in running it or rate-limiting
> > > in the first place.
> > >
> > >
> > > > Increasing gc_thresh3 would make
> > > > this worse, not better, as GC now has a larger table to scan on each
> > > > call.
> > >
> > > If you just increase gc_thresh3 slightly, then yes, it won't help.
> > >
> > >
> > > >
> > > > Regarding neigh_hash_shift: in my workload, neigh_alloc() returns
> > > > ENOBUFS before reaching do_alloc() since GC cannot reclaim any
> > > > entries. kzalloc() is never called, so neigh_hash_grow() is not
> > > > involved in the latency I observed. The pre-lock time check in
> > > > neigh_forced_gc() is a low-cost safeguard that prevents repeated full
> > > > table scans regardless of gc_thresh3 value. It does not interfere with
> > > > correct GC behaviour — if entries are still above the threshold, GC
> > > > runs normally.
> > > >
> > > >
> > > > Hi Jakub,
> > > > I tested with different threshold values, filling the table completely
> > > > with 32k reachable entries and attempting 1000 additional allocations.
> > > > Exported neigh_forced_gc so that it can be profiled
> > > >                          no change  10ms   50ms   100ms
> > > > max cpu usage %          44%        11.8%  2.56%  1.42%
> > > > calls > 100us (of 1000)  101        31     13     7
> > > >
> > > > At 10ms, max CPU usage is still 11.8% and 31 out of 1000 calls take
> > > > more than 100us. Given that 50ms reduces this to 2.56% and 13 calls
> > > > respectively, I would prefer 50ms as the threshold. However, I am open
> > > > to further discussion on the right value.
> > > >
> > > > Thanks,
> > > > Vimal
> > > >
> > > >
> > > > On Fri, Jun 26, 2026 at 3:17 AM Kuniyuki Iwashima <kuniyu@google.com> wrote:
> > > > >
> > > > > From: Vimal Agrawal <avimalin@gmail.com>
> > > > > Date: Thu, 25 Jun 2026 10:20:20 +0000
> > > > > > Once the neighbour table exceeds gc_thresh3, neigh_forced_gc() is called
> > > > > > on every allocation attempt with no rate limiting. In workloads with mostly
> > > > > > active/reachable entries, the GC walk traverses a large portion of the
> > > > > > neighbour table without reclaiming entries, holding tbl->lock for an
> > > > > > extended period. This causes severe lock contention and allocation
> > > > > > latencies exceeding 16ms under sustained neighbour creation.
> > > > > >
> > > > > > Add a pre-lock check in neigh_forced_gc() to skip the GC run if one was
> > > > > > performed within the last second, avoiding repeated full table scans and
> > > > > > lock acquisitions on the hot allocation path.
> > > > > >
> > > > > > Profiling of neigh_create() shows ~3 orders of magnitude latency
> > > > > > improvement with this change.
> > > > > >
> > > > > > Link:https://lore.kernel.org/netdev/CALkUMdSCpx_ywYCx_ePLdm6yioO1nQWx7sSM=AEgsq0kywHxTw@mail.gmail.com/
> > > > >
> > > > > From the thread, these look misconfigured.
> > > > >
> > > > > ---8<---
> > > > > net.ipv6.neigh.default.gc_thresh2 = 32768
> > > > > net.ipv6.neigh.default.gc_thresh3 = 32768
> > > > > ---8<---
> > > > >
> > > > > If gc_thresh3 is larger enough, gc_thresh2 will give you 5s
> > > > > rate limiting.
> > > > >
> > > > > If the number of active neigh entries constantly exceeds
> > > > > gc_thresh3, it will be the correct gc_thresh2 for you.
> > > > >
> > > > > Also, I guess you want a new kernel param for the first
> > > > > neigh_hash_alloc(), which is currently fixed for 3, which
> > > > > is too small for some hosts.
> > > > >
> > > > > 50000 entries require neigh_hash_grow() 13 times.
> > > > >
> > > > > Can you test this on your real workload, starting from
> > > > > neigh_hash_shift=16 and appropriate gc_thresh2/3 ?
> > > > >
> > > > > ---8<---
> > > > > diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> > > > > index 1349c0eedb64..a75b3750eec9 100644
> > > > > --- a/net/core/neighbour.c
> > > > > +++ b/net/core/neighbour.c
> > > > > @@ -1817,6 +1817,22 @@ EXPORT_SYMBOL(neigh_parms_release);
> > > > >  static struct lock_class_key neigh_table_proxy_queue_class;
> > > > >
> > > > >  static struct neigh_table __rcu *neigh_tables[NEIGH_NR_TABLES] __read_mostly;
> > > > > +static __initdata unsigned long neigh_hash_shift = 3;
> > > > > +
> > > > > +static int __init neigh_set_hash_shift(char *str)
> > > > > +{
> > > > > +       ssize_t ret;
> > > > > +
> > > > > +       if (!str)
> > > > > +               return 0;
> > > > > +
> > > > > +       ret = kstrtoul(str, 0, &neigh_hash_shift);
> > > > > +       if (ret)
> > > > > +               return 0;
> > > > > +
> > > > > +       return 1;
> > > > > +}
> > > > > +__setup("neigh_hash_shift=", neigh_set_hash_shift);
> > > > >
> > > > >  void neigh_table_init(int index, struct neigh_table *tbl)
> > > > >  {
> > > > > @@ -1843,7 +1859,7 @@ void neigh_table_init(int index, struct neigh_table *tbl)
> > > > >                 panic("cannot create neighbour proc dir entry");
> > > > >  #endif
> > > > >
> > > > > -       RCU_INIT_POINTER(tbl->nht, neigh_hash_alloc(3));
> > > > > +       RCU_INIT_POINTER(tbl->nht, neigh_hash_alloc(neigh_hash_shift));
> > > > >
> > > > >         phsize = (PNEIGH_HASHMASK + 1) * sizeof(struct pneigh_entry *);
> > > > >         tbl->phash_buckets = kzalloc(phsize, GFP_KERNEL);
> > > > > ---8<---
> > > > >
> > > > >
> > > > >
> > > > > > Signed-off-by: Vimal Agrawal <vimal.agrawal@sophos.com>
> > > > > > ---
> > > > > >  net/core/neighbour.c | 3 +++
> > > > > >  1 file changed, 3 insertions(+)
> > > > > >
> > > > > > diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> > > > > > index 1349c0eedb64..078842db3c5f 100644
> > > > > > --- a/net/core/neighbour.c
> > > > > > +++ b/net/core/neighbour.c
> > > > > > @@ -260,6 +260,9 @@ static int neigh_forced_gc(struct neigh_table *tbl)
> > > > > >       int shrunk = 0;
> > > > > >       int loop = 0;
> > > > > >
> > > > > > +     if (!time_after(jiffies, READ_ONCE(tbl->last_flush) + HZ))
> > > > > > +             return 0;
> > > > > > +
> > > > > >       NEIGH_CACHE_STAT_INC(tbl, forced_gc_runs);
> > > > > >
> > > > > >       spin_lock_bh(&tbl->lock);
> > > > > > --
> > > > > > 2.17.1
> > > > > > v

^ permalink raw reply

* Re: [PATCH v4 1/7] dt-bindings: mtd: jedec,spi-nor: allow the SFDP to be exposed via NVMEM
From: Michael Walle @ 2026-07-01  8:34 UTC (permalink / raw)
  To: Linus Walleij, Manikandan Muralidharan
  Cc: pratyush, mwalle, takahiro.kuwano, miquel.raynal, richard,
	vigneshr, robh, krzk+dt, conor+dt, srini, nicolas.ferre,
	alexandre.belloni, claudiu.beznea, linux, richardcochran, arnd,
	linux-mtd, devicetree, linux-kernel, linux-arm-kernel, netdev
In-Reply-To: <CAD++jL=FkEfpz-LW0vmPpZ28fLfGFMWo5E479Mapz55YUxKNAQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1630 bytes --]

Hi,

>> Add an optional "sfdp" child node (compatible "jedec,sfdp") that
>> describes the SFDP as a read-only NVMEM provider via nvmem.yaml, so its
>> contents (e.g. a vendor EUI-48/EUI-64) can be read through NVMEM cells.
>>
>> Signed-off-by: Manikandan Muralidharan <manikandan.m@microchip.com>
>
> I would expect it to follow nvmem conventions like this, notice
> compatibles specific-to-general with sfdp first:


> sfdp {
>     /* NVMEM provided by SFDP */
>     compatible = "jedec,sfdp", "nvmem-cells";
>     label = "SFDP";

Isn't using label frowned upon? I wouldn't add that to the example.

>     read-only;
>     #address-cells = <1>;
>     #size-cells = <1>;
>
>     mac0: macaddr@0x00 {
>         reg = <0x00 0x06>;
>     };
>     mac1: macaddr@0x06 {
>         reg = <0x06 0x06>;
>     };
> };

If I'm correct, this is the old style, see commit bd912c991d2e
("dt-bindings: nvmem: layouts: add fixed-layout"). So it should
eventually look like:

sfdp {
     compatible = "jedec,sfdp";

     nvmem-layout {
	     compatible = "microchip,sst26vf-sfdp-eui";
     };
};

Which is what this patch series will lead to.

Also I'm not sure if we really need to add the "nvmem-cells" here.
IIRC in MTD it was there to tell a driver to add an nvmem device to
an already existing compatible/node.

Apart from the MTD case, I've just found qcom,smem-part.yaml which
has compatible = "nvmem-cells".

-michael

> Your example should definitely be more elaborate like this,
> just an opaque sfdp node will not suffice. Maybe a separate
> example?
>
> Yours,
> Linus Walleij


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 297 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v2] ipv4: igmp: remove multicast group from hash table on device destruction
From: Yuyang Huang @ 2026-07-01  8:58 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Kuniyuki Iwashima, davem, dsahern, edumazet, horms,
	jedrzej.jagielski, kuba, linux-kernel, netdev, pabeni,
	xiyou.wangcong
In-Reply-To: <20260701081105.GA1289614@shredder>

On Wed, Jul 1, 2026 at 5:11 PM Ido Schimmel <idosch@nvidia.com> wrote:
>
> I agree, but let's do it as a separate change in net-next. The current
> one line fix is correct and fixes the root cause. Clearing the pointer
> happens to fix the problem because it relies on mc_hash only being
> accessible via dev->in_dev (vs reaching in_dev via a different path).

Acked, I can send out a separate patch to fix this part and keep
this change as is.

^ permalink raw reply

* [PATCH net-next v3 0/3] ptp: Add driver for R-Car Gen4 gPTP timer
From: Niklas Söderlund @ 2026-07-01  9:06 UTC (permalink / raw)
  To: Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Geert Uytterhoeven, Magnus Damm, Richard Cochran, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-renesas-soc, devicetree, linux-kernel, netdev
  Cc: Niklas Söderlund

Hello,

This series is the first part of cleaning up how PTP timer support is
implemented on R-Car Gen4. Currently there is partial support for it in
some of the Ethernet devices that can use it, but not all.

The partial support has been implemented by hacking the gPTP module
directly into the first Ethernet device driver that used it, RTSN for
V4H and RSWITCH for S4. This is understandable as earlier R-Car
generations had a dedicated gPTP timer for each Ethernet device, but on
Gen4 there is a single system-wide PTP timer shared by all.

The current implementation makes it impossible for other Ethernet
devices on the platform to use the PTP timer without messing around with
other Ethernet device drivers.

The effort to clean this up starts with this series which adds the
system-wide gPTP timer as its own driver and device tree node.

This series will then be followed by work to add proper PTP support to
the R-Car RAVB Gen4 driver, which currently advertises to user-space that
it supports PTP but whose implementation is broken and does not work.

This will in turn be followed by work to switch the RTSN and RSWITCH
drivers from their current partial support, which maps the gPTP address
space directly, to using this driver instead.

Having both this and RTSN/RSWITCH described and enabled (!) in device
tree will not work as they will try to use the same memory region. For
this reason this new solution will only be enabled on platforms
after all users of the gPTP clock have moved to only use the new
centralized timer. But in the interim both devices will be described
(but not enabled) in the platform's base dtsi file.

For some platforms this is straightforward, such as V4H Sparrow Hawk,
which only has the RAVB Ethernet interface. This platform currently
has no users of the PTP timer, but still advertises that it supports it.
This series and the soon-to-be-posted RAVB patches solve that.

As the RAVB patches depend on this series, the device tree node for the
gPTP clock is added in this series but will be enabled and linked to
consumers in the RAVB gPTP series for platforms where it will not
conflict with RTSN and RSWITCH. It will be enabled on further platforms
as more of this is cleaned up.

The gPTP driver itself is heavily influenced by the existing partial
support for gPTP in the RTSN and RSWITCH drivers and the Renesas BSP.

Niklas Söderlund (3):
  dt-bindings: ptp: renesas,rcar-gen4-gptp: Add R-Car Gen4
  ptp: Add driver for R-Car Gen4
  arm64: dts: renesas: r8a779g0: Add gPTP node

 .../bindings/ptp/renesas,rcar-gen4-gptp.yaml  |  64 +++++
 MAINTAINERS                                   |   7 +
 arch/arm64/boot/dts/renesas/r8a779g0.dtsi     |   9 +
 drivers/ptp/Kconfig                           |  12 +
 drivers/ptp/Makefile                          |   1 +
 drivers/ptp/ptp_rcar_gen4.c                   | 219 ++++++++++++++++++
 6 files changed, 312 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/ptp/renesas,rcar-gen4-gptp.yaml
 create mode 100644 drivers/ptp/ptp_rcar_gen4.c

-- 
2.55.0

^ permalink raw reply

* [PATCH net-next v3 1/3] dt-bindings: ptp: renesas,rcar-gen4-gptp: Add R-Car Gen4
From: Niklas Söderlund @ 2026-07-01  9:06 UTC (permalink / raw)
  To: Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Geert Uytterhoeven, Magnus Damm, Richard Cochran, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-renesas-soc, devicetree, linux-kernel, netdev
  Cc: Niklas Söderlund, Krzysztof Kozlowski
In-Reply-To: <20260701090607.1108208-1-niklas.soderlund+renesas@ragnatech.se>

Add bindings for the R-Car Gen4 gPTP timer. The timer enables accurate
synchronization of the clock in the control system. The timer is
system-wide and used by different Ethernet devices on each Gen4 platform.

  - On R-Car S4 it is shared between RSWITCH and RAVB.

  - On R-Car V4H it is shared between RTSN and RAVB.

  - On R-Car V4M it is only used by RAVB.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
---
* Changes since v1
- Drop 'binding for' for patch subject.
- Drop comment for renesas,rcar-gen4-gptp compatible to match other
  Renesas bindings.
- Drop unused label in example.
- Rename node ptp in example.
---
 .../bindings/ptp/renesas,rcar-gen4-gptp.yaml  | 64 +++++++++++++++++++
 MAINTAINERS                                   |  6 ++
 2 files changed, 70 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/ptp/renesas,rcar-gen4-gptp.yaml

diff --git a/Documentation/devicetree/bindings/ptp/renesas,rcar-gen4-gptp.yaml b/Documentation/devicetree/bindings/ptp/renesas,rcar-gen4-gptp.yaml
new file mode 100644
index 000000000000..3edd64d40038
--- /dev/null
+++ b/Documentation/devicetree/bindings/ptp/renesas,rcar-gen4-gptp.yaml
@@ -0,0 +1,64 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+# Copyright (C) 2026 Renesas Electronics Corp.
+# Copyright (C) 2026 Niklas Söderlund <niklas.soderlund@ragnatech.se>
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/ptp/renesas,rcar-gen4-gptp.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Renesas R-Car Gen4 gPTP timer
+
+maintainers:
+  - Niklas Söderlund <niklas.soderlund@ragnatech.se>
+
+description:
+  The R-Car Gen4 gPTP timer enables accurate synchronization of the clock in
+  the control system. The timer is system-wide and used by different Ethernet
+  devices on each Gen4 platform.
+
+    - On R-Car S4 it is shared between RSWITCH and RAVB.
+    - On R-Car V4H it is shared between RTSN and RAVB.
+    - On R-Car V4M it is only used by RAVB.
+
+properties:
+  compatible:
+    items:
+      - enum:
+          - renesas,r8a779f0-gptp # S4-8
+          - renesas,r8a779g0-gptp # V4H
+          - renesas,r8a779h0-gptp # V4M
+      - const: renesas,rcar-gen4-gptp
+
+  reg:
+    maxItems: 1
+
+  clocks:
+    maxItems: 1
+
+  power-domains:
+    maxItems: 1
+
+  resets:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - clocks
+  - power-domains
+  - resets
+
+additionalProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/clock/r8a779g0-cpg-mssr.h>
+    #include <dt-bindings/power/r8a779g0-sysc.h>
+
+    ptp@e6449000 {
+            compatible = "renesas,r8a779g0-gptp", "renesas,rcar-gen4-gptp";
+            reg = <0xe6449000 0x500>;
+            clocks = <&cpg CPG_MOD 2723>;
+            power-domains = <&sysc R8A779G0_PD_ALWAYS_ON>;
+            resets = <&cpg 2723>;
+    };
diff --git a/MAINTAINERS b/MAINTAINERS
index 15011f5752a9..ef17128d6f3f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22901,6 +22901,12 @@ S:	Maintained
 F:	Documentation/devicetree/bindings/mtd/renesas-nandc.yaml
 F:	drivers/mtd/nand/raw/renesas-nand-controller.c
 
+RENESAS R-CAR GEN4 GPTP DRIVER
+M:	Niklas Söderlund <niklas.soderlund@ragnatech.se>
+L:	linux-renesas-soc@vger.kernel.org
+S:	Supported
+F:	Documentation/devicetree/bindings/ptp/renesas,rcar-gen4-gptp.yaml
+
 RENESAS R-CAR GYROADC DRIVER
 M:	Marek Vasut <marek.vasut+renesas@mailbox.org>
 L:	linux-iio@vger.kernel.org
-- 
2.55.0


^ permalink raw reply related

* [PATCH net-next v3 2/3] ptp: Add driver for R-Car Gen4
From: Niklas Söderlund @ 2026-07-01  9:06 UTC (permalink / raw)
  To: Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Geert Uytterhoeven, Magnus Damm, Richard Cochran, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-renesas-soc, devicetree, linux-kernel, netdev
  Cc: Niklas Söderlund
In-Reply-To: <20260701090607.1108208-1-niklas.soderlund+renesas@ragnatech.se>

Add a driver for the gPTP timer found on R-Car Gen4 devices. The timer is
system-wide and shared by different Ethernet devices on each Gen4
platform. The operation of the timer is however not completely
independent of the system's Ethernet devices.

  - On R-Car S4 it is gated by the RSWITCH Ethernet module clock.

  - On R-Car V4H it is gated by the RTSN Ethernet module clock.

  - On R-Car V4M it is gated by its own module clock, as the system has
    neither an RTSN nor an RSWITCH device. But the module clock is the same
    as RTSN on V4H and the documentation refers to it as tsn (EtherTSN).

The gPTP device does have its own register space on all three platforms.
But on S4 and V4H it will share its clock and reset property with
RSWITCH or RTSN, respectively.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
---
 MAINTAINERS                 |   1 +
 drivers/ptp/Kconfig         |  12 ++
 drivers/ptp/Makefile        |   1 +
 drivers/ptp/ptp_rcar_gen4.c | 219 ++++++++++++++++++++++++++++++++++++
 4 files changed, 233 insertions(+)
 create mode 100644 drivers/ptp/ptp_rcar_gen4.c

diff --git a/MAINTAINERS b/MAINTAINERS
index ef17128d6f3f..4a387623409b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22906,6 +22906,7 @@ M:	Niklas Söderlund <niklas.soderlund@ragnatech.se>
 L:	linux-renesas-soc@vger.kernel.org
 S:	Supported
 F:	Documentation/devicetree/bindings/ptp/renesas,rcar-gen4-gptp.yaml
+F:	drivers/ptp/ptp_rcar_gen4.c
 
 RENESAS R-CAR GYROADC DRIVER
 M:	Marek Vasut <marek.vasut+renesas@mailbox.org>
diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
index b93640ca08b7..3593fd9da92a 100644
--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -263,4 +263,16 @@ config PTP_NETC_V4_TIMER
 	  synchronization. It also supports periodic output signal (e.g. PPS)
 	  and external trigger timestamping.
 
+config PTP_RCAR_GEN4
+	tristate "Renesas R-Car Gen4 PTP Driver"
+	depends on ARCH_RENESAS || COMPILE_TEST
+	depends on PTP_1588_CLOCK
+	help
+	  This driver adds support for using the Renesas R-Car Gen4 gPTP timer
+	  as a PTP clock, the clock can then be used by Gen4 Ethernet drivers
+	  for PTP time synchronization.
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called ptp_rcar_gen4.
+
 endmenu
diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile
index bdc47e284f14..0464a586bed2 100644
--- a/drivers/ptp/Makefile
+++ b/drivers/ptp/Makefile
@@ -22,3 +22,4 @@ obj-$(CONFIG_PTP_1588_CLOCK_OCP)	+= ptp_ocp.o
 obj-$(CONFIG_PTP_DFL_TOD)		+= ptp_dfl_tod.o
 obj-$(CONFIG_PTP_S390)			+= ptp_s390.o
 obj-$(CONFIG_PTP_NETC_V4_TIMER)		+= ptp_netc.o
+obj-$(CONFIG_PTP_RCAR_GEN4)		+= ptp_rcar_gen4.o
diff --git a/drivers/ptp/ptp_rcar_gen4.c b/drivers/ptp/ptp_rcar_gen4.c
new file mode 100644
index 000000000000..ab0be2431be8
--- /dev/null
+++ b/drivers/ptp/ptp_rcar_gen4.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Renesas R-Car Gen4 gPTP device driver
+ *
+ * Copyright (C) 2026 Renesas Electronics Corporation
+ * Copyright (C) 2026 Niklas Söderlund <niklas.soderlund@ragnatech.se>
+ */
+
+#include <linux/clk.h>
+#include <linux/err.h>
+#include <linux/io.h>
+#include <linux/mod_devicetable.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/ptp_clock_kernel.h>
+#include <linux/types.h>
+
+#define PTPTMEC_REG		0x0010
+#define PTPTMDC_REG		0x0014
+#define PTPTIVC0_REG		0x0020
+#define PTPTOVC00_REG		0x0030
+#define PTPTOVC10_REG		0x0034
+#define PTPTOVC20_REG		0x0038
+#define PTPGPTPTM00_REG		0x0050
+#define PTPGPTPTM10_REG		0x0054
+#define PTPGPTPTM20_REG		0x0058
+
+struct ptp_rcar_gen4_priv {
+	void __iomem *base;
+	struct clk *clk;
+
+	struct ptp_clock *clock;
+	struct ptp_clock_info info;
+
+	spinlock_t lock;	/* Registers access. */
+	s64 default_addend;
+};
+
+#define ptp_to_priv(ptp) container_of(ptp, struct ptp_rcar_gen4_priv, info)
+
+static int ptp_rcar_gen4_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
+{
+	struct ptp_rcar_gen4_priv *priv = ptp_to_priv(ptp);
+	s64 addend = priv->default_addend;
+	bool neg_adj = scaled_ppm < 0;
+	unsigned long flags;
+	s64 diff;
+
+	if (neg_adj)
+		scaled_ppm = -scaled_ppm;
+	diff = div_s64(addend * scaled_ppm_to_ppb(scaled_ppm), NSEC_PER_SEC);
+	addend = neg_adj ? addend - diff : addend + diff;
+
+	spin_lock_irqsave(&priv->lock, flags);
+	iowrite32(addend, priv->base + PTPTIVC0_REG);
+	spin_unlock_irqrestore(&priv->lock, flags);
+
+	return 0;
+}
+
+static void _ptp_rcar_gen4_gettime(struct ptp_clock_info *ptp,
+				   struct timespec64 *ts)
+{
+	struct ptp_rcar_gen4_priv *priv = ptp_to_priv(ptp);
+
+	lockdep_assert_held(&priv->lock);
+
+	ts->tv_nsec = ioread32(priv->base + PTPGPTPTM00_REG);
+	ts->tv_sec = ioread32(priv->base + PTPGPTPTM10_REG) |
+		((s64)ioread32(priv->base + PTPGPTPTM20_REG) << 32);
+}
+
+static int ptp_rcar_gen4_gettime(struct ptp_clock_info *ptp,
+				 struct timespec64 *ts)
+{
+	struct ptp_rcar_gen4_priv *priv = ptp_to_priv(ptp);
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->lock, flags);
+	_ptp_rcar_gen4_gettime(ptp, ts);
+	spin_unlock_irqrestore(&priv->lock, flags);
+
+	return 0;
+}
+
+static void _ptp_rcar_gen4_settime(struct ptp_clock_info *ptp,
+				   const struct timespec64 *ts)
+{
+	struct ptp_rcar_gen4_priv *priv = ptp_to_priv(ptp);
+
+	lockdep_assert_held(&priv->lock);
+
+	iowrite32(1, priv->base + PTPTMDC_REG);
+	iowrite32(0, priv->base + PTPTOVC20_REG);
+	iowrite32(0, priv->base + PTPTOVC10_REG);
+	iowrite32(0, priv->base + PTPTOVC00_REG);
+	iowrite32(1, priv->base + PTPTMEC_REG);
+	iowrite32(ts->tv_sec >> 32, priv->base + PTPTOVC20_REG);
+	iowrite32(ts->tv_sec, priv->base + PTPTOVC10_REG);
+	iowrite32(ts->tv_nsec, priv->base + PTPTOVC00_REG);
+}
+
+static int ptp_rcar_gen4_settime(struct ptp_clock_info *ptp,
+				 const struct timespec64 *ts)
+{
+	struct ptp_rcar_gen4_priv *priv = ptp_to_priv(ptp);
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->lock, flags);
+	_ptp_rcar_gen4_settime(ptp, ts);
+	spin_unlock_irqrestore(&priv->lock, flags);
+
+	return 0;
+}
+
+static int ptp_rcar_gen4_adjtime(struct ptp_clock_info *ptp, s64 delta)
+{
+	struct ptp_rcar_gen4_priv *priv = ptp_to_priv(ptp);
+	struct timespec64 ts;
+	unsigned long flags;
+	s64 now;
+
+	spin_lock_irqsave(&priv->lock, flags);
+	_ptp_rcar_gen4_gettime(ptp, &ts);
+	now = ktime_to_ns(timespec64_to_ktime(ts));
+	ts = ns_to_timespec64(now + delta);
+	_ptp_rcar_gen4_settime(ptp, &ts);
+	spin_unlock_irqrestore(&priv->lock, flags);
+
+	return 0;
+}
+
+static struct ptp_clock_info ptp_rcar_gen4_info = {
+	.owner = THIS_MODULE,
+	.name = "R-Car Gen4 gPTP",
+	.max_adj = 50000000,
+	.adjfine = ptp_rcar_gen4_adjfine,
+	.adjtime = ptp_rcar_gen4_adjtime,
+	.gettime64 = ptp_rcar_gen4_gettime,
+	.settime64 = ptp_rcar_gen4_settime,
+};
+
+static int ptp_rcar_gen4_probe(struct platform_device *pdev)
+{
+	struct ptp_rcar_gen4_priv *priv;
+	struct device *dev = &pdev->dev;
+
+	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	platform_set_drvdata(pdev, priv);
+
+	priv->base = devm_platform_ioremap_resource(pdev, 0);
+	if (IS_ERR(priv->base))
+		return PTR_ERR(priv->base);
+
+	priv->clk = devm_clk_get(dev, NULL);
+	if (IS_ERR(priv->clk))
+		return PTR_ERR(priv->clk);
+
+	spin_lock_init(&priv->lock);
+
+	priv->info = ptp_rcar_gen4_info;
+
+	/* Default timer increment in ns.
+	 * bit[31:27] - integer
+	 * bit[26:0]  - decimal
+	 * increment[ns] = period[ns] * 2^27 => (1ns * 2^27) / rate[hz]
+	 */
+	priv->default_addend =
+		div_s64(1000000000LL << 27, clk_get_rate(priv->clk));
+
+	pm_runtime_enable(dev);
+	pm_runtime_get_sync(dev);
+
+	iowrite32(priv->default_addend, priv->base + PTPTIVC0_REG);
+
+	priv->clock = ptp_clock_register(&priv->info, dev);
+	if (IS_ERR(priv->clock))
+		return PTR_ERR(priv->clock);
+
+	iowrite32(1, priv->base + PTPTMEC_REG);
+
+	return 0;
+}
+
+static void ptp_rcar_gen4_remove(struct platform_device *pdev)
+{
+	struct ptp_rcar_gen4_priv *priv = platform_get_drvdata(pdev);
+	struct device *dev = &pdev->dev;
+
+	ptp_clock_unregister(priv->clock);
+
+	iowrite32(1, priv->base + PTPTMDC_REG);
+
+	pm_runtime_put_sync(dev);
+	pm_runtime_disable(dev);
+}
+
+static const struct of_device_id ptp_rcar_gen4_of_match[] = {
+	{ .compatible = "renesas,rcar-gen4-gptp", },
+	{ /* Sentinel */ },
+};
+MODULE_DEVICE_TABLE(of, ptp_rcar_gen4_of_match);
+
+static struct platform_driver ptp_rcar_gen4_driver = {
+	.driver = {
+		.name = "ptp-rcar-gen4",
+		.of_match_table = ptp_rcar_gen4_of_match,
+	},
+	.probe    = ptp_rcar_gen4_probe,
+	.remove   = ptp_rcar_gen4_remove,
+};
+module_platform_driver(ptp_rcar_gen4_driver);
+
+MODULE_AUTHOR("Niklas Söderlund");
+MODULE_DESCRIPTION("Renesas R-Car Gen4 gPTP driver");
+MODULE_LICENSE("GPL");
-- 
2.55.0


^ permalink raw reply related

* [PATCH net-next v3 3/3] arm64: dts: renesas: r8a779g0: Add gPTP node
From: Niklas Söderlund @ 2026-07-01  9:06 UTC (permalink / raw)
  To: Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Geert Uytterhoeven, Magnus Damm, Richard Cochran, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-renesas-soc, devicetree, linux-kernel, netdev
  Cc: Niklas Söderlund
In-Reply-To: <20260701090607.1108208-1-niklas.soderlund+renesas@ragnatech.se>

The gPTP module is shared between the RAVB and RTSN Ethernet devices on
the SoC.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
---
* Changes since v2
- Preserve sort order by unit-address.

* Changes since v1
- Rename node ptp.
---
 arch/arm64/boot/dts/renesas/r8a779g0.dtsi | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/arm64/boot/dts/renesas/r8a779g0.dtsi b/arch/arm64/boot/dts/renesas/r8a779g0.dtsi
index 82a7278836e5..b9b860ef7035 100644
--- a/arch/arm64/boot/dts/renesas/r8a779g0.dtsi
+++ b/arch/arm64/boot/dts/renesas/r8a779g0.dtsi
@@ -589,6 +589,15 @@ tmu4: timer@ffc00000 {
 			status = "disabled";
 		};
 
+		gptp: ptp@e6449000 {
+			compatible = "renesas,r8a779g0-gptp", "renesas,rcar-gen4-gptp";
+			reg = <0 0xe6449000 0 0x500>;
+			clocks = <&cpg CPG_MOD 2723>;
+			power-domains = <&sysc R8A779G0_PD_ALWAYS_ON>;
+			resets = <&cpg 2723>;
+			status = "disabled";
+		};
+
 		tsn0: ethernet@e6460000 {
 			compatible = "renesas,r8a779g0-ethertsn", "renesas,rcar-gen4-ethertsn";
 			reg = <0 0xe6460000 0 0x7000>,
-- 
2.55.0


^ permalink raw reply related

* Re: [PATCH net-next v3 5/5] selftest: Add tests for useful handling of LSM denials on SCM_RIGHTS
From: Jori Koolstra @ 2026-07-01  9:31 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jakub Kicinski, Aleksa Sarai, Kuniyuki Iwashima, David S . Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, netdev, linux-fsdevel,
	linux-kernel
In-Reply-To: <20260701-malen-gutmachen-stengel-2c70ad5d2971@brauner>


> On 01-07-2026 09:38 CEST, Christian Brauner <brauner@kernel.org> wrote:
> 
> > 
> > I just need some LSM to trigger the reject of security_file_receive()
> > and Smack was the easiest to get going. The series is totally agnostic
> > to the used LSM. I am fine with moving the tests elsewhere or porting
> > them to SELinux if that is really necessary. We could also drop them
> > altogether.
> > 
> > What do you propose?
> 
> I'm pretty sure the easiest will be to use a tiny bpf program to reject
> security_file_receive().

Ah. Well, that's a testament to how much there's still to learn for me. 
I didn't even know that bpf could hook into LSM calls :)

^ permalink raw reply

* Re: [PATCH net-next V4 3/6] devlink: Parse eswitch mode boot defaults
From: Jiri Pirko @ 2026-07-01  9:38 UTC (permalink / raw)
  To: Mark Bloch
  Cc: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	Jonathan Corbet, Shuah Khan, netdev, linux-rdma, linux-doc
In-Reply-To: <20260629182102.245150-4-mbloch@nvidia.com>

Mon, Jun 29, 2026 at 08:20:58PM +0200, mbloch@nvidia.com wrote:
>Add devlink_eswitch_mode= kernel command line parsing for a default
>eswitch mode.
>
>The supported syntax selects either all devlink handles or one explicit
>comma-separated handle list:
>
>  devlink_eswitch_mode=*=<mode>
>
>  devlink_eswitch_mode=<handle>[,<handle>...]=<mode>
>
>where <mode> is one of legacy, switchdev or switchdev_inactive. All
>selected handles receive the same mode. Assigning different modes to
>different handle lists in the same parameter value is not supported.
>
>Store the parsed selector and mode in devlink core so the default can be
>applied by a downstream patch.
>
>Document the devlink_eswitch_mode= syntax and duplicate handle handling.
>
>Signed-off-by: Mark Bloch <mbloch@nvidia.com>
>---
> .../admin-guide/kernel-parameters.txt         |  25 ++
> .../networking/devlink/devlink-defaults.rst   |  78 ++++++
> Documentation/networking/devlink/index.rst    |   1 +
> net/devlink/core.c                            | 227 ++++++++++++++++++
> 4 files changed, 331 insertions(+)
> create mode 100644 Documentation/networking/devlink/devlink-defaults.rst
>
>diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>index b5493a7f8f22..117300dd589c 100644
>--- a/Documentation/admin-guide/kernel-parameters.txt
>+++ b/Documentation/admin-guide/kernel-parameters.txt
>@@ -1249,6 +1249,31 @@ Kernel parameters
> 	dell_smm_hwmon.fan_max=
> 			[HW] Maximum configurable fan speed.
> 
>+	devlink_eswitch_mode=
>+			[NET]
>+			Format:
>+			<selector>=<mode>
>+
>+			<selector>:
>+			* | <handle>[,<handle>...]
>+
>+			<handle>:
>+			<bus-name>/<dev-name>
>+
>+			Configure default devlink eswitch mode for matching
>+			devlink instances during device initialization.
>+
>+			<mode>:
>+			legacy | switchdev | switchdev_inactive
>+
>+			Examples:
>+			devlink_eswitch_mode=*=switchdev
>+			devlink_eswitch_mode=pci/0000:08:00.0=switchdev
>+			devlink_eswitch_mode=pci/0000:08:00.0,pci/0000:09:00.1=switchdev_inactive
>+
>+			See Documentation/networking/devlink/devlink-defaults.rst
>+			for the full syntax.
>+
> 	dfltcc=		[HW,S390]
> 			Format: { on | off | def_only | inf_only | always }
> 			on:       s390 zlib hardware support for compression on
>diff --git a/Documentation/networking/devlink/devlink-defaults.rst b/Documentation/networking/devlink/devlink-defaults.rst
>new file mode 100644
>index 000000000000..380c9e99210e
>--- /dev/null
>+++ b/Documentation/networking/devlink/devlink-defaults.rst
>@@ -0,0 +1,78 @@
>+.. SPDX-License-Identifier: GPL-2.0
>+
>+==============================
>+Devlink Eswitch Mode Defaults
>+==============================
>+
>+Devlink eswitch mode defaults allow the eswitch mode to be provided on the
>+kernel command line and applied to matching devlink instances during device
>+initialization.
>+
>+The devlink device is selected by its devlink handle. For PCI devices this is
>+the same handle shown by ``devlink dev show``, for example
>+``pci/0000:08:00.0``.
>+
>+Kernel command line syntax
>+==========================
>+
>+Defaults are specified with the ``devlink_eswitch_mode=`` kernel command line
>+parameter.
>+
>+The general syntax is::
>+
>+  devlink_eswitch_mode=<selector>=<mode>
>+
>+``<selector>`` is either ``*`` or one or more devlink handles::
>+
>+  * | <bus-name>/<dev-name>[,<bus-name>/<dev-name>...]
>+
>+``*`` applies the mode to every devlink instance. All handles in the same
>+selector receive the same eswitch mode.
>+
>+``<mode>`` is one of ``legacy``, ``switchdev`` or ``switchdev_inactive``.
>+
>+Syntax rules
>+------------
>+
>+The following syntax rules apply:
>+
>+* Specify the default in one ``devlink_eswitch_mode=`` parameter. Repeated
>+  ``devlink_eswitch_mode=`` parameters are not accumulated.
>+* The ``devlink_eswitch_mode=`` value is limited by the kernel command line
>+  size.
>+* Whitespace is not allowed within the parameter value.
>+* ``<selector>`` must be either ``*`` or a handle list. ``*`` cannot be
>+  combined with explicit handles.
>+* ``<bus-name>`` and ``<dev-name>`` must not be empty.
>+* ``<dev-name>`` may contain ``:``. This allows PCI names such as
>+  ``0000:08:00.0``.
>+* Handles must not contain whitespace, ``*``, ``=`` or more than one ``/``.
>+* A comma separates handles.
>+* Comma-separated default assignments are not supported.
>+* Duplicate handles are rejected and the devlink eswitch mode default is
>+  ignored.
>+
>+The eswitch mode default corresponds to the userspace command::
>+
>+  devlink dev eswitch set <handle> mode <value>
>+
>+
>+Examples
>+========
>+
>+Set all devlink instances to switchdev mode::
>+
>+  devlink_eswitch_mode=*=switchdev
>+
>+Set one PCI devlink instance to switchdev mode::
>+
>+  devlink_eswitch_mode=pci/0000:08:00.0=switchdev
>+
>+Set two PCI devlink instances to switchdev inactive mode::
>+
>+  devlink_eswitch_mode=pci/0000:08:00.0,pci/0000:09:00.1=switchdev_inactive
>+
>+The following is invalid because comma-separated default assignments are not
>+supported::
>+
>+  devlink_eswitch_mode=pci/0000:08:00.0=switchdev,pci/0000:09:00.0=switchdev_inactive

Interesting. I would think that this is something a user may want to set
for some use cases, no?


>diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
>index 32f70879ddd0..93f09cb18c44 100644
>--- a/Documentation/networking/devlink/index.rst
>+++ b/Documentation/networking/devlink/index.rst
>@@ -56,6 +56,7 @@ general.
>    :maxdepth: 1
> 
>    devlink-dpipe
>+   devlink-defaults
>    devlink-eswitch-attr
>    devlink-flash
>    devlink-health
>diff --git a/net/devlink/core.c b/net/devlink/core.c

Wanna have this in a separate file perhaps? "default.c"?


>index fe9f6a0a67d5..5126509a9c4e 100644
>--- a/net/devlink/core.c
>+++ b/net/devlink/core.c
>@@ -4,6 +4,10 @@
>  * Copyright (c) 2016 Jiri Pirko <jiri@mellanox.com>
>  */
> 
>+#include <linux/init.h>
>+#include <linux/list.h>
>+#include <linux/slab.h>
>+#include <linux/string.h>
> #include <net/genetlink.h>
> #define CREATE_TRACE_POINTS
> #include <trace/events/devlink.h>
>@@ -16,6 +20,193 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_trap_report);
> 
> DEFINE_XARRAY_FLAGS(devlinks, XA_FLAGS_ALLOC);
> 
>+static char *devlink_default_esw_mode_param;
>+static bool devlink_default_esw_mode_match_all;
>+static enum devlink_eswitch_mode devlink_default_esw_mode;
>+static LIST_HEAD(devlink_default_esw_mode_nodes);
>+
>+struct devlink_default_esw_mode_node {
>+	struct list_head list;
>+	char *bus_name;
>+	char *dev_name;
>+};
>+
>+static int __init
>+devlink_default_esw_mode_to_value(const char *str,
>+				  enum devlink_eswitch_mode *mode)
>+{
>+	if (!strcmp(str, "legacy")) {
>+		*mode = DEVLINK_ESWITCH_MODE_LEGACY;
>+		return 0;
>+	}
>+	if (!strcmp(str, "switchdev")) {
>+		*mode = DEVLINK_ESWITCH_MODE_SWITCHDEV;
>+		return 0;
>+	}
>+	if (!strcmp(str, "switchdev_inactive")) {
>+		*mode = DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE;
>+		return 0;
>+	}
>+
>+	return -EINVAL;
>+}
>+
>+static int __init
>+devlink_default_esw_mode_handle_parse(char *handle, char **bus_name,
>+				      char **dev_name)
>+{
>+	char *slash;
>+	char *p;
>+
>+	if (!*handle)
>+		return -EINVAL;
>+
>+	for (p = handle; *p; p++) {
>+		if (*p == '*' || *p == '=')
>+			return -EINVAL;
>+	}
>+
>+	slash = strchr(handle, '/');
>+	if (!slash || slash == handle || !slash[1])
>+		return -EINVAL;
>+	if (strchr(slash + 1, '/'))
>+		return -EINVAL;
>+
>+	*slash = '\0';
>+
>+	*bus_name = handle;
>+	*dev_name = slash + 1;
>+	return 0;
>+}
>+
>+static struct devlink_default_esw_mode_node *
>+devlink_default_esw_mode_node_find(const char *bus_name, const char *dev_name)
>+{
>+	struct devlink_default_esw_mode_node *node;
>+
>+	list_for_each_entry(node, &devlink_default_esw_mode_nodes, list) {
>+		if (!strcmp(node->bus_name, bus_name) &&
>+		    !strcmp(node->dev_name, dev_name))
>+			return node;
>+	}
>+
>+	return NULL;
>+}
>+
>+static int __init
>+devlink_default_esw_mode_node_add(const char *bus_name, const char *dev_name)
>+{
>+	struct devlink_default_esw_mode_node *node;
>+
>+	if (devlink_default_esw_mode_node_find(bus_name, dev_name))
>+		return -EEXIST;
>+
>+	node = kzalloc_obj(*node);
>+	if (!node)
>+		return -ENOMEM;
>+
>+	INIT_LIST_HEAD(&node->list);
>+	node->bus_name = kstrdup(bus_name, GFP_KERNEL);
>+	node->dev_name = kstrdup(dev_name, GFP_KERNEL);
>+	if (!node->bus_name || !node->dev_name) {
>+		kfree(node->bus_name);
>+		kfree(node->dev_name);
>+		kfree(node);
>+		return -ENOMEM;
>+	}
>+
>+	list_add_tail(&node->list, &devlink_default_esw_mode_nodes);
>+	return 0;
>+}
>+
>+static int __init devlink_default_esw_mode_handles_parse(char *handles)
>+{
>+	char *handle;
>+	int err;
>+
>+	if (!strcmp(handles, "*")) {
>+		devlink_default_esw_mode_match_all = true;
>+		return 0;
>+	}
>+
>+	while ((handle = strsep(&handles, ",")) != NULL) {
>+		char *bus_name;
>+		char *dev_name;
>+
>+		err = devlink_default_esw_mode_handle_parse(handle, &bus_name,
>+							    &dev_name);
>+		if (err)
>+			return err;
>+
>+		err = devlink_default_esw_mode_node_add(bus_name, dev_name);
>+		if (err)
>+			return err;
>+	}
>+
>+	return 0;
>+}
>+
>+static void __init
>+devlink_default_esw_mode_node_free(struct devlink_default_esw_mode_node *node)
>+{
>+	kfree(node->bus_name);
>+	kfree(node->dev_name);
>+	kfree(node);
>+}
>+
>+static void __init devlink_default_esw_mode_nodes_clear(void)
>+{
>+	struct devlink_default_esw_mode_node *node;
>+	struct devlink_default_esw_mode_node *node_tmp;
>+
>+	list_for_each_entry_safe(node, node_tmp,
>+				 &devlink_default_esw_mode_nodes, list) {
>+		list_del(&node->list);
>+		devlink_default_esw_mode_node_free(node);
>+	}
>+
>+	devlink_default_esw_mode_match_all = false;
>+}
>+
>+static int __init devlink_default_esw_mode_parse(char *str)
>+{
>+	char *handles;
>+	char *separator;
>+	char *mode;
>+	enum devlink_eswitch_mode esw_mode;
>+	int err;
>+
>+	if (!*str)
>+		return -EINVAL;
>+
>+	separator = strrchr(str, '=');
>+	if (!separator || separator == str || !separator[1])
>+		return -EINVAL;
>+
>+	*separator = '\0';
>+	handles = str;
>+	mode = separator + 1;
>+
>+	err = devlink_default_esw_mode_to_value(mode, &esw_mode);
>+	if (err)
>+		return err;
>+
>+	err = devlink_default_esw_mode_handles_parse(handles);
>+	if (err)
>+		devlink_default_esw_mode_nodes_clear();
>+	else
>+		devlink_default_esw_mode = esw_mode;
>+
>+	return err;
>+}
>+
>+static int __init devlink_default_esw_mode_setup(char *str)
>+{
>+	devlink_default_esw_mode_param = str;
>+	return 1;
>+}
>+__setup("devlink_eswitch_mode=", devlink_default_esw_mode_setup);
>+
> static struct devlink *devlinks_xa_get(unsigned long index)
> {
> 	struct devlink *devlink;
>@@ -382,6 +573,14 @@ struct devlink *devlinks_xa_lookup_get(struct net *net, unsigned long index)
> /**
>  * devl_register - Register devlink instance
>  * @devlink: devlink
>+ *
>+ * Make @devlink visible to userspace. Drivers must call this only after the
>+ * instance is fully initialized and its devlink operations can be called.
>+ *
>+ * Context: Caller must hold the devlink instance lock. Use devlink_register()
>+ * when the lock is not already held.
>+ *
>+ * Return: 0 on success.
>  */
> int devl_register(struct devlink *devlink)
> {
>@@ -580,6 +779,31 @@ static int __init devlink_init(void)
> {
> 	int err;
> 
>+	if (devlink_default_esw_mode_param) {
>+		char *def;
>+
>+		def = kstrdup(devlink_default_esw_mode_param, GFP_KERNEL);
>+		if (!def) {
>+			devlink_default_esw_mode_param = NULL;
>+			pr_warn("devlink: devlink_eswitch_mode parameter ignored, failed to allocate memory\n");
>+		} else {
>+			err = devlink_default_esw_mode_parse(def);
>+			kfree(def);
>+			if (err == -EEXIST) {
>+				devlink_default_esw_mode_param = NULL;
>+				pr_warn("devlink: duplicate eswitch mode handles ignored\n");
>+			} else if (err == -EINVAL) {
>+				devlink_default_esw_mode_param = NULL;
>+				pr_warn("devlink: invalid devlink_eswitch_mode parameter ignored\n");
>+			} else if (err == -ENOMEM) {
>+				devlink_default_esw_mode_param = NULL;
>+				pr_warn("devlink: devlink_eswitch_mode parameter ignored, failed to allocate memory\n");
>+			} else if (err) {
>+				goto out;
>+			}

Move this to a separate helper alongside the other "default" functions?


>+		}
>+	}
>+
> 	err = register_pernet_subsys(&devlink_pernet_ops);
> 	if (err)
> 		goto out;
>@@ -595,7 +819,10 @@ static int __init devlink_init(void)
> out_unreg_pernet_subsys:
> 	unregister_pernet_subsys(&devlink_pernet_ops);
> out:
>+	if (err)
>+		devlink_default_esw_mode_nodes_clear();
> 	WARN_ON(err);
>+
> 	return err;
> }
> 
>-- 
>2.43.0
>
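
On the helper suggestion in devlink_init() above: a rough userspace sketch of
what such a helper might look like. The names default_esw_mode_init() and
parse_stub() are hypothetical; parse_stub() only stands in for
devlink_default_esw_mode_parse(), and the kstrdup() copy and pr_warn() calls
from the patch are replaced with plain C so the snippet builds outside the
kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for devlink_default_esw_mode_parse(): it only
 * checks the "<handles>=<mode>" shape so the sketch builds in userspace.
 */
static int parse_stub(const char *str)
{
	const char *sep = strrchr(str, '=');

	if (!sep || sep == str || !sep[1])
		return -EINVAL;
	return 0;
}

/* Hypothetical helper for devlink_init(): parse the saved parameter,
 * warn and drop it on the three recoverable errors, and hand any other
 * error back so the caller can bail out.
 */
static int default_esw_mode_init(const char **param)
{
	const char *reason;
	int err;

	assert(param);
	if (!*param)
		return 0;

	err = parse_stub(*param);
	switch (err) {
	case 0:
		return 0;
	case -EEXIST:
		reason = "duplicate handles";
		break;
	case -EINVAL:
		reason = "invalid value";
		break;
	case -ENOMEM:
		reason = "allocation failure";
		break;
	default:
		return err;	/* unexpected: let the caller fail init */
	}

	fprintf(stderr, "devlink_eswitch_mode ignored: %s\n", reason);
	*param = NULL;
	return 0;
}
```

With something along these lines, the block in devlink_init() could shrink to
one call followed by the existing "if (err) goto out;" check.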


* [PATCH iwl-next v2 1/2] i40e: move ATR sample rate from ring to PF level
From: mheib @ 2026-07-01  9:38 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, jiri, davem, edumazet, kuba, pabeni, horms, corbet,
	anthony.l.nguyen, przemyslaw.kitszel, andrew+netdev,
	Mohammad Heib

From: Mohammad Heib <mheib@redhat.com>

The ATR sample rate is currently stored per-ring and initialized when each
TX ring is configured. Since the sample rate is a global policy that
applies uniformly across all rings, it makes more sense to store it at
the PF level.

Move atr_sample_rate from struct i40e_ring to struct i40e_pf and initialize
it once during i40e_sw_init(). Update i40e_atr() to reference the PF-level
field. Change atr_count from u8 to u32 to match the sample rate type.

Signed-off-by: Mohammad Heib <mheib@redhat.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h      | 1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c | 9 +++------
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 4 ++--
 drivers/net/ethernet/intel/i40e/i40e_txrx.h | 3 +--
 4 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 1b6a8fbaa648..88eb40ee45f0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -487,6 +487,7 @@ struct i40e_pf {
 	u16 rss_size_max;          /* HW defined max RSS queues */
 	u16 fdir_pf_filter_count;  /* num of guaranteed filters for this PF */
 	u16 num_alloc_vsi;         /* num VSIs this driver supports */
+	u32 atr_sample_rate;
 	bool wol_en;
 
 	struct hlist_head fdir_filter_list;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 1fb86bd1af8e..3834af6c09be 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3457,12 +3457,7 @@ static int i40e_configure_tx_ring(struct i40e_ring *ring)
 		ring->xsk_pool = i40e_xsk_pool(ring);
 
 	/* some ATR related tx ring init */
-	if (test_bit(I40E_FLAG_FD_ATR_ENA, vsi->back->flags)) {
-		ring->atr_sample_rate = I40E_DEFAULT_ATR_SAMPLE_RATE;
-		ring->atr_count = 0;
-	} else {
-		ring->atr_sample_rate = 0;
-	}
+	ring->atr_count = 0;
 
 	/* configure XPS */
 	i40e_config_xps_tx_ring(ring);
@@ -12745,6 +12740,8 @@ static int i40e_sw_init(struct i40e_pf *pf)
 		}
 	}
 
+	pf->atr_sample_rate = I40E_DEFAULT_ATR_SAMPLE_RATE;
+
 	if ((pf->hw.func_caps.fd_filters_guaranteed > 0) ||
 	    (pf->hw.func_caps.fd_filters_best_effort > 0)) {
 		set_bit(I40E_FLAG_FD_ATR_ENA, pf->flags);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 61525ab7d21e..da94cb2ce94d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2882,7 +2882,7 @@ static void i40e_atr(struct i40e_ring *tx_ring, struct sk_buff *skb,
 		return;
 
 	/* if sampling is disabled do nothing */
-	if (!tx_ring->atr_sample_rate)
+	if (!pf->atr_sample_rate)
 		return;
 
 	/* Currently only IPv4/IPv6 with TCP is supported */
@@ -2934,7 +2934,7 @@ static void i40e_atr(struct i40e_ring *tx_ring, struct sk_buff *skb,
 	if (!th->fin &&
 	    !th->syn &&
 	    !th->rst &&
-	    (tx_ring->atr_count < tx_ring->atr_sample_rate))
+	    (tx_ring->atr_count < pf->atr_sample_rate))
 		return;
 
 	tx_ring->atr_count = 0;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index bb741ff3e5f2..be587f804e7a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -372,8 +372,7 @@ struct i40e_ring {
 	u16 next_to_clean;
 	u16 xdp_tx_active;
 
-	u8 atr_sample_rate;
-	u8 atr_count;
+	u32 atr_count;
 
 	bool ring_active;		/* is ring online or not */
 	bool arm_wb;		/* do something to arm write back */
-- 
2.53.0



* [PATCH iwl-next v3 2/2] i40e: add devlink parameter for Flow Director ATR sample rate
From: mheib @ 2026-07-01  9:38 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, jiri, davem, edumazet, kuba, pabeni, horms, corbet,
	anthony.l.nguyen, przemyslaw.kitszel, andrew+netdev,
	Mohammad Heib
In-Reply-To: <20260701093830.948756-1-mheib@redhat.com>

From: Mohammad Heib <mheib@redhat.com>

The i40e driver uses Flow Director ATR to periodically update flow
steering information for active TCP flows. The update frequency is
currently controlled by I40E_DEFAULT_ATR_SAMPLE_RATE and is fixed at
driver build time.

On systems with a large number of queues and high-rate TCP workloads,
the default sampling interval can result in frequent Flow Director
reprogramming for long-lived flows.

The amount of TCP packet reordering observed on some systems is
sensitive to the ATR sampling interval. Increasing the interval reduces
Flow Director programming activity and can significantly reduce the
associated reordering.

Since the optimal sampling interval depends on the workload and system
configuration, a single fixed value is not suitable for all deployments.

Add a devlink parameter to allow administrators to tune the ATR sample
rate at runtime without rebuilding the driver or disabling ATR
functionality entirely.

Signed-off-by: Mohammad Heib <mheib@redhat.com>
---
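For reference, a simplified userspace sketch of how the sampling interval
gates Flow Director updates. The struct and function names are hypothetical
stand-ins, not the driver's, and the placement of the counter increment is an
assumption; only the comparison against the PF-level rate and the counter
reset follow i40e_atr() as shown in patch 1/2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for struct i40e_pf / struct i40e_ring. */
struct pf_sketch {
	uint32_t atr_sample_rate;	/* 0 disables sampling */
};

struct ring_sketch {
	uint32_t atr_count;		/* packets seen since the last update */
};

/* Returns true when this packet should trigger a Flow Director update.
 * TCP control packets (SYN/FIN/RST) always do; other packets do so once
 * the per-ring counter reaches the PF-wide sample rate.
 */
static bool atr_should_sample(const struct pf_sketch *pf,
			      struct ring_sketch *ring, bool tcp_ctrl)
{
	assert(pf && ring);

	if (!pf->atr_sample_rate)
		return false;

	ring->atr_count++;	/* assumed to happen before the check */
	if (!tcp_ctrl && ring->atr_count < pf->atr_sample_rate)
		return false;

	ring->atr_count = 0;
	return true;
}

/* Count how many of @packets plain data packets would trigger an update. */
static unsigned int atr_updates(uint32_t rate, unsigned int packets)
{
	struct pf_sketch pf = { .atr_sample_rate = rate };
	struct ring_sketch ring = { 0 };
	unsigned int i, updates = 0;

	for (i = 0; i < packets; i++)
		if (atr_should_sample(&pf, &ring, false))
			updates++;
	return updates;
}
```

In this model, 100 data packets on one ring give 5 updates at a rate of 20,
none at a rate of 200, and none at 0 (sampling disabled).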
 Documentation/networking/devlink/i40e.rst     | 20 +++++++++++
 .../net/ethernet/intel/i40e/i40e_devlink.c    | 36 +++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/Documentation/networking/devlink/i40e.rst b/Documentation/networking/devlink/i40e.rst
index 51c887f0dc83..2cea98b631ba 100644
--- a/Documentation/networking/devlink/i40e.rst
+++ b/Documentation/networking/devlink/i40e.rst
@@ -40,6 +40,26 @@ Parameters
 
         The default value is ``0`` (internal calculation is used).
 
+.. list-table:: Driver specific parameters implemented
+    :widths: 5 5 90
+
+    * - Name
+      - Mode
+      - Description
+    * - ``atr_sample_rate``
+      - runtime
+      - Controls how frequently Flow Director ATR updates flow steering
+        information for active TCP flows.
+
+        ATR programs Flow Director entries based on sampled transmitted
+        packets. The sampling interval is specified as the number of
+        transmitted packets between ATR updates.
+
+        Lower values increase Flow Director programming activity, while
+        higher values reduce the update frequency.
+
+        Setting to ``0`` disables ATR sampling (no filters will be programmed).
+        The default value is ``20``.
 
 Info versions
 =============
diff --git a/drivers/net/ethernet/intel/i40e/i40e_devlink.c b/drivers/net/ethernet/intel/i40e/i40e_devlink.c
index 229179ccc131..cf487efdd803 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_devlink.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_devlink.c
@@ -33,12 +33,48 @@ static int i40e_max_mac_per_vf_get(struct devlink *devlink,
 	return 0;
 }
 
+static int i40e_atr_sample_rate_set(struct devlink *devlink,
+				    u32 id,
+				    struct devlink_param_gset_ctx *ctx,
+				    struct netlink_ext_ack *extack)
+{
+	struct i40e_pf *pf = devlink_priv(devlink);
+	u32 sample_rate = ctx->val.vu32;
+
+	pf->atr_sample_rate = sample_rate;
+	return 0;
+}
+
+static int i40e_atr_sample_rate_get(struct devlink *devlink,
+				    u32 id,
+				    struct devlink_param_gset_ctx *ctx,
+				    struct netlink_ext_ack *extack)
+{
+	struct i40e_pf *pf = devlink_priv(devlink);
+
+	ctx->val.vu32 = pf->atr_sample_rate;
+
+	return 0;
+}
+
+enum i40e_dl_param_id {
+	I40E_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
+	I40E_DEVLINK_PARAM_ID_ATR_SAMPLE_RATE,
+};
+
 static const struct devlink_param i40e_dl_params[] = {
 	DEVLINK_PARAM_GENERIC(MAX_MAC_PER_VF,
 			      BIT(DEVLINK_PARAM_CMODE_RUNTIME),
 			      i40e_max_mac_per_vf_get,
 			      i40e_max_mac_per_vf_set,
 			      NULL),
+	DEVLINK_PARAM_DRIVER(I40E_DEVLINK_PARAM_ID_ATR_SAMPLE_RATE,
+			     "atr_sample_rate",
+			     DEVLINK_PARAM_TYPE_U32,
+			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			     i40e_atr_sample_rate_get,
+			     i40e_atr_sample_rate_set,
+			     NULL),
 };
 
 static void i40e_info_get_dsn(struct i40e_pf *pf, char *buf, size_t len)
-- 
2.53.0



* [PATCH net-next] macsec: no longer rely on RTNL in macsec_fill_info()
From: Eric Dumazet @ 2026-07-01  9:43 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
	Eric Dumazet, Sabrina Dubroca, Andrew Lunn

Add READ_ONCE()/WRITE_ONCE() annotations on fields that can be
changed concurrently in macsec_changelink() and macsec_update_offload():

- secy->key_len
- secy->xpn
- tx_sc->encoding_sa
- tx_sc->encrypt
- secy->protect_frames
- tx_sc->send_sci
- tx_sc->end_station
- tx_sc->scb
- secy->replay_protect
- secy->validate_frames
- secy->replay_window
- macsec->offload

This allows macsec_fill_info() to run locklessly without RTNL.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
---
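A minimal userspace sketch of the pattern, for readers less familiar with it.
The macros are simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE()
(they rely on the GCC/Clang __typeof__ extension), and secy_sketch,
SKETCH_MAX_WINDOW, set_replay_window() and get_replay_window() are
hypothetical names with an arbitrary limit. It also mirrors the replay_window
hunk below: the new value is validated in a local and published only if it is
accepted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel macros: force a single access
 * through a volatile-qualified pointer.
 */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/* Arbitrary limit for the sketch, not the real XPN maximum. */
#define SKETCH_MAX_WINDOW	1024u

struct secy_sketch {
	bool xpn;
	uint32_t replay_window;
};

/* Writer side (serialized by RTNL in the kernel): check the value in a
 * local first, so a rejected request never becomes visible to readers.
 */
static int set_replay_window(struct secy_sketch *secy, uint32_t window)
{
	assert(secy);

	if (secy->xpn && window > SKETCH_MAX_WINDOW)
		return -EINVAL;

	WRITE_ONCE(secy->replay_window, window);
	return 0;
}

/* Reader side: may run without the writer's lock. */
static uint32_t get_replay_window(const struct secy_sketch *secy)
{
	return READ_ONCE(secy->replay_window);
}
```

The annotations only make each individual load and store well defined; a
lockless reader can still observe a mix of old and new values across
different fields.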
 drivers/net/macsec.c | 78 +++++++++++++++++++++++---------------------
 1 file changed, 40 insertions(+), 38 deletions(-)

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index fb009120a92415cf51f12eab15b4c7925a25704d..1a968596ca45b8deb028b86207a501dbf3116f0f 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -2636,7 +2636,7 @@ static int macsec_update_offload(struct net_device *dev,
 		vlan_drop_rx_ctag_filter_info(dev);
 		vlan_drop_rx_stag_filter_info(dev);
 	}
-	macsec->offload = offload;
+	WRITE_ONCE(macsec->offload, offload);
 	/* Add VLAN filters when enabling offload. */
 	if (prev_offload == MACSEC_OFFLOAD_OFF) {
 		ret = vlan_get_rx_ctag_filter_info(dev);
@@ -2666,7 +2666,7 @@ static int macsec_update_offload(struct net_device *dev,
 	return 0;
 
 rollback_offload:
-	macsec->offload = prev_offload;
+	WRITE_ONCE(macsec->offload, prev_offload);
 	macsec_offload(ops->mdo_del_secy, &ctx);
 
 	return ret;
@@ -3875,52 +3875,53 @@ static int macsec_changelink_common(struct net_device *dev,
 
 	if (data[IFLA_MACSEC_ENCODING_SA]) {
 		struct macsec_tx_sa *tx_sa;
+		u8 encoding_sa = nla_get_u8(data[IFLA_MACSEC_ENCODING_SA]);
 
-		tx_sc->encoding_sa = nla_get_u8(data[IFLA_MACSEC_ENCODING_SA]);
-		tx_sa = rtnl_dereference(tx_sc->sa[tx_sc->encoding_sa]);
+		WRITE_ONCE(tx_sc->encoding_sa, encoding_sa);
+		tx_sa = rtnl_dereference(tx_sc->sa[encoding_sa]);
 
 		secy->operational = tx_sa && tx_sa->active;
 	}
 
 	if (data[IFLA_MACSEC_ENCRYPT])
-		tx_sc->encrypt = !!nla_get_u8(data[IFLA_MACSEC_ENCRYPT]);
+		WRITE_ONCE(tx_sc->encrypt, !!nla_get_u8(data[IFLA_MACSEC_ENCRYPT]));
 
 	if (data[IFLA_MACSEC_PROTECT])
-		secy->protect_frames = !!nla_get_u8(data[IFLA_MACSEC_PROTECT]);
+		WRITE_ONCE(secy->protect_frames, !!nla_get_u8(data[IFLA_MACSEC_PROTECT]));
 
 	if (data[IFLA_MACSEC_INC_SCI])
-		tx_sc->send_sci = !!nla_get_u8(data[IFLA_MACSEC_INC_SCI]);
+		WRITE_ONCE(tx_sc->send_sci, !!nla_get_u8(data[IFLA_MACSEC_INC_SCI]));
 
 	if (data[IFLA_MACSEC_ES])
-		tx_sc->end_station = !!nla_get_u8(data[IFLA_MACSEC_ES]);
+		WRITE_ONCE(tx_sc->end_station, !!nla_get_u8(data[IFLA_MACSEC_ES]));
 
 	if (data[IFLA_MACSEC_SCB])
-		tx_sc->scb = !!nla_get_u8(data[IFLA_MACSEC_SCB]);
+		WRITE_ONCE(tx_sc->scb, !!nla_get_u8(data[IFLA_MACSEC_SCB]));
 
 	if (data[IFLA_MACSEC_REPLAY_PROTECT])
-		secy->replay_protect = !!nla_get_u8(data[IFLA_MACSEC_REPLAY_PROTECT]);
+		WRITE_ONCE(secy->replay_protect, !!nla_get_u8(data[IFLA_MACSEC_REPLAY_PROTECT]));
 
 	if (data[IFLA_MACSEC_VALIDATION])
-		secy->validate_frames = nla_get_u8(data[IFLA_MACSEC_VALIDATION]);
+		WRITE_ONCE(secy->validate_frames, nla_get_u8(data[IFLA_MACSEC_VALIDATION]));
 
 	if (data[IFLA_MACSEC_CIPHER_SUITE]) {
 		switch (nla_get_u64(data[IFLA_MACSEC_CIPHER_SUITE])) {
 		case MACSEC_CIPHER_ID_GCM_AES_128:
 		case MACSEC_DEFAULT_CIPHER_ID:
-			secy->key_len = MACSEC_GCM_AES_128_SAK_LEN;
-			secy->xpn = false;
+			WRITE_ONCE(secy->key_len, MACSEC_GCM_AES_128_SAK_LEN);
+			WRITE_ONCE(secy->xpn, false);
 			break;
 		case MACSEC_CIPHER_ID_GCM_AES_256:
-			secy->key_len = MACSEC_GCM_AES_256_SAK_LEN;
-			secy->xpn = false;
+			WRITE_ONCE(secy->key_len, MACSEC_GCM_AES_256_SAK_LEN);
+			WRITE_ONCE(secy->xpn, false);
 			break;
 		case MACSEC_CIPHER_ID_GCM_AES_XPN_128:
-			secy->key_len = MACSEC_GCM_AES_128_SAK_LEN;
-			secy->xpn = true;
+			WRITE_ONCE(secy->key_len, MACSEC_GCM_AES_128_SAK_LEN);
+			WRITE_ONCE(secy->xpn, true);
 			break;
 		case MACSEC_CIPHER_ID_GCM_AES_XPN_256:
-			secy->key_len = MACSEC_GCM_AES_256_SAK_LEN;
-			secy->xpn = true;
+			WRITE_ONCE(secy->key_len, MACSEC_GCM_AES_256_SAK_LEN);
+			WRITE_ONCE(secy->xpn, true);
 			break;
 		default:
 			return -EINVAL;
@@ -3928,13 +3929,14 @@ static int macsec_changelink_common(struct net_device *dev,
 	}
 
 	if (data[IFLA_MACSEC_WINDOW]) {
-		secy->replay_window = nla_get_u32(data[IFLA_MACSEC_WINDOW]);
+		u32 replay_window = nla_get_u32(data[IFLA_MACSEC_WINDOW]);
 
 		/* IEEE 802.1AEbw-2013 10.7.8 - maximum replay window
 		 * for XPN cipher suites */
 		if (secy->xpn &&
-		    secy->replay_window > MACSEC_XPN_MAX_REPLAY_WINDOW)
+		    replay_window > MACSEC_XPN_MAX_REPLAY_WINDOW)
 			return -EINVAL;
+		WRITE_ONCE(secy->replay_window, replay_window);
 	}
 
 	return 0;
@@ -4382,21 +4384,21 @@ static size_t macsec_get_size(const struct net_device *dev)
 static int macsec_fill_info(struct sk_buff *skb,
 			    const struct net_device *dev)
 {
-	struct macsec_tx_sc *tx_sc;
-	struct macsec_dev *macsec;
-	struct macsec_secy *secy;
+	const struct macsec_tx_sc *tx_sc;
+	const struct macsec_dev *macsec;
+	const struct macsec_secy *secy;
 	u64 csid;
 
 	macsec = macsec_priv(dev);
 	secy = &macsec->secy;
 	tx_sc = &secy->tx_sc;
 
-	switch (secy->key_len) {
+	switch (READ_ONCE(secy->key_len)) {
 	case MACSEC_GCM_AES_128_SAK_LEN:
-		csid = secy->xpn ? MACSEC_CIPHER_ID_GCM_AES_XPN_128 : MACSEC_DEFAULT_CIPHER_ID;
+		csid = READ_ONCE(secy->xpn) ? MACSEC_CIPHER_ID_GCM_AES_XPN_128 : MACSEC_DEFAULT_CIPHER_ID;
 		break;
 	case MACSEC_GCM_AES_256_SAK_LEN:
-		csid = secy->xpn ? MACSEC_CIPHER_ID_GCM_AES_XPN_256 : MACSEC_CIPHER_ID_GCM_AES_256;
+		csid = READ_ONCE(secy->xpn) ? MACSEC_CIPHER_ID_GCM_AES_XPN_256 : MACSEC_CIPHER_ID_GCM_AES_256;
 		break;
 	default:
 		goto nla_put_failure;
@@ -4407,20 +4409,20 @@ static int macsec_fill_info(struct sk_buff *skb,
 	    nla_put_u8(skb, IFLA_MACSEC_ICV_LEN, secy->icv_len) ||
 	    nla_put_u64_64bit(skb, IFLA_MACSEC_CIPHER_SUITE,
 			      csid, IFLA_MACSEC_PAD) ||
-	    nla_put_u8(skb, IFLA_MACSEC_ENCODING_SA, tx_sc->encoding_sa) ||
-	    nla_put_u8(skb, IFLA_MACSEC_ENCRYPT, tx_sc->encrypt) ||
-	    nla_put_u8(skb, IFLA_MACSEC_PROTECT, secy->protect_frames) ||
-	    nla_put_u8(skb, IFLA_MACSEC_INC_SCI, tx_sc->send_sci) ||
-	    nla_put_u8(skb, IFLA_MACSEC_ES, tx_sc->end_station) ||
-	    nla_put_u8(skb, IFLA_MACSEC_SCB, tx_sc->scb) ||
-	    nla_put_u8(skb, IFLA_MACSEC_REPLAY_PROTECT, secy->replay_protect) ||
-	    nla_put_u8(skb, IFLA_MACSEC_VALIDATION, secy->validate_frames) ||
-	    nla_put_u8(skb, IFLA_MACSEC_OFFLOAD, macsec->offload) ||
+	    nla_put_u8(skb, IFLA_MACSEC_ENCODING_SA, READ_ONCE(tx_sc->encoding_sa)) ||
+	    nla_put_u8(skb, IFLA_MACSEC_ENCRYPT, READ_ONCE(tx_sc->encrypt)) ||
+	    nla_put_u8(skb, IFLA_MACSEC_PROTECT, READ_ONCE(secy->protect_frames)) ||
+	    nla_put_u8(skb, IFLA_MACSEC_INC_SCI, READ_ONCE(tx_sc->send_sci)) ||
+	    nla_put_u8(skb, IFLA_MACSEC_ES, READ_ONCE(tx_sc->end_station)) ||
+	    nla_put_u8(skb, IFLA_MACSEC_SCB, READ_ONCE(tx_sc->scb)) ||
+	    nla_put_u8(skb, IFLA_MACSEC_REPLAY_PROTECT, READ_ONCE(secy->replay_protect)) ||
+	    nla_put_u8(skb, IFLA_MACSEC_VALIDATION, READ_ONCE(secy->validate_frames)) ||
+	    nla_put_u8(skb, IFLA_MACSEC_OFFLOAD, READ_ONCE(macsec->offload)) ||
 	    0)
 		goto nla_put_failure;
 
-	if (secy->replay_protect) {
-		if (nla_put_u32(skb, IFLA_MACSEC_WINDOW, secy->replay_window))
+	if (READ_ONCE(secy->replay_protect)) {
+		if (nla_put_u32(skb, IFLA_MACSEC_WINDOW, READ_ONCE(secy->replay_window)))
 			goto nla_put_failure;
 	}
 
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* [PATCH v4 0/3] Add drm_ras netlink error event support
From: Riana Tauro @ 2026-07-01  9:44 UTC (permalink / raw)
  To: intel-xe, dri-devel, netdev
  Cc: aravind.iddamsetty, anshuman.gupta, rodrigo.vivi, joonas.lahtinen,
	kuba, simona.vetter, airlied, pratik.bari, joshua.santosh.ranjan,
	ashwin.kumar.kulkarni, shubham.kumar, ravi.kishore.koppuravuri,
	raag.jadav, maarten.lankhorst, mallesh.koujalagi, soham.purkait,
	Riana Tauro

Define a new netlink event 'error-event' and a new multicast group
'error-report' in drm_ras. Each event contains device name, node and
error information to identify the error triggering the event.

Add drm_ras_nl_error_event() to trigger an event from the driver.
Wire this support to xe drm_ras to notify userspace whenever a GT or
SoC error occurs in PVC. Also add support for correctable errors in
CRI.

$ sudo ynl --family drm_ras --output-json --subscribe error-report

{
    "name": "error-event",
     "msg": {
         "device-name": "0000:03:00.0",
         "node-id": 1,
         "node-name": "uncorrectable-errors",
         "error-id": 1,
         "error-name": "core-compute",
         "error-value": 1
     }
}

Rev2: use ynl in document and commit message
      fix cosmetic review comments
      simplify caller 

Rev3: replace error-event with error-report
      add has_drm_ras check
      add support for correctable errors in CRI

Rev4: send an event at most once per component for each interrupt
      add xe_warn for unexpected values from firmware
      fix sashiko reported issues
       
Riana Tauro (3):
  drm/drm_ras: Add drm_ras netlink error event
  drm/xe/xe_drm_ras: Add error-event support for PVC
  drm/xe/xe_ras: Add error-event support for CRI

 Documentation/gpu/drm-ras.rst            | 21 ++++++
 Documentation/netlink/specs/drm_ras.yaml | 48 +++++++++++++
 drivers/gpu/drm/drm_ras.c                | 87 ++++++++++++++++++++++++
 drivers/gpu/drm/drm_ras_nl.c             |  6 ++
 drivers/gpu/drm/drm_ras_nl.h             |  4 ++
 drivers/gpu/drm/xe/xe_drm_ras.c          | 32 +++++++++
 drivers/gpu/drm/xe/xe_drm_ras.h          |  3 +
 drivers/gpu/drm/xe/xe_hw_error.c         |  5 +-
 drivers/gpu/drm/xe/xe_ras.c              | 75 ++++++++++++++++++++
 include/drm/drm_ras.h                    |  5 ++
 include/uapi/drm/drm_ras.h               | 15 ++++
 11 files changed, 300 insertions(+), 1 deletion(-)

-- 
2.47.1



* [PATCH v4 1/3] drm/drm_ras: Add drm_ras netlink error event
From: Riana Tauro @ 2026-07-01  9:44 UTC (permalink / raw)
  To: intel-xe, dri-devel, netdev
  Cc: aravind.iddamsetty, anshuman.gupta, rodrigo.vivi, joonas.lahtinen,
	kuba, simona.vetter, airlied, pratik.bari, joshua.santosh.ranjan,
	ashwin.kumar.kulkarni, shubham.kumar, ravi.kishore.koppuravuri,
	raag.jadav, maarten.lankhorst, mallesh.koujalagi, soham.purkait,
	Riana Tauro, Zack McKevitt, Lijo Lazar, Hawking Zhang,
	David S. Miller, Paolo Abeni, Eric Dumazet
In-Reply-To: <20260701094409.129131-5-riana.tauro@intel.com>

Define a new netlink event 'error-event' and a new multicast group
'error-report' in drm_ras. Each event contains device name, node and
error information to identify the error triggering the event.

Add drm_ras_nl_error_event() to trigger an event from the driver.
Userspace must subscribe to 'error-report' to receive 'error-event'
notifications.

Usage:

$ sudo ynl --family drm_ras --subscribe error-report

Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Zack McKevitt <zachary.mckevitt@oss.qualcomm.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
---
v2: remove redundant initialization
    remove unnecessary space
    use ynl in commit message and doc (Raag)
    simplify doc for error-event attrs

v3: rename error-notify to error-report
    Replace notify with report across the file (Raag)
---
 Documentation/gpu/drm-ras.rst            | 21 ++++++
 Documentation/netlink/specs/drm_ras.yaml | 48 +++++++++++++
 drivers/gpu/drm/drm_ras.c                | 87 ++++++++++++++++++++++++
 drivers/gpu/drm/drm_ras_nl.c             |  6 ++
 drivers/gpu/drm/drm_ras_nl.h             |  4 ++
 include/drm/drm_ras.h                    |  5 ++
 include/uapi/drm/drm_ras.h               | 15 ++++
 7 files changed, 186 insertions(+)

diff --git a/Documentation/gpu/drm-ras.rst b/Documentation/gpu/drm-ras.rst
index 83c21853b74b..406e4c49bac1 100644
--- a/Documentation/gpu/drm-ras.rst
+++ b/Documentation/gpu/drm-ras.rst
@@ -56,6 +56,7 @@ User space tools can:
   ``node-id`` and ``error-id`` as parameters.
 * Clear specific error counters with the ``clear-error-counter`` command, using both
   ``node-id`` and ``error-id`` as parameters.
+* Subscribe to the ``error-report`` multicast group to receive ``error-event``.
 
 YAML-based Interface
 --------------------
@@ -111,3 +112,23 @@ Example: Clear an error counter for a given node
 
     sudo ynl --family drm_ras --do clear-error-counter --json '{"node-id":0, "error-id":1}'
     None
+
+Example: Subscribe to ``error-report`` multicast group
+
+.. code-block:: bash
+
+    sudo ynl --family drm_ras --output-json --subscribe error-report
+
+.. code-block:: json
+
+    {
+        "name": "error-event",
+        "msg": {
+            "device-name": "0000:03:00.0",
+            "node-id": 1,
+            "node-name": "uncorrectable-errors",
+            "error-id": 1,
+            "error-name": "error_name1",
+            "error-value": 1
+        }
+    }
diff --git a/Documentation/netlink/specs/drm_ras.yaml b/Documentation/netlink/specs/drm_ras.yaml
index e113056f8c01..8aed3d4515e5 100644
--- a/Documentation/netlink/specs/drm_ras.yaml
+++ b/Documentation/netlink/specs/drm_ras.yaml
@@ -69,6 +69,33 @@ attribute-sets:
         name: error-value
         type: u32
         doc: Current value of the requested error counter.
+  -
+    name: error-event-attrs
+    attributes:
+      -
+        name: device-name
+        type: string
+        doc: Device (PCI BDF, UUID) that reported the error.
+      -
+        name: node-id
+        type: u32
+        doc: ID of the node that reported the error.
+      -
+        name: node-name
+        type: string
+        doc: Name of the node that reported the error.
+      -
+        name: error-id
+        type: u32
+        doc: ID of the error counter.
+      -
+        name: error-name
+        type: string
+        doc: Name of the error.
+      -
+        name: error-value
+        type: u32
+        doc: Current value of the error counter.
 
 operations:
   list:
@@ -124,3 +151,24 @@ operations:
       do:
         request:
           attributes: *id-attrs
+    -
+      name: error-event
+      doc: >-
+           Report an error event to userspace.
+           The event includes the device, node and error information
+           of the error that triggered the event.
+      attribute-set: error-event-attrs
+      mcgrp: error-report
+      event:
+        attributes:
+          - device-name
+          - node-id
+          - node-name
+          - error-id
+          - error-name
+          - error-value
+
+mcast-groups:
+  list:
+    -
+      name: error-report
diff --git a/drivers/gpu/drm/drm_ras.c b/drivers/gpu/drm/drm_ras.c
index d6eab29a1394..77f912a4d101 100644
--- a/drivers/gpu/drm/drm_ras.c
+++ b/drivers/gpu/drm/drm_ras.c
@@ -41,6 +41,11 @@
  *    Userspace must provide Node ID, Error ID.
  *    Clears specific error counter of a node if supported.
  *
+ * 4. ERROR_REPORT: Subscribe to this multicast group to receive error events
+ *
+ * 5. ERROR_EVENT: Report an error event to userspace. The event contains device, node
+ *    and error information that triggered the event.
+ *
  * Node registration:
  *
  * - drm_ras_node_register(): Registers a new node and assigns
@@ -186,6 +191,34 @@ static int msg_reply_value(struct sk_buff *msg, u32 error_id,
 			   value);
 }
 
+static int msg_put_error_event_attrs(struct sk_buff *msg, struct drm_ras_node *node,
+				     u32 error_id, const char *error_name, u32 value)
+{
+	int ret;
+
+	ret = nla_put_string(msg, DRM_RAS_A_ERROR_EVENT_ATTRS_DEVICE_NAME, node->device_name);
+	if (ret)
+		return ret;
+
+	ret = nla_put_u32(msg, DRM_RAS_A_ERROR_EVENT_ATTRS_NODE_ID, node->id);
+	if (ret)
+		return ret;
+
+	ret = nla_put_string(msg, DRM_RAS_A_ERROR_EVENT_ATTRS_NODE_NAME, node->node_name);
+	if (ret)
+		return ret;
+
+	ret = nla_put_u32(msg, DRM_RAS_A_ERROR_EVENT_ATTRS_ERROR_ID, error_id);
+	if (ret)
+		return ret;
+
+	ret = nla_put_string(msg, DRM_RAS_A_ERROR_EVENT_ATTRS_ERROR_NAME, error_name);
+	if (ret)
+		return ret;
+
+	return nla_put_u32(msg, DRM_RAS_A_ERROR_EVENT_ATTRS_ERROR_VALUE, value);
+}
+
 static int doit_reply_value(struct genl_info *info, u32 node_id,
 			    u32 error_id)
 {
@@ -222,6 +255,60 @@ static int doit_reply_value(struct genl_info *info, u32 node_id,
 	return genlmsg_reply(msg, info);
 }
 
+/**
+ * drm_ras_nl_error_event() - Report an error event
+ * @node: Node structure
+ * @error_id: ID of the error
+ * @error_name: Name of the error
+ * @value: Value associated with the error
+ * @flags: GFP flags for memory allocation
+ *
+ * Report an error-event to userspace using the error-report multicast group.
+ *
+ * Return: 0 on success, or negative errno on failure.
+ */
+int drm_ras_nl_error_event(struct drm_ras_node *node, u32 error_id, const char *error_name,
+			   u32 value, gfp_t flags)
+{
+	struct genl_info info;
+	struct sk_buff *msg;
+	struct nlattr *hdr;
+	int ret;
+
+	if (!error_name)
+		return -EINVAL;
+
+	if (!genl_has_listeners(&drm_ras_nl_family, &init_net, DRM_RAS_NLGRP_ERROR_REPORT))
+		return 0;
+
+	genl_info_init_ntf(&info, &drm_ras_nl_family, DRM_RAS_CMD_ERROR_EVENT);
+
+	msg = genlmsg_new(NLMSG_GOODSIZE, flags);
+	if (!msg)
+		return -ENOMEM;
+
+	hdr = genlmsg_iput(msg, &info);
+	if (!hdr) {
+		ret = -EMSGSIZE;
+		goto free_msg;
+	}
+
+	ret = msg_put_error_event_attrs(msg, node, error_id, error_name, value);
+	if (ret)
+		goto cancel_msg;
+
+	genlmsg_end(msg, hdr);
+	genlmsg_multicast(&drm_ras_nl_family, msg, 0, DRM_RAS_NLGRP_ERROR_REPORT, flags);
+	return 0;
+
+cancel_msg:
+	genlmsg_cancel(msg, hdr);
+free_msg:
+	nlmsg_free(msg);
+	return ret;
+}
+EXPORT_SYMBOL(drm_ras_nl_error_event);
+
 /**
  * drm_ras_nl_get_error_counter_dumpit() - Dump all Error Counters
  * @skb: Netlink message buffer
diff --git a/drivers/gpu/drm/drm_ras_nl.c b/drivers/gpu/drm/drm_ras_nl.c
index dea1c1b2494e..9d3123cc9f9c 100644
--- a/drivers/gpu/drm/drm_ras_nl.c
+++ b/drivers/gpu/drm/drm_ras_nl.c
@@ -58,6 +58,10 @@ static const struct genl_split_ops drm_ras_nl_ops[] = {
 	},
 };
 
+static const struct genl_multicast_group drm_ras_nl_mcgrps[] = {
+	[DRM_RAS_NLGRP_ERROR_REPORT] = { "error-report", },
+};
+
 struct genl_family drm_ras_nl_family __ro_after_init = {
 	.name		= DRM_RAS_FAMILY_NAME,
 	.version	= DRM_RAS_FAMILY_VERSION,
@@ -66,4 +70,6 @@ struct genl_family drm_ras_nl_family __ro_after_init = {
 	.module		= THIS_MODULE,
 	.split_ops	= drm_ras_nl_ops,
 	.n_split_ops	= ARRAY_SIZE(drm_ras_nl_ops),
+	.mcgrps		= drm_ras_nl_mcgrps,
+	.n_mcgrps	= ARRAY_SIZE(drm_ras_nl_mcgrps),
 };
diff --git a/drivers/gpu/drm/drm_ras_nl.h b/drivers/gpu/drm/drm_ras_nl.h
index a398643572a5..03ec275aca92 100644
--- a/drivers/gpu/drm/drm_ras_nl.h
+++ b/drivers/gpu/drm/drm_ras_nl.h
@@ -21,6 +21,10 @@ int drm_ras_nl_get_error_counter_dumpit(struct sk_buff *skb,
 int drm_ras_nl_clear_error_counter_doit(struct sk_buff *skb,
 					struct genl_info *info);
 
+enum {
+	DRM_RAS_NLGRP_ERROR_REPORT,
+};
+
 extern struct genl_family drm_ras_nl_family;
 
 #endif /* _LINUX_DRM_RAS_GEN_H */
diff --git a/include/drm/drm_ras.h b/include/drm/drm_ras.h
index 0beede3ddc4e..8abfb7d2077b 100644
--- a/include/drm/drm_ras.h
+++ b/include/drm/drm_ras.h
@@ -80,9 +80,14 @@ struct drm_device;
 #if IS_ENABLED(CONFIG_DRM_RAS)
 int drm_ras_node_register(struct drm_ras_node *node);
 void drm_ras_node_unregister(struct drm_ras_node *node);
+int drm_ras_nl_error_event(struct drm_ras_node *node, u32 error_id, const char *error_name,
+			   u32 value, gfp_t flags);
 #else
 static inline int drm_ras_node_register(struct drm_ras_node *node) { return 0; }
 static inline void drm_ras_node_unregister(struct drm_ras_node *node) { }
+static inline int drm_ras_nl_error_event(struct drm_ras_node *node, u32 error_id,
+					 const char *error_name, u32 value, gfp_t flags)
+{ return 0; }
 #endif
 
 #endif
diff --git a/include/uapi/drm/drm_ras.h b/include/uapi/drm/drm_ras.h
index 218a3ee86805..eab8231aa87c 100644
--- a/include/uapi/drm/drm_ras.h
+++ b/include/uapi/drm/drm_ras.h
@@ -38,13 +38,28 @@ enum {
 	DRM_RAS_A_ERROR_COUNTER_ATTRS_MAX = (__DRM_RAS_A_ERROR_COUNTER_ATTRS_MAX - 1)
 };
 
+enum {
+	DRM_RAS_A_ERROR_EVENT_ATTRS_DEVICE_NAME = 1,
+	DRM_RAS_A_ERROR_EVENT_ATTRS_NODE_ID,
+	DRM_RAS_A_ERROR_EVENT_ATTRS_NODE_NAME,
+	DRM_RAS_A_ERROR_EVENT_ATTRS_ERROR_ID,
+	DRM_RAS_A_ERROR_EVENT_ATTRS_ERROR_NAME,
+	DRM_RAS_A_ERROR_EVENT_ATTRS_ERROR_VALUE,
+
+	__DRM_RAS_A_ERROR_EVENT_ATTRS_MAX,
+	DRM_RAS_A_ERROR_EVENT_ATTRS_MAX = (__DRM_RAS_A_ERROR_EVENT_ATTRS_MAX - 1)
+};
+
 enum {
 	DRM_RAS_CMD_LIST_NODES = 1,
 	DRM_RAS_CMD_GET_ERROR_COUNTER,
 	DRM_RAS_CMD_CLEAR_ERROR_COUNTER,
+	DRM_RAS_CMD_ERROR_EVENT,
 
 	__DRM_RAS_CMD_MAX,
 	DRM_RAS_CMD_MAX = (__DRM_RAS_CMD_MAX - 1)
 };
 
+#define DRM_RAS_MCGRP_ERROR_REPORT	"error-report"
+
 #endif /* _UAPI_LINUX_DRM_RAS_H */
-- 
2.47.1



* [PATCH v4 2/3] drm/xe/xe_drm_ras: Add error-event support for PVC
From: Riana Tauro @ 2026-07-01  9:44 UTC (permalink / raw)
  To: intel-xe, dri-devel, netdev
  Cc: aravind.iddamsetty, anshuman.gupta, rodrigo.vivi, joonas.lahtinen,
	kuba, simona.vetter, airlied, pratik.bari, joshua.santosh.ranjan,
	ashwin.kumar.kulkarni, shubham.kumar, ravi.kishore.koppuravuri,
	raag.jadav, maarten.lankhorst, mallesh.koujalagi, soham.purkait,
	Riana Tauro
In-Reply-To: <20260701094409.129131-5-riana.tauro@intel.com>

Report a drm_ras error event to userspace when an error occurs.
Add support for core-compute and SoC errors in PVC.

$ sudo ynl --family drm_ras --output-json --subscribe error-report

{
    "name": "error-event",
    "msg": {
        "device-name": "0000:03:00.0",
        "node-id": 1,
        "node-name": "uncorrectable-errors",
        "error-id": 1,
        "error-name": "core-compute",
        "error-value": 1
    }
}

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
---
v2: use ynl (Raag)
    use value as function parameter
    move error event call to hw_error_source_handler

v3: add has_drm_ras check

v4: use drm_err_ratelimited
    initialize node post drm_ras check (Sashiko)
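
A minimal userspace sketch of consuming the error-event shown in the
commit message, assuming a consumer that has already isolated one JSON
document from the `ynl --output-json --subscribe error-report` output.
How ynl frames consecutive events on stdout is not covered here, and
summarize_error_event() is a hypothetical helper, not part of this series.

```python
import json
from typing import Optional

# Sample notification copied from the commit message.
SAMPLE_EVENT = """
{
    "name": "error-event",
    "msg": {
        "device-name": "0000:03:00.0",
        "node-id": 1,
        "node-name": "uncorrectable-errors",
        "error-id": 1,
        "error-name": "core-compute",
        "error-value": 1
    }
}
"""


def summarize_error_event(raw: str) -> Optional[str]:
    """Return a one-line summary of an error-event, or None for other messages."""
    event = json.loads(raw)
    if event.get("name") != "error-event":
        return None
    msg = event["msg"]
    return "{}: {}/{} = {}".format(
        msg["device-name"], msg["node-name"], msg["error-name"], msg["error-value"]
    )


print(summarize_error_event(SAMPLE_EVENT))
```

The attribute names used as keys match the DRM_RAS_A_ERROR_EVENT_ATTRS_*
enum added in patch 1/3 as they appear in the JSON sample.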
---
 drivers/gpu/drm/xe/xe_drm_ras.c  | 32 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_drm_ras.h  |  3 +++
 drivers/gpu/drm/xe/xe_hw_error.c |  5 ++++-
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_drm_ras.c b/drivers/gpu/drm/xe/xe_drm_ras.c
index 7937d8ba0ed9..8e247a8139b1 100644
--- a/drivers/gpu/drm/xe/xe_drm_ras.c
+++ b/drivers/gpu/drm/xe/xe_drm_ras.c
@@ -185,6 +185,38 @@ static int register_nodes(struct xe_device *xe)
 	return ret;
 }
 
+/**
+ * xe_drm_ras_event() - Report drm_ras error event to userspace
+ * @xe: xe device structure
+ * @component: error component (see &enum drm_xe_ras_error_component)
+ * @severity: error severity (see &enum drm_xe_ras_error_severity)
+ * @value: value of error counter
+ * @flags: flags for allocation
+ *
+ * Report an error-event to userspace.
+ */
+void xe_drm_ras_event(struct xe_device *xe, u32 component, u32 severity, u32 value, gfp_t flags)
+{
+	struct xe_drm_ras *ras = &xe->ras;
+	struct xe_drm_ras_counter *info = ras->info[severity];
+	struct drm_ras_node *node;
+	int ret;
+
+	/* Event is supported only if drm_ras is enabled */
+	if (!xe->info.has_drm_ras)
+		return;
+
+	node = &ras->node[severity];
+
+	if (!info || !info[component].name)
+		return;
+
+	ret = drm_ras_nl_error_event(node, component, info[component].name, value, flags);
+	if (ret)
+		drm_err_ratelimited(&xe->drm, "drm_ras error-event failed: %d for %s %s\n", ret,
+				    info[component].name, error_severity[severity]);
+}
+
 /**
  * xe_drm_ras_init() - Initialize DRM RAS
  * @xe: xe device instance
diff --git a/drivers/gpu/drm/xe/xe_drm_ras.h b/drivers/gpu/drm/xe/xe_drm_ras.h
index 365c70e93e82..2a694bf69478 100644
--- a/drivers/gpu/drm/xe/xe_drm_ras.h
+++ b/drivers/gpu/drm/xe/xe_drm_ras.h
@@ -5,11 +5,14 @@
 #ifndef _XE_DRM_RAS_H_
 #define _XE_DRM_RAS_H_
 
+#include <linux/types.h>
+
 struct xe_device;
 
 #define for_each_error_severity(i)	\
 	for (i = 0; i < DRM_XE_RAS_ERR_SEV_MAX; i++)
 
 int xe_drm_ras_init(struct xe_device *xe);
+void xe_drm_ras_event(struct xe_device *xe, u32 component, u32 severity, u32 value, gfp_t flags);
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c
index 4a4b363fc844..a833cecc74ec 100644
--- a/drivers/gpu/drm/xe/xe_hw_error.c
+++ b/drivers/gpu/drm/xe/xe_hw_error.c
@@ -432,7 +432,7 @@ static void hw_error_source_handler(struct xe_tile *tile, const enum hardware_er
 	struct xe_drm_ras *ras = &xe->ras;
 	struct xe_drm_ras_counter *info = ras->info[severity];
 	unsigned long flags, err_src;
-	u32 err_bit;
+	u32 err_bit, value;
 
 	if (!IS_DGFX(xe))
 		return;
@@ -495,6 +495,9 @@ static void hw_error_source_handler(struct xe_tile *tile, const enum hardware_er
 			gt_hw_error_handler(tile, hw_err, error_id);
 		if (err_bit == XE_SOC_ERROR)
 			soc_hw_error_handler(tile, hw_err, error_id);
+
+		value = atomic_read(&info[error_id].counter);
+		xe_drm_ras_event(xe, error_id, severity, value, GFP_ATOMIC);
 	}
 
 clear_reg:
-- 
2.47.1



* [PATCH v4 3/3] drm/xe/xe_ras: Add error-event support for CRI
From: Riana Tauro @ 2026-07-01  9:44 UTC (permalink / raw)
  To: intel-xe, dri-devel, netdev
  Cc: aravind.iddamsetty, anshuman.gupta, rodrigo.vivi, joonas.lahtinen,
	kuba, simona.vetter, airlied, pratik.bari, joshua.santosh.ranjan,
	ashwin.kumar.kulkarni, shubham.kumar, ravi.kishore.koppuravuri,
	raag.jadav, maarten.lankhorst, mallesh.koujalagi, soham.purkait,
	Riana Tauro, Michal Wajdeczko
In-Reply-To: <20260701094409.129131-5-riana.tauro@intel.com>

Add error-event support for correctable errors in CRI. On receiving an
interrupt, report an error event to userspace for every component that
has crossed its threshold.

Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
v2: add warns for unexpected values from system controller (Michal)
    send an event at most once per component for each interrupt (Raag)
    use correct parameters for get_counter (Sashiko)
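
For reviewers, a small userspace model of the "at most once per component
for each interrupt" filtering may help. This is only a sketch in Python,
not the kernel code: SEV_CORRECTABLE and COMP_MAX are hypothetical
stand-ins for XE_RAS_SEV_CORRECTABLE and XE_RAS_COMP_MAX, whose real
values are not shown in this patch, and components_to_report() is an
illustrative name.

```python
from typing import Iterable, List, Tuple

# Hypothetical numeric values; the real enum values are not shown in this patch.
SEV_CORRECTABLE = 0
COMP_MAX = 8  # must not exceed the bit width of the mask (8 bits for a u8)


def components_to_report(counters: Iterable[Tuple[int, int]]) -> List[int]:
    """Return the components that would get an event, in first-seen order.

    Each entry in counters is a (severity, component) pair, one per counter
    reported for a single interrupt.
    """
    sent = 0  # bitmask of components already reported for this interrupt
    report = []
    for severity, component in counters:
        if severity != SEV_CORRECTABLE:
            continue  # only correctable errors are expected here
        if component >= COMP_MAX:
            continue  # out-of-range component, would not fit in the mask
        bit = 1 << component
        if sent & bit:
            continue  # this component was already reported
        sent |= bit
        report.append(component)
    return report


print(components_to_report([(0, 1), (0, 1), (1, 2), (0, 9), (0, 3)]))
```

The range check comes before the bit test for the same reason the patch
has the BUILD_BUG_ON: the mask can only track as many components as it
has bits.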
---
 drivers/gpu/drm/xe/xe_ras.c | 75 +++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
index 44f4e1a3455b..b71d51285954 100644
--- a/drivers/gpu/drm/xe/xe_ras.c
+++ b/drivers/gpu/drm/xe/xe_ras.c
@@ -77,6 +77,18 @@ static u8 drm_to_xe_ras_severity(u8 severity)
 	}
 }
 
+static u8 xe_to_drm_ras_severity(u8 severity)
+{
+	switch (severity) {
+	case XE_RAS_SEV_CORRECTABLE:
+		return DRM_XE_RAS_ERR_SEV_CORRECTABLE;
+	case XE_RAS_SEV_UNCORRECTABLE:
+		return DRM_XE_RAS_ERR_SEV_UNCORRECTABLE;
+	default:
+		return DRM_XE_RAS_ERR_SEV_MAX;
+	}
+}
+
 static u8 drm_to_xe_ras_component(u8 component)
 {
 	switch (component) {
@@ -95,6 +107,24 @@ static u8 drm_to_xe_ras_component(u8 component)
 	}
 }
 
+static u8 xe_to_drm_ras_component(u8 component)
+{
+	switch (component) {
+	case XE_RAS_COMP_DEVICE_MEMORY:
+		return DRM_XE_RAS_ERR_COMP_DEVICE_MEMORY;
+	case XE_RAS_COMP_CORE_COMPUTE:
+		return DRM_XE_RAS_ERR_COMP_CORE_COMPUTE;
+	case XE_RAS_COMP_PCIE:
+		return DRM_XE_RAS_ERR_COMP_PCIE;
+	case XE_RAS_COMP_FABRIC:
+		return DRM_XE_RAS_ERR_COMP_FABRIC;
+	case XE_RAS_COMP_SOC_INTERNAL:
+		return DRM_XE_RAS_ERR_COMP_SOC_INTERNAL;
+	default:
+		return DRM_XE_RAS_ERR_COMP_MAX;
+	}
+}
+
 static int ras_status_to_errno(u32 status)
 {
 	switch (status) {
@@ -131,14 +161,41 @@ static inline const char *comp_to_str(u8 component)
 	return xe_ras_components[component];
 }
 
+static void ras_send_error_event(struct xe_device *xe, u8 severity, u8 component)
+{
+	u8 drm_severity, drm_component;
+	u32 value;
+	int ret;
+
+	drm_severity = xe_to_drm_ras_severity(severity);
+	if (drm_severity == DRM_XE_RAS_ERR_SEV_MAX) {
+		xe_warn(xe, "sysctrl: unexpected severity %u\n", severity);
+		return;
+	}
+
+	drm_component = xe_to_drm_ras_component(component);
+	if (drm_component == DRM_XE_RAS_ERR_COMP_MAX) {
+		xe_warn(xe, "sysctrl: unexpected component %u\n", component);
+		return;
+	}
+
+	ret = xe_ras_get_counter(xe, drm_severity, drm_component, &value);
+	if (ret)
+		return;
+
+	xe_drm_ras_event(xe, drm_component, drm_severity, value, GFP_KERNEL);
+}
+
 void xe_ras_counter_threshold_crossed(struct xe_device *xe,
 				      struct xe_sysctrl_event_response *response)
 {
 	struct xe_ras_threshold_crossed *pending = (void *)&response->data;
 	struct xe_ras_error_class *errors = pending->counters;
 	u32 id, ncounters = pending->ncounters;
+	u8 sent = 0;
 
 	BUILD_BUG_ON(sizeof(response->data) < sizeof(*pending));
+	BUILD_BUG_ON(XE_RAS_COMP_MAX > (BITS_PER_BYTE * sizeof(sent)));
 	xe_device_assert_mem_access(xe);
 
 	if (!ncounters || ncounters > XE_RAS_NUM_COUNTERS)
@@ -154,6 +211,24 @@ void xe_ras_counter_threshold_crossed(struct xe_device *xe,
 
 		xe_warn(xe, "[RAS]: %s %s detected\n",
 			comp_to_str(component), sev_to_str(severity));
+
+		if (severity != XE_RAS_SEV_CORRECTABLE) {
+			xe_warn(xe, "sysctrl: unexpected severity %s (%u)\n", sev_to_str(severity),
+				severity);
+			continue;
+		}
+
+		if (component >= XE_RAS_COMP_MAX) {
+			xe_warn(xe, "sysctrl: unexpected component %u\n", component);
+			continue;
+		}
+
+		/* Send event once per component */
+		if (sent & BIT(component))
+			continue;
+		sent |= BIT(component);
+
+		ras_send_error_event(xe, severity, component);
 	}
 }
 
-- 
2.47.1



* Re: [PATCH net-next v2 2/2] tools: ynl: pyynl: pull the --family resolution logic into the lib
From: Donald Hunter @ 2026-07-01  9:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sdf, gal,
	jstancek, ast
In-Reply-To: <20260701021751.3234681-3-kuba@kernel.org>

Jakub Kicinski <kuba@kernel.org> writes:

> When packaging YNL as a system level utility we added a --family
> argument which auto-resolves the full spec path from a well known
> path in /usr/share. Spelling out full YAML spec files is at this
> point only done in-tree, for example in the selftests which need
> the very latest YAML. But the selftests have their own wrapping
> classes for each family so test authors aren't really bothered
> by having to spell the paths out.
>
> Afford the same ease of use to the Python library users.
> Move the path resolution from the CLI code to the library.
> This simplifies the pyynl use by a lot:
>
>   from pyynl import YnlFamily
>
>   ynl = YnlFamily(family="netdev")
>
> Unless I'm missing a trick, resolving the /usr/share path
> is hard enough for most users to lean towards shelling out
> to ynl CLI with --output-json, which is sad.
>
> The ethtool script can now use family= instead of
> resolving the path (the helpers are removed from cli.py
> so this isn't just a cleanup).
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
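
For readers following along, the family-to-spec resolution described in
the quoted commit message can be sketched roughly as below. This is an
illustration only, not the pyynl implementation: resolve_spec() is a
hypothetical name, and the spec directory is passed in as a parameter
because the exact location under /usr/share is not stated in this thread.

```python
import os
import tempfile


def resolve_spec(family: str, spec_dir: str) -> str:
    """Map a family name such as "netdev" to a YAML spec path in spec_dir."""
    if not family or os.sep in family:
        raise ValueError("family must be a bare name, got %r" % (family,))
    path = os.path.join(spec_dir, family + ".yaml")
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    return path


# Demo against a throwaway directory instead of a real install.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "netdev.yaml"), "w").close()
    print(os.path.basename(resolve_spec("netdev", demo_dir)))
```

Rejecting names that contain a path separator keeps a family argument
from escaping the spec directory; whether the library does the same is
an assumption here.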

