Netdev List


* [PATCH net 00/16] Netfilter fixes for net
From: Pablo Neira Ayuso @ 2026-06-19 11:54 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

Hi,

The following patchset contains Netfilter fixes for net. It includes
fixes for a few crashes, but many of the patches are trivial/correctness
fixes. There is also one rework of the conntrack expectation timeout
strategy to deal with a possible race when removing an expectation.

1) Fix the incorrect flowtable timeout extension for entries in
   hw offload, from Adrian Bente. This is correcting a defect in
   the functionality, no crash.

2) Hold a reference to the device under the fake dst in br_netfilter,
   from Haoze Xie. This is fixing a possible UaF if the device
   is removed while a packet is sitting in nfqueue.

3) Reject template conntrack in xt_cluster, otherwise access to
   uninitialized conntrack fields is possible, leading to WARN_ON
   due to unset layer 3 protocol. From Wyatt Feng.

4) Make sure the IPv6 tunnel header is in the linear skb data
   area before pulling. While at it, remove incomplete NEXTHDR_DEST
   support. From Lorenzo Bianconi. This could possibly lead to a
   crash if the IPv4 header is not linear; GRO already guarantees
   this, so it is unlikely but still possible.

5) Bail out immediately if ENOMEM is seen in a nfnetlink batch,
   with no further processing, since continuing would accumulate more
   bogus errors. From Florian Westphal. Functional improvement
   under memory stress, no crash.

6) Use test_bit_acquire in ipset hash set to avoid reordering
   of subsequent memory access. This is addressing an LLM-related
   report, no crash has been observed. From Jozsef Kadlecsik.

7) Use test_bit_acquire in ipset bitmap set too, for the same
   reason as in the previous patch, from Jozsef Kadlecsik.

8) Call kfree_rcu() after rcu_assign_pointer() to address a
   possible UaF, very hard to trigger. Never observed in practice,
   reported by LLM. Also from Jozsef Kadlecsik.

9) Use disable_delayed_work_sync() instead of cancel_delayed_work_sync()
   so that the ipset GC handler cannot re-queue work, as reported by LLM.
   From Jozsef Kadlecsik. This is for correctness.

10) Restore the check in nft_payload for payload offsets exceeding
    2^16. From Florian Westphal. This fixes a silent truncation,
    not a big deal, but better be assertive and reject it.

11) Validate that NFT_META_BRI_IIFHWADDR can only run from bridge
    prerouting. From Florian Westphal. Harmless, but it could allow
    reading bytes from skb->cb.

12) Zero out destination hardware address during the flowtable
    path setup, also from Florian. This is a correctness fix; LLM
    points out that a possible infoleak can happen, but the topology
    to achieve it is not clear.

13) Skip IPv4 options if present when building the IPv4 reject reply.
    Otherwise bytes in the IPv4 options header can be sent back to
    the origin where the ICMP header is expected. Again from
    Florian Westphal.

14) Replace timer API for expectation by GC worker approach. This
    is implicitly fixing a race in nf_ct_remove_expectations(),
    which might fail to remove the expectation because timer_del()
    returns false when the timer has expired and the callback is
    being run concurrently. This fix is addressing a crash that has
    already been reported with a reproducer.

15) Store the master tuple in the expectation, since SLAB_TYPESAFE_BY_RCU
    does not guarantee that accessing exp->master under rcu read lock
    refers to the right master conntrack. Also found by LLM during the
    initial round of fixes for expectations.

16) Check if br_vlan_get_pvid_rcu() fails to address a possible stack
    infoleak of 4-bytes. From Florian Westphal.
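
Item 13 relies on the IPv4 header being variable length: the IHL field
counts 32-bit words, so the ICMP header sits at ihl * 4 bytes rather than
at a fixed 20. A minimal userspace sketch of that offset computation
follows. The helper name is hypothetical and this is not the nf_reject
code, only an illustration of why options must be skipped.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the offset of the transport (e.g. ICMP) header inside an IPv4
 * packet, or -1 if the buffer is too short or the IHL field is invalid.
 * Hypothetical helper for illustration, not the kernel implementation.
 */
static int ipv4_transport_offset(const uint8_t *pkt, size_t len)
{
	size_t hdr_len;

	if (len < 20)		/* shorter than the minimal IPv4 header */
		return -1;

	hdr_len = (size_t)(pkt[0] & 0x0f) * 4;	/* IHL is in 32-bit words */
	if (hdr_len < 20 || hdr_len > len)
		return -1;

	return (int)hdr_len;
}
```

With a first byte of 0x45 (no options) the helper returns 20; with 0x46
(one 32-bit option word) it returns 24, which is where the ICMP header
would actually start.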

This is slightly over the 15-patch limit for batches; please allow this
round to exceed it by one.

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-26-06-19

Thanks.

----------------------------------------------------------------

The following changes since commit 96e7f9122aae0ed000ee321f324b812a447906d9:

  eth: fbnic: take netif_addr_lock_bh() around rx mode address programming (2026-06-18 18:36:26 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-26-06-19

for you to fetch changes up to 05477f7a037c127854b58441f60b34210668f5c3:

  netfilter: nft_meta_bridge: fix NFT_META_BRI_IIFPVID stack leak (2026-06-19 12:27:08 +0200)

----------------------------------------------------------------
netfilter pull request 26-06-19

----------------------------------------------------------------
Adrian Bente (1):
      netfilter: flowtable: fix offloaded ct timeout never being extended

Florian Westphal (6):
      netfilter: nfnetlink: make OOM conditions fatal
      netfilter: nft_payload: reject offsets exceeding 65535 bytes
      netfilter: nft_meta_bridge: add validate callback for get operations
      netfilter: nft_flow_offload: zero device address for non-ether case
      netfilter: nf_reject: skip iphdr options when looking for icmp header
      netfilter: nft_meta_bridge: fix NFT_META_BRI_IIFPVID stack leak

Haoze Xie (1):
      netfilter: nf_queue: pin bridge device while NFQUEUE holds fake dst

Jozsef Kadlecsik (4):
      netfilter: ipset: Don't use test_bit() in lockless RCU readers in hash types
      netfilter: ipset: Don't use test_bit() in lockless RCU readers in bitmap types
      netfilter: ipset: fix order of kfree_rcu() and rcu_assign_pointer()
      netfilter: ipset: make sure gc is properly stopped

Lorenzo Bianconi (1):
      netfilter: flowtable: fix and simplify IP6IP6 tunnel handling

Pablo Neira Ayuso (2):
      netfilter: nf_conntrack_expect: use conntrack GC to reap expectations
      netfilter: nf_conntrack_expect: store master_tuple in expectation

Wyatt Feng (1):
      netfilter: xt_cluster: reject template conntracks in hash match

 include/net/netfilter/nf_conntrack_expect.h        |  17 ++-
 include/net/netfilter/nf_queue.h                   |   1 +
 include/net/netfilter/nft_meta.h                   |   2 +
 include/uapi/linux/netfilter/nf_conntrack_common.h |   1 +
 net/bridge/netfilter/nft_meta_bridge.c             |  23 +++-
 net/ipv4/netfilter/nf_reject_ipv4.c                |   2 +-
 net/ipv6/ip6_tunnel.c                              |   7 +
 net/netfilter/ipset/ip_set_bitmap_gen.h            |   4 +-
 net/netfilter/ipset/ip_set_bitmap_ip.c             |   2 +-
 net/netfilter/ipset/ip_set_bitmap_ipmac.c          |   2 +-
 net/netfilter/ipset/ip_set_bitmap_port.c           |   2 +-
 net/netfilter/ipset/ip_set_core.c                  |   4 +-
 net/netfilter/ipset/ip_set_hash_gen.h              |  12 +-
 net/netfilter/nf_conntrack_broadcast.c             |   1 +
 net/netfilter/nf_conntrack_core.c                  |  33 ++++-
 net/netfilter/nf_conntrack_expect.c                | 147 +++++++++++----------
 net/netfilter/nf_conntrack_h323_main.c             |   4 +-
 net/netfilter/nf_conntrack_helper.c                |  10 +-
 net/netfilter/nf_conntrack_netlink.c               |  31 ++---
 net/netfilter/nf_conntrack_sip.c                   |  13 +-
 net/netfilter/nf_flow_table_core.c                 |  13 +-
 net/netfilter/nf_flow_table_ip.c                   |  80 +++--------
 net/netfilter/nf_flow_table_path.c                 |   4 +-
 net/netfilter/nf_queue.c                           |  14 ++
 net/netfilter/nfnetlink.c                          |   7 +
 net/netfilter/nfnetlink_queue.c                    |   3 +
 net/netfilter/nft_ct.c                             |   3 +-
 net/netfilter/nft_meta.c                           |   5 +-
 net/netfilter/nft_payload.c                        |  16 ++-
 net/netfilter/xt_cluster.c                         |   2 +-
 .../selftests/net/netfilter/nft_flowtable.sh       |   8 +-
 31 files changed, 268 insertions(+), 205 deletions(-)

* Re: [PATCH net v2] net: airoha: Fix TX scheduler queue mask loop upper bound
From: Lorenzo Bianconi @ 2026-06-19 11:39 UTC (permalink / raw)
  To: Wayen Yan
  Cc: netdev, horms, pabeni, kuba, edumazet, andrew+netdev,
	angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
	linux-mediatek
In-Reply-To: <178185574223.2378148.13454900445528174929@gmail.com>

> In airoha_qdma_set_chan_tx_sched(), the loop clearing queue mask was
> using AIROHA_NUM_TX_RING (32) instead of AIROHA_NUM_QOS_QUEUES (8).
> 
> Each channel has 8 queues, and TXQ_DISABLE_CHAN_QUEUE_MASK(channel, i)
> computes BIT(i + (channel * 8)). With i ranging 0..31, this causes:
> - channel 0: clears bit 0..31 (all 4 channels) instead of 0..7
> - channel 1: clears bit 8..31 (channels 1-3) instead of 8..15
> - channel 2: clears bit 16..31 (channels 2-3) instead of 16..23
> - channel 3: clears bit 24..31 (channel 3 only) - correct by accident
> 
> While BIT(32+) on arm64 produces 64-bit values truncated to 0 in u32
> mask parameter, the loop still incorrectly clears queues within the
> same channel beyond queue 7.
> 
> Even though this is functionally harmless (the register resets to 0
> and is only ever cleared, never set — so clearing extra bits is a
> no-op), the loop bound is semantically wrong and should be fixed for
> correctness and clarity.
> 
> Fix by using AIROHA_NUM_QOS_QUEUES (8) as the loop upper bound.
> 
> Fixes: ef1ca9271313 ("net: airoha: Add sched HTB offload support")
> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
> Signed-off-by: Wayen Yan <win847@gmail.com>
> ---
> Changes in v2:
> - Add Lorenzo's Acked-by tag.
> - Clarify in commit message that this is semantically wrong but
>   functionally harmless (register resets to 0, only cleared), as
>   Lorenzo pointed out in review.
> - Rebase on current net tree.
> 
> Link: https://lore.kernel.org/netdev/ajJIWMs4dVbfkHZ5@lore-desk/
> Link: https://lore.kernel.org/netdev/CAL_ptrs6J3Ryw_4mVTq5VgzkB4RreF5S0huHyLvd9YwWr1m6jAA@mail.gmail.com/
> 
>  drivers/net/ethernet/airoha/airoha_eth.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index d0c0c0ec8a..ca77747b44 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -2212,7 +2212,7 @@ static int airoha_qdma_set_chan_tx_sched(struct net_device *dev,
>  	struct airoha_gdm_port *port = netdev_priv(dev);

it seems you have not rebased on top of net tree.

Regards,
Lorenzo

>  	int i;
>  
> -	for (i = 0; i < AIROHA_NUM_TX_RING; i++)
> +	for (i = 0; i < AIROHA_NUM_QOS_QUEUES; i++)
>  		airoha_qdma_clear(port->qdma, REG_QUEUE_CLOSE_CFG(channel),
>  				  TXQ_DISABLE_CHAN_QUEUE_MASK(channel, i));
>  
> -- 
> 2.51.0
> 
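
The bit ranges listed in the quoted commit message can be reproduced with
a short userspace sketch. It assumes, as the message states, that
TXQ_DISABLE_CHAN_QUEUE_MASK(channel, i) is BIT(i + channel * 8) and that
the result is truncated to a u32; the constant and function names below
are local to the sketch.

```c
#include <stdint.h>

#define NUM_TX_RING	32	/* AIROHA_NUM_TX_RING per the commit message */
#define NUM_QOS_QUEUES	8	/* AIROHA_NUM_QOS_QUEUES per the commit message */

/*
 * Union of the bits cleared for one channel when the loop runs up to
 * 'bound'. The shift is done in 64 bits and truncated to 32 bits, as the
 * commit message describes for arm64.
 */
static uint32_t cleared_bits(unsigned int channel, unsigned int bound)
{
	uint32_t mask = 0;
	unsigned int i;

	for (i = 0; i < bound; i++)
		mask |= (uint32_t)(1ULL << (i + channel * 8));

	return mask;
}
```

Calling cleared_bits(1, NUM_TX_RING) covers bits 8..31, while
cleared_bits(1, NUM_QOS_QUEUES) covers only bits 8..15, matching the
table in the commit message.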

* [PATCH net v2 2/2] net: airoha: fix netif_set_real_num_tx_queues for sparse QoS channels
From: Lorenzo Bianconi @ 2026-06-19 11:37 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Lorenzo Bianconi
  Cc: Simon Horman, Wayen Yan, linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260619-airoha-qos-fixes-v2-0-5c43485038f9@kernel.org>

airoha_tc_htb_alloc_leaf_queue() assigns queue IDs based on the channel
index (opt->qid = AIROHA_NUM_TX_RING + channel), but updates
real_num_tx_queues with a simple increment (num_tx_queues + 1). When QoS
channels are allocated sparsely (e.g., channels 0 and 3 without 1 and
2), the returned qid can exceed real_num_tx_queues, causing out-of-bounds
accesses in the networking stack.
For example, allocating channel 0 then channel 3 results in
real_num_tx_queues = 34 but qid = 35, which is out of range [0, 34).
Fix this by computing real_num_tx_queues based on the highest active
channel index rather than using a simple counter, in both the allocation
and deletion paths.
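
The sparse-channel arithmetic above can be checked with a small userspace
sketch. It assumes AIROHA_NUM_TX_RING is 32 (as the example numbers imply)
and models the active-channel bitmap as a plain 32-bit word; the helper
names are hypothetical and not taken from the driver.

```c
#include <stdint.h>

#define NUM_TX_RING	32	/* AIROHA_NUM_TX_RING, assumed from the example */

/* qid handed back for a QoS channel, as described in the commit message. */
static unsigned int qid_for_channel(unsigned int channel)
{
	return NUM_TX_RING + channel;
}

/*
 * Queue count needed to cover every active channel in 'bmap' (bit n set
 * means channel n is active). Mirrors the "highest active channel" idea;
 * the bitmap type and helper name are assumptions of this sketch.
 */
static unsigned int real_num_tx_queues(uint32_t bmap)
{
	unsigned int highest = 0, n;

	if (!bmap)
		return NUM_TX_RING;

	for (n = 0; n < 32; n++)
		if (bmap & (UINT32_C(1) << n))
			highest = n;

	return NUM_TX_RING + highest + 1;
}
```

With channels 0 and 3 active (bitmap 0x9) the sketch yields 36 queues, so
qid 35 is in range, whereas a plain counter would have stopped at 34.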

Fixes: ef1ca9271313b ("net: airoha: Add sched HTB offload support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/airoha/airoha_eth.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index aa98d1823ab6..aa2ddfd3af9f 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -2789,7 +2789,7 @@ static int airoha_tc_htb_alloc_leaf_queue(struct net_device *netdev,
 					  struct tc_htb_qopt_offload *opt)
 {
 	u32 channel = TC_H_MIN(opt->classid) % AIROHA_NUM_QOS_CHANNELS;
-	int err, num_tx_queues = netdev->real_num_tx_queues;
+	int err, num_tx_queues = AIROHA_NUM_TX_RING + channel + 1;
 	struct airoha_gdm_dev *dev = netdev_priv(netdev);
 	struct airoha_qdma *qdma = dev->qdma;
 
@@ -2806,13 +2806,15 @@ static int airoha_tc_htb_alloc_leaf_queue(struct net_device *netdev,
 	if (err)
 		goto error;
 
-	err = netif_set_real_num_tx_queues(netdev, num_tx_queues + 1);
-	if (err) {
-		airoha_qdma_set_tx_rate_limit(netdev, channel, 0,
-					      opt->quantum);
-		NL_SET_ERR_MSG_MOD(opt->extack,
-				   "failed setting real_num_tx_queues");
-		goto error;
+	if (num_tx_queues > netdev->real_num_tx_queues) {
+		err = netif_set_real_num_tx_queues(netdev, num_tx_queues);
+		if (err) {
+			airoha_qdma_set_tx_rate_limit(netdev, channel, 0,
+						      opt->quantum);
+			NL_SET_ERR_MSG_MOD(opt->extack,
+					   "failed setting real_num_tx_queues");
+			goto error;
+		}
 	}
 
 	set_bit(channel, dev->qos_sq_bmap);
@@ -3003,13 +3005,18 @@ static int airoha_dev_setup_tc_block(struct net_device *dev,
 static void airoha_tc_remove_htb_queue(struct net_device *netdev, int queue)
 {
 	struct airoha_gdm_dev *dev = netdev_priv(netdev);
+	int num_tx_queues = AIROHA_NUM_TX_RING;
 	struct airoha_qdma *qdma = dev->qdma;
 
-	netif_set_real_num_tx_queues(netdev, netdev->real_num_tx_queues - 1);
 	airoha_qdma_set_tx_rate_limit(netdev, queue, 0, 0);
 
 	clear_bit(queue, qdma->qos_channel_map);
 	clear_bit(queue, dev->qos_sq_bmap);
+
+	if (!bitmap_empty(dev->qos_sq_bmap, AIROHA_NUM_QOS_CHANNELS))
+		num_tx_queues += find_last_bit(dev->qos_sq_bmap,
+					       AIROHA_NUM_QOS_CHANNELS) + 1;
+	netif_set_real_num_tx_queues(netdev, num_tx_queues);
 }
 
 static int airoha_tc_htb_delete_leaf_queue(struct net_device *netdev,

-- 
2.54.0


* [PATCH net v2 1/2] net: airoha: Fix off-by-one in airoha_tc_remove_htb_queue()
From: Lorenzo Bianconi @ 2026-06-19 11:37 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Lorenzo Bianconi
  Cc: Simon Horman, Wayen Yan, linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260619-airoha-qos-fixes-v2-0-5c43485038f9@kernel.org>

airoha_tc_htb_alloc_leaf_queue() computes the HTB QoS channel index
as opt->classid % AIROHA_NUM_QOS_CHANNELS and stores it in qos_sq_bmap.
However, airoha_tc_remove_htb_queue() clears the HTB configuration
using queue + 1 as the channel index, causing an off-by-one error.
Use queue directly as the QoS channel index to match the allocation
logic.

Fixes: ef1ca9271313b ("net: airoha: Add sched HTB offload support")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/airoha/airoha_eth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 64dde6464f3f..aa98d1823ab6 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -3006,7 +3006,7 @@ static void airoha_tc_remove_htb_queue(struct net_device *netdev, int queue)
 	struct airoha_qdma *qdma = dev->qdma;
 
 	netif_set_real_num_tx_queues(netdev, netdev->real_num_tx_queues - 1);
-	airoha_qdma_set_tx_rate_limit(netdev, queue + 1, 0, 0);
+	airoha_qdma_set_tx_rate_limit(netdev, queue, 0, 0);
 
 	clear_bit(queue, qdma->qos_channel_map);
 	clear_bit(queue, dev->qos_sq_bmap);

-- 
2.54.0


* [PATCH net v2 0/2] airoha: fixes for sched HTB offload support
From: Lorenzo Bianconi @ 2026-06-19 11:37 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Lorenzo Bianconi
  Cc: Simon Horman, Wayen Yan, linux-arm-kernel, linux-mediatek, netdev


---
Changes in v2:
- cosmetics
- Link to v1: https://lore.kernel.org/r/20260618-airoha-qos-fixes-v1-0-37192652157f@kernel.org

---
Lorenzo Bianconi (2):
      net: airoha: Fix off-by-one in airoha_tc_remove_htb_queue()
      net: airoha: fix netif_set_real_num_tx_queues for sparse QoS channels

 drivers/net/ethernet/airoha/airoha_eth.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)
---
base-commit: 96e7f9122aae0ed000ee321f324b812a447906d9
change-id: 20260618-airoha-qos-fixes-b6460b085680

Best regards,
-- 
Lorenzo Bianconi <lorenzo@kernel.org>


* Re: [PATCH net 2/2] net: airoha: fix netif_set_real_num_tx_queues for sparse QoS channels
From: Lorenzo Bianconi @ 2026-06-19 11:34 UTC (permalink / raw)
  To: Simon Horman
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Wayen Yan, linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260619093529.GV827683@horms.kernel.org>

> On Thu, Jun 18, 2026 at 08:00:30AM +0200, Lorenzo Bianconi wrote:
> > airoha_tc_htb_alloc_leaf_queue() assigns queue IDs based on the channel
> > index (opt->qid = AIROHA_NUM_TX_RING + channel), but updates
> > real_num_tx_queues with a simple increment (num_tx_queues + 1). When QoS
> > channels are allocated sparsely (e.g., channels 0 and 3 without 1 and
> > 2), the returned qid can exceed real_num_tx_queues, causing out-of-bounds
> > accesses in the networking stack.
> > For example, allocating channel 0 then channel 3 results in
> > real_num_tx_queues = 34 but qid = 35, which is out of range [0, 34).
> > Fix this by computing real_num_tx_queues based on the highest active
> > channel index rather than using a simple counter, in both the allocation
> > and deletion paths.
> > 
> > Fixes: ef1ca9271313b ("net: airoha: Add sched HTB offload support")
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > ---
> >  drivers/net/ethernet/airoha/airoha_eth.c | 15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> 
> ...
> 
> > @@ -2806,7 +2806,10 @@ static int airoha_tc_htb_alloc_leaf_queue(struct net_device *netdev,
> >  	if (err)
> >  		goto error;
> >  
> > -	err = netif_set_real_num_tx_queues(netdev, num_tx_queues + 1);
> > +	if (num_tx_queues <= netdev->real_num_tx_queues)
> > +		goto set_qos_sq_bmap;
> > +
> > +	err = netif_set_real_num_tx_queues(netdev, num_tx_queues);
> >  	if (err) {
> >  		airoha_qdma_set_tx_rate_limit(netdev, channel, 0,
> >  					      opt->quantum);
> > @@ -2815,6 +2818,7 @@ static int airoha_tc_htb_alloc_leaf_queue(struct net_device *netdev,
> >  		goto error;
> >  	}
> >  
> > +set_qos_sq_bmap:
> 
> I would prefer if this could be achieved without a goto.

ack, I will fix it in v2.

Regards,
Lorenzo

> 
> >  	set_bit(channel, dev->qos_sq_bmap);
> >  	opt->qid = AIROHA_NUM_TX_RING + channel;
> >  
> 
> ...

* airoha_eth: PPE flow entries and EGRESS TRTCM shaping
From: Wayen Yan @ 2026-06-19 11:32 UTC (permalink / raw)
  To: Lorenzo Bianconi; +Cc: netdev, linux-kernel

Hi Lorenzo,

While reviewing the airoha HTB offload code and your recent fix series,
I noticed that EGRESS TRTCM rate-limit buckets configured by
airoha_qdma_set_tx_rate_limit() do not appear to be referenced by
either the CPU xmit path or PPE-accelerated flows.

Specifically:

1) CPU xmit path always disables the meter:

   In airoha_dev_xmit(), TXMSG.METER is hardcoded to 0x7f:

     msg1 = FIELD_PREP(QDMA_ETH_TXMSG_METER_MASK, 0x7f);

   The register comment notes "0x7f no meters", so even though
   airoha_tc_htb_alloc_leaf_queue() configures TRTCM bucket[channel]
   with the requested rate, CPU-path packets never hit those buckets.

2) PPE flow entries never bind to a TRTCM bucket:

   In airoha_ppe_foe_entry_prepare(), the FOE 'data' field is
   initialized with:

     qdata = FIELD_PREP(AIROHA_FOE_SHAPER_ID, 0x7f);

   And neither CHANNEL[15:11] nor QID[10:8] are ever set — they
   remain zero in all PPE entry creation paths (tc offload,
   HW autolearn, L2 subflow commit). This means PPE-accelerated
   flows bypass TRTCM shaping entirely.

   The only QoS-related bit set is IB2.PSE_QOS for wired LAN ports,
   which is a flag rather than an index.

3) airoha_ppe_foe_flow_stats_update() does read CHANNEL|QID from
   the data field and moves them to ACTDP, but since they were
   never populated, this is effectively a no-op in practice.
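
To make the field layout in points 2 and 3 concrete, here is a minimal
userspace sketch of packing and extracting CHANNEL[15:11] and QID[10:8].
The bit ranges come from the text above; the macro and helper names are
hypothetical stand-ins for the kernel's FIELD_PREP()/FIELD_GET(), and the
SHAPER_ID field is left out because its mask is not given here.

```c
#include <stdint.h>

/* Bit ranges quoted in the message: CHANNEL[15:11] and QID[10:8]. */
#define EX_CHANNEL_MASK	0xf800u
#define EX_QID_MASK	0x0700u

/* Shift 'val' into the field selected by 'mask' (mask must be non-zero). */
static uint32_t ex_field_prep(uint32_t mask, uint32_t val)
{
	uint32_t lsb = mask & (~mask + 1);	/* lowest set bit of the mask */

	return (val * lsb) & mask;
}

/* Extract the field selected by 'mask' (mask must be non-zero). */
static uint32_t ex_field_get(uint32_t mask, uint32_t reg)
{
	uint32_t lsb = mask & (~mask + 1);

	return (reg & mask) / lsb;
}
```

Extracting either field from a word that was never populated returns 0,
which is why the copy into ACTDP described in point 3 has no visible
effect.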

I have a few questions:

- Does AIROHA_FOE_SHAPER_ID map to the same QDMA TRTCM meter index
  space used by airoha_qdma_set_tx_rate_limit()? If so, setting
  SHAPER_ID = channel for flows belonging to an HTB class would
  enable per-flow egress shaping via PPE.

- Is the current 0x7f / disabled behavior intentional — i.e., PPE
  flow shaping is simply not yet implemented, or is there a hardware
  constraint I'm missing?

- For the CPU xmit path, is there a reason TXMSG.METER cannot be
  set to the channel derived from skb_get_queue_mapping(), so that
  CPU-path packets also respect the TRTCM rate limits?

Thanks,
Wayen

* Re: [PATCH] net: add sock_open() for unified socket creation
From: Alex Goltsev @ 2026-06-19 10:35 UTC (permalink / raw)
  To: Al Viro; +Cc: davem, netdev, linux-kernel
In-Reply-To: <20260618211231.GB2636677@ZenIV>

> What's the point (and why not make it inline, while we are at it)?

> Are there really callers that would pass a non-constant value as the last argument,
> and if so, what are they doing next?

As for `inline`: in this case, it would have no practical significance.

The compiler already treats a simple inline function as a regular
symbol within the `EXPORT_SYMBOL` context, whereas a static inline
function (the standard kernel template for helper functions) would
completely break the export to the LKM.

This function solves the problem of actual API fragmentation:
currently, there are three nearly identical functions (sock_create,
sock_create_kern, sock_create_lite) with slightly different signatures.
If new variants are added in the future, this will turn into a “zoo,”
similar to what happened with `kmalloc` before it was unified. I propose
unifying this now, while maintaining backward compatibility (all three
existing functions will remain unchanged).

As for the last argument, yes, today it is usually a constant, but
that’s not the point. The purpose of the enumeration is to provide a
unified, explicit control interface. It’s important that if, in the
future, someone adds a new type of socket creation, existing calling
programs won’t panic or throw a compilation error, but will smoothly
fall back to the default case and return -EINVAL, which is a safe
failure mode.

I’m also aware that in `sock_open`, the first argument passed to the
`sock_create_kern` branch is not safe; I’ve placed it there as a
placeholder and am considering how to elegantly pass the `struct net`
to the `sock_create_kern` branch.
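
The patch itself is not quoted in this message, so the following is only
a toy sketch of the dispatch pattern being described: one entry point, an
enumerated mode, and a default branch that returns -EINVAL. Every name
below is hypothetical and none of it is the proposed kernel code.

```c
#include <errno.h>

/* Hypothetical creation modes; the real patch's enum is not shown here. */
enum toy_sock_mode {
	TOY_SOCK_USER,
	TOY_SOCK_KERN,
	TOY_SOCK_LITE,
};

/* Stand-ins for the three existing creation functions. */
static int toy_create_user(void) { return 0; }
static int toy_create_kern(void) { return 0; }
static int toy_create_lite(void) { return 0; }

/* Single entry point: unknown modes end up in the default branch. */
static int toy_sock_open(int mode)
{
	switch (mode) {
	case TOY_SOCK_USER:
		return toy_create_user();
	case TOY_SOCK_KERN:
		return toy_create_kern();
	case TOY_SOCK_LITE:
		return toy_create_lite();
	default:
		return -EINVAL;
	}
}
```

A caller passing a mode the function does not know, for example
toy_sock_open(42), gets -EINVAL back instead of undefined behaviour.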

* [PATCH net v2] net: airoha: fix BQL underflow and UAF in shared QDMA TX ring
From: Lorenzo Bianconi @ 2026-06-19 10:30 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Lorenzo Bianconi
  Cc: Wayen Yan, linux-arm-kernel, linux-mediatek, netdev

When multiple netdevs share a QDMA TX ring and one device is stopped,
netdev_tx_reset_subqueue() zeroes that device's BQL counters while its
pending skbs remain in the shared HW TX ring. When NAPI later completes
those skbs via netdev_tx_completed_queue(), the already-zeroed
dql->num_queued counter underflows.
Moreover, in the airoha_remove() path, netdevs are unregistered
sequentially while skbs from previously unregistered netdevs may still
reference freed net_device memory via skb->dev, causing a use-after-free
during BQL accounting.
Fix both issues:
- Remove netdev_tx_reset_subqueue() from airoha_dev_stop() so pending
  skbs are completed naturally by NAPI with proper BQL accounting.
- Introduce airoha_qdma_tx_flush() to stop NAPI and flush BQL counters
  for all pending skbs while skb->dev references are still valid.
- Guard airoha_dev_xmit() with DEV_STATE_FLUSH to drop packets during
  teardown.
- Move DMA engine start into probe and stop into airoha_qdma_cleanup().
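
The accounting problem in the first paragraph can be modelled with a toy
counter pair. This is not the kernel's struct dql or its API, just a
minimal sketch (all names hypothetical) of what goes wrong when the
counters are reset while packets are still in flight.

```c
#include <stdbool.h>

/* Toy stand-in for per-queue BQL accounting; not the kernel's struct dql. */
struct toy_bql {
	unsigned int num_queued;	/* bytes handed to hardware */
	unsigned int num_completed;	/* bytes reported complete */
};

/* Account bytes handed to the (shared) hardware ring. */
static void toy_sent(struct toy_bql *q, unsigned int bytes)
{
	q->num_queued += bytes;
}

/* Zero the counters, as a reset on device stop would. */
static void toy_reset(struct toy_bql *q)
{
	q->num_queued = 0;
	q->num_completed = 0;
}

/*
 * Account a completion. Returns false when more bytes complete than are
 * outstanding, which is the inconsistent state the commit message
 * describes.
 */
static bool toy_completed(struct toy_bql *q, unsigned int bytes)
{
	if (bytes > q->num_queued - q->num_completed)
		return false;

	q->num_completed += bytes;
	return true;
}
```

Sending 1500 bytes, resetting, and then completing those 1500 bytes makes
toy_completed() return false; dropping the reset and letting completions
drain naturally keeps the two counters consistent.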

Fixes: a9c2ca61fec7 ("net: airoha: Support multiple net_devices for a single FE GDM port")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
Changes in v2:
- Introduce airoha_qdma_tx_flush() to account BQL in airoha_remove() or
  airoha_probe() error path.
- Fix possible NULL pointer dereference in airoha_qdma_cleanup().
- Introduce DEV_STATE_FLUSH().
- Move back airoha_hw_cleanup().
- Set proper Fixes tag.
- Link to v1: https://lore.kernel.org/r/20260618-airoha-bql-fixes-v1-1-ffd2c2089518@kernel.org
---
 drivers/net/ethernet/airoha/airoha_eth.c | 87 +++++++++++++++++++++++---------
 drivers/net/ethernet/airoha/airoha_eth.h |  1 +
 2 files changed, 63 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 64dde6464f3f..e81cd806b57b 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -1004,6 +1004,7 @@ static int airoha_qdma_tx_napi_poll(struct napi_struct *napi, int budget)
 
 		e = &q->entry[index];
 		skb = e->skb;
+		e->skb = NULL;
 
 		dma_unmap_single(eth->dev, e->dma_addr, e->dma_len,
 				 DMA_TO_DEVICE);
@@ -1523,10 +1524,26 @@ static int airoha_qdma_init(struct platform_device *pdev,
 	return airoha_qdma_hw_init(qdma);
 }
 
-static void airoha_qdma_cleanup(struct airoha_qdma *qdma)
+static void airoha_qdma_cleanup(struct airoha_eth *eth,
+				struct airoha_qdma *qdma)
 {
 	int i;
 
+	if (test_bit(DEV_STATE_INITIALIZED, &eth->state)) {
+		u32 status;
+
+		airoha_qdma_clear(qdma, REG_QDMA_GLOBAL_CFG,
+				  GLOBAL_CFG_TX_DMA_EN_MASK |
+				  GLOBAL_CFG_RX_DMA_EN_MASK);
+		if (read_poll_timeout(airoha_qdma_rr, status,
+				      !(status & (GLOBAL_CFG_TX_DMA_BUSY_MASK |
+						  GLOBAL_CFG_RX_DMA_BUSY_MASK)),
+				      USEC_PER_MSEC, 50 * USEC_PER_MSEC, true,
+				      qdma, REG_QDMA_GLOBAL_CFG))
+			dev_warn(eth->dev,
+				 "QDMA DMA engine busy timeout\n");
+	}
+
 	for (i = 0; i < ARRAY_SIZE(qdma->q_rx); i++) {
 		if (!qdma->q_rx[i].ndesc)
 			continue;
@@ -1593,7 +1610,7 @@ static int airoha_hw_init(struct platform_device *pdev,
 	return 0;
 error:
 	for (i = 0; i < ARRAY_SIZE(eth->qdma); i++)
-		airoha_qdma_cleanup(&eth->qdma[i]);
+		airoha_qdma_cleanup(eth, &eth->qdma[i]);
 
 	return err;
 }
@@ -1603,7 +1620,7 @@ static void airoha_hw_cleanup(struct airoha_eth *eth)
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(eth->qdma); i++)
-		airoha_qdma_cleanup(&eth->qdma[i]);
+		airoha_qdma_cleanup(eth, &eth->qdma[i]);
 	airoha_ppe_deinit(eth);
 }
 
@@ -1637,6 +1654,35 @@ static void airoha_qdma_stop_napi(struct airoha_qdma *qdma)
 	}
 }
 
+static void airoha_qdma_tx_flush(struct airoha_qdma *qdma)
+{
+	int i;
+
+	airoha_qdma_stop_napi(qdma);
+
+	for (i = 0; i < ARRAY_SIZE(qdma->q_tx); i++) {
+		struct airoha_queue *q = &qdma->q_tx[i];
+		int j;
+
+		if (!q->ndesc)
+			continue;
+
+		spin_lock_bh(&q->lock);
+		for (j = 0; j < q->ndesc; j++) {
+			struct airoha_queue_entry *e = &q->entry[j];
+			struct sk_buff *skb = e->skb;
+			struct netdev_queue *txq;
+
+			if (!skb)
+				continue;
+
+			txq = skb_get_tx_queue(skb->dev, skb);
+			netdev_tx_completed_queue(txq, 1, skb->len);
+		}
+		spin_unlock_bh(&q->lock);
+	}
+}
+
 static void airoha_dev_get_hw_stats(struct airoha_gdm_dev *dev)
 {
 	struct airoha_gdm_port *port = dev->port;
@@ -1837,9 +1883,6 @@ static int airoha_dev_open(struct net_device *netdev)
 	}
 	port->users++;
 
-	airoha_qdma_set(qdma, REG_QDMA_GLOBAL_CFG,
-			GLOBAL_CFG_TX_DMA_EN_MASK |
-			GLOBAL_CFG_RX_DMA_EN_MASK);
 	qdma->users++;
 
 	if (!airoha_is_lan_gdm_dev(dev) &&
@@ -1880,12 +1923,9 @@ static int airoha_dev_stop(struct net_device *netdev)
 	struct airoha_gdm_dev *dev = netdev_priv(netdev);
 	struct airoha_gdm_port *port = dev->port;
 	struct airoha_qdma *qdma = dev->qdma;
-	int i;
 
 	netif_tx_disable(netdev);
 	airoha_set_vip_for_gdm_port(dev, false);
-	for (i = 0; i < netdev->num_tx_queues; i++)
-		netdev_tx_reset_subqueue(netdev, i);
 
 	if (--port->users)
 		airoha_set_port_mtu(dev->eth, port);
@@ -1893,19 +1933,7 @@ static int airoha_dev_stop(struct net_device *netdev)
 		airoha_set_gdm_port_fwd_cfg(qdma->eth,
 					    REG_GDM_FWD_CFG(port->id),
 					    FE_PSE_PORT_DROP);
-
-	if (!--qdma->users) {
-		airoha_qdma_clear(qdma, REG_QDMA_GLOBAL_CFG,
-				  GLOBAL_CFG_TX_DMA_EN_MASK |
-				  GLOBAL_CFG_RX_DMA_EN_MASK);
-
-		for (i = 0; i < ARRAY_SIZE(qdma->q_tx); i++) {
-			if (!qdma->q_tx[i].ndesc)
-				continue;
-
-			airoha_qdma_cleanup_tx_queue(&qdma->q_tx[i]);
-		}
-	}
+	qdma->users--;
 
 	return 0;
 }
@@ -2191,6 +2219,9 @@ static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
 	u16 index;
 	u8 fport;
 
+	if (test_bit(DEV_STATE_FLUSH, &dev->eth->state))
+		goto error;
+
 	qid = airoha_qdma_get_txq(qdma, skb_get_queue_mapping(skb));
 	tag = airoha_get_dsa_tag(skb, netdev);
 
@@ -3413,8 +3444,12 @@ static int airoha_probe(struct platform_device *pdev)
 	if (err)
 		goto error_netdev_free;
 
-	for (i = 0; i < ARRAY_SIZE(eth->qdma); i++)
+	for (i = 0; i < ARRAY_SIZE(eth->qdma); i++) {
 		airoha_qdma_start_napi(&eth->qdma[i]);
+		airoha_qdma_set(&eth->qdma[i], REG_QDMA_GLOBAL_CFG,
+				GLOBAL_CFG_TX_DMA_EN_MASK |
+				GLOBAL_CFG_RX_DMA_EN_MASK);
+	}
 
 	for_each_child_of_node(pdev->dev.of_node, np) {
 		if (!of_device_is_compatible(np, "airoha,eth-mac"))
@@ -3437,8 +3472,9 @@ static int airoha_probe(struct platform_device *pdev)
 	return 0;
 
 error_napi_stop:
+	set_bit(DEV_STATE_FLUSH, &eth->state);
 	for (i = 0; i < ARRAY_SIZE(eth->qdma); i++)
-		airoha_qdma_stop_napi(&eth->qdma[i]);
+		airoha_qdma_tx_flush(&eth->qdma[i]);
 
 	for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
 		struct airoha_gdm_port *port = eth->ports[i];
@@ -3474,8 +3510,9 @@ static void airoha_remove(struct platform_device *pdev)
 	struct airoha_eth *eth = platform_get_drvdata(pdev);
 	int i;
 
+	set_bit(DEV_STATE_FLUSH, &eth->state);
 	for (i = 0; i < ARRAY_SIZE(eth->qdma); i++)
-		airoha_qdma_stop_napi(&eth->qdma[i]);
+		airoha_qdma_tx_flush(&eth->qdma[i]);
 
 	for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
 		struct airoha_gdm_port *port = eth->ports[i];
diff --git a/drivers/net/ethernet/airoha/airoha_eth.h b/drivers/net/ethernet/airoha/airoha_eth.h
index 41d2e7a1f9fb..f6dce5e74e02 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.h
+++ b/drivers/net/ethernet/airoha/airoha_eth.h
@@ -92,6 +92,7 @@ enum {
 enum {
 	DEV_STATE_INITIALIZED,
 	DEV_STATE_REGISTERED,
+	DEV_STATE_FLUSH,
 };
 
 enum {

---
base-commit: a887f2c7da66a805a55fd8706d45faec85f646db
change-id: 20260618-airoha-bql-fixes-f57b2d108573

Best regards,
-- 
Lorenzo Bianconi <lorenzo@kernel.org>


* Re: [PATCH net 3/6] ipv6: fix error handling in forwarding sysctl
From: Fernando Fernandez Mancera @ 2026-06-19 10:28 UTC (permalink / raw)
  To: nicolas.dichtel, netdev
  Cc: shemminger, dforster, gospo, ddutt, brian.haley, horms, pabeni,
	kuba, edumazet, davem, idosch, dsahern
In-Reply-To: <fa7cbedb-33f3-4a79-acff-2b23cf3f57d8@6wind.com>

On 6/19/26 11:34 AM, Nicolas Dichtel wrote:
> Le 18/06/2026 à 18:22, Fernando Fernandez Mancera a écrit :
>> When writing to the forwarding sysctl, if proc_dointvec() fails to parse
>> the input, it returns a negative error code. The current implementation
>> is overwriting that error for write operations.
>>
>> This results in a silent failure: it returns a successful write although
>> the configuration was not modified at all. When modifying the "all"
>> variant it can also modify the configuration of existing interfaces to
>> the wrong value.
>>
>> Fix this by checking the return value of proc_dointvec() and returning
>> early on failure.
>>
>> Fixes: b325fddb7f86 ("ipv6: Fix sysctl unregistration deadlock")
> The bug existed before the git era.
> Maybe
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> 
> 


Hm, not really, AFAICS b325fddb7f86 is the first commit overwriting the 
return value from proc_dointvec(). See:

@@ -3983,7 +3986,7 @@ int addrconf_sysctl_forward(ctl_table *ctl, int write, struct file * filp,
  	ret = proc_dointvec(ctl, write, filp, buffer, lenp, ppos);

  	if (write)
-		addrconf_fixup_forwarding(ctl, valp, val);
+		ret = addrconf_fixup_forwarding(ctl, valp, val);
  	return ret;
  }
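
To make the effect concrete, here is a minimal userspace sketch of the
pattern, not the kernel code. parse_int() and fixup() are hypothetical
stand-ins for proc_dointvec() and addrconf_fixup_forwarding(), and I
assume the fixup step always succeeds:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for proc_dointvec(): parse one decimal int. */
static int parse_int(const char *buf, int *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(buf, &end, 10);
	if (errno || end == buf || *end != '\0' || v < INT_MIN || v > INT_MAX)
		return -EINVAL;
	*out = (int)v;
	return 0;
}

/* Hypothetical stand-in for the fixup step; assumed to always succeed. */
static int fixup(int *cfg, int newval)
{
	*cfg = newval;
	return 0;
}

/* Overwriting pattern: the fixup result replaces the parse error. */
static int write_overwriting(int *cfg, const char *buf)
{
	int val = *cfg;
	int ret = parse_int(buf, &val);

	ret = fixup(cfg, val);	/* the parse error is lost here */
	return ret;
}

/* Fixed pattern: return early when parsing failed. */
static int write_checked(int *cfg, const char *buf)
{
	int val = *cfg;
	int ret = parse_int(buf, &val);

	if (ret)
		return ret;
	return fixup(cfg, val);
}

/* Small usage demo: garbage input is reported only by the checked variant. */
static void forwarding_demo(void)
{
	int cfg = 0;

	assert(write_overwriting(&cfg, "abc") == 0);
	assert(write_checked(&cfg, "abc") == -EINVAL);
}
```

With garbage input the first variant reports success and the second one
reports -EINVAL, which is the silent failure described in the commit
message.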

^ permalink raw reply

* Re: [PATCHv2 3/4] mmc: sdhci-esdhc-mcf: do not use readl()/writel() on ColdFire
From: Angelo Dureghello @ 2026-06-19 10:24 UTC (permalink / raw)
  To: Greg Ungerer
  Cc: linux-m68k, linux-kernel, arnd, wei.fang, frank.li, shenwei.wang,
	imx, netdev, nico, adureghello, ulfh, linux-mmc, linux-can,
	linux-spi, olteanv
In-Reply-To: <20260609142139.1563360-5-gerg@linux-m68k.org>

Hi Greg,

the driver behaves as before. It does not use "edma" but its own
internal DMA interface, and no new issues are introduced.

While testing, I noticed a possible issue that is totally unrelated
to your changes. I will fix it eventually.

Tested-by: Angelo Dureghello <adureghello@baylibre.com>
Acked-by: Angelo Dureghello <adureghello@baylibre.com>

On Wed, Jun 10, 2026 at 12:13:00AM +1000, Greg Ungerer wrote:
> The implementation of the readX() and writeX() family of IO access
> functions is non-standard on ColdFire platforms. They check the supplied
> IO address and will return either big or little endian results based on
> that check. This is non-standard, they are expected to always return
> little-endian byte ordered data. Unfortunately this behavior also means
> that ioreadX()/iowriteX() and their big-endian counterparts
> ioreadXbe()/iowriteXbe() are wrong. This is now in the process of being
> cleaned up and fixed.
>
> Change the use of the readX() and writeX() access functions in this driver
> to use the recently defined specific ColdFire internal SoC hardware IO
> access functions mcf_read8()/mcf_read16()/mcf_read32() and
> mcf_write8()/mcf_write16()/mcf_write32().
>
> There is no functional change to the driver. Though it does have the
> effect of making the IO access slightly more efficient, since there is
> no longer a need to do the address check at every register access.
>
> Acked-by: Angelo Dureghello <adureghello@baylibre.com>
> Tested-by: Angelo Dureghello <adureghello@baylibre.com>
> Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
> ---
> v2: moved from RFC to PATCH
>
>  drivers/mmc/host/sdhci-esdhc-mcf.c | 24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/mmc/host/sdhci-esdhc-mcf.c b/drivers/mmc/host/sdhci-esdhc-mcf.c
> index 375fce5639d7..6853521e8b2c 100644
> --- a/drivers/mmc/host/sdhci-esdhc-mcf.c
> +++ b/drivers/mmc/host/sdhci-esdhc-mcf.c
> @@ -55,7 +55,7 @@ static inline void esdhc_clrset_be(struct sdhci_host *host,
>  	if (reg == SDHCI_HOST_CONTROL)
>  		val |= ESDHC_PROCTL_D3CD;
>
> -	writel((readl(base) & ~mask) | val, base);
> +	mcf_write32((mcf_read32(base) & ~mask) | val, base);
>  }
>
>  /*
> @@ -71,7 +71,7 @@ static void esdhc_mcf_writeb_be(struct sdhci_host *host, u8 val, int reg)
>  	if (reg == SDHCI_HOST_CONTROL) {
>  		u32 host_ctrl = ESDHC_DEFAULT_HOST_CONTROL;
>  		u8 dma_bits = (val & SDHCI_CTRL_DMA_MASK) >> 3;
> -		u8 tmp = readb(host->ioaddr + SDHCI_HOST_CONTROL + 1);
> +		u8 tmp = mcf_read8(host->ioaddr + SDHCI_HOST_CONTROL + 1);
>
>  		tmp &= ~0x03;
>  		tmp |= dma_bits;
> @@ -82,12 +82,12 @@ static void esdhc_mcf_writeb_be(struct sdhci_host *host, u8 val, int reg)
>  		 */
>  		host_ctrl |= val;
>  		host_ctrl |= (dma_bits << 8);
> -		writel(host_ctrl, host->ioaddr + SDHCI_HOST_CONTROL);
> +		mcf_write32(host_ctrl, host->ioaddr + SDHCI_HOST_CONTROL);
>
>  		return;
>  	}
>
> -	writel((readl(base) & mask) | (val << shift), base);
> +	mcf_write32((mcf_read32(base) & mask) | (val << shift), base);
>  }
>
>  static void esdhc_mcf_writew_be(struct sdhci_host *host, u16 val, int reg)
> @@ -110,24 +110,24 @@ static void esdhc_mcf_writew_be(struct sdhci_host *host, u16 val, int reg)
>  		 * As for the fsl driver,
>  		 * we have to set the mode in a single write here.
>  		 */
> -		writel(val << 16 | mcf_data->aside,
> +		mcf_write32(val << 16 | mcf_data->aside,
>  		       host->ioaddr + SDHCI_TRANSFER_MODE);
>  		return;
>  	}
>
> -	writel((readl(base) & mask) | (val << shift), base);
> +	mcf_write32((mcf_read32(base) & mask) | (val << shift), base);
>  }
>
>  static void esdhc_mcf_writel_be(struct sdhci_host *host, u32 val, int reg)
>  {
> -	writel(val, host->ioaddr + reg);
> +	mcf_write32(val, host->ioaddr + reg);
>  }
>
>  static u8 esdhc_mcf_readb_be(struct sdhci_host *host, int reg)
>  {
>  	if (reg == SDHCI_HOST_CONTROL) {
>  		u8 __iomem *base = host->ioaddr + (reg & ~3);
> -		u16 val = readw(base + 2);
> +		u16 val = mcf_read16(base + 2);
>  		u8 dma_bits = (val >> 5) & SDHCI_CTRL_DMA_MASK;
>  		u8 host_ctrl = val & 0xff;
>
> @@ -137,7 +137,7 @@ static u8 esdhc_mcf_readb_be(struct sdhci_host *host, int reg)
>  		return host_ctrl;
>  	}
>
> -	return readb(host->ioaddr + (reg ^ 0x3));
> +	return mcf_read8(host->ioaddr + (reg ^ 0x3));
>  }
>
>  static u16 esdhc_mcf_readw_be(struct sdhci_host *host, int reg)
> @@ -149,14 +149,14 @@ static u16 esdhc_mcf_readw_be(struct sdhci_host *host, int reg)
>  	if (reg == SDHCI_HOST_VERSION)
>  		reg -= 2;
>
> -	return readw(host->ioaddr + (reg ^ 0x2));
> +	return mcf_read16(host->ioaddr + (reg ^ 0x2));
>  }
>
>  static u32 esdhc_mcf_readl_be(struct sdhci_host *host, int reg)
>  {
>  	u32 val;
>
> -	val = readl(host->ioaddr + reg);
> +	val = mcf_read32(host->ioaddr + reg);
>
>  	/*
>  	 * RM (25.3.9) sd pin clock must never exceed 25Mhz.
> @@ -245,7 +245,7 @@ static void esdhc_mcf_pltfm_set_clock(struct sdhci_host *host,
>  	 * fvco = fsys * outdvi1 + 1
>  	 * fshdc = fvco / outdiv3 + 1
>  	 */
> -	temp = readl(pll_dr);
> +	temp = mcf_read32(pll_dr);
>  	fsys = pltfm_host->clock;
>  	fvco = fsys * ((temp & 0x1f) + 1);
>  	fesdhc = fvco / (((temp >> 10) & 0x1f) + 1);
> --
> 2.54.0
>

^ permalink raw reply

* Re: [PATCH net 5/6] ipv6: reset value and position for proxy_ndp sysctl restart
From: Fernando Fernandez Mancera @ 2026-06-19 10:09 UTC (permalink / raw)
  To: nicolas.dichtel, netdev
  Cc: shemminger, dforster, gospo, ddutt, brian.haley, horms, pabeni,
	kuba, edumazet, davem, idosch, dsahern
In-Reply-To: <491465c3-0d1a-42f6-86fe-6c31812e23c9@6wind.com>

On 6/19/26 11:58 AM, Nicolas Dichtel wrote:
> Le 18/06/2026 à 18:22, Fernando Fernandez Mancera a écrit :
>> When handling proxy_ndp, if rtnl_net_trylock() fails, the operation is
>> retried but as the value was already modified by the initial
>> proc_dointvec() call, the restarted syscall will read the newly modified
>> value as the 'old' state.
>>
>> Fix this by restoring the original value and position pointer before
>> restarting the syscall.
> Is it not better to call rtnl_net_trylock() at the beginning of the function?
> It avoids flapping the sysctl value.
> 

IMHO it is not better if we want to reduce the time we are holding RTNL 
lock. I think the idea is that if the user introduces an invalid value, 
we don't need to take the lock at all.

That is the general pattern I see around the sysctl code (IPv4 and 
IPv6). Given the current efforts to reduce the usage of RTNL I think 
this approach would be better.
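
For illustration, a small userspace model of the "parse first, lock
later" ordering and of what the restore buys us. This is only a sketch:
struct knob, write_knob() and SKETCH_RESTART are made-up names, the
lock_ok flag stands in for the result of rtnl_net_trylock(), and a
return value of 1 stands in for "the change was seen and the
notification would be sent".

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_RESTART (-1)	/* arbitrary marker for "restart the syscall" */

struct knob {
	int value;	/* models *valp */
	long pos;	/* models *ppos */
};

/*
 * Hypothetical handler: store the new value first (as proc_dointvec()
 * does on write), then try the lock only if the value really changed.
 */
static int write_knob(struct knob *k, int newval, bool lock_ok, bool restore)
{
	int old = k->value;
	long pos = k->pos;

	k->value = newval;
	k->pos += 1;

	if (old == newval)
		return 0;		/* nothing to do, lock never taken */

	if (!lock_ok) {
		if (restore) {		/* behaviour added by the patch */
			k->value = old;
			k->pos = pos;
		}
		return SKETCH_RESTART;
	}

	return 1;			/* change seen, would notify */
}

/* Small usage demo: with restore, the retried write still sees the change. */
static void proxy_ndp_demo(void)
{
	struct knob k = { 0, 0 };

	assert(write_knob(&k, 1, false, true) == SKETCH_RESTART);
	assert(write_knob(&k, 1, true, true) == 1);
}
```

Without the restore, the retried write finds old == new and the change
is never acted upon. With it, the retry starts from the original state.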

In any case, it is not a blocker for me so if we all agree that your 
suggestion is better I don't mind taking that path.

Thanks for all the reviews!

>>
>> Fixes: c92d5491a6d9 ("netconf: add support for IPv6 proxy_ndp")
>> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
>> ---
>>   net/ipv6/addrconf.c | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> index 8ff015975e27..1cfb223476bd 100644
>> --- a/net/ipv6/addrconf.c
>> +++ b/net/ipv6/addrconf.c
>> @@ -6483,8 +6483,9 @@ static int addrconf_sysctl_proxy_ndp(const struct ctl_table *ctl, int write,
>>   		void *buffer, size_t *lenp, loff_t *ppos)
>>   {
>>   	int *valp = ctl->data;
>> -	int ret;
>> +	loff_t pos = *ppos;
>>   	int old, new;
>> +	int ret;
>>   
>>   	old = *valp;
>>   	ret = proc_dointvec(ctl, write, buffer, lenp, ppos);
>> @@ -6493,8 +6494,12 @@ static int addrconf_sysctl_proxy_ndp(const struct ctl_table *ctl, int write,
>>   	if (write && old != new) {
>>   		struct net *net = ctl->extra2;
>>   
>> -		if (!rtnl_net_trylock(net))
>> +		if (!rtnl_net_trylock(net)) {
>> +			/* Restore the original values before restarting */
>> +			*valp = old;
>> +			*ppos = pos;
>>   			return restart_syscall();
>> +		}
>>   
>>   		if (valp == &net->ipv6.devconf_dflt->proxy_ndp) {
>>   			inet6_netconf_notify_devconf(net, RTM_NEWNETCONF,
> 
> 


^ permalink raw reply

* Re: [PATCH net 6/6] ipv6: fix missing notification for ignore_routes_with_linkdown
From: Nicolas Dichtel @ 2026-06-19 10:02 UTC (permalink / raw)
  To: Fernando Fernandez Mancera, netdev
  Cc: shemminger, dforster, gospo, ddutt, brian.haley, horms, pabeni,
	kuba, edumazet, davem, idosch, dsahern
In-Reply-To: <20260618162225.4588-7-fmancera@suse.de>

Le 18/06/2026 à 18:22, Fernando Fernandez Mancera a écrit :
> When changing the ignore_routes_with_linkdown sysctl for a specific
> interface, the RTM_NEWNETCONF netlink notification was not being emitted
> to userspace. Fix this by emitting the notification when needed.
> 
> In addition, fix bogus return value for successful "all" and specific
> interface write operation leading to a wrong reset of the position
> pointer.
> 
> Fixes: 35103d11173b ("net: ipv6 sysctl option to ignore routes when nexthop link is down")
> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

^ permalink raw reply

* Re: [PATCH net 5/6] ipv6: reset value and position for proxy_ndp sysctl restart
From: Nicolas Dichtel @ 2026-06-19  9:58 UTC (permalink / raw)
  To: Fernando Fernandez Mancera, netdev
  Cc: shemminger, dforster, gospo, ddutt, brian.haley, horms, pabeni,
	kuba, edumazet, davem, idosch, dsahern
In-Reply-To: <20260618162225.4588-6-fmancera@suse.de>

Le 18/06/2026 à 18:22, Fernando Fernandez Mancera a écrit :
> When handling proxy_ndp, if rtnl_net_trylock() fails, the operation is
> retried but as the value was already modified by the initial
> proc_dointvec() call, the restarted syscall will read the newly modified
> value as the 'old' state.
> 
> Fix this by restoring the original value and position pointer before
> restarting the syscall.
Is it not better to call rtnl_net_trylock() at the beginning of the function?
It avoids flapping the sysctl value.

> 
> Fixes: c92d5491a6d9 ("netconf: add support for IPv6 proxy_ndp")
> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
> ---
>  net/ipv6/addrconf.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 8ff015975e27..1cfb223476bd 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -6483,8 +6483,9 @@ static int addrconf_sysctl_proxy_ndp(const struct ctl_table *ctl, int write,
>  		void *buffer, size_t *lenp, loff_t *ppos)
>  {
>  	int *valp = ctl->data;
> -	int ret;
> +	loff_t pos = *ppos;
>  	int old, new;
> +	int ret;
>  
>  	old = *valp;
>  	ret = proc_dointvec(ctl, write, buffer, lenp, ppos);
> @@ -6493,8 +6494,12 @@ static int addrconf_sysctl_proxy_ndp(const struct ctl_table *ctl, int write,
>  	if (write && old != new) {
>  		struct net *net = ctl->extra2;
>  
> -		if (!rtnl_net_trylock(net))
> +		if (!rtnl_net_trylock(net)) {
> +			/* Restore the original values before restarting */
> +			*valp = old;
> +			*ppos = pos;
>  			return restart_syscall();
> +		}
>  
>  		if (valp == &net->ipv6.devconf_dflt->proxy_ndp) {
>  			inet6_netconf_notify_devconf(net, RTM_NEWNETCONF,


^ permalink raw reply

* [PATCH v6.1 3/3] netfilter: nf_tables: unconditionally bump set->nelems before insertion
From: Shivani Agarwal @ 2026-06-19  9:28 UTC (permalink / raw)
  To: stable, gregkh
  Cc: pablo, fw, phil, davem, edumazet, kuba, pabeni, horms,
	netfilter-devel, coreteam, netdev, linux-kernel, ajay.kaher,
	alexey.makhalov, vamsi-krishna.brahmajosyula, yin.ding,
	tapas.kundu, Inseo An, Li hongliang, Sasha Levin, Shivani Agarwal
In-Reply-To: <20260619092850.1274076-1-shivani.agarwal@broadcom.com>

From: Pablo Neira Ayuso <pablo@netfilter.org>

[ Upstream commit def602e498a4f951da95c95b1b8ce8ae68aa733a ]

In case that the set is full, a new element gets published then removed
without waiting for the RCU grace period, while RCU reader can be
walking over it already.

To address this issue, add the element transaction even if set is full,
but toggle the set_full flag to report -ENFILE so the abort path safely
unwinds the set to its previous state.

As for element updates, decrement set->nelems to restore it.

A simpler fix is to call synchronize_rcu() in the error path.
However, with a large batch adding elements to already maxed-out set,
this could cause noticeable slowdown of such batches.

Fixes: 35d0ac9070ef ("netfilter: nf_tables: fix set->nelems counting with no NLM_F_EXCL")
Reported-by: Inseo An <y0un9sa@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
[ Minor conflict resolved. ]
Signed-off-by: Li hongliang <1468888505@139.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Shivani: Modified to apply on 6.1.y ]
Signed-off-by: Shivani Agarwal <shivani.agarwal@broadcom.com>
---
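The change in counting scheme described above can be modelled in
userspace with C11 atomics. This is only a sketch under assumptions:
inc_unless() imitates the semantics of the kernel's atomic_add_unless(),
inc_and_check_full() imitates atomic_inc_return() plus the set_full
test, and all function names are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Old scheme: refuse to count the element once the limit is reached. */
static bool inc_unless(atomic_uint *n, unsigned int max)
{
	unsigned int cur = atomic_load(n);

	while (cur != max) {
		/* on failure, cur is reloaded with the current value */
		if (atomic_compare_exchange_weak(n, &cur, cur + 1))
			return true;
	}
	return false;
}

/* New scheme: always count the element, report whether the set is full. */
static bool inc_and_check_full(atomic_uint *n, unsigned int max)
{
	unsigned int nelems = atomic_fetch_add(n, 1) + 1;

	return nelems > max;
}

/* Abort path: drop the accounting for one element. */
static void undo_inc(atomic_uint *n)
{
	atomic_fetch_sub(n, 1);
}

/* Small usage demo on a set that is already at its limit of 2. */
static void nelems_demo(void)
{
	atomic_uint n = 2;

	assert(!inc_unless(&n, 2));		/* old: count untouched */
	assert(inc_and_check_full(&n, 2));	/* new: counted, flagged full */
	undo_inc(&n);				/* abort restores the count */
	assert(atomic_load(&n) == 2);
}
```

In the new scheme the caller still queues the transaction and returns
-ENFILE, so the element is unwound by the abort path instead of being
removed right after it was published.
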
 net/netfilter/nf_tables_api.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 15bfdf07c..196ac4e76 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -6388,6 +6388,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	struct nft_data_desc desc;
 	enum nft_registers dreg;
 	struct nft_trans *trans;
+	bool set_full = false;
 	u64 timeout;
 	u64 expiration;
 	int err, i;
@@ -6680,10 +6681,18 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	if (err < 0)
 		goto err_elem_free;
 
+	if (!(flags & NFT_SET_ELEM_CATCHALL)) {
+		unsigned int max = nft_set_maxsize(set), nelems;
+
+		nelems = atomic_inc_return(&set->nelems);
+		if (nelems > max)
+			set_full = true;
+	}
+
 	trans = nft_trans_elem_alloc(ctx, NFT_MSG_NEWSETELEM, set);
 	if (trans == NULL) {
 		err = -ENOMEM;
-		goto err_elem_free;
+		goto err_set_size;
 	}
 
 	ext->genmask = nft_genmask_cur(ctx->net);
@@ -6715,23 +6724,16 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 		goto err_element_clash;
 	}
 
-	if (!(flags & NFT_SET_ELEM_CATCHALL)) {
-		unsigned int max = nft_set_maxsize(set);
-
-		if (!atomic_add_unless(&set->nelems, 1, max)) {
-			err = -ENFILE;
-			goto err_set_full;
-		}
-	}
-
 	nft_trans_elem(trans) = elem;
 	nft_trans_commit_list_add_tail(ctx->net, trans);
-	return 0;
 
-err_set_full:
-	nft_setelem_remove(ctx->net, set, &elem);
+	return set_full ? -ENFILE : 0;
+
 err_element_clash:
 	kfree(trans);
+err_set_size:
+	if (!(flags & NFT_SET_ELEM_CATCHALL))
+		atomic_dec(&set->nelems);
 err_elem_free:
 	nf_tables_set_elem_destroy(ctx, set, elem.priv);
 err_parse_data:
-- 
2.53.0


^ permalink raw reply related

* [PATCH v6.1 2/3] netfilter: nf_tables: fix set size with rbtree backend
From: Shivani Agarwal @ 2026-06-19  9:28 UTC (permalink / raw)
  To: stable, gregkh
  Cc: pablo, fw, phil, davem, edumazet, kuba, pabeni, horms,
	netfilter-devel, coreteam, netdev, linux-kernel, ajay.kaher,
	alexey.makhalov, vamsi-krishna.brahmajosyula, yin.ding,
	tapas.kundu, Sasha Levin, Shivani Agarwal
In-Reply-To: <20260619092850.1274076-1-shivani.agarwal@broadcom.com>

From: Pablo Neira Ayuso <pablo@netfilter.org>

[ Upstream commit 8d738c1869f611955d91d8d0fd0012d9ef207201 ]

The existing rbtree implementation uses singleton elements to represent
ranges, however, userspace provides a set size according to the number
of ranges in the set.

Adjust provided userspace set size to the number of singleton elements
in the kernel by multiplying the range by two.

Check if the no-match all-zero element is already in the set, in such
case release one slot in the set size.

Fixes: 0ed6389c483d ("netfilter: nf_tables: rename set implementations")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Shivani: Modified to apply on 6.1.y ]
Signed-off-by: Shivani Agarwal <shivani.agarwal@broadcom.com>
---
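The size conversion and the saturating maximum described above can be
tried out in userspace. This is a sketch only: the sketch_* names are
made up, __builtin_add_overflow() (a GCC/Clang builtin) is assumed as a
stand-in for the kernel's check_add_overflow(), and delta models the
value returned by the adjust_maxsize callback.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace counts ranges, the rbtree stores two elements per range. */
static uint32_t sketch_ksize(uint32_t ranges)
{
	return ranges * 2;
}

/* Reverse conversion when reporting the size back to userspace. */
static uint32_t sketch_usize(uint32_t elems)
{
	return elems / 2;
}

/* Maximum element count: size + ndeact + delta, saturating at UINT32_MAX. */
static uint32_t sketch_maxsize(uint32_t size, uint32_t ndeact, uint32_t delta)
{
	uint32_t max;

	if (!size)
		return UINT32_MAX;	/* no limit configured */

	if (__builtin_add_overflow(size, ndeact, &max))
		return UINT32_MAX;

	if (__builtin_add_overflow(max, delta, &max))
		return UINT32_MAX;

	return max;
}

/* Small usage demo: 4 ranges, one deactivated element, all-zero end present. */
static void set_size_demo(void)
{
	uint32_t size = sketch_ksize(4);

	assert(size == 8);
	assert(sketch_usize(size) == 4);
	assert(sketch_maxsize(size, 1, 1) == 10);
}
```
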
 include/net/netfilter/nf_tables.h |  6 ++++
 net/netfilter/nf_tables_api.c     | 49 +++++++++++++++++++++++++++++--
 net/netfilter/nft_set_rbtree.c    | 43 +++++++++++++++++++++++++++
 3 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index dafa0a32e..3329c2eae 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -422,6 +422,9 @@ struct nft_set_ext;
  *	@remove: remove element from set
  *	@walk: iterate over all set elements
  *	@get: get set elements
+ *	@ksize: kernel set size
+ *	@usize: userspace set size
+ *	@adjust_maxsize: delta to adjust maximum set size
  *	@privsize: function to return size of set private data
  *	@init: initialize private data of new set instance
  *	@destroy: destroy private data of set instance
@@ -470,6 +473,9 @@ struct nft_set_ops {
 					       const struct nft_set *set,
 					       const struct nft_set_elem *elem,
 					       unsigned int flags);
+	u32				(*ksize)(u32 size);
+	u32				(*usize)(u32 size);
+	u32				(*adjust_maxsize)(const struct nft_set *set);
 	void				(*commit)(struct nft_set *set);
 	void				(*abort)(const struct nft_set *set);
 	u64				(*privsize)(const struct nlattr * const nla[],
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index ec4bfe53b..15bfdf07c 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -4264,6 +4264,14 @@ static int nf_tables_fill_set_concat(struct sk_buff *skb,
 	return 0;
 }
 
+static u32 nft_set_userspace_size(const struct nft_set_ops *ops, u32 size)
+{
+	if (ops->usize)
+		return ops->usize(size);
+
+	return size;
+}
+
 static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx,
 			      const struct nft_set *set, u16 event, u16 flags)
 {
@@ -4328,7 +4336,8 @@ static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx,
 	if (!nest)
 		goto nla_put_failure;
 	if (set->size &&
-	    nla_put_be32(skb, NFTA_SET_DESC_SIZE, htonl(set->size)))
+	    nla_put_be32(skb, NFTA_SET_DESC_SIZE,
+			 htonl(nft_set_userspace_size(set->ops, set->size))))
 		goto nla_put_failure;
 
 	if (set->field_count > 1 &&
@@ -4698,6 +4707,15 @@ static bool nft_set_is_same(const struct nft_set *set,
 	return true;
 }
 
+static u32 nft_set_kernel_size(const struct nft_set_ops *ops,
+			       const struct nft_set_desc *desc)
+{
+	if (ops->ksize)
+		return ops->ksize(desc->size);
+
+	return desc->size;
+}
+
 static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info,
 			    const struct nlattr * const nla[])
 {
@@ -4880,6 +4898,9 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info,
 		if (err < 0)
 			return err;
 
+		if (desc.size)
+			desc.size = nft_set_kernel_size(set->ops, &desc);
+
 		err = 0;
 		if (!nft_set_is_same(set, &desc, exprs, num_exprs, flags)) {
 			NL_SET_BAD_ATTR(extack, nla[NFTA_SET_NAME]);
@@ -4902,6 +4923,9 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info,
 	if (IS_ERR(ops))
 		return PTR_ERR(ops);
 
+	if (desc.size)
+		desc.size = nft_set_kernel_size(ops, &desc);
+
 	udlen = 0;
 	if (nla[NFTA_SET_USERDATA])
 		udlen = nla_len(nla[NFTA_SET_USERDATA]);
@@ -6327,6 +6351,27 @@ static bool nft_setelem_valid_key_end(const struct nft_set *set,
 	return true;
 }
 
+static u32 nft_set_maxsize(const struct nft_set *set)
+{
+	u32 maxsize, delta;
+
+	if (!set->size)
+		return UINT_MAX;
+
+	if (set->ops->adjust_maxsize)
+		delta = set->ops->adjust_maxsize(set);
+	else
+		delta = 0;
+
+	if (check_add_overflow(set->size, set->ndeact, &maxsize))
+		return UINT_MAX;
+
+	if (check_add_overflow(maxsize, delta, &maxsize))
+		return UINT_MAX;
+
+	return maxsize;
+}
+
 static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 			    const struct nlattr *attr, u32 nlmsg_flags)
 {
@@ -6671,7 +6716,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	}
 
 	if (!(flags & NFT_SET_ELEM_CATCHALL)) {
-		unsigned int max = set->size ? set->size + set->ndeact : UINT_MAX;
+		unsigned int max = nft_set_maxsize(set);
 
 		if (!atomic_add_unless(&set->nelems, 1, max)) {
 			err = -ENFILE;
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 426becaad..26e1d994f 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -775,6 +775,46 @@ static bool nft_rbtree_estimate(const struct nft_set_desc *desc, u32 features,
 	return true;
 }
 
+/* rbtree stores ranges as singleton elements, each range is composed of two
+ * elements ...
+ */
+static u32 nft_rbtree_ksize(u32 size)
+{
+	return size * 2;
+}
+
+/* ... hide this detail to userspace. */
+static u32 nft_rbtree_usize(u32 size)
+{
+	if (!size)
+		return 0;
+
+	return size / 2;
+}
+
+static u32 nft_rbtree_adjust_maxsize(const struct nft_set *set)
+{
+	struct nft_rbtree *priv = nft_set_priv(set);
+	struct nft_rbtree_elem *rbe;
+	struct rb_node *node;
+	const void *key;
+
+	node = rb_last(&priv->root);
+	if (!node)
+		return 0;
+
+	rbe = rb_entry(node, struct nft_rbtree_elem, node);
+	if (!nft_rbtree_interval_end(rbe))
+		return 0;
+
+	key = nft_set_ext_key(&rbe->ext);
+	if (memchr(key, 1, set->klen))
+		return 0;
+
+	/* this is the all-zero no-match element. */
+	return 1;
+}
+
 const struct nft_set_type nft_set_rbtree_type = {
 	.features	= NFT_SET_INTERVAL | NFT_SET_MAP | NFT_SET_OBJECT | NFT_SET_TIMEOUT,
 	.ops		= {
@@ -791,5 +831,8 @@ const struct nft_set_type nft_set_rbtree_type = {
 		.lookup		= nft_rbtree_lookup,
 		.walk		= nft_rbtree_walk,
 		.get		= nft_rbtree_get,
+		.ksize		= nft_rbtree_ksize,
+		.usize		= nft_rbtree_usize,
+		.adjust_maxsize = nft_rbtree_adjust_maxsize,
 	},
 };
-- 
2.53.0


^ permalink raw reply related

* [PATCH v6.1 1/3] netfilter: nf_tables: always increment set element count
From: Shivani Agarwal @ 2026-06-19  9:28 UTC (permalink / raw)
  To: stable, gregkh
  Cc: pablo, fw, phil, davem, edumazet, kuba, pabeni, horms,
	netfilter-devel, coreteam, netdev, linux-kernel, ajay.kaher,
	alexey.makhalov, vamsi-krishna.brahmajosyula, yin.ding,
	tapas.kundu, Shivani Agarwal
In-Reply-To: <20260619092850.1274076-1-shivani.agarwal@broadcom.com>

From: Florian Westphal <fw@strlen.de>

[ Upstream commit d4b7f29eb85c93893bc27388b37709efbc3c9a0e ]

At this time, set->nelems counter only increments when the set has
a maximum size.

All set elements decrement the counter unconditionally, this is
confusing.

Increment the counter unconditionally to make this symmetrical.
This would also allow changing the set maximum size after set creation
in a later patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
[ Shivani: Modified to apply on 6.1.y ]
Signed-off-by: Shivani Agarwal <shivani.agarwal@broadcom.com>
---
 net/netfilter/nf_tables_api.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 0c4224282..ec4bfe53b 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -6670,10 +6670,13 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 		goto err_element_clash;
 	}
 
-	if (!(flags & NFT_SET_ELEM_CATCHALL) && set->size &&
-	    !atomic_add_unless(&set->nelems, 1, set->size + set->ndeact)) {
-		err = -ENFILE;
-		goto err_set_full;
+	if (!(flags & NFT_SET_ELEM_CATCHALL)) {
+		unsigned int max = set->size ? set->size + set->ndeact : UINT_MAX;
+
+		if (!atomic_add_unless(&set->nelems, 1, max)) {
+			err = -ENFILE;
+			goto err_set_full;
+		}
 	}
 
 	nft_trans_elem(trans) = elem;
-- 
2.53.0


^ permalink raw reply related

* [PATCH v6.1 0/3] Fix CVE-2026-23272
From: Shivani Agarwal @ 2026-06-19  9:28 UTC (permalink / raw)
  To: stable, gregkh
  Cc: pablo, fw, phil, davem, edumazet, kuba, pabeni, horms,
	netfilter-devel, coreteam, netdev, linux-kernel, ajay.kaher,
	alexey.makhalov, vamsi-krishna.brahmajosyula, yin.ding,
	tapas.kundu, Shivani Agarwal

To fix CVE-2026-23272, commit def602e498a4 is required; however,
it depends on commits d4b7f29eb85c and 8d738c1869f6. Therefore,
all three patches have been backported to v6.1.

Florian Westphal (1):
  netfilter: nf_tables: always increment set element count

Pablo Neira Ayuso (2):
  netfilter: nf_tables: fix set size with rbtree backend
  netfilter: nf_tables: unconditionally bump set->nelems before
    insertion

 include/net/netfilter/nf_tables.h |  6 +++
 net/netfilter/nf_tables_api.c     | 72 ++++++++++++++++++++++++++-----
 net/netfilter/nft_set_rbtree.c    | 43 ++++++++++++++++++
 3 files changed, 110 insertions(+), 11 deletions(-)

-- 
2.53.0


^ permalink raw reply

* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Petr Mladek @ 2026-06-19  9:53 UTC (permalink / raw)
  To: John Ogness
  Cc: Breno Leitao, Peter Zijlstra, Jakub Kicinski,
	Sebastian Andrzej Siewior, Sergey Senozhatsky, Vlad Poenaru,
	Thomas Gleixner, netdev, David S . Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Clark Williams, Steven Rostedt,
	linux-rt-devel, linux-kernel, stable, Frederic Weisbecker,
	Ingo Molnar, Vincent Guittot, Dietmar Eggemann, K Prateek Nayak
In-Reply-To: <87tsr1m6y5.fsf@jogness.linutronix.de>

On Wed 2026-06-17 19:13:30, John Ogness wrote:
> On 2026-06-17, Breno Leitao <leitao@debian.org> wrote:
> > On Wed, Jun 17, 2026 at 01:19:58PM +0200, Peter Zijlstra wrote:
> >> But anything using locking is not ->write_atomic() and should be driven
> >> from a kthread, no?
> >
> > Good point. If that's the case, netconsole might not ever be able to drop
> > CON_NBCON_ATOMIC_UNSAFE for any network-based console driver at all. 
> 
> It depends on what it needs to synchronize against. For example, the
> UART consoles cannot write if the port lock is taken by another
> context. And the port lock is the sole lock for writing to the UART. To
> deal with this, we added wrappers [0] for acquiring/releasing the port
> lock. The wrappers acquire the nbcon hardware after taking the port
> lock.
>
> The write_atomic() implementations for UART consoles do not take the
> port lock. Only the nbcon hardware is acquired (which can be done from
> any context). This automatically provides the synchronization based on
> the port lock.
> 
> > As far as I can tell, there isn't a network driver today whose transmit
> > path is completely lockless, so, even if we make netpoll lockless.
> >
> > It's unlikely any NIC will ever achieve this, given that NIC TX
> > fundamentally relies on a shared DMA ring and doorbell register, which
> > inherently cannot be made lockless.
> >
> > So, is it correct to state that CON_NBCON_ATOMIC_UNSAFE will be part of
> > netconsole forever-ish?
> 
> Is there some lock that can be taken to synchronize all writing of
> packets to the network? If yes, the netconsole can use a similar
> solution.

We need to be careful here. If more locks depend on the nbcon
ownership then it might become a kind of big kernel lock.

It might suffer from lock contention.

Another complication is that it is supposed to be a tail lock.

Finally, it might create tricky lockdep dependencies. But nbcon
context locking is not tracked by lockdep so it is not easy to be sure.

More details:

I always forget the details. But it seems that sleeping is allowed
in the nbcon context, see cant_migrate() in nbcon_device_try_acquire().
Which might break when someone tries to take it in atomic context.

AFAIK, the motivation was to allow using the normal (sleeping)
spin locks for serial console synchronization in RT. The nested nbcon
context locking should not disable the preemption when called
in NBCON_PRIO_NORMAL context.

It would still allow taking the nbcon context in atomic context
when called in NBCON_PRIO_EMERGENCY or _PANIC context because
nbcon_context_try_acquire() is able to take over the ownership
even from a sleeping NBCON_PRIO_NORMAL context.

But we need to make sure that outer locks behave the same.
In practice, they must be normal spin_locks. We could probably
add some lockdep annotation to catch eventual problems.
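
To illustrate only the priority rule mentioned above, here is a heavily
simplified userspace model. It is an assumption-laden sketch, not the
nbcon implementation: it ignores unsafe sections, handover requests and
CPU ownership, it does not distinguish owners of equal priority, and
all names are made up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum sketch_prio {
	SKETCH_PRIO_NONE,	/* console not owned */
	SKETCH_PRIO_NORMAL,
	SKETCH_PRIO_EMERGENCY,
	SKETCH_PRIO_PANIC,
};

struct sketch_console {
	_Atomic int owner_prio;
};

/* Acquire succeeds when the console is free or owned at a lower priority. */
static bool sketch_try_acquire(struct sketch_console *c, enum sketch_prio p)
{
	int cur = atomic_load(&c->owner_prio);

	while (cur < (int)p) {
		/* on failure, cur is reloaded with the current owner priority */
		if (atomic_compare_exchange_weak(&c->owner_prio, &cur, (int)p))
			return true;
	}
	return false;
}

/* Release fails when ownership was taken over in the meantime. */
static bool sketch_release(struct sketch_console *c, enum sketch_prio p)
{
	int expected = (int)p;

	return atomic_compare_exchange_strong(&c->owner_prio, &expected,
					      SKETCH_PRIO_NONE);
}

/* Small usage demo: an emergency context takes over from a normal owner. */
static void takeover_demo(void)
{
	struct sketch_console c = { SKETCH_PRIO_NONE };

	assert(sketch_try_acquire(&c, SKETCH_PRIO_NORMAL));
	assert(sketch_try_acquire(&c, SKETCH_PRIO_EMERGENCY));
	assert(!sketch_release(&c, SKETCH_PRIO_NORMAL));
}
```

The point of the model is the last step: the normal-priority owner must
be able to cope with losing the console at any time, which is why the
outer locks matter.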

Sigh, I hope that I have got it right. I seem to be a bit lost
this week.

> [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/serial_core.h?h=v7.1#n715

Best Regards,
Petr

^ permalink raw reply

* Re: AW: AW: [PATCH net] net: usb: lan78xx: restore VLAN filter table after device reset
From: Nicolai Buchwitz @ 2026-06-19  9:53 UTC (permalink / raw)
  To: Sven Schuchmann
  Cc: Thangaraj Samynathan, Rengarajan Sundararajan, UNGLinuxDriver,
	Woojung.Huh, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev, linux-usb, linux-kernel
In-Reply-To: <BEZP281MB2245422BB9FBFF4AFE081451D9E22@BEZP281MB2245.DEUP281.PROD.OUTLOOK.COM>

Hi Sven

On 19.6.2026 11:18, Sven Schuchmann wrote:
> Hello Nicolai,
> 
> my first observation is that calling lan78xx_write_vlan_table()
> at the end of lan78xx_start_rx_path() fixes the problem. I was able
> to do over 200 connect/disconnects without any problem.

Thanks, that's the right direction. For the final patch I'd move it
to lan78xx_mac_link_up(), which is IMHO a bit "cleaner":

[...]
  static void lan78xx_rx_urb_submit_all(struct lan78xx_net *dev);
+static int lan78xx_write_vlan_table(struct lan78xx_net *dev);
[...]
static void lan78xx_mac_link_up(struct phylink_config *config,
[...]
  	if (ret < 0)
  		goto link_up_fail;

+	ret = lan78xx_write_vlan_table(dev);
+	if (ret < 0)
+		goto link_up_fail;
+
  	netif_start_queue(net);
[...]

Could you give this version a quick test and confirm? Then I'll add
your Tested-by.

> [...]

Thanks
Nicolai

^ permalink raw reply

* [PATCH net v2] octeontx2-af: npc: cn20k: Fix subbank free list indexing for search order
From: Ratheesh Kannoth @ 2026-06-19  9:51 UTC (permalink / raw)
  To: kuba, linux-kernel, netdev, rkannoth
  Cc: andrew+netdev, davem, edumazet, pabeni, sgoutham

subbank_srch_order[i] is the physical subbank at search-order slot i,
so each subbank's arr_idx must be i (its slot), not
subbank_srch_order[sb->idx].  The old logic mis-keyed xa_sb_free
and broke allocation traversal order.

Populate arr_idx and xa_sb_free in a single pass over the search
order after subbank structs are initialized.

Fixes: 7ac9d4c4075c ("octeontx2-af: npc: cn20k: add subbank search order control")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>

---
v1 -> v2: Addressed Simon's comments
	https://lore.kernel.org/netdev/20260619091341.918165-1-horms@kernel.org/
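
The indexing mistake described in the commit message is easy to
reproduce in userspace. The sketch below is illustrative only: struct
sketch_sb, init_old() and init_fixed() are made-up names, and a plain
array stands in for the xa_sb_free xarray.

```c
#include <assert.h>

struct sketch_sb {
	int idx;	/* physical subbank number */
	int arr_idx;	/* slot in the free array (search order position) */
};

/* Old logic: arr_idx taken from order[idx], which is the wrong mapping. */
static void init_old(struct sketch_sb *sb, const int *order,
		     int *free_slots, int n)
{
	for (int idx = 0; idx < n; idx++) {
		sb[idx].idx = idx;
		sb[idx].arr_idx = order[idx];
		free_slots[sb[idx].arr_idx] = idx;
	}
}

/* Fixed logic: walk the search order, slot i holds subbank order[i]. */
static void init_fixed(struct sketch_sb *sb, const int *order,
		       int *free_slots, int n)
{
	for (int i = 0; i < n; i++) {
		int sb_idx = order[i];

		sb[sb_idx].idx = sb_idx;
		sb[sb_idx].arr_idx = i;
		free_slots[i] = sb_idx;
	}
}

/* Small usage demo with search order 2, 0, 1. */
static void srch_order_demo(void)
{
	const int order[3] = { 2, 0, 1 };
	struct sketch_sb sb[3];
	int slots[3];

	init_fixed(sb, order, slots, 3);
	assert(slots[0] == 2 && slots[1] == 0 && slots[2] == 1);
}
```

With search order 2, 0, 1 the old logic yields a free array of 1, 2, 0,
while the fixed logic yields 2, 0, 1, matching the requested order. The
two only agree when the permutation is its own inverse.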
---
 .../ethernet/marvell/octeontx2/af/cn20k/npc.c | 51 ++++++++++++++-----
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
index 354c4e881c6a..51fe82f1343f 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
@@ -3423,6 +3423,36 @@ static int npc_create_srch_order(int cnt)
 	return 0;
 }
 
+static int npc_subbanks_srch_order_init(struct rvu *rvu)
+{
+	struct npc_subbank *sb;
+	int sb_idx;
+	int i, j;
+	int rc;
+
+	for (i = 0; i < npc_priv->num_subbanks; i++) {
+		sb_idx = subbank_srch_order[i];
+		sb = &npc_priv->sb[sb_idx];
+		sb->arr_idx = i;
+
+		dev_dbg(rvu->dev, "%s: sb->idx=%u sb->arr_idx=%u\n",
+			__func__, sb->idx, sb->arr_idx);
+
+		rc = xa_err(xa_store(&npc_priv->xa_sb_free, sb->arr_idx,
+				     xa_mk_value(sb->idx), GFP_KERNEL));
+		if (rc) {
+			dev_err(rvu->dev,
+				"%s: xa_store(xa_sb_free) failed at slot %d (sb=%d): %d\n",
+				__func__, i, sb_idx, rc);
+			for (j = 0; j < i; j++)
+				xa_erase(&npc_priv->xa_sb_free, j);
+			return rc;
+		}
+	}
+
+	return 0;
+}
+
 static void npc_subbank_init(struct rvu *rvu, struct npc_subbank *sb, int idx)
 {
 	mutex_init(&sb->lock);
@@ -3435,16 +3465,6 @@ static void npc_subbank_init(struct rvu *rvu, struct npc_subbank *sb, int idx)
 
 	sb->flags = NPC_SUBBANK_FLAG_FREE;
 	sb->idx = idx;
-	sb->arr_idx = subbank_srch_order[idx];
-
-	dev_dbg(rvu->dev, "%s: sb->idx=%u sb->arr_idx=%u\n",
-		__func__, sb->idx, sb->arr_idx);
-
-	/* Keep first and last subbank at end of free array; so that
-	 * it will be used at last
-	 */
-	xa_store(&npc_priv->xa_sb_free, sb->arr_idx,
-		 xa_mk_value(sb->idx), GFP_KERNEL);
 }
 
 static int npc_pcifunc_map_create(struct rvu *rvu)
@@ -4635,6 +4655,7 @@ static int npc_priv_init(struct rvu *rvu)
 	int num_subbanks, subbank_depth;
 	u64 npc_const1, npc_const2 = 0;
 	struct npc_subbank *sb;
+	int ret = -ENOMEM;
 	u64 cfg;
 	int i;
 
@@ -4727,13 +4748,19 @@ static int npc_priv_init(struct rvu *rvu)
 	for (i = 0, sb = npc_priv->sb; i < num_subbanks; i++, sb++)
 		npc_subbank_init(rvu, sb, i);
 
+	ret = npc_subbanks_srch_order_init(rvu);
+	if (ret)
+		goto fail3;
+
 	/* Get number of pcifuncs in the system */
 	npc_priv->pf_cnt = npc_pcifunc_map_create(rvu);
 	npc_priv->xa_pf2idx_map = kcalloc(npc_priv->pf_cnt,
 					  sizeof(struct xarray),
 					  GFP_KERNEL);
-	if (!npc_priv->xa_pf2idx_map)
+	if (!npc_priv->xa_pf2idx_map) {
+		ret = -ENOMEM;
 		goto fail3;
+	}
 
 	for (i = 0; i < npc_priv->pf_cnt; i++)
 		xa_init_flags(&npc_priv->xa_pf2idx_map[i], XA_FLAGS_ALLOC);
@@ -4760,7 +4787,7 @@ static int npc_priv_init(struct rvu *rvu)
 fail1:
 	kfree(npc_priv);
 	npc_priv = NULL;
-	return -ENOMEM;
+	return ret;
 }
 
 void npc_cn20k_deinit(struct rvu *rvu)
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH net] octeontx2-af: npc: cn20k: Fix subbank free list indexing for search order
From: Ratheesh Kannoth @ 2026-06-19  9:45 UTC (permalink / raw)
  To: Simon Horman
  Cc: kuba, linux-kernel, netdev, andrew+netdev, davem, edumazet,
	pabeni, sgoutham
In-Reply-To: <20260619091341.918165-1-horms@kernel.org>

On 2026-06-19 at 14:43:41, Simon Horman (horms@kernel.org) wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://netdev-ai.bots.linux.dev/sashiko/
> ---
> octeontx2-af: npc: cn20k: Fix subbank free list indexing for search order
>
> This change re-keys the xa_sb_free free list using each subbank's slot
> in subbank_srch_order[] rather than using subbank_srch_order[sb->idx],
> moving the population into a new helper called from npc_priv_init().
> A few comments inline.
>
> > diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
> > index 354c4e881c6a..d38e848add93 100644
> > --- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
> > +++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
> > @@ -3423,6 +3423,36 @@ static int npc_create_srch_order(int cnt)
> >  	return 0;
> >  }
> >
> > +static int npc_subbanks_srch_oder_init(struct rvu *rvu)
>                                 ^^^^
>
> [Low]
> Is "oder" intended here? The neighbouring identifiers
> npc_create_srch_order() and subbank_srch_order use "order", so this
> appears to be a typo that would be easier to fix before the symbol
> lands.
>
> > +{
> > +	struct npc_subbank *sb;
> > +	int sb_idx;
> > +	int i, j;
> > +	int rc;
> > +
> > +	for (i = 0; i < npc_priv->num_subbanks; i++) {
> > +		sb_idx = subbank_srch_order[i];
> > +		sb = &npc_priv->sb[sb_idx];
> > +		sb->arr_idx = i;
> > +
> > +		dev_dbg(rvu->dev, "%s: sb->idx=%u sb->arr_idx=%u\n",
> > +			__func__, sb->idx, sb->arr_idx);
> > +
> > +		rc = xa_err(xa_store(&npc_priv->xa_sb_free, sb->arr_idx,
> > +				     xa_mk_value(sb->idx), GFP_KERNEL));
> > +		if (rc) {
> > +			dev_err(rvu->dev,
> > +				"%s: xa_store(xa_sb_free) failed at slot %d (sb=%d): %d\n",
> > +				__func__, i, sb_idx, rc);
> > +			for (j = 0; j < i; j++)
> > +				xa_erase(&npc_priv->xa_sb_free, j);
> > +			return rc;
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> > +
>
> [ ... ]
>
> > @@ -4727,6 +4748,10 @@ static int npc_priv_init(struct rvu *rvu)
> >  	for (i = 0, sb = npc_priv->sb; i < num_subbanks; i++, sb++)
> >  		npc_subbank_init(rvu, sb, i);
> >
> > +	ret = npc_subbanks_srch_oder_init(rvu);
> > +	if (ret)
> > +		goto fail2;
> > +
>
> [Medium]
> Can this leak subbank_srch_order on the new failure path?
>
> npc_create_srch_order() allocates subbank_srch_order via kcalloc().
> Before this patch, the only way to reach fail2 was when
> npc_create_srch_order() itself failed and left subbank_srch_order
> NULL, so fail2 not freeing it was safe.
>
> With this new goto fail2, subbank_srch_order has already been
> allocated, but the fail2 label only frees npc_priv->sb and npc_priv:
>
> fail2:
> 	xa_destroy(&npc_priv->xa_sb_used);
> 	[...]
> 	kfree(npc_priv->sb);
> 	npc_priv->sb = NULL;
> fail1:
> 	kfree(npc_priv);
> 	npc_priv = NULL;
> 	return ret;
>
> A subsequent re-init would unconditionally reassign
> subbank_srch_order = kcalloc(...) in npc_create_srch_order(),
> permanently leaking the previous buffer.
>
> Should this path go to a new label that also kfree()s
> subbank_srch_order, or should fail2 free it?
>
> >  	/* Get number of pcifuncs in the system */
> >  	npc_priv->pf_cnt = npc_pcifunc_map_create(rvu);
> >  	npc_priv->xa_pf2idx_map = kcalloc(npc_priv->pf_cnt,
>
> [High]
> Can npc_priv_init() return 0 on a real failure here?
>
> After npc_subbanks_srch_oder_init() returns successfully, ret is 0.
> If the kcalloc() for npc_priv->xa_pf2idx_map fails right after, the
> code does goto fail3 which falls through fail2 and fail1 to
> return ret, propagating 0 to the caller:
>
> fail3:
> 	kfree(subbank_srch_order);
> 	subbank_srch_order = NULL;
>
> fail2:
> 	[...]
> fail1:
> 	kfree(npc_priv);
> 	npc_priv = NULL;
> 	return ret;
>
> The caller npc_cn20k_init() then sees err == 0 and continues into
> npc_setup_mcam_section(), which dereferences npc_priv (now NULL).
>
> Pre-patch the function ended in return -ENOMEM, so the fail3 path
> returned -ENOMEM unconditionally. Should ret be reset to -ENOMEM
> before goto fail3, or should a separate variable be used for the
> new helper's return value?
>
> > @@ -4760,7 +4785,7 @@ static int npc_priv_init(struct rvu *rvu)
> >  fail1:
> >  	kfree(npc_priv);
> >  	npc_priv = NULL;
> > -	return -ENOMEM;
> > +	return ret;
> >  }

ACK for all.
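
For reference, the [High] item above can be reproduced in a small
standalone model. This is a sketch under assumptions, not the driver
code: the function names are hypothetical, malloc() stands in for
kcalloc(), and fail_second_alloc simulates the xa_pf2idx_map allocation
failing.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper that always succeeds, like the new init helper
 * on its success path.
 */
static int helper_ok(void)
{
	return 0;
}

/* v1 shape: ret is overwritten with 0 by the helper and never reset. */
static int init_buggy(int fail_second_alloc)
{
	int ret = -ENOMEM;
	void *a, *b;

	a = malloc(16);
	if (!a)
		goto fail1;

	ret = helper_ok();	/* ret is now 0 */
	if (ret)
		goto fail2;

	b = fail_second_alloc ? NULL : malloc(16);
	if (!b)
		goto fail2;	/* bug: falls through and returns 0 */

	free(b);
	free(a);
	return 0;
fail2:
	free(a);
fail1:
	return ret;
}

/* v2 shape: set ret explicitly before jumping to the unwind labels. */
static int init_fixed(int fail_second_alloc)
{
	int ret = -ENOMEM;
	void *a, *b;

	a = malloc(16);
	if (!a)
		goto fail1;

	ret = helper_ok();
	if (ret)
		goto fail2;

	b = fail_second_alloc ? NULL : malloc(16);
	if (!b) {
		ret = -ENOMEM;
		goto fail2;
	}

	free(b);
	free(a);
	return 0;
fail2:
	free(a);
fail1:
	return ret;
}
```

In this model init_buggy(1) returns 0 even though the second allocation
failed, while init_fixed(1) returns -ENOMEM.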

pw-bot: changes-requested

^ permalink raw reply

* Re: [PATCH net 4/6] ipv6: fix error handling in disable_policy sysctl
From: Nicolas Dichtel @ 2026-06-19  9:35 UTC (permalink / raw)
  To: Fernando Fernandez Mancera, netdev
  Cc: shemminger, dforster, gospo, ddutt, brian.haley, horms, pabeni,
	kuba, edumazet, davem, idosch, dsahern
In-Reply-To: <20260618162225.4588-5-fmancera@suse.de>

On 18/06/2026 at 18:22, Fernando Fernandez Mancera wrote:
> When writing to the disable_policy sysctl, if proc_dointvec() fails to
> parse the input, it returns a negative error code. The current
> implementation is resetting the position argument even if an error
> occurred during proc_dointvec() and not only during sysctl restart.
> 
> Fix this by checking the return value of proc_dointvec() and returning
> early on failure.
> 
> Fixes: df789fe75206 ("ipv6: Provide ipv6 version of "disable_policy" sysctl")
> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

^ permalink raw reply

* Re: [PATCH net 2/2] net: airoha: fix netif_set_real_num_tx_queues for sparse QoS channels
From: Simon Horman @ 2026-06-19  9:35 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Wayen Yan, linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260618-airoha-qos-fixes-v1-2-37192652157f@kernel.org>

On Thu, Jun 18, 2026 at 08:00:30AM +0200, Lorenzo Bianconi wrote:
> airoha_tc_htb_alloc_leaf_queue() assigns queue IDs based on the channel
> index (opt->qid = AIROHA_NUM_TX_RING + channel), but updates
> real_num_tx_queues with a simple increment (num_tx_queues + 1). When QoS
> channels are allocated sparsely (e.g., channels 0 and 3 without 1 and
> 2), the returned qid can exceed real_num_tx_queues, causing out-of-bounds
> accesses in the networking stack.
> For example, allocating channel 0 then channel 3 results in
> real_num_tx_queues = 34 but qid = 35, which is out of range [0, 34).
> Fix this by computing real_num_tx_queues based on the highest active
> channel index rather than using a simple counter, in both the allocation
> and deletion paths.
> 
> Fixes: ef1ca9271313b ("net: airoha: Add sched HTB offload support")
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  drivers/net/ethernet/airoha/airoha_eth.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c

...

> @@ -2806,7 +2806,10 @@ static int airoha_tc_htb_alloc_leaf_queue(struct net_device *netdev,
>  	if (err)
>  		goto error;
>  
> -	err = netif_set_real_num_tx_queues(netdev, num_tx_queues + 1);
> +	if (num_tx_queues <= netdev->real_num_tx_queues)
> +		goto set_qos_sq_bmap;
> +
> +	err = netif_set_real_num_tx_queues(netdev, num_tx_queues);
>  	if (err) {
>  		airoha_qdma_set_tx_rate_limit(netdev, channel, 0,
>  					      opt->quantum);
> @@ -2815,6 +2818,7 @@ static int airoha_tc_htb_alloc_leaf_queue(struct net_device *netdev,
>  		goto error;
>  	}
>  
> +set_qos_sq_bmap:

I would prefer if this could be achieved without a goto.
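
One possible shape, sketched as a standalone model rather than a tested
driver change: invert the condition so the resize is simply skipped when
it is not needed. The struct and the fake_* helper below are hypothetical
stand-ins for the netdev and netif_set_real_num_tx_queues().

```c
#include <errno.h>

/* Hypothetical stand-in for the relevant netdev state. */
struct fake_netdev {
	unsigned int real_num_tx_queues;
	unsigned int max_tx_queues;
};

/* Hypothetical stand-in for netif_set_real_num_tx_queues(). */
static int fake_set_real_num_tx_queues(struct fake_netdev *nd, unsigned int n)
{
	if (n < 1 || n > nd->max_tx_queues)
		return -EINVAL;
	nd->real_num_tx_queues = n;
	return 0;
}

/* Grow the queue count only when the requested count exceeds the
 * current one. No label is needed to reach the common tail.
 */
static int grow_tx_queues(struct fake_netdev *nd, unsigned int num_tx_queues)
{
	if (num_tx_queues > nd->real_num_tx_queues) {
		int err = fake_set_real_num_tx_queues(nd, num_tx_queues);

		if (err)
			return err;	/* the driver would undo the rate limit here */
	}

	/* Common tail: in the driver, set_bit() and the qid assignment. */
	return 0;
}
```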

>  	set_bit(channel, dev->qos_sq_bmap);
>  	opt->qid = AIROHA_NUM_TX_RING + channel;
>  

...

^ permalink raw reply

* Re: [PATCH net 3/6] ipv6: fix error handling in forwarding sysctl
From: Nicolas Dichtel @ 2026-06-19  9:34 UTC (permalink / raw)
  To: Fernando Fernandez Mancera, netdev
  Cc: shemminger, dforster, gospo, ddutt, brian.haley, horms, pabeni,
	kuba, edumazet, davem, idosch, dsahern
In-Reply-To: <20260618162225.4588-4-fmancera@suse.de>

On 18/06/2026 at 18:22, Fernando Fernandez Mancera wrote:
> When writing to the forwarding sysctl, if proc_dointvec() fails to parse
> the input, it returns a negative error code. The current implementation
> is overwriting that error for write operations.
> 
> This results in a silent failure: the write reports success although
> the configuration was not modified at all. When modifying the "all"
> variant it can also modify the configuration of existing interfaces to
> the wrong value.
> 
> Fix this by checking the return value of proc_dointvec() and returning
> early on failure.
> 
> Fixes: b325fddb7f86 ("ipv6: Fix sysctl unregistration deadlock")
The bug existed before the git era. Maybe use this instead:
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")


^ permalink raw reply
