Netdev List
* [PATCH net 08/13] ice: fix potential NULL pointer deref in error path of ice_set_ringparam()
From: Jacob Keller @ 2026-04-15  5:48 UTC
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: netdev, Jacob Keller, Kohei Enju, Paul Greenwalt, Rinitha S
In-Reply-To: <20260414-iwl-net-submission-2026-04-14-v1-0-852f38e7da39@intel.com>

From: Kohei Enju <kohei@enjuk.jp>

ice_set_ringparam() sets tstamp_ring of the temporary tx_rings to NULL
without clearing the ICE_TX_RING_FLAGS_TXTIME bit.
When ICE_TX_RING_FLAGS_TXTIME is set and the subsequent
ice_setup_tx_ring() call fails, a NULL pointer dereference could happen
in the unwinding sequence:

ice_clean_tx_ring()
-> ice_is_txtime_cfg() == true (ICE_TX_RING_FLAGS_TXTIME is set)
-> ice_free_tx_tstamp_ring()
  -> ice_free_tstamp_ring()
    -> tstamp_ring->desc (NULL deref)

Clear the ICE_TX_RING_FLAGS_TXTIME bit to avoid the potential issue.

Note that this potential issue was found by manual code review.
Compile tested only, since unfortunately I don't have E830 devices.

Fixes: ccde82e90946 ("ice: add E830 Earliest TxTime First Offload support")
Signed-off-by: Kohei Enju <kohei@enjuk.jp>
Reviewed-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_ethtool.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index e6a20af6f63d..f28416a707d7 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -3290,6 +3290,7 @@ ice_set_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring,
 		tx_rings[i].desc = NULL;
 		tx_rings[i].tx_buf = NULL;
 		tx_rings[i].tstamp_ring = NULL;
+		clear_bit(ICE_TX_RING_FLAGS_TXTIME, tx_rings[i].flags);
 		tx_rings[i].tx_tstamps = &pf->ptp.port.tx;
 		err = ice_setup_tx_ring(&tx_rings[i]);
 		if (err) {

-- 
2.53.0.1066.g1eceb487f285



* [PATCH net 09/13] i40e: don't advertise IFF_SUPP_NOFCS
From: Jacob Keller @ 2026-04-15  5:48 UTC
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: netdev, Jacob Keller, Kohei Enju, Aleksandr Loktionov,
	Sunitha Mekala
In-Reply-To: <20260414-iwl-net-submission-2026-04-14-v1-0-852f38e7da39@intel.com>

From: Kohei Enju <kohei@enjuk.jp>

i40e advertises IFF_SUPP_NOFCS, allowing users to use the SO_NOFCS
socket option. However, this option is silently ignored, as the driver
does not check skb->no_fcs, and always enables FCS insertion offload.

Fix this by removing the advertisement of IFF_SUPP_NOFCS.

This behavior can be reproduced with a simple AF_PACKET socket:

  import socket
  s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
  s.setsockopt(socket.SOL_SOCKET, 43, 1) # SO_NOFCS
  s.bind(("eth0", 0))
  s.send(b'\xff' * 64)

Previously, send() succeeds but the driver ignores SO_NOFCS.
With this change, send() fails with -EPROTONOSUPPORT, as expected.

Fixes: 41c445ff0f48 ("i40e: main driver core")
Signed-off-by: Kohei Enju <kohei@enjuk.jp>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 926d001b2150..028bd500603a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -13783,7 +13783,6 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
 	netdev->neigh_priv_len = sizeof(u32) * 4;
 
 	netdev->priv_flags |= IFF_UNICAST_FLT;
-	netdev->priv_flags |= IFF_SUPP_NOFCS;
 	/* Setup netdev TC information */
 	i40e_vsi_config_netdev_tc(vsi, vsi->tc_config.enabled_tc);
 

-- 
2.53.0.1066.g1eceb487f285



* [PATCH net 10/13] i40e: fix napi_enable/disable skipping ringless q_vectors
From: Jacob Keller @ 2026-04-15  5:48 UTC
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: netdev, Jacob Keller, Aleksandr Loktionov, stable, Sunitha Mekala
In-Reply-To: <20260414-iwl-net-submission-2026-04-14-v1-0-852f38e7da39@intel.com>

From: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

After ethtool -L reduces the queue count, i40e_napi_disable_all() sets
NAPI_STATE_SCHED on all q_vectors, then i40e_vsi_map_rings_to_vectors()
clears ring pointers on the excess ones.  i40e_napi_enable_all() skips
those with:

	if (q_vector->rx.ring || q_vector->tx.ring)
		napi_enable(&q_vector->napi);

leaving them on dev->napi_list with NAPI_STATE_SCHED permanently set.

Writing to /sys/class/net/<iface>/threaded calls napi_stop_kthread()
on every entry in dev->napi_list.  The function loops on msleep(20)
waiting for NAPI_STATE_SCHED to clear -- which never happens for the
stale q_vectors.  The task hangs in D state forever; a concurrent write
deadlocks on dev->lock held by the first.

Commit 13a8cd191a2b ("i40e: Do not enable NAPI on q_vectors that have no
rings") added the guard to prevent a divide-by-zero in i40e_napi_poll()
when epoll busy-poll iterated all device NAPIs (4.x era). Since
7adc3d57fe2b ("net: Introduce preferred busy-polling"), from v5.11,
napi_busy_loop() polls by napi_id keyed to the socket, so ringless
q_vectors are never selected.  i40e_msix_clean_rings() also independently
avoids scheduling NAPI for them.  The guard is safe to remove.

Add an early return in i40e_napi_poll() for num_ringpairs == 0 so the
function is self-defending against a NULL tx.ring dereference at the
WB_ON_ITR check, should the NAPI ever fire through an unexpected path.

Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/intel-wired-lan/20260316133100.6054a11f@kernel.org/
Fixes: 13a8cd191a2b ("i40e: Do not enable NAPI on q_vectors that have no rings")
Cc: stable@vger.kernel.org
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 28 ++++++++++++++++------------
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 10 ++++++++++
 2 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 028bd500603a..b4ca8485f4b5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5182,6 +5182,14 @@ static void i40e_clear_interrupt_scheme(struct i40e_pf *pf)
 /**
  * i40e_napi_enable_all - Enable NAPI for all q_vectors in the VSI
  * @vsi: the VSI being configured
+ *
+ * Enable NAPI on every q_vector that is registered with the netdev,
+ * regardless of whether it currently has rings assigned.  After a queue-
+ * count reduction (e.g. ethtool -L combined 1) the excess q_vectors lose
+ * their ring pointers inside i40e_vsi_map_rings_to_vectors but remain on
+ * dev->napi_list.  Leaving them in the napi_disable()-ed state
+ * (NAPI_STATE_SCHED set) causes napi_set_threaded() to spin forever on
+ * msleep(20) waiting for that bit to clear.
  **/
 static void i40e_napi_enable_all(struct i40e_vsi *vsi)
 {
@@ -5190,17 +5198,17 @@ static void i40e_napi_enable_all(struct i40e_vsi *vsi)
 	if (!vsi->netdev)
 		return;
 
-	for (q_idx = 0; q_idx < vsi->num_q_vectors; q_idx++) {
-		struct i40e_q_vector *q_vector = vsi->q_vectors[q_idx];
-
-		if (q_vector->rx.ring || q_vector->tx.ring)
-			napi_enable(&q_vector->napi);
-	}
+	for (q_idx = 0; q_idx < vsi->num_q_vectors; q_idx++)
+		napi_enable(&vsi->q_vectors[q_idx]->napi);
 }
 
 /**
  * i40e_napi_disable_all - Disable NAPI for all q_vectors in the VSI
  * @vsi: the VSI being configured
+ *
+ * Mirror of i40e_napi_enable_all: operate on every registered q_vector so
+ * enable/disable calls are always balanced, even when some q_vectors carry
+ * no rings (as happens after a queue-count reduction).
  **/
 static void i40e_napi_disable_all(struct i40e_vsi *vsi)
 {
@@ -5209,12 +5217,8 @@ static void i40e_napi_disable_all(struct i40e_vsi *vsi)
 	if (!vsi->netdev)
 		return;
 
-	for (q_idx = 0; q_idx < vsi->num_q_vectors; q_idx++) {
-		struct i40e_q_vector *q_vector = vsi->q_vectors[q_idx];
-
-		if (q_vector->rx.ring || q_vector->tx.ring)
-			napi_disable(&q_vector->napi);
-	}
+	for (q_idx = 0; q_idx < vsi->num_q_vectors; q_idx++)
+		napi_disable(&vsi->q_vectors[q_idx]->napi);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 894f2d06d39d..3123459208d3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2760,6 +2760,16 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 		return 0;
 	}
 
+	/* A q_vector can have its ring pointers cleared after a queue-count
+	 * reduction (ethtool -L combined N) while napi_enable() was already
+	 * called on it.  Complete immediately so the poll loop exits cleanly
+	 * and we never dereference the NULL ring pointer below.
+	 */
+	if (unlikely(!q_vector->num_ringpairs)) {
+		napi_complete_done(napi, 0);
+		return 0;
+	}
+
 	/* Since the actual Tx work is minimal, we can give the Tx a larger
 	 * budget and be more aggressive about cleaning up the Tx descriptors.
 	 */

-- 
2.53.0.1066.g1eceb487f285



* [PATCH net 12/13] idpf: fix xdp crash in soft reset error path
From: Jacob Keller @ 2026-04-15  5:48 UTC
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: netdev, Jacob Keller, Emil Tantilov, stable, Aleksandr Loktionov,
	Patryk Holda
In-Reply-To: <20260414-iwl-net-submission-2026-04-14-v1-0-852f38e7da39@intel.com>

From: Emil Tantilov <emil.s.tantilov@intel.com>

A NULL pointer dereference is reported in cases where idpf_vport_open()
fails during soft reset:

./xdpsock -i <inf> -q -r -N

[ 3179.186687] idpf 0000:83:00.0: Failed to initialize queue ids for vport 0: -12
[ 3179.276739] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ 3179.277636] #PF: supervisor read access in kernel mode
[ 3179.278470] #PF: error_code(0x0000) - not-present page
[ 3179.279285] PGD 0
[ 3179.280083] Oops: Oops: 0000 [#1] SMP NOPTI
...
[ 3179.283997] Workqueue: events xp_release_deferred
[ 3179.284770] RIP: 0010:idpf_find_rxq_vec+0x17/0x30 [idpf]
...
[ 3179.291937] Call Trace:
[ 3179.292392]  <TASK>
[ 3179.292843]  idpf_qp_switch+0x25/0x820 [idpf]
[ 3179.293325]  idpf_xsk_pool_setup+0x7c/0x520 [idpf]
[ 3179.293803]  idpf_xdp+0x59/0x240 [idpf]
[ 3179.294275]  xp_disable_drv_zc+0x62/0xb0
[ 3179.294743]  xp_clear_dev+0x40/0xb0
[ 3179.295198]  xp_release_deferred+0x1f/0xa0
[ 3179.295648]  process_one_work+0x226/0x730
[ 3179.296106]  worker_thread+0x19e/0x340
[ 3179.296557]  ? __pfx_worker_thread+0x10/0x10
[ 3179.297009]  kthread+0xf4/0x130
[ 3179.297459]  ? __pfx_kthread+0x10/0x10
[ 3179.297910]  ret_from_fork+0x32c/0x410
[ 3179.298361]  ? __pfx_kthread+0x10/0x10
[ 3179.298702]  ret_from_fork_asm+0x1a/0x30

Fix the error handling of the soft reset in idpf_xdp_setup_prog() by
restoring vport->xdp_prog to the old value. This avoids referencing
the orphaned prog that was copied to vport->xdp_prog in the soft reset
and prevents a subsequent false positive from idpf_xdp_enabled().

Update the restart check in idpf_xsk_pool_setup() to use the
IDPF_VPORT_UP bit instead of netif_running(). The idpf_vport_stop/start()
calls do not update the __LINK_STATE_START bit, so netif_running() gives
a false positive should the soft reset fail.

Fixes: 3d57b2c00f09 ("idpf: add XSk pool initialization")
Cc: stable@vger.kernel.org
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Patryk Holda <patryk.holda@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/idpf/xdp.c | 1 +
 drivers/net/ethernet/intel/idpf/xsk.c | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/idpf/xdp.c b/drivers/net/ethernet/intel/idpf/xdp.c
index cbccd4546768..18a6e7062863 100644
--- a/drivers/net/ethernet/intel/idpf/xdp.c
+++ b/drivers/net/ethernet/intel/idpf/xdp.c
@@ -488,6 +488,7 @@ static int idpf_xdp_setup_prog(struct idpf_vport *vport,
 				   "Could not reopen the vport after XDP setup");
 
 		cfg->user_config.xdp_prog = old;
+		vport->xdp_prog = old;
 		old = prog;
 	}
 
diff --git a/drivers/net/ethernet/intel/idpf/xsk.c b/drivers/net/ethernet/intel/idpf/xsk.c
index d95d3efdfd36..3d8c430efd2b 100644
--- a/drivers/net/ethernet/intel/idpf/xsk.c
+++ b/drivers/net/ethernet/intel/idpf/xsk.c
@@ -553,6 +553,7 @@ int idpf_xskrq_poll(struct idpf_rx_queue *rxq, u32 budget)
 
 int idpf_xsk_pool_setup(struct idpf_vport *vport, struct netdev_bpf *bpf)
 {
+	const struct idpf_netdev_priv *np = netdev_priv(vport->netdev);
 	struct xsk_buff_pool *pool = bpf->xsk.pool;
 	u32 qid = bpf->xsk.queue_id;
 	bool restart;
@@ -568,7 +569,8 @@ int idpf_xsk_pool_setup(struct idpf_vport *vport, struct netdev_bpf *bpf)
 		return -EINVAL;
 	}
 
-	restart = idpf_xdp_enabled(vport) && netif_running(vport->netdev);
+	restart = idpf_xdp_enabled(vport) &&
+		  test_bit(IDPF_VPORT_UP, np->state);
 	if (!restart)
 		goto pool;
 

-- 
2.53.0.1066.g1eceb487f285



* [PATCH net 11/13] iavf: fix wrong VLAN mask for legacy Rx descriptors L2TAG2
From: Jacob Keller @ 2026-04-15  5:48 UTC
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: netdev, Jacob Keller, Petr Oros, Aleksandr Loktionov, Paul Menzel,
	Rafal Romanowski
In-Reply-To: <20260414-iwl-net-submission-2026-04-14-v1-0-852f38e7da39@intel.com>

From: Petr Oros <poros@redhat.com>

The IAVF_RXD_LEGACY_L2TAG2_M mask was incorrectly defined as
GENMASK_ULL(63, 32), extracting 32 bits from qw2 instead of the
16-bit VLAN tag. In the legacy Rx descriptor layout, the 2nd L2TAG2
(VLAN tag) occupies bits 63:48 of qw2, not 63:32.

The oversized mask causes FIELD_GET to return a 32-bit value where the
actual VLAN tag sits in bits 31:16. When this value is passed to
iavf_receive_skb() as a u16 parameter, it gets truncated to the lower
16 bits (which contain the 1st L2TAG2, typically zero). As a result,
__vlan_hwaccel_put_tag() is never called and software VLAN interfaces
on VFs receive no traffic.
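
The truncation described above can be reproduced in userspace. The
following is a minimal sketch, not driver code: GENMASK_ULL() and
field_get() are local stand-ins for the kernel macros, and
legacy_l2tag2() and the sample qw2 value are hypothetical names chosen
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK_ULL(h, l): bits h..l set. */
#define GENMASK_ULL(h, l) \
	(((~0ULL) << (l)) & ((~0ULL) >> (63 - (h))))

/* Stand-in for FIELD_GET(): mask the value, then shift the field to bit 0. */
static uint64_t field_get(uint64_t mask, uint64_t val)
{
	assert(mask != 0);
	/* mask & (~mask + 1) isolates the lowest set bit of the mask;
	 * dividing by it is equivalent to shifting right by its position.
	 */
	return (val & mask) / (mask & (~mask + 1));
}

/* Hypothetical helper: mimics handing the extracted field to a function
 * that takes the VLAN tag as a 16-bit parameter.
 */
static uint16_t legacy_l2tag2(uint64_t mask, uint64_t qw2)
{
	return (uint16_t)field_get(mask, qw2);
}
```

For a qw2 of 0x00C6000000000000 (VLAN ID 198 in bits 63:48, bits 47:32
zero), legacy_l2tag2(GENMASK_ULL(63, 32), qw2) evaluates to 0, while
legacy_l2tag2(GENMASK_ULL(63, 48), qw2) evaluates to 198.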

This affects VFs behind ice PF (VIRTCHNL VLAN v2) when the PF
advertises VLAN stripping into L2TAG2_2 and legacy descriptors are
used.

The flex descriptor path already uses the correct mask
(IAVF_RXD_FLEX_L2TAG2_2_M = GENMASK_ULL(63, 48)).

Reproducer:
 1. Create 2 VFs on ice PF (echo 2 > sriov_numvfs)
 2. Disable spoofchk on both VFs
 3. Move each VF into a separate network namespace
 4. On each VF: create VLAN interface (e.g. vlan 198), assign IP,
    bring up
 5. Set rx-vlan-offload OFF on both VFs
 6. Ping between VLAN interfaces -> expect PASS
    (VLAN tag stays in packet data, kernel matches in-band)
 7. Set rx-vlan-offload ON on both VFs
 8. Ping between VLAN interfaces -> expect FAIL if bug present
    (HW strips VLAN tag into descriptor L2TAG2 field, wrong mask
    extracts bits 47:32 instead of 63:48, truncated to u16 -> zero,
    __vlan_hwaccel_put_tag() never called, packet delivered to parent
    interface, not VLAN interface)

The reproducer requires legacy Rx descriptors. On modern ice + iavf
with full PTP support, flex descriptors are always negotiated and the
buggy legacy path is never reached. Flex descriptors require all of:
 - CONFIG_PTP_1588_CLOCK enabled
 - VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC granted by PF
 - PTP capabilities negotiated (VIRTCHNL_VF_CAP_PTP)
 - VIRTCHNL_1588_PTP_CAP_RX_TSTAMP supported
 - VIRTCHNL_RXDID_2_FLEX_SQ_NIC present in DDP profile

If any condition is not met, iavf_select_rx_desc_format() falls back
to legacy descriptors (RXDID=1) and the wrong L2TAG2 mask is hit.

Fixes: 2dc8e7c36d80 ("iavf: refactor iavf_clean_rx_irq to support legacy and flex descriptors")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_type.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_type.h b/drivers/net/ethernet/intel/iavf/iavf_type.h
index 1d8cf29cb65a..5bb1de1cfd33 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_type.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_type.h
@@ -277,7 +277,7 @@ struct iavf_rx_desc {
 /* L2 Tag 2 Presence */
 #define IAVF_RXD_LEGACY_L2TAG2P_M		BIT(0)
 /* Stripped S-TAG VLAN from the receive packet */
-#define IAVF_RXD_LEGACY_L2TAG2_M		GENMASK_ULL(63, 32)
+#define IAVF_RXD_LEGACY_L2TAG2_M		GENMASK_ULL(63, 48)
 /* Stripped S-TAG VLAN from the receive packet */
 #define IAVF_RXD_FLEX_L2TAG2_2_M		GENMASK_ULL(63, 48)
 /* The packet is a UDP tunneled packet */

-- 
2.53.0.1066.g1eceb487f285



* [PATCH net 13/13] e1000e: Unroll PTP in probe error handling
From: Jacob Keller @ 2026-04-15  5:48 UTC
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: netdev, Jacob Keller, Matt Vollrath, Avigail Dahan
In-Reply-To: <20260414-iwl-net-submission-2026-04-14-v1-0-852f38e7da39@intel.com>

From: Matt Vollrath <tactii@gmail.com>

If probe fails after registering the PTP clock and its delayed work,
these resources must be released.

This was not an issue until a 2016 fix moved the e1000e_ptp_init() call
before the jump to err_register.

Fixes: aa524b66c5ef ("e1000e: don't modify SYSTIM registers during SIOCSHWTSTAMP ioctl")
Signed-off-by: Matt Vollrath <tactii@gmail.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 9befdacd6730..7ce0cc8ab8f4 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -7706,6 +7706,7 @@ static int e1000_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 err_register:
 	if (!(adapter->flags & FLAG_HAS_AMT))
 		e1000e_release_hw_control(adapter);
+	e1000e_ptp_remove(adapter);
 err_eeprom:
 	if (hw->phy.ops.check_reset_block && !hw->phy.ops.check_reset_block(hw))
 		e1000_phy_hw_reset(&adapter->hw);

-- 
2.53.0.1066.g1eceb487f285



* [PATCH v3 net] rose: fix OOB reads on short CLEAR REQUEST frames
From: Ashutosh Desai @ 2026-04-15  5:57 UTC
  To: netdev
  Cc: linux-hams, davem, edumazet, kuba, pabeni, horms, stable,
	linux-kernel, Ashutosh Desai

rose_process_rx_frame() calls rose_decode() which reads skb->data[2]
without any prior length check. For CLEAR REQUEST frames the state
machines then read skb->data[3] and skb->data[4] as the cause and
diagnostic bytes.

A crafted 3-byte ROSE CLEAR REQUEST frame passes the minimum length
gate in rose_route_frame() and reaches rose_process_rx_frame(), where
rose_decode() reads one byte past the header and the state machines
read two bytes past the valid buffer. A remote peer can exploit this
to leak kernel memory contents or trigger a kernel panic.

Add a pskb_may_pull(skb, 3) check before rose_decode() to cover its
skb->data[2] access, and a pskb_may_pull(skb, 5) check afterwards for
the CLEAR REQUEST path to cover the cause and diagnostic reads.
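
The two-stage length check can be modelled on a plain byte buffer. This
is only an illustrative sketch: parse_clear_request() is a hypothetical
helper, the CLEAR_REQUEST value is assumed for the example, and a simple
length comparison stands in for pskb_may_pull(), which in the kernel also
handles non-linear skb data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLEAR_REQUEST 0x13	/* frame type byte; value assumed for illustration */

/* Hypothetical helper: returns 1 and fills cause/diag only when the frame
 * is a CLEAR REQUEST that is long enough to carry both bytes; returns 0
 * otherwise, without reading past len.
 */
static int parse_clear_request(const uint8_t *data, size_t len,
			       uint8_t *cause, uint8_t *diag)
{
	assert(data != NULL && cause != NULL && diag != NULL);

	if (len < 3)			/* data[2] holds the frame type */
		return 0;
	if (data[2] != CLEAR_REQUEST)
		return 0;
	if (len < 5)			/* data[3] and data[4] must exist */
		return 0;

	*cause = data[3];
	*diag = data[4];
	return 1;
}
```

In this model a 3-byte CLEAR REQUEST frame is rejected by the second
check, so the cause and diagnostic bytes are never read from beyond the
buffer.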

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Ashutosh Desai <ashutoshdesai993@gmail.com>
---
V2 -> V3: drop kfree_skb() calls to fix double-free; add end-user
          visible symptom to commit log; use [net] subject prefix
V1 -> V2: switch skb->len check to pskb_may_pull; add pskb_may_pull(skb, 3)
          before rose_decode() to cover its skb->data[2] access

v2: https://lore.kernel.org/netdev/177614667427.3606651.8700070406932922261@gmail.com/
v1: https://lore.kernel.org/netdev/20260409013246.2051746-1-ashutoshdesai993@gmail.com/

 net/rose/rose_in.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/rose/rose_in.c b/net/rose/rose_in.c
index 0276b393f0e5..8e60dc562b4a 100644
--- a/net/rose/rose_in.c
+++ b/net/rose/rose_in.c
@@ -269,8 +269,14 @@ int rose_process_rx_frame(struct sock *sk, struct sk_buff *skb)
 	if (rose->state == ROSE_STATE_0)
 		return 0;
 
+	if (!pskb_may_pull(skb, 3))
+		return 0;
+
 	frametype = rose_decode(skb, &ns, &nr, &q, &d, &m);
 
+	if (frametype == ROSE_CLEAR_REQUEST && !pskb_may_pull(skb, 5))
+		return 0;
+
 	switch (rose->state) {
 	case ROSE_STATE_1:
 		queued = rose_state1_machine(sk, skb, frametype);
-- 
2.34.1



* Re: [PATCH v2] rose: fix OOB reads on short CLEAR REQUEST frames
From: Ashutosh Desai @ 2026-04-15  6:03 UTC
  To: edumazet; +Cc: netdev, linux-hams, davem, kuba, pabeni, horms, linux-kernel
In-Reply-To: <CANn89iLXG5ZMNeHkcLW+Ug9PxNnw_EKtpGVkPw1qeXEjJNtA0g@mail.gmail.com>


Hi Eric,

On Mon, Apr 13, 2026 at 11:11 PM Eric Dumazet wrote:
> rose_process_rx_frame() callers already call kfree_skb(skb) if
> rose_process_rx_frame() returns a 0. Your patch would add double-frees.
>
> Your patch is white-space mangled.

Thanks for the review. Sent v3 with all issues addressed.

Best regards,
Ashutosh


* [PATCH v2 net 0/2] net: enetc: fix command BD ring issues
From: Wei Fang @ 2026-04-15  6:08 UTC
  To: claudiu.manoil, vladimir.oltean, xiaoning.wang, andrew+netdev,
	davem, edumazet, kuba, pabeni, chleroy
  Cc: netdev, linux-kernel, imx, linuxppc-dev, linux-arm-kernel

The current command BD ring implementation has two issues. First, the
driver may obtain a wrong consumer index for the ring, because it does
not mask out the SBE bit of the CIR value, so the index is wrong whenever
an SBE error occurs. Second, the DMA buffer may be used after it is
freed. If netc_xmit_ntmp_cmd() times out and returns an error, the
pending command is not explicitly aborted, while ntmp_free_data_mem()
unconditionally frees the DMA buffer. The hardware eventually processes
the pending command and performs a DMA write of the response to the
physical address of the freed buffer, so if the buffer has already been
reallocated elsewhere, this may lead to silent memory corruption. This
patch set fixes both issues.

---
v2:
1. Check the SBE bit in netc_xmit_ntmp_cmd().
2. Fix DMA buffer leak issue when netc_xmit_ntmp_cmd returns -EBUSY.
3. Check swcbd->buf in ntmp_free_data_mem().
4. Move ring_lock ownership to the caller to ensure the response buffer
   cannot be reclaimed prematurely. Add the helpers ntmp_unlock_cbdr()
   and ntmp_select_and_lock_cbdr() for this purpose.
---

Wei Fang (2):
  net: enetc: correct the command BD ring consumer index
  net: enetc: fix NTMP DMA use-after-free issue

 drivers/net/ethernet/freescale/enetc/ntmp.c   | 217 +++++++++++-------
 .../ethernet/freescale/enetc/ntmp_private.h   |  10 +-
 include/linux/fsl/ntmp.h                      |   9 +-
 3 files changed, 141 insertions(+), 95 deletions(-)

-- 
2.34.1



* [PATCH v2 net 1/2] net: enetc: correct the command BD ring consumer index
From: Wei Fang @ 2026-04-15  6:08 UTC
  To: claudiu.manoil, vladimir.oltean, xiaoning.wang, andrew+netdev,
	davem, edumazet, kuba, pabeni, chleroy
  Cc: netdev, linux-kernel, imx, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20260415060833.2303846-1-wei.fang@nxp.com>

The command BD ring consumer index register holds the consumer index in
its lower 10 bits, and bit 31 is SBE, which indicates whether a system
bus error occurred during execution of the CBD command. So if a system
bus error occurs, the value read from the register has the SBE bit set.

However, the current implementation directly uses the register value as
the consumer index without masking it. Therefore, if a system bus error
occurs, an incorrect consumer index will be obtained, causing errors in
the processing of the command BD ring. Thus, we need to mask out the
other bits to obtain the correct consumer index.

In addition, this patch adds a check for the SBE bit after the polling
loop and returns an error if the bit is set.
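
The effect of the missing mask can be shown with a small userspace
sketch. The two constants mirror the field layout described above (index
in bits 9:0, SBE in bit 31); cir_index(), cir_has_sbe() and cmd_done()
are hypothetical helpers for illustration, not functions from the
driver.

```c
#include <assert.h>
#include <stdint.h>

#define CBDRCIR_INDEX	0x3ffu		/* bits 9:0, consumer index */
#define CBDRCIR_SBE	(1u << 31)	/* bit 31, system bus error */

/* Extract the consumer index from a raw CIR register value. */
static uint32_t cir_index(uint32_t cir)
{
	return cir & CBDRCIR_INDEX;
}

/* Report whether the raw CIR value has the SBE bit set. */
static int cir_has_sbe(uint32_t cir)
{
	return (cir & CBDRCIR_SBE) != 0;
}

/* Hypothetical completion test: the command is done once the masked
 * consumer index has caught up with the expected producer index.
 */
static int cmd_done(uint32_t cir, uint32_t expected)
{
	assert(expected <= CBDRCIR_INDEX);
	return cir_index(cir) == expected;
}
```

With a raw register value of 0x80000005, a direct comparison against
index 5 fails, whereas cmd_done(0x80000005, 5) succeeds and
cir_has_sbe(0x80000005) reports the bus error separately.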

Fixes: 4701073c3deb ("net: enetc: add initial netc-lib driver to support NTMP")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
---
 drivers/net/ethernet/freescale/enetc/ntmp.c         | 13 ++++++++++---
 drivers/net/ethernet/freescale/enetc/ntmp_private.h |  2 ++
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/ntmp.c b/drivers/net/ethernet/freescale/enetc/ntmp.c
index 0c1d343253bf..b188eb2d40c0 100644
--- a/drivers/net/ethernet/freescale/enetc/ntmp.c
+++ b/drivers/net/ethernet/freescale/enetc/ntmp.c
@@ -55,7 +55,7 @@ int ntmp_init_cbdr(struct netc_cbdr *cbdr, struct device *dev,
 	spin_lock_init(&cbdr->ring_lock);
 
 	cbdr->next_to_use = netc_read(cbdr->regs.pir);
-	cbdr->next_to_clean = netc_read(cbdr->regs.cir);
+	cbdr->next_to_clean = netc_read(cbdr->regs.cir) & NETC_CBDRCIR_INDEX;
 
 	/* Step 1: Configure the base address of the Control BD Ring */
 	netc_write(cbdr->regs.bar0, lower_32_bits(cbdr->dma_base_align));
@@ -98,7 +98,7 @@ static void ntmp_clean_cbdr(struct netc_cbdr *cbdr)
 	int i;
 
 	i = cbdr->next_to_clean;
-	while (netc_read(cbdr->regs.cir) != i) {
+	while ((netc_read(cbdr->regs.cir) & NETC_CBDRCIR_INDEX) != i) {
 		cbd = ntmp_get_cbd(cbdr, i);
 		memset(cbd, 0, sizeof(*cbd));
 		i = (i + 1) % cbdr->bd_num;
@@ -135,12 +135,19 @@ static int netc_xmit_ntmp_cmd(struct ntmp_user *user, union netc_cbd *cbd)
 	cbdr->next_to_use = i;
 	netc_write(cbdr->regs.pir, i);
 
-	err = read_poll_timeout_atomic(netc_read, val, val == i,
+	err = read_poll_timeout_atomic(netc_read, val,
+				       (val & NETC_CBDRCIR_INDEX) == i,
 				       NETC_CBDR_DELAY_US, NETC_CBDR_TIMEOUT,
 				       true, cbdr->regs.cir);
 	if (unlikely(err))
 		goto cbdr_unlock;
 
+	if (unlikely(val & NETC_CBDRCIR_SBE)) {
+		dev_err(user->dev, "Command BD system bus error\n");
+		err = -EIO;
+		goto cbdr_unlock;
+	}
+
 	dma_rmb();
 	/* Get the writeback command BD, because the caller may need
 	 * to check some other fields of the response header.
diff --git a/drivers/net/ethernet/freescale/enetc/ntmp_private.h b/drivers/net/ethernet/freescale/enetc/ntmp_private.h
index 34394e40fddd..3459cc45b610 100644
--- a/drivers/net/ethernet/freescale/enetc/ntmp_private.h
+++ b/drivers/net/ethernet/freescale/enetc/ntmp_private.h
@@ -12,6 +12,8 @@
 
 #define NTMP_EID_REQ_LEN	8
 #define NETC_CBDR_BD_NUM	256
+#define NETC_CBDRCIR_INDEX	GENMASK(9, 0)
+#define NETC_CBDRCIR_SBE	BIT(31)
 
 union netc_cbd {
 	struct {
-- 
2.34.1



* [PATCH v2 net 2/2] net: enetc: fix NTMP DMA use-after-free issue
From: Wei Fang @ 2026-04-15  6:08 UTC
  To: claudiu.manoil, vladimir.oltean, xiaoning.wang, andrew+netdev,
	davem, edumazet, kuba, pabeni, chleroy
  Cc: netdev, linux-kernel, imx, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20260415060833.2303846-1-wei.fang@nxp.com>

The AI-generated review reported a potential DMA use-after-free issue
[1]. If netc_xmit_ntmp_cmd() times out and returns an error, the pending
command is not explicitly aborted, while ntmp_free_data_mem()
unconditionally frees the DMA buffer. The hardware eventually processes
the pending command and performs a DMA write of the response to the
physical address of the freed buffer, so if the buffer has already been
reallocated elsewhere, this may lead to silent memory corruption.

To resolve this issue, this patch makes the following changes:

1. Convert cbdr->ring_lock from a spinlock to a mutex

The lock was originally a spinlock in case NTMP operations might be
invoked from atomic context. With all NTMP tables now supported
downstream, no such usage has materialized. A mutex is now required
because the driver needs to reclaim used BDs and release the associated
DMA memory while holding the lock, and dma_free_coherent() might sleep.

2. Introduce software command BD (struct netc_swcbd)

The hardware write-back overwrites the addr and len fields of the BD,
so the driver cannot rely on the hardware BD to free the associated DMA
memory. The driver now maintains a software shadow BD that stores the
DMA buffer pointer, DMA address, and size, which allows the DMA memory
to be released correctly. netc_xmit_ntmp_cmd() reclaims older BDs only
when the number of used BDs reaches NETC_CBDR_CLEAN_WORK (16). With
this, struct ntmp_dma_buf and the old ntmp_free_data_mem() are no
longer needed and are removed.

3. Require callers to hold ring_lock across netc_xmit_ntmp_cmd()

Currently, netc_xmit_ntmp_cmd() releases ring_lock before the caller
finishes consuming the response. At this point, if a concurrent thread
submits a new command, it may trigger ntmp_clean_cbdr() and free the DMA
buffer while it is still in use. Move ring_lock ownership to the caller
to ensure the response buffer cannot be reclaimed prematurely. The
helpers ntmp_select_and_lock_cbdr() and ntmp_unlock_cbdr() are added for
this purpose.

These changes eliminate the DMA use-after-free condition and ensure safe
and consistent BD reclamation and DMA buffer lifecycle management.

Fixes: 4701073c3deb ("net: enetc: add initial netc-lib driver to support NTMP")
Link: https://lore.kernel.org/netdev/20260403011729.1795413-1-kuba@kernel.org/ # [1]
Signed-off-by: Wei Fang <wei.fang@nxp.com>
---
 drivers/net/ethernet/freescale/enetc/ntmp.c   | 214 ++++++++++--------
 .../ethernet/freescale/enetc/ntmp_private.h   |   8 +-
 include/linux/fsl/ntmp.h                      |   9 +-
 3 files changed, 134 insertions(+), 97 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/ntmp.c b/drivers/net/ethernet/freescale/enetc/ntmp.c
index b188eb2d40c0..70bbc5d2d5d4 100644
--- a/drivers/net/ethernet/freescale/enetc/ntmp.c
+++ b/drivers/net/ethernet/freescale/enetc/ntmp.c
@@ -7,6 +7,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/fsl/netc_global.h>
 #include <linux/iopoll.h>
+#include <linux/vmalloc.h>
 
 #include "ntmp_private.h"
 
@@ -42,6 +43,12 @@ int ntmp_init_cbdr(struct netc_cbdr *cbdr, struct device *dev,
 	if (!cbdr->addr_base)
 		return -ENOMEM;
 
+	cbdr->swcbd = vcalloc(cbd_num, sizeof(struct netc_swcbd));
+	if (!cbdr->swcbd) {
+		dma_free_coherent(dev, size, cbdr->addr_base, cbdr->dma_base);
+		return -ENOMEM;
+	}
+
 	cbdr->dma_size = size;
 	cbdr->bd_num = cbd_num;
 	cbdr->regs = *regs;
@@ -52,7 +59,7 @@ int ntmp_init_cbdr(struct netc_cbdr *cbdr, struct device *dev,
 	cbdr->addr_base_align = PTR_ALIGN(cbdr->addr_base,
 					  NTMP_BASE_ADDR_ALIGN);
 
-	spin_lock_init(&cbdr->ring_lock);
+	mutex_init(&cbdr->ring_lock);
 
 	cbdr->next_to_use = netc_read(cbdr->regs.pir);
 	cbdr->next_to_clean = netc_read(cbdr->regs.cir) & NETC_CBDRCIR_INDEX;
@@ -71,10 +78,24 @@ int ntmp_init_cbdr(struct netc_cbdr *cbdr, struct device *dev,
 }
 EXPORT_SYMBOL_GPL(ntmp_init_cbdr);
 
+static void ntmp_free_data_mem(struct device *dev, struct netc_swcbd *swcbd)
+{
+	if (unlikely(!swcbd->buf))
+		return;
+
+	dma_free_coherent(dev, swcbd->size + NTMP_DATA_ADDR_ALIGN,
+			  swcbd->buf, swcbd->dma);
+}
+
 void ntmp_free_cbdr(struct netc_cbdr *cbdr)
 {
 	/* Disable the Control BD Ring */
 	netc_write(cbdr->regs.mr, 0);
+
+	for (int i = 0; i < cbdr->bd_num; i++)
+		ntmp_free_data_mem(cbdr->dev, &cbdr->swcbd[i]);
+
+	vfree(cbdr->swcbd);
 	dma_free_coherent(cbdr->dev, cbdr->dma_size, cbdr->addr_base,
 			  cbdr->dma_base);
 	memset(cbdr, 0, sizeof(*cbdr));
@@ -94,40 +115,59 @@ static union netc_cbd *ntmp_get_cbd(struct netc_cbdr *cbdr, int index)
 
 static void ntmp_clean_cbdr(struct netc_cbdr *cbdr)
 {
-	union netc_cbd *cbd;
-	int i;
+	int i = cbdr->next_to_clean;
 
-	i = cbdr->next_to_clean;
 	while ((netc_read(cbdr->regs.cir) & NETC_CBDRCIR_INDEX) != i) {
-		cbd = ntmp_get_cbd(cbdr, i);
+		union netc_cbd *cbd = ntmp_get_cbd(cbdr, i);
+		struct netc_swcbd *swcbd = &cbdr->swcbd[i];
+
+		ntmp_free_data_mem(cbdr->dev, swcbd);
+		memset(swcbd, 0, sizeof(*swcbd));
 		memset(cbd, 0, sizeof(*cbd));
 		i = (i + 1) % cbdr->bd_num;
 	}
 
+	dma_wmb();
 	cbdr->next_to_clean = i;
 }
 
-static int netc_xmit_ntmp_cmd(struct ntmp_user *user, union netc_cbd *cbd)
+static void ntmp_select_and_lock_cbdr(struct ntmp_user *user,
+				      struct netc_cbdr **cbdr)
+{
+	/* Currently only ENETC is supported, and it has only one command
+	 * BD ring.
+	 */
+	*cbdr = &user->ring[0];
+
+	mutex_lock(&(*cbdr)->ring_lock);
+}
+
+static void ntmp_unlock_cbdr(struct netc_cbdr *cbdr)
+{
+	mutex_unlock(&cbdr->ring_lock);
+}
+
+static int netc_xmit_ntmp_cmd(struct netc_cbdr *cbdr, union netc_cbd *cbd,
+			      struct netc_swcbd *swcbd)
 {
 	union netc_cbd *cur_cbd;
-	struct netc_cbdr *cbdr;
-	int i, err;
+	int i, err, used_bds;
 	u16 status;
 	u32 val;
 
-	/* Currently only i.MX95 ENETC is supported, and it only has one
-	 * command BD ring
-	 */
-	cbdr = &user->ring[0];
-
-	spin_lock_bh(&cbdr->ring_lock);
-
-	if (unlikely(!ntmp_get_free_cbd_num(cbdr)))
+	used_bds = cbdr->bd_num - ntmp_get_free_cbd_num(cbdr);
+	if (unlikely(used_bds >= NETC_CBDR_CLEAN_WORK)) {
 		ntmp_clean_cbdr(cbdr);
+		if (unlikely(!ntmp_get_free_cbd_num(cbdr))) {
+			ntmp_free_data_mem(cbdr->dev, swcbd);
+			return -EBUSY;
+		}
+	}
 
 	i = cbdr->next_to_use;
 	cur_cbd = ntmp_get_cbd(cbdr, i);
 	*cur_cbd = *cbd;
+	cbdr->swcbd[i] = *swcbd;
 	dma_wmb();
 
 	/* Update producer index of both software and hardware */
@@ -135,17 +175,16 @@ static int netc_xmit_ntmp_cmd(struct ntmp_user *user, union netc_cbd *cbd)
 	cbdr->next_to_use = i;
 	netc_write(cbdr->regs.pir, i);
 
-	err = read_poll_timeout_atomic(netc_read, val,
-				       (val & NETC_CBDRCIR_INDEX) == i,
-				       NETC_CBDR_DELAY_US, NETC_CBDR_TIMEOUT,
-				       true, cbdr->regs.cir);
+	err = read_poll_timeout(netc_read, val,
+				(val & NETC_CBDRCIR_INDEX) == i,
+				NETC_CBDR_DELAY_US, NETC_CBDR_TIMEOUT,
+				true, cbdr->regs.cir);
 	if (unlikely(err))
-		goto cbdr_unlock;
+		return err;
 
 	if (unlikely(val & NETC_CBDRCIR_SBE)) {
-		dev_err(user->dev, "Command BD system bus error\n");
-		err = -EIO;
-		goto cbdr_unlock;
+		dev_err(cbdr->dev, "Command BD system bus error\n");
+		return -EIO;
 	}
 
 	dma_rmb();
@@ -157,40 +196,29 @@ static int netc_xmit_ntmp_cmd(struct ntmp_user *user, union netc_cbd *cbd)
 	/* Check the writeback error status */
 	status = le16_to_cpu(cbd->resp_hdr.error_rr) & NTMP_RESP_ERROR;
 	if (unlikely(status)) {
-		err = -EIO;
-		dev_err(user->dev, "Command BD error: 0x%04x\n", status);
+		dev_err(cbdr->dev, "Command BD error: 0x%04x\n", status);
+		return -EIO;
 	}
 
-	ntmp_clean_cbdr(cbdr);
-	dma_wmb();
-
-cbdr_unlock:
-	spin_unlock_bh(&cbdr->ring_lock);
-
-	return err;
+	return 0;
 }
 
-static int ntmp_alloc_data_mem(struct ntmp_dma_buf *data, void **buf_align)
+static int ntmp_alloc_data_mem(struct device *dev, struct netc_swcbd *swcbd,
+			       void **buf_align)
 {
 	void *buf;
 
-	buf = dma_alloc_coherent(data->dev, data->size + NTMP_DATA_ADDR_ALIGN,
-				 &data->dma, GFP_KERNEL);
+	buf = dma_alloc_coherent(dev, swcbd->size + NTMP_DATA_ADDR_ALIGN,
+				 &swcbd->dma, GFP_KERNEL);
 	if (!buf)
 		return -ENOMEM;
 
-	data->buf = buf;
+	swcbd->buf = buf;
 	*buf_align = PTR_ALIGN(buf, NTMP_DATA_ADDR_ALIGN);
 
 	return 0;
 }
 
-static void ntmp_free_data_mem(struct ntmp_dma_buf *data)
-{
-	dma_free_coherent(data->dev, data->size + NTMP_DATA_ADDR_ALIGN,
-			  data->buf, data->dma);
-}
-
 static void ntmp_fill_request_hdr(union netc_cbd *cbd, dma_addr_t dma,
 				  int len, int table_id, int cmd,
 				  int access_method)
@@ -241,37 +269,39 @@ static int ntmp_delete_entry_by_id(struct ntmp_user *user, int tbl_id,
 				   u8 tbl_ver, u32 entry_id, u32 req_len,
 				   u32 resp_len)
 {
-	struct ntmp_dma_buf data = {
-		.dev = user->dev,
+	struct netc_swcbd swcbd = {
 		.size = max(req_len, resp_len),
 	};
 	struct ntmp_req_by_eid *req;
+	struct netc_cbdr *cbdr;
 	union netc_cbd cbd;
 	int err;
 
-	err = ntmp_alloc_data_mem(&data, (void **)&req);
+	err = ntmp_alloc_data_mem(user->dev, &swcbd, (void **)&req);
 	if (err)
 		return err;
 
 	ntmp_fill_crd_eid(req, tbl_ver, 0, 0, entry_id);
-	ntmp_fill_request_hdr(&cbd, data.dma, NTMP_LEN(req_len, resp_len),
+	ntmp_fill_request_hdr(&cbd, swcbd.dma, NTMP_LEN(req_len, resp_len),
 			      tbl_id, NTMP_CMD_DELETE, NTMP_AM_ENTRY_ID);
 
-	err = netc_xmit_ntmp_cmd(user, &cbd);
+	ntmp_select_and_lock_cbdr(user, &cbdr);
+	err = netc_xmit_ntmp_cmd(cbdr, &cbd, &swcbd);
 	if (err)
 		dev_err(user->dev,
 			"Failed to delete entry 0x%x of %s, err: %pe",
 			entry_id, ntmp_table_name(tbl_id), ERR_PTR(err));
-
-	ntmp_free_data_mem(&data);
+	ntmp_unlock_cbdr(cbdr);
 
 	return err;
 }
 
-static int ntmp_query_entry_by_id(struct ntmp_user *user, int tbl_id,
-				  u32 len, struct ntmp_req_by_eid *req,
-				  dma_addr_t dma, bool compare_eid)
+static int ntmp_query_entry_by_id(struct netc_cbdr *cbdr, int tbl_id,
+				  struct ntmp_req_by_eid *req,
+				  struct netc_swcbd *swcbd,
+				  bool compare_eid)
 {
+	u32 len = NTMP_LEN(sizeof(*req), swcbd->size);
 	struct ntmp_cmn_resp_query *resp;
 	int cmd = NTMP_CMD_QUERY;
 	union netc_cbd cbd;
@@ -283,10 +313,11 @@ static int ntmp_query_entry_by_id(struct ntmp_user *user, int tbl_id,
 		cmd = NTMP_CMD_QU;
 
 	/* Request header */
-	ntmp_fill_request_hdr(&cbd, dma, len, tbl_id, cmd, NTMP_AM_ENTRY_ID);
-	err = netc_xmit_ntmp_cmd(user, &cbd);
+	ntmp_fill_request_hdr(&cbd, swcbd->dma, len, tbl_id, cmd,
+			      NTMP_AM_ENTRY_ID);
+	err = netc_xmit_ntmp_cmd(cbdr, &cbd, swcbd);
 	if (err) {
-		dev_err(user->dev,
+		dev_err(cbdr->dev,
 			"Failed to query entry 0x%x of %s, err: %pe\n",
 			entry_id, ntmp_table_name(tbl_id), ERR_PTR(err));
 		return err;
@@ -300,7 +331,7 @@ static int ntmp_query_entry_by_id(struct ntmp_user *user, int tbl_id,
 
 	resp = (struct ntmp_cmn_resp_query *)req;
 	if (unlikely(le32_to_cpu(resp->entry_id) != entry_id)) {
-		dev_err(user->dev,
+		dev_err(cbdr->dev,
 			"%s: query EID 0x%x doesn't match response EID 0x%x\n",
 			ntmp_table_name(tbl_id), entry_id, le32_to_cpu(resp->entry_id));
 		return -EIO;
@@ -312,15 +343,15 @@ static int ntmp_query_entry_by_id(struct ntmp_user *user, int tbl_id,
 int ntmp_maft_add_entry(struct ntmp_user *user, u32 entry_id,
 			struct maft_entry_data *maft)
 {
-	struct ntmp_dma_buf data = {
-		.dev = user->dev,
+	struct netc_swcbd swcbd = {
 		.size = sizeof(struct maft_req_add),
 	};
 	struct maft_req_add *req;
+	struct netc_cbdr *cbdr;
 	union netc_cbd cbd;
 	int err;
 
-	err = ntmp_alloc_data_mem(&data, (void **)&req);
+	err = ntmp_alloc_data_mem(user->dev, &swcbd, (void **)&req);
 	if (err)
 		return err;
 
@@ -329,14 +360,15 @@ int ntmp_maft_add_entry(struct ntmp_user *user, u32 entry_id,
 	req->keye = maft->keye;
 	req->cfge = maft->cfge;
 
-	ntmp_fill_request_hdr(&cbd, data.dma, NTMP_LEN(data.size, 0),
+	ntmp_fill_request_hdr(&cbd, swcbd.dma, NTMP_LEN(swcbd.size, 0),
 			      NTMP_MAFT_ID, NTMP_CMD_ADD, NTMP_AM_ENTRY_ID);
-	err = netc_xmit_ntmp_cmd(user, &cbd);
+
+	ntmp_select_and_lock_cbdr(user, &cbdr);
+	err = netc_xmit_ntmp_cmd(cbdr, &cbd, &swcbd);
 	if (err)
 		dev_err(user->dev, "Failed to add MAFT entry 0x%x, err: %pe\n",
 			entry_id, ERR_PTR(err));
-
-	ntmp_free_data_mem(&data);
+	ntmp_unlock_cbdr(cbdr);
 
 	return err;
 }
@@ -345,31 +377,31 @@ EXPORT_SYMBOL_GPL(ntmp_maft_add_entry);
 int ntmp_maft_query_entry(struct ntmp_user *user, u32 entry_id,
 			  struct maft_entry_data *maft)
 {
-	struct ntmp_dma_buf data = {
-		.dev = user->dev,
+	struct netc_swcbd swcbd = {
 		.size = sizeof(struct maft_resp_query),
 	};
 	struct maft_resp_query *resp;
 	struct ntmp_req_by_eid *req;
+	struct netc_cbdr *cbdr;
 	int err;
 
-	err = ntmp_alloc_data_mem(&data, (void **)&req);
+	err = ntmp_alloc_data_mem(user->dev, &swcbd, (void **)&req);
 	if (err)
 		return err;
 
 	ntmp_fill_crd_eid(req, user->tbl.maft_ver, 0, 0, entry_id);
-	err = ntmp_query_entry_by_id(user, NTMP_MAFT_ID,
-				     NTMP_LEN(sizeof(*req), data.size),
-				     req, data.dma, true);
+
+	ntmp_select_and_lock_cbdr(user, &cbdr);
+	err = ntmp_query_entry_by_id(cbdr, NTMP_MAFT_ID, req, &swcbd, true);
 	if (err)
-		goto end;
+		goto unlock_cbdr;
 
 	resp = (struct maft_resp_query *)req;
 	maft->keye = resp->keye;
 	maft->cfge = resp->cfge;
 
-end:
-	ntmp_free_data_mem(&data);
+unlock_cbdr:
+	ntmp_unlock_cbdr(cbdr);
 
 	return err;
 }
@@ -385,8 +417,9 @@ EXPORT_SYMBOL_GPL(ntmp_maft_delete_entry);
 int ntmp_rsst_update_entry(struct ntmp_user *user, const u32 *table,
 			   int count)
 {
-	struct ntmp_dma_buf data = {.dev = user->dev};
 	struct rsst_req_update *req;
+	struct netc_swcbd swcbd;
+	struct netc_cbdr *cbdr;
 	union netc_cbd cbd;
 	int err, i;
 
@@ -394,8 +427,8 @@ int ntmp_rsst_update_entry(struct ntmp_user *user, const u32 *table,
 		/* HW only takes in a full 64 entry table */
 		return -EINVAL;
 
-	data.size = struct_size(req, groups, count);
-	err = ntmp_alloc_data_mem(&data, (void **)&req);
+	swcbd.size = struct_size(req, groups, count);
+	err = ntmp_alloc_data_mem(user->dev, &swcbd, (void **)&req);
 	if (err)
 		return err;
 
@@ -405,15 +438,15 @@ int ntmp_rsst_update_entry(struct ntmp_user *user, const u32 *table,
 	for (i = 0; i < count; i++)
 		req->groups[i] = (u8)(table[i]);
 
-	ntmp_fill_request_hdr(&cbd, data.dma, NTMP_LEN(data.size, 0),
+	ntmp_fill_request_hdr(&cbd, swcbd.dma, NTMP_LEN(swcbd.size, 0),
 			      NTMP_RSST_ID, NTMP_CMD_UPDATE, NTMP_AM_ENTRY_ID);
 
-	err = netc_xmit_ntmp_cmd(user, &cbd);
+	ntmp_select_and_lock_cbdr(user, &cbdr);
+	err = netc_xmit_ntmp_cmd(cbdr, &cbd, &swcbd);
 	if (err)
 		dev_err(user->dev, "Failed to update RSST entry, err: %pe\n",
 			ERR_PTR(err));
-
-	ntmp_free_data_mem(&data);
+	ntmp_unlock_cbdr(cbdr);
 
 	return err;
 }
@@ -421,8 +454,9 @@ EXPORT_SYMBOL_GPL(ntmp_rsst_update_entry);
 
 int ntmp_rsst_query_entry(struct ntmp_user *user, u32 *table, int count)
 {
-	struct ntmp_dma_buf data = {.dev = user->dev};
 	struct ntmp_req_by_eid *req;
+	struct netc_swcbd swcbd;
+	struct netc_cbdr *cbdr;
 	union netc_cbd cbd;
 	int err, i;
 	u8 *group;
@@ -431,21 +465,23 @@ int ntmp_rsst_query_entry(struct ntmp_user *user, u32 *table, int count)
 		/* HW only takes in a full 64 entry table */
 		return -EINVAL;
 
-	data.size = NTMP_ENTRY_ID_SIZE + RSST_STSE_DATA_SIZE(count) +
-		    RSST_CFGE_DATA_SIZE(count);
-	err = ntmp_alloc_data_mem(&data, (void **)&req);
+	swcbd.size = NTMP_ENTRY_ID_SIZE + RSST_STSE_DATA_SIZE(count) +
+		     RSST_CFGE_DATA_SIZE(count);
+	err = ntmp_alloc_data_mem(user->dev, &swcbd, (void **)&req);
 	if (err)
 		return err;
 
 	/* Set the request data buffer */
 	ntmp_fill_crd_eid(req, user->tbl.rsst_ver, 0, 0, 0);
-	ntmp_fill_request_hdr(&cbd, data.dma, NTMP_LEN(sizeof(*req), data.size),
+	ntmp_fill_request_hdr(&cbd, swcbd.dma, NTMP_LEN(sizeof(*req), swcbd.size),
 			      NTMP_RSST_ID, NTMP_CMD_QUERY, NTMP_AM_ENTRY_ID);
-	err = netc_xmit_ntmp_cmd(user, &cbd);
+
+	ntmp_select_and_lock_cbdr(user, &cbdr);
+	err = netc_xmit_ntmp_cmd(cbdr, &cbd, &swcbd);
 	if (err) {
 		dev_err(user->dev, "Failed to query RSST entry, err: %pe\n",
 			ERR_PTR(err));
-		goto end;
+		goto unlock_cbdr;
 	}
 
 	group = (u8 *)req;
@@ -453,8 +489,8 @@ int ntmp_rsst_query_entry(struct ntmp_user *user, u32 *table, int count)
 	for (i = 0; i < count; i++)
 		table[i] = group[i];
 
-end:
-	ntmp_free_data_mem(&data);
+unlock_cbdr:
+	ntmp_unlock_cbdr(cbdr);
 
 	return err;
 }
diff --git a/drivers/net/ethernet/freescale/enetc/ntmp_private.h b/drivers/net/ethernet/freescale/enetc/ntmp_private.h
index 3459cc45b610..f8dff3ba2c28 100644
--- a/drivers/net/ethernet/freescale/enetc/ntmp_private.h
+++ b/drivers/net/ethernet/freescale/enetc/ntmp_private.h
@@ -14,6 +14,7 @@
 #define NETC_CBDR_BD_NUM	256
 #define NETC_CBDRCIR_INDEX	GENMASK(9, 0)
 #define NETC_CBDRCIR_SBE	BIT(31)
+#define NETC_CBDR_CLEAN_WORK	16
 
 union netc_cbd {
 	struct {
@@ -56,13 +57,6 @@ union netc_cbd {
 	} resp_hdr; /* NTMP Response Message Header Format */
 };
 
-struct ntmp_dma_buf {
-	struct device *dev;
-	size_t size;
-	void *buf;
-	dma_addr_t dma;
-};
-
 struct ntmp_cmn_req_data {
 	__le16 update_act;
 	u8 dbg_opt;
diff --git a/include/linux/fsl/ntmp.h b/include/linux/fsl/ntmp.h
index 916dc4fe7de3..83a449b4d6ec 100644
--- a/include/linux/fsl/ntmp.h
+++ b/include/linux/fsl/ntmp.h
@@ -31,6 +31,12 @@ struct netc_tbl_vers {
 	u8 rsst_ver;
 };
 
+struct netc_swcbd {
+	void *buf;
+	dma_addr_t dma;
+	size_t size;
+};
+
 struct netc_cbdr {
 	struct device *dev;
 	struct netc_cbdr_regs regs;
@@ -44,9 +50,10 @@ struct netc_cbdr {
 	void *addr_base_align;
 	dma_addr_t dma_base;
 	dma_addr_t dma_base_align;
+	struct netc_swcbd *swcbd;
 
 	/* Serialize the order of command BD ring */
-	spinlock_t ring_lock;
+	struct mutex ring_lock;
 };
 
 struct ntmp_user {
-- 
2.34.1



* Re: [PATCH net-next 5/6] net: stmmac: move PHY handling out of __stmmac_open()/release()
From: Alexander Stein @ 2026-04-15  6:08 UTC (permalink / raw)
  To: Andrew Lunn, Heiner Kallweit, Russell King (Oracle)
  Cc: Alexandre Torgue, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, linux-arm-kernel, linux-stm32, Maxime Coquelin,
	netdev, Paolo Abeni
In-Reply-To: <E1v11A3-0000000774G-3PKY@rmk-PC.armlinux.org.uk>

Hi,

On Tuesday, 23 September 2025, 13:26:19 CEST, Russell King (Oracle) wrote:
> Move the PHY attachment/detachment from the network driver out of
> __stmmac_open() and __stmmac_release() into stmmac_open() and
> stmmac_release() where these actions will only happen when the
> interface is administratively brought up or down. It does not make
> sense to detach and re-attach the PHY during a change of MTU.

Sorry for raising this so late, but I recently noticed that this commit breaks
changing the MTU on i.MX8MP. As soon as I change the MTU, I run into a DMA error:
$ ip link set dev end1 mtu 1400
imx-dwmac 30bf0000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-0
imx-dwmac 30bf0000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-1
imx-dwmac 30bf0000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-2
imx-dwmac 30bf0000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-3
imx-dwmac 30bf0000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-4
imx-dwmac 30bf0000.ethernet end1: Link is Down
imx-dwmac 30bf0000.ethernet end1: Failed to reset the dma
imx-dwmac 30bf0000.ethernet end1: stmmac_hw_setup: DMA engine initialization failed
imx-dwmac 30bf0000.ethernet end1: __stmmac_open: Hw setup failed
imx-dwmac 30bf0000.ethernet end1: failed reopening the interface after MTU change

Using the command above, bisecting was straightforward.
For some reason, detaching and re-attaching the PHY seems to be necessary on this platform.
There have already been too many changes on top to simply revert this commit.

Best regards,
Alexander

> 
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---
>  .../net/ethernet/stmicro/stmmac/stmmac_main.c | 29 +++++++++++--------
>  1 file changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 4acd180d2da8..4844d563e291 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -3937,10 +3937,6 @@ static int __stmmac_open(struct net_device *dev,
>  	u32 chan;
>  	int ret;
>  
> -	ret = stmmac_init_phy(dev);
> -	if (ret)
> -		return ret;
> -
>  	for (int i = 0; i < MTL_MAX_TX_QUEUES; i++)
>  		if (priv->dma_conf.tx_queue[i].tbs & STMMAC_TBS_EN)
>  			dma_conf->tx_queue[i].tbs = priv->dma_conf.tx_queue[i].tbs;
> @@ -3990,7 +3986,6 @@ static int __stmmac_open(struct net_device *dev,
>  
>  	stmmac_release_ptp(priv);
>  init_error:
> -	phylink_disconnect_phy(priv->phylink);
>  	return ret;
>  }
>  
> @@ -4010,18 +4005,28 @@ static int stmmac_open(struct net_device *dev)
>  
>  	ret = pm_runtime_resume_and_get(priv->device);
>  	if (ret < 0)
> -		goto err;
> +		goto err_dma_resources;
> +
> +	ret = stmmac_init_phy(dev);
> +	if (ret)
> +		goto err_runtime_pm;
>  
>  	ret = __stmmac_open(dev, dma_conf);
> -	if (ret) {
> -		pm_runtime_put(priv->device);
> -err:
> -		free_dma_desc_resources(priv, dma_conf);
> -	}
> +	if (ret)
> +		goto err_disconnect_phy;
>  
>  	kfree(dma_conf);
>  
>  	return ret;
> +
> +err_disconnect_phy:
> +	phylink_disconnect_phy(priv->phylink);
> +err_runtime_pm:
> +	pm_runtime_put(priv->device);
> +err_dma_resources:
> +	free_dma_desc_resources(priv, dma_conf);
> +	kfree(dma_conf);
> +	return ret;
>  }
>  
>  static void __stmmac_release(struct net_device *dev)
> @@ -4038,7 +4043,6 @@ static void __stmmac_release(struct net_device *dev)
>  
>  	/* Stop and disconnect the PHY */
>  	phylink_stop(priv->phylink);
> -	phylink_disconnect_phy(priv->phylink);
>  
>  	stmmac_disable_all_queues(priv);
>  
> @@ -4078,6 +4082,7 @@ static int stmmac_release(struct net_device *dev)
>  
>  	__stmmac_release(dev);
>  
> +	phylink_disconnect_phy(priv->phylink);
>  	pm_runtime_put(priv->device);
>  
>  	return 0;
> 


-- 
TQ-Systems GmbH | Mühlstraße 2, Gut Delling | 82229 Seefeld, Germany
Munich District Court (Amtsgericht München), HRB 105018
Managing directors: Detlef Schneider, Rüdiger Stahl, Stefan Schneider
http://www.tq-group.com/




* [PATCH] [PATCH net] tipc: fix UAF race in tipc_mon_peer_up/down/remove_peer vs bearer teardown
From: SnailSploit | Kai Aizen @ 2026-04-15  6:12 UTC (permalink / raw)
  To: netdev; +Cc: tipc-discussion, jmaloy, ying.xue, kuba, pabeni, stable,
	Kai Aizen

From: Kai Aizen <kai.aizen.dev@gmail.com>

CVE-2025-40280 fixed tipc_mon_reinit_self() accessing monitors[] from a
workqueue without RTNL.  That patch closed the workqueue path by adding
rtnl_lock() around the call.

However, three additional functions in the same subsystem access
tipc_net->monitors[] from softirq context with no RCU protection at all:

  tipc_mon_peer_up()      - called from tipc_node_write_unlock()
  tipc_mon_peer_down()    - called from tipc_node_write_unlock()
  tipc_mon_remove_peer()  - called from tipc_node_link_down()

These three are invoked from the packet receive path (tipc_rcv ->
tipc_node_write_unlock / tipc_node_link_down) and hold only the per-node
rwlock, not RTNL.

Concurrently, bearer_disable() -- which always holds RTNL per its own
inline documentation -- calls tipc_mon_delete(), which:

  1. acquires mon->lock
  2. sets tn->monitors[bearer_id] = NULL
  3. frees all peer entries
  4. releases mon->lock
  5. calls kfree(mon)                     <-- no synchronize_rcu()

The race is structural: there is no shared lock between the data-path
reader (which reads monitors[id] then acquires mon->lock) and the
teardown path (which acquires mon->lock, NULLs the slot, then frees).
A softirq thread can read a non-NULL mon pointer, get preempted, and
resume after kfree(mon) has run on another CPU, then call
write_lock_bh(&mon->lock) on freed memory:

  CPU 0 (softirq / tipc_rcv)            CPU 1 (RTNL / bearer_disable)
  tipc_mon_peer_up()
    mon = tipc_monitor(net, id)
    [mon is non-NULL]
                                         tipc_mon_delete()
                                           write_lock_bh(&mon->lock)
                                           tn->monitors[id] = NULL
                                           ...
                                           write_unlock_bh(&mon->lock)
                                           kfree(mon)
    write_lock_bh(&mon->lock)   <-- UAF

The fix mirrors the existing bearer_list[] pattern in the same module:
convert monitors[] to __rcu, use rcu_assign_pointer() on creation,
RCU_INIT_POINTER() + synchronize_rcu() on deletion (before the kfree),
and the appropriate rcu_dereference_bh() vs rtnl_dereference() variant
at each read site depending on execution context.

synchronize_rcu() in tipc_mon_delete() is placed after the
write_unlock_bh() and before timer_shutdown_sync() + kfree() to ensure
all softirq-context readers that already observed the old pointer have
completed before the memory is freed.

Fixes: 35c55c9877f8 ("tipc: add neighbor monitoring framework")
Cc: stable@vger.kernel.org
Signed-off-by: Kai Aizen <kai.aizen.dev@gmail.com>
---
 net/tipc/core.h    |  2 +-
 net/tipc/monitor.c | 45 +++++++++++++++++++++++++++++----------------
 2 files changed, 30 insertions(+), 17 deletions(-)
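
Note for reviewers (not part of the commit message): the deletion ordering
described above (unpublish the slot, wait for readers that may already hold
the old pointer, only then free) can be modelled in plain userspace C. This
is only a rough sketch under loud assumptions: every name below
(struct monitor, slots[], reader_enter(), monitor_unpublish(), ...) is
hypothetical and is not the TIPC code, and a global reader counter stands in
for the RCU grace period. Real synchronize_rcu() blocks and waits only for
pre-existing readers, whereas this model just reports whether any reader is
active.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SLOTS 3	/* hypothetical stand-in for MAX_BEARERS */

struct monitor {	/* hypothetical, much smaller than struct tipc_monitor */
	int peer_cnt;
};

/* Published pointers, standing in for tn->monitors[]. */
static _Atomic(struct monitor *) slots[MAX_SLOTS];

/* Readers currently inside a read-side section (crude grace-period model). */
static atomic_int active_readers;

/* Models rcu_read_lock_bh() followed by rcu_dereference_bh(). */
static struct monitor *reader_enter(int id)
{
	atomic_fetch_add(&active_readers, 1);
	return atomic_load(&slots[id]);
}

/* Models rcu_read_unlock_bh(). */
static void reader_exit(void)
{
	atomic_fetch_sub(&active_readers, 1);
}

/* Models allocation plus rcu_assign_pointer(): init first, publish last. */
static int monitor_create(int id)
{
	struct monitor *mon = calloc(1, sizeof(*mon));

	if (!mon)
		return -1;
	mon->peer_cnt = 1;
	atomic_store(&slots[id], mon);
	return 0;
}

/* Teardown step 1: clear the slot so that new readers observe NULL. */
static struct monitor *monitor_unpublish(int id)
{
	return atomic_exchange(&slots[id], NULL);
}

/* Teardown step 2: non-blocking stand-in for synchronize_rcu(). */
static bool grace_period_elapsed(void)
{
	return atomic_load(&active_readers) == 0;
}

/*
 * Teardown step 3: free only once no reader can still hold the pointer.
 * Returns false, leaving the object untouched, if a reader is active.
 */
static bool monitor_free(struct monitor *mon)
{
	if (!grace_period_elapsed())
		return false;
	free(mon);
	return true;
}
```

In this model, a reader that entered before monitor_unpublish() keeps a
valid pointer until it calls reader_exit(), and monitor_free() refuses to
run until then. A reader that enters after the unpublish step sees NULL and
has to bail out, which is the same NULL check the data-path callers need.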

diff --git a/net/tipc/core.h b/net/tipc/core.h
index 9ce5f9ff6..cd582f7a2 100644
--- a/net/tipc/core.h
+++ b/net/tipc/core.h
@@ -109,7 +109,7 @@ struct tipc_net {
 	u32 num_links;
 
 	/* Neighbor monitoring list */
-	struct tipc_monitor *monitors[MAX_BEARERS];
+	struct tipc_monitor __rcu *monitors[MAX_BEARERS];
 	int mon_threshold;
 
 	/* Bearer list */
diff --git a/net/tipc/monitor.c b/net/tipc/monitor.c
index a94b9b36a..2a0665e1d 100644
--- a/net/tipc/monitor.c
+++ b/net/tipc/monitor.c
@@ -97,9 +97,21 @@ struct tipc_monitor {
 	unsigned long timer_intv;
 };
 
-static struct tipc_monitor *tipc_monitor(struct net *net, int bearer_id)
+/*
+ * tipc_monitor_rcu_bh - dereference monitors[] from softirq / data path.
+ * Caller must be in an RCU-bh read-side critical section (softirq context
+ * implicitly satisfies this on non-PREEMPT_RT kernels; use explicit
+ * rcu_read_lock_bh() where needed on RT).
+ */
+static struct tipc_monitor *tipc_monitor_rcu_bh(struct net *net, int bearer_id)
+{
+	return rcu_dereference_bh(tipc_net(net)->monitors[bearer_id]);
+}
+
+/* tipc_monitor_rtnl - dereference monitors[] from RTNL-held control path. */
+static struct tipc_monitor *tipc_monitor_rtnl(struct net *net, int bearer_id)
 {
-	return tipc_net(net)->monitors[bearer_id];
+	return rtnl_dereference(tipc_net(net)->monitors[bearer_id]);
 }
 
 const int tipc_max_domain_size = sizeof(struct tipc_mon_domain);
@@ -194,7 +206,7 @@ static struct tipc_peer *get_peer(struct tipc_monitor *mon, u32 addr)
 
 static struct tipc_peer *get_self(struct net *net, int bearer_id)
 {
-	struct tipc_monitor *mon = tipc_monitor(net, bearer_id);
+	struct tipc_monitor *mon = tipc_monitor_rcu_bh(net, bearer_id);
 
 	return mon->self;
 }
@@ -351,7 +363,7 @@ static void mon_assign_roles(struct tipc_monitor *mon, struct tipc_peer *head)
 
 void tipc_mon_remove_peer(struct net *net, u32 addr, int bearer_id)
 {
-	struct tipc_monitor *mon = tipc_monitor(net, bearer_id);
+	struct tipc_monitor *mon = tipc_monitor_rcu_bh(net, bearer_id);
 	struct tipc_peer *self;
 	struct tipc_peer *peer, *prev, *head;
 
@@ -421,7 +433,7 @@ static bool tipc_mon_add_peer(struct tipc_monitor *mon, u32 addr,
 
 void tipc_mon_peer_up(struct net *net, u32 addr, int bearer_id)
 {
-	struct tipc_monitor *mon = tipc_monitor(net, bearer_id);
+	struct tipc_monitor *mon = tipc_monitor_rcu_bh(net, bearer_id);
 	struct tipc_peer *self = get_self(net, bearer_id);
 	struct tipc_peer *peer, *head;
 
@@ -440,7 +452,7 @@ void tipc_mon_peer_up(struct net *net, u32 addr, int bearer_id)
 
 void tipc_mon_peer_down(struct net *net, u32 addr, int bearer_id)
 {
-	struct tipc_monitor *mon = tipc_monitor(net, bearer_id);
+	struct tipc_monitor *mon = tipc_monitor_rcu_bh(net, bearer_id);
 	struct tipc_peer *self;
 	struct tipc_peer *peer, *head;
 	struct tipc_mon_domain *dom;
@@ -480,7 +492,7 @@ void tipc_mon_peer_down(struct net *net, u32 addr, int bearer_id)
 void tipc_mon_rcv(struct net *net, void *data, u16 dlen, u32 addr,
 		  struct tipc_mon_state *state, int bearer_id)
 {
-	struct tipc_monitor *mon = tipc_monitor(net, bearer_id);
+	struct tipc_monitor *mon = tipc_monitor_rcu_bh(net, bearer_id);
 	struct tipc_mon_domain *arrv_dom = data;
 	struct tipc_mon_domain dom_bef;
 	struct tipc_mon_domain *dom;
@@ -566,7 +578,7 @@ void tipc_mon_rcv(struct net *net, void *data, u16 dlen, u32 addr,
 void tipc_mon_prep(struct net *net, void *data, int *dlen,
 		   struct tipc_mon_state *state, int bearer_id)
 {
-	struct tipc_monitor *mon = tipc_monitor(net, bearer_id);
+	struct tipc_monitor *mon = tipc_monitor_rcu_bh(net, bearer_id);
 	struct tipc_mon_domain *dom = data;
 	u16 gen = mon->dom_gen;
 	u16 len;
@@ -600,7 +612,7 @@ void tipc_mon_get_state(struct net *net, u32 addr,
 			struct tipc_mon_state *state,
 			int bearer_id)
 {
-	struct tipc_monitor *mon = tipc_monitor(net, bearer_id);
+	struct tipc_monitor *mon = tipc_monitor_rcu_bh(net, bearer_id);
 	struct tipc_peer *peer;
 
 	if (!tipc_mon_is_active(net, mon)) {
@@ -651,7 +663,7 @@ int tipc_mon_create(struct net *net, int bearer_id)
 	struct tipc_peer *self;
 	struct tipc_mon_domain *dom;
 
-	if (tn->monitors[bearer_id])
+	if (rtnl_dereference(tn->monitors[bearer_id]))
 		return 0;
 
 	mon = kzalloc_obj(*mon, GFP_ATOMIC);
@@ -663,7 +675,7 @@ int tipc_mon_create(struct net *net, int bearer_id)
 		kfree(dom);
 		return -ENOMEM;
 	}
-	tn->monitors[bearer_id] = mon;
+	rcu_assign_pointer(tn->monitors[bearer_id], mon);
 	rwlock_init(&mon->lock);
 	mon->net = net;
 	mon->peer_cnt = 1;
@@ -682,7 +694,7 @@ int tipc_mon_create(struct net *net, int bearer_id)
 void tipc_mon_delete(struct net *net, int bearer_id)
 {
 	struct tipc_net *tn = tipc_net(net);
-	struct tipc_monitor *mon = tipc_monitor(net, bearer_id);
+	struct tipc_monitor *mon = rtnl_dereference(tn->monitors[bearer_id]);
 	struct tipc_peer *self;
 	struct tipc_peer *peer, *tmp;
 
@@ -691,7 +703,7 @@ void tipc_mon_delete(struct net *net, int bearer_id)
 
 	self = get_self(net, bearer_id);
 	write_lock_bh(&mon->lock);
-	tn->monitors[bearer_id] = NULL;
+	RCU_INIT_POINTER(tn->monitors[bearer_id], NULL);
 	list_for_each_entry_safe(peer, tmp, &self->list, list) {
 		list_del(&peer->list);
 		hlist_del(&peer->hash);
@@ -700,6 +712,7 @@ void tipc_mon_delete(struct net *net, int bearer_id)
 	}
 	mon->self = NULL;
 	write_unlock_bh(&mon->lock);
+	synchronize_rcu();
 	timer_shutdown_sync(&mon->timer);
 	kfree(self->domain);
 	kfree(self);
@@ -712,7 +725,7 @@ void tipc_mon_reinit_self(struct net *net)
 	int bearer_id;
 
 	for (bearer_id = 0; bearer_id < MAX_BEARERS; bearer_id++) {
-		mon = tipc_monitor(net, bearer_id);
+		mon = rtnl_dereference(tipc_net(net)->monitors[bearer_id]);
 		if (!mon)
 			continue;
 		write_lock_bh(&mon->lock);
@@ -798,7 +811,7 @@ static int __tipc_nl_add_monitor_peer(struct tipc_peer *peer,
 int tipc_nl_add_monitor_peer(struct net *net, struct tipc_nl_msg *msg,
 			     u32 bearer_id, u32 *prev_node)
 {
-	struct tipc_monitor *mon = tipc_monitor(net, bearer_id);
+	struct tipc_monitor *mon = rtnl_dereference(tipc_net(net)->monitors[bearer_id]);
 	struct tipc_peer *peer;
 
 	if (!mon)
@@ -827,7 +840,7 @@ int tipc_nl_add_monitor_peer(struct net *net, struct tipc_nl_msg *msg,
 int __tipc_nl_add_monitor(struct net *net, struct tipc_nl_msg *msg,
 			  u32 bearer_id)
 {
-	struct tipc_monitor *mon = tipc_monitor(net, bearer_id);
+	struct tipc_monitor *mon = rtnl_dereference(tipc_net(net)->monitors[bearer_id]);
 	char bearer_name[TIPC_MAX_BEARER_NAME];
 	struct nlattr *attrs;
 	void *hdr;
-- 
2.43.0



* Re: [PATCH net-next v2] net: openvswitch: decouple flow_table from ovs_mutex
From: Adrián Moreno @ 2026-04-15  6:16 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Aaron Conole, Eelco Chaudron, Ilya Maximets,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Simon Horman,
	open list:OPENVSWITCH, open list
In-Reply-To: <b4534229-994a-42ac-a200-a9cb1b333e60@redhat.com>

On Mon, Apr 13, 2026 at 10:39:37AM +0200, Paolo Abeni wrote:
> On 4/7/26 2:04 PM, Adrian Moreno wrote:
> > Currently the entire ovs module is write-protected using the global
> > ovs_mutex. While this simple approach works fine for control-plane
> > operations (such as vport configurations), requiring the global mutex
> > for flow modifications can be problematic.
> >
> > During periods of high control-plane operations, e.g: netdevs (vports)
> > coming and going, RTNL can suffer contention. This contention is easily
> > transferred to the ovs_mutex as RTNL nests inside ovs_mutex. Flow
> > modifications, however, are done as part of packet processing and having
> > them wait for RTNL pressure to go away can lead to packet drops.
> >
> > This patch decouples flow_table modifications from ovs_mutex by means of
> > the following:
> >
> > 1 - Make flow_table an rcu-protected pointer inside the datapath.
> > This allows both objects to be protected independently while reducing the
> > amount of changes required in "flow_table.c".
> >
> > 2 - Create a new mutex inside the flow_table that protects it from
> > concurrent modifications.
> > Putting the mutex inside flow_table makes it easier to consume for
> > functions inside flow_table.c that do not currently take pointers to the
> > datapath.
> > Some function signatures need to be changed to accept flow_table so that
> > lockdep checks can be performed.
> >
> > 3 - Create a reference count to temporarily extend rcu protection from
> > the datapath to the flow_table.
> > In order to use the flow_table without locking ovs_mutex, the flow_table
> > pointer must be first dereferenced within an rcu-protected region.
> > Next, the table->mutex needs to be locked to protect it from
> > concurrent writes but mutexes must not be locked inside an rcu-protected
> > region, so the rcu-protected region must be left at which point the
> > datapath can be concurrently freed.
> > To extend the protection beyond the rcu region, a reference count is used.
> > One reference is held by the datapath, the other is temporarily
> > increased during flow modifications. For example:
> >
> > Datapath deletion:
> >
> >   ovs_lock();
> >   table = rcu_dereference_protected(dp->table, ...);
> >   rcu_assign_pointer(dp->table, NULL);
> >   ovs_flow_tbl_put(table);
> >   ovs_unlock();
> >
> > Flow modification:
> >
> >   rcu_read_lock();
> >   dp = get_dp(...);
> >   table = rcu_dereference(dp->table);
> >   ovs_flow_tbl_get(table);
> >   rcu_read_unlock();
> >
> >   mutex_lock(&table->lock);
> >   /* Perform modifications on the flow_table */
> >   mutex_unlock(&table->lock);
> >   ovs_flow_tbl_put(table);
> >
> > Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
> > ---
> > v2: Fix argument in ovs_flow_tbl_put (sparse)
> >     Remove rcu checks in ovs_dp_masks_rebalance
> > ---
> >  net/openvswitch/datapath.c   | 285 ++++++++++++++++++++++++-----------
> >  net/openvswitch/datapath.h   |   2 +-
> >  net/openvswitch/flow.c       |  13 +-
> >  net/openvswitch/flow.h       |   9 +-
> >  net/openvswitch/flow_table.c | 180 ++++++++++++++--------
> >  net/openvswitch/flow_table.h |  51 ++++++-
> >  6 files changed, 380 insertions(+), 160 deletions(-)
>
> This is too big for a single patch. The changelog above already suggests
> a way of splitting the change. At least the RCU-ification addition
> should be straight forward in a separate patch, which in turn should be
> easily reviewable.

I agree. I'll try to split it in the next version.

>
> > diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> > index e209099218b4..9c234993520c 100644
> > --- a/net/openvswitch/datapath.c
> > +++ b/net/openvswitch/datapath.c
> > @@ -88,13 +88,17 @@ static void ovs_notify(struct genl_family *family,
> >   * DOC: Locking:
> >   *
> >   * All writes e.g. Writes to device state (add/remove datapath, port, set
> > - * operations on vports, etc.), Writes to other state (flow table
> > - * modifications, set miscellaneous datapath parameters, etc.) are protected
> > - * by ovs_lock.
> > + * operations on vports, etc.) and writes to other datapath parameters
> > + * are protected by ovs_lock.
> > + *
> > + * Writes to the flow table are NOT protected by ovs_lock. Instead, a per-table
> > + * mutex and reference count are used (see comment above "struct flow_table"
> > + * definition). On some few occasions, the per-flow table mutex is nested
> > + * inside ovs_mutex.
> >   *
> >   * Reads are protected by RCU.
> >   *
> > - * There are a few special cases (mostly stats) that have their own
> > + * There are a few other special cases (mostly stats) that have their own
> >   * synchronization but they nest under all of above and don't interact with
> >   * each other.
> >   *
> > @@ -166,7 +170,6 @@ static void destroy_dp_rcu(struct rcu_head *rcu)
> >  {
> >  	struct datapath *dp = container_of(rcu, struct datapath, rcu);
> >
> > -	ovs_flow_tbl_destroy(&dp->table);
> >  	free_percpu(dp->stats_percpu);
> >  	kfree(dp->ports);
> >  	ovs_meters_exit(dp);
> > @@ -247,6 +250,7 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
> >  	struct ovs_pcpu_storage *ovs_pcpu = this_cpu_ptr(ovs_pcpu_storage);
> >  	const struct vport *p = OVS_CB(skb)->input_vport;
> >  	struct datapath *dp = p->dp;
> > +	struct flow_table *table;
> >  	struct sw_flow *flow;
> >  	struct sw_flow_actions *sf_acts;
> >  	struct dp_stats_percpu *stats;
> > @@ -257,9 +261,16 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
> >  	int error;
> >
> >  	stats = this_cpu_ptr(dp->stats_percpu);
> > +	table = rcu_dereference(dp->table);
> > +	if (!table) {
> > +		net_dbg_ratelimited("ovs: no flow table on datapath %s\n",
> > +				    ovs_dp_name(dp));
> > +		kfree_skb(skb);
> > +		return;
> > +	}
> >
> >  	/* Look up flow. */
> > -	flow = ovs_flow_tbl_lookup_stats(&dp->table, key, skb_get_hash(skb),
> > +	flow = ovs_flow_tbl_lookup_stats(table, key, skb_get_hash(skb),
> >  					 &n_mask_hit, &n_cache_hit);
> >  	if (unlikely(!flow)) {
> >  		struct dp_upcall_info upcall;
> > @@ -752,12 +763,16 @@ static struct genl_family dp_packet_genl_family __ro_after_init = {
> >  static void get_dp_stats(const struct datapath *dp, struct ovs_dp_stats *stats,
> >  			 struct ovs_dp_megaflow_stats *mega_stats)
> >  {
> > +	struct flow_table *table = ovsl_dereference(dp->table);
>
> Should be rcu_dereference_ovs_tbl() ?
>

This function is only called with ovs_lock() (at least for now, we might
reconsider it based on your other comments below), so I kept the
stricter assertion.

> >  	int i;
> >
> >  	memset(mega_stats, 0, sizeof(*mega_stats));
> >
> > -	stats->n_flows = ovs_flow_tbl_count(&dp->table);
> > -	mega_stats->n_masks = ovs_flow_tbl_num_masks(&dp->table);
> > +	if (table) {
> > +		stats->n_flows = ovs_flow_tbl_count(table);
>
> As noted by Aaron, READ_ONCE() is now needed when accessing
> table->count. And WRITE_ONCE when writing it
>

Yep.

> > +		mega_stats->n_masks = ovs_flow_tbl_num_masks(table);
>
> Sashiko says:
>
> ---
> get_dp_stats() accesses table->mask_array via ovs_flow_tbl_num_masks()
> while holding only ovs_mutex. Since this patch decouples flow table updates
> by moving them under table->lock, ovs_flow_cmd_new() can execute
> concurrently and trigger a reallocation of the mask array, freeing the old
> one via call_rcu().
> Because get_dp_stats() does not hold rcu_read_lock(), the thread can be
> preempted (as ovs_mutex is sleepable) and the RCU grace period might expire
> before the count is read. Can this lead to a use-after-free?
> ---
>

I guess it is possible that all that happens between the dereference of
table->mask_array and the read of "ma->count".

Maybe we could wrap the call to "ovs_flow_tbl_num_masks" in rcu_read_lock(),
or use RCU protection for all flow-table-related statistics?
I think this would allow us to restrict "rcu_dereference_ovs_tbl" to only
consider the table lock and RCU, and use it to dereference "dp->table".

> Note that it also spotted pre-existing issues, please have a look:
>
> https://sashiko.dev/#/patchset/20260407120418.356718-1-amorenoz%40redhat.com
>

Interesting. I'll look deeper into this, thanks!

> [...]
> > @@ -71,15 +93,40 @@ struct flow_table {
> >
> >  extern struct kmem_cache *flow_stats_cache;
> >
> > +#ifdef CONFIG_LOCKDEP
> > +int lockdep_ovs_tbl_is_held(const struct flow_table *table);
> > +#else
> > +static inline int lockdep_ovs_tbl_is_held(const struct flow_table *table)
> > +{
> > +	(void)table;
>
> You can use the __always_unused annotation.
>

Thanks, will do.

> > +	return 1;
> > +}
> > +#endif
> > +
> > +#define ASSERT_OVS_TBL(tbl)   WARN_ON(!lockdep_ovs_tbl_is_held(tbl))
> > +
> > +/* Lock-protected update-allowed dereferences.*/
> > +#define ovs_tbl_dereference(p, tbl)	\
> > +	rcu_dereference_protected(p, lockdep_ovs_tbl_is_held(tbl))
> > +
> > +/* Read dereferences can be protected by either RCU, table lock or ovs_mutex. */
>
> Is this access schema really safe? I understand tables can be
> written/deleted under the table lock only. If so this should ignore the
> OVS mutex status.
>

Even if it were safe (which it might not be), it's confusing and makes it
difficult to understand the overall locking strategy.

I'll revisit all uses carefully, but I think the only cases that hold
just the ovs_mutex are datapath statistics reads (such as the
ovs_dp_stats we are discussing above). I will try to use RCU in those
cases, which might make them safer and everything more robust.

Thank you!
Adrián

> /P
>


^ permalink raw reply

* Re: [PATCH net v2 1/1] af_unix: Reject SIOCATMARK on non-stream sockets
From: Yuan Tan @ 2026-04-15  6:23 UTC (permalink / raw)
  To: Kuniyuki Iwashima, Ren Wei
  Cc: yuantan098, netdev, davem, edumazet, kuba, pabeni, horms,
	rao.shoaib, yifanwucs, tomapufckgml, bird, enjou1224z,
	wangjiexun2025
In-Reply-To: <CAAVpQUC--TT0M1+xd096CPw881tzHX-MddmT8sZs4sjoatVeLQ@mail.gmail.com>


On 4/13/2026 10:33 PM, Kuniyuki Iwashima wrote:
> On Mon, Apr 13, 2026 at 5:29 AM Ren Wei <n05ec@lzu.edu.cn> wrote:
>> From: Jiexun Wang <wangjiexun2025@gmail.com>
>>
>> SIOCATMARK reports whether the receive queue is at the urgent mark for
>> MSG_OOB.
>>
>> In AF_UNIX, MSG_OOB is supported only for SOCK_STREAM sockets.
>> SOCK_DGRAM and SOCK_SEQPACKET reject MSG_OOB in sendmsg() and recvmsg(),
>> so they should not support SIOCATMARK either.
>>
>> Return -EOPNOTSUPP for non-stream sockets before checking the receive
>> queue.
>>
>> Fixes: 314001f0bf92 ("af_unix: Add OOB support")
>> Reported-by: Yifan Wu <yifanwucs@gmail.com>
>> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
>> Co-developed-by: Yuan Tan <yuantan098@gmail.com>
>> Signed-off-by: Yuan Tan <yuantan098@gmail.com>
>> Suggested-by: Xin Liu <bird@lzu.edu.cn>
> Please read this guideline again.
> https://www.kernel.org/doc/html/latest/process/submitting-patches.html#when-to-use-acked-by-cc-and-co-developed-by
>
> Co-developed-by is not where you mention someone who
> developed a tool to find a bug, and Suggested-by is not where
> you mention someone who funds your research.
> https://lore.kernel.org/netdev/7c26a74d-90c5-4520-a10a-22f06e098b86@gmail.com/
>
> When you just copy my fix and modify the commit message,
> the two tags are inappropriate.
>
We sincerely apologize for the misuse of the tags and the incorrect
crediting in our previous submission. We are still learning the
community's process and appreciate your patience.

Would you allow us to add:
Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
Or
Co-developed-by: Kuniyuki Iwashima <kuniyu@google.com>
in this patch?

In future patches, if a maintainer provides specific code suggestions,
we will first ask if a Suggested-by or Co-developed-by tag is needed
before sending a new version.

If a maintainer doesn't provide specific code, but points out errors in
our patch, explains how to fix the bug correctly, or mentions similar
paths that could be fixed together, I am not quite sure if a
Suggested-by tag is required. Maybe we can send an email first when we
are uncertain to avoid this happening again.

If the following tags are the correct way, we will update the v3 patch
like this:

Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Co-developed-by or Suggested-by? : Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>


One more thing: my understanding is that non-maintainers can also
provide Reviewed-by tags, right? While I will be less involved in direct
development and mentoring, I will still review every report and patch
from our team before submission to ensure quality.


>> Tested-by: Ren Wei <enjou1224z@gmail.com>
>> Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com>
>> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
>> ---
>> Changes in v2:
>> - Rework the fix based on maintainer feedback.
>> - Drop the receive-queue locking approach and reject SIOCATMARK on
>>   non-stream sockets instead, since it is only meaningful for MSG_OOB.
>> - V1 link: https://lore.kernel.org/netdev/f6cbbc8da90e95584847b5ceb60aae830d1631c2.1775731983.git.wangjiexun2025@gmail.com/
>>
>>  net/unix/af_unix.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
>> index b23c33df8b46..09d43b4813b1 100644
>> --- a/net/unix/af_unix.c
>> +++ b/net/unix/af_unix.c
>> @@ -3300,6 +3300,9 @@ static int unix_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
>>                         struct sk_buff *skb;
>>                         int answ = 0;
>>
>> +                       if (sk->sk_type != SOCK_STREAM)
>> +                               return -EOPNOTSUPP;
>> +
>>                         mutex_lock(&u->iolock);
>>
>>                         skb = skb_peek(&sk->sk_receive_queue);
>> --
>> 2.34.1
>>

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH v2] idpf: fix UAF and double free in idpf_plug_vport_aux_dev() error path
From: Guangshuo Li @ 2026-04-15  6:33 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Joshua Hay,
	Tatyana Nikolova, Madhu Chittim, intel-wired-lan, netdev,
	linux-kernel, Greg Kroah-Hartman, stable
In-Reply-To: <143881d9-02d5-42be-bf77-9fe9e8353c06@intel.com>

Hi Jacob,

Thanks for reviewing.

On Wed, 15 Apr 2026 at 13:37, Jacob Keller <jacob.e.keller@intel.com> wrote:
>
> No problem. I had missed the other version, which explains my confusion.
> Still, to my eyes, the fix looks to be an equivalent fix as one
> submitted by GregKH:
>
> https://lore.kernel.org/intel-wired-lan/2026041116-retail-bagginess-250f@gregkh/
>
> Do you agree this is effectively a different fix for the same problem?
> Or are there really two different double-free issues here that both need
> patching? I haven't been able to fully convince myself either way, but
> I am leaning toward this being one problem, and I think Greg's solution
> feels simpler to understand.
>
> Thanks,
> Jake
>
> >
> > Thanks,
> > Guangshuo
>
Yes, I agree Greg's patch addresses the same underlying issue.

For the other path in `idpf_plug_core_aux_dev()`, I had also
previously sent a fix, for reference:

v1:
https://lkml.org/lkml/2026/3/18/1822

v2:
https://lkml.org/lkml/2026/3/19/1285

The v2 for the core path was posted after discussion on the list and
incorporated the feedback I received there.

So my understanding is that Greg's patch covers the same class of
issue in both places, while I had sent them as separate fixes.

Thanks,
Guangshuo

^ permalink raw reply

* [PATCH v3 net] ax25: fix OOB read after address header strip in ax25_rcv()
From: Ashutosh Desai @ 2026-04-15  6:36 UTC (permalink / raw)
  To: netdev
  Cc: linux-hams, jreuter, davem, edumazet, kuba, pabeni, horms, stable,
	linux-kernel, Ashutosh Desai

A remote station can send a crafted KISS frame that is just long enough
to pass ax25_addr_parse() (minimum 14 address bytes) but carries no
control or PID bytes. After ax25_kiss_rcv() strips the KISS framing
byte and ax25_rcv() strips the address header with skb_pull(), skb->len
drops to zero. The subsequent reads of skb->data[0] (control byte) and
skb->data[1] (PID byte) are then out of bounds, which can crash the
kernel or leak heap memory to a remote attacker.

Use pskb_may_pull(skb, 2) after the skb_pull() to ensure both bytes
are in the linear area before reading them. Discard malformed frames
that carry no control/PID pair.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Ashutosh Desai <ashutoshdesai993@gmail.com>
---
V2 -> V3: remove incorrect Suggested-by; add symptom, Fixes, Cc stable
V1 -> V2: use pskb_may_pull(skb, 2) instead of skb->len < 2

v2: https://lore.kernel.org/netdev/20260409152400.2219716-1-ashutoshdesai993@gmail.com/
v1: https://lore.kernel.org/netdev/20260409012235.2049389-1-ashutoshdesai993@gmail.com/

 net/ax25/ax25_in.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/ax25/ax25_in.c b/net/ax25/ax25_in.c
index d75b3e9ed93d..6a71dea876a1 100644
--- a/net/ax25/ax25_in.c
+++ b/net/ax25/ax25_in.c
@@ -217,6 +217,11 @@ static int ax25_rcv(struct sk_buff *skb, struct net_device *dev,
 	 */
 	skb_pull(skb, ax25_addr_size(&dp));
 
+	if (!pskb_may_pull(skb, 2)) {
+		kfree_skb(skb);
+		return 0;
+	}
+
 	/* For our port addresses ? */
 	if (ax25cmp(&dest, dev_addr) == 0 && dp.lastrepeat + 1 == dp.ndigi)
 		mine = 1;
-- 
2.34.1
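
The length check that the patch above adds can be illustrated outside the kernel with a plain buffer. This is a simplified sketch under stated assumptions: `struct frame_ctl` and `read_ctl_pid()` are hypothetical names, and only the "at least two bytes remain after the address header" condition is modeled. The kernel's pskb_may_pull() additionally makes sure those bytes sit in the linear area of the skb, which has no equivalent here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed result; not a kernel structure. */
struct frame_ctl {
	uint8_t control;
	uint8_t pid;
};

/*
 * Skip addr_len bytes of address header, then read the control and PID
 * bytes only if both are actually present in the frame.
 */
static bool read_ctl_pid(const uint8_t *data, size_t len, size_t addr_len,
			 struct frame_ctl *out)
{
	if (len < addr_len)
		return false;	/* frame shorter than its address header */

	data += addr_len;
	len -= addr_len;

	if (len < 2)
		return false;	/* no control/PID pair: discard the frame */

	out->control = data[0];
	out->pid = data[1];
	return true;
}
```

A frame carrying exactly 14 address bytes and nothing else is rejected here, which corresponds to the crafted frame described in the commit message.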


^ permalink raw reply related

* Re: [patch 32/38] powerpc/spufs: Use mftb() directly
From: Christophe Leroy (CS GROUP) @ 2026-04-15  6:38 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Michael Ellerman, linuxppc-dev, Arnd Bergmann, x86, Lu Baolu,
	iommu, Michael Grzeschik, netdev, linux-wireless, Herbert Xu,
	linux-crypto, Vlastimil Babka, linux-mm, David Woodhouse,
	Bernie Thompson, linux-fbdev, Theodore Tso, linux-ext4,
	Andrew Morton, Uladzislau Rezki, Marco Elver, Dmitry Vyukov,
	kasan-dev, Andrey Ryabinin, Thomas Sailer, linux-hams,
	Jason A. Donenfeld, Richard Henderson, linux-alpha, Russell King,
	linux-arm-kernel, Catalin Marinas, Huacai Chen, loongarch,
	Geert Uytterhoeven, linux-m68k, Dinh Nguyen, Jonas Bonn,
	linux-openrisc, Helge Deller, linux-parisc, Paul Walmsley,
	linux-riscv, Heiko Carstens, linux-s390, David S. Miller,
	sparclinux
In-Reply-To: <20260410120319.723429844@kernel.org>



Le 10/04/2026 à 14:21, Thomas Gleixner a écrit :
> There is no reason to indirect via get_cycles(), which is about to be
> removed.
> 
> Use mftb() directly.
> 
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: linuxppc-dev@lists.ozlabs.org

Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>

> ---
>   arch/powerpc/platforms/cell/spufs/switch.c |    5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
> --- a/arch/powerpc/platforms/cell/spufs/switch.c
> +++ b/arch/powerpc/platforms/cell/spufs/switch.c
> @@ -34,6 +34,7 @@
>   #include <asm/spu_priv1.h>
>   #include <asm/spu_csa.h>
>   #include <asm/mmu_context.h>
> +#include <asm/time.h>
>   
>   #include "spufs.h"
>   
> @@ -279,7 +280,7 @@ static inline void save_timebase(struct
>   	 *    Read PPE Timebase High and Timebase low registers
>   	 *    and save in CSA.  TBD.
>   	 */
> -	csa->suspend_time = get_cycles();
> +	csa->suspend_time = mftb();
>   }
>   
>   static inline void remove_other_spu_access(struct spu_state *csa,
> @@ -1261,7 +1262,7 @@ static inline void setup_decr(struct spu
>   	 *     in LSCSA.
>   	 */
>   	if (csa->priv2.mfc_control_RW & MFC_CNTL_DECREMENTER_RUNNING) {
> -		cycles_t resume_time = get_cycles();
> +		cycles_t resume_time = mftb();
>   		cycles_t delta_time = resume_time - csa->suspend_time;
>   
>   		csa->lscsa->decr_status.slot[0] = SPU_DECR_STATUS_RUNNING;
> 
> 


^ permalink raw reply

* Re: [patch 05/38] treewide: Remove CLOCK_TICK_RATE
From: Christophe Leroy (CS GROUP) @ 2026-04-15  6:40 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Arnd Bergmann, x86, Lu Baolu, iommu, Michael Grzeschik, netdev,
	linux-wireless, Herbert Xu, linux-crypto, Vlastimil Babka,
	linux-mm, David Woodhouse, Bernie Thompson, linux-fbdev,
	Theodore Tso, linux-ext4, Andrew Morton, Uladzislau Rezki,
	Marco Elver, Dmitry Vyukov, kasan-dev, Andrey Ryabinin,
	Thomas Sailer, linux-hams, Jason A. Donenfeld, Richard Henderson,
	linux-alpha, Russell King, linux-arm-kernel, Catalin Marinas,
	Huacai Chen, loongarch, Geert Uytterhoeven, linux-m68k,
	Dinh Nguyen, Jonas Bonn, linux-openrisc, Helge Deller,
	linux-parisc, Michael Ellerman, linuxppc-dev, Paul Walmsley,
	linux-riscv, Heiko Carstens, linux-s390, David S. Miller,
	sparclinux
In-Reply-To: <20260410120317.910770161@kernel.org>



Le 10/04/2026 à 14:18, Thomas Gleixner a écrit :
> This has been scheduled for removal more than a decade ago and the comments
> related to it have been dutifully ignored. The last dependencies are gone.
> 
> Remove it along with various now empty asm/timex.h files.
> 
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>

For powerpc:

Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>

> ---
>   arch/alpha/include/asm/timex.h      |    4 ----
>   arch/arc/include/asm/timex.h        |   15 ---------------
>   arch/arm/mach-omap1/Kconfig         |    2 +-
>   arch/hexagon/include/asm/timex.h    |    3 ---
>   arch/m68k/include/asm/timex.h       |   15 ---------------
>   arch/microblaze/include/asm/timex.h |   13 -------------
>   arch/mips/include/asm/timex.h       |    8 --------
>   arch/openrisc/include/asm/timex.h   |    3 ---
>   arch/parisc/include/asm/timex.h     |    2 --
>   arch/powerpc/include/asm/timex.h    |    2 --
>   arch/s390/include/asm/timex.h       |    2 --
>   arch/sh/include/asm/timex.h         |   24 ------------------------
>   arch/sparc/include/asm/timex.h      |    2 +-
>   arch/sparc/include/asm/timex_32.h   |   14 --------------
>   arch/sparc/include/asm/timex_64.h   |    2 --
>   arch/um/include/asm/timex.h         |    9 ---------
>   arch/x86/include/asm/timex.h        |    3 ---
>   17 files changed, 2 insertions(+), 121 deletions(-)
> 
> --- a/arch/alpha/include/asm/timex.h
> +++ b/arch/alpha/include/asm/timex.h
> @@ -7,10 +7,6 @@
>   #ifndef _ASMALPHA_TIMEX_H
>   #define _ASMALPHA_TIMEX_H
>   
> -/* With only one or two oddballs, we use the RTC as the ticker, selecting
> -   the 32.768kHz reference clock, which nicely divides down to our HZ.  */
> -#define CLOCK_TICK_RATE	32768
> -
>   /*
>    * Standard way to access the cycle counter.
>    * Currently only used on SMP for scheduling.
> --- a/arch/arc/include/asm/timex.h
> +++ /dev/null
> @@ -1,15 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -/*
> - * Copyright (C) 2004, 2007-2010, 2011-2012 Synopsys, Inc. (www.synopsys.com)
> - */
> -
> -#ifndef _ASM_ARC_TIMEX_H
> -#define _ASM_ARC_TIMEX_H
> -
> -#define CLOCK_TICK_RATE	80000000 /* slated to be removed */
> -
> -#include <asm-generic/timex.h>
> -
> -/* XXX: get_cycles() to be implemented with RTSC insn */
> -
> -#endif /* _ASM_ARC_TIMEX_H */
> --- a/arch/arm/mach-omap1/Kconfig
> +++ b/arch/arm/mach-omap1/Kconfig
> @@ -74,7 +74,7 @@ config OMAP_32K_TIMER
>   	  currently only available for OMAP16XX, 24XX, 34XX, OMAP4/5 and DRA7XX.
>   
>   	  On OMAP2PLUS this value is only used for CONFIG_HZ and
> -	  CLOCK_TICK_RATE compile time calculation.
> +	  timer frequency compile time calculation.
>   	  The actual timer selection is done in the board file
>   	  through the (DT_)MACHINE_START structure.
>   
> --- a/arch/hexagon/include/asm/timex.h
> +++ b/arch/hexagon/include/asm/timex.h
> @@ -9,9 +9,6 @@
>   #include <asm-generic/timex.h>
>   #include <asm/hexagon_vm.h>
>   
> -/* Using TCX0 as our clock.  CLOCK_TICK_RATE scheduled to be removed. */
> -#define CLOCK_TICK_RATE              19200
> -
>   #define ARCH_HAS_READ_CURRENT_TIMER
>   
>   static inline int read_current_timer(unsigned long *timer_val)
> --- a/arch/m68k/include/asm/timex.h
> +++ b/arch/m68k/include/asm/timex.h
> @@ -7,21 +7,6 @@
>   #ifndef _ASMm68K_TIMEX_H
>   #define _ASMm68K_TIMEX_H
>   
> -#ifdef CONFIG_COLDFIRE
> -/*
> - * CLOCK_TICK_RATE should give the underlying frequency of the tick timer
> - * to make ntp work best.  For Coldfires, that's the main clock.
> - */
> -#include <asm/coldfire.h>
> -#define CLOCK_TICK_RATE	MCF_CLK
> -#else
> -/*
> - * This default CLOCK_TICK_RATE is probably wrong for many 68k boards
> - * Users of those boards will need to check and modify accordingly
> - */
> -#define CLOCK_TICK_RATE	1193180 /* Underlying HZ */
> -#endif
> -
>   typedef unsigned long cycles_t;
>   
>   static inline cycles_t get_cycles(void)
> --- a/arch/microblaze/include/asm/timex.h
> +++ /dev/null
> @@ -1,13 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> -/*
> - * Copyright (C) 2006 Atmark Techno, Inc.
> - */
> -
> -#ifndef _ASM_MICROBLAZE_TIMEX_H
> -#define _ASM_MICROBLAZE_TIMEX_H
> -
> -#include <asm-generic/timex.h>
> -
> -#define CLOCK_TICK_RATE 1000 /* Timer input freq. */
> -
> -#endif /* _ASM_TIMEX_H */
> --- a/arch/mips/include/asm/timex.h
> +++ b/arch/mips/include/asm/timex.h
> @@ -19,14 +19,6 @@
>   #include <asm/cpu-type.h>
>   
>   /*
> - * This is the clock rate of the i8253 PIT.  A MIPS system may not have
> - * a PIT by the symbol is used all over the kernel including some APIs.
> - * So keeping it defined to the number for the PIT is the only sane thing
> - * for now.
> - */
> -#define CLOCK_TICK_RATE 1193182
> -
> -/*
>    * Standard way to access the cycle counter.
>    * Currently only used on SMP for scheduling.
>    *
> --- a/arch/openrisc/include/asm/timex.h
> +++ b/arch/openrisc/include/asm/timex.h
> @@ -25,9 +25,6 @@ static inline cycles_t get_cycles(void)
>   }
>   #define get_cycles get_cycles
>   
> -/* This isn't really used any more */
> -#define CLOCK_TICK_RATE 1000
> -
>   #define ARCH_HAS_READ_CURRENT_TIMER
>   
>   #endif
> --- a/arch/parisc/include/asm/timex.h
> +++ b/arch/parisc/include/asm/timex.h
> @@ -9,8 +9,6 @@
>   
>   #include <asm/special_insns.h>
>   
> -#define CLOCK_TICK_RATE	1193180 /* Underlying HZ */
> -
>   typedef unsigned long cycles_t;
>   
>   static inline cycles_t get_cycles(void)
> --- a/arch/powerpc/include/asm/timex.h
> +++ b/arch/powerpc/include/asm/timex.h
> @@ -11,8 +11,6 @@
>   #include <asm/cputable.h>
>   #include <asm/vdso/timebase.h>
>   
> -#define CLOCK_TICK_RATE	1024000 /* Underlying HZ */
> -
>   typedef unsigned long cycles_t;
>   
>   static inline cycles_t get_cycles(void)
> --- a/arch/s390/include/asm/timex.h
> +++ b/arch/s390/include/asm/timex.h
> @@ -177,8 +177,6 @@ static inline void local_tick_enable(uns
>   	set_clock_comparator(get_lowcore()->clock_comparator);
>   }
>   
> -#define CLOCK_TICK_RATE		1193180 /* Underlying HZ */
> -
>   typedef unsigned long cycles_t;
>   
>   static __always_inline unsigned long get_tod_clock(void)
> --- a/arch/sh/include/asm/timex.h
> +++ /dev/null
> @@ -1,24 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> -/*
> - * linux/include/asm-sh/timex.h
> - *
> - * sh architecture timex specifications
> - */
> -#ifndef __ASM_SH_TIMEX_H
> -#define __ASM_SH_TIMEX_H
> -
> -/*
> - * Only parts using the legacy CPG code for their clock framework
> - * implementation need to define their own Pclk value. If provided, this
> - * can be used for accurately setting CLOCK_TICK_RATE, otherwise we
> - * simply fall back on the i8253 PIT value.
> - */
> -#ifdef CONFIG_SH_PCLK_FREQ
> -#define CLOCK_TICK_RATE		(CONFIG_SH_PCLK_FREQ / 4) /* Underlying HZ */
> -#else
> -#define CLOCK_TICK_RATE		1193180
> -#endif
> -
> -#include <asm-generic/timex.h>
> -
> -#endif /* __ASM_SH_TIMEX_H */
> --- a/arch/sparc/include/asm/timex.h
> +++ b/arch/sparc/include/asm/timex.h
> @@ -4,6 +4,6 @@
>   #if defined(__sparc__) && defined(__arch64__)
>   #include <asm/timex_64.h>
>   #else
> -#include <asm/timex_32.h>
> +#include <asm-generic/timex.h>
>   #endif
>   #endif
> --- a/arch/sparc/include/asm/timex_32.h
> +++ /dev/null
> @@ -1,14 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> -/*
> - * linux/include/asm/timex.h
> - *
> - * sparc architecture timex specifications
> - */
> -#ifndef _ASMsparc_TIMEX_H
> -#define _ASMsparc_TIMEX_H
> -
> -#define CLOCK_TICK_RATE	1193180 /* Underlying HZ */
> -
> -#include <asm-generic/timex.h>
> -
> -#endif
> --- a/arch/sparc/include/asm/timex_64.h
> +++ b/arch/sparc/include/asm/timex_64.h
> @@ -9,8 +9,6 @@
>   
>   #include <asm/timer.h>
>   
> -#define CLOCK_TICK_RATE	1193180 /* Underlying HZ */
> -
>   /* Getting on the cycle counter on sparc64. */
>   typedef unsigned long cycles_t;
>   #define get_cycles()	tick_ops->get_tick()
> --- a/arch/um/include/asm/timex.h
> +++ /dev/null
> @@ -1,9 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> -#ifndef __UM_TIMEX_H
> -#define __UM_TIMEX_H
> -
> -#define CLOCK_TICK_RATE (HZ)
> -
> -#include <asm-generic/timex.h>
> -
> -#endif
> --- a/arch/x86/include/asm/timex.h
> +++ b/arch/x86/include/asm/timex.h
> @@ -14,9 +14,6 @@ static inline unsigned long random_get_e
>   }
>   #define random_get_entropy random_get_entropy
>   
> -/* Assume we use the PIT time source for the clock tick */
> -#define CLOCK_TICK_RATE		PIT_TICK_RATE
> -
>   #define ARCH_HAS_READ_CURRENT_TIMER
>   
>   #endif /* _ASM_X86_TIMEX_H */
> 
> 


^ permalink raw reply

* Re: [patch 07/38] treewide: Consolidate cycles_t
From: Christophe Leroy (CS GROUP) @ 2026-04-15  6:43 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Arnd Bergmann, x86, Lu Baolu, iommu, Michael Grzeschik, netdev,
	linux-wireless, Herbert Xu, linux-crypto, Vlastimil Babka,
	linux-mm, David Woodhouse, Bernie Thompson, linux-fbdev,
	Theodore Tso, linux-ext4, Andrew Morton, Uladzislau Rezki,
	Marco Elver, Dmitry Vyukov, kasan-dev, Andrey Ryabinin,
	Thomas Sailer, linux-hams, Jason A. Donenfeld, Richard Henderson,
	linux-alpha, Russell King, linux-arm-kernel, Catalin Marinas,
	Huacai Chen, loongarch, Geert Uytterhoeven, linux-m68k,
	Dinh Nguyen, Jonas Bonn, linux-openrisc, Helge Deller,
	linux-parisc, Michael Ellerman, linuxppc-dev, Paul Walmsley,
	linux-riscv, Heiko Carstens, linux-s390, David S. Miller,
	sparclinux
In-Reply-To: <20260410120318.045532623@kernel.org>



Le 10/04/2026 à 14:19, Thomas Gleixner a écrit :
> Most architectures define cycles_t as unsigned long except:
> 
>   - x86 requires it to be 64-bit independent of the 32-bit/64-bit build.
> 
>   - parisc and mips define it as unsigned int
> 
>     parisc has no real reason to do so as there are only a few usage sites
>     which either expand it to a 64-bit value or utilize only the lower
>     32bits.
> 
>     mips has no real requirement either.
> 
> Move the typedef to types.h and provide a config switch to enforce the
> 64-bit type for x86.
> 
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
> ---
>   arch/Kconfig                       |    4 ++++
>   arch/alpha/include/asm/timex.h     |    3 ---
>   arch/arm/include/asm/timex.h       |    1 -
>   arch/loongarch/include/asm/timex.h |    2 --
>   arch/m68k/include/asm/timex.h      |    2 --
>   arch/mips/include/asm/timex.h      |    2 --
>   arch/nios2/include/asm/timex.h     |    2 --
>   arch/parisc/include/asm/timex.h    |    2 --
>   arch/powerpc/include/asm/timex.h   |    4 +---
>   arch/riscv/include/asm/timex.h     |    2 --
>   arch/s390/include/asm/timex.h      |    2 --
>   arch/sparc/include/asm/timex_64.h  |    1 -
>   arch/x86/Kconfig                   |    1 +
>   arch/x86/include/asm/tsc.h         |    2 --
>   include/asm-generic/timex.h        |    1 -
>   include/linux/types.h              |    6 ++++++
>   16 files changed, 12 insertions(+), 25 deletions(-)
> 
> --- a/arch/powerpc/include/asm/timex.h
> +++ b/arch/powerpc/include/asm/timex.h
> @@ -11,9 +11,7 @@
>   #include <asm/cputable.h>
>   #include <asm/vdso/timebase.h>
>   
> -typedef unsigned long cycles_t;
> -
> -static inline cycles_t get_cycles(void)
> +ostatic inline cycles_t get_cycles(void)

What is 'ostatic' ?

>   {
>   	return mftb();
>   }
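
For context, the consolidation described in the changelog (one typedef in a shared header, with a switch that forces a 64-bit type) could look roughly like the sketch below. `HYPOTHETICAL_CYCLES_T_64` is a made-up macro standing in for the real Kconfig symbol, which is not visible in the quoted hunk, and `cycles_delta()` is only a hypothetical usage helper.

```c
#include <stdint.h>

/*
 * HYPOTHETICAL_CYCLES_T_64 is a placeholder for whatever config switch the
 * series introduces; it is not the real symbol name.
 */
#ifdef HYPOTHETICAL_CYCLES_T_64
typedef uint64_t cycles_t;	/* forced 64-bit, as x86 requires */
#else
typedef unsigned long cycles_t;	/* default used by most architectures */
#endif

/* Unsigned subtraction gives the right delta even across a wraparound. */
static cycles_t cycles_delta(cycles_t start, cycles_t end)
{
	return end - start;
}
```

With the typedef in a common header, per-architecture timex.h files such as the powerpc one quoted above only need to keep their get_cycles() implementation.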

^ permalink raw reply

* Re: [RFC PATCH 2/2] kernel/module: Decouple klp and ftrace from load_module
From: Song Chen @ 2026-04-15  6:43 UTC (permalink / raw)
  To: Petr Pavlu
  Cc: rafael, lenb, mturquette, sboyd, viresh.kumar, agk, snitzer,
	mpatocka, bmarzins, song, yukuai, linan122, jason.wessel, danielt,
	dianders, horms, davem, edumazet, kuba, pabeni, paulmck, frederic,
	mcgrof, da.gomez, samitolvanen, atomlin, jpoimboe, jikos, mbenes,
	pmladek, joe.lawrence, rostedt, mhiramat, mark.rutland,
	mathieu.desnoyers, linux-modules, linux-kernel,
	linux-trace-kernel, linux-acpi, linux-clk, linux-pm,
	live-patching, dm-devel, linux-raid, kgdb-bugreport, netdev
In-Reply-To: <1191caf5-6a61-4622-a15e-854d3701f4fc@suse.com>

Hi,

On 4/14/26 22:33, Petr Pavlu wrote:
> On 4/13/26 10:07 AM, chensong_2000@189.cn wrote:
>> From: Song Chen <chensong_2000@189.cn>
>>
>> ftrace and livepatch currently have their module load/unload callbacks
>> hard-coded in the module loader as direct function calls to
>> ftrace_module_enable(), klp_module_coming(), klp_module_going()
>> and ftrace_release_mod(). This tight coupling was originally introduced
>> to enforce strict call ordering that could not be guaranteed by the
>> module notifier chain, which only supported forward traversal. Their
>> notifiers were moved in and out back and forth. see [1] and [2].
> 
> I'm unclear about what is meant by the notifiers being moved back and
> forth. The links point to patches that converted ftrace+klp from using
> module notifiers to explicit callbacks due to ordering issues, but this
> switch occurred only once. Have there been other attempts to use
> notifiers again?
> 

Yes, only once. I will rephrase.

>>
>> Now that the notifier chain supports reverse traversal via
>> blocking_notifier_call_chain_reverse(), the ordering can be enforced
>> purely through notifier priority. As a result, the module loader is now
>> decoupled from the implementation details of ftrace and livepatch.
>> What's more, adding a new subsystem with symmetric setup/teardown ordering
>> requirements during module load/unload no longer requires modifying
>> kernel/module/main.c; it only needs to register a notifier_block with an
>> appropriate priority.
>>
>> [1]:https://lore.kernel.org/all/
>> 	alpine.LNX.2.00.1602172216491.22700@cbobk.fhfr.pm/
>> [2]:https://lore.kernel.org/all/
>> 	20160301030034.GC12120@packer-debian-8-amd64.digitalocean.com/
> 
> Nit: Avoid wrapping URLs, as it breaks autolinking and makes the links
> harder to copy.
> 
> Better links would be:
> [1] https://lore.kernel.org/all/1455661953-15838-1-git-send-email-jeyu@redhat.com/
> [2] https://lore.kernel.org/all/1458176139-17455-1-git-send-email-jeyu@redhat.com/
> 
> The first link is the final version of what landed as commit
> 7dcd182bec27 ("ftrace/module: remove ftrace module notifier"). The
> second is commit 7e545d6eca20 ("livepatch/module: remove livepatch
> module notifier").
> 

Thank you, I will update.

>>
>> Signed-off-by: Song Chen <chensong_2000@189.cn>
>> ---
>>   include/linux/module.h  |  8 ++++++++
>>   kernel/livepatch/core.c | 29 ++++++++++++++++++++++++++++-
>>   kernel/module/main.c    | 34 +++++++++++++++-------------------
>>   kernel/trace/ftrace.c   | 38 ++++++++++++++++++++++++++++++++++++++
>>   4 files changed, 89 insertions(+), 20 deletions(-)
>>
>> diff --git a/include/linux/module.h b/include/linux/module.h
>> index 14f391b186c6..0bdd56f9defd 100644
>> --- a/include/linux/module.h
>> +++ b/include/linux/module.h
>> @@ -308,6 +308,14 @@ enum module_state {
>>   	MODULE_STATE_COMING,	/* Full formed, running module_init. */
>>   	MODULE_STATE_GOING,	/* Going away. */
>>   	MODULE_STATE_UNFORMED,	/* Still setting it up. */
>> +	MODULE_STATE_FORMED,
> 
> I don't see a reason to add a new module state. Why is it necessary and
> how does it fit with the existing states?
> 
Because once a notifier fails in state MODULE_STATE_UNFORMED (currently
only ftrace has something to do in this state), the notifier chain rolls
back via blocking_notifier_call_chain_robust(). I'm afraid that using
MODULE_STATE_GOING for that rollback would jeopardise notifiers that
don't handle it appropriately, like:

case MODULE_STATE_COMING:
      kmalloc();
      break;
case MODULE_STATE_GOING:
      kfree();
      break;
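
To make the concern concrete, here is a minimal user-space sketch (not
kernel code). The names demo_notifier, demo_notify and the DEMO_*
constants are hypothetical stand-ins, and malloc()/free() stand in for
kmalloc()/kfree(). It models a notifier that assumes GOING is always
preceded by COMING and counts the GOING events that arrive without one:

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the module states discussed above. */
enum demo_state { DEMO_UNFORMED, DEMO_COMING, DEMO_GOING };

struct demo_notifier {
	void *buf;	/* allocated on COMING, released on GOING */
	int unpaired;	/* GOING events seen without a prior COMING */
};

static void demo_notify(struct demo_notifier *n, enum demo_state op)
{
	switch (op) {
	case DEMO_COMING:
		n->buf = malloc(16);
		break;
	case DEMO_GOING:
		if (!n->buf)
			n->unpaired++;	/* teardown without setup */
		free(n->buf);		/* free(NULL) is a no-op */
		n->buf = NULL;
		break;
	default:
		break;			/* DEMO_UNFORMED: nothing to do */
	}
}
```

In this sketch an unpaired GOING is only counted. A real notifier that
does more than free a pointer on GOING could misbehave in that case,
which is the situation I want to avoid.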


>> +};
>> +
>> +enum module_notifier_prio {
>> +	MODULE_NOTIFIER_PRIO_LOW = INT_MIN,	/* Low priority, coming last, going first */
>> +	MODULE_NOTIFIER_PRIO_MID = 0,	/* Normal priority. */
>> +	MODULE_NOTIFIER_PRIO_SECOND_HIGH = INT_MAX - 1,	/* Second high priority, coming second */
>> +	MODULE_NOTIFIER_PRIO_HIGH = INT_MAX,	/* High priority, coming first, going late. */
> 
> I suggest being explicit about how the notifiers are ordered. For
> example:
> 
> enum module_notifier_prio {
> 	MODULE_NOTIFIER_PRIO_NORMAL,	/* Normal priority, coming last, going first. */
> 	MODULE_NOTIFIER_PRIO_LIVEPATCH,
> 	MODULE_NOTIFIER_PRIO_FTRACE,	/* High priority, coming first, going late. */
> };
> 

Accepted.

>>   };
>>   
>>   struct mod_tree_node {
>> diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
>> index 28d15ba58a26..ce78bb23e24b 100644
>> --- a/kernel/livepatch/core.c
>> +++ b/kernel/livepatch/core.c
>> @@ -1375,13 +1375,40 @@ void *klp_find_section_by_name(const struct module *mod, const char *name,
>>   }
>>   EXPORT_SYMBOL_GPL(klp_find_section_by_name);
>>   
>> +static int klp_module_callback(struct notifier_block *nb, unsigned long op,
>> +			void *module)
>> +{
>> +	struct module *mod = module;
>> +	int err = 0;
>> +
>> +	switch (op) {
>> +	case MODULE_STATE_COMING:
>> +		err = klp_module_coming(mod);
>> +		break;
>> +	case MODULE_STATE_LIVE:
>> +		break;
>> +	case MODULE_STATE_GOING:
>> +		klp_module_going(mod);
>> +		break;
>> +	default:
>> +		break;
>> +	}
> 
> klp_module_coming() and klp_module_going() are now used only in
> kernel/livepatch/core.c where they are also defined. This means the
> functions can be static and their declarations removed from
> include/linux/livepatch.h.
> 
> Nit: The MODULE_STATE_LIVE and default cases in the switch can be
> removed.
> 

Accepted.

>> +
>> +	return notifier_from_errno(err);
>> +}
>> +
>> +static struct notifier_block klp_module_nb = {
>> +	.notifier_call = klp_module_callback,
>> +	.priority = MODULE_NOTIFIER_PRIO_SECOND_HIGH
>> +};
>> +
>>   static int __init klp_init(void)
>>   {
>>   	klp_root_kobj = kobject_create_and_add("livepatch", kernel_kobj);
>>   	if (!klp_root_kobj)
>>   		return -ENOMEM;
>>   
>> -	return 0;
>> +	return register_module_notifier(&klp_module_nb);
>>   }
>>   
>>   module_init(klp_init);
>> diff --git a/kernel/module/main.c b/kernel/module/main.c
>> index c3ce106c70af..226dd5b80997 100644
>> --- a/kernel/module/main.c
>> +++ b/kernel/module/main.c
>> @@ -833,10 +833,8 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
>>   	/* Final destruction now no one is using it. */
>>   	if (mod->exit != NULL)
>>   		mod->exit();
>> -	blocking_notifier_call_chain(&module_notify_list,
>> +	blocking_notifier_call_chain_reverse(&module_notify_list,
>>   				     MODULE_STATE_GOING, mod);
>> -	klp_module_going(mod);
>> -	ftrace_release_mod(mod);
>>   
>>   	async_synchronize_full();
>>   
>> @@ -3135,10 +3133,8 @@ static noinline int do_init_module(struct module *mod)
>>   	mod->state = MODULE_STATE_GOING;
>>   	synchronize_rcu();
>>   	module_put(mod);
>> -	blocking_notifier_call_chain(&module_notify_list,
>> +	blocking_notifier_call_chain_reverse(&module_notify_list,
>>   				     MODULE_STATE_GOING, mod);
>> -	klp_module_going(mod);
>> -	ftrace_release_mod(mod);
>>   	free_module(mod);
>>   	wake_up_all(&module_wq);
>>   
> 
> The patch unexpectedly leaves a call to ftrace_free_mem() in
> do_init_module().

Thanks for pointing it out. It was removed when I implemented and tested
the change, but it was left in when I organized the patch. I will remove it.

> 
>> @@ -3281,20 +3277,14 @@ static int complete_formation(struct module *mod, struct load_info *info)
>>   	return err;
>>   }
>>   
>> -static int prepare_coming_module(struct module *mod)
>> +static int prepare_module_state_transaction(struct module *mod,
>> +			unsigned long val_up, unsigned long val_down)
>>   {
>>   	int err;
>>   
>> -	ftrace_module_enable(mod);
>> -	err = klp_module_coming(mod);
>> -	if (err)
>> -		return err;
>> -
>>   	err = blocking_notifier_call_chain_robust(&module_notify_list,
>> -			MODULE_STATE_COMING, MODULE_STATE_GOING, mod);
>> +			val_up, val_down, mod);
>>   	err = notifier_to_errno(err);
>> -	if (err)
>> -		klp_module_going(mod);
>>   
>>   	return err;
>>   }
>> @@ -3468,14 +3458,21 @@ static int load_module(struct load_info *info, const char __user *uargs,
>>   	init_build_id(mod, info);
>>   
>>   	/* Ftrace init must be called in the MODULE_STATE_UNFORMED state */
>> -	ftrace_module_init(mod);
>> +	err = prepare_module_state_transaction(mod,
>> +				MODULE_STATE_UNFORMED, MODULE_STATE_FORMED);
> 
> I believe val_down should be MODULE_STATE_GOING to reverse the
> operation. Why is the new state MODULE_STATE_FORMED needed here?
To avoid this:

case MODULE_STATE_COMING:
      kmalloc();
      break;
case MODULE_STATE_GOING:
      kfree();
      break;



> 
>> +	if (err)
>> +		goto ddebug_cleanup;
>>   
>>   	/* Finally it's fully formed, ready to start executing. */
>>   	err = complete_formation(mod, info);
>> -	if (err)
>> +	if (err) {
>> +		blocking_notifier_call_chain_reverse(&module_notify_list,
>> +				MODULE_STATE_FORMED, mod);
>>   		goto ddebug_cleanup;
>> +	}
>>   
>> -	err = prepare_coming_module(mod);
>> +	err = prepare_module_state_transaction(mod,
>> +				MODULE_STATE_COMING, MODULE_STATE_GOING);
>>   	if (err)
>>   		goto bug_cleanup;
>>   
>> @@ -3522,7 +3519,6 @@ static int load_module(struct load_info *info, const char __user *uargs,
>>   	destroy_params(mod->kp, mod->num_kp);
>>   	blocking_notifier_call_chain(&module_notify_list,
>>   				     MODULE_STATE_GOING, mod);
> 
> My understanding is that all notifier chains for MODULE_STATE_GOING
> should be reversed.
Yes, all of them, from the lowest-priority notifier to the highest.
I will resend patch 1, which failed to send due to my proxy settings.

> 
>> -	klp_module_going(mod);
>>    bug_cleanup:
>>   	mod->state = MODULE_STATE_GOING;
>>   	/* module_bug_cleanup needs module_mutex protection */
> 
> The patch removes the klp_module_going() cleanup call in load_module().
> Similarly, the ftrace_release_mod() call under the ddebug_cleanup label
> should be removed and appropriately replaced with a cleanup via
> a notifier.
> 
     err = prepare_module_state_transaction(mod,
                 MODULE_STATE_UNFORMED, MODULE_STATE_FORMED);
     if (err)
         goto ddebug_cleanup;

ftrace will be cleaned up by the blocking_notifier_call_chain_robust()
rollback.

     err = prepare_module_state_transaction(mod,
                 MODULE_STATE_COMING, MODULE_STATE_GOING);

Each notifier, including ftrace and klp, will be cleaned up by the
blocking_notifier_call_chain_robust() rollback.

If all notifiers succeed in MODULE_STATE_COMING, they will all be
cleaned up in
  coming_cleanup:
     mod->state = MODULE_STATE_GOING;
     destroy_params(mod->kp, mod->num_kp);
     blocking_notifier_call_chain(&module_notify_list,
                      MODULE_STATE_GOING, mod);

if something goes wrong later.

>> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
>> index 8df69e702706..efedb98d3db4 100644
>> --- a/kernel/trace/ftrace.c
>> +++ b/kernel/trace/ftrace.c
>> @@ -5241,6 +5241,44 @@ static int __init ftrace_mod_cmd_init(void)
>>   }
>>   core_initcall(ftrace_mod_cmd_init);
>>   
>> +static int ftrace_module_callback(struct notifier_block *nb, unsigned long op,
>> +			void *module)
>> +{
>> +	struct module *mod = module;
>> +
>> +	switch (op) {
>> +	case MODULE_STATE_UNFORMED:
>> +		ftrace_module_init(mod);
>> +		break;
>> +	case MODULE_STATE_COMING:
>> +		ftrace_module_enable(mod);
>> +		break;
>> +	case MODULE_STATE_LIVE:
>> +		ftrace_free_mem(mod, mod->mem[MOD_INIT_TEXT].base,
>> +				mod->mem[MOD_INIT_TEXT].base + mod->mem[MOD_INIT_TEXT].size);
>> +		break;
>> +	case MODULE_STATE_GOING:
>> +	case MODULE_STATE_FORMED:
>> +		ftrace_release_mod(mod);
>> +		break;
>> +	default:
>> +		break;
>> +	}
> 
> ftrace_module_init(), ftrace_module_enable(), ftrace_free_mem() and
> ftrace_release_mod() should be newly used only in kernel/trace/ftrace.c
> where they are also defined. The functions can then be made static and
> removed from include/linux/ftrace.h.
> 
> Nit: The default case in the switch can be removed.
> 

Accepted.

>> +
>> +	return notifier_from_errno(0);
> 
> Nit: This can be simply "return NOTIFY_OK;".

Accepted.
> 
>> +}
>> +
>> +static struct notifier_block ftrace_module_nb = {
>> +	.notifier_call = ftrace_module_callback,
>> +	.priority = MODULE_NOTIFIER_PRIO_HIGH
>> +};
>> +
>> +static int __init ftrace_register_module_notifier(void)
>> +{
>> +	return register_module_notifier(&ftrace_module_nb);
>> +}
>> +core_initcall(ftrace_register_module_notifier);
>> +
>>   static void function_trace_probe_call(unsigned long ip, unsigned long parent_ip,
>>   				      struct ftrace_ops *op, struct ftrace_regs *fregs)
>>   {
> 

Best regards

Song


^ permalink raw reply

* [PATCH] llc: Return -EINPROGRESS from llc_ui_connect()
From: Ernestas Kulik @ 2026-04-15  6:34 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, Ernestas Kulik

Given a zero sk_sndtimeo, llc_ui_connect() skips waiting for state
change and returns 0, confusing userspace applications that will assume
the socket is connected, making e.g. getpeername() calls error out.

Set rc to -EINPROGRESS before considering blocking, akin to AF_INET
sockets.
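
For illustration only, a caller that sees EINPROGRESS would typically
wait for completion roughly as sketched below. This is user-space code
and not part of the patch: wait_for_connect() is a hypothetical helper,
and the sketch assumes that AF_LLC sockets report completion through
poll() and SO_ERROR the way AF_INET sockets do, which this patch does
not itself establish.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/*
 * Wait for a connect() that failed with errno == EINPROGRESS to finish.
 * Returns 0 on success or a negative errno value on failure/timeout.
 */
static int wait_for_connect(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	int err = 0;
	socklen_t len = sizeof(err);
	int n;

	/* Retry if a signal interrupts the wait. */
	do {
		n = poll(&pfd, 1, timeout_ms);
	} while (n < 0 && errno == EINTR);

	if (n < 0)
		return -errno;
	if (n == 0)
		return -ETIMEDOUT;

	/* Writability only means the attempt finished; SO_ERROR says how. */
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return -errno;

	return -err;
}
```

With the old return value of 0, a caller has no reason to take this
path and would treat the socket as connected right away.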

Signed-off-by: Ernestas Kulik <ernestas.k@iconn-networks.com>
---
 net/llc/af_llc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 59d593bb5d18..9317d092ba84 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -515,10 +515,12 @@ static int llc_ui_connect(struct socket *sock, struct sockaddr_unsized *uaddr,
 		sock->state  = SS_UNCONNECTED;
 		sk->sk_state = TCP_CLOSE;
 		goto out;
 	}
 
+	rc = -EINPROGRESS;
+
 	if (sk->sk_state == TCP_SYN_SENT) {
 		const long timeo = sock_sndtimeo(sk, flags & O_NONBLOCK);
 
 		if (!timeo || !llc_ui_wait_for_conn(sk, timeo))
 			goto out;
-- 
2.53.0


^ permalink raw reply related

* Re: [patch 33/38] powerpc: Select ARCH_HAS_RANDOM_ENTROPY
From: Christophe Leroy (CS GROUP) @ 2026-04-15  6:47 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Michael Ellerman, linuxppc-dev, Arnd Bergmann, x86, Lu Baolu,
	iommu, Michael Grzeschik, netdev, linux-wireless, Herbert Xu,
	linux-crypto, Vlastimil Babka, linux-mm, David Woodhouse,
	Bernie Thompson, linux-fbdev, Theodore Tso, linux-ext4,
	Andrew Morton, Uladzislau Rezki, Marco Elver, Dmitry Vyukov,
	kasan-dev, Andrey Ryabinin, Thomas Sailer, linux-hams,
	Jason A. Donenfeld, Richard Henderson, linux-alpha, Russell King,
	linux-arm-kernel, Catalin Marinas, Huacai Chen, loongarch,
	Geert Uytterhoeven, linux-m68k, Dinh Nguyen, Jonas Bonn,
	linux-openrisc, Helge Deller, linux-parisc, Paul Walmsley,
	linux-riscv, Heiko Carstens, linux-s390, David S. Miller,
	sparclinux
In-Reply-To: <20260410120319.789114053@kernel.org>



Le 10/04/2026 à 14:21, Thomas Gleixner a écrit :
> The only remaining usage of get_cycles() is to provide random_get_entropy().
> 
> Switch powerpc over to the new scheme of selecting ARCH_HAS_RANDOM_ENTROPY
> and providing random_get_entropy() in asm/random.h.
> 
> Remove asm/timex.h as it has no functionality anymore.
> 
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: linuxppc-dev@lists.ozlabs.org

Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>

> ---
>   arch/powerpc/Kconfig              |    1 +
>   arch/powerpc/include/asm/random.h |   13 +++++++++++++
>   arch/powerpc/include/asm/timex.h  |   21 ---------------------
>   3 files changed, 14 insertions(+), 21 deletions(-)
> 
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -150,6 +150,7 @@ config PPC
>   	select ARCH_HAS_PREEMPT_LAZY
>   	select ARCH_HAS_PTDUMP
>   	select ARCH_HAS_PTE_SPECIAL
> +	select ARCH_HAS_RANDOM_ENTROPY
>   	select ARCH_HAS_SCALED_CPUTIME		if VIRT_CPU_ACCOUNTING_NATIVE && PPC_BOOK3S_64
>   	select ARCH_HAS_SET_MEMORY
>   	select ARCH_HAS_STRICT_KERNEL_RWX	if (PPC_BOOK3S || PPC_8xx) && !HIBERNATION
> --- /dev/null
> +++ b/arch/powerpc/include/asm/random.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_POWERPC_RANDOM_H
> +#define _ASM_POWERPC_RANDOM_H
> +
> +#include <asm/cputable.h>
> +#include <asm/vdso/timebase.h>
> +
> +static inline unsigned long random_get_entropy(void)
> +{
> +	return mftb();
> +}
> +
> +#endif	/* _ASM_POWERPC_RANDOM_H */
> --- a/arch/powerpc/include/asm/timex.h
> +++ b/arch/powerpc/include/asm/timex.h
> @@ -1,21 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> -#ifndef _ASM_POWERPC_TIMEX_H
> -#define _ASM_POWERPC_TIMEX_H
> -
> -#ifdef __KERNEL__
> -
> -/*
> - * PowerPC architecture timex specifications
> - */
> -
> -#include <asm/cputable.h>
> -#include <asm/vdso/timebase.h>
> -
> -static inline cycles_t get_cycles(void)
> -{
> -	return mftb();
> -}
> -#define get_cycles get_cycles
> -
> -#endif	/* __KERNEL__ */
> -#endif	/* _ASM_POWERPC_TIMEX_H */
> 
> 


^ permalink raw reply

* Re: [PATCH net v2 4/4] net: mana: Fix EQ leak in mana_remove on NULL port
From: Erni Sri Satya Vennela @ 2026-04-15  7:01 UTC (permalink / raw)
  To: Simon Horman
  Cc: pabeni, edumazet, ssengar, dipayanroy, gargaditya, decui,
	shacharr, linux-hyperv, kotaranov, kys, shirazsaleem, longli,
	kuba, andrew+netdev, wei.liu, linux-kernel, netdev, haiyangz,
	kees, stephen, leon, davem
In-Reply-To: <20260414154058.771241-1-horms@kernel.org>

On Tue, Apr 14, 2026 at 04:40:58PM +0100, Simon Horman wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> 
> For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
> ---
> net: mana: Fix EQ leak in mana_remove on NULL port
> 
> In mana_remove(), when a NULL port is encountered in the port iteration
> loop, 'goto out' skips the mana_destroy_eq(ac) call, leaking the event
> queues allocated earlier by mana_create_eq(). This can happen when
> mana_probe_port() fails for port 0, leaving ac->ports[0] as NULL. On
> driver unload or error cleanup, mana_remove() hits the NULL entry and
> jumps past mana_destroy_eq(). Change 'goto out' to 'break' so the
> for-loop exits normally and mana_destroy_eq() is always reached.
> 
> > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
> 
> Does this Fixes tag point to the correct commit? Looking at the commit
> history, the original driver commit ca9c54d2d6a5 did not have
> mana_destroy_eq() in mana_remove() at all, so the 'goto out' pattern could
> not have caused an EQ leak at that point.
> 
> The bug appears to have been introduced by commit 1e2d0824a9c3 ("net: mana:
> Add support for EQ sharing"), which added mana_destroy_eq(ac) to
> mana_remove() and placed it after the 'out:' label, making it skippable by
> the 'goto out' on NULL port.
> 
> Should the Fixes tag be:
>   Fixes: 1e2d0824a9c3 ("net: mana: Add support for EQ sharing")
Thank you for the correction, Simon.
I will make this change in the next version of the patchset.

- Vennela

^ permalink raw reply

* [RFC PATCH 1/2] kernel/notifier: replace single-linked list with double-linked list for reverse traversal
From: chensong_2000 @ 2026-04-15  7:01 UTC (permalink / raw)
  To: rafael, lenb, mturquette, sboyd, viresh.kumar, agk, snitzer,
	mpatocka, bmarzins, song, yukuai, linan122, jason.wessel, danielt,
	dianders, horms, davem, edumazet, kuba, pabeni, paulmck, frederic,
	mcgrof, petr.pavlu, da.gomez, samitolvanen, atomlin, jpoimboe,
	jikos, mbenes, pmladek, joe.lawrence, rostedt, mhiramat,
	mark.rutland, mathieu.desnoyers
  Cc: linux-modules, linux-kernel, linux-trace-kernel, linux-acpi,
	linux-clk, linux-pm, live-patching, dm-devel, linux-raid,
	kgdb-bugreport, netdev, Song Chen

From: Song Chen <chensong_2000@189.cn>

The current notifier chain implementation uses a singly-linked list
(struct notifier_block *next), which only supports forward traversal
in priority order. This makes it difficult to handle cleanup/teardown
scenarios that require notifiers to be called in reverse priority order.

A concrete example is the ordering dependency between ftrace and
livepatch during module load/unload; see [1] for details.

This patch replaces the singly-linked list in struct notifier_block
with a struct list_head, converting the notifier chain into a
doubly-linked list sorted in descending priority order. Based on
this, a new function notifier_call_chain_reverse() is introduced,
which traverses the chain in reverse (ascending priority order).
The corresponding blocking_notifier_call_chain_reverse() is also
added as the locking wrapper for blocking notifier chains.
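
As a rough user-space illustration of the idea (not the kernel code in
this patch), the sketch below keeps entries sorted in descending
priority on a circular doubly-linked list and walks it in either
direction. struct demo_nb and the demo_* helpers are hypothetical names
that stand in for struct notifier_block and the list_head helpers.

```c
#include <stddef.h>

struct demo_nb {
	int priority;
	struct demo_nb *prev, *next;	/* circular list links */
};

/* Empty chain: the head points at itself, like INIT_LIST_HEAD(). */
static void demo_chain_init(struct demo_nb *head)
{
	head->prev = head->next = head;
}

/* Insert n so that priorities stay in descending order from the head. */
static void demo_register(struct demo_nb *head, struct demo_nb *n)
{
	struct demo_nb *cur = head->next;

	/* Skip entries with higher or equal priority. */
	while (cur != head && cur->priority >= n->priority)
		cur = cur->next;

	/* Link n immediately before cur (at the tail if cur == head). */
	n->prev = cur->prev;
	n->next = cur;
	cur->prev->next = n;
	cur->prev = n;
}

/* Record visited priorities; forward = descending, reverse = ascending. */
static int demo_walk(struct demo_nb *head, int reverse, int *out, int max)
{
	struct demo_nb *cur = reverse ? head->prev : head->next;
	int n = 0;

	while (cur != head && n < max) {
		out[n++] = cur->priority;
		cur = reverse ? cur->prev : cur->next;
	}
	return n;
}
```

Registering priorities 0, 10 and -5 in that order gives a forward walk
of 10, 0, -5 and a reverse walk of -5, 0, 10, with no second
registration needed for the reverse direction.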

The internal notifier_call_chain_robust() is updated to use
notifier_call_chain_reverse() for rollback: on error, it records
the failing notifier (last_nb) and the count of successfully called
notifiers (nr), then rolls back exactly those nr-1 notifiers in
reverse order starting from last_nb's predecessor, without needing
to know the total length of the chain.
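
The rollback described above can be sketched in user space as follows.
This is a simplified model, not the kernel implementation: struct
demo_nb, demo_call_robust(), the DEMO_* events and the logging
callbacks are all hypothetical, and a nonzero return stands in for
NOTIFY_STOP_MASK.

```c
#include <stddef.h>

enum { DEMO_UP = 1, DEMO_DOWN = 2 };

struct demo_nb {
	int (*call)(struct demo_nb *nb, int event);	/* nonzero = failure */
	int id;
	struct demo_nb *prev, *next;			/* circular list links */
};

/* Delivery log: each entry is id * 10 + event. */
static int demo_log[16];
static int demo_log_len;

static void demo_record(struct demo_nb *nb, int event)
{
	if (demo_log_len < 16)
		demo_log[demo_log_len++] = nb->id * 10 + event;
}

/* Example callback that always succeeds. */
static int demo_ok(struct demo_nb *nb, int event)
{
	demo_record(nb, event);
	return 0;
}

/* Example callback that rejects DEMO_UP. */
static int demo_fail(struct demo_nb *nb, int event)
{
	demo_record(nb, event);
	return event == DEMO_UP ? -1 : 0;
}

/* Append n at the tail of the circular list anchored at head. */
static void demo_add_tail(struct demo_nb *head, struct demo_nb *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Deliver ev_up in list order. If a callback fails, deliver ev_down to
 * the callbacks that already succeeded, walking backwards from the
 * predecessor of the failing one. Returns 0 or the failing value.
 */
static int demo_call_robust(struct demo_nb *head, int ev_up, int ev_down)
{
	struct demo_nb *nb, *last_nb = NULL;
	int ret = 0, nr = 0;

	for (nb = head->next; nb != head; nb = nb->next) {
		ret = nb->call(nb, ev_up);
		nr++;
		last_nb = nb;
		if (ret)
			break;
	}
	if (!ret)
		return 0;

	/* nr - 1 callbacks succeeded before last_nb failed. */
	for (nb = last_nb->prev; nb != head && nr > 1; nb = nb->prev, nr--)
		nb->call(nb, ev_down);

	return ret;
}
```

With four entries where the third fails, the first two receive DEMO_UP,
the third fails, the second and then the first receive DEMO_DOWN, and
the fourth is never called.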

With this change, subsystems with symmetric setup/teardown ordering
requirements can register a single notifier_block with one priority
value, and rely on blocking_notifier_call_chain() for forward
traversal and blocking_notifier_call_chain_reverse() for reverse
traversal, without needing hard-coded call sequences or separate
notifier registrations for each direction.

[1]: https://lore.kernel.org/all/alpine.LNX.2.00.1602172216491.22700@cbobk.fhfr.pm/

Signed-off-by: Song Chen <chensong_2000@189.cn>
---
 drivers/acpi/sleep.c      |   1 -
 drivers/clk/clk.c         |   2 +-
 drivers/cpufreq/cpufreq.c |   2 +-
 drivers/md/dm-integrity.c |   1 -
 drivers/md/md.c           |   1 -
 include/linux/notifier.h  |  26 ++---
 kernel/debug/debug_core.c |   1 -
 kernel/notifier.c         | 219 ++++++++++++++++++++++++++++++++------
 net/ipv4/nexthop.c        |   2 +-
 9 files changed, 201 insertions(+), 54 deletions(-)

diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c
index 132a9df98471..b776dbd5a382 100644
--- a/drivers/acpi/sleep.c
+++ b/drivers/acpi/sleep.c
@@ -56,7 +56,6 @@ static int tts_notify_reboot(struct notifier_block *this,
 
 static struct notifier_block tts_notifier = {
 	.notifier_call	= tts_notify_reboot,
-	.next		= NULL,
 	.priority	= 0,
 };
 
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 47093cda9df3..b6fe380d0468 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -4862,7 +4862,7 @@ int clk_notifier_unregister(struct clk *clk, struct notifier_block *nb)
 			clk->core->notifier_count--;
 
 			/* XXX the notifier code should handle this better */
-			if (!cn->notifier_head.head) {
+			if (list_empty(&cn->notifier_head.head)) {
 				srcu_cleanup_notifier_head(&cn->notifier_head);
 				list_del(&cn->node);
 				kfree(cn);
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 277884d91913..12637e742ffa 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -445,7 +445,7 @@ static void cpufreq_list_transition_notifiers(void)
 
 	mutex_lock(&cpufreq_transition_notifier_list.mutex);
 
-	for (nb = cpufreq_transition_notifier_list.head; nb; nb = nb->next)
+	list_for_each_entry(nb, &cpufreq_transition_notifier_list.head, entry)
 		pr_info("%pS\n", nb->notifier_call);
 
 	mutex_unlock(&cpufreq_transition_notifier_list.mutex);
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
index 06e805902151..ccdf75c40b62 100644
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -3909,7 +3909,6 @@ static void dm_integrity_resume(struct dm_target *ti)
 	}
 
 	ic->reboot_notifier.notifier_call = dm_integrity_reboot;
-	ic->reboot_notifier.next = NULL;
 	ic->reboot_notifier.priority = INT_MAX - 1;	/* be notified after md and before hardware drivers */
 	WARN_ON(register_reboot_notifier(&ic->reboot_notifier));
 
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3ce6f9e9d38e..8249e78636ab 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -10480,7 +10480,6 @@ static int md_notify_reboot(struct notifier_block *this,
 
 static struct notifier_block md_notifier = {
 	.notifier_call	= md_notify_reboot,
-	.next		= NULL,
 	.priority	= INT_MAX, /* before any real devices */
 };
 
diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index 01b6c9d9956f..b2abbdfcaadd 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -53,41 +53,41 @@ typedef	int (*notifier_fn_t)(struct notifier_block *nb,
 
 struct notifier_block {
 	notifier_fn_t notifier_call;
-	struct notifier_block __rcu *next;
+	struct list_head __rcu entry;
 	int priority;
 };
 
 struct atomic_notifier_head {
 	spinlock_t lock;
-	struct notifier_block __rcu *head;
+	struct list_head __rcu head;
 };
 
 struct blocking_notifier_head {
 	struct rw_semaphore rwsem;
-	struct notifier_block __rcu *head;
+	struct list_head __rcu head;
 };
 
 struct raw_notifier_head {
-	struct notifier_block __rcu *head;
+	struct list_head __rcu head;
 };
 
 struct srcu_notifier_head {
 	struct mutex mutex;
 	struct srcu_usage srcuu;
 	struct srcu_struct srcu;
-	struct notifier_block __rcu *head;
+	struct list_head __rcu head;
 };
 
 #define ATOMIC_INIT_NOTIFIER_HEAD(name) do {	\
 		spin_lock_init(&(name)->lock);	\
-		(name)->head = NULL;		\
+		INIT_LIST_HEAD(&(name)->head);		\
 	} while (0)
 #define BLOCKING_INIT_NOTIFIER_HEAD(name) do {	\
 		init_rwsem(&(name)->rwsem);	\
-		(name)->head = NULL;		\
+		INIT_LIST_HEAD(&(name)->head);		\
 	} while (0)
 #define RAW_INIT_NOTIFIER_HEAD(name) do {	\
-		(name)->head = NULL;		\
+		INIT_LIST_HEAD(&(name)->head);		\
 	} while (0)
 
 /* srcu_notifier_heads must be cleaned up dynamically */
@@ -97,17 +97,17 @@ extern void srcu_init_notifier_head(struct srcu_notifier_head *nh);
 
 #define ATOMIC_NOTIFIER_INIT(name) {				\
 		.lock = __SPIN_LOCK_UNLOCKED(name.lock),	\
-		.head = NULL }
+		.head = LIST_HEAD_INIT((name).head) }
 #define BLOCKING_NOTIFIER_INIT(name) {				\
 		.rwsem = __RWSEM_INITIALIZER((name).rwsem),	\
-		.head = NULL }
+		.head = LIST_HEAD_INIT((name).head) }
 #define RAW_NOTIFIER_INIT(name)	{				\
-		.head = NULL }
+		.head = LIST_HEAD_INIT((name).head) }
 
 #define SRCU_NOTIFIER_INIT(name, pcpu)				\
 	{							\
 		.mutex = __MUTEX_INITIALIZER(name.mutex),	\
-		.head = NULL,					\
+		.head = LIST_HEAD_INIT((name).head),					\
 		.srcuu = __SRCU_USAGE_INIT(name.srcuu),		\
 		.srcu = __SRCU_STRUCT_INIT(name.srcu, name.srcuu, pcpu, 0), \
 	}
@@ -170,6 +170,8 @@ extern int atomic_notifier_call_chain(struct atomic_notifier_head *nh,
 		unsigned long val, void *v);
 extern int blocking_notifier_call_chain(struct blocking_notifier_head *nh,
 		unsigned long val, void *v);
+extern int blocking_notifier_call_chain_reverse(struct blocking_notifier_head *nh,
+		unsigned long val, void *v);
 extern int raw_notifier_call_chain(struct raw_notifier_head *nh,
 		unsigned long val, void *v);
 extern int srcu_notifier_call_chain(struct srcu_notifier_head *nh,
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 0b9495187fba..a26a7683d142 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -1054,7 +1054,6 @@ dbg_notify_reboot(struct notifier_block *this, unsigned long code, void *x)
 
 static struct notifier_block dbg_reboot_notifier = {
 	.notifier_call		= dbg_notify_reboot,
-	.next			= NULL,
 	.priority		= INT_MAX,
 };
 
diff --git a/kernel/notifier.c b/kernel/notifier.c
index 2f9fe7c30287..6f4d887771c4 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -14,39 +14,47 @@
  *	are layered on top of these, with appropriate locking added.
  */
 
-static int notifier_chain_register(struct notifier_block **nl,
+static int notifier_chain_register(struct list_head *nl,
 				   struct notifier_block *n,
 				   bool unique_priority)
 {
-	while ((*nl) != NULL) {
-		if (unlikely((*nl) == n)) {
+	struct notifier_block *cur;
+
+	list_for_each_entry(cur, nl, entry) {
+		if (unlikely(cur == n)) {
 			WARN(1, "notifier callback %ps already registered",
 			     n->notifier_call);
 			return -EEXIST;
 		}
-		if (n->priority > (*nl)->priority)
-			break;
-		if (n->priority == (*nl)->priority && unique_priority)
+
+		if (n->priority == cur->priority && unique_priority)
 			return -EBUSY;
-		nl = &((*nl)->next);
+
+		if (n->priority > cur->priority) {
+			list_add_tail(&n->entry, &cur->entry);
+			goto out;
+		}
 	}
-	n->next = *nl;
-	rcu_assign_pointer(*nl, n);
+
+	list_add_tail(&n->entry, nl);
+out:
 	trace_notifier_register((void *)n->notifier_call);
 	return 0;
 }
 
-static int notifier_chain_unregister(struct notifier_block **nl,
+static int notifier_chain_unregister(struct list_head *nl,
 		struct notifier_block *n)
 {
-	while ((*nl) != NULL) {
-		if ((*nl) == n) {
-			rcu_assign_pointer(*nl, n->next);
+	struct notifier_block *cur;
+
+	list_for_each_entry(cur, nl, entry) {
+		if (cur == n) {
+			list_del(&n->entry);
 			trace_notifier_unregister((void *)n->notifier_call);
 			return 0;
 		}
-		nl = &((*nl)->next);
 	}
+
 	return -ENOENT;
 }
 
@@ -59,25 +67,25 @@ static int notifier_chain_unregister(struct notifier_block **nl,
  *			value of this parameter is -1.
  *	@nr_calls:	Records the number of notifications sent. Don't care
  *			value of this field is NULL.
+ *	@last_nb:  Records the last called notifier block for rolling back
  *	Return:		notifier_call_chain returns the value returned by the
  *			last notifier function called.
  */
-static int notifier_call_chain(struct notifier_block **nl,
+static int notifier_call_chain(struct list_head *nl,
 			       unsigned long val, void *v,
-			       int nr_to_call, int *nr_calls)
+			       int nr_to_call, int *nr_calls,
+				   struct notifier_block **last_nb)
 {
 	int ret = NOTIFY_DONE;
-	struct notifier_block *nb, *next_nb;
-
-	nb = rcu_dereference_raw(*nl);
+	struct notifier_block *nb;
 
-	while (nb && nr_to_call) {
-		next_nb = rcu_dereference_raw(nb->next);
+	if (!nr_to_call)
+		return ret;
 
+	list_for_each_entry(nb, nl, entry) {
 #ifdef CONFIG_DEBUG_NOTIFIERS
 		if (unlikely(!func_ptr_is_kernel_text(nb->notifier_call))) {
 			WARN(1, "Invalid notifier called!");
-			nb = next_nb;
 			continue;
 		}
 #endif
@@ -87,15 +95,118 @@ static int notifier_call_chain(struct notifier_block **nl,
 		if (nr_calls)
 			(*nr_calls)++;
 
+		if (last_nb)
+			*last_nb = nb;
+
 		if (ret & NOTIFY_STOP_MASK)
 			break;
-		nb = next_nb;
-		nr_to_call--;
+
+		if (nr_to_call-- == 0)
+			break;
 	}
 	return ret;
 }
 NOKPROBE_SYMBOL(notifier_call_chain);
 
+/**
+ * notifier_call_chain_reverse - Informs the registered notifiers
+ *			about an event reversely.
+ *	@nl:		Pointer to head of the blocking notifier chain
+ *	@val:		Value passed unmodified to notifier function
+ *	@v:		Pointer passed unmodified to notifier function
+ *	@nr_to_call:	Number of notifier functions to be called. Don't care
+ *			value of this parameter is -1.
+ *	@nr_calls:	Records the number of notifications sent. Don't care
+ *			value of this field is NULL.
+ *	Return:		notifier_call_chain returns the value returned by the
+ *			last notifier function called.
+ */
+static int notifier_call_chain_reverse(struct list_head *nl,
+					struct notifier_block *start,
+					unsigned long val, void *v,
+					int nr_to_call, int *nr_calls)
+{
+	int ret = NOTIFY_DONE;
+	struct notifier_block *nb;
+	bool do_call = (start == NULL);
+
+	if (!nr_to_call)
+		return ret;
+
+	list_for_each_entry_reverse(nb, nl, entry) {
+		if (!do_call) {
+			if (nb == start)
+				do_call = true;
+			continue;
+		}
+#ifdef CONFIG_DEBUG_NOTIFIERS
+		if (unlikely(!func_ptr_is_kernel_text(nb->notifier_call))) {
+			WARN(1, "Invalid notifier called!");
+			continue;
+		}
+#endif
+		trace_notifier_run((void *)nb->notifier_call);
+		ret = nb->notifier_call(nb, val, v);
+
+		if (nr_calls)
+			(*nr_calls)++;
+
+		if (ret & NOTIFY_STOP_MASK)
+			break;
+
+		if (nr_to_call-- == 0)
+			break;
+	}
+	return ret;
+}
+NOKPROBE_SYMBOL(notifier_call_chain_reverse);
+
+/**
+ * notifier_call_chain_rcu - Informs the registered notifiers
+ *			about an event for srcu notifier chain.
+ *	@nl:		Pointer to head of the blocking notifier chain
+ *	@val:		Value passed unmodified to notifier function
+ *	@v:		Pointer passed unmodified to notifier function
+ *	@nr_to_call:	Number of notifier functions to be called. Don't care
+ *			value of this parameter is -1.
+ *	@nr_calls:	Records the number of notifications sent. Don't care
+ *			value of this field is NULL.
+ *	Return:		notifier_call_chain returns the value returned by the
+ *			last notifier function called.
+ */
+static int notifier_call_chain_rcu(struct list_head *nl,
+			       unsigned long val, void *v,
+			       int nr_to_call, int *nr_calls)
+{
+	int ret = NOTIFY_DONE;
+	struct notifier_block *nb;
+
+	if (!nr_to_call)
+		return ret;
+
+	list_for_each_entry_rcu(nb, nl, entry) {
+#ifdef CONFIG_DEBUG_NOTIFIERS
+		if (unlikely(!func_ptr_is_kernel_text(nb->notifier_call))) {
+			WARN(1, "Invalid notifier called!");
+			continue;
+		}
+#endif
+		trace_notifier_run((void *)nb->notifier_call);
+		ret = nb->notifier_call(nb, val, v);
+
+		if (nr_calls)
+			(*nr_calls)++;
+
+		if (ret & NOTIFY_STOP_MASK)
+			break;
+
+		if (nr_to_call-- == 0)
+			break;
+	}
+	return ret;
+}
+NOKPROBE_SYMBOL(notifier_call_chain_rcu);
+
 /**
  * notifier_call_chain_robust - Inform the registered notifiers about an event
  *                              and rollback on error.
@@ -111,15 +222,16 @@ NOKPROBE_SYMBOL(notifier_call_chain);
  *
  * Return:	the return value of the @val_up call.
  */
-static int notifier_call_chain_robust(struct notifier_block **nl,
+static int notifier_call_chain_robust(struct list_head *nl,
 				     unsigned long val_up, unsigned long val_down,
 				     void *v)
 {
 	int ret, nr = 0;
+	struct notifier_block *last_nb = NULL;
 
-	ret = notifier_call_chain(nl, val_up, v, -1, &nr);
+	ret = notifier_call_chain(nl, val_up, v, -1, &nr, &last_nb);
 	if (ret & NOTIFY_STOP_MASK)
-		notifier_call_chain(nl, val_down, v, nr-1, NULL);
+		notifier_call_chain_reverse(nl, last_nb, val_down, v, nr-1, NULL);
 
 	return ret;
 }
@@ -220,7 +332,7 @@ int atomic_notifier_call_chain(struct atomic_notifier_head *nh,
 	int ret;
 
 	rcu_read_lock();
-	ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
+	ret = notifier_call_chain(&nh->head, val, v, -1, NULL, NULL);
 	rcu_read_unlock();
 
 	return ret;
@@ -238,7 +350,7 @@ NOKPROBE_SYMBOL(atomic_notifier_call_chain);
  */
 bool atomic_notifier_call_chain_is_empty(struct atomic_notifier_head *nh)
 {
-	return !rcu_access_pointer(nh->head);
+	return list_empty(&nh->head);
 }
 
 /*
@@ -340,7 +452,7 @@ int blocking_notifier_call_chain_robust(struct blocking_notifier_head *nh,
 	 * racy then it does not matter what the result of the test
 	 * is, we re-check the list after having taken the lock anyway:
 	 */
-	if (rcu_access_pointer(nh->head)) {
+	if (!list_empty(&nh->head)) {
 		down_read(&nh->rwsem);
 		ret = notifier_call_chain_robust(&nh->head, val_up, val_down, v);
 		up_read(&nh->rwsem);
@@ -375,15 +487,52 @@ int blocking_notifier_call_chain(struct blocking_notifier_head *nh,
 	 * racy then it does not matter what the result of the test
 	 * is, we re-check the list after having taken the lock anyway:
 	 */
-	if (rcu_access_pointer(nh->head)) {
+	if (!list_empty(&nh->head)) {
 		down_read(&nh->rwsem);
-		ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
+		ret = notifier_call_chain(&nh->head, val, v, -1, NULL, NULL);
 		up_read(&nh->rwsem);
 	}
 	return ret;
 }
 EXPORT_SYMBOL_GPL(blocking_notifier_call_chain);
 
+/**
+ *	blocking_notifier_call_chain_reverse - Call functions in a blocking
+ *				notifier chain in reverse order
+ *	@nh: Pointer to head of the blocking notifier chain
+ *	@val: Value passed unmodified to notifier function
+ *	@v: Pointer passed unmodified to notifier function
+ *
+ *	Calls each function in a notifier chain in reverse order.  The
+ *	functions run in a process context, so they are allowed to block.
+ *
+ *	If the return value of the notifier can be and'ed
+ *	with %NOTIFY_STOP_MASK then
+ *	blocking_notifier_call_chain_reverse() will return immediately,
+ *	with the return value of the notifier function which halted
+ *	execution.  Otherwise the return value is the return value
+ *	of the last notifier function called.
+ */
+
+int blocking_notifier_call_chain_reverse(struct blocking_notifier_head *nh,
+		unsigned long val, void *v)
+{
+	int ret = NOTIFY_DONE;
+
+	/*
+	 * We check the head outside the lock, but if this access is
+	 * racy then it does not matter what the result of the test
+	 * is, we re-check the list after having taken the lock anyway:
+	 */
+	if (!list_empty(&nh->head)) {
+		down_read(&nh->rwsem);
+		ret = notifier_call_chain_reverse(&nh->head, NULL, val, v, -1, NULL);
+		up_read(&nh->rwsem);
+	}
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blocking_notifier_call_chain_reverse);
+
 /*
  *	Raw notifier chain routines.  There is no protection;
  *	the caller must provide it.  Use at your own risk!
@@ -450,7 +599,7 @@ EXPORT_SYMBOL_GPL(raw_notifier_call_chain_robust);
 int raw_notifier_call_chain(struct raw_notifier_head *nh,
 		unsigned long val, void *v)
 {
-	return notifier_call_chain(&nh->head, val, v, -1, NULL);
+	return notifier_call_chain(&nh->head, val, v, -1, NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(raw_notifier_call_chain);
 
@@ -543,7 +692,7 @@ int srcu_notifier_call_chain(struct srcu_notifier_head *nh,
 	int idx;
 
 	idx = srcu_read_lock(&nh->srcu);
-	ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
+	ret = notifier_call_chain_rcu(&nh->head, val, v, -1, NULL);
 	srcu_read_unlock(&nh->srcu, idx);
 	return ret;
 }
@@ -566,7 +715,7 @@ void srcu_init_notifier_head(struct srcu_notifier_head *nh)
 	mutex_init(&nh->mutex);
 	if (init_srcu_struct(&nh->srcu) < 0)
 		BUG();
-	nh->head = NULL;
+	INIT_LIST_HEAD(&nh->head);
 }
 EXPORT_SYMBOL_GPL(srcu_init_notifier_head);
 
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index c942f1282236..0afcba2967c7 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -90,7 +90,7 @@ static const struct nla_policy rtm_nh_res_bucket_policy_get[] = {
 
 static bool nexthop_notifiers_is_empty(struct net *net)
 {
-	return !net->nexthop.notifier_chain.head;
+	return list_empty(&net->nexthop.notifier_chain.head);
 }
 
 static void
-- 
2.43.0
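
For readers following the rollback change in notifier_call_chain_robust(), here is a minimal userspace sketch of the idea: walk the chain forward, remember the last notifier that ran, and on failure undo the earlier ones in reverse order. This is not the kernel implementation. The names chain_init(), chain_add_tail(), chain_call(), chain_call_reverse(), chain_call_robust(), struct demo_nb, EV_UP/EV_DOWN and demo_trace() are hypothetical, the list is a hand-rolled circular list rather than struct list_head, and skipping the failing notifier during rollback is an assumption based on the nr-1 argument in the patch.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)

enum { EV_UP = 1, EV_DOWN = 2 };	/* hypothetical event values */

struct notifier_block;
typedef int (*notifier_fn_t)(struct notifier_block *nb, unsigned long val, void *v);

/* Hypothetical stand-in for a list-based chain: circular, doubly linked. */
struct notifier_block {
	notifier_fn_t notifier_call;
	struct notifier_block *prev, *next;
};

static void chain_init(struct notifier_block *head)
{
	head->notifier_call = NULL;
	head->prev = head->next = head;
}

static void chain_add_tail(struct notifier_block *head, struct notifier_block *nb)
{
	nb->prev = head->prev;
	nb->next = head;
	head->prev->next = nb;
	head->prev = nb;
}

/* Forward walk; *last is set to the most recently invoked notifier. */
static int chain_call(struct notifier_block *head, unsigned long val, void *v,
		      struct notifier_block **last)
{
	struct notifier_block *nb;
	int ret = NOTIFY_DONE;

	for (nb = head->next; nb != head; nb = nb->next) {
		ret = nb->notifier_call(nb, val, v);
		if (last)
			*last = nb;
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Reverse walk from the entry before @start, or from the tail if NULL. */
static int chain_call_reverse(struct notifier_block *head,
			      struct notifier_block *start,
			      unsigned long val, void *v)
{
	struct notifier_block *nb = start ? start->prev : head->prev;
	int ret = NOTIFY_DONE;

	for (; nb != head; nb = nb->prev) {
		ret = nb->notifier_call(nb, val, v);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Call val_up on everyone; on failure, undo the earlier ones backwards. */
static int chain_call_robust(struct notifier_block *head, unsigned long val_up,
			     unsigned long val_down, void *v)
{
	struct notifier_block *last = NULL;
	int ret = chain_call(head, val_up, v, &last);

	if (ret & NOTIFY_STOP_MASK)
		chain_call_reverse(head, last, val_down, v);
	return ret;
}

struct demo_nb {
	struct notifier_block nb;	/* must stay the first member */
	char name;
	int fail_on_up;
};

/* Appends NAME on EV_UP and name (lower case) on EV_DOWN to the trace. */
static int demo_cb(struct notifier_block *nb, unsigned long val, void *v)
{
	struct demo_nb *d = (struct demo_nb *)nb;
	char *trace = v;
	size_t n = strlen(trace);

	trace[n] = (val == EV_UP) ? d->name : (char)tolower((unsigned char)d->name);
	trace[n + 1] = '\0';
	return (val == EV_UP && d->fail_on_up) ? NOTIFY_BAD : NOTIFY_OK;
}

/* Registers A, B, C where C fails on EV_UP; returns the call trace. */
const char *demo_trace(void)
{
	static char trace[16];
	struct notifier_block head;
	struct demo_nb a = { .name = 'A' }, b = { .name = 'B' };
	struct demo_nb c = { .name = 'C', .fail_on_up = 1 };

	trace[0] = '\0';
	chain_init(&head);
	a.nb.notifier_call = b.nb.notifier_call = c.nb.notifier_call = demo_cb;
	chain_add_tail(&head, &a.nb);
	chain_add_tail(&head, &b.nb);
	chain_add_tail(&head, &c.nb);
	chain_call_robust(&head, EV_UP, EV_DOWN, trace);
	return trace;	/* "ABCba": C fails, then B and A are undone */
}
```

With a singly linked chain the rollback can only replay the first nr-1 entries front to back; a doubly linked chain makes it possible to unwind in the opposite order of setup, which is what the sketch above shows.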

