Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] RDS: Fix memory leak in rds_rdma_extra_size()
From: Allison Henderson @ 2026-04-13  4:18 UTC (permalink / raw)
  To: Xiaobo Liu, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Simon Horman, netdev, linux-rdma, rds-devel, linux-kernel
In-Reply-To: <20260412124455.2008-1-cppcoffee@gmail.com>

On Sun, 2026-04-12 at 20:44 +0800, Xiaobo Liu wrote:
> Free iov->iov when copy_from_user() or page count validation fails in rds_rdma_extra_size().
> 
> This preserves the existing success path and avoids leaking the allocated iovec array on error.

Hi Xiaobo,

Thanks for catching this.  The fix itself looks correct, but it will need your
Signed-off-by line.  Also be sure to note the target tree and subsystem in the subject
line like this "[PATCH net v2] net/rds: Fix memory leak in rds_rdma_extra_size()", and
make sure the commit message wraps at about 72 characters.  Other than that I think
the patch looks good.

Thank you!
Allison

> ---
>  net/rds/rdma.c | 28 +++++++++++++++++++++-------
>  1 file changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/net/rds/rdma.c b/net/rds/rdma.c
> index aa6465dc7..91a20c1e2 100644
> --- a/net/rds/rdma.c
> +++ b/net/rds/rdma.c
> @@ -560,6 +560,7 @@ int rds_rdma_extra_size(struct rds_rdma_args *args,
>  	struct rds_iovec *vec;
>  	struct rds_iovec __user *local_vec;
>  	int tot_pages = 0;
> +	int ret = 0;
>  	unsigned int nr_pages;
>  	unsigned int i;
>  
> @@ -578,16 +579,20 @@ int rds_rdma_extra_size(struct rds_rdma_args *args,
>  	vec = &iov->iov[0];
>  
>  	if (copy_from_user(vec, local_vec, args->nr_local *
> -			   sizeof(struct rds_iovec)))
> -		return -EFAULT;
> +			   sizeof(struct rds_iovec))) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
>  	iov->len = args->nr_local;
>  
>  	/* figure out the number of pages in the vector */
>  	for (i = 0; i < args->nr_local; i++, vec++) {
>  
>  		nr_pages = rds_pages_in_vec(vec);
> -		if (nr_pages == 0)
> -			return -EINVAL;
> +		if (nr_pages == 0) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
>  
>  		tot_pages += nr_pages;
>  
> @@ -595,11 +600,20 @@ int rds_rdma_extra_size(struct rds_rdma_args *args,
>  		 * nr_pages for one entry is limited to (UINT_MAX>>PAGE_SHIFT)+1,
>  		 * so tot_pages cannot overflow without first going negative.
>  		 */
> -		if (tot_pages < 0)
> -			return -EINVAL;
> +		if (tot_pages < 0) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
>  	}
>  
> -	return tot_pages * sizeof(struct scatterlist);
> +	ret = tot_pages * sizeof(struct scatterlist);
> +
> +out:
> +	if (ret < 0) {
> +		kfree(iov->iov);
> +		iov->iov = NULL;
> +	}
> +	return ret;
>  }
>  
>  /*


^ permalink raw reply

* [PATCH v6.6] netfilter: nft_set_pipapo: do not rely on ZERO_SIZE_PTR
From: Keerthana K @ 2026-04-13  4:32 UTC (permalink / raw)
  To: stable, gregkh
  Cc: pablo, kadlec, fw, davem, edumazet, kuba, pabeni, netfilter-devel,
	coreteam, netdev, linux-kernel, ajay.kaher, alexey.makhalov,
	vamsi-krishna.brahmajosyula, yin.ding, tapas.kundu,
	Stefano Brivio, Mukul Sikka, Brennan Lamoreaux, Keerthana K

From: Florian Westphal <fw@strlen.de>

commit 07ace0bbe03b3d8e85869af1dec5e4087b1d57b8 upstream

pipapo relies on kmalloc(0) returning ZERO_SIZE_PTR (i.e., not NULL
but pointer is invalid).

Rework this to not call slab allocator when we'd request a 0-byte
allocation.

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mukul Sikka <mukul.sikka@broadcom.com>
Signed-off-by: Brennan Lamoreaux <brennan.lamoreaux@broadcom.com>
[Keerthana: In older stable branches (v6.6 and earlier), the allocation logic in
pipapo_clone() still relies on `src->rules` rather than `src->rules_alloc`
(introduced in v6.9 via 9f439bd6ef4f). Consequently, the previously
backported INT_MAX clamping check uses `src->rules`. This patch correctly
moves that `src->rules > (INT_MAX / ...)` check inside the new
`if (src->rules > 0)` block]
Signed-off-by: Keerthana K <keerthana.kalyanasundaram@broadcom.com>
---
 net/netfilter/nft_set_pipapo.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index c3ada6798d4a..98cdeb9fa210 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -525,6 +525,8 @@ static struct nft_pipapo_elem *pipapo_get(const struct net *net,
 	int i;
 
 	m = priv->clone;
+	if (m->bsize_max == 0)
+		return ret;
 
 	res_map = kmalloc_array(m->bsize_max, sizeof(*res_map), GFP_ATOMIC);
 	if (!res_map) {
@@ -1394,14 +1396,20 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
 		       src->bsize * sizeof(*dst->lt) *
 		       src->groups * NFT_PIPAPO_BUCKETS(src->bb));
 
-		if (src->rules > (INT_MAX / sizeof(*src->mt)))
-			goto out_mt;
+		if (src->rules > 0) {
+			if (src->rules > (INT_MAX / sizeof(*src->mt)))
+				goto out_mt;
 
-		dst->mt = kvmalloc(src->rules * sizeof(*src->mt), GFP_KERNEL_ACCOUNT);
-		if (!dst->mt)
-			goto out_mt;
+			dst->mt = kvmalloc_array(src->rules, sizeof(*src->mt),
+						 GFP_KERNEL);
+			if (!dst->mt)
+				goto out_mt;
+
+			memcpy(dst->mt, src->mt, src->rules * sizeof(*src->mt));
+		} else {
+			dst->mt = NULL;
+		}
 
-		memcpy(dst->mt, src->mt, src->rules * sizeof(*src->mt));
 		src++;
 		dst++;
 	}
-- 
2.43.7


^ permalink raw reply related

* [PATCH v5.15-v6.1] netfilter: nft_set_pipapo: do not rely on ZERO_SIZE_PTR
From: Keerthana K @ 2026-04-13  4:32 UTC (permalink / raw)
  To: stable, gregkh
  Cc: pablo, kadlec, fw, davem, edumazet, kuba, pabeni, netfilter-devel,
	coreteam, netdev, linux-kernel, ajay.kaher, alexey.makhalov,
	vamsi-krishna.brahmajosyula, yin.ding, tapas.kundu,
	Stefano Brivio, Mukul Sikka, Brennan Lamoreaux, Keerthana K

From: Florian Westphal <fw@strlen.de>

commit 07ace0bbe03b3d8e85869af1dec5e4087b1d57b8 upstream

pipapo relies on kmalloc(0) returning ZERO_SIZE_PTR (i.e., not NULL
but pointer is invalid).

Rework this to not call slab allocator when we'd request a 0-byte
allocation.

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mukul Sikka <mukul.sikka@broadcom.com>
Signed-off-by: Brennan Lamoreaux <brennan.lamoreaux@broadcom.com>
[Keerthana: In older stable branches (v6.6 and earlier), the allocation logic in
pipapo_clone() still relies on `src->rules` rather than `src->rules_alloc`
(introduced in v6.9 via 9f439bd6ef4f). Consequently, the previously
backported INT_MAX clamping check uses `src->rules`. This patch correctly
moves that `src->rules > (INT_MAX / ...)` check inside the new
`if (src->rules > 0)` block]
Signed-off-by: Keerthana K <keerthana.kalyanasundaram@broadcom.com>
---
 net/netfilter/nft_set_pipapo.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 863162c82330..2072c89a467d 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -525,6 +525,8 @@ static struct nft_pipapo_elem *pipapo_get(const struct net *net,
 	int i;
 
 	m = priv->clone;
+	if (m->bsize_max == 0)
+		return ret;
 
 	res_map = kmalloc_array(m->bsize_max, sizeof(*res_map), GFP_ATOMIC);
 	if (!res_map) {
@@ -1365,14 +1367,20 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
 		       src->bsize * sizeof(*dst->lt) *
 		       src->groups * NFT_PIPAPO_BUCKETS(src->bb));
 
-		if (src->rules > (INT_MAX / sizeof(*src->mt)))
-			goto out_mt;
+		if (src->rules > 0) {
+			if (src->rules > (INT_MAX / sizeof(*src->mt)))
+				goto out_mt;
+
+			dst->mt = kvmalloc_array(src->rules, sizeof(*src->mt),
+						 GFP_KERNEL);
+			if (!dst->mt)
+				goto out_mt;
 
-		dst->mt = kvmalloc(src->rules * sizeof(*src->mt), GFP_KERNEL_ACCOUNT);
-		if (!dst->mt)
-			goto out_mt;
+			memcpy(dst->mt, src->mt, src->rules * sizeof(*src->mt));
+		} else {
+			dst->mt = NULL;
+		}
 
-		memcpy(dst->mt, src->mt, src->rules * sizeof(*src->mt));
 		src++;
 		dst++;
 	}
-- 
2.43.7


^ permalink raw reply related

* [PATCH v5.10] netfilter: nft_set_pipapo: do not rely on ZERO_SIZE_PTR
From: Keerthana K @ 2026-04-13  4:33 UTC (permalink / raw)
  To: stable, gregkh
  Cc: pablo, kadlec, fw, davem, edumazet, kuba, pabeni, netfilter-devel,
	coreteam, netdev, linux-kernel, ajay.kaher, alexey.makhalov,
	vamsi-krishna.brahmajosyula, yin.ding, tapas.kundu,
	Stefano Brivio, Mukul Sikka, Brennan Lamoreaux, Keerthana K

From: Florian Westphal <fw@strlen.de>

commit 07ace0bbe03b3d8e85869af1dec5e4087b1d57b8 upstream

pipapo relies on kmalloc(0) returning ZERO_SIZE_PTR (i.e., not NULL
but pointer is invalid).

Rework this to not call slab allocator when we'd request a 0-byte
allocation.

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mukul Sikka <mukul.sikka@broadcom.com>
Signed-off-by: Brennan Lamoreaux <brennan.lamoreaux@broadcom.com>
[Keerthana: In older stable branches (v6.6 and earlier), the allocation logic in
pipapo_clone() still relies on `src->rules` rather than `src->rules_alloc`
(introduced in v6.9 via 9f439bd6ef4f). Consequently, the previously
backported INT_MAX clamping check uses `src->rules`. This patch correctly
moves that `src->rules > (INT_MAX / ...)` check inside the new
`if (src->rules > 0)` block]
Signed-off-by: Keerthana K <keerthana.kalyanasundaram@broadcom.com>
---
 net/netfilter/nft_set_pipapo.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index a4fdd1587bb3..83606dfde033 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -524,6 +524,9 @@ static struct nft_pipapo_elem *pipapo_get(const struct net *net,
 	struct nft_pipapo_field *f;
 	int i;
 
+	if (m->bsize_max == 0)
+		return ret;
+
 	res_map = kmalloc_array(m->bsize_max, sizeof(*res_map), GFP_ATOMIC);
 	if (!res_map) {
 		ret = ERR_PTR(-ENOMEM);
@@ -1363,14 +1366,20 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
 		       src->bsize * sizeof(*dst->lt) *
 		       src->groups * NFT_PIPAPO_BUCKETS(src->bb));
 
-		if (src->rules > (INT_MAX / sizeof(*src->mt)))
-			goto out_mt;
+		if (src->rules > 0) {
+			if (src->rules > (INT_MAX / sizeof(*src->mt)))
+				goto out_mt;
 
-		dst->mt = kvmalloc(src->rules * sizeof(*src->mt), GFP_KERNEL);
-		if (!dst->mt)
-			goto out_mt;
+			dst->mt = kvmalloc_array(src->rules, sizeof(*src->mt),
+						 GFP_KERNEL);
+			if (!dst->mt)
+				goto out_mt;
+
+			memcpy(dst->mt, src->mt, src->rules * sizeof(*src->mt));
+		} else {
+			dst->mt = NULL;
+		}
 
-		memcpy(dst->mt, src->mt, src->rules * sizeof(*src->mt));
 		src++;
 		dst++;
 	}
-- 
2.43.7


^ permalink raw reply related

* [PATCH net v2 0/4] net: mana: Fix probe/remove error path bugs
From: Erni Sri Satya Vennela @ 2026-04-13  5:08 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, ernis, ssengar, dipayanroy, gargaditya,
	shirazsaleem, kees, kotaranov, leon, shacharr, stephen,
	linux-hyperv, netdev, linux-kernel

Fix four pre-existing bugs in mana_probe()/mana_remove() error handling
that can cause warnings on uninitialized work structs, masked errors,
and resource leaks when early probe steps fail.

Patches 1-2 move work struct initialization (link_change_work and
gf_stats_work) to before any error path that could trigger
mana_remove(), preventing WARN_ON in __flush_work() or debug object
warnings when sync cancellation runs on uninitialized work structs.

Patch 3 prevents add_adev() from overwriting a port probe error,
which could leave the driver in a broken state with NULL ports while
reporting success.

Patch 4 changes 'goto out' to 'break' in mana_remove()'s port loop
so that mana_destroy_eq() is always reached, preventing EQ leaks when
a NULL port is encountered.
---
Changes in v2:
* Apply the patch in net instead of net-next.
---
Erni Sri Satya Vennela (4):
  net: mana: Init link_change_work before potential error paths in probe
  net: mana: Init gf_stats_work before potential error paths in probe
  net: mana: Don't overwrite port probe error with add_adev result
  net: mana: Fix EQ leak in mana_remove on NULL port

 drivers/net/ethernet/microsoft/mana/mana_en.c | 28 +++++++++----------
 1 file changed, 14 insertions(+), 14 deletions(-)

-- 
2.34.1

^ permalink raw reply

* [PATCH net v2 1/4] net: mana: Init link_change_work before potential error paths in probe
From: Erni Sri Satya Vennela @ 2026-04-13  5:08 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, ernis, ssengar, dipayanroy, gargaditya,
	shirazsaleem, kees, kotaranov, leon, shacharr, stephen,
	linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260413050843.605789-1-ernis@linux.microsoft.com>

Move INIT_WORK(link_change_work) to right after the mana_context
allocation, before any error path that could reach mana_remove().

Previously, if mana_create_eq() or mana_query_device_cfg() failed,
mana_probe() would jump to the error path which calls mana_remove().
mana_remove() unconditionally calls disable_work_sync(link_change_work),
but the work struct had not been initialized yet. This can trigger
CONFIG_DEBUG_OBJECTS_WORK enabled.

Fixes: 54133f9b4b53 ("net: mana: Support HW link state events")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
---
Changes in v2:
* Apply the patch in net instead of net-next.
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 07630322545f..57f146ea6f66 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -3631,6 +3631,8 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
 
 		ac->gdma_dev = gd;
 		gd->driver_data = ac;
+
+		INIT_WORK(&ac->link_change_work, mana_link_state_handle);
 	}
 
 	err = mana_create_eq(ac);
@@ -3648,8 +3650,6 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
 
 	if (!resuming) {
 		ac->num_ports = num_ports;
-
-		INIT_WORK(&ac->link_change_work, mana_link_state_handle);
 	} else {
 		if (ac->num_ports != num_ports) {
 			dev_err(dev, "The number of vPorts changed: %d->%d\n",
-- 
2.34.1


^ permalink raw reply related

* [PATCH net v2 2/4] net: mana: Init gf_stats_work before potential error paths in probe
From: Erni Sri Satya Vennela @ 2026-04-13  5:08 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, ernis, ssengar, dipayanroy, gargaditya,
	shirazsaleem, kees, kotaranov, leon, shacharr, stephen,
	linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260413050843.605789-1-ernis@linux.microsoft.com>

Move INIT_DELAYED_WORK(gf_stats_work) to before mana_create_eq(),
while keeping schedule_delayed_work() at its original location.

Previously, if any function between mana_create_eq() and the
INIT_DELAYED_WORK call failed, mana_probe() would call mana_remove()
which unconditionally calls cancel_delayed_work_sync(gf_stats_work)
in __flush_work() or debug object warnings with
CONFIG_DEBUG_OBJECTS_WORK enabled.

Fixes: be4f1d67ec56 ("net: mana: Add standard counter rx_missed_errors")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
---
Changes in v2:
* Apply the patch in net instead of net-next.
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 57f146ea6f66..f6ad46736418 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -3635,6 +3635,8 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
 		INIT_WORK(&ac->link_change_work, mana_link_state_handle);
 	}
 
+	INIT_DELAYED_WORK(&ac->gf_stats_work, mana_gf_stats_work_handler);
+
 	err = mana_create_eq(ac);
 	if (err) {
 		dev_err(dev, "Failed to create EQs: %d\n", err);
@@ -3709,7 +3711,6 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
 
 	err = add_adev(gd, "eth");
 
-	INIT_DELAYED_WORK(&ac->gf_stats_work, mana_gf_stats_work_handler);
 	schedule_delayed_work(&ac->gf_stats_work, MANA_GF_STATS_PERIOD);
 
 out:
-- 
2.34.1


^ permalink raw reply related

* [PATCH net v2 3/4] net: mana: Don't overwrite port probe error with add_adev result
From: Erni Sri Satya Vennela @ 2026-04-13  5:08 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, ernis, ssengar, dipayanroy, gargaditya,
	shirazsaleem, kees, kotaranov, leon, shacharr, stephen,
	linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260413050843.605789-1-ernis@linux.microsoft.com>

In mana_probe(), if mana_probe_port() fails for any port, the error
is stored in 'err' and the loop breaks. However, the subsequent
unconditional 'err = add_adev(gd, "eth")' overwrites this error.
If add_adev() succeeds, mana_probe() returns success despite ports
being left in a partially initialized state (ac->ports[i] == NULL).

Only call add_adev() when there is no prior error, so the probe
correctly fails and triggers mana_remove() cleanup.

Fixes: ced82fce77e9 ("net: mana: Probe rdma device in mana driver")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
---
Changes in v2:
* Apply the patch in net instead of net-next.
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index f6ad46736418..1a141c46ac27 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -3680,10 +3680,9 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
 	if (!resuming) {
 		for (i = 0; i < ac->num_ports; i++) {
 			err = mana_probe_port(ac, i, &ac->ports[i]);
-			/* we log the port for which the probe failed and stop
-			 * probes for subsequent ports.
-			 * Note that we keep running ports, for which the probes
-			 * were successful, unless add_adev fails too
+			/* Log the port for which the probe failed, stop probing
+			 * subsequent ports, and skip add_adev.
+			 * Already-probed ports remain functional.
 			 */
 			if (err) {
 				dev_err(dev, "Probe Failed for port %d\n", i);
@@ -3697,10 +3696,9 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
 			enable_work(&apc->queue_reset_work);
 			err = mana_attach(ac->ports[i]);
 			rtnl_unlock();
-			/* we log the port for which the attach failed and stop
-			 * attach for subsequent ports
-			 * Note that we keep running ports, for which the attach
-			 * were successful, unless add_adev fails too
+			/* Log the port for which the attach failed, stop
+			 * attaching subsequent ports, and skip add_adev.
+			 * Already-attached ports remain functional.
 			 */
 			if (err) {
 				dev_err(dev, "Attach Failed for port %d\n", i);
@@ -3709,7 +3707,8 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
 		}
 	}
 
-	err = add_adev(gd, "eth");
+	if (!err)
+		err = add_adev(gd, "eth");
 
 	schedule_delayed_work(&ac->gf_stats_work, MANA_GF_STATS_PERIOD);
 
-- 
2.34.1


^ permalink raw reply related

* [PATCH net v2 4/4] net: mana: Fix EQ leak in mana_remove on NULL port
From: Erni Sri Satya Vennela @ 2026-04-13  5:08 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, ernis, ssengar, dipayanroy, gargaditya,
	shirazsaleem, kees, kotaranov, leon, shacharr, stephen,
	linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260413050843.605789-1-ernis@linux.microsoft.com>

In mana_remove(), when a NULL port is encountered in the port iteration
loop, 'goto out' skips the mana_destroy_eq(ac) call, leaking the event
queues allocated earlier by mana_create_eq().

This can happen when mana_probe_port() fails for port 0, leaving
ac->ports[0] as NULL. On driver unload or error cleanup, mana_remove()
hits the NULL entry and jumps past mana_destroy_eq().

Change 'goto out' to 'break' so the for-loop exits normally and
mana_destroy_eq() is always reached. Remove the now-unreferenced out:
label.

Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
---
Changes in v2:
* Apply the patch in net instead of net-next.
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 1a141c46ac27..97237d137cbf 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -3747,7 +3747,7 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
 		if (!ndev) {
 			if (i == 0)
 				dev_err(dev, "No net device to remove\n");
-			goto out;
+			break;
 		}
 
 		apc = netdev_priv(ndev);
@@ -3778,7 +3778,7 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
 	}
 
 	mana_destroy_eq(ac);
-out:
+
 	if (ac->per_port_queue_reset_wq) {
 		destroy_workqueue(ac->per_port_queue_reset_wq);
 		ac->per_port_queue_reset_wq = NULL;
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH net] bnge: return after auxiliary_device_uninit() in error path
From: Vikas Gupta @ 2026-04-13  5:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: netdev, linux-kernel, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, stable
In-Reply-To: <2026041124-banshee-molecular-0f70@gregkh>

On Sat, Apr 11, 2026 at 4:15 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> When auxiliary_device_add() fails, the error block calls
> auxiliary_device_uninit() but does not return.  The uninit drops the
> last reference and synchronously runs bnge_aux_dev_release(), which sets
> bd->auxr_dev = NULL and frees the underlying object.  The subsequent
> bd->auxr_dev->net = bd->netdev then dereferences NULL, which is not a
> good thing to have happen when trying to clean up from an error.
>
> Add the missing return, as the auxiliary bus documentation states is a
> requirement (seems that LLM tools read documentation better than humans
> do...)
>
> Cc: Vikas Gupta <vikas.gupta@broadcom.com>
> Cc: Andrew Lunn <andrew+netdev@lunn.ch>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Fixes: 8ac050ec3b1c ("bng_en: Add RoCE aux device support")
> Cc: stable <stable@kernel.org>
> Assisted-by: gregkh_clanker_t1000
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>

> ---
>  drivers/net/ethernet/broadcom/bnge/bnge_auxr.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/broadcom/bnge/bnge_auxr.c b/drivers/net/ethernet/broadcom/bnge/bnge_auxr.c
> index b942076762ef..67e93e17d4d9 100644
> --- a/drivers/net/ethernet/broadcom/bnge/bnge_auxr.c
> +++ b/drivers/net/ethernet/broadcom/bnge/bnge_auxr.c
> @@ -194,6 +194,7 @@ void bnge_rdma_aux_device_add(struct bnge_dev *bd)
>                 dev_warn(bd->dev, "Failed to add auxiliary device for ROCE\n");
>                 auxiliary_device_uninit(aux_dev);
>                 bd->flags &= ~BNGE_EN_ROCE;
> +               return;
>         }
>
>         bd->auxr_dev->net = bd->netdev;
> --
> 2.53.0
>

^ permalink raw reply

* Re: [PATCH v2 0/6] bus: mhi: host: mhi_phc: Add support for PHC over MHI
From: Manivannan Sadhasivam @ 2026-04-13  5:44 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Krishna Chaitanya Chundru, Richard Cochran, mhi, linux-arm-msm,
	linux-kernel, netdev, Vivek Pernamitta, Sivareddy Surasani,
	Vivek Pernamitta, Imran Shaik, Taniya Das
In-Reply-To: <20260412080951.723295fa@kernel.org>

On Sun, Apr 12, 2026 at 08:09:51AM -0700, Jakub Kicinski wrote:
> On Sat, 11 Apr 2026 13:42:00 +0530 Krishna Chaitanya Chundru wrote:
> > - User space applications use the standard Linux PTP interface.
> > - The PTP subsystem routes IOCTLs to the MHI PHC driver.
> > - The MHI PHC driver communicates with the MHI core to fetch timestamps.
> > - The MHI core interacts with the device to retrieve accurate time data.

This is a misleading statement. Only the 'pci_generic' controller driver interacts
with the device for querying timestamp. MHI bus just acts as a messenger.

> 
> Nack, stop adding functionality under the mhi "bus".
> Bus is supposed to be an abstraction into which real drivers plug in.

MHI bus is very similar to the PCI bus. Just like PCI capabilities, MHI also has
capabilities to discover the supported functionalities including timesync. So
for making use of the timesync feature, we need to add some hooks into the MHI
bus layer, but the functionality is added to the separate driver,
'drivers/bus/mhi/host/mhi_phc*'. This is also quite similar to how PCI(e)
features like AER, hotplug are structured.

In this series, timesync API definitions are added mistakenly to the core bus
code, which should be moved to the mhi_phc driver instead.

- Mani

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply

* Re: [patch 28/38] mips: Select ARCH_HAS_RANDOM_ENTROPY
From: Maciej W. Rozycki @ 2026-04-13  5:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Arnd Bergmann, x86, Lu Baolu, iommu, Michael Grzeschik,
	netdev, linux-wireless, Herbert Xu, linux-crypto, Vlastimil Babka,
	linux-mm, David Woodhouse, Bernie Thompson, linux-fbdev,
	Theodore Tso, linux-ext4, Andrew Morton, Uladzislau Rezki,
	Marco Elver, Dmitry Vyukov, kasan-dev, Andrey Ryabinin,
	Thomas Sailer, linux-hams, Jason A. Donenfeld, Richard Henderson,
	linux-alpha, Russell King, linux-arm-kernel, Catalin Marinas,
	Huacai Chen, loongarch, Geert Uytterhoeven, linux-m68k,
	Dinh Nguyen, Jonas Bonn, linux-openrisc, Helge Deller,
	linux-parisc, Michael Ellerman, linuxppc-dev, Paul Walmsley,
	linux-riscv, Heiko Carstens, linux-s390, David S. Miller,
	sparclinux
In-Reply-To: <20260410120319.462206386@kernel.org>

On Fri, 10 Apr 2026, Thomas Gleixner wrote:

> The only solution for now is to uninline random_get_entropy().  Fix up all
> other dependencies on the content of asm/timex.h in those files which
> really depend on it.

 Oh dear!  I'd yet have to fully evaluate the consequences, but offhand 
this has clearly turned what compiles to a single CPU instruction on the 
vast majority of MIPS platforms into an expensive function call, possibly 
also changing the caller from a leaf to a nested function with all the 
associated execution penalty.  Is there no other way?

 Cf. commit 06947aaaf9bf ("MIPS: Implement random_get_entropy with CP0 
Random").

  Maciej

^ permalink raw reply

* Re: [PATCH net 1/1] tipc: validate Gap ACK blocks in STATE message
From: Ruide Cao @ 2026-04-13  6:01 UTC (permalink / raw)
  To: Tung Quang Nguyen, Ren Wei
  Cc: jmaloy@redhat.com, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, horms@kernel.org,
	yifanwucs@gmail.com, tomapufckgml@gmail.com, yuantan098@gmail.com,
	bird@lzu.edu.cn, enjou1224z@gmail.com, netdev@vger.kernel.org
In-Reply-To: <GV1P189MB1988963A155568005A37BF7FC6242@GV1P189MB1988.EURP189.PROD.OUTLOOK.COM>


On 4/12/2026 8:06 PM, Tung Quang Nguyen wrote:
>> Subject: [PATCH net 1/1] tipc: validate Gap ACK blocks in STATE message
>>
>> From: Ruide Cao <caoruide123@gmail.com>
>>
>> tipc_get_gap_ack_blks() reads len, ugack_cnt and bgack_cnt directly from
>> msg_data(hdr) before verifying that a STATE message actually contains the
>> fixed Gap ACK block header in its logical data area.
>>
>> A peer that negotiates TIPC_GAP_ACK_BLOCK can send a short STATE message
>> with a declared TIPC payload shorter than struct tipc_gap_ack_blks and still
>> append a few physical bytes after the header. The helper then trusts those
>> bytes as Gap ACK metadata, and the forged bgack_cnt/len values can drive the
>> broadcast receive path into kmemdup() beyond the skb boundary.
> Can you explain how that peer can alter the STATE message ? If it can, what concrete values are used  and on what fields of the STATE messages ?

Thanks for the review.

To clarify, the peer is not "altering" an already received STATE
message; it is actively sending a malformed LINK_PROTOCOL/STATE_MSG
after the link has already negotiated the TIPC_GAP_ACK_BLOCK capability.

Concretely, the crafted STATE message is sent with a modified msg_size
so that msg_data_sz(hdr) is 0, but the actual UDP payload still carries
extra physical bytes after the 40-byte TIPC header. Those bytes are then
interpreted as the fixed Gap ACK header. For example:
  len       = 0x07fc
  ugack_cnt = 0xff
  bgack_cnt = 0xff

These values are specifically chosen so that the existing sanity check
remains internally consistent:
  struct_size(p, gacks, 0xff + 0xff) == 0x07fc

Therefore, the existing sanity check does not reject this case. It only
checks the self-consistency of the attacker-controlled Gap ACK fields;
it completely fails to check if the declared Gap ACK record actually
fits inside the enclosing STATE message's logical payload length.

>> Fix this by rejecting Gap ACK parsing unless the logical STATE payload is large
>> enough to cover the fixed header, and by rejecting declared Gap ACK lengths
>> that are smaller than the fixed header or larger than the logical payload.
>> Return 0 for invalid lengths so malformed Gap ACK data is not treated as a
>> valid payload offset, and drop unicast STATE messages that advertise Gap ACK
>> support but still yield an invalid Gap ACK length. This keeps malformed Gap
>> ACK data ignored without misaligning monitor payload parsing.
>>
>> Fixes: d7626b5acff9 ("tipc: introduce Gap ACK blocks for broadcast link")
>> Cc: stable@kernel.org
>> Reported-by: Yifan Wu <yifanwucs@gmail.com>
>> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
>> Co-developed-by: Yuan Tan <yuantan098@gmail.com>
>> Signed-off-by: Yuan Tan <yuantan098@gmail.com>
>> Suggested-by: Xin Liu <bird@lzu.edu.cn>
>> Tested-by: Ren Wei <enjou1224z@gmail.com>
>> Signed-off-by: Ruide Cao <caoruide123@gmail.com>
>> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
>> ---
>> net/tipc/link.c | 16 ++++++++++++++--
>> 1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/tipc/link.c b/net/tipc/link.c index 49dfc098d89b..44678d98939a
>> 100644
>> --- a/net/tipc/link.c
>> +++ b/net/tipc/link.c
>> @@ -1415,12 +1415,22 @@ u16 tipc_get_gap_ack_blks(struct
>> tipc_gap_ack_blks **ga, struct tipc_link *l,
>> 			  struct tipc_msg *hdr, bool uc)
>> {
>> 	struct tipc_gap_ack_blks *p;
>> -	u16 sz = 0;
>> +	u16 sz = 0, dlen = msg_data_sz(hdr);
>>
>> 	/* Does peer support the Gap ACK blocks feature? */
>> 	if (l->peer_caps & TIPC_GAP_ACK_BLOCK) {
>> +		u16 min_sz = struct_size(p, gacks, 0);
>> +
>> +		if (dlen < min_sz)
>> +			goto ignore;
> This checking is redundant because with existing sanity checking, the invalid gap ACK blocks will not be used to release acked messages in transmit queue.

The `dlen < min_sz` check is required because the existing sanity check
already dereferences `p->len`, `p->ugack_cnt`, and `p->bgack_cnt`.
Without this new check, an Out-of-Bounds (OOB) read occurs before the
old sanity check even has a chance to run.

>> +
>> 		p = (struct tipc_gap_ack_blks *)msg_data(hdr);
>> 		sz = ntohs(p->len);
>> +		if (sz < min_sz || sz > dlen) {
>> +			sz = 0;
>> +			goto ignore;
>> +		}
> This checking is redundant. Existing sanity checking is good enough.

The `sz < min_sz || sz > dlen` check is not redundant because the old
sanity check completely fails to verify if the declared Gap ACK length
(`sz`) actually fits inside the enclosing STATE message's logical
payload length (`dlen`).

Without checking against `dlen`, an internally consistent spoofed packet
will pass the old check and cause OOB reads during the subsequent block
parsing.

>> +
>> 		/* Sanity check */
>> 		if (sz == struct_size(p, gacks, size_add(p->ugack_cnt, p-
>>> bgack_cnt))) {
>> 			/* Good, check if the desired type exists */ @@ -
>> 1434,6 +1444,8 @@ u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks **ga,
>> struct tipc_link *l,
>> 			}
>> 		}
>> 	}
>> +
>> +ignore:
>> 	/* Other cases: ignore! */
>> 	p = NULL;
>>
>> @@ -2270,7 +2282,7 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct
>> sk_buff *skb,
>> 	case STATE_MSG:
>> 		/* Validate Gap ACK blocks, drop if invalid */
>> 		glen = tipc_get_gap_ack_blks(&ga, l, hdr, true);
>> -		if (glen > dlen)
>> +		if (glen > dlen || ((l->peer_caps & TIPC_GAP_ACK_BLOCK) &&
>> !glen))
> This checking is redundant. Existing sanity checking is good enough.

The unicast caller-side drop `((l->peer_caps & TIPC_GAP_ACK_BLOCK) &&
!glen)` is also necessary. Once the capability is negotiated, a valid
Gap ACK record MUST have at least the fixed 4-byte header. If `glen ==
0` from such a peer, it indicates a malformed payload. 

The STATE message must be dropped here so it is not passed on to
`tipc_mon_rcv()` as if monitor data started at `data + 0`, which would
misalign the monitor payload parsing.

>> 			break;
>>
>> 		l->rcv_nxt_state = msg_seqno(hdr) + 1;
>> --
>> 2.34.1
>>

^ permalink raw reply

* Re: [PATCH v2 2/6] bus: mhi: host: Add support for non-posted TSC timesync feature
From: Manivannan Sadhasivam @ 2026-04-13  6:42 UTC (permalink / raw)
  To: Krishna Chaitanya Chundru
  Cc: Richard Cochran, mhi, linux-arm-msm, linux-kernel, netdev,
	Vivek Pernamitta
In-Reply-To: <20260411-tsc_timesync-v2-2-6f25f72987b3@oss.qualcomm.com>

On Sat, Apr 11, 2026 at 01:42:02PM +0530, Krishna Chaitanya Chundru wrote:
> From: Vivek Pernamitta <quic_vpernami@quicinc.com>
> 
> Implement non-posted time synchronization as described in section 5.1.1
> of the MHI v1.2 specification. The host disables low-power link states
> to minimize latency, reads the local time, issues a MMIO read to the
> device's TIME register.
> 
> Add support for initializing this feature and export a function to be
> used by the drivers which does the time synchronization.
> 
> MHI reads the device time registers in the MMIO address space pointed to
> by the capability register after disabling all low power modes and keeping
> MHI in M0. Before and after MHI reads, the local time is captured
> and shared for processing.
> 
> Signed-off-by: Vivek Pernamitta <quic_vpernami@quicinc.com>
> Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
> ---
>  drivers/bus/mhi/common.h        |  4 +++
>  drivers/bus/mhi/host/init.c     | 28 ++++++++++++++++
>  drivers/bus/mhi/host/internal.h |  9 +++++
>  drivers/bus/mhi/host/main.c     | 74 +++++++++++++++++++++++++++++++++++++++++
>  include/linux/mhi.h             | 37 +++++++++++++++++++++
>  5 files changed, 152 insertions(+)
> 
> diff --git a/drivers/bus/mhi/common.h b/drivers/bus/mhi/common.h
> index 4c316f3d5a68beb01f15cf575b03747096fdcf2c..64f9b2b94387a112bb6b5e20c634c3ba8d6bc78e 100644
> --- a/drivers/bus/mhi/common.h
> +++ b/drivers/bus/mhi/common.h
> @@ -118,6 +118,10 @@
>  #define CAP_CAPID_MASK			GENMASK(31, 24)
>  #define CAP_NEXT_CAP_MASK		GENMASK(23, 12)
>  
> +/* MHI TSC Timesync */
> +#define TSC_TIMESYNC_TIME_LOW_OFFSET	(0x8)
> +#define TSC_TIMESYNC_TIME_HIGH_OFFSET	(0xC)
> +
>  /* Command Ring Element macros */
>  /* No operation command */
>  #define MHI_TRE_CMD_NOOP_PTR		0
> diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c
> index c2162aa04e810e45ccfbedd20aaa62f892420d31..eb720f671726d919646cbc450cd54bda655a1060 100644
> --- a/drivers/bus/mhi/host/init.c
> +++ b/drivers/bus/mhi/host/init.c
> @@ -498,6 +498,30 @@ static int mhi_find_capability(struct mhi_controller *mhi_cntrl, u32 capability)
>  	return 0;
>  }
>  
> +static int mhi_init_tsc_timesync(struct mhi_controller *mhi_cntrl)
> +{
> +	struct device *dev = &mhi_cntrl->mhi_dev->dev;
> +	struct mhi_timesync *mhi_tsc_tsync;
> +	u32 time_offset;
> +	int ret;
> +
> +	time_offset = mhi_find_capability(mhi_cntrl, MHI_CAP_ID_TSC_TIME_SYNC);
> +	if (!time_offset)
> +		return -ENXIO;
> +
> +	mhi_tsc_tsync = devm_kzalloc(dev, sizeof(*mhi_tsc_tsync), GFP_KERNEL);
> +	if (!mhi_tsc_tsync)
> +		return -ENOMEM;
> +
> +	mhi_cntrl->tsc_timesync = mhi_tsc_tsync;
> +	mutex_init(&mhi_tsc_tsync->ts_mutex);
> +
> +	/* save time_offset for obtaining time via MMIO register reads */
> +	mhi_tsc_tsync->time_reg = mhi_cntrl->regs + time_offset;
> +
> +	return 0;
> +}

Move all the function definitions from MHI core code to mhi_phc.c.

> +
>  int mhi_init_mmio(struct mhi_controller *mhi_cntrl)
>  {
>  	u32 val;
> @@ -635,6 +659,10 @@ int mhi_init_mmio(struct mhi_controller *mhi_cntrl)
>  		return ret;
>  	}
>  
> +	ret = mhi_init_tsc_timesync(mhi_cntrl);
> +	if (ret)
> +		dev_dbg(dev, "TSC Time synchronization init failure\n");

And just keep this call in the MHI core.

> +
>  	return 0;
>  }
>  
> diff --git a/drivers/bus/mhi/host/internal.h b/drivers/bus/mhi/host/internal.h
> index 7b0ee5e3a12dd585064169b7b884750bf4d8c8db..a0e729e7a1198c1b82c70b6bfe3bc2ee24331229 100644
> --- a/drivers/bus/mhi/host/internal.h
> +++ b/drivers/bus/mhi/host/internal.h
> @@ -15,6 +15,15 @@ extern const struct bus_type mhi_bus_type;
>  #define MHI_SOC_RESET_REQ_OFFSET			0xb0
>  #define MHI_SOC_RESET_REQ				BIT(0)
>  
> +/*
> + * With ASPM enabled, the link may enter a low power state, requiring
> + * a wake-up sequence. Use a short burst of back-to-back reads to
> + * transition the link to the active state. Based on testing,
> + * 4 iterations are necessary to ensure reliable wake-up without
> + * excess latency.
> + */
> +#define MHI_NUM_BACK_TO_BACK_READS			4
> +
>  struct mhi_ctxt {
>  	struct mhi_event_ctxt *er_ctxt;
>  	struct mhi_chan_ctxt *chan_ctxt;
> diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
> index 53c0ffe300702bcc3caa8fd9ea8086203c75b186..b7a727b1a5d1f20b570c62707a991ec5b85bfec7 100644
> --- a/drivers/bus/mhi/host/main.c
> +++ b/drivers/bus/mhi/host/main.c
> @@ -1626,3 +1626,77 @@ int mhi_get_channel_doorbell_offset(struct mhi_controller *mhi_cntrl, u32 *chdb_
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(mhi_get_channel_doorbell_offset);
> +
> +static int mhi_get_remote_time(struct mhi_controller *mhi_cntrl, struct mhi_timesync *mhi_tsync,
> +			       struct mhi_timesync_info *time)
> +{
> +	struct device *dev = &mhi_cntrl->mhi_dev->dev;
> +	int ret, i;
> +
> +	if (!mhi_tsync && !mhi_tsync->time_reg) {
> +		dev_err(dev, "Time sync is not supported\n");
> +		return -EINVAL;

-EOPNOTSUPP

> +	}
> +
> +	if (unlikely(MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state))) {
> +		dev_err(dev, "MHI is not in active state, pm_state:%s\n",
> +			to_mhi_pm_state_str(mhi_cntrl->pm_state));
> +		return -EIO;
> +	}
> +
> +	/* bring to M0 state */
> +	ret = mhi_device_get_sync(mhi_cntrl->mhi_dev);
> +	if (ret)
> +		return ret;
> +
> +	guard(mutex)(&mhi_tsync->ts_mutex);
> +	mhi_cntrl->runtime_get(mhi_cntrl);
> +
> +	/*
> +	 * time critical code to fetch device time, delay between these two steps
> +	 * should be deterministic as possible.
> +	 */
> +	preempt_disable();
> +	local_irq_disable();
> +
> +	time->t_host_pre = ktime_get_real();
> +
> +	/*
> +	 * To ensure the PCIe link is in L0 when ASPM is enabled, perform series
> +	 * of back-to-back reads. This is necessary because the link may be in a
> +	 * low-power state (e.g., L1 or L1ss), and need to be forced it to
> +	 * transition to L0.
> +	 */

You should be doing these back-to-back reads only if ASPM is enabled. You can
check that using pcie_aspm_enabled(). Also, see if you can hide this call inside
pci_generic driver to make mhi_phc truly transport agnostic.

> +	for (i = 0; i < MHI_NUM_BACK_TO_BACK_READS; i++) {
> +		ret = mhi_read_reg(mhi_cntrl, mhi_tsync->time_reg,
> +				   TSC_TIMESYNC_TIME_LOW_OFFSET, &time->t_dev_lo);
> +
> +		ret = mhi_read_reg(mhi_cntrl, mhi_tsync->time_reg,
> +				   TSC_TIMESYNC_TIME_HIGH_OFFSET, &time->t_dev_hi);
> +	}
> +
> +	time->t_host_post = ktime_get_real();
> +
> +	local_irq_enable();
> +	preempt_enable();
> +
> +	mhi_cntrl->runtime_put(mhi_cntrl);
> +
> +	mhi_device_put(mhi_cntrl->mhi_dev);
> +
> +	return 0;
> +}
> +
> +int mhi_get_remote_tsc_time_sync(struct mhi_device *mhi_dev, struct mhi_timesync_info *time)
> +{
> +	struct mhi_controller *mhi_cntrl = mhi_dev->mhi_cntrl;
> +	struct mhi_timesync *mhi_tsc_tsync = mhi_cntrl->tsc_timesync;
> +	int ret;
> +
> +	ret = mhi_get_remote_time(mhi_cntrl, mhi_tsc_tsync, time);
> +	if (ret)
> +		dev_err(&mhi_dev->dev, "Failed to get TSC Time Sync value:%d\n", ret);

What is the difference between 'TSC' and 'Time Sync'?

Nit: Space after 'value:'

> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(mhi_get_remote_tsc_time_sync);
> diff --git a/include/linux/mhi.h b/include/linux/mhi.h
> index 88ccb3e14f481d6b85c2a314eb74ba960c2d4c81..f39c8ca7c251954f2d83c1227d206b600b88c75f 100644
> --- a/include/linux/mhi.h
> +++ b/include/linux/mhi.h
> @@ -286,6 +286,30 @@ struct mhi_controller_config {
>  	bool m2_no_db;
>  };
>  
> +/**
> + * struct mhi_timesync - MHI time synchronization structure
> + * @time_reg: Points to address of Timesync register
> + * @ts_mutex: Mutex for synchronization
> + */
> +struct mhi_timesync {
> +	void __iomem *time_reg;
> +	struct mutex ts_mutex;
> +};
> +
> +/**
> + * struct mhi_timesync_info - MHI time sync info structure
> + * @t_host_pre: Pre host soc time
> + * @t_host_post: Post host soc time

Get rid of 'soc', it provides no value.

> + * @t_dev_lo: Mhi device time of lower dword
> + * @t_dev_hi: Mhi device time of higher dword

s/Mhi/MHI

> + */
> +struct mhi_timesync_info {
> +	ktime_t t_host_pre;
> +	ktime_t t_host_post;
> +	u32 t_dev_lo;
> +	u32 t_dev_hi;
> +};
> +
>  /**
>   * struct mhi_controller - Master MHI controller structure
>   * @name: Device name of the MHI controller
> @@ -323,6 +347,7 @@ struct mhi_controller_config {
>   * @mhi_event: MHI event ring configurations table
>   * @mhi_cmd: MHI command ring configurations table
>   * @mhi_ctxt: MHI device context, shared memory between host and device
> + * @tsc_timesync: MHI TSC timesync
>   * @pm_mutex: Mutex for suspend/resume operation
>   * @pm_lock: Lock for protecting MHI power management state
>   * @timeout_ms: Timeout in ms for state transitions
> @@ -401,6 +426,8 @@ struct mhi_controller {
>  	struct mhi_cmd *mhi_cmd;
>  	struct mhi_ctxt *mhi_ctxt;
>  
> +	struct mhi_timesync *tsc_timesync;
> +
>  	struct mutex pm_mutex;
>  	rwlock_t pm_lock;
>  	u32 timeout_ms;
> @@ -795,4 +822,14 @@ bool mhi_queue_is_full(struct mhi_device *mhi_dev, enum dma_data_direction dir);
>   */
>  int mhi_get_channel_doorbell_offset(struct mhi_controller *mhi_cntrl, u32 *chdb_offset);
>  
> +/**
> + * mhi_get_remote_tsc_time_sync - get external soc time relative to local soc

Same comment as above

- Mani

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply

* [PATCH net v2] RDS: Fix memory leak in rds_rdma_extra_size()
From: Xiaobo Liu @ 2026-04-13  7:00 UTC (permalink / raw)
  To: Allison Henderson, David S. Miller
  Cc: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev,
	linux-rdma, rds-devel, linux-kernel, Xiaobo Liu

Free iov->iov when copy_from_user() or page count validation fails
in rds_rdma_extra_size().

This preserves the existing success path and avoids leaking the
allocated iovec array on error.

Signed-off-by: Xiaobo Liu <cppcoffee@gmail.com>
---
 net/rds/rdma.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/net/rds/rdma.c b/net/rds/rdma.c
index aa6465dc7..91a20c1e2 100644
--- a/net/rds/rdma.c
+++ b/net/rds/rdma.c
@@ -560,6 +560,7 @@ int rds_rdma_extra_size(struct rds_rdma_args *args,
 	struct rds_iovec *vec;
 	struct rds_iovec __user *local_vec;
 	int tot_pages = 0;
+	int ret = 0;
 	unsigned int nr_pages;
 	unsigned int i;
 
@@ -578,16 +579,20 @@ int rds_rdma_extra_size(struct rds_rdma_args *args,
 	vec = &iov->iov[0];
 
 	if (copy_from_user(vec, local_vec, args->nr_local *
-			   sizeof(struct rds_iovec)))
-		return -EFAULT;
+			   sizeof(struct rds_iovec))) {
+		ret = -EFAULT;
+		goto out;
+	}
 	iov->len = args->nr_local;
 
 	/* figure out the number of pages in the vector */
 	for (i = 0; i < args->nr_local; i++, vec++) {
 
 		nr_pages = rds_pages_in_vec(vec);
-		if (nr_pages == 0)
-			return -EINVAL;
+		if (nr_pages == 0) {
+			ret = -EINVAL;
+			goto out;
+		}
 
 		tot_pages += nr_pages;
 
@@ -595,11 +600,20 @@ int rds_rdma_extra_size(struct rds_rdma_args *args,
 		 * nr_pages for one entry is limited to (UINT_MAX>>PAGE_SHIFT)+1,
 		 * so tot_pages cannot overflow without first going negative.
 		 */
-		if (tot_pages < 0)
-			return -EINVAL;
+		if (tot_pages < 0) {
+			ret = -EINVAL;
+			goto out;
+		}
 	}
 
-	return tot_pages * sizeof(struct scatterlist);
+	ret = tot_pages * sizeof(struct scatterlist);
+
+out:
+	if (ret < 0) {
+		kfree(iov->iov);
+		iov->iov = NULL;
+	}
+	return ret;
 }
 
 /*
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
From: Russell King (Oracle) @ 2026-04-13  7:24 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Maxime Chevallier, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel,
	linux-stm32, netdev, Paolo Abeni
In-Reply-To: <CAH5Ym4hEX15dmJhGBqhhk--_PoFRKRSyE1AomY4D3ipwAz+pKg@mail.gmail.com>

On Sun, Apr 12, 2026 at 06:42:04PM -0700, Sam Edwards wrote:
> On Sun, Apr 12, 2026 at 7:23 AM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> > As the dwmac 5.0 core receive path seems to lock up after the first
> > RBU, I never see more than one of those at a time.
> >
> > Right now, I consider this pretty much unsolvable - I've spent quite
> > some time looking at it and trying various approaches, nothing seems
> > to fix it. However, adding dma_rmb() in the descriptor cleanup/refill
> > paths does seem to improve the situation a little with the 480Mbps
> > case, because I think it means that we're reading the descriptors in
> > a more timely manner after the hardware has updated them.
> 
> Hey Russell,
> 
> I'd like to repro this but I currently can't boot net-next. My issue
> is the same as [1], and the patch to fix it [2] isn't yet committed
> anywhere apparently.
> 
> This prevents my Jetson Xavier NX from starting at all (and after
> enough attempts, corrupts eMMC); I'm surprised you're not suffering
> the same effects. But because this bug lives in the IOMMU subsystem
> (and it has somewhat inconsistent effects), perhaps this is just a
> different way it manifests? Could you confirm whether your dwmac hang
> happens with IOMMU disabled, and/or with [1] reverted or [2] applied?
> 
> I'm using a defconfig build and a fairly minimal cmdline (just
> console=, root=, and rootwait).
> 
> Cheers,
> Sam
> 
> [1] https://lore.kernel.org/all/8800a38b-8515-4bbe-af15-0dae81274bf7@nvidia.com/
> [2] https://lore.kernel.org/all/0-v1-664d3acaabb9+78b-iommu_gather_always_jgg@nvidia.com/

In the second link, there is this sub-thread:

https://lore.kernel.org/all/ee2c2044-e329-4cdd-ac35-9365824d3677@arm.com/

which was committed into -rc as:

7e0548525abd iommu: Ensure .iotlb_sync is called correctly

which does fix IOMMU problems which caused net-next which reports itself
as v7.0-rc6 failing to boot with ext4 errors. See:

https://lore.kernel.org/r/adZTGOjjJrVJOcT8@shell.armlinux.org.uk

which resulted in it being merged into v7.0-rc7 just before Thursday's
net tree merge. Due to the way net-next is operated, that means that
net-next on Thursday evening gained this fix.

Involving Linus in the problem meant he was aware of it, and explaining
how netdev works allowed him to delay the merging of the net tree to
ensure net-next gained the fix.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [PATCH net,v2 1/1] net: stmmac: Update default_an_inband before passing value to phylink_config
From: Russell King (Oracle) @ 2026-04-13  7:26 UTC (permalink / raw)
  To: KhaiWenTan
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, mcoquelin.stm32,
	alexandre.torgue, maxime.chevallier, ovidiu.panait.rb,
	vladimir.oltean, netdev, linux-stm32, linux-arm-kernel,
	linux-kernel, yoong.siang.song, hong.aun.looi, khai.wen.tan
In-Reply-To: <20260413020339.68426-1-khai.wen.tan@linux.intel.com>

On Mon, Apr 13, 2026 at 10:03:39AM +0800, KhaiWenTan wrote:
> get_interfaces() will update both the plat->phy_interfaces and
> mdio_bus_data->default_an_inband based on reading a SERDES register. As
> get_interfaces() will be called after default_an_inband had already been
> read, dwmac-intel regressed as a result with incorrect default_an_inband
> value in phylink_config.
> 
> Therefore, we moved the priv->plat->get_interfaces() to be executed first
> before assigning mdio_bus_data->default_an_inband to
> config->default_an_inband to ensure default_an_inband is in correct value.
> 
> Fixes: d3836052fe09 ("net: stmmac: intel: convert speed_mode_2500() to get_interfaces()")
> Signed-off-by: KhaiWenTan <khai.wen.tan@linux.intel.com>

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Thanks!

I'll note that this will cause a conflict with net-next when that is
eventually merged.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [PATCH v2 2/6] bus: mhi: host: Add support for non-posted TSC timesync feature
From: Manivannan Sadhasivam @ 2026-04-13  7:27 UTC (permalink / raw)
  To: Krishna Chaitanya Chundru
  Cc: Richard Cochran, mhi, linux-arm-msm, linux-kernel, netdev,
	Vivek Pernamitta
In-Reply-To: <20260411-tsc_timesync-v2-2-6f25f72987b3@oss.qualcomm.com>

On Sat, Apr 11, 2026 at 01:42:02PM +0530, Krishna Chaitanya Chundru wrote:
> From: Vivek Pernamitta <quic_vpernami@quicinc.com>
> 
> Implement non-posted time synchronization as described in section 5.1.1
> of the MHI v1.2 specification. The host disables low-power link states
> to minimize latency, reads the local time, issues a MMIO read to the
> device's TIME register.
> 
> Add support for initializing this feature and export a function to be
> used by the drivers which does the time synchronization.
> 
> MHI reads the device time registers in the MMIO address space pointed to
> by the capability register after disabling all low power modes and keeping
> MHI in M0. Before and after MHI reads, the local time is captured
> and shared for processing.
> 
> Signed-off-by: Vivek Pernamitta <quic_vpernami@quicinc.com>
> Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
> ---
>  drivers/bus/mhi/common.h        |  4 +++
>  drivers/bus/mhi/host/init.c     | 28 ++++++++++++++++
>  drivers/bus/mhi/host/internal.h |  9 +++++
>  drivers/bus/mhi/host/main.c     | 74 +++++++++++++++++++++++++++++++++++++++++
>  include/linux/mhi.h             | 37 +++++++++++++++++++++
>  5 files changed, 152 insertions(+)
> 
> diff --git a/drivers/bus/mhi/common.h b/drivers/bus/mhi/common.h
> index 4c316f3d5a68beb01f15cf575b03747096fdcf2c..64f9b2b94387a112bb6b5e20c634c3ba8d6bc78e 100644
> --- a/drivers/bus/mhi/common.h
> +++ b/drivers/bus/mhi/common.h
> @@ -118,6 +118,10 @@
>  #define CAP_CAPID_MASK			GENMASK(31, 24)
>  #define CAP_NEXT_CAP_MASK		GENMASK(23, 12)
>  
> +/* MHI TSC Timesync */
> +#define TSC_TIMESYNC_TIME_LOW_OFFSET	(0x8)
> +#define TSC_TIMESYNC_TIME_HIGH_OFFSET	(0xC)
> +
>  /* Command Ring Element macros */
>  /* No operation command */
>  #define MHI_TRE_CMD_NOOP_PTR		0
> diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c
> index c2162aa04e810e45ccfbedd20aaa62f892420d31..eb720f671726d919646cbc450cd54bda655a1060 100644
> --- a/drivers/bus/mhi/host/init.c
> +++ b/drivers/bus/mhi/host/init.c
> @@ -498,6 +498,30 @@ static int mhi_find_capability(struct mhi_controller *mhi_cntrl, u32 capability)
>  	return 0;
>  }
>  
> +static int mhi_init_tsc_timesync(struct mhi_controller *mhi_cntrl)
> +{
> +	struct device *dev = &mhi_cntrl->mhi_dev->dev;
> +	struct mhi_timesync *mhi_tsc_tsync;
> +	u32 time_offset;
> +	int ret;
> +
> +	time_offset = mhi_find_capability(mhi_cntrl, MHI_CAP_ID_TSC_TIME_SYNC);
> +	if (!time_offset)
> +		return -ENXIO;
> +
> +	mhi_tsc_tsync = devm_kzalloc(dev, sizeof(*mhi_tsc_tsync), GFP_KERNEL);
> +	if (!mhi_tsc_tsync)
> +		return -ENOMEM;
> +
> +	mhi_cntrl->tsc_timesync = mhi_tsc_tsync;
> +	mutex_init(&mhi_tsc_tsync->ts_mutex);
> +
> +	/* save time_offset for obtaining time via MMIO register reads */
> +	mhi_tsc_tsync->time_reg = mhi_cntrl->regs + time_offset;
> +
> +	return 0;
> +}
> +
>  int mhi_init_mmio(struct mhi_controller *mhi_cntrl)
>  {
>  	u32 val;
> @@ -635,6 +659,10 @@ int mhi_init_mmio(struct mhi_controller *mhi_cntrl)
>  		return ret;
>  	}
>  
> +	ret = mhi_init_tsc_timesync(mhi_cntrl);
> +	if (ret)
> +		dev_dbg(dev, "TSC Time synchronization init failure\n");
> +
>  	return 0;
>  }
>  
> diff --git a/drivers/bus/mhi/host/internal.h b/drivers/bus/mhi/host/internal.h
> index 7b0ee5e3a12dd585064169b7b884750bf4d8c8db..a0e729e7a1198c1b82c70b6bfe3bc2ee24331229 100644
> --- a/drivers/bus/mhi/host/internal.h
> +++ b/drivers/bus/mhi/host/internal.h
> @@ -15,6 +15,15 @@ extern const struct bus_type mhi_bus_type;
>  #define MHI_SOC_RESET_REQ_OFFSET			0xb0
>  #define MHI_SOC_RESET_REQ				BIT(0)
>  
> +/*
> + * With ASPM enabled, the link may enter a low power state, requiring
> + * a wake-up sequence. Use a short burst of back-to-back reads to
> + * transition the link to the active state. Based on testing,
> + * 4 iterations are necessary to ensure reliable wake-up without
> + * excess latency.
> + */
> +#define MHI_NUM_BACK_TO_BACK_READS			4
> +
>  struct mhi_ctxt {
>  	struct mhi_event_ctxt *er_ctxt;
>  	struct mhi_chan_ctxt *chan_ctxt;
> diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
> index 53c0ffe300702bcc3caa8fd9ea8086203c75b186..b7a727b1a5d1f20b570c62707a991ec5b85bfec7 100644
> --- a/drivers/bus/mhi/host/main.c
> +++ b/drivers/bus/mhi/host/main.c
> @@ -1626,3 +1626,77 @@ int mhi_get_channel_doorbell_offset(struct mhi_controller *mhi_cntrl, u32 *chdb_
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(mhi_get_channel_doorbell_offset);
> +
> +static int mhi_get_remote_time(struct mhi_controller *mhi_cntrl, struct mhi_timesync *mhi_tsync,
> +			       struct mhi_timesync_info *time)
> +{
> +	struct device *dev = &mhi_cntrl->mhi_dev->dev;
> +	int ret, i;
> +
> +	if (!mhi_tsync && !mhi_tsync->time_reg) {
> +		dev_err(dev, "Time sync is not supported\n");
> +		return -EINVAL;
> +	}
> +
> +	if (unlikely(MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state))) {
> +		dev_err(dev, "MHI is not in active state, pm_state:%s\n",
> +			to_mhi_pm_state_str(mhi_cntrl->pm_state));
> +		return -EIO;
> +	}
> +
> +	/* bring to M0 state */
> +	ret = mhi_device_get_sync(mhi_cntrl->mhi_dev);
> +	if (ret)
> +		return ret;
> +
> +	guard(mutex)(&mhi_tsync->ts_mutex);
> +	mhi_cntrl->runtime_get(mhi_cntrl);
> +
> +	/*
> +	 * time critical code to fetch device time, delay between these two steps
> +	 * should be deterministic as possible.
> +	 */

Thinking more, to be deterministic, ASPM should be disabled at this point. But
we cannot do that safely now from PCI client drivers until this series gets in:
https://lore.kernel.org/linux-arm-msm/20250825-ath-aspm-fix-v2-0-61b2f2db7d89@oss.qualcomm.com/

So let's keep it this way for now, but add a FIXME to disable ASPM.

> +	preempt_disable();
> +	local_irq_disable();
> +
> +	time->t_host_pre = ktime_get_real();
> +
> +	/*
> +	 * To ensure the PCIe link is in L0 when ASPM is enabled, perform series
> +	 * of back-to-back reads. This is necessary because the link may be in a
> +	 * low-power state (e.g., L1 or L1ss), and need to be forced it to
> +	 * transition to L0.
> +	 */
> +	for (i = 0; i < MHI_NUM_BACK_TO_BACK_READS; i++) {
> +		ret = mhi_read_reg(mhi_cntrl, mhi_tsync->time_reg,
> +				   TSC_TIMESYNC_TIME_LOW_OFFSET, &time->t_dev_lo);
> +
> +		ret = mhi_read_reg(mhi_cntrl, mhi_tsync->time_reg,
> +				   TSC_TIMESYNC_TIME_HIGH_OFFSET, &time->t_dev_hi);

And an idiomatic way to read 64bit MMIO value in two 32bit read is:

	do {
		high = read_high();
		low = read_low();
	} while(high != read_high());

- Mani

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
From: Russell King (Oracle) @ 2026-04-13  7:28 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Maxime Chevallier, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel,
	linux-stm32, netdev, Paolo Abeni
In-Reply-To: <adyaS3EauyrNrjMy@shell.armlinux.org.uk>

On Mon, Apr 13, 2026 at 08:24:59AM +0100, Russell King (Oracle) wrote:
> On Sun, Apr 12, 2026 at 06:42:04PM -0700, Sam Edwards wrote:
> > On Sun, Apr 12, 2026 at 7:23 AM Russell King (Oracle)
> > <linux@armlinux.org.uk> wrote:
> > > As the dwmac 5.0 core receive path seems to lock up after the first
> > > RBU, I never see more than one of those at a time.
> > >
> > > Right now, I consider this pretty much unsolvable - I've spent quite
> > > some time looking at it and trying various approaches, nothing seems
> > > to fix it. However, adding dma_rmb() in the descriptor cleanup/refill
> > > paths does seem to improve the situation a little with the 480Mbps
> > > case, because I think it means that we're reading the descriptors in
> > > a more timely manner after the hardware has updated them.
> > 
> > Hey Russell,
> > 
> > I'd like to repro this but I currently can't boot net-next. My issue
> > is the same as [1], and the patch to fix it [2] isn't yet committed
> > anywhere apparently.
> > 
> > This prevents my Jetson Xavier NX from starting at all (and after
> > enough attempts, corrupts eMMC); I'm surprised you're not suffering
> > the same effects. But because this bug lives in the IOMMU subsystem
> > (and it has somewhat inconsistent effects), perhaps this is just a
> > different way it manifests? Could you confirm whether your dwmac hang
> > happens with IOMMU disabled, and/or with [1] reverted or [2] applied?
> > 
> > I'm using a defconfig build and a fairly minimal cmdline (just
> > console=, root=, and rootwait).
> > 
> > Cheers,
> > Sam
> > 
> > [1] https://lore.kernel.org/all/8800a38b-8515-4bbe-af15-0dae81274bf7@nvidia.com/
> > [2] https://lore.kernel.org/all/0-v1-664d3acaabb9+78b-iommu_gather_always_jgg@nvidia.com/
> 
> In the second link, there is this sub-thread:
> 
> https://lore.kernel.org/all/ee2c2044-e329-4cdd-ac35-9365824d3677@arm.com/
> 
> which was committed into -rc as:
> 
> 7e0548525abd iommu: Ensure .iotlb_sync is called correctly
> 
> which does fix IOMMU problems which caused net-next which reports itself
> as v7.0-rc6 failing to boot with ext4 errors. See:
> 
> https://lore.kernel.org/r/adZTGOjjJrVJOcT8@shell.armlinux.org.uk
> 
> which resulted in it being merged into v7.0-rc7 just before Thursday's
> net tree merge. Due to the way net-next is operated, that means that
> net-next on Thursday evening gained this fix.
> 
> Involving Linus in the problem meant he was aware of it, and explaining
> how netdev works allowed him to delay the merging of the net tree to
> ensure net-next gained the fix.

I'll also state what I've stated previously about the iperf3 problem:
it seems to go back a long time, certainly before I started cleaning
up the stmmac driver which is now well over a year ago.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* [PATCH iwl-net 0/5] iavf: five correctness fixes
From: Aleksandr Loktionov @ 2026-04-13  7:30 UTC (permalink / raw)
  To: intel-wired-lan, anthony.l.nguyen, aleksandr.loktionov; +Cc: netdev

Small batch of iavf bug fixes.  Patches address a NULL-pointer dereference
crash in the hung-tx detector, a spurious free_irq() call in the misc-IRQ
error path, a VSI-state-corruption race when ethtool changes ring parameters
during an active reset, an inverted TC-boundary comparison that silently
steered frames to non-existing traffic classes, and an -EINVAL that confused
upper layers when a TC flower filter was looked up after its qdisc had
already been torn down.

All five are genuine correctness fixes with no functional changes for the
common path.  Best routed via net.

Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

Avinash Dayanand (1):
  iavf: fix TC boundary check in iavf_handle_tclass

Kiran Patil (2):
  iavf: fix null pointer dereference in iavf_detect_recover_hung
  iavf: return 0 when TC flower filter not found after qdisc teardown

Piotr Gardocki (1):
  iavf: fix error path in iavf_request_misc_irq

Sylwester Dziedziuch (1):
  iavf: prevent VSI corruption when ring params changed during reset

 drivers/net/ethernet/intel/iavf/iavf_ethtool.c |  5 +++++
 drivers/net/ethernet/intel/iavf/iavf_main.c    | 14 +++++++++++---
 drivers/net/ethernet/intel/iavf/iavf_txrx.c    |  8 +++++---
 3 files changed, 21 insertions(+), 6 deletions(-)

-- 
2.52.0

^ permalink raw reply

* [PATCH iwl-net 1/5] iavf: fix null pointer dereference in iavf_detect_recover_hung
From: Aleksandr Loktionov @ 2026-04-13  7:30 UTC (permalink / raw)
  To: intel-wired-lan, anthony.l.nguyen, aleksandr.loktionov
  Cc: netdev, Kiran Patil
In-Reply-To: <20260413073035.4082204-1-aleksandr.loktionov@intel.com>

From: Kiran Patil <kiran.patil@intel.com>

During a concurrent reset, q_vectors are freed and re-allocated while
the watchdog task may still be iterating rings in
iavf_detect_recover_hung(). Dereferencing a NULL q_vector inside
iavf_force_wb() results in a crash. Guard against this by skipping
rings whose q_vector is NULL.

Also move the tx_ring declaration into the loop body and drop the
redundant outer NULL initialisation, which the compiler can never
observe since an array-element address is always non-NULL.

Fixes: 9c6c12595b73 ("i40e: Detection and recovery of TX queue hung logic moved to service_task from tx_timeout")
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_txrx.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index 363c42b..e7e7fc9 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -176,7 +176,6 @@ static void iavf_force_wb(struct iavf_vsi *vsi, struct iavf_q_vector *q_vector)
  **/
 void iavf_detect_recover_hung(struct iavf_vsi *vsi)
 {
-	struct iavf_ring *tx_ring = NULL;
 	struct net_device *netdev;
 	unsigned int i;
 	int packets;
@@ -195,8 +194,11 @@ void iavf_detect_recover_hung(struct iavf_vsi *vsi)
 		return;
 
 	for (i = 0; i < vsi->back->num_active_queues; i++) {
-		tx_ring = &vsi->back->tx_rings[i];
-		if (tx_ring && tx_ring->desc) {
+		struct iavf_ring *tx_ring = &vsi->back->tx_rings[i];
+
+		if (!tx_ring || !tx_ring->q_vector)
+			continue;
+		if (tx_ring->desc) {
 			/* If packet counter has not changed the queue is
 			 * likely stalled, so force an interrupt for this
 			 * queue.
-- 
2.52.0


^ permalink raw reply related

* [PATCH iwl-net 2/5] iavf: fix error path in iavf_request_misc_irq
From: Aleksandr Loktionov @ 2026-04-13  7:30 UTC (permalink / raw)
  To: intel-wired-lan, anthony.l.nguyen, aleksandr.loktionov; +Cc: netdev
In-Reply-To: <20260413073035.4082204-1-aleksandr.loktionov@intel.com>

From: Piotr Gardocki <piotrx.gardocki@intel.com>

When request_irq() fails the interrupt vector was not registered for
the driver. Calling free_irq() on a vector that was never successfully
requested triggers a kernel warning. Drop the erroneous free_irq()
call from the error path.

Fixes: 5eae00c57f5e ("i40evf: main driver core")
Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index dad001a..ab5f5adc 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -587,7 +587,6 @@ static int iavf_request_misc_irq(struct iavf_adapter *adapter)
 		dev_err(&adapter->pdev->dev,
 			"request_irq for %s failed: %d\n",
 			adapter->misc_vector_name, err);
-		free_irq(adapter->msix_entries[0].vector, netdev);
 	}
 	return err;
 }
-- 
2.52.0


^ permalink raw reply related

* [PATCH iwl-net 3/5] iavf: prevent VSI corruption when ring params changed during reset
From: Aleksandr Loktionov @ 2026-04-13  7:30 UTC (permalink / raw)
  To: intel-wired-lan, anthony.l.nguyen, aleksandr.loktionov
  Cc: netdev, Sylwester Dziedziuch
In-Reply-To: <20260413073035.4082204-1-aleksandr.loktionov@intel.com>

From: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>

Changing ring parameters via ethtool triggers a VF reset and queue
reconfiguration. If ethtool is called again before the first reset
completes, the second reset races with uninitialised queue state and
can corrupt the VSI resource tree on the PF side.

Return -EAGAIN from iavf_set_ringparam() when the adapter is already
resetting or its queues are disabled.

Fixes: fbb7ddfef253 ("i40evf: core ethtool functionality")
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_ethtool.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
index 1cd1f3f..3909131 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
@@ -495,6 +495,11 @@ static int iavf_set_ringparam(struct net_device *netdev,
 	if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
 		return -EINVAL;
 
+	if (adapter->state == __IAVF_RESETTING ||
+	    (adapter->state == __IAVF_RUNNING &&
+	     adapter->flags & IAVF_FLAG_QUEUES_DISABLED))
+		return -EAGAIN;
+
 	if (ring->tx_pending > IAVF_MAX_TXD ||
 	    ring->tx_pending < IAVF_MIN_TXD ||
 	    ring->rx_pending > IAVF_MAX_RXD ||
-- 
2.52.0


^ permalink raw reply related

* [PATCH iwl-net 4/5] iavf: fix TC boundary check in iavf_handle_tclass
From: Aleksandr Loktionov @ 2026-04-13  7:30 UTC (permalink / raw)
  To: intel-wired-lan, anthony.l.nguyen, aleksandr.loktionov
  Cc: netdev, Avinash Dayanand
In-Reply-To: <20260413073035.4082204-1-aleksandr.loktionov@intel.com>

From: Avinash Dayanand <avinash.dayanand@intel.com>

The condition `tc < adapter->num_tc` admits any tc value equal to or
greater than num_tc, bypassing the destination-port validation and
allowing traffic to be steered to a non-existent traffic class. Change
the comparison to `tc > adapter->num_tc` to correctly reject
out-of-range TC values.

Fixes: 0075fa0fadd0 ("i40evf: Add support to apply cloud filters")
Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index ab5f5adc..5e4035b 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -4062,7 +4062,7 @@ static int iavf_handle_tclass(struct iavf_adapter *adapter, u32 tc,
 {
 	if (tc == 0)
 		return 0;
-	if (tc < adapter->num_tc) {
+	if (tc > adapter->num_tc) {
 		if (!filter->f.data.tcp_spec.dst_port) {
 			dev_err(&adapter->pdev->dev,
 				"Specify destination port to redirect to traffic class other than TC0\n");
-- 
2.52.0


^ permalink raw reply related

* [PATCH iwl-net 5/5] iavf: return 0 when TC flower filter not found after qdisc teardown
From: Aleksandr Loktionov @ 2026-04-13  7:30 UTC (permalink / raw)
  To: intel-wired-lan, anthony.l.nguyen, aleksandr.loktionov
  Cc: netdev, Kiran Patil
In-Reply-To: <20260413073035.4082204-1-aleksandr.loktionov@intel.com>

From: Kiran Patil <kiran.patil@intel.com>

When an egress qdisc is destroyed, the driver proactively deletes all
associated cloud filters to prevent stale hardware state, decrementing
num_cloud_filters to zero in the process.

The kernel netdev layer is unaware of this implicit cleanup and may
still try to delete the same filters individually. If the filter is
not found in the driver's list and num_cloud_filters is already zero,
return 0 instead of -EINVAL to avoid confusing upper layers that
believe the filter is still offloaded in hardware.

Fixes: 0075fa0fadd0 ("i40evf: Add support to apply cloud filters")
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_main.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 5e4035b..05aaae9 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -4175,7 +4175,16 @@ static int iavf_delete_clsflower(struct iavf_adapter *adapter,
 	if (filter) {
 		filter->del = true;
 		adapter->aq_required |= IAVF_FLAG_AQ_DEL_CLOUD_FILTER;
-	} else {
+	} else if (adapter->num_cloud_filters) {
+		/* When the egress qdisc is detached the driver implicitly
+		 * deletes all associated cloud filters to prevent stale
+		 * hardware entries, reducing num_cloud_filters to zero.
+		 * The netdev layer is unaware of this implicit cleanup and
+		 * may still request deletion of individual filters.  Only
+		 * return -EINVAL when a filter lookup fails and
+		 * num_cloud_filters is non-zero, indicating a genuine
+		 * lookup failure rather than a post-teardown stale delete.
+		 */
 		err = -EINVAL;
 	}
 	spin_unlock_bh(&adapter->cloud_filter_list_lock);
-- 
2.52.0


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox