Linux-HyperV List
* [PATCH -hyperv 1/3] x86/hyperv: Save segment registers directly to memory in hv_hvcrash_ctxt_save()
From: Uros Bizjak @ 2026-03-11 10:25 UTC (permalink / raw)
  To: linux-hyperv, x86, linux-kernel
  Cc: Uros Bizjak, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Long Li, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

hv_hvcrash_ctxt_save() in arch/x86/hyperv/hv_crash.c currently saves
segment registers via a general-purpose register (%eax). Update the
code to save segment registers (cs, ss, ds, es, fs, gs) directly to
the crash context memory using movw. This avoids unnecessary use of
a general-purpose register, making the code simpler and more efficient.

The size of the corresponding object file improves as follows:

   text    data     bss     dec     hex filename
   4167     176     200    4543    11bf hv_crash-old.o
   4151     176     200    4527    11af hv_crash-new.o

No functional change occurs to the saved context contents; this is
purely a code-quality improvement.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Long Li <longli@microsoft.com>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/hyperv/hv_crash.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/hyperv/hv_crash.c b/arch/x86/hyperv/hv_crash.c
index fdb277bf73d8..2c7ea7e70854 100644
--- a/arch/x86/hyperv/hv_crash.c
+++ b/arch/x86/hyperv/hv_crash.c
@@ -207,12 +207,12 @@ static void hv_hvcrash_ctxt_save(void)
 	asm volatile("movq %%cr2, %0" : "=a"(ctxt->cr2));
 	asm volatile("movq %%cr8, %0" : "=a"(ctxt->cr8));
 
-	asm volatile("movl %%cs, %%eax" : "=a"(ctxt->cs));
-	asm volatile("movl %%ss, %%eax" : "=a"(ctxt->ss));
-	asm volatile("movl %%ds, %%eax" : "=a"(ctxt->ds));
-	asm volatile("movl %%es, %%eax" : "=a"(ctxt->es));
-	asm volatile("movl %%fs, %%eax" : "=a"(ctxt->fs));
-	asm volatile("movl %%gs, %%eax" : "=a"(ctxt->gs));
+	asm volatile("movw %%cs, %0" : "=m"(ctxt->cs));
+	asm volatile("movw %%ss, %0" : "=m"(ctxt->ss));
+	asm volatile("movw %%ds, %0" : "=m"(ctxt->ds));
+	asm volatile("movw %%es, %0" : "=m"(ctxt->es));
+	asm volatile("movw %%fs, %0" : "=m"(ctxt->fs));
+	asm volatile("movw %%gs, %0" : "=m"(ctxt->gs));
 
 	native_store_gdt(&ctxt->gdtr);
 	store_idt(&ctxt->idtr);
-- 
2.53.0


^ permalink raw reply related
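
Editorial sketch, not part of the patch: a user-space analogue of the change above, assuming GCC or Clang on x86. Reading a segment selector is not a privileged operation, so the register form and the direct-to-memory form can be compared outside the kernel. The helper names are hypothetical, and the non-x86 fallback exists only so the file still compiles.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
/* Old form: the selector passes through a general-purpose register. */
static uint16_t read_cs_via_reg(void)
{
	uint32_t sel;

	__asm__ __volatile__("movl %%cs, %0" : "=r"(sel));
	return (uint16_t)sel;
}

/* New form: the 16-bit selector is stored straight to memory. */
static uint16_t read_cs_to_mem(void)
{
	uint16_t sel;

	__asm__ __volatile__("movw %%cs, %0" : "=m"(sel));
	return sel;
}
#else
/* Assumption: non-x86 build. No selector is read; both helpers return 0. */
static uint16_t read_cs_via_reg(void) { return 0; }
static uint16_t read_cs_to_mem(void) { return 0; }
#endif

/* Both forms must observe the same selector value. */
static void check_cs_forms_agree(void)
{
	assert(read_cs_via_reg() == read_cs_to_mem());
}
```

Calling check_cs_forms_agree() is expected to pass silently; the point is only that the "=m" form needs no scratch register.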

* [PATCH -hyperv 2/3] x86/hyperv: Use current_stack_pointer to avoid asm() in hv_hvcrash_ctxt_save()
From: Uros Bizjak @ 2026-03-11 10:25 UTC (permalink / raw)
  To: linux-hyperv, x86, linux-kernel
  Cc: Uros Bizjak, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Long Li, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin
In-Reply-To: <20260311102658.215693-1-ubizjak@gmail.com>

Use current_stack_pointer to avoid asm() when saving %rsp to the
crash context memory in hv_hvcrash_ctxt_save(). The new code is
more readable and results in exactly the same object file.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Long Li <longli@microsoft.com>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/hyperv/hv_crash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/hv_crash.c b/arch/x86/hyperv/hv_crash.c
index 2c7ea7e70854..d0f95a278fdb 100644
--- a/arch/x86/hyperv/hv_crash.c
+++ b/arch/x86/hyperv/hv_crash.c
@@ -199,7 +199,7 @@ static void hv_hvcrash_ctxt_save(void)
 {
 	struct hv_crash_ctxt *ctxt = &hv_crash_ctxt;
 
-	asm volatile("movq %%rsp,%0" : "=m"(ctxt->rsp));
+	ctxt->rsp = current_stack_pointer;
 
 	ctxt->cr0 = native_read_cr0();
 	ctxt->cr4 = native_read_cr4();
-- 
2.53.0


^ permalink raw reply related
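
Editorial sketch, not part of the patch: the kernel's current_stack_pointer is a global register variable bound to the stack pointer. A user-space analogue, assuming GCC or Clang on x86 or arm64, could look like the following. The names user_stack_pointer, read_sp and check_sp_near_local are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Global register variable bound to the stack pointer of the target. */
#if defined(__x86_64__)
register unsigned long user_stack_pointer __asm__("rsp");
#elif defined(__i386__)
register unsigned long user_stack_pointer __asm__("esp");
#elif defined(__aarch64__)
register unsigned long user_stack_pointer __asm__("sp");
#else
#error "this sketch covers x86 and arm64 only"
#endif

/* Plain C read of the stack pointer; no asm statement at the use site. */
static unsigned long read_sp(void)
{
	return user_stack_pointer;
}

/* The stack pointer should sit close to the address of a live local. */
static void check_sp_near_local(void)
{
	volatile char marker = 0;
	unsigned long sp = read_sp();
	unsigned long addr = (unsigned long)(uintptr_t)&marker;
	unsigned long gap = sp > addr ? sp - addr : addr - sp;

	assert(gap < 64UL * 1024UL);
}
```

The 64 KiB bound is an arbitrary sanity margin chosen for the sketch, not a property of the ABI.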

* [PATCH -hyperv 3/3] x86/hyperv: Use any general-purpose register when saving %cr2 and %cr8
From: Uros Bizjak @ 2026-03-11 10:26 UTC (permalink / raw)
  To: linux-hyperv, x86, linux-kernel
  Cc: Uros Bizjak, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Long Li, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin
In-Reply-To: <20260311102658.215693-1-ubizjak@gmail.com>

hv_hvcrash_ctxt_save() in arch/x86/hyperv/hv_crash.c currently saves %cr2
and %cr8 using %rax ("=a"). This unnecessarily forces a specific register.
Update the inline assembly to use any general-purpose register ("=r") for
both %cr2 and %cr8. This gives the compiler more freedom in register
allocation while producing the same saved context contents.

No functional changes.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Long Li <longli@microsoft.com>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/hyperv/hv_crash.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/hv_crash.c b/arch/x86/hyperv/hv_crash.c
index d0f95a278fdb..5ffcc23255de 100644
--- a/arch/x86/hyperv/hv_crash.c
+++ b/arch/x86/hyperv/hv_crash.c
@@ -204,8 +204,8 @@ static void hv_hvcrash_ctxt_save(void)
 	ctxt->cr0 = native_read_cr0();
 	ctxt->cr4 = native_read_cr4();
 
-	asm volatile("movq %%cr2, %0" : "=a"(ctxt->cr2));
-	asm volatile("movq %%cr8, %0" : "=a"(ctxt->cr8));
+	asm volatile("movq %%cr2, %0" : "=r"(ctxt->cr2));
+	asm volatile("movq %%cr8, %0" : "=r"(ctxt->cr8));
 
 	asm volatile("movw %%cs, %0" : "=m"(ctxt->cs));
 	asm volatile("movw %%ss, %0" : "=m"(ctxt->ss));
-- 
2.53.0


^ permalink raw reply related
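
Editorial sketch, not part of the patch: %cr2 and %cr8 cannot be read from user space, so this analogue uses a plain register-to-register move to contrast the "=a" and "=r" output constraints. It assumes GCC or Clang; on non-x86-64 builds the helpers fall back to plain C. All function names are hypothetical.

```c
#include <assert.h>

#if defined(__x86_64__)
/* "=a" pins the output to %rax, whatever the surrounding code would prefer. */
static unsigned long copy_pinned_to_rax(unsigned long in)
{
	unsigned long out;

	__asm__ __volatile__("movq %1, %0" : "=a"(out) : "r"(in));
	return out;
}

/* "=r" lets the compiler pick any general-purpose register for the output. */
static unsigned long copy_any_reg(unsigned long in)
{
	unsigned long out;

	__asm__ __volatile__("movq %1, %0" : "=r"(out) : "r"(in));
	return out;
}
#else
/* Assumption: non-x86-64 build. Plain C stand-ins keep the file compiling. */
static unsigned long copy_pinned_to_rax(unsigned long in) { return in; }
static unsigned long copy_any_reg(unsigned long in) { return in; }
#endif

/* Both constraints yield the same value; only register choice differs. */
static void check_constraints_agree(void)
{
	assert(copy_pinned_to_rax(0x1234UL) == 0x1234UL);
	assert(copy_any_reg(0x1234UL) == 0x1234UL);
}
```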

* Re: [PATCH 36/61] arch/sh: Prefer IS_ERR_OR_NULL over manual NULL check
From: Geert Uytterhoeven @ 2026-03-11 13:15 UTC (permalink / raw)
  To: Philipp Hahn
  Cc: amd-gfx, apparmor, bpf, ceph-devel, cocci, dm-devel, dri-devel,
	gfs2, intel-gfx, intel-wired-lan, iommu, kvm, linux-arm-kernel,
	linux-block, linux-bluetooth, linux-btrfs, linux-cifs, linux-clk,
	linux-erofs, linux-ext4, linux-fsdevel, linux-gpio, linux-hyperv,
	linux-input, linux-kernel, linux-leds, linux-media, linux-mips,
	linux-mm, linux-modules, linux-mtd, linux-nfs, linux-omap,
	linux-phy, linux-pm, linux-rockchip, linux-s390, linux-scsi,
	linux-sctp, linux-security-module, linux-sh, linux-sound,
	linux-stm32, linux-trace-kernel, linux-usb, linux-wireless,
	netdev, ntfs3, samba-technical, sched-ext, target-devel,
	tipc-discussion, v9fs, Yoshinori Sato, Rich Felker,
	John Paul Adrian Glaubitz
In-Reply-To: <20260310-b4-is_err_or_null-v1-36-bd63b656022d@avm.de>

On Tue, 10 Mar 2026 at 12:56, Philipp Hahn <phahn-oss@avm.de> wrote:
> Prefer using IS_ERR_OR_NULL() over using IS_ERR() and a manual NULL
> check.
>
> Change generated with coccinelle.
>
> To: Yoshinori Sato <ysato@users.sourceforge.jp>
> To: Rich Felker <dalias@libc.org>
> To: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
> Cc: linux-sh@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Philipp Hahn <phahn-oss@avm.de>

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH 15/61] trace: Prefer IS_ERR_OR_NULL over manual NULL check
From: Steven Rostedt @ 2026-03-11 14:03 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Philipp Hahn, amd-gfx, apparmor, bpf, ceph-devel, cocci, dm-devel,
	dri-devel, gfs2, intel-gfx, intel-wired-lan, iommu, kvm,
	linux-arm-kernel, linux-block, linux-bluetooth, linux-btrfs,
	linux-cifs, linux-clk, linux-erofs, linux-ext4, linux-fsdevel,
	linux-gpio, linux-hyperv, linux-input, linux-kernel, linux-leds,
	linux-media, linux-mips, linux-mm, linux-modules, linux-mtd,
	linux-nfs, linux-omap, linux-phy, linux-pm, linux-rockchip,
	linux-s390, linux-scsi, linux-sctp, linux-security-module,
	linux-sh, linux-sound, linux-stm32, linux-trace-kernel, linux-usb,
	linux-wireless, netdev, ntfs3, samba-technical, sched-ext,
	target-devel, tipc-discussion, v9fs, Mathieu Desnoyers
In-Reply-To: <20260311141332.b611237d36b61b2409e66cb3@kernel.org>

On Wed, 11 Mar 2026 14:13:32 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> Hmm, now IS_ERR_OR_NULL() is an inline function, so it is safe.
> But if you want to use IS_ERR_OR_NULL() here, it will be better something like
> 
> node = rhashtable_walk_next(&iter);
> while (!IS_ERR_OR_NULL(node)) {
> 	fprobe_remove_node_in_module(mod, node, &alist);
> 	node = rhashtable_walk_next(&iter);
> }

But now you need duplicate code in order to acquire "node".

I think the patch just makes the code worse.

-- Steve

^ permalink raw reply

* Re: [PATCH 15/61] trace: Prefer IS_ERR_OR_NULL over manual NULL check
From: Geert Uytterhoeven @ 2026-03-11 14:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu (Google), Philipp Hahn, amd-gfx, apparmor, bpf,
	ceph-devel, cocci, dm-devel, dri-devel, gfs2, intel-gfx,
	intel-wired-lan, iommu, kvm, linux-arm-kernel, linux-block,
	linux-bluetooth, linux-btrfs, linux-cifs, linux-clk, linux-erofs,
	linux-ext4, linux-fsdevel, linux-gpio, linux-hyperv, linux-input,
	linux-kernel, linux-leds, linux-media, linux-mips, linux-mm,
	linux-modules, linux-mtd, linux-nfs, linux-omap, linux-phy,
	linux-pm, linux-rockchip, linux-s390, linux-scsi, linux-sctp,
	linux-security-module, linux-sh, linux-sound, linux-stm32,
	linux-trace-kernel, linux-usb, linux-wireless, netdev, ntfs3,
	samba-technical, sched-ext, target-devel, tipc-discussion, v9fs,
	Mathieu Desnoyers
In-Reply-To: <20260311100332.6a2ce4b1@gandalf.local.home>

Hi Steven,

On Wed, 11 Mar 2026 at 15:03, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Wed, 11 Mar 2026 14:13:32 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
>
> > Hmm, now IS_ERR_OR_NULL() is an inline function, so it is safe.
> > But if you want to use IS_ERR_OR_NULL() here, it will be better something like
> >
> > node = rhashtable_walk_next(&iter);
> > while (!IS_ERR_OR_NULL(node)) {
> >       fprobe_remove_node_in_module(mod, node, &alist);
> >       node = rhashtable_walk_next(&iter);
> > }
>
> But now you need duplicate code in order to acquire "node".
>
> I think the patch just makes the code worse.

Obviously we need a new for_each_*() helper hiding all the gory internals?

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply
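
Editorial sketch, not from the thread: one way the for_each_*() idea could look, written as user-space C. The err.h helpers are re-implemented as stand-ins, and struct walk_iter, walk_next() and for_each_walk_node() are hypothetical names, not rhashtable API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's err.h helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}
static inline bool IS_ERR_OR_NULL(const void *p) { return !p || IS_ERR(p); }

/* Hypothetical walker: hands out items one by one, then NULL. */
struct walk_iter {
	int *items;
	size_t n, pos;
};

static int *walk_next(struct walk_iter *it)
{
	return it->pos < it->n ? &it->items[it->pos++] : NULL;
}

/* Hypothetical helper: callers no longer spell out the fetch at all. */
#define for_each_walk_node(node, it) \
	for ((node) = walk_next(it); !IS_ERR_OR_NULL(node); (node) = walk_next(it))

/* Example caller: sums every item the walker yields. */
static int sum_all(struct walk_iter *it)
{
	int *node, sum = 0;

	assert(it != NULL);
	for_each_walk_node(node, it)
		sum += *node;
	return sum;
}
```

The duplicated fetch Steven objects to still exists, but it lives once in the macro instead of at every call site.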

* Re: [PATCH 01/61] Coccinelle: Prefer IS_ERR_OR_NULL over manual NULL check
From: Markus Elfring @ 2026-03-11 15:12 UTC (permalink / raw)
  To: Philipp Hahn, cocci, Julia Lawall, Nicolas Palix
  Cc: amd-gfx, apparmor, bpf, ceph-devel, dm-devel, dri-devel, gfs2,
	intel-gfx, intel-wired-lan, iommu, kvm, linux-arm-kernel,
	linux-block, linux-bluetooth, linux-btrfs, linux-cifs, linux-clk,
	linux-erofs, linux-ext4, linux-fsdevel, linux-gpio, linux-hyperv,
	linux-input, linux-leds, linux-media, linux-mips, linux-mm,
	linux-modules, linux-mtd, linux-nfs, linux-omap, linux-phy,
	linux-pm, linux-rockchip, linux-s390, linux-scsi, linux-sctp,
	linux-security-module, linux-sh, linux-sound, linux-stm32,
	linux-trace-kernel, linux-usb, linux-wireless, netdev, ntfs3,
	samba-technical, sched-ext, target-devel, tipc-discussion, v9fs,
	LKML
In-Reply-To: <20260310-b4-is_err_or_null-v1-1-bd63b656022d@avm.de>

…
> +// Confidence: High

Some contributors raised discerning comments about this change approach.
I am therefore curious how well these comments can be taken into account
by means of the semantic patch language (Coccinelle software).

…
> +@p1 depends on patch@
> +expression E;
> +@@
> +(
> +-	E != NULL && !IS_ERR(E)
> ++	!IS_ERR_OR_NULL(E)
> +|
> +-	E == NULL || IS_ERR(E)
> ++	IS_ERR_OR_NULL(E)
> +|
> +-	!IS_ERR(E) && E != NULL
> ++	!IS_ERR_OR_NULL(E)
> +|
> +-	IS_ERR(E) || E == NULL
> ++	IS_ERR_OR_NULL(E)
> +)

Several detected expressions should refer to return values from function calls.
https://en.wikipedia.org/wiki/Return_statement

* Do any development challenges still hinder the determination of the
  corresponding failure predicates?

* How will interest in improving data processing for such use cases
  evolve?


Regards,
Markus

^ permalink raw reply
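
Editorial sketch, not from the thread: the four rewrites in the quoted semantic patch are value-equivalent, which can be checked in user-space C. The err.h helpers below are stand-ins, and the before_*() names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's err.h helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}
static inline bool IS_ERR_OR_NULL(const void *p) { return !p || IS_ERR(p); }

/* The four "before" expressions matched by the semantic patch. */
static bool before_1(const void *e) { return e != NULL && !IS_ERR(e); }
static bool before_2(const void *e) { return e == NULL || IS_ERR(e); }
static bool before_3(const void *e) { return !IS_ERR(e) && e != NULL; }
static bool before_4(const void *e) { return IS_ERR(e) || e == NULL; }

/* Each "before" form must agree with its replacement for any pointer. */
static void check_rewrites(const void *e)
{
	assert(before_1(e) == !IS_ERR_OR_NULL(e));
	assert(before_2(e) == IS_ERR_OR_NULL(e));
	assert(before_3(e) == !IS_ERR_OR_NULL(e));
	assert(before_4(e) == IS_ERR_OR_NULL(e));
}
```

Value equivalence is not the whole story: the open-coded forms can evaluate E twice, while IS_ERR_OR_NULL(E) evaluates it once, which matters when E has side effects such as a function call.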

* Re: [PATCH net-next v2] net: mana: Expose hardware diagnostic info via debugfs
From: Simon Horman @ 2026-03-11 16:46 UTC (permalink / raw)
  To: Erni Sri Satya Vennela
  Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, kotaranov, shradhagupta, dipayanroy,
	yury.norov, kees, shirazsaleem, linux-hyperv, netdev,
	linux-kernel, linux-rdma
In-Reply-To: <20260309143840.675606-1-ernis@linux.microsoft.com>

On Mon, Mar 09, 2026 at 07:38:28AM -0700, Erni Sri Satya Vennela wrote:

...

> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c

...

> @@ -2128,6 +2140,9 @@ int mana_gd_suspend(struct pci_dev *pdev, pm_message_t state)
>  
>  	mana_gd_cleanup(pdev);
>  
> +	debugfs_remove_recursive(gc->mana_pci_debugfs);
> +	gc->mana_pci_debugfs = NULL;

Hi Erni,

The same cleanup of mana_pci_debugfs already appears in a couple of other
places. It seems that all such cleanup is now paired with a call to
mana_gd_cleanup().

So could you consider performing the mana_pci_debugfs cleanup in
mana_gd_cleanup()? Possibly also renaming that function?

> +
>  	return 0;
>  }
>  
> @@ -2140,6 +2155,12 @@ int mana_gd_resume(struct pci_dev *pdev)
>  	struct gdma_context *gc = pci_get_drvdata(pdev);
>  	int err;
>  
> +	if (gc->is_pf)
> +		gc->mana_pci_debugfs = debugfs_create_dir("0", mana_debugfs_root);
> +	else
> +		gc->mana_pci_debugfs = debugfs_create_dir(pci_slot_name(pdev->slot),
> +							  mana_debugfs_root);

Likewise the setup of mana_pci_debugfs seems to now always be paired
with a call to mana_gd_setup().

> +
>  	err = mana_gd_setup(pdev);
>  	if (err)
>  		return err;

...

^ permalink raw reply

* Re: [PATCH net-next,V4, 3/3] net: mana: Add ethtool counters for RX CQEs in coalesced type
From: Simon Horman @ 2026-03-11 17:58 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: linux-hyperv, netdev, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Konstantin Taranov,
	Erni Sri Satya Vennela, Dipayaan Roy, Shradha Gupta,
	Shiraz Saleem, Kees Cook, Subbaraya Sundeep, Aditya Garg,
	Breno Leitao, linux-kernel, linux-rdma, paulros
In-Reply-To: <20260309212106.764156-4-haiyangz@linux.microsoft.com>

On Mon, Mar 09, 2026 at 02:20:45PM -0700, Haiyang Zhang wrote:
> From: Haiyang Zhang <haiyangz@microsoft.com>
> 
> For RX CQEs with type CQE_RX_COALESCED_4, to measure the coalescing
> efficiency, add counters to count how many contain 2, 3, or 4 packets
> respectively.
> Also, add a counter for the error case of first packet with length == 0.
> 
> Reviewed-by: Long Li <longli@microsoft.com>
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> ---
>  drivers/net/ethernet/microsoft/mana/mana_en.c | 21 ++++++++++++++++++-
>  .../ethernet/microsoft/mana/mana_ethtool.c    | 15 +++++++++++--
>  include/net/mana/mana.h                       |  9 +++++---
>  3 files changed, 39 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index fa30046dcd3d..85f7a56d0d90 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -2148,11 +2148,23 @@ static void mana_process_rx_cqe(struct mana_rxq *rxq, struct mana_cq *cq,
>  		old_buf = NULL;
>  		pktlen = oob->ppi[i].pkt_len;
>  		if (pktlen == 0) {
> -			if (i == 0)
> +			/* Collect coalesced CQE count based on packets processed.
> +			 * Coalesced CQEs have at least 2 packets, so index is i - 2.
> +			 */
> +			if (i > 1) {
> +				u64_stats_update_begin(&rxq->stats.syncp);
> +				rxq->stats.coalesced_cqe[i - 2]++;
> +				u64_stats_update_end(&rxq->stats.syncp);
> +			} else if (i == 0) {
> +				/* Error case stat */
> +				u64_stats_update_begin(&rxq->stats.syncp);
> +				rxq->stats.pkt_len0_err++;
> +				u64_stats_update_end(&rxq->stats.syncp);
>  				netdev_err_once(
>  					ndev,
>  					"RX pkt len=0, rq=%u, cq=%u, rxobj=0x%llx\n",
>  					rxq->gdma_id, cq->gdma_id, rxq->rxobj);
> +			}
>  			break;

Hi Haiyang Zhang,

As there is a break here, can the accounting logic above be moved out of the
loop, and merged with the "Coalesced CQE with all 4 packets" accounting
logic that is already there?

As is, accounting seems split across, and slightly duplicated in, two locations.


>  		}
>  
> @@ -2175,6 +2187,13 @@ static void mana_process_rx_cqe(struct mana_rxq *rxq, struct mana_cq *cq,
>  		if (!coalesced)
>  			break;
>  	}
> +
> +	/* Coalesced CQE with all 4 packets */
> +	if (coalesced && i == MANA_RXCOMP_OOB_NUM_PPI) {
> +		u64_stats_update_begin(&rxq->stats.syncp);
> +		rxq->stats.coalesced_cqe[MANA_RXCOMP_OOB_NUM_PPI - 2]++;
> +		u64_stats_update_end(&rxq->stats.syncp);
> +	}
>  }
>  
>  static void mana_poll_rx_cq(struct mana_cq *cq)

...

^ permalink raw reply
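
Editorial sketch, not from the thread: a user-space model of what merging the accounting after the loop could look like. It assumes the loop tracks how many packets it processed; struct rx_stats, NUM_PPI and process_cqe() are hypothetical stand-ins for the driver's types, and the u64_stats synchronization is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PPI 4 /* stands in for MANA_RXCOMP_OOB_NUM_PPI */

struct rx_stats {
	uint64_t coalesced_cqe[NUM_PPI - 1]; /* CQEs holding 2, 3, 4 packets */
	uint64_t pkt_len0_err;               /* first packet had length 0 */
};

/* Walk the per-packet lengths of one CQE, then account once at the end. */
static void process_cqe(struct rx_stats *st, const uint32_t pkt_len[NUM_PPI],
			bool coalesced)
{
	int npkts = 0;

	for (int i = 0; i < NUM_PPI; i++) {
		if (pkt_len[i] == 0)
			break;
		npkts++; /* the real driver would deliver the packet here */
		if (!coalesced)
			break;
	}
	assert(npkts <= NUM_PPI);

	if (npkts == 0)
		st->pkt_len0_err++;
	else if (coalesced && npkts >= 2)
		st->coalesced_cqe[npkts - 2]++;
}
```

With a separate packet count, the 2, 3 and 4 packet cases share one update, and the zero-length error is distinguished from a normal non-coalesced CQE, which the loop index alone cannot do.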

* Re: [PATCH net] net: mana: Fix use-after-free in mana_hwc_destroy_channel() by re-ordering teardown
From: Dipayaan Roy @ 2026-03-11 18:52 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	kuba, pabeni, leon, longli, kotaranov, horms, shradhagupta,
	ssengar, ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma
In-Reply-To: <aa8PshvNRfuRYO/p@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>

On Mon, Mar 09, 2026 at 11:21:38AM -0700, Dipayaan Roy wrote:
> A potential race condition exists in mana_hwc_destroy_channel() where
> hwc->caller_ctx is freed before the HWC's Completion Queue (CQ) and
> Event Queue (EQ) are destroyed. This allows an in-flight CQ interrupt
> handler to dereference freed memory, leading to a use-after-free or
> NULL pointer dereference in mana_hwc_handle_resp().
> 
> mana_smc_teardown_hwc() signals the hardware to stop but does not
> synchronize against IRQ handlers already executing on other CPUs. The
> IRQ synchronization only happens in mana_hwc_destroy_cq() via
> mana_gd_destroy_eq() -> mana_gd_deregister_irq(). Since this runs
> after kfree(hwc->caller_ctx), a concurrent mana_hwc_rx_event_handler()
> can dereference freed caller_ctx (and rxq->msg_buf) in
> mana_hwc_handle_resp().
> 
> Fix this by reordering teardown to reverse-of-creation order: destroy
> the TX/RX work queues and CQ/EQ before freeing hwc->caller_ctx. This
> ensures all in-flight interrupt handlers complete before the memory they
> access is freed.
> 
> Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
> ---
>  drivers/net/ethernet/microsoft/mana/hw_channel.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> index 91975bdb5686..dbbde0fa57e7 100644
> --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> @@ -814,9 +814,6 @@ void mana_hwc_destroy_channel(struct gdma_context *gc)
>  		gc->max_num_cqs = 0;
>  	}
>  
> -	kfree(hwc->caller_ctx);
> -	hwc->caller_ctx = NULL;
> -
>  	if (hwc->txq)
>  		mana_hwc_destroy_wq(hwc, hwc->txq);
>  
> @@ -826,6 +823,9 @@ void mana_hwc_destroy_channel(struct gdma_context *gc)
>  	if (hwc->cq)
>  		mana_hwc_destroy_cq(hwc->gdma_dev->gdma_context, hwc->cq);
>  
> +	kfree(hwc->caller_ctx);
> +	hwc->caller_ctx = NULL;
> +
>  	mana_gd_free_res_map(&hwc->inflight_msg_res);
>  
>  	hwc->num_inflight_msg = 0;
> -- 
> 2.43.0
>
Hi,
I am sending a v2 as I missed adding Stephen.

Thank you.  

^ permalink raw reply

* [PATCH net,v2] net: mana: fix use-after-free in mana_hwc_destroy_channel() by reordering teardown
From: Dipayaan Roy @ 2026-03-11 19:22 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	kuba, pabeni, leon, longli, kotaranov, horms, shradhagupta,
	ssengar, ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, stephen, dipayanroy

A potential race condition exists in mana_hwc_destroy_channel() where
hwc->caller_ctx is freed before the HWC's Completion Queue (CQ) and
Event Queue (EQ) are destroyed. This allows an in-flight CQ interrupt
handler to dereference freed memory, leading to a use-after-free or
NULL pointer dereference in mana_hwc_handle_resp().

mana_smc_teardown_hwc() signals the hardware to stop but does not
synchronize against IRQ handlers already executing on other CPUs. The
IRQ synchronization only happens in mana_hwc_destroy_cq() via
mana_gd_destroy_eq() -> mana_gd_deregister_irq(). Since this runs
after kfree(hwc->caller_ctx), a concurrent mana_hwc_rx_event_handler()
can dereference freed caller_ctx (and rxq->msg_buf) in
mana_hwc_handle_resp().

Fix this by reordering teardown to reverse-of-creation order: destroy
the TX/RX work queues and CQ/EQ before freeing hwc->caller_ctx. This
ensures all in-flight interrupt handlers complete before the memory they
access is freed.

Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
---
Changes in v2:
  - Added maintainers missed in v1.
---
 drivers/net/ethernet/microsoft/mana/hw_channel.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
index 91975bdb5686..dbbde0fa57e7 100644
--- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
+++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
@@ -814,9 +814,6 @@ void mana_hwc_destroy_channel(struct gdma_context *gc)
 		gc->max_num_cqs = 0;
 	}
 
-	kfree(hwc->caller_ctx);
-	hwc->caller_ctx = NULL;
-
 	if (hwc->txq)
 		mana_hwc_destroy_wq(hwc, hwc->txq);
 
@@ -826,6 +823,9 @@ void mana_hwc_destroy_channel(struct gdma_context *gc)
 	if (hwc->cq)
 		mana_hwc_destroy_cq(hwc->gdma_dev->gdma_context, hwc->cq);
 
+	kfree(hwc->caller_ctx);
+	hwc->caller_ctx = NULL;
+
 	mana_gd_free_res_map(&hwc->inflight_msg_res);
 
 	hwc->num_inflight_msg = 0;
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH net-next] net: mana: Expose page_pool stats via ethtool
From: Dipayaan Roy @ 2026-03-11 19:47 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	pabeni, leon, longli, kotaranov, horms, shradhagupta, ssengar,
	ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, dipayanroy
In-Reply-To: <20260227092722.50a7e45f@kernel.org>

On Fri, Feb 27, 2026 at 09:27:22AM -0800, Jakub Kicinski wrote:
> On Fri, 27 Feb 2026 01:39:18 -0800 Dipayaan Roy wrote:
> > MANA relies on page_pool for RX buffers, and the buffer refill paths
> > can behave quite differently across architectures and configurations (e.g.
> > base page size, fragment vs full-page usage). This makes it harder to
> > understand and compare RX buffer behavior when investigating performance
> > and memory differences across platforms.
> 
> Standard stats must not be duplicated in ethtool -S.
> ynl and ynltool provide easy access to these stats
> 
> # ynltool page-pool stats 
>     eth0[2]	page pools: 44 (zombies: 0)
> 		refs: 495680 bytes: 2030305280 (refs: 0 bytes: 0)
> 		recycling: 100.0% (alloc: 7745:2097593009 recycle: 379301630:1717888312)

Thanks Jakub for the feedback. Understood: the generic page pool stats
should not be duplicated in ethtool -S. I will drop this patch and use
ynltool page-pool stats instead.


Regards



^ permalink raw reply

* Re: [PATCH 49/61] media: Prefer IS_ERR_OR_NULL over manual NULL check
From: Kieran Bingham @ 2026-03-11 23:03 UTC (permalink / raw)
  To: Philipp Hahn, amd-gfx, apparmor, bpf, ceph-devel, cocci, dm-devel,
	dri-devel, gfs2, intel-gfx, intel-wired-lan, iommu, kvm,
	linux-arm-kernel, linux-block, linux-bluetooth, linux-btrfs,
	linux-cifs, linux-clk, linux-erofs, linux-ext4, linux-fsdevel,
	linux-gpio, linux-hyperv, linux-input, linux-kernel, linux-leds,
	linux-media, linux-mips, linux-mm, linux-modules, linux-mtd,
	linux-nfs, linux-omap, linux-phy, lin 
  Cc: Shuah Khan, Mauro Carvalho Chehab
In-Reply-To: <20260310-b4-is_err_or_null-v1-49-bd63b656022d@avm.de>

Quoting Philipp Hahn (2026-03-10 11:49:15)
> Prefer using IS_ERR_OR_NULL() over using IS_ERR() and a manual NULL
> check.
> 
> Change generated with coccinelle.
> 
> To: Shuah Khan <skhan@linuxfoundation.org>
> To: Kieran Bingham <kieran.bingham@ideasonboard.com>
> To: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: linux-media@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Philipp Hahn <phahn-oss@avm.de>
> ---
>  drivers/media/test-drivers/vimc/vimc-streamer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/media/test-drivers/vimc/vimc-streamer.c b/drivers/media/test-drivers/vimc/vimc-streamer.c
> index 15d863f97cbf96b7ca7fbf3d7b6b6ec39fcc8ae3..da5aca50bcb4990c06f28e5a883eb398606991e9 100644
> --- a/drivers/media/test-drivers/vimc/vimc-streamer.c
> +++ b/drivers/media/test-drivers/vimc/vimc-streamer.c
> @@ -167,7 +167,7 @@ static int vimc_streamer_thread(void *data)
>                 for (i = stream->pipe_size - 1; i >= 0; i--) {
>                         frame = stream->ved_pipeline[i]->process_frame(
>                                         stream->ved_pipeline[i], frame);
> -                       if (!frame || IS_ERR(frame))
> +                       if (IS_ERR_OR_NULL(frame))

Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>

>                                 break;
>                 }
>                 //wait for 60hz
> 
> -- 
> 2.43.0
>

^ permalink raw reply

* [PATCH 02/16] RDMA: Consolidate patterns with offsetof() to ib_copy_validate_udata_in()
From: Jason Gunthorpe @ 2026-03-12  0:24 UTC (permalink / raw)
  To: Abhijit Gangurde, Allen Hubbe,
	Broadcom internal kernel review list, Bernard Metzler, Bryan Tan,
	Cheng Xu, Gal Pressman, Junxian Huang, Kai Shen,
	Konstantin Taranov, Krzysztof Czurylo, Leon Romanovsky,
	linux-hyperv, linux-rdma, Long Li, Michal Kalderon,
	Michael Margolin, Nelson Escobar, Satish Kharat, Selvin Xavier,
	Yossi Leybovich, Chengchang Tang, Tatyana Nikolova, Vishnu Dasa,
	Yishai Hadas, Zhu Yanjun
  Cc: patches
In-Reply-To: <0-v1-2b86f54cda42+7d-rdma_udata_req_jgg@nvidia.com>

Similar to the prior patch, these patterns are open coding an
offsetofend(). Taking offsetof() of a field makes the field before it
the last required field in the struct.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/hw/mana/cq.c |  9 ++-------
 drivers/infiniband/hw/mlx5/cq.c | 10 +++-------
 2 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/hw/mana/cq.c b/drivers/infiniband/hw/mana/cq.c
index b2749f971cd0af..3f932ef6e5fff6 100644
--- a/drivers/infiniband/hw/mana/cq.c
+++ b/drivers/infiniband/hw/mana/cq.c
@@ -27,14 +27,9 @@ int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	is_rnic_cq = mana_ib_is_rnic(mdev);
 
 	if (udata) {
-		if (udata->inlen < offsetof(struct mana_ib_create_cq, flags))
-			return -EINVAL;
-
-		err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen));
-		if (err) {
-			ibdev_dbg(ibdev, "Failed to copy from udata for create cq, %d\n", err);
+		err = ib_copy_validate_udata_in(udata, ucmd, buf_addr);
+		if (err)
 			return err;
-		}
 
 		if ((!is_rnic_cq && attr->cqe > mdev->adapter_caps.max_qp_wr) ||
 		    attr->cqe > U32_MAX / COMP_ENTRY_SIZE) {
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 43a7b5ca49dcc9..643b3b7d387834 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -723,7 +723,6 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
 	struct mlx5_ib_create_cq ucmd = {};
 	unsigned long page_size;
 	unsigned int page_offset_quantized;
-	size_t ucmdlen;
 	__be64 *pas;
 	int ncont;
 	void *cqc;
@@ -731,12 +730,9 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
 	struct mlx5_ib_ucontext *context = rdma_udata_to_drv_context(
 		udata, struct mlx5_ib_ucontext, ibucontext);
 
-	ucmdlen = min(udata->inlen, sizeof(ucmd));
-	if (ucmdlen < offsetof(struct mlx5_ib_create_cq, flags))
-		return -EINVAL;
-
-	if (ib_copy_from_udata(&ucmd, udata, ucmdlen))
-		return -EFAULT;
+	err = ib_copy_validate_udata_in(udata, ucmd, cqe_comp_res_format);
+	if (err)
+		return err;
 
 	if ((ucmd.flags & ~(MLX5_IB_CREATE_CQ_FLAGS_CQE_128B_PAD |
 			    MLX5_IB_CREATE_CQ_FLAGS_UAR_PAGE_INDEX |
-- 
2.43.0


^ permalink raw reply related
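
Editorial sketch, not part of the series: the relationship between offsetof(), sizeof() and offsetofend() that this patch and its sizeof() sibling rely on, in user-space C. struct demo_create_cq and demo_inlen_ok() are hypothetical; the layout only mimics the field names in the diff. That the third argument of ib_copy_validate_udata_in() names the last required member is an inference from the diff, not a documented fact.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* End offset of MEMBER within TYPE, in the spirit of the kernel macro. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical request layout with no padding between fields. */
struct demo_create_cq {
	uint64_t buf_addr;
	uint16_t flags;
	uint16_t reserved0;
	uint32_t reserved1;
};

/* Minimum input length: everything up to and including buf_addr. */
static int demo_inlen_ok(size_t inlen)
{
	return inlen >= offsetofend(struct demo_create_cq, buf_addr);
}

static void check_offsets(void)
{
	/* offsetof(next field) equals offsetofend(prior field) here. */
	assert(offsetof(struct demo_create_cq, flags) ==
	       offsetofend(struct demo_create_cq, buf_addr));
	/* sizeof(struct) equals offsetofend(last member): no tail padding. */
	assert(sizeof(struct demo_create_cq) ==
	       offsetofend(struct demo_create_cq, reserved1));
}
```

Both equalities hold only for layouts without padding at the relevant boundary, which is one reason naming the last required member is less fragile than open-coding either form.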

* [PATCH 12/16] RDMA/mlx5: Pull comp_mask validation into ib_copy_validate_udata_in_cm()
From: Jason Gunthorpe @ 2026-03-12  0:24 UTC (permalink / raw)
  To: Abhijit Gangurde, Allen Hubbe,
	Broadcom internal kernel review list, Bernard Metzler, Bryan Tan,
	Cheng Xu, Gal Pressman, Junxian Huang, Kai Shen,
	Konstantin Taranov, Krzysztof Czurylo, Leon Romanovsky,
	linux-hyperv, linux-rdma, Long Li, Michal Kalderon,
	Michael Margolin, Nelson Escobar, Satish Kharat, Selvin Xavier,
	Yossi Leybovich, Chengchang Tang, Tatyana Nikolova, Vishnu Dasa,
	Yishai Hadas, Zhu Yanjun
  Cc: patches
In-Reply-To: <0-v1-2b86f54cda42+7d-rdma_udata_req_jgg@nvidia.com>

Directly check the supported comp_mask bitmap using
ib_copy_validate_udata_in_cm() and remove the open coding.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/hw/mlx5/qp.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 68c6e107747693..3b602ed0a2dafc 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -4707,12 +4707,12 @@ int mlx5_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 		return -ENOSYS;
 
 	if (udata && udata->inlen) {
-		err = ib_copy_validate_udata_in(udata, ucmd, ece_options);
+		err = ib_copy_validate_udata_in_cm(udata, ucmd, ece_options,
+						   MLX5_IB_MODIFY_QP_OOO_DP);
 		if (err)
 			return err;
 
-		if (ucmd.comp_mask & ~MLX5_IB_MODIFY_QP_OOO_DP ||
-		    memchr_inv(&ucmd.burst_info.reserved, 0,
+		if (memchr_inv(&ucmd.burst_info.reserved, 0,
 			       sizeof(ucmd.burst_info.reserved)))
 			return -EOPNOTSUPP;
 
@@ -5381,17 +5381,16 @@ static int prepare_user_rq(struct ib_pd *pd,
 	struct mlx5_ib_dev *dev = to_mdev(pd->device);
 	struct mlx5_ib_create_wq ucmd = {};
 	int err;
-	err = ib_copy_validate_udata_in(udata, ucmd,
-					single_stride_log_num_of_bytes);
+
+	err = ib_copy_validate_udata_in_cm(udata, ucmd,
+					   single_stride_log_num_of_bytes,
+					   MLX5_IB_CREATE_WQ_STRIDING_RQ);
 	if (err) {
 		mlx5_ib_dbg(dev, "copy failed\n");
 		return err;
 	}
 
-	if (ucmd.comp_mask & (~MLX5_IB_CREATE_WQ_STRIDING_RQ)) {
-		mlx5_ib_dbg(dev, "invalid comp mask\n");
-		return -EOPNOTSUPP;
-	} else if (ucmd.comp_mask & MLX5_IB_CREATE_WQ_STRIDING_RQ) {
+	if (ucmd.comp_mask & MLX5_IB_CREATE_WQ_STRIDING_RQ) {
 		if (!MLX5_CAP_GEN(dev->mdev, striding_rq)) {
 			mlx5_ib_dbg(dev, "Striding RQ is not supported\n");
 			return -EOPNOTSUPP;
-- 
2.43.0



* [PATCH 03/16] RDMA: Consolidate patterns with sizeof() to ib_copy_validate_udata_in()
From: Jason Gunthorpe @ 2026-03-12  0:24 UTC (permalink / raw)
  To: Abhijit Gangurde, Allen Hubbe,
	Broadcom internal kernel review list, Bernard Metzler, Bryan Tan,
	Cheng Xu, Gal Pressman, Junxian Huang, Kai Shen,
	Konstantin Taranov, Krzysztof Czurylo, Leon Romanovsky,
	linux-hyperv, linux-rdma, Long Li, Michal Kalderon,
	Michael Margolin, Nelson Escobar, Satish Kharat, Selvin Xavier,
	Yossi Leybovich, Chengchang Tang, Tatyana Nikolova, Vishnu Dasa,
	Yishai Hadas, Zhu Yanjun
  Cc: patches
In-Reply-To: <0-v1-2b86f54cda42+7d-rdma_udata_req_jgg@nvidia.com>

Similar to the prior patch, these patterns are open coding an
offsetofend() using sizeof(), which targets the last member of the
current struct.
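
As a rough userspace illustration of why sizeof() and offsetofend() of the
last member are interchangeable here, and why naming the member is the more
durable choice, consider the sketch below. The struct layouts and the
demo_* names are hypothetical and are not taken from any driver;
demo_offsetofend() is a local stand-in for the kernel's offsetofend().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the kernel's offsetofend() macro. */
#define demo_offsetofend(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Hypothetical original layout: the struct ends at `reserved`. */
struct demo_cmd_v1 {
	uint64_t buf_addr;
	uint32_t queue_size;
	uint32_t reserved;
};

/* Hypothetical later layout: a field was appended after `reserved`. */
struct demo_cmd_v2 {
	uint64_t buf_addr;
	uint32_t queue_size;
	uint32_t reserved;
	uint64_t extra;
};

/* With no tail padding, sizeof() equals offsetofend() of the last member. */
static_assert(sizeof(struct demo_cmd_v1) ==
	      demo_offsetofend(struct demo_cmd_v1, reserved),
	      "v1 ends exactly at reserved");

/* Minimum input length expressed through the member name. */
static size_t demo_min_len_v1(void)
{
	return demo_offsetofend(struct demo_cmd_v1, reserved);
}

/* The same member still yields the same minimum after the struct grows. */
static size_t demo_min_len_v2(void)
{
	return demo_offsetofend(struct demo_cmd_v2, reserved);
}
```

In this sketch a sizeof() based check would silently raise the minimum
length once `extra` is appended, while the member based form keeps
accepting the older, shorter input.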

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/hw/mana/qp.c       | 27 +++++++++------------------
 drivers/infiniband/hw/mana/wq.c       | 10 ++--------
 drivers/infiniband/hw/mlx4/main.c     |  6 ++----
 drivers/infiniband/hw/mlx5/cq.c       |  2 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c | 13 ++-----------
 drivers/infiniband/sw/siw/siw_verbs.c |  6 +-----
 6 files changed, 17 insertions(+), 47 deletions(-)

diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
index 82f84f7ad37a90..69c8d4f7a1f46b 100644
--- a/drivers/infiniband/hw/mana/qp.c
+++ b/drivers/infiniband/hw/mana/qp.c
@@ -111,16 +111,12 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
 	u32 port;
 	int ret;
 
-	if (!udata || udata->inlen < sizeof(ucmd))
+	if (!udata)
 		return -EINVAL;
 
-	ret = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen));
-	if (ret) {
-		ibdev_dbg(&mdev->ib_dev,
-			  "Failed copy from udata for create rss-qp, err %d\n",
-			  ret);
+	ret = ib_copy_validate_udata_in(udata, ucmd, port);
+	if (ret)
 		return ret;
-	}
 
 	if (attr->cap.max_recv_wr > mdev->adapter_caps.max_qp_wr) {
 		ibdev_dbg(&mdev->ib_dev,
@@ -282,15 +278,12 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
 	u32 port;
 	int err;
 
-	if (!mana_ucontext || udata->inlen < sizeof(ucmd))
+	if (!mana_ucontext)
 		return -EINVAL;
 
-	err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen));
-	if (err) {
-		ibdev_dbg(&mdev->ib_dev,
-			  "Failed to copy from udata create qp-raw, %d\n", err);
+	err = ib_copy_validate_udata_in(udata, ucmd, port);
+	if (err)
 		return err;
-	}
 
 	if (attr->cap.max_send_wr > mdev->adapter_caps.max_qp_wr) {
 		ibdev_dbg(&mdev->ib_dev,
@@ -535,17 +528,15 @@ static int mana_ib_create_rc_qp(struct ib_qp *ibqp, struct ib_pd *ibpd,
 	u64 flags = 0;
 	u32 doorbell;
 
-	if (!udata || udata->inlen < sizeof(ucmd))
+	if (!udata)
 		return -EINVAL;
 
 	mana_ucontext = rdma_udata_to_drv_context(udata, struct mana_ib_ucontext, ibucontext);
 	doorbell = mana_ucontext->doorbell;
 	flags = MANA_RC_FLAG_NO_FMR;
-	err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen));
-	if (err) {
-		ibdev_dbg(&mdev->ib_dev, "Failed to copy from udata, %d\n", err);
+	err = ib_copy_validate_udata_in(udata, ucmd, queue_size);
+	if (err)
 		return err;
-	}
 
 	for (i = 0, j = 0; i < MANA_RC_QUEUE_TYPE_MAX; ++i) {
 		/* skip FMR for user-level RC QPs */
diff --git a/drivers/infiniband/hw/mana/wq.c b/drivers/infiniband/hw/mana/wq.c
index 6206244f762e42..aceeea7f17b339 100644
--- a/drivers/infiniband/hw/mana/wq.c
+++ b/drivers/infiniband/hw/mana/wq.c
@@ -15,15 +15,9 @@ struct ib_wq *mana_ib_create_wq(struct ib_pd *pd,
 	struct mana_ib_wq *wq;
 	int err;
 
-	if (udata->inlen < sizeof(ucmd))
-		return ERR_PTR(-EINVAL);
-
-	err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen));
-	if (err) {
-		ibdev_dbg(&mdev->ib_dev,
-			  "Failed to copy from udata for create wq, %d\n", err);
+	err = ib_copy_validate_udata_in(udata, ucmd, reserved);
+	if (err)
 		return ERR_PTR(err);
-	}
 
 	wq = kzalloc_obj(*wq);
 	if (!wq)
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 73e17b4339eb60..16e4cffbd7a84d 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -50,6 +50,7 @@
 #include <rdma/ib_user_verbs.h>
 #include <rdma/ib_addr.h>
 #include <rdma/ib_cache.h>
+#include <rdma/uverbs_ioctl.h>
 
 #include <net/bonding.h>
 
@@ -445,10 +446,7 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 	struct mlx4_clock_params clock_params;
 
 	if (uhw->inlen) {
-		if (uhw->inlen < sizeof(cmd))
-			return -EINVAL;
-
-		err = ib_copy_from_udata(&cmd, uhw, sizeof(cmd));
+		err = ib_copy_validate_udata_in(uhw, cmd, reserved);
 		if (err)
 			return err;
 
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 643b3b7d387834..f5e75e51c6763f 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -1229,7 +1229,7 @@ static int resize_user(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq,
 	struct ib_umem *umem;
 	int err;
 
-	err = ib_copy_from_udata(&ucmd, udata, sizeof(ucmd));
+	err = ib_copy_validate_udata_in(udata, ucmd, reserved1);
 	if (err)
 		return err;
 
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index fe41362c51444c..c9fd40bfa09eb2 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -452,18 +452,9 @@ static int rxe_modify_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr,
 	int err;
 
 	if (udata) {
-		if (udata->inlen < sizeof(cmd)) {
-			err = -EINVAL;
-			rxe_dbg_srq(srq, "malformed udata\n");
+		err = ib_copy_validate_udata_in(udata, cmd, mmap_info_addr);
+		if (err)
 			goto err_out;
-		}
-
-		err = ib_copy_from_udata(&cmd, udata, sizeof(cmd));
-		if (err) {
-			err = -EFAULT;
-			rxe_dbg_srq(srq, "unable to read udata\n");
-			goto err_out;
-		}
 	}
 
 	err = rxe_srq_chk_attr(rxe, srq, attr, mask);
diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c
index ef504db8f2b48b..1e1d262a4ae2db 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.c
+++ b/drivers/infiniband/sw/siw/siw_verbs.c
@@ -1373,11 +1373,7 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len,
 		struct siw_uresp_reg_mr uresp = {};
 		struct siw_mem *mem = mr->mem;
 
-		if (udata->inlen < sizeof(ureq)) {
-			rv = -EINVAL;
-			goto err_out;
-		}
-		rv = ib_copy_from_udata(&ureq, udata, sizeof(ureq));
+		rv = ib_copy_validate_udata_in(udata, ureq, pad);
 		if (rv)
 			goto err_out;
 
-- 
2.43.0



* [PATCH 00/16] Update drivers to use ib_copy_validate_udata_in()
From: Jason Gunthorpe @ 2026-03-12  0:24 UTC (permalink / raw)
  To: Abhijit Gangurde, Allen Hubbe,
	Broadcom internal kernel review list, Bernard Metzler, Bryan Tan,
	Cheng Xu, Gal Pressman, Junxian Huang, Kai Shen,
	Konstantin Taranov, Krzysztof Czurylo, Leon Romanovsky,
	linux-hyperv, linux-rdma, Long Li, Michal Kalderon,
	Michael Margolin, Nelson Escobar, Satish Kharat, Selvin Xavier,
	Yossi Leybovich, Chengchang Tang, Tatyana Nikolova, Vishnu Dasa,
	Yishai Hadas, Zhu Yanjun
  Cc: patches

Progress the uAPI work by shifting nearly all drivers to use
ib_copy_validate_udata_in() and its variations.

These helpers are easier to use and enforce a tighter uAPI protocol
for the udata.
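
For readers new to the pattern, here is a minimal userspace model of the
checks that the open-coded driver paths in this series performed by hand:
a minimum length, a bounded copy, and a zero check on any bytes past the
known struct. It is only a sketch under assumptions. demo_req,
demo_offsetofend() and demo_copy_validate() are hypothetical names, memcpy()
stands in for the copy from userspace, and the error codes mirror the
open-coded mlx4 create_wq checks removed in this series rather than the
helper's actual implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request layout, not taken from any driver. */
struct demo_req {
	uint64_t buf_addr;
	uint32_t comp_mask;
	uint32_t reserved;
};

/* Local stand-in for the kernel's offsetofend() macro. */
#define demo_offsetofend(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

static_assert(demo_offsetofend(struct demo_req, reserved) ==
	      sizeof(struct demo_req), "demo_req has no tail padding");

/*
 * Userspace model: `in` and `inlen` stand in for the user buffer.
 * `min_len` is the end offset of the last member userspace must supply.
 */
static int demo_copy_validate(struct demo_req *req, const void *in,
			      size_t inlen, size_t min_len)
{
	const unsigned char *p = in;
	size_t i;

	/* Input shorter than the required prefix is malformed. */
	if (inlen < min_len)
		return -EINVAL;

	/* Copy at most sizeof(*req); members not supplied stay zero. */
	memset(req, 0, sizeof(*req));
	memcpy(req, in, inlen < sizeof(*req) ? inlen : sizeof(*req));

	/* Bytes beyond the known struct must be zero. */
	for (i = sizeof(*req); i < inlen; i++)
		if (p[i])
			return -EOPNOTSUPP;

	return 0;
}
```

Doing all three steps in one place means a caller cannot copy the struct
and then forget the length or trailing-bytes check.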

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Jason Gunthorpe (16):
  RDMA: Consolidate patterns with offsetofend() to
    ib_copy_validate_udata_in()
  RDMA: Consolidate patterns with offsetof() to
    ib_copy_validate_udata_in()
  RDMA: Consolidate patterns with sizeof() to
    ib_copy_validate_udata_in()
  RDMA: Use ib_copy_validate_udata_in() for implicit full structs
  RDMA/pvrdma: Use ib_copy_validate_udata_in() for srq
  RDMA/mlx5: Use ib_copy_validate_udata_in()
  RDMA/mlx4: Use ib_copy_validate_udata_in()
  RDMA/mlx4: Use ib_copy_validate_udata_in() for QP
  RDMA/hns: Use ib_copy_validate_udata_in()
  RDMA/efa: Use ib_copy_validate_udata_in_cm()
  RDMA: Use ib_copy_validate_udata_in_cm() for zero comp_mask
  RDMA/mlx5: Pull comp_mask validation into
    ib_copy_validate_udata_in_cm()
  RDMA/hns: Add missing comp_mask check in create_qp
  RDMA/irdma: Add missing comp_mask check in alloc_ucontext
  RDMA: Remove redundant = {} for udata req structs
  RDMA/hns: Remove the duplicate calls to ib_copy_validate_udata_in()

 drivers/infiniband/hw/efa/efa_verbs.c         | 69 ++++------------
 drivers/infiniband/hw/erdma/erdma_verbs.c     |  6 +-
 drivers/infiniband/hw/hns/hns_roce_cq.c       | 16 +---
 drivers/infiniband/hw/hns/hns_roce_main.c     |  6 +-
 drivers/infiniband/hw/hns/hns_roce_qp.c       | 10 +--
 drivers/infiniband/hw/hns/hns_roce_srq.c      | 54 ++++--------
 .../infiniband/hw/ionic/ionic_controlpath.c   |  6 +-
 drivers/infiniband/hw/irdma/verbs.c           | 12 +--
 drivers/infiniband/hw/mana/cq.c               | 11 +--
 drivers/infiniband/hw/mana/qp.c               | 29 +++----
 drivers/infiniband/hw/mana/wq.c               | 12 +--
 drivers/infiniband/hw/mlx4/cq.c               | 10 +--
 drivers/infiniband/hw/mlx4/main.c             |  9 +-
 drivers/infiniband/hw/mlx4/qp.c               | 82 ++++---------------
 drivers/infiniband/hw/mlx4/srq.c              |  5 +-
 drivers/infiniband/hw/mlx5/cq.c               | 14 ++--
 drivers/infiniband/hw/mlx5/main.c             |  2 +-
 drivers/infiniband/hw/mlx5/mr.c               | 11 +--
 drivers/infiniband/hw/mlx5/qp.c               | 66 ++++-----------
 drivers/infiniband/hw/mlx5/srq.c              | 17 +---
 drivers/infiniband/hw/mthca/mthca_provider.c  | 27 +++---
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c   | 14 ++--
 drivers/infiniband/hw/qedr/verbs.c            | 42 ++++------
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c  |  2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c  |  5 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c  |  6 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c |  6 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c         | 13 +--
 drivers/infiniband/sw/siw/siw_verbs.c         |  6 +-
 29 files changed, 176 insertions(+), 392 deletions(-)


base-commit: eb15cffa15201bd53d1ac296645aa2bc5f726841
-- 
2.43.0



* [PATCH 08/16] RDMA/mlx4: Use ib_copy_validate_udata_in() for QP
From: Jason Gunthorpe @ 2026-03-12  0:24 UTC (permalink / raw)
  To: Abhijit Gangurde, Allen Hubbe,
	Broadcom internal kernel review list, Bernard Metzler, Bryan Tan,
	Cheng Xu, Gal Pressman, Junxian Huang, Kai Shen,
	Konstantin Taranov, Krzysztof Czurylo, Leon Romanovsky,
	linux-hyperv, linux-rdma, Long Li, Michal Kalderon,
	Michael Margolin, Nelson Escobar, Satish Kharat, Selvin Xavier,
	Yossi Leybovich, Chengchang Tang, Tatyana Nikolova, Vishnu Dasa,
	Yishai Hadas, Zhu Yanjun
  Cc: patches
In-Reply-To: <0-v1-2b86f54cda42+7d-rdma_udata_req_jgg@nvidia.com>

Move the validation of the udata to the same function that copies it.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/hw/mlx4/qp.c | 25 +++----------------------
 1 file changed, 3 insertions(+), 22 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index deb1b0306aa7a1..40ddd723d7b549 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -854,7 +854,6 @@ static int create_rq(struct ib_pd *pd, struct ib_qp_init_attr *init_attr,
 	unsigned long flags;
 	int range_size;
 	struct mlx4_ib_create_wq wq;
-	size_t copy_len;
 	int shift;
 	int n;
 
@@ -867,12 +866,9 @@ static int create_rq(struct ib_pd *pd, struct ib_qp_init_attr *init_attr,
 
 	qp->state = IB_QPS_RESET;
 
-	copy_len = min(sizeof(struct mlx4_ib_create_wq), udata->inlen);
-
-	if (ib_copy_from_udata(&wq, udata, copy_len)) {
-		err = -EFAULT;
+	err = ib_copy_validate_udata_in(udata, wq, comp_mask);
+	if (err)
 		goto err;
-	}
 
 	if (wq.comp_mask || wq.reserved[0] || wq.reserved[1] ||
 	    wq.reserved[2]) {
@@ -4112,26 +4108,11 @@ struct ib_wq *mlx4_ib_create_wq(struct ib_pd *pd,
 	struct mlx4_dev *dev = to_mdev(pd->device)->dev;
 	struct ib_qp_init_attr ib_qp_init_attr = {};
 	struct mlx4_ib_qp *qp;
-	struct mlx4_ib_create_wq ucmd;
-	int err, required_cmd_sz;
+	int err;
 
 	if (!udata)
 		return ERR_PTR(-EINVAL);
 
-	required_cmd_sz = offsetof(typeof(ucmd), comp_mask) +
-			  sizeof(ucmd.comp_mask);
-	if (udata->inlen < required_cmd_sz) {
-		pr_debug("invalid inlen\n");
-		return ERR_PTR(-EINVAL);
-	}
-
-	if (udata->inlen > sizeof(ucmd) &&
-	    !ib_is_udata_cleared(udata, sizeof(ucmd),
-				 udata->inlen - sizeof(ucmd))) {
-		pr_debug("inlen is not supported\n");
-		return ERR_PTR(-EOPNOTSUPP);
-	}
-
 	if (udata->outlen)
 		return ERR_PTR(-EOPNOTSUPP);
 
-- 
2.43.0



* [PATCH 11/16] RDMA: Use ib_copy_validate_udata_in_cm() for zero comp_mask
From: Jason Gunthorpe @ 2026-03-12  0:24 UTC (permalink / raw)
  To: Abhijit Gangurde, Allen Hubbe,
	Broadcom internal kernel review list, Bernard Metzler, Bryan Tan,
	Cheng Xu, Gal Pressman, Junxian Huang, Kai Shen,
	Konstantin Taranov, Krzysztof Czurylo, Leon Romanovsky,
	linux-hyperv, linux-rdma, Long Li, Michal Kalderon,
	Michael Margolin, Nelson Escobar, Satish Kharat, Selvin Xavier,
	Yossi Leybovich, Chengchang Tang, Tatyana Nikolova, Vishnu Dasa,
	Yishai Hadas, Zhu Yanjun
  Cc: patches
In-Reply-To: <0-v1-2b86f54cda42+7d-rdma_udata_req_jgg@nvidia.com>

All of these cases require a 0 comp_mask. Consolidate these into
using ib_copy_validate_udata_in_cm() and remove the open coded
comp_mask test.
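
The comp_mask rule being consolidated can be sketched in isolation as
below. This is an illustration only: the DEMO_FEATURE_* bits and
demo_check_comp_mask() are hypothetical, and the -EOPNOTSUPP return value
is an assumption for the sketch (the open-coded checks in the drivers used
both -EINVAL and -EOPNOTSUPP).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define DEMO_FEATURE_A (1u << 0)
#define DEMO_FEATURE_B (1u << 1)

static_assert((DEMO_FEATURE_A & DEMO_FEATURE_B) == 0,
	      "feature bits must be distinct");

/*
 * Reject any comp_mask bit that is not in `supported`.
 * Passing supported == 0 therefore requires comp_mask to be zero.
 */
static int demo_check_comp_mask(uint32_t comp_mask, uint32_t supported)
{
	if (comp_mask & ~supported)
		return -EOPNOTSUPP;	/* assumed error code */
	return 0;
}
```

A zero `supported` argument covers the cases in this patch, and a non-zero
bitmap covers drivers that already understand some of the bits.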

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/hw/efa/efa_verbs.c |  8 ++++----
 drivers/infiniband/hw/mlx4/main.c     |  5 +----
 drivers/infiniband/hw/mlx4/qp.c       | 13 ++++++-------
 drivers/infiniband/hw/mlx5/mr.c       |  4 ++--
 drivers/infiniband/hw/mlx5/qp.c       |  4 ++--
 5 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index 22993273028433..b491bcd886ccb0 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -699,11 +699,11 @@ int efa_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init_attr,
 	if (err)
 		goto err_out;
 
-	err = ib_copy_validate_udata_in(udata, cmd, driver_qp_type);
+	err = ib_copy_validate_udata_in_cm(udata, cmd, driver_qp_type, 0);
 	if (err)
 		goto err_out;
 
-	if (cmd.comp_mask || !is_reserved_cleared(cmd.reserved_98)) {
+	if (!is_reserved_cleared(cmd.reserved_98)) {
 		ibdev_dbg(&dev->ibdev,
 			  "Incompatible ABI params, unknown fields in udata\n");
 		err = -EINVAL;
@@ -1140,11 +1140,11 @@ int efa_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		goto err_out;
 	}
 
-	err = ib_copy_validate_udata_in(udata, cmd, num_sub_cqs);
+	err = ib_copy_validate_udata_in_cm(udata, cmd, num_sub_cqs, 0);
 	if (err)
 		goto err_out;
 
-	if (cmd.comp_mask || !is_reserved_cleared(cmd.reserved_58)) {
+	if (!is_reserved_cleared(cmd.reserved_58)) {
 		ibdev_dbg(ibdev,
 			  "Incompatible ABI params, unknown fields in udata\n");
 		err = -EINVAL;
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 16e4cffbd7a84d..037f02b5f28fb5 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -446,13 +446,10 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 	struct mlx4_clock_params clock_params;
 
 	if (uhw->inlen) {
-		err = ib_copy_validate_udata_in(uhw, cmd, reserved);
+		err = ib_copy_validate_udata_in_cm(uhw, cmd, reserved, 0);
 		if (err)
 			return err;
 
-		if (cmd.comp_mask)
-			return -EINVAL;
-
 		if (cmd.reserved)
 			return -EINVAL;
 	}
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 40ddd723d7b549..cfb54ffcaac22c 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -720,7 +720,7 @@ static int _mlx4_ib_create_qp_rss(struct ib_pd *pd, struct mlx4_ib_qp *qp,
 	if (udata->outlen)
 		return -EOPNOTSUPP;
 
-	err = ib_copy_validate_udata_in(udata, ucmd, reserved1);
+	err = ib_copy_validate_udata_in_cm(udata, ucmd, reserved1, 0);
 	if (err) {
 		pr_debug("copy failed\n");
 		return err;
@@ -729,7 +729,7 @@ static int _mlx4_ib_create_qp_rss(struct ib_pd *pd, struct mlx4_ib_qp *qp,
 	if (memchr_inv(ucmd.reserved, 0, sizeof(ucmd.reserved)))
 		return -EOPNOTSUPP;
 
-	if (ucmd.comp_mask || ucmd.reserved1)
+	if (ucmd.reserved1)
 		return -EOPNOTSUPP;
 
 	if (init_attr->qp_type != IB_QPT_RAW_PACKET) {
@@ -866,12 +866,11 @@ static int create_rq(struct ib_pd *pd, struct ib_qp_init_attr *init_attr,
 
 	qp->state = IB_QPS_RESET;
 
-	err = ib_copy_validate_udata_in(udata, wq, comp_mask);
+	err = ib_copy_validate_udata_in_cm(udata, wq, comp_mask, 0);
 	if (err)
 		goto err;
 
-	if (wq.comp_mask || wq.reserved[0] || wq.reserved[1] ||
-	    wq.reserved[2]) {
+	if (wq.reserved[0] || wq.reserved[1] || wq.reserved[2]) {
 		pr_debug("user command isn't supported\n");
 		err = -EOPNOTSUPP;
 		goto err;
@@ -4235,11 +4234,11 @@ int mlx4_ib_modify_wq(struct ib_wq *ibwq, struct ib_wq_attr *wq_attr,
 	enum ib_wq_state cur_state, new_state;
 	int err;
 
-	err = ib_copy_validate_udata_in(udata, ucmd, reserved);
+	err = ib_copy_validate_udata_in_cm(udata, ucmd, reserved, 0);
 	if (err)
 		return err;
 
-	if (ucmd.comp_mask || ucmd.reserved)
+	if (ucmd.reserved)
 		return -EOPNOTSUPP;
 
 	if (wq_attr_mask & IB_WQ_FLAGS)
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index fce519b87633ef..49dcc39836c047 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1774,11 +1774,11 @@ int mlx5_ib_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata)
 		__u32	response_length;
 	} resp = {};
 
-	err = ib_copy_validate_udata_in(udata, req, reserved2);
+	err = ib_copy_validate_udata_in_cm(udata, req, reserved2, 0);
 	if (err)
 		return err;
 
-	if (req.comp_mask || req.reserved1 || req.reserved2)
+	if (req.reserved1 || req.reserved2)
 		return -EOPNOTSUPP;
 
 	ndescs = req.num_klms ? roundup(req.num_klms, 4) : roundup(1, 4);
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index d4d5e0d457a0b5..68c6e107747693 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -5611,11 +5611,11 @@ int mlx5_ib_modify_wq(struct ib_wq *wq, struct ib_wq_attr *wq_attr,
 	void *rqc;
 	void *in;
 
-	err = ib_copy_validate_udata_in(udata, ucmd, reserved);
+	err = ib_copy_validate_udata_in_cm(udata, ucmd, reserved, 0);
 	if (err)
 		return err;
 
-	if (ucmd.comp_mask || ucmd.reserved)
+	if (ucmd.reserved)
 		return -EOPNOTSUPP;
 
 	inlen = MLX5_ST_SZ_BYTES(modify_rq_in);
-- 
2.43.0



* [PATCH 06/16] RDMA/mlx5: Use ib_copy_validate_udata_in()
From: Jason Gunthorpe @ 2026-03-12  0:24 UTC (permalink / raw)
  To: Abhijit Gangurde, Allen Hubbe,
	Broadcom internal kernel review list, Bernard Metzler, Bryan Tan,
	Cheng Xu, Gal Pressman, Junxian Huang, Kai Shen,
	Konstantin Taranov, Krzysztof Czurylo, Leon Romanovsky,
	linux-hyperv, linux-rdma, Long Li, Michal Kalderon,
	Michael Margolin, Nelson Escobar, Satish Kharat, Selvin Xavier,
	Yossi Leybovich, Chengchang Tang, Tatyana Nikolova, Vishnu Dasa,
	Yishai Hadas, Zhu Yanjun
  Cc: patches
In-Reply-To: <0-v1-2b86f54cda42+7d-rdma_udata_req_jgg@nvidia.com>

Fix up the remaining different patterns in mlx5 to use the helper.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/hw/mlx5/mr.c  |  7 +------
 drivers/infiniband/hw/mlx5/srq.c | 15 +++------------
 2 files changed, 4 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index cbe34251e340b9..fce519b87633ef 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1774,18 +1774,13 @@ int mlx5_ib_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata)
 		__u32	response_length;
 	} resp = {};
 
-	err = ib_copy_from_udata(&req, udata, min(udata->inlen, sizeof(req)));
+	err = ib_copy_validate_udata_in(udata, req, reserved2);
 	if (err)
 		return err;
 
 	if (req.comp_mask || req.reserved1 || req.reserved2)
 		return -EOPNOTSUPP;
 
-	if (udata->inlen > sizeof(req) &&
-	    !ib_is_udata_cleared(udata, sizeof(req),
-				 udata->inlen - sizeof(req)))
-		return -EOPNOTSUPP;
-
 	ndescs = req.num_klms ? roundup(req.num_klms, 4) : roundup(1, 4);
 
 	in = kzalloc(inlen, GFP_KERNEL);
diff --git a/drivers/infiniband/hw/mlx5/srq.c b/drivers/infiniband/hw/mlx5/srq.c
index 17e018554d81d5..6d89c0242cab61 100644
--- a/drivers/infiniband/hw/mlx5/srq.c
+++ b/drivers/infiniband/hw/mlx5/srq.c
@@ -48,25 +48,16 @@ static int create_srq_user(struct ib_pd *pd, struct mlx5_ib_srq *srq,
 	struct mlx5_ib_create_srq ucmd = {};
 	struct mlx5_ib_ucontext *ucontext = rdma_udata_to_drv_context(
 		udata, struct mlx5_ib_ucontext, ibucontext);
-	size_t ucmdlen;
 	int err;
 	u32 uidx = MLX5_IB_DEFAULT_UIDX;
 
-	ucmdlen = min(udata->inlen, sizeof(ucmd));
-
-	if (ib_copy_from_udata(&ucmd, udata, ucmdlen)) {
-		mlx5_ib_dbg(dev, "failed copy udata\n");
-		return -EFAULT;
-	}
+	err = ib_copy_validate_udata_in(udata, ucmd, flags);
+	if (err)
+		return err;
 
 	if (ucmd.reserved0 || ucmd.reserved1)
 		return -EINVAL;
 
-	if (udata->inlen > sizeof(ucmd) &&
-	    !ib_is_udata_cleared(udata, sizeof(ucmd),
-				 udata->inlen - sizeof(ucmd)))
-		return -EINVAL;
-
 	if (in->type != IB_SRQT_BASIC) {
 		err = get_srq_user_index(ucontext, &ucmd, udata->inlen, &uidx);
 		if (err)
-- 
2.43.0



* [PATCH 07/16] RDMA/mlx4: Use ib_copy_validate_udata_in()
From: Jason Gunthorpe @ 2026-03-12  0:24 UTC (permalink / raw)
  To: Abhijit Gangurde, Allen Hubbe,
	Broadcom internal kernel review list, Bernard Metzler, Bryan Tan,
	Cheng Xu, Gal Pressman, Junxian Huang, Kai Shen,
	Konstantin Taranov, Krzysztof Czurylo, Leon Romanovsky,
	linux-hyperv, linux-rdma, Long Li, Michal Kalderon,
	Michael Margolin, Nelson Escobar, Satish Kharat, Selvin Xavier,
	Yossi Leybovich, Chengchang Tang, Tatyana Nikolova, Vishnu Dasa,
	Yishai Hadas, Zhu Yanjun
  Cc: patches
In-Reply-To: <0-v1-2b86f54cda42+7d-rdma_udata_req_jgg@nvidia.com>

Follow the last member of each struct at the point
MLX4_IB_UVERBS_ABI_VERSION was set to 4.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/hw/mlx4/cq.c  | 10 +++++-----
 drivers/infiniband/hw/mlx4/qp.c  |  8 ++------
 drivers/infiniband/hw/mlx4/srq.c |  5 +++--
 3 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 8535fd561691d7..ed4c2e740670be 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -168,10 +168,9 @@ int mlx4_ib_create_user_cq(struct ib_cq *ibcq,
 	INIT_LIST_HEAD(&cq->send_qp_list);
 	INIT_LIST_HEAD(&cq->recv_qp_list);
 
-	if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd))) {
-		err = -EFAULT;
+	err = ib_copy_validate_udata_in(udata, ucmd, db_addr);
+	if (err)
 		goto err_cq;
-	}
 
 	buf_addr = (void *)(unsigned long)ucmd.buf_addr;
 
@@ -332,8 +331,9 @@ static int mlx4_alloc_resize_umem(struct mlx4_ib_dev *dev, struct mlx4_ib_cq *cq
 	if (cq->resize_umem)
 		return -EBUSY;
 
-	if (ib_copy_from_udata(&ucmd, udata, sizeof ucmd))
-		return -EFAULT;
+	err = ib_copy_validate_udata_in(udata, ucmd, buf_addr);
+	if (err)
+		return err;
 
 	cq->resize_buf = kmalloc_obj(*cq->resize_buf);
 	if (!cq->resize_buf)
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index b87a4b7949a3a0..deb1b0306aa7a1 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1053,16 +1053,12 @@ static int create_qp_common(struct ib_pd *pd, struct ib_qp_init_attr *init_attr,
 
 	if (udata) {
 		struct mlx4_ib_create_qp ucmd;
-		size_t copy_len;
 		int shift;
 		int n;
 
-		copy_len = sizeof(struct mlx4_ib_create_qp);
-
-		if (ib_copy_from_udata(&ucmd, udata, copy_len)) {
-			err = -EFAULT;
+		err = ib_copy_validate_udata_in(udata, ucmd, sq_no_prefetch);
+		if (err)
 			goto err;
-		}
 
 		qp->inl_recv_sz = ucmd.inl_recv_sz;
 
diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c
index c4cf91235eee3a..5b23e5f8b84aca 100644
--- a/drivers/infiniband/hw/mlx4/srq.c
+++ b/drivers/infiniband/hw/mlx4/srq.c
@@ -111,8 +111,9 @@ int mlx4_ib_create_srq(struct ib_srq *ib_srq,
 	if (udata) {
 		struct mlx4_ib_create_srq ucmd;
 
-		if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd)))
-			return -EFAULT;
+		err = ib_copy_validate_udata_in(udata, ucmd, db_addr);
+		if (err)
+			return err;
 
 		srq->umem =
 			ib_umem_get(ib_srq->device, ucmd.buf_addr, buf_size, 0);
-- 
2.43.0



* [PATCH 13/16] RDMA/hns: Add missing comp_mask check in create_qp
From: Jason Gunthorpe @ 2026-03-12  0:24 UTC (permalink / raw)
  To: Abhijit Gangurde, Allen Hubbe,
	Broadcom internal kernel review list, Bernard Metzler, Bryan Tan,
	Cheng Xu, Gal Pressman, Junxian Huang, Kai Shen,
	Konstantin Taranov, Krzysztof Czurylo, Leon Romanovsky,
	linux-hyperv, linux-rdma, Long Li, Michal Kalderon,
	Michael Margolin, Nelson Escobar, Satish Kharat, Selvin Xavier,
	Yossi Leybovich, Chengchang Tang, Tatyana Nikolova, Vishnu Dasa,
	Yishai Hadas, Zhu Yanjun
  Cc: patches
In-Reply-To: <0-v1-2b86f54cda42+7d-rdma_udata_req_jgg@nvidia.com>

hns has a comp_mask field that was never checked for validity. Check
it against the supported bits.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/hw/hns/hns_roce_qp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c
index 3d6eb22cbcd940..a27ea85bb06323 100644
--- a/drivers/infiniband/hw/hns/hns_roce_qp.c
+++ b/drivers/infiniband/hw/hns/hns_roce_qp.c
@@ -1130,7 +1130,9 @@ static int set_qp_param(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
 	}
 
 	if (udata) {
-		ret = ib_copy_validate_udata_in(udata, *ucmd, reserved);
+		ret = ib_copy_validate_udata_in_cm(
+			udata, *ucmd, reserved,
+			HNS_ROCE_CREATE_QP_MASK_CONGEST_TYPE);
 		if (ret)
 			return ret;
 
-- 
2.43.0



* [PATCH 05/16] RDMA/pvrdma: Use ib_copy_validate_udata_in() for srq
From: Jason Gunthorpe @ 2026-03-12  0:24 UTC (permalink / raw)
  To: Abhijit Gangurde, Allen Hubbe,
	Broadcom internal kernel review list, Bernard Metzler, Bryan Tan,
	Cheng Xu, Gal Pressman, Junxian Huang, Kai Shen,
	Konstantin Taranov, Krzysztof Czurylo, Leon Romanovsky,
	linux-hyperv, linux-rdma, Long Li, Michal Kalderon,
	Michael Margolin, Nelson Escobar, Satish Kharat, Selvin Xavier,
	Yossi Leybovich, Chengchang Tang, Tatyana Nikolova, Vishnu Dasa,
	Yishai Hadas, Zhu Yanjun
  Cc: patches
In-Reply-To: <0-v1-2b86f54cda42+7d-rdma_udata_req_jgg@nvidia.com>

struct pvrdma_create_srq was introduced when the driver was first
merged but was never used. At that point it had only buf_addr. Later,
when SRQ was introduced, the struct was expanded. So, unlike the other
cases, which grab the first struct member based on git blame, this
one uses the entire struct.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c
index b3df6eb9b8eff6..bc3adcc1ae67c2 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c
@@ -134,10 +134,9 @@ int pvrdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	cq->is_kernel = !udata;
 
 	if (!cq->is_kernel) {
-		if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd))) {
-			ret = -EFAULT;
+		ret = ib_copy_validate_udata_in(udata, ucmd, reserved);
+		if (ret)
 			goto err_cq;
-		}
 
 		cq->umem = ib_umem_get(ibdev, ucmd.buf_addr, ucmd.buf_size,
 				       IB_ACCESS_LOCAL_WRITE);
-- 
2.43.0



* [PATCH 16/16] RDMA/hns: Remove the duplicate calls to ib_copy_validate_udata_in()
From: Jason Gunthorpe @ 2026-03-12  0:24 UTC (permalink / raw)
  To: Abhijit Gangurde, Allen Hubbe,
	Broadcom internal kernel review list, Bernard Metzler, Bryan Tan,
	Cheng Xu, Gal Pressman, Junxian Huang, Kai Shen,
	Konstantin Taranov, Krzysztof Czurylo, Leon Romanovsky,
	linux-hyperv, linux-rdma, Long Li, Michal Kalderon,
	Michael Margolin, Nelson Escobar, Satish Kharat, Selvin Xavier,
	Yossi Leybovich, Chengchang Tang, Tatyana Nikolova, Vishnu Dasa,
	Yishai Hadas, Zhu Yanjun
  Cc: patches
In-Reply-To: <0-v1-2b86f54cda42+7d-rdma_udata_req_jgg@nvidia.com>

A udata should be read only once per ioctl, not multiple times.
Multiple reads make it unclear what the content is since userspace can
change it between the reads.
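
The hazard can be shown with a small userspace sketch. It assumes a
hypothetical demo_srq_cmd layout and hypothetical demo_* helpers, with
memcpy() standing in for the copy from userspace: the request is
snapshotted once, and every later step works from that snapshot instead of
reading the user buffer again.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request layout, for illustration only. */
struct demo_srq_cmd {
	uint64_t buf_addr;
	uint64_t que_addr;
	uint64_t db_addr;
};

static_assert(sizeof(struct demo_srq_cmd) == 3 * sizeof(uint64_t),
	      "no padding expected in demo_srq_cmd");

/* Read the user buffer exactly once into a private snapshot. */
static void demo_read_once(struct demo_srq_cmd *snap, const void *user)
{
	memcpy(snap, user, sizeof(*snap));
}

/* Later steps take the snapshot and never touch the user buffer. */
static uint64_t demo_buf_step(const struct demo_srq_cmd *cmd)
{
	return cmd->que_addr;
}

static uint64_t demo_db_step(const struct demo_srq_cmd *cmd)
{
	return cmd->db_addr;
}
```

If the two steps each copied from the user buffer, a concurrent writer
could make them observe different que_addr and db_addr values. With one
snapshot they always agree.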

Lift the ib_copy_validate_udata_in() out of
alloc_srq_buf()/alloc_srq_db() and into hns_roce_create_srq().

Found by AI.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/hw/hns/hns_roce_srq.c | 35 +++++++++++-------------
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_srq.c b/drivers/infiniband/hw/hns/hns_roce_srq.c
index 601f8cdfce96a3..cb848e8e6bbd76 100644
--- a/drivers/infiniband/hw/hns/hns_roce_srq.c
+++ b/drivers/infiniband/hw/hns/hns_roce_srq.c
@@ -340,22 +340,16 @@ static int set_srq_param(struct hns_roce_srq *srq,
 }
 
 static int alloc_srq_buf(struct hns_roce_dev *hr_dev, struct hns_roce_srq *srq,
-			 struct ib_udata *udata)
+			 struct ib_udata *udata,
+			 struct hns_roce_ib_create_srq *ucmd)
 {
-	struct hns_roce_ib_create_srq ucmd = {};
 	int ret;
 
-	if (udata) {
-		ret = ib_copy_validate_udata_in(udata, ucmd, que_addr);
-		if (ret)
-			return ret;
-	}
-
-	ret = alloc_srq_idx(hr_dev, srq, udata, ucmd.que_addr);
+	ret = alloc_srq_idx(hr_dev, srq, udata, ucmd->que_addr);
 	if (ret)
 		return ret;
 
-	ret = alloc_srq_wqe_buf(hr_dev, srq, udata, ucmd.buf_addr);
+	ret = alloc_srq_wqe_buf(hr_dev, srq, udata, ucmd->buf_addr);
 	if (ret)
 		goto err_idx;
 
@@ -404,22 +398,18 @@ static void free_srq_db(struct hns_roce_dev *hr_dev, struct hns_roce_srq *srq,
 
 static int alloc_srq_db(struct hns_roce_dev *hr_dev, struct hns_roce_srq *srq,
 			struct ib_udata *udata,
+			struct hns_roce_ib_create_srq *ucmd,
 			struct hns_roce_ib_create_srq_resp *resp)
 {
-	struct hns_roce_ib_create_srq ucmd;
 	struct hns_roce_ucontext *uctx;
 	int ret;
 
 	if (udata) {
-		ret = ib_copy_validate_udata_in(udata, ucmd, que_addr);
-		if (ret)
-			return ret;
-
 		if ((hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_SRQ_RECORD_DB) &&
-		    (ucmd.req_cap_flags & HNS_ROCE_SRQ_CAP_RECORD_DB)) {
+		    (ucmd->req_cap_flags & HNS_ROCE_SRQ_CAP_RECORD_DB)) {
 			uctx = rdma_udata_to_drv_context(udata,
 					struct hns_roce_ucontext, ibucontext);
-			ret = hns_roce_db_map_user(uctx, ucmd.db_addr,
+			ret = hns_roce_db_map_user(uctx, ucmd->db_addr,
 						   &srq->rdb);
 			if (ret)
 				return ret;
@@ -448,6 +438,7 @@ int hns_roce_create_srq(struct ib_srq *ib_srq,
 	struct hns_roce_dev *hr_dev = to_hr_dev(ib_srq->device);
 	struct hns_roce_ib_create_srq_resp resp = {};
 	struct hns_roce_srq *srq = to_hr_srq(ib_srq);
+	struct hns_roce_ib_create_srq ucmd = {};
 	int ret;
 
 	mutex_init(&srq->mutex);
@@ -457,11 +448,17 @@ int hns_roce_create_srq(struct ib_srq *ib_srq,
 	if (ret)
 		goto err_out;
 
-	ret = alloc_srq_buf(hr_dev, srq, udata);
+	if (udata) {
+		ret = ib_copy_validate_udata_in(udata, ucmd, que_addr);
+		if (ret)
+			goto err_out;
+	}
+
+	ret = alloc_srq_buf(hr_dev, srq, udata, &ucmd);
 	if (ret)
 		goto err_out;
 
-	ret = alloc_srq_db(hr_dev, srq, udata, &resp);
+	ret = alloc_srq_db(hr_dev, srq, udata, &ucmd, &resp);
 	if (ret)
 		goto err_srq_buf;
 
-- 
2.43.0



* [PATCH 10/16] RDMA/efa: Use ib_copy_validate_udata_in_cm()
From: Jason Gunthorpe @ 2026-03-12  0:24 UTC (permalink / raw)
  To: Abhijit Gangurde, Allen Hubbe,
	Broadcom internal kernel review list, Bernard Metzler, Bryan Tan,
	Cheng Xu, Gal Pressman, Junxian Huang, Kai Shen,
	Konstantin Taranov, Krzysztof Czurylo, Leon Romanovsky,
	linux-hyperv, linux-rdma, Long Li, Michal Kalderon,
	Michael Margolin, Nelson Escobar, Satish Kharat, Selvin Xavier,
	Yossi Leybovich, Chengchang Tang, Tatyana Nikolova, Vishnu Dasa,
	Yishai Hadas, Zhu Yanjun
  Cc: patches
In-Reply-To: <0-v1-2b86f54cda42+7d-rdma_udata_req_jgg@nvidia.com>

Switch efa_alloc_ucontext() to ib_copy_validate_udata_in_cm() so that
the previously missing check for unsupported comp_mask bits is
performed.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
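Note for readers: the following is a minimal user-space sketch of what
"checking for unsupported comp_mask bits" means in general. It is not the
body of ib_copy_validate_udata_in_cm(), which this patch does not show. The
names demo_cmd, demo_check_comp_mask and the DEMO_COMP_* bits are
hypothetical, and returning -EOPNOTSUPP is an assumption of the sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical bits standing in for the EFA_ALLOC_UCONTEXT_CMD_COMP_* flags. */
#define DEMO_COMP_TX_BATCH  (1u << 0)
#define DEMO_COMP_MIN_SQ_WR (1u << 1)

/* Hypothetical request layout; only comp_mask matters for this sketch. */
struct demo_cmd {
	uint32_t comp_mask;
	uint32_t reserved;
};

/*
 * Return 0 when every bit set in cmd->comp_mask is also set in 'supported'.
 * Otherwise return a negative errno (the specific code is an assumption).
 */
static int demo_check_comp_mask(const struct demo_cmd *cmd, uint32_t supported)
{
	if (cmd->comp_mask & ~supported)
		return -EOPNOTSUPP;
	return 0;
}
```

With a check of this shape, a request that sets only known bits passes,
while a request that sets any bit outside the supported set fails before
the driver acts on it.
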
 drivers/infiniband/hw/efa/efa_verbs.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index 8d9357e2d513bb..22993273028433 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -1918,13 +1918,13 @@ int efa_alloc_ucontext(struct ib_ucontext *ibucontext, struct ib_udata *udata)
 	 * it's fine if the driver does not know all request fields,
 	 * we will ack input fields in our response.
 	 */
-
-	err = ib_copy_from_udata(&cmd, udata,
-				 min(sizeof(cmd), udata->inlen));
-	if (err) {
-		ibdev_dbg(&dev->ibdev,
-			  "Cannot copy udata for alloc_ucontext\n");
-		goto err_out;
+	if (udata->inlen) {
+		err = ib_copy_validate_udata_in_cm(
+			udata, cmd, comp_mask,
+			EFA_ALLOC_UCONTEXT_CMD_COMP_TX_BATCH |
+				EFA_ALLOC_UCONTEXT_CMD_COMP_MIN_SQ_WR);
+		if (err)
+			goto err_out;
 	}
 
 	err = efa_user_comp_handshake(ibucontext, &cmd);
-- 
2.43.0

