* Re: [PATCH rc 00/15] Various bug fixes for RDMA drivers in the uapi functions
From: Junxian Huang @ 2026-04-29 7:55 UTC
To: Jason Gunthorpe, Andrew Lunn,
Broadcom internal kernel review list, Bryan Tan, Eric Dumazet,
Konstantin Taranov, Jakub Kicinski, Leon Romanovsky, linux-hyperv,
linux-rdma, netdev, Paolo Abeni, Selvin Xavier, Chengchang Tang,
Tariq Toukan, Vishnu Dasa, Yishai Hadas
Cc: Abhijit Gangurde, Adit Ranadive, Allen Hubbe, Andrew Boyer,
Aditya Sarwade, Brad Spengler, Bryan Tan, David S. Miller,
Dexuan Cui, Doug Ledford, George Zhang, Jorgen Hansen, Jianbo Liu,
Kai Aizen, Leon Romanovsky, Leon Romanovsky, Yixian Liu, Long Li,
Lijun Ou, Parav Pandit, patches, Roland Dreier, Roland Dreier,
Sagi Grimberg, Ajay Sharma, stable, Tariq Toukan, Wei Hu (Xavier),
Shaobo Xu, Nenglong Zhao
In-Reply-To: <0-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com>
On 2026/4/29 0:17, Jason Gunthorpe wrote:
> All were found by Sashiko or Claude AI tools. They vary in severity, but
> are all things that shouldn't be present.
>
> Jason Gunthorpe (15):
> RDMA/hns: Fix xarray race in hns_roce_create_srq()
> RDMA/hns: Fix xarray race in hns_roce_create_qp_common()
> RDMA/hns: Fix unlocked call to hns_roce_qp_remove()
For hns patches:
Reviewed-by: Junxian Huang <huangjunxian6@hisilicon.com>
Thanks,
Junxian
* RE: [PATCH] Drivers: hv: vmbus: Improve the logc of reserving fb_mmio on Gen2 VMs
From: Dexuan Cui @ 2026-04-29 3:12 UTC
To: Michael Kelley, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org,
Long Li, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, matthew.ruffell@canonical.com,
johansen@templeofstupid.com
Cc: stable@vger.kernel.org
In-Reply-To: <SN6PR02MB41576A849B6C4967622B4BA8D42A2@SN6PR02MB4157.namprd02.prod.outlook.com>
> From: Michael Kelley <mhklinux@outlook.com>
> Sent: Thursday, April 23, 2026 10:40 AM
Sorry for the late response! I got sidetracked by something else.
> > If vmbus_reserve_fb() in the kdump kernel fails to properly reserve the
>
> This problem has wider scope than just kdump. Any kexec'ed kernel would see
> the same problem, though kdump is probably the most common case. But the
> discussion here, and the mention of kdump in the code comments, should be
> adjusted accordingly.
Agreed. I'll post v2, which will use "kdump/kexec".
> > framebuffer MMIO range due to a Gen2 VM's screen.lfb_base being zero [1],
> > there is an MMIO conflict between the drivers hyperv_drm and pci-hyperv.
>
> You describe an MMIO "conflict" without giving the details. Is that
> intentional to keep the commit message from being too long? It might be
Yes.
> helpful to future readers to say a little more about how PCI devices must not
> use MMIO space that the hypervisor has assigned to the frame buffer.
Will do.
> As you noted in the detailed discussion in the other email thread [2],
> there's a Gen1 VM case that this patch doesn't fix. For completeness,
> perhaps that case should be called out in this commit message.
Will do.
> > + /* Hyper-V CoCo guests do not have a framebuffer device. */
> > + if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
> > + return;
>
> This test is testing feature "A" (mem encryption) in order to determine
> the presence of feature "B" (no framebuffer), because current
> configurations happen to always have "A" and "B" at the same time. But
> the linkage between the features is tenuous, and if configurations should
> change in the future, testing this way could be bogus. It works now, but I'm
> leery of depending on the linkage between "A" and "B".
>
> You could set up a "can_have_framebuffer" flag in ms_hyperv_init_platform()
> if running in a CVM, and test that flag here. But I'd suggest just dropping
> this optimization. CVMs are always Gen2 (and that's not going to change),
> so they have plenty of low mmio space.
This is not true on a lab host, e.g. I have a TDX VM on a lab host created
by these 2 commands (without the 2nd command, Hyper-V won't allow
the TDX VM to start):
New-VM -Generation 2 -GuestStateIsolationType Tdx -Name $vmName
Disable-VMConsoleSupport -VMName $vmName
The low_mmio_base is still 4GB-128MB. In this case, it's not a good idea
to try to reserve the 128MB:
1) the available low MMIO size is smaller than 128MB due to the vTPM
MMIO range.
2) even if we can reserve the 109.25MB low MMIO range
[0xf8000000-0xfed3ffff], we may not want to do that, just in case
some assigned PCI device has 32-bit BARs.
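For reference, the 109.25MB figure is the size of the inclusive range [0xf8000000-0xfed3ffff]: 0xfed3ffff - 0xf8000000 + 1 = 0x6d40000 bytes. A small userspace C check (the helper names are made up for illustration; this is not kernel code):

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1M   0x100000ULL
#define SZ_128M 0x8000000ULL

/* Byte size of an inclusive range [start-end], as printed by
 * /proc/iomem or the kernel's %pR format. */
static uint64_t range_size(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return end - start + 1;
}

/* Size in quarter-megabyte units, so a value like 109.25MB can be
 * checked exactly with integer math (437 quarters == 109.25MB). */
static uint64_t range_quarter_mb(uint64_t start, uint64_t end)
{
	return range_size(start, end) / (SZ_1M / 4);
}
```

range_quarter_mb(0xf8000000, 0xfed3ffff) gives 437, i.e. 109.25MB, which is already below 128MB before anything else is carved out of the window.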
So, IMO we need to keep the check:
+ if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
+ return;
BTW, I think this may be a slightly better check here:
+ if (hv_is_isolation_supported())
+ return;
A CVM on Hyper-V won't start without the command line
Disable-VMConsoleSupport -VMName $vmName
IMO this is very unlikely to change in the future, because the Hyper-V
synthetic framebuffer VMBus device is not a trusted device for a CVM,
so there is no reason for Hyper-V to offer such a device to CVMs; even
if the host offers it, currently the guest hv_vmbus driver ignores it.
When we assign a physical PCI GPU device to a CVM, I'm not sure whether
the GPU provides a framebuffer. Even if it does, that's a completely
different scenario, and skipping the low MMIO "framebuffer" reservation
doesn't affect it: I think hyperv_drm (or the deprecated hyperv_fb) is the only
driver that sets the fb_overlap_ok parameter of vmbus_allocate_mmio().
> And at the moment, CVMs don't
> support PCI devices,
This is not true: recently I created a "Standard DC16eds v6" TDX CVM
on Azure, and I did see two NVMe local temporary disks in "nvme list"
(here TDISP is not used). In 2023, we added the commit
2c6ba4216844 ("PCI: hv: Enable PCI pass-thru devices in Confidential VMs")
and I believe some users are running CVMs with GPUs.
> so can't encounter a conflict (though conceivably
Correct, since there is no legacy or synthetic framebuffer device for CVMs.
> some new flavor of CVM in the future could support PCI devices).
>
> > +
> > if (efi_enabled(EFI_BOOT)) {
> > /* Gen2 VM: get FB base from EFI framebuffer */
> > if (IS_ENABLED(CONFIG_SYSFB)) {
> > start = sysfb_primary_display.screen.lfb_base;
> > size = max_t(__u32, sysfb_primary_display.screen.lfb_size, 0x800000);
> > +
> > + low_mmio_base = hyperv_mmio->start;
> > + if (!low_mmio_base || low_mmio_base >= SZ_4G ||
> > + (start && start < low_mmio_base)) {
> > + pr_warn("Unexpected low mmio base 0x%pa\n", &low_mmio_base);
> > + } else {
> > + /*
> > + * If the kdump kernel's lfb_base is 0,
>
> As mentioned earlier, this case isn't just kdump kernels.
Yes, the first kernel also runs here with a non-zero 'start'.
>
> > + * fall back to the low mmio base.
> > + */
> > + if (!start)
> > + start = low_mmio_base;
> > + /*
> > + * Reserve half of the space below 4GB for high
> > + * resolutions, but cap the reservation to 128MB.
> > + */
> > + size = min((SZ_4G - start) / 2, SZ_128M);
> > + }
> > }
> > } else {
> > /* Gen1 VM: get FB base from PCI */
> > @@ -2433,6 +2457,8 @@ static void __maybe_unused vmbus_reserve_fb(void)
> > */
> > for (; !fb_mmio && (size >= 0x100000); size >>= 1)
> > fb_mmio = __request_region(hyperv_mmio, start, size, fb_mmio_name, 0);
>
> Just above this "for" loop, "start" is tested for 0. This patch eliminates the main
> reason start might be 0. But I guess it's still possible that the legacy PCI device
> BAR might return 0 for a Gen1 VM?
IMO the legacy PCI BAR's base in a Gen1 VM can't be 0.
> Or you might get 0 if the pr_warn() about low
> mmio base is triggered. But I'm thinking maybe a pr_warn() should be done if
> start is zero.
Ok, will add a pr_warn() here.
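To make the new Gen2 sizing concrete, here is a userspace C model of the logic in the quoted hunk, plus the existing halving retry loop. It is a sketch, not the patch: gen2_fb_range() and reserve_with_halving() are hypothetical names, the SZ_* macros are redefined locally, and __request_region() is reduced to a "does it fit in avail bytes" test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_1M   0x100000ULL
#define SZ_8M   0x800000ULL
#define SZ_128M 0x8000000ULL
#define SZ_4G   0x100000000ULL

/*
 * Userspace model of the Gen2 branch of the patched vmbus_reserve_fb().
 * Returns false where the patch would pr_warn() and keep the EFI values.
 * Assumes start stays below 4GB, as on the VMs discussed in this thread.
 */
static bool gen2_fb_range(uint64_t lfb_base, uint64_t lfb_size,
			  uint64_t low_mmio_base,
			  uint64_t *start, uint64_t *size)
{
	*start = lfb_base;
	*size = lfb_size > SZ_8M ? lfb_size : SZ_8M;

	if (!low_mmio_base || low_mmio_base >= SZ_4G ||
	    (*start && *start < low_mmio_base))
		return false;

	/* A kexec'ed/kdump kernel may see lfb_base == 0. */
	if (!*start)
		*start = low_mmio_base;

	/* Half of the space between start and 4GB, capped at 128MB. */
	*size = (SZ_4G - *start) / 2;
	if (*size > SZ_128M)
		*size = SZ_128M;
	return true;
}

/*
 * Model of the halving retry loop: try size, size/2, ... down to 1MB.
 * "avail" (free bytes starting at start) stands in for whether
 * __request_region() would succeed; that is a simplification.
 */
static uint64_t reserve_with_halving(uint64_t size, uint64_t avail)
{
	for (; size >= SZ_1M; size >>= 1)
		if (size <= avail)
			return size;
	return 0;
}
```

With low_mmio_base = 0xf8000000 (4G-128MB) and lfb_base = 0, the model picks start = 0xf8000000 and size = 64MB. With 3GB of low MMIO (base 0x40000000), half the space would be 1.5GB, so the 128MB cap applies.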
> > +
> > + pr_info("hv_mmio=%pR,%pR fb=%pR\n", hyperv_mmio, hyperv_mmio->sibling, fb_mmio);
>
> Outputting the above info is nice!
>
> Michael
Thanks for all the good input! Will post v2 for review.
Thanks,
Dexuan
* RE: [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
From: Dexuan Cui @ 2026-04-29 1:58 UTC
To: Michael Kelley, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org,
Long Li, lpieralisi@kernel.org, kwilczynski@kernel.org,
mani@kernel.org, robh@kernel.org, bhelgaas@google.com,
Jake Oshins, linux-hyperv@vger.kernel.org,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
matthew.ruffell@canonical.com, kjlx@templeofstupid.com
Cc: Krister Johansen, stable@vger.kernel.org
In-Reply-To: <SN6PR02MB4157D5BAFAE2134276241FFED42A2@SN6PR02MB4157.namprd02.prod.outlook.com>
> From: Michael Kelley <mhklinux@outlook.com>
> Sent: Thursday, April 23, 2026 10:40 AM
> > ...
> > Another example is: for a Gen2 VM with the below commands:
> > Set-VM -LowMemoryMappedIoSpace 1GB \
> > -VMName decui-u2204-gen2-fb
> > // i.e. the default setting on Azure. Let's ignore CVMs here.
Sorry for the incorrect statement: this is not the default setting
on Azure. The default for regular VMs on Azure should be
"-LowMemoryMappedIoSpace 3GB". Not sure how I made the
incorrect statement -- I guess I might have confused my local VM
with my Azure VM, and at some moment, I might have mistaken
the meaning of the "-LowMemoryMappedIoSpace" parameter:
for that local VM, I might somehow have incorrectly thought that the
param means low_mmio_base rather than low_mmio_size.
> FWIW, I'm seeing that in Gen2 VMs in Azure, the low_mmio_size
> is 3 GiB. I'm looking at a D16ds_v5, and a D16lds_v6. The v5 VM
> is newly created, while the v6 has been around for a few months.
This is also my observation, after I double checked my Azure VM.
> In a CVM, the low_mmio_size should be 1 GiB. This overall example
> is still correct -- it's just the comment that I have doubts about. Or
> maybe you are looking at a different VM size that has a different
> default?
For CVMs, yes, the low_mmio_size is 1GB.
>
> Some years back, I had gotten into a discussion with Azure about
> this size because the swiotlb memory wants to be allocated below
> the 4 GiB line, and reserving 3 GiB for low mmio limited the size
> of the swiotlb. CVMs were changed to have only 1 GiB for low
> mmio because they need a larger swiotlb.
Right, I also remember the story. :-)
> > With the below command:
> > Set-VM -LowMemoryMappedIoSpace 3GB \
> > -VMName decui-u2204-gen2-fb
> > // i.e. the default setting on Azure. Unlike x86-64, an ARM64
> > // VM on Azure has 3GB of mmio below 4GB.
>
> See my previous comment on the same topic. I think arm64
> and x86/x64 are the same.
Agreed.
> Question about Gen 1 VMs: If the Linux frame buffer driver moves
> the frame buffer somewhere other than the default location, and
> then the VM does a kexec/kdump, what does the legacy PCI graphic
> device BAR report as the frame buffer location? Does it *always*
> report 4G-128MB, or does it report the new location? I can run
It always reports 4G-128MB.
BTW, I suspect a Gen2 VM may have the same issue, i.e.
currently we only reserve 8MB below 4GB; if hyperv_drm uses
high MMIO, I suspect the UEFI firmware would still report the
same original low MMIO framebuffer base/size to the kdump kernel,
but there is no easy way to verify this for Gen2 VMs...
> an experiment to find out, but maybe you've already done so and
> not reported that detail here.
>
> Michael
I have a Gen1 Ubuntu 22.04 VM, and I run the below commands:
Set-VM -LowMemoryMappedIoSpace 128MB -VMName decui-u2204-gen1-fb
Set-VMVideo -VMName decui-u2204-gen1-fb -HorizontalResolution 7680 -VerticalResolution 4320 -ResolutionType Single
When the VM boots up, we reserve 64MB at 4G-128MB:
[ 11.492075] hv_vmbus: hv_mmio=[mem 0xf8000000-0xfed3ffff],[mem 0xfe0000000-0xfffffffff] fb=[mem 0xf8000000-0xfbffffff]
Since the required MMIO size in the hyperv_drm driver is 128MB:
[ 28.631923] hyperv_connect_vsp: hyperv_drm: mmio_megabytes=128 MB
the driver has to allocate MMIO from the high MMIO space, because
we only reserve 64MB below 4GB, and the available low_mmio_size is
smaller than 128MB due to the vTPM MMIO range:
# cat /proc/iomem
00000000-00000fff : Reserved
00001000-0009fbff : System RAM
0009fc00-0009ffff : Reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000c7fff : Video ROM
000e0000-000fffff : Reserved
000f0000-000fffff : System ROM
00100000-f7feffff : System RAM
d7000000-f6ffffff : Crash kernel
f7ff0000-f7ffefff : ACPI Tables
f7fff000-f7ffffff : ACPI Non-volatile Storage
f8000000-fffbffff : PCI Bus 0000:00
f8000000-fbffffff : 0000:00:08.0
fec00000-fec003ff : IOAPIC 0
fee00000-fee00fff : PNP0C02:01
fffc0000-ffffffff : PNP0C01:00
100000000-507ffffff : System RAM
281600000-28295449f : Kernel code
282a00000-283746fff : Kernel rodata
283800000-283c5287f : Kernel data
28411a000-2845fffff : Kernel bss
fe0000000-fffffffff : PCI Bus 0000:00
fe0000000-fe7ffffff : 5620e0c7-8062-4dce-aeb7-520c7ef76171
However, when the kdump kernel starts to run, and I print the
pci_resource_start(pdev, 0) and pci_resource_len(pdev, 0)
from vmbus_reserve_fb(), I still see 4G-128MB:
[ 12.506159] Gen1 VM: start=0xf8000000, size=0x4000000
In this case, we can't really fix the MMIO conflict. E.g.
if both hv_pci and hyperv_drm are built as modules, then
the order of loading them can be nondeterministic: if the order
in the first kernel differs from the order in
the kdump kernel, we run into trouble.
If the order is deterministic (e.g. hv_pci is
built-in, and hyperv_drm is built as a module),
we should be fine, since both allocate MMIO from
the high MMIO range in a deterministic way.
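A toy model of the ordering problem, in userspace C. The allocator, the 32MB hv_pci request size, and the helper names are all assumptions for illustration; the only real numbers are the high MMIO window and the 128MB hyperv_drm request from the log above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_1M 0x100000ULL

/*
 * Toy model: each request takes the lowest free address in one MMIO
 * window (a bump allocator). This is NOT how vmbus_allocate_mmio()
 * searches; it only shows that placement depends on request order.
 */
struct window {
	uint64_t next;	/* lowest free address */
	uint64_t end;	/* exclusive end of the window */
};

static uint64_t toy_alloc(struct window *w, uint64_t size)
{
	uint64_t base = w->next;

	if (size > w->end - w->next)
		return 0;	/* does not fit */
	w->next += size;
	return base;
}

/* True if [a, a + alen) and [b, b + blen) share any byte. */
static bool overlaps(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
	return a < b + blen && b < a + alen;
}
```

If the first kernel loads hyperv_drm first, its framebuffer lands at 0xfe0000000; if the kdump kernel then loads hv_pci first, hv_pci is handed that same address while the host may still be scanning out of it.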
Thanks,
Dexuan
* Re: [PATCH] hv_sock: fix ARM64 support
From: Jakub Kicinski @ 2026-04-28 23:48 UTC
To: Dexuan Cui
Cc: Hamza Mahfooz, netdev@vger.kernel.org, KY Srinivasan,
Haiyang Zhang, Wei Liu, Long Li, Stefano Garzarella,
David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Michael Kelley, Himadri Pandya, linux-hyperv@vger.kernel.org,
virtualization@lists.linux.dev, linux-kernel@vger.kernel.org
In-Reply-To: <SA1PR21MB69211500C7F60FC29F1BAA79BF372@SA1PR21MB6921.namprd21.prod.outlook.com>
On Tue, 28 Apr 2026 21:24:59 +0000 Dexuan Cui wrote:
> > Sent: Tuesday, April 28, 2026 5:54 AM
> > Subject: [PATCH] hv_sock: fix ARM64 support
>
> Typically, for a change to net/, you'd want to add a "net" or "net-next"
> after the "PATCH", i.e.
>
> [PATCH net]
> or
> [PATCH net v2]
>
> See "Documentation/process/maintainer-netdev.rst"
Speaking of Documentation/process/maintainer-netdev.rst:
Reviewer guidance
-----------------
[...]
Reviewers are highly encouraged to do more in-depth review of submissions
and not focus exclusively on process issues, trivial or subjective
matters like code formatting, tags etc.
See: https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#reviewer-guidance
* Re: [PATCH v4 3/3] mshv: unmap debugfs stats pages on kexec
From: Stanislav Kinsburskii @ 2026-04-28 23:31 UTC
To: Jork Loeser
Cc: linux-hyperv, x86, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H . Peter Anvin, Arnd Bergmann,
Michael Kelley, Anirudh Rayabharam, linux-kernel, linux-arch
In-Reply-To: <20260427213855.1675044-4-jloeser@linux.microsoft.com>
On Mon, Apr 27, 2026 at 02:38:54PM -0700, Jork Loeser wrote:
> On L1VH, debugfs stats pages are overlay pages: the kernel allocates
> them and registers the GPAs with the hypervisor via
> HVCALL_MAP_STATS_PAGE2. These overlay mappings persist in the
> hypervisor across kexec. If the kexec'd kernel reuses those physical
> pages, the hypervisor's overlay semantics cause a machine check
> exception.
>
> Fix this by calling mshv_debugfs_exit() from the reboot notifier,
> which issues HVCALL_UNMAP_STATS_PAGE for each mapped stats page before
> kexec. This releases the overlay bindings so the physical pages can be
> safely reused. Guard mshv_debugfs_exit() against being called when
> init failed.
>
> Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com>
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
> drivers/hv/mshv_debugfs.c | 7 ++++++-
> drivers/hv/mshv_synic.c | 1 +
> 2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hv/mshv_debugfs.c b/drivers/hv/mshv_debugfs.c
> index 418b6dc8f3c2..3c3e02237ae9 100644
> --- a/drivers/hv/mshv_debugfs.c
> +++ b/drivers/hv/mshv_debugfs.c
> @@ -674,8 +674,10 @@ int __init mshv_debugfs_init(void)
>
> mshv_debugfs = debugfs_create_dir("mshv", NULL);
> if (IS_ERR(mshv_debugfs)) {
> + err = PTR_ERR(mshv_debugfs);
> + mshv_debugfs = NULL;
> pr_err("%s: failed to create debugfs directory\n", __func__);
> - return PTR_ERR(mshv_debugfs);
> + return err;
> }
>
> if (hv_root_partition()) {
> @@ -710,6 +712,9 @@ int __init mshv_debugfs_init(void)
>
> void mshv_debugfs_exit(void)
> {
> + if (!mshv_debugfs)
> + return;
> +
> mshv_debugfs_parent_partition_remove();
>
> if (hv_root_partition()) {
> diff --git a/drivers/hv/mshv_synic.c b/drivers/hv/mshv_synic.c
> index 978a1cace341..88170ce6b83f 100644
> --- a/drivers/hv/mshv_synic.c
> +++ b/drivers/hv/mshv_synic.c
> @@ -723,6 +723,7 @@ mshv_unregister_doorbell(u64 partition_id, int doorbell_portid)
> static int mshv_synic_reboot_notify(struct notifier_block *nb,
> unsigned long code, void *unused)
> {
> + mshv_debugfs_exit();
> cpuhp_remove_state(synic_cpuhp_online);
> return 0;
> }
> --
> 2.43.0
>
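The guard works because debugfs_create_dir() returns an ERR_PTR() on failure, not NULL, so the patch turns the error into NULL before mshv_debugfs_exit() tests it. A userspace sketch of the same pattern; create_dir(), stats_init() and stats_exit() are hypothetical stand-ins, and clearing the pointer at the end of stats_exit() is an extra that the patch itself does not do.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace copy of the kernel's IS_ERR() test on error pointers. */
static bool is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static void *stats_dir;		/* stands in for mshv_debugfs */
static int teardown_calls;

/* Hypothetical debugfs_create_dir() stand-in: error pointer on failure. */
static void *create_dir(bool fail)
{
	return fail ? (void *)(intptr_t)-ENOMEM : malloc(1);
}

static int stats_init(bool fail)
{
	int err;

	stats_dir = create_dir(fail);
	if (is_err(stats_dir)) {
		err = (int)(intptr_t)stats_dir;
		stats_dir = NULL;	/* exit tests for NULL, not IS_ERR() */
		return err;
	}
	return 0;
}

static void stats_exit(void)
{
	if (!stats_dir)		/* init failed (or already torn down) */
		return;
	teardown_calls++;
	free(stats_dir);
	stats_dir = NULL;	/* extra in this sketch: repeat calls are no-ops */
}
```

Without the NULL conversion, a failed init would leave an error pointer that passes the `!stats_dir` check, and the exit path would try to tear down a directory that was never created.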
* Re: [PATCH v4 2/3] mshv: clean up SynIC state on kexec for L1VH
From: Stanislav Kinsburskii @ 2026-04-28 23:28 UTC
To: Jork Loeser
Cc: linux-hyperv, x86, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H . Peter Anvin, Arnd Bergmann,
Michael Kelley, Anirudh Rayabharam, linux-kernel, linux-arch
In-Reply-To: <20260427213855.1675044-3-jloeser@linux.microsoft.com>
On Mon, Apr 27, 2026 at 02:38:53PM -0700, Jork Loeser wrote:
> The reboot notifier that tears down the SynIC cpuhp state guards the
> cleanup with hv_root_partition(), so on L1VH (where
> hv_root_partition() is false) SINT0, SINT5, and SIRBP are never
> cleaned up before kexec. The kexec'd kernel then inherits stale
> unmasked SINTs and an enabled SIRBP pointing to freed memory.
>
> Remove the hv_root_partition() guard so the cleanup runs for all
> parent partitions.
>
> Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com>
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
> drivers/hv/mshv_synic.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/drivers/hv/mshv_synic.c b/drivers/hv/mshv_synic.c
> index 2db3b0192eac..978a1cace341 100644
> --- a/drivers/hv/mshv_synic.c
> +++ b/drivers/hv/mshv_synic.c
> @@ -723,9 +723,6 @@ mshv_unregister_doorbell(u64 partition_id, int doorbell_portid)
> static int mshv_synic_reboot_notify(struct notifier_block *nb,
> unsigned long code, void *unused)
> {
> - if (!hv_root_partition())
> - return 0;
> -
> cpuhp_remove_state(synic_cpuhp_online);
> return 0;
> }
> --
> 2.43.0
>
* Re: [PATCH v4 1/3] mshv: limit SynIC management to MSHV-owned resources
From: Stanislav Kinsburskii @ 2026-04-28 23:27 UTC
To: Jork Loeser
Cc: linux-hyperv, x86, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H . Peter Anvin, Arnd Bergmann,
Michael Kelley, Anirudh Rayabharam, linux-kernel, linux-arch
In-Reply-To: <20260427213855.1675044-2-jloeser@linux.microsoft.com>
On Mon, Apr 27, 2026 at 02:38:52PM -0700, Jork Loeser wrote:
> The SynIC is shared between VMBus and MSHV. VMBus owns the message
> page (SIMP), event flags page (SIEFP), global enable (SCONTROL),
> and SINT2. MSHV adds SINT0, SINT5, and the event ring page (SIRBP).
>
> Currently mshv_synic_cpu_init() redundantly enables SIMP, SIEFP, and
> SCONTROL that VMBus already configured, and mshv_synic_cpu_exit()
> disables all of them. This is wrong because MSHV can be torn down
> while VMBus is still active. In particular, a kexec reboot notifier
> tears down MSHV first. Disabling SCONTROL, SIMP, and SIEFP out
> from under VMBus causes its later cleanup to write SynIC MSRs while
> SynIC is disabled, which the hypervisor does not tolerate.
>
> Restrict MSHV to managing only the resources it owns:
> - SINT0, SINT5: mask on cleanup, unmask on init
> - SIRBP: enable/disable as before
> - SIMP, SIEFP, SCONTROL: leave to VMBus when it is active (L1VH
> and nested root partition); on a non-nested root partition VMBus
> does not run, so MSHV must enable/disable them
>
> While here, fix the SIEFP and SIRBP memremap() and virt_to_phys()
> calls to use HV_HYP_PAGE_SHIFT/HV_HYP_PAGE_SIZE instead of
> PAGE_SHIFT/PAGE_SIZE. The hypervisor always uses 4K pages for SynIC
> register GPAs regardless of the kernel page size, so using PAGE_SHIFT
> produces wrong addresses on ARM64 with 64K pages.
>
> Note that initialization order matters - VMBUS first, MSHV second,
> and the reverse on de-init. Ideally, we would want a dedicated SYNIC
> driver that replaces the cross-dependencies with a clear API and
> dynamic tracking. Such a refactor should go into its own dedicated
> series, outside of this kexec fix series.
>
> Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com>
> ---
> drivers/hv/hv.c | 3 +
> drivers/hv/mshv_synic.c | 150 ++++++++++++++++++++++++++--------------
> 2 files changed, 103 insertions(+), 50 deletions(-)
>
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index ae60fd542292..ef4b1b03395d 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -272,6 +272,9 @@ void hv_synic_free(void)
> /*
> * hv_hyp_synic_enable_regs - Initialize the Synthetic Interrupt Controller
> * with the hypervisor.
> + *
> + * Note: When MSHV is present, mshv_synic_cpu_init() initializes further
> + * registers later.
> */
> void hv_hyp_synic_enable_regs(unsigned int cpu)
> {
> diff --git a/drivers/hv/mshv_synic.c b/drivers/hv/mshv_synic.c
> index e2288a726fec..2db3b0192eac 100644
> --- a/drivers/hv/mshv_synic.c
> +++ b/drivers/hv/mshv_synic.c
> @@ -13,6 +13,7 @@
> #include <linux/interrupt.h>
> #include <linux/io.h>
> #include <linux/cpuhotplug.h>
> +#include <linux/hyperv.h>
> #include <linux/reboot.h>
> #include <asm/mshyperv.h>
> #include <linux/acpi.h>
> @@ -456,46 +457,75 @@ static int mshv_synic_cpu_init(unsigned int cpu)
> union hv_synic_siefp siefp;
> union hv_synic_sirbp sirbp;
> union hv_synic_sint sint;
> - union hv_synic_scontrol sctrl;
> struct hv_synic_pages *spages = this_cpu_ptr(synic_pages);
> struct hv_message_page **msg_page = &spages->hyp_synic_message_page;
> struct hv_synic_event_flags_page **event_flags_page =
> &spages->synic_event_flags_page;
> struct hv_synic_event_ring_page **event_ring_page =
> &spages->synic_event_ring_page;
> + /*
> + * VMBus owns SIMP/SIEFP/SCONTROL when it is active.
> + * See hv_hyp_synic_enable_regs() for that initialization.
> + */
> + bool vmbus_active = hv_vmbus_exists();
>
> - /* Setup the Synic's message page */
> + /*
> + * Map the SYNIC message page. When VMBus is not active the
> + * hypervisor pre-provisions the SIMP GPA but may not set
> + * simp_enabled — enable it here.
> + */
> simp.as_uint64 = hv_get_non_nested_msr(HV_MSR_SIMP);
> - simp.simp_enabled = true;
> + if (!vmbus_active) {
> + simp.simp_enabled = true;
> + hv_set_non_nested_msr(HV_MSR_SIMP, simp.as_uint64);
> + }
> *msg_page = memremap(simp.base_simp_gpa << HV_HYP_PAGE_SHIFT,
> HV_HYP_PAGE_SIZE,
> MEMREMAP_WB);
>
> if (!(*msg_page))
> - return -EFAULT;
> -
> - hv_set_non_nested_msr(HV_MSR_SIMP, simp.as_uint64);
> + goto cleanup_simp;
It would be cleaner (and simpler to read) if there were another
goto label that only unsets HV_MSR_SIMP, instead of checking *msg_page
for NULL again in the cleanup_simp label.
This applies to all the goto labels in this function.
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
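One way to read the suggestion, as a userspace C sketch: each label undoes exactly one step, in reverse order, and an error jumps to the label matching the last step that succeeded, so no label has to re-check for NULL. The register struct, cpu_init() and the fail_at knob are hypothetical; malloc()/free() stand in for memremap()/memunmap().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-ins for the SIMP/SIEFP enable bits. */
struct regs {
	bool simp_enabled;
	bool siefp_enabled;
};

/*
 * fail_at selects which mapping fails: 1 = message page,
 * 2 = event flags page, 0 = none.
 */
static int cpu_init(struct regs *r, bool vmbus_active, int fail_at,
		    void **msg_page, void **flags_page)
{
	*flags_page = NULL;

	if (!vmbus_active)
		r->simp_enabled = true;
	*msg_page = fail_at == 1 ? NULL : malloc(16);
	if (!*msg_page)
		goto disable_simp;	/* nothing mapped yet */

	if (!vmbus_active)
		r->siefp_enabled = true;
	*flags_page = fail_at == 2 ? NULL : malloc(16);
	if (!*flags_page)
		goto disable_siefp;	/* message page is mapped */

	return 0;

disable_siefp:
	if (!vmbus_active)
		r->siefp_enabled = false;
	/* Only reached after the message page was mapped: unmap it. */
	free(*msg_page);
	*msg_page = NULL;
disable_simp:
	if (!vmbus_active)
		r->simp_enabled = false;
	return -1;
}
```

When VMBus is active, the sketch leaves both enable bits exactly as VMBus set them, matching the ownership split described in the commit message.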
>
> - /* Setup the Synic's event flags page */
> + /*
> + * Map the event flags page. Same as SIMP: enable when
> + * VMBus is not active, already enabled by VMBus otherwise.
> + */
> siefp.as_uint64 = hv_get_non_nested_msr(HV_MSR_SIEFP);
> - siefp.siefp_enabled = true;
> - *event_flags_page = memremap(siefp.base_siefp_gpa << PAGE_SHIFT,
> - PAGE_SIZE, MEMREMAP_WB);
> + if (!vmbus_active) {
> + siefp.siefp_enabled = true;
> + hv_set_non_nested_msr(HV_MSR_SIEFP, siefp.as_uint64);
> + }
> + *event_flags_page = memremap(siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT,
> + HV_HYP_PAGE_SIZE, MEMREMAP_WB);
>
> if (!(*event_flags_page))
> - goto cleanup;
> -
> - hv_set_non_nested_msr(HV_MSR_SIEFP, siefp.as_uint64);
> + goto cleanup_siefp;
>
> /* Setup the Synic's event ring page */
> sirbp.as_uint64 = hv_get_non_nested_msr(HV_MSR_SIRBP);
> - sirbp.sirbp_enabled = true;
> - *event_ring_page = memremap(sirbp.base_sirbp_gpa << PAGE_SHIFT,
> - PAGE_SIZE, MEMREMAP_WB);
>
> - if (!(*event_ring_page))
> - goto cleanup;
> + if (hv_root_partition()) {
> + *event_ring_page = memremap(sirbp.base_sirbp_gpa << HV_HYP_PAGE_SHIFT,
> + HV_HYP_PAGE_SIZE, MEMREMAP_WB);
>
> + if (!(*event_ring_page))
> + goto cleanup_siefp;
> + } else {
> + /*
> + * On L1VH the hypervisor does not provide a SIRBP page.
> + * Allocate one and program its GPA into the MSR.
> + */
> + *event_ring_page = (struct hv_synic_event_ring_page *)
> + get_zeroed_page(GFP_KERNEL);
> +
> + if (!(*event_ring_page))
> + goto cleanup_siefp;
> +
> + sirbp.base_sirbp_gpa = virt_to_phys(*event_ring_page)
> + >> HV_HYP_PAGE_SHIFT;
> + }
> +
> + sirbp.sirbp_enabled = true;
> hv_set_non_nested_msr(HV_MSR_SIRBP, sirbp.as_uint64);
>
> if (mshv_sint_irq != -1)
> @@ -518,28 +548,30 @@ static int mshv_synic_cpu_init(unsigned int cpu)
> hv_set_non_nested_msr(HV_MSR_SINT0 + HV_SYNIC_DOORBELL_SINT_INDEX,
> sint.as_uint64);
>
> - /* Enable global synic bit */
> - sctrl.as_uint64 = hv_get_non_nested_msr(HV_MSR_SCONTROL);
> - sctrl.enable = 1;
> - hv_set_non_nested_msr(HV_MSR_SCONTROL, sctrl.as_uint64);
> + /* When VMBus is active it already enabled SCONTROL. */
> + if (!vmbus_active) {
> + union hv_synic_scontrol sctrl;
> +
> + sctrl.as_uint64 = hv_get_non_nested_msr(HV_MSR_SCONTROL);
> + sctrl.enable = 1;
> + hv_set_non_nested_msr(HV_MSR_SCONTROL, sctrl.as_uint64);
> + }
>
> return 0;
>
> -cleanup:
> - if (*event_ring_page) {
> - sirbp.sirbp_enabled = false;
> - hv_set_non_nested_msr(HV_MSR_SIRBP, sirbp.as_uint64);
> - memunmap(*event_ring_page);
> - }
> - if (*event_flags_page) {
> +cleanup_siefp:
> + if (*event_flags_page)
> + memunmap(*event_flags_page);
> + if (!vmbus_active) {
> siefp.siefp_enabled = false;
> hv_set_non_nested_msr(HV_MSR_SIEFP, siefp.as_uint64);
> - memunmap(*event_flags_page);
> }
> - if (*msg_page) {
> +cleanup_simp:
> + if (*msg_page)
> + memunmap(*msg_page);
> + if (!vmbus_active) {
> simp.simp_enabled = false;
> hv_set_non_nested_msr(HV_MSR_SIMP, simp.as_uint64);
> - memunmap(*msg_page);
> }
>
> return -EFAULT;
> @@ -548,16 +580,15 @@ static int mshv_synic_cpu_init(unsigned int cpu)
> static int mshv_synic_cpu_exit(unsigned int cpu)
> {
> union hv_synic_sint sint;
> - union hv_synic_simp simp;
> - union hv_synic_siefp siefp;
> union hv_synic_sirbp sirbp;
> - union hv_synic_scontrol sctrl;
> struct hv_synic_pages *spages = this_cpu_ptr(synic_pages);
> struct hv_message_page **msg_page = &spages->hyp_synic_message_page;
> struct hv_synic_event_flags_page **event_flags_page =
> &spages->synic_event_flags_page;
> struct hv_synic_event_ring_page **event_ring_page =
> &spages->synic_event_ring_page;
> + /* VMBus owns SIMP/SIEFP/SCONTROL when it is active */
> + bool vmbus_active = hv_vmbus_exists();
>
> /* Disable the interrupt */
> sint.as_uint64 = hv_get_non_nested_msr(HV_MSR_SINT0 + HV_SYNIC_INTERCEPTION_SINT_INDEX);
> @@ -574,28 +605,47 @@ static int mshv_synic_cpu_exit(unsigned int cpu)
> if (mshv_sint_irq != -1)
> disable_percpu_irq(mshv_sint_irq);
>
> - /* Disable Synic's event ring page */
> + /* Disable SYNIC event ring page owned by MSHV */
> sirbp.as_uint64 = hv_get_non_nested_msr(HV_MSR_SIRBP);
> sirbp.sirbp_enabled = false;
> - hv_set_non_nested_msr(HV_MSR_SIRBP, sirbp.as_uint64);
> - memunmap(*event_ring_page);
>
> - /* Disable Synic's event flags page */
> - siefp.as_uint64 = hv_get_non_nested_msr(HV_MSR_SIEFP);
> - siefp.siefp_enabled = false;
> - hv_set_non_nested_msr(HV_MSR_SIEFP, siefp.as_uint64);
> + if (hv_root_partition()) {
> + hv_set_non_nested_msr(HV_MSR_SIRBP, sirbp.as_uint64);
> + memunmap(*event_ring_page);
> + } else {
> + sirbp.base_sirbp_gpa = 0;
> + hv_set_non_nested_msr(HV_MSR_SIRBP, sirbp.as_uint64);
> + free_page((unsigned long)*event_ring_page);
> + }
> +
> + /*
> + * Release our mappings of the message and event flags pages.
> + * When VMBus is not active, we enabled SIMP/SIEFP — disable
> + * them. Otherwise VMBus owns the MSRs — leave them.
> + */
> memunmap(*event_flags_page);
> + if (!vmbus_active) {
> + union hv_synic_simp simp;
> + union hv_synic_siefp siefp;
>
> - /* Disable Synic's message page */
> - simp.as_uint64 = hv_get_non_nested_msr(HV_MSR_SIMP);
> - simp.simp_enabled = false;
> - hv_set_non_nested_msr(HV_MSR_SIMP, simp.as_uint64);
> + siefp.as_uint64 = hv_get_non_nested_msr(HV_MSR_SIEFP);
> + siefp.siefp_enabled = false;
> + hv_set_non_nested_msr(HV_MSR_SIEFP, siefp.as_uint64);
> +
> + simp.as_uint64 = hv_get_non_nested_msr(HV_MSR_SIMP);
> + simp.simp_enabled = false;
> + hv_set_non_nested_msr(HV_MSR_SIMP, simp.as_uint64);
> + }
> memunmap(*msg_page);
>
> - /* Disable global synic bit */
> - sctrl.as_uint64 = hv_get_non_nested_msr(HV_MSR_SCONTROL);
> - sctrl.enable = 0;
> - hv_set_non_nested_msr(HV_MSR_SCONTROL, sctrl.as_uint64);
> + /* When VMBus is active it owns SCONTROL — leave it. */
> + if (!vmbus_active) {
> + union hv_synic_scontrol sctrl;
> +
> + sctrl.as_uint64 = hv_get_non_nested_msr(HV_MSR_SCONTROL);
> + sctrl.enable = 0;
> + hv_set_non_nested_msr(HV_MSR_SCONTROL, sctrl.as_uint64);
> + }
>
> return 0;
> }
> --
> 2.43.0
>
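The page-shift fix in this patch can be checked with plain arithmetic. A userspace C sketch; PAGE_SHIFT_64K and the two helpers are local names made up for this example, and only the value of HV_HYP_PAGE_SHIFT (12, i.e. 4K) comes from the kernel:

```c
#include <assert.h>
#include <stdint.h>

#define HV_HYP_PAGE_SHIFT 12	/* SynIC GPA fields are in 4K units */
#define PAGE_SHIFT_64K    16	/* PAGE_SHIFT on an arm64 64K-page kernel */

/* How the hypervisor interprets a GPA field: always 4K pages. */
static uint64_t hv_field_to_addr(uint64_t field)
{
	return field << HV_HYP_PAGE_SHIFT;
}

/* What the guest writes into the field, given the shift it uses. */
static uint64_t addr_to_field(uint64_t addr, unsigned int shift)
{
	return addr >> shift;
}
```

With PAGE_SHIFT = 16, a page at 0x80010000 would be registered as 0x8001, which the hypervisor reads as 0x8001000, a different page entirely. In the other direction, a field of 0x12345 would be memremap()'d at 0x123450000 instead of 0x12345000.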
* [PATCH] mshv: Simplify GPA map/unmap hypercall helpers
From: Stanislav Kinsburskii @ 2026-04-28 23:21 UTC
To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
Clean up hv_do_map_gpa_hcall() and hv_call_unmap_gpa_pages() after the
preceding bug-fix patches:
Move "done += completed" before the status checks so that pages mapped
by a partially-successful batch are included in the error cleanup unmap.
Previously these mappings were leaked on failure.
While here, improve type safety and readability:
- Change "int done" to "u64 done" to match the u64 page_count it is
compared against, avoiding signed/unsigned comparison hazards.
- Use u64 for loop iteration and batch size variables consistently.
- Add proper braces to the for-loop body in hv_do_map_gpa_hcall().
- Remove unnecessary "ret" variable from hv_call_unmap_gpa_pages().
- Simplify the error-path unmap to use "done << large_shift" directly
instead of mutating done in place.
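The accounting change can be seen with a fake hypercall that fails partway through a batch. A userspace C sketch; fake_map(), map_all() and BATCH are hypothetical, and the real code's deposit-and-retry path for hv_result_needs_memory() is not modelled:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BATCH 4		/* made-up stand-in for HV_MAP_GPA_BATCH_SIZE */

/* Fake hypervisor: succeeds for the first fail_after pages, then fails
 * partway through whichever batch crosses that limit. */
struct fake_hv {
	uint64_t fail_after;
	uint64_t mapped;
};

static uint64_t fake_map(struct fake_hv *hv, uint64_t reps, int *err)
{
	uint64_t n = reps;

	*err = 0;
	if (hv->mapped + reps > hv->fail_after) {
		n = hv->fail_after - hv->mapped;
		*err = -1;
	}
	hv->mapped += n;
	return n;
}

/* Returns how many pages the error path would unmap (0 on success).
 * count_first selects the patched ordering of "done += completed". */
static uint64_t map_all(struct fake_hv *hv, uint64_t page_count,
			bool count_first)
{
	uint64_t done = 0;
	int err = 0;

	while (done < page_count) {
		uint64_t remain = page_count - done;
		uint64_t rep = remain < BATCH ? remain : BATCH;
		uint64_t completed = fake_map(hv, rep, &err);

		if (count_first)
			done += completed;
		if (err)
			break;
		if (!count_first)
			done += completed;
	}
	return err ? done : 0;
}
```

With 10 pages and a failure after 6, the old ordering reports done = 4 and would unmap only 4 pages, leaving 2 mapped; counting first unmaps all 6.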
Fixes: 621191d709b14 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs")
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
drivers/hv/mshv_root_hv_call.c | 55 +++++++++++++++-------------------------
1 file changed, 20 insertions(+), 35 deletions(-)
diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
index e5992c324904a..f5f205a397834 100644
--- a/drivers/hv/mshv_root_hv_call.c
+++ b/drivers/hv/mshv_root_hv_call.c
@@ -195,8 +195,8 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
struct hv_input_map_gpa_pages *input_page;
u64 status, *pfnlist;
unsigned long irq_flags, large_shift = 0;
- int ret = 0, done = 0;
- u64 page_count = page_struct_count;
+ u64 done = 0, page_count = page_struct_count;
+ int ret = 0;
if (page_count == 0 || (pages && mmio_spa))
return -EINVAL;
@@ -213,8 +213,8 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
}
while (done < page_count) {
- ulong i, completed, remain = page_count - done;
- int rep_count = min(remain, HV_MAP_GPA_BATCH_SIZE);
+ u64 i, completed, remain = page_count - done;
+ u64 rep_count = min(remain, (u64)HV_MAP_GPA_BATCH_SIZE);
local_irq_save(irq_flags);
input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
@@ -224,23 +224,13 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
input_page->map_flags = flags;
pfnlist = input_page->source_gpa_page_list;
- for (i = 0; i < rep_count; i++)
- if (flags & HV_MAP_GPA_NO_ACCESS) {
+ for (i = 0; i < rep_count; i++) {
+ if (flags & HV_MAP_GPA_NO_ACCESS)
pfnlist[i] = 0;
- } else if (pages) {
- u64 index = (done + i) << large_shift;
-
- if (index >= page_struct_count) {
- ret = -EINVAL;
- break;
- }
- pfnlist[i] = page_to_pfn(pages[index]);
- } else {
+ else if (pages)
+ pfnlist[i] = page_to_pfn(pages[(done + i) << large_shift]);
+ else
pfnlist[i] = mmio_spa + done + i;
- }
- if (ret) {
- local_irq_restore(irq_flags);
- break;
}
status = hv_do_rep_hypercall(HVCALL_MAP_GPA_PAGES, rep_count, 0,
@@ -248,29 +238,26 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
local_irq_restore(irq_flags);
completed = hv_repcomp(status);
+ done += completed;
if (hv_result_needs_memory(status)) {
ret = hv_call_deposit_pages(NUMA_NO_NODE, partition_id,
HV_MAP_GPA_DEPOSIT_PAGES);
if (ret)
break;
-
} else if (!hv_result_success(status)) {
ret = hv_result_to_errno(status);
break;
}
-
- done += completed;
}
if (ret && done) {
u32 unmap_flags = 0;
- if (flags & HV_MAP_GPA_LARGE_PAGE) {
+ if (flags & HV_MAP_GPA_LARGE_PAGE)
unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
- done <<= large_shift;
- }
- hv_call_unmap_gpa_pages(partition_id, gfn, done, unmap_flags);
+ hv_call_unmap_gpa_pages(partition_id, gfn,
+ done << large_shift, unmap_flags);
}
return ret;
@@ -305,7 +292,7 @@ int hv_call_unmap_gpa_pages(u64 partition_id, u64 gfn, u64 page_count_4k,
struct hv_input_unmap_gpa_pages *input_page;
u64 status, page_count = page_count_4k;
unsigned long irq_flags, large_shift = 0;
- int ret = 0, done = 0;
+ u64 done = 0;
if (page_count == 0)
return -EINVAL;
@@ -319,8 +306,8 @@ int hv_call_unmap_gpa_pages(u64 partition_id, u64 gfn, u64 page_count_4k,
}
while (done < page_count) {
- ulong completed, remain = page_count - done;
- int rep_count = min(remain, HV_UMAP_GPA_PAGES);
+ u64 completed, remain = page_count - done;
+ u64 rep_count = min(remain, (u64)HV_UMAP_GPA_PAGES);
local_irq_save(irq_flags);
input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
@@ -333,15 +320,13 @@ int hv_call_unmap_gpa_pages(u64 partition_id, u64 gfn, u64 page_count_4k,
local_irq_restore(irq_flags);
completed = hv_repcomp(status);
- if (!hv_result_success(status)) {
- ret = hv_result_to_errno(status);
- break;
- }
-
done += completed;
+
+ if (!hv_result_success(status))
+ return hv_result_to_errno(status);
}
- return ret;
+ return 0;
}
int hv_call_get_gpa_access_states(u64 partition_id, u32 count, u64 gpa_base_pfn,
^ permalink raw reply related
* Re: [PATCH v2] mshv: Fix interrupt state corruption in hv_do_map_pfns error path
From: Stanislav Kinsburskii @ 2026-04-28 23:00 UTC (permalink / raw)
To: Michael Kelley
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, longli@microsoft.com,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <SN6PR02MB41578863BCE23D41B6A521B8D4372@SN6PR02MB4157.namprd02.prod.outlook.com>
On Tue, Apr 28, 2026 at 12:20:35AM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, April 27, 2026 7:44 AM
> >
> > Restore interrupt state before breaking out of the loop on error.
> >
> > The irq_flags are saved before entering the loop, but the early exit
> > path on error fails to restore them. This leaves interrupts in an
> > inconsistent state and can lead to lockdep warnings or other
> > interrupt-related issues.
> >
> > Fixes: 621191d709b14 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs")
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> > drivers/hv/mshv_root_hv_call.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> > index ab210a7fcb8c3..61291ec6f3468 100644
> > --- a/drivers/hv/mshv_root_hv_call.c
> > +++ b/drivers/hv/mshv_root_hv_call.c
> > @@ -229,8 +229,10 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
> > } else {
> > pfnlist[i] = mmio_spa + done + i;
> > }
> > - if (ret)
> > + if (ret) {
> > + local_irq_restore(irq_flags);
> > break;
> > + }
> >
>
> This looks good for fixing the immediate bug.
>
> But I'd note that this error path occurs solely based on the
> if (index >= page_struct_count) test in the preceding 'for' loop. That test is a
> "can't happen" sanity test that never triggers if hv_do_map_gpa_hcall()
> is coded correctly. At the beginning of the function there are validations of
> the input arguments, which is reasonable. But this sanity test isn't based
> on the input arguments, and it adds non-trivial complexity to the code
> because of the nested loops and the need to figure out where the two
> "break" statements go. I'd argue for dropping the sanity test entirely,
> along with this test of 'ret' and the need to restore the interrupt state.
>
Fair enough. Let me rework this function (and its unmap peer).
Thanks,
Stanislav
> Michael
^ permalink raw reply
* Re: [PATCH] mshv: Fix interrupt state corruption in hv_do_map_pfns error path
From: Stanislav Kinsburskii @ 2026-04-28 22:49 UTC (permalink / raw)
To: Anirudh Rayabharam
Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel
In-Reply-To: <20260424-merry-elfish-fossa-885586@anirudhrb>
On Fri, Apr 24, 2026 at 02:35:09PM +0000, Anirudh Rayabharam wrote:
> On Wed, Apr 22, 2026 at 12:15:28AM +0000, Stanislav Kinsburskii wrote:
> > Restore interrupt state before breaking out of the loop on error.
> >
> > The irq_flags are saved before entering the loop, but the early exit
> > path on error fails to restore them. This leaves interrupts in an
> > inconsistent state and can lead to lockdep warnings or other
> > interrupt-related issues.
> >
> > Fixes: 621191d709b14 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs")
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> > drivers/hv/mshv_root_hv_call.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> > index 7ed623668c8ec..6381f949d9d91 100644
> > --- a/drivers/hv/mshv_root_hv_call.c
> > +++ b/drivers/hv/mshv_root_hv_call.c
> > @@ -237,8 +237,10 @@ static int hv_do_map_pfns(u64 partition_id, u64 gfn, u64 pfns_count,
>
> Umm... I don't see this function in the hyperv-next tree at all.
>
Please see v2.
Thanks,
Stanislav
> Anirudh.
>
> > } else {
> > pfnlist[i] = mmio_spa + done + i;
> > }
> > - if (ret)
> > + if (ret) {
> > + local_irq_restore(irq_flags);
> > break;
> > + }
> >
> > status = hv_do_rep_hypercall(HVCALL_MAP_GPA_PAGES, rep_count, 0,
> > input_page, NULL);
> >
> >
^ permalink raw reply
* [PATCH] mshv: Add dedicated ioctl for GVA to GPA translation
From: Stanislav Kinsburskii @ 2026-04-28 22:48 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
Add an MSHV_TRANSLATE_GVA ioctl on the VP fd that wraps
HVCALL_TRANSLATE_VIRTUAL_ADDRESS_EX with transparent fault-in handling for
movable memory regions. The passthrough path for this hypercall is retained
for backward compatibility.
When guest-backing pages reside in movable memory regions, the mmu_notifier
invalidation path remaps them to NO_ACCESS in the hypervisor's second-level
address translation tables. If the VMM issues a GVA translation (e.g.
during MMIO emulation) while a page-table page is invalidated, the
hypervisor returns HV_TRANSLATE_GVA_GPA_NO_READ_ACCESS. The VMM cannot
resolve this on its own.
The new ioctl detects this transient GPA access failure, faults the page
back in via mshv_region_handle_gfn_fault(), and retries the translation
until it succeeds or an unrecoverable error occurs.
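The retry logic described above can be sketched as a user-space model. The hypervisor, the region lookup and mshv_region_handle_gfn_fault() are replaced by hypothetical stand-ins (sketch_translate(), sketch_fault_in()), 4 KiB pages are assumed, cond_resched() is omitted, and only the NO_READ_ACCESS code is treated as retryable, as mshv_gpa_fault_retryable() does in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12			/* assumed 4 KiB pages */
#define SKETCH_PAGE_MASK ((1ULL << SKETCH_PAGE_SHIFT) - 1)

/* Values taken from enum hv_translate_gva_result_code in the patch. */
enum sketch_result {
	SKETCH_GVA_SUCCESS = 0,
	SKETCH_GVA_GPA_NO_READ_ACCESS = 5,
};

/* Hypothetical guest state seen by the fake hypervisor. */
struct sketch_vm {
	bool pt_present;	/* false while the page-table page is NO_ACCESS */
	uint64_t pt_gfn;	/* GFN of that guest page-table page */
	uint64_t data_gfn;	/* GFN the GVA resolves to once readable */
	int fault_ins;		/* number of fault-ins performed */
};

/* Fake translation hypercall: reports the GFN it could not read, if any. */
static enum sketch_result sketch_translate(struct sketch_vm *vm, uint64_t *gfn)
{
	if (!vm->pt_present) {
		*gfn = vm->pt_gfn;
		return SKETCH_GVA_GPA_NO_READ_ACCESS;
	}
	*gfn = vm->data_gfn;
	return SKETCH_GVA_SUCCESS;
}

/* Fake fault-in, standing in for mshv_region_handle_gfn_fault(). */
static bool sketch_fault_in(struct sketch_vm *vm, uint64_t gfn)
{
	if (gfn != vm->pt_gfn)
		return false;	/* not backed by a movable region in this model */
	vm->pt_present = true;
	vm->fault_ins++;
	return true;
}

/* Returns the final result code, or -1 where the real ioctl returns -EFAULT. */
static int sketch_translate_gva(struct sketch_vm *vm, uint64_t gva, uint64_t *gpa)
{
	enum sketch_result res;
	uint64_t gfn;

	while ((res = sketch_translate(vm, &gfn)) == SKETCH_GVA_GPA_NO_READ_ACCESS) {
		if (!sketch_fault_in(vm, gfn))
			return -1;
	}

	/* Page frame from the translation, byte offset from the original GVA. */
	*gpa = (gfn << SKETCH_PAGE_SHIFT) | (gva & SKETCH_PAGE_MASK);
	return res;
}
```

Starting with the page-table page marked no-access, the first translation fails, one fault-in makes the page readable, and the retry returns GFN 0x1234; for GVA 0x7fff0abc the resulting GPA is 0x1234abc.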
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
drivers/hv/mshv_root.h | 3 ++
drivers/hv/mshv_root_hv_call.c | 37 +++++++++++++++++++++
drivers/hv/mshv_root_main.c | 69 ++++++++++++++++++++++++++++++++++++++++
include/hyperv/hvgdk_mini.h | 1 +
include/hyperv/hvhdk.h | 41 ++++++++++++++++++++++++
include/uapi/linux/mshv.h | 10 ++++++
6 files changed, 161 insertions(+)
diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index 1f086dcb7aa1a..2e6c4414740cc 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -290,6 +290,9 @@ int hv_call_delete_vp(u64 partition_id, u32 vp_index);
int hv_call_assert_virtual_interrupt(u64 partition_id, u32 vector,
u64 dest_addr,
union hv_interrupt_control control);
+int hv_call_translate_virtual_address_ex(u32 vp_index, u64 partition_id,
+ u64 flags, u64 gva, u64 *gfn,
+ struct hv_translate_gva_result_ex *result);
int hv_call_clear_virtual_interrupt(u64 partition_id);
int hv_call_get_gpa_access_states(u64 partition_id, u32 count, u64 gpa_base_pfn,
union hv_gpa_page_access_state_flags state_flags,
diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
index e5992c324904a..9ff4ba5373f59 100644
--- a/drivers/hv/mshv_root_hv_call.c
+++ b/drivers/hv/mshv_root_hv_call.c
@@ -692,6 +692,43 @@ int hv_call_get_partition_property_ex(u64 partition_id, u64 property_code,
return 0;
}
+int hv_call_translate_virtual_address_ex(u32 vp_index, u64 partition_id,
+ u64 flags, u64 gva, u64 *gfn,
+ struct hv_translate_gva_result_ex *result)
+{
+ struct hv_input_translate_virtual_address *input;
+ struct hv_output_translate_virtual_address_ex *output;
+ unsigned long irq_flags;
+ u64 status;
+
+ local_irq_save(irq_flags);
+
+ input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+ output = *this_cpu_ptr(hyperv_pcpu_output_arg);
+
+ memset(input, 0, sizeof(*input));
+ input->partition_id = partition_id;
+ input->vp_index = vp_index;
+ input->control_flags = flags;
+ input->gva_page = gva >> HV_HYP_PAGE_SHIFT;
+
+ status = hv_do_hypercall(HVCALL_TRANSLATE_VIRTUAL_ADDRESS_EX,
+ input, output);
+
+ if (!hv_result_success(status)) {
+ local_irq_restore(irq_flags);
+ pr_err("%s: %s\n", __func__, hv_result_to_string(status));
+ return hv_result_to_errno(status);
+ }
+
+ *result = output->translation_result;
+ *gfn = output->gpa_page;
+
+ local_irq_restore(irq_flags);
+
+ return 0;
+}
+
int
hv_call_clear_virtual_interrupt(u64 partition_id)
{
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index bd1359eb58dd4..2d7b6923415a8 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -898,6 +898,72 @@ mshv_vp_ioctl_get_set_state(struct mshv_vp *vp,
return 0;
}
+static bool mshv_gpa_fault_retryable(u32 result_code)
+{
+ /*
+ * Note: HV_TRANSLATE_GVA_GPA_UNMAPPED is intentionally not handled
+ * here. The guest page table cannot be unmapped under normal
+ * operation. It may be mapped with no access during page moves,
+ * but a truly unmapped state indicates a kernel driver bug.
+ * Retrying in this case would only mask the underlying problem of
+ * an unmapped guest page table.
+ */
+ return result_code == HV_TRANSLATE_GVA_GPA_NO_READ_ACCESS;
+}
+
+static long
+mshv_vp_ioctl_translate_gva(struct mshv_vp *vp, void __user *user_args)
+{
+ struct mshv_partition *partition = vp->vp_partition;
+ struct mshv_translate_gva args;
+ struct hv_translate_gva_result_ex result;
+ u64 gfn, gpa;
+ int ret;
+
+ if (copy_from_user(&args, user_args, sizeof(args)))
+ return -EFAULT;
+
+ do {
+ ret = hv_call_translate_virtual_address_ex(vp->vp_index,
+ partition->pt_id,
+ args.flags, args.gva,
+ &gfn, &result);
+ if (ret)
+ return ret;
+
+ if (mshv_gpa_fault_retryable(result.result_code)) {
+ struct mshv_mem_region *region;
+ bool faulted;
+
+ region = mshv_partition_region_by_gfn_get(partition,
+ gfn);
+ if (!region)
+ return -EFAULT;
+
+ faulted = false;
+ if (region->mreg_type == MSHV_REGION_TYPE_MEM_MOVABLE)
+ faulted = mshv_region_handle_gfn_fault(region,
+ gfn);
+ mshv_region_put(region);
+
+ if (!faulted)
+ return -EFAULT;
+
+ cond_resched();
+ }
+ } while (mshv_gpa_fault_retryable(result.result_code));
+
+ gpa = (gfn << PAGE_SHIFT) | (args.gva & ~PAGE_MASK);
+
+ if (copy_to_user(args.result, &result, sizeof(*args.result)))
+ return -EFAULT;
+
+ if (copy_to_user(args.gpa, &gpa, sizeof(*args.gpa)))
+ return -EFAULT;
+
+ return 0;
+}
+
static long
mshv_vp_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
{
@@ -917,6 +983,9 @@ mshv_vp_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
case MSHV_SET_VP_STATE:
r = mshv_vp_ioctl_get_set_state(vp, (void __user *)arg, true);
break;
+ case MSHV_TRANSLATE_GVA:
+ r = mshv_vp_ioctl_translate_gva(vp, (void __user *)arg);
+ break;
case MSHV_ROOT_HVCALL:
r = mshv_ioctl_passthru_hvcall(vp->vp_partition, false,
(void __user *)arg);
diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
index 6a4e8b9d570fd..ac901801fd397 100644
--- a/include/hyperv/hvgdk_mini.h
+++ b/include/hyperv/hvgdk_mini.h
@@ -484,6 +484,7 @@ union hv_vp_assist_msr_contents { /* HV_REGISTER_VP_ASSIST_PAGE */
#define HVCALL_CONNECT_PORT 0x0096
#define HVCALL_START_VP 0x0099
#define HVCALL_GET_VP_INDEX_FROM_APIC_ID 0x009a
+#define HVCALL_TRANSLATE_VIRTUAL_ADDRESS_EX 0x00ac
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
#define HVCALL_SIGNAL_EVENT_DIRECT 0x00c0
diff --git a/include/hyperv/hvhdk.h b/include/hyperv/hvhdk.h
index 5e83d37149662..08eede666762e 100644
--- a/include/hyperv/hvhdk.h
+++ b/include/hyperv/hvhdk.h
@@ -952,4 +952,45 @@ struct hv_input_modify_sparse_spa_page_host_access {
#define HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE 0x4
#define HV_MODIFY_SPA_PAGE_HOST_ACCESS_HUGE_PAGE 0x8
+enum hv_translate_gva_result_code {
+ HV_TRANSLATE_GVA_SUCCESS = 0,
+
+ /* Translation failures */
+ HV_TRANSLATE_GVA_PAGE_NOT_PRESENT = 1,
+ HV_TRANSLATE_GVA_PRIVILEGE_VIOLATION = 2,
+ HV_TRANSLATE_GVA_INVALID_PAGE_TABLE_FLAGS = 3,
+
+ /* GPA access failures */
+ HV_TRANSLATE_GVA_GPA_UNMAPPED = 4,
+ HV_TRANSLATE_GVA_GPA_NO_READ_ACCESS = 5,
+ HV_TRANSLATE_GVA_GPA_NO_WRITE_ACCESS = 6,
+ HV_TRANSLATE_GVA_GPA_ILLEGAL_OVERLAY_ACCESS = 7,
+
+ HV_TRANSLATE_GVA_INTERCEPT = 8,
+ HV_TRANSLATE_GVA_GPA_UNACCEPTED = 9,
+};
+
+struct hv_input_translate_virtual_address {
+ u64 partition_id;
+ u32 vp_index;
+ u32 padding;
+ u64 control_flags;
+ u64 gva_page;
+} __packed;
+
+struct hv_translate_gva_result_ex {
+ u32 result_code; /* enum hv_translate_gva_result_code */
+ u32 cache_type : 8;
+ u32 overlay_page : 1;
+ u32 reserved : 23;
+#if IS_ENABLED(CONFIG_X86)
+ char event_info[40]; /* HV_X64_PENDING_EVENT */
+#endif
+} __packed;
+
+struct hv_output_translate_virtual_address_ex {
+ struct hv_translate_gva_result_ex translation_result;
+ u64 gpa_page;
+} __packed;
+
#endif /* _HV_HVHDK_H */
diff --git a/include/uapi/linux/mshv.h b/include/uapi/linux/mshv.h
index 32ff92b6342b2..29892013a4752 100644
--- a/include/uapi/linux/mshv.h
+++ b/include/uapi/linux/mshv.h
@@ -318,6 +318,16 @@ struct mshv_get_set_vp_state {
#define MSHV_RUN_VP _IOR(MSHV_IOCTL, 0x00, struct mshv_run_vp)
#define MSHV_GET_VP_STATE _IOWR(MSHV_IOCTL, 0x01, struct mshv_get_set_vp_state)
#define MSHV_SET_VP_STATE _IOWR(MSHV_IOCTL, 0x02, struct mshv_get_set_vp_state)
+
+struct mshv_translate_gva {
+ __u64 gva;
+ __u64 flags;
+ enum hv_translate_gva_result_code *result;
+ __u64 *gpa;
+};
+
+#define MSHV_TRANSLATE_GVA _IOWR(MSHV_IOCTL, 0xF2, struct mshv_translate_gva)
+
/*
* Generic hypercall
* Defined above in partition IOCTLs, avoid redefining it here
^ permalink raw reply related
* RE: [PATCH] hv_sock: fix ARM64 support
From: Dexuan Cui @ 2026-04-28 21:24 UTC (permalink / raw)
To: Hamza Mahfooz, netdev@vger.kernel.org
Cc: KY Srinivasan, Haiyang Zhang, Wei Liu, Long Li,
Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Michael Kelley, Himadri Pandya,
linux-hyperv@vger.kernel.org, virtualization@lists.linux.dev,
linux-kernel@vger.kernel.org
In-Reply-To: <20260428125339.13963-1-hamzamahfooz@linux.microsoft.com>
> From: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
> Sent: Tuesday, April 28, 2026 5:54 AM
> Subject: [PATCH] hv_sock: fix ARM64 support
Typically, for a change to net/, you'd want to add "net" or "net-next"
after "PATCH", e.g.
[PATCH net]
or
[PATCH net v2]
See "Documentation/process/maintainer-netdev.rst"
^ permalink raw reply
* RE: [PATCH 2/2] drm/hyperv: use VMBUS_RING_SIZE()
From: Dexuan Cui @ 2026-04-28 21:07 UTC (permalink / raw)
To: Hamza Mahfooz, linux-kernel@vger.kernel.org
Cc: KY Srinivasan, Haiyang Zhang, Wei Liu, Long Li,
Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Himadri Pandya, Michael Kelley,
linux-hyperv@vger.kernel.org, virtualization@lists.linux.dev,
netdev@vger.kernel.org, Saurabh Sengar, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Deepak Rawat, dri-devel@lists.freedesktop.org,
stable@kernel.vger.org
In-Reply-To: <20260425181719.1538483-2-hamzamahfooz@linux.microsoft.com>
> From: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
> Sent: Saturday, April 25, 2026 11:17 AM
>
> Cc: stable@kernel.vger.org
I think this should be
Cc: stable@vger.kernel.org
^ permalink raw reply
* RE: [PATCH 2/2] drm/hyperv: use VMBUS_RING_SIZE()
From: Dexuan Cui @ 2026-04-28 21:05 UTC (permalink / raw)
To: Michael Kelley, Hamza Mahfooz, Saurabh Singh Sengar
Cc: linux-kernel@vger.kernel.org, KY Srinivasan, Haiyang Zhang,
Wei Liu, Long Li, Stefano Garzarella, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Himadri Pandya, linux-hyperv@vger.kernel.org,
virtualization@lists.linux.dev, netdev@vger.kernel.org,
Saurabh Sengar, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Deepak Rawat,
dri-devel@lists.freedesktop.org, stable@kernel.vger.org
In-Reply-To: <SN6PR02MB41571A5B77A5FDDFE17AEF19D4362@SN6PR02MB4157.namprd02.prod.outlook.com>
> From: Michael Kelley <mhklinux@outlook.com>
> Sent: Monday, April 27, 2026 12:06 PM
> > IMO the Fixes tag is unnecessary because the existing VMBUS_RING_BUFSIZE
> > is 256KB, which is already aligned to 4KB, 16KB and 64KB.
> >
> > VMBUS_RING_SIZE(256 * 1024) is still 256KB.
>
> Not always. If PAGE_SIZE is 64KiB, VMBUS_RING_SIZE(256 * 1024) is
> 320KiB. If PAGE_SIZE is 16KiB or 4KiB, then VMBUS_RING_SIZE(256 * 1024)
> is indeed 256 KiB. See the explanation in the comment for
> VMBUS_RING_SIZE.
>
> Michael
Thanks for correcting me!
I didn't realize that sizeof(struct hv_ring_buffer) is based on
PAGE_SIZE, not on HV_HYP_PAGE_SIZE.
However, it looks like the Fixes tag is still not needed:
- Without the patch, we always pass two arguments of 256KB to
  vmbus_open().
- With the patch, we still pass 256KB to vmbus_open() when
  PAGE_SIZE is 4KB or 16KB, and we pass 320KB when PAGE_SIZE is 64KB.
Both 320KB and 256KB are multiples of PAGE_SIZE, so
vmbus_open() -> vmbus_alloc_ring() doesn't return -EINVAL.
In the case of PAGE_SIZE=64KB, it's OK to pass 256KB to vmbus_open()
here since the hyperv-drm driver doesn't really have to use a slightly
bigger VMBus ringbuffer size.
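The arithmetic above can be checked with a small model of VMBUS_RING_SIZE(). The sizing rule in it is an assumption based on the macro comment Michael refers to, not something stated in this thread: the header (sizeof(struct hv_ring_buffer)) is taken to be exactly one PAGE_SIZE page, it is added to the payload only when the payload is smaller than 8 headers, and the sum is rounded up to PAGE_SIZE. The function names are hypothetical.

```c
#include <stdint.h>

/* Round sz up to a multiple of page_size (a power of two). */
static uint64_t page_align(uint64_t sz, uint64_t page_size)
{
	return (sz + page_size - 1) & ~(page_size - 1);
}

/* Assumed model of VMBUS_RING_SIZE(payload) for a given PAGE_SIZE. */
static uint64_t vmbus_ring_size_model(uint64_t payload, uint64_t page_size)
{
	uint64_t hdr = page_size;	/* assumption: header is one page */
	uint64_t adj = payload >= 8 * hdr ? 0 : hdr;

	return page_align(adj + payload, page_size);
}

/* The condition vmbus_alloc_ring() is described as enforcing above. */
static int size_ok_for_alloc(uint64_t sz, uint64_t page_size)
{
	return sz % page_size == 0;
}
```

With a 256 KiB payload this gives 256 KiB for 4 KiB and 16 KiB pages and 320 KiB for 64 KiB pages, matching the figures quoted above, and every result is a multiple of its page size.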
Thanks,
Dexuan
^ permalink raw reply
* RE: [EXTERNAL] [PATCH rc 06/15] RDMA/mana: Fix mana_destroy_wq_obj() cleanup in mana_ib_create_qp_rss()
From: Long Li @ 2026-04-28 17:55 UTC (permalink / raw)
To: Jason Gunthorpe, Andrew Lunn,
Broadcom internal kernel review list, Bryan Tan, Eric Dumazet,
Junxian Huang, Konstantin Taranov, Jakub Kicinski,
Leon Romanovsky, linux-hyperv@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org, Paolo Abeni,
Selvin Xavier, Chengchang Tang, Tariq Toukan, Vishnu Dasa,
Yishai Hadas
Cc: Abhijit Gangurde, Adit Ranadive, Allen Hubbe, Andrew Boyer,
Aditya Sarwade, Brad Spengler, Bryan Tan, David S. Miller,
Dexuan Cui, Doug Ledford, George Zhang, Jorgen Hansen, Jianbo Liu,
Kai Aizen, Leon Romanovsky, Leon Romanovsky, Yixian Liu, Lijun Ou,
Parav Pandit, patches@lists.linux.dev, Roland Dreier,
Roland Dreier, Sagi Grimberg, Ajay Sharma, stable@vger.kernel.org,
Tariq Toukan, Wei Hu (Xavier), Shaobo Xu, Nenglong Zhao
In-Reply-To: <6-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com>
>
> Sashiko points out there are two bugs here in the error unwind flow, both related
> to how the WQ table is unwound.
>
> First, there is a double i-- on the first failure path because the while loop
> already has an i--; remove it.
>
> Second, if mana_ib_install_cq_cb() fails then mana_create_wq_obj() is not undone,
> because the while loop's i-- starts the cleanup at the previous index.
>
> Cc: stable@vger.kernel.org
> Fixes: c15d7802a424 ("RDMA/mana_ib: Add CQ interrupt support for RAW QP")
> Link: https://sashiko.dev/#/patchset/0-v2-1c49eeb88c48%2B91-rdma_udata_rep_jgg%40nvidia.com?part=1
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Long Li <longli@microsoft.com>
> ---
> drivers/infiniband/hw/mana/qp.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
> index f7bb0d1f0f8034..8e1f052d0ec976 100644
> --- a/drivers/infiniband/hw/mana/qp.c
> +++ b/drivers/infiniband/hw/mana/qp.c
> @@ -176,11 +176,8 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
>
> ret = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_RQ,
> &wq_spec, &cq_spec, &wq->rx_object);
> - if (ret) {
> - /* Do cleanup starting with index i-1 */
> - i--;
> + if (ret)
> goto fail;
> - }
>
> /* The GDMA regions are now owned by the WQ object */
> wq->queue.gdma_region = GDMA_INVALID_DMA_REGION;
> @@ -200,8 +197,10 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
>
> /* Create CQ table entry */
> ret = mana_ib_install_cq_cb(mdev, cq);
> - if (ret)
> + if (ret) {
> + mana_destroy_wq_obj(mpc, GDMA_RQ, wq->rx_object);
> goto fail;
> + }
> }
> resp.num_entries = i;
>
> --
> 2.43.0
^ permalink raw reply
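The two unwind problems fixed by patch 06 can be reproduced with a toy model (hypothetical names, a boolean array in place of WQ objects): the `while (i-- > 0)` loop already begins at index i-1, so an extra `i--` before `goto fail` skips entry i-1, and an object created at index i must be destroyed explicitly because the loop never visits it.

```c
#include <stdbool.h>

#define SKETCH_NWQ 4

/* created[i] models "WQ object i exists"; hypothetical, not the mana driver. */
static bool created[SKETCH_NWQ];

static void reset_wqs(void)
{
	for (int i = 0; i < SKETCH_NWQ; i++)
		created[i] = false;
}

static int count_leaked(void)
{
	int n = 0;

	for (int i = 0; i < SKETCH_NWQ; i++)
		n += created[i];
	return n;
}

/*
 * Create objects 0..SKETCH_NWQ-1. Creating index fail_create fails, or
 * creating index fail_install succeeds but the following step fails
 * (pass -1 to disable either). old_code selects the pre-patch behaviour.
 */
static int create_wqs(int fail_create, int fail_install, bool old_code)
{
	int i;

	for (i = 0; i < SKETCH_NWQ; i++) {
		if (i == fail_create) {
			if (old_code)
				i--;	/* pre-patch: "cleanup starting with index i-1" */
			goto fail;
		}
		created[i] = true;		/* like mana_create_wq_obj() succeeding */

		if (i == fail_install) {	/* like mana_ib_install_cq_cb() failing */
			if (!old_code)
				created[i] = false;	/* patched: destroy object i here */
			goto fail;
		}
	}
	return 0;

fail:
	/* Visits i-1 down to 0; index i itself is never undone here. */
	while (i-- > 0)
		created[i] = false;
	return -1;
}
```

With old_code set, a creation failure at index 2 leaves object 1 allocated and an install failure at index 2 leaves object 2 allocated; the patched flow cleans up completely in both cases.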
* RE: [EXTERNAL] [PATCH rc 07/15] RDMA/mana: Fix error unwind in mana_ib_create_qp_rss()
From: Long Li @ 2026-04-28 17:53 UTC (permalink / raw)
To: Jason Gunthorpe, Andrew Lunn,
Broadcom internal kernel review list, Bryan Tan, Eric Dumazet,
Junxian Huang, Konstantin Taranov, Jakub Kicinski,
Leon Romanovsky, linux-hyperv@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org, Paolo Abeni,
Selvin Xavier, Chengchang Tang, Tariq Toukan, Vishnu Dasa,
Yishai Hadas
Cc: Abhijit Gangurde, Adit Ranadive, Allen Hubbe, Andrew Boyer,
Aditya Sarwade, Brad Spengler, Bryan Tan, David S. Miller,
Dexuan Cui, Doug Ledford, George Zhang, Jorgen Hansen, Jianbo Liu,
Kai Aizen, Leon Romanovsky, Leon Romanovsky, Yixian Liu, Lijun Ou,
Parav Pandit, patches@lists.linux.dev, Roland Dreier,
Roland Dreier, Sagi Grimberg, Ajay Sharma, stable@vger.kernel.org,
Tariq Toukan, Wei Hu (Xavier), Shaobo Xu, Nenglong Zhao
In-Reply-To: <7-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com>
>
> Sashiko points out that mana_ib_cfg_vport_steering() is leaked on the error
> path; the normal destroy path cleans it up.
>
> Cc: stable@vger.kernel.org
> Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter")
> Link: https://sashiko.dev/#/patchset/0-v1-e911b76a94d1%2B65d95-rdma_udata_rep_jgg%40nvidia.com?part=4
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Long Li <longli@microsoft.com>
> ---
> drivers/infiniband/hw/mana/qp.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
> index 8e1f052d0ec976..0fbcf449c134b5 100644
> --- a/drivers/infiniband/hw/mana/qp.c
> +++ b/drivers/infiniband/hw/mana/qp.c
> @@ -217,13 +217,15 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
> ibdev_dbg(&mdev->ib_dev,
> "Failed to copy to udata create rss-qp, %d\n",
> ret);
> - goto fail;
> + goto err_disable_vport_rx;
> }
>
> kfree(mana_ind_table);
>
> return 0;
>
> +err_disable_vport_rx:
> + mana_disable_vport_rx(mpc);
> fail:
> while (i-- > 0) {
> ibwq = ind_tbl->ind_tbl[i];
> --
> 2.43.0
^ permalink raw reply
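The fix in patch 07 follows the usual kernel goto-unwind ladder: each label undoes one step and falls through to the labels for earlier steps, so a failure jumps to the label matching the last step that succeeded. A minimal model, with hypothetical names and integer flags in place of the WQ objects and the vport steering state:

```c
/* Hypothetical state: 1 means the resource is currently set up. */
struct sketch_state {
	int wqs_created;
	int vport_rx_enabled;
};

static int sketch_create(struct sketch_state *s, int fail_steering, int fail_copy)
{
	int ret = -1;

	s->wqs_created = 1;			/* step 1: create the WQ objects */

	if (fail_steering)
		goto fail;			/* step 2 failed: only step 1 to undo */
	s->vport_rx_enabled = 1;		/* step 2: enable vport steering */

	if (fail_copy)
		goto err_disable_vport_rx;	/* step 3 failed: undo steps 2 and 1 */
	return 0;

err_disable_vport_rx:
	s->vport_rx_enabled = 0;		/* undo step 2, then fall through */
fail:
	s->wqs_created = 0;			/* undo step 1 */
	return ret;
}
```

Jumping straight to fail from step 3, as the pre-patch code did, would leave vport_rx_enabled set.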
* RE: [EXTERNAL] [PATCH rc 04/15] RDMA/mana: Validate rx_hash_key_len
From: Long Li @ 2026-04-28 17:50 UTC (permalink / raw)
To: Jason Gunthorpe, Andrew Lunn,
Broadcom internal kernel review list, Bryan Tan, Eric Dumazet,
Junxian Huang, Konstantin Taranov, Jakub Kicinski,
Leon Romanovsky, linux-hyperv@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org, Paolo Abeni,
Selvin Xavier, Chengchang Tang, Tariq Toukan, Vishnu Dasa,
Yishai Hadas
Cc: Abhijit Gangurde, Adit Ranadive, Allen Hubbe, Andrew Boyer,
Aditya Sarwade, Brad Spengler, Bryan Tan, David S. Miller,
Dexuan Cui, Doug Ledford, George Zhang, Jorgen Hansen, Jianbo Liu,
Kai Aizen, Leon Romanovsky, Leon Romanovsky, Yixian Liu, Lijun Ou,
Parav Pandit, patches@lists.linux.dev, Roland Dreier,
Roland Dreier, Sagi Grimberg, Ajay Sharma, stable@vger.kernel.org,
Tariq Toukan, Wei Hu (Xavier), Shaobo Xu, Nenglong Zhao
In-Reply-To: <4-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com>
>
> Sashiko points out that rx_hash_key_len comes from a uAPI structure and is
> blindly passed to memcpy, allowing the userspace to trash kernel memory.
> Bounds check it so the memcpy cannot overflow.
>
> Cc: stable@vger.kernel.org
> Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter")
> Link: https://sashiko.dev/#/patchset/0-v2-1c49eeb88c48%2B91-rdma_udata_rep_jgg%40nvidia.com?part=1
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Long Li <longli@microsoft.com>
> ---
> drivers/infiniband/hw/mana/qp.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
> index 645581359cee0b..f7bb0d1f0f8034 100644
> --- a/drivers/infiniband/hw/mana/qp.c
> +++ b/drivers/infiniband/hw/mana/qp.c
> @@ -21,6 +21,9 @@ static int mana_ib_cfg_vport_steering(struct mana_ib_dev *dev,
>
> gc = mdev_to_gc(dev);
>
> + if (rx_hash_key_len > sizeof(req->hashkey))
> + return -EINVAL;
> +
> req_buf_size = struct_size(req, indir_tab,
> MANA_INDIRECT_TABLE_DEF_SIZE);
> req = kzalloc(req_buf_size, GFP_KERNEL);
> if (!req)
> --
> 2.43.0
^ permalink raw reply
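The check in patch 04 is the standard pattern for a user-controlled length: compare it with the destination size before memcpy(). A standalone sketch; the 40-byte key size and all names are assumptions for illustration, the real limit being sizeof(req->hashkey):

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_HASH_KEY_SIZE 40	/* hypothetical size */

struct sketch_req {
	uint8_t hashkey[SKETCH_HASH_KEY_SIZE];
};

static int sketch_set_hash_key(struct sketch_req *req, const uint8_t *key,
			       uint32_t key_len)
{
	/* key_len comes from userspace: reject it before it reaches memcpy(). */
	if (key_len > sizeof(req->hashkey))
		return -EINVAL;
	memcpy(req->hashkey, key, key_len);
	return 0;
}
```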
* RE: [EXTERNAL] [PATCH rc 05/15] RDMA/mana: Remove user triggerable WARN_ON() in mana_ib_create_qp_rss()
From: Long Li @ 2026-04-28 17:43 UTC (permalink / raw)
To: Jason Gunthorpe, Andrew Lunn,
Broadcom internal kernel review list, Bryan Tan, Eric Dumazet,
Junxian Huang, Konstantin Taranov, Jakub Kicinski,
Leon Romanovsky, linux-hyperv@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org, Paolo Abeni,
Selvin Xavier, Chengchang Tang, Tariq Toukan, Vishnu Dasa,
Yishai Hadas
Cc: Abhijit Gangurde, Adit Ranadive, Allen Hubbe, Andrew Boyer,
Aditya Sarwade, Brad Spengler, Bryan Tan, David S. Miller,
Dexuan Cui, Doug Ledford, George Zhang, Jorgen Hansen, Jianbo Liu,
Kai Aizen, Leon Romanovsky, Leon Romanovsky, Yixian Liu, Lijun Ou,
Parav Pandit, patches@lists.linux.dev, Roland Dreier,
Roland Dreier, Sagi Grimberg, Ajay Sharma, stable@vger.kernel.org,
Tariq Toukan, Wei Hu (Xavier), Shaobo Xu, Nenglong Zhao
In-Reply-To: <5-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com>
>
> Sashiko points out that the user can specify WQs sharing the same CQ as a part of
> the uAPI and this will trigger the WARN_ON() then go on to corrupt the kernel.
>
> Just reject it outright and fail the QP creation.
>
> Cc: stable@vger.kernel.org
> Fixes: c15d7802a424 ("RDMA/mana_ib: Add CQ interrupt support for RAW QP")
> Link: https://sashiko.dev/#/patchset/0-v2-1c49eeb88c48%2B91-rdma_udata_rep_jgg%40nvidia.com?part=1
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Long Li <longli@microsoft.com>
> ---
> drivers/infiniband/hw/mana/cq.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mana/cq.c b/drivers/infiniband/hw/mana/cq.c
> index f4cbe21763bf11..2d682428ef202a 100644
> --- a/drivers/infiniband/hw/mana/cq.c
> +++ b/drivers/infiniband/hw/mana/cq.c
> @@ -137,8 +137,9 @@ int mana_ib_install_cq_cb(struct mana_ib_dev *mdev, struct mana_ib_cq *cq)
>
> if (cq->queue.id >= gc->max_num_cqs)
> return -EINVAL;
> - /* Create CQ table entry */
> - WARN_ON(gc->cq_table[cq->queue.id]);
> + /* Create CQ table entry, sharing a CQ between WQs is not supported */
> + if (gc->cq_table[cq->queue.id])
> + return -EINVAL;
> if (cq->queue.kmem)
> gdma_cq = cq->queue.kmem;
> else
> --
> 2.43.0
^ permalink raw reply
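A sketch of the behaviour after patch 05, using a hypothetical fixed-size table: a second WQ that names an already-installed CQ is now rejected with -EINVAL instead of triggering WARN_ON() and overwriting the slot. The table size and names are made up.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_CQS 8

struct sketch_cq {
	unsigned int id;
};

static struct sketch_cq *cq_table[SKETCH_MAX_CQS];

/* Range check, then refuse an occupied slot instead of overwriting it. */
static int sketch_install_cq(struct sketch_cq *cq)
{
	if (cq->id >= SKETCH_MAX_CQS)
		return -EINVAL;
	if (cq_table[cq->id] != NULL)	/* CQ already used by another WQ */
		return -EINVAL;
	cq_table[cq->id] = cq;
	return 0;
}
```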
* Re: [PATCH V1 08/13] PCI: hv: rename hv_compose_msi_msg to hv_vmbus_compose_msi_msg
From: Mukesh R @ 2026-04-28 17:37 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: hpa, robin.murphy, robh, wei.liu, mhklinux, muislam, namjain,
magnuskulke, anbelski, linux-kernel, linux-hyperv, iommu,
linux-pci, linux-arch, kys, haiyangz, decui, longli, tglx, mingo,
bp, dave.hansen, x86, joro, will, lpieralisi, kwilczynski,
bhelgaas, arnd
In-Reply-To: <20260428171451.GA233136@bhelgaas>
On 4/28/26 10:14, Bjorn Helgaas wrote:
> On Mon, Apr 27, 2026 at 07:22:12PM -0700, Mukesh R wrote:
>> On 4/27/26 09:31, Bjorn Helgaas wrote:
>>> On Tue, Apr 21, 2026 at 07:32:34PM -0700, Mukesh R wrote:
>>>> Main change here is to rename hv_compose_msi_msg to
>>>> hv_vmbus_compose_msi_msg as we introduce hv_compose_msi_msg in upcoming
>>>> patches that builds MSI messages for both VMBus and non-VMBus cases. VMBus
>>>> is not used on baremetal root partition for example. While at it, replace
>>>> spaces with tabs and fix some formatting involving excessive line wraps.
>>>
>>> Would be better to do the whitespace changes in their own patch,
>>> although several of them should just be dropped (see below).
>
>>>> - * facilities. For instance, the configuration space of a function exposed
>>>> + * facilities. For instance, the configuration space of a function exposed
>>>
>>> Oops, this hunk made it worse. Definitely don't want a tab there.
>
>>>> - * The vector we select here is a dummy value. The correct
>>>> + * The vector we select here is a dummy value. The correct
>>>
>>> Another tab that should be a space. Actually, you should just drop
>>> this hunk; the rest of the comment has two spaces after periods, so
>>> this should too.
>>
>> Well, most of our files do a global replace of 8 spaces with tabs, so
>> comments are well indented everywhere. Since checkpatch doesn't complain
>> about tabs on comment lines, may I assume it is not a strict requirement
>> and more a nit or personal preference?
>
> I guess I didn't make it clear. I'm not complaining about leading
> tabs; I'm pointing out that the comments should not have embedded tabs
> in the middle between a period and the first word of the next
> sentence.
Oh, my bad, sorry, I didn't realize that. Thank you, I will definitely get rid
of them (they most likely resulted from the vim %retab! command).
Thanks,
-Mukesh
> Here's what it looks like with "git show | cat -T":
>
> - * facilities. For instance, the configuration space of a function exposed
> + * facilities.^IFor instance, the configuration space of a function exposed
> ^^
>
> -^I^I * The vector we select here is a dummy value. The correct
> +^I^I * The vector we select here is a dummy value.^IThe correct
> ^^
>
> -^I^I * freed while we dereference the ring buffer pointer. Test
> +^I^I * freed while we dereference the ring buffer pointer.^ITest
> ^^
>
> -^I * to be overlapped by those children. Set the flag on this claim
> +^I * to be overlapped by those children.^ISet the flag on this claim
> ^^
>
> None of these hunks should be here. Maybe some automation went wrong?
>
> In any case, every hunk of a patch that does "rename
> hv_compose_msi_msg to hv_vmbus_compose_msi_msg" should contain those
> names. Any whitespace changes should be in their own patch so they
> don't make it hard to review the rename.
^ permalink raw reply
* Re: [PATCH] hv_sock: fix ARM64 support
From: Hamza Mahfooz @ 2026-04-28 17:15 UTC (permalink / raw)
To: Stefano Garzarella
Cc: netdev, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Long Li, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Michael Kelley, Himadri Pandya,
linux-hyperv, virtualization, linux-kernel
In-Reply-To: <afCxfHKA7hJilGM3@sgarzare-redhat>
On Tue, Apr 28, 2026 at 03:12:40PM +0200, Stefano Garzarella wrote:
> No version number in the subject?
I generally only increment the version number if I make changes to the code
itself. I did try to change the subject prefix to "PATCH RESEND", but it
appears something is off about my config that prevented it from going
through.
>
> Please next time follow
> https://docs.kernel.org/process/submitting-patches.html#subject-line
>
> Common tags might include a version descriptor if the multiple versions
> of the patch have been sent out in response to comments (i.e., “v1, v2,
> v3”), or “RFC” to indicate a request for comments.
>
> On Tue, Apr 28, 2026 at 08:53:39AM -0400, Hamza Mahfooz wrote:
> > VMBUS ring buffers must be page aligned. Therefore, the current value of
> > 24K presents a challenge on ARM64 kernels (with 64K pages). So, use
> > VMBUS_RING_SIZE() to ensure they are always aligned and large enough to
> > hold all of the relevant data.
> >
> > Cc: stable@vger.kernel.org
> > Fixes: 77ffe33363c0 ("hv_sock: use HV_HYP_PAGE_SIZE for Hyper-V communication")
> > Tested-by: Dexuan Cui <decui@microsoft.com>
> > Reviewed-by: Dexuan Cui <decui@microsoft.com>
> > Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
> > ---
> > net/vmw_vsock/hyperv_transport.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
>
> Acked-by: Stefano Garzarella <sgarzare@redhat.com>
>
> >
> > diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> > index 069386a74557..40f09b23efa3 100644
> > --- a/net/vmw_vsock/hyperv_transport.c
> > +++ b/net/vmw_vsock/hyperv_transport.c
> > @@ -375,10 +375,10 @@ static void hvs_open_connection(struct vmbus_channel *chan)
> > } else {
> > sndbuf = max_t(int, sk->sk_sndbuf, RINGBUFFER_HVS_SND_SIZE);
> > sndbuf = min_t(int, sndbuf, RINGBUFFER_HVS_MAX_SIZE);
> > - sndbuf = ALIGN(sndbuf, HV_HYP_PAGE_SIZE);
> > + sndbuf = VMBUS_RING_SIZE(sndbuf);
> > rcvbuf = max_t(int, sk->sk_rcvbuf, RINGBUFFER_HVS_RCV_SIZE);
> > rcvbuf = min_t(int, rcvbuf, RINGBUFFER_HVS_MAX_SIZE);
> > - rcvbuf = ALIGN(rcvbuf, HV_HYP_PAGE_SIZE);
> > + rcvbuf = VMBUS_RING_SIZE(rcvbuf);
> > }
> >
> > chan->max_pkt_size = HVS_MAX_PKT_SIZE;
> > --
> > 2.54.0
> >
>
^ permalink raw reply
* Re: [PATCH V1 08/13] PCI: hv: rename hv_compose_msi_msg to hv_vmbus_compose_msi_msg
From: Bjorn Helgaas @ 2026-04-28 17:14 UTC (permalink / raw)
To: Mukesh R
Cc: hpa, robin.murphy, robh, wei.liu, mhklinux, muislam, namjain,
magnuskulke, anbelski, linux-kernel, linux-hyperv, iommu,
linux-pci, linux-arch, kys, haiyangz, decui, longli, tglx, mingo,
bp, dave.hansen, x86, joro, will, lpieralisi, kwilczynski,
bhelgaas, arnd
In-Reply-To: <43f41598-ee90-eb2f-1877-da6d1687322e@linux.microsoft.com>
On Mon, Apr 27, 2026 at 07:22:12PM -0700, Mukesh R wrote:
> On 4/27/26 09:31, Bjorn Helgaas wrote:
> > On Tue, Apr 21, 2026 at 07:32:34PM -0700, Mukesh R wrote:
> > > Main change here is to rename hv_compose_msi_msg to
> > > hv_vmbus_compose_msi_msg as we introduce hv_compose_msi_msg in upcoming
> > > patches that build MSI messages for both VMBus and non-VMBus cases. VMBus
> > > is not used on the bare-metal root partition, for example. While at it, replace
> > > spaces with tabs and fix some formatting involving excessive line wraps.
> >
> > Would be better to do the whitespace changes in their own patch,
> > although several of them should just be dropped (see below).
> > > - * facilities. For instance, the configuration space of a function exposed
> > > + * facilities. For instance, the configuration space of a function exposed
> >
> > Oops, this hunk made it worse. Definitely don't want a tab there.
> > > - * The vector we select here is a dummy value. The correct
> > > + * The vector we select here is a dummy value. The correct
> >
> > Another tab that should be a space. Actually, you should just drop
> > this hunk; the rest of the comment has two spaces after periods, so
> > this should too.
>
> Well, most of our files do a global replace of 8 spaces with tabs, so
> comments are well indented everywhere. Since checkpatch doesn't complain
> about tabs on comment lines, may I assume it is not a strict requirement
> and more of a nit or personal preference?
I guess I didn't make it clear. I'm not complaining about leading
tabs; I'm pointing out that the comments should not have embedded tabs
in the middle between a period and the first word of the next
sentence.
Here's what it looks like with "git show | cat -T":
- * facilities. For instance, the configuration space of a function exposed
+ * facilities.^IFor instance, the configuration space of a function exposed
^^
-^I^I * The vector we select here is a dummy value. The correct
+^I^I * The vector we select here is a dummy value.^IThe correct
^^
-^I^I * freed while we dereference the ring buffer pointer. Test
+^I^I * freed while we dereference the ring buffer pointer.^ITest
^^
-^I * to be overlapped by those children. Set the flag on this claim
+^I * to be overlapped by those children.^ISet the flag on this claim
^^
None of these hunks should be here. Maybe some automation went wrong?
In any case, every hunk of a patch that does "rename
hv_compose_msi_msg to hv_vmbus_compose_msi_msg" should contain those
names. Any whitespace changes should be in their own patch so they
don't make it hard to review the rename.
^ permalink raw reply
* [PATCH] hv: utils: replace deprecated strcpy with strscpy in kvp_register
From: Thorsten Blum @ 2026-04-28 17:11 UTC (permalink / raw)
To: K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li
Cc: Thorsten Blum, linux-hyperv, linux-kernel
strcpy() has been deprecated [1] because it performs no bounds checking
on the destination buffer, which can lead to buffer overflows. While the
current code works correctly, replace strcpy() with the safer strscpy()
to follow secure coding best practices. Use ->body.kvp_register.version
directly as the destination buffer and remove the local variable.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
---
Based on my other patch [1] which needs to be applied first.
[1] https://lore.kernel.org/lkml/20260414111008.307220-2-thorsten.blum@linux.dev/
---
drivers/hv/hv_kvp.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index 6180ebe040ff..336b278b2182 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -27,6 +27,7 @@
#include <linux/connector.h>
#include <linux/workqueue.h>
#include <linux/hyperv.h>
+#include <linux/string.h>
#include <hyperv/hvhdk.h>
#include "hyperv_vmbus.h"
@@ -130,18 +131,15 @@ static void kvp_register_done(void)
static int
kvp_register(int reg_value)
{
-
struct hv_kvp_msg *kvp_msg;
- char *version;
int ret;
kvp_msg = kzalloc_obj(*kvp_msg);
if (!kvp_msg)
return -ENOMEM;
- version = kvp_msg->body.kvp_register.version;
kvp_msg->kvp_hdr.operation = reg_value;
- strcpy(version, HV_DRV_VERSION);
+ strscpy(kvp_msg->body.kvp_register.version, HV_DRV_VERSION);
ret = hvutil_transport_send(hvt, kvp_msg, sizeof(*kvp_msg),
kvp_register_done);
^ permalink raw reply related
* [PATCH rc 01/15] RDMA/ionic: Fix typo in format string
From: Jason Gunthorpe @ 2026-04-28 16:17 UTC (permalink / raw)
To: Andrew Lunn, Broadcom internal kernel review list, Bryan Tan,
Eric Dumazet, Junxian Huang, Konstantin Taranov, Jakub Kicinski,
Leon Romanovsky, linux-hyperv, linux-rdma, netdev, Paolo Abeni,
Selvin Xavier, Chengchang Tang, Tariq Toukan, Vishnu Dasa,
Yishai Hadas
Cc: Abhijit Gangurde, Adit Ranadive, Allen Hubbe, Andrew Boyer,
Aditya Sarwade, Brad Spengler, Bryan Tan, David S. Miller,
Dexuan Cui, Doug Ledford, George Zhang, Jorgen Hansen, Jianbo Liu,
Kai Aizen, Leon Romanovsky, Leon Romanovsky, Yixian Liu, Long Li,
Lijun Ou, Parav Pandit, patches, Roland Dreier, Roland Dreier,
Sagi Grimberg, Ajay Sharma, stable, Tariq Toukan, Wei Hu (Xavier),
Shaobo Xu, Nenglong Zhao
In-Reply-To: <0-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com>
Applying the corrupted patch by hand mangled the format string ("%s.64"
instead of "%.64s"); put the 's' back in the right place.
Cc: stable@vger.kernel.org
Fixes: 654a27f25530 ("RDMA/ionic: bound node_desc sysfs read with %.64s")
Reported-by: Brad Spengler <brad.spengler@opensrcsec.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/infiniband/hw/ionic/ionic_ibdev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/ionic/ionic_ibdev.c b/drivers/infiniband/hw/ionic/ionic_ibdev.c
index 0382a64839d26a..73a616ae350236 100644
--- a/drivers/infiniband/hw/ionic/ionic_ibdev.c
+++ b/drivers/infiniband/hw/ionic/ionic_ibdev.c
@@ -185,7 +185,7 @@ static ssize_t hca_type_show(struct device *device,
struct ionic_ibdev *dev =
rdma_device_to_drv_device(device, struct ionic_ibdev, ibdev);
- return sysfs_emit(buf, "%s.64\n", dev->ibdev.node_desc);
+ return sysfs_emit(buf, "%.64s\n", dev->ibdev.node_desc);
}
static DEVICE_ATTR_RO(hca_type);
--
2.43.0
^ permalink raw reply related
* [PATCH rc 14/15] RDMA/hns: Fix xarray race in hns_roce_create_qp_common()
From: Jason Gunthorpe @ 2026-04-28 16:17 UTC (permalink / raw)
To: Andrew Lunn, Broadcom internal kernel review list, Bryan Tan,
Eric Dumazet, Junxian Huang, Konstantin Taranov, Jakub Kicinski,
Leon Romanovsky, linux-hyperv, linux-rdma, netdev, Paolo Abeni,
Selvin Xavier, Chengchang Tang, Tariq Toukan, Vishnu Dasa,
Yishai Hadas
Cc: Abhijit Gangurde, Adit Ranadive, Allen Hubbe, Andrew Boyer,
Aditya Sarwade, Brad Spengler, Bryan Tan, David S. Miller,
Dexuan Cui, Doug Ledford, George Zhang, Jorgen Hansen, Jianbo Liu,
Kai Aizen, Leon Romanovsky, Leon Romanovsky, Yixian Liu, Long Li,
Lijun Ou, Parav Pandit, patches, Roland Dreier, Roland Dreier,
Sagi Grimberg, Ajay Sharma, stable, Tariq Toukan, Wei Hu (Xavier),
Shaobo Xu, Nenglong Zhao
In-Reply-To: <0-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com>
Similar to the SRQ case, the hr_qp is stored in the xarray before it is
fully initialized. Unlike the SRQ case, the error unwinds do not wait for
the completion, so keep the refcount at 0 until the function succeeds.
Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
Suggested-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/infiniband/hw/hns/hns_roce_qp.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c
index a27ea85bb06323..f94ba98871f0d0 100644
--- a/drivers/infiniband/hw/hns/hns_roce_qp.c
+++ b/drivers/infiniband/hw/hns/hns_roce_qp.c
@@ -47,8 +47,8 @@ static struct hns_roce_qp *hns_roce_qp_lookup(struct hns_roce_dev *hr_dev,
xa_lock_irqsave(&hr_dev->qp_table_xa, flags);
qp = __hns_roce_qp_lookup(hr_dev, qpn);
- if (qp)
- refcount_inc(&qp->refcount);
+ if (qp && !refcount_inc_not_zero(&qp->refcount))
+ qp = NULL;
xa_unlock_irqrestore(&hr_dev->qp_table_xa, flags);
if (!qp)
@@ -1251,8 +1251,8 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev,
hr_qp->ibqp.qp_num = hr_qp->qpn;
hr_qp->event = hns_roce_ib_qp_event;
- refcount_set(&hr_qp->refcount, 1);
init_completion(&hr_qp->free);
+ refcount_set_release(&hr_qp->refcount, 1);
return 0;
--
2.43.0
^ permalink raw reply related
* [PATCH rc 09/15] RDMA/ocrdma: Don't NULL deref uctx on errors in ocrdma_copy_pd_uresp()
From: Jason Gunthorpe @ 2026-04-28 16:17 UTC (permalink / raw)
To: Andrew Lunn, Broadcom internal kernel review list, Bryan Tan,
Eric Dumazet, Junxian Huang, Konstantin Taranov, Jakub Kicinski,
Leon Romanovsky, linux-hyperv, linux-rdma, netdev, Paolo Abeni,
Selvin Xavier, Chengchang Tang, Tariq Toukan, Vishnu Dasa,
Yishai Hadas
Cc: Abhijit Gangurde, Adit Ranadive, Allen Hubbe, Andrew Boyer,
Aditya Sarwade, Brad Spengler, Bryan Tan, David S. Miller,
Dexuan Cui, Doug Ledford, George Zhang, Jorgen Hansen, Jianbo Liu,
Kai Aizen, Leon Romanovsky, Leon Romanovsky, Yixian Liu, Long Li,
Lijun Ou, Parav Pandit, patches, Roland Dreier, Roland Dreier,
Sagi Grimberg, Ajay Sharma, stable, Tariq Toukan, Wei Hu (Xavier),
Shaobo Xu, Nenglong Zhao
In-Reply-To: <0-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com>
Sashiko points out that pd->uctx isn't initialized until late in the
function, so all of these error-flow references are NULL and will crash.
Use uctx instead, which isn't NULL.
Cc: stable@vger.kernel.org
Fixes: fe2caefcdf58 ("RDMA/ocrdma: Add driver for Emulex OneConnect IBoE RDMA adapter")
Link: https://sashiko.dev/#/patchset/0-v1-e911b76a94d1%2B65d95-rdma_udata_rep_jgg%40nvidia.com?part=4
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 463c9a5703fc4e..a88cc5d84af828 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -620,9 +620,9 @@ static int ocrdma_copy_pd_uresp(struct ocrdma_dev *dev, struct ocrdma_pd *pd,
ucopy_err:
if (pd->dpp_enabled)
- ocrdma_del_mmap(pd->uctx, dpp_page_addr, PAGE_SIZE);
+ ocrdma_del_mmap(uctx, dpp_page_addr, PAGE_SIZE);
dpp_map_err:
- ocrdma_del_mmap(pd->uctx, db_page_addr, db_page_size);
+ ocrdma_del_mmap(uctx, db_page_addr, db_page_size);
return status;
}
--
2.43.0
^ permalink raw reply related