Linux-HyperV List


* Re: [PATCH net-next] net: mana: hardening: Reject zero max_num_queues from MANA_QUERY_VPORT_CONFIG
From: Erni Sri Satya Vennela @ 2026-04-10  5:16 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, ssengar, dipayanroy, gargaditya,
	shirazsaleem, kees, linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260326174815.2012137-1-ernis@linux.microsoft.com>

On Thu, Mar 26, 2026 at 10:48:10AM -0700, Erni Sri Satya Vennela wrote:
> As a part of MANA hardening for CVM, validate that max_num_sq and
> max_num_rq returned by MANA_QUERY_VPORT_CONFIG are not zero. These
> values flow into apc->num_queues, which is used as an allocation count
> and loop bound. A zero value would result in zero-size allocations and
> incorrect driver behavior.
> 
> Return -EPROTO if either value is zero.
> 
> Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
> ---
>  drivers/net/ethernet/microsoft/mana/mana_en.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index b39e8b920791..a4197b4b0597 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -1249,6 +1249,12 @@ static int mana_query_vport_cfg(struct mana_port_context *apc, u32 vport_index,
>  
>  	*max_sq = resp.max_num_sq;
>  	*max_rq = resp.max_num_rq;
> +
> +	if (*max_sq == 0 || *max_rq == 0) {
> +		netdev_err(apc->ndev, "Invalid max queues from vPort config\n");
> +		return -EPROTO;
> +	}
> +
>  	if (resp.num_indirection_ent > 0 &&
>  	    resp.num_indirection_ent <= MANA_INDIRECT_TABLE_MAX_SIZE &&
>  	    is_power_of_2(resp.num_indirection_ent)) {
> -- 
> 2.34.1

Hi,

Gentle reminder regarding this patch.

I would really appreciate any feedback whenever you get a chance.
Please let me know if any changes are required from my side.

Thanks for your time.

Regards,
Vennela



* Re: [PATCH v2] Drivers: hv: mshv: fix integer overflow in memory region overlap check
From: Junrui Luo @ 2026-04-10  3:06 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Nuno Das Neves, Anirudh Rayabharam, Mukesh Rathor, Muminul Islam,
	Praveen K Paladugu, Jinank Jain, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, Yuhao Jiang, Roman Kisel,
	stable@vger.kernel.org
In-Reply-To: <ac76zlXjXhPVkA6f@skinsburskii.localdomain>

On Thu, Apr 02, 2026 at 04:25:02PM -0700, Stanislav Kinsburskii wrote:
> nit: both comments are redundant - the meaning is clear from the code
> itself.

I will drop them in v3.

> This maximum value check bugs me a bit.
> 
> First of all, why does it matter what the region end is? Potentially, there can be
> regions not backed by host address space (let alone host RAM), so why
> introduce this limitation?
> 
> Second, this check takes a host-specific constant (MAX_PHYSMEM_BITS) and rounds it down
> to hypervisor-specific units which may not be aligned with the host page
> size. Should this be host pages instead?
 
This check was suggested by Roman in v1 review. Roman, could you
share your thoughts on Stanislav's concerns? I'd like to align on whether an upper
bound check is needed here.

Thanks,
Junrui Luo


* Re: [PATCH net-next v6 0/2] net: mana: add ethtool private flag for full-page RX buffers
From: Jakub Kicinski @ 2026-04-10  1:35 UTC (permalink / raw)
  To: Dipayaan Roy
  Cc: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	pabeni, leon, longli, kotaranov, horms, shradhagupta, ssengar,
	ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, stephen, jacob.e.keller, leitao, kees, john.fastabend,
	hawk, bpf, daniel, ast, sdf, dipayanroy
In-Reply-To: <20260407200216.272659-1-dipayanroy@linux.microsoft.com>

On Tue,  7 Apr 2026 12:59:17 -0700 Dipayaan Roy wrote:
> This behavior is observed on a single platform; other platforms
> perform better with page_pool fragments, indicating this is not a
> page_pool issue but platform-specific.

Well, someone has to run some experiments and confirm other ARM
platforms are not impacted, with data. I was hoping to do it myself
but doesn't look like that will happen in time for the merge window :(

> Changes in v6:
>  - Added missed maintainers.

STOP REPOSTING PATCHES FOR NO REASON.


* [PATCH] Drivers: hv: vmbus: Export hv_vmbus_exists() and use it in pci-hyperv
From: Dexuan Cui @ 2026-04-09 21:52 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli, lpieralisi, kwilczynski,
	mani, robh, bhelgaas, linux-hyperv, linux-pci
  Cc: linux-kernel, Mukesh Rathor

With commit f84b21da3624 ("PCI: hv: Don't load the driver for baremetal root partition"),
the bare metal Linux root partition won't use the pci-hyperv driver, but
when a Linux VM runs on the Linux root partition, pci-hyperv's module_init
function init_hv_pci_drv() can still run, e.g. in the case of
CONFIG_PCI_HYPERV=y, even if the VMBus driver is not used in such a VM
(i.e. the hv_vmbus driver's init function returns -ENODEV due to
vmbus_root_device being NULL).

In such a Linux VM, init_hv_pci_drv() runs with a side effect: the 3
hvpci_block_ops callbacks are set to functions that depend on hv_vmbus.

Later, when the MLX driver in such a VM invokes the callbacks, e.g. in
drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c:
mlx5_hv_register_invalidate(), hvpci_block_ops.reg_blk_invalidate() is
hv_register_block_invalidate() rather than a NULL function pointer, and
hv_register_block_invalidate() assumes that it can find a struct
hv_pcibus_device from pdev->bus->sysdata, which is false in such a VM.

Consequently, hv_register_block_invalidate() -> get_pcichild_wslot() ->
spin_lock_irqsave() may hang since it can be accessing an invalid
spinlock pointer.

Fix the issue by exporting hv_vmbus_exists() and using it in pci-hyperv:

    hv_root_partition() is true and hv_nested is false ==>
	hv_vmbus_exists() is false.

    hv_root_partition() is true and hv_nested is true ==>
	hv_vmbus_exists() is true.

    hv_root_partition() is false ==> hv_vmbus_exists() is true.

While at it, rename vmbus_exists() to hv_vmbus_exists() to follow the
convention that all public functions have the hv_ prefix; also change
the return value's type from int to bool to make the code more readable;
also move the two pr_info() calls.
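
The int-to-bool change described above can be illustrated with a small
userspace sketch. It is not kernel code: vmbus_root_device_stub,
old_vmbus_exists(), new_hv_vmbus_exists() and exists_checks_agree() are
hypothetical stand-ins that only mirror the two function bodies shown in
the diff, so the inverted call-site test (!vmbus_exists() versus
hv_vmbus_exists()) can be compared side by side.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's vmbus_root_device pointer. */
static void *vmbus_root_device_stub;

/* Old convention: 0 means "exists", -ENODEV means "does not exist". */
static int old_vmbus_exists(void)
{
	if (vmbus_root_device_stub == NULL)
		return -ENODEV;

	return 0;
}

/* New convention: true means "exists". */
static bool new_hv_vmbus_exists(void)
{
	return vmbus_root_device_stub != NULL;
}

/*
 * Returns true when "!old_vmbus_exists()" and "new_hv_vmbus_exists()"
 * give the same answer for the given root device pointer.
 */
static bool exists_checks_agree(void *root)
{
	vmbus_root_device_stub = root;

	return (!old_vmbus_exists()) == new_hv_vmbus_exists();
}
```

With the old helper a caller had to write "if (!vmbus_exists())" to mean
"VMBus is present", which reads backwards; the bool form removes that
inversion without changing the result.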

Reported-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/hv/vmbus_drv.c              | 20 ++++++++------------
 drivers/pci/controller/pci-hyperv.c |  2 +-
 include/linux/hyperv.h              |  2 ++
 3 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index bc4fc1951ae1..2c8936efc8d1 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -101,13 +101,11 @@ struct device *hv_get_vmbus_root_device(void)
 }
 EXPORT_SYMBOL_GPL(hv_get_vmbus_root_device);

-static int vmbus_exists(void)
+bool hv_vmbus_exists(void)
 {
-	if (vmbus_root_device == NULL)
-		return -ENODEV;
-
-	return 0;
+	return vmbus_root_device != NULL;
 }
+EXPORT_SYMBOL_GPL(hv_vmbus_exists);

 static u8 channel_monitor_group(const struct vmbus_channel *channel)
 {
@@ -1582,11 +1580,10 @@ int __vmbus_driver_register(struct hv_driver *hv_driver, struct module *owner, c
 {
 	int ret;

-	pr_info("registering driver %s\n", hv_driver->name);
+	if (!hv_vmbus_exists())
+		return -ENODEV;

-	ret = vmbus_exists();
-	if (ret < 0)
-		return ret;
+	pr_info("registering driver %s\n", hv_driver->name);

 	hv_driver->driver.name = hv_driver->name;
 	hv_driver->driver.owner = owner;
@@ -1612,9 +1609,8 @@ EXPORT_SYMBOL_GPL(__vmbus_driver_register);
  */
 void vmbus_driver_unregister(struct hv_driver *hv_driver)
 {
-	pr_info("unregistering driver %s\n", hv_driver->name);
-
-	if (!vmbus_exists()) {
+	if (hv_vmbus_exists()) {
+		pr_info("unregistering driver %s\n", hv_driver->name);
 		driver_unregister(&hv_driver->driver);
 		vmbus_free_dynids(hv_driver);
 	}
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 2c7a406b4ba8..226b8bb802f3 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -4166,7 +4166,7 @@ static int __init init_hv_pci_drv(void)
 	if (!hv_is_hyperv_initialized())
 		return -ENODEV;

-	if (hv_root_partition() && !hv_nested)
+	if (!hv_vmbus_exists())
 		return -ENODEV;

 	ret = hv_pci_irqchip_init();
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index dfc516c1c719..5459e776ec17 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1304,6 +1304,8 @@ static inline void *hv_get_drvdata(struct hv_device *dev)

 struct device *hv_get_vmbus_root_device(void);

+bool hv_vmbus_exists(void);
+
 struct hv_ring_buffer_debug_info {
 	u32 current_interrupt_mask;
 	u32 current_read_index;
-- 
2.43.0


* RE: [RFC v1 1/5] PCI: hv: Create and export hv_build_logical_dev_id()
From: Michael Kelley @ 2026-04-09 19:01 UTC (permalink / raw)
  To: Easwar Hariharan
  Cc: Yu Zhang, linux-kernel@vger.kernel.org,
	linux-hyperv@vger.kernel.org, iommu@lists.linux.dev,
	linux-pci@vger.kernel.org, kys@microsoft.com,
	haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com,
	lpieralisi@kernel.org, kwilczynski@kernel.org, mani@kernel.org,
	robh@kernel.org, bhelgaas@google.com, arnd@arndb.de,
	joro@8bytes.org, will@kernel.org, robin.murphy@arm.com,
	jacob.pan@linux.microsoft.com, nunodasneves@linux.microsoft.com,
	mrathor@linux.microsoft.com, peterz@infradead.org,
	linux-arch@vger.kernel.org
In-Reply-To: <2dabc1b8-0cf0-4fc8-9cd4-cce60adfc05e@linux.microsoft.com>

From: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Sent: Wednesday, April 8, 2026 1:21 PM
> 
> On 1/11/2026 9:36 AM, Michael Kelley wrote:
> > From: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Sent: Friday, January 9, 2026 10:41 AM
> >>
> >> On 1/8/2026 10:46 AM, Michael Kelley wrote:
> >>> From: Yu Zhang <zhangyu1@linux.microsoft.com> Sent: Monday, December 8, 2025 9:11 PM
> >>>>
> >>>> From: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
> >>>>
> >>>> Hyper-V uses a logical device ID to identify a PCI endpoint device for
> >>>> child partitions. This ID will also be required for future hypercalls
> >>>> used by the Hyper-V IOMMU driver.
> >>>>
> >>>> Refactor the logic for building this logical device ID into a standalone
> >>>> helper function and export the interface for wider use.
> >>>>
> >>>> Signed-off-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
> >>>> Signed-off-by: Yu Zhang <zhangyu1@linux.microsoft.com>
> >>>> ---
> >>>>  drivers/pci/controller/pci-hyperv.c | 28 ++++++++++++++++++++--------
> >>>>  include/asm-generic/mshyperv.h      |  2 ++
> >>>>  2 files changed, 22 insertions(+), 8 deletions(-)
> >>>>
> >>>> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> >>>> index 146b43981b27..4b82e06b5d93 100644
> >>>> --- a/drivers/pci/controller/pci-hyperv.c
> >>>> +++ b/drivers/pci/controller/pci-hyperv.c
> >>>> @@ -598,15 +598,31 @@ static unsigned int hv_msi_get_int_vector(struct irq_data *data)
> >>>>
> >>>>  #define hv_msi_prepare		pci_msi_prepare
> >>>>
> >>>> +/**
> >>>> + * Build a "Device Logical ID" out of this PCI bus's instance GUID and the
> >>>> + * function number of the device.
> >>>> + */
> >>>> +u64 hv_build_logical_dev_id(struct pci_dev *pdev)
> >>>> +{
> >>>> +	struct pci_bus *pbus = pdev->bus;
> >>>> +	struct hv_pcibus_device *hbus = container_of(pbus->sysdata,
> >>>> +						struct hv_pcibus_device, sysdata);
> >>>> +
> >>>> +	return (u64)((hbus->hdev->dev_instance.b[5] << 24) |
> >>>> +		     (hbus->hdev->dev_instance.b[4] << 16) |
> >>>> +		     (hbus->hdev->dev_instance.b[7] << 8)  |
> >>>> +		     (hbus->hdev->dev_instance.b[6] & 0xf8) |
> >>>> +		     PCI_FUNC(pdev->devfn));
> >>>> +}
> >>>> +EXPORT_SYMBOL_GPL(hv_build_logical_dev_id);
> >>>
> >>> This change is fine for hv_irq_retarget_interrupt(), but it doesn't help for the
> >>> new IOMMU driver because pci-hyperv.c can be (and often is) built as a module.
> >>> The new Hyper-V IOMMU driver in this patch series is built-in, and so it can't
> >>> use this symbol in that case -- you'll get a link error on vmlinux when building
> >>> the kernel. Requiring pci-hyperv.c to *not* be built as a module would also
> >>> require that the VMBus driver not be built as a module, so I don't think that's
> >>> the right solution.
> >>>
> >>> This is a messy problem. The new IOMMU driver needs to start with a generic
> >>> "struct device" for the PCI device, and somehow find the corresponding VMBus
> >>> PCI pass-thru device from which it can get the VMBus instance ID. I'm thinking
> >>> about ways to do this that don't depend on code and data structures that are
> >>> private to the pci-hyperv.c driver, and will follow-up if I have a good suggestion.
> >>
> >> Thank you, Michael. FWIW, I did try to pull the device ID components out of
> >> pci-hyperv into include/linux/hyperv.h and/or a new include/linux/pci-hyperv.h
> >> but it was just too messy as you say.
> >
> > Yes, the current approach for getting the device ID wanders through struct
> > hv_pcibus_device (which is private to the pci-hyperv driver), and through
> > struct hv_device (which is a VMBus data structure). That makes the linkage
> > between the PV IOMMU driver and the pci-hyperv and VMBus drivers rather
> > substantial, which is not good.
> 
> Hi Michael,
> 
> I missed this, or made a mental note to follow up but forgot. Either way, Yu reminded
> me about this email chain and I started looking at it this week.
> 
> >
> > But here's an idea for an alternate approach. The PV IOMMU driver doesn't
> > have to generate the logical device ID on-the-fly by going to the dev_instance
> > field of struct hv_device. Instead, the pci-hyperv driver can generate the logical
> > device ID in hv_pci_probe(), and put it somewhere that's easy for the IOMMU
> > driver to access. The logical device ID doesn't change while Linux is running, so
> > stashing another copy somewhere isn't a problem.
> 
> In my exploration and consulting with Dexuan, I realized that one of the components of
> the logical device ID, the PCI function number is set only in pci_scan_device(), well into
> pci_scan_root_bus_bridge() that you call out as the point by which the communication
> must have occurred.
> 
> But then, Dexuan also pointed me to hv_pci_assign_slots() with its call to wslot_to_devfn() and I'm
> honestly confused how these two interact. With the current approach, it looks like whatever
> devfn pci_scan_device() set is the correct function number to use for the logical device
> ID, in which case, the best I can do with your suggested approach below is to inform the
> pvIOMMU driver of the GUID, rather than the logical device ID itself.
> 
> Perhaps with your history, you can clarify the interaction, and/or share your thoughts
> on the above?

During hv_pci_probe(), hv_pci_query_relations() is called to ask the Hyper-V
host about what PCI devices are present. hv_pci_query_relations() sends a
PCI_QUERY_BUS_RELATIONS message to the host, and the host sends back a
PCI_BUS_RELATIONS or PCI_BUS_RELATIONS2 message. The response message
is handled in hv_pci_onchannelcallback(), which calls hv_pci_devices_present()
or hv_pci_devices_present2().  The latter two functions both call
hv_pci_start_relations_work() to add a request to a workqueue that runs
pci_devices_present_work().  Finally, pci_devices_present_work() calls
pci_scan_child_bus(), followed by hv_pci_assign_slots().

In hv_pci_assign_slots(), you can see that the PCI_BUS_RELATIONS[2]
info from the Hyper-V host contains a function number encoded in the
win_slot field. So the Hyper-V host *does* tell the guest the function number.
However, the generic Linux PCI subsystem doesn't use this function number.
It still scans the PCI device, trying successive function numbers to see which
ones work. The scan should find the same function number that the Hyper-V
host originally reported.

As you noted, there's a sequencing problem in waiting for
pci_scan_single_device() to find the function number. In the hv_pci_probe()
path, after hv_pci_query_relations() runs and before create_root_hv_pci_bus()
is called, it seems feasible to use the function number provided by the
Hyper-V host to construct the logical device ID. That should work. But there's
another path, in that the Hyper-V host can generate a PCI_BUS_RELATIONS[2]
message without a request from Linux when something on the host side changes
the PCI device setup. There's a code path where pci_devices_present_work()
finds the state is "hv_pcibus_installed", and directly calls pci_scan_child_bus().
This path would presumably also need to construct (or re-construct) the
logical device ID using the information from the Hyper-V host before calling
pci_scan_child_bus(). I'm vague on the scenario for this latter case, but the
code is obviously there to handle it.

The other approach is as you suggest. The Hyper-V PCI driver can tell
the IOMMU driver the almost complete logical device ID, using just the
GUID bits. The IOMMU driver can then construct the full logical
device ID by adding the function number from the struct pci_dev. I don't
see a problem with this approach -- other IOMMU drivers are referencing
the struct pci_dev, and pulling out the function number doesn't seem like
a violation of layering.
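
As a rough illustration of that split, here is a userspace sketch in
plain C. The helper names hv_logical_id_guid_part() and
hv_logical_id_add_func() are hypothetical and not part of any posted
patch; the byte layout is copied from the hv_build_logical_dev_id() body
quoted earlier in this thread, and the function number is assumed to be
the 3-bit value that PCI_FUNC(pdev->devfn) would yield.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the part of the logical device ID that depends only
 * on the VMBus instance GUID (bytes 4..7), laid out as in the quoted
 * hv_build_logical_dev_id(). The 0xf8 mask clears the low three bits of
 * byte 6, leaving room for the PCI function number. The casts to uint32_t
 * avoid shifting a promoted int into its sign bit.
 */
static uint32_t hv_logical_id_guid_part(const uint8_t guid[16])
{
	return ((uint32_t)guid[5] << 24) |
	       ((uint32_t)guid[4] << 16) |
	       ((uint32_t)guid[7] << 8)  |
	       ((uint32_t)guid[6] & 0xf8);
}

/*
 * Hypothetical helper: combine the GUID-derived part with the PCI function
 * number (0..7) taken from the device, giving the full logical device ID.
 */
static uint64_t hv_logical_id_add_func(uint32_t guid_part, unsigned int func)
{
	assert(func < 8);

	return (uint64_t)(guid_part | func);
}
```

In this arrangement the PCI driver would hand over only the result of
the first helper, and the IOMMU side would apply the second once the
struct pci_dev, and therefore the function number, is available.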

> 
> >
> > So have the Hyper-V PV IOMMU driver provide an EXPORTed function to accept
> > a PCI domain ID and the related logical device ID. The PV IOMMU driver is
> > responsible for storing this data in a form that it can later search. hv_pci_probe()
> > calls this new function when it instantiates a new PCI pass-thru device. Then when
> > the IOMMU driver needs to attach a new device, it can get the PCI domain ID
> > from the struct pci_dev (or struct pci_bus), search for the related logical device
> > ID in its own data structure, and use it. The pci-hyperv driver has a dependency
> > on the IOMMU driver, but that's a dependency in the desired direction. The
> > PCI domain ID and logical device ID are just integers, so no data structures are
> > shared.
> 
> In a previous reply on this thread, you raised the uniqueness issue of bytes 4 and 5
> of the GUID being used to create the domain number. I thought this approach could
> help with that too, but as I coded it up, I realized that using the domain number
> (not guaranteed to be unique) to search for the bus instance GUID (guaranteed to be unique)
> is the wrong way around. It is unfortunately the only available key in the pci_dev
> handed to the pvIOMMU driver in this approach though...
> 
> Do you think that's a fatal flaw?

There are two uniqueness problems, which I didn't fully separate conceptually
until writing this. One problem is constructing a PCI domain ID that Linux can use
to identify the virtual PCI bus that the Hyper-V PCI driver creates for each vPCI
device. The Hyper-V virtual PCI driver uses GUID bytes 4 and 5, and recognizes
that they might not be unique. So there's code in hv_pci_probe() to pick another
number if there's a duplicate. Hyper-V doesn't really care how Linux picks the
domain ID for the virtual PCI bus as it's purely a Linux construct.

The second problem is the logical device ID that Hyper-V interprets to
identify a vPCI device in hypercalls such as HVCALL_RETARGET_INTERRUPT
and the new pvIOMMU related hypercalls. This logical device ID uses
GUID bytes 4 thru 7 (minus 1 bit).  I don't think Linux uses the
logical device ID for anything. Since only Hyper-V interprets it, Hyper-V
must somehow be ensuring uniqueness of bytes 4 thru 7 (minus 1 bit).
That's something to confirm with the Hyper-V team. If they are just hoping
for the best, I don't know how Linux can solve the problem.

My original comment about uniqueness somewhat conflated the two problems,
and that's misleading. The use of the logical device ID has been around for years
in hv_irq_retarget_interrupt(). Extending its use to the new pvIOMMU
hypercalls doesn't make things any worse. But I'm still curious about
what the Hyper-V team says about the uniqueness of bytes 4 thru 7.

Michael

> 
> >
> > Note that the pci-hyperv driver must inform the PV IOMMU driver of the logical
> > device ID *before* create_root_hv_pci_bus() calls pci_scan_root_bus_bridge().
> > The latter function eventually invokes hv_iommu_attach_dev(), which will
> > need the logical device ID. See example stack trace. [1]
> >
> > I don't think the pci-hyperv driver even needs to tell the IOMMU driver to
> > remove the information if a PCI pass-thru device is unbound or removed, as
> > the logical device ID will be the same if the device ever comes back. At worst,
> > the IOMMU driver can simply replace an existing logical device ID if a new one
> > is provided for the same PCI domain ID.
> 
> As above, replacing a unique GUID when a result is found for a non-unique
> key value may be prone to failure if it happens that the device that came "back"
> is not in fact the same device (or class of device) that went away and just happens
> to, either due to bytes 4 and 5 being identical, or due to collision in the
> pci_domain_nr_dynamic_ida, have the same domain number.
> 
> Thanks,
> Easwar (he/him)
> 
> >
> > An include file must provide a stub for the new function if
> > CONFIG_HYPERV_PVIOMMU is not defined, so that the pci-hyperv driver still
> > builds and works.
> >
> > I haven't coded this up, but it seems like it should be pretty clean.
> >
> > Michael
> >
> > [1] Example stack trace, starting with vmbus_add_channel_work() as a
> > result of Hyper-V offering the PCI pass-thru device to the guest.
> > hv_pci_probe() runs, and ends up in the generic Linux code for adding
> > a PCI device, which in turn sets up the IOMMU.
> >
> > [    1.731786]  hv_iommu_attach_dev+0xf0/0x1d0
> > [    1.731788]  __iommu_attach_device+0x21/0xb0
> > [    1.731790]  __iommu_device_set_domain+0x65/0xd0
> > [    1.731792]  __iommu_group_set_domain_internal+0x61/0x120
> > [    1.731795]  iommu_setup_default_domain+0x3a4/0x530
> > [    1.731796]  __iommu_probe_device.part.0+0x15d/0x1d0
> > [    1.731798]  iommu_probe_device+0x81/0xb0
> > [    1.731799]  iommu_bus_notifier+0x2c/0x80
> > [    1.731800]  notifier_call_chain+0x66/0xe0
> > [    1.731802]  blocking_notifier_call_chain+0x47/0x70
> > [    1.731804]  bus_notify+0x3b/0x50
> > [    1.731805]  device_add+0x631/0x850
> > [    1.731807]  pci_device_add+0x2db/0x670
> > [    1.731809]  pci_scan_single_device+0xc3/0x100
> > [    1.731810]  pci_scan_slot+0x97/0x230
> > [    1.731812]  pci_scan_child_bus_extend+0x3b/0x2f0
> > [    1.731814]  pci_scan_root_bus_bridge+0xc0/0xf0
> > [    1.731816]  hv_pci_probe+0x398/0x5f0
> > [    1.731817]  vmbus_probe+0x42/0xa0
> > [    1.731819]  really_probe+0xe5/0x3e0
> > [    1.731822]  __driver_probe_device+0x7e/0x170
> > [    1.731823]  driver_probe_device+0x23/0xa0
> > [    1.731824]  __device_attach_driver+0x92/0x130
> > [    1.731826]  bus_for_each_drv+0x8c/0xe0
> > [    1.731828]  __device_attach+0xc0/0x200
> > [    1.731830]  device_initial_probe+0x4c/0x50
> > [    1.731831]  bus_probe_device+0x32/0x90
> > [    1.731832]  device_add+0x65b/0x850
> > [    1.731836]  device_register+0x1f/0x30
> > [    1.731837]  vmbus_device_register+0x87/0x130
> > [    1.731840]  vmbus_add_channel_work+0x139/0x1a0
> > [    1.731841]  process_one_work+0x19f/0x3f0
> > [    1.731843]  worker_thread+0x188/0x2f0
> > [    1.731845]  kthread+0x119/0x230
> > [    1.731852]  ret_from_fork+0x1b4/0x1e0
> > [    1.731854]  ret_from_fork_asm+0x1a/0x30
> >
> >>



* Re: [PATCH 00/61] treewide: Use IS_ERR_OR_NULL over manual NULL check - refactor
From: Al Viro @ 2026-04-09 18:16 UTC (permalink / raw)
  To: Philipp Hahn
  Cc: amd-gfx, apparmor, bpf, ceph-devel, cocci, dm-devel, dri-devel,
	gfs2, intel-gfx, intel-wired-lan, iommu, kvm, linux-arm-kernel,
	linux-block, linux-bluetooth, linux-btrfs, linux-cifs, linux-clk,
	linux-erofs, linux-ext4, linux-fsdevel, linux-gpio, linux-hyperv,
	linux-input, linux-kernel, linux-leds, linux-media, linux-mips,
	linux-mm, linux-modules, linux-mtd, linux-nfs, linux-omap,
	linux-phy, linux-pm, linux-rockchip, linux-s390, linux-scsi,
	linux-sctp, linux-security-module, linux-sh, linux-sound,
	linux-stm32, linux-trace-kernel, linux-usb, linux-wireless,
	netdev, ntfs3, samba-technical, sched-ext, target-devel,
	tipc-discussion, v9fs, Julia Lawall, Nicolas Palix, Chris Mason,
	David Sterba, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	Theodore Ts'o, Andreas Dilger, Steve French, Paulo Alcantara,
	Ronnie Sahlberg, Shyam Prasad N, Tom Talpey, Bharath SM,
	Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet,
	Christian Schoenebeck, Gao Xiang, Chao Yu, Yue Hu, Jeffle Xu,
	Sandeep Dhavale, Hongbo Li, Chunhai Guo, Miklos Szeredi,
	Konstantin Komarov, Andreas Gruenbacher, Kees Cook, Tony Luck,
	Guilherme G. Piccoli, Jan Kara, Phillip Lougher,
	Christian Brauner, Jan Kara, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Tejun Heo, David Vernet, Andrea Righi,
	Changwoo Min, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Valentin Schneider, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
	Sami Tolvanen, Aaron Tomlin, Sylwester Nawrocki, Liam Girdwood,
	Mark Brown, Jaroslav Kysela, Takashi Iwai, Max Filippov,
	Paolo Bonzini, John Johansen, Paul Moore, James Morris,
	Serge E. Hallyn, Andrew Morton, Alasdair Kergon, Mike Snitzer,
	Mikulas Patocka, Benjamin Marzinski, David S. Miller, David Ahern,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Marcel Holtmann, Johan Hedberg, Luiz Augusto von Dentz,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Jamal Hadi Salim, Jiri Pirko,
	Marcelo Ricardo Leitner, Xin Long, Trond Myklebust,
	Anna Schumaker, Chuck Lever, Jeff Layton, NeilBrown,
	Olga Kornievskaia, Dai Ngo, Jon Maloy, Johannes Berg,
	Catalin Marinas, Russell King, John Crispin, Thomas Bogendoerfer,
	Yoshinori Sato, Rich Felker, John Paul Adrian Glaubitz,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Laurent Pinchart,
	Jonas Karlman, Jernej Skrabec, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Zhenyu Wang,
	Zhi Wang, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Alex Deucher, Christian König, Sandy Huang,
	Heiko Stübner, Andy Yan, Igor Russkikh, Andrew Lunn,
	Pavan Chebbi, Michael Chan, Potnuri Bharat Teja, Tony Nguyen,
	Przemek Kitszel, Taras Chornyi, Maxime Coquelin, Alexandre Torgue,
	Iyappan Subramanian, Keyur Chudgar, Quan Nguyen, Heiner Kallweit,
	Marc Zyngier, Thomas Gleixner, Andrew Lunn, Gregory Clement,
	Sebastian Hesselbarth, Vinod Koul, Linus Walleij, Ulf Hansson,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Martin K. Petersen,
	Eduardo Valentin, Keerthy, Rafael J. Wysocki, Daniel Lezcano,
	Zhang Rui, Lukasz Luba, Alex Williamson, Mark Greer,
	Miquel Raynal, Richard Weinberger, Vignesh Raghavendra,
	Shuah Khan, Kieran Bingham, Mauro Carvalho Chehab, Joerg Roedel,
	Will Deacon, Robin Murphy, Lee Jones, Pavel Machek, Dave Penkler,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Justin Sanders, Jens Axboe, Georgi Djakov, Michael Turquette,
	Stephen Boyd, Philipp Zabel, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Pali Rohár, Dmitry Torokhov
In-Reply-To: <20260310-b4-is_err_or_null-v1-0-bd63b656022d@avm.de>

On Tue, Mar 10, 2026 at 12:48:26PM +0100, Philipp Hahn wrote:
> While doing some static code analysis I stumbled over a common pattern,
> where IS_ERR() is combined with a NULL check. For that there is
> IS_ERR_OR_NULL().

... and valid uses of IS_ERR_OR_NULL are rare as hen teeth.
Most of those are "I'm not sure how this function returns an
error, let's use that just in case".

Please, do not introduce more of that crap.


* [PATCH v3 7/7] mshv: Allocate pfns array only for pinned regions
From: Stanislav Kinsburskii @ 2026-04-09 15:24 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177574802240.19719.4873018419452139691.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

Convert pfns to a pointer allocated only for pinned regions that
actually need it for share/unshare/evict operations. Unpinned
regions use NULL since HMM handles their mappings dynamically.

The pfns array was previously a flexible array member, forcing
allocation for all regions regardless of memory type. This wastes
significant memory for unpinned HMM-managed regions which don't
need persistent PFN tracking - a 1GB region wastes 2MB for an
unused array.

This also allows using kzalloc for the main structure instead of
vzalloc, improving allocation efficiency and cache locality.

Simplify unpinned region invalidation by calling the hypervisor
directly rather than tracking PFNs. The tradeoff of skipping huge
page optimization is acceptable since invalidation ranges are
typically small and not performance-critical.

Add NULL checks where pfns array is required and update cleanup
to handle conditional allocation.
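
The 2MB figure quoted above for a 1GB region can be checked with a short
userspace calculation. This is only a sketch under two assumptions that
the commit message does not state: 4 KiB pages and an 8-byte PFN entry
(the size of unsigned long on 64-bit builds). SKETCH_PAGE_SIZE and
pfns_array_bytes() are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: regions are tracked in 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Size in bytes of a per-page PFN array covering a region of the given
 * size, with one 8-byte entry per page.
 */
static uint64_t pfns_array_bytes(uint64_t region_bytes)
{
	uint64_t nr_pfns;

	/* The sketch only handles page-aligned region sizes. */
	assert(region_bytes % SKETCH_PAGE_SIZE == 0);

	nr_pfns = region_bytes / SKETCH_PAGE_SIZE;

	return nr_pfns * sizeof(uint64_t);
}
```

For a 1 GiB region this is 262144 entries of 8 bytes each, which is
2 MiB that an unpinned region never reads.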

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/mshv_regions.c |   61 ++++++++++++++++++++++++---------------------
 drivers/hv/mshv_root.h    |    6 +++-
 2 files changed, 37 insertions(+), 30 deletions(-)

diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
index 5a1a06ee83d2..44eb6dfd7142 100644
--- a/drivers/hv/mshv_regions.c
+++ b/drivers/hv/mshv_regions.c
@@ -243,7 +243,7 @@ struct mshv_region *mshv_region_create(enum mshv_region_type type,
 	int ret = 0;
 	u64 i;
 
-	region = vzalloc(sizeof(*region) + sizeof(unsigned long) * nr_pfns);
+	region = kzalloc_obj(*region);
 	if (!region)
 		return ERR_PTR(-ENOMEM);
 
@@ -255,6 +255,13 @@ struct mshv_region *mshv_region_create(enum mshv_region_type type,
 						   &mshv_region_mni_ops);
 		break;
 	case MSHV_REGION_TYPE_MEM_PINNED:
+		region->mreg_pfns = vmalloc_array(nr_pfns, sizeof(unsigned long));
+		if (!region->mreg_pfns) {
+			ret = -ENOMEM;
+			break;
+		}
+		for (i = 0; i < nr_pfns; i++)
+			region->mreg_pfns[i] = MSHV_INVALID_PFN;
 		break;
 	case MSHV_REGION_TYPE_MMIO:
 		region->mreg_mmio_pfn = mmio_pfn;
@@ -276,16 +283,13 @@ struct mshv_region *mshv_region_create(enum mshv_region_type type,
 	if (flags & BIT(MSHV_SET_MEM_BIT_EXECUTABLE))
 		region->hv_map_flags |= HV_MAP_GPA_EXECUTABLE;
 
-	for (i = 0; i < nr_pfns; i++)
-		region->mreg_pfns[i] = MSHV_INVALID_PFN;
-
 	mutex_init(&region->mreg_mutex);
 	kref_init(&region->mreg_refcount);
 
 	return region;
 
 free_region:
-	vfree(region);
+	kfree(region);
 	return ERR_PTR(ret);
 }
 
@@ -312,6 +316,9 @@ static int mshv_region_share(struct mshv_region *region)
 {
 	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_SHARED;
 
+	if (!region->mreg_pfns)
+		return -EINVAL;
+
 	return mshv_region_process_range(region, flags,
 					 0, region->nr_pfns,
 					 region->mreg_pfns,
@@ -340,6 +347,9 @@ static int mshv_region_unshare(struct mshv_region *region)
 {
 	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_EXCLUSIVE;
 
+	if (!region->mreg_pfns)
+		return -EINVAL;
+
 	return mshv_region_process_range(region, flags,
 					 0, region->nr_pfns,
 					 region->mreg_pfns,
@@ -380,27 +390,19 @@ static int mshv_region_remap_pfns(struct mshv_region *region,
 					 mshv_region_chunk_remap);
 }
 
-static int mshv_region_map(struct mshv_region *region)
-{
-	u32 map_flags = region->hv_map_flags;
-
-	return mshv_region_remap_pfns(region, map_flags,
-				      0, region->nr_pfns,
-				      region->mreg_pfns);
-}
-
 static void mshv_region_invalidate_pfns(struct mshv_region *region,
 					u64 pfn_offset, u64 pfn_count)
 {
 	u64 i;
 
+	if (region->mreg_type != MSHV_REGION_TYPE_MEM_PINNED)
+		return;
+
 	for (i = pfn_offset; i < pfn_offset + pfn_count; i++) {
 		if (!pfn_valid(region->mreg_pfns[i]))
 			continue;
 
-		if (region->mreg_type == MSHV_REGION_TYPE_MEM_PINNED)
-			unpin_user_page(pfn_to_page(region->mreg_pfns[i]));
-
+		unpin_user_page(pfn_to_page(region->mreg_pfns[i]));
 		region->mreg_pfns[i] = MSHV_INVALID_PFN;
 	}
 }
@@ -517,7 +519,9 @@ static void mshv_region_destroy(struct kref *ref)
 
 	mshv_region_invalidate(region);
 
-	vfree(region);
+	if (region->mreg_type == MSHV_REGION_TYPE_MEM_PINNED)
+		vfree(region->mreg_pfns);
+	kfree(region);
 }
 
 void mshv_region_put(struct mshv_region *region)
@@ -627,10 +631,9 @@ static int mshv_region_hmm_fault_and_lock(struct mshv_region *region,
  *   leaving missing pages as invalid PFN markers.
  *   Used for initial region setup.
  *
- * Collected PFNs are stored in region->mreg_pfns[] with HMM bookkeeping
- * flags cleared, then the range is mapped into the hypervisor. Present
- * PFNs get mapped with region access permissions; missing PFNs (zero
- * entries) get mapped with no-access permissions.
+ * HMM bookkeeping flags are stripped from collected PFNs before mapping.
+ * Present PFNs get mapped with region access permissions; missing PFNs
+ * (marked as MSHV_INVALID_PFN) get mapped with no-access permissions.
  *
  * Return: 0 on success, negative errno on failure.
  */
@@ -659,15 +662,17 @@ static int mshv_region_collect_and_map(struct mshv_region *region,
 		goto out;
 
 	for (i = 0; i < pfn_count; i++) {
-		if (!(pfns[i] & HMM_PFN_VALID))
+		if (!(pfns[i] & HMM_PFN_VALID)) {
+			pfns[i] = MSHV_INVALID_PFN;
 			continue;
+		}
 		/* Drop HMM_PFN_* flags to ensure PFNs are valid. */
-		region->mreg_pfns[pfn_offset + i] = pfns[i] & ~HMM_PFN_FLAGS;
+		pfns[i] &= ~HMM_PFN_FLAGS;
 	}
 
 	ret = mshv_region_remap_pfns(region, region->hv_map_flags,
 				     pfn_offset, pfn_count,
-				     region->mreg_pfns + pfn_offset);
+				     pfns);
 
 	mutex_unlock(&region->mreg_mutex);
 out:
@@ -792,8 +797,6 @@ static bool mshv_region_interval_invalidate(struct mmu_interval_notifier *mni,
 	if (ret)
 		goto out_unlock;
 
-	mshv_region_invalidate_pfns(region, pfn_offset, pfn_count);
-
 	mutex_unlock(&region->mreg_mutex);
 
 	return true;
@@ -856,7 +859,9 @@ static int mshv_map_pinned_region(struct mshv_region *region)
 		}
 	}
 
-	ret = mshv_region_map(region);
+	ret = mshv_region_remap_pfns(region, region->hv_map_flags,
+				     0, region->nr_pfns,
+				     region->mreg_pfns);
 	if (!ret)
 		return 0;
 
diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index 97659ba55418..e43bdbf1ada8 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -92,8 +92,10 @@ struct mshv_region {
 	enum mshv_region_type mreg_type;
 	struct mmu_interval_notifier mreg_mni;
 	struct mutex mreg_mutex;	/* protects region PFNs remapping */
-	u64 mreg_mmio_pfn;
-	unsigned long mreg_pfns[];
+	union {
+		unsigned long *mreg_pfns;
+		u64 mreg_mmio_pfn;
+	};
 };
 
 struct mshv_irq_ack_notifier {




* [PATCH v3 6/7] mshv: Simplify pfn array handling in region processing
From: Stanislav Kinsburskii @ 2026-04-09 15:24 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177574802240.19719.4873018419452139691.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

The current code requires passing both the full pfn array and an offset
parameter to region processing functions, forcing callees to manually
index into arrays. This approach is inflexible and makes it difficult
to work with different sources of pfn arrays.

Upcoming changes will need to pass pfn arrays obtained from the HMM
framework directly to these functions. The HMM framework returns arrays
that represent specific ranges rather than full region arrays with
offsets, making the current offset-based indexing pattern incompatible.

Refactor by having callers pass pre-offset pointers to pfn arrays and
removing offset-based indexing from callees. This allows functions to
work with any pfn array starting at index 0, regardless of its source,
and prepares the code for HMM integration.
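
For illustration only (not part of the patch): a minimal userspace sketch
of the two calling conventions, assuming hypothetical helpers sum_indexed()
and sum_preoffset() that stand in for the real callees. The point is that
a pre-offset callee treats a slice of the region array and a standalone
range array the same way.

```c
#include <assert.h>
#include <stddef.h>

/* Old style (hypothetical): callee gets the full array and indexes by offset. */
static unsigned long sum_indexed(const unsigned long *pfns,
				 size_t pfn_offset, size_t pfn_count)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < pfn_count; i++)
		sum += pfns[pfn_offset + i];
	return sum;
}

/* New style (hypothetical): caller passes pfns + pfn_offset, callee starts at 0. */
static unsigned long sum_preoffset(const unsigned long *pfns, size_t pfn_count)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < pfn_count; i++)
		sum += pfns[i];
	return sum;
}
```

With the second form, a caller holding the full region array passes
"array + offset", while a caller holding a range-sized array passes it
unchanged.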

No functional change intended.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/mshv_regions.c |   38 ++++++++++++++++++--------------------
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
index 1c318d1020fc..5a1a06ee83d2 100644
--- a/drivers/hv/mshv_regions.c
+++ b/drivers/hv/mshv_regions.c
@@ -92,7 +92,7 @@ static long mshv_region_process_pfns(struct mshv_region *region,
 	unsigned long pfn;
 	int stride, ret;
 
-	pfn = pfns[pfn_offset];
+	pfn = pfns[0];
 	if (!pfn_valid(pfn))
 		return -EINVAL;
 
@@ -102,7 +102,7 @@ static long mshv_region_process_pfns(struct mshv_region *region,
 
 	/* Start at stride since the first stride is validated */
 	for (count = stride; count < pfn_count ; count += stride) {
-		pfn = pfns[pfn_offset + count];
+		pfn = pfns[count];
 
 		/* Break if current pfn is invalid */
 		if (!pfn_valid(pfn))
@@ -157,7 +157,7 @@ static long mshv_region_process_chunk(struct mshv_region *region,
 				      unsigned long *pfns,
 				      pfn_handler_t handler)
 {
-	if (pfn_valid(pfns[pfn_offset]))
+	if (pfn_valid(pfns[0]))
 		return mshv_region_process_pfns(region, flags,
 				pfn_offset, pfn_count, pfns,
 				handler);
@@ -204,10 +204,7 @@ static int mshv_region_process_range(struct mshv_region *region,
 	if (end > region->nr_pfns)
 		return -EINVAL;
 
-	start = pfn_offset;
-	end = pfn_offset + 1;
-
-	while (end < pfn_offset + pfn_count) {
+	for (start = 0, end = 1; end < pfn_count; ) {
 		/*
 		 * Accumulate contiguous pfns with the same validity
 		 * (valid or not).
@@ -218,8 +215,9 @@ static int mshv_region_process_range(struct mshv_region *region,
 		}
 
 		ret = mshv_region_process_chunk(region, flags,
-						start, end - start,
-						pfns, handler);
+						pfn_offset + start,
+						end - start,
+						pfns + start, handler);
 		if (ret < 0)
 			return ret;
 
@@ -227,8 +225,9 @@ static int mshv_region_process_range(struct mshv_region *region,
 	}
 
 	ret = mshv_region_process_chunk(region, flags,
-					start, end - start,
-					pfns, handler);
+					pfn_offset + start,
+					end - start,
+					pfns + start, handler);
 	if (ret < 0)
 		return ret;
 
@@ -296,15 +295,14 @@ static int mshv_region_chunk_share(struct mshv_region *region,
 				   unsigned long *pfns,
 				   bool huge_page)
 {
-	if (!pfn_valid(pfns[pfn_offset]))
+	if (!pfn_valid(pfns[0]))
 		return -EINVAL;
 
 	if (huge_page)
 		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
 
 	return hv_call_modify_spa_host_access(region->partition->pt_id,
-					      pfns + pfn_offset,
-					      pfn_count,
+					      pfns, pfn_count,
 					      HV_MAP_GPA_READABLE |
 					      HV_MAP_GPA_WRITABLE,
 					      flags, true);
@@ -326,15 +324,15 @@ static int mshv_region_chunk_unshare(struct mshv_region *region,
 				     unsigned long *pfns,
 				     bool huge_page)
 {
-	if (!pfn_valid(pfns[pfn_offset]))
+	if (!pfn_valid(pfns[0]))
 		return -EINVAL;
 
 	if (huge_page)
 		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
 
 	return hv_call_modify_spa_host_access(region->partition->pt_id,
-					      pfns + pfn_offset,
-					      pfn_count, 0,
+					      pfns, pfn_count,
+					      0,
 					      flags, false);
 }
 
@@ -359,7 +357,7 @@ static int mshv_region_chunk_remap(struct mshv_region *region,
 	 * hypervisor track dirty pages, enabling precopy live
 	 * migration.
 	 */
-	if (!pfn_valid(pfns[pfn_offset]))
+	if (!pfn_valid(pfns[0]))
 		flags = HV_MAP_GPA_NO_ACCESS;
 
 	if (huge_page)
@@ -368,7 +366,7 @@ static int mshv_region_chunk_remap(struct mshv_region *region,
 	return hv_call_map_ram_pfns(region->partition->pt_id,
 				    region->start_gfn + pfn_offset,
 				    pfn_count, flags,
-				    pfns + pfn_offset);
+				    pfns);
 }
 
 static int mshv_region_remap_pfns(struct mshv_region *region,
@@ -669,7 +667,7 @@ static int mshv_region_collect_and_map(struct mshv_region *region,
 
 	ret = mshv_region_remap_pfns(region, region->hv_map_flags,
 				     pfn_offset, pfn_count,
-				     region->mreg_pfns);
+				     region->mreg_pfns + pfn_offset);
 
 	mutex_unlock(&region->mreg_mutex);
 out:




* [PATCH v3 5/7] mshv: Pass pfns array explicitly through processing chain
From: Stanislav Kinsburskii @ 2026-04-09 15:24 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177574802240.19719.4873018419452139691.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

The current implementation relies on accessing region->mreg_pfns directly
within the pfn processing chain, making it difficult to use these
handlers with alternative pfn sources. This tight coupling limits
flexibility when processing pfns from different locations, such as
temporary arrays or external sources.

By threading the pfns pointer through the entire processing chain
(mshv_region_process_range, mshv_region_process_chunk, and all
handlers), we decouple the processing logic from the storage location.
This enables future enhancements like processing pfns from multiple
sources or implementing more sophisticated memory management strategies
without duplicating the core processing logic.
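
For illustration only (not part of the patch): a minimal userspace sketch
of threading the array through a handler callback. All names here
(sketch_handler_t, sketch_process, sketch_count_valid, SKETCH_INVALID_PFN)
are hypothetical stand-ins, not the driver's API. As in this patch, the
handler still indexes the array by pfn_offset.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_INVALID_PFN (~0UL)	/* assumption: stand-in for an invalid marker */

/* The handler receives the array explicitly instead of reading it from a struct. */
typedef int (*sketch_handler_t)(size_t pfn_offset, size_t pfn_count,
				const unsigned long *pfns);

/* Example handler: count entries in the range that are not the invalid marker. */
static int sketch_count_valid(size_t pfn_offset, size_t pfn_count,
			      const unsigned long *pfns)
{
	int valid = 0;

	for (size_t i = 0; i < pfn_count; i++)
		if (pfns[pfn_offset + i] != SKETCH_INVALID_PFN)
			valid++;
	return valid;
}

/* The processing step only forwards the pointer; it does not care where it came from. */
static int sketch_process(size_t pfn_offset, size_t pfn_count,
			  const unsigned long *pfns, sketch_handler_t handler)
{
	return handler(pfn_offset, pfn_count, pfns);
}
```

The same sketch_process() call works for an array stored in a long-lived
structure and for a temporary array on the caller's stack.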

No functional change intended.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/mshv_regions.c |   61 +++++++++++++++++++++++++++------------------
 1 file changed, 37 insertions(+), 24 deletions(-)

diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
index f209a34afb3a..1c318d1020fc 100644
--- a/drivers/hv/mshv_regions.c
+++ b/drivers/hv/mshv_regions.c
@@ -22,7 +22,7 @@
 
 typedef int (*pfn_handler_t)(struct mshv_region *region, u32 flags,
 			     u64 pfn_offset, u64 pfn_count,
-			     bool huge_page);
+			     unsigned long *pfns, bool huge_page);
 
 static const struct mmu_interval_notifier_ops mshv_region_mni_ops;
 
@@ -84,6 +84,7 @@ static int mshv_chunk_stride(struct page *page,
 static long mshv_region_process_pfns(struct mshv_region *region,
 				     u32 flags,
 				     u64 pfn_offset, u64 pfn_count,
+				     unsigned long *pfns,
 				     pfn_handler_t handler)
 {
 	u64 gfn = region->start_gfn + pfn_offset;
@@ -91,7 +92,7 @@ static long mshv_region_process_pfns(struct mshv_region *region,
 	unsigned long pfn;
 	int stride, ret;
 
-	pfn = region->mreg_pfns[pfn_offset];
+	pfn = pfns[pfn_offset];
 	if (!pfn_valid(pfn))
 		return -EINVAL;
 
@@ -101,7 +102,7 @@ static long mshv_region_process_pfns(struct mshv_region *region,
 
 	/* Start at stride since the first stride is validated */
 	for (count = stride; count < pfn_count ; count += stride) {
-		pfn = region->mreg_pfns[pfn_offset + count];
+		pfn = pfns[pfn_offset + count];
 
 		/* Break if current pfn is invalid */
 		if (!pfn_valid(pfn))
@@ -114,7 +115,7 @@ static long mshv_region_process_pfns(struct mshv_region *region,
 			break;
 	}
 
-	ret = handler(region, flags, pfn_offset, count, stride > 1);
+	ret = handler(region, flags, pfn_offset, count, pfns, stride > 1);
 	if (ret)
 		return ret;
 
@@ -138,11 +139,12 @@ static long mshv_region_process_pfns(struct mshv_region *region,
 static long mshv_region_process_hole(struct mshv_region *region,
 				     u32 flags,
 				     u64 pfn_offset, u64 pfn_count,
+				     unsigned long *pfns,
 				     pfn_handler_t handler)
 {
 	long ret;
 
-	ret = handler(region, flags, pfn_offset, pfn_count, 0);
+	ret = handler(region, flags, pfn_offset, pfn_count, pfns, 0);
 	if (ret)
 		return ret;
 
@@ -152,15 +154,16 @@ static long mshv_region_process_hole(struct mshv_region *region,
 static long mshv_region_process_chunk(struct mshv_region *region,
 				      u32 flags,
 				      u64 pfn_offset, u64 pfn_count,
+				      unsigned long *pfns,
 				      pfn_handler_t handler)
 {
-	if (pfn_valid(region->mreg_pfns[pfn_offset]))
+	if (pfn_valid(pfns[pfn_offset]))
 		return mshv_region_process_pfns(region, flags,
-				pfn_offset, pfn_count,
+				pfn_offset, pfn_count, pfns,
 				handler);
 	else
 		return mshv_region_process_hole(region, flags,
-				pfn_offset, pfn_count,
+				pfn_offset, pfn_count, pfns,
 				handler);
 }
 
@@ -170,12 +173,13 @@ static long mshv_region_process_chunk(struct mshv_region *region,
  * @flags     : Flags to pass to the handler.
  * @pfn_offset: Offset into the region's PFNs array to start processing.
  * @pfn_count : Number of PFNs to process.
+ * @pfns      : Pointer to an array of PFNs corresponding to the region.
  * @handler   : Callback function to handle each chunk of contiguous
  *              valid PFNs.
  *
- * Iterates over the specified range of PFNs in @region, skipping
- * invalid PFNs. For each contiguous chunk of valid PFNS, invokes
- * @handler via mshv_region_process_pfns.
+ * Iterates over the specified range of PFNs, skipping invalid PFNs.
+ * For each contiguous chunk of valid PFNS, invokes @handler via
+ * mshv_region_process_pfns.
  *
  * Note: The @handler callback must be able to handle PFNs backed by both
  * normal and huge pages.
@@ -185,6 +189,7 @@ static long mshv_region_process_chunk(struct mshv_region *region,
 static int mshv_region_process_range(struct mshv_region *region,
 				     u32 flags,
 				     u64 pfn_offset, u64 pfn_count,
+				     unsigned long *pfns,
 				     pfn_handler_t handler)
 {
 	u64 start, end;
@@ -207,15 +212,14 @@ static int mshv_region_process_range(struct mshv_region *region,
 		 * Accumulate contiguous pfns with the same validity
 		 * (valid or not).
 		 */
-		if (pfn_valid(region->mreg_pfns[start]) ==
-		    pfn_valid(region->mreg_pfns[end])) {
+		if (pfn_valid(pfns[start]) == pfn_valid(pfns[end])) {
 			end++;
 			continue;
 		}
 
 		ret = mshv_region_process_chunk(region, flags,
 						start, end - start,
-						handler);
+						pfns, handler);
 		if (ret < 0)
 			return ret;
 
@@ -224,7 +228,7 @@ static int mshv_region_process_range(struct mshv_region *region,
 
 	ret = mshv_region_process_chunk(region, flags,
 					start, end - start,
-					handler);
+					pfns, handler);
 	if (ret < 0)
 		return ret;
 
@@ -289,16 +293,17 @@ struct mshv_region *mshv_region_create(enum mshv_region_type type,
 static int mshv_region_chunk_share(struct mshv_region *region,
 				   u32 flags,
 				   u64 pfn_offset, u64 pfn_count,
+				   unsigned long *pfns,
 				   bool huge_page)
 {
-	if (!pfn_valid(region->mreg_pfns[pfn_offset]))
+	if (!pfn_valid(pfns[pfn_offset]))
 		return -EINVAL;
 
 	if (huge_page)
 		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
 
 	return hv_call_modify_spa_host_access(region->partition->pt_id,
-					      region->mreg_pfns + pfn_offset,
+					      pfns + pfn_offset,
 					      pfn_count,
 					      HV_MAP_GPA_READABLE |
 					      HV_MAP_GPA_WRITABLE,
@@ -311,22 +316,24 @@ static int mshv_region_share(struct mshv_region *region)
 
 	return mshv_region_process_range(region, flags,
 					 0, region->nr_pfns,
+					 region->mreg_pfns,
 					 mshv_region_chunk_share);
 }
 
 static int mshv_region_chunk_unshare(struct mshv_region *region,
 				     u32 flags,
 				     u64 pfn_offset, u64 pfn_count,
+				     unsigned long *pfns,
 				     bool huge_page)
 {
-	if (!pfn_valid(region->mreg_pfns[pfn_offset]))
+	if (!pfn_valid(pfns[pfn_offset]))
 		return -EINVAL;
 
 	if (huge_page)
 		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
 
 	return hv_call_modify_spa_host_access(region->partition->pt_id,
-					      region->mreg_pfns + pfn_offset,
+					      pfns + pfn_offset,
 					      pfn_count, 0,
 					      flags, false);
 }
@@ -337,12 +344,14 @@ static int mshv_region_unshare(struct mshv_region *region)
 
 	return mshv_region_process_range(region, flags,
 					 0, region->nr_pfns,
+					 region->mreg_pfns,
 					 mshv_region_chunk_unshare);
 }
 
 static int mshv_region_chunk_remap(struct mshv_region *region,
 				   u32 flags,
 				   u64 pfn_offset, u64 pfn_count,
+				   unsigned long *pfns,
 				   bool huge_page)
 {
 	/*
@@ -350,7 +359,7 @@ static int mshv_region_chunk_remap(struct mshv_region *region,
 	 * hypervisor track dirty pages, enabling precopy live
 	 * migration.
 	 */
-	if (!pfn_valid(region->mreg_pfns[pfn_offset]))
+	if (!pfn_valid(pfns[pfn_offset]))
 		flags = HV_MAP_GPA_NO_ACCESS;
 
 	if (huge_page)
@@ -359,15 +368,17 @@ static int mshv_region_chunk_remap(struct mshv_region *region,
 	return hv_call_map_ram_pfns(region->partition->pt_id,
 				    region->start_gfn + pfn_offset,
 				    pfn_count, flags,
-				    region->mreg_pfns + pfn_offset);
+				    pfns + pfn_offset);
 }
 
 static int mshv_region_remap_pfns(struct mshv_region *region,
 				  u32 map_flags,
-				  u64 pfn_offset, u64 pfn_count)
+				  u64 pfn_offset, u64 pfn_count,
+				  unsigned long *pfns)
 {
 	return mshv_region_process_range(region, map_flags,
 					 pfn_offset, pfn_count,
+					 pfns,
 					 mshv_region_chunk_remap);
 }
 
@@ -376,7 +387,8 @@ static int mshv_region_map(struct mshv_region *region)
 	u32 map_flags = region->hv_map_flags;
 
 	return mshv_region_remap_pfns(region, map_flags,
-				      0, region->nr_pfns);
+				      0, region->nr_pfns,
+				      region->mreg_pfns);
 }
 
 static void mshv_region_invalidate_pfns(struct mshv_region *region,
@@ -656,7 +668,8 @@ static int mshv_region_collect_and_map(struct mshv_region *region,
 	}
 
 	ret = mshv_region_remap_pfns(region, region->hv_map_flags,
-				     pfn_offset, pfn_count);
+				     pfn_offset, pfn_count,
+				     region->mreg_pfns);
 
 	mutex_unlock(&region->mreg_mutex);
 out:




* [PATCH v3 4/7] mshv: Optimize memory region mapping operations
From: Stanislav Kinsburskii @ 2026-04-09 15:24 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177574802240.19719.4873018419452139691.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

Two specific operations don't require PFN iteration: region unmapping
and region remapping with no access. For unmapping, all frames in MSHV
memory regions are guaranteed to be mapped with page access, so we can
unmap them all without checking individual PFNs. For remapping with no
access, all frames are already mapped with page access, allowing us to
remap them all in one pass.

Since neither operation needs PFN validation, iterating over PFNs is
redundant. Batch operations into large page-aligned chunks followed by
remaining pages. This eliminates PFN traversal for these operations,
requires no additional hypercalls compared to the PFN-checking approach,
and provides the simplest possible sequential execution path.

The optimization utilizes HV_MAP_GPA_LARGE_PAGE and
HV_UNMAP_GPA_LARGE_PAGE flags for aligned portions, processing only the
remainder with base page granularity. This removes
mshv_region_chunk_unmap() and eliminates PFN iteration for unmap and
no-access operations, reducing code complexity.
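
For illustration only (not part of the patch): a minimal userspace sketch
of the head/aligned/tail split used for batching. The names split_range,
struct range_split and SKETCH_PTRS_PER_PMD are hypothetical, and the value
512 assumes 4 KiB base pages with 2 MiB large pages.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PTRS_PER_PMD 512ULL	/* assumption: 4 KiB pages, 2 MiB large pages */

struct range_split {
	uint64_t head;		/* base pages before the first large-page boundary */
	uint64_t aligned;	/* pages that can be handled as whole large pages */
	uint64_t tail;		/* base pages left after the aligned part */
};

static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);	/* a must be a power of two */
}

static uint64_t align_down(uint64_t x, uint64_t a)
{
	return x & ~(a - 1);		/* a must be a power of two */
}

static struct range_split split_range(uint64_t gfn, uint64_t nr_pfns)
{
	struct range_split s;
	uint64_t to_boundary = align_up(gfn, SKETCH_PTRS_PER_PMD) - gfn;

	s.head = to_boundary < nr_pfns ? to_boundary : nr_pfns;
	s.aligned = align_down(nr_pfns - s.head, SKETCH_PTRS_PER_PMD);
	s.tail = nr_pfns - s.head - s.aligned;
	return s;
}
```

Each non-zero part then maps to at most one call, so a range needs no
more than three calls regardless of its length.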

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/mshv_regions.c |   87 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 68 insertions(+), 19 deletions(-)

diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
index 2c4215381e0b..f209a34afb3a 100644
--- a/drivers/hv/mshv_regions.c
+++ b/drivers/hv/mshv_regions.c
@@ -449,27 +449,38 @@ static int mshv_region_pin(struct mshv_region *region)
 	return ret < 0 ? ret : -ENOMEM;
 }
 
-static int mshv_region_chunk_unmap(struct mshv_region *region,
-				   u32 flags,
-				   u64 pfn_offset, u64 pfn_count,
-				   bool huge_page)
+static int mshv_region_unmap(struct mshv_region *region)
 {
-	if (!pfn_valid(region->mreg_pfns[pfn_offset]))
-		return 0;
+	u64 gfn, nr_pfns, starting_pfns, aligned_pfns, remaining_pfns;
+	int ret = 0;
 
-	if (huge_page)
-		flags |= HV_UNMAP_GPA_LARGE_PAGE;
+	gfn = region->start_gfn;
+	nr_pfns = region->nr_pfns;
 
-	return hv_call_unmap_pfns(region->partition->pt_id,
-				  region->start_gfn + pfn_offset,
-				  pfn_count, flags);
-}
+	starting_pfns = min(ALIGN(gfn, PTRS_PER_PMD) - gfn, nr_pfns);
+	aligned_pfns = ALIGN_DOWN(nr_pfns - starting_pfns, PTRS_PER_PMD);
+	remaining_pfns = nr_pfns - aligned_pfns - starting_pfns;
 
-static int mshv_region_unmap(struct mshv_region *region)
-{
-	return mshv_region_process_range(region, 0,
-					 0, region->nr_pfns,
-					 mshv_region_chunk_unmap);
+	if (starting_pfns)
+		ret = hv_call_unmap_pfns(region->partition->pt_id,
+					 gfn, starting_pfns,
+					 0);
+
+	gfn += starting_pfns;
+
+	if (!ret && aligned_pfns)
+		ret = hv_call_unmap_pfns(region->partition->pt_id,
+					 gfn, aligned_pfns,
+					 HV_UNMAP_GPA_LARGE_PAGE);
+
+	gfn += aligned_pfns;
+
+	if (!ret && remaining_pfns)
+		ret = hv_call_unmap_pfns(region->partition->pt_id,
+					 gfn, remaining_pfns,
+					 0);
+
+	return ret;
 }
 
 static void mshv_region_destroy(struct kref *ref)
@@ -684,6 +695,45 @@ bool mshv_region_handle_gfn_fault(struct mshv_region *region, u64 gfn)
 	return !ret;
 }
 
+static int mshv_region_map_no_access(struct mshv_region *region,
+				     u64 pfn_offset, u64 pfn_count)
+{
+	u64 gfn, nr_pfns, starting_pfns, aligned_pfns, remaining_pfns;
+	int ret = 0;
+
+	gfn = region->start_gfn + pfn_offset;
+	nr_pfns = pfn_count;
+
+	starting_pfns = min(ALIGN(gfn, PTRS_PER_PMD) - gfn, nr_pfns);
+	aligned_pfns = ALIGN_DOWN(nr_pfns - starting_pfns, PTRS_PER_PMD);
+	remaining_pfns = nr_pfns - aligned_pfns - starting_pfns;
+
+	if (starting_pfns)
+		ret = hv_call_map_ram_pfns(region->partition->pt_id,
+					   gfn, starting_pfns,
+					   HV_MAP_GPA_NO_ACCESS,
+					   NULL);
+
+	gfn += starting_pfns;
+
+	if (!ret && aligned_pfns)
+		ret = hv_call_map_ram_pfns(region->partition->pt_id,
+					   gfn, aligned_pfns,
+					   HV_MAP_GPA_NO_ACCESS |
+					   HV_MAP_GPA_LARGE_PAGE,
+					   NULL);
+
+	gfn += aligned_pfns;
+
+	if (!ret && remaining_pfns)
+		ret = hv_call_map_ram_pfns(region->partition->pt_id,
+					   gfn, remaining_pfns,
+					   HV_MAP_GPA_NO_ACCESS,
+					   NULL);
+
+	return ret;
+}
+
 /**
  * mshv_region_interval_invalidate - Invalidate a range of memory region
  * @mni: Pointer to the mmu_interval_notifier structure
@@ -727,8 +777,7 @@ static bool mshv_region_interval_invalidate(struct mmu_interval_notifier *mni,
 
 	mmu_interval_set_seq(mni, cur_seq);
 
-	ret = mshv_region_remap_pfns(region, HV_MAP_GPA_NO_ACCESS,
-				     pfn_offset, pfn_count);
+	ret = mshv_region_map_no_access(region, pfn_offset, pfn_count);
 	if (ret)
 		goto out_unlock;
 




* [PATCH v3 3/7] mshv: Rename mshv_mem_region to mshv_region
From: Stanislav Kinsburskii @ 2026-04-09 15:24 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177574802240.19719.4873018419452139691.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

The mshv_mem_region structure represents guest address space regions,
which can be either RAM-backed memory or memory-mapped IO regions
without physical backing. The "mem_" prefix incorrectly suggests the
structure only handles memory regions, creating confusion about its
actual purpose.

Remove the "mem_" prefix to align with existing function naming
(mshv_region_map, mshv_region_pin, etc.) and accurately reflect that
this structure manages arbitrary guest address space mappings
regardless of their backing type.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/mshv_regions.c   |   74 ++++++++++++++++++++++---------------------
 drivers/hv/mshv_root.h      |   18 +++++-----
 drivers/hv/mshv_root_main.c |   20 ++++++------
 3 files changed, 56 insertions(+), 56 deletions(-)

diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
index 70cd0857a28e..2c4215381e0b 100644
--- a/drivers/hv/mshv_regions.c
+++ b/drivers/hv/mshv_regions.c
@@ -20,7 +20,7 @@
 #define MSHV_MAP_FAULT_IN_PAGES				PTRS_PER_PMD
 #define MSHV_INVALID_PFN				ULONG_MAX
 
-typedef int (*pfn_handler_t)(struct mshv_mem_region *region, u32 flags,
+typedef int (*pfn_handler_t)(struct mshv_region *region, u32 flags,
 			     u64 pfn_offset, u64 pfn_count,
 			     bool huge_page);
 
@@ -81,7 +81,7 @@ static int mshv_chunk_stride(struct page *page,
  *
  * Return: Number of pages handled, or negative error code.
  */
-static long mshv_region_process_pfns(struct mshv_mem_region *region,
+static long mshv_region_process_pfns(struct mshv_region *region,
 				     u32 flags,
 				     u64 pfn_offset, u64 pfn_count,
 				     pfn_handler_t handler)
@@ -135,7 +135,7 @@ static long mshv_region_process_pfns(struct mshv_mem_region *region,
  *
  * Return: Number of PFNs handled, or negative error code.
  */
-static long mshv_region_process_hole(struct mshv_mem_region *region,
+static long mshv_region_process_hole(struct mshv_region *region,
 				     u32 flags,
 				     u64 pfn_offset, u64 pfn_count,
 				     pfn_handler_t handler)
@@ -149,7 +149,7 @@ static long mshv_region_process_hole(struct mshv_mem_region *region,
 	return pfn_count;
 }
 
-static long mshv_region_process_chunk(struct mshv_mem_region *region,
+static long mshv_region_process_chunk(struct mshv_region *region,
 				      u32 flags,
 				      u64 pfn_offset, u64 pfn_count,
 				      pfn_handler_t handler)
@@ -182,7 +182,7 @@ static long mshv_region_process_chunk(struct mshv_mem_region *region,
  *
  * Returns 0 on success, or a negative error code on failure.
  */
-static int mshv_region_process_range(struct mshv_mem_region *region,
+static int mshv_region_process_range(struct mshv_region *region,
 				     u32 flags,
 				     u64 pfn_offset, u64 pfn_count,
 				     pfn_handler_t handler)
@@ -231,12 +231,12 @@ static int mshv_region_process_range(struct mshv_mem_region *region,
 	return 0;
 }
 
-struct mshv_mem_region *mshv_region_create(enum mshv_region_type type,
-					   u64 guest_pfn, u64 nr_pfns,
-					   u64 uaddr, u32 flags,
-					   ulong mmio_pfn)
+struct mshv_region *mshv_region_create(enum mshv_region_type type,
+				       u64 guest_pfn, u64 nr_pfns,
+				       u64 uaddr, u32 flags,
+				       ulong mmio_pfn)
 {
-	struct mshv_mem_region *region;
+	struct mshv_region *region;
 	int ret = 0;
 	u64 i;
 
@@ -286,7 +286,7 @@ struct mshv_mem_region *mshv_region_create(enum mshv_region_type type,
 	return ERR_PTR(ret);
 }
 
-static int mshv_region_chunk_share(struct mshv_mem_region *region,
+static int mshv_region_chunk_share(struct mshv_region *region,
 				   u32 flags,
 				   u64 pfn_offset, u64 pfn_count,
 				   bool huge_page)
@@ -305,7 +305,7 @@ static int mshv_region_chunk_share(struct mshv_mem_region *region,
 					      flags, true);
 }
 
-static int mshv_region_share(struct mshv_mem_region *region)
+static int mshv_region_share(struct mshv_region *region)
 {
 	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_SHARED;
 
@@ -314,7 +314,7 @@ static int mshv_region_share(struct mshv_mem_region *region)
 					 mshv_region_chunk_share);
 }
 
-static int mshv_region_chunk_unshare(struct mshv_mem_region *region,
+static int mshv_region_chunk_unshare(struct mshv_region *region,
 				     u32 flags,
 				     u64 pfn_offset, u64 pfn_count,
 				     bool huge_page)
@@ -331,7 +331,7 @@ static int mshv_region_chunk_unshare(struct mshv_mem_region *region,
 					      flags, false);
 }
 
-static int mshv_region_unshare(struct mshv_mem_region *region)
+static int mshv_region_unshare(struct mshv_region *region)
 {
 	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_EXCLUSIVE;
 
@@ -340,7 +340,7 @@ static int mshv_region_unshare(struct mshv_mem_region *region)
 					 mshv_region_chunk_unshare);
 }
 
-static int mshv_region_chunk_remap(struct mshv_mem_region *region,
+static int mshv_region_chunk_remap(struct mshv_region *region,
 				   u32 flags,
 				   u64 pfn_offset, u64 pfn_count,
 				   bool huge_page)
@@ -362,7 +362,7 @@ static int mshv_region_chunk_remap(struct mshv_mem_region *region,
 				    region->mreg_pfns + pfn_offset);
 }
 
-static int mshv_region_remap_pfns(struct mshv_mem_region *region,
+static int mshv_region_remap_pfns(struct mshv_region *region,
 				  u32 map_flags,
 				  u64 pfn_offset, u64 pfn_count)
 {
@@ -371,7 +371,7 @@ static int mshv_region_remap_pfns(struct mshv_mem_region *region,
 					 mshv_region_chunk_remap);
 }
 
-static int mshv_region_map(struct mshv_mem_region *region)
+static int mshv_region_map(struct mshv_region *region)
 {
 	u32 map_flags = region->hv_map_flags;
 
@@ -379,7 +379,7 @@ static int mshv_region_map(struct mshv_mem_region *region)
 				      0, region->nr_pfns);
 }
 
-static void mshv_region_invalidate_pfns(struct mshv_mem_region *region,
+static void mshv_region_invalidate_pfns(struct mshv_region *region,
 					u64 pfn_offset, u64 pfn_count)
 {
 	u64 i;
@@ -395,12 +395,12 @@ static void mshv_region_invalidate_pfns(struct mshv_mem_region *region,
 	}
 }
 
-static void mshv_region_invalidate(struct mshv_mem_region *region)
+static void mshv_region_invalidate(struct mshv_region *region)
 {
 	mshv_region_invalidate_pfns(region, 0, region->nr_pfns);
 }
 
-static int mshv_region_pin(struct mshv_mem_region *region)
+static int mshv_region_pin(struct mshv_region *region)
 {
 	u64 done_count, nr_pfns, i;
 	unsigned long *pfns;
@@ -449,7 +449,7 @@ static int mshv_region_pin(struct mshv_mem_region *region)
 	return ret < 0 ? ret : -ENOMEM;
 }
 
-static int mshv_region_chunk_unmap(struct mshv_mem_region *region,
+static int mshv_region_chunk_unmap(struct mshv_region *region,
 				   u32 flags,
 				   u64 pfn_offset, u64 pfn_count,
 				   bool huge_page)
@@ -465,7 +465,7 @@ static int mshv_region_chunk_unmap(struct mshv_mem_region *region,
 				  pfn_count, flags);
 }
 
-static int mshv_region_unmap(struct mshv_mem_region *region)
+static int mshv_region_unmap(struct mshv_region *region)
 {
 	return mshv_region_process_range(region, 0,
 					 0, region->nr_pfns,
@@ -474,8 +474,8 @@ static int mshv_region_unmap(struct mshv_mem_region *region)
 
 static void mshv_region_destroy(struct kref *ref)
 {
-	struct mshv_mem_region *region =
-		container_of(ref, struct mshv_mem_region, mreg_refcount);
+	struct mshv_region *region =
+		container_of(ref, struct mshv_region, mreg_refcount);
 	struct mshv_partition *partition = region->partition;
 	int ret;
 
@@ -499,12 +499,12 @@ static void mshv_region_destroy(struct kref *ref)
 	vfree(region);
 }
 
-void mshv_region_put(struct mshv_mem_region *region)
+void mshv_region_put(struct mshv_region *region)
 {
 	kref_put(&region->mreg_refcount, mshv_region_destroy);
 }
 
-int mshv_region_get(struct mshv_mem_region *region)
+int mshv_region_get(struct mshv_region *region)
 {
 	return kref_get_unless_zero(&region->mreg_refcount);
 }
@@ -534,7 +534,7 @@ int mshv_region_get(struct mshv_mem_region *region)
  *
  * Return: 0 on success, a negative error code otherwise.
  */
-static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
+static int mshv_region_hmm_fault_and_lock(struct mshv_region *region,
 					  unsigned long start,
 					  unsigned long end,
 					  unsigned long *pfns,
@@ -613,7 +613,7 @@ static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
  *
  * Return: 0 on success, negative errno on failure.
  */
-static int mshv_region_collect_and_map(struct mshv_mem_region *region,
+static int mshv_region_collect_and_map(struct mshv_region *region,
 				       u64 pfn_offset, u64 pfn_count,
 				       bool do_fault)
 {
@@ -653,14 +653,14 @@ static int mshv_region_collect_and_map(struct mshv_mem_region *region,
 	return ret;
 }
 
-static int mshv_region_range_fault(struct mshv_mem_region *region,
+static int mshv_region_range_fault(struct mshv_region *region,
 				   u64 pfn_offset, u64 pfn_count)
 {
 	return mshv_region_collect_and_map(region, pfn_offset, pfn_count,
 					   true);
 }
 
-bool mshv_region_handle_gfn_fault(struct mshv_mem_region *region, u64 gfn)
+bool mshv_region_handle_gfn_fault(struct mshv_region *region, u64 gfn)
 {
 	u64 pfn_offset, pfn_count;
 	int ret;
@@ -706,9 +706,9 @@ static bool mshv_region_interval_invalidate(struct mmu_interval_notifier *mni,
 					    const struct mmu_notifier_range *range,
 					    unsigned long cur_seq)
 {
-	struct mshv_mem_region *region = container_of(mni,
-						      struct mshv_mem_region,
-						      mreg_mni);
+	struct mshv_region *region = container_of(mni,
+						  struct mshv_region,
+						  mreg_mni);
 	u64 pfn_offset, pfn_count;
 	unsigned long mstart, mend;
 	int ret = -EPERM;
@@ -767,7 +767,7 @@ static const struct mmu_interval_notifier_ops mshv_region_mni_ops = {
  *
  * Return: 0 on success, negative error code on failure.
  */
-static int mshv_map_pinned_region(struct mshv_mem_region *region)
+static int mshv_map_pinned_region(struct mshv_region *region)
 {
 	struct mshv_partition *partition = region->partition;
 	int ret;
@@ -823,13 +823,13 @@ static int mshv_map_pinned_region(struct mshv_mem_region *region)
 	return ret;
 }
 
-static int mshv_map_movable_region(struct mshv_mem_region *region)
+static int mshv_map_movable_region(struct mshv_region *region)
 {
 	return mshv_region_collect_and_map(region, 0, region->nr_pfns,
 					   false);
 }
 
-static int mshv_map_mmio_region(struct mshv_mem_region *region)
+static int mshv_map_mmio_region(struct mshv_region *region)
 {
 	struct mshv_partition *partition = region->partition;
 
@@ -838,7 +838,7 @@ static int mshv_map_mmio_region(struct mshv_mem_region *region)
 				     region->nr_pfns);
 }
 
-int mshv_map_region(struct mshv_mem_region *region)
+int mshv_map_region(struct mshv_region *region)
 {
 	switch (region->mreg_type) {
 	case MSHV_REGION_TYPE_MEM_PINNED:
diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index 2bcdfa070517..97659ba55418 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -81,7 +81,7 @@ enum mshv_region_type {
 	MSHV_REGION_TYPE_MMIO
 };
 
-struct mshv_mem_region {
+struct mshv_region {
 	struct hlist_node hnode;
 	struct kref mreg_refcount;
 	u64 nr_pfns;
@@ -367,13 +367,13 @@ extern struct mshv_root mshv_root;
 extern enum hv_scheduler_type hv_scheduler_type;
 extern u8 * __percpu *hv_synic_eventring_tail;
 
-struct mshv_mem_region *mshv_region_create(enum mshv_region_type type,
-					   u64 guest_pfn, u64 nr_pfns,
-					   u64 uaddr, u32 flags,
-					   ulong mmio_pfn);
-void mshv_region_put(struct mshv_mem_region *region);
-int mshv_region_get(struct mshv_mem_region *region);
-bool mshv_region_handle_gfn_fault(struct mshv_mem_region *region, u64 gfn);
-int mshv_map_region(struct mshv_mem_region *region);
+struct mshv_region *mshv_region_create(enum mshv_region_type type,
+				       u64 guest_pfn, u64 nr_pfns,
+				       u64 uaddr, u32 flags,
+				       ulong mmio_pfn);
+void mshv_region_put(struct mshv_region *region);
+int mshv_region_get(struct mshv_region *region);
+bool mshv_region_handle_gfn_fault(struct mshv_region *region, u64 gfn);
+int mshv_map_region(struct mshv_region *region);
 
 #endif /* _MSHV_ROOT_H_ */
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index 3bfa9e9c575f..9d83a2348655 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -612,10 +612,10 @@ static long mshv_run_vp_with_root_scheduler(struct mshv_vp *vp)
 static_assert(sizeof(struct hv_message) <= MSHV_RUN_VP_BUF_SZ,
 	      "sizeof(struct hv_message) must not exceed MSHV_RUN_VP_BUF_SZ");
 
-static struct mshv_mem_region *
+static struct mshv_region *
 mshv_partition_region_by_gfn(struct mshv_partition *partition, u64 gfn)
 {
-	struct mshv_mem_region *region;
+	struct mshv_region *region;
 
 	hlist_for_each_entry(region, &partition->pt_mem_regions, hnode) {
 		if (gfn >= region->start_gfn &&
@@ -626,10 +626,10 @@ mshv_partition_region_by_gfn(struct mshv_partition *partition, u64 gfn)
 	return NULL;
 }
 
-static struct mshv_mem_region *
+static struct mshv_region *
 mshv_partition_region_by_gfn_get(struct mshv_partition *p, u64 gfn)
 {
-	struct mshv_mem_region *region;
+	struct mshv_region *region;
 
 	spin_lock(&p->pt_mem_regions_lock);
 	region = mshv_partition_region_by_gfn(p, gfn);
@@ -656,7 +656,7 @@ mshv_partition_region_by_gfn_get(struct mshv_partition *p, u64 gfn)
 static bool mshv_handle_gpa_intercept(struct mshv_vp *vp)
 {
 	struct mshv_partition *p = vp->vp_partition;
-	struct mshv_mem_region *region;
+	struct mshv_region *region;
 	bool ret = false;
 	u64 gfn;
 #if defined(CONFIG_X86_64)
@@ -1217,9 +1217,9 @@ static void mshv_async_hvcall_handler(void *data, u64 *status)
  */
 static int mshv_partition_create_region(struct mshv_partition *partition,
 					struct mshv_user_mem_region *mem,
-					struct mshv_mem_region **regionpp)
+					struct mshv_region **regionpp)
 {
-	struct mshv_mem_region *rg;
+	struct mshv_region *rg;
 	enum mshv_region_type type;
 	u64 nr_pfns = HVPFN_DOWN(mem->size);
 	struct vm_area_struct *vma;
@@ -1282,7 +1282,7 @@ static long
 mshv_map_user_memory(struct mshv_partition *partition,
 		     struct mshv_user_mem_region mem)
 {
-	struct mshv_mem_region *region;
+	struct mshv_region *region;
 	long ret;
 
 	if (mem.flags & BIT(MSHV_SET_MEM_BIT_UNMAP) ||
@@ -1318,7 +1318,7 @@ static long
 mshv_unmap_user_memory(struct mshv_partition *partition,
 		       struct mshv_user_mem_region mem)
 {
-	struct mshv_mem_region *region;
+	struct mshv_region *region;
 
 	if (!(mem.flags & BIT(MSHV_SET_MEM_BIT_UNMAP)))
 		return -EINVAL;
@@ -1690,7 +1690,7 @@ remove_partition(struct mshv_partition *partition)
 static void destroy_partition(struct mshv_partition *partition)
 {
 	struct mshv_vp *vp;
-	struct mshv_mem_region *region;
+	struct mshv_region *region;
 	struct hlist_node *n;
 	int i;
 



^ permalink raw reply related

* [PATCH v3 2/7] mshv: Improve code readability with handler function typedef
From: Stanislav Kinsburskii @ 2026-04-09 15:24 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177574802240.19719.4873018419452139691.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

The inline function pointer declarations in mshv_region_process_*
functions make the code harder to read and maintain. Each function
signature repeats the same lengthy callback parameter definition,
adding visual noise and making the actual logic less clear.

Introduce pfn_handler_t typedef to replace the repeated inline
function pointer declarations. This simplifies function signatures,
makes the code more maintainable, and follows common kernel
patterns for callback handling.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/mshv_regions.c |   28 ++++++++--------------------
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
index a85d18e2c279..70cd0857a28e 100644
--- a/drivers/hv/mshv_regions.c
+++ b/drivers/hv/mshv_regions.c
@@ -20,6 +20,10 @@
 #define MSHV_MAP_FAULT_IN_PAGES				PTRS_PER_PMD
 #define MSHV_INVALID_PFN				ULONG_MAX
 
+typedef int (*pfn_handler_t)(struct mshv_mem_region *region, u32 flags,
+			     u64 pfn_offset, u64 pfn_count,
+			     bool huge_page);
+
 static const struct mmu_interval_notifier_ops mshv_region_mni_ops;
 
 /**
@@ -80,11 +84,7 @@ static int mshv_chunk_stride(struct page *page,
 static long mshv_region_process_pfns(struct mshv_mem_region *region,
 				     u32 flags,
 				     u64 pfn_offset, u64 pfn_count,
-				     int (*handler)(struct mshv_mem_region *region,
-						    u32 flags,
-						    u64 pfn_offset,
-						    u64 pfn_count,
-						    bool huge_page))
+				     pfn_handler_t handler)
 {
 	u64 gfn = region->start_gfn + pfn_offset;
 	u64 count;
@@ -138,11 +138,7 @@ static long mshv_region_process_pfns(struct mshv_mem_region *region,
 static long mshv_region_process_hole(struct mshv_mem_region *region,
 				     u32 flags,
 				     u64 pfn_offset, u64 pfn_count,
-				     int (*handler)(struct mshv_mem_region *region,
-						    u32 flags,
-						    u64 pfn_offset,
-						    u64 pfn_count,
-						    bool huge_page))
+				     pfn_handler_t handler)
 {
 	long ret;
 
@@ -156,11 +152,7 @@ static long mshv_region_process_hole(struct mshv_mem_region *region,
 static long mshv_region_process_chunk(struct mshv_mem_region *region,
 				      u32 flags,
 				      u64 pfn_offset, u64 pfn_count,
-				      int (*handler)(struct mshv_mem_region *region,
-						     u32 flags,
-						     u64 pfn_offset,
-						     u64 pfn_count,
-						     bool huge_page))
+				      pfn_handler_t handler)
 {
 	if (pfn_valid(region->mreg_pfns[pfn_offset]))
 		return mshv_region_process_pfns(region, flags,
@@ -193,11 +185,7 @@ static long mshv_region_process_chunk(struct mshv_mem_region *region,
 static int mshv_region_process_range(struct mshv_mem_region *region,
 				     u32 flags,
 				     u64 pfn_offset, u64 pfn_count,
-				     int (*handler)(struct mshv_mem_region *region,
-						    u32 flags,
-						    u64 pfn_offset,
-						    u64 pfn_count,
-						    bool huge_page))
+				     pfn_handler_t handler)
 {
 	u64 start, end;
 	long ret;



^ permalink raw reply related
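
Note on the typedef patch above: the effect of replacing repeated inline
function pointer declarations with a single typedef can be sketched in
plain userspace C. This is only an illustration under stated assumptions,
not the driver code: struct demo_region, demo_pfn_handler_t,
demo_count_handler() and demo_process_range() are hypothetical names, and
the fixed-size chunking is a simplification of what the mshv_region_process_*
helpers do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's region structure. */
struct demo_region {
	uint64_t handled_pfns;
};

/* One typedef replaces the repeated inline function pointer declaration. */
typedef int (*demo_pfn_handler_t)(struct demo_region *region, uint32_t flags,
				  uint64_t pfn_offset, uint64_t pfn_count,
				  bool huge_page);

/* Example handler: only counts the PFNs it was asked to process. */
static int demo_count_handler(struct demo_region *region, uint32_t flags,
			      uint64_t pfn_offset, uint64_t pfn_count,
			      bool huge_page)
{
	(void)flags;
	(void)pfn_offset;
	(void)huge_page;
	region->handled_pfns += pfn_count;
	return 0;
}

/* Walk a PFN range in fixed-size chunks, calling the handler per chunk. */
static int demo_process_range(struct demo_region *region, uint32_t flags,
			      uint64_t pfn_offset, uint64_t pfn_count,
			      uint64_t chunk, demo_pfn_handler_t handler)
{
	assert(chunk != 0);

	while (pfn_count) {
		uint64_t n = pfn_count < chunk ? pfn_count : chunk;
		int ret = handler(region, flags, pfn_offset, n, false);

		if (ret)
			return ret;	/* stop at the first failing chunk */

		pfn_offset += n;
		pfn_count -= n;
	}

	return 0;
}
```

With the typedef in place, every function that forwards the callback takes
a single "demo_pfn_handler_t handler" parameter, which is the readability
gain the commit message describes.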

* [PATCH v3 1/7] mshv: Consolidate region creation and mapping
From: Stanislav Kinsburskii @ 2026-04-09 15:24 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177574802240.19719.4873018419452139691.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

Consolidate region type detection and initialization into
mshv_region_create() to simplify the region creation flow. Move type
determination logic (MMIO/pinned/movable) earlier in the process and
initialize type-specific fields during creation rather than after.

This eliminates the need for mshv_region_movable_init/fini() by
handling MMU interval notifier setup directly in the constructor and
teardown in the destructor. Region mapping is also unified through a
single mshv_map_region() dispatcher that routes to the appropriate
type-specific handler.

Changes improve code organization by:
- Reducing API surface (4 fewer exported functions)
- Centralizing type determination and validation
- Making region lifecycle more explicit and easier to follow
- Removing post-construction initialization steps

The refactoring maintains existing functionality while making the
codebase more maintainable and less error-prone.

Additionally, movable region initialization now fails explicitly
if mmu_interval_notifier_insert() returns an error, rather than
silently falling back to pinned memory. This fail-fast approach
makes configuration issues more visible.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/mshv_regions.c   |   81 ++++++++++++++++++++++++++++---------------
 drivers/hv/mshv_root.h      |   14 +++----
 drivers/hv/mshv_root_main.c |   61 +++++++++++++-------------------
 3 files changed, 83 insertions(+), 73 deletions(-)

diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
index 6b703b269a4f..a85d18e2c279 100644
--- a/drivers/hv/mshv_regions.c
+++ b/drivers/hv/mshv_regions.c
@@ -20,6 +20,8 @@
 #define MSHV_MAP_FAULT_IN_PAGES				PTRS_PER_PMD
 #define MSHV_INVALID_PFN				ULONG_MAX
 
+static const struct mmu_interval_notifier_ops mshv_region_mni_ops;
+
 /**
  * mshv_chunk_stride - Compute stride for mapping guest memory
  * @page      : The page to check for huge page backing
@@ -241,16 +243,39 @@ static int mshv_region_process_range(struct mshv_mem_region *region,
 	return 0;
 }
 
-struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pfns,
-					   u64 uaddr, u32 flags)
+struct mshv_mem_region *mshv_region_create(enum mshv_region_type type,
+					   u64 guest_pfn, u64 nr_pfns,
+					   u64 uaddr, u32 flags,
+					   ulong mmio_pfn)
 {
 	struct mshv_mem_region *region;
+	int ret = 0;
 	u64 i;
 
 	region = vzalloc(sizeof(*region) + sizeof(unsigned long) * nr_pfns);
 	if (!region)
 		return ERR_PTR(-ENOMEM);
 
+	switch (type) {
+	case MSHV_REGION_TYPE_MEM_MOVABLE:
+		ret = mmu_interval_notifier_insert(&region->mreg_mni,
+						   current->mm, uaddr,
+						   nr_pfns << HV_HYP_PAGE_SHIFT,
+						   &mshv_region_mni_ops);
+		break;
+	case MSHV_REGION_TYPE_MEM_PINNED:
+		break;
+	case MSHV_REGION_TYPE_MMIO:
+		region->mreg_mmio_pfn = mmio_pfn;
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	if (ret)
+		goto free_region;
+
+	region->mreg_type = type;
 	region->nr_pfns = nr_pfns;
 	region->start_gfn = guest_pfn;
 	region->start_uaddr = uaddr;
@@ -263,9 +288,14 @@ struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pfns,
 	for (i = 0; i < nr_pfns; i++)
 		region->mreg_pfns[i] = MSHV_INVALID_PFN;
 
+	mutex_init(&region->mreg_mutex);
 	kref_init(&region->mreg_refcount);
 
 	return region;
+
+free_region:
+	vfree(region);
+	return ERR_PTR(ret);
 }
 
 static int mshv_region_chunk_share(struct mshv_mem_region *region,
@@ -462,7 +492,7 @@ static void mshv_region_destroy(struct kref *ref)
 	int ret;
 
 	if (region->mreg_type == MSHV_REGION_TYPE_MEM_MOVABLE)
-		mshv_region_movable_fini(region);
+		mmu_interval_notifier_remove(&region->mreg_mni);
 
 	if (mshv_partition_encrypted(partition)) {
 		ret = mshv_region_share(region);
@@ -736,27 +766,6 @@ static const struct mmu_interval_notifier_ops mshv_region_mni_ops = {
 	.invalidate = mshv_region_interval_invalidate,
 };
 
-void mshv_region_movable_fini(struct mshv_mem_region *region)
-{
-	mmu_interval_notifier_remove(&region->mreg_mni);
-}
-
-bool mshv_region_movable_init(struct mshv_mem_region *region)
-{
-	int ret;
-
-	ret = mmu_interval_notifier_insert(&region->mreg_mni, current->mm,
-					   region->start_uaddr,
-					   region->nr_pfns << HV_HYP_PAGE_SHIFT,
-					   &mshv_region_mni_ops);
-	if (ret)
-		return false;
-
-	mutex_init(&region->mreg_mutex);
-
-	return true;
-}
-
 /**
  * mshv_map_pinned_region - Pin and map memory regions
  * @region: Pointer to the memory region structure
@@ -770,7 +779,7 @@ bool mshv_region_movable_init(struct mshv_mem_region *region)
  *
  * Return: 0 on success, negative error code on failure.
  */
-int mshv_map_pinned_region(struct mshv_mem_region *region)
+static int mshv_map_pinned_region(struct mshv_mem_region *region)
 {
 	struct mshv_partition *partition = region->partition;
 	int ret;
@@ -826,17 +835,31 @@ int mshv_map_pinned_region(struct mshv_mem_region *region)
 	return ret;
 }
 
-int mshv_map_movable_region(struct mshv_mem_region *region)
+static int mshv_map_movable_region(struct mshv_mem_region *region)
 {
 	return mshv_region_collect_and_map(region, 0, region->nr_pfns,
 					   false);
 }
 
-int mshv_map_mmio_region(struct mshv_mem_region *region,
-			 unsigned long mmio_pfn)
+static int mshv_map_mmio_region(struct mshv_mem_region *region)
 {
 	struct mshv_partition *partition = region->partition;
 
 	return hv_call_map_mmio_pfns(partition->pt_id, region->start_gfn,
-				     mmio_pfn, region->nr_pfns);
+				     region->mreg_mmio_pfn,
+				     region->nr_pfns);
+}
+
+int mshv_map_region(struct mshv_mem_region *region)
+{
+	switch (region->mreg_type) {
+	case MSHV_REGION_TYPE_MEM_PINNED:
+		return mshv_map_pinned_region(region);
+	case MSHV_REGION_TYPE_MEM_MOVABLE:
+		return mshv_map_movable_region(region);
+	case MSHV_REGION_TYPE_MMIO:
+		return mshv_map_mmio_region(region);
+	}
+
+	return -EINVAL;
 }
diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index 1f92b9f85b60..2bcdfa070517 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -92,6 +92,7 @@ struct mshv_mem_region {
 	enum mshv_region_type mreg_type;
 	struct mmu_interval_notifier mreg_mni;
 	struct mutex mreg_mutex;	/* protects region PFNs remapping */
+	u64 mreg_mmio_pfn;
 	unsigned long mreg_pfns[];
 };
 
@@ -366,16 +367,13 @@ extern struct mshv_root mshv_root;
 extern enum hv_scheduler_type hv_scheduler_type;
 extern u8 * __percpu *hv_synic_eventring_tail;
 
-struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
-					   u64 uaddr, u32 flags);
+struct mshv_mem_region *mshv_region_create(enum mshv_region_type type,
+					   u64 guest_pfn, u64 nr_pfns,
+					   u64 uaddr, u32 flags,
+					   ulong mmio_pfn);
 void mshv_region_put(struct mshv_mem_region *region);
 int mshv_region_get(struct mshv_mem_region *region);
 bool mshv_region_handle_gfn_fault(struct mshv_mem_region *region, u64 gfn);
-void mshv_region_movable_fini(struct mshv_mem_region *region);
-bool mshv_region_movable_init(struct mshv_mem_region *region);
-int mshv_map_pinned_region(struct mshv_mem_region *region);
-int mshv_map_movable_region(struct mshv_mem_region *region);
-int mshv_map_mmio_region(struct mshv_mem_region *region,
-			 unsigned long mmio_pfn);
+int mshv_map_region(struct mshv_mem_region *region);
 
 #endif /* _MSHV_ROOT_H_ */
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index adb09350205a..3bfa9e9c575f 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -1217,11 +1217,14 @@ static void mshv_async_hvcall_handler(void *data, u64 *status)
  */
 static int mshv_partition_create_region(struct mshv_partition *partition,
 					struct mshv_user_mem_region *mem,
-					struct mshv_mem_region **regionpp,
-					bool is_mmio)
+					struct mshv_mem_region **regionpp)
 {
 	struct mshv_mem_region *rg;
+	enum mshv_region_type type;
 	u64 nr_pfns = HVPFN_DOWN(mem->size);
+	struct vm_area_struct *vma;
+	ulong mmio_pfn;
+	bool is_mmio;
 
 	/* Reject overlapping regions */
 	spin_lock(&partition->pt_mem_regions_lock);
@@ -1234,18 +1237,27 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
 	}
 	spin_unlock(&partition->pt_mem_regions_lock);
 
-	rg = mshv_region_create(mem->guest_pfn, nr_pfns,
-				mem->userspace_addr, mem->flags);
-	if (IS_ERR(rg))
-		return PTR_ERR(rg);
+	mmap_read_lock(current->mm);
+	vma = vma_lookup(current->mm, mem->userspace_addr);
+	is_mmio = vma ? !!(vma->vm_flags & (VM_IO | VM_PFNMAP)) : 0;
+	mmio_pfn = is_mmio ? vma->vm_pgoff : 0;
+	mmap_read_unlock(current->mm);
+
+	if (!vma)
+		return -EINVAL;
 
 	if (is_mmio)
-		rg->mreg_type = MSHV_REGION_TYPE_MMIO;
-	else if (mshv_partition_encrypted(partition) ||
-		 !mshv_region_movable_init(rg))
-		rg->mreg_type = MSHV_REGION_TYPE_MEM_PINNED;
+		type = MSHV_REGION_TYPE_MMIO;
+	else if (mshv_partition_encrypted(partition))
+		type = MSHV_REGION_TYPE_MEM_PINNED;
 	else
-		rg->mreg_type = MSHV_REGION_TYPE_MEM_MOVABLE;
+		type = MSHV_REGION_TYPE_MEM_MOVABLE;
+
+	rg = mshv_region_create(type, mem->guest_pfn, nr_pfns,
+				mem->userspace_addr, mem->flags,
+				mmio_pfn);
+	if (IS_ERR(rg))
+		return PTR_ERR(rg);
 
 	rg->partition = partition;
 
@@ -1271,40 +1283,17 @@ mshv_map_user_memory(struct mshv_partition *partition,
 		     struct mshv_user_mem_region mem)
 {
 	struct mshv_mem_region *region;
-	struct vm_area_struct *vma;
-	bool is_mmio;
-	ulong mmio_pfn;
 	long ret;
 
 	if (mem.flags & BIT(MSHV_SET_MEM_BIT_UNMAP) ||
 	    !access_ok((const void __user *)mem.userspace_addr, mem.size))
 		return -EINVAL;
 
-	mmap_read_lock(current->mm);
-	vma = vma_lookup(current->mm, mem.userspace_addr);
-	is_mmio = vma ? !!(vma->vm_flags & (VM_IO | VM_PFNMAP)) : 0;
-	mmio_pfn = is_mmio ? vma->vm_pgoff : 0;
-	mmap_read_unlock(current->mm);
-
-	if (!vma)
-		return -EINVAL;
-
-	ret = mshv_partition_create_region(partition, &mem, &region,
-					   is_mmio);
+	ret = mshv_partition_create_region(partition, &mem, &region);
 	if (ret)
 		return ret;
 
-	switch (region->mreg_type) {
-	case MSHV_REGION_TYPE_MEM_PINNED:
-		ret = mshv_map_pinned_region(region);
-		break;
-	case MSHV_REGION_TYPE_MEM_MOVABLE:
-		ret = mshv_map_movable_region(region);
-		break;
-	case MSHV_REGION_TYPE_MMIO:
-		ret = mshv_map_mmio_region(region, mmio_pfn);
-		break;
-	}
+	ret = mshv_map_region(region);
 
 	trace_mshv_map_user_memory(partition->pt_id, region->start_uaddr,
 				   region->start_gfn, region->nr_pfns,



^ permalink raw reply related
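
Note on the consolidation patch above: the two ideas it combines, a
constructor that performs type-specific setup and fails fast, and a single
dispatcher that routes to a per-type mapping handler, can be sketched in
userspace C as follows. All demo_* names are hypothetical, calloc() stands
in for vzalloc(), and the per-type handlers are empty placeholders; the
real driver registers an MMU interval notifier for movable regions and
issues hypercalls when mapping.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

enum demo_region_type { DEMO_PINNED, DEMO_MOVABLE, DEMO_MMIO };

struct demo_region {
	enum demo_region_type type;
	uint64_t nr_pfns;
	uint64_t mmio_pfn;
};

/*
 * Constructor: type-specific setup happens here, so the caller never sees
 * a half-initialized region. Returns 0 on success or a negative errno.
 */
static int demo_region_create(enum demo_region_type type, uint64_t nr_pfns,
			      uint64_t mmio_pfn, struct demo_region **out)
{
	struct demo_region *r;

	assert(out != NULL);

	r = calloc(1, sizeof(*r));
	if (!r)
		return -ENOMEM;

	switch (type) {
	case DEMO_MOVABLE:
		/* The driver would register its notifier here and fail on error. */
		break;
	case DEMO_PINNED:
		break;
	case DEMO_MMIO:
		r->mmio_pfn = mmio_pfn;
		break;
	default:
		free(r);	/* unknown type: fail instead of falling back */
		return -EINVAL;
	}

	r->type = type;
	r->nr_pfns = nr_pfns;
	*out = r;
	return 0;
}

/* Placeholder per-type handlers; the real ones map memory. */
static int demo_map_pinned(const struct demo_region *r)  { (void)r; return 0; }
static int demo_map_movable(const struct demo_region *r) { (void)r; return 0; }
static int demo_map_mmio(const struct demo_region *r)    { (void)r; return 0; }

/* Single entry point that routes to the type-specific handler. */
static int demo_map_region(const struct demo_region *r)
{
	switch (r->type) {
	case DEMO_PINNED:
		return demo_map_pinned(r);
	case DEMO_MOVABLE:
		return demo_map_movable(r);
	case DEMO_MMIO:
		return demo_map_mmio(r);
	}

	return -EINVAL;
}
```

Because the MMIO PFN is stored in the region at creation time, the
dispatcher needs no extra arguments, which is what lets the caller shrink
to a create call followed by a single map call.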

* [PATCH v3 0/7] mshv: Reduce memory consumption for unpinned regions
From: Stanislav Kinsburskii @ 2026-04-09 15:23 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel

This series reduces memory consumption for unpinned regions by avoiding
PFN array allocation. A 1GB unpinned region currently wastes 2MB for an
unused PFN array that HMM-managed regions don't need.

The first three patches are preparatory refactoring. Patch 1 consolidates
region creation and mapping logic, reducing API surface by 4 functions.
Patch 2 introduces a typedef for PFN handler callbacks to simplify
function signatures. Patch 3 renames mshv_mem_region to mshv_region to
align with existing function naming conventions.

Patch 4 optimizes unmap and no-access remap operations by eliminating
redundant PFN iteration when all frames are guaranteed to be mapped.
This uses large page flags for aligned chunks and removes unnecessary
helper functions.

Patches 5-6 decouple PFN processing from the region->pfns storage.
Patch 5 threads the pfns pointer explicitly through the processing
chain. Patch 6 removes offset-based indexing by having callers pass
pre-offset pointers.

Patch 7 converts the pfns array from a flexible array member to a
conditional pointer, allocated only for pinned regions that need it
for share/unshare/evict operations. This eliminates the memory waste
for unpinned regions and allows using kzalloc instead of vzalloc.

v3:
- Fix missing unmap/remap of pages before the first huge page.

v2:
- Improved commit message
- Fixed invalid vfree(region->mreg_pfns) call for MMIO-backed regions
- Fixed unpinning of already-released pages in the error path during
  pinned region creation
- Removed redundant mshv_map_region helper in favor of the new
  optimized mapping logic

---

Stanislav Kinsburskii (7):
      mshv: Consolidate region creation and mapping
      mshv: Improve code readability with handler function typedef
      mshv: Rename mshv_mem_region to mshv_region
      mshv: Optimize memory region mapping operations
      mshv: Pass pfns array explicitly through processing chain
      mshv: Simplify pfn array handling in region processing
      mshv: Allocate pfns array only for pinned regions

 drivers/hv/mshv_regions.c   |  372 ++++++++++++++++++++++++++-----------------
 drivers/hv/mshv_root.h      |   26 ++-
 drivers/hv/mshv_root_main.c |   79 ++++-----
 3 files changed, 271 insertions(+), 206 deletions(-)

^ permalink raw reply
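
Note on the cover letter above: the "2MB per 1GB" figure follows from the
allocation in mshv_region_create(), which reserves one unsigned long per
PFN. A minimal sketch of the arithmetic, assuming a 4 KiB hypervisor page
and an 8-byte unsigned long (both are assumptions about the target, and
DEMO_PAGE_SIZE and demo_pfn_array_bytes() are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed hypervisor page size: 4 KiB. */
#define DEMO_PAGE_SIZE 4096ULL

/* Bytes needed for a PFN array with one entry per page of the region. */
static uint64_t demo_pfn_array_bytes(uint64_t region_bytes, uint64_t entry_size)
{
	assert(entry_size != 0);

	return (region_bytes / DEMO_PAGE_SIZE) * entry_size;
}
```

For a 1 GiB region this gives 262144 entries of 8 bytes each, that is
2 MiB, which matches the overhead quoted in the cover letter.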

* Re: [PATCH v0 06/15] mshv: Implement mshv bridge device for VFIO
From: Stanislav Kinsburskii @ 2026-04-09 14:41 UTC (permalink / raw)
  To: Mukesh R
  Cc: linux-kernel, linux-hyperv, linux-arm-kernel, iommu, linux-pci,
	linux-arch, kys, haiyangz, wei.liu, decui, longli,
	catalin.marinas, will, tglx, mingo, bp, dave.hansen, hpa, joro,
	lpieralisi, kwilczynski, mani, robh, bhelgaas, arnd, nunodasneves,
	mhklinux, romank
In-Reply-To: <c30ede65-46c4-02b1-756a-868f9a265cf1@linux.microsoft.com>

On Tue, Apr 07, 2026 at 10:41:12AM -0700, Mukesh R wrote:
> On 1/20/26 08:09, Stanislav Kinsburskii wrote:
> > On Mon, Jan 19, 2026 at 10:42:21PM -0800, Mukesh R wrote:
> > > From: Mukesh Rathor <mrathor@linux.microsoft.com>
> > > 
> > > Add a new file to implement the VFIO-MSHV bridge pseudo device. These
> > > functions are called in the VFIO framework. Credit goes to kvm/vfio.c,
> > > from which this file was adapted.
> > > 
> > > Original author: Wei Liu <wei.liu@kernel.org>
> > > (Slightly modified from the original version).
> > > 
> > 
> > There is a Linux standard for giving credit when code is adapted from
> > another source. This doesn't follow that standard. Please fix.
> > 
> > > Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
> > > ---
> > >   drivers/hv/Makefile    |   3 +-
> > >   drivers/hv/mshv_vfio.c | 210 +++++++++++++++++++++++++++++++++++++++++
> > >   2 files changed, 212 insertions(+), 1 deletion(-)
> > >   create mode 100644 drivers/hv/mshv_vfio.c
> > > 
> > > diff --git a/drivers/hv/Makefile b/drivers/hv/Makefile
> > > index a49f93c2d245..eae003c4cb8f 100644
> > > --- a/drivers/hv/Makefile
> > > +++ b/drivers/hv/Makefile
> > > @@ -14,7 +14,8 @@ hv_vmbus-y := vmbus_drv.o \
> > >   hv_vmbus-$(CONFIG_HYPERV_TESTING)	+= hv_debugfs.o
> > >   hv_utils-y := hv_util.o hv_kvp.o hv_snapshot.o hv_utils_transport.o
> > >   mshv_root-y := mshv_root_main.o mshv_synic.o mshv_eventfd.o mshv_irq.o \
> > > -	       mshv_root_hv_call.o mshv_portid_table.o mshv_regions.o
> > > +	       mshv_root_hv_call.o mshv_portid_table.o mshv_regions.o \
> > > +               mshv_vfio.o
> > >   mshv_vtl-y := mshv_vtl_main.o
> > >   # Code that must be built-in
> > > diff --git a/drivers/hv/mshv_vfio.c b/drivers/hv/mshv_vfio.c
> > > new file mode 100644
> > > index 000000000000..6ea4d99a3bd2
> > > --- /dev/null
> > > +++ b/drivers/hv/mshv_vfio.c
> > > @@ -0,0 +1,210 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/*
> > > + * VFIO-MSHV bridge pseudo device
> > > + *
> > > + * Heavily inspired by the VFIO-KVM bridge pseudo device.
> > > + */
> > > +#include <linux/errno.h>
> > > +#include <linux/file.h>
> > > +#include <linux/list.h>
> > > +#include <linux/module.h>
> > > +#include <linux/mutex.h>
> > > +#include <linux/slab.h>
> > > +#include <linux/vfio.h>
> > > +
> > > +#include "mshv.h"
> > > +#include "mshv_root.h"
> > > +
> > > +struct mshv_vfio_file {
> > > +	struct list_head node;
> > > +	struct file *file;	/* list of struct mshv_vfio_file */
> > > +};
> > > +
> > > +struct mshv_vfio {
> > > +	struct list_head file_list;
> > > +	struct mutex lock;
> > > +};
> > > +
> > > +static bool mshv_vfio_file_is_valid(struct file *file)
> > > +{
> > > +	bool (*fn)(struct file *file);
> > > +	bool ret;
> > > +
> > > +	fn = symbol_get(vfio_file_is_valid);
> > > +	if (!fn)
> > > +		return false;
> > > +
> > > +	ret = fn(file);
> > > +
> > > +	symbol_put(vfio_file_is_valid);
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +static long mshv_vfio_file_add(struct mshv_device *mshvdev, unsigned int fd)
> > > +{
> > > +	struct mshv_vfio *mshv_vfio = mshvdev->device_private;
> > > +	struct mshv_vfio_file *mvf;
> > > +	struct file *filp;
> > > +	long ret = 0;
> > > +
> > > +	filp = fget(fd);
> > > +	if (!filp)
> > > +		return -EBADF;
> > > +
> > > +	/* Ensure the FD is a vfio FD. */
> > > +	if (!mshv_vfio_file_is_valid(filp)) {
> > > +		ret = -EINVAL;
> > > +		goto out_fput;
> > > +	}
> > > +
> > > +	mutex_lock(&mshv_vfio->lock);
> > > +
> > > +	list_for_each_entry(mvf, &mshv_vfio->file_list, node) {
> > > +		if (mvf->file == filp) {
> > > +			ret = -EEXIST;
> > > +			goto out_unlock;
> > > +		}
> > > +	}
> > > +
> > > +	mvf = kzalloc(sizeof(*mvf), GFP_KERNEL_ACCOUNT);
> > > +	if (!mvf) {
> > > +		ret = -ENOMEM;
> > > +		goto out_unlock;
> > > +	}
> > > +
> > > +	mvf->file = get_file(filp);
> > > +	list_add_tail(&mvf->node, &mshv_vfio->file_list);
> > > +
> > > +out_unlock:
> > > +	mutex_unlock(&mshv_vfio->lock);
> > > +out_fput:
> > > +	fput(filp);
> > > +	return ret;
> > > +}
> > > +
> > > +static long mshv_vfio_file_del(struct mshv_device *mshvdev, unsigned int fd)
> > > +{
> > > +	struct mshv_vfio *mshv_vfio = mshvdev->device_private;
> > > +	struct mshv_vfio_file *mvf;
> > > +	long ret;
> > > +
> > > +	CLASS(fd, f)(fd);
> > > +
> > > +	if (fd_empty(f))
> > > +		return -EBADF;
> > > +
> > > +	ret = -ENOENT;
> > > +	mutex_lock(&mshv_vfio->lock);
> > > +
> > > +	list_for_each_entry(mvf, &mshv_vfio->file_list, node) {
> > > +		if (mvf->file != fd_file(f))
> > > +			continue;
> > > +
> > > +		list_del(&mvf->node);
> > > +		fput(mvf->file);
> > > +		kfree(mvf);
> > > +		ret = 0;
> > > +		break;
> > > +	}
> > > +
> > > +	mutex_unlock(&mshv_vfio->lock);
> > > +	return ret;
> > > +}
> > > +
> > > +static long mshv_vfio_set_file(struct mshv_device *mshvdev, long attr,
> > > +			      void __user *arg)
> > > +{
> > > +	int32_t __user *argp = arg;
> > > +	int32_t fd;
> > > +
> > > +	switch (attr) {
> > > +	case MSHV_DEV_VFIO_FILE_ADD:
> > > +		if (get_user(fd, argp))
> > > +			return -EFAULT;
> > > +		return mshv_vfio_file_add(mshvdev, fd);
> > > +
> > > +	case MSHV_DEV_VFIO_FILE_DEL:
> > > +		if (get_user(fd, argp))
> > > +			return -EFAULT;
> > > +		return mshv_vfio_file_del(mshvdev, fd);
> > > +	}
> > > +
> > > +	return -ENXIO;
> > > +}
> > > +
> > > +static long mshv_vfio_set_attr(struct mshv_device *mshvdev,
> > > +			      struct mshv_device_attr *attr)
> > > +{
> > > +	switch (attr->group) {
> > > +	case MSHV_DEV_VFIO_FILE:
> > > +		return mshv_vfio_set_file(mshvdev, attr->attr,
> > > +					  u64_to_user_ptr(attr->addr));
> > > +	}
> > > +
> > > +	return -ENXIO;
> > > +}
> > > +
> > > +static long mshv_vfio_has_attr(struct mshv_device *mshvdev,
> > > +			      struct mshv_device_attr *attr)
> > > +{
> > > +	switch (attr->group) {
> > > +	case MSHV_DEV_VFIO_FILE:
> > > +		switch (attr->attr) {
> > > +		case MSHV_DEV_VFIO_FILE_ADD:
> > > +		case MSHV_DEV_VFIO_FILE_DEL:
> > > +			return 0;
> > > +		}
> > > +
> > > +		break;
> > > +	}
> > > +
> > > +	return -ENXIO;
> > > +}
> > > +
> > > +static long mshv_vfio_create_device(struct mshv_device *mshvdev, u32 type)
> > > +{
> > > +	struct mshv_device *tmp;
> > > +	struct mshv_vfio *mshv_vfio;
> > > +
> > > +	/* Only one VFIO "device" per VM */
> > > +	hlist_for_each_entry(tmp, &mshvdev->device_pt->pt_devices,
> > > +			     device_ptnode)
> > > +		if (tmp->device_ops == &mshv_vfio_device_ops)
> > > +			return -EBUSY;
> > > +
> > > +	mshv_vfio = kzalloc(sizeof(*mshv_vfio), GFP_KERNEL_ACCOUNT);
> > > +	if (mshv_vfio == NULL)
> > > +		return -ENOMEM;
> > > +
> > > +	INIT_LIST_HEAD(&mshv_vfio->file_list);
> > > +	mutex_init(&mshv_vfio->lock);
> > > +
> > > +	mshvdev->device_private = mshv_vfio;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/* This is called from mshv_device_fop_release() */
> > > +static void mshv_vfio_release_device(struct mshv_device *mshvdev)
> > > +{
> > > +	struct mshv_vfio *mv = mshvdev->device_private;
> > > +	struct mshv_vfio_file *mvf, *tmp;
> > > +
> > > +	list_for_each_entry_safe(mvf, tmp, &mv->file_list, node) {
> > > +		fput(mvf->file);
> > 
> > This put must be sync as device must be detached from domain before
> > attempting partition destruction.
> 
> Like I said in 6.6 PR, this does not attach or detach devices.
> 

You are mistaken. It absolutely does.

Thanks,
Stanislav

> > This was explicitly mentioned in the patch from which this code originated.
> > Please fix, add a comment and credits to the commit message.
> 
> That was the ".destroy" hook, which is gone.
> 
> Thanks,
> -Mukesh
> 
> 
> > Thanks,
> > Stanislav
> > 
> > 
> > > +		list_del(&mvf->node);
> > > +		kfree(mvf);
> > > +	}
> > > +
> > > +	kfree(mv);
> > > +	kfree(mshvdev);
> > > +}
> > > +
> > > +struct mshv_device_ops mshv_vfio_device_ops = {
> > > +	.device_name = "mshv-vfio",
> > > +	.device_create = mshv_vfio_create_device,
> > > +	.device_release = mshv_vfio_release_device,
> > > +	.device_set_attr = mshv_vfio_set_attr,
> > > +	.device_has_attr = mshv_vfio_has_attr,
> > > +};
> > > -- 
> > > 2.51.2.vfs.0.1
> > > 
> 

^ permalink raw reply

* [PATCH v3] tools: hv: Fix cross-compilation
From: Aditya Garg @ 2026-04-09 10:32 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli, gregkh, ssengar,
	linux-hyperv, linux-kernel, avladu, vdso, gargaditya, gargaditya
  Cc: Roman Kisel

Use the native ARCH only when it is not already set. This allows
cross-compilation, where ARCH is set explicitly.

Additionally, simplify the ARCH check to build the fcopy daemon only
for x86 and x86_64.

Fixes: 82b0945ce2c2 ("tools: hv: Add new fcopy application based on uio driver")
Reported-by: Adrian Vladu <avladu@cloudbasesolutions.com>
Closes: https://lore.kernel.org/linux-hyperv/PR3PR09MB54119DB2FD76977C62D8DD6AB04D2@PR3PR09MB5411.eurprd09.prod.outlook.com/
Co-developed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com>
Reviewed-by: Roman Kisel <romank@linux.microsoft.com>
---
Changes since v2:
    - Handle the normalized ARCH=x86 value from the top-level kernel Makefile

Changes since v1:
    - Dropped the info target printing CC, LD and ARCH

v2: https://lore.kernel.org/all/20260407122040.249733-1-gargaditya@linux.microsoft.com/
v1: https://lore.kernel.org/all/1733992114-7305-1-git-send-email-ssengar@linux.microsoft.com/
---
 tools/hv/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/hv/Makefile b/tools/hv/Makefile
index 34ffcec264ab..016753f3dd7f 100644
--- a/tools/hv/Makefile
+++ b/tools/hv/Makefile
@@ -2,7 +2,7 @@
 # Makefile for Hyper-V tools
 include ../scripts/Makefile.include
 
-ARCH := $(shell uname -m 2>/dev/null)
+ARCH ?= $(shell uname -m 2>/dev/null)
 sbindir ?= /usr/sbin
 libexecdir ?= /usr/libexec
 sharedstatedir ?= /var/lib
@@ -20,7 +20,7 @@ override CFLAGS += -O2 -Wall -g -D_GNU_SOURCE -I$(OUTPUT)include
 override CFLAGS += -Wno-address-of-packed-member
 
 ALL_TARGETS := hv_kvp_daemon hv_vss_daemon
-ifneq ($(ARCH), aarch64)
+ifneq ($(filter x86_64 x86,$(ARCH)),)
 ALL_TARGETS += hv_fcopy_uio_daemon
 endif
 ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))
-- 
2.43.0



* Re: [PATCH] x86/VMBus: Confidential VMBus for dynamic DMA transfers
From: Tianyu Lan @ 2026-04-09  2:05 UTC (permalink / raw)
  To: Easwar Hariharan
  Cc: kys, haiyangz, wei.liu, decui, longli, James.Bottomley,
	martin.petersen, apais, Tianyu Lan, linux-hyperv, linux-kernel,
	linux-scsi, vdso, mhklinux
In-Reply-To: <2a80b7a6-2cfe-4bd0-a799-ff855df7bd41@linux.microsoft.com>

On Thu, Apr 9, 2026 at 12:55 AM Easwar Hariharan
<easwar.hariharan@linux.microsoft.com> wrote:
>
> On 4/8/2026 12:31 AM, Tianyu Lan wrote:
> > Hyper-V provides Confidential VMBus to communicate between
> > device model and device guest driver via encrypted/private
> > memory in Confidential VM. The device model is in OpenHCL
> > (https://openvmm.dev/guide/user_guide/openhcl.html) that
> > plays the paravisor role.
> >
> > For a VMBus device, there are two communication methods to
> > talk with Host/Hypervisor. 1) VMBUS Ring buffer 2) Dynamic
> > DMA transfer.
> >
> > The Confidential VMBus Ring buffer has been upstreamed by
> > Roman Kisel(commit 6802d8af47d1).
> >
> > The dynamic DMA transition of VMBus device normally goes
> > through DMA core and it uses SWIOTLB as bounce buffer in
> > a CoCo VM.
> >
> > The Confidential VMBus device can do DMA directly to
> > private/encrypted memory. Because the swiotlb is decrypted
> > memory, the DMA transfer must not be bounced through the
> > swiotlb, so as to preserve confidentiality. This is different
> > from the default for Linux CoCo VMs, so not use DMA(SWIOTLB)
> > API in VMBus driver when confidential dynamic DMA transfers
> > capability is present.
> >
> > Signed-off-by: Tianyu Lan <tiala@microsoft.com>
> > ---
> >  drivers/scsi/storvsc_drv.c | 28 +++++++++++++++++++++-------
> >  include/linux/hyperv.h     |  1 +
> >  2 files changed, 22 insertions(+), 7 deletions(-)
> >
>
> Does netvsc not need this same sort of patch?
>

Hi Easwar,

Thanks for your review. AFAIK, storvsc supports the capability.
We may add such a change for the netvsc driver later, once netvsc
also supports confidential external memory.

-- 
Thanks
Tianyu Lan


* Re: [PATCH] scsi: storvsc: Handle PERSISTENT_RESERVE_IN truncation for Hyper-V vFC
From: Martin K. Petersen @ 2026-04-09  1:52 UTC (permalink / raw)
  To: Li Tian
  Cc: linux-scsi, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Long Li, James E.J. Bottomley, Martin K. Petersen, linux-hyperv,
	linux-kernel
In-Reply-To: <20260406015344.12566-1-litian@redhat.com>


Li,

> The storvsc driver has become stricter in handling SRB status codes
> returned by the Hyper-V host. When using Virtual Fibre Channel (vFC)
> passthrough, the host may return SRB_STATUS_DATA_OVERRUN for
> PERSISTENT_RESERVE_IN commands if the allocation length in the CDB
> does not match the host's expected response size.

Applied to 7.1/scsi-staging, thanks!

-- 
Martin K. Petersen


* Re: [RFC v1 1/5] PCI: hv: Create and export hv_build_logical_dev_id()
From: Easwar Hariharan @ 2026-04-08 20:20 UTC (permalink / raw)
  To: Michael Kelley
  Cc: easwar.hariharan, Yu Zhang, linux-kernel@vger.kernel.org,
	linux-hyperv@vger.kernel.org, iommu@lists.linux.dev,
	linux-pci@vger.kernel.org, kys@microsoft.com,
	haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com,
	lpieralisi@kernel.org, kwilczynski@kernel.org, mani@kernel.org,
	robh@kernel.org, bhelgaas@google.com, arnd@arndb.de,
	joro@8bytes.org, will@kernel.org, robin.murphy@arm.com,
	jacob.pan@linux.microsoft.com, nunodasneves@linux.microsoft.com,
	mrathor@linux.microsoft.com, peterz@infradead.org,
	linux-arch@vger.kernel.org
In-Reply-To: <SN6PR02MB4157098A14BE63FCA8C0A70ED480A@SN6PR02MB4157.namprd02.prod.outlook.com>

On 1/11/2026 9:36 AM, Michael Kelley wrote:
> From: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Sent: Friday, January 9, 2026 10:41 AM
>>
>> On 1/8/2026 10:46 AM, Michael Kelley wrote:
>>> From: Yu Zhang <zhangyu1@linux.microsoft.com> Sent: Monday, December 8, 2025 9:11 PM
>>>>
>>>> From: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
>>>>
>>>> Hyper-V uses a logical device ID to identify a PCI endpoint device for
>>>> child partitions. This ID will also be required for future hypercalls
>>>> used by the Hyper-V IOMMU driver.
>>>>
>>>> Refactor the logic for building this logical device ID into a standalone
>>>> helper function and export the interface for wider use.
>>>>
>>>> Signed-off-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
>>>> Signed-off-by: Yu Zhang <zhangyu1@linux.microsoft.com>
>>>> ---
>>>>  drivers/pci/controller/pci-hyperv.c | 28 ++++++++++++++++++++--------
>>>>  include/asm-generic/mshyperv.h      |  2 ++
>>>>  2 files changed, 22 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
>>>> index 146b43981b27..4b82e06b5d93 100644
>>>> --- a/drivers/pci/controller/pci-hyperv.c
>>>> +++ b/drivers/pci/controller/pci-hyperv.c
>>>> @@ -598,15 +598,31 @@ static unsigned int hv_msi_get_int_vector(struct irq_data *data)
>>>>
>>>>  #define hv_msi_prepare		pci_msi_prepare
>>>>
>>>> +/**
>>>> + * Build a "Device Logical ID" out of this PCI bus's instance GUID and the
>>>> + * function number of the device.
>>>> + */
>>>> +u64 hv_build_logical_dev_id(struct pci_dev *pdev)
>>>> +{
>>>> +	struct pci_bus *pbus = pdev->bus;
>>>> +	struct hv_pcibus_device *hbus = container_of(pbus->sysdata,
>>>> +						struct hv_pcibus_device, sysdata);
>>>> +
>>>> +	return (u64)((hbus->hdev->dev_instance.b[5] << 24) |
>>>> +		     (hbus->hdev->dev_instance.b[4] << 16) |
>>>> +		     (hbus->hdev->dev_instance.b[7] << 8)  |
>>>> +		     (hbus->hdev->dev_instance.b[6] & 0xf8) |
>>>> +		     PCI_FUNC(pdev->devfn));
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(hv_build_logical_dev_id);
>>>
>>> This change is fine for hv_irq_retarget_interrupt(), it doesn't help for the
>>> new IOMMU driver because pci-hyperv.c can (and often is) built as a module.
>>> The new Hyper-V IOMMU driver in this patch series is built-in, and so it can't
>>> use this symbol in that case -- you'll get a link error on vmlinux when building
>>> the kernel. Requiring pci-hyperv.c to *not* be built as a module would also
>>> require that the VMBus driver not be built as a module, so I don't think that's
>>> the right solution.
>>>
>>> This is a messy problem. The new IOMMU driver needs to start with a generic
>>> "struct device" for the PCI device, and somehow find the corresponding VMBus
>>> PCI pass-thru device from which it can get the VMBus instance ID. I'm thinking
>>> about ways to do this that don't depend on code and data structures that are
>>> private to the pci-hyperv.c driver, and will follow-up if I have a good suggestion.
>>
>> Thank you, Michael. FWIW, I did try to pull out the device ID components out of
>> pci-hyperv into include/linux/hyperv.h and/or a new include/linux/pci-hyperv.h
>> but it was just too messy as you say.
> 
> Yes, the current approach for getting the device ID wanders through struct
> hv_pcibus_device (which is private to the pci-hyperv driver), and through
> struct hv_device (which is a VMBus data structure). That makes the linkage
> between the PV IOMMU driver and the pci-hyperv and VMBus drivers rather
> substantial, which is not good.

Hi Michael,

I missed this, or made a mental note to follow up but forgot. Either way, Yu reminded
me about this email chain and I started looking at it this week.

> 
> But here's an idea for an alternate approach. The PV IOMMU driver doesn't
> have to generate the logical device ID on-the-fly by going to the dev_instance
> field of struct hv_device. Instead, the pci-hyperv driver can generate the logical
> device ID in hv_pci_probe(), and put it somewhere that's easy for the IOMMU
> driver to access. The logical device ID doesn't change while Linux is running, so
> stashing another copy somewhere isn't a problem.

While exploring this and consulting with Dexuan, I realized that one of the components of
the logical device ID, the PCI function number, is set only in pci_scan_device(), which runs
well into pci_scan_root_bus_bridge(), the call you identify as the point by which the
communication must have occurred.

But then, Dexuan also pointed me to hv_pci_assign_slots() with its call to wslot_to_devfn() and I'm
honestly confused how these two interact. With the current approach, it looks like whatever
devfn pci_scan_device() set is the correct function number to use for the logical device
ID, in which case, the best I can do with your suggested approach below is to inform the
pvIOMMU driver of the GUID, rather than the logical device ID itself.

Perhaps with your history, you can clarify the interaction, and/or share your thoughts
on the above?
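
To make the question concrete, here is a minimal userspace sketch (not
kernel code) of how the logical device ID in the quoted patch is composed.
It splits the value into a GUID-derived base, which is known at probe time,
and the function bits, which depend on devfn. The names
logical_dev_id_base(), logical_dev_id() and pci_func() are hypothetical
stand-ins for illustration only. The uint32_t casts are a choice of this
sketch: they keep a GUID byte of 0x80 or above from being shifted into the
sign bit of a promoted int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PCI_FUNC(): the low three bits of devfn. */
static inline uint8_t pci_func(uint8_t devfn)
{
	return devfn & 0x07;
}

/*
 * GUID-derived part of the ID, using the same byte selection as the
 * quoted patch: b[5], b[4], b[7], and b[6] with its low three bits
 * cleared to leave room for the function number.
 */
static uint32_t logical_dev_id_base(const uint8_t guid[16])
{
	assert(guid != NULL);

	return ((uint32_t)guid[5] << 24) |
	       ((uint32_t)guid[4] << 16) |
	       ((uint32_t)guid[7] << 8)  |
	       ((uint32_t)guid[6] & 0xf8);
}

/* Full ID: the base with the PCI function number in the low three bits. */
static uint64_t logical_dev_id(uint32_t base, uint8_t devfn)
{
	return (uint64_t)(base | pci_func(devfn));
}
```

Under this split, only the base could be handed over before the bus scan;
the function bits would have to be OR-ed in once devfn is known.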

> 
> So have the Hyper-V PV IOMMU driver provide an EXPORTed function to accept
> a PCI domain ID and the related logical device ID. The PV IOMMU driver is
> responsible for storing this data in a form that it can later search. hv_pci_probe()
> calls this new function when it instantiates a new PCI pass-thru device. Then when
> the IOMMU driver needs to attach a new device, it can get the PCI domain ID
> from the struct pci_dev (or struct pci_bus), search for the related logical device
> ID in its own data structure, and use it. The pci-hyperv driver has a dependency
> on the IOMMU driver, but that's a dependency in the desired direction. The
> PCI domain ID and logical device ID are just integers, so no data structures are
> shared.

In a previous reply on this thread, you raised the uniqueness issue of bytes 4 and 5
of the GUID being used to create the domain number. I thought this approach could
help with that too, but as I coded it up, I realized that using the domain number 
(not guaranteed to be unique) to search for the bus instance GUID (guaranteed to be unique)
is the wrong way around. It is unfortunately the only available key in the pci_dev
handed to the pvIOMMU driver in this approach though...

Do you think that's a fatal flaw?

> 
> Note that the pci-hyperv must inform the PV IOMMU driver of the logical
> device ID *before* create_root_hv_pci_bus() calls pci_scan_root_bus_bridge().
> The latter function eventually invokes hv_iommu_attach_dev(), which will
> need the logical device ID. See example stack trace. [1]
> 
> I don't think the pci-hyperv driver even needs to tell the IOMMU driver to
> remove the information if a PCI pass-thru device is unbound or removed, as
> the logical device ID will be the same if the device ever comes back. At worst,
> the IOMMU driver can simply replace an existing logical device ID if a new one
> is provided for the same PCI domain ID.

As above, replacing a unique GUID when a lookup on a non-unique key succeeds
may be prone to failure. The device that came "back" may not in fact be the
same device (or class of device) that went away; it may simply have the same
domain number, either because bytes 4 and 5 are identical or because of a
collision in pci_domain_nr_dynamic_ida.

Thanks,
Easwar (he/him)
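
For reference, a minimal userspace sketch of the domain-keyed lookup
discussed above, assuming a small fixed-size table and no locking. All names
here (devid_table, devid_register(), devid_lookup(), DEVID_MAX) are
hypothetical and do not exist in the kernel. The replace-on-duplicate branch
is where the non-unique key concern shows up: a second registration for the
same domain number silently overwrites the first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEVID_MAX 64	/* hypothetical capacity for this sketch */

struct devid_entry {
	int domain;		/* PCI domain number (lookup key) */
	uint64_t dev_id;	/* logical device ID (value) */
	bool used;
};

struct devid_table {
	struct devid_entry e[DEVID_MAX];
};

/*
 * Store dev_id for a domain. An existing entry for the same domain is
 * replaced. Returns 0 on success, -1 if the table is full.
 */
static int devid_register(struct devid_table *t, int domain, uint64_t dev_id)
{
	struct devid_entry *free_slot = NULL;

	assert(t != NULL);

	for (size_t i = 0; i < DEVID_MAX; i++) {
		if (t->e[i].used && t->e[i].domain == domain) {
			t->e[i].dev_id = dev_id;	/* overwrite */
			return 0;
		}
		if (!t->e[i].used && !free_slot)
			free_slot = &t->e[i];
	}

	if (!free_slot)
		return -1;

	free_slot->domain = domain;
	free_slot->dev_id = dev_id;
	free_slot->used = true;
	return 0;
}

/* Look up the ID stored for a domain. Returns true if an entry exists. */
static bool devid_lookup(const struct devid_table *t, int domain,
			 uint64_t *dev_id)
{
	assert(t != NULL && dev_id != NULL);

	for (size_t i = 0; i < DEVID_MAX; i++) {
		if (t->e[i].used && t->e[i].domain == domain) {
			*dev_id = t->e[i].dev_id;
			return true;
		}
	}
	return false;
}
```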

> 
> An include file must provide a stub for the new function if
> CONFIG_HYPERV_PVIOMMU is not defined, so that the pci-hyperv driver still
> builds and works.
> 
> I haven't coded this up, but it seems like it should be pretty clean.
> 
> Michael
> 
> [1] Example stack trace, starting with vmbus_add_channel_work() as a
> result of Hyper-V offering the PCI pass-thru device to the guest.
> hv_pci_probe() runs, and ends up in the generic Linux code for adding
> a PCI device, which in turn sets up the IOMMU.
> 
> [    1.731786]  hv_iommu_attach_dev+0xf0/0x1d0
> [    1.731788]  __iommu_attach_device+0x21/0xb0
> [    1.731790]  __iommu_device_set_domain+0x65/0xd0
> [    1.731792]  __iommu_group_set_domain_internal+0x61/0x120
> [    1.731795]  iommu_setup_default_domain+0x3a4/0x530
> [    1.731796]  __iommu_probe_device.part.0+0x15d/0x1d0
> [    1.731798]  iommu_probe_device+0x81/0xb0
> [    1.731799]  iommu_bus_notifier+0x2c/0x80
> [    1.731800]  notifier_call_chain+0x66/0xe0
> [    1.731802]  blocking_notifier_call_chain+0x47/0x70
> [    1.731804]  bus_notify+0x3b/0x50
> [    1.731805]  device_add+0x631/0x850
> [    1.731807]  pci_device_add+0x2db/0x670
> [    1.731809]  pci_scan_single_device+0xc3/0x100
> [    1.731810]  pci_scan_slot+0x97/0x230
> [    1.731812]  pci_scan_child_bus_extend+0x3b/0x2f0
> [    1.731814]  pci_scan_root_bus_bridge+0xc0/0xf0
> [    1.731816]  hv_pci_probe+0x398/0x5f0
> [    1.731817]  vmbus_probe+0x42/0xa0
> [    1.731819]  really_probe+0xe5/0x3e0
> [    1.731822]  __driver_probe_device+0x7e/0x170
> [    1.731823]  driver_probe_device+0x23/0xa0
> [    1.731824]  __device_attach_driver+0x92/0x130
> [    1.731826]  bus_for_each_drv+0x8c/0xe0
> [    1.731828]  __device_attach+0xc0/0x200
> [    1.731830]  device_initial_probe+0x4c/0x50
> [    1.731831]  bus_probe_device+0x32/0x90
> [    1.731832]  device_add+0x65b/0x850
> [    1.731836]  device_register+0x1f/0x30
> [    1.731837]  vmbus_device_register+0x87/0x130
> [    1.731840]  vmbus_add_channel_work+0x139/0x1a0
> [    1.731841]  process_one_work+0x19f/0x3f0
> [    1.731843]  worker_thread+0x188/0x2f0
> [    1.731845]  kthread+0x119/0x230
> [    1.731852]  ret_from_fork+0x1b4/0x1e0
> [    1.731854]  ret_from_fork_asm+0x1a/0x30
> 
>>


* Re: [EXTERNAL] [PATCH] scsi: storvsc: Handle PERSISTENT_RESERVE_IN truncation for Hyper-V vFC
From: Laurence Oberman @ 2026-04-08 18:06 UTC (permalink / raw)
  To: Long Li, Li Tian, linux-scsi@vger.kernel.org
  Cc: KY Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	James E.J. Bottomley, Martin K. Petersen,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <SA1PR21MB6683ABEAC8B490387658B7C7CE5AA@SA1PR21MB6683.namprd21.prod.outlook.com>

On Tue, 2026-04-07 at 22:30 +0000, Long Li wrote:
> 
> 
> > -----Original Message-----
> > From: Li Tian <litian@redhat.com>
> > Sent: Sunday, April 5, 2026 6:54 PM
> > To: linux-scsi@vger.kernel.org
> > Cc: Li Tian <litian@redhat.com>; KY Srinivasan <kys@microsoft.com>;
> > Haiyang
> > Zhang <haiyangz@microsoft.com>; Wei Liu <wei.liu@kernel.org>;
> > Dexuan Cui
> > <DECUI@microsoft.com>; Long Li <longli@microsoft.com>; James E.J.
> > Bottomley
> > <James.Bottomley@HansenPartnership.com>; Martin K. Petersen
> > <martin.petersen@oracle.com>; linux-hyperv@vger.kernel.org; linux-
> > kernel@vger.kernel.org
> > Subject: [EXTERNAL] [PATCH] scsi: storvsc: Handle
> > PERSISTENT_RESERVE_IN
> > truncation for Hyper-V vFC
> > 
> > The storvsc driver has become stricter in handling SRB status codes
> > returned by
> > the Hyper-V host. When using Virtual Fibre Channel (vFC)
> > passthrough, the host
> > may return SRB_STATUS_DATA_OVERRUN for PERSISTENT_RESERVE_IN
> > commands if the allocation length in the CDB does not match the
> > host's expected
> > response size.
> > 
> > Currently, this status is treated as a fatal error, propagating
> > Host_status=0x07 [DID_ERROR] to the SCSI mid-layer. This causes
> > userspace
> > storage utilities (such as sg_persist) to fail with transport
> > errors, even when the
> > host has actually returned the requested reservation data in the
> > buffer.
> > 
> > Refactor the existing command-specific workarounds into a new
> > helper function,
> > storvsc_host_mishandles_cmd(), and add PERSISTENT_RESERVE_IN to the
> > list of
> > commands where SRB status errors should be suppressed for vFC
> > devices. This
> > ensures that the SCSI mid-layer processes the returned data buffer
> > instead of
> > terminating the command.
> > 
> > Signed-off-by: Li Tian <litian@redhat.com>
> 
> Reviewed-by: Long Li <longli@microsoft.com>
> 
> 
> > ---
> >  drivers/scsi/storvsc_drv.c | 32 +++++++++++++++++++++-----------
> >  1 file changed, 21 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/scsi/storvsc_drv.c
> > b/drivers/scsi/storvsc_drv.c index
> > ae1abab97835..6977ca8a0658 100644
> > --- a/drivers/scsi/storvsc_drv.c
> > +++ b/drivers/scsi/storvsc_drv.c
> > @@ -1131,6 +1131,26 @@ static void
> > storvsc_command_completion(struct
> > storvsc_cmd_request *cmd_request,
> >  		kfree(payload);
> >  }
> > 
> > +/*
> > + * The current SCSI handling on the host side does not correctly
> > handle:
> > + * INQUIRY with page code 0x80, MODE_SENSE / MODE_SENSE_10 with
> > cmd[2]
> > +== 0x1c,
> > + * and (for FC) MAINTENANCE_IN / PERSISTENT_RESERVE_IN
> > passthrough.
> > + */
> > +static bool storvsc_host_mishandles_cmd(u8 opcode, struct
> > hv_device
> > +*device) {
> > +	switch (opcode) {
> > +	case INQUIRY:
> > +	case MODE_SENSE:
> > +	case MODE_SENSE_10:
> > +		return true;
> > +	case MAINTENANCE_IN:
> > +	case PERSISTENT_RESERVE_IN:
> > +		return hv_dev_is_fc(device);
> > +	default:
> > +		return false;
> > +	}
> > +}
> > +
> >  static void storvsc_on_io_completion(struct storvsc_device
> > *stor_device,
> >  				  struct vstor_packet
> > *vstor_packet,
> >  				  struct storvsc_cmd_request
> > *request) @@ -
> > 1141,22 +1161,12 @@ static void storvsc_on_io_completion(struct
> > storvsc_device *stor_device,
> >  	stor_pkt = &request->vstor_packet;
> > 
> >  	/*
> > -	 * The current SCSI handling on the host side does
> > -	 * not correctly handle:
> > -	 * INQUIRY command with page code parameter set to 0x80
> > -	 * MODE_SENSE and MODE_SENSE_10 command with cmd[2] ==
> > 0x1c
> > -	 * MAINTENANCE_IN is not supported by HyperV FC
> > passthrough
> > -	 *
> >  	 * Setup srb and scsi status so this won't be fatal.
> >  	 * We do this so we can distinguish truly fatal failues
> >  	 * (srb status == 0x4) and off-line the device in that
> > case.
> >  	 */
> > 
> > -	if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
> > -	   (stor_pkt->vm_srb.cdb[0] == MODE_SENSE) ||
> > -	   (stor_pkt->vm_srb.cdb[0] == MODE_SENSE_10) ||
> > -	   (stor_pkt->vm_srb.cdb[0] == MAINTENANCE_IN &&
> > -	   hv_dev_is_fc(device))) {
> > +	if (storvsc_host_mishandles_cmd(stor_pkt->vm_srb.cdb[0],
> > device)) {
> >  		vstor_packet->vm_srb.scsi_status = 0;
> >  		vstor_packet->vm_srb.srb_status =
> > SRB_STATUS_SUCCESS;
> >  	}
> > --
> > 2.53.0
> 

Looks good. This is a rewrite of how it was done before, but it achieves the
behavior we wanted for the new persistent reservation (PR) addition.

Reviewed-by: Laurence Oberman <loberman@redhat.com>



* Re: [PATCH] x86/VMBus: Confidential VMBus for dynamic DMA transfers
From: Easwar Hariharan @ 2026-04-08 16:54 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: kys, haiyangz, wei.liu, decui, longli, James.Bottomley,
	martin.petersen, apais, easwar.hariharan, Tianyu Lan,
	linux-hyperv, linux-kernel, linux-scsi, vdso, mhklinux
In-Reply-To: <20260408073105.272255-1-tiala@microsoft.com>

On 4/8/2026 12:31 AM, Tianyu Lan wrote:
> Hyper-V provides Confidential VMBus to communicate between
> device model and device guest driver via encrypted/private
> memory in Confidential VM. The device model is in OpenHCL
> (https://openvmm.dev/guide/user_guide/openhcl.html) that
> plays the paravisor role.
> 
> For a VMBus device, there are two communication methods to
> talk with Host/Hypervisor. 1) VMBUS Ring buffer 2) Dynamic
> DMA transfer.
> 
> The Confidential VMBus Ring buffer has been upstreamed by
> Roman Kisel(commit 6802d8af47d1).
> 
> The dynamic DMA transition of VMBus device normally goes
> through DMA core and it uses SWIOTLB as bounce buffer in
> a CoCo VM.
> 
> The Confidential VMBus device can do DMA directly to
> private/encrypted memory. Because the swiotlb is decrypted
> memory, the DMA transfer must not be bounced through the
> swiotlb, so as to preserve confidentiality. This is different
> from the default for Linux CoCo VMs, so not use DMA(SWIOTLB)
> API in VMBus driver when confidential dynamic DMA transfers
> capability is present.
> 
> Signed-off-by: Tianyu Lan <tiala@microsoft.com>
> ---
>  drivers/scsi/storvsc_drv.c | 28 +++++++++++++++++++++-------
>  include/linux/hyperv.h     |  1 +
>  2 files changed, 22 insertions(+), 7 deletions(-)
> 

Does netvsc not need this same sort of patch?

Thanks,
Easwar (he/him)




* Re: [PATCH 2/8] firmware: efi: Never declare sysfb_primary_display on x86
From: Thomas Zimmermann @ 2026-04-08 14:07 UTC (permalink / raw)
  To: Ard Biesheuvel, Javier Martinez Canillas, Arnd Bergmann,
	Ilias Apalodimas, Huacai Chen, WANG Xuerui, maarten.lankhorst,
	mripard, David Airlie, Simona Vetter, kys, haiyangz, Wei Liu,
	decui, Long Li, Helge Deller
  Cc: linux-arm-kernel, loongarch, linux-efi, linux-riscv, dri-devel,
	linux-hyperv, linux-fbdev, kernel test robot
In-Reply-To: <d0624a61-b96b-4b2f-89c2-029e8671039d@app.fastmail.com>

Hi

Am 08.04.26 um 15:45 schrieb Ard Biesheuvel:
> Hi Thomas,
>
> On Thu, 2 Apr 2026, at 11:09, Thomas Zimmermann wrote:
>> The x86 architecture comes with its own instance of the global
>> state variable sysfb_primary_display. Never declare it in the EFI
>> subsystem. Fix the test for CONFIG_FIRMWARE_EDID accordingly.
>>
>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
>> Fixes: e65ca1646311 ("efi: export sysfb_primary_display for EDID")
>> Cc: kernel test robot <lkp@intel.com>
>> Cc: Arnd Bergmann <arnd@arndb.de>
>> Cc: Thomas Zimmermann <tzimmermann@suse.de>
>> Cc: Ard Biesheuvel <ardb@kernel.org>
>> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
>> Cc: linux-efi@vger.kernel.org
>> ---
>>   drivers/firmware/efi/efi-init.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
> Should this be sent out as a fix?

Yes, please.



-- 
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstr. 146, 90461 Nürnberg, Germany, www.suse.com
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg)




* RE: [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
From: Michael Kelley @ 2026-04-08 13:53 UTC (permalink / raw)
  To: Dexuan Cui, Michael Kelley, KY Srinivasan, Haiyang Zhang,
	wei.liu@kernel.org, Long Li, lpieralisi@kernel.org,
	kwilczynski@kernel.org, mani@kernel.org, robh@kernel.org,
	bhelgaas@google.com, Jake Oshins, linux-hyperv@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	matthew.ruffell@canonical.com, kjlx@templeofstupid.com
  Cc: Krister Johansen, stable@vger.kernel.org
In-Reply-To: <SA1PR21MB69215C164B06109C6682984EBF5BA@SA1PR21MB6921.namprd21.prod.outlook.com>

From: Dexuan Cui <DECUI@microsoft.com> Sent: Wednesday, April 8, 2026 2:24 AM
> 
> > From: Michael Kelley <mhklinux@outlook.com>
> > Sent: Sunday, April 5, 2026 4:15 PM
> > > ...
> > > Note: we still need to figure out how to address the possible MMIO
> > > conflict between hyperv_drm and pci_hyperv in the case of 32-bit PCI
> > > MMIO BARs, but that's of low priority because all PCI devices available
> > > to a Linux VM on Azure or on a modern host should use 64-bit BARs and
> > > should not use 32-bit BARs -- I checked Mellanox VFs, MANA VFs, NVMe
> > > devices, and GPUs in Linux VMs on Azure, and found no 32-bit BARs.
> >
> > Just to clarify, since this patch is predicated on all BARs being 64-bit,
> > hv_pci_alloc_bridge_windows() never encounters a non-zero
> > hbus->low_mmio_space, and hence also never allocates from low
> > MMIO space. So hv_pci_alloc_bridge_windows() does not need to be
> > patched. Is that correct?
> 
> Correct. For 32-bit BARs (if any), IMO we can't really do anything for
> them in hv_pci_allocate_bridge_windows(), since they must reside
> below 4GB.
> 
> Note: while the patch doesn't fix the MMIO conflict if there are any
> 32-bit BARs, the patch doesn't make things worse for 32-bit BARs (if any).

OK, right. Your patch doesn't prevent 32-bit BARs from working. It
just doesn't fix any potential frame buffer conflicts with 32-bit BARs.
I misinterpreted the situation.

> 
> > Taking a broader view, fundamentally the current MMIO location of
> > the frame buffer may be unknown to the Linux guest. At the same time,
> > Linux must ensure that PCI devices don't get assigned to the MMIO space
> > where the frame buffer is located. While the current MMIO location of
> > the frame buffer may be unknown, we can assume it was placed in low
> > MMIO space by the host -- either Windows Hyper-V or Linux/VMM
> > in the root partition, and perhaps as mediated by a paravisor. Probably
> > need to confirm with the Linux-in-the-root partition team (and maybe
> > the OpenHCL team) that this assumption is true.
> 
> IMO this is a good idea! It looks like the framebuffer base always starts
> at the beginning of the low MMIO space. We can reserve some
> MMIO for the framebuffer at the beginning of the low MMIO space.
> 
> > Presumably the
> > hyperv_drm driver doesn't need to move the frame buffer, but if it
> > does, it must stay in the low MMIO space.
> 
> It looks like this assumption is true.
> 
> > This patch depends on this assumption, and effectively reserves
> > the entire low MMIO space for the frame buffer.
> 
> To make it precise, the patch reserves the entire low MMIO space for
> the frame buffer and the 32-bit BARs (if any), and there is no MMIO
> conflict in the first kernel (assuming hyperv_drm doesn't relocate the
> MMIO range), and there can be an MMIO conflict in the
> kdump/kexec kernel if there is any 32-bit BAR.
> 
> > The low MMIO space
> > size defaults to 128 MiB on a local Hyper-V,
> Yes, by default, the low MMIO base =0xf800_0000, size=128MB,
> but the range [0xfed4_0000, 0xffff_ffff], whose size is 18.75MB,
> is reserved for vTPM: see vmbus_walk_resources(). So by default
> the available low MMIO size for hyperv_drm is 128 - 18.75 =
> 109.25 MB.
> 
> The size of the framebuffer should be aligned to 2MB, so if the
> framebuffer size is bigger than 108MB, it looks like there is no
> enough MMIO space in the low MMIO range, e.g. with the below
> command:
> Set-VMVideo -VMName vm_name -HorizontalResolution 7680
> -VerticalResolution 4320 -ResolutionType Maximum
> , the resulting max framebuffer size is
> 7680 * 4320 * 32/8 /1024.0/1024 = 126.5625, which would be
> rounded up to 128MB.
> 
> However, according to my testing, with the above command,
> the low MMIO base = 0xf000_0000, size=256MB, so it's probably
> ok to reserve 128 MB for the frame buffer.
> 
> In case the low MMIO size is <=64MB, we would want to reserve
> less MMIO for the frame buffer.
> 
> > and is set to 3 GiB in most
> > Azure VMs (or to 1 GiB in an Azure CVM), so that all gets reserved.
> >
> > A slightly different approach to the whole problem is to change
> > vmbus_reserve_fb(). If it is unable to get a non-zero "start" value, then
> > it should use the same assumption as above, and reserve a frame buffer
> > area starting at the lowest address in low MMIO space. The reserved size
> > could be the max possible frame buffer size, which I think is 64 MiB (?).
> 
> It can be 128MB with the highest resolution 7680*4320 (I hope the
> highest resolution won't become bigger in the future).

Indeed!

> 
> > This still leaves low MMIO space for subsequent PCI devices, and allows
> > 32-bit BARs to continue to work. This approach requires one further
> > assumption, which is that the host, plus any movement by hyperv_drm,
> > has kept the frame buffer at the low end of the low MMIO space. From
> > what I've seen, that assumption is reality -- the frame buffer always
> > starts at the beginning of low MMIO space.
> >
> > This approach could be taken one step further, where vmbus_reserve_fb()
> > *always* reserves 64 MiB starting at the low end of low MMIO space,
> > regardless of the value of "start". The messy code for getting "start"
> > could be dropped entirely, and the dependency on CONFIG_SYSFB goes
> > away. Or maybe still get the value of "start" and "size", and if non-zero
> > just do a sanity check that they are within the fixed 64 MiB reserved area.
> >
> > Thoughts? To me tweaking vmbus_reserve_fb() is a more
> > straightforward and explicit way to do the reserving, vs. modifying
> > the requested range in the Hyper-V PCI driver.
> 
> Agreed. Let me try to make a new patch for review.
> 
> > And FWIW, it avoids  introducing the 32-bit BAR limitation.
> 
> This patch addresses the MMIO conflict for 64-bit BARs and not for
> 32-bit BARs (if any). The patch does not introduce the 32-bit BAR limitation.

Right.  I misinterpreted the problem you mentioned about 32-bit BARs.

Michael
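
The size arithmetic quoted above can be checked with a short userspace
sketch. It is not kernel code, and the helper names (fb_size_bytes(),
align_up(), range_size()) are hypothetical. It assumes 32 bits per pixel and
2 MiB alignment, as stated in the quoted text.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Round value up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t value, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);

	return (value + align - 1) & ~(align - 1);
}

/* Frame buffer size in bytes for a given resolution and color depth. */
static uint64_t fb_size_bytes(uint32_t width, uint32_t height, uint32_t bpp)
{
	return (uint64_t)width * height * (bpp / 8);
}

/* Size in bytes of the inclusive address range [first, last]. */
static uint64_t range_size(uint64_t first, uint64_t last)
{
	return last - first + 1;
}
```

With these helpers, fb_size_bytes(7680, 4320, 32) is 132710400 bytes
(126.5625 MiB), and align_up() of that value to 2 * MIB gives 128 MiB.
range_size(0xfed40000, 0xffffffff) is 19660800 bytes (18.75 MiB), which
leaves 109.25 MiB of a 128 MiB low MMIO window.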


* Re: [PATCH 2/8] firmware: efi: Never declare sysfb_primary_display on x86
From: Ard Biesheuvel @ 2026-04-08 13:45 UTC (permalink / raw)
  To: Thomas Zimmermann, Javier Martinez Canillas, Arnd Bergmann,
	Ilias Apalodimas, Huacai Chen, WANG Xuerui, maarten.lankhorst,
	mripard, David Airlie, Simona Vetter, kys, haiyangz, Wei Liu,
	decui, Long Li, Helge Deller
  Cc: linux-arm-kernel, loongarch, linux-efi, linux-riscv, dri-devel,
	linux-hyperv, linux-fbdev, kernel test robot
In-Reply-To: <20260402092305.208728-3-tzimmermann@suse.de>

Hi Thomas,

On Thu, 2 Apr 2026, at 11:09, Thomas Zimmermann wrote:
> The x86 architecture comes with its own instance of the global
> state variable sysfb_primary_display. Never declare it in the EFI
> subsystem. Fix the test for CONFIG_FIRMWARE_EDID accordingly.
>
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> Fixes: e65ca1646311 ("efi: export sysfb_primary_display for EDID")
> Cc: kernel test robot <lkp@intel.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Thomas Zimmermann <tzimmermann@suse.de>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Cc: linux-efi@vger.kernel.org
> ---
>  drivers/firmware/efi/efi-init.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>

Should this be sent out as a fix?


* Re: [PATCH v2] tools: hv: Fix cross-compilation
From: Aditya Garg @ 2026-04-08 12:36 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli, gregkh, ssengar,
	linux-hyperv, linux-kernel, romank, avladu, vdso, gargaditya
In-Reply-To: <20260407122040.249733-1-gargaditya@linux.microsoft.com>

On 07-04-2026 17:50, Aditya Garg wrote:
> Use the native ARCH only in case it is not set, this will allow the
> cross-compilation where ARCH is explicitly set.
> 
> Additionally, simplify the check for ARCH so that fcopy daemon is built
> only for x86_64.
> 
> Fixes: 82b0945ce2c2 ("tools: hv: Add new fcopy application based on uio driver")
> Reported-by: Adrian Vladu <avladu@cloudbasesolutions.com>
> Closes: https://lore.kernel.org/linux-hyperv/PR3PR09MB54119DB2FD76977C62D8DD6AB04D2@PR3PR09MB5411.eurprd09.prod.outlook.com/
> Co-developed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com>
> Reviewed-by: Roman Kisel <romank@linux.microsoft.com>
> ---
> Changes since v1:
>      - Dropped the info target printing CC, LD and ARCH
> 
> v1: https://lore.kernel.org/all/1733992114-7305-1-git-send-email-ssengar@linux.microsoft.com/
> ---
>   tools/hv/Makefile | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/hv/Makefile b/tools/hv/Makefile
> index 34ffcec264ab..e377caf89fb6 100644
> --- a/tools/hv/Makefile
> +++ b/tools/hv/Makefile
> @@ -2,7 +2,7 @@
>   # Makefile for Hyper-V tools
>   include ../scripts/Makefile.include
>   
> -ARCH := $(shell uname -m 2>/dev/null)
> +ARCH ?= $(shell uname -m 2>/dev/null)
>   sbindir ?= /usr/sbin
>   libexecdir ?= /usr/libexec
>   sharedstatedir ?= /var/lib
> @@ -20,7 +20,7 @@ override CFLAGS += -O2 -Wall -g -D_GNU_SOURCE -I$(OUTPUT)include
>   override CFLAGS += -Wno-address-of-packed-member
>   
>   ALL_TARGETS := hv_kvp_daemon hv_vss_daemon
> -ifneq ($(ARCH), aarch64)
> +ifeq ($(ARCH), x86_64)
>   ALL_TARGETS += hv_fcopy_uio_daemon
>   endif
>   ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))

Sashiko AI review flagged an issue, which I tested and confirmed.
When building via make tools/hv from the top-level kernel directory,
scripts/subarch.include normalizes x86_64 to x86. Since ARCH is
exported, the ?= assignment in tools/hv/Makefile preserves the
normalized value, so ifeq ($(ARCH), x86_64) is false and
hv_fcopy_uio_daemon is silently excluded.

I'll change this to include x86 as well in v3.

Regards,
Aditya

