From: Mukesh R <mrathor@linux.microsoft.com>
To: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
linux-pci@vger.kernel.org, linux-arch@vger.kernel.org,
kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, longli@microsoft.com,
catalin.marinas@arm.com, will@kernel.org, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
hpa@zytor.com, joro@8bytes.org, lpieralisi@kernel.org,
kwilczynski@kernel.org, mani@kernel.org, robh@kernel.org,
bhelgaas@google.com, arnd@arndb.de,
nunodasneves@linux.microsoft.com, mhklinux@outlook.com
Subject: Re: [PATCH v0 15/15] mshv: Populate mmio mappings for PCI passthru
Date: Wed, 4 Feb 2026 14:52:54 -0800 [thread overview]
Message-ID: <596c9549-9edc-91f3-7473-e206ddc68e76@linux.microsoft.com> (raw)
In-Reply-To: <aYDROXpR5kvlylGG@skinsburskii.localdomain>
On 2/2/26 08:30, Stanislav Kinsburskii wrote:
> On Fri, Jan 30, 2026 at 02:17:24PM -0800, Mukesh R wrote:
>> On 1/27/26 10:57, Stanislav Kinsburskii wrote:
>>> On Mon, Jan 26, 2026 at 07:07:22PM -0800, Mukesh R wrote:
>>>> On 1/26/26 10:15, Stanislav Kinsburskii wrote:
>>>>> On Fri, Jan 23, 2026 at 06:19:15PM -0800, Mukesh R wrote:
>>>>>> On 1/20/26 17:53, Stanislav Kinsburskii wrote:
>>>>>>> On Mon, Jan 19, 2026 at 10:42:30PM -0800, Mukesh R wrote:
>>>>>>>> From: Mukesh Rathor <mrathor@linux.microsoft.com>
>>>>>>>>
>>>>>>>> Upon guest access, in case of a missing mmio mapping, the hypervisor
>>>>>>>> generates an unmapped gpa intercept. In this path, look up the PCI
>>>>>>>> resource pfn for the guest gpa and ask the hypervisor to map it
>>>>>>>> via hypercall. The PCI resource pfn is maintained by the VFIO driver
>>>>>>>> and obtained via a fixup_user_fault call (similar to KVM).
>>>>>>>>
>>>>>>>> Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
>>>>>>>> ---
>>>>>>>> drivers/hv/mshv_root_main.c | 115 ++++++++++++++++++++++++++++++++++++
>>>>>>>> 1 file changed, 115 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
>>>>>>>> index 03f3aa9f5541..4c8bc7cd0888 100644
>>>>>>>> --- a/drivers/hv/mshv_root_main.c
>>>>>>>> +++ b/drivers/hv/mshv_root_main.c
>>>>>>>> @@ -56,6 +56,14 @@ struct hv_stats_page {
>>>>>>>> };
>>>>>>>> } __packed;
>>>>>>>> +bool hv_nofull_mmio; /* don't map entire mmio region upon fault */
>>>>>>>> +static int __init setup_hv_full_mmio(char *str)
>>>>>>>> +{
>>>>>>>> + hv_nofull_mmio = true;
>>>>>>>> + return 0;
>>>>>>>> +}
>>>>>>>> +__setup("hv_nofull_mmio", setup_hv_full_mmio);
>>>>>>>> +
>>>>>>>> struct mshv_root mshv_root;
>>>>>>>> enum hv_scheduler_type hv_scheduler_type;
>>>>>>>> @@ -612,6 +620,109 @@ mshv_partition_region_by_gfn(struct mshv_partition *partition, u64 gfn)
>>>>>>>> }
>>>>>>>> #ifdef CONFIG_X86_64
>>>>>>>> +
>>>>>>>> +/*
>>>>>>>> + * Check if uaddr is for mmio range. If yes, return 0 with mmio_pfn filled in
>>>>>>>> + * else just return -errno.
>>>>>>>> + */
>>>>>>>> +static int mshv_chk_get_mmio_start_pfn(struct mshv_partition *pt, u64 gfn,
>>>>>>>> + u64 *mmio_pfnp)
>>>>>>>> +{
>>>>>>>> + struct vm_area_struct *vma;
>>>>>>>> + bool is_mmio;
>>>>>>>> + u64 uaddr;
>>>>>>>> + struct mshv_mem_region *mreg;
>>>>>>>> + struct follow_pfnmap_args pfnmap_args;
>>>>>>>> + int rc = -EINVAL;
>>>>>>>> +
>>>>>>>> + /*
>>>>>>>> + * Do not allow mem region to be deleted beneath us. VFIO uses
>>>>>>>> + * useraddr vma to lookup pci bar pfn.
>>>>>>>> + */
>>>>>>>> + spin_lock(&pt->pt_mem_regions_lock);
>>>>>>>> +
>>>>>>>> + /* Get the region again under the lock */
>>>>>>>> + mreg = mshv_partition_region_by_gfn(pt, gfn);
>>>>>>>> + if (mreg == NULL || mreg->type != MSHV_REGION_TYPE_MMIO)
>>>>>>>> + goto unlock_pt_out;
>>>>>>>> +
>>>>>>>> + uaddr = mreg->start_uaddr +
>>>>>>>> + ((gfn - mreg->start_gfn) << HV_HYP_PAGE_SHIFT);
>>>>>>>> +
>>>>>>>> + mmap_read_lock(current->mm);
>>>>>>>
>>>>>>> Semaphore can't be taken under spinlock.
>>>>>
>>>>>>
>>>>>> Yeah, something didn't feel right here and I meant to recheck, now regret
>>>>>> rushing to submit the patch.
>>>>>>
>>>>>> Rethinking, I think the pt_mem_regions_lock is not needed to protect
>>>>>> the uaddr because unmap will properly serialize via the mm lock.
>>>>>>
>>>>>>
>>>>>>>> + vma = vma_lookup(current->mm, uaddr);
>>>>>>>> + is_mmio = vma ? !!(vma->vm_flags & (VM_IO | VM_PFNMAP)) : 0;
>>>>>>>
>>>>>>> Why this check is needed again?
>>>>>>
>>>>>> To make sure region did not change. This check is under lock.
>>>>>>
>>>>>
>>>>> How can this happen? One can't change VMA type without unmapping it
>>>>> first. And unmapping it leads to a kernel MMIO region state dangling
>>>>> around without corresponding user space mapping.
>>>>
>>>> Right, and vm_flags would then not have the expected mmio flags.
>>>>
>>>>> This is similar to dangling pinned regions and should likely be
>>>>> addressed the same way, by utilizing MMU notifiers to destroy memory
>>>>> regions if the VMA is detached.
>>>>
>>>> I don't think we need that. Either it succeeds if the region did not
>>>> change at all, or just fails.
>>>>
>>>
>>> I'm afraid we do: if the driver mapped a page with the previous
>>> memory region and the region is then unmapped, the page will stay
>>> mapped in the hypervisor but will be considered free by the kernel,
>>> which in turn will lead to a GPF upon the next allocation.
>>
>> There are no ram pages for mmio regions. Also, we don't do much with
>> mmio regions other than tell the hyp about it.
>>
>
> So, are you saying that the hypervisor does not use these pages and only
> tracks them? That would make things easier.
> However, if we later try to map a GPA that is already mapped, will the
> hypervisor return an error?

The hypervisor does not return an error.

> Thanks,
> Stanislav
>
>> Thanks,
>> -Mukesh
>>
>>
>>> With pinned regions the issue is similar but less impacting: pages
>>> can't be released by user space unmapping and thus will simply be
>>> leaked, but the system stays intact.
>>>
>>> MMIO regions are similar to movable regions in this regard: they don't
>>> reference the user pages, and thus this guest region replacement is a
>>> straight way to a kernel panic.
>>>
>>>>
>>>>>>> The region type is stored on the region itself.
>>>>>>> And the type is checked on the caller side.
>>>>>>>
>>>>>>>> + if (!is_mmio)
>>>>>>>> + goto unlock_mmap_out;
>>>>>>>> +
>>>>>>>> + pfnmap_args.vma = vma;
>>>>>>>> + pfnmap_args.address = uaddr;
>>>>>>>> +
>>>>>>>> + rc = follow_pfnmap_start(&pfnmap_args);
>>>>>>>> + if (rc) {
>>>>>>>> + rc = fixup_user_fault(current->mm, uaddr, FAULT_FLAG_WRITE,
>>>>>>>> + NULL);
>>>>>>>> + if (rc)
>>>>>>>> + goto unlock_mmap_out;
>>>>>>>> +
>>>>>>>> + rc = follow_pfnmap_start(&pfnmap_args);
>>>>>>>> + if (rc)
>>>>>>>> + goto unlock_mmap_out;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + *mmio_pfnp = pfnmap_args.pfn;
>>>>>>>> + follow_pfnmap_end(&pfnmap_args);
>>>>>>>> +
>>>>>>>> +unlock_mmap_out:
>>>>>>>> + mmap_read_unlock(current->mm);
>>>>>>>> +unlock_pt_out:
>>>>>>>> + spin_unlock(&pt->pt_mem_regions_lock);
>>>>>>>> + return rc;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/*
>>>>>>>> + * At present, the only unmapped gpa is mmio space. Verify if it's mmio
>>>>>>>> + * and resolve if possible.
>>>>>>>> + * Returns: True if valid mmio intercept and it was handled, else false
>>>>>>>> + */
>>>>>>>> +static bool mshv_handle_unmapped_gpa(struct mshv_vp *vp)
>>>>>>>> +{
>>>>>>>> + struct hv_message *hvmsg = vp->vp_intercept_msg_page;
>>>>>>>> + struct hv_x64_memory_intercept_message *msg;
>>>>>>>> + union hv_x64_memory_access_info accinfo;
>>>>>>>> + u64 gfn, mmio_spa, numpgs;
>>>>>>>> + struct mshv_mem_region *mreg;
>>>>>>>> + int rc;
>>>>>>>> + struct mshv_partition *pt = vp->vp_partition;
>>>>>>>> +
>>>>>>>> + msg = (struct hv_x64_memory_intercept_message *)hvmsg->u.payload;
>>>>>>>> + accinfo = msg->memory_access_info;
>>>>>>>> +
>>>>>>>> + if (!accinfo.gva_gpa_valid)
>>>>>>>> + return false;
>>>>>>>> +
>>>>>>>> + /* Do a fast check and bail if non mmio intercept */
>>>>>>>> + gfn = msg->guest_physical_address >> HV_HYP_PAGE_SHIFT;
>>>>>>>> + mreg = mshv_partition_region_by_gfn(pt, gfn);
>>>>>>>
>>>>>>> This call needs to be protected by the spinlock.
>>>>>>
>>>>>> This is sorta fast path to bail. We recheck under partition lock above.
>>>>>>
>>>>>
>>>>> Accessing the list of regions without lock is unsafe.
>>>>
>>>> I am not sure why? This check is done by a vcpu thread, so regions
>>>> will not have just gone away.
>>>>
>>>
>>> These are shared resources. Multiple VP threads get into this function
>>> simultaneously, so there is a race already. But this one we can live
>>> with without locking, as they don't mutate the list of regions.
>>>
>>> The issue happens when the VMM adds or removes another region, as that
>>> mutates the list and races with VP threads doing this lookup.
>>>
>>> Thanks,
>>> Stanislav
>>>
>>>
>>>> Thanks,
>>>> -Mukesh
>>>>
>>>>
>>>>> Thanks,
>>>>> Stanislav
>>>>>
>>>>>> Thanks,
>>>>>> -Mukesh
>>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>> Stanislav
>>>>>>>
>>>>>>>> + if (mreg == NULL || mreg->type != MSHV_REGION_TYPE_MMIO)
>>>>>>>> + return false;
>>>>>>>> +
>>>>>>>> + rc = mshv_chk_get_mmio_start_pfn(pt, gfn, &mmio_spa);
>>>>>>>> + if (rc)
>>>>>>>> + return false;
>>>>>>>> +
>>>>>>>> + if (!hv_nofull_mmio) { /* default case */
>>>>>>>> + mmio_spa = mmio_spa - (gfn - mreg->start_gfn);
>>>>>>>> + gfn = mreg->start_gfn;
>>>>>>>> + numpgs = mreg->nr_pages;
>>>>>>>> + } else
>>>>>>>> + numpgs = 1;
>>>>>>>> +
>>>>>>>> + rc = hv_call_map_mmio_pages(pt->pt_id, gfn, mmio_spa, numpgs);
>>>>>>>> +
>>>>>>>> + return rc == 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> static struct mshv_mem_region *
>>>>>>>> mshv_partition_region_by_gfn_get(struct mshv_partition *p, u64 gfn)
>>>>>>>> {
>>>>>>>> @@ -666,13 +777,17 @@ static bool mshv_handle_gpa_intercept(struct mshv_vp *vp)
>>>>>>>> return ret;
>>>>>>>> }
>>>>>>>> +
>>>>>>>> #else /* CONFIG_X86_64 */
>>>>>>>> +static bool mshv_handle_unmapped_gpa(struct mshv_vp *vp) { return false; }
>>>>>>>> static bool mshv_handle_gpa_intercept(struct mshv_vp *vp) { return false; }
>>>>>>>> #endif /* CONFIG_X86_64 */
>>>>>>>> static bool mshv_vp_handle_intercept(struct mshv_vp *vp)
>>>>>>>> {
>>>>>>>> switch (vp->vp_intercept_msg_page->header.message_type) {
>>>>>>>> + case HVMSG_UNMAPPED_GPA:
>>>>>>>> + return mshv_handle_unmapped_gpa(vp);
>>>>>>>> case HVMSG_GPA_INTERCEPT:
>>>>>>>> return mshv_handle_gpa_intercept(vp);
>>>>>>>> }
>>>>>>>> --
>>>>>>>> 2.51.2.vfs.0.1
>>>>>>>>
Thread overview: 69+ messages
2026-01-20 6:42 [PATCH v0 00/15] PCI passthru on Hyper-V (Part I) Mukesh R
2026-01-20 6:42 ` [PATCH v0 01/15] iommu/hyperv: rename hyperv-iommu.c to hyperv-irq.c Mukesh R
2026-01-20 19:08 ` kernel test robot
2026-01-20 21:09 ` kernel test robot
2026-02-05 18:48 ` Anirudh Rayabharam
2026-01-20 6:42 ` [PATCH v0 02/15] x86/hyperv: cosmetic changes in irqdomain.c for readability Mukesh R
2026-02-05 18:47 ` Anirudh Rayabharam
2026-01-20 6:42 ` [PATCH v0 03/15] x86/hyperv: add insufficient memory support in irqdomain.c Mukesh R
2026-01-21 0:53 ` kernel test robot
2026-01-20 6:42 ` [PATCH v0 04/15] mshv: Provide a way to get partition id if running in a VMM process Mukesh R
2026-01-23 18:23 ` Nuno Das Neves
2026-01-20 6:42 ` [PATCH v0 05/15] mshv: Declarations and definitions for VFIO-MSHV bridge device Mukesh R
2026-01-23 18:25 ` Nuno Das Neves
2026-01-24 0:36 ` Mukesh R
2026-01-20 6:42 ` [PATCH v0 06/15] mshv: Implement mshv bridge device for VFIO Mukesh R
2026-01-20 16:09 ` Stanislav Kinsburskii
2026-01-23 18:32 ` Nuno Das Neves
2026-01-24 0:37 ` Mukesh R
2026-01-20 6:42 ` [PATCH v0 07/15] mshv: Add ioctl support for MSHV-VFIO bridge device Mukesh R
2026-01-20 16:13 ` Stanislav Kinsburskii
2026-01-20 6:42 ` [PATCH v0 08/15] PCI: hv: rename hv_compose_msi_msg to hv_vmbus_compose_msi_msg Mukesh R
2026-01-28 14:03 ` Manivannan Sadhasivam
2026-01-20 6:42 ` [PATCH v0 09/15] mshv: Import data structs around device domains and irq remapping Mukesh R
2026-01-20 22:17 ` Stanislav Kinsburskii
2026-01-24 0:38 ` Mukesh R
2026-01-20 6:42 ` [PATCH v0 10/15] PCI: hv: Build device id for a VMBus device Mukesh R
2026-01-20 22:22 ` Stanislav Kinsburskii
2026-01-24 0:42 ` Mukesh R
2026-01-26 20:50 ` Stanislav Kinsburskii
2026-01-28 14:36 ` Manivannan Sadhasivam
2026-01-20 6:42 ` [PATCH v0 11/15] x86/hyperv: Build logical device ids for PCI passthru hcalls Mukesh R
2026-01-20 22:27 ` Stanislav Kinsburskii
2026-01-24 0:44 ` Mukesh R
2026-01-20 6:42 ` [PATCH v0 12/15] x86/hyperv: Implement hyperv virtual iommu Mukesh R
2026-01-21 0:12 ` Stanislav Kinsburskii
2026-01-24 1:26 ` Mukesh R
2026-01-26 15:57 ` Stanislav Kinsburskii
2026-01-27 3:02 ` Mukesh R
2026-01-27 18:46 ` Stanislav Kinsburskii
2026-01-30 22:51 ` Mukesh R
2026-02-02 16:20 ` Stanislav Kinsburskii
2026-01-22 5:18 ` Jacob Pan
2026-01-24 2:01 ` Mukesh R
2026-01-27 19:21 ` Jacob Pan
2026-01-27 22:31 ` Jacob Pan
2026-01-30 22:10 ` Mukesh R
2026-01-30 23:44 ` Mukesh R
2026-01-20 6:42 ` [PATCH v0 13/15] x86/hyperv: Basic interrupt support for direct attached devices Mukesh R
2026-01-21 0:47 ` Stanislav Kinsburskii
2026-01-24 2:08 ` Mukesh R
2026-01-20 6:42 ` [PATCH v0 14/15] mshv: Remove mapping of mmio space during map user ioctl Mukesh R
2026-01-21 1:41 ` Stanislav Kinsburskii
2026-01-23 18:34 ` Nuno Das Neves
2026-01-24 2:12 ` Mukesh R
2026-01-20 6:42 ` [PATCH v0 15/15] mshv: Populate mmio mappings for PCI passthru Mukesh R
2026-01-20 19:52 ` kernel test robot
2026-01-21 1:53 ` Stanislav Kinsburskii
2026-01-24 2:19 ` Mukesh R
2026-01-26 18:15 ` Stanislav Kinsburskii
2026-01-27 3:07 ` Mukesh R
2026-01-27 18:57 ` Stanislav Kinsburskii
2026-01-30 22:17 ` Mukesh R
2026-02-02 16:30 ` Stanislav Kinsburskii
2026-02-04 22:52 ` Mukesh R [this message]
2026-02-05 16:28 ` Stanislav Kinsburskii
2026-02-05 17:57 ` Mukesh R
2026-02-05 18:31 ` Stanislav Kinsburskii
2026-01-20 21:50 ` [PATCH v0 00/15] PCI passthru on Hyper-V (Part I) Jacob Pan
2026-01-24 2:27 ` Mukesh R