public inbox for linux-arm-kernel@lists.infradead.org
From: Mukesh R <mrathor@linux.microsoft.com>
To: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-pci@vger.kernel.org, linux-arch@vger.kernel.org,
	kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, longli@microsoft.com,
	catalin.marinas@arm.com, will@kernel.org, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, joro@8bytes.org, lpieralisi@kernel.org,
	kwilczynski@kernel.org, mani@kernel.org, robh@kernel.org,
	bhelgaas@google.com, arnd@arndb.de,
	nunodasneves@linux.microsoft.com, mhklinux@outlook.com
Subject: Re: [PATCH v0 15/15] mshv: Populate mmio mappings for PCI passthru
Date: Mon, 26 Jan 2026 19:07:22 -0800	[thread overview]
Message-ID: <f39a501e-478f-66ff-26c8-229ca3991f4f@linux.microsoft.com> (raw)
In-Reply-To: <aXevWXolgNrrLltF@skinsburskii.localdomain>

On 1/26/26 10:15, Stanislav Kinsburskii wrote:
> On Fri, Jan 23, 2026 at 06:19:15PM -0800, Mukesh R wrote:
>> On 1/20/26 17:53, Stanislav Kinsburskii wrote:
>>> On Mon, Jan 19, 2026 at 10:42:30PM -0800, Mukesh R wrote:
>>>> From: Mukesh Rathor <mrathor@linux.microsoft.com>
>>>>
>>>> Upon guest access, in case of missing mmio mapping, the hypervisor
>>>> generates an unmapped gpa intercept. In this path, lookup the PCI
>>>> resource pfn for the guest gpa, and ask the hypervisor to map it
>>>> via hypercall. The PCI resource pfn is maintained by the VFIO driver,
>>>> and obtained via fixup_user_fault call (similar to KVM).
>>>>
>>>> Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
>>>> ---
>>>>    drivers/hv/mshv_root_main.c | 115 ++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 115 insertions(+)
>>>>
>>>> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
>>>> index 03f3aa9f5541..4c8bc7cd0888 100644
>>>> --- a/drivers/hv/mshv_root_main.c
>>>> +++ b/drivers/hv/mshv_root_main.c
>>>> @@ -56,6 +56,14 @@ struct hv_stats_page {
>>>>    	};
>>>>    } __packed;
>>>> +bool hv_nofull_mmio;   /* don't map entire mmio region upon fault */
>>>> +static int __init setup_hv_full_mmio(char *str)
>>>> +{
>>>> +	hv_nofull_mmio = true;
>>>> +	return 0;
>>>> +}
>>>> +__setup("hv_nofull_mmio", setup_hv_full_mmio);
>>>> +
>>>>    struct mshv_root mshv_root;
>>>>    enum hv_scheduler_type hv_scheduler_type;
>>>> @@ -612,6 +620,109 @@ mshv_partition_region_by_gfn(struct mshv_partition *partition, u64 gfn)
>>>>    }
>>>>    #ifdef CONFIG_X86_64
>>>> +
>>>> +/*
>>>> + * Check if uaddr falls in an mmio range. If yes, return 0 with mmio_pfn
>>>> + * filled in; otherwise return -errno.
>>>> + */
>>>> +static int mshv_chk_get_mmio_start_pfn(struct mshv_partition *pt, u64 gfn,
>>>> +				       u64 *mmio_pfnp)
>>>> +{
>>>> +	struct vm_area_struct *vma;
>>>> +	bool is_mmio;
>>>> +	u64 uaddr;
>>>> +	struct mshv_mem_region *mreg;
>>>> +	struct follow_pfnmap_args pfnmap_args;
>>>> +	int rc = -EINVAL;
>>>> +
>>>> +	/*
>>>> +	 * Do not allow mem region to be deleted beneath us. VFIO uses
>>>> +	 * useraddr vma to lookup pci bar pfn.
>>>> +	 */
>>>> +	spin_lock(&pt->pt_mem_regions_lock);
>>>> +
>>>> +	/* Get the region again under the lock */
>>>> +	mreg = mshv_partition_region_by_gfn(pt, gfn);
>>>> +	if (mreg == NULL || mreg->type != MSHV_REGION_TYPE_MMIO)
>>>> +		goto unlock_pt_out;
>>>> +
>>>> +	uaddr = mreg->start_uaddr +
>>>> +		((gfn - mreg->start_gfn) << HV_HYP_PAGE_SHIFT);
>>>> +
>>>> +	mmap_read_lock(current->mm);
>>>
>>> Semaphore can't be taken under spinlock.
> 
>>
>> Yeah, something didn't feel right here and I meant to recheck, now regret
>> rushing to submit the patch.
>>
>> Rethinking, I think the pt_mem_regions_lock is not needed to protect
>> the uaddr because unmap will properly serialize via the mm lock.
>>
>>
>>>> +	vma = vma_lookup(current->mm, uaddr);
>>>> +	is_mmio = vma ? !!(vma->vm_flags & (VM_IO | VM_PFNMAP)) : false;
>>>
>>> Why this check is needed again?
>>
>> To make sure region did not change. This check is under lock.
>>
> 
> How can this happen? One can't change VMA type without unmapping it
> first. And unmapping it leads to a kernel MMIO region state dangling
> around without corresponding user space mapping.

Right, and vm_flags would then not have the expected mmio bits set.

> This is similar to dangling pinned regions and should likely be
> addressed the same way, by utilizing MMU notifiers to destroy memory
> regions if the VMA is detached.

I don't think we need that. Either the lookup succeeds because the region
did not change at all, or it simply fails.
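FWIW, the invariant being argued here can be modeled in plain C: a
revalidation either sees the same mmio region and succeeds, or sees
anything else (gone, wrong type, out of range) and fails. This is a
standalone userspace sketch with made-up stand-in types and values,
not the actual mshv structures:

```c
/* Userspace model of the revalidation discussed above. struct region
 * and its fields are illustrative stand-ins, not the real
 * struct mshv_mem_region. */
#include <stdint.h>
#include <stddef.h>
#include <errno.h>
#include <assert.h>

struct region {
	uint64_t start_gfn;	/* first guest frame of the region */
	uint64_t nr_pages;	/* region length in pages */
	int	 is_mmio;	/* stand-in for MSHV_REGION_TYPE_MMIO */
};

/* Return 0 if gfn still resolves to the same mmio region, else -EINVAL.
 * There is no intermediate "changed" state to recover from: the caller
 * either proceeds with an unchanged region or bails. */
static int revalidate(const struct region *r, uint64_t gfn)
{
	if (r == NULL || !r->is_mmio)
		return -EINVAL;
	if (gfn < r->start_gfn || gfn >= r->start_gfn + r->nr_pages)
		return -EINVAL;
	return 0;
}
```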


>>> The region type is stored on the region itself.
>>> And the type is checked on the caller side.
>>>
>>>> +	if (!is_mmio)
>>>> +		goto unlock_mmap_out;
>>>> +
>>>> +	pfnmap_args.vma = vma;
>>>> +	pfnmap_args.address = uaddr;
>>>> +
>>>> +	rc = follow_pfnmap_start(&pfnmap_args);
>>>> +	if (rc) {
>>>> +		rc = fixup_user_fault(current->mm, uaddr, FAULT_FLAG_WRITE,
>>>> +				      NULL);
>>>> +		if (rc)
>>>> +			goto unlock_mmap_out;
>>>> +
>>>> +		rc = follow_pfnmap_start(&pfnmap_args);
>>>> +		if (rc)
>>>> +			goto unlock_mmap_out;
>>>> +	}
>>>> +
>>>> +	*mmio_pfnp = pfnmap_args.pfn;
>>>> +	follow_pfnmap_end(&pfnmap_args);
>>>> +
>>>> +unlock_mmap_out:
>>>> +	mmap_read_unlock(current->mm);
>>>> +unlock_pt_out:
>>>> +	spin_unlock(&pt->pt_mem_regions_lock);
>>>> +	return rc;
>>>> +}
>>>> +
>>>> +/*
>>>> + * At present, the only unmapped gpa is mmio space. Verify if it's mmio
>>>> + * and resolve if possible.
>>>> + * Returns: True if valid mmio intercept and it was handled, else false
>>>> + */
>>>> +static bool mshv_handle_unmapped_gpa(struct mshv_vp *vp)
>>>> +{
>>>> +	struct hv_message *hvmsg = vp->vp_intercept_msg_page;
>>>> +	struct hv_x64_memory_intercept_message *msg;
>>>> +	union hv_x64_memory_access_info accinfo;
>>>> +	u64 gfn, mmio_spa, numpgs;
>>>> +	struct mshv_mem_region *mreg;
>>>> +	int rc;
>>>> +	struct mshv_partition *pt = vp->vp_partition;
>>>> +
>>>> +	msg = (struct hv_x64_memory_intercept_message *)hvmsg->u.payload;
>>>> +	accinfo = msg->memory_access_info;
>>>> +
>>>> +	if (!accinfo.gva_gpa_valid)
>>>> +		return false;
>>>> +
>>>> +	/* Do a fast check and bail if non mmio intercept */
>>>> +	gfn = msg->guest_physical_address >> HV_HYP_PAGE_SHIFT;
>>>> +	mreg = mshv_partition_region_by_gfn(pt, gfn);
>>>
>>> This call needs to be protected by the spinlock.
>>
>> This is sort of a fast path to bail out early. We recheck under the
>> partition lock above.
>>
> 
> Accessing the list of regions without lock is unsafe.

I am not sure why. This check is done by a vcpu thread, so the regions
cannot have just gone away.
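As an aside, the two address computations the patch relies on (translating
the faulting gfn to the VMM user address, and picking whole-region vs
single-page mapping per the hv_nofull_mmio knob) are just the arithmetic
below. Standalone C sketch; the struct and the values are illustrative,
not the real mshv types:

```c
/* Userspace model of the patch's address math. struct mem_region is a
 * stand-in for struct mshv_mem_region. */
#include <stdint.h>
#include <assert.h>

#define HV_HYP_PAGE_SHIFT 12

struct mem_region {
	uint64_t start_gfn;	/* first guest frame of the region */
	uint64_t start_uaddr;	/* VMM user address of that frame */
	uint64_t nr_pages;	/* region length in pages */
};

/* uaddr for a gfn: offset within the region, scaled to bytes. */
static uint64_t gfn_to_uaddr(const struct mem_region *r, uint64_t gfn)
{
	return r->start_uaddr +
	       ((gfn - r->start_gfn) << HV_HYP_PAGE_SHIFT);
}

/* Pages the hypercall maps: the whole region by default, or a single
 * page when the hv_nofull_mmio boot option is set. */
static uint64_t map_npages(const struct mem_region *r, int nofull_mmio)
{
	return nofull_mmio ? 1 : r->nr_pages;
}
```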

Thanks,
-Mukesh


> Thanks,
> Stanislav
> 
>> Thanks,
>> -Mukesh
>>
>>
>>> Thanks,
>>> Stanislav
>>>
>>>> +	if (mreg == NULL || mreg->type != MSHV_REGION_TYPE_MMIO)
>>>> +		return false;
>>>> +
>>>> +	rc = mshv_chk_get_mmio_start_pfn(pt, gfn, &mmio_spa);
>>>> +	if (rc)
>>>> +		return false;
>>>> +
>>>> +	if (!hv_nofull_mmio) {		/* default case */
>>>> +		mmio_spa -= (gfn - mreg->start_gfn);
>>>> +		gfn = mreg->start_gfn;
>>>> +		numpgs = mreg->nr_pages;
>>>> +	} else
>>>> +		numpgs = 1;
>>>> +
>>>> +	rc = hv_call_map_mmio_pages(pt->pt_id, gfn, mmio_spa, numpgs);
>>>> +
>>>> +	return rc == 0;
>>>> +}
>>>> +
>>>>    static struct mshv_mem_region *
>>>>    mshv_partition_region_by_gfn_get(struct mshv_partition *p, u64 gfn)
>>>>    {
>>>> @@ -666,13 +777,17 @@ static bool mshv_handle_gpa_intercept(struct mshv_vp *vp)
>>>>    	return ret;
>>>>    }
>>>> +
>>>>    #else  /* CONFIG_X86_64 */
>>>> +static bool mshv_handle_unmapped_gpa(struct mshv_vp *vp) { return false; }
>>>>    static bool mshv_handle_gpa_intercept(struct mshv_vp *vp) { return false; }
>>>>    #endif /* CONFIG_X86_64 */
>>>>    static bool mshv_vp_handle_intercept(struct mshv_vp *vp)
>>>>    {
>>>>    	switch (vp->vp_intercept_msg_page->header.message_type) {
>>>> +	case HVMSG_UNMAPPED_GPA:
>>>> +		return mshv_handle_unmapped_gpa(vp);
>>>>    	case HVMSG_GPA_INTERCEPT:
>>>>    		return mshv_handle_gpa_intercept(vp);
>>>>    	}
>>>> -- 
>>>> 2.51.2.vfs.0.1
>>>>


