From: Jacob Pan <jacob.pan@linux.microsoft.com>
To: Michael Kelley <mhklinux@outlook.com>
Cc: Yu Zhang <zhangyu1@linux.microsoft.com>,
Jason Gunthorpe <jgg@ziepe.ca>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
"wei.liu@kernel.org" <wei.liu@kernel.org>,
"kys@microsoft.com" <kys@microsoft.com>,
"haiyangz@microsoft.com" <haiyangz@microsoft.com>,
"decui@microsoft.com" <decui@microsoft.com>,
"longli@microsoft.com" <longli@microsoft.com>,
"joro@8bytes.org" <joro@8bytes.org>,
"will@kernel.org" <will@kernel.org>,
"robin.murphy@arm.com" <robin.murphy@arm.com>,
"bhelgaas@google.com" <bhelgaas@google.com>,
"kwilczynski@kernel.org" <kwilczynski@kernel.org>,
"lpieralisi@kernel.org" <lpieralisi@kernel.org>,
"mani@kernel.org" <mani@kernel.org>,
"robh@kernel.org" <robh@kernel.org>,
"arnd@arndb.de" <arnd@arndb.de>,
"tgopinath@linux.microsoft.com" <tgopinath@linux.microsoft.com>,
"easwar.hariharan@linux.microsoft.com"
<easwar.hariharan@linux.microsoft.com>,
jacob.pan@linux.microsoft.com
Subject: Re: [PATCH v1 4/4] iommu/hyperv: Add page-selective IOTLB flush support
Date: Wed, 20 May 2026 13:40:27 -0700
Message-ID: <20260520134027.00005e91@linux.microsoft.com>
In-Reply-To: <SN6PR02MB4157C1EC7F5F69C5ABDA9C7FD4012@SN6PR02MB4157.namprd02.prod.outlook.com>
Hi Michael,
On Wed, 20 May 2026 19:26:24 +0000
Michael Kelley <mhklinux@outlook.com> wrote:
> From: Michael Kelley <mhklinux@outlook.com>
> To: Yu Zhang <zhangyu1@linux.microsoft.com>, Jason Gunthorpe <jgg@ziepe.ca>
> CC: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
> "linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
> "iommu@lists.linux.dev" <iommu@lists.linux.dev>,
> "linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
> "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
> "wei.liu@kernel.org" <wei.liu@kernel.org>,
> "kys@microsoft.com" <kys@microsoft.com>,
> "haiyangz@microsoft.com" <haiyangz@microsoft.com>,
> "decui@microsoft.com" <decui@microsoft.com>,
> "longli@microsoft.com" <longli@microsoft.com>,
> "joro@8bytes.org" <joro@8bytes.org>,
> "will@kernel.org" <will@kernel.org>,
> "robin.murphy@arm.com" <robin.murphy@arm.com>,
> "bhelgaas@google.com" <bhelgaas@google.com>,
> "kwilczynski@kernel.org" <kwilczynski@kernel.org>,
> "lpieralisi@kernel.org" <lpieralisi@kernel.org>,
> "mani@kernel.org" <mani@kernel.org>,
> "robh@kernel.org" <robh@kernel.org>,
> "arnd@arndb.de" <arnd@arndb.de>,
> "jacob.pan@linux.microsoft.com" <jacob.pan@linux.microsoft.com>,
> "tgopinath@linux.microsoft.com" <tgopinath@linux.microsoft.com>,
> "easwar.hariharan@linux.microsoft.com" <easwar.hariharan@linux.microsoft.com>
> Subject: RE: [PATCH v1 4/4] iommu/hyperv: Add page-selective IOTLB flush support
> Date: Wed, 20 May 2026 19:26:24 +0000
>
> From: Yu Zhang <zhangyu1@linux.microsoft.com> Sent: Wednesday, May 20, 2026 10:15 AM
> >
> > On Fri, May 15, 2026 at 07:35:45PM -0300, Jason Gunthorpe wrote:
> > > On Tue, May 12, 2026 at 12:24:08AM +0800, Yu Zhang wrote:
> > > > +static inline u16 hv_iommu_fill_iova_list(union hv_iommu_flush_va *iova_list,
> > > > +					  unsigned long start,
> > > > +					  unsigned long end)
> > > > +{
> > > > +	unsigned long start_pfn = start >> PAGE_SHIFT;
> > > > +	unsigned long end_pfn = PAGE_ALIGN(end) >> PAGE_SHIFT;
> > > > +	unsigned long nr_pages = end_pfn - start_pfn;
> > > > +	u16 count = 0;
> > > > +
> > > > +	while (nr_pages > 0) {
> > > > +		unsigned long flush_pages;
> > > > +		int order;
> > > > +		unsigned long pfn_align;
> > > > +		unsigned long size_align;
> > > > +
> > > > +		if (count >= HV_IOMMU_MAX_FLUSH_VA_COUNT) {
> > > > +			count = HV_IOMMU_FLUSH_VA_OVERFLOW;
> > > > +			break;
> > > > +		}
> > > > +
> > > > +		if (start_pfn)
> > > > +			pfn_align = __ffs(start_pfn);
> > > > +		else
> > > > +			pfn_align = BITS_PER_LONG - 1;
> > > > +
> > > > +		size_align = __fls(nr_pages);
> > > > +		order = min(pfn_align, size_align);
> > > > +		iova_list[count].page_mask_shift = order;
> > > > +		iova_list[count].page_number = start_pfn;
> > > > +
> > > > +		flush_pages = 1UL << order;
> > > > +		start_pfn += flush_pages;
> > > > +		nr_pages -= flush_pages;
> > > > +		count++;
> > > > +	}
> > >
> > > This seems like a really silly hypervisor interface. Why doesn't
> > > it just accept a normal range? Splitting it into power of two
> > > aligned ranges is very inefficient.
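For reference, the quoted decomposition can be reproduced as a small standalone C sketch. It assumes the GCC/Clang builtins __builtin_ctzl()/__builtin_clzl() in place of the kernel's __ffs()/__fls(), a hypothetical struct flush_entry in place of union hv_iommu_flush_va, an exclusive end byte address, and returns -1 instead of the HV_IOMMU_FLUSH_VA_OVERFLOW sentinel when the entry budget runs out.

```c
#include <assert.h>
#include <limits.h>

#define PAGE_SHIFT 12
#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical stand-in for union hv_iommu_flush_va. */
struct flush_entry {
	unsigned long page_number;
	unsigned int page_mask_shift;
};

/* Lowest set bit, like the kernel's __ffs(); x must be non-zero. */
static unsigned int lowest_bit(unsigned long x)
{
	assert(x != 0);
	return (unsigned int)__builtin_ctzl(x);
}

/* Highest set bit, like the kernel's __fls(); x must be non-zero. */
static unsigned int highest_bit(unsigned long x)
{
	assert(x != 0);
	return BITS_PER_LONG - 1 - (unsigned int)__builtin_clzl(x);
}

/*
 * Split [start, end) into exact, naturally aligned power-of-two page
 * blocks. Returns the entry count, or -1 if more than max are needed.
 */
static int decompose_exact(unsigned long start, unsigned long end,
			   struct flush_entry *out, int max)
{
	unsigned long pfn = start >> PAGE_SHIFT;
	unsigned long end_pfn = (end + (1UL << PAGE_SHIFT) - 1) >> PAGE_SHIFT;
	unsigned long nr = end_pfn - pfn;
	int count = 0;

	while (nr > 0) {
		/* Largest order allowed by the alignment of pfn... */
		unsigned int align = pfn ? lowest_bit(pfn) : BITS_PER_LONG - 1;
		/* ...and by the number of pages still left to cover. */
		unsigned int fit = highest_bit(nr);
		unsigned int order = align < fit ? align : fit;

		if (count >= max)
			return -1;
		out[count].page_number = pfn;
		out[count].page_mask_shift = order;
		count++;
		pfn += 1UL << order;
		nr -= 1UL << order;
	}
	return count;
}
```

For example, the 14-page range covering PFNs 1 through 14 needs six entries (orders 0, 1, 2, 2, 1, 0), while a 16-page range starting at PFN 0 needs only one. That per-entry growth for unaligned ranges is the inefficiency being pointed out.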
> >
> > Fair point. I'm not sure how much flexibility we have to change
> > this hypercall interface at the moment - it predates the pvIOMMU
> > work and may have other consumers beyond Linux guest. On the other
> > hand, having the guest specify 2^N-aligned blocks does save the
> > hypervisor from having to decompose ranges itself before issuing
> > hardware invalidation commands - the guest-provided entries can be
> > fed to the HW more or less directly.
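To illustrate the "fed to the HW more or less directly" point, here is a hedged sketch of how a host could repack one guest entry into a VT-d-style page-selective invalidation, which takes a page-aligned address plus an address-mask (AM) order and requires the address to be aligned to 2^AM pages. The struct names, the max_am parameter and the repacking itself are assumptions for illustration, not the Hyper-V implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Hypothetical guest entry: 2^page_mask_shift pages at page_number. */
struct guest_flush_entry {
	uint64_t page_number;
	unsigned int page_mask_shift;
};

/* Hypothetical fields of a VT-d-style page-selective invalidation. */
struct psi_desc {
	uint64_t addr;   /* byte address, aligned to 2^am pages */
	unsigned int am; /* address mask: log2 of the page count */
};

/*
 * Translate one guest entry into one hardware-style descriptor. The guest
 * already supplied a naturally aligned power-of-two block, so the host only
 * validates and repacks; no splitting is needed. max_am stands in for the
 * hardware's maximum supported mask and must be below 64.
 */
static bool entry_to_psi(const struct guest_flush_entry *e,
			 unsigned int max_am, struct psi_desc *out)
{
	uint64_t align_mask;

	if (e->page_mask_shift > max_am)
		return false;
	align_mask = (UINT64_C(1) << e->page_mask_shift) - 1;
	if (e->page_number & align_mask)
		return false; /* not naturally aligned */
	out->addr = e->page_number << PAGE_SHIFT;
	out->am = e->page_mask_shift;
	return true;
}
```

An entry for 16 pages at PFN 0x100 maps to address 0x100000 with AM 4, while an entry at PFN 0x101 with shift 1 is rejected as misaligned.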
> >
> > That said, the way I'm currently using this interface may be
> > more precise than necessary. Maybe we have 2 options:
> >
> > 1) Current approach: decompose the range into multiple exact
> > 2^N-aligned blocks with no over-flush, but at the cost of
> > more complex calculations and more entries.
> >
> > 2) Follow what Intel/AMD drivers do: find a single minimal
> > 2^N-aligned block that covers the entire range, but may
> > over-flush.
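Option 2 can be sketched in the same style: the smallest naturally aligned block covering a range has an order one greater than the highest bit in which the first and last PFNs differ. This is a standalone illustration in the spirit of the Intel/AMD approach mentioned above, not code taken from either driver; struct flush_block is hypothetical and __builtin_clzl() is a GCC/Clang builtin.

```c
#include <assert.h>
#include <limits.h>

#define PAGE_SHIFT 12
#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical result: one naturally aligned block of 2^order pages. */
struct flush_block {
	unsigned long page_number;
	unsigned int page_mask_shift;
};

/* Highest set bit, like the kernel's __fls(); x must be non-zero. */
static unsigned int highest_bit(unsigned long x)
{
	assert(x != 0);
	return BITS_PER_LONG - 1 - (unsigned int)__builtin_clzl(x);
}

/*
 * Smallest naturally aligned power-of-two block covering [start, end),
 * with end exclusive and end > start. May over-flush. PFNs are byte
 * addresses shifted right by PAGE_SHIFT, so order stays below
 * BITS_PER_LONG and the shift below is well-defined.
 */
static struct flush_block cover_range(unsigned long start, unsigned long end)
{
	unsigned long first = start >> PAGE_SHIFT;
	unsigned long last = (end - 1) >> PAGE_SHIFT;
	/* Both PFNs share every bit above the highest differing one. */
	unsigned int order = first == last ? 0 : highest_bit(first ^ last) + 1;
	struct flush_block b = {
		.page_number = first & ~((1UL << order) - 1),
		.page_mask_shift = order,
	};

	return b;
}
```

The trade-off shows up at alignment boundaries: PFNs 7 and 8 (two pages) straddle an 8-page boundary, so the single covering block is the 16 pages starting at PFN 0, whereas the exact decomposition needs only two order-0 entries. For PFNs 8 through 11 both approaches yield one exact order-2 block.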
> >
> > Any preference?
> >
> > @Michael, since you've also been reviewing this patch, I'd
> > appreciate your thoughts on the above as well. :)
> >
>
> I'm just guessing, but perhaps flushing an aligned power-of-2
> range can be processed by the hypervisor at a relatively fixed
> cost, regardless of the size. Having the guest do the decomposing
> of an arbitrary range allows the hypervisor to make use of the
> existing "rep" hypercall mechanism if the hypercall is taking
> "too long". The hypervisor can pause its processing, return to
> the guest temporarily, and then continue the hypercall. If the
> arbitrary range were passed into the hypercall for the hypervisor
> to do the decomposing, that pause-and-restart mechanism
> wouldn't be available.
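The pause-and-restart behavior described above can be modeled with a toy simulation: a mock "hypervisor" completes at most a fixed budget of reps per entry and reports how far it got, and the guest re-issues the call from that point. The guest loop is loosely modeled on the retry loop in Linux's hv_do_rep_hypercall(); everything else (struct mock_hv, the budget, the function names) is hypothetical and does not model the real hypercall ABI.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mock state; not a real Hyper-V structure. */
struct mock_hv {
	unsigned int budget;      /* max reps handled per entry, must be > 0 */
	unsigned int invocations; /* how many times the guest entered the mock */
	unsigned int flushed;     /* total entries processed so far */
};

/*
 * Mock rep-style call: resume at rep_start, handle at most hv->budget
 * entries (as if the hypervisor paused because the call ran too long),
 * and return the number of reps completed so far.
 */
static uint16_t mock_rep_hypercall(struct mock_hv *hv, uint16_t rep_count,
				   uint16_t rep_start)
{
	uint16_t done = rep_start;
	unsigned int handled = 0;

	assert(hv->budget > 0);
	hv->invocations++;
	while (done < rep_count && handled < hv->budget) {
		/* A real hypervisor would invalidate entry[done] here. */
		done++;
		handled++;
		hv->flushed++;
	}
	return done;
}

/*
 * Guest side: re-issue the call from the reported completion point until
 * every rep is done. A real guest could service interrupts between
 * iterations if they were enabled, but this returns only after the
 * whole flush has completed.
 */
static unsigned int guest_flush(struct mock_hv *hv, uint16_t rep_count)
{
	uint16_t done = 0;

	while (done < rep_count)
		done = mock_rep_hypercall(hv, rep_count, done);
	return hv->invocations;
}
```

With a budget of 4 reps, a 10-entry flush takes three invocations of the mock hypervisor, yet all 10 entries are processed before guest_flush() returns; with a budget of 16 it completes in one.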
>
> Of course, Linux doesn't really take advantage of the pause to
> reduce guest interrupt latency because the Hyper-V code in
> Linux typically disables interrupts around a hypercall due to the
> way the hypercall input page is allocated. But other guest
> operating systems might benefit from such a pause. And we could
> probably fix the Hyper-V code in Linux to allow interrupts during a
> hypercall pause/restart if long-running hypercalls turn out to be
> a problem.

I am not sure this pause feature suits IOTLB flushes at all, since a
flush is inherently synchronous: the caller must block until all
invalidations complete. Pausing mid-flush to return to the guest
doesn't help if the guest can't make forward progress anyway.
Thread overview: 39+ messages
2026-05-11 16:24 [PATCH v1 0/4] Hyper-V: Add para-virtualized IOMMU support for Linux guests Yu Zhang
2026-05-11 16:24 ` [PATCH v1 1/4] iommu: Move Hyper-V IOMMU driver to its own subdirectory Yu Zhang
2026-05-15 22:19 ` Jason Gunthorpe
2026-05-20 6:37 ` Yu Zhang
2026-05-20 13:38 ` Jason Gunthorpe
2026-05-11 16:24 ` [PATCH v1 2/4] hyperv: Introduce new hypercall interfaces used by Hyper-V guest IOMMU Yu Zhang
2026-05-12 21:24 ` sashiko-bot
2026-05-11 16:24 ` [PATCH v1 3/4] iommu/hyperv: Add para-virtualized IOMMU support for Hyper-V guest Yu Zhang
2026-05-12 22:30 ` sashiko-bot
2026-05-13 18:39 ` Jacob Pan
2026-05-15 12:38 ` Yu Zhang
2026-05-14 18:13 ` Michael Kelley
2026-05-15 13:59 ` Yu Zhang
2026-05-15 14:51 ` Michael Kelley
2026-05-15 16:53 ` Yu Zhang
2026-05-15 17:36 ` Michael Kelley
2026-05-20 15:50 ` Yu Zhang
2026-05-16 0:11 ` Mukesh R
2026-05-18 9:38 ` Yu Zhang
2026-05-15 22:31 ` Jason Gunthorpe
2026-05-20 15:25 ` Yu Zhang
2026-05-20 18:27 ` Jacob Pan
2026-05-21 12:27 ` Yu Zhang
2026-05-11 16:24 ` [PATCH v1 4/4] iommu/hyperv: Add page-selective IOTLB flush support Yu Zhang
2026-05-12 23:45 ` sashiko-bot
2026-05-14 18:14 ` Michael Kelley
2026-05-14 21:16 ` Michael Kelley
2026-05-15 16:23 ` Yu Zhang
2026-05-15 18:00 ` Michael Kelley
2026-05-15 23:33 ` Michael Kelley
2026-05-20 16:27 ` Yu Zhang
2026-05-15 22:35 ` Jason Gunthorpe
2026-05-20 17:15 ` Yu Zhang
2026-05-20 17:29 ` Jason Gunthorpe
2026-05-20 19:26 ` Michael Kelley
2026-05-20 20:40 ` Jacob Pan [this message]
2026-05-21 15:45 ` Michael Kelley
2026-05-21 14:34 ` Yu Zhang
2026-05-21 15:39 ` Michael Kelley