From: Jacob Pan <jacob.pan@linux.microsoft.com>
To: Michael Kelley <mhklinux@outlook.com>
Cc: Yu Zhang <zhangyu1@linux.microsoft.com>,
Jason Gunthorpe <jgg@ziepe.ca>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
"wei.liu@kernel.org" <wei.liu@kernel.org>,
"kys@microsoft.com" <kys@microsoft.com>,
"haiyangz@microsoft.com" <haiyangz@microsoft.com>,
"decui@microsoft.com" <decui@microsoft.com>,
"longli@microsoft.com" <longli@microsoft.com>,
"joro@8bytes.org" <joro@8bytes.org>,
"will@kernel.org" <will@kernel.org>,
"robin.murphy@arm.com" <robin.murphy@arm.com>,
"bhelgaas@google.com" <bhelgaas@google.com>,
"kwilczynski@kernel.org" <kwilczynski@kernel.org>,
"lpieralisi@kernel.org" <lpieralisi@kernel.org>,
"mani@kernel.org" <mani@kernel.org>,
"robh@kernel.org" <robh@kernel.org>,
"arnd@arndb.de" <arnd@arndb.de>,
"tgopinath@linux.microsoft.com" <tgopinath@linux.microsoft.com>,
"easwar.hariharan@linux.microsoft.com"
<easwar.hariharan@linux.microsoft.com>,
jacob.pan@linux.microsoft.com
Subject: Re: [PATCH v1 4/4] iommu/hyperv: Add page-selective IOTLB flush support
Date: Wed, 20 May 2026 13:40:27 -0700 [thread overview]
Message-ID: <20260520134027.00005e91@linux.microsoft.com> (raw)
In-Reply-To: <SN6PR02MB4157C1EC7F5F69C5ABDA9C7FD4012@SN6PR02MB4157.namprd02.prod.outlook.com>
Hi Michael,
On Wed, 20 May 2026 19:26:24 +0000
Michael Kelley <mhklinux@outlook.com> wrote:
> From: Michael Kelley <mhklinux@outlook.com>
> To: Yu Zhang <zhangyu1@linux.microsoft.com>, Jason Gunthorpe
> <jgg@ziepe.ca> CC: "linux-kernel@vger.kernel.org"
> <linux-kernel@vger.kernel.org>, "linux-hyperv@vger.kernel.org"
> <linux-hyperv@vger.kernel.org>, "iommu@lists.linux.dev"
> <iommu@lists.linux.dev>, "linux-pci@vger.kernel.org"
> <linux-pci@vger.kernel.org>, "linux-arch@vger.kernel.org"
> <linux-arch@vger.kernel.org>, "wei.liu@kernel.org"
> <wei.liu@kernel.org>, "kys@microsoft.com" <kys@microsoft.com>,
> "haiyangz@microsoft.com" <haiyangz@microsoft.com>,
> "decui@microsoft.com" <decui@microsoft.com>, "longli@microsoft.com"
> <longli@microsoft.com>, "joro@8bytes.org" <joro@8bytes.org>,
> "will@kernel.org" <will@kernel.org>, "robin.murphy@arm.com"
> <robin.murphy@arm.com>, "bhelgaas@google.com" <bhelgaas@google.com>,
> "kwilczynski@kernel.org" <kwilczynski@kernel.org>,
> "lpieralisi@kernel.org" <lpieralisi@kernel.org>, "mani@kernel.org"
> <mani@kernel.org>, "robh@kernel.org" <robh@kernel.org>,
> "arnd@arndb.de" <arnd@arndb.de>, "jacob.pan@linux.microsoft.com"
> <jacob.pan@linux.microsoft.com>, "tgopinath@linux.microsoft.com"
> <tgopinath@linux.microsoft.com>,
> "easwar.hariharan@linux.microsoft.com"
> <easwar.hariharan@linux.microsoft.com> Subject: RE: [PATCH v1 4/4]
> iommu/hyperv: Add page-selective IOTLB flush support Date: Wed, 20
> May 2026 19:26:24 +0000
>
> From: Yu Zhang <zhangyu1@linux.microsoft.com> Sent: Wednesday, May
> 20, 2026 10:15 AM
> >
> > On Fri, May 15, 2026 at 07:35:45PM -0300, Jason Gunthorpe wrote:
> > > On Tue, May 12, 2026 at 12:24:08AM +0800, Yu Zhang wrote:
> > > > +static inline u16 hv_iommu_fill_iova_list(union
> > > > hv_iommu_flush_va *iova_list,
> > > > + unsigned long start,
> > > > + unsigned long end)
> > > > +{
> > > > + unsigned long start_pfn = start >> PAGE_SHIFT;
> > > > + unsigned long end_pfn = PAGE_ALIGN(end) >> PAGE_SHIFT;
> > > > + unsigned long nr_pages = end_pfn - start_pfn;
> > > > + u16 count = 0;
> > > > +
> > > > + while (nr_pages > 0) {
> > > > + unsigned long flush_pages;
> > > > + int order;
> > > > + unsigned long pfn_align;
> > > > + unsigned long size_align;
> > > > +
> > > > + if (count >= HV_IOMMU_MAX_FLUSH_VA_COUNT) {
> > > > + count = HV_IOMMU_FLUSH_VA_OVERFLOW;
> > > > + break;
> > > > + }
> > > > +
> > > > + if (start_pfn)
> > > > + pfn_align = __ffs(start_pfn);
> > > > + else
> > > > + pfn_align = BITS_PER_LONG - 1;
> > > > +
> > > > + size_align = __fls(nr_pages);
> > > > + order = min(pfn_align, size_align);
> > > > + iova_list[count].page_mask_shift = order;
> > > > + iova_list[count].page_number = start_pfn;
> > > > +
> > > > + flush_pages = 1UL << order;
> > > > + start_pfn += flush_pages;
> > > > + nr_pages -= flush_pages;
> > > > + count++;
> > > > + }
> > >
> > > This seems like a really silly hypervisor interface. Why doesn't
> > > it just accept a normal range? Splitting it into power of two
> > > aligned ranges is very inefficient.
> >
> > Fair point. I'm not sure how much flexibility we have to change
> > this hypercall interface at the moment - it predates the pvIOMMU
> > work and may have other consumers beyond Linux guest. On the other
> > hand, having the guest specify 2^N-aligned blocks does save the
> > hypervisor from having to decompose ranges itself before issuing
> > hardware invalidation commands - the guest-provided entries can be
> > fed to the HW more or less directly.
> >
> > That said, the way I'm currently using this interface may be
> > more precise than necessary. Maybe we have 2 options:
> >
> > 1) Current approach: decompose the range into multiple exact
> > 2^N-aligned blocks with no over-flush, but at the cost of
> > more complex calculations and more entries.
> >
> > 2) Follow what Intel/AMD drivers do: find a single minimal
> > 2^N-aligned block that covers the entire range, but may
> > over-flush.
> >
> > Any preference?
> >
> > @Michael, since you've also been reviewing this patch, I'd
> > appreciate your thoughts on the above as well. :)
> >
>
> I'm just guessing, but perhaps flushing an aligned power-of-2
> range can be processed by the hypervisor at a relatively fixed
> cost, regardless of the size. Having the guest do the decomposing
> of an arbitrary range allows the hypervisor to make use of the
> existing "rep" hypercall mechanism if the hypercall is taking
> "too long". The hypervisor can pause its processing, return to
> the guest temporarily, and then continue the hypercall. If the
> arbitrary range were passed into the hypercall for the hypervisor
> to do the decomposing, that pause-and-restart mechanism
> wouldn't be available.
>
> Of course, Linux doesn't really take advantage of the pause to
> reduce guest interrupt latency because the Hyper-V code in
> Linux typically disable interrupts around a hypercall due to the
> way the hypercall input page is allocated. But other guest
> operating systems might benefit from such a pause. And we could
> probably fix the Hyper-V code in Linux to allow interrupts during a
> hypercall pause/restart if long-running hypercalls turn out to be
> a problem.
I am not sure if this pause feature is suitable for IOTLB flush at all
since it is inherently synchronous — the caller must block until all
invalidations complete. Pausing mid-flush to return to the guest
doesn't help if the guest can't make forward progress anyway.
prev parent reply other threads:[~2026-05-20 20:40 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-11 16:24 [PATCH v1 0/4] Hyper-V: Add para-virtualized IOMMU support for Linux guests Yu Zhang
2026-05-11 16:24 ` [PATCH v1 1/4] iommu: Move Hyper-V IOMMU driver to its own subdirectory Yu Zhang
2026-05-15 22:19 ` Jason Gunthorpe
2026-05-20 6:37 ` Yu Zhang
2026-05-20 13:38 ` Jason Gunthorpe
2026-05-11 16:24 ` [PATCH v1 2/4] hyperv: Introduce new hypercall interfaces used by Hyper-V guest IOMMU Yu Zhang
2026-05-11 16:24 ` [PATCH v1 3/4] iommu/hyperv: Add para-virtualized IOMMU support for Hyper-V guest Yu Zhang
2026-05-13 18:39 ` Jacob Pan
2026-05-15 12:38 ` Yu Zhang
2026-05-14 18:13 ` Michael Kelley
2026-05-15 13:59 ` Yu Zhang
2026-05-15 14:51 ` Michael Kelley
2026-05-15 16:53 ` Yu Zhang
2026-05-15 17:36 ` Michael Kelley
2026-05-20 15:50 ` Yu Zhang
2026-05-16 0:11 ` Mukesh R
2026-05-18 9:38 ` Yu Zhang
2026-05-15 22:31 ` Jason Gunthorpe
2026-05-20 15:25 ` Yu Zhang
2026-05-20 18:27 ` Jacob Pan
2026-05-11 16:24 ` [PATCH v1 4/4] iommu/hyperv: Add page-selective IOTLB flush support Yu Zhang
2026-05-12 23:45 ` sashiko-bot
2026-05-14 18:14 ` Michael Kelley
2026-05-14 21:16 ` Michael Kelley
2026-05-15 16:23 ` Yu Zhang
2026-05-15 18:00 ` Michael Kelley
2026-05-15 23:33 ` Michael Kelley
2026-05-20 16:27 ` Yu Zhang
2026-05-15 22:35 ` Jason Gunthorpe
2026-05-20 17:15 ` Yu Zhang
2026-05-20 17:29 ` Jason Gunthorpe
2026-05-20 19:26 ` Michael Kelley
2026-05-20 20:40 ` Jacob Pan [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260520134027.00005e91@linux.microsoft.com \
--to=jacob.pan@linux.microsoft.com \
--cc=arnd@arndb.de \
--cc=bhelgaas@google.com \
--cc=decui@microsoft.com \
--cc=easwar.hariharan@linux.microsoft.com \
--cc=haiyangz@microsoft.com \
--cc=iommu@lists.linux.dev \
--cc=jgg@ziepe.ca \
--cc=joro@8bytes.org \
--cc=kwilczynski@kernel.org \
--cc=kys@microsoft.com \
--cc=linux-arch@vger.kernel.org \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=longli@microsoft.com \
--cc=lpieralisi@kernel.org \
--cc=mani@kernel.org \
--cc=mhklinux@outlook.com \
--cc=robh@kernel.org \
--cc=robin.murphy@arm.com \
--cc=tgopinath@linux.microsoft.com \
--cc=wei.liu@kernel.org \
--cc=will@kernel.org \
--cc=zhangyu1@linux.microsoft.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox