From: "Roedel, Joerg" <Joerg.Roedel@amd.com>
To: David Woodhouse <dwmw2@infradead.org>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"Kay, Allen M" <allen.m.kay@intel.com>,
"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
"iommu@lists.linux-foundation.org"
<iommu@lists.linux-foundation.org>
Subject: Re: [RFC][PATCH] Fix superpage unmap on Intel IOMMU
Date: Mon, 6 Jun 2011 11:50:30 +0200 [thread overview]
Message-ID: <20110606095030.GB30762@amd.com> (raw)
In-Reply-To: <1307129491.23418.4.camel@i7.infradead.org>
On Fri, Jun 03, 2011 at 03:31:30PM -0400, David Woodhouse wrote:
> + /* The KVM code is *fucked* in the head. It maps the range
> + one page at a time, using 4KiB pages unless it actually
> + allocated hugepages using hugetlbfs.
This is acutally by design. The IOMMU driver is not responsible for
finding out details like the continuity of the memory it has to map. It
just gets the physical address and the size and maps it into its
page-tables. Any details beside that are the business of KVM, and since
KVM allocates guest memory in user-space the only way to find out the
size of the physically contiguous region is the page-size of the vma.
> + (So we get to flush the CPU data cache and then
> + the IOTLB for each page in its loop).
The best way to avoid this is to introduce some kind of
iommu_domain_commit function which KVM calls in the end and which just
does a singe iotlb flush (and a dcache flush on VT-d).
This is at least better than moving more functionality into the IOMMU
driver (for example by teaching the driver to map a vma region). The
mapping code should stay common between IOMMU drivers as it is now.
> + And on unmap, it unmaps 4KiB at a time (always
> + passing gfp_order==0), regardless of whether it mapped
> + using superpages or not. So on unmap, if we detect a
> + superpage in our page tables we are expected to unmap
> + *more* than we are asked to, and return a value indicating
> + how much we actually unmapped. WTF? */
That is actually a weak point, I agree. The interface isn't very
consistent here. The idea was that KVM does not need to care which
page-size was used to map.
Probably we can clean that interface up and allow the ->unmap call to
only unmap the requested size, forcing to split a page if needed. That
makes the interface a lot cleaner and consistent, but the KVM IOMMU
unmap path a lot more costly.
But all KVM wants to do is to unmap everything from the io-page-table
and unpin the pages. We can hide the unmapping part in IOMMU domain
destruction and KVM only unpins its pages.
Thoughts?
Regards,
Joerg
--
AMD Operating System Research Center
Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632
prev parent reply other threads:[~2011-06-06 9:50 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-06-03 19:31 [RFC][PATCH] Fix superpage unmap on Intel IOMMU David Woodhouse
2011-06-03 20:00 ` David Woodhouse
2011-06-03 20:01 ` Alex Williamson
2011-06-04 9:00 ` David Woodhouse
2011-06-06 9:50 ` Roedel, Joerg [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110606095030.GB30762@amd.com \
--to=joerg.roedel@amd.com \
--cc=alex.williamson@redhat.com \
--cc=allen.m.kay@intel.com \
--cc=dwmw2@infradead.org \
--cc=iommu@lists.linux-foundation.org \
--cc=kvm@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox