From: Peter Xu <peterx@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: qemu-devel@nongnu.org, tianyu.lan@intel.com,
kevin.tian@intel.com, mst@redhat.com, jan.kiszka@siemens.com,
jasowang@redhat.com, alex.williamson@redhat.com,
bd.aviv@gmail.com
Subject: Re: [Qemu-devel] [PATCH] intel_iommu: allow dynamic switch of IOMMU region
Date: Tue, 20 Dec 2016 12:16:50 +0800 [thread overview]
Message-ID: <20161220041650.GA22006@pxdev.xzpeter.org> (raw)
In-Reply-To: <20161219233012.GF23176@umbus.fritz.box>
On Tue, Dec 20, 2016 at 10:30:12AM +1100, David Gibson wrote:
[...]
> > +static void vtd_switch_address_space(IntelIOMMUState *s, bool enabled)
> > +{
> > + GHashTableIter iter;
> > + VTDBus *vtd_bus;
> > + VTDAddressSpace *as;
> > + int i;
> > +
> > + g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
> > + while (g_hash_table_iter_next (&iter, NULL, (void**)&vtd_bus)) {
> > + for (i = 0; i < X86_IOMMU_PCI_DEVFN_MAX; i++) {
> > + as = vtd_bus->dev_as[i];
> > + if (as == NULL) {
> > + continue;
> > + }
> > + trace_vtd_switch_address_space(pci_bus_num(vtd_bus->bus),
> > + VTD_PCI_SLOT(i), VTD_PCI_FUNC(i),
> > + enabled);
> > + if (enabled) {
> > + memory_region_add_subregion_overlap(&as->root, 0,
> > + &as->iommu, 2);
> > + } else {
> > + memory_region_del_subregion(&as->root, &as->iommu);
>
> Why not use memory_region_set_enabled() rather than actually
> adding/deleting the subregion?
Good idea, thanks. :-)
[...]
> > @@ -2343,15 +2378,47 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
> > vtd_dev_as->devfn = (uint8_t)devfn;
> > vtd_dev_as->iommu_state = s;
> > vtd_dev_as->context_cache_entry.context_cache_gen = 0;
> > +
> > + /*
> > + * When DMAR is disabled, memory region relationships looks
> > + * like:
> > + *
> > + * 0000000000000000-ffffffffffffffff (prio 0, RW): vtd_root
> > + * 0000000000000000-ffffffffffffffff (prio 1, RW): vtd_sys_alias
> > + * 00000000fee00000-00000000feefffff (prio 64, RW): intel_iommu_ir
> > + *
> > + * When DMAR is disabled, it becomes:
> > + *
> > + * 0000000000000000-ffffffffffffffff (prio 0, RW): vtd_root
> > + * 0000000000000000-ffffffffffffffff (prio 2, RW): intel_iommu
> > + * 0000000000000000-ffffffffffffffff (prio 1, RW): vtd_sys_alias
> > + * 00000000fee00000-00000000feefffff (prio 64, RW): intel_iommu_ir
> > + *
> > + * The intel_iommu region is dynamically added/removed.
> > + */
> > memory_region_init_iommu(&vtd_dev_as->iommu, OBJECT(s),
> > &s->iommu_ops, "intel_iommu", UINT64_MAX);
>
> I'm almost certain UINT64_MAX is wrong here. For one thing it would
> collide with PCI BARs. For another, I can't imagine that the IOMMU
> page tables can really span an entire 2^64 space.
Could you explain why here device address space has things to do with
PCI BARs? I thought BARs are for CPU address space only (so that CPU
can access PCI registers via MMIO manner), am I wrong?
I think we should have a big enough IOMMU region size here. If device
writes to invalid addresses, IMHO we should trap it and report to
guest. If we have a smaller size than UINT64_MAX, how we can trap this
behavior and report for the whole address space (it should cover [0,
2^64-1])?
>
> > + memory_region_init_alias(&vtd_dev_as->sys_alias, OBJECT(s),
> > + "vtd_sys_alias", get_system_memory(),
> > + 0, memory_region_size(get_system_memory()));
>
> I strongly suspect using memory_region_size(get_system_memory()) is
> also incorrect here. System memory has size UINT64_MAX, but I'll bet
> you you can't actually access all of that via PCI space (again, it
> would collide with actual PCI BARs). I also suspect you can't reach
> CPU MMIO regions via the PCI DMA space.
Hmm, sounds correct.
However if so we will have the same problem if without IOMMU? See
pci_device_iommu_address_space() - address_space_memory will be the
default if we have no IOMMU protection, and that will cover e.g. CPU
MMIO regions as well.
>
> So, I think you should find out what this limit actually is and
> restrict the alias to that window.
/me needs some more reading to figure this out. Still not quite
familiar with the whole VM memory regions. Hints are welcomed...
>
> > memory_region_init_io(&vtd_dev_as->iommu_ir, OBJECT(s),
> > &vtd_mem_ir_ops, s, "intel_iommu_ir",
> > VTD_INTERRUPT_ADDR_SIZE);
> > - memory_region_add_subregion(&vtd_dev_as->iommu, VTD_INTERRUPT_ADDR_FIRST,
> > - &vtd_dev_as->iommu_ir);
> > - address_space_init(&vtd_dev_as->as,
> > - &vtd_dev_as->iommu, "intel_iommu");
> > + memory_region_init(&vtd_dev_as->root, OBJECT(s),
> > + "vtd_root", UINT64_MAX);
> > + memory_region_add_subregion_overlap(&vtd_dev_as->root,
> > + VTD_INTERRUPT_ADDR_FIRST,
> > + &vtd_dev_as->iommu_ir, 64);
> > + address_space_init(&vtd_dev_as->as, &vtd_dev_as->root, name);
> > + memory_region_add_subregion_overlap(&vtd_dev_as->root, 0,
> > + &vtd_dev_as->sys_alias, 1);
> > + if (s->dmar_enabled) {
> > + memory_region_add_subregion_overlap(&vtd_dev_as->root, 0,
> > + &vtd_dev_as->iommu, 2);
> > + }
>
> Hmm. You have the IOMMU translated region overlaying the
> direct-mapped alias. You enable and disable the IOMMU subregion, but
> you always leave the direct mapping enabled. You might get away with
> this because the size of the IOMMU region is UINT64_MAX, which will
> overlay everything - but as above, I think that's wrong. If that's
> changed then guest devices may be able to access portions of the raw
> address space outside the IOMMU mapped region, which could break the
> guest's expectations of device isolation.
>
> I think it would be much safer to disable the system memory alias when
> the IOMMU is enabled.
Reasonable. Will adopt.
Thanks!
-- peterx
next prev parent reply other threads:[~2016-12-20 4:17 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-12-19 14:41 [Qemu-devel] [PATCH] intel_iommu: allow dynamic switch of IOMMU region Peter Xu
2016-12-19 16:56 ` Alex Williamson
2016-12-20 3:44 ` Peter Xu
2016-12-20 4:52 ` Alex Williamson
2016-12-20 6:38 ` Peter Xu
2016-12-21 0:04 ` Alex Williamson
2016-12-21 3:19 ` Peter Xu
2016-12-21 3:49 ` David Gibson
2016-12-21 3:30 ` David Gibson
2016-12-19 23:30 ` David Gibson
2016-12-20 4:16 ` Peter Xu [this message]
2016-12-21 2:53 ` David Gibson
2016-12-21 10:05 ` Peter Xu
2016-12-21 22:56 ` David Gibson
2016-12-20 23:02 ` no-reply
2016-12-21 3:33 ` Peter Xu
2016-12-20 23:57 ` no-reply
2016-12-21 3:39 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161220041650.GA22006@pxdev.xzpeter.org \
--to=peterx@redhat.com \
--cc=alex.williamson@redhat.com \
--cc=bd.aviv@gmail.com \
--cc=david@gibson.dropbear.id.au \
--cc=jan.kiszka@siemens.com \
--cc=jasowang@redhat.com \
--cc=kevin.tian@intel.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=tianyu.lan@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).