Re: [Qemu-devel] [PATCH] intel_iommu: allow dynamic switch of IOMMU region

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Peter Xu <peterx@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: qemu-devel@nongnu.org, tianyu.lan@intel.com,
	kevin.tian@intel.com, mst@redhat.com, jan.kiszka@siemens.com,
	jasowang@redhat.com, alex.williamson@redhat.com,
	bd.aviv@gmail.com
Subject: Re: [Qemu-devel] [PATCH] intel_iommu: allow dynamic switch of IOMMU region
Date: Tue, 20 Dec 2016 12:16:50 +0800	[thread overview]
Message-ID: <20161220041650.GA22006@pxdev.xzpeter.org> (raw)
In-Reply-To: <20161219233012.GF23176@umbus.fritz.box>

On Tue, Dec 20, 2016 at 10:30:12AM +1100, David Gibson wrote:

[...]

> > +static void vtd_switch_address_space(IntelIOMMUState *s, bool enabled)
> > +{
> > +    GHashTableIter iter;
> > +    VTDBus *vtd_bus;
> > +    VTDAddressSpace *as;
> > +    int i;
> > +
> > +    g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
> > +    while (g_hash_table_iter_next (&iter, NULL, (void**)&vtd_bus)) {
> > +        for (i = 0; i < X86_IOMMU_PCI_DEVFN_MAX; i++) {
> > +            as = vtd_bus->dev_as[i];
> > +            if (as == NULL) {
> > +                continue;
> > +            }
> > +            trace_vtd_switch_address_space(pci_bus_num(vtd_bus->bus),
> > +                                           VTD_PCI_SLOT(i), VTD_PCI_FUNC(i),
> > +                                           enabled);
> > +            if (enabled) {
> > +                memory_region_add_subregion_overlap(&as->root, 0,
> > +                                                    &as->iommu, 2);
> > +            } else {
> > +                memory_region_del_subregion(&as->root, &as->iommu);
> 
> Why not use memory_region_set_enabled() rather than actually
> adding/deleting the subregion?

Good idea, thanks. :-)

[...]

> > @@ -2343,15 +2378,47 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
> >          vtd_dev_as->devfn = (uint8_t)devfn;
> >          vtd_dev_as->iommu_state = s;
> >          vtd_dev_as->context_cache_entry.context_cache_gen = 0;
> > +
> > +        /*
> > +         * When DMAR is disabled, memory region relationships looks
> > +         * like:
> > +         *
> > +         * 0000000000000000-ffffffffffffffff (prio 0, RW): vtd_root
> > +         *  0000000000000000-ffffffffffffffff (prio 1, RW): vtd_sys_alias
> > +         *  00000000fee00000-00000000feefffff (prio 64, RW): intel_iommu_ir
> > +         *
> > +         * When DMAR is disabled, it becomes:
> > +         *
> > +         * 0000000000000000-ffffffffffffffff (prio 0, RW): vtd_root
> > +         *  0000000000000000-ffffffffffffffff (prio 2, RW): intel_iommu
> > +         *  0000000000000000-ffffffffffffffff (prio 1, RW): vtd_sys_alias
> > +         *  00000000fee00000-00000000feefffff (prio 64, RW): intel_iommu_ir
> > +         *
> > +         * The intel_iommu region is dynamically added/removed.
> > +         */
> >          memory_region_init_iommu(&vtd_dev_as->iommu, OBJECT(s),
> >                                   &s->iommu_ops, "intel_iommu", UINT64_MAX);
> 
> I'm almost certain UINT64_MAX is wrong here.  For one thing it would
> collide with PCI BARs.  For another, I can't imagine that the IOMMU
> page tables can really span an entire 2^64 space.

Could you explain why here device address space has things to do with
PCI BARs? I thought BARs are for CPU address space only (so that CPU
can access PCI registers via MMIO manner), am I wrong?

I think we should have a big enough IOMMU region size here. If device
writes to invalid addresses, IMHO we should trap it and report to
guest. If we have a smaller size than UINT64_MAX, how we can trap this
behavior and report for the whole address space (it should cover [0,
2^64-1])?

> 
> > +        memory_region_init_alias(&vtd_dev_as->sys_alias, OBJECT(s),
> > +                                 "vtd_sys_alias", get_system_memory(),
> > +                                 0, memory_region_size(get_system_memory()));
> 
> I strongly suspect using memory_region_size(get_system_memory()) is
> also incorrect here.  System memory has size UINT64_MAX, but I'll bet
> you you can't actually access all of that via PCI space (again, it
> would collide with actual PCI BARs).  I also suspect you can't reach
> CPU MMIO regions via the PCI DMA space.

Hmm, sounds correct.

However if so we will have the same problem if without IOMMU? See
pci_device_iommu_address_space() - address_space_memory will be the
default if we have no IOMMU protection, and that will cover e.g. CPU
MMIO regions as well.

> 
> So, I think you should find out what this limit actually is and
> restrict the alias to that window.

/me needs some more reading to figure this out. Still not quite
familiar with the whole VM memory regions. Hints are welcomed...

> 
> >          memory_region_init_io(&vtd_dev_as->iommu_ir, OBJECT(s),
> >                                &vtd_mem_ir_ops, s, "intel_iommu_ir",
> >                                VTD_INTERRUPT_ADDR_SIZE);
> > -        memory_region_add_subregion(&vtd_dev_as->iommu, VTD_INTERRUPT_ADDR_FIRST,
> > -                                    &vtd_dev_as->iommu_ir);
> > -        address_space_init(&vtd_dev_as->as,
> > -                           &vtd_dev_as->iommu, "intel_iommu");
> > +        memory_region_init(&vtd_dev_as->root, OBJECT(s),
> > +                           "vtd_root", UINT64_MAX);
> > +        memory_region_add_subregion_overlap(&vtd_dev_as->root,
> > +                                            VTD_INTERRUPT_ADDR_FIRST,
> > +                                            &vtd_dev_as->iommu_ir, 64);
> > +        address_space_init(&vtd_dev_as->as, &vtd_dev_as->root, name);
> > +        memory_region_add_subregion_overlap(&vtd_dev_as->root, 0,
> > +                                            &vtd_dev_as->sys_alias, 1);
> > +        if (s->dmar_enabled) {
> > +            memory_region_add_subregion_overlap(&vtd_dev_as->root, 0,
> > +                                                &vtd_dev_as->iommu, 2);
> > +        }
> 
> Hmm.  You have the IOMMU translated region overlaying the
> direct-mapped alias.  You enable and disable the IOMMU subregion, but
> you always leave the direct mapping enabled.  You might get away with
> this because the size of the IOMMU region is UINT64_MAX, which will
> overlay everything - but as above, I think that's wrong.  If that's
> changed then guest devices may be able to access portions of the raw
> address space outside the IOMMU mapped region, which could break the
> guest's expectations of device isolation.
> 
> I think it would be much safer to disable the system memory alias when
> the IOMMU is enabled.

Reasonable. Will adopt.

Thanks!

-- peterx

next prev parent reply	other threads:[~2016-12-20  4:17 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-19 14:41 [Qemu-devel] [PATCH] intel_iommu: allow dynamic switch of IOMMU region Peter Xu
2016-12-19 16:56 ` Alex Williamson
2016-12-20  3:44   ` Peter Xu
2016-12-20  4:52     ` Alex Williamson
2016-12-20  6:38       ` Peter Xu
2016-12-21  0:04         ` Alex Williamson
2016-12-21  3:19           ` Peter Xu
2016-12-21  3:49           ` David Gibson
2016-12-21  3:30       ` David Gibson
2016-12-19 23:30 ` David Gibson
2016-12-20  4:16   ` Peter Xu [this message]
2016-12-21  2:53     ` David Gibson
2016-12-21 10:05       ` Peter Xu
2016-12-21 22:56         ` David Gibson
2016-12-20 23:02 ` no-reply
2016-12-21  3:33   ` Peter Xu
2016-12-20 23:57 ` no-reply
2016-12-21  3:39   ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161220041650.GA22006@pxdev.xzpeter.org \
    --to=peterx@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=bd.aviv@gmail.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=jan.kiszka@siemens.com \
    --cc=jasowang@redhat.com \
    --cc=kevin.tian@intel.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=tianyu.lan@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).