Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Igor Mammedov <imammedo@redhat.com>
To: Yu Zhang <yu.c.zhang@linux.intel.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel@nongnu.org, Peter Xu <peterx@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Richard Henderson <rth@twiddle.net>
Subject: Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.
Date: Wed, 19 Dec 2018 11:40:37 +0100	[thread overview]
Message-ID: <20181219114037.5550a562@redhat.com> (raw)
In-Reply-To: <20181219025717.6m72hq73p2haexkv@linux.intel.com>

On Wed, 19 Dec 2018 10:57:17 +0800
Yu Zhang <yu.c.zhang@linux.intel.com> wrote:

> On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote:
> > On Tue, 18 Dec 2018 17:27:23 +0800
> > Yu Zhang <yu.c.zhang@linux.intel.com> wrote:
> >   
> > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:  
> > > > On Wed, 12 Dec 2018 21:05:38 +0800
> > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote:
> > > >   
> > > > > Currently, vIOMMU is using the value of IOVA address width, instead of
> > > > > the host address width(HAW) to calculate the number of reserved bits in
> > > > > data structures such as root entries, context entries, and entries of
> > > > > DMA paging structures etc.
> > > > > 
> > > > > However values of IOVA address width and of the HAW may not equal. For
> > > > > example, a 48-bit IOVA can only be mapped to host addresses no wider than
> > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result
> > > > > in an invalid IOVA being accepted.
> > > > > 
> > > > > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState,
> > > > > whose value is initialized based on the maximum physical address set to
> > > > > guest CPU.  
> > > >   
> > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > > > > to clarify.
> > > > > 
> > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> > > > > Reviewed-by: Peter Xu <peterx@redhat.com>
> > > > > ---  
> > > > [...]
> > > >   
> > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> > > > >  static void vtd_init(IntelIOMMUState *s)
> > > > >  {
> > > > >      X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > > > > +    CPUState *cs = first_cpu;
> > > > > +    X86CPU *cpu = X86_CPU(cs);
> > > > >  
> > > > >      memset(s->csr, 0, DMAR_REG_SIZE);
> > > > >      memset(s->wmask, 0, DMAR_REG_SIZE);
> > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
> > > > >      s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> > > > >               VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> > > > >               VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > > > > -    if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > > > > +    if (s->aw_bits == VTD_AW_48BIT) {
> > > > >          s->cap |= VTD_CAP_SAGAW_48bit;
> > > > >      }
> > > > >      s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > > > > +    s->haw_bits = cpu->phys_bits;  
> > > > Is it possible to avoid accessing CPU fields directly or cpu altogether
> > > > and set phys_bits when iommu is created?  
> > > 
> > > Thanks for your comments, Igor.
> > > 
> > > Well, I guess you prefer not to query the CPU capabilities while deciding
> > > the vIOMMU features. But to me, they are not that irrelevant.:)
> > > 
> > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR
> > > are referring to the same concept. In VM, both are the maximum guest physical
> > > address width. If we do not check the CPU field here, we will still have to
> > > check the CPU field in other places such as build_dmar_q35(), and reset the
> > > s->haw_bits again.
> > > 
> > > Is this explanation convincing enough? :)  
> > current build_dmar_q35() doesn't do it, it's all new code in this series that
> > contains not acceptable direct access from one device (iommu) to another (cpu).   
> > Proper way would be for the owner of iommu to fish limits from somewhere and set
> > values during iommu creation.  
> 
> Well, current build_dmar_q35() doesn't do it, because it is using the incorrect value. :)
> According to the spec, the host address width is the maximum physical address width,
> yet current implementation is using the DMA address width. For me, this is not only
> wrong, but also unsecure. For this point, I think we all agree this need to be fixed.
> 
> As to how to fix it - should we query the cpu fields, I still do not understand why
> this is not acceptable. :)
> 
> I had thought of other approaches before, yet I did not choose:
> 
> 1> Introduce a new parameter, say, "x-haw-bits" which is used for iommu to limit its  
> physical address width(similar to the "x-aw-bits" for IOVA). But what should we check
> this parameter or not? What if this parameter is set to sth. different than the "phys-bits"
> or not?
> 
> 2> Another choice I had thought of is, to query the physical iommu. I abandoned this  
> idea because my understanding is that vIOMMU is not a passthrued device, it is emulated.

> So Igor, may I ask why you think checking against the cpu fields so not acceptable? :)
Because accessing private fields of device from another random device is not robust
and a subject to breaking in unpredictable manner when field meaning or initialization
order changes. (analogy to baremetal: one does not solder wire to a CPU die to let
access some piece of data from random device).

I've looked at intel-iommu code and how it's created so here is a way to do the thing
you need using proper interfaces:

1. add x-haw_bits property
2. include in your series patch
    '[Qemu-devel] [PATCH] qdev: let machine hotplug handler to override  bus hotplug handler'
3. add your iommu to pc_get_hotpug_handler() to redirect plug flow to
   machine and let _pre_plug handler to check and set x-haw_bits for machine level
4. you probably can use phys-bits/host-phys-bits properties to get data that you need
   also see how ms->possible_cpus, that's how you can get access to CPU from machine
   layer.

> >   
> > > > 
> > > > Perhaps Eduardo
> > > >  can suggest better approach, since he's more familiar with phys_bits topic  
> > > 
> > > @Eduardo, any comments? Thanks!
> > >   
> > > >   
> > > > >      /*
> > > > >       * Rsvd field masks for spte
> > > > >       */
> > > > >      vtd_paging_entry_rsvd_field[0] = ~0ULL;
> > > > > -    vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits);
> > > > > -    vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits);
> > > > > -    vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits);
> > > > > -    vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits);
> > > > > -    vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits);
> > > > > -    vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits);
> > > > > -    vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits);
> > > > > -    vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits);
> > > > > +    vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits);
> > > > > +    vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits);
> > > > > +    vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits);
> > > > > +    vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits);
> > > > > +    vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits);
> > > > > +    vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits);
> > > > > +    vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits);
> > > > > +    vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits);
> > > > >  
> > > > >      if (x86_iommu->intr_supported) {
> > > > >          s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV;
> > > > > @@ -3261,10 +3268,10 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
> > > > >      }
> > > > >  
> > > > >      /* Currently only address widths supported are 39 and 48 bits */
> > > > > -    if ((s->aw_bits != VTD_HOST_AW_39BIT) &&
> > > > > -        (s->aw_bits != VTD_HOST_AW_48BIT)) {
> > > > > +    if ((s->aw_bits != VTD_AW_39BIT) &&
> > > > > +        (s->aw_bits != VTD_AW_48BIT)) {
> > > > >          error_setg(errp, "Supported values for x-aw-bits are: %d, %d",
> > > > > -                   VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT);
> > > > > +                   VTD_AW_39BIT, VTD_AW_48BIT);
> > > > >          return false;
> > > > >      }
> > > > >  
> > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> > > > > index ed4e758..820451c 100644
> > > > > --- a/include/hw/i386/intel_iommu.h
> > > > > +++ b/include/hw/i386/intel_iommu.h
> > > > > @@ -47,9 +47,9 @@
> > > > >  #define VTD_SID_TO_DEVFN(sid)       ((sid) & 0xff)
> > > > >  
> > > > >  #define DMAR_REG_SIZE               0x230
> > > > > -#define VTD_HOST_AW_39BIT           39
> > > > > -#define VTD_HOST_AW_48BIT           48
> > > > > -#define VTD_HOST_ADDRESS_WIDTH      VTD_HOST_AW_39BIT
> > > > > +#define VTD_AW_39BIT                39
> > > > > +#define VTD_AW_48BIT                48
> > > > > +#define VTD_ADDRESS_WIDTH           VTD_AW_39BIT
> > > > >  #define VTD_HAW_MASK(aw)            ((1ULL << (aw)) - 1)
> > > > >  
> > > > >  #define DMAR_REPORT_F_INTR          (1)
> > > > > @@ -244,7 +244,8 @@ struct IntelIOMMUState {
> > > > >      bool intr_eime;                 /* Extended interrupt mode enabled */
> > > > >      OnOffAuto intr_eim;             /* Toggle for EIM cabability */
> > > > >      bool buggy_eim;                 /* Force buggy EIM unless eim=off */
> > > > > -    uint8_t aw_bits;                /* Host/IOVA address width (in bits) */
> > > > > +    uint8_t aw_bits;                /* IOVA address width (in bits) */
> > > > > +    uint8_t haw_bits;               /* Hardware address width (in bits) */
> > > > >  
> > > > >      /*
> > > > >       * Protects IOMMU states in general.  Currently it protects the  
> > > > 
> > > >   
> > > 
> > > B.R.
> > > Yu  
> > 
> >   
> 
> B.R.
> Yu

next prev parent reply	other threads:[~2018-12-19 10:40 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-12 13:05 [Qemu-devel] [PATCH v3 0/2] intel-iommu: add support for 5-level virtual IOMMU Yu Zhang
2018-12-12 13:05 ` [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width Yu Zhang
2018-12-17 13:17   ` Igor Mammedov
2018-12-18  9:27     ` Yu Zhang
2018-12-18 14:23       ` Michael S. Tsirkin
2018-12-18 14:55       ` Igor Mammedov
2018-12-18 14:58         ` Michael S. Tsirkin
2018-12-19  3:03           ` Yu Zhang
2018-12-19  3:12             ` Michael S. Tsirkin
2018-12-19  6:28               ` Yu Zhang
2018-12-19 15:30                 ` Michael S. Tsirkin
2018-12-19  2:57         ` Yu Zhang
2018-12-19 10:40           ` Igor Mammedov [this message]
2018-12-19 16:47             ` Michael S. Tsirkin
2018-12-20  5:59               ` Yu Zhang
2018-12-20 21:18             ` Eduardo Habkost
2018-12-21 14:13               ` Igor Mammedov
2018-12-21 16:09                 ` Yu Zhang
2018-12-21 17:04                   ` Michael S. Tsirkin
2018-12-21 17:37                     ` Yu Zhang
2018-12-21 19:02                       ` Michael S. Tsirkin
2018-12-21 20:01                         ` Eduardo Habkost
2018-12-22  1:11                         ` Yu Zhang
2018-12-25 16:56                           ` Michael S. Tsirkin
2018-12-26  5:30                             ` Yu Zhang
2018-12-27 15:14                               ` Eduardo Habkost
2018-12-28  2:32                                 ` Yu Zhang
2018-12-29  1:29                                   ` Eduardo Habkost
2019-01-15  7:13                                     ` Yu Zhang
2019-01-18  7:10                                       ` Yu Zhang
2018-12-27 14:54                 ` Eduardo Habkost
2018-12-28 11:42                   ` Igor Mammedov
2018-12-20 20:58       ` Eduardo Habkost
2018-12-12 13:05 ` [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit " Yu Zhang
2018-12-17 13:29   ` Igor Mammedov
2018-12-18  9:47     ` Yu Zhang
2018-12-18 10:01       ` Yu Zhang
2018-12-18 12:43         ` Michael S. Tsirkin
2018-12-18 13:45           ` Yu Zhang
2018-12-18 14:49             ` Michael S. Tsirkin
2018-12-19  3:40               ` Yu Zhang
2018-12-19  4:35                 ` Michael S. Tsirkin
2018-12-19  5:57                   ` Yu Zhang
2018-12-19 15:23                     ` Michael S. Tsirkin
2018-12-20  5:49                       ` Yu Zhang
2018-12-20 18:28                         ` Michael S. Tsirkin
2018-12-21 16:19                           ` Yu Zhang
2018-12-21 17:15                             ` Michael S. Tsirkin
2018-12-21 17:34                               ` Yu Zhang
2018-12-21 18:10                                 ` Michael S. Tsirkin
2018-12-22  0:41                                   ` Yu Zhang
2018-12-25 17:00                                     ` Michael S. Tsirkin
2018-12-26  5:58                                       ` Yu Zhang
2018-12-25  1:59                                 ` Tian, Kevin
2018-12-14  9:17 ` [Qemu-devel] [PATCH v3 0/2] intel-iommu: add support for 5-level virtual IOMMU Yu Zhang
2019-01-15  4:02 ` Michael S. Tsirkin
2019-01-15  7:27   ` Yu Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181219114037.5550a562@redhat.com \
    --to=imammedo@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    --cc=yu.c.zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.