qemu-devel.nongnu.org archive mirror
From: Peter Xu <peterx@redhat.com>
To: "Liu, Yi L" <yi.l.liu@intel.com>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	Eric Auger <eric.auger@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Alexander Witte <alexander.witte@baicanada.com>,
	Jintack Lim <jintack@cs.columbia.edu>
Subject: Re: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set
Date: Wed, 18 Apr 2018 14:26:01 +0800
Message-ID: <20180418062601.GB14841@xz-mi>
In-Reply-To: <A2975661238FB949B60364EF0F2C257439BDD8EC@SHSMSX104.ccr.corp.intel.com>

On Wed, Apr 18, 2018 at 05:29:56AM +0000, Liu, Yi L wrote:
> > Sent: Wednesday, April 18, 2018 12:51 PM
> > Subject: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap
> > set
> > 
> > During IOVA page table walk, there is a special case when:
> > 
> > - notify_unmap is set, meanwhile
> > - entry is invalid
> 
> This is a very brief description; would you mind elaborating a little?

It means the case when the program reaches [1] below.
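
To illustrate that case, here is a minimal sketch of a two-level walk. It is not the QEMU code: the `Toy*` types, `toy_walk()`, `toy_count_hook()` and the 4-entry table geometry are all hypothetical, chosen only to keep the example small. What it shows is that when a non-leaf entry is invalid and notify_unmap is set, the hook is called once with a range covering the whole entry instead of the entry being skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy geometry (hypothetical): 4 KiB leaf pages, 4 entries per table. */
#define TOY_LEAF_SHIFT 12
#define TOY_BITS       2
#define TOY_FANOUT     (1 << TOY_BITS)

typedef struct ToyTable {
    bool present[TOY_FANOUT];                /* entry is readable or writable */
    const struct ToyTable *next[TOY_FANOUT]; /* lower table; unused at level 1 */
} ToyTable;

typedef struct ToyEntry {
    uint64_t iova;       /* start of the range covered by the entry */
    uint64_t addr_mask;  /* size of the range minus one */
    bool valid;          /* false means "this is an unmap notification" */
} ToyEntry;

typedef int (*toy_hook)(const ToyEntry *entry, void *opaque);

typedef struct ToyStats {
    unsigned maps;
    unsigned unmaps;
    uint64_t unmapped_bytes;
} ToyStats;

/* Example hook: count map and unmap notifications. */
static int toy_count_hook(const ToyEntry *entry, void *opaque)
{
    ToyStats *stats = opaque;

    if (entry->valid) {
        stats->maps++;
    } else {
        stats->unmaps++;
        stats->unmapped_bytes += entry->addr_mask + 1;
    }
    return 0;
}

/* Walk one table; level 1 is the leaf level. */
static int toy_walk(const ToyTable *table, int level, uint64_t base,
                    bool notify_unmap, toy_hook hook, void *opaque)
{
    uint64_t span = 1ULL << (TOY_LEAF_SHIFT + TOY_BITS * (level - 1));

    assert(hook);
    for (int i = 0; i < TOY_FANOUT; i++) {
        ToyEntry entry = {
            .iova = base + (uint64_t)i * span,
            .addr_mask = span - 1,
            .valid = table->present[i],
        };
        int ret;

        if (!entry.valid && !notify_unmap) {
            continue;                        /* nothing to report */
        }
        if (level == 1 || !entry.valid) {
            /* Leaf entry, or an invalid non-leaf entry (case [1]):
             * report the whole range covered by this entry. */
            ret = hook(&entry, opaque);
        } else {
            assert(table->next[i]);
            ret = toy_walk(table->next[i], level - 1, entry.iova,
                           notify_unmap, hook, opaque);
        }
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}
```

With one valid root entry pointing at a leaf table and three invalid root entries, a walk with notify_unmap set reports each invalid root entry as a single 16 KiB unmap in this toy geometry; a walk without it reports only the valid leaf pages.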

>  
> > In the past, we always skipped the entry.  This is not correct.  We should send an UNMAP
> > notification to registered notifiers in this case.  Otherwise some stale pages will still
> > be mapped in the host even if the L1 guest has already unmapped them.
> >
> > Without this patch, nested device assignment to L2 guests might dump some errors
> > like:
> 
> Does this require a physical device assigned from the L0 host, or could emulated
> devices also trigger this problem?

If using emulated devices, we would probably need three levels of
nesting.  I think the warning can also be seen if you assign an
emulated device from the L1 guest to L2 and then to L3; it should then
be dumped by the QEMU that runs L2.

> 
> > qemu-system-x86_64: VFIO_MAP_DMA: -17
> > qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
> >                     0x7f89a920d000) = -17 (File exists)
> > 
> > To fix this, we need to apply this patch to L1 QEMU (L2 QEMU is not affected by this
> > problem).
> 
> Does this fix also apply to L0 QEMU?

Sorry, I wasn't clear.  By "L1 QEMU" I meant the QEMU that runs the L1
guest, which I believe is what you call "L0 QEMU" here.

And yes, this fix should also be valid without nesting.  The problem
is hard to trigger in that case (which is why I only found it
recently, when people reported nested breakage), but AFAIU it is still
possible.
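
As a rough illustration of how a skipped UNMAP can lead to the "-17 (File exists)" error quoted below, here is a toy model. It is only a sketch: `ToyHostMap`, `toy_host_map()`, `toy_host_unmap_range()` and `toy_unmap_then_remap()` are hypothetical names, not VFIO or QEMU APIs, and the only assumption about the host side is that mapping an IOVA that is already mapped fails with -EEXIST.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_MAPS 16

/* Hypothetical stand-in for the host-side table of mapped IOVA pages. */
typedef struct ToyHostMap {
    uint64_t iova[TOY_MAX_MAPS];
    bool used[TOY_MAX_MAPS];
} ToyHostMap;

/* Map one page; refuse if the IOVA is already mapped. */
static int toy_host_map(ToyHostMap *host, uint64_t iova)
{
    int free_slot = -1;

    for (int i = 0; i < TOY_MAX_MAPS; i++) {
        if (host->used[i] && host->iova[i] == iova) {
            return -EEXIST;
        }
        if (!host->used[i] && free_slot < 0) {
            free_slot = i;
        }
    }
    if (free_slot < 0) {
        return -ENOSPC;
    }
    host->used[free_slot] = true;
    host->iova[free_slot] = iova;
    return 0;
}

/* Drop every mapped page inside [start, start + size). */
static void toy_host_unmap_range(ToyHostMap *host, uint64_t start,
                                 uint64_t size)
{
    for (int i = 0; i < TOY_MAX_MAPS; i++) {
        if (host->used[i] && host->iova[i] >= start &&
            host->iova[i] - start < size) {
            host->used[i] = false;
        }
    }
}

/*
 * The guest maps a page, invalidates a whole upper-level entry that
 * covers it, then maps the same page again.  The flag selects whether
 * the invalid upper-level entry is reported as an unmap or skipped.
 */
static int toy_unmap_then_remap(bool notify_invalid_nonleaf)
{
    ToyHostMap host = { 0 };
    const uint64_t iova = 0xad000;
    int ret = toy_host_map(&host, iova);

    if (ret < 0) {
        return ret;
    }
    if (notify_invalid_nonleaf) {
        /* The range is arbitrary; it only needs to cover the page. */
        toy_host_unmap_range(&host, 0x0, 0x200000);
    }
    /* If the unmap was skipped, the stale mapping is still present. */
    return toy_host_map(&host, iova);
}
```

In this model, `toy_unmap_then_remap(false)` returns -EEXIST because the stale page is still mapped, while `toy_unmap_then_remap(true)` returns 0.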

>  
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > 
> > To test nested assignment, one also needs to apply below patchset:
> > https://lkml.org/lkml/2018/4/18/5
> > ---
> >  hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------
> >  1 file changed, 30 insertions(+), 12 deletions(-)
> > 
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index fb31de9416..b359efd6f9 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
> > 
> >  typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
> > 
> > +static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
> > +                             vtd_page_walk_hook hook_fn, void *private)
> > +{
> > +    assert(hook_fn);
> > +    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
> > +                            entry->addr_mask, entry->perm);
> > +    return hook_fn(entry, private);
> > +}
> > +
> >  /**
> >   * vtd_page_walk_level - walk over specific level for IOVA range
> >   *
> > @@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
> >           */
> >          entry_valid = read_cur | write_cur;
> > 
> > +        entry.target_as = &address_space_memory;
> > +        entry.iova = iova & subpage_mask;
> > +        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> > +        entry.addr_mask = ~subpage_mask;
> > +
> >          if (vtd_is_last_slpte(slpte, level)) {
> > -            entry.target_as = &address_space_memory;
> > -            entry.iova = iova & subpage_mask;
> >              /* NOTE: this is only meaningful if entry_valid == true */
> >              entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
> > -            entry.addr_mask = ~subpage_mask;
> > -            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> >              if (!entry_valid && !notify_unmap) {
> >                  trace_vtd_page_walk_skip_perm(iova, iova_next);
> >                  goto next;
> >              }
> > -            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
> > -                                    entry.addr_mask, entry.perm);
> > -            if (hook_fn) {
> > -                ret = hook_fn(&entry, private);
> > -                if (ret < 0) {
> > -                    return ret;
> > -                }
> > +            ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> > +            if (ret < 0) {
> > +                return ret;
> >              }
> >          } else {
> >              if (!entry_valid) {

[1]

> > -                trace_vtd_page_walk_skip_perm(iova, iova_next);
> > +                if (notify_unmap) {
> > +                    /*
> > +                     * The whole entry is invalid; unmap it all.
> > +                     * Translated address is meaningless, zero it.
> > +                     */
> > +                    entry.translated_addr = 0x0;
> > +                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> > +                    if (ret < 0) {
> > +                        return ret;
> > +                    }
> > +                } else {
> > +                    trace_vtd_page_walk_skip_perm(iova, iova_next);
> > +                }
> >                  goto next;
> >              }
> >              ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,
> > --
> > 2.14.3
> > 
> 
> Thanks,
> Yi Liu

-- 
Peter Xu
