Re: [Qemu-devel] [PATCH v2 01/10] intel-iommu: send PSI always even if across PDEs

From: Auger Eric <eric.auger@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Tian Kevin <kevin.tian@intel.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Jintack Lim <jintack@cs.columbia.edu>
Subject: Re: [Qemu-devel] [PATCH v2 01/10] intel-iommu: send PSI always even if across PDEs
Date: Fri, 18 May 2018 09:39:42 +0200
Message-ID: <941044eb-2f85-3d1d-a9d9-2c257648680e@redhat.com>
In-Reply-To: <20180518034153.GB2569@xz-mi>

Hi,

On 05/18/2018 05:41 AM, Peter Xu wrote:
> On Thu, May 17, 2018 at 04:42:54PM +0200, Auger Eric wrote:
>> Hi Peter,
>>
>> On 05/04/2018 05:08 AM, Peter Xu wrote:
>>> During IOVA page table walking, there is a special case when the PSI
>>> covers one whole PDE (Page Directory Entry, which contains 512 Page
>>> Table Entries) or more.  In the past, we skipped that entry and did not
>>> notify the IOMMU notifiers.  This is not correct.  We should send UNMAP
>>> notification to registered UNMAP notifiers in this case.
>>>
>>> For UNMAP-only notifiers, this might leave IOTLB entries cached in the
>>> devices even though they are already invalid.  For MAP/UNMAP notifiers
>>> like vfio-pci, this will cause stale page mappings.
>>>
>>> This special case doesn't trigger often, but it is very easy to be
>>> triggered by nested device assignments, since in that case we'll
>>> possibly map the whole L2 guest RAM region into the device's IOVA
>>> address space (several GBs at least), which is far bigger than normal
>>> kernel driver usages of the device (tens of MBs normally).
>>>
>>> Without this patch applied to L1 QEMU, nested device assignment to L2
>>> guests will dump some errors like:
>>>
>>> qemu-system-x86_64: VFIO_MAP_DMA: -17
>>> qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
>>>                     0x7f89a920d000) = -17 (File exists)
>>>
>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>> [peterx: rewrite the commit message]
>>> Signed-off-by: Peter Xu <peterx@redhat.com>
>>> ---
>>>  hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------
>>>  1 file changed, 30 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>> index fb31de9416..b359efd6f9 100644
>>> --- a/hw/i386/intel_iommu.c
>>> +++ b/hw/i386/intel_iommu.c
>>> @@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
>>>  
>>>  typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
>>>  
>>> +static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
>>> +                             vtd_page_walk_hook hook_fn, void *private)
>> I find the function name a bit weird, as it does not do a page table
>> walk but rather calls a callback on an entry. vtd_callback_wrapper?
> 
> It's a hook for the page walk process, and IMHO vtd_callback_wrapper
> does not really provide any hint about the page walking.  So even if you
> prefer the "callback_wrapper" naming, I would still prefer:
> 
>   vtd_page_walk_callback[_wrapper]
> 
> though if so, I'd say I don't see much benefit compared to using the old
> vtd_page_walk_hook, which seems fine to me too...
I preferred vtd_page_walk_hook too.

Thanks

Eric
> 
>>> +{
>>> +    assert(hook_fn);
>>> +    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
>>> +                            entry->addr_mask, entry->perm);
>>> +    return hook_fn(entry, private);
>>> +}
>>> +
>>>  /**
>>>   * vtd_page_walk_level - walk over specific level for IOVA range
>>>   *
>>> @@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
>>>           */
>>>          entry_valid = read_cur | write_cur;
>>>  
>>> +        entry.target_as = &address_space_memory;
>>> +        entry.iova = iova & subpage_mask;
>>> +        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
>>> +        entry.addr_mask = ~subpage_mask;
>>> +
>>>          if (vtd_is_last_slpte(slpte, level)) {
>>> -            entry.target_as = &address_space_memory;
>>> -            entry.iova = iova & subpage_mask;
>>>              /* NOTE: this is only meaningful if entry_valid == true */
>>>              entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
>>> -            entry.addr_mask = ~subpage_mask;
>>> -            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
>>>              if (!entry_valid && !notify_unmap) {
>>>                  trace_vtd_page_walk_skip_perm(iova, iova_next);
>>>                  goto next;
>>>              }
>>> -            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
>>> -                                    entry.addr_mask, entry.perm);
>>> -            if (hook_fn) {
>>> -                ret = hook_fn(&entry, private);
>>> -                if (ret < 0) {
>>> -                    return ret;
>>> -                }
>>> +            ret = vtd_page_walk_one(&entry, level, hook_fn, private);
>>> +            if (ret < 0) {
>>> +                return ret;
>>>              }
>>>          } else {
>>>              if (!entry_valid) {
>>> -                trace_vtd_page_walk_skip_perm(iova, iova_next);
>>> +                if (notify_unmap) {
>>> +                    /*
>>> +                     * The whole entry is invalid; unmap it all.
>>> +                     * Translated address is meaningless, zero it.
>>> +                     */
>>> +                    entry.translated_addr = 0x0;
>> Do you really need to zero the translated_addr, and keep the related
>> comment? As soon as perm is NONE, this should not be used?
> 
> Yes, here we can avoid setting it.  However, zeroing it makes sure we
> don't leak strange numbers to the notifiers below, so I would still
> slightly prefer to zero it here.
> 
>>> +                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);
>>> +                    if (ret < 0) {
>>> +                        return ret;
>>> +                    }
>>> +                } else {
>>> +                    trace_vtd_page_walk_skip_perm(iova, iova_next);
>>> +                }
>>>                  goto next;
>>>              }
>>>              ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,
>>>
>>
>> Besides
>> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> 
> Thanks for reviewing the series.
> 
> Note that this v2 is obsolete, please feel free to read version 3 of
> this series, which contains quite a lot of functional changes.
> 
> Regards,
>
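
For readers following the thread, the behaviour change discussed above can
be illustrated with a minimal, standalone sketch. This is not the QEMU code:
ToyEntry, ToyPde, toy_walk and toy_count_hook are hypothetical names, and the
two-level layout (512 4 KiB pages per directory entry) is a simplification
assumed for the example. It only shows the idea that an invalid directory
entry produces one UNMAP notification covering the whole range instead of
being skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT   12
#define TOY_PTES_PER_PDE 512
/* Range covered by one directory entry: 512 * 4 KiB = 2 MiB. */
#define TOY_PDE_SPAN     ((uint64_t)TOY_PTES_PER_PDE << TOY_PAGE_SHIFT)

/* Hypothetical, simplified stand-in for an IOMMU TLB entry. */
typedef struct {
    uint64_t iova;       /* start of the range */
    uint64_t addr_mask;  /* size of the range minus one */
    bool     mapped;     /* false means "unmap this range" */
} ToyEntry;

typedef int (*toy_hook)(const ToyEntry *entry, void *opaque);

/* Hypothetical directory entry: a NULL table means the PDE is invalid. */
typedef struct {
    const bool *ptes;    /* TOY_PTES_PER_PDE validity flags, or NULL */
} ToyPde;

/* Walk all directory entries, calling hook for each range to notify. */
static int toy_walk(const ToyPde *pdes, size_t n_pdes, bool notify_unmap,
                    toy_hook hook, void *opaque)
{
    assert(hook);
    for (size_t i = 0; i < n_pdes; i++) {
        uint64_t base = (uint64_t)i * TOY_PDE_SPAN;

        if (!pdes[i].ptes) {
            /* Whole PDE invalid: send one UNMAP for the full span. */
            if (notify_unmap) {
                ToyEntry e = { base, TOY_PDE_SPAN - 1, false };
                int ret = hook(&e, opaque);
                if (ret < 0) {
                    return ret;
                }
            }
            continue;
        }

        for (size_t j = 0; j < TOY_PTES_PER_PDE; j++) {
            bool valid = pdes[i].ptes[j];
            if (!valid && !notify_unmap) {
                continue;  /* nothing to report for this page */
            }
            ToyEntry e = { base + ((uint64_t)j << TOY_PAGE_SHIFT),
                           (1ULL << TOY_PAGE_SHIFT) - 1, valid };
            int ret = hook(&e, opaque);
            if (ret < 0) {
                return ret;
            }
        }
    }
    return 0;
}

/* Example hook: count notifications and the number of bytes unmapped. */
typedef struct {
    unsigned maps;
    unsigned unmaps;
    uint64_t unmapped_bytes;
} ToyStats;

static int toy_count_hook(const ToyEntry *e, void *opaque)
{
    ToyStats *s = opaque;
    if (e->mapped) {
        s->maps++;
    } else {
        s->unmaps++;
        s->unmapped_bytes += e->addr_mask + 1;
    }
    return 0;
}
```

With notify_unmap set, a NULL directory entry yields a single 2 MiB unmap
in this sketch; with it cleared, the entry is skipped, which corresponds to
the old behaviour the commit message describes.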
