From: Lu Baolu <baolu.lu@linux.intel.com>
To: Peter Xu <peterx@redhat.com>
Cc: baolu.lu@linux.intel.com, Joerg Roedel <joro@8bytes.org>,
David Woodhouse <dwmw2@infradead.org>,
Alex Williamson <alex.williamson@redhat.com>,
kevin.tian@intel.com, Yi Sun <yi.y.sun@linux.intel.com>,
ashok.raj@intel.com, kvm@vger.kernel.org,
sanjay.k.kumar@intel.com, iommu@lists.linux-foundation.org,
linux-kernel@vger.kernel.org, yi.y.sun@intel.com
Subject: Re: [RFC PATCH 2/4] iommu/vt-d: Add first level page table interfaces
Date: Sat, 28 Sep 2019 16:23:16 +0800 [thread overview]
Message-ID: <66823e27-aa33-5968-b5fd-e5221fb1fffe@linux.intel.com> (raw)
In-Reply-To: <20190927053449.GA9412@xz-x1>
Hi Peter,
On 9/27/19 1:34 PM, Peter Xu wrote:
> Hi, Baolu,
>
> On Fri, Sep 27, 2019 at 10:27:24AM +0800, Lu Baolu wrote:
>>>>>> + spin_lock(&(domain)->page_table_lock); \
>>>>>
>>>>> Is this intended to lock here instead of taking the lock during the
>>>>> whole page table walk? Is it safe?
>>>>>
>>>>> Taking the example where nm==PTE: when we reach here how do we
>>>>> guarantee that the PMD page that has this PTE is still valid?
>>>>
>>>> We will always keep the non-leaf pages in the table,
>>>
>>> I see. Though, could I ask why? It seems to me that the existing 2nd
>>> level page table does not keep these when unmap, and it's not even use
>>> locking at all by leveraging cmpxchg()?
>>
>> I still need some time to understand how cmpxchg() solves the race issue
>> when reclaims pages. For example.
>>
>> Thread A Thread B
>> -A1: check all PTE's empty -B1: up-level PDE valid
>> -A2: clear the up-level PDE
>> -A3: reclaim the page -B2: populate the PTEs
>>
>> Both (A1,A2) and (B1,B2) should be atomic. Otherwise, race could happen.
>
> I'm not sure of this, but IMHO it is similarly because we need to
> allocate the iova ranges from iova allocator first, so thread A (who's
> going to unmap pages) and thread B (who's going to map new pages)
> should never have collapsed regions if happening concurrently. I'm
Although they don't collapse, they might share a same pmd entry. If A
cleared the pmd entry and B goes ahead with populating the pte's. It
will crash.
> referring to intel_unmap() in which we won't free the iova region
> before domain_unmap() completes (which should cover the whole process
> of A1-A3) so the same iova range to be unmapped won't be allocated to
> any new pages in some other thread.
>
> There's also a hint in domain_unmap():
>
> /* we don't need lock here; nobody else touches the iova range */
>
>>
>> Actually, the iova allocator always packs IOVA ranges close to the top
>> of the address space. This results in requiring a minimal number of
>> pages to map the allocated IOVA ranges, which makes memory onsumption
>> by IOMMU page tables tolerable. Hence, we don't need to reclaim the
>> pages until the whole page table is about to tear down. The real data
>> on my test machine also improves this.
>
> Do you mean you have run the code with a 1st-level-supported IOMMU
> hardware? IMHO any data point would be good to be in the cover letter
> as reference.
Yes. Sure! Let me do this since the next version.
>
> [...]
>
>>>>>> +static struct page *
>>>>>> +mmunmap_pte_range(struct dmar_domain *domain, pmd_t *pmd,
>>>>>> + unsigned long addr, unsigned long end,
>>>>>> + struct page *freelist, bool reclaim)
>>>>>> +{
>>>>>> + int i;
>>>>>> + unsigned long start;
>>>>>> + pte_t *pte, *first_pte;
>>>>>> +
>>>>>> + start = addr;
>>>>>> + pte = pte_offset_kernel(pmd, addr);
>>>>>> + first_pte = pte;
>>>>>> + do {
>>>>>> + set_pte(pte, __pte(0));
>>>>>> + } while (pte++, addr += PAGE_SIZE, addr != end);
>>>>>> +
>>>>>> + domain_flush_cache(domain, first_pte, (void *)pte - (void *)first_pte);
>>>>>> +
>>>>>> + /* Add page to free list if all entries are empty. */
>>>>>> + if (reclaim) {
>>>>>
>>>>> Shouldn't we know whether to reclaim if with (addr, end) specified as
>>>>> long as they cover the whole range of this PMD?
>>>>
>>>> Current policy is that we don't reclaim any pages until the whole page
>>>> table will be torn down.
>>>
>>> Ah OK. But I saw that you're passing in relaim==!start_addr.
>>> Shouldn't that errornously trigger if one wants to unmap the 1st page
>>> as well even if not the whole address space?
>>
>> IOVA 0 is assumed to be reserved by the allocator. Otherwise, we have no
>> means to check whether a IOVA is valid.
>
> Is this an assumption of the allocator? Could that change in the future?
Yes. And I think it should keep unless no consumer depends on this
optimization.
>
> IMHO that's not necessary if so, after all it's as simple as replacing
> (!start_addr) with (start == 0 && end == END). I see that in
> domain_unmap() it has a similar check when freeing pgd:
>
> if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw))
>
Yours looks better. Thank you!
> Thanks,
>
Best regards,
Baolu
next prev parent reply other threads:[~2019-09-28 8:25 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-23 12:24 [RFC PATCH 0/4] Use 1st-level for DMA remapping in guest Lu Baolu
2019-09-23 12:24 ` [RFC PATCH 1/4] iommu/vt-d: Move domain_flush_cache helper into header Lu Baolu
2019-09-23 12:24 ` [RFC PATCH 2/4] iommu/vt-d: Add first level page table interfaces Lu Baolu
2019-09-23 20:31 ` Raj, Ashok
2019-09-24 1:38 ` Lu Baolu
2019-09-25 4:30 ` Peter Xu
2019-09-25 4:38 ` Tian, Kevin
2019-09-25 5:24 ` Peter Xu
2019-09-25 6:52 ` Lu Baolu
2019-09-25 7:32 ` Tian, Kevin
2019-09-25 8:35 ` Peter Xu
2019-09-26 1:42 ` Lu Baolu
2019-09-25 5:21 ` Peter Xu
2019-09-26 2:35 ` Lu Baolu
2019-09-26 3:49 ` Peter Xu
2019-09-27 2:27 ` Lu Baolu
2019-09-27 5:34 ` Peter Xu
2019-09-28 8:23 ` Lu Baolu [this message]
2019-09-29 5:25 ` Peter Xu
2019-10-08 2:20 ` Lu Baolu
2019-09-23 12:24 ` [RFC PATCH 3/4] iommu/vt-d: Map/unmap domain with mmmap/mmunmap Lu Baolu
2019-09-25 5:00 ` Tian, Kevin
2019-09-25 7:06 ` Lu Baolu
2019-09-23 12:24 ` [RFC PATCH 4/4] iommu/vt-d: Identify domains using first level page table Lu Baolu
2019-09-25 6:50 ` Peter Xu
2019-09-25 7:35 ` Tian, Kevin
2019-09-23 19:27 ` [RFC PATCH 0/4] Use 1st-level for DMA remapping in guest Jacob Pan
2019-09-23 20:25 ` Raj, Ashok
2019-09-24 4:40 ` Lu Baolu
2019-09-24 7:00 ` Tian, Kevin
2019-09-25 2:48 ` Lu Baolu
2019-09-25 6:56 ` Peter Xu
2019-09-25 7:21 ` Tian, Kevin
2019-09-25 7:45 ` Peter Xu
2019-09-25 8:02 ` Tian, Kevin
2019-09-25 8:52 ` Peter Xu
2019-09-26 1:37 ` Lu Baolu
2019-09-24 4:27 ` Lu Baolu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=66823e27-aa33-5968-b5fd-e5221fb1fffe@linux.intel.com \
--to=baolu.lu@linux.intel.com \
--cc=alex.williamson@redhat.com \
--cc=ashok.raj@intel.com \
--cc=dwmw2@infradead.org \
--cc=iommu@lists.linux-foundation.org \
--cc=joro@8bytes.org \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=peterx@redhat.com \
--cc=sanjay.k.kumar@intel.com \
--cc=yi.y.sun@intel.com \
--cc=yi.y.sun@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox