From: "Thomas Hellström (Intel)" <thomas_os@shipmail.org>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: "Christian König" <christian.koenig@amd.com>,
"Christian König" <ckoenig.leichtzumerken@gmail.com>,
"Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
"Matthew Wilcox" <willy@infradead.org>,
"moderated list:DMA BUFFER SHARING FRAMEWORK"
<linaro-mm-sig@lists.linaro.org>,
"Jason Gunthorpe" <jgg@ziepe.ca>,
"John Stultz" <john.stultz@linaro.org>,
"DRI Development" <dri-devel@lists.freedesktop.org>,
"Daniel Vetter" <daniel.vetter@intel.com>,
"Suren Baghdasaryan" <surenb@google.com>,
"open list:DMA BUFFER SHARING FRAMEWORK"
<linux-media@vger.kernel.org>
Subject: Re: [Linaro-mm-sig] [PATCH 1/2] dma-buf: Require VM_PFNMAP vma for mmap
Date: Thu, 11 Mar 2021 16:37:45 +0100 [thread overview]
Message-ID: <b5385bb5-9d1b-3ce6-ee1f-a669d0780253@shipmail.org> (raw)
In-Reply-To: <CAKMK7uEzGKUc27xdWTv7KPESsyg1kCYCmVxP3b-HrzNCNO5x7g@mail.gmail.com>
On 3/11/21 2:17 PM, Daniel Vetter wrote:
> On Thu, Mar 11, 2021 at 2:12 PM Thomas Hellström (Intel)
> <thomas_os@shipmail.org> wrote:
>> Hi!
>>
>> On 3/11/21 2:00 PM, Daniel Vetter wrote:
>>> On Thu, Mar 11, 2021 at 11:22:06AM +0100, Thomas Hellström (Intel) wrote:
>>>> On 3/1/21 3:09 PM, Daniel Vetter wrote:
>>>>> On Mon, Mar 1, 2021 at 11:17 AM Christian König
>>>>> <christian.koenig@amd.com> wrote:
>>>>>> Am 01.03.21 um 10:21 schrieb Thomas Hellström (Intel):
>>>>>>> On 3/1/21 10:05 AM, Daniel Vetter wrote:
>>>>>>>> On Mon, Mar 01, 2021 at 09:39:53AM +0100, Thomas Hellström (Intel)
>>>>>>>> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 3/1/21 9:28 AM, Daniel Vetter wrote:
>>>>>>>>>> On Sat, Feb 27, 2021 at 9:06 AM Thomas Hellström (Intel)
>>>>>>>>>> <thomas_os@shipmail.org> wrote:
>>>>>>>>>>> On 2/26/21 2:28 PM, Daniel Vetter wrote:
>>>>>>>>>>>> So I think it stops gup. But I haven't verified at all. Would be
>>>>>>>>>>>> good
>>>>>>>>>>>> if Christian can check this with some direct io to a buffer in
>>>>>>>>>>>> system
>>>>>>>>>>>> memory.
>>>>>>>>>>> Hmm,
>>>>>>>>>>>
>>>>>>>>>>> Docs (again vm_normal_page() say)
>>>>>>>>>>>
>>>>>>>>>>> * VM_MIXEDMAP mappings can likewise contain memory with or
>>>>>>>>>>> without "struct
>>>>>>>>>>> * page" backing, however the difference is that _all_ pages
>>>>>>>>>>> with a struct
>>>>>>>>>>> * page (that is, those where pfn_valid is true) are refcounted
>>>>>>>>>>> and
>>>>>>>>>>> considered
>>>>>>>>>>> * normal pages by the VM. The disadvantage is that pages are
>>>>>>>>>>> refcounted
>>>>>>>>>>> * (which can be slower and simply not an option for some PFNMAP
>>>>>>>>>>> users). The
>>>>>>>>>>> * advantage is that we don't have to follow the strict
>>>>>>>>>>> linearity rule of
>>>>>>>>>>> * PFNMAP mappings in order to support COWable mappings.
>>>>>>>>>>>
>>>>>>>>>>> but it's true __vm_insert_mixed() ends up in the insert_pfn()
>>>>>>>>>>> path, so
>>>>>>>>>>> the above isn't really true, which makes me wonder if and in that
>>>>>>>>>>> case
>>>>>>>>>>> why there could any longer ever be a significant performance
>>>>>>>>>>> difference
>>>>>>>>>>> between MIXEDMAP and PFNMAP.
>>>>>>>>>> Yeah it's definitely confusing. I guess I'll hack up a patch and see
>>>>>>>>>> what sticks.
>>>>>>>>>>
>>>>>>>>>>> BTW regarding the TTM hugeptes, I don't think we ever landed that
>>>>>>>>>>> devmap
>>>>>>>>>>> hack, so they are (for the non-gup case) relying on
>>>>>>>>>>> vma_is_special_huge(). For the gup case, I think the bug is still
>>>>>>>>>>> there.
>>>>>>>>>> Maybe there's another devmap hack, but the ttm_vm_insert functions do
>>>>>>>>>> use PFN_DEV and all that. And I think that stops gup_fast from trying
>>>>>>>>>> to find the underlying page.
>>>>>>>>>> -Daniel
>>>>>>>>> Hmm perhaps it might, but I don't think so. The fix I tried out was
>>>>>>>>> to set
>>>>>>>>>
>>>>>>>>> PFN_DEV | PFN_MAP for huge PTEs which causes pfn_devmap() to be
>>>>>>>>> true, and
>>>>>>>>> then
>>>>>>>>>
>>>>>>>>> follow_devmap_pmd()->get_dev_pagemap() which returns NULL and
>>>>>>>>> gup_fast()
>>>>>>>>> backs off,
>>>>>>>>>
>>>>>>>>> in the end that would mean setting in stone that "if there is a huge
>>>>>>>>> devmap
>>>>>>>>> page table entry for which we haven't registered any devmap struct
>>>>>>>>> pages
>>>>>>>>> (get_dev_pagemap returns NULL), we should treat that as a "special"
>>>>>>>>> huge
>>>>>>>>> page table entry".
>>>>>>>>>
>>>>>>>>> From what I can tell, all code calling get_dev_pagemap() already
>>>>>>>>> does that,
>>>>>>>>> it's just a question of getting it accepted and formalizing it.
>>>>>>>> Oh I thought that's already how it works, since I didn't spot anything
>>>>>>>> else that would block gup_fast from falling over. I guess really would
>>>>>>>> need some testcases to make sure direct i/o (that's the easiest to test)
>>>>>>>> fails like we expect.
>>>>>>> Yeah, IIRC the "| PFN_MAP" is the missing piece for TTM huge ptes.
>>>>>>> Otherwise pmd_devmap() will not return true and since there is no
>>>>>>> pmd_special() things break.
>>>>>> Is that maybe the issue we have seen with amdgpu and huge pages?
>>>>> Yeah, essentially when you have a hugepte inserted by ttm, and it
>>>>> happens to point at system memory, then gup will work on that. And
>>>>> create all kinds of havoc.
>>>>>
>>>>>> Apart from that I'm lost guys, that devmap and gup stuff is not
>>>>>> something I have a good knowledge of apart from a one mile high view.
>>>>> I'm not really better, hence would be good to do a testcase and see.
>>>>> This should provoke it:
>>>>> - allocate nicely aligned bo in system memory
>>>>> - mmap, again nicely aligned to 2M
>>>>> - do some direct io from a filesystem into that mmap, that should trigger gup
>>>>> - before the gup completes free the mmap and bo so that ttm recycles
>>>>> the pages, which should trip up on the elevated refcount. If you wait
>>>>> until the direct io is completely, then I think nothing bad can be
>>>>> observed.
>>>>>
>>>>> Ofc if your amdgpu+hugepte issue is something else, then maybe we have
>>>>> another issue.
>>>>>
>>>>> Also usual caveat: I'm not an mm hacker either, so might be completely wrong.
>>>>> -Daniel
>>>> So I did the following quick experiment on vmwgfx, and it turns out that
>>>> with it,
>>>> fast gup never succeeds. Without the "| PFN_MAP", it typically succeeds
>>>>
>>>> I should probably craft an RFC formalizing this.
>>> Yeah I think that would be good. Maybe even more formalized if we also
>>> switch over to VM_PFNMAP, since afaiui these pte flags here only stop the
>>> fast gup path. And slow gup can still peak through VM_MIXEDMAP. Or
>>> something like that.
>>>
>>> Otoh your description of when it only sometimes succeeds would indicate my
>>> understanding of VM_PFNMAP vs VM_MIXEDMAP is wrong here.
>> My understanding from reading the vmf_insert_mixed() code is that iff
>> the arch has pte_special(), VM_MIXEDMAP should be harmless. But that's
>> not consistent with the vm_normal_page() doc. For architectures without
>> pte_special, VM_PFNMAP must be used, and then we must also block COW
>> mappings.
>>
>> If we can get someone can commit to verify that the potential PAT WC
>> performance issue is gone with PFNMAP, I can put together a series with
>> that included.
> Iirc when I checked there's not much archs without pte_special, so I
> guess that's why we luck out. Hopefully.
>
>> As for existing userspace using COW TTM mappings, I once had a couple of
>> test cases to verify that it actually worked, in particular together
>> with huge PMDs and PUDs where breaking COW would imply splitting those,
>> but I can't think of anything else actually wanting to do that other
>> than by mistake.
> Yeah disallowing MAP_PRIVATE mappings would be another good thing to
> lock down. Really doesn't make much sense.
> -Daniel
Yes, we can't allow them with PFNMAP + a non-linear address space...
/Thomas
>> /Thomas
>>
>>
>>> Christian, what's your take?
>>> -Daniel
>>>
>>>> /Thomas
>>>>
>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>> b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>> index 6dc96cf66744..72b6fb17c984 100644
>>>> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>> @@ -195,6 +195,7 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct vm_fault
>>>> *vmf,
>>>> pfn_t pfnt;
>>>> struct ttm_tt *ttm = bo->ttm;
>>>> bool write = vmf->flags & FAULT_FLAG_WRITE;
>>>> + struct dev_pagemap *pagemap;
>>>>
>>>> /* Fault should not cross bo boundary. */
>>>> page_offset &= ~(fault_page_size - 1);
>>>> @@ -210,6 +211,17 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct vm_fault
>>>> *vmf,
>>>> if ((pfn & (fault_page_size - 1)) != 0)
>>>> goto out_fallback;
>>>>
>>>> + /*
>>>> + * Huge entries must be special, that is marking them as devmap
>>>> + * with no backing device map range. If there is a backing
>>>> + * range, Don't insert a huge entry.
>>>> + */
>>>> + pagemap = get_dev_pagemap(pfn, NULL);
>>>> + if (pagemap) {
>>>> + put_dev_pagemap(pagemap);
>>>> + goto out_fallback;
>>>> + }
>>>> +
>>>> /* Check that memory is contiguous. */
>>>> if (!bo->mem.bus.is_iomem) {
>>>> for (i = 1; i < fault_page_size; ++i) {
>>>> @@ -223,7 +235,7 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct vm_fault
>>>> *vmf,
>>>> }
>>>> }
>>>>
>>>> - pfnt = __pfn_to_pfn_t(pfn, PFN_DEV);
>>>> + pfnt = __pfn_to_pfn_t(pfn, PFN_DEV | PFN_MAP);
>>>> if (fault_page_size == (HPAGE_PMD_SIZE >> PAGE_SHIFT))
>>>> ret = vmf_insert_pfn_pmd_prot(vmf, pfnt, pgprot, write);
>>>> #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>>> @@ -236,6 +248,21 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct vm_fault
>>>> *vmf,
>>>> if (ret != VM_FAULT_NOPAGE)
>>>> goto out_fallback;
>>>>
>>>> +#if 1
>>>> + {
>>>> + int npages;
>>>> + struct page *page;
>>>> +
>>>> + npages = get_user_pages_fast_only(vmf->address, 1, 0,
>>>> &page);
>>>> + if (npages == 1) {
>>>> + DRM_WARN("Fast gup succeeded. Bad.\n");
>>>> + put_page(page);
>>>> + } else {
>>>> + DRM_INFO("Fast gup failed. Good.\n");
>>>> + }
>>>> + }
>>>> +#endif
>>>> +
>>>> return VM_FAULT_NOPAGE;
>>>> out_fallback:
>>>> count_vm_event(THP_FAULT_FALLBACK);
>>>>
>>>>
>>>>
>>>>
>>>>
>
>
next prev parent reply other threads:[~2021-03-11 15:38 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-23 10:59 [PATCH 1/2] dma-buf: Require VM_PFNMAP vma for mmap Daniel Vetter
2021-02-24 7:46 ` [Linaro-mm-sig] " Thomas Hellström (Intel)
2021-02-24 8:45 ` Daniel Vetter
2021-02-24 9:15 ` Thomas Hellström (Intel)
2021-02-24 9:31 ` Daniel Vetter
2021-02-25 10:28 ` Christian König
2021-02-25 10:44 ` Daniel Vetter
2021-02-25 15:49 ` Daniel Vetter
2021-02-25 16:53 ` Christian König
2021-02-26 9:41 ` Thomas Hellström (Intel)
2021-02-26 13:28 ` Daniel Vetter
2021-02-27 8:06 ` Thomas Hellström (Intel)
2021-03-01 8:28 ` Daniel Vetter
2021-03-01 8:39 ` Thomas Hellström (Intel)
2021-03-01 9:05 ` Daniel Vetter
2021-03-01 9:21 ` Thomas Hellström (Intel)
2021-03-01 10:17 ` Christian König
2021-03-01 14:09 ` Daniel Vetter
2021-03-11 10:22 ` Thomas Hellström (Intel)
2021-03-11 13:00 ` Daniel Vetter
2021-03-11 13:12 ` Thomas Hellström (Intel)
2021-03-11 13:17 ` Daniel Vetter
2021-03-11 15:37 ` Thomas Hellström (Intel) [this message]
2021-03-12 7:51 ` Christian König
2021-02-24 18:46 ` Jason Gunthorpe
2021-02-25 10:30 ` Christian König
2021-02-25 10:45 ` Daniel Vetter
2021-02-26 3:57 ` John Stultz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b5385bb5-9d1b-3ce6-ee1f-a669d0780253@shipmail.org \
--to=thomas_os@shipmail.org \
--cc=christian.koenig@amd.com \
--cc=ckoenig.leichtzumerken@gmail.com \
--cc=daniel.vetter@intel.com \
--cc=daniel@ffwll.ch \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-gfx@lists.freedesktop.org \
--cc=jgg@ziepe.ca \
--cc=john.stultz@linaro.org \
--cc=linaro-mm-sig@lists.linaro.org \
--cc=linux-media@vger.kernel.org \
--cc=surenb@google.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox