public inbox for linux-media@vger.kernel.org
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel@ffwll.ch>
To: "Thomas Hellström (Intel)" <thomas_os@shipmail.org>
Cc: "Daniel Vetter" <daniel@ffwll.ch>,
	"Christian König" <christian.koenig@amd.com>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"moderated list:DMA BUFFER SHARING FRAMEWORK"
	<linaro-mm-sig@lists.linaro.org>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"John Stultz" <john.stultz@linaro.org>,
	"DRI Development" <dri-devel@lists.freedesktop.org>,
	"Daniel Vetter" <daniel.vetter@intel.com>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"open list:DMA BUFFER SHARING FRAMEWORK"
	<linux-media@vger.kernel.org>
Subject: Re: [Linaro-mm-sig] [PATCH 1/2] dma-buf: Require VM_PFNMAP vma for mmap
Date: Thu, 11 Mar 2021 14:00:21 +0100	[thread overview]
Message-ID: <YEoUZe8BtvQdv3TG@phenom.ffwll.local> (raw)
In-Reply-To: <9d608c61-c64c-dcde-c719-59a970144404@shipmail.org>

On Thu, Mar 11, 2021 at 11:22:06AM +0100, Thomas Hellström (Intel) wrote:
> 
> On 3/1/21 3:09 PM, Daniel Vetter wrote:
> > On Mon, Mar 1, 2021 at 11:17 AM Christian König
> > <christian.koenig@amd.com> wrote:
> > > 
> > > 
> > > Am 01.03.21 um 10:21 schrieb Thomas Hellström (Intel):
> > > > On 3/1/21 10:05 AM, Daniel Vetter wrote:
> > > > > On Mon, Mar 01, 2021 at 09:39:53AM +0100, Thomas Hellström (Intel)
> > > > > wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > On 3/1/21 9:28 AM, Daniel Vetter wrote:
> > > > > > > On Sat, Feb 27, 2021 at 9:06 AM Thomas Hellström (Intel)
> > > > > > > <thomas_os@shipmail.org> wrote:
> > > > > > > > On 2/26/21 2:28 PM, Daniel Vetter wrote:
> > > > > > > > > So I think it stops gup. But I haven't verified at all. Would be
> > > > > > > > > good
> > > > > > > > > if Christian can check this with some direct io to a buffer in
> > > > > > > > > system
> > > > > > > > > memory.
> > > > > > > > Hmm,
> > > > > > > > 
> > > > > > > > Docs (again vm_normal_page() say)
> > > > > > > > 
> > > > > > > >      * VM_MIXEDMAP mappings can likewise contain memory with or
> > > > > > > > without "struct
> > > > > > > >      * page" backing, however the difference is that _all_ pages
> > > > > > > > with a struct
> > > > > > > >      * page (that is, those where pfn_valid is true) are refcounted
> > > > > > > > and
> > > > > > > > considered
> > > > > > > >      * normal pages by the VM. The disadvantage is that pages are
> > > > > > > > refcounted
> > > > > > > >      * (which can be slower and simply not an option for some PFNMAP
> > > > > > > > users). The
> > > > > > > >      * advantage is that we don't have to follow the strict
> > > > > > > > linearity rule of
> > > > > > > >      * PFNMAP mappings in order to support COWable mappings.
> > > > > > > > 
> > > > > > > > but it's true __vm_insert_mixed() ends up in the insert_pfn()
> > > > > > > > path, so
> > > > > > > > the above isn't really true, which makes me wonder if and in that
> > > > > > > > case
> > > > > > > > why there could any longer ever be a significant performance
> > > > > > > > difference
> > > > > > > > between MIXEDMAP and PFNMAP.
> > > > > > > Yeah it's definitely confusing. I guess I'll hack up a patch and see
> > > > > > > what sticks.
> > > > > > > 
> > > > > > > > BTW regarding the TTM hugeptes, I don't think we ever landed that
> > > > > > > > devmap
> > > > > > > > hack, so they are (for the non-gup case) relying on
> > > > > > > > vma_is_special_huge(). For the gup case, I think the bug is still
> > > > > > > > there.
> > > > > > > Maybe there's another devmap hack, but the ttm_vm_insert functions do
> > > > > > > use PFN_DEV and all that. And I think that stops gup_fast from trying
> > > > > > > to find the underlying page.
> > > > > > > -Daniel
> > > > > > Hmm perhaps it might, but I don't think so. The fix I tried out was
> > > > > > to set
> > > > > > 
> > > > > > PFN_DEV | PFN_MAP for huge PTEs which causes pfn_devmap() to be
> > > > > > true, and
> > > > > > then
> > > > > > 
> > > > > > follow_devmap_pmd()->get_dev_pagemap() which returns NULL and
> > > > > > gup_fast()
> > > > > > backs off,
> > > > > > 
> > > > > > in the end that would mean setting in stone that "if there is a huge
> > > > > > devmap
> > > > > > page table entry for which we haven't registered any devmap struct
> > > > > > pages
> > > > > > (get_dev_pagemap returns NULL), we should treat that as a "special"
> > > > > > huge
> > > > > > page table entry".
> > > > > > 
> > > > > >   From what I can tell, all code calling get_dev_pagemap() already
> > > > > > does that,
> > > > > > it's just a question of getting it accepted and formalizing it.
> > > > > Oh I thought that's already how it works, since I didn't spot anything
> > > > > else that would block gup_fast from falling over. I guess really would
> > > > > need some testcases to make sure direct i/o (that's the easiest to test)
> > > > > fails like we expect.
> > > > Yeah, IIRC the "| PFN_MAP" is the missing piece for TTM huge ptes.
> > > > Otherwise pmd_devmap() will not return true and since there is no
> > > > pmd_special() things break.
> > > Is that maybe the issue we have seen with amdgpu and huge pages?
> > Yeah, essentially when you have a hugepte inserted by ttm, and it
> > happens to point at system memory, then gup will work on that. And
> > create all kinds of havoc.
> > 
> > > Apart from that I'm lost guys, that devmap and gup stuff is not
> > > something I have a good knowledge of apart from a one mile high view.
> > I'm not really better, hence would be good to do a testcase and see.
> > This should provoke it:
> > - allocate nicely aligned bo in system memory
> > - mmap, again nicely aligned to 2M
> > - do some direct io from a filesystem into that mmap, that should trigger gup
> > - before the gup completes free the mmap and bo so that ttm recycles
> > the pages, which should trip up on the elevated refcount. If you wait
> > until the direct io is completely, then I think nothing bad can be
> > observed.
> > 
> > Ofc if your amdgpu+hugepte issue is something else, then maybe we have
> > another issue.
> > 
> > Also usual caveat: I'm not an mm hacker either, so might be completely wrong.
> > -Daniel
> 
> So I did the following quick experiment on vmwgfx, and it turns out that
> with it,
> fast gup never succeeds. Without the "| PFN_MAP", it typically succeeds
> 
> I should probably craft an RFC formalizing this.

Yeah I think that would be good. Maybe even more formalized if we also
switch over to VM_PFNMAP, since afaiui these pte flags here only stop the
fast gup path. And slow gup can still peak through VM_MIXEDMAP. Or
something like that.

Otoh your description of when it only sometimes succeeds would indicate my
understanding of VM_PFNMAP vs VM_MIXEDMAP is wrong here.

Christian, what's your take?
-Daniel

> 
> /Thomas
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> index 6dc96cf66744..72b6fb17c984 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> @@ -195,6 +195,7 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct vm_fault
> *vmf,
>         pfn_t pfnt;
>         struct ttm_tt *ttm = bo->ttm;
>         bool write = vmf->flags & FAULT_FLAG_WRITE;
> +       struct dev_pagemap *pagemap;
> 
>         /* Fault should not cross bo boundary. */
>         page_offset &= ~(fault_page_size - 1);
> @@ -210,6 +211,17 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct vm_fault
> *vmf,
>         if ((pfn & (fault_page_size - 1)) != 0)
>                 goto out_fallback;
> 
> +       /*
> +        * Huge entries must be special, that is marking them as devmap
> +        * with no backing device map range. If there is a backing
> +        * range, Don't insert a huge entry.
> +        */
> +       pagemap = get_dev_pagemap(pfn, NULL);
> +       if (pagemap) {
> +               put_dev_pagemap(pagemap);
> +               goto out_fallback;
> +       }
> +
>         /* Check that memory is contiguous. */
>         if (!bo->mem.bus.is_iomem) {
>                 for (i = 1; i < fault_page_size; ++i) {
> @@ -223,7 +235,7 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct vm_fault
> *vmf,
>                 }
>         }
> 
> -       pfnt = __pfn_to_pfn_t(pfn, PFN_DEV);
> +       pfnt = __pfn_to_pfn_t(pfn, PFN_DEV | PFN_MAP);
>         if (fault_page_size == (HPAGE_PMD_SIZE >> PAGE_SHIFT))
>                 ret = vmf_insert_pfn_pmd_prot(vmf, pfnt, pgprot, write);
>  #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> @@ -236,6 +248,21 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct vm_fault
> *vmf,
>         if (ret != VM_FAULT_NOPAGE)
>                 goto out_fallback;
> 
> +#if 1
> +       {
> +               int npages;
> +               struct page *page;
> +
> +               npages = get_user_pages_fast_only(vmf->address, 1, 0,
> &page);
> +               if (npages == 1) {
> +                       DRM_WARN("Fast gup succeeded. Bad.\n");
> +                       put_page(page);
> +               } else {
> +                       DRM_INFO("Fast gup failed. Good.\n");
> +               }
> +       }
> +#endif
> +
>         return VM_FAULT_NOPAGE;
>  out_fallback:
>         count_vm_event(THP_FAULT_FALLBACK);
> 
> 
> 
> 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

  reply	other threads:[~2021-03-11 13:00 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-23 10:59 [PATCH 1/2] dma-buf: Require VM_PFNMAP vma for mmap Daniel Vetter
2021-02-24  7:46 ` [Linaro-mm-sig] " Thomas Hellström (Intel)
2021-02-24  8:45   ` Daniel Vetter
2021-02-24  9:15     ` Thomas Hellström (Intel)
2021-02-24  9:31       ` Daniel Vetter
2021-02-25 10:28         ` Christian König
2021-02-25 10:44           ` Daniel Vetter
2021-02-25 15:49             ` Daniel Vetter
2021-02-25 16:53               ` Christian König
2021-02-26  9:41               ` Thomas Hellström (Intel)
2021-02-26 13:28                 ` Daniel Vetter
2021-02-27  8:06                   ` Thomas Hellström (Intel)
2021-03-01  8:28                     ` Daniel Vetter
2021-03-01  8:39                       ` Thomas Hellström (Intel)
2021-03-01  9:05                         ` Daniel Vetter
2021-03-01  9:21                           ` Thomas Hellström (Intel)
2021-03-01 10:17                             ` Christian König
2021-03-01 14:09                               ` Daniel Vetter
2021-03-11 10:22                                 ` Thomas Hellström (Intel)
2021-03-11 13:00                                   ` Daniel Vetter [this message]
2021-03-11 13:12                                     ` Thomas Hellström (Intel)
2021-03-11 13:17                                       ` Daniel Vetter
2021-03-11 15:37                                         ` Thomas Hellström (Intel)
2021-03-12  7:51                                         ` Christian König
2021-02-24 18:46     ` Jason Gunthorpe
2021-02-25 10:30       ` Christian König
2021-02-25 10:45         ` Daniel Vetter
2021-02-26  3:57 ` John Stultz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YEoUZe8BtvQdv3TG@phenom.ffwll.local \
    --to=daniel@ffwll.ch \
    --cc=christian.koenig@amd.com \
    --cc=ckoenig.leichtzumerken@gmail.com \
    --cc=daniel.vetter@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=jgg@ziepe.ca \
    --cc=john.stultz@linaro.org \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-media@vger.kernel.org \
    --cc=surenb@google.com \
    --cc=thomas_os@shipmail.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox