public inbox for linux-media@vger.kernel.org
From: "Christian König" <christian.koenig@amd.com>
To: "Daniel Vetter" <daniel.vetter@ffwll.ch>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>
Cc: "Thomas Hellström (Intel)" <thomas_os@shipmail.org>,
	"Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"moderated list:DMA BUFFER SHARING FRAMEWORK"
	<linaro-mm-sig@lists.linaro.org>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"John Stultz" <john.stultz@linaro.org>,
	"DRI Development" <dri-devel@lists.freedesktop.org>,
	"Daniel Vetter" <daniel.vetter@intel.com>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"open list:DMA BUFFER SHARING FRAMEWORK"
	<linux-media@vger.kernel.org>
Subject: Re: [Linaro-mm-sig] [PATCH 1/2] dma-buf: Require VM_PFNMAP vma for mmap
Date: Thu, 25 Feb 2021 17:53:53 +0100
Message-ID: <432a0da0-ff0e-9b2b-4aee-13f20522fee3@amd.com>
In-Reply-To: <CAKMK7uFezcV52oTZbHeve2HFFATeCGyK6zTT6nE1KVP69QRr0A@mail.gmail.com>



On 25.02.21 at 16:49, Daniel Vetter wrote:
> On Thu, Feb 25, 2021 at 11:44 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>> On Thu, Feb 25, 2021 at 11:28:31AM +0100, Christian König wrote:
>>> On 24.02.21 at 10:31, Daniel Vetter wrote:
>>>> On Wed, Feb 24, 2021 at 10:16 AM Thomas Hellström (Intel)
>>>> <thomas_os@shipmail.org> wrote:
>>>>> On 2/24/21 9:45 AM, Daniel Vetter wrote:
>>>>>> On Wed, Feb 24, 2021 at 8:46 AM Thomas Hellström (Intel)
>>>>>> <thomas_os@shipmail.org> wrote:
>>>>>>> On 2/23/21 11:59 AM, Daniel Vetter wrote:
>>>>>>>> tldr; DMA buffers aren't normal memory. Expecting that you can use
>>>>>>>> them like that (for example that calling get_user_pages works, or
>>>>>>>> that they're accounted like any other normal memory) cannot be
>>>>>>>> guaranteed.
>>>>>>>>
>>>>>>>> Since some userspace only runs on integrated devices, where all
>>>>>>>> buffers are actually resident in system memory, there's a huge
>>>>>>>> temptation to assume that a struct page is always present and usable
>>>>>>>> like for any other pagecache-backed mmap. This has the potential to
>>>>>>>> result in a uapi nightmare.
>>>>>>>>
>>>>>>>> To close this gap, require that DMA buffer mmaps are VM_PFNMAP, which
>>>>>>>> blocks get_user_pages and all the other struct page based
>>>>>>>> infrastructure for everyone. In spirit this is the uapi counterpart to
>>>>>>>> the kernel-internal CONFIG_DMABUF_DEBUG.
>>>>>>>>
>>>>>>>> Motivated by a recent patch which wanted to switch the system dma-buf
>>>>>>>> heap to vm_insert_page instead of vm_insert_pfn.
>>>>>>>>
>>>>>>>> v2:
>>>>>>>>
>>>>>>>> Jason brought up that we also want to guarantee that all ptes have the
>>>>>>>> pte_special flag set, to catch fast get_user_pages (on architectures
>>>>>>>> that support this). Allowing VM_MIXEDMAP (like VM_SPECIAL does) would
>>>>>>>> still allow vm_insert_page, but limiting to VM_PFNMAP will catch that.
>>>>>>>>
>>>>>>>> From auditing the various functions to insert pfn pte entries
>>>>>>>> (vm_insert_pfn_prot, remap_pfn_range and all its callers like
>>>>>>>> dma_mmap_wc) it looks like VM_PFNMAP is already required anyway, so
>>>>>>>> this should be the correct flag to check for.
>>>>>>>>
>>>>>>> If we require VM_PFNMAP, for ordinary page mappings, we also need to
>>>>>>> disallow COW mappings, since it will not work on architectures that
>>>>>>> don't have CONFIG_ARCH_HAS_PTE_SPECIAL, (see the docs for vm_normal_page()).
>>>>>> Hm I figured everyone just uses MAP_SHARED for buffer objects since
>>>>>> COW really makes absolutely no sense. How would we enforce this?
>>>>> Perhaps returning -EINVAL on is_cow_mapping() at mmap time. Either that
>>>>> or allowing MIXEDMAP.
>>>>>
>>>>>>> Also worth noting is the comment in  ttm_bo_mmap_vma_setup() with
>>>>>>> possible performance implications with x86 + PAT + VM_PFNMAP + normal
>>>>>>> pages. That's a very old comment, though, and might not be valid anymore.
>>>>>> I think that's why ttm has a page cache for these, because it indeed
>>>>>> sucks. The PAT changes on pages are rather expensive.
>>>>> IIRC the page cache was implemented because of the slowness of the
>>>>> caching mode transition itself, more specifically the wbinvd() call +
>>>>> global TLB flush.
>>> Yes, exactly that. The global TLB flush is what really breaks our neck here
>>> from a performance perspective.
>>>
>>>>>> There is still an issue for iomem mappings, because the PAT validation
>>>>>> does a linear walk of the resource tree (lol) for every vm_insert_pfn.
>>>>>> But for i915 at least this is fixed by using the io_mapping
>>>>>> infrastructure, which does the PAT reservation only once when you set
>>>>>> up the mapping area at driver load.
>>>>> Yes, I guess that was the issue that the comment describes, but the
>>>>> issue wasn't there with vm_insert_mixed() + VM_MIXEDMAP.
>>>>>
>>>>>> Also TTM uses VM_PFNMAP right now for everything, so it can't be a
>>>>>> problem that hurts much :-)
>>>>> Hmm, both 5.11 and drm-tip appear to still use MIXEDMAP?
>>>>>
>>>>> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/ttm/ttm_bo_vm.c#L554
>>>> Uh that's bad, because mixed maps pointing at struct page won't stop
>>>> gup. At least afaik.
>>> Huh? I'm pretty sure MIXEDMAP stops gup as well. Otherwise we would have
>>> already seen tons of problems with the page cache.
>> On any architecture which has CONFIG_ARCH_HAS_PTE_SPECIAL vm_insert_mixed
>> boils down to vm_insert_pfn wrt gup. And special pte stops gup fast path.
>>
>> But if you don't have VM_IO or VM_PFNMAP set, then I'm not seeing how
>> you're stopping gup slow path. See check_vma_flags() in mm/gup.c.
>>
>> Also if you don't have CONFIG_ARCH_HAS_PTE_SPECIAL then I don't think
>> vm_insert_mixed even works on iomem pfns. There's the devmap exception,
>> but we're not devmap. Worse ttm abuses some accidental codepath to smuggle
>> in hugepte support by intentionally not being devmap.
>>
>> So I'm really not sure this works as we think it should. Maybe good to do
>> a quick test program on amdgpu with a buffer in system memory only and try
>> to do direct io into it. If it works, you have a problem, and a bad one.
> That's probably impossible, since a quick git grep shows that pretty
> much anything reasonable has special ptes: arc, arm, arm64, powerpc,
> riscv, s390, sh, sparc, x86. I don't think you'll have a platform
> where you can plug an amdgpu in and actually exercise the bug :-)
>
> So maybe we should just switch over to VM_PFNMAP for ttm for more clarity?

Maybe yes, but not sure.

I once had a request from some Google guys to do this, but rejected
it because I wasn't sure of the consequences.

Christian.

> -Daniel
>
>
>>> Regards,
>>> Christian.
>>>
>>>> Christian, do we need to patch this up, and maybe fix up ttm fault
>>>> handler to use io_mapping so the vm_insert_pfn stuff is fast?
>>>> -Daniel
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch/
>
>
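
The flag checks discussed in the quoted thread can be modelled in plain
userspace C. The following is only a sketch under assumptions: the flag
values are copied from include/linux/mm.h as I remember them and are there
for illustration, gup_slow_path_check() models just the VM_IO/VM_PFNMAP
test in check_vma_flags() and nothing else from mm/gup.c, and
dmabuf_mmap_policy() is a hypothetical name that combines the VM_PFNMAP
requirement with the suggested -EINVAL on is_cow_mapping(). It is not the
code from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag values, assumed to match include/linux/mm.h. */
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL
#define VM_PFNMAP   0x00000400UL
#define VM_IO       0x00004000UL
#define VM_MIXEDMAP 0x10000000UL

/* Same predicate as the kernel helper: private mapping that may be written. */
int is_cow_mapping(unsigned long flags)
{
	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/* Models only the VM_IO/VM_PFNMAP rejection in check_vma_flags(). */
int gup_slow_path_check(unsigned long flags)
{
	if (flags & (VM_IO | VM_PFNMAP))
		return -EFAULT;
	return 0;
}

/* Hypothetical mmap-time policy: require VM_PFNMAP and refuse COW mappings. */
int dmabuf_mmap_policy(unsigned long flags)
{
	if (!(flags & VM_PFNMAP))
		return -EINVAL;
	if (is_cow_mapping(flags))
		return -EINVAL;
	return 0;
}

/* Walks through the three cases argued about in the thread. */
void check_examples(void)
{
	unsigned long shared_mixed  = VM_MIXEDMAP | VM_SHARED | VM_MAYWRITE;
	unsigned long shared_pfnmap = VM_PFNMAP | VM_SHARED | VM_MAYWRITE;
	unsigned long cow_pfnmap    = VM_PFNMAP | VM_MAYWRITE;

	/* MIXEDMAP alone does not trip the slow-path flag test. */
	assert(gup_slow_path_check(shared_mixed) == 0);
	assert(dmabuf_mmap_policy(shared_mixed) == -EINVAL);

	/* A shared PFNMAP mapping blocks gup and passes the policy. */
	assert(gup_slow_path_check(shared_pfnmap) == -EFAULT);
	assert(dmabuf_mmap_policy(shared_pfnmap) == 0);

	/* A private writable PFNMAP mapping is COW and gets refused. */
	assert(is_cow_mapping(cow_pfnmap));
	assert(dmabuf_mmap_policy(cow_pfnmap) == -EINVAL);
}
```

Calling check_examples() from a small main() runs the three cases. The
first one is the MIXEDMAP concern raised above: without VM_IO or VM_PFNMAP,
this particular flag test does not reject the vma.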

