From: Dan Williams <dan.j.williams@intel.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [git pull] vfs.git get_user_pages_fast() conversion
Date: Sat, 18 Nov 2017 13:49:16 -0800	[thread overview]
Message-ID: <CAA9_cmfJjT78M3E7Lt+x_tqT87JMMjHWNhC9kt6ASx-m4wOfXw@mail.gmail.com> (raw)
In-Reply-To: <20171117213215.GQ21978@ZenIV.linux.org.uk>
On Fri, Nov 17, 2017 at 1:32 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Nov 17, 2017 at 12:50:47PM -0800, Linus Torvalds wrote:
>
>> Not because the conversion was wrong, but because the original code is
>> so broken.
>>
>> In particular, that "1" that is unchanged in the arguments is correct
>> in the conversion, but it was completely wrong in the original, even
>> if it happened to work.
>>
>> it _should_ have been a FOLL_WRITE. Yes, it happens to have that
>> value, but it was broken.
>>
>> (I note that a bit of grepping shows we have the same issue in a stale
>> comment in mm/ksm.c).
>>
>> It would have been nice to see things like this mentioned in the commit message.
>>
>> Because I'm pretty sure you actually _realized_ that as you made the
>> conversion, but there's no sign of that in the logs, because the
>> commit message just says
>>
>>     atomisp: use get_user_pages_fast()
>>
>> without mentioning how broken the old case was (even it if happened to work).
>
> Point...  Frankly, my impression from the whole thing is that get_user_pages()
> and get_user_pages_unlocked() calling conventions are overcomplicated, especially
> since most of the callers either should be get_user_pages_fast() or happen
> to be inside implementations of get_user_pages_fast().  Grepping for the
> remaining callers right now yields just this:
>
> arch/cris/arch-v32/drivers/cryptocop.c:2722:    err = get_user_pages((unsigned long int)(oper.indata + prev_ix),
> arch/cris/arch-v32/drivers/cryptocop.c:2736:            err = get_user_pages((unsigned long int)oper.cipher_outdata,
> arch/ia64/kernel/err_inject.c:145:      ret = get_user_pages(virt_addr, 1, FOLL_WRITE, NULL, NULL);
> arch/x86/mm/mpx.c:550:  gup_ret = get_user_pages((unsigned long)addr, nr_pages,
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:646:            r = get_user_pages(userptr, num_pages, flags, p, NULL);
> drivers/gpu/drm/radeon/radeon_ttm.c:571:                r = get_user_pages(userptr, num_pages, write ? FOLL_WRITE : 0,
> drivers/infiniband/core/umem.c:194:             ret = get_user_pages(cur_base,
> drivers/infiniband/hw/mthca/mthca_memfree.c:475:        ret = get_user_pages(uaddr & PAGE_MASK, 1, FOLL_WRITE, pages, NUL
> L);
> drivers/infiniband/hw/qib/qib_user_pages.c:70:          ret = get_user_pages(start_page + got * PAGE_SIZE,
> drivers/infiniband/hw/usnic/usnic_uiom.c:146:           ret = get_user_pages(cur_base,
> drivers/media/pci/ivtv/ivtv-udma.c:127: err = get_user_pages_unlocked(user_dma.uaddr, user_dma.page_count,
> drivers/media/pci/ivtv/ivtv-yuv.c:78:   y_pages = get_user_pages_unlocked(y_dma.uaddr,
> drivers/media/pci/ivtv/ivtv-yuv.c:82:           uv_pages = get_user_pages_unlocked(uv_dma.uaddr,
> drivers/media/v4l2-core/videobuf-dma-sg.c:188:  err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
> drivers/misc/mic/scif/scif_rma.c:1399:          pinned_pages->nr_pages = get_user_pages(
> drivers/misc/sgi-gru/grufault.c:201:    if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
> mm/mempolicy.c:829:     err = get_user_pages(addr & PAGE_MASK, 1, 0, &p, NULL);
> virt/kvm/kvm_main.c:1326:       return get_user_pages(start, 1, flags, page, NULL);
> virt/kvm/kvm_main.c:1333:       rc = get_user_pages(addr, 1, flags, NULL, NULL);
> virt/kvm/kvm_main.c:1395:               npages = get_user_pages_unlocked(addr, 1, page, flags);
>
> and even those are dubious - e.g. cris ones could bloody well become a pair of
> get_user_pages_fast(), without ->mmap_sem being held across both calls.  Itanic
> one is almost certainly buggered - we are not holding ->mmap_sem there.
>
> Hell knows...  I wonder if exposing FOLL_... thing outside of mm/gup.c actually
> makes sense.  With the users so heavily skewed towards just two combinations of
> flags (0 and FOLL_WRITE)...
>
> And for get_user_pages() itself it's even more ridiculous - vmalist (the last
> argument) is non-NULL in only one caller.  Which uses it only to check if all
> of the VMAs happen to be hugetlb ones, apparently.
One note here, we've discovered that filesystem-dax mappings are
broken with respect to DMA and fallocate(PUNCH_HOLE). As a band-aid
we're looking to change rdma, video/media, and any other subsystem
that tries to pin pages indefinitely to fail in the vma_is_dax() case
with a new get_user_pages_longterm() helper [1]. Then the plan is to
follow on with new infrastructure to register memory with a file
lease. We need an interface for the kernel to notify userspace that
the pages it pinned need to be unpinned because the file blocks are
being deallocated (where 'file blocks' and 'memory pages' are one in
the same in the dax case).
[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-November/013295.html
next prev parent reply	other threads:[~2017-11-18 21:49 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-17  3:02 [git pull] vfs.git get_user_pages_fast() conversion Al Viro
2017-11-17 20:50 ` Linus Torvalds
2017-11-17 21:32   ` Al Viro
2017-11-18 21:45     ` Al Viro
2017-11-21 18:35       ` Andrea Arcangeli
2017-11-18 21:49     ` Dan Williams [this message]
2017-11-19 21:23     ` mthca misuse of get_user_pages() (was Re: [git pull] vfs.git get_user_pages_fast() conversion) Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox
  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):
  git send-email \
    --in-reply-to=CAA9_cmfJjT78M3E7Lt+x_tqT87JMMjHWNhC9kt6ASx-m4wOfXw@mail.gmail.com \
    --to=dan.j.williams@intel.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY
  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
  Be sure your reply has a Subject: header at the top and a blank line
  before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).