From: Al Viro <viro@zeniv.linux.org.uk>
To: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>,
John Hubbard <jhubbard@nvidia.com>,
Andrew Morton <akpm@linux-foundation.org>,
Jens Axboe <axboe@kernel.dk>, Miklos Szeredi <miklos@szeredi.hu>,
"Darrick J . Wong" <djwong@kernel.org>,
Trond Myklebust <trond.myklebust@hammerspace.com>,
Anna Schumaker <anna@kernel.org>,
David Hildenbrand <david@redhat.com>,
Logan Gunthorpe <logang@deltatee.com>,
linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-xfs@vger.kernel.org, linux-nfs@vger.kernel.org,
linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 4/7] iov_iter: new iov_iter_pin_pages*() routines
Date: Fri, 16 Sep 2022 02:55:53 +0100 [thread overview]
Message-ID: <YyPXqfyf37CUbOf0@ZenIV> (raw)
In-Reply-To: <20220915081625.6a72nza6yq4l5etp@quack3>
On Thu, Sep 15, 2022 at 10:16:25AM +0200, Jan Kara wrote:
> > How would that work? What protects the area where you want to avoid running
> > into pinned pages from previously acceptable page getting pinned? If "they
> > must have been successfully unmapped" is a part of what you are planning, we
> > really do have a problem...
>
> But this is a very good question. So far the idea was that we lock the
> page, unmap (or writeprotect) the page, and then check pincount == 0 and
> that is a reliable method for making sure page data is stable (until we
> unlock the page & release other locks blocking page faults and writes). But
> once suddently ordinary page references can be used to create pins this
> does not work anymore. Hrm.
>
> Just brainstorming ideas now: So we'd either need to obtain the pins early
> when we still have the virtual address (but I guess that is often not
> practical but should work e.g. for normal direct IO path) or we need some
> way to "simulate" the page fault when pinning the page, just don't map it
> into page tables in the end. This simulated page fault could be perhaps
> avoided if rmap walk shows that the page is already mapped somewhere with
> suitable permissions.
OK... I'd done some digging; results so far
* READ vs. WRITE turned out to be an awful way to specify iov_iter
data direction. Local iov_iter branch so far:
get rid of unlikely() on page_copy_sane() calls
csum_and_copy_to_iter(): handle ITER_DISCARD
[s390] copy_oldmem_kernel() - WRITE is "data source", not destination
[fsi] WRITE is "data source", not destination...
[infiniband] READ is "data destination", not source...
[s390] zcore: WRITE is "data source", not destination...
[target] fix iov_iter_bvec() "direction" argument
[vhost] fix 'direction' argument of iov_iter_{init,bvec}()
[xen] fix "direction" argument of iov_iter_kvec()
[trace] READ means "data destination", not source...
iov_iter: saner checks for attempt to copy to/from iterator
use less confusing names for iov_iter direction initializers
those 8 commits in the middle consist of fixes, some of them with more than
one call site affected. Folks keep going "oh, we are going to copy data
into that iterator, must be WRITE". Wrong - WRITE means "as for write(2)",
i.e. the data _source_, not data destination. And the same kind of bugs
goes in the opposite direction, of course.
I think something like ITER_DEST vs. ITER_SOURCE would be less
confusing.
* anything that goes with ITER_SOURCE doesn't need pin.
* ITER_IOVEC/ITER_UBUF need pin for get_pages and for nothing else.
Need to grab reference on get_pages, obviously.
* even more obviously, ITER_DISCARD is irrelevant here.
* ITER_PIPE only modifies anonymous pages that had been allocated
by iov_iter primitives and hadn't been observed by anything outside until
we are done with said ITER_PIPE.
* quite a few instances are similar to e.g. REQ_OP_READ handling in
/dev/loop - we work with ITER_BVEC there and we do modify the page contents,
but the damn thing would better be given to us locked and stay locked until
all involved modifications (be it real IO/decoding/whatever) is complete.
That ought to be safe, unless I'm missing something.
That doesn't cover everything; still going through the list...
next prev parent reply other threads:[~2022-09-16 1:56 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-31 4:18 [PATCH v2 0/7] convert most filesystems to pin_user_pages_fast() John Hubbard
2022-08-31 4:18 ` [PATCH v2 1/7] mm: change release_pages() to use unsigned long for npages John Hubbard
2022-08-31 4:18 ` [PATCH v2 2/7] mm/gup: introduce pin_user_page() John Hubbard
2022-09-06 6:37 ` Christoph Hellwig
2022-09-06 7:12 ` John Hubbard
2022-08-31 4:18 ` [PATCH v2 3/7] block: add dio_w_*() wrappers for pin, unpin user pages John Hubbard
2022-08-31 4:18 ` [PATCH v2 4/7] iov_iter: new iov_iter_pin_pages*() routines John Hubbard
2022-09-01 0:42 ` Al Viro
2022-09-01 1:48 ` John Hubbard
2022-09-06 6:47 ` Christoph Hellwig
2022-09-06 7:44 ` John Hubbard
2022-09-06 7:48 ` Christoph Hellwig
2022-09-06 7:58 ` John Hubbard
2022-09-07 8:50 ` Christoph Hellwig
2022-09-06 10:21 ` Jan Kara
2022-09-07 8:45 ` Christoph Hellwig
2022-09-14 3:51 ` Al Viro
2022-09-14 14:52 ` Jan Kara
2022-09-14 16:42 ` Al Viro
2022-09-15 8:16 ` Jan Kara
2022-09-16 1:55 ` Al Viro [this message]
2022-09-20 5:02 ` Al Viro
2022-09-22 14:36 ` Christoph Hellwig
2022-09-22 14:43 ` David Hildenbrand
2022-09-22 14:45 ` Christoph Hellwig
2022-09-22 2:22 ` Al Viro
2022-09-22 6:09 ` John Hubbard
2022-09-22 11:29 ` Jan Kara
2022-09-23 3:19 ` Al Viro
2022-09-23 4:05 ` John Hubbard
2022-09-23 8:39 ` Christoph Hellwig
2022-09-23 12:22 ` Jan Kara
2022-09-23 4:34 ` John Hubbard
2022-09-22 14:38 ` Christoph Hellwig
2022-09-23 4:22 ` Al Viro
2022-09-23 8:44 ` Christoph Hellwig
2022-09-23 16:13 ` Al Viro
2022-09-26 15:53 ` Christoph Hellwig
2022-09-26 19:55 ` Al Viro
2022-09-22 14:31 ` Christoph Hellwig
2022-09-22 14:36 ` Al Viro
2022-08-31 4:18 ` [PATCH v2 5/7] block, bio, fs: convert most filesystems to pin_user_pages_fast() John Hubbard
2022-09-06 6:48 ` Christoph Hellwig
2022-09-06 7:15 ` John Hubbard
2022-08-31 4:18 ` [PATCH v2 6/7] NFS: direct-io: convert to FOLL_PIN pages John Hubbard
2022-09-06 6:49 ` Christoph Hellwig
2022-09-06 7:16 ` John Hubbard
2022-08-31 4:18 ` [PATCH v2 7/7] fuse: convert direct IO paths to use FOLL_PIN John Hubbard
2022-08-31 10:37 ` Miklos Szeredi
2022-09-01 1:33 ` John Hubbard
2022-09-06 6:36 ` [PATCH v2 0/7] convert most filesystems to pin_user_pages_fast() Christoph Hellwig
2022-09-06 7:10 ` John Hubbard
2022-09-06 7:22 ` Christoph Hellwig
2022-09-06 7:37 ` John Hubbard
2022-09-06 7:46 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YyPXqfyf37CUbOf0@ZenIV \
--to=viro@zeniv.linux.org.uk \
--cc=akpm@linux-foundation.org \
--cc=anna@kernel.org \
--cc=axboe@kernel.dk \
--cc=david@redhat.com \
--cc=djwong@kernel.org \
--cc=hch@infradead.org \
--cc=jack@suse.cz \
--cc=jhubbard@nvidia.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=logang@deltatee.com \
--cc=miklos@szeredi.hu \
--cc=trond.myklebust@hammerspace.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).