From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
David Sterba <dsterba@suse.com>,
Miklos Szeredi <miklos@szeredi.hu>,
Anton Altaparmakov <anton@tuxera.com>,
David Howells <dhowells@redhat.com>,
Matthew Wilcox <willy@infradead.org>,
Pavel Begunkov <asml.silence@gmail.com>
Subject: Re: [RFC][PATCHSET] iov_iter work
Date: Sun, 6 Jun 2021 23:29:59 +0000 [thread overview]
Message-ID: <YL1ad7+I30xnCto8@zeniv-ca.linux.org.uk> (raw)
In-Reply-To: <CAHk-=wj6ZiTgqbeCPtzP+5tgHjur6Amag66YWub_2DkGpP9h-Q@mail.gmail.com>
On Sun, Jun 06, 2021 at 03:05:49PM -0700, Linus Torvalds wrote:
> The end result certainly looks cleaner than before, although I can't
> say that it's pretty. It's still haitry and grotty, but at least
> _some_ of the hairiness has been removed (yay for
> _copy_from_iter_full() being *much* simpler now, and
> iterate_all_kinds() being gone).
>
> I also have to admit to despising what iov_iter_init() ends up looking
> like. Not because I think that using those initializer assignments to
> set it is wrong, but simply because the initializers are basically
> identical except for "iter_type".
TBH, I wonder if we still need that these days. That is to say,
do we even call iov_iter_init() when uaccess_kernel() is true?
We certainly used to do that, but... We have
* copy_from_kernel_nofault(); calls __copy_from_user_inatomic() under
KERNEL_DS and pagefault_disable(), and that would better *not* fuck with
iov_iter inside.
* copy_to_kernel_nofault(); similar, with s/_from_/_to_/.
* strncpy_from_kernel_nofault(); ditto, with __get_user() instead of
__copy_..._user_inatomic().
* unaligned trap handling in kernel mode on sh and itanic
* arm dumping insns in oops handler
* 3 turds in arm oabi compat: sys_semtimedop_time32(), sys_epoll_wait()
and sys_fcntl64() (for [GS]ETLK... fcntls) called under KERNEL_DS.
The last one should simply call fcntl_[gs]etlk() and be done with
that, not that it was going to play with any iov_iter anyway.
epoll_wait() is currently not using iov_iter (it does use __put_user()
and not in a pretty way, but that's a separate story). And
for semtimedop_time32(), we do not have any iov_iter in sight *and*
I would rather get rid of KERNEL_DS there completely - all it takes
is a variant of do_semtimedop() with tsops already copied into the
kernel. In any case, none of those do iov_iter_init().
I have *not* checked if any kernel threads might be playing
with iov_iter_init() outside of kthread_use_mm(); some might, but
IMO any such place would be a good candidate for conversion to
explicit iov_iter_kvec()...
BTW, speaking of initializers... Pavel, could you check if
the following breaks anything? Unless I'm misreading __io_import_fixed(),
looks like that's what it's trying to do...
diff --git a/fs/io_uring.c b/fs/io_uring.c
index f46acbbeed57..9bd2da9a4c3d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2773,57 +2773,14 @@ static int __io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter
{
size_t len = req->rw.len;
u64 buf_end, buf_addr = req->rw.addr;
- size_t offset;
if (unlikely(check_add_overflow(buf_addr, (u64)len, &buf_end)))
return -EFAULT;
/* not inside the mapped region */
if (unlikely(buf_addr < imu->ubuf || buf_end > imu->ubuf_end))
return -EFAULT;
-
- /*
- * May not be a start of buffer, set size appropriately
- * and advance us to the beginning.
- */
- offset = buf_addr - imu->ubuf;
- iov_iter_bvec(iter, rw, imu->bvec, imu->nr_bvecs, offset + len);
-
- if (offset) {
- /*
- * Don't use iov_iter_advance() here, as it's really slow for
- * using the latter parts of a big fixed buffer - it iterates
- * over each segment manually. We can cheat a bit here, because
- * we know that:
- *
- * 1) it's a BVEC iter, we set it up
- * 2) all bvecs are PAGE_SIZE in size, except potentially the
- * first and last bvec
- *
- * So just find our index, and adjust the iterator afterwards.
- * If the offset is within the first bvec (or the whole first
- * bvec, just use iov_iter_advance(). This makes it easier
- * since we can just skip the first segment, which may not
- * be PAGE_SIZE aligned.
- */
- const struct bio_vec *bvec = imu->bvec;
-
- if (offset <= bvec->bv_len) {
- iov_iter_advance(iter, offset);
- } else {
- unsigned long seg_skip;
-
- /* skip first vec */
- offset -= bvec->bv_len;
- seg_skip = 1 + (offset >> PAGE_SHIFT);
-
- iter->bvec = bvec + seg_skip;
- iter->nr_segs -= seg_skip;
- iter->count -= bvec->bv_len + offset;
- iter->iov_offset = offset & ~PAGE_MASK;
- }
- }
-
- return 0;
+ return import_pagevec(rw, buf_addr, len, imu->ubuf,
+ imu->nr_bvecs, imu->bvec, iter);
}
static int io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter)
diff --git a/include/linux/uio.h b/include/linux/uio.h
index fd88d9911dad..f6291e981d07 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -299,5 +299,8 @@ ssize_t __import_iovec(int type, const struct iovec __user *uvec,
struct iov_iter *i, bool compat);
int import_single_range(int type, void __user *buf, size_t len,
struct iovec *iov, struct iov_iter *i);
+int import_pagevec(int rw, unsigned long from, size_t len,
+ unsigned long base, unsigned nr_pages,
+ struct bio_vec *bvec, struct iov_iter *i);
#endif
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 11b39bd1d1ab..4a771fcb529b 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1982,3 +1982,21 @@ int import_single_range(int rw, void __user *buf, size_t len,
return 0;
}
EXPORT_SYMBOL(import_single_range);
+
+int import_pagevec(int rw, unsigned long from, size_t len,
+ unsigned long base, unsigned nr_pages,
+ struct bio_vec *bvec, struct iov_iter *i)
+
+{
+ unsigned long skip_pages = (from >> PAGE_SHIFT) - (base >> PAGE_SHIFT);
+
+ *i = (struct iov_iter){
+ .iter_type = ITER_BVEC,
+ .data_source = rw,
+ .bvec = bvec + skip_pages,
+ .nr_segs = nr_pages - skip_pages,
+ .iov_offset = skip_pages ? from & ~PAGE_MASK : from - base,
+ .count = len
+ };
+ return 0;
+}
--
2.11.0
next prev parent reply other threads:[~2021-06-06 23:30 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-06 19:07 [RFC][PATCHSET] iov_iter work Al Viro
2021-06-06 19:10 ` [RFC PATCH 01/37] ntfs_copy_from_user_iter(): don't bother with copying iov_iter Al Viro
2021-06-06 19:10 ` [RFC PATCH 02/37] generic_perform_write()/iomap_write_actor(): saner logics for short copy Al Viro
2021-06-06 19:10 ` [RFC PATCH 03/37] fuse_fill_write_pages(): don't bother with iov_iter_single_seg_count() Al Viro
2021-06-06 19:10 ` [RFC PATCH 04/37] iov_iter: Remove iov_iter_for_each_range() Al Viro
2021-06-06 19:10 ` [RFC PATCH 05/37] teach copy_page_to_iter() to handle compound pages Al Viro
2021-06-06 19:10 ` [RFC PATCH 06/37] copy_page_to_iter(): fix ITER_DISCARD case Al Viro
2021-06-06 19:10 ` [RFC PATCH 07/37] [xarray] iov_iter_fault_in_readable() should do nothing in xarray case Al Viro
2021-06-06 19:10 ` [RFC PATCH 08/37] iov_iter_advance(): use consistent semantics for move past the end Al Viro
2021-06-06 19:10 ` [RFC PATCH 09/37] iov_iter: switch ..._full() variants of primitives to use of iov_iter_revert() Al Viro
2021-06-06 19:10 ` [RFC PATCH 10/37] iov_iter: reorder handling of flavours in primitives Al Viro
2021-06-06 19:10 ` [RFC PATCH 11/37] iov_iter_advance(): don't modify ->iov_offset for ITER_DISCARD Al Viro
2021-06-06 19:10 ` [RFC PATCH 12/37] iov_iter: separate direction from flavour Al Viro
2021-06-06 19:10 ` [RFC PATCH 13/37] iov_iter: optimize iov_iter_advance() for iovec and kvec Al Viro
2021-06-06 19:10 ` [RFC PATCH 14/37] sanitize iov_iter_fault_in_readable() Al Viro
2021-06-06 19:10 ` [RFC PATCH 15/37] iov_iter_alignment(): don't bother with iterate_all_kinds() Al Viro
2021-06-06 19:10 ` [RFC PATCH 16/37] iov_iter_gap_alignment(): get rid of iterate_all_kinds() Al Viro
2021-06-09 13:01 ` Qian Cai
2021-06-09 18:06 ` Al Viro
2021-06-06 19:10 ` [RFC PATCH 17/37] get rid of iterate_all_kinds() in iov_iter_get_pages()/iov_iter_get_pages_alloc() Al Viro
2021-06-06 19:10 ` [RFC PATCH 18/37] iov_iter_npages(): don't bother with iterate_all_kinds() Al Viro
2021-06-06 19:10 ` [RFC PATCH 19/37] [xarray] iov_iter_npages(): just use DIV_ROUND_UP() Al Viro
2021-06-06 19:10 ` [RFC PATCH 20/37] iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variant Al Viro
2021-06-06 19:10 ` [RFC PATCH 21/37] csum_and_copy_to_iter(): massage into form closer to csum_and_copy_from_iter() Al Viro
2021-06-06 19:10 ` [RFC PATCH 22/37] iterate_and_advance(): get rid of magic in case when n is 0 Al Viro
2021-06-06 19:10 ` [RFC PATCH 23/37] iov_iter: massage iterate_iovec and iterate_kvec to logics similar to iterate_bvec Al Viro
2021-06-06 19:10 ` [RFC PATCH 24/37] iov_iter: unify iterate_iovec and iterate_kvec Al Viro
2021-06-06 19:10 ` [RFC PATCH 25/37] iterate_bvec(): expand bvec.h macro forest, massage a bit Al Viro
2021-06-06 19:10 ` [RFC PATCH 26/37] iov_iter: teach iterate_{bvec,xarray}() about possible short copies Al Viro
2021-06-06 19:10 ` [RFC PATCH 27/37] iov_iter: get rid of separate bvec and xarray callbacks Al Viro
2021-06-06 19:10 ` [RFC PATCH 28/37] iov_iter: make the amount already copied available to iterator callbacks Al Viro
2021-06-06 19:10 ` [RFC PATCH 29/37] iov_iter: make iterator callbacks use base and len instead of iovec Al Viro
2021-06-06 19:10 ` [RFC PATCH 30/37] pull handling of ->iov_offset into iterate_{iovec,bvec,xarray} Al Viro
2021-06-06 19:10 ` [RFC PATCH 31/37] iterate_xarray(): only of the first iteration we might get offset != 0 Al Viro
2021-06-06 19:10 ` [RFC PATCH 32/37] copy_page_to_iter(): don't bother with kmap_atomic() for bvec/kvec cases Al Viro
2021-06-06 19:10 ` [RFC PATCH 33/37] copy_page_from_iter(): don't need kmap_atomic() for kvec/bvec cases Al Viro
2021-06-06 19:10 ` [RFC PATCH 34/37] iov_iter: clean csum_and_copy_...() primitives up a bit Al Viro
2021-06-06 19:10 ` [RFC PATCH 35/37] pipe_zero(): we don't need no stinkin' kmap_atomic() Al Viro
2021-06-06 19:10 ` [RFC PATCH 36/37] clean up copy_mc_pipe_to_iter() Al Viro
2021-06-06 19:10 ` [RFC PATCH 37/37] csum_and_copy_to_pipe_iter(): leave handling of csum_state to caller Al Viro
2021-06-06 22:05 ` [RFC][PATCHSET] iov_iter work Linus Torvalds
2021-06-06 22:46 ` Linus Torvalds
2021-06-07 9:28 ` Christoph Hellwig
2021-06-07 14:43 ` Al Viro
2021-06-07 15:59 ` Christoph Hellwig
2021-06-07 21:07 ` Al Viro
2021-06-07 22:01 ` Linus Torvalds
2021-06-07 23:35 ` Linus Torvalds
2021-06-08 5:25 ` Christoph Hellwig
2021-06-08 11:27 ` Al Viro
2021-06-06 23:29 ` Al Viro [this message]
2021-06-07 10:38 ` Pavel Begunkov
2021-06-08 14:43 ` David Laight
2021-06-10 14:29 ` Qian Cai
2021-06-10 15:35 ` Al Viro
2021-06-10 15:48 ` Al Viro
2021-06-10 19:08 ` Qian Cai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YL1ad7+I30xnCto8@zeniv-ca.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=anton@tuxera.com \
--cc=asml.silence@gmail.com \
--cc=dhowells@redhat.com \
--cc=dsterba@suse.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=torvalds@linux-foundation.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox