public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	David Sterba <dsterba@suse.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	Anton Altaparmakov <anton@tuxera.com>,
	David Howells <dhowells@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Pavel Begunkov <asml.silence@gmail.com>
Subject: Re: [RFC][PATCHSET] iov_iter work
Date: Sun, 6 Jun 2021 23:29:59 +0000	[thread overview]
Message-ID: <YL1ad7+I30xnCto8@zeniv-ca.linux.org.uk> (raw)
In-Reply-To: <CAHk-=wj6ZiTgqbeCPtzP+5tgHjur6Amag66YWub_2DkGpP9h-Q@mail.gmail.com>

On Sun, Jun 06, 2021 at 03:05:49PM -0700, Linus Torvalds wrote:

> The end result certainly looks cleaner than before, although I can't
> say that it's pretty. It's still haitry and grotty, but at least
> _some_ of the hairiness has been removed (yay for
> _copy_from_iter_full() being *much* simpler now, and
> iterate_all_kinds() being gone).
> 
> I also have to admit to despising what iov_iter_init() ends up looking
> like. Not because I think that using those initializer assignments to
> set it is wrong, but simply because the initializers are basically
> identical except for "iter_type".

TBH, I wonder if we still need that these days.  That is to say,
do we even call iov_iter_init() when uaccess_kernel() is true?
We certainly used to do that, but...  We have

* copy_from_kernel_nofault(); calls __copy_from_user_inatomic() under
KERNEL_DS and pagefault_disable(), and that would better *not* fuck with
iov_iter inside.

* copy_to_kernel_nofault(); similar, with s/_from_/_to_/.

* strncpy_from_kernel_nofault(); ditto, with __get_user() instead of
__copy_..._user_inatomic().

* unaligned trap handling in kernel mode on sh and itanic

* arm dumping insns in oops handler

* 3 turds in arm oabi compat: sys_semtimedop_time32(), sys_epoll_wait()
and sys_fcntl64() (for [GS]ETLK... fcntls) called under KERNEL_DS.
The last one should simply call fcntl_[gs]etlk() and be done with
that, not that it was going to play with any iov_iter anyway.
epoll_wait() is currently not using iov_iter (it does use __put_user()
and not in a pretty way, but that's a separate story).  And
for semtimedop_time32(), we do not have any iov_iter in sight *and*
I would rather get rid of KERNEL_DS there completely - all it takes
is a variant of do_semtimedop() with tsops already copied into the
kernel.  In any case, none of those do iov_iter_init().

	I have *not* checked if any kernel threads might be playing
with iov_iter_init() outside of kthread_use_mm(); some might, but
IMO any such place would be a good candidate for conversion to
explicit iov_iter_kvec()...

	BTW, speaking of initializers...  Pavel, could you check if
the following breaks anything?  Unless I'm misreading __io_import_fixed(),
looks like that's what it's trying to do...

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f46acbbeed57..9bd2da9a4c3d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2773,57 +2773,14 @@ static int __io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter
 {
 	size_t len = req->rw.len;
 	u64 buf_end, buf_addr = req->rw.addr;
-	size_t offset;
 
 	if (unlikely(check_add_overflow(buf_addr, (u64)len, &buf_end)))
 		return -EFAULT;
 	/* not inside the mapped region */
 	if (unlikely(buf_addr < imu->ubuf || buf_end > imu->ubuf_end))
 		return -EFAULT;
-
-	/*
-	 * May not be a start of buffer, set size appropriately
-	 * and advance us to the beginning.
-	 */
-	offset = buf_addr - imu->ubuf;
-	iov_iter_bvec(iter, rw, imu->bvec, imu->nr_bvecs, offset + len);
-
-	if (offset) {
-		/*
-		 * Don't use iov_iter_advance() here, as it's really slow for
-		 * using the latter parts of a big fixed buffer - it iterates
-		 * over each segment manually. We can cheat a bit here, because
-		 * we know that:
-		 *
-		 * 1) it's a BVEC iter, we set it up
-		 * 2) all bvecs are PAGE_SIZE in size, except potentially the
-		 *    first and last bvec
-		 *
-		 * So just find our index, and adjust the iterator afterwards.
-		 * If the offset is within the first bvec (or the whole first
-		 * bvec, just use iov_iter_advance(). This makes it easier
-		 * since we can just skip the first segment, which may not
-		 * be PAGE_SIZE aligned.
-		 */
-		const struct bio_vec *bvec = imu->bvec;
-
-		if (offset <= bvec->bv_len) {
-			iov_iter_advance(iter, offset);
-		} else {
-			unsigned long seg_skip;
-
-			/* skip first vec */
-			offset -= bvec->bv_len;
-			seg_skip = 1 + (offset >> PAGE_SHIFT);
-
-			iter->bvec = bvec + seg_skip;
-			iter->nr_segs -= seg_skip;
-			iter->count -= bvec->bv_len + offset;
-			iter->iov_offset = offset & ~PAGE_MASK;
-		}
-	}
-
-	return 0;
+	return import_pagevec(rw, buf_addr, len, imu->ubuf,
+			      imu->nr_bvecs, imu->bvec, iter);
 }
 
 static int io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter)
diff --git a/include/linux/uio.h b/include/linux/uio.h
index fd88d9911dad..f6291e981d07 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -299,5 +299,8 @@ ssize_t __import_iovec(int type, const struct iovec __user *uvec,
 		 struct iov_iter *i, bool compat);
 int import_single_range(int type, void __user *buf, size_t len,
 		 struct iovec *iov, struct iov_iter *i);
+int import_pagevec(int rw, unsigned long from, size_t len,
+		unsigned long base, unsigned nr_pages,
+		struct bio_vec *bvec, struct iov_iter *i);
 
 #endif
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 11b39bd1d1ab..4a771fcb529b 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1982,3 +1982,21 @@ int import_single_range(int rw, void __user *buf, size_t len,
 	return 0;
 }
 EXPORT_SYMBOL(import_single_range);
+
+int import_pagevec(int rw, unsigned long from, size_t len,
+		unsigned long base, unsigned nr_pages,
+		struct bio_vec *bvec, struct iov_iter *i)
+
+{
+	unsigned long skip_pages = (from >> PAGE_SHIFT) - (base >> PAGE_SHIFT);
+
+	*i = (struct iov_iter){
+		.iter_type = ITER_BVEC,
+		.data_source = rw,
+		.bvec = bvec + skip_pages,
+		.nr_segs = nr_pages - skip_pages,
+		.iov_offset = skip_pages ? from & ~PAGE_MASK : from - base,
+		.count = len
+	};
+	return 0;
+}
-- 
2.11.0


  parent reply	other threads:[~2021-06-06 23:30 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-06 19:07 [RFC][PATCHSET] iov_iter work Al Viro
2021-06-06 19:10 ` [RFC PATCH 01/37] ntfs_copy_from_user_iter(): don't bother with copying iov_iter Al Viro
2021-06-06 19:10   ` [RFC PATCH 02/37] generic_perform_write()/iomap_write_actor(): saner logics for short copy Al Viro
2021-06-06 19:10   ` [RFC PATCH 03/37] fuse_fill_write_pages(): don't bother with iov_iter_single_seg_count() Al Viro
2021-06-06 19:10   ` [RFC PATCH 04/37] iov_iter: Remove iov_iter_for_each_range() Al Viro
2021-06-06 19:10   ` [RFC PATCH 05/37] teach copy_page_to_iter() to handle compound pages Al Viro
2021-06-06 19:10   ` [RFC PATCH 06/37] copy_page_to_iter(): fix ITER_DISCARD case Al Viro
2021-06-06 19:10   ` [RFC PATCH 07/37] [xarray] iov_iter_fault_in_readable() should do nothing in xarray case Al Viro
2021-06-06 19:10   ` [RFC PATCH 08/37] iov_iter_advance(): use consistent semantics for move past the end Al Viro
2021-06-06 19:10   ` [RFC PATCH 09/37] iov_iter: switch ..._full() variants of primitives to use of iov_iter_revert() Al Viro
2021-06-06 19:10   ` [RFC PATCH 10/37] iov_iter: reorder handling of flavours in primitives Al Viro
2021-06-06 19:10   ` [RFC PATCH 11/37] iov_iter_advance(): don't modify ->iov_offset for ITER_DISCARD Al Viro
2021-06-06 19:10   ` [RFC PATCH 12/37] iov_iter: separate direction from flavour Al Viro
2021-06-06 19:10   ` [RFC PATCH 13/37] iov_iter: optimize iov_iter_advance() for iovec and kvec Al Viro
2021-06-06 19:10   ` [RFC PATCH 14/37] sanitize iov_iter_fault_in_readable() Al Viro
2021-06-06 19:10   ` [RFC PATCH 15/37] iov_iter_alignment(): don't bother with iterate_all_kinds() Al Viro
2021-06-06 19:10   ` [RFC PATCH 16/37] iov_iter_gap_alignment(): get rid of iterate_all_kinds() Al Viro
2021-06-09 13:01     ` Qian Cai
2021-06-09 18:06       ` Al Viro
2021-06-06 19:10   ` [RFC PATCH 17/37] get rid of iterate_all_kinds() in iov_iter_get_pages()/iov_iter_get_pages_alloc() Al Viro
2021-06-06 19:10   ` [RFC PATCH 18/37] iov_iter_npages(): don't bother with iterate_all_kinds() Al Viro
2021-06-06 19:10   ` [RFC PATCH 19/37] [xarray] iov_iter_npages(): just use DIV_ROUND_UP() Al Viro
2021-06-06 19:10   ` [RFC PATCH 20/37] iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variant Al Viro
2021-06-06 19:10   ` [RFC PATCH 21/37] csum_and_copy_to_iter(): massage into form closer to csum_and_copy_from_iter() Al Viro
2021-06-06 19:10   ` [RFC PATCH 22/37] iterate_and_advance(): get rid of magic in case when n is 0 Al Viro
2021-06-06 19:10   ` [RFC PATCH 23/37] iov_iter: massage iterate_iovec and iterate_kvec to logics similar to iterate_bvec Al Viro
2021-06-06 19:10   ` [RFC PATCH 24/37] iov_iter: unify iterate_iovec and iterate_kvec Al Viro
2021-06-06 19:10   ` [RFC PATCH 25/37] iterate_bvec(): expand bvec.h macro forest, massage a bit Al Viro
2021-06-06 19:10   ` [RFC PATCH 26/37] iov_iter: teach iterate_{bvec,xarray}() about possible short copies Al Viro
2021-06-06 19:10   ` [RFC PATCH 27/37] iov_iter: get rid of separate bvec and xarray callbacks Al Viro
2021-06-06 19:10   ` [RFC PATCH 28/37] iov_iter: make the amount already copied available to iterator callbacks Al Viro
2021-06-06 19:10   ` [RFC PATCH 29/37] iov_iter: make iterator callbacks use base and len instead of iovec Al Viro
2021-06-06 19:10   ` [RFC PATCH 30/37] pull handling of ->iov_offset into iterate_{iovec,bvec,xarray} Al Viro
2021-06-06 19:10   ` [RFC PATCH 31/37] iterate_xarray(): only of the first iteration we might get offset != 0 Al Viro
2021-06-06 19:10   ` [RFC PATCH 32/37] copy_page_to_iter(): don't bother with kmap_atomic() for bvec/kvec cases Al Viro
2021-06-06 19:10   ` [RFC PATCH 33/37] copy_page_from_iter(): don't need kmap_atomic() for kvec/bvec cases Al Viro
2021-06-06 19:10   ` [RFC PATCH 34/37] iov_iter: clean csum_and_copy_...() primitives up a bit Al Viro
2021-06-06 19:10   ` [RFC PATCH 35/37] pipe_zero(): we don't need no stinkin' kmap_atomic() Al Viro
2021-06-06 19:10   ` [RFC PATCH 36/37] clean up copy_mc_pipe_to_iter() Al Viro
2021-06-06 19:10   ` [RFC PATCH 37/37] csum_and_copy_to_pipe_iter(): leave handling of csum_state to caller Al Viro
2021-06-06 22:05 ` [RFC][PATCHSET] iov_iter work Linus Torvalds
2021-06-06 22:46   ` Linus Torvalds
2021-06-07  9:28     ` Christoph Hellwig
2021-06-07 14:43       ` Al Viro
2021-06-07 15:59         ` Christoph Hellwig
2021-06-07 21:07           ` Al Viro
2021-06-07 22:01             ` Linus Torvalds
2021-06-07 23:35               ` Linus Torvalds
2021-06-08  5:25                 ` Christoph Hellwig
2021-06-08 11:27                 ` Al Viro
2021-06-06 23:29   ` Al Viro [this message]
2021-06-07 10:38     ` Pavel Begunkov
2021-06-08 14:43 ` David Laight
2021-06-10 14:29 ` Qian Cai
2021-06-10 15:35   ` Al Viro
2021-06-10 15:48     ` Al Viro
2021-06-10 19:08     ` Qian Cai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YL1ad7+I30xnCto8@zeniv-ca.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=anton@tuxera.com \
    --cc=asml.silence@gmail.com \
    --cc=dhowells@redhat.com \
    --cc=dsterba@suse.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=torvalds@linux-foundation.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox