From: Ming Lei <ming.lei@redhat.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ublk: use copy_{to,from}_iter() for user copy
Date: Wed, 5 Nov 2025 09:48:09 +0800 [thread overview]
Message-ID: <aQqs2TlXU0UYlsuy@fedora> (raw)
In-Reply-To: <CADUfDZqKV2SzbWoe4gr4aSPaBtr+VwmEgEidZKo=LQBU9Quf2Q@mail.gmail.com>
On Mon, Nov 03, 2025 at 08:40:30AM -0800, Caleb Sander Mateos wrote:
> On Fri, Oct 31, 2025 at 4:04 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Fri, Oct 31, 2025 at 09:02:48AM -0700, Caleb Sander Mateos wrote:
> > > On Thu, Oct 30, 2025 at 8:45 PM Ming Lei <ming.lei@redhat.com> wrote:
> > > >
> > > > On Thu, Oct 30, 2025 at 07:05:21PM -0600, Caleb Sander Mateos wrote:
> > > > > ublk_copy_user_pages()/ublk_copy_io_pages() currently uses
> > > > > iov_iter_get_pages2() to extract the pages from the iov_iter and
> > > > > memcpy()s between the bvec_iter and the iov_iter's pages one at a time.
> > > > > Switch to using copy_to_iter()/copy_from_iter() instead. This avoids the
> > > > > user page reference count increments and decrements and needing to split
> > > > > the memcpy() at user page boundaries. It also simplifies the code
> > > > > considerably.
> > > > >
> > > > > Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> > > > > ---
> > > > > drivers/block/ublk_drv.c | 62 +++++++++-------------------------------
> > > > > 1 file changed, 14 insertions(+), 48 deletions(-)
> > > > >
> > > > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > > > index 0c74a41a6753..852350e639d6 100644
> > > > > --- a/drivers/block/ublk_drv.c
> > > > > +++ b/drivers/block/ublk_drv.c
> > > > > @@ -912,58 +912,47 @@ static const struct block_device_operations ub_fops = {
> > > > > .open = ublk_open,
> > > > > .free_disk = ublk_free_disk,
> > > > > .report_zones = ublk_report_zones,
> > > > > };
> > > > >
> > > > > -#define UBLK_MAX_PIN_PAGES 32
> > > > > -
> > > > > struct ublk_io_iter {
> > > > > - struct page *pages[UBLK_MAX_PIN_PAGES];
> > > > > struct bio *bio;
> > > > > struct bvec_iter iter;
> > > > > };
> > > >
> > > > ->pages[] is actually for pinning user io pages in batch, so killing it may cause
> > > > perf drop.
> > >
> > > As far as I can tell, copy_to_iter()/copy_from_iter() avoids the page
> > > pinning entirely. It calls copy_to_user_iter() for each contiguous
> > > user address range:
> > >
> > > size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
> > > {
> > > if (WARN_ON_ONCE(i->data_source))
> > > return 0;
> > > if (user_backed_iter(i))
> > > might_fault();
> > > return iterate_and_advance(i, bytes, (void *)addr,
> > > copy_to_user_iter, memcpy_to_iter);
> > > }
> > >
> > > Which just checks that the address range doesn't include any kernel
> > > addresses and then memcpy()s directly via the userspace virtual
> > > addresses:
> > >
> > > static __always_inline
> > > size_t copy_to_user_iter(void __user *iter_to, size_t progress,
> > > size_t len, void *from, void *priv2)
> > > {
> > > if (should_fail_usercopy())
> > > return len;
> > > if (access_ok(iter_to, len)) {
> > > from += progress;
> > > instrument_copy_to_user(iter_to, from, len);
> > > len = raw_copy_to_user(iter_to, from, len);
> > > }
> > > return len;
> > > }
> > >
> > > static __always_inline __must_check unsigned long
> > > raw_copy_to_user(void __user *dst, const void *src, unsigned long size)
> > > {
> > > return copy_user_generic((__force void *)dst, src, size);
> > > }
> > >
> > > static __always_inline __must_check unsigned long
> > > copy_user_generic(void *to, const void *from, unsigned long len)
> > > {
> > > stac();
> > > /*
> > > * If CPU has FSRM feature, use 'rep movs'.
> > > * Otherwise, use rep_movs_alternative.
> > > */
> > > asm volatile(
> > > "1:\n\t"
> > > ALTERNATIVE("rep movsb",
> > > "call rep_movs_alternative",
> > > ALT_NOT(X86_FEATURE_FSRM))
> > > "2:\n"
> > > _ASM_EXTABLE_UA(1b, 2b)
> > > :"+c" (len), "+D" (to), "+S" (from), ASM_CALL_CONSTRAINT
> > > : : "memory", "rax");
> > > clac();
> > > return len;
> > > }
> > >
> > > Am I missing something?
> >
> > page is allocated & mapped in page fault handler.
>
> Right, physical pages certainly need to be allocated for the virtual
> address range being copied to/from. But that would have happened
> previously in iov_iter_get_pages2(), so this isn't a new cost. And as
> you point out, in the common case that the virtual pages are already
> mapped to physical pages, the copy won't cause any page faults.
>
> >
> > However, in typical cases, pages in io buffer shouldn't be swapped out
> > frequently, so this cleanup may be good, I will run some perf test.
>
> Thanks for testing.
`fio/t/io_uring` shows 40% improvement on `./kublk -t null -q 2` with this
patch in my test VM, so looks very nice improvement.
Also it works well by forcing to pass IOSQE_ASYNC on the ublk uring_cmd,
and this change is correct because the copy is guaranteed to be done in ublk
daemon context.
>
> >
> > Also copy_page_from_iter()/copy_page_to_iter() can be used for avoiding
> > bvec_kmap_local(), and the two helper can handle one whole bvec instead
> > of single page.
>
> Yes, that's a good idea. Thanks, I didn't know about that.
>
> >
> > Then rq_for_each_bvec() can be used directly, and `ublk_io_iter` may be
> > killed.
>
> Hmm, we still need a way to offset into the request (i.e. what
> ublk_advance_io_iter() does currently). Are you thinking of a single
> rq_for_each_bvec() loop that would skip bvecs until the offset is
> reached and then copy until reaching the end of the user iterator?
Yeah, that is basically what ublk_advance_io_iter() does.
Thanks,
Ming
next prev parent reply other threads:[~2025-11-05 1:48 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-31 1:05 [PATCH] ublk: use copy_{to,from}_iter() for user copy Caleb Sander Mateos
2025-10-31 3:45 ` Ming Lei
2025-10-31 16:02 ` Caleb Sander Mateos
2025-10-31 23:03 ` Ming Lei
2025-11-03 16:40 ` Caleb Sander Mateos
2025-11-05 1:48 ` Ming Lei [this message]
2025-11-05 15:26 ` Jens Axboe
2025-11-05 15:37 ` Caleb Sander Mateos
2025-11-05 15:41 ` Jens Axboe
2025-11-05 16:16 ` Caleb Sander Mateos
2025-11-05 17:10 ` Caleb Sander Mateos
2025-11-05 17:16 ` Caleb Sander Mateos
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aQqs2TlXU0UYlsuy@fedora \
--to=ming.lei@redhat.com \
--cc=axboe@kernel.dk \
--cc=csander@purestorage.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.