linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org,
	viro@zeniv.linux.org.uk
Subject: Re: [PATCHSET v6b 0/11] Turn single segment imports into ITER_UBUF
Date: Thu, 30 Mar 2023 11:33:16 -0600	[thread overview]
Message-ID: <de35d11d-bce7-e976-7372-1f2caf417103@kernel.dk> (raw)
In-Reply-To: <CAHk-=wgmGBCO9QnBhheQDOHu+6k+OGHGCjHyHm4J=snowkSupQ@mail.gmail.com>

On 3/30/23 11:11?AM, Linus Torvalds wrote:
> On Thu, Mar 30, 2023 at 9:47?AM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> Sadly, in absolute numbers, comparing read(2) and readv(2),
>> the latter takes 2.11x as long in the stock kernel, and 2.01x as long
>> with the patches. So while single segment is better now than before,
>> it's still waaaay slower than having to copy in a single iovec. Testing
>> was run with all security mitigations off.
> 
> What does the profile say? Iis it all in import_iovec() or what?
> 
> I do note that we have some completely horrid "helper" functions in
> the iter paths: things like "do_iter_readv_writev()" supposedly being
> a common function , but then it ends up doing some small setup and
> just doing a conditional on the "type" after all, so when it isn't
> inlined, you have those things that don't predict well at all.
> 
> And the iter interfaces don't have just that iterator, they have the
> whole kiocb overhead too. All in the name of being generic. Most file
> descriptors don't even support the simpler ".read" interface, because
> they want the whole thing with IOCB_DIRECT flags etc.
> 
> So to some degree it's unfair to compare read-vs-read_iter. The latter
> has all that disgusting support for O_DIRECT and friends, and testing
> with /dev/null just doesn't show that part.

Oh I agree, and particularly for the "read from /dev/zero" case it's
not very interesting, as it does too different things there as well.
It was just more of a "gah it's potentially this bad" outburst than
anything else, the numbers I did care about was readv before and
after patches, not read vs readv.

That said, there might be things to improve here. But that's a task
for another time. perf diff of a read vs readv run below.


# Event 'cycles'
#
# Baseline  Delta Abs  Shared Object         Symbol                               
# ........  .........  ....................  .....................................
#
              +40.56%  [kernel.vmlinux]      [k] iov_iter_zero
              +12.59%  [kernel.vmlinux]      [k] copy_user_enhanced_fast_string
    21.56%    -11.10%  [kernel.vmlinux]      [k] entry_SYSCALL_64
               +7.67%  [kernel.vmlinux]      [k] _copy_from_user
               +7.40%  libc.so.6             [.] __GI___readv
     3.76%     -2.22%  [kernel.vmlinux]      [k] __fsnotify_parent
               +2.13%  [kernel.vmlinux]      [k] do_iter_read
               +2.02%  [kernel.vmlinux]      [k] do_iter_readv_writev
               +1.89%  [kernel.vmlinux]      [k] __import_iovec
               +1.59%  [kernel.vmlinux]      [k] do_readv
     3.15%     -1.43%  [kernel.vmlinux]      [k] __fget_light
               +1.42%  [kernel.vmlinux]      [k] vfs_readv
               +1.32%  [kernel.vmlinux]      [k] read_iter_zero
     2.39%     -1.30%  [kernel.vmlinux]      [k] syscall_exit_to_user_mode
     1.89%     -1.17%  [kernel.vmlinux]      [k] exit_to_user_mode_prepare
     2.01%     -1.10%  [kernel.vmlinux]      [k] do_syscall_64
     2.04%     -1.06%  [kernel.vmlinux]      [k] __fdget_pos
     1.93%     -0.99%  [kernel.vmlinux]      [k] syscall_enter_from_user_mode
               +0.81%  [kernel.vmlinux]      [k] __get_task_ioprio
     1.03%     -0.56%  [kernel.vmlinux]      [k] fpregs_assert_state_consistent
     0.85%     -0.49%  [kernel.vmlinux]      [k] syscall_exit_to_user_mode_prepare
               +0.45%  [kernel.vmlinux]      [k] import_iovec
               +0.20%  [kernel.vmlinux]      [k] kfree
               +0.18%  [kernel.vmlinux]      [k] __x64_sys_readv
               +0.06%  read-zero             [.] readv@plt
               +0.01%  [kernel.vmlinux]      [k] filemap_map_pages
               +0.01%  ld-linux-x86-64.so.2  [.] check_match
     0.00%     +0.00%  [kernel.vmlinux]      [k] memset_erms
     0.00%     -0.00%  [kernel.vmlinux]      [k] perf_iterate_ctx
               +0.00%  [kernel.vmlinux]      [k] xfs_iunlock
     0.49%     -0.00%  read-zero             [.] main
               +0.00%  [kernel.vmlinux]      [k] arch_get_unmapped_area_topdown
               +0.00%  [kernel.vmlinux]      [k] pcpu_alloc
               +0.00%  [kernel.vmlinux]      [k] perf_event_exec
               +0.00%  taskset               [.] 0x0000000000002ebd
     0.00%     +0.00%  [kernel.vmlinux]      [k] perf_ibs_handle_irq
     0.00%     -0.00%  [kernel.vmlinux]      [k] perf_ibs_start
    32.88%             [kernel.vmlinux]      [k] read_zero
    15.22%             libc.so.6             [.] read
     6.27%             [kernel.vmlinux]      [k] vfs_read
     2.60%             [kernel.vmlinux]      [k] ksys_read
     1.02%             [kernel.vmlinux]      [k] __cond_resched
     0.41%             [kernel.vmlinux]      [k] rcu_all_qs
     0.35%             [kernel.vmlinux]      [k] __x64_sys_read
     0.12%             read-zero             [.] read@plt
     0.01%             ld-linux-x86-64.so.2  [.] _dl_load_cache_lookup
     0.01%             ld-linux-x86-64.so.2  [.] _dl_check_map_versions
     0.00%             [kernel.vmlinux]      [k] _find_next_or_bit
     0.00%             [kernel.vmlinux]      [k] perf_event_update_userpage
     0.00%             [kernel.vmlinux]      [k] native_sched_clock

-- 
Jens Axboe


  reply	other threads:[~2023-03-30 17:33 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-30 16:46 [PATCHSET v6b 0/11] Turn single segment imports into ITER_UBUF Jens Axboe
2023-03-30 16:46 ` [PATCH 01/11] block: ensure bio_alloc_map_data() deals with ITER_UBUF correctly Jens Axboe
2023-03-30 16:46 ` [PATCH 02/11] iov_iter: add iter_iovec() helper Jens Axboe
2023-03-30 16:46 ` [PATCH 03/11] IB/hfi1: check for user backed iterator, not specific iterator type Jens Axboe
2023-03-30 16:46 ` [PATCH 04/11] IB/qib: " Jens Axboe
2023-03-30 16:46 ` [PATCH 05/11] ALSA: pcm: " Jens Axboe
2023-03-30 16:46 ` [PATCH 06/11] iov_iter: add iter_iov_addr() and iter_iov_len() helpers Jens Axboe
2023-03-30 16:46 ` [PATCH 07/11] iov_iter: remove iov_iter_iovec() Jens Axboe
2023-03-30 16:46 ` [PATCH 08/11] iov_iter: set nr_segs = 1 for ITER_UBUF Jens Axboe
2023-03-30 16:47 ` [PATCH 09/11] iov_iter: overlay struct iovec and ubuf/len Jens Axboe
2023-03-30 16:47 ` [PATCH 10/11] iov_iter: convert import_single_range() to ITER_UBUF Jens Axboe
2023-03-30 16:47 ` [PATCH 11/11] iov_iter: import single vector iovecs as ITER_UBUF Jens Axboe
2023-03-30 17:11 ` [PATCHSET v6b 0/11] Turn single segment imports into ITER_UBUF Linus Torvalds
2023-03-30 17:33   ` Jens Axboe [this message]
2023-03-30 21:53     ` Linus Torvalds
2023-03-30 22:18       ` Jens Axboe
2023-04-02 22:22         ` Jens Axboe
2023-04-02 23:37           ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=de35d11d-bce7-e976-7372-1f2caf417103@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).