From: Carel Si <beibei.si@intel.com>
To: lkp@lists.01.org
Subject: Re: [fget] 054aa8d439: will-it-scale.per_thread_ops -5.7% regression
Date: Mon, 13 Dec 2021 18:57:29 +0800
Message-ID: <20211213105728.GA21139@linux.intel.com>
In-Reply-To: <CAHk-=wgxd2DqzM3PAsFmzJDHFggxg7ODTQxfJoGCRDbjgMm8nA@mail.gmail.com>

Hi Linus,

On Fri, Dec 10, 2021 at 10:33:43AM -0800, Linus Torvalds wrote:
> On Thu, Dec 9, 2021 at 9:38 PM kernel test robot <oliver.sang@intel.com> wrote:
> >
> > FYI, we noticed a -5.7% regression of will-it-scale.per_thread_ops due to commit:
> > 054aa8d439b9  ("fget: check that the fd still exists after getting a ref to it")
> 
> Well, some downside of the new checks was expected, that's just much
> more than I really like or would have thought.
> 
> But it's exactly where you'd expect:
> 
> >      27.16 ± 10%      +4.3       31.51 ±  2%  perf-profile.calltrace.cycles-pp.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >      22.91 ± 10%      +4.4       27.34 ±  2%  perf-profile.calltrace.cycles-pp.__fget_files.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64
> >      26.33 ± 10%      +4.4       30.70 ±  2%  perf-profile.children.cycles-pp.__fget_light
> >      22.92 ± 10%      +4.4       27.35 ±  2%  perf-profile.children.cycles-pp.__fget_files
> >      22.70 ± 10%      +4.4       27.11 ±  2%  perf-profile.self.cycles-pp.__fget_files
> 
> although there's odd spikes in dTLB-loads etc.
> 
> I checked whether it's some unexpected code generation issue, but the
> new "re-check file table after refcount update" really looks very
> cheap when I look at what gcc generates, there's nothing really
> unexpected there.
> 
> What did change was:
> 
>  (a) some branches go other ways, which might well affect branch
> prediction and just be unlucky. It might be that just marking the
> mismatch case "unlikely()" will help.
> 
>  (b) the obvious few new instructions (re-load and check file table
> pointer, re-load and check file pointer)
> 
>  (c) that __fget_files() function is now no longer a leaf function in
> a simple config case, since it calls "fput_many" in the error case.
> 
> And that (c) is worth mentioning simply because it means that the
> function goes from not having any stack frame at all, to having to
> save/restore four registers. So now it has the usual push/pop
> sequences.
> 
> It may also be that the test-case actually does a lot of threaded
> open/close/poll, and either actually triggers the re-lookup looping
> case (unlikely) or just sees a lot of cacheline bouncing that now got
> worse due to the re-check of the file pointer.
> 
> So this regression looks real, and the issue seems to be that
> __fget_files() really is _that_ important for this do_sys_poll()
> benchmark, and even just the handful of extra instructions end up
> being meaningful.
> 
> Oliver - I'm attaching the obvious "unlikely()" oneliner in case it's
> just "gcc thought the retry loop was the common case" issue and bad
> branch prediction.

We tested your patch, but it did not help: the regression is still -6.0%
(third column, ef8c68873e, in the comparison below). Thanks.
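
For anyone following along who wants to reproduce this kind of load by hand:
the sketch below is a rough userspace approximation of a poll()-heavy loop,
not the actual will-it-scale poll2 source. The helper name poll_loop(), the
MAX_PIPES limit and the use of pipes are assumptions made for illustration.
Each poll() call with a zero timeout makes the kernel look up every fd in the
array, which is why the fd lookup path shows up so prominently in the profile.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_PIPES 64	/* illustrative limit, not taken from the benchmark */

/*
 * Poll the write ends of `npipes` empty pipes `iters` times with a zero
 * timeout. Returns how many poll() calls reported every fd ready, or -1
 * if the pipes could not be created.
 */
static long poll_loop(int npipes, long iters)
{
	int fds[MAX_PIPES][2];
	struct pollfd pfd[MAX_PIPES];
	long ok = 0;
	int i, made = 0;

	assert(npipes > 0 && npipes <= MAX_PIPES);

	for (i = 0; i < npipes; i++) {
		if (pipe(fds[i]) < 0)
			break;
		made++;
		pfd[i].fd = fds[i][1];		/* write end: writable while empty */
		pfd[i].events = POLLOUT;
		pfd[i].revents = 0;
	}

	if (made == npipes) {
		for (long n = 0; n < iters; n++) {
			/* Timeout 0: the call never sleeps, it only checks each fd. */
			if (poll(pfd, (nfds_t)npipes, 0) == npipes)
				ok++;
		}
	} else {
		ok = -1;
	}

	for (i = 0; i < made; i++) {
		close(fds[i][0]);
		close(fds[i][1]);
	}
	return ok;
}
```

Calling poll_loop(8, 1000) from a small main() and timing it, with one copy
per thread, gives a crude stand-in for the per-thread ops figure; absolute
numbers will not match the lkp report.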

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/thread/50%/debian-10.4-x86_64-20200603.cgz/lkp-ivb-2ep1/poll2/will-it-scale/0x42e

commit:
  5f58da2bef ("Merge tag 'drm-fixes-2021-12-03-1' of git://anongit.freedesktop.org/drm/drm")
  054aa8d439 ("fget: check that the fd still exists after getting a ref to it")
  ef8c68873e ("fixup-for-054aa8d439")

5f58da2befa58edf 054aa8d439b9185d4f5eb9a9028 ef8c68873e75cf486bec22c3e8d 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
   6666720            -5.7%    6288280            -6.0%    6268295        will-it-scale.24.threads
    277779            -5.7%     262011            -6.0%     261178        will-it-scale.per_thread_ops
   6666720            -5.7%    6288280            -6.0%    6268295        will-it-scale.workload
     28.74 ± 23%      -1.9       26.84 ±  6%      +1.2       29.92 ± 25%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
     27.82 ± 23%      -1.9       25.93 ±  5%      +1.4       29.20 ± 26%  perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     27.82 ± 23%      -1.9       25.93 ±  5%      +1.4       29.20 ± 26%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary
     27.83 ± 23%      -1.9       25.94 ±  5%      +1.4       29.20 ± 26%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     27.83 ± 23%      -1.9       25.94 ±  5%      +1.4       29.20 ± 26%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     27.83 ± 23%      -1.9       25.94 ±  5%      +1.4       29.20 ± 26%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
     28.81 ± 23%      -1.9       26.93 ±  6%      +1.2       29.98 ± 25%  perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
     19.12 ±  9%      -1.7       17.39 ±  2%      -2.5       16.64 ± 11%  perf-profile.calltrace.cycles-pp.fput_many.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.26 ±100%      -0.2        0.10 ±223%      -0.1        0.17 ±141%  perf-profile.calltrace.cycles-pp.__kmalloc.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.88 ±  9%      -0.1        2.76 ±  2%      -0.2        2.64 ± 11%  perf-profile.calltrace.cycles-pp.testcase
      3.27 ±  9%      -0.1        3.18 ±  2%      -0.2        3.04 ± 11%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
      1.70 ±  9%      -0.1        1.63 ±  2%      -0.1        1.56 ± 11%  perf-profile.calltrace.cycles-pp.__fdget.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.70 ± 10%      -0.1        1.64 ±  2%      -0.1        1.58 ± 10%  perf-profile.calltrace.cycles-pp.fput.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.04 ±  8%      -0.1        0.98 ±  4%      -0.1        0.96 ± 11%  perf-profile.calltrace.cycles-pp.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.64 ±  9%      -0.1        1.59 ±  3%      -0.1        1.50 ± 10%  perf-profile.calltrace.cycles-pp.__entry_text_start.__poll
      1.69 ±  9%      -0.0        1.65 ±  3%      -0.1        1.57 ± 12%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__poll
      1.11 ±  9%      -0.0        1.08 ±  4%      -0.1        1.02 ±  9%  perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64
      1.48 ±  8%      -0.0        1.45 ±  4%      -0.1        1.38 ± 10%  perf-profile.calltrace.cycles-pp._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.94 ± 57%      -0.0        0.92 ± 54%      -0.2        0.74 ± 56%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_kernel.secondary_startup_64_no_verify
      0.94 ± 57%      -0.0        0.92 ± 54%      -0.2        0.74 ± 56%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64_no_verify
      0.94 ± 57%      -0.0        0.92 ± 54%      -0.2        0.74 ± 56%  perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64_no_verify
      0.94 ± 57%      -0.0        0.92 ± 54%      -0.2        0.74 ± 56%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_kernel
      0.94 ± 57%      -0.0        0.92 ± 54%      -0.2        0.74 ± 56%  perf-profile.calltrace.cycles-pp.start_kernel.secondary_startup_64_no_verify
      0.54 ± 45%      +0.1        0.59 ±  9%      +0.1        0.66 ±  6%  perf-profile.calltrace.cycles-pp.kfree.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
     67.66 ±  9%      +2.0       69.69 ±  2%      -0.9       66.78 ± 10%  perf-profile.calltrace.cycles-pp.__poll
     63.66 ±  9%      +2.2       65.82 ±  2%      -0.6       63.10 ± 10%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__poll
     63.48 ±  9%      +2.2       65.64 ±  2%      -0.6       62.93 ± 10%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
     59.85 ±  9%      +2.3       62.12 ±  2%      -0.3       59.56 ± 10%  perf-profile.calltrace.cycles-pp.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
     57.74 ±  9%      +2.4       60.09 ±  2%      -0.1       57.62 ± 10%  perf-profile.calltrace.cycles-pp.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
     27.16 ± 10%      +4.3       31.51 ±  2%      +3.0       30.21 ± 10%  perf-profile.calltrace.cycles-pp.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
     22.91 ± 10%      +4.4       27.34 ±  2%      +3.3       26.25 ± 10%  perf-profile.calltrace.cycles-pp.__fget_files.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64
     27.83 ± 23%      -1.9       25.94 ±  5%      +1.4       29.20 ± 26%  perf-profile.children.cycles-pp.start_secondary
     28.80 ± 23%      -1.9       26.92 ±  6%      +1.2       29.98 ± 25%  perf-profile.children.cycles-pp.cpuidle_enter
     28.80 ± 23%      -1.9       26.92 ±  6%      +1.2       29.98 ± 25%  perf-profile.children.cycles-pp.cpuidle_enter_state
     28.81 ± 23%      -1.9       26.93 ±  6%      +1.2       29.98 ± 25%  perf-profile.children.cycles-pp.secondary_startup_64_no_verify
     28.81 ± 23%      -1.9       26.93 ±  6%      +1.2       29.98 ± 25%  perf-profile.children.cycles-pp.cpu_startup_entry
     28.81 ± 23%      -1.9       26.93 ±  6%      +1.2       29.98 ± 25%  perf-profile.children.cycles-pp.do_idle
     28.78 ± 23%      -1.9       26.91 ±  6%      +1.2       29.96 ± 25%  perf-profile.children.cycles-pp.intel_idle
     18.28 ±  9%      -1.7       16.59 ±  2%      -2.4       15.87 ± 11%  perf-profile.children.cycles-pp.fput_many
      2.90 ±  9%      -0.1        2.76 ±  2%      -0.3        2.64 ± 11%  perf-profile.children.cycles-pp.testcase
      3.29 ±  9%      -0.1        3.20 ±  2%      -0.2        3.07 ± 11%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      1.69 ±  9%      -0.1        1.63 ±  2%      -0.1        1.57 ± 10%  perf-profile.children.cycles-pp.__fdget
      1.70 ±  9%      -0.1        1.64 ±  2%      -0.1        1.57 ± 10%  perf-profile.children.cycles-pp.fput
      1.82 ±  9%      -0.1        1.76 ±  3%      -0.2        1.67 ± 11%  perf-profile.children.cycles-pp.__entry_text_start
      1.07 ±  9%      -0.1        1.02 ±  4%      -0.1        0.99 ± 10%  perf-profile.children.cycles-pp.__check_object_size
      1.90 ±  9%      -0.0        1.86 ±  2%      -0.1        1.77 ± 12%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.50 ±  8%      -0.0        1.47 ±  4%      -0.1        1.39 ± 10%  perf-profile.children.cycles-pp._copy_from_user
      1.12 ±  9%      -0.0        1.09 ±  4%      -0.1        1.03 ± 10%  perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
      0.30 ±  8%      -0.0        0.26 ± 11%      -0.0        0.29 ± 20%  perf-profile.children.cycles-pp.__virt_addr_valid
      0.12 ±  8%      -0.0        0.09 ± 11%      -0.0        0.09 ± 12%  perf-profile.children.cycles-pp.poll@plt
      0.04 ± 71%      -0.0        0.01 ±223%      -0.0        0.03 ±100%  perf-profile.children.cycles-pp.seq_read_iter
      0.17 ± 26%      -0.0        0.14 ± 32%      -0.0        0.14 ± 23%  perf-profile.children.cycles-pp.cmd_sched
      0.17 ± 26%      -0.0        0.14 ± 32%      -0.0        0.14 ± 23%  perf-profile.children.cycles-pp.cmd_record
      0.04 ± 71%      -0.0        0.02 ±144%      -0.0        0.03 ±100%  perf-profile.children.cycles-pp.ksys_read
      0.04 ± 71%      -0.0        0.02 ±144%      -0.0        0.03 ±100%  perf-profile.children.cycles-pp.vfs_read
      0.61 ± 13%      -0.0        0.59 ±  9%      +0.1        0.66 ±  5%  perf-profile.children.cycles-pp.kfree
      0.17 ± 26%      -0.0        0.15 ± 30%      -0.0        0.15 ± 21%  perf-profile.children.cycles-pp.__libc_start_main
      0.17 ± 26%      -0.0        0.15 ± 30%      -0.0        0.15 ± 21%  perf-profile.children.cycles-pp.main
      0.17 ± 26%      -0.0        0.15 ± 30%      -0.0        0.15 ± 21%  perf-profile.children.cycles-pp.run_builtin
      0.02 ±142%      -0.0        0.00            -0.0        0.00        perf-profile.children.cycles-pp.new_sync_write
      0.02 ±142%      -0.0        0.00            -0.0        0.01 ±223%  perf-profile.children.cycles-pp.ksys_write
      0.02 ±142%      -0.0        0.00            -0.0        0.01 ±223%  perf-profile.children.cycles-pp.vfs_write
      0.04 ± 45%      -0.0        0.02 ± 99%      -0.0        0.02 ± 99%  perf-profile.children.cycles-pp.__x86_indirect_thunk_rax
      0.13 ± 10%      -0.0        0.12 ±  6%      -0.0        0.12 ± 15%  perf-profile.children.cycles-pp.kmalloc_slab
      0.02 ± 99%      -0.0        0.01 ±223%      +0.0        0.02 ± 99%  perf-profile.children.cycles-pp.seq_read
      0.02 ± 99%      -0.0        0.01 ±223%      +0.0        0.02 ± 99%  perf-profile.children.cycles-pp.__libc_read
      0.15 ± 26%      -0.0        0.14 ± 31%      -0.0        0.13 ± 23%  perf-profile.children.cycles-pp.record__finish_output
      0.15 ± 26%      -0.0        0.14 ± 31%      -0.0        0.13 ± 23%  perf-profile.children.cycles-pp.perf_session__process_events
      0.06 ± 46%      -0.0        0.04 ± 73%      -0.0        0.04 ± 73%  perf-profile.children.cycles-pp.machines__deliver_event
      0.14 ± 27%      -0.0        0.12 ± 32%      -0.0        0.12 ± 19%  perf-profile.children.cycles-pp.process_simple
      0.09 ± 30%      -0.0        0.08 ± 34%      -0.0        0.08 ± 17%  perf-profile.children.cycles-pp.perf_session__process_user_event
      0.05 ± 47%      -0.0        0.04 ± 71%      -0.0        0.02 ±142%  perf-profile.children.cycles-pp.exc_page_fault
      0.10 ± 24%      -0.0        0.08 ± 30%      -0.0        0.08 ± 17%  perf-profile.children.cycles-pp.perf_session__deliver_event
      0.02 ±142%      -0.0        0.01 ±223%      -0.0        0.01 ±223%  perf-profile.children.cycles-pp.poll_select_set_timeout
      0.09 ± 24%      -0.0        0.08 ± 28%      -0.0        0.08 ± 19%  perf-profile.children.cycles-pp.__ordered_events__flush
      0.02 ±141%      -0.0        0.01 ±223%      -0.0        0.00        perf-profile.children.cycles-pp.proc_reg_read
      0.02 ±141%      -0.0        0.01 ±223%      -0.0        0.01 ±223%  perf-profile.children.cycles-pp.perf_output_sample
      0.01 ±223%      -0.0        0.00            +0.0        0.01 ±223%  perf-profile.children.cycles-pp.do_user_addr_fault
      0.04 ± 45%      -0.0        0.04 ± 71%      -0.0        0.00        perf-profile.children.cycles-pp.perf_callchain_user
      0.29 ±  9%      -0.0        0.28 ±  4%      -0.0        0.28 ± 10%  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      0.11 ± 12%      -0.0        0.10 ±  8%      -0.0        0.10 ± 10%  perf-profile.children.cycles-pp.__might_sleep
      0.05 ± 49%      -0.0        0.05 ± 52%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.check_stack_object
      0.16 ± 20%      -0.0        0.16 ±  9%      -0.0        0.12 ± 11%  perf-profile.children.cycles-pp.perf_callchain_kernel
      0.12 ± 20%      -0.0        0.11 ±  8%      -0.0        0.10 ± 11%  perf-profile.children.cycles-pp.unwind_next_frame
      0.29 ± 11%      -0.0        0.28 ±  5%      -0.0        0.26 ±  9%  perf-profile.children.cycles-pp.__check_heap_object
      0.43 ± 19%      +0.0        0.43 ± 10%      -0.1        0.37 ±  9%  perf-profile.children.cycles-pp.update_process_times
      0.23 ± 20%      +0.0        0.23 ± 11%      -0.0        0.18 ±  9%  perf-profile.children.cycles-pp.perf_prepare_sample
      0.21 ± 20%      +0.0        0.21 ± 10%      -0.0        0.17 ± 10%  perf-profile.children.cycles-pp.perf_callchain
      0.02 ±142%      +0.0        0.02 ±142%      +0.0        0.02 ±142%  perf-profile.children.cycles-pp.io_serial_in
      0.02 ±141%      +0.0        0.02 ±141%      +0.0        0.02 ±141%  perf-profile.children.cycles-pp.ordered_events__queue
      0.07 ± 12%      +0.0        0.07 ±  6%      -0.0        0.07 ± 10%  perf-profile.children.cycles-pp.uart_console_write
      0.07 ± 12%      +0.0        0.08 ±  6%      -0.0        0.07 ± 10%  perf-profile.children.cycles-pp.serial8250_console_write
      0.21 ± 21%      +0.0        0.21 ± 10%      -0.0        0.17 ± 10%  perf-profile.children.cycles-pp.get_perf_callchain
      0.07 ± 12%      +0.0        0.07 ±  6%      -0.0        0.07 ± 10%  perf-profile.children.cycles-pp.wait_for_xmitr
      0.08 ± 12%      +0.0        0.08 ±  6%      -0.0        0.07 ± 12%  perf-profile.children.cycles-pp.vprintk_emit
      0.08 ± 12%      +0.0        0.08 ±  6%      -0.0        0.07 ± 12%  perf-profile.children.cycles-pp.console_unlock
      0.48 ± 17%      +0.0        0.48 ±  9%      -0.1        0.42 ± 10%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.32 ± 21%      +0.0        0.32 ± 11%      -0.1        0.26 ± 11%  perf-profile.children.cycles-pp.update_curr
      0.61 ± 16%      +0.0        0.61 ±  7%      -0.1        0.55 ± 10%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.30 ± 21%      +0.0        0.30 ± 10%      -0.1        0.24 ± 10%  perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
      0.28 ± 20%      +0.0        0.29 ± 10%      -0.1        0.23 ± 11%  perf-profile.children.cycles-pp.__perf_event_overflow
      0.28 ± 19%      +0.0        0.28 ± 11%      -0.1        0.23 ± 10%  perf-profile.children.cycles-pp.perf_event_output_forward
      0.29 ± 10%      +0.0        0.29 ±  9%      -0.0        0.28 ± 10%  perf-profile.children.cycles-pp.__might_fault
      0.08 ± 12%      +0.0        0.08 ± 22%      -0.0        0.07 ± 14%  perf-profile.children.cycles-pp.syscall_enter_from_user_mode
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.worker_thread
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.process_one_work
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.drm_fb_helper_damage_work
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.drm_atomic_helper_dirtyfb
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.drm_atomic_helper_commit
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.commit_tail
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.drm_atomic_helper_commit_tail
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.drm_atomic_helper_commit_planes
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.mgag200_simple_display_pipe_update
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.drm_fb_memcpy_dstclip
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.memcpy_toio
      0.07 ±  8%      +0.0        0.07 ±  6%      -0.0        0.07 ± 10%  perf-profile.children.cycles-pp.serial8250_console_putchar
      0.43 ± 19%      +0.0        0.44 ±  9%      -0.1        0.38 ±  9%  perf-profile.children.cycles-pp.tick_sched_handle
      0.57 ± 16%      +0.0        0.58 ±  7%      -0.1        0.52 ±  9%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.05 ± 47%      +0.0        0.06 ±  8%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.__unwind_start
      0.28 ± 20%      +0.0        0.29 ± 10%      -0.1        0.23 ± 11%  perf-profile.children.cycles-pp.perf_swevent_overflow
      0.07 ±  9%      +0.0        0.08 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.asm_sysvec_irq_work
      0.07 ±  9%      +0.0        0.08 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.sysvec_irq_work
      0.07 ±  9%      +0.0        0.08 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.__sysvec_irq_work
      0.07 ±  9%      +0.0        0.08 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.irq_work_run
      0.07 ±  9%      +0.0        0.08 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.irq_work_single
      0.07 ±  9%      +0.0        0.08 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp._printk
      0.44 ± 19%      +0.0        0.45 ±  9%      -0.1        0.38 ±  9%  perf-profile.children.cycles-pp.tick_sched_timer
      0.54 ± 17%      +0.0        0.55 ±  8%      -0.1        0.49 ± 10%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.54 ± 17%      +0.0        0.54 ±  8%      -0.1        0.48 ± 10%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.37 ± 19%      +0.0        0.37 ± 10%      -0.1        0.31 ±  9%  perf-profile.children.cycles-pp.task_tick_fair
      0.06 ± 47%      +0.0        0.07 ± 18%      -0.0        0.06 ± 21%  perf-profile.children.cycles-pp.asm_exc_page_fault
      0.07 ±  9%      +0.0        0.08 ±  8%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.irq_work_run_list
      0.98 ± 47%      +0.0        0.99 ± 38%      -0.2        0.78 ± 45%  perf-profile.children.cycles-pp.start_kernel
      0.29 ± 21%      +0.0        0.30 ± 10%      -0.1        0.24 ± 10%  perf-profile.children.cycles-pp.perf_tp_event
      0.01 ±223%      +0.0        0.02 ±141%      +0.0        0.01 ±223%  perf-profile.children.cycles-pp.queue_event
      0.00            +0.0        0.01 ±223%      +0.0        0.01 ±223%  perf-profile.children.cycles-pp.build_id__mark_dso_hit
      0.01 ±223%      +0.0        0.02 ±141%      +0.0        0.02 ± 99%  perf-profile.children.cycles-pp.__cond_resched
      0.00            +0.0        0.01 ±223%      +0.0        0.02 ± 99%  perf-profile.children.cycles-pp.poll_freewait
      0.39 ± 19%      +0.0        0.40 ±  9%      -0.1        0.34 ±  9%  perf-profile.children.cycles-pp.scheduler_tick
      0.04 ± 71%      +0.0        0.04 ± 45%      -0.0        0.02 ±141%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      0.06 ±  8%      +0.0        0.07 ± 11%      +0.0        0.06 ± 13%  perf-profile.children.cycles-pp.ret_from_fork
      0.06 ±  8%      +0.0        0.07 ± 11%      +0.0        0.06 ± 13%  perf-profile.children.cycles-pp.kthread
      0.19 ± 10%      +0.0        0.20 ±  9%      +0.0        0.19 ± 10%  perf-profile.children.cycles-pp.__might_resched
      0.02 ±141%      +0.0        0.03 ± 70%      -0.0        0.00        perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      0.01 ±223%      +0.0        0.03 ± 70%      -0.0        0.00        perf-profile.children.cycles-pp.__get_user_nocheck_8
      0.50 ± 12%      +0.0        0.53 ±  9%      -0.0        0.48 ± 11%  perf-profile.children.cycles-pp.__kmalloc
     67.89 ±  9%      +2.0       69.92 ±  2%      -0.9       67.00 ± 10%  perf-profile.children.cycles-pp.__poll
     63.84 ±  9%      +2.1       65.97 ±  2%      -0.6       63.26 ± 10%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     63.66 ±  9%      +2.1       65.80 ±  2%      -0.6       63.09 ± 10%  perf-profile.children.cycles-pp.do_syscall_64
     59.86 ±  9%      +2.3       62.12 ±  2%      -0.3       59.57 ± 10%  perf-profile.children.cycles-pp.__x64_sys_poll
     59.47 ±  9%      +2.3       61.75 ±  2%      -0.2       59.22 ± 10%  perf-profile.children.cycles-pp.do_sys_poll
     26.33 ± 10%      +4.4       30.70 ±  2%      +3.1       29.43 ± 10%  perf-profile.children.cycles-pp.__fget_light
     22.92 ± 10%      +4.4       27.35 ±  2%      +3.3       26.26 ± 10%  perf-profile.children.cycles-pp.__fget_files
     28.78 ± 23%      -1.9       26.91 ±  6%      +1.2       29.96 ± 25%  perf-profile.self.cycles-pp.intel_idle
     17.29 ±  9%      -1.6       15.64 ±  2%      -2.3       14.96 ± 11%  perf-profile.self.cycles-pp.fput_many
     11.06 ±  9%      -0.3       10.74 ±  2%      -0.8       10.29 ± 10%  perf-profile.self.cycles-pp.do_sys_poll
      2.87 ±  9%      -0.1        2.74 ±  2%      -0.2        2.62 ± 11%  perf-profile.self.cycles-pp.testcase
      3.21 ±  9%      -0.1        3.12 ±  2%      -0.2        3.00 ± 11%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      1.60 ±  9%      -0.1        1.54 ±  4%      -0.1        1.47 ± 10%  perf-profile.self.cycles-pp.__entry_text_start
      1.90 ±  8%      -0.0        1.86 ±  3%      -0.1        1.76 ± 12%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      2.55 ±  9%      -0.0        2.51 ±  4%      -0.2        2.36 ± 12%  perf-profile.self.cycles-pp.__fget_light
      0.84 ±  8%      -0.0        0.81 ±  2%      -0.1        0.78 ± 10%  perf-profile.self.cycles-pp.fput
      0.29 ±  8%      -0.0        0.26 ± 11%      -0.0        0.28 ± 20%  perf-profile.self.cycles-pp.__virt_addr_valid
      1.10 ±  9%      -0.0        1.07 ±  4%      -0.1        1.01 ± 10%  perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
      0.28 ±  8%      -0.0        0.26 ±  5%      -0.0        0.26 ± 11%  perf-profile.self.cycles-pp.do_syscall_64
      0.84 ±  9%      -0.0        0.82 ±  3%      -0.0        0.79 ± 10%  perf-profile.self.cycles-pp.__fdget
      0.10 ±  6%      -0.0        0.08 ± 10%      -0.0        0.08 ± 14%  perf-profile.self.cycles-pp.poll@plt
      0.60 ± 12%      -0.0        0.59 ±  9%      +0.1        0.66 ±  6%  perf-profile.self.cycles-pp.kfree
      0.11 ± 12%      -0.0        0.09 ±  7%      -0.0        0.10 ± 13%  perf-profile.self.cycles-pp._copy_from_user
      0.44 ±  9%      -0.0        0.43 ±  3%      -0.0        0.40 ± 15%  perf-profile.self.cycles-pp.__check_object_size
      0.31 ± 10%      -0.0        0.30 ±  3%      -0.0        0.28 ± 15%  perf-profile.self.cycles-pp.__x64_sys_poll
      0.18 ±  8%      -0.0        0.17 ±  5%      -0.0        0.16 ± 11%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.02 ±142%      -0.0        0.01 ±223%      -0.0        0.01 ±223%  perf-profile.self.cycles-pp.poll_select_set_timeout
      0.04 ± 71%      -0.0        0.02 ± 99%      -0.0        0.02 ± 99%  perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
      0.12 ± 10%      -0.0        0.12 ±  6%      -0.0        0.12 ± 17%  perf-profile.self.cycles-pp.kmalloc_slab
      0.01 ±223%      -0.0        0.00            -0.0        0.00        perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
      0.06 ± 11%      -0.0        0.05 ±  8%      -0.0        0.04 ± 71%  perf-profile.self.cycles-pp.syscall_enter_from_user_mode
      0.03 ±100%      -0.0        0.02 ±142%      -0.0        0.02 ±141%  perf-profile.self.cycles-pp.check_stack_object
      0.28 ± 11%      -0.0        0.27 ±  5%      -0.0        0.25 ± 11%  perf-profile.self.cycles-pp.__check_heap_object
      0.09 ± 13%      -0.0        0.09 ±  7%      -0.0        0.09 ± 10%  perf-profile.self.cycles-pp.__might_sleep
      0.08 ± 17%      -0.0        0.08 ± 25%      -0.0        0.08 ± 11%  perf-profile.self.cycles-pp.__might_fault
      0.03 ±100%      -0.0        0.02 ± 99%      -0.0        0.02 ±141%  perf-profile.self.cycles-pp.unwind_next_frame
      0.01 ±223%      +0.0        0.01 ±223%      -0.0        0.00        perf-profile.self.cycles-pp.exit_to_user_mode_prepare
      0.02 ±142%      +0.0        0.02 ±142%      +0.0        0.02 ±142%  perf-profile.self.cycles-pp.io_serial_in
      0.11 ±  9%      +0.0        0.12 ±  8%      -0.0        0.11 ± 11%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      0.44 ±  8%      +0.0        0.45 ±  4%      -0.0        0.42 ± 12%  perf-profile.self.cycles-pp.__poll
      0.01 ±223%      +0.0        0.02 ±141%      +0.0        0.01 ±223%  perf-profile.self.cycles-pp.queue_event
      0.18 ± 11%      +0.0        0.19 ±  8%      -0.0        0.18 ± 10%  perf-profile.self.cycles-pp.__might_resched
      0.26 ± 13%      +0.0        0.30 ± 17%      -0.0        0.25 ± 13%  perf-profile.self.cycles-pp.__kmalloc
     22.70 ± 10%      +4.4       27.11 ±  2%      +3.3       26.04 ± 10%  perf-profile.self.cycles-pp.__fget_files
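
As a reading aid for the comparison above: both change columns appear to be
computed against the base commit 5f58da2bef. For the will-it-scale rows the
change is a relative percentage, while for the perf-profile rows it is a
plain difference in percentage points. The helpers below are a small sketch
of that arithmetic; the function names are made up for illustration and are
not part of the lkp tooling.

```c
#include <assert.h>

/* Relative change of `after` versus `before`, in percent. */
static double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}

/* Absolute change between two cycle shares, in percentage points. */
static double point_change(double before, double after)
{
	return after - before;
}
```

With the numbers from the table, pct_change(6666720, 6288280) is about -5.68
and pct_change(6666720, 6268295) is about -5.98, which the report rounds to
-5.7% and -6.0%. For perf-profile.self.cycles-pp.__fget_files,
point_change(22.70, 27.11) is about 4.41 and point_change(22.70, 26.04) is
about 3.34, matching the +4.4 and +3.3 columns.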

> 
> And it would perhaps be interesting to get an actual instruction-level
> profile of that __fget_files() thing for that benchmark, if that
> pinpoints exactly what is going on and in case that would be easy to
> get on that machine.
> 
> Because it might just be truly horrendously bad luck, with the 32-byte
> stack frame meaning that the kernel stack goes one more page down
> (just handwaving from the dTLB number spike), and this all being just
> random bad luck on that particular benchmark.
> 
> Of course, the thing about poll() is that for that case, we *don't*
> really need the "re-check the file descriptor" code at all, since the
> resulting fd isn't going to be installed as a new fd, and it doesn't
> matter for the socket garbage collector logic.
> 
> So maybe it was a mistake to put that re-check in the generic fdget()
> code - yes, it should be cheap, but it's also some of the most hot
> code in the kernel on some loads.
> 
> But if we move it elsewhere, we'd need to come up with some list of
> "these cases need it". Some are obvious: dup, dup2, unix domain file
> passing. It's the non-obvious ones I'd worry about.
> 
> Anybody?
> 
>               Linus

>  fs/file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/file.c b/fs/file.c
> index ad4a8bf3cf10..f802360e240d 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -858,7 +858,7 @@ static struct file *__fget_files(struct files_struct *files, unsigned int fd,
>  			file = NULL;
>  		else if (!get_file_rcu_many(file, refs))
>  			goto loop;
> -		else if (files_lookup_fd_raw(files, fd) != file) {
> +		else if (unlikely(files_lookup_fd_raw(files, fd) != file)) {
>  			fput_many(file, refs);
>  			goto loop;
>  		}
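
To make the shape of that hunk easier to follow outside the kernel tree, here
is a single-threaded userspace model of the lookup, take-reference, re-check
loop with the unlikely() hint applied. It is only a sketch: struct file_model,
struct table_model, race_hook and close_and_reopen_fd1() are hypothetical
names, plain integer arithmetic stands in for get_file_rcu_many() and
fput_many(), and there is no RCU or atomics. The unlikely() definition is the
usual __builtin_expect() wrapper for GCC and Clang.

```c
#include <assert.h>
#include <stddef.h>

/* Branch hint: tells the compiler the condition is expected to be false. */
#define unlikely(x) __builtin_expect(!!(x), 0)

#define NR_SLOTS 4

struct file_model {
	long count;			/* reference count */
};

struct table_model {
	struct file_model *slot[NR_SLOTS];
	/* Hypothetical test hook: runs once between ref-take and re-check. */
	void (*race_hook)(struct table_model *t);
};

static struct file_model *lookup_raw(struct table_model *t, unsigned int fd)
{
	return fd < NR_SLOTS ? t->slot[fd] : NULL;
}

/* Returns the file with `refs` extra references, or NULL for an empty slot. */
static struct file_model *fget_model(struct table_model *t, unsigned int fd,
				     long refs)
{
	struct file_model *file;

loop:
	file = lookup_raw(t, fd);
	if (!file)
		return NULL;
	/* The real code takes the reference atomically and retries on failure. */
	file->count += refs;

	if (t->race_hook) {
		void (*hook)(struct table_model *) = t->race_hook;

		t->race_hook = NULL;	/* fire only once */
		hook(t);
	}

	/* The re-check: did the slot change after we took the reference? */
	if (unlikely(lookup_raw(t, fd) != file)) {
		file->count -= refs;	/* drop the stale reference and retry */
		goto loop;
	}
	return file;
}

/* Demo data: simulate another thread closing fd 1 and reusing the slot. */
static struct file_model old_file = { 1 }, new_file = { 1 };

static void close_and_reopen_fd1(struct table_model *t)
{
	t->slot[1] = &new_file;
}
```

With race_hook set to close_and_reopen_fd1 and slot 1 pointing at old_file,
fget_model() notices the mismatch, gives back its reference on old_file and
returns new_file on the second pass. The hint changes code layout only, so
the behaviour is identical with or without unlikely().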

> _______________________________________________
> LKP mailing list -- lkp@lists.01.org
> To unsubscribe send an email to lkp-leave@lists.01.org

WARNING: multiple messages have this Message-ID
From: Carel Si <beibei.si@intel.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: kernel test robot <oliver.sang@intel.com>,
	Jann Horn <jannh@google.com>,
	Miklos Szeredi <mszeredi@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	lkp@lists.01.org, kernel test robot <lkp@intel.com>,
	fengwei.yin@intel.com
Subject: Re: [LKP] Re: [fget] 054aa8d439: will-it-scale.per_thread_ops -5.7% regression
Date: Mon, 13 Dec 2021 18:57:29 +0800
Message-ID: <20211213105728.GA21139@linux.intel.com>
In-Reply-To: <CAHk-=wgxd2DqzM3PAsFmzJDHFggxg7ODTQxfJoGCRDbjgMm8nA@mail.gmail.com>

Hi Linus,

On Fri, Dec 10, 2021 at 10:33:43AM -0800, Linus Torvalds wrote:
> On Thu, Dec 9, 2021 at 9:38 PM kernel test robot <oliver.sang@intel.com> wrote:
> >
> > FYI, we noticed a -5.7% regression of will-it-scale.per_thread_ops due to commit:
> > 054aa8d439b9  ("fget: check that the fd still exists after getting a ref to it")
> 
> Well, some downside of the new checks was expected, that's just much
> more than I really like or would have thought.
> 
> But it's exactly where you'd expect:
> 
> >      27.16 ± 10%      +4.3       31.51 ±  2%  perf-profile.calltrace.cycles-pp.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >      22.91 ± 10%      +4.4       27.34 ±  2%  perf-profile.calltrace.cycles-pp.__fget_files.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64
> >      26.33 ± 10%      +4.4       30.70 ±  2%  perf-profile.children.cycles-pp.__fget_light
> >      22.92 ± 10%      +4.4       27.35 ±  2%  perf-profile.children.cycles-pp.__fget_files
> >      22.70 ± 10%      +4.4       27.11 ±  2%  perf-profile.self.cycles-pp.__fget_files
> 
> although there are odd spikes in dTLB-loads etc.
> 
> I checked whether it's some unexpected code generation issue, but the
> new "re-check file table after refcount update" really looks very
> cheap when I look at what gcc generates, there's nothing really
> unexpected there.
> 
> What did change was:
> 
>  (a) some branches go other ways, which might well affect branch
> prediction and just be unlucky. It might be that just marking the
> mismatch case "unlikely()" will help.
> 
>  (b) the obvious few new instructions (re-load and check file table
> pointer, re-load and check file pointer)
> 
>  (c) that __fget_files() function is now no longer a leaf function in
> a simple config case, since it calls "fput_many" in the error case.
> 
> And that (c) is worth mentioning simply because it means that the
> function goes from not having any stack frame at all, to having to
> save/restore four registers. So now it has the usual push/pop
> sequences.
> 
> It may also be that the test-case actually does a lot of threaded
> open/close/poll, and either actually triggers the re-lookup looping
> case (unlikely) or just sees a lot of cacheline bouncing that now got
> worse due to the re-check of the file pointer.
> 
> So this regression looks real, and the issue seems to be that
> __fget_files() really is _that_ important for this do_sys_poll()
> benchmark, and even just the handful of extra instructions end up
> being meaningful.
> 
> Oliver - I'm attaching the obvious "unlikely()" oneliner in case it's
> just a "gcc thought the retry loop was the common case" issue and bad
> branch prediction.

We tested your patch, but it didn't help: the regression is still -6.0%. Thanks.

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/thread/50%/debian-10.4-x86_64-20200603.cgz/lkp-ivb-2ep1/poll2/will-it-scale/0x42e

commit:
  5f58da2bef ("Merge tag 'drm-fixes-2021-12-03-1' of git://anongit.freedesktop.org/drm/drm")
  054aa8d439 ("fget: check that the fd still exists after getting a ref to it")
  ef8c68873e ("fixup-for-054aa8d439")

5f58da2befa58edf 054aa8d439b9185d4f5eb9a9028 ef8c68873e75cf486bec22c3e8d 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
   6666720            -5.7%    6288280            -6.0%    6268295        will-it-scale.24.threads
    277779            -5.7%     262011            -6.0%     261178        will-it-scale.per_thread_ops
   6666720            -5.7%    6288280            -6.0%    6268295        will-it-scale.workload
     28.74 ± 23%      -1.9       26.84 ±  6%      +1.2       29.92 ± 25%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
     27.82 ± 23%      -1.9       25.93 ±  5%      +1.4       29.20 ± 26%  perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     27.82 ± 23%      -1.9       25.93 ±  5%      +1.4       29.20 ± 26%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary
     27.83 ± 23%      -1.9       25.94 ±  5%      +1.4       29.20 ± 26%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     27.83 ± 23%      -1.9       25.94 ±  5%      +1.4       29.20 ± 26%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     27.83 ± 23%      -1.9       25.94 ±  5%      +1.4       29.20 ± 26%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
     28.81 ± 23%      -1.9       26.93 ±  6%      +1.2       29.98 ± 25%  perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
     19.12 ±  9%      -1.7       17.39 ±  2%      -2.5       16.64 ± 11%  perf-profile.calltrace.cycles-pp.fput_many.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.26 ±100%      -0.2        0.10 ±223%      -0.1        0.17 ±141%  perf-profile.calltrace.cycles-pp.__kmalloc.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.88 ±  9%      -0.1        2.76 ±  2%      -0.2        2.64 ± 11%  perf-profile.calltrace.cycles-pp.testcase
      3.27 ±  9%      -0.1        3.18 ±  2%      -0.2        3.04 ± 11%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
      1.70 ±  9%      -0.1        1.63 ±  2%      -0.1        1.56 ± 11%  perf-profile.calltrace.cycles-pp.__fdget.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.70 ± 10%      -0.1        1.64 ±  2%      -0.1        1.58 ± 10%  perf-profile.calltrace.cycles-pp.fput.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.04 ±  8%      -0.1        0.98 ±  4%      -0.1        0.96 ± 11%  perf-profile.calltrace.cycles-pp.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.64 ±  9%      -0.1        1.59 ±  3%      -0.1        1.50 ± 10%  perf-profile.calltrace.cycles-pp.__entry_text_start.__poll
      1.69 ±  9%      -0.0        1.65 ±  3%      -0.1        1.57 ± 12%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__poll
      1.11 ±  9%      -0.0        1.08 ±  4%      -0.1        1.02 ±  9%  perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64
      1.48 ±  8%      -0.0        1.45 ±  4%      -0.1        1.38 ± 10%  perf-profile.calltrace.cycles-pp._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.94 ± 57%      -0.0        0.92 ± 54%      -0.2        0.74 ± 56%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_kernel.secondary_startup_64_no_verify
      0.94 ± 57%      -0.0        0.92 ± 54%      -0.2        0.74 ± 56%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64_no_verify
      0.94 ± 57%      -0.0        0.92 ± 54%      -0.2        0.74 ± 56%  perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64_no_verify
      0.94 ± 57%      -0.0        0.92 ± 54%      -0.2        0.74 ± 56%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_kernel
      0.94 ± 57%      -0.0        0.92 ± 54%      -0.2        0.74 ± 56%  perf-profile.calltrace.cycles-pp.start_kernel.secondary_startup_64_no_verify
      0.54 ± 45%      +0.1        0.59 ±  9%      +0.1        0.66 ±  6%  perf-profile.calltrace.cycles-pp.kfree.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
     67.66 ±  9%      +2.0       69.69 ±  2%      -0.9       66.78 ± 10%  perf-profile.calltrace.cycles-pp.__poll
     63.66 ±  9%      +2.2       65.82 ±  2%      -0.6       63.10 ± 10%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__poll
     63.48 ±  9%      +2.2       65.64 ±  2%      -0.6       62.93 ± 10%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
     59.85 ±  9%      +2.3       62.12 ±  2%      -0.3       59.56 ± 10%  perf-profile.calltrace.cycles-pp.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
     57.74 ±  9%      +2.4       60.09 ±  2%      -0.1       57.62 ± 10%  perf-profile.calltrace.cycles-pp.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
     27.16 ± 10%      +4.3       31.51 ±  2%      +3.0       30.21 ± 10%  perf-profile.calltrace.cycles-pp.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
     22.91 ± 10%      +4.4       27.34 ±  2%      +3.3       26.25 ± 10%  perf-profile.calltrace.cycles-pp.__fget_files.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64
     27.83 ± 23%      -1.9       25.94 ±  5%      +1.4       29.20 ± 26%  perf-profile.children.cycles-pp.start_secondary
     28.80 ± 23%      -1.9       26.92 ±  6%      +1.2       29.98 ± 25%  perf-profile.children.cycles-pp.cpuidle_enter
     28.80 ± 23%      -1.9       26.92 ±  6%      +1.2       29.98 ± 25%  perf-profile.children.cycles-pp.cpuidle_enter_state
     28.81 ± 23%      -1.9       26.93 ±  6%      +1.2       29.98 ± 25%  perf-profile.children.cycles-pp.secondary_startup_64_no_verify
     28.81 ± 23%      -1.9       26.93 ±  6%      +1.2       29.98 ± 25%  perf-profile.children.cycles-pp.cpu_startup_entry
     28.81 ± 23%      -1.9       26.93 ±  6%      +1.2       29.98 ± 25%  perf-profile.children.cycles-pp.do_idle
     28.78 ± 23%      -1.9       26.91 ±  6%      +1.2       29.96 ± 25%  perf-profile.children.cycles-pp.intel_idle
     18.28 ±  9%      -1.7       16.59 ±  2%      -2.4       15.87 ± 11%  perf-profile.children.cycles-pp.fput_many
      2.90 ±  9%      -0.1        2.76 ±  2%      -0.3        2.64 ± 11%  perf-profile.children.cycles-pp.testcase
      3.29 ±  9%      -0.1        3.20 ±  2%      -0.2        3.07 ± 11%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      1.69 ±  9%      -0.1        1.63 ±  2%      -0.1        1.57 ± 10%  perf-profile.children.cycles-pp.__fdget
      1.70 ±  9%      -0.1        1.64 ±  2%      -0.1        1.57 ± 10%  perf-profile.children.cycles-pp.fput
      1.82 ±  9%      -0.1        1.76 ±  3%      -0.2        1.67 ± 11%  perf-profile.children.cycles-pp.__entry_text_start
      1.07 ±  9%      -0.1        1.02 ±  4%      -0.1        0.99 ± 10%  perf-profile.children.cycles-pp.__check_object_size
      1.90 ±  9%      -0.0        1.86 ±  2%      -0.1        1.77 ± 12%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.50 ±  8%      -0.0        1.47 ±  4%      -0.1        1.39 ± 10%  perf-profile.children.cycles-pp._copy_from_user
      1.12 ±  9%      -0.0        1.09 ±  4%      -0.1        1.03 ± 10%  perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
      0.30 ±  8%      -0.0        0.26 ± 11%      -0.0        0.29 ± 20%  perf-profile.children.cycles-pp.__virt_addr_valid
      0.12 ±  8%      -0.0        0.09 ± 11%      -0.0        0.09 ± 12%  perf-profile.children.cycles-pp.poll@plt
      0.04 ± 71%      -0.0        0.01 ±223%      -0.0        0.03 ±100%  perf-profile.children.cycles-pp.seq_read_iter
      0.17 ± 26%      -0.0        0.14 ± 32%      -0.0        0.14 ± 23%  perf-profile.children.cycles-pp.cmd_sched
      0.17 ± 26%      -0.0        0.14 ± 32%      -0.0        0.14 ± 23%  perf-profile.children.cycles-pp.cmd_record
      0.04 ± 71%      -0.0        0.02 ±144%      -0.0        0.03 ±100%  perf-profile.children.cycles-pp.ksys_read
      0.04 ± 71%      -0.0        0.02 ±144%      -0.0        0.03 ±100%  perf-profile.children.cycles-pp.vfs_read
      0.61 ± 13%      -0.0        0.59 ±  9%      +0.1        0.66 ±  5%  perf-profile.children.cycles-pp.kfree
      0.17 ± 26%      -0.0        0.15 ± 30%      -0.0        0.15 ± 21%  perf-profile.children.cycles-pp.__libc_start_main
      0.17 ± 26%      -0.0        0.15 ± 30%      -0.0        0.15 ± 21%  perf-profile.children.cycles-pp.main
      0.17 ± 26%      -0.0        0.15 ± 30%      -0.0        0.15 ± 21%  perf-profile.children.cycles-pp.run_builtin
      0.02 ±142%      -0.0        0.00            -0.0        0.00        perf-profile.children.cycles-pp.new_sync_write
      0.02 ±142%      -0.0        0.00            -0.0        0.01 ±223%  perf-profile.children.cycles-pp.ksys_write
      0.02 ±142%      -0.0        0.00            -0.0        0.01 ±223%  perf-profile.children.cycles-pp.vfs_write
      0.04 ± 45%      -0.0        0.02 ± 99%      -0.0        0.02 ± 99%  perf-profile.children.cycles-pp.__x86_indirect_thunk_rax
      0.13 ± 10%      -0.0        0.12 ±  6%      -0.0        0.12 ± 15%  perf-profile.children.cycles-pp.kmalloc_slab
      0.02 ± 99%      -0.0        0.01 ±223%      +0.0        0.02 ± 99%  perf-profile.children.cycles-pp.seq_read
      0.02 ± 99%      -0.0        0.01 ±223%      +0.0        0.02 ± 99%  perf-profile.children.cycles-pp.__libc_read
      0.15 ± 26%      -0.0        0.14 ± 31%      -0.0        0.13 ± 23%  perf-profile.children.cycles-pp.record__finish_output
      0.15 ± 26%      -0.0        0.14 ± 31%      -0.0        0.13 ± 23%  perf-profile.children.cycles-pp.perf_session__process_events
      0.06 ± 46%      -0.0        0.04 ± 73%      -0.0        0.04 ± 73%  perf-profile.children.cycles-pp.machines__deliver_event
      0.14 ± 27%      -0.0        0.12 ± 32%      -0.0        0.12 ± 19%  perf-profile.children.cycles-pp.process_simple
      0.09 ± 30%      -0.0        0.08 ± 34%      -0.0        0.08 ± 17%  perf-profile.children.cycles-pp.perf_session__process_user_event
      0.05 ± 47%      -0.0        0.04 ± 71%      -0.0        0.02 ±142%  perf-profile.children.cycles-pp.exc_page_fault
      0.10 ± 24%      -0.0        0.08 ± 30%      -0.0        0.08 ± 17%  perf-profile.children.cycles-pp.perf_session__deliver_event
      0.02 ±142%      -0.0        0.01 ±223%      -0.0        0.01 ±223%  perf-profile.children.cycles-pp.poll_select_set_timeout
      0.09 ± 24%      -0.0        0.08 ± 28%      -0.0        0.08 ± 19%  perf-profile.children.cycles-pp.__ordered_events__flush
      0.02 ±141%      -0.0        0.01 ±223%      -0.0        0.00        perf-profile.children.cycles-pp.proc_reg_read
      0.02 ±141%      -0.0        0.01 ±223%      -0.0        0.01 ±223%  perf-profile.children.cycles-pp.perf_output_sample
      0.01 ±223%      -0.0        0.00            +0.0        0.01 ±223%  perf-profile.children.cycles-pp.do_user_addr_fault
      0.04 ± 45%      -0.0        0.04 ± 71%      -0.0        0.00        perf-profile.children.cycles-pp.perf_callchain_user
      0.29 ±  9%      -0.0        0.28 ±  4%      -0.0        0.28 ± 10%  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      0.11 ± 12%      -0.0        0.10 ±  8%      -0.0        0.10 ± 10%  perf-profile.children.cycles-pp.__might_sleep
      0.05 ± 49%      -0.0        0.05 ± 52%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.check_stack_object
      0.16 ± 20%      -0.0        0.16 ±  9%      -0.0        0.12 ± 11%  perf-profile.children.cycles-pp.perf_callchain_kernel
      0.12 ± 20%      -0.0        0.11 ±  8%      -0.0        0.10 ± 11%  perf-profile.children.cycles-pp.unwind_next_frame
      0.29 ± 11%      -0.0        0.28 ±  5%      -0.0        0.26 ±  9%  perf-profile.children.cycles-pp.__check_heap_object
      0.43 ± 19%      +0.0        0.43 ± 10%      -0.1        0.37 ±  9%  perf-profile.children.cycles-pp.update_process_times
      0.23 ± 20%      +0.0        0.23 ± 11%      -0.0        0.18 ±  9%  perf-profile.children.cycles-pp.perf_prepare_sample
      0.21 ± 20%      +0.0        0.21 ± 10%      -0.0        0.17 ± 10%  perf-profile.children.cycles-pp.perf_callchain
      0.02 ±142%      +0.0        0.02 ±142%      +0.0        0.02 ±142%  perf-profile.children.cycles-pp.io_serial_in
      0.02 ±141%      +0.0        0.02 ±141%      +0.0        0.02 ±141%  perf-profile.children.cycles-pp.ordered_events__queue
      0.07 ± 12%      +0.0        0.07 ±  6%      -0.0        0.07 ± 10%  perf-profile.children.cycles-pp.uart_console_write
      0.07 ± 12%      +0.0        0.08 ±  6%      -0.0        0.07 ± 10%  perf-profile.children.cycles-pp.serial8250_console_write
      0.21 ± 21%      +0.0        0.21 ± 10%      -0.0        0.17 ± 10%  perf-profile.children.cycles-pp.get_perf_callchain
      0.07 ± 12%      +0.0        0.07 ±  6%      -0.0        0.07 ± 10%  perf-profile.children.cycles-pp.wait_for_xmitr
      0.08 ± 12%      +0.0        0.08 ±  6%      -0.0        0.07 ± 12%  perf-profile.children.cycles-pp.vprintk_emit
      0.08 ± 12%      +0.0        0.08 ±  6%      -0.0        0.07 ± 12%  perf-profile.children.cycles-pp.console_unlock
      0.48 ± 17%      +0.0        0.48 ±  9%      -0.1        0.42 ± 10%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.32 ± 21%      +0.0        0.32 ± 11%      -0.1        0.26 ± 11%  perf-profile.children.cycles-pp.update_curr
      0.61 ± 16%      +0.0        0.61 ±  7%      -0.1        0.55 ± 10%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.30 ± 21%      +0.0        0.30 ± 10%      -0.1        0.24 ± 10%  perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
      0.28 ± 20%      +0.0        0.29 ± 10%      -0.1        0.23 ± 11%  perf-profile.children.cycles-pp.__perf_event_overflow
      0.28 ± 19%      +0.0        0.28 ± 11%      -0.1        0.23 ± 10%  perf-profile.children.cycles-pp.perf_event_output_forward
      0.29 ± 10%      +0.0        0.29 ±  9%      -0.0        0.28 ± 10%  perf-profile.children.cycles-pp.__might_fault
      0.08 ± 12%      +0.0        0.08 ± 22%      -0.0        0.07 ± 14%  perf-profile.children.cycles-pp.syscall_enter_from_user_mode
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.worker_thread
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.process_one_work
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.drm_fb_helper_damage_work
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.drm_atomic_helper_dirtyfb
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.drm_atomic_helper_commit
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.commit_tail
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.drm_atomic_helper_commit_tail
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.drm_atomic_helper_commit_planes
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.mgag200_simple_display_pipe_update
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.drm_fb_memcpy_dstclip
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.memcpy_toio
      0.07 ±  8%      +0.0        0.07 ±  6%      -0.0        0.07 ± 10%  perf-profile.children.cycles-pp.serial8250_console_putchar
      0.43 ± 19%      +0.0        0.44 ±  9%      -0.1        0.38 ±  9%  perf-profile.children.cycles-pp.tick_sched_handle
      0.57 ± 16%      +0.0        0.58 ±  7%      -0.1        0.52 ±  9%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.05 ± 47%      +0.0        0.06 ±  8%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.__unwind_start
      0.28 ± 20%      +0.0        0.29 ± 10%      -0.1        0.23 ± 11%  perf-profile.children.cycles-pp.perf_swevent_overflow
      0.07 ±  9%      +0.0        0.08 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.asm_sysvec_irq_work
      0.07 ±  9%      +0.0        0.08 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.sysvec_irq_work
      0.07 ±  9%      +0.0        0.08 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.__sysvec_irq_work
      0.07 ±  9%      +0.0        0.08 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.irq_work_run
      0.07 ±  9%      +0.0        0.08 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.irq_work_single
      0.07 ±  9%      +0.0        0.08 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp._printk
      0.44 ± 19%      +0.0        0.45 ±  9%      -0.1        0.38 ±  9%  perf-profile.children.cycles-pp.tick_sched_timer
      0.54 ± 17%      +0.0        0.55 ±  8%      -0.1        0.49 ± 10%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.54 ± 17%      +0.0        0.54 ±  8%      -0.1        0.48 ± 10%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.37 ± 19%      +0.0        0.37 ± 10%      -0.1        0.31 ±  9%  perf-profile.children.cycles-pp.task_tick_fair
      0.06 ± 47%      +0.0        0.07 ± 18%      -0.0        0.06 ± 21%  perf-profile.children.cycles-pp.asm_exc_page_fault
      0.07 ±  9%      +0.0        0.08 ±  8%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.irq_work_run_list
      0.98 ± 47%      +0.0        0.99 ± 38%      -0.2        0.78 ± 45%  perf-profile.children.cycles-pp.start_kernel
      0.29 ± 21%      +0.0        0.30 ± 10%      -0.1        0.24 ± 10%  perf-profile.children.cycles-pp.perf_tp_event
      0.01 ±223%      +0.0        0.02 ±141%      +0.0        0.01 ±223%  perf-profile.children.cycles-pp.queue_event
      0.00            +0.0        0.01 ±223%      +0.0        0.01 ±223%  perf-profile.children.cycles-pp.build_id__mark_dso_hit
      0.01 ±223%      +0.0        0.02 ±141%      +0.0        0.02 ± 99%  perf-profile.children.cycles-pp.__cond_resched
      0.00            +0.0        0.01 ±223%      +0.0        0.02 ± 99%  perf-profile.children.cycles-pp.poll_freewait
      0.39 ± 19%      +0.0        0.40 ±  9%      -0.1        0.34 ±  9%  perf-profile.children.cycles-pp.scheduler_tick
      0.04 ± 71%      +0.0        0.04 ± 45%      -0.0        0.02 ±141%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      0.06 ±  8%      +0.0        0.07 ± 11%      +0.0        0.06 ± 13%  perf-profile.children.cycles-pp.ret_from_fork
      0.06 ±  8%      +0.0        0.07 ± 11%      +0.0        0.06 ± 13%  perf-profile.children.cycles-pp.kthread
      0.19 ± 10%      +0.0        0.20 ±  9%      +0.0        0.19 ± 10%  perf-profile.children.cycles-pp.__might_resched
      0.02 ±141%      +0.0        0.03 ± 70%      -0.0        0.00        perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      0.01 ±223%      +0.0        0.03 ± 70%      -0.0        0.00        perf-profile.children.cycles-pp.__get_user_nocheck_8
      0.50 ± 12%      +0.0        0.53 ±  9%      -0.0        0.48 ± 11%  perf-profile.children.cycles-pp.__kmalloc
     67.89 ±  9%      +2.0       69.92 ±  2%      -0.9       67.00 ± 10%  perf-profile.children.cycles-pp.__poll
     63.84 ±  9%      +2.1       65.97 ±  2%      -0.6       63.26 ± 10%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     63.66 ±  9%      +2.1       65.80 ±  2%      -0.6       63.09 ± 10%  perf-profile.children.cycles-pp.do_syscall_64
     59.86 ±  9%      +2.3       62.12 ±  2%      -0.3       59.57 ± 10%  perf-profile.children.cycles-pp.__x64_sys_poll
     59.47 ±  9%      +2.3       61.75 ±  2%      -0.2       59.22 ± 10%  perf-profile.children.cycles-pp.do_sys_poll
     26.33 ± 10%      +4.4       30.70 ±  2%      +3.1       29.43 ± 10%  perf-profile.children.cycles-pp.__fget_light
     22.92 ± 10%      +4.4       27.35 ±  2%      +3.3       26.26 ± 10%  perf-profile.children.cycles-pp.__fget_files
     28.78 ± 23%      -1.9       26.91 ±  6%      +1.2       29.96 ± 25%  perf-profile.self.cycles-pp.intel_idle
     17.29 ±  9%      -1.6       15.64 ±  2%      -2.3       14.96 ± 11%  perf-profile.self.cycles-pp.fput_many
     11.06 ±  9%      -0.3       10.74 ±  2%      -0.8       10.29 ± 10%  perf-profile.self.cycles-pp.do_sys_poll
      2.87 ±  9%      -0.1        2.74 ±  2%      -0.2        2.62 ± 11%  perf-profile.self.cycles-pp.testcase
      3.21 ±  9%      -0.1        3.12 ±  2%      -0.2        3.00 ± 11%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      1.60 ±  9%      -0.1        1.54 ±  4%      -0.1        1.47 ± 10%  perf-profile.self.cycles-pp.__entry_text_start
      1.90 ±  8%      -0.0        1.86 ±  3%      -0.1        1.76 ± 12%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      2.55 ±  9%      -0.0        2.51 ±  4%      -0.2        2.36 ± 12%  perf-profile.self.cycles-pp.__fget_light
      0.84 ±  8%      -0.0        0.81 ±  2%      -0.1        0.78 ± 10%  perf-profile.self.cycles-pp.fput
      0.29 ±  8%      -0.0        0.26 ± 11%      -0.0        0.28 ± 20%  perf-profile.self.cycles-pp.__virt_addr_valid
      1.10 ±  9%      -0.0        1.07 ±  4%      -0.1        1.01 ± 10%  perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
      0.28 ±  8%      -0.0        0.26 ±  5%      -0.0        0.26 ± 11%  perf-profile.self.cycles-pp.do_syscall_64
      0.84 ±  9%      -0.0        0.82 ±  3%      -0.0        0.79 ± 10%  perf-profile.self.cycles-pp.__fdget
      0.10 ±  6%      -0.0        0.08 ± 10%      -0.0        0.08 ± 14%  perf-profile.self.cycles-pp.poll@plt
      0.60 ± 12%      -0.0        0.59 ±  9%      +0.1        0.66 ±  6%  perf-profile.self.cycles-pp.kfree
      0.11 ± 12%      -0.0        0.09 ±  7%      -0.0        0.10 ± 13%  perf-profile.self.cycles-pp._copy_from_user
      0.44 ±  9%      -0.0        0.43 ±  3%      -0.0        0.40 ± 15%  perf-profile.self.cycles-pp.__check_object_size
      0.31 ± 10%      -0.0        0.30 ±  3%      -0.0        0.28 ± 15%  perf-profile.self.cycles-pp.__x64_sys_poll
      0.18 ±  8%      -0.0        0.17 ±  5%      -0.0        0.16 ± 11%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.02 ±142%      -0.0        0.01 ±223%      -0.0        0.01 ±223%  perf-profile.self.cycles-pp.poll_select_set_timeout
      0.04 ± 71%      -0.0        0.02 ± 99%      -0.0        0.02 ± 99%  perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
      0.12 ± 10%      -0.0        0.12 ±  6%      -0.0        0.12 ± 17%  perf-profile.self.cycles-pp.kmalloc_slab
      0.01 ±223%      -0.0        0.00            -0.0        0.00        perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
      0.06 ± 11%      -0.0        0.05 ±  8%      -0.0        0.04 ± 71%  perf-profile.self.cycles-pp.syscall_enter_from_user_mode
      0.03 ±100%      -0.0        0.02 ±142%      -0.0        0.02 ±141%  perf-profile.self.cycles-pp.check_stack_object
      0.28 ± 11%      -0.0        0.27 ±  5%      -0.0        0.25 ± 11%  perf-profile.self.cycles-pp.__check_heap_object
      0.09 ± 13%      -0.0        0.09 ±  7%      -0.0        0.09 ± 10%  perf-profile.self.cycles-pp.__might_sleep
      0.08 ± 17%      -0.0        0.08 ± 25%      -0.0        0.08 ± 11%  perf-profile.self.cycles-pp.__might_fault
      0.03 ±100%      -0.0        0.02 ± 99%      -0.0        0.02 ±141%  perf-profile.self.cycles-pp.unwind_next_frame
      0.01 ±223%      +0.0        0.01 ±223%      -0.0        0.00        perf-profile.self.cycles-pp.exit_to_user_mode_prepare
      0.02 ±142%      +0.0        0.02 ±142%      +0.0        0.02 ±142%  perf-profile.self.cycles-pp.io_serial_in
      0.11 ±  9%      +0.0        0.12 ±  8%      -0.0        0.11 ± 11%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      0.44 ±  8%      +0.0        0.45 ±  4%      -0.0        0.42 ± 12%  perf-profile.self.cycles-pp.__poll
      0.01 ±223%      +0.0        0.02 ±141%      +0.0        0.01 ±223%  perf-profile.self.cycles-pp.queue_event
      0.18 ± 11%      +0.0        0.19 ±  8%      -0.0        0.18 ± 10%  perf-profile.self.cycles-pp.__might_resched
      0.26 ± 13%      +0.0        0.30 ± 17%      -0.0        0.25 ± 13%  perf-profile.self.cycles-pp.__kmalloc
     22.70 ± 10%      +4.4       27.11 ±  2%      +3.3       26.04 ± 10%  perf-profile.self.cycles-pp.__fget_files
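
The %change columns above can be reproduced from the raw throughput numbers.
The sketch below shows the arithmetic, assuming the figure is the plain
relative difference against the base commit (5f58da2bef); percent_change() is
a hypothetical helper name, not part of the LKP tooling.

```c
#include <assert.h>

/*
 * Relative change of `value` against `base`, in percent.
 * Hypothetical helper for illustration only.
 */
static double percent_change(double base, double value)
{
	assert(base != 0.0);	/* a zero baseline has no relative change */
	return (value - base) / base * 100.0;
}
```

With the will-it-scale.workload values from the table,
percent_change(6666720, 6288280) is about -5.68 and
percent_change(6666720, 6268295) is about -5.98, which round to the reported
-5.7% and -6.0%.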

> 
> And it would perhaps be interesting to get an actual instruction-level
> profile of that __fget_files() thing for that benchmark, if that
> pinpoints exactly what is going on and in case that would be easy to
> get on that machine.
> 
> Because it might just be truly horrendously bad luck, with the 32-byte
> stack frame meaning that the kernel stack goes one more page down
> (just handwaving from the dTLB number spike), and this all being just
> random bad luck on that particular benchmark.
> 
> Of course, the thing about poll() is that for that case, we *don't*
> really need the "re-check the file descriptor" code at all, since the
> resulting fd isn't going to be installed as a new fd, and it doesn't
> matter for the socket garbage collector logic.
> 
> So maybe it was a mistake to put that re-check in the generic fdget()
> code - yes, it should be cheap, but it's also some of the hottest
> code in the kernel on some loads.
> 
> But if we move it elsewhere, we'd need to come up with some list of
> "these cases need it". Some are obvious: dup, dup2, unix domain file
> passing. It's the non-obvious ones I'd worry about.
> 
> Anybody?
> 
>               Linus

>  fs/file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/file.c b/fs/file.c
> index ad4a8bf3cf10..f802360e240d 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -858,7 +858,7 @@ static struct file *__fget_files(struct files_struct *files, unsigned int fd,
>  			file = NULL;
>  		else if (!get_file_rcu_many(file, refs))
>  			goto loop;
> -		else if (files_lookup_fd_raw(files, fd) != file) {
> +		else if (unlikely(files_lookup_fd_raw(files, fd) != file)) {
>  			fput_many(file, refs);
>  			goto loop;
>  		}
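
The pattern the hunk above touches (look up the pointer, take a reference,
then re-check that the slot still holds the same pointer) can be sketched in
user space with C11 atomics. This is only an illustration under stated
assumptions: struct obj, struct table, NSLOTS, and every function name below
are hypothetical, and the sketch ignores object lifetime, which the kernel
handles with RCU. It is not the code in fs/file.c.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical refcounted object; a count of 0 means it is being torn down. */
struct obj {
	atomic_long refcount;
};

#define NSLOTS 4

/* Hypothetical table of object pointers, standing in for an fd table. */
struct table {
	_Atomic(struct obj *) slot[NSLOTS];
};

/* Take a reference only if the count is still non-zero. */
static int obj_get_unless_zero(struct obj *o)
{
	long old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, `old` is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return 1;
	}
	return 0;
}

static void obj_put(struct obj *o)
{
	atomic_fetch_sub(&o->refcount, 1);
}

/* Look up slot `idx`, take a reference, then re-check the slot. */
static struct obj *table_get(struct table *t, unsigned int idx)
{
	struct obj *o;

	if (idx >= NSLOTS)
		return NULL;
	for (;;) {
		o = atomic_load(&t->slot[idx]);
		if (!o)
			return NULL;
		if (!obj_get_unless_zero(o))
			continue;	/* object is going away: look again */
		/* Rare case: the slot changed while we took the reference. */
		if (__builtin_expect(atomic_load(&t->slot[idx]) != o, 0)) {
			obj_put(o);
			continue;
		}
		return o;
	}
}
```

The second load of the slot is the "re-check" whose cost is under discussion:
it is one extra load and compare on the hot path, plus a call to drop the
reference on the rare mismatch path.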



