* [linus:master] [do_pollfd()] 8935989798: will-it-scale.per_process_ops 11.7% regression
@ 2025-01-26 8:16 kernel test robot
2025-01-27 19:26 ` Al Viro
0 siblings, 1 reply; 5+ messages in thread
From: kernel test robot @ 2025-01-26 8:16 UTC (permalink / raw)
To: Al Viro
Cc: oe-lkp, lkp, linux-kernel, Christian Brauner, linux-fsdevel,
oliver.sang
Hello,
kernel test robot noticed an 11.7% regression of will-it-scale.per_process_ops on:
commit: 89359897983825dbfc08578e7ee807aaf24d9911 ("do_pollfd(): convert to CLASS(fd)")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[test failed on linus/master b46c89c08f4146e7987fc355941a93b12e2c03ef]
[test failed on linux-next/master 5ffa57f6eecefababb8cbe327222ef171943b183]
testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:
nr_task: 100%
mode: process
test: poll2
cpufreq_governor: performance
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202501261509.b6b4260d-lkp@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250126/202501261509.b6b4260d-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/poll2/will-it-scale
commit:
d000e073ca ("convert do_select()")
8935989798 ("do_pollfd(): convert to CLASS(fd)")
d000e073ca2a08ab 89359897983825dbfc08578e7ee
---------------- ---------------------------
%stddev %change %stddev
\ | \
21281 ±147% +197.5% 63313 ± 84% numa-meminfo.node0.Shmem
5318 ±147% +197.5% 15825 ± 84% numa-vmstat.node0.nr_shmem
27370126 -11.7% 24170828 will-it-scale.104.processes
263173 -11.7% 232411 will-it-scale.per_process_ops
27370126 -11.7% 24170828 will-it-scale.workload
0.12 ± 16% -42.1% 0.07 ± 42% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
4.33 ± 28% +154.2% 11.02 ± 61% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
268.62 ± 53% -61.2% 104.10 ±114% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
1053 ± 6% -17.1% 873.33 ± 15% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
1687 ± 10% +11.7% 1884 ± 6% perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
3519 ± 4% +11.2% 3913 ± 5% perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
8.67 ± 28% +154.2% 22.04 ± 61% perf-sched.wait_and_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
268.45 ± 53% -61.4% 103.72 ±115% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
4.33 ± 28% +154.2% 11.02 ± 61% perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
0.01 ± 2% +10.0% 0.01 perf-stat.i.MPKI
5.157e+10 -11.7% 4.554e+10 perf-stat.i.branch-instructions
1.573e+08 -11.8% 1.387e+08 perf-stat.i.branch-misses
0.97 +13.1% 1.09 perf-stat.i.cpi
2.9e+11 -11.7% 2.561e+11 perf-stat.i.instructions
1.04 -11.7% 0.91 perf-stat.i.ipc
0.00 ± 2% +17.9% 0.00 perf-stat.overall.MPKI
0.96 +13.2% 1.09 perf-stat.overall.cpi
1.04 -11.7% 0.92 perf-stat.overall.ipc
5.14e+10 -11.7% 4.538e+10 perf-stat.ps.branch-instructions
1.567e+08 -11.8% 1.382e+08 perf-stat.ps.branch-misses
2.891e+11 -11.7% 2.552e+11 perf-stat.ps.instructions
8.743e+13 -11.7% 7.724e+13 perf-stat.total.instructions
7.61 -0.6 7.03 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__poll
6.16 -0.5 5.66 perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.__poll
5.11 ± 2% -0.5 4.62 ± 2% perf-profile.calltrace.cycles-pp.testcase
2.92 ± 2% -0.4 2.55 ± 2% perf-profile.calltrace.cycles-pp._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.91 -0.3 2.60 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__poll
1.92 ± 5% -0.3 1.67 ± 4% perf-profile.calltrace.cycles-pp.rep_movs_alternative._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64
2.12 -0.2 1.91 perf-profile.calltrace.cycles-pp.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.32 -0.2 1.17 perf-profile.calltrace.cycles-pp.__kmalloc_noprof.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.84 -0.1 1.72 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.__poll
0.98 -0.1 0.88 ± 2% perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64
0.97 -0.1 0.88 perf-profile.calltrace.cycles-pp.kfree.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.72 -0.1 0.66 perf-profile.calltrace.cycles-pp.__virt_addr_valid.check_heap_object.__check_object_size.do_sys_poll.__x64_sys_poll
0.62 -0.1 0.57 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
94.36 +0.5 94.89 perf-profile.calltrace.cycles-pp.__poll
75.76 +2.0 77.76 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__poll
71.45 +2.4 73.83 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
69.72 +2.5 72.24 perf-profile.calltrace.cycles-pp.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
69.19 +2.6 71.77 perf-profile.calltrace.cycles-pp.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
54.05 +4.1 58.18 perf-profile.calltrace.cycles-pp.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
38.56 +4.5 43.08 perf-profile.calltrace.cycles-pp.fdget.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
7.68 -0.6 7.10 perf-profile.children.cycles-pp.syscall_return_via_sysret
6.61 -0.6 6.06 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
5.12 ± 2% -0.5 4.64 ± 2% perf-profile.children.cycles-pp.testcase
3.15 ± 2% -0.4 2.74 ± 2% perf-profile.children.cycles-pp._copy_from_user
3.70 -0.4 3.33 perf-profile.children.cycles-pp.entry_SYSCALL_64
1.94 ± 4% -0.3 1.69 ± 4% perf-profile.children.cycles-pp.rep_movs_alternative
2.26 -0.2 2.04 perf-profile.children.cycles-pp.__check_object_size
1.35 -0.2 1.19 perf-profile.children.cycles-pp.__kmalloc_noprof
1.04 -0.1 0.94 perf-profile.children.cycles-pp.check_heap_object
0.97 -0.1 0.88 perf-profile.children.cycles-pp.kfree
1.07 -0.1 1.00 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
0.74 -0.1 0.66 perf-profile.children.cycles-pp.__virt_addr_valid
0.57 -0.1 0.50 perf-profile.children.cycles-pp.__check_heap_object
0.63 -0.0 0.58 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.22 ± 2% -0.0 0.20 ± 2% perf-profile.children.cycles-pp.check_stack_object
0.18 ± 3% -0.0 0.16 perf-profile.children.cycles-pp.__cond_resched
0.07 ± 6% -0.0 0.06 perf-profile.children.cycles-pp.is_vmalloc_addr
0.13 -0.0 0.12 ± 3% perf-profile.children.cycles-pp.x64_sys_call
0.34 -0.0 0.33 perf-profile.children.cycles-pp.__hrtimer_run_queues
0.12 ± 3% -0.0 0.11 perf-profile.children.cycles-pp.rcu_all_qs
94.98 +0.5 95.45 perf-profile.children.cycles-pp.__poll
75.89 +2.0 77.89 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
71.52 +2.4 73.89 perf-profile.children.cycles-pp.do_syscall_64
69.78 +2.5 72.29 perf-profile.children.cycles-pp.__x64_sys_poll
69.28 +2.6 71.85 perf-profile.children.cycles-pp.do_sys_poll
54.18 +4.1 58.28 perf-profile.children.cycles-pp.do_poll
38.44 +4.6 43.00 perf-profile.children.cycles-pp.fdget
7.24 -0.6 6.60 perf-profile.self.cycles-pp.do_sys_poll
7.68 -0.6 7.09 perf-profile.self.cycles-pp.syscall_return_via_sysret
16.95 -0.6 16.39 perf-profile.self.cycles-pp.do_poll
6.55 -0.5 6.00 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
4.93 ± 2% -0.5 4.46 ± 2% perf-profile.self.cycles-pp.testcase
4.46 -0.4 4.06 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
3.25 -0.3 2.92 perf-profile.self.cycles-pp.entry_SYSCALL_64
1.78 ± 5% -0.2 1.54 ± 4% perf-profile.self.cycles-pp.rep_movs_alternative
1.34 -0.2 1.18 perf-profile.self.cycles-pp._copy_from_user
1.16 -0.1 1.02 ± 2% perf-profile.self.cycles-pp.__kmalloc_noprof
0.96 -0.1 0.87 perf-profile.self.cycles-pp.kfree
0.68 -0.1 0.61 ± 2% perf-profile.self.cycles-pp.__virt_addr_valid
0.56 -0.1 0.50 perf-profile.self.cycles-pp.__check_heap_object
0.43 -0.0 0.39 perf-profile.self.cycles-pp.__x64_sys_poll
0.49 -0.0 0.45 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.29 ± 2% -0.0 0.26 ± 3% perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.19 -0.0 0.17 ± 3% perf-profile.self.cycles-pp.check_stack_object
0.26 -0.0 0.25 ± 3% perf-profile.self.cycles-pp.check_heap_object
0.12 -0.0 0.11 ± 3% perf-profile.self.cycles-pp.x64_sys_call
36.98 +4.6 41.62 perf-profile.self.cycles-pp.fdget
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [linus:master] [do_pollfd()] 8935989798: will-it-scale.per_process_ops 11.7% regression
2025-01-26 8:16 [linus:master] [do_pollfd()] 8935989798: will-it-scale.per_process_ops 11.7% regression kernel test robot
@ 2025-01-27 19:26 ` Al Viro
2025-01-28 9:37 ` Oliver Sang
From: Al Viro @ 2025-01-27 19:26 UTC (permalink / raw)
To: kernel test robot
Cc: oe-lkp, lkp, linux-kernel, Christian Brauner, linux-fsdevel
On Sun, Jan 26, 2025 at 04:16:04PM +0800, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed an 11.7% regression of will-it-scale.per_process_ops on:
>
>
> commit: 89359897983825dbfc08578e7ee807aaf24d9911 ("do_pollfd(): convert to CLASS(fd)")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linus/master b46c89c08f4146e7987fc355941a93b12e2c03ef]
> [test failed on linux-next/master 5ffa57f6eecefababb8cbe327222ef171943b183]
>
> testcase: will-it-scale
> config: x86_64-rhel-9.4
> compiler: gcc-12
> test machine: 104 threads 2 sockets (Skylake) with 192G memory
> parameters:
>
> nr_task: 100%
> mode: process
> test: poll2
> cpufreq_governor: performance
>
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202501261509.b6b4260d-lkp@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20250126/202501261509.b6b4260d-lkp@intel.com
Very interesting... Looking at the generated asm, two things seem to
change in there: the "we need an fput()" case in the (now implicit) fdput() in
do_pollfd() is no longer out of line, and slightly different spills are
done in do_poll().
Just to make sure it's not a genuine change of logic somewhere,
could you compare d000e073ca2a, 893598979838 and d000e073ca2a with the
delta below? That delta provably is an equivalent transformation - all
exits from do_pollfd() go through the return in the end, so that just
shifts the last assignment in there into the caller.
diff --git a/fs/select.c b/fs/select.c
index b41e2d651cc1..e0c816fa4ec4 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -875,8 +875,6 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, poll_table *pwait,
fdput(f);
out:
- /* ... and so does ->revents */
- pollfd->revents = mangle_poll(mask);
return mask;
}
@@ -909,6 +907,7 @@ static int do_poll(struct poll_list *list, struct poll_wqueues *wait,
pfd = walk->entries;
pfd_end = pfd + walk->len;
for (; pfd != pfd_end; pfd++) {
+ __poll_t mask;
/*
* Fish for events. If we found one, record it
* and kill poll_table->_qproc, so we don't
@@ -916,8 +915,9 @@ static int do_poll(struct poll_list *list, struct poll_wqueues *wait,
* this. They'll get immediately deregistered
* when we break out and return.
*/
- if (do_pollfd(pfd, pt, &can_busy_loop,
- busy_flag)) {
+ mask = do_pollfd(pfd, pt, &can_busy_loop, busy_flag);
+ pfd->revents = mangle_poll(mask);
+ if (mask) {
count++;
pt->_qproc = NULL;
/* found something, stop busy polling */
* Re: [linus:master] [do_pollfd()] 8935989798: will-it-scale.per_process_ops 11.7% regression
2025-01-27 19:26 ` Al Viro
@ 2025-01-28 9:37 ` Oliver Sang
2025-01-28 19:10 ` Al Viro
From: Oliver Sang @ 2025-01-28 9:37 UTC (permalink / raw)
To: Al Viro
Cc: oe-lkp, lkp, linux-kernel, Christian Brauner, linux-fsdevel,
oliver.sang
hi, Al Viro,
On Mon, Jan 27, 2025 at 07:26:16PM +0000, Al Viro wrote:
> On Sun, Jan 26, 2025 at 04:16:04PM +0800, kernel test robot wrote:
> >
> >
> > Hello,
> >
> > kernel test robot noticed an 11.7% regression of will-it-scale.per_process_ops on:
> >
> >
> > commit: 89359897983825dbfc08578e7ee807aaf24d9911 ("do_pollfd(): convert to CLASS(fd)")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > [test failed on linus/master b46c89c08f4146e7987fc355941a93b12e2c03ef]
> > [test failed on linux-next/master 5ffa57f6eecefababb8cbe327222ef171943b183]
> >
> > testcase: will-it-scale
> > config: x86_64-rhel-9.4
> > compiler: gcc-12
> > test machine: 104 threads 2 sockets (Skylake) with 192G memory
> > parameters:
> >
> > nr_task: 100%
> > mode: process
> > test: poll2
> > cpufreq_governor: performance
> >
> >
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202501261509.b6b4260d-lkp@intel.com
> >
> >
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> >
> >
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20250126/202501261509.b6b4260d-lkp@intel.com
>
> Very interesting... Looking at the generated asm, two things seem to
> change in there: the "we need an fput()" case in the (now implicit) fdput() in
> do_pollfd() is no longer out of line, and slightly different spills are
> done in do_poll().
>
> Just to make sure it's not a genuine change of logic somewhere,
> could you compare d000e073ca2a, 893598979838 and d000e073ca2a with the
> delta below? That delta provably is an equivalent transformation - all
> exits from do_pollfd() go through the return in the end, so that just
> shifts the last assignment in there into the caller.
the 'd000e073ca2a with the delta below' has a score very similar to
d000e073ca2a, as shown below.
Tested-by: kernel test robot <oliver.sang@intel.com>
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/poll2/will-it-scale
commit:
d000e073ca ("convert do_select()")
8935989798 ("do_pollfd(): convert to CLASS(fd)")
2c43a225261 <--- d000e073ca with the delta below
d000e073ca2a08ab 89359897983825dbfc08578e7ee 2c43a2252614bf1692ef2ad5a46
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
263173 -11.7% 232411 -0.5% 261953 will-it-scale.per_process_ops
Below is the full comparison, FYI.
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/poll2/will-it-scale
commit:
d000e073ca ("convert do_select()")
8935989798 ("do_pollfd(): convert to CLASS(fd)")
2c43a225261 <--- d000e073ca with the delta below
d000e073ca2a08ab 89359897983825dbfc08578e7ee 2c43a2252614bf1692ef2ad5a46
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
1.98e+08 ± 12% +15.7% 2.29e+08 ± 18% -13.1% 1.721e+08 cpuidle..time
21281 ±147% +197.5% 63313 ± 84% +180.7% 59731 ± 86% numa-meminfo.node0.Shmem
5318 ±147% +197.5% 15825 ± 84% +180.7% 14930 ± 86% numa-vmstat.node0.nr_shmem
88607 +0.2% 88803 -1.5% 87297 proc-vmstat.nr_shmem
11118 ± 15% +13.6% 12633 ± 51% -27.7% 8034 ± 10% proc-vmstat.numa_hint_faults_local
21894 ± 4% +135.8% 51630 ±124% +144.5% 53539 ±117% sched_debug.cfs_rq:/.load.max
2575 ± 4% +106.7% 5323 ±112% +115.5% 5548 ±106% sched_debug.cfs_rq:/.load.stddev
3940 ± 18% -19.1% 3188 ± 8% -25.5% 2933 ± 20% sched_debug.cpu.avg_idle.min
27370126 -11.7% 24170828 -0.5% 27243222 will-it-scale.104.processes
263173 -11.7% 232411 -0.5% 261953 will-it-scale.per_process_ops
27370126 -11.7% 24170828 -0.5% 27243222 will-it-scale.workload
0.12 ± 16% -42.1% 0.07 ± 42% -36.3% 0.07 ± 35% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
4.33 ± 28% +154.2% 11.02 ± 61% +86.2% 8.07 ± 83% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
2.27 ± 22% -34.2% 1.49 ± 66% -48.9% 1.16 ± 36% perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
268.62 ± 53% -61.2% 104.10 ±114% -39.4% 162.90 ± 82% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
1053 ± 6% -17.1% 873.33 ± 15% -4.6% 1004 ± 11% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
1687 ± 10% +11.7% 1884 ± 6% +5.3% 1777 ± 10% perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
3519 ± 4% +11.2% 3913 ± 5% +3.9% 3656 ± 5% perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
8.67 ± 28% +154.2% 22.04 ± 61% +86.2% 16.14 ± 83% perf-sched.wait_and_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
268.45 ± 53% -61.4% 103.72 ±115% -39.5% 162.49 ± 83% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
4.33 ± 28% +154.2% 11.02 ± 61% +86.2% 8.07 ± 83% perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
0.01 ± 2% +10.0% 0.01 +0.7% 0.01 ± 2% perf-stat.i.MPKI
5.157e+10 -11.7% 4.554e+10 -0.5% 5.133e+10 perf-stat.i.branch-instructions
1.573e+08 -11.8% 1.387e+08 +0.0% 1.573e+08 perf-stat.i.branch-misses
0.97 +13.1% 1.09 +0.2% 0.97 perf-stat.i.cpi
2.9e+11 -11.7% 2.561e+11 -0.5% 2.887e+11 perf-stat.i.instructions
1.04 -11.7% 0.91 -0.2% 1.03 perf-stat.i.ipc
0.00 ± 2% +17.9% 0.00 +1.4% 0.00 ± 3% perf-stat.overall.MPKI
0.96 +13.2% 1.09 +0.2% 0.97 perf-stat.overall.cpi
1.04 -11.7% 0.92 -0.2% 1.03 perf-stat.overall.ipc
5.14e+10 -11.7% 4.538e+10 -0.5% 5.116e+10 perf-stat.ps.branch-instructions
1.567e+08 -11.8% 1.382e+08 +0.0% 1.568e+08 perf-stat.ps.branch-misses
2.891e+11 -11.7% 2.552e+11 -0.5% 2.877e+11 perf-stat.ps.instructions
8.743e+13 -11.7% 7.724e+13 -0.5% 8.699e+13 perf-stat.total.instructions
7.61 -0.6 7.03 +0.0 7.63 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__poll
6.16 -0.5 5.66 -0.0 6.13 perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.__poll
5.11 ± 2% -0.5 4.62 ± 2% +0.3 5.44 perf-profile.calltrace.cycles-pp.testcase
2.92 ± 2% -0.4 2.55 ± 2% -0.1 2.85 perf-profile.calltrace.cycles-pp._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.91 -0.3 2.60 +0.0 2.93 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__poll
1.92 ± 5% -0.3 1.67 ± 4% -0.1 1.84 perf-profile.calltrace.cycles-pp.rep_movs_alternative._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64
2.12 -0.2 1.91 -0.0 2.10 perf-profile.calltrace.cycles-pp.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.32 -0.2 1.17 -0.0 1.30 perf-profile.calltrace.cycles-pp.__kmalloc_noprof.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.84 -0.1 1.72 +0.0 1.85 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.__poll
0.98 -0.1 0.88 ± 2% -0.0 0.97 perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64
0.97 -0.1 0.88 -0.0 0.94 ± 4% perf-profile.calltrace.cycles-pp.kfree.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.72 -0.1 0.66 -0.0 0.72 perf-profile.calltrace.cycles-pp.__virt_addr_valid.check_heap_object.__check_object_size.do_sys_poll.__x64_sys_poll
0.62 -0.1 0.57 +0.0 0.62 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
94.36 +0.5 94.89 -0.3 94.03 perf-profile.calltrace.cycles-pp.__poll
75.76 +2.0 77.76 -0.3 75.45 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__poll
71.45 +2.4 73.83 -0.3 71.12 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
69.72 +2.5 72.24 -0.4 69.32 perf-profile.calltrace.cycles-pp.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
69.19 +2.6 71.77 -0.4 68.80 perf-profile.calltrace.cycles-pp.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
54.05 +4.1 58.18 -0.2 53.85 perf-profile.calltrace.cycles-pp.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
38.56 +4.5 43.08 -0.2 38.35 perf-profile.calltrace.cycles-pp.fdget.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
7.68 -0.6 7.10 +0.0 7.70 perf-profile.children.cycles-pp.syscall_return_via_sysret
6.61 -0.6 6.06 -0.0 6.59 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
5.12 ± 2% -0.5 4.64 ± 2% +0.3 5.45 perf-profile.children.cycles-pp.testcase
3.15 ± 2% -0.4 2.74 ± 2% -0.1 3.07 perf-profile.children.cycles-pp._copy_from_user
3.70 -0.4 3.33 +0.0 3.72 perf-profile.children.cycles-pp.entry_SYSCALL_64
1.94 ± 4% -0.3 1.69 ± 4% -0.1 1.86 perf-profile.children.cycles-pp.rep_movs_alternative
2.26 -0.2 2.04 -0.0 2.25 perf-profile.children.cycles-pp.__check_object_size
1.35 -0.2 1.19 -0.0 1.33 perf-profile.children.cycles-pp.__kmalloc_noprof
1.04 -0.1 0.94 -0.0 1.04 perf-profile.children.cycles-pp.check_heap_object
0.97 -0.1 0.88 -0.0 0.94 ± 4% perf-profile.children.cycles-pp.kfree
1.07 -0.1 1.00 +0.0 1.08 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
0.74 -0.1 0.66 -0.0 0.73 perf-profile.children.cycles-pp.__virt_addr_valid
0.57 -0.1 0.50 -0.0 0.56 ± 2% perf-profile.children.cycles-pp.__check_heap_object
0.63 -0.0 0.58 +0.0 0.63 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.22 ± 2% -0.0 0.20 ± 2% +0.0 0.23 ± 3% perf-profile.children.cycles-pp.check_stack_object
0.18 ± 3% -0.0 0.16 -0.0 0.17 ± 2% perf-profile.children.cycles-pp.__cond_resched
0.07 ± 6% -0.0 0.06 -0.0 0.07 ± 5% perf-profile.children.cycles-pp.is_vmalloc_addr
0.13 -0.0 0.12 ± 3% +0.0 0.13 perf-profile.children.cycles-pp.x64_sys_call
0.34 -0.0 0.33 -0.0 0.34 ± 2% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.12 ± 3% -0.0 0.11 -0.0 0.12 ± 3% perf-profile.children.cycles-pp.rcu_all_qs
94.98 +0.5 95.45 -0.3 94.65 perf-profile.children.cycles-pp.__poll
75.89 +2.0 77.89 -0.3 75.58 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
71.52 +2.4 73.89 -0.3 71.19 perf-profile.children.cycles-pp.do_syscall_64
69.78 +2.5 72.29 -0.4 69.38 perf-profile.children.cycles-pp.__x64_sys_poll
69.28 +2.6 71.85 -0.4 68.89 perf-profile.children.cycles-pp.do_sys_poll
54.18 +4.1 58.28 -0.2 53.99 perf-profile.children.cycles-pp.do_poll
38.44 +4.6 43.00 -0.2 38.24 perf-profile.children.cycles-pp.fdget
7.24 -0.6 6.60 -0.1 7.19 perf-profile.self.cycles-pp.do_sys_poll
7.68 -0.6 7.09 +0.0 7.70 perf-profile.self.cycles-pp.syscall_return_via_sysret
16.95 -0.6 16.39 +0.0 16.96 perf-profile.self.cycles-pp.do_poll
6.55 -0.5 6.00 -0.0 6.52 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
4.93 ± 2% -0.5 4.46 ± 2% +0.3 5.26 perf-profile.self.cycles-pp.testcase
4.46 -0.4 4.06 +0.0 4.47 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
3.25 -0.3 2.92 +0.0 3.27 perf-profile.self.cycles-pp.entry_SYSCALL_64
1.78 ± 5% -0.2 1.54 ± 4% -0.1 1.70 perf-profile.self.cycles-pp.rep_movs_alternative
1.34 -0.2 1.18 -0.0 1.33 perf-profile.self.cycles-pp._copy_from_user
1.16 -0.1 1.02 ± 2% -0.0 1.15 perf-profile.self.cycles-pp.__kmalloc_noprof
0.96 -0.1 0.87 -0.0 0.93 ± 4% perf-profile.self.cycles-pp.kfree
0.68 -0.1 0.61 ± 2% -0.0 0.68 perf-profile.self.cycles-pp.__virt_addr_valid
0.56 -0.1 0.50 -0.0 0.55 perf-profile.self.cycles-pp.__check_heap_object
0.43 -0.0 0.39 -0.0 0.43 perf-profile.self.cycles-pp.__x64_sys_poll
0.49 -0.0 0.45 -0.0 0.49 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.29 ± 2% -0.0 0.26 ± 3% +0.0 0.30 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.19 -0.0 0.17 ± 3% +0.0 0.19 ± 2% perf-profile.self.cycles-pp.check_stack_object
0.26 -0.0 0.25 ± 3% -0.0 0.26 ± 2% perf-profile.self.cycles-pp.check_heap_object
0.12 -0.0 0.11 ± 3% +0.0 0.12 perf-profile.self.cycles-pp.x64_sys_call
36.98 +4.6 41.62 -0.2 36.77 perf-profile.self.cycles-pp.fdget
>
> diff --git a/fs/select.c b/fs/select.c
> index b41e2d651cc1..e0c816fa4ec4 100644
> --- a/fs/select.c
> +++ b/fs/select.c
> @@ -875,8 +875,6 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, poll_table *pwait,
> fdput(f);
>
> out:
> - /* ... and so does ->revents */
> - pollfd->revents = mangle_poll(mask);
> return mask;
> }
>
> @@ -909,6 +907,7 @@ static int do_poll(struct poll_list *list, struct poll_wqueues *wait,
> pfd = walk->entries;
> pfd_end = pfd + walk->len;
> for (; pfd != pfd_end; pfd++) {
> + __poll_t mask;
> /*
> * Fish for events. If we found one, record it
> * and kill poll_table->_qproc, so we don't
> @@ -916,8 +915,9 @@ static int do_poll(struct poll_list *list, struct poll_wqueues *wait,
> * this. They'll get immediately deregistered
> * when we break out and return.
> */
> - if (do_pollfd(pfd, pt, &can_busy_loop,
> - busy_flag)) {
> + mask = do_pollfd(pfd, pt, &can_busy_loop, busy_flag);
> + pfd->revents = mangle_poll(mask);
> + if (mask) {
> count++;
> pt->_qproc = NULL;
> /* found something, stop busy polling */
>
* Re: [linus:master] [do_pollfd()] 8935989798: will-it-scale.per_process_ops 11.7% regression
2025-01-28 9:37 ` Oliver Sang
@ 2025-01-28 19:10 ` Al Viro
2025-02-18 5:31 ` Oliver Sang
From: Al Viro @ 2025-01-28 19:10 UTC (permalink / raw)
To: Oliver Sang; +Cc: oe-lkp, lkp, linux-kernel, Christian Brauner, linux-fsdevel
On Tue, Jan 28, 2025 at 05:37:39PM +0800, Oliver Sang wrote:
> > Just to make sure it's not a genuine change of logic somewhere,
> > could you compare d000e073ca2a, 893598979838 and d000e073ca2a with the
> > delta below? That delta provably is an equivalent transformation - all
> > exits from do_pollfd() go through the return in the end, so that just
> > shifts the last assignment in there into the caller.
>
> the 'd000e073ca2a with the delta below' has a score very similar to
> d000e073ca2a, as shown below.
Not a change of logic, then... AFAICS, the only differences in code generation
here are different spills and the conditional fput() not being taken out of line.
I'm somewhat surprised by the amount of slowdown, TBH... Is there any
chance to get per-insn profiles for those? How much time is spent in
each insn of do_poll()/do_pollfd()?
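If per-instruction profiling does become possible, the usual way to get it is perf's annotate mode; assuming perf with kernel symbols/debuginfo available, something along these lines (a sketch of the standard workflow, options may need adjusting for the test harness):

```shell
# record cycles system-wide while the benchmark runs, then view
# per-instruction attribution for the hot poll-path symbols
perf record -a -e cycles -- sleep 10
perf annotate --stdio do_poll
# do_pollfd() is static inline, so its instructions land in do_poll();
# fdget shows up as its own symbol in the profiles above:
perf annotate --stdio fdget
```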
* Re: [linus:master] [do_pollfd()] 8935989798: will-it-scale.per_process_ops 11.7% regression
2025-01-28 19:10 ` Al Viro
@ 2025-02-18 5:31 ` Oliver Sang
From: Oliver Sang @ 2025-02-18 5:31 UTC (permalink / raw)
To: Al Viro
Cc: oe-lkp, lkp, linux-kernel, Christian Brauner, linux-fsdevel,
oliver.sang
hi, Al Viro,
On Tue, Jan 28, 2025 at 07:10:42PM +0000, Al Viro wrote:
> On Tue, Jan 28, 2025 at 05:37:39PM +0800, Oliver Sang wrote:
>
> > > Just to make sure it's not a genuine change of logic somewhere,
> > > could you compare d000e073ca2a, 893598979838 and d000e073ca2a with the
> > > delta below? That delta provably is an equivalent transformation - all
> > > exits from do_pollfd() go through the return in the end, so that just
> > > shifts the last assignment in there into the caller.
> >
> > the 'd000e073ca2a with the delta below' has a score very similar to
> > d000e073ca2a, as shown below.
>
> Not a change of logic, then... AFAICS, the only differences in code generation
> here are different spills and the conditional fput() not being taken out of line.
>
> I'm somewhat surprised by the amount of slowdown, TBH... Is there any
> chance to get per-insn profiles for those? How much time is spent in
> each insn of do_poll()/do_pollfd()?
Sorry for the late reply.
We cannot provide per-insn profiles for now.
At the same time, we revisited these results on some newer platforms and found
no significant regression in the same tests.
on an Intel(R) Xeon(R) 6972P (Granite Rapids) with 128G memory
=========================================================================================
compiler/cpufreq_governor/debug-setup/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/no-monitor/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-gnr-2ap2/poll2/will-it-scale
d000e073ca2a08ab 89359897983825dbfc08578e7ee
---------------- ---------------------------
%stddev %change %stddev
\ | \
775439 +0.6% 780350 will-it-scale.per_process_ops
on an INTEL(R) XEON(R) PLATINUM 8592+ (Emerald Rapids) with 256G memory
=========================================================================================
compiler/cpufreq_governor/debug-setup/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/no-monitor/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-emr-2sp1/poll2/will-it-scale
d000e073ca2a08ab 89359897983825dbfc08578e7ee
---------------- ---------------------------
%stddev %change %stddev
\ | \
583865 -0.8% 579319 will-it-scale.per_process_ops
on an Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
=========================================================================================
compiler/cpufreq_governor/debug-setup/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/no-monitor/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/poll2/will-it-scale
d000e073ca2a08ab 89359897983825dbfc08578e7ee
---------------- ---------------------------
%stddev %change %stddev
\ | \
595389 -1.6% 586063 will-it-scale.per_process_ops
Our original report was on a Skylake machine, which may be somewhat old.
If you would like more checks, or have more patches for us to test, please
let us know. Thanks.