* [linus:master] [pidfs] 16ecd47cb0: stress-ng.fstat.ops_per_sec 12.6% regression
@ 2025-01-27 14:32 kernel test robot
From: kernel test robot @ 2025-01-27 14:32 UTC (permalink / raw)
To: Christian Brauner; +Cc: oe-lkp, lkp, linux-kernel, linux-fsdevel, oliver.sang
Hello,
kernel test robot noticed a 12.6% regression of stress-ng.fstat.ops_per_sec on:
commit: 16ecd47cb0cd895c7c2f5dd5db50f6c005c51639 ("pidfs: lookup pid through rbtree")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[test failed on linus/master aa22f4da2a46b484a257d167c67a2adc1b7aaf68]
[test failed on linux-next/master 5ffa57f6eecefababb8cbe327222ef171943b183]
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:
nr_threads: 100%
disk: 1HDD
testtime: 60s
fs: btrfs
test: fstat
cpufreq_governor: performance
In addition to that, the commit also has significant impact on the following tests:
+------------------+---------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.pthread.ops_per_sec 23.7% regression |
| test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory |
| test parameters | cpufreq_governor=performance |
| | nr_threads=100% |
| | test=pthread |
| | testtime=60s |
+------------------+---------------------------------------------------------------------------------------------+
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202501272257.a95372bc-lkp@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250127/202501272257.a95372bc-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/1HDD/btrfs/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/fstat/stress-ng/60s
commit:
59a42b0e78 ("selftests/pidfd: add pidfs file handle selftests")
16ecd47cb0 ("pidfs: lookup pid through rbtree")
59a42b0e78888e2d 16ecd47cb0cd895c7c2f5dd5db5
---------------- ---------------------------
%stddev %change %stddev
\ | \
2813179 ± 2% -30.7% 1948548 cpuidle..usage
7.22 -6.8% 6.73 ± 2% iostat.cpu.user
0.38 -0.0 0.33 mpstat.cpu.all.irq%
5683055 ± 5% -13.3% 4926006 ± 10% numa-meminfo.node1.Active
5683055 ± 5% -13.3% 4926006 ± 10% numa-meminfo.node1.Active(anon)
681017 -13.0% 592632 vmstat.system.cs
262754 -8.6% 240105 vmstat.system.in
25349297 -14.3% 21728755 numa-numastat.node0.local_node
25389508 -14.3% 21770830 numa-numastat.node0.numa_hit
26719069 -14.2% 22919085 numa-numastat.node1.local_node
26746344 -14.2% 22943171 numa-numastat.node1.numa_hit
25391110 -14.3% 21771814 numa-vmstat.node0.numa_hit
25350899 -14.3% 21729738 numa-vmstat.node0.numa_local
1423040 ± 5% -13.3% 1233884 ± 10% numa-vmstat.node1.nr_active_anon
1423039 ± 5% -13.3% 1233883 ± 10% numa-vmstat.node1.nr_zone_active_anon
26748443 -14.2% 22948826 numa-vmstat.node1.numa_hit
26721168 -14.2% 22924740 numa-vmstat.node1.numa_local
4274794 -12.6% 3735109 stress-ng.fstat.ops
71246 -12.6% 62251 stress-ng.fstat.ops_per_sec
13044663 -10.2% 11715455 stress-ng.time.involuntary_context_switches
4590 -2.1% 4492 stress-ng.time.percent_of_cpu_this_job_got
2545 -1.6% 2503 stress-ng.time.system_time
212.55 -8.2% 195.17 ± 2% stress-ng.time.user_time
6786385 -12.7% 5924000 stress-ng.time.voluntary_context_switches
9685654 ± 2% +15.2% 11161628 ± 2% sched_debug.cfs_rq:/.avg_vruntime.avg
4917374 ± 6% +26.4% 6217585 ± 8% sched_debug.cfs_rq:/.avg_vruntime.min
9685655 ± 2% +15.2% 11161628 ± 2% sched_debug.cfs_rq:/.min_vruntime.avg
4917374 ± 6% +26.4% 6217586 ± 8% sched_debug.cfs_rq:/.min_vruntime.min
319.78 ± 4% -8.9% 291.47 ± 4% sched_debug.cfs_rq:/.util_avg.stddev
331418 -12.3% 290724 sched_debug.cpu.nr_switches.avg
349777 -12.0% 307943 sched_debug.cpu.nr_switches.max
247719 ± 5% -18.2% 202753 ± 2% sched_debug.cpu.nr_switches.min
1681668 -5.8% 1584232 proc-vmstat.nr_active_anon
2335388 -4.2% 2237095 proc-vmstat.nr_file_pages
1434429 -6.9% 1336146 proc-vmstat.nr_shmem
50745 -2.5% 49497 proc-vmstat.nr_slab_unreclaimable
1681668 -5.8% 1584232 proc-vmstat.nr_zone_active_anon
52137742 -14.2% 44716504 proc-vmstat.numa_hit
52070256 -14.2% 44650343 proc-vmstat.numa_local
57420831 -13.4% 49744871 proc-vmstat.pgalloc_normal
54983559 -13.7% 47445719 proc-vmstat.pgfree
1.30 -10.6% 1.17 perf-stat.i.MPKI
2.797e+10 -7.0% 2.6e+10 perf-stat.i.branch-instructions
0.32 ± 4% +0.0 0.33 perf-stat.i.branch-miss-rate%
24.15 -1.1 23.00 perf-stat.i.cache-miss-rate%
1.689e+08 -17.1% 1.401e+08 perf-stat.i.cache-misses
6.99e+08 -12.9% 6.085e+08 perf-stat.i.cache-references
708230 -12.7% 618047 perf-stat.i.context-switches
1.71 +8.2% 1.85 perf-stat.i.cpi
115482 -2.7% 112333 perf-stat.i.cpu-migrations
1311 +21.2% 1588 perf-stat.i.cycles-between-cache-misses
1.288e+11 -7.3% 1.195e+11 perf-stat.i.instructions
0.59 -7.4% 0.55 perf-stat.i.ipc
12.84 -11.0% 11.43 perf-stat.i.metric.K/sec
1.31 -10.5% 1.17 perf-stat.overall.MPKI
0.29 ± 4% +0.0 0.30 perf-stat.overall.branch-miss-rate%
24.21 -1.1 23.07 perf-stat.overall.cache-miss-rate%
1.71 +8.2% 1.85 perf-stat.overall.cpi
1303 +21.0% 1576 perf-stat.overall.cycles-between-cache-misses
0.58 -7.6% 0.54 perf-stat.overall.ipc
2.724e+10 -6.8% 2.539e+10 perf-stat.ps.branch-instructions
1.648e+08 -16.8% 1.371e+08 perf-stat.ps.cache-misses
6.807e+08 -12.7% 5.943e+08 perf-stat.ps.cache-references
689389 -12.5% 603372 perf-stat.ps.context-switches
1.255e+11 -7.0% 1.167e+11 perf-stat.ps.instructions
7.621e+12 -6.9% 7.097e+12 perf-stat.total.instructions
56.06 -56.1 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
56.04 -56.0 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
31.25 -31.2 0.00 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
31.23 -31.2 0.00 perf-profile.calltrace.cycles-pp.__x64_sys_exit.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
31.22 -31.2 0.00 perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
27.58 -27.6 0.00 perf-profile.calltrace.cycles-pp.exit_notify.do_exit.__x64_sys_exit.x64_sys_call.do_syscall_64
23.72 -23.7 0.00 perf-profile.calltrace.cycles-pp.__do_sys_clone3.do_syscall_64.entry_SYSCALL_64_after_hwframe
23.68 -23.7 0.00 perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone3.do_syscall_64.entry_SYSCALL_64_after_hwframe
20.15 -20.2 0.00 perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone3.do_syscall_64.entry_SYSCALL_64_after_hwframe
19.23 -19.2 0.00 perf-profile.calltrace.cycles-pp.fstatat64
16.51 -16.5 0.00 perf-profile.calltrace.cycles-pp.statx
14.81 -14.8 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.fstatat64
14.52 -14.5 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.fstatat64
14.52 -14.5 0.00 perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.copy_process.kernel_clone.__do_sys_clone3.do_syscall_64
14.05 -14.0 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.copy_process.kernel_clone.__do_sys_clone3
14.04 -14.0 0.00 perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.exit_notify.do_exit.__x64_sys_exit.x64_sys_call
13.55 -13.6 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.exit_notify.do_exit.__x64_sys_exit
13.24 -13.2 0.00 perf-profile.calltrace.cycles-pp.release_task.exit_notify.do_exit.__x64_sys_exit.x64_sys_call
13.08 -13.1 0.00 perf-profile.calltrace.cycles-pp.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe.fstatat64
12.01 -12.0 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.statx
11.93 -11.9 0.00 perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.release_task.exit_notify.do_exit.__x64_sys_exit
11.76 -11.8 0.00 perf-profile.calltrace.cycles-pp.vfs_fstatat.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe.fstatat64
11.72 -11.7 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.statx
11.45 -11.4 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.release_task.exit_notify.do_exit
10.27 -10.3 0.00 perf-profile.calltrace.cycles-pp.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe.statx
7.21 -7.2 0.00 perf-profile.calltrace.cycles-pp.vfs_statx.vfs_fstatat.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.25 -5.3 0.00 perf-profile.calltrace.cycles-pp.filename_lookup.vfs_statx.vfs_fstatat.__do_sys_newfstatat.do_syscall_64
86.11 -86.1 0.00 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
85.52 -85.5 0.00 perf-profile.children.cycles-pp.do_syscall_64
41.40 -41.4 0.00 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
40.49 -40.5 0.00 perf-profile.children.cycles-pp.queued_write_lock_slowpath
31.57 -31.6 0.00 perf-profile.children.cycles-pp.x64_sys_call
31.23 -31.2 0.00 perf-profile.children.cycles-pp.do_exit
31.23 -31.2 0.00 perf-profile.children.cycles-pp.__x64_sys_exit
27.59 -27.6 0.00 perf-profile.children.cycles-pp.exit_notify
23.72 -23.7 0.00 perf-profile.children.cycles-pp.__do_sys_clone3
23.69 -23.7 0.00 perf-profile.children.cycles-pp.kernel_clone
20.18 -20.2 0.00 perf-profile.children.cycles-pp.copy_process
19.70 -19.7 0.00 perf-profile.children.cycles-pp.fstatat64
16.58 -16.6 0.00 perf-profile.children.cycles-pp.statx
13.51 -13.5 0.00 perf-profile.children.cycles-pp.__do_sys_newfstatat
13.25 -13.2 0.00 perf-profile.children.cycles-pp.release_task
12.22 -12.2 0.00 perf-profile.children.cycles-pp.vfs_fstatat
11.38 -11.4 0.00 perf-profile.children.cycles-pp.vfs_statx
10.36 -10.4 0.00 perf-profile.children.cycles-pp.__x64_sys_statx
8.25 -8.3 0.00 perf-profile.children.cycles-pp.filename_lookup
7.89 -7.9 0.00 perf-profile.children.cycles-pp.getname_flags
7.74 -7.7 0.00 perf-profile.children.cycles-pp.path_lookupat
41.39 -41.4 0.00 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
***************************************************************************************************
lkp-spr-r02: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/pthread/stress-ng/60s
commit:
59a42b0e78 ("selftests/pidfd: add pidfs file handle selftests")
16ecd47cb0 ("pidfs: lookup pid through rbtree")
59a42b0e78888e2d 16ecd47cb0cd895c7c2f5dd5db5
---------------- ---------------------------
%stddev %change %stddev
\ | \
6.458e+08 ± 3% -20.7% 5.119e+08 ± 6% cpuidle..time
4424460 ± 4% -56.5% 1923713 ± 2% cpuidle..usage
1916 +17.2% 2245 ± 2% vmstat.procs.r
880095 -24.7% 662885 vmstat.system.cs
717291 -7.6% 662983 vmstat.system.in
4.81 -0.9 3.87 ± 2% mpstat.cpu.all.idle%
0.48 -0.1 0.42 mpstat.cpu.all.irq%
0.32 ± 3% -0.1 0.26 ± 2% mpstat.cpu.all.soft%
1.77 -0.3 1.46 mpstat.cpu.all.usr%
43182538 -21.9% 33726626 numa-numastat.node0.local_node
43338607 -22.0% 33814109 numa-numastat.node0.numa_hit
43334202 -22.8% 33451907 numa-numastat.node1.local_node
43415892 -22.6% 33601910 numa-numastat.node1.numa_hit
43339112 -22.0% 33811967 numa-vmstat.node0.numa_hit
43183037 -21.9% 33724483 numa-vmstat.node0.numa_local
43416602 -22.6% 33599378 numa-vmstat.node1.numa_hit
43334912 -22.8% 33449374 numa-vmstat.node1.numa_local
13189 ± 14% -24.0% 10022 ± 19% perf-c2c.DRAM.local
9611 ± 16% -28.8% 6844 ± 17% perf-c2c.DRAM.remote
16436 ± 15% -32.1% 11162 ± 19% perf-c2c.HITM.local
4431 ± 16% -30.8% 3064 ± 19% perf-c2c.HITM.remote
20868 ± 15% -31.8% 14226 ± 19% perf-c2c.HITM.total
205629 +67.1% 343625 stress-ng.pthread.nanosecs_to_start_a_pthread
12690825 -23.7% 9689255 stress-ng.pthread.ops
210833 -23.7% 160924 stress-ng.pthread.ops_per_sec
5684649 -16.0% 4772378 stress-ng.time.involuntary_context_switches
26588792 -21.0% 20998281 stress-ng.time.minor_page_faults
12705 +5.1% 13353 stress-ng.time.percent_of_cpu_this_job_got
7559 +5.6% 7986 stress-ng.time.system_time
132.77 -24.1% 100.72 stress-ng.time.user_time
29099733 -22.3% 22601666 stress-ng.time.voluntary_context_switches
340547 +1.4% 345226 proc-vmstat.nr_mapped
150971 -3.2% 146184 proc-vmstat.nr_page_table_pages
48017 -2.0% 47078 proc-vmstat.nr_slab_reclaimable
540694 ± 9% +50.6% 814286 ± 15% proc-vmstat.numa_hint_faults
255145 ± 22% +62.3% 414122 ± 17% proc-vmstat.numa_hint_faults_local
86757062 -22.3% 67418409 proc-vmstat.numa_hit
86519300 -22.4% 67180920 proc-vmstat.numa_local
89935256 -22.2% 69939407 proc-vmstat.pgalloc_normal
27887502 -20.1% 22295448 proc-vmstat.pgfault
86343992 -22.7% 66777255 proc-vmstat.pgfree
1187131 ± 23% -42.2% 686568 ± 15% sched_debug.cfs_rq:/.avg_vruntime.stddev
12970740 ± 42% -49.3% 6577803 ± 11% sched_debug.cfs_rq:/.left_deadline.max
2408752 ± 4% -9.6% 2177658 ± 2% sched_debug.cfs_rq:/.left_deadline.stddev
12970554 ± 42% -49.3% 6577515 ± 11% sched_debug.cfs_rq:/.left_vruntime.max
2408688 ± 4% -9.6% 2177606 ± 2% sched_debug.cfs_rq:/.left_vruntime.stddev
1187132 ± 23% -42.2% 686568 ± 15% sched_debug.cfs_rq:/.min_vruntime.stddev
12970563 ± 42% -49.3% 6577516 ± 11% sched_debug.cfs_rq:/.right_vruntime.max
2408788 ± 4% -9.6% 2177610 ± 2% sched_debug.cfs_rq:/.right_vruntime.stddev
2096120 -68.2% 665792 sched_debug.cpu.curr->pid.max
655956 ± 8% -53.1% 307752 sched_debug.cpu.curr->pid.stddev
124008 -24.6% 93528 sched_debug.cpu.nr_switches.avg
270857 ± 4% -38.9% 165624 ± 10% sched_debug.cpu.nr_switches.max
27972 ± 13% -67.5% 9102 ± 17% sched_debug.cpu.nr_switches.stddev
179.43 ± 4% +17.8% 211.44 ± 4% sched_debug.cpu.nr_uninterruptible.stddev
4.21 -13.4% 3.65 perf-stat.i.MPKI
2.03e+10 -8.3% 1.863e+10 perf-stat.i.branch-instructions
0.66 -0.1 0.61 perf-stat.i.branch-miss-rate%
1.289e+08 -16.7% 1.074e+08 perf-stat.i.branch-misses
39.17 +0.7 39.92 perf-stat.i.cache-miss-rate%
3.806e+08 -21.8% 2.976e+08 perf-stat.i.cache-misses
9.691e+08 -23.3% 7.437e+08 perf-stat.i.cache-references
903142 -24.9% 678436 perf-stat.i.context-switches
6.89 +11.5% 7.69 perf-stat.i.cpi
6.239e+11 +1.0% 6.304e+11 perf-stat.i.cpu-cycles
311004 -18.5% 253387 perf-stat.i.cpu-migrations
1631 +29.1% 2106 perf-stat.i.cycles-between-cache-misses
9.068e+10 -9.7% 8.192e+10 perf-stat.i.instructions
0.15 -9.5% 0.14 perf-stat.i.ipc
10.41 -22.2% 8.11 perf-stat.i.metric.K/sec
462421 -19.7% 371144 perf-stat.i.minor-faults
668589 -21.0% 527974 perf-stat.i.page-faults
4.22 -13.6% 3.65 perf-stat.overall.MPKI
0.63 -0.1 0.57 perf-stat.overall.branch-miss-rate%
39.29 +0.7 40.04 perf-stat.overall.cache-miss-rate%
6.94 +11.7% 7.75 perf-stat.overall.cpi
1643 +29.3% 2125 perf-stat.overall.cycles-between-cache-misses
0.14 -10.5% 0.13 perf-stat.overall.ipc
1.971e+10 -8.6% 1.801e+10 perf-stat.ps.branch-instructions
1.237e+08 -17.2% 1.024e+08 perf-stat.ps.branch-misses
3.713e+08 -22.3% 2.887e+08 perf-stat.ps.cache-misses
9.451e+08 -23.7% 7.21e+08 perf-stat.ps.cache-references
883135 -25.3% 659967 perf-stat.ps.context-switches
304186 -18.9% 246645 perf-stat.ps.cpu-migrations
8.797e+10 -10.0% 7.916e+10 perf-stat.ps.instructions
445107 -20.6% 353509 perf-stat.ps.minor-faults
646755 -21.7% 506142 perf-stat.ps.page-faults
5.397e+12 -10.2% 4.846e+12 perf-stat.total.instructions
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [linus:master] [pidfs] 16ecd47cb0: stress-ng.fstat.ops_per_sec 12.6% regression
@ 2025-01-28 10:51 ` Christian Brauner
From: Christian Brauner @ 2025-01-28 10:51 UTC (permalink / raw)
To: kernel test robot; +Cc: oe-lkp, lkp, linux-kernel, linux-fsdevel
On Mon, Jan 27, 2025 at 10:32:11PM +0800, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed a 12.6% regression of stress-ng.fstat.ops_per_sec on:
I'm confused about how this would affect stat performance given that it
has absolutely nothing to do with stat. Is this stat()-ing pidfds at least?
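
For reference, "stat()-ing a pidfd" here means something like the following
minimal userspace sketch (assuming SYS_pidfd_open is defined by the installed
headers, i.e. a reasonably recent kernel and libc; error handling trimmed):

  #define _GNU_SOURCE
  #include <stdio.h>
  #include <unistd.h>
  #include <sys/stat.h>
  #include <sys/syscall.h>

  int main(void)
  {
          /* pidfd_open(2) may lack a libc wrapper, so call it via syscall(). */
          int pidfd = syscall(SYS_pidfd_open, getpid(), 0);
          if (pidfd < 0)
                  return 1;

          struct stat st;
          if (fstat(pidfd, &st) == 0)
                  /* st_ino/st_dev for a pidfd are served by pidfs. */
                  printf("pidfd inode: %llu\n", (unsigned long long)st.st_ino);

          close(pidfd);
          return 0;
  }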
> [ rest of the quoted report snipped; full data above ]
* Re: [linus:master] [pidfs] 16ecd47cb0: stress-ng.fstat.ops_per_sec 12.6% regression
@ 2025-01-28 13:38 ` Mateusz Guzik
From: Mateusz Guzik @ 2025-01-28 13:38 UTC (permalink / raw)
To: Christian Brauner
Cc: kernel test robot, oe-lkp, lkp, linux-kernel, linux-fsdevel
On Tue, Jan 28, 2025 at 11:51:49AM +0100, Christian Brauner wrote:
> On Mon, Jan 27, 2025 at 10:32:11PM +0800, kernel test robot wrote:
> >
> >
> > Hello,
> >
> > kernel test robot noticed a 12.6% regression of stress-ng.fstat.ops_per_sec on:
>
> I'm confused about how this would affect stat performance given that it
> has absolutely nothing to do with stat. Is this stat()-ing pidfds at least?
>
>
stress-ng is issuing the "claimed" syscall in some capacity, but it also
mixes in other stuff.
In this particular case the test continuously creates and destroys
threads, which in turn runs into the pid alloc/dealloc code you modified.
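Roughly what the stressor boils down to is a loop like the one below (a
minimal sketch, not taken from the stress-ng sources; the iteration count is
arbitrary and it needs -pthread to build):

  #include <pthread.h>

  static void *worker(void *arg)
  {
          (void)arg;
          return NULL;    /* thread exits immediately, pid freed on reap */
  }

  int main(void)
  {
          for (int i = 0; i < 1000000; i++) {
                  pthread_t t;

                  /* clone3() -> copy_process() -> alloc_pid() under the hood */
                  if (pthread_create(&t, NULL, worker, NULL))
                          break;
                  pthread_join(t, NULL);
          }
          return 0;
  }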
I verified with bpftrace that contention around pid alloc *is seen*.
one-liner: bpftrace -e 'kprobe:__pv_queued_spin_lock_slowpath { @[kstack()] = count(); }'
@[
__pv_queued_spin_lock_slowpath+5
_raw_spin_lock_irqsave+49
free_pid+44
release_task+609
do_exit+1717
__x64_sys_exit+27
x64_sys_call+4654
do_syscall_64+82
entry_SYSCALL_64_after_hwframe+118
]: 472350
@[
__pv_queued_spin_lock_slowpath+5
_raw_spin_lock_irq+42
alloc_pid+390
copy_process+6112
kernel_clone+155
__do_sys_clone3+194
do_syscall_64+82
entry_SYSCALL_64_after_hwframe+118
]: 568447
There is of course tons more of that.
So the new code is plausibly slower to alloc/dealloc and is lowering
throughput as a result.
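
If the rbtree is maintained straight from the pid alloc/free paths under a
shared lock (which is what the stacks above suggest; I have not gone through
the patch line by line here), the added cost has roughly this shape. The
snippet is illustrative kernel-style code only: the names, the key, and the
locking do not match the actual pidfs implementation, and module boilerplate
is omitted.

  #include <linux/rbtree.h>
  #include <linux/spinlock.h>
  #include <linux/types.h>

  static struct rb_root demo_root = RB_ROOT;
  static DEFINE_SPINLOCK(demo_lock);

  struct demo_pid {
          struct rb_node node;
          u64 ino;        /* stand-in for whatever key the real tree uses */
  };

  /* Alloc path: O(log n) insert, fully serialized on one lock. */
  static void demo_insert(struct demo_pid *new)
  {
          struct rb_node **link = &demo_root.rb_node, *parent = NULL;

          spin_lock(&demo_lock);
          while (*link) {
                  struct demo_pid *cur = rb_entry(*link, struct demo_pid, node);

                  parent = *link;
                  link = new->ino < cur->ino ? &(*link)->rb_left
                                             : &(*link)->rb_right;
          }
          rb_link_node(&new->node, parent, link);
          rb_insert_color(&new->node, &demo_root);
          spin_unlock(&demo_lock);
  }

  /* Free path: erase under the same global lock. */
  static void demo_erase(struct demo_pid *old)
  {
          spin_lock(&demo_lock);
          rb_erase(&old->node, &demo_root);
          spin_unlock(&demo_lock);
  }

Even if each insert/erase is cheap on its own, it lengthens the hold time of
a lock that every clone/exit takes, which fits the slowpath time in the
profiles.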
I'll note though that thread creation/destruction has pretty horrid
scalability as is.