* [bytedance:5.15-velinux] [mm] bf31671edf: netperf.Throughput_Mbps 11.0% regression
@ 2025-08-25 7:05 kernel test robot
From: kernel test robot @ 2025-08-25 7:05 UTC (permalink / raw)
To: jason.zeng; +Cc: oe-lkp, lkp, oliver.sang
Hello,
kernel test robot noticed an 11.0% regression of netperf.Throughput_Mbps on:
commit: bf31671edffe7ea925baa93caff59a860f1ddfa8 ("mm: memcg: make stats flushing threshold per-memcg")
https://github.com/bytedance/kernel.git 5.15-velinux
testcase: netperf
config: x86_64-bytedance-5.15-velinux
compiler: gcc-12
test machine: 256 threads 2 sockets Intel(R) Xeon(R) 6767P CPU @ 2.4GHz (Granite Rapids) with 256G memory
parameters:
ip: ipv4
runtime: 300s
nr_threads: 200%
cluster: cs-localhost
send_size: 10K
test: SCTP_STREAM_MANY
cpufreq_governor: performance
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202508251443.3f803480-lkp@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250825/202508251443.3f803480-lkp@intel.com
=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/send_size/tbox_group/test/testcase:
cs-localhost/gcc-12/performance/ipv4/x86_64-bytedance-5.15-velinux/200%/debian-12-x86_64-20240206.cgz/300s/10K/lkp-gnr-2sp3/SCTP_STREAM_MANY/netperf
commit:
69642f5099 ("mm: memcg: move vmstats structs definition above flushing code")
bf31671edf ("mm: memcg: make stats flushing threshold per-memcg")
69642f5099423baf bf31671edffe7ea925baa93caff
---------------- ---------------------------
%stddev %change %stddev
\ | \
85772547 ± 4% -10.4% 76821561 cpuidle..usage
135.00 ± 9% -16.8% 112.33 ± 8% perf-c2c.HITM.remote
7.04 ± 81% -83.3% 1.18 ±177% perf-sched.wait_time.avg.ms.do_task_dead.do_exit.kthread.ret_from_fork
521963 ± 4% -11.2% 463279 vmstat.system.cs
309897 ± 3% -10.7% 276864 vmstat.system.in
1.65 ± 5% -0.2 1.42 ± 2% mpstat.cpu.all.soft%
5.69 ± 5% -0.7 4.96 mpstat.cpu.all.sys%
9.42 ± 4% -12.8% 8.21 ± 5% mpstat.max_utilization_pct
302723 ± 10% -19.0% 245225 ± 3% sched_debug.cfs_rq:/.min_vruntime.avg
266650 ± 14% -26.2% 196867 ± 18% sched_debug.cfs_rq:/.spread0.max
305935 ± 4% -11.2% 271568 sched_debug.cpu.nr_switches.avg
4.854e+08 ± 4% -11.0% 4.318e+08 proc-vmstat.numa_hit
4.851e+08 ± 4% -11.0% 4.315e+08 proc-vmstat.numa_local
2.726e+09 ± 4% -11.1% 2.424e+09 proc-vmstat.pgalloc_normal
2.726e+09 ± 4% -11.1% 2.424e+09 proc-vmstat.pgfree
4.9e+09 ± 4% -10.3% 4.394e+09 perf-stat.i.branch-instructions
19860290 ± 8% -11.5% 17582329 perf-stat.i.branch-misses
13095483 ± 4% -10.0% 11784812 ± 3% perf-stat.i.cache-misses
7.297e+08 ± 4% -10.4% 6.54e+08 perf-stat.i.cache-references
527531 ± 4% -11.2% 468461 perf-stat.i.context-switches
8.112e+10 ± 5% -11.7% 7.163e+10 perf-stat.i.cpu-cycles
2.417e+10 ± 4% -10.5% 2.164e+10 perf-stat.i.instructions
2.06 ± 4% -11.3% 1.83 perf-stat.i.metric.K/sec
202.69 ± 4% -11.0% 180.30 netperf.ThroughputBoth_Mbps
103778 ± 4% -11.0% 92315 netperf.ThroughputBoth_total_Mbps
202.69 ± 4% -11.0% 180.30 netperf.Throughput_Mbps
103778 ± 4% -11.0% 92315 netperf.Throughput_total_Mbps
23850 ± 2% -6.9% 22200 netperf.time.involuntary_context_switches
1145 ± 5% -12.0% 1008 netperf.time.percent_of_cpu_this_job_got
3443 ± 5% -12.0% 3031 netperf.time.system_time
11648832 ± 3% -9.8% 10509763 netperf.time.voluntary_context_switches
3.8e+08 ± 4% -11.0% 3.381e+08 netperf.workload
20.64 -0.7 19.94 perf-profile.calltrace.cycles-pp.sctp_packet_transmit.sctp_outq_flush.sctp_cmd_interpreter.sctp_do_sm.sctp_assoc_bh_rcv
17.64 -0.5 17.19 perf-profile.calltrace.cycles-pp.sctp_outq_flush.sctp_cmd_interpreter.sctp_do_sm.sctp_assoc_bh_rcv.sctp_backlog_rcv
1.98 ± 3% -0.4 1.57 ± 19% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.rmqueue.get_page_from_freelist.__alloc_pages.kmalloc_large_node
1.14 ± 22% -0.3 0.86 ± 8% perf-profile.calltrace.cycles-pp.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages.kmalloc_large_node
48.45 -0.2 48.21 perf-profile.calltrace.cycles-pp.sctp_sendmsg.____sys_sendmsg.___sys_sendmsg.__sys_sendmsg.do_syscall_64
49.16 -0.2 48.93 perf-profile.calltrace.cycles-pp.____sys_sendmsg.___sys_sendmsg.__sys_sendmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.92 ± 2% -0.2 2.71 ± 2% perf-profile.calltrace.cycles-pp.free_unref_page.skb_release_data.kfree_skb_reason.sctp_recvmsg.inet_recvmsg
2.84 ± 2% -0.2 2.63 ± 2% perf-profile.calltrace.cycles-pp.free_unref_page_commit.free_unref_page.skb_release_data.kfree_skb_reason.sctp_recvmsg
2.71 ± 2% -0.2 2.51 ± 2% perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page_commit.free_unref_page.skb_release_data.kfree_skb_reason
9.25 -0.2 9.06 perf-profile.calltrace.cycles-pp.sctp_datamsg_from_user.sctp_sendmsg_to_asoc.sctp_sendmsg.____sys_sendmsg.___sys_sendmsg
8.87 -0.2 8.69 perf-profile.calltrace.cycles-pp.sctp_packet_pack.sctp_packet_transmit.sctp_outq_flush.sctp_cmd_interpreter.sctp_do_sm
8.60 -0.2 8.42 perf-profile.calltrace.cycles-pp.memcpy_erms.sctp_packet_pack.sctp_packet_transmit.sctp_outq_flush.sctp_cmd_interpreter
4.86 -0.1 4.71 perf-profile.calltrace.cycles-pp.sctp_make_datafrag_empty.sctp_datamsg_from_user.sctp_sendmsg_to_asoc.sctp_sendmsg.____sys_sendmsg
4.75 -0.1 4.61 perf-profile.calltrace.cycles-pp._sctp_make_chunk.sctp_make_datafrag_empty.sctp_datamsg_from_user.sctp_sendmsg_to_asoc.sctp_sendmsg
4.19 -0.1 4.08 perf-profile.calltrace.cycles-pp.__alloc_skb._sctp_make_chunk.sctp_make_datafrag_empty.sctp_datamsg_from_user.sctp_sendmsg_to_asoc
3.61 -0.1 3.50 perf-profile.calltrace.cycles-pp.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb._sctp_make_chunk.sctp_make_datafrag_empty
3.62 -0.1 3.51 perf-profile.calltrace.cycles-pp.kmalloc_reserve.__alloc_skb._sctp_make_chunk.sctp_make_datafrag_empty.sctp_datamsg_from_user
3.58 -0.1 3.48 perf-profile.calltrace.cycles-pp.kmalloc_large_node.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb._sctp_make_chunk
0.64 -0.1 0.58 ± 3% perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.secondary_startup_64_no_verify
0.66 -0.1 0.60 ± 3% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.secondary_startup_64_no_verify
1.30 +0.1 1.40 perf-profile.calltrace.cycles-pp.sctp_ulpevent_free.sctp_recvmsg.inet_recvmsg.sock_recvmsg.____sys_recvmsg
1.56 +0.2 1.76 ± 5% perf-profile.calltrace.cycles-pp.__sk_mem_raise_allocated.__sk_mem_schedule.sctp_sendmsg_to_asoc.sctp_sendmsg.____sys_sendmsg
1.58 +0.2 1.77 ± 4% perf-profile.calltrace.cycles-pp.__sk_mem_schedule.sctp_sendmsg_to_asoc.sctp_sendmsg.____sys_sendmsg.___sys_sendmsg
0.58 ± 2% +0.2 0.83 ± 6% perf-profile.calltrace.cycles-pp.__sk_mem_reduce_allocated.sctp_wfree.skb_release_head_state.consume_skb.sctp_chunk_put
3.48 +0.3 3.76 ± 3% perf-profile.calltrace.cycles-pp.sctp_outq_sack.sctp_cmd_interpreter.sctp_do_sm.sctp_assoc_bh_rcv.sctp_backlog_rcv
1.64 +0.3 1.98 perf-profile.calltrace.cycles-pp.sctp_wfree.skb_release_head_state.consume_skb.sctp_chunk_put.sctp_outq_sack
1.74 +0.3 2.07 perf-profile.calltrace.cycles-pp.skb_release_head_state.consume_skb.sctp_chunk_put.sctp_outq_sack.sctp_cmd_interpreter
8.18 +0.5 8.65 perf-profile.calltrace.cycles-pp.kfree_skb_reason.sctp_recvmsg.inet_recvmsg.sock_recvmsg.____sys_recvmsg
0.88 ± 2% +0.5 1.41 ± 4% perf-profile.calltrace.cycles-pp.mem_cgroup_charge_skmem.__sk_mem_raise_allocated.__sk_mem_schedule.sctp_sendmsg_to_asoc.sctp_sendmsg
0.93 ± 2% +0.8 1.74 ± 5% perf-profile.calltrace.cycles-pp.__sk_mem_reduce_allocated.skb_release_head_state.kfree_skb_reason.sctp_recvmsg.inet_recvmsg
1.55 +0.8 2.38 ± 4% perf-profile.calltrace.cycles-pp.skb_release_head_state.kfree_skb_reason.sctp_recvmsg.inet_recvmsg.sock_recvmsg
0.00 +0.9 0.88 ± 5% perf-profile.calltrace.cycles-pp.__mod_memcg_state.mem_cgroup_charge_skmem.__sk_mem_raise_allocated.__sk_mem_schedule.sctp_sendmsg_to_asoc
0.00 +1.2 1.22 ± 5% perf-profile.calltrace.cycles-pp.__mod_memcg_state.mem_cgroup_uncharge_skmem.__sk_mem_reduce_allocated.skb_release_head_state.kfree_skb_reason
0.00 +1.3 1.28 ± 5% perf-profile.calltrace.cycles-pp.mem_cgroup_uncharge_skmem.__sk_mem_reduce_allocated.skb_release_head_state.kfree_skb_reason.sctp_recvmsg
30.64 -0.6 30.00 perf-profile.children.cycles-pp.sctp_packet_transmit
32.46 -0.6 31.87 perf-profile.children.cycles-pp.sctp_outq_flush
48.58 -0.2 48.34 perf-profile.children.cycles-pp.sctp_sendmsg
49.18 -0.2 48.94 perf-profile.children.cycles-pp.____sys_sendmsg
9.26 -0.2 9.07 perf-profile.children.cycles-pp.sctp_datamsg_from_user
9.00 -0.2 8.82 perf-profile.children.cycles-pp.memcpy_erms
9.53 -0.2 9.36 perf-profile.children.cycles-pp.sctp_packet_pack
5.01 -0.2 4.86 perf-profile.children.cycles-pp._sctp_make_chunk
4.87 -0.1 4.72 perf-profile.children.cycles-pp.sctp_make_datafrag_empty
0.52 ± 2% -0.1 0.46 ± 4% perf-profile.children.cycles-pp.drain_stock
0.51 ± 3% -0.1 0.45 ± 4% perf-profile.children.cycles-pp.page_counter_uncharge
0.66 ± 2% -0.1 0.60 ± 4% perf-profile.children.cycles-pp.refill_stock
0.66 -0.1 0.61 ± 3% perf-profile.children.cycles-pp.schedule_idle
0.45 -0.0 0.40 ± 4% perf-profile.children.cycles-pp.psi_group_change
0.57 ± 2% -0.0 0.53 perf-profile.children.cycles-pp.__free_one_page
0.35 ± 2% -0.0 0.31 ± 4% perf-profile.children.cycles-pp.psi_task_switch
0.71 ± 2% -0.0 0.68 perf-profile.children.cycles-pp.__skb_clone
0.09 ± 5% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.finish_task_switch
0.06 ± 11% +0.0 0.08 ± 6% perf-profile.children.cycles-pp.netif_rx
0.10 ± 5% +0.0 0.12 ± 6% perf-profile.children.cycles-pp.arch_local_irq_enable
0.82 +0.0 0.85 ± 2% perf-profile.children.cycles-pp.ipv4_dst_check
2.63 +0.1 2.69 perf-profile.children.cycles-pp.sctp_outq_flush_data
1.31 +0.1 1.41 perf-profile.children.cycles-pp.sctp_ulpevent_free
2.53 +0.3 2.87 ± 5% perf-profile.children.cycles-pp.__sk_mem_raise_allocated
2.55 +0.3 2.89 ± 5% perf-profile.children.cycles-pp.__sk_mem_schedule
1.76 +0.4 2.11 perf-profile.children.cycles-pp.sctp_wfree
8.19 +0.5 8.66 perf-profile.children.cycles-pp.kfree_skb_reason
1.63 +0.7 2.36 ± 5% perf-profile.children.cycles-pp.mem_cgroup_charge_skmem
1.72 +1.2 2.90 ± 4% perf-profile.children.cycles-pp.__sk_mem_reduce_allocated
3.62 +1.2 4.81 ± 2% perf-profile.children.cycles-pp.skb_release_head_state
0.75 +1.2 1.99 ± 5% perf-profile.children.cycles-pp.mem_cgroup_uncharge_skmem
1.30 +2.0 3.28 ± 5% perf-profile.children.cycles-pp.__mod_memcg_state
0.84 ± 2% -0.4 0.46 ± 5% perf-profile.self.cycles-pp.__sk_mem_raise_allocated
8.91 -0.2 8.73 perf-profile.self.cycles-pp.memcpy_erms
0.42 ± 3% -0.1 0.35 ± 4% perf-profile.self.cycles-pp.page_counter_uncharge
1.75 -0.1 1.69 perf-profile.self.cycles-pp.get_page_from_freelist
0.44 ± 2% -0.0 0.40 ± 3% perf-profile.self.cycles-pp.psi_group_change
0.16 ± 5% -0.0 0.13 ± 3% perf-profile.self.cycles-pp.cpuidle_idle_call
0.48 -0.0 0.45 perf-profile.self.cycles-pp.__free_one_page
0.46 -0.0 0.43 ± 3% perf-profile.self.cycles-pp.sctp_chunkify
0.23 ± 3% -0.0 0.21 ± 4% perf-profile.self.cycles-pp.sctp_inq_pop
0.81 +0.0 0.84 ± 2% perf-profile.self.cycles-pp.ipv4_dst_check
0.25 ± 3% +0.0 0.30 ± 4% perf-profile.self.cycles-pp.sctp_chunk_put
1.22 +2.0 3.20 ± 5% perf-profile.self.cycles-pp.__mod_memcg_state
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki