From: kernel test robot <oliver.sang@intel.com>
To: Frank van der Linden <fvdl@google.com>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
Arnd Bergmann <arnd@arndb.de>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>, <akpm@linux-foundation.org>,
<muchun.song@linux.dev>, <yuzhao@google.com>,
<usamaarif642@gmail.com>, <joao.m.martins@oracle.com>,
<roman.gushchin@linux.dev>, <ziy@nvidia.com>, <david@redhat.com>,
"Frank van der Linden" <fvdl@google.com>, <oliver.sang@intel.com>
Subject: Re: [PATCH v5 02/27] mm, cma: support multiple contiguous ranges, if requested
Date: Wed, 5 Mar 2025 14:28:40 +0800
Message-ID: <202503051327.e87dce82-lkp@intel.com>
In-Reply-To: <20250228182928.2645936-3-fvdl@google.com>
Hello,
kernel test robot noticed a 15.1% improvement of netperf.Throughput_tps on:
commit: a957f140831b0d42e4fdbe83cf93997ef1b51bda ("[PATCH v5 02/27] mm, cma: support multiple contiguous ranges, if requested")
url: https://github.com/intel-lab-lkp/linux/commits/Frank-van-der-Linden/mm-cma-export-total-and-free-number-of-pages-for-CMA-areas/20250301-023339
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 276f98efb64a2c31c099465ace78d3054c662a0f
patch link: https://lore.kernel.org/all/20250228182928.2645936-3-fvdl@google.com/
patch subject: [PATCH v5 02/27] mm, cma: support multiple contiguous ranges, if requested
testcase: netperf
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
parameters:
ip: ipv4
runtime: 300s
nr_threads: 200%
cluster: cs-localhost
test: TCP_CRR
cpufreq_governor: performance
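
A reading aid for the parameters above, not part of the robot's output: a percentage value for nr_threads is assumed here to be relative to the number of hardware threads on the test machine, so 200% on this 128-thread box would correspond to 256 concurrent netperf instances. The sketch below checks that assumption against the ratio of netperf.Throughput_total_tps to netperf.Throughput_tps for the base commit, as quoted in the results table of this report. The variable names are illustrative only.

```python
# Assumption: nr_threads percentages are relative to the machine's hardware threads.
hardware_threads = 128
nr_threads_percent = 200
instances = hardware_threads * nr_threads_percent // 100

# Base-commit figures quoted from the results table in this report.
total_tps = 1711802        # netperf.Throughput_total_tps
per_instance_tps = 6686    # netperf.Throughput_tps
implied_instances = round(total_tps / per_instance_tps)

print(instances, implied_instances)
```

Both numbers agreeing suggests that the per-instance figure is the total divided across all instances, though the report itself does not state this.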
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250305/202503051327.e87dce82-lkp@intel.com
=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase:
cs-localhost/gcc-12/performance/ipv4/x86_64-rhel-9.4/200%/debian-12-x86_64-20240206.cgz/300s/lkp-icl-2sp2/TCP_CRR/netperf
commit:
cdc31e6532 ("mm/cma: export total and free number of pages for CMA areas")
a957f14083 ("mm, cma: support multiple contiguous ranges, if requested")
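base: cdc31e65328522c6    patched: a957f140831b0d42e4fdbe83cf9
Columns: base value [± %stddev] | %change | patched value [± %stddev] | metric
--------------------------------------------------------------------------------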
2.43 +0.5 2.90 ± 4% mpstat.cpu.all.usr%
4718850 +15.4% 5446771 vmstat.system.cs
62006 ± 43% -59.6% 25067 ±137% numa-meminfo.node0.Mapped
2884295 ± 41% -59.4% 1171696 ±135% numa-meminfo.node0.Unevictable
28159 ± 2% -17.7% 23164 ± 2% perf-c2c.HITM.local
5426 ± 3% +28.5% 6973 ± 8% perf-c2c.HITM.remote
33586 ± 2% -10.3% 30137 ± 3% perf-c2c.HITM.total
5642375 ± 2% +15.5% 6519596 sched_debug.cpu.nr_switches.avg
7473763 ± 4% +18.0% 8815709 ± 2% sched_debug.cpu.nr_switches.max
4352931 ± 3% +12.7% 4906391 ± 2% sched_debug.cpu.nr_switches.min
2485115 ± 6% +31.9% 3277456 ± 11% numa-numastat.node0.local_node
2526446 ± 6% +32.8% 3356120 ± 11% numa-numastat.node0.numa_hit
3522582 ± 10% +28.7% 4535065 ± 23% numa-numastat.node1.local_node
3613797 ± 10% +27.0% 4588978 ± 22% numa-numastat.node1.numa_hit
40617 +5.4% 42811 ± 5% proc-vmstat.nr_slab_reclaimable
6144430 ± 4% +29.4% 7948120 ± 16% proc-vmstat.numa_hit
6011884 ± 4% +30.0% 7815542 ± 16% proc-vmstat.numa_local
26402145 ± 2% +40.6% 37129548 ± 14% proc-vmstat.pgalloc_normal
25226079 +42.1% 35834032 ± 13% proc-vmstat.pgfree
15712 ± 43% -59.6% 6348 ±137% numa-vmstat.node0.nr_mapped
721073 ± 41% -59.4% 292924 ±135% numa-vmstat.node0.nr_unevictable
721073 ± 41% -59.4% 292924 ±135% numa-vmstat.node0.nr_zone_unevictable
2526848 ± 6% +32.8% 3355902 ± 11% numa-vmstat.node0.numa_hit
2485517 ± 6% +31.9% 3277238 ± 11% numa-vmstat.node0.numa_local
3614259 ± 10% +27.0% 4589442 ± 22% numa-vmstat.node1.numa_hit
3523043 ± 10% +28.7% 4535533 ± 23% numa-vmstat.node1.numa_local
1711802 +15.1% 1969470 netperf.ThroughputBoth_total_tps
6686 +15.1% 7693 netperf.ThroughputBoth_tps
1711802 +15.1% 1969470 netperf.Throughput_total_tps
6686 +15.1% 7693 netperf.Throughput_tps
4.052e+08 ± 5% +16.7% 4.728e+08 ± 4% netperf.time.involuntary_context_switches
535.88 +18.1% 633.12 netperf.time.user_time
3.175e+08 ± 3% +13.9% 3.615e+08 ± 3% netperf.time.voluntary_context_switches
5.135e+08 +15.1% 5.908e+08 netperf.workload
0.07 ± 8% -31.3% 0.05 ± 23% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.kmalloc_reserve.__alloc_skb.tcp_stream_alloc_skb
0.46 ±114% -71.4% 0.13 ± 34% perf-sched.sch_delay.max.ms.__cond_resched.lock_sock_nested.__inet_stream_connect.inet_stream_connect.__sys_connect
5.70 ± 90% +2752.3% 162.72 ±202% perf-sched.wait_and_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
33.94 ± 19% +50.3% 50.99 ± 18% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
30764 ± 22% -32.1% 20881 ± 22% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
7.03 ± 60% +11736.2% 832.16 ±150% perf-sched.wait_and_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
0.14 ± 8% -33.5% 0.09 ± 26% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.kmalloc_reserve.__alloc_skb.tcp_stream_alloc_skb
0.11 ± 8% -14.3% 0.10 ± 11% perf-sched.wait_time.avg.ms.__cond_resched.lock_sock_nested.inet_stream_connect.__sys_connect.__x64_sys_connect
33.61 ± 19% +50.4% 50.57 ± 18% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.69 ±109% -59.0% 0.28 ± 27% perf-sched.wait_time.max.ms.__cond_resched.sk_wait_data.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg
0.76 -39.9% 0.46 ± 12% perf-stat.i.MPKI
3.959e+10 +14.9% 4.55e+10 perf-stat.i.branch-instructions
0.92 -0.0 0.90 perf-stat.i.branch-miss-rate%
3.564e+08 +12.7% 4.017e+08 perf-stat.i.branch-misses
1.561e+08 -32.2% 1.058e+08 ± 12% perf-stat.i.cache-misses
6.91e+08 -33.8% 4.574e+08 ± 6% perf-stat.i.cache-references
4760614 +15.5% 5496803 perf-stat.i.context-switches
1.54 -13.5% 1.33 perf-stat.i.cpi
2048 +49.1% 3054 ± 9% perf-stat.i.cycles-between-cache-misses
2.084e+11 +14.9% 2.394e+11 perf-stat.i.instructions
0.65 +15.3% 0.75 perf-stat.i.ipc
37.20 +15.5% 42.97 perf-stat.i.metric.K/sec
0.75 -41.0% 0.44 ± 12% perf-stat.overall.MPKI
0.90 -0.0 0.88 perf-stat.overall.branch-miss-rate%
1.54 -13.6% 1.33 perf-stat.overall.cpi
2060 +48.5% 3060 ± 10% perf-stat.overall.cycles-between-cache-misses
0.65 +15.7% 0.75 perf-stat.overall.ipc
3.947e+10 +14.9% 4.536e+10 perf-stat.ps.branch-instructions
3.553e+08 +12.7% 4.005e+08 perf-stat.ps.branch-misses
1.557e+08 -32.2% 1.055e+08 ± 12% perf-stat.ps.cache-misses
6.889e+08 -33.8% 4.56e+08 ± 6% perf-stat.ps.cache-references
4746041 +15.5% 5479885 perf-stat.ps.context-switches
2.078e+11 +14.9% 2.387e+11 perf-stat.ps.instructions
6.363e+13 +14.9% 7.312e+13 perf-stat.total.instructions
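
The figures in the table can be cross-checked by hand. The sketch below is a minimal, unofficial check in Python: it recomputes a few %change values from the base and patched columns, and derives MPKI and IPC from other rows. The formulas are assumptions that happen to match the reported numbers (MPKI taken as cache misses per thousand instructions, IPC as the reciprocal of CPI); the robot's own scripts may compute them differently, and small discrepancies are expected because the table shows rounded values. Rows whose metric is itself a percentage (for example mpstat.cpu.all.usr%) appear to report an absolute difference in percentage points rather than a relative change.

```python
def pct_change(base, patched):
    """Relative change from base to patched, in percent."""
    return (patched - base) / base * 100.0

# Values quoted from the table above: (base, patched).
tps = pct_change(6686, 7693)                    # netperf.Throughput_tps
ctx = pct_change(4718850, 5446771)              # vmstat.system.cs
misses = pct_change(1.561e8, 1.058e8)           # perf-stat.i.cache-misses

# Assumed definition: MPKI = cache misses per 1000 instructions (perf-stat.ps rows).
mpki_base = 1.557e8 / 2.078e11 * 1000
mpki_patched = 1.055e8 / 2.387e11 * 1000

# Assumed definition: IPC = 1 / CPI (perf-stat.overall rows).
ipc_base = 1 / 1.54
ipc_patched = 1 / 1.33

# Percentage-point difference for a metric that is already a percentage.
usr_delta = 2.90 - 2.43                         # mpstat.cpu.all.usr%

print(f"tps {tps:+.1f}%  cs {ctx:+.1f}%  cache-misses {misses:+.1f}%")
print(f"MPKI {mpki_base:.2f} -> {mpki_patched:.2f}")
print(f"IPC  {ipc_base:.2f} -> {ipc_patched:.2f}")
print(f"usr% delta {usr_delta:+.1f} points")
```

Read this way, the throughput gain comes with fewer cache misses per instruction and a higher IPC, which is consistent with the perf-stat rows but does not by itself explain why a CMA change would affect a localhost TCP_CRR run.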
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki