From: kernel test robot <oliver.sang@intel.com>
To: Frank van der Linden <fvdl@google.com>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	Arnd Bergmann <arnd@arndb.de>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, <akpm@linux-foundation.org>,
	<muchun.song@linux.dev>, <yuzhao@google.com>,
	<usamaarif642@gmail.com>, <joao.m.martins@oracle.com>,
	<roman.gushchin@linux.dev>, <ziy@nvidia.com>, <david@redhat.com>,
	"Frank van der Linden" <fvdl@google.com>, <oliver.sang@intel.com>
Subject: Re: [PATCH v5 02/27] mm, cma: support multiple contiguous ranges, if requested
Date: Wed, 5 Mar 2025 14:28:40 +0800
Message-ID: <202503051327.e87dce82-lkp@intel.com>
In-Reply-To: <20250228182928.2645936-3-fvdl@google.com>



Hello,

kernel test robot noticed a 15.1% improvement in netperf.Throughput_tps on:


commit: a957f140831b0d42e4fdbe83cf93997ef1b51bda ("[PATCH v5 02/27] mm, cma: support multiple contiguous ranges, if requested")
url: https://github.com/intel-lab-lkp/linux/commits/Frank-van-der-Linden/mm-cma-export-total-and-free-number-of-pages-for-CMA-areas/20250301-023339
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 276f98efb64a2c31c099465ace78d3054c662a0f
patch link: https://lore.kernel.org/all/20250228182928.2645936-3-fvdl@google.com/
patch subject: [PATCH v5 02/27] mm, cma: support multiple contiguous ranges, if requested

testcase: netperf
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
parameters:

	ip: ipv4
	runtime: 300s
	nr_threads: 200%
	cluster: cs-localhost
	test: TCP_CRR
	cpufreq_governor: performance
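
A note on `nr_threads: 200%`: the value is a percentage rather than an absolute count. Assuming it is taken relative to the 128 hardware threads of the test machine (an assumption about the harness, not something this report states), the run would use 256 concurrent netperf instances. That reading is consistent with the throughput rows in the comparison table, where the total divided by 256 lands within 1 tps of the per-instance figure. A minimal sketch of the arithmetic, using a hypothetical helper name:

```python
def instances_from_percent(nr_cpus: int, nr_threads: str) -> int:
    """Interpret an 'N%' value as a percentage of nr_cpus (assumed semantics)."""
    percent = int(nr_threads.rstrip("%"))
    return nr_cpus * percent // 100


# 128 hardware threads and "200%" are taken from the report header.
instances = instances_from_percent(128, "200%")

# netperf.Throughput_total_tps from the comparison table, keyed by commit.
total_tps = {"cdc31e6532": 1711802, "a957f14083": 1969470}

# Per-instance throughput implied by the assumed instance count.
per_instance = {commit: total / instances for commit, total in total_tps.items()}

for commit, tps in per_instance.items():
    # The table reports 6686 and 7693 for netperf.Throughput_tps.
    print(f"{commit}: {tps:.1f} tps per instance")
```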






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250305/202503051327.e87dce82-lkp@intel.com

=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase:
  cs-localhost/gcc-12/performance/ipv4/x86_64-rhel-9.4/200%/debian-12-x86_64-20240206.cgz/300s/lkp-icl-2sp2/TCP_CRR/netperf

commit: 
  cdc31e6532 ("mm/cma: export total and free number of pages for CMA areas")
  a957f14083 ("mm, cma: support multiple contiguous ranges, if requested")

cdc31e65328522c6 a957f140831b0d42e4fdbe83cf9 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      2.43            +0.5        2.90 ±  4%  mpstat.cpu.all.usr%
   4718850           +15.4%    5446771        vmstat.system.cs
     62006 ± 43%     -59.6%      25067 ±137%  numa-meminfo.node0.Mapped
   2884295 ± 41%     -59.4%    1171696 ±135%  numa-meminfo.node0.Unevictable
     28159 ±  2%     -17.7%      23164 ±  2%  perf-c2c.HITM.local
      5426 ±  3%     +28.5%       6973 ±  8%  perf-c2c.HITM.remote
     33586 ±  2%     -10.3%      30137 ±  3%  perf-c2c.HITM.total
   5642375 ±  2%     +15.5%    6519596        sched_debug.cpu.nr_switches.avg
   7473763 ±  4%     +18.0%    8815709 ±  2%  sched_debug.cpu.nr_switches.max
   4352931 ±  3%     +12.7%    4906391 ±  2%  sched_debug.cpu.nr_switches.min
   2485115 ±  6%     +31.9%    3277456 ± 11%  numa-numastat.node0.local_node
   2526446 ±  6%     +32.8%    3356120 ± 11%  numa-numastat.node0.numa_hit
   3522582 ± 10%     +28.7%    4535065 ± 23%  numa-numastat.node1.local_node
   3613797 ± 10%     +27.0%    4588978 ± 22%  numa-numastat.node1.numa_hit
     40617            +5.4%      42811 ±  5%  proc-vmstat.nr_slab_reclaimable
   6144430 ±  4%     +29.4%    7948120 ± 16%  proc-vmstat.numa_hit
   6011884 ±  4%     +30.0%    7815542 ± 16%  proc-vmstat.numa_local
  26402145 ±  2%     +40.6%   37129548 ± 14%  proc-vmstat.pgalloc_normal
  25226079           +42.1%   35834032 ± 13%  proc-vmstat.pgfree
     15712 ± 43%     -59.6%       6348 ±137%  numa-vmstat.node0.nr_mapped
    721073 ± 41%     -59.4%     292924 ±135%  numa-vmstat.node0.nr_unevictable
    721073 ± 41%     -59.4%     292924 ±135%  numa-vmstat.node0.nr_zone_unevictable
   2526848 ±  6%     +32.8%    3355902 ± 11%  numa-vmstat.node0.numa_hit
   2485517 ±  6%     +31.9%    3277238 ± 11%  numa-vmstat.node0.numa_local
   3614259 ± 10%     +27.0%    4589442 ± 22%  numa-vmstat.node1.numa_hit
   3523043 ± 10%     +28.7%    4535533 ± 23%  numa-vmstat.node1.numa_local
   1711802           +15.1%    1969470        netperf.ThroughputBoth_total_tps
      6686           +15.1%       7693        netperf.ThroughputBoth_tps
   1711802           +15.1%    1969470        netperf.Throughput_total_tps
      6686           +15.1%       7693        netperf.Throughput_tps
 4.052e+08 ±  5%     +16.7%  4.728e+08 ±  4%  netperf.time.involuntary_context_switches
    535.88           +18.1%     633.12        netperf.time.user_time
 3.175e+08 ±  3%     +13.9%  3.615e+08 ±  3%  netperf.time.voluntary_context_switches
 5.135e+08           +15.1%  5.908e+08        netperf.workload
      0.07 ±  8%     -31.3%       0.05 ± 23%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.kmalloc_reserve.__alloc_skb.tcp_stream_alloc_skb
      0.46 ±114%     -71.4%       0.13 ± 34%  perf-sched.sch_delay.max.ms.__cond_resched.lock_sock_nested.__inet_stream_connect.inet_stream_connect.__sys_connect
      5.70 ± 90%   +2752.3%     162.72 ±202%  perf-sched.wait_and_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
     33.94 ± 19%     +50.3%      50.99 ± 18%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     30764 ± 22%     -32.1%      20881 ± 22%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      7.03 ± 60%  +11736.2%     832.16 ±150%  perf-sched.wait_and_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
      0.14 ±  8%     -33.5%       0.09 ± 26%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.kmalloc_reserve.__alloc_skb.tcp_stream_alloc_skb
      0.11 ±  8%     -14.3%       0.10 ± 11%  perf-sched.wait_time.avg.ms.__cond_resched.lock_sock_nested.inet_stream_connect.__sys_connect.__x64_sys_connect
     33.61 ± 19%     +50.4%      50.57 ± 18%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.69 ±109%     -59.0%       0.28 ± 27%  perf-sched.wait_time.max.ms.__cond_resched.sk_wait_data.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg
      0.76           -39.9%       0.46 ± 12%  perf-stat.i.MPKI
 3.959e+10           +14.9%   4.55e+10        perf-stat.i.branch-instructions
      0.92            -0.0        0.90        perf-stat.i.branch-miss-rate%
 3.564e+08           +12.7%  4.017e+08        perf-stat.i.branch-misses
 1.561e+08           -32.2%  1.058e+08 ± 12%  perf-stat.i.cache-misses
  6.91e+08           -33.8%  4.574e+08 ±  6%  perf-stat.i.cache-references
   4760614           +15.5%    5496803        perf-stat.i.context-switches
      1.54           -13.5%       1.33        perf-stat.i.cpi
      2048           +49.1%       3054 ±  9%  perf-stat.i.cycles-between-cache-misses
 2.084e+11           +14.9%  2.394e+11        perf-stat.i.instructions
      0.65           +15.3%       0.75        perf-stat.i.ipc
     37.20           +15.5%      42.97        perf-stat.i.metric.K/sec
      0.75           -41.0%       0.44 ± 12%  perf-stat.overall.MPKI
      0.90            -0.0        0.88        perf-stat.overall.branch-miss-rate%
      1.54           -13.6%       1.33        perf-stat.overall.cpi
      2060           +48.5%       3060 ± 10%  perf-stat.overall.cycles-between-cache-misses
      0.65           +15.7%       0.75        perf-stat.overall.ipc
 3.947e+10           +14.9%  4.536e+10        perf-stat.ps.branch-instructions
 3.553e+08           +12.7%  4.005e+08        perf-stat.ps.branch-misses
 1.557e+08           -32.2%  1.055e+08 ± 12%  perf-stat.ps.cache-misses
 6.889e+08           -33.8%   4.56e+08 ±  6%  perf-stat.ps.cache-references
   4746041           +15.5%    5479885        perf-stat.ps.context-switches
 2.078e+11           +14.9%  2.387e+11        perf-stat.ps.instructions
 6.363e+13           +14.9%  7.312e+13        perf-stat.total.instructions
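
For readers checking the table by hand: the %change column appears to be the relative change of the second commit's mean against the first, and several perf-stat rows can be derived from one another (ipc as the reciprocal of cpi, MPKI as cache misses per thousand instructions). The sketch below recomputes a few rows from the values printed above. The function name and the interpretation of MPKI are assumptions made for illustration, not definitions taken from the test harness.

```python
def pct_change(base: float, patched: float) -> float:
    """Relative change of patched against base, in percent."""
    return (patched - base) / base * 100.0


# (base, patched) pairs copied from the table above.
rows = {
    "netperf.Throughput_tps": (6686, 7693),          # table: +15.1%
    "vmstat.system.cs": (4718850, 5446771),          # table: +15.4%
    "perf-stat.ps.cache-misses": (1.557e8, 1.055e8), # table: -32.2%
}
changes = {name: round(pct_change(b, p), 1) for name, (b, p) in rows.items()}

# ipc recomputed from perf-stat.overall.cpi (1.54 -> 1.33).
ipc_base = round(1 / 1.54, 2)
ipc_patched = round(1 / 1.33, 2)

# MPKI recomputed from perf-stat.ps.cache-misses and perf-stat.ps.instructions,
# assuming MPKI means cache misses per 1000 instructions.
mpki_base = round(1.557e8 / 2.078e11 * 1000, 2)
mpki_patched = round(1.055e8 / 2.387e11 * 1000, 2)

for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
print(f"ipc:  {ipc_base} -> {ipc_patched}")
print(f"MPKI: {mpki_base} -> {mpki_patched}")
```

The recomputed ipc and MPKI values can be compared against the perf-stat.overall.ipc and perf-stat.overall.MPKI rows. Small mismatches are expected because the table prints rounded means.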




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

