From: Ankur Arora <ankur.a.arora@oracle.com>
To: Raghavendra K T <raghavendra.kt@amd.com>
Cc: Ankur Arora <ankur.a.arora@oracle.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
luto@kernel.org, bp@alien8.de, dave.hansen@linux.intel.com,
hpa@zytor.com, mingo@redhat.com, juri.lelli@redhat.com,
willy@infradead.org, mgorman@suse.de, peterz@infradead.org,
rostedt@goodmis.org, tglx@linutronix.de,
vincent.guittot@linaro.org, jon.grimm@amd.com, bharata@amd.com,
boris.ostrovsky@oracle.com, konrad.wilk@oracle.com
Subject: Re: [PATCH 0/9] x86/clear_huge_page: multi-page clearing
Date: Sat, 08 Apr 2023 15:46:56 -0700
Message-ID: <87ttxqf0v3.fsf@oracle.com>
In-Reply-To: <271b85ec-281e-d33b-5495-59eb2bc9fde4@amd.com>
Raghavendra K T <raghavendra.kt@amd.com> writes:
> On 4/3/2023 10:52 AM, Ankur Arora wrote:
>> This series introduces multi-page clearing for hugepages.
> *Milan*      mm/clear_huge_page   x86/clear_huge_page   change
>              (GB/s)               (GB/s)
> pg-sz=2MB    12.24                17.54                 +43.30%
> pg-sz=1GB    17.98                37.24                 +107.11%
>
>
> Hello Ankur,
>
> I was able to test your patches. To summarize, I am seeing a 2x-3x perf
> improvement for both the 2M and 1GB base hugepage sizes.
Great. Thanks Raghavendra.
> SUT: Genoa AMD EPYC
> Thread(s) per core: 2
> Core(s) per socket: 128
> Socket(s): 2
>
> NUMA:
> NUMA node(s): 2
> NUMA node0 CPU(s): 0-127,256-383
> NUMA node1 CPU(s): 128-255,384-511
>
> Test: Use mmap(MAP_HUGETLB) to demand-fault a 64GB region (NUMA node0), for
> both base-hugepage-size=2M and 1GB
>
> perf stat -r 10 -d -d numactl -m 0 -N 0 <test>
>
> time in seconds elapsed (average of 10 runs) (lower = better)
>
> Result:
> page-size   mm/clear_huge_page   x86/clear_huge_page
> 2M          5.4567               2.6774
> 1G          2.64452              1.011281
So translating into bandwidth, for Genoa we have:

  page-size   mm/clear_huge_page   x86/clear_huge_page
              (GB/s)               (GB/s)
  2M          11.74                23.97
  1G          24.24                63.36
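
As a rough cross-check of the conversion above, bandwidth can be estimated as region size divided by elapsed time. This is a minimal sketch, assuming the region is counted as exactly 64 GB; the names `REGION_GB`, `elapsed_s`, and `bandwidth_gbps` are illustrative and do not come from the series. The results land within about half a percent of the figures quoted above; the small gap presumably comes from rounding or from how the region size was counted.

```python
# Illustrative only: estimate clearing bandwidth from the elapsed times
# reported in the quoted perf stat runs.
REGION_GB = 64  # assumption: size of the demand-faulted region, taken as 64 GB

# Elapsed seconds (average of 10 runs), keyed by (page size, kernel).
elapsed_s = {
    ("2M", "mm/clear_huge_page"): 5.4567,
    ("2M", "x86/clear_huge_page"): 2.6774,
    ("1G", "mm/clear_huge_page"): 2.64452,
    ("1G", "x86/clear_huge_page"): 1.011281,
}


def bandwidth_gbps(seconds, region_gb=REGION_GB):
    """Return GB/s for clearing region_gb gigabytes in the given time."""
    return region_gb / seconds


bandwidth = {key: bandwidth_gbps(t) for key, t in elapsed_s.items()}

for (page_size, kernel), bw in bandwidth.items():
    print(f"{page_size:>3} {kernel:<20} {bw:6.2f} GB/s")
```

The same division also gives the speedup per page size, since the region size cancels out: it is simply the ratio of the two elapsed times.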
That's a pretty good bump over Milan:
> *Milan*      mm/clear_huge_page   x86/clear_huge_page
>              (GB/s)               (GB/s)
> pg-sz=2MB    12.24                17.54
> pg-sz=1GB    17.98                37.24
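
To put the comparison in numbers, the Genoa bandwidth can be divided by the Milan bandwidth for each configuration. This is a small sketch using only the GB/s figures from the two tables above; the dictionary names and the `genoa_over_milan` helper are hypothetical, introduced here for illustration.

```python
# Illustrative only: per-configuration ratio of Genoa to Milan bandwidth,
# using the GB/s figures from the two tables in this mail.
genoa_gbps = {
    ("2M", "mm/clear_huge_page"): 11.74,
    ("2M", "x86/clear_huge_page"): 23.97,
    ("1G", "mm/clear_huge_page"): 24.24,
    ("1G", "x86/clear_huge_page"): 63.36,
}
milan_gbps = {
    ("2M", "mm/clear_huge_page"): 12.24,
    ("2M", "x86/clear_huge_page"): 17.54,
    ("1G", "mm/clear_huge_page"): 17.98,
    ("1G", "x86/clear_huge_page"): 37.24,
}


def genoa_over_milan(key):
    """Return the Genoa/Milan bandwidth ratio for one (page size, kernel)."""
    return genoa_gbps[key] / milan_gbps[key]


ratios = {key: genoa_over_milan(key) for key in genoa_gbps}

for (page_size, kernel), ratio in ratios.items():
    print(f"{page_size:>3} {kernel:<20} {ratio:5.2f}x")
```

A ratio above 1 means Genoa is faster for that configuration. The two systems were measured in separate runs, so the ratios are only indicative.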
Btw, are these numbers with boost=1?
> Full perfstat info
>
> page size = 2M mm/clear_huge_page
>
> Performance counter stats for 'numactl -m 0 -N 0 map_hugetlb_2M' (10 runs):
>
>         5,434.71 msec task-clock                #    0.996 CPUs utilized  ( +-  0.55% )
>                8      context-switches          #    1.466 /sec  ( +-  4.66% )
>                0      cpu-migrations            #    0.000 /sec
>           32,918      page-faults               #    6.034 K/sec  ( +-  0.00% )
>   16,977,242,482      cycles                    #    3.112 GHz  ( +-  0.04% )  (35.70%)
>        1,961,724      stalled-cycles-frontend   #    0.01% frontend cycles idle  ( +-  1.09% )  (35.72%)
>       35,685,674      stalled-cycles-backend    #    0.21% backend cycles idle  ( +-  3.48% )  (35.74%)
>    1,038,327,182      instructions              #    0.06  insn per cycle
>                                                 #    0.04  stalled cycles per insn  ( +-  0.38% )  (35.75%)
>      221,409,216      branches                  #   40.584 M/sec  ( +-  0.36% )  (35.75%)
>          350,730      branch-misses             #    0.16% of all branches  ( +-  1.18% )  (35.75%)
>    2,520,888,779      L1-dcache-loads           #  462.077 M/sec  ( +-  0.03% )  (35.73%)
>    1,094,178,209      L1-dcache-load-misses     #   43.46% of all L1-dcache accesses  ( +-  0.02% )  (35.71%)
>       67,751,730      L1-icache-loads           #   12.419 M/sec  ( +-  0.11% )  (35.70%)
>          271,118      L1-icache-load-misses     #    0.40% of all L1-icache accesses  ( +-  2.55% )  (35.70%)
>          506,635      dTLB-loads                #   92.866 K/sec  ( +-  3.31% )  (35.70%)
>          237,385      dTLB-load-misses          #   43.64% of all dTLB cache accesses  ( +-  7.00% )  (35.69%)
>              268      iTLB-load-misses          # 6700.00% of all iTLB cache accesses  ( +- 13.86% )  (35.70%)
>
> 5.4567 +- 0.0300 seconds time elapsed ( +- 0.55% )
>
> page size = 2M x86/clear_huge_page
> Performance counter stats for 'numactl -m 0 -N 0 map_hugetlb_2M' (10 runs):
>
>         2,780.69 msec task-clock                #    1.039 CPUs utilized  ( +-  1.03% )
>                3      context-switches          #    1.121 /sec  ( +- 21.34% )
>                0      cpu-migrations            #    0.000 /sec
>           32,918      page-faults               #   12.301 K/sec  ( +-  0.00% )
>    8,143,619,771      cycles                    #    3.043 GHz  ( +-  0.25% )  (35.62%)
>        2,024,872      stalled-cycles-frontend   #    0.02% frontend cycles idle  ( +-320.93% )  (35.66%)
>      717,198,728      stalled-cycles-backend    #    8.82% backend cycles idle  ( +-  8.26% )  (35.69%)
>      606,549,334      instructions              #    0.07  insn per cycle
>                                                 #    1.39  stalled cycles per insn  ( +-  0.23% )  (35.73%)
>      108,856,550      branches                  #   40.677 M/sec  ( +-  0.24% )  (35.76%)
>          202,490      branch-misses             #    0.18% of all branches  ( +-  3.58% )  (35.78%)
>    2,348,818,806      L1-dcache-loads           #  877.701 M/sec  ( +-  0.03% )  (35.78%)
>    1,081,562,988      L1-dcache-load-misses     #   46.04% of all L1-dcache accesses  ( +-  0.01% )  (35.78%)
>  <not supported>      LLC-loads
>  <not supported>      LLC-load-misses
>       43,411,167      L1-icache-loads           #   16.222 M/sec  ( +-  0.19% )  (35.77%)
>          273,042      L1-icache-load-misses     #    0.64% of all L1-icache accesses  ( +-  4.94% )  (35.76%)
>          834,482      dTLB-loads                #  311.827 K/sec  ( +-  9.73% )  (35.72%)
>          437,343      dTLB-load-misses          #   65.86% of all dTLB cache accesses  ( +-  8.56% )  (35.68%)
>                0      iTLB-loads                #    0.000 /sec  (35.65%)
>              160      iTLB-load-misses          # 1777.78% of all iTLB cache accesses  ( +- 15.82% )  (35.62%)
>
> 2.6774 +- 0.0287 seconds time elapsed ( +- 1.07% )
>
> page size = 1G mm/clear_huge_page
> Performance counter stats for 'numactl -m 0 -N 0 map_hugetlb_1G' (10 runs):
>
>         2,625.24 msec task-clock                #    0.993 CPUs utilized  ( +-  0.23% )
>                4      context-switches          #    1.513 /sec  ( +-  4.49% )
>                1      cpu-migrations            #    0.378 /sec
>              214      page-faults               #   80.965 /sec  ( +-  0.13% )
>    8,178,624,349      cycles                    #    3.094 GHz  ( +-  0.23% )  (35.65%)
>        2,942,576      stalled-cycles-frontend   #    0.04% frontend cycles idle  ( +- 75.22% )  (35.69%)
>        7,117,425      stalled-cycles-backend    #    0.09% backend cycles idle  ( +-  3.79% )  (35.73%)
>      454,521,647      instructions              #    0.06  insn per cycle
>                                                 #    0.02  stalled cycles per insn  ( +-  0.10% )  (35.77%)
>      113,223,853      branches                  #   42.837 M/sec  ( +-  0.08% )  (35.80%)
>           84,766      branch-misses             #    0.07% of all branches  ( +-  5.37% )  (35.80%)
>    2,294,528,890      L1-dcache-loads           #  868.111 M/sec  ( +-  0.02% )  (35.81%)
>    1,075,907,551      L1-dcache-load-misses     #   46.88% of all L1-dcache accesses  ( +-  0.02% )  (35.78%)
>       26,167,323      L1-icache-loads           #    9.900 M/sec  ( +-  0.24% )  (35.74%)
>          139,675      L1-icache-load-misses     #    0.54% of all L1-icache accesses  ( +-  0.37% )  (35.70%)
>            3,459      dTLB-loads                #    1.309 K/sec  ( +- 12.75% )  (35.67%)
>              732      dTLB-load-misses          #   19.71% of all dTLB cache accesses  ( +- 26.61% )  (35.62%)
>               11      iTLB-load-misses          #  192.98% of all iTLB cache accesses  ( +-238.28% )  (35.62%)
>
> 2.64452 +- 0.00600 seconds time elapsed ( +- 0.23% )
>
>
> page size = 1G x86/clear_huge_page
> Performance counter stats for 'numactl -m 0 -N 0 map_hugetlb_1G' (10 runs):
>
>         1,009.09 msec task-clock                #    0.998 CPUs utilized  ( +-  0.06% )
>                2      context-switches          #    1.980 /sec  ( +- 23.63% )
>                1      cpu-migrations            #    0.990 /sec
>              214      page-faults               #  211.887 /sec  ( +-  0.16% )
>    3,154,980,463      cycles                    #    3.124 GHz  ( +-  0.06% )  (35.77%)
>          145,051      stalled-cycles-frontend   #    0.00% frontend cycles idle  ( +-  6.26% )  (35.78%)
>      730,087,143      stalled-cycles-backend    #   23.12% backend cycles idle  ( +-  9.75% )  (35.78%)
>       45,813,391      instructions              #    0.01  insn per cycle
>                                                 #   18.51  stalled cycles per insn  ( +-  1.00% )  (35.78%)
>        8,498,282      branches                  #    8.414 M/sec  ( +-  1.54% )  (35.78%)
>           63,351      branch-misses             #    0.74% of all branches  ( +-  6.70% )  (35.69%)
>       29,135,863      L1-dcache-loads           #   28.848 M/sec  ( +-  5.67% )  (35.68%)
>        8,537,280      L1-dcache-load-misses     #   28.66% of all L1-dcache accesses  ( +- 10.15% )  (35.68%)
>        1,040,087      L1-icache-loads           #    1.030 M/sec  ( +-  1.60% )  (35.68%)
>            9,147      L1-icache-load-misses     #    0.85% of all L1-icache accesses  ( +-  6.50% )  (35.67%)
>            1,084      dTLB-loads                #    1.073 K/sec  ( +- 12.05% )  (35.68%)
>              431      dTLB-load-misses          #   40.28% of all dTLB cache accesses  ( +- 43.46% )  (35.68%)
>               16      iTLB-load-misses          #    0.00% of all iTLB cache accesses  ( +- 40.54% )  (35.68%)
>
> 1.011281 +- 0.000624 seconds time elapsed ( +- 0.06% )
>
> Please feel free to add
>
> Tested-by: Raghavendra K T <raghavendra.kt@amd.com>
Thanks
Ankur
> Will come back with further observations on patch/performance if any