From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: linux-kernel@vger.kernel.org
Cc: "André Almeida" <andrealmeid@igalia.com>,
"Darren Hart" <dvhart@infradead.org>,
"Davidlohr Bueso" <dave@stgolabs.net>,
"Ingo Molnar" <mingo@redhat.com>,
"Juri Lelli" <juri.lelli@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Valentin Schneider" <vschneid@redhat.com>,
"Waiman Long" <longman@redhat.com>
Subject: Re: [RFC v2 PATCH 0/4] futex: Add support task local hash maps.
Date: Thu, 31 Oct 2024 16:56:40 +0100
Message-ID: <20241031155640.Fhtm3uFD@linutronix.de>
In-Reply-To: <20241028121921.1264150-1-bigeasy@linutronix.de>
On 2024-10-28 13:13:54 [+0100], To linux-kernel@vger.kernel.org wrote:
> Need to do
> more testing.
So there is "perf bench futex hash". On a 256-CPU NUMA box, running
  perf bench futex hash -t 240 -m -s -b $hb
with $hb = 2 … 131072 (I moved the allocation to kvmalloc()), I get the
following (averaged over three runs):
buckets        ops/sec   increase vs. previous bucket count
      2        9158.33
      4       21665.66   + ~136%
      8       44686.66   + ~106%
     16       84144.33   + ~88%
     32      139998.33   + ~66%
     64      279957.00   + ~99%
    128      509533.00   + ~82%
    256     1019846.00   + ~100%
    512     1634940.00   + ~60%
   1024     1834859.33   + ~12%
 global     1868129.33   (global hash, 65536 buckets)
   2048     1912071.33   + ~4%
   4096     1918686.66   + ~0%
   8192     1922285.66   + ~0%
  16384     1923017.00   + ~0%
  32768     1923319.00   + ~0%
  65536     1932906.00   + ~0%
 131072     2042571.33   + ~5%
Doubling the hash size roughly doubles ops/sec up to 256 slots. Beyond
2048 slots the increase is within noise (except for the last entry,
131072 slots, at ~5%).
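For reference, the averages and the increase column can be recomputed from the per-run figures in the raw numbers. This is a minimal Python sketch, not part of the perf tooling; the data is copied from the raw output, and printing the increase with one decimal (instead of the truncated "~" values) is an arbitrary choice.

```python
"""Recompute the bucket sweep table from the raw per-run results (sketch)."""
from statistics import mean

# ops/sec of the three 10 s runs per hash size ("hash slots: N" in the raw output)
RAW = {
    2: (9219, 9185, 9071),
    4: (16911, 24145, 23941),
    8: (45376, 44587, 44097),
    16: (84547, 84672, 83214),
    32: (163342, 127630, 129023),
    64: (279627, 279572, 280672),
    128: (508759, 511253, 508587),
    256: (1023552, 1034426, 1001560),
    512: (1636187, 1607427, 1661206),
    1024: (1833474, 1835817, 1835287),
    2048: (1915836, 1907866, 1912512),
    4096: (1916947, 1918102, 1921011),
    8192: (1916001, 1923156, 1927700),
    16384: (1928497, 1916906, 1923648),
    32768: (1920425, 1923449, 1926083),
    65536: (1927007, 1935182, 1936529),
    131072: (2033664, 2060081, 2033969),
}


def sweep_table(raw):
    """Return (buckets, average ops/sec, % increase over the previous size or None)."""
    rows = []
    prev = None
    for buckets in sorted(raw):
        avg = mean(raw[buckets])
        inc = None if prev is None else (avg / prev - 1) * 100
        rows.append((buckets, avg, inc))
        prev = avg
    return rows


if __name__ == "__main__":
    for buckets, avg, inc in sweep_table(RAW):
        inc_s = "" if inc is None else f"+{inc:5.1f}%"
        print(f"{buckets:7d} {avg:12.2f} {inc_s}")
```

Each increase is relative to the previous bucket count, so the global hash reference line is not part of the chain.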
Pinning the benchmark to the CPUs of one NUMA node and running the same
test with only 110 threads (averaged over five runs):
            CPUs              ops/sec global   ops/sec local   local vs. global
node 0      0-63,128-191           2278572.2       2534827.4         + ~11%
node 1      64-127,192-255         2229838.6       2437498.8         + ~9%
node 0+1    0-255                  2542602.4       2535749.8           ~0%
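Likewise, a short Python sketch for the NUMA table: it averages the five runs per placement and adds the local vs. global ratio. It assumes, as the matching averages imply, that the runs printed with "hash slots: -65536" are the global hash and those with "hash slots: 65536" the task local hash; the node-to-CPU mapping comes from the taskset output in the raw numbers.

```python
"""Average the five NUMA runs per placement and compare local vs. global (sketch)."""
from statistics import mean

# 110 threads, five 10 s runs each, copied from the raw numbers.
# Assumption: "global" = runs shown as "hash slots: -65536",
#             "local"  = runs shown as "hash slots: 65536".
RUNS = {
    "node 0": {  # CPUs 0-63,128-191
        "global": (2206612, 2321819, 2238386, 2325869, 2300175),
        "local": (2530561, 2573315, 2517479, 2554631, 2498151),
    },
    "node 1": {  # CPUs 64-127,192-255
        "global": (2180419, 2258612, 2245819, 2231469, 2232874),
        "local": (2469636, 2432942, 2411433, 2438380, 2435103),
    },
    "node 0+1": {  # CPUs 0-255
        "global": (2561817, 2539522, 2532349, 2539481, 2539843),
        "local": (2540858, 2550342, 2522785, 2528686, 2536078),
    },
}


def summarize(runs):
    """Return {placement: (avg global, avg local, local vs. global in %)}."""
    result = {}
    for placement, cfg in runs.items():
        glob = mean(cfg["global"])
        loc = mean(cfg["local"])
        result[placement] = (glob, loc, (loc / glob - 1) * 100)
    return result


if __name__ == "__main__":
    for placement, (glob, loc, delta) in summarize(RUNS).items():
        print(f"{placement:9s} {glob:11.1f} {loc:11.1f} {delta:+6.1f}%")
```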
<--->
RAW numbers:
futex hash table entries: 65536 (order: 10, 4194304 bytes, vmalloc hugepage)
Run summary [PID 4541]: 240 threads, each operating on 1024 [private] futexes for 10 secs.
Averaged 1883542 operations/sec (+- 0,28%), total secs = 10
Averaged 1864680 operations/sec (+- 0,31%), total secs = 10
Averaged 1856166 operations/sec (+- 0,32%), total secs = 10
1868129.3333333333
====
Run summary [PID 6247]: 240 threads, hash slots: 2 each operating on 1024 [private] futexes for 10 secs.
Averaged 9219 operations/sec (+- 0,19%), total secs = 10
Averaged 9185 operations/sec (+- 0,18%), total secs = 10
Averaged 9071 operations/sec (+- 0,20%), total secs = 10
9158.333333333334
Run summary [PID 6970]: 240 threads, hash slots: 4 each operating on 1024 [private] futexes for 10 secs.
Averaged 16911 operations/sec (+- 0,29%), total secs = 10
Averaged 24145 operations/sec (+- 0,17%), total secs = 10
Averaged 23941 operations/sec (+- 0,17%), total secs = 10
21665.666666666668
Run summary [PID 7693]: 240 threads, hash slots: 8 each operating on 1024 [private] futexes for 10 secs.
Averaged 45376 operations/sec (+- 0,25%), total secs = 10
Averaged 44587 operations/sec (+- 0,17%), total secs = 10
Averaged 44097 operations/sec (+- 0,26%), total secs = 10
44686.666666666664
Run summary [PID 8416]: 240 threads, hash slots: 16 each operating on 1024 [private] futexes for 10 secs.
Averaged 84547 operations/sec (+- 0,25%), total secs = 10
Averaged 84672 operations/sec (+- 0,18%), total secs = 10
Averaged 83214 operations/sec (+- 0,26%), total secs = 10
84144.33333333333
Run summary [PID 9139]: 240 threads, hash slots: 32 each operating on 1024 [private] futexes for 10 secs.
Averaged 163342 operations/sec (+- 0,55%), total secs = 10
Averaged 127630 operations/sec (+- 0,28%), total secs = 10
Averaged 129023 operations/sec (+- 0,27%), total secs = 10
139998.33333333334
Run summary [PID 9862]: 240 threads, hash slots: 64 each operating on 1024 [private] futexes for 10 secs.
Averaged 279627 operations/sec (+- 0,29%), total secs = 10
Averaged 279572 operations/sec (+- 0,21%), total secs = 10
Averaged 280672 operations/sec (+- 0,26%), total secs = 10
279957.0
Run summary [PID 10585]: 240 threads, hash slots: 128 each operating on 1024 [private] futexes for 10 secs.
Averaged 508759 operations/sec (+- 0,21%), total secs = 10
Averaged 511253 operations/sec (+- 0,22%), total secs = 10
Averaged 508587 operations/sec (+- 0,26%), total secs = 10
509533.0
Run summary [PID 11308]: 240 threads, hash slots: 256 each operating on 1024 [private] futexes for 10 secs.
Averaged 1023552 operations/sec (+- 0,10%), total secs = 10
Averaged 1034426 operations/sec (+- 0,11%), total secs = 10
Averaged 1001560 operations/sec (+- 0,10%), total secs = 10
1019846.0
Run summary [PID 12031]: 240 threads, hash slots: 512 each operating on 1024 [private] futexes for 10 secs.
Averaged 1636187 operations/sec (+- 0,22%), total secs = 10
Averaged 1607427 operations/sec (+- 0,23%), total secs = 10
Averaged 1661206 operations/sec (+- 0,24%), total secs = 10
1634940.0
Run summary [PID 12756]: 240 threads, hash slots: 1024 each operating on 1024 [private] futexes for 10 secs.
Averaged 1833474 operations/sec (+- 0,24%), total secs = 10
Averaged 1835817 operations/sec (+- 0,24%), total secs = 10
Averaged 1835287 operations/sec (+- 0,25%), total secs = 10
1834859.3333333333
Run summary [PID 13479]: 240 threads, hash slots: 2048 each operating on 1024 [private] futexes for 10 secs.
Averaged 1915836 operations/sec (+- 0,29%), total secs = 10
Averaged 1907866 operations/sec (+- 0,28%), total secs = 10
Averaged 1912512 operations/sec (+- 0,29%), total secs = 10
1912071.3333333333
Run summary [PID 14202]: 240 threads, hash slots: 4096 each operating on 1024 [private] futexes for 10 secs.
Averaged 1916947 operations/sec (+- 0,27%), total secs = 10
Averaged 1918102 operations/sec (+- 0,28%), total secs = 10
Averaged 1921011 operations/sec (+- 0,29%), total secs = 10
1918686.6666666667
Run summary [PID 14925]: 240 threads, hash slots: 8192 each operating on 1024 [private] futexes for 10 secs.
Averaged 1916001 operations/sec (+- 0,27%), total secs = 10
Averaged 1923156 operations/sec (+- 0,27%), total secs = 10
Averaged 1927700 operations/sec (+- 0,27%), total secs = 10
1922285.6666666667
Run summary [PID 15648]: 240 threads, hash slots: 16384 each operating on 1024 [private] futexes for 10 secs.
Averaged 1928497 operations/sec (+- 0,28%), total secs = 10
Averaged 1916906 operations/sec (+- 0,27%), total secs = 10
Averaged 1923648 operations/sec (+- 0,26%), total secs = 10
1923017.0
Run summary [PID 16371]: 240 threads, hash slots: 32768 each operating on 1024 [private] futexes for 10 secs.
Averaged 1920425 operations/sec (+- 0,27%), total secs = 10
Averaged 1923449 operations/sec (+- 0,27%), total secs = 10
Averaged 1926083 operations/sec (+- 0,29%), total secs = 10
1923319.0
Run summary [PID 17094]: 240 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 1927007 operations/sec (+- 0,28%), total secs = 10
Averaged 1935182 operations/sec (+- 0,28%), total secs = 10
Averaged 1936529 operations/sec (+- 0,28%), total secs = 10
1932906.0
Run summary [PID 17817]: 240 threads, hash slots: 131072 each operating on 1024 [private] futexes for 10 secs.
Averaged 2033664 operations/sec (+- 0,32%), total secs = 10
Averaged 2060081 operations/sec (+- 0,33%), total secs = 10
Averaged 2033969 operations/sec (+- 0,32%), total secs = 10
2042571.3333333333
----
bigeasy@z3:~$ taskset -pc $$; ./run-numa.sh
pid 7679's current affinity list: 64-127,192-255
====
# Running 'futex/hash' benchmark:
Run summary [PID 23094]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2180419 operations/sec (+- 0,77%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 23205]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2258612 operations/sec (+- 0,87%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 23317]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2245819 operations/sec (+- 0,80%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 23428]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2231469 operations/sec (+- 0,81%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 23539]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2232874 operations/sec (+- 0,78%), total secs = 10
====
# Running 'futex/hash' benchmark:
Run summary [PID 23650]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2469636 operations/sec (+- 0,92%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 23761]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2432942 operations/sec (+- 0,91%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 23872]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2411433 operations/sec (+- 0,90%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 23983]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2438380 operations/sec (+- 0,94%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 24094]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2435103 operations/sec (+- 0,94%), total secs = 10
====
bigeasy@z3:~$ taskset -pc $$; ./run-numa.sh
pid 9731's current affinity list: 0-63,128-191
====
# Running 'futex/hash' benchmark:
Run summary [PID 24207]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2206612 operations/sec (+- 0,75%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 24318]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2321819 operations/sec (+- 0,85%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 24429]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2238386 operations/sec (+- 0,77%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 24541]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2325869 operations/sec (+- 0,85%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 24652]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2300175 operations/sec (+- 0,82%), total secs = 10
====
# Running 'futex/hash' benchmark:
Run summary [PID 24763]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2530561 operations/sec (+- 0,96%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 24874]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2573315 operations/sec (+- 1,03%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 24985]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2517479 operations/sec (+- 0,99%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 25096]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2554631 operations/sec (+- 1,01%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 25207]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2498151 operations/sec (+- 0,94%), total secs = 10
====
bigeasy@z3:~$ taskset -pc $$; ./run-numa.sh
pid 10975's current affinity list: 0-255
====
# Running 'futex/hash' benchmark:
Run summary [PID 25324]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2561817 operations/sec (+- 0,14%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 25435]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2539522 operations/sec (+- 0,11%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 25546]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2532349 operations/sec (+- 0,11%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 25657]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2539481 operations/sec (+- 0,11%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 25768]: 110 threads, hash slots: -65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2539843 operations/sec (+- 0,13%), total secs = 10
====
# Running 'futex/hash' benchmark:
Run summary [PID 25879]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2540858 operations/sec (+- 0,50%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 25990]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2550342 operations/sec (+- 0,48%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 26101]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2522785 operations/sec (+- 0,48%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 26212]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2528686 operations/sec (+- 0,49%), total secs = 10
# Running 'futex/hash' benchmark:
Run summary [PID 26323]: 110 threads, hash slots: 65536 each operating on 1024 [private] futexes for 10 secs.
Averaged 2536078 operations/sec (+- 0,48%), total secs = 10
====
Sebastian