From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: "André Almeida" <andrealmeid@igalia.com>,
"Darren Hart" <dvhart@infradead.org>,
"Davidlohr Bueso" <dave@stgolabs.net>,
"Ingo Molnar" <mingo@redhat.com>,
"Juri Lelli" <juri.lelli@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Valentin Schneider" <vschneid@redhat.com>,
"Waiman Long" <longman@redhat.com>,
linux-kernel@vger.kernel.org,
"Nysal Jan K.A." <nysal@linux.ibm.com>
Subject: Re: [PATCH v10 00/21] futex: Add support task local hash maps, FUTEX2_NUMA and FUTEX2_MPOL
Date: Wed, 26 Mar 2025 10:31:53 +0100 [thread overview]
Message-ID: <20250326093153.Ib5b2p6M@linutronix.de> (raw)
In-Reply-To: <587d45c3-2098-4914-9dfc-275b5d0b9bb7@linux.ibm.com>
On 2025-03-26 00:34:23 [+0530], Shrikanth Hegde wrote:
> Hi Sebastian.
Hi Shrikanth,
> So, did some more bench-marking using the same perf futex hash.
> I see that perf creates N threads and binds each thread to a CPU and then
> calls futex_wait such that it never blocks. It always returns EWOULDBLOCK.
> only futex_hash is exercised.
It also does spin_lock() + unlock on the hash bucket. Without the
locking, you would have constant numbers.
> Numbers with different threads. (private futexes)
> threads baseline with series (ratio)
> 1 3386265 3266560 0.96
> 10 1972069 821565 0.41
> 40 1580497 277900 0.17
> 80 1555482 150450 0.096
>
>
> With Shared Futex: (-s option)
> Threads baseline with series (ratio)
> 80 590144 585067 0.99
The shared numbers are equal since the code path there is unchanged.
> After looking into code, and after some hacking, could get the
> performance back with below change. this is likely functionally not correct.
> the reason for below change is,
>
> 1. perf report showed significant time in futex_private_hash_put.
> so removed rcu usage for users. that brought some improvements.
> from 150k to 300k. Is there a better way to do this users protection?
This is likely from the atomic dec operation itself. Then there is also
the preemption counter operation. The inc should be also visible but
might be inlined into the hash operation.
This is _just_ the atomic inc/ dec that doubled the "throughput" but you
don't have anything from the regular path.
Anyway. To avoid the atomic part we would need to have a per-CPU counter
instead of a global one and a more expensive slow path for the resize
since you have to sum up all the per-CPU counters and so on. Not sure it
is worth it.
> 2. Since number of buckets would be less by default, this would cause hb
> collision. This was seen by queued_spin_lock_slowpath. Increased the hash
> bucket size what was before the series. That brought the numbers back to
> 1.5M. This could be achieved with prctl in perf/bench/futex-hash.c i guess.
Yes. The idea is to avoid a resize at runtime and setting to something
you know best. You can also use it now to disable the private hash and
stick with the global.
> Note: Just increasing the hash bucket size without point 1, didn't matter much.
Sebastian
next prev parent reply other threads:[~2025-03-26 9:31 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-12 15:16 [PATCH v10 00/21] futex: Add support task local hash maps, FUTEX2_NUMA and FUTEX2_MPOL Sebastian Andrzej Siewior
2025-03-12 15:16 ` [PATCH v10 01/21] rcuref: Provide rcuref_is_dead() Sebastian Andrzej Siewior
2025-03-13 4:23 ` Joel Fernandes
2025-03-13 7:55 ` Sebastian Andrzej Siewior
2025-03-14 10:36 ` Peter Zijlstra
2025-03-12 15:16 ` [PATCH v10 02/21] futex: Move futex_queue() into futex_wait_setup() Sebastian Andrzej Siewior
2025-03-12 15:16 ` [PATCH v10 03/21] futex: Pull futex_hash() out of futex_q_lock() Sebastian Andrzej Siewior
2025-03-12 15:16 ` [PATCH v10 04/21] futex: Create hb scopes Sebastian Andrzej Siewior
2025-03-12 15:16 ` [PATCH v10 05/21] futex: Create futex_hash() get/put class Sebastian Andrzej Siewior
2025-03-12 15:16 ` [PATCH v10 06/21] futex: Create helper function to initialize a hash slot Sebastian Andrzej Siewior
2025-03-12 15:16 ` [PATCH v10 07/21] futex: Add basic infrastructure for local task local hash Sebastian Andrzej Siewior
2025-03-12 15:16 ` [PATCH v10 08/21] futex: Hash only the address for private futexes Sebastian Andrzej Siewior
2025-03-12 15:16 ` [PATCH v10 09/21] futex: Allow automatic allocation of process wide futex hash Sebastian Andrzej Siewior
2025-03-12 15:16 ` [PATCH v10 10/21] futex: Decrease the waiter count before the unlock operation Sebastian Andrzej Siewior
2025-03-12 15:16 ` [PATCH v10 11/21] futex: Introduce futex_q_lockptr_lock() Sebastian Andrzej Siewior
2025-03-12 15:16 ` [PATCH v10 12/21] futex: Acquire a hash reference in futex_wait_multiple_setup() Sebastian Andrzej Siewior
2025-03-12 15:16 ` [PATCH v10 13/21] futex: Allow to re-allocate the private local hash Sebastian Andrzej Siewior
2025-03-12 15:16 ` [PATCH v10 14/21] futex: Resize local futex hash table based on number of threads Sebastian Andrzej Siewior
2025-03-12 15:16 ` [PATCH v10 15/21] futex: s/hb_p/fph/ Sebastian Andrzej Siewior
2025-03-14 12:36 ` Peter Zijlstra
2025-03-14 13:10 ` Sebastian Andrzej Siewior
2025-03-12 15:16 ` [PATCH v10 16/21] futex: Remove superfluous state Sebastian Andrzej Siewior
2025-03-12 15:16 ` [PATCH v10 17/21] futex: Untangle and naming Sebastian Andrzej Siewior
2025-03-12 15:16 ` [PATCH v10 18/21] futex: Rework SET_SLOTS Sebastian Andrzej Siewior
2025-03-26 15:37 ` Sebastian Andrzej Siewior
2025-03-12 15:16 ` [PATCH v10 19/21] mm: Add vmalloc_huge_node() Sebastian Andrzej Siewior
2025-03-12 22:02 ` Andrew Morton
2025-03-13 7:59 ` Sebastian Andrzej Siewior
2025-03-13 22:08 ` Andrew Morton
2025-03-14 9:59 ` Sebastian Andrzej Siewior
2025-03-14 10:34 ` Andrew Morton
2025-03-12 15:16 ` [PATCH v10 20/21] futex: Implement FUTEX2_NUMA Sebastian Andrzej Siewior
2025-03-25 19:52 ` Shrikanth Hegde
2025-03-25 22:52 ` Peter Zijlstra
2025-03-25 22:56 ` Peter Zijlstra
2025-03-26 12:57 ` Shrikanth Hegde
2025-03-26 13:37 ` Peter Zijlstra
2025-03-26 15:06 ` Shrikanth Hegde
2025-03-26 8:03 ` Sebastian Andrzej Siewior
2025-03-12 15:16 ` [PATCH v10 21/21] futex: Implement FUTEX2_MPOL Sebastian Andrzej Siewior
2025-03-12 15:18 ` [PATCH v10 00/21] futex: Add support task local hash maps, FUTEX2_NUMA and FUTEX2_MPOL Sebastian Andrzej Siewior
2025-03-14 10:42 ` Peter Zijlstra
2025-03-14 10:58 ` Peter Zijlstra
2025-03-14 11:28 ` Sebastian Andrzej Siewior
2025-03-14 11:41 ` Peter Zijlstra
2025-03-14 12:00 ` Sebastian Andrzej Siewior
2025-03-14 12:30 ` Peter Zijlstra
2025-03-14 13:30 ` Sebastian Andrzej Siewior
2025-03-14 14:18 ` Peter Zijlstra
2025-03-14 14:40 ` Paul E. McKenney
2025-03-18 13:24 ` Shrikanth Hegde
2025-03-18 16:12 ` Davidlohr Bueso
2025-03-25 19:04 ` Shrikanth Hegde
2025-03-26 9:31 ` Sebastian Andrzej Siewior [this message]
2025-03-26 12:54 ` Shrikanth Hegde
2025-03-26 14:01 ` Sebastian Andrzej Siewior
2025-03-26 8:49 ` Sebastian Andrzej Siewior
2025-04-07 16:15 ` Sebastian Andrzej Siewior
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250326093153.Ib5b2p6M@linutronix.de \
--to=bigeasy@linutronix.de \
--cc=andrealmeid@igalia.com \
--cc=dave@stgolabs.net \
--cc=dvhart@infradead.org \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=longman@redhat.com \
--cc=mingo@redhat.com \
--cc=nysal@linux.ibm.com \
--cc=peterz@infradead.org \
--cc=sshegde@linux.ibm.com \
--cc=tglx@linutronix.de \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox