* [PATCH for-7.1-fixes 1/2] rhashtable: add no_sync_grow option
@ 2026-04-17 0:24 Tejun Heo
2026-04-17 0:24 ` [PATCH for-7.1-fixes 2/2] sched_ext: mark scx_sched_hash and dsq_hash no_sync_grow Tejun Heo
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Tejun Heo @ 2026-04-17 0:24 UTC (permalink / raw)
To: Herbert Xu, Thomas Graf, David Vernet, Andrea Righi, Changwoo Min
Cc: Emil Tsalapatis, linux-crypto, sched-ext, linux-kernel

The sync grow path on insert calls get_random_u32() and kvmalloc(), both of
which take regular spinlocks and are unsafe under raw_spinlock_t. Add an
opt-in flag that skips the >100% grow check; inserts always succeed by
appending to the chain, and the deferred worker still grows at >75% load.

sched_ext already uses rhashtable under raw_spinlock_t and is technically
broken, though hard to hit in practice because the tables stay small.

Cc: stable@vger.kernel.org # prerequisite for the next sched_ext fix
Signed-off-by: Tejun Heo <tj@kernel.org>
---
Hello,

The follow-up sched_ext patch is a fix targeting sched_ext/for-7.1-fixes
which I'd like to send Linus's way sooner than later. Would it be okay
to route both patches through sched_ext/for-7.1-fixes? If you'd prefer
to route the rhashtable change differently, that works too. Please let
me know, thanks.

Based on linus/master (3cd8b194bf34).
include/linux/rhashtable-types.h | 2 ++
include/linux/rhashtable.h | 5 ++++-
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/linux/rhashtable-types.h b/include/linux/rhashtable-types.h
index 015c8298bebc..555eea6077f8 100644
--- a/include/linux/rhashtable-types.h
+++ b/include/linux/rhashtable-types.h
@@ -50,6 +50,7 @@ typedef int (*rht_obj_cmpfn_t)(struct rhashtable_compare_arg *arg,
* @max_size: Maximum size while expanding
* @min_size: Minimum size while shrinking
* @automatic_shrinking: Enable automatic shrinking of tables
+ * @no_sync_grow: Skip sync grow on insert (allows inserts under raw_spinlock_t)
* @hashfn: Hash function (default: jhash2 if !(key_len % 4), or jhash)
* @obj_hashfn: Function to hash object
* @obj_cmpfn: Function to compare key with object
@@ -62,6 +63,7 @@ struct rhashtable_params {
unsigned int max_size;
u16 min_size;
bool automatic_shrinking;
+ bool no_sync_grow;
rht_hashfn_t hashfn;
rht_obj_hashfn_t obj_hashfn;
rht_obj_cmpfn_t obj_cmpfn;
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 0480509a6339..a977a597d288 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -197,11 +197,14 @@ static inline bool rht_shrink_below_30(const struct rhashtable *ht,
* rht_grow_above_100 - returns true if nelems > table-size
* @ht: hash table
* @tbl: current table
+ *
+ * Returns false if the caller opted out of synchronous grow.
*/
static inline bool rht_grow_above_100(const struct rhashtable *ht,
const struct bucket_table *tbl)
{
- return atomic_read(&ht->nelems) > tbl->size &&
+ return !ht->p.no_sync_grow &&
+ atomic_read(&ht->nelems) > tbl->size &&
(!ht->p.max_size || tbl->size < ht->p.max_size);
}
--
2.53.0
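[Editor's note: to make the gated check above concrete, here is a minimal
userspace sketch of how the two insert-path growth thresholds interact once
no_sync_grow exists. The struct and function names are simplified stand-ins
for the kernel's types, not the actual rhashtable code; only the threshold
arithmetic mirrors rht_grow_above_75()/rht_grow_above_100().]

```c
/*
 * Sketch (not kernel code): insert-path growth checks with no_sync_grow.
 * The >75% check schedules the deferred worker and is never gated; the
 * >100% check forces a synchronous grow and is what no_sync_grow skips.
 */
#include <stdbool.h>
#include <assert.h>

struct params { unsigned int max_size; bool no_sync_grow; };
struct table  { unsigned int size; unsigned int nelems; struct params p; };

/* >75% load: schedule the deferred worker to grow (always allowed). */
static bool grow_above_75(const struct table *t)
{
	return t->nelems > t->size / 4 * 3 &&
	       (!t->p.max_size || t->size < t->p.max_size);
}

/* >100% load: take the synchronous slow path -- unless opted out. */
static bool grow_above_100(const struct table *t)
{
	return !t->p.no_sync_grow &&
	       t->nelems > t->size &&
	       (!t->p.max_size || t->size < t->p.max_size);
}
```

With no_sync_grow set, a table at >100% load still answers true for the
75% check, so the async worker keeps resizing; only the synchronous path
is suppressed.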
^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH for-7.1-fixes 2/2] sched_ext: mark scx_sched_hash and dsq_hash no_sync_grow
From: Tejun Heo @ 2026-04-17  0:24 UTC (permalink / raw)
To: Herbert Xu, Thomas Graf, David Vernet, Andrea Righi, Changwoo Min
Cc: Emil Tsalapatis, linux-crypto, sched-ext, linux-kernel

Both are inserted/removed under raw_spinlock_t (scx_sched_lock and
dsq->lock respectively). rhashtable's sync grow path is unsafe under
raw_spinlock_t. Set no_sync_grow so inserts only ever take the bucket
bit-spin lock; the deferred worker continues to grow the table.

Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Fixes: 25037af712eb ("sched_ext: Add rhashtable lookup for sub-schedulers")
Cc: stable@vger.kernel.org # v6.11+
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/sched/ext.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 012ca8bd70fb..3f1467fde075 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -32,6 +32,7 @@ static const struct rhashtable_params scx_sched_hash_params = {
 	.key_len	= sizeof_field(struct scx_sched, ops.sub_cgroup_id),
 	.key_offset	= offsetof(struct scx_sched, ops.sub_cgroup_id),
 	.head_offset	= offsetof(struct scx_sched, hash_node),
+	.no_sync_grow	= true,		/* inserted under scx_sched_lock */
 };

 static struct rhashtable scx_sched_hash;
@@ -122,6 +123,7 @@ static const struct rhashtable_params dsq_hash_params = {
 	.key_len	= sizeof_field(struct scx_dispatch_q, id),
 	.key_offset	= offsetof(struct scx_dispatch_q, id),
 	.head_offset	= offsetof(struct scx_dispatch_q, hash_node),
+	.no_sync_grow	= true,		/* removed under dsq->lock */
 };

 static LLIST_HEAD(dsqs_to_free);
--
2.53.0
* Re: [PATCH for-7.1-fixes 1/2] rhashtable: add no_sync_grow option
From: Herbert Xu @ 2026-04-17  0:43 UTC (permalink / raw)
To: Tejun Heo
Cc: Thomas Graf, David Vernet, Andrea Righi, Changwoo Min,
	Emil Tsalapatis, linux-crypto, sched-ext, linux-kernel

On Thu, Apr 16, 2026 at 02:24:48PM -1000, Tejun Heo wrote:
> The sync grow path on insert calls get_random_u32() and kvmalloc(), both of

Where does the sync grow path call kvmalloc?  I think it's supposed
to use GFP_ATOMIC kmalloc only.  Only the normal growth path does
the kvmalloc for a linear hash table.  The emergency path is supposed
to do page-by-page allocations to minimise failures.

As to get_random_u32, we can probably avoid doing it for emergency
growing and continue to reuse the existing seed.

Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [PATCH for-7.1-fixes 1/2] rhashtable: add no_sync_grow option
From: Tejun Heo @ 2026-04-17  0:53 UTC (permalink / raw)
To: Herbert Xu
Cc: Thomas Graf, David Vernet, Andrea Righi, Changwoo Min,
	Emil Tsalapatis, linux-crypto, sched-ext, linux-kernel

Hello,

On Fri, Apr 17, 2026 at 08:43:30AM +0800, Herbert Xu wrote:
> On Thu, Apr 16, 2026 at 02:24:48PM -1000, Tejun Heo wrote:
> > The sync grow path on insert calls get_random_u32() and kvmalloc(), both of
>
> Where does the sync grow path call kvmalloc? I think it's supposed

Oops, that's a mistake. I meant GFP_ATOMIC kmalloc allocation. kmalloc uses
regular spin_lock so can't be called under raw_spin_lock. There's the new
kmalloc_nolock() but that means even smaller reserve size, so higher chance
of failing. I'm not sure it can even accommodate larger allocations.

Another aspect is that for some use cases, it's more problematic to fail
insertion than delaying hash table resize (e.g. that can lead to fork
failures on a thrashing system).

> to use GFP_ATOMIC kmalloc only. Only the normal growth path does
> the kvmalloc for a linear hash table. The emergency path is supposed
> to do page-by-page allocations to minimise failures.
>
> As to get_random_u32, we can probably avoid doing it for emergency
> growing and continue to reuse the existing seed.

Oh, that's great.

Thanks.

--
tejun
* Re: [PATCH for-7.1-fixes 1/2] rhashtable: add no_sync_grow option
From: Herbert Xu @ 2026-04-17  1:11 UTC (permalink / raw)
To: Tejun Heo
Cc: Thomas Graf, David Vernet, Andrea Righi, Changwoo Min,
	Emil Tsalapatis, linux-crypto, sched-ext, linux-kernel

On Thu, Apr 16, 2026 at 02:53:41PM -1000, Tejun Heo wrote:
>
> Oops, that's a mistake. I meant GFP_ATOMIC kmalloc allocation. kmalloc uses
> regular spin_lock so can't be called under raw_spin_lock. There's the new
> kmalloc_nolock() but that means even smaller reserve size, so higher chance
> of failing. I'm not sure it can even accommodate larger allocations.

We should at least try to grow even if it fails.

> Another aspect is that for some use cases, it's more problematic to fail
> insertion than delaying hash table resize (e.g. that can lead to fork
> failures on a thrashing system).

rhashtable is meant to grow way before you reach 100% occupancy.
So the fact that you even reached this point means that something
has gone terribly wrong.

Is this failure reproducible?

I had a look at kernel/sched/ext.c and all the calls to rhashtable
insertion come from thread context. So why does it even use a
raw spinlock?

Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [PATCH for-7.1-fixes 1/2] rhashtable: add no_sync_grow option
From: Tejun Heo @ 2026-04-17  7:38 UTC (permalink / raw)
To: Herbert Xu
Cc: Thomas Graf, David Vernet, Andrea Righi, Changwoo Min,
	Emil Tsalapatis, linux-crypto, sched-ext, linux-kernel

Hello,

On Fri, Apr 17, 2026 at 09:11:12AM +0800, Herbert Xu wrote:
> On Thu, Apr 16, 2026 at 02:53:41PM -1000, Tejun Heo wrote:
> >
> > Oops, that's a mistake. I meant GFP_ATOMIC kmalloc allocation. kmalloc uses
> > regular spin_lock so can't be called under raw_spin_lock. There's the new
> > kmalloc_nolock() but that means even smaller reserve size, so higher chance
> > of failing. I'm not sure it can even accommodate larger allocations.
>
> We should at least try to grow even if it fails.
>
> > Another aspect is that for some use cases, it's more problematic to fail
> > insertion than delaying hash table resize (e.g. that can lead to fork
> > failures on a thrashing system).
>
> rhashtable is meant to grow way before you reach 100% occupancy.
> So the fact that you even reached this point means that something
> has gone terribly wrong.
>
> Is this failure reproducible?
>
> I had a look at kernel/sched/ext.c and all the calls to rhashtable
> insertion come from thread context. So why does it even use a
> raw spinlock?

Yeah, the DSQ conversion is incorrect. Sorry about that.

In the current master branch, there's another rhashtable - scx_sched_hash -
which is inserted under scx_sched_lock which is a raw spin lock. This is a
bit difficult to trigger because each insertion involves loading a
subscheduler which usually takes quite some time and the whole operation is
serialized.

In a devel branch, I'm adding a unique thread ID to task rhashtable -
scx_tid_hash:

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git

which is initialized and populated when the root scheduler loads. As all
existing tasks are inserted during init, it ends up expanding the hashtable
rapidly and it's not difficult to make it hit the synchronous growth path.

- Build kernel and boot. Build the example schedulers -
  `make -C tools/sched_ext`

- Create a lot of dormant threads:

  stress-ng --pthread 1 --pthread-max 10000 --timeout 60s

- Run the qmap scheduler which enables tid hashing:

  tools/sched_ext/build/bin/scx_qmap

and the following lockdep triggers:

[   34.344355] sched_ext: BPF scheduler "qmap" enabled
[   35.999783] sched_ext: BPF scheduler "qmap" disabled (unregistered from user space)
[   92.996704]
[   92.996947] =============================
[   92.997462] [ BUG: Invalid wait context ]
[   92.997974] 7.0.0-work-08607-g4e2e9e22ddbe-dirty #423 Not tainted
[   92.998759] -----------------------------
[   92.999285] scx_enable_help/1907 is trying to lock:
[   92.999903] ffff888237aa9450 (batched_entropy_u32.lock){..-.}-{3:3}, at: get_random_u32+0x40/0x270
[   93.001047] other info that might help us debug this:
[   93.001716] context-{5:5}
[   93.002057] 6 locks held by scx_enable_help/1907:
[   93.002677]  #0: ffffffff83a90d48 (scx_enable_mutex){+.+.}-{4:4}, at: scx_root_enable_workfn+0x2b/0xad0
[   93.003872]  #1: ffffffff83a90e40 (scx_fork_rwsem){++++}-{0:0}, at: scx_root_enable_workfn+0x407/0xad0
[   93.005053]  #2: ffffffff83a90f80 (scx_cgroup_ops_rwsem){++++}-{0:0}, at: scx_cgroup_lock+0x11/0x20
[   93.006380]  #3: ffffffff83c3de28 (cgroup_mutex){+.+.}-{4:4}, at: scx_root_enable_workfn+0x436/0xad0
[   93.007535]  #4: ffffffff83a90e88 (scx_tasks_lock){....}-{2:2}, at: scx_root_enable_workfn+0x511/0xad0
[   93.008757]  #5: ffffffff83b77c98 (rcu_read_lock){....}-{1:3}, at: rhashtable_insert_slow+0x65/0x9e0
[   93.009909] stack backtrace:
[   93.010298] CPU: 2 UID: 0 PID: 1907 Comm: scx_enable_help Not tainted 7.0.0-work-08607-g4e2e9e22ddbe-dirty #423 PREEMPT(full)
[   93.010300] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS unknown 02/02/2022
[   93.010301] Sched_ext: qmap (enabling)
[   93.010301] Call Trace:
[   93.010304]  <TASK>
[   93.010305]  dump_stack_lvl+0x50/0x70
[   93.010307]  __lock_acquire+0x9e8/0x2760
[   93.010314]  lock_acquire+0xb2/0x240
[   93.010317]  get_random_u32+0x58/0x270
[   93.010321]  bucket_table_alloc+0xa6/0x190
[   93.010322]  rhashtable_insert_slow+0x8ac/0x9e0
[   93.010325]  scx_root_enable_workfn+0x893/0xad0
[   93.010328]  kthread_worker_fn+0x108/0x300
[   93.010330]  kthread+0xfa/0x120
[   93.010332]  ret_from_fork+0x175/0x320
[   93.010334]  ret_from_fork_asm+0x1a/0x30
[   93.010336]  </TASK>

Now, I can move the insertion out of the raw scx_tasks_lock; however, that
complicates init path as now it has to account for possible races against
task exit path.

Also, taking a step back, if rhashtable allows usage under raw spin locks,
isn't this broken regardless of how easy or difficult it may be to reproduce
the problem? Practically speaking, the scx_sched_hash one is unlikely to
trigger in real world; however, it is still theoretically possible and I'm
pretty positive that one would be able to create a repro case with the right
interference workload. It'd be contrived for sure but should be possible.

Thanks.

--
tejun
* Re: [PATCH for-7.1-fixes 1/2] rhashtable: add no_sync_grow option
From: Herbert Xu @ 2026-04-17  7:51 UTC (permalink / raw)
To: Tejun Heo
Cc: Thomas Graf, David Vernet, Andrea Righi, Changwoo Min,
	Emil Tsalapatis, linux-crypto, sched-ext, linux-kernel,
	Florian Westphal, netdev

On Thu, Apr 16, 2026 at 09:38:52PM -1000, Tejun Heo wrote:
>
> Also, taking a step back, if rhashtable allows usage under raw spin locks,
> isn't this broken regardless of how easy or difficult it may be to reproduce
> the problem? Practically speaking, the scx_sched_hash one is unlikely to
> trigger in real world; however, it is still theoretically possible and I'm
> pretty positive that one would be able to create a repro case with the right
> interference workload. It'd be contrived for sure but should be possible.

rhashtable originated in networking where it tries very hard to
stop the hash table from ever degenerating into a linked list.

If your use-case is not as adversarial as that, and you're happy
for the hash table to degenerate into a linked-list in the worst
case, then yes it's absolutely fine to not grow the table (or
try to grow it and fail with kmalloc_nolock).

It's just that we haven't had any users like this until now and
the feature that you want got removed because of that.

I'm more than happy to bring it back (commit 5f8ddeab10ce).

Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [PATCH for-7.1-fixes 1/2] rhashtable: add no_sync_grow option
From: Tejun Heo @ 2026-04-17 16:25 UTC (permalink / raw)
To: Herbert Xu
Cc: Thomas Graf, David Vernet, Andrea Righi, Changwoo Min,
	Emil Tsalapatis, linux-crypto, sched-ext, linux-kernel,
	Florian Westphal, netdev

Hello,

On Fri, Apr 17, 2026 at 03:51:20PM +0800, Herbert Xu wrote:
> rhashtable originated in networking where it tries very hard to
> stop the hash table from ever degenerating into a linked list.

I see.

> If your use-case is not as adversarial as that, and you're happy
> for the hash table to degenerate into a linked-list in the worst
> case, then yes it's absolutely fine to not grow the table (or
> try to grow it and fail with kmalloc_nolock).

My use case is a bit different. I want a resizable hashtable which can be
used under raw spinlock and doesn't fail unnecessarily. My only adversary is
memory pressure and operation failures can be harmful.

ie. If the system is under severe memory pressure, hashtable becoming
temporarily slower is not a big problem as long as it restores reasonable
operation once the system recovers. However, if the insertion operation
fails under e.g. sudden network rx burst that drains atomic reserve, that
can lead to fatal failure - e.g. forks failing out of blue on a busy but
mostly okay system. I think this pretty much requires all hashtable growths
to be asynchronous.

> It's just that we haven't had any users like this until now and
> the feature that you want got removed because of that.
>
> I'm more than happy to bring it back (commit 5f8ddeab10ce).

That'd be great but looking at the commit, I'm not sure it reliably avoids
allocation in the synchronous path.

Thanks.

--
tejun
* Re: [PATCH for-7.1-fixes 1/2] rhashtable: add no_sync_grow option
From: Herbert Xu @ 2026-04-18  0:44 UTC (permalink / raw)
To: Tejun Heo
Cc: Thomas Graf, David Vernet, Andrea Righi, Changwoo Min,
	Emil Tsalapatis, linux-crypto, sched-ext, linux-kernel,
	Florian Westphal, netdev

On Fri, Apr 17, 2026 at 06:25:22AM -1000, Tejun Heo wrote:
>
> That'd be great but looking at the commit, I'm not sure it reliably avoids
> allocation in the synchronous path.

If insecure_elasticity is set it should skip the slow path
altogether and just do the insertion unconditionally. So
there will be no kmallocs at all.

Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [PATCH for-7.1-fixes 1/2] rhashtable: add no_sync_grow option
From: Tejun Heo @ 2026-04-18  0:52 UTC (permalink / raw)
To: Herbert Xu
Cc: Thomas Graf, David Vernet, Andrea Righi, Changwoo Min,
	Emil Tsalapatis, linux-crypto, sched-ext, linux-kernel,
	Florian Westphal, netdev

Hello,

On Sat, Apr 18, 2026 at 08:44:33AM +0800, Herbert Xu wrote:
> On Fri, Apr 17, 2026 at 06:25:22AM -1000, Tejun Heo wrote:
> >
> > That'd be great but looking at the commit, I'm not sure it reliably avoids
> > allocation in the synchronous path.
>
> If insecure_elasticity is set it should skip the slow path
> altogether and just do the insertion unconditionally. So
> there will be no kmallocs at all.

I see. Thanks, that should work. How should we go about reverting the
removal?

Thanks.

--
tejun
* Re: [PATCH for-7.1-fixes 1/2] rhashtable: add no_sync_grow option
From: Herbert Xu @ 2026-04-18  0:53 UTC (permalink / raw)
To: Tejun Heo
Cc: Thomas Graf, David Vernet, Andrea Righi, Changwoo Min,
	Emil Tsalapatis, linux-crypto, sched-ext, linux-kernel,
	Florian Westphal, netdev

On Fri, Apr 17, 2026 at 02:52:57PM -1000, Tejun Heo wrote:
>
> I see. Thanks, that should work. How should we go about reverting the
> removal?

I'll work on that today and then you can include it in your
two-patch series.

Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* [PATCH] rhashtable: Restore insecure_elasticity toggle
From: Herbert Xu @ 2026-04-18  1:38 UTC (permalink / raw)
To: Tejun Heo
Cc: Thomas Graf, David Vernet, Andrea Righi, Changwoo Min,
	Emil Tsalapatis, linux-crypto, sched-ext, linux-kernel,
	Florian Westphal, netdev, NeilBrown

Some users of rhashtable cannot handle insertion failures, and are
happy to accept the consequences of a hash table having very long
chains.

Restore the insecure_elasticity toggle for these users. In addition
to disabling the chain length checks, this also removes the emergency
resize that would otherwise occur when the hash table occupancy hits
100% (an async resize is still scheduled at 75%).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/include/linux/rhashtable-types.h b/include/linux/rhashtable-types.h
index 015c8298bebc..72082428d6c6 100644
--- a/include/linux/rhashtable-types.h
+++ b/include/linux/rhashtable-types.h
@@ -49,6 +49,7 @@ typedef int (*rht_obj_cmpfn_t)(struct rhashtable_compare_arg *arg,
  * @head_offset: Offset of rhash_head in struct to be hashed
  * @max_size: Maximum size while expanding
  * @min_size: Minimum size while shrinking
+ * @insecure_elasticity: Set to true to disable chain length checks
  * @automatic_shrinking: Enable automatic shrinking of tables
  * @hashfn: Hash function (default: jhash2 if !(key_len % 4), or jhash)
  * @obj_hashfn: Function to hash object
@@ -61,6 +62,7 @@ struct rhashtable_params {
 	u16 head_offset;
 	unsigned int max_size;
 	u16 min_size;
+	bool insecure_elasticity;
 	bool automatic_shrinking;
 	rht_hashfn_t hashfn;
 	rht_obj_hashfn_t obj_hashfn;
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 0480509a6339..c793849d3f61 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -821,14 +821,15 @@ static __always_inline void *__rhashtable_insert_fast(
 		goto out;
 	}

-	if (elasticity <= 0)
+	if (elasticity <= 0 && !params->insecure_elasticity)
 		goto slow_path;

 	data = ERR_PTR(-E2BIG);
 	if (unlikely(rht_grow_above_max(ht, tbl)))
 		goto out_unlock;

-	if (unlikely(rht_grow_above_100(ht, tbl)))
+	if (unlikely(rht_grow_above_100(ht, tbl)) &&
+	    !params->insecure_elasticity)
 		goto slow_path;

 	/* Inserting at head of list makes unlocking free. */
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 6074ed5f66f3..b60d55e5b19b 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -538,7 +538,7 @@ static void *rhashtable_lookup_one(struct rhashtable *ht,
 		return NULL;
 	}

-	if (elasticity <= 0)
+	if (elasticity <= 0 && !ht->p->insecure_elasticity)
 		return ERR_PTR(-EAGAIN);

 	return ERR_PTR(-ENOENT);
@@ -568,7 +568,8 @@ static struct bucket_table *rhashtable_insert_one(
 	if (unlikely(rht_grow_above_max(ht, tbl)))
 		return ERR_PTR(-E2BIG);

-	if (unlikely(rht_grow_above_100(ht, tbl)))
+	if (unlikely(rht_grow_above_100(ht, tbl)) &&
+	    !ht->p->insecure_elasticity)
 		return ERR_PTR(-EAGAIN);

 	head = rht_ptr(bkt, tbl, hash);
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* [v2 PATCH] rhashtable: Restore insecure_elasticity toggle
From: Herbert Xu @ 2026-04-18  1:41 UTC (permalink / raw)
To: Tejun Heo
Cc: Thomas Graf, David Vernet, Andrea Righi, Changwoo Min,
	Emil Tsalapatis, linux-crypto, sched-ext, linux-kernel,
	Florian Westphal, netdev, NeilBrown

This one actually compiles.

---8<---
Some users of rhashtable cannot handle insertion failures, and are
happy to accept the consequences of a hash table having very long
chains.

Restore the insecure_elasticity toggle for these users. In addition
to disabling the chain length checks, this also removes the emergency
resize that would otherwise occur when the hash table occupancy hits
100% (an async resize is still scheduled at 75%).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/include/linux/rhashtable-types.h b/include/linux/rhashtable-types.h
index 015c8298bebc..72082428d6c6 100644
--- a/include/linux/rhashtable-types.h
+++ b/include/linux/rhashtable-types.h
@@ -49,6 +49,7 @@ typedef int (*rht_obj_cmpfn_t)(struct rhashtable_compare_arg *arg,
  * @head_offset: Offset of rhash_head in struct to be hashed
  * @max_size: Maximum size while expanding
  * @min_size: Minimum size while shrinking
+ * @insecure_elasticity: Set to true to disable chain length checks
  * @automatic_shrinking: Enable automatic shrinking of tables
  * @hashfn: Hash function (default: jhash2 if !(key_len % 4), or jhash)
  * @obj_hashfn: Function to hash object
@@ -61,6 +62,7 @@ struct rhashtable_params {
 	u16 head_offset;
 	unsigned int max_size;
 	u16 min_size;
+	bool insecure_elasticity;
 	bool automatic_shrinking;
 	rht_hashfn_t hashfn;
 	rht_obj_hashfn_t obj_hashfn;
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 0480509a6339..7def3f0f556b 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -821,14 +821,15 @@ static __always_inline void *__rhashtable_insert_fast(
 		goto out;
 	}

-	if (elasticity <= 0)
+	if (elasticity <= 0 && !params.insecure_elasticity)
 		goto slow_path;

 	data = ERR_PTR(-E2BIG);
 	if (unlikely(rht_grow_above_max(ht, tbl)))
 		goto out_unlock;

-	if (unlikely(rht_grow_above_100(ht, tbl)))
+	if (unlikely(rht_grow_above_100(ht, tbl)) &&
+	    !params.insecure_elasticity)
 		goto slow_path;

 	/* Inserting at head of list makes unlocking free. */
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 6074ed5f66f3..fb2b7bc137ba 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -538,7 +538,7 @@ static void *rhashtable_lookup_one(struct rhashtable *ht,
 		return NULL;
 	}

-	if (elasticity <= 0)
+	if (elasticity <= 0 && !ht->p.insecure_elasticity)
 		return ERR_PTR(-EAGAIN);

 	return ERR_PTR(-ENOENT);
@@ -568,7 +568,8 @@ static struct bucket_table *rhashtable_insert_one(
 	if (unlikely(rht_grow_above_max(ht, tbl)))
 		return ERR_PTR(-E2BIG);

-	if (unlikely(rht_grow_above_100(ht, tbl)))
+	if (unlikely(rht_grow_above_100(ht, tbl)) &&
+	    !ht->p.insecure_elasticity)
 		return ERR_PTR(-EAGAIN);

 	head = rht_ptr(bkt, tbl, hash);
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
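[Editor's note: to make the restored toggle's effect concrete, here is a
minimal userspace sketch of the two insert-fast-path bail-out conditions
that insecure_elasticity disables. The names and the errno-style constant
are illustrative stand-ins, not the kernel's actual code.]

```c
/*
 * Sketch (not kernel code): the insert fast-path decision with
 * insecure_elasticity. A non-zero return stands in for taking the
 * slow path (chain too long, or table above 100% occupancy).
 */
#include <stdbool.h>
#include <assert.h>

#define EAGAIN_SKETCH 11	/* stand-in for -EAGAIN */

struct insert_state {
	int  elasticity;		/* chain-length budget left */
	bool above_100;			/* rht_grow_above_100() result */
	bool insecure_elasticity;	/* the restored toggle */
};

static int insert_fast_decision(const struct insert_state *s)
{
	if (s->elasticity <= 0 && !s->insecure_elasticity)
		return -EAGAIN_SKETCH;	/* chain check: take slow path */
	if (s->above_100 && !s->insecure_elasticity)
		return -EAGAIN_SKETCH;	/* occupancy check: sync resize */
	return 0;			/* insert proceeds at chain head */
}
```

With the toggle set, neither an exhausted chain budget nor >100% occupancy
can fail the insert; the table may temporarily degenerate toward a linked
list until the deferred worker resizes it.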
* Re: [PATCH for-7.1-fixes 1/2] rhashtable: add no_sync_grow option
From: Herbert Xu @ 2026-04-17  1:22 UTC (permalink / raw)
To: Tejun Heo
Cc: Thomas Graf, David Vernet, Andrea Righi, Changwoo Min,
	Emil Tsalapatis, linux-crypto, sched-ext, linux-kernel,
	Florian Westphal, netdev

On Thu, Apr 16, 2026 at 02:24:48PM -1000, Tejun Heo wrote:
>
> The follow-up sched_ext patch is a fix targeting sched_ext/for-7.1-fixes
> which I'd like to send Linus's way sooner than later. Would it be okay
> to route both patches through sched_ext/for-7.1-fixes? If you'd prefer
> to route the rhashtable change differently, that works too. Please let
> me know, thanks.

As I said earlier, we should work out if this is really needed
or not.

But if it is needed, we did have this feature before. It's called
insecure_elasticity which was removed because we moved all its users
off to better solutions. We can always bring it back.

Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt