* [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock
@ 2024-11-28 12:16 Breno Leitao
2024-11-29 5:47 ` Herbert Xu
` (3 more replies)
0 siblings, 4 replies; 25+ messages in thread
From: Breno Leitao @ 2024-11-28 12:16 UTC (permalink / raw)
To: Andrew Morton, Thomas Graf, Herbert Xu, Tejun Heo, Hao Luo,
Josh Don, Barret Rhoden
Cc: netdev, linux-kernel, Breno Leitao
Move the hash table growth check and work scheduling outside the
rht lock to prevent a possible circular locking dependency.
The original implementation could trigger a lockdep warning due to
a potential deadlock scenario involving nested locks between
rhashtable bucket, rq lock, and dsq lock. By relocating the
growth check and work scheduling after releasing the rth lock, we break
this potential deadlock chain.
This change expands the flexibility of rhashtable by removing
restrictive locking that previously limited its use in scheduler
and workqueue contexts.
Import to say that this calls rht_grow_above_75(), which reads from
struct rhashtable without holding the lock, if this is a problem, we can
move the check to the lock, and schedule the workqueue after the lock.
Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
lib/rhashtable.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 6c902639728b767cc3ee42c61256d2e9618e6ce7..5a27ccd72db9a25d92d1ed2f8d519afcfc672afe 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -585,9 +585,6 @@ static struct bucket_table *rhashtable_insert_one(
rht_assign_locked(bkt, obj);
atomic_inc(&ht->nelems);
- if (rht_grow_above_75(ht, tbl))
- schedule_work(&ht->run_work);
-
return NULL;
}
@@ -624,6 +621,9 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
data = ERR_CAST(new_tbl);
rht_unlock(tbl, bkt, flags);
+ if (rht_grow_above_75(ht, tbl))
+ schedule_work(&ht->run_work);
+
}
} while (!IS_ERR_OR_NULL(new_tbl));
---
base-commit: 0a31ca318eea4da46e5f495c79ccc4442c92f4dc
change-id: 20241128-scx_lockdep-3fa87553609d
Best regards,
--
Breno Leitao <leitao@debian.org>
^ permalink raw reply related [flat|nested] 25+ messages in thread* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock 2024-11-28 12:16 [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock Breno Leitao @ 2024-11-29 5:47 ` Herbert Xu 2024-12-02 20:16 ` Breno Leitao 2024-12-03 20:16 ` Tejun Heo ` (2 subsequent siblings) 3 siblings, 1 reply; 25+ messages in thread From: Herbert Xu @ 2024-11-29 5:47 UTC (permalink / raw) To: Breno Leitao Cc: Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev, linux-kernel On Thu, Nov 28, 2024 at 04:16:25AM -0800, Breno Leitao wrote: > Move the hash table growth check and work scheduling outside the > rht lock to prevent a possible circular locking dependency. > > The original implementation could trigger a lockdep warning due to > a potential deadlock scenario involving nested locks between > rhashtable bucket, rq lock, and dsq lock. By relocating the > growth check and work scheduling after releasing the rth lock, we break > this potential deadlock chain. > > This change expands the flexibility of rhashtable by removing > restrictive locking that previously limited its use in scheduler > and workqueue contexts. Could you please explain the deadlock? Is the workqueue system actually using rhashtable? Thanks, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock 2024-11-29 5:47 ` Herbert Xu @ 2024-12-02 20:16 ` Breno Leitao 0 siblings, 0 replies; 25+ messages in thread From: Breno Leitao @ 2024-12-02 20:16 UTC (permalink / raw) To: Herbert Xu Cc: Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev, linux-kernel Hello Herbert, On Fri, Nov 29, 2024 at 01:47:42PM +0800, Herbert Xu wrote: > On Thu, Nov 28, 2024 at 04:16:25AM -0800, Breno Leitao wrote: > > Move the hash table growth check and work scheduling outside the > > rht lock to prevent a possible circular locking dependency. > > > > The original implementation could trigger a lockdep warning due to > > a potential deadlock scenario involving nested locks between > > rhashtable bucket, rq lock, and dsq lock. By relocating the > > growth check and work scheduling after releasing the rth lock, we break > > this potential deadlock chain. > > > > This change expands the flexibility of rhashtable by removing > > restrictive locking that previously limited its use in scheduler > > and workqueue contexts. > > Could you please explain the deadlock? I understand that there is a locking order inversion here: A possible code flow can hold the rhashtable_bucket, and then can get the dqs->lock, as shown int eh following snippet: Chain exists of: rhashtable_bucket --> &rq->__lock --> &dsq->lock The same is true, when sched_ext holds rthe dsq->lock and try to get hold of rhashtable lock. This could be seen in the following snippers: rht_lock+0x69/0xd0 destroy_dsq+0x22d/0x790 scx_ops_disable_workfn+0x9d2/0xaf0 > Is the workqueue system actually using rhashtable? It seems so when using sched_ext scheduler class. For instance, lockdep got it in scx_ops_disable_workfn(). rht_lock+0x69/0xd0 destroy_dsq+0x22d/0x790 scx_ops_disable_workfn+0x9d2/0xaf0 kthread_worker_fn+0x137/0x350 This is the full lockdep splat, if it helps. Sorry it is not decoded, but this can give you the code-flow. ====================================================== WARNING: possible circular locking dependency detected hardirqs last enabled at (2088145): [<ffffffff822ab674>] _raw_spin_unlock_irq+0x24/0x50 hardirqs last disabled at (2088144): [<ffffffff822ab4bf>] _raw_spin_lock_irq+0x2f/0x80 ------------------------------------------------------ softirqs last enabled at (2088116): [<ffffffff810f7294>] __irq_exit_rcu+0x74/0x100 sched_ext_ops_h/10451 is trying to acquire lock: softirqs last disabled at (2088111): [<ffffffff810f7294>] __irq_exit_rcu+0x74/0x100 ffff888288059038 (rhashtable_bucket){....}-{0:0}, at: rht_lock+0x51/0xd0 but task is already holding lock: ffff888470645698 (&dsq->lock){-.-.}-{2:2}, at: destroy_dsq+0xaf/0x790 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (&dsq->lock){-.-.}-{2:2}: _raw_spin_lock+0x2f/0x40 dispatch_enqueue+0x7c/0x3e0 enqueue_task_scx.llvm.3416789782249720787+0x1ae/0x250 sched_enq_and_set_task+0x5f/0xb0 bpf_scx_reg+0xf21/0x11b0 bpf_struct_ops_link_create+0xec/0x160 __sys_bpf+0x34d/0x3b0 __x64_sys_bpf+0x18/0x20 do_syscall_64+0x7e/0x150 entry_SYSCALL_64_after_hwframe+0x4b/0x53 -> #3 (&rq->__lock){-.-.}-{2:2}: _raw_spin_lock_nested+0x32/0x40 raw_spin_rq_lock_nested+0x20/0x30 task_fork_fair.llvm.5382994275699419189+0x3b/0x110 sched_cgroup_fork+0xe3/0x100 copy_process+0xc3c/0x14a0 kernel_clone+0x90/0x360 user_mode_thread+0xbc/0xe0 rest_init+0x1f/0x1f0 start_kernel+0x41b/0x470 x86_64_start_reservations+0x26/0x30 x86_64_start_kernel+0x9b/0xa0 common_startup_64+0x13e/0x140 -> #2 (&p->pi_lock){-.-.}-{2:2}: _raw_spin_lock_irqsave+0x5a/0x90 try_to_wake_up+0x58/0x730 create_worker+0x1d6/0x240 workqueue_init+0x2c0/0x390 kernel_init_freeable+0x147/0x200 kernel_init+0x16/0x1c0 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x11/0x20 -> #1 (&pool->lock){-.-.}-{2:2}: _raw_spin_lock+0x2f/0x40 __queue_work+0x24b/0x610 queue_work_on+0xa5/0xf0 rhashtable_insert_slow+0x524/0x970 __xdp_reg_mem_model+0x181/0x240 xdp_rxq_info_reg_mem_model+0x19/0xf0 bnxt_alloc_mem+0x1178/0x1c80 __bnxt_open_nic+0x1bb/0xe20 bnxt_open_nic+0x26/0x60 ethtool_set_channels+0x1b7/0x1f0 dev_ethtool+0x555/0x740 dev_ioctl+0x2ac/0x3f0 sock_do_ioctl+0x111/0x180 sock_ioctl+0x1fb/0x2e0 __se_sys_ioctl+0x72/0xc0 do_syscall_64+0x7e/0x150 entry_SYSCALL_64_after_hwframe+0x4b/0x53 -> #0 (rhashtable_bucket){....}-{0:0}: __lock_acquire+0x1742/0x3470 lock_acquire+0xf0/0x290 rht_lock+0x69/0xd0 destroy_dsq+0x22d/0x790 scx_ops_disable_workfn+0x9d2/0xaf0 kthread_worker_fn+0x137/0x350 kthread+0x102/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x11/0x20 other info that might help us debug this: Chain exists of: rhashtable_bucket --> &rq->__lock --> &dsq->lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&dsq->lock); lock(&rq->__lock); lock(&dsq->lock); lock(rhashtable_bucket); *** DEADLOCK *** 5 locks held by sched_ext_ops_h/10451: #0: ffffffff83695ad0 (scx_ops_enable_mutex){+.+.}-{3:3}, at: scx_ops_disable_workfn+0x111/0xaf0 #1: ffffffff83da13d8 (rcu_read_lock){....}-{1:2}, at: rhashtable_walk_start_check+0x1f/0x3e0 #2: ffffffff83da13d8 (rcu_read_lock){....}-{1:2}, at: destroy_dsq+0x30/0x790 #3: ffff888470645698 (&dsq->lock){-.-.}-{2:2}, at: destroy_dsq+0xaf/0x790 #4: ffffffff83da13d8 (rcu_read_lock){....}-{1:2}, at: destroy_dsq+0x30/0x790 ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock 2024-11-28 12:16 [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock Breno Leitao 2024-11-29 5:47 ` Herbert Xu @ 2024-12-03 20:16 ` Tejun Heo 2024-12-04 1:34 ` Herbert Xu 2024-12-12 12:33 ` Herbert Xu 2025-01-13 19:50 ` [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock Guenter Roeck 3 siblings, 1 reply; 25+ messages in thread From: Tejun Heo @ 2024-12-03 20:16 UTC (permalink / raw) To: Breno Leitao Cc: Andrew Morton, Thomas Graf, Herbert Xu, Hao Luo, Josh Don, Barret Rhoden, netdev, linux-kernel On Thu, Nov 28, 2024 at 04:16:25AM -0800, Breno Leitao wrote: > Move the hash table growth check and work scheduling outside the > rht lock to prevent a possible circular locking dependency. > > The original implementation could trigger a lockdep warning due to > a potential deadlock scenario involving nested locks between > rhashtable bucket, rq lock, and dsq lock. By relocating the > growth check and work scheduling after releasing the rth lock, we break > this potential deadlock chain. > > This change expands the flexibility of rhashtable by removing > restrictive locking that previously limited its use in scheduler > and workqueue contexts. > > Import to say that this calls rht_grow_above_75(), which reads from > struct rhashtable without holding the lock, if this is a problem, we can > move the check to the lock, and schedule the workqueue after the lock. > > Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class") > Suggested-by: Tejun Heo <tj@kernel.org> > Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Tejun Heo <tj@kernel.org> This solves a possible deadlock for sched_ext and makes rhashtable more useful and I don't see any downsides. Andrew, can you please pick up this one? Thanks. -- tejun ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock 2024-12-03 20:16 ` Tejun Heo @ 2024-12-04 1:34 ` Herbert Xu 0 siblings, 0 replies; 25+ messages in thread From: Herbert Xu @ 2024-12-04 1:34 UTC (permalink / raw) To: Tejun Heo Cc: Breno Leitao, Andrew Morton, Thomas Graf, Hao Luo, Josh Don, Barret Rhoden, netdev, linux-kernel On Tue, Dec 03, 2024 at 10:16:24AM -1000, Tejun Heo wrote: > > This solves a possible deadlock for sched_ext and makes rhashtable more > useful and I don't see any downsides. > > Andrew, can you please pick up this one? I will be taking this through my tree once I've reviewed fully. Thanks for your patience. -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock 2024-11-28 12:16 [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock Breno Leitao 2024-11-29 5:47 ` Herbert Xu 2024-12-03 20:16 ` Tejun Heo @ 2024-12-12 12:33 ` Herbert Xu 2024-12-21 9:06 ` Herbert Xu 2025-01-13 19:50 ` [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock Guenter Roeck 3 siblings, 1 reply; 25+ messages in thread From: Herbert Xu @ 2024-12-12 12:33 UTC (permalink / raw) To: Breno Leitao Cc: Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev, linux-kernel On Thu, Nov 28, 2024 at 04:16:25AM -0800, Breno Leitao wrote: > > diff --git a/lib/rhashtable.c b/lib/rhashtable.c > index 6c902639728b767cc3ee42c61256d2e9618e6ce7..5a27ccd72db9a25d92d1ed2f8d519afcfc672afe 100644 > --- a/lib/rhashtable.c > +++ b/lib/rhashtable.c > @@ -585,9 +585,6 @@ static struct bucket_table *rhashtable_insert_one( > rht_assign_locked(bkt, obj); > > atomic_inc(&ht->nelems); > - if (rht_grow_above_75(ht, tbl)) > - schedule_work(&ht->run_work); > - > return NULL; > } > > @@ -624,6 +621,9 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key, > data = ERR_CAST(new_tbl); > > rht_unlock(tbl, bkt, flags); > + if (rht_grow_above_75(ht, tbl)) > + schedule_work(&ht->run_work); > + The growth check should stay with the atomic_inc. Something like this should work: if (PTR_ERR(data) == -ENOENT && !new_tbl) { atomic_inc(&ht->nelems); if (rht_grow_above_75(ht, tbl)) schedule_work(&ht->run_work); break; } Could you please resend this via linux-crypto? Thanks, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock 2024-12-12 12:33 ` Herbert Xu @ 2024-12-21 9:06 ` Herbert Xu 2025-01-02 10:15 ` Breno Leitao 0 siblings, 1 reply; 25+ messages in thread From: Herbert Xu @ 2024-12-21 9:06 UTC (permalink / raw) To: Breno Leitao Cc: Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev, linux-kernel On Thu, Dec 12, 2024 at 08:33:31PM +0800, Herbert Xu wrote: > > The growth check should stay with the atomic_inc. Something like > this should work: OK I've applied your patch with the atomic_inc move. Thanks, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock 2024-12-21 9:06 ` Herbert Xu @ 2025-01-02 10:15 ` Breno Leitao 2025-01-09 3:16 ` Michael Kelley 0 siblings, 1 reply; 25+ messages in thread From: Breno Leitao @ 2025-01-02 10:15 UTC (permalink / raw) To: Herbert Xu Cc: Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev, linux-kernel On Sat, Dec 21, 2024 at 05:06:55PM +0800, Herbert Xu wrote: > On Thu, Dec 12, 2024 at 08:33:31PM +0800, Herbert Xu wrote: > > > > The growth check should stay with the atomic_inc. Something like > > this should work: > > OK I've applied your patch with the atomic_inc move. Sorry, I was on vacation, and I am back now. Let me know if you need anything further. Thanks for fixing it, --breno ^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock 2025-01-02 10:15 ` Breno Leitao @ 2025-01-09 3:16 ` Michael Kelley 2025-01-09 10:15 ` Breno Leitao 0 siblings, 1 reply; 25+ messages in thread From: Michael Kelley @ 2025-01-09 3:16 UTC (permalink / raw) To: Breno Leitao, Herbert Xu, saeedm@nvidia.com, tariqt@nvidia.com, linux-hyperv@vger.kernel.org Cc: Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev@vger.kernel.org, linux-kernel@vger.kernel.org From: Breno Leitao <leitao@debian.org> Sent: Thursday, January 2, 2025 2:16 AM > > On Sat, Dec 21, 2024 at 05:06:55PM +0800, Herbert Xu wrote: > > On Thu, Dec 12, 2024 at 08:33:31PM +0800, Herbert Xu wrote: > > > > > > The growth check should stay with the atomic_inc. Something like > > > this should work: > > > > OK I've applied your patch with the atomic_inc move. > > Sorry, I was on vacation, and I am back now. Let me know if you need > anything further. > > Thanks for fixing it, > --breno Breno and Herbert -- This patch seems to break things in linux-next. I'm testing with linux-next20250108 in a VM in the Azure public cloud. The Mellanox mlx5 ethernet NIC in the VM is failing to get setup. I bisected to commit e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock"), then debugged why opening the mlx5 NIC device is failing. The failure is in the XDP code in function __xdp_reg_mem_model() where the call to rhashtable_insert_slow() is returning -E2BIG. The problem does not occur when the commit is reverted. The function call stack is this: dev_open() __dev_open() mlx5e_open() mlx5e_open_locked() mlx5e_open_channels() mlx5e_open_channel() mlx5e_open_queues() mlx5e_open_rxq_rq() mlx5e_open_rq() mlx5e_alloc_rq() xdp_rxq_info_reg_mem_model() __xdp_reg_mem_model() rhashtable_insert_slow() I have not debugged further as I don't know anything about the rhashtable code or the XDP code. The only repro I have is a VM in Azure. I thought I'd ask you (Breno and Herbert) to review the patch again and see if there's a path that could cause the hash table to be incorrectly detected as full. I've included the linux-hyperv mailing list and the mlx5 driver maintainers on this email. Someone involved with Azure/Hyper-V or the mlx5 driver may have seen the problem, and I want to try to avoid duplicative debugging. Let me know if there's something I can do to help debug further. Thanks, Michael Kelley ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock 2025-01-09 3:16 ` Michael Kelley @ 2025-01-09 10:15 ` Breno Leitao 2025-01-10 9:27 ` Herbert Xu 0 siblings, 1 reply; 25+ messages in thread From: Breno Leitao @ 2025-01-09 10:15 UTC (permalink / raw) To: Michael Kelley Cc: Herbert Xu, saeedm@nvidia.com, tariqt@nvidia.com, linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Hello Michael, On Thu, Jan 09, 2025 at 03:16:03AM +0000, Michael Kelley wrote: > From: Breno Leitao <leitao@debian.org> Sent: Thursday, January 2, 2025 2:16 AM > > > > On Sat, Dec 21, 2024 at 05:06:55PM +0800, Herbert Xu wrote: > > > On Thu, Dec 12, 2024 at 08:33:31PM +0800, Herbert Xu wrote: > > > > > > > > The growth check should stay with the atomic_inc. Something like > > > > this should work: > > > > > > OK I've applied your patch with the atomic_inc move. > > > > Sorry, I was on vacation, and I am back now. Let me know if you need > > anything further. > > > > Thanks for fixing it, > > --breno > > Breno and Herbert -- > > This patch seems to break things in linux-next. I'm testing with > linux-next20250108 in a VM in the Azure public cloud. The Mellanox mlx5 > ethernet NIC in the VM is failing to get setup. Thanks for reporting the issue. I started rolling this patch to Meta's fleet, and we started seeing a similar problem. Altough not fully understood yet. I would suggest we revert this patch until we investigate further. I'll prepare and send a revert patch shortly. Sorry for the noise, --breno ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock 2025-01-09 10:15 ` Breno Leitao @ 2025-01-10 9:27 ` Herbert Xu 2025-01-10 9:49 ` Breno Leitao ` (2 more replies) 0 siblings, 3 replies; 25+ messages in thread From: Herbert Xu @ 2025-01-10 9:27 UTC (permalink / raw) To: Breno Leitao Cc: Michael Kelley, saeedm@nvidia.com, tariqt@nvidia.com, linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev@vger.kernel.org, linux-kernel@vger.kernel.org On Thu, Jan 09, 2025 at 02:15:17AM -0800, Breno Leitao wrote: > > I would suggest we revert this patch until we investigate further. I'll > prepare and send a revert patch shortly. Sorry, I think it was my addition that broke things. The condition for checking whether an entry is inserted is incorrect, thus resulting in an underflow of the number of entries after entry removal. Please test this patch: ---8<--- The function rhashtable_insert_one only returns NULL iff the insertion was successful, so that alone should be tested before increment nelems. Testing the variable data is redundant, and buggy because we will have overwritten the original value of data by this point. Reported-by: Michael Kelley <mhklinux@outlook.com> Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> diff --git a/lib/rhashtable.c b/lib/rhashtable.c index bf956b85455a..e196b6f0e35a 100644 --- a/lib/rhashtable.c +++ b/lib/rhashtable.c @@ -621,7 +621,7 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key, rht_unlock(tbl, bkt, flags); - if (PTR_ERR(data) == -ENOENT && !new_tbl) { + if (!new_tbl) { atomic_inc(&ht->nelems); if (rht_grow_above_75(ht, tbl)) schedule_work(&ht->run_work); -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock 2025-01-10 9:27 ` Herbert Xu @ 2025-01-10 9:49 ` Breno Leitao 2025-01-10 10:07 ` Herbert Xu 2025-01-10 14:46 ` Zaslonko Mikhail 2025-01-10 16:59 ` Michael Kelley 2 siblings, 1 reply; 25+ messages in thread From: Breno Leitao @ 2025-01-10 9:49 UTC (permalink / raw) To: Herbert Xu Cc: Michael Kelley, saeedm@nvidia.com, tariqt@nvidia.com, linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Hello Herbet, On Fri, Jan 10, 2025 at 05:27:49PM +0800, Herbert Xu wrote: > Sorry, I think it was my addition that broke things. The condition > for checking whether an entry is inserted is incorrect, thus resulting > in an underflow of the number of entries after entry removal. That is what I though originally as well, but I was not convinced. While reading the code, I understood that, if new_tbl is not NULL, then PTR_ERR(data) will be -ENOENT. In which case `net_tbl` will not be NULL, and PTR_ERR(data) != -ENOENT? Thanks for solving it. > Please test this patch: I don't have an easy reproducer yet, but, I will get this patch in ~50 hosts and see if any of them misbehave during the weekend. Misbehaving in my case is strongly associate with messages like the following being printed: kobject_uevent: unable to create netlink socket! Thanks --breno ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock 2025-01-10 9:49 ` Breno Leitao @ 2025-01-10 10:07 ` Herbert Xu 0 siblings, 0 replies; 25+ messages in thread From: Herbert Xu @ 2025-01-10 10:07 UTC (permalink / raw) To: Breno Leitao Cc: Michael Kelley, saeedm@nvidia.com, tariqt@nvidia.com, linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev@vger.kernel.org, linux-kernel@vger.kernel.org On Fri, Jan 10, 2025 at 01:49:44AM -0800, Breno Leitao wrote: > > That is what I though originally as well, but I was not convinced. While > reading the code, I understood that, if new_tbl is not NULL, then > PTR_ERR(data) will be -ENOENT. > > In which case `net_tbl` will not be NULL, and PTR_ERR(data) != -ENOENT? The bug arises when an insertion succeeds. So new_tbl is NULL. The original value of data should have been -ENOENT, however, it gets overwritten after rhashtable_insert_one (data is now NULL). Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock 2025-01-10 9:27 ` Herbert Xu 2025-01-10 9:49 ` Breno Leitao @ 2025-01-10 14:46 ` Zaslonko Mikhail 2025-01-10 16:59 ` Michael Kelley 2 siblings, 0 replies; 25+ messages in thread From: Zaslonko Mikhail @ 2025-01-10 14:46 UTC (permalink / raw) To: Herbert Xu, Breno Leitao Cc: Michael Kelley, saeedm@nvidia.com, tariqt@nvidia.com, linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Heiko Carstens, Alexander Gordeev Herbert and Breno, On 10.01.2025 10:27, Herbert Xu wrote: > On Thu, Jan 09, 2025 at 02:15:17AM -0800, Breno Leitao wrote: > > Reported-by: Michael Kelley <mhklinux@outlook.com> > Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock") > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > diff --git a/lib/rhashtable.c b/lib/rhashtable.c > index bf956b85455a..e196b6f0e35a 100644 > --- a/lib/rhashtable.c > +++ b/lib/rhashtable.c > @@ -621,7 +621,7 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key, > > rht_unlock(tbl, bkt, flags); > > - if (PTR_ERR(data) == -ENOENT && !new_tbl) { > + if (!new_tbl) { > atomic_inc(&ht->nelems); > if (rht_grow_above_75(ht, tbl)) > schedule_work(&ht->run_work); I'd like to let you know that I was getting OOM failure on s390 when booting the kernel (linux-next20250109) with limited memory (mem=1G kernel parameter). Problem took place in both zVM and LPAR environments. Bisecting also revealed the commit e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock"). Afterwards, I tried the fix from Herbert above and the error does not appear any more. So it seems to resolve the issue. Thanks, Mikhail ^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock 2025-01-10 9:27 ` Herbert Xu 2025-01-10 9:49 ` Breno Leitao 2025-01-10 14:46 ` Zaslonko Mikhail @ 2025-01-10 16:59 ` Michael Kelley 2025-01-10 17:24 ` [v2 PATCH] rhashtable: Fix rhashtable_try_insert test Herbert Xu 2 siblings, 1 reply; 25+ messages in thread From: Michael Kelley @ 2025-01-10 16:59 UTC (permalink / raw) To: Herbert Xu, Breno Leitao Cc: saeedm@nvidia.com, tariqt@nvidia.com, linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev@vger.kernel.org, linux-kernel@vger.kernel.org From: Herbert Xu <herbert@gondor.apana.org.au> Sent: Friday, January 10, 2025 1:28 AM > > On Thu, Jan 09, 2025 at 02:15:17AM -0800, Breno Leitao wrote: > > > > I would suggest we revert this patch until we investigate further. I'll > > prepare and send a revert patch shortly. > > Sorry, I think it was my addition that broke things. The condition > for checking whether an entry is inserted is incorrect, thus resulting > in an underflow of the number of entries after entry removal. > > Please test this patch: > > ---8<--- > The function rhashtable_insert_one only returns NULL iff the > insertion was successful, so that alone should be tested before > increment nelems. Testing the variable data is redundant, and > buggy because we will have overwritten the original value of data > by this point. > > Reported-by: Michael Kelley <mhklinux@outlook.com> > Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work > outside lock") > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > diff --git a/lib/rhashtable.c b/lib/rhashtable.c > index bf956b85455a..e196b6f0e35a 100644 > --- a/lib/rhashtable.c > +++ b/lib/rhashtable.c > @@ -621,7 +621,7 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const > void *key, > > rht_unlock(tbl, bkt, flags); > > - if (PTR_ERR(data) == -ENOENT && !new_tbl) { > + if (!new_tbl) { > atomic_inc(&ht->nelems); > if (rht_grow_above_75(ht, tbl)) > schedule_work(&ht->run_work); > -- This patch fixes the problem I saw with VMs in the Azure cloud. Thanks! Michael Kelley ^ permalink raw reply [flat|nested] 25+ messages in thread
* [v2 PATCH] rhashtable: Fix rhashtable_try_insert test 2025-01-10 16:59 ` Michael Kelley @ 2025-01-10 17:24 ` Herbert Xu 2025-01-10 18:22 ` Michael Kelley 0 siblings, 1 reply; 25+ messages in thread From: Herbert Xu @ 2025-01-10 17:24 UTC (permalink / raw) To: Michael Kelley Cc: Breno Leitao, saeedm@nvidia.com, tariqt@nvidia.com, linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev@vger.kernel.org, linux-kernel@vger.kernel.org On Fri, Jan 10, 2025 at 04:59:28PM +0000, Michael Kelley wrote: > > This patch fixes the problem I saw with VMs in the Azure cloud. Thanks! Sorry, but the test on data is needed after all (it was just buggy). Otherwise we will break rhlist. So please test this patch instead. ---8<--- The test on whether rhashtable_insert_one did an insertion relies on the value returned by rhashtable_lookup_one. Unfortunately that value is overwritten after rhashtable_insert_one returns. Fix this by saving the old value. Also simplify the test as only data == NULL matters. Reported-by: Michael Kelley <mhklinux@outlook.com> Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> diff --git a/lib/rhashtable.c b/lib/rhashtable.c index bf956b85455a..e36b36f3146d 100644 --- a/lib/rhashtable.c +++ b/lib/rhashtable.c @@ -611,17 +611,20 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key, new_tbl = rht_dereference_rcu(tbl->future_tbl, ht); data = ERR_PTR(-EAGAIN); } else { + void *odata; + flags = rht_lock(tbl, bkt); data = rhashtable_lookup_one(ht, bkt, tbl, hash, key, obj); new_tbl = rhashtable_insert_one(ht, bkt, tbl, hash, obj, data); + odata = data; if (PTR_ERR(new_tbl) != -EEXIST) data = ERR_CAST(new_tbl); rht_unlock(tbl, bkt, flags); - if (PTR_ERR(data) == -ENOENT && !new_tbl) { + if (odata && !new_tbl) { atomic_inc(&ht->nelems); if (rht_grow_above_75(ht, tbl)) schedule_work(&ht->run_work); -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply related [flat|nested] 25+ messages in thread
* RE: [v2 PATCH] rhashtable: Fix rhashtable_try_insert test 2025-01-10 17:24 ` [v2 PATCH] rhashtable: Fix rhashtable_try_insert test Herbert Xu @ 2025-01-10 18:22 ` Michael Kelley 2025-01-14 3:15 ` [v3 " Herbert Xu 0 siblings, 1 reply; 25+ messages in thread From: Michael Kelley @ 2025-01-10 18:22 UTC (permalink / raw) To: Herbert Xu Cc: Breno Leitao, saeedm@nvidia.com, tariqt@nvidia.com, linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev@vger.kernel.org, linux-kernel@vger.kernel.org From: Herbert Xu <herbert@gondor.apana.org.au> Sent: Friday, January 10, 2025 9:24 AM > > On Fri, Jan 10, 2025 at 04:59:28PM +0000, Michael Kelley wrote: > > > > This patch fixes the problem I saw with VMs in the Azure cloud. Thanks! > > Sorry, but the test on data is needed after all (it was just > buggy). Otherwise we will break rhlist. So please test this > patch instead. > > ---8<--- > The test on whether rhashtable_insert_one did an insertion relies > on the value returned by rhashtable_lookup_one. Unfortunately that > value is overwritten after rhashtable_insert_one returns. Fix this > by saving the old value. > > Also simplify the test as only data == NULL matters. > > Reported-by: Michael Kelley <mhklinux@outlook.com> > Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work > outside lock") > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > diff --git a/lib/rhashtable.c b/lib/rhashtable.c > index bf956b85455a..e36b36f3146d 100644 > --- a/lib/rhashtable.c > +++ b/lib/rhashtable.c > @@ -611,17 +611,20 @@ static void *rhashtable_try_insert(struct rhashtable *ht, > const void *key, > new_tbl = rht_dereference_rcu(tbl->future_tbl, ht); > data = ERR_PTR(-EAGAIN); > } else { > + void *odata; > + > flags = rht_lock(tbl, bkt); > data = rhashtable_lookup_one(ht, bkt, tbl, > hash, key, obj); > new_tbl = rhashtable_insert_one(ht, bkt, tbl, > hash, obj, data); > + odata = data; > if (PTR_ERR(new_tbl) != -EEXIST) > data = ERR_CAST(new_tbl); > > rht_unlock(tbl, bkt, flags); > > - if (PTR_ERR(data) == -ENOENT && !new_tbl) { > + if (odata && !new_tbl) { > atomic_inc(&ht->nelems); > if (rht_grow_above_75(ht, tbl)) > schedule_work(&ht->run_work); This patch passes my tests. I'm doing a narrow test to verify that the boot failure when opening the Mellanox NIC is no longer occurring. I also unloaded/reloaded the mlx5 driver a couple of times. For good measure, I then did a full Linux kernel build, and all is good. My testing does not broadly verify correct operation of rhashtable except as it gets exercised implicitly by these basic tests. Michael ^ permalink raw reply [flat|nested] 25+ messages in thread
* [v3 PATCH] rhashtable: Fix rhashtable_try_insert test 2025-01-10 18:22 ` Michael Kelley @ 2025-01-14 3:15 ` Herbert Xu 2025-01-14 11:58 ` Michael Kelley ` (3 more replies) 0 siblings, 4 replies; 25+ messages in thread From: Herbert Xu @ 2025-01-14 3:15 UTC (permalink / raw) To: Michael Kelley Cc: Breno Leitao, saeedm@nvidia.com, tariqt@nvidia.com, linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Linux Crypto Mailing List On Fri, Jan 10, 2025 at 06:22:40PM +0000, Michael Kelley wrote: > > This patch passes my tests. I'm doing a narrow test to verify that > the boot failure when opening the Mellanox NIC is no longer occurring. > I also unloaded/reloaded the mlx5 driver a couple of times. For good > measure, I then did a full Linux kernel build, and all is good. My testing > does not broadly verify correct operation of rhashtable except as it > gets exercised implicitly by these basic tests. Thanks for testing! The patch needs one more change though as moving the atomic_inc outside of the lock was a bad idea on my part. This could cause atomic_inc/atomic_dec to be reordered thus resulting in an underflow. Thanks, ---8<--- The test on whether rhashtable_insert_one did an insertion relies on the value returned by rhashtable_lookup_one. Unfortunately that value is overwritten after rhashtable_insert_one returns. Fix this by moving the test before data gets overwritten. Simplify the test as only data == NULL matters. Finally move atomic_inc back within the lock as otherwise it may be reordered with the atomic_dec on the removal side, potentially leading to an underflow. Reported-by: Michael Kelley <mhklinux@outlook.com> Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> diff --git a/lib/rhashtable.c b/lib/rhashtable.c index bf956b85455a..0e9a1d4cf89b 100644 --- a/lib/rhashtable.c +++ b/lib/rhashtable.c @@ -611,21 +611,23 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key, new_tbl = rht_dereference_rcu(tbl->future_tbl, ht); data = ERR_PTR(-EAGAIN); } else { + bool inserted; + flags = rht_lock(tbl, bkt); data = rhashtable_lookup_one(ht, bkt, tbl, hash, key, obj); new_tbl = rhashtable_insert_one(ht, bkt, tbl, hash, obj, data); + inserted = data && !new_tbl; + if (inserted) + atomic_inc(&ht->nelems); if (PTR_ERR(new_tbl) != -EEXIST) data = ERR_CAST(new_tbl); rht_unlock(tbl, bkt, flags); - if (PTR_ERR(data) == -ENOENT && !new_tbl) { - atomic_inc(&ht->nelems); - if (rht_grow_above_75(ht, tbl)) - schedule_work(&ht->run_work); - } + if (inserted && rht_grow_above_75(ht, tbl)) + schedule_work(&ht->run_work); } } while (!IS_ERR_OR_NULL(new_tbl)); -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply related [flat|nested] 25+ messages in thread
* RE: [v3 PATCH] rhashtable: Fix rhashtable_try_insert test 2025-01-14 3:15 ` [v3 " Herbert Xu @ 2025-01-14 11:58 ` Michael Kelley 2025-01-15 15:15 ` Breno Leitao ` (2 subsequent siblings) 3 siblings, 0 replies; 25+ messages in thread From: Michael Kelley @ 2025-01-14 11:58 UTC (permalink / raw) To: Herbert Xu Cc: Breno Leitao, saeedm@nvidia.com, tariqt@nvidia.com, linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Linux Crypto Mailing List From: Herbert Xu <herbert@gondor.apana.org.au> Sent: Monday, January 13, 2025 7:15 PM > > On Fri, Jan 10, 2025 at 06:22:40PM +0000, Michael Kelley wrote: > > > > This patch passes my tests. I'm doing a narrow test to verify that > > the boot failure when opening the Mellanox NIC is no longer occurring. > > I also unloaded/reloaded the mlx5 driver a couple of times. For good > > measure, I then did a full Linux kernel build, and all is good. My testing > > does not broadly verify correct operation of rhashtable except as it > > gets exercised implicitly by these basic tests. > > Thanks for testing! The patch needs one more change though as > moving the atomic_inc outside of the lock was a bad idea on my > part. This could cause atomic_inc/atomic_dec to be reordered > thus resulting in an underflow. > > Thanks, I've tested this version of the fix in the same limited way as before. All is good. I'm still testing against linux-next20250108 from a few days ago, but that should not make any difference. I have *not* reviewed the code change itself. Tested-by: Michael Kelley <mhklinux@outlook.com> > > ---8<--- > The test on whether rhashtable_insert_one did an insertion relies > on the value returned by rhashtable_lookup_one. Unfortunately that > value is overwritten after rhashtable_insert_one returns. Fix this > by moving the test before data gets overwritten. > > Simplify the test as only data == NULL matters. > > Finally move atomic_inc back within the lock as otherwise it may > be reordered with the atomic_dec on the removal side, potentially > leading to an underflow. > > Reported-by: Michael Kelley <mhklinux@outlook.com> > Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock") > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > diff --git a/lib/rhashtable.c b/lib/rhashtable.c > index bf956b85455a..0e9a1d4cf89b 100644 > --- a/lib/rhashtable.c > +++ b/lib/rhashtable.c > @@ -611,21 +611,23 @@ static void *rhashtable_try_insert(struct rhashtable *ht, > const void *key, > new_tbl = rht_dereference_rcu(tbl->future_tbl, ht); > data = ERR_PTR(-EAGAIN); > } else { > + bool inserted; > + > flags = rht_lock(tbl, bkt); > data = rhashtable_lookup_one(ht, bkt, tbl, > hash, key, obj); > new_tbl = rhashtable_insert_one(ht, bkt, tbl, > hash, obj, data); > + inserted = data && !new_tbl; > + if (inserted) > + atomic_inc(&ht->nelems); > if (PTR_ERR(new_tbl) != -EEXIST) > data = ERR_CAST(new_tbl); > > rht_unlock(tbl, bkt, flags); > > - if (PTR_ERR(data) == -ENOENT && !new_tbl) { > - atomic_inc(&ht->nelems); > - if (rht_grow_above_75(ht, tbl)) > - schedule_work(&ht->run_work); > - } > + if (inserted && rht_grow_above_75(ht, tbl)) > + schedule_work(&ht->run_work); > } > } while (!IS_ERR_OR_NULL(new_tbl)); > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [v3 PATCH] rhashtable: Fix rhashtable_try_insert test 2025-01-14 3:15 ` [v3 " Herbert Xu 2025-01-14 11:58 ` Michael Kelley @ 2025-01-15 15:15 ` Breno Leitao 2025-01-16 9:10 ` Herbert Xu 2025-01-16 11:48 ` Alexander Gordeev 2025-01-17 13:20 ` Zaslonko Mikhail 3 siblings, 1 reply; 25+ messages in thread From: Breno Leitao @ 2025-01-15 15:15 UTC (permalink / raw) To: Herbert Xu Cc: Michael Kelley, saeedm@nvidia.com, tariqt@nvidia.com, linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Linux Crypto Mailing List On Tue, Jan 14, 2025 at 11:15:19AM +0800, Herbert Xu wrote: > On Fri, Jan 10, 2025 at 06:22:40PM +0000, Michael Kelley wrote: > > > > This patch passes my tests. I'm doing a narrow test to verify that > > the boot failure when opening the Mellanox NIC is no longer occurring. > > I also unloaded/reloaded the mlx5 driver a couple of times. For good > > measure, I then did a full Linux kernel build, and all is good. My testing > > does not broadly verify correct operation of rhashtable except as it > > gets exercised implicitly by these basic tests. > > Thanks for testing! The patch needs one more change though as > moving the atomic_inc outside of the lock was a bad idea on my > part. This could cause atomic_inc/atomic_dec to be reordered > thus resulting in an underflow. > > Thanks, > > ---8<--- > The test on whether rhashtable_insert_one did an insertion relies > on the value returned by rhashtable_lookup_one. Unfortunately that > value is overwritten after rhashtable_insert_one returns. Fix this > by moving the test before data gets overwritten. > > Simplify the test as only data == NULL matters. > > Finally move atomic_inc back within the lock as otherwise it may > be reordered with the atomic_dec on the removal side, potentially > leading to an underflow. > > Reported-by: Michael Kelley <mhklinux@outlook.com> > Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock") > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Breno Leitao <leitao@debian.org> > diff --git a/lib/rhashtable.c b/lib/rhashtable.c > index bf956b85455a..0e9a1d4cf89b 100644 > --- a/lib/rhashtable.c > +++ b/lib/rhashtable.c > @@ -611,21 +611,23 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key, > new_tbl = rht_dereference_rcu(tbl->future_tbl, ht); > data = ERR_PTR(-EAGAIN); > } else { > + bool inserted; > + > flags = rht_lock(tbl, bkt); > data = rhashtable_lookup_one(ht, bkt, tbl, > hash, key, obj); > new_tbl = rhashtable_insert_one(ht, bkt, tbl, > hash, obj, data); > + inserted = data && !new_tbl; > + if (inserted) > + atomic_inc(&ht->nelems); > if (PTR_ERR(new_tbl) != -EEXIST) > data = ERR_CAST(new_tbl); > > rht_unlock(tbl, bkt, flags); > > - if (PTR_ERR(data) == -ENOENT && !new_tbl) { > - atomic_inc(&ht->nelems); > - if (rht_grow_above_75(ht, tbl)) > - schedule_work(&ht->run_work); > - } > + if (inserted && rht_grow_above_75(ht, tbl)) > + schedule_work(&ht->run_work); That makes sense, since data could be ERR_PTR(-ENOENT) and ERR_PTR(-EAGAIN), and the object being inserted, which means that nelems should be increased. It was hard to review this patch, basically rhashtable_insert_one() returns three type of values, and you are interested in only one case, when the obj was inserted. These are the type of values that is coming from rhashtable_insert_one(): 1) NULL: if object was inserted OR if data is NULL 2) Non error and !NULL: A new table to look at 3) ERR: Definitely not added I am wondering if we decoupled the first case, and only return NULL iff the object was added, it would simplify this logic. Something like the following (not tested): diff --git a/lib/rhashtable.c b/lib/rhashtable.c index 3e555d012ed60..5a0ec71e990ee 100644 --- a/lib/rhashtable.c +++ b/lib/rhashtable.c @@ -554,7 +554,7 @@ static struct bucket_table *rhashtable_insert_one( return ERR_PTR(-EEXIST); if (PTR_ERR(data) != -EAGAIN && PTR_ERR(data) != -ENOENT) - return ERR_CAST(data); + return ERR_PTR(-EINVAL); new_tbl = rht_dereference_rcu(tbl->future_tbl, ht); if (new_tbl) Thanks for fixing it, --breno ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [v3 PATCH] rhashtable: Fix rhashtable_try_insert test 2025-01-15 15:15 ` Breno Leitao @ 2025-01-16 9:10 ` Herbert Xu 0 siblings, 0 replies; 25+ messages in thread From: Herbert Xu @ 2025-01-16 9:10 UTC (permalink / raw) To: Breno Leitao Cc: Michael Kelley, saeedm@nvidia.com, tariqt@nvidia.com, linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Linux Crypto Mailing List On Wed, Jan 15, 2025 at 07:15:46AM -0800, Breno Leitao wrote: > > Something like the following (not tested): > > diff --git a/lib/rhashtable.c b/lib/rhashtable.c > index 3e555d012ed60..5a0ec71e990ee 100644 > --- a/lib/rhashtable.c > +++ b/lib/rhashtable.c > @@ -554,7 +554,7 @@ static struct bucket_table *rhashtable_insert_one( > return ERR_PTR(-EEXIST); > > if (PTR_ERR(data) != -EAGAIN && PTR_ERR(data) != -ENOENT) > - return ERR_CAST(data); > + return ERR_PTR(-EINVAL); But data == NULL is not an error, it is a successful return value for rhltable. It's when a key already exists in the rhltable and we're simply appending the new object to it. Thus the insertion was successful but the hash table did not grow. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [v3 PATCH] rhashtable: Fix rhashtable_try_insert test 2025-01-14 3:15 ` [v3 " Herbert Xu 2025-01-14 11:58 ` Michael Kelley 2025-01-15 15:15 ` Breno Leitao @ 2025-01-16 11:48 ` Alexander Gordeev 2025-01-17 13:20 ` Zaslonko Mikhail 3 siblings, 0 replies; 25+ messages in thread From: Alexander Gordeev @ 2025-01-16 11:48 UTC (permalink / raw) To: Herbert Xu Cc: Michael Kelley, Breno Leitao, saeedm@nvidia.com, tariqt@nvidia.com, linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Linux Crypto Mailing List, Zaslonko Mikhail On Tue, Jan 14, 2025 at 11:15:19AM +0800, Herbert Xu wrote: Hi Herbert, > Thanks for testing! The patch needs one more change though as > moving the atomic_inc outside of the lock was a bad idea on my > part. This could cause atomic_inc/atomic_dec to be reordered > thus resulting in an underflow. I want to confirm that this patch fixes massive strangenesses and few crashes observed on s390 systems, in addition to OOMs reported by Mikhail Zaslonko earlier. > Thanks, Thanks! ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [v3 PATCH] rhashtable: Fix rhashtable_try_insert test 2025-01-14 3:15 ` [v3 " Herbert Xu ` (2 preceding siblings ...) 2025-01-16 11:48 ` Alexander Gordeev @ 2025-01-17 13:20 ` Zaslonko Mikhail 3 siblings, 0 replies; 25+ messages in thread From: Zaslonko Mikhail @ 2025-01-17 13:20 UTC (permalink / raw) To: Herbert Xu, Michael Kelley Cc: Breno Leitao, saeedm@nvidia.com, tariqt@nvidia.com, linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Linux Crypto Mailing List Hello, On 14.01.2025 04:15, Herbert Xu wrote: > On Fri, Jan 10, 2025 at 06:22:40PM +0000, Michael Kelley wrote: > > Thanks for testing! The patch needs one more change though as > moving the atomic_inc outside of the lock was a bad idea on my > part. This could cause atomic_inc/atomic_dec to be reordered > thus resulting in an underflow. > > Thanks, I've tested v3 patch on s390 for the boot OOM problem with 'mem=1G' kernel parameter we experience on current linux-next kernel and confirm that the issue is no longer present. Tested-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> > > ---8<--- > The test on whether rhashtable_insert_one did an insertion relies > on the value returned by rhashtable_lookup_one. Unfortunately that > value is overwritten after rhashtable_insert_one returns. Fix this > by moving the test before data gets overwritten. > > Simplify the test as only data == NULL matters. > > Finally move atomic_inc back within the lock as otherwise it may > be reordered with the atomic_dec on the removal side, potentially > leading to an underflow. > > Reported-by: Michael Kelley <mhklinux@outlook.com> > Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock") > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > diff --git a/lib/rhashtable.c b/lib/rhashtable.c > index bf956b85455a..0e9a1d4cf89b 100644 > --- a/lib/rhashtable.c > +++ b/lib/rhashtable.c > @@ -611,21 +611,23 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key, > new_tbl = rht_dereference_rcu(tbl->future_tbl, ht); > data = ERR_PTR(-EAGAIN); > } else { > + bool inserted; > + > flags = rht_lock(tbl, bkt); > data = rhashtable_lookup_one(ht, bkt, tbl, > hash, key, obj); > new_tbl = rhashtable_insert_one(ht, bkt, tbl, > hash, obj, data); > + inserted = data && !new_tbl; > + if (inserted) > + atomic_inc(&ht->nelems); > if (PTR_ERR(new_tbl) != -EEXIST) > data = ERR_CAST(new_tbl); > > rht_unlock(tbl, bkt, flags); > > - if (PTR_ERR(data) == -ENOENT && !new_tbl) { > - atomic_inc(&ht->nelems); > - if (rht_grow_above_75(ht, tbl)) > - schedule_work(&ht->run_work); > - } > + if (inserted && rht_grow_above_75(ht, tbl)) > + schedule_work(&ht->run_work); > } > } while (!IS_ERR_OR_NULL(new_tbl)); > Thanks! ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock 2024-11-28 12:16 [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock Breno Leitao ` (2 preceding siblings ...) 2024-12-12 12:33 ` Herbert Xu @ 2025-01-13 19:50 ` Guenter Roeck 2025-01-14 3:23 ` Herbert Xu 3 siblings, 1 reply; 25+ messages in thread From: Guenter Roeck @ 2025-01-13 19:50 UTC (permalink / raw) To: Breno Leitao Cc: Andrew Morton, Thomas Graf, Herbert Xu, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev, linux-kernel Hi, On Thu, Nov 28, 2024 at 04:16:25AM -0800, Breno Leitao wrote: > Move the hash table growth check and work scheduling outside the > rht lock to prevent a possible circular locking dependency. > > The original implementation could trigger a lockdep warning due to > a potential deadlock scenario involving nested locks between > rhashtable bucket, rq lock, and dsq lock. By relocating the > growth check and work scheduling after releasing the rth lock, we break > this potential deadlock chain. > > This change expands the flexibility of rhashtable by removing > restrictive locking that previously limited its use in scheduler > and workqueue contexts. > > Import to say that this calls rht_grow_above_75(), which reads from > struct rhashtable without holding the lock, if this is a problem, we can > move the check to the lock, and schedule the workqueue after the lock. > > Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class") > Suggested-by: Tejun Heo <tj@kernel.org> > Signed-off-by: Breno Leitao <leitao@debian.org> With this patch in linux-next, I get some unit test errors. [ 3.800185] # Subtest: hw_breakpoint [ 3.800469] # module: hw_breakpoint_test [ 3.800718] 1..9 [ 3.810825] # test_one_cpu: pass:1 fail:0 skip:0 total:1 [ 3.810950] ok 1 test_one_cpu [ 3.814941] # test_many_cpus: pass:1 fail:0 skip:0 total:1 [ 3.815092] ok 2 test_many_cpus [ 3.822977] # test_one_task_on_all_cpus: pass:1 fail:0 skip:0 total:1 [ 3.823100] ok 3 test_one_task_on_all_cpus [ 3.829071] # test_two_tasks_on_all_cpus: pass:1 fail:0 skip:0 total:1 [ 3.829199] ok 4 test_two_tasks_on_all_cpus [ 3.830914] # test_one_task_on_one_cpu: ASSERTION FAILED at kernel/events/hw_breakpoint_test.c:70 [ 3.830914] Expected IS_ERR(bp) to be false, but is true [ 3.832572] # test_one_task_on_one_cpu: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320 [ 3.832572] Expected hw_breakpoint_is_used() to be false, but is true [ 3.833002] # test_one_task_on_one_cpu: pass:0 fail:1 skip:0 total:1 [ 3.833071] not ok 5 test_one_task_on_one_cpu [ 3.834994] # test_one_task_mixed: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320 [ 3.834994] Expected hw_breakpoint_is_used() to be false, but is true [ 3.835583] # test_one_task_mixed: pass:0 fail:1 skip:0 total:1 [ 3.835638] not ok 6 test_one_task_mixed [ 3.837131] # test_two_tasks_on_one_cpu: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320 [ 3.837131] Expected hw_breakpoint_is_used() to be false, but is true [ 3.837827] # test_two_tasks_on_one_cpu: pass:0 fail:1 skip:0 total:1 [ 3.837882] not ok 7 test_two_tasks_on_one_cpu [ 3.839868] # test_two_tasks_on_one_all_cpus: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320 [ 3.839868] Expected hw_breakpoint_is_used() to be false, but is true [ 3.840294] # test_two_tasks_on_one_all_cpus: pass:0 fail:1 skip:0 total:1 [ 3.840538] not ok 8 test_two_tasks_on_one_all_cpus [ 3.843599] # test_task_on_all_and_one_cpu: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320 [ 3.843599] Expected hw_breakpoint_is_used() to be false, but is true [ 3.844163] # test_task_on_all_and_one_cpu: pass:0 fail:1 skip:0 total:1 [ 3.844215] not ok 9 test_task_on_all_and_one_cpu [ 3.844453] # hw_breakpoint: pass:4 fail:5 skip:0 total:9 [ 3.844610] # Totals: pass:4 fail:5 skip:0 total:9 [ 3.844797] not ok 1 hw_breakpoint Sometimes I also see: [ 12.579842] # Subtest: Handshake API tests [ 12.579971] 1..11 [ 12.580052] KTAP version 1 [ 12.580410] # Subtest: req_alloc API fuzzing [ 12.582206] ok 1 handshake_req_alloc NULL proto [ 12.583541] ok 2 handshake_req_alloc CLASS_NONE [ 12.585419] ok 3 handshake_req_alloc CLASS_MAX [ 12.587291] ok 4 handshake_req_alloc no callbacks [ 12.589239] ok 5 handshake_req_alloc no done callback [ 12.590758] ok 6 handshake_req_alloc excessive privsize [ 12.592642] ok 7 handshake_req_alloc all good [ 12.592802] # req_alloc API fuzzing: pass:7 fail:0 skip:0 total:7 [ 12.593185] ok 1 req_alloc API fuzzing [ 12.597371] # req_submit NULL req arg: pass:1 fail:0 skip:0 total:1 [ 12.597501] ok 2 req_submit NULL req arg [ 12.599208] # req_submit NULL sock arg: pass:1 fail:0 skip:0 total:1 [ 12.599338] ok 3 req_submit NULL sock arg [ 12.601549] # req_submit NULL sock->file: pass:1 fail:0 skip:0 total:1 [ 12.601680] ok 4 req_submit NULL sock->file [ 12.605334] # req_lookup works: pass:1 fail:0 skip:0 total:1 [ 12.605469] ok 5 req_lookup works [ 12.609596] # req_submit max pending: pass:1 fail:0 skip:0 total:1 [ 12.609730] ok 6 req_submit max pending [ 12.613796] # req_submit multiple: pass:1 fail:0 skip:0 total:1 [ 12.614250] ok 7 req_submit multiple [ 12.616395] # req_cancel before accept: ASSERTION FAILED at net/handshake/handshake-test.c:333 [ 12.616395] Expected err == 0, but [ 12.616395] err == -16 (0xfffffffffffffff0) [ 12.618061] # req_cancel before accept: pass:0 fail:1 skip:0 total:1 [ 12.618135] not ok 8 req_cancel before accept [ 12.619437] # req_cancel after accept: ASSERTION FAILED at net/handshake/handshake-test.c:369 [ 12.619437] Expected err == 0, but [ 12.619437] err == -16 (0xfffffffffffffff0) [ 12.621055] # req_cancel after accept: pass:0 fail:1 skip:0 total:1 [ 12.621119] not ok 9 req_cancel after accept [ 12.622342] # req_cancel after done: ASSERTION FAILED at net/handshake/handshake-test.c:411 [ 12.622342] Expected err == 0, but [ 12.622342] err == -16 (0xfffffffffffffff0) [ 12.623547] # req_cancel after done: pass:0 fail:1 skip:0 total:1 [ 12.623608] not ok 10 req_cancel after done [ 12.625297] # req_destroy works: ASSERTION FAILED at net/handshake/handshake-test.c:469 [ 12.625297] Expected err == 0, but [ 12.625297] err == -16 (0xfffffffffffffff0) [ 12.626633] # req_destroy works: pass:0 fail:1 skip:0 total:1 [ 12.626696] not ok 11 req_destroy works [ 12.626837] # Handshake API tests: pass:7 fail:4 skip:0 total:11 [ 12.627298] # Totals: pass:13 fail:4 skip:0 total:17 [ 12.627446] not ok 90 Handshake API tests The log is from a test with x86_64, but other architectures are affected as well. Reverting this patch fixes the problem. Bisect log is attached for reference. Guenter --- # bad: [37136bf5c3a6f6b686d74f41837a6406bec6b7bc] Add linux-next specific files for 20250113 # good: [9d89551994a430b50c4fffcb1e617a057fa76e20] Linux 6.13-rc6 git bisect start 'HEAD' 'v6.13-rc6' # good: [25dcaaf9b3bdaa117b8eb722ebde76ec9ed30038] Merge branch 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git git bisect good 25dcaaf9b3bdaa117b8eb722ebde76ec9ed30038 # bad: [c6ab5ee56509953c3ee6647ac9f266a7c628f082] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git git bisect bad c6ab5ee56509953c3ee6647ac9f266a7c628f082 # good: [39388d53c57be95eafb0ce1d81d0ec6bd2f6f42d] Merge tag 'cgroup-dmem-drm-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into drm-next git bisect good 39388d53c57be95eafb0ce1d81d0ec6bd2f6f42d # bad: [0f8b2b2250abe043cef890caa378bebe5c4f5d88] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394.git git bisect bad 0f8b2b2250abe043cef890caa378bebe5c4f5d88 # good: [67fcb0469b17071890761d437bdf83d2e2d14575] Merge branch 'spi-nor/next' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git git bisect good 67fcb0469b17071890761d437bdf83d2e2d14575 # bad: [7fd4a4e4c397fc315f8ebf0fb22e50526f66580a] Merge branch 'drm-next' of https://gitlab.freedesktop.org/agd5f/linux git bisect bad 7fd4a4e4c397fc315f8ebf0fb22e50526f66580a # good: [e2c4c6c10542ccfe4a0830bb6c9fd5b177b7bbb7] drm/amd/display: Initialize denominator defaults to 1 git bisect good e2c4c6c10542ccfe4a0830bb6c9fd5b177b7bbb7 # bad: [5b7981c1ca61ca7ad7162cfe95bf271d001d29ac] crypto: x86/aes-xts - use .irp when useful git bisect bad 5b7981c1ca61ca7ad7162cfe95bf271d001d29ac # good: [ce8fd0500b741b3669c246cc604f1f2343cdd6fd] crypto: qce - use __free() for a buffer that's always freed git bisect good ce8fd0500b741b3669c246cc604f1f2343cdd6fd # good: [5e252f490c1c2c989cdc2ca50744f30fbca356b4] crypto: tea - stop using cra_alignmask git bisect good 5e252f490c1c2c989cdc2ca50744f30fbca356b4 # good: [f916e44487f56df4827069ff3a2070c0746dc511] crypto: keywrap - remove assignment of 0 to cra_alignmask git bisect good f916e44487f56df4827069ff3a2070c0746dc511 # bad: [b9b894642fede191d50230d08608bd4f4f49f73d] crypto: lib/gf128mul - Remove some bbe deadcode git bisect bad b9b894642fede191d50230d08608bd4f4f49f73d # bad: [e1d3422c95f003eba241c176adfe593c33e8a8f6] rhashtable: Fix potential deadlock by moving schedule_work outside lock git bisect bad e1d3422c95f003eba241c176adfe593c33e8a8f6 # first bad commit: [e1d3422c95f003eba241c176adfe593c33e8a8f6] rhashtable: Fix potential deadlock by moving schedule_work outside lock ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock 2025-01-13 19:50 ` [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock Guenter Roeck @ 2025-01-14 3:23 ` Herbert Xu 0 siblings, 0 replies; 25+ messages in thread From: Herbert Xu @ 2025-01-14 3:23 UTC (permalink / raw) To: Guenter Roeck Cc: Breno Leitao, Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden, netdev, linux-kernel On Mon, Jan 13, 2025 at 11:50:36AM -0800, Guenter Roeck wrote: > > With this patch in linux-next, I get some unit test errors. Thanks! This patch should fix the problem. https://patchwork.kernel.org/project/linux-crypto/patch/Z4XWx5X0doetOJni@gondor.apana.org.au/ -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2025-01-17 13:21 UTC | newest] Thread overview: 25+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-11-28 12:16 [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock Breno Leitao 2024-11-29 5:47 ` Herbert Xu 2024-12-02 20:16 ` Breno Leitao 2024-12-03 20:16 ` Tejun Heo 2024-12-04 1:34 ` Herbert Xu 2024-12-12 12:33 ` Herbert Xu 2024-12-21 9:06 ` Herbert Xu 2025-01-02 10:15 ` Breno Leitao 2025-01-09 3:16 ` Michael Kelley 2025-01-09 10:15 ` Breno Leitao 2025-01-10 9:27 ` Herbert Xu 2025-01-10 9:49 ` Breno Leitao 2025-01-10 10:07 ` Herbert Xu 2025-01-10 14:46 ` Zaslonko Mikhail 2025-01-10 16:59 ` Michael Kelley 2025-01-10 17:24 ` [v2 PATCH] rhashtable: Fix rhashtable_try_insert test Herbert Xu 2025-01-10 18:22 ` Michael Kelley 2025-01-14 3:15 ` [v3 " Herbert Xu 2025-01-14 11:58 ` Michael Kelley 2025-01-15 15:15 ` Breno Leitao 2025-01-16 9:10 ` Herbert Xu 2025-01-16 11:48 ` Alexander Gordeev 2025-01-17 13:20 ` Zaslonko Mikhail 2025-01-13 19:50 ` [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock Guenter Roeck 2025-01-14 3:23 ` Herbert Xu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox