public inbox for netdev@vger.kernel.org
* [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock
@ 2024-11-28 12:16 Breno Leitao
  2024-11-29  5:47 ` Herbert Xu
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Breno Leitao @ 2024-11-28 12:16 UTC (permalink / raw)
  To: Andrew Morton, Thomas Graf, Herbert Xu, Tejun Heo, Hao Luo,
	Josh Don, Barret Rhoden
  Cc: netdev, linux-kernel, Breno Leitao

Move the hash table growth check and work scheduling outside the
rht lock to prevent a possible circular locking dependency.

The original implementation could trigger a lockdep warning due to
a potential deadlock scenario involving nested locks between
rhashtable bucket, rq lock, and dsq lock. By relocating the
growth check and work scheduling after releasing the rht lock, we break
this potential deadlock chain.

This change expands the flexibility of rhashtable by removing
restrictive locking that previously limited its use in scheduler
and workqueue contexts.

Important to note that this calls rht_grow_above_75(), which reads from
struct rhashtable without holding the lock. If this is a problem, we can
move the check inside the lock and schedule the workqueue after releasing it.

Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 lib/rhashtable.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 6c902639728b767cc3ee42c61256d2e9618e6ce7..5a27ccd72db9a25d92d1ed2f8d519afcfc672afe 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -585,9 +585,6 @@ static struct bucket_table *rhashtable_insert_one(
 	rht_assign_locked(bkt, obj);
 
 	atomic_inc(&ht->nelems);
-	if (rht_grow_above_75(ht, tbl))
-		schedule_work(&ht->run_work);
-
 	return NULL;
 }
 
@@ -624,6 +621,9 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
 				data = ERR_CAST(new_tbl);
 
 			rht_unlock(tbl, bkt, flags);
+			if (rht_grow_above_75(ht, tbl))
+				schedule_work(&ht->run_work);
+
 		}
 	} while (!IS_ERR_OR_NULL(new_tbl));
 

---
base-commit: 0a31ca318eea4da46e5f495c79ccc4442c92f4dc
change-id: 20241128-scx_lockdep-3fa87553609d

Best regards,
-- 
Breno Leitao <leitao@debian.org>


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock
  2024-11-28 12:16 [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock Breno Leitao
@ 2024-11-29  5:47 ` Herbert Xu
  2024-12-02 20:16   ` Breno Leitao
  2024-12-03 20:16 ` Tejun Heo
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Herbert Xu @ 2024-11-29  5:47 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don,
	Barret Rhoden, netdev, linux-kernel

On Thu, Nov 28, 2024 at 04:16:25AM -0800, Breno Leitao wrote:
> Move the hash table growth check and work scheduling outside the
> rht lock to prevent a possible circular locking dependency.
> 
> The original implementation could trigger a lockdep warning due to
> a potential deadlock scenario involving nested locks between
> rhashtable bucket, rq lock, and dsq lock. By relocating the
> growth check and work scheduling after releasing the rht lock, we break
> this potential deadlock chain.
> 
> This change expands the flexibility of rhashtable by removing
> restrictive locking that previously limited its use in scheduler
> and workqueue contexts.

Could you please explain the deadlock? Is the workqueue system actually
using rhashtable?

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock
  2024-11-29  5:47 ` Herbert Xu
@ 2024-12-02 20:16   ` Breno Leitao
  0 siblings, 0 replies; 25+ messages in thread
From: Breno Leitao @ 2024-12-02 20:16 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don,
	Barret Rhoden, netdev, linux-kernel

Hello Herbert,

On Fri, Nov 29, 2024 at 01:47:42PM +0800, Herbert Xu wrote:
> On Thu, Nov 28, 2024 at 04:16:25AM -0800, Breno Leitao wrote:
> > Move the hash table growth check and work scheduling outside the
> > rht lock to prevent a possible circular locking dependency.
> > 
> > The original implementation could trigger a lockdep warning due to
> > a potential deadlock scenario involving nested locks between
> > rhashtable bucket, rq lock, and dsq lock. By relocating the
> > growth check and work scheduling after releasing the rht lock, we break
> > this potential deadlock chain.
> > 
> > This change expands the flexibility of rhashtable by removing
> > restrictive locking that previously limited its use in scheduler
> > and workqueue contexts.
> 
> Could you please explain the deadlock?

I understand that there is a locking order inversion here:

A possible code flow can hold the rhashtable_bucket lock and then take
the dsq->lock, as shown in the following snippet:

         Chain exists of:
           rhashtable_bucket --> &rq->__lock --> &dsq->lock

The same is true when sched_ext holds the dsq->lock and tries to get
hold of the rhashtable lock.

This can be seen in the following snippets:

                rht_lock+0x69/0xd0
                destroy_dsq+0x22d/0x790
                scx_ops_disable_workfn+0x9d2/0xaf0

> Is the workqueue system actually using rhashtable?

It seems so when using the sched_ext scheduler class. For instance,
lockdep caught it in scx_ops_disable_workfn():

                rht_lock+0x69/0xd0
                destroy_dsq+0x22d/0x790
                scx_ops_disable_workfn+0x9d2/0xaf0
                kthread_worker_fn+0x137/0x350


This is the full lockdep splat, in case it helps. Sorry it is not decoded,
but it should give you the code flow.

	 ======================================================
	 WARNING: possible circular locking dependency detected

	 hardirqs last  enabled at (2088145): [<ffffffff822ab674>] _raw_spin_unlock_irq+0x24/0x50
	 hardirqs last disabled at (2088144): [<ffffffff822ab4bf>] _raw_spin_lock_irq+0x2f/0x80
	 ------------------------------------------------------
	 softirqs last  enabled at (2088116): [<ffffffff810f7294>] __irq_exit_rcu+0x74/0x100
	 sched_ext_ops_h/10451 is trying to acquire lock:
	 softirqs last disabled at (2088111): [<ffffffff810f7294>] __irq_exit_rcu+0x74/0x100
	 ffff888288059038 (rhashtable_bucket){....}-{0:0}, at: rht_lock+0x51/0xd0

	 but task is already holding lock:
	 ffff888470645698 (&dsq->lock){-.-.}-{2:2}, at: destroy_dsq+0xaf/0x790

	 which lock already depends on the new lock.


	 the existing dependency chain (in reverse order) is:

	 -> #4 (&dsq->lock){-.-.}-{2:2}:
		_raw_spin_lock+0x2f/0x40
		dispatch_enqueue+0x7c/0x3e0
		enqueue_task_scx.llvm.3416789782249720787+0x1ae/0x250
		sched_enq_and_set_task+0x5f/0xb0
		bpf_scx_reg+0xf21/0x11b0
		bpf_struct_ops_link_create+0xec/0x160
		__sys_bpf+0x34d/0x3b0
		__x64_sys_bpf+0x18/0x20
		do_syscall_64+0x7e/0x150
		entry_SYSCALL_64_after_hwframe+0x4b/0x53

	 -> #3 (&rq->__lock){-.-.}-{2:2}:
		_raw_spin_lock_nested+0x32/0x40
		raw_spin_rq_lock_nested+0x20/0x30
		task_fork_fair.llvm.5382994275699419189+0x3b/0x110
		sched_cgroup_fork+0xe3/0x100
		copy_process+0xc3c/0x14a0
		kernel_clone+0x90/0x360
		user_mode_thread+0xbc/0xe0
		rest_init+0x1f/0x1f0
		start_kernel+0x41b/0x470
		x86_64_start_reservations+0x26/0x30
		x86_64_start_kernel+0x9b/0xa0
		common_startup_64+0x13e/0x140

	 -> #2 (&p->pi_lock){-.-.}-{2:2}:
		_raw_spin_lock_irqsave+0x5a/0x90
		try_to_wake_up+0x58/0x730
		create_worker+0x1d6/0x240

		workqueue_init+0x2c0/0x390
		kernel_init_freeable+0x147/0x200
		kernel_init+0x16/0x1c0
		ret_from_fork+0x2f/0x40
		ret_from_fork_asm+0x11/0x20

	 -> #1 (&pool->lock){-.-.}-{2:2}:
		_raw_spin_lock+0x2f/0x40
		__queue_work+0x24b/0x610
		queue_work_on+0xa5/0xf0
		rhashtable_insert_slow+0x524/0x970
		__xdp_reg_mem_model+0x181/0x240
		xdp_rxq_info_reg_mem_model+0x19/0xf0
		bnxt_alloc_mem+0x1178/0x1c80
		__bnxt_open_nic+0x1bb/0xe20
		bnxt_open_nic+0x26/0x60
		ethtool_set_channels+0x1b7/0x1f0
		dev_ethtool+0x555/0x740
		dev_ioctl+0x2ac/0x3f0
		sock_do_ioctl+0x111/0x180
		sock_ioctl+0x1fb/0x2e0
		__se_sys_ioctl+0x72/0xc0
		do_syscall_64+0x7e/0x150
		entry_SYSCALL_64_after_hwframe+0x4b/0x53

	 -> #0 (rhashtable_bucket){....}-{0:0}:
		__lock_acquire+0x1742/0x3470
		lock_acquire+0xf0/0x290
		rht_lock+0x69/0xd0
		destroy_dsq+0x22d/0x790
		scx_ops_disable_workfn+0x9d2/0xaf0
		kthread_worker_fn+0x137/0x350
		kthread+0x102/0x120
		ret_from_fork+0x2f/0x40
		ret_from_fork_asm+0x11/0x20

	 other info that might help us debug this:

	 Chain exists of:
	   rhashtable_bucket --> &rq->__lock --> &dsq->lock

	  Possible unsafe locking scenario:

		CPU0                    CPU1
		----                    ----
	   lock(&dsq->lock);
					lock(&rq->__lock);
					lock(&dsq->lock);
	   lock(rhashtable_bucket);

	  *** DEADLOCK ***

	 5 locks held by sched_ext_ops_h/10451:
	  #0: ffffffff83695ad0 (scx_ops_enable_mutex){+.+.}-{3:3}, at: scx_ops_disable_workfn+0x111/0xaf0
	  #1: ffffffff83da13d8 (rcu_read_lock){....}-{1:2}, at: rhashtable_walk_start_check+0x1f/0x3e0
	  #2: ffffffff83da13d8 (rcu_read_lock){....}-{1:2}, at: destroy_dsq+0x30/0x790
	  #3: ffff888470645698 (&dsq->lock){-.-.}-{2:2}, at: destroy_dsq+0xaf/0x790
	  #4: ffffffff83da13d8 (rcu_read_lock){....}-{1:2}, at: destroy_dsq+0x30/0x790


* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock
  2024-11-28 12:16 [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock Breno Leitao
  2024-11-29  5:47 ` Herbert Xu
@ 2024-12-03 20:16 ` Tejun Heo
  2024-12-04  1:34   ` Herbert Xu
  2024-12-12 12:33 ` Herbert Xu
  2025-01-13 19:50 ` [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock Guenter Roeck
  3 siblings, 1 reply; 25+ messages in thread
From: Tejun Heo @ 2024-12-03 20:16 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Andrew Morton, Thomas Graf, Herbert Xu, Hao Luo, Josh Don,
	Barret Rhoden, netdev, linux-kernel

On Thu, Nov 28, 2024 at 04:16:25AM -0800, Breno Leitao wrote:
> Move the hash table growth check and work scheduling outside the
> rht lock to prevent a possible circular locking dependency.
> 
> The original implementation could trigger a lockdep warning due to
> a potential deadlock scenario involving nested locks between
> rhashtable bucket, rq lock, and dsq lock. By relocating the
> growth check and work scheduling after releasing the rht lock, we break
> this potential deadlock chain.
> 
> This change expands the flexibility of rhashtable by removing
> restrictive locking that previously limited its use in scheduler
> and workqueue contexts.
> 
> Important to note that this calls rht_grow_above_75(), which reads from
> struct rhashtable without holding the lock. If this is a problem, we can
> move the check inside the lock and schedule the workqueue after releasing it.
> 
> Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
> Suggested-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Breno Leitao <leitao@debian.org>

  Acked-by: Tejun Heo <tj@kernel.org>

This solves a possible deadlock for sched_ext and makes rhashtable more
useful, and I don't see any downsides.

Andrew, can you please pick up this one?

Thanks.

-- 
tejun


* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock
  2024-12-03 20:16 ` Tejun Heo
@ 2024-12-04  1:34   ` Herbert Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Herbert Xu @ 2024-12-04  1:34 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Breno Leitao, Andrew Morton, Thomas Graf, Hao Luo, Josh Don,
	Barret Rhoden, netdev, linux-kernel

On Tue, Dec 03, 2024 at 10:16:24AM -1000, Tejun Heo wrote:
>
> This solves a possible deadlock for sched_ext and makes rhashtable more
> useful and I don't see any downsides.
> 
> Andrew, can you please pick up this one?

I will be taking this through my tree once I've reviewed it fully.

Thanks for your patience.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock
  2024-11-28 12:16 [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock Breno Leitao
  2024-11-29  5:47 ` Herbert Xu
  2024-12-03 20:16 ` Tejun Heo
@ 2024-12-12 12:33 ` Herbert Xu
  2024-12-21  9:06   ` Herbert Xu
  2025-01-13 19:50 ` [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock Guenter Roeck
  3 siblings, 1 reply; 25+ messages in thread
From: Herbert Xu @ 2024-12-12 12:33 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don,
	Barret Rhoden, netdev, linux-kernel

On Thu, Nov 28, 2024 at 04:16:25AM -0800, Breno Leitao wrote:
>
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index 6c902639728b767cc3ee42c61256d2e9618e6ce7..5a27ccd72db9a25d92d1ed2f8d519afcfc672afe 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -585,9 +585,6 @@ static struct bucket_table *rhashtable_insert_one(
>  	rht_assign_locked(bkt, obj);
>  
>  	atomic_inc(&ht->nelems);
> -	if (rht_grow_above_75(ht, tbl))
> -		schedule_work(&ht->run_work);
> -
>  	return NULL;
>  }
>  
> @@ -624,6 +621,9 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
>  				data = ERR_CAST(new_tbl);
>  
>  			rht_unlock(tbl, bkt, flags);
> +			if (rht_grow_above_75(ht, tbl))
> +				schedule_work(&ht->run_work);
> +

The growth check should stay with the atomic_inc.  Something like
this should work:

			if (PTR_ERR(data) == -ENOENT && !new_tbl) {
				atomic_inc(&ht->nelems);
				if (rht_grow_above_75(ht, tbl))
					schedule_work(&ht->run_work);
				break;
			}

Could you please resend this via linux-crypto?

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock
  2024-12-12 12:33 ` Herbert Xu
@ 2024-12-21  9:06   ` Herbert Xu
  2025-01-02 10:15     ` Breno Leitao
  0 siblings, 1 reply; 25+ messages in thread
From: Herbert Xu @ 2024-12-21  9:06 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don,
	Barret Rhoden, netdev, linux-kernel

On Thu, Dec 12, 2024 at 08:33:31PM +0800, Herbert Xu wrote:
>
> The growth check should stay with the atomic_inc.  Something like
> this should work:

OK I've applied your patch with the atomic_inc move.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock
  2024-12-21  9:06   ` Herbert Xu
@ 2025-01-02 10:15     ` Breno Leitao
  2025-01-09  3:16       ` Michael Kelley
  0 siblings, 1 reply; 25+ messages in thread
From: Breno Leitao @ 2025-01-02 10:15 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don,
	Barret Rhoden, netdev, linux-kernel

On Sat, Dec 21, 2024 at 05:06:55PM +0800, Herbert Xu wrote:
> On Thu, Dec 12, 2024 at 08:33:31PM +0800, Herbert Xu wrote:
> >
> > The growth check should stay with the atomic_inc.  Something like
> > this should work:
> 
> OK I've applied your patch with the atomic_inc move.

Sorry, I was on vacation, and I am back now. Let me know if you need
anything further.

Thanks for fixing it,
--breno


* RE: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock
  2025-01-02 10:15     ` Breno Leitao
@ 2025-01-09  3:16       ` Michael Kelley
  2025-01-09 10:15         ` Breno Leitao
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Kelley @ 2025-01-09  3:16 UTC (permalink / raw)
  To: Breno Leitao, Herbert Xu, saeedm@nvidia.com, tariqt@nvidia.com,
	linux-hyperv@vger.kernel.org
  Cc: Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo, Josh Don,
	Barret Rhoden, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org

From: Breno Leitao <leitao@debian.org> Sent: Thursday, January 2, 2025 2:16 AM
> 
> On Sat, Dec 21, 2024 at 05:06:55PM +0800, Herbert Xu wrote:
> > On Thu, Dec 12, 2024 at 08:33:31PM +0800, Herbert Xu wrote:
> > >
> > > The growth check should stay with the atomic_inc.  Something like
> > > this should work:
> >
> > OK I've applied your patch with the atomic_inc move.
> 
> Sorry, I was on vacation, and I am back now. Let me know if you need
> anything further.
> 
> Thanks for fixing it,
> --breno

Breno and Herbert --

This patch seems to break things in linux-next. I'm testing with
linux-next20250108 in a VM in the Azure public cloud. The Mellanox mlx5
ethernet NIC in the VM is failing to get set up.

I bisected to commit e1d3422c95f0 ("rhashtable: Fix potential deadlock
by moving schedule_work outside lock"), then debugged why opening
the mlx5 NIC device is failing. The failure is in the XDP code in function
__xdp_reg_mem_model() where the call to rhashtable_insert_slow()
is returning -E2BIG. The problem does not occur when the commit
is reverted.

The function call stack is this:

dev_open()
__dev_open()
mlx5e_open()
mlx5e_open_locked()
mlx5e_open_channels()
mlx5e_open_channel()
mlx5e_open_queues()
mlx5e_open_rxq_rq()
mlx5e_open_rq()
mlx5e_alloc_rq()
xdp_rxq_info_reg_mem_model()
__xdp_reg_mem_model()
rhashtable_insert_slow()

I have not debugged further as I don't know anything about the
rhashtable code or the XDP code. The only repro I have is a VM
in Azure. I thought I'd ask you (Breno and Herbert) to review
the patch again and see if there's a path that could cause the
hash table to be incorrectly detected as full.

I've included the linux-hyperv mailing list and the mlx5 driver
maintainers on this email. Someone involved with Azure/Hyper-V
or the mlx5 driver may have seen the problem, and I want to try
to avoid duplicative debugging.

Let me know if there's something I can do to help debug further.

Thanks,

Michael Kelley


* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock
  2025-01-09  3:16       ` Michael Kelley
@ 2025-01-09 10:15         ` Breno Leitao
  2025-01-10  9:27           ` Herbert Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Breno Leitao @ 2025-01-09 10:15 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Herbert Xu, saeedm@nvidia.com, tariqt@nvidia.com,
	linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf,
	Tejun Heo, Hao Luo, Josh Don, Barret Rhoden,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org

Hello Michael,

On Thu, Jan 09, 2025 at 03:16:03AM +0000, Michael Kelley wrote:
> From: Breno Leitao <leitao@debian.org> Sent: Thursday, January 2, 2025 2:16 AM
> > 
> > On Sat, Dec 21, 2024 at 05:06:55PM +0800, Herbert Xu wrote:
> > > On Thu, Dec 12, 2024 at 08:33:31PM +0800, Herbert Xu wrote:
> > > >
> > > > The growth check should stay with the atomic_inc.  Something like
> > > > this should work:
> > >
> > > OK I've applied your patch with the atomic_inc move.
> > 
> > Sorry, I was on vacation, and I am back now. Let me know if you need
> > anything further.
> > 
> > Thanks for fixing it,
> > --breno
> 
> Breno and Herbert --
> 
> This patch seems to break things in linux-next. I'm testing with
> linux-next20250108 in a VM in the Azure public cloud. The Mellanox mlx5
> ethernet NIC in the VM is failing to get setup.

Thanks for reporting the issue. I started rolling this patch out to Meta's
fleet, and we started seeing a similar problem, although it is not fully
understood yet.

I would suggest we revert this patch until we investigate further. I'll
prepare and send a revert patch shortly.

Sorry for the noise,
--breno


* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock
  2025-01-09 10:15         ` Breno Leitao
@ 2025-01-10  9:27           ` Herbert Xu
  2025-01-10  9:49             ` Breno Leitao
                               ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Herbert Xu @ 2025-01-10  9:27 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Michael Kelley, saeedm@nvidia.com, tariqt@nvidia.com,
	linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf,
	Tejun Heo, Hao Luo, Josh Don, Barret Rhoden,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org

On Thu, Jan 09, 2025 at 02:15:17AM -0800, Breno Leitao wrote:
>
> I would suggest we revert this patch until we investigate further. I'll
> prepare and send a revert patch shortly.

Sorry, I think it was my addition that broke things.  The condition
for checking whether an entry is inserted is incorrect, thus resulting
in an underflow of the number of entries after entry removal.

Please test this patch:

---8<---
The function rhashtable_insert_one only returns NULL iff the
insertion was successful, so that alone should be tested before
increment nelems.  Testing the variable data is redundant, and
buggy because we will have overwritten the original value of data
by this point.

Reported-by: Michael Kelley <mhklinux@outlook.com>
Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index bf956b85455a..e196b6f0e35a 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -621,7 +621,7 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
 
 			rht_unlock(tbl, bkt, flags);
 
-			if (PTR_ERR(data) == -ENOENT && !new_tbl) {
+			if (!new_tbl) {
 				atomic_inc(&ht->nelems);
 				if (rht_grow_above_75(ht, tbl))
 					schedule_work(&ht->run_work);
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock
  2025-01-10  9:27           ` Herbert Xu
@ 2025-01-10  9:49             ` Breno Leitao
  2025-01-10 10:07               ` Herbert Xu
  2025-01-10 14:46             ` Zaslonko Mikhail
  2025-01-10 16:59             ` Michael Kelley
  2 siblings, 1 reply; 25+ messages in thread
From: Breno Leitao @ 2025-01-10  9:49 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Michael Kelley, saeedm@nvidia.com, tariqt@nvidia.com,
	linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf,
	Tejun Heo, Hao Luo, Josh Don, Barret Rhoden,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org

Hello Herbert,

On Fri, Jan 10, 2025 at 05:27:49PM +0800, Herbert Xu wrote:

> Sorry, I think it was my addition that broke things.  The condition
> for checking whether an entry is inserted is incorrect, thus resulting
> in an underflow of the number of entries after entry removal.

That is what I thought originally as well, but I was not convinced. While
reading the code, I understood that, if new_tbl is not NULL, then
PTR_ERR(data) will be -ENOENT.

In which case `new_tbl` will not be NULL, and PTR_ERR(data) != -ENOENT?

Thanks for solving it.

> Please test this patch:

I don't have an easy reproducer yet, but, I will get this patch in ~50
hosts and see if any of them misbehave during the weekend.

Misbehaving in my case is strongly associated with messages like the
following being printed:

	kobject_uevent: unable to create netlink socket!


Thanks
--breno


* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock
  2025-01-10  9:49             ` Breno Leitao
@ 2025-01-10 10:07               ` Herbert Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Herbert Xu @ 2025-01-10 10:07 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Michael Kelley, saeedm@nvidia.com, tariqt@nvidia.com,
	linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf,
	Tejun Heo, Hao Luo, Josh Don, Barret Rhoden,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org

On Fri, Jan 10, 2025 at 01:49:44AM -0800, Breno Leitao wrote:
>
> That is what I thought originally as well, but I was not convinced. While
> reading the code, I understood that, if new_tbl is not NULL, then
> PTR_ERR(data) will be -ENOENT.
> 
> In which case `new_tbl` will not be NULL, and PTR_ERR(data) != -ENOENT?

The bug arises when an insertion succeeds.  So new_tbl is NULL.
The original value of data should have been -ENOENT; however, it gets
overwritten after rhashtable_insert_one returns (data is now NULL).

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock
  2025-01-10  9:27           ` Herbert Xu
  2025-01-10  9:49             ` Breno Leitao
@ 2025-01-10 14:46             ` Zaslonko Mikhail
  2025-01-10 16:59             ` Michael Kelley
  2 siblings, 0 replies; 25+ messages in thread
From: Zaslonko Mikhail @ 2025-01-10 14:46 UTC (permalink / raw)
  To: Herbert Xu, Breno Leitao
  Cc: Michael Kelley, saeedm@nvidia.com, tariqt@nvidia.com,
	linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf,
	Tejun Heo, Hao Luo, Josh Don, Barret Rhoden,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Heiko Carstens, Alexander Gordeev

Herbert and Breno,

On 10.01.2025 10:27, Herbert Xu wrote:
> On Thu, Jan 09, 2025 at 02:15:17AM -0800, Breno Leitao wrote:
> 
> Reported-by: Michael Kelley <mhklinux@outlook.com>
> Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock")
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index bf956b85455a..e196b6f0e35a 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -621,7 +621,7 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
>  
>  			rht_unlock(tbl, bkt, flags);
>  
> -			if (PTR_ERR(data) == -ENOENT && !new_tbl) {
> +			if (!new_tbl) {
>  				atomic_inc(&ht->nelems);
>  				if (rht_grow_above_75(ht, tbl))
>  					schedule_work(&ht->run_work);

I'd like to let you know that I was getting an OOM failure on s390 when booting
the kernel (linux-next20250109) with limited memory (mem=1G kernel parameter).
The problem took place in both zVM and LPAR environments. Bisecting also revealed
commit e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving
schedule_work outside lock").
Afterwards, I tried the fix from Herbert above and the error no longer
appears, so it seems to resolve the issue.

Thanks,
Mikhail



* RE: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock
  2025-01-10  9:27           ` Herbert Xu
  2025-01-10  9:49             ` Breno Leitao
  2025-01-10 14:46             ` Zaslonko Mikhail
@ 2025-01-10 16:59             ` Michael Kelley
  2025-01-10 17:24               ` [v2 PATCH] rhashtable: Fix rhashtable_try_insert test Herbert Xu
  2 siblings, 1 reply; 25+ messages in thread
From: Michael Kelley @ 2025-01-10 16:59 UTC (permalink / raw)
  To: Herbert Xu, Breno Leitao
  Cc: saeedm@nvidia.com, tariqt@nvidia.com,
	linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf,
	Tejun Heo, Hao Luo, Josh Don, Barret Rhoden,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org

From: Herbert Xu <herbert@gondor.apana.org.au> Sent: Friday, January 10, 2025 1:28 AM
> 
> On Thu, Jan 09, 2025 at 02:15:17AM -0800, Breno Leitao wrote:
> >
> > I would suggest we revert this patch until we investigate further. I'll
> > prepare and send a revert patch shortly.
> 
> Sorry, I think it was my addition that broke things.  The condition
> for checking whether an entry is inserted is incorrect, thus resulting
> in an underflow of the number of entries after entry removal.
> 
> Please test this patch:
> 
> ---8<---
> The function rhashtable_insert_one only returns NULL iff the
> insertion was successful, so that alone should be tested before
> incrementing nelems.  Testing the variable data is redundant, and
> buggy because we will have overwritten the original value of data
> by this point.
> 
> Reported-by: Michael Kelley <mhklinux@outlook.com>
> Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work
> outside lock")
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index bf956b85455a..e196b6f0e35a 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -621,7 +621,7 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const
> void *key,
> 
>  			rht_unlock(tbl, bkt, flags);
> 
> -			if (PTR_ERR(data) == -ENOENT && !new_tbl) {
> +			if (!new_tbl) {
>  				atomic_inc(&ht->nelems);
>  				if (rht_grow_above_75(ht, tbl))
>  					schedule_work(&ht->run_work);
> --

This patch fixes the problem I saw with VMs in the Azure cloud.  Thanks!

Michael Kelley


* [v2 PATCH] rhashtable: Fix rhashtable_try_insert test
  2025-01-10 16:59             ` Michael Kelley
@ 2025-01-10 17:24               ` Herbert Xu
  2025-01-10 18:22                 ` Michael Kelley
  0 siblings, 1 reply; 25+ messages in thread
From: Herbert Xu @ 2025-01-10 17:24 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Breno Leitao, saeedm@nvidia.com, tariqt@nvidia.com,
	linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf,
	Tejun Heo, Hao Luo, Josh Don, Barret Rhoden,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org

On Fri, Jan 10, 2025 at 04:59:28PM +0000, Michael Kelley wrote:
>
> This patch fixes the problem I saw with VMs in the Azure cloud.  Thanks!

Sorry, but the test on data is needed after all (it was just
buggy).  Otherwise we will break rhlist.  So please test this
patch instead.

---8<---
The test on whether rhashtable_insert_one did an insertion relies
on the value returned by rhashtable_lookup_one.  Unfortunately that
value is overwritten after rhashtable_insert_one returns.  Fix this
by saving the old value.

Also simplify the test as only data == NULL matters.

Reported-by: Michael Kelley <mhklinux@outlook.com>
Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index bf956b85455a..e36b36f3146d 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -611,17 +611,20 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
 			new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
 			data = ERR_PTR(-EAGAIN);
 		} else {
+			void *odata;
+
 			flags = rht_lock(tbl, bkt);
 			data = rhashtable_lookup_one(ht, bkt, tbl,
 						     hash, key, obj);
 			new_tbl = rhashtable_insert_one(ht, bkt, tbl,
 							hash, obj, data);
+			odata = data;
 			if (PTR_ERR(new_tbl) != -EEXIST)
 				data = ERR_CAST(new_tbl);
 
 			rht_unlock(tbl, bkt, flags);
 
-			if (PTR_ERR(data) == -ENOENT && !new_tbl) {
+			if (odata && !new_tbl) {
 				atomic_inc(&ht->nelems);
 				if (rht_grow_above_75(ht, tbl))
 					schedule_work(&ht->run_work);
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* RE: [v2 PATCH] rhashtable: Fix rhashtable_try_insert test
  2025-01-10 17:24               ` [v2 PATCH] rhashtable: Fix rhashtable_try_insert test Herbert Xu
@ 2025-01-10 18:22                 ` Michael Kelley
  2025-01-14  3:15                   ` [v3 " Herbert Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Kelley @ 2025-01-10 18:22 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Breno Leitao, saeedm@nvidia.com, tariqt@nvidia.com,
	linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf,
	Tejun Heo, Hao Luo, Josh Don, Barret Rhoden,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org

From: Herbert Xu <herbert@gondor.apana.org.au> Sent: Friday, January 10, 2025 9:24 AM
> 
> On Fri, Jan 10, 2025 at 04:59:28PM +0000, Michael Kelley wrote:
> >
> > This patch fixes the problem I saw with VMs in the Azure cloud.  Thanks!
> 
> Sorry, but the test on data is needed after all (it was just
> buggy).  Otherwise we will break rhlist.  So please test this
> patch instead.
> 
> ---8<---
> The test on whether rhashtable_insert_one did an insertion relies
> on the value returned by rhashtable_lookup_one.  Unfortunately that
> value is overwritten after rhashtable_insert_one returns.  Fix this
> by saving the old value.
> 
> Also simplify the test as only data == NULL matters.
> 
> Reported-by: Michael Kelley <mhklinux@outlook.com>
> Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock")
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index bf956b85455a..e36b36f3146d 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -611,17 +611,20 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
>  			new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
>  			data = ERR_PTR(-EAGAIN);
>  		} else {
> +			void *odata;
> +
>  			flags = rht_lock(tbl, bkt);
>  			data = rhashtable_lookup_one(ht, bkt, tbl,
>  						     hash, key, obj);
>  			new_tbl = rhashtable_insert_one(ht, bkt, tbl,
>  							hash, obj, data);
> +			odata = data;
>  			if (PTR_ERR(new_tbl) != -EEXIST)
>  				data = ERR_CAST(new_tbl);
> 
>  			rht_unlock(tbl, bkt, flags);
> 
> -			if (PTR_ERR(data) == -ENOENT && !new_tbl) {
> +			if (odata && !new_tbl) {
>  				atomic_inc(&ht->nelems);
>  				if (rht_grow_above_75(ht, tbl))
>  					schedule_work(&ht->run_work);

This patch passes my tests. I'm doing a narrow test to verify that
the boot failure when opening the Mellanox NIC is no longer occurring.
I also unloaded/reloaded the mlx5 driver a couple of times. For good
measure, I then did a full Linux kernel build, and all is good. My testing
does not broadly verify correct operation of rhashtable except as it
gets exercised implicitly by these basic tests.

Michael


* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock
  2024-11-28 12:16 [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock Breno Leitao
                   ` (2 preceding siblings ...)
  2024-12-12 12:33 ` Herbert Xu
@ 2025-01-13 19:50 ` Guenter Roeck
  2025-01-14  3:23   ` Herbert Xu
  3 siblings, 1 reply; 25+ messages in thread
From: Guenter Roeck @ 2025-01-13 19:50 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Andrew Morton, Thomas Graf, Herbert Xu, Tejun Heo, Hao Luo,
	Josh Don, Barret Rhoden, netdev, linux-kernel

Hi,

On Thu, Nov 28, 2024 at 04:16:25AM -0800, Breno Leitao wrote:
> Move the hash table growth check and work scheduling outside the
> rht lock to prevent a possible circular locking dependency.
> 
> The original implementation could trigger a lockdep warning due to
> a potential deadlock scenario involving nested locks between
> rhashtable bucket, rq lock, and dsq lock. By relocating the
> growth check and work scheduling after releasing the rth lock, we break
> this potential deadlock chain.
> 
> This change expands the flexibility of rhashtable by removing
> restrictive locking that previously limited its use in scheduler
> and workqueue contexts.
> 
> Important to note that this calls rht_grow_above_75(), which reads from
> struct rhashtable without holding the lock. If this is a problem, we can
> move the check inside the lock, and schedule the workqueue after the
> lock is released.
> 
> Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
> Suggested-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Breno Leitao <leitao@debian.org>

With this patch in linux-next, I get some unit test errors.

[    3.800185]     # Subtest: hw_breakpoint
[    3.800469]     # module: hw_breakpoint_test
[    3.800718]     1..9
[    3.810825]     # test_one_cpu: pass:1 fail:0 skip:0 total:1
[    3.810950]     ok 1 test_one_cpu
[    3.814941]     # test_many_cpus: pass:1 fail:0 skip:0 total:1
[    3.815092]     ok 2 test_many_cpus
[    3.822977]     # test_one_task_on_all_cpus: pass:1 fail:0 skip:0 total:1
[    3.823100]     ok 3 test_one_task_on_all_cpus
[    3.829071]     # test_two_tasks_on_all_cpus: pass:1 fail:0 skip:0 total:1
[    3.829199]     ok 4 test_two_tasks_on_all_cpus
[    3.830914]     # test_one_task_on_one_cpu: ASSERTION FAILED at kernel/events/hw_breakpoint_test.c:70
[    3.830914]     Expected IS_ERR(bp) to be false, but is true
[    3.832572]     # test_one_task_on_one_cpu: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320
[    3.832572]     Expected hw_breakpoint_is_used() to be false, but is true
[    3.833002]     # test_one_task_on_one_cpu: pass:0 fail:1 skip:0 total:1
[    3.833071]     not ok 5 test_one_task_on_one_cpu
[    3.834994]     # test_one_task_mixed: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320
[    3.834994]     Expected hw_breakpoint_is_used() to be false, but is true
[    3.835583]     # test_one_task_mixed: pass:0 fail:1 skip:0 total:1
[    3.835638]     not ok 6 test_one_task_mixed
[    3.837131]     # test_two_tasks_on_one_cpu: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320
[    3.837131]     Expected hw_breakpoint_is_used() to be false, but is true
[    3.837827]     # test_two_tasks_on_one_cpu: pass:0 fail:1 skip:0 total:1
[    3.837882]     not ok 7 test_two_tasks_on_one_cpu
[    3.839868]     # test_two_tasks_on_one_all_cpus: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320
[    3.839868]     Expected hw_breakpoint_is_used() to be false, but is true
[    3.840294]     # test_two_tasks_on_one_all_cpus: pass:0 fail:1 skip:0 total:1
[    3.840538]     not ok 8 test_two_tasks_on_one_all_cpus
[    3.843599]     # test_task_on_all_and_one_cpu: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320
[    3.843599]     Expected hw_breakpoint_is_used() to be false, but is true
[    3.844163]     # test_task_on_all_and_one_cpu: pass:0 fail:1 skip:0 total:1
[    3.844215]     not ok 9 test_task_on_all_and_one_cpu
[    3.844453] # hw_breakpoint: pass:4 fail:5 skip:0 total:9
[    3.844610] # Totals: pass:4 fail:5 skip:0 total:9
[    3.844797] not ok 1 hw_breakpoint

Sometimes I also see:

[   12.579842]     # Subtest: Handshake API tests
[   12.579971]     1..11
[   12.580052]         KTAP version 1
[   12.580410]         # Subtest: req_alloc API fuzzing
[   12.582206]         ok 1 handshake_req_alloc NULL proto
[   12.583541]         ok 2 handshake_req_alloc CLASS_NONE
[   12.585419]         ok 3 handshake_req_alloc CLASS_MAX
[   12.587291]         ok 4 handshake_req_alloc no callbacks
[   12.589239]         ok 5 handshake_req_alloc no done callback
[   12.590758]         ok 6 handshake_req_alloc excessive privsize
[   12.592642]         ok 7 handshake_req_alloc all good
[   12.592802]     # req_alloc API fuzzing: pass:7 fail:0 skip:0 total:7
[   12.593185]     ok 1 req_alloc API fuzzing
[   12.597371]     # req_submit NULL req arg: pass:1 fail:0 skip:0 total:1
[   12.597501]     ok 2 req_submit NULL req arg
[   12.599208]     # req_submit NULL sock arg: pass:1 fail:0 skip:0 total:1
[   12.599338]     ok 3 req_submit NULL sock arg
[   12.601549]     # req_submit NULL sock->file: pass:1 fail:0 skip:0 total:1
[   12.601680]     ok 4 req_submit NULL sock->file
[   12.605334]     # req_lookup works: pass:1 fail:0 skip:0 total:1
[   12.605469]     ok 5 req_lookup works
[   12.609596]     # req_submit max pending: pass:1 fail:0 skip:0 total:1
[   12.609730]     ok 6 req_submit max pending
[   12.613796]     # req_submit multiple: pass:1 fail:0 skip:0 total:1
[   12.614250]     ok 7 req_submit multiple
[   12.616395]     # req_cancel before accept: ASSERTION FAILED at net/handshake/handshake-test.c:333
[   12.616395]     Expected err == 0, but
[   12.616395]         err == -16 (0xfffffffffffffff0)
[   12.618061]     # req_cancel before accept: pass:0 fail:1 skip:0 total:1
[   12.618135]     not ok 8 req_cancel before accept
[   12.619437]     # req_cancel after accept: ASSERTION FAILED at net/handshake/handshake-test.c:369
[   12.619437]     Expected err == 0, but
[   12.619437]         err == -16 (0xfffffffffffffff0)
[   12.621055]     # req_cancel after accept: pass:0 fail:1 skip:0 total:1
[   12.621119]     not ok 9 req_cancel after accept
[   12.622342]     # req_cancel after done: ASSERTION FAILED at net/handshake/handshake-test.c:411
[   12.622342]     Expected err == 0, but
[   12.622342]         err == -16 (0xfffffffffffffff0)
[   12.623547]     # req_cancel after done: pass:0 fail:1 skip:0 total:1
[   12.623608]     not ok 10 req_cancel after done
[   12.625297]     # req_destroy works: ASSERTION FAILED at net/handshake/handshake-test.c:469
[   12.625297]     Expected err == 0, but
[   12.625297]         err == -16 (0xfffffffffffffff0)
[   12.626633]     # req_destroy works: pass:0 fail:1 skip:0 total:1
[   12.626696]     not ok 11 req_destroy works
[   12.626837] # Handshake API tests: pass:7 fail:4 skip:0 total:11
[   12.627298] # Totals: pass:13 fail:4 skip:0 total:17
[   12.627446] not ok 90 Handshake API tests

The log is from a test with x86_64, but other architectures are affected
as well. Reverting this patch fixes the problem.

Bisect log is attached for reference.

Guenter

---
# bad: [37136bf5c3a6f6b686d74f41837a6406bec6b7bc] Add linux-next specific files for 20250113
# good: [9d89551994a430b50c4fffcb1e617a057fa76e20] Linux 6.13-rc6
git bisect start 'HEAD' 'v6.13-rc6'
# good: [25dcaaf9b3bdaa117b8eb722ebde76ec9ed30038] Merge branch 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
git bisect good 25dcaaf9b3bdaa117b8eb722ebde76ec9ed30038
# bad: [c6ab5ee56509953c3ee6647ac9f266a7c628f082] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git
git bisect bad c6ab5ee56509953c3ee6647ac9f266a7c628f082
# good: [39388d53c57be95eafb0ce1d81d0ec6bd2f6f42d] Merge tag 'cgroup-dmem-drm-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into drm-next
git bisect good 39388d53c57be95eafb0ce1d81d0ec6bd2f6f42d
# bad: [0f8b2b2250abe043cef890caa378bebe5c4f5d88] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394.git
git bisect bad 0f8b2b2250abe043cef890caa378bebe5c4f5d88
# good: [67fcb0469b17071890761d437bdf83d2e2d14575] Merge branch 'spi-nor/next' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git
git bisect good 67fcb0469b17071890761d437bdf83d2e2d14575
# bad: [7fd4a4e4c397fc315f8ebf0fb22e50526f66580a] Merge branch 'drm-next' of https://gitlab.freedesktop.org/agd5f/linux
git bisect bad 7fd4a4e4c397fc315f8ebf0fb22e50526f66580a
# good: [e2c4c6c10542ccfe4a0830bb6c9fd5b177b7bbb7] drm/amd/display: Initialize denominator defaults to 1
git bisect good e2c4c6c10542ccfe4a0830bb6c9fd5b177b7bbb7
# bad: [5b7981c1ca61ca7ad7162cfe95bf271d001d29ac] crypto: x86/aes-xts - use .irp when useful
git bisect bad 5b7981c1ca61ca7ad7162cfe95bf271d001d29ac
# good: [ce8fd0500b741b3669c246cc604f1f2343cdd6fd] crypto: qce - use __free() for a buffer that's always freed
git bisect good ce8fd0500b741b3669c246cc604f1f2343cdd6fd
# good: [5e252f490c1c2c989cdc2ca50744f30fbca356b4] crypto: tea - stop using cra_alignmask
git bisect good 5e252f490c1c2c989cdc2ca50744f30fbca356b4
# good: [f916e44487f56df4827069ff3a2070c0746dc511] crypto: keywrap - remove assignment of 0 to cra_alignmask
git bisect good f916e44487f56df4827069ff3a2070c0746dc511
# bad: [b9b894642fede191d50230d08608bd4f4f49f73d] crypto: lib/gf128mul - Remove some bbe deadcode
git bisect bad b9b894642fede191d50230d08608bd4f4f49f73d
# bad: [e1d3422c95f003eba241c176adfe593c33e8a8f6] rhashtable: Fix potential deadlock by moving schedule_work outside lock
git bisect bad e1d3422c95f003eba241c176adfe593c33e8a8f6
# first bad commit: [e1d3422c95f003eba241c176adfe593c33e8a8f6] rhashtable: Fix potential deadlock by moving schedule_work outside lock


* [v3 PATCH] rhashtable: Fix rhashtable_try_insert test
  2025-01-10 18:22                 ` Michael Kelley
@ 2025-01-14  3:15                   ` Herbert Xu
  2025-01-14 11:58                     ` Michael Kelley
                                       ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Herbert Xu @ 2025-01-14  3:15 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Breno Leitao, saeedm@nvidia.com, tariqt@nvidia.com,
	linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf,
	Tejun Heo, Hao Luo, Josh Don, Barret Rhoden,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Linux Crypto Mailing List

On Fri, Jan 10, 2025 at 06:22:40PM +0000, Michael Kelley wrote:
>
> This patch passes my tests. I'm doing a narrow test to verify that
> the boot failure when opening the Mellanox NIC is no longer occurring.
> I also unloaded/reloaded the mlx5 driver a couple of times. For good
> measure, I then did a full Linux kernel build, and all is good. My testing
> does not broadly verify correct operation of rhashtable except as it
> gets exercised implicitly by these basic tests.

Thanks for testing! The patch needs one more change though as
moving the atomic_inc outside of the lock was a bad idea on my
part.  This could cause atomic_inc/atomic_dec to be reordered
thus resulting in an underflow.

Thanks,

---8<---
The test on whether rhashtable_insert_one did an insertion relies
on the value returned by rhashtable_lookup_one.  Unfortunately that
value is overwritten after rhashtable_insert_one returns.  Fix this
by moving the test before data gets overwritten.

Simplify the test as only data == NULL matters.

Finally move atomic_inc back within the lock as otherwise it may
be reordered with the atomic_dec on the removal side, potentially
leading to an underflow.

Reported-by: Michael Kelley <mhklinux@outlook.com>
Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index bf956b85455a..0e9a1d4cf89b 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -611,21 +611,23 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
 			new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
 			data = ERR_PTR(-EAGAIN);
 		} else {
+			bool inserted;
+
 			flags = rht_lock(tbl, bkt);
 			data = rhashtable_lookup_one(ht, bkt, tbl,
 						     hash, key, obj);
 			new_tbl = rhashtable_insert_one(ht, bkt, tbl,
 							hash, obj, data);
+			inserted = data && !new_tbl;
+			if (inserted)
+				atomic_inc(&ht->nelems);
 			if (PTR_ERR(new_tbl) != -EEXIST)
 				data = ERR_CAST(new_tbl);
 
 			rht_unlock(tbl, bkt, flags);
 
-			if (PTR_ERR(data) == -ENOENT && !new_tbl) {
-				atomic_inc(&ht->nelems);
-				if (rht_grow_above_75(ht, tbl))
-					schedule_work(&ht->run_work);
-			}
+			if (inserted && rht_grow_above_75(ht, tbl))
+				schedule_work(&ht->run_work);
 		}
 	} while (!IS_ERR_OR_NULL(new_tbl));
 
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock
  2025-01-13 19:50 ` [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock Guenter Roeck
@ 2025-01-14  3:23   ` Herbert Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Herbert Xu @ 2025-01-14  3:23 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Breno Leitao, Andrew Morton, Thomas Graf, Tejun Heo, Hao Luo,
	Josh Don, Barret Rhoden, netdev, linux-kernel

On Mon, Jan 13, 2025 at 11:50:36AM -0800, Guenter Roeck wrote:
>
> With this patch in linux-next, I get some unit test errors.

Thanks! This patch should fix the problem.

https://patchwork.kernel.org/project/linux-crypto/patch/Z4XWx5X0doetOJni@gondor.apana.org.au/
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* RE: [v3 PATCH] rhashtable: Fix rhashtable_try_insert test
  2025-01-14  3:15                   ` [v3 " Herbert Xu
@ 2025-01-14 11:58                     ` Michael Kelley
  2025-01-15 15:15                     ` Breno Leitao
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 25+ messages in thread
From: Michael Kelley @ 2025-01-14 11:58 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Breno Leitao, saeedm@nvidia.com, tariqt@nvidia.com,
	linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf,
	Tejun Heo, Hao Luo, Josh Don, Barret Rhoden,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Linux Crypto Mailing List

From: Herbert Xu <herbert@gondor.apana.org.au> Sent: Monday, January 13, 2025 7:15 PM
> 
> On Fri, Jan 10, 2025 at 06:22:40PM +0000, Michael Kelley wrote:
> >
> > This patch passes my tests. I'm doing a narrow test to verify that
> > the boot failure when opening the Mellanox NIC is no longer occurring.
> > I also unloaded/reloaded the mlx5 driver a couple of times. For good
> > measure, I then did a full Linux kernel build, and all is good. My testing
> > does not broadly verify correct operation of rhashtable except as it
> > gets exercised implicitly by these basic tests.
> 
> Thanks for testing! The patch needs one more change though as
> moving the atomic_inc outside of the lock was a bad idea on my
> part.  This could cause atomic_inc/atomic_dec to be reordered
> thus resulting in an underflow.
> 
> Thanks,

I've tested this version of the fix in the same limited way as
before. All is good. I'm still testing against linux-next20250108
from a few days ago, but that should not make any difference.
I have *not* reviewed the code change itself.

Tested-by: Michael Kelley <mhklinux@outlook.com>

> 
> ---8<---
> The test on whether rhashtable_insert_one did an insertion relies
> on the value returned by rhashtable_lookup_one.  Unfortunately that
> value is overwritten after rhashtable_insert_one returns.  Fix this
> by moving the test before data gets overwritten.
> 
> Simplify the test as only data == NULL matters.
> 
> Finally move atomic_inc back within the lock as otherwise it may
> be reordered with the atomic_dec on the removal side, potentially
> leading to an underflow.
> 
> Reported-by: Michael Kelley <mhklinux@outlook.com>
> Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock")
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index bf956b85455a..0e9a1d4cf89b 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -611,21 +611,23 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
>  			new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
>  			data = ERR_PTR(-EAGAIN);
>  		} else {
> +			bool inserted;
> +
>  			flags = rht_lock(tbl, bkt);
>  			data = rhashtable_lookup_one(ht, bkt, tbl,
>  						     hash, key, obj);
>  			new_tbl = rhashtable_insert_one(ht, bkt, tbl,
>  							hash, obj, data);
> +			inserted = data && !new_tbl;
> +			if (inserted)
> +				atomic_inc(&ht->nelems);
>  			if (PTR_ERR(new_tbl) != -EEXIST)
>  				data = ERR_CAST(new_tbl);
> 
>  			rht_unlock(tbl, bkt, flags);
> 
> -			if (PTR_ERR(data) == -ENOENT && !new_tbl) {
> -				atomic_inc(&ht->nelems);
> -				if (rht_grow_above_75(ht, tbl))
> -					schedule_work(&ht->run_work);
> -			}
> +			if (inserted && rht_grow_above_75(ht, tbl))
> +				schedule_work(&ht->run_work);
>  		}
>  	} while (!IS_ERR_OR_NULL(new_tbl));
> 


* Re: [v3 PATCH] rhashtable: Fix rhashtable_try_insert test
  2025-01-14  3:15                   ` [v3 " Herbert Xu
  2025-01-14 11:58                     ` Michael Kelley
@ 2025-01-15 15:15                     ` Breno Leitao
  2025-01-16  9:10                       ` Herbert Xu
  2025-01-16 11:48                     ` Alexander Gordeev
  2025-01-17 13:20                     ` Zaslonko Mikhail
  3 siblings, 1 reply; 25+ messages in thread
From: Breno Leitao @ 2025-01-15 15:15 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Michael Kelley, saeedm@nvidia.com, tariqt@nvidia.com,
	linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf,
	Tejun Heo, Hao Luo, Josh Don, Barret Rhoden,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Linux Crypto Mailing List

On Tue, Jan 14, 2025 at 11:15:19AM +0800, Herbert Xu wrote:
> On Fri, Jan 10, 2025 at 06:22:40PM +0000, Michael Kelley wrote:
> >
> > This patch passes my tests. I'm doing a narrow test to verify that
> > the boot failure when opening the Mellanox NIC is no longer occurring.
> > I also unloaded/reloaded the mlx5 driver a couple of times. For good
> > measure, I then did a full Linux kernel build, and all is good. My testing
> > does not broadly verify correct operation of rhashtable except as it
> > gets exercised implicitly by these basic tests.
> 
> Thanks for testing! The patch needs one more change though as
> moving the atomic_inc outside of the lock was a bad idea on my
> part.  This could cause atomic_inc/atomic_dec to be reordered
> thus resulting in an underflow.
> 
> Thanks,
> 
> ---8<---
> The test on whether rhashtable_insert_one did an insertion relies
> on the value returned by rhashtable_lookup_one.  Unfortunately that
> value is overwritten after rhashtable_insert_one returns.  Fix this
> by moving the test before data gets overwritten.
> 
> Simplify the test as only data == NULL matters.
> 
> Finally move atomic_inc back within the lock as otherwise it may
> be reordered with the atomic_dec on the removal side, potentially
> leading to an underflow.
> 
> Reported-by: Michael Kelley <mhklinux@outlook.com>
> Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock")
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Reviewed-by: Breno Leitao <leitao@debian.org>

> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index bf956b85455a..0e9a1d4cf89b 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -611,21 +611,23 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
>  			new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
>  			data = ERR_PTR(-EAGAIN);
>  		} else {
> +			bool inserted;
> +
>  			flags = rht_lock(tbl, bkt);
>  			data = rhashtable_lookup_one(ht, bkt, tbl,
>  						     hash, key, obj);
>  			new_tbl = rhashtable_insert_one(ht, bkt, tbl,
>  							hash, obj, data);
> +			inserted = data && !new_tbl;
> +			if (inserted)
> +				atomic_inc(&ht->nelems);
>  			if (PTR_ERR(new_tbl) != -EEXIST)
>  				data = ERR_CAST(new_tbl);
>  
>  			rht_unlock(tbl, bkt, flags);
>  
> -			if (PTR_ERR(data) == -ENOENT && !new_tbl) {
> -				atomic_inc(&ht->nelems);
> -				if (rht_grow_above_75(ht, tbl))
> -					schedule_work(&ht->run_work);
> -			}
> +			if (inserted && rht_grow_above_75(ht, tbl))
> +				schedule_work(&ht->run_work);

That makes sense, since data could be ERR_PTR(-ENOENT) or
ERR_PTR(-EAGAIN) while the object is being inserted, which means that
nelems should be increased.

It was hard to review this patch: rhashtable_insert_one() returns
three types of values, and you are interested in only one case, when
the obj was inserted.

These are the types of values coming from rhashtable_insert_one():

  1) NULL: if object was inserted OR if data is NULL
  2) Non error and !NULL: A new table to look at
  3) ERR: Definitely not added

I am wondering whether, if we decoupled the first case and only
returned NULL iff the object was added, it would simplify this logic.

Something like the following (not tested):

	diff --git a/lib/rhashtable.c b/lib/rhashtable.c
	index 3e555d012ed60..5a0ec71e990ee 100644
	--- a/lib/rhashtable.c
	+++ b/lib/rhashtable.c
	@@ -554,7 +554,7 @@ static struct bucket_table *rhashtable_insert_one(
			return ERR_PTR(-EEXIST);

		if (PTR_ERR(data) != -EAGAIN && PTR_ERR(data) != -ENOENT)
	-               return ERR_CAST(data);
	+               return ERR_PTR(-EINVAL);

		new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
		if (new_tbl)


Thanks for fixing it,
--breno


* Re: [v3 PATCH] rhashtable: Fix rhashtable_try_insert test
  2025-01-15 15:15                     ` Breno Leitao
@ 2025-01-16  9:10                       ` Herbert Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Herbert Xu @ 2025-01-16  9:10 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Michael Kelley, saeedm@nvidia.com, tariqt@nvidia.com,
	linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf,
	Tejun Heo, Hao Luo, Josh Don, Barret Rhoden,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Linux Crypto Mailing List

On Wed, Jan 15, 2025 at 07:15:46AM -0800, Breno Leitao wrote:
>
> Something like the following (not tested):
> 
> 	diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> 	index 3e555d012ed60..5a0ec71e990ee 100644
> 	--- a/lib/rhashtable.c
> 	+++ b/lib/rhashtable.c
> 	@@ -554,7 +554,7 @@ static struct bucket_table *rhashtable_insert_one(
> 			return ERR_PTR(-EEXIST);
> 
> 		if (PTR_ERR(data) != -EAGAIN && PTR_ERR(data) != -ENOENT)
> 	-               return ERR_CAST(data);
> 	+               return ERR_PTR(-EINVAL);

But data == NULL is not an error, it is a successful return value
for rhltable.  It's when a key already exists in the rhltable
and we're simply appending the new object to it.  Thus the insertion
was successful but the hash table did not grow.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [v3 PATCH] rhashtable: Fix rhashtable_try_insert test
  2025-01-14  3:15                   ` [v3 " Herbert Xu
  2025-01-14 11:58                     ` Michael Kelley
  2025-01-15 15:15                     ` Breno Leitao
@ 2025-01-16 11:48                     ` Alexander Gordeev
  2025-01-17 13:20                     ` Zaslonko Mikhail
  3 siblings, 0 replies; 25+ messages in thread
From: Alexander Gordeev @ 2025-01-16 11:48 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Michael Kelley, Breno Leitao, saeedm@nvidia.com,
	tariqt@nvidia.com, linux-hyperv@vger.kernel.org, Andrew Morton,
	Thomas Graf, Tejun Heo, Hao Luo, Josh Don, Barret Rhoden,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Linux Crypto Mailing List, Zaslonko Mikhail

On Tue, Jan 14, 2025 at 11:15:19AM +0800, Herbert Xu wrote:

Hi Herbert,

> Thanks for testing! The patch needs one more change though as
> moving the atomic_inc outside of the lock was a bad idea on my
> part.  This could cause atomic_inc/atomic_dec to be reordered
> thus resulting in an underflow.

I want to confirm that this patch fixes the massive strangeness and a
few crashes observed on s390 systems, in addition to the OOMs reported
by Mikhail Zaslonko earlier.

> Thanks,

Thanks!


* Re: [v3 PATCH] rhashtable: Fix rhashtable_try_insert test
  2025-01-14  3:15                   ` [v3 " Herbert Xu
                                       ` (2 preceding siblings ...)
  2025-01-16 11:48                     ` Alexander Gordeev
@ 2025-01-17 13:20                     ` Zaslonko Mikhail
  3 siblings, 0 replies; 25+ messages in thread
From: Zaslonko Mikhail @ 2025-01-17 13:20 UTC (permalink / raw)
  To: Herbert Xu, Michael Kelley
  Cc: Breno Leitao, saeedm@nvidia.com, tariqt@nvidia.com,
	linux-hyperv@vger.kernel.org, Andrew Morton, Thomas Graf,
	Tejun Heo, Hao Luo, Josh Don, Barret Rhoden,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Linux Crypto Mailing List

Hello,

On 14.01.2025 04:15, Herbert Xu wrote:
> On Fri, Jan 10, 2025 at 06:22:40PM +0000, Michael Kelley wrote:
> 
> Thanks for testing! The patch needs one more change though as
> moving the atomic_inc outside of the lock was a bad idea on my
> part.  This could cause atomic_inc/atomic_dec to be reordered
> thus resulting in an underflow.
> 
> Thanks,

I've tested the v3 patch on s390 for the boot OOM problem with the
'mem=1G' kernel parameter that we experience on the current linux-next
kernel, and I can confirm that the issue is no longer present.

Tested-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>

> 
> ---8<---
> The test on whether rhashtable_insert_one did an insertion relies
> on the value returned by rhashtable_lookup_one.  Unfortunately that
> value is overwritten after rhashtable_insert_one returns.  Fix this
> by moving the test before data gets overwritten.
> 
> Simplify the test as only data == NULL matters.
> 
> Finally move atomic_inc back within the lock as otherwise it may
> be reordered with the atomic_dec on the removal side, potentially
> leading to an underflow.
> 
> Reported-by: Michael Kelley <mhklinux@outlook.com>
> Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock")
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index bf956b85455a..0e9a1d4cf89b 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -611,21 +611,23 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
>  			new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
>  			data = ERR_PTR(-EAGAIN);
>  		} else {
> +			bool inserted;
> +
>  			flags = rht_lock(tbl, bkt);
>  			data = rhashtable_lookup_one(ht, bkt, tbl,
>  						     hash, key, obj);
>  			new_tbl = rhashtable_insert_one(ht, bkt, tbl,
>  							hash, obj, data);
> +			inserted = data && !new_tbl;
> +			if (inserted)
> +				atomic_inc(&ht->nelems);
>  			if (PTR_ERR(new_tbl) != -EEXIST)
>  				data = ERR_CAST(new_tbl);
>  
>  			rht_unlock(tbl, bkt, flags);
>  
> -			if (PTR_ERR(data) == -ENOENT && !new_tbl) {
> -				atomic_inc(&ht->nelems);
> -				if (rht_grow_above_75(ht, tbl))
> -					schedule_work(&ht->run_work);
> -			}
> +			if (inserted && rht_grow_above_75(ht, tbl))
> +				schedule_work(&ht->run_work);
>  		}
>  	} while (!IS_ERR_OR_NULL(new_tbl));
>  

Thanks!


end of thread, other threads:[~2025-01-17 13:21 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-11-28 12:16 [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock Breno Leitao
2024-11-29  5:47 ` Herbert Xu
2024-12-02 20:16   ` Breno Leitao
2024-12-03 20:16 ` Tejun Heo
2024-12-04  1:34   ` Herbert Xu
2024-12-12 12:33 ` Herbert Xu
2024-12-21  9:06   ` Herbert Xu
2025-01-02 10:15     ` Breno Leitao
2025-01-09  3:16       ` Michael Kelley
2025-01-09 10:15         ` Breno Leitao
2025-01-10  9:27           ` Herbert Xu
2025-01-10  9:49             ` Breno Leitao
2025-01-10 10:07               ` Herbert Xu
2025-01-10 14:46             ` Zaslonko Mikhail
2025-01-10 16:59             ` Michael Kelley
2025-01-10 17:24               ` [v2 PATCH] rhashtable: Fix rhashtable_try_insert test Herbert Xu
2025-01-10 18:22                 ` Michael Kelley
2025-01-14  3:15                   ` [v3 " Herbert Xu
2025-01-14 11:58                     ` Michael Kelley
2025-01-15 15:15                     ` Breno Leitao
2025-01-16  9:10                       ` Herbert Xu
2025-01-16 11:48                     ` Alexander Gordeev
2025-01-17 13:20                     ` Zaslonko Mikhail
2025-01-13 19:50 ` [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock Guenter Roeck
2025-01-14  3:23   ` Herbert Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox