public inbox for linux-kernel@vger.kernel.org
* BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
@ 2026-04-09 13:08 Vasily Gorbik
  2026-04-09 17:22 ` Paul E. McKenney
  2026-04-09 17:26 ` Boqun Feng
  0 siblings, 2 replies; 13+ messages in thread
From: Vasily Gorbik @ 2026-04-09 13:08 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Boqun Feng, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, rcu, linux-kernel, linux-s390

Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
non-preemptible") defers srcu_node tree allocation when called under
raw spinlock, putting SRCU through ~6 transitional grace periods
(SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, the work targets
per-CPU pools directly; pools for not-online CPUs have no workers,
so queued work accumulates and the workqueue lockup detector fires.

Before 61bbcfb50514, the GFP_ATOMIC allocation went straight to
SRCU_SIZE_BIG, so the mask = ~0 path was never reached.

Affects systems with convert_to_big active (auto when nr_cpu_ids >= 128)
and possible CPUs > online CPUs. Hit on s390 LPAR (76 online, 400 possible),
where possible CPUs > online CPUs is the usual case.
Also reproducible on x86 KVM with -smp 16,maxcpus=255 (CONFIG_NR_CPUS=256),
with simply -smp 1,maxcpus=2 plus srcutree.convert_to_big=1,
or with -smp 16,maxcpus=64 plus srcutree.big_cpu_lim=32 (CONFIG_NR_CPUS=64).

s390 log (76 online CPUs, 400 possible, all pools 76-399 stuck):

  BUG: workqueue lockup - pool cpus=76 node=0 flags=0x4 nice=0 stuck for 1842s!
  BUG: workqueue lockup - pool cpus=77 node=0 flags=0x4 nice=0 stuck for 1842s!
  ...
  BUG: workqueue lockup - pool cpus=399 node=0 flags=0x4 nice=0 stuck for 1842s!
  Showing busy workqueues and worker pools:
  workqueue rcu_gp: flags=0x108
    pwq 306: cpus=76 node=0 flags=0x4 nice=0 active=3 refcnt=4
      pending: 3*srcu_invoke_callbacks
    pwq 310: cpus=77 node=0 flags=0x4 nice=0 active=3 refcnt=4
      pending: 3*srcu_invoke_callbacks
    ...
    pwq 1598: cpus=399 node=0 flags=0x4 nice=0 active=3 refcnt=4
      pending: 3*srcu_invoke_callbacks

Not sure if replacing mask = ~0 with something derived from
cpu_online_mask would be racy in that context.

[1] https://lore.kernel.org/rcu/acRho9L4zA2MRuxc@tardis.local
[2] https://lore.kernel.org/rcu/fe28d664-3872-40f6-83c6-818627ad5b7d@paulmck-laptop

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 13:08 BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition Vasily Gorbik
@ 2026-04-09 17:22 ` Paul E. McKenney
  2026-04-09 19:15   ` Vasily Gorbik
  2026-04-09 17:26 ` Boqun Feng
  1 sibling, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2026-04-09 17:22 UTC (permalink / raw)
  To: Vasily Gorbik
  Cc: Boqun Feng, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, rcu, linux-kernel, linux-s390

On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> non-preemptible") defers srcu_node tree allocation when called under
> raw spinlock, putting SRCU through ~6 transitional grace periods
> (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> per-CPU pools directly - pools for not-online CPUs have no workers,
> work accumulates, workqueue lockup detector fires.
> 
> Before 61bbcfb50514, GFP_ATOMIC allocation went straight to
> SRCU_SIZE_BIG, the mask = ~0 path was never reached.
> 
> Affects systems with convert_to_big active (auto when nr_cpu_ids >= 128)
> and possible CPUs > online CPUs. Hit on s390 LPAR (76 online, 400 possible),
> where possible CPUs > online CPUs is the usual case.
> Also reproducible on x86 KVM --smp 16,maxcpus=255 (CONFIG_NR_CPUS=256)
> or simply -smp 1,maxcpus=2 with srcutree.convert_to_big=1
> or --smp 16,maxcpus=64 with srcutree.big_cpu_lim=32 (CONFIG_NR_CPUS=64)
> 
> s390 log (76 online CPUs, 400 possible, all pools 76-399 stuck):
> 
>   BUG: workqueue lockup - pool cpus=76 node=0 flags=0x4 nice=0 stuck for 1842s!
>   BUG: workqueue lockup - pool cpus=77 node=0 flags=0x4 nice=0 stuck for 1842s!
>   ...
>   BUG: workqueue lockup - pool cpus=399 node=0 flags=0x4 nice=0 stuck for 1842s!
>   Showing busy workqueues and worker pools:
>   workqueue rcu_gp: flags=0x108
>     pwq 306: cpus=76 node=0 flags=0x4 nice=0 active=3 refcnt=4
>       pending: 3*srcu_invoke_callbacks
>     pwq 310: cpus=77 node=0 flags=0x4 nice=0 active=3 refcnt=4
>       pending: 3*srcu_invoke_callbacks
>     ...
>     pwq 1598: cpus=399 node=0 flags=0x4 nice=0 active=3 refcnt=4
>       pending: 3*srcu_invoke_callbacks
> 
> Not sure if replacing mask = ~0 with something derived from
> cpu_online_mask would be racy in that context.
> 
> [1] https://lore.kernel.org/rcu/acRho9L4zA2MRuxc@tardis.local
> [2] https://lore.kernel.org/rcu/fe28d664-3872-40f6-83c6-818627ad5b7d@paulmck-laptop

First, thank you for the bug report and apologies for the hassle!
This was a pre-existing bug, but the change made it much more likely
to happen.

Does the alleged (and untested) fix below do the trick?  The theory is
that if a given CPU has ever been fully online, it has workqueues set up.
Directly checking whether a CPU is currently online is vulnerable to a CPU
piling up lots of SRCU callbacks, then going offline.  So we do need to
be prepared to invoke SRCU callbacks for CPUs that are currently offline.

In the meantime, I will start up some tests.  Not that they saw the
bug in the first place, so it is your tests that matter here.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 0d01cd8c4b4a7..e68ee7f69e1fc 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -898,7 +898,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
 	int cpu;
 
 	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
-		if (!(mask & (1UL << (cpu - snp->grplo))))
+		if (!(mask & (1UL << (cpu - snp->grplo))) || !rcu_cpu_beenfullyonline(cpu))
 			continue;
 		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
 	}


* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 13:08 BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition Vasily Gorbik
  2026-04-09 17:22 ` Paul E. McKenney
@ 2026-04-09 17:26 ` Boqun Feng
  2026-04-09 17:40   ` Boqun Feng
  1 sibling, 1 reply; 13+ messages in thread
From: Boqun Feng @ 2026-04-09 17:26 UTC (permalink / raw)
  To: Vasily Gorbik
  Cc: Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Uladzislau Rezki, rcu, linux-kernel, linux-s390,
	Tejun Heo, Lai Jiangshan

On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> non-preemptible") defers srcu_node tree allocation when called under
> raw spinlock, putting SRCU through ~6 transitional grace periods
> (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> per-CPU pools directly - pools for not-online CPUs have no workers,

[Cc workqueue]

Hmm... I thought for offline CPUs the corresponding worker pools become
unbound ones, hence there are still workers?

Regards,
Boqun

> work accumulates, workqueue lockup detector fires.
> 
> Before 61bbcfb50514, GFP_ATOMIC allocation went straight to
> SRCU_SIZE_BIG, the mask = ~0 path was never reached.
> 
> Affects systems with convert_to_big active (auto when nr_cpu_ids >= 128)
> and possible CPUs > online CPUs. Hit on s390 LPAR (76 online, 400 possible),
> where possible CPUs > online CPUs is the usual case.
> Also reproducible on x86 KVM --smp 16,maxcpus=255 (CONFIG_NR_CPUS=256)
> or simply -smp 1,maxcpus=2 with srcutree.convert_to_big=1
> or --smp 16,maxcpus=64 with srcutree.big_cpu_lim=32 (CONFIG_NR_CPUS=64)
> 
> s390 log (76 online CPUs, 400 possible, all pools 76-399 stuck):
> 
>   BUG: workqueue lockup - pool cpus=76 node=0 flags=0x4 nice=0 stuck for 1842s!
>   BUG: workqueue lockup - pool cpus=77 node=0 flags=0x4 nice=0 stuck for 1842s!
>   ...
>   BUG: workqueue lockup - pool cpus=399 node=0 flags=0x4 nice=0 stuck for 1842s!
>   Showing busy workqueues and worker pools:
>   workqueue rcu_gp: flags=0x108
>     pwq 306: cpus=76 node=0 flags=0x4 nice=0 active=3 refcnt=4
>       pending: 3*srcu_invoke_callbacks
>     pwq 310: cpus=77 node=0 flags=0x4 nice=0 active=3 refcnt=4
>       pending: 3*srcu_invoke_callbacks
>     ...
>     pwq 1598: cpus=399 node=0 flags=0x4 nice=0 active=3 refcnt=4
>       pending: 3*srcu_invoke_callbacks
> 
> Not sure if replacing mask = ~0 with something derived from
> cpu_online_mask would be racy in that context.
> 
> [1] https://lore.kernel.org/rcu/acRho9L4zA2MRuxc@tardis.local
> [2] https://lore.kernel.org/rcu/fe28d664-3872-40f6-83c6-818627ad5b7d@paulmck-laptop


* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 17:26 ` Boqun Feng
@ 2026-04-09 17:40   ` Boqun Feng
  2026-04-09 17:47     ` Tejun Heo
  0 siblings, 1 reply; 13+ messages in thread
From: Boqun Feng @ 2026-04-09 17:40 UTC (permalink / raw)
  To: Vasily Gorbik
  Cc: Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Uladzislau Rezki, rcu, linux-kernel, linux-s390,
	Tejun Heo, Lai Jiangshan

On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
> On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > non-preemptible") defers srcu_node tree allocation when called under
> > raw spinlock, putting SRCU through ~6 transitional grace periods
> > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > per-CPU pools directly - pools for not-online CPUs have no workers,
> 
> [Cc workqueue]
> 
> Hmm.. I thought for offline CPUs the corresponding worker pools become a
> unbound one hence there are still workers?
> 

Ah, as Paul replied in another email, the problem was that these CPUs
had never been onlined, so they don't even have unbound workers?

Regards,
Boqun

> Regards,
> Boqun
> 
> > work accumulates, workqueue lockup detector fires.
> > 
> > Before 61bbcfb50514, GFP_ATOMIC allocation went straight to
> > SRCU_SIZE_BIG, the mask = ~0 path was never reached.
> > 
> > Affects systems with convert_to_big active (auto when nr_cpu_ids >= 128)
> > and possible CPUs > online CPUs. Hit on s390 LPAR (76 online, 400 possible),
> > where possible CPUs > online CPUs is the usual case.
> > Also reproducible on x86 KVM --smp 16,maxcpus=255 (CONFIG_NR_CPUS=256)
> > or simply -smp 1,maxcpus=2 with srcutree.convert_to_big=1
> > or --smp 16,maxcpus=64 with srcutree.big_cpu_lim=32 (CONFIG_NR_CPUS=64)
> > 
> > s390 log (76 online CPUs, 400 possible, all pools 76-399 stuck):
> > 
> >   BUG: workqueue lockup - pool cpus=76 node=0 flags=0x4 nice=0 stuck for 1842s!
> >   BUG: workqueue lockup - pool cpus=77 node=0 flags=0x4 nice=0 stuck for 1842s!
> >   ...
> >   BUG: workqueue lockup - pool cpus=399 node=0 flags=0x4 nice=0 stuck for 1842s!
> >   Showing busy workqueues and worker pools:
> >   workqueue rcu_gp: flags=0x108
> >     pwq 306: cpus=76 node=0 flags=0x4 nice=0 active=3 refcnt=4
> >       pending: 3*srcu_invoke_callbacks
> >     pwq 310: cpus=77 node=0 flags=0x4 nice=0 active=3 refcnt=4
> >       pending: 3*srcu_invoke_callbacks
> >     ...
> >     pwq 1598: cpus=399 node=0 flags=0x4 nice=0 active=3 refcnt=4
> >       pending: 3*srcu_invoke_callbacks
> > 
> > Not sure if replacing mask = ~0 with something derived from
> > cpu_online_mask would be racy in that context.
> > 
> > [1] https://lore.kernel.org/rcu/acRho9L4zA2MRuxc@tardis.local
> > [2] https://lore.kernel.org/rcu/fe28d664-3872-40f6-83c6-818627ad5b7d@paulmck-laptop


* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 17:40   ` Boqun Feng
@ 2026-04-09 17:47     ` Tejun Heo
  2026-04-09 17:48       ` Tejun Heo
  2026-04-09 18:10       ` Boqun Feng
  0 siblings, 2 replies; 13+ messages in thread
From: Tejun Heo @ 2026-04-09 17:47 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Vasily Gorbik, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Uladzislau Rezki, rcu,
	linux-kernel, linux-s390, Lai Jiangshan

On Thu, Apr 09, 2026 at 10:40:05AM -0700, Boqun Feng wrote:
> On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
> > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > non-preemptible") defers srcu_node tree allocation when called under
> > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > per-CPU pools directly - pools for not-online CPUs have no workers,
> > 
> > [Cc workqueue]
> > 
> > Hmm.. I thought for offline CPUs the corresponding worker pools become a
> > unbound one hence there are still workers?
> > 
> 
> Ah, as Paul replied in another email, the problem was because these CPUs
> had never been onlined, so they don't even have unbound workers?

Hahaha, we do initialize worker pools for every possible CPU, but the
transition to unbound operation happens in the hot-unplug callback. We
probably need to do some of the hot-unplug operation during init if a CPU
is possible but not online. That said, what kind of machine is it? Is the
firmware just reporting a bogus possible mask? How come the CPUs weren't
online during boot?

Thanks.

-- 
tejun


* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 17:47     ` Tejun Heo
@ 2026-04-09 17:48       ` Tejun Heo
  2026-04-09 18:04         ` Paul E. McKenney
  2026-04-09 18:10       ` Boqun Feng
  1 sibling, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2026-04-09 17:48 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Vasily Gorbik, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Uladzislau Rezki, rcu,
	linux-kernel, linux-s390, Lai Jiangshan

On Thu, Apr 09, 2026 at 07:47:09AM -1000, Tejun Heo wrote:
> On Thu, Apr 09, 2026 at 10:40:05AM -0700, Boqun Feng wrote:
> > On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
> > > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > > non-preemptible") defers srcu_node tree allocation when called under
> > > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > > per-CPU pools directly - pools for not-online CPUs have no workers,
> > > 
> > > [Cc workqueue]
> > > 
> > > Hmm.. I thought for offline CPUs the corresponding worker pools become a
> > > unbound one hence there are still workers?
> > > 
> > 
> > Ah, as Paul replied in another email, the problem was because these CPUs
> > had never been onlined, so they don't even have unbound workers?
> 
> Hahaha, we do initialize worker pool for every possible CPU but the
> transition to unbound operation happens in the hot unplug callback. We
> probably need to do some of the hot unplug operation during init if the CPU
> is possible but not online. That said, what kind of machine is it? Is the
> firmware just reporting bogus possible mask? How come the CPUs weren't
> online during boot?

Just saw ibm on the cc list. Guess this was on s390?

-- 
tejun


* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 17:48       ` Tejun Heo
@ 2026-04-09 18:04         ` Paul E. McKenney
  2026-04-09 18:09           ` Tejun Heo
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2026-04-09 18:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Boqun Feng, Vasily Gorbik, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Uladzislau Rezki, rcu, linux-kernel, linux-s390,
	Lai Jiangshan

On Thu, Apr 09, 2026 at 07:48:28AM -1000, Tejun Heo wrote:
> On Thu, Apr 09, 2026 at 07:47:09AM -1000, Tejun Heo wrote:
> > On Thu, Apr 09, 2026 at 10:40:05AM -0700, Boqun Feng wrote:
> > > On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
> > > > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > > > non-preemptible") defers srcu_node tree allocation when called under
> > > > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > > > per-CPU pools directly - pools for not-online CPUs have no workers,
> > > > 
> > > > [Cc workqueue]
> > > > 
> > > > Hmm.. I thought for offline CPUs the corresponding worker pools become a
> > > > unbound one hence there are still workers?
> > > > 
> > > 
> > > Ah, as Paul replied in another email, the problem was because these CPUs
> > > had never been onlined, so they don't even have unbound workers?
> > 
> > Hahaha, we do initialize worker pool for every possible CPU but the
> > transition to unbound operation happens in the hot unplug callback. We
> > probably need to do some of the hot unplug operation during init if the CPU
> > is possible but not online. That said, what kind of machine is it? Is the
> > firmware just reporting bogus possible mask? How come the CPUs weren't
> > online during boot?
> 
> Just saw ibm on the cc list. Guess this was on s390?

It was indeed.  What workqueue tricks does s390 play?

							Thanx, Paul


* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 18:04         ` Paul E. McKenney
@ 2026-04-09 18:09           ` Tejun Heo
  2026-04-09 18:15             ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2026-04-09 18:09 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Boqun Feng, Vasily Gorbik, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Uladzislau Rezki, rcu, linux-kernel, linux-s390,
	Lai Jiangshan

On Thu, Apr 09, 2026 at 11:04:09AM -0700, Paul E. McKenney wrote:
> On Thu, Apr 09, 2026 at 07:48:28AM -1000, Tejun Heo wrote:
> > On Thu, Apr 09, 2026 at 07:47:09AM -1000, Tejun Heo wrote:
> > > On Thu, Apr 09, 2026 at 10:40:05AM -0700, Boqun Feng wrote:
> > > > On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
> > > > > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > > > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > > > > non-preemptible") defers srcu_node tree allocation when called under
> > > > > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > > > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > > > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > > > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > > > > per-CPU pools directly - pools for not-online CPUs have no workers,
> > > > > 
> > > > > [Cc workqueue]
> > > > > 
> > > > > Hmm.. I thought for offline CPUs the corresponding worker pools become a
> > > > > unbound one hence there are still workers?
> > > > > 
> > > > 
> > > > Ah, as Paul replied in another email, the problem was because these CPUs
> > > > had never been onlined, so they don't even have unbound workers?
> > > 
> > > Hahaha, we do initialize worker pool for every possible CPU but the
> > > transition to unbound operation happens in the hot unplug callback. We
> > > probably need to do some of the hot unplug operation during init if the CPU
> > > is possible but not online. That said, what kind of machine is it? Is the
> > > firmware just reporting bogus possible mask? How come the CPUs weren't
> > > online during boot?
> > 
> > Just saw ibm on the cc list. Guess this was on s390?
> 
> It was indeed.  What workqueue tricks does s390 play?

They just come up with genuinely possible but offline CPUs. Most setups
don't do that. I'll spin up a patch later today.

Thanks.

-- 
tejun


* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 17:47     ` Tejun Heo
  2026-04-09 17:48       ` Tejun Heo
@ 2026-04-09 18:10       ` Boqun Feng
  2026-04-09 18:27         ` Paul E. McKenney
  1 sibling, 1 reply; 13+ messages in thread
From: Boqun Feng @ 2026-04-09 18:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Vasily Gorbik, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Uladzislau Rezki, rcu,
	linux-kernel, linux-s390, Lai Jiangshan

On Thu, Apr 09, 2026 at 07:47:09AM -1000, Tejun Heo wrote:
> On Thu, Apr 09, 2026 at 10:40:05AM -0700, Boqun Feng wrote:
> > On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
> > > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > > non-preemptible") defers srcu_node tree allocation when called under
> > > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > > per-CPU pools directly - pools for not-online CPUs have no workers,
> > > 
> > > [Cc workqueue]
> > > 
> > > Hmm.. I thought for offline CPUs the corresponding worker pools become a
> > > unbound one hence there are still workers?
> > > 
> > 
> > Ah, as Paul replied in another email, the problem was because these CPUs
> > had never been onlined, so they don't even have unbound workers?
> 
> Hahaha, we do initialize worker pool for every possible CPU but the
> transition to unbound operation happens in the hot unplug callback. We

;-) ;-) ;-)

> probably need to do some of the hot unplug operation during init if the CPU

Seems that we (mostly Paul) have our own trick in RCU to track whether
a CPU has ever been onlined, see rcu_cpu_beenfullyonline(). Paul also
used it in his fix [1]. I think it wouldn't be that hard to copy it
into workqueue and have queue_work_on() use it, so that if a user queues
work on a never-onlined CPU, it can be detected (with a warning?) and
handled somehow?

[1]: https://lore.kernel.org/rcu/073abb55-197a-4519-b177-f9f776624fed@paulmck-laptop/

Regards,
Boqun

> is possible but not online. That said, what kind of machine is it? Is the
> firmware just reporting bogus possible mask? How come the CPUs weren't
> online during boot?
> 
> Thanks.
> 
> -- 
> tejun


* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 18:09           ` Tejun Heo
@ 2026-04-09 18:15             ` Paul E. McKenney
  0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2026-04-09 18:15 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Boqun Feng, Vasily Gorbik, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Uladzislau Rezki, rcu, linux-kernel, linux-s390,
	Lai Jiangshan

On Thu, Apr 09, 2026 at 08:09:46AM -1000, Tejun Heo wrote:
> On Thu, Apr 09, 2026 at 11:04:09AM -0700, Paul E. McKenney wrote:
> > On Thu, Apr 09, 2026 at 07:48:28AM -1000, Tejun Heo wrote:
> > > On Thu, Apr 09, 2026 at 07:47:09AM -1000, Tejun Heo wrote:
> > > > On Thu, Apr 09, 2026 at 10:40:05AM -0700, Boqun Feng wrote:
> > > > > On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
> > > > > > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > > > > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > > > > > non-preemptible") defers srcu_node tree allocation when called under
> > > > > > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > > > > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > > > > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > > > > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > > > > > per-CPU pools directly - pools for not-online CPUs have no workers,
> > > > > > 
> > > > > > [Cc workqueue]
> > > > > > 
> > > > > > Hmm.. I thought for offline CPUs the corresponding worker pools become a
> > > > > > unbound one hence there are still workers?
> > > > > > 
> > > > > 
> > > > > Ah, as Paul replied in another email, the problem was because these CPUs
> > > > > had never been onlined, so they don't even have unbound workers?
> > > > 
> > > > Hahaha, we do initialize worker pool for every possible CPU but the
> > > > transition to unbound operation happens in the hot unplug callback. We
> > > > probably need to do some of the hot unplug operation during init if the CPU
> > > > is possible but not online. That said, what kind of machine is it? Is the
> > > > firmware just reporting bogus possible mask? How come the CPUs weren't
> > > > online during boot?
> > > 
> > > Just saw ibm on the cc list. Guess this was on s390?
> > 
> > It was indeed.  What workqueue tricks does s390 play?
> 
> They just come up with genuinely possible but offline CPUs. Most setups
> don't do that. I'll spin up a patch later today.

I would be more than happy for workqueues to queue and execute a handler
for a never-been-online CPU.  Whatever works!  ;-)

							Thanx, Paul


* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 18:10       ` Boqun Feng
@ 2026-04-09 18:27         ` Paul E. McKenney
  0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2026-04-09 18:27 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Tejun Heo, Vasily Gorbik, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Uladzislau Rezki, rcu, linux-kernel, linux-s390,
	Lai Jiangshan

On Thu, Apr 09, 2026 at 11:10:04AM -0700, Boqun Feng wrote:
> On Thu, Apr 09, 2026 at 07:47:09AM -1000, Tejun Heo wrote:
> > On Thu, Apr 09, 2026 at 10:40:05AM -0700, Boqun Feng wrote:
> > > On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
> > > > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > > > non-preemptible") defers srcu_node tree allocation when called under
> > > > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > > > per-CPU pools directly - pools for not-online CPUs have no workers,
> > > > 
> > > > [Cc workqueue]
> > > > 
> > > > Hmm.. I thought for offline CPUs the corresponding worker pools become a
> > > > unbound one hence there are still workers?
> > > > 
> > > 
> > > Ah, as Paul replied in another email, the problem was because these CPUs
> > > had never been onlined, so they don't even have unbound workers?
> > 
> > Hahaha, we do initialize worker pool for every possible CPU but the
> > transition to unbound operation happens in the hot unplug callback. We
> 
> ;-) ;-) ;-)
> 
> > probably need to do some of the hot unplug operation during init if the CPU
> 
> Seems that we (mostly Paul) have our own trick to track whether a CPU
> has ever been onlined in RCU, see rcu_cpu_beenfullyonline(). Paul also
> used it in his fix [1]. And I think it won't be that hard to copy it
> into workqueue and let queue_work_on() use it so that if the user queues
> a work on a never-onlined CPU, it can detect it (with a warning?) and do
> something?
> 
> [1]: https://lore.kernel.org/rcu/073abb55-197a-4519-b177-f9f776624fed@paulmck-laptop/

It might be that my patch (or something like it) will be required in
addition to Tejun's fix because the current Tree SRCU code is happy
to schedule a workqueue handler on a CPU that does not even have a bit
set in the cpu_possible_mask.  This could happen on a system with the
first 50 CPUs, as in 0-49, in cpu_possible_mask.  Tree SRCU would then
be quite happy to schedule workqueue handlers on the mythical CPUs 50-63.
Which, now that I think on it, does seem a bit more brave than absolutely
warranted.  ;-)

							Thanx, Paul

> Regards,
> Boqun
> 
> > is possible but not online. That said, what kind of machine is it? Is the
> > firmware just reporting bogus possible mask? How come the CPUs weren't
> > online during boot?
> > 
> > Thanks.
> > 
> > -- 
> > tejun


* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 17:22 ` Paul E. McKenney
@ 2026-04-09 19:15   ` Vasily Gorbik
  2026-04-09 20:10     ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Vasily Gorbik @ 2026-04-09 19:15 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Boqun Feng, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, rcu, linux-kernel, linux-s390

On Thu, Apr 09, 2026 at 10:22:00AM -0700, Paul E. McKenney wrote:
> On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > non-preemptible") defers srcu_node tree allocation when called under
> > raw spinlock, putting SRCU through ~6 transitional grace periods
> > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > per-CPU pools directly - pools for not-online CPUs have no workers,
> > work accumulates, workqueue lockup detector fires.
> > 
> > Before 61bbcfb50514, GFP_ATOMIC allocation went straight to
> > SRCU_SIZE_BIG, the mask = ~0 path was never reached.
> > 
> > Affects systems with convert_to_big active (auto when nr_cpu_ids >= 128)
> > and possible CPUs > online CPUs. Hit on s390 LPAR (76 online, 400 possible),
> > where possible CPUs > online CPUs is the usual case.
> > Also reproducible on x86 KVM --smp 16,maxcpus=255 (CONFIG_NR_CPUS=256)
> > or simply -smp 1,maxcpus=2 with srcutree.convert_to_big=1
> > or --smp 16,maxcpus=64 with srcutree.big_cpu_lim=32 (CONFIG_NR_CPUS=64)
> > 
> > s390 log (76 online CPUs, 400 possible, all pools 76-399 stuck):
> > 
> >   BUG: workqueue lockup - pool cpus=76 node=0 flags=0x4 nice=0 stuck for 1842s!
> >   BUG: workqueue lockup - pool cpus=77 node=0 flags=0x4 nice=0 stuck for 1842s!
> >   ...
> >   BUG: workqueue lockup - pool cpus=399 node=0 flags=0x4 nice=0 stuck for 1842s!
> >   Showing busy workqueues and worker pools:
> >   workqueue rcu_gp: flags=0x108
> >     pwq 306: cpus=76 node=0 flags=0x4 nice=0 active=3 refcnt=4
> >       pending: 3*srcu_invoke_callbacks
> >     pwq 310: cpus=77 node=0 flags=0x4 nice=0 active=3 refcnt=4
> >       pending: 3*srcu_invoke_callbacks
> >     ...
> >     pwq 1598: cpus=399 node=0 flags=0x4 nice=0 active=3 refcnt=4
> >       pending: 3*srcu_invoke_callbacks
> > 
> > Not sure if replacing mask = ~0 with something derived from
> > cpu_online_mask would be racy in that context.
> > 
> > [1] https://lore.kernel.org/rcu/acRho9L4zA2MRuxc@tardis.local
> > [2] https://lore.kernel.org/rcu/fe28d664-3872-40f6-83c6-818627ad5b7d@paulmck-laptop
> 
> This was a pre-existing bug, but the change made it much more likely
> to happen.

Yes, indeed.

> Does the alleged (and untested) fix below do the trick?  The theory is
> that if a given CPU has ever been fully online, it has workqueues set up.
> Directly checking whether a CPU is currently online is vulnerable to a CPU
> piling up lots of SRCU callbacks, then going offline.  So we do need to
> be prepared to invoke SRCU callbacks for CPUs that are currently offline.

Yes, tested on s390 LPAR (76 online, 400 possible) as well as
on x86 KVM with --smp 16,maxcpus=255 and CONFIG_NR_CPUS=256;
no more workqueue lockups in either case.

Thank you!

Tested-by: Vasily Gorbik <gor@linux.ibm.com>

> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 0d01cd8c4b4a7..e68ee7f69e1fc 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -898,7 +898,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
>  	int cpu;
>  
>  	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> -		if (!(mask & (1UL << (cpu - snp->grplo))))
> +		if (!(mask & (1UL << (cpu - snp->grplo))) || !rcu_cpu_beenfullyonline(cpu))
>  			continue;
>  		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
>  	}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
  2026-04-09 19:15   ` Vasily Gorbik
@ 2026-04-09 20:10     ` Paul E. McKenney
  0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2026-04-09 20:10 UTC (permalink / raw)
  To: Vasily Gorbik
  Cc: Boqun Feng, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, rcu, linux-kernel, linux-s390

On Thu, Apr 09, 2026 at 09:15:50PM +0200, Vasily Gorbik wrote:
> On Thu, Apr 09, 2026 at 10:22:00AM -0700, Paul E. McKenney wrote:
> > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > non-preemptible") defers srcu_node tree allocation when called under
> > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > per-CPU pools directly - pools for not-online CPUs have no workers,
> > > work accumulates, workqueue lockup detector fires.
> > > 
> > > Before 61bbcfb50514, GFP_ATOMIC allocation went straight to
> > > SRCU_SIZE_BIG, the mask = ~0 path was never reached.
> > > 
> > > Affects systems with convert_to_big active (auto when nr_cpu_ids >= 128)
> > > and possible CPUs > online CPUs. Hit on s390 LPAR (76 online, 400 possible),
> > > where possible CPUs > online CPUs is the usual case.
> > > Also reproducible on x86 KVM --smp 16,maxcpus=255 (CONFIG_NR_CPUS=256)
> > > or simply -smp 1,maxcpus=2 with srcutree.convert_to_big=1
> > > or --smp 16,maxcpus=64 with srcutree.big_cpu_lim=32 (CONFIG_NR_CPUS=64)
> > > 
> > > s390 log (76 online CPUs, 400 possible, all pools 76-399 stuck):
> > > 
> > >   BUG: workqueue lockup - pool cpus=76 node=0 flags=0x4 nice=0 stuck for 1842s!
> > >   BUG: workqueue lockup - pool cpus=77 node=0 flags=0x4 nice=0 stuck for 1842s!
> > >   ...
> > >   BUG: workqueue lockup - pool cpus=399 node=0 flags=0x4 nice=0 stuck for 1842s!
> > >   Showing busy workqueues and worker pools:
> > >   workqueue rcu_gp: flags=0x108
> > >     pwq 306: cpus=76 node=0 flags=0x4 nice=0 active=3 refcnt=4
> > >       pending: 3*srcu_invoke_callbacks
> > >     pwq 310: cpus=77 node=0 flags=0x4 nice=0 active=3 refcnt=4
> > >       pending: 3*srcu_invoke_callbacks
> > >     ...
> > >     pwq 1598: cpus=399 node=0 flags=0x4 nice=0 active=3 refcnt=4
> > >       pending: 3*srcu_invoke_callbacks
> > > 
> > > Not sure if replacing mask = ~0 with something derived from
> > > cpu_online_mask would be racy in that context.
> > > 
> > > [1] https://lore.kernel.org/rcu/acRho9L4zA2MRuxc@tardis.local
> > > [2] https://lore.kernel.org/rcu/fe28d664-3872-40f6-83c6-818627ad5b7d@paulmck-laptop
> > 
> > This was a pre-existing bug, but the change made it much more likely
> > to happen.
> 
> Yes, indeed.
> 
> > Does the alleged (and untested) fix below do the trick?  The theory is
> > that if a given CPU has ever been fully online, it has workqueues set up.
> > Directly checking whether a CPU is currently online is vulnerable to a CPU
> > piling up lots of SRCU callbacks, then going offline.  So we do need to
> > be prepared to invoke SRCU callbacks for CPUs that are currently offline.
> 
> Yes, tested on s390 LPAR (76 online, 400 possible) as well as
> on x86 KVM with --smp 16,maxcpus=255 and CONFIG_NR_CPUS=256;
> no more workqueue lockups in either case.
> 
> Thank you!
> 
> Tested-by: Vasily Gorbik <gor@linux.ibm.com>

Thank you for testing this!

Please see below for an updated patch.  Tejun's patch might obsolete
this one, but just in case he balks at SRCU queueing handlers for CPUs
that are not even in the cpu_possible_mask.  ;-)

							Thanx, Paul

------------------------------------------------------------------------

commit dcc14db7e76af899f1ff4606ec4316580d7b6f88
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu Apr 9 11:16:02 2026 -0700

    srcu: Don't queue workqueue handlers to never-online CPUs
    
    While an srcu_struct structure is in the midst of switching from CPU-0
    to all-CPUs state, it can attempt to invoke callbacks for CPUs that
have never been online.  Worse yet, it can attempt to invoke callbacks
    for CPUs that never will be online due to not being present in the
    cpu_possible_mask.  This can cause hangs on s390, which is not set up to
    deal with workqueue handlers being scheduled on such CPUs.  This commit
    therefore causes Tree SRCU to refrain from queueing workqueue handlers
    on CPUs that have not yet (and might never) come online.
    
    Reported-by: Vasily Gorbik <gor@linux.ibm.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Tested-by: Vasily Gorbik <gor@linux.ibm.com>
    Cc: Tejun Heo <tj@kernel.org>

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 0d01cd8c4b4a7..a67af44fc0745 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -897,11 +897,9 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
 {
 	int cpu;
 
-	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
-		if (!(mask & (1UL << (cpu - snp->grplo))))
-			continue;
-		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
-	}
+	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++)
+		if ((mask & (1UL << (cpu - snp->grplo))) && rcu_cpu_beenfullyonline(cpu))
+			srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-04-09 20:10 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-09 13:08 BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition Vasily Gorbik
2026-04-09 17:22 ` Paul E. McKenney
2026-04-09 19:15   ` Vasily Gorbik
2026-04-09 20:10     ` Paul E. McKenney
2026-04-09 17:26 ` Boqun Feng
2026-04-09 17:40   ` Boqun Feng
2026-04-09 17:47     ` Tejun Heo
2026-04-09 17:48       ` Tejun Heo
2026-04-09 18:04         ` Paul E. McKenney
2026-04-09 18:09           ` Tejun Heo
2026-04-09 18:15             ` Paul E. McKenney
2026-04-09 18:10       ` Boqun Feng
2026-04-09 18:27         ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox