public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/4] workqueue: destroy_worker() vs isolated CPUs
@ 2022-10-04 15:05 Valentin Schneider
  2022-10-04 15:05 ` [PATCH v4 1/4] workqueue: Protects wq_unbound_cpumask with wq_pool_attach_mutex Valentin Schneider
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Valentin Schneider @ 2022-10-04 15:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tejun Heo, Lai Jiangshan, Peter Zijlstra, Frederic Weisbecker,
	Juri Lelli, Phil Auld, Marcelo Tosatti

Hi folks,

I haven't sent an update for this in a while, but the issue has risen again in
some other environment so I'm getting more reasons to push this out.

Revisions
=========

RFCv3 -> v4
+++++++++++

o Rebase onto v6.0
o Split into more patches for reviewability
o Take dying workers out of the pool->workers as suggested by Lai

RFCv2 -> RFCv3
++++++++++++++

o Rebase onto v5.19
o Add new patch (1/3) around accessing wq_unbound_cpumask

o Prevent WORKER_DIE workers for kfree()'ing themselves before the idle reaper
  gets to handle them (Tejun)

  Bit of an aside on that: I've been struggling to convince myself this can
  happen due to spurious wakeups and would like some help here.

  Idle workers are TASK_UNINTERRUPTIBLE, so they can't be woken up by
  signals. That state is set *under* pool->lock, and all wakeups (before this
  patch) are also done while holding pool->lock.
  
  wake_up_worker() is done under pool->lock AND only wakes a worker on the
  pool->idle_list. Thus the to-be-woken worker *cannot* have WORKER_DIE, though
  it could gain it *after* being woken but *before* it runs, e.g.:
                          
  LOCK pool->lock
  wake_up_worker(pool)
      wake_up_process(p)
  UNLOCK pool->lock
                          idle_reaper_fn()
                            LOCK pool->lock
                            destroy_worker(worker, list);
			    UNLOCK pool->lock
			                            worker_thread()
						      goto woke_up;
                                                      LOCK pool->lock
						      READ worker->flags & WORKER_DIE
                                                          UNLOCK pool->lock
                                                          ...
						          kfree(worker);
                            reap_worker(worker);
			        // Uh-oh
			  
  ... But IMO that's not a spurious wakeup, that's a concurrency issue. I don't
  see any spurious/unexpected worker wakeup happening once a worker is off the
  pool->idle_list.
  

RFCv1 -> RFCv2
++++++++++++++

o Change the pool->timer into a delayed_work to have a sleepable context for
  unbinding kworkers

Cheers,
Valentin

Lai Jiangshan (1):
  workqueue: Protects wq_unbound_cpumask with wq_pool_attach_mutex

Valentin Schneider (3):
  workqueue: Factorize unbind/rebind_workers() logic
  workqueue: Convert the idle_timer to a delayed_work
  workqueue: Unbind workers before sending them to exit()

 kernel/workqueue.c | 195 +++++++++++++++++++++++++++++++--------------
 1 file changed, 136 insertions(+), 59 deletions(-)

--
2.31.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-11-02 17:14 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-10-04 15:05 [PATCH v4 0/4] workqueue: destroy_worker() vs isolated CPUs Valentin Schneider
2022-10-04 15:05 ` [PATCH v4 1/4] workqueue: Protects wq_unbound_cpumask with wq_pool_attach_mutex Valentin Schneider
2022-10-04 15:05 ` [PATCH v4 2/4] workqueue: Factorize unbind/rebind_workers() logic Valentin Schneider
2022-10-04 15:05 ` [PATCH v4 3/4] workqueue: Convert the idle_timer to a delayed_work Valentin Schneider
2022-10-31 18:49   ` Tejun Heo
2022-11-02 17:13     ` Valentin Schneider
2022-10-04 15:05 ` [PATCH v4 4/4] workqueue: Unbind workers before sending them to exit() Valentin Schneider
     [not found] ` <20221005010832.1934-1-hdanton@sina.com>
2022-10-05 11:13   ` Valentin Schneider
     [not found]   ` <20221005145022.1695-1-hdanton@sina.com>
2022-10-05 16:14     ` Valentin Schneider
2022-10-25  9:42 ` [PATCH v4 0/4] workqueue: destroy_worker() vs isolated CPUs Valentin Schneider
2022-10-31 18:45   ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox