From: Valentin Schneider <vschneid@redhat.com>
To: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>, Tejun Heo <tj@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Frederic Weisbecker <frederic@kernel.org>,
Juri Lelli <juri.lelli@redhat.com>, Phil Auld <pauld@redhat.com>,
Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: [RFC PATCH v3 2/3] workqueue: Unbind workers before sending them to exit()
Date: Fri, 05 Aug 2022 17:47:09 +0100
Message-ID: <xhsmh8ro2d4du.mognet@vschneid.remote.csb>
In-Reply-To: <CAJhGHyAzoa5Mb7cHd8oxbWOfgsGEt-8afTTVdjOWY8sgHY0Mcg@mail.gmail.com>

On 05/08/22 11:16, Lai Jiangshan wrote:
> On Tue, Aug 2, 2022 at 4:42 PM Valentin Schneider <vschneid@redhat.com> wrote:
>> +/*
>> + * Unlikely as it may be, a worker could wake after destroy_worker() has
>> + * happened but before reap_workers(). WORKER_DIE would be set in worker->flags,
>> + * so it would be able to kfree(worker) and head out to do_exit().
>> + *
>> + * Rather than make the reaper wait for each to-be-reaped kworker to exit and
>> + * kfree(worker) itself, make the kworkers (which have nothing to do but go
>> + * do_exit() anyway) wait for the reaper to be done with them.
>> + */
>> +static void worker_wait_reaped(struct worker *worker)
>> +{
>> +	WARN_ON_ONCE(current != worker->task);
>> +
>> +	for (;;) {
>> +		set_current_state(TASK_INTERRUPTIBLE);
>> +		if (READ_ONCE(worker->reaped))
>> +			break;
>> +		schedule();
>> +	}
>> +	__set_current_state(TASK_RUNNING);
>> +}
>
>
> It is not a good idea to add this scheduler-ist code here.
>
> Using wq_pool_attach_mutex to protect the whole body of idle_reaper_fn()
> can stop the worker from freeing itself since the worker has to
> get the mutex before exiting.
>
Right, there's worker_detach_from_pool() before kfree(worker), and that
takes wq_pool_attach_mutex; I hadn't thought of that. I want to limit how
many locks I'm hoarding with the reaper, but given that one is for
attach/detach I think that's OK. I also really don't like this
worker_wait_reaped() function, so I'll be happy to get rid of it. I'll give
this a try, thanks!

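To make the ordering argument concrete, here is a minimal userspace sketch
of the idea, not the kernel code: a pthread mutex stands in for
wq_pool_attach_mutex, and every name in it (sketch_worker, sketch_reap,
sketch_worker_exit) is hypothetical. The reaper holds the mutex across its
whole body, and the exiting worker has to take it (as the detach step would)
before it frees itself, so the worker cannot be freed under the reaper.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-ins; none of these names come from the patch. */
struct sketch_worker {
	bool die;	/* models WORKER_DIE */
	int affinity;	/* models the task's affinity, -1 == unbound */
	bool *freed;	/* set by the worker right before it frees itself */
};

/* Models wq_pool_attach_mutex. */
static pthread_mutex_t sketch_attach_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Worker exit path: "detach" under the mutex, then free itself. */
static void *sketch_worker_exit(void *arg)
{
	struct sketch_worker *w = arg;

	/* Models worker_detach_from_pool(): blocks while the reaper runs. */
	pthread_mutex_lock(&sketch_attach_mutex);
	pthread_mutex_unlock(&sketch_attach_mutex);

	*w->freed = true;
	free(w);
	return NULL;
}

/*
 * Reaper body, entirely under the mutex. The worker is started early on
 * purpose to model it waking up before the reaper is done with it.
 * Returns true if the worker was still alive at the reaper's last access.
 */
static bool sketch_reap(struct sketch_worker *w)
{
	pthread_t tid;
	bool alive;
	int rc;

	pthread_mutex_lock(&sketch_attach_mutex);
	w->die = true;
	rc = pthread_create(&tid, NULL, sketch_worker_exit, w);
	assert(rc == 0);
	(void)rc;
	w->affinity = -1;	/* safe: the worker is stuck on the mutex */
	alive = !*w->freed;
	pthread_mutex_unlock(&sketch_attach_mutex);

	/* From here on, w may be freed at any time; do not touch it. */
	pthread_join(tid, NULL);
	return alive;
}
```

The only point of the sketch is that no wait loop is needed: the mutex
handoff already orders the reaper's last access before the worker's free().
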
> And I don't think batching destruction is a good idea since
> it is not a hot path.
>
The batching is mostly there because checking & removing a worker from its
pool->idle_list has to be done under pool->lock (a raw spinlock), but
changing its affinity requires a sleepable context, so I batched the
affinity changes outside of the spinlock section.

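As an illustration of that two-phase shape, here is a small single-threaded
sketch. It is a model under assumptions, not the patch: the lock is reduced
to a flag so the "no sleeping under the spinlock" rule can be asserted, the
too_many_workers() condition is left out, and all names (batch_pool,
batch_worker, batch_unbind, batch_reap, BATCH_TIMEOUT) are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_TIMEOUT 300UL	/* stands in for IDLE_WORKER_TIMEOUT */

struct batch_worker {
	unsigned long last_active;
	bool unbound;
	struct batch_worker *next;	/* idle list, oldest first */
};

struct batch_pool {
	bool lock_held;			/* models pool->lock being held */
	struct batch_worker *idle;
};

/* Models the affinity change: must not run with the spinlock held. */
static void batch_unbind(struct batch_pool *pool, struct batch_worker *w)
{
	assert(!pool->lock_held);
	w->unbound = true;
}

/* Returns the number of workers culled at time @now. */
static int batch_reap(struct batch_pool *pool, unsigned long now)
{
	struct batch_worker *cull = NULL, *w;
	int n = 0;

	/* Phase 1: under the lock, only move expired workers to a private list. */
	pool->lock_held = true;
	while (pool->idle && now - pool->idle->last_active >= BATCH_TIMEOUT) {
		w = pool->idle;
		pool->idle = w->next;
		w->next = cull;
		cull = w;
	}
	pool->lock_held = false;

	/* Phase 2: lock dropped, the sleepable work is allowed now. */
	for (w = cull; w; w = w->next) {
		batch_unbind(pool, w);
		n++;
	}
	return n;
}
```

With three idle workers last active at 100, 600 and 950 and now == 1000,
the first two are moved to the private list and unbound, and the third stays
on the idle list.
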
>>  	while (too_many_workers(pool)) {
>> -		struct worker *worker;
>>  		unsigned long expires;
>> +		unsigned long now = jiffies;
>>
>>  		/* idle_list is kept in LIFO order, check the last one */
>>  		worker = list_entry(pool->idle_list.prev, struct worker, entry);
>>  		expires = worker->last_active + IDLE_WORKER_TIMEOUT;
>>
>> -		if (time_before(jiffies, expires)) {
>> -			mod_timer(&pool->idle_timer, expires);
>> +		/*
>> +		 * Careful: queueing a work item from here can and will cause a
>> +		 * self-deadlock when dealing with an unbound pool. However,
>> +		 * here the delay *cannot* be zero and *has* to be in the
>> +		 * future, which works.
>> +		 */
>> +		if (time_before(now, expires)) {
>
> IMHO, using raw_spin_unlock_irq(&pool->lock) here is better than
> violating locking rules *overtly* and documenting that it can not be
> really violated. But it would bring a "goto" statement.

I was worried about serializing accesses to pool->idle_reaper_work and its
underlying timer (worker_enter_idle() vs idle_reaper_fn()). That said, I
think the worst that can happen if idle_reaper_fn() re-arms the work without
holding pool->lock is worker_enter_idle() pushing the timer back to
IDLE_WORKER_TIMEOUT (rather than (last_active + IDLE_WORKER_TIMEOUT) - now).

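For reference, the two delays being compared can be written out as below.
This is only a sketch of the arithmetic with assumed names and an assumed
timeout value (MODEL_IDLE_TIMEOUT, tick_before, reaper_delay,
enter_idle_delay); tick_before() uses the same signed-difference trick as
the kernel's time_before() so that counter wraparound is handled.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MODEL_IDLE_TIMEOUT 300UL	/* assumed value, in ticks */

/* Wraparound-safe "a is before b", same idea as time_before(). */
static bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Delay the reaper passes when re-arming itself: the time left until the
 * oldest idle worker expires. Returns 0 if that worker already expired.
 */
static unsigned long reaper_delay(unsigned long last_active, unsigned long now)
{
	unsigned long expires = last_active + MODEL_IDLE_TIMEOUT;

	/* The signed-difference trick needs spans well below LONG_MAX. */
	assert(MODEL_IDLE_TIMEOUT <= (unsigned long)LONG_MAX);

	return tick_before(now, expires) ? expires - now : 0;
}

/* Delay used when a worker goes idle: always the full timeout. */
static unsigned long enter_idle_delay(void)
{
	return MODEL_IDLE_TIMEOUT;
}
```

So with last_active == 100 and now == 250, the reaper would ask for 150
ticks, and losing the race to the idle path would stretch that to the full
300: the cull is late, not lost.
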
>> +			mod_delayed_work(system_unbound_wq,
>> +					 &pool->idle_reaper_work,
>> +					 expires - now);
>>  			break;
>>  		}
>> @@ -5030,11 +5128,8 @@ static void rebind_workers(struct worker_pool *pool)
>>  	 * of all workers first and then clear UNBOUND. As we're called
>>  	 * from CPU_ONLINE, the following shouldn't fail.
>>  	 */
>> -	for_each_pool_worker(worker, pool) {
>> -		kthread_set_per_cpu(worker->task, pool->cpu);
>> -		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
>> -						  pool->attrs->cpumask) < 0);
>> -	}
>> +	for_each_pool_worker(worker, pool)
>> +		rebind_worker(worker, pool);
>
>
> It is better to skip the workers which are WORKER_DIE.
> Or just detach the worker when reaping it.

I hadn't even thought about this racing with to-be-destroyed workers. Having
worker_detach_from_pool() done by the worker itself is convenient for the
serialization with wq_pool_attach_mutex as you suggested, so let me scratch
my head some more.
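
The first option (skipping dying workers) would look roughly like the loop
below. This is a userspace sketch under assumptions, not a proposed hunk:
rb_worker, RB_WORKER_DIE and rb_rebind_workers are hypothetical names, and an
array stands in for for_each_pool_worker().

```c
#include <assert.h>
#include <stddef.h>

#define RB_WORKER_DIE (1u << 1)	/* stands in for WORKER_DIE */

struct rb_worker {
	unsigned int flags;
	int cpu;	/* models the task's affinity, -1 == unbound */
};

/* Rebind every live worker to @cpu; returns how many were rebound. */
static size_t rb_rebind_workers(struct rb_worker *workers, size_t n, int cpu)
{
	size_t i, rebound = 0;

	assert(cpu >= 0);

	for (i = 0; i < n; i++) {
		/* A dying worker was unbound on purpose; leave it alone. */
		if (workers[i].flags & RB_WORKER_DIE)
			continue;
		workers[i].cpu = cpu;
		rebound++;
	}
	return rebound;
}
```

The second option (detaching at reap time) would make the check
unnecessary, since a detached worker would no longer be visited by the loop.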
>
>>
>> raw_spin_lock_irq(&pool->lock);
>>
>> diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
>> index e00b1204a8e9..a3d60e10a76f 100644
>> --- a/kernel/workqueue_internal.h
>> +++ b/kernel/workqueue_internal.h
>> @@ -46,6 +46,7 @@ struct worker {
>>  	unsigned int		flags;		/* X: flags */
>>  	int			id;		/* I: worker id */
>>  	int			sleeping;	/* None */
>> +	int			reaped;		/* None */
>>
>>  	/*
>>  	 * Opaque string set with work_set_desc(). Printed out with task
>> --
>> 2.31.1
>>
Thread overview (15+ messages):
2022-08-02 8:41 [RFC PATCH v3 0/3] workqueue: destroy_worker() vs isolated CPUs Valentin Schneider
2022-08-02 8:41 ` [RFC PATCH v3 1/3] workqueue: Hold wq_pool_mutex while affining tasks to wq_unbound_cpumask Valentin Schneider
2022-08-03 3:40 ` Lai Jiangshan
2022-08-04 11:40 ` Valentin Schneider
2022-08-05 2:43 ` Lai Jiangshan
2022-08-15 23:50 ` Tejun Heo
2022-08-18 14:33 ` [PATCH] workqueue: Protects wq_unbound_cpumask with wq_pool_attach_mutex Lai Jiangshan
2022-08-27 0:33 ` Tejun Heo
2022-08-30 9:32 ` Lai Jiangshan
2022-09-04 20:23 ` Tejun Heo
2022-08-30 14:16 ` [RFC PATCH v3 1/3] workqueue: Hold wq_pool_mutex while affining tasks to wq_unbound_cpumask Lai Jiangshan
2022-08-02 8:41 ` [RFC PATCH v3 2/3] workqueue: Unbind workers before sending them to exit() Valentin Schneider
2022-08-05 3:16 ` Lai Jiangshan
2022-08-05 16:47 ` Valentin Schneider [this message]
2022-08-02 8:41 ` [RFC PATCH v3 3/3] DEBUG-DO-NOT-MERGE: workqueue: kworker spawner Valentin Schneider