public inbox for linux-kernel@vger.kernel.org
From: Valentin Schneider <vschneid@redhat.com>
To: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>, Tejun Heo <tj@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Frederic Weisbecker <frederic@kernel.org>,
	Juri Lelli <juri.lelli@redhat.com>, Phil Auld <pauld@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: [RFC PATCH v3 2/3] workqueue: Unbind workers before sending them to exit()
Date: Fri, 05 Aug 2022 17:47:09 +0100
Message-ID: <xhsmh8ro2d4du.mognet@vschneid.remote.csb>
In-Reply-To: <CAJhGHyAzoa5Mb7cHd8oxbWOfgsGEt-8afTTVdjOWY8sgHY0Mcg@mail.gmail.com>

On 05/08/22 11:16, Lai Jiangshan wrote:
> On Tue, Aug 2, 2022 at 4:42 PM Valentin Schneider <vschneid@redhat.com> wrote:
>> +/*
>> + * Unlikely as it may be, a worker could wake after destroy_worker() has
>> + * happened but before reap_workers(). WORKER_DIE would be set in worker->flags,
>> + * so it would be able to kfree(worker) and head out to do_exit().
>> + *
>> + * Rather than make the reaper wait for each to-be-reaped kworker to exit and
>> + * kfree(worker) itself, make the kworkers (which have nothing to do but go
>> + * do_exit() anyway) wait for the reaper to be done with them.
>> + */
>> +static void worker_wait_reaped(struct worker *worker)
>> +{
>> +       WARN_ON_ONCE(current != worker->task);
>> +
>> +       for (;;) {
>> +               set_current_state(TASK_INTERRUPTIBLE);
>> +               if (READ_ONCE(worker->reaped))
>> +                       break;
>> +               schedule();
>> +       }
>> +       __set_current_state(TASK_RUNNING);
>> +}
>
>
> It is not a good idea to add this scheduler-ist code here.
>
> Using wq_pool_attach_mutex to protect the whole body of idle_reaper_fn()
> can stop the worker from freeing itself since the worker has to
> get the mutex before exiting.
>

Right, there's worker_detach_from_pool() before kfree(worker), hadn't
thought of that. I want to limit how many locks I'm hoarding with the
reaper, but given that one is for attach/detach I think that's OK - and I
also really don't like this worker_wait_reaped() function, so will be happy
to get rid of it. I'll give this a try, thanks!
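
For illustration, here is a minimal userspace sketch of that idea, assuming a
pthread mutex stands in for wq_pool_attach_mutex. All names below
(attach_mutex, model_worker, model_worker_exit(), model_reap()) are
hypothetical and are not the kernel's; the point is only that a dying worker
which must take the mutex before detaching cannot free itself while the
reaper holds that mutex.

```c
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for wq_pool_attach_mutex (hypothetical name). */
static pthread_mutex_t attach_mutex = PTHREAD_MUTEX_INITIALIZER;

struct model_worker {
	bool die;	/* stands in for WORKER_DIE */
	bool detached;	/* only written with attach_mutex held */
};

/* Exit path of a dying worker: it has to detach before it may free itself. */
static void *model_worker_exit(void *arg)
{
	struct model_worker *w = arg;

	pthread_mutex_lock(&attach_mutex);	/* blocks while the reaper runs */
	w->detached = true;
	pthread_mutex_unlock(&attach_mutex);
	/* In the kernel, kfree(worker) would only happen after this point. */
	return NULL;
}

/*
 * Reaper body, run entirely under attach_mutex. Returns true if the worker
 * was still attached for as long as the reaper held the mutex.
 */
static bool model_reap(struct model_worker *w)
{
	pthread_t t;
	bool started, still_attached;

	pthread_mutex_lock(&attach_mutex);
	w->die = true;
	/* Model the worker waking up early, while the reaper is still busy. */
	started = pthread_create(&t, NULL, model_worker_exit, w) == 0;
	/* ... the reaper may safely dereference w here ... */
	still_attached = !w->detached;
	pthread_mutex_unlock(&attach_mutex);

	if (started)
		pthread_join(t, NULL);
	return still_attached;
}
```

Calling model_reap() on a fresh model_worker returns true, and the worker is
only marked detached once the reaper has dropped the mutex.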

> And I don't think batching destruction is a good idea since
> it is not a hot path.
>

The batching is mostly there because checking & removing a worker from its
pool->idle_list has to be done under pool->lock, but changing its affinity
requires a sleepable context, so I batched that outside of the spinlock
section.
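
A rough userspace sketch of that two-phase shape follows. It is not the patch
itself: the types and helpers (model_pool, model_worker, pool_lock(),
set_affinity_may_sleep(), reap_idle()) are made up for illustration, and the
"lock" is just a flag used to assert that the sleepable step never runs with
it held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_worker {
	int id;
	bool idle_expired;		/* idle for longer than the timeout */
	struct model_worker *next;
};

struct model_pool {
	bool locked;			/* stands in for pool->lock */
	struct model_worker *idle_list;
	int affinity_changes;
};

static void pool_lock(struct model_pool *pool)
{
	assert(!pool->locked);
	pool->locked = true;
}

static void pool_unlock(struct model_pool *pool)
{
	assert(pool->locked);
	pool->locked = false;
}

/* Stand-in for an affinity change, which must not run under a spinlock. */
static void set_affinity_may_sleep(struct model_pool *pool,
				   struct model_worker *w)
{
	(void)w;
	assert(!pool->locked);
	pool->affinity_changes++;
}

/* Returns the number of workers culled from the idle list. */
static int reap_idle(struct model_pool *pool)
{
	struct model_worker *cull = NULL, **pp, *w;
	int n = 0;

	/* Phase 1: under the lock, only move expired workers to a local list. */
	pool_lock(pool);
	pp = &pool->idle_list;
	while ((w = *pp) != NULL) {
		if (w->idle_expired) {
			*pp = w->next;
			w->next = cull;
			cull = w;
		} else {
			pp = &w->next;
		}
	}
	pool_unlock(pool);

	/* Phase 2: lock dropped, the sleepable work can be done per worker. */
	while ((w = cull) != NULL) {
		cull = w->next;
		w->next = NULL;
		set_affinity_may_sleep(pool, w);
		n++;
	}
	return n;
}
```

The batch exists only to carry workers across the unlock; nothing here is
meant as a throughput optimisation.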

>>         while (too_many_workers(pool)) {
>> -               struct worker *worker;
>>                 unsigned long expires;
>> +               unsigned long now = jiffies;
>>
>>                 /* idle_list is kept in LIFO order, check the last one */
>>                 worker = list_entry(pool->idle_list.prev, struct worker, entry);
>>                 expires = worker->last_active + IDLE_WORKER_TIMEOUT;
>>
>> -               if (time_before(jiffies, expires)) {
>> -                       mod_timer(&pool->idle_timer, expires);
>> +               /*
>> +                * Careful: queueing a work item from here can and will cause a
>> +                * self-deadlock when dealing with an unbound pool. However,
>> +                * here the delay *cannot* be zero and *has* to be in the
>> +                * future, which works.
>> +                */
>> +               if (time_before(now, expires)) {
>
> IMHO, using raw_spin_unlock_irq(&pool->lock) here is better than
> violating locking rules *overtly* and documenting that it can not be
> really violated. But it would bring a "goto" statement.

I was worried about serializing accesses to pool->idle_reaper_work and its
underlying timer (worker_enter_idle() vs idle_reaper_fn()), though I think
the worst that can happen if idle_reaper_fn() re-arms the delayed work
without holding pool->lock is worker_enter_idle() pushing the timer back to
IDLE_WORKER_TIMEOUT (rather than (last_active + IDLE_WORKER_TIMEOUT) -
now).
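
To make the two delays concrete, here is a small standalone sketch of the
arithmetic. model_time_before() mirrors the signed-difference comparison that
time_before() relies on; MODEL_IDLE_TIMEOUT and reap_delay() are hypothetical
names, and the timeout value is an arbitrary number of ticks chosen for the
example.

```c
#include <stdbool.h>

/* Arbitrary tick count for this example (hypothetical name and value). */
#define MODEL_IDLE_TIMEOUT 300UL

/*
 * Wraparound-safe "a is before b", using the same signed-difference trick
 * as the kernel's time_before().
 */
static bool model_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Ticks left before a worker last active at @last_active may be reaped,
 * or 0 if its idle timeout has already expired at @now.
 */
static unsigned long reap_delay(unsigned long last_active, unsigned long now)
{
	unsigned long expires = last_active + MODEL_IDLE_TIMEOUT;

	if (model_time_before(now, expires))
		return expires - now;	/* strictly positive on this branch */
	return 0;
}
```

With last_active = 1000 and now = 1100, reap_delay() gives 200 ticks, whereas
re-arming with the full timeout would wait 300: the reap happens later than
strictly needed, but nothing worse than that.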

>> +                       mod_delayed_work(system_unbound_wq,
>> +                                        &pool->idle_reaper_work,
>> +                                        expires - now);
>>                         break;
>>                 }

>> @@ -5030,11 +5128,8 @@ static void rebind_workers(struct worker_pool *pool)
>>          * of all workers first and then clear UNBOUND.  As we're called
>>          * from CPU_ONLINE, the following shouldn't fail.
>>          */
>> -       for_each_pool_worker(worker, pool) {
>> -               kthread_set_per_cpu(worker->task, pool->cpu);
>> -               WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
>> -                                                 pool->attrs->cpumask) < 0);
>> -       }
>> +       for_each_pool_worker(worker, pool)
>> +               rebind_worker(worker, pool);
>
>
> It is better to skip the workers which are WORKER_DIE.
> Or just detach the worker when reaping it.

Hadn't even thought about this racing with to-be-destroyed workers. Having
worker_detach_from_pool() done by the worker itself is convenient for the
serialization with wq_pool_attach_mutex as you suggested, let me scratch my
head some more.

>
>>
>>         raw_spin_lock_irq(&pool->lock);
>>
>> diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
>> index e00b1204a8e9..a3d60e10a76f 100644
>> --- a/kernel/workqueue_internal.h
>> +++ b/kernel/workqueue_internal.h
>> @@ -46,6 +46,7 @@ struct worker {
>>         unsigned int            flags;          /* X: flags */
>>         int                     id;             /* I: worker id */
>>         int                     sleeping;       /* None */
>> +       int                     reaped;         /* None */
>>
>>         /*
>>          * Opaque string set with work_set_desc().  Printed out with task
>> --
>> 2.31.1
>>

