From: Oleg Nesterov <oleg@tv-sign.ru>
To: Gautham R Shenoy <ego@in.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org,
Rusty Russel <rusty@rustcorp.com.au>,
Srivatsa Vaddagiri <vatsa@in.ibm.com>,
Dipankar Sarma <dipankar@in.ibm.com>, Ingo Molnar <mingo@elte.hu>
Subject: Re: [RFC PATCH 4/5] Remove CPU_DEAD/CPU_UP_CANCELLED handling from workqueue.c
Date: Wed, 24 Oct 2007 17:38:18 +0400 [thread overview]
Message-ID: <20071024133818.GA82@tv-sign.ru> (raw)
In-Reply-To: <20071024053716.GD27074@in.ibm.com>
On 10/24, Gautham R Shenoy wrote:
>
(reordered)
> With get_online_cpus()/put_online_cpus(), we can eliminate
> the workqueue_mutex and reintroduce the workqueue_lock,
> which is a spinlock which serializes the accesses to the
> workqueues list.
This change is obviously good, can't it go into the previous patch?
Because,
> Solution is not to cleanup the worker thread. Instead let it remain
> even after the cpu goes offline. Since no one can queue any work
> on an offlined cpu, this thread will be forever sleeping, untill
> someone onlines the cpu.
I still think this patch is questionable. Please look at my previous
response http://marc.info/?l=linux-kernel&m=119262203729543
In short: with this patch it is not possible to guarantee that work->fun()
will run on the correct CPU.
> static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq, int cpu)
> {
> /*
> - * Our caller is either destroy_workqueue() or CPU_DEAD,
> - * workqueue_mutex protects cwq->thread
> + * Our caller is destroy_workqueue(). So warn on a double
> + * destroy.
> */
> - if (cwq->thread == NULL)
> + if (cwq->thread == NULL) {
> + WARN_ON(1);
Looks wrong. It is possible that cwq->thread == NULL, because currently we
never "shrink" cpu_populated_map.
> cleanup_workqueue_thread() in the CPU_DEAD and CPU_UP_CANCELLED path
> will cause a deadlock if the worker thread is executing a work item
> which is blocked on get_online_cpus(). This will lead to a irrecoverable
> hang.
Yes. But there is nothing new. Currently, work->func() can't share the locks
with cpu_down's patch. Not only only it can't take workqueue_mutex, it can't
take any other lock which could be taken by notifier callbacks, etc.
Can't we ignore this problem, at least for now? I believe we need intrusive
changes to solve this problem correctly. Perhaps I am wrong, of course, but
I don't see a simple solution.
Another option. Note that get_online_cpus() does more than just pinning
cpu maps, actually it blocks hotplug entirely. Now let's look at
schedule_on_each_cpu(), for example. It doesn't need to block hotplug,
it only needs a stable cpu_online_map.
Suppose for a moment that _cpu_down() does cpu_hotplug_done() earlier,
right after __cpu_die(cpu) which removes CPU from the map (yes, this
is wrong, I know). Now, we don't need to change workqueue_cpu_callback(),
work->func() can use get_online_cpus() without fear of deadlock.
So, can't we introduce 2 nested rw locks? The first one blocks cpu hotplug
(like get_online_cpus does currently), the second one just pins cpu maps.
I think most users needs only this, not more.
What do you think?
(Gautham, I apologize in advance, can't be responsive till weekend).
Oleg.
next prev parent reply other threads:[~2007-10-24 13:33 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-10-24 5:29 [RFC PATCH 0/5] Refcount based Cpu Hotplug. V2 Gautham R Shenoy
2007-10-24 5:30 ` [RFC PATCH 1/5] Refcount Based Cpu Hotplug implementation Gautham R Shenoy
2007-10-24 5:32 ` [RFC PATCH 2/5] Replace lock_cpu_hotplug() with get_online_cpus() Gautham R Shenoy
2007-10-24 5:34 ` [RFC PATCH 3/5] Replace per-subsystem mutexes " Gautham R Shenoy
2007-10-24 5:37 ` [RFC PATCH 4/5] Remove CPU_DEAD/CPU_UP_CANCELLED handling from workqueue.c Gautham R Shenoy
2007-10-24 7:21 ` Rusty Russell
2007-10-24 8:35 ` Gautham R Shenoy
2007-10-24 13:44 ` Oleg Nesterov
2007-10-24 13:38 ` Oleg Nesterov [this message]
2007-10-24 17:45 ` Gautham R Shenoy
2007-10-24 18:14 ` Oleg Nesterov
2007-10-24 5:39 ` [RFC PATCH 5/5] Update get_online_cpus() in Documentation/cpu-hotplug.c Gautham R Shenoy
2007-10-24 17:04 ` [RFC PATCH 0/5] Refcount based Cpu Hotplug. V2 Christoph Lameter
2007-10-24 18:00 ` Gautham R Shenoy
2007-10-24 18:17 ` Oleg Nesterov
2007-10-24 18:22 ` Gautham R Shenoy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20071024133818.GA82@tv-sign.ru \
--to=oleg@tv-sign.ru \
--cc=akpm@linux-foundation.org \
--cc=dipankar@in.ibm.com \
--cc=ego@in.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=rusty@rustcorp.com.au \
--cc=torvalds@linux-foundation.org \
--cc=vatsa@in.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox