From: Andrew Morton <akpm@linux-foundation.org>
To: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: rusty@rustcorp.com.au, mingo@elte.hu, paulmck@linux.vnet.ibm.com,
linux-kernel@vger.kernel.org, Oleg Nesterov <oleg@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 2/2] cpuhotplug: introduce try_get_online_cpus()
Date: Fri, 29 May 2009 13:31:18 -0700 [thread overview]
Message-ID: <20090529133118.1c7b16c2.akpm@linux-foundation.org> (raw)
In-Reply-To: <4A1F9CEE.5090305@cn.fujitsu.com>
On Fri, 29 May 2009 16:29:34 +0800
Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>
> get_online_cpus() is a typically coarsely granular lock.
> It's a source of ABBA deadlock.
>
> Thanks to the CPU notifiers, Some subsystem's global lock will
> be required after cpu_hotplug.rwlock. Subsystem's global lock
> is coarsely granular lock too, thus a lot's of lock in kernel
> should be required after cpu_hotplug.rwlock(if we need
> cpu_hotplug.rwlock held too)
>
> Otherwise it may come to a ABBA deadlock like this:
>
> thread 1 | thread 2
> _cpu_down() | Lock a-kernel-lock.
> cpu_hotplug_begin() |
> down_write(&cpu_hotplug.rwlock) |
> __raw_notifier_call_chain(CPU_DOWN_PREPARE) | get_online_cpus()
> ------------------------------------------------------------------------
> Lock a-kernel-lock.(wait thread2) | down_read(&cpu_hotplug.rwlock)
> (wait thread 1)
>
> But CPU online/offline are happened very rarely, get_online_cpus()
> returns success quickly in all probability.
> So it's an asinine behavior that get_online_cpus() is not allowed
> to be required after we had held "a-kernel-lock".
>
> To dispel the ABBA deadlock, this patch introduces
> try_get_online_cpus(). It returns fail very rarely. It gives the
> caller a chance to select an alternative way to finish works,
> instead of sleeping or deadlock.
>
We really really really don't want to add new trylocks. They're nasty
things, requiring that callers provide alternative fallback code paths
which are really hard to test. The original code writer _might_ have
performed runtime testing of the contention path when originally
writing the patch, but after that nobody will tesst that path at all
and there's a good chance that it will become broken over time and
nobody will know about it.
I assume that one advantage of your rwlock conversion patch is that
code paths such as the above are now checked by lockdep, yes? And this
is how you discovered the bug in some subsystem whcih you didn't tell
us about?
> ---
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index 2643d84..98f5c4b 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -104,6 +104,7 @@ extern struct sysdev_class cpu_sysdev_class;
>
> extern void get_online_cpus(void);
> extern void put_online_cpus(void);
> +extern int try_get_online_cpus(void);
> #define hotcpu_notifier(fn, pri) { \
> static struct notifier_block fn##_nb __cpuinitdata = \
> { .notifier_call = fn, .priority = pri }; \
> @@ -117,6 +118,7 @@ int cpu_down(unsigned int cpu);
>
> #define get_online_cpus() do { } while (0)
> #define put_online_cpus() do { } while (0)
> +#define try_get_online_cpus() (1)
> #define hotcpu_notifier(fn, pri) do { (void)(fn); } while (0)
> /* These aren't inline functions due to a GCC bug. */
> #define register_hotcpu_notifier(nb) ({ (void)(nb); 0; })
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 62198ec..e948f19 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -66,6 +66,15 @@ void put_online_cpus(void)
> }
> EXPORT_SYMBOL_GPL(put_online_cpus);
>
> +int try_get_online_cpus(void)
> +{
> + might_sleep();
> + if (cpu_hotplug.active_writer == current)
> + return 1;
> + return down_read_trylock(&cpu_hotplug.rwlock);
> +
> +}
> +EXPORT_SYMBOL_GPL(try_get_online_cpus);
It's strange to add a might_sleep() to a function which doesn't sleep.
The patch adds no callers to this function. This is significant
because it would be quite interesting to find out which subsystem(s)
you've found to have this deadlock. I do think that we should look at
alternative (non-trylocky) ways of fixing them.
next prev parent reply other threads:[~2009-05-29 20:32 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-05-29 8:29 [PATCH 2/2] cpuhotplug: introduce try_get_online_cpus() Lai Jiangshan
2009-05-29 20:31 ` Andrew Morton [this message]
2009-06-01 2:42 ` Lai Jiangshan
2009-06-01 7:31 ` Rusty Russell
2009-06-01 16:19 ` Paul E. McKenney
2009-06-04 0:16 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090529133118.1c7b16c2.akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=laijs@cn.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=oleg@redhat.com \
--cc=paulmck@linux.vnet.ibm.com \
--cc=rusty@rustcorp.com.au \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox