From: Andrew Morton <akpm@linux-foundation.org>
To: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Jiri Kosina <jkosina@suse.cz>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Christoph Lameter <cl@linux-foundation.org>,
Pekka Enberg <penberg@kernel.org>,
"Paul E. McKenney" <paul.mckenney@linaro.org>,
Josh Triplett <josh@joshtriplett.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
Date: Wed, 3 Oct 2012 14:13:11 -0700 [thread overview]
Message-ID: <20121003141311.09fb3ffc.akpm@linux-foundation.org> (raw)
In-Reply-To: <506C3535.3070401@linux.vnet.ibm.com>
On Wed, 03 Oct 2012 18:23:09 +0530
"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> wrote:
> The synchronization between CPU hotplug readers and writers is achieved by
> means of refcounting, safe-guarded by the cpu_hotplug.lock.
>
> get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
> it. If we ever hit an imbalance between the two, we end up compromising the
> guarantees of the hotplug synchronization i.e, for example, an extra call to
> put_online_cpus() can end up allowing a hotplug reader to execute concurrently with
> a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such cases
> where the refcount can go negative.
>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
>
> kernel/cpu.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index f560598..00d29bc 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -80,6 +80,7 @@ void put_online_cpus(void)
> if (cpu_hotplug.active_writer == current)
> return;
> mutex_lock(&cpu_hotplug.lock);
> + BUG_ON(cpu_hotplug.refcount == 0);
> if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
> wake_up_process(cpu_hotplug.active_writer);
> mutex_unlock(&cpu_hotplug.lock);
I think calling BUG() here is a bit harsh. We should only do that if
there's a risk to proceeding: a risk of data loss, a reduced ability to
analyse the underlying bug, etc.
But a cpu-hotplug locking imbalance is a really really really minor
problem! So how about we emit a warning then try to fix things up?
This should increase the chance that the machine will keep running and
so will increase the chance that a user will be able to report the bug
to us.
--- a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
+++ a/kernel/cpu.c
@@ -80,9 +80,12 @@ void put_online_cpus(void)
if (cpu_hotplug.active_writer == current)
return;
mutex_lock(&cpu_hotplug.lock);
- BUG_ON(cpu_hotplug.refcount == 0);
- if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
- wake_up_process(cpu_hotplug.active_writer);
+ if (!--cpu_hotplug.refcount) {
+ if (WARN_ON(cpu_hotplug.refcount == -1))
+ cpu_hotplug.refcount++; /* try to fix things up */
+ if (unlikely(cpu_hotplug.active_writer))
+ wake_up_process(cpu_hotplug.active_writer);
+ }
mutex_unlock(&cpu_hotplug.lock);
}
_
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Andrew Morton <akpm@linux-foundation.org>
To: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Jiri Kosina <jkosina@suse.cz>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Christoph Lameter <cl@linux-foundation.org>,
Pekka Enberg <penberg@kernel.org>,
"Paul E. McKenney" <paul.mckenney@linaro.org>,
Josh Triplett <josh@joshtriplett.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
Date: Wed, 3 Oct 2012 14:13:11 -0700 [thread overview]
Message-ID: <20121003141311.09fb3ffc.akpm@linux-foundation.org> (raw)
In-Reply-To: <506C3535.3070401@linux.vnet.ibm.com>
On Wed, 03 Oct 2012 18:23:09 +0530
"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> wrote:
> The synchronization between CPU hotplug readers and writers is achieved by
> means of refcounting, safe-guarded by the cpu_hotplug.lock.
>
> get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
> it. If we ever hit an imbalance between the two, we end up compromising the
> guarantees of the hotplug synchronization i.e, for example, an extra call to
> put_online_cpus() can end up allowing a hotplug reader to execute concurrently with
> a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such cases
> where the refcount can go negative.
>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
>
> kernel/cpu.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index f560598..00d29bc 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -80,6 +80,7 @@ void put_online_cpus(void)
> if (cpu_hotplug.active_writer == current)
> return;
> mutex_lock(&cpu_hotplug.lock);
> + BUG_ON(cpu_hotplug.refcount == 0);
> if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
> wake_up_process(cpu_hotplug.active_writer);
> mutex_unlock(&cpu_hotplug.lock);
I think calling BUG() here is a bit harsh. We should only do that if
there's a risk to proceeding: a risk of data loss, a reduced ability to
analyse the underlying bug, etc.
But a cpu-hotplug locking imbalance is a really really really minor
problem! So how about we emit a warning then try to fix things up?
This should increase the chance that the machine will keep running and
so will increase the chance that a user will be able to report the bug
to us.
--- a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
+++ a/kernel/cpu.c
@@ -80,9 +80,12 @@ void put_online_cpus(void)
if (cpu_hotplug.active_writer == current)
return;
mutex_lock(&cpu_hotplug.lock);
- BUG_ON(cpu_hotplug.refcount == 0);
- if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
- wake_up_process(cpu_hotplug.active_writer);
+ if (!--cpu_hotplug.refcount) {
+ if (WARN_ON(cpu_hotplug.refcount == -1))
+ cpu_hotplug.refcount++; /* try to fix things up */
+ if (unlikely(cpu_hotplug.active_writer))
+ wake_up_process(cpu_hotplug.active_writer);
+ }
mutex_unlock(&cpu_hotplug.lock);
}
_
next prev parent reply other threads:[~2012-10-03 21:13 UTC|newest]
Thread overview: 70+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-10-02 16:14 Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()") Jiri Kosina
2012-10-02 17:01 ` Paul E. McKenney
2012-10-02 21:27 ` Jiri Kosina
2012-10-02 21:49 ` Jiri Kosina
2012-10-02 21:58 ` Jiri Kosina
2012-10-02 23:31 ` Paul E. McKenney
2012-10-02 23:48 ` Jiri Kosina
2012-10-03 0:15 ` Paul E. McKenney
2012-10-03 0:45 ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Jiri Kosina
2012-10-03 0:45 ` Jiri Kosina
2012-10-03 3:41 ` Paul E. McKenney
2012-10-03 3:41 ` Paul E. McKenney
2012-10-03 3:50 ` Srivatsa S. Bhat
2012-10-03 3:50 ` Srivatsa S. Bhat
2012-10-03 6:08 ` Srivatsa S. Bhat
2012-10-03 6:08 ` Srivatsa S. Bhat
2012-10-03 8:21 ` Srivatsa S. Bhat
2012-10-03 8:21 ` Srivatsa S. Bhat
2012-10-03 9:46 ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Jiri Kosina
2012-10-03 9:46 ` Jiri Kosina
2012-10-03 12:22 ` Srivatsa S. Bhat
2012-10-03 12:22 ` Srivatsa S. Bhat
2012-10-03 12:53 ` [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus() Srivatsa S. Bhat
2012-10-03 12:53 ` Srivatsa S. Bhat
2012-10-03 21:13 ` Andrew Morton [this message]
2012-10-03 21:13 ` Andrew Morton
2012-10-04 6:16 ` Srivatsa S. Bhat
2012-10-04 6:16 ` Srivatsa S. Bhat
2012-10-05 3:24 ` Yasuaki Ishimatsu
2012-10-05 3:24 ` Yasuaki Ishimatsu
2012-10-05 5:35 ` Srivatsa S. Bhat
2012-10-05 5:35 ` Srivatsa S. Bhat
2012-10-03 14:50 ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Paul E. McKenney
2012-10-03 14:50 ` Paul E. McKenney
2012-10-03 14:55 ` Srivatsa S. Bhat
2012-10-03 14:55 ` Srivatsa S. Bhat
2012-10-03 16:00 ` Paul E. McKenney
2012-10-03 16:00 ` Paul E. McKenney
2012-10-03 14:17 ` Christoph Lameter
2012-10-03 14:17 ` Christoph Lameter
2012-10-03 14:15 ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Christoph Lameter
2012-10-03 14:15 ` Christoph Lameter
2012-10-03 14:34 ` [PATCH v3] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Jiri Kosina
2012-10-03 14:34 ` Jiri Kosina
2012-10-03 15:00 ` Srivatsa S. Bhat
2012-10-03 15:00 ` Srivatsa S. Bhat
2012-10-03 15:05 ` [PATCH v4] " Jiri Kosina
2012-10-03 15:05 ` Jiri Kosina
2012-10-03 15:49 ` Srivatsa S. Bhat
2012-10-03 15:49 ` Srivatsa S. Bhat
2012-10-03 18:49 ` David Rientjes
2012-10-03 18:49 ` David Rientjes
2012-10-08 7:26 ` [PATCH] [RESEND] " Jiri Kosina
2012-10-08 7:26 ` Jiri Kosina
2012-10-10 6:27 ` Pekka Enberg
2012-10-10 6:27 ` Pekka Enberg
2012-10-03 3:59 ` Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()") Srivatsa S. Bhat
2012-10-03 4:07 ` Paul E. McKenney
2012-10-03 4:15 ` Srivatsa S. Bhat
2012-10-02 20:39 ` Srivatsa S. Bhat
2012-10-02 22:17 ` Jiri Kosina
2012-10-03 3:35 ` Srivatsa S. Bhat
2012-10-03 3:44 ` Paul E. McKenney
2012-10-03 4:04 ` Srivatsa S. Bhat
2012-10-03 7:43 ` Jiri Kosina
2012-10-03 8:11 ` Srivatsa S. Bhat
2012-10-03 8:19 ` Jiri Kosina
2012-10-03 8:30 ` Srivatsa S. Bhat
2012-10-03 9:24 ` Jiri Kosina
2012-10-03 9:58 ` Srivatsa S. Bhat
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121003141311.09fb3ffc.akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=cl@linux-foundation.org \
--cc=jkosina@suse.cz \
--cc=josh@joshtriplett.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mingo@kernel.org \
--cc=paul.mckenney@linaro.org \
--cc=paulmck@linux.vnet.ibm.com \
--cc=penberg@kernel.org \
--cc=peterz@infradead.org \
--cc=srivatsa.bhat@linux.vnet.ibm.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.