linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Pekka Enberg <penberg@kernel.org>,
	"Paul E. McKenney" <paul.mckenney@linaro.org>,
	Josh Triplett <josh@joshtriplett.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
Date: Thu, 04 Oct 2012 11:46:07 +0530	[thread overview]
Message-ID: <506D29A7.1000805@linux.vnet.ibm.com> (raw)
In-Reply-To: <20121003141311.09fb3ffc.akpm@linux-foundation.org>

On 10/04/2012 02:43 AM, Andrew Morton wrote:
> On Wed, 03 Oct 2012 18:23:09 +0530
> "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> wrote:
> 
>> The synchronization between CPU hotplug readers and writers is achieved by
>> means of refcounting, safe-guarded by the cpu_hotplug.lock.
>>
>> get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
>> it. If we ever hit an imbalance between the two, we end up compromising the
>> guarantees of the hotplug synchronization i.e, for example, an extra call to
>> put_online_cpus() can end up allowing a hotplug reader to execute concurrently with
>> a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such cases
>> where the refcount can go negative.
>>
>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>> ---
>>
>>  kernel/cpu.c |    1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index f560598..00d29bc 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -80,6 +80,7 @@ void put_online_cpus(void)
>>  	if (cpu_hotplug.active_writer == current)
>>  		return;
>>  	mutex_lock(&cpu_hotplug.lock);
>> +	BUG_ON(cpu_hotplug.refcount == 0);
>>  	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>>  		wake_up_process(cpu_hotplug.active_writer);
>>  	mutex_unlock(&cpu_hotplug.lock);
> 
> I think calling BUG() here is a bit harsh.  We should only do that if
> there's a risk to proceeding: a risk of data loss, a reduced ability to
> analyse the underlying bug, etc.
> 
> But a cpu-hotplug locking imbalance is a really really really minor
> problem!  So how about we emit a warning then try to fix things up? 

That would be better indeed, thanks!

> This should increase the chance that the machine will keep running and
> so will increase the chance that a user will be able to report the bug
> to us.
>

Yep, sounds good.
 
> 
> --- a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
> +++ a/kernel/cpu.c
> @@ -80,9 +80,12 @@ void put_online_cpus(void)
>  	if (cpu_hotplug.active_writer == current)
>  		return;
>  	mutex_lock(&cpu_hotplug.lock);
> -	BUG_ON(cpu_hotplug.refcount == 0);
> -	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
> -		wake_up_process(cpu_hotplug.active_writer);
> +	if (!--cpu_hotplug.refcount) {

This won't catch it. We'll enter this 'if' condition only when cpu_hotplug.refcount was
decremented to zero. We'll miss out the case when it went negative (which we intended to detect).

> +		if (WARN_ON(cpu_hotplug.refcount == -1))
> +			cpu_hotplug.refcount++;	/* try to fix things up */
> +		if (unlikely(cpu_hotplug.active_writer))
> +			wake_up_process(cpu_hotplug.active_writer);
> +	}
>  	mutex_unlock(&cpu_hotplug.lock);
> 
>  }

So how about something like below:

------------------------------------------------------>

From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Subject: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()

The synchronization between CPU hotplug readers and writers is achieved by
means of refcounting, safe-guarded by the cpu_hotplug.lock.

get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
it. If we ever hit an imbalance between the two, we end up compromising the
guarantees of the hotplug synchronization i.e, for example, an extra call to
put_online_cpus() can end up allowing a hotplug reader to execute concurrently with
a hotplug writer. So, add a WARN_ON() in put_online_cpus() to detect such cases
where the refcount can go negative, and also attempt to fix it up, so that we can
continue to run.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/cpu.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index f560598..42bd331 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -80,6 +80,10 @@ void put_online_cpus(void)
 	if (cpu_hotplug.active_writer == current)
 		return;
 	mutex_lock(&cpu_hotplug.lock);
+
+	if (WARN_ON(!cpu_hotplug.refcount))
+		cpu_hotplug.refcount++; /* try to fix things up */
+
 	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
 		wake_up_process(cpu_hotplug.active_writer);
 	mutex_unlock(&cpu_hotplug.lock);


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2012-10-04  6:16 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <alpine.LNX.2.00.1210021810350.23544@pobox.suse.cz>
     [not found] ` <20121002170149.GC2465@linux.vnet.ibm.com>
     [not found]   ` <alpine.LNX.2.00.1210022324050.23544@pobox.suse.cz>
     [not found]     ` <alpine.LNX.2.00.1210022331130.23544@pobox.suse.cz>
     [not found]       ` <alpine.LNX.2.00.1210022356370.23544@pobox.suse.cz>
     [not found]         ` <20121002233138.GD2465@linux.vnet.ibm.com>
     [not found]           ` <alpine.LNX.2.00.1210030142570.23544@pobox.suse.cz>
     [not found]             ` <20121003001530.GF2465@linux.vnet.ibm.com>
2012-10-03  0:45               ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Jiri Kosina
2012-10-03  3:41                 ` Paul E. McKenney
2012-10-03  3:50                 ` Srivatsa S. Bhat
2012-10-03  6:08                   ` Srivatsa S. Bhat
2012-10-03  8:21                     ` Srivatsa S. Bhat
2012-10-03  9:46                 ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Jiri Kosina
2012-10-03 12:22                   ` Srivatsa S. Bhat
2012-10-03 12:53                     ` [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus() Srivatsa S. Bhat
2012-10-03 21:13                       ` Andrew Morton
2012-10-04  6:16                         ` Srivatsa S. Bhat [this message]
2012-10-05  3:24                           ` Yasuaki Ishimatsu
2012-10-05  5:35                             ` Srivatsa S. Bhat
2012-10-03 14:50                     ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Paul E. McKenney
2012-10-03 14:55                       ` Srivatsa S. Bhat
2012-10-03 16:00                         ` Paul E. McKenney
2012-10-03 14:17                   ` Christoph Lameter
2012-10-03 14:15                 ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Christoph Lameter
2012-10-03 14:34                   ` [PATCH v3] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Jiri Kosina
2012-10-03 15:00                     ` Srivatsa S. Bhat
2012-10-03 15:05                       ` [PATCH v4] " Jiri Kosina
2012-10-03 15:49                         ` Srivatsa S. Bhat
2012-10-03 18:49                         ` David Rientjes
2012-10-08  7:26                           ` [PATCH] [RESEND] " Jiri Kosina
2012-10-10  6:27                             ` Pekka Enberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=506D29A7.1000805@linux.vnet.ibm.com \
    --to=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux-foundation.org \
    --cc=jkosina@suse.cz \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@kernel.org \
    --cc=paul.mckenney@linaro.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=penberg@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).