From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Stephen Boyd <sboyd@codeaurora.org>
Cc: mc@linux.vnet.ibm.com, Alexander Viro <viro@zeniv.linux.org.uk>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Nick Piggin <npiggin@kernel.dk>,
david@fromorbit.com,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
Maciej Rutecki <maciej.rutecki@gmail.com>
Subject: Re: [PATCH] VFS: br_write_lock locks on possible CPUs other than online CPUs
Date: Mon, 19 Dec 2011 16:33:47 +0530
Message-ID: <4EEF1A13.4000801@linux.vnet.ibm.com>
In-Reply-To: <4EEF0003.3010800@codeaurora.org>
On 12/19/2011 02:42 PM, Stephen Boyd wrote:
> On 12/18/2011 11:31 PM, Srivatsa S. Bhat wrote:
>> Hi,
>>
>> I feel the following patch is a better fix for 2 reasons:
>>
>> 1. As Al Viro pointed out, if we do for_each_possible_cpu() then we might
>> take an unnecessary performance hit in some scenarios. So working with
>> only online cpus, safely (a.k.a. race-free), if possible, would be a good
>> solution (which this patch implements).
>>
>> 2. *_global_lock_online() and *_global_unlock_online() need fixing as well,
>> because the names suggest that they lock/unlock the per-CPU locks of only
>> the currently online CPUs, but unfortunately they have no synchronization
>> to prevent those CPUs from being offlined in between, if this happens to
>> race with a CPU hotplug operation.
>>
>> And if we solve issue 2 above "carefully" (as mentioned in the changelog
>> below), it solves this whole thing!
>
> We started seeing this same problem last week. I've come up with almost
> the same solution but you beat me to the list!
>
Oh :-)
>> diff --git a/include/linux/lglock.h b/include/linux/lglock.h
>> index f549056..583d1a8 100644
>> --- a/include/linux/lglock.h
>> +++ b/include/linux/lglock.h
>> @@ -126,6 +127,7 @@
>> int i; \
>> preempt_disable(); \
>> rwlock_acquire(&name##_lock_dep_map, 0, 0, _RET_IP_); \
>> + get_online_cpus(); \
>> for_each_online_cpu(i) { \
>> arch_spinlock_t *lock; \
>> lock = &per_cpu(name##_lock, i); \
>> @@ -142,6 +144,7 @@
>> lock = &per_cpu(name##_lock, i); \
>> arch_spin_unlock(lock); \
>> } \
>> + put_online_cpus(); \
>> preempt_enable(); \
>> } \
>> EXPORT_SYMBOL(name##_global_unlock_online); \
>
> Don't you want to call {get,put}_online_cpus() outside the
> preempt_{disable,enable}()? Otherwise you are scheduling while atomic?
>
Oh right, thanks for spotting this! (and thanks for your Ack too!)
> With that fixed
>
> Acked-by: Stephen Boyd <sboyd@codeaurora.org>
>
> but I wonder if taking the hotplug mutex even for a short time reduces
> the effectiveness of these locks? Or is it more about fast readers and
> slow writers?
>
IMHO, we don't need to be concerned here, because {get,put}_online_cpus()
implement a refcounting solution, and they don't really serialize anything
unnecessarily. The readers (those who want to prevent cpu hotplug, such as this
lock-unlock code) are fast and can run concurrently, while the writer (the task
that is doing the cpu hotplug) waits until all existing readers are done with
their work.
So, since we are readers here, IMO, we don't have to worry about performance.
(I know that we get serialized for a moment while incrementing the refcount,
but that should not be worrisome, right?)
Moreover, using for_each_online_cpu() without wrapping it in
{get,put}_online_cpus() is plain wrong, because of the unhandled race with cpu
hotplug. IOW, our primary concern here is functionality, isn't it?
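(To illustrate, here is a minimal sketch of the reader-side pattern I mean;
the function name and the loop body are hypothetical placeholders, not the
lglock code itself:)

#include <linux/cpu.h>

static void walk_online_cpus_example(void)
{
	int cpu;

	get_online_cpus();	/* reader: bump the hotplug refcount; may sleep */
	for_each_online_cpu(cpu) {
		/* ... per-cpu work that must not race with CPU offline ... */
	}
	put_online_cpus();	/* drop the refcount; a hotplug writer may now proceed */
}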
To summarize, in the current design of these VFS locks, using
{get,put}_online_cpus() is *essential* to fix a functionality bug (and it is
not bad performance-wise either).
The following patch (v2) incorporates your comments:
---
From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Subject: [PATCH v2] VFS: Fix race between CPU hotplug and *_global_[un]lock_online()
The *_global_[un]lock_online() macros defined in include/linux/lglock.h
can race with CPU hotplug operations. Fix this race by using the pair
get_online_cpus() and put_online_cpus() around them, so as to prevent CPU
hotplug from happening at the same time.
But be careful to note the semantics here: if we allow CPU hotplug in between
a lock_online() and the corresponding unlock_online(), the lock states can get
messed up, since we might end up, for example, taking a lock on an online CPU
but never releasing it, because that CPU went offline before unlock_online()
was invoked (thanks to Cong Meng for debugging this issue).
[Soft-lockups detected as a consequence of this issue were reported earlier at
https://lkml.org/lkml/2011/8/24/185.]
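For clarity, the problematic interleaving looks roughly like this (an
illustrative sketch, not an exact trace):

  lock_online():   CPUs 0, 1, 2 online -> per-cpu locks of CPUs 0, 1, 2 taken
  <CPU 2 is hotplugged offline>
  unlock_online(): CPUs 0, 1 online    -> only the locks of CPUs 0, 1 released
  => CPU 2's per-cpu spinlock is left held forever; the next locker hangs.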
So, we are careful to allow CPU hotplug only after the lock-unlock sequence
is complete.
v2: Moved {get,put}_online_cpus() outside of preempt_{disable,enable} to avoid
scheduling while atomic.
Debugged-by: Cong Meng <mc@linux.vnet.ibm.com>
Acked-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
include/linux/lglock.h | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/include/linux/lglock.h b/include/linux/lglock.h
index f549056..7ef257d 100644
--- a/include/linux/lglock.h
+++ b/include/linux/lglock.h
@@ -22,6 +22,7 @@
#include <linux/spinlock.h>
#include <linux/lockdep.h>
#include <linux/percpu.h>
+#include <linux/cpu.h>
/* can make br locks by using local lock for read side, global lock for write */
#define br_lock_init(name) name##_lock_init()
@@ -124,6 +125,7 @@
\
void name##_global_lock_online(void) { \
int i; \
+ get_online_cpus(); \
preempt_disable(); \
rwlock_acquire(&name##_lock_dep_map, 0, 0, _RET_IP_); \
for_each_online_cpu(i) { \
@@ -143,6 +145,7 @@
arch_spin_unlock(lock); \
} \
preempt_enable(); \
+ put_online_cpus(); \
} \
EXPORT_SYMBOL(name##_global_unlock_online); \
\
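For reference, with this patch applied, the functions generated by these macros
for a hypothetical lglock named "example" would look roughly like this (a
simplified sketch; the lockdep annotations, EXPORT_SYMBOL and the br_* wrapper
macros are omitted):

#include <linux/cpu.h>
#include <linux/percpu.h>
#include <linux/spinlock.h>

DEFINE_PER_CPU(arch_spinlock_t, example_lock) = __ARCH_SPIN_LOCK_UNLOCKED;

void example_global_lock_online(void)
{
	int i;

	get_online_cpus();	/* pin the set of online CPUs; may sleep */
	preempt_disable();
	for_each_online_cpu(i)
		arch_spin_lock(&per_cpu(example_lock, i));
}

void example_global_unlock_online(void)
{
	int i;

	for_each_online_cpu(i)
		arch_spin_unlock(&per_cpu(example_lock, i));
	preempt_enable();
	put_online_cpus();	/* hotplug may resume only after all locks are dropped */
}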