From: Al Viro <viro@ZenIV.linux.org.uk>
To: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: mc@linux.vnet.ibm.com, Stephen Boyd <sboyd@codeaurora.org>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Nick Piggin <npiggin@kernel.dk>,
david@fromorbit.com,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
Maciej Rutecki <maciej.rutecki@gmail.com>
Subject: Re: [PATCH] VFS: br_write_lock locks on possible CPUs other than online CPUs
Date: Wed, 21 Dec 2011 22:02:39 +0000 [thread overview]
Message-ID: <20111221220239.GJ23916@ZenIV.linux.org.uk> (raw)
In-Reply-To: <4EF24C71.6000609@linux.vnet.ibm.com>
On Thu, Dec 22, 2011 at 02:45:29AM +0530, Srivatsa S. Bhat wrote:
> The following patch is the same as the patch you posted, but with this small
> change (aimed merely at making the code a bit easier to understand) and a
> commit message added. Please point out if this change seems bad for
> any reason. And if it is fine, Viro, can you please sign-off on this patch?
> (since this patch has code contributions from both you and me)
I can live with that. I still think it's not particulary useful, but...
Anyway,
ACKed-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
I couldn't care less who ends up in the Author: fields. I can feed it to
Linus tonight, if you prefer it to go through the VFS tree. But IMO that's
properly a CPU-hotplug issue and VFS is only involved as it's the only
current user of br-locks. Should go into -stable as well, all way back to
2.6.36...
Sad that Nick is still MIA, BTW - it's his code and he'd been Cc'ed on the
thread all along... ;-/
> I tested this patch locally - the originally reported issue did not crop up
> (soft-lockups in VFS callpaths).
>
> ---
> From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Subject: [PATCH] VFS: Fix race between CPU hotplug and lglocks
>
> Currently, the *_global_[un]lock_online() routines are not at all synchronized
> with CPU hotplug. Soft-lockups detected as a consequence of this race was
> reported earlier at https://lkml.org/lkml/2011/8/24/185. (Thanks to Cong Meng
> for finding out that the root-cause of this issue is the race condition
> between br_write_[un]lock() and CPU hotplug, which results in the lock states
> getting messed up).
>
> Fixing this race by just adding {get,put}_online_cpus() at appropriate places
> in *_global_[un]lock_online() is not a good option, because, then suddenly
> br_write_[un]lock() would become blocking, whereas they have been kept as
> non-blocking all this time, and we would want to keep them that way.
>
> So, overall, we want to ensure 3 things:
> 1. br_write_lock() and br_write_unlock() must remain as non-blocking.
> 2. The corresponding lock and unlock of the per-cpu spinlocks must not happen
> for different sets of CPUs.
> 3. Either prevent any new CPU online operation in between this lock-unlock, or
> ensure that the newly onlined CPU does not proceed with its corresponding
> per-cpu spinlock unlocked.
>
> To achieve all this:
> (a) We introduce a new spinlock that is taken by the *_global_lock_online()
> routine and released by the *_global_unlock_online() routine.
> (b) We register a callback for CPU hotplug notifications, and this callback
> takes the same spinlock as above.
> (c) We maintain a bitmap which is close to the cpu_online_mask, and once it is
> initialized in the lock_init() code, all future updates to it are done in
> the callback, under the above spinlock.
> (d) The above bitmap is used (instead of cpu_online_mask) while locking and
> unlocking the per-cpu locks.
>
> The callback takes the spinlock upon the CPU_UP_PREPARE event. So, if the
> br_write_lock-unlock sequence is in progress, the callback keeps spinning,
> thus preventing the CPU online operation till the lock-unlock sequence is
> complete. This takes care of requirement (3).
>
> The bitmap that we maintain remains unmodified throughout the lock-unlock
> sequence, since all updates to it are managed by the callback, which takes
> the same spinlock as the one taken by the lock code and released only by the
> unlock routine. Combining this with (d) above, satisfies requirement (2).
>
> Overall, since we use a spinlock (mentioned in (a)) to prevent CPU hotplug
> operations from racing with br_write_lock-unlock, requirement (1) is also
> taken care of.
>
> By the way, it is to be noted that a CPU offline operation can actually run
> in parallel with our lock-unlock sequence, because our callback doesn't react
> to notifications earlier than CPU_DEAD (in order to maintain our bitmap
> properly). And this means, since we use our own bitmap (which is stale, on
> purpose) during the lock-unlock sequence, we could end up unlocking the
> per-cpu lock of an offline CPU (because we had locked it earlier, when the
> CPU was online), in order to satisfy requirement (2). But this is harmless,
> though it looks a bit awkward.
>
> Debugged-by: Cong Meng <mc@linux.vnet.ibm.com>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
>
> include/linux/lglock.h | 36 ++++++++++++++++++++++++++++++++----
> 1 files changed, 32 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/lglock.h b/include/linux/lglock.h
> index f549056..87f402c 100644
> --- a/include/linux/lglock.h
> +++ b/include/linux/lglock.h
> @@ -22,6 +22,7 @@
> #include <linux/spinlock.h>
> #include <linux/lockdep.h>
> #include <linux/percpu.h>
> +#include <linux/cpu.h>
>
> /* can make br locks by using local lock for read side, global lock for write */
> #define br_lock_init(name) name##_lock_init()
> @@ -72,9 +73,31 @@
>
> #define DEFINE_LGLOCK(name) \
> \
> + DEFINE_SPINLOCK(name##_cpu_lock); \
> + cpumask_t name##_cpus __read_mostly; \
> DEFINE_PER_CPU(arch_spinlock_t, name##_lock); \
> DEFINE_LGLOCK_LOCKDEP(name); \
> \
> + static int \
> + name##_lg_cpu_callback(struct notifier_block *nb, \
> + unsigned long action, void *hcpu) \
> + { \
> + switch (action & ~CPU_TASKS_FROZEN) { \
> + case CPU_UP_PREPARE: \
> + spin_lock(&name##_cpu_lock); \
> + cpu_set((unsigned long)hcpu, name##_cpus); \
> + spin_unlock(&name##_cpu_lock); \
> + break; \
> + case CPU_UP_CANCELED: case CPU_DEAD: \
> + spin_lock(&name##_cpu_lock); \
> + cpu_clear((unsigned long)hcpu, name##_cpus); \
> + spin_unlock(&name##_cpu_lock); \
> + } \
> + return NOTIFY_OK; \
> + } \
> + static struct notifier_block name##_lg_cpu_notifier = { \
> + .notifier_call = name##_lg_cpu_callback, \
> + }; \
> void name##_lock_init(void) { \
> int i; \
> LOCKDEP_INIT_MAP(&name##_lock_dep_map, #name, &name##_lock_key, 0); \
> @@ -83,6 +106,11 @@
> lock = &per_cpu(name##_lock, i); \
> *lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; \
> } \
> + register_hotcpu_notifier(&name##_lg_cpu_notifier); \
> + get_online_cpus(); \
> + for_each_online_cpu(i) \
> + cpu_set(i, name##_cpus); \
> + put_online_cpus(); \
> } \
> EXPORT_SYMBOL(name##_lock_init); \
> \
> @@ -124,9 +152,9 @@
> \
> void name##_global_lock_online(void) { \
> int i; \
> - preempt_disable(); \
> + spin_lock(&name##_cpu_lock); \
> rwlock_acquire(&name##_lock_dep_map, 0, 0, _RET_IP_); \
> - for_each_online_cpu(i) { \
> + for_each_cpu(i, &name##_cpus) { \
> arch_spinlock_t *lock; \
> lock = &per_cpu(name##_lock, i); \
> arch_spin_lock(lock); \
> @@ -137,12 +165,12 @@
> void name##_global_unlock_online(void) { \
> int i; \
> rwlock_release(&name##_lock_dep_map, 1, _RET_IP_); \
> - for_each_online_cpu(i) { \
> + for_each_cpu(i, &name##_cpus) { \
> arch_spinlock_t *lock; \
> lock = &per_cpu(name##_lock, i); \
> arch_spin_unlock(lock); \
> } \
> - preempt_enable(); \
> + spin_unlock(&name##_cpu_lock); \
> } \
> EXPORT_SYMBOL(name##_global_unlock_online); \
> \
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2011-12-21 22:02 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-12-19 3:36 [PATCH] VFS: br_write_lock locks on possible CPUs other than online CPUs mengcong
2011-12-19 4:11 ` Al Viro
2011-12-19 5:00 ` Dave Chinner
2011-12-19 6:07 ` mengcong
2011-12-19 7:31 ` Srivatsa S. Bhat
2011-12-19 9:12 ` Stephen Boyd
2011-12-19 11:03 ` Srivatsa S. Bhat
2011-12-19 12:11 ` Al Viro
2011-12-19 20:23 ` Srivatsa S. Bhat
2011-12-19 20:52 ` Al Viro
2011-12-20 4:56 ` Srivatsa S. Bhat
2011-12-20 6:27 ` Al Viro
2011-12-20 7:28 ` Srivatsa S. Bhat
2011-12-20 9:37 ` mengcong
2011-12-20 10:36 ` Srivatsa S. Bhat
2011-12-20 11:08 ` Srivatsa S. Bhat
2011-12-20 12:50 ` Srivatsa S. Bhat
2011-12-20 14:06 ` Al Viro
2011-12-20 14:35 ` Srivatsa S. Bhat
2011-12-20 17:59 ` Al Viro
2011-12-20 19:12 ` Srivatsa S. Bhat
2011-12-20 19:58 ` Al Viro
2011-12-20 22:27 ` Dave Chinner
2011-12-20 23:31 ` Al Viro
2011-12-21 21:15 ` Srivatsa S. Bhat
2011-12-21 22:02 ` Al Viro [this message]
2011-12-21 22:12 ` Andrew Morton
2011-12-22 7:02 ` Al Viro
2011-12-22 7:20 ` Andrew Morton
2011-12-22 8:08 ` Al Viro
2011-12-22 8:17 ` Andi Kleen
2011-12-22 8:39 ` Al Viro
2011-12-22 8:22 ` Andi Kleen
2011-12-20 7:30 ` mengcong
2011-12-20 7:37 ` Srivatsa S. Bhat
2011-12-19 23:56 ` Dave Chinner
2011-12-20 4:05 ` Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20111221220239.GJ23916@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=akpm@linux-foundation.org \
--cc=david@fromorbit.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maciej.rutecki@gmail.com \
--cc=mc@linux.vnet.ibm.com \
--cc=npiggin@kernel.dk \
--cc=sboyd@codeaurora.org \
--cc=srivatsa.bhat@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.