public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET percpu/for-4.4] percpu_ref: make mode switching operations synchronize themselves
@ 2015-09-29 21:47 Tejun Heo
  2015-09-29 21:47 ` [PATCH 1/5] percpu_ref: remove unnecessary RCU grace period for staggered atomic switching confirmation Tejun Heo
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Tejun Heo @ 2015-09-29 21:47 UTC (permalink / raw)
  To: axboe, akinobu.mita, tom.leiming
  Cc: kent.overstreet, cl, linux-kernel, kernel-team

Hello,

Mostly for historical reasons, percpu_ref atomic/percpu mode switching
operations require the users to synchronize different operations
although there are valid usage scenarios where different types of
operations racing against each other makes sense.  This unusual
requirement led to a subtle race condition in blk-mq which was spotted
by Akinobu Mita[1].

Akinobu proposed a blk-mq specific fix where it adds an enclosing
mutex around the percpu_ref operations; however, this is a percpu_ref
deficiency and fixing it at the source is the better long term
approach.  Unfortunately, this ended up being a somewhat invasive set
of changes and the right thing to do likely is applying Akinobu's fix
for 4.3 / -stable, applying this patchset for 4.4 window and reverting
Akinobu's patch after the 4.4 merge window.

Akinobu, can you please verify that this patchset makes the race
condition go away?

This patchset contains the following five patches.

 0001-percpu_ref-remove-unnecessary-RCU-grace-period-for-s.patch
 0002-percpu_ref-reorganize-__percpu_ref_switch_to_atomic-.patch
 0003-percpu_ref-unify-staggered-atomic-switching-wait-beh.patch
 0004-percpu_ref-restructure-operation-mode-switching.patch
 0005-percpu_ref-allow-operation-mode-switching-operations.patch

0001-0004 are prep patches.  0001 and 0003 involve minor behavior
changes which shouldn't affect users.  0005 implements interal
synchronization among percpu mode switching operations.

diffstat follows.  Thanks.

 lib/percpu-refcount.c |  166 +++++++++++++++++++++++++++-----------------------
 1 file changed, 92 insertions(+), 74 deletions(-)

[1] http://lkml.kernel.org/g/1443287365-4244-7-git-send-email-akinobu.mita@gmail.com

--
tejun

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/5] percpu_ref: remove unnecessary RCU grace period for staggered atomic switching confirmation
  2015-09-29 21:47 [PATCHSET percpu/for-4.4] percpu_ref: make mode switching operations synchronize themselves Tejun Heo
@ 2015-09-29 21:47 ` Tejun Heo
  2015-09-29 21:47 ` [PATCH 2/5] percpu_ref: reorganize __percpu_ref_switch_to_atomic() and relocate percpu_ref_switch_to_atomic() Tejun Heo
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2015-09-29 21:47 UTC (permalink / raw)
  To: axboe, akinobu.mita, tom.leiming
  Cc: kent.overstreet, cl, linux-kernel, kernel-team, Tejun Heo

At the beginning, percpu_ref guaranteed a RCU grace period between a
call to percpu_ref_kill_and_confirm() and the invocation of the
confirmation callback.  This guarantee exposed internal implementation
details and got rescinded while switching over to sched RCU; however,
__percpu_ref_switch_to_atomic() still inserts a full sched RCU grace
period even when it can simply wait for the previous attempt.

Remove the unnecessary grace period and perform the confirmation
synchronously for staggered atomic switching attempts.  Update
comments accordingly.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 lib/percpu-refcount.c | 22 ++++------------------
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index 6111bcb..95f3c13 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -177,17 +177,11 @@ static void __percpu_ref_switch_to_atomic(struct percpu_ref *ref,
 		call_rcu_sched(&ref->rcu, percpu_ref_switch_to_atomic_rcu);
 	} else if (confirm_switch) {
 		/*
-		 * Somebody already set ATOMIC.  Switching may still be in
-		 * progress.  @confirm_switch must be invoked after the
-		 * switching is complete and a full sched RCU grace period
-		 * has passed.  Wait synchronously for the previous
-		 * switching and schedule @confirm_switch invocation.
+		 * Somebody else already set ATOMIC.  Wait for its
+		 * completion and invoke @confirm_switch() directly.
 		 */
 		wait_event(percpu_ref_switch_waitq, !ref->confirm_switch);
-		ref->confirm_switch = confirm_switch;
-
-		percpu_ref_get(ref);	/* put after confirmation */
-		call_rcu_sched(&ref->rcu, percpu_ref_call_confirm_rcu);
+		confirm_switch(ref);
 	}
 }
 
@@ -211,10 +205,6 @@ static void __percpu_ref_switch_to_atomic(struct percpu_ref *ref,
  * but it may block if @confirm_kill is specified and @ref is already in
  * the process of switching to atomic mode.  In such cases, @confirm_switch
  * will be invoked after the switching is complete.
- *
- * Due to the way percpu_ref is implemented, @confirm_switch will be called
- * after at least one full sched RCU grace period has passed but this is an
- * implementation detail and must not be depended upon.
  */
 void percpu_ref_switch_to_atomic(struct percpu_ref *ref,
 				 percpu_ref_func_t *confirm_switch)
@@ -290,11 +280,7 @@ void percpu_ref_switch_to_percpu(struct percpu_ref *ref)
  *
  * This function normally doesn't block and can be called from any context
  * but it may block if @confirm_kill is specified and @ref is in the
- * process of switching to atomic mode by percpu_ref_switch_atomic().
- *
- * Due to the way percpu_ref is implemented, @confirm_switch will be called
- * after at least one full sched RCU grace period has passed but this is an
- * implementation detail and must not be depended upon.
+ * process of switching to atomic mode by percpu_ref_switch_to_atomic().
  */
 void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
 				 percpu_ref_func_t *confirm_kill)
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/5] percpu_ref: reorganize __percpu_ref_switch_to_atomic() and relocate percpu_ref_switch_to_atomic()
  2015-09-29 21:47 [PATCHSET percpu/for-4.4] percpu_ref: make mode switching operations synchronize themselves Tejun Heo
  2015-09-29 21:47 ` [PATCH 1/5] percpu_ref: remove unnecessary RCU grace period for staggered atomic switching confirmation Tejun Heo
@ 2015-09-29 21:47 ` Tejun Heo
  2015-09-29 21:47 ` [PATCH 3/5] percpu_ref: unify staggered atomic switching wait behavior Tejun Heo
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2015-09-29 21:47 UTC (permalink / raw)
  To: axboe, akinobu.mita, tom.leiming
  Cc: kent.overstreet, cl, linux-kernel, kernel-team, Tejun Heo

Reorganize __percpu_ref_switch_to_atomic() so that it looks
structurally similar to __percpu_ref_switch_to_percpu() and relocate
percpu_ref_switch_to_atomic so that the two internal functions are
co-located.

This patch doesn't introduce any functional differences.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 lib/percpu-refcount.c | 98 ++++++++++++++++++++++++++-------------------------
 1 file changed, 50 insertions(+), 48 deletions(-)

diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index 95f3c13..622e93b 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -161,56 +161,30 @@ static void percpu_ref_noop_confirm_switch(struct percpu_ref *ref)
 static void __percpu_ref_switch_to_atomic(struct percpu_ref *ref,
 					  percpu_ref_func_t *confirm_switch)
 {
-	if (!(ref->percpu_count_ptr & __PERCPU_REF_ATOMIC)) {
-		/* switching from percpu to atomic */
-		ref->percpu_count_ptr |= __PERCPU_REF_ATOMIC;
-
-		/*
-		 * Non-NULL ->confirm_switch is used to indicate that
-		 * switching is in progress.  Use noop one if unspecified.
-		 */
-		WARN_ON_ONCE(ref->confirm_switch);
-		ref->confirm_switch =
-			confirm_switch ?: percpu_ref_noop_confirm_switch;
-
-		percpu_ref_get(ref);	/* put after confirmation */
-		call_rcu_sched(&ref->rcu, percpu_ref_switch_to_atomic_rcu);
-	} else if (confirm_switch) {
-		/*
-		 * Somebody else already set ATOMIC.  Wait for its
-		 * completion and invoke @confirm_switch() directly.
-		 */
-		wait_event(percpu_ref_switch_waitq, !ref->confirm_switch);
-		confirm_switch(ref);
+	if (ref->percpu_count_ptr & __PERCPU_REF_ATOMIC) {
+		if (confirm_switch) {
+			/*
+			 * Somebody else already set ATOMIC.  Wait for its
+			 * completion and invoke @confirm_switch() directly.
+			 */
+			wait_event(percpu_ref_switch_waitq, !ref->confirm_switch);
+			confirm_switch(ref);
+		}
+		return;
 	}
-}
 
-/**
- * percpu_ref_switch_to_atomic - switch a percpu_ref to atomic mode
- * @ref: percpu_ref to switch to atomic mode
- * @confirm_switch: optional confirmation callback
- *
- * There's no reason to use this function for the usual reference counting.
- * Use percpu_ref_kill[_and_confirm]().
- *
- * Schedule switching of @ref to atomic mode.  All its percpu counts will
- * be collected to the main atomic counter.  On completion, when all CPUs
- * are guaraneed to be in atomic mode, @confirm_switch, which may not
- * block, is invoked.  This function may be invoked concurrently with all
- * the get/put operations and can safely be mixed with kill and reinit
- * operations.  Note that @ref will stay in atomic mode across kill/reinit
- * cycles until percpu_ref_switch_to_percpu() is called.
- *
- * This function normally doesn't block and can be called from any context
- * but it may block if @confirm_kill is specified and @ref is already in
- * the process of switching to atomic mode.  In such cases, @confirm_switch
- * will be invoked after the switching is complete.
- */
-void percpu_ref_switch_to_atomic(struct percpu_ref *ref,
-				 percpu_ref_func_t *confirm_switch)
-{
-	ref->force_atomic = true;
-	__percpu_ref_switch_to_atomic(ref, confirm_switch);
+	/* switching from percpu to atomic */
+	ref->percpu_count_ptr |= __PERCPU_REF_ATOMIC;
+
+	/*
+	 * Non-NULL ->confirm_switch is used to indicate that switching is
+	 * in progress.  Use noop one if unspecified.
+	 */
+	WARN_ON_ONCE(ref->confirm_switch);
+	ref->confirm_switch = confirm_switch ?: percpu_ref_noop_confirm_switch;
+
+	percpu_ref_get(ref);	/* put after confirmation */
+	call_rcu_sched(&ref->rcu, percpu_ref_switch_to_atomic_rcu);
 }
 
 static void __percpu_ref_switch_to_percpu(struct percpu_ref *ref)
@@ -241,6 +215,34 @@ static void __percpu_ref_switch_to_percpu(struct percpu_ref *ref)
 }
 
 /**
+ * percpu_ref_switch_to_atomic - switch a percpu_ref to atomic mode
+ * @ref: percpu_ref to switch to atomic mode
+ * @confirm_switch: optional confirmation callback
+ *
+ * There's no reason to use this function for the usual reference counting.
+ * Use percpu_ref_kill[_and_confirm]().
+ *
+ * Schedule switching of @ref to atomic mode.  All its percpu counts will
+ * be collected to the main atomic counter.  On completion, when all CPUs
+ * are guaraneed to be in atomic mode, @confirm_switch, which may not
+ * block, is invoked.  This function may be invoked concurrently with all
+ * the get/put operations and can safely be mixed with kill and reinit
+ * operations.  Note that @ref will stay in atomic mode across kill/reinit
+ * cycles until percpu_ref_switch_to_percpu() is called.
+ *
+ * This function normally doesn't block and can be called from any context
+ * but it may block if @confirm_kill is specified and @ref is already in
+ * the process of switching to atomic mode.  In such cases, @confirm_switch
+ * will be invoked after the switching is complete.
+ */
+void percpu_ref_switch_to_atomic(struct percpu_ref *ref,
+				 percpu_ref_func_t *confirm_switch)
+{
+	ref->force_atomic = true;
+	__percpu_ref_switch_to_atomic(ref, confirm_switch);
+}
+
+/**
  * percpu_ref_switch_to_percpu - switch a percpu_ref to percpu mode
  * @ref: percpu_ref to switch to percpu mode
  *
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/5] percpu_ref: unify staggered atomic switching wait behavior
  2015-09-29 21:47 [PATCHSET percpu/for-4.4] percpu_ref: make mode switching operations synchronize themselves Tejun Heo
  2015-09-29 21:47 ` [PATCH 1/5] percpu_ref: remove unnecessary RCU grace period for staggered atomic switching confirmation Tejun Heo
  2015-09-29 21:47 ` [PATCH 2/5] percpu_ref: reorganize __percpu_ref_switch_to_atomic() and relocate percpu_ref_switch_to_atomic() Tejun Heo
@ 2015-09-29 21:47 ` Tejun Heo
  2015-09-29 21:47 ` [PATCH 4/5] percpu_ref: restructure operation mode switching Tejun Heo
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2015-09-29 21:47 UTC (permalink / raw)
  To: axboe, akinobu.mita, tom.leiming
  Cc: kent.overstreet, cl, linux-kernel, kernel-team, Tejun Heo

When an atomic or percpu switching starts before the previous atomic
switching finishes, the taken behaviors are

* If the new atomic switching has confirmation callback, it waits
  for the previous atomic switching to complete.

* If the new percpu switching is the first percpu switching following
  the previous atomic switching, it waits the previous atomic
  switching to complete.

No percpu_ref user depends on these subtleties.  The only meaningful
part is that, if the caller ensures that atomic switching isn't in
progress, mode switching operations can be issued from any context.

This patch pulls the wait logic to the top of both switching functions
so that they always wait for the previous atomic switching to
complete.  This makes the behavior simpler and consistent for both
directions and will help allowing concurrent invocations of mode
switching functions.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 lib/percpu-refcount.c | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index 622e93b..6a36597 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -161,15 +161,19 @@ static void percpu_ref_noop_confirm_switch(struct percpu_ref *ref)
 static void __percpu_ref_switch_to_atomic(struct percpu_ref *ref,
 					  percpu_ref_func_t *confirm_switch)
 {
+	/*
+	 * If the previous ATOMIC switching hasn't finished yet, wait for
+	 * its completion.  If the caller ensures that ATOMIC switching
+	 * isn't in progress, this function can be called from any context.
+	 * Do an extra confirm_switch test to circumvent the unconditional
+	 * might_sleep() in wait_event().
+	 */
+	if (ref->confirm_switch)
+		wait_event(percpu_ref_switch_waitq, !ref->confirm_switch);
+
 	if (ref->percpu_count_ptr & __PERCPU_REF_ATOMIC) {
-		if (confirm_switch) {
-			/*
-			 * Somebody else already set ATOMIC.  Wait for its
-			 * completion and invoke @confirm_switch() directly.
-			 */
-			wait_event(percpu_ref_switch_waitq, !ref->confirm_switch);
+		if (confirm_switch)
 			confirm_switch(ref);
-		}
 		return;
 	}
 
@@ -180,7 +184,6 @@ static void __percpu_ref_switch_to_atomic(struct percpu_ref *ref,
 	 * Non-NULL ->confirm_switch is used to indicate that switching is
 	 * in progress.  Use noop one if unspecified.
 	 */
-	WARN_ON_ONCE(ref->confirm_switch);
 	ref->confirm_switch = confirm_switch ?: percpu_ref_noop_confirm_switch;
 
 	percpu_ref_get(ref);	/* put after confirmation */
@@ -192,13 +195,21 @@ static void __percpu_ref_switch_to_percpu(struct percpu_ref *ref)
 	unsigned long __percpu *percpu_count = percpu_count_ptr(ref);
 	int cpu;
 
+	/*
+	 * If the previous ATOMIC switching hasn't finished yet, wait for
+	 * its completion.  If the caller ensures that ATOMIC switching
+	 * isn't in progress, this function can be called from any context.
+	 * Do an extra confirm_switch test to circumvent the unconditional
+	 * might_sleep() in wait_event().
+	 */
+	if (ref->confirm_switch)
+		wait_event(percpu_ref_switch_waitq, !ref->confirm_switch);
+
 	BUG_ON(!percpu_count);
 
 	if (!(ref->percpu_count_ptr & __PERCPU_REF_ATOMIC))
 		return;
 
-	wait_event(percpu_ref_switch_waitq, !ref->confirm_switch);
-
 	atomic_long_add(PERCPU_COUNT_BIAS, &ref->count);
 
 	/*
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 4/5] percpu_ref: restructure operation mode switching
  2015-09-29 21:47 [PATCHSET percpu/for-4.4] percpu_ref: make mode switching operations synchronize themselves Tejun Heo
                   ` (2 preceding siblings ...)
  2015-09-29 21:47 ` [PATCH 3/5] percpu_ref: unify staggered atomic switching wait behavior Tejun Heo
@ 2015-09-29 21:47 ` Tejun Heo
  2015-09-29 21:47 ` [PATCH 5/5] percpu_ref: allow operation mode switching operations to be called concurrently Tejun Heo
  2016-08-10 19:06 ` [PATCHSET percpu/for-4.4] percpu_ref: make mode switching operations synchronize themselves Tejun Heo
  5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2015-09-29 21:47 UTC (permalink / raw)
  To: axboe, akinobu.mita, tom.leiming
  Cc: kent.overstreet, cl, linux-kernel, kernel-team, Tejun Heo

Restructure atomic/percpu mode switching.

* The users of __percpu_ref_switch_to_atomic/percpu() now call a new
  function __percpu_ref_switch_mode() which calls either of the
  original switching functions depending on the current state of
  ref->force_atomic and the __PERCPU_REF_DEAD flag.  The callers no
  longer check whether switching is necessary but always invoke
  __percpu_ref_switch_mode().

* !ref->confirm_switch waiting is collected into
  __percpu_ref_switch_mode().

This patch doesn't cause any behavior differences.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 lib/percpu-refcount.c | 64 +++++++++++++++++++++++----------------------------
 1 file changed, 29 insertions(+), 35 deletions(-)

diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index 6a36597..1e69c3b 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -161,16 +161,6 @@ static void percpu_ref_noop_confirm_switch(struct percpu_ref *ref)
 static void __percpu_ref_switch_to_atomic(struct percpu_ref *ref,
 					  percpu_ref_func_t *confirm_switch)
 {
-	/*
-	 * If the previous ATOMIC switching hasn't finished yet, wait for
-	 * its completion.  If the caller ensures that ATOMIC switching
-	 * isn't in progress, this function can be called from any context.
-	 * Do an extra confirm_switch test to circumvent the unconditional
-	 * might_sleep() in wait_event().
-	 */
-	if (ref->confirm_switch)
-		wait_event(percpu_ref_switch_waitq, !ref->confirm_switch);
-
 	if (ref->percpu_count_ptr & __PERCPU_REF_ATOMIC) {
 		if (confirm_switch)
 			confirm_switch(ref);
@@ -195,16 +185,6 @@ static void __percpu_ref_switch_to_percpu(struct percpu_ref *ref)
 	unsigned long __percpu *percpu_count = percpu_count_ptr(ref);
 	int cpu;
 
-	/*
-	 * If the previous ATOMIC switching hasn't finished yet, wait for
-	 * its completion.  If the caller ensures that ATOMIC switching
-	 * isn't in progress, this function can be called from any context.
-	 * Do an extra confirm_switch test to circumvent the unconditional
-	 * might_sleep() in wait_event().
-	 */
-	if (ref->confirm_switch)
-		wait_event(percpu_ref_switch_waitq, !ref->confirm_switch);
-
 	BUG_ON(!percpu_count);
 
 	if (!(ref->percpu_count_ptr & __PERCPU_REF_ATOMIC))
@@ -225,6 +205,25 @@ static void __percpu_ref_switch_to_percpu(struct percpu_ref *ref)
 			  ref->percpu_count_ptr & ~__PERCPU_REF_ATOMIC);
 }
 
+static void __percpu_ref_switch_mode(struct percpu_ref *ref,
+				     percpu_ref_func_t *confirm_switch)
+{
+	/*
+	 * If the previous ATOMIC switching hasn't finished yet, wait for
+	 * its completion.  If the caller ensures that ATOMIC switching
+	 * isn't in progress, this function can be called from any context.
+	 * Do an extra confirm_switch test to circumvent the unconditional
+	 * might_sleep() in wait_event().
+	 */
+	if (ref->confirm_switch)
+		wait_event(percpu_ref_switch_waitq, !ref->confirm_switch);
+
+	if (ref->force_atomic || (ref->percpu_count_ptr & __PERCPU_REF_DEAD))
+		__percpu_ref_switch_to_atomic(ref, confirm_switch);
+	else
+		__percpu_ref_switch_to_percpu(ref);
+}
+
 /**
  * percpu_ref_switch_to_atomic - switch a percpu_ref to atomic mode
  * @ref: percpu_ref to switch to atomic mode
@@ -241,16 +240,15 @@ static void __percpu_ref_switch_to_percpu(struct percpu_ref *ref)
  * operations.  Note that @ref will stay in atomic mode across kill/reinit
  * cycles until percpu_ref_switch_to_percpu() is called.
  *
- * This function normally doesn't block and can be called from any context
- * but it may block if @confirm_kill is specified and @ref is already in
- * the process of switching to atomic mode.  In such cases, @confirm_switch
- * will be invoked after the switching is complete.
+ * This function may block if @ref is in the process of switching to atomic
+ * mode.  If the caller ensures that @ref is not in the process of
+ * switching to atomic mode, this function can be called from any context.
  */
 void percpu_ref_switch_to_atomic(struct percpu_ref *ref,
 				 percpu_ref_func_t *confirm_switch)
 {
 	ref->force_atomic = true;
-	__percpu_ref_switch_to_atomic(ref, confirm_switch);
+	__percpu_ref_switch_mode(ref, confirm_switch);
 }
 
 /**
@@ -267,17 +265,14 @@ void percpu_ref_switch_to_atomic(struct percpu_ref *ref,
  * dying or dead, the actual switching takes place on the following
  * percpu_ref_reinit().
  *
- * This function normally doesn't block and can be called from any context
- * but it may block if @ref is in the process of switching to atomic mode
- * by percpu_ref_switch_atomic().
+ * This function may block if @ref is in the process of switching to atomic
+ * mode.  If the caller ensures that @ref is not in the process of
+ * switching to atomic mode, this function can be called from any context.
  */
 void percpu_ref_switch_to_percpu(struct percpu_ref *ref)
 {
 	ref->force_atomic = false;
-
-	/* a dying or dead ref can't be switched to percpu mode w/o reinit */
-	if (!(ref->percpu_count_ptr & __PERCPU_REF_DEAD))
-		__percpu_ref_switch_to_percpu(ref);
+	__percpu_ref_switch_mode(ref, NULL);
 }
 
 /**
@@ -302,7 +297,7 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
 		  "%s called more than once on %pf!", __func__, ref->release);
 
 	ref->percpu_count_ptr |= __PERCPU_REF_DEAD;
-	__percpu_ref_switch_to_atomic(ref, confirm_kill);
+	__percpu_ref_switch_mode(ref, confirm_kill);
 	percpu_ref_put(ref);
 }
 EXPORT_SYMBOL_GPL(percpu_ref_kill_and_confirm);
@@ -324,7 +319,6 @@ void percpu_ref_reinit(struct percpu_ref *ref)
 
 	ref->percpu_count_ptr &= ~__PERCPU_REF_DEAD;
 	percpu_ref_get(ref);
-	if (!ref->force_atomic)
-		__percpu_ref_switch_to_percpu(ref);
+	__percpu_ref_switch_mode(ref, NULL);
 }
 EXPORT_SYMBOL_GPL(percpu_ref_reinit);
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 5/5] percpu_ref: allow operation mode switching operations to be called concurrently
  2015-09-29 21:47 [PATCHSET percpu/for-4.4] percpu_ref: make mode switching operations synchronize themselves Tejun Heo
                   ` (3 preceding siblings ...)
  2015-09-29 21:47 ` [PATCH 4/5] percpu_ref: restructure operation mode switching Tejun Heo
@ 2015-09-29 21:47 ` Tejun Heo
  2016-08-10 19:06 ` [PATCHSET percpu/for-4.4] percpu_ref: make mode switching operations synchronize themselves Tejun Heo
  5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2015-09-29 21:47 UTC (permalink / raw)
  To: axboe, akinobu.mita, tom.leiming
  Cc: kent.overstreet, cl, linux-kernel, kernel-team, Tejun Heo

percpu_ref initially didn't have explicit mode switching operations.
It started out in percpu mode and switched to atomic mode on kill and
then released.  Ensuring that kill operation is initiated only after
init completes was naturally the caller's responsibility.

percpu_ref_reinit() was introduced later but it didn't shift the
synchronization responsibility.  Reinit can't be performed until kill
is confirmed, so there was nothing to worry about
synchronization-wise.  Also, as both reinit and kill manipulate the
base reference, invocations of the same function couldn't be allowed
to race each other.

The latest additions of percpu_ref_switch_to_atomic/percpu() changed
the situation.  These two functions can be called any time as long as
the percpu_ref is between init and exit and thus there are valid valid
usage scenarios where these new functions race with each other or
against reinit/kill.  Mostly from inertia, f47ad4578461 ("percpu_ref:
decouple switching to percpu mode and reinit") still left
synchronization among percpu mode switching operations to its users.

That the new switch functions can be freely mixed with kill/reinit but
the operations themselves should be synchronized is too subtle a
requirement and led to a very subtle race condition in blk-mq freezing
path.

This patch fixes the situation by introducing percpu_ref_switch_lock
to protect mode switching operations.  This ensures that percpu-ref
users don't have to worry about mode changing operations racing
against each other, e.g. switch_to_percpu against kill, as long as the
sequence of operations is valid.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Akinobu Mita <akinobu.mita@gmail.com>
Link: http://lkml.kernel.org/g/1443287365-4244-7-git-send-email-akinobu.mita@gmail.com
Fixes: f47ad4578461 ("percpu_ref: decouple switching to percpu mode and reinit")
---
 lib/percpu-refcount.c | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index 1e69c3b..a1868aa 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -33,6 +33,7 @@
 
 #define PERCPU_COUNT_BIAS	(1LU << (BITS_PER_LONG - 1))
 
+static DEFINE_SPINLOCK(percpu_ref_switch_lock);
 static DECLARE_WAIT_QUEUE_HEAD(percpu_ref_switch_waitq);
 
 static unsigned long __percpu *percpu_count_ptr(struct percpu_ref *ref)
@@ -208,15 +209,15 @@ static void __percpu_ref_switch_to_percpu(struct percpu_ref *ref)
 static void __percpu_ref_switch_mode(struct percpu_ref *ref,
 				     percpu_ref_func_t *confirm_switch)
 {
+	lockdep_assert_held(&percpu_ref_switch_lock);
+
 	/*
 	 * If the previous ATOMIC switching hasn't finished yet, wait for
 	 * its completion.  If the caller ensures that ATOMIC switching
 	 * isn't in progress, this function can be called from any context.
-	 * Do an extra confirm_switch test to circumvent the unconditional
-	 * might_sleep() in wait_event().
 	 */
-	if (ref->confirm_switch)
-		wait_event(percpu_ref_switch_waitq, !ref->confirm_switch);
+	wait_event_lock_irq(percpu_ref_switch_waitq, !ref->confirm_switch,
+			    percpu_ref_switch_lock);
 
 	if (ref->force_atomic || (ref->percpu_count_ptr & __PERCPU_REF_DEAD))
 		__percpu_ref_switch_to_atomic(ref, confirm_switch);
@@ -247,8 +248,14 @@ static void __percpu_ref_switch_mode(struct percpu_ref *ref,
 void percpu_ref_switch_to_atomic(struct percpu_ref *ref,
 				 percpu_ref_func_t *confirm_switch)
 {
+	unsigned long flags;
+
+	spin_lock_irqsave(&percpu_ref_switch_lock, flags);
+
 	ref->force_atomic = true;
 	__percpu_ref_switch_mode(ref, confirm_switch);
+
+	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
 }
 
 /**
@@ -271,8 +278,14 @@ void percpu_ref_switch_to_atomic(struct percpu_ref *ref,
  */
 void percpu_ref_switch_to_percpu(struct percpu_ref *ref)
 {
+	unsigned long flags;
+
+	spin_lock_irqsave(&percpu_ref_switch_lock, flags);
+
 	ref->force_atomic = false;
 	__percpu_ref_switch_mode(ref, NULL);
+
+	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
 }
 
 /**
@@ -293,12 +306,18 @@ void percpu_ref_switch_to_percpu(struct percpu_ref *ref)
 void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
 				 percpu_ref_func_t *confirm_kill)
 {
+	unsigned long flags;
+
+	spin_lock_irqsave(&percpu_ref_switch_lock, flags);
+
 	WARN_ONCE(ref->percpu_count_ptr & __PERCPU_REF_DEAD,
 		  "%s called more than once on %pf!", __func__, ref->release);
 
 	ref->percpu_count_ptr |= __PERCPU_REF_DEAD;
 	__percpu_ref_switch_mode(ref, confirm_kill);
 	percpu_ref_put(ref);
+
+	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
 }
 EXPORT_SYMBOL_GPL(percpu_ref_kill_and_confirm);
 
@@ -315,10 +334,16 @@ EXPORT_SYMBOL_GPL(percpu_ref_kill_and_confirm);
  */
 void percpu_ref_reinit(struct percpu_ref *ref)
 {
+	unsigned long flags;
+
+	spin_lock_irqsave(&percpu_ref_switch_lock, flags);
+
 	WARN_ON_ONCE(!percpu_ref_is_zero(ref));
 
 	ref->percpu_count_ptr &= ~__PERCPU_REF_DEAD;
 	percpu_ref_get(ref);
 	__percpu_ref_switch_mode(ref, NULL);
+
+	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
 }
 EXPORT_SYMBOL_GPL(percpu_ref_reinit);
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCHSET percpu/for-4.4] percpu_ref: make mode switching operations synchronize themselves
  2015-09-29 21:47 [PATCHSET percpu/for-4.4] percpu_ref: make mode switching operations synchronize themselves Tejun Heo
                   ` (4 preceding siblings ...)
  2015-09-29 21:47 ` [PATCH 5/5] percpu_ref: allow operation mode switching operations to be called concurrently Tejun Heo
@ 2016-08-10 19:06 ` Tejun Heo
  5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2016-08-10 19:06 UTC (permalink / raw)
  To: axboe, akinobu.mita, tom.leiming
  Cc: kent.overstreet, cl, linux-kernel, kernel-team

On Tue, Sep 29, 2015 at 05:47:15PM -0400, Tejun Heo wrote:
> Hello,
> 
> Mostly for historical reasons, percpu_ref atomic/percpu mode switching
> operations require the users to synchronize different operations
> although there are valid usage scenarios where different types of
> operations racing against each other makes sense.  This unusual
> requirement led to a subtle race condition in blk-mq which was spotted
> by Akinobu Mita[1].
> 
> Akinobu proposed a blk-mq specific fix where it adds an enclosing
> mutex around the percpu_ref operations; however, this is a percpu_ref
> deficiency and fixing it at the source is the better long term
> approach.  Unfortunately, this ended up being a somewhat invasive set
> of changes and the right thing to do likely is applying Akinobu's fix
> for 4.3 / -stable, applying this patchset for 4.4 window and reverting
> Akinobu's patch after the 4.4 merge window.
> 
> Akinobu, can you please verify that this patchset makes the race
> condition go away?

Okay, this patchset got forgotten while waiting response from Akinobu.
I'm applying the patchset to percpu/for-4.9 as the changes are general
improvements to percpu_refcnt.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-08-10 19:06 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-29 21:47 [PATCHSET percpu/for-4.4] percpu_ref: make mode switching operations synchronize themselves Tejun Heo
2015-09-29 21:47 ` [PATCH 1/5] percpu_ref: remove unnecessary RCU grace period for staggered atomic switching confirmation Tejun Heo
2015-09-29 21:47 ` [PATCH 2/5] percpu_ref: reorganize __percpu_ref_switch_to_atomic() and relocate percpu_ref_switch_to_atomic() Tejun Heo
2015-09-29 21:47 ` [PATCH 3/5] percpu_ref: unify staggered atomic switching wait behavior Tejun Heo
2015-09-29 21:47 ` [PATCH 4/5] percpu_ref: restructure operation mode switching Tejun Heo
2015-09-29 21:47 ` [PATCH 5/5] percpu_ref: allow operation mode switching operations to be called concurrently Tejun Heo
2016-08-10 19:06 ` [PATCHSET percpu/for-4.4] percpu_ref: make mode switching operations synchronize themselves Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox