public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [Patch v2] perf_event: fix a race condition in perf_remove_from_context()
@ 2014-09-02 22:27 Cong Wang
  2014-09-03 11:32 ` Peter Zijlstra
  2014-09-09 14:53 ` [tip:perf/urgent] perf: Fix " tip-bot for Cong Wang
  0 siblings, 2 replies; 3+ messages in thread
From: Cong Wang @ 2014-09-02 22:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: stable, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Cong Wang, Cong Wang

From: Cong Wang <cwang@twopensource.com>

We saw a kernel soft lockup in perf_remove_from_context(),
it looks like the `perf` process, when exiting, could not go
out of the retry loop. Meanwhile, the target process was forking
a child. So either the target process should execute the smp
function call to deactive the event (if it was running) or it should
do a context switch which deactives the event.

It seems we optimize out a context switch in perf_event_context_sched_out(),
and what's more important, we still test an obsolete task pointer when
retrying, so no one actually would deactive that event in this situation.
Fix it directly by reloading the task pointer in perf_remove_from_context().
This should cure the above soft lockup.

Cc: stable@vger.kernel.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---

v2: fix perf_install_in_context() too and re-generate this patch

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f9c1ed0..d640a8b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1524,6 +1524,11 @@ static void perf_remove_from_context(struct perf_event *event, bool detach_group
 	 */
 	if (ctx->is_active) {
 		raw_spin_unlock_irq(&ctx->lock);
+		/*
+		 * Reload the task pointer, it might have been changed by
+		 * a concurrent perf_event_context_sched_out().
+		 */
+		task = ctx->task;
 		goto retry;
 	}
 
@@ -1967,6 +1972,11 @@ perf_install_in_context(struct perf_event_context *ctx,
 	 */
 	if (ctx->is_active) {
 		raw_spin_unlock_irq(&ctx->lock);
+		/*
+		 * Reload the task pointer, it might have been changed by
+		 * a concurrent perf_event_context_sched_out().
+		 */
+		task = ctx->task;
 		goto retry;
 	}
 

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [Patch v2] perf_event: fix a race condition in perf_remove_from_context()
  2014-09-02 22:27 [Patch v2] perf_event: fix a race condition in perf_remove_from_context() Cong Wang
@ 2014-09-03 11:32 ` Peter Zijlstra
  2014-09-09 14:53 ` [tip:perf/urgent] perf: Fix " tip-bot for Cong Wang
  1 sibling, 0 replies; 3+ messages in thread
From: Peter Zijlstra @ 2014-09-03 11:32 UTC (permalink / raw)
  To: Cong Wang
  Cc: linux-kernel, stable, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Cong Wang

On Tue, Sep 02, 2014 at 03:27:20PM -0700, Cong Wang wrote:
> From: Cong Wang <cwang@twopensource.com>
> 
> We saw a kernel soft lockup in perf_remove_from_context(),
> it looks like the `perf` process, when exiting, could not go
> out of the retry loop. Meanwhile, the target process was forking
> a child. So either the target process should execute the smp
> function call to deactive the event (if it was running) or it should
> do a context switch which deactives the event.
> 
> It seems we optimize out a context switch in perf_event_context_sched_out(),
> and what's more important, we still test an obsolete task pointer when
> retrying, so no one actually would deactive that event in this situation.
> Fix it directly by reloading the task pointer in perf_remove_from_context().
> This should cure the above soft lockup.
> 
> Cc: stable@vger.kernel.org
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Signed-off-by: Cong Wang <cwang@twopensource.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Thanks!

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [tip:perf/urgent] perf: Fix a race condition in perf_remove_from_context()
  2014-09-02 22:27 [Patch v2] perf_event: fix a race condition in perf_remove_from_context() Cong Wang
  2014-09-03 11:32 ` Peter Zijlstra
@ 2014-09-09 14:53 ` tip-bot for Cong Wang
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot for Cong Wang @ 2014-09-09 14:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, paulus, hpa, mingo, torvalds, peterz, stable, acme,
	xiyou.wangcong, cwang, tglx

Commit-ID:  3577af70a2ce4853d58e57d832e687d739281479
Gitweb:     http://git.kernel.org/tip/3577af70a2ce4853d58e57d832e687d739281479
Author:     Cong Wang <cwang@twopensource.com>
AuthorDate: Tue, 2 Sep 2014 15:27:20 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 9 Sep 2014 06:53:42 +0200

perf: Fix a race condition in perf_remove_from_context()

We saw a kernel soft lockup in perf_remove_from_context(),
it looks like the `perf` process, when exiting, could not go
out of the retry loop. Meanwhile, the target process was forking
a child. So either the target process should execute the smp
function call to deactive the event (if it was running) or it should
do a context switch which deactives the event.

It seems we optimize out a context switch in perf_event_context_sched_out(),
and what's more important, we still test an obsolete task pointer when
retrying, so no one actually would deactive that event in this situation.
Fix it directly by reloading the task pointer in perf_remove_from_context().

This should cure the above soft lockup.

Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1409696840-843-1-git-send-email-xiyou.wangcong@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f9c1ed0..d640a8b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1524,6 +1524,11 @@ retry:
 	 */
 	if (ctx->is_active) {
 		raw_spin_unlock_irq(&ctx->lock);
+		/*
+		 * Reload the task pointer, it might have been changed by
+		 * a concurrent perf_event_context_sched_out().
+		 */
+		task = ctx->task;
 		goto retry;
 	}
 
@@ -1967,6 +1972,11 @@ retry:
 	 */
 	if (ctx->is_active) {
 		raw_spin_unlock_irq(&ctx->lock);
+		/*
+		 * Reload the task pointer, it might have been changed by
+		 * a concurrent perf_event_context_sched_out().
+		 */
+		task = ctx->task;
 		goto retry;
 	}
 

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2014-09-09 14:55 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-09-02 22:27 [Patch v2] perf_event: fix a race condition in perf_remove_from_context() Cong Wang
2014-09-03 11:32 ` Peter Zijlstra
2014-09-09 14:53 ` [tip:perf/urgent] perf: Fix " tip-bot for Cong Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox