* [RFC][PATCH 0/7] perf: more fuzzer inspired patches
@ 2016-02-19 14:37 Peter Zijlstra
  2016-02-19 14:37 ` [RFC][PATCH 1/7] perf: Close install vs exit race Peter Zijlstra
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Peter Zijlstra @ 2016-02-19 14:37 UTC (permalink / raw)
  To: peterz, mingo, alexander.shishkin, eranian
  Cc: linux-kernel, vince, dvyukov, andi, jolsa, sasha.levin, oleg

Most of these patches are a result of syzkaller runs.

With these patches I get a significant reduction in failures, but we're not
there yet.  There is at least one known issue with unthrottling events (if
only it wouldn't go into hiding whenever I add more debug code), and there
are a number of NMI watchdog triggers that so far make little sense.

There are a number of patches I'm not really confident in, so please have
a hard look at all of them.


* [RFC][PATCH 1/7] perf: Close install vs exit race
  2016-02-19 14:37 [RFC][PATCH 0/7] perf: more fuzzer inspired patches Peter Zijlstra
@ 2016-02-19 14:37 ` Peter Zijlstra
  2016-02-19 14:37 ` [RFC][PATCH 2/7] perf: Do not double free Peter Zijlstra
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2016-02-19 14:37 UTC (permalink / raw)
  To: peterz, mingo, alexander.shishkin, eranian
  Cc: linux-kernel, vince, dvyukov, andi, jolsa, sasha.levin, oleg

[-- Attachment #1: peterz-perf-more-fixes-1.patch --]
[-- Type: text/plain, Size: 2712 bytes --]

The following scenario:

  CPU0					CPU1
  
  ctx = find_get_ctx();
  					perf_event_exit_task_context()
  mutex_lock(&ctx->mutex);
  perf_install_in_context(ctx, ...);
    /* NO-OP */
  mutex_unlock(&ctx->mutex);

  ...

  perf_release()
    WARN_ON_ONCE(event->state != STATE_EXIT);


Since perf_install_in_context() NO-OPs because the ctx is dead, the
event doesn't pass through perf_remove_from_context(); and since
perf_event_exit_task_context() will not observe the event because it's
not attached yet, event->state will never be set.

Solve this by revalidating ctx->task after we acquire ctx->mutex and
failing the event creation as a whole.
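
The shape of the fix, as a minimal sketch (the names follow the diff
below; the complete error handling is in the patch itself):

	mutex_lock(&ctx->mutex);
	if (ctx->task == TASK_TOMBSTONE) {
		/*
		 * The task exited between find_get_context() and here;
		 * do not attach new events, fail the whole creation.
		 */
		err = -ESRCH;
		goto err_locked;
	}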

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c |   35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2158,13 +2158,15 @@ perf_install_in_context(struct perf_even
 	 */
 	raw_spin_lock_irq(&ctx->lock);
 	task = ctx->task;
+
 	/*
-	 * Worse, we cannot even rely on the ctx actually existing anymore. If
-	 * between find_get_context() and perf_install_in_context() the task
-	 * went through perf_event_exit_task() its dead and we should not be
-	 * adding new events.
+	 * If between ctx = find_get_context() and mutex_lock(&ctx->mutex) the
+	 * ctx gets destroyed, we must not install an event into it.
+	 *
+	 * This is normally tested for after we acquire the mutex, so this is
+	 * a sanity check.
 	 */
-	if (task == TASK_TOMBSTONE) {
+	if (WARN_ON_ONCE(task == TASK_TOMBSTONE)) {
 		raw_spin_unlock_irq(&ctx->lock);
 		return;
 	}
@@ -8389,10 +8391,19 @@ SYSCALL_DEFINE5(perf_event_open,
 	if (move_group) {
 		gctx = group_leader->ctx;
 		mutex_lock_double(&gctx->mutex, &ctx->mutex);
+		if (gctx->task == TASK_TOMBSTONE) {
+			err = -ESRCH;
+			goto err_locked;
+		}
 	} else {
 		mutex_lock(&ctx->mutex);
 	}
 
+	if (ctx->task == TASK_TOMBSTONE) {
+		err = -ESRCH;
+		goto err_locked;
+	}
+
 	if (!perf_event_validate_size(event)) {
 		err = -E2BIG;
 		goto err_locked;
@@ -8563,12 +8574,14 @@ perf_event_create_kernel_counter(struct
 
 	WARN_ON_ONCE(ctx->parent_ctx);
 	mutex_lock(&ctx->mutex);
+	if (ctx->task == TASK_TOMBSTONE) {
+		err = -ESRCH;
+		goto err_unlock;
+	}
+
 	if (!exclusive_event_installable(event, ctx)) {
-		mutex_unlock(&ctx->mutex);
-		perf_unpin_context(ctx);
-		put_ctx(ctx);
 		err = -EBUSY;
-		goto err_free;
+		goto err_unlock;
 	}
 
 	perf_install_in_context(ctx, event, cpu);
@@ -8577,6 +8590,10 @@ perf_event_create_kernel_counter(struct
 
 	return event;
 
+err_unlock:
+	mutex_unlock(&ctx->mutex);
+	perf_unpin_context(ctx);
+	put_ctx(ctx);
 err_free:
 	free_event(event);
 err:


* [RFC][PATCH 2/7] perf: Do not double free
  2016-02-19 14:37 [RFC][PATCH 0/7] perf: more fuzzer inspired patches Peter Zijlstra
  2016-02-19 14:37 ` [RFC][PATCH 1/7] perf: Close install vs exit race Peter Zijlstra
@ 2016-02-19 14:37 ` Peter Zijlstra
  2016-02-19 14:37 ` [RFC][PATCH 3/7] perf: Allow perf_release() with !event->ctx Peter Zijlstra
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2016-02-19 14:37 UTC (permalink / raw)
  To: peterz, mingo, alexander.shishkin, eranian
  Cc: linux-kernel, vince, dvyukov, andi, jolsa, sasha.levin, oleg

[-- Attachment #1: peterz-perf-more-fixes-2.patch --]
[-- Type: text/plain, Size: 705 bytes --]

In the err_file: fput(event_file) case, we'll end up calling
perf_release(), which in turn will free the event.

Do not then free the event _again_.
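
As a sketch, the resulting ownership rule in the error path (condensed
from the diff below, not the literal unwind sequence):

	err_file:
		fput(event_file);	/* ->release() == perf_release(), frees event */
	...
	err_alloc:
		/* once fput() ran, the file's release path owns the event */
		if (!event_file)
			free_event(event);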

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8520,7 +8520,12 @@ SYSCALL_DEFINE5(perf_event_open,
 	perf_unpin_context(ctx);
 	put_ctx(ctx);
 err_alloc:
-	free_event(event);
+	/*
+	 * If event_file is set, the fput() above will have called ->release()
+	 * and that will take care of freeing the event.
+	 */
+	if (!event_file)
+		free_event(event);
 err_cpus:
 	put_online_cpus();
 err_task:


* [RFC][PATCH 3/7] perf: Allow perf_release() with !event->ctx
  2016-02-19 14:37 [RFC][PATCH 0/7] perf: more fuzzer inspired patches Peter Zijlstra
  2016-02-19 14:37 ` [RFC][PATCH 1/7] perf: Close install vs exit race Peter Zijlstra
  2016-02-19 14:37 ` [RFC][PATCH 2/7] perf: Do not double free Peter Zijlstra
@ 2016-02-19 14:37 ` Peter Zijlstra
  2016-02-19 14:37 ` [RFC][PATCH 4/7] perf: Fix scaling vs enable_on_exec Peter Zijlstra
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2016-02-19 14:37 UTC (permalink / raw)
  To: peterz, mingo, alexander.shishkin, eranian
  Cc: linux-kernel, vince, dvyukov, andi, jolsa, sasha.levin, oleg

[-- Attachment #1: peterz-perf-more-fixes-3.patch --]
[-- Type: text/plain, Size: 1208 bytes --]

In the err_file: fput(event_file) case, the event will not yet have
been attached to a context. However, perf_release() does assume it has
been. Cure this.
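
A condensed sketch of the guard (the full version is in the diff below):

	struct perf_event_context *ctx = event->ctx;

	if (!ctx) {
		/* came in via err_file:; never attached, nothing to detach */
		WARN_ON_ONCE(event->attach_state &
				(PERF_ATTACH_CONTEXT|PERF_ATTACH_GROUP));
		goto no_ctx;	/* just drop the last reference */
	}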

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c |   16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3754,9 +3754,19 @@ static void put_event(struct perf_event
  */
 int perf_event_release_kernel(struct perf_event *event)
 {
-	struct perf_event_context *ctx;
+	struct perf_event_context *ctx = event->ctx;
 	struct perf_event *child, *tmp;
 
+	/*
+	 * If we got here through err_file: fput(event_file); we will not have
+	 * attached to a context yet.
+	 */
+	if (!ctx) {
+		WARN_ON_ONCE(event->attach_state &
+				(PERF_ATTACH_CONTEXT|PERF_ATTACH_GROUP));
+		goto no_ctx;
+	}
+
 	if (!is_kernel_event(event))
 		perf_remove_from_owner(event);
 
@@ -3832,8 +3842,8 @@ int perf_event_release_kernel(struct per
 	}
 	mutex_unlock(&event->child_mutex);
 
-	/* Must be the last reference */
-	put_event(event);
+no_ctx:
+	put_event(event); /* Must be the 'last' reference */
 	return 0;
 }
 EXPORT_SYMBOL_GPL(perf_event_release_kernel);


* [RFC][PATCH 4/7] perf: Fix scaling vs enable_on_exec
  2016-02-19 14:37 [RFC][PATCH 0/7] perf: more fuzzer inspired patches Peter Zijlstra
                   ` (2 preceding siblings ...)
  2016-02-19 14:37 ` [RFC][PATCH 3/7] perf: Allow perf_release() with !event->ctx Peter Zijlstra
@ 2016-02-19 14:37 ` Peter Zijlstra
  2016-02-23 15:27   ` Peter Zijlstra
  2016-02-19 14:37 ` [RFC][PATCH 5/7] perf: Fix cloning Peter Zijlstra
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2016-02-19 14:37 UTC (permalink / raw)
  To: peterz, mingo, alexander.shishkin, eranian
  Cc: linux-kernel, vince, dvyukov, andi, jolsa, sasha.levin, oleg

[-- Attachment #1: peterz-perf-more-fixes-4.patch --]
[-- Type: text/plain, Size: 1908 bytes --]

Oleg reported that enable_on_exec results in weird scale factors.

The recent commit 3e349507d12d ("perf: Fix perf_enable_on_exec() event
scheduling") caused this by moving task_ctx_sched_out() from before
__perf_event_mark_enabled() to after it.

The overlooked consequence of that change is that task_ctx_sched_out()
would update the ctx time fields, and now __perf_event_mark_enabled()
uses stale time.
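
Schematically (an illustration of the ordering, not the literal code):

  before 3e349507d12d:			after 3e349507d12d:

    task_ctx_sched_out()		  __perf_event_mark_enabled()
      update_context_time(ctx)		    /* reads stale ctx time */
    __perf_event_mark_enabled()		  task_ctx_sched_out()
      /* reads fresh ctx time */	    update_context_time(ctx)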

Fix this by adding an explicit time update.

While looking at this, I also found that we need a ctx->is_active
check in perf_install_in_context().

XXX: does this actually fix the reported issue? I'm not sure what the
reproduction case is. Also an earlier version made Jiri's machine
explode -- something I've not managed to reproduce either.

Fixes: 3e349507d12d ("perf: Fix perf_enable_on_exec() event scheduling")
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c |   18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2170,12 +2170,12 @@ perf_install_in_context(struct perf_even
 		raw_spin_unlock_irq(&ctx->lock);
 		return;
 	}
-	update_context_time(ctx);
-	/*
-	 * Update cgrp time only if current cgrp matches event->cgrp.
-	 * Must be done before calling add_event_to_ctx().
-	 */
-	update_cgrp_time_from_event(event);
+
+	if (ctx->is_active) {
+		update_context_time(ctx);
+		update_cgrp_time_from_event(event);
+	}
+
 	add_event_to_ctx(event, ctx);
 	raw_spin_unlock_irq(&ctx->lock);
 
@@ -3122,6 +3122,12 @@ static void perf_event_enable_on_exec(in
 
 	cpuctx = __get_cpu_context(ctx);
 	perf_ctx_lock(cpuctx, ctx);
+
+	if (ctx->is_active) {
+		update_context_time(ctx);
+		update_cgrp_time_from_cpuctx(cpuctx);
+	}
+
 	list_for_each_entry(event, &ctx->event_list, event_entry)
 		enabled |= event_enable_on_exec(event, ctx);
 


* [RFC][PATCH 5/7] perf: Fix cloning
  2016-02-19 14:37 [RFC][PATCH 0/7] perf: more fuzzer inspired patches Peter Zijlstra
                   ` (3 preceding siblings ...)
  2016-02-19 14:37 ` [RFC][PATCH 4/7] perf: Fix scaling vs enable_on_exec Peter Zijlstra
@ 2016-02-19 14:37 ` Peter Zijlstra
  2016-02-19 14:37 ` [RFC][PATCH 6/7] perf: Fix race between event install and jump_labels Peter Zijlstra
  2016-02-19 14:37 ` [RFC][PATCH 7/7] perf: Cure event->pending_disable race Peter Zijlstra
  6 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2016-02-19 14:37 UTC (permalink / raw)
  To: peterz, mingo, alexander.shishkin, eranian
  Cc: linux-kernel, vince, dvyukov, andi, jolsa, sasha.levin, oleg

[-- Attachment #1: peterz-perf-more-fixes-5.patch --]
[-- Type: text/plain, Size: 4857 bytes --]

Alexander reported that when the 'original' context gets destroyed, no
new clones happen.

This can happen irrespective of the ctx switch optimization: any task
can die, even the parent, and we want to continue monitoring the task
hierarchy until we either close the event or no tasks are left in the
hierarchy.

perf_event_init_context() will attempt to pin the 'parent' context
during clone(). At that point current is the parent, and since current
cannot have exited while executing clone(), its context cannot have
passed through perf_event_exit_task_context(). Therefore
perf_pin_task_context() cannot observe ctx->task == TASK_TOMBSTONE.

However, since inherit_event() does:

	if (parent_event->parent)
		parent_event = parent_event->parent;

it looks at the 'original' event when it calls is_orphaned_event().
This can return true if the context that contains this event has
passed through perf_event_exit_task_context(), and thus we'll fail to
clone the perf context.

Fix this by adding a new state: STATE_DEAD, which is set by
perf_release() to indicate that the filedesc (or kernel reference) is
dead and there are no observers for our data left.

Only for STATE_DEAD will is_orphaned_event() be true and inhibit
cloning.

STATE_EXIT is otherwise preserved such that is_event_hup() remains
functional and will report when the observed task hierarchy becomes
empty.
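
As a comment sketch of the resulting states (values and predicates taken
from the diff below):

	/*
	 * PERF_EVENT_STATE_DEAD = -4   set by perf_release(); no observers left
	 * PERF_EVENT_STATE_EXIT = -3   set by perf_event_exit_event()
	 *
	 * is_orphaned_event(): state == STATE_DEAD  (inhibits cloning)
	 * is_event_hup():      state <= STATE_EXIT  (EXIT or DEAD)
	 */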

Fixes: c6e5b73242d2 ("perf: Synchronously clean up child events")
Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/perf_event.h |    1 +
 kernel/events/core.c       |   29 ++++++++++++++---------------
 2 files changed, 15 insertions(+), 15 deletions(-)

--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -397,6 +397,7 @@ struct pmu {
  * enum perf_event_active_state - the states of a event
  */
 enum perf_event_active_state {
+	PERF_EVENT_STATE_DEAD		= -4,
 	PERF_EVENT_STATE_EXIT		= -3,
 	PERF_EVENT_STATE_ERROR		= -2,
 	PERF_EVENT_STATE_OFF		= -1,
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1645,7 +1645,7 @@ static void perf_group_detach(struct per
 
 static bool is_orphaned_event(struct perf_event *event)
 {
-	return event->state == PERF_EVENT_STATE_EXIT;
+	return event->state == PERF_EVENT_STATE_DEAD;
 }
 
 static inline int pmu_filter_match(struct perf_event *event)
@@ -1732,7 +1732,6 @@ group_sched_out(struct perf_event *group
 }
 
 #define DETACH_GROUP	0x01UL
-#define DETACH_STATE	0x02UL
 
 /*
  * Cross CPU call to remove a performance event
@@ -1752,8 +1751,6 @@ __perf_remove_from_context(struct perf_e
 	if (flags & DETACH_GROUP)
 		perf_group_detach(event);
 	list_del_event(event, ctx);
-	if (flags & DETACH_STATE)
-		event->state = PERF_EVENT_STATE_EXIT;
 
 	if (!ctx->nr_events && ctx->is_active) {
 		ctx->is_active = 0;
@@ -3776,22 +3773,24 @@ int perf_event_release_kernel(struct per
 
 	ctx = perf_event_ctx_lock(event);
 	WARN_ON_ONCE(ctx->parent_ctx);
-	perf_remove_from_context(event, DETACH_GROUP | DETACH_STATE);
-	perf_event_ctx_unlock(event, ctx);
+	perf_remove_from_context(event, DETACH_GROUP);
 
+	raw_spin_lock_irq(&ctx->lock);
 	/*
-	 * At this point we must have event->state == PERF_EVENT_STATE_EXIT,
-	 * either from the above perf_remove_from_context() or through
-	 * perf_event_exit_event().
+	 * Mark this event as STATE_DEAD, there is no external reference to it
+	 * anymore.
 	 *
-	 * Therefore, anybody acquiring event->child_mutex after the below
-	 * loop _must_ also see this, most importantly inherit_event() which
-	 * will avoid placing more children on the list.
+	 * Anybody acquiring event->child_mutex after the below loop _must_
+	 * also see this, most importantly inherit_event() which will avoid
+	 * placing more children on the list.
 	 *
 	 * Thus this guarantees that we will in fact observe and kill _ALL_
 	 * child events.
 	 */
-	WARN_ON_ONCE(event->state != PERF_EVENT_STATE_EXIT);
+	event->state = PERF_EVENT_STATE_DEAD;
+	raw_spin_unlock_irq(&ctx->lock);
+
+	perf_event_ctx_unlock(event, ctx);
 
 again:
 	mutex_lock(&event->child_mutex);
@@ -4004,7 +4003,7 @@ static bool is_event_hup(struct perf_eve
 {
 	bool no_children;
 
-	if (event->state != PERF_EVENT_STATE_EXIT)
+	if (event->state > PERF_EVENT_STATE_EXIT)
 		return false;
 
 	mutex_lock(&event->child_mutex);
@@ -8731,7 +8730,7 @@ perf_event_exit_event(struct perf_event
 	if (parent_event)
 		perf_group_detach(child_event);
 	list_del_event(child_event, child_ctx);
-	child_event->state = PERF_EVENT_STATE_EXIT; /* see perf_event_release_kernel() */
+	child_event->state = PERF_EVENT_STATE_EXIT; /* is_event_hup() */
 	raw_spin_unlock_irq(&child_ctx->lock);
 
 	/*


* [RFC][PATCH 6/7] perf: Fix race between event install and jump_labels
  2016-02-19 14:37 [RFC][PATCH 0/7] perf: more fuzzer inspired patches Peter Zijlstra
                   ` (4 preceding siblings ...)
  2016-02-19 14:37 ` [RFC][PATCH 5/7] perf: Fix cloning Peter Zijlstra
@ 2016-02-19 14:37 ` Peter Zijlstra
  2016-02-19 14:37 ` [RFC][PATCH 7/7] perf: Cure event->pending_disable race Peter Zijlstra
  6 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2016-02-19 14:37 UTC (permalink / raw)
  To: peterz, mingo, alexander.shishkin, eranian
  Cc: linux-kernel, vince, dvyukov, andi, jolsa, sasha.levin, oleg

[-- Attachment #1: peterz-perf-more-fixes-6.patch --]
[-- Type: text/plain, Size: 4579 bytes --]

perf_install_in_context() relies upon the context switch hooks to have
scheduled in events when the IPI misses its target -- after all, if
the task has moved from the CPU (or wasn't running at all), it will
have to context switch to run elsewhere.

This however doesn't appear to be happening.

It is possible for the IPI to not happen (task wasn't running) only to
later observe the task running with an inactive context.

The only possible explanation is that the context switch hooks are not
called. Therefore put in a sync_sched() after toggling the jump_label
to guarantee all CPUs will have the hooks enabled before we install an
event.

A simple if (0->1) sync_sched() will not in fact work, because any
further increment can race and complete before the sync_sched().
Therefore we must jump through some hoops.
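
To illustrate the race with the naive scheme (hypothetical code, not
taken from the patch):

	/* naive 0->1 scheme; racy */
	if (atomic_inc_return(&perf_sched_count) == 1) {
		static_branch_enable(&perf_sched_events);
		synchronize_sched();			/* (A) */
	}
	/*
	 * A second caller can observe count == 2, skip the sync, and
	 * install an event while (A) is still waiting, i.e. before all
	 * CPUs are guaranteed to run the context switch hooks.
	 */

Hence the mutex plus atomic_inc_not_zero() fast path in the patch below:
increments only bypass the mutex once the sync_sched() has completed.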

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/perf_event.h |    6 ++---
 kernel/events/core.c       |   49 +++++++++++++++++++++++++++++++++++++--------
 2 files changed, 44 insertions(+), 11 deletions(-)

--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -906,7 +906,7 @@ perf_sw_event_sched(u32 event_id, u64 nr
 	}
 }
 
-extern struct static_key_deferred perf_sched_events;
+extern struct static_key_false perf_sched_events;
 
 static __always_inline bool
 perf_sw_migrate_enabled(void)
@@ -925,7 +925,7 @@ static inline void perf_event_task_migra
 static inline void perf_event_task_sched_in(struct task_struct *prev,
 					    struct task_struct *task)
 {
-	if (static_key_false(&perf_sched_events.key))
+	if (static_branch_unlikely(&perf_sched_events))
 		__perf_event_task_sched_in(prev, task);
 
 	if (perf_sw_migrate_enabled() && task->sched_migrated) {
@@ -942,7 +942,7 @@ static inline void perf_event_task_sched
 {
 	perf_sw_event_sched(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, 0);
 
-	if (static_key_false(&perf_sched_events.key))
+	if (static_branch_unlikely(&perf_sched_events))
 		__perf_event_task_sched_out(prev, next);
 }
 
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -321,7 +321,13 @@ enum event_type_t {
  * perf_sched_events : >0 events exist
  * perf_cgroup_events: >0 per-cpu cgroup events exist on this cpu
  */
-struct static_key_deferred perf_sched_events __read_mostly;
+
+static void perf_sched_delayed(struct work_struct *work);
+DEFINE_STATIC_KEY_FALSE(perf_sched_events);
+static DECLARE_DELAYED_WORK(perf_sched_work, perf_sched_delayed);
+static DEFINE_MUTEX(perf_sched_mutex);
+static atomic_t perf_sched_count;
+
 static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
 static DEFINE_PER_CPU(int, perf_sched_cb_usages);
 
@@ -3542,12 +3548,22 @@ static void unaccount_event(struct perf_
 	if (has_branch_stack(event))
 		dec = true;
 
-	if (dec)
-		static_key_slow_dec_deferred(&perf_sched_events);
+	if (dec) {
+		if (!atomic_add_unless(&perf_sched_count, -1, 1))
+			schedule_delayed_work(&perf_sched_work, HZ);
+	}
 
 	unaccount_event_cpu(event, event->cpu);
 }
 
+static void perf_sched_delayed(struct work_struct *work)
+{
+	mutex_lock(&perf_sched_mutex);
+	if (atomic_dec_and_test(&perf_sched_count))
+		static_branch_disable(&perf_sched_events);
+	mutex_unlock(&perf_sched_mutex);
+}
+
 /*
  * The following implement mutual exclusion of events on "exclusive" pmus
  * (PERF_PMU_CAP_EXCLUSIVE). Such pmus can only have one event scheduled
@@ -7786,8 +7802,28 @@ static void account_event(struct perf_ev
 	if (is_cgroup_event(event))
 		inc = true;
 
-	if (inc)
-		static_key_slow_inc(&perf_sched_events.key);
+	if (inc) {
+		if (atomic_inc_not_zero(&perf_sched_count))
+			goto enabled;
+
+		mutex_lock(&perf_sched_mutex);
+		if (!atomic_read(&perf_sched_count)) {
+			static_branch_enable(&perf_sched_events);
+			/*
+			 * Guarantee that all CPUs observe the key change and
+			 * call the perf scheduling hooks before proceeding to
+			 * install events that need them.
+			 */
+			synchronize_sched();
+		}
+		/*
+		 * Now that we have waited for the sync_sched(), allow further
+		 * increments to by-pass the mutex.
+		 */
+		atomic_inc(&perf_sched_count);
+		mutex_unlock(&perf_sched_mutex);
+	}
+enabled:
 
 	account_event_cpu(event, event->cpu);
 }
@@ -9352,9 +9388,6 @@ void __init perf_event_init(void)
 	ret = init_hw_breakpoint();
 	WARN(ret, "hw_breakpoint initialization failed with: %d", ret);
 
-	/* do not patch jump label more than once per second */
-	jump_label_rate_limit(&perf_sched_events, HZ);
-
 	/*
 	 * Build time assertion that we keep the data_head at the intended
 	 * location.  IOW, validation we got the __reserved[] size right.


* [RFC][PATCH 7/7] perf: Cure event->pending_disable race
  2016-02-19 14:37 [RFC][PATCH 0/7] perf: more fuzzer inspired patches Peter Zijlstra
                   ` (5 preceding siblings ...)
  2016-02-19 14:37 ` [RFC][PATCH 6/7] perf: Fix race between event install and jump_labels Peter Zijlstra
@ 2016-02-19 14:37 ` Peter Zijlstra
  6 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2016-02-19 14:37 UTC (permalink / raw)
  To: peterz, mingo, alexander.shishkin, eranian
  Cc: linux-kernel, vince, dvyukov, andi, jolsa, sasha.levin, oleg

[-- Attachment #1: peterz-perf-more-fixes-9.patch --]
[-- Type: text/plain, Size: 1330 bytes --]

Because event_sched_out() checks event->pending_disable _before_
actually disabling the event, it can happen that the event fires after
it checks but before it gets disabled.

This would leave event->pending_disable set and the queued irq_work
will try to process it.

However, if the event trigger was during schedule(), the event might
have been de-scheduled by the time the irq_work runs, and
perf_event_disable_local() will fail.

Fix this by checking event->pending_disable _after_ we call
event->pmu->del(). This depends on the latter being a compiler
barrier, such that the compiler does not lift the load and re-create
the problem.
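
As a timeline (an illustration, in the style of the scenario in patch
1/7):

	event_sched_out()			PMI
	  if (event->pending_disable)		  /* not yet set; check passes */
		...
						  event->pending_disable = 1;
						  /* queue the irq_work */
	  event->pmu->del(event, 0);

	/* the irq_work then runs after the event was de-scheduled,
	   and perf_event_disable_local() fails */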

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1696,14 +1696,14 @@ event_sched_out(struct perf_event *event
 
 	perf_pmu_disable(event->pmu);
 
+	event->tstamp_stopped = tstamp;
+	event->pmu->del(event, 0);
+	event->oncpu = -1;
 	event->state = PERF_EVENT_STATE_INACTIVE;
 	if (event->pending_disable) {
 		event->pending_disable = 0;
 		event->state = PERF_EVENT_STATE_OFF;
 	}
-	event->tstamp_stopped = tstamp;
-	event->pmu->del(event, 0);
-	event->oncpu = -1;
 
 	if (!is_software_event(event))
 		cpuctx->active_oncpu--;


* Re: [RFC][PATCH 4/7] perf: Fix scaling vs enable_on_exec
  2016-02-19 14:37 ` [RFC][PATCH 4/7] perf: Fix scaling vs enable_on_exec Peter Zijlstra
@ 2016-02-23 15:27   ` Peter Zijlstra
  2016-02-23 15:48     ` Jiri Olsa
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2016-02-23 15:27 UTC (permalink / raw)
  To: mingo, alexander.shishkin, eranian
  Cc: linux-kernel, vince, dvyukov, andi, jolsa, sasha.levin, oleg

On Fri, Feb 19, 2016 at 03:37:47PM +0100, Peter Zijlstra wrote:
> Oleg reported that enable_on_exec results in weird scale factors.
> 
> The recent commit 3e349507d12d ("perf: Fix perf_enable_on_exec() event
> scheduling") caused this by moving task_ctx_sched_out() from before
> __perf_event_mark_enabled() to after it.
> 
> The overlooked consequence of that change is that task_ctx_sched_out()
> would update the ctx time fields, and now __perf_event_mark_enabled()
> uses stale time.
> 
> Fix this by adding an explicit time update.
> 
> While looking at this, I also found that we need a ctx->is_active
> check in perf_install_in_context().
> 
> XXX: does this actually fix the reported issue? I'm not sure what the
> reproduction case is. Also an earlier version made Jiri's machine
> explode -- something I've not managed to reproduce either.

Jiri, can you have a look at this and perhaps share the reproducer?

> Fixes: 3e349507d12d ("perf: Fix perf_enable_on_exec() event scheduling")
> Reported-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/events/core.c |   18 ++++++++++++------
>  1 file changed, 12 insertions(+), 6 deletions(-)
> 
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2170,12 +2170,12 @@ perf_install_in_context(struct perf_even
>  		raw_spin_unlock_irq(&ctx->lock);
>  		return;
>  	}
> -	update_context_time(ctx);
> -	/*
> -	 * Update cgrp time only if current cgrp matches event->cgrp.
> -	 * Must be done before calling add_event_to_ctx().
> -	 */
> -	update_cgrp_time_from_event(event);
> +
> +	if (ctx->is_active) {
> +		update_context_time(ctx);
> +		update_cgrp_time_from_event(event);
> +	}
> +
>  	add_event_to_ctx(event, ctx);
>  	raw_spin_unlock_irq(&ctx->lock);
>  
> @@ -3122,6 +3122,12 @@ static void perf_event_enable_on_exec(in
>  
>  	cpuctx = __get_cpu_context(ctx);
>  	perf_ctx_lock(cpuctx, ctx);
> +
> +	if (ctx->is_active) {
> +		update_context_time(ctx);
> +		update_cgrp_time_from_cpuctx(cpuctx);
> +	}
> +
>  	list_for_each_entry(event, &ctx->event_list, event_entry)
>  		enabled |= event_enable_on_exec(event, ctx);
>  
> 
> 


* Re: [RFC][PATCH 4/7] perf: Fix scaling vs enable_on_exec
  2016-02-23 15:27   ` Peter Zijlstra
@ 2016-02-23 15:48     ` Jiri Olsa
  2016-02-23 16:35       ` Pratyush Anand
  2016-02-23 21:44       ` Jiri Olsa
  0 siblings, 2 replies; 18+ messages in thread
From: Jiri Olsa @ 2016-02-23 15:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, alexander.shishkin, eranian, linux-kernel, vince, dvyukov,
	andi, sasha.levin, oleg, Pratyush Anand

On Tue, Feb 23, 2016 at 04:27:29PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 19, 2016 at 03:37:47PM +0100, Peter Zijlstra wrote:
> > Oleg reported that enable_on_exec results in weird scale factors.
> > 
> > The recent commit 3e349507d12d ("perf: Fix perf_enable_on_exec() event
> > scheduling") caused this by moving task_ctx_sched_out() from before
> > __perf_event_mark_enabled() to after it.
> > 
> > The overlooked consequence of that change is that task_ctx_sched_out()
> > would update the ctx time fields, and now __perf_event_mark_enabled()
> > uses stale time.
> > 
> > Fix this by adding an explicit time update.
> > 
> > While looking at this, I also found that we need a ctx->is_active
> > check in perf_install_in_context().
> > 
> > XXX: does this actually fix the reported issue? I'm not sure what the
> > reproduction case is. Also an earlier version made Jiri's machine
> > explode -- something I've not managed to reproduce either.
> 
> Jiri, can you have a look at this and perhaps share the reproducer?

yep, I'm testing this patchset, but got stuck with the 'crash' tool trying to
get some reasonable output.. got stuck on an unrelated sched deadlock ;-)

the reproducer is described in this email:
http://marc.info/?l=linux-kernel&m=145568006709552&w=2

CC-ing Pratyush

jirka


* Re: [RFC][PATCH 4/7] perf: Fix scaling vs enable_on_exec
  2016-02-23 15:48     ` Jiri Olsa
@ 2016-02-23 16:35       ` Pratyush Anand
  2016-02-23 17:47         ` Peter Zijlstra
  2016-02-23 21:44       ` Jiri Olsa
  1 sibling, 1 reply; 18+ messages in thread
From: Pratyush Anand @ 2016-02-23 16:35 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, mingo, alexander.shishkin, eranian, linux-kernel,
	vince, dvyukov, andi, sasha.levin, oleg

Hi Peter/Jiri,

On 23/02/2016:04:48:49 PM, Jiri Olsa wrote:
> On Tue, Feb 23, 2016 at 04:27:29PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 19, 2016 at 03:37:47PM +0100, Peter Zijlstra wrote:
> > > Oleg reported that enable_on_exec results in weird scale factors.
> > > 
> > > The recent commit 3e349507d12d ("perf: Fix perf_enable_on_exec() event
> > > scheduling") caused this by moving task_ctx_sched_out() from before
> > > __perf_event_mark_enabled() to after it.
> > > 
> > > The overlooked consequence of that change is that task_ctx_sched_out()
> > > would update the ctx time fields, and now __perf_event_mark_enabled()
> > > uses stale time.
> > > 
> > > Fix this by adding an explicit time update.
> > > 
> > > While looking at this, I also found that we need a ctx->is_active
> > > check in perf_install_in_context().
> > > 
> > > XXX: does this actually fix the reported issue? I'm not sure what the
> > > reproduction case is. Also an earlier version made Jiri's machine
> > > explode -- something I've not managed to reproduce either.
> > 
> > Jiri, can you have a look at this and perhaps share the reproducer?
> 
> yep, I'm testing this patchset, but got stuck with the 'crash' tool trying to
> get some reasonable output.. got stuck on an unrelated sched deadlock ;-)
> 
> the reproducer is described in this email:
> http://marc.info/?l=linux-kernel&m=145568006709552&w=2
> 
> CC-ing Pratyush

Thanks for CCing.

It's better with this patch, but the count is still 1 more in the case of
higher probe hits (like 65535 times).

Reproducer is:
---------------------------------------------------------
git clone https://github.com/rfmvh/perftool-testsuite.git
cd perftool-testsuite/base_probe/
./setup.sh 
for i in 1 2 3 103 997 65535; do
perf probe -x examples/exact_counts --add f_${i}x
done
perf stat -x';' -e 'probe_exact:*' examples/exact_counts
---------------------------------------------------------

I see the following with the above code:

65536;;probe_exact:f_65535x;84476560;100.00
997;;probe_exact:f_997x;84476560;100.00
103;;probe_exact:f_103x;84476560;100.00
3;;probe_exact:f_3x;84476560;100.00
2;;probe_exact:f_2x;84476560;100.00
1;;probe_exact:f_1x;84476560;100.00

~Pratyush

PS: Do I need to take all patches of the series? Currently I have taken only this one.


* Re: [RFC][PATCH 4/7] perf: Fix scaling vs enable_on_exec
  2016-02-23 16:35       ` Pratyush Anand
@ 2016-02-23 17:47         ` Peter Zijlstra
  2016-02-24 11:53           ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2016-02-23 17:47 UTC (permalink / raw)
  To: Pratyush Anand
  Cc: Jiri Olsa, mingo, alexander.shishkin, eranian, linux-kernel,
	vince, dvyukov, andi, sasha.levin, oleg

On Tue, Feb 23, 2016 at 10:05:50PM +0530, Pratyush Anand wrote:
> It's better with this patch, but the count is still 1 more in the case of
> higher probe hits (like 65535 times).

Ah, ok, I'll go try again.

> PS: Do I need to take all patches of the series? Currently I have taken only this one.

No, they're all quite independent.


* Re: [RFC][PATCH 4/7] perf: Fix scaling vs enable_on_exec
  2016-02-23 15:48     ` Jiri Olsa
  2016-02-23 16:35       ` Pratyush Anand
@ 2016-02-23 21:44       ` Jiri Olsa
  2016-02-26  2:25         ` Oleg Nesterov
  1 sibling, 1 reply; 18+ messages in thread
From: Jiri Olsa @ 2016-02-23 21:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, alexander.shishkin, eranian, linux-kernel, vince, dvyukov,
	andi, sasha.levin, oleg, Pratyush Anand

On Tue, Feb 23, 2016 at 04:48:49PM +0100, Jiri Olsa wrote:
> On Tue, Feb 23, 2016 at 04:27:29PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 19, 2016 at 03:37:47PM +0100, Peter Zijlstra wrote:
> > > Oleg reported that enable_on_exec results in weird scale factors.
> > > 
> > > The recent commit 3e349507d12d ("perf: Fix perf_enable_on_exec() event
> > > scheduling") caused this by moving task_ctx_sched_out() from before
> > > __perf_event_mark_enabled() to after it.
> > > 
> > > The overlooked consequence of that change is that task_ctx_sched_out()
> > > would update the ctx time fields, and now __perf_event_mark_enabled()
> > > uses stale time.
> > > 
> > > Fix this by adding an explicit time update.
> > > 
> > > While looking at this, I also found that we need a ctx->is_active
> > > check in perf_install_in_context().
> > > 
> > > XXX: does this actually fix the reported issue? I'm not sure what the
> > > reproduction case is. Also an earlier version made Jiri's machine
> > > explode -- something I've not managed to reproduce either.
> > 
> > Jiri, can you have a look at this and perhaps share the reproducer?
> 
> yep, I'm testing this patchset, but got stuck with the 'crash' tool trying to
> get some reasonable output.. got stuck on an unrelated sched deadlock ;-)
> 
> the reproducer is described in this email:
> http://marc.info/?l=linux-kernel&m=145568006709552&w=2

so I finally got a reasonable backtrace and figured out that crash:

 #7 [ffff8802751afcd0] general_protection at ffffffff817a69e8
    [exception RIP: special_mapping_fault+47]
    RIP: ffffffff811e40df  RSP: ffff8802751afd88  RFLAGS: 00010282
    RAX: ffff8802747e8b68  RBX: 00007fffffffe080  RCX: c4712d0070657267
    RDX: ffff8802751afd98  RSI: ffff8802742c4f00  RDI: ffff8802747e8b68
    RBP: ffff8802751afd88   R8: 0000000000000000   R9: ffff8802751afe58
    R10: 00000000000001fe  R11: 00003fffffe00000  R12: ffff8802742c4f00
    R13: ffff8802751afe58  R14: 0000000000000000  R15: ffff880273f59ff8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #8 [ffff8802751afd90] __do_fault at ffffffff811db505
 #9 [ffff8802751afdf8] handle_mm_fault at ffffffff811e0b03
#10 [ffff8802751afec8] __do_page_fault at ffffffff8106734a
#11 [ffff8802751aff20] do_page_fault at ffffffff810675df
#12 [ffff8802751aff50] page_fault at ffffffff817a6a48


it was caused by:
  - f872f5400cc0 mm: Add a vm_special_mapping.fault() method
    which added a call to vm_special_mapping::fault if it's defined

  - and the uprobes code not initializing this fault pointer properly;
    the attached patch fixed the issue for me.
    Oleg, I'm not sure this is how you want to fix this though..


however I still see the off by 1 as Pratyush said:
  65536;;probe_exact:f_65535x;132185462;100.00

I have another patch making the ena/run times equal for software
events in the read syscall, and that obviously works.. but I'm not sure
how to fix this otherwise ATM

thanks,
jirka


---
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 0167679182c0..0c045aad28a2 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1169,7 +1169,7 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
 	uprobe_opcode_t insn = UPROBE_SWBP_INSN;
 	struct xol_area *area;
 
-	area = kmalloc(sizeof(*area), GFP_KERNEL);
+	area = kzalloc(sizeof(*area), GFP_KERNEL);
 	if (unlikely(!area))
 		goto out;
 


* Re: [RFC][PATCH 4/7] perf: Fix scaling vs enable_on_exec
  2016-02-23 17:47         ` Peter Zijlstra
@ 2016-02-24 11:53           ` Peter Zijlstra
  2016-02-24 14:02             ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2016-02-24 11:53 UTC (permalink / raw)
  To: Pratyush Anand
  Cc: Jiri Olsa, mingo, alexander.shishkin, eranian, linux-kernel,
	vince, dvyukov, andi, sasha.levin, oleg

On Tue, Feb 23, 2016 at 06:47:41PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 23, 2016 at 10:05:50PM +0530, Pratyush Anand wrote:
> > It's better with this patch, but the count is still 1 more in the case of
> > higher probe hits (like 65535 times).
> 
> Ah, ok, I'll go try again.

OK, so the below seems to cure this for me, but now I'm hurting my head
to make the same true for perf_install_in_context(), because 'tricky' :/

(the below is a fold of 2 patches, I'll send out a new series once I get
my head around the install_in_context muck :/)
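
(The recurring idiom in the fold below is the 'changed bits' XOR; a
sketch, assuming is_active was loaded from ctx->is_active on entry:

	int is_active = ctx->is_active;		/* old mask */
	ctx->is_active |= (event_type | EVENT_TIME);
	is_active ^= ctx->is_active;		/* bits that just turned on */
	if (is_active & EVENT_TIME) {
		/* (re)start the ctx time only once */
	}

In ctx_sched_out() the same XOR yields the bits that just turned off.)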

---
 kernel/events/core.c | 84 +++++++++++++++++++++++++++++++++-------------------
 1 file changed, 53 insertions(+), 31 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 94c47e3f9a0a..8326a7f5729c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -314,7 +314,8 @@ static void event_function_call(struct perf_event *event, event_f func, void *da
 enum event_type_t {
 	EVENT_FLEXIBLE = 0x1,
 	EVENT_PINNED = 0x2,
-	EVENT_ALL = EVENT_FLEXIBLE | EVENT_PINNED,
+	EVENT_TIME = 0x4,
+	EVENT_ALL = EVENT_FLEXIBLE | EVENT_PINNED | EVENT_TIME,
 };
 
 /*
@@ -1288,16 +1289,18 @@ static u64 perf_event_time(struct perf_event *event)
 
 /*
  * Update the total_time_enabled and total_time_running fields for a event.
- * The caller of this function needs to hold the ctx->lock.
  */
 static void update_event_times(struct perf_event *event)
 {
 	struct perf_event_context *ctx = event->ctx;
 	u64 run_end;
 
+	lockdep_assert_held(&ctx->lock);
+
 	if (event->state < PERF_EVENT_STATE_INACTIVE ||
 	    event->group_leader->state < PERF_EVENT_STATE_INACTIVE)
 		return;
+
 	/*
 	 * in cgroup mode, time_enabled represents
 	 * the time the event was enabled AND active
@@ -2063,14 +2066,27 @@ static void add_event_to_ctx(struct perf_event *event,
 	event->tstamp_stopped = tstamp;
 }
 
-static void task_ctx_sched_out(struct perf_cpu_context *cpuctx,
-			       struct perf_event_context *ctx);
+static void ctx_sched_out(struct perf_event_context *ctx,
+			  struct perf_cpu_context *cpuctx,
+			  enum event_type_t event_type);
 static void
 ctx_sched_in(struct perf_event_context *ctx,
 	     struct perf_cpu_context *cpuctx,
 	     enum event_type_t event_type,
 	     struct task_struct *task);
 
+static void task_ctx_sched_out(struct perf_cpu_context *cpuctx,
+			       struct perf_event_context *ctx)
+{
+	if (!cpuctx->task_ctx)
+		return;
+
+	if (WARN_ON_ONCE(ctx != cpuctx->task_ctx))
+		return;
+
+	ctx_sched_out(ctx, cpuctx, EVENT_ALL);
+}
+
 static void perf_event_sched_in(struct perf_cpu_context *cpuctx,
 				struct perf_event_context *ctx,
 				struct task_struct *task)
@@ -2219,17 +2235,17 @@ static void __perf_event_enable(struct perf_event *event,
 	    event->state <= PERF_EVENT_STATE_ERROR)
 		return;
 
-	update_context_time(ctx);
+	ctx_sched_out(ctx, cpuctx, EVENT_TIME);
+
 	__perf_event_mark_enabled(event);
 
 	if (!ctx->is_active)
 		return;
 
 	if (!event_filter_match(event)) {
-		if (is_cgroup_event(event)) {
-			perf_cgroup_set_timestamp(current, ctx); // XXX ?
+		if (is_cgroup_event(event))
 			perf_cgroup_defer_enabled(event);
-		}
+		ctx_sched_in(ctx, cpuctx, EVENT_TIME, current);
 		return;
 	}
 
@@ -2237,8 +2253,10 @@ static void __perf_event_enable(struct perf_event *event,
 	 * If the event is in a group and isn't the group leader,
 	 * then don't put it on unless the group is on.
 	 */
-	if (leader != event && leader->state != PERF_EVENT_STATE_ACTIVE)
+	if (leader != event && leader->state != PERF_EVENT_STATE_ACTIVE) {
+		ctx_sched_in(ctx, cpuctx, EVENT_TIME, current);
 		return;
+	}
 
 	task_ctx = cpuctx->task_ctx;
 	if (ctx->task)
@@ -2344,24 +2362,33 @@ static void ctx_sched_out(struct perf_event_context *ctx,
 	}
 
 	ctx->is_active &= ~event_type;
+	if (!(ctx->is_active & (EVENT_FLEXIBLE | EVENT_PINNED)))
+		ctx->is_active = 0;
+
 	if (ctx->task) {
 		WARN_ON_ONCE(cpuctx->task_ctx != ctx);
 		if (!ctx->is_active)
 			cpuctx->task_ctx = NULL;
 	}
 
-	update_context_time(ctx);
-	update_cgrp_time_from_cpuctx(cpuctx);
+	is_active ^= ctx->is_active; /* changed bits */
+
+	if (is_active & EVENT_TIME) {
+		/* update (and stop) ctx time */
+		update_context_time(ctx);
+		update_cgrp_time_from_cpuctx(cpuctx);
+	}
+
 	if (!ctx->nr_active)
 		return;
 
 	perf_pmu_disable(ctx->pmu);
-	if ((is_active & EVENT_PINNED) && (event_type & EVENT_PINNED)) {
+	if (is_active & EVENT_PINNED) {
 		list_for_each_entry(event, &ctx->pinned_groups, group_entry)
 			group_sched_out(event, cpuctx, ctx);
 	}
 
-	if ((is_active & EVENT_FLEXIBLE) && (event_type & EVENT_FLEXIBLE)) {
+	if (is_active & EVENT_FLEXIBLE) {
 		list_for_each_entry(event, &ctx->flexible_groups, group_entry)
 			group_sched_out(event, cpuctx, ctx);
 	}
@@ -2641,18 +2668,6 @@ void __perf_event_task_sched_out(struct task_struct *task,
 		perf_cgroup_sched_out(task, next);
 }
 
-static void task_ctx_sched_out(struct perf_cpu_context *cpuctx,
-			       struct perf_event_context *ctx)
-{
-	if (!cpuctx->task_ctx)
-		return;
-
-	if (WARN_ON_ONCE(ctx != cpuctx->task_ctx))
-		return;
-
-	ctx_sched_out(ctx, cpuctx, EVENT_ALL);
-}
-
 /*
  * Called with IRQs disabled
  */
@@ -2735,7 +2750,7 @@ ctx_sched_in(struct perf_event_context *ctx,
 	if (likely(!ctx->nr_events))
 		return;
 
-	ctx->is_active |= event_type;
+	ctx->is_active |= (event_type | EVENT_TIME);
 	if (ctx->task) {
 		if (!is_active)
 			cpuctx->task_ctx = ctx;
@@ -2743,18 +2758,24 @@ ctx_sched_in(struct perf_event_context *ctx,
 			WARN_ON_ONCE(cpuctx->task_ctx != ctx);
 	}
 
-	now = perf_clock();
-	ctx->timestamp = now;
-	perf_cgroup_set_timestamp(task, ctx);
+	is_active ^= ctx->is_active; /* changed bits */
+
+	if (is_active & EVENT_TIME) {
+		/* start ctx time */
+		now = perf_clock();
+		ctx->timestamp = now;
+		perf_cgroup_set_timestamp(task, ctx);
+	}
+
 	/*
 	 * First go through the list and put on any pinned groups
 	 * in order to give them the best chance of going on.
 	 */
-	if (!(is_active & EVENT_PINNED) && (event_type & EVENT_PINNED))
+	if (is_active & EVENT_PINNED)
 		ctx_pinned_sched_in(ctx, cpuctx);
 
 	/* Then walk through the lower prio flexible groups */
-	if (!(is_active & EVENT_FLEXIBLE) && (event_type & EVENT_FLEXIBLE))
+	if (is_active & EVENT_FLEXIBLE)
 		ctx_flexible_sched_in(ctx, cpuctx);
 }
 
@@ -3120,6 +3141,7 @@ static void perf_event_enable_on_exec(int ctxn)
 
 	cpuctx = __get_cpu_context(ctx);
 	perf_ctx_lock(cpuctx, ctx);
+	ctx_sched_out(ctx, cpuctx, EVENT_TIME);
 	list_for_each_entry(event, &ctx->event_list, event_entry)
 		enabled |= event_enable_on_exec(event, ctx);
 


* Re: [RFC][PATCH 4/7] perf: Fix scaling vs enable_on_exec
  2016-02-24 11:53           ` Peter Zijlstra
@ 2016-02-24 14:02             ` Peter Zijlstra
  2016-02-24 16:02               ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2016-02-24 14:02 UTC (permalink / raw)
  To: Pratyush Anand
  Cc: Jiri Olsa, mingo, alexander.shishkin, eranian, linux-kernel,
	vince, dvyukov, andi, sasha.levin, oleg

On Wed, Feb 24, 2016 at 12:53:51PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 23, 2016 at 06:47:41PM +0100, Peter Zijlstra wrote:
> > On Tue, Feb 23, 2016 at 10:05:50PM +0530, Pratyush Anand wrote:
> > > It's better with this patch, but the count is still 1 more in the case of
> > > higher probe hits (like 65535 times).
> > 
> > Ah, ok, I'll go try again.
> 
> OK, so the below seems to cure this for me, but now I'm hurting my head
> to make the same true for perf_install_in_context(), because 'tricky' :/
> 

FWIW, it would be nice to have a similar test for:

	attr = {
		.disabled = true;
	}

	sys_perf_event_open(&attr, .pid = self);

	if (attr.disabled)
		ioctl(ENABLE);

	/* generate N events */

	ioctl(DISABLE);

	read();

	/* print event cnt and scale factors */

and one that has .disabled = false.


* Re: [RFC][PATCH 4/7] perf: Fix scaling vs enable_on_exec
  2016-02-24 14:02             ` Peter Zijlstra
@ 2016-02-24 16:02               ` Peter Zijlstra
  2016-02-25  4:07                 ` Pratyush Anand
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2016-02-24 16:02 UTC (permalink / raw)
  To: Pratyush Anand
  Cc: Jiri Olsa, mingo, alexander.shishkin, eranian, linux-kernel,
	vince, dvyukov, andi, sasha.levin, oleg

On Wed, Feb 24, 2016 at 03:02:39PM +0100, Peter Zijlstra wrote:
> FWIW, it would be nice to have a similar test for:
> 
> 	attr = {
> 		.disabled = true;
> 	}
> 
> 	sys_perf_event_open(&attr, .pid = self);
> 
> 	if (attr.disabled)
> 		ioctl(ENABLE);
> 
> 	/* generate N events */
> 
> 	ioctl(DISABLE);
> 
> 	read();
> 
> 	/* print event cnt and scale factors */
> 
> and one that has .disabled = false.


root@ivb-ep:~/perf# ./main 
1000000903 218851613 218851613
root@ivb-ep:~/perf# ./main 1
1000000235 218981231 218981231


Appears to work...

---

#define _GNU_SOURCE
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>

#include "perf.h"

static struct perf_event_attr perf_attr = {
	.type = PERF_TYPE_HARDWARE,
	.config = PERF_COUNT_HW_INSTRUCTIONS,
	.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
		       PERF_FORMAT_TOTAL_TIME_RUNNING,
	.exclude_kernel = 1,
	.pinned = 1,
};

void die(const char *err, ...)
{
	va_list params;

	va_start(params, err);
	vfprintf(stderr, err, params);
	va_end(params);

	exit(-1);
}

int main (int argc, char **argv)
{
	u64 val[3];
	int i, fd;

	perf_attr.disabled = argc > 1;

	fd = sys_perf_event_open(&perf_attr, 0, -1, -1, 0);
	if (fd < 0)
		die("failed to create perf_event");

	if (perf_attr.disabled)
		ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

	for (i = 0; i < 100000000; i++) {
		asm volatile ("nop\n\r"
			      "nop\n\r"
			      "nop\n\r"
			      "nop\n\r"
			      "nop\n\r"
			      "nop\n\r"
			      "nop\n\r");
	}

	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

	read(fd, &val, sizeof(val));

	printf("%Lu %Lu %Lu\n", val[0], val[1], val[2]);

	return 0;
}


* Re: [RFC][PATCH 4/7] perf: Fix scaling vs enable_on_exec
  2016-02-24 16:02               ` Peter Zijlstra
@ 2016-02-25  4:07                 ` Pratyush Anand
  0 siblings, 0 replies; 18+ messages in thread
From: Pratyush Anand @ 2016-02-25  4:07 UTC (permalink / raw)
  To: Peter Zijlstra, mpetlan
  Cc: Jiri Olsa, mingo, alexander.shishkin, eranian, linux-kernel,
	vince, dvyukov, andi, sasha.levin, oleg

Hi Peter,

On 24/02/2016:05:02:41 PM, Peter Zijlstra wrote:
> On Wed, Feb 24, 2016 at 03:02:39PM +0100, Peter Zijlstra wrote:
> > FWIW, it would be nice to have a similar test for:

+ Michael, (a C test case for the following proposed test is at the end)

> > 
> > 	attr = {
> > 		.disabled = true;
> > 	}
> > 
> > 	sys_perf_event_open(&attr, .pid = self);
> > 
> > 	if (attr.disabled)
> > 		ioctl(ENABLE);
> > 
> > 	/* generate N events */
> > 
> > 	ioctl(DISABLE);
> > 
> > 	read();
> > 
> > 	/* print event cnt and scale factors */
> > 
> > and one that has .disabled = false.
> 
> 
> root@ivb-ep:~/perf# ./main 
> 1000000903 218851613 218851613
> root@ivb-ep:~/perf# ./main 1
> 1000000235 218981231 218981231
> 
> 
> Appears to work...

Thanks, I picked the patches from your new series and it worked :-)

# perf stat -x';' -e 'probe_exact:*' examples/exact_counts  
65535;;probe_exact:f_65535x;85984820;100.00
997;;probe_exact:f_997x;85984820;100.00
103;;probe_exact:f_103x;85984820;100.00
3;;probe_exact:f_3x;85984820;100.00
2;;probe_exact:f_2x;85984820;100.00
1;;probe_exact:f_1x;85984820;100.00

~Pratyush

> 
> ---
> 
> #define _GNU_SOURCE
> #include <unistd.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <stdarg.h>
> 
> #include "perf.h"
> 
> static struct perf_event_attr perf_attr = {
> 	.type = PERF_TYPE_HARDWARE,
> 	.config = PERF_COUNT_HW_INSTRUCTIONS,
> 	.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
> 		       PERF_FORMAT_TOTAL_TIME_RUNNING,
> 	.exclude_kernel = 1,
> 	.pinned = 1,
> };
> 
> void die(const char *err, ...)
> {
> 	va_list params;
> 
> 	va_start(params, err);
> 	vfprintf(stderr, err, params);
> 	va_end(params);
> 
> 	exit(-1);
> }
> 
> int main (int argc, char **argv)
> {
> 	u64 val[3];
> 	int i, fd;
> 
> 	perf_attr.disabled = argc > 1;
> 
> 	fd = sys_perf_event_open(&perf_attr, 0, -1, -1, 0);
> 	if (fd < 0)
> 		die("failed to create perf_event");
> 
> 	if (perf_attr.disabled)
> 		ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
> 
> 	for (i = 0; i < 100000000; i++) {
> 		asm volatile ("nop\n\r"
> 			      "nop\n\r"
> 			      "nop\n\r"
> 			      "nop\n\r"
> 			      "nop\n\r"
> 			      "nop\n\r"
> 			      "nop\n\r");
> 	}
> 
> 	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
> 
> 	read(fd, &val, sizeof(val));
> 
> 	printf("%Lu %Lu %Lu\n", val[0], val[1], val[2]);
> 
> 	return 0;
> }


* Re: [RFC][PATCH 4/7] perf: Fix scaling vs enable_on_exec
  2016-02-23 21:44       ` Jiri Olsa
@ 2016-02-26  2:25         ` Oleg Nesterov
  0 siblings, 0 replies; 18+ messages in thread
From: Oleg Nesterov @ 2016-02-26  2:25 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, mingo, alexander.shishkin, eranian, linux-kernel,
	vince, dvyukov, andi, sasha.levin, Pratyush Anand

On 02/23, Jiri Olsa wrote:
>
> so I finally got a reasonable backtrace and figured out that crash:
>
>  #7 [ffff8802751afcd0] general_protection at ffffffff817a69e8
>     [exception RIP: special_mapping_fault+47]
>     RIP: ffffffff811e40df  RSP: ffff8802751afd88  RFLAGS: 00010282
>     RAX: ffff8802747e8b68  RBX: 00007fffffffe080  RCX: c4712d0070657267
>     RDX: ffff8802751afd98  RSI: ffff8802742c4f00  RDI: ffff8802747e8b68
>     RBP: ffff8802751afd88   R8: 0000000000000000   R9: ffff8802751afe58
>     R10: 00000000000001fe  R11: 00003fffffe00000  R12: ffff8802742c4f00
>     R13: ffff8802751afe58  R14: 0000000000000000  R15: ffff880273f59ff8
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
>  #8 [ffff8802751afd90] __do_fault at ffffffff811db505
>  #9 [ffff8802751afdf8] handle_mm_fault at ffffffff811e0b03
> #10 [ffff8802751afec8] __do_page_fault at ffffffff8106734a
> #11 [ffff8802751aff20] do_page_fault at ffffffff810675df
> #12 [ffff8802751aff50] page_fault at ffffffff817a6a48
>
>
> it was caused by:
>   - f872f5400cc0 mm: Add a vm_special_mapping.fault() method
>     which added a call to vm_special_mapping::fault if it's defined

I guess it came from tip/x86...

>   - and the uprobes code not initializing this fault pointer properly;
>     the attached patch fixed the issue for me.
>     Oleg, I'm not sure this is how you want to fix this though..

Thanks! I'll send a simple fix tomorrow.

Oleg.

