Re: [PATCH RFC] remove jump_label optimization for perf sched events

public inbox for linux-kernel@vger.kernel.org

From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Gleb Natapov <gleb@redhat.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu,
	Jason Baron <jbaron@redhat.com>, rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH RFC] remove jump_label optimization for perf sched events
Date: Thu, 17 Nov 2011 13:49:19 +0100
Message-ID: <1321534159.27735.33.camel@twins>
In-Reply-To: <20111117123029.GB16853@redhat.com>

On Thu, 2011-11-17 at 14:30 +0200, Gleb Natapov wrote:
> jump_label patching is a very expensive operation that involves pausing all
> cpus. The patching of the perf_sched_events jump_label is easily controllable
> from userspace by an unprivileged user. When a user runs a loop like this
> "while true; do perf stat -e cycles true; done" the performance of my
> test application that just increments a counter for one second drops by
> 4%. This is on a 16 cpu box with my test application using only one of
> them. An impact on a real server doing real work will be much worse.
> Performance of KVM PMU drops nearly 50% due to jump_label for "perf
> record" since the KVM PMU implementation creates and destroys perf events
> frequently.

Ideally we'd fix text_poke to not use stop_machine(); we know how to, but
we haven't had the green light from Intel/AMD yet.

Rostedt was going to implement it anyway and see if anything breaks.

Also, virt might be able to pull something smart on text_poke(), dunno.

That said, I'd much rather throttle this particular jump label than
remove it altogether; some people really don't like all this scheduler
hot path crap.

Something I've pondered for a while but never actually tried yet (and it
hasn't even seen a compiler) is something like the below; I don't think
there's any reason to have two scheduler hooks.

It wouldn't solve your problem, but having only one hook does make it
easier to play around with throttling stuff.
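
To make the throttling idea a bit more concrete, here is a minimal
userspace sketch of one way it could work: enabling the key patches
immediately, but disabling is deferred, so a tight create/destroy loop
ends up patching once instead of on every iteration. This is only a
model and not part of the patch below; every name in it (throttled_key,
key_patch, throttled_key_tick, the delay field) is hypothetical, and the
caller-supplied 'now' stands in for whatever clock and deferred-work
mechanism a real implementation would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of a throttled jump label; not kernel API. */
struct throttled_key {
	int      refcount;        /* active users, e.g. perf events */
	bool     enabled;         /* models the patched state of the branch */
	bool     disable_pending; /* a deferred disable has been requested */
	uint64_t disable_at;      /* earliest time the disable may happen */
	uint64_t delay;           /* how long to stay enabled after last user */
	unsigned patch_count;     /* number of simulated (expensive) patches */
};

/* Stands in for the expensive text patching step. */
static void key_patch(struct throttled_key *key, bool enable)
{
	key->enabled = enable;
	key->patch_count++;
}

/* A new user appears: cancel any deferred disable, patch only if needed. */
static void throttled_key_inc(struct throttled_key *key)
{
	key->disable_pending = false;
	if (key->refcount++ == 0 && !key->enabled)
		key_patch(key, true);
}

/* A user goes away: never patch here, only schedule the disable. */
static void throttled_key_dec(struct throttled_key *key, uint64_t now)
{
	assert(key->refcount > 0);
	if (--key->refcount == 0) {
		key->disable_pending = true;
		key->disable_at = now + key->delay;
	}
}

/* Models the timer or delayed work that performs the deferred disable. */
static void throttled_key_tick(struct throttled_key *key, uint64_t now)
{
	if (key->disable_pending && key->refcount == 0 &&
	    now >= key->disable_at) {
		key->disable_pending = false;
		key_patch(key, false);
	}
}
```

In this model the cost is that the hook stays enabled for up to 'delay'
time units after the last event goes away, which trades a little
scheduler hot path overhead for far fewer patching operations.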

---
 include/linux/perf_event.h |   19 +++++--------------
 kernel/events/core.c       |   14 ++++++++++----
 kernel/sched.c             |    9 +--------
 3 files changed, 16 insertions(+), 26 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1e9ebe5..f1f621a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -947,10 +947,8 @@ extern void perf_pmu_unregister(struct pmu *pmu);
 
 extern int perf_num_counters(void);
 extern const char *perf_pmu_name(void);
-extern void __perf_event_task_sched_in(struct task_struct *prev,
-				       struct task_struct *task);
-extern void __perf_event_task_sched_out(struct task_struct *prev,
-					struct task_struct *next);
+extern void __perf_event_task_sched(struct task_struct *prev,
+				    struct task_struct *next);
 extern int perf_event_init_task(struct task_struct *child);
 extern void perf_event_exit_task(struct task_struct *child);
 extern void perf_event_free_task(struct task_struct *task);
@@ -1064,20 +1062,13 @@ perf_sw_event(u32 event_id, u64 nr, struct pt_regs *regs, u64 addr)
 
 extern struct jump_label_key perf_sched_events;
 
-static inline void perf_event_task_sched_in(struct task_struct *prev,
-					    struct task_struct *task)
-{
-	if (static_branch(&perf_sched_events))
-		__perf_event_task_sched_in(prev, task);
-}
-
-static inline void perf_event_task_sched_out(struct task_struct *prev,
-					     struct task_struct *next)
+static inline void perf_event_task_sched(struct task_struct *prev,
+					 struct task_struct *next)
 {
 	perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, NULL, 0);
 
 	if (static_branch(&perf_sched_events))
-		__perf_event_task_sched_out(prev, next);
+		__perf_event_task_sched(prev, next);
 }
 
 extern void perf_event_mmap(struct vm_area_struct *vma);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2e41c8e..bf9bccb 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2019,8 +2019,8 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn,
  * accessing the event control register. If a NMI hits, then it will
  * not restart the event.
  */
-void __perf_event_task_sched_out(struct task_struct *task,
-				 struct task_struct *next)
+static void
+__perf_event_task_sched_out(struct task_struct *task, struct task_struct *next)
 {
 	int ctxn;
 
@@ -2199,8 +2199,8 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
  * accessing the event control register. If a NMI hits, then it will
  * keep the event running.
  */
-void __perf_event_task_sched_in(struct task_struct *prev,
-				struct task_struct *task)
+static void
+__perf_event_task_sched_in(struct task_struct *prev, struct task_struct *task)
 {
 	struct perf_event_context *ctx;
 	int ctxn;
@@ -2221,6 +2221,12 @@ void __perf_event_task_sched_in(struct task_struct *prev,
 		perf_cgroup_sched_in(prev, task);
 }
 
+void __perf_event_task_sched(struct task_struct *prev, struct task_struct *next)
+{
+	__perf_event_task_sched_out(prev, next);
+	__perf_event_task_sched_in(prev, next);
+}
+
 static u64 perf_calculate_period(struct perf_event *event, u64 nsec, u64 count)
 {
 	u64 frequency = event->attr.sample_freq;
diff --git a/kernel/sched.c b/kernel/sched.c
index c9e3ab6..657bbc1 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3183,7 +3183,7 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
 		    struct task_struct *next)
 {
 	sched_info_switch(prev, next);
-	perf_event_task_sched_out(prev, next);
+	perf_event_task_sched(prev, next);
 	fire_sched_out_preempt_notifiers(prev, next);
 	prepare_lock_switch(rq, next);
 	prepare_arch_switch(next);
@@ -3226,13 +3226,6 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
 	 */
 	prev_state = prev->state;
 	finish_arch_switch(prev);
-#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
-	local_irq_disable();
-#endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */
-	perf_event_task_sched_in(prev, current);
-#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
-	local_irq_enable();
-#endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */
 	finish_lock_switch(rq, prev);
 
 	fire_sched_in_preempt_notifiers(current);
