public inbox for linux-kernel@vger.kernel.org
* [PATCH RFC] remove jump_label optimization for perf sched events
@ 2011-11-17 12:30 Gleb Natapov
  2011-11-17 12:49 ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Gleb Natapov @ 2011-11-17 12:30 UTC (permalink / raw)
  To: a.p.zijlstra; +Cc: linux-kernel, mingo

jump_label patching is a very expensive operation that involves pausing all
cpus. The patching of the perf_sched_events jump_label is easily controllable
from userspace by an unprivileged user. When a user runs a loop like
"while true; do perf stat -e cycles true; done", the performance of my
test application, which just increments a counter for one second, drops by
4%. This is on a 16 cpu box with my test application using only one of
them. The impact on a real server doing real work will be much worse.
Performance of the KVM PMU drops nearly 50% due to jump_label for "perf
record", since the KVM PMU implementation creates and destroys perf events
frequently.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1e9ebe5..afac189 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1062,12 +1063,12 @@ perf_sw_event(u32 event_id, u64 nr, struct pt_regs *regs, u64 addr)
 	}
 }
 
-extern struct jump_label_key perf_sched_events;
+extern atomic_t perf_sched_events;
 
 static inline void perf_event_task_sched_in(struct task_struct *prev,
 					    struct task_struct *task)
 {
-	if (static_branch(&perf_sched_events))
+	if (atomic_read(&perf_sched_events))
 		__perf_event_task_sched_in(prev, task);
 }
 
@@ -1076,7 +1077,7 @@ static inline void perf_event_task_sched_out(struct task_struct *prev,
 {
 	perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, NULL, 0);
 
-	if (static_branch(&perf_sched_events))
+	if (atomic_read(&perf_sched_events))
 		__perf_event_task_sched_out(prev, next);
 }
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index bdcd413..8033600 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -128,7 +128,7 @@ enum event_type_t {
  * perf_sched_events : >0 events exist
  * perf_cgroup_events: >0 per-cpu cgroup events exist on this cpu
  */
-struct jump_label_key perf_sched_events __read_mostly;
+atomic_t perf_sched_events;
 static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
 
 static atomic_t nr_mmap_events __read_mostly;
@@ -2943,7 +2963,7 @@ static void free_event(struct perf_event *event)
 
 	if (!event->parent) {
 		if (event->attach_state & PERF_ATTACH_TASK)
-			jump_label_dec(&perf_sched_events);
+			atomic_dec(&perf_sched_events);
 		if (event->attr.mmap || event->attr.mmap_data)
 			atomic_dec(&nr_mmap_events);
 		if (event->attr.comm)
@@ -2954,7 +2974,7 @@ static void free_event(struct perf_event *event)
 			put_callchain_buffers();
 		if (is_cgroup_event(event)) {
 			atomic_dec(&per_cpu(perf_cgroup_events, event->cpu));
-			jump_label_dec(&perf_sched_events);
+			atomic_dec(&perf_sched_events);
 		}
 	}
 
@@ -5897,7 +5917,7 @@ done:
 
 	if (!event->parent) {
 		if (event->attach_state & PERF_ATTACH_TASK)
-			jump_label_inc(&perf_sched_events);
+			atomic_inc(&perf_sched_events);
 		if (event->attr.mmap || event->attr.mmap_data)
 			atomic_inc(&nr_mmap_events);
 		if (event->attr.comm)
@@ -6133,7 +6153,7 @@ SYSCALL_DEFINE5(perf_event_open,
 		 * - that may need work on context switch
 		 */
 		atomic_inc(&per_cpu(perf_cgroup_events, event->cpu));
-		jump_label_inc(&perf_sched_events);
+		atomic_inc(&perf_sched_events);
 	}
 
 	/*
--
			Gleb.

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH RFC] remove jump_label optimization for perf sched events
  2011-11-17 12:30 [PATCH RFC] remove jump_label optimization for perf sched events Gleb Natapov
@ 2011-11-17 12:49 ` Peter Zijlstra
  2011-11-17 13:00   ` Gleb Natapov
                     ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Peter Zijlstra @ 2011-11-17 12:49 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner

On Thu, 2011-11-17 at 14:30 +0200, Gleb Natapov wrote:
> jump_label patching is a very expensive operation that involves pausing all
> cpus. The patching of the perf_sched_events jump_label is easily controllable
> from userspace by an unprivileged user. When a user runs a loop like
> "while true; do perf stat -e cycles true; done", the performance of my
> test application, which just increments a counter for one second, drops by
> 4%. This is on a 16 cpu box with my test application using only one of
> them. The impact on a real server doing real work will be much worse.
> Performance of the KVM PMU drops nearly 50% due to jump_label for "perf
> record", since the KVM PMU implementation creates and destroys perf events
> frequently.

Ideally we'd fix text_poke() to not use stop_machine(); we know how to, but
we haven't had the green light from Intel/AMD yet.

Rostedt was going to implement it anyway and see if anything breaks.

Also, virt might be able to pull something smart with text_poke(), dunno.

That said, I'd much rather throttle this particular jump label than
remove it altogether, some people really don't like all this scheduler
hot path crap.

Something I've pondered for a while but never actually tried yet (and it
hasn't even seen a compiler) is something like the below; I don't think
there's any reason to have two scheduler hooks.

It wouldn't solve your problem, but having only one hook does make it
easier to play around with throttling stuff.

---
 include/linux/perf_event.h |   19 +++++--------------
 kernel/events/core.c       |   14 ++++++++++----
 kernel/sched.c             |    9 +--------
 3 files changed, 16 insertions(+), 26 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1e9ebe5..f1f621a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -947,10 +947,8 @@ extern void perf_pmu_unregister(struct pmu *pmu);
 
 extern int perf_num_counters(void);
 extern const char *perf_pmu_name(void);
-extern void __perf_event_task_sched_in(struct task_struct *prev,
-				       struct task_struct *task);
-extern void __perf_event_task_sched_out(struct task_struct *prev,
-					struct task_struct *next);
+extern void __perf_event_task_sched(struct task_struct *prev,
+				    struct task_struct *next);
 extern int perf_event_init_task(struct task_struct *child);
 extern void perf_event_exit_task(struct task_struct *child);
 extern void perf_event_free_task(struct task_struct *task);
@@ -1064,20 +1062,13 @@ perf_sw_event(u32 event_id, u64 nr, struct pt_regs *regs, u64 addr)
 
 extern struct jump_label_key perf_sched_events;
 
-static inline void perf_event_task_sched_in(struct task_struct *prev,
-					    struct task_struct *task)
-{
-	if (static_branch(&perf_sched_events))
-		__perf_event_task_sched_in(prev, task);
-}
-
-static inline void perf_event_task_sched_out(struct task_struct *prev,
-					     struct task_struct *next)
+static inline void perf_event_task_sched(struct task_struct *prev,
+					 struct task_struct *next)
 {
 	perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, NULL, 0);
 
 	if (static_branch(&perf_sched_events))
-		__perf_event_task_sched_out(prev, next);
+		__perf_event_task_sched(prev, next);
 }
 
 extern void perf_event_mmap(struct vm_area_struct *vma);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2e41c8e..bf9bccb 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2019,8 +2019,8 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn,
  * accessing the event control register. If a NMI hits, then it will
  * not restart the event.
  */
-void __perf_event_task_sched_out(struct task_struct *task,
-				 struct task_struct *next)
+static void 
+__perf_event_task_sched_out(struct task_struct *task, struct task_struct *next)
 {
 	int ctxn;
 
@@ -2199,8 +2199,8 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
  * accessing the event control register. If a NMI hits, then it will
  * keep the event running.
  */
-void __perf_event_task_sched_in(struct task_struct *prev,
-				struct task_struct *task)
+static void
+__perf_event_task_sched_in(struct task_struct *prev, struct task_struct *task)
 {
 	struct perf_event_context *ctx;
 	int ctxn;
@@ -2221,6 +2221,12 @@ void __perf_event_task_sched_in(struct task_struct *prev,
 		perf_cgroup_sched_in(prev, task);
 }
 
+void __perf_event_task_sched(struct task_struct *prev, struct task_struct *next)
+{
+	__perf_event_task_sched_out(prev, next);
+	__perf_event_task_sched_in(prev, next);
+}
+
 static u64 perf_calculate_period(struct perf_event *event, u64 nsec, u64 count)
 {
 	u64 frequency = event->attr.sample_freq;
diff --git a/kernel/sched.c b/kernel/sched.c
index c9e3ab6..657bbc1 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3183,7 +3183,7 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
 		    struct task_struct *next)
 {
 	sched_info_switch(prev, next);
-	perf_event_task_sched_out(prev, next);
+	perf_event_task_sched(prev, next);
 	fire_sched_out_preempt_notifiers(prev, next);
 	prepare_lock_switch(rq, next);
 	prepare_arch_switch(next);
@@ -3226,13 +3226,6 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
 	 */
 	prev_state = prev->state;
 	finish_arch_switch(prev);
-#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
-	local_irq_disable();
-#endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */
-	perf_event_task_sched_in(prev, current);
-#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
-	local_irq_enable();
-#endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */
 	finish_lock_switch(rq, prev);
 
 	fire_sched_in_preempt_notifiers(current);



* Re: [PATCH RFC] remove jump_label optimization for perf sched events
  2011-11-17 12:49 ` Peter Zijlstra
@ 2011-11-17 13:00   ` Gleb Natapov
  2011-11-17 13:10     ` Peter Zijlstra
  2011-11-17 13:29   ` Borislav Petkov
  2011-11-21 13:17   ` Gleb Natapov
  2 siblings, 1 reply; 14+ messages in thread
From: Gleb Natapov @ 2011-11-17 13:00 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner

On Thu, Nov 17, 2011 at 01:49:19PM +0100, Peter Zijlstra wrote:
> On Thu, 2011-11-17 at 14:30 +0200, Gleb Natapov wrote:
> > jump_label patching is a very expensive operation that involves pausing all
> > cpus. The patching of the perf_sched_events jump_label is easily controllable
> > from userspace by an unprivileged user. When a user runs a loop like
> > "while true; do perf stat -e cycles true; done", the performance of my
> > test application, which just increments a counter for one second, drops by
> > 4%. This is on a 16 cpu box with my test application using only one of
> > them. The impact on a real server doing real work will be much worse.
> > Performance of the KVM PMU drops nearly 50% due to jump_label for "perf
> > record", since the KVM PMU implementation creates and destroys perf events
> > frequently.
> 
> Ideally we'd fix text_poke() to not use stop_machine(); we know how to, but
> we haven't had the green light from Intel/AMD yet.
> 
> Rostedt was going to implement it anyway and see if anything breaks.
> 
Hmm, interesting.

> Also, virt might be able to pull something smart with text_poke(), dunno.
> 
The problem with virt is not text_poke() in a guest, but the one in the
host. The guest I am testing with has only one cpu. Basically, creating the
first perf event/destroying the last perf event is very expensive currently,
and when "perf record" is running in a guest this happens a lot on the
host.

> That said, I'd much rather throttle this particular jump label than
> remove it altogether, some people really don't like all this scheduler
> hot path crap.
What about moving perf_event_task_sched() to sched_(in|out)_preempt_notifiers?
Preempt notifier checking is already on the scheduler hot path, so there would
be no additional overhead for the perf case.

--
			Gleb.


* Re: [PATCH RFC] remove jump_label optimization for perf sched events
  2011-11-17 13:00   ` Gleb Natapov
@ 2011-11-17 13:10     ` Peter Zijlstra
  2011-11-17 13:24       ` Avi Kivity
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2011-11-17 13:10 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner

On Thu, 2011-11-17 at 15:00 +0200, Gleb Natapov wrote:

> > That said, I'd much rather throttle this particular jump label than
> > remove it altogether, some people really don't like all this scheduler
> > hot path crap.
> What about moving perf_event_task_sched() to sched_(in|out)_preempt_notifiers?
> Preempt notifier checking is already on the scheduler hot path, so there would
> be no additional overhead for the perf case.

Same problem, really: some people complain about the overhead of preempt
notifiers, and not all kernels have those built in.

Furthermore, I loathe notifier lists because they obscure wtf is done.


* Re: [PATCH RFC] remove jump_label optimization for perf sched events
  2011-11-17 13:10     ` Peter Zijlstra
@ 2011-11-17 13:24       ` Avi Kivity
  2011-11-17 13:47         ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Avi Kivity @ 2011-11-17 13:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Gleb Natapov, linux-kernel, mingo, Jason Baron, rostedt,
	Thomas Gleixner

On 11/17/2011 03:10 PM, Peter Zijlstra wrote:
> On Thu, 2011-11-17 at 15:00 +0200, Gleb Natapov wrote:
>
> > > That said, I'd much rather throttle this particular jump label than
> > > remove it altogether, some people really don't like all this scheduler
> > > hot path crap.
> > What about moving perf_event_task_sched() to sched_(in|out)_preempt_notifiers?
> > Preempt notifier checking is already on the scheduler hot path, so there would
> > be no additional overhead for the perf case.
>
> Same problem really, some people complain about the overhead of preempt
> notifiers, also not all kernels have those in. 

We could combine the two: short-circuit preempt notifiers with jump
labels if empty && not much activity on them.

> Furthermore, I loathe notifier lists because they obscure wtf is done.

That's life in a general-purpose kernel: if everyone gets their hook in
to keep their code clean, the scheduler will bloat.

An advantage of preempt notifiers is that you can make the perf code
modular.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH RFC] remove jump_label optimization for perf sched events
  2011-11-17 12:49 ` Peter Zijlstra
  2011-11-17 13:00   ` Gleb Natapov
@ 2011-11-17 13:29   ` Borislav Petkov
  2011-11-17 13:47     ` Gleb Natapov
  2011-11-21 13:17   ` Gleb Natapov
  2 siblings, 1 reply; 14+ messages in thread
From: Borislav Petkov @ 2011-11-17 13:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Gleb Natapov, linux-kernel, mingo, Jason Baron, rostedt,
	Thomas Gleixner, H. Peter Anvin

On Thu, Nov 17, 2011 at 01:49:19PM +0100, Peter Zijlstra wrote:
> Ideally we'd fix text_poke() to not use stop_machine(); we know how to,
> but we haven't had the green light from Intel/AMD yet.

We talked about it briefly with hpa and tglx at KS. AFAICR, there's a
patchset in the works...

-- 
Regards/Gruss,
   Boris


* Re: [PATCH RFC] remove jump_label optimization for perf sched events
  2011-11-17 13:29   ` Borislav Petkov
@ 2011-11-17 13:47     ` Gleb Natapov
  0 siblings, 0 replies; 14+ messages in thread
From: Gleb Natapov @ 2011-11-17 13:47 UTC (permalink / raw)
  To: Borislav Petkov, Peter Zijlstra, linux-kernel, mingo, Jason Baron,
	rostedt, Thomas Gleixner, H. Peter Anvin

On Thu, Nov 17, 2011 at 02:29:39PM +0100, Borislav Petkov wrote:
> On Thu, Nov 17, 2011 at 01:49:19PM +0100, Peter Zijlstra wrote:
> > Ideally we'd fix text_poke() to not use stop_machine(); we know how to,
> > but we haven't had the green light from Intel/AMD yet.
> 
> We talked about it briefly with hpa and tglx at KS. AFAICR, there's a
> patchset in the works...
> 
I am eager to test it. If someone has something testable, send it my way :)

--
			Gleb.


* Re: [PATCH RFC] remove jump_label optimization for perf sched events
  2011-11-17 13:24       ` Avi Kivity
@ 2011-11-17 13:47         ` Peter Zijlstra
  2011-11-17 14:12           ` Avi Kivity
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2011-11-17 13:47 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Gleb Natapov, linux-kernel, mingo, Jason Baron, rostedt,
	Thomas Gleixner

On Thu, 2011-11-17 at 15:24 +0200, Avi Kivity wrote:
> On 11/17/2011 03:10 PM, Peter Zijlstra wrote:
> > On Thu, 2011-11-17 at 15:00 +0200, Gleb Natapov wrote:
> >
> > > > That said, I'd much rather throttle this particular jump label than
> > > > remove it altogether, some people really don't like all this scheduler
> > > > hot path crap.
> > > What about moving perf_event_task_sched() to sched_(in|out)_preempt_notifiers?
> > > Preempt notifier checking is already on the scheduler hot path, so there would
> > > be no additional overhead for the perf case.
> >
> > Same problem really, some people complain about the overhead of preempt
> > notifiers, also not all kernels have those in. 
> 
> We could combine the two: short-circuit preempt notifiers with jump
> labels if empty && not much activity on them.

Jump labels are still more efficient; also, I don't much like preempt
notifiers.

> > Furthermore, I loathe notifier lists because they obscure wtf is done.
> 
> That's life in a general purpose kernel, if everyone gets their hook in
> to keep their code clean, the scheduler will bloat.

Uhm, no. The bloat isn't any different; the only difference is that you
can actually see it. So I very much prefer direct hooks.

> An advantage of preempt notifiers is that you can make the perf code
> modular.

Yeah, and you know I loathe modules even more.


* Re: [PATCH RFC] remove jump_label optimization for perf sched events
  2011-11-17 13:47         ` Peter Zijlstra
@ 2011-11-17 14:12           ` Avi Kivity
  0 siblings, 0 replies; 14+ messages in thread
From: Avi Kivity @ 2011-11-17 14:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Gleb Natapov, linux-kernel, mingo, Jason Baron, rostedt,
	Thomas Gleixner

On 11/17/2011 03:47 PM, Peter Zijlstra wrote:
> On Thu, 2011-11-17 at 15:24 +0200, Avi Kivity wrote:
> > On 11/17/2011 03:10 PM, Peter Zijlstra wrote:
> > > On Thu, 2011-11-17 at 15:00 +0200, Gleb Natapov wrote:
> > >
> > > > > That said, I'd much rather throttle this particular jump label than
> > > > > remove it altogether, some people really don't like all this scheduler
> > > > > hot path crap.
> > > > What about moving perf_event_task_sched() to sched_(in|out)_preempt_notifiers?
> > > > Preempt notifier checking is already on the scheduler hot path, so there would
> > > > be no additional overhead for the perf case.
> > >
> > > Same problem really, some people complain about the overhead of preempt
> > > notifiers, also not all kernels have those in. 
> > 
> > We could combine the two: short-circuit preempt notifiers with jump
> > labels if empty && not much activity on them.
>
> Jump-labels are still more efficient, also I don't much like preempt
> notifiers.
>
> > > Furthermore, I loathe notifier lists because they obscure wtf is done.
> > 
> > That's life in a general purpose kernel, if everyone gets their hook in
> > to keep their code clean, the scheduler will bloat.
>
> Uhm, no. The bloat isn't different, the only difference is you can
> actually see it. So I very much prefer direct hooks.
>
> > An advantage of preempt notifiers is that you can make the perf code
> > modular.
>
> Yeah, and you know I loathe modules even more.

Is there something you like?

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH RFC] remove jump_label optimization for perf sched events
  2011-11-17 12:49 ` Peter Zijlstra
  2011-11-17 13:00   ` Gleb Natapov
  2011-11-17 13:29   ` Borislav Petkov
@ 2011-11-21 13:17   ` Gleb Natapov
  2011-11-24 13:23     ` Peter Zijlstra
  2 siblings, 1 reply; 14+ messages in thread
From: Gleb Natapov @ 2011-11-21 13:17 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner

On Thu, Nov 17, 2011 at 01:49:19PM +0100, Peter Zijlstra wrote:
> On Thu, 2011-11-17 at 14:30 +0200, Gleb Natapov wrote:
> > jump_label patching is a very expensive operation that involves pausing all
> > cpus. The patching of the perf_sched_events jump_label is easily controllable
> > from userspace by an unprivileged user. When a user runs a loop like
> > "while true; do perf stat -e cycles true; done", the performance of my
> > test application, which just increments a counter for one second, drops by
> > 4%. This is on a 16 cpu box with my test application using only one of
> > them. The impact on a real server doing real work will be much worse.
> > Performance of the KVM PMU drops nearly 50% due to jump_label for "perf
> > record", since the KVM PMU implementation creates and destroys perf events
> > frequently.
> 
> Ideally we'd fix text_poke() to not use stop_machine(); we know how to, but
> we haven't had the green light from Intel/AMD yet.
> 
> Rostedt was going to implement it anyway and see if anything breaks.
> 
> Also, virt might be able to pull something smart with text_poke(), dunno.
> 
> That said, I'd much rather throttle this particular jump label than
> remove it altogether, some people really don't like all this scheduler
> hot path crap.
> 
So how about throttling it like the patch below does, until stop_machine()
is no longer needed for patching (and it is possible that the new way of
patching will still have significant overhead).

Signed-off-by: Gleb Natapov <gleb@redhat.com>
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 66f23dc..a4687f6 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -3,12 +3,15 @@
 
 #include <linux/types.h>
 #include <linux/compiler.h>
+#include <linux/workqueue.h>
 
 #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
 
 struct jump_label_key {
 	atomic_t enabled;
 	struct jump_entry *entries;
+	unsigned long rl_timeout;
+	struct delayed_work rl_work;
 #ifdef CONFIG_MODULES
 	struct jump_label_mod *next;
 #endif
@@ -28,9 +31,18 @@ struct module;
 #ifdef HAVE_JUMP_LABEL
 
 #ifdef CONFIG_MODULES
-#define JUMP_LABEL_INIT {{ 0 }, NULL, NULL}
+#define JUMP_LABEL_INIT {	\
+	.enabled = { 0 },	\
+	.entries = NULL,	\
+	.rl_timeout = 0,	\
+	.next = NULL,		\
+	}
 #else
-#define JUMP_LABEL_INIT {{ 0 }, NULL}
+#define JUMP_LABEL_INIT {	\
+	.enabled = { 0 },	\
+	.entries = NULL,	\
+	.rl_timeout = 0,	\
+	}
 #endif
 
 static __always_inline bool static_branch(struct jump_label_key *key)
@@ -51,6 +63,7 @@ extern void jump_label_inc(struct jump_label_key *key);
 extern void jump_label_dec(struct jump_label_key *key);
 extern bool jump_label_enabled(struct jump_label_key *key);
 extern void jump_label_apply_nops(struct module *mod);
+extern void jump_label_rate_limit(struct jump_label_key *key, unsigned long rl);
 
 #else
 
@@ -97,6 +110,10 @@ static inline int jump_label_apply_nops(struct module *mod)
 	return 0;
 }
 
+static inline void jump_label_rate_limit(struct jump_label_key *key,
+		unsigned long rl)
+{
+}
 #endif
 
 #endif
diff --git a/kernel/events/core.c b/kernel/events/core.c
index bdcd413..50bb631 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7071,6 +7091,9 @@ void __init perf_event_init(void)
 
 	ret = init_hw_breakpoint();
 	WARN(ret, "hw_breakpoint initialization failed with: %d", ret);
+
+	/* do not patch jump label more than once per second */
+	jump_label_rate_limit(&perf_sched_events, HZ);
 }
 
 static int __init perf_event_sysfs_init(void)
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index e6f1f24..88a82d2 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -72,15 +72,39 @@ void jump_label_inc(struct jump_label_key *key)
 	jump_label_unlock();
 }
 
-void jump_label_dec(struct jump_label_key *key)
+static void __jump_label_dec(struct jump_label_key *key,
+		unsigned long rate_limit)
 {
 	if (!atomic_dec_and_mutex_lock(&key->enabled, &jump_label_mutex))
 		return;
 
-	jump_label_update(key, JUMP_LABEL_DISABLE);
+	if (rate_limit) {
+		atomic_inc(&key->enabled);
+		schedule_delayed_work(&key->rl_work, rate_limit);
+	} else
+		jump_label_update(key, JUMP_LABEL_DISABLE);
+
 	jump_label_unlock();
 }
 
+static void jump_label_update_timeout(struct work_struct *work)
+{
+	struct jump_label_key *key =
+		container_of(work, struct jump_label_key, rl_work.work);
+	__jump_label_dec(key, 0);
+}
+
+void jump_label_dec(struct jump_label_key *key)
+{
+	__jump_label_dec(key, key->rl_timeout);
+}
+
+void jump_label_rate_limit(struct jump_label_key *key, unsigned long rl)
+{
+	key->rl_timeout = rl;
+	INIT_DELAYED_WORK(&key->rl_work, jump_label_update_timeout);
+}
+
 static int addr_conflict(struct jump_entry *entry, void *start, void *end)
 {
 	if (entry->code <= (unsigned long)end &&
--
			Gleb.


* Re: [PATCH RFC] remove jump_label optimization for perf sched events
  2011-11-21 13:17   ` Gleb Natapov
@ 2011-11-24 13:23     ` Peter Zijlstra
  2011-11-24 13:45       ` Gleb Natapov
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2011-11-24 13:23 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner

On Mon, 2011-11-21 at 15:17 +0200, Gleb Natapov wrote:
> So how about throttling it like the patch below does, until stop_machine()
> is no longer needed for patching (and it is possible that the new way of
> patching will still have significant overhead).
> 
> Signed-off-by: Gleb Natapov <gleb@redhat.com>
> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
> index 66f23dc..a4687f6 100644
> --- a/include/linux/jump_label.h
> +++ b/include/linux/jump_label.h
> @@ -3,12 +3,15 @@
>  
>  #include <linux/types.h>
>  #include <linux/compiler.h>
> +#include <linux/workqueue.h>
>  
>  #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
>  
>  struct jump_label_key {
>         atomic_t enabled;
>         struct jump_entry *entries;
> +       unsigned long rl_timeout;
> +       struct delayed_work rl_work;
>  #ifdef CONFIG_MODULES
>         struct jump_label_mod *next;


I'm not sure it's worth doing in generic code like this; it bloats the
jump_label_key structure quite significantly (and there are tons of those
around) for only one real user.

If we want to do this in generic code, I think it's best to introduce
something like:

struct jump_label_key_deferred {
	struct jump_label_key key;
	unsigned long timeout;
	struct delayed_work work;
};

But is there really any other user for this? All the trace bits are
root-only iirc, and kvm itself only sets them in the guest kernel, I think
for paravirt, so that's not a problem.




* Re: [PATCH RFC] remove jump_label optimization for perf sched events
  2011-11-24 13:23     ` Peter Zijlstra
@ 2011-11-24 13:45       ` Gleb Natapov
  2011-11-24 14:18         ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Gleb Natapov @ 2011-11-24 13:45 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner

On Thu, Nov 24, 2011 at 02:23:00PM +0100, Peter Zijlstra wrote:
> On Mon, 2011-11-21 at 15:17 +0200, Gleb Natapov wrote:
> > So how about throttling it like the patch below does, until stop_machine()
> > is no longer needed for patching (and it is possible that the new way of
> > patching will still have significant overhead).
> > 
> > Signed-off-by: Gleb Natapov <gleb@redhat.com>
> > diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
> > index 66f23dc..a4687f6 100644
> > --- a/include/linux/jump_label.h
> > +++ b/include/linux/jump_label.h
> > @@ -3,12 +3,15 @@
> >  
> >  #include <linux/types.h>
> >  #include <linux/compiler.h>
> > +#include <linux/workqueue.h>
> >  
> >  #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
> >  
> >  struct jump_label_key {
> >         atomic_t enabled;
> >         struct jump_entry *entries;
> > +       unsigned long rl_timeout;
> > +       struct delayed_work rl_work;
> >  #ifdef CONFIG_MODULES
> >         struct jump_label_mod *next;
> 
> 
> I'm not sure it's worth doing in generic code like this; it bloats the
> jump_label_key structure quite significantly (and there are tons of those
> around) for only one real user.
To do it in the perf code we would have to replicate the "enabled"
accounting of jump_label in the perf code. The code will be much uglier
that way, IMO. On the other hand, the feature fits nicely into the generic
jump_label code and, apart from the jump_label_key structure size overhead,
has no other downsides. And the size overhead will be eliminated by your
suggestion below.

> 
> If we want to do this in generic code, I think it's best to introduce
> something like:
> 
> struct jump_label_key_deferred {
> 	struct jump_label_key key;
> 	unsigned long timeout;
> 	struct delayed_work work;
> }
> 
Yes, I contemplated this. I didn't realize that there are tons of
jump_labels, though.

> But is there really any other user for this? All the trace bits are root
> only iirc and kvm itself only sets them on the guest kernel I think for
> paravirt, so that's not a problem.
> 
The problem I am trying to fix with this patch is not strictly
virtualization related. As I showed in my previous email on the subject,
you can trigger this jump label patching very often without virtualization
at all. Of course, with virtualization the problem is much more serious,
since the patching is going on in both the host kernel and the guest
kernel.

--
			Gleb.


* Re: [PATCH RFC] remove jump_label optimization for perf sched events
  2011-11-24 13:45       ` Gleb Natapov
@ 2011-11-24 14:18         ` Peter Zijlstra
  2011-11-24 17:43           ` Gleb Natapov
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2011-11-24 14:18 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner

On Thu, 2011-11-24 at 15:45 +0200, Gleb Natapov wrote:
> Yes, I contemplated this. I didn't realize that there are tons of
> jump_labels, though.

Yeah, every tracepoint has one; that's by far the biggest user
currently.

> > But is there really any other user for this? All the trace bits are root
> > only iirc and kvm itself only sets them on the guest kernel I think for
> > paravirt, so that's not a problem.
> > 
> The problem I am trying to fix with this patch is not strictly
> virtualization related. 

No, I know; it's about user-triggerable jump_label conversions, but afaik
that's only perf. The others (tracepoints and paravirt) can't be toggled
by unprivileged users.


* Re: [PATCH RFC] remove jump_label optimization for perf sched events
  2011-11-24 14:18         ` Peter Zijlstra
@ 2011-11-24 17:43           ` Gleb Natapov
  0 siblings, 0 replies; 14+ messages in thread
From: Gleb Natapov @ 2011-11-24 17:43 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner

On Thu, Nov 24, 2011 at 03:18:04PM +0100, Peter Zijlstra wrote:
> > > But is there really any other user for this? All the trace bits are root
> > > only iirc and kvm itself only sets them on the guest kernel I think for
> > > paravirt, so that's not a problem.
> > > 
> > The problem I am trying to fix with this patch is not strictly
> > virtualization related. 
> 
> No, I know; it's about user-triggerable jump_label conversions, but afaik
> that's only perf. The others (tracepoints and paravirt) can't be toggled
> by unprivileged users.
Currently it may be only perf; I haven't checked. But as jump_label gains
more users the problem may become more widespread. Now that I know how
jump_labels work, I will look into changing some KVM ifs into them. If
some of them can be triggered by a guest action (for instance enabling
debug, or entering real mode), they will have to be throttled too. But
this is just speculation at this point. What is important is fixing the
one we have now. I looked into doing it outside of the jump_label code,
but then I saw that I would need to duplicate most of the
jump_label_inc/jump_label_dec code in perf, so I went for the more general
solution.

--
			Gleb.

