* [PATCH RFC] remove jump_label optimization for perf sched events @ 2011-11-17 12:30 Gleb Natapov 2011-11-17 12:49 ` Peter Zijlstra 0 siblings, 1 reply; 14+ messages in thread From: Gleb Natapov @ 2011-11-17 12:30 UTC (permalink / raw) To: a.p.zijlstra; +Cc: linux-kernel, mingo jump_label patching is a very expensive operation that involves pausing all cpus. The patching of the perf_sched_events jump_label is easily controllable from userspace by an unprivileged user. When a user runs a loop like "while true; do perf stat -e cycles true; done", the performance of my test application, which just increments a counter for one second, drops by 4%. This is on a 16 cpu box with my test application using only one of them. The impact on a real server doing real work will be much worse. Performance of the KVM PMU drops nearly 50% with "perf record" due to jump_label patching, since the KVM PMU implementation creates and destroys perf events frequently. Signed-off-by: Gleb Natapov <gleb@redhat.com> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 1e9ebe5..afac189 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1062,12 +1063,12 @@ perf_sw_event(u32 event_id, u64 nr, struct pt_regs *regs, u64 addr) } } -extern struct jump_label_key perf_sched_events; +extern atomic_t perf_sched_events; static inline void perf_event_task_sched_in(struct task_struct *prev, struct task_struct *task) { - if (static_branch(&perf_sched_events)) + if (atomic_read(&perf_sched_events)) __perf_event_task_sched_in(prev, task); } @@ -1076,7 +1077,7 @@ static inline void perf_event_task_sched_out(struct task_struct *prev, { perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, NULL, 0); - if (static_branch(&perf_sched_events)) + if (atomic_read(&perf_sched_events)) __perf_event_task_sched_out(prev, next); } diff --git a/kernel/events/core.c b/kernel/events/core.c index bdcd413..8033600 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -128,7 +128,7 @@ enum event_type_t { *
perf_sched_events : >0 events exist * perf_cgroup_events: >0 per-cpu cgroup events exist on this cpu */ -struct jump_label_key perf_sched_events __read_mostly; +atomic_t perf_sched_events; static DEFINE_PER_CPU(atomic_t, perf_cgroup_events); static atomic_t nr_mmap_events __read_mostly; @@ -2943,7 +2963,7 @@ static void free_event(struct perf_event *event) if (!event->parent) { if (event->attach_state & PERF_ATTACH_TASK) - jump_label_dec(&perf_sched_events); + atomic_dec(&perf_sched_events); if (event->attr.mmap || event->attr.mmap_data) atomic_dec(&nr_mmap_events); if (event->attr.comm) @@ -2954,7 +2974,7 @@ static void free_event(struct perf_event *event) put_callchain_buffers(); if (is_cgroup_event(event)) { atomic_dec(&per_cpu(perf_cgroup_events, event->cpu)); - jump_label_dec(&perf_sched_events); + atomic_dec(&perf_sched_events); } } @@ -5897,7 +5917,7 @@ done: if (!event->parent) { if (event->attach_state & PERF_ATTACH_TASK) - jump_label_inc(&perf_sched_events); + atomic_inc(&perf_sched_events); if (event->attr.mmap || event->attr.mmap_data) atomic_inc(&nr_mmap_events); if (event->attr.comm) @@ -6133,7 +6153,7 @@ SYSCALL_DEFINE5(perf_event_open, * - that may need work on context switch */ atomic_inc(&per_cpu(perf_cgroup_events, event->cpu)); - jump_label_inc(&perf_sched_events); + atomic_inc(&perf_sched_events); } /* -- Gleb. ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH RFC] remove jump_label optimization for perf sched events 2011-11-17 12:30 [PATCH RFC] remove jump_label optimization for perf sched events Gleb Natapov @ 2011-11-17 12:49 ` Peter Zijlstra 2011-11-17 13:00 ` Gleb Natapov ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Peter Zijlstra @ 2011-11-17 12:49 UTC (permalink / raw) To: Gleb Natapov; +Cc: linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner On Thu, 2011-11-17 at 14:30 +0200, Gleb Natapov wrote: > jump_lable patching is very expensive operation that involves pausing all > cpus. The patching of perf_sched_events jump_label is easily controllable > from userspace by unprivileged user. When user runs loop like this > "while true; do perf stat -e cycles true; done" the performance of my > test application that just increments a counter for one second drops by > 4%. This is on a 16 cpu box with my test application using only one of > them. An impact on a real server doing real work will be much worse. > Performance of KVM PMU drops nearly 50% due to jump_lable for "perf > record" since KVM PMU implementation creates and destroys perf event > frequently. Ideally we'd fix text_poke to not use stop_machine(); we know how to, but we haven't had the green light from Intel/AMD yet. Rostedt was going to implement it anyway and see if anything breaks. Also, virt might be able to pull something smart on text_poke(), dunno. That said, I'd much rather throttle this particular jump label than remove it altogether; some people really don't like all this scheduler hot path crap. Something I've pondered for a while but never actually tried yet (and it hasn't even seen a compiler) is something like the below; I don't think there's any reason to have two scheduler hooks. It wouldn't solve your problem, but having only one hook does make it easier to play around with throttling stuff.
--- include/linux/perf_event.h | 19 +++++-------------- kernel/events/core.c | 14 ++++++++++---- kernel/sched.c | 9 +-------- 3 files changed, 16 insertions(+), 26 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 1e9ebe5..f1f621a 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -947,10 +947,8 @@ extern void perf_pmu_unregister(struct pmu *pmu); extern int perf_num_counters(void); extern const char *perf_pmu_name(void); -extern void __perf_event_task_sched_in(struct task_struct *prev, - struct task_struct *task); -extern void __perf_event_task_sched_out(struct task_struct *prev, - struct task_struct *next); +extern void __perf_event_task_sched(struct task_struct *prev, + struct task_struct *next); extern int perf_event_init_task(struct task_struct *child); extern void perf_event_exit_task(struct task_struct *child); extern void perf_event_free_task(struct task_struct *task); @@ -1064,20 +1062,13 @@ perf_sw_event(u32 event_id, u64 nr, struct pt_regs *regs, u64 addr) extern struct jump_label_key perf_sched_events; -static inline void perf_event_task_sched_in(struct task_struct *prev, - struct task_struct *task) -{ - if (static_branch(&perf_sched_events)) - __perf_event_task_sched_in(prev, task); -} - -static inline void perf_event_task_sched_out(struct task_struct *prev, - struct task_struct *next) +static inline void perf_event_task_sched(struct task_struct *prev, + struct task_struct *next) { perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, NULL, 0); if (static_branch(&perf_sched_events)) - __perf_event_task_sched_out(prev, next); + __perf_event_task_sched(prev, next); } extern void perf_event_mmap(struct vm_area_struct *vma); diff --git a/kernel/events/core.c b/kernel/events/core.c index 2e41c8e..bf9bccb 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2019,8 +2019,8 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn, * accessing the event control register. 
If a NMI hits, then it will * not restart the event. */ -void __perf_event_task_sched_out(struct task_struct *task, - struct task_struct *next) +static void +__perf_event_task_sched_out(struct task_struct *task, struct task_struct *next) { int ctxn; @@ -2199,8 +2199,8 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx, * accessing the event control register. If a NMI hits, then it will * keep the event running. */ -void __perf_event_task_sched_in(struct task_struct *prev, - struct task_struct *task) +static void +__perf_event_task_sched_in(struct task_struct *prev, struct task_struct *task) { struct perf_event_context *ctx; int ctxn; @@ -2221,6 +2221,12 @@ void __perf_event_task_sched_in(struct task_struct *prev, perf_cgroup_sched_in(prev, task); } +void __perf_event_task_sched(struct task_struct *prev, struct task_struct *next) +{ + __perf_event_task_sched_out(prev, next); + __perf_event_task_sched_in(prev, next); +} + static u64 perf_calculate_period(struct perf_event *event, u64 nsec, u64 count) { u64 frequency = event->attr.sample_freq; diff --git a/kernel/sched.c b/kernel/sched.c index c9e3ab6..657bbc1 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -3183,7 +3183,7 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev, struct task_struct *next) { sched_info_switch(prev, next); - perf_event_task_sched_out(prev, next); + perf_event_task_sched(prev, next); fire_sched_out_preempt_notifiers(prev, next); prepare_lock_switch(rq, next); prepare_arch_switch(next); @@ -3226,13 +3226,6 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev) */ prev_state = prev->state; finish_arch_switch(prev); -#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW - local_irq_disable(); -#endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */ - perf_event_task_sched_in(prev, current); -#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW - local_irq_enable(); -#endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */ finish_lock_switch(rq, prev); 
fire_sched_in_preempt_notifiers(current); ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH RFC] remove jump_label optimization for perf sched events 2011-11-17 12:49 ` Peter Zijlstra @ 2011-11-17 13:00 ` Gleb Natapov 2011-11-17 13:10 ` Peter Zijlstra 2011-11-17 13:29 ` Borislav Petkov 2011-11-21 13:17 ` Gleb Natapov 2 siblings, 1 reply; 14+ messages in thread From: Gleb Natapov @ 2011-11-17 13:00 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner On Thu, Nov 17, 2011 at 01:49:19PM +0100, Peter Zijlstra wrote: > On Thu, 2011-11-17 at 14:30 +0200, Gleb Natapov wrote: > > jump_lable patching is very expensive operation that involves pausing all > > cpus. The patching of perf_sched_events jump_label is easily controllable > > from userspace by unprivileged user. When user runs loop like this > > "while true; do perf stat -e cycles true; done" the performance of my > > test application that just increments a counter for one second drops by > > 4%. This is on a 16 cpu box with my test application using only one of > > them. An impact on a real server doing real work will be much worse. > > Performance of KVM PMU drops nearly 50% due to jump_lable for "perf > > record" since KVM PMU implementation creates and destroys perf event > > frequently. > > Ideally we'd fix text_poke to not use stop_machine() we know how to, but > we haven't had the green light from Intel/AMD yet. > > Rostedt was going to implement it anyway and see if anything breaks. > Hmm, interesting. > Also, virt might be able to pull something smart on text_poke() dunno. > The problem with virt is not text_poke() in a guest, but the one in a host. The guest I am testing with has only one cpu. Basically creating the first perf event/destroying the last perf event is very expensive currently, and when "perf record" is running in a guest this happens a lot in the host. > That said, I'd much rather throttle this particular jump label than > remove it altogether, some people really don't like all this scheduler > hot path crap.
What about moving perf_event_task_sched() to sched_(in|out)_preempt_notifiers? preempt notifiers checking is already on the scheduler hot path, so no additional overhead for perf case. -- Gleb. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC] remove jump_label optimization for perf sched events 2011-11-17 13:00 ` Gleb Natapov @ 2011-11-17 13:10 ` Peter Zijlstra 2011-11-17 13:24 ` Avi Kivity 0 siblings, 1 reply; 14+ messages in thread From: Peter Zijlstra @ 2011-11-17 13:10 UTC (permalink / raw) To: Gleb Natapov; +Cc: linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner On Thu, 2011-11-17 at 15:00 +0200, Gleb Natapov wrote: > > That said, I'd much rather throttle this particular jump label than > > remove it altogether, some people really don't like all this scheduler > > hot path crap. > What about moving perf_event_task_sched() to sched_(in|out)_preempt_notifiers? > preempt notifiers checking is already on the scheduler hot path, so no > additional overhead for perf case. Same problem really: some people complain about the overhead of preempt notifiers, and not all kernels have those in either. Furthermore, I loathe notifier lists because they obscure wtf is done. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC] remove jump_label optimization for perf sched events 2011-11-17 13:10 ` Peter Zijlstra @ 2011-11-17 13:24 ` Avi Kivity 2011-11-17 13:47 ` Peter Zijlstra 0 siblings, 1 reply; 14+ messages in thread From: Avi Kivity @ 2011-11-17 13:24 UTC (permalink / raw) To: Peter Zijlstra Cc: Gleb Natapov, linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner On 11/17/2011 03:10 PM, Peter Zijlstra wrote: > On Thu, 2011-11-17 at 15:00 +0200, Gleb Natapov wrote: > > > > That said, I'd much rather throttle this particular jump label than > > > remove it altogether, some people really don't like all this scheduler > > > hot path crap. > > What about moving perf_event_task_sched() to sched_(in|out)_preempt_notifiers? > > preempt notifiers checking is already on the scheduler hot path, so no > > additional overhead for perf case. > > Same problem really, some people complain about the overhead of preempt > notifiers, also not all kernels have those in. We could combine the two, short-circuit preempt notifiers with jump labels if empty && not much activity on them. > Futhermore I loathe notifier lists because they obscure wtf is done. That's life in a general purpose kernel: if everyone gets their hook in to keep their code clean, the scheduler will bloat. An advantage of preempt notifiers is that you can make the perf code modular. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC] remove jump_label optimization for perf sched events 2011-11-17 13:24 ` Avi Kivity @ 2011-11-17 13:47 ` Peter Zijlstra 2011-11-17 14:12 ` Avi Kivity 0 siblings, 1 reply; 14+ messages in thread From: Peter Zijlstra @ 2011-11-17 13:47 UTC (permalink / raw) To: Avi Kivity Cc: Gleb Natapov, linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner On Thu, 2011-11-17 at 15:24 +0200, Avi Kivity wrote: > On 11/17/2011 03:10 PM, Peter Zijlstra wrote: > > On Thu, 2011-11-17 at 15:00 +0200, Gleb Natapov wrote: > > > > > > That said, I'd much rather throttle this particular jump label than > > > > remove it altogether, some people really don't like all this scheduler > > > > hot path crap. > > > What about moving perf_event_task_sched() to sched_(in|out)_preempt_notifiers? > > > preempt notifiers checking is already on the scheduler hot path, so no > > > additional overhead for perf case. > > > > Same problem really, some people complain about the overhead of preempt > > notifiers, also not all kernels have those in. > > We could combine the two, sort-circuit preempt notifiers with jump > labels if empty && not much activity on them. Jump-labels are still more efficient, also I don't much like preempt notifiers. > > Futhermore I loathe notifier lists because they obscure wtf is done. > > That's life in a general purpose kernel, if everyone gets their hook in > to keep their code clean, the scheduler will bloat. Uhm, no. The bloat isn't different, the only difference is you can actually see it. So I very much prefer direct hooks. > An advantage of preempt notifiers is that you can make the perf code > modular. Yeah, and you know I loathe modules even more. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC] remove jump_label optimization for perf sched events 2011-11-17 13:47 ` Peter Zijlstra @ 2011-11-17 14:12 ` Avi Kivity 0 siblings, 0 replies; 14+ messages in thread From: Avi Kivity @ 2011-11-17 14:12 UTC (permalink / raw) To: Peter Zijlstra Cc: Gleb Natapov, linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner On 11/17/2011 03:47 PM, Peter Zijlstra wrote: > On Thu, 2011-11-17 at 15:24 +0200, Avi Kivity wrote: > > On 11/17/2011 03:10 PM, Peter Zijlstra wrote: > > > On Thu, 2011-11-17 at 15:00 +0200, Gleb Natapov wrote: > > > > > > > > That said, I'd much rather throttle this particular jump label than > > > > > remove it altogether, some people really don't like all this scheduler > > > > > hot path crap. > > > > What about moving perf_event_task_sched() to sched_(in|out)_preempt_notifiers? > > > > preempt notifiers checking is already on the scheduler hot path, so no > > > > additional overhead for perf case. > > > > > > Same problem really, some people complain about the overhead of preempt > > > notifiers, also not all kernels have those in. > > > > We could combine the two, sort-circuit preempt notifiers with jump > > labels if empty && not much activity on them. > > Jump-labels are still more efficient, also I don't much like preempt > notifiers. > > > > Futhermore I loathe notifier lists because they obscure wtf is done. > > > > That's life in a general purpose kernel, if everyone gets their hook in > > to keep their code clean, the scheduler will bloat. > > Uhm, no. The bloat isn't different, the only difference is you can > actually see it. So I very much prefer direct hooks. > > > An advantage of preempt notifiers is that you can make the perf code > > modular. > > Yeah, and you know I loathe modules even more. Is there something you like? -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC] remove jump_label optimization for perf sched events 2011-11-17 12:49 ` Peter Zijlstra 2011-11-17 13:00 ` Gleb Natapov @ 2011-11-17 13:29 ` Borislav Petkov 2011-11-17 13:47 ` Gleb Natapov 2011-11-21 13:17 ` Gleb Natapov 2 siblings, 1 reply; 14+ messages in thread From: Borislav Petkov @ 2011-11-17 13:29 UTC (permalink / raw) To: Peter Zijlstra Cc: Gleb Natapov, linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner, H. Peter Anvin On Thu, Nov 17, 2011 at 01:49:19PM +0100, Peter Zijlstra wrote: > Ideally we'd fix text_poke to not use stop_machine() we know how to, > but we haven't had the green light from Intel/AMD yet. We talked about it briefly with hpa and tglx at KS. AFAICR, there's a patchset in the works... -- Regards/Gruss, Boris ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC] remove jump_label optimization for perf sched events 2011-11-17 13:29 ` Borislav Petkov @ 2011-11-17 13:47 ` Gleb Natapov 0 siblings, 0 replies; 14+ messages in thread From: Gleb Natapov @ 2011-11-17 13:47 UTC (permalink / raw) To: Borislav Petkov, Peter Zijlstra, linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner, H. Peter Anvin On Thu, Nov 17, 2011 at 02:29:39PM +0100, Borislav Petkov wrote: > On Thu, Nov 17, 2011 at 01:49:19PM +0100, Peter Zijlstra wrote: > > Ideally we'd fix text_poke to not use stop_machine() we know how to, > > but we haven't had the green light from Intel/AMD yet. > > We talked about it briefly with hpa and tglx at KS. AFAICR, there's a > patchset in the works... > I am eager to test it. If someone has something testable send it my way :) -- Gleb. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC] remove jump_label optimization for perf sched events 2011-11-17 12:49 ` Peter Zijlstra 2011-11-17 13:00 ` Gleb Natapov 2011-11-17 13:29 ` Borislav Petkov @ 2011-11-21 13:17 ` Gleb Natapov 2011-11-24 13:23 ` Peter Zijlstra 2 siblings, 1 reply; 14+ messages in thread From: Gleb Natapov @ 2011-11-21 13:17 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner On Thu, Nov 17, 2011 at 01:49:19PM +0100, Peter Zijlstra wrote: > On Thu, 2011-11-17 at 14:30 +0200, Gleb Natapov wrote: > > jump_lable patching is very expensive operation that involves pausing all > > cpus. The patching of perf_sched_events jump_label is easily controllable > > from userspace by unprivileged user. When user runs loop like this > > "while true; do perf stat -e cycles true; done" the performance of my > > test application that just increments a counter for one second drops by > > 4%. This is on a 16 cpu box with my test application using only one of > > them. An impact on a real server doing real work will be much worse. > > Performance of KVM PMU drops nearly 50% due to jump_lable for "perf > > record" since KVM PMU implementation creates and destroys perf event > > frequently. > > Ideally we'd fix text_poke to not use stop_machine() we know how to, but > we haven't had the green light from Intel/AMD yet. > > Rostedt was going to implement it anyway and see if anything breaks. > > Also, virt might be able to pull something smart on text_poke() dunno. > > That said, I'd much rather throttle this particular jump label than > remove it altogether, some people really don't like all this scheduler > hot path crap. > So how about throttling it, like the patch below does, until stop_machine() is no longer needed for patching (and it is possible that the new way of patching will still have significant overhead).
Signed-off-by: Gleb Natapov <gleb@redhat.com> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h index 66f23dc..a4687f6 100644 --- a/include/linux/jump_label.h +++ b/include/linux/jump_label.h @@ -3,12 +3,15 @@ #include <linux/types.h> #include <linux/compiler.h> +#include <linux/workqueue.h> #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL) struct jump_label_key { atomic_t enabled; struct jump_entry *entries; + unsigned long rl_timeout; + struct delayed_work rl_work; #ifdef CONFIG_MODULES struct jump_label_mod *next; #endif @@ -28,9 +31,18 @@ struct module; #ifdef HAVE_JUMP_LABEL #ifdef CONFIG_MODULES -#define JUMP_LABEL_INIT {{ 0 }, NULL, NULL} +#define JUMP_LABEL_INIT { \ + .enabled = { 0 }, \ + .entries = NULL, \ + .rl_timeout = 0, \ + .next = NULL, \ + } #else -#define JUMP_LABEL_INIT {{ 0 }, NULL} +#define JUMP_LABEL_INIT { \ + .enabled = { 0 }, \ + .entries = NULL, \ + .rl_timeout = 0, \ + } #endif static __always_inline bool static_branch(struct jump_label_key *key) @@ -51,6 +63,7 @@ extern void jump_label_inc(struct jump_label_key *key); extern void jump_label_dec(struct jump_label_key *key); extern bool jump_label_enabled(struct jump_label_key *key); extern void jump_label_apply_nops(struct module *mod); +extern void jump_label_rate_limit(struct jump_label_key *key, unsigned long rl); #else @@ -97,6 +110,10 @@ static inline int jump_label_apply_nops(struct module *mod) return 0; } +static inline void jump_label_rate_limit(struct jump_label_key *key, + unsigned long rl) +{ +} #endif #endif diff --git a/kernel/events/core.c b/kernel/events/core.c index bdcd413..50bb631 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7071,6 +7091,9 @@ void __init perf_event_init(void) ret = init_hw_breakpoint(); WARN(ret, "hw_breakpoint initialization failed with: %d", ret); + + /* do not patch jump label more than once per second */ + jump_label_rate_limit(&perf_sched_events, HZ); } static int __init perf_event_sysfs_init(void) 
diff --git a/kernel/jump_label.c b/kernel/jump_label.c index e6f1f24..88a82d2 100644 --- a/kernel/jump_label.c +++ b/kernel/jump_label.c @@ -72,15 +72,39 @@ void jump_label_inc(struct jump_label_key *key) jump_label_unlock(); } -void jump_label_dec(struct jump_label_key *key) +static void __jump_label_dec(struct jump_label_key *key, + unsigned long rate_limit) { if (!atomic_dec_and_mutex_lock(&key->enabled, &jump_label_mutex)) return; - jump_label_update(key, JUMP_LABEL_DISABLE); + if (rate_limit) { + atomic_inc(&key->enabled); + schedule_delayed_work(&key->rl_work, rate_limit); + } else + jump_label_update(key, JUMP_LABEL_DISABLE); + jump_label_unlock(); } +static void jump_label_update_timeout(struct work_struct *work) +{ + struct jump_label_key *key = + container_of(work, struct jump_label_key, rl_work.work); + __jump_label_dec(key, 0); +} + +void jump_label_dec(struct jump_label_key *key) +{ + __jump_label_dec(key, key->rl_timeout); +} + +void jump_label_rate_limit(struct jump_label_key *key, unsigned long rl) +{ + key->rl_timeout = rl; + INIT_DELAYED_WORK(&key->rl_work, jump_label_update_timeout); +} + static int addr_conflict(struct jump_entry *entry, void *start, void *end) { if (entry->code <= (unsigned long)end && -- Gleb. ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH RFC] remove jump_label optimization for perf sched events 2011-11-21 13:17 ` Gleb Natapov @ 2011-11-24 13:23 ` Peter Zijlstra 2011-11-24 13:45 ` Gleb Natapov 0 siblings, 1 reply; 14+ messages in thread From: Peter Zijlstra @ 2011-11-24 13:23 UTC (permalink / raw) To: Gleb Natapov; +Cc: linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner On Mon, 2011-11-21 at 15:17 +0200, Gleb Natapov wrote: > So how about throttling it like the patch below does until stop_machine() > no longer needed for patching (and it is possible that new way of patching > will still have significant overhead). > > Signed-off-by: Gleb Natapov <gleb@redhat.com> > diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h > index 66f23dc..a4687f6 100644 > --- a/include/linux/jump_label.h > +++ b/include/linux/jump_label.h > @@ -3,12 +3,15 @@ > > #include <linux/types.h> > #include <linux/compiler.h> > +#include <linux/workqueue.h> > > #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL) > > struct jump_label_key { > atomic_t enabled; > struct jump_entry *entries; > + unsigned long rl_timeout; > + struct delayed_work rl_work; > #ifdef CONFIG_MODULES > struct jump_label_mod *next; I'm not sure it's worth doing in generic code like this; it bloats the jump_label_key structure quite significantly (and there are tons of those around) for only one real user. If we want to do this in generic code, I think it's best to introduce something like: struct jump_label_key_deferred { struct jump_label_key key; unsigned long timeout; struct delayed_work work; } But is there really any other user for this? All the trace bits are root only iirc, and kvm itself only sets them in the guest kernel, I think, for paravirt, so that's not a problem. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC] remove jump_label optimization for perf sched events 2011-11-24 13:23 ` Peter Zijlstra @ 2011-11-24 13:45 ` Gleb Natapov 2011-11-24 14:18 ` Peter Zijlstra 0 siblings, 1 reply; 14+ messages in thread From: Gleb Natapov @ 2011-11-24 13:45 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner On Thu, Nov 24, 2011 at 02:23:00PM +0100, Peter Zijlstra wrote: > On Mon, 2011-11-21 at 15:17 +0200, Gleb Natapov wrote: > > So how about throttling it like the patch below does until stop_machine() > > no longer needed for patching (and it is possible that new way of patching > > will still have significant overhead). > > > > Signed-off-by: Gleb Natapov <gleb@redhat.com> > > diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h > > index 66f23dc..a4687f6 100644 > > --- a/include/linux/jump_label.h > > +++ b/include/linux/jump_label.h > > @@ -3,12 +3,15 @@ > > > > #include <linux/types.h> > > #include <linux/compiler.h> > > +#include <linux/workqueue.h> > > > > #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL) > > > > struct jump_label_key { > > atomic_t enabled; > > struct jump_entry *entries; > > + unsigned long rl_timeout; > > + struct delayed_work rl_work; > > #ifdef CONFIG_MODULES > > struct jump_label_mod *next; > > > I'm not sure its worth it doing in generic code like this, it bloats the > jump_label_key structure quite significantly (and there's tons of those > around) for only 1 real user. To do it in the perf code we would have to replicate the "enabled" accounting of jump_label there. The code will be much uglier that way IMO. On the other hand the feature fits nicely into the generic jump_label code and, apart from the jump_label_key structure size overhead, has no other downsides. And the size overhead will be eliminated by your suggestion below.
> > If we want to do this in generic code, I think its best to introduce > something like: > > struct jump_label_key_deferred { > struct jump_label_key key; > unsigned long timeout; > struct delayed_work work; > } > Yes, I contemplated this. I didn't realize that there are tons of jump_labels though. > But is there really any other user for this? All the trace bits are root > only iirc and kvm itself only sets them on the guest kernel I think for > paravirt, so that's not a problem. > The problem I am trying to fix with this patch is not strictly virtualization related. As I showed in my previous email on the subject, you can trigger this jump label patching very often without virtualization at all. Of course with virtualization the problem is much more serious, since the patching goes on in both the host kernel and the guest kernel. -- Gleb. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC] remove jump_label optimization for perf sched events 2011-11-24 13:45 ` Gleb Natapov @ 2011-11-24 14:18 ` Peter Zijlstra 2011-11-24 17:43 ` Gleb Natapov 0 siblings, 1 reply; 14+ messages in thread From: Peter Zijlstra @ 2011-11-24 14:18 UTC (permalink / raw) To: Gleb Natapov; +Cc: linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner On Thu, 2011-11-24 at 15:45 +0200, Gleb Natapov wrote: > Yes, I contemplated this. I didn't realized that there are tons of > jump_labels though. Yeah, every tracepoint has one; that's by far the biggest user currently. > > But is there really any other user for this? All the trace bits are root > > only iirc and kvm itself only sets them on the guest kernel I think for > > paravirt, so that's not a problem. > > > The problem I am trying to fix with this patch is not strictly > virtualization related. No, I know; it's about user-triggerable jump_label conversions, but afaik that's only perf. The others, tracepoints and paravirt, can't be toggled by unpriv. users. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC] remove jump_label optimization for perf sched events 2011-11-24 14:18 ` Peter Zijlstra @ 2011-11-24 17:43 ` Gleb Natapov 0 siblings, 0 replies; 14+ messages in thread From: Gleb Natapov @ 2011-11-24 17:43 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, mingo, Jason Baron, rostedt, Thomas Gleixner On Thu, Nov 24, 2011 at 03:18:04PM +0100, Peter Zijlstra wrote: > > > But is there really any other user for this? All the trace bits are root > > > only iirc and kvm itself only sets them on the guest kernel I think for > > > paravirt, so that's not a problem. > > > > > The problem I am trying to fix with this patch is not strictly > > virtualization related. > > No I know, its about user-trigerably jump_label conversions, but afaik > that's only perf. The others: tracepoints, and paravirt can't be toggled > by unpriv. users. Currently it may be only perf; I haven't checked. But as jump_label gains more users the problem may become more widespread. Now that I know how jump_label works I will look into changing some KVM ifs into it. If some of them can be triggered by a guest action (for instance enabling debug, or entering real mode) they will have to be throttled too. But this is just speculation at this point. What is important is fixing the one we have now. I looked into doing it outside of the jump_label code, but then I saw that I would need to duplicate most of the jump_label_inc/jump_label_dec code in perf, so I went for the more general solution. -- Gleb. ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2011-11-24 17:43 UTC | newest] Thread overview: 14+ messages -- 2011-11-17 12:30 [PATCH RFC] remove jump_label optimization for perf sched events Gleb Natapov 2011-11-17 12:49 ` Peter Zijlstra 2011-11-17 13:00 ` Gleb Natapov 2011-11-17 13:10 ` Peter Zijlstra 2011-11-17 13:24 ` Avi Kivity 2011-11-17 13:47 ` Peter Zijlstra 2011-11-17 14:12 ` Avi Kivity 2011-11-17 13:29 ` Borislav Petkov 2011-11-17 13:47 ` Gleb Natapov 2011-11-21 13:17 ` Gleb Natapov 2011-11-24 13:23 ` Peter Zijlstra 2011-11-24 13:45 ` Gleb Natapov 2011-11-24 14:18 ` Peter Zijlstra 2011-11-24 17:43 ` Gleb Natapov