* Re: [RFC] Full syscall argument decode in "perf trace"
2013-09-26 7:41 ` Denys Vlasenko
@ 2013-09-30 11:33 ` Denys Vlasenko
0 siblings, 0 replies; 9+ messages in thread
From: Denys Vlasenko @ 2013-09-30 11:33 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Denys Vlasenko, Tom Zanussi, Steven Rostedt, Ingo Molnar,
Jiri Olsa, Masami Hiramatsu, Oleg Nesterov,
Linux Kernel Mailing List, Jiri Moskovcak
[-- Attachment #1: Type: text/plain, Size: 3388 bytes --]
On Thu, Sep 26, 2013 at 9:41 AM, Denys Vlasenko
<vda.linux@googlemail.com> wrote:
> On Wed, Sep 18, 2013 at 4:33 PM, Arnaldo Carvalho de Melo
> <acme@redhat.com> wrote:
>>> The problem: ~100 more tracepoints need to be added merely to get
>>> to the point where strace already is, wrt quality of syscall decoding.
>>> strace has nearly 300 separate custom syscall formatting functions,
>>> some of them quite complex.
>>>
>>> If we need to add syscall stopping feature (which, as I said above,
>>> will be necessary anyway IMO), then syscall decoding can be as good
>>> as strace *already*. Then, gradually more tracepoints are added
>>> to make it faster.
>>>
>>> I am thinking about going into this direction.
>>>
>>> Therefore my question should be restated as:
>>>
>>> Would perf developers accept the "syscall pausing" feature,
>>> or it won't be accepted?
>>
>> Do you have some patch for us to try?
>
> I have a patch which is a bit strace-specific: it sidesteps
> the question of synchronization between the traced process
> and its tracer by using ptrace's existing method of reporting stops.
>
> This works for strace, and is very easy to implement.
> Naturally, other tracers (e.g. "perf trace") wouldn't
> want to start using ptrace! Synchronization needs
> to be done in some other way, not as a ptrace stop.
>
> For one, the stopping flag needs to be a counter, so that
> more than one tracer can use this feature concurrently.
>
> But anyway, I am attaching it.
>
> It adds a new flag, attr.sysexit_stop, which makes the process stop
> at the next syscall exit when this tracepoint overflows.
Here is the next iteration of the work in progress.
I added syscall masks.
This necessitated propagating a pointer to the userspace
struct pt_regs from the sys_{enter,exit} tracepoints down to the
overflow handling functions, in order to get the syscall number.
(Yes, I discovered that the pt_regs already passed there wasn't
the *userspace* one.)
The patch is tested: I have a modified version of strace
which decodes all syscalls properly and which no longer stops
on any syscall entry, nor on a selected few syscall exits.
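
For illustration, a tracer would set up the new attr fields roughly
like this (a rough userspace sketch against the attached patch, with
error handling and the tracepoint-id lookup omitted; remember that
with the current hack the tracee must also be ptrace-attached):

#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* A set bit means "ignore this syscall", per the attr comment in the patch. */
#define NLONGS 8	/* 8 longs = 512 syscall numbers on 64-bit */
static unsigned long exit_ignore[NLONGS];

static void ignore_exit(int scno)
{
	exit_ignore[scno / (8 * sizeof(long))] |= 1UL << (scno % (8 * sizeof(long)));
}

static int open_sysexit_stop(pid_t pid, __u64 sys_exit_tp_id)
{
	struct perf_event_attr attr;

	ignore_exit(SYS_read);	/* don't stop on read() exits... */
	ignore_exit(SYS_write);	/* ...nor on write() exits */

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_TRACEPOINT;
	attr.config = sys_exit_tp_id;	/* id of raw_syscalls:sys_exit */
	attr.sample_period = 1;
	attr.sysexit_stop = 1;		/* new bit added by this patch */
	attr.sysexit_mask_len = sizeof(exit_ignore);	/* multiple of sizeof(long) */
	attr.sysexit_mask_ptr = exit_ignore;

	return syscall(__NR_perf_event_open, &attr, pid, -1, -1, 0);
}
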
As I see it, the next thing to tackle is the stopping method.
(The current patch still uses my old ptrace-specific hack).
How about the following: add a per-task "pause counter".
If it is <= 0, the task is not paused; if it is > 0, the task is paused.
When an attached perf fd causes the task to pause, the counter
is incremented, a marker is written into the perf buffer,
and the task goes to sleep.
When the tracer process sees the marker, it commands the traced
process to "unpause", which decrements the counter.
Why this way?
* this allows the traced process to be paused by several tracers
at once.
* this does not need heavy-weight notifications to be sent
to tracers (unlike my current hack, which invokes the
waitpid notification machinery, the source of much of strace's
slowness).
* it might work even if the counter increment is reordered
relative to the perf marker write. If the tracer sees the marker,
it can "unpause" - decrement the counter and make it go to -1.
The task is not paused (the rule is "<= 0", not "= 0").
Then the kernel increments the counter, it's 0 now,
and the task is still not paused. (I'm not sure whether
this property is useful, but if it is, we have it - good :)
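
In code, the semantics I have in mind would be roughly this - a minimal
sketch, nothing below is in the attached patch, and the field and helper
names are invented:

/* new field in struct task_struct: */
	atomic_t	perf_pause_count;	/* > 0: paused, <= 0: not paused */

/* traced task, called from the tracepoint overflow path: */
static void perf_pause_self(void)
{
	atomic_inc(&current->perf_pause_count);
	while (atomic_read(&current->perf_pause_count) > 0) {
		set_current_state(TASK_INTERRUPTIBLE);
		/* re-check after setting the state to avoid losing a wakeup */
		if (atomic_read(&current->perf_pause_count) > 0)
			schedule();
		__set_current_state(TASK_RUNNING);
	}
}

/* tracer side, e.g. from a new ioctl on the perf fd: */
static void perf_unpause(struct task_struct *task)
{
	/* may drive the counter to -1: the later increment then only
	 * brings it back to 0 and the task does not pause at all */
	atomic_dec(&task->perf_pause_count);
	wake_up_process(task);
}
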
The downside is that we'd need one new field in the task struct.
Does this look sensible to you?
[-- Attachment #2: perf_trace_stop_RFC_v2.diff --]
[-- Type: application/octet-stream, Size: 25871 bytes --]
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/arch/alpha/kernel/perf_event.c linux-3.10.11-100.fc18.x86_64/arch/alpha/kernel/perf_event.c
--- linux-3.10.11-100.fc18.x86_64.ORG/arch/alpha/kernel/perf_event.c 2013-07-01 00:13:29.000000000 +0200
+++ linux-3.10.11-100.fc18.x86_64/arch/alpha/kernel/perf_event.c 2013-09-30 11:19:04.290849329 +0200
@@ -850,7 +850,7 @@ static void alpha_perf_event_irq_handler
perf_sample_data_init(&data, 0, hwc->last_period);
if (alpha_perf_event_set_period(event, hwc, idx)) {
- if (perf_event_overflow(event, &data, regs)) {
+ if (perf_event_overflow(event, &data, regs, NULL)) {
/* Interrupts coming too quickly; "throttle" the
* counter, i.e., disable it for a little while.
*/
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/arch/arm/kernel/perf_event_v6.c linux-3.10.11-100.fc18.x86_64/arch/arm/kernel/perf_event_v6.c
--- linux-3.10.11-100.fc18.x86_64.ORG/arch/arm/kernel/perf_event_v6.c 2013-07-01 00:13:29.000000000 +0200
+++ linux-3.10.11-100.fc18.x86_64/arch/arm/kernel/perf_event_v6.c 2013-09-30 11:19:04.291849332 +0200
@@ -514,7 +514,7 @@ armv6pmu_handle_irq(int irq_num,
if (!armpmu_event_set_period(event))
continue;
- if (perf_event_overflow(event, &data, regs))
+ if (perf_event_overflow(event, &data, regs, NULL))
cpu_pmu->disable(event);
}
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/arch/arm/kernel/perf_event_v7.c linux-3.10.11-100.fc18.x86_64/arch/arm/kernel/perf_event_v7.c
--- linux-3.10.11-100.fc18.x86_64.ORG/arch/arm/kernel/perf_event_v7.c 2013-07-01 00:13:29.000000000 +0200
+++ linux-3.10.11-100.fc18.x86_64/arch/arm/kernel/perf_event_v7.c 2013-09-30 11:19:04.293849338 +0200
@@ -1074,7 +1074,7 @@ static irqreturn_t armv7pmu_handle_irq(i
if (!armpmu_event_set_period(event))
continue;
- if (perf_event_overflow(event, &data, regs))
+ if (perf_event_overflow(event, &data, regs, NULL))
cpu_pmu->disable(event);
}
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/arch/arm/kernel/perf_event_xscale.c linux-3.10.11-100.fc18.x86_64/arch/arm/kernel/perf_event_xscale.c
--- linux-3.10.11-100.fc18.x86_64.ORG/arch/arm/kernel/perf_event_xscale.c 2013-07-01 00:13:29.000000000 +0200
+++ linux-3.10.11-100.fc18.x86_64/arch/arm/kernel/perf_event_xscale.c 2013-09-30 11:19:04.294849341 +0200
@@ -265,7 +265,7 @@ xscale1pmu_handle_irq(int irq_num, void
if (!armpmu_event_set_period(event))
continue;
- if (perf_event_overflow(event, &data, regs))
+ if (perf_event_overflow(event, &data, regs, NULL))
cpu_pmu->disable(event);
}
@@ -606,7 +606,7 @@ xscale2pmu_handle_irq(int irq_num, void
if (!armpmu_event_set_period(event))
continue;
- if (perf_event_overflow(event, &data, regs))
+ if (perf_event_overflow(event, &data, regs, NULL))
cpu_pmu->disable(event);
}
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/arch/arm64/kernel/perf_event.c linux-3.10.11-100.fc18.x86_64/arch/arm64/kernel/perf_event.c
--- linux-3.10.11-100.fc18.x86_64.ORG/arch/arm64/kernel/perf_event.c 2013-09-23 12:03:25.604253957 +0200
+++ linux-3.10.11-100.fc18.x86_64/arch/arm64/kernel/perf_event.c 2013-09-30 11:19:04.295849344 +0200
@@ -1063,7 +1063,7 @@ static irqreturn_t armv8pmu_handle_irq(i
if (!armpmu_event_set_period(event, hwc, idx))
continue;
- if (perf_event_overflow(event, &data, regs))
+ if (perf_event_overflow(event, &data, regs, NULL))
cpu_pmu->disable(hwc, idx);
}
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/arch/metag/kernel/perf/perf_event.c linux-3.10.11-100.fc18.x86_64/arch/metag/kernel/perf/perf_event.c
--- linux-3.10.11-100.fc18.x86_64.ORG/arch/metag/kernel/perf/perf_event.c 2013-07-01 00:13:29.000000000 +0200
+++ linux-3.10.11-100.fc18.x86_64/arch/metag/kernel/perf/perf_event.c 2013-09-30 11:19:04.296849347 +0200
@@ -789,7 +789,7 @@ static irqreturn_t metag_pmu_counter_ove
* completed. Note the counter value may have been modified while it was
* inactive to set it up ready for the next interrupt.
*/
- if (!perf_event_overflow(event, &sampledata, regs)) {
+ if (!perf_event_overflow(event, &sampledata, regs, NULL)) {
__global_lock2(flags);
counter = (counter & 0xff000000) |
(metag_in32(PERF_COUNT(idx)) & 0x00ffffff);
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/arch/mips/kernel/perf_event_mipsxx.c linux-3.10.11-100.fc18.x86_64/arch/mips/kernel/perf_event_mipsxx.c
--- linux-3.10.11-100.fc18.x86_64.ORG/arch/mips/kernel/perf_event_mipsxx.c 2013-07-01 00:13:29.000000000 +0200
+++ linux-3.10.11-100.fc18.x86_64/arch/mips/kernel/perf_event_mipsxx.c 2013-09-30 11:19:04.297849351 +0200
@@ -746,7 +746,7 @@ static void handle_associated_event(stru
if (!mipspmu_event_set_period(event, hwc, idx))
return;
- if (perf_event_overflow(event, data, regs))
+ if (perf_event_overflow(event, data, regs, NULL))
mipsxx_pmu_disable_event(idx);
}
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/arch/powerpc/perf/core-book3s.c linux-3.10.11-100.fc18.x86_64/arch/powerpc/perf/core-book3s.c
--- linux-3.10.11-100.fc18.x86_64.ORG/arch/powerpc/perf/core-book3s.c 2013-09-23 12:03:25.610253955 +0200
+++ linux-3.10.11-100.fc18.x86_64/arch/powerpc/perf/core-book3s.c 2013-09-30 11:19:04.297849351 +0200
@@ -1639,7 +1639,7 @@ static void record_and_restart(struct pe
data.br_stack = &cpuhw->bhrb_stack;
}
- if (perf_event_overflow(event, &data, regs))
+ if (perf_event_overflow(event, &data, regs, NULL))
power_pmu_stop(event, 0);
}
}
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/arch/powerpc/perf/core-fsl-emb.c linux-3.10.11-100.fc18.x86_64/arch/powerpc/perf/core-fsl-emb.c
--- linux-3.10.11-100.fc18.x86_64.ORG/arch/powerpc/perf/core-fsl-emb.c 2013-07-01 00:13:29.000000000 +0200
+++ linux-3.10.11-100.fc18.x86_64/arch/powerpc/perf/core-fsl-emb.c 2013-09-30 11:19:04.297849351 +0200
@@ -615,7 +615,7 @@ static void record_and_restart(struct pe
perf_sample_data_init(&data, 0, event->hw.last_period);
- if (perf_event_overflow(event, &data, regs))
+ if (perf_event_overflow(event, &data, regs, NULL))
fsl_emb_pmu_stop(event, 0);
}
}
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/arch/sparc/kernel/perf_event.c linux-3.10.11-100.fc18.x86_64/arch/sparc/kernel/perf_event.c
--- linux-3.10.11-100.fc18.x86_64.ORG/arch/sparc/kernel/perf_event.c 2013-07-01 00:13:29.000000000 +0200
+++ linux-3.10.11-100.fc18.x86_64/arch/sparc/kernel/perf_event.c 2013-09-30 11:19:04.297849351 +0200
@@ -1633,7 +1633,7 @@ static int __kprobes perf_event_nmi_hand
if (!sparc_perf_event_set_period(event, hwc, idx))
continue;
- if (perf_event_overflow(event, &data, regs))
+ if (perf_event_overflow(event, &data, regs, NULL))
sparc_pmu_stop(event, 0);
}
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/arch/x86/kernel/cpu/perf_event_amd_ibs.c linux-3.10.11-100.fc18.x86_64/arch/x86/kernel/cpu/perf_event_amd_ibs.c
--- linux-3.10.11-100.fc18.x86_64.ORG/arch/x86/kernel/cpu/perf_event_amd_ibs.c 2013-07-01 00:13:29.000000000 +0200
+++ linux-3.10.11-100.fc18.x86_64/arch/x86/kernel/cpu/perf_event_amd_ibs.c 2013-09-30 11:19:04.298849354 +0200
@@ -580,7 +580,7 @@ static int perf_ibs_handle_irq(struct pe
data.raw = &raw;
}
- throttle = perf_event_overflow(event, &data, &regs);
+ throttle = perf_event_overflow(event, &data, &regs, NULL);
out:
if (throttle)
perf_ibs_disable_event(perf_ibs, hwc, *config);
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/arch/x86/kernel/cpu/perf_event.c linux-3.10.11-100.fc18.x86_64/arch/x86/kernel/cpu/perf_event.c
--- linux-3.10.11-100.fc18.x86_64.ORG/arch/x86/kernel/cpu/perf_event.c 2013-07-01 00:13:29.000000000 +0200
+++ linux-3.10.11-100.fc18.x86_64/arch/x86/kernel/cpu/perf_event.c 2013-09-30 11:19:04.298849354 +0200
@@ -1225,7 +1225,7 @@ int x86_pmu_handle_irq(struct pt_regs *r
if (!x86_perf_event_set_period(event))
continue;
- if (perf_event_overflow(event, &data, regs))
+ if (perf_event_overflow(event, &data, regs, NULL))
x86_pmu_stop(event, 0);
}
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/arch/x86/kernel/cpu/perf_event_intel.c linux-3.10.11-100.fc18.x86_64/arch/x86/kernel/cpu/perf_event_intel.c
--- linux-3.10.11-100.fc18.x86_64.ORG/arch/x86/kernel/cpu/perf_event_intel.c 2013-07-01 00:13:29.000000000 +0200
+++ linux-3.10.11-100.fc18.x86_64/arch/x86/kernel/cpu/perf_event_intel.c 2013-09-30 11:19:04.298849354 +0200
@@ -1222,7 +1222,7 @@ again:
if (has_branch_stack(event))
data.br_stack = &cpuc->lbr_stack;
- if (perf_event_overflow(event, &data, regs))
+ if (perf_event_overflow(event, &data, regs, NULL))
x86_pmu_stop(event, 0);
}
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/arch/x86/kernel/cpu/perf_event_intel_ds.c linux-3.10.11-100.fc18.x86_64/arch/x86/kernel/cpu/perf_event_intel_ds.c
--- linux-3.10.11-100.fc18.x86_64.ORG/arch/x86/kernel/cpu/perf_event_intel_ds.c 2013-07-01 00:13:29.000000000 +0200
+++ linux-3.10.11-100.fc18.x86_64/arch/x86/kernel/cpu/perf_event_intel_ds.c 2013-09-30 11:19:04.298849354 +0200
@@ -761,7 +761,7 @@ static void __intel_pmu_pebs_event(struc
if (has_branch_stack(event))
data.br_stack = &cpuc->lbr_stack;
- if (perf_event_overflow(event, &data, &regs))
+ if (perf_event_overflow(event, &data, &regs, NULL))
x86_pmu_stop(event, 0);
}
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/arch/x86/kernel/cpu/perf_event_knc.c linux-3.10.11-100.fc18.x86_64/arch/x86/kernel/cpu/perf_event_knc.c
--- linux-3.10.11-100.fc18.x86_64.ORG/arch/x86/kernel/cpu/perf_event_knc.c 2013-07-01 00:13:29.000000000 +0200
+++ linux-3.10.11-100.fc18.x86_64/arch/x86/kernel/cpu/perf_event_knc.c 2013-09-30 11:19:04.299849357 +0200
@@ -251,7 +251,7 @@ again:
perf_sample_data_init(&data, 0, event->hw.last_period);
- if (perf_event_overflow(event, &data, regs))
+ if (perf_event_overflow(event, &data, regs, NULL))
x86_pmu_stop(event, 0);
}
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/arch/x86/kernel/cpu/perf_event_p4.c linux-3.10.11-100.fc18.x86_64/arch/x86/kernel/cpu/perf_event_p4.c
--- linux-3.10.11-100.fc18.x86_64.ORG/arch/x86/kernel/cpu/perf_event_p4.c 2013-07-01 00:13:29.000000000 +0200
+++ linux-3.10.11-100.fc18.x86_64/arch/x86/kernel/cpu/perf_event_p4.c 2013-09-30 11:19:04.299849357 +0200
@@ -1037,7 +1037,7 @@ static int p4_pmu_handle_irq(struct pt_r
continue;
- if (perf_event_overflow(event, &data, regs))
+ if (perf_event_overflow(event, &data, regs, NULL))
x86_pmu_stop(event, 0);
}
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/include/linux/ftrace_event.h linux-3.10.11-100.fc18.x86_64/include/linux/ftrace_event.h
--- linux-3.10.11-100.fc18.x86_64.ORG/include/linux/ftrace_event.h 2013-09-23 12:03:25.714253910 +0200
+++ linux-3.10.11-100.fc18.x86_64/include/linux/ftrace_event.h 2013-09-30 11:19:04.299849357 +0200
@@ -376,10 +376,10 @@ extern void *perf_trace_buf_prepare(int
static inline void
perf_trace_buf_submit(void *raw_data, int size, int rctx, u64 addr,
- u64 count, struct pt_regs *regs, void *head,
- struct task_struct *task)
+ u64 count, struct pt_regs *regs, struct pt_regs *user_regs,
+ void *head, struct task_struct *task)
{
- perf_tp_event(addr, count, raw_data, size, regs, head, rctx, task);
+ perf_tp_event(addr, count, raw_data, size, regs, user_regs, head, rctx, task);
}
#endif
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/include/linux/perf_event.h linux-3.10.11-100.fc18.x86_64/include/linux/perf_event.h
--- linux-3.10.11-100.fc18.x86_64.ORG/include/linux/perf_event.h 2013-07-01 00:13:29.000000000 +0200
+++ linux-3.10.11-100.fc18.x86_64/include/linux/perf_event.h 2013-09-30 11:19:04.299849357 +0200
@@ -602,7 +602,8 @@ extern void perf_prepare_sample(struct p
extern int perf_event_overflow(struct perf_event *event,
struct perf_sample_data *data,
- struct pt_regs *regs);
+ struct pt_regs *regs,
+ struct pt_regs *user_regs);
static inline bool is_sampling_event(struct perf_event *event)
{
@@ -717,7 +718,7 @@ static inline bool perf_paranoid_kernel(
extern void perf_event_init(void);
extern void perf_tp_event(u64 addr, u64 count, void *record,
- int entry_size, struct pt_regs *regs,
+ int entry_size, struct pt_regs *regs, struct pt_regs *user_regs,
struct hlist_head *head, int rctx,
struct task_struct *task);
extern void perf_bp_event(struct perf_event *event, void *data);
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/include/trace/events/syscalls.h linux-3.10.11-100.fc18.x86_64/include/trace/events/syscalls.h
--- linux-3.10.11-100.fc18.x86_64.ORG/include/trace/events/syscalls.h 2013-07-01 00:13:29.000000000 +0200
+++ linux-3.10.11-100.fc18.x86_64/include/trace/events/syscalls.h 2013-09-30 12:05:15.658006437 +0200
@@ -30,6 +30,7 @@ TRACE_EVENT_FN(sys_enter,
TP_fast_assign(
__entry->id = id;
syscall_get_arguments(current, regs, 0, 6, __entry->args);
+ user_regs = regs;
),
TP_printk("NR %ld (%lx, %lx, %lx, %lx, %lx, %lx)",
@@ -56,6 +57,7 @@ TRACE_EVENT_FN(sys_exit,
TP_fast_assign(
__entry->id = syscall_get_nr(current, regs);
__entry->ret = ret;
+ user_regs = regs;
),
TP_printk("NR %ld = %ld",
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/include/trace/ftrace.h linux-3.10.11-100.fc18.x86_64/include/trace/ftrace.h
--- linux-3.10.11-100.fc18.x86_64.ORG/include/trace/ftrace.h 2013-07-01 00:13:29.000000000 +0200
+++ linux-3.10.11-100.fc18.x86_64/include/trace/ftrace.h 2013-09-30 12:10:59.065011590 +0200
@@ -519,6 +519,8 @@ ftrace_raw_event_##call(void *__data, pr
struct ftrace_raw_##call *entry; \
struct ring_buffer *buffer; \
unsigned long irq_flags; \
+ /* dummy. "assign" macro param might need it to exist: */ \
+ struct pt_regs __maybe_unused *user_regs; \
int __data_size; \
int pc; \
\
@@ -652,6 +654,8 @@ perf_trace_##call(void *__data, proto)
struct ftrace_data_offsets_##call __maybe_unused __data_offsets;\
struct ftrace_raw_##call *entry; \
struct pt_regs __regs; \
+ /* "assign" macro parameter might overwrite it: */ \
+ struct pt_regs *user_regs = NULL; \
u64 __addr = 0, __count = 1; \
struct task_struct *__task = NULL; \
struct hlist_head *head; \
@@ -681,7 +685,7 @@ perf_trace_##call(void *__data, proto)
\
head = this_cpu_ptr(event_call->perf_events); \
perf_trace_buf_submit(entry, __entry_size, rctx, __addr, \
- __count, &__regs, head, __task); \
+ __count, &__regs, user_regs, head, __task); \
}
/*
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/include/uapi/linux/perf_event.h linux-3.10.11-100.fc18.x86_64/include/uapi/linux/perf_event.h
--- linux-3.10.11-100.fc18.x86_64.ORG/include/uapi/linux/perf_event.h 2013-07-01 00:13:29.000000000 +0200
+++ linux-3.10.11-100.fc18.x86_64/include/uapi/linux/perf_event.h 2013-09-30 11:19:04.300849360 +0200
@@ -273,7 +273,10 @@ struct perf_event_attr {
exclude_callchain_kernel : 1, /* exclude kernel callchains */
exclude_callchain_user : 1, /* exclude user callchains */
- __reserved_1 : 41;
+ sysenter_stop : 1,
+ sysexit_stop : 1,
+
+ __reserved_1 : 39;
union {
__u32 wakeup_events; /* wakeup every n events */
@@ -304,6 +307,15 @@ struct perf_event_attr {
/* Align to u64. */
__u32 __reserved_2;
+
+ /*
+ * If sys{enter,exit}_stop should ignore some syscalls,
+ * these bitmasks specify which to ignore. Otherwise set to 0/NULL.
+ */
+ unsigned sysenter_mask_len;
+ unsigned sysexit_mask_len;
+ unsigned long *sysenter_mask_ptr;
+ unsigned long *sysexit_mask_ptr;
};
#define perf_flags(attr) (*(&(attr)->read_format + 1))
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/kernel/events/core.c linux-3.10.11-100.fc18.x86_64/kernel/events/core.c
--- linux-3.10.11-100.fc18.x86_64.ORG/kernel/events/core.c 2013-09-23 12:03:25.719253908 +0200
+++ linux-3.10.11-100.fc18.x86_64/kernel/events/core.c 2013-09-30 12:11:21.929011933 +0200
@@ -43,6 +43,7 @@
#include "internal.h"
#include <asm/irq_regs.h>
+#include <asm/syscall.h>
struct remote_function_call {
struct task_struct *p;
@@ -2933,6 +2934,7 @@ static void free_event_rcu(struct rcu_he
if (event->ns)
put_pid_ns(event->ns);
perf_event_free_filter(event);
+ kfree(event->attr.sysenter_mask_ptr);
kfree(event);
}
@@ -4964,7 +4966,8 @@ static void perf_log_throttle(struct per
static int __perf_event_overflow(struct perf_event *event,
int throttle, struct perf_sample_data *data,
- struct pt_regs *regs)
+ struct pt_regs *regs,
+ struct pt_regs *user_regs)
{
int events = atomic_read(&event->event_limit);
struct hw_perf_event *hwc = &event->hw;
@@ -5026,14 +5029,35 @@ static int __perf_event_overflow(struct
irq_work_queue(&event->pending);
}
+ if (!in_interrupt() && event->attr.sysexit_stop && current->ptrace && user_regs) {
+ if (event->attr.sysexit_mask_len != 0) {
+ int bits;
+ int scno;
+
+ scno = syscall_get_nr(current, user_regs);
+ if (scno < 0)
+ goto stop;
+ bits = event->attr.sysexit_mask_len * 8;
+ if (scno >= bits)
+ goto stop;
+ if (!test_bit(scno, event->attr.sysexit_mask_ptr))
+ goto stop;
+ goto skip;
+ }
+ stop:
+ set_tsk_thread_flag(current, TIF_SYSCALL_TRACE);
+ skip: ;
+ }
+
return ret;
}
int perf_event_overflow(struct perf_event *event,
struct perf_sample_data *data,
- struct pt_regs *regs)
+ struct pt_regs *regs,
+ struct pt_regs *user_regs)
{
- return __perf_event_overflow(event, 1, data, regs);
+ return __perf_event_overflow(event, 1, data, regs, user_regs);
}
/*
@@ -5083,7 +5107,8 @@ again:
static void perf_swevent_overflow(struct perf_event *event, u64 overflow,
struct perf_sample_data *data,
- struct pt_regs *regs)
+ struct pt_regs *regs,
+ struct pt_regs *user_regs)
{
struct hw_perf_event *hwc = &event->hw;
int throttle = 0;
@@ -5096,7 +5121,7 @@ static void perf_swevent_overflow(struct
for (; overflow; overflow--) {
if (__perf_event_overflow(event, throttle,
- data, regs)) {
+ data, regs, user_regs)) {
/*
* We inhibit the overflow from happening when
* hwc->interrupts == MAX_INTERRUPTS.
@@ -5109,7 +5134,8 @@ static void perf_swevent_overflow(struct
static void perf_swevent_event(struct perf_event *event, u64 nr,
struct perf_sample_data *data,
- struct pt_regs *regs)
+ struct pt_regs *regs,
+ struct pt_regs *user_regs)
{
struct hw_perf_event *hwc = &event->hw;
@@ -5123,17 +5149,17 @@ static void perf_swevent_event(struct pe
if ((event->attr.sample_type & PERF_SAMPLE_PERIOD) && !event->attr.freq) {
data->period = nr;
- return perf_swevent_overflow(event, 1, data, regs);
+ return perf_swevent_overflow(event, 1, data, regs, user_regs);
} else
data->period = event->hw.last_period;
if (nr == 1 && hwc->sample_period == 1 && !event->attr.freq)
- return perf_swevent_overflow(event, 1, data, regs);
+ return perf_swevent_overflow(event, 1, data, regs, user_regs);
if (local64_add_negative(nr, &hwc->period_left))
return;
- perf_swevent_overflow(event, 0, data, regs);
+ perf_swevent_overflow(event, 0, data, regs, user_regs);
}
static int perf_exclude_event(struct perf_event *event,
@@ -5223,7 +5249,8 @@ find_swevent_head(struct swevent_htable
static void do_perf_sw_event(enum perf_type_id type, u32 event_id,
u64 nr,
struct perf_sample_data *data,
- struct pt_regs *regs)
+ struct pt_regs *regs,
+ struct pt_regs *user_regs)
{
struct swevent_htable *swhash = &__get_cpu_var(swevent_htable);
struct perf_event *event;
@@ -5236,7 +5263,7 @@ static void do_perf_sw_event(enum perf_t
hlist_for_each_entry_rcu(event, head, hlist_entry) {
if (perf_swevent_match(event, type, event_id, data, regs))
- perf_swevent_event(event, nr, data, regs);
+ perf_swevent_event(event, nr, data, regs, user_regs);
}
end:
rcu_read_unlock();
@@ -5269,7 +5296,7 @@ void __perf_sw_event(u32 event_id, u64 n
perf_sample_data_init(&data, addr, 0);
- do_perf_sw_event(PERF_TYPE_SOFTWARE, event_id, nr, &data, regs);
+ do_perf_sw_event(PERF_TYPE_SOFTWARE, event_id, nr, &data, regs, NULL);
perf_swevent_put_recursion_context(rctx);
preempt_enable_notrace();
@@ -5514,7 +5541,8 @@ static int perf_tp_event_match(struct pe
}
void perf_tp_event(u64 addr, u64 count, void *record, int entry_size,
- struct pt_regs *regs, struct hlist_head *head, int rctx,
+ struct pt_regs *regs, struct pt_regs *user_regs,
+ struct hlist_head *head, int rctx,
struct task_struct *task)
{
struct perf_sample_data data;
@@ -5530,7 +5558,7 @@ void perf_tp_event(u64 addr, u64 count,
hlist_for_each_entry_rcu(event, head, hlist_entry) {
if (perf_tp_event_match(event, &data, regs))
- perf_swevent_event(event, count, &data, regs);
+ perf_swevent_event(event, count, &data, regs, user_regs);
}
/*
@@ -5552,7 +5580,7 @@ void perf_tp_event(u64 addr, u64 count,
if (event->attr.config != entry->type)
continue;
if (perf_tp_event_match(event, &data, regs))
- perf_swevent_event(event, count, &data, regs);
+ perf_swevent_event(event, count, &data, regs, user_regs);
}
unlock:
rcu_read_unlock();
@@ -5656,7 +5684,7 @@ void perf_bp_event(struct perf_event *bp
perf_sample_data_init(&sample, bp->attr.bp_addr, 0);
if (!bp->hw.state && !perf_exclude_event(bp, regs))
- perf_swevent_event(bp, 1, &sample, regs);
+ perf_swevent_event(bp, 1, &sample, regs, NULL);
}
#endif
@@ -5684,7 +5712,7 @@ static enum hrtimer_restart perf_swevent
if (regs && !perf_exclude_event(event, regs)) {
if (!(event->attr.exclude_idle && is_idle_task(current)))
- if (__perf_event_overflow(event, 1, &data, regs))
+ if (__perf_event_overflow(event, 1, &data, regs, NULL))
ret = HRTIMER_NORESTART;
}
@@ -6469,6 +6497,32 @@ static int perf_copy_attr(struct perf_ev
ret = -EINVAL;
}
+ if ((attr->sysenter_mask_len | attr->sysexit_mask_len) & (sizeof(long)-1))
+ return -EINVAL;
+ size = attr->sysenter_mask_len + attr->sysexit_mask_len;
+ if (size > PAGE_SIZE)
+ return -EINVAL;
+ if (size != 0) {
+ unsigned long *kp = kzalloc(size, GFP_KERNEL);
+ if (!kp)
+ return -ENOMEM;
+
+ ret = copy_from_user(kp, (void __user *)attr->sysenter_mask_ptr, attr->sysenter_mask_len);
+ attr->sysenter_mask_ptr = kp;
+ if (!ret) {
+ kp = (void*)kp + attr->sysenter_mask_len;
+ ret = copy_from_user(kp, (void __user *)attr->sysexit_mask_ptr, attr->sysexit_mask_len);
+ attr->sysexit_mask_ptr = kp;
+ }
+ if (ret) {
+ kfree(attr->sysenter_mask_ptr);
+ goto out;
+ }
+ } else {
+ attr->sysenter_mask_ptr = NULL;
+ attr->sysexit_mask_ptr = NULL;
+ }
+
out:
return ret;
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/kernel/trace/trace_event_perf.c linux-3.10.11-100.fc18.x86_64/kernel/trace/trace_event_perf.c
--- linux-3.10.11-100.fc18.x86_64.ORG/kernel/trace/trace_event_perf.c 2013-07-01 00:13:29.000000000 +0200
+++ linux-3.10.11-100.fc18.x86_64/kernel/trace/trace_event_perf.c 2013-09-30 11:19:04.301849363 +0200
@@ -282,7 +282,7 @@ perf_ftrace_function_call(unsigned long
head = this_cpu_ptr(event_function.perf_events);
perf_trace_buf_submit(entry, ENTRY_SIZE, rctx, 0,
- 1, &regs, head, NULL);
+ 1, &regs, NULL, head, NULL);
#undef ENTRY_SIZE
}
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/kernel/trace/trace_kprobe.c linux-3.10.11-100.fc18.x86_64/kernel/trace/trace_kprobe.c
--- linux-3.10.11-100.fc18.x86_64.ORG/kernel/trace/trace_kprobe.c 2013-09-23 12:03:25.726253905 +0200
+++ linux-3.10.11-100.fc18.x86_64/kernel/trace/trace_kprobe.c 2013-09-30 11:19:04.301849363 +0200
@@ -1193,7 +1193,7 @@ kprobe_perf_func(struct trace_probe *tp,
head = this_cpu_ptr(call->perf_events);
perf_trace_buf_submit(entry, size, rctx,
- entry->ip, 1, regs, head, NULL);
+ entry->ip, 1, regs, NULL, head, NULL);
}
/* Kretprobe profile handler */
@@ -1225,7 +1225,7 @@ kretprobe_perf_func(struct trace_probe *
head = this_cpu_ptr(call->perf_events);
perf_trace_buf_submit(entry, size, rctx,
- entry->ret_ip, 1, regs, head, NULL);
+ entry->ret_ip, 1, regs, NULL, head, NULL);
}
#endif /* CONFIG_PERF_EVENTS */
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/kernel/trace/trace_syscalls.c linux-3.10.11-100.fc18.x86_64/kernel/trace/trace_syscalls.c
--- linux-3.10.11-100.fc18.x86_64.ORG/kernel/trace/trace_syscalls.c 2013-09-23 12:03:25.726253905 +0200
+++ linux-3.10.11-100.fc18.x86_64/kernel/trace/trace_syscalls.c 2013-09-30 11:19:04.301849363 +0200
@@ -585,7 +585,7 @@ static void perf_syscall_enter(void *ign
(unsigned long *)&rec->args);
head = this_cpu_ptr(sys_data->enter_event->perf_events);
- perf_trace_buf_submit(rec, size, rctx, 0, 1, regs, head, NULL);
+ perf_trace_buf_submit(rec, size, rctx, 0, 1, regs, regs, head, NULL);
}
static int perf_sysenter_enable(struct ftrace_event_call *call)
@@ -663,7 +663,7 @@ static void perf_syscall_exit(void *igno
rec->ret = syscall_get_return_value(current, regs);
head = this_cpu_ptr(sys_data->exit_event->perf_events);
- perf_trace_buf_submit(rec, size, rctx, 0, 1, regs, head, NULL);
+ perf_trace_buf_submit(rec, size, rctx, 0, 1, regs, regs, head, NULL);
}
static int perf_sysexit_enable(struct ftrace_event_call *call)
diff -urp linux-3.10.11-100.fc18.x86_64.ORG/kernel/trace/trace_uprobe.c linux-3.10.11-100.fc18.x86_64/kernel/trace/trace_uprobe.c
--- linux-3.10.11-100.fc18.x86_64.ORG/kernel/trace/trace_uprobe.c 2013-09-23 12:03:25.727253904 +0200
+++ linux-3.10.11-100.fc18.x86_64/kernel/trace/trace_uprobe.c 2013-09-30 11:19:04.301849363 +0200
@@ -862,7 +862,7 @@ static void uprobe_perf_print(struct tra
for (i = 0; i < tu->nr_args; i++)
call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
- perf_trace_buf_submit(entry, size, rctx, 0, 1, regs, head, NULL);
+ perf_trace_buf_submit(entry, size, rctx, 0, 1, regs, NULL, head, NULL);
out:
preempt_enable();
}