* [PATCH 1/1] perf/core: find auxiliary events in running pmus list
@ 2016-02-24 21:20 kan.liang
2016-02-25 8:17 ` Peter Zijlstra
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: kan.liang @ 2016-02-24 21:20 UTC (permalink / raw)
To: peterz, linux-kernel
Cc: ak, eranian, vincent.weaver, tglx, mingo, acme, jolsa, ying.huang,
Kan Liang
From: Kan Liang <kan.liang@intel.com>
The perf_event_aux function goes through the pmus list to find the proper
auxiliary events to output. The pmus list consists of all possible pmus
in the system, which may or may not be running at the moment, while the
auxiliary events must come from running pmus. Searching non-running pmus
is therefore unnecessary and expensive, especially when there are many
non-running pmus on the list.
For example, the brk test case in lkp triggers many mmap operations;
at the same time, perf with cycles:pp is also running on the system. As a
result, perf_event_aux is invoked many times, and each invocation searches
the whole pmus list. If we enable uncore support (even when uncore events
are not actually used), dozens of uncore pmus are added to the pmus list,
which can significantly decrease brk_test's ops_per_sec. In our test, the
ops_per_sec without the uncore patch is 2647573, while the ops_per_sec
with the uncore patch is only 1768444, a 33.2% reduction.
This patch introduces a running_pmus list which tracks only the running
pmus in the system. perf_event_aux uses the running_pmus list instead of
the pmus list to find auxiliary events.
Reported-by: Huang, Ying <ying.huang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
---
kernel/events/core.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 56 insertions(+), 1 deletion(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 94c47e3..e33a0de 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -335,6 +335,14 @@ static LIST_HEAD(pmus);
static DEFINE_MUTEX(pmus_lock);
static struct srcu_struct pmus_srcu;
+struct running_pmu {
+ struct list_head entry;
+ struct pmu *pmu;
+ int nr_event;
+};
+static LIST_HEAD(running_pmus);
+static DEFINE_MUTEX(running_pmus_lock);
+
/*
* perf event paranoia level:
* -1 - not paranoid at all
@@ -3511,6 +3519,23 @@ static void unaccount_event_cpu(struct perf_event *event, int cpu)
atomic_dec(&per_cpu(perf_cgroup_events, cpu));
}
+static void unaccount_running_pmu(struct perf_event *event)
+{
+ struct running_pmu *pmu, *tmp;
+
+ mutex_lock(&running_pmus_lock);
+
+ list_for_each_entry_safe(pmu, tmp, &running_pmus, entry) {
+ if ((pmu->pmu == event->pmu) && !(--pmu->nr_event)) {
+ list_del_rcu(&pmu->entry);
+ kfree(pmu);
+ break;
+ }
+ }
+
+ mutex_unlock(&running_pmus_lock);
+}
+
static void unaccount_event(struct perf_event *event)
{
bool dec = false;
@@ -3541,6 +3566,8 @@ static void unaccount_event(struct perf_event *event)
static_key_slow_dec_deferred(&perf_sched_events);
unaccount_event_cpu(event, event->cpu);
+
+ unaccount_running_pmu(event);
}
/*
@@ -5616,6 +5643,7 @@ perf_event_aux(perf_event_aux_output_cb output, void *data,
{
struct perf_cpu_context *cpuctx;
struct perf_event_context *ctx;
+ struct running_pmu *running_pmu;
struct pmu *pmu;
int ctxn;
@@ -5631,7 +5659,9 @@ perf_event_aux(perf_event_aux_output_cb output, void *data,
}
rcu_read_lock();
- list_for_each_entry_rcu(pmu, &pmus, entry) {
+
+ list_for_each_entry_rcu(running_pmu, &running_pmus, entry) {
+ pmu = running_pmu->pmu;
cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
if (cpuctx->unique_pmu != pmu)
goto next;
@@ -7740,6 +7770,29 @@ static void account_event_cpu(struct perf_event *event, int cpu)
atomic_inc(&per_cpu(perf_cgroup_events, cpu));
}
+static void account_running_pmu(struct perf_event *event)
+{
+ struct running_pmu *pmu;
+
+ mutex_lock(&running_pmus_lock);
+
+ list_for_each_entry(pmu, &running_pmus, entry) {
+ if (pmu->pmu == event->pmu) {
+ pmu->nr_event++;
+ goto out;
+ }
+ }
+
+ pmu = kzalloc(sizeof(struct running_pmu), GFP_KERNEL);
+ if (pmu != NULL) {
+ pmu->nr_event++;
+ pmu->pmu = event->pmu;
+ list_add_rcu(&pmu->entry, &running_pmus);
+ }
+out:
+ mutex_unlock(&running_pmus_lock);
+}
+
static void account_event(struct perf_event *event)
{
bool inc = false;
@@ -7772,6 +7825,8 @@ static void account_event(struct perf_event *event)
static_key_slow_inc(&perf_sched_events.key);
account_event_cpu(event, event->cpu);
+
+ account_running_pmu(event);
}
/*
--
2.5.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/1] perf/core: find auxiliary events in running pmus list
2016-02-24 21:20 [PATCH 1/1] perf/core: find auxiliary events in running pmus list kan.liang
@ 2016-02-25 8:17 ` Peter Zijlstra
2016-02-25 13:29 ` Alexander Shishkin
2016-02-26 10:36 ` Peter Zijlstra
2 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2016-02-25 8:17 UTC (permalink / raw)
To: kan.liang
Cc: linux-kernel, ak, eranian, vincent.weaver, tglx, mingo, acme,
jolsa, ying.huang, Alexander Shishkin
You forgot to Cc Alexander, who wrote most of the AUX bits.
On Wed, Feb 24, 2016 at 01:20:36PM -0800, kan.liang@intel.com wrote:
> From: Kan Liang <kan.liang@intel.com>
>
> perf_event_aux funciton goes through pmus list to find proper auxiliary
> events to output. The pmus list consists of all possible pmus in the
> system, that may or may not be running at the moment, while the
> auxiliary events must be from the running pmus. Therefore searching
> non-running pmus is unnecessary and expensive especially when there are
> many non-running pmus on the list.
>
> For example, the brk test case in lkp triggers many mmap operations,
> at the time, perf with cycles:pp is also running on the system. As a
> result, many perf_event_aux are invoked, and each would search the whole
> pmus list. If we enable the uncore support (even when uncore event are
> not really used), dozens of uncore pmus will be added into pmus list,
> which can significantly decrease brk_test's ops_per_sec. Based on our
> test, the ops_per_sec without uncore patch is 2647573, while the
> ops_per_sec with uncore patch is only 1768444, which is a 33.2%
> reduction.
>
> This patch introduces a running_pmus list which only tracks the running
> pmus in the system. The perf_event_aux uses running_pmus list instead of
> pmus list to find auxiliary events.
>
> Reported-by: Huang, Ying <ying.huang@linux.intel.com>
> Signed-off-by: Kan Liang <kan.liang@intel.com>
> ---
> kernel/events/core.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 56 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 94c47e3..e33a0de 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -335,6 +335,14 @@ static LIST_HEAD(pmus);
> static DEFINE_MUTEX(pmus_lock);
> static struct srcu_struct pmus_srcu;
>
> +struct running_pmu {
> + struct list_head entry;
> + struct pmu *pmu;
> + int nr_event;
> +};
> +static LIST_HEAD(running_pmus);
> +static DEFINE_MUTEX(running_pmus_lock);
> +
> /*
> * perf event paranoia level:
> * -1 - not paranoid at all
> @@ -3511,6 +3519,23 @@ static void unaccount_event_cpu(struct perf_event *event, int cpu)
> atomic_dec(&per_cpu(perf_cgroup_events, cpu));
> }
>
> +static void unaccount_running_pmu(struct perf_event *event)
> +{
> + struct running_pmu *pmu, *tmp;
> +
> + mutex_lock(&running_pmus_lock);
> +
> + list_for_each_entry_safe(pmu, tmp, &running_pmus, entry) {
> + if ((pmu->pmu == event->pmu) && !(--pmu->nr_event)) {
> + list_del_rcu(&pmu->entry);
> + kfree(pmu);
> + break;
> + }
> + }
> +
> + mutex_unlock(&running_pmus_lock);
> +}
> +
> static void unaccount_event(struct perf_event *event)
> {
> bool dec = false;
> @@ -3541,6 +3566,8 @@ static void unaccount_event(struct perf_event *event)
> static_key_slow_dec_deferred(&perf_sched_events);
>
> unaccount_event_cpu(event, event->cpu);
> +
> + unaccount_running_pmu(event);
> }
>
> /*
> @@ -5616,6 +5643,7 @@ perf_event_aux(perf_event_aux_output_cb output, void *data,
> {
> struct perf_cpu_context *cpuctx;
> struct perf_event_context *ctx;
> + struct running_pmu *running_pmu;
> struct pmu *pmu;
> int ctxn;
>
> @@ -5631,7 +5659,9 @@ perf_event_aux(perf_event_aux_output_cb output, void *data,
> }
>
> rcu_read_lock();
> - list_for_each_entry_rcu(pmu, &pmus, entry) {
> +
> + list_for_each_entry_rcu(running_pmu, &running_pmus, entry) {
> + pmu = running_pmu->pmu;
> cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
> if (cpuctx->unique_pmu != pmu)
> goto next;
> @@ -7740,6 +7770,29 @@ static void account_event_cpu(struct perf_event *event, int cpu)
> atomic_inc(&per_cpu(perf_cgroup_events, cpu));
> }
>
> +static void account_running_pmu(struct perf_event *event)
> +{
> + struct running_pmu *pmu;
> +
> + mutex_lock(&running_pmus_lock);
> +
> + list_for_each_entry(pmu, &running_pmus, entry) {
> + if (pmu->pmu == event->pmu) {
> + pmu->nr_event++;
> + goto out;
> + }
> + }
> +
> + pmu = kzalloc(sizeof(struct running_pmu), GFP_KERNEL);
> + if (pmu != NULL) {
> + pmu->nr_event++;
> + pmu->pmu = event->pmu;
> + list_add_rcu(&pmu->entry, &running_pmus);
> + }
> +out:
> + mutex_unlock(&running_pmus_lock);
> +}
> +
> static void account_event(struct perf_event *event)
> {
> bool inc = false;
> @@ -7772,6 +7825,8 @@ static void account_event(struct perf_event *event)
> static_key_slow_inc(&perf_sched_events.key);
>
> account_event_cpu(event, event->cpu);
> +
> + account_running_pmu(event);
> }
>
> /*
> --
> 2.5.0
>
* Re: [PATCH 1/1] perf/core: find auxiliary events in running pmus list
2016-02-24 21:20 [PATCH 1/1] perf/core: find auxiliary events in running pmus list kan.liang
2016-02-25 8:17 ` Peter Zijlstra
@ 2016-02-25 13:29 ` Alexander Shishkin
2016-02-25 15:38 ` Liang, Kan
2016-02-26 10:36 ` Peter Zijlstra
2 siblings, 1 reply; 7+ messages in thread
From: Alexander Shishkin @ 2016-02-25 13:29 UTC (permalink / raw)
To: kan.liang, peterz, linux-kernel
Cc: ak, eranian, vincent.weaver, tglx, mingo, acme, jolsa, ying.huang,
Kan Liang
kan.liang@intel.com writes:
> From: Kan Liang <kan.liang@intel.com>
>
> perf_event_aux funciton goes through pmus list to find proper auxiliary
> events to output. The pmus list consists of all possible pmus in the
> system, that may or may not be running at the moment, while the
> auxiliary events must be from the running pmus. Therefore searching
> non-running pmus is unnecessary and expensive especially when there are
> many non-running pmus on the list.
>
> For example, the brk test case in lkp triggers many mmap operations,
> at the time, perf with cycles:pp is also running on the system. As a
> result, many perf_event_aux are invoked, and each would search the whole
> pmus list. If we enable the uncore support (even when uncore event are
> not really used), dozens of uncore pmus will be added into pmus list,
> which can significantly decrease brk_test's ops_per_sec. Based on our
> test, the ops_per_sec without uncore patch is 2647573, while the
> ops_per_sec with uncore patch is only 1768444, which is a 33.2%
> reduction.
What does this ops_per_sec measure, exactly? Just curious.
You'll probably also observe the same effect if you simply create a
bunch of disabled events before you measure the time that it takes
perf_event_aux() to notify everybody. Even worse, because you can have
way more events than pmus. Question is, is this really a problem.
> This patch introduces a running_pmus list which only tracks the running
> pmus in the system. The perf_event_aux uses running_pmus list instead of
> pmus list to find auxiliary events.
This patch also adds a global mutex that serializes *all* event
creation/freeing. Including the fork and exit paths.
I mean:
> @@ -7740,6 +7770,29 @@ static void account_event_cpu(struct perf_event *event, int cpu)
> atomic_inc(&per_cpu(perf_cgroup_events, cpu));
> }
>
> +static void account_running_pmu(struct perf_event *event)
> +{
> + struct running_pmu *pmu;
> +
> + mutex_lock(&running_pmus_lock);
> +
> + list_for_each_entry(pmu, &running_pmus, entry) {
> + if (pmu->pmu == event->pmu) {
> + pmu->nr_event++;
> + goto out;
> + }
> + }
> +
> + pmu = kzalloc(sizeof(struct running_pmu), GFP_KERNEL);
> + if (pmu != NULL) {
> + pmu->nr_event++;
> + pmu->pmu = event->pmu;
> + list_add_rcu(&pmu->entry, &running_pmus);
> + }
> +out:
> + mutex_unlock(&running_pmus_lock);
> +}
> +
> static void account_event(struct perf_event *event)
> {
> bool inc = false;
> @@ -7772,6 +7825,8 @@ static void account_event(struct perf_event *event)
> static_key_slow_inc(&perf_sched_events.key);
>
> account_event_cpu(event, event->cpu);
> +
> + account_running_pmu(event);
doesn't look justified.
Regards,
--
Alex
* RE: [PATCH 1/1] perf/core: find auxiliary events in running pmus list
2016-02-25 13:29 ` Alexander Shishkin
@ 2016-02-25 15:38 ` Liang, Kan
0 siblings, 0 replies; 7+ messages in thread
From: Liang, Kan @ 2016-02-25 15:38 UTC (permalink / raw)
To: Alexander Shishkin, peterz@infradead.org,
linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com, vincent.weaver@maine.edu,
tglx@linutronix.de, mingo@kernel.org, acme@redhat.com,
jolsa@redhat.com, ying.huang@linux.intel.com
> kan.liang@intel.com writes:
>
> > From: Kan Liang <kan.liang@intel.com>
> >
> > perf_event_aux funciton goes through pmus list to find proper
> > auxiliary events to output. The pmus list consists of all possible
> > pmus in the system, that may or may not be running at the moment,
> > while the auxiliary events must be from the running pmus. Therefore
> > searching non-running pmus is unnecessary and expensive especially
> > when there are many non-running pmus on the list.
> >
> > For example, the brk test case in lkp triggers many mmap operations,
> > at the time, perf with cycles:pp is also running on the system. As a
> > result, many perf_event_aux are invoked, and each would search the
> > whole pmus list. If we enable the uncore support (even when uncore
> > event are not really used), dozens of uncore pmus will be added into
> > pmus list, which can significantly decrease brk_test's ops_per_sec.
> > Based on our test, the ops_per_sec without uncore patch is 2647573,
> > while the ops_per_sec with uncore patch is only 1768444, which is a
> > 33.2% reduction.
>
> What does this ops_per_sec measure, exactly? Just curious.
brk_test just does the brk system call blindly, so I think it basically
measures the whole brk system call path.
> You'll probably also observe the same effect if you simply create a bunch of
> disabled events before you measure the time that it takes
> perf_event_aux() to notify everybody. Even worse, because you can have
> way more events than pmus. Question is, is this really a problem.
>
Yes, I can observe the same effect if I create a bunch of disabled events.
But the problem here is that the user did nothing but upgrade the system,
and still suffers the performance regression.
Here is more performance data on this issue.
On a kernel without uncore enabled, perf_event_aux uses 16.84% of CPU
cycles when we run brk_test and perf with cycles:pp.
On a kernel with uncore enabled, perf_event_aux uses 46.37% of CPU
cycles for the same brk_test and the same cycles:pp event.
As we can see, the user's behavior doesn't change. The only difference is
that many uncore PMUs are registered in the system. Even though no one uses
the uncore PMUs at the moment, they still impact performance. I think that
is really a problem, especially considering that more and more PMUs will be
added in the future.
> > This patch introduces a running_pmus list which only tracks the
> > running pmus in the system. The perf_event_aux uses running_pmus list
> > instead of pmus list to find auxiliary events.
>
> This patch also adds a global mutex that serializes *all* event
> creation/freeing. Including the fork and exit paths.
I think we have to maintain a global mutex for running_pmus.
There will be some impact, but it should be limited: it only affects
event creation/freeing.
The running_pmus list is usually small, and most events should come from
the same PMU, so the time spent in the critical section should not be big.
Thanks,
Kan
>
> I mean:
>
> > @@ -7740,6 +7770,29 @@ static void account_event_cpu(struct
> perf_event *event, int cpu)
> > atomic_inc(&per_cpu(perf_cgroup_events, cpu)); }
> >
> > +static void account_running_pmu(struct perf_event *event) {
> > + struct running_pmu *pmu;
> > +
> > + mutex_lock(&running_pmus_lock);
> > +
> > + list_for_each_entry(pmu, &running_pmus, entry) {
> > + if (pmu->pmu == event->pmu) {
> > + pmu->nr_event++;
> > + goto out;
> > + }
> > + }
> > +
> > + pmu = kzalloc(sizeof(struct running_pmu), GFP_KERNEL);
> > + if (pmu != NULL) {
> > + pmu->nr_event++;
> > + pmu->pmu = event->pmu;
> > + list_add_rcu(&pmu->entry, &running_pmus);
> > + }
> > +out:
> > + mutex_unlock(&running_pmus_lock);
> > +}
> > +
> > static void account_event(struct perf_event *event) {
> > bool inc = false;
> > @@ -7772,6 +7825,8 @@ static void account_event(struct perf_event
> *event)
> > static_key_slow_inc(&perf_sched_events.key);
> >
> > account_event_cpu(event, event->cpu);
> > +
> > + account_running_pmu(event);
>
> doesn't look justified.
>
> Regards,
> --
> Alex
* Re: [PATCH 1/1] perf/core: find auxiliary events in running pmus list
2016-02-24 21:20 [PATCH 1/1] perf/core: find auxiliary events in running pmus list kan.liang
2016-02-25 8:17 ` Peter Zijlstra
2016-02-25 13:29 ` Alexander Shishkin
@ 2016-02-26 10:36 ` Peter Zijlstra
2016-02-28 16:31 ` Liang, Kan
2 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2016-02-26 10:36 UTC (permalink / raw)
To: kan.liang
Cc: linux-kernel, ak, eranian, vincent.weaver, tglx, mingo, acme,
jolsa, ying.huang, alexander.shishkin
> +static void account_running_pmu(struct perf_event *event)
> +{
> + struct running_pmu *pmu;
> +
> + mutex_lock(&running_pmus_lock);
> +
> + list_for_each_entry(pmu, &running_pmus, entry) {
> + if (pmu->pmu == event->pmu) {
> + pmu->nr_event++;
> + goto out;
> + }
> + }
> +
> + pmu = kzalloc(sizeof(struct running_pmu), GFP_KERNEL);
> + if (pmu != NULL) {
> + pmu->nr_event++;
> + pmu->pmu = event->pmu;
> + list_add_rcu(&pmu->entry, &running_pmus);
> + }
That kzalloc() doesn't make any sense, why not simply add a member to
struct pmu?
> +out:
> + mutex_unlock(&running_pmus_lock);
> +}
In any case, can't we replace the whole perf_event_aux muck with a
data structure for finding interested events?
Because not only is iterating all PMUs silly, we then further iterate
all events in whatever contexts we find from them. Even if none of the
events in these contexts is interested in the side-band event.
We have:
mmap,comm,task,mmap_data,mmap2,comm_exec,context_switch
Which I think we can reduce like:
{mmap,mmap_data,mmap2} -> mmap
{comm,comm_exec} -> comm
mmap,comm,task,context_switch
This would allow us to keep 4 per-cpu lists of events like:
struct pmu_event_list {
raw_spinlock_t lock;
struct list_head list;
};
enum event_sb_channel {
sb_mmap = 0,
sb_comm,
sb_task,
sb_switch,
sb_nr,
};
static DEFINE_PER_CPU(struct pmu_event_list, pmu_sb_events[sb_nr]);
static void attach_sb_event(struct perf_event *event, enum event_sb_channel sb)
{
struct pmu_event_list *pel = per_cpu_ptr(&pmu_sb_events[sb], event->cpu);
raw_spin_lock(&pel->lock);
list_add_rcu(&event->sb_list[sb], &pel->list);
raw_spin_unlock(&pel->lock);
}
static void account_pmu_sb_event(struct perf_event *event)
{
if (event->parent)
return;
if (event->attach & ATTACH_TASK)
return;
if (event->attr.mmap || event->attr.mmap_data || event->attr.mmap2)
attach_sb_event(event, sb_mmap);
if (event->attr.comm || event->attr.comm_exec)
attach_sb_event(event, sb_comm);
if (event->attr.task)
attach_sb_event(event, sb_task);
if (event->attr.context_switch)
attach_sb_event(event, sb_switch);
}
/* matching unaccount */
static void perf_event_sb_iterate(enum event_sb_channel sb,
perf_event_sb_output_f output, void *data)
{
struct pmu_event_list *pel = __this_cpu_ptr(&pmu_sb_events[sb]);
struct perf_event *event;
list_for_each_entry_rcu(event, &pel->list, sb_list[sb]) {
if (event->state < PERF_EVENT_STATE_INACTIVE)
continue;
if (!event_filter_match(event))
continue;
output(event, data);
}
}
static void perf_event_sb_mask(unsigned int sb_mask, perf_event_sb_output_f output, void *data)
{
int sb;
for (sb = 0; sb < sb_nr; sb++) {
if (!(sb_mask & (1 << sb)))
continue;
perf_event_sb_iterate(sb, output, data);
}
}
Note: the mask is needed because a task event (as per perf_event_task)
needs to go out to sb_comm, sb_mmap and sb_task, see
perf_event_task_match().
And then you can replace the whole global part of perf_event_aux (which
I would rename to perf_event_sb), with this.
You still have to do something like:
for_each_task_context_nr(ctxn) {
ctx = rcu_dereference(current->perf_event_ctxp[ctxn]);
if (!ctx)
continue;
perf_event_sb_ctx(ctx, output, data);
}
To get at the per task events, because I don't think we want to go
update more global state on context switch, nor am I entirely sure its
worth it to keep per sb ctx->event_list[]s.
* RE: [PATCH 1/1] perf/core: find auxiliary events in running pmus list
2016-02-26 10:36 ` Peter Zijlstra
@ 2016-02-28 16:31 ` Liang, Kan
2016-02-29 12:23 ` Peter Zijlstra
0 siblings, 1 reply; 7+ messages in thread
From: Liang, Kan @ 2016-02-28 16:31 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, ak@linux.intel.com,
eranian@google.com, vincent.weaver@maine.edu, tglx@linutronix.de,
mingo@kernel.org, acme@redhat.com, jolsa@redhat.com,
ying.huang@linux.intel.com, alexander.shishkin@linux.intel.com
> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@infradead.org]
> Sent: Friday, February 26, 2016 5:37 AM
> To: Liang, Kan
> Cc: linux-kernel@vger.kernel.org; ak@linux.intel.com;
> eranian@google.com; vincent.weaver@maine.edu; tglx@linutronix.de;
> mingo@kernel.org; acme@redhat.com; jolsa@redhat.com;
> ying.huang@linux.intel.com; alexander.shishkin@linux.intel.com
> Subject: Re: [PATCH 1/1] perf/core: find auxiliary events in running pmus
> list
>
> > +static void account_running_pmu(struct perf_event *event)
> > +{
> > + struct running_pmu *pmu;
> > +
> > + mutex_lock(&running_pmus_lock);
> > +
> > + list_for_each_entry(pmu, &running_pmus, entry) {
> > + if (pmu->pmu == event->pmu) {
> > + pmu->nr_event++;
> > + goto out;
> > + }
> > + }
> > +
> > + pmu = kzalloc(sizeof(struct running_pmu), GFP_KERNEL);
> > + if (pmu != NULL) {
> > + pmu->nr_event++;
> > + pmu->pmu = event->pmu;
> > + list_add_rcu(&pmu->entry, &running_pmus);
> > + }
>
> That kzalloc() doesn't make any sense, why not simply add a member to
> struct pmu?
>
> > +out:
> > + mutex_unlock(&running_pmus_lock);
> > +}
>
> In any case, can't we replace the whole perf_event_aux muck with a data
> structure for finding interested events?
>
> Because not only is iterating all PMUs silly, we then further iterate all
> events in whatever contexts we find from them. Even if none of the
> events in these contexts is interested in the side-band event.
>
> We have:
> mmap,comm,task,mmap_data,mmap2,comm_exec,context_switch
>
>
> Which I think we can reduce like:
>
> {mmap,mmap_data,mmap2} -> mmap
> {comm,comm_exec} -> comm
>
> mmap,comm,task,context_switch
>
> This would allow us to keep 4 per-cpu lists of events like:
>
>
> struct pmu_event_list {
> raw_spinlock_t lock;
> struct list_head list;
> };
>
> enum event_sb_channel {
> sb_mmap = 0,
> sb_comm,
> sb_task,
> sb_switch,
> sb_nr,
> }
>
> static DEFINE_PER_CPU(struct pmu_event_list, pmu_sb_events[sb_nr]);
>
> static void attach_sb_event(struct perf_event *event, enum
> event_sb_channel sb) {
> struct pmu_event_list *pel = per_cpu_ptr(&pmu_sb_events[sb],
> event->cpu);
>
> raw_spin_lock(&pel->lock);
> list_add_rcu(&event->sb_list[sb], &pel->list);
> raw_spin_unlock(&pel->lock);
> }
>
> static void account_pmu_sb_event(struct perf_event *event) {
> if (event->parent)
> return;
>
> if (event->attach & ATTACH_TASK)
> return;
>
> 	if (event->attr.mmap || event->attr.mmap_data || event->attr.mmap2)
> attach_sb_event(event, sb_mmap);
>
> if (event->attr.comm || event->attr.comm_exec)
> attach_sb_event(event, sb_comm);
>
> if (event->attr.task)
> attach_sb_event(event, sb_task);
>
> if (event->attr.context_switch)
> attach_sb_event(event, sb_switch);
> }
>
> /* matching unaccount */
>
>
> static void perf_event_sb_iterate(enum event_sb_channel sb,
> perf_event_sb_output_f output, void
> *data) {
> struct pmu_event_list *pel =
> __this_cpu_ptr(&pmu_sb_events[sb]);
> struct perf_event *event;
>
> list_for_each_entry_rcu(event, &pel->list, sb_list[sb]) {
> if (event->state < PERF_EVENT_STATE_INACTIVE)
> continue;
> if (!event_filter_match(event))
> continue;
> output(event, data);
> }
> }
>
> static void perf_event_sb_mask(unsigned int sb_mask,
> perf_event_sb_output_f output, void *data) {
> int sb;
>
> for (sb = 0; sb < sb_nr; sb++) {
> if (!(sb_mask & (1 << sb)))
> continue;
> perf_event_sb_iterate(sb, output, data);
> }
> }
>
> Note: the mask is needed because a task event (as per perf_event_task)
> needs to go out to sb_comm, sb_mmap and sb_task, see
> perf_event_task_match().
>
> And then you can replace the whole global part of perf_event_aux (which I
> would rename to perf_event_sb), with this.
>
> You still have to do something like:
>
> for_each_task_context_nr(ctxn) {
> ctx = rcu_dereference(current->perf_event_ctxp[ctxn]);
> if (!ctx)
> continue;
> perf_event_sb_ctx(ctx, output, data);
> }
>
I'm not sure why we need something like for_each_task_context_nr(ctxn) in
perf_event_aux/perf_event_sb.
Isn't perf_event_sb_mask enough?
It looks like perf_event_sb_mask will go through all possible interested
events (including both per-cpu and per-task events). That's what we want to do in
perf_event_aux, right?
Thanks,
Kan
> To get at the per task events, because I don't think we want to go update
> more global state on context switch, nor am I entirely sure its worth it to
> keep per sb ctx->event_list[]s.
>
>
* Re: [PATCH 1/1] perf/core: find auxiliary events in running pmus list
2016-02-28 16:31 ` Liang, Kan
@ 2016-02-29 12:23 ` Peter Zijlstra
0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2016-02-29 12:23 UTC (permalink / raw)
To: Liang, Kan
Cc: linux-kernel@vger.kernel.org, ak@linux.intel.com,
eranian@google.com, vincent.weaver@maine.edu, tglx@linutronix.de,
mingo@kernel.org, acme@redhat.com, jolsa@redhat.com,
ying.huang@linux.intel.com, alexander.shishkin@linux.intel.com
On Sun, Feb 28, 2016 at 04:31:49PM +0000, Liang, Kan wrote:
> > static void account_pmu_sb_event(struct perf_event *event) {
> > if (event->parent)
> > return;
> >
> > if (event->attach & ATTACH_TASK)
> > return;
^^^^ this
> > if (event->attr.mmap || event->attr.mmap_data || event->attr.mmap2)
> > attach_sb_event(event, sb_mmap);
> >
> > if (event->attr.comm || event->attr.comm_exec)
> > attach_sb_event(event, sb_comm);
> >
> > if (event->attr.task)
> > attach_sb_event(event, sb_task);
> >
> > if (event->attr.context_switch)
> > attach_sb_event(event, sb_switch);
> > }
> > And then you can replace the whole global part of perf_event_aux (which I
> > would rename to perf_event_sb), with this.
> >
> > You still have to do something like:
> >
> > for_each_task_context_nr(ctxn) {
> > ctx = rcu_dereference(current->perf_event_ctxp[ctxn]);
> > if (!ctx)
> > continue;
> > perf_event_sb_ctx(ctx, output, data);
> > }
> >
>
> I'm not sure why we need something like for_each_task_context_nr(ctxn) in
> perf_event_aux/perf_event_sb.
> Isn't perf_event_sb_mask enough?
> It looks perf_event_sb_mask will go through all possible interested events
> (includes both per cpu and per task events).
See sites marked "^^^ this"
This stuff explicitly does not handle task events, because of the below:
> > To get at the per task events, because I don't think we want to go update
> > more global state on context switch, nor am I entirely sure its worth it to
> > keep per sb ctx->event_list[]s.
^^^ this