Re: [RFC][PATCH] perf: Rewrite core context handling

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: Stephane Eranian <eranian@google.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>,
	Ingo Molnar <mingo@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@redhat.com>,
	songliubraving@fb.com, Thomas Gleixner <tglx@linutronix.de>,
	Mark Rutland <mark.rutland@arm.com>,
	megha.dey@intel.com, frederic@kernel.org
Subject: Re: [RFC][PATCH] perf: Rewrite core context handling
Date: Tue, 16 Oct 2018 11:32:53 +0200	[thread overview]
Message-ID: <20181016093253.GD4030@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <CABPqkBQ48dM0bTFr_o3pSpAP8e_aH5gHeqXEdkPS0LY3bxBtLw@mail.gmail.com>

On Mon, Oct 15, 2018 at 11:31:24AM -0700, Stephane Eranian wrote:

> I have always had a hard time understanding the role of all these
> structs in the generic code. This is still very confusing and very
> hard to follow.
> 
> In my mind, you have per-task and per-cpu perf_events contexts.  And
> for each you can have multiple PMUs, some hw some sw.  Each PMU has
> its own list of events maintained in RB tree. There is never any
> interactions between PMUs.

That is more or less how it was. We have per PMU task or CPU contexts:

  task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context
       ^                                 |    ^     |           ^
       `---------------------------------'    |     `--> pmu <--'
                                              v           ^
                                         perf_event ------'

Each task has an array of pointers to a perf_event_context. Each
perf_event_context has a direct relation to a PMU and a group of events
for that PMU. The task related perf_event_context's have a pointer back
to that task.

Each PMU has a per-cpu pointer to a per-cpu perf_cpu_context, which
includes a perf_event_context, which again has a direct relation to that
PMU, and a group of events for that PMU.

The perf_cpu_context also tracks which task context is currently
associated with that CPU and includes a few other things like the
hrtimer for rotation etc..

Each perf_event is then associated with its PMU and one
perf_event_context.

> Maybe this is how this is done or proposed by your patches, but it
> certainly is not obvious.

No, my patch somewhat completely wrecks the above; and reduces to a
single task context and a single CPU context.

There were a number of problems with the above. One is that task-array
of pointer, which limited the number of task contexts we could have.

Now, we could've easily changed that to a list and called it a day.
That is not in fact a horribly difficult patch. If you combine that with
a patch that actually freed task context's when they go empty, that
might actually work.

But there are a number of other considerations that resulted in the
patch as presented:

 - there is a bunch of per context state that is simply duplicated
   between contexts, like for instance the time keeping. There is no
   point in tracking the time for 'n' per task/cpu contexts when in fact
   they're all the same.

 - on context switch we have to iterate all these 'n' contexts and
   switch them one by one. Instead of just switching one context and
   calling it a day.

 - for big.little we'd end up with 2 per-task contexts and only ever use
   1 at any one time, which increases 'n' in the above cases for no
   purpose.

 - the actual per-pmu-per-context state is very small (as I think Alexey
   already implied).

 - a single context simplifies a bunch of things; including the
   move_group case (we no longer have to adjust perf_event::ctx) and the
   cpu-online tests and the ctx locking and it removes a bunch of
   context lists (like active_ctx_list).

So a single context is what I went with. That all results in:

  task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context
       ^                                 |    ^ ^
       `---------------------------------'    | |
                                              | `--> perf_event_pmu_context
                                              |       ^   ^
                                              |       |   |
                                              | ,-----'   v
                                              | |      perf_cpu_pmu_context
                                              | |         ^
                                              | |         |
                                              v v         v
                                         perf_event ---> pmu

Because while the per-pmu-per-context state is small, it does exists,
this gives rise to perf_event_pmu_context. It tracks nr_events and
nr_active, which is used to (quickly) tell if rotation is required (it
is possible to reduce this state I think, but I've not yet gotten it
down to 0). It also tracks which events are actually active; iterating a
list is cheaper than finding them all in the RB-tree.

It also contains the task_ctx_data thing for LBR, which is a PMU
specific extra data thingy.

We then also keep a list of (active) perf_event_pmu_context in
perf_event_context, such that we can quickly find which PMUs are in fact
involved with the context. This simplifies context scheduling a little.

We then also need per-pmu-per-cpu state, which gives rise to
perf_cpu_pmu_context, and that mostly includes bits to drive the event
rotation, which per ABI is per PMU, but it also includes bits to do
perf_event_attr::exclusive scheduling, which is also naturally
per-pmu-per-cpu.

And yes, the above looks more complicated, but at the same time, a bunch
of things did get simplified. Maybe once the dust settles someone can
turn this here email into a sensible comment or something ;-)

> Also the Intel LBR is not a PMU on is own. Maybe you are talking about
> the BTS in arch/x86/even/sintel/bts.c.

This thing:

  https://lkml.kernel.org/r/1510970046-25387-1-git-send-email-megha.dey@linux.intel.com

next prev parent reply	other threads:[~2018-10-16  9:33 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-10 10:45 [RFC][PATCH] perf: Rewrite core context handling Peter Zijlstra
2018-10-11  7:50 ` Song Liu
2018-10-11  9:29   ` Peter Zijlstra
2018-10-11 22:37     ` Song Liu
2018-10-12  9:50       ` Peter Zijlstra
2018-10-12 14:25         ` Peter Zijlstra
2018-10-13  8:31         ` Song Liu
2018-10-16  9:50           ` Peter Zijlstra
2018-10-16 16:34             ` Song Liu
2018-10-16 18:10               ` Peter Zijlstra
2018-10-16 18:24                 ` Song Liu
2018-10-12  7:04     ` Alexey Budankov
2018-10-12 11:54       ` Peter Zijlstra
2018-10-15  7:26 ` Alexey Budankov
2018-10-15  8:34   ` Peter Zijlstra
2018-10-15  8:53     ` Peter Zijlstra
2018-10-15 17:29     ` Alexey Budankov
2018-10-15 18:31       ` Stephane Eranian
2018-10-16  6:39         ` Alexey Budankov
2018-10-16  9:32         ` Peter Zijlstra [this message]
2018-10-15 22:09     ` Song Liu
2018-10-16 18:28       ` Song Liu
2018-10-17 11:06         ` Peter Zijlstra
2018-10-17 16:43           ` Song Liu
2018-10-17 17:19             ` Peter Zijlstra
2018-10-17 18:33               ` Peter Zijlstra
2018-10-17 18:57                 ` Song Liu
2018-10-16 16:26 ` Mark Rutland
2018-10-16 18:07   ` Peter Zijlstra
2018-10-17  8:57 ` Alexey Budankov
2018-10-17 15:01   ` Alexander Shishkin
2018-10-17 15:58     ` Alexey Budankov
2018-10-17 16:30   ` Peter Zijlstra
2018-10-18  7:05     ` Alexey Budankov
2018-10-22 13:26 ` Alexander Shishkin
2018-10-23  6:13 ` Song Liu
2018-10-23  6:55   ` Peter Zijlstra
2019-05-15 11:17 ` Alexander Shishkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181016093253.GD4030@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=acme@kernel.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=alexey.budankov@linux.intel.com \
    --cc=eranian@google.com \
    --cc=frederic@kernel.org \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=megha.dey@intel.com \
    --cc=mingo@kernel.org \
    --cc=songliubraving@fb.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox