All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Stephane Eranian <eranian@google.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>,
	Ingo Molnar <mingo@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@redhat.com>,
	songliubraving@fb.com, Thomas Gleixner <tglx@linutronix.de>,
	Mark Rutland <mark.rutland@arm.com>,
	megha.dey@intel.com, frederic@kernel.org
Subject: Re: [RFC][PATCH] perf: Rewrite core context handling
Date: Tue, 16 Oct 2018 11:32:53 +0200	[thread overview]
Message-ID: <20181016093253.GD4030@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <CABPqkBQ48dM0bTFr_o3pSpAP8e_aH5gHeqXEdkPS0LY3bxBtLw@mail.gmail.com>

On Mon, Oct 15, 2018 at 11:31:24AM -0700, Stephane Eranian wrote:

> I have always had a hard time understanding the role of all these
> structs in the generic code. This is still very confusing and very
> hard to follow.
> 
> In my mind, you have per-task and per-cpu perf_events contexts.  And
> for each you can have multiple PMUs, some hw some sw.  Each PMU has
> its own list of events maintained in RB tree. There is never any
> interactions between PMUs.

That is more or less how it was. We have per PMU task or CPU contexts:


  task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context
       ^                                 |    ^     |           ^
       `---------------------------------'    |     `--> pmu <--'
                                              v           ^
                                         perf_event ------'


Each task has an array of pointers to a perf_event_context. Each
perf_event_context has a direct relation to a PMU and a group of events
for that PMU. The task related perf_event_context's have a pointer back
to that task.

Each PMU has a per-cpu pointer to a per-cpu perf_cpu_context, which
includes a perf_event_context, which again has a direct relation to that
PMU, and a group of events for that PMU.

The perf_cpu_context also tracks which task context is currently
associated with that CPU and includes a few other things like the
hrtimer for rotation etc..

Each perf_event is then associated with its PMU and one
perf_event_context.

> Maybe this is how this is done or proposed by your patches, but it
> certainly is not obvious.

No, my patch somewhat completely wrecks the above; and reduces to a
single task context and a single CPU context.

There were a number of problems with the above. One is that task-array
of pointer, which limited the number of task contexts we could have.

Now, we could've easily changed that to a list and called it a day.
That is not in fact a horribly difficult patch. If you combine that with
a patch that actually freed task context's when they go empty, that
might actually work.

But there are a number of other considerations that resulted in the
patch as presented:

 - there is a bunch of per context state that is simply duplicated
   between contexts, like for instance the time keeping. There is no
   point in tracking the time for 'n' per task/cpu contexts when in fact
   they're all the same.

 - on context switch we have to iterate all these 'n' contexts and
   switch them one by one. Instead of just switching one context and
   calling it a day.

 - for big.little we'd end up with 2 per-task contexts and only ever use
   1 at any one time, which increases 'n' in the above cases for no
   purpose.

 - the actual per-pmu-per-context state is very small (as I think Alexey
   already implied).

 - a single context simplifies a bunch of things; including the
   move_group case (we no longer have to adjust perf_event::ctx) and the
   cpu-online tests and the ctx locking and it removes a bunch of
   context lists (like active_ctx_list).

So a single context is what I went with. That all results in:


  task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context
       ^                                 |    ^ ^
       `---------------------------------'    | |
                                              | `--> perf_event_pmu_context
                                              |       ^   ^
                                              |       |   |
                                              | ,-----'   v
                                              | |      perf_cpu_pmu_context
                                              | |         ^
                                              | |         |
                                              v v         v
                                         perf_event ---> pmu


Because while the per-pmu-per-context state is small, it does exists,
this gives rise to perf_event_pmu_context. It tracks nr_events and
nr_active, which is used to (quickly) tell if rotation is required (it
is possible to reduce this state I think, but I've not yet gotten it
down to 0). It also tracks which events are actually active; iterating a
list is cheaper than finding them all in the RB-tree.

It also contains the task_ctx_data thing for LBR, which is a PMU
specific extra data thingy.

We then also keep a list of (active) perf_event_pmu_context in
perf_event_context, such that we can quickly find which PMUs are in fact
involved with the context. This simplifies context scheduling a little.

We then also need per-pmu-per-cpu state, which gives rise to
perf_cpu_pmu_context, and that mostly includes bits to drive the event
rotation, which per ABI is per PMU, but it also includes bits to do
perf_event_attr::exclusive scheduling, which is also naturally
per-pmu-per-cpu.

And yes, the above looks more complicated, but at the same time, a bunch
of things did get simplified. Maybe once the dust settles someone can
turn this here email into a sensible comment or something ;-)

> Also the Intel LBR is not a PMU on is own. Maybe you are talking about
> the BTS in arch/x86/even/sintel/bts.c.

This thing:

  https://lkml.kernel.org/r/1510970046-25387-1-git-send-email-megha.dey@linux.intel.com

  parent reply	other threads:[~2018-10-16  9:33 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-10 10:45 [RFC][PATCH] perf: Rewrite core context handling Peter Zijlstra
2018-10-11  7:50 ` Song Liu
2018-10-11  9:29   ` Peter Zijlstra
2018-10-11 22:37     ` Song Liu
2018-10-12  9:50       ` Peter Zijlstra
2018-10-12 14:25         ` Peter Zijlstra
2018-10-13  8:31         ` Song Liu
2018-10-16  9:50           ` Peter Zijlstra
2018-10-16 16:34             ` Song Liu
2018-10-16 18:10               ` Peter Zijlstra
2018-10-16 18:24                 ` Song Liu
2018-10-12  7:04     ` Alexey Budankov
2018-10-12 11:54       ` Peter Zijlstra
2018-10-15  7:26 ` Alexey Budankov
2018-10-15  8:34   ` Peter Zijlstra
2018-10-15  8:53     ` Peter Zijlstra
2018-10-15 17:29     ` Alexey Budankov
2018-10-15 18:31       ` Stephane Eranian
2018-10-16  6:39         ` Alexey Budankov
2018-10-16  9:32         ` Peter Zijlstra [this message]
2018-10-15 22:09     ` Song Liu
2018-10-16 18:28       ` Song Liu
2018-10-17 11:06         ` Peter Zijlstra
2018-10-17 16:43           ` Song Liu
2018-10-17 17:19             ` Peter Zijlstra
2018-10-17 18:33               ` Peter Zijlstra
2018-10-17 18:57                 ` Song Liu
2018-10-16 16:26 ` Mark Rutland
2018-10-16 18:07   ` Peter Zijlstra
2018-10-17  8:57 ` Alexey Budankov
2018-10-17 15:01   ` Alexander Shishkin
2018-10-17 15:58     ` Alexey Budankov
2018-10-17 16:30   ` Peter Zijlstra
2018-10-18  7:05     ` Alexey Budankov
2018-10-22 13:26 ` Alexander Shishkin
2018-10-23  6:13 ` Song Liu
2018-10-23  6:55   ` Peter Zijlstra
2019-05-15 11:17 ` Alexander Shishkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181016093253.GD4030@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=acme@kernel.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=alexey.budankov@linux.intel.com \
    --cc=eranian@google.com \
    --cc=frederic@kernel.org \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=megha.dey@intel.com \
    --cc=mingo@kernel.org \
    --cc=songliubraving@fb.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.