From: Namhyung Kim <namhyung@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Jiri Olsa <jolsa@kernel.org>,
Kan Liang <kan.liang@linux.intel.com>,
Ravi Bangoria <ravi.bangoria@amd.com>,
bpf@vger.kernel.org
Subject: Re: [PATCH 2/3] perf/core: Set data->sample_flags in perf_prepare_sample()
Date: Mon, 9 Jan 2023 12:21:25 -0800
Message-ID: <Y7x3RUd67smv3EFQ@google.com>
In-Reply-To: <Y7wFJ+NF0NwnmzLa@hirez.programming.kicks-ass.net>
Hi Peter,
On Mon, Jan 09, 2023 at 01:14:31PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 29, 2022 at 12:41:00PM -0800, Namhyung Kim wrote:
>
> So I like the general idea; I just think it's turned into a bit of a
> mess. That code is already overly branchy, which is known to hurt
> performance; we should really try not to make it worse than absolutely
> needed.
Agreed.
>
> > kernel/events/core.c | 86 ++++++++++++++++++++++++++++++++------------
> > 1 file changed, 63 insertions(+), 23 deletions(-)
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index eacc3702654d..70bff8a04583 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -7582,14 +7582,21 @@ void perf_prepare_sample(struct perf_event_header *header,
> > filtered_sample_type = sample_type & ~data->sample_flags;
> > __perf_event_header__init_id(header, data, event, filtered_sample_type);
> >
> > - if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE))
> > - data->ip = perf_instruction_pointer(regs);
> > + if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE)) {
> > + /* attr.sample_type may not have PERF_SAMPLE_IP */
>
> Right, but that shouldn't matter; IIRC it's OK to have more bits set in
> data->sample_flags than we have set in attr.sample_type. It just means
> we have data available for sample types we're (possibly) not using.
>
> That is, I think you can simply write this like:
>
> > + if (!(data->sample_flags & PERF_SAMPLE_IP)) {
> > + data->ip = perf_instruction_pointer(regs);
> > + data->sample_flags |= PERF_SAMPLE_IP;
> > + }
> > + }
>
> if (filtered_sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE)) {
> data->ip = perf_instruction_pointer(regs);
> data->sample_flags |= PERF_SAMPLE_IP;
> }
>
> ...
>
> if (filtered_sample_type & PERF_SAMPLE_CODE_PAGE_SIZE) {
> data->code_page_size = perf_get_page_size(data->ip);
> data->sample_flags |= PERF_SAMPLE_CODE_PAGE_SIZE;
> }
>
> Then after a single perf_prepare_sample() run we have:
>
> pre | post
> ----------------------------------------
> 0 | 0
> IP | IP
> CODE_PAGE_SIZE | IP|CODE_PAGE_SIZE
> IP|CODE_PAGE_SIZE | IP|CODE_PAGE_SIZE
>
> So while data->sample_flags will have an extra bit set in the 3rd case,
> that will not affect perf_output_sample(), which only looks at data->type
> (== attr.sample_type).
>
> And since data->sample_flags will have both bits set, a second run will
> filter out both and avoid the extra work (except doing that will mess up
> the branch predictors).
Yeah, it'd be better to check filtered_sample_type in the first place.
Btw, I was thinking about a hypothetical scenario where a PMU driver sets
the IP from somewhere other than the regs. In that case, requesting
CODE_PAGE_SIZE (when only the IP was provided) would overwrite that IP.
I don't think we need to worry about it for now, since PMU drivers update
the regs (using set_linear_ip()), but it seems like a possible scenario
for something like PEBS or IBS.
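The following is a minimal user-space model of the filtered_sample_type logic, for illustration only. The bit values, struct sample_data, ip_from_regs() and page_size_of() are hypothetical stand-ins for the kernel's PERF_SAMPLE_* flags, struct perf_sample_data, perf_instruction_pointer() and perf_get_page_size(). With keep_ip set to false, it follows the suggestion above and reproduces the pre/post table. The last two cases show a previously provided IP being replaced when only CODE_PAGE_SIZE is still missing, and one possible extra check that avoids this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PERF_SAMPLE_* bits (values are arbitrary). */
#define SAMPLE_IP             (1u << 0)
#define SAMPLE_CODE_PAGE_SIZE (1u << 1)

/* Stand-in for the relevant part of struct perf_sample_data. */
struct sample_data {
	uint32_t sample_flags;    /* fields already filled in (by PMU, BPF, ...) */
	uint64_t ip;
	uint64_t code_page_size;
};

/* Fake replacements for perf_instruction_pointer() and perf_get_page_size(). */
static uint64_t ip_from_regs(void) { return 0x1000; }
static uint64_t page_size_of(uint64_t ip) { (void)ip; return 4096; }

/*
 * Fill only what sample_type requests and sample_flags lacks.
 * keep_ip == false follows the suggestion in the thread; keep_ip == true
 * adds one more test so a previously provided IP is not replaced.
 */
static void prepare_sample(struct sample_data *data, uint32_t sample_type,
			   bool keep_ip)
{
	uint32_t filtered = sample_type & ~data->sample_flags;

	if ((filtered & (SAMPLE_IP | SAMPLE_CODE_PAGE_SIZE)) &&
	    !(keep_ip && (data->sample_flags & SAMPLE_IP))) {
		data->ip = ip_from_regs();
		data->sample_flags |= SAMPLE_IP;
	}

	if (filtered & SAMPLE_CODE_PAGE_SIZE) {
		/* The page size is derived from the IP, so it must exist by now. */
		assert(data->sample_flags & SAMPLE_IP);
		data->code_page_size = page_size_of(data->ip);
		data->sample_flags |= SAMPLE_CODE_PAGE_SIZE;
	}
}
```

The keep_ip variant costs one more test on the hot path, which is exactly the kind of extra branching the thread is trying to avoid.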
>
>
> > if (sample_type & PERF_SAMPLE_CALLCHAIN) {
> > int size = 1;
> >
> > - if (filtered_sample_type & PERF_SAMPLE_CALLCHAIN)
> > + if (filtered_sample_type & PERF_SAMPLE_CALLCHAIN) {
> > data->callchain = perf_callchain(event, regs);
> > + data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
> > + }
> >
> > size += data->callchain->nr;
> >
>
> Why can't this be:
>
> if (filtered_sample_type & PERF_SAMPLE_CALLCHAIN) {
> data->callchain = perf_callchain(event, regs);
> data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
>
> header->size += (1 + data->callchain->nr) * sizeof(u64);
> }
>
> I suppose this is because perf_event_header lives on the stack of the
> overflow handler and all that isn't available / relevant for the BPF
> thing.
Right, it needs to account for the size of every requested sample field,
even when the data was already filled in by someone else.
>
> And we can't pull that out into another function without adding yet
> another branch fest.
>
> However, inspired by your next patch, we can do something like so:
>
> if (filtered_sample_type & PERF_SAMPLE_CALLCHAIN) {
> data->callchain = perf_callchain(event, regs);
> data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
>
> data->size += (1 + data->callchain->nr) * sizeof(u64);
> }
This is fine as long as every other place that sets the callchain (like
PMU drivers) also updates the sample data size accordingly. Otherwise we
get the callchain, but the data size will be wrong.
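Here is a user-space sketch (not kernel code) contrasting the two ways of accounting for the callchain size discussed above. struct callchain, struct sample_data, SAMPLE_CALLCHAIN, callchain_bytes(), prepare_callchain() and driver_set_callchain() are all hypothetical; struct callchain models only the nr field of struct perf_callchain_entry. In the patch as posted, the size is derived whenever the callchain is requested. With a cached data->size, whoever fills in the callchain must also add its bytes; otherwise the size silently comes out short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_CALLCHAIN (1u << 5)   /* hypothetical stand-in bit */

/* Models only the nr field of struct perf_callchain_entry. */
struct callchain { uint64_t nr; };

struct sample_data {
	uint32_t sample_flags;
	const struct callchain *callchain;
	size_t size;                   /* cached size of the variable payload */
};

/* What perf_callchain(event, regs) would return in this toy model. */
static const struct callchain regs_chain = { .nr = 3 };

/* Patch as posted: size is recomputed whenever the callchain is requested. */
static size_t callchain_bytes(struct sample_data *data, uint32_t sample_type)
{
	uint32_t filtered = sample_type & ~data->sample_flags;

	if (!(sample_type & SAMPLE_CALLCHAIN))
		return 0;
	if (filtered & SAMPLE_CALLCHAIN) {
		data->callchain = &regs_chain;
		data->sample_flags |= SAMPLE_CALLCHAIN;
	}
	assert(data->callchain != NULL);
	return (1 + data->callchain->nr) * sizeof(uint64_t);
}

/* Suggested: size is added only by whoever fills in the callchain. */
static void prepare_callchain(struct sample_data *data, uint32_t sample_type)
{
	uint32_t filtered = sample_type & ~data->sample_flags;

	if (filtered & SAMPLE_CALLCHAIN) {
		data->callchain = &regs_chain;
		data->sample_flags |= SAMPLE_CALLCHAIN;
		data->size += (1 + data->callchain->nr) * sizeof(uint64_t);
	}
}

/* Hypothetical driver hook; with cached sizes it must update data->size. */
static void driver_set_callchain(struct sample_data *data,
				 const struct callchain *cc, int update_size)
{
	data->callchain = cc;
	data->sample_flags |= SAMPLE_CALLCHAIN;
	if (update_size)
		data->size += (1 + cc->nr) * sizeof(uint64_t);
}
```

A driver that calls driver_set_callchain(data, cc, 0) is harmless under callchain_bytes(), but it leaves data->size at 0 under prepare_callchain().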
>
> And then have __perf_event_output() (or something thereabout) do:
>
> perf_prepare_sample(data, event, regs);
> perf_prepare_header(&header, data, event, regs);
> err = output_begin(&handle, data, event, header.size);
> if (err)
> goto exit;
> perf_output_sample(&handle, &header, data, event);
> perf_output_end(&handle);
>
> With perf_prepare_header() being something like:
>
> header->type = PERF_RECORD_SAMPLE;
> header->size = sizeof(*header) + event->header_size + data->size;
> header->misc = perf_misc_flags(regs);
> ...
>
> Hmm ?
>
> (same for all the other sites)
Looks good. But I'm confused by the tip-bot2 messages saying the patches
were merged. Do you want me to work on it as a follow-up?
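Below is a rough model of the proposed split between perf_prepare_sample() and perf_prepare_header(), assuming data->size already holds the variable-size payload. struct event, struct sample_data, struct fake_regs, struct ring, prepare_sample(), prepare_header(), output_begin() and output_event() are hypothetical simplifications. The field layout of struct perf_event_header and the values 9 (PERF_RECORD_SAMPLE), 1 (PERF_RECORD_MISC_KERNEL) and 2 (PERF_RECORD_MISC_USER) follow the perf uAPI header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as struct perf_event_header in the perf uAPI header. */
struct perf_event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

#define RECORD_SAMPLE 9   /* PERF_RECORD_SAMPLE */
#define MISC_KERNEL   1   /* PERF_RECORD_MISC_KERNEL */
#define MISC_USER     2   /* PERF_RECORD_MISC_USER */

struct event { uint16_t header_size; };   /* fixed part, computed per event */
struct sample_data { uint16_t size; };    /* variable part, computed per sample */
struct fake_regs { int user_mode; };      /* stand-in for struct pt_regs */
struct ring { size_t cap, head; };        /* stand-in for the ring buffer */

/* Each field filled here adds its own bytes to data->size. */
static void prepare_sample(struct sample_data *data, uint16_t payload)
{
	data->size += payload;
}

/* The header is computed once, from the cached sizes, after prepare_sample(). */
static void prepare_header(struct perf_event_header *header,
			   const struct sample_data *data,
			   const struct event *event,
			   const struct fake_regs *regs)
{
	header->type = RECORD_SAMPLE;
	header->size = (uint16_t)(sizeof(*header) + event->header_size + data->size);
	header->misc = regs->user_mode ? MISC_USER : MISC_KERNEL;
}

/* Reserve space for the whole record, or fail if it does not fit. */
static int output_begin(struct ring *rb, size_t size)
{
	if (rb->head + size > rb->cap)
		return -1;
	rb->head += size;
	return 0;
}

/* Mirrors the call order proposed for __perf_event_output(). */
static int output_event(struct ring *rb, struct sample_data *data,
			const struct event *event,
			const struct fake_regs *regs, uint16_t payload)
{
	struct perf_event_header header;

	prepare_sample(data, payload);
	prepare_header(&header, data, event, regs);
	/* The real code would then call perf_output_sample() and perf_output_end(). */
	return output_begin(rb, header.size);
}
```

Because all sizes are known before output_begin(), the space reservation needs only header.size. No per-field branching is needed at that point.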
Thanks,
Namhyung