Re: [PATCH 02/14] perf/x86: output NMI overhead

The Linux Kernel Mailing List

From: Mark Rutland <mark.rutland@arm.com>
To: kan.liang@intel.com
Cc: peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
	linux-kernel@vger.kernel.org, alexander.shishkin@linux.intel.com,
	tglx@linutronix.de, namhyung@kernel.org, jolsa@kernel.org,
	adrian.hunter@intel.com, wangnan0@huawei.com,
	andi@firstfloor.org
Subject: Re: [PATCH 02/14] perf/x86: output NMI overhead
Date: Thu, 24 Nov 2016 16:19:09 +0000
Message-ID: <20161124161712.GA2444@remoulade>
In-Reply-To: <1479894292-16277-3-git-send-email-kan.liang@intel.com>

On Wed, Nov 23, 2016 at 04:44:40AM -0500, kan.liang@intel.com wrote:
> From: Kan Liang <kan.liang@intel.com>
> 
> The NMI handler is one of the most important sources of overhead.
> 
> There are lots of NMIs during sampling, and it is very expensive to log
> each one. So the accumulated time and NMI count are output when the event
> is about to be disabled or the task is scheduled out.
> The newly introduced flag PERF_EF_LOG indicates that the overhead log
> should be output.
> 
> Signed-off-by: Kan Liang <kan.liang@intel.com>
> ---
>  arch/x86/events/core.c          | 19 ++++++++++++++-
>  arch/x86/events/perf_event.h    |  2 ++
>  include/linux/perf_event.h      |  1 +
>  include/uapi/linux/perf_event.h |  2 ++
>  kernel/events/core.c            | 54 ++++++++++++++++++++++-------------------
>  5 files changed, 52 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index d31735f..6c3b0ef 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -1397,6 +1397,11 @@ static void x86_pmu_del(struct perf_event *event, int flags)
>  
>  	perf_event_update_userpage(event);
>  
> +	if ((flags & PERF_EF_LOG) && cpuc->nmi_overhead.nr) {
> +		cpuc->nmi_overhead.cpu = smp_processor_id();
> +		perf_log_overhead(event, PERF_NMI_OVERHEAD, &cpuc->nmi_overhead);
> +	}
> +
>  do_del:
>  	if (x86_pmu.del) {
>  		/*
> @@ -1475,11 +1480,21 @@ void perf_events_lapic_init(void)
>  	apic_write(APIC_LVTPC, APIC_DM_NMI);
>  }
>  
> +static void
> +perf_caculate_nmi_overhead(u64 time)

s/caculate/calculate/ - this tripped me up when grepping.

> @@ -1492,8 +1507,10 @@ perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
>  	start_clock = sched_clock();
>  	ret = x86_pmu.handle_irq(regs);
>  	finish_clock = sched_clock();
> +	clock = finish_clock - start_clock;
>  
> -	perf_sample_event_took(finish_clock - start_clock);
> +	perf_caculate_nmi_overhead(clock);
> +	perf_sample_event_took(clock);

Ah, so it's the *sampling* overhead, not the NMI overhead.

This doesn't take into account the cost of entering/exiting the handler, which
could be larger than the sampling overhead (e.g. if the PMU is connected
through chained interrupt controllers).
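
To make the distinction concrete, here is a minimal userspace model (not
kernel code) of what the two clock reads bracket. The struct names, the
field names, and the idea of splitting an interrupt into entry, handler, and
exit phases are all assumptions made for illustration; they are not taken
from the patch.

```c
#include <stdint.h>

/* Hypothetical accumulator; the field names are assumptions. */
struct overhead_entry {
	uint64_t nr;	/* number of handler invocations accounted */
	uint64_t time;	/* accumulated handler time, in ns */
};

/* Account one invocation, measured as (finish - start). */
static void account_overhead(struct overhead_entry *e,
			     uint64_t start, uint64_t finish)
{
	e->nr++;
	e->time += finish - start;
}

/* Made-up per-phase costs of one interrupt, in ns. */
struct nmi_cost {
	uint64_t entry;		/* exception entry, controller demux, ... */
	uint64_t handler;	/* the part between the two clock reads */
	uint64_t exit;		/* return to the interrupted context */
};

/*
 * Simulate one interrupt arriving at time 'now'. Only the handler phase
 * is accounted; entry and exit fall outside the measured window.
 * Returns the time at which the interrupted code resumes.
 */
static uint64_t simulate_nmi(struct overhead_entry *e, uint64_t now,
			     const struct nmi_cost *c)
{
	uint64_t start = now + c->entry;
	uint64_t finish = start + c->handler;

	account_overhead(e, start, finish);
	return finish + c->exit;
}
```

With entry and exit costs larger than the handler cost, the accumulated
time in this model is well below the time the interrupted task actually
lost, which is the gap described above.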

>  enum perf_record_overhead_type {
> +	PERF_NMI_OVERHEAD	= 0,

As above, it may be worth calling this PERF_SAMPLE_OVERHEAD: this doesn't count
the entire cost of the NMI, and other architectures may want to implement this
yet don't have NMIs.

[...]

>  static void
>  event_sched_out(struct perf_event *event,
>  		  struct perf_cpu_context *cpuctx,
> -		  struct perf_event_context *ctx)
> +		  struct perf_event_context *ctx,
> +		  bool log_overhead)

Boolean parameters are always confusing. Why not pass the flags directly? That
way we can pass *which* overhead to log, and make the callsites easier to
understand.

>  	event->tstamp_stopped = tstamp;
> -	event->pmu->del(event, 0);
> +	event->pmu->del(event, log_overhead ? PERF_EF_LOG : 0);

... which we could pass on here.
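
A rough sketch of the flags-based shape, written as a standalone userspace
model rather than against the tree. PERF_EF_LOG is the flag from the patch,
but its value here is assumed, the structs are cut-down stand-ins, and the
cpuctx/ctx parameters are dropped for brevity.

```c
/*
 * Userspace model only. PERF_EF_LOG comes from the patch under review;
 * the value below is an assumption, and the structs are minimal
 * stand-ins for the kernel ones.
 */
#define PERF_EF_LOG	0x20	/* assumed value, not taken from the patch */

struct perf_event;

struct pmu {
	void (*del)(struct perf_event *event, int flags);
};

struct perf_event {
	struct pmu *pmu;
};

/* Records the flags the model PMU last saw, so callers can inspect them. */
static int last_del_flags = -1;

static void model_pmu_del(struct perf_event *event, int flags)
{
	(void)event;
	last_del_flags = flags;
}

/*
 * Takes the flags word directly instead of a bool, and hands it to
 * pmu->del() unchanged.
 */
static void event_sched_out(struct perf_event *event, int flags)
{
	event->pmu->del(event, flags);
}
```

A callsite then reads event_sched_out(event, PERF_EF_LOG) or
event_sched_out(event, 0) instead of passing true or false, and a further
flag for another kind of overhead would not need a signature change.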

> @@ -1835,20 +1835,21 @@ event_sched_out(struct perf_event *event,
>  static void
>  group_sched_out(struct perf_event *group_event,
>  		struct perf_cpu_context *cpuctx,
> -		struct perf_event_context *ctx)
> +		struct perf_event_context *ctx,
> +		bool log_overhead)

Likewise.

> @@ -1872,7 +1873,7 @@ __perf_remove_from_context(struct perf_event *event,
>  {
>  	unsigned long flags = (unsigned long)info;
>  
> -	event_sched_out(event, cpuctx, ctx);
> +	event_sched_out(event, cpuctx, ctx, false);
>  	if (flags & DETACH_GROUP)
>  		perf_group_detach(event);
>  	list_del_event(event, ctx);
> @@ -1918,9 +1919,9 @@ static void __perf_event_disable(struct perf_event *event,
>  	update_cgrp_time_from_event(event);
>  	update_group_times(event);
>  	if (event == event->group_leader)
> -		group_sched_out(event, cpuctx, ctx);
> +		group_sched_out(event, cpuctx, ctx, true);
>  	else
> -		event_sched_out(event, cpuctx, ctx);
> +		event_sched_out(event, cpuctx, ctx, true);

Why does this differ from __perf_remove_from_context()?

What's the policy for when we do or do not measure overhead?

Thanks,
Mark.

