Re: [PATCH 02/14] perf/x86: output NMI overhead

From: Mark Rutland <mark.rutland@arm.com>
To: kan.liang@intel.com
Cc: peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
	linux-kernel@vger.kernel.org, alexander.shishkin@linux.intel.com,
	tglx@linutronix.de, namhyung@kernel.org, jolsa@kernel.org,
	adrian.hunter@intel.com, wangnan0@huawei.com,
	andi@firstfloor.org
Subject: Re: [PATCH 02/14] perf/x86: output NMI overhead
Date: Thu, 24 Nov 2016 16:19:09 +0000
Message-ID: <20161124161712.GA2444@remoulade>
In-Reply-To: <1479894292-16277-3-git-send-email-kan.liang@intel.com>

On Wed, Nov 23, 2016 at 04:44:40AM -0500, kan.liang@intel.com wrote:
> From: Kan Liang <kan.liang@intel.com>
> 
> NMI handler is one of the most important part which brings overhead.
> 
> There are lots of NMI during sampling. It's very expensive to log each
> NMI. So the accumulated time and NMI# will be output when event is going
> to be disabled or task is scheduling out.
> The newly introduced flag PERF_EF_LOG indicate to output the overhead
> log.
> 
> Signed-off-by: Kan Liang <kan.liang@intel.com>
> ---
>  arch/x86/events/core.c          | 19 ++++++++++++++-
>  arch/x86/events/perf_event.h    |  2 ++
>  include/linux/perf_event.h      |  1 +
>  include/uapi/linux/perf_event.h |  2 ++
>  kernel/events/core.c            | 54 ++++++++++++++++++++++-------------------
>  5 files changed, 52 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index d31735f..6c3b0ef 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -1397,6 +1397,11 @@ static void x86_pmu_del(struct perf_event *event, int flags)
>  
>  	perf_event_update_userpage(event);
>  
> +	if ((flags & PERF_EF_LOG) && cpuc->nmi_overhead.nr) {
> +		cpuc->nmi_overhead.cpu = smp_processor_id();
> +		perf_log_overhead(event, PERF_NMI_OVERHEAD, &cpuc->nmi_overhead);
> +	}
> +
>  do_del:
>  	if (x86_pmu.del) {
>  		/*
> @@ -1475,11 +1480,21 @@ void perf_events_lapic_init(void)
>  	apic_write(APIC_LVTPC, APIC_DM_NMI);
>  }
>  
> +static void
> +perf_caculate_nmi_overhead(u64 time)

s/caculate/calculate/ - this tripped me up when grepping.

> @@ -1492,8 +1507,10 @@ perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
>  	start_clock = sched_clock();
>  	ret = x86_pmu.handle_irq(regs);
>  	finish_clock = sched_clock();
> +	clock = finish_clock - start_clock;
>  
> -	perf_sample_event_took(finish_clock - start_clock);
> +	perf_caculate_nmi_overhead(clock);
> +	perf_sample_event_took(clock);

Ah, so it's the *sampling* overhead, not the NMI overhead.

This doesn't take into account the cost of entering/exiting the handler, which
could be larger than the sampling overhead (e.g. if the PMU is connected
through chained interrupt controllers).
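To make the distinction concrete, here is a rough userspace sketch of what
the two clock reads can and cannot see. All demo_* names are made up for
illustration; this is not the body of the function in the patch, just my
assumption of a simple count-plus-time accumulator.

```c
#include <stdint.h>

/* Hypothetical accumulator: number of samples and total time spent in them. */
struct demo_overhead {
	uint64_t nr;
	uint64_t time;
};

/*
 * Account one handler invocation. Only the window between the two clock
 * reads (start..finish) is counted, i.e. the sampling work itself.
 */
static void demo_account_sample(struct demo_overhead *oh,
				uint64_t start, uint64_t finish)
{
	oh->nr++;
	oh->time += finish - start;
}

/*
 * Time that the accounting above never sees: everything between taking the
 * interrupt (entry) and the first clock read, plus everything between the
 * second clock read and returning from the interrupt (exit).
 */
static uint64_t demo_uncounted_cost(uint64_t entry, uint64_t start,
				    uint64_t finish, uint64_t exit)
{
	return (start - entry) + (exit - finish);
}
```

Whatever demo_uncounted_cost() would return is invisible to the logged
value, which is why I'd call the logged value sampling overhead.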

>  enum perf_record_overhead_type {
> +	PERF_NMI_OVERHEAD	= 0,

As above, it may be worth calling this PERF_SAMPLE_OVERHEAD; this doesn't count
the entire cost of the NMI, and other architectures that don't have NMIs may
want to implement this too.

[...]

>  static void
>  event_sched_out(struct perf_event *event,
>  		  struct perf_cpu_context *cpuctx,
> -		  struct perf_event_context *ctx)
> +		  struct perf_event_context *ctx,
> +		  bool log_overhead)

Boolean parameters are always confusing. Why not pass the flags directly? That
way we can pass *which* overhead to log, and make the callsites easier to
understand.

>  	event->tstamp_stopped = tstamp;
> -	event->pmu->del(event, 0);
> +	event->pmu->del(event, log_overhead ? PERF_EF_LOG : 0);

... which we could pass on here.
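Something along these lines is what I have in mind. This is only a
standalone sketch: the demo_* types, the DEMO_EF_* flag names and their
values are all hypothetical, and are not the kernel's definitions.

```c
/* Hypothetical flags, one bit per kind of overhead the caller wants logged. */
#define DEMO_EF_LOG_SAMPLE	(1U << 0)
#define DEMO_EF_LOG_MUX		(1U << 1)

/* Stand-in for the event; it just remembers the flags del() last saw. */
struct demo_event {
	unsigned int last_del_flags;
};

/* Stand-in for pmu->del(): receives the caller's flags unmodified. */
static void demo_pmu_del(struct demo_event *event, unsigned int flags)
{
	event->last_del_flags = flags;
}

/*
 * Takes flags rather than a bool, so the callsite states which overhead
 * (if any) should be logged, and the value is passed straight through.
 */
static void demo_event_sched_out(struct demo_event *event, unsigned int flags)
{
	demo_pmu_del(event, flags);
}
```

A callsite then reads demo_event_sched_out(event, DEMO_EF_LOG_SAMPLE) or
demo_event_sched_out(event, 0), rather than a bare true/false.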

> @@ -1835,20 +1835,21 @@ event_sched_out(struct perf_event *event,
>  static void
>  group_sched_out(struct perf_event *group_event,
>  		struct perf_cpu_context *cpuctx,
> -		struct perf_event_context *ctx)
> +		struct perf_event_context *ctx,
> +		bool log_overhead)

Likewise.

> @@ -1872,7 +1873,7 @@ __perf_remove_from_context(struct perf_event *event,
>  {
>  	unsigned long flags = (unsigned long)info;
>  
> -	event_sched_out(event, cpuctx, ctx);
> +	event_sched_out(event, cpuctx, ctx, false);
>  	if (flags & DETACH_GROUP)
>  		perf_group_detach(event);
>  	list_del_event(event, ctx);
> @@ -1918,9 +1919,9 @@ static void __perf_event_disable(struct perf_event *event,
>  	update_cgrp_time_from_event(event);
>  	update_group_times(event);
>  	if (event == event->group_leader)
> -		group_sched_out(event, cpuctx, ctx);
> +		group_sched_out(event, cpuctx, ctx, true);
>  	else
> -		event_sched_out(event, cpuctx, ctx);
> +		event_sched_out(event, cpuctx, ctx, true);

Why does this differ from __perf_remove_from_context()?

What's the policy for when we do or do not measure overhead?

Thanks,
Mark.
