From: Peter Zijlstra <peterz@infradead.org>
To: mingo@kernel.org, acme@kernel.org, linux-kernel@vger.kernel.org
Cc: andi@firstfloor.org, eranian@google.com, jolsa@kernel.org,
	torvalds@linux-foundation.org, davidcc@google.com,
	alexander.shishkin@linux.intel.com, namhyung@kernel.org,
	kan.liang@intel.com, khandual@linux.vnet.ibm.com,
	peterz@infradead.org
Subject: [RFC][PATCH 6/7] perf: Optimize perf_pmu_sched_task()
Date: Fri, 08 Jul 2016 15:31:05 +0200
Message-ID: <20160708134113.649783957@infradead.org>
In-Reply-To: <20160708133059.031522978@infradead.org>

[-- Attachment #1: peterz-perf-optimize-sched_task.patch --]
[-- Type: text/plain, Size: 3599 bytes --]

For perf record -b, which requires the pmu::sched_task callback, the
current code is rather expensive:

     7.68%  sched-pipe  [kernel.vmlinux]    [k] perf_pmu_sched_task
     5.95%  sched-pipe  [kernel.vmlinux]    [k] __switch_to
     5.20%  sched-pipe  [kernel.vmlinux]    [k] __intel_pmu_disable_all
     3.95%  sched-pipe  perf                [.] worker_thread

The problem is that it iterates over all registered PMUs, most of which
have nothing to do. Avoid this by keeping an explicit per-CPU list of
the PMUs that have requested the callback.

The perf_sched_cb_{inc,dec}() functions already take the required pmu
argument, and now that these functions are no longer called from NMI
context we can use them to manage a list.
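
To make the list management concrete, here is the resulting (non-diff)
form of the two helpers, reconstructed from the hunks below; the first
increment on a CPU links that CPU's context onto a per-CPU list, and the
last decrement unlinks it again:

  static DEFINE_PER_CPU(struct list_head, sched_cb_list);

  void perf_sched_cb_inc(struct pmu *pmu)
  {
  	struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);

  	/* First user on this CPU: hook the context onto the per-CPU list. */
  	if (!cpuctx->sched_cb_usage++)
  		list_add(&cpuctx->sched_cb_entry, this_cpu_ptr(&sched_cb_list));

  	this_cpu_inc(perf_sched_cb_usages);
  }

  void perf_sched_cb_dec(struct pmu *pmu)
  {
  	struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);

  	this_cpu_dec(perf_sched_cb_usages);

  	/* Last user gone: unhook the context again. */
  	if (!--cpuctx->sched_cb_usage)
  		list_del(&cpuctx->sched_cb_entry);
  }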

With this patch applied, the function no longer shows up in the top 4
(it dropped to 18th place).

     6.67%  sched-pipe  [kernel.vmlinux]    [k] __switch_to
     6.18%  sched-pipe  [kernel.vmlinux]    [k] __intel_pmu_disable_all
     3.92%  sched-pipe  [kernel.vmlinux]    [k] switch_mm_irqs_off
     3.71%  sched-pipe  perf                [.] worker_thread

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/perf_event.h |    3 +++
 kernel/events/core.c       |   43 ++++++++++++++++++++++++-------------------
 2 files changed, 27 insertions(+), 19 deletions(-)

--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -757,6 +757,9 @@ struct perf_cpu_context {
 
 	struct pmu			*unique_pmu;
 	struct perf_cgroup		*cgrp;
+
+	struct list_head		sched_cb_entry;
+	int				sched_cb_usage;
 };
 
 struct perf_output_handle {
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2777,13 +2777,26 @@ static void perf_event_context_sched_out
 	}
 }
 
+static DEFINE_PER_CPU(struct list_head, sched_cb_list);
+
 void perf_sched_cb_dec(struct pmu *pmu)
 {
+	struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+
 	this_cpu_dec(perf_sched_cb_usages);
+
+	if (!--cpuctx->sched_cb_usage)
+		list_del(&cpuctx->sched_cb_entry);
 }
 
+
 void perf_sched_cb_inc(struct pmu *pmu)
 {
+	struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+
+	if (!cpuctx->sched_cb_usage++)
+		list_add(&cpuctx->sched_cb_entry, this_cpu_ptr(&sched_cb_list));
+
 	this_cpu_inc(perf_sched_cb_usages);
 }
 
@@ -2801,34 +2814,24 @@ static void perf_pmu_sched_task(struct t
 {
 	struct perf_cpu_context *cpuctx;
 	struct pmu *pmu;
-	unsigned long flags;
 
 	if (prev == next)
 		return;
 
-	local_irq_save(flags);
-
-	rcu_read_lock();
-
-	list_for_each_entry_rcu(pmu, &pmus, entry) {
-		if (pmu->sched_task) {
-			cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
-
-			perf_ctx_lock(cpuctx, cpuctx->task_ctx);
+	list_for_each_entry(cpuctx, this_cpu_ptr(&sched_cb_list), sched_cb_entry) {
+		pmu = cpuctx->unique_pmu; /* software PMUs will not have sched_task */
 
-			perf_pmu_disable(pmu);
+		if (WARN_ON_ONCE(!pmu->sched_task))
+			continue;
 
-			pmu->sched_task(cpuctx->task_ctx, sched_in);
+		perf_ctx_lock(cpuctx, cpuctx->task_ctx);
+		perf_pmu_disable(pmu);
 
-			perf_pmu_enable(pmu);
+		pmu->sched_task(cpuctx->task_ctx, sched_in);
 
-			perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
-		}
+		perf_pmu_enable(pmu);
+		perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
 	}
-
-	rcu_read_unlock();
-
-	local_irq_restore(flags);
 }
 
 static void perf_event_switch(struct task_struct *task,
@@ -10337,6 +10340,8 @@ static void __init perf_event_init_all_c
 
 		INIT_LIST_HEAD(&per_cpu(pmu_sb_events.list, cpu));
 		raw_spin_lock_init(&per_cpu(pmu_sb_events.lock, cpu));
+
+		INIT_LIST_HEAD(&per_cpu(sched_cb_list, cpu));
 	}
 }
 

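For context only (not part of this patch): patch 2/7 in this series
ensures perf_sched_cb_{inc,dec}() are only called from pmu::{add,del}(),
so a driver that needs the callback drives the list roughly as in the
fragment below. The my_pmu_* names and the has_branch_stack() condition
are illustrative; the real user is the x86 LBR code.

  /* Hypothetical driver fragment; my_pmu_* are made-up names. */
  static int my_pmu_add(struct perf_event *event, int flags)
  {
  	if (has_branch_stack(event))
  		/* the first increment puts this CPU's context on sched_cb_list */
  		perf_sched_cb_inc(event->ctx->pmu);

  	/* ... program the counter ... */
  	return 0;
  }

  static void my_pmu_del(struct perf_event *event, int flags)
  {
  	/* ... stop the counter ... */

  	if (has_branch_stack(event))
  		/* the last decrement takes the context off the list again */
  		perf_sched_cb_dec(event->ctx->pmu);
  }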

Thread overview: 24+ messages
2016-07-08 13:30 [RFC][PATCH 0/7] perf: Branch stack annotation and fixes Peter Zijlstra
2016-07-08 13:31 ` [RFC][PATCH 1/7] perf/x86/intel: Rework the large PEBS setup code Peter Zijlstra
2016-07-08 16:36   ` Jiri Olsa
2016-07-08 22:00     ` Peter Zijlstra
2016-07-08 22:25       ` Peter Zijlstra
2016-07-10  9:08         ` Jiri Olsa
2016-07-08 13:31 ` [RFC][PATCH 2/7] perf,x86: Ensure perf_sched_cb_{inc,dec}() is only called from pmu::{add,del}() Peter Zijlstra
2016-07-08 13:31 ` [RFC][PATCH 3/7] perf/x86/intel: DCE intel_pmu_lbr_del() Peter Zijlstra
2016-07-08 13:31 ` [RFC][PATCH 4/7] perf/x86/intel: Remove redundant test from intel_pmu_lbr_add() Peter Zijlstra
2016-07-08 13:31 ` [RFC][PATCH 5/7] perf/x86/intel: Clean up LBR state tracking Peter Zijlstra
2016-07-08 13:31 ` [RFC][PATCH 6/7] perf: Optimize perf_pmu_sched_task() Peter Zijlstra [this message]
2016-07-08 13:31 ` [RFC][PATCH 7/7] perf/annotate: Add branch stack / basic block information Peter Zijlstra
2016-07-08 14:55   ` Ingo Molnar
2016-07-08 16:27     ` Peter Zijlstra
2016-07-08 16:36       ` Peter Zijlstra
2016-09-08 16:18         ` Arnaldo Carvalho de Melo
2016-09-08 16:41           ` Peter Zijlstra
2016-09-08 16:51             ` Peter Zijlstra
2016-09-08 17:07               ` Arnaldo Carvalho de Melo
2016-09-08 16:43           ` Stephane Eranian
2016-09-08 16:59             ` Andi Kleen
2016-09-08 17:11               ` Arnaldo Carvalho de Melo
2016-09-09  2:40                 ` Jin, Yao
2016-09-08 18:15             ` Peter Zijlstra
