Re: [RFC 2/2] perf/core: Invoke pmu::sched_task callback for per-cpu events

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: "Liang, Kan" <kan.liang@linux.intel.com>
To: Stephane Eranian <eranian@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Ingo Molnar <mingo@kernel.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Jiri Olsa <jolsa@redhat.com>, LKML <linux-kernel@vger.kernel.org>,
	Andi Kleen <ak@linux.intel.com>, Ian Rogers <irogers@google.com>,
	Gabriel Marin <gmx@google.com>
Subject: Re: [RFC 2/2] perf/core: Invoke pmu::sched_task callback for per-cpu events
Date: Thu, 5 Nov 2020 18:12:20 -0500	[thread overview]
Message-ID: <d86b24aa-2887-0e29-2221-925acef77f8d@linux.intel.com> (raw)
In-Reply-To: <CABPqkBSJPyE0txSgRzRVucMjd+X1XyUiGe141Kcn1y3v7G9k=g@mail.gmail.com>



On 11/5/2020 4:15 PM, Stephane Eranian wrote:
> On Thu, Nov 5, 2020 at 11:40 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>>
>> On 11/5/2020 10:54 AM, Namhyung Kim wrote:
>>>> -void perf_sched_cb_inc(struct pmu *pmu)
>>>> +void perf_sched_cb_inc(struct pmu *pmu, bool systemwide)
>>>>     {
>>>>           struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
>>>>
>>>> -       if (!cpuctx->sched_cb_usage++)
>>>> -               list_add(&cpuctx->sched_cb_entry, this_cpu_ptr(&sched_cb_list));
>>>> +       cpuctx->sched_cb_usage++;
>>>>
>>>> -       this_cpu_inc(perf_sched_cb_usages);
>>>> +       if (systemwide) {
>>>> +               this_cpu_inc(perf_sched_cb_usages);
>>>> +               list_add(&cpuctx->sched_cb_entry, this_cpu_ptr(&sched_cb_list));
>>> You need to check the value and make sure it's added only once.
>>
>> Right, maybe we have to add a new variable for that.
>>
> Sure, I tend to agree here that we need a narrower scope trigger, only
> when needed, i.e., an event
> or PMU feature that requires ctxsw work. In fact, I am also interested
> in splitting ctxswin and ctswout
> callbacks. The reason is that you have overhead in the way the
> callback is invoked. You may end up
> calling the sched_task on ctxswout when only ctxwin is needed. In
> doing that you pay the cost of
> stopping/starting the PMU for possibly nothing. Stopping the PMU can
> be expensive, like on AMD
> where you need multiple wrmsr.
> 
> So splitting or adding a flag to convey that either CTXSW_IN or
> CTXSW_OUT is needed would help.
> I am suggesting this now given you are adding a flag.
> 

Yes, adding a new flag would avoid the unnecessary PMU stop/restart for 
both per-cpu and per-process event.

How about the patch as below?
(Just did simple test. Should need another patch to split the callback.)

diff --git a/arch/powerpc/perf/core-book3s.c 
b/arch/powerpc/perf/core-book3s.c
index 6586f7e71cfb..e5837fdf82e3 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -370,6 +370,7 @@ static void power_pmu_bhrb_reset(void)
  static void power_pmu_bhrb_enable(struct perf_event *event)
  {
  	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
+	int state = PERF_SCHED_CB_SW_IN;

  	if (!ppmu->bhrb_nr)
  		return;
@@ -380,7 +381,11 @@ static void power_pmu_bhrb_enable(struct perf_event 
*event)
  		cpuhw->bhrb_context = event->ctx;
  	}
  	cpuhw->bhrb_users++;
-	perf_sched_cb_inc(event->ctx->pmu);
+
+	if (!(event->attach_state & PERF_ATTACH_TASK))
+		state |= PERF_SCHED_CB_CPU;
+
+	perf_sched_cb_inc(event->ctx->pmu, state);
  }

  static void power_pmu_bhrb_disable(struct perf_event *event)
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 444e5f061d04..f4f9d737ca85 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1019,12 +1019,16 @@ pebs_update_state(bool needed_cb, struct 
cpu_hw_events *cpuc,
  	 * that does not hurt:
  	 */
  	bool update = cpuc->n_pebs == 1;
+	int state = PERF_SCHED_CB_SW_OUT;

  	if (needed_cb != pebs_needs_sched_cb(cpuc)) {
+		if (!(event->attach_state & PERF_ATTACH_TASK))
+			state |= PERF_SCHED_CB_CPU;
+
  		if (!needed_cb)
-			perf_sched_cb_inc(pmu);
+			perf_sched_cb_inc(pmu, state);
  		else
-			perf_sched_cb_dec(pmu);
+			perf_sched_cb_dec(pmu, state);

  		update = true;
  	}
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 8961653c5dd2..e4c500580df5 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -660,6 +660,7 @@ void intel_pmu_lbr_add(struct perf_event *event)
  {
  	struct kmem_cache *kmem_cache = event->pmu->task_ctx_cache;
  	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	int state = PERF_SCHED_CB_SW_IN;

  	if (!x86_pmu.lbr_nr)
  		return;
@@ -693,7 +694,13 @@ void intel_pmu_lbr_add(struct perf_event *event)
  	 */
  	if (x86_pmu.intel_cap.pebs_baseline && event->attr.precise_ip > 0)
  		cpuc->lbr_pebs_users++;
-	perf_sched_cb_inc(event->ctx->pmu);
+
+	if (!(event->attach_state & PERF_ATTACH_TASK))
+		state |= PERF_SCHED_CB_CPU;
+	if (event->attach_state & PERF_ATTACH_TASK_DATA)
+		state |= PERF_SCHED_CB_SW_OUT;
+
+	perf_sched_cb_inc(event->ctx->pmu, state);
  	if (!cpuc->lbr_users++ && !event->total_time_running)
  		intel_pmu_lbr_reset();

@@ -724,6 +731,7 @@ void release_lbr_buffers(void)
  void intel_pmu_lbr_del(struct perf_event *event)
  {
  	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	int state = 0;

  	if (!x86_pmu.lbr_nr)
  		return;
@@ -740,7 +748,10 @@ void intel_pmu_lbr_del(struct perf_event *event)
  	cpuc->lbr_users--;
  	WARN_ON_ONCE(cpuc->lbr_users < 0);
  	WARN_ON_ONCE(cpuc->lbr_pebs_users < 0);
-	perf_sched_cb_dec(event->ctx->pmu);
+
+	if (!(event->attach_state & PERF_ATTACH_TASK))
+		state |= PERF_SCHED_CB_CPU;
+	perf_sched_cb_dec(event->ctx->pmu, state);
  }

  static inline bool vlbr_exclude_host(void)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index a1b91f2de264..1556fdbfc615 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -854,6 +854,10 @@ struct perf_event_context {
   */
  #define PERF_NR_CONTEXTS	4

+#define PERF_SCHED_CB_CPU	0x1
+#define PERF_SCHED_CB_SW_IN	0x2
+#define PERF_SCHED_CB_SW_OUT	0x4
+
  /**
   * struct perf_event_cpu_context - per cpu event context structure
   */
@@ -875,6 +879,8 @@ struct perf_cpu_context {

  	struct list_head		sched_cb_entry;
  	int				sched_cb_usage;
+	int				sched_cb_cpu_usage;
+	int				sched_cb_state;

  	int				online;
  	/*
@@ -967,8 +973,8 @@ extern const struct perf_event_attr 
*perf_event_attrs(struct perf_event *event);
  extern void perf_event_print_debug(void);
  extern void perf_pmu_disable(struct pmu *pmu);
  extern void perf_pmu_enable(struct pmu *pmu);
-extern void perf_sched_cb_dec(struct pmu *pmu);
-extern void perf_sched_cb_inc(struct pmu *pmu);
+extern void perf_sched_cb_dec(struct pmu *pmu, int state);
+extern void perf_sched_cb_inc(struct pmu *pmu, int state);
  extern int perf_event_task_disable(void);
  extern int perf_event_task_enable(void);

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 66a9bd71f3da..3a9d017a75b6 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3384,6 +3384,34 @@ static void perf_event_sync_stat(struct 
perf_event_context *ctx,
  	}
  }

+static inline bool perf_pmu_sched_in_ctx(struct perf_cpu_context *cpuctx)
+{
+	return !!(cpuctx->sched_cb_state & PERF_SCHED_CB_SW_IN);
+}
+
+static inline bool perf_pmu_sched_out_ctx(struct perf_cpu_context *cpuctx)
+{
+	return !!(cpuctx->sched_cb_state & PERF_SCHED_CB_SW_OUT);
+}
+
+static inline bool perf_pmu_has_sched_task(struct perf_cpu_context 
*cpuctx, bool sched_in)
+{
+	struct pmu *pmu;
+
+	pmu = cpuctx->ctx.pmu; /* software PMUs will not have sched_task */
+
+	if (!pmu->sched_task)
+		return false;
+
+	if (sched_in && perf_pmu_sched_in_ctx(cpuctx))
+		return true;
+
+	if (!sched_in && perf_pmu_sched_out_ctx(cpuctx))
+		return true;
+
+	return false;
+}
+
  static void perf_event_context_sched_out(struct task_struct *task, int 
ctxn,
  					 struct task_struct *next)
  {
@@ -3433,7 +3461,7 @@ static void perf_event_context_sched_out(struct 
task_struct *task, int ctxn,

  			perf_pmu_disable(pmu);

-			if (cpuctx->sched_cb_usage && pmu->sched_task)
+			if (cpuctx->sched_cb_usage && perf_pmu_has_sched_task(cpuctx, false))
  				pmu->sched_task(ctx, false);

  			/*
@@ -3473,7 +3501,7 @@ static void perf_event_context_sched_out(struct 
task_struct *task, int ctxn,
  		raw_spin_lock(&ctx->lock);
  		perf_pmu_disable(pmu);

-		if (cpuctx->sched_cb_usage && pmu->sched_task)
+		if (cpuctx->sched_cb_usage && perf_pmu_has_sched_task(cpuctx, false))
  			pmu->sched_task(ctx, false);
  		task_ctx_sched_out(cpuctx, ctx, EVENT_ALL);

@@ -3484,22 +3512,35 @@ static void perf_event_context_sched_out(struct 
task_struct *task, int ctxn,

  static DEFINE_PER_CPU(struct list_head, sched_cb_list);

-void perf_sched_cb_dec(struct pmu *pmu)
+void perf_sched_cb_dec(struct pmu *pmu, int state)
  {
  	struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);

+	if (!--cpuctx->sched_cb_usage)
+		cpuctx->sched_cb_state = 0;
+
+	if (!(state & PERF_SCHED_CB_CPU))
+		return;
+
  	this_cpu_dec(perf_sched_cb_usages);

-	if (!--cpuctx->sched_cb_usage)
+	if (!--cpuctx->sched_cb_cpu_usage)
  		list_del(&cpuctx->sched_cb_entry);
  }


-void perf_sched_cb_inc(struct pmu *pmu)
+void perf_sched_cb_inc(struct pmu *pmu, int state)
  {
  	struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);

-	if (!cpuctx->sched_cb_usage++)
+	cpuctx->sched_cb_usage++;
+
+	cpuctx->sched_cb_state |= state;
+
+	if (!(state & PERF_SCHED_CB_CPU))
+		return;
+
+	if (!cpuctx->sched_cb_cpu_usage++)
  		list_add(&cpuctx->sched_cb_entry, this_cpu_ptr(&sched_cb_list));

  	this_cpu_inc(perf_sched_cb_usages);
@@ -3544,8 +3585,8 @@ static void perf_pmu_sched_task(struct task_struct 
*prev,
  		/* will be handled in perf_event_context_sched_in/out */
  		if (cpuctx->task_ctx)
  			continue;
-
-		__perf_pmu_sched_task(cpuctx, sched_in);
+		if (perf_pmu_has_sched_task(cpuctx, sched_in))
+			__perf_pmu_sched_task(cpuctx, sched_in);
  	}
  }

@@ -3809,7 +3850,8 @@ static void perf_event_context_sched_in(struct 
perf_event_context *ctx,

  	cpuctx = __get_cpu_context(ctx);
  	if (cpuctx->task_ctx == ctx) {
-		if (cpuctx->sched_cb_usage)
+		if (cpuctx->sched_cb_usage &&
+		    (cpuctx->sched_cb_state & PERF_SCHED_CB_SW_IN))
  			__perf_pmu_sched_task(cpuctx, true);
  		return;
  	}
@@ -3835,7 +3877,7 @@ static void perf_event_context_sched_in(struct 
perf_event_context *ctx,
  		cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
  	perf_event_sched_in(cpuctx, ctx, task);

-	if (cpuctx->sched_cb_usage && pmu->sched_task)
+	if (cpuctx->sched_cb_usage && perf_pmu_has_sched_task(cpuctx, true))
  		pmu->sched_task(cpuctx->task_ctx, true);

  	perf_pmu_enable(pmu);



Thanks,
Kan

next prev parent reply	other threads:[~2020-11-05 23:12 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-02 14:52 [RFC 0/2] perf/core: Invoke pmu::sched_task callback for cpu events Namhyung Kim
2020-11-02 14:52 ` [RFC 1/2] perf/core: Enable sched_task callbacks if PMU has it Namhyung Kim
2020-11-05 14:47   ` Liang, Kan
2020-11-05 15:45     ` Namhyung Kim
2020-11-05 19:01       ` Liang, Kan
2020-11-06  0:53         ` Namhyung Kim
2020-11-06 16:11           ` Liang, Kan
2020-11-02 14:52 ` [RFC 2/2] perf/core: Invoke pmu::sched_task callback for per-cpu events Namhyung Kim
2020-11-05 14:48   ` Liang, Kan
2020-11-05 15:54     ` Namhyung Kim
2020-11-05 19:39       ` Liang, Kan
2020-11-05 21:15         ` Stephane Eranian
2020-11-05 23:12           ` Liang, Kan [this message]
2020-11-05  8:29 ` [RFC 0/2] perf/core: Invoke pmu::sched_task callback for cpu events Stephane Eranian

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:6586f7e71cf dfblob:e5837fdf82e dfblob:444e5f061d0
dfblob:f4f9d737ca8 dfblob:8961653c5dd dfblob:e4c500580df
dfblob:a1b91f2de26 dfblob:1556fdbfc61 dfblob:66a9bd71f3d
dfblob:3a9d017a75b )
 OR (
bs:"Re: [RFC 2/2] perf/core: Invoke pmu::sched_task callback for per-cpu events" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d86b24aa-2887-0e29-2221-925acef77f8d@linux.intel.com \
    --to=kan.liang@linux.intel.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=acme@kernel.org \
    --cc=ak@linux.intel.com \
    --cc=eranian@google.com \
    --cc=gmx@google.com \
    --cc=irogers@google.com \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=namhyung@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox