All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andi Kleen <andi@firstfloor.org>
To: David Carrillo-Cisneros <davidcc@google.com>
Cc: Andi Kleen <andi@firstfloor.org>,
	Shivappa Vikas <vikas.shivappa@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Vikas Shivappa <vikas.shivappa@linux.intel.com>,
	linux-kernel <linux-kernel@vger.kernel.org>, x86 <x86@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Shankar, Ravi V" <ravi.v.shankar@intel.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	Fenghua Yu <fenghua.yu@intel.com>,
	Stephane Eranian <eranian@google.com>,
	hpa@zytor.com
Subject: Re: [PATCH 01/14] x86/cqm: Intel Resource Monitoring Documentation
Date: Tue, 27 Dec 2016 15:10:49 -0800	[thread overview]
Message-ID: <20161227231049.GT26852@two.firstfloor.org> (raw)
In-Reply-To: <CALcN6mjDo-0bqZb1ePrWXkAhVbe2TBAvadM3xqEyscovZCQoWQ@mail.gmail.com>

On Tue, Dec 27, 2016 at 01:33:46PM -0800, David Carrillo-Cisneros wrote:
> When using one intel_cmt/llc_occupancy/ cgroup perf_event in one CPU, the
> avg time to do __perf_event_task_sched_out + __perf_event_task_sched_in is
> ~1170ns
> 
> most of the time is spend in cgroup ctx switch (~1120ns) .
> 
> When using continuous monitoring in CQM driver, the avg time to
> find the rmid to write inside of pqr_context switch  is ~16ns
> 
> Note that this excludes the MSR write. It's only the overhead of
> finding the RMID
> to write in PQR_ASSOC. Both paths call the same routine to find the
> RMID, so there are
> about 1100 ns of overhead in perf_cgroup_switch. By inspection I assume most
> of it comes from iterating over the pmu list.

Do Kan's pmu list patches help? 

https://patchwork.kernel.org/patch/9420035/

> 
> > Or is there some other overhead other than the MSR write
> > you're concerned about?
> 
> No, that problem is solved with the PQR software cache introduced in the series.

So it's already fixed?

How much is the cost with your cache?

> 
> 
> > Perhaps some optimization could be done in the code to make it faster,
> > then the new interface wouldn't be needed.
> 
> There are some. One in my list is to create a list of pmus with at
> least one cgroup event
> and use it to iterate over in perf_cgroup_switch, instead of using the
> "pmus" list.
> The pmus list has grown a lot recently with the addition of all the uncore pmus.

Kan's patches above already do that I believe.

> 
> Despite this optimization, it's unlikely that the whole sched_out +
> sched_in gets that
> close to the 15 ns of the non perf_event approach.

It would be good to see how close we can get. I assume
there is more potential for optimizations and fast pathing.

-Andi

  reply	other threads:[~2016-12-27 23:10 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-16 23:12 [PATCH V4 00/14] Cqm2: Intel Cache Monitoring fixes and enhancements Vikas Shivappa
2016-12-16 23:12 ` [PATCH 01/14] x86/cqm: Intel Resource Monitoring Documentation Vikas Shivappa
2016-12-23 12:32   ` Peter Zijlstra
2016-12-23 19:35     ` Shivappa Vikas
2016-12-23 20:33       ` Peter Zijlstra
2016-12-23 21:41         ` Shivappa Vikas
2016-12-25  1:51         ` Shivappa Vikas
2016-12-27  7:13           ` David Carrillo-Cisneros
2016-12-27 20:00           ` Andi Kleen
2016-12-27 20:21             ` Shivappa Vikas
2016-12-27 21:38               ` David Carrillo-Cisneros
2016-12-27 21:33             ` David Carrillo-Cisneros
2016-12-27 23:10               ` Andi Kleen [this message]
2016-12-28  1:23                 ` David Carrillo-Cisneros
2016-12-28 20:03                   ` Shivappa Vikas
2016-12-16 23:12 ` [PATCH 02/14] x86/cqm: Remove cqm recycling/conflict handling Vikas Shivappa
2016-12-16 23:12 ` [PATCH 03/14] x86/rdt: Add rdt common/cqm compile option Vikas Shivappa
2016-12-16 23:12 ` [PATCH 04/14] x86/cqm: Add Per pkg rmid support Vikas Shivappa
2016-12-16 23:12 ` [PATCH 05/14] x86/cqm,perf/core: Cgroup support prepare Vikas Shivappa
2016-12-16 23:13 ` [PATCH 06/14] x86/cqm: Add cgroup hierarchical monitoring support Vikas Shivappa
2016-12-16 23:13 ` [PATCH 07/14] x86/rdt,cqm: Scheduling support update Vikas Shivappa
2016-12-16 23:13 ` [PATCH 08/14] x86/cqm: Add support for monitoring task and cgroup together Vikas Shivappa
2016-12-16 23:13 ` [PATCH 09/14] x86/cqm: Add Continuous cgroup monitoring Vikas Shivappa
2016-12-16 23:13 ` [PATCH 10/14] x86/cqm: Add RMID reuse Vikas Shivappa
2016-12-16 23:13 ` [PATCH 11/14] x86/cqm: Add failure on open and read Vikas Shivappa
2016-12-23 11:58   ` David Carrillo-Cisneros
2016-12-16 23:13 ` [PATCH 12/14] perf/core,x86/cqm: Add read for Cgroup events,per pkg reads Vikas Shivappa
2016-12-16 23:13 ` [PATCH 13/14] perf/stat: fix bug in handling events in error state Vikas Shivappa
2016-12-16 23:13 ` [PATCH 14/14] perf/stat: revamp read error handling, snapshot and per_pkg events Vikas Shivappa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161227231049.GT26852@two.firstfloor.org \
    --to=andi@firstfloor.org \
    --cc=davidcc@google.com \
    --cc=eranian@google.com \
    --cc=fenghua.yu@intel.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=ravi.v.shankar@intel.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=vikas.shivappa@intel.com \
    --cc=vikas.shivappa@linux.intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.