Re: Optimize perf stat for large number of events/cpus v2

From: Jiri Olsa <jolsa@redhat.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: acme@kernel.org, linux-kernel@vger.kernel.org, jolsa@kernel.org,
	eranian@google.com, kan.liang@linux.intel.com,
	peterz@infradead.org
Subject: Re: Optimize perf stat for large number of events/cpus v2
Date: Tue, 22 Oct 2019 10:02:23 +0200
Message-ID: <20191022080223.GC28177@krava>
In-Reply-To: <20191020175202.32456-1-andi@firstfloor.org>

On Sun, Oct 20, 2019 at 10:51:53AM -0700, Andi Kleen wrote:
> [The earlier v1 version had a lot of conflicts against some
> recent libperf changes in tip/perf/core. Resolve that and
> also fix some minor issues.]
> 
> This patch kit optimizes perf stat for a large number of events 
> on systems with many CPUs and PMUs.
> 
> Some profiling shows that most of the overhead comes from sending
> IPIs to all the target CPUs. We can optimize this by using
> sched_setaffinity to set the affinity to a target CPU once and then
> doing the perf operation for all events on that CPU. This requires
> some restructuring, but cuts the set up time quite a bit.
> 
> In theory we could go further by parallelizing these setups
> too, but that would be much more complicated and for now just batching it
> per CPU seems to be sufficient. At some point with many more cores 
> parallelization or a better bulk perf setup API might be needed though.
> 
> In addition, perf does a lot of redundant /sys accesses with
> many PMUs, which can also be expensive. This is also optimized.
> 
> On a large test case (>700 events with many weak groups) on a 94 CPU
> system I go from
> 
> real	0m8.607s
> user	0m0.550s
> sys	0m8.041s
> 
> to 
> 
> real	0m3.269s
> user	0m0.760s
> sys	0m1.694s
> 
> so shaving ~6 seconds of system time, at slightly more cost
> in perf stat itself. On a 4 socket system the savings
> are more dramatic:
> 
> real	0m15.641s
> user	0m0.873s
> sys	0m14.729s
> 
> to 
> 
> real	0m4.493s
> user	0m1.578s
> sys	0m2.444s
> 
> so 11s difference in the user visible set up time.
> 
> Also available in 
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/stat-scale-4
> 
> v1: Initial post.
> v2: Rebase. Fix some minor issues.

looks really helpful, I ack-ed 1st 2 patches,
I'll need more time for the rest

thanks,
jirka
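
For readers following the thread, the per-CPU batching described in the
quoted cover letter can be sketched roughly as below. This is only a
minimal illustration, not the code from the patch series: the names
for_each_event_by_cpu, event_op_fn and count_op are hypothetical, and the
callback stands in for whatever perf operation (open, enable, read,
close) would be applied to each event. The point it shows is the loop
order: the CPU is the outer loop, so the thread migrates once per CPU
instead of triggering a cross-CPU call for every event.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>

/* Hypothetical per-event, per-CPU operation; returns 0 on success. */
typedef int (*event_op_fn)(int event, int cpu, void *ctx);

/* Hypothetical stand-in for a real perf operation: just counts calls. */
int count_op(int event, int cpu, void *ctx)
{
	(void)event;
	(void)cpu;
	(*(int *)ctx)++;
	return 0;
}

/*
 * Apply op to every event on every CPU in cpus[], with the CPU as the
 * outer loop. The calling thread is pinned to each CPU once, so the
 * operations for all events on that CPU are issued locally.
 *
 * Returns the number of successful op calls, or -1 if the original
 * affinity mask could not be saved (for example when the machine has
 * more CPUs than a fixed-size cpu_set_t can describe).
 */
int for_each_event_by_cpu(const int *cpus, size_t nr_cpus, int nr_events,
			  event_op_fn op, void *ctx)
{
	cpu_set_t saved, target;
	int done = 0;

	if (sched_getaffinity(0, sizeof(saved), &saved) != 0)
		return -1;

	for (size_t i = 0; i < nr_cpus; i++) {
		CPU_ZERO(&target);
		CPU_SET(cpus[i], &target);
		/*
		 * Best effort: if pinning fails (CPU offline, cpuset
		 * restrictions), the op still runs, just possibly from
		 * another CPU.
		 */
		sched_setaffinity(0, sizeof(target), &target);

		for (int ev = 0; ev < nr_events; ev++)
			if (op(ev, cpus[i], ctx) == 0)
				done++;
	}

	/* Restore the affinity mask the thread started with. */
	sched_setaffinity(0, sizeof(saved), &saved);
	return done;
}
```

A caller would pass the list of target CPUs and the number of events,
e.g. for_each_event_by_cpu(cpus, nr_cpus, nr_events, count_op, &calls).
Saving and restoring the original mask keeps the batching invisible to
the rest of the program.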

Thread overview: 28+ messages
2019-10-20 17:51 Optimize perf stat for large number of events/cpus v2 Andi Kleen
2019-10-20 17:51 ` [PATCH v2 1/9] perf evsel: Always preserve errno while cleaning up perf_event_open failures Andi Kleen
2019-10-22  8:01   ` Jiri Olsa
2019-11-12 11:18   ` [tip: perf/core] " tip-bot2 for Andi Kleen
2019-10-20 17:51 ` [PATCH v2 2/9] perf evsel: Avoid close(-1) Andi Kleen
2019-10-22  8:01   ` Jiri Olsa
2019-11-12 11:18   ` [tip: perf/core] " tip-bot2 for Andi Kleen
2019-10-20 17:51 ` [PATCH v2 3/9] perf pmu: Use file system cache to optimize sysfs access Andi Kleen
2019-10-23  9:47   ` Jiri Olsa
2019-10-20 17:51 ` [PATCH v2 4/9] perf affinity: Add infrastructure to save/restore affinity Andi Kleen
2019-10-23  9:59   ` Jiri Olsa
2019-10-23 13:02     ` Andi Kleen
2019-10-23 14:30       ` Jiri Olsa
2019-10-23 14:52         ` Andi Kleen
2019-10-23 16:16           ` Alexey Budankov
2019-10-23 17:19             ` Andi Kleen
2019-10-23 18:08               ` Alexey Budankov
2019-10-23 22:37                 ` Andi Kleen
2019-10-24  8:46                   ` Alexey Budankov
2019-10-20 17:51 ` [PATCH v2 5/9] perf evsel: Add iterator to iterate over events ordered by CPU Andi Kleen
2019-10-20 17:51 ` [PATCH v2 6/9] perf stat: Use affinity for closing file descriptors Andi Kleen
2019-10-20 17:52 ` [PATCH v2 7/9] perf stat: Use affinity for opening events Andi Kleen
2019-10-20 17:52 ` [PATCH v2 8/9] perf stat: Use affinity for reading Andi Kleen
2019-10-20 17:52 ` [PATCH v2 9/9] perf stat: Use affinity for enabling/disabling events Andi Kleen
2019-10-23 10:30   ` Jiri Olsa
2019-10-23 13:07     ` Andi Kleen
2019-10-22  8:02 ` Jiri Olsa [this message]
2019-10-22 14:11   ` Optimize perf stat for large number of events/cpus v2 Arnaldo Carvalho de Melo
