From: Jiri Olsa <jolsa@redhat.com>
To: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Namhyung Kim <namhyung@kernel.org>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Andi Kleen <ak@linux.intel.com>,
linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 0/4] Reduce NUMA related overhead in perf record profiling on large server systems
Date: Wed, 9 Jan 2019 15:41:25 +0100
Message-ID: <20190109144125.GA2515@krava>
In-Reply-To: <b31ddfd5-a017-d8da-14eb-d1b48bfc255b@linux.intel.com>
On Wed, Jan 09, 2019 at 12:19:20PM +0300, Alexey Budankov wrote:
>
> It has been observed that the trace reading thread runs on the same hw
> thread most of the time during perf record sample collection. This
> scheduling layout leads to up to 30% profiling overhead when a cpu
> intensive workload fully utilizes a large NUMA server system. The overhead
> usually arises from remote (cross-node) HW and memory references, which
> have much longer latencies than local ones [1].
>
> This patch set implements an --affinity option that eliminates the 30%
> overhead completely for serial trace streaming (--affinity=cpu) and
> reduces it from 30% to 10% for AIO1 (--aio=1) trace streaming
> (--affinity=node|cpu). See the OVERHEAD section below for more details.
>
> The implemented extension gives users the ability to instruct the Perf
> tool to bounce the trace reading thread's affinity mask between NUMA nodes
> (--affinity=node) or to pin the thread to the exact cpu (--affinity=cpu)
> that the trace buffer being processed belongs to.
>
> The extension brings an improvement under full system utilization, when
> the Perf tool process contends with the workload process for cpu cores. If
> a system has free cores to execute the Perf tool process during profiling,
> the default system scheduling layout induces the lowest overhead.
>
> The patch set has been validated on the BT benchmark from the NAS Parallel
> Benchmarks [2] running on a dual socket, 44 cores, 88 hw threads Broadwell
> system with kernels v4.4.0-21-generic (Ubuntu 16.04) and v4.20.0-rc5
> (tip perf/core).
>
> OVERHEAD:
>                                 BENCH REPORT BASED   ELAPSED TIME BASED
>   v4.20.0-rc5
>   (tip perf/core):
>
> (current) SERIAL-SYS  / BASE : 1.27x (14.37/11.31), 1.29x (15.19/11.69)
>           SERIAL-NODE / BASE : 1.15x (13.04/11.31), 1.17x (13.79/11.69)
>           SERIAL-CPU  / BASE : 1.00x (11.32/11.31), 1.01x (11.89/11.69)
>
>           AIO1-SYS    / BASE : 1.29x (14.58/11.31), 1.29x (15.26/11.69)
>           AIO1-NODE   / BASE : 1.08x (12.23/11.31), 1.11x (13.01/11.69)
>           AIO1-CPU    / BASE : 1.07x (12.14/11.31), 1.08x (12.83/11.69)
>
>   v4.4.0-21-generic
>   (Ubuntu 16.04 LTS):
>
> (current) SERIAL-SYS  / BASE : 1.26x (13.73/10.87), 1.29x (14.69/11.32)
>           SERIAL-NODE / BASE : 1.19x (13.02/10.87), 1.23x (14.03/11.32)
>           SERIAL-CPU  / BASE : 1.03x (11.21/10.87), 1.07x (12.18/11.32)
>
>           AIO1-SYS    / BASE : 1.26x (13.73/10.87), 1.29x (14.69/11.32)
>           AIO1-NODE   / BASE : 1.10x (12.04/10.87), 1.15x (13.03/11.32)
>           AIO1-CPU    / BASE : 1.12x (12.20/10.87), 1.15x (13.09/11.32)
>
> The patch set is generated against the acme perf/core repository.
>
> ---
> Alexey Budankov (4):
> perf record: allocate affinity masks
> perf record: bind the AIO user space buffers to nodes
> perf record: apply affinity masks when reading mmap buffers
> perf record: implement --affinity=node|cpu option
hi,
can't apply your code on Arnaldo's latest perf/core:
Applying: perf record: allocate affinity masks
Applying: perf record: bind the AIO user space buffers to nodes
Applying: perf record: apply affinity masks when reading mmap buffers
Applying: perf record: implement --affinity=node|cpu option
error: corrupt patch at line 62
Patch failed at 0004 perf record: implement --affinity=node|cpu option
Use 'git am --show-current-patch' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
jirka
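The two --affinity modes described in the quoted cover letter can be illustrated with a minimal C sketch built on Linux's sched_setaffinity(2). This is not the code from the patch set: enum affinity_mode, build_affinity_mask() and bind_reader_to_buffer() are hypothetical names, and the caller is assumed to supply the list of cpus on the buffer's NUMA node instead of the sketch reading the real system topology.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical names mirroring perf record's --affinity=node|cpu modes. */
enum affinity_mode {
	AFFINITY_SYS,	/* default: leave thread placement to the scheduler */
	AFFINITY_NODE,	/* allow every cpu of the buffer's NUMA node */
	AFFINITY_CPU,	/* pin to the cpu that owns the buffer */
};

/*
 * Build the affinity mask for the trace reading thread before it drains
 * the ring buffer of buf_cpu. node_cpus[] lists the cpus of the NUMA node
 * that buf_cpu belongs to; in this sketch the caller supplies it (an
 * assumption), a real tool would derive it from the system topology.
 * Returns 0 if the mask should be applied, -1 for AFFINITY_SYS (no change).
 */
static int build_affinity_mask(enum affinity_mode mode, int buf_cpu,
			       const int *node_cpus, size_t nr_node_cpus,
			       cpu_set_t *mask)
{
	assert(buf_cpu >= 0 && buf_cpu < CPU_SETSIZE);
	CPU_ZERO(mask);

	switch (mode) {
	case AFFINITY_CPU:
		CPU_SET(buf_cpu, mask);
		return 0;
	case AFFINITY_NODE:
		for (size_t i = 0; i < nr_node_cpus; i++)
			CPU_SET(node_cpus[i], mask);
		return 0;
	case AFFINITY_SYS:
	default:
		return -1;
	}
}

/*
 * Move the calling thread next to the buffer it is about to read.
 * With pid == 0, sched_setaffinity() applies to the calling thread.
 * Returns 0 on success, -1 (with errno set) if the kernel rejects the mask.
 */
static int bind_reader_to_buffer(enum affinity_mode mode, int buf_cpu,
				 const int *node_cpus, size_t nr_node_cpus)
{
	cpu_set_t mask;

	if (build_affinity_mask(mode, buf_cpu, node_cpus, nr_node_cpus, &mask))
		return 0; /* AFFINITY_SYS: keep the current mask */
	return sched_setaffinity(0, sizeof(mask), &mask);
}
```

In this sketch the node mode only narrows the reading thread to cpus local to the buffer's node and still lets the scheduler choose among them, while the cpu mode removes that choice entirely; both pay one sched_setaffinity() call per re-bind.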
Thread overview: 14+ messages
2019-01-09 9:19 [PATCH v3 0/4] Reduce NUMA related overhead in perf record profiling on large server systems Alexey Budankov
2019-01-09 9:35 ` [PATCH v3 1/4] perf record: allocate affinity masks Alexey Budankov
2019-01-09 9:37 ` [PATCH v3 2/4] perf record: bind the AIO user space buffers to nodes Alexey Budankov
2019-01-09 15:58 ` Jiri Olsa
2019-01-09 16:58 ` Alexey Budankov
2019-01-09 9:38 ` [PATCH v3 3/4] perf record: apply affinity masks when reading mmap buffers Alexey Budankov
2019-01-09 16:53 ` Jiri Olsa
2019-01-10 9:41 ` Alexey Budankov
2019-01-10 9:54 ` Jiri Olsa
2019-01-10 10:19 ` Alexey Budankov
2019-01-09 9:40 ` [PATCH v3 4/4] perf record: implement --affinity=node|cpu option Alexey Budankov
2019-01-09 14:41 ` Jiri Olsa [this message]
2019-01-09 15:51 ` [PATCH v3 0/4] Reduce NUMA related overhead in perf record profiling on large server systems Jiri Olsa
2019-01-09 16:11 ` Alexey Budankov