From: Andrei Vagin <avagin@gmail.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Dmitry Safonov <dima@arista.com>,
linux-kernel@vger.kernel.org, Adrian Reber <adrian@lisas.de>,
Andrei Vagin <avagin@openvz.org>,
Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
Christian Brauner <christian.brauner@ubuntu.com>,
Cyrill Gorcunov <gorcunov@openvz.org>,
Dmitry Safonov <0x7f454c46@gmail.com>,
"Eric W. Biederman" <ebiederm@xmission.com>,
"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
Jann Horn <jannh@google.com>, Jeff Dike <jdike@addtoit.com>,
Oleg Nesterov <oleg@redhat.com>,
Pavel Emelyanov <xemul@virtuozzo.com>,
Shuah Khan <shuah@kernel.org>,
Vincenzo Frascino <vincenzo.frascino@arm.com>,
containers@lists.linux-foundation.org, criu@openvz.org,
linux-api@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCHv4 26/28] x86/vdso: Align VDSO functions by CPU L1 cache line
Date: Sat, 22 Jun 2019 22:26:48 -0700
Message-ID: <20190623052647.GA9838@gmail.com>
In-Reply-To: <alpine.DEB.2.21.1906141610060.1722@nanos.tec.linutronix.de>
On Fri, Jun 14, 2019 at 04:13:31PM +0200, Thomas Gleixner wrote:
> On Wed, 12 Jun 2019, Dmitry Safonov wrote:
>
> > From: Andrei Vagin <avagin@gmail.com>
> >
> > After performance testing of the VDSO patches, a noticeable 20%
> > regression was found on the gettime_perf selftest with a cold cache.
> > As it turns out, before the introduction of time namespaces, VDSO
> > functions were well aligned to cache lines, but adding new code to
> > adjust the timens offset inside a namespace created a small shift, and
> > the vdso functions became unaligned on cache lines.
> >
> > Add alignment to the vdso functions with a gcc option to fix the
> > performance drop.
> >
> > Copying the resulting numbers from the cover letter:
> >
> > Hot CPU cache (more gettime_perf.c cycles - the better):
> > | before | CONFIG_TIME_NS=n | host | inside timens
> > --------|------------|------------------|-------------|-------------
> > cycles | 139887013 | 139453003 | 139899785 | 128792458
> > diff (%)| 100 | 99.7 | 100 | 92
>
> Why is CONFIG_TIME_NS=n behaving worse than current mainline and
> worse than 'host' mode?
We should have specified the precision of these numbers: the measurement
error is larger than this 0.3%, so at the time I decided that there was
nothing to worry about. I did these measurements a few months ago for the
second version of this series. I repeated the measurements for this set
of patches:
         |  before   | CONFIG_TIME_NS=n |   host    | inside timens
---------|-----------|------------------|-----------|--------------
         | 144645498 | 142916801        | 140364862 | 132378440
         | 143440633 | 141545739        | 140540053 | 132714190
         | 144876395 | 144650599        | 140026814 | 131843318
         | 143984551 | 144595770        | 140359260 | 131683544
         | 144875682 | 143799788        | 140692618 | 131300332
---------|-----------|------------------|-----------|--------------
avg      | 144364551 | 143501739        | 140396721 | 131983964
diff %   | 100       | 99.4             | 97.2      | 91.4
---------|-----------|------------------|-----------|--------------
stdev %  | 0.4       | 0.9              | 0.1       | 0.4
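
The avg, diff % and stdev % rows above can be rechecked from the five
runs. What follows is a minimal Python sketch, not part of the series. It
assumes the stdev row is the sample standard deviation expressed as a
percentage of the column average, which is a guess that appears
consistent with the table. The last digit can differ from the table
depending on rounding versus truncation.

```python
from statistics import mean, stdev

# Cycle counts from the five runs in the table above.
runs = {
    "before": [144645498, 143440633, 144876395, 143984551, 144875682],
    "CONFIG_TIME_NS=n": [142916801, 141545739, 144650599, 144595770, 143799788],
    "host": [140364862, 140540053, 140026814, 140359260, 140692618],
    "inside timens": [132378440, 132714190, 131843318, 131683544, 131300332],
}


def summarize(samples, baseline_avg):
    """Return (avg, diff % against the baseline avg, stdev as % of avg).

    Assumption: stdev is the sample standard deviation (n - 1 divisor).
    """
    avg = mean(samples)
    return avg, 100.0 * avg / baseline_avg, 100.0 * stdev(samples) / avg


baseline_avg = mean(runs["before"])
summary = {name: summarize(samples, baseline_avg)
           for name, samples in runs.items()}

for name, (avg, diff, sd) in summary.items():
    print(f"{name:>18}: avg={avg:.0f} diff={diff:.2f}% stdev={sd:.2f}%")
```

With five samples per column, the difference between "before" and
CONFIG_TIME_NS=n (about 0.6%) is smaller than the spread of the
CONFIG_TIME_NS=n runs themselves.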
>
> > Cold cache (lesser tsc per gettime_perf_cold.c cycle - the better):
> > | before | CONFIG_TIME_NS=n | host | inside timens
> > --------|------------|------------------|-------------|-------------
> > tsc | 6748 | 6718 | 6862 | 12682
> > diff (%)| 100 | 99.6 | 101.7 | 188
>
> Weird, now CONFIG_TIME_NS=n is better than current mainline and 'host' mode
> drops.
The precision of these numbers is much lower than that of the previous
set. These numbers are for the second version of this series, so I
decided to repeat the measurements for this version. When I ran the test,
I found some degradation compared with v5.0. I bisected it and found
that the problem is in 2b539aefe9e4 ("mm/resource: Let
walk_system_ram_range() search child resources"). At this point, I
realized that my test isn't quite right. On each iteration, the test
starts a new process, then does start=rdtsc();clock_gettime();end=rdtsc()
and prints (end-start). The problem is that when clock_gettime() is
called for the first time, the vdso pages are not yet mapped into the
process address space, so the test measures how fast the vdso pages are
mapped into the process address space. I modified the test; it now uses
the clflush instruction to drop the CPU caches. Here are the results:
           | before | CONFIG_TIME_NS=n | host  | inside timens
-----------|--------|------------------|-------|--------------
tsc        | 434    | 433              | 437   | 477
stdev(tsc) | 5      | 5                | 5     | 3
diff (%)   | 100    | 100              | 100.1 | 109
Here is the source code for the modified test:
https://github.com/avagin/linux-task-diag/blob/wip/timens-rfc-v4/tools/testing/selftests/timens/gettime_perf_cold.c
This test does 10K iterations. At first glance, the numbers look
noisy, so I sort them and take only the 8K numbers in the middle:
$ ./gettime_perf_cold > raw
$ cat raw | sort -n | tail -n 9000 | head -n 8000 > results
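
For 10000 input lines, the pipeline above drops the 1000 smallest and
the 1000 largest values and keeps the 8000 in between. The same trimming
can be sketched in Python. This is an illustration only: the helper name
trimmed() is hypothetical, and the demo uses synthetic values rather
than real test output.

```python
from statistics import mean, stdev


def trimmed(samples, drop_low=1000, drop_high=1000):
    """Sort the samples and drop the lowest and highest ones.

    With 10000 samples and the defaults, this keeps the same 8000 values
    as `sort -n | tail -n 9000 | head -n 8000`.
    """
    ordered = sorted(samples)
    return ordered[drop_low:len(ordered) - drop_high]


# Synthetic stand-in for the 10K per-iteration TSC deltas.
demo = list(range(10000, 0, -1))
kept = trimmed(demo)
print(len(kept), kept[0], kept[-1], mean(kept), stdev(kept))
```

Dropping a fixed share from both ends removes outliers such as
iterations hit by an interrupt, at the cost of slightly understating
the spread.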
>
> Either I'm misreading the numbers or missing something or I'm just confused
> as usual :)
>
> Thanks,
> 	tglx