From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vincenzo Frascino
Subject: Re: [PATCHv7 18/33] lib/vdso: Add unlikely() hint into vdso_read_begin()
Date: Thu, 24 Oct 2019 10:30:20 +0100
Message-ID: <9aa9857e-ee1c-0117-bfcb-45fc6bcab866@arm.com>
References: <20191011012341.846266-1-dima@arista.com>
 <20191011012341.846266-19-dima@arista.com>
 <100f6921-9081-7eb0-7acc-f10cfb647c21@arm.com>
 <20191024061311.GA4541@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=koi8-r
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <20191024061311.GA4541@gmail.com>
Content-Language: en-US
Sender: linux-kernel-owner@vger.kernel.org
To: Andrei Vagin
Cc: Dmitry Safonov, linux-kernel@vger.kernel.org,
 Dmitry Safonov <0x7f454c46@gmail.com>, Adrian Reber, Andrei Vagin,
 Andy Lutomirski, Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
 "Eric W. Biederman", "H. Peter Anvin", Ingo Molnar, Jann Horn,
 Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
 Thomas Gleixner, containers@lists.linux-foundation.org,
 criu@openvz.org, linux-api@vger.kernel.org, x86@kernel.org
List-Id: linux-api@vger.kernel.org

Hi Andrei,

On 10/24/19 7:13 AM, Andrei Vagin wrote:
> On Wed, Oct 16, 2019 at 12:24:14PM +0100, Vincenzo Frascino wrote:
>> On 10/11/19 2:23 AM, Dmitry Safonov wrote:
>>> From: Andrei Vagin
>>>
>>> Place the branch with no concurrent write before contended case.
>>>
>>> Performance numbers for Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
>>> (more clock_gettime() cycles - the better):
>>>         | before    | after
>>> -----------------------------------
>>>         | 150252214 | 153242367
>>>         | 150301112 | 153324800
>>>         | 150392773 | 153125401
>>>         | 150373957 | 153399355
>>>         | 150303157 | 153489417
>>>         | 150365237 | 153494270
>>> -----------------------------------
>>> avg     | 150331408 | 153345935
>>> diff %  | 2         | 0
>>> -----------------------------------
>>> stdev % | 0.3       | 0.1
>>>
>>> Signed-off-by: Andrei Vagin
>>> Co-developed-by: Dmitry Safonov
>>> Signed-off-by: Dmitry Safonov
>>
>> Reviewed-by: Vincenzo Frascino
>> Tested-by: Vincenzo Frascino
>
> Hello Vincenzo,
>
> Could you test the attached patch on aarch64? On x86, it gives about 9%
> performance improvement for CLOCK_MONOTONIC and CLOCK_BOOTTIME.
>

I ran similar tests in the past with a previous version of the unified
vDSO library. Based on those results, the impact of "__always_inline"
alone was around 7% on arm64. In fact, I had a comment stating "To improve
performances, in this file, __always_inline it is used for the functions
called multiple times." in my implementation [1].

[1] https://bit.ly/2W9zMxB

I spent some time yesterday trying to dig out why the approach did not
make the cut, but I could not infer it from the review process.

> Here is my test:
> https://github.com/avagin/vdso-perf
>
> It is calling clock_gettime() in a loop for three seconds and then
> reports a number of iterations.
>

I am happy to run the test on arm64 and provide some results.

> Thanks,
> Andrei
>

-- 
Regards,
Vincenzo