From mboxrd@z Thu Jan 1 00:00:00 1970
Content-Type: multipart/mixed; boundary="===============1880362399385135018=="
MIME-Version: 1.0
From: Andrea Arcangeli
To: lkp@lists.01.org
Subject: Re: [mm] 09bc0443e9: will-it-scale.per_thread_ops -7.2% regression
Date: Mon, 03 May 2021 21:59:11 -0400
Message-ID:
In-Reply-To:
List-Id:

--===============1880362399385135018==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

Hello,

here are the results of this benchmark work; all the source code is
released under GPLv2 on github and kernel.org:

https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?h=main&id=d9c85cf85aeb8de7d1490aa97b19be2feb2a1048

Added:

"This commits increases the SMP scalability of pin_user_pages_fast()
executed by different threads of the same process by more than 4000%."

https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?h=main&id=0d2285c622103f0314ced7485c3b5b43f870c2d3

Added:

"will-it-scale "mmap2" shows no change in performance with enterprise
config as expected.

will-it-scale "pin_fast" retains the > 4000% SMP scalability
performance improvement against upstream as expected.

This is a noop as far as overall performance and SMP scalability are
concerned.
"

Also documented in the summary of the mapcount_deshare branch:

https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?h=mapcount_deshare&id=68e65632c1ce13e846aa39e38f8ea7399d5abbbd

"- >4000% SMP scalability performance improvement to
pin_user_pages_fast() with 1 thread per CPU and 2 NUMA nodes with 64
cores each."

I didn't add any Reported-by because the report couldn't be confirmed
and no source code change resulted from it. On the contrary, I
documented the improvement the code already delivered with
will-it-scale (it was already described in the commit header, but now
it's exactly quantified and verified).

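As a rough cross-check of the numbers quoted below, the thread throughput ratio can be recomputed directly from the will-it-scale CSV output. This is only a sketch: the helper names thread_ops and speedup are hypothetical, and the two sample strings are simply the quoted runs copied in by hand (aa.git main commit 918037878bcf and upstream commit 18a3c5f7abfd).

```python
import csv
import io
from statistics import mean

# will-it-scale output copied from the quoted runs (assumption: the
# fourth CSV column, "threads", is the figure being compared).
AA_MAIN = """\
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
256,0,0.00,1196513,0.19,0
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
256,0,0.00,1194664,0.19,0
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
256,0,0.00,1193194,0.19,0
"""

UPSTREAM = """\
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
256,0,0.00,25641,0.17,0
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
256,0,0.00,25652,0.16,0
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
256,0,0.00,25559,0.16,0
"""


def thread_ops(output):
    """Return the 'threads' column of every row with a non-zero task count."""
    samples = []
    for row in csv.reader(io.StringIO(output)):
        # Skip blank lines, the repeated header rows and the 0-task rows.
        if not row or row[0] == "tasks" or row[0] == "0":
            continue
        samples.append(int(row[3]))
    return samples


def speedup(patched, baseline):
    """Ratio of mean thread throughput, patched kernel over baseline."""
    return mean(thread_ops(patched)) / mean(thread_ops(baseline))


print(f"mean thread throughput ratio: {speedup(AA_MAIN, UPSTREAM):.2f}x")
```

The ratio comes out between 46x and 47x, which lines up with the quoted 4668% if that figure is read as the throughput ratio times 100 (that reading is an assumption, not something the thread states).
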
Thanks,
Andrea

On Mon, May 03, 2021 at 09:10:14PM -0400, Andrea Arcangeli wrote:
> On Sun, May 02, 2021 at 09:25:54PM -0400, Andrea Arcangeli wrote:
> > === pin_fast will-it-scale follows ===
> >
> > In addition, to see a related scalability increase, I added a pin_fast
> > testcase to will-it-scale that I'm submitting here to Anton (CC'ed).
> >
> > To run it in lkp-test you only have to add the attached patch on top
> > of your patch and then call "pin_fast" instead of "mmap2" in the
> > invocation, like below:
> >
> > # for i in `seq 3`; do python3 runtest.py pin_fast 295 thread `nproc`; done
> >
> > I recommend to add this to lkp-test since I think it's much more
> > interesting than mmap2 and will show huge differences. Ideally we
> > should add a both FOLL_WRITE test too later.
> >
> > This is aa.git main branch commit 918037878bcf:
> >
> > tasks,processes,processes_idle,threads,threads_idle,linear
> > 0,0,100,0,100,0
> > 256,0,0.00,1196513,0.19,0
> > tasks,processes,processes_idle,threads,threads_idle,linear
> > 0,0,100,0,100,0
> > 256,0,0.00,1194664,0.19,0
> > tasks,processes,processes_idle,threads,threads_idle,linear
> > 0,0,100,0,100,0
> > 256,0,0.00,1193194,0.19,0
> >
> > This is mainline, upstream commit 18a3c5f7abfd:
> >
> > tasks,processes,processes_idle,threads,threads_idle,linear
> > 0,0,100,0,100,0
> > 256,0,0.00,25641,0.17,0
> > tasks,processes,processes_idle,threads,threads_idle,linear
> > 0,0,100,0,100,0
> > 256,0,0.00,25652,0.16,0
> > tasks,processes,processes_idle,threads,threads_idle,linear
> > 0,0,100,0,100,0
> > 256,0,0.00,25559,0.16,0
>
> I now verified that the 4668% increase in scalability as expected is
> thanks to this very patch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?id=5a9bd1dce03d0a7c55c5f81992bc06fc6630f78d
>
> "mm: gup: allow FOLL_PIN to scale in SMP"
>
> And the patch you flagged as regression:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?id=d2f271ca96b42cafd701d445f11af0d2ef993772
>
> changes nothing in performance, it retains the 4668% improvement in
> SMP scalability of FOLL_PIN compared upstream, and it changes nothing
> for all other tests, but it saves 64bit per vma of RAM, it packs the
> structure.
>
> All results matches what I described in the commit headers.
>
> I'll document the 4668% improvement in SMP scalability of FOLL_PIN in
> the first patch commit header now for the next rebase.
>
> Thanks,
> Andrea

--===============1880362399385135018==--