From: Waiman Long
Subject: Re: [PATCH RFC 1/2] qrwlock: A queue read/write lock implementation
Date: Tue, 23 Jul 2013 20:03:36 -0400
Message-ID: <51EF19D8.2090307@hp.com>
In-Reply-To: <20130722103402.GA1991@gmail.com>
References: <1373679249-27123-1-git-send-email-Waiman.Long@hp.com>
 <1373679249-27123-2-git-send-email-Waiman.Long@hp.com>
 <51E49FA3.4030202@hp.com>
 <20130718074204.GA22623@gmail.com>
 <51E7F03A.4090305@hp.com>
 <20130719084023.GB25784@gmail.com>
 <51E95B85.8090003@hp.com>
 <20130722103402.GA1991@gmail.com>
To: Ingo Molnar
Cc: Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", Arnd Bergmann,
 linux-arch@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org,
 Peter Zijlstra, Steven Rostedt, Andrew Morton, Richard Weinberger,
 Catalin Marinas, Greg Kroah-Hartman, Matt Fleming, Herbert Xu,
 Akinobu Mita, Rusty Russell, Michel Lespinasse, Andi Kleen,
 Rik van Riel, "Paul E. McKenney", Linus Torvalds,
 "Chandramouleeswaran, Aswin", "Norton, Scott J", George Spelvin

On 07/22/2013 06:34 AM, Ingo Molnar wrote:
> * Waiman Long wrote:
>
>> I had run some performance tests using the fserver and new_fserver
>> benchmarks (on ext4 filesystems) of the AIM7 test suite on an 80-core
>> DL980 with HT on. The following kernels were used:
>>
>> 1. Modified 3.10.1 kernel with mb_cache_spinlock in fs/mbcache.c
>>    replaced by a rwlock
>> 2. Modified 3.10.1 kernel + modified __read_lock_failed code as suggested
>>    by Ingo
>> 3. Modified 3.10.1 kernel + queue read/write lock
>> 4. Modified 3.10.1 kernel + queue read/write lock in classic read/write
>>    lock behavior
>>
>> The last one is with the read lock stealing flag set in the qrwlock
>> structure to give priority to readers and behave more like the classic
>> read/write lock with less fairness.
>>
>> The following table shows the averaged results in the 200-1000
>> user range:
>>
>> +-----------------+--------+--------+--------+--------+
>> | Kernel          |    1   |    2   |    3   |    4   |
>> +-----------------+--------+--------+--------+--------+
>> | fserver JPM     | 245598 | 274457 | 403348 | 411941 |
>> | % change from 1 |   0%   | +11.8% | +64.2% | +67.7% |
>> +-----------------+--------+--------+--------+--------+
>> | new-fserver JPM | 231549 | 269807 | 399093 | 399418 |
>> | % change from 1 |   0%   | +16.5% | +72.4% | +72.5% |
>> +-----------------+--------+--------+--------+--------+
>
> So it's not just herding that is a problem.
>
> I'm wondering, how sensitive is this particular benchmark to fairness?
> I.e. do the 200-1000 simulated users each perform the same number of ops,
> so that any smearing of execution time via unfairness gets amplified?
>
> I.e. does steady-state throughput go up by 60%+ too with your changes?

For this particular benchmark, there is an interplay of different locks
that determines the overall performance of the system. Yes, I saw a
steady-state performance gain of 60%+ with the qrwlock change together
with the modified mbcache.c. Without the modified mbcache.c file, the
performance gain drops to 20-30%. I am still trying to find out more
about the performance variations in different situations.
Regards,
Longman
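[Archive editor's note: for readers following the thread, below is a minimal
user-space sketch of the queuing-plus-read-stealing idea discussed above,
written with C11 atomics. It is an illustration only, not the kernel patch
itself: the field names (cnts, tail, head, rsteal), the ticket queue, and the
bit layout are all assumptions made for the sketch. The ticket queue is what
makes variant 3 fair: every locker, reader or writer, takes a FIFO turn.
Setting rsteal lets readers bypass the queue whenever they like, which
roughly recovers the classic reader-preference behavior of variant 4 and can
starve writers under a steady reader stream.]

    #include <stdatomic.h>
    #include <stdbool.h>

    #define QRW_WRITER 0x000000ffU   /* writer-owned bits in cnts */
    #define QRW_READER 0x00000100U   /* one unit of the reader count */

    struct qrwlock {
        atomic_uint cnts;       /* reader count (upper bits) + writer byte */
        atomic_uint tail, head; /* ticket queue of waiting lockers */
        bool rsteal;            /* readers may skip the queue (unfair) */
    };

    static void queue_wait(struct qrwlock *l)
    {
        unsigned int ticket = atomic_fetch_add(&l->tail, 1);
        while (atomic_load(&l->head) != ticket)
            ;                   /* spin until it is our turn */
    }

    static void queue_next(struct qrwlock *l)
    {
        atomic_fetch_add(&l->head, 1); /* hand the turn to the next waiter */
    }

    void qread_lock(struct qrwlock *l)
    {
        if (l->rsteal) {
            /* Classic path: grab a reader slot immediately, then just
             * wait out any writer that currently owns the lock. */
            atomic_fetch_add(&l->cnts, QRW_READER);
            while (atomic_load(&l->cnts) & QRW_WRITER)
                ;
            return;
        }
        queue_wait(l);          /* fair path: take a FIFO turn */
        atomic_fetch_add(&l->cnts, QRW_READER);
        while (atomic_load(&l->cnts) & QRW_WRITER)
            ;                   /* a writer ahead of us is still draining */
        queue_next(l);          /* readers release the turn right away,
                                 * so consecutive readers run in parallel */
    }

    void qread_unlock(struct qrwlock *l)
    {
        atomic_fetch_sub(&l->cnts, QRW_READER);
    }

    void qwrite_lock(struct qrwlock *l)
    {
        unsigned int expected = 0;

        queue_wait(l);          /* writers always take a FIFO turn */
        /* Acquire only when no readers and no writer remain: 0 -> WRITER.
         * compare_exchange_weak rewrites 'expected' on failure, so reset
         * it to 0 on every retry. */
        while (!atomic_compare_exchange_weak(&l->cnts, &expected, QRW_WRITER))
            expected = 0;
        queue_next(l);          /* next waiter may start spinning on cnts */
    }

    void qwrite_unlock(struct qrwlock *l)
    {
        atomic_fetch_and(&l->cnts, ~QRW_WRITER);
    }

[In this sketch the cost of fairness is visible in qwrite_lock: a writer must
wait for the reader count to drain to zero before taking ownership, while in
rsteal mode newly arriving readers can keep that count nonzero indefinitely.]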