Date: Fri, 12 Jul 2002 12:37:51 +0530
From: Dipankar Sarma
Reply-To: dipankar@in.ibm.com
To: linux-kernel@vger.kernel.org
Cc: lse-tech@lists.sourceforge.net, wilsont@us.ibm.com
Subject: specweb99: dcache scalability results
Message-ID: <20020712123751.B19931@in.ibm.com>

At OLS, Hanna presented some of our work on the directory entry
cache. I am presenting another set of interesting results that are
part of this ongoing work. Mainly, we have been working on two
tracks: reducing dcache_lock acquisitions by holding the lock while
walking a cached path, as suggested by Al Viro (fastwalk), and doing
a completely lockfree lookup/walk using RCU and lazy updates of the
LRU list (dcache_rcu).
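To give a feel for the dcache_rcu idea, here is a simplified sketch
of an RCU-protected hash-chain lookup. This is illustrative code
only: the toy_* names and structures are invented for this mail, and
it is written against the generic kernel RCU/hlist primitives rather
than the actual patch code (the real patches are in the RCU package
mentioned below).

/*
 * Illustrative sketch, not the dcache_rcu patch itself.
 * Readers traverse the hash chain without taking dcache_lock;
 * writers still serialize among themselves and defer freeing
 * until all pre-existing readers have finished.
 */
#include <linux/atomic.h>
#include <linux/rcupdate.h>
#include <linux/rculist.h>
#include <linux/spinlock.h>
#include <linux/string.h>

/* Toy stand-in for a dentry on a hash chain. */
struct toy_dentry {
	struct hlist_node d_hash;	/* hash chain linkage */
	struct toy_dentry *d_parent;
	const char *d_name;
	atomic_t d_count;
};

static DEFINE_SPINLOCK(toy_dcache_lock);	/* writers only */

struct toy_dentry *toy_lookup(struct hlist_head *head,
			      struct toy_dentry *parent,
			      const char *name)
{
	struct toy_dentry *d;

	rcu_read_lock();	/* no dcache_lock on the read side */
	hlist_for_each_entry_rcu(d, head, d_hash) {
		if (d->d_parent == parent && !strcmp(d->d_name, name)) {
			/*
			 * Pin the entry before leaving the RCU section.
			 * A real implementation must also recheck that
			 * the entry was not concurrently unhashed.
			 */
			atomic_inc(&d->d_count);
			rcu_read_unlock();
			return d;
		}
	}
	rcu_read_unlock();
	return NULL;
}

/* Removal serializes against other writers, then defers the free. */
void toy_unhash(struct toy_dentry *d)
{
	spin_lock(&toy_dcache_lock);
	hlist_del_rcu(&d->d_hash);
	spin_unlock(&toy_dcache_lock);
	/*
	 * The caller frees d only after a grace period, e.g. via
	 * call_rcu() or synchronize_rcu(), so readers still walking
	 * the chain never see freed memory.
	 */
}

The point is that the read side never touches dcache_lock at all; the
lazy LRU updating mentioned above (not shown here) is what avoids
taking the lock just to reposition a dentry on the LRU list during
lookups.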
Both approaches showed promise, and Troy Wilson from the LTC
performance team did a comparative study of the two dcache patches
using SPECweb99. His measurements indicate which approach suits the
dentry cache better. Here is a summary of his results. The
measurements were done on an 8-CPU PIII Xeon server with more RAM
than you or I could dream of having in our desktops ;-)

More details of the directory entry cache work can be found at
http://lse.sourceforge.net/locking/dcache/dcache.html. The dcache
patches are available from the Read-Copy Update package at
http://www.sourceforge.net/projects/lse.

Throughput comparison:
----------------------

kernel                      throughput                  % improvement
                            (simultaneous connections)
------                      --------------------------  -------------
2.4.17+lse02E               2258                        -
2.4.17+lse02E+fastwalk      2280                        1%
2.4.17+lse02E+dcache_rcu    2530                        12%

Lockmeter comparison (with apologies for the >80-column text):
---------------------

2.4.17+lse02E:

SPINLOCKS         HOLD            WAIT
  UTIL  CON    MEAN(  MAX )   MEAN(  MAX )(% CPU)     TOTAL NOWAIT SPIN RJECT  NAME
 15.7% 20.8%  2.1us(6668us)    23us(  14ms)( 4.4%)  5215460 79.2% 20.8%    0%  dcache_lock

2.4.17+lse02E+fastwalk:

SPINLOCKS         HOLD            WAIT
  UTIL  CON    MEAN(  MAX )   MEAN(  MAX )(% CPU)     TOTAL NOWAIT SPIN RJECT  NAME
 17.2% 17.7%  7.0us(  13ms)    53us(  30ms)( 2.9%)  1608566 82.3% 17.7%    0%  dcache_lock

2.4.17+lse02E+dcache_rcu:

SPINLOCKS         HOLD            WAIT
  UTIL  CON    MEAN(  MAX )   MEAN(  MAX )(% CPU)     TOTAL NOWAIT SPIN RJECT  NAME
  1.9%  2.3%  2.0us(3343us)    71us(9406us)(0.20%)   657152 97.7%  2.3%    0%  dcache_lock

Conclusions:
------------

Fastwalk clearly helps by reducing the number of dcache_lock
acquisitions by about 69% (from 5,215,460 down to 1,608,566).
However, holding the lock over the entire walk of a cached path is
just as clearly detrimental to performance: the average hold time
roughly triples (2.1us to 7.0us) and the average wait time more than
doubles (23us to 53us).

dcache_rcu has the biggest positive impact on performance for a
webserver type of workload. Its gains come from two improvements: an
87% reduction in lock acquisitions (down to 657,152), and an average
lock hold time that stays essentially constant (2.0us vs. 2.1us in
the baseline).

Acknowledgements:
-----------------

SPEC(tm) and the benchmark name SPECweb(tm) are registered
trademarks of the Standard Performance Evaluation Corporation. The
benchmarking was done for research purposes only and was
non-compliant, with the following deviations from the rules:

1. It was run on hardware that does not meet the SPEC
   availability-to-the-public criteria. The machine was an
   engineering sample.

2. access_log was not kept for full accounting. It was being
   written, but was deleted every 200 seconds.

For the latest SPECweb99 benchmark results, visit
http://www.spec.org.

Thanks
--
Dipankar Sarma   http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.