From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751420AbdBWPdr (ORCPT );
	Thu, 23 Feb 2017 10:33:47 -0500
Received: from mx2.suse.de ([195.135.220.15]:54119 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750828AbdBWPdq (ORCPT );
	Thu, 23 Feb 2017 10:33:46 -0500
Date: Thu, 23 Feb 2017 15:19:08 +0000
From: Mel Gorman
To: Michal Hocko
Cc: Ye Xiaolong, Stephen Rothwell, Minchan Kim, Hillf Danton,
	Johannes Weiner, Andrew Morton, LKML, lkp@01.org
Subject: Re: [lkp-robot] [mm, vmscan] 5e56dfbd83: fsmark.files_per_sec -11.1% regression
Message-ID: <20170223151908.z7disw2es7jlnf7b@suse.de>
References: <20170123012644.GD17561@yexl-desktop>
	<20170124134424.GL6867@dhcp22.suse.cz>
	<20170125042706.GL17561@yexl-desktop>
	<20170126091317.GB6590@dhcp22.suse.cz>
	<20170204081604.GH12121@yexl-desktop>
	<20170206081236.GA3097@dhcp22.suse.cz>
	<20170207022213.GC2568@yexl-desktop>
	<20170207144315.GS5065@dhcp22.suse.cz>
	<20170223012734.GB31776@yexl-desktop>
	<20170223073544.uiy6rvw3d44irixf@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20170223073544.uiy6rvw3d44irixf@dhcp22.suse.cz>
User-Agent: Mutt/1.6.2 (2016-07-01)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Feb 23, 2017 at 08:35:45AM +0100, Michal Hocko wrote:
> >      57.60 ±  0%     -11.1%      51.20 ±  0%  fsmark.files_per_sec
> >     607.84 ±  0%      +9.0%     662.24 ±  1%  fsmark.time.elapsed_time
> >     607.84 ±  0%      +9.0%     662.24 ±  1%  fsmark.time.elapsed_time.max
> >      14317 ±  6%     -12.2%      12568 ±  7%  fsmark.time.involuntary_context_switches
> >       1864 ±  0%      +0.5%       1873 ±  0%  fsmark.time.maximum_resident_set_size
> >      12425 ±  0%     +23.3%      15320 ±  3%  fsmark.time.minor_page_faults
> >      33.00 ±  3%     -33.9%      21.80 ±  1%  fsmark.time.percent_of_cpu_this_job_got
> >     203.49 ±  3%     -28.1%     146.31 ±  1%  fsmark.time.system_time
> >     605701 ±  0%      +3.6%     627486 ±  0%  fsmark.time.voluntary_context_switches
> >     307106 ±  2%     +20.2%     368992 ±  9%  interrupts.CAL:Function_call_interrupts
> >     183040 ±  0%     +23.2%     225559 ±  3%  softirqs.BLOCK
> >      12203 ± 57%    +236.4%      41056 ±103%  softirqs.NET_RX
> >     186118 ±  0%     +21.9%     226922 ±  2%  softirqs.TASKLET
> >      14317 ±  6%     -12.2%      12568 ±  7%  time.involuntary_context_switches
> >      12425 ±  0%     +23.3%      15320 ±  3%  time.minor_page_faults
> >      33.00 ±  3%     -33.9%      21.80 ±  1%  time.percent_of_cpu_this_job_got
> >     203.49 ±  3%     -28.1%     146.31 ±  1%  time.system_time
> >       3.47 ±  3%     -13.0%       3.02 ±  1%  turbostat.%Busy
> >      99.60 ±  1%      -9.6%      90.00 ±  1%  turbostat.Avg_MHz
> >      78.69 ±  1%      +1.7%      80.01 ±  0%  turbostat.CorWatt
> >       3.56 ± 61%     -91.7%       0.30 ± 76%  turbostat.Pkg%pc2
> >     207790 ±  0%      -8.2%     190654 ±  1%  vmstat.io.bo
> >   30667691 ±  0%     +65.9%   50890669 ±  1%  vmstat.memory.cache
> >   34549892 ±  0%     -58.4%   14378939 ±  4%  vmstat.memory.free
> >       6768 ±  0%      -1.3%       6681 ±  1%  vmstat.system.cs
> >  1.089e+10 ±  2%     +13.4%  1.236e+10 ±  3%  cpuidle.C1E-IVT.time
> >   11475304 ±  2%     +13.4%   13007849 ±  3%  cpuidle.C1E-IVT.usage
> >    2.7e+09 ±  6%     +13.2%  3.057e+09 ±  3%  cpuidle.C3-IVT.time
> >    2954294 ±  6%     +14.3%    3375966 ±  3%  cpuidle.C3-IVT.usage
> >   96963295 ± 14%     +17.5%  1.139e+08 ± 12%  cpuidle.POLL.time
> >       8761 ±  7%     +17.6%      10299 ±  9%  cpuidle.POLL.usage
> >   30454483 ±  0%     +66.4%   50666102 ±  1%  meminfo.Cached
> >
> > Do you see what's happening?
>
> not really. All I could see in the previous data was that the memory
> locality was different (and better) with my patch, which I cannot
> explain either because get_scan_count is always per-node thing. Moreover
> the change shouldn't make any difference for normal GFP_KERNEL requests
> on 64b systems because the reclaim index covers all zones so there is
> nothing to skip over.
>
> > Or is there anything we can do to improve fsmark benchmark setup to
> > make it more reasonable?
>
> Unfortunatelly I am not an expert on this benchmark. Maybe Mel knows
> better.
There is not much to be an expert on with that benchmark. It creates a
bunch of files of the requested size for a number of iterations. In
async configurations, it can be heavily skewed by the first few
iterations until the dirty limits are hit. Once that point is reached,
the files/sec figure drops rapidly to some value below the write speed
of the underlying device. Hence, looking at its average performance is
risky and very sensitive to exact timing unless this is properly
accounted for.

In async configurations, stalls are dominated by balance_dirty_pages
and by filesystem details such as whether it needs to wait for space in
a transaction log. That also limits the overall performance of the
workload. Once the stable phase is reached, there will still be quite
some variability due to the timing of the writeback threads, which
causes a bit of jitter, as well as the usual concerns with multiple
threads writing to different parts of the disk.

When NUMA is taken into account, it is important to consider the size
of the NUMA nodes, as asymmetric sizes will affect when remote memory
is used and, to a lesser extent, when balance_dirty_pages is triggered.

The benchmark is what it is. You can force it to generate stable
figures but it won't have the same behaviour, so it all depends on how
you define "reasonable". At the very minimum, take into account that an
average of multiple iterations will be skewed early in the workload's
lifetime by the fact that it has not hit the dirty limits yet.

-- 
Mel Gorman
SUSE Labs
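[Editorial aside, not part of the thread: the averaging skew Mel describes can be sketched with made-up numbers. The per-iteration throughputs below are entirely hypothetical; the point is only that a few fast pre-throttling iterations inflate a naive average well above the stable, throttled rate.]

```python
# Hypothetical per-iteration fsmark-style throughput (files/sec).
# The first few iterations write into a clean page cache and look fast;
# once the dirty limits are hit and balance_dirty_pages starts
# throttling, throughput settles below the device's writeback speed.
files_per_sec = [210.0, 205.0, 190.0] + [55.0] * 17  # 20 iterations

# A naive average over all iterations mixes the two phases.
naive_avg = sum(files_per_sec) / len(files_per_sec)

# Averaging only the stable phase (after dirty limits are hit) gives
# a figure that reflects the steady-state behaviour.
stable = files_per_sec[3:]
stable_avg = sum(stable) / len(stable)

print(f"naive average : {naive_avg:.1f} files/sec")   # 77.0
print(f"stable average: {stable_avg:.1f} files/sec")  # 55.0
```

With these invented numbers the naive average is some 40% higher than the stable rate, which is the kind of distortion that makes whole-run averages sensitive to exactly when the dirty limits are reached.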