From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758751Ab0J2S07 (ORCPT ); Fri, 29 Oct 2010 14:26:59 -0400 Received: from smtp1.linux-foundation.org ([140.211.169.13]:51635 "EHLO smtp1.linux-foundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756969Ab0J2S06 (ORCPT ); Fri, 29 Oct 2010 14:26:58 -0400 Date: Fri, 29 Oct 2010 11:25:41 -0700 From: Andrew Morton To: Christoph Lameter Cc: Mel Gorman , Shaohua Li , KOSAKI Motohiro , David Rientjes , KAMEZAWA Hiroyuki , LKML , Linux-MM Subject: Re: [PATCH 1/2] mm: page allocator: Adjust the per-cpu counter threshold when memory is low Message-Id: <20101029112541.8ab906bb.akpm@linux-foundation.org> In-Reply-To: References: <1288278816-32667-1-git-send-email-mel@csn.ul.ie> <1288278816-32667-2-git-send-email-mel@csn.ul.ie> <20101028150433.fe4f2d77.akpm@linux-foundation.org> X-Mailer: Sylpheed 2.4.8 (GTK+ 2.12.9; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 29 Oct 2010 09:58:25 -0500 (CDT) Christoph Lameter wrote: > On Thu, 28 Oct 2010, Andrew Morton wrote: > > > > To ensure that kswapd wakes up, a safe version of zone_watermark_ok() > > > is introduced that takes a more accurate reading of NR_FREE_PAGES when > > > called from wakeup_kswapd, when deciding whether it is really safe to go > > > back to sleep in sleeping_prematurely() and when deciding if a zone is > > > really balanced or not in balance_pgdat(). We are still using an expensive > > > function but limiting how often it is called. > > > > Here I go again. I have a feeling that I already said this, but I > > can't find versions 2 or 3 in the archives.. > > > > Did you evaluate using plain on percpu_counters for this? They won't > > solve the performance problem as they're basically the same thing as > > these open-coded counters. But they'd reduce the amount of noise and > > custom-coded boilerplate in mm/. > > The zone counters are done using the ZVCs in vmstat.c to save space well, they actually waste space because of that threshold thing. > and to > be in the same cacheline as other hot data necessary for allocation and > free. Yes, that'll save some misses. > > > > > + threshold = max(1, (int)(watermark_distance / num_online_cpus())); > > > + > > > + /* > > > + * Maximum threshold is 125 > > > > Reasoning? > > Differentials are stored in 8 bit signed ints. > > > > + put_online_cpus(); > > > +} > > > > Given that ->stat_threshold is the same for each CPU, why store it for > > each CPU at all? Why not put it in the zone and eliminate the inner > > loop? > > Doing that caused cache misses in the past and reduced the performance of > the ZVCs. This way the threshold is in the same cacheline as the > differentials. This sounds wrong. As long as that threshold isn't stored in a cacheline which other CPUs are modifying, all CPUs should be able to happily cache it. Maybe it needed a bit of padding inside the zone struct.