Message-ID: <4D08F0A2.9010301@kernel.org>
Date: Wed, 15 Dec 2010 17:45:22 +0100
From: Tejun Heo
To: Christoph Lameter
CC: akpm@linux-foundation.org, Pekka Enberg, linux-kernel@vger.kernel.org,
 Eric Dumazet, "H. Peter Anvin", Mathieu Desnoyers
Subject: Re: [cpuops cmpxchg V2 4/5] vmstat: User per cpu atomics to avoid
 interrupt disable / enable
References: <20101214162842.542421046@linux.com>
 <20101214162854.811759020@linux.com>
In-Reply-To: <20101214162854.811759020@linux.com>

On 12/14/2010 05:28 PM, Christoph Lameter wrote:
> Currently the operations to increment vm counters must disable interrupts
> in order to not mess up their housekeeping of counters.
>
> So use this_cpu_cmpxchg() to avoid the overhead. Since we can no longer
> count on preemption being disabled, we still have some minor issues.
> The fetching of the counter thresholds is racy.
> A threshold from another cpu may be applied if we happen to be
> rescheduled on another cpu. However, the following vmstat operation
> will then bring the counter again under the threshold limit.
>
> The operations for __xxx_zone_state are not changed since the caller
> has taken care of the synchronization needs (and therefore the cycle
> count is even less than the optimized version for the irq disable case
> provided here).
>
> The optimization using this_cpu_cmpxchg will only be used if the arch
> supports efficient this_cpu ops (must have CONFIG_CMPXCHG_LOCAL set!)
>
> The use of this_cpu_cmpxchg reduces the cycle count for the counter
> operations by 80% (inc_zone_page_state goes from 170 cycles to 32).
>
> Signed-off-by: Christoph Lameter
>
> +/*
> + * If we have cmpxchg_local support then we do not need to incur the
> + * overhead that comes with local_irq_save/restore if we use
> + * this_cpu_cmpxchg.
> + *
> + * mod_state() modifies the zone counter state through atomic per cpu
> + * operations.
> + *
> + * Overstep mode specifies how overstep should be handled:
> + *      0       No overstepping
> + *      1       Overstepping half of threshold
> + *      -1      Overstepping minus half of threshold
> + */
> +static inline void mod_state(struct zone *zone,
> +       enum zone_stat_item item, int delta, int overstep_mode)
> +{
> +	struct per_cpu_pageset __percpu *pcp = zone->pageset;
> +	s8 __percpu *p = pcp->vm_stat_diff + item;
> +	long o, n, t, z;
> +
> +	do {
> +		z = 0;  /* overflow to zone counters */
> +
> +		/*
> +		 * The fetching of the stat_threshold is racy. We may apply
> +		 * a counter threshold to the wrong cpu if we get
> +		 * rescheduled while executing here. However, the following
> +		 * will apply the threshold again and therefore bring the
> +		 * counter under the threshold.
> +		 */

What does "the following" mean here?  Later executions of the function?
It seems like the counter can go out of the threshold at least
temporarily, which probably is okay, but I think the comment can be
improved a bit.

Thanks.

-- 
tejun