From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755262Ab1HYSkk (ORCPT ); Thu, 25 Aug 2011 14:40:40 -0400
Received: from casper.infradead.org ([85.118.1.10]:53063 "EHLO
	casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755214Ab1HYSkj convert rfc822-to-8bit (ORCPT );
	Thu, 25 Aug 2011 14:40:39 -0400
Subject: Re: [PATCH] memcg: remove unneeded preempt_disable
From: Peter Zijlstra
To: Christoph Lameter
Cc: James Bottomley , Andrew Morton , Greg Thelen ,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	KAMEZAWA Hiroyuki , Balbir Singh , Daisuke Nishimura ,
	linux-arch@vger.kernel.org
Date: Thu, 25 Aug 2011 20:40:13 +0200
In-Reply-To:
References: <1313650253-21794-1-git-send-email-gthelen@google.com>
	<20110818144025.8e122a67.akpm@linux-foundation.org>
	<1314284272.27911.32.camel@twins>
	<1314289208.3268.4.camel@mulgrave>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8BIT
X-Mailer: Evolution 3.0.2-
Message-ID: <1314297613.27911.83.camel@twins>
Mime-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2011-08-25 at 11:31 -0500, Christoph Lameter wrote:
> On Thu, 25 Aug 2011, James Bottomley wrote:
>
> > On Thu, 2011-08-25 at 10:11 -0500, Christoph Lameter wrote:
> > > On Thu, 25 Aug 2011, Peter Zijlstra wrote:
> > >
> > > > On Thu, 2011-08-18 at 14:40 -0700, Andrew Morton wrote:
> > > > >
> > > > > I think I'll apply it, as the call frequency is low (correct?) and the
> > > > > problem will correct itself as other architectures implement their
> > > > > atomic this_cpu_foo() operations.
> > > >
> > > > Which leads me to wonder, can anything but x86 implement that this_cpu_*
> > > > muck? I doubt any of the risk chips can actually do all this.
> > > > Maybe Itanic, but then that seems to be dying fast.
> > >
> > > The cpu needs to have an RMW instruction that does something to a
> > > variable relative to a register that points to the per cpu base.
> > >
> > > Thats generally possible. The problem is how expensive the RMW is going to
> > > be.
> >
> > Risc systems generally don't have a single instruction for this, that's
> > correct. Obviously we can do it as a non atomic sequence: read
> > variable, compute relative, read, modify, write ... but there's
> > absolutely no point hand crafting that in asm since the compiler can
> > usually work it out nicely. And, of course, to have this atomic, we
> > have to use locks, which ends up being very expensive.
>
> ARM seems to have these LDREX/STREX instructions for that purpose which
> seem to be used for generating atomic instructions without lockes. I guess
> other RISC architectures have similar means of doing it?

Even with LL/SC and the CPU base in a register you need to do something
like:

again:
	LL $target-reg, $cpubase-reg + offset
	SC $ret, $target-reg, $cpubase-reg + offset
	if !$ret goto again

It's the +offset that's problematic: it either doesn't exist or is very
limited (a quick look at the MIPS instruction set gives a limit of 64k).

Without the +offset you need:

again:
	$tmp-reg = $cpubase-reg
	$tmp-reg += offset;

	LL $target-reg, $tmp-reg
	SC $ret, $target-reg, $tmp-reg
	if !$ret goto again

Which is wide open to migration races.

Also, very often there are constraints on LL/SC that mandate we use
preempt_disable/enable around its use, which pretty much voids the whole
purpose, since if we disable preemption we might as well just use C (ARM
belongs in this class).

It does look like POWERPC's lwarx/stwcx is sane enough, although the
instruction reference I found doesn't list what happens if the LL/SC
doesn't use the same effective address or has other loads/stores in
between. If it's OK with those and simply fails the SC, it should be
good.

Still, creating atomic ops for per-cpu ops might be more expensive than
simply doing the preempt-disable/rmw/enable dance; dunno, I don't know
these archs that well.
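
For illustration only, here is a rough userspace C11 sketch of the
"without the +offset" sequence above. It is not kernel code and not how
any architecture implements this_cpu_*(): every sketch_* name and
SKETCH_NR_CPUS are made up for the example, and a C11 weak
compare-exchange stands in for the LL/SC pair (it may fail spuriously,
much like a failed SC, and on LL/SC machines a compiler will typically
turn it into such a loop). The slot address is computed first, as a
separate step, which is the window in which a real task could be
migrated to another CPU.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_NR_CPUS 4	/* hypothetical CPU count, illustration only */

/* Hypothetical per-CPU area: one counter slot per CPU. */
struct sketch_pcpu_area {
	_Atomic long counter;
};

static struct sketch_pcpu_area sketch_areas[SKETCH_NR_CPUS];

/*
 * "$tmp-reg = $cpubase-reg; $tmp-reg += offset": resolve the slot for
 * the CPU the caller believes it is running on.  In a kernel, the task
 * could be migrated right after this returns, leaving the pointer
 * aimed at another CPU's slot.
 */
static _Atomic long *sketch_slot(unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	return &sketch_areas[cpu].counter;
}

/*
 * LL/SC-style retry loop on an already-resolved slot.  A failed
 * exchange reloads 'old' with the current value, so we simply retry.
 * Returns the value that was stored.
 */
static long sketch_add(_Atomic long *slot, long delta)
{
	long old = atomic_load_explicit(slot, memory_order_relaxed);

	while (!atomic_compare_exchange_weak_explicit(slot, &old, old + delta,
						      memory_order_relaxed,
						      memory_order_relaxed))
		;	/* retry, like "if !$ret goto again" */

	return old + delta;
}
```

A caller would do sketch_add(sketch_slot(cpu), 1). The loop keeps the
update itself atomic on whichever slot was picked, but nothing ties
that slot to the CPU the task is on by the time the store lands; that
gap is what a single base+offset RMW instruction, or
preempt_disable/enable around the sequence, would close.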