All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Christoph Lameter <cl@linux.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Greg Thelen <gthelen@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Balbir Singh <bsingharora@gmail.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	linux-arch@vger.kernel.org
Subject: Re: [PATCH] memcg: remove unneeded preempt_disable
Date: Thu, 25 Aug 2011 20:40:13 +0200	[thread overview]
Message-ID: <1314297613.27911.83.camel@twins> (raw)
In-Reply-To: <alpine.DEB.2.00.1108251128460.27407@router.home>

On Thu, 2011-08-25 at 11:31 -0500, Christoph Lameter wrote:
> On Thu, 25 Aug 2011, James Bottomley wrote:
> 
> > On Thu, 2011-08-25 at 10:11 -0500, Christoph Lameter wrote:
> > > On Thu, 25 Aug 2011, Peter Zijlstra wrote:
> > >
> > > > On Thu, 2011-08-18 at 14:40 -0700, Andrew Morton wrote:
> > > > >
> > > > > I think I'll apply it, as the call frequency is low (correct?) and the
> > > > > problem will correct itself as other architectures implement their
> > > > > atomic this_cpu_foo() operations.
> > > >
> > > > Which leads me to wonder, can anything but x86 implement that this_cpu_*
> > > > muck? I doubt any of the risk chips can actually do all this.
> > > > Maybe Itanic, but then that seems to be dying fast.
> > >
> > > The cpu needs to have an RMW instruction that does something to a
> > > variable relative to a register that points to the per cpu base.
> > >
> > > Thats generally possible. The problem is how expensive the RMW is going to
> > > be.
> >
> > Risc systems generally don't have a single instruction for this, that's
> > correct.  Obviously we can do it as a non atomic sequence: read
> > variable, compute relative, read, modify, write ... but there's
> > absolutely no point hand crafting that in asm since the compiler can
> > usually work it out nicely.  And, of course, to have this atomic, we
> > have to use locks, which ends up being very expensive.
> 
> ARM seems to have these LDREX/STREX instructions for that purpose which
> seem to be used for generating atomic instructions without lockes. I guess
> other RISC architectures have similar means of doing it?

Even with LL/SC and the CPU base in a register you need to do something
like:

again:
	LL $target-reg, $cpubase-reg + offset
	<foo>
	SC $ret, $target-reg, $cpubase-reg + offset
	if !$ret goto again

Its the +offset that's problematic, it either doesn't exist or is very
limited (a quick look at the MIPS instruction set gives a limit of 64k).

Without the +offset you need:

again:
	$tmp-reg = $cpubase-reg
	$tmp-reg += offset;

	LL $target-reg, $tmp-reg
	<foo>
	SC $ret, $target-reg, $tmp-reg
	if !$ret goto again


Which is wide open to migration races. Also, very often there are
constraints on LL/SC that mandate we use preempt_disable/enable around
its use, which pretty much voids the whole purpose, since if we disable
preemption we might as well just use C (ARM belongs in this class).

It does look POWERPC's lwarx/stwcx is sane enough, although the
instruction reference I found doesn't list what happens if the LL/SC
doesn't use the same effective address or has other loads/stores in
between, if its ok with those and simply fails the SC it should be good.

Still, creating atomic ops for per-cpu ops might be more expensive than
simply doing the preempt-disable/rmw/enable dance, dunno don't know
these archs that well.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Peter Zijlstra <peterz@infradead.org>
To: Christoph Lameter <cl@linux.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Greg Thelen <gthelen@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Balbir Singh <bsingharora@gmail.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	linux-arch@vger.kernel.org
Subject: Re: [PATCH] memcg: remove unneeded preempt_disable
Date: Thu, 25 Aug 2011 20:40:13 +0200	[thread overview]
Message-ID: <1314297613.27911.83.camel@twins> (raw)
Message-ID: <20110825184013.TVxRwmcAgkOhs7R0oCFvYdbbHDPeVgDX_ocX-wwj4Vw@z> (raw)
In-Reply-To: <alpine.DEB.2.00.1108251128460.27407@router.home>

On Thu, 2011-08-25 at 11:31 -0500, Christoph Lameter wrote:
> On Thu, 25 Aug 2011, James Bottomley wrote:
> 
> > On Thu, 2011-08-25 at 10:11 -0500, Christoph Lameter wrote:
> > > On Thu, 25 Aug 2011, Peter Zijlstra wrote:
> > >
> > > > On Thu, 2011-08-18 at 14:40 -0700, Andrew Morton wrote:
> > > > >
> > > > > I think I'll apply it, as the call frequency is low (correct?) and the
> > > > > problem will correct itself as other architectures implement their
> > > > > atomic this_cpu_foo() operations.
> > > >
> > > > Which leads me to wonder, can anything but x86 implement that this_cpu_*
> > > > muck? I doubt any of the risk chips can actually do all this.
> > > > Maybe Itanic, but then that seems to be dying fast.
> > >
> > > The cpu needs to have an RMW instruction that does something to a
> > > variable relative to a register that points to the per cpu base.
> > >
> > > Thats generally possible. The problem is how expensive the RMW is going to
> > > be.
> >
> > Risc systems generally don't have a single instruction for this, that's
> > correct.  Obviously we can do it as a non atomic sequence: read
> > variable, compute relative, read, modify, write ... but there's
> > absolutely no point hand crafting that in asm since the compiler can
> > usually work it out nicely.  And, of course, to have this atomic, we
> > have to use locks, which ends up being very expensive.
> 
> ARM seems to have these LDREX/STREX instructions for that purpose which
> seem to be used for generating atomic instructions without lockes. I guess
> other RISC architectures have similar means of doing it?

Even with LL/SC and the CPU base in a register you need to do something
like:

again:
	LL $target-reg, $cpubase-reg + offset
	<foo>
	SC $ret, $target-reg, $cpubase-reg + offset
	if !$ret goto again

Its the +offset that's problematic, it either doesn't exist or is very
limited (a quick look at the MIPS instruction set gives a limit of 64k).

Without the +offset you need:

again:
	$tmp-reg = $cpubase-reg
	$tmp-reg += offset;

	LL $target-reg, $tmp-reg
	<foo>
	SC $ret, $target-reg, $tmp-reg
	if !$ret goto again


Which is wide open to migration races. Also, very often there are
constraints on LL/SC that mandate we use preempt_disable/enable around
its use, which pretty much voids the whole purpose, since if we disable
preemption we might as well just use C (ARM belongs in this class).

It does look POWERPC's lwarx/stwcx is sane enough, although the
instruction reference I found doesn't list what happens if the LL/SC
doesn't use the same effective address or has other loads/stores in
between, if its ok with those and simply fails the SC it should be good.

Still, creating atomic ops for per-cpu ops might be more expensive than
simply doing the preempt-disable/rmw/enable dance, dunno don't know
these archs that well.


  parent reply	other threads:[~2011-08-25 18:40 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-08-18  6:50 [PATCH] memcg: remove unneeded preempt_disable Greg Thelen
2011-08-18  6:50 ` Greg Thelen
2011-08-18  6:52 ` KAMEZAWA Hiroyuki
2011-08-18  6:52   ` KAMEZAWA Hiroyuki
2011-08-18  9:38 ` Johannes Weiner
2011-08-18  9:38   ` Johannes Weiner
2011-08-18 14:26   ` Valdis.Kletnieks
2011-08-18 14:41     ` Johannes Weiner
2011-08-18 14:41       ` Johannes Weiner
2011-08-18 18:27       ` Valdis.Kletnieks
2011-08-18 21:40 ` Andrew Morton
2011-08-18 21:40   ` Andrew Morton
2011-08-19  0:00   ` Greg Thelen
2011-08-19  0:00     ` Greg Thelen
2011-08-25 14:57   ` Peter Zijlstra
2011-08-25 14:57     ` Peter Zijlstra
2011-08-25 15:11     ` Christoph Lameter
2011-08-25 15:11       ` Christoph Lameter
2011-08-25 16:20       ` James Bottomley
2011-08-25 16:20         ` James Bottomley
2011-08-25 16:31         ` Christoph Lameter
2011-08-25 16:31           ` Christoph Lameter
2011-08-25 16:34           ` James Bottomley
2011-08-25 16:34             ` James Bottomley
2011-08-25 17:07             ` Christoph Lameter
2011-08-25 17:07               ` Christoph Lameter
2011-08-25 18:34               ` James Bottomley
2011-08-25 18:34                 ` James Bottomley
2011-08-25 18:46                 ` Christoph Lameter
2011-08-25 18:46                   ` Christoph Lameter
2011-08-25 18:49                   ` Peter Zijlstra
2011-08-25 18:49                     ` Peter Zijlstra
2011-08-25 18:53                   ` Peter Zijlstra
2011-08-25 18:53                     ` Peter Zijlstra
2011-08-25 19:05                   ` Peter Zijlstra
2011-08-25 19:05                     ` Peter Zijlstra
2011-08-25 19:19                     ` Christoph Lameter
2011-08-25 19:19                       ` Christoph Lameter
2011-08-25 22:56                       ` Peter Zijlstra
2011-08-25 22:56                         ` Peter Zijlstra
2011-08-25 19:29                   ` James Bottomley
2011-08-25 19:29                     ` James Bottomley
2011-08-25 23:01                     ` Peter Zijlstra
2011-08-25 23:01                       ` Peter Zijlstra
2011-08-25 18:40           ` Peter Zijlstra [this message]
2011-08-25 18:40             ` Peter Zijlstra
2011-09-06  9:58 ` Johannes Weiner
2011-09-06  9:58   ` Johannes Weiner
2011-09-06 10:04   ` KAMEZAWA Hiroyuki
2011-09-06 10:04     ` KAMEZAWA Hiroyuki
2011-09-06 10:49     ` Johannes Weiner
2011-09-06 10:49       ` Johannes Weiner
2011-09-06 10:45       ` KAMEZAWA Hiroyuki
2011-09-06 10:45         ` KAMEZAWA Hiroyuki
2011-09-06 18:04   ` Greg Thelen
2011-09-06 18:04     ` Greg Thelen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1314297613.27911.83.camel@twins \
    --to=peterz@infradead.org \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=akpm@linux-foundation.org \
    --cc=bsingharora@gmail.com \
    --cc=cl@linux.com \
    --cc=gthelen@google.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nishimura@mxp.nes.nec.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.