* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Paul E. McKenney @ 2007-08-16 0:49 UTC (permalink / raw)
To: Herbert Xu
Cc: David Howells, Satyam Sharma, Stefan Richter, Christoph Lameter,
Chris Snook, Linux Kernel Mailing List, linux-arch,
Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
jesper.juhl, segher
In-Reply-To: <20070816003023.GA29447@gondor.apana.org.au>
On Thu, Aug 16, 2007 at 08:30:23AM +0800, Herbert Xu wrote:
> On Wed, Aug 15, 2007 at 05:23:10PM -0700, Paul E. McKenney wrote:
> > On Thu, Aug 16, 2007 at 08:12:48AM +0800, Herbert Xu wrote:
> > > On Wed, Aug 15, 2007 at 04:53:35PM -0700, Paul E. McKenney wrote:
> > > >
> > > > > > Communicating between process context and interrupt/NMI handlers using
> > > > > > per-CPU variables.
> > > > >
> > > > > Remeber we're talking about atomic_read/atomic_set. Please
> > > > > cite the actual file/function name you have in mind.
> > > >
> > > > Yep, we are indeed talking about atomic_read()/atomic_set().
> > > >
> > > > We have been through this issue already in this thread.
> > >
> > > Sorry, but I must've missed it. Could you cite the file or
> > > function for my benefit?
> >
> > I might summarize the thread if there is interest, but I am not able to
> > do so right this minute.
>
> Thanks. But I don't need a summary of the thread, I'm asking
> for an extant code snippet in our kernel that benefits from
> the volatile change and is not part of a busy-wait.
Sorry, can't help you there. I really do believe that the information
you need (as opposed to the specific item you are asking for) really
has been put forth in this thread.
Thanx, Paul
^ permalink raw reply
* Re: [PATCH] [IPv6]: Invalid semicolon after if statement
From: David Miller @ 2007-08-16 0:51 UTC (permalink / raw)
To: davej; +Cc: netdev
In-Reply-To: <20070815235219.GA17004@redhat.com>
From: Dave Jones <davej@redhat.com>
Date: Wed, 15 Aug 2007 19:52:20 -0400
> On Wed, Aug 15, 2007 at 03:08:14PM -0700, David Miller wrote:
> > From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
> > Date: Thu, 16 Aug 2007 00:57:00 +0300 (EEST)
> >
> > > A similar fix to netfilter from Eric Dumazet inspired me to
> > > look around a bit by using some grep/sed stuff as looking for
> > > this kind of bugs seemed easy to automate. This is one of them
> > > I found where it looks like this semicolon is not valid.
> > >
> > > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> >
> > Yikes! Makes you want to audit the entire tree for these
> > things :-)))
>
> Indeed. Here's another one.
>
> Signed-off-by: Dave Jones <davej@redhat.com>
That got fixed the other day and is the "A similar fix to netfilter
from Eric Dumazet" Ilpo is reffering to above :)
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Herbert Xu @ 2007-08-16 0:51 UTC (permalink / raw)
To: Satyam Sharma
Cc: Segher Boessenkool, horms, Stefan Richter,
Linux Kernel Mailing List, Paul E. McKenney, ak, netdev, cfriesen,
Heiko Carstens, rpjday, jesper.juhl, linux-arch, Andrew Morton,
zlynx, clameter, schwidefsky, Chris Snook, davem, Linus Torvalds,
wensong, wjiang
In-Reply-To: <alpine.LFD.0.999.0708160621260.24380@enigma.security.iitk.ac.in>
On Thu, Aug 16, 2007 at 06:28:42AM +0530, Satyam Sharma wrote:
>
> > The udelay itself certainly should have some form of cpu_relax in it.
>
> Yes, a form of barrier() must be present in mdelay() or udelay() itself
> as you say, having it in __const_udelay() is *not* enough (superflous
> actually, considering it is already a separate translation unit and
> invisible to the compiler).
As long as __const_udelay does something which has the same
effect as barrier it is enough even if it's in the same unit.
As a matter of fact it does on i386 where __delay either uses
rep_nop or asm/volatile.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Herbert Xu @ 2007-08-16 0:53 UTC (permalink / raw)
To: Paul E. McKenney
Cc: David Howells, Satyam Sharma, Stefan Richter, Christoph Lameter,
Chris Snook, Linux Kernel Mailing List, linux-arch,
Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
jesper.juhl, segher
In-Reply-To: <20070816004950.GZ9645@linux.vnet.ibm.com>
On Wed, Aug 15, 2007 at 05:49:50PM -0700, Paul E. McKenney wrote:
> On Thu, Aug 16, 2007 at 08:30:23AM +0800, Herbert Xu wrote:
>
> > Thanks. But I don't need a summary of the thread, I'm asking
> > for an extant code snippet in our kernel that benefits from
> > the volatile change and is not part of a busy-wait.
>
> Sorry, can't help you there. I really do believe that the information
> you need (as opposed to the specific item you are asking for) really
> has been put forth in this thread.
That only leads me to believe that such a code snippet simply
does not exist.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Paul E. McKenney @ 2007-08-16 0:53 UTC (permalink / raw)
To: Christoph Lameter
Cc: Paul Mackerras, Satyam Sharma, Stefan Richter, Chris Snook,
Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
Herbert Xu
In-Reply-To: <Pine.LNX.4.64.0708151740570.10497@schroedinger.engr.sgi.com>
On Wed, Aug 15, 2007 at 05:42:07PM -0700, Christoph Lameter wrote:
> On Wed, 15 Aug 2007, Paul E. McKenney wrote:
>
> > Seems to me that we face greater chance of confusion without the
> > volatile than with, particularly as compiler optimizations become
> > more aggressive. Yes, we could simply disable optimization, but
> > optimization can be quite helpful.
>
> A volatile default would disable optimizations for atomic_read.
> atomic_read without volatile would allow for full optimization by the
> compiler. Seems that this is what one wants in many cases.
The volatile cast should not disable all that many optimizations,
for example, it is much less hurtful than barrier(). Furthermore,
the main optimizations disabled (pulling atomic_read() and atomic_set()
out of loops) really do need to be disabled.
Thanx, Paul
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Christoph Lameter @ 2007-08-16 0:59 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Paul Mackerras, Satyam Sharma, Stefan Richter, Chris Snook,
Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
Herbert Xu
In-Reply-To: <20070816005348.GA9645@linux.vnet.ibm.com>
On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> The volatile cast should not disable all that many optimizations,
> for example, it is much less hurtful than barrier(). Furthermore,
> the main optimizations disabled (pulling atomic_read() and atomic_set()
> out of loops) really do need to be disabled.
In many cases you do not need a barrier. Having volatile there *will*
impact optimization because the compiler cannot use a register that may
contain the value that was fetched earlier. And the compiler cannot choose
freely when to fetch the value. The order of memory accesses are fixed if
you use volatile. If the variable is not volatile then the compiler can
arrange memory accesses any way they fit and thus generate better code.
^ permalink raw reply
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
From: Nick Piggin @ 2007-08-16 1:09 UTC (permalink / raw)
To: paulmck
Cc: Herbert Xu, csnook, dhowells, linux-kernel, linux-arch, torvalds,
netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong,
horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl,
Arnd Bergmann
In-Reply-To: <20070815201501.GM9645@linux.vnet.ibm.com>
Paul E. McKenney wrote:
> On Wed, Aug 15, 2007 at 11:30:05PM +1000, Nick Piggin wrote:
>>Especially since several big architectures don't have volatile in their
>>atomic_get and _set, I think it would be a step backwards to add them in
>>as a "just in case" thin now (unless there is a better reason).
>
>
> Good point, except that I would expect gcc's optimization to continue
> to improve. I would like the kernel to be able to take advantage of
> improved optimization, which means that we are going to have to make
> a few changes. Adding volatile to atomic_get() and atomic_set() is
> IMHO one of those changes.
What optimisations? gcc already does most of the things you need a
barrier/volatile for, like reordering non-dependant loads and stores,
and eliminating mem ops completely by caching in registers.
>>As to your followup question of why to use it over ACCESS_ONCE. I
>>guess, aside from consistency with the rest of the barrier APIs, you
>>can use it in other primitives when you don't actually know what the
>>caller is going to do or if it even will make an access. You could
>>also use it between calls to _other_ primitives, etc... it just
>>seems more flexible to me, but I haven't actually used such a thing
>>in real code...
>>
>>ACCESS_ONCE doesn't seem as descriptive. What it results in is the
>>memory location being loaded or stored (presumably once exactly),
>>but I think the more general underlying idea is a barrier point.
>
>
> OK, first, I am not arguing that ACCESS_ONCE() can replace all current
> uses of barrier().
OK. Well I also wasn't saying that ACCESS_ONCE should not be
implemented. But if we want something like it, then it would make
sense to have an equivalent barrier statement as well (ie. order()).
--
SUSE Labs, Novell Inc.
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Satyam Sharma @ 2007-08-16 1:23 UTC (permalink / raw)
To: Segher Boessenkool
Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
Linux Kernel Mailing List, Paul E. McKenney, netdev, ak, cfriesen,
rpjday, jesper.juhl, linux-arch, Andrew Morton, zlynx,
schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
wensong, wjiang
In-Reply-To: <d36ec7799cfb43557770951c1cd57355@kernel.crashing.org>
On Wed, 15 Aug 2007, Segher Boessenkool wrote:
> [...]
> > BTW:
> >
> > #define atomic_read(a) (*(volatile int *)&(a))
> > #define atomic_set(a,i) (*(volatile int *)&(a) = (i))
> >
> > int a;
> >
> > void func(void)
> > {
> > int b;
> >
> > b = atomic_read(a);
> > atomic_set(a, 20);
> > b = atomic_read(a);
> > }
> >
> > gives:
> >
> > func:
> > pushl %ebp
> > movl a, %eax
> > movl %esp, %ebp
> > movl $20, a
> > movl a, %eax
> > popl %ebp
> > ret
> >
> > so the first atomic_read() wasn't optimized away.
>
> Of course. It is executed by the abstract machine, so
> it will be executed by the actual machine. On the other
> hand, try
>
> b = 0;
> if (b)
> b = atomic_read(a);
>
> or similar.
Yup, obviously. Volatile accesses (or any access to volatile objects),
or even "__volatile__ asms" (which gcc normally promises never to elid)
can always be optimized for cases such as these where the compiler can
trivially determine that the code in question is not reachable.
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Paul E. McKenney @ 2007-08-16 1:14 UTC (permalink / raw)
To: Christoph Lameter
Cc: Paul Mackerras, Satyam Sharma, Stefan Richter, Chris Snook,
Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
Herbert Xu
In-Reply-To: <Pine.LNX.4.64.0708151757400.11196@schroedinger.engr.sgi.com>
On Wed, Aug 15, 2007 at 05:59:41PM -0700, Christoph Lameter wrote:
> On Wed, 15 Aug 2007, Paul E. McKenney wrote:
>
> > The volatile cast should not disable all that many optimizations,
> > for example, it is much less hurtful than barrier(). Furthermore,
> > the main optimizations disabled (pulling atomic_read() and atomic_set()
> > out of loops) really do need to be disabled.
>
> In many cases you do not need a barrier. Having volatile there *will*
> impact optimization because the compiler cannot use a register that may
> contain the value that was fetched earlier. And the compiler cannot choose
> freely when to fetch the value. The order of memory accesses are fixed if
> you use volatile. If the variable is not volatile then the compiler can
> arrange memory accesses any way they fit and thus generate better code.
Understood. My point is not that the impact is precisely zero, but
rather that the impact on optimization is much less hurtful than the
problems that could arise otherwise, particularly as compilers become
more aggressive in their optimizations.
Thanx, Paul
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Paul E. McKenney @ 2007-08-16 1:14 UTC (permalink / raw)
To: Herbert Xu
Cc: David Howells, Satyam Sharma, Stefan Richter, Christoph Lameter,
Chris Snook, Linux Kernel Mailing List, linux-arch,
Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
jesper.juhl, segher
In-Reply-To: <20070816005316.GA29802@gondor.apana.org.au>
On Thu, Aug 16, 2007 at 08:53:16AM +0800, Herbert Xu wrote:
> On Wed, Aug 15, 2007 at 05:49:50PM -0700, Paul E. McKenney wrote:
> > On Thu, Aug 16, 2007 at 08:30:23AM +0800, Herbert Xu wrote:
> >
> > > Thanks. But I don't need a summary of the thread, I'm asking
> > > for an extant code snippet in our kernel that benefits from
> > > the volatile change and is not part of a busy-wait.
> >
> > Sorry, can't help you there. I really do believe that the information
> > you need (as opposed to the specific item you are asking for) really
> > has been put forth in this thread.
>
> That only leads me to believe that such a code snippet simply
> does not exist.
Whatever...
Thanx, Paul
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Satyam Sharma @ 2007-08-16 1:18 UTC (permalink / raw)
To: Herbert Xu
Cc: Segher Boessenkool, horms, Stefan Richter,
Linux Kernel Mailing List, Paul E. McKenney, ak, netdev, cfriesen,
Heiko Carstens, rpjday, jesper.juhl, linux-arch, Andrew Morton,
zlynx, clameter, schwidefsky, Chris Snook, davem, Linus Torvalds,
wensong, wjiang
In-Reply-To: <20070816005156.GA29698@gondor.apana.org.au>
Hi Herbert,
On Thu, 16 Aug 2007, Herbert Xu wrote:
> On Thu, Aug 16, 2007 at 06:28:42AM +0530, Satyam Sharma wrote:
> >
> > > The udelay itself certainly should have some form of cpu_relax in it.
> >
> > Yes, a form of barrier() must be present in mdelay() or udelay() itself
> > as you say, having it in __const_udelay() is *not* enough (superflous
> > actually, considering it is already a separate translation unit and
> > invisible to the compiler).
>
> As long as __const_udelay does something which has the same
> effect as barrier it is enough even if it's in the same unit.
Only if __const_udelay() is inlined. But as I said, __const_udelay()
-- although marked "inline" -- will never be inlined anywhere in the
kernel in reality. It's an exported symbol, and never inlined from
modules. Even from built-in targets, the definition of __const_udelay
is invisible when gcc is compiling the compilation units of those
callsites. The compiler has no idea that that function has barriers
or not, so we're saved here _only_ by the lucky fact that
__const_udelay() is in a different compilation unit.
> As a matter of fact it does on i386 where __delay either uses
> rep_nop or asm/volatile.
__delay() can be either delay_tsc() or delay_loop() on i386.
delay_tsc() uses the rep_nop() there for it's own little busy
loop, actually. But for a call site that inlines __const_udelay()
-- if it were ever moved to a .h file and marked inline -- the
call to __delay() will _still_ be across compilation units. So,
again for this case, it does not matter if the callee function
has compiler barriers or not (it would've been a different story
if we were discussing real/CPU barriers, I think), what saves us
here is just the fact that a call is made to a function from a
different compilation unit, which is invisible to the compiler
when compiling the callsite, and hence acting as the compiler
barrier.
Regarding delay_loop(), it uses "volatile" for the "asm" which
has quite different semantics from the C language "volatile"
type-qualifier keyword and does not imply any compiler barrier
at all.
Satyam
^ permalink raw reply
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
From: Nick Piggin @ 2007-08-16 1:20 UTC (permalink / raw)
To: Segher Boessenkool
Cc: paulmck, heiko.carstens, horms, linux-kernel, rpjday, ak, netdev,
cfriesen, akpm, torvalds, linux-arch, jesper.juhl, zlynx,
schwidefsky, Chris Snook, davem, wensong, wjiang, David Howells
In-Reply-To: <7131d547746a998fcec74d1c091f9f6a@kernel.crashing.org>
Segher Boessenkool wrote:
>> Please check the definition of "cache coherence".
>
>
> Which of the twelve thousand such definitions? :-)
Every definition I have seen says that writes to a single memory
location have a serial order as seen by all CPUs, and that a read
will return the most recent write in the sequence (with a bit of
extra mumbo jumbo to cover store queues and store forwarding).
Within such a definition, I don't see how would be allowed for a
single CPU to execute reads out of order and have the second read
return an earlier write than the first read.
--
SUSE Labs, Novell Inc.
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Segher Boessenkool @ 2007-08-16 1:23 UTC (permalink / raw)
To: paulmck
Cc: horms, Stefan Richter, Satyam Sharma, Linux Kernel Mailing List,
rpjday, netdev, ak, cfriesen, Heiko Carstens, jesper.juhl,
linux-arch, Andrew Morton, zlynx, clameter, schwidefsky,
Chris Snook, Herbert Xu, davem, Linus Torvalds, wensong, wjiang
In-Reply-To: <20070815224433.GQ9645@linux.vnet.ibm.com>
>>>> No; compilation units have nothing to do with it, GCC can optimise
>>>> across compilation unit boundaries just fine, if you tell it to
>>>> compile more than one compilation unit at once.
>>>
>>> Last I checked, the Linux kernel build system did compile each .c
>>> file
>>> as a separate compilation unit.
>>
>> I have some patches to use -combine -fwhole-program for Linux.
>> Highly experimental, you need a patched bleeding edge toolchain.
>> If there's interest I'll clean it up and put it online.
>>
>> David Woodhouse had some similar patches about a year ago.
>
> Sounds exciting... ;-)
Yeah, the breakage is *quite* spectacular :-)
>>>>> In many cases, the compiler also has to assume that
>>>>> msleep_interruptible()
>>>>> might call back into a function in the current compilation unit,
>>>>> thus
>>>>> possibly modifying global static variables.
>>>>
>>>> It most often is smart enough to see what compilation-unit-local
>>>> variables might be modified that way, though :-)
>>>
>>> Yep. For example, if it knows the current value of a given such
>>> local
>>> variable, and if all code paths that would change some other variable
>>> cannot be reached given that current value of the first variable.
>>
>> Or the most common thing: if neither the address of the translation-
>> unit local variable nor the address of any function writing to that
>> variable can "escape" from that translation unit, nothing outside
>> the translation unit can write to the variable.
>
> But there is usually at least one externally callable function in
> a .c file.
Of course, but often none of those will (indirectly) write a certain
static variable.
Segher
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Segher Boessenkool @ 2007-08-16 1:26 UTC (permalink / raw)
To: Herbert Xu
Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev, cfriesen,
akpm, torvalds, jesper.juhl, linux-arch, zlynx, satyam, clameter,
schwidefsky, Chris Snook, davem, wensong, wjiang
In-Reply-To: <20070815234021.GA28775@gondor.apana.org.au>
>> Part of the motivation here is to fix heisenbugs. If I knew where
>> they
>
> By the same token we should probably disable optimisations
> altogether since that too can create heisenbugs.
Almost everything is a tradeoff; and so is this. I don't
believe most people would find disabling all compiler
optimisations an acceptable price to pay for some peace
of mind.
Segher
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Segher Boessenkool @ 2007-08-16 1:30 UTC (permalink / raw)
To: paulmck
Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev, cfriesen,
akpm, torvalds, jesper.juhl, linux-arch, zlynx, satyam, clameter,
schwidefsky, Chris Snook, Herbert Xu, davem, wensong, wjiang
In-Reply-To: <20070815235125.GT9645@linux.vnet.ibm.com>
>>> Part of the motivation here is to fix heisenbugs. If I knew where
>>> they
>>
>> By the same token we should probably disable optimisations
>> altogether since that too can create heisenbugs.
>
> Precisely the point -- use of volatile (whether in casts or on asms)
> in these cases are intended to disable those optimizations likely to
> result in heisenbugs.
The only thing volatile on an asm does is create a side effect
on the asm statement; in effect, it tells the compiler "do not
remove this asm even if you don't need any of its outputs".
It's not disabling optimisation likely to result in bugs,
heisen- or otherwise; _not_ putting the volatile on an asm
that needs it simply _is_ a bug :-)
Segher
^ permalink raw reply
* Re:
From: Segher Boessenkool @ 2007-08-16 1:38 UTC (permalink / raw)
To: Satyam Sharma
Cc: horms, Stefan Richter, Linux Kernel Mailing List,
Paul E. McKenney, ak, netdev, cfriesen, Heiko Carstens, rpjday,
jesper.juhl, linux-arch, Andrew Morton, zlynx, clameter,
schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
wensong, wjiang
In-Reply-To: <alpine.LFD.0.999.0708160338520.22701@enigma.security.iitk.ac.in>
>> "compilation unit" is a C standard term. It typically boils down
>> to "single .c file".
>
> As you mentioned later, "single .c file with all the other files
> (headers
> or other .c files) that it pulls in via #include" is actually
> "translation
> unit", both in the C standard as well as gcc docs.
Yeah. "single .c file after preprocessing". Same thing :-)
> "Compilation unit"
> doesn't seem to be nearly as standard a term, though in most places it
> is indeed meant to be same as "translation unit", but with the new gcc
> inter-module-analysis stuff that you referred to above, I suspect one
> may
> reasonably want to call a "compilation unit" as all that the compiler
> sees
> at a given instant.
That would be a bit confusing, would it not? They'd better find
some better name for that if they want to name it at all (remember,
none of these optimisations should have any effect on the semantics
of the program, you just get fewer .o files etc.).
Segher
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Christoph Lameter @ 2007-08-16 1:41 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Paul Mackerras, Satyam Sharma, Stefan Richter, Chris Snook,
Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
Herbert Xu
In-Reply-To: <20070816011414.GC9645@linux.vnet.ibm.com>
On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> Understood. My point is not that the impact is precisely zero, but
> rather that the impact on optimization is much less hurtful than the
> problems that could arise otherwise, particularly as compilers become
> more aggressive in their optimizations.
The problems arise because barriers are not used as required. Volatile
has wishy washy semantics and somehow marries memory barriers with data
access. It is clearer to separate the two. Conceptual cleanness usually
translates into better code. If one really wants the volatile then lets
make it explicit and use
atomic_read_volatile()
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Paul Mackerras @ 2007-08-16 1:51 UTC (permalink / raw)
To: Christoph Lameter
Cc: Paul E. McKenney, Satyam Sharma, Stefan Richter, Chris Snook,
Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
Herbert Xu
In-Reply-To: <Pine.LNX.4.64.0708151740570.10497@schroedinger.engr.sgi.com>
Christoph Lameter writes:
> A volatile default would disable optimizations for atomic_read.
> atomic_read without volatile would allow for full optimization by the
> compiler. Seems that this is what one wants in many cases.
Name one such case.
An atomic_read should do a load from memory. If the programmer puts
an atomic_read() in the code then the compiler should emit a load for
it, not re-use a value returned by a previous atomic_read. I do not
believe it would ever be useful for the compiler to collapse two
atomic_read statements into a single load.
Paul.
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Herbert Xu @ 2007-08-16 2:00 UTC (permalink / raw)
To: Paul Mackerras
Cc: Christoph Lameter, Paul E. McKenney, Satyam Sharma,
Stefan Richter, Chris Snook, Linux Kernel Mailing List,
linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
cfriesen, zlynx, rpjday, jesper.juhl, segher
In-Reply-To: <18115.44462.622801.683446@cargo.ozlabs.ibm.com>
On Thu, Aug 16, 2007 at 11:51:42AM +1000, Paul Mackerras wrote:
>
> Name one such case.
See sk_stream_mem_schedule in net/core/stream.c:
/* Under limit. */
if (atomic_read(sk->sk_prot->memory_allocated) < sk->sk_prot->sysctl_mem[0]) {
if (*sk->sk_prot->memory_pressure)
*sk->sk_prot->memory_pressure = 0;
return 1;
}
/* Over hard limit. */
if (atomic_read(sk->sk_prot->memory_allocated) > sk->sk_prot->sysctl_mem[2]) {
sk->sk_prot->enter_memory_pressure();
goto suppress_allocation;
}
We don't need to reload sk->sk_prot->memory_allocated here.
Now where is your example again?
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Satyam Sharma @ 2007-08-16 2:15 UTC (permalink / raw)
To: Christoph Lameter
Cc: Paul E. McKenney, Paul Mackerras, Stefan Richter, Chris Snook,
Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
Herbert Xu
In-Reply-To: <Pine.LNX.4.64.0708151839140.11520@schroedinger.engr.sgi.com>
On Wed, 15 Aug 2007, Christoph Lameter wrote:
> On Wed, 15 Aug 2007, Paul E. McKenney wrote:
>
> > Understood. My point is not that the impact is precisely zero, but
> > rather that the impact on optimization is much less hurtful than the
> > problems that could arise otherwise, particularly as compilers become
> > more aggressive in their optimizations.
>
> The problems arise because barriers are not used as required. Volatile
> has wishy washy semantics and somehow marries memory barriers with data
> access. It is clearer to separate the two. Conceptual cleanness usually
> translates into better code. If one really wants the volatile then lets
> make it explicit and use
>
> atomic_read_volatile()
Completely agreed, again. To summarize again (had done so about ~100 mails
earlier in this thread too :-) ...
atomic_{read,set}_volatile() -- guarantees volatility also along with
atomicity (the two _are_ different concepts after all, irrespective of
whether callsites normally want one with the other or not)
atomic_{read,set}_nonvolatile() -- only guarantees atomicity, compiler
free to elid / coalesce / optimize such accesses, can keep the object
in question cached in a local register, leads to smaller text, etc.
As to which one should be the default atomic_read() is a question of
whether majority of callsites (more weightage to important / hot
codepaths, lesser to obscure callsites) want a particular behaviour.
Do we have a consensus here? (hoping against hope, probably :-)
[ This thread has gotten completely out of hand ... for my mail client
alpine as well, it now seems. Reminds of that 1000+ GPLv3 fest :-) ]
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Paul Mackerras @ 2007-08-16 2:05 UTC (permalink / raw)
To: Herbert Xu
Cc: Christoph Lameter, Paul E. McKenney, Satyam Sharma,
Stefan Richter, Chris Snook, Linux Kernel Mailing List,
linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
cfriesen, zlynx, rpjday, jesper.juhl, segher
In-Reply-To: <20070816020042.GA30650@gondor.apana.org.au>
Herbert Xu writes:
> See sk_stream_mem_schedule in net/core/stream.c:
>
> /* Under limit. */
> if (atomic_read(sk->sk_prot->memory_allocated) < sk->sk_prot->sysctl_mem[0]) {
> if (*sk->sk_prot->memory_pressure)
> *sk->sk_prot->memory_pressure = 0;
> return 1;
> }
>
> /* Over hard limit. */
> if (atomic_read(sk->sk_prot->memory_allocated) > sk->sk_prot->sysctl_mem[2]) {
> sk->sk_prot->enter_memory_pressure();
> goto suppress_allocation;
> }
>
> We don't need to reload sk->sk_prot->memory_allocated here.
Are you sure? How do you know some other CPU hasn't changed the value
in between?
Paul.
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Segher Boessenkool @ 2007-08-16 2:07 UTC (permalink / raw)
To: Paul Mackerras
Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
Satyam Sharma, Linux Kernel Mailing List, Paul E. McKenney,
netdev, ak, cfriesen, rpjday, jesper.juhl, linux-arch,
Andrew Morton, zlynx, schwidefsky, Chris Snook, Herbert Xu, davem,
Linus Torvalds, wensong, wjiang
In-Reply-To: <18115.44462.622801.683446@cargo.ozlabs.ibm.com>
>> A volatile default would disable optimizations for atomic_read.
>> atomic_read without volatile would allow for full optimization by the
>> compiler. Seems that this is what one wants in many cases.
>
> Name one such case.
>
> An atomic_read should do a load from memory. If the programmer puts
> an atomic_read() in the code then the compiler should emit a load for
> it, not re-use a value returned by a previous atomic_read. I do not
> believe it would ever be useful for the compiler to collapse two
> atomic_read statements into a single load.
An atomic_read() implemented as a "normal" C variable read would
allow that read to be combined with another "normal" read from
that variable. This could perhaps be marginally useful, although
I'd bet you cannot see it unless counting cycles on a simulator
or counting bits in the binary size.
With an asm() implementation, the compiler can not do this; with
a "volatile" implementation (either volatile variable or volatile-cast),
this invokes undefined behaviour (in both C and GCC).
Segher
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Herbert Xu @ 2007-08-16 2:08 UTC (permalink / raw)
To: Satyam Sharma
Cc: Christoph Lameter, Paul E. McKenney, Paul Mackerras,
Stefan Richter, Chris Snook, Linux Kernel Mailing List,
linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
cfriesen, zlynx, rpjday, jesper.juhl, segher
In-Reply-To: <alpine.LFD.0.999.0708160730590.24380@enigma.security.iitk.ac.in>
On Thu, Aug 16, 2007 at 07:45:44AM +0530, Satyam Sharma wrote:
>
> Completely agreed, again. To summarize again (had done so about ~100 mails
> earlier in this thread too :-) ...
>
> atomic_{read,set}_volatile() -- guarantees volatility also along with
> atomicity (the two _are_ different concepts after all, irrespective of
> whether callsites normally want one with the other or not)
>
> atomic_{read,set}_nonvolatile() -- only guarantees atomicity, compiler
> free to elid / coalesce / optimize such accesses, can keep the object
> in question cached in a local register, leads to smaller text, etc.
>
> As to which one should be the default atomic_read() is a question of
> whether majority of callsites (more weightage to important / hot
> codepaths, lesser to obscure callsites) want a particular behaviour.
>
> Do we have a consensus here? (hoping against hope, probably :-)
I can certainly agree with this.
But I have to say that I still don't know of a single place
where one would actually use the volatile variant.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Herbert Xu @ 2007-08-16 2:11 UTC (permalink / raw)
To: Paul Mackerras
Cc: Christoph Lameter, Paul E. McKenney, Satyam Sharma,
Stefan Richter, Chris Snook, Linux Kernel Mailing List,
linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
cfriesen, zlynx, rpjday, jesper.juhl, segher
In-Reply-To: <18115.45316.702491.681906@cargo.ozlabs.ibm.com>
On Thu, Aug 16, 2007 at 12:05:56PM +1000, Paul Mackerras wrote:
> Herbert Xu writes:
>
> > See sk_stream_mem_schedule in net/core/stream.c:
> >
> > /* Under limit. */
> > if (atomic_read(sk->sk_prot->memory_allocated) < sk->sk_prot->sysctl_mem[0]) {
> > if (*sk->sk_prot->memory_pressure)
> > *sk->sk_prot->memory_pressure = 0;
> > return 1;
> > }
> >
> > /* Over hard limit. */
> > if (atomic_read(sk->sk_prot->memory_allocated) > sk->sk_prot->sysctl_mem[2]) {
> > sk->sk_prot->enter_memory_pressure();
> > goto suppress_allocation;
> > }
> >
> > We don't need to reload sk->sk_prot->memory_allocated here.
>
> Are you sure? How do you know some other CPU hasn't changed the value
> in between?
Yes I'm sure, because we don't care if others have increased
the reservation.
Note that even if we did we'd be using barriers so volatile
won't do us any good here.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Christoph Lameter @ 2007-08-16 2:15 UTC (permalink / raw)
To: Paul Mackerras
Cc: Herbert Xu, Paul E. McKenney, Satyam Sharma, Stefan Richter,
Chris Snook, Linux Kernel Mailing List, linux-arch,
Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
jesper.juhl, segher
In-Reply-To: <18115.45316.702491.681906@cargo.ozlabs.ibm.com>
On Thu, 16 Aug 2007, Paul Mackerras wrote:
> > We don't need to reload sk->sk_prot->memory_allocated here.
>
> Are you sure? How do you know some other CPU hasn't changed the value
> in between?
The cpu knows because the cacheline was not invalidated.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox