From: Ingo Molnar <mingo@elte.hu>
To: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
rusty@rustcorp.com.au, tglx@linutronix.de, x86@kernel.org,
linux-kernel@vger.kernel.org, hpa@zytor.com,
Paul Mundt <lethal@linux-sh.org>,
rmk@arm.linux.org.uk, starvik@axis.com, ralf@linux-mips.org,
davem@davemloft.net, cooloney@kernel.org, kyle@mcmartin.ca,
matthew@wil.cx, grundler@parisc-linux.org, takata@linux-m32r.org,
benh@kernel.crashing.org, rth@twiddle.net,
ink@jurassic.park.msu.ru, heiko.carstens@de.ibm.com,
Linus Torvalds <torvalds@linux-foundation.org>,
Nick Piggin <npiggin@suse.de>,
Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [PATCH UPDATED] percpu: use dynamic percpu allocator as the default percpu allocator
Date: Wed, 1 Apr 2009 21:01:13 +0200
Message-ID: <20090401190113.GA734@elte.hu>
In-Reply-To: <alpine.DEB.1.10.0904011404240.29839@qirst.com>
* Christoph Lameter <cl@linux.com> wrote:
> __read_mostly should be packed as tightly as possible to increase
> the chance that one cacheline includes multiple of the critical
> variables for the hot code paths. Too much __read_mostly defeats
> its purpose.
That stance is commonly held but quite wrong and harmful, IMHO.
It stifles the proper identification of read-mostly variables _AND_
it hurts the proper identification of critical write-often variables
as well. Not good.
The solution for critical write-often variables is what we always
used: to identify them explicitly and to place them consciously into
separate cachelines. (Or to per-cpu-ify or object-ify them where
possible/sensible.)
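As a minimal userspace sketch of "place them consciously into separate cachelines": the struct, the counter names and the 64-byte line size below are assumptions made for illustration. In the kernel itself the same intent is usually expressed with ____cacheline_aligned_in_smp and the architecture's L1_CACHE_BYTES / SMP_CACHE_BYTES rather than with C11 alignas.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size for this sketch; real code takes it from the arch. */
#define CACHELINE_BYTES 64

/* Hypothetical write-often counters, each forced onto its own cacheline. */
struct hot_counters {
	alignas(CACHELINE_BYTES) unsigned long rx_packets;
	alignas(CACHELINE_BYTES) unsigned long tx_packets;
};

static struct hot_counters counters;

/* Returns the index of the cacheline that holds address p. */
static uintptr_t cacheline_of(const void *p)
{
	return (uintptr_t)p / CACHELINE_BYTES;
}

/* Returns 1 if the two counters live on different cachelines. */
static int counters_are_separated(void)
{
	return cacheline_of(&counters.rx_packets) !=
	       cacheline_of(&counters.tx_packets);
}

/* Checked at compile time, so later edits cannot silently undo the layout. */
static_assert(offsetof(struct hot_counters, tx_packets) -
	      offsetof(struct hot_counters, rx_packets) >= CACHELINE_BYTES,
	      "hot counters must not share a cacheline");
```

The point of the explicit alignment is that the separation is stated in the source and enforced by the compiler, instead of depending on whatever happens to be laid out between the two variables.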
Then annotate everything that is read-mostly and accessed-frequently
with the __read_mostly attribute.
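For readers who have not looked at how the annotation works: __read_mostly is a section attribute, so annotated variables are grouped into their own data section instead of being interleaved with everything else. The following is a rough userspace imitation, not the kernel's definition. The macro name, the section name and the variables are hypothetical, and the sketch assumes GCC or Clang on an ELF target linked with GNU ld or lld, which provide the __start_/__stop_ symbols for sections named like C identifiers.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's __read_mostly: put the variable
 * into a dedicated section so annotated variables end up next to each other.
 */
#define demo_read_mostly __attribute__((__section__("demo_read_mostly")))

/* Read on every fast-path call, written only at setup time. */
unsigned int demo_hash_mask demo_read_mostly = 0xff;
int demo_feature_enabled demo_read_mostly = 1;

/* Unannotated: assumed "access-rarely" or "we-dont-care". */
unsigned long demo_boot_flags;

/* Section bounds, defined by the linker (assumption: GNU ld or lld, ELF). */
extern char __start_demo_read_mostly[];
extern char __stop_demo_read_mostly[];

/* Returns 1 if p points into the demo_read_mostly section. */
int in_read_mostly_section(const void *p)
{
	uintptr_t a = (uintptr_t)p;

	return a >= (uintptr_t)__start_demo_read_mostly &&
	       a < (uintptr_t)__stop_demo_read_mostly;
}
```

With this, in_read_mostly_section(&demo_hash_mask) is true while in_read_mostly_section(&demo_boot_flags) is not: the annotated variables share a region that write-often data never lands in by accident.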
The rest (unannotated variables) is to be assumed "access-rarely" or
"we-dont-care", by default. This is actually 95% of the global
variables.
Yes, a growing number of annotations puts increasing pressure on
the places that are frequently accessed but not properly annotated -
but we should be happy about that: it creates the dynamics and
pressure for them to be properly annotated.
On the other hand, depending on the "put enough data bloat between
critical variables anyway, no need to care about alignment" scheme is
a sloppy, fragile concept that does not lead to a reliable and
dependable end result.
It has two problems:
 - Thinking that this solves false cacheline sharing reliably is
   wrong: there's nothing that guarantees and enforces that slapping
   a few variables between two critical variables puts them on
   separate cachelines:
     - Ongoing changes in code can bit-rot the
       thought-to-be-large-enough distance between two critical
       variables - and there's no mechanism in place to catch that.
       Explicitly cacheline aligning them will preserve the
       information long-term.
     - There are architectures with larger cacheline sizes than
       what you are developing on.
     - .config variations can move variables closer or farther
       apart from each other, hiding/triggering the false cacheline
       sharing problem.
   It is not a maintainable concept IMHO and we should not pretend
   it is.
 - It actually prevents true read-mostly variables from being
   annotated properly. (In such a case a true read-mostly variable
   bouncing around on the same cacheline as a frequently-written
   variable is almost as bad in terms of MESI latencies and costs
   as false cacheline sharing between two write-mostly variables.)
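To make the first problem concrete, here is a small sketch of how "enough stuff in between" depends on the line size. The struct, the 72-byte filler and the helper names are made up for illustration, and the 128- and 256-byte line sizes are only example values for architectures with larger lines.

```c
#include <stddef.h>

/* Returns 1 if byte offsets a and b fall on the same cacheline. */
static int same_cacheline(size_t a, size_t b, size_t line_bytes)
{
	return a / line_bytes == b / line_bytes;
}

/* Hypothetical layout that relies on unrelated data as padding. */
struct padded_by_luck {
	unsigned long hot_a;	/* written often */
	char filler[72];	/* unrelated variables that happen to sit here */
	unsigned long hot_b;	/* written often */
};

/*
 * Returns 1 if hot_a and hot_b would share a line of the given size,
 * assuming the struct itself starts on a line boundary.
 */
static int hot_fields_share_line(size_t line_bytes)
{
	return same_cacheline(offsetof(struct padded_by_luck, hot_a),
			      offsetof(struct padded_by_luck, hot_b),
			      line_bytes);
}
```

With 64-byte lines hot_fields_share_line() reports the two fields as separated, while with 128- or 256-byte lines they share a line again. Shrinking the filler, which is what a code change or a different .config can do, brings them together even at 64 bytes, and nothing in the source would flag it.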
Architecting the layout of variables in a knowingly random and
.config-sensitive way is simply not good design and we should not
pretend it is.
We might not be able to solve the problem if not enough people care
about their variables, but we should at least not be proud of a
non-solution ;-)
Ingo