* remove __read_mostly?
@ 2006-06-25 18:57 Andrew Morton
2006-06-25 21:19 ` Ravikiran G Thirumalai
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Andrew Morton @ 2006-06-25 18:57 UTC (permalink / raw)
To: Christoph Lameter, Ravikiran G Thirumalai; +Cc: linux-kernel
I'm thinking we should remove __read_mostly.
Because if we use this everywhere where it's supposed to be used, we end up
with .bss and .data 100% populated with write-often variables, packed
closely together. The cachelines will really be flying around.
IOW: __read_mostly optimises read-mostly variables and pessimises
write-often variables.
We want something which optimises both read-mostly and write-often storage.
We do that by marking the write-often variables with
__cacheline_aligned_in_smp.
OK?
[Problem is, I don't think any of the make-foo-__read_mostly patches
actually identified _which_ write-often variables were affecting `foo', so
we'll be back to square one.]
[Actually, we should do
#define __write_often __cacheline_aligned_in_smp
and use __write_often
a) for documentation and
b) so the optimisation can be centrally turned off, for space
optimisation or for performance validation.]
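A minimal userspace sketch of the __write_often idea above, assuming GCC or Clang and a 64-byte cache line. The WRITE_OFTEN_DISABLED switch and the two counters are hypothetical names used only for illustration. Alignment alone only guarantees that each variable starts a line; it does not stop the linker from placing other data later in the same line.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache line size; the kernel takes this from per-arch headers. */
#define SMP_CACHE_BYTES 64
static_assert((SMP_CACHE_BYTES & (SMP_CACHE_BYTES - 1)) == 0,
	      "cache line size must be a power of two");

/*
 * WRITE_OFTEN_DISABLED is a hypothetical build switch standing in for the
 * "centrally turned off" knob; it is not an existing kernel option.
 */
#ifndef WRITE_OFTEN_DISABLED
#define __write_often __attribute__((aligned(SMP_CACHE_BYTES)))
#else
#define __write_often
#endif

/* Two illustrative write-hot counters, each starting its own cache line. */
unsigned long jobs_completed __write_often;
unsigned long bytes_sent __write_often;

/* Update both counters, as a busy code path would. */
void account_job(unsigned long bytes)
{
	jobs_completed++;
	bytes_sent += bytes;
}

/* Return 1 if p sits on a cache line boundary, 0 otherwise. */
int starts_cacheline(const void *p)
{
	return ((uintptr_t)p % SMP_CACHE_BYTES) == 0;
}
```

Building with -DWRITE_OFTEN_DISABLED drops the alignment everywhere at once, which is the point of routing it through a single macro.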
* Re: remove __read_mostly?
2006-06-25 18:57 remove __read_mostly? Andrew Morton
@ 2006-06-25 21:19 ` Ravikiran G Thirumalai
2006-06-26 18:39 ` Paul Jackson
2006-06-25 21:44 ` Arjan van de Ven
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Ravikiran G Thirumalai @ 2006-06-25 21:19 UTC (permalink / raw)
To: Andrew Morton; +Cc: Christoph Lameter, linux-kernel
On Sun, Jun 25, 2006 at 11:57:36AM -0700, Andrew Morton wrote:
>
> I'm thinking we should remove __read_mostly.
>
> Because if we use this everywhere where it's supposed to be used, we end up
> with .bss and .data 100% populated with write-often variables, packed
> closely together. The cachelines will really be flying around.
>
> IOW: __read_mostly optimises read-mostly variables and pessimises
> write-often variables.
>
> We want something which optimises both read-mostly and write-often storage.
> We do that by marking the write-often variables with
> __cacheline_aligned_in_smp.
We already mark write often structures with __cacheline_aligned_in_smp.
The idea behind __read_mostly is to separate variables like cpu maps,
bootcpuinfo etc which are written to very very rarely -- during
initialization/hot-plugging, but read quite often, something like a ~100% read
ratio. They might be sharing a cacheline with a variable which is not
written frequently enough to be marked __cacheline_aligned_in_smp, but
still, these often-read variables would be invalidated in cache every time
there is a write to those other variables.
Considering that there are quite a few structures which are read often,
(like cpu maps, cpu info, node to cpu maps etc) it made sense to place them
in a separate section.
When we mark variables __read_mostly, it doesn't mean all that is left in
.bss and .data is write mostly. Surely the rest of .data and .bss which
do not qualify for ~100% read would not be ~100% write/RMW. We would have
variables which experience varying ratio of reads and writes.
So as I see it the current scheme of things is
1. If a variable is written quite often -- mark it __cacheline_aligned_in_smp.
2. If a variable is read quite often, something like 99:1 read -- mark it
__read_mostly
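A rough userspace sketch of rule 2, assuming GCC or Clang on an ELF target. The section name here is an assumption chosen to match the naming used in this thread, and the two variables are illustrative stand-ins, not the kernel's own definitions. The attribute only groups the variables into one input section; a linker script still has to keep that section away from write-hot data.

```c
#include <assert.h>
#include <limits.h>

/* Assumed section name; placement in the final image is up to the linker. */
#define __read_mostly __attribute__((section(".data.read_mostly")))

#define MAX_CPUS ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Written once during "boot", then only read. Illustrative names. */
int nr_possible_cpus __read_mostly = 1;
unsigned long cpu_online_map __read_mostly = 1UL;

/* The rare write path: runs at initialization or hot-plug time. */
void boot_time_setup(int cpus)
{
	assert(cpus > 0 && cpus <= MAX_CPUS);
	nr_possible_cpus = cpus;
	cpu_online_map = (cpus == MAX_CPUS) ? ~0UL : (1UL << cpus) - 1UL;
}

/* The hot read path: touches only read-mostly data. */
int cpu_is_online(int cpu)
{
	if (cpu < 0 || cpu >= nr_possible_cpus)
		return 0;
	return (int)((cpu_online_map >> cpu) & 1UL);
}
```

Because nothing in that section is written after setup, readers on other CPUs should keep their cached copies valid, with no padding cost per variable.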
>
> OK?
>
> [Problem is, I don't think any of the make-foo-__read_mostly patches
> actually identified _which_ write-often variables were affecting `foo', so
> we'll be back to square one.]
Most of the variables we marked as __read_mostly were ones we were seeing
inter-node transfers for read, when we shouldn't be seeing any -- because
they are just written to once during bootup. And it did not make sense to
mark the other writers on the inter-node cacheline __cacheline_aligned_in_smp,
since it would mean a bloat of 128-4096 bytes depending on the arch, and
those variables were not written often enough to warrant padding. A few
not-so-often-written variables on the same cacheline would mean these
read-mostly variables are always being invalidated.
>
> [Actually, we should do
>
> #define __write_often __cacheline_aligned_in_smp
>
> and use __write_often
>
> a) for documentation and
>
> b) so the optimisation can be centrally turned off, for space
> optimisation or for performance validation.]
>
Sure, we can change users of __cacheline_aligned_in_smp to __write_often and
change the section to .data.write_mostly from .data.cacheline_aligned for
readability's sake. But IMHO we do need the __read_mostly section for variables
which experience near 100% read access. I cannot see any negative aspects of
having a separate read mostly section as there is no padding or bloating
involved. Granted, we might not see a performance difference with some builds
due to compiler/linker placement or code changes, but the problem might
resurface later with a newer build or further code changes.
Thanks,
Kiran
* Re: remove __read_mostly?
2006-06-25 18:57 remove __read_mostly? Andrew Morton
2006-06-25 21:19 ` Ravikiran G Thirumalai
@ 2006-06-25 21:44 ` Arjan van de Ven
2006-06-26 4:20 ` Christoph Lameter
2006-06-25 21:52 ` Arjan van de Ven
2006-06-26 4:17 ` Christoph Lameter
3 siblings, 1 reply; 10+ messages in thread
From: Arjan van de Ven @ 2006-06-25 21:44 UTC (permalink / raw)
To: Andrew Morton; +Cc: Christoph Lameter, Ravikiran G Thirumalai, linux-kernel
On Sun, 2006-06-25 at 11:57 -0700, Andrew Morton wrote:
> I'm thinking we should remove __read_mostly.
>
> Because if we use this everywhere where it's supposed to be used, we end up
> with .bss and .data 100% populated with write-often variables, packed
> closely together. The cachelines will really be flying around.
one thing we could/should do is have a "written during boot only"
section; which we then can mark read only as well :)
* Re: remove __read_mostly?
2006-06-25 18:57 remove __read_mostly? Andrew Morton
2006-06-25 21:19 ` Ravikiran G Thirumalai
2006-06-25 21:44 ` Arjan van de Ven
@ 2006-06-25 21:52 ` Arjan van de Ven
2006-06-26 4:17 ` Christoph Lameter
3 siblings, 0 replies; 10+ messages in thread
From: Arjan van de Ven @ 2006-06-25 21:52 UTC (permalink / raw)
To: Andrew Morton; +Cc: Christoph Lameter, Ravikiran G Thirumalai, linux-kernel
> Because if we use this everywhere where it's supposed to be used, we end up
> with .bss and .data 100% populated with write-often variables, packed
> closely together. The cachelines will really be flying around.
this argument is true if you have unrelated data together; however if
related data is together then suddenly you improve and increase cache
density....
* Re: remove __read_mostly?
2006-06-25 18:57 remove __read_mostly? Andrew Morton
` (2 preceding siblings ...)
2006-06-25 21:52 ` Arjan van de Ven
@ 2006-06-26 4:17 ` Christoph Lameter
3 siblings, 0 replies; 10+ messages in thread
From: Christoph Lameter @ 2006-06-26 4:17 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ravikiran G Thirumalai, linux-kernel
On Sun, 25 Jun 2006, Andrew Morton wrote:
> I'm thinking we should remove __read_mostly.
>
> Because if we use this everywhere where it's supposed to be used, we end up
> with .bss and .data 100% populated with write-often variables, packed
> closely together. The cachelines will really be flying around.
What we really want is a write-often variable in a cacheline combined with
infrequently read and written data. However, data that is frequently read
(that is __read_mostly) would still need to be in a separate section.
> IOW: __read_mostly optimises read-mostly variables and pessimises
> write-often variables.
>
> We want something which optimises both read-mostly and write-often storage.
> We do that by marking the write-often variables with
> __cacheline_aligned_in_smp.
>
> OK?
I think we would want to group write-often variables with infrequently
used variables. But how does one convince the linker to do that?
I agree that there is a problem with shifting frequently written variables
together, which may in itself cause ill effects. So __read_mostly should
only be used when we have identified real cacheline sharing problems, and
really cache-hot variables may each have to be put in a cacheline of their
own. I think we already do that, and that is at least the way I have
handled it. Too many __read_mostly annotations kill the whole point of the
exercise.
* Re: remove __read_mostly?
2006-06-25 21:44 ` Arjan van de Ven
@ 2006-06-26 4:20 ` Christoph Lameter
0 siblings, 0 replies; 10+ messages in thread
From: Christoph Lameter @ 2006-06-26 4:20 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: Andrew Morton, Ravikiran G Thirumalai, linux-kernel
On Sun, 25 Jun 2006, Arjan van de Ven wrote:
> On Sun, 2006-06-25 at 11:57 -0700, Andrew Morton wrote:
> > I'm thinking we should remove __read_mostly.
> >
> > Because if we use this everywhere where it's supposed to be used, we end up
> > with .bss and .data 100% populated with write-often variables, packed
> > closely together. The cachelines will really be flying around.
>
>
> one thing we could/should do is have a "written during boot only"
> section; which we then can mark read only as well :)
To replicate an IRIX idea:
We could have a section with variables that can only be modified by a special
command. F.e.
enable_write_to_most_read_section()
<change variable>
disable_write_to_most_read_section()
This section could be a per cpu section that would be replicated
to all nodes on the system on disable_write_to_most_read_section().
That would mean that critical read only data would be node local.
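A userspace analogue of the enable/disable pair above, sketched with mmap() and mprotect() on a POSIX system. This only models the write-protection half of the idea; per-node replication is not attempted here. The struct, its field, and the helper names other than the two functions named above are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical contents of the protected section. */
struct most_read {
	int nr_possible_cpus;
};

static struct most_read *most_read;	/* lives in its own page */
static size_t most_read_len;

/* Map one page for the section and leave it read-only. 0 on success. */
int most_read_init(void)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	if (page <= 0)
		return -1;
	most_read_len = (size_t)page;
	p = mmap(NULL, most_read_len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	most_read = p;
	return mprotect(most_read, most_read_len, PROT_READ);
}

/* Open the section for writing. 0 on success. */
int enable_write_to_most_read_section(void)
{
	assert(most_read != NULL);
	return mprotect(most_read, most_read_len, PROT_READ | PROT_WRITE);
}

/* Close the section again; stray writes now fault. 0 on success. */
int disable_write_to_most_read_section(void)
{
	assert(most_read != NULL);
	return mprotect(most_read, most_read_len, PROT_READ);
}

/* The only sanctioned write path: enable, change variable, disable. */
int set_nr_possible_cpus(int n)
{
	if (enable_write_to_most_read_section() != 0)
		return -1;
	most_read->nr_possible_cpus = n;
	return disable_write_to_most_read_section();
}

/* Plain read; needs no special handling. */
int get_nr_possible_cpus(void)
{
	return most_read->nr_possible_cpus;
}
```

Each write costs two system calls here, which is the kind of deliberate expense that discourages putting frequently changing data in such a section.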
* Re: remove __read_mostly?
2006-06-25 21:19 ` Ravikiran G Thirumalai
@ 2006-06-26 18:39 ` Paul Jackson
2006-06-26 18:46 ` Christoph Lameter
0 siblings, 1 reply; 10+ messages in thread
From: Paul Jackson @ 2006-06-26 18:39 UTC (permalink / raw)
To: Ravikiran G Thirumalai; +Cc: akpm, clameter, linux-kernel
Ravikiran wrote:
> The idea behind __read_mostly is to separate variables like cpu maps,
> bootcpuinfo etc which are written to very very rarely -- during
> initialization/hot-plugging, but read quite often something like ~100 % read
> ratio.
So these variables are __read_hot_write_cold?
In other words, the name __read_mostly is a little misleading, in my
book. That name only suggests read much more than written. In your
words:
something like 99:1 read
It doesn't state that the variable is so "read hot" that it is worth keeping
off "write hot" cache lines.
Let's say for example we have a variable that is accessed only once per
hour, and that this access is always a read except once a week (once
every 168 hours) when it is a write.
I would not mark that variable __read_mostly, even though it passed
your 99:1 test. That variable is read_cold_write_evencolder. It's an
ideal candidate for the cannon fodder that we use to fill up the rest
of a cache line that has a hot variable.
If Andrew's suggestion to remove __read_mostly doesn't fly, then I'd
vote for a name change:
__read_mostly ==> __read_hot_write_cold
I think we want to identify the hottest memory words, keeping them
on separate cache lines, except that __read_hot_write_cold words can
share cache lines with each other.
A given cache line would have at most one hot write word, or one or
more read hot, write cold words.
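The packing rule above can be sketched as a C11 struct layout, assuming a 64-byte cache line. All field names are hypothetical; the point is only which words share a line and which do not.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE 64	/* assumed line size */

/* Read hot, write cold words may share one line with each other. */
struct read_hot_write_cold {
	_Alignas(CACHELINE) int nr_cpus;
	int timer_hz;
	unsigned long cpu_mask;
};

/* At most one hot write word per line; cold words fill the rest. */
struct hot_write_line {
	_Alignas(CACHELINE) unsigned long hot_counter;
	unsigned long weekly_setting;	/* read cold, write even colder */
};

static_assert(sizeof(struct hot_write_line) == CACHELINE,
	      "one hot write word plus filler should occupy one line");

/* Illustrative overall layout: one shared read-hot line, two write-hot lines. */
struct layout {
	struct read_hot_write_cold ro;
	struct hot_write_line a;
	struct hot_write_line b;
};

/* Index of the cache line that a given byte offset falls into. */
size_t line_of(size_t offset)
{
	return offset / CACHELINE;
}
```

Comparing line_of(offsetof(...)) for any two fields shows whether they would bounce together between CPUs.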
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
* Re: remove __read_mostly?
2006-06-26 18:39 ` Paul Jackson
@ 2006-06-26 18:46 ` Christoph Lameter
2006-06-26 19:11 ` Paul Jackson
0 siblings, 1 reply; 10+ messages in thread
From: Christoph Lameter @ 2006-06-26 18:46 UTC (permalink / raw)
To: Paul Jackson; +Cc: Ravikiran G Thirumalai, akpm, clameter, linux-kernel
On Mon, 26 Jun 2006, Paul Jackson wrote:
> In other words, the name __read_mostly is a little misleading, in my
> book. That name only suggests read much more than written. In your
> words:
>
> something like 99:1 read
99:1 may be too small a ratio.
A read_mostly marked variable should be changed rarely (meaning it
is extremely unlikely that this is going to change) but read frequently.
F.e. configuration data for timer operations, number of possible
processors and stuff like that.
If we would make the operation to write to the read_mostly section more
expensive (by f.e. replicating the data per node) then this would hold off
the uses that are changing too frequently.
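A toy model of that trade-off, assuming four nodes and a 64-byte cache line: every write has to update one copy per node, while a read touches only the caller's local copy. The node count, the timer_hz field, and the function names are all hypothetical, and real per-node placement of memory is not modeled.

```c
#include <assert.h>

#define MAX_NODES 4	/* assumed node count */
#define CACHELINE 64	/* assumed line size */

/* One copy per node, each on its own line so nodes never share one. */
struct node_copy {
	_Alignas(CACHELINE) int timer_hz;
};

static_assert(sizeof(struct node_copy) == CACHELINE,
	      "each replica should occupy exactly one line");

static struct node_copy replicas[MAX_NODES];

/* Expensive write: touches every node's copy. */
void read_mostly_write(int value)
{
	for (unsigned int n = 0; n < MAX_NODES; n++)
		replicas[n].timer_hz = value;
}

/* Cheap read: touches only the given node's copy. */
int read_mostly_read(unsigned int node)
{
	return replicas[node % MAX_NODES].timer_hz;
}
```

The write cost grows with the number of nodes, so a variable that changes often would show up as an obvious misuse.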
* Re: remove __read_mostly?
2006-06-26 18:46 ` Christoph Lameter
@ 2006-06-26 19:11 ` Paul Jackson
2006-06-26 19:52 ` Ravikiran G Thirumalai
0 siblings, 1 reply; 10+ messages in thread
From: Paul Jackson @ 2006-06-26 19:11 UTC (permalink / raw)
To: Christoph Lameter; +Cc: kiran, akpm, clameter, linux-kernel
Christoph wrote:
> 99:1 may be too small a ratio.
Could well be. I was just quoting Ravikiran's number. I suspect he
was using the number loosely, not as a precise value.
> A read_mostly marked variable should be changed rarely (meaning it
> is extremely unlikely that this is going to change) but read frequently.
Yes - well said.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
* Re: remove __read_mostly?
2006-06-26 19:11 ` Paul Jackson
@ 2006-06-26 19:52 ` Ravikiran G Thirumalai
0 siblings, 0 replies; 10+ messages in thread
From: Ravikiran G Thirumalai @ 2006-06-26 19:52 UTC (permalink / raw)
To: Paul Jackson; +Cc: Christoph Lameter, akpm, clameter, linux-kernel
On Mon, Jun 26, 2006 at 12:11:24PM -0700, Paul Jackson wrote:
> Christoph wrote:
> > 99:1 may be too small a ratio.
>
> Could well be. I was just quoting Ravikiran's number. I suspect he
> was using the number loosely, not as a precise value.
Yes I was using it loosely. I also mentioned ~100% read which is probably
more accurate :). It is indeed "read hot write cold". Writes on these
variables are typically during bootup/subsystem initialization/hot plug
events.
Thanks,
Kiran