linux-fsdevel.vger.kernel.org archive mirror
* [PATCH RESEND] fs: Move bh_cachep to the __read_mostly section
@ 2012-05-28 11:58 Vlad Zolotarov
  2012-06-07  8:23 ` Vlad Zolotarov
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Vlad Zolotarov @ 2012-05-28 11:58 UTC (permalink / raw)
  To: viro, linux-fsdevel; +Cc: linux-kernel, Shai

From: Shai Fultheim <shai@scalemp.com>

bh_cachep is only written to once on initialization, so move it to the
__read_mostly section.

Signed-off-by: Shai Fultheim <shai@scalemp.com>
Signed-off-by: Vlad Zolotarov <vlad@scalemp.com>
---
 fs/buffer.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index ad5938c..838a9cf 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3152,7 +3152,7 @@ SYSCALL_DEFINE2(bdflush, int, func, long, data)
 /*
  * Buffer-head allocation
  */
-static struct kmem_cache *bh_cachep;
+static struct kmem_cache *bh_cachep __read_mostly;
 
 /*
  * Once the number of bh's in the machine exceeds this level, we start
-- 
1.7.9.5



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH RESEND] fs: Move bh_cachep to the __read_mostly section
  2012-05-28 11:58 [PATCH RESEND] fs: Move bh_cachep to the __read_mostly section Vlad Zolotarov
@ 2012-06-07  8:23 ` Vlad Zolotarov
  2012-06-08  0:07 ` Andrew Morton
  2012-07-03  9:08 ` Vlad Zolotarov
  2 siblings, 0 replies; 13+ messages in thread
From: Vlad Zolotarov @ 2012-06-07  8:23 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, linux-kernel, Shai

On Monday 28 May 2012 14:58:42 Vlad Zolotarov wrote:
> From: Shai Fultheim <shai@scalemp.com>
> 
> bh_cachep is only written to once on initialization, so move it to the
> __read_mostly section.
> 
> Signed-off-by: Shai Fultheim <shai@scalemp.com>
> Signed-off-by: Vlad Zolotarov <vlad@scalemp.com>
> ---
>  fs/buffer.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index ad5938c..838a9cf 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -3152,7 +3152,7 @@ SYSCALL_DEFINE2(bdflush, int, func, long, data)
>  /*
>   * Buffer-head allocation
>   */
> -static struct kmem_cache *bh_cachep;
> +static struct kmem_cache *bh_cachep __read_mostly;
> 
>  /*
>   * Once the number of bh's in the machine exceeds this level, we start

Alexander, could you please take a look at this patch? Could you please
update me on its status? Should I respin?

thanks in advance,
vlad

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RESEND] fs: Move bh_cachep to the __read_mostly section
  2012-05-28 11:58 [PATCH RESEND] fs: Move bh_cachep to the __read_mostly section Vlad Zolotarov
  2012-06-07  8:23 ` Vlad Zolotarov
@ 2012-06-08  0:07 ` Andrew Morton
  2012-06-10  9:36   ` Vlad Zolotarov
  2012-07-03  9:08 ` Vlad Zolotarov
  2 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2012-06-08  0:07 UTC (permalink / raw)
  To: Vlad Zolotarov; +Cc: viro, linux-fsdevel, linux-kernel, Shai

On Mon, 28 May 2012 14:58:42 +0300
Vlad Zolotarov <vlad@scalemp.com> wrote:

> From: Shai Fultheim <shai@scalemp.com>
> 
> bh_cachep is only written to once on initialization, so move it to the
> __read_mostly section.
> 
> Signed-off-by: Shai Fultheim <shai@scalemp.com>
> Signed-off-by: Vlad Zolotarov <vlad@scalemp.com>
> ---
>  fs/buffer.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index ad5938c..838a9cf 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -3152,7 +3152,7 @@ SYSCALL_DEFINE2(bdflush, int, func, long, data)
>  /*
>   * Buffer-head allocation
>   */
> -static struct kmem_cache *bh_cachep;
> +static struct kmem_cache *bh_cachep __read_mostly;
>  

hm, I thought I replied to this earlier, but I can't find that email.

Yes, bh_cachep is read-mostly.  In fact it's write-once.  But the same
is true of all kmem_cache*'s.  I don't see any particular reason for
singling out bh_cachep.


Alas, I don't see a smart way of addressing this.  It's either a
patchset which adds __read_mostly to all kmem_cache*'s, or a patchset
which converts all the definitions to use some nasty macro which
inserts the __read_mostly.
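
(For illustration only: such a macro might look roughly like the sketch
below. KMEM_CACHE_PTR is a hypothetical name, not an existing kernel API.)

	/* hypothetical helper that hides the __read_mostly annotation */
	#define KMEM_CACHE_PTR(name) \
		struct kmem_cache *name __read_mostly

	/* usage sketch: expands to
	 * "static struct kmem_cache *bh_cachep __read_mostly;" */
	static KMEM_CACHE_PTR(bh_cachep);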



And I still have theoretical concerns with __read_mostly.  As we
further sort the storage into read-mostly and write-often sections, the
density of writes in the write-often section increases.  IOW, removing
the read-mostly padding *increases* cross-CPU traffic in the write-often
section.  IOW2, leaving the read-mostly stuff where it is provides
beneficial padding to the write-often fields.  I don't think it has
been shown that there will be net gains.
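
(A minimal sketch of this concern, with hypothetical variable names:)

	/* Today, a write-once pointer may happen to sit between two
	 * write-hot counters, keeping them on separate cache lines: */
	static atomic_t hits;			/* write-hot */
	static struct kmem_cache *foo_cachep;	/* write-once "padding" */
	static atomic_t misses;			/* write-hot */

	/* Moving foo_cachep into .data..read_mostly removes that
	 * accidental padding; hits and misses may then share a single
	 * cache line and start bouncing it between CPUs. */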

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RESEND] fs: Move bh_cachep to the __read_mostly section
  2012-06-08  0:07 ` Andrew Morton
@ 2012-06-10  9:36   ` Vlad Zolotarov
  2012-07-01 11:34     ` Vlad Zolotarov
  0 siblings, 1 reply; 13+ messages in thread
From: Vlad Zolotarov @ 2012-06-10  9:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: viro, linux-fsdevel, linux-kernel, Shai

On Thursday 07 June 2012 17:07:21 Andrew Morton wrote:
> On Mon, 28 May 2012 14:58:42 +0300
> 
> Vlad Zolotarov <vlad@scalemp.com> wrote:
> > From: Shai Fultheim <shai@scalemp.com>
> > 
> > bh_cachep is only written to once on initialization, so move it to the
> > __read_mostly section.
> > 
> > Signed-off-by: Shai Fultheim <shai@scalemp.com>
> > Signed-off-by: Vlad Zolotarov <vlad@scalemp.com>
> > ---
> > 
> >  fs/buffer.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index ad5938c..838a9cf 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -3152,7 +3152,7 @@ SYSCALL_DEFINE2(bdflush, int, func, long, data)
> >  /*
> >   * Buffer-head allocation
> >   */
> > -static struct kmem_cache *bh_cachep;
> > +static struct kmem_cache *bh_cachep __read_mostly;
> 
> hm, I thought I replied to this earlier, but I can't find that email.
> 
> Yes, bh_cachep is read-mostly.  In fact it's write-once.  But the same
> is true of all kmem_cache*'s.  I don't see any particular reason for
> singling out bh_cachep.
> 
> 
> Alas, I don't see a smart way of addressing this.  It's either a
> patchset which adds __read_mostly to all kmem_cache*'s, or a patchset
> which converts all the definitions to use some nasty macro which
> inserts the __read_mostly.

Well, it may be done. However, my ability to properly check it is limited as I
have only a certain number of systems to check it on. I can create the patch,
test it in our labs and post it on this mailing list with a request to test it
on other platforms (like non-x86 platforms). However, we may also hit the
problem you describe below if we do so...

> 
> And I still have theoretical concerns with __read_mostly.  As we
> further sort the storage into read-mostly and write-often sections, the
> density of writes in the write-often section increases.  IOW, removing
> the read-mostly padding *increases* cross-CPU traffic in the write-often
> section.  IOW2, leaving the read-mostly stuff where it is provides
> beneficial padding to the write-often fields.  I don't think it has
> been shown that there will be net gains.

Great explanation! The above actually nicely explains (maybe you didn't quite
mean it that way ;)) why defining write-mostly section(s) is pointless. ;)

This is the main topic of this (http://markmail.org/thread/wl4lnjluroqxgabf)
thread between Ingo and me.

However, there is a clear motivation to define read-mostly section(s), just
the same way there is a motivation to put constants separately from
non-constant variables, which I don't think anybody argues about. ;)

On the other hand, generally speaking, if we "complete the task" and put ALL
read-mostly variables into a separate section, all the variables that are left
will actually represent the write-mostly section, which we would prefer not to
have (according to you). Yet we are still far from that today... ;)

Unfortunately, we can't consider all types of bad C code when we define
something like "const" or "__read_mostly". We do our best. And if someone
hasn't defined a per-CPU write-mostly variable in order to prevent heavy
cross-CPU traffic in his/her code (like in your example above), we can only
try to fix this code. But I don't think that the existence of such code should
imply that the whole idea of a __read_mostly section is actually bad or
useless. It's this bad C code that should be fixed, and IMHO padding the
variables with constants is not the proper way to fix it...
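
(A minimal sketch of the per-CPU approach mentioned above, with hypothetical
names:)

	/* one write-hot counter per CPU instead of a single shared one,
	 * so the hot writes never bounce a shared cache line: */
	static DEFINE_PER_CPU(unsigned long, foo_events);

	static inline void foo_count_event(void)
	{
		this_cpu_inc(foo_events);	/* no cross-CPU traffic */
	}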

That's why I think it could be dangerous to go ahead and patch all variables
of a certain sort (like kmem_cache*'s) with __read_mostly, as we may mess
things up in some places (like in your example above) where a deeper code
analysis than just pattern matching is needed.

So, getting back to the first section of my reply, do you still think we want
to patch all kmem_cache*'s with __read_mostly in one shot, or would we prefer
this to be done incrementally in order to make any regressions easier to
bisect?

Please comment.

thanks in advance,
vlad

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RESEND] fs: Move bh_cachep to the __read_mostly section
  2012-06-10  9:36   ` Vlad Zolotarov
@ 2012-07-01 11:34     ` Vlad Zolotarov
       [not found]       ` <CAK-9PRBiqtyiQahunPex9FT2dtKA0k0gv9PR+=sNCXVvn5Bn9Q@mail.gmail.com>
  0 siblings, 1 reply; 13+ messages in thread
From: Vlad Zolotarov @ 2012-07-01 11:34 UTC (permalink / raw)
  To: Andrew Morton; +Cc: viro, linux-fsdevel, linux-kernel, Shai

On Sunday 10 June 2012 12:36:52 Vlad Zolotarov wrote:
> On Thursday 07 June 2012 17:07:21 Andrew Morton wrote:
> > On Mon, 28 May 2012 14:58:42 +0300
> > 
> > Vlad Zolotarov <vlad@scalemp.com> wrote:
> > > From: Shai Fultheim <shai@scalemp.com>
> > > 
> > > bh_cachep is only written to once on initialization, so move it to the
> > > __read_mostly section.
> > > 
> > > Signed-off-by: Shai Fultheim <shai@scalemp.com>
> > > Signed-off-by: Vlad Zolotarov <vlad@scalemp.com>
> > > ---
> > > 
> > >  fs/buffer.c |    2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/buffer.c b/fs/buffer.c
> > > index ad5938c..838a9cf 100644
> > > --- a/fs/buffer.c
> > > +++ b/fs/buffer.c
> > > @@ -3152,7 +3152,7 @@ SYSCALL_DEFINE2(bdflush, int, func, long, data)
> > >  /*
> > >   * Buffer-head allocation
> > >   */
> > > -static struct kmem_cache *bh_cachep;
> > > +static struct kmem_cache *bh_cachep __read_mostly;
> > 
> > hm, I thought I replied to this earlier, but I can't find that email.
> > 
> > Yes, bh_cachep is read-mostly.  In fact it's write-once.  But the same
> > is true of all kmem_cache*'s.  I don't see any particular reason for
> > singling out bh_cachep.
> > 
> > 
> > Alas, I don't see a smart way of addressing this.  It's either a
> > patchset which adds __read_mostly to all kmem_cache*'s, or a patchset
> > which converts all the definitions to use some nasty macro which
> > inserts the __read_mostly.
> 
> Well, it may be done. However, my ability to properly check it is limited
> as I have only a certain number of systems to check it on. I can create
> the patch, test it in our labs and post it on this mailing list with a
> request to test it on other platforms (like non-x86 platforms). However,
> we may also hit the problem you describe below if we do so...
> 
> > And I still have theoretical concerns with __read_mostly.  As we
> > further sort the storage into read-mostly and write-often sections, the
> > density of writes in the write-often section increases.  IOW, removing
> > the read-mostly padding *increases* cross-CPU traffic in the write-often
> > section.  IOW2, leaving the read-mostly stuff where it is provides
> > beneficial padding to the write-often fields.  I don't think it has
> > been shown that there will be net gains.
> 
> Great explanation! The above actually nicely explains (maybe you didn't
> quite mean it that way ;)) why defining write-mostly section(s) is
> pointless. ;)
> 
> This is the main topic of this (http://markmail.org/thread/wl4lnjluroqxgabf)
> thread between Ingo and me.
> 
> However, there is a clear motivation to define read-mostly section(s),
> just the same way there is a motivation to put constants separately from
> non-constant variables, which I don't think anybody argues about. ;)
> 
> On the other hand, generally speaking, if we "complete the task" and put
> ALL read-mostly variables into a separate section, all the variables that
> are left will actually represent the write-mostly section, which we would
> prefer not to have (according to you). Yet we are still far from that
> today... ;)
> 
> Unfortunately, we can't consider all types of bad C code when we define
> something like "const" or "__read_mostly". We do our best. And if someone
> hasn't defined a per-CPU write-mostly variable in order to prevent heavy
> cross-CPU traffic in his/her code (like in your example above), we can
> only try to fix this code. But I don't think that the existence of such
> code should imply that the whole idea of a __read_mostly section is
> actually bad or useless. It's this bad C code that should be fixed, and
> IMHO padding the variables with constants is not the proper way to fix
> it...
> 
> That's why I think it could be dangerous to go ahead and patch all
> variables of a certain sort (like kmem_cache*'s) with __read_mostly, as
> we may mess things up in some places (like in your example above) where a
> deeper code analysis than just pattern matching is needed.
> 
> So, getting back to the first section of my reply, do you still think we
> want to patch all kmem_cache*'s with __read_mostly in one shot, or would
> we prefer this to be done incrementally in order to make any regressions
> easier to bisect?
> 
> Please comment.

Andrew, could you please advise what our next steps concerning this
patch should be?

thanks,
vlad

> 
> thanks in advance,
> vlad
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RESEND] fs: Move bh_cachep to the __read_mostly section
       [not found]       ` <CAK-9PRBiqtyiQahunPex9FT2dtKA0k0gv9PR+=sNCXVvn5Bn9Q@mail.gmail.com>
@ 2012-07-02 12:13         ` Vlad Zolotarov
  2012-07-02 16:00           ` Chinmay V S
  0 siblings, 1 reply; 13+ messages in thread
From: Vlad Zolotarov @ 2012-07-02 12:13 UTC (permalink / raw)
  To: Chinmay V S; +Cc: Andrew Morton, viro, linux-fsdevel, linux-kernel, Shai

On Sunday 01 July 2012 17:24:34 Chinmay V S wrote:
> Played around with __read_mostly some time back in an attempt to optimise
> cache usage.
> thecodeartist.blogspot.com/2011/12/why-readmostly-does-not-work-as-it.html

Nice article!

> 
> I concluded that in the current form, it is hard to limit the negative
> effects (bunching of write-mostly sections). IMHO, the potential
> costs/benefits of __read_mostly are closely related to the associativity of
> the cpu-cache. This differs vastly across different platforms. Hence (again
> IMHO), better to avoid usage of __read_mostly in cross-platform generic
> kernel code.

You see, if we assume that __read_mostly the way it's currently implemented is
a bad thing (due to the fact that it implicitly causes the bunching of the
write-mostly variables), then this means that "const" the way it's currently
implemented is a bad thing too, for the same reasons. ;)

I'm not saying that your reasoning is completely wrong, but it takes us in a
direction completely different from the one the current gcc tool chain follows
today. If I extrapolate your idea, it brings us to the following:
 - no const data section.
 - no read_mostly data section.
 - all const and non-const data reside in the same section, and the linker
tries to pack const and non-const variables together in order to minimize
false cache line sharing.

This is an interesting idea, however there is (at least) one weakness in it:
it assumes that the linker's heuristics (those that will pack const and
non-const variables together in a single cache line) will do a better job than
the person who writes a specific code section and knows the exact nature of,
and relationship between, the variables in his/her C code.

I think that the current gcc tool chain implementation and the kernel's linker
scripts follow the latter paradigm: they leave the programmer the freedom to
decide on the nature of the variables and the packing strategy he/she prefers,
and then the linker performs the most straightforward optimizations possible
on top of those decisions.
By decisions I mean the following (see the sketch after these lists):
 - defining variables as const.
 - defining variables as __read_mostly (which is quite similar to "const").
 - declaring per_cpu variables.
 - packing variables into structs and unions.
 - defining a proper alignment for the variables.
 - etc.

By linker optimizations I meant:
 - Packing all constants and __read_mostly variables into a separate data 
section.
 - Packing the per_cpu variables into a separate section.
 - Packing the variables according to the programmer's alignment definition.
 - etc.
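
(To make these decisions concrete, a small sketch with hypothetical
variables, one per annotation:)

	static const int foo_max = 128;			   /* never written */
	static struct kmem_cache *foo_cachep __read_mostly; /* written once */
	static DEFINE_PER_CPU(unsigned long, foo_count);    /* per-CPU */
	static atomic_t foo_waiters ____cacheline_aligned_in_smp;
							/* write-hot, padded */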

Let's get back to the theoretical example you are referring to, of code that
was hammered by the usage of __read_mostly variables.

First of all, let me note that saying that C code performance may benefit from
NOT using __read_mostly variables is a bit misleading, because here you rely
on something that is not deterministic: a linker decision to pack one (often
written) variable together with another (read-mostly) variable within a single
cache line's boundaries (thus improving the performance); and this decision
may change due to a minor code change, and you will not know about it.

So, if there is code whose performance may be harmed by defining one of its
variables as __read_mostly (thus moving it out of the data section where it
was into the read-mostly data section), then the same code may be harmed by
simply adding/removing a regular variable, which would lead to the same
phenomenon.

After saying all that, what should we conclude? Not to add/remove variables at
all, because doing so can harm the performance of badly formed code? How can
we write new code then?

I think you would agree that the right solution for the problem above would be
to fix the above-mentioned code so that it won't depend on the linker's
decisions (probably by properly aligning the highly contended variables, or by
regrouping the variables appropriately, spreading the contention pressure
between these groups). And if we do so (surprise!!!), its performance won't be
harmed by defining any read-mostly variable as __read_mostly any more... ;)

I agree that there might be a few places in the kernel that suffer from the
weakness described above. That's why it's important to add __read_mostly in
small portions (even one by one), in order to ease bisecting if and when a
performance regression occurs.

Please comment.

thanks,
vlad



> 
> regards
> ChinmayVS
> 
> On Sun, Jul 1, 2012 at 5:04 PM, Vlad Zolotarov <vlad@scalemp.com> wrote:
> > On Sunday 10 June 2012 12:36:52 Vlad Zolotarov wrote:
> > > On Thursday 07 June 2012 17:07:21 Andrew Morton wrote:
> > > > On Mon, 28 May 2012 14:58:42 +0300
> > > > Vlad Zolotarov <vlad@scalemp.com> wrote:
> > > > > From: Shai Fultheim <shai@scalemp.com>
> > > > > 
> > > > > bh_cachep is only written to once on initialization, so move it to
> > > > > the __read_mostly section.
> > > > > 
> > > > > Signed-off-by: Shai Fultheim <shai@scalemp.com>
> > > > > Signed-off-by: Vlad Zolotarov <vlad@scalemp.com>
> > > > > ---
> > > > > 
> > > > >  fs/buffer.c |    2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/fs/buffer.c b/fs/buffer.c
> > > > > index ad5938c..838a9cf 100644
> > > > > --- a/fs/buffer.c
> > > > > +++ b/fs/buffer.c
> > > > > @@ -3152,7 +3152,7 @@ SYSCALL_DEFINE2(bdflush, int, func, long, data)
> > > > >  /*
> > > > >   * Buffer-head allocation
> > > > >   */
> > > > > -static struct kmem_cache *bh_cachep;
> > > > > +static struct kmem_cache *bh_cachep __read_mostly;
> > > > 
> > > > hm, I thought I replied to this earlier, but I can't find that email.
> > > > 
> > > > Yes, bh_cachep is read-mostly.  In fact it's write-once.  But the same
> > > > is true of all kmem_cache*'s.  I don't see any particular reason for
> > > > singling out bh_cachep.
> > > > 
> > > > Alas, I don't see a smart way of addressing this.  It's either a
> > > > patchset which adds __read_mostly to all kmem_cache*'s, or a patchset
> > > > which converts all the definitions to use some nasty macro which
> > > > inserts the __read_mostly.
> > > 
> > > Well, it may be done. However, my ability to properly check it is
> > > limited as I have only a certain number of systems to check it on. I
> > > can create the patch, test it in our labs and post it on this mailing
> > > list with a request to test it on other platforms (like non-x86
> > > platforms). However, we may also hit the problem you describe below if
> > > we do so...
> > > 
> > > > And I still have theoretical concerns with __read_mostly.  As we
> > > > further sort the storage into read-mostly and write-often sections,
> > > > the density of writes in the write-often section increases.  IOW,
> > > > removing the read-mostly padding *increases* cross-CPU traffic in the
> > > > write-often section.  IOW2, leaving the read-mostly stuff where it is
> > > > provides beneficial padding to the write-often fields.  I don't think
> > > > it has been shown that there will be net gains.
> > > 
> > > Great explanation! The above actually nicely explains (maybe you
> > > didn't quite mean it that way ;)) why defining write-mostly section(s)
> > > is pointless. ;)
> > > 
> > > This is the main topic of this
> > > (http://markmail.org/thread/wl4lnjluroqxgabf) thread between Ingo and
> > > me.
> > > 
> > > However, there is a clear motivation to define read-mostly section(s),
> > > just the same way there is a motivation to put constants separately
> > > from non-constant variables, which I don't think anybody argues about.
> > > ;)
> > > 
> > > On the other hand, generally speaking, if we "complete the task" and
> > > put ALL read-mostly variables into a separate section, all the
> > > variables that are left will actually represent the write-mostly
> > > section, which we would prefer not to have (according to you). Yet we
> > > are still far from that today... ;)
> > > 
> > > Unfortunately, we can't consider all types of bad C code when we
> > > define something like "const" or "__read_mostly". We do our best. And
> > > if someone hasn't defined a per-CPU write-mostly variable in order to
> > > prevent heavy cross-CPU traffic in his/her code (like in your example
> > > above), we can only try to fix this code. But I don't think that the
> > > existence of such code should imply that the whole idea of a
> > > __read_mostly section is actually bad or useless. It's this bad C code
> > > that should be fixed, and IMHO padding the variables with constants is
> > > not the proper way to fix it...
> > > 
> > > That's why I think it could be dangerous to go ahead and patch all
> > > variables of a certain sort (like kmem_cache*'s) with __read_mostly,
> > > as we may mess things up in some places (like in your example above)
> > > where a deeper code analysis than just pattern matching is needed.
> > > 
> > > So, getting back to the first section of my reply, do you still think
> > > we want to patch all kmem_cache*'s with __read_mostly in one shot, or
> > > would we prefer this to be done incrementally in order to make any
> > > regressions easier to bisect?
> > > 
> > > Please comment.
> > 
> > Andrew, could you please advise what our next steps concerning this
> > patch should be?
> > 
> > thanks,
> > vlad
> > 
> > > thanks in advance,
> > > vlad

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RESEND] fs: Move bh_cachep to the __read_mostly section
  2012-07-02 12:13         ` Vlad Zolotarov
@ 2012-07-02 16:00           ` Chinmay V S
  2012-07-02 17:21             ` Vlad Zolotarov
  0 siblings, 1 reply; 13+ messages in thread
From: Chinmay V S @ 2012-07-02 16:00 UTC (permalink / raw)
  To: Vlad Zolotarov; +Cc: Andrew Morton, viro, linux-fsdevel, linux-kernel, Shai

>On Sunday 01 July 2012 17:24:34 Chinmay V S wrote:
>> Played around with __read_mostly some time back in an attempt to optimise
>> cache usage.
>> thecodeartist.blogspot.com/2011/12/why-readmostly-does-not-work-as-it.html
>
>Nice article!

Thank you! :)

>You see, if we assume that __read_mostly the way it's currently implemented
>is a bad thing (due to the fact that it implicitly causes the bunching of the
>write-mostly variables), then this means that "const" the way it's currently
>implemented is a bad thing too, for the same reasons. ;)

Every individual instance of __read_mostly may NOT degrade performance.
What *will* degrade the performance is "excessive" use of __read_mostly.
An interesting discussion on similar lines here[2].

>This is an interesting idea, however there is (at least) one weakness in it:
>it assumes that the linker's heuristics (those that will pack const and
>non-const variables together in a single cache line) will do a better job
>than the person who writes a specific code section and knows the exact
>nature of, and relationship between, the variables in his/her C code.

True.

>First of all, let me note that saying that C code performance may benefit
>from NOT using __read_mostly variables is a bit misleading, because here you
>rely on something that is not deterministic: a linker decision to pack one
>(often written) variable together with another (read-mostly) variable within
>a single cache line's boundaries (thus improving the performance); and this
>decision may change due to a minor code change, and you will not know about
>it.

I totally agree that avoiding use of __read_mostly does NOT guarantee any
performance boost. The point I am trying to make is this:

1. Consider code with NO instances of __read_mostly.

2. Now we go ahead and add __read_mostly to an object.
Note that we are NOT guaranteed that this object is "hot", i.e. accessed
frequently. All that __read_mostly signifies is that the object is rarely
written to, i.e. most of the time it is accessed, it is a read operation.

Cost-Benefit analysis:
Currently each CPU keeps its own copy of the __read_mostly variable in the
per-CPU L1 cache (any benefits on non-SMP systems?). As the variable is rarely
written to, rarely do we need to sync it across multiple L1 caches, i.e.
cacheline-bouncing is very rare.
So the cost is very low.

As the variable is maintained in L1 cache, rather than being shared across
multiple CPUs in L2 or L3 cache, the access is an order of magnitude faster.
Hence the benefit is very high.

3. We continue adding __read_mostly to other genuine read-mostly objects.

As we continue to increase the number of __read_mostly objects, they get moved
from bss to the .data..read_mostly section. This, IMHO, increases the chances
(as compared to earlier, without __read_mostly) that 2 objects in the bss
compete for the same cache-line. But this is NOT directly evident, as modern
cpu-caches are N-way associative, i.e. each object has a choice of N different
cache-slots. This tends to initially hide the effect of __read_mostly.
(This is the point I make in my article[1].)

After a few iterations of adding __read_mostly, the cache contention may
increase to more than N objects competing for the same cache-slot. False
cache-line sharing occurs, i.e. 2 or more objects keep replacing one another
in the cache-slot alternately, i.e. cache-thrashing begins.

Note that false cache-line sharing is NOT a one-time cost. Cache thrashing
will continue to happen until the context changes sufficiently for one of the
cache-slots to free up. Hence this scenario must be avoided at all costs.
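
(A worked example of the arithmetic involved, assuming a hypothetical 32 KiB,
2-way set-associative L1 with 64-byte lines: that is 32768 / (2 * 64) = 256
sets, so objects whose addresses are 256 * 64 = 16 KiB apart map to the same
set. Each set holds only 2 lines, so a third hot object mapping to the same
set forces constant evictions, i.e. the thrashing described above.)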

>I agree that there might be a few places in the kernel that suffer from the
>weakness described above. That's why it's important to add __read_mostly in
>small portions (even one by one), in order to ease bisecting if and when a
>performance regression occurs.

Exactly! So we can conclude that "excessive" use of __read_mostly must be
avoided. "Excessive" varies from system to system based on:
- Degree of SMP (no. of cores).
- Levels of cache (and the penalties associated with successive levels).
- Associativity of caches.

Without a proper understanding of these params, __read_mostly will be a
"mostly" hit-n-miss affair. In fact a quick grep shows ~1300 __read_mostly
instances scattered around the kernel code (3.4-rc1), which on certain systems
is already detrimental. Certain architectures (e.g. ARM) completely disable
__read_mostly, as it's evident from their 2-way associative cache that
cache-thrashing will occur so quickly that it voids any potential performance
gains.

[1] thecodeartist.blogspot.com/2011/12/why-readmostly-does-not-work-as-it.html

[2] fixunix.com/kernel/262711-rfc-remove-__read_mostly.html

regards
ChinmayVS

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RESEND] fs: Move bh_cachep to the __read_mostly section
  2012-07-02 16:00           ` Chinmay V S
@ 2012-07-02 17:21             ` Vlad Zolotarov
  2012-07-02 18:24               ` Chinmay V S
  0 siblings, 1 reply; 13+ messages in thread
From: Vlad Zolotarov @ 2012-07-02 17:21 UTC (permalink / raw)
  To: Chinmay V S; +Cc: Andrew Morton, viro, linux-fsdevel, linux-kernel, Shai

On Monday 02 July 2012 21:30:04 Chinmay V S wrote:
> >On Sunday 01 July 2012 17:24:34 Chinmay V S wrote:
> >> Played around with __read_mostly some time back in an attempt to optimise
> >> cache usage.
> >> thecodeartist.blogspot.com/2011/12/why-readmostly-does-not-work-as-it.html
> >
> >Nice article!
> 
> Thank you! :)
> 
> >You see, if we assume that __read_mostly the way it's currently
> >implemented is a bad thing (due to the fact that it implicitly causes the
> >bunching of the write-mostly variables), then this means that "const" the
> >way it's currently implemented is a bad thing too, for the same reasons. ;)
> 
> Every individual instance of __read_mostly may NOT degrade performance.
> What *will* degrade the performance is "excessive" use of __read_mostly.
> An interesting discussion on similar lines here[2].
> 
> >This is an interesting idea, however there is (at least) one weakness in
> >it: it assumes that the linker's heuristics (those that will pack const
> >and non-const variables together in a single cache line) will do a better
> >job than the person who writes a specific code section and knows the
> >exact nature of, and relationship between, the variables in his/her C
> >code.
> 
> True.
> 
> >First of all, let me note that saying that C code performance may benefit
> >from NOT using __read_mostly variables is a bit misleading, because here
> >you rely on something that is not deterministic: a linker decision to
> >pack one (often written) variable together with another (read-mostly)
> >variable within a single cache line's boundaries (thus improving the
> >performance); and this decision may change due to a minor code change,
> >and you will not know about it.
> 
> I totally agree that avoiding use of __read_mostly does NOT guarantee any
> performance boost. The point I am trying to make is this:
> 
> 1. Consider code with NO instances of __read_mostly.
> 
> 2. Now we go ahead and add __read_mostly to an object.
> Note that we are NOT guaranteed that this object is "hot", i.e. accessed
> frequently. All that __read_mostly signifies is that the object is rarely
> written to, i.e. most of the time it is accessed, it is a read operation.
> 
> Cost-Benefit analysis:
> Currently each CPU keeps its own copy of the __read_mostly variable in the
> per-CPU L1 cache (any benefits on non-SMP systems?).

I think it's clear that the whole __read_mostly infrastructure targets SMP
platforms. I suppose __read_mostly should be defined to nothing for non-SMP
kernels. I think Eric already suggested that somewhere in thread [2].
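
(A sketch of that suggestion; whether include/linux/cache.h is the right
place for it is an open question:)

	#ifdef CONFIG_SMP
	#define __read_mostly __attribute__((__section__(".data..read_mostly")))
	#else
	#define __read_mostly	/* no cross-CPU traffic to optimize for */
	#endif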

> As the variable is
> rarely written to, rarely do we need to sync it across multiple L1
> caches, i.e. cacheline-bouncing is very rare.
> So the cost is very low.
> 
> As the variable is maintained in L1 cache, rather than being shared across
> multiple CPUs in L2 or L3 cache, the access is an order of magnitude
> faster. Hence the benefit is very high.
> 
> 3. We continue adding __read_mostly to other genuine read-mostly objects.
> 
> As we continue to increase the number of __read_mostly objects, they get
> moved from bss to the .data..read_mostly section. This, IMHO, increases
> the chances (as compared to earlier, without __read_mostly) that 2 objects
> in the bss compete for the same cache-line. But this is NOT directly
> evident, as modern cpu-caches are N-way associative, i.e. each object has
> a choice of N different cache-slots. This tends to initially hide the
> effect of __read_mostly. (This is the point I make in my article[1].)
> 
> After a few iterations of adding __read_mostly, the cache contention may
> increase to more than N objects competing for the same cache-slot. False
> cache-line sharing occurs, i.e. 2 or more objects keep replacing one
> another in the cache-slot alternately, i.e. cache-thrashing begins.

I think your point is clear and has actually been nicely stated at
thecodeartist.blogspot.com/2011/12/why-readmostly-does-not-work-as-it.html
(by you? :))

In the previous e-mail I tried to explain that the root cause of the problem
you are describing above is not the __read_mostly infrastructure but the bad
C code these 2 variables are used in. The code that doesn't handle the SMP
case properly. The code that should be fixed. For instance, by defining these
two frequently written variables as ____cacheline_aligned, which would
separate them onto different cache lines.
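
(A minimal sketch of that fix, with hypothetical variable names:)

	/* force the two write-hot variables onto separate cache lines,
	 * so neither depends on accidental neighbours for padding: */
	static atomic_t foo_hits ____cacheline_aligned_in_smp;
	static atomic_t foo_misses ____cacheline_aligned_in_smp;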

Also note that in your reasoning above you are talking about chances, which in
particular means that the problem you have described may happen even without
the usage of __read_mostly qualifiers, simply because the code is bad. What
I'm proposing is to fix the code that leads to performance degradation, and
this will resolve the problem for good. And "for good" is very
deterministic... ;)

> 
> Note that false cache-line sharing is NOT a one-time cost. Cache thrashing
> will continue to happen until the context changes sufficiently for one of
> the cache-slots to free up. Hence this scenario must be avoided at all
> costs.
> 
> >I agree that there might be a few places in the kernel that suffer from
> >the weakness described above. That's why it's important to add
> >__read_mostly in small portions (even one by one), in order to ease
> >bisecting if and when a performance regression occurs.
> 
> Exactly! So we can conclude that "excessive" use of __read_mostly must be
> avoided. "Excessive" varies from system to system based on:
> - Degree of SMP (no. of cores).
> - Levels of cache (and the penalties associated with successive levels).
> - Associativity of caches.

The fact that this sort of problem in the code is hard to catch is by itself
very disturbing, and IMHO we should praise anything that helps us to find such
points in the code.

What I'm trying to say is that although your reasoning and understanding of
the problem are absolutely correct, the conclusions you draw are very
different from mine.

You say: we should be careful (because there is problematic code with
frequently written variables) and avoid the excessive usage of __read_mostly
qualifiers.
I say: we shall go and add __read_mostly wherever we see objects of this
nature, but we shall add them one by one, in order to be able to find the
specific variable responsible for a degradation in performance once it
occurs, and then find the frequently written objects that were actually
harmed and fix them appropriately.

> 
> Without a proper understanding of these params, __read_mostly will be a
> "mostly" hit-n-miss affair. In fact a quick grep shows ~1300 __read_mostly
> instances scattered around the kernel code (3.4-rc1), which on certain
> systems is already detrimental. Certain architectures (e.g. ARM) completely
> disable __read_mostly, as it's evident from their 2-way associative cache
> that cache-thrashing will occur so quickly that it voids any potential
> performance gains.

Again, I disagree with your perception of the problem: we shall say "Thank
you!" to the one who invented __read_mostly, because it helps us to clean up
our code!.. ;) Especially on ARM where, as you said, the cache is especially
sensitive to bad C code that doesn't properly handle frequently written
objects in an SMP environment.

To put what I've said above in a single sentence: if there is performance
degradation due to the usage of the __read_mostly qualifier, then there is a
general problem with the code that will eventually hit performance in the
future (when one adds/removes some variables).

So, IMHO, instead of blaming __read_mostly we'd better (use it and) fix the
code that is not written right (for SMP).

Thanks,
vlad

> 
> [1]
> thecodeartist.blogspot.com/2011/12/why-readmostly-does-not-work-as-it.html
> 
> [2] fixunix.com/kernel/262711-rfc-remove-__read_mostly.html
> 
> regards
> ChinmayVS

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RESEND] fs: Move bh_cachep to the __read_mostly section
  2012-07-02 17:21             ` Vlad Zolotarov
@ 2012-07-02 18:24               ` Chinmay V S
  2012-07-03  9:05                 ` Vlad Zolotarov
  2012-07-03 21:23                 ` Andrew Morton
  0 siblings, 2 replies; 13+ messages in thread
From: Chinmay V S @ 2012-07-02 18:24 UTC (permalink / raw)
  To: Vlad Zolotarov; +Cc: Andrew Morton, viro, linux-fsdevel, linux-kernel, Shai

Hi Vlad,

I suppose we both are looking at opposite sides of the same coin.
While I am wary of the potential pitfalls,
you have focused on the benefits of using __read_mostly.

At this point I would like to send out a shout to everyone concerned to please
clarify the status of __read_mostly:

1. Is it being actively pursued? (systems that *will* clearly benefit from it)
2. Any actual results on real-world use-cases? (w/ & w/o __read_mostly)
3. Is __read_mostly being accepted in non-architecture-specific kernel code?
4. Is anyone working on a potential replacement for it?

regards
ChinmayVS


>I think your point is clear and has actually been nicely stated at
>thecodeartist.blogspot.com/2011/12/why-readmostly-does-not-work-as-it.html
>(by you? :))

PS: Yes, the link I referred to *is* indeed my own article that I posted a few
months ago after my experiments with __read_mostly. I was initially excited
when I found this bit while digging through the code. But upon seeing that an
entire arch had disabled it, and then understanding why, I feel its usage
needs to be restricted to arch-specific code, and even then thoroughly tested
too. (Then again, I may be wrong.)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RESEND] fs: Move bh_cachep to the __read_mostly section
  2012-07-02 18:24               ` Chinmay V S
@ 2012-07-03  9:05                 ` Vlad Zolotarov
  2012-07-03 21:23                 ` Andrew Morton
  1 sibling, 0 replies; 13+ messages in thread
From: Vlad Zolotarov @ 2012-07-03  9:05 UTC (permalink / raw)
  To: Chinmay V S; +Cc: Andrew Morton, viro, linux-fsdevel, linux-kernel, Shai

On Monday 02 July 2012 23:54:09 Chinmay V S wrote:
> Hi Vlad,
> 
> I suppose we both are looking at opposite sides of the same coin.
> While i am wary of the potential pitfalls,
> you have focused on the benefits of using __read_mostly.

Exactly! I'm also afraid that you refuse to agree that those "pitfalls" you
are referring to are not caused by __read_mostly but by the bad code that HAS
to be fixed. And that this bad code remains bad whether you use __read_mostly
or not. I think this matter has been nicely covered in the article you've
referred to before:
thecodeartist.blogspot.com/2011/12/why-readmostly-does-not-work-as-it.html

So, again, my statements are as follows:
1) There are no "pitfalls" in the __read_mostly usage/infrastructure; there is
bad code that is revealed by the usage of __read_mostly.
2) The bad code mentioned above is bad and may regress the performance
regardless of the usage of __read_mostly, thus it must be fixed.
3) From (1) and (2) above it follows that __read_mostly is a tool that helps
to discover the bad code, thus it should be used as much as possible to do so.
4) To make the discovery of bad code easy, __read_mostly qualifiers should be
added one by one, despite the natural desire to replace all variables of a
certain type/nature (like "struct kmem_cache") in one shot.
5) __read_mostly as it's implemented today for the x86 arch is good for any
SMP architecture that has an L1 cache. More than that, the lower the
associativity of the L1 cache, the more that platform's performance is
susceptible to the bad code mentioned in (1) and (2). Therefore, according to
(3), these platforms should use __read_mostly both to gain from the
straightforward benefits of the __read_mostly mechanism and to locate the bad
code mentioned above and fix it. ARM is one such platform, and there must have
been a REALLY GOOD reason not to use it.

Please comment.

By the way, note the current implementation of __read_mostly in the Linus
tree:

------------------------------------------------------------------
arch/arm/include/asm/cache.h: line 26:

#define __read_mostly __attribute__((__section__(".data..read_mostly")))
-----------------------------------------------------------------
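
(And, if I read the generic linker script right, the section selected by this
attribute is collected roughly like this in
include/asm-generic/vmlinux.lds.h:)

	#define READ_MOSTLY_DATA(align)				\
		. = ALIGN(align);				\
		*(.data..read_mostly)				\
		. = ALIGN(align);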

It's been there since 2010-12-04, added by Russell King. Why did you think
it's defined to nothing? Please correct me if I'm missing anything.

 

> 
> At this point I would like to send out a shout to everyone concerned to
> please clarify the status of __read_mostly:
> 
> 1. Is it being actively pursued? (systems that *will* clearly benefit from
> it)
> 2. Any actual results on real-world use-cases? (w/ & w/o __read_mostly)
> 3. Is __read_mostly being accepted in non-architecture-specific kernel code?
> 4. Is anyone working on a potential replacement for it?

I think you'll find most of the answers at
thecodeartist.blogspot.com/2011/12/why-readmostly-does-not-work-as-it.html ;)

thanks,
vlad

> 
> regards
> ChinmayVS
> 
> >I think your point is clear and has actually been nicely stated at
> >thecodeartist.blogspot.com/2011/12/why-readmostly-does-not-work-as-it.html
> >(by you? :))
> 
> PS: Yes, the link I referred to *is* indeed my own article that I posted a
> few months ago after my experiments with __read_mostly. I was initially
> excited when I found this bit while digging through the code. But upon
> seeing that an entire arch had disabled it, and then understanding why, I
> feel its usage needs to be restricted to arch-specific code, and even then
> thoroughly tested too. (Then again, I may be wrong.)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RESEND] fs: Move bh_cachep to the __read_mostly section
  2012-05-28 11:58 [PATCH RESEND] fs: Move bh_cachep to the __read_mostly section Vlad Zolotarov
  2012-06-07  8:23 ` Vlad Zolotarov
  2012-06-08  0:07 ` Andrew Morton
@ 2012-07-03  9:08 ` Vlad Zolotarov
  2 siblings, 0 replies; 13+ messages in thread
From: Vlad Zolotarov @ 2012-07-03  9:08 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, linux-kernel, Shai

On Monday 28 May 2012 14:58:42 Vlad Zolotarov wrote:
> From: Shai Fultheim <shai@scalemp.com>
> 
> bh_cachep is only written to once on initialization, so move it to the
> __read_mostly section.
> 
> Signed-off-by: Shai Fultheim <shai@scalemp.com>
> Signed-off-by: Vlad Zolotarov <vlad@scalemp.com>
> ---
>  fs/buffer.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index ad5938c..838a9cf 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -3152,7 +3152,7 @@ SYSCALL_DEFINE2(bdflush, int, func, long, data)
>  /*
>   * Buffer-head allocation
>   */
> -static struct kmem_cache *bh_cachep;
> +static struct kmem_cache *bh_cachep __read_mostly;
> 
>  /*
>   * Once the number of bh's in the machine exceeds this level, we start

Al, my apologies, I missed that the patch was applied on May 15th. :)

thanks,
vlad

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RESEND] fs: Move bh_cachep to the __read_mostly section
  2012-07-02 18:24               ` Chinmay V S
  2012-07-03  9:05                 ` Vlad Zolotarov
@ 2012-07-03 21:23                 ` Andrew Morton
  2012-07-04  9:08                   ` Vlad Zolotarov
  1 sibling, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2012-07-03 21:23 UTC (permalink / raw)
  To: Chinmay V S; +Cc: Vlad Zolotarov, viro, linux-fsdevel, linux-kernel, Shai

On Mon, 2 Jul 2012 23:54:09 +0530
Chinmay V S <cvs268@gmail.com> wrote:

> Hi Vlad,
> 
> I suppose we both are looking at opposite sides of the same coin.
> While i am wary of the potential pitfalls,
> you have focused on the benefits of using __read_mostly.
> 
> At this point I would like to send out a shout to everyone concerned to
> please clarify the status of __read_mostly:
> 
> 1. Is it being actively pursued? (systems that *will* clearly benefit from it)
> 2. Any actual results on real-world use-cases? (w/ & w/o __read_mostly)
> 3. Is __read_mostly being accepted in non-architecture-specific kernel code?
> 4. Is anyone working on a potential replacement for it?

I don't recall ever having seen any serious work justifying or
condemning __read_mostly.  It's one of those things which seemed a good
idea at the time, got added and nobody did anything further with it, as
far as I know.  I've always been rather wobbly about it, for reasons
discussed up-thread.


As for this particular patch: I've not seen any reason to do anything
with it, because the changelog doesn't describe why the patch is
needed.  If some performance problem has been encountered then that
should have been fully described in the patch changelog.


If some problem has indeed been observed then an alternative to
__read_mostly is to pad bh_cachep to a cacheline boundary with
____cacheline_aligned_in_smp or similar.  Or, perhaps better, identify
the variable which bh_cachep is sharing with, and pad *that* variable
to a cacheline so it can't cause cache thrashing with something else in
the future.  And once this mystery variable is identified, we can
perhaps beneficially group it into the same cacheline with some other
variables which are related to it.
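
(Sketches of those two alternatives; the second variable name is
hypothetical, since the real neighbour hasn't been identified:)

	/* (a) pad bh_cachep itself out to a cache line boundary: */
	static struct kmem_cache *bh_cachep ____cacheline_aligned_in_smp;

	/* (b) or, better, pad the write-hot neighbour once identified: */
	static atomic_t mystery_hot_counter ____cacheline_aligned_in_smp;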

But I'm madly guessing and can't say anything dispositive or even
useful, because we haven't been told what the problem was.  Still.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RESEND] fs: Move bh_cachep to the __read_mostly section
  2012-07-03 21:23                 ` Andrew Morton
@ 2012-07-04  9:08                   ` Vlad Zolotarov
  0 siblings, 0 replies; 13+ messages in thread
From: Vlad Zolotarov @ 2012-07-04  9:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Chinmay V S, viro, linux-fsdevel, linux-kernel, Shai

On Tuesday 03 July 2012 14:23:48 Andrew Morton wrote:
> On Mon, 2 Jul 2012 23:54:09 +0530
> 
> Chinmay V S <cvs268@gmail.com> wrote:
> > Hi Vlad,
> > 
> > I suppose we both are looking at opposite sides of the same coin.
> > While i am wary of the potential pitfalls,
> > you have focused on the benefits of using __read_mostly.
> > 
> > At this point I would like to send out a shout to everyone concerned to
> > please clarify the status of __read_mostly:
> > 
> > 1. Is it being actively pursued? (systems that *will* clearly benefit
> > from it)
> > 2. Any actual results on real-world use-cases? (w/ & w/o __read_mostly)
> > 3. Is __read_mostly being accepted in non-architecture-specific kernel
> > code?
> > 4. Is anyone working on a potential replacement for it?
> 

Andrew, thanks for joining! :)

> I don't recall ever having seen any serious work justifying or
> condemning __read_mostly.  

Back in the thread I've pointed out that __read_mostly variables (are meant
to) have actually the same nature as constants, except for the fact that they
are initialized with some value that may change from one boot (of the kernel)
to another: e.g. a kmalloc() return value, values initialized in __init
functions, etc.
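
(A sketch of the life cycle described above, with hypothetical names: the
variable is written exactly once at boot and is read-only ever after, yet the
written value differs from boot to boot:)

	struct foo { int x; };
	static struct kmem_cache *foo_cachep __read_mostly;

	static int __init foo_init(void)
	{
		/* the only write this variable ever sees: */
		foo_cachep = kmem_cache_create("foo_cache",
					       sizeof(struct foo), 0, 0, NULL);
		return foo_cachep ? 0 : -ENOMEM;
	}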

Have you ever seen any serious work justifying or condemning the usage of the
"const" qualifier?
If you did (could you please point us to it?), the same reasoning holds in the
context of __read_mostly.
If not, do you think somebody should perform such work? Or maybe we all
understand that if there is a constant in my code I should define it as const?
And if we all agree on const, why are we still arguing about __read_mostly,
which is actually the same?

> It's one of those things which seemed a good idea at the time, got added and 
> nobody did anything further with it, as far as I know.  

Hmmm... ~1300 __read_mostly appearances in the kernel don't sound like
nothing to me... ;) People use it! ;)


> I've always been rather wobbly about it, for reasons discussed up-thread.

Could you please comment on the following statements, which conclude this
thread:

1) There are no "pitfalls" in the __read_mostly usage/infrastructure; there is
bad code that is revealed by the usage of __read_mostly.
2) The bad code mentioned above is bad and may regress the performance
regardless of the usage of __read_mostly, thus it must be fixed.
3) From (1) and (2) above it follows that __read_mostly is a tool that helps
to discover the bad code, thus it should be used as much as possible to do so.
4) To make the discovery of bad code easy, __read_mostly qualifiers should be
added one by one, despite the natural desire to replace all variables of a
certain type/nature (like "struct kmem_cache") in one shot.
5) __read_mostly as it's implemented today for the x86 arch is good for any
SMP architecture that has an L1 cache. More than that, the lower the
associativity of the L1 cache, the more that platform's performance is
susceptible to the bad code mentioned in (1) and (2). Therefore, according to
(3), these platforms should use __read_mostly both to gain from the
straightforward benefits of the __read_mostly mechanism and to locate the bad
code mentioned above and fix it. ARM is one such platform, and there must have
been a REALLY GOOD reason not to use it.

> 
> 
> If some problem has indeed been observed then an alternative to
> __read_mostly is to pad bh_cachep to a cacheline boundary with
> ____cacheline_aligned_in_smp or similar.  Or, perhaps better, identify
> the variable which bh_cachep is sharing with, and pad *that* variable
> to a cacheline so it can't cause cache thrashing with something else in
> the future.  And once this mystery variable is identified, we can
> perhaps beneficially group it into the same cacheline with some other
> variables which are related to it.

Again, I don't understand it: we are talking about a constant here. Why pad
it, or do anything else, instead of simply defining it as a constant? The one
who originally wrote this line (with bh_cachep) either missed that fact, or
the __read_mostly infrastructure didn't exist at that time. Either way, this
patch fixes it. What else needs to be said? Why do we have to justify
declaring a const object as const?

Thanks in advance for your comments.
vlad

> 
> But I'm madly guessing and can't say anything dispositive or even
> useful, because we haven't been told what the problem was.  Still.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2012-07-04  9:08 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-28 11:58 [PATCH RESEND] fs: Move bh_cachep to the __read_mostly section Vlad Zolotarov
2012-06-07  8:23 ` Vlad Zolotarov
2012-06-08  0:07 ` Andrew Morton
2012-06-10  9:36   ` Vlad Zolotarov
2012-07-01 11:34     ` Vlad Zolotarov
     [not found]       ` <CAK-9PRBiqtyiQahunPex9FT2dtKA0k0gv9PR+=sNCXVvn5Bn9Q@mail.gmail.com>
2012-07-02 12:13         ` Vlad Zolotarov
2012-07-02 16:00           ` Chinmay V S
2012-07-02 17:21             ` Vlad Zolotarov
2012-07-02 18:24               ` Chinmay V S
2012-07-03  9:05                 ` Vlad Zolotarov
2012-07-03 21:23                 ` Andrew Morton
2012-07-04  9:08                   ` Vlad Zolotarov
2012-07-03  9:08 ` Vlad Zolotarov
