netdev.vger.kernel.org archive mirror
* [PATCH] x86: percpu_to_op() misses memory and flags clobbers
@ 2009-04-01  8:13 Eric Dumazet
  2009-04-01  9:02 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2009-04-01  8:13 UTC (permalink / raw)
  To: Ingo Molnar, Tejun Heo; +Cc: linux kernel, Linux Netdev List, Joe Perches

While playing with the new percpu_{read|write|add|sub} stuff in the network tree,
I found the x86 asm was a little bit optimistic.

We need to tell gcc that percpu_{write|add|sub|or|xor} are modifying
memory and possibly eflags. We could add another parameter to percpu_to_op()
to separate the plain "mov" case (which does not change eflags),
but let's keep it simple for the moment.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index aee103b..fd4f8ec 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -82,22 +82,26 @@ do {							\
 	case 1:						\
 		asm(op "b %1,"__percpu_arg(0)		\
 		    : "+m" (var)			\
-		    : "ri" ((T__)val));			\
+		    : "ri" ((T__)val)			\
+		    : "memory", "cc");			\
 		break;					\
 	case 2:						\
 		asm(op "w %1,"__percpu_arg(0)		\
 		    : "+m" (var)			\
-		    : "ri" ((T__)val));			\
+		    : "ri" ((T__)val)			\
+		    : "memory", "cc");			\
 		break;					\
 	case 4:						\
 		asm(op "l %1,"__percpu_arg(0)		\
 		    : "+m" (var)			\
-		    : "ri" ((T__)val));			\
+		    : "ri" ((T__)val)			\
+		    : "memory", "cc");			\
 		break;					\
 	case 8:						\
 		asm(op "q %1,"__percpu_arg(0)		\
 		    : "+m" (var)			\
-		    : "re" ((T__)val));			\
+		    : "re" ((T__)val)			\
+		    : "memory", "cc");			\
 		break;					\
 	default: __bad_percpu_size();			\
 	}						\

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] x86: percpu_to_op() misses memory and flags clobbers
  2009-04-01  8:13 [PATCH] x86: percpu_to_op() misses memory and flags clobbers Eric Dumazet
@ 2009-04-01  9:02 ` Jeremy Fitzhardinge
  2009-04-01 10:14   ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2009-04-01  9:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ingo Molnar, Tejun Heo, linux kernel, Linux Netdev List,
	Joe Perches

Eric Dumazet wrote:
> While playing with the new percpu_{read|write|add|sub} stuff in the network tree,
> I found the x86 asm was a little bit optimistic.
>
> We need to tell gcc that percpu_{write|add|sub|or|xor} are modifying
> memory and possibly eflags. We could add another parameter to percpu_to_op()
> to separate the plain "mov" case (which does not change eflags),
> but let's keep it simple for the moment.
>   

Did you observe an actual failure that this patch fixed?

> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index aee103b..fd4f8ec 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -82,22 +82,26 @@ do {							\
>  	case 1:						\
>  		asm(op "b %1,"__percpu_arg(0)		\
>  		    : "+m" (var)			\
> -		    : "ri" ((T__)val));			\
> +		    : "ri" ((T__)val)			\
> +		    : "memory", "cc");			\
>   

This shouldn't be necessary.   The "+m" already tells gcc that var is a 
memory input and output, and there are no other memory side-effects 
which it needs to be aware of; clobbering "memory" will force gcc to 
reload all register-cached memory, which is a pretty hard hit.  I think 
all asms implicitly clobber "cc", so that shouldn't have any effect, but 
it does no harm.

Now, it's true that the asm isn't actually modifying var itself, but 
%gs:var, which is a different location.  But from gcc's perspective that 
shouldn't matter, because var makes a perfectly good proxy for that 
location, and gcc will make sure it correctly orders all accesses to var.

I'd be surprised if this were broken, because we'd be seeing all sorts 
of strange crashes all over the place.  We've seen it before when the 
old x86-64 pda code didn't have proper constraints on its asm statements.

    J


* Re: [PATCH] x86: percpu_to_op() misses memory and flags clobbers
  2009-04-01  9:02 ` Jeremy Fitzhardinge
@ 2009-04-01 10:14   ` Eric Dumazet
  2009-04-01 16:12     ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2009-04-01 10:14 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, Tejun Heo, linux kernel, Linux Netdev List,
	Joe Perches

Jeremy Fitzhardinge wrote:
> Eric Dumazet wrote:
>> While playing with the new percpu_{read|write|add|sub} stuff in the network tree,
>> I found the x86 asm was a little bit optimistic.
>>
>> We need to tell gcc that percpu_{write|add|sub|or|xor} are modifying
>> memory and possibly eflags. We could add another parameter to
>> percpu_to_op()
>> to separate the plain "mov" case (which does not change eflags),
>> but let's keep it simple for the moment.
>>   
> 
> Did you observe an actual failure that this patch fixed?
> 

Not in the current tree, as we don't use percpu_xxxx() very much yet.

If deployed for SNMP mibs with hundreds of call sites,
can you guarantee it will work as is?

>> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
>>
>> diff --git a/arch/x86/include/asm/percpu.h
>> b/arch/x86/include/asm/percpu.h
>> index aee103b..fd4f8ec 100644
>> --- a/arch/x86/include/asm/percpu.h
>> +++ b/arch/x86/include/asm/percpu.h
>> @@ -82,22 +82,26 @@ do {                            \
>>      case 1:                        \
>>          asm(op "b %1,"__percpu_arg(0)        \
>>              : "+m" (var)            \
>> -            : "ri" ((T__)val));            \
>> +            : "ri" ((T__)val)            \
>> +            : "memory", "cc");            \
>>   
> 
> This shouldn't be necessary.   The "+m" already tells gcc that var is a
> memory input and output, and there are no other memory side-effects
> which it needs to be aware of; clobbering "memory" will force gcc to
> reload all register-cached memory, which is a pretty hard hit.  I think
> all asms implicitly clobber "cc", so that shouldn't have any effect, but
> it does no harm.


So, we can probably clean up many asms in the tree :)

static inline void __down_read(struct rw_semaphore *sem)
{
        asm volatile("# beginning down_read\n\t"
                     LOCK_PREFIX "  incl      (%%eax)\n\t"
                     /* adds 0x00000001, returns the old value */
                     "  jns        1f\n"
                     "  call call_rwsem_down_read_failed\n"
                     "1:\n\t"
                     "# ending down_read\n\t"
                     : "+m" (sem->count)
                     : "a" (sem)
                     : "memory", "cc");
}




> 
> Now, it's true that the asm isn't actually modifying var itself, but
> %gs:var, which is a different location.  But from gcc's perspective that
> shouldn't matter, because var makes a perfectly good proxy for that
> location, and gcc will make sure it correctly orders all accesses to var.
> 
> I'd be surprised if this were broken, because we'd be seeing all sorts
> of strange crashes all over the place.  We've seen it before when the
> old x86-64 pda code didn't have proper constraints on its asm statements.

I was not saying it is broken, but a "little bit optimistic" :)

Better safe than sorry, because those errors are very hard to track, since
they depend a lot on how aggressive gcc is. I don't have time to test
all gcc versions out there.




* Re: [PATCH] x86: percpu_to_op() misses memory and flags clobbers
  2009-04-01 10:14   ` Eric Dumazet
@ 2009-04-01 16:12     ` Ingo Molnar
  2009-04-01 16:41       ` Jeremy Fitzhardinge
  2009-04-01 17:13       ` Eric Dumazet
  0 siblings, 2 replies; 20+ messages in thread
From: Ingo Molnar @ 2009-04-01 16:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jeremy Fitzhardinge, Tejun Heo, linux kernel, Linux Netdev List,
	Joe Perches, Rusty Russell


* Eric Dumazet <dada1@cosmosbay.com> wrote:

> Jeremy Fitzhardinge wrote:
> > Eric Dumazet wrote:
> >> While playing with the new percpu_{read|write|add|sub} stuff in the network tree,
> >> I found the x86 asm was a little bit optimistic.
> >>
> >> We need to tell gcc that percpu_{write|add|sub|or|xor} are modifying
> >> memory and possibly eflags. We could add another parameter to
> >> percpu_to_op()
> >> to separate the plain "mov" case (which does not change eflags),
> >> but let's keep it simple for the moment.
> >>   
> > 
> > Did you observe an actual failure that this patch fixed?
> > 
> 
> Not in the current tree, as we don't use percpu_xxxx() very much yet.
> 
> If deployed for SNMP mibs with hundreds of call sites,
> can you guarantee it will work as is?

Do we "guarantee" it for you? No.

Is it expected to work just fine? Yes.

Are there any known bugs in this area? No.

Will we fix it if it's demonstrated to be broken? Of course! :-)

[ Btw., it's definitely cool that you will make heavy use of it for 
  SNMP mib statistics - please share with us your experiences with 
  the facilities - good or bad experiences alike! ]

> >> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> >>
> >> diff --git a/arch/x86/include/asm/percpu.h
> >> b/arch/x86/include/asm/percpu.h
> >> index aee103b..fd4f8ec 100644
> >> --- a/arch/x86/include/asm/percpu.h
> >> +++ b/arch/x86/include/asm/percpu.h
> >> @@ -82,22 +82,26 @@ do {                            \
> >>      case 1:                        \
> >>          asm(op "b %1,"__percpu_arg(0)        \
> >>              : "+m" (var)            \
> >> -            : "ri" ((T__)val));            \
> >> +            : "ri" ((T__)val)            \
> >> +            : "memory", "cc");            \
> >>   
> > 
> > This shouldn't be necessary.   The "+m" already tells gcc that var is a
> > memory input and output, and there are no other memory side-effects
> > which it needs to be aware of; clobbering "memory" will force gcc to
> > reload all register-cached memory, which is a pretty hard hit.  I think
> > all asms implicitly clobber "cc", so that shouldn't have any effect, but
> > it does no harm.
> 
> 
> So, we can probably cleanup many asms in tree :)
> 
> static inline void __down_read(struct rw_semaphore *sem)
> {
>         asm volatile("# beginning down_read\n\t"
>                      LOCK_PREFIX "  incl      (%%eax)\n\t"
>                      /* adds 0x00000001, returns the old value */
>                      "  jns        1f\n"
>                      "  call call_rwsem_down_read_failed\n"
>                      "1:\n\t"
>                      "# ending down_read\n\t"
>                      : "+m" (sem->count)
>                      : "a" (sem)
>                      : "memory", "cc");
> }

Hm, what's your point with pasting this inline function?

> > Now, it's true that the asm isn't actually modifying var itself, but
> > %gs:var, which is a different location.  But from gcc's perspective that
> > shouldn't matter, because var makes a perfectly good proxy for that
> > location, and gcc will make sure it correctly orders all accesses to var.
> > 
> > I'd be surprised if this were broken, because we'd be seeing all sorts
> > of strange crashes all over the place.  We've seen it before when the
> > old x86-64 pda code didn't have proper constraints on its asm statements.
> 
> I was not saying it is broken, but a "little bit optimistic" :)
> 
> Better safe than sorry, because those errors are very hard to 
> track, since they depend a lot on how aggressive gcc is. I 
> don't have time to test all gcc versions out there.

Well, Jeremy has already made the valid point that your patch 
pessimises the constraints and hence likely causes worse code.

We can only apply assembly constraint patches that:

    either fix a demonstrated bug,

 or improve (speed up) the code emitted,

 or very rarely, we will apply patches that don't actually make the
    code worse (they are an invariant) but are perceived to be safer

This patch meets none of these tests, and in fact it will 
probably make the generated code worse.

	Ingo


* Re: [PATCH] x86: percpu_to_op() misses memory and flags clobbers
  2009-04-01 16:12     ` Ingo Molnar
@ 2009-04-01 16:41       ` Jeremy Fitzhardinge
  2009-04-01 16:44         ` Ingo Molnar
  2009-04-01 17:13       ` Eric Dumazet
  1 sibling, 1 reply; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2009-04-01 16:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, Tejun Heo, linux kernel, Linux Netdev List,
	Joe Perches, Rusty Russell

Ingo Molnar wrote:
>>                      : "memory", "cc");
>> }
>>     
>
> Hm, what's your point with pasting this inline function?
>   

He's pointing out the redundant (but harmless) "cc" clobber.

    J


* Re: [PATCH] x86: percpu_to_op() misses memory and flags clobbers
  2009-04-01 16:41       ` Jeremy Fitzhardinge
@ 2009-04-01 16:44         ` Ingo Molnar
  0 siblings, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2009-04-01 16:44 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Eric Dumazet, Tejun Heo, linux kernel, Linux Netdev List,
	Joe Perches, Rusty Russell


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Ingo Molnar wrote:
>>>                      : "memory", "cc");
>>> }
>>>     
>>
>> Hm, what's your point with pasting this inline function?
>>   
>
> He's pointing out the redundant (but harmless) "cc" clobber.

ah, yes. We are completely inconsistent about that. It doesn't
matter on x86 so I guess it could be removed everywhere.

	Ingo


* Re: [PATCH] x86: percpu_to_op() misses memory and flags clobbers
  2009-04-01 16:12     ` Ingo Molnar
  2009-04-01 16:41       ` Jeremy Fitzhardinge
@ 2009-04-01 17:13       ` Eric Dumazet
  2009-04-01 18:07         ` Jeremy Fitzhardinge
  2009-04-01 18:44         ` [RFC] percpu: convert SNMP mibs to new infra Eric Dumazet
  1 sibling, 2 replies; 20+ messages in thread
From: Eric Dumazet @ 2009-04-01 17:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, Tejun Heo, linux kernel, Linux Netdev List,
	Joe Perches, Rusty Russell

Ingo Molnar wrote:
> * Eric Dumazet <dada1@cosmosbay.com> wrote:
> 
>> Jeremy Fitzhardinge wrote:
>>> Eric Dumazet wrote:
>>>> While playing with the new percpu_{read|write|add|sub} stuff in the network tree,
>>>> I found the x86 asm was a little bit optimistic.
>>>>
>>>> We need to tell gcc that percpu_{write|add|sub|or|xor} are modifying
>>>> memory and possibly eflags. We could add another parameter to
>>>> percpu_to_op()
>>>> to separate the plain "mov" case (which does not change eflags),
>>>> but let's keep it simple for the moment.
>>>>   
>>> Did you observe an actual failure that this patch fixed?
>>>
>> Not in the current tree, as we don't use percpu_xxxx() very much yet.
>>
>> If deployed for SNMP mibs with hundreds of call sites,
>> can you guarantee it will work as is?
> 
> Do we "guarantee" it for you? No.
> 
> Is it expected to work just fine? Yes.
> 
> Are there any known bugs in this area? No.

Good to know. So I shut up. I am a jerk and should blindly trust
linux kernel, sorry.

> 
> Will we fix it if it's demonstrated to be broken? Of course! :-)
> 
> [ Btw., it's definitely cool that you will make heavy use of it for 
>   SNMP mib statistics - please share with us your experiences with 
>   the facilities - good or bad experiences alike! ]

I tried, but I miss some kind of indirect percpu_add() function,

because with net namespaces mibs are dynamically allocated, and the
current percpu_add() works on static percpu variables only (because of
the added per_cpu__ prefix):

#define percpu_add(var, val)   percpu_to_op("add", per_cpu__##var, val)

I tried adding:

#define dyn_percpu_add(var, val)   percpu_to_op("add", var, val)

But I don't know if this is the plan. Should we get rid of the
"per_cpu__" prefix and use a special ELF section/marker instead?

I have a patch to add percpu_inc() and percpu_dec(), but I am not
sure it's worth it...

[PATCH] percpu: Adds percpu_inc() and percpu_dec()

Increments and decrements are quite common operations for SNMP mibs.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index aee103b..248be11 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -103,6 +103,29 @@ do {							\
 	}						\
 } while (0)
 
+#define percpu_to_op0(op, var)				\
+do {							\
+	switch (sizeof(var)) {				\
+	case 1:						\
+		asm(op "b "__percpu_arg(0)		\
+		    : "+m" (var));			\
+		break;					\
+	case 2:						\
+		asm(op "w "__percpu_arg(0)		\
+		    : "+m" (var));			\
+		break;					\
+	case 4:						\
+		asm(op "l "__percpu_arg(0)		\
+		    : "+m" (var));			\
+		break;					\
+	case 8:						\
+		asm(op "q "__percpu_arg(0)		\
+		    : "+m" (var));			\
+		break;					\
+	default: __bad_percpu_size();			\
+	}						\
+} while (0)
+
 #define percpu_from_op(op, var)				\
 ({							\
 	typeof(var) ret__;				\
@@ -139,6 +162,8 @@ do {							\
 #define percpu_and(var, val)	percpu_to_op("and", per_cpu__##var, val)
 #define percpu_or(var, val)	percpu_to_op("or", per_cpu__##var, val)
 #define percpu_xor(var, val)	percpu_to_op("xor", per_cpu__##var, val)
+#define percpu_inc(var)		percpu_to_op0("inc", per_cpu__##var)
+#define percpu_dec(var)		percpu_to_op0("dec", per_cpu__##var)
 
 /* This is not atomic against other CPUs -- CPU preemption needs to be off */
 #define x86_test_and_clear_bit_percpu(bit, var)				\
diff --git a/include/asm-generic/percpu.h b/include/asm-generic/percpu.h
index 00f45ff..c57357e 100644
--- a/include/asm-generic/percpu.h
+++ b/include/asm-generic/percpu.h
@@ -120,6 +120,14 @@ do {									\
 # define percpu_sub(var, val)		__percpu_generic_to_op(var, (val), -=)
 #endif
 
+#ifndef percpu_inc
+# define percpu_inc(var)			do { percpu_add(var, 1); } while (0)
+#endif
+
+#ifndef percpu_dec
+# define percpu_dec(var)			do { percpu_sub(var, 1); } while (0)
+#endif
+
 #ifndef percpu_and
 # define percpu_and(var, val)		__percpu_generic_to_op(var, (val), &=)
 #endif


* Re: [PATCH] x86: percpu_to_op() misses memory and flags clobbers
  2009-04-01 17:13       ` Eric Dumazet
@ 2009-04-01 18:07         ` Jeremy Fitzhardinge
  2009-04-01 18:47           ` Eric Dumazet
  2009-04-02  9:52           ` Herbert Xu
  2009-04-01 18:44         ` [RFC] percpu: convert SNMP mibs to new infra Eric Dumazet
  1 sibling, 2 replies; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2009-04-01 18:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ingo Molnar, Tejun Heo, linux kernel, Linux Netdev List,
	Joe Perches, Rusty Russell

Eric Dumazet wrote:
> +#define percpu_inc(var)		percpu_to_op0("inc", per_cpu__##var)
> +#define percpu_dec(var)		percpu_to_op0("dec", per_cpu__##var)
>   

There's probably not a lot of value in this.  The Intel and AMD 
optimisation guides tend to deprecate inc/dec in favour of 
add/sub, because the former can cause pipeline stalls due to their 
partial flags update.

    J


* [RFC] percpu: convert SNMP mibs to new infra
  2009-04-01 17:13       ` Eric Dumazet
  2009-04-01 18:07         ` Jeremy Fitzhardinge
@ 2009-04-01 18:44         ` Eric Dumazet
  2009-04-02  0:13           ` Tejun Heo
  2009-04-02  5:04           ` [RFC] " Rusty Russell
  1 sibling, 2 replies; 20+ messages in thread
From: Eric Dumazet @ 2009-04-01 18:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, Tejun Heo, linux kernel, Linux Netdev List,
	Joe Perches, Rusty Russell

Eric Dumazet wrote:
> Ingo Molnar wrote:
>>
>> [ Btw., it's definitely cool that you will make heavy use of it for 
>>   SNMP mib statistics - please share with us your experiences with 
>>   the facilities - good or bad experiences alike! ]
> 
> I tried, but I miss some kind of indirect percpu_add() function,
> 
> because with net namespaces mibs are dynamically allocated, and the
> current percpu_add() works on static percpu variables only (because of
> the added per_cpu__ prefix):
> 
> #define percpu_add(var, val)   percpu_to_op("add", per_cpu__##var, val)
> 
> I tried adding:
> 
> #define dyn_percpu_add(var, val)   percpu_to_op("add", var, val)
> 
> But I don't know if this is the plan. Should we get rid of the
> "per_cpu__" prefix and use a special ELF section/marker instead?
> 

Here is a preliminary patch for SNMP mibs that seems to work well on x86_32

[RFC] percpu: convert SNMP mibs to new infra

Some arches can use the percpu infrastructure for safe changes to mibs
(percpu_add() is safe against preemption and interrupts), but
we want the real thing (a single instruction), not an emulation.

On arches still using an emulation, it's better to keep the two views
per mib and the preemption disable/enable.

This shrinks the size of mibs by 50%, and also shrinks the vmlinux text
size (minimal IPV4 config):

$ size vmlinux.old vmlinux.new
   text    data     bss     dec     hex filename
4308458  561092 1728512 6598062  64adae vmlinux.old
4303834  561092 1728512 6593438  649b9e vmlinux.new



Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
 arch/x86/include/asm/percpu.h |    3 +++
 include/net/snmp.h            |   27 ++++++++++++++++++++++-----
 net/ipv4/af_inet.c            |   28 +++++++++++++++++++---------
 3 files changed, 44 insertions(+), 14 deletions(-)


diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index aee103b..6b82f6b 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -135,6 +135,9 @@ do {							\
 #define percpu_read(var)	percpu_from_op("mov", per_cpu__##var)
 #define percpu_write(var, val)	percpu_to_op("mov", per_cpu__##var, val)
 #define percpu_add(var, val)	percpu_to_op("add", per_cpu__##var, val)
+#define indir_percpu_add(var, val)	percpu_to_op("add", *(var), val)
+#define indir_percpu_inc(var)       percpu_to_op("add", *(var), 1)
+#define indir_percpu_dec(var)       percpu_to_op("add", *(var), -1)
 #define percpu_sub(var, val)	percpu_to_op("sub", per_cpu__##var, val)
 #define percpu_and(var, val)	percpu_to_op("and", per_cpu__##var, val)
 #define percpu_or(var, val)	percpu_to_op("or", per_cpu__##var, val)
diff --git a/include/net/snmp.h b/include/net/snmp.h
index 57c9362..ef9ed31 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -123,15 +123,31 @@ struct linux_xfrm_mib {
 };
 
 /* 
- * FIXME: On x86 and some other CPUs the split into user and softirq parts
+ * On x86 and some other CPUs the split into user and softirq parts
  * is not needed because addl $1,memory is atomic against interrupts (but 
- * atomic_inc would be overkill because of the lock cycles). Wants new 
- * nonlocked_atomic_inc() primitives -AK
+ * atomic_inc would be overkill because of the lock cycles).
  */ 
+#ifdef CONFIG_X86
+# define SNMP_ARRAY_SZ 1
+#else
+# define SNMP_ARRAY_SZ 2
+#endif
+
 #define DEFINE_SNMP_STAT(type, name)	\
-	__typeof__(type) *name[2]
+	__typeof__(type) *name[SNMP_ARRAY_SZ]
 #define DECLARE_SNMP_STAT(type, name)	\
-	extern __typeof__(type) *name[2]
+	extern __typeof__(type) *name[SNMP_ARRAY_SZ]
+
+#if SNMP_ARRAY_SZ == 1
+#define SNMP_INC_STATS(mib, field)	indir_percpu_inc(&mib[0]->mibs[field])
+#define SNMP_INC_STATS_BH(mib, field)	SNMP_INC_STATS(mib, field)
+#define SNMP_INC_STATS_USER(mib, field) SNMP_INC_STATS(mib, field)
+#define SNMP_DEC_STATS(mib, field)	indir_percpu_dec(&mib[0]->mibs[field])
+#define SNMP_ADD_STATS_BH(mib, field, addend) 	\
+				indir_percpu_add(&mib[0]->mibs[field], addend)
+#define SNMP_ADD_STATS_USER(mib, field, addend) 	\
+				indir_percpu_add(&mib[0]->mibs[field], addend)
+#else
 
 #define SNMP_STAT_BHPTR(name)	(name[0])
 #define SNMP_STAT_USRPTR(name)	(name[1])
@@ -160,5 +176,6 @@ struct linux_xfrm_mib {
 		per_cpu_ptr(mib[1], get_cpu())->mibs[field] += addend; \
 		put_cpu(); \
 	} while (0)
+#endif
 
 #endif
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 7f03373..badb568 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1366,27 +1366,37 @@ unsigned long snmp_fold_field(void *mib[], int offt)
 
 	for_each_possible_cpu(i) {
 		res += *(((unsigned long *) per_cpu_ptr(mib[0], i)) + offt);
+#if SNMP_ARRAY_SZ == 2
 		res += *(((unsigned long *) per_cpu_ptr(mib[1], i)) + offt);
+#endif
 	}
 	return res;
 }
 EXPORT_SYMBOL_GPL(snmp_fold_field);
 
-int snmp_mib_init(void *ptr[2], size_t mibsize)
+int snmp_mib_init(void *ptr[SNMP_ARRAY_SZ], size_t mibsize)
 {
 	BUG_ON(ptr == NULL);
 	ptr[0] = __alloc_percpu(mibsize, __alignof__(unsigned long long));
 	if (!ptr[0])
-		goto err0;
+		return -ENOMEM;
+#if SNMP_ARRAY_SZ == 2
 	ptr[1] = __alloc_percpu(mibsize, __alignof__(unsigned long long));
-	if (!ptr[1])
-		goto err1;
+	if (!ptr[1]) {
+		free_percpu(ptr[0]);
+		ptr[0] = NULL;
+		return -ENOMEM;
+	}
+#endif
+	{
+	int i;
+	printk(KERN_INFO "snmp_mib_init(%u) %p ", (unsigned int)mibsize, ptr[0]);
+	for_each_possible_cpu(i) {
+		printk(KERN_INFO "%p ", per_cpu_ptr(ptr[0], i));
+		}
+	printk(KERN_INFO "\n");
+	}
 	return 0;
-err1:
-	free_percpu(ptr[0]);
-	ptr[0] = NULL;
-err0:
-	return -ENOMEM;
 }
 EXPORT_SYMBOL_GPL(snmp_mib_init);
 



* Re: [PATCH] x86: percpu_to_op() misses memory and flags clobbers
  2009-04-01 18:07         ` Jeremy Fitzhardinge
@ 2009-04-01 18:47           ` Eric Dumazet
  2009-04-02  9:52           ` Herbert Xu
  1 sibling, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2009-04-01 18:47 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, Tejun Heo, linux kernel, Linux Netdev List,
	Joe Perches, Rusty Russell

Jeremy Fitzhardinge wrote:
> Eric Dumazet wrote:
>> +#define percpu_inc(var)        percpu_to_op0("inc", per_cpu__##var)
>> +#define percpu_dec(var)        percpu_to_op0("dec", per_cpu__##var)
>>   
> 
> There's probably not a lot of value in this.  The Intel and AMD
> optimisation guides tend to deprecate inc/dec in favour of using
> add/sub, because the former can cause pipeline stalls due to its partial
> flags update.
> 
>    J

Sure, but it saves one byte per call, which is probably why we still use
inc/dec in so many places...



* Re: [RFC] percpu: convert SNMP mibs to new infra
  2009-04-01 18:44         ` [RFC] percpu: convert SNMP mibs to new infra Eric Dumazet
@ 2009-04-02  0:13           ` Tejun Heo
  2009-04-02  4:05             ` Ingo Molnar
  2009-04-02  5:04           ` [RFC] " Rusty Russell
  1 sibling, 1 reply; 20+ messages in thread
From: Tejun Heo @ 2009-04-02  0:13 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ingo Molnar, Jeremy Fitzhardinge, linux kernel, Linux Netdev List,
	Joe Perches, Rusty Russell

Hello, Eric, Ingo.

Eric Dumazet wrote:
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index aee103b..6b82f6b 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -135,6 +135,9 @@ do {							\
>  #define percpu_read(var)	percpu_from_op("mov", per_cpu__##var)
>  #define percpu_write(var, val)	percpu_to_op("mov", per_cpu__##var, val)
>  #define percpu_add(var, val)	percpu_to_op("add", per_cpu__##var, val)
> +#define indir_percpu_add(var, val)	percpu_to_op("add", *(var), val)
> +#define indir_percpu_inc(var)       percpu_to_op("add", *(var), 1)
> +#define indir_percpu_dec(var)       percpu_to_op("add", *(var), -1)
>  #define percpu_sub(var, val)	percpu_to_op("sub", per_cpu__##var, val)
>  #define percpu_and(var, val)	percpu_to_op("and", per_cpu__##var, val)
>  #define percpu_or(var, val)	percpu_to_op("or", per_cpu__##var, val)

The final goal is to unify static and dynamic accesses, but we aren't
there yet, so, for the time being, we'll need some interim solutions.
I would prefer percpu_ptr_add() though.

Thanks.

-- 
tejun


* Re: [RFC] percpu: convert SNMP mibs to new infra
  2009-04-02  0:13           ` Tejun Heo
@ 2009-04-02  4:05             ` Ingo Molnar
  2009-04-02  8:07               ` [PATCH] " Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2009-04-02  4:05 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Eric Dumazet, Jeremy Fitzhardinge, linux kernel,
	Linux Netdev List, Joe Perches, Rusty Russell


* Tejun Heo <htejun@gmail.com> wrote:

> Hello, Eric, Ingo.
> 
> Eric Dumazet wrote:
> > diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> > index aee103b..6b82f6b 100644
> > --- a/arch/x86/include/asm/percpu.h
> > +++ b/arch/x86/include/asm/percpu.h
> > @@ -135,6 +135,9 @@ do {							\
> >  #define percpu_read(var)	percpu_from_op("mov", per_cpu__##var)
> >  #define percpu_write(var, val)	percpu_to_op("mov", per_cpu__##var, val)
> >  #define percpu_add(var, val)	percpu_to_op("add", per_cpu__##var, val)
> > +#define indir_percpu_add(var, val)	percpu_to_op("add", *(var), val)
> > +#define indir_percpu_inc(var)       percpu_to_op("add", *(var), 1)
> > +#define indir_percpu_dec(var)       percpu_to_op("add", *(var), -1)
> >  #define percpu_sub(var, val)	percpu_to_op("sub", per_cpu__##var, val)
> >  #define percpu_and(var, val)	percpu_to_op("and", per_cpu__##var, val)
> >  #define percpu_or(var, val)	percpu_to_op("or", per_cpu__##var, val)
> 
> The final goal is to unify static and dynamic accesses, but we 
> aren't there yet, so, for the time being, we'll need some interim 
> solutions. I would prefer percpu_ptr_add() though.

Yep, that's the standard naming scheme for new APIs: generic to 
specific, left to right.

	Ingo


* Re: [RFC] percpu: convert SNMP mibs to new infra
  2009-04-01 18:44         ` [RFC] percpu: convert SNMP mibs to new infra Eric Dumazet
  2009-04-02  0:13           ` Tejun Heo
@ 2009-04-02  5:04           ` Rusty Russell
  2009-04-02  5:19             ` Eric Dumazet
  1 sibling, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2009-04-02  5:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ingo Molnar, Jeremy Fitzhardinge, Tejun Heo, linux kernel,
	Linux Netdev List, Joe Perches

On Thursday 02 April 2009 05:14:47 Eric Dumazet wrote:
> Here is a preliminary patch for SNMP mibs that seems to work well on x86_32
> 
> [RFC] percpu: convert SNMP mibs to new infra

OK, I have a whole heap of "convert to dynamic per-cpu" patches waiting in
the wings too, once Tejun's conversion is complete.

Also, what is optimal depends on the arch: we had a long discussion on this
(it's what local_t was supposed to do, with cpu_local_inc() etc: see
Subject: local_add_return 2008-12-16 thread).

e.g. on S/390, atomic_inc is a win over the two-counter version.  On Sparc,
the two-counter version wins.  On x86, inc wins (obviously).

But efforts to create a single primitive have been problematic: maybe
open-coding it like this is the Right Thing.

Cheers,
Rusty.


* Re: [RFC] percpu: convert SNMP mibs to new infra
  2009-04-02  5:04           ` [RFC] " Rusty Russell
@ 2009-04-02  5:19             ` Eric Dumazet
  2009-04-02 11:46               ` Rusty Russell
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2009-04-02  5:19 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, Jeremy Fitzhardinge, Tejun Heo, linux kernel,
	Linux Netdev List, Joe Perches

Rusty Russell wrote:
> On Thursday 02 April 2009 05:14:47 Eric Dumazet wrote:
>> Here is a preliminary patch for SNMP mibs that seems to work well on x86_32
>>
>> [RFC] percpu: convert SNMP mibs to new infra
> 
> OK, I have a whole heap of "convert to dynamic per-cpu" patches waiting in
> the wings too, once Tejun's conversion is complete.
> 
> Also, what is optimal depends on the arch: we had a long discussion on this
> (it's what local_t was supposed to do, with cpu_local_inc() etc: see
> Subject: local_add_return 2008-12-16 thread).
> 
> eg. on S/390, atomic_inc is a win over the two-counter version.  On Sparc,
> two-counter wins.  On x86, inc wins (obviously).
> 
> But efforts to create a single primitive have been problematic: maybe
> open-coding it like this is the Right Thing.
> 

I tried to find a generic CONFIG_ define that would announce that an arch
has a fast percpu_add() implementation (faster than __raw_get_cpu_var,
for example, when we are already in a preempt-disabled section).

Any ideas?


For example, net/ipv4/route.c has:

static DEFINE_PER_CPU(struct rt_cache_stat, rt_cache_stat);
#define RT_CACHE_STAT_INC(field) \
        (__raw_get_cpu_var(rt_cache_stat).field++)

We could use percpu_add(rt_cache_stat.field, 1) instead, but only if percpu_add()
is not the generic one.

#define __percpu_generic_to_op(var, val, op)                            \
do {                                                                    \
        get_cpu_var(var) op val;                                        \
        put_cpu_var(var);                                               \
} while (0)
#ifndef percpu_add
# define percpu_add(var, val)           __percpu_generic_to_op(var, (val), +=)
#endif


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH] percpu: convert SNMP mibs to new infra
  2009-04-02  4:05             ` Ingo Molnar
@ 2009-04-02  8:07               ` Eric Dumazet
  2009-04-03  0:39                 ` Tejun Heo
  2009-04-03 17:10                 ` Ingo Molnar
  0 siblings, 2 replies; 20+ messages in thread
From: Eric Dumazet @ 2009-04-02  8:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tejun Heo, Jeremy Fitzhardinge, linux kernel, Linux Netdev List,
	Rusty Russell

Ingo Molnar wrote:
> * Tejun Heo <htejun@gmail.com> wrote:
> 
>> Hello, Eric, Ingo.
>>
>> Eric Dumazet wrote:
>>> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
>>> index aee103b..6b82f6b 100644
>>> --- a/arch/x86/include/asm/percpu.h
>>> +++ b/arch/x86/include/asm/percpu.h
>>> @@ -135,6 +135,9 @@ do {							\
>>>  #define percpu_read(var)	percpu_from_op("mov", per_cpu__##var)
>>>  #define percpu_write(var, val)	percpu_to_op("mov", per_cpu__##var, val)
>>>  #define percpu_add(var, val)	percpu_to_op("add", per_cpu__##var, val)
>>> +#define indir_percpu_add(var, val)	percpu_to_op("add", *(var), val)
>>> +#define indir_percpu_inc(var)       percpu_to_op("add", *(var), 1)
>>> +#define indir_percpu_dec(var)       percpu_to_op("add", *(var), -1)
>>>  #define percpu_sub(var, val)	percpu_to_op("sub", per_cpu__##var, val)
>>>  #define percpu_and(var, val)	percpu_to_op("and", per_cpu__##var, val)
>>>  #define percpu_or(var, val)	percpu_to_op("or", per_cpu__##var, val)
>> The final goal is to unify static and dynamic accesses but we 
>> aren't there yet, so, for the time being, we'll need some interim 
>> solutions. I would prefer percpu_ptr_add() tho.
> 
> Yep, that's the standard naming scheme for new APIs: generic to 
> specific, left to right.
> 

Here is a second version of the patch, using the percpu_ptr_xxx naming
convention, in a more polished form (snmp_mib_free() was forgotten in the
previous RFC).

Thank you all.

[PATCH] percpu: convert SNMP mibs to new infra

Some arches can use the percpu infrastructure for safe changes to mibs
(percpu_add() is safe against preemption and interrupts), but we want
the real thing (a single instruction), not an emulation.

On arches still using an emulation, it's better to keep the two views
per mib and the preemption disable/enable.

This shrinks the size of the mibs by 50%, and also shrinks the vmlinux
text size (minimal IPV4 config):

$ size vmlinux.old vmlinux.new
   text    data     bss     dec     hex filename
4308458  561092 1728512 6598062  64adae vmlinux.old
4303834  561092 1728512 6593438  649b9e vmlinux.new



Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
 arch/x86/include/asm/percpu.h |    3 +++
 include/net/snmp.h            |   27 ++++++++++++++++++++++-----
 net/ipv4/af_inet.c            |   31 ++++++++++++++++++-------------
 3 files changed, 43 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index aee103b..f8081e4 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -135,6 +135,9 @@ do {							\
 #define percpu_read(var)	percpu_from_op("mov", per_cpu__##var)
 #define percpu_write(var, val)	percpu_to_op("mov", per_cpu__##var, val)
 #define percpu_add(var, val)	percpu_to_op("add", per_cpu__##var, val)
+#define percpu_ptr_add(var, val)	percpu_to_op("add", *(var), val)
+#define percpu_ptr_inc(var)       percpu_ptr_add(var, 1)
+#define percpu_ptr_dec(var)       percpu_ptr_add(var, -1)
 #define percpu_sub(var, val)	percpu_to_op("sub", per_cpu__##var, val)
 #define percpu_and(var, val)	percpu_to_op("and", per_cpu__##var, val)
 #define percpu_or(var, val)	percpu_to_op("or", per_cpu__##var, val)
diff --git a/include/net/snmp.h b/include/net/snmp.h
index 57c9362..1ba584b 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -123,15 +123,31 @@ struct linux_xfrm_mib {
 };
 
 /* 
- * FIXME: On x86 and some other CPUs the split into user and softirq parts
+ * On x86 and some other CPUs the split into user and softirq parts
  * is not needed because addl $1,memory is atomic against interrupts (but 
- * atomic_inc would be overkill because of the lock cycles). Wants new 
- * nonlocked_atomic_inc() primitives -AK
+ * atomic_inc would be overkill because of the lock cycles).
  */ 
+#ifdef CONFIG_X86
+# define SNMP_ARRAY_SZ 1
+#else
+# define SNMP_ARRAY_SZ 2
+#endif
+
 #define DEFINE_SNMP_STAT(type, name)	\
-	__typeof__(type) *name[2]
+	__typeof__(type) *name[SNMP_ARRAY_SZ]
 #define DECLARE_SNMP_STAT(type, name)	\
-	extern __typeof__(type) *name[2]
+	extern __typeof__(type) *name[SNMP_ARRAY_SZ]
+
+#if SNMP_ARRAY_SZ == 1
+#define SNMP_INC_STATS(mib, field)	percpu_ptr_inc(&mib[0]->mibs[field])
+#define SNMP_INC_STATS_BH(mib, field)	SNMP_INC_STATS(mib, field)
+#define SNMP_INC_STATS_USER(mib, field) SNMP_INC_STATS(mib, field)
+#define SNMP_DEC_STATS(mib, field)	percpu_ptr_dec(&mib[0]->mibs[field])
+#define SNMP_ADD_STATS_BH(mib, field, addend) 	\
+				percpu_ptr_add(&mib[0]->mibs[field], addend)
+#define SNMP_ADD_STATS_USER(mib, field, addend) 	\
+				percpu_ptr_add(&mib[0]->mibs[field], addend)
+#else
 
 #define SNMP_STAT_BHPTR(name)	(name[0])
 #define SNMP_STAT_USRPTR(name)	(name[1])
@@ -160,5 +176,6 @@ struct linux_xfrm_mib {
 		per_cpu_ptr(mib[1], get_cpu())->mibs[field] += addend; \
 		put_cpu(); \
 	} while (0)
+#endif
 
 #endif
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 7f03373..4df3a76 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1366,36 +1366,41 @@ unsigned long snmp_fold_field(void *mib[], int offt)
 
 	for_each_possible_cpu(i) {
 		res += *(((unsigned long *) per_cpu_ptr(mib[0], i)) + offt);
+#if SNMP_ARRAY_SZ == 2
 		res += *(((unsigned long *) per_cpu_ptr(mib[1], i)) + offt);
+#endif
 	}
 	return res;
 }
 EXPORT_SYMBOL_GPL(snmp_fold_field);
 
-int snmp_mib_init(void *ptr[2], size_t mibsize)
+int snmp_mib_init(void *ptr[SNMP_ARRAY_SZ], size_t mibsize)
 {
 	BUG_ON(ptr == NULL);
 	ptr[0] = __alloc_percpu(mibsize, __alignof__(unsigned long long));
 	if (!ptr[0])
-		goto err0;
+		return -ENOMEM;
+#if SNMP_ARRAY_SZ == 2
 	ptr[1] = __alloc_percpu(mibsize, __alignof__(unsigned long long));
-	if (!ptr[1])
-		goto err1;
+	if (!ptr[1]) {
+		free_percpu(ptr[0]);
+		ptr[0] = NULL;
+		return -ENOMEM;
+	}
+#endif
 	return 0;
-err1:
-	free_percpu(ptr[0]);
-	ptr[0] = NULL;
-err0:
-	return -ENOMEM;
 }
 EXPORT_SYMBOL_GPL(snmp_mib_init);
 
-void snmp_mib_free(void *ptr[2])
+void snmp_mib_free(void *ptr[SNMP_ARRAY_SZ])
 {
+	int i;
+
 	BUG_ON(ptr == NULL);
-	free_percpu(ptr[0]);
-	free_percpu(ptr[1]);
-	ptr[0] = ptr[1] = NULL;
+	for (i = 0 ; i < SNMP_ARRAY_SZ; i++) {
+		free_percpu(ptr[i]);
+		ptr[i] = NULL;
+	}
 }
 EXPORT_SYMBOL_GPL(snmp_mib_free);
 


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] x86: percpu_to_op() misses memory and flags clobbers
  2009-04-01 18:07         ` Jeremy Fitzhardinge
  2009-04-01 18:47           ` Eric Dumazet
@ 2009-04-02  9:52           ` Herbert Xu
  2009-04-02 14:12             ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2009-04-02  9:52 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: dada1, mingo, htejun, linux-kernel, netdev, joe, rusty

Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> 
> There's probably not a lot of value in this.  The Intel and AMD 
> optimisation guides tend to deprecate inc/dec in favour of using 
> add/sub, because the former can cause pipeline stalls due to its partial 
> flags update.

Is this still the case on the latest Intel CPUs?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] percpu: convert SNMP mibs to new infra
  2009-04-02  5:19             ` Eric Dumazet
@ 2009-04-02 11:46               ` Rusty Russell
  0 siblings, 0 replies; 20+ messages in thread
From: Rusty Russell @ 2009-04-02 11:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ingo Molnar, Jeremy Fitzhardinge, Tejun Heo, linux kernel,
	Linux Netdev List, Joe Perches

On Thursday 02 April 2009 15:49:19 Eric Dumazet wrote:
> Rusty Russell wrote:
> > eg. on S/390, atomic_inc is a win over the two-counter version.  On Sparc,
> > two-counter wins.  On x86, inc wins (obviously).
> > 
> > But efforts to create a single primitive have been problematic: maybe
> > open-coding it like this is the Right Thing.
> 
> I tried to find a generic CONFIG_ define that would announce that an arch
> has a fast percpu_add() implementation (faster than __raw_get_cpu_var,
> for example, when we are already in a preempt-disabled section).

Nope, we don't have one.  It was supposed to work like this:
	DEFINE_PER_CPU(local_t, counter);

	cpu_local_inc(counter);

That would do an incl on x86; local_t could even be a long[3] (one for hardirq,
one for softirq, one for user context).  But there were issues:

1) It didn't work on dynamic percpu allocs, which was much of the interesting
   use (Tejun is fixing this bit right now)
2) The x86 version wasn't optimized anyway,
3) Everyone did atomic_long_inc(), so the ftrace code assumed it would be nmi
   safe (tho atomic_t isn't nmi-safe on some archs anyway), so the long[3]
   method would break them,
4) The long[3] version was overkill for networking, which doesn't need hardirq
   so we'd want another variant of local_t plus all the ops,
5) Some people didn't want long: Christoph had a more generic but more complex
   version,
6) It's still not used anywhere in the tree (tho local_t is), so there's no
   reason to stick to the current semantics.

> For example, net/ipv4/route.c has :
> 
> static DEFINE_PER_CPU(struct rt_cache_stat, rt_cache_stat);
> #define RT_CACHE_STAT_INC(field) \
>         (__raw_get_cpu_var(rt_cache_stat).field++)
> 
> We could use percpu_add(rt_cache_stat.field, 1) instead, only if percpu_add()
> is not the generic one.

Yep, but this one is different from the SNMP stats, which need softirq vs
user context safety.  This is where I start wondering how many interfaces
we're going to have...

Sorry to add more questions than answers :(
Rusty.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] x86: percpu_to_op() misses memory and flags clobbers
  2009-04-02  9:52           ` Herbert Xu
@ 2009-04-02 14:12             ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2009-04-02 14:12 UTC (permalink / raw)
  To: Herbert Xu; +Cc: dada1, mingo, htejun, linux-kernel, netdev, joe, rusty

Herbert Xu wrote:
> Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>   
>> There's probably not a lot of value in this.  The Intel and AMD 
>> optimisation guides tend to deprecate inc/dec in favour of using 
>> add/sub, because the former can cause pipeline stalls due to its partial 
>> flags update.
>>     
>
> Is this still the case on the latest Intel CPUs?
>   

Yes:

    Assembly/Compiler Coding Rule 32. (M impact, H generality) INC and DEC
    instructions should be replaced with ADD or SUB instructions, because
    ADD and SUB overwrite all flags, whereas INC and DEC do not, therefore
    creating false dependencies on earlier instructions that set the flags.

    J

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] percpu: convert SNMP mibs to new infra
  2009-04-02  8:07               ` [PATCH] " Eric Dumazet
@ 2009-04-03  0:39                 ` Tejun Heo
  2009-04-03 17:10                 ` Ingo Molnar
  1 sibling, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2009-04-03  0:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ingo Molnar, Jeremy Fitzhardinge, linux kernel, Linux Netdev List,
	Rusty Russell

Eric Dumazet wrote:
...
>  #define percpu_read(var)	percpu_from_op("mov", per_cpu__##var)
>  #define percpu_write(var, val)	percpu_to_op("mov", per_cpu__##var, val)
>  #define percpu_add(var, val)	percpu_to_op("add", per_cpu__##var, val)
> +#define percpu_ptr_add(var, val)	percpu_to_op("add", *(var), val)
> +#define percpu_ptr_inc(var)       percpu_ptr_add(var, 1)
> +#define percpu_ptr_dec(var)       percpu_ptr_add(var, -1)
>  #define percpu_sub(var, val)	percpu_to_op("sub", per_cpu__##var, val)
>  #define percpu_and(var, val)	percpu_to_op("and", per_cpu__##var, val)
>  #define percpu_or(var, val)	percpu_to_op("or", per_cpu__##var, val)

x86 part looks fine to me.

> diff --git a/include/net/snmp.h b/include/net/snmp.h
> index 57c9362..1ba584b 100644
> --- a/include/net/snmp.h
> +++ b/include/net/snmp.h
> @@ -123,15 +123,31 @@ struct linux_xfrm_mib {
>  };
>  
>  /* 
> - * FIXME: On x86 and some other CPUs the split into user and softirq parts
> + * On x86 and some other CPUs the split into user and softirq parts
>   * is not needed because addl $1,memory is atomic against interrupts (but 
> - * atomic_inc would be overkill because of the lock cycles). Wants new 
> - * nonlocked_atomic_inc() primitives -AK
> + * atomic_inc would be overkill because of the lock cycles).
>   */ 
> +#ifdef CONFIG_X86
> +# define SNMP_ARRAY_SZ 1
> +#else
> +# define SNMP_ARRAY_SZ 2
> +#endif

This is quite hacky but, well, for the time being...

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] percpu: convert SNMP mibs to new infra
  2009-04-02  8:07               ` [PATCH] " Eric Dumazet
  2009-04-03  0:39                 ` Tejun Heo
@ 2009-04-03 17:10                 ` Ingo Molnar
  1 sibling, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2009-04-03 17:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Tejun Heo, Jeremy Fitzhardinge, linux kernel, Linux Netdev List,
	Rusty Russell


* Eric Dumazet <dada1@cosmosbay.com> wrote:

> Ingo Molnar wrote:
> > * Tejun Heo <htejun@gmail.com> wrote:
> > 
> >> Hello, Eric, Ingo.
> >>
> >> Eric Dumazet wrote:
> >>> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> >>> index aee103b..6b82f6b 100644
> >>> --- a/arch/x86/include/asm/percpu.h
> >>> +++ b/arch/x86/include/asm/percpu.h
> >>> @@ -135,6 +135,9 @@ do {							\
> >>>  #define percpu_read(var)	percpu_from_op("mov", per_cpu__##var)
> >>>  #define percpu_write(var, val)	percpu_to_op("mov", per_cpu__##var, val)
> >>>  #define percpu_add(var, val)	percpu_to_op("add", per_cpu__##var, val)
> >>> +#define indir_percpu_add(var, val)	percpu_to_op("add", *(var), val)
> >>> +#define indir_percpu_inc(var)       percpu_to_op("add", *(var), 1)
> >>> +#define indir_percpu_dec(var)       percpu_to_op("add", *(var), -1)
> >>>  #define percpu_sub(var, val)	percpu_to_op("sub", per_cpu__##var, val)
> >>>  #define percpu_and(var, val)	percpu_to_op("and", per_cpu__##var, val)
> >>>  #define percpu_or(var, val)	percpu_to_op("or", per_cpu__##var, val)
> >> The final goal is to unify static and dynamic accesses but we 
> >> aren't there yet, so, for the time being, we'll need some interim 
> >> solutions. I would prefer percpu_ptr_add() tho.
> > 
> > Yep, that's the standard naming scheme for new APIs: generic to 
> > specific, left to right.
> > 
> 
> Here is a second version of the patch, using the percpu_ptr_xxx naming
> convention, in a more polished form (snmp_mib_free() was forgotten in the
> previous RFC).
> 
> Thank you all
> 
> [PATCH] percpu: convert SNMP mibs to new infra
> 
> Some arches can use percpu infrastructure for safe changes to mibs.
> (percpu_add() is safe against preemption and interrupts), but
> we want the real thing (a single instruction), not an emulation.
> 
> On arches still using an emulation, it's better to keep the two views
> per mib and the preemption disable/enable.
> 
> This shrinks size of mibs by 50%, but also shrinks vmlinux text size
> (minimum IPV4 config)
> 
> $ size vmlinux.old vmlinux.new
>    text    data     bss     dec     hex filename
> 4308458  561092 1728512 6598062  64adae vmlinux.old
> 4303834  561092 1728512 6593438  649b9e vmlinux.new

Wow, that's pretty impressive!

> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> ---
>  arch/x86/include/asm/percpu.h |    3 +++

Acked-by: Ingo Molnar <mingo@elte.hu>

As far as x86 goes, feel free to pick it up into any of the 
networking trees, these bits are easily merged and it's probably 
best if the patch stays in a single piece - it looks compact enough 
and if it breaks it's going to break in networking code.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2009-04-03 17:10 UTC | newest]

Thread overview: 20+ messages
2009-04-01  8:13 [PATCH] x86: percpu_to_op() misses memory and flags clobbers Eric Dumazet
2009-04-01  9:02 ` Jeremy Fitzhardinge
2009-04-01 10:14   ` Eric Dumazet
2009-04-01 16:12     ` Ingo Molnar
2009-04-01 16:41       ` Jeremy Fitzhardinge
2009-04-01 16:44         ` Ingo Molnar
2009-04-01 17:13       ` Eric Dumazet
2009-04-01 18:07         ` Jeremy Fitzhardinge
2009-04-01 18:47           ` Eric Dumazet
2009-04-02  9:52           ` Herbert Xu
2009-04-02 14:12             ` Jeremy Fitzhardinge
2009-04-01 18:44         ` [RFC] percpu: convert SNMP mibs to new infra Eric Dumazet
2009-04-02  0:13           ` Tejun Heo
2009-04-02  4:05             ` Ingo Molnar
2009-04-02  8:07               ` [PATCH] " Eric Dumazet
2009-04-03  0:39                 ` Tejun Heo
2009-04-03 17:10                 ` Ingo Molnar
2009-04-02  5:04           ` [RFC] " Rusty Russell
2009-04-02  5:19             ` Eric Dumazet
2009-04-02 11:46               ` Rusty Russell
