* [PATCH] core: fix the use of this_cpu_ptr
@ 2013-03-28  9:42 roy.qing.li
  2013-03-28 13:05 ` Eric Dumazet
  2013-03-29 19:13 ` [PATCH] core: fix the use " David Miller
  0 siblings, 2 replies; 30+ messages in thread
From: roy.qing.li @ 2013-03-28  9:42 UTC
  To: netdev

From: Li RongQing <roy.qing.li@gmail.com>

flush_tasklet is not percpu var, and percpu is percpu var, and
	this_cpu_ptr(&info->cache->percpu->flush_tasklet)
is not equal to
	&this_cpu_ptr(info->cache->percpu)->flush_tasklet

1f743b076(use this_cpu_ptr per-cpu helper) introduced this bug.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
---
 net/core/flow.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/flow.c b/net/core/flow.c
index 7fae135..e8084b8 100644
--- a/net/core/flow.c
+++ b/net/core/flow.c
@@ -346,7 +346,7 @@ static void flow_cache_flush_per_cpu(void *data)
 	struct flow_flush_info *info = data;
 	struct tasklet_struct *tasklet;
 
-	tasklet = this_cpu_ptr(&info->cache->percpu->flush_tasklet);
+	tasklet = &this_cpu_ptr(info->cache->percpu)->flush_tasklet;
 	tasklet->data = (unsigned long)info;
 	tasklet_schedule(tasklet);
 }
-- 
1.7.10.4

* Re: [PATCH] core: fix the use of this_cpu_ptr
  2013-03-28  9:42 [PATCH] core: fix the use of this_cpu_ptr roy.qing.li
@ 2013-03-28 13:05 ` Eric Dumazet
  2013-03-28 14:38   ` Christoph Lameter
  2013-03-29 19:13 ` [PATCH] core: fix the use " David Miller
  1 sibling, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2013-03-28 13:05 UTC
  To: roy.qing.li, Shan Wei, Christoph Lameter; +Cc: netdev

On Thu, 2013-03-28 at 17:42 +0800, roy.qing.li@gmail.com wrote:
> From: Li RongQing <roy.qing.li@gmail.com>
>
> flush_tasklet is not percpu var, and percpu is percpu var, and
> 	this_cpu_ptr(&info->cache->percpu->flush_tasklet)
> is not equal to
> 	&this_cpu_ptr(info->cache->percpu)->flush_tasklet
>
> 1f743b076(use this_cpu_ptr per-cpu helper) introduced this bug.
>
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
> ---
>  net/core/flow.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/flow.c b/net/core/flow.c
> index 7fae135..e8084b8 100644
> --- a/net/core/flow.c
> +++ b/net/core/flow.c
> @@ -346,7 +346,7 @@ static void flow_cache_flush_per_cpu(void *data)
>  	struct flow_flush_info *info = data;
>  	struct tasklet_struct *tasklet;
>
> -	tasklet = this_cpu_ptr(&info->cache->percpu->flush_tasklet);
> +	tasklet = &this_cpu_ptr(info->cache->percpu)->flush_tasklet;
>  	tasklet->data = (unsigned long)info;
>  	tasklet_schedule(tasklet);
>  }

Hi

Any reason you don't Cc Shan Wei & Christoph Lameter ?

Christoph, could this kind of error be detected by the compiler or
sparse ?

Thanks

* Re: [PATCH] core: fix the use of this_cpu_ptr
  2013-03-28 13:05 ` Eric Dumazet
@ 2013-03-28 14:38   ` Christoph Lameter
  2013-03-28 15:36     ` Eric Dumazet
  2013-03-29  1:24     ` RongQing Li
  0 siblings, 2 replies; 30+ messages in thread
From: Christoph Lameter @ 2013-03-28 14:38 UTC
  To: Eric Dumazet; +Cc: roy.qing.li, Shan Wei, netdev

On Thu, 28 Mar 2013, Eric Dumazet wrote:

> > flush_tasklet is not percpu var, and percpu is percpu var, and
> > 	this_cpu_ptr(&info->cache->percpu->flush_tasklet)
> > is not equal to
> > 	&this_cpu_ptr(info->cache->percpu)->flush_tasklet

&this_cpu_ptr is always an error since you are taking the address of an
address.

this_cpu_ptr(&structure) is the right way to get the address of the cpu
instance for this cpu for a per cpu structure.

> Christoph, could this kind of error be detected by the compiler or
> sparse ?

The per cpu variables are marked with __percpu. This should be detected by
sparse.

* Re: [PATCH] core: fix the use of this_cpu_ptr
  2013-03-28 14:38   ` Christoph Lameter
@ 2013-03-28 15:36     ` Eric Dumazet
  2013-03-28 16:44       ` Christoph Lameter
  2013-03-29  1:24     ` RongQing Li
  1 sibling, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2013-03-28 15:36 UTC
  To: Christoph Lameter; +Cc: roy.qing.li, Shan Wei, netdev

On Thu, 2013-03-28 at 14:38 +0000, Christoph Lameter wrote:
> On Thu, 28 Mar 2013, Eric Dumazet wrote:
>
> > Christoph, could this kind of error be detected by the compiler or
> > sparse ?
>
> The per cpu variables are marked with __percpu. This should be detected by
> sparse.

make C=2 net/core/flow.o

  CHECK   net/core/flow.c

No warning.

* Re: [PATCH] core: fix the use of this_cpu_ptr
  2013-03-28 15:36     ` Eric Dumazet
@ 2013-03-28 16:44       ` Christoph Lameter
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Lameter @ 2013-03-28 16:44 UTC
  To: Eric Dumazet; +Cc: roy.qing.li, Shan Wei, netdev

On Thu, 28 Mar 2013, Eric Dumazet wrote:

> On Thu, 2013-03-28 at 14:38 +0000, Christoph Lameter wrote:
> > On Thu, 28 Mar 2013, Eric Dumazet wrote:
> >
> > > Christoph, could this kind of error be detected by the compiler or
> > > sparse ?
> >
> > The per cpu variables are marked with __percpu. This should be detected by
> > sparse.
>
> make C=2 net/core/flow.o
>
>   CHECK   net/core/flow.c
>
> No warning.

huh?

this_cpu_ptr uses SHIFT_PERCPU_PTR

#ifndef SHIFT_PERCPU_PTR
/* Weird cast keeps both GCC and sparse happy. */
#define SHIFT_PERCPU_PTR(__p, __offset)	({				\
	__verify_pcpu_ptr((__p));					\
	RELOC_HIDE((typeof(*(__p)) __kernel __force *)(__p), (__offset)); \
})
#endif

This would mean that __verify_pcpu_ptr is broken.

* Re: [PATCH] core: fix the use of this_cpu_ptr
  2013-03-28 14:38   ` Christoph Lameter
  2013-03-28 15:36     ` Eric Dumazet
@ 2013-03-29  1:24     ` RongQing Li
  2013-04-01 15:21       ` Christoph Lameter
  1 sibling, 1 reply; 30+ messages in thread
From: RongQing Li @ 2013-03-29  1:24 UTC
  To: Christoph Lameter; +Cc: Eric Dumazet, Shan Wei, netdev

2013/3/28 Christoph Lameter <cl@linux.com>:
> On Thu, 28 Mar 2013, Eric Dumazet wrote:
>
>> > flush_tasklet is not percpu var, and percpu is percpu var, and
>> > 	this_cpu_ptr(&info->cache->percpu->flush_tasklet)
>> > is not equal to
>> > 	&this_cpu_ptr(info->cache->percpu)->flush_tasklet
>
> &this_cpu_ptr is always an error since you are taking the address of an
> address.
>

&this_cpu_ptr()->flush_tasklet, "->" has higher precedence than "&"
so the result is the same as
&(this_cpu_ptr()->flush_tasklet)

It should not be an issue.

flush_tasklet is not a percpu var, it is a member of percpu var.

-Roy

> this_cpu_ptr(&structure) is the right way to get the address of the cpu
> instance for this cpu for a per cpu structure.
>
>> Christoph, could this kind of error be detected by the compiler or
>> sparse ?
>
> The per cpu variables are marked with __percpu. This should be detected by
> sparse.
>

* Re: [PATCH] core: fix the use of this_cpu_ptr
  2013-03-29  1:24     ` RongQing Li
@ 2013-04-01 15:21       ` Christoph Lameter
  2013-04-01 16:31         ` Eric Dumazet
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Lameter @ 2013-04-01 15:21 UTC
  To: RongQing Li; +Cc: Eric Dumazet, Shan Wei, netdev

On Fri, 29 Mar 2013, RongQing Li wrote:

> > &this_cpu_ptr is always an error since you are taking the address of an
> > address.
> >
>
> &this_cpu_ptr()->flush_tasklet, "->" has higher precedence than "&"
> so the result is the same as
> &(this_cpu_ptr()->flush_tasklet)

Ok. This is the same as

this_cpu_read(xxx.flush_tasklet)

Looks less confusing to me.

> flush_tasklet is not a percpu var, it is a member of percpu var.

Well then it would be best to use this_cpu_read() instead of this_cpu_ptr.
It also will generate better code.

* Re: [PATCH] core: fix the use of this_cpu_ptr
  2013-04-01 15:21       ` Christoph Lameter
@ 2013-04-01 16:31         ` Eric Dumazet
  2013-04-01 18:15           ` Christoph Lameter
                            ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Eric Dumazet @ 2013-04-01 16:31 UTC
  To: Christoph Lameter; +Cc: RongQing Li, Shan Wei, netdev

On Mon, 2013-04-01 at 15:21 +0000, Christoph Lameter wrote:
> On Fri, 29 Mar 2013, RongQing Li wrote:

> > flush_tasklet is not a percpu var, it is a member of percpu var.
>
> Well then it would be best to use this_cpu_read() instead of this_cpu_ptr.
> It also will generate better code.

I believe we already had this discussion in the past.

flush_tasklet is a structure, and we need its address, not read its
content.

You can not use this_cpu_read() to get its address, and following
code is fine.

tasklet = &this_cpu_ptr(info->cache->percpu)->flush_tasklet;

Similar to this code in mm/page_alloc.c

pcp = &this_cpu_ptr(zone->pageset)->pcp;

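The distinction drawn in the message above, reading a value versus obtaining an address, can be sketched in plain userspace C. This is only a model under stated assumptions, not kernel code: NR_CPUS, struct item, instance[], current_cpu, model_read_data() and model_item_ptr() are hypothetical stand-ins, and the array index plays the role of the per cpu relocation.

```c
#include <assert.h>

/* Userspace model only: every name below is hypothetical, not a kernel API. */
#define NR_CPUS 2

struct item {
	unsigned long data;
};

/* One instance of the structure per simulated processor. */
static struct item instance[NR_CPUS];
static int current_cpu;

/* Models a this_cpu_read()-style access: the caller receives a copy. */
static unsigned long model_read_data(void)
{
	assert(current_cpu >= 0 && current_cpu < NR_CPUS);
	return instance[current_cpu].data;
}

/* Models a this_cpu_ptr()-style access: the caller receives an address. */
static struct item *model_item_ptr(void)
{
	assert(current_cpu >= 0 && current_cpu < NR_CPUS);
	return &instance[current_cpu];
}
```

A caller that has to hand the structure to another function, as the patch does with the tasklet, needs the second helper; changing the copy returned by the first one leaves the stored instance untouched.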
* Re: [PATCH] core: fix the use of this_cpu_ptr
  2013-04-01 16:31         ` Eric Dumazet
@ 2013-04-01 18:15           ` Christoph Lameter
  2013-04-03 20:41           ` this cpu documentation Christoph Lameter
       [not found]           ` <alpine.DEB.2.02.1304031540110.3444@gentwo.org>
  2 siblings, 0 replies; 30+ messages in thread
From: Christoph Lameter @ 2013-04-01 18:15 UTC
  To: Eric Dumazet; +Cc: RongQing Li, Shan Wei, netdev

On Mon, 1 Apr 2013, Eric Dumazet wrote:

> On Mon, 2013-04-01 at 15:21 +0000, Christoph Lameter wrote:
> > On Fri, 29 Mar 2013, RongQing Li wrote:
>
> > > flush_tasklet is not a percpu var, it is a member of percpu var.
> >
> > Well then it would be best to use this_cpu_read() instead of this_cpu_ptr.
> > It also will generate better code.
>
> I believe we already had this discussion in the past.
>
> flush_tasklet is a structure, and we need its address, not read its
> content.
>
> You can not use this_cpu_read() to get its address, and following
> code is fine.
>
> tasklet = &this_cpu_ptr(info->cache->percpu)->flush_tasklet;

that is confusing..

tasklet = this_cpu_ptr(&info->cache->percpu->flush_tasklet)

this_cpu_ptr performs an address relocation. The address then is the one
of flush_tasklet.

> Similar to this code in mm/page_alloc.c
>
> pcp = &this_cpu_ptr(zone->pageset)->pcp;

Yeah that's my (early) code using these features.

pcp = this_cpu_ptr(&zone->pageset->pcp)

I need to do a writeup on this one.

* this cpu documentation
  2013-04-01 16:31         ` Eric Dumazet
  2013-04-01 18:15           ` Christoph Lameter
@ 2013-04-03 20:41           ` Christoph Lameter
  2013-04-03 21:18             ` Tejun Heo
  2013-04-04  0:09             ` Randy Dunlap
       [not found]           ` <alpine.DEB.2.02.1304031540110.3444@gentwo.org>
  2 siblings, 2 replies; 30+ messages in thread
From: Christoph Lameter @ 2013-04-03 20:41 UTC
  To: Eric Dumazet; +Cc: RongQing Li, Shan Wei, netdev, Tejun Heo, srostedt

From: Christoph Lameter <cl@linux.com>
Subject: this_cpu: Add documentation

Document the rationale and the way to use this_cpu operations.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/Documentation/this_cpu_ops
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux/Documentation/this_cpu_ops	2013-04-03 15:25:41.424846306 -0500
@@ -0,0 +1,194 @@
+this_cpu operations
+-------------------
+
+this_cpu operations are a way of optimizing access to per cpu variables
+associated with the *currently* executing processor
+through the use of segment registers (or a dedicated register where the cpu
+permanently stored the beginning of the per cpu area for a specific
+processor).
+
+The this_cpu operations add an per cpu variable offset to the processor
+specific percpu base and encode that operation in the instruction operating
+on the per cpu variable.
+
+This mean there are no atomicity issues between the calculation
+of the offset and the operation on the data. Therefore it is not necessary
+to disable preempt or interrupts to ensure that the processor is not changed
+between the calculation of the address and the operation on the data.
+
+Read-modify-write operations are of particular interest. Frequently
+processors have special lower latency instructions that can operate without
+the typical synchronization overhead but still provide some sort of relaxed
+atomicity guarantee. The x86 for example can execute RMV instructions like
+inc/dec/cmpxchg without the lock prefix and the associated latency penalty.
+
+Access to the variable without the lock prefix is not synchronized but
+synchronization is not necessary since we are dealing with per cpu data
+specific to the currently executing processor. Only the current processor
+should be accessing that variable and therefore there are no concurency
+issues with other processors in the system.
+
+On x86 the fs: or the gs: segment registers contain the basis of the per cpu area. It is
+then possible to simply use the segment override to relocate a per cpu relative address
+to the proper per cpu area for the processor. So the relocation to the per cpu base
+is encoded in the instruction via a segment register prefix.
+
+For example:
+
+	DEFINE_PER_CPU(int, x);
+	int z;
+
+	z = this_cpu_read(x);
+
+results in a single instruction
+
+	mov ax, gs:[x]
+
+instead of a sequence of calculation of the address and then a fetch from
+that address which occurs with the percpu operations. Before this_cpu_ops
+such sequence also required preempt disable/enable to prevent the Os from
+moving the thread to a different processor while the calculation is performed.
+
+
+The main use of the this_cpu operations has been to optimize counter operations.
+
+
+	this_cpu_inc(x)
+
+results in the following single instruction (no lock prefix!)
+
+	inc gs:[x]
+
+
+instead of the following operations required if there is no segment register.
+
+	int *y;
+	int cpu;
+
+	cpu = get_cpu();
+	y = per_cpu_ptr(&x, cpu);
+	(*y)++;
+	put_cpu();
+
+
+Note that these operations can only be used on percpu data that is reserved for
+a specific processor. Without disabling preemption in the surrounding code
+this_cpu_inc() will only guarantee that one of the percpu counters is correctly
+incremented. However, there is no guarantee that the OS will not move the process
+directly before or after the this_cpu instruction is executed. In general this
+means that the value of the individual counters for each processor are
+meaningless. The sum of all the per cpu counters is the only value that is of
+interest.
+
+Per cpu variables are used for performance reasons. Bouncing cache lines can
+be avoided if multiple processors concurrently go through the same code paths.
+Since each processor has its own per cpu variables no concurrent cacheline
+updates take place. The price that has to be paid for this optimization is
+the need to add up the per cpu counters when the value of the counter is
+needed.
+
+
+Special operations:
+-------------------
+
+	y = this_cpu_ptr(&x)
+
+Takes the offset of a per cpu variable (&x !) and returns the address of the
+per cpu variable that belongs to the currently executing processor.
+this_cpu_ptr avoids multiple steps that the common get_cpu/put_cpu sequence
+requires. No processor number is available. Instead the offset of the local\
+per cpu area is simply added to the percpu offset.
+
+
+
+Per cpu variables and offsets
+-----------------------------
+
+Per cpu variables have *offsets* to the beginning of the percpu area. They do
+not have addresses although they look like that in the code. Offsets
+cannot be directly dereferenced. The offset must be added to a base pointer of
+a percpu area of a processor in order to form a valid address.
+
+Therefore the use of x or &x outside of the context of per cpu operations
+is invalid and will generally be treated like a NULL pointer dereference.
+
+In the context of per cpu operations
+
+	x is a per cpu variable. Most this_cpu operations take a cpu variable.
+
+	&x is the *offset* a per cpu variable. this_cpu_ptr() takes the offset
+	of a per cpu variable which makes this look a bit strange.
+
+
+
+Operations on a field of a per cpu structure
+--------------------------------------------
+
+Lets say we have a percpu structure
+
+	struct s {
+		int n,m;
+	};
+
+	DEFINE_PER_CPU(struct s, p);
+
+
+Operations on these fields are straightforward
+
+	this_cpu_inc(p.m)
+
+	z = this_cpu_cmpxchg(p.m, 0, 1);
+
+
+If we have an offset to struct s:
+
+	struct s __percpu *ps = &p;
+
+	z = this_cpu_dec(ps->m);
+
+	z = this_cpu_inc_return(ps->n);
+
+
+The calculation of the pointer may require the use of this_cpu_ptr() if we
+do not make use of this_cpu ops later to manipulate fields:
+
+	struct s *pp;
+
+	pp = this_cpu_ptr(&p);
+
+	pp->m--
+
+	z = pp->n++
+
+
+Variants of this_cpu ops
+-------------------------
+
+this_cpu ops are interupt safe. Some architecture do not support these per
+cpu local operations. In that case the operation must be replaced by code
+that disables interrupts, then does the operations that are guaranteed to be
+atomic and then reenable interrupts. Doing so is expensive. If there are
+other reasons why the scheduler cannot change the processor we are executing
+on then there is no reason to disable interrupts. For that purpose
+the __this_cpu operations are provided. F.e.
+
+	__this_cpu_inc(x)
+
+Will increment x and will not fallback to code that disables interrupts on
+platforms that cannot accomplish atomicity through address relocation and
+an RMV operation in the same instruction.
+
+
+
+&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
+--------------------------------------------
+
+The first operation takes the offset and forms an address and then adds
+the offset of the n field.
+
+The second one first adds the two offsets and then does the relocation.
+IMHO the second form looks cleaner and has an easier time with ().
+
+
+Christoph Lameter, April 3rd, 2013
+

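The document above states that individual per cpu counters are meaningless on their own and that only their sum is of interest. A minimal userspace sketch of that bookkeeping follows. It is a model under stated assumptions, not kernel code: NR_CPUS, counter[], model_inc_on() and model_sum() are hypothetical names, and the cpu argument stands in for whichever processor happens to execute the increment.

```c
#include <assert.h>

/* Userspace model only: every name below is hypothetical, not a kernel API. */
#define NR_CPUS 4

/* One counter instance per simulated processor. */
static long counter[NR_CPUS];

/* Stand-in for this_cpu_inc(x) executing on processor `cpu`. */
static void model_inc_on(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	counter[cpu]++;
}

/* Fold all instances together; per the text this is the value of interest. */
static long model_sum(void)
{
	long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += counter[cpu];
	return sum;
}
```

Which slot receives a given increment depends on where the task was running at that moment, so a reader should only rely on model_sum(); this is the price the text describes for avoiding shared cache line updates.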
* Re: this cpu documentation
  2013-04-03 20:41           ` this cpu documentation Christoph Lameter
@ 2013-04-03 21:18             ` Tejun Heo
  2013-04-04  0:09             ` Randy Dunlap
  1 sibling, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2013-04-03 21:18 UTC
  To: Christoph Lameter; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev, srostedt

On Wed, Apr 03, 2013 at 08:41:32PM +0000, Christoph Lameter wrote:
>
> From: Christoph Lameter <cl@linux.com>
> Subject: this_cpu: Add documentation
>
> Document the rationale and the way to use this_cpu operations.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>

Applied to percpu/for-3.10 with the file renamed to this_cpu_ops.txt.

Thanks.

-- 
tejun

* Re: this cpu documentation
  2013-04-03 20:41           ` this cpu documentation Christoph Lameter
  2013-04-03 21:18             ` Tejun Heo
@ 2013-04-04  0:09             ` Randy Dunlap
  2013-04-04 14:41               ` Christoph Lameter
  1 sibling, 1 reply; 30+ messages in thread
From: Randy Dunlap @ 2013-04-04  0:09 UTC
  To: Christoph Lameter
  Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev, Tejun Heo, srostedt

On 04/03/13 13:41, Christoph Lameter wrote:
>
> From: Christoph Lameter <cl@linux.com>
> Subject: this_cpu: Add documentation
>
> Document the rationale and the way to use this_cpu operations.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/Documentation/this_cpu_ops
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux/Documentation/this_cpu_ops	2013-04-03 15:25:41.424846306 -0500
> @@ -0,0 +1,194 @@
> +this_cpu operations
> +-------------------
> +
> +this_cpu operations are a way of optimizing access to per cpu variables
> +associated with the *currently* executing processor
> +through the use of segment registers (or a dedicated register where the cpu
> +permanently stored the beginning of the per cpu area for a specific
> +processor).
> +
> +The this_cpu operations add an per cpu variable offset to the processor

                              add a per

> +specific percpu base and encode that operation in the instruction operating
> +on the per cpu variable.
> +
> +This mean there are no atomicity issues between the calculation

        means

> +of the offset and the operation on the data. Therefore it is not necessary
> +to disable preempt or interrupts to ensure that the processor is not changed
> +between the calculation of the address and the operation on the data.
> +
> +Read-modify-write operations are of particular interest. Frequently
> +processors have special lower latency instructions that can operate without
> +the typical synchronization overhead but still provide some sort of relaxed
> +atomicity guarantee. The x86 for example can execute RMV instructions like

                                                        RMW ??

> +inc/dec/cmpxchg without the lock prefix and the associated latency penalty.
> +
> +Access to the variable without the lock prefix is not synchronized but
> +synchronization is not necessary since we are dealing with per cpu data
> +specific to the currently executing processor. Only the current processor
> +should be accessing that variable and therefore there are no concurency

                                                                concurrency

> +issues with other processors in the system.
> +
> +On x86 the fs: or the gs: segment registers contain the basis of the per cpu area. It is

                                                          base

> +then possible to simply use the segment override to relocate a per cpu relative address
> +to the proper per cpu area for the processor. So the relocation to the per cpu base
> +is encoded in the instruction via a segment register prefix.
> +
> +For example:
> +
> +	DEFINE_PER_CPU(int, x);
> +	int z;
> +
> +	z = this_cpu_read(x);
> +
> +results in a single instruction
> +
> +	mov ax, gs:[x]
> +
> +instead of a sequence of calculation of the address and then a fetch from
> +that address which occurs with the percpu operations. Before this_cpu_ops
> +such sequence also required preempt disable/enable to prevent the Os from

                                                                    OS or O/S or kernel

> +moving the thread to a different processor while the calculation is performed.
> +
> +
> +The main use of the this_cpu operations has been to optimize counter operations.
> +
> +
> +	this_cpu_inc(x)
> +
> +results in the following single instruction (no lock prefix!)
> +
> +	inc gs:[x]
> +
> +
> +instead of the following operations required if there is no segment register.
> +
> +	int *y;
> +	int cpu;
> +
> +	cpu = get_cpu();
> +	y = per_cpu_ptr(&x, cpu);
> +	(*y)++;
> +	put_cpu();
> +
> +
> +Note that these operations can only be used on percpu data that is reserved for
> +a specific processor. Without disabling preemption in the surrounding code
> +this_cpu_inc() will only guarantee that one of the percpu counters is correctly
> +incremented. However, there is no guarantee that the OS will not move the process
> +directly before or after the this_cpu instruction is executed. In general this
> +means that the value of the individual counters for each processor are
> +meaningless. The sum of all the per cpu counters is the only value that is of
> +interest.
> +
> +Per cpu variables are used for performance reasons. Bouncing cache lines can
> +be avoided if multiple processors concurrently go through the same code paths.
> +Since each processor has its own per cpu variables no concurrent cacheline
> +updates take place. The price that has to be paid for this optimization is
> +the need to add up the per cpu counters when the value of the counter is
> +needed.
> +
> +
> +Special operations:
> +-------------------
> +
> +	y = this_cpu_ptr(&x)
> +
> +Takes the offset of a per cpu variable (&x !) and returns the address of the
> +per cpu variable that belongs to the currently executing processor.
> +this_cpu_ptr avoids multiple steps that the common get_cpu/put_cpu sequence
> +requires. No processor number is available. Instead the offset of the local\

                                                          drop ending backslash

> +per cpu area is simply added to the percpu offset.
> +
> +
> +
> +Per cpu variables and offsets
> +-----------------------------
> +
> +Per cpu variables have *offsets* to the beginning of the percpu area. They do
> +not have addresses although they look like that in the code. Offsets
> +cannot be directly dereferenced. The offset must be added to a base pointer of
> +a percpu area of a processor in order to form a valid address.
> +
> +Therefore the use of x or &x outside of the context of per cpu operations
> +is invalid and will generally be treated like a NULL pointer dereference.
> +
> +In the context of per cpu operations
> +
> +	x is a per cpu variable. Most this_cpu operations take a cpu variable.
> +
> +	&x is the *offset* a per cpu variable. this_cpu_ptr() takes the offset
> +	of a per cpu variable which makes this look a bit strange.
> +
> +
> +
> +Operations on a field of a per cpu structure
> +--------------------------------------------
> +
> +Lets say we have a percpu structure

   Let's

> +
> +	struct s {
> +		int n,m;
> +	};
> +
> +	DEFINE_PER_CPU(struct s, p);
> +
> +
> +Operations on these fields are straightforward
> +
> +	this_cpu_inc(p.m)
> +
> +	z = this_cpu_cmpxchg(p.m, 0, 1);
> +
> +
> +If we have an offset to struct s:
> +
> +	struct s __percpu *ps = &p;
> +
> +	z = this_cpu_dec(ps->m);
> +
> +	z = this_cpu_inc_return(ps->n);
> +
> +
> +The calculation of the pointer may require the use of this_cpu_ptr() if we
> +do not make use of this_cpu ops later to manipulate fields:
> +
> +	struct s *pp;
> +
> +	pp = this_cpu_ptr(&p);
> +
> +	pp->m--

   add ;

> +
> +	z = pp->n++

   add ;

> +
> +
> +Variants of this_cpu ops
> +-------------------------
> +
> +this_cpu ops are interupt safe. Some architecture do not support these per

                    interrupt

> +cpu local operations. In that case the operation must be replaced by code
> +that disables interrupts, then does the operations that are guaranteed to be
> +atomic and then reenable interrupts. Doing so is expensive. If there are
> +other reasons why the scheduler cannot change the processor we are executing
> +on then there is no reason to disable interrupts. For that purpose
> +the __this_cpu operations are provided. F.e.

                                           E.g. or For example:

> +
> +	__this_cpu_inc(x)
> +
> +Will increment x and will not fallback to code that disables interrupts on
> +platforms that cannot accomplish atomicity through address relocation and
> +an RMV operation in the same instruction.

      RMW ?

> +
> +
> +
> +&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
> +--------------------------------------------
> +
> +The first operation takes the offset and forms an address and then adds
> +the offset of the n field.
> +
> +The second one first adds the two offsets and then does the relocation.
> +IMHO the second form looks cleaner and has an easier time with ().
> +
> +
> +Christoph Lameter, April 3rd, 2013

-- 
~Randy

* Re: this cpu documentation 2013-04-04 0:09 ` Randy Dunlap @ 2013-04-04 14:41 ` Christoph Lameter 2013-04-04 16:28 ` Tejun Heo 2013-04-04 17:19 ` Randy Dunlap 0 siblings, 2 replies; 30+ messages in thread From: Christoph Lameter @ 2013-04-04 14:41 UTC (permalink / raw) To: Randy Dunlap; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev, Tejun Heo From: Christoph Lameter <cl@linux.com> Subject: this_cpu: Add documentation V2 Document the rationale and the way to use this_cpu operations. V2: Improved after feedback from Randy Dunlap Signed-off-by: Christoph Lameter <cl@linux.com> Index: linux/Documentation/this_cpu_ops =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux/Documentation/this_cpu_ops 2013-04-04 09:40:06.431946280 -0500 @@ -0,0 +1,197 @@ +this_cpu operations +------------------- + +this_cpu operations are a way of optimizing access to per cpu variables +associated with the *currently* executing processor +through the use of segment registers (or a dedicated register where the cpu +permanently stored the beginning of the per cpu area for a specific +processor). + +The this_cpu operations add a per cpu variable offset to the processor +specific percpu base and encode that operation in the instruction operating +on the per cpu variable. + +This meanthere are no atomicity issues between the calculation +of the offset and the operation on the data. Therefore it is not necessary +to disable preempt or interrupts to ensure that the processor is not changed +between the calculation of the address and the operation on the data. + +Read-modify-write operations are of particular interest. Frequently +processors have special lower latency instructions that can operate without +the typical synchronization overhead but still provide some sort of relaxed +atomicity guarantee. 
The x86 for example can execute RMV (Read Modify Write)
+instructions like inc/dec/cmpxchg without the lock prefix and the
+associated latency penalty.
+
+Access to the variable without the lock prefix is not synchronized but
+synchronization is not necessary since we are dealing with per cpu data
+specific to the currently executing processor. Only the current processor
+should be accessing that variable and therefore there are no concurirency
+issues with other processors in the system.
+
+On x86 the fs: or the gs: segment registers contain the base of the per cpu area. It is
+then possible to simply use the segment override to relocate a per cpu relative address
+to the proper per cpu area for the processor. So the relocation to the per cpu base
+is encoded in the instruction via a segment register prefix.
+
+For example:
+
+	DEFINE_PER_CPU(int, x);
+	int z;
+
+	z = this_cpu_read(x);
+
+results in a single instruction
+
+	mov ax, gs:[x]
+
+instead of a sequence of calculation of the address and then a fetch from
+that address which occurs with the percpu operations. Before this_cpu_ops
+such sequence also required preempt disable/enable to prevent the kernel from
+moving the thread to a different processor while the calculation is performed.
+
+
+The main use of the this_cpu operations has been to optimize counter operations.
+
+
+	this_cpu_inc(x)
+
+results in the following single instruction (no lock prefix!)
+
+	inc gs:[x]
+
+
+instead of the following operations required if there is no segment register.
+
+	int *y;
+	int cpu;
+
+	cpu = get_cpu();
+	y = per_cpu_ptr(&x, cpu);
+	(*y)++;
+	put_cpu();
+
+
+Note that these operations can only be used on percpu data that is reserved for
+a specific processor. Without disabling preemption in the surrounding code
+this_cpu_inc() will only guarantee that one of the percpu counters is correctly
+incremented. However, there is no guarantee that the OS will not move the process
+directly before or after the this_cpu instruction is executed. In general this
+means that the value of the individual counters for each processor are
+meaningless. The sum of all the per cpu counters is the only value that is of
+interest.
+
+Per cpu variables are used for performance reasons. Bouncing cache lines can
+be avoided if multiple processors concurrently go through the same code paths.
+Since each processor has its own per cpu variables no concurrent cacheline
+updates take place. The price that has to be paid for this optimization is
+the need to add up the per cpu counters when the value of the counter is
+needed.
+
+
+Special operations:
+-------------------
+
+	y = this_cpu_ptr(&x)
+
+Takes the offset of a per cpu variable (&x !) and returns the address of the
+per cpu variable that belongs to the currently executing processor.
+this_cpu_ptr avoids multiple steps that the common get_cpu/put_cpu sequence
+requires. No processor number is available. Instead the offset of the local
+per cpu area is simply added to the percpu offset.
+
+
+
+Per cpu variables and offsets
+-----------------------------
+
+Per cpu variables have *offsets* to the beginning of the percpu area. They do
+not have addresses although they look like that in the code. Offsets
+cannot be directly dereferenced. The offset must be added to a base pointer of
+a percpu area of a processor in order to form a valid address.
+
+Therefore the use of x or &x outside of the context of per cpu operations
+is invalid and will generally be treated like a NULL pointer dereference.
+
+In the context of per cpu operations
+
+	x is a per cpu variable. Most this_cpu operations take a cpu variable.
+
+	&x is the *offset* a per cpu variable. this_cpu_ptr() takes the offset
+	of a per cpu variable which makes this look a bit strange.
+
+
+
+Operations on a field of a per cpu structure
+--------------------------------------------
+
+Let's say we have a percpu structure
+
+	struct s {
+		int n,m;
+	};
+
+	DEFINE_PER_CPU(struct s, p);
+
+
+Operations on these fields are straightforward
+
+	this_cpu_inc(p.m)
+
+	z = this_cpu_cmpxchg(p.m, 0, 1);
+
+
+If we have an offset to struct s:
+
+	struct s __percpu *ps = &p;
+
+	z = this_cpu_dec(ps->m);
+
+	z = this_cpu_inc_return(ps->n);
+
+
+The calculation of the pointer may require the use of this_cpu_ptr() if we
+do not make use of this_cpu ops later to manipulate fields:
+
+	struct s *pp;
+
+	pp = this_cpu_ptr(&p);
+
+	pp->m--;
+
+	z = pp->n++;
+
+
+Variants of this_cpu ops
+-------------------------
+
+this_cpu ops are interrupt safe. Some architecture do not support these per
+cpu local operations. In that case the operation must be replaced by code
+that disables interrupts, then does the operations that are guaranteed to be
+atomic and then reenable interrupts. Doing so is expensive. If there are
+other reasons why the scheduler cannot change the processor we are executing
+on then there is no reason to disable interrupts. For that purpose
+the __this_cpu operations are provided. For example.
+
+	__this_cpu_inc(x);
+
+Will increment x and will not fallback to code that disables interrupts on
+platforms that cannot accomplish atomicity through address relocation and
+an Read-Modify-Write operation in the same instruction.
+
+
+
+&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
+--------------------------------------------
+
+The first operation takes the offset and forms an address and then adds
+the offset of the n field.
+
+The second one first adds the two offsets and then does the relocation.
+IMHO the second form looks cleaner and has an easier time with (). The
+second form also is consistent with the way this_cpu_read() and friends
+are used.
+
+
+Christoph Lameter, April 3rd, 2013
+

^ permalink raw reply	[flat|nested] 30+ messages in thread
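The counter discussion in the document above can be modelled in plain userspace C. What follows is only a sketch under stated assumptions, not kernel code: sim_counter_slot, sim_this_cpu_inc() and sim_counter_sum() are hypothetical names, the CPU number is passed in by hand instead of coming from a segment register, and the 64 byte cache line is assumed. It illustrates why an individual slot says little by itself and why the reader has to add the slots up.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NR_CPUS   4		/* hypothetical CPU count for the sketch */
#define SIM_CACHELINE 64	/* assumed cache line size in bytes */

/* One counter slot per simulated CPU, padded so slots never share a line. */
struct sim_counter_slot {
	long count;
	char pad[SIM_CACHELINE - sizeof(long)];
};

static struct sim_counter_slot sim_slots[SIM_NR_CPUS];

/* Stand-in for this_cpu_inc(): the caller states which CPU it "runs on". */
static void sim_this_cpu_inc(unsigned int cpu)
{
	assert(cpu < SIM_NR_CPUS);
	sim_slots[cpu].count++;
}

/* The read side pays the price: fold every per cpu slot into one value. */
static long sim_counter_sum(void)
{
	long sum = 0;

	for (unsigned int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		sum += sim_slots[cpu].count;
	return sum;
}
```

If a task increments on CPU 0, is migrated, and then increments twice on CPU 3, the slots hold 1, 0, 0 and 2. Only the total of 3 describes what the task did.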
* Re: this cpu documentation
  2013-04-04 14:41 ` Christoph Lameter
@ 2013-04-04 16:28 ` Tejun Heo
  2013-04-04 17:19 ` Randy Dunlap
  1 sibling, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2013-04-04 16:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Randy Dunlap, Eric Dumazet, RongQing Li, Shan Wei, netdev

On Thu, Apr 04, 2013 at 02:41:08PM +0000, Christoph Lameter wrote:
> From: Christoph Lameter <cl@linux.com>
> Subject: this_cpu: Add documentation V2
>
> Document the rationale and the way to use this_cpu operations.
>
> V2: Improved after feedback from Randy Dunlap
>
> Signed-off-by: Christoph Lameter <cl@linux.com>

Updated patch applied to wq/for-3.10.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: this cpu documentation
  2013-04-04 14:41 ` Christoph Lameter
  2013-04-04 16:28 ` Tejun Heo
@ 2013-04-04 17:19 ` Randy Dunlap
  2013-04-04 17:26 ` Tejun Heo
  2013-04-04 17:40 ` Christoph Lameter
  1 sibling, 2 replies; 30+ messages in thread
From: Randy Dunlap @ 2013-04-04 17:19 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev, Tejun Heo

On 04/04/13 07:41, Christoph Lameter wrote:
> From: Christoph Lameter <cl@linux.com>
> Subject: this_cpu: Add documentation V2
>
> Document the rationale and the way to use this_cpu operations.
>
> V2: Improved after feedback from Randy Dunlap

Thanks.  I have a few more corrections to V2 (please see below).

>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/Documentation/this_cpu_ops
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux/Documentation/this_cpu_ops	2013-04-04 09:40:06.431946280 -0500
> @@ -0,0 +1,197 @@
> +this_cpu operations
> +-------------------
> +
> +this_cpu operations are a way of optimizing access to per cpu variables
> +associated with the *currently* executing processor
> +through the use of segment registers (or a dedicated register where the cpu
> +permanently stored the beginning of the per cpu area for a specific
> +processor).
> +
> +The this_cpu operations add a per cpu variable offset to the processor
> +specific percpu base and encode that operation in the instruction operating
> +on the per cpu variable.
> +
> +This meanthere are no atomicity issues between the calculation

        means there

> +of the offset and the operation on the data. Therefore it is not necessary
> +to disable preempt or interrupts to ensure that the processor is not changed
> +between the calculation of the address and the operation on the data.
> +
> +Read-modify-write operations are of particular interest. Frequently
> +processors have special lower latency instructions that can operate without
> +the typical synchronization overhead but still provide some sort of relaxed
> +atomicity guarantee. The x86 for example can execute RMV (Read Modify Write)
> +instructions like inc/dec/cmpxchg without the lock prefix and the
> +associated latency penalty.
> +
> +Access to the variable without the lock prefix is not synchronized but
> +synchronization is not necessary since we are dealing with per cpu data
> +specific to the currently executing processor. Only the current processor
> +should be accessing that variable and therefore there are no concurirency

                                                                concurrency

> +issues with other processors in the system.
> +
> +On x86 the fs: or the gs: segment registers contain the base of the per cpu area. It is
> +then possible to simply use the segment override to relocate a per cpu relative address
> +to the proper per cpu area for the processor. So the relocation to the per cpu base
> +is encoded in the instruction via a segment register prefix.
> +
> +For example:
> +
> +	DEFINE_PER_CPU(int, x);
> +	int z;
> +
> +	z = this_cpu_read(x);
> +
> +results in a single instruction
> +
> +	mov ax, gs:[x]
> +
> +instead of a sequence of calculation of the address and then a fetch from
> +that address which occurs with the percpu operations. Before this_cpu_ops
> +such sequence also required preempt disable/enable to prevent the kernel from
> +moving the thread to a different processor while the calculation is performed.
> +
> +
> +The main use of the this_cpu operations has been to optimize counter operations.
> +
> +
> +	this_cpu_inc(x)
> +
> +results in the following single instruction (no lock prefix!)
> +
> +	inc gs:[x]
> +
> +
> +instead of the following operations required if there is no segment register.
> +
> +	int *y;
> +	int cpu;
> +
> +	cpu = get_cpu();
> +	y = per_cpu_ptr(&x, cpu);
> +	(*y)++;
> +	put_cpu();
> +
> +
> +Note that these operations can only be used on percpu data that is reserved for
> +a specific processor. Without disabling preemption in the surrounding code
> +this_cpu_inc() will only guarantee that one of the percpu counters is correctly
> +incremented. However, there is no guarantee that the OS will not move the process
> +directly before or after the this_cpu instruction is executed. In general this
> +means that the value of the individual counters for each processor are
> +meaningless. The sum of all the per cpu counters is the only value that is of
> +interest.
> +
> +Per cpu variables are used for performance reasons. Bouncing cache lines can
> +be avoided if multiple processors concurrently go through the same code paths.
> +Since each processor has its own per cpu variables no concurrent cacheline
> +updates take place. The price that has to be paid for this optimization is
> +the need to add up the per cpu counters when the value of the counter is
> +needed.
> +
> +
> +Special operations:
> +-------------------
> +
> +	y = this_cpu_ptr(&x)
> +
> +Takes the offset of a per cpu variable (&x !) and returns the address of the
> +per cpu variable that belongs to the currently executing processor.
> +this_cpu_ptr avoids multiple steps that the common get_cpu/put_cpu sequence
> +requires. No processor number is available. Instead the offset of the local
> +per cpu area is simply added to the percpu offset.
> +
> +
> +
> +Per cpu variables and offsets
> +-----------------------------
> +
> +Per cpu variables have *offsets* to the beginning of the percpu area. They do
> +not have addresses although they look like that in the code. Offsets
> +cannot be directly dereferenced. The offset must be added to a base pointer of
> +a percpu area of a processor in order to form a valid address.
> +
> +Therefore the use of x or &x outside of the context of per cpu operations
> +is invalid and will generally be treated like a NULL pointer dereference.
> +
> +In the context of per cpu operations
> +
> +	x is a per cpu variable. Most this_cpu operations take a cpu variable.
> +
> +	&x is the *offset* a per cpu variable. this_cpu_ptr() takes the offset
> +	of a per cpu variable which makes this look a bit strange.
> +
> +
> +
> +Operations on a field of a per cpu structure
> +--------------------------------------------
> +
> +Let's say we have a percpu structure
> +
> +	struct s {
> +		int n,m;
> +	};
> +
> +	DEFINE_PER_CPU(struct s, p);
> +
> +
> +Operations on these fields are straightforward
> +
> +	this_cpu_inc(p.m)
> +
> +	z = this_cpu_cmpxchg(p.m, 0, 1);
> +
> +
> +If we have an offset to struct s:
> +
> +	struct s __percpu *ps = &p;
> +
> +	z = this_cpu_dec(ps->m);
> +
> +	z = this_cpu_inc_return(ps->n);
> +
> +
> +The calculation of the pointer may require the use of this_cpu_ptr() if we
> +do not make use of this_cpu ops later to manipulate fields:
> +
> +	struct s *pp;
> +
> +	pp = this_cpu_ptr(&p);
> +
> +	pp->m--;
> +
> +	z = pp->n++;
> +
> +
> +Variants of this_cpu ops
> +-------------------------
> +
> +this_cpu ops are interrupt safe. Some architecture do not support these per
> +cpu local operations. In that case the operation must be replaced by code
> +that disables interrupts, then does the operations that are guaranteed to be
> +atomic and then reenable interrupts. Doing so is expensive. If there are
> +other reasons why the scheduler cannot change the processor we are executing
> +on then there is no reason to disable interrupts. For that purpose
> +the __this_cpu operations are provided. For example.
> +
> +	__this_cpu_inc(x);
> +
> +Will increment x and will not fallback to code that disables interrupts on
> +platforms that cannot accomplish atomicity through address relocation and
> +an Read-Modify-Write operation in the same instruction.

   a

> +
> +
> +
> +&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
> +--------------------------------------------
> +
> +The first operation takes the offset and forms an address and then adds
> +the offset of the n field.
> +
> +The second one first adds the two offsets and then does the relocation.
> +IMHO the second form looks cleaner and has an easier time with (). The
> +second form also is consistent with the way this_cpu_read() and friends
> +are used.
> +
> +
> +Christoph Lameter, April 3rd, 2013


-- 
~Randy

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: this cpu documentation
  2013-04-04 17:19 ` Randy Dunlap
@ 2013-04-04 17:26 ` Tejun Heo
  2013-04-04 17:40 ` Christoph Lameter
  1 sibling, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2013-04-04 17:26 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Christoph Lameter, Eric Dumazet, RongQing Li, Shan Wei, netdev

On Thu, Apr 4, 2013 at 10:19 AM, Randy Dunlap <rdunlap@infradead.org> wrote:
> On 04/04/13 07:41, Christoph Lameter wrote:
>> From: Christoph Lameter <cl@linux.com>
>> Subject: this_cpu: Add documentation V2
>>
>> Document the rationale and the way to use this_cpu operations.
>>
>> V2: Improved after feedback from Randy Dunlap
>
> Thanks.  I have a few more corrections to V2 (please see below).

Updated the tree w/ v3.  I also re-filled all the paragraphs to 75
column.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: this cpu documentation
  2013-04-04 17:19 ` Randy Dunlap
  2013-04-04 17:26 ` Tejun Heo
@ 2013-04-04 17:40 ` Christoph Lameter
  2013-04-04 18:35 ` Randy Dunlap
  2013-04-11 17:00 ` Paul E. McKenney
  1 sibling, 2 replies; 30+ messages in thread
From: Christoph Lameter @ 2013-04-04 17:40 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev, Tejun Heo

On Thu, 4 Apr 2013, Randy Dunlap wrote:

> Thanks.  I have a few more corrections to V2 (please see below).

From: Christoph Lameter <cl@linux.com>
Subject: this_cpu: Add documentation V3

Document the rationale and the way to use this_cpu operations.

V2/V3: Improved after feedback from Randy Dunlap

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/Documentation/this_cpu_ops
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux/Documentation/this_cpu_ops	2013-04-04 12:39:38.479720028 -0500
@@ -0,0 +1,197 @@
+this_cpu operations
+-------------------
+
+this_cpu operations are a way of optimizing access to per cpu variables
+associated with the *currently* executing processor
+through the use of segment registers (or a dedicated register where the cpu
+permanently stored the beginning of the per cpu area for a specific
+processor).
+
+The this_cpu operations add a per cpu variable offset to the processor
+specific percpu base and encode that operation in the instruction operating
+on the per cpu variable.
+
+This means there are no atomicity issues between the calculation
+of the offset and the operation on the data. Therefore it is not necessary
+to disable preempt or interrupts to ensure that the processor is not changed
+between the calculation of the address and the operation on the data.
+
+Read-modify-write operations are of particular interest. Frequently
+processors have special lower latency instructions that can operate without
+the typical synchronization overhead but still provide some sort of relaxed
+atomicity guarantee. The x86 for example can execute RMV (Read Modify Write)
+instructions like inc/dec/cmpxchg without the lock prefix and the
+associated latency penalty.
+
+Access to the variable without the lock prefix is not synchronized but
+synchronization is not necessary since we are dealing with per cpu data
+specific to the currently executing processor. Only the current processor
+should be accessing that variable and therefore there are no concurirency
+issues with other processors in the system.
+
+On x86 the fs: or the gs: segment registers contain the base of the per cpu area. It is
+then possible to simply use the segment override to relocate a per cpu relative address
+to the proper per cpu area for the processor. So the relocation to the per cpu base
+is encoded in the instruction via a segment register prefix.
+
+For example:
+
+	DEFINE_PER_CPU(int, x);
+	int z;
+
+	z = this_cpu_read(x);
+
+results in a single instruction
+
+	mov ax, gs:[x]
+
+instead of a sequence of calculation of the address and then a fetch from
+that address which occurs with the percpu operations. Before this_cpu_ops
+such sequence also required preempt disable/enable to prevent the kernel from
+moving the thread to a different processor while the calculation is performed.
+
+
+The main use of the this_cpu operations has been to optimize counter operations.
+
+
+	this_cpu_inc(x)
+
+results in the following single instruction (no lock prefix!)
+
+	inc gs:[x]
+
+
+instead of the following operations required if there is no segment register.
+
+	int *y;
+	int cpu;
+
+	cpu = get_cpu();
+	y = per_cpu_ptr(&x, cpu);
+	(*y)++;
+	put_cpu();
+
+
+Note that these operations can only be used on percpu data that is reserved for
+a specific processor. Without disabling preemption in the surrounding code
+this_cpu_inc() will only guarantee that one of the percpu counters is correctly
+incremented. However, there is no guarantee that the OS will not move the process
+directly before or after the this_cpu instruction is executed. In general this
+means that the value of the individual counters for each processor are
+meaningless. The sum of all the per cpu counters is the only value that is of
+interest.
+
+Per cpu variables are used for performance reasons. Bouncing cache lines can
+be avoided if multiple processors concurrently go through the same code paths.
+Since each processor has its own per cpu variables no concurrent cacheline
+updates take place. The price that has to be paid for this optimization is
+the need to add up the per cpu counters when the value of the counter is
+needed.
+
+
+Special operations:
+-------------------
+
+	y = this_cpu_ptr(&x)
+
+Takes the offset of a per cpu variable (&x !) and returns the address of the
+per cpu variable that belongs to the currently executing processor.
+this_cpu_ptr avoids multiple steps that the common get_cpu/put_cpu sequence
+requires. No processor number is available. Instead the offset of the local
+per cpu area is simply added to the percpu offset.
+
+
+
+Per cpu variables and offsets
+-----------------------------
+
+Per cpu variables have *offsets* to the beginning of the percpu area. They do
+not have addresses although they look like that in the code. Offsets
+cannot be directly dereferenced. The offset must be added to a base pointer of
+a percpu area of a processor in order to form a valid address.
+
+Therefore the use of x or &x outside of the context of per cpu operations
+is invalid and will generally be treated like a NULL pointer dereference.
+
+In the context of per cpu operations
+
+	x is a per cpu variable. Most this_cpu operations take a cpu variable.
+
+	&x is the *offset* a per cpu variable. this_cpu_ptr() takes the offset
+	of a per cpu variable which makes this look a bit strange.
+
+
+
+Operations on a field of a per cpu structure
+--------------------------------------------
+
+Let's say we have a percpu structure
+
+	struct s {
+		int n,m;
+	};
+
+	DEFINE_PER_CPU(struct s, p);
+
+
+Operations on these fields are straightforward
+
+	this_cpu_inc(p.m)
+
+	z = this_cpu_cmpxchg(p.m, 0, 1);
+
+
+If we have an offset to struct s:
+
+	struct s __percpu *ps = &p;
+
+	z = this_cpu_dec(ps->m);
+
+	z = this_cpu_inc_return(ps->n);
+
+
+The calculation of the pointer may require the use of this_cpu_ptr() if we
+do not make use of this_cpu ops later to manipulate fields:
+
+	struct s *pp;
+
+	pp = this_cpu_ptr(&p);
+
+	pp->m--;
+
+	z = pp->n++;
+
+
+Variants of this_cpu ops
+-------------------------
+
+this_cpu ops are interrupt safe. Some architecture do not support these per
+cpu local operations. In that case the operation must be replaced by code
+that disables interrupts, then does the operations that are guaranteed to be
+atomic and then reenable interrupts. Doing so is expensive. If there are
+other reasons why the scheduler cannot change the processor we are executing
+on then there is no reason to disable interrupts. For that purpose
+the __this_cpu operations are provided. For example.
+
+	__this_cpu_inc(x);
+
+Will increment x and will not fallback to code that disables interrupts on
+platforms that cannot accomplish atomicity through address relocation and
+a Read-Modify-Write operation in the same instruction.
+
+
+
+&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
+--------------------------------------------
+
+The first operation takes the offset and forms an address and then adds
+the offset of the n field.
+
+The second one first adds the two offsets and then does the relocation.
+IMHO the second form looks cleaner and has an easier time with (). The
+second form also is consistent with the way this_cpu_read() and friends
+are used.
+
+
+Christoph Lameter, April 4th, 2013
+

^ permalink raw reply	[flat|nested] 30+ messages in thread
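The "Per cpu variables and offsets" section of the document can be illustrated with a small userspace model. This is a sketch under stated assumptions and not how the kernel implements it: sim_areas, sim_cpu, SIM_X and sim_this_cpu_ptr() are hypothetical names, a plain global stands in for the base held in the segment register, and the layout of the per cpu area is invented for the example. The point is only that the variable is an offset, and that the same offset relocates to a different address on each processor.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NR_CPUS 2	/* hypothetical CPU count */

/* Invented layout of one per cpu area; x sits at a nonzero offset. */
struct sim_area {
	long other;
	int x;
};

static struct sim_area sim_areas[SIM_NR_CPUS];

/* Stand-in for the per cpu base that x86 keeps in fs: or gs:. */
static unsigned int sim_cpu;

/*
 * The "per cpu variable" x is only an offset into an area. Using it as
 * an address directly would point near address zero, which matches the
 * document's remark about NULL pointer dereferences.
 */
#define SIM_X offsetof(struct sim_area, x)

/* Relocation: base of the local area plus the variable's offset. */
static int *sim_this_cpu_ptr(size_t offset)
{
	assert(sim_cpu < SIM_NR_CPUS);
	assert(offset + sizeof(int) <= sizeof(struct sim_area));
	return (int *)((char *)&sim_areas[sim_cpu] + offset);
}
```

Setting sim_cpu to 0 or 1 before calling sim_this_cpu_ptr(SIM_X) yields two distinct addresses, one inside each area, although the offset passed in never changes.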
* Re: this cpu documentation
  2013-04-04 17:40 ` Christoph Lameter
@ 2013-04-04 18:35 ` Randy Dunlap
  2013-04-04 18:52 ` Tejun Heo
  2013-04-11 17:00 ` Paul E. McKenney
  1 sibling, 1 reply; 30+ messages in thread
From: Randy Dunlap @ 2013-04-04 18:35 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev, Tejun Heo

On 04/04/13 10:40, Christoph Lameter wrote:
> On Thu, 4 Apr 2013, Randy Dunlap wrote:
>
>> Thanks.  I have a few more corrections to V2 (please see below).
>
> From: Christoph Lameter <cl@linux.com>
> Subject: this_cpu: Add documentation V3
>
> Document the rationale and the way to use this_cpu operations.
>
> V2/V3: Improved after feedback from Randy Dunlap
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/Documentation/this_cpu_ops
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux/Documentation/this_cpu_ops	2013-04-04 12:39:38.479720028 -0500
> @@ -0,0 +1,197 @@
> +
> +Access to the variable without the lock prefix is not synchronized but
> +synchronization is not necessary since we are dealing with per cpu data
> +specific to the currently executing processor. Only the current processor
> +should be accessing that variable and therefore there are no concurirency

                                                                concurrency
again.  but hopefully Tejun has already corrected that.

Thanks.

> +issues with other processors in the system.
> +


-- 
~Randy

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: this cpu documentation
  2013-04-04 18:35 ` Randy Dunlap
@ 2013-04-04 18:52 ` Tejun Heo
  0 siblings, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2013-04-04 18:52 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Christoph Lameter, Eric Dumazet, RongQing Li, Shan Wei, netdev

On Thu, Apr 04, 2013 at 11:35:55AM -0700, Randy Dunlap wrote:
> > +should be accessing that variable and therefore there are no concurirency
>
>                                                                 concurrency
> again.  but hopefully Tejun has already corrected that.

Yeap, the committed version is at

  https://git.kernel.org/cgit/linux/kernel/git/tj/percpu.git/tree/Documentation/this_cpu_ops.txt?h=for-3.10

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: this cpu documentation
  2013-04-04 17:40 ` Christoph Lameter
  2013-04-04 18:35 ` Randy Dunlap
@ 2013-04-11 17:00 ` Paul E. McKenney
  1 sibling, 0 replies; 30+ messages in thread
From: Paul E. McKenney @ 2013-04-11 17:00 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Randy Dunlap, Eric Dumazet, RongQing Li, Shan Wei, netdev, Tejun Heo

On Thu, Apr 04, 2013 at 05:40:38PM +0000, Christoph Lameter wrote:
> On Thu, 4 Apr 2013, Randy Dunlap wrote:
>
> > Thanks.  I have a few more corrections to V2 (please see below).
>
> From: Christoph Lameter <cl@linux.com>
> Subject: this_cpu: Add documentation V3
>
> Document the rationale and the way to use this_cpu operations.
>
> V2/V3: Improved after feedback from Randy Dunlap
>
> Signed-off-by: Christoph Lameter <cl@linux.com>

Very good to see this!!!

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Index: linux/Documentation/this_cpu_ops
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux/Documentation/this_cpu_ops	2013-04-04 12:39:38.479720028 -0500
> @@ -0,0 +1,197 @@
> +this_cpu operations
> +-------------------
> +
> +this_cpu operations are a way of optimizing access to per cpu variables
> +associated with the *currently* executing processor
> +through the use of segment registers (or a dedicated register where the cpu
> +permanently stored the beginning of the per cpu area for a specific
> +processor).
> +
> +The this_cpu operations add a per cpu variable offset to the processor
> +specific percpu base and encode that operation in the instruction operating
> +on the per cpu variable.
> +
> +This means there are no atomicity issues between the calculation
> +of the offset and the operation on the data. Therefore it is not necessary
> +to disable preempt or interrupts to ensure that the processor is not changed
> +between the calculation of the address and the operation on the data.
> +
> +Read-modify-write operations are of particular interest. Frequently
> +processors have special lower latency instructions that can operate without
> +the typical synchronization overhead but still provide some sort of relaxed
> +atomicity guarantee. The x86 for example can execute RMV (Read Modify Write)
> +instructions like inc/dec/cmpxchg without the lock prefix and the
> +associated latency penalty.
> +
> +Access to the variable without the lock prefix is not synchronized but
> +synchronization is not necessary since we are dealing with per cpu data
> +specific to the currently executing processor. Only the current processor
> +should be accessing that variable and therefore there are no concurirency
> +issues with other processors in the system.
> +
> +On x86 the fs: or the gs: segment registers contain the base of the per cpu area. It is
> +then possible to simply use the segment override to relocate a per cpu relative address
> +to the proper per cpu area for the processor. So the relocation to the per cpu base
> +is encoded in the instruction via a segment register prefix.
> +
> +For example:
> +
> +	DEFINE_PER_CPU(int, x);
> +	int z;
> +
> +	z = this_cpu_read(x);
> +
> +results in a single instruction
> +
> +	mov ax, gs:[x]
> +
> +instead of a sequence of calculation of the address and then a fetch from
> +that address which occurs with the percpu operations. Before this_cpu_ops
> +such sequence also required preempt disable/enable to prevent the kernel from
> +moving the thread to a different processor while the calculation is performed.
> +
> +
> +The main use of the this_cpu operations has been to optimize counter operations.
> +
> +
> +	this_cpu_inc(x)
> +
> +results in the following single instruction (no lock prefix!)
> +
> +	inc gs:[x]
> +
> +
> +instead of the following operations required if there is no segment register.
> +
> +	int *y;
> +	int cpu;
> +
> +	cpu = get_cpu();
> +	y = per_cpu_ptr(&x, cpu);
> +	(*y)++;
> +	put_cpu();
> +
> +
> +Note that these operations can only be used on percpu data that is reserved for
> +a specific processor. Without disabling preemption in the surrounding code
> +this_cpu_inc() will only guarantee that one of the percpu counters is correctly
> +incremented. However, there is no guarantee that the OS will not move the process
> +directly before or after the this_cpu instruction is executed. In general this
> +means that the value of the individual counters for each processor are
> +meaningless. The sum of all the per cpu counters is the only value that is of
> +interest.
> +
> +Per cpu variables are used for performance reasons. Bouncing cache lines can
> +be avoided if multiple processors concurrently go through the same code paths.
> +Since each processor has its own per cpu variables no concurrent cacheline
> +updates take place. The price that has to be paid for this optimization is
> +the need to add up the per cpu counters when the value of the counter is
> +needed.
> +
> +
> +Special operations:
> +-------------------
> +
> +	y = this_cpu_ptr(&x)
> +
> +Takes the offset of a per cpu variable (&x !) and returns the address of the
> +per cpu variable that belongs to the currently executing processor.
> +this_cpu_ptr avoids multiple steps that the common get_cpu/put_cpu sequence
> +requires. No processor number is available. Instead the offset of the local
> +per cpu area is simply added to the percpu offset.
> +
> +
> +
> +Per cpu variables and offsets
> +-----------------------------
> +
> +Per cpu variables have *offsets* to the beginning of the percpu area. They do
> +not have addresses although they look like that in the code. Offsets
> +cannot be directly dereferenced. The offset must be added to a base pointer of
> +a percpu area of a processor in order to form a valid address.
> +
> +Therefore the use of x or &x outside of the context of per cpu operations
> +is invalid and will generally be treated like a NULL pointer dereference.
> +
> +In the context of per cpu operations
> +
> +	x is a per cpu variable. Most this_cpu operations take a cpu variable.
> +
> +	&x is the *offset* a per cpu variable. this_cpu_ptr() takes the offset
> +	of a per cpu variable which makes this look a bit strange.
> +
> +
> +
> +Operations on a field of a per cpu structure
> +--------------------------------------------
> +
> +Let's say we have a percpu structure
> +
> +	struct s {
> +		int n,m;
> +	};
> +
> +	DEFINE_PER_CPU(struct s, p);
> +
> +
> +Operations on these fields are straightforward
> +
> +	this_cpu_inc(p.m)
> +
> +	z = this_cpu_cmpxchg(p.m, 0, 1);
> +
> +
> +If we have an offset to struct s:
> +
> +	struct s __percpu *ps = &p;
> +
> +	z = this_cpu_dec(ps->m);
> +
> +	z = this_cpu_inc_return(ps->n);
> +
> +
> +The calculation of the pointer may require the use of this_cpu_ptr() if we
> +do not make use of this_cpu ops later to manipulate fields:
> +
> +	struct s *pp;
> +
> +	pp = this_cpu_ptr(&p);
> +
> +	pp->m--;
> +
> +	z = pp->n++;
> +
> +
> +Variants of this_cpu ops
> +-------------------------
> +
> +this_cpu ops are interrupt safe. Some architecture do not support these per
> +cpu local operations. In that case the operation must be replaced by code
> +that disables interrupts, then does the operations that are guaranteed to be
> +atomic and then reenable interrupts. Doing so is expensive. If there are
> +other reasons why the scheduler cannot change the processor we are executing
> +on then there is no reason to disable interrupts. For that purpose
> +the __this_cpu operations are provided. For example.
> +
> +	__this_cpu_inc(x);
> +
> +Will increment x and will not fallback to code that disables interrupts on
> +platforms that cannot accomplish atomicity through address relocation and
> +a Read-Modify-Write operation in the same instruction.
> +
> +
> +
> +&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
> +--------------------------------------------
> +
> +The first operation takes the offset and forms an address and then adds
> +the offset of the n field.
> +
> +The second one first adds the two offsets and then does the relocation.
> +IMHO the second form looks cleaner and has an easier time with (). The
> +second form also is consistent with the way this_cpu_read() and friends
> +are used.
> +
> +
> +Christoph Lameter, April 4th, 2013
> +
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread
[parent not found: <alpine.DEB.2.02.1304031540110.3444@gentwo.org>]
* [PERCPU] Remove & in front of this_cpu_ptr
From: Christoph Lameter @ 2013-04-03 20:42 UTC
To: Tejun Heo; +Cc: RongQing Li, Shan Wei, netdev, Eric Dumazet

Subject: percpu: Remove & in front of this_cpu_ptr

Both

	this_cpu_ptr(&percpu_pointer->field)

[Add Offset in percpu pointer to the field offset in the struct
and then add to the local percpu base]

as well as

	&this_cpu_ptr(percpu_pointer)->field

[Add percpu variable offset to local percpu base to form an address
and then add the field offset to the address].

are correct. However, the latter looks a bit more complicated.
The first one is easier to understand. The second one may be
more difficult for the compiler to optimize as well.

Convert all of them to this_cpu_ptr(&percpu_pointer->field).

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/fs/gfs2/rgrp.c
===================================================================
--- linux.orig/fs/gfs2/rgrp.c	2013-04-03 15:25:22.576562629 -0500
+++ linux/fs/gfs2/rgrp.c	2013-04-03 15:26:43.045773676 -0500
@@ -1726,7 +1726,7 @@ static bool gfs2_rgrp_congested(const st
 	s64 var;
 
 	preempt_disable();
-	st = &this_cpu_ptr(sdp->sd_lkstats)->lkstats[LM_TYPE_RGRP];
+	st = this_cpu_ptr(&sdp->sd_lkstats->lkstats[LM_TYPE_RGRP]);
 	r_srttb = st->stats[GFS2_LKS_SRTTB];
 	r_dcount = st->stats[GFS2_LKS_DCOUNT];
 	var = st->stats[GFS2_LKS_SRTTVARB] +
Index: linux/mm/page_alloc.c
===================================================================
--- linux.orig/mm/page_alloc.c	2013-04-03 15:25:22.576562629 -0500
+++ linux/mm/page_alloc.c	2013-04-03 15:30:02.124769119 -0500
@@ -1342,7 +1342,7 @@ void free_hot_cold_page(struct page *pag
 		migratetype = MIGRATE_MOVABLE;
 	}
 
-	pcp = &this_cpu_ptr(zone->pageset)->pcp;
+	pcp = this_cpu_ptr(&zone->pageset->pcp);
 	if (cold)
 		list_add_tail(&page->lru, &pcp->lists[migratetype]);
 	else
@@ -1484,7 +1484,7 @@ again:
 		struct list_head *list;
 
 		local_irq_save(flags);
-		pcp = &this_cpu_ptr(zone->pageset)->pcp;
+		pcp = this_cpu_ptr(&zone->pageset->pcp);
 		list = &pcp->lists[migratetype];
 		if (list_empty(list)) {
 			pcp->count += rmqueue_bulk(zone, 0,
Index: linux/net/core/flow.c
===================================================================
--- linux.orig/net/core/flow.c	2013-04-03 15:25:22.576562629 -0500
+++ linux/net/core/flow.c	2013-04-03 15:26:43.045773676 -0500
@@ -328,7 +328,7 @@ static void flow_cache_flush_per_cpu(voi
 	struct flow_flush_info *info = data;
 	struct tasklet_struct *tasklet;
 
-	tasklet = &this_cpu_ptr(info->cache->percpu)->flush_tasklet;
+	tasklet = this_cpu_ptr(&info->cache->percpu->flush_tasklet);
 	tasklet->data = (unsigned long)info;
 	tasklet_schedule(tasklet);
 }
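The two address computations described in the patch message above can be modelled in plain userspace C. This is only a sketch under stated assumptions, not the kernel's implementation: a "percpu pointer" is represented as a byte offset, each simulated CPU owns a private area, and every name here (`model_this_cpu_ptr`, `struct percpu_model`, `model_area`, `model_cpu`) is hypothetical. It shows that adding the field offset before or after adding the local per-cpu base gives the same address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS    4
#define MODEL_AREA_LONGS 64

/* Hypothetical stand-in for a per-cpu struct; not a kernel type. */
struct percpu_model {
	long hash_count;
	long flush_tasklet;	/* placeholder for struct tasklet_struct */
};

/* One private area per simulated CPU; a "percpu pointer" is an offset into it. */
static long model_area[MODEL_NR_CPUS][MODEL_AREA_LONGS];
static int model_cpu;

/* Model of this_cpu_ptr(): local per-cpu base plus the per-cpu offset. */
static void *model_this_cpu_ptr(uintptr_t percpu_off)
{
	return (unsigned char *)model_area[model_cpu] + percpu_off;
}

/* this_cpu_ptr(&percpu_pointer->field): the field offset is added first. */
static long *field_offset_inside(uintptr_t percpu_off)
{
	return model_this_cpu_ptr(percpu_off +
				  offsetof(struct percpu_model, flush_tasklet));
}

/* &this_cpu_ptr(percpu_pointer)->field: the field offset is added last. */
static long *field_offset_outside(uintptr_t percpu_off)
{
	struct percpu_model *p = model_this_cpu_ptr(percpu_off);

	return &p->flush_tasklet;
}

/* Check that both forms yield the same address on every simulated CPU. */
static void check_forms_agree(uintptr_t percpu_off)
{
	for (model_cpu = 0; model_cpu < MODEL_NR_CPUS; model_cpu++)
		assert(field_offset_inside(percpu_off) ==
		       field_offset_outside(percpu_off));
	model_cpu = 0;
}
```

Calling `check_forms_agree(32)` walks all simulated CPUs; the offset must be a multiple of `sizeof(long)` and leave room for the struct inside the area. The model says nothing about which spelling sparse accepts or how the compiler optimizes either one.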
* Re: [PERCPU] Remove & in front of this_cpu_ptr
From: Tejun Heo @ 2013-04-03 21:24 UTC
To: Christoph Lameter; +Cc: RongQing Li, Shan Wei, netdev, Eric Dumazet

Hello, Christoph.

On Wed, Apr 03, 2013 at 08:42:33PM +0000, Christoph Lameter wrote:
> Subject: percpu: Remove & in front of this_cpu_ptr
>
> Both
>
> 	this_cpu_ptr(&percpu_pointer->field)
>
>
> [Add Offset in percpu pointer to the field offset in the struct
> and then add to the local percpu base]
>
> as well as
>
> 	&this_cpu_ptr(percpu_pointer)->field
>
> [Add percpu variable offset to local percpu base to form an address
> and then add the field offset to the address].
>
> are correct. However, the latter looks a bit more complicated.
> The first one is easier to understand. The second one may be
> more difficult for the compiler to optimize as well.

I don't know about this one. I actually prefer the latter in that the
pointer being passed into this_cpu_ptr() is something which is the
actual percpu pointer either from variable declaration or the
allocator. Sure, they both are just different expressions of the same
thing but the former requires an extra guarantee from percpu subsystem
that the accessors would work for pointers which aren't the exact
values defined or allocated. I'd much prefer unifying things toward
the latter than the former.

Thanks.

--
tejun
* Re: [PERCPU] Remove & in front of this_cpu_ptr
From: Eric Dumazet @ 2013-04-03 21:29 UTC
To: Tejun Heo; +Cc: Christoph Lameter, RongQing Li, Shan Wei, netdev

On Wed, 2013-04-03 at 14:24 -0700, Tejun Heo wrote:

> I don't know about this one. I actually prefer the latter in that the
> pointer being passed into this_cpu_ptr() is something which is the
> actual percpu pointer either from variable declaration or the
> allocator. Sure, they both are just different expressions of the same
> thing but the former requires an extra guarantee from percpu subsystem
> that the accessors would work for pointers which aren't the exact
> values defined or allocated. I'd much prefer unifying things toward
> the latter than the former.

I agree with you, I prefer &this_cpu_ptr(percpu_pointer)->field

The offset is added after getting the address of the (percpu) base
object.
* Re: [PERCPU] Remove & in front of this_cpu_ptr
From: Christoph Lameter @ 2013-04-04 13:52 UTC
To: Eric Dumazet; +Cc: Tejun Heo, RongQing Li, Shan Wei, netdev

On Wed, 3 Apr 2013, Eric Dumazet wrote:

> I agree with you, I prefer &this_cpu_ptr(percpu_pointer)->field
>
> The offset is added after getting the address of the (percpu) base
> object.

There are two offsets being added!

percpu_pointer is not a pointer but an offset. this_cpu_ptr creates a
pointer from the percpu base of the current processor by adding the
offset of the percpu variable.

The offset calculation better be in the parenthesis.

The method that I proposed is also conforming with the use of other
this_cpu_ops. F.e. In order to do a read one would need to do

	x = this_cpu_read(percpu_pointer->field)

	x = this_cpu_read(percpu_pointer)->field

does not work (and does not pass sparse).
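The read case mentioned in the message above can be sketched the same way. This is a hedged userspace model, not the kernel's `this_cpu_read()`: the macro `RD_MODEL_THIS_CPU_READ`, the struct and all `rd_` names are hypothetical, and `__typeof__` is a GNU extension (gcc and clang accept it). The model takes the address of the field lvalue in a template object, shifts it by the current CPU's offset, and loads the value from there, which is why the field selection has to sit inside the parentheses.

```c
#include <assert.h>
#include <stdint.h>

#define RD_NR_CPUS 2

/* Hypothetical per-cpu statistics struct; not a kernel type. */
struct rd_counters {
	long packets;
	long bytes;
};

/* The template is never read; only its address serves as the "percpu" address. */
static struct rd_counters rd_template;
/* Each simulated CPU owns one private copy. */
static struct rd_counters rd_copy[RD_NR_CPUS];
static int rd_cpu;

/* Distance from the template to the current CPU's private copy. */
static uintptr_t rd_this_cpu_offset(void)
{
	return (uintptr_t)&rd_copy[rd_cpu] - (uintptr_t)&rd_template;
}

/*
 * Model of this_cpu_read(lvalue): take the address of the lvalue in the
 * template, shift it by the current CPU's offset, then load the value.
 */
#define RD_MODEL_THIS_CPU_READ(lvalue) \
	(*(__typeof__(&(lvalue)))((uintptr_t)&(lvalue) + rd_this_cpu_offset()))

/* x = this_cpu_read(percpu_pointer->field), evaluated on a chosen CPU. */
static long rd_read_bytes_on(int cpu)
{
	struct rd_counters *percpu_pointer = &rd_template;

	rd_cpu = cpu;
	return RD_MODEL_THIS_CPU_READ(percpu_pointer->bytes);
}

/* Each simulated CPU sees its own value of the same field. */
static void rd_check_field_read(void)
{
	rd_copy[0].bytes = 100;
	rd_copy[1].bytes = 200;
	assert(rd_read_bytes_on(0) == 100);
	assert(rd_read_bytes_on(1) == 200);
}
```

The sketch covers only the form that the message says works. It does not reproduce the sparse check, and the integer round trip used to shift the pointer is a simplification chosen for illustration.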
* Re: [PERCPU] Remove & in front of this_cpu_ptr
From: Tejun Heo @ 2013-04-04 14:00 UTC
To: Christoph Lameter; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev

Hello, Christoph.

On Thu, Apr 04, 2013 at 01:52:00PM +0000, Christoph Lameter wrote:
> The method that I proposed is also conforming with the use of other
> this_cpu_ops. F.e. In order to do a read one would need to do
>
> 	x = this_cpu_read(percpu_pointer->field)
>
>
>
>
> 	x = this_cpu_read(percpu_pointer)->field
>
> does not work (and does not pass sparse).

Right, this is true, and we *do* wanna support this_cpu ops other than
this_cpu_ptr on per-cpu struct fields. The usage is still somewhat
unusual tho. Can we please add documentation in the comments too?

Thanks.

--
tejun
* Re: [PERCPU] Remove & in front of this_cpu_ptr
From: Christoph Lameter @ 2013-04-04 14:21 UTC
To: Tejun Heo; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev

On Thu, 4 Apr 2013, Tejun Heo wrote:

> Right, this is true, and we *do* wanna support this_cpu ops other than
> this_cpu_ptr on per-cpu struct fields. The usage is still somewhat
> unusual tho. Can we please add documentation in the comments too?

I posted a patch adding documentation yesterday and you took it.
???

Add comments where?
* Re: [PERCPU] Remove & in front of this_cpu_ptr
From: Tejun Heo @ 2013-04-04 14:25 UTC
To: Christoph Lameter; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev

On Thu, Apr 04, 2013 at 02:21:57PM +0000, Christoph Lameter wrote:
> On Thu, 4 Apr 2013, Tejun Heo wrote:
>
> > Right, this is true, and we *do* wanna support this_cpu ops other than
> > this_cpu_ptr on per-cpu struct fields. The usage is still somewhat
> > unusual tho. Can we please add documentation in the comments too?
>
> I posted a patch adding documentation yesterday and you took it.
> ???
>
> Add comments where?

I was thinking above this_cpu_*() ops. Let's make it as conspicuous
as reasonably possible. It's a similar problem with declaring per-cpu
arrays - there are a couple ways to do it and there's no way to
automatically reject the one which isn't preferred. I don't know.
Maybe all we can do is periodic sweep through the source tree and fix
up the "wrong" ones.

Thanks.

--
tejun
* Re: [PERCPU] Remove & in front of this_cpu_ptr
From: Christoph Lameter @ 2013-04-04 15:02 UTC
To: Tejun Heo; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev

On Thu, 4 Apr 2013, Tejun Heo wrote:

> I was thinking above this_cpu_*() ops. Let's make it as conspicuous
> as reasonably possible. It's a similar problem with declaring per-cpu
> arrays - there are a couple ways to do it and there's no way to
> automatically reject the one which isn't preferred. I don't know.
> Maybe all we can do is periodic sweep through the source tree and fix
> up the "wrong" ones.

Both ways are working just fine. I'd like to use more of these though and
would like to tighten things up a bit before doing sweeps through the
kernel.
* Re: [PERCPU] Remove & in front of this_cpu_ptr
From: Eric Dumazet @ 2013-04-04 14:29 UTC
To: Christoph Lameter; +Cc: Tejun Heo, RongQing Li, Shan Wei, netdev

On Thu, 2013-04-04 at 13:52 +0000, Christoph Lameter wrote:
> On Wed, 3 Apr 2013, Eric Dumazet wrote:
>
> > I agree with you, I prefer &this_cpu_ptr(percpu_pointer)->field
> >
> > The offset is added after getting the address of the (percpu) base
> > object.
>
> There are two offsets being added!

I was speaking of the offsetof(struct ..., field), not on the 'offset'
you think (the percpu one).

That's why I prefer &this_cpu_ptr(percpu_pointer)->field

It's clearer for me, but that's a very minor issue.
* Re: [PATCH] core: fix the use of this_cpu_ptr
From: David Miller @ 2013-03-29 19:13 UTC
To: roy.qing.li; +Cc: netdev

From: roy.qing.li@gmail.com
Date: Thu, 28 Mar 2013 17:42:41 +0800

> From: Li RongQing <roy.qing.li@gmail.com>
>
> flush_tasklet is not percpu var, and percpu is percpu var, and
> 	this_cpu_ptr(&info->cache->percpu->flush_tasklet)
> is not equal to
> 	&this_cpu_ptr(info->cache->percpu)->flush_tasklet
>
> 1f743b076(use this_cpu_ptr per-cpu helper) introduced this bug.
>
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>

Applied.