From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christoph Lameter
Subject: Re: Poll about irqsafe_cpu_add and others
Date: Thu, 17 Mar 2011 13:42:10 -0500 (CDT)
Message-ID:
References: <1300371834.6315.93.camel@edumazet-laptop> <20110317.081420.71114992.davem@davemloft.net> <1300380801.6315.306.camel@edumazet-laptop> <1300386569.6315.404.camel@edumazet-laptop>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Return-path:
In-Reply-To: <1300386569.6315.404.camel@edumazet-laptop>
Sender: netfilter-devel-owner@vger.kernel.org
To: Eric Dumazet
Cc: David Miller, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, netdev@vger.kernel.org, netfilter-devel@vger.kernel.org
List-Id: linux-arch.vger.kernel.org

On Thu, 17 Mar 2011, Eric Dumazet wrote:

> By the way, I noticed :
>
> DECLARE_PER_CPU(u64, xt_u64);
> __this_cpu_add(xt_u64, 2) translates to following x86_32 code :
>
>	mov    $xt_u64,%eax
>	add    %fs:0x0,%eax
>	addl   $0x2,(%eax)
>	adcl   $0x0,0x4(%eax)
>
>
> I wonder why we dont use :
>
>	addl   $0x2,%fs:xt_u64
>	addcl  $0x0,%fs:xt_u64+4

The compiler is fed the following

	*__this_cpu_ptr(xt_u64) += 2

__this_cpu_ptr makes it:

	*(xt_u64 + __my_cpu_offset) += 2

So the compiler calculates the address first and then increments it.

The compiler could optimize this I think. Wonder why that does not happen.
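
The expansion described above can be modelled in user space. The following is only a sketch under stated assumptions, not kernel code: every model_* name, the fixed CPU count and the array of per-CPU areas are hypothetical stand-ins for the real per-CPU section and __my_cpu_offset. It shows the same two steps as the generated code: first form an ordinary pointer from base plus offset, then do the 64-bit read-modify-write through it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
#define MODEL_NR_CPUS 4

/* One copy of the variable per CPU (stands in for the per-CPU section). */
struct model_percpu_area {
	uint64_t xt_u64;
};

static struct model_percpu_area model_areas[MODEL_NR_CPUS];

/* Byte offset of the current CPU's area from area 0
 * (stands in for __my_cpu_offset). */
static ptrdiff_t model_my_cpu_offset;

/* Pretend we are now running on the given CPU. */
static void model_set_cpu(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	model_my_cpu_offset = (char *)&model_areas[cpu] - (char *)&model_areas[0];
}

/* Models __this_cpu_ptr: base address plus offset gives a plain pointer
 * (the "mov $xt_u64,%eax; add %fs:0x0,%eax" step). */
static uint64_t *model_this_cpu_ptr(uint64_t *base)
{
	return (uint64_t *)((char *)base + model_my_cpu_offset);
}

/* Models the generic add: the address is computed first, then the value
 * is incremented through it (the addl/adcl pair on a 32-bit target). */
static void model_this_cpu_add(uint64_t *base, uint64_t val)
{
	uint64_t *p = model_this_cpu_ptr(base);

	*p += val;
}
```

In this model, calling model_set_cpu(2) and then model_this_cpu_add(&model_areas[0].xt_u64, 2) changes only model_areas[2].xt_u64. Because the pointer is materialised as a value before the increment, the add operates on a register-held address, which is the form seen in the first listing rather than the segment-relative form in the second.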