From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: [PATCH, RFC] x86/HVM: batch vCPU wakeups
Date: Wed, 10 Sep 2014 11:37:25 +0100
Message-ID: <541029E5.9020809@citrix.com>
References: <540ED7810200007800032678@mail.emea.novell.com>
 <20140909212925.GC82414@deinos.phlegethon.org>
 <540F8123.7060000@citrix.com>
 <20140910102924.GA44982@deinos.phlegethon.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta3.messagelabs.com ([195.245.230.39])
 by lists.xen.org with esmtp (Exim 4.72) (envelope-from )
 id 1XRfHH-0005vx-5t for xen-devel@lists.xenproject.org;
 Wed, 10 Sep 2014 10:37:31 +0000
In-Reply-To: <20140910102924.GA44982@deinos.phlegethon.org>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Tim Deegan
Cc: Ian Campbell, xen-devel, Keir Fraser, Ian Jackson, Jan Beulich
List-Id: xen-devel@lists.xenproject.org

On 10/09/14 11:29, Tim Deegan wrote:
> At 23:37 +0100 on 09 Sep (1410302243), Andrew Cooper wrote:
>>>> +void cpu_raise_softirq_batch_finish(void)
>>>> +{
>>>> +    unsigned int cpu, this_cpu = smp_processor_id();
>>>> +    cpumask_t *mask = &per_cpu(batch_mask, this_cpu);
>>> Again, this_cpu()?
>> ...But disagree here.  Multiple uses of this_cpu($FOO) cannot be
>> coalesced due to RELOC_HIDE() deliberately preventing optimisation.  For
>> multiple uses, pulling it out by pointer to start with results in rather
>> more efficient code.
> I wasn't questioning the pointer, but to the use of per_cpu(...,
> this_cpu) instead of this_cpu(...).  Both of those involve a
> RELOC_HIDE().
>
> Anyway, it's pretty clear from your and Jan's replies that multiple
> this_cpu() invocations are slower -- thanks for the clarification!
>
> Tim.

The difference (if any) between per_cpu() vs this_cpu() depends on
whether the compiler decides to recalculate smp_processor_id() or not.
The former is a manual optimisation made by the programmer.

I am beginning to wonder whether __attribute__((const)) might help with
get_cpu_info().  Despite the explicit stack pointer reference, which is
undoubtedly the source of the compiler's optimisation confusion, the
result of get_cpu_info() is genuinely never going to change inside a
function, even though the compiler can't necessarily prove that.

~Andrew
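
To make the per_cpu() vs this_cpu() point concrete, here is a minimal
userspace sketch.  It is not Xen's implementation: the per-CPU storage,
the percpu_slot() helper, the lookup counter and the two macros are all
hypothetical stand-ins, and smp_processor_id() is given a visible side
effect so that every lookup can be counted, roughly modelling a
compiler that will not merge the lookups.

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical stand-in for per-CPU storage: one slot per CPU. */
static unsigned long batch_mask[NR_CPUS];

/* Counts how often the CPU number is looked up (illustration only). */
static unsigned int id_lookups;

/* Model of smp_processor_id().  The counter is a side effect, so the
 * compiler must perform every call and cannot coalesce them. */
static unsigned int smp_processor_id(void)
{
    id_lookups++;
    return 1; /* pretend we always run on CPU 1 */
}

/* Hypothetical helper: bounds-checked pointer to one CPU's slot. */
static unsigned long *percpu_slot(unsigned long *base, unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    return &base[cpu];
}

/* Simplified models of the two accessors discussed in the thread. */
#define per_cpu(var, cpu) (*percpu_slot((var), (cpu)))
#define this_cpu(var)     per_cpu(var, smp_processor_id())

/* Three this_cpu() uses: the CPU number is looked up three times. */
static unsigned int lookups_with_this_cpu(void)
{
    id_lookups = 0;
    this_cpu(batch_mask) = 0;
    this_cpu(batch_mask) |= 1UL << 0;
    this_cpu(batch_mask) |= 1UL << 2;
    return id_lookups;
}

/* Look the CPU number up once and keep a pointer, as the patch does. */
static unsigned int lookups_with_cached_pointer(void)
{
    unsigned int cpu;
    unsigned long *mask;

    id_lookups = 0;
    cpu = smp_processor_id();
    mask = &per_cpu(batch_mask, cpu);
    *mask = 0;
    *mask |= 1UL << 0;
    *mask |= 1UL << 2;
    return id_lookups;
}
```

Both functions leave the same value in CPU 1's slot; only the number of
CPU-number lookups differs.  GCC documents __attribute__((const)) as a
promise that a function's result depends only on its arguments, which
would allow repeated calls to be merged by the compiler itself; whether
that promise is safe to make for get_cpu_info() is the open question
raised above.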