From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752127AbZHATjp (ORCPT ); Sat, 1 Aug 2009 15:39:45 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1752089AbZHATjo (ORCPT ); Sat, 1 Aug 2009 15:39:44 -0400
Received: from terminus.zytor.com ([198.137.202.10]:44193 "EHLO
	terminus.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752046AbZHATjo (ORCPT );
	Sat, 1 Aug 2009 15:39:44 -0400
Message-ID: <4A7499BA.2000405@zytor.com>
Date: Sat, 01 Aug 2009 12:38:34 -0700
From: "H. Peter Anvin"
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1b3pre) Gecko/20090513 Fedora/3.0-2.3.beta2.fc11 Thunderbird/3.0b2
MIME-Version: 1.0
To: Linus Torvalds
CC: Ingo Molnar , Thomas Gleixner , Linux Kernel Mailing List , Tejun Heo
Subject: Re: [GIT PULL] Additional x86 fixes for 2.6.31-rc5
References: <200907311813.n6VIDe9S023442@voreg.hos.anvin.org> <20090731195705.GA12270@elte.hu>
In-Reply-To:
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 08/01/2009 12:28 PM, Linus Torvalds wrote:
>
> Hmm.
>
> I just noticed another issue on x86 code generation, since I was looking
> at assembly language generation due to the do_sigaltstack() kernel stack
> info leak thing.
>
> Our "get_current()" seriously sucks now that it's a per-cpu variable.
>
> Look at the code generated for something like
>
> 	current->sas_ss_sp = (unsigned long) ss_sp;
> 	current->sas_ss_size = ss_size;
>
> and notice how the code really really sucks:
>
> 	movq %gs:per_cpu__current_task,%rcx
> 	movq %rdx, 1152(%rcx)
> 	movq %gs:per_cpu__current_task,%rdx
> 	movq %rax, 1160(%rdx)
>
> because it reloads that silly per-cpu variable every time, because the
> assembler has a constraint of
>
> 	"m" (per_cpu__current_task)
>
> and so gcc is worried that the stores will invalidate the result of the
> load from the per-cpu variable.
>
> I don't know how to fix that _well_, but here's a not-so-very-pretty patch
> that seems to shave off 4.5kB from my kernel, and gives gcc much better
> scheduling for 'current' and 'thread_info' because now it can load them
> early - and cache them - even in the presence of stores.
>

This is clearly better... now the semi-obvious question becomes whether
there is any way we can get compiler support to do better and migrate to
that as the compiler allows.

In particular, if I remember right the problem with using __thread for
percpu was exactly that the current cpuness can change almost anywhere,
unless preemption is disabled.  I'm wondering if we could use __thread
or something like it for the stable perthreads, perhaps with additional
compiler hints.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.
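
For illustration only, here is a rough userspace sketch of the reload
problem quoted above and of one kind of "compiler hint" that lets the
compiler cache the pointer across stores.  It is not the patch under
discussion.  Everything named here is hypothetical: struct task,
boot_task, current_task, get_current_plain(), get_current_stable() and
demo() are stand-ins, and the GCC/Clang `const` function attribute is
used only as an analogy for telling the compiler that the value does not
depend on memory the stores could touch.  It assumes current_task never
changes while the caller runs.  Whether a given compiler actually
reloads in the plain case depends on flags such as -fno-strict-aliasing,
which the kernel builds with.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct; only the two fields used above. */
struct task {
	unsigned long sas_ss_sp;
	size_t sas_ss_size;
};

static struct task boot_task;

/* Stand-in for per_cpu__current_task; assumed never written after init. */
static struct task *current_task = &boot_task;

/*
 * Plain accessor: an ordinary memory read.  A store between two calls
 * might alias current_task, so the compiler may have to read it again.
 */
__attribute__((noinline))
struct task *get_current_plain(void)
{
	return current_task;
}

/*
 * Hinted accessor: `const` promises that the result does not depend on
 * memory that changes between calls, so the compiler is allowed to merge
 * repeated calls into one, even across intervening stores.  That promise
 * only holds under the assumption stated above.
 */
__attribute__((const, noinline))
struct task *get_current_stable(void)
{
	return current_task;
}

void set_altstack_plain(unsigned long sp, size_t size)
{
	get_current_plain()->sas_ss_sp = sp;
	get_current_plain()->sas_ss_size = size;	/* may reload */
}

void set_altstack_stable(unsigned long sp, size_t size)
{
	get_current_stable()->sas_ss_sp = sp;
	get_current_stable()->sas_ss_size = size;	/* may reuse first result */
}

/* Both variants store the same values; only the generated code may differ. */
int demo(void)
{
	set_altstack_plain(0x1000, 8192);
	assert(boot_task.sas_ss_sp == 0x1000 && boot_task.sas_ss_size == 8192);

	set_altstack_stable(0x2000, 16384);
	assert(boot_task.sas_ss_sp == 0x2000 && boot_task.sas_ss_size == 16384);
	return 0;
}
```

Comparing the output of `gcc -O2 -fno-strict-aliasing -S` for the two
set_altstack_*() functions is one way to see whether the second lookup
is dropped.  The hint is only safe for values that really are stable for
the running thread, which is the distinction drawn above between percpu
data in general and the stable per-thread values.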