From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1760022AbXGCUn6 (ORCPT );
	Tue, 3 Jul 2007 16:43:58 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1756486AbXGCUnv (ORCPT );
	Tue, 3 Jul 2007 16:43:51 -0400
Received: from tomts20.bellnexxia.net ([209.226.175.74]:48975
	"EHLO tomts20-srv.bellnexxia.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751908AbXGCUnu (ORCPT );
	Tue, 3 Jul 2007 16:43:50 -0400
Date: Tue, 3 Jul 2007 16:43:47 -0400
From: Mathieu Desnoyers
To: "H. Peter Anvin"
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 06/10] Immediate Value - i386 Optimization
Message-ID: <20070703204347.GA8876@Krystal>
References: <20070703164046.645090494@polymtl.ca> <20070703164515.071300768@polymtl.ca> <468A9956.9050903@zytor.com> <20070703191605.GB4047@Krystal> <468AAF1F.6010909@zytor.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <468AAF1F.6010909@zytor.com>
X-Editor: vi
X-Info: http://krystal.dyndns.org:8080
X-Operating-System: Linux/2.6.21.3-grsec (i686)
X-Uptime: 16:23:10 up 2 days, 15:06, 4 users, load average: 0.60, 0.98, 0.69
User-Agent: Mutt/1.5.13 (2006-08-11)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

* H. Peter Anvin (hpa@zytor.com) wrote:
> Mathieu Desnoyers wrote:
> >
> > Hi Peter,
> >
> > I understand your concern. If you find a way to let the code be compiled
> > by gcc, put at the end of the functions (never being a branch target)
> > and then, dynamically, get the address of the branch instruction and
> > patch it, all that in cooperation with gcc, I would be glad to hear from
> > it.
> > What I found is that gcc lets us do anything that touches
> > variables/registers in an inline assembly, but does not permit to place
> > branch instructions ourselves; it does not expect the execution flow to
> > be changed in inline asms.
> >
>
> I believe this is correct. It probably would require requesting a gcc
> builtin, which might be worthwhile to do if we
>
> >
> >   77:   b8 00 00 00 00          mov    $0x0,%eax
> >   7c:   85 c0                   test   %eax,%eax
> >   7e:   0f 85 16 03 00 00       jne    39a
> > here, we just loaded 0 in eax (movl used to make sure we populate the
> > whole register so we do not stall the pipeline)
> > When we activate the site,
> > line 77 becomes: b8 01 00 00 00 mov $0x1,%eax
> >
>
> One could, though, use an indirect jump to achieve, if not as good, at
> least most of the effect:
>
> movl $,
> jmp *
>

Using a jmp * would prevent gcc from inlining inline functions and would
restrict loop unrolling (though the latter is not used in the Linux
kernel). We would also have to compute different $ for every site
generated by putting an immediate in an inline function.

> Some x86 cores will be able to detect the movl...jmp forwarding, and
> collapse it into a known branch target; however, on the ones that can't,
> it might be worse, since one would have to rely on the indirect branch
> predictor.
>
> This would, however, provide infrastructure that could be combined with
> a future gcc builtin.
>

If we can change the compiler, here is what we could do:

Tell GCC to emit NOPs that could later be replaced by a branch to some
specified alternative code. We should be able to get the instruction
pointers (think of inlines) to these nop/branch instructions so we can
change them dynamically.

Something like:

immediate_t myfunc_cond;

inline void myfunction(void)
{
	static void *insn;	/* pointer to nops/branch instruction */
	static void *target_inactive, *target_active;

	__builtin_polymorphic_if(&insn, &myfunc_cond) {
		/* Do something */
	} else {
		...
	}
}

I could then save all the insns into my immediate value section and
later activate them by looking up all of those that refer to
myfunc_cond.

The default behavior would be to branch to target_inactive, and we
could change insn to jump to target_active dynamically.

Note that we should align the jump instruction so the address could be
changed atomically in the general case (on x86 and x86_64, we have to
use an int3 bypass anyway, so we don't really care).

Also, we should find a way to let gcc tell us what type of jump it had
to use depending on how far the target of the branch is.

I suspect this would be inherently tricky. If someone is ready to do
this and tells me "yes, it will be there in 1 month", I am more than
ready to switch my markers to this and help, but since the core of my
work is kernel tracing, I have neither the time nor the resources to
tackle this problem.

In the event that someone answers "we'll do this in the following 3
years", I might consider changing the if (immediate(var)) into an
immediate_if (var) so we can later proceed to the change with simple
ifdefs without rewriting all the kernel code that would use it.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
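
For illustration only, here is a minimal user-space sketch of the
activation step discussed above: each site records which variable it
reads and where its "mov $imm32,%eax" instruction lives, and activation
rewrites the immediate of every site that refers to a given variable.
The names imm_site, imm_patch and imm_set are hypothetical and are not
taken from the patch set. Plain byte buffers stand in for kernel text,
so the int3 bypass and any atomicity or alignment handling that real
code patching needs are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record for one patch site (not the patch set's layout). */
struct imm_site {
	const void *var;	/* immediate variable this site reads */
	uint8_t *insn;		/* address of the 0xb8 opcode byte */
};

/* Overwrite the little-endian imm32 that follows "mov $imm32,%eax". */
static void imm_patch(uint8_t *insn, uint32_t value)
{
	uint8_t le[4];

	assert(insn[0] == 0xb8);	/* only this encoding is handled */
	le[0] = (uint8_t)(value & 0xff);
	le[1] = (uint8_t)((value >> 8) & 0xff);
	le[2] = (uint8_t)((value >> 16) & 0xff);
	le[3] = (uint8_t)((value >> 24) & 0xff);
	memcpy(insn + 1, le, sizeof(le));
}

/* Patch every site that refers to var; return how many were patched. */
static int imm_set(struct imm_site *sites, int nr_sites,
		   const void *var, uint32_t value)
{
	int i, patched = 0;

	for (i = 0; i < nr_sites; i++) {
		if (sites[i].var != var)
			continue;
		imm_patch(sites[i].insn, value);
		patched++;
	}
	return patched;
}
```

Calling imm_set(sites, nr_sites, &myfunc_cond, 1) on a site whose bytes
are b8 00 00 00 00 leaves b8 01 00 00 00, which matches the activated
form of line 77 in the disassembly quoted above. Sites that refer to
other variables are left untouched.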