From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S934985AbbCDHNb (ORCPT );
	Wed, 4 Mar 2015 02:13:31 -0500
Received: from mail-wi0-f172.google.com ([209.85.212.172]:38179 "EHLO
	mail-wi0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S933558AbbCDHN3 (ORCPT );
	Wed, 4 Mar 2015 02:13:29 -0500
Date: Wed, 4 Mar 2015 08:13:24 +0100
From: Ingo Molnar
To: Borislav Petkov
Cc: X86 ML, Andy Lutomirski, LKML, Linus Torvalds
Subject: Re: [PATCH v2 07/15] x86/lib/copy_user_64.S: Convert to ALTERNATIVE_2
Message-ID: <20150304071324.GA22028@gmail.com>
References: <1424776497-3180-1-git-send-email-bp@alien8.de>
	<1424776497-3180-8-git-send-email-bp@alien8.de>
	<20150304062552.GA16111@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150304062552.GA16111@gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

* Ingo Molnar wrote:

> 
> * Borislav Petkov wrote:
> 
> > From: Borislav Petkov
> > 
> > Use the asm macro and drop the locally grown version.
> 
> > @@ -73,9 +49,11 @@ ENTRY(_copy_to_user)
> >  	jc bad_to_user
> >  	cmpq TI_addr_limit(%rax),%rcx
> >  	ja bad_to_user
> > +	ALTERNATIVE_2 "jmp copy_user_generic_unrolled", \
> > +		      "jmp copy_user_generic_string", \
> > +		      X86_FEATURE_REP_GOOD, \
> > +		      "jmp copy_user_enhanced_fast_string", \
> > +		      X86_FEATURE_ERMS
> 
> Btw., as a future optimization, wouldn't it be useful to patch this 
> function at its first instruction, i.e. to have three fully functional 
> copy_user_generic_ variants and choose to jmp to one of them in the 
> first instruction of the original function?
> 
> The advantage would be two-fold:
> 
>  1) right now: smart microarchitectures that are able to optimize 
>     jump-after-jump (and jump-after-call) targets in their branch 
>     target cache can do so in this case, reducing the overhead of 
>     the patching, possibly close to zero in the cached case.

Btw., the x86 memset() variants are using this today, and I think this 
is the most optimal jump-patching variant, even if it means a small 
amount of code duplication between the copy_user variants.

Thanks,

	Ingo