From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 26 Nov 2025 15:18:48 +0000
From: david laight
To: Uros Bizjak
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin"
Subject: Re: [PATCH] x86: Optimize __get_user_asm() assembly
Message-ID: <20251126151848.2f57d5d4@pumpkin>
In-Reply-To: <20251126132352.448052-1-ubizjak@gmail.com>
References: <20251126132352.448052-1-ubizjak@gmail.com>
X-Mailer: Claws Mail 4.1.1 (GTK 3.24.38; arm-unknown-linux-gnueabihf)
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Wed, 26 Nov 2025 14:23:32 +0100
Uros Bizjak wrote:

> On x86, byte and word moves (MOVB, MOVW) write only to the low
> portion of a 32-bit register, leaving the upper bits unchanged.
> Modern compilers therefore prefer using MOVZBL and MOVZWL to load
> 8-bit and 16-bit values with zero-extension to the full register
> width.
>
> Update __get_user_asm() to follow this convention by explicitly
> zero-extending 8-bit and 16-bit loads from memory.
>
> An additional benefit of this change is that it enables the full
> integer register set to be used for 8-bit loads. Also, it
> eliminates the need for manual zero-extension of 8-bit values.
>
> There is only a minimal increase in code size:

Interesting, where does that come from?
I'd have thought it should shrink it.
The mov[sz]x is one byte longer.
Perhaps a lot of 8-bit values are never zero-extended?

I'm trying to remember what the magic 'k' is for.
Does gas treat %krxx as %dxx?
Which would matter if 'x' is 64bit when using "=r" for 32bit reads.
But, in that case, I think you need it on the 'movl' as well.

There is a slight problem (which affects a lot of the asm): gcc
(I think) can't assume that the high 32bits of a register result
from an inline asm function are zero.
But if you return 'u64' then everything at the call site has to be
extended as well.
I wonder (I've not looked at the generated code yet) whether casting
the u64 result from the function to u32 has the effect of avoiding
the zero extend if the value is needed as a 64bit one.
This may (or may not) affect get_user() but will affect the
'bit scan' functions.

This means you might get better code if 'x' is always u64 (with gas
generating instructions that write to the low 32 bits and zero the
high ones) but with a cast to u32 before any value can be used
(esp. when inlined).

Of course, I might be smoking substances again...

	David

>
>     text    data     bss      dec     hex filename
> 28258526 4810554  740932 33810012 203e65c vmlinux-old.o
> 28258550 4810554  740932 33810036 203e674 vmlinux-new.o
>
> Signed-off-by: Uros Bizjak
> Cc: Thomas Gleixner
> Cc: Ingo Molnar
> Cc: Borislav Petkov
> Cc: Dave Hansen
> Cc: "H. Peter Anvin"
> -
> ---
>  arch/x86/include/asm/uaccess.h | 40 +++++++++++++++++-----------------------
>  1 file changed, 17 insertions(+), 23 deletions(-)
>
> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> index 367297b188c3..cb463c1301f6 100644
> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -260,30 +260,27 @@ do { \
>  	unsigned int __gu_low, __gu_high; \
>  	const unsigned int __user *__gu_ptr; \
>  	__gu_ptr = (const void __user *)(ptr); \
> -	__get_user_asm(__gu_low, __gu_ptr, "l", "=r", label); \
> -	__get_user_asm(__gu_high, __gu_ptr+1, "l", "=r", label); \
> +	__get_user_asm(__gu_low, __gu_ptr, "movl", "", label); \
> +	__get_user_asm(__gu_high, __gu_ptr+1, "movl", "", label); \
>  	(x) = ((unsigned long long)__gu_high << 32) | __gu_low; \
>  } while (0)
>  #else
>  #define __get_user_asm_u64(x, ptr, label) \
> -	__get_user_asm(x, ptr, "q", "=r", label)
> +	__get_user_asm(x, ptr, "movq", "", label)
>  #endif
>
>  #define __get_user_size(x, ptr, size, label) \
>  do { \
>  	__chk_user_ptr(ptr); \
>  	switch (size) { \
> -	case 1: { \
> -		unsigned char x_u8__; \
> -		__get_user_asm(x_u8__, ptr, "b", "=q", label); \
> -		(x) = x_u8__; \
> +	case 1: \
> +		__get_user_asm(x, ptr, "movzbl", "k", label); \
>  		break; \
> -	} \
>  	case 2: \
> -		__get_user_asm(x, ptr, "w", "=r", label); \
> +		__get_user_asm(x, ptr, "movzwl", "k", label); \
>  		break; \
>  	case 4: \
> -		__get_user_asm(x, ptr, "l", "=r", label); \
> +		__get_user_asm(x, ptr, "movl", "", label); \
>  		break; \
>  	case 8: \
>  		__get_user_asm_u64(x, ptr, label); \
> @@ -294,11 +291,11 @@ do { \
>  	instrument_get_user(x); \
>  } while (0)
>
> -#define __get_user_asm(x, addr, itype, ltype, label) \
> +#define __get_user_asm(x, addr, insn, opmod, label) \
>  	asm_goto_output("\n" \
> -		     "1: mov"itype" %[umem],%[output]\n" \
> +		     "1: " insn " %[umem],%" opmod "[output]\n" \
>  		     _ASM_EXTABLE_UA(1b, %l2) \
> -		     : [output] ltype(x) \
> +		     : [output] "=r" (x) \
>  		     : [umem] "m" (__m(addr)) \
>  		     : : label)
>
> @@ -326,26 +323,23 @@ do { \
>  })
>
>  #else
> -#define __get_user_asm_u64(x, ptr, retval) \
> -	__get_user_asm(x, ptr, retval, "q")
> +#define __get_user_asm_u64(x, ptr, retval) \
> +	__get_user_asm(x, ptr, "movq", "", retval)
>  #endif
>
>  #define __get_user_size(x, ptr, size, retval) \
>  do { \
> -	unsigned char x_u8__; \
> - \
>  	retval = 0; \
>  	__chk_user_ptr(ptr); \
>  	switch (size) { \
>  	case 1: \
> -		__get_user_asm(x_u8__, ptr, retval, "b"); \
> -		(x) = x_u8__; \
> +		__get_user_asm(x, ptr, "movzbl", "k", retval); \
>  		break; \
>  	case 2: \
> -		__get_user_asm(x, ptr, retval, "w"); \
> +		__get_user_asm(x, ptr, "movzwl", "k", retval); \
>  		break; \
>  	case 4: \
> -		__get_user_asm(x, ptr, retval, "l"); \
> +		__get_user_asm(x, ptr, "movl", "", retval); \
>  		break; \
>  	case 8: \
>  		__get_user_asm_u64(x, ptr, retval); \
> @@ -355,9 +349,9 @@ do { \
>  	} \
>  } while (0)
>
> -#define __get_user_asm(x, addr, err, itype) \
> +#define __get_user_asm(x, addr, insn, opmod, err) \
>  	asm volatile("\n" \
> -		     "1: mov"itype" %[umem],%[output]\n" \
> +		     "1: " insn " %[umem],%" opmod "[output]\n" \
>  		     "2:\n" \
>  		     _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_EFAULT_REG | \
>  					   EX_FLAG_CLEAR_AX, \