From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1BAEE20AF69 for ; Thu, 6 Mar 2025 09:57:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741255075; cv=none; b=HigtySCGkgU7nYrfJ3zOVUErL4Erv2ePkxi0Egl7mcitm8RoUFhzcTLUNQ4C85jMwLXQPQkPz/O7A/WQZz07rHw9X0YbDtc0rkzMRLSA8qgaFWie8xFame7PBFdaoXSrmmzJIR+GXGjM91uBQInvz1d3BnXB1jziJ+berW9y2Yo= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741255075; c=relaxed/simple; bh=Ao19rpVVji7s5n35VmpbbTgcnwYhjv4subEZAwg1LOE=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=RrIqVcDjhRDkfguY6sW/lwrpHxk6RTEzryn1dI6YvI82MFXkRL4nSHh+JyWr2kp88ol3/316qpWE9cC2Jnq78VDHJ24fbGMDwtZUgNSScT/MJ9kwiJ7cJss4jqasxVDMaJQZdwvg4WqY7rstjos+EMlNw+LKJGrBjg5XEFHVN+A= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ffjkQmJu; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ffjkQmJu" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 63E71C4CEE2; Thu, 6 Mar 2025 09:57:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1741255074; bh=Ao19rpVVji7s5n35VmpbbTgcnwYhjv4subEZAwg1LOE=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=ffjkQmJuIYIQPi9YQBxWzCIPS+FkgqyBj8qGeikJ3QcACPRAm/Uveq7qX27octhp1 qBshRrMGuj9iz78AHyCoObXfJoDX3Y0GbmGlHnWVjWOlQr0ZyW5BcqdXgCwfTyTRWM 4mZVzWTbH2x9B9uib9NVhRgZW36vxUi+p22oFxv00JVTmLL8FbCam6uq9jSLnwx/fT pKnTe3Y3Cz+Ujum7DRFTPRb/22WT48mA19sMka02RipucqxfUjILexmeRnVmbIcqnI 4jGDWRbpQOHz8Y/ow3h02o6Adav1pDebWeWz7Kmp1EvhOB5/hmoVnXGSn14QaIv1tR +z7bxChXS8UXA== Date: Thu, 6 Mar 2025 10:57:44 +0100 From: Ingo Molnar To: Uros Bizjak Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Peter Zijlstra , Thomas Gleixner , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Linus Torvalds Subject: Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic locking insns Message-ID: References: <20250228123825.2729925-1-ubizjak@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20250228123825.2729925-1-ubizjak@gmail.com> * Uros Bizjak wrote: > According to: > > https://gcc.gnu.org/onlinedocs/gcc/Size-of-an-asm.html > > the usage of asm pseudo directives in the asm template can confuse > the compiler to wrongly estimate the size of the generated > code. > > The LOCK_PREFIX macro expands to several asm pseudo directives, so > its usage in atomic locking insns causes instruction length estimate > to fail significantly (the specially instrumented compiler reports > the estimated length of these asm templates to be 6 instructions long). > > This incorrect estimate further causes unoptimal inlining decisions, > unoptimal instruction scheduling and unoptimal code block alignments > for functions that use these locking primitives. > > Use asm_inline instead: > > https://gcc.gnu.org/pipermail/gcc-patches/2018-December/512349.html > > which is a feature that makes GCC pretend some inline assembler code > is tiny (while it would think it is huge), instead of just asm. > > For code size estimation, the size of the asm is then taken as > the minimum size of one instruction, ignoring how many instructions > compiler thinks it is. > > The code size of the resulting x86_64 defconfig object file increases > for 33.264 kbytes, representing 1.2% code size increase: > > text data bss dec hex filename > 27450107 4633332 814148 32897587 1f5fa33 vmlinux-old.o > 27483371 4633784 814148 32931303 1f67de7 vmlinux-new.o > > mainly due to different inlining decisions of -O2 build. So my request here would be not more benchmark figures (I don't think it's a realistic expectation for contributors to be able to measure much of an effect with such a type of change, let alone be certain what a macro or micro-benchmark measures is causally connected with the patch), but I'd like to ask for some qualitative analysis on the code generation side: - +1.2% code size increase is a lot, especially if it's under the default build flags of the kernel. Where does the extra code come from? - Is there any effect on Clang? Are its inlining decisions around these asm() statements comparable, worse/better? A couple of concrete examples would go a long way: - "Function XXX was inlined 3 times before the patch, and it was inlined 30 times after the patch. I have reviewed two such inlining locations, and they have added more code to unlikely or failure-handling branches collected near the function epilogue, while the fast-path of the function was more optimal." Or you might end up finding: - "Function YYY was inlined 3x more frequently after the patch, but the inlining decision increased register pressure and created less optimal code in the fast-path, increasing both code size and likely decreasing fast-path performance." Obviously we'd be sad about the second case, but it's well within the spectrum of possibilities when we look at "+1.2% object code size increase". What we cannot do is to throw up our hands and claim "-O2 trades performance for size, and thus this patch improves performance". We don't know that for sure and 30 years of kernel development created a love-and-hate relationship and a fair level of distrust between kernel developers and compiler inlining decisions, especially around x86 asm() statements ... So these are roughly the high level requirements around such patches. Does this make sense? Thanks, Ingo