Date: Thu, 14 Jul 2022 11:10:26 +0200
From: Sascha Hauer
To: Steven Rostedt
Cc: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, Ingo Molnar, kernel@pengutronix.de
Subject: Re: Performance impact of CONFIG_FUNCTION_TRACER
Message-ID: <20220714091026.GM2387@pengutronix.de>
References: <20220705105416.GE5208@pengutronix.de> <20220705103901.41a70cf0@rorschach.local.home> <20220705215948.GK5208@pengutronix.de> <20220705182746.4ce53681@rorschach.local.home>
In-Reply-To: <20220705182746.4ce53681@rorschach.local.home>

On Tue, Jul 05, 2022 at 06:27:46PM -0400, Steven Rostedt wrote:
> On Tue, 5 Jul 2022 23:59:48 +0200
> Sascha Hauer wrote:
>
> > >
> > > As I believe due to using a link register for function calls, ARM
> > > requires adding two 4 byte nops to every function whereas x86 only
> > > adds a single 5 byte nop.
> > >
> > > Although nops are very fast (they should not be processed in the CPU's
> > > pipeline, but I don't know if that's true for every arch). It also
> > > affects instruction cache misses, as adding 8 bytes around the code
> > > will cause more cache misses than when they do not exist.
> >
> > Just dug around a bit and saw that on ARM it's not even a real nop.
> > The compiler emits:
> >
> > 	push {lr}
> > 	bl 8010e7c0 <__gnu_mcount_nc>
> >
> > Which is then turned into a nop by replacing the second instruction with
> >
> > 	add sp, sp, #4
> >
> > to bring the stack pointer back to its original value. This indeed must
> > be processed by the CPU pipeline. I wonder if that could be optimized by
> > replacing both instructions with a nop. I have no idea though if that's
> > feasible at all or if the overhead would even get smaller by that.
>
> The problem is that there's no easy way to do that, because a task
> could have been preempted after doing the 'push {lr}' and before the
> 'bl'. Thus, you create a race by changing either one to a nop first.
>
> I wonder if it would have been better to change the first one to a jump
> past the second :-/

I gave this a try, but the performance was not better compared to the
stack push/pop operations we have now. I also tried to replace both
instructions with nops (mov r0, r0), still no better performance. I
guess we have to live with it then.

Sascha

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
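
For reference, the race Steven describes can be illustrated with a toy model of a single call site. This is only a sketch under simplifying assumptions, not the kernel's patching code: it tracks nothing but the net change to sp, it assumes a task can be preempted between the two instructions and resume after the site has been rewritten, and the names (net_sp_delta, unbalanced_interleavings, B_SKIP) are made up for illustration. It starts from the disabled state quoted above (push {lr} followed by add sp, sp, #4) and compares patching either instruction to a nop first against replacing the first instruction with a branch past the second.

```python
# Toy model of one patched call site; not the kernel's actual patching code.
# Each instruction is (mnemonic, change to sp in bytes, skips the next insn?).
PUSH_LR = ("push {lr}", -4, False)
ADD_SP = ("add sp, sp, #4", 4, False)
NOP = ("mov r0, r0", 0, False)
B_SKIP = ("b <past second insn>", 0, True)  # hypothetical placeholder mnemonic


def net_sp_delta(first, second):
    """Net sp change for a task that runs `first`, then `second` unless skipped."""
    _, delta, skips_next = first
    if not skips_next:
        delta += second[1]
    return delta


def unbalanced_interleavings(states):
    """Return (first, second, delta) cases where sp does not return to its start.

    `states` is the sequence of (first, second) instruction pairs the site
    goes through while it is patched. A task may execute the first
    instruction in state i, be preempted, and execute the second
    instruction in any state j >= i.
    """
    bad = []
    for i, (first, _) in enumerate(states):
        for j in range(i, len(states)):
            second = states[j][1]
            delta = net_sp_delta(first, second)
            entry = (first[0], second[0], delta)
            if delta != 0 and entry not in bad:
                bad.append(entry)
    return bad


disabled = (PUSH_LR, ADD_SP)  # the "nop" state described in the mail
nop_first_insn_first = [disabled, (NOP, ADD_SP), (NOP, NOP)]
nop_second_insn_first = [disabled, (PUSH_LR, NOP), (NOP, NOP)]
branch_over_second = [disabled, (B_SKIP, ADD_SP)]

for name, states in [
    ("nop first insn first", nop_first_insn_first),
    ("nop second insn first", nop_second_insn_first),
    ("branch over second insn", branch_over_second),
]:
    print(name, unbalanced_interleavings(states))
```

In this model both nop orderings leave at least one interleaving with sp off by 4 bytes, while the branch variant leaves none, because the second instruction is then only reached by tasks that already executed the push. The model says nothing about performance, which according to the measurements above did not improve with either variant.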