Date: Tue, 5 Jul 2022 18:27:46 -0400
From: Steven Rostedt
To: Sascha Hauer
Cc: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, Ingo Molnar, kernel@pengutronix.de
Subject: Re: Performance impact of CONFIG_FUNCTION_TRACER
Message-ID: <20220705182746.4ce53681@rorschach.local.home>
In-Reply-To: <20220705215948.GK5208@pengutronix.de>
References: <20220705105416.GE5208@pengutronix.de> <20220705103901.41a70cf0@rorschach.local.home> <20220705215948.GK5208@pengutronix.de>

On Tue, 5 Jul 2022 23:59:48 +0200
Sascha Hauer wrote:

> > I believe that, due to using a link register for function calls, ARM
> > requires adding two 4 byte nops to every function, whereas x86 only
> > adds a single 5 byte nop.
> >
> > Although nops are very fast (they should not be processed in the CPU's
> > pipeline, but I don't know if that's true for every arch), they also
> > affect instruction cache misses, as adding 8 bytes around the code
> > will cause more cache misses than when they do not exist.
>
> Just dug around a bit and saw that on ARM it's not even a real nop.
> The compiler emits:
>
> 	push	{lr}
> 	bl	8010e7c0 <__gnu_mcount_nc>
>
> Which is then turned into a nop by replacing the second instruction with
>
> 	add	sp, sp, #4
>
> to bring the stack pointer back to its original value.
> This indeed must be processed by the CPU pipeline. I wonder if that
> could be optimized by replacing both instructions with a nop. I have
> no idea though if that's feasible at all or if the overhead would even
> get smaller by that.

The problem is that there's no easy way to do that, because a task could
have been preempted after doing the 'push {lr}' and before the 'bl'.
Thus, you create a race by changing either one to a nop first.

I wonder if it would have been better to change the first one to a jump
past the second :-/

Actually, if you don't mind setups that take a long time, if you change
the first to a jump past the second, then do synchronize_rcu_rude()
(which may take a while, possibly several seconds or more), then you know
that all users now only see the jump, and none will see the bl. Then you
could convert the bl to a nop, and then even change the jump to a nop
after that.

To convert back, you would need to reverse it. Convert the first nop
back to a jmp, run synchronize_rcu_rude(). Then convert the second nop
to the bl, and then convert the first to the push {lr}.

> > Also, there are some configurations that use the old mcount that does
> > add some more code to handle the mcount case.
> >
> > So if this is just to have us change the kconfig, I'm happy to do that.
>
> Yes, would be good to make the kconfig text clear. The overhead itself
> is fine when people know that's the price to pay for getting the
> function tracer.

Agreed. I'll write up a patch.

-- Steve

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel