From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 63129C61DA4 for ; Thu, 23 Feb 2023 09:29:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References: Message-ID:Subject:Cc:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=grKhPRIj9rlwg0QewwVUO22LoVxB5O6ADKP+LYgVbXg=; b=R+idynELBr5zkY 7VLCnTPFxOTe3ANixIGR4GBBhAdlS5F8Mup2+BbO6Lv5aw1g8cz9p8zGN326yUV/HFHlORPg0BPZx eQeL1p1TQQl62JEIVoISLTCcV7gg16OjcerhFNk9f60szh2d9Ub39XwvUgh2CbRvDRdd0PcZdvNs8 RNQBAxjdZnlEUIsAWASWcJNuhzpN49Cg3O1GTWfQiCRqq4wiZzoDMjpN32og4pF9nfTB7VqsB/aHM AOij9tiTvSdQRbiTE2GRDfaOJYGeuTmgnozfjr+uHtcnrPEaqAMxJUFaxZf6ntEvDW5P4vmrtRRuK 97hj9jb19PTbnY8wwLTw==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1pV7uI-00Fhhu-VG; Thu, 23 Feb 2023 09:28:55 +0000 Received: from mail-pj1-x1033.google.com ([2607:f8b0:4864:20::1033]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1pV7uF-00FhgR-EE for linux-arm-kernel@lists.infradead.org; Thu, 23 Feb 2023 09:28:53 +0000 Received: by mail-pj1-x1033.google.com with SMTP id cp7-20020a17090afb8700b0023756229427so3256446pjb.1 for ; Thu, 23 Feb 2023 01:28:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=UjYf5J3J9XUndvOCZh6Z2mddp8OlC3USsn2OpMMQFcQ=; b=fzy43LkRNYzLBqeigNgTlVG3MgLUKnye9UyydsNIaeLaWFKo+evR+6Os+9PGcUOG8G apL/w9t1Uuf57Uo75+X97K7KP2LjXJy93YK3ueA56y02wPibb2CctXAwhOzkfe/zFP5n cNzTMz433bVZosZhDkFI48gQgs0mctxXyc4k4= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=UjYf5J3J9XUndvOCZh6Z2mddp8OlC3USsn2OpMMQFcQ=; b=t3TbJEJHuhZ5e8hsXRsuVdrxE9Zpdm4apGOqwd/i/sSHH9Om6x2xd/+qT+tbmBNs8L Ok9+rk5h/xySSbcWE+1BtRqxD6Ix0bYHZAyJaJsVhqyEIsPV+Rt4YB3bCJPtVr9jfDTb KZZMARF2i7fiIzc+l8EMimIiU8wMvDiJOapEAtb71TmXwKYOsgKtZu9NkrhomqlDisgj 1e1ZKc0GUq5NLQRHvTM9iJihwS/ePWpoRHjHshi7Wa48109L2gmp6vj7sB3skdv1tEm6 xHaXeO6hG/pohYGPG3QVTdBx98XQvAdeK26alvQphun1pplLIBEuGudEYLVv+3ADNY0s irgA== X-Gm-Message-State: AO0yUKX45Izi2zbjrf6Jcnf3eb9t8GFWaHrvWIb+A3iQO8bnhOp7NVk9 yKcdRImktTwd7f4YttVtOCqkKw== X-Google-Smtp-Source: AK7set8uXHWcK3Hy+r3E1WdhKcS7IcC6e6RewXaM+OA3MLeiDF94C8dGG47KHbC0gVknZdGAQrZdIQ== X-Received: by 2002:a05:6a20:7f8a:b0:b8:841d:85bb with SMTP id d10-20020a056a207f8a00b000b8841d85bbmr12167567pzj.0.1677144527784; Thu, 23 Feb 2023 01:28:47 -0800 (PST) Received: from google.com ([2401:fa00:1:10:24fe:6a37:3375:c5bc]) by smtp.gmail.com with ESMTPSA id c17-20020aa78c11000000b0056d7cc80ea4sm6642222pfd.110.2023.02.23.01.28.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 Feb 2023 01:28:47 -0800 (PST) Date: Thu, 23 Feb 2023 17:28:43 +0800 From: Chen-Yu Tsai To: Mark Rutland Cc: linux-arm-kernel@lists.infradead.org, Catalin Marinas , Steven Rostedt , lenb@kernel.org, linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org, mhiramat@kernel.org, ndesaulniers@google.com, ojeda@kernel.org, peterz@infradead.org, rafael.j.wysocki@intel.com, revest@chromium.org, robert.moore@intel.com, will@kernel.org Subject: Re: [PATCH v3 0/8] arm64/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS Message-ID: References: <20230123134603.1064407-1-mark.rutland@arm.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230123134603.1064407-1-mark.rutland@arm.com> X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230223_012851_521950_7848219B X-CRM114-Status: GOOD ( 42.12 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Mon, Jan 23, 2023 at 01:45:55PM +0000, Mark Rutland wrote: > Hi Catalin, Steve, > > I'm not sure how we want to merge this, so I've moved the core ftrace > patch to the start of the series so that it can more easily be placed on > a stable branch if we want that to go via the ftrace tree and the rest > to go via arm64. > > This is cleanly pasing the ftrace selftests from v6.2-rc3 (results in > the final patch). > > Aside from that, usual cover letter below. > > This series adds a new DYNAMIC_FTRACE_WITH_CALL_OPS mechanism, and > enables support for this on arm64. This significantly reduces the > overhead of tracing when a callsite/tracee has a single associated > tracer, avoids a number of issues that make it undesireably and > infeasible to use dynamically-allocated trampolines (e.g. branch range > limitations), and makes it possible to implement support for > DYNAMIC_FTRACE_WITH_DIRECT_CALLS in future. > > The main idea is to give each ftrace callsite an associated pointer to > an ftrace_ops. The architecture's ftrace_caller trampoline can recover > the ops pointer and invoke ops->func from this without needing to use > ftrace_ops_list_func, which has to iterate through all registered ops. > > To make this work, we use -fpatchable-function-entry=M,N, there N NOPs > are placed before the function entry point. On arm64 NOPs are always 4 > bytes, so by allocating 2 per-function NOPs, we have enough space to > place a 64-bit value. So that we can manipulate the pointer atomically, > we need to align instrumented functions to at least 8 bytes, which we > can ensure using -falign-functions=8. Does this work all the time? Or is it influenced by other Kconfig options? I'm getting random misaligned patch-site warnings like the following: Misaligned patch-site gic_handle_irq+0x0/0x12c Misaligned patch-site __traceiter_initcall_level+0x0/0x60 Misaligned patch-site __traceiter_initcall_start+0x0/0x60 Misaligned patch-site __traceiter_initcall_finish+0x0/0x68 Misaligned patch-site do_one_initcall+0x0/0x300 Misaligned patch-site do_one_initcall+0x2b0/0x300 Misaligned patch-site match_dev_by_label+0x0/0x50 Misaligned patch-site match_dev_by_uuid+0x0/0x48 Misaligned patch-site wait_for_initramfs+0x0/0x68 Misaligned patch-site panic_show_mem+0x0/0x88 Misaligned patch-site 0xffffffd3b4fef074 [...] (I assume the unresolved symbol(s) are from modules.) The warnings were seen on next-20230223 and many versions before, with Debian's GCC 12.2 cross compile toolchain. I also tried next-20230223 with Linaro's toolchains gcc-linaro-12.2.1-2023.01-x86_64_aarch64-linux-gnu and gcc-linaro-13.0.0-2022.11-x86_64_aarch64-linux-gnu and the warnings appeared as well. Checking panic_show_mem in various places from the Debian GCC's build: $ aarch64-linux-gnu-nm init/initramfs.o | grep panic_show_mem 0000000000000070 t panic_show_mem $ aarch64-linux-gnu-nm init/built-in.a | grep panic_show_mem 0000000000000070 t panic_show_mem $ aarch64-linux-gnu-nm built-in.a | grep panic_show_mem 0000000000000070 t panic_show_mem $ aarch64-linux-gnu-nm vmlinux.a | grep panic_show_mem 0000000000000070 t panic_show_mem $ aarch64-linux-gnu-nm vmlinux.o | grep panic_show_mem 0000000000001534 t panic_show_mem $ aarch64-linux-gnu-nm vmlinux | grep panic_show_mem ffffffc0080158dc t panic_show_mem Looks like individual object files do have functions aligned at 8-byte boundaries, but when all the object files are collected and linked together into vmlinux.o, the higher alignment gets dropped and some functions end up on 4-byte boundaries. Regards ChenYu > > Each callsite ends up looking like: > > # Aligned to 8 bytes > func - 8: > < pointer to ops > > func: > BTI // optional > MOV X9, LR > NOP // patched to `BL ftrace_caller` > func_body: > > When entering ftrace_caller, the LR points at func_body, and the > ftrace_ops can be recovered at a negative offset from this the LR value: > > BIC , LR, 0x7 // Align down (skips BTI) > LDR , [, #-16] // load ops pointer > > The ftrace_ops::func (and any other ftrace_ops fields) can then be > recovered from this pointer to the ops. > > The first three patches enable the function alignment, working around > cases where GCC drops alignment for cold functions or when building with > '-Os'. > > The final four patches implement support for > DYNAMIC_FTRACE_WITH_CALL_OPS on arm64. As noted in the final patch, this > results in a significant reduction in overhead: > > Before this series: > > Number of tracers || Total time | Per-call average time (ns) > Relevant | Irrelevant || (ns) | Total | Overhead > =========+============++=============+==============+============ > 0 | 0 || 94,583 | 0.95 | - > 0 | 1 || 93,709 | 0.94 | - > 0 | 2 || 93,666 | 0.94 | - > 0 | 10 || 93,709 | 0.94 | - > 0 | 100 || 93,792 | 0.94 | - > ---------+------------++-------------+--------------+------------ > 1 | 1 || 6,467,833 | 64.68 | 63.73 > 1 | 2 || 7,509,708 | 75.10 | 74.15 > 1 | 10 || 23,786,792 | 237.87 | 236.92 > 1 | 100 || 106,432,500 | 1,064.43 | 1063.38 > ---------+------------++-------------+--------------+------------ > 1 | 0 || 1,431,875 | 14.32 | 13.37 > 2 | 0 || 6,456,334 | 64.56 | 63.62 > 10 | 0 || 22,717,000 | 227.17 | 226.22 > 100 | 0 || 103,293,667 | 1032.94 | 1031.99 > ---------+------------++-------------+--------------+-------------- > > Note: per-call overhead is estiamated relative to the baseline case > with 0 relevant tracers and 0 irrelevant tracers. > > After this series: > > Number of tracers || Total time | Per-call average time (ns) > Relevant | Irrelevant || (ns) | Total | Overhead > =========+============++=============+==============+============ > 0 | 0 || 94,541 | 0.95 | - > 0 | 1 || 93,666 | 0.94 | - > 0 | 2 || 93,709 | 0.94 | - > 0 | 10 || 93,667 | 0.94 | - > 0 | 100 || 93,792 | 0.94 | - > ---------+------------++-------------+--------------+------------ > 1 | 1 || 281,000 | 2.81 | 1.86 > 1 | 2 || 281,042 | 2.81 | 1.87 > 1 | 10 || 280,958 | 2.81 | 1.86 > 1 | 100 || 281,250 | 2.81 | 1.87 > ---------+------------++-------------+--------------+------------ > 1 | 0 || 280,959 | 2.81 | 1.86 > 2 | 0 || 6,502,708 | 65.03 | 64.08 > 10 | 0 || 18,681,209 | 186.81 | 185.87 > 100 | 0 || 103,550,458 | 1,035.50 | 1034.56 > ---------+------------++-------------+--------------+------------ > > Note: per-call overhead is estiamated relative to the baseline case > with 0 relevant tracers and 0 irrelevant tracers. > > > This version of the series can be found in my kernel.org git repo: > > git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git > > Tagged as: > > arm64-ftrace-per-callsite-ops-20230113 > > Since v1 [1]: > * Fold in Ack from Rafael > * Update comments/commits with description of the GCC issue > * Move the cold attribute changes to compiler_types.h > * Drop the unnecessary changes to the weak attribute > * Move declaration of ftrace_ops earlier > * Clean up and improve commit messages > * Regenerate statistics on misaligned text symbols > > Since v2 [2]: > * Fold in Steve's Reviewed-by tag > * Move core ftrace patch to the start of the series > * Add ftrace selftest reults to final patch > * Use FUNCTION_ALIGNMENT_4B by default > * Fix commit message typos > > [1] https://lore.kernel.org/linux-arm-kernel/20230109135828.879136-1-mark.rutland@arm.com/ > [2] https://lore.kernel.org/linux-arm-kernel/20230113180355.2930042-1-mark.rutland@arm.com/ > > Thanks, > Mark. > > Mark Rutland (8): > ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS > Compiler attributes: GCC cold function alignment workarounds > ACPI: Don't build ACPICA with '-Os' > arm64: Extend support for CONFIG_FUNCTION_ALIGNMENT > arm64: insn: Add helpers for BTI > arm64: patching: Add aarch64_insn_write_literal_u64() > arm64: ftrace: Update stale comment > arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS > > arch/arm64/Kconfig | 4 + > arch/arm64/Makefile | 5 +- > arch/arm64/include/asm/ftrace.h | 15 +-- > arch/arm64/include/asm/insn.h | 1 + > arch/arm64/include/asm/linkage.h | 4 +- > arch/arm64/include/asm/patching.h | 2 + > arch/arm64/kernel/asm-offsets.c | 4 + > arch/arm64/kernel/entry-ftrace.S | 32 +++++- > arch/arm64/kernel/ftrace.c | 158 +++++++++++++++++++++++++++- > arch/arm64/kernel/patching.c | 17 +++ > drivers/acpi/acpica/Makefile | 2 +- > include/linux/compiler_attributes.h | 6 -- > include/linux/compiler_types.h | 27 +++++ > include/linux/ftrace.h | 18 +++- > kernel/exit.c | 9 +- > kernel/trace/Kconfig | 7 ++ > kernel/trace/ftrace.c | 109 ++++++++++++++++++- > 17 files changed, 380 insertions(+), 40 deletions(-) > > -- > 2.30.2 > > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel