From: Mark Rutland <mark.rutland@arm.com>
To: linux-arm-kernel@lists.infradead.org
Cc: ardb@kernel.org, catalin.marinas@arm.com, james.morse@arm.com,
    joey.gouly@arm.com, mark.rutland@arm.com, maz@kernel.org,
    will@kernel.org
Subject: [PATCH 9/9] HACK: arm64: alternatives: dump summary of alternatives
Date: Thu, 1 Sep 2022 16:14:03 +0100
Message-ID: <20220901151403.1735836-10-mark.rutland@arm.com>
In-Reply-To: <20220901151403.1735836-1-mark.rutland@arm.com>

NOTE: THIS PATCH IS NOT INTENDED FOR UPSTREAM.

To figure out whether it's worth making further changes to alternatives
(e.g. whether it's worth replacing regular entries with callbacks), it
would be useful to know the makeup of alternatives in a given kernel
Image or module.

This patch makes the alternatives code dump a summary of kernel
alternatives at boot time, and module alternatives at module load time.
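
As a rough illustration of the kind of tally involved, here is a minimal
userspace sketch. It is not the kernel code: `struct mock_alt`,
`struct cap_summary`, `tally()`, `bytes_to_insns()` and `NCAPS` are
hypothetical names made up for this example, standing in for
`struct alt_instr` and `ARM64_NCAPS`.

```c
#include <stddef.h>

#define NCAPS		8	/* hypothetical; the kernel uses ARM64_NCAPS */
#define INSN_SIZE	4	/* A64 instructions are 4 bytes each */

/* Hypothetical, flattened stand-in for one alternative entry. */
struct mock_alt {
	unsigned int cap;	/* capability number */
	unsigned int orig_len;	/* original sequence length, in bytes */
	unsigned int alt_len;	/* replacement sequence length, in bytes */
	int has_cb;		/* non-zero if patched via a callback */
};

/* Per-capability totals, mirroring the columns of the summary. */
struct cap_summary {
	unsigned int entries;
	unsigned int orig_bytes;
	unsigned int repl_bytes;
	unsigned int callbacks;
};

/* Accumulate per-capability counts; out-of-range caps are skipped. */
static void tally(const struct mock_alt *alts, size_t n,
		  struct cap_summary out[NCAPS])
{
	for (size_t i = 0; i < n; i++) {
		const struct mock_alt *alt = &alts[i];
		struct cap_summary *s;

		if (alt->cap >= NCAPS)
			continue;

		s = &out[alt->cap];
		s->entries++;
		s->orig_bytes += alt->orig_len;
		s->repl_bytes += alt->alt_len;
		if (alt->has_cb)
			s->callbacks++;
	}
}

/* Convert a byte total into an instruction count for reporting. */
static unsigned int bytes_to_insns(unsigned int bytes)
{
	return bytes / INSN_SIZE;
}
```

Byte totals are accumulated first and only divided by the instruction
size when reporting, which is the same order the patch uses.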

For example, a defconfig v6.0-rc3+ kernel built with GCC 12.1.0
reports:

| alternatives: Alternatives summary:
| entries: 28000 (336000 bytes)
| regular: 17280
| callback: 10720
| instructions: 32052 (128208 bytes)
| replacements: 20962 ( 83848 bytes)
| alternatives: cpucap  1 => entries:   925 orig:  1295, repl:     0, cb:   925
| alternatives: cpucap  2 => entries:    10 orig:    10, repl:    10, cb:     0
| alternatives: cpucap  4 => entries:     2 orig:     2, repl:     2, cb:     0
| alternatives: cpucap  5 => entries:    49 orig:   142, repl:   142, cb:     0
| alternatives: cpucap 10 => entries:    36 orig:    36, repl:    36, cb:     0
| alternatives: cpucap 11 => entries:     9 orig:    12, repl:    12, cb:     0
| alternatives: cpucap 12 => entries:     3 orig:     6, repl:     6, cb:     0
| alternatives: cpucap 13 => entries:    17 orig:    17, repl:    17, cb:     0
| alternatives: cpucap 14 => entries:     3 orig:     3, repl:     3, cb:     0
| alternatives: cpucap 16 => entries:     1 orig:     1, repl:     1, cb:     0
| alternatives: cpucap 18 => entries:     7 orig:    13, repl:    13, cb:     0
| alternatives: cpucap 19 => entries:     2 orig:     2, repl:     2, cb:     0
| alternatives: cpucap 20 => entries:    17 orig:    17, repl:    17, cb:     0
| alternatives: cpucap 24 => entries:  1128 orig:  1128, repl:  1128, cb:     0
| alternatives: cpucap 26 => entries: 10780 orig: 13953, repl:  4158, cb:  9795
| alternatives: cpucap 27 => entries:    39 orig:    39, repl:    39, cb:     0
| alternatives: cpucap 28 => entries:     4 orig:     8, repl:     8, cb:     0
| alternatives: cpucap 29 => entries:    15 orig:    15, repl:    15, cb:     0
| alternatives: cpucap 30 => entries:    15 orig:    27, repl:    27, cb:     0
| alternatives: cpucap 31 => entries:     3 orig:     3, repl:     3, cb:     0
| alternatives: cpucap 32 => entries:    59 orig:   118, repl:   118, cb:     0
| alternatives: cpucap 33 => entries:     6 orig:     6, repl:     6, cb:     0
| alternatives: cpucap 36 => entries:    20 orig:    20, repl:    20, cb:     0
| alternatives: cpucap 37 => entries:  2727 orig:  2727, repl:  2727, cb:     0
| alternatives: cpucap 38 => entries:     3 orig:     3, repl:     3, cb:     0
| alternatives: cpucap 40 => entries:    25 orig:    29, repl:    29, cb:     0
| alternatives: cpucap 41 => entries:    11 orig:    21, repl:    21, cb:     0
| alternatives: cpucap 42 => entries:   142 orig:   152, repl:   152, cb:     0
| alternatives: cpucap 44 => entries:    63 orig:    63, repl:    63, cb:     0
| alternatives: cpucap 45 => entries:     4 orig:     4, repl:     4, cb:     0
| alternatives: cpucap 46 => entries:     5 orig:     5, repl:     5, cb:     0
| alternatives: cpucap 47 => entries:     2 orig:     2, repl:     2, cb:     0
| alternatives: cpucap 50 => entries:     3 orig:     3, repl:     3, cb:     0
| alternatives: cpucap 51 => entries:   105 orig:   105, repl:   105, cb:     0
| alternatives: cpucap 52 => entries:    57 orig:    59, repl:    59, cb:     0
| alternatives: cpucap 53 => entries:     3 orig:     3, repl:     3, cb:     0
| alternatives: cpucap 54 => entries:     5 orig:     5, repl:     5, cb:     0
| alternatives: cpucap 55 => entries:     1 orig:     1, repl:     1, cb:     0
| alternatives: cpucap 59 => entries:    28 orig:    28, repl:    28, cb:     0
| alternatives: cpucap 60 => entries:     2 orig:     2, repl:     2, cb:     0
| alternatives: cpucap 61 => entries:     1 orig:     1, repl:     1, cb:     0
| alternatives: cpucap 65 => entries:     2 orig:     2, repl:     2, cb:     0
| alternatives: cpucap 68 => entries:     1 orig:     1, repl:     1, cb:     0
| alternatives: cpucap 70 => entries:     1 orig:     1, repl:     1, cb:     0
| alternatives: cpucap 71 => entries:     1 orig:     3, repl:     3, cb:     0
| alternatives: cpucap 72 => entries:     1 orig:     1, repl:     1, cb:     0
| alternatives: cpucap 73 => entries:    32 orig:    32, repl:    32, cb:     0
| alternatives: cpucap 74 => entries:     4 orig:     4, repl:     4, cb:     0
| alternatives: cpucap 75 => entries:     5 orig:     5, repl:     5, cb:     0
| alternatives: cpucap 76 => entries: 11391 orig: 11391, repl: 11391, cb:     0
| alternatives: cpucap 77 => entries:     1 orig:     1, repl:     1, cb:     0
| alternatives: cpucap 78 => entries:    64 orig:   224, repl:   224, cb:     0
| alternatives: cpucap 79 => entries:   141 orig:   282, repl:   282, cb:     0
| alternatives: cpucap 80 => entries:    19 orig:    19, repl:    19, cb:     0
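
As a cross-check on how to read the byte counts in the summary above,
the following sketch reproduces them from the entry and instruction
counts. The helper names are hypothetical, and the 12-byte entry size
is an assumption inferred from the summary itself (336000 / 28000); it
is not taken from the `struct alt_instr` definition.

```c
#include <stddef.h>

#define AARCH64_INSN_BYTES	4	/* A64 instructions are fixed-width */
#define ALT_INSTR_BYTES		12	/* assumption: 336000 bytes / 28000 entries */

/* Bytes of alt_instr entry table consumed by a number of entries. */
static size_t entry_table_bytes(size_t entries)
{
	return entries * ALT_INSTR_BYTES;
}

/* Bytes of instruction text consumed by a number of instructions. */
static size_t insn_bytes(size_t insns)
{
	return insns * AARCH64_INSN_BYTES;
}
```

Applied to a single line of the table, e.g. cpucap 76 with 11391
replacement instructions, `insn_bytes(11391)` gives 45564 bytes, and
the 1128 entries for cpucap 24 give `entry_table_bytes(1128)` = 13536
bytes of entry table.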

From this, it's worth noting:

* cpucap 1 is ARM64_ALWAYS_SYSTEM.

* cpucap 24 is ARM64_HAS_IRQ_PRIO_MASKING. Due to the existing structure
  of the alternatives, alternative entries are created for the
  irqflags.h code even when CONFIG_ARM64_PSEUDO_NMI=n, creating ~14KiB
  of alt_instr entries, and ~4KiB of replacement instructions.

  This could be avoided by reworking the irqflags.h code to use the new
  alternative_has_feature_*() helpers.

* cpucap 26 is ARM64_HAS_LSE_ATOMICS, and most entries are using the
  shared NOP patcher. The other entries are for inline cmpxchg
  sequences.

* cpucap 37 is ARM64_HAS_VIRT_HOST_EXTN.

* cpucap 76 is ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE, which could be
  rewritten to use a callback to patch LDR to LDAR (or vice-versa), were
  the insn framework extended, to save ~44KiB of replacement
  instructions.
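
To make the LDR-to-LDAR idea from the last point concrete, here is a
standalone sketch of the bit manipulation such a callback might
perform. It is only an illustration: `ldr_to_ldar()` and the mask names
are hypothetical, it does not use the kernel's insn framework, and the
encodings are hand-derived from the A64 instruction layout. It assumes
the original instruction is an integer LDR/LDRH/LDRB in the
unsigned-offset form with a zero offset, since LDAR takes no offset.

```c
#include <stdbool.h>
#include <stdint.h>

/* LDR{B,H,} <Rt>, [<Xn|SP>, #imm12]: unsigned-offset form, size in [31:30] */
#define LDR_UOFF_MASK	0x3fc00000u
#define LDR_UOFF_VAL	0x39400000u
#define LDR_IMM12_MASK	0x003ffc00u

/* LDAR{B,H,} <Rt>, [<Xn|SP>]: size in [31:30], Rn in [9:5], Rt in [4:0] */
#define LDAR_BASE	0x08dffc00u

#define SIZE_MASK	0xc0000000u
#define RN_RT_MASK	0x000003ffu

/*
 * Rewrite a zero-offset LDR into the LDAR of the same access size,
 * keeping the base and destination registers. Returns false (leaving
 * *out untouched) if the instruction is not a convertible LDR.
 */
static bool ldr_to_ldar(uint32_t insn, uint32_t *out)
{
	if ((insn & LDR_UOFF_MASK) != LDR_UOFF_VAL)
		return false;

	/* LDAR has no immediate offset, so only #0 can be converted. */
	if (insn & LDR_IMM12_MASK)
		return false;

	*out = (insn & SIZE_MASK) | LDAR_BASE | (insn & RN_RT_MASK);
	return true;
}
```

For instance, `ldr w0, [x1]` (0xb9400020) would map to `ldar w0, [x1]`
(0x88dffc20), whereas `ldr w0, [x1, #4]` would be rejected.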

NOTE: THIS PATCH IS NOT INTENDED FOR UPSTREAM.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
---
arch/arm64/kernel/alternative.c | 67 ++++++++++++++++++++++++++++++++-
1 file changed, 65 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 122c59ce2772b..2f55d03adbe80 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -138,15 +138,74 @@ static void clean_dcache_range_nopatch(u64 start, u64 end)
 	} while (cur += d_size, cur < end);
 }
 
+#define for_each_region_alt(region, alt)		\
+	for (struct alt_instr *alt = (region)->begin;	\
+	     (alt) < (region)->end;			\
+	     (alt)++)
+
+void summarize_alternatives(const struct alt_region *region)
+{
+	unsigned int entries[ARM64_NCAPS] = { 0 };
+	unsigned int orig_len[ARM64_NCAPS] = { 0 };
+	unsigned int repl_len[ARM64_NCAPS] = { 0 };
+	unsigned int callbacks[ARM64_NCAPS] = { 0 };
+
+	unsigned int total_entries = 0;
+	unsigned int total_orig = 0;
+	unsigned int total_repl = 0;
+	unsigned int total_callbacks = 0;
+
+	for_each_region_alt(region, alt) {
+		int cap = ALT_CAP(alt);
+
+		entries[cap]++;
+		total_entries++;
+
+		orig_len[cap] += alt->orig_len;
+		total_orig += alt->orig_len;
+
+		repl_len[cap] += alt->alt_len;
+		total_repl += alt->alt_len;
+
+		if (ALT_HAS_CB(alt)) {
+			callbacks[cap]++;
+			total_callbacks++;
+		}
+	}
+
+	pr_info("Alternatives summary:\n"
+		" entries: %6u (%6zu bytes)\n"
+		" regular: %6d\n"
+		" callback: %6d\n"
+		" instructions: %6u (%6u bytes)\n"
+		" replacements: %6u (%6u bytes)\n",
+		total_entries, total_entries * sizeof (struct alt_instr),
+		total_entries - total_callbacks,
+		total_callbacks,
+		total_orig / AARCH64_INSN_SIZE, total_orig,
+		total_repl / AARCH64_INSN_SIZE, total_repl);
+
+	for (int i = 0; i < ARM64_NCAPS; i++) {
+		if (!entries[i])
+			continue;
+
+		pr_info("cpucap %2d => entries: %5d orig: %5d, repl: %5d, cb: %5d\n",
+			i,
+			entries[i],
+			orig_len[i] / AARCH64_INSN_SIZE,
+			repl_len[i] / AARCH64_INSN_SIZE,
+			callbacks[i]);
+	}
+}
+
 static void __nocfi __apply_alternatives(const struct alt_region *region,
 					 bool is_module,
 					 unsigned long *feature_mask)
 {
-	struct alt_instr *alt;
 	__le32 *origptr, *updptr;
 	alternative_cb_t alt_cb;
 
-	for (alt = region->begin; alt < region->end; alt++) {
+	for_each_region_alt(region, alt) {
 		int nr_inst;
 		int cap = ALT_CAP(alt);
 
@@ -245,6 +304,8 @@ void __init apply_boot_alternatives(void)
 	/* If called on non-boot cpu things could go wrong */
 	WARN_ON(smp_processor_id() != 0);
 
+	summarize_alternatives(&kernel_alternatives);
+
 	pr_info("applying boot alternatives\n");
 
 	__apply_alternatives(&kernel_alternatives, false,
@@ -262,6 +323,8 @@ void apply_alternatives_module(void *start, size_t length)
 
 	bitmap_fill(all_capabilities, ARM64_NCAPS);
 
+	summarize_alternatives(&region);
+
 	__apply_alternatives(&region, true, &all_capabilities[0]);
 }
 #endif
--
2.30.2