All of lore.kernel.org
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Shuai Xue" <xueshuai@linux.alibaba.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [RFC PATCH v1 2/5] perf disasm: Add ARM64 architecture detection and raw instruction parsing
Date: Tue, 23 Jun 2026 13:19:10 +0000	[thread overview]
Message-ID: <20260623131911.803831F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260623130234.8709-3-xueshuai@linux.alibaba.com>

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [High] `disasm_line__parse_arm64` corrupts disassembly lines from LLVM and Capstone by assuming the first word is always raw hex bytes.
- [High] Assigning `dl->ins.name = ""` leads to a glibc crash (`free(): invalid pointer`) during teardown.
- [Low] Missing explicit inclusion of `<stdio.h>` for `sscanf`.

Pre-existing issues:
- [High] Reads past the end of the line buffer when processing raw hexadecimal instructions.
--

commit 76213afac23a5e082aabd6700958b58201fa5b8a
Author: Shuai Xue <xueshuai@linux.alibaba.com>

perf disasm: Add ARM64 architecture detection and raw instruction parsing

This commit adds an architecture detection helper for ARM64 binaries and
introduces a new parser to extract raw 32-bit instruction words from ARM64
objdump output. It wires up the new ARM64 parsing path alongside the existing
PowerPC implementation.

> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> index 59ba88e1f7443..83fad4f01442b 100644
> --- a/tools/perf/util/disasm.c
> +++ b/tools/perf/util/disasm.c
> @@ -902,6 +916,53 @@ static int disasm_line__parse_powerpc(struct disasm_line *dl, struct annotate_ar

[Severity: High]
This is a pre-existing issue, but does the pointer arithmetic in
disasm_line__parse_powerpc() read past the end of the line buffer?

char *tmp_raw_insn, *name_raw_insn = skip_spaces(line);
char *name = skip_spaces(name_raw_insn + RAW_BYTES);

If symbol__disassemble_raw() populates the line argument with an 8-character
string, and RAW_BYTES is hardcoded to 11, could skip_spaces() end up reading
from uninitialized stack memory?

> +/*
> + * Parses ARM64 disassembly output which includes raw instruction bytes.
> + * ARM64 objdump format:
> + *   a9bf7bfd 	stp	x29, x30, [sp, #-16]!
> + *
> + * The raw instruction is a hex word (typically 8 chars) followed by whitespace.
> + */
> +static int disasm_line__parse_arm64(struct disasm_line *dl, struct annotate_args *args)
> +{
> +	char *line = dl->al.line;
> +	const char **namep = &dl->ins.name;
> +	char **rawp = &dl->ops.raw;
> +	char *name_raw_insn = skip_spaces(line);
> +	char *end_raw, *name, *tmp_raw_insn;
> +	int ret = 0;
> +
> +	if (name_raw_insn[0] == '\0')
> +		return -1;
> +
> +	/* Find end of raw instruction hex by looking for whitespace */
> +	end_raw = name_raw_insn;
> +	while (*end_raw && !isspace(*end_raw))
> +		end_raw++;
> +
> +	name = skip_spaces(end_raw);
> +
> +	if (args->options->disassembler_used)
> +		ret = disasm_line__parse(name, namep, rawp);

[Severity: High]
If a user analyzes a binary with the capstone or llvm disassemblers, the
output lacks raw hexadecimal bytes. Does this loop assume the first
non-whitespace token is always the raw hex instruction?

If so, could this isolate the instruction mnemonic by mistake, and cause the
remaining operands to be parsed incorrectly as the instruction mnemonic?

> +	else
> +		*namep = "";

[Severity: High]
When perf annotate is run with the source flag, the disassembler_used option
evaluates to 0. This assigns a read-only string literal to the namep pointer.

Later on, disasm_line__free() unconditionally attempts to free this memory:

void disasm_line__free(struct disasm_line *dl)
{
        zfree(&dl->ins.name);
        ...

Will attempting to free this string literal cause a glibc crash?

> +
> +	tmp_raw_insn = strndup(name_raw_insn, end_raw - name_raw_insn);
> +	if (tmp_raw_insn == NULL) {
> +		if (args->options->disassembler_used)
> +			zfree(namep);
> +		return -1;
> +	}
> +
> +	remove_spaces(tmp_raw_insn);
> +
> +	if (sscanf(tmp_raw_insn, "%x", &dl->raw.raw_insn) != 1)

[Severity: Low]
Does this new usage of sscanf require an explicit include of <stdio.h> at the
top of the file to prevent musl libc build failures?

> +		dl->raw.raw_insn = 0;
> +	free(tmp_raw_insn);
> +
> +	return ret;
> +}
> +

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260623130234.8709-1-xueshuai@linux.alibaba.com?part=2

  reply	other threads:[~2026-06-23 13:19 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-23 13:02 [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 1/5] perf annotate-data: Widen type_state_reg::imm_value to u64 Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 2/5] perf disasm: Add ARM64 architecture detection and raw instruction parsing Shuai Xue
2026-06-23 13:19   ` sashiko-bot [this message]
2026-06-23 13:02 ` [RFC PATCH v1 3/5] perf dwarf-regs: Add ARM64 register and offset extraction from raw instructions Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 4/5] perf annotate: Wire up ARM64 data type profiling infrastructure Shuai Xue
2026-06-23 13:02 ` [RFC PATCH v1 5/5] perf annotate-arch: Add ARM64 data type profiling support Shuai Xue
2026-06-23 13:32   ` sashiko-bot
2026-06-23 16:56 ` [RFC PATCH v1 0/5] perf annotate: " Namhyung Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260623131911.803831F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=xueshuai@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.