From: Quentin Monnet <quentin@isovalent.com>
To: Yonghong Song <yhs@fb.com>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: "Alexei Starovoitov" <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Andrii Nakryiko" <andrii@kernel.org>,
"Martin KaFai Lau" <kafai@fb.com>,
"Song Liu" <songliubraving@fb.com>,
"John Fastabend" <john.fastabend@gmail.com>,
"KP Singh" <kpsingh@kernel.org>,
"Stanislav Fomichev" <sdf@google.com>,
"Hao Luo" <haoluo@google.com>, "Jiri Olsa" <jolsa@kernel.org>,
bpf <bpf@vger.kernel.org>,
"Niklas Söderlund" <niklas.soderlund@corigine.com>,
"Simon Horman" <simon.horman@corigine.com>
Subject: Re: [PATCH bpf-next 6/7] bpftool: Add LLVM as default library for disassembling JIT-ed programs
Date: Sun, 11 Sep 2022 21:13:33 +0100 [thread overview]
Message-ID: <02cdc01d-bac0-20cc-945d-7a6a46028cdc@isovalent.com> (raw)
In-Reply-To: <ca1d357f-7901-8b01-65c2-a832f49deb81@fb.com>
On 07/09/2022 19:02, Yonghong Song wrote:
>
>
> On 9/7/22 9:33 AM, Quentin Monnet wrote:
>> On 07/09/2022 17:10, Alexei Starovoitov wrote:
>>> On Wed, Sep 7, 2022 at 7:20 AM Quentin Monnet <quentin@isovalent.com>
>>> wrote:
>>>>
>>>> On 07/09/2022 00:46, Alexei Starovoitov wrote:
>>>>> On Tue, Sep 6, 2022 at 6:36 AM Quentin Monnet
>>>>> <quentin@isovalent.com> wrote:
>>>>>> # bpftool prog dump jited id 56
>>>>>> bpf_prog_6deef7357e7b4530:
>>>>>> 0: nopl (%rax,%rax)
>>>>>> 5: nop
>>>>>> 7: pushq %rbp
>>>>>> 8: movq %rsp, %rbp
>>>>>> b: pushq %rbx
>>>>>> c: pushq %r13
>>>>>> e: pushq %r14
>>>>>> 10: movq %rdi, %rbx
>>>>>> 13: movzwq 176(%rbx), %r13
>>>>>> 1b: xorl %r14d, %r14d
>>>>>> 1e: orl $2, %r14d
>>>>>> 22: movl $1, %eax
>>>>>> 27: cmpq $2, %r14
>>>>>> 2b: jne 2
>>>>>> 2d: xorl %eax, %eax
>>>>>> 2f: popq %r14
>>>>>
>>>>> If I'm reading the asm correctly the difference is significant.
>>>>> jne 0x2f was an absolute address and jmps were easy
>>>>> to follow.
>>>>> While in llvm disasm it's 'jne 2' ?! What is 2 ?
>>>>> 2 bytes from the next insn of 0x2d ?
>>>>
>>>> Yes, that's it. Apparently, this is how the operand is encoded, and
>>>> libbfd does the translation to the absolute address:
>>>>
>>>> # bpftool prog dump jited id 7868 opcodes
>>>> [...]
>>>> 2b: jne 0x000000000000002f
>>>> 75 02
>>>> [...]
>>>>
>>>> The same difference is observable between objdump and llvm-objdump
>>>> on an
>>>> x86-64 binary for example, although they usually have labels to
>>>> refer to
>>>> ("jne -22 <_obstack_memory_used+0x7d0>"), making the navigation
>>>> easier. The only mention I could find of that difference is a report
>>>> from 2013 [0].
>>>>
>>>> [0] https://discourse.llvm.org/t/llvm-objdump-disassembling-jmp/29584/2
>>>>
>>>>> That is super hard to read.
>>>>> Is there a way to tune/configure llvm disasm?
>>>>
>>>> There's a function and some options to tune it, but I tried them and
>>>> none applies to converting the jump operands.
>>>>
>>>> int LLVMSetDisasmOptions(LLVMDisasmContextRef DC, uint64_t
>>>> Options);
>>>>
>>>> /* The option to produce marked up assembly. */
>>>> #define LLVMDisassembler_Option_UseMarkup 1
>>>> /* The option to print immediates as hex. */
>>>> #define LLVMDisassembler_Option_PrintImmHex 2
>>>> /* The option use the other assembler printer variant */
>>>> #define LLVMDisassembler_Option_AsmPrinterVariant 4
>>>> /* The option to set comment on instructions */
>>>> #define LLVMDisassembler_Option_SetInstrComments 8
>>>> /* The option to print latency information alongside
>>>> instructions */
>>>> #define LLVMDisassembler_Option_PrintLatency 16
>>>>
>>>> I found that LLVMDisassembler_Option_AsmPrinterVariant read better,
>>>> although in my patch I kept the default output which looked closer to
>>>> the existing from libbfd. Here's what the option produces:
>>>>
>>>> bpf_prog_6deef7357e7b4530:
>>>> 0: nop dword ptr [rax + rax]
>>>> 5: nop
>>>> 7: push rbp
>>>> 8: mov rbp, rsp
>>>> b: push rbx
>>>> c: push r13
>>>> e: push r14
>>>> 10: mov rbx, rdi
>>>> 13: movzx r13, word ptr [rbx + 180]
>>>> 1b: xor r14d, r14d
>>>> 1e: or r14d, 2
>>>> 22: mov eax, 1
>>>> 27: cmp r14, 2
>>>> 2b: jne 2
>>>> 2d: xor eax, eax
>>>> 2f: pop r14
>>>> 31: pop r13
>>>> 33: pop rbx
>>>> 34: leave
>>>> 35: re
>>>>
>>>> But the jne operand remains a '2'. I'm not aware of any option to
>>>> change
>>>> it in LLVM's disassembler :(.
>>>
>>> Hmm. llvm-objdump -d test_maps
>>> looks fine:
>>> 41bfcb: e8 6f f7 ff ff callq 0x41b73f
>>> <find_extern_btf_id>
>>>
>>> the must be something llvm disasm is missing when you feed raw bytes
>>> into it.
>>> Please keep investigating. In this form I'm afraid it's no go.
>>
>> OK, I'll keep looking
>
> Quentin, if eventually there is no existing solution for this problem,
> we could improve llvm API to encode branch target in more
> easy-to-understand form.
>
TL;DR: I figured it out, with no LLVM fix required. I'll send a v2.
---
The details:
In my previous example, I ran on Ubuntu 20.04 with llvm-objdump v10.
Looking at just the v11 with the same command, I get the relative
addresses like in Alexei's output:
$ llvm-objdump-10 -d /usr/bin/grep | grep -m1 jne
4f20: 0f 85 e7 0b 00 00 jne 3047 <.text+0xddd>
$ llvm-objdump-11 -d /usr/bin/grep | grep -m1 jne
4f20: 0f 85 e7 0b 00 00 jne 0x5b0d <.text+0xddd>
$ printf '%#x\n' $((0x4f20 + 3047 + 6))
0x5b0d
The change was introduced between the two versions by the following commits:
5fad05e80dd0 ("[MCInstPrinter] Pass `Address` parameter to
MCOI::OPERAND_PCREL typed operands. NFC")
https://reviews.llvm.org/D76574
87de9a0786d9 ("[X86InstPrinter] Change printPCRelImm to print the
target address in hexadecimal form")
https://reviews.llvm.org/D76580
The first one in particular introduces a PrintBranchImmAsAddress
variable in the MCInstPrinter, and makes llvm-obdump call
"setPrintBranchImmAsAddress(true);".
Now, this function is not exposed to the C interface, llvm-c, that
bpftool would use. Instead, it's available through the more
comprehensive C++ interface that llvm-objdump relies on. I've tried to
replicate these two commits in the MCDisassembler that llvm-c interfaces
with, and succeeded.
But while looking at unit tests to extend it with the relevant checks, I
realised that the existing test would already expect an address instead
of an operand [0]. Turns out that the symbolLookupCallback() callback
defined in the test and passed when creating the disassembler ends up
changing the display so we get the operands as addresses, as we want.
[0]
https://github.com/llvm/llvm-project/blob/llvmorg-15.0.0/llvm/unittests/MC/Disassembler.cpp#L64
---
I validated that this works in bpftool too, and will send v2 with that
change.
P.S. Thank you Niklas for testing!
next prev parent reply other threads:[~2022-09-11 20:13 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-06 13:36 [PATCH bpf-next 0/7] bpftool: Add LLVM as default library for disassembling JIT-ed programs Quentin Monnet
2022-09-06 13:36 ` [PATCH bpf-next 1/7] bpftool: Define _GNU_SOURCE only once Quentin Monnet
2022-09-06 23:14 ` Song Liu
2022-09-06 13:36 ` [PATCH bpf-next 2/7] bpftool: Remove asserts from JIT disassembler Quentin Monnet
2022-09-06 23:16 ` Song Liu
2022-09-06 13:36 ` [PATCH bpf-next 3/7] bpftool: Split FEATURE_TESTS/FEATURE_DISPLAY definitions in Makefile Quentin Monnet
2022-09-06 23:18 ` Song Liu
2022-09-06 13:36 ` [PATCH bpf-next 4/7] bpftool: Group libbfd defs in Makefile, only pass them if we use libbfd Quentin Monnet
2022-09-06 23:31 ` Song Liu
2022-09-07 14:19 ` Quentin Monnet
2022-09-06 13:36 ` [PATCH bpf-next 5/7] bpftool: Refactor disassembler for JIT-ed programs Quentin Monnet
2022-09-06 23:55 ` Song Liu
2022-09-06 13:36 ` [PATCH bpf-next 6/7] bpftool: Add LLVM as default library for disassembling " Quentin Monnet
2022-09-06 23:46 ` Alexei Starovoitov
2022-09-07 14:20 ` Quentin Monnet
2022-09-07 16:10 ` Alexei Starovoitov
2022-09-07 16:33 ` Quentin Monnet
2022-09-07 18:02 ` Yonghong Song
2022-09-11 20:13 ` Quentin Monnet [this message]
2022-09-07 0:06 ` Song Liu
2022-09-07 14:20 ` Quentin Monnet
2022-09-07 16:37 ` Song Liu
2022-09-06 13:36 ` [PATCH bpf-next 7/7] bpftool: Add llvm feature to "bpftool version" Quentin Monnet
2022-09-10 19:41 ` [PATCH bpf-next 0/7] bpftool: Add LLVM as default library for disassembling JIT-ed programs Niklas Söderlund
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=02cdc01d-bac0-20cc-945d-7a6a46028cdc@isovalent.com \
--to=quentin@isovalent.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=haoluo@google.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kafai@fb.com \
--cc=kpsingh@kernel.org \
--cc=niklas.soderlund@corigine.com \
--cc=sdf@google.com \
--cc=simon.horman@corigine.com \
--cc=songliubraving@fb.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox