public inbox for dwarves@vger.kernel.org
 help / color / mirror / Atom feed
From: Yonghong Song <yonghong.song@linux.dev>
To: Alan Maguire <alan.maguire@oracle.com>,
	Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>,
	dwarves@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	bpf@vger.kernel.org, Daniel Borkmann <daniel@iogearbox.net>,
	kernel-team@fb.com, Song Liu <song@kernel.org>
Subject: Re: [PATCH dwarves 3/3] dwarf_loader: Check DW_OP_[GNU_]entry_value for possible parameter matching
Date: Tue, 12 Nov 2024 10:51:56 -0800	[thread overview]
Message-ID: <b6aa0c9e-822e-4c51-8dbe-ba6efe520b43@linux.dev> (raw)
In-Reply-To: <5e640168-7753-413a-ab00-f297948e84ef@oracle.com>




On 11/12/24 10:33 AM, Alan Maguire wrote:
> On 12/11/2024 17:07, Yonghong Song wrote:
>>
>>
>> On 11/12/24 8:56 AM, Alan Maguire wrote:
>>> On 12/11/2024 01:51, Yonghong Song wrote:
>>>>
>>>> On 11/11/24 7:39 AM, Alan Maguire wrote:
>>>>> On 08/11/2024 18:05, Yonghong Song wrote:
>>>>>> Song Liu reported that a kernel func (perf_event_read()) cannot be
>>>>>> traced
>>>>>> in certain situations since the func is not in vmlinux bTF. This
>>>>>> happens
>>>>>> in kernels 6.4, 6.9 and 6.11 and the kernel is built with pahole 1.27.
>>>>>>
>>>>>> The perf_event_read() signature in kernel (kernel/events/core.c):
>>>>>>       static int perf_event_read(struct perf_event *event, bool group)
>>>>>>
>>>>>> Adding '-V' to pahole command line, and the following error msg can
>>>>>> be found:
>>>>>>       skipping addition of 'perf_event_read'(perf_event_read) due to
>>>>>> unexpected register used for parameter
>>>>>>
>>>>>> Eventually the error message is attributed to the setting
>>>>>> (parm->unexpected_reg = 1) in parameter__new() function.
>>>>>>
>>>>>> The following is the dwarf representation for perf_event_read():
>>>>>>        0x0334c034:   DW_TAG_subprogram
>>>>>>                    DW_AT_low_pc    (0xffffffff812c6110)
>>>>>>                    DW_AT_high_pc   (0xffffffff812c640a)
>>>>>>                    DW_AT_frame_base        (DW_OP_reg7 RSP)
>>>>>>                    DW_AT_GNU_all_call_sites        (true)
>>>>>>                    DW_AT_name      ("perf_event_read")
>>>>>>                    DW_AT_decl_file ("/rw/compile/kernel/events/core.c")
>>>>>>                    DW_AT_decl_line (4641)
>>>>>>                    DW_AT_prototyped        (true)
>>>>>>                    DW_AT_type      (0x03324f6a "int")
>>>>>>        0x0334c04e:     DW_TAG_formal_parameter
>>>>>>                      DW_AT_location        (0x007de9fd:
>>>>>>                         [0xffffffff812c6115, 0xffffffff812c6141):
>>>>>> DW_OP_reg5 RDI
>>>>>>                         [0xffffffff812c6141, 0xffffffff812c6323):
>>>>>> DW_OP_reg14 R14
>>>>>>                         [0xffffffff812c6323, 0xffffffff812c63fe):
>>>>>> DW_OP_GNU_entry_value(DW_OP_reg5 RDI), DW_OP_stack_value
>>>>>>                         [0xffffffff812c63fe, 0xffffffff812c6405):
>>>>>> DW_OP_reg14 R14
>>>>>>                         [0xffffffff812c6405, 0xffffffff812c640a):
>>>>>> DW_OP_GNU_entry_value(DW_OP_reg5 RDI), DW_OP_stack_value)
>>>>>>                      DW_AT_name    ("event")
>>>>>>                      DW_AT_decl_file       ("/rw/compile/kernel/events/
>>>>>> core.c")
>>>>>>                      DW_AT_decl_line       (4641)
>>>>>>                      DW_AT_type    (0x0333aac2 "perf_event *")
>>>>>>        0x0334c05e:     DW_TAG_formal_parameter
>>>>>>                      DW_AT_location        (0x007dea82:
>>>>>>                         [0xffffffff812c6137, 0xffffffff812c63f2):
>>>>>> DW_OP_reg12 R12
>>>>>>                         [0xffffffff812c63f2, 0xffffffff812c63fe):
>>>>>> DW_OP_GNU_entry_value(DW_OP_reg4 RSI), DW_OP_stack_value
>>>>>>                         [0xffffffff812c63fe, 0xffffffff812c640a):
>>>>>> DW_OP_reg12 R12)
>>>>>>                      DW_AT_name    ("group")
>>>>>>                      DW_AT_decl_file       ("/rw/compile/kernel/events/
>>>>>> core.c")
>>>>>>                      DW_AT_decl_line       (4641)
>>>>>>                      DW_AT_type    (0x03327059 "bool")
>>>>>>
>>>>>> By inspecting the binary, the second argument ("bool group") is used
>>>>>> in the function. The following are the disasm code:
>>>>>>        ffffffff812c6110 <perf_event_read>:
>>>>>>        ffffffff812c6110: 0f 1f 44 00 00        nopl    (%rax,%rax)
>>>>>>        ffffffff812c6115: 55                    pushq   %rbp
>>>>>>        ffffffff812c6116: 41 57                 pushq   %r15
>>>>>>        ffffffff812c6118: 41 56                 pushq   %r14
>>>>>>        ffffffff812c611a: 41 55                 pushq   %r13
>>>>>>        ffffffff812c611c: 41 54                 pushq   %r12
>>>>>>        ffffffff812c611e: 53                    pushq   %rbx
>>>>>>        ffffffff812c611f: 48 83 ec 18           subq    $24, %rsp
>>>>>>        ffffffff812c6123: 41 89 f4              movl    %esi, %r12d
>>>>>>        <=========== NOTE that here '%esi' is used and moved to '%r12d'.
>>>>>>        ffffffff812c6126: 49 89 fe              movq    %rdi, %r14
>>>>>>        ffffffff812c6129: 65 48 8b 04 25 28 00 00 00    movq    %gs:40,
>>>>>> %rax
>>>>>>        ffffffff812c6132: 48 89 44 24 10        movq    %rax, 16(%rsp)
>>>>>>        ffffffff812c6137: 8b af a8 00 00 00     movl    168(%rdi), %ebp
>>>>>>        ffffffff812c613d: 85 ed                 testl   %ebp, %ebp
>>>>>>        ffffffff812c613f: 75 3f                 jne
>>>>>> 0xffffffff812c6180 <perf_event_read+0x70>
>>>>>>        ffffffff812c6141: 66 2e 0f 1f 84 00 00 00 00 00 nopw    %cs:
>>>>>> (%rax,%rax)
>>>>>>        ffffffff812c614b: 0f 1f 44 00 00        nopl    (%rax,%rax)
>>>>>>        ffffffff812c6150: 49 8b 9e 28 02 00 00  movq    552(%r14), %rbx
>>>>>>        ffffffff812c6157: 48 89 df              movq    %rbx, %rdi
>>>>>>        ffffffff812c615a: e8 c1 a0 d7 00        callq
>>>>>> 0xffffffff82040220 <_raw_spin_lock_irqsave>
>>>>>>        ffffffff812c615f: 49 89 c7              movq    %rax, %r15
>>>>>>        ffffffff812c6162: 41 8b ae a8 00 00 00  movl    168(%r14), %ebp
>>>>>>        ffffffff812c6169: 85 ed                 testl   %ebp, %ebp
>>>>>>        ffffffff812c616b: 0f 84 9a 00 00 00     je
>>>>>> 0xffffffff812c620b <perf_event_read+0xfb>
>>>>>>        ffffffff812c6171: 48 89 df              movq    %rbx, %rdi
>>>>>>        ffffffff812c6174: 4c 89 fe              movq    %r15, %rsi
>>>>>>        <=========== NOTE: %rsi is overwritten
>>>>>>        ......
>>>>>>        ffffffff812c63f0: 41 5c                 popq    %r12
>>>>>>        <============ POP r12
>>>>>>        ffffffff812c63f2: 41 5d                 popq    %r13
>>>>>>        ffffffff812c63f4: 41 5e                 popq    %r14
>>>>>>        ffffffff812c63f6: 41 5f                 popq    %r15
>>>>>>        ffffffff812c63f8: 5d                    popq    %rbp
>>>>>>        ffffffff812c63f9: e9 e2 a8 d7 00        jmp
>>>>>> 0xffffffff82040ce0 <__x86_return_thunk>
>>>>>>        ffffffff812c63fe: 31 c0                 xorl    %eax, %eax
>>>>>>        ffffffff812c6400: e9 be fe ff ff        jmp
>>>>>> 0xffffffff812c62c3 <perf_event_read+0x1b3>
>>>>>>
>>>>>> It is not clear why dwarf didn't encode %rsi in locations. But
>>>>>> DW_OP_GNU_entry_value(DW_OP_reg4 RSI) tells us that RSI is live at
>>>>>> the entry of perf_event_read(). So this patch tries to search
>>>>>> DW_OP_GNU_entry_value/DW_OP_entry_value location/expression so if
>>>>>> the expected parameter register matchs the register in
>>>>>> DW_OP_GNU_entry_value/DW_OP_entry_value, then the original parameter
>>>>>> is not optimized.
>>>>>>
>>>>>> For one of internal 6.11 kernel, there are 62498 functions in BTF and
>>>>>> perf_event_read() is not there. With this patch, there are 61552
>>>>>> functions
>>>>>> in BTF and perf_event_read() is included.
>>>>>>
>>>>> hi Yonghong,
>>>>>
>>>>> I'm confused by these numbers. I would have thought your changes would
>>>>> have led to a net increase of functions encoded in vmlinux BTF since we
>>>>> are now likely catching more cases where registers are expected.
>>>>> When I
>>>>> ran your patches against an LLVM-built kernel, that's what I saw; 70
>>>>> additional functions were recognized as having expected parameters, and
>>>>> thus were encoded in BTF. In your case it looks like we lost nearly
>>>>> 1000
>>>>> functions. Any idea what's going on there? If you can share your
>>>>> config,
>>>>> LLVM version I can dig into this from my side too. Thanks!
>>>> Attached is my config (based on one of meta internal configs). I tried
>>>> with master branch with head:
>>>>
>>>> 7b6e5bfa2541380b478ea1532880210ea3e39e11 (HEAD -> master, origin/master,
>>>> origin/HEAD) Merge branch 'refactor-lock-management'
>>>> ae6e3a273f590a2b64f14a9fab3546c3a8f44ed4 bpf: Drop special callback
>>>> reference handling
>>>> f6b9a69a9e56b2083aca8a925fc1a28eb698e3ed bpf: Refactor active lock
>>>> management
>>>>
>>>> I am using pahole v1.27.
>>>>
>>>> I am using an llvm built from upstream. The following is llvm-project
>>>> head:
>>>> beb12f92c71981670e07e47275efc6b5647011c1 (HEAD -> main) [RISCV] Add
>>>> +optimized-nfN-segment-load-store (#114414)
>>>> 6bad4514c938b3b48c0c719b8dd98b3906f2c290 [AArch64] Extend vector mull
>>>> test coverage. NFC
>>>> 915b910d800d7fab6a692294ff1d7075d8cba824 [libc] Fix typos in proxy type
>>>> headers (#114717)
>>>> 98ea1a81a28a6dd36941456c8ab4ce46f665f57a [IPO] Remove unused includes
>>>> (NFC) (#114716)
>>>>
>>>> With the above setup, when to do
>>>>
>>>> pahole -JV --
>>>> btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs vmlinux >& log.pahole
>>>>
>>>> You will find the below info in the log:
>>>>     skipping addition of 'perf_event_read'(perf_event_read) due to
>>>> unexpected register used for paramet
>>>>
>>>> In the dwarf:
>>>>
>>>> 0x02122746:   DW_TAG_subprogram
>>>>                   DW_AT_low_pc    (0xffffffff81299740)
>>>>                   DW_AT_high_pc   (0xffffffff812999f7)
>>>>                   DW_AT_frame_base        (DW_OP_reg7 RSP)
>>>>                   DW_AT_GNU_all_call_sites        (true)
>>>>                   DW_AT_name      ("perf_event_read")
>>>>                   DW_AT_decl_file ("/home/yhs/work/bpf-next/kernel/
>>>> events/
>>>> core.c")
>>>>                   DW_AT_decl_line (4746)
>>>>                   DW_AT_prototyped        (true)
>>>>                   DW_AT_type      (0x020f95f5 "int")
>>>>
>>>> 0x02122760:     DW_TAG_formal_parameter
>>>>                     DW_AT_location        (0x00769b72:
>>>>                        [0xffffffff81299745, 0xffffffff81299764):
>>>> DW_OP_reg5 RDI
>>>>                        [0xffffffff81299764, 0xffffffff81299937):
>>>> DW_OP_reg3 RBX
>>>>                        [0xffffffff81299937, 0xffffffff812999f0):
>>>> DW_OP_GNU_entry_value(DW_OP_reg5 RDI), DW_OP_stack_value
>>>>                        [0xffffffff812999f0, 0xffffffff812999f7):
>>>> DW_OP_reg3 RBX)
>>>>                     DW_AT_name    ("event")
>>>>                     DW_AT_decl_file       ("/home/yhs/work/bpf-next/
>>>> kernel/events/core.c")
>>>>                     DW_AT_decl_line       (4746)
>>>>                     DW_AT_type    (0x0210f654 "perf_event *")
>>>>                       0x02122770:     DW_TAG_formal_parameter
>>>>                     DW_AT_location        (0x00769c61:
>>>>                        [0xffffffff81299758, 0xffffffff81299926):
>>>> DW_OP_reg6 RBP
>>>>                        [0xffffffff81299926, 0xffffffff812999f0):
>>>> DW_OP_GNU_entry_value(DW_OP_reg4 RSI), DW_OP_stack_value
>>>>                        [0xffffffff812999f0, 0xffffffff812999f7):
>>>> DW_OP_reg6 RBP)
>>>>                     DW_AT_name    ("group")
>>>>                     DW_AT_decl_file       ("/home/yhs/work/bpf-next/
>>>> kernel/events/core.c")
>>>>
>>>> The above is slightly different from our production kernel where Song
>>>> reported. But essence is the same.
>>>> The second parameter needs to check DW_OP_GNU_entry_value(DW_OP_reg4
>>>> RSI) to ensure the second
>>>> argument is available.
>>>>
>>>> My patch is supposed to only make improvement. I am curiously why you
>>>> get less functions encoded in BTF.
>>>>
>>> Thanks for the config etc! When I build bpf-next using master branch
>>> llvm and this config, I see
>>>
>>> with baseline (master branch pahole): 62371 functions, no perf_event_read
>>> your series on top of master branch pahole: 62433 functions,
>>> perf_event_read present
>>>
>>> So that's consistent with what I've seen with other configs; more
>>> functions are present in vmlinux BTF since we are now seeing more cases
>>> where parameters are in fact consistent.  The part that confuses me
>>> though is the numbers you initially reported above
>>>
>>> "for one of internal 6.11 kernel, there are 62498 functions in BTF and
>>> perf_event_read() is not there. With this patch, there are 61552
>>> functions in BTF and perf_event_read() is included."
>>>
>>> These numbers suggest you lost nearly 1000 functions when building
>>> vmlinux BTF with pahole using this series. That's the part I don't
>>> understand - we should just see a gain in numbers of functions in
>>> vmlinux BTF, right? Did you mean 62552 functions rather than 61552
>>> perhaps?
>> Sorry, really embarrassing. it is typo. Indeed it should be 62552 functions
>> in BTF instead.
>>
> No problem, makes perfect sense now, thanks! I'm trying to reproduce the
> core dumps Eduard saw now with this setup; I'll report back if I manage
> to do so and see if locks as Jiri and Arnaldo suggested help. If so a v2
> along the lines of Eduard's suggested change plus locking might be the
> best approach, what do you think? Thanks!

Thanks Alan. I will wait for your patch to fix locking issue and then
submitting v2 with your patch.

I am just starting to reproduce the issue as well with Eduard's original
script (running pahole in a loop with -j). Currently it run to iteratio 28
and not failure yet.

>
> Alan


  reply	other threads:[~2024-11-12 18:52 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-08 18:05 [PATCH dwarves 0/3] Check DW_OP_[GNU_]entry_value for possible parameter matching Yonghong Song
2024-11-08 18:05 ` [PATCH dwarves 1/3] dwarf_loader: Refactor function parameter__new() Yonghong Song
2024-11-11 11:26   ` Alan Maguire
2024-11-08 18:05 ` [PATCH dwarves 2/3] dwarf_loader: Refactor function check_dwarf_locations() Yonghong Song
2024-11-08 18:05 ` [PATCH dwarves 3/3] dwarf_loader: Check DW_OP_[GNU_]entry_value for possible parameter matching Yonghong Song
2024-11-10 11:38   ` Eduard Zingerman
2024-11-11  8:01     ` Eduard Zingerman
2024-11-11 12:36       ` Jiri Olsa
2024-11-11 13:42         ` Arnaldo Carvalho de Melo
2024-11-15 16:26           ` elfutils thread-safety (Was: [PATCH dwarves 3/3] dwarf_loader: Check DW_OP_[GNU_]entry_value for possible parameter matching) Mark Wielaard
2024-11-11 18:51     ` [PATCH dwarves 3/3] dwarf_loader: Check DW_OP_[GNU_]entry_value for possible parameter matching Yonghong Song
2024-11-11  9:54   ` Jiri Olsa
2024-11-11 18:54     ` Yonghong Song
2024-11-11 15:39   ` Alan Maguire
2024-11-12  1:51     ` Yonghong Song
2024-11-12 16:56       ` Alan Maguire
2024-11-12 17:07         ` Yonghong Song
2024-11-12 18:33           ` Alan Maguire
2024-11-12 18:51             ` Yonghong Song [this message]
2024-11-12 19:10             ` Arnaldo Carvalho de Melo
2024-11-13 17:33               ` Alan Maguire
2024-11-13 18:27                 ` Yonghong Song
2024-11-14 12:16                   ` Alan Maguire
2024-11-14 16:29                     ` Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b6aa0c9e-822e-4c51-8dbe-ba6efe520b43@linux.dev \
    --to=yonghong.song@linux.dev \
    --cc=alan.maguire@oracle.com \
    --cc=andrii@kernel.org \
    --cc=arnaldo.melo@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dwarves@vger.kernel.org \
    --cc=kernel-team@fb.com \
    --cc=song@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox