linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Francis Laniel <flaniel@linux.microsoft.com>
To: Masami Hiramatsu <mhiramat@kernel.org>
Cc: linux-kernel@vger.kernel.org,
	Steven Rostedt <rostedt@goodmis.org>,
	linux-trace-kernel@vger.kernel.org,
	Song Liu <songliubraving@fb.com>,
	bpf@vger.kernel.org
Subject: Re: [RFC PATCH v1 1/1] tracing/kprobe: Add multi-probe support for 'perf_kprobe' PMU
Date: Wed, 23 Aug 2023 11:54:48 +0200	[thread overview]
Message-ID: <5964493.lOV4Wx5bFT@pwmachine> (raw)
In-Reply-To: <20230823093614.fd704a98387d0ba1d23a29fb@kernel.org>

Hi.

Le mercredi 23 août 2023, 02:36:14 CEST Masami Hiramatsu a écrit :
> Hi,
> 
> On Mon, 21 Aug 2023 14:55:14 +0200
> 
> Francis Laniel <flaniel@linux.microsoft.com> wrote:
> > > Could you tell me how do you use this feature, for what perpose?
> > 
> > Sure (I think I detailed this in the cover letter but I only sent it to
> > the
> > "main" mailing list and not the tracing one, sorry for this
> > inconvenience)!
> > 
> > Basically, I was adding NTFS tracing to an existing tool which monitors
> > slow I/Os using BPF [1].
> > To test the tool, I compiled a kernel with both NTFS module built-in and
> > figured out the write operations when done on ntfs3 were missing from the
> > output of the tool.
> > The problem comes from the library I use in the tool which does not handle
> > well when it exists different symbols with the same name.
> > Contrary to perf, which only handles kprobes through sysfs, the library
> > handles it in both way (sysfs and PMU) with a preference for PMU when
> > available [2].
> > 
> > After some discussion, I thought there could be a way to handle this
> > automatically in the kernel when using PMU kprobes, hence this patch.
> > I totally understand the case I described above is really a corner one,
> > but I thought this feature could be useful for other people.
> > In the case of the library itself, we could indeed find the address in
> > /proc/ kallsyms but it would mean having CAP_SYS_ADMIN which is not
> > forcefully something we want to enforce.
> > Also, if we need to read /boot/vmlinuz or /boot/System.map we also need to
> > be root as these files often belong to root and cannot be read by others.
> > So, this patch solves the above problem while not needing specific
> > capabilities as the kernel will solve it for us.
> 
> Thanks for the explanation. I got the background, and still have some
> questions.
> 
> - Is the analysis tool really necessary to be used by users other than
>   CAP_SYS_ADMIN? Even if it is useful, I still doubt CAP_PERFMON is safer
>   than CAP_SYS_ADMIN, because BPF program can access any kernel register.

For the tool itself, this is indeed not a problem as we rely on CAP_SYS_ADMIN.
But this one for the library, as they do not want to enforce CAP_SYS_ADMIN to 
use the library.

> - My concern about this solution (enabling kprobe PMU on all symbols which
>   have the same name) makes it hard to run the same BPF program on it.
>   This is because symbols with the same name do not necessarily have the
>   same arguments (theoretically). Also, the BPF will trace unwanted symbols
>   at unwanted timing.

Good point for the same name but different arguments!
I was too focused on my case (ntfs_file_write_iter()) and forgot about this.

> - Can you expand that library to handle the same name symbols differently?
>   I think this should be done in the user space, or in the kallsyms like
>   storing symbols with source line information.

I think we can find a way to handle this in user space by potentially 
abstracting the several PMU probe under one.
Or we can simply explode if a name correspond to several symbols and ask the 
user to use addr + offset to precise the symbol in this case.

> I understand this demand, but solving that with probing *all* symbols seems
> like a brute force solution and may cause another problem later.
> 
> But this is a good discussion item. Last month Alessandro sent a script
> which makes such symbols unique. Current problem is that the numbering is
> not enough to identify which one is from which source code.

Definitely, I wrote this specifically to create a discussion and gather some 
comments, hence the RFC tag.

> https://lore.kernel.org/all/20230714150326.1152359-1-alessandro.carminati@gm
> ail.com/

I will definitely take a look at this contribution! Thank you for sharing the 
link!

> > > If you just need to trace/profile a specific function which has the same
> > > name symbols, you might be better to use `perf probe` +
> > > `/sys/kernel/tracing` or `perf record -e EVENT`.
> > > 
> > > Or if you need to run it with CAP_PERFMON, without CAP_SYS_ADMIN,
> > > we need to change a userspace tool to find the correct address and
> > > pass it to the perf_event_open().
> > > 
> > > > > > Added new events:
> > > > > >   probe:ntfs_file_write_iter (on ntfs_file_write_iter)
> > > > > >   probe:ntfs_file_write_iter (on ntfs_file_write_iter)
> > > > > > 
> > > > > > You can now use it in all perf tools, such as:
> > > > > >         perf record -e probe:ntfs_file_write_iter -aR sleep 1
> > > > > > 
> > > > > > root@vm-amd64:~# cat /sys/kernel/tracing/kprobe_events
> > > > > > p:probe/ntfs_file_write_iter _text+5088544
> > > > > > p:probe/ntfs_file_write_iter _text+5278560
> > > > > > 
> > > > > > > Thought?
> > > > > > 
> > > > > > This contribution is basically here to sort of mimic what perf
> > > > > > does
> > > > > > but
> > > > > > with PMU kprobes, as this is not possible to write in a sysfs file
> > > > > > with
> > > > > > this type of probe.
> > > > > 
> > > > > OK, I see it is for BPF only. Maybe BPF program can filter correct
> > > > > one
> > > > > to access the argument etc.
> > > > 
> > > > I am not sure I understand, can you please precise?
> > > > The eBPF program will be run when the kprobe will be triggered, so if
> > > > the
> > > > kprobe is armed for the function (e.g. old ntfs_file_write_iter()),
> > > > the
> > > > eBPF program will never be called.
> > > 
> > > As I said above, it is userspace BPF loader issue, because it has to
> > > specify the correct address via unique symbol + offset, instead of
> > > attaching all of them. I think that will be more side-effects.
> > > 
> > > But anyway, thanks for pointing this issue. I should fix kprobe event to
> > > reject the symbols which is not unique. That should be pointed by other
> > > unique symbols.
> > 
> > You are welcome and I thank you for the discussion.
> > Can you please precise more what you think about "reject the symbols which
> > is not unique"?
> 
> > Basically something like this:
> Yes, that's what I said.

OK, I will write something and send it as an RFC before end of the week then.

> > struct trace_event_call *
> > create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
> > 
> > 			  bool is_return)
> > 
> > {
> > 
> > 	...
> > 	if (!addr && func) {
> 
> if (func) {  /* because anyway if user specify "func" we have to solve
>  the symbol address */
> 
> > 		array.addrs = NULL;
> > 		array.size = 0;
> > 		ret = kallsyms_on_each_match_symbol(add_addr, func, &array);
> > 		if (ret)
> > 		
> > 			goto error_free;
> > 		
> > 		if (array.size != 1) {
> > 		
> > 			/*
> > 			
> > 			 * Function name corresponding to several symbols must
> > 			 * be passed by address only.
> > 			 */
> > 			
> > 			return -EINVAL;
> 
> This case may return a unique error code so that the caller can notice
> the problem.

Is it OK to add a dedicated error code for such a case?

> Thank you,
> 
> > 		}
> > 	
> > 	}
> > 	
> > 	
> > 	
> > 	...
> > 
> > }
> > ?
> > If my understanding is correct, I think I can write a patch to achieve
> > this.
> > 
> > 
> > 
> > Best regards.
> > ---
> > [1]: https://github.com/inspektor-gadget/inspektor-gadget/pull/1879
> > [2]: https://github.com/cilium/ebpf/blob/
> > 270c859894bd38cdd0c7783317b16343409e4df8/link/kprobe.go#L165-L191

Best regards.



  reply	other threads:[~2023-08-23 10:49 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20230816163517.112518-1-flaniel@linux.microsoft.com>
2023-08-16 16:35 ` [RFC PATCH v1 1/1] tracing/kprobe: Add multi-probe support for 'perf_kprobe' PMU Francis Laniel
2023-08-16 18:42   ` Steven Rostedt
2023-08-17 10:59     ` Francis Laniel
2023-08-17 15:13       ` Steven Rostedt
2023-08-18  9:01         ` Francis Laniel
2023-08-18 12:37         ` Masami Hiramatsu
2023-08-18 15:41           ` Steven Rostedt
2023-08-18 18:13             ` Francis Laniel
2023-08-18 18:20               ` Steven Rostedt
2023-08-19  1:15                 ` Masami Hiramatsu
2023-08-19 15:22                   ` Song Liu
2023-08-20  9:32                     ` Masami Hiramatsu
2023-08-20 10:02                       ` Song Liu
2023-08-20 13:16                         ` Masami Hiramatsu
2023-08-21  6:09                           ` Song Liu
2023-08-21 10:01                             ` Masami Hiramatsu
2023-08-21 14:45                               ` Steven Rostedt
2023-08-21 18:07                                 ` Kees Cook
2023-08-21 14:29                         ` Steven Rostedt
2023-08-21 15:19                           ` Masami Hiramatsu
2023-08-21 15:28                             ` Steven Rostedt
2023-08-17  7:50   ` Masami Hiramatsu
2023-08-17 11:06     ` Francis Laniel
2023-08-18 13:05       ` Masami Hiramatsu
2023-08-18 18:12         ` Francis Laniel
2023-08-19  1:11           ` Masami Hiramatsu
2023-08-20 20:23             ` Jiri Olsa
2023-08-21 12:22               ` Francis Laniel
2023-08-20 20:34             ` Jiri Olsa
2023-08-21 12:24               ` Francis Laniel
2023-08-22 13:13                 ` Jiri Olsa
2023-08-21 12:55             ` Francis Laniel
2023-08-23  0:36               ` Masami Hiramatsu
2023-08-23  9:54                 ` Francis Laniel [this message]
2023-08-23 13:45                   ` Masami Hiramatsu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5964493.lOV4Wx5bFT@pwmachine \
    --to=flaniel@linux.microsoft.com \
    --cc=bpf@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=songliubraving@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).