kernelnewbies.kernelnewbies.org archive mirror
* Linux Device Performance Analyses - Where are my cycles?
@ 2025-08-01  9:34 Lucas Tanure
  2025-08-01 14:42 ` Levi Zim
From: Lucas Tanure @ 2025-08-01  9:34 UTC
  To: Linux Kernel Mailing List, kernelnewbies

Hey there,

I'm a kernel developer working on an Android device with a 5.10 kernel. 
I'm trying to improve performance, and to do that, I need to figure out 
exactly where the CPU is spending its time.

Our vendor provides a generic Android BSP, so the system is likely 
running many unnecessary processes for things like GPS and phone calls 
that my device doesn't even need. Since I don't have a good handle on 
all these processes, I want to create a detailed breakdown of all the 
tasks, workqueues, IRQs, kernel threads, softIRQs, and tasklets running 
and measure the CPU time each one is using.

To do this, I've been using kernel traces with the following events:

task:task_newtask sched:sched_process_fork
sched:sched_process_exec sched:sched_process_exit sched:sched_switch 
irq:irq_handler_entry irq:irq_handler_exit irq:softirq_entry 
irq:softirq_exit workqueue:workqueue_execute_start 
workqueue:workqueue_execute_end syscalls:sys_enter_execve 
syscalls:sys_exit_execve syscalls:sys_enter_execveat 
syscalls:sys_exit_execveat
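For reference, a list like the one above is normally enabled by appending the "subsystem:event" names to tracefs's set_event file (the same as `echo sched:sched_switch >> set_event`). Below is a minimal sketch, assuming tracefs is mounted at /sys/kernel/tracing (older setups use /sys/kernel/debug/tracing) and the caller has root; `enable_trace_events()` is a hypothetical helper, not an existing API.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Append each "subsystem:event" name to a tracefs set_event file, one
 * write() per name. O_APPEND keeps events that are already enabled;
 * truncating the file instead would disable them first.
 * Returns 0 on success or a negative errno value.
 */
int enable_trace_events(const char *set_event_path,
			const char *const events[], size_t n)
{
	char line[128];
	int fd = open(set_event_path, O_WRONLY | O_APPEND);

	if (fd < 0)
		return -errno;
	for (size_t i = 0; i < n; i++) {
		int len = snprintf(line, sizeof(line), "%s\n", events[i]);
		ssize_t written;

		/* Event names are short; reject anything that would be cut off. */
		if (len < 0 || (size_t)len >= sizeof(line)) {
			close(fd);
			return -ENAMETOOLONG;
		}
		written = write(fd, line, (size_t)len);
		if (written != len) {
			int err = written < 0 ? errno : EIO;

			close(fd);
			return -err;
		}
	}
	close(fd);
	return 0;
}
```

Usage would be something like `enable_trace_events("/sys/kernel/tracing/set_event", events, n)` followed by reading trace_pipe; the path is an assumption about the device's mount layout.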

However, the trace logs don't provide the full picture. For example,
when a new process is executed, the logs don't show its command line, so
when a shell runs a script I can't tell which script it is. The
sched_switch event tells me which PID is running, but multiple processes
can originate from the same binary, so it's hard to tell them apart.
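The per-task CPU time itself can be derived from sched_switch alone: on each CPU, the time between two consecutive switches belongs to the thread that was switched out (prev_pid). A minimal sketch, assuming the trace has already been parsed into per-CPU, timestamp-ordered records with nanosecond timestamps; the struct and function names and the MAX_CPUS/MAX_PID limits are hypothetical choices for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Assumed limits for this sketch; size them for the real device. */
#define MAX_CPUS 16
#define MAX_PID 32768 /* assumption: matches /proc/sys/kernel/pid_max */

/* The fields of one sched:sched_switch record that matter here. */
struct sched_switch_ev {
	int cpu;
	unsigned long long ts_ns; /* trace timestamp, nanoseconds */
	int prev_pid;             /* thread that stops running */
	int next_pid;             /* thread that starts running */
};

struct cpu_accounting {
	unsigned long long runtime_ns[MAX_PID]; /* per thread (TID), 0 = idle */
	unsigned long long last_ts[MAX_CPUS];   /* last switch time per CPU */
	int seen[MAX_CPUS];                     /* 1 once a CPU's clock started */
};

void account_init(struct cpu_accounting *acc)
{
	memset(acc, 0, sizeof(*acc));
}

/*
 * Charge the time since the previous switch on this CPU to prev_pid.
 * Assumes events for each CPU arrive in timestamp order.
 */
void account_switch(struct cpu_accounting *acc, const struct sched_switch_ev *ev)
{
	if (ev->cpu < 0 || ev->cpu >= MAX_CPUS)
		return;
	/* The first switch on a CPU only starts its clock: we don't know when prev began. */
	if (acc->seen[ev->cpu] && ev->prev_pid >= 0 && ev->prev_pid < MAX_PID)
		acc->runtime_ns[ev->prev_pid] += ev->ts_ns - acc->last_ts[ev->cpu];
	acc->last_ts[ev->cpu] = ev->ts_ns;
	acc->seen[ev->cpu] = 1;
}
```

The PIDs in sched_switch are thread IDs, so threads of one process are accounted separately, and idle time lands on PID 0. Hardirq and softirq time is charged to whichever thread was interrupted; separating it out needs the irq and softirq entry/exit events as well.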

To solve this, I've developed a patch for the do_execveat_common 
function to log all new processes along with their full command lines. I 
plan to use this patch with a Python script to parse the logs and 
generate a report on CPU usage.
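The lines printed by the pr_info() in the patch in this message have a fixed shape, so the parsing step is simple. The plan above is a Python script; the sketch below shows the same step in C to stay in the thread's language. It assumes the exact format "do_execveat_common pid: %d, comm: %s # filename: %s # cmdline: %s", possibly preceded by a dmesg timestamp, and that comm and filename never contain the " # " separators. `parse_exec_line()` and `struct exec_record` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Field sizes are assumptions; the kernel's TASK_COMM_LEN is 16. */
struct exec_record {
	int pid;
	char comm[16];
	char filename[256];
	char cmdline[4096];
};

/* Copy src[0..len) into dst (capacity cap), truncating and NUL-terminating. */
static void copy_field(char *dst, size_t cap, const char *src, size_t len)
{
	if (len >= cap)
		len = cap - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
}

/* Returns 0 on success, -1 if the line does not match the expected format. */
int parse_exec_line(const char *line, struct exec_record *rec)
{
	static const char tag[] = "do_execveat_common pid: ";
	const char *p = strstr(line, tag); /* skips any timestamp prefix */
	const char *comm, *fname, *cmd;
	char *end;
	long pid;

	if (!p)
		return -1;
	p += strlen(tag);
	pid = strtol(p, &end, 10);
	if (end == p || strncmp(end, ", comm: ", 8) != 0)
		return -1;
	comm = end + 8;
	fname = strstr(comm, " # filename: ");   /* 13 characters */
	if (!fname)
		return -1;
	cmd = strstr(fname + 13, " # cmdline: "); /* 12 characters */
	if (!cmd)
		return -1;
	rec->pid = (int)pid;
	copy_field(rec->comm, sizeof(rec->comm), comm, (size_t)(fname - comm));
	copy_field(rec->filename, sizeof(rec->filename), fname + 13,
		   (size_t)(cmd - (fname + 13)));
	/* The cmdline is the rest of the line, minus a trailing newline. */
	cmd += 12;
	copy_field(rec->cmdline, sizeof(rec->cmdline), cmd, strcspn(cmd, "\n"));
	return 0;
}
```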

My main questions are:

- Is my patch correct? When I log the new process, do current->pid and 
argv refer to the same new process?
- It feels like logging new processes with their command lines is a 
fundamental requirement for this kind of analysis. Am I missing a better 
or more standard way of doing this?
- I don't want to reinvent the wheel, so if there's a better way to 
analyze kernel performance on a device, I'd love to hear about it.

After looking into other methods like eBPF and kprobes, I found that 
this patch is the most straightforward way to get the information I need.

Any insights would be greatly appreciated!

Thanks
Lucas Tanure

diff --git a/fs/exec.c b/fs/exec.c
index 6e1f8628ab9c..ab797ba0571c 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -513,6 +513,49 @@ static int bprm_stack_limits(struct linux_binprm *bprm)
  	return 0;
  }

+static char *str_cmdline(int argc, struct user_arg_ptr argv, struct linux_binprm *bprm)
+{
+	char *buf = (char *)get_zeroed_page(GFP_KERNEL);
+	ssize_t pos = 0;
+	int arg;
+
+	for (arg = 0; buf && arg < argc; arg++) {
+		const char __user *str;
+		int len;
+
+		str = get_user_arg_ptr(argv, arg);
+		if (IS_ERR(str))
+			goto free_page;
+
+		// this includes a final null.
+		len = strnlen_user(str, MAX_ARG_STRLEN);
+		if (!len)
+			goto free_page;
+
+		if (!valid_arg_len(bprm, len))
+			goto free_page;
+
+		if (pos + len >= PAGE_SIZE)
+			break; // Return the command line up to a page
+
+		len -= 1; // Don't copy the final null.
+		if (copy_from_user(buf+pos, str, len))
+			goto free_page;
+
+		pos += len;
+
+		if (arg < argc - 1)
+			buf[pos++] = ' ';
+	}
+
+	return buf;
+
+free_page:
+	free_page((unsigned long)buf);
+
+	return NULL;
+}
+
  /*
   * 'copy_strings()' copies argument/environment strings from the old
   * processes's memory to the new process's stack.  The call to get_user_pages()
@@ -1874,6 +1917,7 @@ static int do_execveat_common(int fd, struct filename *filename,
  {
  	struct linux_binprm *bprm;
  	int retval;
+	char *cmdline;

  	if (IS_ERR(filename))
  		return PTR_ERR(filename);
@@ -1943,6 +1987,15 @@ static int do_execveat_common(int fd, struct filename *filename,
  		bprm->argc = 1;
  	}

+	cmdline = str_cmdline(bprm->argc, argv, bprm);
+	pr_info("%s pid: %d, comm: %s # filename: %s # cmdline: %s\n", __func__,
+								       current->pid,
+								       current->comm,
+								       bprm->filename,
+								       cmdline ? cmdline : "NULL");
+	if (cmdline)
+		free_page((unsigned long)cmdline);
+
  	retval = bprm_execve(bprm, fd, filename, flags);
  out_free:
  	free_bprm(bprm);

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


* Re: Linux Device Performance Analyses - Where are my cycles?
  2025-08-01  9:34 Linux Device Performance Analyses - Where are my cycles? Lucas Tanure
@ 2025-08-01 14:42 ` Levi Zim
From: Levi Zim @ 2025-08-01 14:42 UTC
  To: Lucas Tanure, Linux Kernel Mailing List, kernelnewbies

Hi Lucas,

On 8/1/25 5:34 PM, Lucas Tanure wrote:
> Hey there,
>
> I'm a kernel developer working on an Android device with a 5.10 
> kernel. I'm trying to improve performance, and to do that, I need to 
> figure out exactly where the CPU is spending its time.
>
> Our vendor provides a generic Android BSP, so the system is likely 
> running many unnecessary processes for things like GPS and phone calls 
> that my device doesn't even need. Since I don't have a good handle on 
> all these processes, I want to create a detailed breakdown of all the 
> tasks, workqueues, IRQs, kernel threads, softIRQs, and tasklets 
> running and measure the CPU time each one is using.
>
> To do this, I've been using kernel traces with the following events:
>
> task:task_newtask sched:sched_process_fork
> sched:sched_process_exec sched:sched_process_exit sched:sched_switch 
> irq:irq_handler_entry irq:irq_handler_exit irq:softirq_entry 
> irq:softirq_exit workqueue:workqueue_execute_start 
> workqueue:workqueue_execute_end syscalls:sys_enter_execve 
> syscalls:sys_exit_execve syscalls:sys_enter_execveat 
> syscalls:sys_exit_execveat
>
> However, the trace logs don't provide the full picture. For example, 
> when a new process is executed, the logs don't specify the script 
> being run. The sched_switch event tells me which PID is being 
> executed, but multiple processes can originate from the same binary, 
> so it's hard to distinguish them.
>
> To solve this, I've developed a patch for the do_execveat_common 
> function to log all new processes along with their full command lines. 
> I plan to use this patch with a Python script to parse the logs and 
> generate a report on CPU usage.
>
> My main questions are:
>
> - Is my patch correct? When I log the new process, do current->pid and 
> argv refer to the same new process?
Although I am not an experienced kernel developer, the patch LGTM, with
one caveat: joining the args with spaces makes it impossible to tell
whether a space belongs to an argument or separates two arguments. That
is fine for identifying the process but can cause problems for debugging.
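One way around that ambiguity is to quote each argument the way a POSIX shell would before joining, which also makes the logged line copy-and-paste-able. This is a minimal userspace sketch with a hypothetical helper `shell_quote_join()`; using the same idea inside str_cmdline() would mean quoting each argument after copying it into the kernel buffer. Keeping NUL separators, as /proc/<pid>/cmdline does, is another option.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Conservative set of characters that never need quoting in a shell word. */
static bool is_shell_safe(const char *arg)
{
	if (*arg == '\0')
		return false; /* an empty argument must be written as '' */
	for (const char *p = arg; *p; p++)
		if (!isalnum((unsigned char)*p) && !strchr("-_./=:,+@%", *p))
			return false;
	return true;
}

/* Append one char, always keeping room for the terminating NUL. */
static bool put_char(char *out, size_t size, size_t *pos, char c)
{
	if (*pos + 1 >= size)
		return false;
	out[(*pos)++] = c;
	out[*pos] = '\0';
	return true;
}

static bool put_str(char *out, size_t size, size_t *pos, const char *s)
{
	for (; *s; s++)
		if (!put_char(out, size, pos, *s))
			return false;
	return true;
}

/* Join argv with spaces, single-quoting unsafe args. Returns 0, or -1 if out is too small. */
int shell_quote_join(int argc, const char *const argv[], char *out, size_t size)
{
	size_t pos = 0;

	if (size == 0)
		return -1;
	out[0] = '\0';
	for (int i = 0; i < argc; i++) {
		const char *arg = argv[i];

		if (i > 0 && !put_char(out, size, &pos, ' '))
			return -1;
		if (is_shell_safe(arg)) {
			if (!put_str(out, size, &pos, arg))
				return -1;
			continue;
		}
		if (!put_char(out, size, &pos, '\''))
			return -1;
		for (const char *p = arg; *p; p++) {
			/* A ' cannot appear inside '...': close, escape it, reopen. */
			if (*p == '\'') {
				if (!put_str(out, size, &pos, "'\\''"))
					return -1;
			} else if (!put_char(out, size, &pos, *p)) {
				return -1;
			}
		}
		if (!put_char(out, size, &pos, '\''))
			return -1;
	}
	return 0;
}
```

For example, the arguments `ls`, `my file` and `it's` are joined as `ls 'my file' 'it'\''s'`, so every space that is not a delimiter sits inside quotes.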
> - It feels like logging new processes with their command lines is a 
> fundamental requirement for this kind of analysis. Am I missing a 
> better or more standard way of doing this?

I think the standard ways to do this are:

- strace (strace -Y -f -qqq -s99999 -e trace=execve,execveat <CMD>),
which is implemented via ptrace and can be hard to use if you want to
trace all exec events, as that would require you to strace every process.
- execsnoop, which is implemented via eBPF and thus naturally traces all
exec events. There are multiple implementations of execsnoop, e.g. in BCC
and bpftrace, and another in brendangregg/perf-tools that is implemented
via ftrace.
- tracexec [1] (disclaimer: I am the author), which has both ptrace and
eBPF backends behind three consistent user interfaces (log, TUI and JSON
export). I think it is so far the most debugging-friendly solution, as it
can also capture (the diff of) environment variables and inherited file
descriptors, and output the cmdline as a copy-and-paste-able shell
command. Most eBPF tracing solutions like execsnoop silently ignore exec
events that go through a non-native syscall interface (like 32-bit exec
on x64 systems); tracexec addresses this. But as the eBPF backend of
tracexec is still experimental and requires a 6.x kernel, it might not
suit your needs well.

> - I don't want to reinvent the wheel, so if there's a better way to 
> analyze kernel performance on a device, I'd love to hear about it.
>
> After looking into other methods like eBPF and kprobes, I found that 
> this patch is the most straightforward way to get the information I need.
I think eBPF is the most convenient way to go, and there are already a
lot of tools available, but as you are already a kernel developer you may
find patching the kernel more convenient. I hope you find the solution
that best suits you.

[1]: https://github.com/kxxt/tracexec

Best,
Levi
