Re: Payload of syscall_entry_execve - Mathieu Desnoyers via lttng-dev

From: Mathieu Desnoyers via lttng-dev <lttng-dev@lists.lttng.org>
To: Valentin Grigorev <valentin.grigorev@jetbrains.com>
Cc: lttng-dev <lttng-dev@lists.lttng.org>
Subject: Re: Payload of syscall_entry_execve
Date: Thu, 9 Jul 2020 09:15:16 -0400 (EDT)
Message-ID: <434663210.6203.1594300516699.JavaMail.zimbra@efficios.com>
In-Reply-To: <CABhb4uu6dxo_5pVUTXaz3qkGS41Ame77Bp5ePWT42cJ2PANFvA@mail.gmail.com>

----- On Jul 9, 2020, at 7:19 AM, lttng-dev <lttng-dev@lists.lttng.org> wrote: 

> Hello!

> Currently, I'm developing a process monitor based on LTTng, and I'm facing the
> challenge of accessing the command-line arguments passed to the execve syscall.
> I'm using an LTTng live session and the Babeltrace 2 C API to analyze events
> online.

> The syscall_entry_execve event has 3 payload fields: filename, argv, and envp.
> The first one is a normal C string; the second and third are semantically
> `char *const *`, but LTTng provides them as plain unsigned integers (the
> corresponding fields in the Babeltrace 2 event payload have type
> BT_FIELD_CLASS_TYPE_UNSIGNED_INTEGER, while I expected
> BT_FIELD_CLASS_TYPE_DYNAMIC_ARRAY). As far as I understand, these integers are
> the argv and envp pointers cast to uint64_t. But in most cases, events produced
> by LTTng are analyzed by another process, often even offline, so these
> pointers are completely useless.

> Is there a configuration parameter that allows passing the argv and envp
> content in the syscall_entry_execve payload? Or some other way to get this
> information from LTTng?

> P.S. I considered getting this information from /proc/pid/cmdline, but it
> doesn't look like a clean solution.
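For reference, the /proc fallback mentioned in the P.S. could look roughly like the minimal sketch below (plain C, no LTTng or Babeltrace 2 APIs involved). /proc/<pid>/cmdline stores the arguments as consecutive NUL-terminated strings. The helper names read_cmdline() and split_cmdline() are hypothetical. One caveat explains why it is not a clean solution: the file reflects the process's state at read time, not at the time of the event, so the process may have exited, may not have completed the exec yet, or may have rewritten its own argument area.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read /proc/<pid>/cmdline into buf (capacity cap,
 * must be >= 1). Returns the number of bytes read, or -1 if the file
 * cannot be opened. buf[returned length] is always set to '\0'. */
static long read_cmdline(int pid, char *buf, size_t cap)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/cmdline", pid);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, cap - 1, f);
    fclose(f);
    buf[n] = '\0';
    return (long)n;
}

/* Hypothetical helper: split len bytes of NUL-separated arguments into at
 * most max_args pointers into buf. Requires buf[len] == '\0' so that a
 * truncated last argument is still terminated. Returns the argument count. */
static size_t split_cmdline(char *buf, size_t len, char **args, size_t max_args)
{
    size_t count = 0;
    size_t pos = 0;
    while (pos < len && count < max_args) {
        args[count++] = &buf[pos];
        const char *nul = memchr(&buf[pos], '\0', len - pos);
        /* Skip this argument and its terminating NUL. */
        pos = nul ? (size_t)(nul - buf) + 1 : len;
    }
    return count;
}
```

A monitor would call read_cmdline() with the pid of the traced process, then pass the returned length to split_cmdline().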

The main reason we don't implement this kind of instrumentation is that it would capture security-sensitive data into the trace. The same applies to the payload of the read() and write() system calls, for instance.

I am not against instrumenting this information, but it should be done by add-on modules which can be compiled out, and which would be disabled at runtime by default. Also, we would need to extend the tracepoint instrumentation to identify fields which are security-sensitive, so they could be specifically disabled at runtime. This would also require CTF 2 (Common Trace Format version 2), so we can tag specific fields as containing sensitive data. Users should really know when they are tracing sensitive information.

So adding the instrumentation to the project is not the hard part. The hard part is making sure it is 
configurable, not captured by default, and clearly identified in the traces. 
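To make these three requirements concrete, here is a minimal sketch in plain C of how a tracer could gate a sensitive field: a build-time switch that compiles the capability out, a runtime switch that defaults to off, and a per-field `sensitive` tag that would also have to appear in the trace metadata. Everything here (TRACE_SENSITIVE_SUPPORT, struct field_desc, capture_sensitive, the execve field table) is hypothetical. As of this thread LTTng has no such mechanism, and the CTF 2 metadata side is not shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical build option: set to 0 to compile sensitive capture out. */
#ifndef TRACE_SENSITIVE_SUPPORT
#define TRACE_SENSITIVE_SUPPORT 1
#endif

/* Hypothetical field descriptor; the sensitive flag would also have to be
 * exposed in the trace metadata (e.g. CTF 2) so consumers can see it. */
struct field_desc {
    const char *name;
    bool sensitive;
};

/* Hypothetical runtime switch for sensitive fields: off unless enabled. */
static bool capture_sensitive = false;

/* Return true if the value of field f should be written to the trace. */
static bool should_record(const struct field_desc *f)
{
#if TRACE_SENSITIVE_SUPPORT
    return !f->sensitive || capture_sensitive;
#else
    return !f->sensitive; /* compiled out: never record sensitive fields */
#endif
}

/* Hypothetical execve payload with argv/envp tagged as sensitive. */
static const struct field_desc execve_fields[] = {
    { "filename", false },
    { "argv",     true  },
    { "envp",     true  },
};

/* Count how many fields of a payload would be recorded. */
static size_t count_recorded(const struct field_desc *fields, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (should_record(&fields[i]))
            count++;
    }
    return count;
}
```

With the default settings only `filename` is recorded; setting `capture_sensitive` to true at runtime records all three fields, and building with `TRACE_SENSITIVE_SUPPORT` set to 0 makes that impossible.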

There is a second technical issue that would need solving for capturing argv and envp: tracepoints hooked on system calls would need to be able to take page faults, which is not possible today. The odds of taking a page fault when reading through argv and envp in a newly forked process are probably quite high, which would result in incomplete data. This cannot be solved in lttng-modules alone; we need to improve the kernel tracepoint instrumentation subsystem to do so.
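The incomplete-data problem can be illustrated with a userspace simulation (this is not kernel code): a probe that is not allowed to take a page fault can only copy bytes from pages that are already resident, so a string that crosses into a non-resident page ends up truncated. struct sim_mem, PAGE_SIZE_SIM and copy_str_nofault() are hypothetical names used only for this model.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE_SIM 8 /* tiny simulated page size, for illustration only */

/* Simulated user address space: raw bytes plus a per-page "resident" flag. */
struct sim_mem {
    const char *bytes;
    size_t len;
    const bool *resident; /* one entry per PAGE_SIZE_SIM bytes */
};

/* Copy the NUL-terminated string at addr into dst (capacity cap >= 1)
 * without "faulting": stop as soon as a byte lies on a non-resident page
 * instead of waiting for it to be paged in. *complete is set to true only
 * if the terminating NUL was reached. Returns the number of bytes copied. */
static size_t copy_str_nofault(const struct sim_mem *m, size_t addr,
                               char *dst, size_t cap, bool *complete)
{
    size_t i = 0;
    *complete = false;
    while (i + 1 < cap && addr + i < m->len) {
        if (!m->resident[(addr + i) / PAGE_SIZE_SIM])
            break; /* would require a page fault: give up here */
        char c = m->bytes[addr + i];
        if (c == '\0') {
            *complete = true;
            break;
        }
        dst[i++] = c;
    }
    dst[i] = '\0';
    return i;
}
```

With the second page marked non-resident, copying "hello-world-arg" stops after the first 8 bytes and reports the data as incomplete, which is the situation described above for argv and envp.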

Thanks, 

Mathieu 

-- 
Mathieu Desnoyers 
EfficiOS Inc. 
http://www.efficios.com 

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev