From: Beau Belgrave <beaub@linux.microsoft.com>
To: rostedt@goodmis.org, mhiramat@kernel.org
Cc: linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org,
	beaub@linux.microsoft.com
Subject: [PATCH v4 10/10] user_events: Add documentation file
Date: Thu,  4 Nov 2021 10:04:33 -0700
Message-ID: <20211104170433.2206-11-beaub@linux.microsoft.com>
In-Reply-To: <20211104170433.2206-1-beaub@linux.microsoft.com>

Add a documentation file for user_events that explains how the feature
may be used and includes example code.

Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
---
 Documentation/trace/user_events.rst | 298 ++++++++++++++++++++++++++++
 1 file changed, 298 insertions(+)
 create mode 100644 Documentation/trace/user_events.rst

diff --git a/Documentation/trace/user_events.rst b/Documentation/trace/user_events.rst
new file mode 100644
index 000000000000..d79c9f07d012
--- /dev/null
+++ b/Documentation/trace/user_events.rst
@@ -0,0 +1,298 @@
+=========================================
+user_events: User-based Event Tracing
+=========================================
+
+:Author: Beau Belgrave
+
+Overview
+--------
+User based trace events allow user processes to create events and trace data
+that can be viewed via existing tools, such as ftrace, perf and eBPF.
+To enable this feature, build your kernel with CONFIG_USER_EVENTS=y.
+
+Programs can view the status of the events via
+/sys/kernel/debug/tracing/user_events_status and can both register and write
+data out via /sys/kernel/debug/tracing/user_events_data.
+
+Programs can also use /sys/kernel/debug/tracing/dynamic_events to register and
+delete user based events via the u: prefix. The format of the command to
+dynamic_events is the same as the ioctl with the u: prefix applied.
+
+Typically programs will register a set of events that they wish to expose to
+tools that can read trace_events (such as ftrace and perf). The registration
+process gives back two ints to the program for each event. The first int is the
+status index. This index describes which byte in the
+/sys/kernel/debug/tracing/user_events_status file represents this event. The
+second int is the write index. This index identifies the event in the data
+passed to write() or writev() on the
+/sys/kernel/debug/tracing/user_events_data file.
+
+The structures referenced in this document are contained within the
+/include/uapi/linux/user_events.h file in the source tree.
+
+**NOTE:** *Both user_events_status and user_events_data are under the tracefs filesystem
+and may be mounted at different paths than above.*
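A minimal sketch of locating the tracefs mount point instead of hard-coding it, assuming the caller has already read text in /proc/mounts format into a string. find_tracefs_mount() is a hypothetical helper and is not part of the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan text in /proc/mounts format ("device mountpoint fstype options ...")
 * and copy the mount point of the first tracefs entry into out.
 * Returns 0 on success, -1 if no tracefs entry is found.
 * Hypothetical helper; assumes every line has at least three fields.
 */
static int find_tracefs_mount(const char *mounts, char *out, size_t out_len)
{
	const char *line = mounts;

	while (line && *line) {
		char dev[256], dir[256], type[64];

		if (sscanf(line, "%255s %255s %63s", dev, dir, type) == 3 &&
		    strcmp(type, "tracefs") == 0) {
			snprintf(out, out_len, "%s", dir);
			return 0;
		}

		/* Advance to the start of the next line, if any */
		line = strchr(line, '\n');
		if (line)
			line++;
	}

	return -1;
}
```

The directory found this way can be joined with user_events_status or user_events_data to form the paths used in the rest of the document.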
+
+Registering
+-----------
+Registering within a user process is done via ioctl() out to the
+/sys/kernel/debug/tracing/user_events_data file. The command to issue is
+DIAG_IOCSREG. This command takes a struct user_reg as an argument.
+
+The struct user_reg requires two values. The first is the size of the
+structure, which ensures forward and backward compatibility. The second is the
+command string to issue for registering.
+
+User based events show up under tracefs like any other event under the subsystem
+named "user_events". This means tools that wish to attach to the events need to
+use /sys/kernel/debug/tracing/events/user_events/[name]/enable or perf record
+-e user_events:[name] when attaching/recording.
+
+**NOTE:** *The write_index returned is only valid for the FD that was used to
+register.*
+
+Command Format
+^^^^^^^^^^^^^^
+The command string format is as follows:
+
+::
+
+  name[:FLAG1[,FLAG2...]] [Field1[;Field2...]]
+
+Supported Flags
+^^^^^^^^^^^^^^^
+**BPF_ITER** - eBPF programs attached to this event will get the raw iovec
+struct instead of any data copies for maximum performance.
+
+Field Format
+^^^^^^^^^^^^
+
+::
+
+  type name [size]
+
+Basic types are supported (__data_loc, u32, u64, int, char, char[20]).
+User programs are encouraged to use clearly sized types like u32.
+
+**NOTE:** *Long is not supported since size can vary between user and kernel.*
+
+The size is only valid for types that start with a struct prefix.
+This allows user programs to describe custom structs out to tools, if required.
+
+For example, a struct in C that looks like this:
+
+::
+
+  struct mytype {
+    char data[20];
+  };
+
+Would be represented by the following field:
+
+::
+
+  struct mytype myname 20
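A minimal sketch of assembling a command string in the name[:FLAGS] [Field1[;Field2...]] format described above. build_command() is a hypothetical helper, not part of the patch; it only formats the string and does not validate type names or flags.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "name[:flags] field1;field2;..." into buf.
 * flags may be NULL or "" when no flags are wanted.
 * Returns 0 on success, -1 if buf is too small.
 * Hypothetical helper for illustration only.
 */
static int build_command(char *buf, size_t len, const char *name,
			 const char *flags, const char **fields,
			 size_t nfields)
{
	size_t used;
	size_t i;
	int n;

	if (flags && *flags)
		n = snprintf(buf, len, "%s:%s", name, flags);
	else
		n = snprintf(buf, len, "%s", name);

	if (n < 0 || (size_t)n >= len)
		return -1;
	used = (size_t)n;

	for (i = 0; i < nfields; i++) {
		/* A space precedes the first field, ';' separates the rest */
		n = snprintf(buf + used, len - used, "%s%s",
			     i == 0 ? " " : ";", fields[i]);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}

	return 0;
}
```

With the fields "u32 count" and "struct mytype myname 20" and the name "test", this produces "test u32 count;struct mytype myname 20", which could then be passed as the command string when registering.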
+
+Status
+------
+When tools attach to or record user based events, the status of the event is
+updated in real time. This allows user programs to only incur the cost of the
+write() or writev() calls when something is actively attached to the event.
+
+User programs call mmap() on /sys/kernel/debug/tracing/user_events_status to
+check the status for each event that is registered. The byte to check in the
+file is given back after the register ioctl() via user_reg.status_index.
+Currently the size of user_events_status is a single page; however, custom
+kernel configurations can change this size to allow more user based events. In
+all cases the size of the file is a multiple of a page size.
+
+For example, if the register ioctl() gives back a status_index of 3 you would
+check byte 3 of the returned mmap data to see if anything is attached to that
+event.
+
+Administrators can easily check the status of all registered events by reading
+the user_events_status file directly via a terminal. The output is as follows:
+
+::
+
+  Byte:Name [# Comments]
+  ...
+
+  Active: ActiveCount
+  Busy: BusyCount
+  Max: MaxCount
+
+For example, on a system that has a single event the output looks like this:
+
+::
+
+  1:test
+
+  Active: 1
+  Busy: 0
+  Max: 4096
+
+If a user enables the user event via ftrace, the output would change to this:
+
+::
+
+  1:test # Used by ftrace
+
+  Active: 1
+  Busy: 1
+  Max: 4096
+
+**NOTE:** *A status index of 0 will never be returned. This allows user
+programs to have an index that can be used in error cases.*
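A minimal sketch of parsing one "Byte:Name [# Comments]" line from the status output shown above. The struct and parse_status_line() are hypothetical names, and the sketch assumes event names contain no whitespace and are shorter than 64 bytes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one parsed status line */
struct status_line {
	int index;		/* byte offset of the event in the status page */
	char name[64];		/* event name */
	int has_comment;	/* non-zero when a "# ..." comment is present */
};

/*
 * Parse one "Byte:Name [# Comments]" line.
 * Returns 0 on success, -1 if the line does not match, which is the case
 * for the trailing "Active:", "Busy:" and "Max:" summary lines.
 */
static int parse_status_line(const char *line, struct status_line *out)
{
	if (sscanf(line, "%d:%63s", &out->index, out->name) != 2)
		return -1;

	out->has_comment = strchr(line, '#') != NULL;

	return 0;
}
```

In the example output above, the comment appears once ftrace has enabled the event, so has_comment gives a rough hint that something is attached.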
+
+Status Bits
+^^^^^^^^^^^
+The byte being checked will be non-zero if anything is attached. Programs can
+check specific bits in the byte to see what mechanism has been attached.
+
+The following values are defined to aid in checking what has been attached:
+
+**EVENT_STATUS_FTRACE** - Bit set if ftrace has been attached (Bit 0).
+
+**EVENT_STATUS_PERF** - Bit set if perf/eBPF has been attached (Bit 1).
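A minimal sketch of testing the individual status bits for one event. The fallback definitions below follow the bit positions stated above and are assumptions that only keep the sketch standalone; real programs would take the values from linux/user_events.h. ftrace_attached() and perf_attached() are hypothetical helper names.

```c
/*
 * Bit values as described above. The real definitions are expected to come
 * from linux/user_events.h; these fallbacks are assumptions for the sketch.
 */
#ifndef EVENT_STATUS_FTRACE
#define EVENT_STATUS_FTRACE (1 << 0)
#endif
#ifndef EVENT_STATUS_PERF
#define EVENT_STATUS_PERF (1 << 1)
#endif

/* Non-zero when ftrace is attached to the event at status_index */
static int ftrace_attached(const unsigned char *status_page, int status_index)
{
	return (status_page[status_index] & EVENT_STATUS_FTRACE) != 0;
}

/* Non-zero when perf or eBPF is attached to the event at status_index */
static int perf_attached(const unsigned char *status_page, int status_index)
{
	return (status_page[status_index] & EVENT_STATUS_PERF) != 0;
}
```

Here status_page stands for the mapping of user_events_status and status_index for the value returned by the register ioctl().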
+
+Writing Data
+------------
+After registering an event the same fd that was used to register can be used
+to write an entry for that event. The write_index returned must be at the start
+of the data, then the remaining data is treated as the payload of the event.
+
+For example, if the write_index returned was 1 and I wanted to write out an int
+payload for the event, then the data would have to be 8 bytes (2 ints) long,
+with the first 4 bytes being equal to 1 and the last 4 bytes being equal to the
+value I want as the payload.
+
+In memory this would look like this:
+
+::
+
+  int index;
+  int payload;
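A minimal sketch of laying out that 8-byte record in a buffer before a single write() call. pack_int_event() is a hypothetical helper, and the sketch assumes a 4-byte int as in the example above.

```c
#include <string.h>

/*
 * Lay out one event record: the write index first, then the int payload.
 * buf must hold at least 2 * sizeof(int) bytes.
 * Returns the number of bytes used. Hypothetical helper.
 */
static size_t pack_int_event(unsigned char *buf, int write_index, int payload)
{
	memcpy(buf, &write_index, sizeof(write_index));
	memcpy(buf + sizeof(write_index), &payload, sizeof(payload));

	return sizeof(write_index) + sizeof(payload);
}
```

The filled buffer and the returned length would then be passed to write() on the fd that was used to register the event.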
+
+User programs might have well known structs that they wish to emit as
+payloads. In those cases writev() can be used, with the first vector being
+the index and the following vector(s) being the actual event payload.
+
+For example, if I have a struct like this:
+
+::
+
+  struct payload {
+        int src;
+        int dst;
+        int flags;
+  };
+
+It's advised for user programs to do the following:
+
+::
+
+  struct iovec io[2];
+  struct payload e;
+
+  io[0].iov_base = &write_index;
+  io[0].iov_len = sizeof(write_index);
+  io[1].iov_base = &e;
+  io[1].iov_len = sizeof(e);
+
+  writev(fd, (const struct iovec*)io, 2);
+
+**NOTE:** *The write_index is not emitted out into the trace being recorded.*
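A minimal sketch that wraps the writev() pattern above in a function that works for any payload. emit_event() is a hypothetical wrapper, not part of the patch; because it takes a plain fd, it can be exercised against a pipe to confirm that the index and payload arrive back to back.

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Emit one event with writev(): vector 0 carries the write index and
 * vector 1 carries the payload. Returns whatever writev() returns.
 * Hypothetical wrapper for illustration only.
 */
static ssize_t emit_event(int fd, int write_index, const void *payload,
			  size_t len)
{
	struct iovec io[2];

	io[0].iov_base = &write_index;
	io[0].iov_len = sizeof(write_index);
	io[1].iov_base = (void *)payload;	/* iov_base is not const-qualified */
	io[1].iov_len = len;

	return writev(fd, io, 2);
}
```

For the struct payload above, a call would look like emit_event(fd, write_index, &e, sizeof(e)), where fd is the descriptor that was used to register the event.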
+
+eBPF
+----
+eBPF programs that attach to a user-based event tracepoint are given a pointer
+to a struct user_bpf_context. The bpf context contains the data type (which can
+be a user or kernel buffer, or can be a pointer to the iovec) and the data
+length that was emitted (minus the write_index).
+
+Example Code
+------------
+
+::
+
+  #include <errno.h>
+  #include <sys/ioctl.h>
+  #include <sys/mman.h>
+  #include <sys/uio.h>
+  #include <fcntl.h>
+  #include <stdio.h>
+  #include <unistd.h>
+  #include <linux/user_events.h>
+  
+  /* Assumes debugfs is mounted */
+  const char *data_file = "/sys/kernel/debug/tracing/user_events_data";
+  const char *status_file = "/sys/kernel/debug/tracing/user_events_status";
+  
+  static int event_status(char **status)
+  {
+  	int fd = open(status_file, O_RDONLY);
+
+  	if (fd == -1)
+  		return -1;
+
+  	*status = mmap(NULL, sysconf(_SC_PAGESIZE), PROT_READ,
+  		       MAP_SHARED, fd, 0);
+
+  	close(fd);
+
+  	if (*status == MAP_FAILED)
+  		return -1;
+  
+  	return 0;
+  }
+  
+  static int event_reg(int fd, const char *command, int *status, int *write)
+  {
+  	struct user_reg reg = {0};
+  
+  	reg.size = sizeof(reg);
+  	reg.name_args = (__u64)command;
+  
+  	if (ioctl(fd, DIAG_IOCSREG, &reg) == -1)
+  		return -1;
+  
+  	*status = reg.status_index;
+  	*write = reg.write_index;
+  
+  	return 0;
+  }
+  
+  int main(int argc, char **argv)
+  {
+  	int data_fd, status, write;
+  	char *status_page;
+  	struct iovec io[2];
+  	__u32 count = 0;
+
+  	if (event_status(&status_page) == -1)
+  		return errno;
+
+  	data_fd = open(data_file, O_RDWR);
+
+  	if (data_fd == -1)
+  		return errno;
+
+  	if (event_reg(data_fd, "test u32 count", &status, &write) == -1)
+  		return errno;
+
+  	/* Setup iovec: the write index goes first, then the payload */
+  	io[0].iov_base = &write;
+  	io[0].iov_len = sizeof(write);
+  	io[1].iov_base = &count;
+  	io[1].iov_len = sizeof(count);
+  
+  ask:
+  	printf("Press enter to check status...\n");
+  	getchar();
+  
+  	/* Check if anyone is listening */
+  	if (status_page[status]) {
+  		/* Yep, trace out our data */
+  		writev(data_fd, (const struct iovec*)io, 2);
+  
+  		/* Increase the count */
+  		count++;
+  
+  		printf("Something was attached, wrote data\n");
+  	}
+  
+  	goto ask;
+  
+  	return 0;
+  }
-- 
2.17.1


