All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Florent Revest <revest@chromium.org>
Cc: linux-trace-kernel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	bpf <bpf@vger.kernel.org>, Sven Schnelle <svens@linux.ibm.com>,
	Alexei Starovoitov <ast@kernel.org>, Jiri Olsa <jolsa@kernel.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Alan Maguire <alan.maguire@oracle.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>, Guo Ren <guoren@kernel.org>
Subject: [RFC PATCH v2 17/31] function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data()
Date: Wed,  8 Nov 2023 23:27:38 +0900	[thread overview]
Message-ID: <169945365765.55307.4129860949245209954.stgit@devnote2> (raw)
In-Reply-To: <169945345785.55307.5003201137843449313.stgit@devnote2>

From: Steven Rostedt (VMware) <rostedt@goodmis.org>

Added functions that can be called by a fgraph_ops entryfunc and retfunc to
store state between the entry of the function being traced to the exit of
the same function. The fgraph_ops entryfunc() may call
fgraph_reserve_data() to store up to 32 words onto the task's shadow
ret_stack and this then can be retrieved by fgraph_retrieve_data() called
by the corresponding retfunc().

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Changes in v2:
  - Retrieve the reserved size by fgraph_retrieve_data().
  - Expand the maximum data size to 32 words.
  - Update stack index with __get_index(val) if FGRAPH_TYPE_ARRAY entry.
  - fix typos and make description lines shorter than 76 chars.
---
 include/linux/ftrace.h |    3 +
 kernel/trace/fgraph.c  |  248 +++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 217 insertions(+), 34 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 3f9f1f48e8fd..3bc01329548b 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1074,6 +1074,9 @@ struct fgraph_ops {
 	int				idx;
 };
 
+void *fgraph_reserve_data(int size_bytes);
+void *fgraph_retrieve_data(int *size_bytes);
+
 /*
  * Stack of return addresses for functions
  * of a thread.
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 79bdd3c775dd..4d8664942335 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -38,25 +38,36 @@
  * bits: 14 - 15	Type of storage
  *			  0 - reserved
  *			  1 - fgraph_array index
+ *			  2 - reservered data
  * For fgraph_array_index:
  *  bits: 16 - 23	The fgraph_ops fgraph_array index
  *
+ * For reserved data:
+ *  bits: 16 - 17	The size in words that is stored
+ *
  * That is, at the end of function_graph_enter, if the first and forth
  * fgraph_ops on the fgraph_array[] (index 0 and 3) needs their retfunc called
- * on the return of the function being traced, this is what will be on the
- * task's shadow ret_stack: (the stack grows upward)
+ * on the return of the function being traced, and the forth fgraph_ops
+ * stored two words of data, this is what will be on the task's shadow
+ * ret_stack: (the stack grows upward)
+ *
+ * |                                     | <- task->curr_ret_stack
+ * +-------------------------------------+
+ * | (3 << FGRAPH_ARRAY_SHIFT)|type:1|(5)| ( 3 for index of fourth fgraph_ops)
+ * +-------------------------------------+
+ * | (3 << FGRAPH_DATA_SHIFT)|type:2|(4) | ( Data with size of 2 words)
+ * +-------------------------------------+ ( It is 4 words from the ret_stack)
+ * |         STORED DATA WORD 2          |
+ * |         STORED DATA WORD 1          |
+ * +-------------------------------------+
+ * | (0 << FGRAPH_ARRAY_SHIFT)|type:1|(1)| ( 0 for index of first fgraph_ops)
+ * +-------------------------------------+
+ * | struct ftrace_ret_stack             |
+ * |   (stores the saved ret pointer)    |
+ * +-------------------------------------+
+ * |             (X) | (N)               | ( N words away from last ret_stack)
+ * |                                     |
  *
- * |                                  | <- task->curr_ret_stack
- * +----------------------------------+
- * | (3 << FGRAPH_ARRAY_SHIFT)|(2)    | ( 3 for index of fourth fgraph_ops)
- * +----------------------------------+
- * | (0 << FGRAPH_ARRAY_SHIFT)|(1)    | ( 0 for index of first fgraph_ops)
- * +----------------------------------+
- * | struct ftrace_ret_stack          |
- * |   (stores the saved ret pointer) |
- * +----------------------------------+
- * |             (X) | (N)            | ( N words away from previous ret_stack)
- * |                                  |
  *
  * If a backtrace is required, and the real return pointer needs to be
  * fetched, then it looks at the task's curr_ret_stack index, if it
@@ -77,12 +88,17 @@
 enum {
 	FGRAPH_TYPE_RESERVED	= 0,
 	FGRAPH_TYPE_ARRAY	= 1,
+	FGRAPH_TYPE_DATA	= 2,
 };
 
 #define FGRAPH_ARRAY_SIZE	16
 #define FGRAPH_ARRAY_MASK	((1 << FGRAPH_ARRAY_SIZE) - 1)
 #define FGRAPH_ARRAY_SHIFT	(FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
 
+#define FGRAPH_DATA_SIZE	5
+#define FGRAPH_DATA_MASK	((1 << FGRAPH_DATA_SIZE) - 1)
+#define FGRAPH_DATA_SHIFT	(FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
+
 /* Currently the max stack index can't be more than register callers */
 #define FGRAPH_MAX_INDEX	FGRAPH_ARRAY_SIZE
 
@@ -97,6 +113,8 @@ enum {
 
 #define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
 
+#define FGRAPH_MAX_DATA_SIZE (sizeof(long) * (1 << FGRAPH_DATA_SIZE))
+
 /*
  * Each fgraph_ops has a reservered unsigned long at the end (top) of the
  * ret_stack to store task specific state.
@@ -111,21 +129,44 @@ static int fgraph_array_cnt;
 
 static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
 
+static inline int __get_index(unsigned long val)
+{
+	return val & FGRAPH_RET_INDEX_MASK;
+}
+
+static inline int __get_type(unsigned long val)
+{
+	return (val >> FGRAPH_TYPE_SHIFT) & FGRAPH_TYPE_MASK;
+}
+
+static inline int __get_array(unsigned long val)
+{
+	return (val >> FGRAPH_ARRAY_SHIFT) & FGRAPH_ARRAY_MASK;
+}
+
+static inline int __get_data(unsigned long val)
+{
+	return (val >> FGRAPH_DATA_SHIFT) & FGRAPH_DATA_MASK;
+}
+
 static inline int get_ret_stack_index(struct task_struct *t, int offset)
 {
-	return current->ret_stack[offset] & FGRAPH_RET_INDEX_MASK;
+	return __get_index(current->ret_stack[offset]);
 }
 
 static inline int get_fgraph_type(struct task_struct *t, int offset)
 {
-	return (current->ret_stack[offset] >> FGRAPH_TYPE_SHIFT) &
-		FGRAPH_TYPE_MASK;
+	return __get_type(current->ret_stack[offset]);
 }
 
 static inline int get_fgraph_array(struct task_struct *t, int offset)
 {
-	return (current->ret_stack[offset] >> FGRAPH_ARRAY_SHIFT) &
-		FGRAPH_ARRAY_MASK;
+	return __get_array(current->ret_stack[offset]);
+}
+
+static inline int get_data_idx(struct task_struct *t, int offset)
+{
+	return __get_data(current->ret_stack[offset]);
 }
 
 /* ftrace_graph_entry set to this to tell some archs to run function graph */
@@ -161,6 +202,124 @@ static void ret_stack_init_task_vars(unsigned long *ret_stack)
 	memset(gvals, 0, sizeof(*gvals) * FGRAPH_ARRAY_SIZE);
 }
 
+/**
+ * fgraph_reserve_data - Reserve storage on the task's ret_stack
+ * @size_bytes: The size in bytes to reserve (max of 4 words in size)
+ *
+ * Reserves space of up to 4 words (in word increments) on the
+ * task's ret_stack shadow stack, for a given fgraph_ops during
+ * the entryfunc() call. If entryfunc() returns zero, the storage
+ * is discarded. An entryfunc() can only call this once per iteration.
+ * The fgraph_ops retfunc() can retrieve this stored data with
+ * fgraph_retrieve_data().
+ *
+ * Returns: On success, a pointer to the data on the stack.
+ *   Otherwise, NULL if there's not enough space left on the
+ *   ret_stack for the data, or if fgraph_reserve_data() was called
+ *   more than once for a single entryfunc() call.
+ */
+void *fgraph_reserve_data(int size_bytes)
+{
+	unsigned long val;
+	void *data;
+	int curr_ret_stack = current->curr_ret_stack;
+	int data_size;
+	int size;
+
+	if (size_bytes > FGRAPH_MAX_DATA_SIZE)
+		return NULL;
+
+	/* Convert to number of longs + data word */
+	data_size = ALIGN(size_bytes, sizeof(long)) / sizeof(long) + 1;
+
+	/* The size to add to ret_stack (including the reserve word) */
+	size = data_size + 1;
+
+	val = current->ret_stack[curr_ret_stack - 1];
+
+	switch (__get_type(val)) {
+	case FGRAPH_TYPE_RESERVED:
+		/*
+		 * A reserve word is only saved after the ret_stack
+		 * or after a data storage, not after an fgraph_array
+		 * entry. It's OK if its after the ret_stack in which
+		 * case the index will be one, but if the index is
+		 * greater than 1 it means it's a double call to
+		 * fgraph_reserve_data()
+		 */
+		if (__get_index(val) > 1)
+			return NULL;
+		/*
+		 * Leave the reserve in case the entryfunc() doesn't
+		 * want to be recorded.
+		 */
+		break;
+	case FGRAPH_TYPE_ARRAY:
+		break;
+	default:
+		return NULL;
+	}
+	data = &current->ret_stack[curr_ret_stack];
+
+	curr_ret_stack += size;
+	if (unlikely(curr_ret_stack >= SHADOW_STACK_MAX_INDEX))
+		return NULL;
+
+	val = __get_index(val) + size;
+
+	/* Set the last word to be reserved */
+	current->ret_stack[curr_ret_stack - 1] = val;
+
+	/* Make sure interrupts see this */
+	barrier();
+	current->curr_ret_stack = curr_ret_stack;
+	/* Again sync with interrupts, and reset reserve */
+	current->ret_stack[curr_ret_stack - 1] = val;
+
+	val = (data_size << FGRAPH_DATA_SHIFT) |
+		(FGRAPH_TYPE_DATA << FGRAPH_TYPE_SHIFT) |
+		(val - 1);
+
+	/* Save the data header */
+	current->ret_stack[curr_ret_stack - 2] = val;
+
+	return data;
+}
+
+/**
+ * fgraph_retrieve_data - Retrieve stored data from fgraph_reserve_data()
+ * @size_bytes: pointer to retrieved data size.
+ *
+ * This is to be called by a fgraph_ops retfunc(), to retrieve data that
+ * was stored by the fgraph_ops entryfunc() on the function entry.
+ * That is, this will retrieve the data that was reserved on the
+ * entry of the function that corresponds to the exit of the function
+ * that the fgraph_ops retfunc() is called on.
+ *
+ * Returns: The stored data from fgraph_reserve_data() called by the
+ *    matching entryfunc() for the retfunc() this is called from.
+ *   Or NULL if there was nothing stored.
+ */
+void *fgraph_retrieve_data(int *size_bytes)
+{
+	unsigned long val;
+	int curr_ret_stack = current->curr_ret_stack;
+
+	/* Top of stack is the fgraph_ops */
+	val = current->ret_stack[curr_ret_stack - 1];
+	/* Check if there's nothing between the fgraph_ops and ret_stack */
+	if (__get_index(val) == 1)
+		return NULL;
+	val = current->ret_stack[curr_ret_stack - 2];
+	if (__get_type(val) != FGRAPH_TYPE_DATA)
+		return NULL;
+	if (size_bytes)
+		*size_bytes = (__get_data(val) - 1) * sizeof(long);
+
+	return &current->ret_stack[curr_ret_stack -
+				   (__get_data(val) + 1)];
+}
+
 /**
  * fgraph_get_task_var - retrieve a task specific state variable
  * @gops: The ftrace_ops that owns the task specific variable
@@ -356,6 +515,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 			 unsigned long frame_pointer, unsigned long *retp)
 {
 	struct ftrace_graph_ent trace;
+	int save_curr_ret_stack;
 	int offset;
 	int start;
 	int type;
@@ -395,8 +555,10 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 			atomic_inc(&current->trace_overrun);
 			break;
 		}
+		save_curr_ret_stack = current->curr_ret_stack;
 		if (ftrace_ops_test(&gops->ops, func, NULL) &&
 		    gops->entryfunc(&trace, gops)) {
+			/* Note, curr_ret_stack could change by enryfunc() */
 			offset = current->curr_ret_stack;
 			/* Check the top level stored word */
 			type = get_fgraph_type(current, offset - 1);
@@ -430,6 +592,9 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 			barrier();
 			current->ret_stack[offset] = val;
 			cnt++;
+		} else {
+			/* Clear out any saved storage */
+			current->curr_ret_stack = save_curr_ret_stack;
 		}
 	}
 
@@ -544,10 +709,10 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
 	struct ftrace_ret_stack *ret_stack;
 	struct ftrace_graph_ret trace;
 	unsigned long ret;
-	int offset;
+	int curr_ret_stack;
+	int stop_at;
 	int index;
 	int idx;
-	int i;
 
 	ret_stack = ftrace_pop_return_trace(&trace, &ret, frame_pointer);
 
@@ -563,24 +728,39 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
 	trace.retval = fgraph_ret_regs_return_value(ret_regs);
 #endif
 
-	offset = current->curr_ret_stack - 1;
-	index = get_ret_stack_index(current, offset);
+	curr_ret_stack = current->curr_ret_stack;
+	index = get_ret_stack_index(current, curr_ret_stack - 1);
+
+	stop_at = curr_ret_stack - index;
 
 	/* index has to be at least one! Optimize for it */
-	i = 0;
 	do {
-		idx = get_fgraph_array(current, offset - i);
-		fgraph_array[idx]->retfunc(&trace, fgraph_array[idx]);
-		i++;
-	} while (i < index);
+		unsigned long val;
+
+		val = current->ret_stack[curr_ret_stack - 1];
+		switch (__get_type(val)) {
+		case FGRAPH_TYPE_ARRAY:
+			idx = __get_array(val);
+			fgraph_array[idx]->retfunc(&trace, fgraph_array[idx]);
+			curr_ret_stack -= __get_index(val);
+			break;
+		case FGRAPH_TYPE_RESERVED:
+			curr_ret_stack--;
+			break;
+		case FGRAPH_TYPE_DATA:
+			curr_ret_stack -= __get_data(val);
+			break;
+		default:
+			WARN_ONCE(1, "Bad fgraph ret_stack data type %d",
+				  __get_type(val));
+			curr_ret_stack--;
+		}
+		/* Make sure interrupts see the update after the above */
+		barrier();
+		current->curr_ret_stack = curr_ret_stack;
+	} while (curr_ret_stack > stop_at);
 
-	/*
-	 * The ftrace_graph_return() may still access the current
-	 * ret_stack structure, we need to make sure the update of
-	 * curr_ret_stack is after that.
-	 */
-	barrier();
-	current->curr_ret_stack -= index + FGRAPH_RET_INDEX;
+	current->curr_ret_stack -= FGRAPH_RET_INDEX;
 	current->curr_ret_depth--;
 	return ret;
 }


  parent reply	other threads:[~2023-11-08 14:27 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-08 14:24 [RFC PATCH v2 00/31] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph Masami Hiramatsu (Google)
2023-11-08 14:24 ` [RFC PATCH v2 01/31] tracing: Add a comment about ftrace_regs definition Masami Hiramatsu (Google)
2023-11-08 23:14   ` Masami Hiramatsu
2023-11-10 11:11     ` Mark Rutland
2023-11-11  2:24       ` Masami Hiramatsu
2023-11-08 14:24 ` [RFC PATCH v2 02/31] x86: tracing: Add ftrace_regs definition in the header Masami Hiramatsu (Google)
2023-11-08 14:24 ` [RFC PATCH v2 03/31] seq_buf: Export seq_buf_puts() Masami Hiramatsu (Google)
2023-11-08 14:25 ` [RFC PATCH v2 04/31] function_graph: Convert ret_stack to a series of longs Masami Hiramatsu (Google)
2023-11-08 14:25 ` [RFC PATCH v2 05/31] fgraph: Use BUILD_BUG_ON() to make sure we have structures divisible by long Masami Hiramatsu (Google)
2023-11-08 14:25 ` [RFC PATCH v2 06/31] function_graph: Add an array structure that will allow multiple callbacks Masami Hiramatsu (Google)
2023-11-08 14:25 ` [RFC PATCH v2 07/31] function_graph: Allow multiple users to attach to function graph Masami Hiramatsu (Google)
2023-11-08 14:25 ` [RFC PATCH v2 08/31] function_graph: Remove logic around ftrace_graph_entry and return Masami Hiramatsu (Google)
2023-11-08 14:26 ` [RFC PATCH v2 09/31] ftrace/function_graph: Pass fgraph_ops to function graph callbacks Masami Hiramatsu (Google)
2023-11-08 14:26 ` [RFC PATCH v2 10/31] ftrace: Allow function_graph tracer to be enabled in instances Masami Hiramatsu (Google)
2023-11-08 14:26 ` [RFC PATCH v2 11/31] ftrace: Allow ftrace startup flags exist without dynamic ftrace Masami Hiramatsu (Google)
2023-11-08 14:26 ` [RFC PATCH v2 12/31] function_graph: Have the instances use their own ftrace_ops for filtering Masami Hiramatsu (Google)
2023-11-10  1:51   ` Masami Hiramatsu
2023-11-10  2:18     ` Steven Rostedt
2023-11-10  3:09       ` Masami Hiramatsu
2023-11-08 14:26 ` [RFC PATCH v2 13/31] function_graph: Add "task variables" per task for fgraph_ops Masami Hiramatsu (Google)
2023-11-08 14:27 ` [RFC PATCH v2 14/31] function_graph: Move set_graph_function tests to shadow stack global var Masami Hiramatsu (Google)
2023-11-08 14:27 ` [RFC PATCH v2 15/31] function_graph: Move graph depth stored data " Masami Hiramatsu (Google)
2023-11-08 14:27 ` [RFC PATCH v2 16/31] function_graph: Move graph notrace bit " Masami Hiramatsu (Google)
2023-11-08 14:27 ` Masami Hiramatsu (Google) [this message]
2023-11-08 14:27 ` [RFC PATCH v2 18/31] function_graph: Add selftest for passing local variables Masami Hiramatsu (Google)
2023-11-08 14:28 ` [RFC PATCH v2 19/31] function_graph: Add a new entry handler with parent_ip and ftrace_regs Masami Hiramatsu (Google)
2023-11-08 14:28 ` [RFC PATCH v2 20/31] function_graph: Add a new exit " Masami Hiramatsu (Google)
2023-11-08 23:17   ` Masami Hiramatsu
2023-11-08 14:28 ` [RFC PATCH v2 21/31] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS Masami Hiramatsu (Google)
2023-11-08 14:28 ` [RFC PATCH v2 22/31] fprobe: Use ftrace_regs in fprobe entry handler Masami Hiramatsu (Google)
2023-11-08 14:28 ` [RFC PATCH v2 23/31] fprobe: Use ftrace_regs in fprobe exit handler Masami Hiramatsu (Google)
2023-11-08 14:28 ` [RFC PATCH v2 24/31] tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs Masami Hiramatsu (Google)
2023-11-08 14:29 ` [RFC PATCH v2 25/31] tracing: Add ftrace_fill_perf_regs() for perf event Masami Hiramatsu (Google)
2023-11-08 14:29 ` [RFC PATCH v2 26/31] fprobe: Rewrite fprobe on function-graph tracer Masami Hiramatsu (Google)
2023-11-10  7:17   ` Masami Hiramatsu
2023-11-11  1:44     ` Steven Rostedt
2023-11-11  3:01       ` Masami Hiramatsu
2023-11-08 14:29 ` [RFC PATCH v2 27/31] tracing/fprobe: Remove nr_maxactive from fprobe Masami Hiramatsu (Google)
2023-11-08 14:29 ` [RFC PATCH v2 28/31] tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS Masami Hiramatsu (Google)
2023-11-08 14:29 ` [RFC PATCH v2 29/31] bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled Masami Hiramatsu (Google)
2023-11-08 14:30 ` [RFC PATCH v2 30/31] selftests: ftrace: Remove obsolate maxactive syntax check Masami Hiramatsu (Google)
2023-11-08 14:30 ` [RFC PATCH v2 31/31] Documentation: probes: Update fprobe on function-graph tracer Masami Hiramatsu (Google)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=169945365765.55307.4129860949245209954.stgit@devnote2 \
    --to=mhiramat@kernel.org \
    --cc=acme@kernel.org \
    --cc=alan.maguire@oracle.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=guoren@kernel.org \
    --cc=jolsa@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=martin.lau@linux.dev \
    --cc=peterz@infradead.org \
    --cc=revest@chromium.org \
    --cc=rostedt@goodmis.org \
    --cc=svens@linux.ibm.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.