* [for-next][PATCH 01/15] lib/bootconfig: Fix the xbc_get_info kerneldoc
2021-10-27 16:09 [for-next][PATCH 00/15] tracing: More updates for 5.16 Steven Rostedt
@ 2021-10-27 16:09 ` Steven Rostedt
2021-10-27 16:09 ` [for-next][PATCH 02/15] samples/kretprobes: Fix return value if register_kretprobe() failed Steven Rostedt
` (13 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2021-10-27 16:09 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Stephen Rothwell, Masami Hiramatsu
From: Masami Hiramatsu <mhiramat@kernel.org>
Fix the kernel doc of xbc_get_info() to add '@' to the parameters.
Link: https://lkml.kernel.org/r/163525086738.676803.15352231787913236933.stgit@devnote2
Fixes: e306220cb7b7 ("bootconfig: Add xbc_get_info() for the node information")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
lib/bootconfig.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index 3276675b25e1..a10ab25f6fcc 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -67,8 +67,8 @@ static inline void xbc_free_mem(void *addr, size_t size)
#endif
/**
* xbc_get_info() - Get the information of loaded boot config
- * node_size: A pointer to store the number of nodes.
- * data_size: A pointer to store the size of bootconfig data.
+ * @node_size: A pointer to store the number of nodes.
+ * @data_size: A pointer to store the size of bootconfig data.
*
* Get the number of used nodes in @node_size if it is not NULL,
* and the size of bootconfig data in @data_size if it is not NULL.
--
2.33.0
^ permalink raw reply related [flat|nested] 17+ messages in thread* [for-next][PATCH 02/15] samples/kretprobes: Fix return value if register_kretprobe() failed
2021-10-27 16:09 [for-next][PATCH 00/15] tracing: More updates for 5.16 Steven Rostedt
2021-10-27 16:09 ` [for-next][PATCH 01/15] lib/bootconfig: Fix the xbc_get_info kerneldoc Steven Rostedt
@ 2021-10-27 16:09 ` Steven Rostedt
2021-10-27 16:09 ` [for-next][PATCH 03/15] docs, kprobes: Remove invalid URL and add new reference Steven Rostedt
` (12 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2021-10-27 16:09 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Tiezhu Yang, Masami Hiramatsu
From: Tiezhu Yang <yangtiezhu@loongson.cn>
Use the actual return value instead of always -1 if register_kretprobe()
failed.
E.g. without this patch:
# insmod samples/kprobes/kretprobe_example.ko func=no_such_func
insmod: ERROR: could not insert module samples/kprobes/kretprobe_example.ko: Operation not permitted
With this patch:
# insmod samples/kprobes/kretprobe_example.ko func=no_such_func
insmod: ERROR: could not insert module samples/kprobes/kretprobe_example.ko: Unknown symbol in module
Link: https://lkml.kernel.org/r/1635213091-24387-2-git-send-email-yangtiezhu@loongson.cn
Fixes: 804defea1c02 ("Kprobes: move kprobe examples to samples/")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
samples/kprobes/kretprobe_example.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/samples/kprobes/kretprobe_example.c b/samples/kprobes/kretprobe_example.c
index 5dc1bf3baa98..228321ecb161 100644
--- a/samples/kprobes/kretprobe_example.c
+++ b/samples/kprobes/kretprobe_example.c
@@ -86,7 +86,7 @@ static int __init kretprobe_init(void)
ret = register_kretprobe(&my_kretprobe);
if (ret < 0) {
pr_err("register_kretprobe failed, returned %d\n", ret);
- return -1;
+ return ret;
}
pr_info("Planted return probe at %s: %p\n",
my_kretprobe.kp.symbol_name, my_kretprobe.kp.addr);
--
2.33.0
^ permalink raw reply related [flat|nested] 17+ messages in thread* [for-next][PATCH 03/15] docs, kprobes: Remove invalid URL and add new reference
2021-10-27 16:09 [for-next][PATCH 00/15] tracing: More updates for 5.16 Steven Rostedt
2021-10-27 16:09 ` [for-next][PATCH 01/15] lib/bootconfig: Fix the xbc_get_info kerneldoc Steven Rostedt
2021-10-27 16:09 ` [for-next][PATCH 02/15] samples/kretprobes: Fix return value if register_kretprobe() failed Steven Rostedt
@ 2021-10-27 16:09 ` Steven Rostedt
2021-10-27 16:09 ` [for-next][PATCH 04/15] test_kprobes: Move it from kernel/ to lib/ Steven Rostedt
` (11 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2021-10-27 16:09 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Tiezhu Yang, Masami Hiramatsu
From: Tiezhu Yang <yangtiezhu@loongson.cn>
The following reference is invalid, remove it.
https://www.ibm.com/developerworks/library/l-kprobes/index.html
Add the following new reference "An introduction to KProbes":
https://lwn.net/Articles/132196/
Link: https://lkml.kernel.org/r/1635213091-24387-3-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
Documentation/trace/kprobes.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/trace/kprobes.rst b/Documentation/trace/kprobes.rst
index 998149ce2fd9..f318bceda1e6 100644
--- a/Documentation/trace/kprobes.rst
+++ b/Documentation/trace/kprobes.rst
@@ -784,6 +784,6 @@ References
For additional information on Kprobes, refer to the following URLs:
-- https://www.ibm.com/developerworks/library/l-kprobes/index.html
+- https://lwn.net/Articles/132196/
- https://www.kernel.org/doc/ols/2006/ols2006v2-pages-109-124.pdf
--
2.33.0
^ permalink raw reply related [flat|nested] 17+ messages in thread* [for-next][PATCH 04/15] test_kprobes: Move it from kernel/ to lib/
2021-10-27 16:09 [for-next][PATCH 00/15] tracing: More updates for 5.16 Steven Rostedt
` (2 preceding siblings ...)
2021-10-27 16:09 ` [for-next][PATCH 03/15] docs, kprobes: Remove invalid URL and add new reference Steven Rostedt
@ 2021-10-27 16:09 ` Steven Rostedt
2021-10-27 16:09 ` [for-next][PATCH 05/15] MAINTAINERS: Update KPROBES and TRACING entries Steven Rostedt
` (10 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2021-10-27 16:09 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Tiezhu Yang, Masami Hiramatsu
From: Tiezhu Yang <yangtiezhu@loongson.cn>
Since config KPROBES_SANITY_TEST is in lib/Kconfig.debug, it is better to
let test_kprobes.c in lib/, just like other similar tests found in lib/.
Link: https://lkml.kernel.org/r/1635213091-24387-4-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
kernel/Makefile | 1 -
lib/Makefile | 1 +
{kernel => lib}/test_kprobes.c | 0
3 files changed, 1 insertion(+), 1 deletion(-)
rename {kernel => lib}/test_kprobes.c (100%)
diff --git a/kernel/Makefile b/kernel/Makefile
index 4df609be42d0..9e4d33dce8a5 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -85,7 +85,6 @@ obj-$(CONFIG_PID_NS) += pid_namespace.o
obj-$(CONFIG_IKCONFIG) += configs.o
obj-$(CONFIG_IKHEADERS) += kheaders.o
obj-$(CONFIG_SMP) += stop_machine.o
-obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
obj-$(CONFIG_AUDIT) += audit.o auditfilter.o
obj-$(CONFIG_AUDITSYSCALL) += auditsc.o audit_watch.o audit_fsnotify.o audit_tree.o
obj-$(CONFIG_GCOV_KERNEL) += gcov/
diff --git a/lib/Makefile b/lib/Makefile
index 5efd1b435a37..864ff515814d 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -100,6 +100,7 @@ obj-$(CONFIG_TEST_MEMINIT) += test_meminit.o
obj-$(CONFIG_TEST_LOCKUP) += test_lockup.o
obj-$(CONFIG_TEST_HMM) += test_hmm.o
obj-$(CONFIG_TEST_FREE_PAGES) += test_free_pages.o
+obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
#
# CFLAGS for compiling floating point code inside the kernel. x86/Makefile turns
diff --git a/kernel/test_kprobes.c b/lib/test_kprobes.c
similarity index 100%
rename from kernel/test_kprobes.c
rename to lib/test_kprobes.c
--
2.33.0
^ permalink raw reply related [flat|nested] 17+ messages in thread* [for-next][PATCH 05/15] MAINTAINERS: Update KPROBES and TRACING entries
2021-10-27 16:09 [for-next][PATCH 00/15] tracing: More updates for 5.16 Steven Rostedt
` (3 preceding siblings ...)
2021-10-27 16:09 ` [for-next][PATCH 04/15] test_kprobes: Move it from kernel/ to lib/ Steven Rostedt
@ 2021-10-27 16:09 ` Steven Rostedt
2021-10-27 16:09 ` [for-next][PATCH 06/15] selftests/ftrace: Stop tracing while reading the trace file by default Steven Rostedt
` (9 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2021-10-27 16:09 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Tiezhu Yang, Masami Hiramatsu
From: Tiezhu Yang <yangtiezhu@loongson.cn>
There is no git tree for KPROBES in MAINTAINERS, it is not convinent to
rebase, lib/test_kprobes.c and samples/kprobes belong to kprobe, so add
git tree and missing files for KPROBES, and also use linux-trace.git for
TRACING to avoid confusing.
Link: https://lkml.kernel.org/r/1635213091-24387-5-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
MAINTAINERS | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 5b33791bb8e9..8d26693cc317 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10437,10 +10437,13 @@ M: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
M: "David S. Miller" <davem@davemloft.net>
M: Masami Hiramatsu <mhiramat@kernel.org>
S: Maintained
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
F: Documentation/trace/kprobes.rst
F: include/asm-generic/kprobes.h
F: include/linux/kprobes.h
F: kernel/kprobes.c
+F: lib/test_kprobes.c
+F: samples/kprobes
KS0108 LCD CONTROLLER DRIVER
M: Miguel Ojeda <ojeda@kernel.org>
@@ -18967,7 +18970,7 @@ TRACING
M: Steven Rostedt <rostedt@goodmis.org>
M: Ingo Molnar <mingo@redhat.com>
S: Maintained
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
F: Documentation/trace/ftrace.rst
F: arch/*/*/*/ftrace.h
F: arch/*/kernel/ftrace.c
--
2.33.0
^ permalink raw reply related [flat|nested] 17+ messages in thread* [for-next][PATCH 06/15] selftests/ftrace: Stop tracing while reading the trace file by default
2021-10-27 16:09 [for-next][PATCH 00/15] tracing: More updates for 5.16 Steven Rostedt
` (4 preceding siblings ...)
2021-10-27 16:09 ` [for-next][PATCH 05/15] MAINTAINERS: Update KPROBES and TRACING entries Steven Rostedt
@ 2021-10-27 16:09 ` Steven Rostedt
2021-10-27 16:09 ` [for-next][PATCH 07/15] tracing: Add support for creating hist trigger variables from literal Steven Rostedt
` (8 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2021-10-27 16:09 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Masami Hiramatsu
From: Masami Hiramatsu <mhiramat@kernel.org>
Stop tracing while reading the trace file by default, to prevent
the test results while checking it and to avoid taking a long time
to check the result.
If there is any testcase which wants to test the tracing while reading
the trace file, please override this setting inside the test case.
This also recovers the pause-on-trace when clean it up.
Link: https://lkml.kernel.org/r/163529053143.690749.15365238954175942026.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
tools/testing/selftests/ftrace/ftracetest | 2 +-
tools/testing/selftests/ftrace/test.d/functions | 12 ++++++++++++
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/ftracetest b/tools/testing/selftests/ftrace/ftracetest
index 8ec1922e974e..c3311c8c4089 100755
--- a/tools/testing/selftests/ftrace/ftracetest
+++ b/tools/testing/selftests/ftrace/ftracetest
@@ -428,7 +428,7 @@ for t in $TEST_CASES; do
exit 1
fi
done
-(cd $TRACING_DIR; initialize_ftrace) # for cleanup
+(cd $TRACING_DIR; finish_ftrace) # for cleanup
prlog ""
prlog "# of passed: " `echo $PASSED_CASES | wc -w`
diff --git a/tools/testing/selftests/ftrace/test.d/functions b/tools/testing/selftests/ftrace/test.d/functions
index 000fd05e84b1..5f6cbec847fc 100644
--- a/tools/testing/selftests/ftrace/test.d/functions
+++ b/tools/testing/selftests/ftrace/test.d/functions
@@ -124,10 +124,22 @@ initialize_ftrace() { # Reset ftrace to initial-state
[ -f uprobe_events ] && echo > uprobe_events
[ -f synthetic_events ] && echo > synthetic_events
[ -f snapshot ] && echo 0 > snapshot
+
+# Stop tracing while reading the trace file by default, to prevent
+# the test results while checking it and to avoid taking a long time
+# to check the result.
+ [ -f options/pause-on-trace ] && echo 1 > options/pause-on-trace
+
clear_trace
enable_tracing
}
+finish_ftrace() {
+ initialize_ftrace
+# And recover it to default.
+ [ -f options/pause-on-trace ] && echo 0 > options/pause-on-trace
+}
+
check_requires() { # Check required files and tracers
for i in "$@" ; do
r=${i%:README}
--
2.33.0
^ permalink raw reply related [flat|nested] 17+ messages in thread* [for-next][PATCH 07/15] tracing: Add support for creating hist trigger variables from literal
2021-10-27 16:09 [for-next][PATCH 00/15] tracing: More updates for 5.16 Steven Rostedt
` (5 preceding siblings ...)
2021-10-27 16:09 ` [for-next][PATCH 06/15] selftests/ftrace: Stop tracing while reading the trace file by default Steven Rostedt
@ 2021-10-27 16:09 ` Steven Rostedt
2021-10-27 16:09 ` [for-next][PATCH 08/15] tracing: Add division and multiplication support for hist triggers Steven Rostedt
` (7 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2021-10-27 16:09 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Kalesh Singh
From: Kalesh Singh <kaleshsingh@google.com>
Currently hist trigger expressions don't support the use of numeric
literals:
e.g. echo 'hist:keys=common_pid:x=$y-1234'
--> is not valid expression syntax
Having the ability to use numeric constants in hist triggers supports
a wider range of expressions for creating variables.
Add support for creating trace event histogram variables from numeric
literals.
e.g. echo 'hist:keys=common_pid:x=1234,y=size-1024' >> event/trigger
A negative numeric constant is created, using unary minus operator
(parentheses are required).
e.g. echo 'hist:keys=common_pid:z=-(2)' >> event/trigger
Constants can be used with division/multiplication (added in the
next patch in this series) to implement granularity filters for frequent
trace events. For instance we can limit emitting the rss_stat
trace event to when there is a 512KB cross over in the rss size:
# Create a synthetic event to monitor instead of the high frequency
# rss_stat event
echo 'rss_stat_throttled unsigned int mm_id; unsigned int curr;
int member; long size' >> tracing/synthetic_events
# Create a hist trigger that emits the synthetic rss_stat_throttled
# event only when the rss size crosses a 512KB boundary.
echo 'hist:keys=keys=mm_id,member:bucket=size/0x80000:onchange($bucket)
.rss_stat_throttled(mm_id,curr,member,size)'
>> events/kmem/rss_stat/trigger
A use case for using constants with addition/subtraction is not yet
known, but for completeness the use of constants are supported for all
operators.
Link: https://lkml.kernel.org/r/20211025200852.3002369-2-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
kernel/trace/trace_events_hist.c | 71 +++++++++++++++++++++++++++++++-
1 file changed, 70 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index b64aed538628..e6165e36d3b6 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -66,7 +66,8 @@
C(EMPTY_SORT_FIELD, "Empty sort field"), \
C(TOO_MANY_SORT_FIELDS, "Too many sort fields (Max = 2)"), \
C(INVALID_SORT_FIELD, "Sort field must be a key or a val"), \
- C(INVALID_STR_OPERAND, "String type can not be an operand in expression"),
+ C(INVALID_STR_OPERAND, "String type can not be an operand in expression"), \
+ C(EXPECT_NUMBER, "Expecting numeric literal"),
#undef C
#define C(a, b) HIST_ERR_##a
@@ -89,6 +90,7 @@ typedef u64 (*hist_field_fn_t) (struct hist_field *field,
#define HIST_FIELD_OPERANDS_MAX 2
#define HIST_FIELDS_MAX (TRACING_MAP_FIELDS_MAX + TRACING_MAP_VARS_MAX)
#define HIST_ACTIONS_MAX 8
+#define HIST_CONST_DIGITS_MAX 21
enum field_op_id {
FIELD_OP_NONE,
@@ -152,6 +154,9 @@ struct hist_field {
bool read_once;
unsigned int var_str_idx;
+
+ /* Numeric literals are represented as u64 */
+ u64 constant;
};
static u64 hist_field_none(struct hist_field *field,
@@ -163,6 +168,15 @@ static u64 hist_field_none(struct hist_field *field,
return 0;
}
+static u64 hist_field_const(struct hist_field *field,
+ struct tracing_map_elt *elt,
+ struct trace_buffer *buffer,
+ struct ring_buffer_event *rbe,
+ void *event)
+{
+ return field->constant;
+}
+
static u64 hist_field_counter(struct hist_field *field,
struct tracing_map_elt *elt,
struct trace_buffer *buffer,
@@ -341,6 +355,7 @@ enum hist_field_flags {
HIST_FIELD_FL_CPU = 1 << 15,
HIST_FIELD_FL_ALIAS = 1 << 16,
HIST_FIELD_FL_BUCKET = 1 << 17,
+ HIST_FIELD_FL_CONST = 1 << 18,
};
struct var_defs {
@@ -1516,6 +1531,12 @@ static void expr_field_str(struct hist_field *field, char *expr)
{
if (field->flags & HIST_FIELD_FL_VAR_REF)
strcat(expr, "$");
+ else if (field->flags & HIST_FIELD_FL_CONST) {
+ char str[HIST_CONST_DIGITS_MAX];
+
+ snprintf(str, HIST_CONST_DIGITS_MAX, "%llu", field->constant);
+ strcat(expr, str);
+ }
strcat(expr, hist_field_name(field, 0));
@@ -1689,6 +1710,15 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
goto out;
}
+ if (flags & HIST_FIELD_FL_CONST) {
+ hist_field->fn = hist_field_const;
+ hist_field->size = sizeof(u64);
+ hist_field->type = kstrdup("u64", GFP_KERNEL);
+ if (!hist_field->type)
+ goto free;
+ goto out;
+ }
+
if (flags & HIST_FIELD_FL_STACKTRACE) {
hist_field->fn = hist_field_none;
goto out;
@@ -2090,6 +2120,29 @@ static struct hist_field *create_alias(struct hist_trigger_data *hist_data,
return alias;
}
+static struct hist_field *parse_const(struct hist_trigger_data *hist_data,
+ char *str, char *var_name,
+ unsigned long *flags)
+{
+ struct trace_array *tr = hist_data->event_file->tr;
+ struct hist_field *field = NULL;
+ u64 constant;
+
+ if (kstrtoull(str, 0, &constant)) {
+ hist_err(tr, HIST_ERR_EXPECT_NUMBER, errpos(str));
+ return NULL;
+ }
+
+ *flags |= HIST_FIELD_FL_CONST;
+ field = create_hist_field(hist_data, NULL, *flags, var_name);
+ if (!field)
+ return NULL;
+
+ field->constant = constant;
+
+ return field;
+}
+
static struct hist_field *parse_atom(struct hist_trigger_data *hist_data,
struct trace_event_file *file, char *str,
unsigned long *flags, char *var_name)
@@ -2100,6 +2153,15 @@ static struct hist_field *parse_atom(struct hist_trigger_data *hist_data,
unsigned long buckets = 0;
int ret = 0;
+ if (isdigit(str[0])) {
+ hist_field = parse_const(hist_data, str, var_name, flags);
+ if (!hist_field) {
+ ret = -EINVAL;
+ goto out;
+ }
+ return hist_field;
+ }
+
s = strchr(str, '.');
if (s) {
s = strchr(++s, '.');
@@ -4945,6 +5007,8 @@ static void hist_field_debug_show_flags(struct seq_file *m,
if (flags & HIST_FIELD_FL_ALIAS)
seq_puts(m, " HIST_FIELD_FL_ALIAS\n");
+ else if (flags & HIST_FIELD_FL_CONST)
+ seq_puts(m, " HIST_FIELD_FL_CONST\n");
}
static int hist_field_debug_show(struct seq_file *m,
@@ -4966,6 +5030,9 @@ static int hist_field_debug_show(struct seq_file *m,
field->var.idx);
}
+ if (field->flags & HIST_FIELD_FL_CONST)
+ seq_printf(m, " constant: %llu\n", field->constant);
+
if (field->flags & HIST_FIELD_FL_ALIAS)
seq_printf(m, " var_ref_idx (into hist_data->var_refs[]): %u\n",
field->var_ref_idx);
@@ -5208,6 +5275,8 @@ static void hist_field_print(struct seq_file *m, struct hist_field *hist_field)
if (hist_field->flags & HIST_FIELD_FL_CPU)
seq_puts(m, "common_cpu");
+ else if (hist_field->flags & HIST_FIELD_FL_CONST)
+ seq_printf(m, "%llu", hist_field->constant);
else if (field_name) {
if (hist_field->flags & HIST_FIELD_FL_VAR_REF ||
hist_field->flags & HIST_FIELD_FL_ALIAS)
--
2.33.0
^ permalink raw reply related [flat|nested] 17+ messages in thread* [for-next][PATCH 08/15] tracing: Add division and multiplication support for hist triggers
2021-10-27 16:09 [for-next][PATCH 00/15] tracing: More updates for 5.16 Steven Rostedt
` (6 preceding siblings ...)
2021-10-27 16:09 ` [for-next][PATCH 07/15] tracing: Add support for creating hist trigger variables from literal Steven Rostedt
@ 2021-10-27 16:09 ` Steven Rostedt
2021-10-27 16:09 ` [for-next][PATCH 09/15] tracing: Fix operator precedence for hist triggers expression Steven Rostedt
` (6 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2021-10-27 16:09 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Kalesh Singh
From: Kalesh Singh <kaleshsingh@google.com>
Adds basic support for division and multiplication operations for
hist trigger variable expressions.
For simplicity this patch only supports, division and multiplication
for a single operation expression (e.g. x=$a/$b), as currently
expressions are always evaluated right to left. This can lead to some
incorrect results:
e.g. echo 'hist:keys=common_pid:x=8-4-2' >> event/trigger
8-4-2 should evaluate to 2 i.e. (8-4)-2
but currently x evaluate to 6 i.e. 8-(4-2)
Multiplication and division in sub-expressions will work correctly, once
correct operator precedence support is added (See next patch in this
series).
For the undefined case of division by 0, the histogram expression
evaluates to (u64)(-1). Since this cannot be detected when the
expression is created, it is the responsibility of the user to be
aware and account for this possibility.
Examples:
echo 'hist:keys=common_pid:a=8,b=4,x=$a/$b' \
>> event/trigger
echo 'hist:keys=common_pid:y=5*$b' \
>> event/trigger
Link: https://lkml.kernel.org/r/20211025200852.3002369-3-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
kernel/trace/trace_events_hist.c | 72 +++++++++++++++++++++++++++++++-
1 file changed, 71 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index e6165e36d3b6..1edec5d471c1 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -97,6 +97,8 @@ enum field_op_id {
FIELD_OP_PLUS,
FIELD_OP_MINUS,
FIELD_OP_UNARY_MINUS,
+ FIELD_OP_DIV,
+ FIELD_OP_MULT,
};
/*
@@ -285,6 +287,40 @@ static u64 hist_field_minus(struct hist_field *hist_field,
return val1 - val2;
}
+static u64 hist_field_div(struct hist_field *hist_field,
+ struct tracing_map_elt *elt,
+ struct trace_buffer *buffer,
+ struct ring_buffer_event *rbe,
+ void *event)
+{
+ struct hist_field *operand1 = hist_field->operands[0];
+ struct hist_field *operand2 = hist_field->operands[1];
+
+ u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
+ u64 val2 = operand2->fn(operand2, elt, buffer, rbe, event);
+
+ /* Return -1 for the undefined case */
+ if (!val2)
+ return -1;
+
+ return div64_u64(val1, val2);
+}
+
+static u64 hist_field_mult(struct hist_field *hist_field,
+ struct tracing_map_elt *elt,
+ struct trace_buffer *buffer,
+ struct ring_buffer_event *rbe,
+ void *event)
+{
+ struct hist_field *operand1 = hist_field->operands[0];
+ struct hist_field *operand2 = hist_field->operands[1];
+
+ u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
+ u64 val2 = operand2->fn(operand2, elt, buffer, rbe, event);
+
+ return val1 * val2;
+}
+
static u64 hist_field_unary_minus(struct hist_field *hist_field,
struct tracing_map_elt *elt,
struct trace_buffer *buffer,
@@ -1592,6 +1628,12 @@ static char *expr_str(struct hist_field *field, unsigned int level)
case FIELD_OP_PLUS:
strcat(expr, "+");
break;
+ case FIELD_OP_DIV:
+ strcat(expr, "/");
+ break;
+ case FIELD_OP_MULT:
+ strcat(expr, "*");
+ break;
default:
kfree(expr);
return NULL;
@@ -1607,7 +1649,7 @@ static int contains_operator(char *str)
enum field_op_id field_op = FIELD_OP_NONE;
char *op;
- op = strpbrk(str, "+-");
+ op = strpbrk(str, "+-/*");
if (!op)
return FIELD_OP_NONE;
@@ -1628,6 +1670,12 @@ static int contains_operator(char *str)
case '+':
field_op = FIELD_OP_PLUS;
break;
+ case '/':
+ field_op = FIELD_OP_DIV;
+ break;
+ case '*':
+ field_op = FIELD_OP_MULT;
+ break;
default:
break;
}
@@ -2361,10 +2409,26 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
case FIELD_OP_PLUS:
sep = "+";
break;
+ case FIELD_OP_DIV:
+ sep = "/";
+ break;
+ case FIELD_OP_MULT:
+ sep = "*";
+ break;
default:
goto free;
}
+ /*
+ * Multiplication and division are only supported in single operator
+ * expressions, since the expression is always evaluated from right
+ * to left.
+ */
+ if ((field_op == FIELD_OP_DIV || field_op == FIELD_OP_MULT) && level > 0) {
+ hist_err(file->tr, HIST_ERR_TOO_MANY_SUBEXPR, errpos(str));
+ return ERR_PTR(-EINVAL);
+ }
+
operand1_str = strsep(&str, sep);
if (!operand1_str || !str)
goto free;
@@ -2436,6 +2500,12 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
case FIELD_OP_PLUS:
expr->fn = hist_field_plus;
break;
+ case FIELD_OP_DIV:
+ expr->fn = hist_field_div;
+ break;
+ case FIELD_OP_MULT:
+ expr->fn = hist_field_mult;
+ break;
default:
ret = -EINVAL;
goto free;
--
2.33.0
^ permalink raw reply related [flat|nested] 17+ messages in thread* [for-next][PATCH 09/15] tracing: Fix operator precedence for hist triggers expression
2021-10-27 16:09 [for-next][PATCH 00/15] tracing: More updates for 5.16 Steven Rostedt
` (7 preceding siblings ...)
2021-10-27 16:09 ` [for-next][PATCH 08/15] tracing: Add division and multiplication support for hist triggers Steven Rostedt
@ 2021-10-27 16:09 ` Steven Rostedt
2021-10-27 16:09 ` [for-next][PATCH 10/15] tracing/histogram: Simplify handling of .sym-offset in expressions Steven Rostedt
` (5 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2021-10-27 16:09 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Kalesh Singh, Namhyung Kim
From: Kalesh Singh <kaleshsingh@google.com>
The current histogram expression evaluation logic evaluates the
expression from right to left. This can lead to incorrect results
if the operations are not associative (as is the case for subtraction
and, the now added, division operators).
e.g. 16-8-4-2 should be 2 not 10 --> 16-8-4-2 = ((16-8)-4)-2
64/8/4/2 should be 1 not 16 --> 64/8/4/2 = ((64/8)/4)/2
Division and multiplication are currently limited to single operation
expression due to operator precedence support not yet implemented.
Rework the expression parsing to support the correct evaluation of
expressions containing operators of different precedences; and fix
the associativity error by evaluating expressions with operators of
the same precedence from left to right.
Examples:
(1) echo 'hist:keys=common_pid:a=8,b=4,c=2,d=1,w=$a-$b-$c-$d' \
>> event/trigger
(2) echo 'hist:keys=common_pid:x=$a/$b/3/2' >> event/trigger
(3) echo 'hist:keys=common_pid:y=$a+10/$c*1024' >> event/trigger
(4) echo 'hist:keys=common_pid:z=$a/$b+$c*$d' >> event/trigger
Link: https://lkml.kernel.org/r/20211025200852.3002369-4-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
kernel/trace/trace_events_hist.c | 210 ++++++++++++++++++++-----------
1 file changed, 140 insertions(+), 70 deletions(-)
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 1edec5d471c1..7a50ea2ac6b1 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -67,7 +67,9 @@
C(TOO_MANY_SORT_FIELDS, "Too many sort fields (Max = 2)"), \
C(INVALID_SORT_FIELD, "Sort field must be a key or a val"), \
C(INVALID_STR_OPERAND, "String type can not be an operand in expression"), \
- C(EXPECT_NUMBER, "Expecting numeric literal"),
+ C(EXPECT_NUMBER, "Expecting numeric literal"), \
+ C(UNARY_MINUS_SUBEXPR, "Unary minus not supported in sub-expressions"), \
+ C(SYM_OFFSET_SUBEXPR, ".sym-offset not supported in sub-expressions"),
#undef C
#define C(a, b) HIST_ERR_##a
@@ -1644,40 +1646,96 @@ static char *expr_str(struct hist_field *field, unsigned int level)
return expr;
}
-static int contains_operator(char *str)
+/*
+ * If field_op != FIELD_OP_NONE, *sep points to the root operator
+ * of the expression tree to be evaluated.
+ */
+static int contains_operator(char *str, char **sep)
{
enum field_op_id field_op = FIELD_OP_NONE;
- char *op;
+ char *minus_op, *plus_op, *div_op, *mult_op;
+
+
+ /*
+ * Report the last occurrence of the operators first, so that the
+ * expression is evaluated left to right. This is important since
+ * subtraction and division are not associative.
+ *
+ * e.g
+ * 64/8/4/2 is 1, i.e 64/8/4/2 = ((64/8)/4)/2
+ * 14-7-5-2 is 0, i.e 14-7-5-2 = ((14-7)-5)-2
+ */
- op = strpbrk(str, "+-/*");
- if (!op)
- return FIELD_OP_NONE;
+ /*
+ * First, find lower precedence addition and subtraction
+ * since the expression will be evaluated recursively.
+ */
+ minus_op = strrchr(str, '-');
+ if (minus_op) {
+ /* Unfortunately, the modifier ".sym-offset" can confuse things. */
+ if (minus_op - str >= 4 && !strncmp(minus_op - 4, ".sym-offset", 11))
+ goto out;
- switch (*op) {
- case '-':
/*
- * Unfortunately, the modifier ".sym-offset"
- * can confuse things.
+ * Unary minus is not supported in sub-expressions. If
+ * present, it is always the next root operator.
*/
- if (op - str >= 4 && !strncmp(op - 4, ".sym-offset", 11))
- return FIELD_OP_NONE;
-
- if (*str == '-')
+ if (minus_op == str) {
field_op = FIELD_OP_UNARY_MINUS;
- else
- field_op = FIELD_OP_MINUS;
- break;
- case '+':
- field_op = FIELD_OP_PLUS;
- break;
- case '/':
+ goto out;
+ }
+
+ field_op = FIELD_OP_MINUS;
+ }
+
+ plus_op = strrchr(str, '+');
+ if (plus_op || minus_op) {
+ /*
+ * For operators of the same precedence use to rightmost as the
+ * root, so that the expression is evaluated left to right.
+ */
+ if (plus_op > minus_op)
+ field_op = FIELD_OP_PLUS;
+ goto out;
+ }
+
+ /*
+ * Multiplication and division have higher precedence than addition and
+ * subtraction.
+ */
+ div_op = strrchr(str, '/');
+ if (div_op)
field_op = FIELD_OP_DIV;
- break;
- case '*':
+
+ mult_op = strrchr(str, '*');
+ /*
+ * For operators of the same precedence use to rightmost as the
+ * root, so that the expression is evaluated left to right.
+ */
+ if (mult_op > div_op)
field_op = FIELD_OP_MULT;
- break;
- default:
- break;
+
+out:
+ if (sep) {
+ switch (field_op) {
+ case FIELD_OP_UNARY_MINUS:
+ case FIELD_OP_MINUS:
+ *sep = minus_op;
+ break;
+ case FIELD_OP_PLUS:
+ *sep = plus_op;
+ break;
+ case FIELD_OP_DIV:
+ *sep = div_op;
+ break;
+ case FIELD_OP_MULT:
+ *sep = mult_op;
+ break;
+ case FIELD_OP_NONE:
+ default:
+ *sep = NULL;
+ break;
+ }
}
return field_op;
@@ -2003,7 +2061,7 @@ static char *field_name_from_var(struct hist_trigger_data *hist_data,
if (strcmp(var_name, name) == 0) {
field = hist_data->attrs->var_defs.expr[i];
- if (contains_operator(field) || is_var_ref(field))
+ if (contains_operator(field, NULL) || is_var_ref(field))
continue;
return field;
}
@@ -2266,21 +2324,24 @@ static struct hist_field *parse_atom(struct hist_trigger_data *hist_data,
static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
struct trace_event_file *file,
char *str, unsigned long flags,
- char *var_name, unsigned int level);
+ char *var_name, unsigned int *n_subexprs);
static struct hist_field *parse_unary(struct hist_trigger_data *hist_data,
struct trace_event_file *file,
char *str, unsigned long flags,
- char *var_name, unsigned int level)
+ char *var_name, unsigned int *n_subexprs)
{
struct hist_field *operand1, *expr = NULL;
unsigned long operand_flags;
int ret = 0;
char *s;
+ /* Unary minus operator, increment n_subexprs */
+ ++*n_subexprs;
+
/* we support only -(xxx) i.e. explicit parens required */
- if (level > 3) {
+ if (*n_subexprs > 3) {
hist_err(file->tr, HIST_ERR_TOO_MANY_SUBEXPR, errpos(str));
ret = -EINVAL;
goto free;
@@ -2297,8 +2358,16 @@ static struct hist_field *parse_unary(struct hist_trigger_data *hist_data,
}
s = strrchr(str, ')');
- if (s)
+ if (s) {
+ /* unary minus not supported in sub-expressions */
+ if (*(s+1) != '\0') {
+ hist_err(file->tr, HIST_ERR_UNARY_MINUS_SUBEXPR,
+ errpos(str));
+ ret = -EINVAL;
+ goto free;
+ }
*s = '\0';
+ }
else {
ret = -EINVAL; /* no closing ')' */
goto free;
@@ -2312,7 +2381,7 @@ static struct hist_field *parse_unary(struct hist_trigger_data *hist_data,
}
operand_flags = 0;
- operand1 = parse_expr(hist_data, file, str, operand_flags, NULL, ++level);
+ operand1 = parse_expr(hist_data, file, str, operand_flags, NULL, n_subexprs);
if (IS_ERR(operand1)) {
ret = PTR_ERR(operand1);
goto free;
@@ -2382,60 +2451,61 @@ static int check_expr_operands(struct trace_array *tr,
static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
struct trace_event_file *file,
char *str, unsigned long flags,
- char *var_name, unsigned int level)
+ char *var_name, unsigned int *n_subexprs)
{
struct hist_field *operand1 = NULL, *operand2 = NULL, *expr = NULL;
unsigned long operand_flags;
int field_op, ret = -EINVAL;
char *sep, *operand1_str;
- if (level > 3) {
+ if (*n_subexprs > 3) {
hist_err(file->tr, HIST_ERR_TOO_MANY_SUBEXPR, errpos(str));
return ERR_PTR(-EINVAL);
}
- field_op = contains_operator(str);
+ /*
+ * ".sym-offset" in expressions has no effect on their evaluation,
+ * but can confuse operator parsing.
+ */
+ if (*n_subexprs == 0) {
+ sep = strstr(str, ".sym-offset");
+ if (sep) {
+ *sep = '\0';
+ if (strpbrk(str, "+-/*") || strpbrk(sep + 11, "+-/*")) {
+ *sep = '.';
+ hist_err(file->tr, HIST_ERR_SYM_OFFSET_SUBEXPR,
+ errpos(sep));
+ return ERR_PTR(-EINVAL);
+ }
+ *sep = '.';
+ }
+ }
+
+ field_op = contains_operator(str, &sep);
if (field_op == FIELD_OP_NONE)
return parse_atom(hist_data, file, str, &flags, var_name);
if (field_op == FIELD_OP_UNARY_MINUS)
- return parse_unary(hist_data, file, str, flags, var_name, ++level);
+ return parse_unary(hist_data, file, str, flags, var_name, n_subexprs);
- switch (field_op) {
- case FIELD_OP_MINUS:
- sep = "-";
- break;
- case FIELD_OP_PLUS:
- sep = "+";
- break;
- case FIELD_OP_DIV:
- sep = "/";
- break;
- case FIELD_OP_MULT:
- sep = "*";
- break;
- default:
- goto free;
- }
+ /* Binary operator found, increment n_subexprs */
+ ++*n_subexprs;
- /*
- * Multiplication and division are only supported in single operator
- * expressions, since the expression is always evaluated from right
- * to left.
- */
- if ((field_op == FIELD_OP_DIV || field_op == FIELD_OP_MULT) && level > 0) {
- hist_err(file->tr, HIST_ERR_TOO_MANY_SUBEXPR, errpos(str));
- return ERR_PTR(-EINVAL);
- }
+ /* Split the expression string at the root operator */
+ if (!sep)
+ goto free;
+ *sep = '\0';
+ operand1_str = str;
+ str = sep+1;
- operand1_str = strsep(&str, sep);
if (!operand1_str || !str)
goto free;
operand_flags = 0;
- operand1 = parse_atom(hist_data, file, operand1_str,
- &operand_flags, NULL);
+
+ /* LHS of string is an expression e.g. a+b in a+b+c */
+ operand1 = parse_expr(hist_data, file, operand1_str, operand_flags, NULL, n_subexprs);
if (IS_ERR(operand1)) {
ret = PTR_ERR(operand1);
operand1 = NULL;
@@ -2447,9 +2517,9 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
goto free;
}
- /* rest of string could be another expression e.g. b+c in a+b+c */
+ /* RHS of string is another expression e.g. c in a+b+c */
operand_flags = 0;
- operand2 = parse_expr(hist_data, file, str, operand_flags, NULL, ++level);
+ operand2 = parse_expr(hist_data, file, str, operand_flags, NULL, n_subexprs);
if (IS_ERR(operand2)) {
ret = PTR_ERR(operand2);
operand2 = NULL;
@@ -3883,9 +3953,9 @@ static int __create_val_field(struct hist_trigger_data *hist_data,
unsigned long flags)
{
struct hist_field *hist_field;
- int ret = 0;
+ int ret = 0, n_subexprs = 0;
- hist_field = parse_expr(hist_data, file, field_str, flags, var_name, 0);
+ hist_field = parse_expr(hist_data, file, field_str, flags, var_name, &n_subexprs);
if (IS_ERR(hist_field)) {
ret = PTR_ERR(hist_field);
goto out;
@@ -4026,7 +4096,7 @@ static int create_key_field(struct hist_trigger_data *hist_data,
struct hist_field *hist_field = NULL;
unsigned long flags = 0;
unsigned int key_size;
- int ret = 0;
+ int ret = 0, n_subexprs = 0;
if (WARN_ON(key_idx >= HIST_FIELDS_MAX))
return -EINVAL;
@@ -4039,7 +4109,7 @@ static int create_key_field(struct hist_trigger_data *hist_data,
hist_field = create_hist_field(hist_data, NULL, flags, NULL);
} else {
hist_field = parse_expr(hist_data, file, field_str, flags,
- NULL, 0);
+ NULL, &n_subexprs);
if (IS_ERR(hist_field)) {
ret = PTR_ERR(hist_field);
goto out;
--
2.33.0
^ permalink raw reply related [flat|nested] 17+ messages in thread* [for-next][PATCH 10/15] tracing/histogram: Simplify handling of .sym-offset in expressions
2021-10-27 16:09 [for-next][PATCH 00/15] tracing: More updates for 5.16 Steven Rostedt
` (8 preceding siblings ...)
2021-10-27 16:09 ` [for-next][PATCH 09/15] tracing: Fix operator precedence for hist triggers expression Steven Rostedt
@ 2021-10-27 16:09 ` Steven Rostedt
2021-10-27 16:09 ` [for-next][PATCH 11/15] tracing/histogram: Covert expr to const if both operands are constants Steven Rostedt
` (4 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2021-10-27 16:09 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Kalesh Singh
From: Kalesh Singh <kaleshsingh@google.com>
The '-' in .sym-offset can confuse the hist trigger arithmetic
expression parsing. Simplify the handling of this by replacing the
'sym-offset' with 'symXoffset'. This allows us to correctly evaluate
expressions where the user may have inadvertently added a .sym-offset
modifier to one of the operands in an expression, instead of bailing
out. In this case the .sym-offset has no effect on the evaluation of the
expression. The only valid use of the .sym-offset is as a hist key
modifier.
Link: https://lkml.kernel.org/r/20211025200852.3002369-5-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
kernel/trace/trace_events_hist.c | 43 +++++++++++++-------------------
1 file changed, 17 insertions(+), 26 deletions(-)
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 7a50ea2ac6b1..bbaf2e16b7ae 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -68,8 +68,7 @@
C(INVALID_SORT_FIELD, "Sort field must be a key or a val"), \
C(INVALID_STR_OPERAND, "String type can not be an operand in expression"), \
C(EXPECT_NUMBER, "Expecting numeric literal"), \
- C(UNARY_MINUS_SUBEXPR, "Unary minus not supported in sub-expressions"), \
- C(SYM_OFFSET_SUBEXPR, ".sym-offset not supported in sub-expressions"),
+ C(UNARY_MINUS_SUBEXPR, "Unary minus not supported in sub-expressions"),
#undef C
#define C(a, b) HIST_ERR_##a
@@ -1672,10 +1671,6 @@ static int contains_operator(char *str, char **sep)
*/
minus_op = strrchr(str, '-');
if (minus_op) {
- /* Unfortunately, the modifier ".sym-offset" can confuse things. */
- if (minus_op - str >= 4 && !strncmp(minus_op - 4, ".sym-offset", 11))
- goto out;
-
/*
* Unary minus is not supported in sub-expressions. If
* present, it is always the next root operator.
@@ -2138,7 +2133,11 @@ parse_field(struct hist_trigger_data *hist_data, struct trace_event_file *file,
*flags |= HIST_FIELD_FL_HEX;
else if (strcmp(modifier, "sym") == 0)
*flags |= HIST_FIELD_FL_SYM;
- else if (strcmp(modifier, "sym-offset") == 0)
+ /*
+ * 'sym-offset' occurrences in the trigger string are modified
+ * to 'symXoffset' to simplify arithmetic expression parsing.
+ */
+ else if (strcmp(modifier, "symXoffset") == 0)
*flags |= HIST_FIELD_FL_SYM_OFFSET;
else if ((strcmp(modifier, "execname") == 0) &&
(strcmp(field_name, "common_pid") == 0))
@@ -2463,24 +2462,6 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
return ERR_PTR(-EINVAL);
}
- /*
- * ".sym-offset" in expressions has no effect on their evaluation,
- * but can confuse operator parsing.
- */
- if (*n_subexprs == 0) {
- sep = strstr(str, ".sym-offset");
- if (sep) {
- *sep = '\0';
- if (strpbrk(str, "+-/*") || strpbrk(sep + 11, "+-/*")) {
- *sep = '.';
- hist_err(file->tr, HIST_ERR_SYM_OFFSET_SUBEXPR,
- errpos(sep));
- return ERR_PTR(-EINVAL);
- }
- *sep = '.';
- }
- }
-
field_op = contains_operator(str, &sep);
if (field_op == FIELD_OP_NONE)
@@ -5999,7 +5980,7 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
struct synth_event *se;
const char *se_name;
bool remove = false;
- char *trigger, *p;
+ char *trigger, *p, *start;
int ret = 0;
lockdep_assert_held(&event_mutex);
@@ -6047,6 +6028,16 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
trigger = strstrip(trigger);
}
+ /*
+ * To simplify arithmetic expression parsing, replace occurrences of
+ * '.sym-offset' modifier with '.symXoffset'
+ */
+ start = strstr(trigger, ".sym-offset");
+ while (start) {
+ *(start + 4) = 'X';
+ start = strstr(start + 11, ".sym-offset");
+ };
+
attrs = parse_hist_trigger_attrs(file->tr, trigger);
if (IS_ERR(attrs))
return PTR_ERR(attrs);
--
2.33.0
^ permalink raw reply related [flat|nested] 17+ messages in thread* [for-next][PATCH 11/15] tracing/histogram: Covert expr to const if both operands are constants
2021-10-27 16:09 [for-next][PATCH 00/15] tracing: More updates for 5.16 Steven Rostedt
` (9 preceding siblings ...)
2021-10-27 16:09 ` [for-next][PATCH 10/15] tracing/histogram: Simplify handling of .sym-offset in expressions Steven Rostedt
@ 2021-10-27 16:09 ` Steven Rostedt
2021-10-27 16:09 ` [for-next][PATCH 12/15] tracing/histogram: Optimize division by a power of 2 Steven Rostedt
` (3 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2021-10-27 16:09 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Kalesh Singh
From: Kalesh Singh <kaleshsingh@google.com>
If both operands of a hist trigger expression are constants, convert the
expression to a constant. This optimization avoids having to perform the
same calculation multiple times and also saves on memory since the
merged constants are represented by a single struct hist_field instead
or multiple.
Link: https://lkml.kernel.org/r/20211025200852.3002369-6-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
kernel/trace/trace_events_hist.c | 104 ++++++++++++++++++++++---------
1 file changed, 74 insertions(+), 30 deletions(-)
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index bbaf2e16b7ae..71b453576d85 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -2411,9 +2411,15 @@ static struct hist_field *parse_unary(struct hist_trigger_data *hist_data,
return ERR_PTR(ret);
}
+/*
+ * If the operands are var refs, return pointers the
+ * variable(s) referenced in var1 and var2, else NULL.
+ */
static int check_expr_operands(struct trace_array *tr,
struct hist_field *operand1,
- struct hist_field *operand2)
+ struct hist_field *operand2,
+ struct hist_field **var1,
+ struct hist_field **var2)
{
unsigned long operand1_flags = operand1->flags;
unsigned long operand2_flags = operand2->flags;
@@ -2426,6 +2432,7 @@ static int check_expr_operands(struct trace_array *tr,
if (!var)
return -EINVAL;
operand1_flags = var->flags;
+ *var1 = var;
}
if ((operand2_flags & HIST_FIELD_FL_VAR_REF) ||
@@ -2436,6 +2443,7 @@ static int check_expr_operands(struct trace_array *tr,
if (!var)
return -EINVAL;
operand2_flags = var->flags;
+ *var2 = var;
}
if ((operand1_flags & HIST_FIELD_FL_TIMESTAMP_USECS) !=
@@ -2453,9 +2461,12 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
char *var_name, unsigned int *n_subexprs)
{
struct hist_field *operand1 = NULL, *operand2 = NULL, *expr = NULL;
- unsigned long operand_flags;
+ struct hist_field *var1 = NULL, *var2 = NULL;
+ unsigned long operand_flags, operand2_flags;
int field_op, ret = -EINVAL;
char *sep, *operand1_str;
+ hist_field_fn_t op_fn;
+ bool combine_consts;
if (*n_subexprs > 3) {
hist_err(file->tr, HIST_ERR_TOO_MANY_SUBEXPR, errpos(str));
@@ -2512,11 +2523,38 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
goto free;
}
- ret = check_expr_operands(file->tr, operand1, operand2);
+ switch (field_op) {
+ case FIELD_OP_MINUS:
+ op_fn = hist_field_minus;
+ break;
+ case FIELD_OP_PLUS:
+ op_fn = hist_field_plus;
+ break;
+ case FIELD_OP_DIV:
+ op_fn = hist_field_div;
+ break;
+ case FIELD_OP_MULT:
+ op_fn = hist_field_mult;
+ break;
+ default:
+ ret = -EINVAL;
+ goto free;
+ }
+
+ ret = check_expr_operands(file->tr, operand1, operand2, &var1, &var2);
if (ret)
goto free;
- flags |= HIST_FIELD_FL_EXPR;
+ operand_flags = var1 ? var1->flags : operand1->flags;
+ operand2_flags = var2 ? var2->flags : operand2->flags;
+
+ /*
+ * If both operands are constant, the expression can be
+ * collapsed to a single constant.
+ */
+ combine_consts = operand_flags & operand2_flags & HIST_FIELD_FL_CONST;
+
+ flags |= combine_consts ? HIST_FIELD_FL_CONST : HIST_FIELD_FL_EXPR;
flags |= operand1->flags &
(HIST_FIELD_FL_TIMESTAMP | HIST_FIELD_FL_TIMESTAMP_USECS);
@@ -2533,37 +2571,43 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
expr->operands[0] = operand1;
expr->operands[1] = operand2;
- /* The operand sizes should be the same, so just pick one */
- expr->size = operand1->size;
+ if (combine_consts) {
+ if (var1)
+ expr->operands[0] = var1;
+ if (var2)
+ expr->operands[1] = var2;
- expr->operator = field_op;
- expr->name = expr_str(expr, 0);
- expr->type = kstrdup_const(operand1->type, GFP_KERNEL);
- if (!expr->type) {
- ret = -ENOMEM;
- goto free;
- }
+ expr->constant = op_fn(expr, NULL, NULL, NULL, NULL);
- switch (field_op) {
- case FIELD_OP_MINUS:
- expr->fn = hist_field_minus;
- break;
- case FIELD_OP_PLUS:
- expr->fn = hist_field_plus;
- break;
- case FIELD_OP_DIV:
- expr->fn = hist_field_div;
- break;
- case FIELD_OP_MULT:
- expr->fn = hist_field_mult;
- break;
- default:
- ret = -EINVAL;
- goto free;
+ expr->operands[0] = NULL;
+ expr->operands[1] = NULL;
+
+ /*
+ * var refs won't be destroyed immediately
+ * See: destroy_hist_field()
+ */
+ destroy_hist_field(operand2, 0);
+ destroy_hist_field(operand1, 0);
+
+ expr->name = expr_str(expr, 0);
+ } else {
+ expr->fn = op_fn;
+
+ /* The operand sizes should be the same, so just pick one */
+ expr->size = operand1->size;
+
+ expr->operator = field_op;
+ expr->type = kstrdup_const(operand1->type, GFP_KERNEL);
+ if (!expr->type) {
+ ret = -ENOMEM;
+ goto free;
+ }
+
+ expr->name = expr_str(expr, 0);
}
return expr;
- free:
+free:
destroy_hist_field(operand1, 0);
destroy_hist_field(operand2, 0);
destroy_hist_field(expr, 0);
--
2.33.0
^ permalink raw reply related [flat|nested] 17+ messages in thread* [for-next][PATCH 12/15] tracing/histogram: Optimize division by a power of 2
2021-10-27 16:09 [for-next][PATCH 00/15] tracing: More updates for 5.16 Steven Rostedt
` (10 preceding siblings ...)
2021-10-27 16:09 ` [for-next][PATCH 11/15] tracing/histogram: Covert expr to const if both operands are constants Steven Rostedt
@ 2021-10-27 16:09 ` Steven Rostedt
2021-10-27 16:09 ` [for-next][PATCH 13/15] tracing/histogram: Document expression arithmetic and constants Steven Rostedt
` (2 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2021-10-27 16:09 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Kalesh Singh
From: Kalesh Singh <kaleshsingh@google.com>
The division is a slow operation. If the divisor is a power of 2, use a
shift instead.
Results were obtained using Android's version of perf (simpleperf[1]) as
described below:
1. hist_field_div() is modified to call 2 test functions:
test_hist_field_div_[not]_optimized(); passing them the
same args. Use noinline and volatile to ensure these are
not optimized out by the compiler.
2. Create a hist event trigger that uses division:
events/kmem/rss_stat$ echo 'hist:keys=common_pid:x=size/<divisor>'
>> trigger
events/kmem/rss_stat$ echo 'hist:keys=common_pid:vals=$x'
>> trigger
3. Run Android's lmkd_test[2] to generate rss_stat events, and
record CPU samples with Android's simpleperf:
simpleperf record -a --exclude-perf --post-unwind=yes -m 16384 -g
-f 2000 -o perf.data
== Results ==
Divisor is a power of 2 (divisor == 32):
test_hist_field_div_not_optimized | 8,717,091 cpu-cycles
test_hist_field_div_optimized | 1,643,137 cpu-cycles
If the divisor is a power of 2, the optimized version is ~5.3x faster.
Divisor is not a power of 2 (divisor == 33):
test_hist_field_div_not_optimized | 4,444,324 cpu-cycles
test_hist_field_div_optimized | 5,497,958 cpu-cycles
If the divisor is not a power of 2, as expected, the optimized version is
slightly slower (~24% slower).
[1] https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/README.md
[2] https://cs.android.com/android/platform/superproject/+/master:system/memory/lmkd/tests/lmkd_test.cpp
Link: https://lkml.kernel.org/r/20211025200852.3002369-7-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
kernel/trace/trace_events_hist.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 71b453576d85..452daad7cfb3 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -304,6 +304,10 @@ static u64 hist_field_div(struct hist_field *hist_field,
if (!val2)
return -1;
+ /* Use shift if the divisor is a power of 2 */
+ if (!(val2 & (val2 - 1)))
+ return val1 >> __ffs64(val2);
+
return div64_u64(val1, val2);
}
--
2.33.0
^ permalink raw reply related [flat|nested] 17+ messages in thread* [for-next][PATCH 13/15] tracing/histogram: Document expression arithmetic and constants
2021-10-27 16:09 [for-next][PATCH 00/15] tracing: More updates for 5.16 Steven Rostedt
` (11 preceding siblings ...)
2021-10-27 16:09 ` [for-next][PATCH 12/15] tracing/histogram: Optimize division by a power of 2 Steven Rostedt
@ 2021-10-27 16:09 ` Steven Rostedt
2021-10-27 16:09 ` [for-next][PATCH 14/15] ftrace: disable preemption when recursion locked Steven Rostedt
2021-10-27 16:09 ` [for-next][PATCH 15/15] ftrace: do CPU checking after preemption disabled Steven Rostedt
14 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2021-10-27 16:09 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Kalesh Singh, Namhyung Kim
From: Kalesh Singh <kaleshsingh@google.com>
Histogram expressions now support division, and multiplication in
addition to the already supported subtraction and addition operators.
Numeric constants can also be used in a hist trigger expressions
or assigned to a variable and used by refernce in an expression.
Link: https://lkml.kernel.org/r/20211025200852.3002369-9-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
Documentation/trace/histogram.rst | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst
index 533415644c54..e12699abaee8 100644
--- a/Documentation/trace/histogram.rst
+++ b/Documentation/trace/histogram.rst
@@ -1763,6 +1763,20 @@ using the same key and variable from yet another event::
# echo 'hist:key=pid:wakeupswitch_lat=$wakeup_lat+$switchtime_lat ...' >> event3/trigger
+Expressions support the use of addition, subtraction, multiplication and
+division operators (+-*/).
+
+Note that division by zero always returns -1.
+
+Numeric constants can also be used directly in an expression::
+
+ # echo 'hist:keys=next_pid:timestamp_secs=common_timestamp/1000000 ...' >> event/trigger
+
+or assigned to a variable and referenced in a subsequent expression::
+
+ # echo 'hist:keys=next_pid:us_per_sec=1000000 ...' >> event/trigger
+ # echo 'hist:keys=next_pid:timestamp_secs=common_timestamp/$us_per_sec ...' >> event/trigger
+
2.2.2 Synthetic Events
----------------------
--
2.33.0
^ permalink raw reply related [flat|nested] 17+ messages in thread* [for-next][PATCH 14/15] ftrace: disable preemption when recursion locked
2021-10-27 16:09 [for-next][PATCH 00/15] tracing: More updates for 5.16 Steven Rostedt
` (12 preceding siblings ...)
2021-10-27 16:09 ` [for-next][PATCH 13/15] tracing/histogram: Document expression arithmetic and constants Steven Rostedt
@ 2021-10-27 16:09 ` Steven Rostedt
2021-10-27 16:17 ` Steven Rostedt
2021-10-27 16:09 ` [for-next][PATCH 15/15] ftrace: do CPU checking after preemption disabled Steven Rostedt
14 siblings, 1 reply; 17+ messages in thread
From: Steven Rostedt @ 2021-10-27 16:09 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Petr Mladek, Guo Ren, Ingo Molnar,
James E.J. Bottomley, Helge Deller, Michael Ellerman,
Benjamin Herrenschmidt, Paul Mackerras, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Thomas Gleixner, Borislav Petkov,
H. Peter Anvin, Josh Poimboeuf, Jiri Kosina, Joe Lawrence,
Masami Hiramatsu, Nicholas Piggin, Jisheng Zhang, Miroslav Benes,
Abaci, Peter Zijlstra, Michael Wang
From: =?UTF-8?q?=E7=8E=8B=E8=B4=87?= <yun.wang@linux.alibaba.com>
As the documentation explained, ftrace_test_recursion_trylock()
and ftrace_test_recursion_unlock() were supposed to disable and
enable preemption properly, however currently this work is done
outside of the function, which could be missing by mistake.
And since the internal using of trace_test_and_set_recursion()
and trace_clear_recursion() also require preemption disabled, we
can just merge the logical.
This patch will make sure the preemption has been disabled when
trace_test_and_set_recursion() return bit >= 0, and
trace_clear_recursion() will enable the preemption if previously
enabled.
Link: https://lkml.kernel.org/r/13bde807-779c-aa4c-0672-20515ae365ea@linux.alibaba.com
CC: Petr Mladek <pmladek@suse.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Jisheng Zhang <jszhang@kernel.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Miroslav Benes <mbenes@suse.cz>
Reported-by: Abaci <abaci@linux.alibaba.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
[ Removed extra line in comment - SDR ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
arch/csky/kernel/probes/ftrace.c | 2 --
arch/parisc/kernel/ftrace.c | 2 --
arch/powerpc/kernel/kprobes-ftrace.c | 2 --
arch/riscv/kernel/probes/ftrace.c | 2 --
arch/x86/kernel/kprobes/ftrace.c | 2 --
include/linux/trace_recursion.h | 11 ++++++++++-
kernel/livepatch/patch.c | 12 ++++++------
kernel/trace/ftrace.c | 15 +++++----------
kernel/trace/trace_functions.c | 5 -----
9 files changed, 21 insertions(+), 32 deletions(-)
diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
index b388228abbf2..834cffcfbce3 100644
--- a/arch/csky/kernel/probes/ftrace.c
+++ b/arch/csky/kernel/probes/ftrace.c
@@ -17,7 +17,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
return;
regs = ftrace_get_regs(fregs);
- preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (!p) {
p = get_kprobe((kprobe_opcode_t *)(ip - MCOUNT_INSN_SIZE));
@@ -57,7 +56,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
__this_cpu_write(current_kprobe, NULL);
}
out:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 01581f715737..b14011d3c2f1 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -211,7 +211,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
return;
regs = ftrace_get_regs(fregs);
- preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (unlikely(!p) || kprobe_disabled(p))
goto out;
@@ -240,7 +239,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
}
__this_cpu_write(current_kprobe, NULL);
out:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
index 7154d58338cc..072ebe7f290b 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -26,7 +26,6 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
return;
regs = ftrace_get_regs(fregs);
- preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)nip);
if (unlikely(!p) || kprobe_disabled(p))
goto out;
@@ -61,7 +60,6 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
__this_cpu_write(current_kprobe, NULL);
}
out:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/riscv/kernel/probes/ftrace.c b/arch/riscv/kernel/probes/ftrace.c
index aab85a82f419..7142ec42e889 100644
--- a/arch/riscv/kernel/probes/ftrace.c
+++ b/arch/riscv/kernel/probes/ftrace.c
@@ -15,7 +15,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
if (bit < 0)
return;
- preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (unlikely(!p) || kprobe_disabled(p))
goto out;
@@ -52,7 +51,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
__this_cpu_write(current_kprobe, NULL);
}
out:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
index 596de2f6d3a5..dd2ec14adb77 100644
--- a/arch/x86/kernel/kprobes/ftrace.c
+++ b/arch/x86/kernel/kprobes/ftrace.c
@@ -25,7 +25,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
if (bit < 0)
return;
- preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (unlikely(!p) || kprobe_disabled(p))
goto out;
@@ -59,7 +58,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
__this_cpu_write(current_kprobe, NULL);
}
out:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index 24f284eb55a7..a13f23b04d73 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -155,6 +155,9 @@ extern void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip);
# define do_ftrace_record_recursion(ip, pip) do { } while (0)
#endif
+/*
+ * Preemption is promised to be disabled when return bit >= 0.
+ */
static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
int start, int max)
{
@@ -189,14 +192,20 @@ static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsign
current->trace_recursion = val;
barrier();
+ preempt_disable_notrace();
+
return bit + 1;
}
+/*
+ * Preemption will be enabled (if it was previously enabled).
+ */
static __always_inline void trace_clear_recursion(int bit)
{
if (!bit)
return;
+ preempt_enable_notrace();
barrier();
bit--;
trace_recursion_clear(bit);
@@ -209,7 +218,7 @@ static __always_inline void trace_clear_recursion(int bit)
* tracing recursed in the same context (normal vs interrupt),
*
* Returns: -1 if a recursion happened.
- * >= 0 if no recursion
+ * >= 0 if no recursion.
*/
static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
unsigned long parent_ip)
diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
index e8029aea67f1..fe316c021d73 100644
--- a/kernel/livepatch/patch.c
+++ b/kernel/livepatch/patch.c
@@ -49,14 +49,15 @@ static void notrace klp_ftrace_handler(unsigned long ip,
ops = container_of(fops, struct klp_ops, fops);
+ /*
+ * The ftrace_test_recursion_trylock() will disable preemption,
+ * which is required for the variant of synchronize_rcu() that is
+ * used to allow patching functions where RCU is not watching.
+ * See klp_synchronize_transition() for more details.
+ */
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (WARN_ON_ONCE(bit < 0))
return;
- /*
- * A variant of synchronize_rcu() is used to allow patching functions
- * where RCU is not watching, see klp_synchronize_transition().
- */
- preempt_disable_notrace();
func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
stack_node);
@@ -120,7 +121,6 @@ static void notrace klp_ftrace_handler(unsigned long ip,
klp_arch_set_pc(fregs, (unsigned long)func->new_func);
unlock:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 2057ad363772..b4ed1a301232 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -7198,16 +7198,15 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *op;
int bit;
+ /*
+ * The ftrace_test_and_set_recursion() will disable preemption,
+ * which is required since some of the ops may be dynamically
+ * allocated, they must be freed after a synchronize_rcu().
+ */
bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_LIST_START, TRACE_LIST_MAX);
if (bit < 0)
return;
- /*
- * Some of the ops may be dynamically allocated,
- * they must be freed after a synchronize_rcu().
- */
- preempt_disable_notrace();
-
do_for_each_ftrace_op(op, ftrace_ops_list) {
/* Stub functions don't need to be called nor tested */
if (op->flags & FTRACE_OPS_FL_STUB)
@@ -7231,7 +7230,6 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
}
} while_for_each_ftrace_op(op);
out:
- preempt_enable_notrace();
trace_clear_recursion(bit);
}
@@ -7279,12 +7277,9 @@ static void ftrace_ops_assist_func(unsigned long ip, unsigned long parent_ip,
if (bit < 0)
return;
- preempt_disable_notrace();
-
if (!(op->flags & FTRACE_OPS_FL_RCU) || rcu_is_watching())
op->func(ip, parent_ip, op, fregs);
- preempt_enable_notrace();
trace_clear_recursion(bit);
}
NOKPROBE_SYMBOL(ftrace_ops_assist_func);
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 1f0e63f5d1f9..9f1bfbe105e8 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -186,7 +186,6 @@ function_trace_call(unsigned long ip, unsigned long parent_ip,
return;
trace_ctx = tracing_gen_ctx();
- preempt_disable_notrace();
cpu = smp_processor_id();
data = per_cpu_ptr(tr->array_buffer.data, cpu);
@@ -194,7 +193,6 @@ function_trace_call(unsigned long ip, unsigned long parent_ip,
trace_function(tr, ip, parent_ip, trace_ctx);
ftrace_test_recursion_unlock(bit);
- preempt_enable_notrace();
}
#ifdef CONFIG_UNWINDER_ORC
@@ -298,8 +296,6 @@ function_no_repeats_trace_call(unsigned long ip, unsigned long parent_ip,
if (bit < 0)
return;
- preempt_disable_notrace();
-
cpu = smp_processor_id();
data = per_cpu_ptr(tr->array_buffer.data, cpu);
if (atomic_read(&data->disabled))
@@ -324,7 +320,6 @@ function_no_repeats_trace_call(unsigned long ip, unsigned long parent_ip,
out:
ftrace_test_recursion_unlock(bit);
- preempt_enable_notrace();
}
static void
--
2.33.0
^ permalink raw reply related [flat|nested] 17+ messages in thread* Re: [for-next][PATCH 14/15] ftrace: disable preemption when recursion locked
2021-10-27 16:09 ` [for-next][PATCH 14/15] ftrace: disable preemption when recursion locked Steven Rostedt
@ 2021-10-27 16:17 ` Steven Rostedt
0 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2021-10-27 16:17 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Petr Mladek, Guo Ren, Ingo Molnar,
James E.J. Bottomley, Helge Deller, Michael Ellerman,
Benjamin Herrenschmidt, Paul Mackerras, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Thomas Gleixner, Borislav Petkov,
H. Peter Anvin, Josh Poimboeuf, Jiri Kosina, Joe Lawrence,
Masami Hiramatsu, Nicholas Piggin, Jisheng Zhang, Miroslav Benes,
Abaci, Peter Zijlstra, Michael Wang
On Wed, 27 Oct 2021 12:09:54 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:
> From: =?UTF-8?q?=E7=8E=8B=E8=B4=87?= <yun.wang@linux.alibaba.com>
This is just a artifact from my scripts that use quilt to send the email
(quilt is at fault here). But my git repo has it correctly encoded.
-- Steve
^ permalink raw reply [flat|nested] 17+ messages in thread
* [for-next][PATCH 15/15] ftrace: do CPU checking after preemption disabled
2021-10-27 16:09 [for-next][PATCH 00/15] tracing: More updates for 5.16 Steven Rostedt
` (13 preceding siblings ...)
2021-10-27 16:09 ` [for-next][PATCH 14/15] ftrace: disable preemption when recursion locked Steven Rostedt
@ 2021-10-27 16:09 ` Steven Rostedt
14 siblings, 0 replies; 17+ messages in thread
From: Steven Rostedt @ 2021-10-27 16:09 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Guo Ren, Ingo Molnar,
James E.J. Bottomley, Helge Deller, Michael Ellerman,
Benjamin Herrenschmidt, Paul Mackerras, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Thomas Gleixner, Borislav Petkov,
H. Peter Anvin, Josh Poimboeuf, Jiri Kosina, Miroslav Benes,
Petr Mladek, Joe Lawrence, Masami Hiramatsu,
Peter Zijlstra (Intel), Nicholas Piggin, Jisheng Zhang, Abaci,
Michael Wang
From: =?UTF-8?q?=E7=8E=8B=E8=B4=87?= <yun.wang@linux.alibaba.com>
With CONFIG_DEBUG_PREEMPT we observed reports like:
BUG: using smp_processor_id() in preemptible
caller is perf_ftrace_function_call+0x6f/0x2e0
CPU: 1 PID: 680 Comm: a.out Not tainted
Call Trace:
<TASK>
dump_stack_lvl+0x8d/0xcf
check_preemption_disabled+0x104/0x110
? optimize_nops.isra.7+0x230/0x230
? text_poke_bp_batch+0x9f/0x310
perf_ftrace_function_call+0x6f/0x2e0
...
__text_poke+0x5/0x620
text_poke_bp_batch+0x9f/0x310
This telling us the CPU could be changed after task is preempted, and
the checking on CPU before preemption will be invalid.
Since now ftrace_test_recursion_trylock() will help to disable the
preemption, this patch just do the checking after trylock() to address
the issue.
Link: https://lkml.kernel.org/r/54880691-5fe2-33e7-d12f-1fa6136f5183@linux.alibaba.com
CC: Steven Rostedt <rostedt@goodmis.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Jisheng Zhang <jszhang@kernel.org>
Reported-by: Abaci <abaci@linux.alibaba.com>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
kernel/trace/trace_event_perf.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 6aed10e2f7ce..fba8cb77a73a 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -441,13 +441,13 @@ perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip,
if (!rcu_is_watching())
return;
- if ((unsigned long)ops->private != smp_processor_id())
- return;
-
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0)
return;
+ if ((unsigned long)ops->private != smp_processor_id())
+ goto out;
+
event = container_of(ops, struct perf_event, ftrace_ops);
/*
--
2.33.0
^ permalink raw reply related [flat|nested] 17+ messages in thread