* [PATCH v2 0/4] tracing: Have filters and histograms use a call table instead of pointers
@ 2022-09-06 22:53 Steven Rostedt
2022-09-06 22:53 ` [PATCH v2 1/4] tracing: Add numeric delta time to the trace event benchmark Steven Rostedt
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Steven Rostedt @ 2022-09-06 22:53 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Masami Hiramatsu, Tom Zanussi
While looking at the histogram and filter code, I realized that it's filled
with function pointers. With retpolines causing a big slowdown, I thought
that was problematic. Thus, I decided to see what would happen if I changed
the function pointers into enums, and instead called a single function
that did a switch on those enums and called the necessary functions
directly. The results were pretty clear.
The first patch was to update the trace event benchmark event to include
a integer value "delta" of the delta that it took to complete
(it currently only shows the delta as part of a string). By doing
so, I could benchmark the histogram and filter logic with it.
Before this series, the histogram with a single filter (to ignore the
first event, which has a delta of zero), had:
# event histogram
#
# trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
#
{ delta: 129 } hitcount: 2213
{ delta: 130 } hitcount: 285965
{ delta: 131 } hitcount: 1146545
{ delta: 132 } hitcount: 5185432
{ delta: 133 } hitcount: 19896215
{ delta: 134 } hitcount: 53118616
{ delta: 135 } hitcount: 83816709
{ delta: 136 } hitcount: 68329562
{ delta: 137 } hitcount: 41859349
{ delta: 138 } hitcount: 46257797
{ delta: 139 } hitcount: 54400831
{ delta: 140 } hitcount: 72875007
{ delta: 141 } hitcount: 76193272
{ delta: 142 } hitcount: 49504263
{ delta: 143 } hitcount: 38821072
{ delta: 144 } hitcount: 47702679
{ delta: 145 } hitcount: 41357297
{ delta: 146 } hitcount: 22058238
{ delta: 147 } hitcount: 9720002
{ delta: 148 } hitcount: 3193542
{ delta: 149 } hitcount: 927030
{ delta: 150 } hitcount: 850772
{ delta: 151 } hitcount: 1477380
{ delta: 152 } hitcount: 2687977
{ delta: 153 } hitcount: 2865985
{ delta: 154 } hitcount: 1977492
{ delta: 155 } hitcount: 2475607
{ delta: 156 } hitcount: 3403612
{ delta: 157 } hitcount: 2264011
{ delta: 158 } hitcount: 1096214
{ delta: 159 } hitcount: 504653
{ delta: 160 } hitcount: 218869
{ delta: 161 } hitcount: 103246
[..]
Where the bulk was around 142ns, and the fastest time was 129ns.
After this series:
# event histogram
#
# trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
#
{ delta: 103 } hitcount: 60
{ delta: 104 } hitcount: 16966
{ delta: 105 } hitcount: 396625
{ delta: 106 } hitcount: 3223400
{ delta: 107 } hitcount: 12053754
{ delta: 108 } hitcount: 20241711
{ delta: 109 } hitcount: 14850200
{ delta: 110 } hitcount: 4946599
{ delta: 111 } hitcount: 3479315
{ delta: 112 } hitcount: 18698299
{ delta: 113 } hitcount: 62388733
{ delta: 114 } hitcount: 95803834
{ delta: 115 } hitcount: 58278130
{ delta: 116 } hitcount: 15364800
{ delta: 117 } hitcount: 5586866
{ delta: 118 } hitcount: 2346880
{ delta: 119 } hitcount: 1131091
{ delta: 120 } hitcount: 620896
{ delta: 121 } hitcount: 236652
{ delta: 122 } hitcount: 105957
{ delta: 123 } hitcount: 119107
{ delta: 124 } hitcount: 54494
{ delta: 125 } hitcount: 63856
{ delta: 126 } hitcount: 64454
{ delta: 127 } hitcount: 34818
{ delta: 128 } hitcount: 41446
{ delta: 129 } hitcount: 51242
{ delta: 130 } hitcount: 28361
{ delta: 131 } hitcount: 23926
{ delta: 132 } hitcount: 22253
{ delta: 133 } hitcount: 16994
{ delta: 134 } hitcount: 14970
{ delta: 135 } hitcount: 13464
{ delta: 136 } hitcount: 11452
{ delta: 137 } hitcount: 12212
{ delta: 138 } hitcount: 12280
{ delta: 139 } hitcount: 9127
{ delta: 140 } hitcount: 9553
Where the bulk was around 114ns and the fast time was 103ns.
That's almost a 20% speedup!!!
Changes since v1: https://lore.kernel.org/all/20220823214606.344269352@goodmis.org/
- Fixed combining constants (Masami Hiramatsu)
Steven Rostedt (Google) (4):
tracing: Add numeric delta time to the trace event benchmark
tracing/hist: Call hist functions directly via a switch statement
tracing: Move struct filter_pred into trace_events_filter.c
tracing/filter: Call filter predicate functions directly via a switch statement
----
kernel/trace/trace.h | 13 --
kernel/trace/trace_benchmark.c | 2 +-
kernel/trace/trace_benchmark.h | 8 +-
kernel/trace/trace_events_filter.c | 239 ++++++++++++++++++++++++-----------
kernel/trace/trace_events_hist.c | 246 +++++++++++++++++++++++++------------
5 files changed, 343 insertions(+), 165 deletions(-)
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 1/4] tracing: Add numeric delta time to the trace event benchmark
2022-09-06 22:53 [PATCH v2 0/4] tracing: Have filters and histograms use a call table instead of pointers Steven Rostedt
@ 2022-09-06 22:53 ` Steven Rostedt
2022-09-06 22:53 ` [PATCH v2 2/4] tracing/hist: Call hist functions directly via a switch statement Steven Rostedt
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Steven Rostedt @ 2022-09-06 22:53 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Masami Hiramatsu, Tom Zanussi
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
In order to testing filtering and histograms via the trace event
benchmark, record the delta time of the last event as a numeric value
(currently, it just saves it within the string) so that filters and
histograms can use it.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace_benchmark.c | 2 +-
kernel/trace/trace_benchmark.h | 8 +++++---
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/trace_benchmark.c b/kernel/trace/trace_benchmark.c
index 801c2a7f7605..54d5fa35c90a 100644
--- a/kernel/trace/trace_benchmark.c
+++ b/kernel/trace/trace_benchmark.c
@@ -51,7 +51,7 @@ static void trace_do_benchmark(void)
local_irq_disable();
start = trace_clock_local();
- trace_benchmark_event(bm_str);
+ trace_benchmark_event(bm_str, bm_last);
stop = trace_clock_local();
local_irq_enable();
diff --git a/kernel/trace/trace_benchmark.h b/kernel/trace/trace_benchmark.h
index 79e6fbe5b365..c3e91060dc94 100644
--- a/kernel/trace/trace_benchmark.h
+++ b/kernel/trace/trace_benchmark.h
@@ -14,19 +14,21 @@ extern void trace_benchmark_unreg(void);
TRACE_EVENT_FN(benchmark_event,
- TP_PROTO(const char *str),
+ TP_PROTO(const char *str, u64 delta),
- TP_ARGS(str),
+ TP_ARGS(str, delta),
TP_STRUCT__entry(
__array( char, str, BENCHMARK_EVENT_STRLEN )
+ __field( u64, delta)
),
TP_fast_assign(
memcpy(__entry->str, str, BENCHMARK_EVENT_STRLEN);
+ __entry->delta = delta;
),
- TP_printk("%s", __entry->str),
+ TP_printk("%s delta=%llu", __entry->str, __entry->delta),
trace_benchmark_reg, trace_benchmark_unreg
);
--
2.35.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v2 2/4] tracing/hist: Call hist functions directly via a switch statement
2022-09-06 22:53 [PATCH v2 0/4] tracing: Have filters and histograms use a call table instead of pointers Steven Rostedt
2022-09-06 22:53 ` [PATCH v2 1/4] tracing: Add numeric delta time to the trace event benchmark Steven Rostedt
@ 2022-09-06 22:53 ` Steven Rostedt
2022-09-06 22:53 ` [PATCH v2 3/4] tracing: Move struct filter_pred into trace_events_filter.c Steven Rostedt
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Steven Rostedt @ 2022-09-06 22:53 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Masami Hiramatsu, Tom Zanussi
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
Due to retpolines, indirect calls are much more expensive than direct
calls. The histograms have a select set of functions it uses for the
histograms, instead of using function pointers to call them, create a
hist_fn_call() function that uses a switch statement to call the histogram
functions directly. This gives a 13% speedup to the histogram logic.
Using the histogram benchmark:
Before:
# event histogram
#
# trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
#
{ delta: 129 } hitcount: 2213
{ delta: 130 } hitcount: 285965
{ delta: 131 } hitcount: 1146545
{ delta: 132 } hitcount: 5185432
{ delta: 133 } hitcount: 19896215
{ delta: 134 } hitcount: 53118616
{ delta: 135 } hitcount: 83816709
{ delta: 136 } hitcount: 68329562
{ delta: 137 } hitcount: 41859349
{ delta: 138 } hitcount: 46257797
{ delta: 139 } hitcount: 54400831
{ delta: 140 } hitcount: 72875007
{ delta: 141 } hitcount: 76193272
{ delta: 142 } hitcount: 49504263
{ delta: 143 } hitcount: 38821072
{ delta: 144 } hitcount: 47702679
{ delta: 145 } hitcount: 41357297
{ delta: 146 } hitcount: 22058238
{ delta: 147 } hitcount: 9720002
{ delta: 148 } hitcount: 3193542
{ delta: 149 } hitcount: 927030
{ delta: 150 } hitcount: 850772
{ delta: 151 } hitcount: 1477380
{ delta: 152 } hitcount: 2687977
{ delta: 153 } hitcount: 2865985
{ delta: 154 } hitcount: 1977492
{ delta: 155 } hitcount: 2475607
{ delta: 156 } hitcount: 3403612
After:
# event histogram
#
# trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
#
{ delta: 113 } hitcount: 272
{ delta: 114 } hitcount: 840
{ delta: 118 } hitcount: 344
{ delta: 119 } hitcount: 25428
{ delta: 120 } hitcount: 350590
{ delta: 121 } hitcount: 1892484
{ delta: 122 } hitcount: 6205004
{ delta: 123 } hitcount: 11583521
{ delta: 124 } hitcount: 37590979
{ delta: 125 } hitcount: 108308504
{ delta: 126 } hitcount: 131672461
{ delta: 127 } hitcount: 88700598
{ delta: 128 } hitcount: 65939870
{ delta: 129 } hitcount: 45055004
{ delta: 130 } hitcount: 33174464
{ delta: 131 } hitcount: 31813493
{ delta: 132 } hitcount: 29011676
{ delta: 133 } hitcount: 22798782
{ delta: 134 } hitcount: 22072486
{ delta: 135 } hitcount: 17034113
{ delta: 136 } hitcount: 8982490
{ delta: 137 } hitcount: 2865908
{ delta: 138 } hitcount: 980382
{ delta: 139 } hitcount: 1651944
{ delta: 140 } hitcount: 4112073
{ delta: 141 } hitcount: 3963269
{ delta: 142 } hitcount: 1712508
{ delta: 143 } hitcount: 575941
{ delta: 144 } hitcount: 351427
{ delta: 145 } hitcount: 218077
{ delta: 146 } hitcount: 167297
{ delta: 147 } hitcount: 146198
{ delta: 148 } hitcount: 116122
{ delta: 149 } hitcount: 58993
{ delta: 150 } hitcount: 40228
The delta above is in nanoseconds. It brings the fastest time down from
129ns to 113ns, and the peak from 141ns to 126ns.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace_events_hist.c | 246 +++++++++++++++++++++----------
1 file changed, 169 insertions(+), 77 deletions(-)
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index fdf784620c28..48465f7e97b4 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -104,6 +104,38 @@ enum field_op_id {
FIELD_OP_MULT,
};
+enum hist_field_fn {
+ HIST_FIELD_FN_NOP,
+ HIST_FIELD_FN_VAR_REF,
+ HIST_FIELD_FN_COUNTER,
+ HIST_FIELD_FN_CONST,
+ HIST_FIELD_FN_LOG2,
+ HIST_FIELD_FN_BUCKET,
+ HIST_FIELD_FN_TIMESTAMP,
+ HIST_FIELD_FN_CPU,
+ HIST_FIELD_FN_STRING,
+ HIST_FIELD_FN_DYNSTRING,
+ HIST_FIELD_FN_RELDYNSTRING,
+ HIST_FIELD_FN_PSTRING,
+ HIST_FIELD_FN_S64,
+ HIST_FIELD_FN_U64,
+ HIST_FIELD_FN_S32,
+ HIST_FIELD_FN_U32,
+ HIST_FIELD_FN_S16,
+ HIST_FIELD_FN_U16,
+ HIST_FIELD_FN_S8,
+ HIST_FIELD_FN_U8,
+ HIST_FIELD_FN_UMINUS,
+ HIST_FIELD_FN_MINUS,
+ HIST_FIELD_FN_PLUS,
+ HIST_FIELD_FN_DIV,
+ HIST_FIELD_FN_MULT,
+ HIST_FIELD_FN_DIV_POWER2,
+ HIST_FIELD_FN_DIV_NOT_POWER2,
+ HIST_FIELD_FN_DIV_MULT_SHIFT,
+ HIST_FIELD_FN_EXECNAME,
+};
+
/*
* A hist_var (histogram variable) contains variable information for
* hist_fields having the HIST_FIELD_FL_VAR or HIST_FIELD_FL_VAR_REF
@@ -123,15 +155,15 @@ struct hist_var {
struct hist_field {
struct ftrace_event_field *field;
unsigned long flags;
- hist_field_fn_t fn;
- unsigned int ref;
- unsigned int size;
- unsigned int offset;
- unsigned int is_signed;
unsigned long buckets;
const char *type;
struct hist_field *operands[HIST_FIELD_OPERANDS_MAX];
struct hist_trigger_data *hist_data;
+ enum hist_field_fn fn_num;
+ unsigned int ref;
+ unsigned int size;
+ unsigned int offset;
+ unsigned int is_signed;
/*
* Variable fields contain variable-specific info in var.
@@ -166,14 +198,11 @@ struct hist_field {
u64 div_multiplier;
};
-static u64 hist_field_none(struct hist_field *field,
- struct tracing_map_elt *elt,
- struct trace_buffer *buffer,
- struct ring_buffer_event *rbe,
- void *event)
-{
- return 0;
-}
+static u64 hist_fn_call(struct hist_field *hist_field,
+ struct tracing_map_elt *elt,
+ struct trace_buffer *buffer,
+ struct ring_buffer_event *rbe,
+ void *event);
static u64 hist_field_const(struct hist_field *field,
struct tracing_map_elt *elt,
@@ -250,7 +279,7 @@ static u64 hist_field_log2(struct hist_field *hist_field,
{
struct hist_field *operand = hist_field->operands[0];
- u64 val = operand->fn(operand, elt, buffer, rbe, event);
+ u64 val = hist_fn_call(operand, elt, buffer, rbe, event);
return (u64) ilog2(roundup_pow_of_two(val));
}
@@ -264,7 +293,7 @@ static u64 hist_field_bucket(struct hist_field *hist_field,
struct hist_field *operand = hist_field->operands[0];
unsigned long buckets = hist_field->buckets;
- u64 val = operand->fn(operand, elt, buffer, rbe, event);
+ u64 val = hist_fn_call(operand, elt, buffer, rbe, event);
if (WARN_ON_ONCE(!buckets))
return val;
@@ -285,8 +314,8 @@ static u64 hist_field_plus(struct hist_field *hist_field,
struct hist_field *operand1 = hist_field->operands[0];
struct hist_field *operand2 = hist_field->operands[1];
- u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
- u64 val2 = operand2->fn(operand2, elt, buffer, rbe, event);
+ u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
+ u64 val2 = hist_fn_call(operand2, elt, buffer, rbe, event);
return val1 + val2;
}
@@ -300,8 +329,8 @@ static u64 hist_field_minus(struct hist_field *hist_field,
struct hist_field *operand1 = hist_field->operands[0];
struct hist_field *operand2 = hist_field->operands[1];
- u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
- u64 val2 = operand2->fn(operand2, elt, buffer, rbe, event);
+ u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
+ u64 val2 = hist_fn_call(operand2, elt, buffer, rbe, event);
return val1 - val2;
}
@@ -315,8 +344,8 @@ static u64 hist_field_div(struct hist_field *hist_field,
struct hist_field *operand1 = hist_field->operands[0];
struct hist_field *operand2 = hist_field->operands[1];
- u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
- u64 val2 = operand2->fn(operand2, elt, buffer, rbe, event);
+ u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
+ u64 val2 = hist_fn_call(operand2, elt, buffer, rbe, event);
/* Return -1 for the undefined case */
if (!val2)
@@ -338,7 +367,7 @@ static u64 div_by_power_of_two(struct hist_field *hist_field,
struct hist_field *operand1 = hist_field->operands[0];
struct hist_field *operand2 = hist_field->operands[1];
- u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
+ u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
return val1 >> __ffs64(operand2->constant);
}
@@ -352,7 +381,7 @@ static u64 div_by_not_power_of_two(struct hist_field *hist_field,
struct hist_field *operand1 = hist_field->operands[0];
struct hist_field *operand2 = hist_field->operands[1];
- u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
+ u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
return div64_u64(val1, operand2->constant);
}
@@ -366,7 +395,7 @@ static u64 div_by_mult_and_shift(struct hist_field *hist_field,
struct hist_field *operand1 = hist_field->operands[0];
struct hist_field *operand2 = hist_field->operands[1];
- u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
+ u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
/*
* If the divisor is a constant, do a multiplication and shift instead.
@@ -400,8 +429,8 @@ static u64 hist_field_mult(struct hist_field *hist_field,
struct hist_field *operand1 = hist_field->operands[0];
struct hist_field *operand2 = hist_field->operands[1];
- u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
- u64 val2 = operand2->fn(operand2, elt, buffer, rbe, event);
+ u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event);
+ u64 val2 = hist_fn_call(operand2, elt, buffer, rbe, event);
return val1 * val2;
}
@@ -414,7 +443,7 @@ static u64 hist_field_unary_minus(struct hist_field *hist_field,
{
struct hist_field *operand = hist_field->operands[0];
- s64 sval = (s64)operand->fn(operand, elt, buffer, rbe, event);
+ s64 sval = (s64)hist_fn_call(operand, elt, buffer, rbe, event);
u64 val = (u64)-sval;
return val;
@@ -657,19 +686,19 @@ struct snapshot_context {
* Returns the specific division function to use if the divisor
* is constant. This avoids extra branches when the trigger is hit.
*/
-static hist_field_fn_t hist_field_get_div_fn(struct hist_field *divisor)
+static enum hist_field_fn hist_field_get_div_fn(struct hist_field *divisor)
{
u64 div = divisor->constant;
if (!(div & (div - 1)))
- return div_by_power_of_two;
+ return HIST_FIELD_FN_DIV_POWER2;
/* If the divisor is too large, do a regular division */
if (div > (1 << HIST_DIV_SHIFT))
- return div_by_not_power_of_two;
+ return HIST_FIELD_FN_DIV_NOT_POWER2;
divisor->div_multiplier = div64_u64((u64)(1 << HIST_DIV_SHIFT), div);
- return div_by_mult_and_shift;
+ return HIST_FIELD_FN_DIV_MULT_SHIFT;
}
static void track_data_free(struct track_data *track_data)
@@ -1334,38 +1363,32 @@ static const char *hist_field_name(struct hist_field *field,
return field_name;
}
-static hist_field_fn_t select_value_fn(int field_size, int field_is_signed)
+static enum hist_field_fn select_value_fn(int field_size, int field_is_signed)
{
- hist_field_fn_t fn = NULL;
-
switch (field_size) {
case 8:
if (field_is_signed)
- fn = hist_field_s64;
+ return HIST_FIELD_FN_S64;
else
- fn = hist_field_u64;
- break;
+ return HIST_FIELD_FN_U64;
case 4:
if (field_is_signed)
- fn = hist_field_s32;
+ return HIST_FIELD_FN_S32;
else
- fn = hist_field_u32;
- break;
+ return HIST_FIELD_FN_U32;
case 2:
if (field_is_signed)
- fn = hist_field_s16;
+ return HIST_FIELD_FN_S16;
else
- fn = hist_field_u16;
- break;
+ return HIST_FIELD_FN_U16;
case 1:
if (field_is_signed)
- fn = hist_field_s8;
+ return HIST_FIELD_FN_S8;
else
- fn = hist_field_u8;
- break;
+ return HIST_FIELD_FN_U8;
}
- return fn;
+ return HIST_FIELD_FN_NOP;
}
static int parse_map_size(char *str)
@@ -1922,19 +1945,19 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
goto out; /* caller will populate */
if (flags & HIST_FIELD_FL_VAR_REF) {
- hist_field->fn = hist_field_var_ref;
+ hist_field->fn_num = HIST_FIELD_FN_VAR_REF;
goto out;
}
if (flags & HIST_FIELD_FL_HITCOUNT) {
- hist_field->fn = hist_field_counter;
+ hist_field->fn_num = HIST_FIELD_FN_COUNTER;
hist_field->size = sizeof(u64);
hist_field->type = "u64";
goto out;
}
if (flags & HIST_FIELD_FL_CONST) {
- hist_field->fn = hist_field_const;
+ hist_field->fn_num = HIST_FIELD_FN_CONST;
hist_field->size = sizeof(u64);
hist_field->type = kstrdup("u64", GFP_KERNEL);
if (!hist_field->type)
@@ -1943,14 +1966,14 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
}
if (flags & HIST_FIELD_FL_STACKTRACE) {
- hist_field->fn = hist_field_none;
+ hist_field->fn_num = HIST_FIELD_FN_NOP;
goto out;
}
if (flags & (HIST_FIELD_FL_LOG2 | HIST_FIELD_FL_BUCKET)) {
unsigned long fl = flags & ~(HIST_FIELD_FL_LOG2 | HIST_FIELD_FL_BUCKET);
- hist_field->fn = flags & HIST_FIELD_FL_LOG2 ? hist_field_log2 :
- hist_field_bucket;
+ hist_field->fn_num = flags & HIST_FIELD_FL_LOG2 ? HIST_FIELD_FN_LOG2 :
+ HIST_FIELD_FN_BUCKET;
hist_field->operands[0] = create_hist_field(hist_data, field, fl, NULL);
hist_field->size = hist_field->operands[0]->size;
hist_field->type = kstrdup_const(hist_field->operands[0]->type, GFP_KERNEL);
@@ -1960,14 +1983,14 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
}
if (flags & HIST_FIELD_FL_TIMESTAMP) {
- hist_field->fn = hist_field_timestamp;
+ hist_field->fn_num = HIST_FIELD_FN_TIMESTAMP;
hist_field->size = sizeof(u64);
hist_field->type = "u64";
goto out;
}
if (flags & HIST_FIELD_FL_CPU) {
- hist_field->fn = hist_field_cpu;
+ hist_field->fn_num = HIST_FIELD_FN_CPU;
hist_field->size = sizeof(int);
hist_field->type = "unsigned int";
goto out;
@@ -1987,14 +2010,14 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
goto free;
if (field->filter_type == FILTER_STATIC_STRING) {
- hist_field->fn = hist_field_string;
+ hist_field->fn_num = HIST_FIELD_FN_STRING;
hist_field->size = field->size;
} else if (field->filter_type == FILTER_DYN_STRING) {
- hist_field->fn = hist_field_dynstring;
+ hist_field->fn_num = HIST_FIELD_FN_DYNSTRING;
} else if (field->filter_type == FILTER_RDYN_STRING)
- hist_field->fn = hist_field_reldynstring;
+ hist_field->fn_num = HIST_FIELD_FN_RELDYNSTRING;
else
- hist_field->fn = hist_field_pstring;
+ hist_field->fn_num = HIST_FIELD_FN_PSTRING;
} else {
hist_field->size = field->size;
hist_field->is_signed = field->is_signed;
@@ -2002,9 +2025,9 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
if (!hist_field->type)
goto free;
- hist_field->fn = select_value_fn(field->size,
- field->is_signed);
- if (!hist_field->fn) {
+ hist_field->fn_num = select_value_fn(field->size,
+ field->is_signed);
+ if (hist_field->fn_num == HIST_FIELD_FN_NOP) {
destroy_hist_field(hist_field, 0);
return NULL;
}
@@ -2340,7 +2363,7 @@ static struct hist_field *create_alias(struct hist_trigger_data *hist_data,
if (!alias)
return NULL;
- alias->fn = var_ref->fn;
+ alias->fn_num = var_ref->fn_num;
alias->operands[0] = var_ref;
if (init_var_ref(alias, var_ref, var_ref->system, var_ref->event_name)) {
@@ -2523,7 +2546,7 @@ static struct hist_field *parse_unary(struct hist_trigger_data *hist_data,
expr->flags |= operand1->flags &
(HIST_FIELD_FL_TIMESTAMP | HIST_FIELD_FL_TIMESTAMP_USECS);
- expr->fn = hist_field_unary_minus;
+ expr->fn_num = HIST_FIELD_FN_UMINUS;
expr->operands[0] = operand1;
expr->size = operand1->size;
expr->is_signed = operand1->is_signed;
@@ -2595,7 +2618,7 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
unsigned long operand_flags, operand2_flags;
int field_op, ret = -EINVAL;
char *sep, *operand1_str;
- hist_field_fn_t op_fn;
+ enum hist_field_fn op_fn;
bool combine_consts;
if (*n_subexprs > 3) {
@@ -2654,16 +2677,16 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
switch (field_op) {
case FIELD_OP_MINUS:
- op_fn = hist_field_minus;
+ op_fn = HIST_FIELD_FN_MINUS;
break;
case FIELD_OP_PLUS:
- op_fn = hist_field_plus;
+ op_fn = HIST_FIELD_FN_PLUS;
break;
case FIELD_OP_DIV:
- op_fn = hist_field_div;
+ op_fn = HIST_FIELD_FN_DIV;
break;
case FIELD_OP_MULT:
- op_fn = hist_field_mult;
+ op_fn = HIST_FIELD_FN_MULT;
break;
default:
ret = -EINVAL;
@@ -2719,13 +2742,16 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
op_fn = hist_field_get_div_fn(operand2);
}
+ expr->fn_num = op_fn;
+
if (combine_consts) {
if (var1)
expr->operands[0] = var1;
if (var2)
expr->operands[1] = var2;
- expr->constant = op_fn(expr, NULL, NULL, NULL, NULL);
+ expr->constant = hist_fn_call(expr, NULL, NULL, NULL, NULL);
+ expr->fn_num = HIST_FIELD_FN_CONST;
expr->operands[0] = NULL;
expr->operands[1] = NULL;
@@ -2739,8 +2765,6 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
expr->name = expr_str(expr, 0);
} else {
- expr->fn = op_fn;
-
/* The operand sizes should be the same, so just pick one */
expr->size = operand1->size;
expr->is_signed = operand1->is_signed;
@@ -3065,7 +3089,7 @@ static inline void __update_field_vars(struct tracing_map_elt *elt,
struct hist_field *var = field_var->var;
struct hist_field *val = field_var->val;
- var_val = val->fn(val, elt, buffer, rbe, rec);
+ var_val = hist_fn_call(val, elt, buffer, rbe, rec);
var_idx = var->var.idx;
if (val->flags & HIST_FIELD_FL_STRING) {
@@ -4186,6 +4210,74 @@ static u64 hist_field_execname(struct hist_field *hist_field,
return (u64)(unsigned long)(elt_data->comm);
}
+static u64 hist_fn_call(struct hist_field *hist_field,
+ struct tracing_map_elt *elt,
+ struct trace_buffer *buffer,
+ struct ring_buffer_event *rbe,
+ void *event)
+{
+ switch (hist_field->fn_num) {
+ case HIST_FIELD_FN_VAR_REF:
+ return hist_field_var_ref(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_COUNTER:
+ return hist_field_counter(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_CONST:
+ return hist_field_const(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_LOG2:
+ return hist_field_log2(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_BUCKET:
+ return hist_field_bucket(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_TIMESTAMP:
+ return hist_field_timestamp(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_CPU:
+ return hist_field_cpu(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_STRING:
+ return hist_field_string(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_DYNSTRING:
+ return hist_field_dynstring(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_RELDYNSTRING:
+ return hist_field_reldynstring(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_PSTRING:
+ return hist_field_pstring(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_S64:
+ return hist_field_s64(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_U64:
+ return hist_field_u64(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_S32:
+ return hist_field_s32(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_U32:
+ return hist_field_u32(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_S16:
+ return hist_field_s16(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_U16:
+ return hist_field_u16(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_S8:
+ return hist_field_s8(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_U8:
+ return hist_field_u8(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_UMINUS:
+ return hist_field_unary_minus(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_MINUS:
+ return hist_field_minus(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_PLUS:
+ return hist_field_plus(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_DIV:
+ return hist_field_div(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_MULT:
+ return hist_field_mult(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_DIV_POWER2:
+ return div_by_power_of_two(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_DIV_NOT_POWER2:
+ return div_by_not_power_of_two(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_DIV_MULT_SHIFT:
+ return div_by_mult_and_shift(hist_field, elt, buffer, rbe, event);
+ case HIST_FIELD_FN_EXECNAME:
+ return hist_field_execname(hist_field, elt, buffer, rbe, event);
+ default:
+ return 0;
+ }
+}
+
/* Convert a var that points to common_pid.execname to a string */
static void update_var_execname(struct hist_field *hist_field)
{
@@ -4197,7 +4289,7 @@ static void update_var_execname(struct hist_field *hist_field)
kfree_const(hist_field->type);
hist_field->type = "char[]";
- hist_field->fn = hist_field_execname;
+ hist_field->fn_num = HIST_FIELD_FN_EXECNAME;
}
static int create_var_field(struct hist_trigger_data *hist_data,
@@ -4956,7 +5048,7 @@ static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
for_each_hist_val_field(i, hist_data) {
hist_field = hist_data->fields[i];
- hist_val = hist_field->fn(hist_field, elt, buffer, rbe, rec);
+ hist_val = hist_fn_call(hist_field, elt, buffer, rbe, rec);
if (hist_field->flags & HIST_FIELD_FL_VAR) {
var_idx = hist_field->var.idx;
@@ -4987,7 +5079,7 @@ static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
for_each_hist_key_field(i, hist_data) {
hist_field = hist_data->fields[i];
if (hist_field->flags & HIST_FIELD_FL_VAR) {
- hist_val = hist_field->fn(hist_field, elt, buffer, rbe, rec);
+ hist_val = hist_fn_call(hist_field, elt, buffer, rbe, rec);
var_idx = hist_field->var.idx;
tracing_map_set_var(elt, var_idx, hist_val);
}
@@ -5062,7 +5154,7 @@ static void event_hist_trigger(struct event_trigger_data *data,
HIST_STACKTRACE_SKIP);
key = entries;
} else {
- field_contents = key_field->fn(key_field, elt, buffer, rbe, rec);
+ field_contents = hist_fn_call(key_field, elt, buffer, rbe, rec);
if (key_field->flags & HIST_FIELD_FL_STRING) {
key = (void *)(unsigned long)field_contents;
use_compound_key = true;
--
2.35.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v2 3/4] tracing: Move struct filter_pred into trace_events_filter.c
2022-09-06 22:53 [PATCH v2 0/4] tracing: Have filters and histograms use a call table instead of pointers Steven Rostedt
2022-09-06 22:53 ` [PATCH v2 1/4] tracing: Add numeric delta time to the trace event benchmark Steven Rostedt
2022-09-06 22:53 ` [PATCH v2 2/4] tracing/hist: Call hist functions directly via a switch statement Steven Rostedt
@ 2022-09-06 22:53 ` Steven Rostedt
2022-09-06 22:53 ` [PATCH v2 4/4] tracing/filter: Call filter predicate functions directly via a switch statement Steven Rostedt
2022-09-09 12:05 ` [PATCH v2 0/4] tracing: Have filters and histograms use a call table instead of pointers Masami Hiramatsu
4 siblings, 0 replies; 6+ messages in thread
From: Steven Rostedt @ 2022-09-06 22:53 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Masami Hiramatsu, Tom Zanussi
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
The structure filter_pred and the typedef of the function used are only
referenced by trace_events_filter.c. There's no reason to have it in an
external header file. Move them into the only file they are used in.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace.h | 13 -------------
kernel/trace/trace_events_filter.c | 13 +++++++++++++
2 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 900e75d96c84..54ee5711c729 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1435,8 +1435,6 @@ event_trigger_unlock_commit(struct trace_event_file *file,
struct filter_pred;
struct regex;
-typedef int (*filter_pred_fn_t) (struct filter_pred *pred, void *event);
-
typedef int (*regex_match_func)(char *str, struct regex *r, int len);
enum regex_type {
@@ -1455,17 +1453,6 @@ struct regex {
regex_match_func match;
};
-struct filter_pred {
- filter_pred_fn_t fn;
- u64 val;
- struct regex regex;
- unsigned short *ops;
- struct ftrace_event_field *field;
- int offset;
- int not;
- int op;
-};
-
static inline bool is_string_field(struct ftrace_event_field *field)
{
return field->filter_type == FILTER_DYN_STRING ||
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 4b1057ab9d96..c49c689ce4ad 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -43,6 +43,19 @@ enum filter_op_ids { OPS };
static const char * ops[] = { OPS };
+typedef int (*filter_pred_fn_t) (struct filter_pred *pred, void *event);
+
+struct filter_pred {
+ filter_pred_fn_t fn;
+ u64 val;
+ struct regex regex;
+ unsigned short *ops;
+ struct ftrace_event_field *field;
+ int offset;
+ int not;
+ int op;
+};
+
/*
* pred functions are OP_LE, OP_LT, OP_GE, OP_GT, and OP_BAND
* pred_funcs_##type below must match the order of them above.
--
2.35.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v2 4/4] tracing/filter: Call filter predicate functions directly via a switch statement
2022-09-06 22:53 [PATCH v2 0/4] tracing: Have filters and histograms use a call table instead of pointers Steven Rostedt
` (2 preceding siblings ...)
2022-09-06 22:53 ` [PATCH v2 3/4] tracing: Move struct filter_pred into trace_events_filter.c Steven Rostedt
@ 2022-09-06 22:53 ` Steven Rostedt
2022-09-09 12:05 ` [PATCH v2 0/4] tracing: Have filters and histograms use a call table instead of pointers Masami Hiramatsu
4 siblings, 0 replies; 6+ messages in thread
From: Steven Rostedt @ 2022-09-06 22:53 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Masami Hiramatsu, Tom Zanussi
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
Due to retpolines, indirect calls are much more expensive than direct
calls. The filters have a select set of functions it uses for the
predicates. Instead of using function pointers to call them, create a
filter_pred_fn_call() function that uses a switch statement to call the
predicate functions directly. This gives almost a 10% speedup to the
filter logic.
Using the histogram benchmark:
Before:
# event histogram
#
# trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
#
{ delta: 113 } hitcount: 272
{ delta: 114 } hitcount: 840
{ delta: 118 } hitcount: 344
{ delta: 119 } hitcount: 25428
{ delta: 120 } hitcount: 350590
{ delta: 121 } hitcount: 1892484
{ delta: 122 } hitcount: 6205004
{ delta: 123 } hitcount: 11583521
{ delta: 124 } hitcount: 37590979
{ delta: 125 } hitcount: 108308504
{ delta: 126 } hitcount: 131672461
{ delta: 127 } hitcount: 88700598
{ delta: 128 } hitcount: 65939870
{ delta: 129 } hitcount: 45055004
{ delta: 130 } hitcount: 33174464
{ delta: 131 } hitcount: 31813493
{ delta: 132 } hitcount: 29011676
{ delta: 133 } hitcount: 22798782
{ delta: 134 } hitcount: 22072486
{ delta: 135 } hitcount: 17034113
{ delta: 136 } hitcount: 8982490
{ delta: 137 } hitcount: 2865908
{ delta: 138 } hitcount: 980382
{ delta: 139 } hitcount: 1651944
{ delta: 140 } hitcount: 4112073
{ delta: 141 } hitcount: 3963269
{ delta: 142 } hitcount: 1712508
{ delta: 143 } hitcount: 575941
After:
# event histogram
#
# trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
#
{ delta: 103 } hitcount: 60
{ delta: 104 } hitcount: 16966
{ delta: 105 } hitcount: 396625
{ delta: 106 } hitcount: 3223400
{ delta: 107 } hitcount: 12053754
{ delta: 108 } hitcount: 20241711
{ delta: 109 } hitcount: 14850200
{ delta: 110 } hitcount: 4946599
{ delta: 111 } hitcount: 3479315
{ delta: 112 } hitcount: 18698299
{ delta: 113 } hitcount: 62388733
{ delta: 114 } hitcount: 95803834
{ delta: 115 } hitcount: 58278130
{ delta: 116 } hitcount: 15364800
{ delta: 117 } hitcount: 5586866
{ delta: 118 } hitcount: 2346880
{ delta: 119 } hitcount: 1131091
{ delta: 120 } hitcount: 620896
{ delta: 121 } hitcount: 236652
{ delta: 122 } hitcount: 105957
{ delta: 123 } hitcount: 119107
{ delta: 124 } hitcount: 54494
{ delta: 125 } hitcount: 63856
{ delta: 126 } hitcount: 64454
{ delta: 127 } hitcount: 34818
{ delta: 128 } hitcount: 41446
{ delta: 129 } hitcount: 51242
{ delta: 130 } hitcount: 28361
{ delta: 131 } hitcount: 23926
The peak before was 126ns per event, after the peak is 114ns, and the
fastest time went from 113ns to 103ns.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/trace_events_filter.c | 230 ++++++++++++++++++++---------
1 file changed, 157 insertions(+), 73 deletions(-)
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index c49c689ce4ad..96acc2b71ac7 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -43,10 +43,33 @@ enum filter_op_ids { OPS };
static const char * ops[] = { OPS };
-typedef int (*filter_pred_fn_t) (struct filter_pred *pred, void *event);
+enum filter_pred_fn {
+ FILTER_PRED_FN_NOP,
+ FILTER_PRED_FN_64,
+ FILTER_PRED_FN_S64,
+ FILTER_PRED_FN_U64,
+ FILTER_PRED_FN_32,
+ FILTER_PRED_FN_S32,
+ FILTER_PRED_FN_U32,
+ FILTER_PRED_FN_16,
+ FILTER_PRED_FN_S16,
+ FILTER_PRED_FN_U16,
+ FILTER_PRED_FN_8,
+ FILTER_PRED_FN_S8,
+ FILTER_PRED_FN_U8,
+ FILTER_PRED_FN_COMM,
+ FILTER_PRED_FN_STRING,
+ FILTER_PRED_FN_STRLOC,
+ FILTER_PRED_FN_STRRELLOC,
+ FILTER_PRED_FN_PCHAR_USER,
+ FILTER_PRED_FN_PCHAR,
+ FILTER_PRED_FN_CPU,
+ FILTER_PRED_FN_,
+ FILTER_PRED_TEST_VISITED,
+};
struct filter_pred {
- filter_pred_fn_t fn;
+ enum filter_pred_fn fn_num;
u64 val;
struct regex regex;
unsigned short *ops;
@@ -603,44 +626,48 @@ predicate_parse(const char *str, int nr_parens, int nr_preds,
return ERR_PTR(ret);
}
+enum pred_cmp_types {
+ PRED_CMP_TYPE_NOP,
+ PRED_CMP_TYPE_LT,
+ PRED_CMP_TYPE_LE,
+ PRED_CMP_TYPE_GT,
+ PRED_CMP_TYPE_GE,
+ PRED_CMP_TYPE_BAND,
+};
+
#define DEFINE_COMPARISON_PRED(type) \
-static int filter_pred_LT_##type(struct filter_pred *pred, void *event) \
-{ \
- type *addr = (type *)(event + pred->offset); \
- type val = (type)pred->val; \
- return *addr < val; \
-} \
-static int filter_pred_LE_##type(struct filter_pred *pred, void *event) \
+static int filter_pred_##type(struct filter_pred *pred, void *event) \
{ \
- type *addr = (type *)(event + pred->offset); \
- type val = (type)pred->val; \
- return *addr <= val; \
-} \
-static int filter_pred_GT_##type(struct filter_pred *pred, void *event) \
-{ \
- type *addr = (type *)(event + pred->offset); \
- type val = (type)pred->val; \
- return *addr > val; \
-} \
-static int filter_pred_GE_##type(struct filter_pred *pred, void *event) \
-{ \
- type *addr = (type *)(event + pred->offset); \
- type val = (type)pred->val; \
- return *addr >= val; \
-} \
-static int filter_pred_BAND_##type(struct filter_pred *pred, void *event) \
-{ \
- type *addr = (type *)(event + pred->offset); \
- type val = (type)pred->val; \
- return !!(*addr & val); \
-} \
-static const filter_pred_fn_t pred_funcs_##type[] = { \
- filter_pred_LE_##type, \
- filter_pred_LT_##type, \
- filter_pred_GE_##type, \
- filter_pred_GT_##type, \
- filter_pred_BAND_##type, \
-};
+ switch (pred->op) { \
+ case OP_LT: { \
+ type *addr = (type *)(event + pred->offset); \
+ type val = (type)pred->val; \
+ return *addr < val; \
+ } \
+ case OP_LE: { \
+ type *addr = (type *)(event + pred->offset); \
+ type val = (type)pred->val; \
+ return *addr <= val; \
+ } \
+ case OP_GT: { \
+ type *addr = (type *)(event + pred->offset); \
+ type val = (type)pred->val; \
+ return *addr > val; \
+ } \
+ case OP_GE: { \
+ type *addr = (type *)(event + pred->offset); \
+ type val = (type)pred->val; \
+ return *addr >= val; \
+ } \
+ case OP_BAND: { \
+ type *addr = (type *)(event + pred->offset); \
+ type val = (type)pred->val; \
+ return !!(*addr & val); \
+ } \
+ default: \
+ return 0; \
+ } \
+}
#define DEFINE_EQUALITY_PRED(size) \
static int filter_pred_##size(struct filter_pred *pred, void *event) \
@@ -849,11 +876,6 @@ static int filter_pred_comm(struct filter_pred *pred, void *event)
return cmp ^ pred->not;
}
-static int filter_pred_none(struct filter_pred *pred, void *event)
-{
- return 0;
-}
-
/*
* regex_match_foo - Basic regex callbacks
*
@@ -999,6 +1021,19 @@ static void filter_build_regex(struct filter_pred *pred)
}
}
+
+#ifdef CONFIG_FTRACE_STARTUP_TEST
+static int test_pred_visited_fn(struct filter_pred *pred, void *event);
+#else
+static int test_pred_visited_fn(struct filter_pred *pred, void *event)
+{
+ return 0;
+}
+#endif
+
+
+static int filter_pred_fn_call(struct filter_pred *pred, void *event);
+
/* return 1 if event matches, 0 otherwise (discard) */
int filter_match_preds(struct event_filter *filter, void *rec)
{
@@ -1016,7 +1051,7 @@ int filter_match_preds(struct event_filter *filter, void *rec)
for (i = 0; prog[i].pred; i++) {
struct filter_pred *pred = prog[i].pred;
- int match = pred->fn(pred, rec);
+ int match = filter_pred_fn_call(pred, rec);
if (match == prog[i].when_to_branch)
i = prog[i].target;
}
@@ -1202,10 +1237,10 @@ int filter_assign_type(const char *type)
return FILTER_OTHER;
}
-static filter_pred_fn_t select_comparison_fn(enum filter_op_ids op,
- int field_size, int field_is_signed)
+static enum filter_pred_fn select_comparison_fn(enum filter_op_ids op,
+ int field_size, int field_is_signed)
{
- filter_pred_fn_t fn = NULL;
+ enum filter_pred_fn fn = FILTER_PRED_FN_NOP;
int pred_func_index = -1;
switch (op) {
@@ -1214,50 +1249,99 @@ static filter_pred_fn_t select_comparison_fn(enum filter_op_ids op,
break;
default:
if (WARN_ON_ONCE(op < PRED_FUNC_START))
- return NULL;
+ return fn;
pred_func_index = op - PRED_FUNC_START;
if (WARN_ON_ONCE(pred_func_index > PRED_FUNC_MAX))
- return NULL;
+ return fn;
}
switch (field_size) {
case 8:
if (pred_func_index < 0)
- fn = filter_pred_64;
+ fn = FILTER_PRED_FN_64;
else if (field_is_signed)
- fn = pred_funcs_s64[pred_func_index];
+ fn = FILTER_PRED_FN_S64;
else
- fn = pred_funcs_u64[pred_func_index];
+ fn = FILTER_PRED_FN_U64;
break;
case 4:
if (pred_func_index < 0)
- fn = filter_pred_32;
+ fn = FILTER_PRED_FN_32;
else if (field_is_signed)
- fn = pred_funcs_s32[pred_func_index];
+ fn = FILTER_PRED_FN_S32;
else
- fn = pred_funcs_u32[pred_func_index];
+ fn = FILTER_PRED_FN_U32;
break;
case 2:
if (pred_func_index < 0)
- fn = filter_pred_16;
+ fn = FILTER_PRED_FN_16;
else if (field_is_signed)
- fn = pred_funcs_s16[pred_func_index];
+ fn = FILTER_PRED_FN_S16;
else
- fn = pred_funcs_u16[pred_func_index];
+ fn = FILTER_PRED_FN_U16;
break;
case 1:
if (pred_func_index < 0)
- fn = filter_pred_8;
+ fn = FILTER_PRED_FN_8;
else if (field_is_signed)
- fn = pred_funcs_s8[pred_func_index];
+ fn = FILTER_PRED_FN_S8;
else
- fn = pred_funcs_u8[pred_func_index];
+ fn = FILTER_PRED_FN_U8;
break;
}
return fn;
}
+
+static int filter_pred_fn_call(struct filter_pred *pred, void *event)
+{
+ switch (pred->fn_num) {
+ case FILTER_PRED_FN_64:
+ return filter_pred_64(pred, event);
+ case FILTER_PRED_FN_S64:
+ return filter_pred_s64(pred, event);
+ case FILTER_PRED_FN_U64:
+ return filter_pred_u64(pred, event);
+ case FILTER_PRED_FN_32:
+ return filter_pred_32(pred, event);
+ case FILTER_PRED_FN_S32:
+ return filter_pred_s32(pred, event);
+ case FILTER_PRED_FN_U32:
+ return filter_pred_u32(pred, event);
+ case FILTER_PRED_FN_16:
+ return filter_pred_16(pred, event);
+ case FILTER_PRED_FN_S16:
+ return filter_pred_s16(pred, event);
+ case FILTER_PRED_FN_U16:
+ return filter_pred_u16(pred, event);
+ case FILTER_PRED_FN_8:
+ return filter_pred_8(pred, event);
+ case FILTER_PRED_FN_S8:
+ return filter_pred_s8(pred, event);
+ case FILTER_PRED_FN_U8:
+ return filter_pred_u8(pred, event);
+ case FILTER_PRED_FN_COMM:
+ return filter_pred_comm(pred, event);
+ case FILTER_PRED_FN_STRING:
+ return filter_pred_string(pred, event);
+ case FILTER_PRED_FN_STRLOC:
+ return filter_pred_strloc(pred, event);
+ case FILTER_PRED_FN_STRRELLOC:
+ return filter_pred_strrelloc(pred, event);
+ case FILTER_PRED_FN_PCHAR_USER:
+ return filter_pred_pchar_user(pred, event);
+ case FILTER_PRED_FN_PCHAR:
+ return filter_pred_pchar(pred, event);
+ case FILTER_PRED_FN_CPU:
+ return filter_pred_cpu(pred, event);
+ case FILTER_PRED_TEST_VISITED:
+ return test_pred_visited_fn(pred, event);
+ default:
+ return 0;
+ }
+}
+
/* Called when a predicate is encountered by predicate_parse() */
static int parse_pred(const char *str, void *data,
int pos, struct filter_parse_error *pe,
@@ -1351,7 +1435,7 @@ static int parse_pred(const char *str, void *data,
parse_error(pe, FILT_ERR_IP_FIELD_ONLY, pos + i);
goto err_free;
}
- pred->fn = filter_pred_none;
+ pred->fn_num = FILTER_PRED_FN_NOP;
/*
* Quotes are not required, but if they exist then we need
@@ -1429,16 +1513,16 @@ static int parse_pred(const char *str, void *data,
filter_build_regex(pred);
if (field->filter_type == FILTER_COMM) {
- pred->fn = filter_pred_comm;
+ pred->fn_num = FILTER_PRED_FN_COMM;
} else if (field->filter_type == FILTER_STATIC_STRING) {
- pred->fn = filter_pred_string;
+ pred->fn_num = FILTER_PRED_FN_STRING;
pred->regex.field_len = field->size;
} else if (field->filter_type == FILTER_DYN_STRING) {
- pred->fn = filter_pred_strloc;
+ pred->fn_num = FILTER_PRED_FN_STRLOC;
} else if (field->filter_type == FILTER_RDYN_STRING)
- pred->fn = filter_pred_strrelloc;
+ pred->fn_num = FILTER_PRED_FN_STRRELLOC;
else {
if (!ustring_per_cpu) {
@@ -1449,9 +1533,9 @@ static int parse_pred(const char *str, void *data,
}
if (ustring)
- pred->fn = filter_pred_pchar_user;
+ pred->fn_num = FILTER_PRED_FN_PCHAR_USER;
else
- pred->fn = filter_pred_pchar;
+ pred->fn_num = FILTER_PRED_FN_PCHAR;
}
/* go past the last quote */
i++;
@@ -1499,10 +1583,10 @@ static int parse_pred(const char *str, void *data,
pred->val = val;
if (field->filter_type == FILTER_CPU)
- pred->fn = filter_pred_cpu;
+ pred->fn_num = FILTER_PRED_FN_CPU;
else {
- pred->fn = select_comparison_fn(pred->op, field->size,
- field->is_signed);
+ pred->fn_num = select_comparison_fn(pred->op, field->size,
+ field->is_signed);
if (pred->op == OP_NE)
pred->not = 1;
}
@@ -2309,7 +2393,7 @@ static void update_pred_fn(struct event_filter *filter, char *fields)
struct filter_pred *pred = prog[i].pred;
struct ftrace_event_field *field = pred->field;
- WARN_ON_ONCE(!pred->fn);
+ WARN_ON_ONCE(pred->fn_num == FILTER_PRED_FN_NOP);
if (!field) {
WARN_ONCE(1, "all leafs should have field defined %d", i);
@@ -2319,7 +2403,7 @@ static void update_pred_fn(struct event_filter *filter, char *fields)
if (!strchr(fields, *field->name))
continue;
- pred->fn = test_pred_visited_fn;
+ pred->fn_num = FILTER_PRED_TEST_VISITED;
}
}
--
2.35.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2 0/4] tracing: Have filters and histograms use a call table instead of pointers
2022-09-06 22:53 [PATCH v2 0/4] tracing: Have filters and histograms use a call table instead of pointers Steven Rostedt
` (3 preceding siblings ...)
2022-09-06 22:53 ` [PATCH v2 4/4] tracing/filter: Call filter predicate functions directly via a switch statement Steven Rostedt
@ 2022-09-09 12:05 ` Masami Hiramatsu
4 siblings, 0 replies; 6+ messages in thread
From: Masami Hiramatsu @ 2022-09-09 12:05 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Masami Hiramatsu,
Tom Zanussi
On Tue, 06 Sep 2022 18:53:13 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:
> While looking at the histogram and filter code, I realized that it's filled
> with function pointers. With retpolines causing a big slowdown, I thought
> that was problematic. Thus, I decided to see what would happen if I changed
> the function pointers into enums, and instead called a single function
> that did a switch on those enums and called the necessary functions
> directly. The results were pretty clear.
>
> The first patch was to update the trace event benchmark event to include
> a integer value "delta" of the delta that it took to complete
> (it currently only shows the delta as part of a string). By doing
> so, I could benchmark the histogram and filter logic with it.
>
> Before this series, the histogram with a single filter (to ignore the
> first event, which has a delta of zero), had:
>
> # event histogram
> #
> # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
> #
>
> { delta: 129 } hitcount: 2213
> { delta: 130 } hitcount: 285965
> { delta: 131 } hitcount: 1146545
> { delta: 132 } hitcount: 5185432
> { delta: 133 } hitcount: 19896215
> { delta: 134 } hitcount: 53118616
> { delta: 135 } hitcount: 83816709
> { delta: 136 } hitcount: 68329562
> { delta: 137 } hitcount: 41859349
> { delta: 138 } hitcount: 46257797
> { delta: 139 } hitcount: 54400831
> { delta: 140 } hitcount: 72875007
> { delta: 141 } hitcount: 76193272
> { delta: 142 } hitcount: 49504263
> { delta: 143 } hitcount: 38821072
> { delta: 144 } hitcount: 47702679
> { delta: 145 } hitcount: 41357297
> { delta: 146 } hitcount: 22058238
> { delta: 147 } hitcount: 9720002
> { delta: 148 } hitcount: 3193542
> { delta: 149 } hitcount: 927030
> { delta: 150 } hitcount: 850772
> { delta: 151 } hitcount: 1477380
> { delta: 152 } hitcount: 2687977
> { delta: 153 } hitcount: 2865985
> { delta: 154 } hitcount: 1977492
> { delta: 155 } hitcount: 2475607
> { delta: 156 } hitcount: 3403612
> { delta: 157 } hitcount: 2264011
> { delta: 158 } hitcount: 1096214
> { delta: 159 } hitcount: 504653
> { delta: 160 } hitcount: 218869
> { delta: 161 } hitcount: 103246
> [..]
>
> Where the bulk was around 142ns, and the fastest time was 129ns.
>
> After this series:
>
> # event histogram
> #
> # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
> #
>
> { delta: 103 } hitcount: 60
> { delta: 104 } hitcount: 16966
> { delta: 105 } hitcount: 396625
> { delta: 106 } hitcount: 3223400
> { delta: 107 } hitcount: 12053754
> { delta: 108 } hitcount: 20241711
> { delta: 109 } hitcount: 14850200
> { delta: 110 } hitcount: 4946599
> { delta: 111 } hitcount: 3479315
> { delta: 112 } hitcount: 18698299
> { delta: 113 } hitcount: 62388733
> { delta: 114 } hitcount: 95803834
> { delta: 115 } hitcount: 58278130
> { delta: 116 } hitcount: 15364800
> { delta: 117 } hitcount: 5586866
> { delta: 118 } hitcount: 2346880
> { delta: 119 } hitcount: 1131091
> { delta: 120 } hitcount: 620896
> { delta: 121 } hitcount: 236652
> { delta: 122 } hitcount: 105957
> { delta: 123 } hitcount: 119107
> { delta: 124 } hitcount: 54494
> { delta: 125 } hitcount: 63856
> { delta: 126 } hitcount: 64454
> { delta: 127 } hitcount: 34818
> { delta: 128 } hitcount: 41446
> { delta: 129 } hitcount: 51242
> { delta: 130 } hitcount: 28361
> { delta: 131 } hitcount: 23926
> { delta: 132 } hitcount: 22253
> { delta: 133 } hitcount: 16994
> { delta: 134 } hitcount: 14970
> { delta: 135 } hitcount: 13464
> { delta: 136 } hitcount: 11452
> { delta: 137 } hitcount: 12212
> { delta: 138 } hitcount: 12280
> { delta: 139 } hitcount: 9127
> { delta: 140 } hitcount: 9553
>
> Where the bulk was around 114ns and the fast time was 103ns.
>
> That's almost a 20% speedup!!!
>
> Changes since v1: https://lore.kernel.org/all/20220823214606.344269352@goodmis.org/
>
> - Fixed combining constants (Masami Hiramatsu)
This looks good to me and confirmed that the previous issue is solved.
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Thank you!
>
> Steven Rostedt (Google) (4):
> tracing: Add numeric delta time to the trace event benchmark
> tracing/hist: Call hist functions directly via a switch statement
> tracing: Move struct filter_pred into trace_events_filter.c
> tracing/filter: Call filter predicate functions directly via a switch statement
>
> ----
> kernel/trace/trace.h | 13 --
> kernel/trace/trace_benchmark.c | 2 +-
> kernel/trace/trace_benchmark.h | 8 +-
> kernel/trace/trace_events_filter.c | 239 ++++++++++++++++++++++++-----------
> kernel/trace/trace_events_hist.c | 246 +++++++++++++++++++++++++------------
> 5 files changed, 343 insertions(+), 165 deletions(-)
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2022-09-09 12:05 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-09-06 22:53 [PATCH v2 0/4] tracing: Have filters and histograms use a call table instead of pointers Steven Rostedt
2022-09-06 22:53 ` [PATCH v2 1/4] tracing: Add numeric delta time to the trace event benchmark Steven Rostedt
2022-09-06 22:53 ` [PATCH v2 2/4] tracing/hist: Call hist functions directly via a switch statement Steven Rostedt
2022-09-06 22:53 ` [PATCH v2 3/4] tracing: Move struct filter_pred into trace_events_filter.c Steven Rostedt
2022-09-06 22:53 ` [PATCH v2 4/4] tracing/filter: Call filter predicate functions directly via a switch statement Steven Rostedt
2022-09-09 12:05 ` [PATCH v2 0/4] tracing: Have filters and histograms use a call table instead of pointers Masami Hiramatsu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox