* [PATCH v2 bpf-next 0/4] veristat: CSV output, comparison mode, filtering
@ 2022-09-21 16:42 Andrii Nakryiko
2022-09-21 16:42 ` [PATCH v2 bpf-next 1/4] selftests/bpf: fix double bpf_object__close() in veristat Andrii Nakryiko
` (5 more replies)
0 siblings, 6 replies; 8+ messages in thread
From: Andrii Nakryiko @ 2022-09-21 16:42 UTC (permalink / raw)
To: bpf, ast, daniel; +Cc: andrii, kernel-team
Add three more critical features to the veristat tool, which make it
sufficient for practical work on the BPF verifier:
- CSV output, which allows easier programmatic post-processing of stats;
- building upon CSV output, veristat now supports comparison mode, in which
two previously captured CSV outputs from veristat are compared with each
other in a convenient form;
- flexible allow/deny filtering using globs for BPF object files and
programs, making it possible to narrow down the target BPF programs to be verified.
See individual patches for more details and examples.
v1->v2:
- split out double-free fix into patch #1 (Yonghong);
- fixed typo in verbose flag (Quentin);
- baseline and comparison stats were reversed in output table, fixed that.
Andrii Nakryiko (4):
selftests/bpf: fix double bpf_object__close() in veristat
selftests/bpf: add CSV output mode for veristat
selftests/bpf: add comparison mode to veristat
selftests/bpf: add ability to filter programs in veristat
tools/testing/selftests/bpf/veristat.c | 853 ++++++++++++++++++++---
tools/testing/selftests/bpf/veristat.cfg | 17 +
2 files changed, 787 insertions(+), 83 deletions(-)
create mode 100644 tools/testing/selftests/bpf/veristat.cfg
--
2.30.2
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 bpf-next 1/4] selftests/bpf: fix double bpf_object__close() in veristat
2022-09-21 16:42 [PATCH v2 bpf-next 0/4] veristat: CSV output, comparison mode, filtering Andrii Nakryiko
@ 2022-09-21 16:42 ` Andrii Nakryiko
2022-09-21 16:42 ` [PATCH v2 bpf-next 2/4] selftests/bpf: add CSV output mode for veristat Andrii Nakryiko
` (4 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Andrii Nakryiko @ 2022-09-21 16:42 UTC (permalink / raw)
To: bpf, ast, daniel; +Cc: andrii, kernel-team
bpf_object__close(obj) is called twice for BPF object files with a single
BPF program in them. This causes a crash. Fix this by not calling
bpf_object__close() unnecessarily.
Fixes: c8bc5e050976 ("selftests/bpf: Add veristat tool for mass-verifying BPF object files")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
tools/testing/selftests/bpf/veristat.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/veristat.c b/tools/testing/selftests/bpf/veristat.c
index 39e6dc41e504..c0c8a65cda52 100644
--- a/tools/testing/selftests/bpf/veristat.c
+++ b/tools/testing/selftests/bpf/veristat.c
@@ -300,7 +300,6 @@ static int process_obj(const char *filename)
prog = bpf_object__next_program(obj, NULL);
bpf_program__set_autoload(prog, true);
process_prog(filename, obj, prog);
- bpf_object__close(obj);
goto cleanup;
}
--
2.30.2
* [PATCH v2 bpf-next 2/4] selftests/bpf: add CSV output mode for veristat
2022-09-21 16:42 [PATCH v2 bpf-next 0/4] veristat: CSV output, comparison mode, filtering Andrii Nakryiko
2022-09-21 16:42 ` [PATCH v2 bpf-next 1/4] selftests/bpf: fix double bpf_object__close() in veristat Andrii Nakryiko
@ 2022-09-21 16:42 ` Andrii Nakryiko
2022-09-21 16:42 ` [PATCH v2 bpf-next 3/4] selftests/bpf: add comparison mode to veristat Andrii Nakryiko
` (3 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Andrii Nakryiko @ 2022-09-21 16:42 UTC (permalink / raw)
To: bpf, ast, daniel; +Cc: andrii, kernel-team
Teach veristat to output results as a CSV table for easier programmatic
processing. Rename what was the --output/-o argument to --emit/-e,
and use --output-format/-o <fmt> to specify the output format.
Currently "table" and "csv" are supported, with table being the default.
In CSV output mode veristat uses spec identifiers as column names.
E.g., instead of "Total states" veristat uses "total_states" as a CSV
header name.
Internally veristat recognizes three formats, one of them
(RESFMT_TABLE_CALCLEN) being a special format instructing veristat to
pre-calculate column widths for table output. This felt a bit cleaner and
more uniform than creating separate functions just for this.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
tools/testing/selftests/bpf/veristat.c | 111 +++++++++++++++++--------
1 file changed, 75 insertions(+), 36 deletions(-)
diff --git a/tools/testing/selftests/bpf/veristat.c b/tools/testing/selftests/bpf/veristat.c
index c0c8a65cda52..0472bfae3c9d 100644
--- a/tools/testing/selftests/bpf/veristat.c
+++ b/tools/testing/selftests/bpf/veristat.c
@@ -46,10 +46,17 @@ struct stat_specs {
int lens[ALL_STATS_CNT];
};
+enum resfmt {
+ RESFMT_TABLE,
+ RESFMT_TABLE_CALCLEN, /* fake format to pre-calculate table's column widths */
+ RESFMT_CSV,
+};
+
static struct env {
char **filenames;
int filename_cnt;
bool verbose;
+ enum resfmt out_fmt;
struct verif_stats *prog_stats;
int prog_stat_cnt;
@@ -78,8 +85,9 @@ const char argp_program_doc[] =
static const struct argp_option opts[] = {
{ NULL, 'h', NULL, OPTION_HIDDEN, "Show the full help" },
{ "verbose", 'v', NULL, 0, "Verbose mode" },
- { "output", 'o', "SPEC", 0, "Specify output stats" },
+ { "emit", 'e', "SPEC", 0, "Specify stats to be emitted" },
{ "sort", 's', "SPEC", 0, "Specify sort order" },
+ { "output-format", 'o', "FMT", 0, "Result output format (table, csv), default is table." },
{},
};
@@ -97,7 +105,7 @@ static error_t parse_arg(int key, char *arg, struct argp_state *state)
case 'v':
env.verbose = true;
break;
- case 'o':
+ case 'e':
err = parse_stats(arg, &env.output_spec);
if (err)
return err;
@@ -107,6 +115,16 @@ static error_t parse_arg(int key, char *arg, struct argp_state *state)
if (err)
return err;
break;
+ case 'o':
+ if (strcmp(arg, "table") == 0) {
+ env.out_fmt = RESFMT_TABLE;
+ } else if (strcmp(arg, "csv") == 0) {
+ env.out_fmt = RESFMT_CSV;
+ } else {
+ fprintf(stderr, "Unrecognized output format '%s'\n", arg);
+ return -EINVAL;
+ }
+ break;
case ARGP_KEY_ARG:
tmp = realloc(env.filenames, (env.filename_cnt + 1) * sizeof(*env.filenames));
if (!tmp)
@@ -147,7 +165,7 @@ static struct stat_def {
[FILE_NAME] = { "File", {"file_name", "filename", "file"}, true /* asc */ },
[PROG_NAME] = { "Program", {"prog_name", "progname", "prog"}, true /* asc */ },
[VERDICT] = { "Verdict", {"verdict"}, true /* asc: failure, success */ },
- [DURATION] = { "Duration, us", {"duration", "dur"}, },
+ [DURATION] = { "Duration (us)", {"duration", "dur"}, },
[TOTAL_INSNS] = { "Total insns", {"total_insns", "insns"}, },
[TOTAL_STATES] = { "Total states", {"total_states", "states"}, },
[PEAK_STATES] = { "Peak states", {"peak_states"}, },
@@ -385,7 +403,21 @@ static int cmp_prog_stats(const void *v1, const void *v2)
#define HEADER_CHAR '-'
#define COLUMN_SEP " "
-static void output_headers(bool calc_len)
+static void output_header_underlines(void)
+{
+ int i, j, len;
+
+ for (i = 0; i < env.output_spec.spec_cnt; i++) {
+ len = env.output_spec.lens[i];
+
+ printf("%s", i == 0 ? "" : COLUMN_SEP);
+ for (j = 0; j < len; j++)
+ printf("%c", HEADER_CHAR);
+ }
+ printf("\n");
+}
+
+static void output_headers(enum resfmt fmt)
{
int i, len;
@@ -393,34 +425,30 @@ static void output_headers(bool calc_len)
int id = env.output_spec.ids[i];
int *max_len = &env.output_spec.lens[i];
- if (calc_len) {
+ switch (fmt) {
+ case RESFMT_TABLE_CALCLEN:
len = snprintf(NULL, 0, "%s", stat_defs[id].header);
if (len > *max_len)
*max_len = len;
- } else {
+ break;
+ case RESFMT_TABLE:
printf("%s%-*s", i == 0 ? "" : COLUMN_SEP, *max_len, stat_defs[id].header);
+ if (i == env.output_spec.spec_cnt - 1)
+ printf("\n");
+ break;
+ case RESFMT_CSV:
+ printf("%s%s", i == 0 ? "" : ",", stat_defs[id].names[0]);
+ if (i == env.output_spec.spec_cnt - 1)
+ printf("\n");
+ break;
}
}
- if (!calc_len)
- printf("\n");
+ if (fmt == RESFMT_TABLE)
+ output_header_underlines();
}
-static void output_header_underlines(void)
-{
- int i, j, len;
-
- for (i = 0; i < env.output_spec.spec_cnt; i++) {
- len = env.output_spec.lens[i];
-
- printf("%s", i == 0 ? "" : COLUMN_SEP);
- for (j = 0; j < len; j++)
- printf("%c", HEADER_CHAR);
- }
- printf("\n");
-}
-
-static void output_stats(const struct verif_stats *s, bool calc_len)
+static void output_stats(const struct verif_stats *s, enum resfmt fmt, bool last)
{
int i;
@@ -453,23 +481,36 @@ static void output_stats(const struct verif_stats *s, bool calc_len)
exit(1);
}
- if (calc_len) {
+ switch (fmt) {
+ case RESFMT_TABLE_CALCLEN:
if (str)
len = snprintf(NULL, 0, "%s", str);
else
len = snprintf(NULL, 0, "%ld", val);
if (len > *max_len)
*max_len = len;
- } else {
+ break;
+ case RESFMT_TABLE:
if (str)
printf("%s%-*s", i == 0 ? "" : COLUMN_SEP, *max_len, str);
else
printf("%s%*ld", i == 0 ? "" : COLUMN_SEP, *max_len, val);
+ if (i == env.output_spec.spec_cnt - 1)
+ printf("\n");
+ break;
+ case RESFMT_CSV:
+ if (str)
+ printf("%s%s", i == 0 ? "" : ",", str);
+ else
+ printf("%s%ld", i == 0 ? "" : ",", val);
+ if (i == env.output_spec.spec_cnt - 1)
+ printf("\n");
+ break;
}
}
- if (!calc_len)
- printf("\n");
+ if (last && fmt == RESFMT_TABLE)
+ output_header_underlines();
}
int main(int argc, char **argv)
@@ -505,20 +546,18 @@ int main(int argc, char **argv)
qsort(env.prog_stats, env.prog_stat_cnt, sizeof(*env.prog_stats), cmp_prog_stats);
- /* calculate column widths */
- output_headers(true);
- for (i = 0; i < env.prog_stat_cnt; i++) {
- output_stats(&env.prog_stats[i], true);
+ if (env.out_fmt == RESFMT_TABLE) {
+ /* calculate column widths */
+ output_headers(RESFMT_TABLE_CALCLEN);
+ for (i = 0; i < env.prog_stat_cnt; i++)
+ output_stats(&env.prog_stats[i], RESFMT_TABLE_CALCLEN, false);
}
/* actually output the table */
- output_headers(false);
- output_header_underlines();
+ output_headers(env.out_fmt);
for (i = 0; i < env.prog_stat_cnt; i++) {
- output_stats(&env.prog_stats[i], false);
+ output_stats(&env.prog_stats[i], env.out_fmt, i == env.prog_stat_cnt - 1);
}
- output_header_underlines();
- printf("\n");
printf("Done. Processed %d object files, %d programs.\n",
env.filename_cnt, env.prog_stat_cnt);
--
2.30.2
* [PATCH v2 bpf-next 3/4] selftests/bpf: add comparison mode to veristat
2022-09-21 16:42 [PATCH v2 bpf-next 0/4] veristat: CSV output, comparison mode, filtering Andrii Nakryiko
2022-09-21 16:42 ` [PATCH v2 bpf-next 1/4] selftests/bpf: fix double bpf_object__close() in veristat Andrii Nakryiko
2022-09-21 16:42 ` [PATCH v2 bpf-next 2/4] selftests/bpf: add CSV output mode for veristat Andrii Nakryiko
@ 2022-09-21 16:42 ` Andrii Nakryiko
2022-09-21 16:42 ` [PATCH v2 bpf-next 4/4] selftests/bpf: add ability to filter programs in veristat Andrii Nakryiko
` (2 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Andrii Nakryiko @ 2022-09-21 16:42 UTC (permalink / raw)
To: bpf, ast, daniel; +Cc: andrii, kernel-team
Add the ability to compare and contrast two veristat runs, previously
recorded with veristat using the CSV output format.
When veristat is called with the -C (--compare) flag, it expects
exactly two input files, both in CSV format.
The expectation is that these are outputs from previous veristat runs, but
as long as column names and formats match, it should just work. The first
CSV file is designated as the "baseline", and the second one as the
comparison (experiment) data set. Establishing a baseline matters later
when calculating difference percentages, see below.
Veristat parses these two CSV files and "reconstructs" verifier stats
(it could be just a subset of all possible stats). File and program
names are mandatory, as they are used as the joining key (these two "stats"
are designated as "key stats" in the code).
Veristat currently enforces that the sets of stats recorded in both CSVs
match exactly, down to the order. This is just a simplifying
condition which could be lifted with a bit of additional pre-processing to
reorder stat specs internally, which I didn't bother doing, yet.
For all the non-key stats, veristat will output three columns: one for
baseline data, one for comparison data, and one with an absolute and
relative percentage difference. If either the baseline or comparison value
is missing (that is, the respective CSV file doesn't have a row with an
*exactly* matching file and program name), that value is assumed to
be empty or zero. In such a case, the relative percentage is forced to
+100% or -100%, for consistency with the typical case.
Veristat's -e (--emit) and -s (--sort) specs still apply, so even if a CSV
contains lots of stats, the user can request to compare only a subset of
them (and specify the desired column order as well). Similarly, both CSV and
human-readable table output are supported. Note that the input is currently
always expected to be CSV.
Here's an example shell session, recording data for the biosnoop tool on two
different kernels, comparing them afterwards, and outputting data in table
format.
# on slightly older production kernel
$ sudo ./veristat biosnoop_bpf.o
File Program Verdict Duration (us) Total insns Total states Peak states
-------------- ------------------------ ------- ------------- ----------- ------------ -----------
biosnoop_bpf.o blk_account_io_merge_bio success 37 24 1 1
biosnoop_bpf.o blk_account_io_start failure 0 0 0 0
biosnoop_bpf.o block_rq_complete success 76 104 6 6
biosnoop_bpf.o block_rq_insert success 83 85 7 7
biosnoop_bpf.o block_rq_issue success 79 85 7 7
-------------- ------------------------ ------- ------------- ----------- ------------ -----------
Done. Processed 1 object files, 5 programs.
$ sudo ./veristat ~/local/tmp/fbcode-bpf-objs/biosnoop_bpf.o -o csv > baseline.csv
$ cat baseline.csv
file_name,prog_name,verdict,duration,total_insns,total_states,peak_states
biosnoop_bpf.o,blk_account_io_merge_bio,success,36,24,1,1
biosnoop_bpf.o,blk_account_io_start,failure,0,0,0,0
biosnoop_bpf.o,block_rq_complete,success,82,104,6,6
biosnoop_bpf.o,block_rq_insert,success,78,85,7,7
biosnoop_bpf.o,block_rq_issue,success,74,85,7,7
# on latest bpf-next kernel
$ sudo ./veristat biosnoop_bpf.o
File Program Verdict Duration (us) Total insns Total states Peak states
-------------- ------------------------ ------- ------------- ----------- ------------ -----------
biosnoop_bpf.o blk_account_io_merge_bio success 31 24 1 1
biosnoop_bpf.o blk_account_io_start failure 0 0 0 0
biosnoop_bpf.o block_rq_complete success 76 104 6 6
biosnoop_bpf.o block_rq_insert success 83 91 7 7
biosnoop_bpf.o block_rq_issue success 74 91 7 7
-------------- ------------------------ ------- ------------- ----------- ------------ -----------
Done. Processed 1 object files, 5 programs.
$ sudo ./veristat biosnoop_bpf.o -o csv > comparison.csv
$ cat comparison.csv
file_name,prog_name,verdict,duration,total_insns,total_states,peak_states
biosnoop_bpf.o,blk_account_io_merge_bio,success,71,24,1,1
biosnoop_bpf.o,blk_account_io_start,failure,0,0,0,0
biosnoop_bpf.o,block_rq_complete,success,82,104,6,6
biosnoop_bpf.o,block_rq_insert,success,83,91,7,7
biosnoop_bpf.o,block_rq_issue,success,87,91,7,7
# now let's compare with human-readable output (note that no sudo is needed)
# we also ignore verification duration in this case to shorten the output
$ ./veristat -C baseline.csv comparison.csv -e file,prog,verdict,insns
File Program Verdict (A) Verdict (B) Verdict (DIFF) Total insns (A) Total insns (B) Total insns (DIFF)
-------------- ------------------------ ----------- ----------- -------------- --------------- --------------- ------------------
biosnoop_bpf.o blk_account_io_merge_bio success success MATCH 24 24 +0 (+0.00%)
biosnoop_bpf.o blk_account_io_start failure failure MATCH 0 0 +0 (+100.00%)
biosnoop_bpf.o block_rq_complete success success MATCH 104 104 +0 (+0.00%)
biosnoop_bpf.o block_rq_insert success success MATCH 91 85 -6 (-6.59%)
biosnoop_bpf.o block_rq_issue success success MATCH 91 85 -6 (-6.59%)
-------------- ------------------------ ----------- ----------- -------------- --------------- --------------- ------------------
While not a particularly exciting example (it turned out to be kind of hard
to quickly find a nice example with a significant difference just from a
kernel version bump), it should demonstrate the main features.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
tools/testing/selftests/bpf/veristat.c | 543 ++++++++++++++++++++++---
1 file changed, 492 insertions(+), 51 deletions(-)
diff --git a/tools/testing/selftests/bpf/veristat.c b/tools/testing/selftests/bpf/veristat.c
index 0472bfae3c9d..c6837bac357f 100644
--- a/tools/testing/selftests/bpf/veristat.c
+++ b/tools/testing/selftests/bpf/veristat.c
@@ -43,7 +43,7 @@ struct stat_specs {
int spec_cnt;
enum stat_id ids[ALL_STATS_CNT];
bool asc[ALL_STATS_CNT];
- int lens[ALL_STATS_CNT];
+ int lens[ALL_STATS_CNT * 3]; /* 3x for comparison mode */
};
enum resfmt {
@@ -57,16 +57,20 @@ static struct env {
int filename_cnt;
bool verbose;
enum resfmt out_fmt;
+ bool comparison_mode;
struct verif_stats *prog_stats;
int prog_stat_cnt;
+ /* baseline_stats is allocated and used only in comparison mode */
+ struct verif_stats *baseline_stats;
+ int baseline_stat_cnt;
+
struct stat_specs output_spec;
struct stat_specs sort_spec;
} env;
-static int libbpf_print_fn(enum libbpf_print_level level,
- const char *format, va_list args)
+static int libbpf_print_fn(enum libbpf_print_level level, const char *format, va_list args)
{
if (!env.verbose)
return 0;
@@ -78,9 +82,10 @@ static int libbpf_print_fn(enum libbpf_print_level level,
const char *argp_program_version = "veristat";
const char *argp_program_bug_address = "<bpf@vger.kernel.org>";
const char argp_program_doc[] =
-"veristat BPF verifier stats collection tool.\n"
+"veristat BPF verifier stats collection and comparison tool.\n"
"\n"
-"USAGE: veristat <obj-file> [<obj-file>...]\n";
+"USAGE: veristat <obj-file> [<obj-file>...]\n"
+" OR: veristat -C <baseline.csv> <comparison.csv>\n";
static const struct argp_option opts[] = {
{ NULL, 'h', NULL, OPTION_HIDDEN, "Show the full help" },
@@ -88,6 +93,7 @@ static const struct argp_option opts[] = {
{ "emit", 'e', "SPEC", 0, "Specify stats to be emitted" },
{ "sort", 's', "SPEC", 0, "Specify sort order" },
{ "output-format", 'o', "FMT", 0, "Result output format (table, csv), default is table." },
+ { "compare", 'C', NULL, 0, "Comparison mode" },
{},
};
@@ -125,6 +131,9 @@ static error_t parse_arg(int key, char *arg, struct argp_state *state)
return -EINVAL;
}
break;
+ case 'C':
+ env.comparison_mode = true;
+ break;
case ARGP_KEY_ARG:
tmp = realloc(env.filenames, (env.filename_cnt + 1) * sizeof(*env.filenames));
if (!tmp)
@@ -141,6 +150,12 @@ static error_t parse_arg(int key, char *arg, struct argp_state *state)
return 0;
}
+static const struct argp argp = {
+ .options = opts,
+ .parser = parse_arg,
+ .doc = argp_program_doc,
+};
+
static const struct stat_specs default_output_spec = {
.spec_cnt = 7,
.ids = {
@@ -219,6 +234,20 @@ static int parse_stats(const char *stats_str, struct stat_specs *specs)
return 0;
}
+static void free_verif_stats(struct verif_stats *stats, size_t stat_cnt)
+{
+ int i;
+
+ if (!stats)
+ return;
+
+ for (i = 0; i < stat_cnt; i++) {
+ free(stats[i].file_name);
+ free(stats[i].prog_name);
+ }
+ free(stats);
+}
+
static char verif_log_buf[64 * 1024];
static int parse_verif_log(const char *buf, size_t buf_sz, struct verif_stats *s)
@@ -448,6 +477,33 @@ static void output_headers(enum resfmt fmt)
output_header_underlines();
}
+static void prepare_value(const struct verif_stats *s, enum stat_id id,
+ const char **str, long *val)
+{
+ switch (id) {
+ case FILE_NAME:
+ *str = s->file_name;
+ break;
+ case PROG_NAME:
+ *str = s->prog_name;
+ break;
+ case VERDICT:
+ *str = s->stats[VERDICT] ? "success" : "failure";
+ break;
+ case DURATION:
+ case TOTAL_INSNS:
+ case TOTAL_STATES:
+ case PEAK_STATES:
+ case MAX_STATES_PER_INSN:
+ case MARK_READ_MAX_LEN:
+ *val = s->stats[id];
+ break;
+ default:
+ fprintf(stderr, "Unrecognized stat #%d\n", id);
+ exit(1);
+ }
+}
+
static void output_stats(const struct verif_stats *s, enum resfmt fmt, bool last)
{
int i;
@@ -458,28 +514,7 @@ static void output_stats(const struct verif_stats *s, enum resfmt fmt, bool last
const char *str = NULL;
long val = 0;
- switch (id) {
- case FILE_NAME:
- str = s->file_name;
- break;
- case PROG_NAME:
- str = s->prog_name;
- break;
- case VERDICT:
- str = s->stats[VERDICT] ? "success" : "failure";
- break;
- case DURATION:
- case TOTAL_INSNS:
- case TOTAL_STATES:
- case PEAK_STATES:
- case MAX_STATES_PER_INSN:
- case MARK_READ_MAX_LEN:
- val = s->stats[id];
- break;
- default:
- fprintf(stderr, "Unrecognized stat #%d\n", id);
- exit(1);
- }
+ prepare_value(s, id, &str, &val);
switch (fmt) {
case RESFMT_TABLE_CALCLEN:
@@ -509,38 +544,28 @@ static void output_stats(const struct verif_stats *s, enum resfmt fmt, bool last
}
}
- if (last && fmt == RESFMT_TABLE)
+ if (last && fmt == RESFMT_TABLE) {
output_header_underlines();
+ printf("Done. Processed %d object files, %d programs.\n",
+ env.filename_cnt, env.prog_stat_cnt);
+ }
}
-int main(int argc, char **argv)
+static int handle_verif_mode(void)
{
- static const struct argp argp = {
- .options = opts,
- .parser = parse_arg,
- .doc = argp_program_doc,
- };
- int err = 0, i;
-
- if (argp_parse(&argp, argc, argv, 0, NULL, NULL))
- return 1;
+ int i, err;
if (env.filename_cnt == 0) {
fprintf(stderr, "Please provide path to BPF object file!\n");
argp_help(&argp, stderr, ARGP_HELP_USAGE, "veristat");
- return 1;
+ return -EINVAL;
}
- if (env.output_spec.spec_cnt == 0)
- env.output_spec = default_output_spec;
- if (env.sort_spec.spec_cnt == 0)
- env.sort_spec = default_sort_spec;
-
for (i = 0; i < env.filename_cnt; i++) {
err = process_obj(env.filenames[i]);
if (err) {
fprintf(stderr, "Failed to process '%s': %d\n", env.filenames[i], err);
- goto cleanup;
+ return err;
}
}
@@ -559,15 +584,431 @@ int main(int argc, char **argv)
output_stats(&env.prog_stats[i], env.out_fmt, i == env.prog_stat_cnt - 1);
}
- printf("Done. Processed %d object files, %d programs.\n",
- env.filename_cnt, env.prog_stat_cnt);
+ return 0;
+}
+
+static int parse_stat_value(const char *str, enum stat_id id, struct verif_stats *st)
+{
+ switch (id) {
+ case FILE_NAME:
+ st->file_name = strdup(str);
+ if (!st->file_name)
+ return -ENOMEM;
+ break;
+ case PROG_NAME:
+ st->prog_name = strdup(str);
+ if (!st->prog_name)
+ return -ENOMEM;
+ break;
+ case VERDICT:
+ if (strcmp(str, "success") == 0) {
+ st->stats[VERDICT] = true;
+ } else if (strcmp(str, "failure") == 0) {
+ st->stats[VERDICT] = false;
+ } else {
+ fprintf(stderr, "Unrecognized verification verdict '%s'\n", str);
+ return -EINVAL;
+ }
+ break;
+ case DURATION:
+ case TOTAL_INSNS:
+ case TOTAL_STATES:
+ case PEAK_STATES:
+ case MAX_STATES_PER_INSN:
+ case MARK_READ_MAX_LEN: {
+ long val;
+ int err, n;
+
+ if (sscanf(str, "%ld %n", &val, &n) != 1 || n != strlen(str)) {
+ err = -errno;
+ fprintf(stderr, "Failed to parse '%s' as integer\n", str);
+ return err;
+ }
+
+ st->stats[id] = val;
+ break;
+ }
+ default:
+ fprintf(stderr, "Unrecognized stat #%d\n", id);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int parse_stats_csv(const char *filename, struct stat_specs *specs,
+ struct verif_stats **statsp, int *stat_cntp)
+{
+ char line[4096];
+ FILE *f;
+ int err = 0;
+ bool header = true;
+
+ f = fopen(filename, "r");
+ if (!f) {
+ err = -errno;
+ fprintf(stderr, "Failed to open '%s': %d\n", filename, err);
+ return err;
+ }
+
+ *stat_cntp = 0;
+
+ while (fgets(line, sizeof(line), f)) {
+ char *input = line, *state = NULL, *next;
+ struct verif_stats *st = NULL;
+ int col = 0;
+
+ if (!header) {
+ void *tmp;
+
+ tmp = realloc(*statsp, (*stat_cntp + 1) * sizeof(**statsp));
+ if (!tmp) {
+ err = -ENOMEM;
+ goto cleanup;
+ }
+ *statsp = tmp;
+ st = &(*statsp)[*stat_cntp];
+ *stat_cntp += 1;
+ }
+
+ while ((next = strtok_r(state ? NULL : input, ",\n", &state))) {
+ if (header) {
+ /* for the first line, set up spec stats */
+ err = parse_stat(next, specs);
+ if (err)
+ goto cleanup;
+ continue;
+ }
+
+ /* for all other lines, parse values based on spec */
+ if (col >= specs->spec_cnt) {
+ fprintf(stderr, "Found extraneous column #%d in row #%d of '%s'\n",
+ col, *stat_cntp, filename);
+ err = -EINVAL;
+ goto cleanup;
+ }
+ err = parse_stat_value(next, specs->ids[col], st);
+ if (err)
+ goto cleanup;
+ col++;
+ }
+
+ if (!header && col < specs->spec_cnt) {
+ fprintf(stderr, "Not enough columns in row #%d in '%s'\n",
+ *stat_cntp, filename);
+ err = -EINVAL;
+ goto cleanup;
+ }
+
+ header = false;
+ }
+
+ if (!feof(f)) {
+ err = -errno;
+ fprintf(stderr, "Failed I/O for '%s': %d\n", filename, err);
+ }
cleanup:
- for (i = 0; i < env.prog_stat_cnt; i++) {
- free(env.prog_stats[i].file_name);
- free(env.prog_stats[i].prog_name);
+ fclose(f);
+ return err;
+}
+
+/* empty/zero stats for mismatched rows */
+static const struct verif_stats fallback_stats = { .file_name = "", .prog_name = "" };
+
+static bool is_key_stat(enum stat_id id)
+{
+ return id == FILE_NAME || id == PROG_NAME;
+}
+
+static void output_comp_header_underlines(void)
+{
+ int i, j, k;
+
+ for (i = 0; i < env.output_spec.spec_cnt; i++) {
+ int id = env.output_spec.ids[i];
+ int max_j = is_key_stat(id) ? 1 : 3;
+
+ for (j = 0; j < max_j; j++) {
+ int len = env.output_spec.lens[3 * i + j];
+
+ printf("%s", i + j == 0 ? "" : COLUMN_SEP);
+
+ for (k = 0; k < len; k++)
+ printf("%c", HEADER_CHAR);
+ }
+ }
+ printf("\n");
+}
+
+static void output_comp_headers(enum resfmt fmt)
+{
+ static const char *table_sfxs[3] = {" (A)", " (B)", " (DIFF)"};
+ static const char *name_sfxs[3] = {"_base", "_comp", "_diff"};
+ int i, j, len;
+
+ for (i = 0; i < env.output_spec.spec_cnt; i++) {
+ int id = env.output_spec.ids[i];
+ /* key stats don't have A/B/DIFF columns, they are common for both data sets */
+ int max_j = is_key_stat(id) ? 1 : 3;
+
+ for (j = 0; j < max_j; j++) {
+ int *max_len = &env.output_spec.lens[3 * i + j];
+ bool last = (i == env.output_spec.spec_cnt - 1) && (j == max_j - 1);
+ const char *sfx;
+
+ switch (fmt) {
+ case RESFMT_TABLE_CALCLEN:
+ sfx = is_key_stat(id) ? "" : table_sfxs[j];
+ len = snprintf(NULL, 0, "%s%s", stat_defs[id].header, sfx);
+ if (len > *max_len)
+ *max_len = len;
+ break;
+ case RESFMT_TABLE:
+ sfx = is_key_stat(id) ? "" : table_sfxs[j];
+ printf("%s%-*s%s", i + j == 0 ? "" : COLUMN_SEP,
+ *max_len - (int)strlen(sfx), stat_defs[id].header, sfx);
+ if (last)
+ printf("\n");
+ break;
+ case RESFMT_CSV:
+ sfx = is_key_stat(id) ? "" : name_sfxs[j];
+ printf("%s%s%s", i + j == 0 ? "" : ",", stat_defs[id].names[0], sfx);
+ if (last)
+ printf("\n");
+ break;
+ }
+ }
+ }
+
+ if (fmt == RESFMT_TABLE)
+ output_comp_header_underlines();
+}
+
+static void output_comp_stats(const struct verif_stats *base, const struct verif_stats *comp,
+ enum resfmt fmt, bool last)
+{
+ char base_buf[1024] = {}, comp_buf[1024] = {}, diff_buf[1024] = {};
+ int i;
+
+ for (i = 0; i < env.output_spec.spec_cnt; i++) {
+ int id = env.output_spec.ids[i], len;
+ int *max_len_base = &env.output_spec.lens[3 * i + 0];
+ int *max_len_comp = &env.output_spec.lens[3 * i + 1];
+ int *max_len_diff = &env.output_spec.lens[3 * i + 2];
+ const char *base_str = NULL, *comp_str = NULL;
+ long base_val = 0, comp_val = 0, diff_val = 0;
+
+ prepare_value(base, id, &base_str, &base_val);
+ prepare_value(comp, id, &comp_str, &comp_val);
+
+ /* normalize all the outputs to be in string buffers for simplicity */
+ if (is_key_stat(id)) {
+ /* key stats (file and program name) are always strings */
+ if (base != &fallback_stats)
+ snprintf(base_buf, sizeof(base_buf), "%s", base_str);
+ else
+ snprintf(base_buf, sizeof(base_buf), "%s", comp_str);
+ } else if (base_str) {
+ snprintf(base_buf, sizeof(base_buf), "%s", base_str);
+ snprintf(comp_buf, sizeof(comp_buf), "%s", comp_str);
+ if (strcmp(base_str, comp_str) == 0)
+ snprintf(diff_buf, sizeof(diff_buf), "%s", "MATCH");
+ else
+ snprintf(diff_buf, sizeof(diff_buf), "%s", "MISMATCH");
+ } else {
+ snprintf(base_buf, sizeof(base_buf), "%ld", base_val);
+ snprintf(comp_buf, sizeof(comp_buf), "%ld", comp_val);
+
+ diff_val = comp_val - base_val;
+ if (base == &fallback_stats || comp == &fallback_stats || base_val == 0) {
+ snprintf(diff_buf, sizeof(diff_buf), "%+ld (%+.2lf%%)",
+ diff_val, comp_val < base_val ? -100.0 : 100.0);
+ } else {
+ snprintf(diff_buf, sizeof(diff_buf), "%+ld (%+.2lf%%)",
+ diff_val, diff_val * 100.0 / base_val);
+ }
+ }
+
+ switch (fmt) {
+ case RESFMT_TABLE_CALCLEN:
+ len = strlen(base_buf);
+ if (len > *max_len_base)
+ *max_len_base = len;
+ if (!is_key_stat(id)) {
+ len = strlen(comp_buf);
+ if (len > *max_len_comp)
+ *max_len_comp = len;
+ len = strlen(diff_buf);
+ if (len > *max_len_diff)
+ *max_len_diff = len;
+ }
+ break;
+ case RESFMT_TABLE: {
+ /* string outputs are left-aligned, number outputs are right-aligned */
+ const char *fmt = base_str ? "%s%-*s" : "%s%*s";
+
+ printf(fmt, i == 0 ? "" : COLUMN_SEP, *max_len_base, base_buf);
+ if (!is_key_stat(id)) {
+ printf(fmt, COLUMN_SEP, *max_len_comp, comp_buf);
+ printf(fmt, COLUMN_SEP, *max_len_diff, diff_buf);
+ }
+ if (i == env.output_spec.spec_cnt - 1)
+ printf("\n");
+ break;
+ }
+ case RESFMT_CSV:
+ printf("%s%s", i == 0 ? "" : ",", base_buf);
+ if (!is_key_stat(id)) {
+ printf("%s%s", i == 0 ? "" : ",", comp_buf);
+ printf("%s%s", i == 0 ? "" : ",", diff_buf);
+ }
+ if (i == env.output_spec.spec_cnt - 1)
+ printf("\n");
+ break;
+ }
+ }
+
+ if (last && fmt == RESFMT_TABLE)
+ output_comp_header_underlines();
+}
+
+static int cmp_stats_key(const struct verif_stats *base, const struct verif_stats *comp)
+{
+ int r;
+
+ r = strcmp(base->file_name, comp->file_name);
+ if (r != 0)
+ return r;
+ return strcmp(base->prog_name, comp->prog_name);
+}
+
+static int handle_comparison_mode(void)
+{
+ struct stat_specs base_specs = {}, comp_specs = {};
+ enum resfmt cur_fmt;
+ int err, i, j;
+
+ if (env.filename_cnt != 2) {
+ fprintf(stderr, "Comparison mode expects exactly two input CSV files!\n");
+ argp_help(&argp, stderr, ARGP_HELP_USAGE, "veristat");
+ return -EINVAL;
+ }
+
+ err = parse_stats_csv(env.filenames[0], &base_specs,
+ &env.baseline_stats, &env.baseline_stat_cnt);
+ if (err) {
+ fprintf(stderr, "Failed to parse stats from '%s': %d\n", env.filenames[0], err);
+ return err;
+ }
+ err = parse_stats_csv(env.filenames[1], &comp_specs,
+ &env.prog_stats, &env.prog_stat_cnt);
+ if (err) {
+ fprintf(stderr, "Failed to parse stats from '%s': %d\n", env.filenames[1], err);
+ return err;
}
- free(env.prog_stats);
+
+ /* To keep it simple we validate that the set and order of stats in
+ * both CSVs are exactly the same. This can be lifted with a bit more
+ * pre-processing later.
+ */
+ if (base_specs.spec_cnt != comp_specs.spec_cnt) {
+ fprintf(stderr, "Number of stats in '%s' and '%s' differs (%d != %d)!\n",
+ env.filenames[0], env.filenames[1],
+ base_specs.spec_cnt, comp_specs.spec_cnt);
+ return -EINVAL;
+ }
+ for (i = 0; i < base_specs.spec_cnt; i++) {
+ if (base_specs.ids[i] != comp_specs.ids[i]) {
+ fprintf(stderr, "Stats composition differs between '%s' and '%s' (%s != %s)!\n",
+ env.filenames[0], env.filenames[1],
+ stat_defs[base_specs.ids[i]].names[0],
+ stat_defs[comp_specs.ids[i]].names[0]);
+ return -EINVAL;
+ }
+ }
+
+ qsort(env.prog_stats, env.prog_stat_cnt, sizeof(*env.prog_stats), cmp_prog_stats);
+ qsort(env.baseline_stats, env.baseline_stat_cnt, sizeof(*env.baseline_stats), cmp_prog_stats);
+
+ /* for human-readable table output we need to do extra pass to
+ * calculate column widths, so we substitute current output format
+ * with RESFMT_TABLE_CALCLEN and later revert it back to RESFMT_TABLE
+ * and do everything again.
+ */
+ if (env.out_fmt == RESFMT_TABLE)
+ cur_fmt = RESFMT_TABLE_CALCLEN;
+ else
+ cur_fmt = env.out_fmt;
+
+one_more_time:
+ output_comp_headers(cur_fmt);
+
+ /* If baseline and comparison datasets have different subset of rows
+ * (we match by 'object + prog' as a unique key) then assume
+ * empty/missing/zero value for rows that are missing in the opposite
+ * data set
+ */
+ i = j = 0;
+ while (i < env.baseline_stat_cnt || j < env.prog_stat_cnt) {
+ bool last = (i == env.baseline_stat_cnt - 1) || (j == env.prog_stat_cnt - 1);
+ const struct verif_stats *base, *comp;
+ int r;
+
+ base = i < env.baseline_stat_cnt ? &env.baseline_stats[i] : &fallback_stats;
+ comp = j < env.prog_stat_cnt ? &env.prog_stats[j] : &fallback_stats;
+
+ if (!base->file_name || !base->prog_name) {
+ fprintf(stderr, "Entry #%d in '%s' doesn't have file and/or program name specified!\n",
+ i, env.filenames[0]);
+ return -EINVAL;
+ }
+ if (!comp->file_name || !comp->prog_name) {
+ fprintf(stderr, "Entry #%d in '%s' doesn't have file and/or program name specified!\n",
+ j, env.filenames[1]);
+ return -EINVAL;
+ }
+
+ r = cmp_stats_key(base, comp);
+ if (r == 0) {
+ output_comp_stats(base, comp, cur_fmt, last);
+ i++;
+ j++;
+ } else if (comp == &fallback_stats || r < 0) {
+ output_comp_stats(base, &fallback_stats, cur_fmt, last);
+ i++;
+ } else {
+ output_comp_stats(&fallback_stats, comp, cur_fmt, last);
+ j++;
+ }
+ }
+
+ if (cur_fmt == RESFMT_TABLE_CALCLEN) {
+ cur_fmt = RESFMT_TABLE;
+ goto one_more_time; /* ... this time with feeling */
+ }
+
+ return 0;
+}
+
+int main(int argc, char **argv)
+{
+ int err = 0, i;
+
+ if (argp_parse(&argp, argc, argv, 0, NULL, NULL))
+ return 1;
+
+ if (env.output_spec.spec_cnt == 0)
+ env.output_spec = default_output_spec;
+ if (env.sort_spec.spec_cnt == 0)
+ env.sort_spec = default_sort_spec;
+
+ if (env.comparison_mode)
+ err = handle_comparison_mode();
+ else
+ err = handle_verif_mode();
+
+ free_verif_stats(env.prog_stats, env.prog_stat_cnt);
+ free_verif_stats(env.baseline_stats, env.baseline_stat_cnt);
for (i = 0; i < env.filename_cnt; i++)
free(env.filenames[i]);
free(env.filenames);
--
2.30.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v2 bpf-next 4/4] selftests/bpf: add ability to filter programs in veristat
2022-09-21 16:42 [PATCH v2 bpf-next 0/4] veristat: CSV output, comparison mode, filtering Andrii Nakryiko
` (2 preceding siblings ...)
2022-09-21 16:42 ` [PATCH v2 bpf-next 3/4] selftests/bpf: add comparison mode to veristat Andrii Nakryiko
@ 2022-09-21 16:42 ` Andrii Nakryiko
2022-09-22 2:50 ` [PATCH v2 bpf-next 0/4] veristat: CSV output, comparison mode, filtering patchwork-bot+netdevbpf
2022-09-22 2:52 ` Alexei Starovoitov
5 siblings, 0 replies; 8+ messages in thread
From: Andrii Nakryiko @ 2022-09-21 16:42 UTC (permalink / raw)
To: bpf, ast, daniel; +Cc: andrii, kernel-team
Add a -f (--filter) argument which accepts glob-based filters for
narrowing down which BPF object files, and which programs within them,
should be processed by veristat. This filtering applies to both
comparison and main (verification) modes.
A filter can take one of two forms:
- file (object) filter: 'strobemeta*'; in this case all programs
within matching files are implicitly allowed (or denied, depending
on whether it's a positive or negative rule, see below);
- file and prog filter: 'strobemeta*/*unroll*' further filters
programs within matching files to only those whose names match
the '*unroll*' glob.
As mentioned, filters can be positive (allowlisting) or negative
(denylisting). Negative filters start with '!': '!strobemeta*'
will deny any file whose basename starts with "strobemeta".
Further, one extra piece of syntax is supported for more convenient
practical use. Instead of specifying each rule on the command line,
veristat can read rules from a file, one filter per line, with
positive and negative rules freely mixed. This is done with
-f @<filepath>, where <filepath> points to a text file containing
the rules. For convenience, empty lines and lines starting with '#'
are ignored. This is useful for maintaining a pre-canned list of
object files and program names that are tested repeatedly: the list
of rules can be checked in and quickly referenced on the command line.
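For illustration, a rules file (the file name and globs below are hypothetical examples, not part of the patch) could look like this:

```text
# filters.txt, used as: veristat -f @filters.txt *.bpf.o
# allow all programs in pyperf* object files
pyperf*
# allow only unrolled strobemeta programs
strobemeta*/*unroll*
# deny bpf_loop-based programs everywhere (negative rules win on overlap)
!*/*bpf_loop*
```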
As a demonstration (and a shortcut for the near future), create a small
list of "interesting" BPF object files from selftests/bpf and commit it
as veristat.cfg. It currently covers 73 programs, most of them among
the most complex and largest BPF programs in selftests, as judged by
total verified instruction count and total verifier states.
If positive and negative filters overlap, the negative filter takes
precedence (denylisting is stronger than allowlisting). If no allow
filter is specified, veristat implicitly assumes a '*/*' rule. If no
deny rule is specified, veristat (logically) assumes no negative
filters.
Also note that -f (just like -e and -s) can be specified multiple times,
and their effects are cumulative.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
tools/testing/selftests/bpf/veristat.c | 212 ++++++++++++++++++++++-
tools/testing/selftests/bpf/veristat.cfg | 17 ++
2 files changed, 227 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/bpf/veristat.cfg
diff --git a/tools/testing/selftests/bpf/veristat.c b/tools/testing/selftests/bpf/veristat.c
index c6837bac357f..51030234b60a 100644
--- a/tools/testing/selftests/bpf/veristat.c
+++ b/tools/testing/selftests/bpf/veristat.c
@@ -52,6 +52,11 @@ enum resfmt {
RESFMT_CSV,
};
+struct filter {
+ char *file_glob;
+ char *prog_glob;
+};
+
static struct env {
char **filenames;
int filename_cnt;
@@ -68,6 +73,11 @@ static struct env {
struct stat_specs output_spec;
struct stat_specs sort_spec;
+
+ struct filter *allow_filters;
+ struct filter *deny_filters;
+ int allow_filter_cnt;
+ int deny_filter_cnt;
} env;
static int libbpf_print_fn(enum libbpf_print_level level, const char *format, va_list args)
@@ -94,10 +104,13 @@ static const struct argp_option opts[] = {
{ "sort", 's', "SPEC", 0, "Specify sort order" },
{ "output-format", 'o', "FMT", 0, "Result output format (table, csv), default is table." },
{ "compare", 'C', NULL, 0, "Comparison mode" },
+ { "filter", 'f', "FILTER", 0, "Filter expressions (or @filename for file with expressions)." },
{},
};
static int parse_stats(const char *stats_str, struct stat_specs *specs);
+static int append_filter(struct filter **filters, int *cnt, const char *str);
+static int append_filter_file(const char *path);
static error_t parse_arg(int key, char *arg, struct argp_state *state)
{
@@ -134,6 +147,18 @@ static error_t parse_arg(int key, char *arg, struct argp_state *state)
case 'C':
env.comparison_mode = true;
break;
+ case 'f':
+ if (arg[0] == '@')
+ err = append_filter_file(arg + 1);
+ else if (arg[0] == '!')
+ err = append_filter(&env.deny_filters, &env.deny_filter_cnt, arg + 1);
+ else
+ err = append_filter(&env.allow_filters, &env.allow_filter_cnt, arg);
+ if (err) {
+ fprintf(stderr, "Failed to collect program filter expressions: %d\n", err);
+ return err;
+ }
+ break;
case ARGP_KEY_ARG:
tmp = realloc(env.filenames, (env.filename_cnt + 1) * sizeof(*env.filenames));
if (!tmp)
@@ -156,6 +181,150 @@ static const struct argp argp = {
.doc = argp_program_doc,
};
+
+/* Adapted from perf/util/string.c */
+static bool glob_matches(const char *str, const char *pat)
+{
+ while (*str && *pat && *pat != '*') {
+ if (*str != *pat)
+ return false;
+ str++;
+ pat++;
+ }
+ /* Check wild card */
+ if (*pat == '*') {
+ while (*pat == '*')
+ pat++;
+ if (!*pat) /* Tail wild card matches all */
+ return true;
+ while (*str)
+ if (glob_matches(str++, pat))
+ return true;
+ }
+ return !*str && !*pat;
+}
+
+static bool should_process_file(const char *filename)
+{
+ int i;
+
+ if (env.deny_filter_cnt > 0) {
+ for (i = 0; i < env.deny_filter_cnt; i++) {
+ if (glob_matches(filename, env.deny_filters[i].file_glob))
+ return false;
+ }
+ }
+
+ if (env.allow_filter_cnt == 0)
+ return true;
+
+ for (i = 0; i < env.allow_filter_cnt; i++) {
+ if (glob_matches(filename, env.allow_filters[i].file_glob))
+ return true;
+ }
+
+ return false;
+}
+
+static bool should_process_prog(const char *filename, const char *prog_name)
+{
+ int i;
+
+ if (env.deny_filter_cnt > 0) {
+ for (i = 0; i < env.deny_filter_cnt; i++) {
+ if (glob_matches(filename, env.deny_filters[i].file_glob))
+ return false;
+ if (!env.deny_filters[i].prog_glob)
+ continue;
+ if (glob_matches(prog_name, env.deny_filters[i].prog_glob))
+ return false;
+ }
+ }
+
+ if (env.allow_filter_cnt == 0)
+ return true;
+
+ for (i = 0; i < env.allow_filter_cnt; i++) {
+ if (!glob_matches(filename, env.allow_filters[i].file_glob))
+ continue;
+ /* if filter specifies only filename glob part, it implicitly
+ * allows all progs within that file
+ */
+ if (!env.allow_filters[i].prog_glob)
+ return true;
+ if (glob_matches(prog_name, env.allow_filters[i].prog_glob))
+ return true;
+ }
+
+ return false;
+}
+
+static int append_filter(struct filter **filters, int *cnt, const char *str)
+{
+ struct filter *f;
+ void *tmp;
+ const char *p;
+
+ tmp = realloc(*filters, (*cnt + 1) * sizeof(**filters));
+ if (!tmp)
+ return -ENOMEM;
+ *filters = tmp;
+
+ f = &(*filters)[*cnt];
+ f->file_glob = f->prog_glob = NULL;
+
+ /* filter can be specified either as "<obj-glob>" or "<obj-glob>/<prog-glob>" */
+ p = strchr(str, '/');
+ if (!p) {
+ f->file_glob = strdup(str);
+ if (!f->file_glob)
+ return -ENOMEM;
+ } else {
+ f->file_glob = strndup(str, p - str);
+ f->prog_glob = strdup(p + 1);
+ if (!f->file_glob || !f->prog_glob) {
+ free(f->file_glob);
+ free(f->prog_glob);
+ f->file_glob = f->prog_glob = NULL;
+ return -ENOMEM;
+ }
+ }
+
+ *cnt = *cnt + 1;
+ return 0;
+}
+
+static int append_filter_file(const char *path)
+{
+ char buf[1024];
+ FILE *f;
+ int err = 0;
+
+ f = fopen(path, "r");
+ if (!f) {
+ err = -errno;
+ fprintf(stderr, "Failed to open '%s': %d\n", path, err);
+ return err;
+ }
+
+ while (fscanf(f, " %1023[^\n]\n", buf) == 1) {
+ /* lines starting with # are comments, skip them */
+ if (buf[0] == '\0' || buf[0] == '#')
+ continue;
+ /* lines starting with ! are negative match filters */
+ if (buf[0] == '!')
+ err = append_filter(&env.deny_filters, &env.deny_filter_cnt, buf + 1);
+ else
+ err = append_filter(&env.allow_filters, &env.allow_filter_cnt, buf);
+ if (err)
+ goto cleanup;
+ }
+
+cleanup:
+ fclose(f);
+ return err;
+}
+
static const struct stat_specs default_output_spec = {
.spec_cnt = 7,
.ids = {
@@ -283,6 +452,9 @@ static int process_prog(const char *filename, struct bpf_object *obj, struct bpf
int err = 0;
void *tmp;
+ if (!should_process_prog(basename(filename), bpf_program__name(prog)))
+ return 0;
+
tmp = realloc(env.prog_stats, (env.prog_stat_cnt + 1) * sizeof(*env.prog_stats));
if (!tmp)
return -ENOMEM;
@@ -330,6 +502,9 @@ static int process_obj(const char *filename)
LIBBPF_OPTS(bpf_object_open_opts, opts);
int err = 0, prog_cnt = 0;
+ if (!should_process_file(basename(filename)))
+ return 0;
+
old_libbpf_print_fn = libbpf_set_print(libbpf_print_fn);
obj = bpf_object__open_file(filename, &opts);
@@ -666,7 +841,10 @@ static int parse_stats_csv(const char *filename, struct stat_specs *specs,
goto cleanup;
}
*statsp = tmp;
+
st = &(*statsp)[*stat_cntp];
+ memset(st, 0, sizeof(*st));
+
*stat_cntp += 1;
}
@@ -692,14 +870,34 @@ static int parse_stats_csv(const char *filename, struct stat_specs *specs,
col++;
}
- if (!header && col < specs->spec_cnt) {
+ if (header) {
+ header = false;
+ continue;
+ }
+
+ if (col < specs->spec_cnt) {
fprintf(stderr, "Not enough columns in row #%d in '%s'\n",
*stat_cntp, filename);
err = -EINVAL;
goto cleanup;
}
- header = false;
+ if (!st->file_name || !st->prog_name) {
+ fprintf(stderr, "Row #%d in '%s' is missing file and/or program name\n",
+ *stat_cntp, filename);
+ err = -EINVAL;
+ goto cleanup;
+ }
+
+ /* in comparison mode we can only check filters after we
+ * parsed entire line; if row should be ignored we pretend we
+ * never parsed it
+ */
+ if (!should_process_prog(st->file_name, st->prog_name)) {
+ free(st->file_name);
+ free(st->prog_name);
+ *stat_cntp -= 1;
+ }
}
if (!feof(f)) {
@@ -1012,5 +1210,15 @@ int main(int argc, char **argv)
for (i = 0; i < env.filename_cnt; i++)
free(env.filenames[i]);
free(env.filenames);
+ for (i = 0; i < env.allow_filter_cnt; i++) {
+ free(env.allow_filters[i].file_glob);
+ free(env.allow_filters[i].prog_glob);
+ }
+ free(env.allow_filters);
+ for (i = 0; i < env.deny_filter_cnt; i++) {
+ free(env.deny_filters[i].file_glob);
+ free(env.deny_filters[i].prog_glob);
+ }
+ free(env.deny_filters);
return -err;
}
diff --git a/tools/testing/selftests/bpf/veristat.cfg b/tools/testing/selftests/bpf/veristat.cfg
new file mode 100644
index 000000000000..1a385061618d
--- /dev/null
+++ b/tools/testing/selftests/bpf/veristat.cfg
@@ -0,0 +1,17 @@
+# pre-canned list of rather complex selftests/bpf BPF object files to monitor
+# BPF verifier's performance on
+bpf_flow*
+bpf_loop_bench*
+loop*
+netif_receive_skb*
+profiler*
+pyperf*
+strobemeta*
+test_cls_redirect*
+test_l4lb
+test_sysctl*
+test_tcp_hdr_*
+test_usdt*
+test_verif_scale*
+test_xdp_noinline*
+xdp_synproxy*
--
2.30.2
* Re: [PATCH v2 bpf-next 0/4] veristat: CSV output, comparison mode, filtering
2022-09-21 16:42 [PATCH v2 bpf-next 0/4] veristat: CSV output, comparison mode, filtering Andrii Nakryiko
` (3 preceding siblings ...)
2022-09-21 16:42 ` [PATCH v2 bpf-next 4/4] selftests/bpf: add ability to filter programs in veristat Andrii Nakryiko
@ 2022-09-22 2:50 ` patchwork-bot+netdevbpf
2022-09-22 2:52 ` Alexei Starovoitov
5 siblings, 0 replies; 8+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-09-22 2:50 UTC (permalink / raw)
To: Andrii Nakryiko; +Cc: bpf, ast, daniel, kernel-team
Hello:
This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:
On Wed, 21 Sep 2022 09:42:50 -0700 you wrote:
> Add three more critical features to veristat tool, which make it sufficient
> for a practical work on BPF verifier:
>
> - CSV output, which allows easier programmatic post-processing of stats;
>
> - building upon CSV output, veristat now supports comparison mode, in which
> two previously captured CSV outputs from veristat are compared with each
> other in a convenient form;
>
> [...]
Here is the summary with links:
- [v2,bpf-next,1/4] selftests/bpf: fix double bpf_object__close() in veristate
https://git.kernel.org/bpf/bpf-next/c/f338ac910567
- [v2,bpf-next,2/4] selftests/bpf: add CSV output mode for veristat
https://git.kernel.org/bpf/bpf-next/c/e5eb08d8fe46
- [v2,bpf-next,3/4] selftests/bpf: add comparison mode to veristat
https://git.kernel.org/bpf/bpf-next/c/394169b079b5
- [v2,bpf-next,4/4] selftests/bpf: add ability to filter programs in veristat
https://git.kernel.org/bpf/bpf-next/c/bde4a96cdcad
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH v2 bpf-next 0/4] veristat: CSV output, comparison mode, filtering
2022-09-21 16:42 [PATCH v2 bpf-next 0/4] veristat: CSV output, comparison mode, filtering Andrii Nakryiko
` (4 preceding siblings ...)
2022-09-22 2:50 ` [PATCH v2 bpf-next 0/4] veristat: CSV output, comparison mode, filtering patchwork-bot+netdevbpf
@ 2022-09-22 2:52 ` Alexei Starovoitov
2022-09-22 16:21 ` Andrii Nakryiko
5 siblings, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2022-09-22 2:52 UTC (permalink / raw)
To: Andrii Nakryiko; +Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team
On Wed, Sep 21, 2022 at 9:43 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> Add three more critical features to veristat tool, which make it sufficient
> for a practical work on BPF verifier:
Nice tool!
Few requests:
1.
Please fix -v mode, since
veristat -v loop1.bpf.linked3.o
hangs forever. Maybe not forever, but I didn't have that much
time to find out whether it will finish eventually.
2.
Please add some sort of progress indicator to
veristat *.linked3.o
otherwise it's not clear when it will be done and
-v cannot be used to find out because of bug 1.
3.
could you make it ignore non bpf elf files?
So it can be used just with veristat *.o ?
* Re: [PATCH v2 bpf-next 0/4] veristat: CSV output, comparison mode, filtering
2022-09-22 2:52 ` Alexei Starovoitov
@ 2022-09-22 16:21 ` Andrii Nakryiko
0 siblings, 0 replies; 8+ messages in thread
From: Andrii Nakryiko @ 2022-09-22 16:21 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Müller
Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
Kernel Team
On Wed, Sep 21, 2022 at 7:52 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Sep 21, 2022 at 9:43 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > Add three more critical features to veristat tool, which make it sufficient
> > for a practical work on BPF verifier:
>
> Nice tool!
Thanks!
> Few requests:
>
> 1.
> Please fix -v mode, since
> veristat -v loop1.bpf.linked3.o
> hangs forever. Maybe not forever, but I didn't have that much
> time to find out whether it will finish eventually.
>
[vmuser@archvm bpf]$ sudo ./veristat -v loop1.bpf.linked3.o
libbpf: prog 'nested_loops': BPF program load failed: No space left on device
libbpf: prog 'nested_loops': failed to load: -28
libbpf: failed to load object 'loop1.bpf.linked3.o'
...<hangs>...
Yep, some bug if the program fails to validate due to -ENOSPC. I'll fix it.
BTW, I also want to extend this verbose capability to allow specifying
log_level = 2 for BPF verifier, because for successfully validated BPF
programs log_level = 1 is empty (except for stats).
> 2.
> Please add some sort of progress indicator to
> veristat *.linked3.o
> otherwise it's not clear when it will be done and
> -v cannot be used to find out because of bug 1.
Just dumping every entry to the console would be too verbose, so I was
thinking of overwriting a single line with updated progress, if output
is a console. I'll look up how to do it.
>
> 3.
> could you make it ignore non bpf elf files?
> So it can be used just with veristat *.o ?
Yep, I can, will add that as a convenience.
Note that for now you probably want to use '*.bpf.linked3.o'. Daniel
added the .bpf.o suffix, which helped greatly, but I think we might
want to tweak this a bit more to make sure that what we currently have
as .bpf.linked3.o becomes just .bpf.o, and the initial .bpf.o (which
might require further static linking to be complete) is called
something else.
end of thread, other threads:[~2022-09-22 16:21 UTC | newest]
Thread overview: 8+ messages
-- links below jump to the message on this page --
2022-09-21 16:42 [PATCH v2 bpf-next 0/4] veristat: CSV output, comparison mode, filtering Andrii Nakryiko
2022-09-21 16:42 ` [PATCH v2 bpf-next 1/4] selftests/bpf: fix double bpf_object__close() in veristate Andrii Nakryiko
2022-09-21 16:42 ` [PATCH v2 bpf-next 2/4] selftests/bpf: add CSV output mode for veristat Andrii Nakryiko
2022-09-21 16:42 ` [PATCH v2 bpf-next 3/4] selftests/bpf: add comparison mode to veristat Andrii Nakryiko
2022-09-21 16:42 ` [PATCH v2 bpf-next 4/4] selftests/bpf: add ability to filter programs in veristat Andrii Nakryiko
2022-09-22 2:50 ` [PATCH v2 bpf-next 0/4] veristat: CSV output, comparison mode, filtering patchwork-bot+netdevbpf
2022-09-22 2:52 ` Alexei Starovoitov
2022-09-22 16:21 ` Andrii Nakryiko